GEOMETRIC LINEAR ALGEBRA
Volume 1

I-Hsiung Lin
National Taiwan Normal University, China

World Scientific
NEW JERSEY · LONDON · SINGAPORE · BEIJING · SHANGHAI · HONG KONG · TAIPEI · CHENNAI
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data


Lin, Yixiong.
Geometric linear algebra / I-hsiung Lin.
p. cm.
Includes bibliographical references and indexes.
ISBN 981-256-087-4 (v. 1) -- ISBN 981-256-132-3 (v. 1 : pbk.)
1. Algebras, Linear--Textbooks. 2. Geometry, Algebraic--Textbooks. I. Title.

QA184.2.L49 2005
512'.5--dc22
2005041722

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

Copyright © 2005 by World Scientific Publishing Co. Pte. Ltd.


All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.

Printed in Singapore.
To the memory of my
grandparents, parents
and to my family
PREFACE

What is linear algebra about?


Many objects such as buildings, furniture, etc. in our physical world are
delicately constituted by counterparts of almost straight and flat shapes
which, in geometrical terminology, are portions of straight lines or planes.
A butcher's customers order meat in various quantities, some by mass and some by price, so that he has to answer such questions as: What is the cost of 2/3 kg of the meat? What mass should a customer get if she only wants to spend 10 dollars? These can be solved by the linear relation y = ax, or y = ax + b with b as a bonus. The same holds when, traveling abroad, one wants to know the value of a foreign currency in terms of one's own. How many faces does a polyhedron with 30 vertices and 50 edges have? What is the Fahrenheit equivalent of 25°C? One experiences numerous phenomena in daily life which can be put in the realm of straight lines, planes, or linear relations in several unknowns.
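For instance (worked out here for concreteness; the computations are standard and not part of the original text), the two questions above reduce to one-line linear relations: Euler's formula $V - E + F = 2$ for a convex polyhedron gives
$$F = 2 - V + E = 2 - 30 + 50 = 22 \text{ faces},$$
and the linear relation between the two temperature scales gives
$$F = \tfrac{9}{5}\,C + 32 = \tfrac{9}{5}\cdot 25 + 32 = 77\,^{\circ}\mathrm{F}.$$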
To start with, the most fundamental and essential ideas needed in
geometry are
1. (directed) line segment (including the extended line),
2. parallelogram (including the extended plane)
and the associated quantities such as length or signed length of line segment
and angle between segments.
The algebraic equivalence, in a global sense, is linear equations such as
$$a_{11}x_1 + a_{21}x_2 = b_1$$
or
$$a_{11}x_1 + a_{21}x_2 + a_{31}x_3 = b_1$$
and simultaneous equations composed of them. The core is how to deter-
mine whether such linear equations have a solution or solutions, and if so,
how to find them in an effective way.


Algebra has operational priority over geometry, while the latter provides intuitive geometric motivation for, or interpretations of, results of the former. The two act as the head and tail of the same coin in many situations.
Linear algebra transforms the afore-mentioned geometric ideas into two algebraic operations, so that solving linear equations can be handled linearly and systematically. Its implications are far-reaching, and its applications are wide open, touching almost every field in modern science. More precisely,
a directed line segment $\overrightarrow{AB}$ → a vector $\vec{x}$;
the ratio of signed lengths of (directed) line segments along the same line,
$$\frac{\overrightarrow{PQ}}{\overrightarrow{AB}} = \alpha \;\rightarrow\; \vec{y} = \alpha\vec{x}, \text{ the scalar multiplication of } \vec{x} \text{ by } \alpha.$$
See Fig. P.1.
[Fig. P.1: directed segments $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ on a line, carrying the vectors $\vec{x}$ and $\vec{y}$.]

Hence, the whole line can be described algebraically as $\alpha\vec{x}$ while $\alpha$ runs through the real numbers. The parallelogram in Fig. P.2 indicates that the directed segments $\overrightarrow{OA}$ and $\overrightarrow{BC}$ represent the same vector $\vec{x}$, and $\overrightarrow{OB}$ and $\overrightarrow{AC}$ represent the same vector $\vec{y}$, so that

the diagonal $\overrightarrow{OC} \rightarrow \vec{x} + \vec{y}$, the addition of the vectors $\vec{x}$ and $\vec{y}$.

[Fig. P.2: the parallelogram $OACB$ with sides $\vec{x}$, $\vec{y}$ and diagonal $\vec{x} + \vec{y}$.]

As a consequence, the whole plane can be described algebraically as the linear combinations $\alpha\vec{x} + \beta\vec{y}$, where $\alpha$ and $\beta$ are taken from all the real numbers. In fact, parallelograms implicitly provide an inductive process to construct and visualize higher-dimensional spaces. One may imagine the line $OA$ acting as an $(n-1)$-dimensional space, so that $\vec{x}$ is of the form $\alpha_1\vec{x}_1 + \cdots + \alpha_{n-1}\vec{x}_{n-1}$. In case the point $C$ is outside the space, $\vec{y}$ cannot be expressed as such a linear combination. Then the addition $\alpha\vec{y} + \vec{x}$ will raise the space to the higher $n$-dimensional one.
As a whole, relying only on
$$\alpha\vec{x},\ \alpha \in \mathbb{R} \text{ (the real field)}, \quad \text{and} \quad \vec{x} + \vec{y}$$
with appropriate operational properties and using the techniques:

linear combination,
linear dependence, and
linear independence

of vectors, plus deductive and inductive methods, one can develop and
establish the whole theory of Linear Algebra, even formally and in a very
abstract manner.
The main theme of the theory is the linear transformation, which can be characterized as a mapping that preserves the ratio of the signed lengths of directed line segments along the same or parallel lines. Linear transformations between finite-dimensional vector spaces can be expressed as matrix equations $\vec{x}A = \vec{y}$, after choosing suitable coordinate systems as bases.
The matrix equation $\vec{x}A = \vec{y}$ has two main features. Its static structure, when one considers $\vec{y}$ as a constant vector $\vec{b}$, results from solving algebraically the system $\vec{x}A = \vec{b}$ of linear equations by the powerful and useful Gaussian elimination method. The rank of a matrix and its factorization as a product of simpler ones are the most important results among all. Rank provides insights into the geometric character of subspaces, based on the concepts of linear combination, dependence and independence, while factorization makes the introduction of determinants easier and provides preparatory tools for understanding the other feature of matrices. The dynamic structure, when one considers $\vec{y}$ as a varying vector, results from treating $A$ as a linear transformation defined by $\vec{x} \to \vec{x}A = \vec{y}$. The kernel (for homogeneous linear equations) and range (for non-homogeneous linear equations) of a linear transformation, the dimension theorem, invariant subspaces, diagonalizability, various decompositions of spaces or linear transformations, and their canonical forms are the main topics among others.
When Euclidean concepts such as length and angle come into play, it is the inner product that combines both, and the Pythagorean theorem or orthogonality dominates everywhere. Therefore, linear operators $\vec{y} = \vec{x}A$ are much more specialized, the results concerned are more fruitful, and they provide wide and concrete applications in many fields.
Roughly speaking, using algebraic methods, linear algebra investigates whether and how systems of linear equations can be solved or, geometrically equivalently, studies the inner structures of spaces such as lines or planes and the possible interactions between them. Nowadays, linear algebra turns out to be an indispensable shortcut from the global view to the local view of objects or phenomena in our universe.

The purpose of this introductory book


The teaching of linear algebra and its contents have become too algebraic, and hence too abstract, in introducing the main concepts and methods, which are then made formal and well established in the theory. Too-rapid abstraction of the theory definitely scares away many students whose majors are not in mathematics but who need linear algebra very much in their careers.
For most beginners in a first course on linear algebra, understanding clear pictures, and the reasons why one does this and that, seems more urgent and persuasive than the rigor of proofs and the completeness of the theory. Understanding cultivates interest in the subject and the abilities of computation and abstraction.
To start from one's knowledge and experience does enhance the understanding of a new subject. As far as beginning linear algebra is concerned, I strongly believe that intuitive, even manipulable, geometric objects or concepts are the best way to open the gate of entrance. This is the momentum and the purpose behind the writing of this introductory book. I tried before (in Chinese), and I am trying to write this book in this manner, maybe not so successfully as originally expected, but away from the conventional style in quite a few places (refer to Appendix B).
This book is designed for beginners, such as freshmen and sophomores or honored high-school students.
The general prerequisites for reading this book are high-school algebra and geometry. Appendix A, whose sections discuss sets, functions, fields, groups and polynomials, respectively, is intended to unify and review some basic ideas used throughout the book.

Features of the book


Most parts of the contents of this book are abridged from my seven books on The Introduction to Elementary Linear Algebra (refer to [3–7], published in Chinese from 1982 to 1984, with the last two still unpublished). I try to write the book in the following manner:

1. Use intuitive geometric concepts or methods to introduce, to motivate, or to reinforce the creation of the abstract or general theory in linear algebra.
2. Emphasize the geometric characterizations of results in linear algebra.
3. Apply known results in linear algebra to describe various geometries based on F. Klein's Erlanger Programm point of view.

Therefore, in order to vivify these connections of geometries with linear algebra in a convincing argument, I focus the discussion of the whole book on the real vector spaces $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$, endowed with more than 500 graphic illustrations. It is in this sense that I entitle this book Geometric Linear Algebra. Almost every section is followed by a set of exercises.

4. Usually, each set of Exercises contains two parts: <A> and <B>. The former is designed to familiarize the readers with, or to let them practice, the established results of that section, while the latter contains challenging problems whose solutions, in many cases, need some knowledge exposed formally in sections that follow. In addition to these, some sets of Exercises also contain parts <C> and <D>. <C> asks the readers to try to model on the content and to extend the process and results to vector spaces over arbitrary fields. <D> presents problems connected with linear algebra, such as in real or complex calculus, differential equations and differential geometry, etc. Let such connections and applications of linear algebra show how important and useful it is.

The readers are asked to do all problems in <A> and are encouraged to try part of <B>, while <C> and <D> are optional and are left to more mature and serious students.
No applications outside pure mathematics are touched on, and readers who need them should consult books such as Gilbert Strang's Linear Algebra and Its Applications.
Finally, three points that deviate from most existing conventional books on linear algebra should be mentioned. One is that the chapters are divided according to the affine, linear, and Euclidean structures of $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$, and not according to topics such as vector spaces, determinants, etc. Another is that few definitions are formal; most of them are allowed to come to the surface in the middle of discussions, while the main results obtained after a discussion are summarized and numbered along with important formulas. The third is that a point $\vec{x} = (x_1, x_2)$ is also treated as a position vector from the origin $\vec{0} = (0, 0)$ to that point, when $\mathbb{R}^2$ is considered as a two-dimensional vector space, rather than with the commonly used notation
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad \text{or} \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
As a consequence of this convention, when a given $2 \times 2$ matrix $A$ is considered to represent a linear transformation on $\mathbb{R}^2$ acting on the vector $\vec{x}$, we adopt $\vec{x}A$, treating $\vec{x}$ as a $1 \times 2$ matrix, and not $A\binom{x_1}{x_2}$, to denote the image vector of $\vec{x}$ under $A$, unless otherwise stated. A similar explanation is valid for $\vec{x}A$, where $\vec{x} \in \mathbb{R}^m$ and $A$ is an $m \times n$ matrix, etc.
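As an illustration of this convention only (a minimal NumPy sketch, not part of the book; the matrix entries are made up), the row-vector action $\vec{x} \mapsto \vec{x}A$ and the more common column-vector action differ merely by a transpose:

```python
import numpy as np

# Row-vector convention used throughout the book: a point/vector x is a
# 1 x 2 matrix, and a linear transformation A acts on the right: x -> xA.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # a 2 x 2 matrix representing a linear operator
x = np.array([[1.0, 2.0]])   # x as a 1 x 2 row vector

y_row = x @ A                # the book's convention: y = xA
y_col = A.T @ x.T            # the common column convention: y^T = A^T x^T

# Both describe the same transformation; the two matrices are transposes.
assert np.allclose(y_row, y_col.T)
print(y_row)                 # [[2. 7.]]
```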
In order to avoid getting lost, and for a striking contrast, I supplement the text with Appendix B, titled Fundamentals of Algebraic Linear Algebra, for the sake of reference and comparison.

Ways of writing and how to treat Rn for n ≥ 4


The main contents focus on the introduction of $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$, even though the results so obtained and the methods cultivated can be generalized almost verbatim to $\mathbb{R}^n$ for $n \ge 4$ or to finite-dimensional vector spaces over fields and, on many occasions, even to infinite-dimensional spaces.
As mentioned earlier, geometric motivation will lead the way of intro-
duction to the well-established and formulated methods in the contents. So
the general process of writing is as follows:

Stage one: geometric objects in $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$ ⟷ (transformed to an equivalent form) simple algebraic facts or relations (mostly in linear form) in one, two or three variables or unknowns.

Stage two: both sides are generalized, by imitation on the geometric side and formally on the algebraic side, to higher dimensions.

Stage three: geometric meanings, interpretations or applications ⟷ algebraic facts or relations in $n$ variables or unknowns (in order to handle them, sophisticated algebraic manipulations need to be cultivated).
In most cases, we leave Stages two and three as Exercises <C> for mature
students.
As a whole, we can use the following problem

Prove the identity $1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{1}{6}n(n+1)(2n+1)$, where $n$ is any natural number, by mathematical induction.

as a model to describe how we probe deeply into the construction and formulation of topics in abstract linear algebra. We proceed as follows. It is a dull business at the very beginning to prove this identity by testing both sides in the cases $n = 1, 2, \ldots$, then supposing both sides equal to each other in the case $n = k$, and finally trying to show both sides equal when $n = k + 1$. This is a well-established and sophisticated way of argument, but it is not necessarily the best way to understand thoroughly the implications and the educational values this problem could provide. Instead, why not try the following steps:

1. How does one know beforehand that the sum on the left side is equal to $\frac{1}{6}n(n+1)(2n+1)$?
2. To pursue this answer, try trivial yet simpler cases when n = 1, 2, 3 and
even n = 4, and then try to find out possible common rules owned by
all of them.
3. Conjecture that the common rules found are still valid for general n.
4. Try to prove this conjecture formally by mathematical induction or some
other methods.

Now, for $n = 1$, take a "shaded" unit square and a "white" unit square and put them side by side as in Fig. P.3:
$$\frac{\text{area of shaded region}}{\text{area of the rectangle}} = \frac{1^2}{2 \cdot 1} = \frac{1}{2} = \frac{2 \cdot 1 + 1}{6 \cdot 1}.$$

[Fig. P.3: a $2 \times 1$ rectangle, half shaded.]

For $n = 2$, use the same process and see Fig. P.4; for $n = 3$, see Fig. P.5.
$$\frac{\text{area of shaded region}}{\text{area of the rectangle}} = \frac{1^2 + 2^2}{3 \cdot 4} = \frac{5}{12} = \frac{2 \cdot 2 + 1}{6 \cdot 2}.$$

[Fig. P.4: a $3 \times 4$ rectangle with the squares $1^2$ and $2^2$ shaded.]

$$\frac{\text{area of shaded region}}{\text{area of the rectangle}} = \frac{1^2 + 2^2 + 3^2}{4 \cdot 9} = \frac{14}{36} = \frac{7}{18} = \frac{2 \cdot 3 + 1}{6 \cdot 3}.$$

[Fig. P.5: a $4 \times 9$ rectangle with the squares $1^2$, $2^2$ and $3^2$ shaded.]

This suggests the conjecture
$$\frac{1^2 + 2^2 + 3^2 + \cdots + n^2}{(n+1)n^2} = \frac{2n+1}{6n} \;\Rightarrow\; 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{1}{6}n(n+1)(2n+1).$$
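Before carrying out step 4, one may even let a computer do step 2 wholesale (a throwaway check added here, not part of the book):

```python
# Numerically verify the conjectured identity for the first hundred n.
for n in range(1, 101):
    lhs = sum(k * k for k in range(1, n + 1))
    rhs = n * (n + 1) * (2 * n + 1) // 6
    assert lhs == rhs, n
print("1^2 + ... + n^2 = n(n+1)(2n+1)/6 holds for n = 1, ..., 100")
```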
It is approximately in this manner that I wrote the contents of the book, in particular Chaps. 1, 2 and 4. Of course, this procedure is roundabout, overlaps badly in some cases and may even make one feel impatient and sick. So I have tried to summarize key points and main results in time. But I do strongly believe that it is a worthy way of educating beginners in a course on linear algebra.
Well, I am not able to realize physically the existence of four- or higher-dimensional spaces. Could you? How? It is the algebraic method that properly convinces us of the existence of higher-dimensional spaces. Let me end this puzzle with my own experience, in the following story.
Some day in 1986, in a Taoist temple in eastern Taiwan, I had a face-to-face dialogue with a person epiphanized (namely, making its presence or power felt) by the God Nuo Zha (also esteemed as the Third Prince in Chinese communities):
I asked: Do Gods exist?
Nuo Zha answered: Gods do exist, and they live in spaces from dimension seven to dimension thirteen. You common human beings live in dimension three, while dimensions four, five and six are buffer zones between human beings and Gods. Also, there are "human beings" underground, which are two-dimensional.
I asked: Do UFOs (unidentified flying objects) really exist?
Nuo Zha answered: Yes. They steer improperly and fall into the three-dimensional space, so that you human beings can see them physically.
Believe it or not!

Sketch of the contents


Catch a quick glimpse of the Contents, or of the Sketch of the Content at the beginning of each chapter, and one will have a rough idea of what might be going on inside the book.
Let us start from an example.
Fix a Cartesian coordinate system in space. The equation of a plane in space is $a_1x_1 + a_2x_2 + a_3x_3 = b$, with $b = 0$ if and only if the plane passes through the origin $(0, 0, 0)$. Geometrically, the planes $a_1x_1 + a_2x_2 + a_3x_3 = b$ and $a_1x_1 + a_2x_2 + a_3x_3 = 0$ are parallel to each other, and they can be made coincident by invoking a translation. Algebraically, the main advantage of a plane passing through the origin over those that do not is that it can be vectorized as a two-dimensional vector space $\langle\langle \vec{v}_1, \vec{v}_2 \rangle\rangle = \{\alpha_1\vec{v}_1 + \alpha_2\vec{v}_2 \mid \alpha_1, \alpha_2 \in \mathbb{R}\}$, where $\vec{v}_1$ and $\vec{v}_2$ are linearly independent vectors lying on $a_1x_1 + a_2x_2 + a_3x_3 = 0$, while $a_1x_1 + a_2x_2 + a_3x_3 = b$ is the image $\vec{x}_0 + \langle\langle \vec{v}_1, \vec{v}_2 \rangle\rangle$, called an affine plane, of $\langle\langle \vec{v}_1, \vec{v}_2 \rangle\rangle$ under a translation $\vec{x} \to \vec{x}_0 + \vec{x}$, where $\vec{x}_0$ is a point lying on the plane. Since any point in space can be chosen as the origin or as the zero vector, no distinction between vector and affine spaces, except possibly for pedagogic reasons, needs to be emphasized or exaggerated. This is the main reason why I put the affine and linear structures of $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$ together as Part 1, which contains Chaps. 1–3.
When the concepts of length and angle come into our minds, we use the inner product $\langle\,,\rangle$ to connect both. Then the plane $a_1x_1 + a_2x_2 + a_3x_3 = b$ can be characterized as $\langle \vec{x} - \vec{x}_0, \vec{a} \rangle = 0$, where $\vec{a} = (a_1, a_2, a_3)$ is the normal vector to the plane and $\vec{x} - \vec{x}_0$ is a vector lying on the plane, determined by the points $\vec{x}_0$ and $\vec{x}$ in the plane. This is Part 2, the Euclidean structures of $\mathbb{R}^2$ and $\mathbb{R}^3$, which contains Chaps. 4 and 5.
In our vivid physical world, it is difficult to realize that the parallel planes $a_1x_1 + a_2x_2 + a_3x_3 = b$ ($b \ne 0$) and $a_1x_1 + a_2x_2 + a_3x_3 = 0$ will intersect along a "line" within our sight. By central projection, it is reasonable to imagine that they do intersect along an infinite or imaginary line $l_\infty$. The adjunction of $l_\infty$ to the plane $a_1x_1 + a_2x_2 + a_3x_3 = b$ constitutes a projective plane. This is briefly touched on in Exs. <B> of Sec. 2.6 and Sec. 3.6, and Ex. <B> of Sec. 2.8.5 and Sec. 3.8.4.
Changes of coordinates from $\vec{x} = (x_1, x_2)$ to $\vec{y} = (y_1, y_2)$ in $\mathbb{R}^2$:
$$y_1 = a_1 + a_{11}x_1 + a_{21}x_2,$$
$$y_2 = a_2 + a_{12}x_1 + a_{22}x_2, \quad \text{or} \quad \vec{y} = \vec{x}_0 + \vec{x}A,$$
where $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ with $a_{11}a_{22} - a_{12}a_{21} \ne 0$, are called affine transformations and, in particular, invertible linear transformations if $\vec{x}_0 = (a_1, a_2) = \vec{0}$. Such a transformation can be characterized as a one-to-one mapping from $\mathbb{R}^2$ onto $\mathbb{R}^2$ which preserves ratios of line segments along parallel lines (Secs. 1.3, 2.7, 2.8 and 3.8). If it preserves distances between any two points, then it is called a rigid or Euclidean motion (Secs. 4.8 and 5.8), while $\vec{y} = \sigma(\vec{x}A)$ for any scalar $\sigma \ne 0$ maps lines onto lines on the projective plane and is called a projective transformation (Sec. 3.8.4). The invariants under the group (Sec. A.4) of the respective transformations constitute what F. Klein called affine, Euclidean and projective geometries (Secs. 2.8.4, 3.8.4, 4.9 and 5.9).
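To see the characterizing property concretely (an illustrative NumPy sketch, not from the book; the matrix, translation and points are made up), one can check that such a transformation preserves the ratio in which a point divides a segment:

```python
import numpy as np

# An affine transformation y = x0 + x A in the row-vector convention,
# with det(A) != 0; illustrative values only.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])          # a shearing, det A = 1 != 0
x0 = np.array([3.0, -1.0])          # the translation part

def affine(x):
    return x0 + x @ A

# Three collinear points: q divides the segment pr in the ratio t : (1 - t).
p, r, t = np.array([0.0, 0.0]), np.array([2.0, 4.0]), 0.25
q = (1 - t) * p + t * r

# The images remain collinear, with the same ratio preserved.
assert np.allclose(affine(q), (1 - t) * affine(p) + t * affine(r))
```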
As important applications of exterior products (Sec. 5.1) in R3 , elliptic
geometry (Sec. 5.11) and hyperbolic geometry (Sec. 5.12) are introduced in
the same manner as above. These two are independent of the others in
the book.
Almost every text on linear algebra treats $\mathbb{R}^1$ as trivial and obvious. Yes, it really is, and hence some pieces of implicit information about $\mathbb{R}^1$ are usually ignored. Chapter 1 indicates that scalar multiplication of a vector alone is enough to describe a straight line, and shows how the concept of linear dependence comes out of geometric intuition. Also, through the vectorization and coordinatization of a straight line, one can realize why the abstract set $\mathbb{R}^1$ can be considered as the standard representation of all straight lines. Changes of coordinates enable us to interpret the linear equation $y = ax + b$, $a \ne 0$, geometrically as an affine transformation preserving ratios of segment lengths. Above all, this chapter lays the foundation of the inductive approach of the later chapters.
Ways of thinking, and the methods adopted to realize them, in Chap. 2 constitute a cornerstone for the development of the theory and a model to follow in Chap. 3 and even beyond. The fact that a point outside a given line is needed to construct a plane is algebraically equivalent to saying that, in addition to scalar multiplication, the addition of vectors is needed in order, via the concept of linear independence and the method of linear combination, to go from a lower-dimensional space (like a straight line) to a higher one (like a plane). Sections 2.2 up to 2.4 are counterparts of Secs. 1.1 up to 1.3, and they set up the abstract set $\mathbb{R}^2$ as the standard two-dimensional real vector space and treat changes of coordinates in $\mathbb{R}^2$. The existence of straight lines (Sec. 2.5) in $\mathbb{R}^2$ implicitly suggests that it is possible to discuss vector and affine subspaces in it. Section 2.6 formalizes affine coordinates and introduces the useful barycentric coordinates as well. The important concepts of linear (affine) transformations and their matrix representations related to bases are the main themes of Secs. 2.7 and 2.8. The geometric behaviors of elementary matrices considered as linear transformations are investigated in Sec. 2.7.2, along with the factorization of a matrix in Sec. 2.7.5 as a product of elementary ones, while Secs. 2.7.6–2.7.8 are concerned respectively with the diagonal, Jordan and rational canonical forms of linear operators. Based on Sec. 2.7, Sec. 2.8.3 collects invariants under affine transformations and Sec. 2.8.4 introduces affine geometry in the plane. The last section, Sec. 2.8.5, investigates affine invariants of quadratic curves.
Chapter 3, investigating $\mathbb{R}^3$, contains nothing new in nature or in content beyond Chap. 2, but it is more difficult in algebraic computation and in the manipulation of geometric intuition. What should be mentioned is that, basically, only middle-school algebra is enough to handle the whole of Chap. 2, but I try to transform this classical form of algebra into the rudimentary forms adopted in linear algebra, which become sophisticated and formally formulated in Chap. 3.
Chapters 4 and 5 use the inner product $\langle\,,\rangle$ to connect the concepts of length and angle. The whole theory concerned is based on the Pythagorean theorem, and orthogonality dominates everywhere. In addition to lines and planes, circles (Sec. 4.2), spheres (Sec. 5.2) and the exterior product of vectors in $\mathbb{R}^3$ (Sec. 5.1) are discussed. One of the features here is that we use geometric intuition to define determinants of orders 2 and 3 and to develop their algebraic operational properties (Secs. 4.3 and 5.3). An important by-product of non-natural inner products (Secs. 4.4 and 5.4) is the orthogonal matrix. Therefore, another feature is that we use geometric methods to prove the SVD for matrices of orders 2 and 3 (Secs. 4.5 and 5.5) and the diagonalization of symmetric matrices of orders 2 and 3 (Secs. 4.7 and 5.7). Euclidean invariants and geometry are in Secs. 4.9 and 5.9. Euclidean invariants of quadratic curves and surfaces are in Secs. 4.10 and 5.10. As companions of Euclidean (also called parabolic) geometry, elliptic and hyperbolic geometries are sketched in Secs. 5.11 and 5.12, respectively.

Notations
Sections denoted by an asterisk (∗ ) are optional and may be omitted.
[1] means the first book listed in the Reference, etc.
A.1 means the first section in Appendix A, etc.

Section 1.1 means the first section in Chap. 1. So Sec. 4.3 means the
third section in Chap. 4, while Sec. 5.9.1 means the first subsection of
Sec. 5.9, etc.
Exercise <A> 1 of Sec. 1.1 means the first problem in Exercises <A>
of Sec. 1.1, etc.
(1.1.1) means the first numbered important or remarkable facts or sum-
marized theorem in Sec. 1.1, etc.
Figure 3.6 means the sixth figure in Chap. 3, etc. Fig. II.1 means
the first figure in Part 2, etc. Figure A.1 means the first figure in Appendix
A; similarly for Fig. B.1, etc.
The end of a proof or of an example is sometimes, but not always, marked by □ for attention.
For details, refer to Index of Notations.

Suggestions to the readers (how to use this book)


The materials covered in this book are rich and wide, especially in Exercises <C> and <D>. It is almost impossible to cover the whole book in a single course on linear algebra when it is used as a textbook for beginners.
As a textbook, the depth and breadth of the materials chosen, the degree of rigor in proofs, and how many topics of application are to be covered depend, in my opinion, mainly on the purposes designed for the course and the students' mathematical sophistication and backgrounds. Certainly, there are various combinations of topics. The instructors always play a central role on many occasions. The following possible choices are suggested:

(1) For honored high school students: Chapters 1, 2 and 4 plus Exer-
cises <A>.
(2) For freshman students: Chapters 1, 2 (up to Sec. 2.7), 3 (up to Sec. 3.7), 4 (up to Secs. 4.7 and 4.10) and/or 5 (up to Secs. 5.7 and 5.10) plus, at least, Exercises <A>, in a one-academic-year, three-hour-per-week course. As far as teaching order is concerned, one can adopt the original arrangement of this book or, after finishing Chap. 1, try to combine Chaps. 2 and 3, and Chaps. 4 and 5, according to the same titles of sections in each chapter.
(3) For sophomore students: Just like (2) but contains some selected prob-
lems from Ex. <B>.
(4) For a geometric course via linear algebra: Chapters 1, 2 (Sec. 2.8),
3 (Sec. 3.8), 4 (Sec. 4.8) and 5 (Secs. 5.8–5.12) in a one-academic-year
three-hour-per-week course.

(5) For junior and senior students who have had some prior exposure to
linear algebra: selective topics from the contents with emphasis on
problem-solving from Exercises <C>, <D> and Appendix B.

Of course, there are other options up to one’s taste.


In my opinion, this book might better be used as a reference book or as a companion to a formal course on linear algebra. In my experience of teaching linear algebra for many years, students often asked questions such as, among many others:

1. Why are linear dependence and independence defined in such a way?
2. Why is a linear transformation defined by $f(\vec{x} + \vec{y}) = f(\vec{x}) + f(\vec{y})$ and $f(\alpha\vec{x}) = \alpha f(\vec{x})$? Is there any sense behind it?
3. Doesn't the definition of eigenvalue seem artificial, and is its main purpose just for symmetric matrices?

Hence, all one needs to do is to cram the algebraic rules of computation and the results concerned, pass the exams and get the credits. That is all.
It is my hope that this book might provide a possible source of geometric explanations of, or introductions to, the abstract concepts and results formulated in linear algebra (see Features of the book). But I am not sure that the geometric interpretations appearing in this book are the most suitable ones among all. Readers may try and provide better ones.
From Exercises <D>, readers can find possible connections and appli-
cations of linear algebra to other fields of pure mathematics or physics,
which are mentioned briefly near the end of the Sketch of the content
from Chap. 3 on.
Probably, Answers and Hints to the problems in Exercises <A>, <B> and <C>, especially the latter two, should be attached near the end of the book. Anyway, I will prepare them, but this takes time.
This book can be used in multiple ways.

Acknowledgements
I had the honor of receiving so much help as I prepared the manuscripts of
this book.
Students listed below from my classes on linear algebra, advanced cal-
culus and differential geometry typed my manuscripts:

1. Sophomore: Shu-li Hsieh, Ju-yu Lai, Kai-min Wang, Shih-hao Huang,


Yu-ting Liao, Hung-ju Ko, Chih-chang Nien, Li-fang Pai;

2. Junior: S. D. Tom, Christina Chai, Sarah Cheng, I-ming Wu, Chih-


chiang Huang, Chia-ling Chang, Shiu-ying Lin, Tzu-ping Chuang, Shih-
hsun Chung, Wan-ju Liao, Siao-jyuan Wang;
3. Senior: Zheng-yue Chen, Kun-hong Xie, Shan-ying Chu, Hsiao-huei
Wang, Bo-wen Hsu, Hsiao-huei Tseng, Ya-fan Yen, Bo-hua Chen,
Wei-tzu Lu;

while

4. Bo-how Chen, Kai-min Wang, Sheng-fan Yang, Shih-hao Huang,


Feng-sheng Tsai

graphed the figures, using GSP, WORD and FLASH; and

5. S. D. Tom, Siao-jyuan Wang, Chih-chiang Huang, Wan-ju Liao, Shih-


hsun Chung, Chia-ling Chang

edited the initial typescript. They did this painstaking work voluntarily, patiently, dedicatedly, efficiently and unselfishly, without any payment. Without their kind help, it would have been impossible for this book to come into existence so soon. I am especially grateful, with my best regards and wishes, to all of them.
And above all, special thanks should be given to Ms Shu-li Hsieh and
Mr Chih-chiang Huang for their enthusiasm, carefulness, patience and con-
stant assistance with trifles unexpected.
Teaching assistant Ching-yu Yang in the Mathematics Department provided technical assistance with computer work occasionally.
Prof. Shao-shiung Lin of National Taiwan University, Taipei, reviewed the initial typescript and offered many valuable comments and suggestions for improving the text. Thank you both so much.
Also, thanks to Dr. K. K. Phua, Chairman and Editor-in-Chief, World Scientific, for his kind invitation to include this book among their publications; to Ms Zhang Ji for her patience and carefulness in editing the book; and to those who helped correct the English.
Of course, it is I who should take responsibility for any errata that remain. The author welcomes any positive and constructive comments and suggestions.

I-hsiung Lin
NTNU, Taipei, Taiwan
June 21, 2004
CONTENTS

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Volume One
Part 1: The Affine and Linear Structures of R1 , R2 and R3

Chapter 1 The One-Dimensional Real Vector Space R (or R1 ) . . . . . . . 5


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Sketch of the Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1 Vectorization of a Straight Line: Affine Structure . . . . . . . . . . . . 7
1.2 Coordinatization of a Straight Line: R1 (or R) . . . . . . . . . . . . . . 10
1.3 Changes of Coordinates: Affine and Linear Transformations
(or Mappings) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Affine Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Chapter 2 The Two-Dimensional Real Vector Space R2 . . . . . . . . . . . 21


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Sketch of the Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 (Plane) Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Vectorization of a Plane: Affine Structure . . . . . . . . . . . . . . . . . 30
2.3 Coordinatization of a Plane: R2 . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Changes of Coordinates: Affine and Linear Transformations
(or Mappings) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5 Straight Lines in a Plane . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.6 Affine and Barycentric Coordinates . . . . . . . . . . . . . . . . . . . . 70
2.7 Linear Transformations (Operators) . . . . . . . . . . . . . . . . . . . . 81
2.7.1 Linear operators in the Cartesian coordinate system . . . . . . . 86
2.7.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.7.3 Matrix representations of a linear operator in various bases . . . 114
2.7.4 Linear transformations (operators) . . . . . . . . . . . . . . . . . 135
2.7.5 Elementary matrices and matrix factorizations . . . . . . . . . . 148
2.7.6 Diagonal canonical form . . . . . . . . . . . . . . . . . . . . . . . 186
2.7.7 Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . . . 218
2.7.8 Rational canonical form . . . . . . . . . . . . . . . . . . . . . . . 230


2.8 Affine Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 235


2.8.1 Matrix representations . . . . . . . . . . . . . . . . . . . . . . . . 239
2.8.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
2.8.3 Affine invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
2.8.4 Affine geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
2.8.5 Quadratic curves . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

Chapter 3 The Three-Dimensional Real Vector Space R3 . . . . . . . . . . 319


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
Sketch of the Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
3.1 Vectorization of a Space: Affine Structure . . . . . . . . . . . . . . . . 322
3.2 Coordinatization of a Space: R3 . . . . . . . . . . . . . . . . . . . . . . 326
3.3 Changes of Coordinates: Affine Transformation (or Mapping) . . . . . 335
3.4 Lines in Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
3.5 Planes in Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
3.6 Affine and Barycentric Coordinates . . . . . . . . . . . . . . . . . . . . 361
3.7 Linear Transformations (Operators) . . . . . . . . . . . . . . . . . . . . 365
3.7.1 Linear operators in the Cartesian coordinate system . . . . . . . 365
3.7.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
3.7.3 Matrix representations of a linear operator in various bases . . . 406
3.7.4 Linear transformations (operators) . . . . . . . . . . . . . . . . . 435
3.7.5 Elementary matrices and matrix factorizations . . . . . . . . . . 442
3.7.6 Diagonal canonical form . . . . . . . . . . . . . . . . . . . . . . . 476
3.7.7 Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . . . 511
3.7.8 Rational canonical form . . . . . . . . . . . . . . . . . . . . . . . 558
3.8 Affine Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
3.8.1 Matrix representations . . . . . . . . . . . . . . . . . . . . . . . . 579
3.8.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
3.8.3 Affine invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
3.8.4 Affine geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
3.8.5 Quadrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668

Appendix A Some Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . 681


A.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
A.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
A.3 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
A.4 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
A.5 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687

Appendix B Fundamentals of Algebraic Linear Algebra . . . . . . . . . . . 691


B.1 Vector (or Linear) Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 691
B.2 Main Techniques: Linear Combination, Dependence and Independence 695
B.3 Basis and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
B.4 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699

B.5 Elementary Matrix Operations and Row-Reduced Echelon Matrices . 719


B.6 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
B.7 Linear Transformations and Their Matrix Representations. . . . . . 732
B.8 A Matrix and its Transpose . . . . . . . . . . . . . . . . . . . . . . . 756
B.9 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
B.10 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . 790
B.11 Diagonalizability of a Square Matrix or a Linear Operator . . . . . . 793
B.12 Canonical Forms for Matrices: Jordan Form and Rational Form . . . 799
B.12.1 Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . 799
B.12.2 Rational canonical form . . . . . . . . . . . . . . . . . . . . . 809

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819

Index of Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
PART 1

The Affine and Linear Structures of R1 , R2 and R3

Introduction
Starting from intuitive geometric objects, we treat

1. a point as a zero vector,


2. a directed segment (along a line) as a vector, and
3. two directed segments along the same or parallel lines as the same vector
if both have the same length and direction.

And hence, we define two vector operations, the scalar multiplication $\alpha\vec{x}$ and the addition $\vec{x} + \vec{y}$, and develop their operational properties. In the process, we single out linear combination, dependence and independence among vectors as the main tools and establish the affine structures on a line, a plane and a space, respectively. Then, we extract the essence of the concepts obtained and formulate, via rough ideas of linear isomorphism, the abstract sets $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$ as the standard one-dimensional, two-dimensional and three-dimensional vector spaces over the real field, respectively. So far, changes of coordinates within the same space are the most prominent results among all, and they indicate implicitly the concepts of affine and linear transformations.
Then, we focus our attention on those mappings between spaces that preserve the ratios of signed lengths of segments along the same or parallel lines. They are affine transformations (see Secs. 1.4, 2.7 and 2.8) and, in particular, linear transformations if they map the zero vector to the zero vector, when the spaces concerned are considered as vector spaces.
The main themes will be topics on linear transformations or, equivalently, real matrices of order $m \times n$, where $m, n = 1, 2, 3$, such as:

1. Eigenvalues and eigenvectors (Secs. 2.7.1, 2.7.2, 3.7.1 and 3.7.2).


2. Various decompositions, for example, elementary matrix factorizations,
LU, LDU, LDL∗ and LPU, etc. (Secs. 2.7.5 and 3.7.5).
3. Rank, Sylvester’s law of inertia (Secs. 2.7.1, 2.7.5, 3.7.1 and 3.7.5).


4. Diagonalizability (Secs. 2.7.6 and 3.7.6).


5. Jordan and rational canonical forms (Secs. 2.7.7, 2.7.8, 3.7.7 and 3.7.8).

A suitable choice of basis for the kernel and/or the image of a linear transformation will play a central role in handling these problems.
Based on results about linear transformations, affine transformations
are composed of linear transformations followed by translations. We discuss
topics such as:

1. Stretch, reflection, shearing, rotation and orthogonal reflection and their


matrix representations and geometric mapping properties (Secs. 2.8.2
and 3.8.2).
2. Affine invariants (Secs. 2.8.3 and 3.8.3).
3. Affine geometries, including sketches of projective plane P 2 (R) and pro-
jective spaces P 3 (R) (Ex. <B> of Sec. 2.6, Sec. 2.8.4, Ex. <B> of
Sec. 2.8.5 and Sec. 3.8.4).
4. Quadratic curves (Sec. 2.8.5) and quadrics (Sec. 3.8.5).

Chapter 1 is trivial in content but is necessary to the inductive process.
Chapter 2 is crucial both in content and in method. The methods in Chap. 2 are essentially extensions of the geometric and algebraic ones learned in middle-school mathematics courses, in particular, the ways of solving simultaneous linear equations and their geometric interpretations. Hence, the methods adopted in Chap. 2 play a transitional role between the middle-school ones and the more sophisticated, well-established ones used in present-day linear algebra, as will be seen and formulated in Chap. 3. In short, the methods and contents of Chap. 2 can be considered as a buffer zone between classical middle-school algebra and modern linear algebra.
Based on our inductive construction and description of the linear algebra on $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$, we hope that readers will possess a solid enough foundation, both in geometric intuition and in algebraic manipulation. Thus, they can foresee, realize and construct what the $n$-dimensional vector space $\mathbb{R}^n$ ($n \ge 4$) and the linear algebra on it are, even on the more abstract vector spaces over fields. For this purpose, we have arranged intensive sets of problems in Exercises <B> and <C> for interested readers to practice, and Appendix B for reference.
The use of matrices (of order $m \times n$, $m, n = 1, 2, 3$) and determinants (of order $m$, $m = 2, 3$) comes to the surface naturally as we proceed, without our introducing them beforehand in a specially selected section. Here in Part 1, we emphasize the computational aspects of matrices and determinants, so readers who need more should consult Sec. B.4 for matrices and Sec. B.6 for determinants. Sections 4.3 and 5.3 will formally introduce the theory of determinants of orders 2 and 3, respectively, via geometric considerations.
CHAPTER 1

The One-Dimensional Real Vector Space R (or R1 )

Introduction
Our theory starts from the following simple geometric
Postulate A single point determines a unique zero-dimensional (vector)
space.
Usually, a little black point or spot is used as an intuitively geometric
model of zero-dimensional space. Notice that “point” is an undefined term
without length, width and height.
In the physical world, it is reasonable to imagine that there exist two different points. Hence, one has the
Postulate Any two different points determine one and only one straight
line.
A straightened loop, extended beyond any finite limit in both directions, is a lively geometric model of a straight line. Mathematically, pick two different points O and A on a flat piece of paper, imagined as extending beyond any limit in every direction, and then connect O and A by a ruler. Now, we have a geometric model of an unlimited straight line L (see Fig. 1.1).

[Fig. 1.1: the straight line L through the points O and A.]

As far as the basic concepts of straight lines are concerned, one should
know the following facts (1)–(6).


(1) There are uncountably infinite points on L.


(2) The straight line determined by any two different points on L coin-
cides with L.
(3) Any two points P and Q on L determine a segment, denoted by PQ:
1. If P = Q (i.e. P and Q coincide and represent the same point), then the segment PQ degenerates into the single point P (or Q);
2. If P ≠ Q (i.e. P and Q are different points), then PQ consists of those points on L lying between P and Q (inclusive).
(4) If one starts from the point P, walking along L toward the point Q, then one gets the directed segment $\overrightarrow{PQ}$; if from Q to P, reversing the direction, one has the directed segment $\overrightarrow{QP}$ (see Fig. 1.2).

[Fig. 1.2: the points O, A, P and Q on the line L.]

(5) Arbitrarily fix two different points O and A on line L. Consider the


segment OA as one unit in length. Then, one should be able to measure
the distance between any two points P and Q on line L or the length of the
segment P Q. As usual, distance and length are always non-negative real
numbers.
In order to extend the mathematical knowledge we have up to now, here
we introduce the signed length of a segment PQ as follows:
1. Let P = Q; then PQ has length zero.
2. Let P ≠ Q; then
$\overrightarrow{PQ}$ has length > 0 ⇔ $\overrightarrow{PQ}$ has the same direction as $\overrightarrow{OA}$;
$\overrightarrow{PQ}$ has length < 0 ⇔ $\overrightarrow{PQ}$ has the direction opposite to $\overrightarrow{OA}$.
Therefore, the direction of $\overrightarrow{OA}$ is designated as the positive direction of the line L, with O as its origin, while that of $\overrightarrow{AO}$ is the negative direction.

Remark For convenience, we endow PQ with two meanings: one represents the segment with endpoints P and Q, the other the length of that segment. Similarly, $\overrightarrow{PQ}$ has two meanings too: the directed segment from P to Q, and the signed length of that segment.

Therefore, finally we have

(6) For any three points P, Q and R on line L, their signed lengths $\overrightarrow{PQ}$, $\overrightarrow{QR}$ and $\overrightarrow{PR}$ always satisfy the identity
$$\overrightarrow{PQ} + \overrightarrow{QR} = \overrightarrow{PR} \quad \text{(see Fig. 1.3)}.$$

[Fig. 1.3: the identity in three relative positions of P, Q and R on the line, e.g. P < Q < R, P < R < Q, and P = R.]

Sketch of the Content


Based on these facts, this chapter contains four sections, trying to vectorize
(Sec. 1.1) and coordinatize (Sec. 1.2) the straight line, and studying the
linear changes between different coordinate systems (Sec. 1.3). Invariants
under affine transformation are discussed in Sec. 1.4.
The main result is that, under coordinatization, a straight line can
be considered as a concrete geometric model of the real number system
(field) R. Hence, R is an abstract representation of the one-dimensional
vector space over the real field R.
Our introduction to two-dimensional (Chap. 2) and three-dimensional
(Chap. 3) vector spaces will be modeled after the way we have treated here
in Chap. 1.

1.1 Vectorization of a Straight Line: Affine Structure


Fix a straight line L.
We regard a directed segment $\overrightarrow{PQ}$ on the line L as a (line) vector. If P = Q, $\overrightarrow{PQ}$ is called a zero vector, denoted by $\vec{0}$. On the contrary, $\overrightarrow{PQ}$ is a nonzero vector if P ≠ Q.
Two vectors $\overrightarrow{PQ}$ and $\overrightarrow{P'Q'}$ are identical, i.e. $\overrightarrow{PQ} = \overrightarrow{P'Q'}$,
⇔ 1. PQ = P′Q′ (equal in length), and
2. "the direction from P to Q (along L)" is the same as "the direction from P′ to Q′". (1.1.1)
We call properties 1 and 2 the parallel invariance of vectors (see Fig. 1.4).

[Fig. 1.4: equal directed segments PQ and P′Q′ along L.]

In particular, for any points P and Q on L, one has
$$\overrightarrow{PP} = \overrightarrow{QQ} = \vec{0}. \tag{1.1.2}$$
Hence, the zero vector is uniquely defined.
Now, fix any two different points O and X on L. For simplicity, denote the vector $\overrightarrow{OX}$ by $\vec{x}$, i.e.
$$\vec{x} = \overrightarrow{OX}.$$
Note that $\vec{x} \ne \vec{0}$.
For any fixed point P on L, the ratio of the signed length $\overrightarrow{OP}$ with respect to $\overrightarrow{OX}$ is
$$\frac{\overrightarrow{OP}}{\overrightarrow{OX}} = \alpha,$$
where the real number α has the following properties:
α = 0 ⇔ P = O;
0 < α < 1 ⇔ P lies on the segment OX (P ≠ O, X);
α > 1 ⇔ P and X lie on the same side of O and OP > OX;
α = 1 ⇔ P = X; and
α < 0 ⇔ P and X lie on different sides of O.
In all cases, designate the vector
$$\overrightarrow{OP} = \alpha\overrightarrow{OX} = \alpha\vec{x}.$$
On the other hand, to any given α ∈ R there corresponds one and only one point P on the line L such that $\overrightarrow{OP} = \alpha\vec{x}$ holds (see Fig. 1.5).

[Fig. 1.5: the point P with $\overrightarrow{OP} = \alpha\vec{x}$ on the line through O and X.]
Fig. 1.5

Summarize as
The vectorization of a straight line
Fix any two different points O and X on a straight line L and denote the vector $\overrightarrow{OX}$ by $\vec{x}$. Then the scalar product $\alpha\vec{x}$ of an arbitrary real number α ∈ R with the fixed vector $\vec{x}$ is suitable for describing any point P on L (i.e. the position vector $\overrightarrow{OP}$). Call the set
$$L(O; X) = \{\alpha\vec{x} \mid \alpha \in \mathbb{R}\}$$
the vectorized space of the line L, with O as the origin, $\overrightarrow{OO} = \vec{0}$ as the zero vector and $\vec{x}$ as the base vector. Elements in L(O; X) are called (line) vectors, which have the following algebraic operation properties: for α, β ∈ R,
1. $(\alpha + \beta)\vec{x} = \alpha\vec{x} + \beta\vec{x} = \beta\vec{x} + \alpha\vec{x}$;
2. $(\alpha\beta)\vec{x} = \alpha(\beta\vec{x}) = \beta(\alpha\vec{x})$;
3. $1\vec{x} = \vec{x}$;
4. Let $\vec{0} = \overrightarrow{OO}$; then $\alpha\vec{x} + \vec{0} = \vec{0} + \alpha\vec{x} = \alpha\vec{x}$;
5. $(-\alpha)\vec{x} = -\alpha\vec{x}$; $\vec{x} + (-\vec{x}) = \vec{0} = \vec{x} - \vec{x}$;
6. $0\vec{x} = \alpha\vec{0} = \vec{0}$.
In short, via the position vector $\overrightarrow{OP} = \alpha\vec{x}$ for any α, points P on L have the above algebraic operation properties. (1.1.3)

Using the concept of (1.1.3), one can establish the algebraic characterization of three points lying on the same line.
Suppose that the points O, X and Y are collinear. Let $\vec{x} = \overrightarrow{OX}$ and $\vec{y} = \overrightarrow{OY}$.
In case O = X = Y: then $\vec{x} = \vec{y} = \vec{0}$, and hence $\vec{x} = \alpha\vec{y}$ or $\vec{y} = \alpha\vec{x}$ holds for any α ∈ R.
In case two of O, X and Y coincide, say X = O ≠ Y: then $\vec{x} = \vec{0}$ and $\vec{x} = 0\,\vec{y}$ holds.
If O, X and Y are different from each other, then owing to the fact that Y lies on the line determined by O and X, $\vec{y}$ belongs to L(O; X). Hence, there exists α ∈ R such that $\vec{y} = \alpha\vec{x}$.
We summarize these results as
Linear dependence of line vectors
Let $\vec{x} = \overrightarrow{OX}$ and $\vec{y} = \overrightarrow{OY}$. Then
(1) (geometric) The points O, X and Y are collinear.
⇔ (2) (algebraic) There exists α ∈ R such that $\vec{y} = \alpha\vec{x}$ or $\vec{x} = \alpha\vec{y}$.
⇔ (3) (algebraic) There exist scalars α, β ∈ R, not all equal to zero, such that $\alpha\vec{x} + \beta\vec{y} = \vec{0}$.
In any one of these three cases, the vectors $\vec{x}$ and $\vec{y}$ are said to be linearly dependent (on each other). (1.1.4)
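For a concrete instance (added here for illustration): if Y is the midpoint of the segment OX, then $\vec{y} = \frac{1}{2}\vec{x}$, so $\frac{1}{2}\vec{x} + (-1)\vec{y} = \vec{0}$ exhibits form (3) with $\alpha = \frac{1}{2}$, $\beta = -1$, and $\vec{x}$, $\vec{y}$ are linearly dependent.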

As a contrast to linear dependence, one has
$$\alpha\vec{x} = \vec{0} \;\Leftrightarrow\; 1^{\circ}\ \alpha = 0\ (\vec{x} \text{ may not be } \vec{0});\ \text{ or }\ 2^{\circ}\ \vec{x} = \vec{0}\ (\alpha \text{ may not be } 0).$$
Therefore, we have
Linear independence of a nonzero vector
Let $\overrightarrow{OX} = \vec{x}$. Then
(1) (geometric) The points O and X are different.
⇔ (2) (algebraic) If there exists α ∈ R such that $\alpha\vec{x} = \vec{0}$, then necessarily α = 0.
In either of these situations, the vector $\vec{x}$ is said to be linearly independent (of any other vector, whatsoever!). (1.1.5)

That is to say, a single nonzero vector must be linearly independent.

Exercises
<A>

1. Interpret geometrically 1 to 6 in (1.1.3).


2. Finish the incomplete proofs of (1.1.4), for example, (3) ⇔ (2) ⇒ (1).

1.2 Coordinatization of a Straight Line: R1 (or R)


Suppose the straight line L is provided with a fixed vectorized space L(O; X), where $\vec{x} = \overrightarrow{OX}$ is the base vector.
We call the set
$$B = \{\vec{x}\}$$
a basis of L(O; X).
For any given point P ∈ L, the fact that $\overrightarrow{OP} \in L(O; X)$ induces a unique α ∈ R such that $\overrightarrow{OP} = \alpha\vec{x}$ holds. Then this unique scalar, denoted by
$$\alpha = [\overrightarrow{OP}]_B = [P]_B,$$
is defined to be the coordinate of the point P, or of the vector $\overrightarrow{OP}$, with respect to the basis B. In particular,
$$[O]_B = 0, \qquad [X]_B = 1.$$

For example:
$$[P]_B = -2 \;\Leftrightarrow\; \overrightarrow{OP} = -2\vec{x}; \qquad [Q]_B = \tfrac{3}{2} \;\Leftrightarrow\; \overrightarrow{OQ} = \tfrac{3}{2}\vec{x}.$$
See Fig. 1.6.

[Fig. 1.6: the points P and Q in L(O; X), with position vectors $-2\vec{x}$ and $\frac{3}{2}\vec{x}$.]
Fig. 1.6

Now we summarize as
The coordinatization of a straight line
Let L(O; X) be an arbitrary vectorized space of the line L, with $B = \{\vec{x}\}$, $\vec{x} = \overrightarrow{OX}$, as a basis. The set
$$R_{L(O;X)} = \{[P]_B \mid P \in L\}$$
is called the coordinatized space of L with respect to B. Explain further as follows.
(1) There is a one-to-one correspondence taking any point P on the line L onto the corresponding number $[P]_B$ of the real number system R.
(2) Define a mapping Φ: L(O; X) → R by
$$\Phi(\alpha\vec{x}) = \alpha \quad (\text{or } \Phi(\overrightarrow{OP}) = [P]_B,\; P \in L).$$
Then Φ is one-to-one, onto and preserves the algebraic operations, i.e. for any α, β ∈ R,
1. $\Phi(\beta(\alpha\vec{x})) = \beta\alpha$,
2. $\Phi(\alpha\vec{x} + \beta\vec{x}) = \alpha + \beta$. (1.2.1)
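For instance (an added illustration, using the points P and Q of Fig. 1.6): $\Phi(\overrightarrow{OP} + \overrightarrow{OQ}) = \Phi\left(-2\vec{x} + \frac{3}{2}\vec{x}\right) = -2 + \frac{3}{2} = -\frac{1}{2} = [P]_B + [Q]_B$.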

According to Sec. B.7 of Appendix B, mappings like Φ here are called linear isomorphisms; therefore, conceptually, L(O; X), $R_{L(O;X)}$ and R are considered identical (see Fig. 1.7).
We already know that the following are equivalent.
(1) Only two different points, with no need of a third one, are enough to determine a unique line.
[Fig. 1.7: the linear isomorphism Φ from L(O; X) onto R, sending $\alpha\vec{x}$ to α, with $[O] \mapsto 0$ and $[X] \mapsto 1$.]

(2) Only one nonzero vector is enough to generate the whole space L(O; X)
(refer to (1.1.4) and (1.1.5)).
Hence, we say that L(O; X) is a one-dimensional vector space with zero vector $\vec{0}$. Accurately speaking, L(O; X) is a one-dimensional affine space (see Sec. 2.8 or Fig. B.2) of the line L.
Owing to the arbitrariness of O and X, the line L can be endowed with infinitely many vectorized spaces L(O; X). But according to (1.2.1), no matter how O and X are chosen, we always have that
$$L(O; X) \xrightarrow{\ \Phi\ } R_{L(O;X)} = \mathbb{R}. \tag{1.2.2}$$
So, we assign R another role, representing the standard one-dimensional real vector space, and denote it by
$$\mathbb{R}^1. \tag{1.2.3}$$
A number α in R is identified with the position vector in $\mathbb{R}^1$ starting from 0 and pointing toward α, and is still denoted by α (see Fig. 1.8).
For the sake of reference and comparison, we summarize as
The real number system R and the standard one-dimensional
vector space R1
(1) R (simply called the real field, refer to Sec. A.3)
(a) Addition For any x, y ∈ R,
x+y ∈R
satisfies the following properties.
1. (commutative) x + y = y + x.
2. (associative) (x + y) + z = x + (y + z).
3. (zero element) 0: 0 + x = x.
4. (inverse element) For each x ∈ R, there exists a unique element
in R, denoted as −x, such that
x + (−x) = x − x = 0.

(b) Multiplication For any x, y ∈ R,


xy ∈ R
satisfies the following properties.
1. (commutative) xy = yx.
2. (associative) (xy)z = x(yz).
3. (unit element) 1: 1x = x.
4. (inverse element) For each nonzero x ∈ R, there exists a unique element in R, denoted by $x^{-1}$ or $\frac{1}{x}$, such that
$$xx^{-1} = 1.$$
(c) The addition and multiplication satisfy the distributive law
x(y + z) = xy + xz.
(2) R1 (see (1.2.3) and refer to Sec. B.1)
(a) Addition To every pair of elements x and y in R1 , there is a unique
element
x + y ∈ R1
satisfying the following properties.
1. x + y = y + x.
2. (x + y) + z = x + (y + z).
3. (zero vector 0) x + 0 = x.
4. (inverse vector) For each x ∈ R1 , there exists a unique element
in R1 , denoted as −x, such that
x + (−x) = x − x = 0.
(b) Scalar multiplication To each x ∈ R1 and every real number α ∈ R,
there exists a unique element
αx ∈ R1
satisfying the following properties.
1. 1x = x.
2. α(βx) = (αβ)x, α, β ∈ R.
(c) The addition and scalar multiplication satisfy the distributive laws
$$(\alpha + \beta)x = \alpha x + \beta x; \qquad \alpha(x + y) = \alpha x + \alpha y. \tag{1.2.4}$$
[Fig. 1.8: R and $\mathbb{R}^1$ pictured on the same line, with the points x + y, 0, x and αx marked.]
See Fig. 1.8.

Remark On many occasions, no distinction between R and $\mathbb{R}^1$ will be made in notation, and we simply denote $\mathbb{R}^1$ by R.
In this sense, an element of R has a double meaning. One is to represent a number, treated as a point on the real line. The other is to represent a position vector, pointing from 0 toward itself on the real line. One should remember that an element of R, either as a number or as a vector, enjoys somewhat different algebraic operation properties, as indicated in (1.2.4).
From now on, if necessary or convenient, we do not hesitate to use R to replace $\mathbb{R}^1$, both in notation and in meaning. Any straight line L, endowed with a vectorized space L(O; X), is nothing but a concrete geometric model of R (i.e. $\mathbb{R}^1$).

1.3 Changes of Coordinates: Affine and Linear


Transformations (or Mappings)
Let L be a straight line with two vectorized spaces L(O; X) and L(O′; Y) on it.
The same point P on L has different coordinates $[P]_B$ and $[P]_{B'}$, respectively, with respect to the different bases
$$B = \{\vec{x}\},\ \vec{x} = \overrightarrow{OX}, \quad \text{and} \quad B' = \{\vec{y}\},\ \vec{y} = \overrightarrow{O'Y}$$
(see Fig. 1.9).

[Fig. 1.9: the line L carrying the points O, X, O′, Y and P, with base vectors $\vec{x}$ and $\vec{y}$.]

Our purpose here is to find the relation between $[P]_B$ and $[P]_{B'}$.

Suppose that, temporarily,

−
[P ]B = µ (⇔ OP = µ x ), [O ]B = α0 ;
−−
[P ]B = ν (⇔ O P = ν 
y ), [O]B = β0 .

Since x and y are collinear, there exist constants α and β such that


y = α x and x = β y . Hence 
x = αβ  x implies (αβ − 1)
x = 0 . The
x shows that αβ − 1 = 0 should hold, i.e.
linear independence of 

αβ = 1.

− −− −−
Now, owing to the fact that OP = OO + O P ,

µ x + ν
x = α0  x + να
y = α0  x = (α0 + να)
x
⇒ µ = α0 + να, or

[P ]B = [O ]B + α[P ]B . (1.3.1)
Similarly, by using O′P = O′O + OP, one has

ν = β0 + βµ, or
[P]B′ = [O]B′ + β[P]B. (1.3.2)
Remark (1.3.1) and (1.3.2) are reversible.

Suppose that (1.3.1) is true. Then one has (why α ≠ 0?)

ν = −α0/α + (1/α)µ.

But OO′ = −O′O ⇒ α0 x = −β0 y = −β0 αx ⇒ α0 = −β0 α, or β0 = −α0/α.
Since αβ = 1 implies β = 1/α, therefore

ν = β0 + βµ.

This is (1.3.2).
Similarly, (1.3.1) is deduced from (1.3.2) with the same process.
Summarize the above results as

Coordinate changes of two vectorized spaces on the same line
Let
L(O; X) with basis B = {x}, x = OX, and
L(O′; Y) with basis B′ = {y}, y = O′Y
be two vectorized spaces of the line L. Suppose that
y = αx, x = βy (therefore, αβ = 1).
Then the coordinates [P]B and [P]B′ with respect to the bases B and B′, of the
same point P on L, satisfy the following reversible linear relations:
[P]B = [O′]B + α[P]B′, and
[P]B′ = [O]B′ + β[P]B.
In particular, if O = O′, then [O′]B = [O]B′ = 0. (1.3.3)
In specific terminology, equations such as (1.3.1) and (1.3.2) are called affine
transformations or mappings between the affine spaces L(O; X) and L(O′; Y);
in case O = O′, they are called (invertible) linear transformations (see Sec. B.7).
Finally, here is an example.

Example Determine the relation between the Centigrade (°C) and the Fahrenheit (°F) scales on the thermometer.
Solution (see Fig. 1.10) Suppose O and X are 0°C and 1°C respectively.
Also, O′ and Y are 0°F and 1°F. Then, we have

O′O = 32 O′Y and OX = (9/5) O′Y.

Let C = {OX} and F = {O′Y}. Now, for any given point P on the
thermometer, we have
[P]C = the Centigrade reading of P, and
[P]F = the Fahrenheit reading of P.
By using the fact that O′P = O′O + OP, one has the relation
[P]F = [O]F + (9/5)[P]C = 32 + (9/5)[P]C, or
[P]C = (5/9){[P]F − 32}
between [P]F and [P]C.
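The pair of formulas just derived is a concrete instance of the reversible affine maps (1.3.1) and (1.3.2), with α = 9/5, β = 5/9 and αβ = 1. As a quick numerical check, here is a minimal Python sketch (the function names are ours, chosen for illustration) that implements both directions and verifies that they invert each other:

    def c_to_f(c):
        # [P]F = 32 + (9/5)[P]C, an affine map with alpha = 9/5
        return 32 + 9 / 5 * c

    def f_to_c(f):
        # [P]C = (5/9)([P]F - 32), the inverse map with beta = 5/9
        return 5 / 9 * (f - 32)

    for c in (-40, 0, 25, 100):
        f = c_to_f(c)
        assert abs(f_to_c(f) - c) < 1e-12   # reversibility, as in (1.3.3)
        print(c, "deg C =", f, "deg F")     # e.g. 25 deg C = 77.0 deg F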
Fig. 1.10 (a thermometer as a line, vectorized once by O, X for °C and once by O′, Y for °F)
Exercises
<A> (Adopt the notations in (1.3.3).)

1. Suppose that L(O; X) and L(O′; Y) are two vectorized spaces on the
same line L. Let
[O]B′ = −5 and y = (1/3)x.
(a) Locate the points O, X, O′ and Y on the line L.
(b) If a point P ∈ L and [P]B = 0.2, find [P]B′. If [P]B′ = 15, what
is [P]B?
2. Construct two vectorized spaces L(O; X) and L(O′; Y) on the same
line L, and explain graphically the following equations as changes of
coordinates with
[P]B = x and [P]B′ = y, P ∈ L.
(a) y = −2x.
(b) y = 3x − 5/3.
(c) x = 6y.
(d) x = −15y + 32.
1.4 Affine Invariants
Construct a vectorized space L(O; X) on the line L and a vectorized
space L′(O′; X′) on the line L′, where L and L′ may not be coincident.
Let B = {OX} and B′ = {O′X′} be the respective bases on L and L′. Also,
let
[P]B = x for P ∈ L, and
[P′]B′ = y for P′ ∈ L′.
A mapping or transformation T from L onto L′ (see Sec. A.2) is called
an affine mapping or transformation if there exist constants a and b ≠ 0
such that
T(x) = y = a + bx (1.4.1)
holds for all P ∈ L and the corresponding P′ ∈ L′. Note that y = T(x)
is one-to-one. In case a = 0, y = T(x) = bx is called a linear transforma-
tion (isomorphism) from the vector space L(O; X) onto the vector space
L′(O′; X′). In this sense, a change of coordinates on the same line as intro-
duced in (1.3.3) is a special kind of affine mapping.
For any two fixed different points P1 and P2 with [P1]B = x1 and [P2]B =
x2, the whole line L has coordinate representation
x = (1 − t)x1 + tx2, t ∈ R (1.4.2)
with respect to the basis B. The directed segment P1P2 or x1x2 with P1 as
initial point and P2 as terminal point is the set of points
x = (1 − t)x1 + tx2, 0 ≤ t ≤ 1. (1.4.3)
In case 0 < t < 1, the corresponding point x is called an interior point of
x1x2, otherwise (i.e. t < 0 or t > 1) an exterior point. See Fig. 1.11.
Fig. 1.11 (the line L(O; X) with the points x1, x2; an interior point of x1x2 lies between them, an exterior point outside)
Applying (1.4.1) and (1.4.3), we see that an affine mapping maps a
(directed) segment x1x2 onto a (directed) segment y1y2, preserving end
points, interior points and exterior points. In fact, a point x = (1 − t)x1 + tx2
is mapped into the point
y = (1 − t)y1 + ty2 (1.4.4)
with y1 = a + bx1 and y2 = a + bx2. To see this, note that
a + b[(1 − t)x1 + tx2] = (1 − t)(a + bx1) + t(a + bx2).
Orient the line L by the basis vector OX in L(O; X) and let x2 − x1 be
the signed length of the segment x1x2 as we did in Sec. 1.1. For convenience,
we also use x1x2 to denote its signed length. Then, by (1.4.2), we see that

(1 − t)(x − x1) = t(x2 − x)
⇒ x1x / xx2 = t / (1 − t), t ≠ 0, 1 (1.4.5)

which is equal to y1y / yy2, by using (1.4.4). This means that an affine mapping
preserves the ratio of signed lengths of two line segments along the line (see
also (1.4.6) and Ex. <A> 2).
Finally, y2 − y1 = a + bx2 − (a + bx1) = b(x2 − x1) means that

y1y2 = b · x1x2. (1.4.6)

Hence, an affine mapping does not preserve the signed length unless b = 1,
and does not preserve the orientation of a directed segment unless b > 0.
We summarize as

Affine invariants
An affine transformation between straight lines preserves
1. (directed) line segments along with end points, interior points and exterior
points, and
2. the ratio of (signed) lengths of two line segments (along the same line),
both of which are called affine invariants. It does not necessarily preserve
3. (signed) length, and
4. orientation. (1.4.7)
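A short numerical sketch makes (1.4.7) concrete. Assuming an affine map T(x) = a + bx as in (1.4.1) (the particular values of a and b below are our own), the code checks that the ratio (1.4.5) is preserved while signed length and orientation, in general, are not:

    a, b = 4.0, -2.0                  # an affine map T(x) = a + bx, b != 0
    T = lambda x: a + b * x

    x1, x2, t = 1.0, 3.0, 0.25        # x divides x1x2 in the ratio t : (1 - t)
    x = (1 - t) * x1 + t * x2
    y1, y2, y = T(x1), T(x2), T(x)

    # affine invariant (1.4.5): the ratio of signed lengths is preserved
    assert abs((x - x1) / (x2 - x) - (y - y1) / (y2 - y)) < 1e-12
    # (1.4.6): signed length is scaled by b, so here b = -2 reverses orientation
    assert abs((y2 - y1) - b * (x2 - x1)) < 1e-12
    print(x2 - x1, y2 - y1)           # 2.0 and -4.0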
Remark Affine coordinate system and affine (or barycentric) coordinate.
Let a straight line L be vectorized as L(O; X), and P1 and P2 be two
arbitrarily fixed different points. As usual, denote the basis B = {OX} and
[P1]B = x1, [P2]B = x2.
(1.4.2) can be rewritten as
x = λ1 x1 + λ2 x2, λ1, λ2 ∈ R and λ1 + λ2 = 1 (1.4.8)
for any point P on L with [P]B = x. Then, we call the ordered pair
(λ1, λ2) (1.4.9)
the affine or barycentric coordinate of the point P or x with respect to the
affine coordinate system {P1, P2} or {OP1, OP2}. In particular, (1/2, 1/2) is the
barycenter or the middle point of the segment P1P2 or x1x2.
Fig. 1.12 (the line L(O; X): P1 and P2, with barycentric coordinates (1, 0) and (0, 1), divide L into the parts (+, −), (+, +) and (−, +))
x is an interior point of P1P2 if and only if it has affine coordinate
(λ1, λ2) with components λ1 > 0 and λ2 > 0. See Fig. 1.12. As a trivial
consequence, the points P1 and P2 divide the whole line L into three
different parts: (+, −), (+, +) and (−, +).
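Solving x = λ1 x1 + λ2 x2 together with λ1 + λ2 = 1 gives λ2 = (x − x1)/(x2 − x1) and λ1 = 1 − λ2, so barycentric coordinates are easy to compute. A minimal sketch (the function name is ours):

    def barycentric(x, x1, x2):
        # solve x = l1*x1 + l2*x2 with l1 + l2 = 1
        l2 = (x - x1) / (x2 - x1)
        return (1 - l2, l2)

    x1, x2 = 2.0, 6.0
    print(barycentric(4.0, x1, x2))   # (0.5, 0.5): the barycenter (midpoint)
    print(barycentric(5.0, x1, x2))   # (0.25, 0.75): interior, pattern (+, +)
    print(barycentric(8.0, x1, x2))   # (-0.5, 1.5): exterior, pattern (-, +)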
Exercises
<A>
1. For each pair P1, P2 of different points on L(O; X) and each pair P1′, P2′
of different points on L′(O′; X′), show that there exists a unique affine
mapping T from L(O; X) onto L′(O′; X′) such that
T(P1) = P1′ and T(P2) = P2′.
2. A one-to-one and onto mapping T: L(O; X) → L′(O′; X′) is affine if and
only if T preserves the ratio of signed lengths of any two segments.
CHAPTER 2

The Two-Dimensional Real Vector Space R2
Introduction
In our physical world, one can realize that there does exist a point, lying
outside of a given straight line.
In Fig. 2.1, the point Q does not lie on the line L. This point Q and the
moving point X on the line L generate infinitely many straight lines QX.
We imagine that the collection of all such lines QX constitutes a plane.
Fig. 2.1 (the point Q outside the line L; as X moves along L, the lines QX sweep out a plane)
Therefore, we formulate the
Postulate Three noncollinear different points determine a unique plane.
The face of a table or a piece of horizontally placed paper, imagined
extended without limit in all directions, can be considered as a geometric
model of a plane. Usually, a parallelogram, including its interior, will act
as a symbolic graph of a plane Σ (see Fig. 2.2).

Fig. 2.2 (a parallelogram as the symbolic picture of a plane Σ)
About the plane, one should know the following basic facts.


(2) The straight line, determined by any pair of different points in a
plane Σ, lies in Σ itself. Therefore, starting from a point and then a line on
which the point does not lie, one can construct a plane.
(3) The plane, generated by any three noncollinear points in a plane Σ,
coincides with Σ.
(4) (Euclidean parallelism axiom) Let the point P and the line L be in
the same plane Σ, with P lying outside of L. Then, there exists one and
only one line on Σ, passing through the point P and parallel to the given
line L (i.e. no point of intersection exists).
(5) A plane possesses the following Euclidean geometric concepts or
quantities.
1. Length (of a segment) or distance (between two points).
2. Angle (between two intersecting lines).
3. Area (of a triangle).
4. Rotation in the clockwise or anticlockwise sense (around a center).

See Fig. 2.3.
Fig. 2.3 ((a) the distance PQ; (b) the angle between two lines at O; (c) anticlockwise rotation; (d) clockwise rotation)
(6) And the related basic formulas are

1. (Pythagorean theorem) PQ² = PO² + OQ²,
2. (law of cosines) BC² = AB² + AC² − 2 AB · AC cos θ, and
3. (area of a triangle) (1/2) AB · AC sin θ, where θ is the angle at A.
All these can be learned from middle school mathematical courses.
Sketch of the Content
In this chapter, the development of our theory will be based on the Postulate
stated in the Introduction. After introducing the concept of a plane vector
(Sec. 2.1), we model after Sec. 1.1 to vectorize a plane (Sec. 2.2) and Sec. 1.2
to coordinatize it (Sec. 2.3). Therefore, we establish the abstract set

R2 = {(x1 , x2 ) | x1 , x2 ∈ R}

with scalar multiplication and addition of vectors on it, which can be consid-
ered as a typical model of planes and is called the standard two-dimensional
real vector space.
Changes of coordinates on the same plane (Sec. 2.4) result in the algebraic
equations

y1 = a1 + a11 x1 + a21 x2, y2 = a2 + a12 x1 + a22 x2

with a11 a22 − a21 a12 ≠ 0. This is the affine transformation, or the linear
transformation in case a1 = a2 = 0, to be formally discussed in Secs. 2.7 and 2.8.
Section 2.5 introduces straight lines in the affine plane R2 and their various
equation representations in different coordinate systems.
Section 2.6 formally introduces the terminology of an affine basis B =
{a0, a1, a2} to replace the coordinate system Σ(a0; a1, a2), and this will
be used constantly thereafter. Hence, we also discuss barycentric or affine
coordinates of a point with respect to an affine basis. Ex. <B> of Sec. 2.6
introduces primary ideas about the projective plane.
The contents of Secs. 2.5 and 2.6 provide us with the necessary background
in geometric interpretation for what will be discussed in Secs. 2.7 and 2.8.
Just like Sec. 1.4, a one-to-one mapping T from R2 onto R2 that preserves
ratios of signed lengths of line segments along the same line is characterized
algebraically as

T(x) = x0 + xA, where A = [aij]2×2 is invertible, x ∈ R2.

This is called an affine transformation, and is called a linear isomorphism in
the special case x0 = 0.
The main theme of this chapter lies in Sec. 2.7, concerning linear
transformations and their various operations and properties, such as: geo-
metric mapping properties, matrix representations, factorizations, diagonal,
Jordan and rational canonical forms. Eigenvalues and eigenvectors as well
as ranks of linear transformations are dominating, both conceptually and
computationally.
Section 2.8 focuses on affine transformations and their geometric mapping
properties, such as reflections, stretches, shearings and rotations.
We single out affine invariants on the affine plane R2 , talk about affine
geometry illustrated by standard basic examples and apply both to the
study of quadratic curves.
The primary connections among sections are shown in a dependency diagram
(not reproduced here). In that diagram, one may go directly from Sec. 2.4 to
Sec. 2.7 without seriously handicapping the understanding of the content in
Sec. 2.7.
The manners presented, the techniques used and the results obtained in
this chapter will be repeated and reconstructed for R3 in Chap. 3.

2.1 (Plane) Vector
Displace a point P on a fixed plane along a fixed direction to another
point Q. The resulting directed segment PQ is called a plane vector
(see Fig. 2.4). Therefore, the vector PQ has two defining
properties, i.e.

1. length PQ, and
2. direction. (2.1.1)

In case Q = P, call PQ a zero vector.
Fig. 2.4 (the plane Σ: displacing P along a fixed direction to Q yields the vector PQ)
In physics, a vector PQ can be interpreted as a force acting on the point
P and moving it in the fixed direction to the point Q (see Fig. 2.5). Under
this circumstance, the vector PQ possesses both

1. the magnitude of the acting force, and
2. the direction of movement, (2.1.2)

as two main concepts.
Fig. 2.5 (the vector PQ as a force acting at P toward Q)
Both displacement and acting force are concrete, vivid interpretations of
vectors, which are going to be abstractly defined below.

In what follows, everything will happen on the same fixed plane.

Let the segment PQ be moved, keeping it parallel to its original position, to
the segment RS so that PQSR forms a parallelogram (see Fig. 2.6). Then
we designate that the vector PQ is identical with the vector RS. In other words,

PQ = RS (as vectors)
⇔ 1. PQ = RS (in length), and
   2. the direction from P to Q is the same as the direction from R to S.
(2.1.3)

This is called the parallel invariance of vectors. As indicated in Fig. 2.6,
PR = QS (as vectors).
Fig. 2.6 (the parallelogram PQSR: the segments PQ and RS are parallel, equal in length and direction)
Usually, we use x to represent a vector; that is to say, for any fixed
point P, one can always find another point Q such that

x = PQ.

According to parallel invariance, there are infinitely many choices of PQ to
represent the same x, as long as these PQ have the same length and the same
direction. Hence, x has a definite meaning, even though it is more abstract
than PQ. In particular, the zero vector is

0 = PP (P is any fixed point), (2.1.4)

and the negative of a vector x = PQ is

−x = (−1)x = QP. (2.1.5)

Therefore, x and −x represent, respectively, two vectors equal in length
but opposite in direction.
Now, two operations will be introduced among vectors.
Addition
Suppose x = PQ and y = QS. Starting from P, along x, to Q and then
along y, to S is the same as starting from P directly to S. Then, it is
reasonable to define

x + y = PS (2.1.6)

(see Fig. 2.7). Physically, x + y can be interpreted as the composite force
of x and y. The vector x + y is called the sum of x and y, and the binary
operation of combining x and y into a single vector x + y is called the
addition of vectors.

Fig. 2.7 (the triangle on P, Q, S: x from P to Q, y from Q to S, and x + y from P to S)
Scalar multiplication
Suppose α ∈ R and x = PQ. Keep P fixed. Stretch the segment PQ,
α times, to a collinear segment PR. According as α > 0 or α < 0, the point R
lies on the same side of P as Q or on the opposite side of P, respectively (see
Fig. 2.8). In all cases, we define the scalar product of the scalar α and the
vector x by

αx = PR, (2.1.7)

Fig. 2.8 (the line through P and Q: αx reaches R beyond Q when α > 0, and a point R on the opposite side of P when α < 0)

and the whole process is called the scalar multiplication of vectors. In
particular,

0x = PP = 0, and
(−α)x = −(αx) = −αx. (2.1.8)

Of course,

α0 = PP = 0 (2.1.9)

holds for all α ∈ R.
By the way, the subtraction of the vector y from the vector x is
defined as

x − y = x + (−y). (2.1.10)

See Fig. 2.9.

Fig. 2.9 (the parallelogram on x and y: one diagonal represents x + y, the other x − y)
We summarize what we have done, up to now, as

Algebraic operation properties of (plane) vectors
For each pair of vectors x and y, there is a unique (plane) vector
x + y (called addition),
and for each scalar α ∈ R and each vector x, there is a unique (plane)
vector
αx (called scalar multiplication)
such that the following properties hold (refer to Sec. B.1).
(a) Addition
1. (commutative) x + y = y + x.
2. (associative) (x + y) + z = x + (y + z).
3. (zero vector) There is a vector, specifically denoted by 0, such that
x + 0 = x.
4. (negative or inverse of a vector) For each (plane) vector x,
there exists another (plane) vector, denoted by −x, such that
x + (−x) = 0.
(b) Scalar multiplication
1. 1x = x.
2. α(βx) = (αβ)x.
(c) The addition and scalar multiplication satisfy the distributive laws
(α + β)x = αx + βx, and
α(x + y) = αx + αy. (2.1.11)
When comparing them with (2) of (1.2.4), we can easily see that line vectors
and plane vectors enjoy exactly the same operations and properties.
The only difference between them is that one is one-dimensional by nature,
while the other is two-dimensional.
Remark 2.1
Operation properties as shown in (2.1.11) hold not only for line vectors
and plane vectors, but also for (three-dimensional) space vectors. We
explain roughly as follows.

Line vectors
Suppose x ≠ 0. Then for any collinear vector y, there exists a unique
scalar α ∈ R such that y = αx. Therefore, in essence, the addition of x
and y, x + y, reduces to the scalar multiplication x + y = x + αx =
(1 + α)x.
Space vectors (see Chap. 3)
Pick three arbitrary noncollinear points P, Q and R in space. Let
x = PQ and y = PR. These three points determine a unique plane Σ
in space, which contains x and y as plane vectors (see Fig. 2.10). It follows
that the space vectors x, αx (lying on a straight line in space) and x + y
(lying on the plane spanned by x and y in space, if x and y are
linearly independent) enjoy the operation properties listed in (2.1.11).
This suggests implicitly that addition and scalar multiplication of
vectors alone are good enough to describe (i.e. vectorize) three-
dimensional or even higher-dimensional spaces. We will see this later in
Chap. 3.

Fig. 2.10 (the plane through P, Q, R in space, carrying x, y and x + y)

Remark 2.2
It is appropriate, now, to say what is a vector space over a field.
For general definition of a field F, please refer to Sec. A.3; and definition
of a vector space V over a field F, refer to Sec. B.1.
According to (1) in (1.2.4), R is a field and is called the real field.
Therefore, (1.1.3) or (2) in (1.2.4) says that R1 or R is a vector space over
the field R, simply called a real vector space.
Similarly, (2.1.11) indicates that R2 is also a real vector space, but a
two-dimensional one (see Secs. 2.2 and 2.3).
The three-dimensional real vector space R3 will be defined in Secs. 3.1
and 3.2.
A vector space over the complex field C is specifically called a complex
vector space.
The elements of the field F are called scalars and the elements of the
vector space V are called vectors. The word “vector”, without any practical
meanings such as displacement or acting force, is now being used to describe
any element of a vector space.

Exercises
<A>
1. Explain graphically the properties listed in (2.1.11).
2. Explain physically the properties listed in (2.1.11).

2.2 Vectorization of a Plane: Affine Structure
We take three noncollinear points O, A1 and A2, and then keep them fixed.
Let

a1 = OA1 and a2 = OA2.

Then a1 ≠ 0 and a2 ≠ 0 hold. Furthermore, since A2 lies outside of the
straight line generated by O and A1, a2 ≠ αa1 for any α ∈ R. Similarly,
a1 ≠ αa2 for any α ∈ R.
Denote by Σ the plane determined by O, A1 and A2. Through any fixed
point P on Σ, draw two straight lines parallel to the lines OA1 and OA2,
respectively, which intersect OA1 at a point P1 and OA2 at a
point P2 (see Fig. 2.11). There exist two unique scalars x1, x2 ∈ R such
that

OP1 = x1 a1, and
OP2 = x2 a2.

Owing to the fact that OP1PP2 is a parallelogram, this implies that

OP = OP1 + P1P = OP1 + OP2 = x1 a1 + x2 a2.

Fig. 2.11 (the plane through O, A1, A2: the point P projects, parallel to L(O; A2) and L(O; A1), to P1 on L(O; A1) and P2 on L(O; A2))
Conversely, given any two scalars x1, x2 ∈ R, we can find a unique point
P1 on the line OA1 such that OP1 = x1 a1, and a unique point P2 on OA2
such that OP2 = x2 a2. Construct a parallelogram OP1PP2 with given sides
OP1 and OP2. Then the position vector OP from O to the opposite vertex
P satisfies OP = x1 a1 + x2 a2.
Now fix x1, and let x2 run through all the real numbers in the equation
OP = x1 a1 + x2 a2. Geometrically this results in the movement of the line
L(O; A2) along the vector x1 a1 up to the parallel straight line

x1 a1 + L(O; A2), (2.2.1)

which passes through the point P1 (see Fig. 2.12). Then, if x1 starts to run
through all the real numbers, the family of straight lines x1 a1 + L(O; A2),
parallel to each other, will sweep through the whole plane Σ.

Fig. 2.12 (the lines L(O; A1), L(O; A2) and the translate x1 a1 + L(O; A2), on which the point x1 a1 + x2 a2 lies)
Summarize the above results as

Algebraic vectorization of a plane
Let O, A1 and A2 be any fixed noncollinear points in a plane Σ. Let

a1 = OA1 and a2 = OA2.

Then the linear combination x1 a1 + x2 a2, with coefficients x1 and x2, of
the plane vectors a1 and a2 is suitable to describe any point P on Σ (i.e.
the position vector OP). Specifically, call the set

Σ(O; A1, A2) = {x1 a1 + x2 a2 | x1, x2 ∈ R}

the vectorized space of the plane Σ with the point O as the origin,
0 = OO as zero vector and a1, a2 as basis vectors. Σ(O; A1, A2) indeed is a
vector space over R (see (2.1.11) and Remark 2.2 there), with {a1, a2} as a
basis. (2.2.2)
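Numerically, finding the coefficients x1, x2 of a given point P amounts to solving a 2 × 2 linear system whose columns are a1 and a2. A minimal NumPy sketch (the concrete numbers are our own example, not from the text):

    import numpy as np

    a1 = np.array([0.0, 2.0])      # basis vectors, assumed linearly independent
    a2 = np.array([-1.0, 1.0])
    OP = np.array([2.0, 1.0])      # position vector of a point P

    # solve x1*a1 + x2*a2 = OP
    x1, x2 = np.linalg.solve(np.column_stack([a1, a2]), OP)
    assert np.allclose(x1 * a1 + x2 * a2, OP)
    print(x1, x2)                  # the unique coefficients: 1.5 and -2.0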
Once we have these concepts, we are able to algebraically characterize
the conditions for four points to be coplanar.
Suppose O, B1, B2 and B3 are coplanar. Let bi = OBi, i = 1, 2, 3. The
following three cases are considered.
Case 1 Suppose these four points are collinear (see Fig. 2.13). No matter
how these four points might be situated, at least one of the
vectors b1, b2 and b3 can be expressed as a linear combination of the other
two. For example, in case O = B1 = B2 ≠ B3, we have b1 = b2 = 0 and
b3 ≠ 0, and hence b1 = α b2 + 0 b3 for any scalar α ∈ R.

Fig. 2.13 (the four points O, B1, B2, B3 on one line, in various patterns of coincidence)

Case 2 Suppose that three out of the four points are collinear (see
Fig. 2.14). For example, in case O, B1 and B2 are distinct but collinear,
then b2 = α b1 + 0 b3 for a suitable scalar α.

Fig. 2.14 (B3 off the line through O, B1, B2; shown both with B1 = B2 and with B1 ≠ B2)
Case 3 Suppose that no three of the four points are collinear. As we
already know from Fig. 2.12, b3 = α b1 + β b2 is true for some scalars α
and β.
Conclusively, we have the

Linear dependence of plane vectors
The following statements hold and are equivalent.
(1) (geometric) Four points O, B1, B2 and B3 are coplanar.
⇔ (2) (algebraic) Fix one of the points O, B1, B2 and B3 as origin and
hence produce three vectors, say O as origin and bi = OBi,
i = 1, 2, 3. Then, at least one of the vectors b1, b2 and b3 is a
linear combination of the other two vectors.
⇔ (3) (algebraic) There exist real numbers y1, y2 and y3, not all equal to
zero, such that y1 b1 + y2 b2 + y3 b3 = 0.
Under these circumstances, the four points are called affinely dependent
and the resulting plane vectors linearly dependent. (2.2.3)
Therefore, any three coplanar vectors are linearly dependent.
From what we did at the beginning of this section, there follows
immediately the

Linear independence of nonzero plane vectors
(1) (geometric) Three points O, B1 and B2 are not collinear.
⇔ (2) (algebraic) Fix one of the points O, B1 and B2 as origin and hence
produce two vectors, say O as origin and bi = OBi, i = 1, 2. Then,
for any α, β ∈ R, both b1 ≠ α b2 and b2 ≠ β b1 are true.
⇔ (3) (algebraic) If there exist real numbers y1 and y2 such that y1 b1 +
y2 b2 = 0 holds, then it is necessary that y1 and y2 be equal
to zero simultaneously, i.e. y1 = y2 = 0 is the only possibility.
Under these circumstances, the three points are affinely independent and the
resulting two plane vectors linearly independent (i.e. non-collinear). (2.2.4)
As a consequence, two nonzero plane vectors are either linearly
dependent (i.e. collinear) or linearly independent (i.e. non-collinear).
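Anticipating the coordinate description of the next section, condition (3) of (2.2.4) can be tested numerically: y1 b1 + y2 b2 = 0 forces y1 = y2 = 0 exactly when the 2 × 2 matrix with rows b1, b2 has nonzero determinant. A small NumPy sketch (the example vectors are ours):

    import numpy as np

    def independent(b1, b2, tol=1e-12):
        # nonzero determinant <=> only the zero solution exists
        return abs(np.linalg.det(np.array([b1, b2]))) > tol

    print(independent((1.0, 1.0), (3.0, 4.0)))   # True: non-collinear
    print(independent((1.0, 2.0), (2.0, 4.0)))   # False: b2 = 2*b1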
Remark
The geometric fact that the three points O, A1 and A2 are not collinear is
algebraically equivalent to the linear independence of the plane vectors
a1 = OA1 and a2 = OA2. Any vector in Σ(O; A1, A2) is produced, via the linear
combination x1 a1 + x2 a2, from a unique vector x1 a1 in L(O; A1) and a unique
vector x2 a2 in L(O; A2). We combine these two facts together and write

Σ(O; A1, A2) = L(O; A1) ⊕ L(O; A2), (2.2.5)

indicating that two intersecting (but not coincident) straight lines determine
a unique plane.

Exercises
<A>
1. Prove (2.2.3) in detail.
2. Use the notation in (2.2.2). For any three vectors b1, b2, b3 ∈ Σ(O; A1, A2),
prove that they are linearly dependent.
3. Prove (2.2.4) in detail.
2.3 Coordinatization of a Plane: R2
Provide the plane Σ with a fixed vectorized space Σ(O; A1, A2) with basis

B = {a1, a2},

where a1 = OA1 and a2 = OA2 are called basis vectors. It should be
mentioned that the order of appearance of a1 and a2 in B cannot be altered
arbitrarily. Sometimes, B is called an ordered basis to emphasize the
order of the basis vectors. Hence, B is different from the ordered basis
{a2, a1}.
Take any point P ∈ Σ. Then

OP ∈ Σ(O; A1, A2)
⇔ OP = x1 a1 + x2 a2 for some x1, x2 ∈ R.

The respective coefficients x1 and x2 of a1 and a2 form a pair (x1, x2),
denoted by

[OP]B = [P]B = (x1, x2) (2.3.1)

and called the coordinate vector of the point P or of the vector OP with respect
to the basis B, with x1 as the first coordinate component and x2 the second
one. In particular,

[O]B = (0, 0),
[A1]B = (1, 0), and
[A2]B = (0, 1).

See Fig. 2.15.

Fig. 2.15 (the plane with origin O = (0, 0), A1 = (1, 0), A2 = (0, 1) and a general point P = (x1, x2))
They can be concluded as

The coordinatization of a plane
Choose a vectorized space Σ(O; A1, A2) of a plane Σ, with basis
B = {a1, a2}, a1 = OA1, a2 = OA2. A point P ∈ Σ has coordinate vector
[P]B = (x1, x2) ⇔ OP = x1 a1 + x2 a2. The set of all these coordinate
vectors

R2_Σ(O; A1, A2) = {[P]B | P ∈ Σ}

is called a coordinatized space of Σ, and is indeed a vector space over R.
Explain as follows.

(1) Points P in Σ are in one-to-one correspondence with the ordered pairs
[P]B = (x1, x2) in R2_Σ(O; A1, A2).
(2) Define the operations on the set R2_Σ(O; A1, A2):
1. addition (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2), and
2. scalar multiplication α(x1, x2) = (αx1, αx2), where α ∈ R.
They have all the properties listed in (2.1.11), treating (x1, x2) as x, etc.
Hence, R2_Σ(O; A1, A2) is a real vector space.
(3) Define a mapping Φ: Σ(O; A1, A2) → R2_Σ(O; A1, A2) by

Φ(x1 a1 + x2 a2) = (x1, x2) or Φ(OP) = [P]B.

This Φ is one-to-one and onto, and it preserves vector operations, i.e.
1. Φ(OP + OQ) = [P]B + [Q]B (= Φ(OP) + Φ(OQ)), and
2. Φ(αOP) = α[P]B (= αΦ(OP)), α ∈ R.
Φ is called a linear isomorphism between Σ(O; A1, A2) and R2_Σ(O; A1, A2)
(see Sec. B.7).

Hence, Σ(O; A1, A2) and R2_Σ(O; A1, A2) are considered identical. (2.3.2)

See Fig. 2.16.
Fig. 2.16 (Φ carries Σ(O; A1, A2), on the left, onto R2_Σ(O; A1, A2), on the right: OP ↦ (x1, x2), OQ ↦ (y1, y2), OP + OQ ↦ (x1, x2) + (y1, y2))
Notice that the linear isomorphism Φ carries all vector operations
in Σ(O; A1, A2) to the corresponding vector operations in R2_Σ(O; A1, A2),
a neat, easy-to-handle, but abstract space. In particular, Φ preserves
the linear dependence and independence of vectors. To see this, suppose
b1 = α11 a1 + α12 a2 and b2 = α21 a1 + α22 a2 are in Σ(O; A1, A2); then

Φ(y1 b1 + y2 b2) = Φ((y1 α11 + y2 α21) a1 + (y1 α12 + y2 α22) a2)
  = (y1 α11 + y2 α21, y1 α12 + y2 α22)
  = y1 (α11, α12) + y2 (α21, α22)
  = y1 Φ(b1) + y2 Φ(b2).

Since Φ is one-to-one and onto, what we claimed follows easily.
By (2.2.3) and (2.2.4), the following are equivalent.

1. (geometric) Three non-collinear points, with no fourth one needed, are
enough to determine a unique plane Σ.
2. (algebraic) Two linearly independent vectors, with no third one needed, are
able to generate, via linear combinations, the whole space Σ(O; A1, A2).

Owing to what we explained in the previous paragraph, the vectorized
space Σ(O; A1, A2) and its isomorphic space R2_Σ(O; A1, A2) are called two-
dimensional.
Suppose that Σ(O; A1, A2) and Σ(O′; B1, B2) are two vectorized spaces
of a plane Σ, and that Ψ: Σ(O′; B1, B2) → R2_Σ(O′; B1, B2) is the linear
isomorphism, just as Φ: Σ(O; A1, A2) → R2_Σ(O; A1, A2) is. Then, we have
the following diagram

Σ(O; A1, A2) --(Ψ⁻¹ ∘ Φ)--> Σ(O′; B1, B2)
     |Φ                          |Ψ
R2_Σ(O; A1, A2)      =      R2_Σ(O′; B1, B2)
(2.3.3)

where Ψ⁻¹ is the inverse mapping of Ψ and Ψ⁻¹ ∘ Φ means the composite
mapping of Φ followed by Ψ⁻¹ (see Sec. A.2). Ψ⁻¹ ∘ Φ is also a linear
isomorphism. Observing this fact, we have the
Standard two-dimensional vector space R2 over R
Let

R2 = {(x1, x2) | x1, x2 ∈ R}

and designate (x1, x2) = (y1, y2) ⇔ x1 = y1, x2 = y2. Provide it with the
following two operations,
(a) addition (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2),
(b) scalar multiplication α(x1, x2) = (αx1, αx2), α ∈ R.
Then R2 is a two-dimensional real vector space, with (x1, x2) treated as the
vector x in (2.1.11). In particular,

the zero vector 0 = (0, 0), and
the inverse vector −x = (−x1, −x2). (2.3.4)
In this text, R2 is considered as an abstract model of a concrete plane Σ,
in the sense that R2 is the universal representation of any vectorized space
Σ(O; A1, A2) of Σ, or of its coordinatized space R2_Σ(O; A1, A2).
Usually, an element (x1, x2) in R2 is denoted by x, i.e.

x = (x1, x2), y = (y1, y2), etc.

With this convention, elements in R2 have double meanings:

1. (affine point of view) When R2 is considered as a plane, x is called a
point, and two points x, y determine a vector x − y or y − x, with
x − x = 0.
2. (vector point of view) When R2 is considered as a vector space, x is
called a vector, pointing from the zero vector 0 toward the point x (for
the exact reason, see Definition 2.8.1 in Sec. 2.8).

When convenient, we will feel free to use both of these two concepts. Refer
to Fig. 2.17. Traditionally and commonly in existing textbooks of linear
algebra, when the point (x1, x2) in R2 is considered as a position vector
(as in 2), it is denoted by a column vector

[x1]
[x2]

and is treated as a 2 × 1 matrix. We would rather accept x = (x1, x2) as a
vector too than adopt this traditional convention, for simplicity and for later
use in connection with matrix computations (see Sec. 2.4).
Remark 2.3
In the Euclidean sense, a segment has its length, and two lines form an
angle between them.
For a coordinatized space R2_Σ(O; A1, A2) of a plane Σ, the basis vectors
e1 = (1, 0) and e2 = (0, 1) together are usually said to form an oblique or
affine coordinate system (see Fig. 2.17(a)). When the following additional
requirements, namely,

1. the lines OA1 and OA2 intersect orthogonally at O, and
2. the segments OA1 and OA2 have equal lengths,

are imposed, then e1 and e2 are said to form a rectangular or Cartesian
coordinate system (see Fig. 2.17(b)).

Fig. 2.17 ((a) an affine (oblique) coordinate system; (b) a rectangular coordinate system; each shows the point x = (x1, x2) and the vector x, with e1 = (1, 0), e2 = (0, 1))

From now on, if not specified, a plane endowed with a rectangular coordi-
nate system will always be considered as a concrete geometric model of R2 .

Remark 2.4 Reinterpretation of (2.2.5).
(2.2.5) can be reinterpreted as

R2_Σ(O; A1, A2) = R_L(O; A1) ⊕ R_L(O; A2), or (2.3.5)
R2 = R ⊕ R (direct sum of R and R). (2.3.6)
Explain further as follows.
When considered as existing by itself, i.e. independently of being a
subset of the plane, we already know from (1.2.2) that R_L(O; A1) =
R = R_L(O; A2).
But, in reality, L(O; A1) is subordinate to the plane Σ, as part of it.
For any given P ∈ Σ,

P lies on the line OA1
⇔ OP ∈ L(O; A1), i.e. OP = x1 a1 + 0 a2
⇔ [P]B = (x1, 0).

Therefore, we have

R_L(O; A1) = {(x, 0) | x ∈ R} ⊆ R2_Σ(O; A1, A2).

Similarly, the inclusion relation holds:

R_L(O; A2) = {(0, y) | y ∈ R} ⊆ R2_Σ(O; A1, A2).
It is easily seen that vectors in R_L(O; A1) are closed under the addition
and scalar multiplication inherited from those in R2_Σ(O; A1, A2).
This simply means that R_L(O; A1) exists by itself as a vector space, and
is called a one-dimensional vector (or linear) subspace of R2_Σ(O; A1, A2). For
the same reason, R_L(O; A2) is also a one-dimensional vector subspace of
R2_Σ(O; A1, A2).
Observe the following two facts.

1. The vector subspaces R_L(O; A1) and R_L(O; A2) have only the zero vector
0 = (0, 0) in common, i.e.
R_L(O; A1) ∩ R_L(O; A2) = {0}.
2. The sum vectors
(x1, 0) + (0, x2) = (x1, x2),
where (x1, 0) ∈ R_L(O; A1) and (0, x2) ∈ R_L(O; A2), generate the whole
space R2_Σ(O; A1, A2). (2.3.7)
Just under these circumstances, we formulate (2.2.5) in the form of (2.3.5).
One can simplify (2.3.5) further.
For this purpose, define mappings

Φ1: R → R_L(O; A1) by Φ1(x) = (x, 0), and
Φ2: R → R_L(O; A2) by Φ2(x) = (0, x). (2.3.8)

See Fig. 2.18.
Fig. 2.18 (Φ1 and Φ2 embed R into R_L(O; A1) and R_L(O; A2): x ↦ (x, 0) and x ↦ (0, x))
It is obvious that Φ1 and Φ2 are linear isomorphisms (i.e. one-to-one, onto
and preserving the two vector operations) between the vector spaces concerned.
It is in this isomorphic sense that R_L(O; A1) and R_L(O; A2) are both considered
identical with the standard one-dimensional real vector space R, and we are
able to rewrite (2.3.5) as (2.3.6). In other words, one just directly considers
R as the subspaces {(x, 0) | x ∈ R} and {(0, x) | x ∈ R} and interprets R2 as
(like (2.3.7))

R2 = R ⊕ R,

the (external) direct sum of copies of its subspace R.

Exercises
<A>

1. In the diagram (2.3.3), prove that Ψ⁻¹ ∘ Φ: Σ(O; A1, A2) → Σ(O′; B1, B2)
is a linear isomorphism. Then, modeling after Fig. 2.16, graphically
explain the behavior of the mapping Ψ⁻¹ ∘ Φ.
2. Prove that any three vectors in R2 must be linearly dependent, i.e. if
x1, x2, x3 ∈ R2, then there exist scalars α1, α2 and α3, not all equal
to zero, such that α1 x1 + α2 x2 + α3 x3 = 0 (refer to (2.2.3)). Note
that x1, x2 are linearly independent if α1 x1 + α2 x2 = 0 always implies
α1 = α2 = 0.
3. A nonempty subset S of R2 is called a vector or linear subspace if the
following properties hold:
(1) x, y ∈ S ⇒ x + y ∈ S,
(2) α ∈ R and x ∈ S ⇒ αx ∈ S (in particular, 0 ∈ S).
{0} and R2 itself are trivial subspaces of R2. A subspace which is not
identical with R2 is called a proper subspace. Prove that the following
are equivalent.
(a) S is a proper subspace which is not {0}.
(b) There exists a vector x0 ∈ S such that
S = {αx0 | α ∈ R},
which is denoted by ⟨x0⟩, called the subspace generated or spanned
by x0. Note that x0 ≠ 0 in this case.
(c) There exist constants a and b, not both zero, such that
S = {x = (x1, x2) | ax1 + bx2 = 0}.
We simply call a straight line (refer to Sec. 2.5) passing through 0 = (0, 0)
a one-dimensional subspace of R2 and denote S by ax1 + bx2 = 0.
Explain why a straight line ax1 + bx2 = c with c ≠ 0 is never a subspace.
4. A nonempty subset B of R2 is called a basis for R2 if
1. B is linearly independent, and
2. B generates or spans R2 , namely, each vector in R2 can be expressed
as a linear combination of vectors from B (see Sec. B.2).
Show that the number of basis vectors in any basis for R2 is equal
to 2. This is the reason why we call R2 two-dimensional. Show that a
proper nonzero subspace of R2 is one-dimensional. One calls {0} the
zero-dimensional subspace.
5. Let S1 and S2 be two subspaces of R2. The sum of S1 and S2,
S1 + S2 = {x1 + x2 | x1 ∈ S1 and x2 ∈ S2},
is still a subspace of R2. In case S1 ∩ S2 = {0}, write S1 + S2 as
S1 ⊕ S2
and call it the direct sum of S1 and S2. Suppose U is a subspace of R2;
show that there exists a subspace V of R2 such that
R2 = U ⊕ V.
Is V unique?
6. Let x1, . . . , xn be position vectors in R2 such that the terminal point of
xj−1 is the initial point of xj for 2 ≤ j ≤ n, and the terminal point of
xn is the initial point of x1. Then

x1 + x2 + · · · + xn = Σ_{j=1}^{n} xj = 0.

<B>

1. Suppose x1, x2, x3 are in R2. For any other vector x ∈ R2, show that
there exist scalars α1, α2, α3, not all zero, such that
x1 + α1 x, x2 + α2 x, x3 + α3 x
are linearly dependent.
2. Suppose x1 and x2 are linearly independent vectors in R2. For any
x ∈ R2, show that, among the vectors x, x1, x2, at most one
can be represented as a linear combination of the preceding ones.
3. Suppose x1 and x2 are linearly independent vectors in R2. If a vector
x ∈ R2 can be represented as a linear combination of x1 alone and of x2
alone, then x = 0.
 
alone, then x = 0 .
4. Suppose any two vectors of the linearly dependent vectors  x1 , 
x2 and 
x3
are linearly independent. If the scalars a1 , a2 , a3 satisfy a1 x1 + a2 

x2 +

x3 = 0 , show that either a1 a2 a3 = 0 or a1 = a2 = a3 = 0.
a3 

In the former case, if b1  x1 + b2 x2 + b3 
x3 = 0 also holds, then
a1 : b1 = a2 : b2 = a3 : b3 .
5. Suppose x1, x2 ∈ R2 satisfy the following:
(1) {x1, x2} generates R2, i.e. each x ∈ R2 is a linear combination
a1 x1 + a2 x2 of x1 and x2.
(2) There exists a vector x0 ∈ R2 which has a unique linear combination
representation αx1 + βx2.
Then {x1, x2} is a basis for R2. Is condition (2) necessary?
6. Let {x1, x2} and {y1, y2} be two bases for R2. Then at least one of
{x1, y1} and {x1, y2} is a basis for R2. How about {x2, y1} and {x2, y2}?
x2 , 

<C> Abstraction and generalization

Almost all the concepts we have introduced for R and R2 so far, can be
generalized verbatim to abstract vector spaces over an arbitrary field.
Such as:
Vector (or linear) space V over a field F: Fn .
Real or complex vector space: Rn , Cn .
Subspace (generated or spanned by a set S of vectors): ⟨S⟩.
Zero subspace: {0}.
Intersection subspace: S1 ∩ S2.
Sum subspace: S1 + S2.
Direct sum of subspaces: S1 ⊕ S2.
Linear combination (of vectors): Σ_{i=1}^{n} αi xi, xi ∈ V and αi ∈ F.
Linear dependence (of a set of vectors).
Linear independence (of a set of vectors).
Basis and basis vectors: B = {x1, . . . , xn}.
Coordinate vector of a vector x with respect to a basis B = {x1, . . . , xn}:
[x]B = (α1, . . . , αn) if x = Σ_{i=1}^{n} αi xi.
Dimension (of a vector space V ): dim V = n.
Linear isomorphism.
Make sure you are able to achieve this degree of abstraction and generaliza-
tion. If not, please refer to Secs. B.1, B.2, B.3 and B.7 for certainty. Then,
try to extend these results stated in <A> and <B> as far as possible to
a general vector space V over a field F. For example, Ex. <B> 6 can be
restated as

6′. Let {x1, . . . , xn} and {y1, . . . , yn} be two bases for an n-dimensional
vector space V. Then there exists a permutation j1, j2, . . . , jn of
1, 2, . . . , n so that all

{x1, . . . , xi−1, y_{ji}, xi+1, . . . , xn}, 1 ≤ i ≤ n,

are bases for V.
Of course, it is better to try to justify the generalized results, proving the
true ones rigorously and refuting the false ones by counterexamples, in a
project program or a seminar.
In the process, it is suggested to consider:

(1) Does the intuition experienced in the construction of the abstract spaces
R and R2 help in solving the problems? In what way?
(2) Is more intuitive or algebraic experience, such as in R3 (see Chap. 3),
needed?
(3) Does one have other sources of intuition concerning geometric concepts
than those from R, R1 and R2 ?
(4) Are the algebraic methods developed and used to solve problems in R
and R2 still good? To what extent should they be generalized? Does the
nature of the scalar field play an essential role in some problems?
(5) Need the algebraic methods be more unified and simplified? Need new
methods such as matrix operations be introduced as early as possible
and widely used?
(6) Are more mathematical backgrounds, sophistication or maturity
needed?

Furthermore, try the following problems.

1. Model after (2) in (1.2.4) to explain that the set C of complex numbers
is a one-dimensional vector space over the complex field C.
(a) Show that the complex number 1 itself constitutes a basis for C. Try
to find all other bases for C.
(b) Is there any intuitive interpretation for C as we did for R in Secs. 1.1
and 1.2?
(c) Consider the set C of complex numbers as a vector space over the
real field. Show that {1, i}, where i = √−1, is a basis for C and
hence C is a two-dimensional real vector space. Find all other bases
hence C is a two-dimensional real vector space. Find all other bases
for this C. Is there any possible relation between R2 and C?
2. Consider the set R of real numbers as a vector space over the rational
field Q.
(a) Make sure what scalar multiplication means in this case!
(b) Two real numbers 1 and α are linearly independent if and only if α
is an irrational number.
(c) Is it possible for this R to be finite-dimensional?
3. Let the set
R+ = {x ∈ R | x > 0}
be endowed with two operations:
(1) Addition ⊕: x ⊕ y = xy (multiplication of x and y), and
(2) Scalar multiplication ⊙ of x ∈ R+ by α ∈ R: α ⊙ x = x^α.
(a) Show that R+ is a real vector space with 1 as the zero vector
and x−1 the inverse of x.
(b) Show that each vector in R+ is linearly independent by itself
and every two different vectors in R+ are linearly dependent.
(c) Show that R+ is linear isomorphic to the real vector space R.
4. Show that B = {x1, x2, . . . , xn}, where

xj = (1, 1, . . . , 1, 0, . . . , 0) (the first j components equal to 1), 1 ≤ j ≤ n,

is a basis for Fn, and find the coordinate [x]B of x = (x1, x2, . . . , xn) ∈ Fn
with respect to B.
2.4 Changes of Coordinates: Affine and Linear Transformations (or Mappings)
Let Σ(O; A1, A2) and Σ(O′; B1, B2) be two geometric models of R2, with
ai = OAi and bi = O′Bi, i = 1, 2, and the bases

B = {a1, a2},
B′ = {b1, b2}.

Our main purpose here is to establish the relationship between the coordinate
vectors [P]B and [P]B′ of the same point P ∈ Σ with respect to the
bases B and B′, respectively (see Fig. 2.19).
Fig. 2.19 (the plane with two coordinate systems: O with basis a1, a2 and O′ with basis b1, b2, and a common point P)
The following notation will be adopted throughout this section:

[O′]B = [OO′]B = (α1, α2),
[O]B′ = [O′O]B′ = (β1, β2),
[P]B = [OP]B = (x1, x2),
[P]B′ = [O′P]B′ = (y1, y2).
In view of the parallel invariance of vectors, one may consider b1 and b2 as
vectors in Σ(O; A1, A2), and hence

[bi]B = (αi1, αi2), i.e. bi = αi1 a1 + αi2 a2, i = 1, 2.

Similarly, when a1, a2 are considered as vectors in Σ(O′; B1, B2),

[ai]B′ = (βi1, βi2), i.e. ai = βi1 b1 + βi2 b2, i = 1, 2.
As indicated in Fig. 2.19,

OP = OO′ + O′P
⇒ x1 a1 + x2 a2 = (α1 a1 + α2 a2) + (y1 b1 + y2 b2)
  = α1 a1 + α2 a2 + y1 (α11 a1 + α12 a2) + y2 (α21 a1 + α22 a2)
  = (α1 + α11 y1 + α21 y2) a1 + (α2 + α12 y1 + α22 y2) a2
⇒ (since a1 and a2 are linearly independent)
x1 = α1 + α11 y1 + α21 y2,
x2 = α2 + α12 y1 + α22 y2. (2.4.1)
Suppose the reader is familiar with basic facts about matrices (if not, please
refer to Sec. B.4). The above equations can be written as a single one:

(x1, x2) = (α1, α2) + (α11 y1 + α21 y2, α12 y1 + α22 y2)
         = (α1 α2) + (y1 y2) [ α11 α12 ]
                             [ α21 α22 ].  (2.4.1′)
Note Hence and hereafter, when concerned with matrix computation, the
vector (x1, x2) will always be considered as a 1 × 2 matrix (x1 x2) or
[x1 x2], a row matrix. So, occasionally, the notations (x1, x2) and (x1 x2)
might give rise to some confusion.
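Under this row-vector convention, (2.4.1′) becomes a one-line computation. A minimal NumPy sketch (the numerical entries are placeholders of our choosing):

    import numpy as np

    O_prime_B = np.array([1.0, -2.0])     # [O']_B = (alpha1, alpha2)
    A = np.array([[2.0, 1.0],             # rows are [b1]_B and [b2]_B,
                  [0.0, 3.0]])            # so det(A) must be nonzero

    y = np.array([4.0, 5.0])              # [P]_B' as a row vector
    x = O_prime_B + y @ A                 # (2.4.1'): [P]_B = [O']_B + [P]_B' A
    print(x)                              # [ 9. 17.]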
Summarize the above result as part of the

Coordinate changes of two vectorized spaces of R2
Let Σ(O; A1, A2) and Σ(O′; B1, B2) be two vectorized spaces of R2 (also
called coordinate systems on R2), with bases

B = {a1, a2}, ai = OAi, i = 1, 2, and
B′ = {b1, b2}, bi = O′Bi, i = 1, 2,
respectively. Then, the coordinate vectors [P]B and [P]B′, of the same point
P ∈ R2, satisfy the following two formulas of coordinate changes:

[P]B = [O′]B + [P]B′ A^{B′}_B,
[P]B′ = [O]B′ + [P]B A^{B}_{B′}

(usually called affine transformations, and linear transformations if O = O′),
where

A^{B′}_B = [ [b1]B ] = [ α11 α12 ]
           [ [b2]B ]   [ α21 α22 ]

is called the transformation or transition matrix of the basis B′ with respect
to the basis B, and

A^{B}_{B′} = [ [a1]B′ ] = [ β11 β12 ]
             [ [a2]B′ ]   [ β21 β22 ]

the transition matrix of B with respect to B′. These satisfy:

1. The determinants (see Sec. B.6 or Sec. 4.3)

det A^{B′}_B = | α11 α12 | = α11 α22 − α12 α21 ≠ 0,
               | α21 α22 |

det A^{B}_{B′} = | β11 β12 | = β11 β22 − β12 β21 ≠ 0.
                 | β21 β22 |

2. The matrices A^{B′}_B and A^{B}_{B′} are inverse to each other, i.e.

A^{B′}_B A^{B}_{B′} = A^{B}_{B′} A^{B′}_B = I2 = [ 1 0 ]
                                                 [ 0 1 ],

and hence

det A^{B}_{B′} = 1 / det A^{B′}_B;

A^{B}_{B′} = (A^{B′}_B)^{-1} = (1 / det A^{B′}_B) [ α22 −α12 ]
                                                  [ −α21 α11 ].

3. Therefore, the two formulas are reversible, i.e.

[P]B′ = −[O′]B (A^{B′}_B)^{-1} + [P]B (A^{B′}_B)^{-1};
[O]B′ = −[O′]B (A^{B′}_B)^{-1} = −[O′]B A^{B}_{B′}.

In particular, if O = O′, then [O]B′ = [O′]B = (0, 0). (2.4.2)
Proof Only 1, 2 and 3 need to be proved.

About 1 Noting that bi = αi1 a1 + αi2 a2, one has:

b1 and b2 are linearly independent.
⇔ (see (2.2.4)) The only solution of y1, y2 in y1 b1 + y2 b2 = 0 is y1 = y2 = 0.
⇔ The equation
y1 (α11 a1 + α12 a2) + y2 (α21 a1 + α22 a2)
  = (α11 y1 + α21 y2) a1 + (α12 y1 + α22 y2) a2 = 0
has the zero solution only, i.e.
α11 y1 + α21 y2 = 0
α12 y1 + α22 y2 = 0
has only the zero solution y1 = y2 = 0.
⇔ The coefficient matrix has determinant

det A^{B′}_B = | α11 α12 | = α11 α22 − α12 α21 ≠ 0.
               | α21 α22 |

This finishes the proof of 1.
About 2 By ai = βi1 b1 + βi2 b2, i = 1, 2,

b1 = α11 a1 + α12 a2
   = α11 (β11 b1 + β12 b2) + α12 (β21 b1 + β22 b2)
   = (α11 β11 + α12 β21) b1 + (α11 β12 + α12 β22) b2
⇒ (owing to the linear independence of b1 and b2)
α11 β11 + α12 β21 = 1
α11 β12 + α12 β22 = 0.

Similarly, by b2 = α21 a1 + α22 a2,

α21 β11 + α22 β21 = 0
α21 β12 + α22 β22 = 1.
Putting the above two sets of equations in the form of matrix products,
they can be simplified, in notation, as

[ α11 α12 ][ β11 β12 ] = [ β11 β12 ][ α11 α12 ] = [ 1 0 ]
[ α21 α22 ][ β21 β22 ]   [ β21 β22 ][ α21 α22 ]   [ 0 1 ]

⇒ A^{B′}_B A^{B}_{B′} = A^{B}_{B′} A^{B′}_B = I2.

This means that A^{B′}_B and A^{B}_{B′} are inverse to each other. Actual
computation, solving the βij in terms of the αij, shows that

A^{B}_{B′} = (A^{B′}_B)^{-1} = (1 / (α11 α22 − α12 α21)) [ α22 −α12 ]
                                                         [ −α21 α11 ].

This is 2.
About 3 Multiplying both sides of the first formula from the right by
(A^{B′}_B)^{-1}, one has

[P]B (A^{B′}_B)^{-1} = [O′]B (A^{B′}_B)^{-1} + [P]B′
⇒ [P]B′ = −[O′]B (A^{B′}_B)^{-1} + [P]B (A^{B′}_B)^{-1}.

All that remains to prove is that [O]B′ = −[O′]B (A^{B′}_B)^{-1}. For
this purpose, note first that OO′ = −O′O. Therefore, remembering the
notations we adopted at the beginning of this section,

OO′ = α1 a1 + α2 a2
    = α1 (β11 b1 + β12 b2) + α2 (β21 b1 + β22 b2)
    = (α1 β11 + α2 β21) b1 + (α1 β12 + α2 β22) b2
    = −O′O
    = −(β1 b1 + β2 b2)
⇒ β1 = −(α1 β11 + α2 β21)
  β2 = −(α1 β12 + α2 β22)
⇒ (β1 β2) = −(α1 β11 + α2 β21   α1 β12 + α2 β22)
          = −(α1 α2) [ β11 β12 ]
                     [ β21 β22 ].

This finishes 3.
Example In R2, fix the following points

O = (1, 0), A1 = (1, 2), A2 = (0, 1), and
O′ = (−1, −1), B1 = (0, 0), B2 = (2, 3).

Construct the vectorized spaces Σ(O; A1, A2) and Σ(O′; B1, B2), and then
use them to justify the content of (2.4.2).
Solution Suppose

a1 = OA1 = (1, 2) − (1, 0) = (0, 2),
a2 = OA2 = (0, 1) − (1, 0) = (−1, 1);
b1 = O′B1 = (0, 0) − (−1, −1) = (1, 1),
b2 = O′B2 = (2, 3) − (−1, −1) = (3, 4),

and let

B = {a1, a2}, B′ = {b1, b2}

be bases of Σ(O; A1, A2) and Σ(O′; B1, B2) respectively (see Fig. 2.20).
Fig. 2.20 (the two coordinate systems of the Example: O with a1, a2 and O′ with b1, b2)
To compute A^{B′}_B:

b1 = α11 a1 + α12 a2
⇒ (1, 1) = α11 (0, 2) + α12 (−1, 1) = (−α12, 2α11 + α12)
⇒ −α12 = 1 and 2α11 + α12 = 1
⇒ α11 = 1, α12 = −1
⇒ [b1]B = (1, −1).

Similarly,

b2 = α21 a1 + α22 a2
⇒ (3, 4) = α21 (0, 2) + α22 (−1, 1) = (−α22, 2α21 + α22)
⇒ −α22 = 3 and 2α21 + α22 = 4
⇒ α21 = 7/2, α22 = −3
⇒ [b2]B = (7/2, −3).

Putting these together,

A^{B′}_B = [ [b1]B ] = [ 1   −1 ]
           [ [b2]B ]   [ 7/2 −3 ].
To compute A^{B}_{B′}:

a1 = β11 b1 + β12 b2
⇒ (0, 2) = β11 (1, 1) + β12 (3, 4) = (β11 + 3β12, β11 + 4β12)
⇒ β11 + 3β12 = 0 and β11 + 4β12 = 2
⇒ β11 = −6, β12 = 2
⇒ [a1]B′ = (−6, 2).

Also,

a2 = β21 b1 + β22 b2
⇒ (−1, 1) = β21 (1, 1) + β22 (3, 4) = (β21 + 3β22, β21 + 4β22)
⇒ β21 + 3β22 = −1 and β21 + 4β22 = 1
⇒ β21 = −7, β22 = 2
⇒ [a2]B′ = (−7, 2).

Hence,

A^{B}_{B′} = [ −6 2 ]
             [ −7 2 ].
Through actual computation, one gets

A^{B′}_B A^{B}_{B′} = [ 1   −1 ][ −6 2 ] = [ 1 0 ] = I2, and
                      [ 7/2 −3 ][ −7 2 ]   [ 0 1 ]

A^{B}_{B′} A^{B′}_B = [ −6 2 ][ 1   −1 ] = [ 1 0 ] = I2,
                      [ −7 2 ][ 7/2 −3 ]   [ 0 1 ]

which show that A^{B′}_B and A^{B}_{B′} are, in fact, inverse to each other.
By the way,

det A^{B′}_B = | 1   −1 | = −3 + 7/2 = 1/2,
               | 7/2 −3 |

det A^{B}_{B′} = | −6 2 | = −12 + 14 = 2 = 1 / det A^{B′}_B,
                 | −7 2 |

and

(A^{B′}_B)^{-1} = (1 / det A^{B′}_B) [ −3   1 ] = 2 [ −3   1 ] = [ −6 2 ] = A^{B}_{B′},
                                     [ −7/2 1 ]     [ −7/2 1 ]   [ −7 2 ]

as expected from (2.4.2).
Finally,

OO′ = (−1, −1) − (1, 0) = (−2, −1)
    = α1 a1 + α2 a2 = α1 (0, 2) + α2 (−1, 1) = (−α2, 2α1 + α2)
⇒ −α2 = −2 and 2α1 + α2 = −1
⇒ α1 = −3/2, α2 = 2
⇒ [O′]B = (−3/2, 2).

Meanwhile,

O′O = −OO′ = (2, 1)
    = β1 b1 + β2 b2 = β1 (1, 1) + β2 (3, 4) = (β1 + 3β2, β1 + 4β2)
⇒ β1 + 3β2 = 2 and β1 + 4β2 = 1
⇒ β1 = 5, β2 = −1
⇒ [O]B′ = (5, −1).

Now, by actual computation, we do have

−[O′]B A^{B}_{B′} = −(−3/2  2) [ −6 2 ] = −(9 − 14, −3 + 4)
                               [ −7 2 ]
                  = −(−5, 1) = (5, −1) = [O]B′.

The desired formulas of changes of coordinates are

[P]B = (−3/2  2) + [P]B′ [ 1   −1 ]   and
                         [ 7/2 −3 ]

[P]B′ = (5  −1) + [P]B [ −6 2 ]
                       [ −7 2 ].

For example, if [P]B′ = (5, 2), i.e. O′P = 5 b1 + 2 b2, then

[P]B = (−3/2  2) + (5  2) [ 1   −1 ] = (−3/2, 2) + (12, −11) = (21/2, −9),
                          [ 7/2 −3 ]

which means OP = (21/2) a1 − 9 a2.
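This Example can also be checked mechanically with NumPy, using the matrix formulas (2.4.3) and (2.4.3′) of the Remark that follows. A sketch of the verification:

    import numpy as np

    A = np.array([[0.0, 2.0], [-1.0, 1.0]])   # rows: a1, a2
    B = np.array([[1.0, 1.0], [3.0, 4.0]])    # rows: b1, b2

    T_BpB = B @ np.linalg.inv(A)              # A^{B'}_B, by (2.4.3)
    T_BBp = A @ np.linalg.inv(B)              # A^{B}_{B'}, by (2.4.3')
    assert np.allclose(T_BpB @ T_BBp, np.eye(2))

    O, Op = np.array([1.0, 0.0]), np.array([-1.0, -1.0])
    Op_B = np.linalg.solve(A.T, Op - O)       # [O']_B from x1*a1 + x2*a2 = OO'
    P_B = Op_B + np.array([5.0, 2.0]) @ T_BpB # the first formula of (2.4.2)
    print(P_B)                                # [10.5 -9. ], i.e. (21/2, -9)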
Remark The computations of A^{B′}_B and A^{B}_{B′}.
Here, we suppose that the reader is familiar with basic matrix computation
techniques, which may be obtained from Sec. B.4.
Suppose

ai = (ai1, ai2), i = 1, 2, and bi = (bi1, bi2), i = 1, 2,

are two linearly independent pairs. Let

A = [ a1 ] = [ a11 a12 ]   and   B = [ b1 ] = [ b11 b12 ]
    [ a2 ]   [ a21 a22 ]             [ b2 ]   [ b21 b22 ].

Then A and B are invertible matrices (see Ex. <A> 2).
Now,

b1 = α11 a1 + α12 a2
b2 = α21 a1 + α22 a2

⇒ [ b1 ] = [ α11 a1 + α12 a2 ] = [ α11 α12 ][ a1 ]
  [ b2 ]   [ α21 a1 + α22 a2 ]   [ α21 α22 ][ a2 ]

⇒ B = A^{B′}_B A, where B = {a1, a2} and B′ = {b1, b2}

⇒ A^{B′}_B = BA^{-1} = [ b11 b12 ][ a11 a12 ]^{-1}. (2.4.3)
                       [ b21 b22 ][ a21 a22 ]

Similarly,

A^{B}_{B′} = AB^{-1} = [ a11 a12 ][ b11 b12 ]^{-1}. (2.4.3′)
                       [ a21 a22 ][ b21 b22 ]
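Formulas (2.4.3) and (2.4.3′) give a uniform recipe for transition matrices. A small reusable sketch (our own wrapper around NumPy, not notation from the text):

    import numpy as np

    def transition(new_basis, old_basis):
        # rows of the result are the coordinates of the new basis
        # vectors in the old basis: N = T * Old, so T = N Old^{-1}
        N = np.asarray(new_basis, dtype=float)
        Old = np.asarray(old_basis, dtype=float)
        return N @ np.linalg.inv(Old)

    basis_B  = [(0, 2), (-1, 1)]             # a1, a2 from the Example
    basis_Bp = [(1, 1), (3, 4)]              # b1, b2
    print(transition(basis_Bp, basis_B))     # [[ 1.  -1. ]
                                             #  [ 3.5 -3. ]]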

Exercises
<A>

1. Prove [P]B′ = [O]B′ + [P]B A^{B}_{B′} without recourse to the established
formula [P]B = [O′]B + [P]B′ A^{B′}_B.
2. For vectors in R2, prove that the following are equivalent.
(a) a1 = (a11, a12) and a2 = (a21, a22) are linearly independent.
(b) The determinant
| a11 a12 | = a11 a22 − a12 a21 ≠ 0.
| a21 a22 |
(c) The square matrix
[ a11 a12 ]
[ a21 a22 ]
is invertible. In this case, prove that the inverse matrix is
[ a11 a12 ]^{-1} = (1 / (a11 a22 − a12 a21)) [ a22 −a12 ]
[ a21 a22 ]                                  [ −a21 a11 ].
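The closed-form inverse in (c) is simple enough to implement directly, with no library at all. A sketch (the function name is ours):

    def inv2x2(a11, a12, a21, a22):
        det = a11 * a22 - a12 * a21
        if det == 0:
            raise ValueError("rows are linearly dependent")
        # adjugate divided by the determinant, as in (c)
        return ((a22 / det, -a12 / det),
                (-a21 / det, a11 / det))

    print(inv2x2(0, 2, -1, 1))   # ((0.5, -1.0), (0.5, 0.0))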
3. In R2, let e1 = (1, 0) and e2 = (0, 1). Then
N = {e1, e2}
is a basis for the rectangular coordinate system Σ(o; e1, e2). Owing to
the fact that x = (x1, x2) = x1 e1 + x2 e2 = [x]N for any x ∈ R2, N
is specifically called the natural basis of R2. Now, for any coordinate
system Σ(O; A1, A2) in R2 with basis B = {a1, a2}, ai = OAi, i = 1, 2,
prove that

[x]B = x [ a1 ]^{-1},
         [ a2 ]

where x ∈ R2 is considered as a 1 × 2 matrix.
where 
4. Take two sets of points in R2 as follows,
O = (−1, 2), A1 = (4, −1), A2 = (−3, −4) and

O = (2, −3), B1 = (3, 5), B2 = (−2, 3).
Proceed as in the Example to justify (2.4.2) and use (2.4.3) to compute
B
ABB and AB .
5. Construct two coordinate systems (i.e. vectorized spaces) Σ(O; A1 , A2 )
and Σ(O ; B1 , B2 ) in R2 , and explain graphically the following formulas
(refer to (2.4.1)) for changes of coordinates, where [P ]B = (x1 , x2 ) and
[P ]B = (y1 , y2 ).
(a) x1 = y1 + y2 , x2 = 2y1 − 3y2 .
1
(b) x1 = − + 6y1 − 3y2 , x2 = 2 − 5y1 + 4y2 .
2
3 1 1 3
(c) y1 = x1 + x2 , y2 = x1 + x2 .
2 2 2 2
(d) y1 = 2 + 5x1 + 6x2 , y2 = 1 − 4x1 + 7x2 .
1 2 2 1
(e) y1 = 4 + √ x1 − √ x2 , y2 = −3 + √ x1 + √ x2 .
5 5 5 5
<B>

1. Consider the following system of equations
x1 = α1 + α11 y1 + α21 y2
x2 = α2 + α12 y1 + α22 y2,
where the coefficient matrix
[ α11 α12 ]
[ α21 α22 ]
is invertible. Construct in R2 two coordinate systems so that the above
prescribed equations will serve as changes of coordinates between them.
How many pairs of such coordinate systems could exist?
2. Give a real 2 × 1 matrix
A = [ a1 ]
    [ a2 ].
For a vector x = (x1, x2) ∈ R2, define
xA = (x1 x2) [ a1 ] = a1 x1 + a2 x2
             [ a2 ]
and consider it as a vector in R (see the Note right after (2.4.1′)).
(a) Show that, for x, y ∈ R2 and α ∈ R,
(αx + y)A = α(xA) + yA.
Hence A represents a linear transformation (see Secs. 2.7 and B.7)
defined as x ↦ xA from R2 to R. The sets
Ker(A) = {x ∈ R2 | xA = 0}, also denoted as N(A);
Im(A) = {xA | x ∈ R2}, also denoted as R(A),
are respectively called the kernel and the range of A. Show that
Ker(A) is a subspace of R2 while Im(A) is a subspace of R.
(b) Show that A, as a linear transformation, is onto R, i.e. Im(A) = R,
if and only if Ker(A) ≠ R2.
(c) Show that A is one-to-one if and only if Ker(A) = {0}. Is it possible
that A is one-to-one?
(d) Suppose f: R2 → R is a linear transformation. Show that there exist
unique scalars a1 and a2 such that
f(x) = (x1 x2) [ a1 ] = a1 x1 + a2 x2
               [ a2 ]
where x = (x1, x2) ∈ R2. In this case, call
[f]N = [ a1 ]
       [ a2 ]
the matrix representation of f related to the basis N = {e1, e2} for
R2 and the basis {1} for R.
3. Give a real 1 × 2 matrix A = [a1 a2]. Define the mapping A: R → R2 as
x ↦ xA = (a1 x, a2 x).
(a) Show that A, as a mapping, is a linear transformation from R to R2.
What are Ker(A) and Im(A)? Could A be both one-to-one and onto?
(b) Give a fixed subspace a1 x1 + a2 x2 = 0 of R2. Show that there exist
infinitely many linear transformations f mapping R onto that
subspace. Are such mappings one-to-one?
4. Let
A = [ a11 a12 ]
    [ a21 a22 ]
be a real 2 × 2 matrix. Define the mapping A: R2 → R2 by
x = (x1, x2) ↦ xA = (x1 x2) [ a11 a12 ]
                            [ a21 a22 ]
            = (a11 x1 + a21 x2, a12 x1 + a22 x2).
(a) Show that A is a linear transformation. Its kernel Ker(A) and range
Im(A) are subspaces of R2.
(b) Show that dim Ker(A) + dim Im(A) = dim R2 = 2 (see Sec. B.3).
(c) Show that the following are equivalent.
(1) A is one-to-one, i.e. Ker(A) = {0}.
(2) A is onto, i.e. Im(A) = R2.
(3) A maps every basis B = {x1, x2} for R2 onto the basis
{x1A, x2A} for R2.
(4) A maps some basis B = {x1, x2} for R2 onto the basis {x1A, x2A}
for R2.
(5) A is invertible. In this case, A is called a linear isomorphism.
(d) Let B = {x1, x2} be any fixed basis for R2 and y1, y2 be any two
vectors in R2, not necessarily linearly independent. Then, the
mapping f: R2 → R2 defined by
f(α1 x1 + α2 x2) = α1 y1 + α2 y2
is the unique linear transformation from R2 into R2 satisfying
f(x1) = y1, f(x2) = y2.
Suppose [f(x1)]B = (a11, a12) and [f(x2)]B = (a21, a22). Then
[f(x)]B = [x]B [f]B, where [f]B = [ [f(x1)]B ] = [ a11 a12 ]
                                  [ [f(x2)]B ]   [ a21 a22 ].
This [f]B is called the matrix representation of f related to the
basis B.
(e) Let f: R2 → R2 be any linear transformation. Then (show that)
f(x) = xA,
where A = [f]N and N = {e1, e2} is the natural basis for R2.
(f) Let S: a1 x1 + a2 x2 = 0 be a subspace of R2. Show that there are
infinitely many linear transformations f: R2 → R2 such that
S = Ker(f), and
S = Im(f)
hold respectively.
(g) Let S1: a11 x1 + a12 x2 = 0 and S2: a21 x1 + a22 x2 = 0 be two
subspaces of R2. Construct a linear transformation f: R2 → R2 such
that
f(S1) = S2.
How many such f are possible?
5. Let B = {x1, x2} and B' = {y1, y2} be two bases for R^2 and
   f : R^2 → R^2 be a linear transformation.

   (a) Show that

           [f]_{B'} = A_{B'}^{B} [f]_B A_{B}^{B'}.

   (b) The matrix representation of f with respect to B and B' is
       defined as

           [f]_{B}^{B'} = [[f(x1)]_{B'}; [f(x2)]_{B'}] = [a11 a12; a21 a22],

       where [f(xi)]_{B'} = (ai1, ai2), i.e. f(xi) = ai1 y1 + ai2 y2 for
       i = 1, 2. Show that

           [f(x)]_{B'} = [x]_B [f]_{B}^{B'};

       in particular, [f]_{B}^{B} = [f]_B and [x]_{B'} = [x]_B A_{B}^{B'}.
       Show also that

           [f]_{B}^{B'} = [f]_B A_{B}^{B'}  and  [f]_{B}^{B'} = A_{B}^{B'} [f]_{B'}.

   (A computational sketch of these matrix representations follows this
   exercise set.)
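As a computational aside (not part of the original text), the identities of
Exs. 4 and 5 are easy to test numerically. Below is a minimal Python sketch,
assuming an arbitrary sample operator A = [f]_N and a sample basis B whose
vectors form the rows of a matrix P, so that [f]_B = P A P^{-1} under the
row-vector convention used in these exercises.

    import numpy as np

    # Sample data (hypothetical, for illustration): [f]_N and a basis B.
    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])        # [f]_N, acting on row vectors x -> x A
    P = np.array([[1.0, 1.0],
                  [1.0, -1.0]])       # rows of P are the basis vectors x1, x2

    P_inv = np.linalg.inv(P)
    f_B = P @ A @ P_inv               # [f]_B = P A P^{-1}

    # Check [f(x)]_B = [x]_B [f]_B on a sample point x.
    x = np.array([4.0, 5.0])
    x_B = x @ P_inv                   # coordinates [x]_B, since x = [x]_B P
    assert np.allclose((x @ A) @ P_inv, x_B @ f_B)
    print(f_B)

Here x = [x]_B P expresses a vector through the basis rows, which is why the
transition matrices of Ex. 5 multiply on the right.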

<C> Abstraction and generalization

Let B = {x1, ..., xn} be a basis for an n-dimensional vector space V
over F. Then, the mapping f : V → F^n defined by

    f(x) = [x]_B

is a linear isomorphism. Thus, V is isomorphic to F^n.
Try to extend all the problems in Ex. <B> to F^n. What is the counterpart
of Ex. <A> 2 in F^n? Please refer to Secs. B.4, B.5 and B.7, if necessary.

2.5 Straight Lines in a Plane


Let Σ(O; A1, A2) be a fixed coordinate system (i.e. vectorized space) in
R^2, with basis B = {a1, a2} where ai = OAi, i = 1, 2.
Denote by Li the straight line determined by O and Ai, i = 1, 2 (see
Fig. 2.21).

Fig. 2.21 (the coordinate axes L1 and L2 through O, A1 and O, A2, with a
point P of L1 having [P]_B = (x1, 0) and a point of L2 having (0, x2))

Take a point P in R^2. Then

    P lies on L1
    ⇔ OP lies in L(O; A1)
    ⇔ [P]_B = (x1, 0).

This implies that a point P in R^2 lies on the straight line L1 if and
only if the second component x2 of the coordinate [P]_B of P with respect
to the basis B is equal to zero, i.e. x2 = 0. Hence, call
x2 = 0 (2.5.1)
the (coordinate) equation of L1 with respect to B, and L1 the first coordinate
axis. For exactly the same reason,
x1 = 0 (2.5.2)
is called the (coordinate) equation of the second coordinate axis L2 with
respect to B.
These two coordinate axes intersect at the origin O and separate the
whole plane Σ(O; A1, A2) into four parts, called quadrants, according to
the signs of the components x1 and x2 of the coordinate [P]_B = (x1, x2),
P in R^2, as follows.
First quadrant: x1 > 0, x2 > 0.
Second quadrant: x1 < 0, x2 > 0.
Third quadrant: x1 < 0, x2 < 0.
Fourth quadrant: x1 > 0, x2 < 0. (2.5.3)
See Fig. 2.22.

Fig. 2.22 (the four quadrants (+, +), (−, +), (−, −), (+, −) and the four
half-lines (+, 0), (0, +), (−, 0), (0, −) determined by a1 and a2)

Note that the two coordinate axes separate each other at O into four half-
lines, symbolically denoted by (+, 0), (0, +), (−, 0) and (0, −).

Let X be a point lying on the straight line L, considered as a subset of
Σ(O; A1, A2), and generated by two different points A and B in R^2.
Denote by

    a = OA (viewed as a point on the line L),
    b = AB (viewed as a direction vector of the line L), and
    x = OX (viewed as a point on L).

See Fig. 2.23. Then, the position vector x − a = AX of the point x
relative to the point a must be linearly dependent on the direction b.
Hence, there exists a scalar t in R such that

    x − a = tb
    ⇒ x = a + tb,  t in R                                       (2.5.4)

called the parametric equation (in vector form) of the line L, passing
through the point a with direction vector b, in the coordinate system
Σ(O; A1, A2).

Fig. 2.23 (the line L through A and B, with a = OA, b = AB and x = OX)

Suppose

    [A]_B = [a]_B = (a1, a2),
    [b]_B = (b1, b2),
    [X]_B = [x]_B = (x1, x2).

In terms of these coordinates, (2.5.4) can be rewritten as

    [X]_B = [a]_B + t[b]_B                                      (2.5.5)
    ⇒ (x1, x2) = (a1, a2) + t(b1, b2) = (a1 + tb1, a2 + tb2)
    ⇒ x1 = a1 + tb1,  x2 = a2 + tb2,  t in R                    (2.5.6)

also called the parametric equation of L. Furthermore, eliminating the
parameter t from (2.5.6), one gets

    (x1 − a1)/b1 = (x2 − a2)/b2,  or
    αx1 + βx2 + γ = 0                                           (2.5.7)

called the coordinate equation of the line L in the coordinate system
Σ(O; A1, A2), with the coordinate equations (2.5.1) and (2.5.2) as special
cases.

Example (continued from the Example in Sec. 2.4) Find the equation of the
line determined by the points A = (−3, 0) and B = (3, 3) in R^2,
respectively, in Σ(O; A1, A2) and Σ(O'; B1, B2).

Solution In the coordinate system Σ(O; A1, A2), let

    a = OA = (−3, 0) − (1, 0) = (−4, 0)  ⇒ [a]_B = (−2, 4),
    b = AB = (3, 3) − (−3, 0) = (6, 3)   ⇒ [b]_B = (9/2, −6).

For an arbitrary point X of L, let [X]_B = (x1, x2); then (2.5.5) shows
that

    (x1, x2) = (−2, 4) + t(9/2, −6),  t in R
    ⇒ x1 = −2 + (9/2)t,  x2 = 4 − 6t,  or  4x1 + 3x2 − 4 = 0

are the required parametric and coordinate equations of L in Σ(O; A1, A2).
In the coordinate system Σ(O'; B1, B2), let

    a = O'A = (−3, 0) − (−1, −1) = (−2, 1)  ⇒ [a]_{B'} = (−11, 3),
    b = AB = (6, 3)                          ⇒ [b]_{B'} = (15, −3).

For a point X of L, let [X]_{B'} = (y1, y2). Then

    (y1, y2) = (−11, 3) + t(15, −3),  t in R
    ⇒ y1 = −11 + 15t,  y2 = 3 − 3t,  or  y1 + 5y2 − 4 = 0

are the required equations in Σ(O'; B1, B2).
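The elimination of the parameter t in (2.5.7) can be automated; the
following Python sketch (an illustration only, not part of the text) turns
the data [a]_B, [b]_B of the example above into a coordinate equation
αx1 + βx2 + γ = 0.

    def coordinate_equation(a, b):
        """Line x = a + t*b: eliminate t by taking the normal (b2, -b1)."""
        alpha, beta = b[1], -b[0]
        gamma = -(alpha * a[0] + beta * a[1])
        return alpha, beta, gamma

    # In Sigma(O; A1, A2): [a]_B = (-2, 4), [b]_B = (9/2, -6).
    print(coordinate_equation((-2.0, 4.0), (4.5, -6.0)))
    # (-6.0, -4.5, 6.0), a scalar multiple of 4*x1 + 3*x2 - 4 = 0

    # In Sigma(O'; B1, B2): [a]_B' = (-11, 3), [b]_B' = (15, -3).
    print(coordinate_equation((-11.0, 3.0), (15.0, -3.0)))
    # (-3.0, -15.0, 12.0), a scalar multiple of y1 + 5*y2 - 4 = 0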

Remark Change formulas of the equations of the same line in different
coordinate systems.
Adopt the notations and results in (2.4.2). From (2.5.5), the equation of
the line L in Σ(O; A1, A2) is

    [X]_B = [a]_B + t[b]_B,  X on L.

Via the change of coordinates formula
[X]_{B'} = [O]_{B'} + [X]_B A_{B}^{B'}, one obtains

    [X]_{B'} = [O]_{B'} + {[a]_B + t[b]_B} A_{B}^{B'},          (2.5.8)

which is the equation of the same line L in another coordinate system
Σ(O'; B1, B2).
For example, we use the above example and compute as follows. As we
already knew that

    [O]_{B'} = (5, −1)  and  A_{B}^{B'} = [−6 2; −7 2],

using [X]_B = (x1, x2) and [X]_{B'} = (y1, y2), we have

    (y1, y2) = (5, −1) + ((−2, 4) + t(9/2, −6))[−6 2; −7 2]
             = (5, −1) + (−16 + 15t, 4 − 3t)
             = (−11 + 15t, 3 − 3t).

Hence y1 = −11 + 15t, y2 = 3 − 3t, as shown in the example.

Finally, we discuss

The relative positions of two lines in a plane
Let

    L1: x = a1 + t b1 (passing the point a1 with direction b1),
    L2: x = a2 + t b2 (passing the point a2 with direction b2)

be two given lines in R^2. Then L1 and L2 may have the following three
relative positions:

1. coincident (L1 = L2) ⇔ the vectors a2 − a1, b1 and b2 are pairwise
   linearly dependent,
2. parallel (L1 // L2) ⇔ b1 and b2 are linearly dependent, but each is
   linearly independent of a2 − a1, and
3. intersecting (in a unique single point) ⇔ b1 and b2 are linearly
   independent.                                                 (2.5.9)

See Fig. 2.24.

Fig. 2.24 (the three relative positions: L1 = L2, L1 // L2, and two
intersecting lines)

 
Proof Since b1 and b2 are nonzero vectors, only the three cases on the
right sides of 1, 2 and 3 need to be considered. Only sufficiency is
proved; the necessity is left to the readers.

Case 1 Let b2 = α b1 and a2 − a1 = β b1. For a point x on L1,

    x = a1 + t b1 = a2 − β b1 + t b1 = a2 + ((t − β)/α) b2,

which means x lies on L2. Conversely, if x lies on L2, then

    x = a2 + t b2 = a1 + β b1 + tα b1 = a1 + (β + tα) b1

indicates that x lies on L1. Therefore, L1 = L2 holds.

Case 2 Suppose b2 = α b1, but a2 − a1 ≠ β b1 for every scalar β in R. If
there were a point x common to both L1 and L2, then two scalars t1 and t2
could be found so that

    a1 + t1 b1 = a2 + t2 b2 = a2 + t2 α b1
    ⇒ a2 − a1 = (t1 − t2 α) b1,

contradicting the hypotheses. Hence, having no point of intersection, L1
and L2 are parallel to each other.

Case 3 Since the vector a2 − a1 is coplanar with the linearly independent
vectors b1 and b2, there exist unique scalars t1 and t2 such that

    a2 − a1 = t1 b1 + t2 b2
    ⇒ a1 + t1 b1 = a2 + (−t2) b2,

which means that L1 and L2 have exactly one point of intersection.
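The three cases of (2.5.9) can be decided mechanically from two 2 × 2
determinants; the following Python sketch does so (the sample lines and the
tolerance are illustrative choices, not taken from the text).

    import numpy as np

    def relative_position(a1, b1, a2, b2, tol=1e-12):
        """Classify the lines Li: x = ai + t*bi following (2.5.9)."""
        if abs(np.linalg.det(np.array([b1, b2], dtype=float))) > tol:
            return "intersecting"
        d = np.subtract(a2, a1)
        if abs(np.linalg.det(np.array([b1, d], dtype=float))) > tol:
            return "parallel"
        return "coincident"

    print(relative_position((0, 0), (1, 1), (1, 0), (2, 2)))  # parallel
    print(relative_position((0, 0), (1, 1), (2, 2), (3, 3)))  # coincident
    print(relative_position((0, 0), (1, 0), (0, 1), (0, 1)))  # intersecting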



According to (2.5.4), the line determined by two distinct points a1 and
a2 in the plane R^2 has the parametric equation

    x = a1 + t(a2 − a1) = (1 − t) a1 + t a2,  t in R.           (2.5.10)

Just like (1.4.2) and (1.4.3), the directed segment a1 a2 with initial
point a1 and terminal point a2 is the set of points

    x = (1 − t) a1 + t a2,  0 ≤ t ≤ 1.                          (2.5.11)

If 0 < t < 1, the point x is called an interior point of a1 a2 with end
points a1 and a2; if t < 0 or t > 1, x is called an exterior point. See
Fig. 2.25 (compare with Fig. 1.11). (1/2)(a1 + a2) is called the middle
point of a1 a2.

Fig. 2.25 (the points of the line through a1 and a2, marked by the ranges
t < 0, t = 0, 0 < t < 1, t = 1 and t > 1; (1/2)(a1 + a2) is the middle
point)

By a triangle

    Δ a1 a2 a3                                                  (2.5.12)

with three noncollinear points a1, a2 and a3 as vertices, we mean the
plane figure formed by the three consecutive segments a1 a2, a2 a3 and
a3 a1, which are called sides. See Fig. 2.26.

Fig. 2.26 (the triangle with vertices a1, a2, a3)

A line joining a vertex to the midpoint of the opposite side is called a
median of the triangle. Note that

    [a3 − (1/2)(a1 + a2)] + [a1 − (1/2)(a2 + a3)]
        + [a2 − (1/2)(a3 + a1)] = 0

and the three medians of Δ a1 a2 a3 meet at its centroid (refer to
Ex. <B> 7 of Sec. 3.5).
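The median identity above is easy to confirm numerically; a short Python
check with arbitrarily chosen vertices (sample data only):

    import numpy as np

    a1, a2, a3 = map(np.array, [(-1.0, -2.0), (-2.0, 1.0), (1.0, 3.0)])

    # The three median vectors sum to the zero vector.
    m = (a3 - (a1 + a2) / 2) + (a1 - (a2 + a3) / 2) + (a2 - (a3 + a1) / 2)
    print(m)                      # [0. 0.]

    # The centroid, where the three medians meet.
    print((a1 + a2 + a3) / 3)     # [-0.6667  0.6667]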

For a somewhat different treatment of line segments and triangles, please
refer to Sec. 2.6.

Exercises
<A>

1. Use the notations and results from Ex. <A> 4 in Sec. 2.4. Denote by L
   the line determined by the points A = (1, 3) and B = (−6, 1) in R^2.
   (a) Find the parametric and coordinate equations of L in Σ(O; A1, A2)
       and Σ(O'; B1, B2), respectively.
   (b) Check the answers in (a) by using (2.5.8).
2. It is known that the equation of the line L in the rectangular
   coordinate system Σ(o; e1, e2) is

       −3x1 + x2 + 4 = 0.

   Find the equation of L in the coordinate system Σ(O; A1, A2), where
   O = (1, 2), A1 = (−2, −3) and A2 = (−4, 5). Conversely, if the line L
   has equation y1 + 6y2 − 3 = 0 in Σ(O; A1, A2), what is it in
   Σ(o; e1, e2)?

<B>

1. Give scalars ai, bi, i = 1, 2 and ci, di, i = 1, 2 with the requirement
   that b1 and b2, and d1 and d2, are not equal to zero simultaneously,
   and

       x1 = a1 + tb1,  x2 = a2 + tb2;   y1 = c1 + td1,  y2 = c2 + td2,
       t in R.

   Construct, in R^2, two coordinate systems Σ(O; A1, A2) and
   Σ(O'; B1, B2) on which the same line L has the equations described
   above, respectively.
2. Prove that the relative positions of two straight lines are
   independent of the choices of coordinate systems in R^2.
3. Let

       Li: x = ai + t bi,  t in R,  i = 1, 2, 3

   be three given lines in R^2. Describe all possible relative positions
   among them and characterize each case.
4. Show that the whole plane R^2 cannot be represented as a countable
   union of distinct straight lines, all passing through a fixed point.

5. The image x0 + S of a subspace S of R^2 under the translation
   x → x0 + x is called an affine subspace of R^2 (refer to Fig. B.2).
   The dimension of x0 + S is defined to be the dimension of S, i.e.
   dim(x0 + S) = dim S. Then, x0 + S = y0 + S if and only if
   x0 − y0 lies in S; in particular, x0 + S = S if and only if x0 lies
   in S. Show that:
   (a) Each point in R^2 is a zero-dimensional affine subspace of R^2.
   (b) Each straight line in R^2 is a one-dimensional affine subspace.
   (c) R^2 itself is the only two-dimensional affine subspace.
6. The composite T(x) = x0 + f(x) of a linear isomorphism f : R^2 → R^2
   followed by the translation x → x0 + x is called an affine
   transformation or mapping. Therefore, related to the natural basis
   N = {e1, e2} of R^2, T can be represented as

       T(x) = x0 + xA,  or  y1 = α1 + a11 x1 + a21 x2,
                            y2 = α2 + a12 x1 + a22 x2,

   where A = [aij] is an invertible 2 × 2 matrix.
   (a) For any two line segments a1 a2 and b1 b2, there exist infinitely
       many affine transformations T such that

           T(ai) = bi,  i = 1, 2,  and  T(a1 a2) = b1 b2.

   (b) For any two given Δ a1 a2 a3 and Δ b1 b2 b3, there exists a unique
       affine transformation T such that

           T(ai) = bi,  i = 1, 2, 3,  and
           T(Δ a1 a2 a3) = Δ b1 b2 b3.
In the following Exs. 7 and 8, readers should have basic knowledge about
the Euclidean plane (refer to Chap. 4, if necessary).

7. Let a1 = (−1, −2), a2 = (−2, 1) and a3 = (1, 3).
   (a) Compute the three interior angles, the side lengths and the area
       of Δ a1 a2 a3.
   (b) Give the affine transformation

           T(x) = (2, 3) + x[2 −1; −6 −5].

       Let bi = T(ai), i = 1, 2, 3. Graph Δ b1 b2 b3 and Δ a1 a2 a3.
       Then, compute the three interior angles, the side lengths and the
       area of Δ b1 b2 b3. Compare these quantities with (a). What can
       you find?
   (c) Find the fourth point a4 in R^2 so that a1, a2, a3, a4 (in this
       order) constitute the vertices of a parallelogram, denoted
       a1 a2 a3 a4. Compute the area of the parallelogram a1 a2 a3 a4.
       Do the same questions as in (b). In particular, do the image
       points b1, b2, b3, b4 form a parallelogram? What is its area?
8. In R^2 (of course, with a rectangular coordinate system Σ(o; e1, e2)),
   the length of a vector x = (x1, x2) is denoted by

       |x| = (x1^2 + x2^2)^{1/2}.

   The unit circle in R^2 is

       |x|^2 = 1  or  x1^2 + x2^2 = 1.

   See Fig. 2.27.
   (a) Find the equation of the unit circle in the coordinate system
       Σ(O; A1, A2), where O = (0, 0), A1 = (1, 2), A2 = (−3, 1).
   (b) If the equation obtained in (a) is observed in the coordinate
       system Σ(o; e1, e2), then what curve does it represent? Try to
       graph this curve and find the area enclosed by it. What is the
       ratio of this area with respect to that of the unit disk?
   (A computational sketch after these exercises illustrates how an
   affine transformation scales areas.)

Fig. 2.27 (the unit circle x1^2 + x2^2 = 1 through the point (1, 0))
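The following Python sketch, offered as an illustration of Exs. 6 to 8 (it
borrows the sample data of Ex. 7(b)), checks that an affine transformation
T(x) = x0 + xA multiplies every signed triangle area by det A; this is also
the key to the area ratio asked for in Ex. 8(b).

    import numpy as np

    x0 = np.array([2.0, 3.0])
    A = np.array([[ 2.0, -1.0],
                  [-6.0, -5.0]])          # sample data of Ex. 7(b)

    def T(x):
        return x0 + np.asarray(x, dtype=float) @ A

    def signed_area(p, q, r):
        # half the determinant of the edge vectors q - p and r - p
        return 0.5 * np.linalg.det(np.array([np.subtract(q, p),
                                             np.subtract(r, p)]))

    a1, a2, a3 = (-1.0, -2.0), (-2.0, 1.0), (1.0, 3.0)
    ratio = signed_area(T(a1), T(a2), T(a3)) / signed_area(a1, a2, a3)
    print(ratio, np.linalg.det(A))        # both equal -16.0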

<C> Abstraction and generalization

Let I2 = {0, 1} be the residue classes of modulus 2 (see Sec. A.3). I2 is
a finite field with two elements. Then,
I2^2 = {(0, 0), (0, 1), (1, 0), (1, 1)} is a two-dimensional vector space
over I2 having subspaces {(0, 0)}, {(0, 0), (1, 0)}, {(0, 0), (0, 1)} and
{(0, 0), (1, 1)}. Moreover,

    I2^2 = {(0, 0)} ∪ {(0, 0), (1, 0)} ∪ {(0, 0), (0, 1)}
           ∪ {(0, 0), (1, 1)}.

See Fig. 2.28. Since

    {(0, 1), (1, 1)} = (0, 1) + {(0, 0), (1, 0)},

{(0, 1), (1, 1)} is an affine straight line not passing through (0, 0).
The others are {(1, 0), (1, 1)} and {(1, 0), (0, 1)}.

Fig. 2.28 (the four points (0, 0), (1, 0), (0, 1), (1, 1) of I2^2)

1. Let I3 = {0, 1, 2} and construct the vector space I3^3 over I3. See
   Fig. 2.29 and try to spot the vectors (2, 1, 1), (1, 2, 1), (1, 1, 2).

Fig. 2.29 (the 27 points (i, j, k), i, j, k in {0, 1, 2}, of I3^3; dotted
lines do not contain points except the marked ones.)

   (a) There are 27 vectors in I3^3. List them all.
   (b) List all 13 one-dimensional subspaces. How many two-dimensional
       subspaces are there? List 10 of them. How many vectors are there
       in each subspace?
   (c) How many affine straight lines not passing through (0, 0, 0) are
       there?
   (d) How many different ordered bases for I3^3 are there?
   (e) Is

           [1 0 0; 0 1 0; 0 0 1]
               = [1 2 1; 2 0 1; 1 1 1][1 1 1; 1 0 2; 1 2 1]

       true? Anything to do with changes of coordinates (or bases)?
       (See also the sketch after these problems.)
   (f) Let Si, 1 ≤ i ≤ 13, denote the 13 one-dimensional subspaces of
       I3^3. Then

           I3^3 = S1 ∪ S2 ∪ ... ∪ S13.

2. Suppose F is a field of characteristic 0 (see Sec. A.3) and V is a
   vector space over F.
   (a) (extension of Ex. <B> 4) V cannot be covered by any finite number
       of proper subspaces of V. That is, if S1, ..., Sk are any finite
       number of proper subspaces of V, then there exists x in V such
       that x lies in none of S1, ..., Sk.
   (b) Let S1, ..., Sk be as in (a) and suppose dim V = n. Then, there
       exists a basis {x1, ..., xn} for V such that xj does not lie in
       Sj for 1 ≤ j ≤ k.
   (c) Suppose S0, S1, ..., Sk are subspaces of V such that

           S0 ⊆ S1 ∪ ... ∪ Sk.

       Then there exists some j, 1 ≤ j ≤ k, such that S0 ⊆ Sj holds.
3. (extension of Ex. <B> 4) R^n or C^n (n ≥ 2) cannot be expressed as a
   countable union of its proper subspaces.

Try to generalize as many problems in Ex. <B> as possible to an abstract
space V over F, while Exs. <B> 7 and 8 should be extended to the
Euclidean space R^n.
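Small finite-field computations like those of problem 1 can be automated;
the following Python sketch (an illustration, doing the mod-3 arithmetic
with %) verifies the matrix identity of 1(e) and counts the
one-dimensional subspaces of 1(b).

    import numpy as np
    from itertools import product

    # Problem 1(e): the product over I3 (entries reduced mod 3).
    P = np.array([[1, 2, 1], [2, 0, 1], [1, 1, 1]])
    Q = np.array([[1, 1, 1], [1, 0, 2], [1, 2, 1]])
    print((P @ Q) % 3)            # the 3 x 3 identity matrix

    # Problem 1(b): each one-dimensional subspace of I3^3 is {0, v, 2v}.
    lines = {frozenset(tuple((k * np.array(v)) % 3) for k in range(3))
             for v in product(range(3), repeat=3) if any(v)}
    print(len(lines))             # 13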

2.6 Affine and Barycentric Coordinates


Recall the convention stated right before Remark 2.3 in Sec. 2.3: when
R^2 is considered as an affine plane, its element x = (x1, x2) is treated
as a point, while if considered as a vector space, x = (x1, x2) represents
a position vector from o to the point x itself. Sometimes, points in R^2
are also denoted by capital letters such as A, X, etc.
Suppose a0, a1 and a2 are non-collinear points in R^2. Then, the vectors
v1 = a1 − a0 and v2 = a2 − a0 are linearly independent, and we call the
points a0, a1, a2 affinely independent (refer to (2.2.4)) and the ordered
set

    B = {a0, a1, a2}  or  {v1, v2}

an affine basis with a0 as the base point. See Fig. 2.30. If so, the
vectorized space Σ(a0; a1, a2) is a geometric model for R^2.

Fig. 2.30 (the affine basis: the point x and a0, a1, a2 with
v1 = a1 − a0 and v2 = a2 − a0)

Note that the affine basis N = {o, e1, e2} gives the standard basis
{e1, e2} for R^2 when considered as a vector space.

Here in this section, we are going to introduce affine and barycentric
coordinates for points in the affine plane R^2 and changes of their
coordinates. Conceptually, these are somewhat different versions of the
results stated in (2.2.2)–(2.2.4), so this section could be skipped if one
wants.
Suppose Σ(a0; a1, a2) is a vectorized space of R^2 with B = {a0, a1, a2}
as an affine basis.
For any given x in R^2, there exist unique constants x1 and x2 such that

    x − a0 = x1(a1 − a0) + x2(a2 − a0)
    ⇒ x = a0 + x1(a1 − a0) + x2(a2 − a0)
        = (1 − x1 − x2) a0 + x1 a1 + x2 a2
        = λ0 a0 + λ1 a1 + λ2 a2,  λ0 + λ1 + λ2 = 1,             (2.6.1)

where λ0 = 1 − x1 − x2, λ1 = x1, λ2 = x2. The ordered triple

    (x)_B = (λ0, λ1, λ2),  λ0 + λ1 + λ2 = 1                     (2.6.2)

is called the (normalized) barycentric coordinate of the point x with
respect to the affine basis B. Once λ1 and λ2 are known,
λ0 = 1 − λ1 − λ2 is then uniquely determined. Hence, occasionally, we
just call

    [x]_B = [x − a0]_B = (x1, x2) = (λ1, λ2),                   (2.6.3)

the coordinate vector of the vector x − a0 with respect to the basis
{a1 − a0, a2 − a0} as introduced in (2.3.1), the affine coordinate of x,
for simplicity. In particular,

    (a0)_B = (1, 0, 0) with [a0]_B = (0, 0),
    (a1)_B = (0, 1, 0) with [a1]_B = (1, 0), and
    (a2)_B = (0, 0, 1) with [a2]_B = (0, 1).

The coordinate axis passing through a1 and a2 has equation λ0 = 0, while
the other two are λ1 = 0 and λ2 = 0, respectively. These three axes
divide the plane R^2 into 2^3 − 1 = 7 regions according to the positive
or negative signs of λ0, λ1 and λ2 in (λ0, λ1, λ2). See Fig. 2.31.

Fig. 2.31 (the seven sign regions (+, +, +), (−, +, +), (+, −, +),
(+, +, −), (−, −, +), (−, +, −), (+, −, −) cut out by the axes λ0 = 0,
λ1 = 0 and λ2 = 0)

Note that (1/3, 1/3, 1/3) is the barycenter of the base triangle
Δ a0 a1 a2.
Let Σ(b0; b1, b2) be another vectorized space of R^2 with
B' = {b0, b1, b2} the corresponding affine basis. For any x in R^2,
denote

    (x)_{B'} = (µ0, µ1, µ2),  µ0 + µ1 + µ2 = 1,  or
    [x − b0]_{B'} = (y1, y2) = (µ1, µ2).

Then, by (2.4.2), the change of coordinates from Σ(a0; a1, a2) to
Σ(b0; b1, b2) is

    [x − b0]_{B'} = [a0 − b0]_{B'} + [x − a0]_B A_{B}^{B'},     (2.6.4)

where

    A_{B}^{B'} = [[a1 − a0]_{B'}; [a2 − a0]_{B'}] = [β11 β12; β21 β22]

is the transition matrix, which is invertible. Suppose

    [a0 − b0]_{B'} = (p1, p2),

i.e. a0 − b0 = p1(b1 − b0) + p2(b2 − b0). Then, (2.6.4) can be rewritten
as

    (y1 y2) = (p1 p2) + (x1 x2)[β11 β12; β21 β22]               (2.6.5)

or

    (y1 y2 1) = (x1 x2 1)[β11 β12 0; β21 β22 0; p1 p2 1].       (2.6.5')

In particular, if a0 = b0 holds, then (p1, p2) = (0, 0) and (2.6.5)
reduces to

    (y1 y2) = (x1 x2)[β11 β12; β21 β22]                         (2.6.7)

or (2.6.4) reduces to [x − a0]_{B'} = [x − a0]_B A_{B}^{B'}, which is a
linear transformation. In case bi − b0 = ai − a0, i = 1, 2,
A_{B}^{B'} = I2 is the 2 × 2 identity matrix and (2.6.5) reduces to

    (y1, y2) = (p1, p2) + (x1, x2),                             (2.6.8)

or (2.6.4) reduces to [x − b0]_{B'} = [a0 − b0]_{B'} + [x − a0]_B, which
represents a translation. Therefore, a change of coordinates is a
composite mapping of a linear transformation followed by a translation.
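As a computational aside (not in the original text), the normalized
barycentric coordinate of (2.6.1) amounts to solving a 2 × 2 linear
system; a minimal Python sketch with an arbitrary sample affine basis:

    import numpy as np

    def barycentric(x, a0, a1, a2):
        """Normalized barycentric coordinate (l0, l1, l2) of x with
        respect to the affine basis {a0, a1, a2}, following (2.6.1)."""
        M = np.array([np.subtract(a1, a0), np.subtract(a2, a0)],
                     dtype=float)
        # Row convention: (x1, x2) M = x - a0, i.e. solve M^T on columns.
        x1, x2 = np.linalg.solve(M.T, np.subtract(x, a0))
        return 1.0 - x1 - x2, x1, x2

    a0, a1, a2 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)   # sample basis
    print(barycentric((1/3, 1/3), a0, a1, a2))  # (1/3, 1/3, 1/3), barycenter
    print(barycentric((1.0, 0.0), a0, a1, a2))  # (0, 1, 0), the vertex a1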

Exercises
<A>

1. Suppose ai = (ai1, ai2) in R^2, i = 1, 2, 3.
   (a) Prove that {a1, a2, a3} is an affine basis for R^2 if and only if
       the matrix

           [a11 a12 1; a21 a22 1; a31 a32 1]

       is invertible.

   (b) Prove that {a1, a2, a3} is an affine basis if and only if, for any
       permutation σ: {1, 2, 3} → {1, 2, 3}, {aσ(1), aσ(2), aσ(3)} is an
       affine basis.
2. Let ai = (ai1, ai2) in R^2, i = 0, 1, 2 and suppose B = {a0, a1, a2}
   forms an affine basis. For any point x = (x1, x2) in R^2, let
   (x)_B = (λ0, λ1, λ2) be as in (2.6.2) with λ0 = 1 − λ1 − λ2. Show
   that

       (λ1, λ2) = (x − a0)[a1 − a0; a2 − a0]^{-1}

   with

       λ1 = det[a01 a02 1; x1 x2 1; a21 a22 1] / det[a1 − a0; a2 − a0],
       λ2 = det[a01 a02 1; a11 a12 1; x1 x2 1] / det[a1 − a0; a2 − a0],

   and, conversely,

       x − a0 = (λ1 λ2)[a1 − a0; a2 − a0]

   with the expected results (see (2.6.1))

       xi = λ0 a0i + λ1 a1i + λ2 a2i,  i = 1, 2.



3. Let ai = (ai1, ai2), i = 0, 1, 2 and bj = (bj1, bj2), j = 1, 2 be
   points in R^2 with the natural affine basis {0, e1, e2}. Let
   x = (x1, x2) in R^2. Suppose b1 ≠ b2. Denote by L the straight line
   determined by b1 and b2. See Fig. 2.32.
   (a) Show that the equation of L in {0, e1, e2} is

           (x1 − b11)/(b21 − b11) = (x2 − b12)/(b22 − b12),  or
           det[x1 x2 1; b11 b12 1; b21 b22 1] = 0.

Fig. 2.32 (the line L through b1 and b2, seen from the two affine bases
{0, e1, e2} and {a0, a1, a2})

   (b) Suppose B = {a0, a1, a2} is an affine basis for R^2. Let
       [x − a0]_B = (λ1, λ2), [b1 − a0]_B = (α11, α12) and
       [b2 − a0]_B = (α21, α22), according to (2.6.3). Show that L has
       equation

           (λ1 − α11)/(α21 − α11) = (λ2 − α12)/(α22 − α12),  or
           det[λ1 λ2 1; α11 α12 1; α21 α22 1] = 0.

   (c) Let (x)_B = (λ0, λ1, λ2), (b1)_B = (α10, α11, α12) and
       (b2)_B = (α20, α21, α22) be as in (2.6.2). Show that L has
       equation

           (λ0 − α10)/(α20 − α10) = (λ1 − α11)/(α21 − α11)
                                  = (λ2 − α12)/(α22 − α12),  or
           det[λ0 λ1 λ2; α10 α11 α12; α20 α21 α22] = 0.

   (d) Try to deduce the equations in (a)–(c) from one another.

<B>

1. Suppose Δ a0 a1 a2 is a base triangle. Let p and q be points inside
   and outside of Δ a0 a1 a2, respectively. Prove that the line segment
   connecting p and q will intersect Δ a0 a1 a2 at one point.
2. Oriented triangles and signed areas as coordinates
   Suppose a1, a2 and a3 are non-collinear points in R^2. The ordered
   triple {a1, a2, a3} is said to determine an oriented triangle, denoted
   by ∆̄ a1 a2 a3, which is also used to represent its signed area,
   considered positive if the ordered triple is in the anticlockwise
   sense and negative otherwise. Take any point x in R^2 and hence
   produce the three oriented triangles ∆̄ x a2 a3, ∆̄ x a3 a1 and
   ∆̄ x a1 a2. See Fig. 2.33.

Fig. 2.33 (a point x inside and outside the triangle a1 a2 a3, with the
three oriented triangles it determines)

   Let S = ∆̄ a1 a2 a3, S1 = ∆̄ x a2 a3, S2 = ∆̄ x a3 a1 and
   S3 = ∆̄ x a1 a2. Then S = S1 + S2 + S3, and the triple (S1, S2, S3) is
   called the area coordinate of the point x with respect to the
   coordinate or base triangle Δ a1 a2 a3, with a1, a2 and a3 as base
   points and S1, S2, S3 as its coordinate components.
   (a) Let S1 : S2 : S3 = λ1 : λ2 : λ3 and call (λ1 : λ2 : λ3) a
       homogeneous area or barycentric coordinate. In case
       λ1 + λ2 + λ3 = 1, such λ1, λ2 and λ3 are uniquely determined;
       denote (λ1 : λ2 : λ3) simply by (λ1, λ2, λ3) as usual and call
       the coordinate normalized. Given an area coordinate (S1, S2, S3)
       with S = S1 + S2 + S3, then (λS1 : λS2 : λ(S − S1 − S2)) is a
       barycentric coordinate for any scalar λ ≠ 0. Conversely, given a
       barycentric coordinate (λ1 : λ2 : λ3) with λ1 + λ2 + λ3 ≠ 0,

           (λ1 S/(λ1 + λ2 + λ3), λ2 S/(λ1 + λ2 + λ3),
            λ3 S/(λ1 + λ2 + λ3))

       is the corresponding area coordinate. In case λ1 + λ2 + λ3 = 0,
       (λ1 : λ2 : λ3) is said to represent an ideal or infinite point in
       the realm of projective geometry (see Ex. <B> of Sec. 2.8.5 and
       Sec. 3.8.4).
   (b) Owing to the fact that S = S1 + S2 + S3, the third quantity is
       uniquely determined once two of S1, S2 and S3 have been decided.
       Therefore, one may use (S1, S2) or (S1/S, S2/S) to represent the
       point x; this is called the affine coordinate of x with respect
       to the affine basis {a3, a1, a2} or {a1 − a3, a2 − a3} with a3 as
       a base point or vertex (see Fig. 2.34(a)). In case the line
       segments a3 a1 and a3 a2 have equal length, say one unit, and the
       angle ∠ a1 a3 a2 = 90°, then {a3, a1, a2} is called an orthogonal
       or rectangular or Cartesian coordinate system (see Fig. 2.34(b),
       compare with Fig. 2.17). For the generalization in R^n, see
       Sec. 5.9.5.

Fig. 2.34 (the affine basis {a3, a1, a2} in (a) and the rectangular case
in (b))
   (c) Suppose the vertices a1, a2 and a3 of the coordinate triangle
       Δ a1 a2 a3 have respective coordinates

           (x1, y1), (x2, y2) and (x3, y3)

       in a certain orthogonal or affine coordinate system. Let the point
       x = (x, y) have barycentric coordinate (λ1 : λ2 : λ3). Show that

           x = (λ1 x1 + λ2 x2 + λ3 x3)/(λ1 + λ2 + λ3),
           y = (λ1 y1 + λ2 y2 + λ3 y3)/(λ1 + λ2 + λ3).

       Then, compare these relations with those stated in (2.6.1) and
       Ex. <A> 2, both in content and in proof.
3. Suppose x1 and x2 have respective normalized barycentric coordinates
   (λ11, λ12, λ13) and (λ21, λ22, λ23). A point x on the line x1 x2
   divides the segment x1 x2 into the ratio x1 x : x x2 = k, which is
   positive if x is an interior point of x1 x2 and negative if exterior.
   Show that x has normalized barycentric coordinate

       ((λ11 + kλ21)/(1 + k), (λ12 + kλ22)/(1 + k),
        (λ13 + kλ23)/(1 + k))

   by the following two methods.
   (a) Use (2.5.10).
   (b) Let a and b be two points other than x1, x2 and x. Show that

           ∆̄ x a b = (1/(1 + k)) ∆̄ x1 a b + (k/(1 + k)) ∆̄ x2 a b.

       See Fig. 2.35.
Fig. 2.35 (the point x dividing x1 x2 in the ratio k, and the triangles
∆̄ x1 a b, ∆̄ x2 a b, ∆̄ x a b over the common base a b)

4. Fix a coordinate triangle Δ a1 a2 a3. Show that the equation of the
   line passing through the point a with (a1 : a2 : a3) and the point b
   with (b1 : b2 : b3) is

       det[λ1 λ2 λ3; a1 a2 a3; b1 b2 b3] = 0,  or
       |a2 a3; b2 b3| λ1 + |a3 a1; b3 b1| λ2 + |a1 a2; b1 b2| λ3 = 0,

   where the coefficients satisfy

       |a2 a3; b2 b3| : |a3 a1; b3 b1| : |a1 a2; b1 b2| = h1 : h2 : h3

   with h1, h2 and h3 the respective signed distances from the vertices
   a1, a2 and a3 to the line. Hence, designate hi and hj to have the same
   sign if both ai and aj lie on the same side of the line, and opposite
   signs if they lie on opposite sides (see Fig. 2.36). Use the following
   two methods.

Fig. 2.36 (the signed distances h1, h2, h3 from the vertices a1, a2, a3
to the line through a and b, which cuts the side a1 a2 at c)
   (a) Try to use Ex. 3 and note that, in Fig. 2.36,

           |a3 a1; b3 b1| / |a2 a3; b2 b3|
               = −∆̄ c a2 a3 / ∆̄ a1 c a3 = |a2 c| / |a1 c| = h2/h1,
           etc.

   (b) Treat the line as the y-axis in a rectangular coordinate system
       (see Fig. 2.37). Then, a point x on the line has its abscissa
       (via Ex. 2(c))

           x = (λ1 x1 + λ2 x2 + λ3 x3)/(λ1 + λ2 + λ3) = 0

       with h1 = x1, h2 = x2 and h3 = x3.

Fig. 2.37 (the line taken as the y-axis; the abscissas of a1, a2, a3 are
h1, h2, h3)

   Furthermore, deduce the fact that the equation of the line in an
   affine coordinate system is

       det[x1 x2 1; a1 a2 1; b1 b2 1] = 0,

   where x1 = λ1, x2 = λ2. For concrete examples, please see Ex. 6.


5. Fix a coordinate triangle Δ a1 a2 a3.
   (a) Show that the point of intersection of the two lines

           h11 λ1 + h12 λ2 + h13 λ3 = 0,
           h21 λ1 + h22 λ2 + h23 λ3 = 0,

       has the barycentric coordinate

           |h12 h13; h22 h23| : |h13 h11; h23 h21| : |h11 h12; h21 h22|.

   (b) Show that the line determined by the points (h11, h12, h13) and
       (h21, h22, h23) has the equation

           det[λ1 λ2 λ3; h11 h12 h13; h21 h22 h23] = 0,  or
           |h12 h13; h22 h23| λ1 + |h13 h11; h23 h21| λ2
               + |h11 h12; h21 h22| λ3 = 0.

   (c) Three points (hi1, hi2, hi3), i = 1, 2, 3, are collinear if and
       only if

           det[h11 h12 h13; h21 h22 h23; h31 h32 h33] = 0,

       while three lines hi1 λ1 + hi2 λ2 + hi3 λ3 = 0, i = 1, 2, 3, are
       concurrent if and only if the same determinant vanishes.
   (d) Two lines are parallel if and only if their equations in
       barycentric coordinates are

           h1 λ1 + h2 λ2 + h3 λ3 = 0,
           (h0 + h1) λ1 + (h0 + h2) λ2 + (h0 + h3) λ3 = 0,

       where h0 is a constant.
6. Give a coordinate triangle Δ a1 a2 a3 and use |a1 a2| = |a1 − a2| to
   denote the length of the side a1 a2, etc. See Fig. 2.38. (A numerical
   companion to Exs. 4–6 is sketched after this exercise set.)

Fig. 2.38 (the coordinate triangle a1 a2 a3 and a foot b1 of a
perpendicular)

   (a) Show that the three sides have the respective equations:

           a1 a2: λ3 = 0,   a2 a3: λ1 = 0,   a3 a1: λ2 = 0.

   (b) The three medians have the equations:

           median on a1 a2: λ1 − λ2 = 0,
           median on a2 a3: λ2 − λ3 = 0,
           median on a3 a1: λ3 − λ1 = 0.

   (c) The three altitudes have the equations:

           altitude on a1 a2:
               |a3 a1| cos ∠a1 · λ1 − |a2 a3| cos ∠a2 · λ2 = 0,
           altitude on a2 a3:
               |a1 a2| cos ∠a2 · λ2 − |a3 a1| cos ∠a3 · λ3 = 0,
           altitude on a3 a1:
               |a2 a3| cos ∠a3 · λ3 − |a1 a2| cos ∠a1 · λ1 = 0.

   (d) From the given point a with (α1 : α2 : α3), draw three
       perpendicular lines a b1, a b2 and a b3 to the three sides
       a2 a3, a3 a1 and a1 a2, respectively. Show that a b1 has the
       equation

           |a1 a2| cos ∠a2 · λ2 − |a3 a1| cos ∠a3 · λ3
               − ((|a1 a2| cos ∠a2 · α2 − |a3 a1| cos ∠a3 · α3)
                  / (α1 + α2 + α3)) (λ1 + λ2 + λ3) = 0.

       What are the equations for a b2 and a b3?
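For Exs. 4 to 6, the determinant formulas have a convenient computational
form: in homogeneous coordinates, the line through two points and the
intersection point of two lines are both given by cross products. A Python
illustration (the sample data are the medians of Ex. 6(b)):

    import numpy as np

    # Ex. 5(a): the intersection of two lines h_i1*l1 + h_i2*l2 + h_i3*l3 = 0
    # is the cross product of the coefficient triples (the three 2x2 cofactors).
    median_12 = np.array([1.0, -1.0, 0.0])   # median on a1 a2: l1 - l2 = 0
    median_23 = np.array([0.0, 1.0, -1.0])   # median on a2 a3: l2 - l3 = 0
    print(np.cross(median_12, median_23))    # [1. 1. 1.]: the centroid (1 : 1 : 1)

    # Ex. 5(b), dually: the line through two points is their cross product.
    print(np.cross([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
    # [0. 0. 1.]: l3 = 0, the side a1 a2 of Ex. 6(a)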

2.7 Linear Transformations (Operators)


As known already in Sec. 1.4, an affine transformation between two lines
is characterized as a one-to-one and onto mapping which preserves ratios
of signed lengths of two segments. In exactly the same manner, we define
affine transformations between planes.
Consider R^2 as an affine plane.
A one-to-one mapping T from R^2 onto R^2 is called an affine
transformation or mapping if T preserves ratios of signed lengths of two
segments along the same line or parallel lines. That is, for any points
x1, x2 and x3 and scalar α in R,

    x3 − x1 = α(x2 − x1)
    ⇒ T(x3) − T(x1) = α[T(x2) − T(x1)]                          (2.7.1)

always holds; in short, x1 x3 = α x1 x2 always implies
T(x1)T(x3) = α T(x1)T(x2). See Fig. 2.39.
Fig. 2.39 (T maps the collinear points x1, x2, x3 to T(x1), T(x2), T(x3),
carrying x2 − x1 and x3 − x1 to T(x2) − T(x1) and T(x3) − T(x1))

In particular, an affine transformation maps lines onto lines and
segments onto segments.
What follows immediately is to describe an affine transformation with the
linear structure of R^2. For this purpose, fix an arbitrary point x0 in
R^2 and define a mapping f : R^2 → R^2 by

    f(x − x0) = T(x) − T(x0),  x in R^2.

This f is one-to-one, onto, and f(0) = 0 holds. Also, for any scalar α
in R, we have by definition (2.7.1)

    f(α(x − x0)) = α(T(x) − T(x0)) = αf(x − x0),  x in R^2.     (*1)

Furthermore, take any two other points x1 and x2 and construct a
parallelogram as in Fig. 2.40. Let x3 be the middle point of the line
segment x1 x2, i.e. x3 = (1/2)(x1 + x2). Then x3 is also the middle point
of the line segment connecting x0 and x1 + x2 − x0. Now

    x2 − x1 = 2(x3 − x1)
        ⇒ T(x2) − T(x1) = 2(T(x3) − T(x1));
    (x1 + x2 − x0) − x0 = 2(x3 − x0)
        ⇒ T(x1 + x2 − x0) − T(x0) = 2(T(x3) − T(x0)).

Fig. 2.40 (the parallelogram with vertices x0, x1, x2 and x1 + x2 − x0;
x3 is the common middle point of its diagonals)

Fig. 2.41 (the images f(x1 − x0), f(x2 − x0) and their sum
f(x1 − x0) + f(x2 − x0))

Then

    f((x1 − x0) + (x2 − x0)) = f((x1 + x2 − x0) − x0)
        = T(x1 + x2 − x0) − T(x0) = 2(T(x3) − T(x0))
        = 2[(T(x3) − T(x1)) + (T(x1) − T(x0))]
        = T(x2) − T(x1) + 2(T(x1) − T(x0))
        = (T(x1) − T(x0)) + (T(x2) − T(x0))
        = f(x1 − x0) + f(x2 − x0).                              (*2)

Therefore, the original affine transformation can be described by

    T(x) = T(x0) + f(x − x0),  x in R^2,                        (2.7.2)

where f has the operational properties (*1) and (*2).
In this section, we focus our attention on mappings from R^2 into R^2
having properties (*1) and (*2), the so-called linear transformations,
and leave the discussion of affine transformations (2.7.1) or (2.7.2) to
Sec. 2.8. In short, an affine transformation (2.7.2) keeping 0 fixed,
i.e. T(0) = 0, is a linear isomorphism. We summarize the above result as

Linear isomorphism on the vector space R^2
Suppose f : R^2 → R^2 is a function (see Sec. A.2). Then the following
are equivalent.
(1) (geometric) f is one-to-one, onto and preserves ratios of signed
    lengths of line segments along the same line passing 0.
(2) (algebraic) f is one-to-one, onto and preserves the linear structure
    of R^2, i.e.
    1. f(αx) = αf(x) for α in R and x in R^2,
    2. f(x + y) = f(x) + f(y) for x, y in R^2.
Such an f is called a linear isomorphism or invertible linear operator
on R^2.                                                         (2.7.3)

We have encountered the concept of linear isomorphism on quite a few
occasions, for example: Φ in (1.2.1), (1.3.3) with O = O', and (1.4.1)
with a ≠ 0 and b = 0; Φ in (2.3.2), (2.4.2) with O = O', and (2.6.7). A
linear isomorphism f is invertible and its inverse function
f^{-1}: R^2 → R^2 is a linear isomorphism too (refer to Sec. A.2), and

    f ∘ f^{-1} = f^{-1} ∘ f = 1_{R^2} (the identity mapping on R^2).

The proof of this easy fact is left to the readers.
If the "one-to-one" and "onto" conditions are dropped in (2.7.3), the
resulting function f is called a linear transformation. Easy examples
were touched on in Exs. <B> 2–5 of Sec. 2.4. It seems timely now to give
a formal definition.

Linear transformation
Let each of V and W be either the vector space R or R^2. A function
f : V → W is called a linear transformation if
1. f preserves scalar multiplication of vectors, i.e.
       f(αx) = αf(x) for α in R and x in V,
2. f preserves addition of vectors, i.e.
       f(x + y) = f(x) + f(y) for x, y in V.
In case W = V, f is called a linear operator, and if W = R, a linear
functional. In particular, f(0) = 0.                            (2.7.4)

Hence, a linear transformation (from V to V) which is both one-to-one
and onto is a linear isomorphism. This definition remains suitable for
arbitrary vector spaces over the same field (see Secs. B.1 and B.7).

We call the readers' attention, once and for all, to the following
concepts and notations concerned:

    A vector (or linear) subspace (see Sec. B.1), simply called a
        subspace.
    Hom(V, W) or L(V, W): the vector space of linear transformations
        from V to W (see Sec. B.7).
    Hom(V, V) or L(V, V): the vector space of linear operators on V.
    Ker(f) or N(f): the kernel {x in V | f(x) = 0} of an f in
        Hom(V, W) (see Sec. B.7).
    Im(f) or R(f): the range {f(x) | x in V} of an f in Hom(V, W) (see
        Sec. B.7).

Note that Ker(f) is a subspace of V and Im(f) a subspace of W. If f is a
linear operator on V, a subspace U of V is called invariant under f, or
simply f-invariant, if

    f(U) ⊆ U,

i.e. for any x in U, f(x) lies in U. For example,

    {0}, V, Ker(f) and Im(f)

are trivial invariant subspaces of V for any linear operator f.
This section is divided into eight subsections.
Section 2.7.1 formulates what a linear operator looks like in the
Cartesian coordinate system N = {e1, e2}, and then Sec. 2.7.2 presents
some basic but important elementary operators with their eigenvalues and
eigenvectors.
Section 2.7.3 discusses various matrix representations of a linear oper-
ator related to different bases for R2 and the relations among them. The
rank of a linear operator or a matrix is an important topic here.
Some theoretical treatment, independent of particular choice of a basis
for R2 , about linear operators will be given in Sec. 2.7.4.
From Sec. 2.7.5 to Sec. 2.7.8, we will investigate various decompositions
of a linear operator or matrix.
Geometric mapping properties of elementary operators or matrices are
discussed in Sec. 2.7.5. Therefore, algebraically, a square matrix can be
expressed as a product of elementary matrices. And hence, geometrically,
its mapping behaviors can be tracked.
Section 2.7.6 deepens the important concepts of eigenvalues and eigen-
vectors introduced in Sec. 2.7.2. If a linear operator or matrix has two
distinct eigenvalues, then it is diagonalizable as a diagonal matrix as its
canonical form.

In case a linear operator or matrix has two coincident eigenvalues and


is not diagonalizable, Sec. 2.7.7 investigates its Jordan canonical form in a
suitable basis for R2 .
Finally, Sec. 2.7.8 discusses how to get the rational canonical form for
linear operators or matrices which do not have real eigenvalues.

2.7.1 Linear operators in the Cartesian coordinate system


Fix R^2 with the Cartesian coordinate system N = {e1, e2}; see
Fig. 2.17(b). Here, we will formally do the exercise Ex. <B> 4 of
Sec. 2.4.
Let f : R^2 → R^2 be a linear operator. For any x = (x1, x2) in R^2,
since

    x = x1 e1 + x2 e2  and  f(x) = x1 f(e1) + x2 f(e2),

f is completely determined by the vectors f(e1) and f(e2). Suppose

    f(ei) = (ai1, ai2) = ai1 e1 + ai2 e2,  i = 1, 2.

Then

    f(x) = x1(a11 e1 + a12 e2) + x2(a21 e1 + a22 e2)
         = (x1 a11 + x2 a21) e1 + (x1 a12 + x2 a22) e2
         = (x1 a11 + x2 a21, x1 a12 + x2 a22)
         = (x1 x2)[a11 a12; a21 a22] = x[f]_N,
           with [f]_N = [a11 a12; a21 a22],                     (2.7.5)

where the matrix [f]_N = [aij], of size 2 × 2, is called the matrix
representation of f with respect to the basis N.
Conversely, for a given real 2 × 2 matrix A = [aij], define the mapping
f : R^2 → R^2 by

    f(x) = xA,  x in R^2.                                       (2.7.6)

Here in xA, we consider the vector x = (x1, x2) as a 1 × 2 matrix
[x1 x2]. By (αx1 + x2)A = (αx1)A + x2 A = α(x1 A) + x2 A, it follows that
such an

f is linear and [f]_N = A. Henceforth, we do not make a distinction
between f and A in (2.7.6) and adopt the

Convention
For a given real 2 × 2 matrix A = [aij], when considered as a linear
operator A: R^2 → R^2, it is defined as

    x → xA  in N = {e1, e2}.                                    (2.7.7)

Therefore, a matrix A is an algebraic symbol as well as a linear operator
on R^2 if necessary. It is this point of view that makes matrices the
core of the theory of linear algebra on finite-dimensional vector spaces.
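For readers experimenting on a computer, note that (2.7.7) is a
row-vector convention; a minimal Python illustration with a sample matrix
(not taken from the text):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    x = np.array([5.0, 6.0])

    print(x @ A)   # [23. 34.]: the convention x -> x A used in this book
    print(A @ x)   # [17. 39.]: the column convention A x of many other texts
    # The two conventions are related by transposition: x A = A^T x.
    assert np.allclose(x @ A, A.T @ x)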

It is well known that {0} is the only zero-dimensional subspace of R^2
and R^2 is the only two-dimensional subspace, while the one-dimensional
subspaces are nothing but the straight lines ax1 + bx2 = 0 passing
(0, 0), i.e. the lines 〈〈x〉〉 generated by nonzero vectors x. See
Exs. <A> 3–5 of Sec. 2.3.
For a nonzero 2 × 2 matrix A = [aij], considered as a linear operator
on R^2,

    x = (x1, x2) lies in Ker(A), the kernel space of A
    ⇔ xA = 0
    ⇔ a11 x1 + a21 x2 = 0  and  a12 x1 + a22 x2 = 0
    ⇔ (see Sec. 2.5) x lies on both of the lines x = t(−a21, a11) and
      x = t(−a22, a12).

In case (a11, a21) = (0, 0), the equation a11 x1 + a21 x2 = 0 is
satisfied by all vectors x in R^2 and should be considered as the
equation for the whole space R^2. Suppose (−a21, a11) ≠ 0 and
(−a22, a12) ≠ 0. According to (2.5.9),

    The lines x = t(−a21, a11) and x = t(−a22, a12) coincide.
    ⇔ The direction vectors (−a21, a11) and (−a22, a12) are linearly
      dependent.
    ⇔ −a21/−a22 = a11/a12, or a11/a21 = a12/a22, or
      det A = det[a11 a12; a21 a22] = a11 a22 − a12 a21 = 0.

In this case, the kernel space Ker(A) is the one-dimensional subspace,
say a11 x1 + a21 x2 = 0. On the other hand,

    The lines x = t(−a21, a11) and x = t(−a22, a12) intersect only at 0.
    ⇔ −a21/−a22 ≠ a11/a12, or a11/a21 ≠ a12/a22, or
      det A = |a11 a12; a21 a22| = a11 a22 − a12 a21 ≠ 0.

In this case, Ker(A) = {0} holds.
What are the corresponding range spaces of A?
Suppose Ker(A) is a11 x1 + a21 x2 = 0. Then

    y = (y1, y2) lies in Im(A), the range space of A
    ⇔ y1 = a11 x1 + a21 x2,  y2 = a12 x1 + a22 x2
      for some x = (x1, x2) in R^2
    ⇔ (remember that a12/a11 = a22/a21 = λ)
      y = (a11 x1 + a21 x2)(1, λ) for x = (x1, x2) in R^2.

This means that the range space Im(A) is a straight line passing 0.
If Ker(A) = {0}, then

    y = (y1, y2) lies in Im(A)
    ⇔ y1 = a11 x1 + a21 x2,  y2 = a12 x1 + a22 x2
      for some x = (x1, x2) in R^2
    ⇔ (solve the simultaneous equations with x1 and x2 as unknowns)

      x1 = (a22 y1 − a21 y2)/det A = |y1 y2; a21 a22| / det A,
      x2 = (−a12 y1 + a11 y2)/det A = |a11 a12; y1 y2| / det A.

This is the Cramer rule (formulas). Therefore, the range space
Im(A) = R^2.
We summarize as

The kernel and the range of a linear operator
Let A = [aij] be a real 2 × 2 matrix, considered as a linear operator on
R^2 (see (2.7.7)).
(1) Ker(A) and Im(A) are subspaces of R^2 and the dimension theorem is

        dim Ker(A) + dim Im(A) = dim R^2 = 2,

    where dim Ker(A) is called the nullity of A and dim Im(A) the rank
    of A (see (2.7.44)), particularly denoted as r(A).
(2) If Ker(A) = R^2, then A = O, the zero matrix or zero linear
    operator, and Im(A) = {0}. Note the rank r(O) = 0.
(3) The following are equivalent.
    1. The rank r(A) = 1.
    2. A ≠ O and det A = 0.
    3. A ≠ O, and the row (or column) vectors are linearly dependent.
    4. The nullity dim Ker(A) = 2 − 1 = 1 < 2.
    Both Ker(A) and Im(A) are straight lines passing 0. Suppose
    Ker(A) = 〈〈x1〉〉 and take any vector x2 in R^2 such that
    R^2 = 〈〈x1〉〉 ⊕ 〈〈x2〉〉; then

        Im(A) = 〈〈x2 A〉〉.

    Hence, the restriction operator A restricted to 〈〈x2〉〉, mapping
    〈〈x2〉〉 → 〈〈x2 A〉〉, is one-to-one and onto (see Sec. A.2).
(4) The following are equivalent (see Exs. <A> 2 and <B> 4 of Sec. 2.4).
    1. Ker(A) = {0}, i.e. xA = 0 if and only if x = 0.
    2. A is one-to-one.
    3. A is onto, i.e. Im(A) = R^2.
    4. The rank r(A) = 2.
    5. The two row vectors of A are linearly independent. So are the
       column vectors.
    6. det A ≠ 0.
    7. A is invertible and hence is a linear isomorphism.
    8. A maps every (or some) basis {x1, x2} for R^2 onto a basis
       {x1 A, x2 A} for R^2.
    For a given y in R^2, the equation xA = y has the unique solution
    x = yA^{-1}, where

        A^{-1} = (1/det A)[a22 −a12; −a21 a11].                 (2.7.8)

These results provide us with the insights needed to construct various
linear operators with different mapping properties.
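A numerical companion to (2.7.8), written in Python with sample matrices
(illustration only): the inverse formula for the rank-2 case and the
kernel line for a rank-1 case.

    import numpy as np

    def inverse_2x2(A):
        """The formula in (2.7.8):
        A^{-1} = (1/det A) [[a22, -a12], [-a21, a11]]."""
        (a11, a12), (a21, a22) = A
        d = a11 * a22 - a12 * a21
        if d == 0:
            raise ValueError("det A = 0, so A is not invertible")
        return np.array([[a22, -a12], [-a21, a11]]) / d

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])                 # rank 2
    assert np.allclose(A @ inverse_2x2(A), np.eye(2))

    B = np.array([[1.0, 2.0],
                  [2.0, 4.0]])                 # rank 1: rows dependent
    print(np.linalg.matrix_rank(B))            # 1
    print(np.array([2.0, -1.0]) @ B)           # [0. 0.]: (2, -1) spans Ker(B)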

For reminiscence and reference, we combine (2.5.9)–(2.5.12), Ex. <B> 5
of Sec. 2.5, (2.7.3) and (2.7.8) together to obtain

The general geometric mapping properties of a linear operator
Suppose A = [aij] is a nonzero real 2 × 2 matrix.
(1) Suppose r(A) = 2. Then A: R^2 → R^2, as a linear isomorphism,
    preserves:
    1. The relative positions (i.e. coincidence, parallelism and
       intersection) of two straight lines.
    2. Line segments (interior points to interior points).
    3. The ratio of signed lengths of line segments along the same or
       parallel lines (in particular, midpoint to midpoint).
    4. Triangles (interior points to interior points).
    5. Parallelograms (interior points to interior points).
    6. Bounded sets (i.e. sets that lie inside the interior of some
       triangle).
    7. The ratio of areas.
    It does not necessarily preserve:
    (a) length,
    (b) angle,
    (c) area, and
    (d) directions (anticlockwise or clockwise; see Ex. <A> 4 of
        Sec. 2.8.1 for a formal definition) or orientations.
(2) Suppose r(A) = 1. Suppose Ker(A) = 〈〈x1〉〉 and take any x2 in R^2
    such that R^2 = 〈〈x1〉〉 ⊕ 〈〈x2〉〉; then Im(A) = 〈〈x2 A〉〉. In
    particular,
    1. For any v in Ker(A), A maps the straight line v + 〈〈x2〉〉
       one-to-one and onto 〈〈x2 A〉〉 and hence preserves ratios of
       signed lengths of segments along such parallel lines.
    2. For any x0 in R^2, A maps the whole line x0 + Ker(A) into the
       single point x0 A.                                       (2.7.9)
See Fig. 2.42.

Readers can easily prove these results, so the proofs are left to the
readers. If any difficulty arises, just accept these facts or turn to
Sec. 2.8.3 for detailed proofs. They are also true for affine
transformations (see (2.7.1) and (2.7.2)) and are called affine
invariants.

Fig. 2.42 (a rank-one operator A: the lines v + 〈〈x2〉〉 parallel to
Ker(A) are carried onto Im(A) = 〈〈x2 A〉〉, while x0 + Ker(A) collapses
to the point x0 A)

Exercises
<A>

1. Prove (2.7.9) in detail, if possible.
2. Endow R with the natural basis N0 = {1} and R^2 with N = {e1, e2}.
   (a) Try to formulate all linear transformations from R to R^2 in
       terms of N0 and N.
   (b) Same as (a) for linear transformations from R^2 to R.
3. Which properties stated in (2.7.8) and (2.7.9) are still true for
   linear transformations from R to R^2? State and prove them.
4. Do the same problem as in 3 for linear transformations from R^2 to R.

<C> Abstraction and generalization

Read Secs. B.4, B.6 and B.7 if necessary, and try to do the following
problems.

1. Extend (2.7.8) and (2.7.9) to linear operators on C or C2 .


2. Extend (2.7.8) and (2.7.9) to linear transformation from Fm to Fn .
3. Extend (2.7.8) and (2.7.9) to linear transformation from a finite-
dimensional vector space V to a vector space W .

2.7.2 Examples
This subsection concentrates on constructing some basic linear operators,
and we will emphasize their geometric mapping properties.
As in Sec. 2.7.1, R^2 is always endowed with the Cartesian coordinate
system N = {e1, e2}.
Let A = [aij] be a nonzero real 2 × 2 matrix.
Suppose r(A) = 1. Recall that in this case, the row vectors

    e1 A = (a11, a12),
    e2 A = (a21, a22)

are linearly dependent. Therefore, in order to construct a linear
operator of rank 1, all one needs to do is to choose a nonzero matrix A
such that its row vectors or column vectors are linearly dependent.
Geometrically, via Fig. 2.42, this means that one can map linearly any
prescribed line 〈〈x1〉〉 onto a prescribed line 〈〈v〉〉, and another
noncoincident line 〈〈x2〉〉 onto {0}. Here, we list some standard
operators of this feature.

Example 1 The linear operator

    A = [λ 0; 0 0]  with λ ≠ 0

has the properties:

1. e1 A = λ e1 and e2 A = 0 = 0 · e2. λ and 0 are called eigenvalues of
   A with corresponding eigenvectors e1 and e2, respectively.
2. Ker(A) = 〈〈e2〉〉, while Im(A) = 〈〈e1〉〉 is an invariant line
   (subspace) of A, i.e. A maps 〈〈e1〉〉 into itself.
3. A maps every line x0 + 〈〈x〉〉, where x is linearly independent of
   e2, one-to-one and onto the line 〈〈e1〉〉.

See Fig. 2.43(a) and (b). In Fig. 2.43(b), we let the two planes
coincide, and the arrow signs indicate how A preserves ratios of signed
lengths of segments.

Fig. 2.43 (A collapses Ker(A) = 〈〈e2〉〉 to 0 and maps each line
x0 + 〈〈x〉〉 onto 〈〈e1〉〉)

Operators

    [0 λ; 0 0],  [0 0; λ 0]  and  [0 0; 0 λ],  where λ ≠ 0,

are all of this type.

Example 2 The linear operator

    A = [a b; 0 0]  with ab ≠ 0,

which is equivalent to (x1, x2)A = (a x1, b x1), has the properties:

1. Let v = (a, b). Then

       vA = a v  and  e2 A = 0 = 0 · e2.

   a and 0 are called eigenvalues of A corresponding to the eigenvectors
   v and e2.
2. Ker(A) = 〈〈e2〉〉, while Im(A) = 〈〈v〉〉 is an invariant line
   (subspace).
3. For any vector x linearly independent of e2, A maps the line
   x0 + 〈〈x〉〉 one-to-one and onto the line 〈〈v〉〉.

See Fig. 2.44(a) and (b).

Fig. 2.44 (Ker(A) = 〈〈e2〉〉 goes to 0; each line x0 + 〈〈x〉〉 is carried
onto 〈〈v〉〉)

Operators

    [a 0; b 0],  [0 b; 0 a]  and  [0 0; a b],  where ab ≠ 0,

are all of this type.

Suppose abα ≠ 0. Consider the linear operator

    f(x1, x2) = (a(x1 + αx2), b(x1 + αx2)) = (x1 + αx2)(a, b)

and see if it has any invariant subspace. Firstly,

    f(x1, x2) = 0  ⇔  x1 + αx2 = 0.

Hence, the kernel space Ker(f) is the line x1 + αx2 = 0. Suppose
x = (x1, x2) does not lie on x1 + αx2 = 0 and f(x) = λx for some scalar
λ; then

    f(x) = λx
    ⇔ (x1 + αx2)(a, b) = λ(x1, x2)
    ⇔ bx1 − ax2 = 0 (and λ = a + αb).

This means that the operator f keeps the line bx1 − ax2 = 0 invariant,
and maps a point x = (x1, x2) on it into another point
f(x) = (a + αb)x, still on that line. We summarize this as part of

Example 3 The operator

    A = [a b; αa αb],  where αab ≠ 0,

which is equivalent to (x1, x2)A = (x1 + αx2)(a, b), has the following
properties:

1. Let v1 = (−α, 1) and v2 = (a, b). Then

       v1 A = 0 = 0 · v1,
       v2 A = (a + αb) v2.

   0 and a + αb are eigenvalues of A with corresponding eigenvectors v1
   and v2.
2. The kernel Ker(A) = 〈〈v1〉〉, while the range Im(A) = 〈〈v2〉〉. Hence

       Ker(A) = Im(A)
       ⇔ v1 = (−α, 1) and v2 = (a, b) are linearly dependent
       ⇔ a + αb = 0.

   In this case, A does not have invariant lines. If Ker(A) ≠ Im(A),
   then R^2 = Ker(A) ⊕ Im(A) and Im(A) is an invariant line (subspace)
   of A.
3. For any vector x in R^2 which is linearly independent of v1, A maps
   the straight line x0 + 〈〈x〉〉 one-to-one and onto the line Im(A).

See Fig. 2.45.

Fig. 2.45 (Ker(A) = 〈〈v1〉〉 collapses to 0; each line x0 + 〈〈x〉〉 is
mapped onto Im(A) = 〈〈v2〉〉)
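A quick numerical check of Example 3, with arbitrarily chosen values of
a, b and α (Python, for illustration only):

    import numpy as np

    a, b, alpha = 2.0, 3.0, 1.0            # sample values, alpha*a*b != 0
    A = np.array([[a, b],
                  [alpha * a, alpha * b]])

    v1 = np.array([-alpha, 1.0])           # spans Ker(A)
    v2 = np.array([a, b])                  # spans Im(A)
    print(v1 @ A)                          # [0. 0.]: eigenvalue 0
    print(v2 @ A, (a + alpha * b) * v2)    # both [10. 15.]: eigenvalue a + alpha*b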

From now on, suppose the rank r(A) = 2.

Example 4 The operator

    A = [λ1 0; 0 λ2],  where λ1 λ2 ≠ 0,

or xA = (x1, x2)A = (λ1 x1, λ2 x2), has the properties:

1. e1 A = λ1 e1 and e2 A = λ2 e2. λ1 and λ2 are eigenvalues of A with
   corresponding eigenvectors e1 and e2.
2. In case λ1 = λ2 = λ, then A = λI2 and is called an enlargement with
   scale λ. Consequently, A keeps every line passing 0 invariant and
   has only 0 as an invariant point, if λ ≠ 1.
3. In case λ1 ≠ λ2, A has the two invariant lines (subspaces) 〈〈e1〉〉
   and 〈〈e2〉〉, and is called a stretching.
4. A stretching keeps the following invariant:
   a. length, if |λ1| = |λ2| = 1;
   b. angle, if λ1 = λ2;
   c. area, if |λ1 λ2| = 1;
   d. sense (direction), if λ1 λ2 > 0.

See Fig. 2.46.

Fig. 2.46 (the four sign patterns: λ1 < 0, λ2 > 0; λ1 > 0, λ2 > 0;
λ1 < 0, λ2 < 0; λ1 > 0, λ2 < 0)

Suppose ab ≠ 0 and consider the linear operator

    f(x) = f(x1, x2) = (bx2, ax1).

No point is fixed except 0. Suppose x ≠ 0; then

    〈〈x〉〉 is an invariant line (subspace)
    ⇔ there exists a constant λ ≠ 0 such that f(x) = λx, or
      bx2 = λx1,  ax1 = λx2.

In this case, x1 ≠ 0 and x2 ≠ 0 hold and consequently λ^2 = ab.
Therefore, in case ab < 0, there does not exist any invariant line. Now,
suppose ab > 0. Then λ = ±√(ab) and there exist two invariant lines
√|a| x1 = ±√|b| x2.
In order to get insight into the case ab < 0, let us consider firstly a
special case: a = 1 and b = −1. Note that

    x = (x1, x2) → (x2, x1) (under [0 1; 1 0])
                 → (−x2, x1) = xA (under [−1 0; 0 1]).

This means that the point (x1, x2) reflects along the line x1 = x2 into
the point (x2, x1), and then reflects along the x2-axis into the point
(−x2, x1), the image point of (x1, x2) under f. See Fig. 2.47.

Fig. 2.47 (the point (x1, x2) reflected along x1 = x2 to (x2, x1), then
along the x2-axis to (−x2, x1))

We summarize these results in

Example 5 The linear operator

    A = [0 a; b 0],  where ab ≠ 0,

has the following properties:

(1) In case ab > 0. Let v1 = (√|b|, √|a|) and v2 = (−√|b|, √|a|).
    1. v1 A = √(ab) v1 and v2 A = −√(ab) v2. Thus, √(ab) and −√(ab)
       are eigenvalues of A with corresponding eigenvectors v1 and v2.
    2. The invariant lines (subspaces) are 〈〈v1〉〉 and 〈〈v2〉〉.
    Read the Explanation below and see Fig. 2.48.
(2) In case ab < 0. A does not have (nonzero) eigenvectors and hence A
    does not have any invariant line. Suppose a > 0 and b < 0; then
    −ab > 0 and

        A = [0 a; b 0] = [0 a; −b 0][−1 0; 0 1]
                       = [1 0; 0 −1][0 a; −b 0].

    The mapping x → xA can be decomposed as

        x = (x1, x2) → (−bx2, ax1) (under [0 a; −b 0])
                     → (bx2, ax1) = xA (under [−1 0; 0 1]).

    See Fig. 2.49.

Explanation For a given x in R^2, it might be difficult to pinpoint the
position of xA in the coordinate system N = {e1, e2}. Since v1 and v2
are linearly independent, B = {v1, v2} is a basis for R^2. Any x in R^2
can be uniquely expressed as

    x = α1 v1 + α2 v2,  i.e. [x]_B = (α1, α2)
    ⇒ xA = α1 v1 A + α2 v2 A = √(ab)(α1 v1 − α2 v2),
      i.e. [xA]_B = √(ab)(α1, −α2).

This suggests how easy it is, in the new coordinate system B, to
determine xA:

    x → [x]_B = (α1, α2) → (α1, −α2) → √(ab)(α1, −α2) = [xA]_B → xA.

This is the essence of the spectral decomposition of A (refer to
Sec. 2.7.6). See Figs. 2.48 and 2.49, where u1 = (√(−b), √a) and
u2 = (−√(−b), √a).
Fig. 2.48 (the case ab > 0: x = α1 v1 + α2 v2 is mapped to
xA = √(ab) α1 v1 − √(ab) α2 v2 along the invariant lines 〈〈v1〉〉 and
〈〈v2〉〉)

Fig. 2.49 (the case ab < 0: xA obtained from x by composing
[0 a; −b 0] with a reflection)

If a = b = 1 or a = −b = 1, the readers are urged to simplify the
results stated in Example 5 and in Figs. 2.48 and 2.49.
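The spectral decomposition described in the Explanation can be verified
numerically; a Python sketch with the sample values a = 2, b = 8 (so
ab > 0):

    import numpy as np

    a, b = 2.0, 8.0
    A = np.array([[0.0, a],
                  [b,   0.0]])

    s = np.sqrt(a * b)                     # = 4.0
    v1 = np.array([np.sqrt(b),  np.sqrt(a)])
    v2 = np.array([-np.sqrt(b), np.sqrt(a)])
    assert np.allclose(v1 @ A,  s * v1)    # eigenvalue  sqrt(ab)
    assert np.allclose(v2 @ A, -s * v2)    # eigenvalue -sqrt(ab)

    # In the basis B = {v1, v2}, A becomes the stretching diag(s, -s).
    P = np.array([v1, v2])
    print(P @ A @ np.linalg.inv(P))        # [[ 4. 0.] [ 0. -4.]]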

Suppose abc ≠ 0. Since

    [a 0; b c] = b[a/b 0; 1 c/b],

for simplicity we may suppose that b = 1 and let a = λ1, c = λ2.
Consider the linear operator

    f(x1, x2) = (λ1 x1 + x2, λ2 x2),  where λ1 λ2 ≠ 0.

Let x = (x1, x2) ≠ 0. Then

    〈〈x〉〉 is an invariant line (subspace) of f
    ⇔ there exists a constant λ ≠ 0 such that f(x) = λx, i.e.
      λ1 x1 + x2 = λ x1,  λ2 x2 = λ x2.

In case x2 ≠ 0, λ = λ2 will give λ1 x1 + x2 = λ2 x1, and hence
(λ1 − λ2)x1 = −x2 implies that λ1 ≠ λ2. So x = (1, λ2 − λ1) is a
required vector. If x2 = 0, then λ1 x1 = λ x1 would give λ = λ1, and
x = (1, 0) is another choice. We put these in

Example 6 The linear operator

    A = [λ1 0; 1 λ2],  where λ1 λ2 ≠ 0,

has the following properties:

(1) In case λ1 ≠ λ2. Let v1 = e1 and v2 = (1, λ2 − λ1).
    1. v1 A = λ1 v1 and v2 A = λ2 v2. Thus, λ1 and λ2 are eigenvalues
       of A with corresponding eigenvectors v1 and v2.
    2. 〈〈v1〉〉 and 〈〈v2〉〉 are invariant lines of A.
    In fact, B = {v1, v2} is a basis for R^2 and

        [A]_B = PAP^{-1} = [λ1 0; 0 λ2],
        where P = [v1; v2] = [1 0; 1 λ2 − λ1],

    is the matrix representation of A with respect to B (see Exs. <B> 4,
    5 of Sec. 2.4 and Sec. 2.7.3). See Fig. 2.50.

Fig. 2.50 (the invariant lines 〈〈v1〉〉 = 〈〈e1〉〉 and 〈〈v2〉〉 of A)

(2) In case λ1 = λ2 = λ. Then

        e1 A = λ e1

    and 〈〈e1〉〉 is the only invariant subspace of A. Also,

        A = [λ 0; 1 λ] = λI2 + [0 0; 1 0] = λ[1 0; 1/λ 1]

    shows that A has the mapping behavior

        x = (x1, x2) → (λx1, λx2) (enlargement λI2)
                     → xA = (λx1 + x2, λx2) (translation along (x2, 0)).

    See Fig. 2.51.

Fig. 2.51 (x enlarged to λx, then translated by (x2, 0) onto xA)

A special operator of this type is

    S = [1 0; a 1],  where a ≠ 0,

which is called a shearing. S maps each point x = (x1, x2) to the point
(x1 + ax2, x2) along the line passing x and parallel to the x1-axis, to
the right if a > 0, through a distance ax2 proportional to the
x2-coordinate of x with the fixed constant of proportionality a, and to
the left if a < 0 in the same manner. Therefore,

1. S keeps every point on the x1-axis fixed, and the x1-axis is the
   only invariant subspace of S.
2. S moves every point (x1, x2) with x2 ≠ 0 along the line parallel to
   the x1-axis, through a distance in the constant proportion a to its
   distance from the x1-axis, to the point (x1 + ax2, x2). Thus, each
   line parallel to the x1-axis is an invariant line.

See Fig. 2.52.

Fig. 2.52 (the shearing S for (a) a > 0 and (b) a < 0)

Note that the linear operators

    [λ1 1; 0 λ2], where λ1 λ2 ≠ 0;   [1 a; 0 1], where a ≠ 0,

can be investigated in a similar manner.
can be investigated in a similar manner.

Suppose ab ≠ 0. Consider the linear operator

    f(x) = f(x1, x2) = (bx2, x1 + ax2).

Let x = (x1, x2) ≠ 0. Then

    〈〈x〉〉 is an invariant line (subspace)
    ⇔ there exists a constant λ ≠ 0 such that f(x) = λx, i.e.
      bx2 = λx1,  x1 + ax2 = λx2.

Since b ≠ 0, x1 ≠ 0 should hold. Now, put x2 = (λ/b)x1 into the second
equation and we get, after eliminating x1,

    λ^2 − aλ − b = 0.

The existence of such a nonzero λ depends on the discriminant
a^2 + 4b ≥ 0.

Case 1 a^2 + 4b > 0. Then λ^2 − aλ − b = 0 has the two real roots

    λ1 = (a + √(a^2 + 4b))/2,  λ2 = (a − √(a^2 + 4b))/2.

Solve f(x) = λi x, i = 1, 2, and we have the corresponding vectors
vi = (b, λi). It is easy to see that v1 and v2 are linearly independent.
Therefore, f has two different invariant lines 〈〈v1〉〉 and 〈〈v2〉〉.
See Fig. 2.53.

Fig. 2.53 (the case λ1 λ2 < 0: x = α1 v1 + α2 v2 is mapped to
xA = λ1 α1 v1 + λ2 α2 v2 along the invariant lines 〈〈v1〉〉 and
〈〈v2〉〉)

Case 2 a^2 + 4b = 0. λ^2 − aλ − b = 0 has the two equal real roots
λ = a/2, a/2. Solve f(x) = λx, and the corresponding vector is
v = (−a, 2). Then 〈〈v〉〉 is the only invariant line. See Fig. 2.54.

Fig. 2.54 (the only invariant line 〈〈v〉〉, v = (−a, 2), in the case
a^2 + 4b = 0)

Case 3 a^2 + 4b < 0. f does not have any invariant line. See Fig. 2.55.

Fig. 2.55 (the case a^2 + 4b < 0: x = (x1, x2) is reflected to
(x2, x1), stretched to (bx2, x1), then translated by (0, ax2) onto
xA = (bx2, x1 + ax2))

We summarize these results in

Example 7 The linear operator

    A = [0 1; b a],  where ab ≠ 0,

has the following properties:

(1) a² + 4b > 0. Let λ1 = (a + √(a² + 4b))/2 and λ2 = (a − √(a² + 4b))/2, and vi = (b, λi) for i = 1, 2.

1. viA = λivi, i = 1, 2. Thus, λ1 and λ2 are eigenvalues of A with corresponding eigenvectors v1 and v2.
2. ⟨⟨v1⟩⟩ and ⟨⟨v2⟩⟩ are invariant lines (subspaces) of R² under A.

In the basis B = {v1, v2}, A can be represented as

    [A]_B = PAP⁻¹ = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix}, where P = \begin{pmatrix} v1 \\ v2 \end{pmatrix} = \begin{pmatrix} b & λ1 \\ b & λ2 \end{pmatrix}.

(See Exs. <B> 4, 5 of Sec. 2.4 and Sec. 2.7.3.) See Fig. 2.53.

(2) a² + 4b = 0. Let λ = a/2, a/2 and v = (−a, 2).

1. vA = λv. λ is an eigenvalue of multiplicity 2 of A with corresponding eigenvector v.
2. ⟨⟨v⟩⟩ is an invariant line (subspace) of R² under A.
In the basis B = {v, e1}, A can be represented as

    [A]_B = PAP⁻¹ = \begin{pmatrix} λ & 0 \\ 1/2 & λ \end{pmatrix} = λ\begin{pmatrix} 1 & 0 \\ 1/(2λ) & 1 \end{pmatrix} = λI2 + \begin{pmatrix} 0 & 0 \\ 1/2 & 0 \end{pmatrix}, where P = \begin{pmatrix} v \\ e1 \end{pmatrix} = \begin{pmatrix} −a & 2 \\ 1 & 0 \end{pmatrix}.

This is the composite map of a shearing followed by an enlargement with scale λ. See Figs. 2.52 and 2.54.
(3) a² + 4b < 0. A does not have any invariant line. Notice that

    \begin{pmatrix} 0 & 1 \\ b & a \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} b & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & a \end{pmatrix}.

Hence, the geometric mapping property of A can be described as follows in N = {e1, e2}:

    x = (x1, x2) → (x2, x1) → (bx2, x1) → xA = (bx2, x1 + ax2),

that is, a reflection about the line x1 = x2, then a scaling of the first coordinate by b, then a translation by (0, ax2). See Fig. 2.55.


The operator

    \begin{pmatrix} a & 1 \\ b & 0 \end{pmatrix}, where ab ≠ 0,

is of the same type.
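The case analysis of Example 7 can likewise be verified numerically. Here is a small sketch of my own (NumPy, row-vector convention; one arbitrary sample (a, b) is chosen per sign of the discriminant a² + 4b):

    import numpy as np

    for a, b in [(1.0, 2.0),     # a^2 + 4b = 9 > 0 : two invariant lines
                 (2.0, -1.0),    # a^2 + 4b = 0     : exactly one invariant line
                 (1.0, -1.0)]:   # a^2 + 4b = -3 < 0: no invariant line
        A = np.array([[0.0, 1.0],
                      [b, a]])
        disc = a * a + 4 * b
        roots = np.roots([1.0, -a, -b])    # lambda^2 - a*lambda - b = 0
        print(f"a={a}, b={b}, disc={disc}, roots={roots}")
        if disc > 0:
            for lam in roots.real:
                v = np.array([b, lam])     # v_i = (b, lambda_i)
                assert np.allclose(v @ A, lam * v)
        elif disc == 0:
            v = np.array([-a, 2.0])        # v = (-a, 2)
            assert np.allclose(v @ A, (a / 2) * v)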

Is there any more basic operator than those mentioned from Example 1 to Example 7? No more! It will eventually turn out that these operators are sufficient to describe any operator on R² in various ways, both algebraically and geometrically. Please refer to Secs. 2.7.5 to 2.7.8.
Before we are able to do so, we have to realize, as we learned from these examples, that the natural Cartesian coordinate system N = {e1, e2} is not always the best way to describe all possible linear operators, whose definitions are independent of any particular choice of basis for R². A suitable choice of a basis B = {x1, x2} for R², according to the features of a given linear operator or matrix (which acts as a linear operator as in (2.7.7))

    A = \begin{pmatrix} a11 & a12 \\ a21 & a22 \end{pmatrix},

will reduce as many entries aij to zero as possible. That is, after a change of coordinates from N to B, and in the eyes of B, A becomes

    [A]_B = PAP⁻¹ = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix} or \begin{pmatrix} λ & 0 \\ 1 & λ \end{pmatrix} or \begin{pmatrix} 0 & 1 \\ b & a \end{pmatrix}.    (2.7.10)

It is [A]_B that makes algebraic computations easier, geometric mapping properties clearer, and our life happier.
Through these examples, for a given real 2 × 2 matrix A = [aij ], its
main features are of two aspects:
1. the rank r(A),
2. the existence of an invariant line (subspace), if any.
We treated these concepts in examples via classical algebraic methods. Now we reformulate them formally in the language commonly used in linear algebra, which will be adopted throughout the book.
Definition Let A = [aij]2×2 be a real matrix, considered as a linear operator as in (2.7.7). If there exist a scalar λ ∈ R and a nonzero vector x ∈ R² such that

    xA = λx,

then λ is called an eigenvalue (or characteristic root) of A and x an associated or corresponding eigenvector (or characteristic vector).    (2.7.11)

In case λ ≠ 0 and x ≠ 0 is an associated eigenvector, this geometrically means that the line ⟨⟨x⟩⟩ is an invariant line (subspace) of R² under A; if λ = 1, A keeps each vector along ⟨⟨x⟩⟩ (or point on ⟨⟨x⟩⟩) fixed, and ⟨⟨x⟩⟩ is then called a line of invariant points. If λ = −1, A reverses each such vector.
How to determine if A has eigenvalues, and how to find them if they do exist? Suppose there does exist a nonzero vector x such that

    xA = λx for some scalar λ ∈ R
    ⇔ x(A − λI2) = 0
    ⇔ (by (3) in (2.7.8))
      det(A − λI2) = \begin{vmatrix} a11 − λ & a12 \\ a21 & a22 − λ \end{vmatrix} = λ² − (a11 + a22)λ + (a11a22 − a12a21) = 0.
This means that any eigenvalue of A, if it exists, is a root of the quadratic equation λ² − (a11 + a22)λ + a11a22 − a12a21 = 0. Solve this equation for real roots λ and reverse the process mentioned above to obtain the corresponding eigenvectors x.

Summarize the above as

The procedure of computing eigenvalues and eigenvectors
Let A = [aij]2×2 be a real matrix.

1. Compute the characteristic polynomial of A,

       det(A − tI2) = t² − (a11 + a22)t + a11a22 − a12a21 = t² − (tr A)t + det A,

   where tr A = a11 + a22 is called the trace of A and det A the determinant of A.
2. Solve the characteristic equation of A,

       det(A − tI2) = 0,

   for real roots λ, which are the real eigenvalues of A.
3. Solve the simultaneous homogeneous linear equations in two unknowns

       x(A − λI2) = 0,

   or

       (a11 − λ)x1 + a21x2 = 0,
       a12x1 + (a22 − λ)x2 = 0, where x = (x1, x2).

   Any nonzero vector solution x = (x1, x2) is an associated eigenvector of λ.

A has two distinct, coincident or no real eigenvalues according as the discriminant (tr A)² − 4 det A > 0, = 0 or < 0, respectively.    (2.7.12)
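The three steps of (2.7.12) translate directly into code. The following is a small sketch of my own (Python/NumPy; the helper name and the sample matrix are arbitrary choices), which follows the procedure literally:

    import numpy as np

    def left_null_vector(M):
        # a nonzero row vector x with x M = 0, assuming det(M) = 0
        if abs(M[0, 0]) > 1e-12 or abs(M[1, 0]) > 1e-12:
            return np.array([M[1, 0], -M[0, 0]])
        if abs(M[0, 1]) > 1e-12 or abs(M[1, 1]) > 1e-12:
            return np.array([M[1, 1], -M[0, 1]])
        return np.array([1.0, 0.0])          # M = O: every x works

    def eigen_2x2(A):
        tr = A[0, 0] + A[1, 1]               # step 1: t^2 - (tr A)t + det A
        det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
        disc = tr * tr - 4 * det
        if disc < 0:
            return []                        # no real eigenvalues
        lams = sorted({(tr - disc ** 0.5) / 2, (tr + disc ** 0.5) / 2})
        # step 3: solve x(A - lam*I2) = 0 for each real root lam of step 2
        return [(lam, left_null_vector(A - lam * np.eye(2))) for lam in lams]

    A = np.array([[1.0, 0.0],
                  [1.0, 3.0]])
    for lam, x in eigen_2x2(A):
        assert np.allclose(x @ A, lam * x)
        print(lam, x)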

It should be mentioned that the definition (2.7.11) and the terminologies in (2.7.12) are still valid for an n × n matrix A = [aij] over a field F. For example:

    eigenvalue λ: xA = λx for some x ≠ 0 in Fⁿ and λ ∈ F.
    eigenvector x: x ≠ 0 such that xA = λx for some λ ∈ F.
    characteristic polynomial: det(A − tIn) = (−1)ⁿtⁿ + a_{n−1}t^{n−1} + · · · + a1t + a0,

where a_{n−1} = (−1)^{n−1} tr A, tr A = Σⁿ_{i=1} aii is the trace of A, and a0 = det A.    (2.7.13)

Please refer to Sec. B.10 if necessary. We will feel free to use them if needed.
The simplest and most ideal case is that A has two distinct real eigenvalues λ1 and λ2 with respective eigenvectors x1 and x2. In this case, x1 and x2 are linearly independent. To see this, suppose there exist constants α1 and α2 such that

    α1x1 + α2x2 = 0
    ⇒ (perform A on both sides) α1λ1x1 + α2λ2x2 = 0
    ⇒ (by eliminating x2 from the above two equations) α1(λ1 − λ2)x1 = 0
    ⇒ (since λ1 ≠ λ2 and x1 ≠ 0) α1 = 0 and hence α2 = 0.    (2.7.14)

Thus, B = {x1, x2} is a new basis for R². Since

    x1A = λ1x1,
    x2A = λ2x2
    ⇒ \begin{pmatrix} x1 \\ x2 \end{pmatrix} A = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix}
    ⇒ [A]_B = PAP⁻¹ = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix}, where P = \begin{pmatrix} x1 \\ x2 \end{pmatrix}.    (2.7.15)
This is the matrix representation of the linear operator A with respect
to B. We encountered this in Example 2 through Example 7. For further
discussion about matrix representation, see Sec. 2.7.3. Note that

tr [A]B = tr A = λ1 + λ2 and det[A]B = det A = λ1 λ2 . (2.7.16)

From (2.7.15), we have

    x1A² = (x1A)A = λ1(x1A) = λ1(λ1x1) = λ1²x1,
    x2A² = λ2²x2
    ⇒ (for any scalars α1 and α2)
      (α1x1 + α2x2)A² = λ1²α1x1 + λ2²α2x2
                      = (λ1 + λ2)(α1λ1x1 + α2λ2x2) − λ1λ2(α1x1 + α2x2)
                      = (λ1 + λ2)(α1x1 + α2x2)A − λ1λ2(α1x1 + α2x2)I2
    ⇒ (since B = {x1, x2} is a basis for R²)
      x[A² − (λ1 + λ2)A + λ1λ2I2] = 0 for all x ∈ R²
    ⇒ A² − (tr A)A + (det A)I2 = O2×2.    (2.7.17)
This matrix identity holds for any 2 × 2 real matrix A, even if A has coincident eigenvalues or does not have real eigenvalues. Since a direct computation shows that

    A² = \begin{pmatrix} a11² + a12a21 & a12(a11 + a22) \\ a21(a11 + a22) & a12a21 + a22² \end{pmatrix}
       = (a11 + a22)\begin{pmatrix} a11 & a12 \\ a21 & a22 \end{pmatrix} − (a11a22 − a12a21)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
       = (tr A)A − (det A)I2,    (2.7.18)

the result (2.7.17) follows. For other proofs, see Exs. <A> 7 and <B> 5.

Summarize the above as

Cayley–Hamilton Theorem (Formula)
Let A = [aij]2×2 be a real matrix.

(1) A satisfies its characteristic polynomial det(A − tI2), i.e.

    A² − (tr A)A + (det A)I2 = O.

Geometrically, this is equivalent to

    xA² = (tr A)(xA) − (det A)x for any x ∈ R²,

and hence xA² ∈ ⟨⟨x, xA⟩⟩ for any x ∈ R².

(2) The identity

    A((tr A)I2 − A) = (det A)I2

shows that

    A is invertible ⇔ det A ≠ 0,

and, in this case,

    A⁻¹ = (1/det A)((tr A)I2 − A).    (2.7.19)

Notice that (2.7.19) still holds for any n × n matrix A over a field F
(see (2.7.13)). It is the geometric equivalence of the Cayley–Hamilton for-
mula that enables us to choose suitable basis B for R2 so that, in the eyes
of B, A becomes [A]B as in (2.7.10). For details, see Secs. 2.7.6–2.7.8.
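A numerical check of the Cayley–Hamilton formula and of the inverse formula in (2.7.19) takes only a few lines; the sketch below is an illustrative check of my own (NumPy; the sample matrix is arbitrary):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [3.0, 4.0]])
    tr, det = np.trace(A), np.linalg.det(A)

    # Cayley-Hamilton: A^2 - (tr A)A + (det A)I2 = O
    assert np.allclose(A @ A - tr * A + det * np.eye(2), 0)

    # since det A != 0 here, A^{-1} = ((tr A)I2 - A)/det A
    A_inv = (tr * np.eye(2) - A) / det
    assert np.allclose(A @ A_inv, np.eye(2))
    print(A_inv)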
Exercises
<A>

1. Model after Example 1 and Fig. 2.43 to investigate the mapping properties of each of the following operators:

       \begin{pmatrix} 0 & λ \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ λ & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ 0 & λ \end{pmatrix}, where λ ≠ 0.

2. Model after Example 2 and Fig. 2.44, do the same problem for the operators:

       \begin{pmatrix} a & 0 \\ b & 0 \end{pmatrix}, \begin{pmatrix} 0 & b \\ 0 & a \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ b & a \end{pmatrix}, where ab ≠ 0.

3. Model after Example 6 and Figs. 2.50–2.52, do the same problem for the operators:

       \begin{pmatrix} λ1 & 1 \\ 0 & λ2 \end{pmatrix} and \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, where λ1λ2 ≠ 0 and a ≠ 0.

4. Model after Example 7 and Figs. 2.53–2.55, do the same problem for the operator

       \begin{pmatrix} a & 1 \\ b & 0 \end{pmatrix}, where ab ≠ 0.
5. Give two sets, {x1, x2} and {y1, y2}, of vectors in R². Does there always exist a linear operator f: R² → R² such that f maps one set of vectors onto the other? How many such f can be found?
6. Let x1 = (2, 3), x2 = (−3, 1), x3 = (−2, −1) and x4 = (1, −4) be given vectors in R². Do the following problems.
   (a) Find a linear operator f on R² keeping x1 and x2 fixed, i.e. f(xi) = xi for i = 1, 2. Is such an f unique? Does the same f keep x3 and x4 fixed?
   (b) Find a linear operator f such that f(x1) = f(x3) = 0. What happens to f(x2) and f(x4)?
   (c) Find a linear operator f such that Ker(f) = Im(f) = ⟨⟨x1⟩⟩. How many such f are possible?
   (d) Find a linear operator f such that Ker(f) = ⟨⟨x2⟩⟩ while Im(f) = ⟨⟨x3⟩⟩. How many such f are possible?
   (e) Find a linear operator f such that f(x1) = x4 and f(x2) = x3. Is such an f unique?
   (f) Does there exist a linear operator f such that f(x1) = x3 and f(−2x1) = f(x4)? Why?
   (g) Find a linear operator f such that f(x1) = x3 + x4 and f(x2) = x3 − x4.
   (h) Does there exist a linear operator f such that f(xi) = ixi for i = 1, 2, 3, 4?
   (i) Find all possible linear operators mapping Δ 0 x2 x4 onto Δ 0 x1 x3.
   (j) Find all possible affine transformations mapping Δ x2 x3 x4 onto Δ x3 x4 x1.
7. Suppose A = [aij]2×2 is a real matrix which is not a scalar matrix. Show that there exists at least one nonzero vector x0 such that B = {x0, x0A} is a basis for R². Try to model after (2.7.17) to show that there exist constants α and β such that A² + αA + βI2 = O.


8. Let

       A = \begin{pmatrix} −1 & 5 \\ −6 & 7 \end{pmatrix}.

   (a) Show that the characteristic polynomial of A is t² − 6t + 23.
   (b) Justify the Cayley–Hamilton formula for A.
   (c) Use (b) to compute A⁻¹.
   (d) Try to find a basis B = {x1, x2} for R² so that

       [A]_B = PAP⁻¹ = \begin{pmatrix} 0 & 1 \\ −23 & 6 \end{pmatrix}, where P = \begin{pmatrix} x1 \\ x2 \end{pmatrix}.

   How many such B are there?
<B>
1. Let A = [aij]2×2 be a matrix such that

       A² = O2×2,

   and we call A a nilpotent matrix of index 2 if A ≠ O (refer to Ex. 7 of Sec. B.4).
   (a) By algebraic computation, show that

       A² = O ⇔ tr A = a11 + a22 = 0 and a12a21 = −a11².

   Give some numerical examples of such A.
   (b) By geometric consideration, show that

       A² = O ⇔ Im(A) ⊆ Ker(A).

   Then, consider the following three cases:
   (1) Ker(A) = R²; thus Im(A) = {0} and A = O2×2.
   (2) Ker(A) = ⟨⟨x1⟩⟩ where x1 ≠ 0; thus Im(A) = ⟨⟨x1⟩⟩. Take any fixed vector x2 which is linearly independent of x1; then x2A = λx1 for some scalar λ ≠ 0. Show that

       A = \begin{pmatrix} x1 \\ x2 \end{pmatrix}⁻¹ \begin{pmatrix} 0 & 0 \\ λ & 0 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix}.

   (3) Is it possible that Ker(A) = {0}?
   (c) What are tr A and det A? How much can A² − (tr A)A + (det A)I2 = O help in determining A?
2. A matrix A = [aij]2×2 is called idempotent if

       A² = A

   (refer to Ex. 6 of Sec. B.4 and Ex. 7 of Sec. B.7).
   (a) Using a purely algebraic method, try to determine all idempotent 2 × 2 real matrices.
   (b) Via geometric consideration, show that

       A² = A ⇔ each nonzero vector in Im(A) is an eigenvector of A associated with the eigenvalue 1.

   Then,
   (1) Im(A) = {0} implies A = O2×2.
   (2) Im(A) = ⟨⟨x1⟩⟩, where x1 ≠ 0, implies that x1A = x1 and there exists an x2, linearly independent of x1, with Ker(A) = ⟨⟨x2⟩⟩. Hence x2A = 0. Therefore,

       A = \begin{pmatrix} x1 \\ x2 \end{pmatrix}⁻¹ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix}.

   (3) Im(A) = R² implies that Ker(A) = {0}. For any two linearly independent vectors x1 and x2, x1A = x1 and x2A = x2 imply A = I2.
   (c) Does A² − (tr A)A + (det A)I2 = O help in determining such A? How?
3. A matrix A = [aij]2×2 is called involutory if

       A² = I2

   (refer to Ex. 9 of Sec. B.4 and Ex. 8 of Sec. B.7).
   (a) By a purely algebraic method, show that

       (1) A² = I2
       ⇔ (2) A = ±I2, or a11 + a22 = 0 and a11² = 1 − a12a21.

   Give some numerical examples of such A.
   (b) By geometric consideration, show that

       (1) A² = I2
       ⇔ (2) (A − I2)(A + I2) = O
       ⇔ (3) Im(A − I2) ⊆ Ker(A + I2) and Im(A + I2) ⊆ Ker(A − I2).

   Then,
   (1) If r(A − I2) = 0, then A = I2.
   (2) If r(A − I2) = 1, then there exists a basis {x1, x2} for R² so that

       A = \begin{pmatrix} x1 \\ x2 \end{pmatrix}⁻¹ \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix}.

   (3) If r(A − I2) = 2, then A = −I2.
   (c) Show that

       R² = Im(A − I2) ⊕ Im(A + I2),
       Im(A + I2) = Ker(A − I2), Im(A − I2) = Ker(A + I2).

   Then, try to find all such A.
   (d) Try to use A² − (tr A)A + (det A)I2 = O to determine A as far as possible.
4. Let A = [aij]2×2 be a real matrix such that

       A² = −I2

   (refer to Ex. 9 of Sec. B.7).
   (a) Show that there does not exist a nonzero vector x0 such that x0A = λx0 for any λ ≠ 0. That is, A does not have any invariant line.
   (b) Take any nonzero vector x1 and show that

       A = \begin{pmatrix} x1 \\ x1A \end{pmatrix}⁻¹ \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix} \begin{pmatrix} x1 \\ x1A \end{pmatrix}.

   (c) Is it possible to choose a basis B for R² such that [A]_B is a diagonal matrix? Why?
5. Let A = [aij]2×2 be a real matrix without real eigenvalues, i.e. (tr A)² − 4 det A < 0 holds.
   (a) Let λ = λ1 + iλ2 be a complex eigenvalue of A, considered as a linear operator defined by x → xA, where x ∈ C². That is to say, there exists a nonzero x = x1 + ix2 ∈ C², where x1, x2 ∈ R², such that

       xA = λx.

   Show that yA = λ̄y, where y = x1 − ix2 is the conjugate vector of x.
   (b) Show that

       x1A = λ1x1 − λ2x2,
       x2A = λ2x1 + λ1x2.

   (c) Show that x1 and x2 are linearly independent in R², and hence B = {x1, x2} is a basis for R². Then, in B,

       [A]_B = PAP⁻¹ = \begin{pmatrix} λ1 & −λ2 \\ λ2 & λ1 \end{pmatrix}, where P = \begin{pmatrix} x1 \\ x2 \end{pmatrix}.

   See Example 10 in Sec. 5.7.
   (d) Try to use (b) or (c) to show that A² − (tr A)A + (det A)I2 = O.
   (e) Suppose the reader is familiar with basic plane trigonometry. Try to show that

       \begin{pmatrix} λ1 & λ2 \\ −λ2 & λ1 \end{pmatrix} = r \begin{pmatrix} cos θ & sin θ \\ −sin θ & cos θ \end{pmatrix},

   where r = √(λ1² + λ2²) and θ ∈ R. The right-side matrix can be considered (under a field isomorphism) as the complex number z = re^{iθ}. Try to explain its geometric mapping properties.
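For readers who want to see parts (b), (c) and (e) of Ex. 5 in action before proving them, here is a numerical sketch of my own (NumPy; the sample matrix is an arbitrary choice with no real eigenvalues, and np.linalg.eig is applied to A* to obtain left eigenvectors, matching the row-vector convention xA):

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [1.0, 3.0]])              # (tr A)^2 - 4 det A = 16 - 20 < 0
    evals, W = np.linalg.eig(A.T)           # columns of W: left eigenvectors of A
    lam, w = evals[0], W[:, 0]              # lam = l1 + i*l2, w = x1 + i*x2
    l1, l2 = lam.real, lam.imag
    x1, x2 = w.real, w.imag

    assert np.allclose(x1 @ A, l1 * x1 - l2 * x2)   # part (b)
    assert np.allclose(x2 @ A, l2 * x1 + l1 * x2)

    P = np.vstack([x1, x2])                 # the basis B = {x1, x2} of part (c)
    print(P @ A @ np.linalg.inv(P))         # [[l1, -l2], [l2, l1]]
    print(np.hypot(l1, l2))                 # r = sqrt(l1^2 + l2^2), part (e)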
<C> Abstraction and generalization
Read Secs. B.4, B.6 and B.7 if necessary and try to do the following
problems.

1. Extend Example 1 through Example 7 to linear operators on C², endowed with the natural basis N = {e1, e2}. Emphasize the similarities and differences between linear operators on R² and those on C², both algebraically and geometrically. For example, what happens to (3) in Example 7?
2. Try to find as many "basic" linear operators on Fⁿ, endowed with the natural basis N = {e1, . . . , en}, as possible. Is this an easy job if one has only a modest knowledge of the ranks and invariant subspaces of linear operators on Fⁿ? Why not read Secs. B.10, B.11 and B.12 if necessary.

2.7.3 Matrix representations of a linear operator in various bases
Recall that the definition (2.7.4) of a linear operator on R² is independent of any particular choice of basis for R². In Sec. 2.7.1 we considered a linear operator on R² as a real 2 × 2 matrix in the Cartesian basis N = {e1, e2}. But the various concrete examples in Sec. 2.7.2 indicated that N is not always the best choice of basis for investigating both the algebraic and the geometric properties of a linear operator. It is against this background that, here in this subsection, we formally introduce the matrix representations of a linear operator with respect to various bases for R².
Two matrix representations A and B of a linear operator with respect to two bases are similar, i.e. there exists an invertible matrix P, representing a change of coordinates (see Sec. 2.4), such that

    B = PAP⁻¹.    (2.7.20)

Similarity among matrices is an equivalence relation (see Sec. A.1) and it preserves

1. the rank of a linear operator or a square matrix,
2. the characteristic polynomial and hence the eigenvalues of a linear operator or a matrix,
3. the determinant of a matrix, and
4. the trace of a matrix.    (2.7.21)
Refer to Exs. 25, 27, 31 and 32 of Sec. B.4 for more related information. Also, we remind readers that for linear operators f, g on R², there are several related linear operations, mentioned as follows.

1. The addition g + f: (g + f)(x) = g(x) + f(x), x ∈ R².
2. The scalar multiplication αf: (αf)(x) = αf(x), α ∈ R.
3. The composite g ◦ f: (g ◦ f)(x) = g(f(x)), x ∈ R².
4. The inverse operator f⁻¹ (if f is one-to-one and onto).
For general reference, see Sec. A.2. Addition and scalar multiplication are also defined for linear transformations from a vector space V to a vector space W over the same field. Therefore, the set Hom(V, W) or L(V, W) of all such linear transformations forms a vector space over F (see Sec. B.7). As a consequence, the set Hom(V, V) of operators on V forms an associative algebra with identity, and the set GL(V, V) of invertible linear operators becomes a group under the composite operation. If dim V = m < ∞ and dim W = n < ∞, Hom(V, W) can be realized (under isomorphism) as M(m, n; F), the set of all m × n matrices over F, and GL(V, V) as GL(n; F), the set of all invertible matrices in M(n; F) = M(n, n; F). See Secs. B.4 and B.7.

Let us return to linear operators on R² and proceed.
Choose a pair of bases B = {a1, a2} and C = {b1, b2} for the vector space R². Suppose f: R² → R² is a linear operator. Let

    f(ai) = Σ²_{j=1} aij bj,  i = 1, 2,

i.e. [f(ai)]_C = (ai1, ai2), i = 1, 2. Therefore, for any vector x ∈ R²,

    x = Σ²_{i=1} xi ai
    ⇒ f(x) = Σ²_{i=1} xi f(ai) = Σ²_{i=1} xi (Σ²_{j=1} aij bj) = Σ²_{j=1} (Σ²_{i=1} xi aij) bj
    ⇒ (by (2.3.1) or (2.6.3) and the Note in (2.4.1))
      [f(x)]_C = (Σ²_{i=1} xi ai1, Σ²_{i=1} xi ai2) = (x1 x2)\begin{pmatrix} a11 & a12 \\ a21 & a22 \end{pmatrix} = [x]_B [f]_B^C,

where

    [f]_B^C = \begin{pmatrix} [f(a1)]_C \\ [f(a2)]_C \end{pmatrix} = \begin{pmatrix} a11 & a12 \\ a21 & a22 \end{pmatrix}    (2.7.22)

is called the matrix representation of the linear operator f with respect to the bases B and C.
Suppose g: R² → R² is another linear operator and D is a third basis for R². Then, according to (2.7.22),

    [g(x)]_D = [x]_C [g]_C^D.

Therefore, the composite linear operator g ◦ f: R² → R² satisfies

    [(g ◦ f)(x)]_D = [f(x)]_C [g]_C^D = [x]_B [f]_B^C [g]_C^D, x ∈ R²,

which implies

    [g ◦ f]_B^D = [f]_B^C [g]_C^D.    (2.7.22′)

In case f is a linear isomorphism and g = f⁻¹: R² → R² is the inverse isomorphism, take D = B in (2.7.22′); then

    [f]_B^C [f⁻¹]_C^B = [1_{R²}]_B^B = I2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
    [f⁻¹]_C^B [f]_B^C = [1_{R²}]_C^C = I2,

which means that [f]_B^C is invertible and

    ([f]_B^C)⁻¹ = [f⁻¹]_C^B.
Conversely, given a real matrix A = [aij]2×2, there exists a unique linear transformation f such that [f]_B^C = A. In fact, define f: R² → R² by

    f(ai) = Σ²_{j=1} aij bj, i = 1, 2,

and then, for x = Σ²_{i=1} xi ai, linearly by

    f(x) = Σ²_{i=1} xi f(ai).

It is easy to check that f is linear and that [f]_B^C = A holds.

Summarize as (refer to Sec. B.7 for more detailed and generalized results)

The matrix representation of a linear operator with respect to bases
Suppose B, C and D are bases for R², and f and g are linear operators on R². Then the matrix representation (2.7.22) has the following properties:

(a) [f(x)]_C = [x]_B [f]_B^C, x ∈ R².
(b) Linearity:

    [f + g]_B^C = [f]_B^C + [g]_B^C,
    [αf]_B^C = α[f]_B^C, α ∈ R.

(c) Composite transformation:

    [g ◦ f]_B^D = [f]_B^C [g]_C^D.

(d) Inverse transformation: f is a linear isomorphism, i.e. invertible, if and only if for any bases B and C the matrix [f]_B^C is invertible, and then

    ([f]_B^C)⁻¹ = [f⁻¹]_C^B.

(e) Change of coordinates: Suppose B′ and C′ are bases for R². Then (refer to (2.4.2))

    [f]_{B′}^{C′} = A_{B′}^B [f]_B^C A_C^{C′},

where A_{B′}^B is the transition matrix from the basis B′ to the basis B, and similarly for A_C^{C′}. So the following diagram is commutative:

                 [f]_B^C
    R²(B) ──────────────→ R²(C)
    A_{B′}^B ↑                ↓ A_C^{C′} = (A_{C′}^C)⁻¹
    R²(B′) ─────────────→ R²(C′)
                 [f]_{B′}^{C′}
                                                      (2.7.23)
In case C = B, we write

    [f]_B = [f]_B^B    (2.7.24)

in short. f is called diagonalizable if [f]_B is a diagonal matrix for some basis B for R². Then it follows from (e) that, if C = B and C′ = B′,

    [f]_{B′} = A_{B′}^B [f]_B (A_{B′}^B)⁻¹    (2.7.25)

holds. Then [f]_{B′} is said to be similar to [f]_B. Similarity among matrices is an important concept in the theory of linear algebra. It enables us to investigate the geometric behaviors of linear or affine transformations under different choices of bases. For concrete examples, see Sec. 2.7.2.
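The bookkeeping in (2.7.23)(e) and (2.7.25) is easy to mechanize. In the sketch below (an illustration of my own; NumPy, with an arbitrary sample operator and bases), a basis is stored as a matrix whose rows are its vectors, so that the transition matrix A_B^N is the matrix B itself:

    import numpy as np

    F = np.array([[2.0, 1.0],
                  [0.0, 3.0]])              # [f]_N, i.e. f(x) = x F
    B = np.array([[1.0, 1.0],
                  [0.0, 1.0]])              # rows: a basis B
    C = np.array([[1.0, 0.0],
                  [2.0, 1.0]])              # rows: another basis C

    F_B = B @ F @ np.linalg.inv(B)          # [f]_B = A_B^N [f]_N A_N^B
    F_C = C @ F @ np.linalg.inv(C)          # [f]_C

    A_CB = C @ np.linalg.inv(B)             # transition matrix from C to B
    # (2.7.25): [f]_C and [f]_B are similar via the transition matrix
    assert np.allclose(F_C, A_CB @ F_B @ np.linalg.inv(A_CB))

    # similar matrices share trace and determinant, cf. (2.7.21)
    print(np.trace(F_B), np.trace(F_C), np.linalg.det(F_B), np.linalg.det(F_C))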
Actually, (b) and (d) in (2.7.23) tell us implicitly more information, but in an abstract setting.
Let the set of linear operators on R² be denoted by

    Hom(R², R²) or L(R², R²) = {f: R² → R² | f is a linear operator},    (2.7.26)

which is a vector space over R. As usual, let N = {e1, e2} be the natural basis for R².
Four specified linear operators fij: R² → R², 1 ≤ i, j ≤ 2, are defined as follows:

    fij(ek) = ej if k = i;  fij(ek) = 0 if k ≠ i,    (2.7.27)

and fij(x) = fij(x1, x2) = x1fij(e1) + x2fij(e2) linearly. If f ∈ L(R², R²), let f(ek) = Σ²_{j=1} akj ej. Then, for x = (x1, x2) = Σ²_{k=1} xk ek,

    f(x) = Σ²_{k=1} xk f(ek) = Σ²_{k=1} xk (Σ²_{j=1} akj ej) = Σ²_{k=1} Σ²_{j=1} akj xk ej
         = Σ²_{k=1} Σ²_{j=1} akj fkj(x)
    ⇒ f = Σ²_{k=1} Σ²_{j=1} akj fkj.    (2.7.28)

This means that any operator f is a linear combination of the fkj, 1 ≤ k, j ≤ 2, where the fkj are linearly independent, i.e.

    Σ²_{k=1} Σ²_{j=1} akj fkj = 0 (the zero operator) ⇔ akj = 0, 1 ≤ k, j ≤ 2.

Thus, {f11, f12, f21, f22} forms a basis for Hom(R², R²), and therefore Hom(R², R²) is a 4-dimensional real vector space (refer to Sec. B.3).
Corresponding to (2.7.26), let the set of real 2 × 2 matrices be denoted by

    M(2; R) = {A = [aij]2×2 | aij ∈ R, 1 ≤ i, j ≤ 2},    (2.7.29)

which is also a real vector space with matrix addition and scalar multiplication: for B = [bij]2×2 and α ∈ R,

    A + B = [aij + bij];
    αA = [αaij].

Let

    E11 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, E12 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, E21 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, E22 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.    (2.7.30)
Note that the fij defined in (2.7.27) have matrix representations

    [fij]_N = Eij, 1 ≤ i, j ≤ 2.

For any A = [aij] ∈ M(2; R), we have

    A = Σ²_{i=1} Σ²_{j=1} aij Eij    (2.7.31)

and A = O (the zero matrix) if and only if aij = 0, 1 ≤ i, j ≤ 2. Thus, {E11, E12, E21, E22} forms a basis for M(2; R), which hence is a 4-dimensional real vector space (refer to Sec. B.3).
Now, (b)–(d) in (2.7.23) guarantee

The isomorphism between Hom(R², R²) and M(2; R)
In the natural basis N = {e1, e2} for R², define the mapping Φ: Hom(R², R²) → M(2; R) by

    Φ(f) = [f]_N.

Then Φ is a linear isomorphism, i.e. Φ is one-to-one and onto and preserves the linear operations:

1. Φ(f + g) = Φ(f) + Φ(g),
2. Φ(αf) = αΦ(f), for α ∈ R and f, g ∈ Hom(R², R²).

Moreover, this Φ induces an algebra isomorphism between the associative algebra Hom(R², R²) and the associative algebra M(2; R), i.e.

1. Φ(g ◦ f) = Φ(f)Φ(g),
2. Φ(1_{R²}) = I2,

where 1_{R²}: R² → R² is the identity operator.    (2.7.32)


The vector space Hom(R², R²) has an important subset:

    GL(R², R²),    (2.7.33)

the set of all invertible linear operators on R². Under the composition of functions, GL(R², R²) forms a group (see Sec. A.4), i.e. for each pair f, g ∈ GL(R², R²), f ◦ g ∈ GL(R², R²) always holds and the following properties are satisfied: for all f, g, h ∈ GL(R², R²),

1. h ◦ (g ◦ f) = (h ◦ g) ◦ f.
2. 1_{R²} ◦ f = f ◦ 1_{R²} = f, where 1_{R²} is the identity operator on R².
3. There exists a unique element f⁻¹ ∈ GL(R², R²) such that

    f ◦ f⁻¹ = f⁻¹ ◦ f = 1_{R²}.

Similarly, in the vector space M(2; R), the set

    GL(2; R)    (2.7.34)

of all invertible matrices forms a group under the operation of matrix multiplication. These two groups are essentially the same one in the following sense.

The isomorphism between GL(R², R²) and GL(2; R)
The linear (or algebra) isomorphism Φ in (2.7.32) induces a group isomorphism between GL(R², R²) and GL(2; R), i.e. for each f, g ∈ GL(R², R²),

1. Φ(g ◦ f) = Φ(f)Φ(g),
2. Φ(1_{R²}) = I2,
3. Φ(f⁻¹) = Φ(f)⁻¹.

Hence, both are called the real general linear group on R².    (2.7.35)

Notice that, in terms of another basis B for R², GL(R², R²) is group isomorphic to the conjugate group

    P GL(2; R) P⁻¹ = {PAP⁻¹ | A ∈ GL(2; R)}    (2.7.36)

of GL(2; R), where P is the transition matrix from B to N.
Here are some notations to be used throughout the whole text. For f ∈ Hom(R², R²), A ∈ M(2; R) and n a positive integer,

    f⁰ = 1_{R²},
    f² = f ◦ f (composite of functions),
    fⁿ = f ◦ f^{n−1} if n ≥ 2,
    f^{−n} = (f⁻¹)ⁿ if f is invertible,

and correspondingly,

    A⁰ = I2,
    A² = A · A (matrix multiplication),
    Aⁿ = A · A^{n−1} if n ≥ 2,
    A^{−n} = (A⁻¹)ⁿ if A is invertible.    (2.7.37)

These are still valid for Hom(V, V) and M(n; F).
As a conclusion up to this point, (2.7.23), (2.7.32) and (2.7.35) all together indicate that the study of linear operators on R² can be reduced to the study of real 2 × 2 matrices in a fixed basis for R², as we did in Secs. 2.7.1 and 2.7.2, but emphasis should be placed on the possible connections among the various representations of a single linear operator owing to different choices of bases. That is, similarity of matrices plays an essential role in passing between a linear operator and its various matrix representations.
In what remains of this subsection, we are going to find out some invariants under the operation of similarity of matrices.
The easiest one among them is the determinant of a square matrix. Suppose A ∈ M(2; R) and P ∈ GL(2; R). Then (2.4.2) shows

The invariance of the determinant of a square matrix under similarity

    det PAP⁻¹ = det A.

Hence, the determinant of a linear operator f on R² is well-defined as

    det f = det[f]_B,

where B is any fixed basis for R² (for the geometric meaning, see (2.8.44)).    (2.7.38)

Owing to

    PAP⁻¹ − tI2 = P(A − tI2)P⁻¹,

it comes immediately

The invariance of the characteristic polynomial and eigenvalues of a square matrix under similarity

    det(PAP⁻¹ − tI2) = det(A − tI2),

and hence PAP⁻¹ and A have the same set of eigenvalues. Hence, the characteristic polynomial of a linear operator f on R² is

    det(f − t1_{R²}) = det([f]_B − tI2),

where B is any fixed basis for R².    (2.7.39)


Notice that

    xA = λx for x ∈ R²
    ⇔ (xP⁻¹)(PAP⁻¹) = λ(xP⁻¹).    (2.7.40)

This means that if x is an eigenvector of A (or of an operator) associated with the eigenvalue λ in the natural coordinate system N, then xP⁻¹ is the corresponding eigenvector of PAP⁻¹ associated with the same eigenvalue λ in any fixed basis B for R², where P is the transition matrix from B to N and xP⁻¹ = [x]_B. See the diagram:

                A
    R²(N) ─────────→ R²(N)
      P ↑                ↓ P⁻¹
    R²(B) ─────────→ R²(B)
              PAP⁻¹

How about the trace of a square matrix?
Suppose A = [aij]2×2 and B = [bij]2×2. Actual computation shows that

    tr(AB) = (a11b11 + a12b21) + (a21b12 + a22b22)
           = (b11a11 + b12a21) + (b21a12 + b22a22)
           = tr(BA).    (2.7.41)

Suppose P2×2 is invertible. Letting PA replace A and P⁻¹ replace B in the above equality, we obtain

The invariance of the trace of a square matrix under similarity

    tr(PAP⁻¹) = tr A.

Hence, the trace of a linear operator f on R² is well-defined as

    tr f = tr[f]_B,

where B is any fixed basis for R².    (2.7.42)

For more properties concerned with trace, refer to Exs. 25–30 of Sec. B.4.
The invariance of trace can be proved indirectly from the invariance of
characteristic polynomial as (2.7.12) shows.
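All three invariance statements, (2.7.38), (2.7.39) and (2.7.42), can be spot-checked at once. A sketch of my own (NumPy; A and the invertible P are arbitrary samples):

    import numpy as np

    A = np.array([[1.0, 4.0],
                  [-2.0, 3.0]])
    P = np.array([[1.0, 2.0],
                  [1.0, 3.0]])              # det P = 1, so P is invertible
    S = P @ A @ np.linalg.inv(P)            # a matrix similar to A

    assert np.isclose(np.linalg.det(S), np.linalg.det(A))    # (2.7.38)
    assert np.isclose(np.trace(S), np.trace(A))              # (2.7.42)
    # (2.7.39): same characteristic polynomial, hence the same eigenvalues
    assert np.allclose(np.sort_complex(np.linalg.eigvals(S)),
                       np.sort_complex(np.linalg.eigvals(A)))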
Finally, how about the rank of an operator or square matrix? The rank we defined for a linear operator or matrix in Sec. 2.7.1 is, strictly speaking, not yet well-defined, since we still do not know whether this nonnegative integer remains unchanged under changes of bases.
Let A and B be two real 2 × 2 matrices treated as linear operators on R² (see (2.7.7)). Without loss of generality, both may be supposed to be nonzero matrices.
In case r(A) = 1, i.e. dim Im(A) = 1, two separate cases are considered as follows.

Case 1 Let r(B) = 1; then Im(AB) = {0} or Im(AB) = ⟨⟨v⟩⟩ with v ≠ 0, according as Im(A) = Ker(B) or not. Hence,

    r(AB) = 0 ⇔ Im(A) = Ker(B);
    r(AB) = r(A) = r(B) = 1 ⇔ Im(A) ∩ Ker(B) = {0}.

Case 2 Let r(B) = 2; then r(AB) = r(A) = 1 < 2 = r(B).

In case r(A) = 2, still two separate cases are considered as follows.

Case 1 If r(B) = 1, then r(AB) = 1 holds.
Case 2 If r(B) = 2, then r(AB) = 2 holds.
Summarize the above as

The lower and upper bounds for the rank of the product of two matrices
Let A2×2 and B2×2 be two real matrices.

(1) If either A or B is zero, then

    r(AB) = 0 ≤ min{r(A), r(B)}.

(2) If both A and B are nonzero,

    r(A) + r(B) − 2 ≤ r(AB) ≤ min{r(A), r(B)},

where

    r(AB) = r(A) + r(B) − 2 ⇔ Ker(B) ⊆ Im(A),
    r(AB) = r(A) ⇔ Im(A) ∩ Ker(B) = {0},
    r(AB) = r(B) ⇔ R² = Im(A) + Ker(B).    (2.7.43)

These results are still suitable for matrices Ak×m and Bm×n over fields,
with m replacing 2 (see Ex. 3 of Sec. B.7). Try to reprove (2.7.43) by using
the dimension theorem stated in (2.7.8) and then try to prove these results
for general matrices (see Ex. <C>).
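Before reproving (2.7.43) via the dimension theorem, one can at least test the two-sided bound by brute force. A sketch of my own (NumPy; small random integer matrices):

    import numpy as np

    rng = np.random.default_rng(1)
    for _ in range(1000):
        A = rng.integers(-2, 3, (2, 2))
        B = rng.integers(-2, 3, (2, 2))
        rA, rB = np.linalg.matrix_rank(A), np.linalg.matrix_rank(B)
        rAB = np.linalg.matrix_rank(A @ B)
        if rA > 0 and rB > 0:               # the bound as stated for nonzero A, B
            assert rA + rB - 2 <= rAB <= min(rA, rB)
        else:
            assert rAB == 0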
As an easy consequence of (2.7.43), we have
The invariance of the rank of a matrix under similarity


Let A2×2 be a real matrix.

(1) If P2×2 is invertible, then P preserves the rank of A, i.e.

r(P A) = r(AP ) = r(A).

(2) Hence, if P is invertible,

r(PAP −1 ) = r(A).

The rank of a linear operator f is well-defined as

    r(f) = r([f]_B),

where B is any fixed basis for R².    (2.7.44)

These results are still usable for arbitrary matrices over fields. Readers are
urged to prove (2.7.44) directly without recourse to (2.7.43).
For the sake of reference and generalization, we introduce four closely
related subspaces associated with a given matrix.
Let A = [aij]2×2 be a real matrix. By interchanging the rows and columns of A, the resulting matrix, denoted as

    A*,    (2.7.45)

is called the transpose of A (see Sec. B.4).
Then, associated with A are the following four subspaces:

    Im(A) or R(A) = {xA | x ∈ R²}
                  = the subspace of R² generated by the row vectors A1* = (a11, a12) and A2* = (a21, a22) of A,
    Ker(A) or N(A) = {x ∈ R² | xA = 0},    (2.7.46)

called respectively the row space and the left kernel space of A, while

    Im(A*) or R(A*) = {xA* | x ∈ R²}
                    = the subspace of R² generated by the column vectors A*1 = (a11, a21)* and A*2 = (a12, a22)* of A,
    Ker(A*) or N(A*) = {x ∈ R² | xA* = 0},    (2.7.47)

called respectively the column space and the right kernel space of A.
Also, define the

    row rank of A = dim Im(A)
                  = the maximal number of linearly independent row vectors of A,
    column rank of A = dim Im(A*)
                  = the maximal number of linearly independent column vectors of A.    (2.7.48)

Note that the row rank of A is the rank of A we defined in Sec. 2.7.1 and (2.7.44), sometimes called the geometric rank of A. As a contrast, the algebraic rank of A is defined to be the largest integer r such that some r × r submatrix of A has a nonzero determinant (see Sec. B.6), if A itself is a nonzero matrix.
Then (2.7.8) says

The equalities of the three ranks of a matrix
Let A2×2 be a real matrix.

(1) If A = O2×2, the zero matrix, then the rank of A is defined as r(A) = 0.
(2) If A ≠ O2×2, then the rank r(A) satisfies

    the row rank of A = the column rank of A = the algebraic rank of A = r(A).

Notice that 1 ≤ r(A) ≤ 2.    (2.7.49)

What we have said here from (2.7.46) to (2.7.49) is still true for arbitrary m × n matrices A over a field F (see Ex. 15 of Sec. B.4, Secs. B.5 and B.6, Ex. 2 of Sec. B.7 and Sec. B.8): 0 ≤ r(A) ≤ min(m, n).
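The three ranks in (2.7.49) can be computed separately and compared. A sketch of my own (NumPy; the rank-one sample matrix is an arbitrary choice):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])              # a rank-one sample

    row_rank = np.linalg.matrix_rank(A)     # dim of the row space Im(A)
    col_rank = np.linalg.matrix_rank(A.T)   # dim of the column space Im(A*)

    # algebraic rank: largest r with some r x r submatrix of nonzero determinant
    alg_rank = 0
    if np.any(np.abs(A) > 1e-12):
        alg_rank = 1                        # some 1 x 1 submatrix is nonzero
    if abs(np.linalg.det(A)) > 1e-12:
        alg_rank = 2                        # the only 2 x 2 submatrix is A itself
    assert row_rank == col_rank == alg_rank == 1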

Exercises
<A>
1. Let the linear operator f on R² be defined as

       f(x) = x \begin{pmatrix} −6 & 5 \\ 1 & 3 \end{pmatrix}

   in N = {e1, e2}. Let B = {(−1, 1), (1, 1)} and C = {(2, −3), (4, −2)}.
   (a) Compute the transition matrices A_B^C and A_C^B, and justify that (A_B^C)⁻¹ = A_C^B by direct computation. Also A_B^C = A_B^N A_N^C.
   (b) Compute [f]_B and [f]_C and show that they are similar by finding an invertible matrix P2×2 such that [f]_C = P[f]_B P⁻¹.
   (c) Compute [f]_B^N and [f]_C^N, and show that [f]_B^N = A_B^C [f]_C^N.
   (d) Compute [f]_B^C and [f]_C^B, and show that [f]_C^B = A_C^B [f]_B^C A_C^B.
   (e) Show that [f]_B^C = A_B^N [f]_N A_N^C.
   (f) Show that [f]_B^C = A_B^N [f]_N^C = [f]_B^N A_N^C.

2. Let

       f(x) = x \begin{pmatrix} 2 & 3 \\ −4 & −6 \end{pmatrix}

   and B = {(1, −1), (−2, 1)} and C = {(−1, −2), (3, −4)}. Do the same questions as in Ex. 1.
3. Find a linear operator f on R² and a basis B for R² such that

       [f(x)]_B = [x]_N \begin{pmatrix} −1 & 1 \\ 1 & 1 \end{pmatrix}

   for x ∈ R². How many such f and B could we find?
4. Find the linear operator f on R² and a basis B for R² such that

       [f]_B = \begin{pmatrix} v1 \\ v2 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ −1 & 4 \end{pmatrix} \begin{pmatrix} v1 \\ v2 \end{pmatrix}⁻¹,

   where v1 = (1, −1) and v2 = (−5, 6). How many possible choices for B are there?
5. Let

       A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.

   Does there exist a linear operator f on R² and two bases B and C for R² such that [f]_B = I2 and [f]_C = A? Note that both I2 and A have the same characteristic polynomial (t − 1)², and hence the same set of eigenvalues.
6. Let

       A = \begin{pmatrix} 1 & 1 \\ −1 & 1 \end{pmatrix} and B = \begin{pmatrix} 1 & 2 \\ −1 & 0 \end{pmatrix}.

   (a) Give as many different reasons as possible to justify that there does not exist a linear operator f on R² and two bases B and C for R² such that [f]_B = A and [f]_C = B.
   (b) Find some invertible matrices P2×2 and Q2×2 such that PAQ = B.
7. For any linear operator f on R², there exist a basis B = {x1, x2} and another basis C = {y1, y2} for R² such that

       \begin{pmatrix} x1 \\ x2 \end{pmatrix} [f]_N \begin{pmatrix} y1 \\ y2 \end{pmatrix}⁻¹ = [f]_B^C = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} or \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} or \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
8. Find nonzero matrices A2×2 and B2×2 such that AB has each possible
rank.
9. Suppose A2×2 and B2×2 are matrices (or n × n matrices). Then

r(AB) = 2 ⇔ r(A) = r(B) = 2.

10. Let A be a 2 × 2 real matrix (or any n × n matrix). Then r(A) = 1 if


and only if there exist matrices B2×2 and C2×2 , each of rank 1, such
that A = BC.
11. Let A be a 2 × 2 real matrix (or any n × n matrix). Prove that the
following are equivalent.
(1) A is invertible (and (A−1 )−1 = A).
(2) A∗ is invertible (and hence, (A∗ )−1 = (A−1 )∗ ).
(3) There exists a real 2 × 2 matrix B such that AB = I2 (and hence
B = A−1 ).
(4) There exists a real 2 × 2 matrix B such that BA = I2 (and hence
B = A−1 ).
(5) The matrix equation AX = O2×2 always implies that X = O.
Compare with (4) in (2.7.8) and refer to Ex. 16 of Sec. B.4.
12. Let A and B be 2 × 2 real matrices (or any n × n matrices). Prove that
AB is invertible if and only if A and B are invertible, and

(AB)−1 = B −1 A−1 .

Compare with Ex. 9. What happens for Am×n and Bn×m with m ≠ n?


13. Let A be a 2 × 2 real matrix (or any n × n matrix).
    (a) Show that A is invertible if and only if A does not have zero eigenvalues.
    (b) A and A* have the same characteristic polynomial and hence the same eigenvalues.
    (c) Suppose xA = λx and A is invertible; then xA⁻¹ = (1/λ)x, i.e. 1/λ is an eigenvalue of A⁻¹ associated with the same eigenvector x.
    (d) Suppose xA = λx; then xAᵖ = λᵖx for every integer p ≥ 1.
    (e) Let g(t) = aktᵏ + · · · + a1t + a0 be any polynomial with real coefficients. Define

        g(A) = akAᵏ + · · · + a1A + a0I2.

    Suppose xA = λx; then

        x g(A) = g(λ)x.

    Refer to the Exercises of Sec. B.10.
14. Let A and B be 2 × 2 real matrices (or any n × n matrices). Show that
r(A + B) ≤ r(A) + r(B)
with equality if and only if Im(A+B) = Im(A)⊕Im(B). Give examples
of 2 × 2 nonzero matrices A and B so that A + B has each possible
rank.
15. Let A be a 2 × 2 real matrix (or any n × n matrix). Show that there
exists another 2 × 2 real matrix B such that
(1) BA = O, and
(2) r(A) + r(B) = dim R2 = 2.
16. Suppose A is a 2 × 2 real matrix (or any n × n matrix) such that
r(A2 ) = r(A). Show that

Im(A) ∩ Ker(A) = { 0 }.
17. Let A be a 2 × 2 real matrix (or any n × n matrix) such that r(A) = 1. Show that there exists a unique scalar λ ∈ R such that A² = λA, and that I2 − A is invertible if λ ≠ 1.
18. Let A be a 2 × 2 real matrix. Show that
r(A) = r(A∗ A) = r(AA∗ )
by testing case by case according to the rank of A. But, this equality
still holds for any m×n matrix A over a field F with characteristic zero.
Do you have any idea to prove this? Geometrically or algebraically?
19. Let A, B and C be 2 × 2 real matrices. Show that
r(AB) + r(BC) ≤ r(B) + r(ABC)
by testing case by case according to the ranks of A, B and C. Frobenius obtained this inequality in 1911, and it is still true for arbitrary matrices Ap×m, Bm×n and Cn×q. Any ideas how to prove this? See Ex. <C> 11.
20. We say two vectors x and y in R² are orthogonal if xy* = 0, denoted as x ⊥ y. For a vector subspace S of R², let

        S⊥ = {x ∈ R² | x ⊥ y for each y ∈ S}.

    Then S⊥ is a subspace of R² and is called the orthogonal complement of S in R². For any nonzero real 2 × 2 matrix A, show, both geometrically and algebraically, that
    (1) Im(A)⊥ = Ker(A*),
    (2) Ker(A*)⊥ = Im(A) and R² = Ker(A*) ⊕ Im(A),
    (3) Im(A*)⊥ = Ker(A),
    (4) Ker(A)⊥ = Im(A*) and R² = Ker(A) ⊕ Im(A*).
    Use the matrices

        \begin{pmatrix} −5 & 2 \\ 15 & −6 \end{pmatrix}, \begin{pmatrix} 3 & −2 \\ 4 & 6 \end{pmatrix}

    to justify the above results. Also, prove that
    (5) Ker(AA*) = Ker(A) and hence r(A) = r(AA*) (see Ex. 18).
    All these results are true for an arbitrary m × n matrix A over the real field R (for details, see Sec. B.8). Any ideas how to prove them?
21. Let A be a real 2 × 2 matrix (or any n × n matrix over a field F, not of
characteristic equal to 2). Show that
(a) A is idempotent (see Ex. <B> 2 of Sec. 2.7.2), i.e. A2 = A if and
only if r(A) + r(I2 − A) = 2.
(b) A is involutory (see Ex. <B> 3 of Sec. 2.7.2), i.e. A2 = I2 if and
only if r(I2 + A) + r(I2 − A) = 2.

<B>

1. Both geometrically and algebraically, do the following problems:
   (a) \begin{pmatrix} λ & 0 \\ 0 & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ 0 & λ \end{pmatrix}, where λ ≠ 0, are similar.
   (b) What happens to \begin{pmatrix} λ1 & 0 \\ 0 & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ 0 & λ2 \end{pmatrix} for similarity if λ1 and λ2 are different?
   (c) \begin{pmatrix} 0 & λ \\ 0 & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ λ & 0 \end{pmatrix} are similar if λ ≠ 0.
   (d) \begin{pmatrix} λ & 0 \\ 0 & 0 \end{pmatrix} and \begin{pmatrix} 0 & λ \\ 0 & 0 \end{pmatrix} are not similar if λ ≠ 0.
2. Let

       I2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, E = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},
       F = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, G = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, H = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}.

   Adopt the Cartesian coordinate system N = {e1, e2} in R².
   (a) Explain geometrically and algebraically why it is impossible for I2 and J to be similar. Instead, is it possible that

       I2 = PJP*

   for an invertible P? How about I2 = PJQ for invertible P and Q?
   (b) Show that E, F, G and H are never similar to I2.
   (c) Figure 2.56(a) and (b) explains graphically the mapping properties of E and F, respectively. Both are projections (see Sec. 2.7.4).

   [Fig. 2.56: the projections E (a) and F (b), each with image the invariant line x1 = x2]

   Do you have a strong feeling that they should be similar? Yes, they are! Could you see directly from Fig. 2.56 that

       JFJ = JFJ⁻¹ = E?

   Try to find invertible matrices P and Q such that

       PEP⁻¹ = QFQ⁻¹ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

   Is it necessary that J = P⁻¹Q?
   (d) Look at Fig. 2.57(a) and (b). Explain geometrically and algebraically why G and H are similar. Also,

       JHJ = JHJ⁻¹ = G.

   [Fig. 2.57: the projections G (a) and H (b), with kernels the line x1 + x2 = 0]

   Find invertible matrices R and S such that

       RGR⁻¹ = SHS⁻¹ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

   (e) Therefore, E, F, G and H are similar to each other. Explain this both geometrically and algebraically.
   (f) Determine if the matrices

       \begin{pmatrix} −1 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & −1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}

   are similar. Justify your answers graphically.
   (g) Determine if the matrices

       \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} a & 0 \\ b & 0 \end{pmatrix}, \begin{pmatrix} 0 & b \\ 0 & a \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ b & a \end{pmatrix}, where ab ≠ 0,

   are similar.
3. Let

       M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, N = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, V = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.

   Explain geometrically and algebraically the following statements.
   (a) M and N are similar.
   (b) U and V are similar.
   (c) M and U are not similar.
4. Similarity of two real 2 × 2 matrices A and B.
   (a) Suppose both A and B have distinct eigenvalues. Then
       (1) A is similar to B.
       ⇔ (2) A and B have the same eigenvalues λ1 and λ2, λ1 ≠ λ2.
       ⇔ (3) A and B are similar to

           \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix}.

   (b) Suppose both A and B are non-diagonalizable matrices with coincident eigenvalues. Then
       (1) A is similar to B.
       ⇔ (2) A and B have the same eigenvalues λ, λ.
       ⇔ (3) A and B are similar to, i.e. have the same, Jordan canonical form (see Sec. 2.7.7 if necessary)

           \begin{pmatrix} λ & 0 \\ 1 & λ \end{pmatrix}.

   (c) Suppose A and B do not have real eigenvalues. Then
       (1) A is similar to B.
       ⇔ (2) A and B have the same characteristic polynomial t² + a1t + a0 with a1² − 4a0 < 0.
       ⇔ (3) A and B have the same rational canonical form (see Sec. 2.7.8 if necessary)

           \begin{pmatrix} 0 & 1 \\ −a0 & −a1 \end{pmatrix}.

   Try to explain these results graphically (refer to Figs. 2.46–2.55).
5. Let A and B be two real 2 × 2 matrices.
   (a) Show that AB and BA have the same characteristic polynomial,

       det(AB − tI2) = det(BA − tI2),

   and hence the same eigenvalues, if any.
   (b) Let

       A = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, B = \begin{pmatrix} 0 & −1 \\ 0 & 1 \end{pmatrix}.

   Show that AB and BA are not similar.
   (c) If either A or B is invertible, then AB and BA are similar.

<C> Abstraction and generalization

All the materials in this section can be generalized, verbatim or with minor but suitable changes, to abstract finite-dimensional vector spaces over a field or to m × n matrices over a field. Minded readers will be able to find their counterparts widely scattered throughout Appendix B. The main difficulty we encounter is that the methods adopted here are, on many occasions, far too insufficient and ineffective to provide proofs for these generalized and abstract extensions. More training, such as experience with linear operators on R³ and 3 × 3 matrices (see Chap. 3), and more sophisticated backgrounds are indispensable.
For example, the case-by-case examination in the proof of (2.7.43) is hardly suitable to prove the generalized results for Ak×m and Bm×n. On the other hand, the dimension theorem stated in (1) of (2.7.8) might be a hopeful alternative, a method that seemed roundabout and tedious in treating such a simple case as A2×2 and B2×2. Let us try it now for matrices Ak×m and Bm×n over an arbitrary field F. Consider A and B as linear transformations defined as

    x ∈ Fᵏ ──A──→ xA ∈ Fᵐ ──B──→ (xA)B = x(AB) ∈ Fⁿ.

Then, watch the steps (and better think geometrically at the same time):

    r(AB) = dim Im(AB) = dim B(Im(A))
          = dim Im(A) − dim(Im(A) ∩ Ker(B))
          ≥ dim Im(A) − dim Ker(B)
          = dim Im(A) − (m − dim Im(B))
          = dim Im(A) + dim Im(B) − m
          = r(A) + r(B) − m.

Can you follow it and give precise reasons for each step? Or do you get lost? Where? How? If the answer is positive, try to do the following problems and you will have a good chance of success.
134 The Two-Dimensional Real Vector Space R2

1. Generalize (2.7.23) through (2.7.37) to linear transformations from a


finite-dimensional vector space V to another finite-dimensional vector
space W and M(m, n; F).
2. Generalize (2.7.38) through (2.7.49) to suitable linear operators on Fn
or n × n matrices over a field F and prove them.
3. Do Exs. <A> 9 through 21, except 19, for general cases stated in the
parentheses.
4. Let Ak×m and Bm×n be such that r(AB) = m. Show that r(A) =
r(B) = m. Is the converse true?
5. Let Bk×n be a k × n sub-matrix of Am×n obtained from A by deleting
(m − k) row vectors. Show that
r(B) ≥ r(A) + k − m.
6. Suppose An×n has at least n2 − n + 1 entries equal to zero. Show that
r(A) ≤ n − 1 and show an example that r(A) = n − 1 is possible.
7. Suppose Am×n and Bm×p are combined to form a partitioned matrix
[A B]m×(n+p) (see Ex. <C> of Sec. 2.7.5). Show that
r([A B]) ≤ r(A) + r(B)
and the equality is possible.
8. Let adj A denote the adjoint matrix of An×n (see (3.3.2) or Sec. B.6). Show that
   (1) r(A) = n ⇔ r(adj A) = n. In this case, det(adj A) = (det A)^{n−1}.
   (2) r(A) = n − 1 ⇔ r(adj A) = 1.
   (3) r(A) ≤ n − 2 ⇔ r(adj A) = 0.
   (4) adj(adj A) = (det A)^{n−2} A if n > 2; adj(adj A) = A if n = 2.
9. For Ak×m and Bm×n, show that r(AB) = r(A) if and only if x(AB) = 0 always implies xA = 0, where x ∈ Fᵏ (refer to Ex. 11).
10. Let Ak×m , Bm×n , and Cl×k be such that r(A) = r(AB). Show that
r(CA) = r(CAB) (refer to Ex. 9).
11. For Ak×m, Bm×n and Cn×p:
    (a) Show that, for x ∈ Fᵏ,

        dim{xA ∈ Fᵐ | xAB = 0} = [k − r(AB)] − [k − r(A)] = r(A) − r(AB).

        Then, try to deduce that r(A) + r(B) − m ≤ r(AB). When does equality hold?
    (b) Show that Im(AB) ∩ Ker(C) ⊆ Im(B) ∩ Ker(C).
    (c) Hence, show that

        r(AB) + r(BC) ≤ r(B) + r(ABC).

        When will equality hold?


12. Let A ∈ M(m, n; F) and B ∈ M(m, p; F).
(a) Construct a homogeneous system of linear equations whose solution
space is Ker(A) ∩ Ker(B).
(b) Construct a homogeneous system of linear equations whose solution
space is Ker(A) + Ker(B).
13. Let Am×n , Bp×m and Cq×m satisfy

r(A) + r(B) = m, BA = Op×n , CA = Oq×n .

Then Im(B) = Ker(A). Show that there exists Dq×p such that C = DB ,
and such a D is unique if and only if r(B) = p.
14. Try to do Ex. 10 of Sec. B.7.
15. Try to do Ex. 11 of Sec. B.7.
16. Try to do Ex. 12 of Sec. B.7.

2.7.4 Linear transformations (operators)

This subsection is devoted to the study of some specified linear transformations between R and R², and specially linear operators on R². All the results obtained can be easily extended, both in content and in proofs, to linear transformations or operators from a finite-dimensional vector space V to another finite-dimensional vector space W over the same field F, but not necessarily of the same dimension.
The feature is that we will try our best to obtain these results without recourse to any particular choice of bases for the space, i.e. to the matrix representations of the linear transformations (or operators). Then, for comparison, we will write these results in their corresponding matrix forms.
Suppose V1 and V2 are subspaces of R² such that R² = V1 ⊕ V2 holds; hence each x ∈ R² can be uniquely expressed as x = x1 + x2, where x1 ∈ V1 and x2 ∈ V2. A linear operator f: R² → R² is called a projection of R² onto V1 along V2 if

    f(x) = x1    (2.7.50)

for each x ∈ R² (refer to Ex. 13 and Fig. B.3 of Sec. B.7 for the definition in abstract spaces).
136 The Two-Dimensional Real Vector Space R2


Of course, the zero linear operator is the projection of R² onto {0} along the whole space R². The identity operator 1_{R²} is the projection of R² onto itself along {0}. Now, we have

The equivalent criteria of a projection on R²
Let f be a nonzero linear operator on R² with rank 1.

(1) f is a projection of R² onto V1 along V2.
⇔ (2) r(f) + r(1_{R²} − f) = 2.
⇔ (3) 1_{R²} − f is a projection of R² onto V2 along V1.
⇔ (4) f² = f (and hence f is the projection of R² onto Im(f) along Ker(f)), i.e. f is idempotent.
⇔ (5) f has eigenvalues 1 and 0.
⇔ (6) There exists a basis B = {x1, x2} such that

    [f]_B = P[f]_N P⁻¹ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, P = \begin{pmatrix} x1 \\ x2 \end{pmatrix},

where, as usual, N = {e1, e2} is the natural basis for R². See Fig. 2.58.    (2.7.51)

[Fig. 2.58: the projection f(x) of x onto V1 along V2]

These results still hold for projections on a finite-dimensional vector space V, and (1), (3) and (4) are also true even for infinite-dimensional spaces.
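Criterion (6) of (2.7.51) also tells us how to write down a projection explicitly: put the two direction vectors into the rows of P and conjugate diag(1, 0) back. A sketch of my own (NumPy; the two sample direction vectors are arbitrary, x1 spanning the image V1 and x2 the kernel V2):

    import numpy as np

    x1 = np.array([1.0, 1.0])               # V1 = <<x1>>, the image
    x2 = np.array([1.0, -2.0])              # V2 = <<x2>>, the kernel
    P = np.vstack([x1, x2])

    # [f]_B = diag(1, 0) means [f]_N = P^{-1} diag(1, 0) P
    F = np.linalg.inv(P) @ np.diag([1.0, 0.0]) @ P

    assert np.allclose(F @ F, F)            # idempotent: criterion (4)
    assert np.allclose(x1 @ F, x1)          # fixes V1 pointwise
    assert np.allclose(x2 @ F, 0.0)         # kills V2
    x = np.array([3.0, 5.0])
    print(x @ F)                            # the V1-component of x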

Proof (1) ⇔ (2) ⇔ (3) are trivial.
(1) ⇔ (4): For each x ∈ R², f(x) ∈ Im(f) = V1, and hence f(x) = f(x) + 0 shows that f(f(x)) = f(x). This means that f² = f holds. Conversely, f² = f implies that f(f(x)) = f(x). Since r(f) = 1, let Im(f) = ⟨⟨x1⟩⟩ and Ker(f) = ⟨⟨x2⟩⟩. Then f is the projection of R² onto Im(f) along Ker(f).
(1) ⇒ (5): Since f(x1) = x1 for x1 ∈ V1, and f(x2) = 0 = 0 · x2 for x2 ∈ V2, f has only the eigenvalues 1 and 0.
(5) ⇔ (6): Obviously.
(6) ⇔ (4): Notice that [f]_B² = [f]_B ⇔ [f]_N² = [f]_N ⇔ f² = f.

Suppose f is a linear operator on R².
If the rank r(f) = 2, then f is invertible and f⁻¹ ◦ f = 1_{R²}, the identity operator on R², which is a trivial projection.
In case r(f) = 1, let B = {x1, x2} be a basis for R² such that f(x1) ≠ 0 and f(x2) = 0. Let γ = {y1, y2} be a basis for R² where y1 = f(x1). Define a linear operator g: R² → R² such that

    g(yi) = xi, i = 1, 2.

Therefore,

    (g ◦ f)(x1) = g(f(x1)) = g(y1) = x1
    ⇒ (g ◦ f)²(x1) = (g ◦ f)(x1) = g(y1) = x1;

and

    (g ◦ f)(x2) = g(f(x2)) = g(0) = 0
    ⇒ (g ◦ f)²(x2) = (g ◦ f)(x2) = 0.

Hence (g ◦ f)²(x) = (g ◦ f)(x) for all x ∈ R².
We summarize as
The projectionalization of an operator
Let f be a linear operator on R2 . Then there exists an invertible linear
operator g on R2 such that

(g ◦ f )2 = g ◦ f,

i.e. g ◦ f is a projection on R2 (see Fig. 2.59). Equivalently, for any real


2 × 2 matrix A, there exists an invertible matrix P2×2 such that

(AP )2 = AP. (2.7.52)

The above result still holds for linear operators on finite-dimensional vector
space or any n × n matrix over a field.
[Fig. 2.59: g ◦ f as a projection]

For example, let

    A = \begin{pmatrix} 1 & 1 \\ −2 & −2 \end{pmatrix}.

Then e1A = (1, 1) = y1, and x2 = (2, 1) satisfies x2A = 0. Take an arbitrary y2 linearly independent of y1, say y2 = e2 for simplicity. Define a square matrix P2×2 such that

    y1P = e1,
    y2P = x2
    ⇒ P = \begin{pmatrix} y1 \\ y2 \end{pmatrix}⁻¹ \begin{pmatrix} e1 \\ x2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}⁻¹ \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & −1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} −1 & −1 \\ 2 & 1 \end{pmatrix}.

Then,

    AP = \begin{pmatrix} 1 & 1 \\ −2 & −2 \end{pmatrix}\begin{pmatrix} −1 & −1 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ −2 & 0 \end{pmatrix}

does satisfy (AP)² = AP.
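This little computation can be confirmed in a few lines (an illustrative check of my own, NumPy):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [-2.0, -2.0]])
    P = np.array([[-1.0, -1.0],
                  [2.0, 1.0]])
    AP = A @ P
    assert np.allclose(AP @ AP, AP)         # (AP)^2 = AP: a projection
    print(AP)                               # [[1, 0], [-2, 0]]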
Suppose f is a nonzero linear operator on R² of rank 1. Let Ker(f) = ⟨⟨x2⟩⟩ with x2 ≠ 0. Take an arbitrary vector x1, linearly independent of x2, so that B = {x1, x2} forms a basis for R². Thus, y1 = f(x1) ≠ 0. Extend {y1} to a basis C = {y1, y2} for R². Define linear operators g and h on R², respectively, as follows:

    g(ei) = xi,
    h(yi) = ei, for i = 1, 2.
Both g and h are isomorphisms. Hence, the composite h ◦ f ◦ g is a linear operator with the properties that

    h ◦ f ◦ g(e1) = (h ◦ f)(x1) = h(y1) = e1,
    h ◦ f ◦ g(e2) = (h ◦ f)(x2) = h(0) = 0
    ⇒ if x = (x1, x2) = x1e1 + x2e2,
      h ◦ f ◦ g(x1, x2) = (x1, 0).

This means h ◦ f ◦ g is the projection of R² onto the axis ⟨⟨e1⟩⟩ along the axis ⟨⟨e2⟩⟩. See Fig. 2.60.
[Fig. 2.60: h ◦ f ◦ g and the corresponding matrices [f]_B^C = PAQ, [f]_N = A]

In terms of the natural basis N = {e1, e2} and the bases B and C, we have (see (2.7.23))

    [h ◦ f ◦ g]_N = [g]_N^B [f]_B^C [h]_C^N = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
    ⇒ [f]_B^C = I2⁻¹ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} I2⁻¹ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},    (2.7.53)

which means that the matrix representation of f with respect to the bases B and C is the matrix \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. Equivalently,

    [f]_N = [1_{R²}]_N^B [f]_B^C [1_{R²}]_C^N
    ⇒ [f]_B^C = [1_{R²}]_B^N [f]_N [1_{R²}]_N^C = \begin{pmatrix} x1 \\ x2 \end{pmatrix} [f]_N \begin{pmatrix} y1 \\ y2 \end{pmatrix}⁻¹ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.    (2.7.54)

See Fig. 2.60.
We summarize as
The rank theorem of a linear operator on R²
Let f be a linear operator on R².

(1) There exist linear isomorphisms g and h on R² such that, for any x = (x1, x2) ∈ R²,

    h ◦ f ◦ g(x1, x2) = (0, 0)   if r(f) = 0,
                      = (x1, 0)  if r(f) = 1,
                      = (x1, x2) if r(f) = 2.

See the diagram beside Fig. 2.60.

(2) Let A2×2 be a real matrix. Then there exist invertible matrices P2×2 and Q2×2 such that

    PAQ = O2×2 if r(A) = 0;  PAQ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} if r(A) = 1;  PAQ = I2 if r(A) = 2.

The matrices on the right side are called the normal form of A according to its respective rank (see Fig. 2.60).    (2.7.55)

In the general setting, for an m × n matrix A over a field F, there exist invertible matrices Pm×m and Qn×n such that the normal form of A is

    PAQ = \begin{pmatrix} Ir & O \\ O & O \end{pmatrix}_{m×n}, r = r(A),    (2.7.56)

where I0 = O in case r = 0.
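In practice the invertible factors P and Q of (2.7.55)/(2.7.56) can be produced by any elimination scheme; the singular value decomposition gives them with no bookkeeping. A sketch of my own (NumPy; the rank-one sample matrix is arbitrary, and this choice of P, Q is just one of many):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [-2.0, -2.0]])            # a rank-one sample
    U, s, Vt = np.linalg.svd(A)             # A = U diag(s) Vt
    r = int(np.sum(s > 1e-12))              # r = r(A)

    d = np.ones(2)
    d[:r] = 1.0 / s[:r]                     # scale away the nonzero singular values
    P = np.diag(d) @ U.T                    # invertible
    Q = Vt.T                                # invertible (orthogonal)

    print(np.round(P @ A @ Q, 12))          # the normal form diag(I_r, O)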
Suppose again f is a nonzero linear operator on R².
If f has rank 2, so does fⁿ for any n ≥ 2, and hence r(fⁿ) = 2 for n ≥ 2.
In case f has rank 1, let Ker(f) = ⟨⟨x2⟩⟩ and let x1 be linearly independent of x2, so that Im(f) = ⟨⟨f(x1)⟩⟩. If f(x1) = λx2 for some λ ∈ R, then

    f²(x1) = f(f(x1)) = f(λx2) = λf(x2) = 0.

Together with f²(x2) = 0, this means that f² = 0, the zero linear operator. Therefore, r(f²) = r(f³) = · · · = 0. If ⟨⟨f(x1)⟩⟩ ∩ ⟨⟨x2⟩⟩ = {0}, then f²(x1) = f(f(x1)) ≠ 0, for otherwise f(x1) ∈ ⟨⟨x2⟩⟩, a contradiction. Hence, Im(f) = ⟨⟨f(x1)⟩⟩ = ⟨⟨f²(x1)⟩⟩ which, in turn, is equal to ⟨⟨fⁿ(x1)⟩⟩ for any integer n ≥ 3. This means r(fⁿ) = 1 for n ≥ 1 in this case.
As a conclusion, we summarize as

The ranks of iterative linear operators
Let f be a linear operator on R².

(1) Then

    r(f²) = r(f³) = r(f⁴) = · · · .

In fact,

1. If r(f) = 0, then r(fⁿ) = 0 for n ≥ 1.
2. If r(f) = 1 and Ker(f) = Im(f), then r(fⁿ) = 0 for n ≥ 2, while if r(f) = 1 and Ker(f) ∩ Im(f) = {0}, then r(fⁿ) = 1 for n ≥ 1.
3. If r(f) = 2, then r(fⁿ) = 2 for n ≥ 1.

(2) For any real 2 × 2 matrix A,

    r(A²) = r(A³) = r(A⁴) = · · · .    (2.7.57)

The aforementioned method can be modified a little to prove the general result: for any n × n matrix A over a field,

    r(Aⁿ) = r(A^{n+1}) = · · ·    (2.7.58)

always holds.
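Statement (2.7.57), that the rank can drop only in the first step from A to A², is easy to test over random samples (a sketch of my own, NumPy):

    import numpy as np

    rng = np.random.default_rng(2)
    for _ in range(500):
        A = rng.integers(-2, 3, (2, 2))
        r1 = np.linalg.matrix_rank(A)
        r2 = np.linalg.matrix_rank(A @ A)
        r3 = np.linalg.matrix_rank(A @ A @ A)
        assert r2 == r3 and r2 <= r1        # (2.7.57)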

Exercises
<A>
1. Notice that R² = ⟨⟨e1 + e2⟩⟩ ⊕ ⟨⟨e2⟩⟩ = ⟨⟨e1 + e2⟩⟩ ⊕ ⟨⟨e1⟩⟩. Suppose f and g are, respectively, the projections of R² onto ⟨⟨e1 + e2⟩⟩ along ⟨⟨e2⟩⟩ and along ⟨⟨e1⟩⟩. Show that f ◦ g = g and g ◦ f = f.
2. Suppose f and g are idempotent linear operators, i.e. f 2 = f and g 2 = g
hold.
(a) Show that Im(f ) = Im(g) ⇔ g ◦ f = f and f ◦ g = g.
(b) Show that Ker(f ) = Ker(g) ⇔ g ◦ f = g and f ◦ g = f .
(Note These results still hold for idempotent linear operators or pro-
jections on a finite-dimensional vector space.)
3. Let A2×2 be

       \begin{pmatrix} −3 & 4 \\ 1 & −2 \end{pmatrix} or \begin{pmatrix} −3 & 4 \\ −9 & 12 \end{pmatrix}.

   Try to find a respective invertible matrix P2×2 such that (AP)² = AP. How many such P are possible?
4. Let the linear operator f be defined as

       f(x) = x \begin{pmatrix} 6 & 5 \\ −12 & −10 \end{pmatrix}

   in N = {e1, e2}. Try to find linear isomorphisms g and h on R² such that f ◦ g ◦ h(x1, x2) = (x1, 0) for all x = (x1, x2) ∈ R². How many such g and h are there? Also, find invertible matrices P2×2 and Q2×2 such that

       P \begin{pmatrix} 6 & 5 \\ −12 & −10 \end{pmatrix} Q = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

   and explain it graphically (see Fig. 2.60).
5. Let f be a linear operator on R2 and k a positive integer. Show that
Im(f k ) = Im(f 2k ) ⇔ R2 = Im(f k ) ⊕ Ker(f k ).
(Note This result still holds for linear operators on a finite-
dimensional vector space.)
6. Let f : R2 → R and g: R → R2 be linear transformations. Show that
g ◦ f : R2 → R2 is never invertible. How about f ◦ g?
(Note Suppose m > n and f ∈ Hom(Fm , Fn ) and g ∈ Hom(Fn , Fm ).
Then the composite g ◦ f is not invertible. What is its counterpart in
matrices?)
7. Let f ∈ Hom(R2 , R) and g ∈ Hom(R, R2 ).
(a) Show that f is onto if and only if there exists h ∈ Hom(R, R2 ) such
that
f ◦ h = 1R (the identity operator on R).
In this case, f is called right invertible and h a right inverse of f .
(b) Show that g is one-to-one if and only if there exists h ∈ Hom(R2 , R)
such that
h ◦ g = 1R .
Then, g is called left-invertible and h a left inverse of g.
(Note In general, let f ∈ Hom(V, W ). Then
(1) f is onto ⇔ there exists h ∈ Hom(W, V ) such that f ◦ h = 1W .
(2) f is one-to-one ⇔ there exists h ∈ Hom(W, V ) such that h◦f = 1V .
What are the counterparts for matrices (see Ex. 5 of Sec. B.7)? Try to
give suitable geometric interpretations. See also Ex. <B> 5(d), (e) of
Sec. 3.3 and Ex. <B> 6(e), (f) of Sec. 3.7.3.)
8. Let A2×2 and B2×2 be real matrices. Show that

       det(AB − tI2) = det(BA − tI2)

   by the following steps (for another proof, try to use (2.7.12) and see Ex. <B> 5 of Sec. 2.7.3).
   (a) We may suppose r(A) = 1 and A² = A. By (2.7.51), there exists an invertible P2×2 such that

       A = P⁻¹ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} P.

   Then

       det(AB − tI2) = det(\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} PBP⁻¹ − tI2), and
       det(BA − tI2) = det(PBP⁻¹ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} − tI2).

   Let PBP⁻¹ = \begin{pmatrix} b11 & b12 \\ b21 & b22 \end{pmatrix}. Then

       det(AB − tI2) = det(BA − tI2) = t(t − b11).

   (b) Suppose r(A) = 1 only. By (2.7.52), there exists an invertible P2×2 such that (AP)² = AP holds. Then use (a) to finish the proof.
   (Note For general matrices Am×n and Bn×m over a field, exactly the same method guarantees that

       (−1)^{n−m} tⁿ det(AB − tIm) = tᵐ det(BA − tIn).

   Do you know how to do this? Refer to Ex. <C> 9 of Sec. 2.7.5.)
9. Suppose f, g ∈ Hom(R2 , R2 ) and f ◦ g = 0. Show that
r(f ) + r(g) ≤ 2.
When will the equality hold?
(Note This result still holds for f, g ∈ Hom(V, V ), where dim V =
n < ∞, and f ◦ g = 0.)
10. Suppose f, g, h ∈ Hom(R2 , R2 ) and f ◦ g ◦ h = 0. Show that
r(f ) + r(g) + r(h) ≤ 2 · 2 = 4.
When will equality hold?
(Note For f, g, h ∈ Hom(V, V ), where dim V = n < ∞, and
f ◦ g ◦ h = 0, then r(f ) + r(g) + r(h) ≤ 2n holds.)
11. Suppose f, g ∈ Hom(R2 , R2 ) and f + g = 1R2 . Show that


r(f ) + r(g) ≥ 2
and the equality holds if and only if f 2 = f , g 2 = g and f ◦g = g◦f = 0.
(Note This result still holds for f, g ∈ Hom(V, V ), where dim V = n
and f + g = 1V . Then r(f ) + r(g) ≥ n and the equality holds if and
only if f 2 = f , g 2 = g and f ◦ g = g ◦ f = 0.)
12. Suppose f ∈ Hom(R2 , R2 ).
(a) If there exists g ∈ Hom(R2 , R2 ) so that g ◦ f = 1R2 or f ◦ g = 1R2
holds, then f is invertible and g = f −1 .
(b) If there exists a unique g ∈ Hom(R2 , R2 ) so that f ◦ g = 1R2 , then
g ◦ f = 1R2 holds and this f is invertible.
(c) There exists a g ∈ Hom(R2 , R2 ) such that
f ◦ g ◦ f = f.
In this case, (f ◦g)2 = f ◦g and (g ◦f )2 = g ◦f hold simultaneously.
(Note Let V be a vector space over a field F. Then (a) still holds
for f ∈ Hom(V, V ) in case dim V < ∞, while (b) and (c) hold for
arbitrary V . Moreover, if there exist two different g and h in Hom(V, V )
so that f ◦g = f ◦h = 1V , then there are infinitely many k ∈ Hom(V, V )
so that f ◦k = 1V in case the underlying field F is of characteristic zero.)
13. Suppose f, g ∈ Hom(R2 , R2 ) such that r(f ) ≤ r(g) holds. Then, there
exist h1 , h2 ∈ Hom(R2 , R2 ) such that
f = h2 ◦ g ◦ h1
and h1 and h2 can be chosen as invertible linear operators in case
r(f ) = r(g).
(Note For f, g ∈ Hom(V, V ), where dim V < ∞, the result still holds.)
14. Suppose f ∈ Hom(R², R²) commutes with every g ∈ Hom(R², R²), i.e. f ◦ g = g ◦ f for all such g. Show that this happens if and only if there exists a scalar α ∈ R such that
    f = α1_{R²}.
Try the following steps.
(1) For any x ∈ R², {x, f(x)} should be linearly dependent.
(2) Take any nonzero x₀ ∈ R²; then f(x₀) = αx₀ for some scalar α. Thus f(x) = αx should hold for every x ∈ R².
(Note The result still holds for f ∈ Hom(V, V) where V is arbitrary.)
15. Suppose f, g ∈ Hom(R², R²).
(a) There exist two bases B and C for R² so that [f]_B = [g]_C if and only if
    g = h ◦ f ◦ h⁻¹,
where h ∈ Hom(R², R²) is an invertible operator.
(b) For any basis B for R², [f]_B is the same matrix if and only if f = α1_{R²} for some scalar α ∈ R.
(Note The above result still holds for any f ∈ Hom(V, V), where dim V = n < ∞.)

<B>

1. Let f, g ∈ Hom(R², R²) and let f be diagonalizable but not a scalar operator.
(a) Suppose f ◦ g = g ◦ f. Then f and g have a common eigenvector, i.e. there exists a nonzero vector x such that
    f(x) = λx and g(x) = µx
for some scalars λ and µ. Therefore, g is diagonalizable.
(b) Suppose f ◦ g = g ◦ f. Then there exists a basis B for R² such that [f]_B and [g]_B are both diagonal matrices.
(Note Suppose An×n and Bn×n are complex matrices such that AB = BA. Then there exists an invertible matrix P such that PAP⁻¹ and PBP⁻¹ are both upper (or lower) triangular matrices. See Ex. <C> 10 of Sec. 2.7.6.)
2. Let fi ∈ Hom(R2 , R2 ) for 1 ≤ i ≤ 2 satisfy

fi ◦ fj = δij fi , 1 ≤ i, j ≤ 2.

(a) Show that either each r(fi ) = 0 for 1 ≤ i ≤ 2 or each r(fi ) = 1 for
1 ≤ i ≤ 2.
(b) Suppose gi ∈ Hom(R2 , R2 ) for 1 ≤ i ≤ 2 satisfy gi ◦ gj = δij gi ,
1 ≤ i, j ≤ 2.
Show that, there exists a linear isomorphism ϕ on R2 such that

gi = ϕ ◦ fi ◦ ϕ−1 , 1 ≤ i ≤ 2.

(Note Let dim V = n and fi ∈ Hom(V, V ) for 1 ≤ i ≤ n satisfy


fi ◦ fj = δij fi , 1 ≤ i, j ≤ n. Then the above results still hold.)
3. For f, g ∈ Hom(R2 , R2 ), define the bracket operation [ , ] of f and g as

[f, g] = f ◦ g − g ◦ f.

Show that, for any f, g, h ∈ Hom(R2 , R2 ),


(1) (bilinearity) [ , ] is bilinear, i.e. [f, ·]: g → [f, g] and [·, g]: f → [f, g]
are linear operators on Hom(R2 , R2 ),
(2) [f, f ] = 0, and
(3) (Jacobi identity)

[f, [g, h]] + [g, [h, f ]] + [h, [f, g]] = 0.

Then the algebra Hom(R², R²), i.e. the vector space Hom(R², R²) endowed with the bracket operation [ , ] satisfying properties (1)–(3), is called a Lie algebra.
(Note Let dim V < ∞. Then Hom(V, V) is a Lie algebra with the bracket operation [ , ]. Equivalently, M(n; F) with [A, B] = AB − BA is also a Lie algebra.)
4. (a) Suppose f ∈ Hom(R2 , R2 ). Show that (see (2.7.39))

det(f − t1R2 ) = t2 − (tr f )t + det f

and hence, f satisfies its characteristic polynomial (see (2.7.19))

f 2 − (tr f )f + (det f )1R2 = 0.

(b) Show that f 2 = −λ1R2 with λ > 0 if and only if

det f > 0 and tr f = 0.

(c) For f, g ∈ Hom(R2 , R2 ), show that

f ◦ g + g ◦ f = (tr f)g + (tr g)f + [tr(f ◦ g) − (tr f)·(tr g)]1_{R²}.

(Note (a) and (c) are still true for any vector space V with dim V = 2,
while (b) is true only for real vector space.)

<C>

1. Do (2.7.51) for finite-dimensional vector space V .


2. Do (2.7.52) for finite-dimensional vector space V .
3. Do (2.7.56). Prove that a matrix An×n of rank r can be written as a
sum of r matrices of rank 1.
4. Do (2.7.58).
5. Prove the statements inside the so many Notes in Exs. <A>
and <B>.
6. Try your best to do as many problems as possible from Ex. 13 through
Ex. 24 in Sec. B.7.
7. For any A, B ∈ M(n; C) such that B is invertible, show that there
exists a scalar λ ∈ C such that A + λB is not invertible. How many
such different λ could be chosen?
8. For A ∈ M(n; F), show that
    T(X) = XA, X ∈ M(n; F),
defines a linear operator T: M(n; F) → M(n; F). Then show that T: M(n; F) → M(n; F) is a linear operator if and only if there exist a positive integer k and matrices Qj, Rj ∈ M(n; F) for 1 ≤ j ≤ k such that
    T(X) = Σ_{j=1}^{k} Qj X Rj, X ∈ M(n; F).

9. Suppose dim V < ∞ and T: Hom(V, V) → F is a linear functional.
(a) There exists a linear operator f₀ ∈ Hom(V, V), uniquely determined by T, such that
    T(f) = tr(f ◦ f₀), f ∈ Hom(V, V).
(b) Suppose T satisfies T(g ◦ f) = T(f ◦ g) for any f, g ∈ Hom(V, V). Show that there exists a scalar λ such that
    T(f) = λ tr(f), f ∈ Hom(V, V).
(c) Suppose T: Hom(V, V) → Hom(V, V) is a linear transformation satisfying T(g ◦ f) = T(g) ◦ T(f) for any f, g ∈ Hom(V, V) and T(1_V) = 1_V. Show that
    tr T(f) = tr f, f ∈ Hom(V, V).
10. For P ∈ M(m; R) and Q ∈ M(n; R), define σ(P, Q): M(m, n; R) →
M(m, n; R) as

σ(P, Q)X = P XQ∗ , X ∈ M(m, n; R).

Show that σ(P, Q) is linear and

det σ(P, Q) = (det P )n (det Q)m .

(Note In tensor algebra, σ(P, Q) is denoted as

P ⊗Q

and is called the tensor product of P and Q. Results obtained here


can be used to discuss the orientations of Grassmann manifolds and
projective spaces.)

<D> Applications

(2.7.55) or (2.7.56) can be localized to obtain the rank theorem for continuously differentiable mappings from open sets in Rᵐ to Rⁿ. We will mention this along with other local versions of results from linear algebra in Chaps. 4 and 5.

2.7.5 Elementary matrices and matrix factorizations


The essence of the elimination method for solving systems of linear equations lies in the following three basic operations:

1. Type 1: Interchange any pair of equations.


2. Type 2: Multiply any equation by a nonzero scalar.
3. Type 3: Replace any equation by its sum with a multiple of any other
equation.

These three operations on equations are, respectively, the same opera-


tions on the corresponding coefficient vectors without changing the order
of unknowns. Therefore, there correspond three types of operations on row
vectors of the coefficient matrix of the equations.
For example, let us put the system of equations
    5x₁ − 3x₂ = 6
    x₁ + 4x₂ = 2
with its matrix form
    [5 −3; 1 4][x₁; x₂] = [6; 2]
side by side for comparison when solving the equations as follows. (1) ↔ (2) means type 1, (2) + (−5) × (1) means type 3, and −(1/23)(2) means type 2, etc.

    5x₁ − 3x₂ = 6  (1)          [5 −3; 1 4][x₁; x₂] = [6; 2]
    x₁ + 4x₂ = 2   (2)
        ↓ (1) ↔ (2)                 ↓ E(1)(2)
    x₁ + 4x₂ = 2   (1)          [1 4; 5 −3][x₁; x₂] = [2; 6]
    5x₁ − 3x₂ = 6  (2)
        ↓ (2) + (−5) × (1)          ↓ E(2)−5(1)
    x₁ + 4x₂ = 2   (1)          [1 4; 0 −23][x₁; x₂] = [2; −4]
    −23x₂ = −4     (2)
        ↓ −(1/23)(2)                ↓ E−(1/23)(2)
    x₁ + 4x₂ = 2   (1)          [1 4; 0 1][x₁; x₂] = [2; 4/23]
    x₂ = 4/23      (2)
        ↓ (1) + (−4) × (2)          ↓ E(1)−4(2)
    x₁ = 30/23     (1)          [1 0; 0 1][x₁; x₂] = [30/23; 4/23].
    x₂ = 4/23      (2)

Thus, we introduce three types of elementary row (or column) operations on a matrix A2×2, or on any Am×n, as follows:
    Type 1: interchange the ith row and the jth row, i ≠ j;
    Type 2: multiply the ith row by a scalar α ≠ 0;
    Type 3: add a multiple of the ith row to the jth row,   (2.7.59)
and the corresponding column operations. The matrices obtained by performing these elementary row or column operations on the identity matrix
I₂ or In are called elementary matrices and are denoted respectively as
    row operations: E(i)(j), Eα(i) and E(j)+α(i);
    column operations: F(i)(j), Fα(i) and F(j)+α(i).   (2.7.60)
For example, the 2 × 2 elementary matrices are:
    E(1)(2) = [0 1; 1 0] = F(1)(2),
    Eα(1) = [α 0; 0 1] = Fα(1);  Eα(2) = [1 0; 0 α] = Fα(2),
    E(2)+α(1) = [1 0; α 1] = F(1)+α(2),
    E(1)+α(2) = [1 α; 0 1] = F(2)+α(1).
Also, refer to Sec. 2.7.2 for the geometric mapping properties of these elementary matrices. For more theoretical information about, or obtained by, elementary matrices, please refer to Sec. B.5.
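The defining property of these matrices, that left multiplication by E performs the row operation and right multiplication by F the column operation (summarized later in (2.7.66)), is easy to check numerically. A minimal sketch, with numpy and the helper name being our own choices:

```python
import numpy as np

def E_type3(n, j, i, alpha):
    """Elementary matrix E_(j)+alpha(i): add alpha times row i to row j (0-based)."""
    E = np.eye(n)
    E[j, i] += alpha
    return E

A = np.array([[1., 2.], [4., 10.]])
E = E_type3(2, 1, 0, -4.0)          # row 2 := row 2 - 4 * row 1
assert np.allclose(E @ A, [[1, 2], [0, 2]])
F = E_type3(2, 1, 0, -2.0).T        # F_(2)-2(1): column 2 := column 2 - 2 * column 1
assert np.allclose(A @ F, [[1, 0], [4, 2]])
```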
To see the advantages of the introduction of elementary matrices, let us
start from concrete examples.

Example 1 Let
    A = [1 2; 4 10].
(1) Solve the equation xA = b, where x = (x₁, x₂) ∈ R² and b = (b₁, b₂) is a constant vector.
(2) Investigate the geometric mapping properties of A.

Solution When written out, xA = b is equivalent to
    (x₁ x₂)[1 2; 4 10] = (b₁ b₂)  or  x₁ + 4x₂ = b₁, 2x₁ + 10x₂ = b₂.
A is called the coefficient matrix of the equations and the 3 × 2 matrix [A; b] (the rows of A followed by the row b) its augmented matrix. Apply consecutive column operations to the
augmented matrix as follows.
    [A; b] = [1 2; 4 10; b₁ b₂]
    →(F(2)−2(1)) [1 0; 4 2; b₁ b₂ − 2b₁] = [A; b]F(2)−2(1)   (∗₁)
    →(F(1/2)(2)) [1 0; 4 1; b₁ (b₂ − 2b₁)/2] = [A; b]F(2)−2(1)F(1/2)(2)   (∗₂)
    →(F(1)−4(2)) [1 0; 0 1; 5b₁ − 2b₂ (b₂ − 2b₁)/2] = [A; b]F(2)−2(1)F(1/2)(2)F(1)−4(2).
We can deduce some valuable information from the above process and the final result, as follows.

(a) The solution of xA = b
    x₁ = 5b₁ − 2b₂
    x₂ = −b₁ + (1/2)b₂
is the solution of the equations. It is better put in matrix form as
    x = (x₁ x₂) = (b₁ b₂)[5 −1; −2 1/2] = bA⁻¹.
In particular, we obtain the inverse A⁻¹ of A.
(b) The invertibility of A and its inverse A⁻¹
    I₂ = [1 0; 0 1] = AF(2)−2(1)F(1/2)(2)F(1)−4(2)
    ⇒ A⁻¹ = F(2)−2(1)F(1/2)(2)F(1)−4(2) = [1 −2; 0 1][1 0; 0 1/2][1 0; −4 1]
          = (1/2)[10 −2; −4 1] = [5 −1; −2 1/2].
By the way, A⁻¹ can be written as a product of elementary matrices.


(c) A as a product of elementary matrices
    A = F(1)−4(2)⁻¹F(1/2)(2)⁻¹F(2)−2(1)⁻¹ = F(1)+4(2)F2(2)F(2)+2(1)
      = [1 0; 4 1][1 0; 0 2][1 2; 0 1].   (2.7.61)
This factorization provides another way to investigate the geometric mapping properties of A than those presented in Sec. 2.7.4.
(d) The computations of det A and det A⁻¹
    det A = det[1 0; 4 1] · det[1 0; 0 2] · det[1 2; 0 1] = 1 · 2 · 1 = 2,
    det A⁻¹ = det[1 −2; 0 1] · det[1 0; 0 1/2] · det[1 0; −4 1] = 1 · (1/2) · 1 = 1/2.

Note When written in the transpose form A*x* = b*, we can apply elementary row operations to the augmented matrix [A* | b*]2×3 and obtain exactly the same factorizations for A and A⁻¹ and solve the equations. The only difference is that, when applying row operations to A*, the corresponding elementary matrices multiply A* in front, stepwise, i.e.
    E(1)−4(2)E(1/2)(2)E(2)−2(1)A* = I₂,
which is the same as AF(2)−2(1)F(1/2)(2)F(1)−4(2) = I₂ obtained when applying column operations.

Adopt the factorization in (2.7.61) and see Fig. 2.61 for the geometric mapping properties step by step.
Note that another way to study the geometric mapping properties is to find the eigenvalues (11 ± √113)/2 of A and model after Fig. 2.50 and Example 6 (λ₁ ≠ λ₂) in Sec. 2.7.2.
Note that (2.7.61) can also be written as
    A = [1 0; 4 2][1 2; 0 1] = [1 0; 4 1][1 2; 0 2],   (2.7.62)
which is a product of a lower triangular matrix before an upper triangular matrix. Readers are urged to explain how these factorizations affect the mapping properties in Fig. 2.61.
It is worth noticing that the lower and upper triangular factorization in (2.7.62) is not accidental, and is not necessarily a consequence of the previous elementary matrix factorization in (2.7.61).
1 0 
  1 0 
  (4, 2) (8, 2)
4 1  (4,1) (8,1)  0 2 
e2
Shearing One-way
0 e1 4e1 0 4e1 stretch
0 4e1
(8,18)

1 2 
1 2   
  0 1 
(4,10) Shearing
4 10
(4, 8)

0
( ) 1
2
scale

Fig. 2.61

We obtained it in the algebraic operation process, up to the steps (∗₁) and (∗₂). Stopping there, we deduce that
    AF(2)−2(1) = [1 0; 4 2]
    ⇒ A = [1 0; 4 2]F(2)−2(1)⁻¹ = [1 0; 4 2]F(2)+2(1) = [1 0; 4 2][1 2; 0 1],
and
    AF(2)−2(1)F(1/2)(2) = [1 0; 4 1]
    ⇒ A = [1 0; 4 1]F(1/2)(2)⁻¹F(2)−2(1)⁻¹ = [1 0; 4 1][1 2; 0 2].

Does a factorization like (2.7.62) help in solving the equation xA = b? Yes, it does. Take the first factorization for example:
    x[1 2; 4 10] = b
    ⇔ x[1 0; 4 2] = y and y[1 2; 0 1] = b, where y = (y₁, y₂).
Solve firstly
    y[1 2; 0 1] = b ⇒ y = b[1 2; 0 1]⁻¹ = b[1 −2; 0 1] = (b₁, −2b₁ + b₂),
and secondly
    x[1 0; 4 2] = (b₁, −2b₁ + b₂)
    ⇒ x = (b₁, −2b₁ + b₂)[1 0; −2 1/2] = (5b₁ − 2b₂, −b₁ + (1/2)b₂).

Example 2 Let
    A = [0 2; −1 1].
Do the same problems as in Example 1.

Solution For x = (x₁, x₂) and b = (b₁, b₂), xA = b is equivalent to
    0·x₁ − x₂ = b₁
    2x₁ + x₂ = b₂,
which can be easily solved as x₁ = (b₁ + b₂)/2 and x₂ = −b₁.
The shortcoming of A, for the purpose of a general theory to be established later in this subsection, is that its leading diagonal entry is zero. To avoid this, we exchange the first row and the second row of A; the resulting matrix amounts to
    B = [−1 1; 0 2] = [0 1; 1 0][0 2; −1 1] = E(1)(2)A.
Then, perform column operations (the columns now correspond to x₂ and x₁, in this order):
    [E(1)(2)A; b] = [B; b] = [−1 1; 0 2; b₁ b₂]
    →(F−(1)) [1 1; 0 2; −b₁ b₂] = [B; b]F−(1)
    →(F(2)−(1)) [1 0; 0 2; −b₁ b₁ + b₂] = [B; b]F−(1)F(2)−(1)
    →(F(1/2)(2)) [1 0; 0 1; −b₁ (b₁ + b₂)/2] = [B; b]F−(1)F(2)−(1)F(1/2)(2).

Note that xA = (xE(1)(2))(E(1)(2)A) = (xE(1)(2))B = (x₂ x₁)B = b, so that the first column corresponds to x₂ while the second corresponds to x₁. Equivalently, we can perform row operations to
    [(E(1)(2)A)* | b*] = [A*F(1)(2) | b*] = [B* | b*] = [−1 0 | b₁; 1 2 | b₂]   (rows: x₂, x₁)
    →(E−(1)) [1 0 | −b₁; 1 2 | b₂]
    →(E(2)−(1)) [1 0 | −b₁; 0 2 | b₁ + b₂]
    →(E(1/2)(2)) [1 0 | −b₁; 0 1 | (b₁ + b₂)/2].
In this case, the first row corresponds to x₂ while the second corresponds to x₁.

(a) The solution of xA = b
    x₁ = (b₁ + b₂)/2, x₂ = −b₁, or
    x = (x₁, x₂) = (b₁ b₂)[1/2 −1; 1/2 0] = bA⁻¹.
(b) The invertibility of A and its inverse A⁻¹
    I₂ = [1 0; 0 1] = BF−(1)F(2)−(1)F(1/2)(2)
    ⇒ B⁻¹ = (E(1)(2)A)⁻¹ = A⁻¹E(1)(2)⁻¹ = F−(1)F(2)−(1)F(1/2)(2)
    ⇒ A⁻¹ = F−(1)F(2)−(1)F(1/2)(2)F(1)(2)
          = [−1 0; 0 1][1 −1; 0 1][1 0; 0 1/2][0 1; 1 0] = [1/2 −1; 1/2 0].
(c) The elementary factorization of A
    A = F(1)(2)F(1/2)(2)⁻¹F(2)−(1)⁻¹F−(1)⁻¹ = F(1)(2)F2(2)F(2)+(1)F−(1)
      = [0 1; 1 0][1 0; 0 2][1 1; 0 1][−1 0; 0 1]
      = [0 1; 1 0][−1 0; 0 2][1 −1; 0 1]   (2.7.63)
    ⇒ A = [0 1; 1 0][−1 1; 0 2], with [−1 1; 0 2] = [−1 0; 0 2][1 −1; 0 1].   (2.7.64)
(d) The determinants det A and det A⁻¹
    det A = det[0 1; 1 0] · det[1 0; 0 2] · det[1 1; 0 1] · det[−1 0; 0 1] = (−1) · 2 · 1 · (−1) = 2;
    det A⁻¹ = det[−1 0; 0 1] · det[1 −1; 0 1] · det[1 0; 0 1/2] · det[0 1; 1 0] = (−1) · 1 · (1/2) · (−1) = 1/2.

See Fig. 2.62. Note that A does not have real eigenvalues.

Example 3 Let
    A = [1 2; 2 −7].
(1) Solve the equation Ax* = b*, where x = (x₁, x₂) and b = (b₁, b₂).
(2) Investigate the geometric mapping properties of A.

Solution As against xA = b in Examples 1 and 2, here we use the column vector x* = [x₁; x₂] as the unknown vector and b* = [b₁; b₂] as the constant vector in Ax* = b*, with A as coefficient matrix and [A | b*]2×3 as augmented matrix.
[Fig. 2.62: the unit square under the successive factors of (2.7.63): [−1 0; 0 1], the shearing [1 1; 0 1], the stretch [1 0; 0 2] and the exchange [0 1; 1 0], composing to A = [0 2; −1 1].]

Now, apply consecutive row operations to
    [A | b*] = [1 2 | b₁; 2 −7 | b₂]
    →(E(2)−2(1)) [1 2 | b₁; 0 −11 | b₂ − 2b₁] = E(2)−2(1)[A | b*]
    →(E−(1/11)(2)) [1 2 | b₁; 0 1 | (2b₁ − b₂)/11] = E−(1/11)(2)E(2)−2(1)[A | b*]
    →(E(1)−2(2)) [1 0 | (7b₁ + 2b₂)/11; 0 1 | (2b₁ − b₂)/11] = E(1)−2(2)E−(1/11)(2)E(2)−2(1)[A | b*].

(a) The solution of Ax* = b*
    x₁ = (7b₁ + 2b₂)/11, x₂ = (2b₁ − b₂)/11, or
    x* = [x₁; x₂] = (1/11)[7 2; 2 −1][b₁; b₂] = A⁻¹b*.
(b) The invertibility of A and its inverse A⁻¹
    I₂ = E(1)−2(2)E−(1/11)(2)E(2)−2(1)A
    ⇒ A⁻¹ = E(1)−2(2)E−(1/11)(2)E(2)−2(1) = [1 −2; 0 1][1 0; 0 −1/11][1 0; −2 1]
          = (1/11)[7 2; 2 −1].
(c) The elementary factorization of A
    A = E(2)−2(1)⁻¹E−(1/11)(2)⁻¹E(1)−2(2)⁻¹ = E(2)+2(1)E−11(2)E(1)+2(2)
      = [1 0; 2 1][1 0; 0 −11][1 2; 0 1].   (2.7.65)
(d) The determinants det A and det A⁻¹
    det A = 1 · (−11) · 1 = −11,
    det A⁻¹ = 1 · (−1/11) · 1 = −1/11.
See Fig. 2.63.

[Fig. 2.63: the square with vertices (±1, ±1) under the successive factors of (2.7.65): [1 0; 2 1], [1 0; 0 −11] and [1 2; 0 1], composing to A = [1 2; 2 −7]; drawn at 1/2 scale.]


Note that A has two distinct real eigenvalues −3 ± √20.
Example 4 Let
    A = [2 3; −4 −6].
(1) Solve the equation Ax* = b*.
(2) Try to investigate the geometric mapping properties of A.

Solution Write Ax* = b* out as
    2x₁ + 3x₂ = b₁
    −4x₁ − 6x₂ = b₂.
The equations have solutions if and only if 2b₁ + b₂ = 0. In this case, the solutions are x₁ = (b₁ − 3x₂)/2 with x₂ an arbitrary scalar.
Apply row operations to
    [A | b*] = [2 3 | b₁; −4 −6 | b₂]
    →(E(1/2)(1)) [1 3/2 | b₁/2; −4 −6 | b₂] = E(1/2)(1)[A | b*]
    →(E(2)+4(1)) [1 3/2 | b₁/2; 0 0 | 2b₁ + b₂] = E(2)+4(1)E(1/2)(1)[A | b*].
Since r(A) = 1 and elementary matrices preserve ranks,
    Ax* = b* has a solution ⇔ r(A) = r([A | b*]) ⇔ 2b₁ + b₂ = 0.
This constraint coincides with what we obtained via the traditional method. On the other hand,
    [1 3/2; 0 0] = E(2)+4(1)E(1/2)(1)A
    ⇒ A = E(1/2)(1)⁻¹E(2)+4(1)⁻¹[1 3/2; 0 0] = [2 0; 0 1][1 0; −4 1][1 3/2; 0 0].
See Fig. 2.64.


2 0
  e2
 1 0
  (−4,1) (−2,1)

1

3
2

( )
1
2
scale (2,3)

e2 0 1  −4 1 0 0 0

0 e1 0 2e1 0 2e1 (− 2, −3)

(− 4, −6)

Fig. 2.64
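The solvability criterion r(A) = r([A | b*]) used above can be checked mechanically. A small numpy sketch (our own illustration, not part of the book):

```python
import numpy as np

A = np.array([[2., 3.], [-4., -6.]])
for b in ([1., -2.], [1., 0.]):
    b = np.array(b)
    aug = np.column_stack([A, b])
    solvable = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)
    print(b, "solvable:", solvable)   # solvable exactly when 2*b1 + b2 = 0
```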

We summarize what we obtained in Examples 1–4 in a general setting.

About elementary matrices
1. E(i)(j) = F(i)(j); Eα(i) = Fα(i); E(j)+α(i) = F(i)+α(j).
2. det E(i)(j) = −1; det Eα(i) = α; det E(j)+α(i) = 1.
3. E(i)(j)* = F(i)(j); Eα(i)* = Fα(i); E(j)+α(i)* = F(j)+α(i).
4. E(i)(j)⁻¹ = E(i)(j); Eα(i)⁻¹ = E(1/α)(i); E(j)+α(i)⁻¹ = E(j)−α(i).
5. The matrix obtained by performing an elementary row or column operation of type E or F on a given matrix Am×n is
    EA or AF,
where Em×m and Fn×n are the corresponding elementary matrices.   (2.7.66)

Proofs are easy and are left to the readers.


A permutation matrix P of order n is a matrix obtained from In by an arbitrary exchange of the orders of the rows or columns of In. That is, for any permutation σ: {1, 2, . . . , n} → {1, 2, . . . , n}, the matrix
    P = [e_{σ(1)}; . . . ; e_{σ(n)}]   (2.7.67)
is a permutation matrix. There are n! permutation matrices of order n. Since P is a product of row exchanges of In,
1. det P = 1 if P results from an even number of row exchanges of In, and det P = −1 for an odd number.
2. P is invertible and
    P⁻¹ = P*.
Details are left to the readers.
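A short numpy sketch of (2.7.67) and of the two properties just listed (indices 0-based; names ours):

```python
import numpy as np

sigma = [2, 0, 1]                      # a permutation of {0, 1, 2}
P = np.eye(3)[sigma]                   # rows e_sigma(1), ..., e_sigma(n) as in (2.7.67)
assert np.allclose(P @ P.T, np.eye(3)) # P^{-1} = P*
print(round(np.linalg.det(P)))         # +1 (even permutation) or -1 (odd)
```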


Now, here are some of the main summaries.

How to determine the invertibility of a square matrix
Let An×n be a real matrix (or any square matrix over a field).
(1) Perform the sequence of elementary row operations E₁, . . . , Ek needed to see whether it is possible that
    EkEk−1 · · · E₂E₁A = In.
(2) If it is, then A is invertible and its inverse is
    A⁻¹ = EkEk−1 · · · E₂E₁.
(3) The elementary matrix factorization of an invertible matrix is
    A = E₁⁻¹E₂⁻¹ · · · Ek−1⁻¹Ek⁻¹.
Hence, a square matrix is invertible if and only if it can be expressed as a product of finitely many elementary matrices.   (2.7.68)

Therefore, elementary row operations provide a method for computing the inverse of an invertible matrix, especially one of large order.
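One way to organize this computation is the Gauss–Jordan scheme: reduce [A | In] to [In | A⁻¹] by elementary row operations. A sketch in Python/numpy, our own illustrative implementation with partial pivoting added for numerical safety:

```python
import numpy as np

def inverse_by_row_ops(A):
    """Gauss-Jordan: reduce [A | I] to [I | A^{-1}] by elementary row operations."""
    n = len(A)
    M = np.column_stack([A.astype(float), np.eye(n)])
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))   # partial pivoting (a type-1 operation)
        M[[i, p]] = M[[p, i]]
        M[i] /= M[i, i]                       # type-2 operation
        for j in range(n):
            if j != i:
                M[j] -= M[j, i] * M[i]        # type-3 operations
    return M[:, n:]

A = np.array([[1., 2.], [4., 10.]])
assert np.allclose(inverse_by_row_ops(A), np.linalg.inv(A))
```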

The lower and upper triangular decompositions of a square matrix
Let An×n be a matrix (over R or any field F).
(1) If only elementary row operations E₁, . . . , Ek of types 2 and 3 are needed (no type 1, exchange of rows) to transform A into an upper triangular matrix U′, i.e.
    EkEk−1 · · · E₂E₁A = U′,
then
    A = (E₁⁻¹E₂⁻¹ · · · Ek−1⁻¹Ek⁻¹)U′ = LU′,
where L is lower triangular with diagonal entries all equal to 1 while U′ is upper triangular with diagonal entries d₁, d₂, . . . , dn (the nonzero ones among them are called the pivots of A). Therefore, the following hold:
1. A is invertible if and only if d₁, . . . , dn are all nonzero. In this case, A can be factored as
    A = LDU,
where the middle diagonal matrix D = diag(d₁, . . . , dn) is called the pivot matrix of A, the lower triangular L is the same as before, and U is obtained from the original U′ by dividing its ith row by dᵢ, so that all its diagonal entries are now equal to 1. Note that det A = det D = d₁ · · · dn.
2. As in 1,
    A* = U*DL*,
where U* is lower triangular, L* is upper triangular and D* = D.
3. Suppose A is also symmetric and invertible with A = LDU; then U = L*, the transpose of L, and
    A = LDL*.
(2) If row exchanges are needed in transforming A into an upper triangular form, there are two ways to handle the situation.
1. Do the row exchanges before elimination. This means that there exists a permutation matrix P such that no row exchanges are needed to transform PA into an upper triangular form. Then
    PA = LU′
as in (1).
2. Hold all row exchanges until all nonzero entries below the diagonal are eliminated as far as possible. Then permute the order of the rows of the resulting matrix to produce an upper triangular form. Thus,
    A = LPU,
where L is lower triangular with diagonal entries all equal to 1, P is a permutation matrix, and U is upper triangular with nonzero pivots along its diagonal.   (2.7.69)
As an example for Case (2), let
    A = [0 1 3; 2 4 0; −1 0 5].

For PA = LU′:
    A →(P = E(1)(3)) PA = [−1 0 5; 2 4 0; 0 1 3]
      →(E(2)+2(1)) [−1 0 5; 0 4 10; 0 1 3]
      →(E(1/4)(2)) [−1 0 5; 0 1 5/2; 0 1 3]
      →(E(3)−(2)) [−1 0 5; 0 1 5/2; 0 0 1/2]
    ⇒ PA = E(2)+2(1)⁻¹E(1/4)(2)⁻¹E(3)−(2)⁻¹[−1 0 5; 0 1 5/2; 0 0 1/2]
         = [1 0 0; −2 4 0; 0 1 1][−1 0 5; 0 1 5/2; 0 0 1/2]
         = [1 0 0; −2 1 0; 0 1/4 1][−1 0 0; 0 4 0; 0 0 1/2][1 0 −5; 0 1 5/2; 0 0 1].

For A = LPU:
    A →(E(3)+(1/2)(2)) [0 1 3; 2 4 0; 0 2 5]
      →(E(3)−2(1)) [0 1 3; 2 4 0; 0 0 −1]
      →(E(1)(2) = P) [2 4 0; 0 1 3; 0 0 −1]
    ⇒ A = E(3)+(1/2)(2)⁻¹E(3)−2(1)⁻¹E(1)(2)⁻¹[2 4 0; 0 1 3; 0 0 −1]
        = [1 0 0; 0 1 0; 2 −1/2 1][0 1 0; 1 0 0; 0 0 1][2 4 0; 0 1 3; 0 0 −1].
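Library routines compute the same kind of decomposition. Assuming SciPy is available (an assumption of this sketch, not the book's apparatus), scipy.linalg.lu returns the factors for the example above; note that SciPy's convention is A = PLU rather than PA = LU:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0., 1., 3.], [2., 4., 0.], [-1., 0., 5.]])
P, L, U = lu(A)          # SciPy convention: A = P @ L @ U
assert np.allclose(P @ L @ U, A)
print(np.diag(U))        # the pivots; det A = their product times det P
```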

The geometric mapping properties of elementary matrices of order 3 are


deferred until Sec. 3.7.2 in Chap. 3.
A similar process guarantees the following.

The factorizations of an m × n matrix
Let Am×n be a nonzero matrix over any field.
(1) There exists a lower triangular matrix Lm×m such that
    A = LUm×n,
where U is an echelon matrix with r pivot rows and r pivot columns, having zeros below the pivots (see Sec. B.5), where r is the rank of A.
(2) There exists an invertible matrix Pm×m such that
    PA
is the row-reduced echelon matrix of A (see Sec. B.5).
(3) There exist invertible matrices Pm×m and Qn×n such that A has the normal form
    PAQ = [Ir 0; 0 0]m×n,
where r = r(A) (see (2.7.56) and Fig. 2.60).   (2.7.70)

For more details and various applications of this kind of factorization, see Sec. B.5 for a general reference.
In fact, this process of elimination by elementary row and column operations provides us with further insight into symmetric matrices. Let us start from simple examples.
Let
    A = [1 −3; −3 9].
Then
    A →(E(2)+3(1)) [1 −3; 0 0] →(F(2)+3(1)) [1 0; 0 0]
    ⇒ A = E(2)+3(1)⁻¹[1 0; 0 0]F(2)+3(1)⁻¹ = [1 0; −3 1][1 0; 0 0][1 −3; 0 1] = LDL*.

Let
    B = [0 4; 4 5].
Then,
    B →(E(1)(2), F(1)(2)) E(1)(2)BE(1)(2)* = [5 4; 4 0]
      →(E(2)−(4/5)(1), F(2)−(4/5)(1)) E(2)−(4/5)(1)E(1)(2)BE(1)(2)*E(2)−(4/5)(1)* = [5 0; 0 −16/5]
    ⇒ B = (E(1)(2)⁻¹E(2)−(4/5)(1)⁻¹)[5 0; 0 −16/5](E(1)(2)⁻¹E(2)−(4/5)(1)⁻¹)*
        = [4/5 1; 1 0][5 0; 0 −16/5][4/5 1; 1 0]* = PDP*,
where P = [4/5 1; 1 0] is not necessarily lower triangular and D = [5 0; 0 −16/5] is diagonal. To simplify D one step further, let
    Q = [1/√5 0; 0 √5/4].
Then,
    QDQ* = [1 0; 0 −1],
    B = PDP* = (PQ⁻¹)(QDQ*)(PQ⁻¹)* = [4/√5 4/√5; √5 0][1 0; 0 −1][4/√5 4/√5; √5 0]*.
Or, equivalently,
    RBR* = [1 0; 0 −1],
where
    R = (PQ⁻¹)⁻¹ = QP⁻¹ = [0 1/√5; √5/4 −1/√5].

These two examples tell us, explicitly and inductively, the following important results in the study of quadratic forms.
Congruence of real symmetric matrices
(1) A real matrix An×n is congruent to a diagonal matrix, i.e. there exists an invertible matrix Pn×n such that
    PAP* = diag(d₁, d₂, . . . , dn),
if and only if A is a symmetric matrix.
(2) In case A is real symmetric, there exists an invertible matrix Pn×n such that
    PAP* = diag(1, . . . , 1, −1, . . . , −1, 0, . . . , 0)  (k entries 1 and l entries −1),
where
    the index k of A = the number of positive 1's in the diagonal,
    the signature s = k − l of A = the number k of positive 1's in the diagonal minus the number l of negative −1's in the diagonal,
    the rank r of A = k + l.
(3) Sylvester's law of inertia
The index, signature and rank of a real symmetric matrix are invariant under congruence.
(4) Therefore, two real symmetric matrices of the same order are congruent if and only if they have the same invariants.   (2.7.71)
The result (1) still holds for symmetric matrices over a field of characteristic other than two (see Sec. A.3).

Exercises
<A>
1. Let
    A = [4 3; 5 2].
(a) Find the following factorizations of A:
(1) A = P⁻¹[λ₁ 0; 0 λ₂]P, where P is invertible.
(2) A = LU with pivots on the diagonal of U.
(3) A = LDU, where D is the pivot matrix of A.
(4) A as a product of elementary matrices.
(5) A = LS, where L is lower triangular and S is symmetric.
(b) Compute det A according to each factorization in (a).
(c) Solve the equation Ax* = b*, where x = (x₁, x₂) and b = (b₁, b₂), by using the following methods:
(1) Apply elementary operations to the augmented matrix [A | b*].
(2) Apply A = LU in (a).
(3) Apply A = LDU in (a).
(d) Compute A⁻¹ in as many ways as possible, including the former four in (a).
(e) Find the image of the square with vertices at ±e₁ ± e₂ under the mapping x → xA by direct computation and by the former four factorizations shown in (a). Illustrate graphically at each step.


2. Let
    A = [0 5; 3 −2].
Do the same problems as in Ex. 1, except that (a)(2) is replaced by E(1)(2)A = LU and (a)(3) is replaced by A = LPU, where P is a permutation matrix.
3. Let
    A = [1 2; 2 7].
Do the same problems as in Ex. 1, except that (a)(3) is replaced by A = LDL*.
4. Let
    A = [1 3; −3 7].
Do the same problems as in Ex. 1, but pay attention to (a)(1). What change is needed in (a)(1)? And how?
5. Let
    A = [1 4; −3 7].
Do the same problems as in Ex. 1. What happens to (a)(1)?
6. Let
    A = [2 −4; −6 12].
(a) Find the following factorizations of A:
(1) A = P⁻¹[λ₁ 0; 0 λ₂]P.
(2) A = E₁E₂[1 −2; 0 0], where E₁ and E₂ are elementary matrices.
(3) A = LU.
(4) A = LS, where L is lower triangular and S is symmetric.
(b) Determine when the equation Ax* = b*, where x = (x₁, x₂) and b = (b₁, b₂), has a solution, and then solve the equation by using the following methods.
(1) Apply elementary row operations to [A | b*].
(2) Apply A = LU in (a).
(c) Find the image of the triangle Δa₁a₂a₃, where a₁ = (3, 1), a₂ = (−2, 3) and a₃ = (2, −4), under the mapping x → xA by direct computation and by the former three factorizations in (a). Illustrate graphically at each step.
7. Suppose A = LDU, where L, D, U are invertible.
(a) In case A has another factorization A = L₁D₁U₁, show that L₁, D₁, U₁ are invertible and
    L = L₁, D = D₁ and U = U₁
by considering L₁⁻¹LD = D₁U₁U⁻¹.
(b) Rewrite A as [L(U*)⁻¹](U*DU). Show that there exist a lower triangular matrix L and a symmetric matrix S such that
    A = LS.
Notice that the diagonal entries of L are all equal to 1.
(Note These results hold not just for A2×2 over the reals but also for any invertible matrix An×n over a field F.)
8. Let
    A = [6 2; 2 9].
(a) Show that A has eigenvalues λ₁ = 10 and λ₂ = 5. Find the respective eigenvectors u₁, u₂, and justify that
    PAP⁻¹ = [10 0; 0 5],  P = [u₁; u₂].
In what follows, suppose you have concepts from plane Euclidean geometry (refer to Chap. 4, if necessary). Show that u₁ is perpendicular to u₂, i.e. their inner product ⟨u₁, u₂⟩ = u₁u₂* = 0. Divide each uᵢ by its length |uᵢ|; the resulting vector uᵢ/|uᵢ| is of unit length for i = 1, 2. Let
    Q = [u₁/|u₁|; u₂/|u₂|].
Justify that Q* = Q⁻¹, i.e. Q is an orthogonal matrix, and that
    QAQ* = [10 0; 0 5]
still holds. Try to adjust Q to a new matrix R such that
    RAR* = [1 0; 0 1] = I₂,
so that A = R⁻¹(R⁻¹)*.
(b) Try to graph the quadratic curve
    ⟨xA, x⟩ = 6x₁² + 4x₁x₂ + 9x₂² = 1
in the Cartesian coordinate system N = {e₁, e₂}, where x = (x₁, x₂), by any method available (for example, a computer). What does it look like? An ellipse, a parabola, a hyperbola or anything else? Determine its equations in the respective bases B = {u₁, u₂}, C = {v₁, v₂} and D = {w₁, w₂}, where v₁, v₂ are the row vectors of Q and w₁, w₂ are the row vectors of R. In the eyes of D, what does the quadratic curve look like?
(c) Applying elementary row operations to A, try to find an invertible matrix S such that
    SAS* = [6 0; 0 25/3],
and another invertible matrix T such that
    TAT* = [1 0; 0 1] = I₂,
so that A = T⁻¹(T⁻¹)*. Compare this T with R in (a). Are they the same?
(d) What are the equations of the quadratic curve in (b) in the respective bases S = {s₁, s₂} and T = {t₁, t₂}, where s₁, s₂ are the row vectors of S and t₁, t₂ are the row vectors of T? Graph it in both bases.
(e) After your experience with (a)–(d), which of these bases for R² is your favorite for understanding that quadratic curve? Give some convincing reasons to support your choice.
9. Let
    A = [7 4; 4 −8].
Do the same problems as in Ex. 8. Now the corresponding quadratic curve is
    ⟨xA, x⟩ = 7x₁² + 8x₁x₂ − 8x₂² = 1,
where x = (x₁, x₂), in N = {e₁, e₂}.
10. Let
    A = [2 4; 4 8].
Do the same problems as in Ex. 8.
11. Let A2×2 be a nonzero real symmetric matrix. Suppose P2×2 and Q2×2 are two invertible matrices such that
    PAP* = [λ₁ 0; 0 λ₂] and QAQ* = [µ₁ 0; 0 µ₂].
Show that
(1) λ₁ > 0, λ₂ > 0 ⇔ µ₁ > 0, µ₂ > 0;
(2) λ₁ < 0, λ₂ < 0 ⇔ µ₁ < 0, µ₂ < 0;
(3) λ₁λ₂ < 0 ⇔ µ₁µ₂ < 0; and
(4) one of λ₁ and λ₂ is zero and the nonzero one is positive (or negative) ⇔ one of µ₁ and µ₂ is zero and the nonzero one is positive (or negative).
(Note This is Sylvester's law in (2.7.71) for a 2 × 2 symmetric matrix. Ponder whether your proof is still good for a 3 × 3 or n × n symmetric matrix. If yes, try it; if no, is there any other way to attack this problem? See Ex. <B> 3 of Sec. 3.7.5.)
12. Factorize the nonzero matrix
    A = [a₁₁ a₁₂; a₂₁ a₂₂]
into a product of elementary matrices and hence interpret its mapping properties (refer to (2.7.9)) directly and step by step by using the factorization.

<B>

Caution The following problems are interrelated and more challenging; basic knowledge about inner products (see Chap. 4) is needed.

1. Let
    A = [0 1 2; −1 0 1].
(a) A is an echelon matrix. Then
    A = [0 1; 1 0][−1 0 1; 0 1 2].
Also,
    PA = [1 0 −1; 0 1 2], where P = [0 −1; 1 0],
is the row-reduced echelon matrix of A, and
    PAQ = [1 0 0; 0 1 0], where Q = [1 0 1; 0 1 −2; 0 0 1],
is the normal form of A. Justify such P and Q. Hence
    A = P⁻¹[1 0 0; 0 1 0]Q⁻¹ = [0 1; −1 0][1 0 0; 0 1 0][1 0 −1; 0 1 2; 0 0 1].
There are two ways to interpret PAQ, or A = P⁻¹(PAQ)Q⁻¹, geometrically.
(b) Fix the Cartesian coordinate system N = {e₁, e₂} in R². A, considered as the mapping x → xA, is the composite of the following three consecutive maps:
    x → xP⁻¹ → xP⁻¹(PAQ) → xP⁻¹(PAQ)Q⁻¹ = xA.
See the upper part of Fig. 2.65. The other way is to take the basis B = {u₁, u₂} for R², where u₁ = (0, −1) and u₂ = (1, 0) are the row vectors of P, and the basis C = {v₁, v₂, v₃} for R³, where v₁ = (1, 0, −1), v₂ = (0, 1, 2) and v₃ = (0, 0, 1) are the row vectors of Q⁻¹. Let N′ = {e₁′, e₂′, e₃′} be the Cartesian coordinate system for R³. Then (refer to (2.7.23)) PAQ = [1 0 0; 0 1 0] is the matrix representation of A in terms of the bases B and C. Thus, x → xA = y is equivalent to [x]_B → [y]_C = [x]_B(PAQ). See the lower part of Fig. 2.65.
(c) The image of the unit circle x₁² + x₂² = 1, where x = (x₁, x₂), under the mapping A is the intersection of the hyperplane y₁ − 2y₂ + y₃ = 0 with the cylinder y₁² + y₂² = 1, which is an ellipse, where y = xA = (y₁, y₂, y₃). See Fig. 2.66.
An interesting related problem is where the semi-axes of the ellipse lie and what their lengths are. Also, is it possible to choose a basis for R³ in which the ellipse has an equation like a₁β₁² + a₂β₂² = 1? Do you have any idea how to solve these two challenging problems?
(d) Actual computation shows that
    AA* = [5 2; 2 2].
[Fig. 2.65: two interpretations of PAQ. Upper part: x → xP⁻¹ → xP⁻¹(PAQ) → xP⁻¹(PAQ)Q⁻¹ in the Cartesian systems N of R² and N′ of R³; lower part: the same maps read through the bases B = {u₁, u₂} and C = {v₁, v₂, v₃}.]

[Fig. 2.66: the unit circle and its image under A, an ellipse: the intersection of the plane y₁ − 2y₂ + y₃ = 0 with the cylinder y₁² + y₂² = 1.]

Show that AA* has eigenvalues 1 and 6 with corresponding eigenvectors v₁ = (1/√5)(−1, 2) and v₂ = (1/√5)(2, 1). Thus,
    R(AA*)R⁻¹ = [1 0; 0 6], where R = [v₁; v₂] = (1/√5)[−1 2; 2 1].
Note that R* = R⁻¹. Take the new basis B = {v₁, v₂} for R² and remember that, for x = (x₁, x₂) ∈ R²,
    xR⁻¹ = [x]_B = (α₁, α₂).
Therefore, for all x ∈ R² and hence all [x]_B = (α₁, α₂),
    xAA*x* = (xR⁻¹)(RAA*R⁻¹)(xR⁻¹)* = (α₁ α₂)[1 0; 0 6][α₁; α₂]
           = α₁² + 6α₂² ≤ 6(α₁² + α₂²), and
           ≥ α₁² + α₂².
What happens to α₁² + α₂² if x₁² + x₂² = 1 holds? Does this information help in solving the aforementioned problems?
It might still be a puzzle to you why one knows beforehand to start from the consideration of AA* and its eigenvalues. Try to figure this out as precisely as possible.
2. Let
    A = [1 1 0; −1 2 1].
Do the same problems as in Ex. 1.

3. Let
    A = [1 2 3; −2 −4 −6].
Note that the rank r(A) = 1.
(a) Show that A has the LU-decomposition
    A = [1 0; −2 1][1 2 3; 0 0 0].
Meanwhile,
    PA = [1 2 3; 0 0 0], where P = [1 0; 2 1],
is the row-reduced echelon matrix and
    PAQ = [1 0 0; 0 0 0], where Q = [1 −2 −3; 0 1 0; 0 0 1],
is the normal form of A.
(b) (Refer to Ex. <A> 20 of Sec. 2.7.3. The natural inner products in both R² and R³ are needed; see the Introduction of Chap. 4.) Show that, for x = (x₁, x₂) and y = (y₁, y₂, y₃) = xA,
    Ker(A) = {x ∈ R² | x₁ − 2x₂ = 0} = ⟨⟨(2, 1)⟩⟩,
    Im(A) = {y ∈ R³ | 2y₁ − y₂ = 3y₁ − y₃ = 0} = ⟨⟨(1, 2, 3)⟩⟩;
and, for y = (y₁, y₂, y₃) ∈ R³ and x = (x₁, x₂) = yA* or x* = Ay*,
    Ker(A*) = {y ∈ R³ | y₁ + 2y₂ + 3y₃ = 0} = ⟨⟨(−2, 1, 0), (−3, 0, 1)⟩⟩,
    Im(A*) = {x ∈ R² | 2x₁ + x₂ = 0} = ⟨⟨(1, −2)⟩⟩.
Justify that
    Im(A)⊥ = Ker(A*), Ker(A)⊥ = Im(A*), and
    Im(A*)⊥ = Ker(A), Ker(A*)⊥ = Im(A),
and also that R³ = Ker(A*) ⊕ Im(A) and R² = Ker(A) ⊕ Im(A*). See Fig. 2.67.


[Fig. 2.67: the four fundamental subspaces of A: in R², Ker(A) = Im(A*)⊥ spanned by (2, 1) and Im(A*) spanned by (1, −2); in R³, Im(A) spanned by (1, 2, 3) and Ker(A*) = Im(A)⊥.]

(c) From our experience in (2.7.8) and Fig. 2.67, it is easily seen that the line ⟨⟨(1, −2)⟩⟩ = Im(A*) is an invariant line of AA*, i.e. there exists a scalar λ such that
    (1, −2)AA* = λ(1, −2).
Actual computation shows that λ = 70. Hence, even though A, as a whole, does not have an inverse, when restricted to the subspace Im(A*) it does satisfy
    A((1/70)A*) = 1_{Im(A*)}.
Therefore, specifically denoted,
    A⁺ = (1/70)A*
serves as a right inverse of the restriction A|_{Im(A*)}.
Since Ker(A) = Im(A*)⊥, for any x ∈ R² there exist unique scalars α and β such that x = α(1, −2) + β(2, 1), and hence
    x(AA⁺) = α(1, −2)
           = the orthogonal projection of x onto Im(A*) along Ker(A).
Therefore, AA⁺ is the orthogonal projection of R² onto Im(A*) along Ker(A). Show that it can be characterized as:
    AA⁺ is symmetric and (AA⁺)² = AA⁺.
⇔ (AA⁺)² = AA⁺ and Im(AA⁺) = Ker(AA⁺)⊥, Ker(AA⁺) = Im(AA⁺)⊥.
⇔ There exists an orthonormal basis B = {x₁, x₂} for R², i.e. |x₁| = |x₂| = 1 and ⟨x₁, x₂⟩ = 0, such that
    [AA⁺]_B = [1 0; 0 0].
See Fig. 2.68. Similarly, A⁺A: R³ → R³ is the orthogonal projection of R³ onto Im(A) along Ker(A*). Try to write out characteristic properties for A⁺A similar to those for AA⁺. See Fig. 2.69. A⁺ is called the generalized or pseudoinverse of A. For details, see the Note in Ex. <B> 3 of Sec. 3.7.3, Example 4 in Sec. 3.7.5, Secs. 4.5, 5.5 and B.8.

[Fig. 2.68: AA⁺ as the orthogonal projection of R² onto Im(A*) = ⟨⟨(1, −2)⟩⟩ along Ker(A) = Im(A*)⊥ = ⟨⟨(2, 1)⟩⟩.]
[Fig. 2.69: A⁺A as the orthogonal projection of R³ onto Im(A) along Ker(A*) = Im(A)⊥.]

(d) Since r(A) = 1, try to show that
    A = [1 2 3; −2 −4 −6] = [1; −2][1 2 3]
      = √70 [1/√5; −2/√5][1/√14 2/√14 3/√14]
    ⇒ A⁺ = (1/√70)[1/√14; 2/√14; 3/√14][1/√5 −2/√5] = (1/70)A*.
Here √70 is called the singular value of A (see (e)) and (√70)² = 70 is the nonzero eigenvalue of AA* as shown in (c).
(e) Suppose x₁ = (1/√5)(1, −2), which is intentionally normalized from (1, −2) into a unit vector. Also, let x₂ = (1/√5)(2, 1). Then
    x₁A = √5(1, 2, 3) = √70 (1/√14, 2/√14, 3/√14),
    x₂A = 0.
Let y₁ = (1/√14, 2/√14, 3/√14). Take any two orthogonal unit vectors y₂ and y₃ in Ker(A*), say y₂ = (−2/√5, 1/√5, 0) and y₃ = (−3/√70, −6/√70, 5/√70).
Then
    x₁A = √70 y₁ + 0·y₂ + 0·y₃,
    x₂A = 0 = 0·y₁ + 0·y₂ + 0·y₃
    ⇒ [x₁; x₂]A = [√70 0 0; 0 0 0][y₁; y₂; y₃]
    ⇒ A = [x₁; x₂]⁻¹[√70 0 0; 0 0 0][y₁; y₂; y₃] = R[√70 0 0; 0 0 0]S,   (∗)
where
    R = [1/√5 −2/√5; 2/√5 1/√5]⁻¹ = [1/√5 2/√5; −2/√5 1/√5] and
    S = [1/√14 2/√14 3/√14; −2/√5 1/√5 0; −3/√70 −6/√70 5/√70]
are both orthogonal, i.e. R* = R⁻¹ and S* = S⁻¹. (∗) is called the singular value decomposition of A. See Fig. 2.70. Notice that the generalized inverse
    [√70 0 0; 0 0 0]⁺ = [1/√70 0; 0 0; 0 0],
and hence the generalized inverse of A is
    A⁺ = S*[1/√70 0; 0 0; 0 0]R* = (1/70)A*.
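numpy's SVD routine reproduces (∗) up to signs and the row-versus-column convention; a quick check (our own illustration):

```python
import numpy as np

A = np.array([[1., 2., 3.], [-2., -4., -6.]])
U, s, Vt = np.linalg.svd(A)               # A = U @ Sigma @ Vt
assert np.isclose(s[0], np.sqrt(70)) and np.isclose(s[1], 0)
Sigma = np.zeros((2, 3))
Sigma[:2, :2] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)     # U, Vt play the roles of R, S above
```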
 
[Fig. 2.70: the singular value decomposition (∗): the orthonormal basis {x₁, x₂} of R² is carried by A to {√70 y₁, 0} in R³; schematically A = R [√70 0 0; 0 0 0] S.]

(f) For b ∈ R³ with b ∉ Im(A), the equation xA = b has no solution. But for any b ∈ Im(A), there are infinitely many solutions of xA = b. In fact, any point x along the line bA⁺ + Ker(A) is a solution, and bA⁺ is the shortest one (in distance from 0, i.e. in length) among all:
    |bA⁺| = min_{xA = b} |x|.
Equivalently, it is the vector bA⁺ that minimizes |b − xA| over all x ∈ R², i.e.
    |b − (bA⁺)A| = min_{x ∈ R²} |b − xA|.
This is the theoretical foundation for the least-squares method in applications. Both characterizations above can be used as definitions of the generalized inverse of a matrix (see Sec. B.8).
(g) Find orthogonal matrices T2×2 and U3×3 such that
    T(AA*)T⁻¹ = [70 0; 0 0], and
    U(A*A)U⁻¹ = [70 0 0; 0 0 0; 0 0 0].
Are T and U related to P and Q in (a)? Or to R and S in (e)? In case you intend to investigate the geometric mapping properties of A, which matrix factorization is better: (a), (c) or (e)?
4. Let
    A = [−2 0 1; 4 0 −2].
Do the same problems as in Ex. 3.

<C> Abstraction and generalization

1. Prove (2.7.66).
2. Prove (2.7.67).
3. Prove (2.7.68).
4. Prove (2.7.69).
5. Prove (2.7.70).
6. Prove (2.7.71).

Something more should be said about the block matrix, or partitioned matrix. The augmented matrix [A; b] or [A | b*] is the simplest such matrix. Let Am×n be a matrix (over any field F). Take arbitrary positive integers m₁, . . . , mp and n₁, . . . , nq such that
    m₁ + · · · + mp = m,  n₁ + · · · + nq = n.
Partition the rows of A according to m₁ rows, . . . , mp rows and the columns of A according to n₁ columns, . . . , nq columns. Then A is partitioned into p × q blocks as
    A = [A₁₁ A₁₂ . . . A₁q; A₂₁ A₂₂ . . . A₂q; . . . ; Ap₁ Ap₂ . . . Apq] = [Aij],
where each Aij, 1 ≤ i ≤ p, 1 ≤ j ≤ q, is an mᵢ × nⱼ matrix and is called a submatrix of A. A matrix of this type is called a block or partitioned matrix. Two block matrices Am×n = [Aij] and Bm×n = [Bij] are said to be of the
same type if the corresponding blocks Aij and Bij are of the same size for 1 ≤ i ≤ p, 1 ≤ j ≤ q. Block matrices have the following operations, as the usual matrices do (see Sec. B.4).

(1) Addition If A = [Aij ] and B = [Bij ] are of the same type, then

A + B = [Aij + Bij ].

(2) Scalar multiplication For A = [Aij ] and λ ∈ F,

λA = [λAij ].

(3) Product Suppose Am×n = [Aij] and Bn×l = [Bjk], where Aij is the mᵢ × nⱼ submatrix of A and Bjk is the nⱼ × lk submatrix of B for 1 ≤ i ≤ p, 1 ≤ j ≤ q, 1 ≤ k ≤ t, i.e. the number of columns of each Aij is equal to the number of rows of the corresponding Bjk. Then
    AB = [Cik], Cik = Σ_{j=1}^{q} Aij Bjk for 1 ≤ i ≤ p, 1 ≤ k ≤ t.

(4) Transpose If A = [Aij ], then

A∗ = [A∗ij ]∗ .

(5) Conjugate transpose If A = [Aij ] is a complex matrix, then

Ā∗ = [Ā∗ij ]∗ .

A block matrix of the type
    [A₁₁ 0 . . . 0; 0 A₂₂ . . . 0; . . . ; 0 0 . . . App]  or
    [A₁₁ 0 . . . 0; A₂₁ A₂₂ . . . 0; . . . ; Ap₁ Ap₂ . . . App]  or
    [A₁₁ A₁₂ . . . A₁p; 0 A₂₂ . . . A₂p; . . . ; 0 0 . . . App]
is called pseudo-diagonal, pseudo-lower triangular or pseudo-upper triangular, respectively.
Do the following problems.

7. Model after elementary row and column operations on matrices to prove the following Schur's formulas. Let
    A = [A₁₁ A₁₂; A₂₁ A₂₂]n×n,
where A₁₁ is an invertible r × r submatrix of A.
(a) [Ir O; −A₂₁A₁₁⁻¹ In−r][A₁₁ A₁₂; A₂₁ A₂₂] = [A₁₁ A₁₂; O A₂₂ − A₂₁A₁₁⁻¹A₁₂].
(b) [A₁₁ A₁₂; A₂₁ A₂₂][Ir −A₁₁⁻¹A₁₂; O In−r] = [A₁₁ O; A₂₁ A₂₂ − A₂₁A₁₁⁻¹A₁₂].
(c) [Ir O; −A₂₁A₁₁⁻¹ In−r][A₁₁ A₁₂; A₂₁ A₂₂][Ir −A₁₁⁻¹A₁₂; O In−r] = [A₁₁ O; O A₂₂ − A₂₁A₁₁⁻¹A₁₂].
Hence, prove that
    r(A) = r(A₁₁) ⇔ A₂₂ = A₂₁A₁₁⁻¹A₁₂.
Also, prove that if A and A₁₁ are invertible, then A₂₂ − A₂₁A₁₁⁻¹A₁₂ is also invertible and
    A⁻¹ = [Ir −A₁₁⁻¹A₁₂; O In−r][A₁₁⁻¹ O; O (A₂₂ − A₂₁A₁₁⁻¹A₁₂)⁻¹][Ir O; −A₂₁A₁₁⁻¹ In−r].

8. Let Aij be n × n matrices for 1 ≤ i, j ≤ 2 with A₁₁A₂₁ = A₂₁A₁₁.
(a) If A₁₁ is invertible, then
    det[A₁₁ A₁₂; A₂₁ A₂₂] = det(A₁₁) · det(A₂₂ − A₂₁A₁₁⁻¹A₁₂) = det(A₁₁A₂₂ − A₂₁A₁₂).
(b) If the Aij are complex matrices for 1 ≤ i, j ≤ 2, then the invertibility of A₁₁ may be dropped in (a) and the result still holds.
(c) In particular,
    det[A₁₁ A₁₂; −A₁₂* A₁₁*] = det(A₁₁A₁₁* + A₁₂*A₁₂) if A₁₂*A₁₁ = A₁₁A₁₂*;
    det[O A₁₂; A₁₂* O] = (−1)ⁿ |det A₁₂|².
9. Let Am×n and Bn×m be matrices over F.
(a) Show that
    [Im O; −B In][Im A; B In] = [Im A; O In − BA], and
    [Im −A; O In][Im A; B In] = [Im − AB O; B In].
(b) Show that det(In − BA) = det(Im − AB).
(c) Show that In − BA is invertible if and only if Im − AB is invertible. Try to use A, B and (Im − AB)⁻¹ to represent (In − BA)⁻¹.
(d) Use (b) to show that
    det[1 + x₁y₁ x₁y₂ · · · x₁yn; x₂y₁ 1 + x₂y₂ · · · x₂yn; · · · ; xny₁ xny₂ · · · 1 + xnyn] = 1 + x₁y₁ + · · · + xnyn.
In general, for An×1 and Bn×1 and λ ∈ F,
    det(In − λAB*) = 1 − λB*A.
Hence In − λAB* is invertible if and only if 1 − λB*A ≠ 0, and
    (In − λAB*)⁻¹ = In + (λ/(1 − λB*A))AB*.
(e) Show that, for λ ∈ F,
    λⁿ det(λIm − AB) = λᵐ det(λIn − BA).
10. (a) Suppose A ∈ M(n, F) satisfies A² = A, i.e. A is idempotent. Show that r(A) = tr(A).
(b) Let Ai ∈ M(n, F) for 1 ≤ i ≤ k be such that A₁ + · · · + Ak = In. Show that Ai² = Ai, 1 ≤ i ≤ k, if and only if r(A₁) + · · · + r(Ak) = n.
11. Let
    A = [B C; O Ir]
be a square matrix (and hence B is a square submatrix, say of order m).
(a) Suppose det(B − Im) ≠ 0. Then
    Aⁿ = [Bⁿ (Bⁿ − Im)(B − Im)⁻¹C; O Ir], n = 0, 1, 2, . . . .
(b) In case det B ≠ 0, (a) still holds for n = −1, −2, . . . .
12. Show that
    [Am×n Im; O Bn×m](m+n)×(m+n)
is invertible if and only if BA is invertible.
13. Let An×n and Bn×n be complex matrices. Show that
    det[A −B; B A] = det(A + iB) · det(A − iB).
In particular,
    det[A O; O A] = (det A)²;  det[O −B; B O] = (det B)².

14. Adopt the notations in (2.7.66).
(a) Show that
    E(i)+α(j) = Eα⁻¹(j)E(i)+(j)Eα(j), α ≠ 0, i ≠ j;
    E(i)(j) = E−(j)E(j)+(i)E−(i)E(i)+(j)E−(j)E(j)+(i).
(b) Show that an invertible matrix An×n can always be expressed as a product of square matrices of the form In + αEij.
15. Use (2.7.70) to show that a matrix Am×n of rank r ≥ 1 is a sum of
r matrices of order m × n, each of rank 1.
16. Let A ∈ M(n, R) be a symmetric matrix with rank r. Show that at least
one of its principal subdeterminants (see Sec. B.6) of order r is not equal
to zero, and all nonzero such subdeterminants are of the same sign.
17. Let A be an m × n matrix. Show that r(A) = n if and only if there exists an invertible matrix Pm×m such that
    A = P[In; O]m×n,
by using the normal form of A in (2.7.70).
18. The rank decomposition theorem of a matrix
Let Am×n be a matrix. Show that
    r(A) = r ≥ 1
⇔ there exist a matrix Bm×r of rank r and a matrix Cr×n of rank r such that A = BC.
Then show that, in case A is a real or complex matrix, the generalized inverse of A is
    A⁺ = C*(CC*)⁻¹(B*B)⁻¹B*.
19. By performing elementary row operations and elementary column operations of type 1, any m × n matrix A can be reduced to the form
    [Ir B; O O],
where r = r(A).
20. The elementary matrix operations shown in (2.7.59) can be extended to block matrices as follows. Let
    A = [A₁₁ A₁₂; A₂₁ A₂₂]m×n,
where A₁₁ is a p × q submatrix.
(1) Type 1
    [O Im−p; Ip O][A₁₁ A₁₂; A₂₁ A₂₂] = [A₂₁ A₂₂; A₁₁ A₁₂],
    [A₁₁ A₁₂; A₂₁ A₂₂][O Iq; In−q O] = [A₁₂ A₁₁; A₂₂ A₂₁].
(2) Type 2
    [Bp×p O; O Im−p][A₁₁ A₁₂; A₂₁ A₂₂] = [BA₁₁ BA₁₂; A₂₁ A₂₂], etc.;
    [A₁₁ A₁₂; A₂₁ A₂₂][Iq O; O B(n−q)×(n−q)] = [A₁₁ A₁₂B; A₂₁ A₂₂B], etc.
(3) Type 3
    [Ip B; O Im−p][A₁₁ A₁₂; A₂₁ A₂₂] = [A₁₁ + BA₂₁ A₁₂ + BA₂₂; A₂₁ A₂₂], etc.;
    [A₁₁ A₁₂; A₂₁ A₂₂][Iq O; B In−q] = [A₁₁ + A₁₂B A₁₂; A₂₁ + A₂₂B A₂₂], etc.
Note that Schur's formulas in Ex. 7 are special cases.
21. Prove the following results for ranks.
(a) If A = [Aij] is a block matrix, then r(A) ≥ r(Aij) for each i and j.
(b) r([A O; O B]) = r(A) + r(B).
(c) r([A C; O B]) ≥ r([A O; O B]).
(d) Use (c) to prove that r(A) + r(B) − n ≤ r(AB), where A is m × n and B is n × m.
22. Suppose Am×n, Bn×p and Cp×q.
(a) Show that
    [Im A; O In][ABC O; O B][Iq O; −C Ip][O −Iq; Ip O] = [AB O; B BC].
(b) Show that
    r([AB O; O BC]) ≤ r([AB O; B BC]) = r([ABC O; O B])
and hence deduce the Frobenius inequality (see Ex. <C> 11 of Sec. 2.7.3)
    r(AB) + r(BC) ≤ r(B) + r(ABC).
(c) Taking B = In and using Bn×p to replace C, show the Sylvester inequality (see (2.7.43) and Ex. <C> of Sec. 2.7.3)
    r(A) + r(B) − n ≤ r(AB).
23. (a) Let Am×n and Bm×n be real matrices. Show that
    (det AB*)² ≤ det(AA*) · det(BB*).
(b) Let Am×n be a complex matrix. Show that any principal subdeterminant of order r of AĀ* is nonnegative:
    AĀ*(i₁ . . . ir; i₁ . . . ir) ≥ 0, 1 ≤ i₁ < · · · < ir ≤ m
(see Sec. B.6).

2.7.6 Diagonal canonical form


The preliminary concepts about eigenvalues and eigenvectors of a linear
operator or a real 2 × 2 matrix were introduced in Example 1 through
Example 7 in Sec. 2.7.2, via the geometric terminology of invariant lines or
subspaces. Formal definitions for them are in (2.7.11) and (2.7.13). See also
(2.7.12), (2.7.14), (2.7.19) and (2.7.38), (2.7.39), (2.7.42).
From Secs. 2.7.2 to 2.7.5, we repeat again and again in examples and
exercises to familiarize readers with these two concepts and methods to
compute them, and to make readers realize implicitly and purposely how
many advantages they might have in the investigation of geometric mapping
properties of a linear operator.
Here in this subsection, we give two examples to end the study of the diagonalizability of a linear operator (see (2.7.24)) and to summarize their results formally in (2.7.72) and (2.7.73). The examples presented here might feel cumbersome and boring. If so, please just skip this content and go directly to the Exercises or to Sec. 2.7.7.

Example 1 Let R² be endowed with the Cartesian coordinate system N = {e₁, e₂} as Fig. 2.17(b) indicates. Investigate the geometric mapping properties of the linear transformation
    f(x₁, x₂) = (2x₁ − x₂, −3x₁ + 4x₂) = xA, where x = (x₁, x₂) and A = [2 −3; −1 4].

Solution It is easy to check that f is indeed a linear transformation. Moreover, f is an isomorphism. To see this, suppose f(x₁, x₂) = 0. This is equivalent to saying that
    2x₁ − x₂ = 0, −3x₁ + 4x₂ = 0,
whose only solution is x = (x₁, x₂) = 0. Hence f is one-to-one. Next, for any given y = (y₁, y₂), solve f(x₁, x₂) = (y₁, y₂), i.e.
    2x₁ − x₂ = y₁
    −3x₁ + 4x₂ = y₂
    ⇒ x₁ = (4/5)y₁ + (1/5)y₂, x₂ = (3/5)y₁ + (2/5)y₂, with A⁻¹ = (1/5)[4 3; 1 2].
The resulting vector x = ((4/5)y₁ + (1/5)y₂, (3/5)y₁ + (2/5)y₂) is then the (unique) solution to f(x) = y. Thus f is onto (see (2.7.8)). It is worth noticing that the above algebraic computation can be replaced by the computation of the inverse A⁻¹ of the matrix A: y = xA ⇔ x = yA⁻¹.
f maps straight lines onto straight lines and preserves ratios of lengths of line segments along the same line. The equation
    a₁x₁ + a₂x₂ + b = 0   (∗₁)
of a line can be written in matrix form as
    x[a₁; a₂] + b = 0.
Hence, the image of this line under f has the equation
    yA⁻¹[a₁; a₂] + b = 0, or
    (4a₁ + 3a₂)y₁ + (a₁ + 2a₂)y₂ + 5b = 0,   (∗₂)
which represents a straight line. The property that f preserves ratios of lengths of segments is contained in the definition of f, a linear transformation. Do you see why? Of course, one can use the parametric equation x = a + tb of the line (see (2.5.4)) to prove these results.

f preserves the relative positions of two lines (see (2.5.9)). Suppose the two lines have the respective equations x = a₁ + tb₁ and x = a₂ + tb₂, where t ∈ R. Note that the image of the line x = aᵢ + tbᵢ under f has the equation y = aᵢA + tbᵢA for i = 1, 2. Therefore, for example,
    the two lines intersect at a point x₀
    ⇔ b₁ and b₂ are linearly independent
    ⇔ b₁A and b₂A are linearly independent (because A is invertible)
    ⇔ the two image lines intersect at the point x₀A.
Again, one should refer to Ex. <B> 4 of Sec. 2.4 or (2.7.8).

Suppose the x₁x₂-plane and the y₁y₂-plane are deposited on the same plane. We are posed with the problem: Is there any line coincident with its image under f? If yes, how do we find it and how many can we find?
For simplicity, suppose firstly that the line passes through the origin 0 = (0, 0). Then in (∗₁) and in (∗₂), b = 0 should hold, and both lines are coincident if and only if
    (4a₁ + 3a₂)/a₁ = (a₁ + 2a₂)/a₂.
There are two algebraic ways to treat the above equation.
(1) One is
    a₂(4a₁ + 3a₂) = a₁(a₁ + 2a₂)
    ⇔ a₁² − 2a₁a₂ − 3a₂² = (a₁ − 3a₂)(a₁ + a₂) = 0
    ⇔ a₁ + a₂ = 0, which results in a₁ : a₂ = 1 : −1, or
      a₁ − 3a₂ = 0, which results in a₁ : a₂ = 3 : 1.
This means that the lines x₁ − x₂ = 0 and 3x₁ + x₂ = 0 are kept invariant under the mapping f.
(2) The other is
    (4a₁ + 3a₂)/a₁ = (a₁ + 2a₂)/a₂ = α
    ⇔ (4 − α)a₁ + 3a₂ = 0 and a₁ + (2 − α)a₂ = 0, i.e. [4 − α 3; 1 2 − α][a₁; a₂] = [0; 0]
    ⇔ (suppose there does exist (a₁, a₂) ≠ (0, 0) solving the equations; refer to Exs. <A> 2 and <B> 4 of Sec. 2.4)
      det[4 − α 3; 1 2 − α] = (4 − α)(2 − α) − 3 = α² − 6α + 5 = 0
    ⇔ α = 1, 5.
The case α = 1 results in a₁ + a₂ = 0 which, in turn, implies that x₁ − x₂ = 0 is an invariant line, while α = 5 results in a₁ − 3a₂ = 0 and hence 3x₁ + x₂ = 0 is another invariant line. We obtain the same results as in (1).

Note 1 Method (2) above can be replaced by the following process. By the vector forms of (∗₁) and (∗₂), we have:
    The line x[a₁; a₂] = 0 coincides with the line xA⁻¹[a₁; a₂] = 0.
    ⇔ There exists a constant µ such that
        A⁻¹[a₁; a₂] = µ[a₁; a₂]
    ⇔ (A⁻¹ − µI₂)[a₁; a₂] = [0; 0]   (∗₃)
    ⇔ det(A⁻¹ − µI₂) = det((1/5)[4 3; 1 2] − µ[1 0; 0 1])
      = (1/25) det[4 − 5µ 3; 1 2 − 5µ] = (1/5)(5µ² − 6µ + 1) = 0
    ⇒ µ = 1 and µ = 1/5.
If µ = 1, by (∗₃), we have a₁ − 3a₂ = 0, and if µ = 1/5, then a₁ + a₂ = 0.

Note 2 Traditionally, especially in a Cartesian coordinate system, we adopt (∗₁) as the starting point for almost every computational work about straight lines. Why not use the parametric equation in vector form as in (2.5.4)? If so, then:
    The line x = tb coincides with its image x = tbA under f.
    ⇔ There exists a constant λ such that bA = λb.
    ⇔ b(A − λI₂) = 0.
    ⇔ (since b ≠ 0)
        det(A − λI₂) = det[2 − λ −3; −1 4 − λ] = λ² − 6λ + 5 = 0.
    ⇒ λ = 1, 5.
In case λ = 1, to solve b(A − λI₂) = b(A − I₂) = 0 is equivalent to solving
    (b₁ b₂)[2 − 1 −3; −1 4 − 1] = 0, or b₁ − b₂ = 0, where b = (b₁, b₂).
Hence b₁ : b₂ = 1 : 1 and the line x = t(1, 1) is an invariant line. In case λ = 5, to solve b(A − λI₂) = b(A − 5I₂) = 0 is to solve
    (b₁ b₂)[−3 −3; −1 −1] = 0, or 3b₁ + b₂ = 0.
Hence b₁ : b₂ = 1 : −3 and thus the line x = t(1, −3) is another invariant line.

We call, in Note 2 above, λ1 = 1 and λ2 = 5 the eigenvalues of the square
matrix A, the vectors x1 = t(1, 1) for t ≠ 0 eigenvectors of A related to λ1, and
x2 = t(1, −3) for t ≠ 0 eigenvectors of A related to λ2 = 5. When comparing the
various algebraic methods mentioned above, the method in Note 2 is the
simplest one to generalize. We will see its advantages over the others as we
proceed. Notice that x1A = x1 and x2A = 5x2. See Fig. 2.71.
[Fig. 2.71: the map f(x) = xA keeps the lines x = t(1, 1) and x = t(1, −3) invariant, with x1A = x1 and x2A = 5x2.]
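Readers who wish to check such eigenvalue computations numerically may find a short sketch helpful. The following minimal NumPy fragment is our own illustration, not part of the original text; since the book acts on row vectors by x → xA, the invariant directions are left eigenvectors of A, so we hand the transpose of A to the eigen-solver:

```python
import numpy as np

# The book writes vectors as rows and maps x -> xA, so an invariant
# direction b satisfies bA = lambda*b, i.e. b is a *left* eigenvector of A.
# NumPy's eig works with column vectors, hence the transpose below.
A = np.array([[2.0, -3.0],
              [-1.0, 4.0]])

eigvals, eigvecs = np.linalg.eig(A.T)   # columns are left eigenvectors of A
for lam, v in zip(eigvals, eigvecs.T):
    v = v / v[0]                        # rescale for readability (v[0] != 0 here)
    print(lam, v, np.allclose(v @ A, lam * v))
# Output shows eigenvalue 1 with direction (1, 1) and eigenvalue 5
# with direction (1, -3), matching Note 2 above.
```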

An immediate advantage is as follows. Let v1 = (1, 1) and v2 = (1, −3).
Since
$$\det\begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \det\begin{pmatrix}1 & 1\\ 1 & -3\end{pmatrix} = \begin{vmatrix}1 & 1\\ 1 & -3\end{vmatrix} = -4 \ne 0,$$
v1 and v2 are linearly independent. Thus B = {v1, v2} is a basis for R2.
What does the linear isomorphism f look like in B? Now,

f(v1) = v1A = v1 = 1 · v1 + 0 · v2 ⇒ [f(v1)]B = (1, 0), and
f(v2) = v2A = 5v2 = 0 · v1 + 5 · v2 ⇒ [f(v2)]B = (0, 5).

Therefore, the matrix representation of f with respect to B is
$$[f]_B = \begin{pmatrix}[f(\vec{v}_1)]_B\\ [f(\vec{v}_2)]_B\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 5\end{pmatrix} = PAP^{-1},\qquad(*_3)$$
where
$$P = \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \begin{pmatrix}1 & 1\\ 1 & -3\end{pmatrix}.$$
Hence, for any x = (x1, x2) ∈ R2, let [x]B = (α1, α2) = xP^{-1}; then
$$[f(\vec{x})]_B = [\vec{x}]_B[f]_B = (\alpha_1\ \alpha_2)\begin{pmatrix}1 & 0\\ 0 & 5\end{pmatrix} = (\alpha_1, 5\alpha_2)\qquad(*_4)$$
(see (2.4.2), (2.6.4) or (2.7.23) if necessary). The parallelograms in Fig. 2.72
show how much better the representation [f]B is than the original [f]N = A.
[Fig. 2.72: the square with vertices (±1, ±1), the axes ⟨⟨v1⟩⟩ and ⟨⟨v2⟩⟩, and the image parallelogram with vertices (1, 1), (−3, 7), (−1, −1), (3, −7).]

A useful and formal reinterpretation of (∗3) and (∗4) is as follows. Since
$$PAP^{-1} = \begin{pmatrix}1 & 0\\ 0 & 5\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} + \begin{pmatrix}0 & 0\\ 0 & 5\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} + 5\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}$$
$$\Rightarrow A = P^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}P + 5P^{-1}\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}P = A_1 + 5A_2,$$
where
$$A_1 = P^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}P = \frac{1}{4}\begin{pmatrix}3 & 1\\ 1 & -1\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}1 & 1\\ 1 & -3\end{pmatrix} = \begin{pmatrix}\frac{3}{4} & \frac{3}{4}\\ \frac{1}{4} & \frac{1}{4}\end{pmatrix},\qquad A_2 = P^{-1}\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}P = \begin{pmatrix}\frac{1}{4} & -\frac{3}{4}\\ -\frac{1}{4} & \frac{3}{4}\end{pmatrix},$$
the claimed advantage will come to the surface once we can handle the
geometric mapping properties of both A1 and A2. Note that

x = (x1, x2) ∈ R2, using N = {e1, e2}
↓
xP^{-1} = [x]B = (α1, α2) ∈ R2, using B = {v1, v2}
↓
$\vec{x}P^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} = (\alpha_1, 0)$, the projection of (α1, α2) onto (α1, 0) in B
↓
$\vec{x}P^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}P = (\alpha_1\ 0)\begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \alpha_1\vec{v}_1 = \vec{x}A_1.$
Hence A1 defines a linear transformation x ∈ R2 → xA1 ∈ ⟨⟨v1⟩⟩, where ⟨⟨v1⟩⟩ is
called the eigenspace of A related to the eigenvalue λ1. Also,

x ∈ ⟨⟨v1⟩⟩ ⇔ xA1 = x,

and thus A1² = A1 holds. A1 is called the projection of R2 onto ⟨⟨v1⟩⟩ along
⟨⟨v2⟩⟩. Similarly, A2, defined by x ∈ R2 → xA2 ∈ ⟨⟨v2⟩⟩, is called the
projection of R2 onto ⟨⟨v2⟩⟩ along ⟨⟨v1⟩⟩. For details, see (2.7.51).
We summarize this as an abstract result.
The diagonal canonical form of a linear operator
Suppose f(x) = xA: R2 → R2 is a linear operator, where A = [aij]2×2 is a
real matrix. Suppose the characteristic polynomial is

det(A − tI2) = (t − λ1)(t − λ2), λ1 ≠ λ2.

Let vi (≠ 0) be an eigenvector of f related to the eigenvalue λi, i.e. a
nonzero solution vector of xA = λi x for i = 1, 2. Then

⟨⟨vi⟩⟩ = {x ∈ R2 | xA = λi x}, i = 1, 2,

is an invariant subspace of R2, i.e. f(⟨⟨vi⟩⟩) ⊆ ⟨⟨vi⟩⟩, and is called the
eigenspace of f related to λi.

(1) B = {v1, v2} is a basis for R2 and
$$[f]_B = PAP^{-1} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix},\quad P = \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix}.$$
In this case, call f or A diagonalizable.
(2) Let
$$A_1 = P^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}P,\quad\text{and}\quad A_2 = P^{-1}\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}P.$$
Then,
1. R2 = ⟨⟨v1⟩⟩ ⊕ ⟨⟨v2⟩⟩.
2. Ai: R2 → R2 is the projection of R2 onto ⟨⟨vi⟩⟩ along ⟨⟨vj⟩⟩, i ≠ j, i.e.
   Ai² = Ai, i = 1, 2.
3. A1A2 = A2A1 = O.
4. I2 = A1 + A2.
5. A = λ1A1 + λ2A2.
This is called the diagonal canonical decomposition of A. (2.7.72)
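All five properties of (2.7.72) can be verified numerically for the matrix A of Example 1. The following small NumPy sketch is our own check, with v1 = (1, 1) and v2 = (1, −3) as above:

```python
import numpy as np

A  = np.array([[2.0, -3.0], [-1.0, 4.0]])
P  = np.array([[1.0, 1.0], [1.0, -3.0]])       # rows are v1, v2
Pi = np.linalg.inv(P)

A1 = Pi @ np.diag([1.0, 0.0]) @ P              # projection onto <<v1>> along <<v2>>
A2 = Pi @ np.diag([0.0, 1.0]) @ P              # projection onto <<v2>> along <<v1>>

assert np.allclose(P @ A @ Pi, np.diag([1.0, 5.0]))          # [f]_B = PAP^{-1}
assert np.allclose(A1 @ A1, A1) and np.allclose(A2 @ A2, A2) # idempotent
assert np.allclose(A1 @ A2, 0) and np.allclose(A2 @ A1, 0)   # A1 A2 = A2 A1 = O
assert np.allclose(A1 + A2, np.eye(2))                       # I2 = A1 + A2
assert np.allclose(A, 1 * A1 + 5 * A2)                       # A = λ1 A1 + λ2 A2
```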
[Fig. 2.73: the decomposition of x into xA1 ∈ ⟨⟨v1⟩⟩ and xA2 ∈ ⟨⟨v2⟩⟩, with xA = λ1xA1 + λ2xA2.]

Refer to Exs. 2 and 4 of Sec. B.11. See Fig. 2.73.


Let us return to the original mapping f in Example 1. Try to find the
image of the square with vertices at (1, 1), (−1, 1), (−1, −1) and (1, −1)
under the mapping f. Direct computation shows that
$$f(1, 1) = (1\ 1)\begin{pmatrix}2 & -3\\ -1 & 4\end{pmatrix} = (1, 1),\qquad f(-1, 1) = (-1\ 1)\begin{pmatrix}2 & -3\\ -1 & 4\end{pmatrix} = (-3, 7),$$
$$f(-1, -1) = (-1, -1) = -f(1, 1),\qquad f(1, -1) = (3, -7) = -f(-1, 1).$$
Connect the image points (1, 1), (−3, 7), (−1, −1) and (3, −7) by consecu-
tive line segments and the resulting parallelogram is the required one. See
Fig. 2.72. Try to determine this parallelogram by using the diagonal canon-
ical decomposition of A. By the way, the original square has area equal to
4 units. Do you know what the area of the resulting parallelogram is? It is
20 units. Why? □

In general, a linear operator on R2 may not be one-to-one, i.e. not an
isomorphism.
Example 2 Using N = {e1, e2}, the Cartesian coordinate system, let the
linear operator f: R2 → R2 be defined as
$$f(x_1, x_2) = (2x_1 - 3x_2,\ -4x_1 + 6x_2) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}2 & -4\\ -3 & 6\end{pmatrix}.$$
Try to investigate its geometric mapping properties.

Solution f is obviously linear. Now, the kernel of f is

Ker(f) = {(x1, x2) | 2x1 − 3x2 = 0} = ⟨⟨(3, 2)⟩⟩,

which is a straight line passing through the origin. Hence f is not one-to-one
and thus is not onto.
What is the range of f? Let y = (y1, y2) = xA; then

y1 = 2x1 − 3x2, y2 = −4x1 + 6x2 ⇒ 2y1 + y2 = 0.

Therefore,

Im(f) = {y = (y1, y2) ∈ R2 | 2y1 + y2 = 0} = ⟨⟨(1, −2)⟩⟩.
Deposit the x1x2-plane and the y1y2-plane on the same plane.
The characteristic polynomial of A is
$$\det(A - tI_2) = \begin{vmatrix}2-t & -4\\ -3 & 6-t\end{vmatrix} = t^2 - 8t.$$
Hence, f or A has two eigenvalues λ1 = 8 and λ2 = 0. Solve xA = 8x to
get the corresponding eigenvectors x1 = t(1, −2) for t ≠ 0. Similarly, solve
xA = 0x = 0, and the corresponding eigenvectors are x2 = t(3, 2) for t ≠ 0.
Therefore, v1 = (1, −2) and v2 = (3, 2) are linearly independent and the
eigenspaces are

⟨⟨v1⟩⟩ = {x ∈ R2 | xA = 8x} = Im(f),
⟨⟨v2⟩⟩ = {x ∈ R2 | xA = 0} = Ker(f),

which are invariant subspaces of f. Also,

R2 = Im(f) ⊕ Ker(f).
What follows can be easily handled by using classical high school alge-
bra, but we prefer to adopt the vector method, because we are studying
how to get into the realm of linear algebra.
How do we find the images of lines parallel to the kernel ⟨⟨v2⟩⟩? Let
x = x0 + tv2 be such a line and suppose it intersects the line ⟨⟨v1⟩⟩ at the
point t0v1. Then

f(x) = f(x0 + tv2) = f(x0) + tf(v2) = f(x0) = f(t0v1) = t0 f(v1) = 8t0v1.

This means that f maps the whole line x = x0 + tv2 into the single point
8t0v1, which lies on the line ⟨⟨v1⟩⟩, 7 times the length |t0v1| beyond the
point of intersection t0v1, in the direction away from the origin. See Fig. 2.74.

[Fig. 2.74: a line x = x0 + tv2 parallel to the kernel is mapped to the single point 8t0v1 on ⟨⟨v1⟩⟩.]

What is the image of a line not parallel to the kernel ⟨⟨v2⟩⟩? Let
x = x0 + tu be such a line, where u and v2 are linearly independent.
Suppose u = u1 + u2 with u1 ∈ Im(f) and u2 ∈ Ker(f). Then

f(x) = f(x0) + tf(u)
     = f(x0) + tf(u1 + u2) = f(x0) + t(f(u1) + f(u2))
     = f(x0) + tf(u1) = f(x0) + 8tu1,

where f(x0) = 8t0v1 for some t0. Since u1 ≠ 0, the image is a line and
coincides with the range line ⟨⟨v1⟩⟩. Also, f maps the line x = x0 + tu
one-to-one and onto the line ⟨⟨v1⟩⟩ and preserves ratios of signed lengths
of segments along the line. See Fig. 2.75.


Now, we have a clearer picture about the mapping properties of f . For
any x = (x1 , x2 ), then
   
1 3 1 1

x= x1 − x2 v1 +
x1 + x2  v2
4 8 4 8
   
1 3 1 3
⇒ f(x) =

x1 − x2 f ( v1 ) =

x1 − x2 · 8 v1 = (2x1 − 3x2 )
v1 .
4 8 4 8
[Fig. 2.75: a line x = x0 + tu not parallel to the kernel, with u = u1 + u2, is mapped one-to-one onto the range line ⟨⟨v1⟩⟩.]

Define the mapping p: R2 → R2 by
$$p(\vec{x}) = \Big(\frac{1}{4}x_1 - \frac{3}{8}x_2\Big)\vec{v}_1.$$
p is linear and has the same kernel and range as f does. But p keeps every
x in ⟨⟨v1⟩⟩ fixed, i.e. p² = p ∘ p = p holds. This p is called the projection of
R2 onto ⟨⟨v1⟩⟩ along the kernel ⟨⟨v2⟩⟩. Therefore,

x → p(x) → 8p(x) = f(x).

All one needs to do is to project x onto ⟨⟨v1⟩⟩ along ⟨⟨v2⟩⟩ and then to
multiply the projected vector by the scalar 8. The resulting vector is
f(x). See Fig. 2.76, which is a special case of Fig. 2.73.

[Fig. 2.76: f(x) = 8p(x), where p(x) is the projection of x onto ⟨⟨v1⟩⟩ along ⟨⟨v2⟩⟩.]

We can use (2.7.72) to reinterpret the content of the above paragraph.
To set up the canonical decomposition of f, let B = {v1, v2}, which is a
basis for R2. Then

f(v1) = 8v1 = 8 · v1 + 0 · v2,
f(v2) = 0 = 0 · v1 + 0 · v2

$$\Rightarrow\ [f]_B = PAP^{-1} = \begin{pmatrix}8 & 0\\ 0 & 0\end{pmatrix},\quad\text{where } P = \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \begin{pmatrix}1 & -2\\ 3 & 2\end{pmatrix}.$$
Let
$$A_1 = P^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}P = \frac{1}{8}\begin{pmatrix}2 & 2\\ -3 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}1 & -2\\ 3 & 2\end{pmatrix} = \frac{1}{8}\begin{pmatrix}2 & -4\\ -3 & 6\end{pmatrix} = \begin{pmatrix}\frac{1}{4} & -\frac{1}{2}\\ -\frac{3}{8} & \frac{3}{4}\end{pmatrix} = \frac{1}{8}A,$$
as we might expect beforehand, i.e. A = 8A1. Note that

p(x) = xA1,
$$[p]_N = A_1\quad\text{and}\quad [p]_B = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}.\qquad\Box$$
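Since A here is singular, the claims are worth a numerical check. In the short NumPy sketch below (our own illustration; the variable names follow the text), A1 = (1/8)A is the matrix of the projection p and f = 8p:

```python
import numpy as np

A  = np.array([[2.0, -4.0], [-3.0, 6.0]])
P  = np.array([[1.0, -2.0], [3.0, 2.0]])       # rows: v1 = (1,-2), v2 = (3,2)
Pi = np.linalg.inv(P)

A1 = Pi @ np.diag([1.0, 0.0]) @ P              # [p]_N
assert np.allclose(A1, A / 8)                  # A1 = (1/8) A, i.e. A = 8 A1
assert np.allclose(A1 @ A1, A1)                # p o p = p
assert np.allclose(np.array([3.0, 2.0]) @ A, 0)                          # v2 spans Ker(f)
assert np.allclose(np.array([1.0, -2.0]) @ A, 8 * np.array([1.0, -2.0])) # v1 A = 8 v1
```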
For the sake of later reference, we generalize (2.7.72) to linear operators
on Fn or n × n matrices over a field F as follows.
Diagonalization of linear operators or square matrices
Let f be a linear operator on a finite-dimensional vector space V (say, in
the form f(x) = xA, where x ∈ Fn and An×n is a matrix over F).

(1) Diagonalizability
1. f is diagonalizable, i.e. there exists a basis B for V such that [f]B
   is a diagonal matrix.
⇔ 2. There exists a basis B = {x1, . . . , xn} for V and some scalars
   λ1, . . . , λn such that f(xi) = λi xi for 1 ≤ i ≤ n. Under these
   circumstances,
   $$[f]_B = \begin{pmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{pmatrix}.$$
⇔ 3. In case the characteristic polynomial of f is

   det(f − t1V) = (−1)^n (t − λ1)^{r1} · · · (t − λk)^{rk},

   where n = dim V and λ1, . . . , λk are distinct with ri ≥ 1 for
   1 ≤ i ≤ k and r1 + · · · + rk = n, the algebraic multiplicity ri
   of λi is equal to the dimension dim Eλi, the geometric multiplicity,
   of the eigenspace

   Eλi = {x ∈ V | f(x) = λi x}

   corresponding to the eigenvalue λi, for 1 ≤ i ≤ k. Nonzero vectors
   in Eλi are called eigenvectors corresponding to λi.
⇔ 4. Let λ1, . . . , λk be as in 3. ϕ(t) = (t − λ1) · · · (t − λk) annihilates
   A, i.e.

   ϕ(A) = (A − λ1In) · · · (A − λkIn) = On×n

   (see Ex. <C> 9(e)). This ϕ(t) is called the minimal polynomial of A.
⇔ 5. V = Eλ1 ⊕ · · · ⊕ Eλk, where Eλ1, . . . , Eλk are as in 3.
⇐ 6. f has n distinct eigenvalues λ1, . . . , λn, where n = dim V.
See Ex. <C> 4(e) for another diagonalizability criterion.

(2) Diagonal canonical form or decomposition
For simplicity, via a linear isomorphism, we may suppose that V = Fn
and f(x) = xA, where A is an n × n matrix. Adopt the notations in (1)3.
Let B = {x1, . . . , xn} be a basis for Fn consisting of eigenvectors of A
such that
$$PAP^{-1} = \begin{pmatrix}\lambda_1I_{r_1} & & \\ & \ddots & \\ & & \lambda_kI_{r_k}\end{pmatrix},\quad P = \begin{pmatrix}\vec{x}_1\\ \vdots\\ \vec{x}_n\end{pmatrix}.$$
Define the following matrices or linear operators (the identity block sitting
in the ith diagonal position):
$$A_i = P^{-1}\begin{pmatrix}O & & \\ & I_{r_i} & \\ & & O\end{pmatrix}P,\quad 1 \le i \le k.$$
Then,
1. Fn = Eλ1 ⊕ · · · ⊕ Eλk.
2. Each Ai: Fn → Fn is a projection of Fn onto Eλi along Eλ1 ⊕ · · · ⊕
   Eλi−1 ⊕ Eλi+1 ⊕ · · · ⊕ Eλk, i.e.
   Ai² = Ai, 1 ≤ i ≤ k.
3. AiAj = On×n if i ≠ j, 1 ≤ i, j ≤ k.
4. In = A1 + · · · + Ak.
5. A = λ1A1 + · · · + λkAk. (2.7.73)

For more details, refer to Secs. B.11 and B.12. Also, note that, if
dim(Ker(f )) ≥ 1, then each nonzero vector in Ker(f ) is an eigenvector
corresponding to the eigenvalue 0.
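Criterion 3 above (equality of algebraic and geometric multiplicities) is also the easiest one to test numerically. The following NumPy sketch is our own illustration only: the function name is_diagonalizable and the rounding tolerance are our choices, and floating-point arithmetic stands in for the exact arithmetic the theorem assumes.

```python
import numpy as np

def is_diagonalizable(A, tol=1e-9):
    """Check criterion (1)3 of (2.7.73): sum of geometric multiplicities = n."""
    n = A.shape[0]
    eigvals = np.linalg.eigvals(A)
    total = 0
    for lam in np.unique(np.round(eigvals, 9)):
        # geometric multiplicity = n - rank(A - lam I)
        total += n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
    return total == n

print(is_diagonalizable(np.array([[2.0, -3.0], [-1.0, 4.0]])))  # True (Example 1)
print(is_diagonalizable(np.array([[1.0, -2.0], [0.0, 1.0]])))   # False (a Jordan block)
```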

Exercises
<A>
1. In Example 1, does linear isomorphism f have other invariant lines
besides x1 − x2 = 0 and 3x1 + x2 = 0? Find invariant lines of the affine
transformation
f(x) = x0 + xA, where x0 ≠ 0, if any.
2. In Example 2, find all possible one-dimensional subspaces S of R2
such that
R2 = S ⊕ Ker(f ).
For each x ∈ R2, let

x + Ker(f) = {x + v | v ∈ Ker(f)}

be the image of Ker(f) under the translation v → x + v, which is the
line x + ⟨⟨v2⟩⟩ parallel to the line ⟨⟨v2⟩⟩. Show that

x1 + Ker(f) = x2 + Ker(f) ⇔ x1 − x2 ∈ Ker(f).
Denote the quotient set
R2 /Ker(f ) = {
x + Ker(f ) | 
x ∈ R2 }
and introduce two operations on it as follows:
(1) α(x + Ker(f )) = α x + Ker(f ), α ∈ R,
(2) ( x1 + Ker(f )) + ( x2 + Ker(f )) = (
 
x1 + 
x2 ) + Ker(f ),

which are well-defined (refer to Sec. A.1). Then, show that R2 /Ker(f )
is a vector space isomorphic to S mentioned above (see Sec. B.1).
R2 /Ker(f ) is called the quotient space of R2 modulus Ker(f ). What
is R2 /S?
3. In N = {e1, e2}, define f: R2 → R2 by
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } A = \begin{pmatrix}5 & 1\\ -7 & -3\end{pmatrix}\ \text{and}\ \vec{x} = (x_1, x_2).$$
(a) Model after the purely algebraic method in Example 1 to answer
the following questions about f :
(1) f is one-to-one and onto.
(2) f maps straight lines into straight lines and preserves their
relative positions.
(3) f preserves ratios of signed lengths of line segments along the
same or parallel lines.
(4) f maps triangles into triangles with vertices, sides and interior
to vertices, sides and interior, respectively.
(5) f maps parallelograms into parallelograms.
(6) Does f preserve orientations (counterclockwise or clockwise)?
Why?
(7) Does f preserve areas of triangles? Why?
(8) How many invariant lines does f have? If any, find them.

Now, for any fixed x0 ∈ R2 with x0 ≠ 0, answer the same questions
(1)–(8) for the affine transformation

T(x) = x0 + f(x).
(b) Model after the linearly algebraic method in Example 1 to answer
the same questions (1)–(8) as in (a). Notice the following process:
(1) Compute the characteristic polynomial det(A − tI2).
(2) Solve det(A − tI2) = 0 to determine the eigenvalues λ1 and λ2.
(3) Solve the equation xA = λi x, or x(A − λiI2) = 0, to determine
    the corresponding eigenvectors xi ≠ 0, i = 1, 2.
(4) Make sure that x1 and x2 are linearly independent.
(5) Let B = {x1, x2}, a basis for R2. Then set up
    $$[f]_B = PAP^{-1} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix},\quad\text{where } P = \begin{pmatrix}\vec{x}_1\\ \vec{x}_2\end{pmatrix}.$$
(6) Justify (2) in (2.7.72) for this f or A.
Then, try to use this information to do the problem.

(c) Compare the advantages of both methods with each other!

4. In N = {e1, e2}, let f: R2 → R2 be defined by
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } A = \begin{pmatrix}6 & 2\\ 2 & 9\end{pmatrix}.$$
   Do the same problems as in (a) of Ex. 3 via the linearly algebraic
   method. The symmetry of the entries of A implies that the eigenvectors
   x1 and x2 are perpendicular to each other and can be chosen to be of
   unit length. In this case, say something about the geometric mapping
   properties of
   $$P = \begin{pmatrix}\vec{x}_1\\ \vec{x}_2\end{pmatrix}.$$

5. Do the same problems as in Ex. 4 for
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}2 & 3\\ 3 & -2\end{pmatrix}.$$

6. In N = {e1, e2}, let f: R2 → R2 be defined by
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}3 & -2\\ -6 & 4\end{pmatrix}.$$

(a) Use this f to justify (2.7.72), model after Example 2.


(b) Do the same problems as in (a) of Ex. 3. Be careful that some of
these problems may not be true or have to be readjusted properly.

7. A square matrix of order 2 with coincident eigenvalues is diagonalizable


if and only if the matrix is a scalar matrix.
8. A2×2 is diagonalizable ⇔ A∗ is diagonalizable.
9. If A2×2 is invertible, then A is diagonalizable ⇔ A−1 is diagonalizable.
10. Constructive linear algebra
Fix the Cartesian coordinate system N = {e1, e2} on the
plane R2. Where is the square with vertices 0 = (0, 0), e1 = (1, 0),
e1 + e2 = (1, 1) and e2 = (0, 1) in the coordinate system B = {v1, v2}
for v1 = (−1/2, −1) and v2 = (−1, 2)? This means that we have to find
[0]B, [e1]B, [e1 + e2]B and [e2]B and reconsider these as points in
the original system N. Let

e1 = a11v1 + a12v2,
e2 = a21v1 + a22v2

$$\Rightarrow \begin{pmatrix}\vec{e}_1\\ \vec{e}_2\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix},\quad\text{where } \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \begin{pmatrix}-\frac{1}{2} & -1\\ -1 & 2\end{pmatrix}$$
$$\Rightarrow \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} = \begin{pmatrix}\vec{e}_1\\ \vec{e}_2\end{pmatrix}\begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix}^{-1} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\cdot\frac{1}{-2}\begin{pmatrix}2 & 1\\ 1 & -\frac{1}{2}\end{pmatrix} = \begin{pmatrix}-1 & -\frac{1}{2}\\ -\frac{1}{2} & \frac{1}{4}\end{pmatrix}.$$
Hence [e1]B = (−1, −1/2), [e2]B = (−1/2, 1/4) and [e1 + e2]B = (−3/2, −1/4).
See Fig. 2.77.
[Fig. 2.77: the unit square and the basis vectors v1, v2, with [e1]B = (−1, −1/2), [e2]B = (−1/2, 1/4) and [e1 + e2]B = (−3/2, −1/4).]

Conversely, given a parallelogram with vertices (0, 0), (1, 2), (−2, 1)
and (−3, −1), under what coordinate system B = {v1, v2} does the
square look like the given parallelogram? This means that we have to
find v1 and v2 so that

[e1]B = (1, 2), [e2]B = (−3, −1) and [e1 + e2]B = (−2, 1).

Now

e1 = 1 · v1 + 2 · v2,
e2 = (−3) · v1 + (−1) · v2

$$\Rightarrow \begin{pmatrix}\vec{e}_1\\ \vec{e}_2\end{pmatrix} = \begin{pmatrix}1 & 2\\ -3 & -1\end{pmatrix}\begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix}$$
$$\Rightarrow \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \begin{pmatrix}1 & 2\\ -3 & -1\end{pmatrix}^{-1}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-1 & -2\\ 3 & 1\end{pmatrix}.$$
Thus v1 = (−1/5, −2/5) and v2 = (3/5, 1/5). See Fig. 2.78.

[Fig. 2.78: the parallelogram with vertices (0, 0), (1, 2), (−2, 1), (−3, −1) and the recovered basis vectors v1, v2.]
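Both directions of this computation are one-line matrix inversions in practice. Here is a small NumPy sketch of ours; the row-vector convention [x]B = xP^{-1}, with the basis vectors as rows of P, is the one used above:

```python
import numpy as np

# Forward direction: coordinates of e1, e2, e1+e2 in B = {v1, v2}.
P  = np.array([[-0.5, -1.0], [-1.0, 2.0]])    # rows: v1, v2
Pi = np.linalg.inv(P)
for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])):
    print(x, '->', x @ Pi)     # (-1,-1/2), (-1/2,1/4), (-3/2,-1/4)

# Converse: prescribe [e1]_B = (1,2), [e2]_B = (-3,-1) and recover the
# basis from I2 = Q [v1; v2], i.e. [v1; v2] = Q^{-1}.
Q = np.array([[1.0, 2.0], [-3.0, -1.0]])
print(np.linalg.inv(Q))        # rows: v1 = (-1/5,-2/5), v2 = (3/5,1/5)
```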

Do the following problems. Note that R2 is always endowed with
N = {e1, e2}.

(a) Given a parallelogram with vertices 0, a1, a1 + a2 and a2, where a1
    and a2 are linearly independent, what would this parallelogram
    look like in a coordinate system B = {v1, v2}?
(b) Conversely, find a coordinate system B = {v1, v2} in which the
    parallelogram is the one with vertices 0, b1, b1 + b2 and b2, where
    b1 and b2 are linearly independent.
11. Constructive linear algebra
Adopt the Cartesian coordinate system N = {e1, e2} on R2. Let
a1 = (2, 1), a2 = (−4, −2) and b1 = (4, 6), b2 = (−2, −3). See
Fig. 2.79. Try to construct a linear operator, in N, to map ∆0a1b1
[Fig. 2.79: the vectors a1, a2, b1, b2 in the plane.]
onto ∆0a2b2. In case a1 and b1 correspond to a2 and b2 respec-
tively, let the linear operator f be defined as

f(a1) = a2 = −2a1,
f(b1) = b2 = −(1/2)b1

$$\Rightarrow \begin{pmatrix}\vec{a}_1\\ \vec{b}_1\end{pmatrix}[f]_N = \begin{pmatrix}-2 & 0\\ 0 & -\frac{1}{2}\end{pmatrix}\begin{pmatrix}\vec{a}_1\\ \vec{b}_1\end{pmatrix}$$
$$\Rightarrow f(\vec{x}) = [\vec{x}]_N[f]_N = \vec{x}\begin{pmatrix}2 & 1\\ 4 & 6\end{pmatrix}^{-1}\begin{pmatrix}-2 & 0\\ 0 & -\frac{1}{2}\end{pmatrix}\begin{pmatrix}2 & 1\\ 4 & 6\end{pmatrix} = \vec{x}\begin{pmatrix}-\frac{11}{4} & -\frac{9}{8}\\ \frac{3}{2} & \frac{1}{4}\end{pmatrix}.$$
This f is the required one. In case a1 and b1 correspond to b2
and a2 respectively, define the linear operator g as

g(a2) = b2,
g(b2) = a2

$$\Rightarrow \begin{pmatrix}\vec{a}_2\\ \vec{b}_2\end{pmatrix}[g]_N = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}\vec{a}_2\\ \vec{b}_2\end{pmatrix}$$
$$\Rightarrow g(\vec{x}) = [\vec{x}]_N[g]_N = \vec{x}\begin{pmatrix}-4 & -2\\ -2 & -3\end{pmatrix}^{-1}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}-4 & -2\\ -2 & -3\end{pmatrix} = \vec{x}\begin{pmatrix}-\frac{1}{4} & \frac{5}{8}\\ \frac{3}{2} & \frac{1}{4}\end{pmatrix}.$$
Then the composite map
$$(g \circ f)(\vec{x}) = \vec{x}\cdot\frac{1}{64}\begin{pmatrix}-22 & -9\\ 12 & 2\end{pmatrix}\begin{pmatrix}-2 & 5\\ 12 & 2\end{pmatrix} = \vec{x}\begin{pmatrix}-1 & -2\\ 0 & 1\end{pmatrix}$$
is the required one. A direct computation of g ∘ f is easier and is shown
as follows:

(g ∘ f)(a1) = b2,
(g ∘ f)(b1) = a2

$$\Rightarrow (g \circ f)(\vec{x}) = [\vec{x}]_N[g \circ f]_N = \vec{x}\begin{pmatrix}\vec{a}_1\\ \vec{b}_1\end{pmatrix}^{-1}\begin{pmatrix}\vec{b}_2\\ \vec{a}_2\end{pmatrix} = \vec{x}\begin{pmatrix}2 & 1\\ 4 & 6\end{pmatrix}^{-1}\begin{pmatrix}-2 & -3\\ -4 & -2\end{pmatrix} = \vec{x}\begin{pmatrix}-1 & -2\\ 0 & 1\end{pmatrix}.$$
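The same recipe, [T]N = M^{-1}(matrix of images) with the prescribed vectors as the rows of M, is mechanical enough to script. The NumPy sketch below is our own illustration; the names f and g mirror the operators above:

```python
import numpy as np

a1, b1 = np.array([2.0, 1.0]), np.array([4.0, 6.0])
a2, b2 = np.array([-4.0, -2.0]), np.array([-2.0, -3.0])

M = np.vstack([a1, b1])
f = np.linalg.inv(M) @ np.vstack([a2, b2])     # f(a1)=a2, f(b1)=b2
N = np.vstack([a2, b2])
g = np.linalg.inv(N) @ np.vstack([b2, a2])     # g(a2)=b2, g(b2)=a2

print(f)        # [[-11/4, -9/8], [3/2, 1/4]]
print(g)        # [[-1/4,  5/8], [3/2, 1/4]]
print(f @ g)    # [[-1, -2], [0, 1]]  -- the matrix of g o f on row vectors
assert np.allclose(a1 @ (f @ g), b2) and np.allclose(b1 @ (f @ g), a2)
```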

Do the following problems.


(a) Show that
    $$\begin{pmatrix}-1 & -2\\ 0 & 1\end{pmatrix} = \begin{pmatrix}1 & -1\\ 0 & 1\end{pmatrix}\begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}.$$
    Consider B = {(1, 1), (0, 1)} as a basis for R2. Try to construct
    graphically, in B, the image ∆0a2b2 of ∆0a1b1.
   
(b) Find all possible linear operators that map ∆0a2b1 onto ∆0a1b2.
(c) Find all possible linear operators that map ∆a1a2b1 onto ∆a1a2b2.
(d) Find all possible linear operators that map ∆a1a2b1 onto ∆b1b2a1.
   
(e) Show that the linear operator mapping ∆ 0  a1 b1 onto ∆ 0  a2 b1 but
 
keeping the side 0 b1 fixed is
 −1      −1    
a1 −2 0  a1 a1 a2 1 −14 −9
  =   = .
b1 0 1 b1 b1 b1 4 12 10

a1 , b1 },
Denote the above matrix by A. Hence, in C = {
 
−2 0
[A]C = .
0 1
Try to explain geometrically by graphs the following two sequences
of linear operators
     
−1 0 −1 0 1 0
A, A , A , A and
0 1 0 −1 0 −1
     
−1 0 −1 0 1 0
[A]C , [A]C , [A]C , [A]C
0 1 0 −1 0 −1
in N and in C respectively. How about
     
0 −1 0 −1 0 1
A, A , A , A and
1 0 −1 0 −1 0
     
0 −1 0 −1 0 1
[A]C , [A]C , [A]C , [A]C ?
1 0 −1 0 −1 0
12. Constructive linear algebra
Adopt the Cartesian coordinate system N = {e1, e2} on R2. Try
to map the square with vertices 0, e1, e1 + e2 and e2 onto a non-
degenerated segment along the line x1 + x2 = 0, where x = (x1, x2).
There are various ways to achieve this. The easiest way might be that
each of the family of lines parallel to the x1-axis maps its points or
[Fig. 2.80: the unit square is projected along the x1-axis onto a segment of the line x1 + x2 = 0 with endpoints 0 and −e1 + e2.]

points of intersection with the square into its points of intersection with
the line x1 + x2 = 0. See Fig. 2.80. In terms of linear algebra, this is the
projection of R2 along the x1-axis and, in turn, eventually becomes an
easy problem concerning eigenvectors. Define f: R2 → R2 by

f(e1) = 0,
f(e2) = −e1 + e2.

Then
$$f(\vec{x}) = \vec{x}\begin{pmatrix}0 & 0\\ -1 & 1\end{pmatrix}$$
is a required one. See Fig. 2.80. Note that f(−e1 + e2) = −e1 + e2 and
the image segment is covered twice by the square under f. Actually,
take any vector v which is linearly independent from −e1 + e2. Then
the linear operator of R2 onto the line x1 + x2 = 0, defined as

f(v) = 0,
f(−e1 + e2) = λ(−e1 + e2) for a scalar λ ≠ 0,

will serve our purpose. See Fig. 2.81.

[Fig. 2.81: a projection onto the line x1 + x2 = 0 along any direction v linearly independent from −e1 + e2.]
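As a quick check, the matrix of the first projection above is idempotent and collapses the square onto the stated segment. A minimal NumPy sketch of ours:

```python
import numpy as np

# f(e1) = 0 and f(e2) = -e1 + e2, so the rows of [f]_N are the images.
F = np.array([[0.0, 0.0], [-1.0, 1.0]])

assert np.allclose(F @ F, F)          # a projection: f o f = f
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(square @ F)                     # [[0,0],[0,0],[-1,1],[-1,1]]
# All images lie on x1 + x2 = 0: the square collapses onto a segment,
# covered twice, as noted in the text.
```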

Do the following problems.

(a) Does there exist any projection on R2 mapping the square with
    vertices ±e1 ± e2 onto the respective segment
    (1) BC with OB = OC (lengths),
    (2) BD with OB = OD,
    (3) AE with OA = OE, or
    (4) OA on the line x1 = x2
    (see Fig. 2.82)? If yes, try to find all such projections; if no, give
    the exact reason why and try to find some other kind of mappings
    that will do.

[Fig. 2.82: the square with vertices ±e1 ± e2 and the labeled points A, B, C, D, E, O referred to in (1)–(4).]

(b) Via a linear operator followed by a translation, it is possible to map
    a triangle onto a non-degenerated line segment. Try to do this in
    all possible different ways.

<B>

1. Prove (2.7.72).
2. Let
  
1 2 3 −8
A= and B = .
0 2 0 −1

(a) Show that AB = BA.


(b) Show that both A and B are diagonalizable.
(c) Let λ be any eigenvalue of A and  v be any corresponding
eigenvector. Show that 
v B is also an eigenvector of A.

(d) Use (c) to show that there exists a basis B = { v2 } for R2 such
v1 , 
that both [A]B and [B]B are diagonal matrices. In this case, A and
B are called simultaneously diagonalizable.
3. There exist matrices A2×2 and B2×2 such that A is diagonalizable and
   AB = BA holds, but B is not diagonalizable. For example,
   $$A = I_2\quad\text{and}\quad B = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}.$$
   Try to find some other examples, if any.
4. Suppose A and B are similar 2 × 2 real matrices. Show that there exist
bases B and C for R2 and a linear operator f on R2 such that
[f ]B = A and [f ]C = B
(try to refer to (2.7.25)).
(Note This result still holds for n × n matrices over a field.)
5. Let f be a nonzero linear operator on R2 (or any two-dimensional vector
   space). For any nonzero vector x in R2, the subspace

   ⟨⟨x, f(x), f²(x), . . .⟩⟩, denoted as Cf(x),

   is called the f-cycle subspace of R2 generated by x.
   (a) Show that Cf(x) is f-invariant, i.e. f(y) ∈ Cf(x) for each
       y ∈ Cf(x).
   (b) Show that either R2 = Cf(x) for some x ∈ R2 or f = λ1_{R²} for some
       scalar λ.
   (c) In case f ≠ λ1_{R²} for any scalar λ, let g be a linear operator on R2
       such that g ∘ f = f ∘ g. Then, there exists a polynomial p(t) such
       that g = p(f). Refer to Ex. <C> 6.

<C> Abstraction and generalization


1. Prove (2.7.73).
2. Suppose An×n is diagonalizable and PAP −1 is as in (2.7.73). Show that
(A − λ1 In )(A − λ2 In ) · · · (A − λk In ) = O.
And hence, A satisfies its minimal polynomial
ϕ(t) = (t − λ1 )(t − λ2 ) · · · (t − λk )
and its characteristic polynomial det(A − tIn ), which is the Cayley–
Hamilton theorem (see (2.7.13), (2.7.19), Exs. 5 and 9 below and Ex. 4
of Sec. B.10).

3. Let f : M(n; F) → M(n; F) be defined by


f (A) = A∗ (the transpose of A).
(a) Show that f is a linear operator on M(n; F).
(b) Let B = {E11 , E12 , . . . , E1n , E21 , . . . , En1 , . . . , Enn } be the ordered
basis for M(n; F) (refer to (2.7.30) and Sec. B.4). Find [f ]B .
(c) Show that ±1 are the only eigenvalues of f .
(d) Determine all eigenvectors of f corresponding to ±1.
(e) Is f diagonalizable? If yes, prove it; if not, say reason why.
4. Let f be a linear operator on a vector space V over a field F. Recall
   that a subspace S of V is called f-invariant if f(S) ⊆ S holds.
(a) Examples for f -invariant subspaces are:

(1) { 0 }, V, Ker(f ) and Im(f ).
       (2) For any nonzero x ∈ V, the f-cycle subspace generated by x:
           Cf(x) = ⟨⟨x, f(x), f²(x), . . .⟩⟩.
       (3) For any λ ∈ F, the subspace

           Gλ(f) = {x ∈ V | (f − λIV)^k(x) = 0 for some positive integer k}.

           In particular, if λ is an eigenvalue of f, Gλ(f) is called the
           generalized eigenspace of f corresponding to λ, which contains
           the eigenspace Eλ(f) = {x ∈ V | f(x) = λx} as a subspace.
(b) Properties
(1) Finite sum of f -invariant subspaces is f -invariant.
(2) Arbitrary intersection of f -invariant subspaces is
f -invariant.
(3) If S is f -invariant and f is invertible, then S is also
f −1 -invariant.
(4) If S is f -invariant and V = S ⊕ T , then T is not necessarily
f -invariant.
(5) If S is f -invariant, then the restriction f |S : S → S is a linear
operator.
(6) If S is f -invariant and  x is an eigenvector of f |S associated
with eigenvalue λ, then the same is true for f .
       (7) Suppose V = Im(f) ⊕ S and S is f-invariant. Then S ⊆ Ker(f),
           with equality if dim(V) < ∞, but S ⫋ Ker(f) could happen if
           V is not finite-dimensional.

   (c) Matrix representation
       (1) A subspace S of V, where dim V = n, is f-invariant if and only
           if there exists a basis B = {x1, . . . , xk, xk+1, . . . , xn} for V such
           that B1 = {x1, . . . , xk} is a basis for S and
           $$[f]_B = \begin{pmatrix}A_{11} & O\\ A_{21} & A_{22}\end{pmatrix}_{n\times n},$$
           where the restriction f|S has matrix representation A11 with
           respect to B1.
       (2) In particular, V = S1 ⊕ S2, where S1 and S2 are
           f-invariant subspaces, if and only if there exists a basis B =
           {x1, . . . , xk, xk+1, . . . , xn} for V such that
           $$[f]_B = \begin{pmatrix}A_{11} & O\\ O & A_{22}\end{pmatrix}_{n\times n},$$
           where {x1, . . . , xk} = B1 is a basis for S1 with [f|S1]B1 = A11
           and {xk+1, . . . , xn} = B2 is a basis for S2 with [f|S2]B2 = A22.

       In Case (2), f = f|S1 ⊕ f|S2 is called the direct sum of f|S1 and f|S2.
       Another interpretation of A22 is as follows. Suppose S is an f-invariant
       subspace of V. Consider the quotient space of V modulus S (see
       Sec. B.1)

       V/S = {x + S | x ∈ V}

       and the induced quotient operator f̃: V/S → V/S defined by

       f̃(x + S) = f(x) + S.

       This f̃ is well-defined and is linear. Let π: V → V/S denote the natural
       projection defined by π(x) = x + S. Then the following diagram is
       commutative: π ∘ f = f̃ ∘ π.
       $$\begin{array}{ccc} V & \xrightarrow{\ f\ } & V\\ \pi\downarrow & & \downarrow\pi\\ V/S & \xrightarrow{\ \tilde f\ } & V/S \end{array}$$
       (3) Suppose dim V = n. Adopt the notations in (1). Then B̃2 =
           {xk+1 + S, . . . , xn + S} is a basis for V/S and
           $$[f]_B = \begin{pmatrix}A_{11} & O\\ A_{21} & A_{22}\end{pmatrix},$$
           where A11 = [f|S]B1 and A22 = [f̃]_{B̃2}.

   (d) Characteristic polynomials of f, f|S, and f̃
       Suppose dim V = n and S is an f-invariant subspace of V.
       (1) The characteristic polynomial of f|S divides that of f, and so
           does that of f̃. Therefore, if the characteristic polynomial
           det(f − t1V) = (−1)^n (t − λ1)^{r1} · · · (t − λk)^{rk}, i.e. splits into
           linear factors, then so do the characteristic polynomials of f|S
           and f̃. This implies that any nonzero f-invariant subspace of V
           contains an eigenvector of f if det(f − t1V) splits.
       (2) Adopt the notations in (c)(2). Then

           det(f − t1V) = det(f|S1 − t1S1) · det(f|S2 − t1S2).

       (3) Adopt the notation in (c)(3). Then

           det(f − t1V) = det(f|S − t1S) · det(f̃ − t1V/S).

       Both (2) and (3) provide other ways to compute the characteristic
       polynomial of f.
(e) Diagonalizability of f, f |S , and f˜
Suppose dim V = n and S is a f -invariant subspace of V .
(1) If f is diagonalizable, then so is f |S .
(2) If f is diagonalizable, then so is f˜.
(3) If both f |S and f˜ are diagonalizable, then so is f .
5. Suppose f is a linear operator on a finite-dimensional vector space V .

   (a) If Cf(x) is the f-cycle subspace generated by x ≠ 0 and
       dim Cf(x) = k, then
       (1) B = {x, f(x), f²(x), . . . , f^{k−1}(x)} is a basis for Cf(x).
           Hence there exist unique scalars a0, a1, . . . , ak−1 such that

           f^k(x) = −a0x − a1f(x) − · · · − ak−1f^{k−1}(x).

       (2) Therefore,
           $$\big[f|_{C_f(\vec{x})}\big]_B = \begin{pmatrix}0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1\\ -a_0 & -a_1 & \cdots & -a_{k-2} & -a_{k-1}\end{pmatrix}_{k\times k}$$
           and the characteristic polynomial ϕ(t) of f|Cf(x) is
           $$\det\big(f|_{C_f(\vec{x})} - t1_{C_f(\vec{x})}\big) = (-1)^k(t^k + a_{k-1}t^{k-1} + \cdots + a_1t + a_0),$$
           with t^k + ak−1t^{k−1} + · · · + a1t + a0 as its minimal polynomial
           (see Ex. 9 below).
       (3) Also, ϕ(t) annihilates f|Cf(x), i.e.

           f^k + ak−1f^{k−1} + · · · + a1f + a0 1_{Cf(x)} = 0

           on Cf(x).
   (b) Use Ex. 4(d)(1) and (a) to prove the Cayley–Hamilton theorem, i.e.
       the characteristic polynomial of f annihilates f itself.
6. Let f be a linear operator on a vector space V with dim V < ∞.
   (a) For y ∈ V, y ∈ Cf(x) if and only if there exists a polynomial
       ϕ(t), whose degree can always be chosen to be not larger than
       dim Cf(x), such that y = ϕ(f)(x).
(b) Suppose V = Cf ( x ) for some 

x ∈ V . If g is a linear operator on V ,
then f ◦ g = g ◦ f if and only if g = ϕ(f ) for some polynomial ϕ(t).
7. (continued from Exs. <B> 2 and 3) Two linear operators f and g on
   a finite-dimensional vector space V are called simultaneously diagonal-
   izable if there exists a basis B for V such that both [f]B and [g]B are
   diagonal matrices.
   (a) f and g are simultaneously diagonalizable if and only if, for
       any basis C for V, the matrices [f]C and [g]C are simultaneously
       diagonalizable.
   (b) If f and g are simultaneously diagonalizable, then f and g
       commute, i.e. f ∘ g = g ∘ f.
   (c) Suppose f ∘ g = g ∘ f; then Ker(g) and Im(g) are f-invariant.
   (d) Suppose f ∘ g = g ∘ f and f and g are diagonalizable; then f and
       g are simultaneously diagonalizable.
8. Let An×n and Bn×n be matrices over C and the characteristic polyno-
mial of A be denoted by ϕ(t). Then
(1) A and B do not have common eigenvalues.
⇔ (2) ϕ(B) is invertible.
⇔ (3) The matrix equation AX = XB has only zero solution X = O.
⇔ (4) The linear operator fA,B : M(n; F) → M(n; F) defined by
fA,B (X) = AX − XB
is invertible.

9. A nonzero polynomial p(t) ∈ P (F) (see Sec. A.5) is said to annihilate


a matrix A ∈ M(n; F) if p(A) = On×n and is called an A-annihilator.
Therefore, the characteristic polynomial f (t) = det(A − tIn ) annihi-
lates A. Let A ∈ M(n; F) be a nonzero matrix.
(a) There exists a unique monic polynomial ϕ(t) of the least degree
that annihilates A. Such a ϕ(t) is called the minimal polynomial
of A. (Refer to Sec. B.11.)
(b) The minimal polynomial ϕ(t) divides any A-annihilator, in partic-
ular, the characteristic polynomial f (t) of A.
   (c) Similar matrices have the same characteristic and minimal polyno-
       mials, but not conversely. For example,
       $$A = \begin{pmatrix}2 & 1 & & \\ 0 & 2 & & \\ & & 2 & 1\\ & & 0 & 2\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix}2 & 1 & & \\ 0 & 2 & & \\ & & 2 & 0\\ & & 0 & 2\end{pmatrix}$$
       both have characteristic polynomial f(t) = (t − 2)^4 and minimal
       polynomial ϕ(t) = (t − 2)^2, yet they are not similar. Prove this.
(d) λ ∈ F is a zero of ϕ(t) ⇔ λ is a zero of f (t). Hence, each root of
ϕ(t) = 0 is an eigenvalue of A, and vice versa.
(e) A is diagonalizable ⇔ its minimal polynomial is

ϕ(t) = (t − λ1 ) (t − λ2 ) · · · (t − λk ),

where λ1 , . . . , λk are all distinct eigenvalues of A. Prove this by the


following steps.
       (1) Let Eλi = {x ∈ Fn | xA = λi x} for 1 ≤ i ≤ k. Then each
           Eλi ≠ {0} and

           Eλi ∩ (Eλ1 + · · · + Eλi−1 + Eλi+1 + · · · + Eλk) = {0}, 1 ≤ i ≤ k.

       (2) Let pi(t) = ∏_{j≠i}(t − λj), 1 ≤ i ≤ k. Show that there exist
           polynomials ϕ1(t), . . . , ϕk(t), which are relatively prime, such
           that

           ϕ1(t)p1(t) + · · · + ϕk(t)pk(t) = 1.

           Try to show that xϕi(A)pi(A) ∈ Eλi for each x ∈ Fn and
           1 ≤ i ≤ k.

(3) Show that Fn = Eλ1 ⊕ · · · ⊕ Eλk .


Then, use (1)4 in (2.7.73) to finish the proof (refer to Sec. B.11).
(f) A is diagonalizable ⇔ A∗ is diagonalizable.
10. Let A, B ∈ M(n; C) be nonzero matrices.
    (a) (Schur, 1909) Use the fact that A has eigenvalues and math-
        ematical induction to show that A is similar to a lower trian-
        gular matrix, i.e. there exists an invertible Pn×n, even unitary
        (i.e. P̄∗ = P^{-1}), such that
        $$PAP^{-1} = \begin{pmatrix}\lambda_1 & & & 0\\ * & \lambda_2 & & \\ \vdots & & \ddots & \\ * & \cdots & * & \lambda_n\end{pmatrix},$$
        where the diagonal entries are eigenvalues of A.
(b) Suppose AB = BA, show that A and B have a common eigenvector
by the following steps.
        (1) Let λ be an eigenvalue of A and Eλ = {x ∈ Fn | xA = λx} have
            basis {x1, . . . , xk}. Show that each xjB ∈ Eλ for 1 ≤ j ≤ k.
        (2) Let xjB = Σ_{i=1}^{k} bij xi for 1 ≤ j ≤ k and Q = [bij]k×k. Let
            y = (α1, . . . , αk) ∈ Ck be an eigenvector of Q, say yQ = µy
            for µ ∈ C.
        (3) Try to show that x0 = Σ_{j=1}^{k} αj xj is a common eigenvector of
            A and B.
(c) Suppose AB = BA. Show that there exists an invertible Pn×n such
that both PAP −1 and PBP −1 are lower triangular, i.e. A and B
are simultaneously lower triangularizable.
(d) Give examples to show that the converse of (b) is not true, in
general. Then, use
   
1 0 −1 0 4 −1 −1 0
0 1 0 −1  0 −1
A=  and B = −1 4 
1 0 −1 0   1 0 2 −1
0 1 0 −1 0 1 −1 2
to justify (b).
11. Let An×n be a complex matrix. Then
    (1) A is nilpotent (i.e. there exists some positive integer k such
        that A^k = On×n; the smallest such k is called the index of
        nilpotency of A).
    ⇔ (2) All eigenvalues of A are zeros.
    ⇔ (3) A^n = On×n.
    ⇔ (4) tr A^l = 0 for l = 1, 2, 3, . . . .
    ⇔ (5) In case A has index n of nilpotency, then A is similar to
        $$\begin{pmatrix}0 & & & \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0\end{pmatrix}_{n\times n}.$$

12. Let A ∈ M(m, n; C). Show that there exist unitary matrices Pm×m and
    Qn×n so that
    $$PAQ = \begin{pmatrix}\lambda_1 & & & \\ & \ddots & & 0\\ & & \lambda_r & \\ & 0 & & O\end{pmatrix},\quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0,$$
    where r = r(A) ≥ 1. Then, try to prove that
    $$\det(I_m - A\bar{A}^{*}) = \det(I_n - \bar{A}^{*}A).$$

13. Let An×n be a complex matrix with trace tr A = 0. Show that A is
    similar to a matrix whose diagonal entries are all equal to zero, i.e.
    there exists an invertible Pn×n such that
    $$PAP^{-1} = \begin{pmatrix}0 & & *\\ & \ddots & \\ * & & 0\end{pmatrix}_{n\times n}.$$
    Try the following steps:
    (1) Suppose A is similar to a lower triangular matrix as in (a)
        of Ex. 10,
        $$PAP^{-1} = \begin{pmatrix}\lambda_1 & & & \\ b_{21} & \lambda_2 & & \\ \vdots & & \ddots & \\ b_{n1} & b_{n2} & \cdots & \lambda_n\end{pmatrix},\quad P = \begin{pmatrix}\vec{x}_1\\ \vec{x}_2\\ \vdots\\ \vec{x}_n\end{pmatrix},$$
        where tr A = λ1 + λ2 + · · · + λn = 0. Suppose λ1 ≠ 0. Let
        v1 = x2 − (b21/λ1)x1. Then B = {v1, x2, . . . , xn} is a basis for Cn.
        Now,
        $$\vec{v}_1A = \vec{x}_2A - \frac{b_{21}}{\lambda_1}\vec{x}_1A = b_{21}\vec{x}_1 + \lambda_2\vec{x}_2 - \frac{b_{21}}{\lambda_1}\lambda_1\vec{x}_1 = \lambda_2\vec{x}_2,$$
        $$\vec{x}_2A = b_{21}\vec{x}_1 + \lambda_2\vec{x}_2 = -\lambda_1\vec{v}_1 + \lambda_1\vec{x}_2 + \lambda_2\vec{x}_2 = -\lambda_1\vec{v}_1 + (\lambda_1 + \lambda_2)\vec{x}_2$$
        $$\Rightarrow QAQ^{-1} = \begin{pmatrix}0 & \lambda_2 & & \\ -\lambda_1 & \lambda_1+\lambda_2 & & \\ b_{31} & b_{32} & \lambda_3 & \\ \vdots & \vdots & \ddots & \ddots\\ b_{n1} & b_{n2} & \cdots & \lambda_n\end{pmatrix},\quad Q = \begin{pmatrix}\vec{v}_1\\ \vec{x}_2\\ \vdots\\ \vec{x}_n\end{pmatrix}.$$
    (2) Then, use the same process or mathematical induction on n to
        finish the proof.
    For another proof (here, A can be an arbitrary n × n matrix over a field
    F of characteristic greater than n), try the following steps:
    (1) A cannot have the form λIn unless λ = 0.
    (2) There exists a vector u1 ∈ Fn such that {u1, u1A} is linearly inde-
        pendent and {u1, u1A, u3, . . . , un} = B is a basis for Fn. Then
        $$RAR^{-1} = \begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n}\\ b_{31} & b_{32} & b_{33} & \cdots & b_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ b_{n1} & b_{n2} & b_{n3} & \cdots & b_{nn}\end{pmatrix} = \begin{pmatrix}O & A_{12}\\ A_{21} & A_{22}\end{pmatrix}$$
        and tr A = tr A22 = 0. What is Rn×n?
    (3) By mathematical induction, there exists an (n−1) × (n−1) matrix S
        such that SA22S^{-1} is a matrix with zero diagonal entries.
    (4) Then
        $$\begin{pmatrix}1 & O\\ O & S\end{pmatrix}RAR^{-1}\begin{pmatrix}1 & O\\ O & S\end{pmatrix}^{-1} = \begin{pmatrix}O & A_{12}S^{-1}\\ SA_{21} & SA_{22}S^{-1}\end{pmatrix}$$
        is a required one.
14. Show that

    {A ∈ M(n; F) | tr A = 0} = {XY − YX | X, Y ∈ M(n; F)}

    as subspaces of M(n; F) (refer to Ex. 30 in Sec. B.4), i.e. every square
    matrix A with tr A = 0 can be expressed as A = XY − YX for some
    square matrices X and Y. Try the following steps:
    (1) Use Ex. 13.
    (2) We may suppose
        $$A = \begin{pmatrix}0 & a_{12} & \cdots & a_{1n}\\ a_{21} & 0 & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & 0\end{pmatrix}.$$
        Take any diagonal matrix X = diag(λ1, λ2, . . . , λn), where
        λ1, . . . , λn are all distinct. Suppose Y = [bij]n×n. Then
        $$XY - YX = \begin{pmatrix}0 & b_{12}(\lambda_1-\lambda_2) & \cdots & b_{1n}(\lambda_1-\lambda_n)\\ b_{21}(\lambda_2-\lambda_1) & 0 & \cdots & b_{2n}(\lambda_2-\lambda_n)\\ \vdots & \vdots & \ddots & \vdots\\ b_{n1}(\lambda_n-\lambda_1) & b_{n2}(\lambda_n-\lambda_2) & \cdots & 0\end{pmatrix}$$
        and XY − YX = A holds if and only if bij = aij/(λi − λj) for i ≠ j.

<D> Applications
For possible applications of diagonalizable linear operators to differential
equations, we postpone the discussion to Sec. 3.7.6 in Chap. 3.

2.7.7 Jordan canonical form


In this subsection, we study the canonical form of those linear operators
which have coincident eigenvalues but are not diagonalizable.

We start from two concrete examples and end up with a summarized
result in a general setting.

Example 1 In N = {e1, e2}, let the linear operator f: R2 → R2 be defined as
$$f(x_1, x_2) = (x_1,\ -2x_1 + x_2) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}1 & -2\\ 0 & 1\end{pmatrix}.$$
Try to investigate the geometric mapping properties of f.

Solution The characteristic polynomial of f or A is
$$\det(A - tI_2) = \begin{vmatrix}1-t & -2\\ 0 & 1-t\end{vmatrix} = (t-1)^2.$$
Thus, f has eigenvalues 1, 1. Suppose A is diagonalizable, i.e. there exists
an invertible 2 × 2 matrix P such that
$$PAP^{-1} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}$$
is a diagonal matrix. Since
$$\det(PAP^{-1} - tI_2) = \det\begin{pmatrix}\lambda_1-t & 0\\ 0 & \lambda_2-t\end{pmatrix} = (t-\lambda_1)(t-\lambda_2)$$
$$= \det P(A - tI_2)P^{-1} = \det P\cdot\det(A - tI_2)\cdot\det P^{-1} = \det(A - tI_2) = (t-1)^2,$$
λ1 = λ2 = 1 should hold. Then
$$PAP^{-1} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I_2 \Rightarrow A = P^{-1}I_2P = P^{-1}P = I_2,$$
which is not the original A. This indicates that A is not diagonalizable.
It is reasonable to expect that the mapping properties of f will be more
complicated.
Since det A = 1, f is one-to-one and hence onto.
Just as in Example 1 of Sec. 2.7.6, this f maps lines into lines and pre-
serves their relative positions and ratios of signed lengths of line segments
along the same line or parallel lines.
Let y = (y1, y2) = f(x) = xA. Then

y1 = x1,
y2 = −2x1 + x2.

Solving x1 = x1 and x2 = −2x1 + x2, it is easy to see that x1 = 0 is
the only invariant line passing through 0 under f. For a point x = (x1, x2)
with a nonzero first component x1 ≠ 0,

y2 − x2 = −2x1

shows that the point x is moved parallel to the line x1 = 0, downward
the distance 2x1 if x1 > 0 and upward the distance −2x1 if x1 < 0. This is
equivalent to saying that

(y2 − x2)/x1 = −2,

which shows that x is moved along a line parallel to the line x1 = 0 and
that the distance moved is proportional to its distance from x1 = 0 by the
constant scalar −2. Therefore, the line

x1 = c (a constant)

is mapped onto itself, with its point (c, x2) going to (c, −2c + x2). Such a
line is also called an invariant line of f but is not an invariant subspace
except when c = 0. See Fig. 2.83. In Fig. 2.84, does the quadrilateral with
vertices (3, 1), (1, 2), (−2, 1) and (−4, −2) have the same area and the same
orientation as its image quadrilateral? Why?

[Fig. 2.83: each vertical line x1 = c is invariant; the point (x1, x2) moves to (x1, −2x1 + x2).]

But the concepts of eigenvalues and eigenvectors can still be used to
investigate the mapping properties of f.

[Fig. 2.84: the quadrilateral with vertices (3, 1), (1, 2), (−2, 1), (−4, −2) and its image quadrilateral under f.]

We have already known that x = (x1, x2) is an eigenvector of f related
to 1, i.e. x(A − I2) = 0, if and only if x1 = 0 and x2 ≠ 0. What could
x(A − I2)² = 0 be? Actual computation shows that
$$(A - I_2)^2 = A^2 - 2A + I_2 = \begin{pmatrix}1 & -4\\ 0 & 1\end{pmatrix} - 2\begin{pmatrix}1 & -2\\ 0 & 1\end{pmatrix} + \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix} = O_{2\times2},$$
i.e. A satisfies its characteristic polynomial (t − 1)² = t² − 2t + 1, which
illustrates the Cayley–Hamilton theorem (see (2.7.19)). Hence

G = {x ∈ R2 | x(A − I2)² = 0} = R2

holds. There does exist a vector x ∈ R2 such that

x(A − I2) ≠ 0 but (x(A − I2))(A − I2) = x(A − I2)² = 0.

For example, take e1 = (1, 0) as our sample vector and consider
$$\vec{e}_1(A - I_2) = (1\ 0)\begin{pmatrix}0 & -2\\ 0 & 0\end{pmatrix} = (0\ -2) = -2\vec{e}_2.$$

Thus B = {−2e2, e1} is a basis for G = R2. Relative to B,

(−2e2)A = −2e2 = 1 · (−2e2) + 0 · e1,
e1A = e1 − 2e2 = 1 · (−2e2) + 1 · e1

$$\Rightarrow [f]_B = PAP^{-1} = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix},\quad\text{where } P = \begin{pmatrix}-2\vec{e}_2\\ \vec{e}_1\end{pmatrix} = \begin{pmatrix}0 & -2\\ 1 & 0\end{pmatrix},$$
which is called the Jordan canonical form of f (see Sec. B.12). This means
that the geometric mapping behavior of f in N = {e1, e2} is the same as
that of [f]B in B = {−2e2, e1}. The latter is better illustrated in Fig. 2.85.

[Fig. 2.85: in the coordinate system B = {−2e2, e1}, the axis ⟨⟨−2e2⟩⟩ is the invariant line and points are sheared parallel to it.]
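The basis construction just used is easy to automate. A minimal NumPy sketch of ours (row-vector convention; the names v1, v2, P follow the text) reproduces the Jordan form:

```python
import numpy as np

A  = np.array([[1.0, -2.0], [0.0, 1.0]])
v2 = np.array([1.0, 0.0])            # e1, a vector with v2 (A - I) != 0
v1 = v2 @ (A - np.eye(2))            # = (0, -2), an eigenvector for 1
assert np.allclose(v1 @ A, v1)                              # v1 A = v1
assert np.allclose((A - np.eye(2)) @ (A - np.eye(2)), 0)    # (A - I2)^2 = O

P = np.vstack([v1, v2])
print(P @ A @ np.linalg.inv(P))      # [[1, 0], [1, 1]], the Jordan form
```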

One more example to practice the methods of Example 1.

Example 2 In N = {e1, e2}, let f: R2 → R2 be defined as
$$f(x_1, x_2) = \Big(x_1 + \frac{1}{2}x_2,\ -\frac{1}{2}x_1 + 2x_2\Big) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}1 & -\frac{1}{2}\\ \frac{1}{2} & 2\end{pmatrix}.$$
Investigate the mapping properties of A.

Solution f is an isomorphism. Its characteristic polynomial is
$$\det(A - tI_2) = \begin{vmatrix}1-t & -\frac{1}{2}\\ \frac{1}{2} & 2-t\end{vmatrix} = \Big(t - \frac{3}{2}\Big)^2.$$
Therefore, f or A has eigenvalues 3/2, 3/2 and A is not diagonalizable. Solve
x(A − (3/2)I2) = 0 to get the corresponding eigenvectors t(1, 1) for t ≠ 0.
Hence, it is expected that x1 − x2 = 0 is an invariant line (subspace) of A.
Of course, A satisfies its characteristic polynomial, i.e.
$$A^2 - 3A + \frac{9}{4}I_2 = \Big(A - \frac{3}{2}I_2\Big)^2 = O.$$
Thus
$$G = \Big\{\vec{x} \in \mathbf{R}^2\ \Big|\ \vec{x}\Big(A - \frac{3}{2}I_2\Big)^2 = \vec{0}\Big\} = \mathbf{R}^2.$$
Take a vector, say e1 = (1, 0), so that
$$\vec{e}_1\Big(A - \frac{3}{2}I_2\Big) = \Big(-\frac{1}{2}, -\frac{1}{2}\Big) = -\frac{1}{2}(\vec{e}_1 + \vec{e}_2) \ne \vec{0},\qquad \vec{e}_1\Big(A - \frac{3}{2}I_2\Big)^2 = \vec{0}.$$
Then B = {v1, v2}, where v1 = −(1/2)(e1 + e2) and v2 = e1, is a basis for
R2. Relative to B,
$$\vec{v}_1A = \frac{3}{2}\vec{v}_1 = \frac{3}{2}\vec{v}_1 + 0\cdot\vec{v}_2,\qquad \vec{v}_2A = \vec{e}_1A = \frac{3}{2}\vec{e}_1 - \frac{1}{2}(\vec{e}_1 + \vec{e}_2) = \vec{v}_1 + \frac{3}{2}\vec{v}_2$$
$$\Rightarrow [f]_B = PAP^{-1} = \begin{pmatrix}\frac{3}{2} & 0\\ 1 & \frac{3}{2}\end{pmatrix} = \frac{3}{2}I_2 + \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix},$$
where
$$P = \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \begin{pmatrix}-\frac{1}{2} & -\frac{1}{2}\\ 1 & 0\end{pmatrix}.$$
For any x ∈ R2,

x = (x1, x2) in N = {e1, e2},
[x]B = (α1, α2) in B = {v1, v2},
$$[f(\vec{x})]_B = [\vec{x}]_B[f]_B = \frac{3}{2}(\alpha_1, \alpha_2) + (\alpha_2, 0) = \Big(\frac{3}{2}\alpha_1 + \alpha_2,\ \frac{3}{2}\alpha_2\Big)\ \text{in}\ B,$$
$$f(\vec{x}) = \vec{x}A = \vec{x}P^{-1}[f]_BP = \Big(x_1 + \frac{1}{2}x_2,\ -\frac{1}{2}x_1 + 2x_2\Big) \in \mathbf{R}^2\ \text{in}\ N$$
(see (2.4.2), (2.6.4) or (2.7.23) if necessary).

This means that ⟨⟨v1⟩⟩, i.e. x1 − x2 = 0, is the only invariant line (subspace)
of f, on which each point x is moved to f(x) = (3/2)x. See Fig. 2.86.

[Fig. 2.86: the invariant line ⟨⟨v1⟩⟩ (the line x1 = x2); on it each point x is stretched to (3/2)x.]

We summarize Examples 1 and 2 as the following abstract result.

The Jordan canonical form of a linear operator
Suppose f(x) = xA: R2 → R2 is a linear operator, where A = [aij]2×2 is a
real matrix. Suppose the characteristic polynomial is

det(A − tI2) = (t − λ)²,

so that A has eigenvalues λ, λ and A is not diagonalizable. Then
(A − λI2)² = A² − 2λA + λ²I2 = O. Let the generalized eigenspace be

Gλ = {x ∈ R2 | x(A − λI2)² = 0} = R2.

Take a vector v2 ∈ Gλ so that v1 = v2(A − λI2) ≠ 0. Then v1 is an
eigenvector of A related to λ.

(1) B = {v1, v2} is a basis for R2 and
$$[f]_B = PAP^{-1} = \begin{pmatrix}\lambda & 0\\ 1 & \lambda\end{pmatrix} = \lambda I_2 + \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix},\quad\text{where } P = \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix}.$$
(2) The geometric mapping x → f(x) in N = {e1, e2} is equivalent to
the following mapping [f]B in B = {v1, v2}:

[x]B = (α1, α2) → (λα1, λα2) (enlargement by the scalar λ)
→ [f(x)]B = [x]B[f]B = λ(α1, α2) + (α2, 0) = (λα1 + α2, λα2)
  (translation along (α2, 0)).

This means that the one-dimensional subspace ⟨⟨v1⟩⟩, i.e. α2 = 0, is
the only invariant subspace of f except when λ = 1. In case λ = 1, each
line parallel to ⟨⟨v1⟩⟩ is invariant under f. (2.7.74)

For general results, please refer to Sec. B.11, in particular Ex. 2, and to
Sec. B.12. Readers are reminded to review Example 6 and the explanations
after it in Sec. 2.7.2, including Figs. 2.51 and 2.52.
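A NumPy sketch of ours runs the recipe of (2.7.74) on Example 2 of this subsection (λ = 3/2, and v2 = e1 as in the text; any vector with v2(A − λI2) ≠ 0 would do):

```python
import numpy as np

A   = np.array([[1.0, -0.5], [0.5, 2.0]])
lam = 1.5
N   = A - lam * np.eye(2)
assert np.allclose(N @ N, 0)            # (A - (3/2) I2)^2 = O

v2 = np.array([1.0, 0.0])               # any vector with v2 N != 0
v1 = v2 @ N                             # eigenvector: v1 A = (3/2) v1
P  = np.vstack([v1, v2])
print(P @ A @ np.linalg.inv(P))         # [[1.5, 0], [1, 1.5]], as in (2.7.74)
```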

Exercises
<A>
1. Model after Example 1 to investigate the geometric mapping proper-
   ties of
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}-2 & 0\\ 3 & -2\end{pmatrix}.$$
2. Do the same problem as in Ex. 1 and Ex. <A> 3(a) of Sec. 2.7.6 if
   (a) $f(\vec{x}) = \vec{x}A$, where $\vec{x} = (x_1, x_2)$ and $A = \begin{pmatrix}2 & 1\\ -1 & 4\end{pmatrix}$,
   (b) $f(\vec{x}) = \vec{x}A$, where $\vec{x} = (x_1, x_2)$ and $A = \begin{pmatrix}\lambda & 0\\ b & \lambda\end{pmatrix}$, bλ ≠ 0.
3. Constructive linear algebra
   Fix the Cartesian coordinate system N = {e1, e2} on R2. Let
   v1 = (3, 1) and v2 = (1, −2). Try to map the triangle ∆0v1v2 respec-
   tively onto ∆0v1(−v2), ∆0(−v2)(−v1) and ∆0(−v1)v2. See Fig. 2.87.
   For example, map ∆0v1v2 onto ∆0v1(−v2) but keep 0 fixed. One way
   to do this is to find a linear isomorphism f: R2 → R2 such that

   f(v1) = v1 = 1 · v1 + 0 · v2,
   f(v2) = −v2 = 0 · v1 + (−1) · v2

   $$\Rightarrow [f]_B = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix},\quad\text{where } B = \{\vec{v}_1, \vec{v}_2\}.$$
[Fig. 2.87: the triangle ∆0v1v2 and its images ∆0v1(−v2), ∆0(−v2)(−v1), ∆0(−v1)v2.]

In terms of the natural coordinate system N = {e1, e2},
$$[f]_N = P^{-1}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}P,\quad\text{where } P = \begin{pmatrix}\vec{v}_1\\ \vec{v}_2\end{pmatrix} = \begin{pmatrix}3 & 1\\ 1 & -2\end{pmatrix}$$
$$= -\frac{1}{7}\begin{pmatrix}-2 & -1\\ -1 & 3\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}3 & 1\\ 1 & -2\end{pmatrix} = \begin{pmatrix}\frac{5}{7} & \frac{4}{7}\\ \frac{6}{7} & -\frac{5}{7}\end{pmatrix}$$
$$\Rightarrow f(\vec{x}) = \vec{x}[f]_N = \frac{1}{7}(5x_1 + 6x_2,\ 4x_1 - 5x_2).$$
7
The other way is to define g: R2 → R2 as

g(v1) = −v2 = 0 · v1 + (−1) · v2,
g(v2) = v1 = 1 · v1 + 0 · v2

$$\Rightarrow [g]_B = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$$
$$\Rightarrow [g]_N = P^{-1}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}P = \begin{pmatrix}\frac{1}{7} & \frac{5}{7}\\ -\frac{10}{7} & -\frac{1}{7}\end{pmatrix}$$
$$\Rightarrow g(\vec{x}) = \vec{x}[g]_N = \frac{1}{7}(x_1 - 10x_2,\ 5x_1 - x_2).$$
Note the role of the matrix $\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$ in the derivation of g.

(a) Model after f and g respectively and try to map ∆0v1v2 onto
    ∆0(−v2)(−v1).
(b) Same as (a), but map ∆0v1v2 onto ∆0(−v1)v2.

 
The next problem is to map ∆0v1v2 onto ∆0v2(v1 + v2), half of
the parallelogram with vertices 0, v2, v1 + v2 and v1. The mapping
h: R2 → R2 defined linearly by

h(v1) = v1 + v2,
h(v2) = v2

is probably the simplest choice among others. See Fig. 2.88.
Geometrically, h represents the shearing with ⟨⟨v2⟩⟩ as its invari-
ant line (for details, refer to Example 6 in Sec. 2.7.2): it moves
a point [x]B = (α1, α2) to the point [h(x)]B = (α1, α1 + α2) along
the line parallel to ⟨⟨v2⟩⟩. Now
$$[h]_B = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}$$
$$\Rightarrow [h]_N = P^{-1}\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}P = \begin{pmatrix}\frac{9}{7} & -\frac{4}{7}\\ \frac{1}{7} & \frac{5}{7}\end{pmatrix} \stackrel{\text{def}}{=} A$$
$$\Rightarrow h(\vec{x}) = \vec{x}[h]_N = \frac{1}{7}(9x_1 + x_2,\ -4x_1 + 5x_2).$$
[Fig. 2.88: the shearing h with invariant line ⟨⟨v2⟩⟩ moves x to h(x) parallel to v2.]

By the way, we can use this h to justify (2.7.74) as follows. Firstly,
the characteristic polynomial of h is
$$\det([h]_N - tI_2) = \det([h]_B - tI_2) = \begin{vmatrix}1-t & 1\\ 0 & 1-t\end{vmatrix} = (t-1)^2,$$
and hence h has eigenvalues 1, 1 but h is not diagonalizable. Take
y2 = (7/2)e1; then y1 = y2(A − I2) = v2 is an eigenvector of h and
γ = {y1, y2} is a basis for R2. As might be expected, actual
computation shows that

h(y1) = y1,
h(y2) = y1 + y2

$$\Rightarrow [h]_\gamma = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}$$
$$\Rightarrow [h]_N = Q^{-1}\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}Q = \frac{1}{7}\begin{pmatrix}0 & 2\\ -\frac{7}{2} & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}\begin{pmatrix}1 & -2\\ \frac{7}{2} & 0\end{pmatrix} = \begin{pmatrix}\frac{9}{7} & -\frac{4}{7}\\ \frac{1}{7} & \frac{5}{7}\end{pmatrix},\quad Q = \begin{pmatrix}\vec{y}_1\\ \vec{y}_2\end{pmatrix} = \begin{pmatrix}1 & -2\\ \frac{7}{2} & 0\end{pmatrix},$$
which coincides with the original one mentioned above. The other
choice is to define k: R2 → R2 by

k(v1) = v2,
k(v2) = v1 + v2

$$\Rightarrow [k]_B = \begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}$$
$$\Rightarrow [k]_N = P^{-1}\begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}P = \begin{pmatrix}\frac{6}{7} & -\frac{5}{7}\\ -\frac{11}{7} & \frac{1}{7}\end{pmatrix} \stackrel{\text{def}}{=} B$$
$$\Rightarrow k(\vec{x}) = \vec{x}[k]_N = \frac{1}{7}(6x_1 - 11x_2,\ -5x_1 + x_2).$$
Note that
$$\begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix} + \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix} = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix} + \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}.$$
Try to use these two algebraic representations on the right to explain
the mapping properties of k in the basis B. Now, k has the charac-
teristic polynomial
$$\det([k]_N - tI_2) = \det([k]_B - tI_2) = \begin{vmatrix}-t & 1\\ 1 & 1-t\end{vmatrix} = t^2 - t - 1,$$
so that k has eigenvalues λ1 = (1 + √5)/2 and λ2 = (1 − √5)/2. Solve
k(x) = λi x, i.e.
$$(x_1\ x_2)\begin{pmatrix}\frac{6}{7}-\lambda_i & -\frac{5}{7}\\ -\frac{11}{7} & \frac{1}{7}-\lambda_i\end{pmatrix} = \vec{0},\quad i = 1, 2,$$
and we obtain the corresponding eigenvectors x1 = (22, 5 − 7√5)
and x2 = (22, 5 + 7√5). Then C = {x1, x2} is a basis for R2 and
$$R[k]_NR^{-1} = \begin{pmatrix}\frac{1+\sqrt5}{2} & 0\\ 0 & \frac{1-\sqrt5}{2}\end{pmatrix} = [k]_C,\quad\text{where } R = \begin{pmatrix}\vec{x}_1\\ \vec{x}_2\end{pmatrix} = \begin{pmatrix}22 & 5-7\sqrt5\\ 22 & 5+7\sqrt5\end{pmatrix}.$$
Therefore, [k]N = R^{-1}[k]C R, i.e.

k(x) = x[k]N = xR^{-1}[k]C R = [x]C[k]C R.

This means that, for a given x ∈ R2, we can follow these steps to
pinpoint k(x):

x → [x]C → [x]C[k]C → [x]C[k]C R = k(x).

Equivalently, by using (2.7.72), compute
$$B_1 = R^{-1}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}R = \frac{1}{14\sqrt5}\begin{pmatrix}5+7\sqrt5 & -10\\ -22 & 7\sqrt5-5\end{pmatrix},$$
$$B_2 = R^{-1}\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}R = \frac{1}{14\sqrt5}\begin{pmatrix}7\sqrt5-5 & 10\\ 22 & 7\sqrt5+5\end{pmatrix},$$
and we get the canonical decomposition of [k]N = B as

I2 = B1 + B2,
$$[k]_N = \frac{1+\sqrt5}{2}B_1 + \frac{1-\sqrt5}{2}B_2.$$
Refer to Fig. 2.73 and try to use the above decomposition to explain
geometrically how k maps ∆0v1v2 onto ∆0v2(v1 + v2).

 
(c) Model after h and k to map ∆0v1v2 onto ∆0(−v1)(−v1 − v2).
(d) In B = {v1, v2}, a linear transformation p: R2 → R2 has the
    representation
    $$[p]_B = \begin{pmatrix}-3 & 0\\ 1 & -3\end{pmatrix}.$$
    Try to explain the geometric mapping properties of p in B and find
    the image of ∆0v1v2 under p. What is the representation of p in
    N = {e1, e2}?
(e) In B = {v1, v2}, a linear transformation q: R2 → R2 has the
    representation
    $$[q]_B = \begin{pmatrix}0 & 1\\ -3 & -2\end{pmatrix}.$$
    Do the same questions as in (d). Refer to Sec. 2.7.8 if necessary.
(f) In B = {v1, v2}, a linear transformation r: R2 → R2 has the
    representation
    $$[r]_B = \begin{pmatrix}0 & 1\\ -2 & -3\end{pmatrix}.$$
    Do the same problem as in (d). Refer to Sec. 2.7.8 if necessary.

<B>
1. Prove (2.7.74) and interpret its geometric mapping properties graphically.

<C> Abstraction and generalization

Read Secs. 3.7.7 and B.12 and try your best to do the exercises there.

2.7.8 Rational canonical form


Some linear operators may not have real eigenvalues.

Example 1 In N = {e1, e2}, let the linear operator f: R2 → R2 be defined as
$$f(x_1, x_2) = (x_1 - 2x_2,\ x_1 - x_2) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}1 & 1\\ -2 & -1\end{pmatrix}.$$
Try to investigate the geometric mapping properties of f.
Solution f is one-to-one and hence onto. The characteristic polynomial
of f is
$$\det(A - tI_2) = \begin{vmatrix}1-t & 1\\ -2 & -1-t\end{vmatrix} = t^2 - 1 + 2 = t^2 + 1.$$
Thus, f does not have any real eigenvalues. This is equivalent to saying that
the simultaneous equations

xA = λx, or
(1 − λ)x1 − 2x2 = 0,
x1 − (1 + λ)x2 = 0,

do not have a nonzero solution x = (x1, x2) for any real number λ. Hence,
no line is invariant under f and only the point 0 is fixed by f.
A satisfies its characteristic polynomial t² + 1, i.e.
$$A^2 + I_2 = \begin{pmatrix}-1 & 0\\ 0 & -1\end{pmatrix} + \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = O.$$
Thus A² + I2, as a linear operator on R2, annihilates all the vectors
in R2, i.e.

G = {x ∈ R2 | x(A² + I2) = 0} = R2.

Does this help in choosing a better coordinate system for R2 so that the
mapping properties of f become clearer?
Let us try to find some clue, if any.
In N = {e1, e2}, consider the square with consecutive vertices
e1, e2, −e1 and −e2. See Fig. 2.89. If we rotate the whole plane, with
center at 0, through 90° in the counterclockwise direction, the result-
ing image of the square coincides with itself while its vertices permute
according to the ordering e1 → e2 → −e1 → −e2 → e1. Four such
consecutive rotations bring each vertex back to its original position
(see Fig. 2.89). These four rotations together form a cyclic group of order 4
(refer to Sec. A.4). Such a rotation through 90° carries a point x = (x1, x2)
into the point with coordinates (−x2, x1) and, in N = {e1, e2}, can be
represented as
$$\vec{y} = \vec{x}J,\quad\text{where } J = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}.$$
See Fig. 2.90, where J is the composite mapping of the reflection (x2, x1) of
the point (x1, x2) in the line x1 = x2 followed by the reflection (−x2, x1)
[Fig. 2.89: the square with vertices e1, e2, −e1, −e2 and the four rotations J, J², J³, J⁴.]

[Fig. 2.90: the rotation J as the composite of the reflection in the line x1 = x2 followed by the reflection in the line x1 = 0.]

of (x2, x1) in the line x1 = 0. Note that

e1J = e2,
e1J² = e2J = −e1,
e1J³ = (−e1)J = −e2,
e1J⁴ = (−e2)J = e1.

In particular, J² + I2 = O, and {e1, e1J} = {e1, e2} is linearly indepen-
dent and thus is a basis for R2.

When comparing A² + I2 = O with J² + I2 = O, we naturally have
strong confidence in handling A in a similar way. For this purpose, take
any fixed nonzero vector v ∈ G = R2. Since A does not have any real
eigenvalues, v and vA are linearly independent and hence

B = {v, vA}

is a basis for R2. Now, since v(A² + I2) = vA² + v = 0,

vA = 0 · v + 1 · vA,
(vA)A = vA² = −v = (−1) · v + 0 · vA

$$\Rightarrow [f]_B = PAP^{-1} = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix},\quad\text{where } P = \begin{pmatrix}\vec{v}\\ \vec{v}A\end{pmatrix}.$$
This basis B and the resulting rational canonical form PAP^{-1} are exactly
what we want (refer to Sec. B.12). Figure 2.91 illustrates the mapping
properties of f or A in the language of the basis B. Note that

{I2, A, A² = −I2, A³ = −A}

forms a cyclic group of order 4.

[Fig. 2.91: the powers A, A², A³, A⁴ = I2 cycle v → vA → −v → −vA → v.]

We summarize Example 1 as the following abstract result.

The rational canonical form of a linear operator
Suppose f(x) = xA: R2 → R2 is a linear operator, where A = [aij]2×2 is a
real matrix. Suppose the characteristic polynomial is

det(A − tI2) = t² + a1t + a0,

where a1² − 4a0 < 0, so that A does not have real eigenvalues (see the
Remark below). Then A² + a1A + a0I2 = O. Let the generalized eigenspace be

G = {x ∈ R2 | x(A² + a1A + a0I2) = 0} = R2.

Take any nonzero vector v ∈ R2.

(1) B = {v, vA} is a basis for R2 and
$$[f]_B = PAP^{-1} = \begin{pmatrix}0 & 1\\ -a_0 & -a_1\end{pmatrix} = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}a_0 & 0\\ 0 & 1\end{pmatrix} + \begin{pmatrix}0 & 0\\ 0 & -a_1\end{pmatrix},$$
where
$$P = \begin{pmatrix}\vec{v}\\ \vec{v}A\end{pmatrix}.$$
(2) The geometric mapping x → f(x) = xA in N = {e1, e2} is equivalent
to the mapping [x]B → [f(x)]B = [x]B[f]B in B = {v, vA} as follows:

[x]B = (α1, α2) → (−α2, α1) (under the rotation part)
→ (−a0α2, α1) (under the scaling part)
→ [f(x)]B = [x]B[f]B = (−a0α2, α1 − a1α2) = (−a0α2, α1) + (0, −a1α2)
  (translation along (0, −a1α2)). (2.7.75)

See Fig. 2.92 and refer to Secs. 3.7.8 and B.12 for generalized results.
Readers should review Example 1 and the associated explanations,
including Fig. 2.91.

[Fig. 2.92: in the basis B = {v, vA}, the point [x]B = (α1, α2) is rotated to (−α2, α1), scaled to (−a0α2, α1), then translated by (0, −a1α2) to reach [f(x)]B.]

Remark
Even if a1² − 4a0 ≥ 0, so that A has real eigenvalues, (2.7.75) is still valid,
but not for every nonzero vector v in R2. All one needs to do is to choose a
vector v in R2 which is not an eigenvector of A. Then {v, vA} is linearly
independent and hence forms a basis B for R2. In this case, (1) and (2)
hold too.

Exercises
<A>
1. In N = {e1, e2}, let f: R2 → R2 be defined by
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}-2 & 5\\ 4 & -3\end{pmatrix}.$$
   (a) Model after Example 1 to justify (2.7.75).
   (b) Do the same problem as in Ex. <A> 3(a) of Sec. 2.7.6.
2. Do the same problems as in Ex. 1 if
   $$f(\vec{x}) = \vec{x}A,\quad\text{where } \vec{x} = (x_1, x_2)\ \text{and}\ A = \begin{pmatrix}a & 1\\ b & 0\end{pmatrix},\ ab \ne 0.$$

<B>

1. Prove (2.7.75) and interpret it graphically.

<C> Abstraction and generalization


Read Secs. 3.7.8 and B.12 and try to do exercises there.

2.8 Affine Transformations


Topics here inherit directly from Sec. 2.7, in particular, (2.7.2).
In Exs. <B> 5 and 6 of Sec. 2.5, we gave definitions for affine sub-
spaces and affine transformations on the plane R2 . Now, we formally give
definitions as follows.

Definition 2.8.1 Let X be a nonempty set and V an n-dimensional vector
space over a field F. Elements of X are called points and are temporarily
denoted by capital letters P, Q, R, . . . . Suppose there exists a function
X × X → V, defined by

(P, Q) ∈ X × X → $\overrightarrow{PQ}$ ∈ V,

with the following properties:

1. For arbitrary three points P, Q, R in X,

   $\overrightarrow{PQ} + \overrightarrow{QR} = \overrightarrow{PR}$.

2. For any point P in X and any vector x ∈ V, there corresponds a unique
   point Q in X such that

   $\overrightarrow{PQ}$ = x.

Then the set X is called an n-dimensional affine space with the vector space
V as its difference space. (2.8.1)
Since $\overrightarrow{PP} = \vec{0}$, any single point in X can be considered as the zero
vector or base point. Also, call $\overrightarrow{PQ}$ a position vector with initial point P
and terminal point Q, or a free vector, since the law of parallel invariance
holds for position vectors. Moreover, fixing a point O in X as a base point,
the correspondence

P ∈ X ↔ $\overrightarrow{OP}$ ∈ V

is one-to-one and onto. Hence one can treat X as V, but the latter V has
a specific 0, called the zero vector, while the former X does not have any
specific point, in the sense that there is no difference between any two of its
points. Conversely, if each vector in V is considered as a point, then any two
points x and y in V determine a vector

y − x (the difference vector from the point x to the point y).

In this case, V is an affine space with itself as difference space. We will
adopt this convention throughout the book. This is the exact reason why we
consider R and R2 both as affine spaces and as vector spaces (refer to the
convention before Remark 2.3 in Sec. 2.3).

Definition 2.8.2 Let S be a subspace of an n-dimensional vector space V
over a field F and x0 be a fixed vector in V. The image

    x0 + S = {x0 + x | x ∈ S}

of S under the translation x → x0 + x is called an affine subspace of V
associated with the subspace S. Define the dimension of x0 + S as dim S.
                                                                  (2.8.2)

Note that x0 + S = y0 + S if and only if x0 − y0 ∈ S. Also, x0 + S = S if
and only if x0 ∈ S.


Definition 2.8.3 Let f: V → V be a linear isomorphism of V onto itself
and x0 be a fixed vector in V. The composite mapping

    T(x) = x0 + f(x)

of the linear isomorphism f followed by the translation x → x0 + x is
called an affine transformation or mapping of the vector or affine space
V.                                                                (2.8.3)

This is the reason why we called the changes of coordinates in (1.3.2),
(2.4.2) and (2.6.5) affine transformations. Note that the above three
definitions remain valid for infinite-dimensional vector spaces.

Now we can formally characterize affine transformations on the plane


in terms of linear transformations (see also Ex. <B> 1 of Sec. 2.8.3).

Affine transformation or mapping on the plane
Suppose T: R² (affine space) → R² is a transformation. Then the following
are equivalent.

(1) T is one-to-one, onto and preserves ratios of signed lengths of line
    segments along the same or parallel lines.
(2) For any fixed point x0 ∈ R², there exists a unique linear isomorphism
    f: R² (with x0 as base point) → R² such that

        T(x) = T(x0) + f(x − x0), x ∈ R².

Such a T is called an affine transformation or mapping.           (2.8.4)

What should be mentioned is that the expression of T is independent of the
choice of the point x0. To see this, take any other point y0. Then

    T(y0) = T(x0) + f(y0 − x0), and
    T(x) = T(x0) + f(x − y0 + y0 − x0)
         = T(x0) + f(y0 − x0) + f(x − y0)
         = T(y0) + f(x − y0), x ∈ R².

In particular,

    T(x) = T(0) + f(x), x ∈ R².                                   (2.8.5)
 
In case T(0) = 0, the affine transformation reduces to a linear
isomorphism. In order to emphasize the "one-to-one and onto" properties,
an affine transformation is usually called an affine motion in geometry.
The composite function of two affine transformations is again affine. To
see this, let T1(x) = T1(0) + f1(x) and T2(x) = T2(0) + f2(x) be two
affine transformations. Thus

    (T2 ◦ T1)(x) = T2(T1(x))
                 = T2(0) + f2(T1(0) + f1(x))
                 = T2(0) + f2(T1(0)) + f2(f1(x))
                 = (T2 ◦ T1)(0) + (f2 ◦ f1)(x), x ∈ R²            (2.8.6)

with the prerequisite that the composite function of two linear
isomorphisms is again a linear isomorphism, which can be easily verified.
By the very definitions, the inverse transformation f⁻¹: R² → R² of f is a
linear isomorphism and the inverse transformation T⁻¹: R² → R² is affine
too. In fact, we have

    T⁻¹(x) = −f⁻¹(T(0)) + f⁻¹(x), x ∈ R²                          (2.8.7)

and T⁻¹ ◦ T(x) = T ◦ T⁻¹(x) = x for any x ∈ R².
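In coordinates, (2.8.6) and (2.8.7) are easy to check numerically. The
following is a minimal sketch, assuming Python with numpy and the
row-vector convention T(x) = x0 + xA used throughout; the helper names
compose and inverse, and the sample data, are hypothetical.

    import numpy as np

    def compose(T2, T1):
        # (T2 o T1)(x) = x2 + (x1 + x A1) A2 = (x2 + x1 A2) + x (A1 A2);
        # cf. (2.8.6): the linear part multiplies, the constants compose.
        (x1, A1), (x2, A2) = T1, T2
        return (x2 + x1 @ A2, A1 @ A2)

    def inverse(T):
        # T^{-1}(x) = -x0 A^{-1} + x A^{-1}; cf. (2.8.7)
        x0, A = T
        Ainv = np.linalg.inv(A)
        return (-x0 @ Ainv, Ainv)

    T = (np.array([1.0, -2.0]), np.array([[2.0, 1.0], [0.0, 1.0]]))
    c, M = compose(inverse(T), T)       # T^{-1} o T should be the identity
    assert np.allclose(c, 0.0) and np.allclose(M, np.eye(2))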

We summarize as the

Affine group on the plane
The set of all affine transformations on the plane

    Ga(2; R) = {x0 + f(x) | x0 ∈ R² and f: R² → R² is a linear isomorphism}

forms a group under the composite operation (see Sec. A.2), with the
identity element

    I: R² → R², I(x) = x,

and with

    −f⁻¹(x0) + f⁻¹(x), x ∈ R²,

as the inverse element of x0 + f(x). Call Ga(2; R) the affine group on the
plane R² (see Sec. A.4).                                          (2.8.8)

This is the counterpart of (2.7.34) for affine transformations.


This section is divided into five subsections.
As a counterpart of Secs. 2.7.1 and 2.7.3, Sec. 2.8.1 introduces the matrix
representations of an affine transformation in various affine bases. Concrete
and basic affine transformations are illustrated graphically in Sec. 2.8.2
which can be compared with Sec. 2.7.2 in content.
From what we have experienced through examples in Sec. 2.8.2, we
formally formulate these geometric quantities or properties that are invari-
ant under the action of the affine group Ga (2; R) as mentioned in (2.8.8).
Section 2.8.3 studies the affine invariants on the affine plane and proves
part (1) in (2.7.9). Results here are the plane version of those in Sec. 1.4,
and they will form a model for later development in the affine space R3
(see Sec. 3.8.3).
The study of the geometric properties that are invariant under the affine
group Ga(2; R) constitutes basically what is called affine geometry on the
plane. Section 2.8.4 will present some affine geometric problems as
examples.
In particular, Sec. 2.8.5 will focus our attention on the affine
invariants concerning the quadratic curve

    ⟨x, xB⟩ + 2⟨b, x⟩ + c = 0,
    B = [b11 b12; b12 b22], b = (b1, b2), c ∈ R,

where x = (x1, x2) and ⟨x, xB⟩ is treated as x(xB)* = xB*x* = xBx*.

2.8.1 Matrix representations

We consider the vector space R² as an affine plane at the same time.
Remember (see Sec. 2.6) that three points a0, a1, a2 are said to
constitute an affine basis B = {a0, a1, a2} with base point a0 if the
vectors a1 − a0 and a2 − a0 are linearly independent.

Let B = {a0, a1, a2} and C = {b0, b1, b2} be two affine bases for the
plane R² and T(x) = x0 + f(x) an affine transformation on R².
T may be expressed as

    T(x) = y0 + f(x − a0),

where y0 = T(a0) = x0 + f(a0). Therefore,

    T(x) − b0 = y0 − b0 + f(x − a0)
⇒ (by use of (2.6.3) and (a) in (2.7.23))
    [T(x) − b0]C = [y0 − b0]C + [f(x − a0)]C
                 = [y0 − b0]C + [x − a0]B [f]_B^C

or, in short (see (2.3.1)),

    [T(x)]C = [y0]C + [x]B [f]_B^C,                               (2.8.9)

which is called the matrix representation of the affine mapping T with
respect to the affine bases B = {a1 − a0, a2 − a0} and
C = {b1 − b0, b2 − b0}. (2.8.9) can be written in matrix form

    ([T(x)]C 1) = ([x]B 1) [ [f]_B^C 0; [T(a0)]C 1 ].             (2.8.10)
Combining (2.8.8), (2.7.23) and (2.8.10), we have

The matrix representations of affine transformations (motions)
with respect to affine bases

(1) Let N = {0, e1, e2} be the natural affine basis for R². Then the
    affine group Ga(2; R) in (2.8.8) is group isomorphic to the matrix
    group, also denoted as

        Ga(2; R) = { [A 0; x0 1] | A ∈ GL(2; R) and x0 ∈ R² },

    with matrix multiplication

        [A1 0; x1 1][A2 0; x2 1] = [A1A2 0; x1A2 + x2 1],

    where the identity motion is [I2 0; 0 1] and the inverse motion is

        [A 0; x0 1]⁻¹ = [A⁻¹ 0; −x0A⁻¹ 1].

    Ga(2; R) is called the affine group or group of affine motions on the
    plane with respect to the natural basis N.

(2) Let B = {a0, a1, a2} be another affine basis for R²; the affine group
    with respect to B is the conjugate group

        [A0 0; a0 1] Ga(2; R) [A0 0; a0 1]⁻¹
            = { [A0AA0⁻¹ 0; (x0 + a0A)A0⁻¹ − a0A0⁻¹ 1]
                | A ∈ GL(2; R) and x0 ∈ R² }

    of the group Ga(2; R), where A0 is the transition matrix from
    B = {a1 − a0, a2 − a0} to N = {e1, e2} as bases for the vector
    space R².                                                     (2.8.11)

In case B = C, the matrix in (2.8.10) coincides with that in part (2) of
(2.8.11). This is because

    [f]B = A^B_N [f]N (A^B_N)⁻¹ = A0[f]N A0⁻¹ = A0AA0⁻¹

and

    [T(a0)]B = [x0 + f(a0)]N A^N_B = (x0 + a0A)A0⁻¹
             = (x0 + a0A)A0⁻¹ − a0A0⁻¹,

where a0A0⁻¹ = [a0]N A^N_B = [a0]B = 0.
Notice that the change of coordinates stated in (2.4.2) is a special kind
of affine motion.
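The bookkeeping in (1) of (2.8.11) amounts to appending a coordinate 1 to
each row vector, as in (2.8.10). Here is a minimal sketch, assuming Python
with numpy; the helper name motion_matrix and the sample data are
hypothetical.

    import numpy as np

    def motion_matrix(A, x0):
        # the 3x3 matrix [A 0; x0 1] of (2.8.11), acting on rows (x, 1)
        M = np.eye(3)
        M[:2, :2] = A
        M[2, :2] = x0
        return M

    A  = np.array([[2.0, 1.0], [0.0, 1.0]])
    x0 = np.array([1.0, -2.0])
    x  = np.array([3.0, 4.0])
    row = np.append(x, 1.0) @ motion_matrix(A, x0)
    assert np.allclose(row, np.append(x0 + x @ A, 1.0))

    # The inverse motion of (1): [A 0; x0 1]^{-1} = [A^{-1} 0; -x0 A^{-1} 1].
    Ainv = np.linalg.inv(A)
    assert np.allclose(np.linalg.inv(motion_matrix(A, x0)),
                       motion_matrix(Ainv, -x0 @ Ainv))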

Using the natural affine basis N = {0, e1, e2}, an affine transformation
T(x) = x0 + xA (see (2.8.5)) is decomposed as

    x → y = xA (keeping 0 fixed) → x0 + y,                        (2.8.12)

where A ∈ GL(2; R). In matrix terminology, this is nothing new but

    [A 0; x0 1] = [A 0; 0 1][I2 0; x0 1], A ∈ GL(2; R).           (2.8.13)

As a consequence, we can consider the real general linear group GL(2; R)
in (2.7.34) as a subgroup of the affine group Ga(2; R) (for the definition
of a subgroup, see Ex. <A> 2).

A linear isomorphism on R² is a special kind of affine transformation that
keeps the zero vector 0 fixed. (2.8.12) says that an affine transformation
is the composite of an affine transformation keeping 0 fixed and a
translation along T(0) − 0 = x0. This fact is universally true for any
point in R².
To see this, let x0 be any fixed point in R² and T: R² → R² be an affine
transformation. Then

    T(x) = T(x0) + f(x − x0)
         = T(x0) − x0 + (x0 + f(x − x0))
         = T(x0) − x0 + T_{x0}(x),                                (2.8.14)

where T_{x0}(x) = x0 + f(x − x0) is an affine transformation keeping the
point x0 fixed. Therefore, we obtain the following result.

The decomposition of an affine transformation as an affine
transformation keeping a point fixed, followed by a translation
Let T: R² → R² be an affine transformation and x0 be any point in R².
Then T can be uniquely expressed as

    T = f2 ◦ f1,

where f2(x) = [T(x0) − x0] + x is the translation along T(x0) − x0, while
f1(x) = x0 + (T(x) − T(x0)): R² → R² is an affine transformation keeping
x0 fixed.                                                         (2.8.15)

Finally, we state

The fundamental theorem of affine transformations (motions)
For two arbitrary affine bases B = {a0, a1, a2} and C = {b0, b1, b2} for
the plane, there exists a unique affine transformation T: R² → R²
satisfying

    T(ai) = bi, i = 0, 1, 2.                                      (2.8.16)

This is equivalent to saying that there exists a unique linear isomorphism
f: R² → R² so that

    f(ai − a0) = bi − b0, i = 1, 2.                               (2.8.17)

Then T(x) = b0 + f(x − a0) is the required one.
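As a computational aside, (2.8.17) can be solved mechanically: stacking
the vectors ai − a0 and bi − b0 as the rows of matrices Pa and Pb, the
row-vector matrix F of f must satisfy PaF = Pb. A minimal sketch, assuming
Python with numpy; the helper name affine_from_bases is hypothetical, and
the data are those of the Example below.

    import numpy as np

    def affine_from_bases(a, b):
        # the unique T with T(a_i) = b_i: T(x) = b0 + (x - a0) F, Pa F = Pb
        Pa = np.array([a[1] - a[0], a[2] - a[0]])
        Pb = np.array([b[1] - b[0], b[2] - b[0]])
        F = np.linalg.solve(Pa, Pb)
        return b[0], F

    a = [np.array(p, float) for p in [(1, 2), (1, -1), (0, 1)]]
    b = [np.array(p, float) for p in [(-2, -3), (3, -4), (-5, 1)]]
    b0, F = affine_from_bases(a, b)
    for ai, bi in zip(a, b):
        assert np.allclose(b0 + (ai - a[0]) @ F, bi)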
We give an example to illustrate (2.8.15) and (2.8.16).

Example In R², let

    a0 = (1, 2), a1 = (1, −1), a2 = (0, 1) and
    b0 = (−2, −3), b1 = (3, −4), b2 = (−5, 1).

(a) Construct affine mappings T1, T2 and T such that

        T1(0) = a0, T1(e1) = a1, T1(e2) = a2;
        T2(0) = b0, T2(e1) = b1, T2(e2) = b2; and
        T(ai) = bi, i = 0, 1, 2.

(b) Show that T = T2 ◦ T1⁻¹.
(c) Let x0 = (−2, −2). Express T as f2 ◦ f1, where f1 is an affine
    transformation keeping x0 fixed while f2 is a suitable translation.

Solution See Fig. 2.93.

[Fig. 2.93: the points a0, a1, a2 = e2, b0, b1, b2 and x0 plotted in the
plane.]

(a) By computation,

        a1 − a0 = (1, −1) − (1, 2) = (0, −3) = 0e1 − 3e2,
        a2 − a0 = (0, 1) − (1, 2) = (−1, −1) = −e1 − e2, and
        a0 = T1(0) = (1, 2) = e1 + 2e2.

    Then T1 has the equation

        T1(x) = (1, 2) + x[0 −3; −1 −1],

    or in coordinate form, if x = (x1, x2) and y = T1(x) = (y1, y2),

        y1 = 1 − x2,
        y2 = 2 − 3x1 − x2.

    Similarly,

        b1 − b0 = (3, −4) − (−2, −3) = (5, −1),
        b2 − b0 = (−5, 1) − (−2, −3) = (−3, 4) and
        b0 = T2(0) = (−2, −3).

    Then T2 has the equation

        T2(x) = (−2, −3) + x[5 −1; −3 4], or
        y1 = −2 + 5x1 − 3x2,
        y2 = −3 − x1 + 4x2.

    Suppose T(x) = b0 + f(x − a0), where f is a linear isomorphism. Thus

        f(a1 − a0) = f(0, −3) = 0f(e1) − 3f(e2) = −3f(e2)
                   = b1 − b0 = (5, −1)
        ⇒ f(e2) = (−5/3, 1/3);
        f(a2 − a0) = f(−1, −1) = −f(e1) − f(e2) = b2 − b0 = (−3, 4)
        ⇒ f(e1) = −f(e2) − (−3, 4) = (5/3, −1/3) − (−3, 4)
                = (14/3, −13/3).

    Therefore,

        T(x) = (−2, −3) + [x − (1, 2)][14/3 −13/3; −5/3 1/3], or
        y1 = −2 + (14/3)(x1 − 1) − (5/3)(x2 − 2) = −10/3 + (14/3)x1 − (5/3)x2,
        y2 = −3 − (13/3)(x1 − 1) + (1/3)(x2 − 2) = 2/3 − (13/3)x1 + (1/3)x2.

(b) T1⁻¹ has the equation

        x1 = (1/3)(1 + y1 − y2), x2 = 1 − y1.

    Substituting these into the equation for T2 (and changing y1, y2 on
    the right side back to x1, x2 respectively), we have

        y1 = −2 + 5·(1/3)(1 + x1 − x2) − 3(1 − x1) = −10/3 + (14/3)x1 − (5/3)x2,
        y2 = −3 − (1/3)(1 + x1 − x2) + 4(1 − x1) = 2/3 − (13/3)x1 + (1/3)x2,

    which is the equation for T. Hence T = T2 ◦ T1⁻¹. Readers are urged to
    verify this by matrix computation.
(c) By (2.8.15),

        T(x0) = T(−2, −2)
              = (−2, −3) + [(−2, −2) − (1, 2)][14/3 −13/3; −5/3 1/3]
              = (−2, −3) + (−3, −4)[14/3 −13/3; −5/3 1/3]
              = (−2, −3) + (1/3)(−22, 35) = (−28/3, 26/3),

    and then

        f2(x) = T(x0) − x0 + x = (−28/3, 26/3) − (−2, −2) + x
              = (−22/3, 32/3) + x,
        f1(x) = f2⁻¹(T(x)) = x0 − T(x0) + T(x)
              = (−2, −2) + (x − (−2, −2))[14/3 −13/3; −5/3 1/3].

    Hence, f1 is an affine transformation keeping (−2, −2) fixed, and f2
    is a translation. Of course, T = f2 ◦ f1 holds.
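Part (b) can also be confirmed numerically. A minimal sketch, assuming
Python with numpy, writing each map as a pair (constant, matrix) in the
row-vector convention:

    import numpy as np

    c1, M1 = np.array([1.0, 2.0]),   np.array([[0.0, -3.0], [-1.0, -1.0]])  # T1
    c2, M2 = np.array([-2.0, -3.0]), np.array([[5.0, -1.0], [-3.0, 4.0]])   # T2

    M1inv = np.linalg.inv(M1)
    M = M1inv @ M2                     # linear part of T2 o T1^{-1}
    c = c2 - (c1 @ M1inv) @ M2         # constant part of T2 o T1^{-1}
    assert np.allclose(M, np.array([[14.0, -13.0], [-5.0, 1.0]]) / 3.0)
    assert np.allclose(c, np.array([-10.0, 2.0]) / 3.0)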

Exercises

<A>

1. In R², let

       a0 = (−1, 2), a1 = (5, −3), a2 = (2, 1);
       b0 = (−3, −2), b1 = (4, −1), b2 = (−2, 6).

  
   (a) Show that B = {a0, a1, a2} and C = {b0, b1, b2} are affine bases
       for R².
   (b) Find the affine transformation T mapping ai onto bi, i = 0, 1, 2.
       Express T in matrix forms with respect to B and C, and with
       respect to the natural affine basis N, respectively.
   (c) Suppose T′ is the affine transformation with matrix representation

           T′(x1, x2) = −(1/2, 5/4) + (x1 x2)[2 −1; 5 6]

       with respect to the natural basis N. Find the equation of T′ with
       respect to B and C.
2. A nonempty subset S of a group (see Sec. A.4) is called a subgroup if
   x, y ∈ S implies x ◦ y ∈ S and x ∈ S implies x⁻¹ ∈ S.
   (a) Show that

           S+ = {A ∈ GL(2; R) | the determinant det A > 0}

       is a subgroup of GL(2; R).
   (b) Show that

           S′ = {A ∈ GL(2; R) | |det A| = 1}

       is a subgroup of GL(2; R) and

           S1 = {A ∈ S′ | det A = 1}

       is again a subgroup of S′ and, of course, a subgroup of S+ in (a).
       But the set S−1 = {A ∈ S′ | det A = −1} is not a subgroup of S′.
3. Let B = {a1, a2} be a basis for the vector space R² and S be a
   subgroup of GL(2; R).
   (a) For each A ∈ S, show that {a1A, a2A} forms a basis for R². Denote
       the set of all such bases by

           S(B) = {{a1A, a2A} | A ∈ S},

       which is said to be generated by the basis B with respect to the
       subgroup S.
   (b) Suppose a basis C = {b1, b2} ∈ S(B). Show that S(B) = S(C).

 
4. Two bases B = {a1, a2} and C = {b1, b2} for R² are said to belong to
   the same class if there exists an A ∈ S+ (defined in Ex. 2(a)) such
   that bi = aiA or ai = biA, i = 1, 2. Thus, all the bases for R² are
   divided, with respect to the subgroup S+, into two classes. Two bases
   are said to have the same orientation if they belong to the same
   class, and opposite orientations otherwise. The bases of one of the
   two classes are said to be positively oriented or right-handed; the
   bases of the other class are then said to be negatively oriented or
   left-handed. R² together with a definite class of bases is said to be
   oriented.
   (Note This acts as a formal definition for the so-called anticlockwise
   direction and clockwise direction mentioned occasionally in the text,
   say in Fig. 2.3 and (2.7.9).)
5. Subgroup of affine transformations keeping a point fixed
   Let x0 ∈ R² be a point. An affine transformation of the form

       T(x) = x0 + (x − x0)A,

   where A ∈ GL(2; R), keeps x0 fixed, i.e.

       T(x0) = x0.

   See Fig. 2.94. All such transformations form a subgroup of Ga(2; R),
   which is group isomorphic to

       { [A 0; 0 1] | A ∈ GL(2; R) }

   if the matrix representation of T with respect to an affine basis with
   x0 as base point is used.

   [Fig. 2.94: the points x0, x, the vector x − x0 and the image
   x0 + (x − x0)A.]

   (Note This is the reason why the real general linear group GL(2; R)
   can be treated as a subgroup of the affine group Ga(2; R).)
6. Subgroup of enlargement affine transformations keeping a point fixed
   The affine transformation

       T(x) = x0 + α(x − x0), α ∈ R and α ≠ 0,

   is an enlargement with scale α, keeping x0 fixed. See Fig. 2.95. All
   of them form a group which is a subgroup of the group mentioned in
   Ex. 5 and is group isomorphic to the group

       { [αI2 0; 0 1] | α ∈ R and α ≠ 0 }.

   [Fig. 2.95: the points x0, x, the vector x − x0 and the image
   x0 + α(x − x0).]

7. Subgroup of translations
   The set of translations

       T(x) = x0 + x, x0 ∈ R²,

   forms a subgroup of Ga(2; R). See Fig. 2.96. This subgroup is group
   isomorphic to the group

       { [I2 0; x0 1] | x0 ∈ R² }.

   [Fig. 2.96: the translation x → x0 + x.]

8. Subgroup of similarity transformations
   Let y0 be a fixed point in R². The affine transformation

       T(x) = y0 + α(x − x0)
            = (y0 − x0) + x0 + α(x − x0)

   is the composite mapping of an enlargement followed by a translation
   and is called a similarity transformation or mapping. See Fig. 2.97.
   They form a group which is group isomorphic to the group

       { [αI2 0; x0 1] | α ∈ R and α ≠ 0, x0 ∈ R² with y0 fixed }.

   [Fig. 2.97: the points x0, y0, the vector y0 − x0 and the image
   y0 + α(x − x0).]

9. Constructive linear algebra
   Adopt the Cartesian coordinate system N = {e1, e2} on R². Is it
   possible to construct an affine transformation mapping the triangle
   Δa1a2a3, with a1 = (3, −1), a2 = (−2, 2) and a3 = (−4, −3), onto the
   triangle Δb1b2b3 with b1 = (−1, 1), b2 = (1, 2) and b3 = (2, −3)? If
   yes, how? (2.8.16) guarantees the possibility, and offers a way to
   find it. Construct the side vectors

       x1 = (−2, 2) − (3, −1) = (−5, 3),
       x2 = (−4, −3) − (3, −1) = (−7, −2);
       y1 = (1, 2) − (−1, 1) = (2, 1),
       y2 = (2, −3) − (−1, 1) = (3, −4).

   Let the linear isomorphism f: R² → R² be defined as

       f(xi) = yi, i = 1, 2
       ⇒ f(x1) = f(−5e1 + 3e2) = −5f(e1) + 3f(e2) = (2, 1),
         f(x2) = f(−7e1 − 2e2) = −7f(e1) − 2f(e2) = (3, −4)
       ⇒ [−5 3; −7 −2][f(e1); f(e2)] = [2 1; 3 −4]
       ⇒ [f(e1); f(e2)] = [−5 3; −7 −2]⁻¹[2 1; 3 −4]
                        = (1/31)[−2 −3; 7 −5][2 1; 3 −4]
                        = (1/31)[−13 10; −1 27].
   Hence, a required affine transformation is

       f(x) = (−1, 1) + (x − (3, −1))·(1/31)[−13 10; −1 27],

   which is characterized by assigning the ordered vertices a1, a2, a3 to
   the respective ordered vertices b1, b2, b3. See Fig. 2.98.

   [Fig. 2.98: Δa1a2a3 is translated by x → x − a1, mapped by
   y = f(x − a1), and translated by w = y + b1 onto Δb1b2b3.]

   If the ordered vertices a1, a2, a3 are mapped instead onto the ordered
   vertices b1, b3, b2, i.e. x1 → y2 and x2 → y1, try to see why the
   required one is

       f(x) = (−1, 1) + (x − (3, −1))·(1/31)[−2 −3; 7 −5][0 1; 1 0][2 1; 3 −4]
            = (−1, 1) + (x − (3, −1))·(1/31)[−12 5; 11 −33].
   Do the following problems.
   (a) Find all possible affine transformations mapping Δa1a2a3 onto
       Δb1b2b3, where Δa1a2a3 and Δb1b2b3 are as in Fig. 2.98.
   (b) Do the same problem as (a), but for arbitrary triangles Δa1a2a3
       and Δb1b2b3.
   (c) Does there exist an affine transformation mapping the interior of
       a triangle onto the exterior of another triangle?
   (d) Find all possible affine transformations mapping a parallelogram
       a1a2a3a4 onto another parallelogram b1b2b3b4.
10. Let f: R² → R² be a linear transformation whose kernel Ker(f) is a
    nonzero proper subspace of R².
    (a) For each x ∈ R², show that the preimage of f(x) is the affine
        subspace

            f⁻¹(f(x)) = x + Ker(f).

    (b) Show that the quotient set

            R²/Ker(f) = {f⁻¹(f(x)) | x ∈ R²}

        is a vector space under (x + Ker(f)) + (y + Ker(f)) =
        (x + y) + Ker(f) and α(x + Ker(f)) = αx + Ker(f), and that it is
        isomorphic to Im(f). R²/Ker(f) is called the quotient space of R²
        modulo Ker(f). Refer to Sec. B.1.
    (c) Try to define R²/Im(f) and prove that it is isomorphic to Ker(f).
11. Prove (2.8.11) in detail.

<C> Abstraction and generalization

Results obtained here can be generalized almost verbatim to the abstract
n-dimensional affine space V over a field F. Try your best!

2.8.2 Examples

This subsection will concentrate on how to construct some elementary
affine transformations and their matrix representations in a suitable
basis. It will be beneficial to compare the content with that of
Sec. 2.7.2.

R² is a vector space, and is an affine plane as well. Occasionally, we
need planar Euclidean concepts such as lengths, angles and areas as
learned in high school courses. Please refer to the Introduction and
Natural Inner Product in Part Two, if needed.

An affine transformation T that maps a point (x1, x2) onto a point
(y1, y2) is of the form

    y1 = a11x1 + a21x2 + b1,
    y2 = a12x1 + a22x2 + b2                                       (2.8.18)

with the coefficient determinant

    Δ = det [a11 a12; a21 a22] = a11a22 − a12a21 ≠ 0;

while, in the present matrix notation,

    y = x0 + xA, x0 = (b1, b2) and A = [a11 a12; a21 a22]         (2.8.19)

with det A ≠ 0, where x = (x1, x2), y = (y1, y2) and y = T(x). The latter
is the form we obtained in (2.8.9) or (2.8.10), where B = C could be any
affine basis for R², via the geometric method stated in (2.8.4).
Remark In case

    Δ = det A = 0,                                                (2.8.20)

the associated transformation is called a singular affine transformation,
and otherwise nonsingular. The affine transformations throughout the text
will always mean nonsingular ones unless specified otherwise.

The following are important special cases of affine transformations.

Case 1 Translation
Fix a plane vector x0 (it could be the zero vector 0) and move every point
x in R² along the direction x0. The resulting motion

    T(x) = x0 + x,                                                (2.8.21)

with A = I2, is called a translation of R² along x0. See Fig. 2.96.
A translation preserves all the geometric mapping properties stated in
(1) of (2.7.9). Every line parallel to x0 is an invariant line.
The set of all translations constitutes a subgroup of Ga(2; R). Refer to
Ex. <A> 7 of Sec. 2.8.1.
Case 2 Reflection
Take two intersecting but non-coincident straight lines OA1 and OA2 in
R², with O as the point of intersection. For any point X ∈ R², draw a
line through X, parallel to OA2, such that the line intersects OA1 at the
point P. Extend XP to X′ such that XP = PX′. See Fig. 2.99(a). Then, the
mapping T: R² → R² defined by

    T(X) = X′

is an affine transformation, which can be easily verified by using (1) in
(2.8.4), and is called the (skew) reflection along the direction OA2 of
the plane R² with respect to the axis OA1. This reflection keeps each
point on OA1 fixed, and the axis OA1 is then a line of invariant points,
while any line parallel to the direction OA2 is an invariant line.
To linearize T, let the points O, A1 and A2 have respective coordinates
a0, a1 and a2 in the Cartesian coordinate system N = {e1, e2}. Then
B = {a0, a1, a2} is an affine basis for R². In terms of B,

    [T(x)]B = [x]B[T]B, [T]B = [1 0; 0 −1], x ∈ R².               (2.8.22)

Notice that [T(a0)]B = [a0]B = 0. What is the equation of T in
N = {0, e1, e2}? There are two ways to obtain this equation.

[Fig. 2.99: (a) the reflection X → X′ = T(X) along OA2 with axis OA1;
(b) the same data displaced to the origin: a1 − a0, a2 − a0 and x − a0.]

One way is to adopt the change of coordinates in (2.4.2). Now

    [x]B = [0]B + [x]N A^N_B,
    [T(x)]B = [0]B + [T(x)]N A^N_B
⇒ T(x) = [T(x)]N = (−[0]B + [T(x)]B)A^B_N
        = −[0]B A^B_N + {[0]B + [x]N A^N_B}[T]B A^B_N
        = −[0]B A^B_N + [0]B[T]B A^B_N + [x]N A^N_B [T]B A^B_N.

But [a0]B = [0]B + [a0]N A^N_B = 0 implies that

    −[0]B A^B_N = [a0]N A^N_B A^B_N = [a0]N = a0.

Therefore,

    T(x) = a0 − [a0]N A^N_B [T]B A^B_N + [x]N A^N_B [T]B A^B_N
         = a0 + (x − a0)A^N_B [T]B A^B_N                          (2.8.23)

is the required equation of T in terms of N.
The other way is to displace the vectors a1 − a0, a2 − a0 and x − a0 to
the origin 0. See Fig. 2.99(b). Then, observe each of the following steps:

    x − a0 (a vector in N = {e1, e2})
⇒ (x − a0)A^N_B = [x]B (the coordinate of x − a0 in
                        B = {a1 − a0, a2 − a0})
⇒ (x − a0)A^N_B [T]B (the reflection in B = {a1 − a0, a2 − a0})
⇒ (x − a0)A^N_B [T]B A^B_N (the coordinate in N = {e1, e2})
⇒ a0 + (x − a0)A^N_B [T]B A^B_N (the reflection in the affine basis
                                 B = {a0, a1, a2}),               (2.8.24)

and this is the required T(x).

We summarize the above results in

The reflection
Let a0, a1 and a2 be three distinct non-collinear points in R². Then the
(skew) reflection T along the direction a2 − a0 of R², with respect to
the axis a0 + ⟨⟨a1 − a0⟩⟩, has the following representations:

1. In the affine basis B = {a0, a1, a2},

       [T(x)]B = [x]B[T]B, where [T]B = [1 0; 0 −1].

2. In the Cartesian coordinate system N = {0, e1, e2},

       T(x) = a0 + (x − a0)A⁻¹[T]B A, where A = [a1 − a0; a2 − a0].

In case the direction a2 − a0 is perpendicular to the axis
a0 + ⟨⟨a1 − a0⟩⟩, T is called an orthogonal reflection or simply a
symmetric motion with respect to the axis a0 + ⟨⟨a1 − a0⟩⟩.

(1) A skew reflection preserves all the properties listed in (1) of
    (2.7.9) and also preserves
    (c) area.
(2) An orthogonal reflection, in addition to (1), also preserves
    (a) length,
    (b) angle,
    but reverses the direction (i.e. orientation).                (2.8.25)

Notice that the affine transformation T mentioned in (2.8.25) can be
reduced to the form (the matrix A⁻¹[T]B A there is now denoted as A below)

    T(x) = x0 + xA, A = [a11 a12; a21 a22],                       (2.8.26)

where det A ≠ 0. So we pose a converse problem: for a given affine
transformation, how can we determine whether it is a reflection and, if
it is, how do we determine its direction and its line of invariant
points? The following are the steps:

1. Compute the eigenvalues of A. If A has eigenvalues 1 and −1, then T
   represents a reflection in case x(I2 − A) = x0 has a solution.
2. Compute an eigenvector v1 corresponding to 1; then x(I2 − A) = x0, or

       (1/2)(0 + T(0)) + ⟨⟨v1⟩⟩ = (1/2)x0 + ⟨⟨v1⟩⟩,

   is the line of invariant points of the reflection T.
3. Compute an eigenvector v2 corresponding to −1; then this v2 or −v2 is
   the direction of the reflection T. In fact, x0 is a required direction
   (why?) if x0 ≠ 0.                                              (2.8.27)

Example 1 Let a0 = (2, 2), a1 = (4, 1) and a2 = (1, 3). Determine the
reflection along the direction a2 − a0 = (−1, 1) with a0 + ⟨⟨a1 − a0⟩⟩ =
a0 + ⟨⟨(2, −1)⟩⟩ as the line of invariant points.

Solution In the affine basis B = {a0, a1, a2}, the affine transformation
T has the representation

    [T(x)]B = [x]B[1 0; 0 −1],

while, in N = {0, e1, e2},

    T(x) = (2, 2) + [x − (2, 2)][2 −1; −1 1]⁻¹[1 0; 0 −1][2 −1; −1 1]
         = (2, 2) + [x − (2, 2)][3 −2; 4 −3],

which can be simplified as

    T(x) = x0 + xA, where x0 = (−12, 12) and A = [3 −2; 4 −3].

Now suppose, conversely, that an affine transformation on R² is given as
above. We try to determine whether it is a reflection.
Follow the steps in (2.8.27). By computing

    det(A − tI2) = det [3 − t  −2; 4  −3 − t] = t² − 9 + 8 = t² − 1,

A has eigenvalues 1 and −1. Solve

    (x1 x2)[3 − 1  −2; 4  −3 − 1] = (0 0)

and get a corresponding eigenvector v1 = (2, −1). Then the line

    (1/2)x0 + ⟨⟨v1⟩⟩ = (1/2)(−12, 12) + ⟨⟨(2, −1)⟩⟩ = (−6, 6) + ⟨⟨(2, −1)⟩⟩

is the line of invariant points. Note that this line coincides with the
line a0 + ⟨⟨(2, −1)⟩⟩, whose equation in N = {0, e1, e2} is x1 + 2x2 = 6.
On the other hand, solve

    (x1 x2)[3 + 1  −2; 4  −3 + 1] = (0 0)

and get a corresponding eigenvector v2 = (−1, 1). Note that x0 = (−12, 12)
is an eigenvector too. Either v2 or x0, or any other corresponding
eigenvector, can be treated as the direction of the reflection. Readers
are urged to draw a graph to explain everything above.
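The eigenvalue computations of Example 1 can be double-checked with a few
lines of code. A minimal sketch, assuming Python with numpy; note that the
text works with row vectors, so the relevant eigenvectors of A are the
left ones, i.e. the ordinary eigenvectors of the transpose of A.

    import numpy as np

    A  = np.array([[3.0, -2.0], [4.0, -3.0]])
    x0 = np.array([-12.0, 12.0])

    t, V = np.linalg.eig(A.T)            # columns of V: left eigenvectors of A
    assert np.allclose(sorted(t), [-1.0, 1.0])

    p = np.array([6.0, 0.0])             # a point on x1 + 2 x2 = 6
    assert np.allclose(x0 + p @ A, p)    # p lies on the line of invariant points
    assert np.allclose(x0 @ A, -x0)      # x0 is a direction (eigenvalue -1)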
Case 3 Simple elongation and compression (strain or stretching)
Let OA1 and OA2 be two non-coincident straight lines which intersect at
the point O. Take any fixed scalar k ∈ R (k could be positive or
negative). For any point X in R², draw a line through X parallel to OA2
so that the line intersects the line OA1 at the point P. Pick the unique
point X′ on the line XP such that X′P = kXP. See Fig. 2.100. The mapping
T: R² → R² defined by

    T(X) = X′

[Fig. 2.100: the point X, its foot P on OA1, and the images X′ for k > 0
and for k < 0.]

is an affine transformation and is called a simple elongation and
compression, or strain, or one-way stretch in the direction OA2 of R²
with the line OA1 as axis; k is called the corresponding scale factor. A
one-way stretch has the following features:

1. there is a line of invariant points (i.e. the axis),
2. all other points move on lines parallel to the direction and there is
   a stretch in this direction with scale factor k, and
3. all the lines parallel to the direction are invariant lines.

Notice that the case k = −1 is the reflection mentioned in Case 2.
The processes used to obtain (2.8.23), (2.8.24) and Fig. 2.99(b) are
universal and thus are still suitable for stretchings. Hence, we have
results corresponding to (2.8.25) for

The one-way stretching
Let a0, a1 and a2 be three distinct non-collinear points in R². The
one-way (skew) stretching T in the direction a2 − a0, with
a0 + ⟨⟨a1 − a0⟩⟩ as axis and scale factor k ≠ 0, has the following
representations:

1. In the affine basis B = {a0, a1, a2},

       [T(x)]B = [x]B[T]B, where [T]B = [1 0; 0 k].

2. In the Cartesian coordinate system N = {0, e1, e2},

       T(x) = a0 + (x − a0)A⁻¹[T]B A, where A = [a1 − a0; a2 − a0].

In case a2 − a0 is perpendicular to a1 − a0, T is called an orthogonal
one-way stretch with axis a0 + ⟨⟨a1 − a0⟩⟩.

(1) A one-way stretch preserves all the properties listed in (1) of
    (2.7.9),
(2) but it enlarges the area by the scale factor |k|, and it preserves
    the orientation if k > 0 and reverses the orientation if k < 0.
                                                                  (2.8.28)

(2.8.26) and (2.8.27) are still good for a one-way stretch, of course,
subject to some minor changes.
To test whether the affine transformation (2.8.26) is a one-way stretch,
follow these steps:

1. Compute the eigenvalues of A. If A has eigenvalues 1 and k ≠ 1, then T
   represents a one-way stretch if x(I2 − A) = x0 has a solution.
2. Compute an eigenvector v1 corresponding to 1; then x(I2 − A) = x0, or

       (1/(1 − k))x0 + ⟨⟨v1⟩⟩,

   is the line of invariant points (i.e. the axis).
3. Compute an eigenvector v2 corresponding to k; then v2 is the direction
   of the stretch. In particular,

       (1/(1 − k))x0 or x0

   is a direction (up to a nonzero scalar) if x0 ≠ 0.             (2.8.29)
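These steps translate directly into code. A minimal sketch, assuming
Python with numpy, applied in advance to the k = 2 data of Example 2
below:

    import numpy as np

    A  = np.array([[11.0, 1.0], [5.0, 7.0]]) / 6.0
    x0 = np.array([5.0, 1.0]) / 6.0

    t = np.linalg.eigvals(A)
    k = t[np.argmax(np.abs(t - 1.0))]    # the eigenvalue different from 1
    assert np.allclose(sorted(t), [1.0, 2.0])

    p = x0 / (1.0 - k)                   # a point on the axis, by step 2
    assert np.allclose(x0 + p @ A, p)    # p is an invariant point of T
    assert np.allclose(x0 @ A, k * x0)   # x0 is the stretch direction (step 3)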

Example 2 Let a0 = (−2, 1), a1 = (1, −2) and a2 = (3, 2). Determine the
one-way stretch in the direction a2 − a0 = (5, 1) with a0 + ⟨⟨a1 − a0⟩⟩ =
a0 + ⟨⟨(3, −3)⟩⟩ as axis and respective scale factors k = 2 and k = −3.

Solution In the affine basis B = {a0, a1, a2}, the one-way stretch T is

    [T(x)]B = [x]B[1 0; 0 k],

where k = 2 or −3, while in N = {0, e1, e2},

    T(x) = (−2, 1) + [x − (−2, 1)][3 −3; 5 1]⁻¹[1 0; 0 k][3 −3; 5 1]
         = (−2, 1) + [x − (−2, 1)]·(1/18)[3 + 15k  −3 + 3k; −15 + 15k  15 + 3k],

which can be simplified as

    T(x) = x0 + xA,

where

    x0 = (−2, 1) − (−2, 1)·(1/6)[1 + 5k  −1 + k; −5 + 5k  5 + k]
       = (1/6)(5, 1)    if k = 2,
       = −(2/3)(5, 1)   if k = −3;

    A = (1/6)[11 1; 5 7]      if k = 2;
    A = (1/3)[−7 −2; −10 1]   if k = −3.

Now, we consider the converse problems separately.
Suppose we have an affine transformation

    T(x) = (1/6)(5, 1) + xA, where A = (1/6)[11 1; 5 7],

and we want to decide whether this T is a one-way stretch and, if yes,
where its axis is and in what direction it stretches. Follow the steps
suggested in (2.8.29) and proceed as follows.

1. A has characteristic polynomial

       det(A − tI2) = det [11/6 − t  1/6; 5/6  7/6 − t] = t² − 3t + 2
                    = (t − 1)(t − 2).

   Hence A has two distinct eigenvalues 1 and 2, and T is a one-way
   stretch with scale factor k = 2.
2. Solve

       x(A − I2) = (x1 x2)[5/6 1/6; 5/6 1/6] = (0 0)

   and get the eigenvectors v = t(1, −1) for t ∈ R and t ≠ 0. Since
   T(0) = (1/6)(5, 1), the line

       −(1/6)(5, 1) + ⟨⟨(1, −1)⟩⟩ = (−2, 1) + ⟨⟨(3, −3)⟩⟩

   is the axis, as expected.
3. Solve

       x(A − 2I2) = (x1 x2)[−1/6 1/6; 5/6 −5/6] = (0 0)

   and get the eigenvectors u = t(5, 1) for t ∈ R and t ≠ 0. Then any
   such u and, in particular, x0 = (1/6)(5, 1), is the direction of T.

Next, we pose the following questions:

Q1. What is the image of the line connecting (0, 3) and (1, 0) under T?
    Where do these two lines intersect?
Q2. What is the image of the triangle Δb1b2b3, where b1 = (0, 3),
    b2 = (−4, 0) and b3 = (1, −1), under T? What are the areas of these
    two triangles?

For Q1, compute the image points of (0, 3) and (1, 0) as

    T(0, 3) = (1/6)(5, 1) + (0, 3)·(1/6)[11 1; 5 7] = (1/3)(10, 11),
    T(1, 0) = (1/6)(5, 1) + (1, 0)·(1/6)[11 1; 5 7] = (1/3)(8, 1).

Thus, the image of the original line 3x1 + x2 = 3 has the equation, in N,

    5x1 − x2 − 13 = 0.

These two lines intersect at the point (2, −3), which lies on the axis
(−2, 1) + ⟨⟨(3, −3)⟩⟩. Is this fact accidental, or universally true for
any line and its image line under T?

For Q2, compute

    T(b2) = T(−4, 0) = (1/6)(5, 1) + (−4, 0)·(1/6)[11 1; 5 7]
          = (1/2)(−13, −1) = b2′,
    T(b3) = T(1, −1) = (1/6)(5, 1) + (1, −1)·(1/6)[11 1; 5 7]
          = (1/6)(11, −5) = b3′.

Let b1′ = T(b1) = (1/3)(10, 11). Then the image of Δb1b2b3 is the
triangle Δb1′b2′b3′. Note that both Δb1b2b3 and Δb1′b2′b3′ have the same
orientation. As far as triangle areas are concerned, we have the
following results (refer to Sec. 4.3, Ex. <A> of Sec. 2.6, or (2.8.44),
if necessary):

    Δb1b2b3 has area (1/2)|det [b2 − b1; b3 − b1]|
        = (1/2)|det [−4 −3; 1 −4]| = 19/2;
    Δb1′b2′b3′ has area (1/2)|det [b2′ − b1′; b3′ − b1′]|
        = (1/2)·(1/36)|det [−59 −25; −9 −27]| = 1368/72 = 19.

⇒ (the area of Δb1′b2′b3′)/(the area of Δb1b2b3) = 2 = det A.

See Fig. 2.101.
[Fig. 2.101: the axis through a0, the triangle Δb1b2b3 and its image
Δb1′b2′b3′ under the stretch; corresponding lines meet at (2, −3) on the
axis.]

On the other hand, suppose the affine transformation, for k = −3, is

    T(x) = −(2/3)(5, 1) + xB, where B = (1/3)[−7 −2; −10 1].

We want to consider the same problems as in the case k = 2.

1. The characteristic polynomial is

       det(B − tI2) = det [−7/3 − t  −2/3; −10/3  1/3 − t] = t² + 2t − 3
                    = (t + 3)(t − 1).

   Hence, B has eigenvalues 1 and −3, and the corresponding T is a
   one-way stretch with scale factor −3.
2. Solve

       x(B − I2) = (x1 x2)[−10/3 −2/3; −10/3 −2/3] = (0 0)

   and get the eigenvectors v = t(1, −1) for t ∈ R and t ≠ 0. The axis of
   the stretch is

       (1/(1 − (−3)))·(−2/3)(5, 1) + ⟨⟨(1, −1)⟩⟩
           = −(1/6)(5, 1) + ⟨⟨(1, −1)⟩⟩
           = (−2, 1) + ⟨⟨(3, −3)⟩⟩.

3. Solve

       x(B + 3I2) = (x1 x2)[2/3 −2/3; −10/3 10/3] = (0 0)

   and get the eigenvectors u = t(5, 1) for t ∈ R and t ≠ 0. Any such
   eigenvector can be considered as the direction of the stretch.

Let Q1 and Q2 be as above. Then:

Q1. Compute

        T(0, 3) = −(2/3)(5, 1) + (0, 3)·(1/3)[−7 −2; −10 1] = (1/3)(−40, 1),
        T(1, 0) = −(2/3)(5, 1) + (1, 0)·(1/3)[−7 −2; −10 1] = (1/3)(−17, −4).

    Hence, the image of the original line 3x1 + x2 = 3 is, in N,

        15x1 + 69x2 + 177 = 0.

    The two lines intersect at the point (2, −3).

Q2. It is known that T(b1) = b1′ = (1/3)(−40, 1). Compute

        T(b2) = −(2/3)(5, 1) + (−4, 0)·(1/3)[−7 −2; −10 1] = (6, 2) = b2′,
        T(b3) = −(2/3)(5, 1) + (1, −1)·(1/3)[−7 −2; −10 1]
              = (1/3)(−7, −5) = b3′.

    Then Δb1′b2′b3′ has the signed area

        (1/2) det [b2′ − b1′; b3′ − b1′] = (1/2)·(1/9) det [58 5; 33 −6]
                                         = −57/2

    ⇒ (the signed area of Δb1′b2′b3′)/(the area of Δb1b2b3) = −3 = det B.

Notice that T reverses the orientation of Δb1b2b3. We hope the readers
will be able to give a graphical illustration just like Fig. 2.101.
Case 4 Two-way stretch
This is a combination of two one-way stretches whose lines of invariant
points intersect at one point, the only invariant point. There are no
invariant lines at all if the scale factors are distinct.
As an easy consequence of (2.8.28), we have

The two-way stretch
Let a0, a1 and a2 be three distinct non-collinear points in R². The
two-way stretch T, with a0 as the only invariant point, which has scale
factor k1 along a1 − a0 and scale factor k2 along a2 − a0, has the
following representations:

1. In the affine basis B = {a0, a1, a2},

       [T(x)]B = [x]B[T]B,
       where [T]B = [k1 0; 0 k2] = [k1 0; 0 1][1 0; 0 k2],

   and where k1 ≠ k2 and k1k2 ≠ 0.
2. In N = {0, e1, e2},

       T(x) = a0 + (x − a0)A⁻¹[T]B A, where A = [a1 − a0; a2 − a0].

In case a1 − a0 is perpendicular to a2 − a0, T is called an orthogonal
two-way stretch.

(1) A two-way stretch preserves all the properties listed in (1) of
    (2.7.9),
(2) but it enlarges the area by the scale factor |k1k2|, and it preserves
    the orientation if k1k2 > 0 and reverses it if k1k2 < 0.      (2.8.30)
If k1 = k2 = k ≠ 0, 1, then

    T(x) = a0 + k(x − a0)                                         (2.8.31)

is called an enlargement with scale factor k (refer to Ex. <A> 6 of
Sec. 2.8.1). See Fig. 2.102.

[Fig. 2.102: the images T(x) of a point x under two-way stretches with
the four sign combinations of k1 and k2.]

(2.8.29) has a counterpart for a two-way stretch. The details are left to
the readers. We go directly to an example.

Example 3 Let a0 = (1, 1), a1 = (2, 2) and a2 = (3, 0). Determine the
two-way stretch T, with a0 as the only invariant point, which has scale
factor 3 along a1 − a0 and scale factor −2 along a2 − a0.

Solution In the basis B = {a0, a1, a2},

    [T(x)]B = [x]B[T]B, where [T]B = [3 0; 0 −2].

In N = {0, e1, e2},

    T(x) = (1, 1) + [x − (1, 1)][1 1; 2 −1]⁻¹[3 0; 0 −2][1 1; 2 −1]
         = (1, 1) + [x − (1, 1)]·(−1/3)[1 −5; −10 −4]
         = (−2, −2) + xA, where A = −(1/3)[1 −5; −10 −4].

Conversely, if T is given as above, how can one determine if it is a


stretch? And, if yes, where are the directions of stretching, and what are
the scale factors?
Watch the following steps.

1. The characteristic polynomial is


 
− 1 − t 5 
 
det(A − tI2 ) =  310 3
 = t2 − t − 6 = (t − 3)(t + 2).
 4
− t
3 3

Hence, A has eigenvalues 3 and −2, and the associated T is a two-way


stretch.
2. Solve
 
− 10 5
x (A − 3I2 ) = (x1 x2 ) 10
 3 3
= (0 0)
3 − 53

v1 = t(1, 1)t ∈ R and t = 0. Any such 


and get the eigenvectors  v1 is the
direction of one-way stretch with scale factor 3.
3. Solve
 
5 5
 3 3
x (A + 2I2 ) = (
x1 
x2 ) 10 10
= (0 0)
3 3

and get the eigenvectors v2 = t(2, −1) for t ∈ R and t = 0. Then, any
v2 is another direction of stretch with scale factor −2.
such vector 
4. To find the only invariant point, solve

x = (−2, −2) + 

xA
x (A − I2 ) = (2, 2)
⇒
  
−1 1
1
− 53
⇒ x = (2 2)(A − I2 )

= (2 2) · − 3
= (1 1)
6 − 16
3 − 43

and the point (1, 1) is the required one.

The square with vertices a0 = (1, 1), (−1, 1), (−1, −1) and (1, −1), in
counterclockwise order, is mapped onto the parallelogram with vertices
a0 = (1, 1), T(−1, 1) = (1/3)(5, −7), T(−1, −1) = (−5, −5) and
T(1, −1) = −(1/3)(17, 5), in clockwise order. See Fig. 2.103.

[Fig. 2.103: the square with vertices a0, (−1, 1), (−1, −1), (1, −1) and
its image parallelogram under T.]

Notice that the image parallelogram has

    signed area = det [T(−1, 1) − a0; T(1, −1) − a0]
                = det [2/3 −10/3; −20/3 −8/3] = −216/9 = −24

⇒ (the signed area of the parallelogram)/(the area of the square)
    = −24/4 = −6 = det A.
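A minimal numerical companion to Example 3, assuming Python with numpy:

    import numpy as np

    A  = -np.array([[1.0, -5.0], [-10.0, -4.0]]) / 3.0
    x0 = np.array([-2.0, -2.0])

    t = np.linalg.eigvals(A)
    assert np.allclose(sorted(t), [-2.0, 3.0])     # the two scale factors

    p = x0 @ np.linalg.inv(np.eye(2) - A)          # the only invariant point
    assert np.allclose(p, [1.0, 1.0])
    assert np.isclose(np.linalg.det(A), -6.0)      # the signed area ratio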

Case 5 Shearing (Euclidean notions are needed)
Fix a straight line OA in the plane R² and pick a line segment PQ on it.
With PQ as one side, construct a rectangle PQRS and then push the
rectangle PQRS in the direction parallel to OA, keeping the points on OA
fixed and the height of each point above OA invariant, to construct a new
parallelogram PQR′S′. See Fig. 2.104(a). Of course, these two figures
have the same area.

[Fig. 2.104: (a) the rectangle PQRS pushed to PQR′S′ along OA; (b) the
points X, X′, Y, Y′ and T used below.]

What we care about is how to determine the image Y′ of a given point Y in
the rectangle PQRS under this mapping. Suppose Y does not lie on the line
OA. Draw any line through Y which intersects PQ at T and RS at X. See
Fig. 2.104(b). The image X′ of X must satisfy XX′ = SS′ = RR′. Draw a
line through Y parallel to OA so that it intersects TX′ at the point Y′.
This Y′ is then the required image. Now

    ΔTXX′ is similar to ΔTYY′
⇒ XX′/TX = YY′/TY
⇒ XX′/(the distance of X to OA) = YY′/(the distance of Y to OA).

This means that any point Y in the rectangle PQRS moves, parallel to the
line OA, a distance in a fixed direct proportion to its distance from the
line OA.
Suppose every point X in the plane moves in this same manner, i.e.

    (the distance XX′ moved from X to X′ parallel to OA)
    / (the perpendicular distance of X to the line OA) = k.

Then, the mapping T: R² → R² defined as

    T(X) = X′

is an affine transformation and is called a shearing with the line OA as
axis, the only line of invariant points. Any line parallel to OA is
invariant.
To linearize T, take two straight lines OA and OB, perpendicular to each
other, so that OA = OB = 1 in length. Suppose, in N = {0, e1, e2},
O = a0, A = a1 and B = a2. Then B = {a0, a1, a2} is an orthonormal affine
basis for R² and, for x ∈ R²,

    x − a0 = α1(a1 − a0) + α2(a2 − a0), i.e. [x]B = (α1, α2)
⇒ T(x) − a0 = (α1 + kα2)(a1 − a0) + α2(a2 − a0)
⇒ [T(x)]B = (α1 + kα2, α2) = (α1 α2)[1 0; k 1] = [x]B[1 0; k 1].

The aforementioned process for the definition of a shearing indicates
that it preserves the area of a parallelogram and hence of a triangle. To
see this analytically, let x1, x2 and x3 be three non-collinear points.
Then

    xi = a0 + αi1(a1 − a0) + αi2(a2 − a0), i = 1, 2, 3
⇒ T(xi) = a0 + (αi1 + kαi2)(a1 − a0) + αi2(a2 − a0), i = 1, 2, 3
⇒ the signed area of ΔT(x1)T(x2)T(x3)
    = (1/2) det [T(x2) − T(x1); T(x3) − T(x1)]
    = (1/2) det ([(α21 − α11) + k(α22 − α12)   α22 − α12;
                  (α31 − α11) + k(α32 − α12)   α32 − α12][a1 − a0; a2 − a0])
    = (1/2) det ([α21 − α11   α22 − α12;
                  α31 − α11   α32 − α12][a1 − a0; a2 − a0])
      (adding k times the second column to the first does not change the
       determinant)
    = (1/2) det [x2 − x1; x3 − x1]
    = the signed area of Δx1x2x3.

Therefore, a shearing preserves the areas of plane domains bounded by
non-self-intersecting closed polygonal curves, and hence of any
measurable plane set by a limit process.
We summarize (Euclidean concepts are needed):

The shearing
Let a0, a1 and a2 be three distinct points in R² so that |a1 − a0| =
|a2 − a0| = 1 and (a1 − a0) ⊥ (a2 − a0), and let k ≠ 0 be a scalar. Then
the shearing T with axis a0 + ⟨⟨a1 − a0⟩⟩ and coefficient (or scale
factor) k has the following representations.

1. In the affine basis B = {a0, a1, a2},

       [T(x)]B = [x]B[T]B, where [T]B = [1 0; k 1].

2. In N = {0, e1, e2},

       T(x) = a0 + (x − a0)A⁻¹[T]B A, where A = [a1 − a0; a2 − a0].

   Here A is an orthogonal matrix, i.e. A⁻¹ = A*.

See Fig. 2.105.

(1) A shearing preserves all the properties listed in (1) of (2.7.9),
(2) and it also preserves
    (c) area,
    (d) directions (i.e. orientations).                           (2.8.32)

[Fig. 2.105: points x = α1a1 + α2a2 (relative to a0) in the four
quadrants of the basis {a1, a2}, and their images
T(x) = (α1 + kα2)a1 + α2a2.]

For a generalization to skew shearings, see Ex. <A> 17.
As a counterpart of (2.8.27) for a shearing, we list the following steps:

1. Compute the eigenvalues of A. In case A has only the eigenvalue 1,
   with multiplicity 2, and A is not diagonalizable, then the associated
   T is a shearing.
2. The eigenspace

       E = {x ∈ R² | xA = x}

   has dimension one (see (1) 3. in (2.7.73)). Take any eigenvector v of
   unit length and take any a0 as a solution of x(I2 − A) = x0. Then
   a0 + ⟨⟨v⟩⟩, or the line x(I2 − A) = x0 itself, is the axis of the
   shearing.
3. Take any u of unit length and perpendicular to v; then uA − u = kv
   holds and the scalar k is the coefficient.                     (2.8.33)
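These steps can be checked numerically as well. A minimal sketch,
assuming Python with numpy, using the k = 2 matrix of Example 4 below:

    import numpy as np

    A = np.array([[1.0, -18.0], [32.0, 49.0]]) / 25.0

    N = A - np.eye(2)
    assert np.allclose(N @ N, 0.0)           # (A - I)^2 = O: eigenvalue 1 twice
    assert np.linalg.matrix_rank(N) == 1     # so A is not diagonalizable

    v1 = np.array([4.0, 3.0]) / 5.0          # unit eigenvector: v1 A = v1
    u  = np.array([-3.0, 4.0]) / 5.0         # unit vector perpendicular to v1
    k  = (u @ A - u) @ v1                    # uA - u = k v1, by step 3
    assert np.allclose(v1 @ A, v1) and np.isclose(k, 2.0)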

We give an example to illustrate these concepts.

Example 4 Let a0 = (−2, −2) and a1 = (2, 1). Construct the shearings
with axis a0 + ⟨⟨a1 − a0⟩⟩ and coefficients k = 2 and k = −1,
respectively.

Solution a1 − a0 = (4, 3) and its length is |a1 − a0| = 5. Let

    v1 = (1/5)(4, 3),
    v2 = (1/5)(−3, 4).

Then B = {a0, a0 + v1, a0 + v2} is an orthonormal affine basis for R².
In B, the required affine transformation is

    [T(x)]B = [x]B[T]B, where [T]B = [1 0; k 1].

While, in N = {0, e1, e2},

    T(x) = (−2, −2) + [x − (−2, −2)][v1; v2]⁻¹[1 0; k 1][v1; v2]
         = x0 + xA,

where

    A = (1/25)[25 − 12k  −9k; 16k  25 + 12k]
      = (1/25)[1 −18; 32 49]    if k = 2;
      = (1/25)[37 9; −16 13]    if k = −1,

    x0 = (−2, −2) − (−2, −2)A = (−2, −2)(I2 − A)
       = (4/25)(4, 3)    if k = 2;
       = −(2/25)(4, 3)   if k = −1.
Now, we treat the converse problems.
Suppose T(x) = x0 + xA with k = 2. Notice the following steps:

1. The characteristic polynomial is

       det(A − tI2) = det [1/25 − t  −18/25; 32/25  49/25 − t]
                    = t² − 2t + 1 = (t − 1)².

   Hence, A has coincident eigenvalues 1, 1. The corresponding eigenspace
   is

       E = {x ∈ R² | x(A − I2) = 0} = ⟨⟨(4, 3)⟩⟩ = ⟨⟨v1⟩⟩,

   which is one-dimensional, so A is not diagonalizable (see also
   Ex. <A> 7 of Sec. 2.7.6). Therefore, the associated T is a shearing.
2. Solve

       x(I2 − A) = (x1 x2)·(1/25)[24 18; −32 −24] = x0 = (4/25)(4, 3),

   which reduces to 3x1 − 4x2 = 2. Then any point, say (−2, −2), on the
   line 3x1 − 4x2 = 2 is an invariant point. Therefore, (−2, −2) + ⟨⟨v1⟩⟩,
   which is the line 3x1 − 4x2 = 2, is the axis.
3. Take a vector v2 such that |v2| = 1 and v2 ⊥ v1, say v2 = (1/5)(−3, 4).
   Then

       v2A = (1/5)(−3, 4)·(1/25)[1 −18; 32 49] = (1, 2)
   ⇒ v2A − v2 = (1, 2) − (1/5)(−3, 4) = (1/5)(8, 6) = 2v1.

   Hence, k = 2 is the coefficient.

We raise two questions as follows:

Q1. What is the image of the straight line x1 + x2 = 3? Where do the line
    and its image intersect?
Q2. What is the image of the square with vertices (1, 1), (−1, 1),
    (−1, −1) and (1, −1)? What is its area?

For Q1, one way to do this is to rewrite x1 + x2 = 3 as

    x[1; 1] = (xA)·A⁻¹[1; 1] = 3

⇒ (denoting y = (y1, y2) = T(x) temporarily)

    (y − (4/25)(4, 3))·(1/25)[49 18; −32 1][1; 1] = 3

⇒ (changing y1, y2 back to x1, x2 respectively)

    67x1 − 31x2 = 103.

These two lines intersect at (2, 1), which lies on the axis
3x1 − 4x2 = 2. This is absolutely not accidental, but why?

For Q2, by computation,

    T(1, 1) = (4/25)(4, 3) + (1, 1)·(1/25)[1 −18; 32 49] = (1/25)(49, 43),
    T(−1, 1) = (1/25)(47, 79),
    T(−1, −1) = (1/25)(−17, −19),
    T(1, −1) = (1/5)(−3, −11).

These four points form the vertices of a parallelogram whose area is
equal to 4. Do you know why? See Fig. 2.106.

[Fig. 2.106: the square with vertices (±1, ±1) and its image
parallelogram under the shearing with axis through a0 in the
direction v1.]

Suppose T(x) = x0 + xA with k = −1. We leave the corresponding details to
the readers, except Q2. For Q2, by computation,

    T(1, 1) = −(2/25)(4, 3) + (1, 1)·(1/25)[37 9; −16 13] = (1/25)(13, 16),
    T(−1, 1) = (1/25)(−61, −2),
    T(−1, −1) = (1/25)(−29, −28),
    T(1, −1) = (1/25)(45, −10).

They form a parallelogram of area equal to 4. See Fig. 2.107.

[Fig. 2.107: the square with vertices (±1, ±1) and its image
parallelogram under the shearing with k = −1.]

Case 6 Rotation (Euclidean notions are needed)
Take two perpendicular lines OA and OB as axes so that the line segments
OA and OB are of unit length. Let θ be real such that −2π < θ ≤ 2π.

For any point X in the plane, rotate the line segment OX, with center at
O, through an angle θ to reach the line segment OX′. Define a mapping
T: R² → R² by

    T(X) = X′.

This T is an affine transformation and is called the rotation with center
at O of the plane through the angle θ. It is counterclockwise if θ > 0
and clockwise if θ < 0. See Fig. 2.108.

[Fig. 2.108: (a) a counterclockwise rotation X → X′; (b) a clockwise
rotation.]


To linearize T, let O = a0, A = a1 and B = a2 in N = {0, e1, e2}, and let
v1 = a1 − a0 = (α11, α12) and v2 = a2 − a0 = (α21, α22). Then

    |v1| = |v2| = 1 and v1 ⊥ v2.

Now B = {a0, a1, a2} is an affine orthonormal basis for R². For x ∈ R²,
let

    x − a0 = α1v1 + α2v2 or [x]B = (α1, α2),
    T(x) − a0 = β1v1 + β2v2 or [T(x)]B = (β1, β2).

Then Fig. 2.108(a) indicates

    β1 = α1 cos θ − α2 sin θ,
    β2 = α1 sin θ + α2 cos θ

⇒ (β1 β2) = (α1 α2)[cos θ  sin θ; −sin θ  cos θ].

We summarize as

The rotation
Let a0, a1 and a2 be three distinct non-collinear points so that

    |a1 − a0| = |a2 − a0| = 1 (in length), and
    (a1 − a0) ⊥ (a2 − a0) (perpendicularity).

Then the rotation T with center a0 and through the angle θ has the
following representations.

1. In the orthonormal affine basis B = {a0, a1, a2},

       [T(x)]B = [x]B[T]B, where [T]B = [cos θ  sin θ; −sin θ  cos θ].

2. In N = {0, e1, e2},

       T(x) = a0 + (x − a0)P⁻¹[T]B P, where P = [a1 − a0; a2 − a0].

   Here P is an orthogonal matrix, i.e. P* = P⁻¹.

The center a0 is the only invariant point if θ ≠ 0.

(1) A rotation preserves all the properties listed in (1) of (2.7.9),
(2) and it also preserves
    (a) length,
    (b) angle,
    (c) area,
    (d) orientation.                                              (2.8.34)


In N = {0, e1, e2}, to test whether a given affine transformation

    T(x) = x0 + xA

is a rotation, try the following steps.

1. Justify that A is an orthogonal matrix, i.e. A* = A⁻¹, and that
   det A = 1.
2. Put A in the form

       [cos θ  sin θ; −sin θ  cos θ].

   Then θ is the rotation angle.
3. The only solution of xA = x (if θ ≠ 0) is 0. Hence I2 − A is
   invertible. Thus

       x0(I2 − A)⁻¹

   is the center of rotation.                                     (2.8.35)

Notice that A does not have real eigenvalues if θ ≠ 0, π. The readers
should consult Ex. <B> 5 of Sec. 2.7.2 for a more general setting.

Example 5 Determine whether

    T(x) = (1, 1) + xA, where A = (1/√10)[1 3; −3 1],

is a rotation. If yes, write down its rotation angle and center.

Solution It is easy to check that A*A = I2 and det A = 1. Put

    cos θ = 1/√10 and sin θ = 3/√10;

then tan θ = 3, and θ = tan⁻¹ 3 is the rotation angle. Also,

    I2 − A = (1/√10)[√10 − 1  −3; 3  √10 − 1]
⇒ det(I2 − A) = (1/10)(20 − 2√10) = 2(√10 − 1)/√10
⇒ I2 − A is invertible and

    (I2 − A)⁻¹ = (√10/(2(√10 − 1)))·(1/√10)[√10 − 1  3; −3  √10 − 1]
               = (1/2)[1  (√10 + 1)/3; −(√10 + 1)/3  1]

⇒ x0(I2 − A)⁻¹ = (1, 1)·(1/2)[1  (√10 + 1)/3; −(√10 + 1)/3  1]
               = (1/6)(2 − √10, 4 + √10),

which is the center of rotation.

Do you know how to find the equation of the image of the unit circle
x1² + x2² = 1 under T?
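A minimal numerical check of Example 5, assuming Python with numpy:

    import numpy as np

    A  = np.array([[1.0, 3.0], [-3.0, 1.0]]) / np.sqrt(10.0)
    x0 = np.array([1.0, 1.0])

    assert np.allclose(A.T @ A, np.eye(2)) and np.isclose(np.linalg.det(A), 1.0)

    theta = np.arctan2(A[0, 1], A[0, 0])     # A = [cos t, sin t; -sin t, cos t]
    assert np.isclose(np.tan(theta), 3.0)    # the rotation angle tan^{-1} 3

    c = x0 @ np.linalg.inv(np.eye(2) - A)    # the center, by step 3 of (2.8.35)
    assert np.allclose(c, np.array([2.0 - np.sqrt(10.0),
                                    4.0 + np.sqrt(10.0)]) / 6.0)
    assert np.allclose(x0 + c @ A, c)        # the center is the invariant point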
Case 7 Reflection again (Euclidean notions are needed)

Example 6 Study the affine transformation

    T(x) = (−2, 2√3) + xA, where A = [1/2  √3/2; √3/2  −1/2].

Solution Although A is orthogonal, i.e. A* = A⁻¹, its determinant is
det A = −1. Therefore, the associated T is not a rotation.
The characteristic polynomial of A is

    det(A − tI2) = det [1/2 − t  √3/2; √3/2  −1/2 − t] = t² − 1,

so A has eigenvalues 1 and −1. According to 1 in (2.8.27), the associated
T is a reflection.
Solve

    x(A − I2) = (x1 x2)[−1/2  √3/2; √3/2  −3/2] = (0 0)

and get the associated eigenvectors v = t(√3, 1), for t ∈ R and t ≠ 0.
Pick a unit vector v1 = (1/2)(√3, 1). On the other hand, solve

    x(A + I2) = (x1 x2)[3/2  √3/2; √3/2  1/2] = (0 0)

and get the associated eigenvectors v = t(−1, √3). Pick a unit vector
v2 = (1/2)(−1, √3). This v2 is the direction of the reflection. Note that
v2 is perpendicular to v1, so that A is an orthogonal reflection with
respect to the axis ⟨⟨v1⟩⟩.
To see whether the original affine transformation T(x) = x0 + xA is an
orthogonal reflection with respect to a certain axis is our problem.
Suppose the axis is of the form a0 + ⟨⟨v1⟩⟩; then

    T(x) = x0 + xA = x0 + a0A + (x − a0)A
⇒ a0 = x0 + a0A
⇒ x0 = a0(I2 − A), which implies that x0(I2 + A) = 0, i.e. x0 = 0 or x0
   is an eigenvector of A corresponding to −1.                    (2.8.36)


For the given x0 = (−2, 2√3), solve

    x(I2 − A) = (x1 x2)[1/2  −√3/2; −√3/2  3/2] = (−2, 2√3),

where x = (x1, x2), and the axis is x1 − √3x2 = −4. Of course, any point
on it, say (−4, 0), can be used as a0, and the axis is the same as
a0 + ⟨⟨v1⟩⟩. See Fig. 2.109.

[Fig. 2.109: the point x, its image T(x), the axis a0 + ⟨⟨v1⟩⟩ for T, the
axis ⟨⟨v1⟩⟩ for A, and xA.]

We can reinterpret the transformation x → T(x) as follows:

    x = (x1, x2)
    → (orthogonal reflection with axis ⟨⟨e1⟩⟩) x[1 0; 0 −1] = (x1, −x2)
    → (rotation with center 0 and through 60°)
      (x1, −x2)[1/2  √3/2; −√3/2  1/2]
      = x[1 0; 0 −1][1/2  √3/2; −√3/2  1/2]
      = x[1/2  √3/2; √3/2  −1/2] = xA
    → (translation along x0 = (−2, 2√3)) x0 + xA = T(x).

Fig. 2.110 illustrates all the steps involved.


What is the image of the unit circle x1² + x2² = 1? As might be expected
beforehand, since A, as a mapping, keeps the circle invariant while the
translation x → x0 + x preserves everything, the image is the circle
|x − x0| = 1. Formally, we prove this as follows.

[Fig. 2.110: the composite of the reflection (x1, x2) → (x1, −x2), the
60° rotation, and the translation along x0, carrying x to T(x).]

    ⟨x, x⟩ = xx* = x1² + x2² = 1
⇒ (T(x) − x0)A⁻¹[(T(x) − x0)A⁻¹]* = 1
⇒ (T(x) − x0)A⁻¹(A⁻¹)*(T(x) − x0)* = 1
⇒ (since A⁻¹(A⁻¹)* = A*A = A⁻¹A = I2) (T(x) − x0)(T(x) − x0)* = 1
⇒ |T(x) − x0| = 1 if |x| = 1.

In order to avoid confusion, it should be mentioned that the
transformation T(x) = x0 + xA in (2.8.26) is derived from
T(x) = a0 + (x − a0)B, where B = A⁻¹[T]B A, so that

    x0 = a0 − a0B = a0(I2 − B).

This explains why 3 in (2.8.27) and (2.8.29) should hold. But this is not
the case in Example 6, where T(x) = x0 + xA is given, independent of the
form a0 + (x − a0)B. That is why we have to recapture a0 from x0 in
(2.8.36).
in (2.8.36).
Therefore, we summarize the following steps to test if a given transfor-

mation, in N = { 0 ,  e2 },
e1 , 

T (
x) = 
x0 + 
xA

is an orthogonal reflection where A is an orthogonal matrix.

1. Justify det A = −1 so that A has eigenvalues ±1.


2. Put A in the form
    
cos θ sin θ 1 0 cos θ sin θ
A= = .
sin θ − cos θ 0 −1 − sin θ cos θ
2.8 Affine Transformations 277

θ  
Then the eigenvectors corresponding to 1 are v = tei 2 = t cos θ2 , sin θ2
θ  
u = tiei 2 = t −sin θ2 , cos θ2 for
for t ∈ R and t = 0, these to −1 are 
t ∈ R and t = 0.
3. To find the axis, suppose the axis passes the point 
a0 . Then, solve

a0 =
x0 + 
a0 A, or

x0 = 
a0 (I2 − A)
to get  a0 + 
a0 and  v , the axis of the reflection. In fact, this axis has

x (I2 −A) in N . Note that 
the equation x0 =  x0 should be an eigenvector
corresponding to −1. (2.8.37)
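A minimal numerical check of these steps on Example 6, assuming Python
with numpy:

    import numpy as np

    s3 = np.sqrt(3.0)
    A  = np.array([[0.5, s3 / 2], [s3 / 2, -0.5]])
    x0 = np.array([-2.0, 2.0 * s3])

    assert np.allclose(A.T @ A, np.eye(2)) and np.isclose(np.linalg.det(A), -1.0)
    assert np.allclose(x0 @ A, -x0)                # x0 is an eigenvector for -1

    a0 = np.array([-4.0, 0.0])                     # a solution of a0 (I - A) = x0
    assert np.allclose(a0 @ (np.eye(2) - A), x0)   # axis: x1 - sqrt(3) x2 = -4
    assert np.allclose(x0 + a0 @ A, a0)            # a0 is a fixed point of T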
Steps similar to these should replace those stated in (2.8.27) and
(2.8.29) once T(x) = x0 + xA is given beforehand, independent of being
derived from T(x) = a0 + (x − a0)B.
Let A = [aij]2×2 be an invertible matrix. We have the following
elementary matrix factorizations for A (see (2.7.68)).

Case 1 a11 ≠ 0, and

    A = [1 0; a21/a11 1][1 a11a12/det A; 0 1][a11 0; 0 det A/a11].

Case 2 a11 = 0; then a12a21 ≠ 0, and

    A = [0 1; 1 0][1 a22/a12; 0 1][a21 0; 0 a12].

Case 1 shows that an affine transformation T(x) = x0 + xA in
N = {0, e1, e2} is the composite of

(a) a shearing: axis ⟨⟨e1⟩⟩ and coefficient a21/a11,
(b) a shearing: axis ⟨⟨e2⟩⟩ and coefficient a11a12/det A,
(c) a two-way stretch: along ⟨⟨e1⟩⟩ with scale factor a11 and along
    ⟨⟨e2⟩⟩ with scale factor det A/a11, and
(d) a translation: along x0;                                      (2.8.38)

while Case 2 shows that it is the composite of

(a) a reflection: axis ⟨⟨e1 + e2⟩⟩,
(b) a shearing: axis ⟨⟨e2⟩⟩ and coefficient a22/a12,
(c) a two-way stretch: along ⟨⟨e1⟩⟩ with scale factor a21 and along
    ⟨⟨e2⟩⟩ with scale factor a12, and
(d) a translation: along x0.                                      (2.8.39)
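The Case 1 factorization is easy to verify numerically. A minimal sketch,
assuming Python with numpy; the sample matrix is that of Ex. <A> 13
below.

    import numpy as np

    A = np.array([[-3.0, 5.0], [2.0, 4.0]])
    a11, a12, a21 = A[0, 0], A[0, 1], A[1, 0]
    d = np.linalg.det(A)                               # det A = -22

    A1 = np.array([[1.0, 0.0], [a21 / a11, 1.0]])      # shearing, axis <<e1>>
    A2 = np.array([[1.0, a11 * a12 / d], [0.0, 1.0]])  # shearing, axis <<e2>>
    A3 = np.array([[a11, 0.0], [0.0, d / a11]])        # two-way stretch
    assert np.allclose(A1 @ A2 @ A3, A)                # Case 1 of (2.8.38)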

According to the logical development arranged in this book, planar
Euclidean notions will be formally introduced at the very beginning of
Part 2 and Chap. 4. Unfortunately, we need them here in order to
introduce shearings, rotations and orthogonal reflections. We hope this
does not cause much trouble for the readers.
In Chap. 4, decompositions for affine transformations different from
those in (2.8.38) and (2.8.39) will be formulated and proved. See
Exs. <B> 5–9 of Sec. 4.8.

Exercises

<A>

1. Prove (2.8.27).
2. Let

       T(x) = x0 + xA, where A = (1/11)[13 −8; 6 −13],

   be an affine transformation in N = {0, e1, e2}.
   (a) Determine which x0 will guarantee that T is a reflection in a
       certain affine basis.
   (b) Find an affine basis in which T is a reflection and determine the
       direction of the reflection.
   (c) Find the image of the line 4x1 + x2 = c, where c is a constant.
   (d) Find the image of the line x1 + x2 = 1. Where do they intersect?
   (e) Find the image of the unit circle x1² + x2² = 1 and illustrate it
       graphically.
3. Let

       T(x) = x0 + xA, where A = [4/5 3/5; 3/5 −4/5].

   Do the same problems as in Ex. 2.
4. Prove (2.8.29).
5. Let k be a nonzero constant and k ≠ 1. Let
$$T(x) = x_0 + xA,\quad\text{where } A = \begin{pmatrix} -3k+4 & 6k-6\\ -2k+2 & 4k-3 \end{pmatrix},$$
be an affine transformation in N = {0, e1, e2}.
(a) Determine all those x0 for which T is a one-way stretch in a certain affine basis.
(b) For these x0 in (a), determine the axis and the direction of the associated one-way stretch.
(c) Show that a line, not parallel to the direction, and its image under T always intersect at a point lying on the axis.
(d) Find the image of the ellipse x1² + 2x2² = 1 under T if k = 3.
(e) Find the image of the hyperbola x1² − 2x2² = 1 under T if k = −2.
(f) Find the image of the parabola x1² = x2 under T if k = 1/2.

6. In N = {0, e1, e2}, let
T(x) = x0 + xA,
where A2×2 is an invertible matrix with two distinct eigenvalues. Model after (2.8.27) and (2.8.29) to set up criteria for T to be a two-way stretch, including
(1) the axes (i.e. the directions of the one-way stretches) and the scale factors, and
(2) the only invariant point.
For k1 ≠ k2 and k1k2 ≠ 0, let
$$A = \begin{pmatrix} 16k_1 - 15k_2 & 12(k_1 - k_2)\\ 20(-k_1 + k_2) & -15k_1 + 16k_2 \end{pmatrix}.$$
Justify your claims above for this T(x) = x0 + xA. Then try to do the following problems.
(a) If k1k2 < 1, find the image of the circle x1² + x2² = 1 under T.
(b) If k1k2 = 1, find the image of the hyperbola x1² − x2² = 1 under T.
(c) If k1k2 > 1, find the image of the parabola x2² = x1 under T.
Try to sketch these quadratic curves for some numerical k1 and k2.
7. Prove (2.8.33).
8. In N = {0, e1, e2}, let k ≠ 0 be a scalar and
$$T(x) = x_0 + xA,\quad\text{where } A = \frac{1}{4}\begin{pmatrix} 4-\sqrt{3}\,k & -k\\ 3k & 4+\sqrt{3}\,k \end{pmatrix}.$$
(a) Show that A has coincident eigenvalues 1 but A is not diagonalizable.
(b) Determine those x0 for which T is a shearing. In this case, where is the axis?
(c) Show that the coefficient of the shearing is k.
(d) Graph the image of the square with vertices at (1, 0), (0, 1), (−1, 0), (0, −1) under T, where x0 = (√3, 1) and k = 1.
(e) Graph the image of the curve x1² − x2² = 0 under T, where x0 = (−√3, −1) and k = −2.
(f) Graph the image of the curve x1² = a², where a is a scalar, under T with x0 = (3, √3) and k = 3.
9. Prove (2.8.35).
10. Let α² + β² ≠ 0 and
$$T(x) = x_0 + xA,\quad\text{where } A = \frac{1}{\sqrt{\alpha^2+\beta^2}}\begin{pmatrix} \alpha & -\beta\\ \beta & \alpha \end{pmatrix}.$$
(a) Show that T is a rotation.
(b) Determine the center and the angle of the rotation.
(c) Show that T has the properties listed in (1) and (2) of (2.8.34).
11. Prove (2.8.37).
12. Let α² + β² ≠ 0 and
$$T(x) = x_0 + xA,\quad\text{where } A = \frac{1}{\sqrt{\alpha^2+\beta^2}}\begin{pmatrix} \alpha & \beta\\ \beta & -\alpha \end{pmatrix}.$$
Model after Example 6 to do all the problems within.


13. Let
$$A = \begin{pmatrix} -3 & 5\\ 2 & 4 \end{pmatrix}.$$
Decompose A as A = A1A2A3, where
$$A_1 = \begin{pmatrix} 1 & 0\\ -\frac{2}{3} & 1 \end{pmatrix},\quad A_2 = \begin{pmatrix} 1 & \frac{15}{22}\\ 0 & 1 \end{pmatrix},\quad A_3 = \begin{pmatrix} -3 & 0\\ 0 & \frac{22}{3} \end{pmatrix},$$
and consider the affine transformation, in N = {0, e1, e2},
T(x) = (−2, 3) + xA.
(a) Find the image of the straight line x1 + x2 = 1 under T via the following methods, and graph the line and its image.
(1) By direct computation.
(2) By the steps indicated in (2.8.38).
(3) By diagonalization of A, trying to use (2) of (2.7.72).
(b) Do the same problems as in (a) for the square |x1| + |x2| = 2.
(c) Do the same problems as in (a) for the circle x1² + x2² = 1.
(d) Justify (1) of (2.7.9).
14. Let
$$A = \begin{pmatrix} 0 & 5\\ -3 & 4 \end{pmatrix}.$$
Decompose A as A = A1A2A3, where
$$A_1 = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix},\quad A_2 = \begin{pmatrix} 1 & \frac{4}{5}\\ 0 & 1 \end{pmatrix},\quad A_3 = \begin{pmatrix} -3 & 0\\ 0 & 5 \end{pmatrix},$$
and consider the affine transformation, in N = {0, e1, e2},
T(x) = (3, −2) + xA.
Do the same problems as in Ex. 13, but now use (2.8.39).


15. Fix an affine basis B = {a0, a1, a2}. Recall that a one-way stretch in the direction a2 − a0 with a0 + ⟨⟨a1 − a0⟩⟩ as axis can be represented as
$$\begin{pmatrix} 1 & 0\\ 0 & k \end{pmatrix},$$
where k ≠ 0 is the scale factor.
(a) Show that the set of all such one-way stretches
$$\left\{\begin{pmatrix} 1 & 0\\ 0 & k \end{pmatrix}\ \middle|\ k\in\mathbb{R},\ k\neq 0\right\}$$
is a subgroup (see Ex. <A> 2 of Sec. 2.8.1) of Ga(2; R). It is an abelian group.
(b) Show that
$$\left\{\begin{pmatrix} 1 & 0\\ 0 & k \end{pmatrix}\ \middle|\ k>0\right\}$$
forms an abelian subgroup of the group in (a).
(c) Suppose B = N = {0, e1, e2}. Show that the unit circle can be transformed into an ellipse by an orthogonal one-way stretch, and conversely. See Fig. 2.111(a), where the ellipse has equation x1² + (1/k²)x2² = 1 for 0 < k < 1.
(d) In N, the unit circle can be transformed into a skew ellipse. See Fig. 2.111(b), where the ellipse has the equation ax1² + 2bx1x2 + cx2² = 1 with suitable constants a, b and c. Try the following two ways.
(1) Use the result in (c) and perform a suitable rotation on it.
Fig. 2.111

(2) Let ai = (ai1, ai2) for i = 1, 2. Note that |a1| = |a2| = 1 and a1 ⊥ a2. Then B = {0, a1, a2} is an orthonormal affine basis for R². For some 0 < k < 1, the ellipse in B is
$$[x]_B\begin{pmatrix} 1 & 0\\ 0 & \frac{1}{k^2} \end{pmatrix}[x]_B^* = 1.$$
Then use the change of coordinates [x]_B = [x]_N A_N^B to change it into the equation in N.
16. Fix an affine basis B = {a0, a1, a2}. Recall the two-way stretch stated in 1 of (2.8.30).
(a) Show that
$$\left\{\begin{pmatrix} k_1 & 0\\ 0 & k_2 \end{pmatrix}\ \middle|\ k_1, k_2\in\mathbb{R}\ \text{and}\ k_1k_2\neq 0\right\}$$
is an abelian subgroup of Ga(2; R).
(b) In N, a circle can be transformed into an ellipse of the form a11x1² + 2a12x1x2 + a22x2² = 1, and conversely.
(c) How can a circle be transformed into an ellipse of the form a11x1² + 2a12x1x2 + a22x2² + b1x1 + b2x2 + c = 0? Try to work out the details.
17. Fix an affine basis B = {a0, a1, a2} for R².
(a) An affine transformation T having the equation, in B,
$$[T(x)]_B = [x]_B\begin{pmatrix} 1 & 0\\ k & 1 \end{pmatrix},\quad k\neq 0,$$
is called a (skew) shearing. Try to illustrate this T graphically, including its axis, i.e. the line of invariant points, and all its invariant lines.
(b) The set of all shearings
$$\left\{\begin{pmatrix} 1 & 0\\ k & 1 \end{pmatrix}\ \middle|\ k\in\mathbb{R}\right\}$$
forms an abelian subgroup of Ga(2; R).
<B>
1. Hyperbolic rotation
Fix an affine basis B = {a0, a1, a2} for R². Let f denote the one-way stretch in the direction a2 − a0, with a0 + ⟨⟨a1 − a0⟩⟩ as axis and scale factor k, while g denotes the one-way stretch in the direction a1 − a0, with a0 + ⟨⟨a2 − a0⟩⟩ as axis and scale factor 1/k. Then the composite
T = g∘f = f∘g
is called a hyperbolic rotation. See Fig. 2.112.

Fig. 2.112

(a) Show that
[T(x)]_B = [x]_B[T]_B, where
$$[T]_B = \begin{pmatrix} \frac{1}{k} & 0\\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0\\ 0 & k \end{pmatrix} = \begin{pmatrix} \frac{1}{k} & 0\\ 0 & k \end{pmatrix}.$$
Note that T is completely determined by a0, a1, a point x not lying on a0 + ⟨⟨a1 − a0⟩⟩ and its image point x′ = T(x).
(b) The set of all hyperbolic rotations
$$\left\{\begin{pmatrix} \frac{1}{k} & 0\\ 0 & k \end{pmatrix}\ \middle|\ k\in\mathbb{R}\ \text{and}\ k\neq 0\right\}$$
forms an abelian subgroup of Ga(2; R). This subgroup is useful in the study of hyperbolic (or Lobachevsky) geometry and the theory of relativity.
(c) Suppose |a1 − a0| = |a2 − a0| = 1 and (a1 − a0) ⊥ (a2 − a0); the resulting T is called a hyperbolic (orthogonal) rotation. Let [x]_B = (α1, α2). Then a hyperbola with the lines a0 + ⟨⟨a1 − a0⟩⟩ and a0 + ⟨⟨a2 − a0⟩⟩ as asymptotes has equation α1α2 = c, where c is a constant. Let [T(x)]_B = (α1′, α2′). Then α1′ = (1/k)α1 and α2′ = kα2 and hence
α1′α2′ = α1α2 = c,
which indicates that the image point T(x) still lies on the same hyperbola α1α2 = c. Therefore, such hyperbolas are invariant under the group mentioned in (b). In particular, the asymptotes a0 + ⟨⟨a1 − a0⟩⟩ and a0 + ⟨⟨a2 − a0⟩⟩ are invariant lines, with a0 as the only invariant point.
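The invariance in (c) is easy to confirm numerically; a tiny sketch (assuming numpy, with sample values of our own):

```python
import numpy as np

# The hyperbolic rotation diag(1/k, k) keeps each point of the hyperbola
# alpha1 * alpha2 = c on the same hyperbola.
k, c = 3.0, 2.0
alpha = np.array([0.5, c / 0.5])                 # alpha1 * alpha2 = c
image = alpha @ np.array([[1 / k, 0], [0, k]])
print(np.isclose(image[0] * image[1], c))        # True
```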
2. Elliptic rotation
Take two different points a0 and a1 in R². Let k be a positive constant. Denote by f the orthogonal one-way stretch with axis a0 + ⟨⟨a1 − a0⟩⟩ and scale factor k, and by g the rotation with center a0 and angle θ. The composite
T = f⁻¹∘g∘f
is called the elliptic (orthogonal) rotation with a0 + ⟨⟨a1 − a0⟩⟩ as axis and k and θ as parameters. See Fig. 2.113. Suppose |a1 − a0| = 1. Take another point a2 so that |a2 − a0| = 1 and (a2 − a0) ⊥ (a1 − a0).

Fig. 2.113

Then B = {a0, a1, a2} is an affine orthonormal basis for R².
(a) Show that
[T(x)]_B = [x]_B[T]_B,
where
$$[T]_B = [f]_B[g]_B[f]_B^{-1} = \begin{pmatrix} 1 & 0\\ 0 & k \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 & 0\\ 0 & \frac{1}{k} \end{pmatrix} = \begin{pmatrix} \cos\theta & \frac{1}{k}\sin\theta\\ -k\sin\theta & \cos\theta \end{pmatrix}.$$

(b) Suppose the rotation center a0, the axis a0 + ⟨⟨a1 − a0⟩⟩ and the scale factor k are fixed. Then the set of all elliptic rotations
$$\left\{\begin{pmatrix} \cos\theta & \frac{1}{k}\sin\theta\\ -k\sin\theta & \cos\theta \end{pmatrix}\ \middle|\ \theta\in\mathbb{R}\right\}$$
forms an abelian subgroup of Ga(2; R). This group is of fundamental importance in the study of elliptic geometry.

(c) Let 0 < k < 1. An ellipse with center at a0, major axis a0 + ⟨⟨a1 − a0⟩⟩ and eccentricity √(1 − k²) has the equation kα1² + α2² = r > 0, where [x]_B = (α1, α2). Its image under f is a circle kα1² + kα2² = r. Conversely, f⁻¹ will recover the ellipses from the circles. Hence, T maps such a family of ellipses onto itself.


(d) An elliptic rotation is completely determined by its center a0, its major axis a0 + ⟨⟨a1 − a0⟩⟩ and a pair of points x, x′ where x′ = T(x), x ≠ a0, x′ ≠ a0.


3. An affine transformation with a unique line of invariant points (called the axis) is called an affinity. Suppose T is an affinity with axis L and T maps a point x0, not lying on L, into the point x0′.
(a) In case the line x0x0′ is parallel to L, show that T is an orthogonal shearing with L as its axis (see (2.8.32)).
(b) In case the line x0x0′ is not parallel to L, show that T is a one-way stretch in the direction x0x0′ and with axis L.

<C> Abstraction and generalization

For counterparts in R³, refer to Sec. 3.8.2.
Results obtained here and in Sec. 3.8.2 can be generalized to an abstract n-dimensional affine space V over a field F, in particular, over the real field.

2.8.3 Affine invariants

We come back to (1) of (2.7.9) and prove all of the statements there to be true in the context of affine transformations.
Affine invariants
An affine transformation T(x) = x0 + xA, where A2×2 is an invertible matrix, preserves 1–7 but does not necessarily preserve a–d in (1) of (2.7.9). (2.8.40)

Various examples in Sec. 2.8.2 convinced us that (2.8.40) indeed holds. A formal proof is given as follows.
Proof For 1, adopt the notations and results in (2.5.9). The image of the line Li: x = ai + tbi for i = 1, 2 under T(x) = x0 + xA is
T(Li): T(x) = x0 + aiA + t(biA), t ∈ R,
which is still a line, passing through the point x0 + aiA with direction biA. Therefore,
a. L1 is coincident with L2.
⇔ The vectors a2 − a1, b1 and b2 are linearly dependent.
⇔ (since A is invertible) (x0 + a2A) − (x0 + a1A) = (a2 − a1)A, b1A and b2A are linearly dependent.
⇔ T(L1) is coincident with T(L2).
b. L1 is parallel to L2.
⇔ b1 and b2 are linearly dependent, but linearly independent of a2 − a1.
⇔ b1A and b2A are linearly dependent, but linearly independent of (x0 + a2A) − (x0 + a1A) = (a2 − a1)A.
⇔ T(L1) is parallel to T(L2).
c. L1 intersects L2 at a unique point.
⇔ b1 and b2 are linearly independent.
⇔ b1A and b2A are linearly independent.
⇔ T(L1) intersects T(L2) at a unique point.
Due to the one-to-one and onto properties, the point of intersection is mapped to the point of intersection in Case c.
For 2, adopt the notations and concepts in (2.5.11). The image of the directed segment a1a2: x = (1 − t)a1 + ta2, 0 ≤ t ≤ 1, under T is
T(x) = x0 + (1 − t)a1A + ta2A = (1 − t)(x0 + a1A) + t(x0 + a2A), 0 ≤ t ≤ 1,
which is the directed line segment with initial point x0 + a1A and terminal point x0 + a2A. It is obvious that interior points (i.e. 0 < t < 1) are mapped into interior points and endpoints into endpoints.
For 3, adopt the notations in (2.5.10). Take four points xi on the line L: x = a1 + t(a2 − a1), say
xi = a1 + ti(a2 − a1), i = 1, 2, 3, 4.
The directed segments
x1x2 = x2 − x1 = (t2 − t1)(a2 − a1), and
x3x4 = x4 − x3 = (t4 − t3)(a2 − a1)
have as their respective images under T the directed segments
T(x1)T(x2) = T(x2) − T(x1) = (t2 − t1)(a2 − a1)A,
T(x3)T(x4) = T(x4) − T(x3) = (t4 − t3)(a2 − a1)A.
Therefore (refer to (1.4.5)), in case x3 ≠ x4, it follows that
T(x1)T(x2) / T(x3)T(x4) = (t2 − t1)/(t4 − t3) = x1x2 / x3x4 (ratios of signed lengths),
which proves the claim. If x1x2 and x3x4 lie on different but parallel lines, the same proof works by using 2 in (2.5.9).
For 4, refer to (2.5.12) and Fig. 2.26. By 2, it is easy to see that the image of ∆a1a2a3 under T is the triangle ∆b1b2b3 where bi = T(ai) for i = 1, 2, 3. Let a be any fixed interior point of ∆a1a2a3 (one might refer to Fig. 2.31 for a precise definition of interior points of a triangle). Extend the line segment a3a to meet the opposite side a1a2 at the point a4. Then, a4 is an interior point of a1a2. By 2 or 3, T(a4) is an interior point of the side b1b2. Since the line segment a3a4 is mapped onto the line segment b3T(a4), the interiority of a implies that T(a) is an interior point of b3T(a4). See Fig. 2.114. Hence T(a) is an interior point of the triangle ∆b1b2b3.

Fig. 2.114

For 5 and 6, both are easy consequences of 4.


For 7, let us start from a triangle ∆a1a2a3 (see Fig. 2.114) and find the signed area of its image ∆b1b2b3 under T. In what follows, it is supposed that the readers are familiar with basic knowledge about determinants of order 2; it is suggested to refer to Sec. 4.3 or Ex. <B> of Sec. 2.6 if needed.
The image triangle ∆b1b2b3, where bi = T(ai) = x0 + aiA for 1 ≤ i ≤ 3, has the
$$\text{signed area} = \frac{1}{2}\det\begin{pmatrix} T(a_2)-T(a_1)\\ T(a_3)-T(a_1) \end{pmatrix} = \frac{1}{2}\det\begin{pmatrix} (a_2-a_1)A\\ (a_3-a_1)A \end{pmatrix} = \frac{1}{2}\det\left(\begin{pmatrix} a_2-a_1\\ a_3-a_1 \end{pmatrix}A\right)$$
$$= (\text{the signed area of }\Delta a_1a_2a_3)\det A\qquad (2.8.41)$$
$$\Rightarrow\ \frac{\text{the area of }\Delta b_1b_2b_3}{\text{the area of }\Delta a_1a_2a_3} = |\det A|.\qquad (2.8.42)$$
From this relation, it follows that an affine transformation preserves the ratio of (signed) areas of two triangles, hence of two non-self-intersecting polygons, and finally, by a limiting process, of two measurable planar domains.
Examples in Sec. 2.8.2 show that a general affine transformation does not necessarily preserve length, angle and area except when det A = 1. It does
preserve the orientation if det A > 0, and
reverse the orientation if det A < 0. (2.8.43)
For formal definitions of directions or orientations of the plane, please refer to Ex. <A> 4 of Sec. 2.8.1.
As we have experienced with so many numerical examples concerning areas in the previous exercises and examples, such as those in Exs. <B> 7 and 8 of Sec. 2.5 and in Sec. 2.8.2, we single out (2.8.41)–(2.8.43) as parts of the following summary.
The geometric interpretation of the determinant of a linear operator or a square matrix
Let f(x) = xA: R² → R² be an invertible linear operator in the natural basis N = {e1, e2} for R², where
$$A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix}$$
is an invertible real 2 × 2 matrix. The determinant
$$\det f = \det A = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}$$
= the signed area of the parallelogram a1a2, where a1 = f(e1) = (a11, a12) and a2 = f(e2) = (a21, a22)
= (the signed area of the parallelogram a1a2)/(the area of the square e1e2).
See Fig. 2.115.

Fig. 2.115

Therefore, for any affine transformation T(x) = x0 + xA:
1. The signed area of the image
∆T(a1)T(a2)T(a3) = (the signed area of ∆a1a2a3) det A.
This is also true for any simple closed polygon and measurable planar domain.
2. T preserves orientation ⇔ det A > 0; and
T reverses orientation ⇔ det A < 0. (2.8.44)

It is worth mentioning that det A = 0 can be interpreted as the parallelogram a1a2 degenerating to a single point or a line segment in Fig. 2.115.
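Relation (2.8.41) and the orientation criterion (2.8.44) are easy to test numerically; a small sketch (assuming numpy, with a sample matrix of our own):

```python
import numpy as np

# Check (2.8.41): signed areas of triangles scale by det A under T(x) = x0 + xA.
def signed_area(p1, p2, p3):
    return 0.5 * np.linalg.det(np.array([p2 - p1, p3 - p1]))

A  = np.array([[2.0, 1.0], [1.0, 3.0]])      # det A = 5 > 0: orientation kept
x0 = np.array([1.0, -2.0])
T  = lambda p: x0 + p @ A

a1, a2, a3 = np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(signed_area(T(a1), T(a2), T(a3)) / signed_area(a1, a2, a3))  # 5.0
```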
What kind of linear operator, or associated affine transformation, preserves length, angle, and hence area? It is the orthogonal operator (matrix). This is one of the main topics to be touched on in Part 2.

Exercises
<A>
1. Prove (2.8.40) by using (2.8.38) and (2.8.39).
2. Prove (2.8.44) by using (2.8.38) and (2.8.39).
3. On the sides a1a2, a2a3 and a3a1 of a triangle ∆a1a2a3, pick three points b3, b1 and b2, respectively, so that the directed segments a1b3 = 3 b3a2, a2b1 = b1a3 and a3b2 = 2 b2a1.
a1 .

a1

b2

a1 '

b3
a3 '
a2 '

a2 a3
b1

Fig. 2.116

See Fig. 2.116. Compute
(the area of ∆a1′a2′a3′)/(the area of ∆a1a2a3)
by the following methods.
(1) One may suppose ∆a1a2a3 to be a right triangle with its two legs lying on the axes, and then use 7 in (2.8.40).
(2) Find an affine transformation mapping ∆a1a2a3 onto ∆a1′a2′a3′.
(3) Use the Menelaus theorem (see Sec. 2.8.4 or Ex. <B> 3 below, if needed).
<B>
1. (Refer to [14]) Suppose T: R² → R² has the properties:
(1) T is one-to-one and onto, and
(2) T maps any three collinear points into three collinear points.
Show that T is an affine transformation. Try the following steps:
(a) T maps straight lines into straight lines.
(b) T maps parallel lines into parallel lines.
(c) T maps intersecting lines into intersecting lines and preserves the point of intersection.
(d) T preserves the midpoint of a line segment.
(e) T maps n equidistant points on a line segment into n equidistant points on a line segment, for any natural number n.
(f) (Darboux's Theorem) T preserves interior points of a line segment.
(g) T preserves signed ratios of line segments along the same or parallel lines.
2. Given a triangle ∆a1a2a3 and three positive numbers α, β, γ with 0 < α < 1, 0 < β < 1 and 0 < γ < 1, suppose the point b1 divides the side a2a3 internally in the ratio α : 1 − α; b2 divides the side a3a1 internally in the ratio β : 1 − β; and b3 divides the side a1a2 internally in the ratio γ : 1 − γ. Also, suppose the line segments a1b1, a2b2 and a3b3 intersect to form a triangle ∆a1′a2′a3′. (Refer to Fig. 2.116.)
(a) Compute
(the area of ∆a1′a2′a3′)/(the area of ∆a1a2a3).
(b) (Ceva Theorem) The lines a1b1, a2b2 and a3b3 are concurrent at a point if and only if
αβγ/((1 − α)(1 − β)(1 − γ)) = 1.
  
3. Let ∆a1a2a3 and b1, b2, b3 and α, β, γ be as in Ex. 2, except that now b1 divides a2a3 externally in the ratio α : 1 − α. See Fig. 2.117.
 


Fig. 2.117

(a) Compute
(the area of ∆b1b2b3)/(the area of ∆a1a2a3).
(b) (Menelaus Theorem) The three points b1, b2 and b3 are collinear if and only if
αβγ/((1 − α)(1 − β)(1 − γ)) = −1.
(c) What happens to (b) in case b2b3 is parallel to a2a3?
<C> Abstraction and generalization

Properties 1–6 in (2.8.40) are still true for affine transformations on an n-dimensional affine space V over a field F, with suitable extensions as follows.
1. The relative positions of an r-dimensional affine subspace and an s-dimensional one, where 0 ≤ r, s < n.
4. n-dimensional simplexes.
5. n-dimensional parallelepipeds, etc.
Meanwhile, 6, 7 and a to d are still true for n-dimensional inner product spaces.
For a clearer picture of what we have claimed here, one might refer to Chap. 3 and Sec. 5.9.

2.8.4 Affine geometry

The study of those geometric properties invariant under the affine group Ga(2; R) constitutes the content of the so-called affine geometry of the affine plane R². According to (2.8.40), roughly speaking, affine geometry investigates the following topics:
1. Parallelism.
2. The ratio of line segments (along the same or parallel lines).
3. Collinear points (e.g. the Menelaus Theorem).
4. Concurrent lines (e.g. the Ceva Theorem).
5. The centroid (or median point) of a triangle.
It does not treat geometric problems concerned with lengths, angles and hence areas, which are in the scope of Euclidean geometry (refer to Sec. 4.9).
Here in this subsection, we state and prove three fundamental theorems of planar affine geometry, the Menelaus, Ceva and Desargues theorems, in such a way that they can be extended verbatim to an abstract affine plane over a field. One can refer to [19] for their historical backgrounds.
Menelaus Theorem
For any three non-collinear points x1, x2 and x3 in R², pick a point y1 on the line x2 + ⟨⟨x3 − x2⟩⟩, a point y2 on the line x3 + ⟨⟨x1 − x3⟩⟩ and a point y3 on the line x1 + ⟨⟨x2 − x1⟩⟩ so that y1, y2 and y3 do not coincide with the vertices of ∆x1x2x3. Then
y1, y2 and y3 are collinear
⇔ (x2y1 : y1x3)(x3y2 : y2x1)(x1y3 : y3x2) = −1,
where x2y1 denotes the directed segment from x2 to y1, etc. Equivalently, if y1 divides x2x3 in the ratio α : 1 − α, y2 divides x3x1 in the ratio β : 1 − β and y3 divides x1x2 in the ratio γ : 1 − γ, then
y1, y2 and y3 are collinear
⇔ (1 − α)⁻¹α(1 − β)⁻¹β(1 − γ)⁻¹γ = −1. (2.8.45)
See Fig. 2.118.


Fig. 2.118
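The ratio condition (2.8.45) can be checked numerically; a small sketch (assuming numpy, with division ratios of our own choosing, one of them external):

```python
import numpy as np

# alpha = 2, beta = 1/2, gamma = 1/3 give ratio product
# (1-a)^(-1) a (1-b)^(-1) b (1-c)^(-1) c = -1, so y1, y2, y3 are collinear.
x1, x2, x3 = np.array([0., 0.]), np.array([4., 0.]), np.array([1., 3.])
alpha, beta, gamma = 2.0, 0.5, 1.0 / 3.0
y1 = (1 - alpha) * x2 + alpha * x3
y2 = (1 - beta) * x3 + beta * x1
y3 = (1 - gamma) * x1 + gamma * x2
print(np.isclose(np.linalg.det(np.array([y2 - y1, y3 - y1])), 0.0))  # True
```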

Proof (For another proof, see Ex. <B> 3 of Sec. 2.8.3.) According to the assumptions, we have
y1 = (1 − α)x2 + αx3, α ≠ 0, 1,
y2 = (1 − β)x3 + βx1, β ≠ 0, 1,
y3 = (1 − γ)x1 + γx2, γ ≠ 0, 1. (∗1)

The necessity From the first two equations, it follows that
y2 − y1 = βx1 − (1 − α)x2 + (1 − α − β)x3.
Then y1y2 is parallel to x1x2 if and only if 1 − α − β = 0 holds.
In case y1y2 is not parallel to x1x2, i.e. 1 − α − β ≠ 0: eliminating x3 from the first two equations, one obtains the equation
(1 − β)y1 − αy2 = (1 − α)(1 − β)x2 − αβx1 (∗2)
⇒ (multiplying both sides by (1 − α − β)⁻¹)
(1 − α − β)⁻¹(1 − β)y1 − (1 − α − β)⁻¹αy2 = (1 − α − β)⁻¹(1 − α − β + αβ)x2 − (1 − α − β)⁻¹αβx1,
which represents a point lying simultaneously on the segments x1x2 and y1y2 or their extended lines. Suppose this point is y3. Then, comparing it
y1 y2or their extended lines. Suppose this point is y3 . Then, compare it
294 The Two-Dimensional Real Vector Space R2

with the third equation in (∗1 ) and get the relations


1 − γ = −(1 − α − β)−1 αβ
γ = (1 − α − β)−1 (1 − α)(1 − β)

⇒ (1 − α)−1 α(1 − β)−1 β(1 − γ)−1 γ


= (1 − α)−1 α(1 − β)−1 β[−β −1 α−1 (1 − α − β)]
× (1 − α − β)−1 (1 − α)(1 − β)
= −(1 − α)−1 (1 − α)αα−1 (1 − β)−1 (1 − β)ββ −1
× (1 − α − β)(1 − α − β)−1
= −1. (∗3 )
In case y1y2 is parallel to x1x2, i.e. 1 − α − β = 0, then 1 − α = β holds and also
(1 − α)⁻¹α(1 − β)⁻¹β = 1, (∗4)
which is equivalent to (∗3) once we can interpret what
lim_{γ→∞} γ/(1 − γ) = −1 (∗5)
means geometrically. To see this, notice that the lines x1 + ⟨⟨x2 − x1⟩⟩ and y1 + ⟨⟨y2 − y1⟩⟩ are parallel to each other and they never meet at a point in R² within our sight. Suppose there were a point where these two lines meet. By comparing the third equation in (∗1) with (∗2), there would exist some scalar t ∈ R such that
1 − γ = −tαβ,
γ = t(1 − α)(1 − β)
⇒ (using (1 − α)(1 − β) = 1 − α − β + αβ = αβ)
1 − γ = −γ
⇒ 1 = 0,
which is a contradiction. Now, imagine the parallel lines x1 + ⟨⟨x2 − x1⟩⟩ and y1 + ⟨⟨y2 − y1⟩⟩ extended beyond any limit to meet at a point, called the infinite point (refer to Ex. <B> 5 of Sec. 2.8.5) and denoted as ∞. It is in this sense that (∗5) is meant geometrically. See Fig. 2.119.
Refer to the Remark after this proof.
Fig. 2.119

The sufficiency It is well known that three points y1, y2 and y3 are collinear, or affinely dependent (refer to Sec. 2.6), if and only if there exist

scalars t1, t2, t3, not all zero, with t1 + t2 + t3 = 0, so that
t1y1 + t2y2 + t3y3 = [t2β + t3(1 − γ)]x1 + [t3γ + t1(1 − α)]x2 + [t1α + t2(1 − β)]x3 = 0.
By assumption, y1, y2 and y3 do not coincide with any one of x1, x2 and x3. Therefore, y1, y2 and y3 are all distinct, and t1, t2 and t3 can be chosen, all nonzero, so that
t2⁻¹t3 = −(1 − γ)⁻¹β,
t3⁻¹t1 = −(1 − α)⁻¹γ,
t1⁻¹t2 = −(1 − β)⁻¹α
hold. In this case,
t1y1 + t2y2 + t3y3 = 0
and
t1 + t2 + t3 = t1(1 − α + α) + t2(1 − β + β) + t3(1 − γ + γ)
= [t2β + t3(1 − γ)] + [t3γ + t1(1 − α)] + [t1α + t2(1 − β)] = 0.
This proves that y1, y2, y3 are collinear.
In case, say, y3 = ∞, the infinite point, then t3 = 0 if and only if γ = ∞. At this time, the points y1 and y2 are still collinear.

Remark (∗4) and (∗5) are related to the well-known theorem: in a triangle ∆x1x2x3 (see Fig. 2.120),
y1y2 ∥ x1x2
⇔ x3y2 : y2x1 = x3y1 : y1x2 (ratios of signed lengths),
which is a special case of the Menelaus Theorem.
By the way, in case the points y1, y2 and y3 all do not lie between the vertices x1, x2 and x3, or just two of them lie between the vertices, the Menelaus Theorem is also known as the Pasch axiom.

Fig. 2.120

Ceva Theorem Adopt the notations and assumptions for x1, x2, x3 and y1, y2, y3 and α, β, γ as in the Menelaus Theorem. Then
the segments x1y1, x2y2 and x3y3 are concurrent
⇔ (x2y1 : y1x3)(x3y2 : y2x1)(x1y3 : y3x2) = 1
⇔ (1 − α)⁻¹α(1 − β)⁻¹β(1 − γ)⁻¹γ = 1. (2.8.46)
See Fig. 2.121.

Fig. 2.121

Proof (For other proofs, see Ex. <B> 2 of Sec. 2.8.3 and Ex. <A> 1.) Use the Menelaus Theorem twice as follows. In ∆x1x2y2,
x3, z, y3 are collinear
⇔ (y2x3 : x3x1)(x1y3 : y3x2)(x2z : zy2) = −1;
while in ∆x2y2x3,
x1, z, y1 are collinear
⇔ (x2y1 : y1x3)(x3x1 : x1y2)(y2z : zx2) = −1.
Noting that z lies on the line segment x2y2, combine these two results to finish the proof.
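A numerical illustration of (2.8.46) (a sketch assuming numpy; with α = β = γ = 1/2 the cevians are the medians and meet at the centroid):

```python
import numpy as np

x1, x2, x3 = np.array([0., 0.]), np.array([4., 0.]), np.array([1., 3.])
y1, y2, y3 = (x2 + x3) / 2, (x3 + x1) / 2, (x1 + x2) / 2   # ratio product = 1

def intersect(p, q, r, s):
    """Intersection point of the lines p + t(q - p) and r + u(s - r)."""
    t, _ = np.linalg.solve(np.column_stack([q - p, r - s]), r - p)
    return p + t * (q - p)

z12 = intersect(x1, y1, x2, y2)
z13 = intersect(x1, y1, x3, y3)
print(np.allclose(z12, z13), np.allclose(z12, (x1 + x2 + x3) / 3))  # True True
```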

Desargues Theorem Let x0, y1, z1; x0, y2, z2 and x0, y3, z3 be three sets of distinct collinear points in R², lying on different lines. Suppose the line segments y2y3 and z2z3 meet at x1, y3y1 and z3z1 meet at x2, and y1y2 and z1z2 meet at x3. Then
x1, x2 and x3 are collinear. (2.8.47)

See Fig. 2.122.


Fig. 2.122

Proof There exist α, β and γ in R such that
x0 = (1 − α)y1 + αz1 = (1 − β)y2 + βz2 = (1 − γ)y3 + γz3.
Hence
(1 − α)y1 − (1 − β)y2 = −αz1 + βz2. (∗1)
We claim that α ≠ β. To see this, suppose on the contrary that α = β does hold. Then (∗1) reduces to
(1 − α)(y1 − y2) = α(z2 − z1).
On the other hand, since x3 lies on y1y2 and z1z2, there exist t1 and t2 such that
x3 = (1 − t1)y1 + t1y2 = (1 − t2)z1 + t2z2, t1, t2 ≠ 0, 1
⇒ z2 = (1 − t1)t2⁻¹y1 + t1t2⁻¹y2 − (1 − t2)t2⁻¹z1
⇒ (1 − α)(y1 − y2) = α[(1 − t1)t2⁻¹y1 + t1t2⁻¹y2 − (1 − t2)t2⁻¹z1 − z1]
⇒ z1 = [(1 − t1) − t2(1 − α)α⁻¹]y1 + [t1 + t2(1 − α)α⁻¹]y2.
Since the sum of the coefficients of y1 and y2 in the last relation is equal to 1, the point z1 lies on the line y1y2. This implies that y2 lies on the line y1z1, and hence x0, y1, z1 and x0, y2, z2 all lie on the same line, a contradiction to the assumption. Therefore, α ≠ β is true.
Note that, in (∗1), (1 − α) − (1 − β) = −α + β. This means that
(β − α)⁻¹[(1 − α)y1 − (1 − β)y2] = (β − α)⁻¹[−αz1 + βz2] = x3
⇒ (1 − α)y1 − (1 − β)y2 = −αz1 + βz2 = (β − α)x3, β − α ≠ 0. (∗2)
Similarly,
(1 − β)y2 − (1 − γ)y3 = −βz2 + γz3 = (γ − β)x1, γ − β ≠ 0, (∗3)
(1 − γ)y3 − (1 − α)y1 = −γz3 + αz1 = (α − γ)x2, α − γ ≠ 0. (∗4)
Adding (∗2), (∗3) and (∗4) side by side, we have
(γ − β)x1 + (α − γ)x2 + (β − α)x3 = 0
with coefficients γ − β ≠ 0, α − γ ≠ 0, β − α ≠ 0 and
(γ − β) + (α − γ) + (β − α) = 0.
Thus, x1, x2 and x3 are collinear.
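Desargues' theorem is also easy to watch numerically; a sketch (assuming numpy, with sample points of our own, each zi chosen on the line joining x0 to yi):

```python
import numpy as np

def line_meet(p, q, r, s):
    t, _ = np.linalg.solve(np.column_stack([q - p, r - s]), r - p)
    return p + t * (q - p)

x0 = np.array([0., 0.])
y1, y2, y3 = np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])
z1, z2, z3 = 3 * y1, 2 * y2, 1.5 * y3

x1 = line_meet(y2, y3, z2, z3)
x2 = line_meet(y3, y1, z3, z1)
x3 = line_meet(y1, y2, z1, z2)
print(np.isclose(np.linalg.det(np.array([x2 - x1, x3 - x1])), 0.0))  # True
```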

Exercises
<A>

1. Another proof of the Ceva Theorem. Let y1, y2 and y3 be expressed as in (∗1) in the proof of the Menelaus Theorem.
(a) Show that any point on the line y1 + ⟨⟨x1 − y1⟩⟩ can be expressed as
(1 − t1)y1 + t1x1 = (1 − t1)(1 − α)x2 + (1 − t1)αx3 + t1x1.
Find similar expressions for points on the lines y2 + ⟨⟨x2 − y2⟩⟩ and y3 + ⟨⟨x3 − y3⟩⟩.
(b) The three lines in (a) meet at a point z if and only if the three expressions in (a) for z coincide.
2. (One of the main features of affine geometry) In Fig. 2.123, show that XY ∥ BC ⇔ V is the midpoint of BC, by the following two methods.
(a) Use the Ceva Theorem.
(b) Take {A, B, C} as an affine basis with A as base point (see Sec. 2.6). Let A = (0, 0), B = (1, 0), C = (0, 1) and X = (α, 0), 0 < α < 1, and Y = (0, β), 0 < β < 1. Try to find the affine coordinates of U and V and show that α = β ⇔ 2AV = AB + AC (as vectors).

Fig. 2.123

3. Four straight lines in the plane R², meeting at six points x1, x2, x3, x4 and y1, y2, are said to form a complete quadrilateral. See Fig. 2.124. Let the segment x1x3 meet the segment y1y2 at the point z1, and x2x4 meet y1y2 at z2. Then z1 divides y1y2 in the signed ratio y1z1 : z1y2 and z2 divides y1y2 in the signed ratio y1z2 : z2y2. Show that the ratio of these two signed ratios is equal to −1, denoted as
(y1, y2; z1, z2) = (y1z1 : z1y2)(y1z2 : z2y2)⁻¹ = −1.
In this case, the four points y1, y2, z1, z2, in this ordering, are said to form harmonic points.
(Note In Fig. 2.124, x2x4 ∥ y1y2 ⇔ z2 = ∞, the infinite point. In this case, this problem reduces to Ex. 2 as a special case.)


Fig. 2.124

4. In the plane R², four different sets of three points x0, y1, z1; x0, y2, z2; x0, y3, z3 and x0, y4, z4 lie on different lines, all passing through the same point x0. Suppose both y1, y2, y3, y4 and z1, z2, z3, z4 are collinear. See Fig. 2.125. Now, let
y3 = (1 − t1)y1 + t1y2,  y4 = (1 − t2)y1 + t2y2,
z3 = (1 − s1)z1 + s1z2,  z4 = (1 − s2)z1 + s2z2.
Prove that
t1(1 − t1)⁻¹(1 − t2)t2⁻¹ = s1(1 − s1)⁻¹(1 − s2)s2⁻¹,

Fig. 2.125

which is called the cross ratio of the four points y3, y4, y1, y2, in this ordering, and is denoted as
(y3, y4; y1, y2) = t1(1 − t1)⁻¹ : t2(1 − t2)⁻¹ = t1(1 − t1)⁻¹(1 − t2)t2⁻¹.
In case (y3, y4; y1, y2) = −1, these four points form a set of harmonic points as in Ex. 3.
(Note In Fig. 2.125, if the line passing through the yi's is parallel to the line passing through the zi's, to what extent can this problem be simplified, both in statement and in proof?)

<B>
Try to review the problem sets in Ex. <B> of Sec. 2.6 and use the ideas introduced there, such as the coordinate triangle and homogeneous area coordinates, to reprove the Menelaus, Ceva and Desargues Theorems and Exs. <A> 1–4.
<C> Abstraction and generalization
Try to extend all the results, including Ex. <A>, to an n-dimensional affine space over a field F.

2.8.5 Quadratic curves

Review (from high-school mathematics courses)
In the Cartesian coordinate system N = {e1, e2}, the graph of the equation of a second-degree polynomial in two real variables x1 and x2 with real coefficients
b11x1² + 2b12x1x2 + b22x2² + 2b1x1 + 2b2x2 + b = 0 (2.8.48)
is called a quadratic curve, where b11, b12 and b22 are not all zero. It is also called a conic in its traditional and more geometric feature.
To eliminate the x1 and x2 terms, use a translation x1 = x1′ + h, x2 = x2′ + k so that
b11x1′² + 2b12x1′x2′ + b22x2′² + 2(b11h + b12k + b1)x1′ + 2(b12h + b22k + b2)x2′ + b11h² + 2b12hk + b22k² + 2b1h + 2b2k + b = 0. (∗1)
Set the coefficients of x1′ and x2′ equal to zero, i.e.
b11h + b12k + b1 = 0,
b12h + b22k + b2 = 0.

Case 1 b11b22 − b12² ≠ 0. The unique solution is
h = (b12b2 − b22b1)/(b11b22 − b12²), k = (b12b1 − b11b2)/(b11b22 − b12²). (∗2)
The translation x1 = x1′ + h, x2 = x2′ + k then transforms (∗1) into
b11x1′² + 2b12x1′x2′ + b22x2′² + b′ = 0, (∗3)
where
$$b' = b_1h + b_2k + b = \frac{1}{b_{11}b_{22}-b_{12}^2}\begin{vmatrix} b_{11} & b_{12} & b_1\\ b_{12} & b_{22} & b_2\\ b_1 & b_2 & b \end{vmatrix}.$$

To eliminate the x1′x2′ term in (∗3), use a rotation
x1′ = x1′′ cos θ − x2′′ sin θ,
x2′ = x1′′ sin θ + x2′′ cos θ, (∗4)
so that
(b11cos²θ + 2b12cos θ sin θ + b22sin²θ)x1′′² + 2[(b22 − b11)cos θ sin θ + b12(cos²θ − sin²θ)]x1′′x2′′ + (b11sin²θ − 2b12cos θ sin θ + b22cos²θ)x2′′² + b′ = 0. (∗5)
Set the coefficient of x1′′x2′′ equal to zero, i.e.
(b22 − b11)cos θ sin θ + b12(cos²θ − sin²θ) = 0
⇒ tan 2θ = 2b12/(b11 − b22). (∗6)
Choose the smallest positive θ so that (∗6) holds. Then the rotation transforms (∗3) into
b11′x1′′² + b22′x2′′² + b′ = 0, (∗7)
where
b11′ + b22′ = b11 + b22,
b11′ − b22′ = (b11 − b22)cos 2θ + 2b12 sin 2θ,
b12′² − b11′b22′ = b12² − b11b22 (here b12′ = 0),
b11′ = ½[b11 + b22 ± √((b11 − b22)² + 4b12²)],
b22′ = ½[b11 + b22 ∓ √((b11 − b22)² + 4b12²)].
Notice that the choice of the plus and minus signs in b11′ and b22′ should obey the identity b11′b22′ = b11b22 − b12².
Case 2 b11b22 − b12² = 0. Note that, now, b11 + b22 ≠ 0 and b11b22 ≥ 0 hold. From (∗6), let
cos θ = √(b11/(b11 + b22)) and sin θ = ±√(b22/(b11 + b22)) (± is the sign of b12). (∗8)
The rotation
x1 = x1′ cos θ − x2′ sin θ,
x2 = x1′ sin θ + x2′ cos θ
will transform (2.8.48) into
b22′x2′² + 2b1′x1′ + 2b2′x2′ + b = 0, (∗9)
where
b22′ = b11 + b22,
b1′ = b1 cos θ + b2 sin θ,
b2′ = −b1 sin θ + b2 cos θ,
or into b11′x1′² + 2b1′x1′ + 2b2′x2′ + b = 0 with b11′ = b11 + b22. Since (∗9) can be rewritten as
(x2′ + b2′/b22′)² + (2b1′/b22′)[x1′ + (b b22′ − b2′²)/(2b22′b1′)] = 0,
the translation
x1′′ = x1′ + (b b22′ − b2′²)/(2b22′b1′), x2′′ = x2′ + b2′/b22′ (∗10)
will transform (∗9) into
x2′′² = p x1′′, where p = −2b1′/b22′. (∗11)
b22
We summarize (∗7), (∗9) and (∗11) as

The standard forms of quadratic curves
In the Cartesian coordinate system N = {e1, e2}, the quadratic curves are classified into the following nine standard forms, where a1 > 0, a2 > 0 and a > 0.

1. Ellipse: x1²/a1² + x2²/a2² = 1.
2. Imaginary ellipse: x1²/a1² + x2²/a2² = −1.
3. Two intersecting imaginary lines, or point ellipse: x1²/a1² + x2²/a2² = 0.
4. Hyperbola: x1²/a1² − x2²/a2² = 1.
5. Two intersecting lines: x1²/a1² − x2²/a2² = 0.
6. Parabola: x2² = 2ax1.
7. Two parallel lines: x1² = a².
8. Two imaginary parallel lines: x1² = −a².
9. Two coincident lines: x1² = 0. (2.8.49)

See Fig. 2.126.

Fig. 2.126 (ellipse; point ellipse; hyperbola; two intersecting lines; parabola; two parallel lines; two coincident lines)
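For the central case det B ≠ 0, the reduction (∗1)–(∗7) is short enough to script; a sketch (assuming numpy; the function name is ours, and the eigenvalues are b11′, b22′ up to order):

```python
import numpy as np

def central_standard_form(b11, b12, b22, b1, b2, b):
    """Translate to the center (*2), then rotate (*6): returns the
    coefficients (l1, l2, b') of the standard form l1 x^2 + l2 y^2 + b' = 0."""
    B = np.array([[b11, b12], [b12, b22]])
    h, k = np.linalg.solve(B, [-b1, -b2])      # center, as in (*2)
    b_prime = b1 * h + b2 * k + b              # b' of (*3)
    l1, l2 = np.linalg.eigvalsh(B)             # b11', b22' of (*7)
    return l1, l2, b_prime

print(central_standard_form(1, -3, 1, -2, 2, 3))   # Ex. <A> 1(a) below
```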
These nine types of quadratic curves can be characterized respectively, via


point sets, as follows.
1. A bounded set containing (at least) two distinct points.
2. Empty set.
3. A set containing only one point.
4. An unbounded set containing two non-intersecting branches.
5. Two distinct intersecting lines.
6. An unbounded set containing one (connected) branch.
7. Two distinct parallel lines.
8. Empty set.
9. Two coincident lines. (2.8.50)
Unfortunately, we cannot distinguish type 2 from type 8 since both are empty sets. But, in the eyes of the affine complex plane C², both do exist and are of different types. In C², the quadratic curves are simplified to five types only:
(a) Ellipse and hyperbola (combining 1, 2, 4).
(b) Parabola (only 6).
(c) Two intersecting lines (combining 3, 5).
(d) Two parallel lines (combining 7, 8).
(e) Two coincident lines (only 9). (2.8.51)
On the other hand, a careful examination of the coefficients involved
from (∗1 ) to (∗11 ) will lead to

The characterizations of quadratic curves
Let
$$B = \begin{pmatrix} b_{11} & b_{12}\\ b_{12} & b_{22} \end{pmatrix} \in M(2;\mathbb{R}),\ \text{a nonzero symmetric matrix;}$$
let b⃗ = (b1, b2) ∈ R² be a fixed vector; and let b ∈ R be a scalar. Define (refer to Chap. 4 if necessary)
x* = the transpose of x = (x1, x2), a column vector;
⟨x, xB⟩ = x(xB)* = xBx* = Σ²_{i,j=1} bij xi xj (called a quadratic form in x1, x2);
and
⟨b⃗, x⟩ = b⃗x* = Σ²_{i=1} bi xi.
Also, let
$$\Delta = \begin{pmatrix} B & \vec{b}^*\\ \vec{b} & b \end{pmatrix}\quad\text{with}\quad \det\Delta = \begin{vmatrix} b_{11} & b_{12} & b_1\\ b_{12} & b_{22} & b_2\\ b_1 & b_2 & b \end{vmatrix};$$
$$\det B = \begin{vmatrix} b_{11} & b_{12}\\ b_{12} & b_{22} \end{vmatrix} = b_{11}b_{22} - b_{12}^2;\qquad \operatorname{tr} B = b_{11} + b_{22}.$$
The quadratic curve
⟨x, xB⟩ + 2⟨b⃗, x⟩ + b = 0
is, respectively:
1. an ellipse ⇔ det B > 0, det ∆ < 0, tr B > 0;
2. an imaginary ellipse ⇔ det B > 0, det ∆ > 0, tr B > 0;
3. a point ellipse ⇔ det B > 0, det ∆ = 0;
4. a hyperbola ⇔ det B < 0, det ∆ ≠ 0;
5. two intersecting lines ⇔ det B < 0, det ∆ = 0;
6. a parabola ⇔ det B = 0, det ∆ ≠ 0, tr B ≠ 0;
7. two parallel lines ⇔ in case b11 ≠ 0 (or b22 ≠ 0), det B = 0, det ∆ = 0 and
$$\begin{vmatrix} b_{11} & b_1\\ b_1 & b \end{vmatrix} = b_{11}b - b_1^2 < 0\quad\left(\text{or } \begin{vmatrix} b_{22} & b_2\\ b_2 & b \end{vmatrix} < 0\right);$$
8. two imaginary parallel lines ⇔ in case b11 ≠ 0 (or b22 ≠ 0), det B = 0, det ∆ = 0 and
$$\begin{vmatrix} b_{11} & b_1\\ b_1 & b \end{vmatrix} = b_{11}b - b_1^2 > 0\quad\left(\text{or } \begin{vmatrix} b_{22} & b_2\\ b_2 & b \end{vmatrix} > 0\right);$$
9. two coincident lines ⇔ in case b11 ≠ 0 (or b22 ≠ 0), det B = 0, det ∆ = 0 and
$$\begin{vmatrix} b_{11} & b_1\\ b_1 & b \end{vmatrix} = b_{11}b - b_1^2 = 0\quad\left(\text{or } \begin{vmatrix} b_{22} & b_2\\ b_2 & b \end{vmatrix} = 0\right).\qquad (2.8.52)$$
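The irreducible part of these tests is mechanical; a sketch (assuming numpy; the function name and the coarse handling of the degenerate cases are ours, and the ellipse test uses det ∆ · tr B < 0, which amounts to det ∆ < 0 after normalizing tr B > 0):

```python
import numpy as np

def classify_conic(b11, b12, b22, b1, b2, b, eps=1e-12):
    B = np.array([[b11, b12], [b12, b22]])
    Delta = np.array([[b11, b12, b1],
                      [b12, b22, b2],
                      [b1,  b2,  b ]])
    dB, dD = np.linalg.det(B), np.linalg.det(Delta)
    if abs(dD) < eps:
        return "degenerate (point ellipse or a pair of lines)"
    if abs(dB) < eps:
        return "parabola"
    if dB < 0:
        return "hyperbola"
    return "ellipse" if dD * np.trace(B) < 0 else "imaginary ellipse"

print(classify_conic(1, -3, 1, -2, 2, 3))    # hyperbola (Ex. <A> 1(a) below)
```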
The quantity det B = b11b22 − b12² is called the discriminant for quadratic curves. Sometimes the ellipse, hyperbola and parabola (note that in these cases the rank of ∆ is equal to 3) are called non-degenerate conics or irreducible quadratic curves, while the others are called degenerate or reducible (note that the rank r(∆) = 2 for two intersecting lines and two (imaginary) parallel lines, and r(∆) = 1 for type 9). Therefore, in short, an irreducible curve is
an ellipse ⇔ det B > 0, or
a hyperbola ⇔ det B < 0, or
a parabola ⇔ det B = 0.

A point c is called a center of a quadratic curve γ if for each point x on γ, there is another point y on γ such that
c = ½(x + y). (2.8.53)
According to this definition, among the standard forms in (2.8.49), the ellipse, the hyperbola and two intersecting lines all have center at 0; two parallel lines have every point on the line equidistant from both lines as a center; two coincident lines have every point on them as a center; while the parabola does not have a center. A non-degenerate conic with a (unique) center is called a central conic. There are only two such curves, the ellipse and the hyperbola, and the criterion for this is det B ≠ 0.

Remark
Instead of the conventional methods shown from (∗1) to (∗11), one can effectively employ the techniques developed in linear algebra (even up to this point in this text) to give (2.8.49) and (2.8.52) a more concise, systematic proof that can be generalized easily to three-dimensional quadrics or even higher-dimensional ones. This linear-algebraic method mainly contains the following essentials:

(a) Homogeneous coordinates of points in the affine plane R² or in the spaces Rⁿ, n ≥ 2 (refer to Ex. <B> of Sec. 2.6).
(b) The matrix representation of an affine transformation such as (2.8.10) or (2.8.11).
(c) Orthogonal matrices Pn×n, i.e. P* = P⁻¹ (refer to (2.8.32), (2.8.34) and (2.8.37)).
(d) The diagonalization of a symmetric matrix (refer to Sec. 2.7.6, Ex. 2 in Sec. B.11 and Ex. 7 in Sec. B.9). (2.8.54)
As a preview of this method, we appoint Ex. <B> 1 for the readers to practice on. In Secs. 3.8.5, 4.10 and 5.10, we will formulate this method to prove counterparts of (2.8.49) for quadratic curves and quadrics.

Affine point of view about quadratic curves

As we already know from (2.4.2), a change of coordinates between affine bases is a special kind of affine transformation.
Conversely, any affine transformation on R² can be treated as a change of coordinates between some affine bases. For example, for any fixed affine basis B = {a0, a1, a2} and a given affine transformation
T(x) = x0 + xA, (2.8.55)
where A = [aij] ∈ GL(2; R), we
1. consider x0 as [b0]_B for some b0 ∈ R²,
2. construct a new affine basis B′ = {b0, b1, b2}, where bi = ai1(a1 − a0) + ai2(a2 − a0) for i = 1, 2, so that (ai1, ai2) = [bi]_B is the ith row vector of A = A_{B′}^{B}, the transition matrix from B′ to B, and
3. treat x as [y]_{B′}.
Then T is nothing but the change of coordinates from B′ to B: [y]_B = [b0]_B + [y]_{B′}A_{B′}^{B}. Refer to (2.4.2) and Ex. <B> 1 of Sec. 2.4.
Under this convention, we notice the statement in the next paragraph.
Given an arbitrary affine transformation on R²
x = T(y) = y0 + yA,
where y0 ∈ R² is fixed and A = [aij]2×2 ∈ GL(2; R), the image of the quadratic curve (2.8.48), in N = {0, e1, e2}, under T is
(y0 + yA)B(y0 + yA)* + 2b⃗(y0 + yA)* + b = 0
⇒ yABA*y* + 2⟨y0BA* + b⃗A*, y⟩ + (y0By0* + 2⟨b⃗, y0⟩ + b) = 0, (2.8.56)
which is still a quadratic curve, but in the affine basis B′ = {y0, y0 + a1, y0 + a2}, where a1 = (a11, a12) and a2 = (a21, a22) are the row vectors of A.
  

For two quadratic curves γ1 and γ2, no matter in what affine bases, if there exists an affine transformation T on R² mapping γ1 onto γ2, i.e.
T(γ1) = γ2, (2.8.57)
then γ1 and γ2 are said to be affinely equivalent and are classified as being of the same type.
Note This definition is not good for types 2 and 8. We have compensated for this deficiency by introducing the algebraic criteria in (2.8.52) instead of the geometric (and intuitive) criteria in (2.8.50).
As a consequence of (2.8.40), it follows easily from (2.8.50) that quadratic curves of different types in (2.8.49), except types 2 and 8, cannot be affinely equivalent.
However, quadratic curves of the same type in (2.8.49) are indeed affinely equivalent in the sense of (2.8.57).
For example, let γ1 and γ2 be two arbitrary ellipses in the plane R². After a suitable translation and rotation (see Sec. 2.8.2), one can transform γ2 into a new location so that its center coincides with that of γ1 and its major axis lies on that of γ1. Use γ2* to denote this relocated ellipse. Choose the center as the origin of a Cartesian coordinate system and the common major axis as the x1-axis; then γ1 and γ2* can be expressed as
γ1: x1²/a1² + x2²/b1² = 1,
γ2*: x1²/a2² + x2²/b2² = 1.
Then the affine transformation
y1 = (a2/a1)x1, y2 = (b2/b1)x2 (∗12)
transforms γ1 onto γ2*. This means that γ1 and γ2*, and hence γ1 and γ2, are affinely equivalent.
For two imaginary ellipses
x1²/a1² + x2²/b1² = −1, x1²/a2² + x2²/b2² = −1,
or point ellipses or hyperbolas, (∗12) still works.
For two parabolas
x2² = 2a1x1, a1 ≠ 0 and x2² = 2a2x1, a2 ≠ 0,
the affine transformation
y1 = (a1/a2)x1, y2 = x2 (∗13)
will do.
Readers can certainly handle the remaining types in (2.8.49), except types 2 and 8.
We summarize as

The classification of quadratic curves in affine geometry
The quadratic curves are classified into the nine types stated in (2.8.49) under affine transformations (motions). (2.8.58)

We arrange Ex. <A> 3 for readers to prove this result by using (2.8.52) and by observing (2.8.59) below.

Let us come back to (2.8.56) and compute the following quantities (refer to (2.8.52)):
det(ABA*) = det A · det B · det A* = (det A)² det B;
$$\det\Delta' = \det\begin{pmatrix} ABA^* & (y_0BA^* + \vec{b}A^*)^*\\ y_0BA^* + \vec{b}A^* & y_0By_0^* + 2\langle \vec{b}, y_0\rangle + b \end{pmatrix} = \det\left(\begin{pmatrix} A & 0\\ y_0 & 1 \end{pmatrix}\begin{pmatrix} B & \vec{b}^*\\ \vec{b} & b \end{pmatrix}\begin{pmatrix} A & 0\\ y_0 & 1 \end{pmatrix}^*\right) = (\det A)^2\det\Delta;$$
tr(ABA*) = tr(BA*A) = (a11² + a21²)b11 + 2(a11a12 + a21a22)b12 + (a12² + a22²)b22, (∗14)
where A = [aij]2×2.
Since A is invertible and thus det A ≠ 0, we note that det(ABA*) and det B, and det ∆′ and det ∆, have the same signs.
The implication of tr B for tr(ABA*) is less obvious. A partial result is derived as follows. Suppose det B > 0 and tr B > 0 hold. Then
b11b22 > b12² and b11 > 0, b22 > 0
⇒ −√(b11b22) < b12 < √(b11b22)
⇒ tr(ABA*) > (a11² + a21²)b11 − 2(a11a12 + a21a22)√(b11b22) + (a12² + a22²)b22
= (a11√b11 − a12√b22)² + (a21√b11 − a22√b22)² ≥ 0.
Since the inverse of an affine transformation is still affine, the assumptions that det(ABA*) > 0 and tr(ABA*) > 0 would imply that tr B > 0 holds.
We summarize as

The affine invariants of quadratic curves
For a quadratic curve
⟨x, xB⟩ + 2⟨b⃗, x⟩ + b = 0,
the signs or the vanishing of det B and of
$$\det\begin{pmatrix} B & \vec{b}^*\\ \vec{b} & b \end{pmatrix}$$
are affine invariants. In case det B and tr B are positive, the positiveness of tr B is an affine invariant. (2.8.59)

Later in Sec. 4.10, we are going to prove that these three quantities are
Euclidean invariants.
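A random spot-check of (2.8.59) (a sketch assuming numpy; the sample conic is Ex. <A> 1(a) below and the affine map is random):

```python
import numpy as np

rng = np.random.default_rng(0)
B  = np.array([[1., -3.], [-3., 1.]]); bv = np.array([-2., 2.]); b = 3.0
Delta = np.block([[B, bv[:, None]], [bv[None, :], np.array([[b]])]])

A  = rng.normal(size=(2, 2))             # invertible with probability 1
y0 = rng.normal(size=2)
M  = np.block([[A, np.zeros((2, 1))], [y0[None, :], np.ones((1, 1))]])
B2, Delta2 = A @ B @ A.T, M @ Delta @ M.T
print(np.sign(np.linalg.det(B))     == np.sign(np.linalg.det(B2)),
      np.sign(np.linalg.det(Delta)) == np.sign(np.linalg.det(Delta2)))
```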

Exercises
<A>

1. For each of the following quadratic curves, do the following problems:
(1) Determine what type of curve it is.
(2) Determine the affine transformation needed to reduce it to its standard form.
(3) Write out its standard form.
(4) Sketch the curve in the original Cartesian coordinate system N = {0, e1, e2} and in the affine orthonormal basis B where its standard form stands, of course, in the same affine plane R².
(a) x1² − 6x1x2 + x2² − 4x1 + 4x2 + 3 = 0.
(b) 2x1² + 4√3x1x2 + 6x2² + (8 + √3)x1 + (8√3 − 1)x2 = 0.
(c) 2x1² + 3x1x2 − 2x2² + 5x2 − 2 = 0.
(d) 2x1² + 2x1x2 + 2x2² − 4x1 − 2x2 − 1 = 0.
(e) x1² − 2x1x2 + x2² + 2x1 − 2x2 + 1 = 0.
(f) x1² − 2x1x2 + x2² − 3x1 + 3x2 + 2 = 0.
2. Use (∗1)–(∗11) to prove (2.8.52) in detail.
3. Try to use (2.8.52) and (2.8.59) to prove (2.8.58) in detail.
4. (1994 Putnam Examination) Find the value of m so that the line y = mx bisects the region
{(x, y) ∈ R² | x²/4 + y² ≤ 1, x ≥ 0, y ≥ 0}.
For such concepts as center, tangent line, pole and polar, diameter and conjugate diameter for quadratic curves, and the methods to derive them, please refer to Sec. 4.10.

<B>

1. In order to preview the linear-algebraic method indicated in (2.8.54) for the proof of (2.8.52), try the following steps.
(a) Use x1/x3 and x2/x3, where x3 ≠ 0, to replace x1 and x2 respectively in (2.8.48), so that the quadratic curve has equation
b11x1² + 2b12x1x2 + b22x2² + 2b1x1x3 + 2b2x2x3 + bx3² = 0
in the homogeneous coordinates (x1, x2, x3) for the affine plane R². Note that from this equation (2.8.48) may be recovered by putting x3 = 1. Rewrite the above equation in matrix form as
$$\langle x, x\Delta\rangle = 0,\quad\text{where }\Delta = \begin{pmatrix} B & \vec{b}^*\\ \vec{b} & b \end{pmatrix}\ \text{and}\ x = (x_1, x_2, x_3).$$
(b) Rewrite an affine transformation
(y1, y2) → v0 + (y1, y2)A, where v0 ∈ R²,
in the homogeneous form
$$x = y\begin{pmatrix} A & 0\\ v_0 & 1 \end{pmatrix},\quad\text{where } y = (y_1, y_2, y_3).$$
Note that this reduces to the one stated in (2.8.10) or (2.8.11) by putting x3 = y3 = 1.
(c) Then ⟨x, x∆⟩ = x∆x* = 0 under the affine transformation becomes
$$y\begin{pmatrix} A & 0\\ v_0 & 1 \end{pmatrix}\Delta\begin{pmatrix} A & 0\\ v_0 & 1 \end{pmatrix}^*y^* = 0.$$
Now both B and ABA* are symmetric matrices. Since B is diagonalizable, there exists an orthogonal matrix P so that
$$PBP^{-1} = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{pmatrix}$$
is a diagonal matrix. If this P is chosen as A, then the equation becomes
λ1y1² + λ2y2² + 2b1′y1y3 + 2b2′y2y3 + b′y3² = 0
or, by putting y3 = 1,
λ1y1² + λ2y2² + 2b1′y1 + 2b2′y2 + b′ = 0.
(d) Let
$$A = P = \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix},\quad\text{and}\quad \begin{pmatrix} A & 0\\ v_0 & 1 \end{pmatrix}\Delta\begin{pmatrix} A & 0\\ v_0 & 1 \end{pmatrix}^* = \begin{pmatrix} b_{11}' & b_{12}' & b_1'\\ b_{12}' & b_{22}' & b_2'\\ b_1' & b_2' & b' \end{pmatrix}.$$
Try to use bij, bi, b to express bij′, bi′ and b′.
(e) Use the data obtained so far to prove (2.8.49) and (2.8.52).
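The diagonalization in step (c) is one call in practice; a sketch (assuming numpy, with the row/column conventions noted in the comments):

```python
import numpy as np

# numpy's eigh returns B = P diag(lam) P^T with P orthogonal (columns are
# eigenvectors); this is the matrix P of step (c) up to transposition.
B = np.array([[1., -3.], [-3., 1.]])
lam, P = np.linalg.eigh(B)
print(np.allclose(P.T @ B @ P, np.diag(lam)))   # True
```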
The following three problems introduce the classification of affine transformations according to the non-degenerate conics and the corresponding invariant subgroups of Ga(2; R).
2. Elliptic rotation (refer to Ex. <B> 2 of Sec. 2.8.2)
Take the unit circle, in N = {0, e1, e2},
x1² + x2² = 1
as the representative of the ellipses in the affine plane R². The problem is to find all possible affine transformations on R² that keep x1² + x2² = 1 invariant.
(a) Show that an affine transformation x = y0 + yA keeps x1² + x2² = 1 invariant if and only if
yAA*y* + y0A*y* + yAy0* + y0y0* = 1
⇔ yAA*y* = 1 and y0 = 0
⇔ AA* = I2 and y0 = 0,
i.e. y0 = 0 and A* = A⁻¹. Hence, A is an orthogonal matrix, where
$$A = \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}\quad\text{or}\quad A = \begin{pmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta \end{pmatrix}$$
(see (2.8.34) and (2.8.37)).


(b) The set
$$\left\{\begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}\ \middle|\ \theta\in\mathbb{R}\right\}$$
forms a transitive one-parameter subgroup of Ga(2; R), whose members are called elliptic rotations. Transitivity means that for any two points on x1² + x2² = 1, there is a member of this group that transforms one of the points into the other.
3. Hyperbolic rotation (refer to Ex. <B> 1 of Sec. 2.8.2)
Take the hyperbola, in N = {0, e1, e2},
x1x2 = 1
as the representative of hyperbolas. The problem is to find all possible affine transformations on R² that keep x1x2 = 1 invariant.
(a) x = y0 + yA keeps x1x2 = 1 invariant if and only if
$$yA\begin{pmatrix} 0 & \frac{1}{2}\\ \frac{1}{2} & 0 \end{pmatrix}A^*y^* + y_0\begin{pmatrix} 0 & \frac{1}{2}\\ \frac{1}{2} & 0 \end{pmatrix}A^*y^* + yA\begin{pmatrix} 0 & \frac{1}{2}\\ \frac{1}{2} & 0 \end{pmatrix}y_0^* + y_0\begin{pmatrix} 0 & \frac{1}{2}\\ \frac{1}{2} & 0 \end{pmatrix}y_0^* = 1$$
$$\Leftrightarrow\ A\begin{pmatrix} 0 & \frac{1}{2}\\ \frac{1}{2} & 0 \end{pmatrix}A^* = \begin{pmatrix} 0 & \frac{1}{2}\\ \frac{1}{2} & 0 \end{pmatrix}\ \text{and}\ y_0 = 0$$
⇔ (letting A = [aij]2×2) a11 = a22 = 0 and a12a21 = 1, or a12 = a21 = 0 and a11a22 = 1.

(b) The set
$$\left\{\begin{pmatrix} a & 0\\ 0 & \frac{1}{a} \end{pmatrix}\ \middle|\ a\in\mathbb{R}\ \text{and}\ a\neq 0\right\}$$
forms a transitive one-parameter subgroup of Ga(2; R), whose members are called hyperbolic rotations.
(c) The asymptotes x1 = 0 and x2 = 0 of x1x2 = 1 are the only invariant lines under this group. In fact, for each member A of this group,
(x1, 0)A = a(x1, 0) for each x1 ∈ R, and
(0, x2)A = (1/a)(0, x2) for each x2 ∈ R,
so A has eigenvectors (x1, 0) and (0, x2) with respective eigenvalues a and 1/a.
4. Parabolic translation
Take the parabola, in N = {0, e1, e2},
x1 = x2²
as the representative of parabolas. The problem is to find all possible affine transformations that keep x1 = x2² invariant.
(a) Let x = y0 + yA, where y0 = (b1, b2) and A = [aij]2×2 ∈ GL(2; R). Then this affine transformation keeps x1 = x2² invariant if and only if, for y = (y1, y2) with y1 = y2²,
(b1 + a11y1 + a21y2) − (b2 + a12y1 + a22y2)² = 0
⇔ a12 = 0, b1 = b2², a11 = a22² and a21 = 2a22b2
⇔ x1 = a²y1 + 2aby2 + b², x2 = ay2 + b, or
$$x = (b^2, b) + y\begin{pmatrix} a^2 & 0\\ 2ab & a \end{pmatrix},\quad a\neq 0,\ b\in\mathbb{R}.$$
All such transformations form a subgroup, with the two parameters a and b, of Ga(2; R).
(b) Take a = 1. The set
$$\left\{x\mapsto (b^2, b) + x\begin{pmatrix} 1 & 0\\ 2b & 1 \end{pmatrix}\ \middle|\ b\in\mathbb{R},\ x\in\mathbb{R}^2\right\}$$
forms a transitive one-parameter subgroup of Ga(2; R), whose members are called parabolic translations.
(c) The linear part of a parabolic translation,
$$\begin{pmatrix} 1 & 0\\ 2b & 1 \end{pmatrix},$$
is a shearing (refer to Ex. 6 in Sec. 2.7.2 and (2.8.32)), and the x1-axis ⟨⟨e1⟩⟩ is its line of invariant points.
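A spot-check of (a) (a sketch assuming numpy; the sample a, b are ours):

```python
import numpy as np

# T(y) = (b^2, b) + y [[a^2, 0], [2ab, a]] maps the parabola y1 = y2^2 into itself.
a, b = 2.0, -0.5
A  = np.array([[a**2, 0.0], [2*a*b, a]])
y0 = np.array([b**2, b])
t  = np.linspace(-3, 3, 7)
pts = np.column_stack([t**2, t])             # points with y1 = y2^2
img = y0 + pts @ A
print(np.allclose(img[:, 0], img[:, 1]**2))  # True
```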
In what follows, our purpose is to characterize the non-degenerate conics by counting the number of infinite points lying on them. We adopt the ideas and notations used in Ex. <B> of Sec. 2.6. Recall that any point in R² is represented by a certain barycentric coordinate (λ1 : λ2 : λ3) where λ1 + λ2 + λ3 ≠ 0. In case λ1 + λ2 + λ3 = 0, (λ1 : λ2 : λ3) is said to represent an infinite point of the affine plane R². The set of infinite points
l∞ = {(λ1 : λ2 : λ3) | λ1 + λ2 + λ3 = 0}
is called the infinite line of R², so that R² ∪ l∞ = P²(R) is called the projective plane (refer to (3.8.60) and Ex. <A> 13 of Sec. 3.8.4).
5. Any line in the affine plane R² passes through a unique infinite point. Try the following steps.
(a) By Ex. <B> 5 of Sec. 2.6, show that any ordinary line l has equation c1λ1 + c2λ2 + c3λ3 = 0 in barycentric coordinates, where c1, c2 and c3 are not all zero.
(b) Let (λ1* : λ2* : λ3*) be an infinite point on l, if any. Solve
c1λ1* + c2λ2* + c3λ3* = 0,
λ1* + λ2* + λ3* = 0
and get (λ1* : λ2* : λ3*) = (c2 − c3 : c3 − c1 : c1 − c2).
6. Parallel lines in R2 pass through the same infinite point. Try to use
Ex. <B> 5(d) of Sec. 2.6 and Ex. 5.
7. Let ∆A1A2A3 be a coordinate triangle. Show that a quadratic curve Γ passing through the base points A1, A2 and A3 has the equation
c1λ2λ3 + c2λ3λ1 + c3λ1λ2 = 0
in barycentric coordinates. Try to use Ex. <B> 2(c) of Sec. 2.6. Show that the number of infinite points lying on Γ obeys the following criteria: the discriminant
D = c1² + c2² + c3² − 2c2c3 − 2c3c1 − 2c1c2 is
> 0 ⇔ two infinite points,
= 0 ⇔ one infinite point,
< 0 ⇔ none.
8. Let Γ be a quadratic curve passing through the base points of a coordi-
nate triangle. Show that
(a) If Γ is a hyperbola, then there are two infinite points lying on Γ.
(b) If Γ is a parabola, then there is only one infinite point lying on Γ.
(c) If Γ is an ellipse, then there is no infinite point on Γ.
For (c), follow these steps (due to Professor Lu Yang):
(1) Suppose Γ is x²/a² + y²/b² = 1 in N = {0, e1, e2}. Use Ex. <B> 2(c) of Sec. 2.6 to show that, in barycentric coordinates, Γ has equation
c1λ2λ3 + c2λ3λ1 + c3λ1λ2 = 0,
where
c1 = 2(x2x3/a² + y2y3/b² − 1) = −[(x2 − x3)²/a² + (y2 − y3)²/b²],
c2 = 2(x3x1/a² + y3y1/b² − 1) = −[(x3 − x1)²/a² + (y3 − y1)²/b²],
c3 = 2(x1x2/a² + y1y2/b² − 1) = −[(x1 − x2)²/a² + (y1 − y2)²/b²],
and A1 = (x1, y1), A2 = (x2, y2), A3 = (x3, y3) in N = {0, e1, e2}.
(2) By Ex. 7, the discriminant is
D = Qxx/a⁴ + 2Qxy/(a²b²) + Qyy/b⁴,
where Qxx is a polynomial in x1, x2, x3; Qyy is a polynomial in y1, y2, y3; and Qxy is a polynomial in x1, x2, x3, y1, y2, y3.
(3) Show that Qxx = Qyy = 0 and
2Qxy = Qxx + 2Qxy + Qyy = ··· = −16s(s − a1)(s − a2)(s − a3) = −16∆²,
where a1 = A2A3, a2 = A3A1, a3 = A1A2 and ∆ = √(s(s − a1)(s − a2)(s − a3)) is Heron's formula for the area.
(4) Finally, D = −16∆²/(a²b²) < 0.
For (a), let Γ be x²/a² − y²/b² = 1; then
D = −2Qxy/(a²b²) = 16∆²/(a²b²) > 0.
For (b), let Γ be y² = 2px; then
D = Qyy = 0.

Remark As a whole, a quadratic curve
⟨x, xB⟩ + 2⟨b⃗, x⟩ + b = 0
is classified to be
1. of elliptic type ⇔ (algebraic) det B > 0 ⇔ (geometric) containing no infinite point. These comprise types 1, 2, 3.
2. of hyperbolic type ⇔ det B < 0 ⇔ containing two infinite points. These comprise types 4, 5.
3. of parabolic type ⇔ det B = 0 ⇔ containing one infinite point. These comprise types 6, 7, 8, 9.

<C> Abstraction and generalization


See Secs. 3.8.5, 4.10 and 5.10.
CHAPTER 3

The Three-Dimensional Real Vector Space R3

Introduction
In our real world, there does exist a point lying outside a fixed given plane. For example, a lamp (considered as a point) hanging over a desk (considered as a plane) is such a case.
Figure 3.1 shows a point R that is not on the plane Σ. The family of straight lines connecting R to all the points in Σ is considered, in imagination, to form a so-called three-dimensional space, physically inhabited by human beings.


Fig. 3.1

Therefore, we have the

Postulate Four different non-coplanar points determine a unique (three-


dimensional) space.

Here and only in this chapter, the term “space” will always mean three-
dimensional as postulated above. Usually, a parallelepiped including its
interior is considered as a symbolic graph of a space Γ (see Fig. 3.2).
One should be familiar with the following basic facts about space Γ.
(1) Γ contains uncountably many points.
(2) Γ contains the line generated by any two different points in it, the
plane generated by any three non-collinear points in it, and coincides with
the space determined by any four different non-coplanar points in it.


Fig. 3.2

(3) A space Γ possesses the following Euclidean geometric concepts or quantities.
1. Length.
2. Angle.
3. The area of a rectangle is equal to length times width, and therefore,
the area of a parallelogram is equal to height times base (length).
4. The volume of a rectangular box is equal to length times width times
height, and therefore, the volume of a parallelepiped is equal to height
times base area (of a parallelogram).
5. The right-hand system and left-hand system.
See Fig. 3.3.

Fig. 3.3 (right-hand system; left-hand system)

A directed segment PQ in space Γ is called a (space) vector; two are considered identical when both have the same length and the same direction, just as stated in Sec. 2.1. Most important of all, space vectors satisfy all the properties listed in (2.1.11) (refer to Remark 2.1 in Sec. 2.1). Hereafter, in this section, we will feel free to use these operational properties when necessary. What we need is to note the following facts:
1. α ∈ R and x a space vector ⇒ αx and x are collinear vectors in the space Γ.
2. x and y space vectors ⇒ x + y and x, y are coplanar vectors in the space Γ.
3. x, y and z space vectors ⇒ x + y + z and x, y, z may be either coplanar or non-coplanar.
See Fig. 3.4.

Fig. 3.4 (collinear; coplanar; coplanar; non-coplanar)

Sketch of the Content


Refer to the Sketch of the Content in the Introduction of Chap. 2.
The whole theoretical development will be based on the Postulate stated
above.
Since the steps adopted are exactly the same as those in Chap. 2,
only main results and difference between R2 and R3 will be mentioned and
emphasized and the details of proofs will be left to the readers as good
exercises.
Space vectors are already explained in Remark 2.1 of Sec. 2.1. We pro-
ceed at the very beginning to vectorize the space Γ (Sec. 3.1) and coordina-
tize it (Sec. 3.2), then discuss changes of coordinates (Sec. 3.3). The lines
(Sec. 3.4) and the planes (Sec. 3.5) in R3 are introduced.
From Sec. 3.6 to Sec. 3.8, both titles and contents of the sections are
parallel to those of the same numbered sections in Chap. 2.
Note that, in Sec. 3.6, we formally use the affine basis $B = \{\vec{a}_0, \vec{a}_1, \vec{a}_2, \vec{a}_3\}$ to replace the coordinate system $\Gamma(\vec{a}_0; \vec{a}_1, \vec{a}_2, \vec{a}_3)$ introduced in Sec. 3.2 and used thereafter.
Sections 3.4–3.6 lay the foundation for geometric interpretations of
results to be developed in Secs. 3.7 and 3.8.
Linear operators (or transformations, Sec. 3.7) are the main theme in the whole Chap. 3. Besides the routine topics as in Sec. 2.7, the exercises here
contain some applications of the theory to the related fields, such as Markov
processes and differential equations.

Section 3.8 investigates the affine transformations, affine invariants and


geometry on R3 .
Exercises <D> throughout the whole chapter contain the following
topics:

Section 3.7.6: the limit process and the matrices; matrix exponentials;
Markov processes; homogeneous linear system of differential
equations; the nth order linear ordinary differential equation
with constant coefficients.
Section 3.7.7: differential equations.

The primary connections among sections are listed as follows.

3.1 Vectorization of a Space: Affine Structure


Let O, A1, A2 and A3 be four different non-coplanar points in a space Γ. Also, let the space vectors

$$\vec{a}_i = \overrightarrow{OA_i}, \quad i = 1, 2, 3.$$

Then $\vec{a}_i \neq \vec{0}$ for i = 1, 2, 3, and any one of the vectors $\vec{a}_1$, $\vec{a}_2$, $\vec{a}_3$ cannot be expressed as a linear combination of the other two; for example, $\vec{a}_1 \neq \alpha\vec{a}_2 + \beta\vec{a}_3$ for any scalars $\alpha, \beta \in \mathbf{R}$.
Take an arbitrarily fixed point P ∈ Γ. Notice that

Case 1 If O, A1, A2 and P are coplanar, then the vector $\overrightarrow{OP} = x_1\vec{a}_1 + x_2\vec{a}_2$ for some scalars $x_1$ and $x_2$.
Case 2 If O, A3 and P are collinear, then $\overrightarrow{OP} = x_3\vec{a}_3$ for some scalar $x_3$.

Therefore, one may suppose that P is positioned in space Γ so that Cases 1 and 2 cannot occur. Under this circumstance, the points O, A3 and P determine a plane. A unique line L, passing through the point P and lying entirely in that plane, can be drawn parallel to the line generated by O and A3; it intersects the plane generated by O, A1 and A2 at a point Q (see Fig. 3.5).

Fig. 3.5

Hence, there exist scalars $x_1$, $x_2$ and $x_3$ such that

$$\overrightarrow{OP} = \overrightarrow{OQ} + \overrightarrow{QP} \quad\text{with}\quad \overrightarrow{OQ} = x_1\vec{a}_1 + x_2\vec{a}_2 \ \text{ and }\ \overrightarrow{QP} = x_3\vec{a}_3$$
$$\Rightarrow \overrightarrow{OP} = x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3.$$

Conversely, for any fixed scalars $x_1$, $x_2$ and $x_3$, there corresponds a unique point P in space Γ such that $\overrightarrow{OP} = x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3$ holds.
Fix a scalar $x_3 \in \mathbf{R}$. Move the plane Σ(O; A1, A2) along the direction $x_3\vec{a}_3$ up to the parallel plane

$$x_3\vec{a}_3 + \Sigma(O; A_1, A_2) \tag{3.1.1}$$

(see Fig. 3.6). Then, letting $x_3$ run through all the reals, the family of parallel planes (3.1.1) will fill the whole space Γ.
We summarize as (corresponding to (2.2.2))

Algebraic vectorization of a space


Let O, A1, A2 and A3 be any fixed non-coplanar points in a space Γ. Let

$$\vec{a}_i = \overrightarrow{OA_i}, \quad i = 1, 2, 3$$

be space vectors.

Fig. 3.6

(1) The linear combination $x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3$ of the vectors $\vec{a}_1$, $\vec{a}_2$ and $\vec{a}_3$, with corresponding coefficients $x_1$, $x_2$ and $x_3$, is suitable to describe any point P in Γ (i.e. the position vector $\overrightarrow{OP}$). Therefore, the set

$$\Gamma(O; A_1, A_2, A_3) = \{x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3 \mid x_1, x_2, x_3 \in \mathbf{R}\}$$

is called the vectorized space or a coordinate system with the point O as the origin (i.e. $\overrightarrow{OO} = \vec{0}$ as zero vector) and $\vec{a}_1$, $\vec{a}_2$ and $\vec{a}_3$ as basis vectors. It is indeed a vector space over R (see (2.1.11) and Remark 2.2 there), with $\{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$ as an ordered basis (see (3.1.3) and (3.1.4)).
(2) Therefore,

$$\Gamma(O; A_1, A_2, A_3) = L(O; A_1) \oplus L(O; A_2) \oplus L(O; A_3) = L(O; A_3) \oplus \Sigma(O; A_1, A_2) = L(O; A_2) \oplus \Sigma(O; A_1, A_3) = L(O; A_1) \oplus \Sigma(O; A_2, A_3)$$

in the isomorphic sense.    (3.1.2)
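Numerically, producing the coefficients $x_1, x_2, x_3$ of a given position vector $\overrightarrow{OP}$ amounts to one 3 × 3 linear solve; uniqueness of the solution is exactly the invertibility criterion recorded later in (3.3.2). A minimal Python sketch (my own illustration with made-up basis vectors, assuming numpy):

```python
import numpy as np

# rows are made-up, non-coplanar basis vectors a1, a2, a3
A = np.array([[0., 1., 0.],
              [-1., 1., 1.],
              [0., 0., 1.]])
OP = np.array([1., 2., 3.])            # position vector of a point P

# OP = x1*a1 + x2*a2 + x3*a3  <=>  A^T x = OP  (row-vector convention)
x = np.linalg.solve(A.T, OP)
print(x)                               # the coefficients (x1, x2, x3)
print(np.allclose(x @ A, OP))          # True: the decomposition is exact
```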

Corresponding to (2.2.3), we have


Linear dependence of space vectors
The following are equivalent.
(1) (geometric) Five different points O, B1 , B2 , B3 and B4 lie in the same
space Γ.

⇔ (2) (algebraic) Take any one of the five points as a base point and construct four vectors, for example, $\vec{b}_i = \overrightarrow{OB_i}$, 1 ≤ i ≤ 4. Then, at least one of the four vectors $\vec{b}_1$, $\vec{b}_2$, $\vec{b}_3$, $\vec{b}_4$ can be expressed as a linear combination of the other three, i.e. $\vec{b}_4 = y_1\vec{b}_1 + y_2\vec{b}_2 + y_3\vec{b}_3$, etc.
⇔ (3) (algebraic) There exist scalars $y_1, y_2, y_3, y_4$, not all zero, such that $y_1\vec{b}_1 + y_2\vec{b}_2 + y_3\vec{b}_3 + y_4\vec{b}_4 = \vec{0}$.

In any of these cases, $\vec{b}_1$, $\vec{b}_2$, $\vec{b}_3$ and $\vec{b}_4$ are said to be linearly dependent.    (3.1.3)
Also, corresponding to (2.2.4), we have
Linear independence of nonzero vectors in space
The following are equivalent.

(1) (geometric) Four different points O, B1, B2 and B3 are not coplanar (this implies implicitly that any three of them are non-collinear).
⇔ (2) (algebraic) Take any one of the four points as a base point and construct three vectors, for example, $\vec{b}_i = \overrightarrow{OB_i}$, 1 ≤ i ≤ 3. Then, any one of the three vectors $\vec{b}_1$, $\vec{b}_2$, $\vec{b}_3$ cannot be expressed as a linear combination of the other two, i.e. $\vec{b}_3 \neq y_1\vec{b}_1 + y_2\vec{b}_2$ for any scalars $y_1$, $y_2$, etc.
⇔ (3) (algebraic) If there exist scalars $y_1$, $y_2$ and $y_3$ satisfying $y_1\vec{b}_1 + y_2\vec{b}_2 + y_3\vec{b}_3 = \vec{0}$, then it is necessary that $y_1 = y_2 = y_3 = 0$.

In any of these cases, $\vec{b}_1$, $\vec{b}_2$ and $\vec{b}_3$ are said to be linearly independent.    (3.1.4)

It is observed that any one or any two vectors out of three linearly inde-
pendent vectors must be linearly independent, too.
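For concrete vectors, condition (3) of (3.1.4) reduces to a rank (or determinant) test on the 3 × 3 matrix the vectors form, the criterion formalized in (3.3.2) below. A hedged Python sketch, reusing data from Exercise 12 of Sec. 3.2 (the helper name is mine):

```python
import numpy as np

def independent(*vectors):
    """True iff the given vectors in R^3 are linearly independent."""
    return np.linalg.matrix_rank(np.array(vectors)) == len(vectors)

# x1, x2, x3 from Ex. <A> 12 of Sec. 3.2; note that x3 = x1 + x2
x1, x2, x3 = (1., 2., 1.), (-2., 1., 2.), (-1., 3., 3.)
print(independent(x1, x2, x3))   # False: the three are dependent
print(independent(x1, x2))       # True: any two of them are independent
```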

Exercises
<A>

1. Prove (3.1.2)–(3.1.4) in detail.


   
2. Suppose $\vec{b}_1$, $\vec{b}_2$, $\vec{b}_3$, $\vec{b}_4$ are elements of Γ(O; A1, A2, A3).
   (a) Prove that $\vec{b}_1$, $\vec{b}_2$, $\vec{b}_3$ and $\vec{b}_4$ should be linearly dependent.
   (b) If $\vec{b}_1$ and $\vec{b}_2$ are known linearly dependent, then so are $\vec{b}_1$, $\vec{b}_2$ and $\vec{b}_3$.
   (c) If $\vec{b}_1$, $\vec{b}_2$ and $\vec{b}_3$ are linearly independent, is it true that $\vec{b}_1$ and $\vec{b}_2$ are linearly independent? Why?
   (d) Is it possible that any three vectors out of $\vec{b}_1$, $\vec{b}_2$, $\vec{b}_3$ and $\vec{b}_4$ are linearly independent? Why?

<C> Abstraction and generalization

Try to describe Γ(O; A1 , . . . , An ).

3.2 Coordinatization of a Space: R3


Just like (2.3.2), we have

The Coordinatization of a space


Fix an arbitrary vectorized space Γ(O; A1, A2, A3) of a space Γ, with ordered basis $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$, $\vec{a}_i = \overrightarrow{OA_i}$, i = 1, 2, 3. The coordinate of a point P in Γ with respect to B is defined and denoted by the ordered triple

$$[P]_B = [\overrightarrow{OP}]_B = (x_1, x_2, x_3) \iff \overrightarrow{OP} = x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3.$$

Call the set

$$\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)} = \{(x_1, x_2, x_3) \mid x_1, x_2, x_3 \in \mathbf{R}\}$$

the coordinatized space of Γ with respect to B. Explain as follows.

(1) Points in Γ are in one-to-one correspondence with the triples in $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$.
(2) Introduce into $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$ two operations:
1. addition: $(x_1, x_2, x_3) + (y_1, y_2, y_3) = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$,
2. scalar multiplication: $\alpha(x_1, x_2, x_3) = (\alpha x_1, \alpha x_2, \alpha x_3)$, where $\alpha \in \mathbf{R}$,
which have all the properties listed in (2.1.11), treating $(x_1, x_2, x_3)$ as $\vec{x}$, etc. Hence, $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$ is indeed a real vector space.

(3) Define a mapping $\Phi: \Gamma(O; A_1, A_2, A_3) \to \mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$ by

$$\Phi(x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3) = (x_1, x_2, x_3) \quad\text{or}\quad \Phi(\overrightarrow{OP}) = [P]_B;$$

then Φ is one-to-one, onto and preserves the vector operations, i.e.
1. $\Phi(\overrightarrow{OP} + \overrightarrow{OQ}) = [P]_B + [Q]_B \ (= \Phi(\overrightarrow{OP}) + \Phi(\overrightarrow{OQ}))$,
2. $\Phi(\alpha\overrightarrow{OP}) = \alpha[P]_B \ (= \alpha\Phi(\overrightarrow{OP}))$, $\alpha \in \mathbf{R}$.
Φ is called a linear isomorphism between the two vector spaces.
Therefore, conceptually, Γ(O; A1, A2, A3) and $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$ are considered identical.    (3.2.1)

The only main difference between Γ(O; A1, A2, A3) and $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$ lies in the notations used to represent them, even though the former might be more concrete than the latter (see Fig. 3.7).

Γ(O; A1, A2, A3)    Φ    $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$

Fig. 3.7

The fact that four non-coplanar points determine a space is algebraically equivalent to the fact that three linearly independent vectors, no more and no fewer, generate the whole space Γ(O; A1, A2, A3) or $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$. Therefore, they both are called three-dimensional vector spaces.
The diagram in (2.3.3) is still valid for Γ(O; A1, A2, A3) and $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$, replacing Σ(O; A1, A2) and $\mathbf{R}^2_{\Sigma(O;A_1,A_2)}$ there, respectively.
Hence, we introduce the
Standard three-dimensional vector space R3 over R
Let

$$\mathbf{R}^3 = \{(x_1, x_2, x_3) \mid x_1, x_2, x_3 \in \mathbf{R}\}$$

and let $(x_1, x_2, x_3) = (y_1, y_2, y_3)$ mean that $x_i = y_i$, 1 ≤ i ≤ 3. Define on $\mathbf{R}^3$ two operations as follows:
1. addition: $(x_1, x_2, x_3) + (y_1, y_2, y_3) = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$,
2. scalar multiplication: $\alpha(x_1, x_2, x_3) = (\alpha x_1, \alpha x_2, \alpha x_3)$, where $\alpha \in \mathbf{R}$,
which have all the properties listed in (2.1.11), with $(x_1, x_2, x_3)$ as $\vec{x}$ there, etc. Hence, $\mathbf{R}^3$ is a three-dimensional vector space over R. In particular,

zero vector: $\vec{0} = (0, 0, 0)$;
the inverse vector of $\vec{x} = (x_1, x_2, x_3)$: $-\vec{x} = (-x_1, -x_2, -x_3)$;

and the natural basis $N = \{\vec{e}_1, \vec{e}_2, \vec{e}_3\}$ where

$$\vec{e}_1 = (1, 0, 0), \quad \vec{e}_2 = (0, 1, 0), \quad \vec{e}_3 = (0, 0, 1). \tag{3.2.2}$$

As a whole, $\mathbf{R}^3$ is the universal representative of any vectorized space Γ(O; A1, A2, A3) or coordinatized space $\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)}$ of a space Γ, and Γ(O; A1, A2, A3) is a geometric model of $\mathbf{R}^3$.
If the additional requirements that
1. the straight lines OA1, OA2 and OA3 are orthogonal, and
2. OA1 = OA2 = OA3 in length    (3.2.3)
are imposed on Γ(O; A1, A2, A3), then it is called a rectangular or Cartesian coordinate system of $\mathbf{R}^3$; otherwise, it is called an oblique or affine coordinate system (see Fig. 3.8).

rectangular coordinates    affine coordinates
(even less 2)    (less both 1 and 2)

Fig. 3.8

Unless otherwise specified, from now on, $\mathbf{R}^3$ will always be endowed with the rectangular coordinate system with $N = \{\vec{e}_1, \vec{e}_2, \vec{e}_3\}$ as natural basis.
Elements in $\mathbf{R}^3$ are usually denoted by $\vec{x} = (x_1, x_2, x_3)$, $\vec{y} = (y_1, y_2, y_3)$, etc. They represent two kinds of concept as follows.
Case 1 (affine point of view) When $\mathbf{R}^3$ is considered as a space, an element $\vec{x}$ is called a point, and two different points $\vec{x}$ and $\vec{y}$ decide a unique nonzero vector $\vec{y} - \vec{x}$ or $\vec{x} - \vec{y}$.
Case 2 (vector point of view) When considered as a vector space, an element $\vec{x}$ in $\mathbf{R}^3$ is called a vector, pointing from the zero vector $\vec{0}$ toward the point $\vec{x}$.    (3.2.4)
See Fig. 3.9. In short, in Case 1, an arbitrary fixed point can be used as a base point in order to study the position vectors of the other points relative to the base point. If the base point is considered as the zero vector $\vec{0}$, then Case 1 turns into Case 2.

Case 1        Case 2

Fig. 3.9

Remark Reinterpretation of (2) in (3.1.2).


Corresponding to (2.3.5), one has the following:

$$\mathbf{R}^3_{\Gamma(O;A_1,A_2,A_3)} = \mathbf{R}_{L(O;A_1)} \oplus \mathbf{R}_{L(O;A_2)} \oplus \mathbf{R}_{L(O;A_3)} = \mathbf{R}_{L(O;A_1)} \oplus \mathbf{R}^2_{\Sigma(O;A_2,A_3)} = \cdots \tag{3.2.5}$$

and hence, corresponding to (2.3.6), the following:

$$\mathbf{R}^3 = \mathbf{R} \oplus \mathbf{R} \oplus \mathbf{R} = \mathbf{R} \oplus \mathbf{R}^2, \tag{3.2.6}$$

where, in the sense of isomorphism,

$$\mathbf{R} \cong \{(x_1, 0, 0) \mid x_1 \in \mathbf{R}\}, \ \text{etc., and} \quad \mathbf{R}^2 \cong \{(0, x_2, x_3) \mid x_2, x_3 \in \mathbf{R}\}, \ \text{etc.}$$

Exercises
<A>
1. Explain (3.2.1) graphically and prove 1, 2 in (3).
2. Any two vectorized spaces Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3) of a space Γ are isomorphic to each other. Prove this and try to explain it graphically. Indeed, there are infinitely many isomorphisms between them; why?
3. Explain and prove the equivalence of linear dependence as stated in
(3.1.3) and linear independence as stated in (3.1.4), respectively, for R3 .
4. Explain (3.2.5) just like (2.3.5).
5. Explain (3.2.6) just like (2.3.6).

6. Vector (or linear) subspaces of R3 are defined in R3 (instead of R2 )


exactly the same as in Ex. <A> 3 of Sec. 2.3. Try to find out graphically
all vector subspaces of R3 (see Exs. 9 and 10 below).

7. Is it possible, for lines or planes not passing the origin 0 in R3 , to be
subspaces of R3 ? Why?
8. Define what it means for B to be a basis for a vector subspace S of $\mathbf{R}^3$. Prove that any vector in S can be uniquely expressed as a linear combination of vectors from the basis.
(a) Show that any nonzero vector subspace of R3 has a basis.
(b) Show that the numbers of elements in any two bases for a vector
subspace S are the same. This common number, denoted as
dim S,

is called the dimension of S. Note that dim { 0 } = 0.
9. Model after Ex. <A> 3 of Sec. 2.3 to characterize any one-dimensional
vector subspace of R3 .
10. Let S be a vector subspace of R3 . Prove that the following are
equivalent.
(a) dim S = 2.
(b) There are two linearly independent vectors $\vec{x}_1$ and $\vec{x}_2$ in $\mathbf{R}^3$ so that
    $S = \langle \vec{x}_1, \vec{x}_2 \rangle = \{\alpha_1\vec{x}_1 + \alpha_2\vec{x}_2 \mid \alpha_1, \alpha_2 \in \mathbf{R}\}$.
    In this case, S is called generated or spanned by $\vec{x}_1$ and $\vec{x}_2$.
(c) There exist scalars $a_1$, $a_2$ and $a_3$, not all zero, so that
    $S = \{\vec{x} = (x_1, x_2, x_3) \mid a_1x_1 + a_2x_2 + a_3x_3 = 0\}$.
    In this case, simply call S a plane passing $\vec{0} = (0, 0, 0)$ and denote S by its equation $a_1x_1 + a_2x_2 + a_3x_3 = 0$.
11. Let S1 and S2 be subspaces of R3 .
(a) Show that the intersection S1 ∩ S2 and the sum S1 + S2 (see

Ex. <A> 5 of Sec. 2.3) are subspaces of R3 . In case S1 ∩ S2 = { 0 },
denote S1 + S2 by
S1 ⊕ S2 (the direct sum of S1 and S2 ).
(b) Prove that
dim(S1 ∩ S2 ) + dim(S1 + S2 ) = dim S1 + dim S2

so that S1 ∩ S2 = { 0 } if and only if dim(S1 + S2 ) = dim S1 +
dim S2 .

(c) For a given subspace S1 of R3 , try to find all possible subspaces S2


of R3 so that
R3 = S1 ⊕ S2 .
(d) Suppose S1 ⊆ S2 . Prove that the following are equivalent.
(1) S1 = S2 .
(2) S1 ∩ S2 = S2 .
(3) dim S1 = dim S2 .
(4) dim(S1 + S2 ) = dim S1 .
12. Let $\vec{x}_1 = (1, 2, 1)$, $\vec{x}_2 = (-2, 1, 2)$, $\vec{x}_3 = (-1, 3, 3)$, $\vec{x}_4 = (0, -3, -5)$ and $\vec{x}_5 = (2, -9, -11)$ be vectors in $\mathbf{R}^3$.
    (a) Show that $S = \{\vec{x}_1, \vec{x}_2, \vec{x}_3, \vec{x}_4, \vec{x}_5\}$ generates $\mathbf{R}^3$.
(b) Find all subsets of S that are bases for R3 . How many ordered bases
for R3 can be chosen from vectors in S? List them out.
(c) Find all linearly independent subsets of S and all linearly dependent
subsets of S.
(d) Construct all possible but different subspaces of R3 from vectors
in S. Among these subspaces, which are the intersections of the
other two? Which are the sums of the other two?
13. Let S1 = {(2, −3, 1), (1, 4, −2), (5, −2, 0), (1, −7, 3)}, and S2 =
{(4, −17, 7), (0, 6, 1)}.
(a) Determine the subspaces ⟨S1⟩ and ⟨S2⟩ of $\mathbf{R}^3$ generated by S1 and S2, respectively. Write out their equations in N (see Exs. 9 and 10).
(b) Determine ⟨S1⟩ ∩ ⟨S2⟩ and ⟨S1⟩ + ⟨S2⟩ and write out their equations.
14. Give the system of homogeneous linear equations:

$$\frac{1}{3}x_1 + 2x_2 - 6x_3 = 0,$$
$$-4x_1 + 5x_3 = 0,$$
$$-3x_1 + 6x_2 - 13x_3 = 0,$$
$$-\frac{11}{3}x_1 + 2x_2 - x_3 = 0,$$

where $\vec{x} = (x_1, x_2, x_3)$ is in N.
(a) Solve the equations. Denote by S the set of all such solutions and
try to explain S geometrically.
(b) Show that S is a subspace of R3 and find a basis for it.

15. Let $\vec{x}_1 = (3, 1, 1)$, $\vec{x}_2 = (2, 5, -1)$, $\vec{x}_3 = (1, -4, 2)$, $\vec{x}_4 = (4, -3, 3)$ be vectors in $\mathbf{R}^3$.
    (a) Determine the subspace $\langle \vec{x}_1, \vec{x}_2, \vec{x}_3, \vec{x}_4 \rangle$ and its dimension.
    (b) Let $\vec{y}_1 = (5, 6, 0)$, $\vec{y}_2 = (8, 7, 1)$. Show that $\vec{y}_1$ and $\vec{y}_2$ are linearly independent vectors in $\langle \vec{x}_1, \vec{x}_2, \vec{x}_3, \vec{x}_4 \rangle$.
    (c) Which two vectors of $\vec{x}_1, \vec{x}_2, \vec{x}_3$ and $\vec{x}_4$ can be replaced by $\vec{y}_1$ and $\vec{y}_2$, say $\vec{x}_i$ and $\vec{x}_j$, so that $\langle \vec{y}_1, \vec{y}_2, \vec{x}_{i'}, \vec{x}_{j'} \rangle = \langle \vec{x}_1, \vec{x}_2, \vec{x}_3, \vec{x}_4 \rangle$, where $\vec{x}_{i'}$ and $\vec{x}_{j'}$ are the remaining two vectors of $\vec{x}_1, \vec{x}_2, \vec{x}_3$ and $\vec{x}_4$? In how many ways?
    (Note To do (b), you need algebraic computation to justify the claim there. Once this procedure has been finished, can you figure out any geometric intuition on which a formal but algebraic proof for (c) will rely? For generalization, see Steinitz's Replacement Theorem in Sec. B.3.)
16. Find scalar k so that the vectors (k, 1, 0), (1, k, 1) and (0, 1, k) are
linearly dependent.
17. Find the necessary and sufficient conditions so that the vectors $(1, a_1, a_1^2)$, $(1, a_2, a_2^2)$ and $(1, a_3, a_3^2)$ are linearly dependent.
<B>
Review the comments in Ex. <C> of Sec. 2.3 and then try to extend prob-
lems in Ex. <B> there to counterparts in R3 (or Fn or more abstract vector
space, if possible) and prove them true or false.
1. Suppose $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k$ for k ≥ 2 are linearly dependent vectors in $\mathbf{R}^3$. Let $\vec{x}$ be an arbitrary vector in $\mathbf{R}^3$.
   (a) The vectors $\vec{x}_1 + \vec{x}, \vec{x}_2 + \vec{x}, \ldots, \vec{x}_k + \vec{x}$ may be linearly dependent or independent. Find conditions that guarantee their dependence or independence and explain these conditions geometrically.
   (b) There do exist scalars $\alpha_1, \ldots, \alpha_k$, not all zeros, so that $\vec{x}_1 + \alpha_1\vec{x}, \vec{x}_2 + \alpha_2\vec{x}, \ldots, \vec{x}_k + \alpha_k\vec{x}$ are always linearly dependent. Any geometrical meaning?
2. Let $\{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ be a basis for $\mathbf{R}^3$. For any vector $\vec{x} \in \mathbf{R}^3$, at most one of the vectors $\vec{x}, \vec{x}_1, \vec{x}_2$ and $\vec{x}_3$ can be represented as a linear combination of the preceding ones.
3. Suppose $\{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ is a basis for $\mathbf{R}^3$. Let $\vec{x} \in \mathbf{R}^3$ be such that $\vec{x}$ can be expressed as linear combinations of any two of the vectors $\vec{x}_1, \vec{x}_2, \vec{x}_3$. Show that $\vec{x} = \vec{0}$ both algebraically and geometrically.
4. Let $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k$ for 2 ≤ k ≤ 4 be linearly dependent vectors in $\mathbf{R}^3$ such that any (k − 1) vectors of them are linearly independent.
   (a) If $a_1\vec{x}_1 + a_2\vec{x}_2 + \cdots + a_k\vec{x}_k = \vec{0}$ for some scalars $a_1, a_2, \ldots, a_k$, show that either $a_1a_2\cdots a_k \neq 0$ or $a_1 = a_2 = \cdots = a_k = 0$.
   (b) In case $a_1a_2\cdots a_k \neq 0$ and $b_1\vec{x}_1 + b_2\vec{x}_2 + \cdots + b_k\vec{x}_k = \vec{0}$ also holds, then $a_1 : b_1 = a_2 : b_2 = \cdots = a_k : b_k$.
5. Do Ex. <C>6 of Sec. 2.3 in case n = 3 and V = R3 .

<C> Abstraction and generalization


Do the problems in Ex. <C> of Sec. 2.3 if you missed them at that moment. The problems that follow concern vector spaces such as $\mathbf{F}^n$, M(m, n; F), P(F), $P_n(F)$ and F(X, F). For definitions, see Sec. B.1. Readers are required to be able to extend what has been learned in R, $\mathbf{R}^2$ and $\mathbf{R}^3$ to abstract vector spaces over a field F, not to say about geometric meanings, but at least about linear-algebraic computational techniques.
1. It is well-known that P2 (R) has a natural basis {1, x, x2 }.
(a) Let a0 , a1 and a2 be distinct real scalars. Show that the unique poly-
nomial p0 (x) ∈ P2 (R) satisfying
p0 (aj ) = δ0j , j = 0, 1, 2
is

$$p_0(x) = \frac{(x - a_1)(x - a_2)}{(a_0 - a_1)(a_0 - a_2)}.$$
Find other two polynomials p1 (x), p2 (x) ∈ P2 (R) so that pi (aj ) = δij
for i = 1, 2 and j = 0, 1, 2.
(b) Show that {p0 , p1 , p2 } is a basis for P2 (R).
(c) Construct three different bases for P2 (R).
(d) Not every basis for P2 (R) is derived as in (a) and (b). Show that
{x2 + 3x − 2, 2x2 + 5x − 3, −x2 − 4x + 4} is a basis for P2 (R).
(e) Show that {1 − 2x + 4x2 , 2 + x − 2x2 , −1 − x + 2x2 } is not a basis
for P2 (R), but any two of them are linearly independent. Try to
construct three different bases for P2 (R) from this set of vectors.
(Note For generalization, see Sec. B.3.)
2. Let r1 , . . . , rn be distinct real numbers. Show that {er1 x , er2 x , . . . , ern x }
are linearly independent in F(R, R).
3. Denote by N the set of positive integers. An element in F(N, F) is called
a sequence in F and is denoted by {an }. Let
V = {{an } ∈ F(N, F) | an = 0 for only a finite number of n}.

(a) Show that V is a proper subspace of F(N, F).


(b) Find a basis for V . Is V finite-dimensional?
4. In M(2; R), let

$$V = \left\{ \begin{pmatrix} a & -a \\ b & c \end{pmatrix} \,\middle|\, a, b, c \in \mathbf{R} \right\} \quad\text{and}\quad W = \left\{ \begin{pmatrix} d & e \\ -d & f \end{pmatrix} \,\middle|\, d, e, f \in \mathbf{R} \right\}.$$
(a) Show that V and W are subspaces and dim V = dim W = 3. Find a
basis for each of V and W .
(b) Find a basis for each of V ∩ W and V + W .
(c) Show that dim V + dim W = dim(V ∩ W ) + dim(V + W ).
5. Remind that M(n; C) is the n-dimensional complex vector space, con-
sisting of all complex matrices of order n, while M(n; R) is the real one.
Let
SL(n; R) = {A ∈ M(n; R) | tr A = 0};
SL(n; iR) = {A ∈ M(n; C) | entries of A are all pure imaginaries and
tr A = 0};
⟨E11, iE11⟩ = the subspace of M(n; C) generated by {E11, iE11} over the reals; similarly ⟨E11⟩ and ⟨iE11⟩;
M(n; iR) = {A ∈ M(n; C) | entries of A are all pure imaginaries};
S(n; R) = {A ∈ M(n; R) | A∗ = A};
T (n; R) = {A ∈ M(n; R) | A∗ = −A}.
(a) Show that M(n; C) is 2n²-dimensional over R. Find a basis for it.
(b) Show that both SL(n; R) and SL(n; iR) are (n2 − 1)-dimensional
over R. Find a basis for each of them.
(c) Show that S(n; R) is an $\frac{n(n+1)}{2}$-dimensional space over R. Find a basis for it.
(d) Show that T(n; R) is an $\frac{n(n-1)}{2}$-dimensional space over R. Find a basis for it.
(e) What about SL(n; iR) and M(n; iR)?
(f) Show that
M(n; C) = SL(n; R) ⊕ ⟨E11, iE11⟩ ⊕ SL(n; iR);
M(n; R) = SL(n; R) ⊕ ⟨E11⟩
        = S(n; R) ⊕ T(n; R),
etc. See the following diagram.

M(n; C)
(2n2 )
@
@
SL(n; R) E11 , iE11  SL(n; iR)
(n2 − 1) (2) (n2 − 1)
@@ @
@
SL(n; R) ⊕ E11  SL(n; iR) ⊕ iE11 
 
M(n; R) M(n; iR)
(n2 ) (n2 )
@
@
S(n; R) T (n; R)
  
n(n + 1) n(n − 1)
2 2
6. Find dimensions and bases for real vector spaces:
SU(n; C) = {A ∈ M(n; C) | Ā∗ = −A (Skew-Hermitian) and tr A = 0};
SH(n; C) = {A ∈ M(n; C) | Ā∗ = A (Hermitian) and tr A = 0}.

3.3 Changes of Coordinates: Affine Transformation


(or Mapping)

Take points $\vec{0}$, $\vec{e}_1$, $\vec{e}_2$ and $\vec{e}_3$ in $\mathbf{R}^3$ and construct the (rectangular or affine) coordinate system $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$ as in (3.2.3) with ordered basis $N = \{\vec{e}_1, \vec{e}_2, \vec{e}_3\}$. Then, for a point $\vec{x} = (x_1, x_2, x_3) \in \mathbf{R}^3$,

$$\vec{x} = x_1\vec{e}_1 + x_2\vec{e}_2 + x_3\vec{e}_3 \Rightarrow [\vec{x}]_N = (x_1, x_2, x_3) = \vec{x}, \tag{3.3.1}$$

i.e. the coordinate of $\vec{x}$ with respect to the basis N is the vector $\vec{x}$ itself. That is why $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$ is called the natural coordinate system of $\mathbf{R}^3$ and N the natural basis of $\mathbf{R}^3$. This is the most commonly used coordinate system in $\mathbf{R}^3$ and, unless specified, will be adopted throughout without mentioning and indicating the notation $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$.
The equivalent statements in Ex. <A> 2 of Sec. 2.4 can be, quite similarly, extended to $\mathbf{R}^3$. For convenience and later usage, we list them in

The equivalence of linear independence for vectors in R3


Let $\vec{x}_i = (x_{i1}, x_{i2}, x_{i3})$, i = 1, 2, 3, be vectors in $\mathbf{R}^3$. Then the following are equivalent.

(1) $\vec{x}_1$, $\vec{x}_2$, $\vec{x}_3$ are linearly independent (i.e. if $\alpha_1\vec{x}_1 + \alpha_2\vec{x}_2 + \alpha_3\vec{x}_3 = \vec{0}$, then $\alpha_1 = \alpha_2 = \alpha_3 = 0$) and therefore form a basis $\{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ for $\mathbf{R}^3$ (i.e. any $\vec{x} \in \mathbf{R}^3$ can be expressed uniquely as $\vec{x} = \alpha_1\vec{x}_1 + \alpha_2\vec{x}_2 + \alpha_3\vec{x}_3$).
⇔ (2) The determinant, formed by $\vec{x}_1$, $\vec{x}_2$, $\vec{x}_3$ as row vectors,

$$\begin{vmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{vmatrix} = x_{11}x_{22}x_{33} + x_{12}x_{23}x_{31} + x_{13}x_{21}x_{32} - x_{13}x_{22}x_{31} - x_{12}x_{21}x_{33} - x_{11}x_{23}x_{32},$$

has nonzero value.
⇔ (3) The matrix, formed by $\vec{x}_1$, $\vec{x}_2$, $\vec{x}_3$ as row vectors,

$$\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix}$$

is invertible (i.e. if denoted by A, there exists another 3 × 3 matrix B such that $AB = BA = I_3$; B, denoted by $A^{-1}$, is called the inverse matrix of A). In this case, the inverse matrix is

$$A^{-1} = \frac{1}{\det A}\,\operatorname{adj} A \quad\text{with}\quad \operatorname{adj} A = \begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix},$$

the adjoint matrix of A, where det A is as in (2) and $A_{ij}$ is the cofactor of $x_{ij}$ in det A, i.e. the 2 × 2 determinant obtained by deleting the ith row and the jth column from det A, multiplied by $(-1)^{i+j}$. For example,

$$A_{11} = (-1)^{1+1}\begin{vmatrix} x_{22} & x_{23} \\ x_{32} & x_{33} \end{vmatrix}, \quad A_{12} = (-1)^{1+2}\begin{vmatrix} x_{21} & x_{23} \\ x_{31} & x_{33} \end{vmatrix}, \quad\text{etc.} \tag{3.3.2}$$

For details about matrix and determinant, please refer to Secs. B.4–B.6.
Perhaps, it might be easy for the readers to prove statements in Ex. <A>
2 of Sec. 2.4. But by exactly the same method you had experienced there,
is it easy for you to prove the extended results in (3.3.2)? Where are the
difficulties one might encounter? Can one find any easier way to prove
them? Refer to Ex. <B> 3.
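Whatever proof one settles on, the cofactor recipe in (3) is easy to check numerically. The sketch below (Python with numpy; the helper name is my own) builds adj A entry by entry and confirms $A^{-1} = \frac{1}{\det A}\,\operatorname{adj} A$ against the library inverse:

```python
import numpy as np

def adjoint(A):
    """Adjoint of a 3x3 matrix: (adj A)[j, i] = cofactor of entry (i, j)."""
    adj = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)  # transposed
    return adj

A = np.array([[0., 1., 0.],
              [-1., 1., 1.],
              [0., 0., 1.]])
print(np.allclose(adjoint(A) / np.linalg.det(A), np.linalg.inv(A)))  # True
```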
At least, for this moment, (3.3.2) is helpful in the computation of the

Coordinate changes of two coordinate systems in R³
Let

Γ(O; A1, A2, A3) with basis $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$, $\vec{a}_i = \overrightarrow{OA_i}$, 1 ≤ i ≤ 3, and
Γ(O′; B1, B2, B3) with basis $B' = \{\vec{b}_1, \vec{b}_2, \vec{b}_3\}$, $\vec{b}_i = \overrightarrow{O'B_i}$, 1 ≤ i ≤ 3

be two coordinate systems of $\mathbf{R}^3$. Then the coordinates $[P]_B$ and $[P]_{B'}$ of a point P in $\mathbf{R}^3$ obey the following formulas of changes of coordinates (called affine transformations or mappings)

$$x_i = \alpha_i + \sum_{j=1}^{3} y_j\alpha_{ji}, \quad i = 1, 2, 3, \quad\text{and}\quad y_j = \beta_j + \sum_{i=1}^{3} x_i\beta_{ij}, \quad j = 1, 2, 3,$$

simply denoted by

$$[P]_B = [O']_B + [P]_{B'}A^{B'}_{B} \quad\text{and}\quad [P]_{B'} = [O]_{B'} + [P]_B A^{B}_{B'},$$

where $[O']_B = (\alpha_1, \alpha_2, \alpha_3)$, $[O]_{B'} = (\beta_1, \beta_2, \beta_3)$, $[P]_B = (x_1, x_2, x_3)$, $[P]_{B'} = (y_1, y_2, y_3)$ and

$$A^{B'}_{B} = \begin{pmatrix} [\vec{b}_1]_B \\ [\vec{b}_2]_B \\ [\vec{b}_3]_B \end{pmatrix} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix}, \text{ the transition matrix of B′ with respect to B,}$$
$$A^{B}_{B'} = \begin{pmatrix} [\vec{a}_1]_{B'} \\ [\vec{a}_2]_{B'} \\ [\vec{a}_3]_{B'} \end{pmatrix} = \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} \\ \beta_{21} & \beta_{22} & \beta_{23} \\ \beta_{31} & \beta_{32} & \beta_{33} \end{pmatrix}, \text{ the transition matrix of B with respect to B′,}$$

satisfying:
1. The determinants $\det A^{B'}_{B} \neq 0$ and $\det A^{B}_{B'} \neq 0$.
2. The matrices $A^{B'}_{B}$ and $A^{B}_{B'}$ are inverse to each other, i.e.

$$A^{B'}_{B}A^{B}_{B'} = A^{B}_{B'}A^{B'}_{B} = I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

and therefore (see (3) in (3.3.2))

$$A^{B}_{B'} = \left(A^{B'}_{B}\right)^{-1}; \quad A^{B'}_{B} = \left(A^{B}_{B'}\right)^{-1}.$$

3. Hence, these two formulas are reversible, i.e.

$$[P]_{B'} = -[O']_B\left(A^{B'}_{B}\right)^{-1} + [P]_B\left(A^{B'}_{B}\right)^{-1} \quad\text{and}\quad [O]_{B'} = -[O']_B\left(A^{B'}_{B}\right)^{-1} = -[O']_B A^{B}_{B'}.$$

In particular, if O = O′, then $[O']_B = [O]_{B'} = (0, 0, 0)$ and the affine mapping is usually called a linear mapping or transformation.    (3.3.3)
The proofs (compare with those of (2.4.2)) are left to the readers. See
Fig. 3.10 (compare with Fig. 2.19).
Fig. 3.10

Remark The computations of $A^{B'}_{B}$ and $A^{B}_{B'}$ (extending (2.4.3)).
Adopt the notations in (3.3.3) and use the results in (3.3.2) to help the computation.
Let $\vec{a}_i = (a_{i1}, a_{i2}, a_{i3})$ and $\vec{b}_i = (b_{i1}, b_{i2}, b_{i3})$, i = 1, 2, 3; also

$$A = \begin{pmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \vec{a}_3 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} \vec{b}_1 \\ \vec{b}_2 \\ \vec{b}_3 \end{pmatrix} = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}.$$

Then A and B are invertible (see (3) in (3.3.2)).
By assumption, $[\vec{b}_i]_B = (\alpha_{i1}, \alpha_{i2}, \alpha_{i3})$, i = 1, 2, 3, and then

$$\vec{b}_i = \alpha_{i1}\vec{a}_1 + \alpha_{i2}\vec{a}_2 + \alpha_{i3}\vec{a}_3 = [\vec{b}_i]_B\begin{pmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \vec{a}_3 \end{pmatrix} = [\vec{b}_i]_B A \quad (\text{remember that } [\vec{b}_i]_B \text{ is viewed as a } 1 \times 3 \text{ matrix})$$
$$\Rightarrow [\vec{b}_i]_B = \vec{b}_i A^{-1}, \quad i = 1, 2, 3.$$

Similarly,

$$[\vec{a}_i]_{B'} = \vec{a}_i B^{-1}, \quad i = 1, 2, 3.$$

Therefore,

$$A^{B'}_{B} = \begin{pmatrix} \vec{b}_1 A^{-1} \\ \vec{b}_2 A^{-1} \\ \vec{b}_3 A^{-1} \end{pmatrix} = \begin{pmatrix} \vec{b}_1 \\ \vec{b}_2 \\ \vec{b}_3 \end{pmatrix} A^{-1} = BA^{-1} \quad\text{and}\quad A^{B}_{B'} = \begin{pmatrix} \vec{a}_1 B^{-1} \\ \vec{a}_2 B^{-1} \\ \vec{a}_3 B^{-1} \end{pmatrix} = \begin{pmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \vec{a}_3 \end{pmatrix} B^{-1} = AB^{-1} \tag{3.3.4}$$

are the required formulas.

Example Give two sets of points

O = (1, 0, 0), A1 = (1, 1, 0), A2 = (0, 1, 1), A3 = (1, 0, 1), and
O′ = (−1, −1, −1), B1 = (1, −1, 1), B2 = (−1, 1, 1), B3 = (1, 1, 1)

in $\mathbf{R}^3$. Construct two coordinate systems Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3) and establish the formulas of changes of coordinates between them.

Solution Firstly, by simple computation, one has

$$\vec{a}_1 = \overrightarrow{OA_1} = (1, 1, 0) - (1, 0, 0) = (0, 1, 0),$$
$$\vec{a}_2 = \overrightarrow{OA_2} = (0, 1, 1) - (1, 0, 0) = (-1, 1, 1),$$
$$\vec{a}_3 = \overrightarrow{OA_3} = (1, 0, 1) - (1, 0, 0) = (0, 0, 1),$$
$$\vec{b}_1 = \overrightarrow{O'B_1} = (1, -1, 1) - (-1, -1, -1) = (2, 0, 2),$$
$$\vec{b}_2 = \overrightarrow{O'B_2} = (-1, 1, 1) - (-1, -1, -1) = (0, 2, 2),$$
$$\vec{b}_3 = \overrightarrow{O'B_3} = (1, 1, 1) - (-1, -1, -1) = (2, 2, 2), \quad\text{and}$$
$$\overrightarrow{OO'} = (-1, -1, -1) - (1, 0, 0) = (-2, -1, -1) = -\overrightarrow{O'O}.$$

Then, let

$$A = \begin{pmatrix} \vec{a}_1 \\ \vec{a}_2 \\ \vec{a}_3 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} \vec{b}_1 \\ \vec{b}_2 \\ \vec{b}_3 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix}$$

and compute, by the method indicated in (3) of (3.3.2),

$$A^{-1} = \begin{pmatrix} 1 & -1 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad B^{-1} = -\frac{1}{8}\begin{pmatrix} 0 & 4 & -4 \\ 4 & 0 & -4 \\ -4 & -4 & 4 \end{pmatrix} = \begin{pmatrix} 0 & -\frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{pmatrix}.$$

Then,

$$A^{B'}_{B} = BA^{-1} = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix}\begin{pmatrix} 1 & -1 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -2 & 4 \\ 2 & 0 & 2 \\ 4 & -2 & 4 \end{pmatrix},$$
$$A^{B}_{B'} = AB^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & -\frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{pmatrix}.$$

Finally, using (3.3.3), for any point P in $\mathbf{R}^3$ we have the following formulas for changes of coordinates:

$$[P]_B = (-3 \;\; 2 \;\; -3) + [P]_{B'}\begin{pmatrix} 2 & -2 & 4 \\ 2 & 0 & 2 \\ 4 & -2 & 4 \end{pmatrix} \quad\text{and}\quad [P]_{B'} = \left(0 \;\; -\tfrac{1}{2} \;\; 1\right) + [P]_B\begin{pmatrix} -\frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{pmatrix}.$$

Let $[P]_B = (x_1, x_2, x_3)$ and $[P]_{B'} = (y_1, y_2, y_3)$; the above two equations can be rewritten, respectively, as

$$x_1 = -3 + 2y_1 + 2y_2 + 4y_3,$$
$$x_2 = 2 - 2y_1 - 2y_3,$$
$$x_3 = -3 + 4y_1 + 2y_2 + 4y_3, \quad\text{and}$$

$$y_1 = -\frac{1}{2}x_1 + \frac{1}{2}x_3,$$
$$y_2 = -\frac{1}{2} + x_2 + \frac{1}{2}x_3,$$
$$y_3 = 1 + \frac{1}{2}x_1 - \frac{1}{2}x_2 - \frac{1}{2}x_3.$$
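All of the arithmetic above is easily double-checked by machine. A small numpy sketch (my own illustration; the data are exactly those of this example):

```python
import numpy as np

A = np.array([[0., 1., 0.], [-1., 1., 1.], [0., 0., 1.]])  # rows a1, a2, a3
B = np.array([[2., 0., 2.], [0., 2., 2.], [2., 2., 2.]])   # rows b1, b2, b3

print(B @ np.linalg.inv(A))        # transition matrix of B' w.r.t. B
print(A @ np.linalg.inv(B))        # transition matrix of B w.r.t. B'

OOp = np.array([-2., -1., -1.])    # the vector OO'
print(OOp @ np.linalg.inv(A))      # [O']_B  = (-3,  2,  -3)
print(-OOp @ np.linalg.inv(B))     # [O]_B'  = ( 0, -1/2,  1)
```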

Exercises
<A>

1. Prove (3.3.2) in detail, if possible (otherwise, see Ex. <B> 3).


2. Prove (3.3.3) in detail.
3. Let
O = (1, −2, 0), A1 = (1, 1, 2), A2 = (−1, 2, −1), A3 = (−2, −2, 3),
and
O′ = (−1, −1, −2), B1 = (1, 1, 0), B2 = (0, 1, −1), B3 = (−1, 0, 1)

be two sets of points in $\mathbf{R}^3$. Construct graphically two coordinate systems Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3) with $\vec{a}_i = \overrightarrow{OA_i}$ and $\vec{b}_i = \overrightarrow{O'B_i}$, i = 1, 2, 3.
(a) Prove that the vectors $\vec{a}_1$, $\vec{a}_2$, $\vec{a}_3$ are linearly independent. Hence, $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$ is an ordered basis for Γ(O; A1, A2, A3).
(b) Prove, similarly, that $B' = \{\vec{b}_1, \vec{b}_2, \vec{b}_3\}$ indeed is an ordered basis for Γ(O′; B1, B2, B3).
(c) Find formulas of changes of coordinates.
4. Let Γ(O′; B1, B2, B3) be a coordinate system in $\mathbf{R}^3$ with basis $B' = \{\vec{b}_1, \vec{b}_2, \vec{b}_3\}$, where $\vec{b}_i = \overrightarrow{O'B_i}$, i = 1, 2, 3. Suppose a point $\vec{x} = (x_1, x_2, x_3) \in \mathbf{R}^3$ has coordinate $[\vec{x}]_{B'} = (y_1, y_2, y_3)$ with respect to B′, satisfying

x1 = −6 + y1 − y2 + 2y3,
x2 = 5 + y2 − y3,
x3 = 2 + y1 + y2 + y3.

Determine the positions of the points O′, B1, B2 and B3; that is, the coordinates of these points with respect to the natural basis of $\mathbf{R}^3$.
5. Let the points O, A1, A2 and A3 be as in Ex. 3, with $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$, where $\vec{a}_i = \overrightarrow{OA_i}$, i = 1, 2, 3, a basis for Γ(O; A1, A2, A3). If a point $\vec{x} \in \mathbf{R}^3$ has coordinate $[\vec{x}]_B = (x_1, x_2, x_3)$, repeat Ex. 4.
6. Construct two coordinate systems Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3) with bases B and B′, respectively, such that the coordinates $[\vec{x}]_B = (x_1, x_2, x_3)$ and $[\vec{x}]_{B'} = (y_1, y_2, y_3)$ of a point $\vec{x} \in \mathbf{R}^3$ satisfy the following equations, respectively.
(a) x1 = 1 + 2y1 − 15y2 + 22y3,        (b) x1 = 2y1 + y2 − y3,
    x2 = −3 + y1 − 7y2 + 10y3,             x2 = 2y1 − y2 + 2y3,
    x3 = 4 − 4y1 + 30y2 − 43y3;            x3 = 3y1 + y3;

(c) y1 = −15 + (4/3)x1 + 2x2 + (8/3)x3,    (d) y1 = 4 + x1 − x2 − (1/3)x3,
    y2 = −7 + (1/3)x1 − x2 + 2x3,              y2 = −3 + (1/3)x1 + x2 − (1/3)x3,
    y3 = 30 − (1/3)x1;                         y3 = −6 − (2/3)x1 + x2 + (1/3)x3.
<B>

1. Consider the following system of equations


x1 = α1 + α11 y1 + α21 y2 + α31 y3 ,
x2 = α2 + α12 y1 + α22 y2 + α32 y3 ,
x3 = α3 + α13 y1 + α23 y2 + α33 y3 ,
where the coefficient matrix
 
α11 α12 α13
 α21 α22 α23 
α31 α32 α33

is invertible. Try to construct, in R3 , two coordinate systems, so that the


above prescribed equations will serve as changes of coordinates between
them. In fact, there are infinitely many such a pair of coordinate systems.

2. Let $\vec{x} = (x_1, x_2, x_3)$ and $\vec{y} = (y_1, y_2, y_3)$ be vectors in $\mathbf{R}^3$. Prove the following.
   (a) $\vec{x}$ is linearly independent by itself if and only if at least one of the components $x_1, x_2, x_3$ is not equal to zero.
   (b) $\vec{x}$ and $\vec{y}$ are linearly independent if and only if at least one of the following three determinants

   $$\begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix}, \quad \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix}, \quad \begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix}$$

   is not equal to zero. Try to explain this result geometrically.
   (Note Refer to the following Ex. 3 and try to think jointly. Are the results here extendable to $\mathbf{R}^4$? $\mathbf{R}^n$?)
3. Prove (3.3.2), noting and answering the following questions.
(1) Can the method used in proving Ex. <A>2 of Sec. 2.4 be adopted
directly in order to prove (3.3.2)? If yes, try out all the details.
Practically, is the method still valuable in proving similar results for
4 × 4 matrices, or even higher order matrices?
(2) If not so easy, at which steps one might encounter difficulties? Can
you overcome it by invoking some other methods which are powerful
and still efficient in proving similar results for higher order matrices?
(3) In the process of proof, are you able to give each algebraic quantity
or step an exact geometric meaning or interpretation?
4. Give a 3 × 3 matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$

For a vector $\vec{x} = (x_1, x_2, x_3) \in \mathbf{R}^3$, designate

$$\vec{x}A = (x_1 \;\; x_2 \;\; x_3)\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \left( \sum_{i=1}^{3} a_{i1}x_i, \; \sum_{i=1}^{3} a_{i2}x_i, \; \sum_{i=1}^{3} a_{i3}x_i \right)$$

and consider it as a vector in $\mathbf{R}^3$ (see the Convention before Remark 2.3 in Sec. 2.3). $\vec{x}A$ is called the image vector of $\vec{x}$ under the operation of A, which represents a linear transformation, to be treated in detail in Sec. 3.7 (see Sec. B.7 for details).

(a) Show that the kernel Ker(A) and the range Im(A) are subspaces
of R3 .
(b) Show that dim Ker(A) + dim Im(A) = dim R3 = 3.
(c) Show that the following are equivalent.

(1) A is one-to-one, i.e. Ker(A) = { 0 }.
(2) A is onto, i.e. Im(A) = R3 .
(3) A maps every (or a) basis $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ for $\mathbf{R}^3$ onto a basis $\{\vec{x}_1A, \vec{x}_2A, \vec{x}_3A\}$ for $\mathbf{R}^3$.
(4) A is invertible.
(d) Let $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ be a basis for $\mathbf{R}^3$ and $\vec{y}_1, \vec{y}_2, \vec{y}_3$ be any vectors in $\mathbf{R}^3$. Show that there exists a unique linear transformation $f: \mathbf{R}^3 \to \mathbf{R}^3$ so that $f(\vec{x}_i) = \vec{y}_i$ for 1 ≤ i ≤ 3.
(e) Let S be the subspace x1 + 2x2 + 3x3 = 0 in R3 . Show that there are
infinitely many linear transformations f : R3 → R3 with the following
respective property:
(1) Ker(f ) = S.
(2) Im(f ) = S.
Is there any linear transformation f : R3 → R3 such that Ker(f ) =
Im(f ) = S? Try to explain your claim geometrically and analytically.
(f) Let S be the subspace in R3 defined by
−x1 + 2x2 + 3x3 = 0, 5x1 + 2x2 − x3 = 0.
Do the same problem as in (e).
(g) Let S1 be the subspace as in (e) and S2 be the subspace as in
(f). Show that there are infinitely many linear transformations
f : R3 → R3 with the following respective property:
(1) Ker(f ) = S1 and Im(f ) = S2 .
(2) Ker(f ) = S2 and Im(f ) = S1 .
5. Let S1 ⊆ R2 be a subspace and S2 ⊆ R3 be a subspace.
(a) Find all possible linear transformations f : R2 → R3 with the follow-
ing property, respectively:
(1) f is one-to-one.
(2) Ker(f ) = S1 .
(3) Im(f ) = S2 . Be careful about the case that dim S2 = 3.
(4) Ker(f ) = S1 and Im(f ) = S2 .

(b) Find all possible linear transformations g: R3 → R2 with the follow-


ing property, respectively:
(1) g is onto.
(2) Ker(g) = S2 .
(3) Im(g) = S1 .
(4) Ker(g) = S2 and Im(g) = S1 .
(c) Does there exist a linear transformation from R2 onto R3 ? How
about a one-to-one linear transformation from R3 into R2 ? Why?
(d) Suppose f : R2 → R3 is an one-to-one linear transformation. Find all
possible linear transformations g: R3 → R2 so that
g ◦ f = 1 R2 (the identity transformation on R2 ).
(e) Suppose g: R3 → R2 is an onto linear transformation. Find all pos-
sible linear transformations f : R2 → R3 so that
g ◦ f = 1R 2 .
6. A linear transformation f from R3 (or R2 ) to R is specially called a
linear functional.
(a) Define the kernel Ker(f ).
(b) Prove that there exist unique scalars a1 , a2 and a3 so that
f (
x ) = a1 x1 + a2 x2 + a3 x3 ,
x = (x1 , x2 , x3 ) is in R3 .
where 
<C> Abstraction and generalization
Try to extend (3.3.3) to n-dimensional vector space over a field.

3.4 Lines in Space


Throughout this section, Γ(O; A1, A2, A3) is a fixed coordinate system with basis $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$, $\vec{a}_i = \overrightarrow{OA_i}$, i = 1, 2, 3.
The straight line determined by O and Ai is denoted by Li, i = 1, 2, 3. Take a point P in $\mathbf{R}^3$. Then

$$P \in L_1 \iff \overrightarrow{OP} \in L(O; A_1) \iff [P]_B = (x_1, 0, 0).$$
This characterizes the fact that the second and the third components of the
coordinate [P ]B for a point P lying on L1 should be all equal to zero.

Therefore,
x2 = 0, x3 = 0 (3.4.1)
is called the equation of the first coordinate axis L1 with respect to B
(see Fig. 3.11). Similarly,
x1 = 0, x3 = 0,
(3.4.2)
x1 = 0, x2 = 0,
are, respectively, called the equation of the second and the third coordinate
axes L2 and L3 (see Fig. 3.11).
Fig. 3.11
These three coordinate axes, all together, separate the whole space $\mathbf{R}^3$ into $2^3 = 8$ octants, according to the positive and negative signs of the components $x_1, x_2, x_3$ of the coordinate $[P]_B$, $P \in \mathbf{R}^3$ (see Sec. 2.5).
By exactly the same way as we did in Sec. 2.5, one has
Equations of a line with respect to a fixed coordinate system in R3
The straight line L determined by two different points A and B in R3 has
the following ways of representation in Γ(O; A1 , A2 , A3 ) with basis B.
(1) Parametric equation in vector form
L passes the point $\vec{a} = \overrightarrow{OA}$ with the direction $\vec{b} = \overrightarrow{AB}$, and hence has the equation

$$\vec{x} = \vec{a} + t\vec{b}, \quad t \in \mathbf{R},$$

where $\vec{x} = \overrightarrow{OX}$ is the position vector of a point X on L with respect to O and is viewed as a point in $\mathbf{R}^3$.
(2) Parametric equation with respect to basis B

$$[\vec{x}]_B = [\vec{a}]_B + t[\vec{b}]_B, \quad t \in \mathbf{R},$$

or, letting $[\vec{a}]_B = (a_1, a_2, a_3)$, $[\vec{b}]_B = (b_1, b_2, b_3)$ and $[\vec{x}]_B = (x_1, x_2, x_3)$,

$$x_1 = a_1 + tb_1, \quad x_2 = a_2 + tb_2, \quad x_3 = a_3 + tb_3.$$

(3) Coordinate equation with respect to B

$$\frac{x_1 - a_1}{b_1} = \frac{x_2 - a_2}{b_2} = \frac{x_3 - a_3}{b_3}. \tag{3.4.3}$$

See Fig. 3.12 (compare with Fig. 2.23).

Fig. 3.12

Remark Changes of equations of the same line in different coordinate systems.
We adopt the notations and results from (3.3.3).
The line L in the coordinate system Γ(O; A1, A2, A3) has the equation

$$[X]_B = [\vec{a}]_B + t[\vec{b}]_B,$$

where $\vec{a} = \overrightarrow{OA}$ and $\vec{b} = \overrightarrow{AB}$ (direction) and X is a moving point on L. Via the change of coordinates $[X]_{B'} = [O]_{B'} + [X]_B A^{B}_{B'}$, the equation of L in the coordinate system Γ(O′; B1, B2, B3) is

$$[X]_{B'} = [O]_{B'} + \{[\vec{a}]_B + t[\vec{b}]_B\}A^{B}_{B'}. \tag{3.4.4}$$

Example (continued from the example in Sec. 3.3) Find the equations of the straight line determined by the points A = (1, 2, 3) and B = (−2, 1, −1) in the coordinate systems Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3), respectively.

Solution In the coordinate system Γ(O; A1, A2, A3),

$$\vec{a} = \overrightarrow{OA} = (1, 2, 3) - (1, 0, 0) = (0, 2, 3), \quad [\vec{a}]_B = (2, 0, 3), \quad\text{and}$$
$$\vec{b} = \overrightarrow{AB} = (-2, 1, -1) - (1, 2, 3) = (-3, -1, -4), \quad [\vec{b}]_B = (-4, 3, -7).$$

Let X be a moving point on the line and $[X]_B = (x_1, x_2, x_3)$. Then, the equation of the line is

$$(x_1, x_2, x_3) = (2, 0, 3) + t(-4, 3, -7), \quad t \in \mathbf{R}$$
$$\Rightarrow x_1 = 2 - 4t, \quad x_2 = 3t, \quad x_3 = 3 - 7t, \quad t \in \mathbf{R}, \quad\text{or}\quad \frac{x_1 - 2}{-4} = \frac{x_2}{3} = \frac{x_3 - 3}{-7}.$$
In the coordinate system Γ(O′; B1, B2, B3),

$$\vec{a} = \overrightarrow{O'A} = (1, 2, 3) - (-1, -1, -1) = (2, 3, 4), \quad [\vec{a}]_{B'} = \left(\frac{1}{2}, 1, \frac{1}{2}\right), \quad\text{and}$$
$$\vec{b} = (-3, -1, -4), \quad [\vec{b}]_{B'} = \left(-\frac{3}{2}, -\frac{1}{2}, 0\right).$$

Let $[X]_{B'} = (y_1, y_2, y_3)$; the equation of the line is

$$(y_1, y_2, y_3) = \left(\frac{1}{2}, 1, \frac{1}{2}\right) + t\left(-\frac{3}{2}, -\frac{1}{2}, 0\right), \quad t \in \mathbf{R}$$
$$\Rightarrow y_1 = \frac{1}{2} - \frac{3}{2}t, \quad y_2 = 1 - \frac{1}{2}t, \quad y_3 = \frac{1}{2}, \quad t \in \mathbf{R}, \quad\text{or}\quad \frac{y_1 - \frac{1}{2}}{-\frac{3}{2}} = \frac{y_2 - 1}{-\frac{1}{2}} = \frac{y_3 - \frac{1}{2}}{0}.$$

By using (3.4.4) and the results obtained in the example of Sec. 3.3, we are able to deduce the equation of the line in the coordinate system Γ(O′; B1, B2, B3) from that in Γ(O; A1, A2, A3) as follows:

$$(y_1, y_2, y_3) = \left(0 \;\; -\frac{1}{2} \;\; 1\right) + \{(2 \;\; 0 \;\; 3) + t(-4 \;\; 3 \;\; -7)\}\begin{pmatrix} -\frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{pmatrix}$$
$$= \left(0 \;\; -\frac{1}{2} \;\; 1\right) + \left(\frac{1}{2} - \frac{3}{2}t \quad \frac{3}{2} - \frac{1}{2}t \quad -\frac{1}{2}\right) = \left(\frac{1}{2} - \frac{3}{2}t \quad 1 - \frac{1}{2}t \quad \frac{1}{2}\right),$$

which is identical with the above result.
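The same substitution can be carried out symbolically. A sympy sketch (my own illustration; the matrices come from the example of Sec. 3.3):

```python
from sympy import Matrix, Rational, symbols

t = symbols('t')
half = Rational(1, 2)
O_Bp = Matrix([[0, -half, 1]])                     # [O]_B'
T = Matrix([[-half, 0, half],
            [0, 1, -half],
            [half, half, -half]])                  # transition matrix of B w.r.t. B'

X_B = Matrix([[2, 0, 3]]) + t * Matrix([[-4, 3, -7]])   # the line in B
print(O_Bp + X_B * T)      # Matrix([[1/2 - 3*t/2, 1 - t/2, 1/2]])
```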

Finally, we list the


Relative positions of two lines in space
Given two lines

$$L_1: \vec{x} = \vec{a}_1 + t\vec{b}_1, \quad t \in \mathbf{R}, \quad\text{and}\quad L_2: \vec{x} = \vec{a}_2 + t\vec{b}_2, \quad t \in \mathbf{R}$$

in $\mathbf{R}^3$, they have the following four kinds of relative positions.

Case 1 Coincident (L1 = L2) ⇔ both $\vec{b}_2$ and $\vec{a}_2 - \vec{a}_1$ are linearly dependent on $\vec{b}_1$.
Case 2 Parallel (L1 ∥ L2) ⇔ $\vec{b}_1$ and $\vec{b}_2$ are linearly dependent, and $\vec{a}_2 - \vec{a}_1$ is linearly independent of $\vec{b}_1$ or $\vec{b}_2$.
Case 3 Intersecting (at a unique point) ⇔ $\vec{b}_1$ and $\vec{b}_2$ are linearly independent, and $\vec{a}_2 - \vec{a}_1$ is linearly dependent on $\vec{b}_1$ and $\vec{b}_2$.
Case 4 Skew (neither parallel nor intersecting) ⇔ $\vec{b}_1$, $\vec{b}_2$ and $\vec{a}_2 - \vec{a}_1$ are linearly independent.

In Cases 1–3, the two lines are coplanar, while in Case 4, they are non-coplanar.    (3.4.5)
Proofs are left to the readers (see Fig. 3.13).

Fig. 3.13
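The four cases of (3.4.5) translate directly into two rank tests. A hedged Python sketch (the function name is mine):

```python
import numpy as np

def relative_position(a1, b1, a2, b2):
    """Classify lines x = a_i + t*b_i in R^3 according to (3.4.5)."""
    a1, b1, a2, b2 = map(np.asarray, (a1, b1, a2, b2))
    r_dir = np.linalg.matrix_rank(np.array([b1, b2]))
    r_all = np.linalg.matrix_rank(np.array([b1, b2, a2 - a1]))
    if r_dir == 1:                       # parallel directions
        return "coincident" if r_all == 1 else "parallel"
    return "intersecting" if r_all == 2 else "skew"

print(relative_position((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))  # skew
print(relative_position((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 0, 0)))  # parallel
```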

Exercises
<A>
1. Prove (3.4.3) in detail.
2. Prove (3.4.5) in detail.
3. (continued from Ex. <A> 3 of Sec. 3.3) Let L be the line in $\mathbf{R}^3$ determined by the points $A = \left(-\frac{1}{2}, 0, 3\right)$ and $B = (\sqrt{2}, 1, 6)$.

(a) Find the equations of L both in Γ(O; A1, A2, A3) and in Γ(O′; B1, B2, B3).
(b) Use the results in (a) to justify (3.4.4).

4. Suppose a line in $\mathbf{R}^3$ has the equation

$$\frac{x_1 - 1}{\sqrt{2}} = \frac{x_2 + 3}{-2} = \frac{x_3 - 6}{4}$$

in the rectangular coordinate system $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$. Find the equation of the line in another coordinate system Γ(O; A1, A2, A3) where $O = (0, \sqrt{3}, 1)$, $A_1 = (1, 0, 1)$, $A_2 = (1, -1, 0)$, $A_3 = (-\sqrt{2}, -1, -2)$.
5. (continued from Ex. 4) If the equation of a line in the coordinate system Γ(O; A1, A2, A3) is

$$\frac{y_1 + 2}{1} = \frac{y_2 - 1}{2} = \frac{y_3 + 5}{3},$$

find the equation of the line in $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$.
<B>
1. Suppose that $a_i, b_i$ and $c_i, d_i$, i = 1, 2, 3, are scalars, and that $b_1, b_2, b_3$ and $d_1, d_2, d_3$ are not all equal to zero, respectively. Let

$$\frac{x_1 - a_1}{b_1} = \frac{x_2 - a_2}{b_2} = \frac{x_3 - a_3}{b_3} \quad\text{and}\quad \frac{y_1 - c_1}{d_1} = \frac{y_2 - c_2}{d_2} = \frac{y_3 - c_3}{d_3}.$$
Try to construct two coordinate systems Γ(O; A1 , A2 , A3 ) and
Γ(O ; B1 , B2 , B3 ) in R3 so that the respective equation of the same line
with respect to either of the two coordinate systems is the one or the
other given above. How many such coordinate systems are possible?
2. Prove that the relative positions of two straight lines in R3 are indepen-
dent of the choice of coordinate systems.
3. Let

$$L_i: \vec{x} = \vec{a}_i + t\vec{b}_i, \quad t \in \mathbf{R}, \; i = 1, 2, 3,$$

be three given lines in $\mathbf{R}^3$. Try to discuss all possible relative positions of them and use $\vec{a}_i$, $\vec{b}_i$, i = 1, 2, 3, to characterize each case.
4. Do there exist, in R3 , infinitely many straight lines such that any two of
them are skew to each other? If yes, try to construct one.
5. How to find the distance between two parallel lines or two skew lines in
R3 ? Any formula for it?

3.5 Planes in Space


Suppose Γ(O; A1, A2, A3), with basis $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$, $\vec{a}_i = \overrightarrow{OA_i}$, i = 1, 2, 3, is a fixed coordinate system in $\mathbf{R}^3$ throughout this whole section.

Refer back to Fig. 3.11. Take any point $P \in \mathbf{R}^3$; then

$$\overrightarrow{OP} \in \Sigma(O; A_1, A_2), \text{ the plane generated by } O, A_1 \text{ and } A_2 \iff [P]_B = (x_1, x_2, 0),$$

which characterizes the fact that the third component $x_3$ of the coordinate $[P]_B$ with respect to B must be equal to zero. Hence we call

$$x_3 = 0 \tag{3.5.1}$$

the equation of the coordinate plane generated by O, A1 and A2. The other two coordinate planes, generated respectively by O, A2, A3 and O, A1, A3, have the respective equations

$$x_1 = 0, \quad\text{and}\quad x_2 = 0. \tag{3.5.2}$$
Note that any two of these three coordinate planes intersect along a
straight line which is a coordinate axis (see Sec. 3.4).
Three non-collinear points A, B and C in R3 determine a unique plane,
still lying in R3 . What is the equation of this plane in a fixed coordinate
system?
In Fig. 3.14, let

$$\vec{a} = \overrightarrow{OA} \ (\text{viewed as a point in } \mathbf{R}^3),$$
$$\vec{b}_1 = \overrightarrow{AB}, \quad \vec{b}_2 = \overrightarrow{AC} \ (\text{both viewed as direction vectors in } \mathbf{R}^3),$$
$$\vec{x} = \overrightarrow{OX} \ (X \text{ a moving point in } \mathbf{R}^3 \text{ and } \vec{x} \text{ viewed as a point}).$$

Notice that $\vec{b}_1$ and $\vec{b}_2$ are linearly independent.

Fig. 3.14

For a point $X \in \mathbf{R}^3$,

$$\overrightarrow{OX} \in \Sigma(A; B, C), \text{ the plane generated by } A, B \text{ and } C$$
$$\iff \vec{x} - \vec{a} = t_1\vec{b}_1 + t_2\vec{b}_2, \quad t_1, t_2 \in \mathbf{R}$$
$$\iff \vec{x} = \vec{a} + t_1\vec{b}_1 + t_2\vec{b}_2, \quad t_1, t_2 \in \mathbf{R}.$$

This is the equation, in vector form, of the plane Σ(A; B, C), passing through the point $\vec{a}$ with directions $\vec{b}_1$ and $\vec{b}_2$.

Just like what we did in Sec. 2.5, one may use the coordinates of $\vec{x}$, $\vec{a}$, $\vec{b}_1$ and $\vec{b}_2$ with respect to B and eliminate the parameters $t_1$ and $t_2$ to obtain the coordinate equation of the plane in B. Anyway, we summarize as
Equations of a plane with respect to a fixed coordinate system in R³
The plane Σ determined by non-collinear points A, B and C in $\mathbf{R}^3$ has the following representations in Γ(O; A1, A2, A3) with basis B.

(1) Parametric equation in vector form

$$\vec{x} = \vec{a} + t_1\vec{b}_1 + t_2\vec{b}_2, \quad t_1, t_2 \in \mathbf{R}.$$

Σ passes through the point $\vec{a} = \overrightarrow{OA}$ with directions $\vec{b}_1 = \overrightarrow{AB}$ and $\vec{b}_2 = \overrightarrow{AC}$, while $\vec{x} = \overrightarrow{OX}$ is viewed as a point in Σ for X ∈ Σ.
(2) Parametric equation in coordinates

$$[\vec{x}]_B = [\vec{a}]_B + t_1[\vec{b}_1]_B + t_2[\vec{b}_2]_B = [\vec{a}]_B + (t_1 \;\; t_2)\begin{pmatrix} [\vec{b}_1]_B \\ [\vec{b}_2]_B \end{pmatrix}, \quad t_1, t_2 \in \mathbf{R}.$$

Or, letting $[\vec{a}]_B = (a_1, a_2, a_3)$, $[\vec{b}_i]_B = (b_{i1}, b_{i2}, b_{i3})$, i = 1, 2, and $[\vec{x}]_B = (x_1, x_2, x_3)$, then

$$x_j = a_j + \sum_{i=1}^{2} b_{ij}t_i, \quad t_1, t_2 \in \mathbf{R}, \; j = 1, 2, 3.$$

(3) Coordinate equation

$$\begin{vmatrix} x_1 - a_1 & x_2 - a_2 & x_3 - a_3 \\ b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{vmatrix} = 0,$$

or

$$\begin{vmatrix} b_{12} & b_{13} \\ b_{22} & b_{23} \end{vmatrix}(x_1 - a_1) + \begin{vmatrix} b_{13} & b_{11} \\ b_{23} & b_{21} \end{vmatrix}(x_2 - a_2) + \begin{vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{vmatrix}(x_3 - a_3) = 0,$$

or simplified as

$$\alpha x_1 + \beta x_2 + \gamma x_3 + \delta = 0. \tag{3.5.3}$$

The details of the proof are left to the readers.
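Computationally, the determinant in (3) is just a cross product of the two direction vectors. A minimal Python sketch (the helper name is mine; the data anticipate the example below, given in B-coordinates):

```python
import numpy as np

def plane_coefficients(a, b1, b2):
    """(alpha, beta, gamma, delta) of (3.5.3) for the plane through point a
    with independent directions b1, b2."""
    a, b1, b2 = map(np.asarray, (a, b1, b2))
    n = np.cross(b1, b2)          # expands the 3x3 determinant of (3.5.3)
    return np.append(n, -n @ a)

print(plane_coefficients((2, 0, 3), (-4, 3, -7), (-4, 1, 0)))
# [  7  28   8 -38]   i.e.  7x1 + 28x2 + 8x3 - 38 = 0
```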


Remark Changes of equations of the same plane in different coordinate
systems.
Notations and results in (3.3.3) are adopted directly in the following.
The plane Σ in another coordinate system Γ(O ; B1 , B2 , B3 ) with respect
to B has the equation   / 
[ b1 ]B
[X]B = [O]B + [
a ]B + (t1 t2 )  AB
B , (3.5.4)
[ b2 ]B
−−
where X ∈ Σ and [X]B = [O X]B .

Example (continued from the example in Sec. 3.3) Find the equations of the plane determined by the points A = (1, 2, 3), B = (−2, 1, −1) and C = (0, −1, 4), respectively, in Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3).

Solution One might refer to the example in Sec. 3.4 and the results there.
In the coordinate system Γ(O; A1, A2, A3),

$$\vec{a} = \overrightarrow{OA} = (0, 2, 3), \quad [\vec{a}]_B = (2, 0, 3);$$
$$\vec{b}_1 = \overrightarrow{AB} = (-3, -1, -4), \quad [\vec{b}_1]_B = (-4, 3, -7);$$
$$\vec{b}_2 = \overrightarrow{AC} = (0, -1, 4) - (1, 2, 3) = (-1, -3, 1), \quad [\vec{b}_2]_B = (-4, 1, 0).$$

For a point X in the plane, let $[\vec{x}]_B = [\overrightarrow{OX}]_B = (x_1, x_2, x_3)$; then the equation is

$$(x_1, x_2, x_3) = (2, 0, 3) + t_1(-4, 3, -7) + t_2(-4, 1, 0), \quad t_1, t_2 \in \mathbf{R}$$
$$\Rightarrow x_1 = 2 - 4t_1 - 4t_2, \quad x_2 = 3t_1 + t_2, \quad x_3 = 3 - 7t_1, \quad t_1, t_2 \in \mathbf{R}$$
$$\Rightarrow 7x_1 + 28x_2 + 8x_3 - 38 = 0.$$
In the coordinate system Γ(O′; B1, B2, B3),

$$\vec{a} = \overrightarrow{O'A} = (2, 3, 4), \quad [\vec{a}]_{B'} = \left(\frac{1}{2}, 1, \frac{1}{2}\right);$$
$$\vec{b}_1 = \overrightarrow{AB} = (-3, -1, -4), \quad [\vec{b}_1]_{B'} = \left(-\frac{3}{2}, -\frac{1}{2}, 0\right);$$
$$\vec{b}_2 = \overrightarrow{AC} = (-1, -3, 1), \quad [\vec{b}_2]_{B'} = \left(2, 1, -\frac{5}{2}\right).$$

Let $[X]_{B'} = (y_1, y_2, y_3)$; then the equation is

$$(y_1, y_2, y_3) = \left(\frac{1}{2}, 1, \frac{1}{2}\right) + t_1\left(-\frac{3}{2}, -\frac{1}{2}, 0\right) + t_2\left(2, 1, -\frac{5}{2}\right), \quad t_1, t_2 \in \mathbf{R}$$
$$\Rightarrow y_1 = \frac{1}{2} - \frac{3}{2}t_1 + 2t_2, \quad y_2 = 1 - \frac{1}{2}t_1 + t_2, \quad y_3 = \frac{1}{2} - \frac{5}{2}t_2, \quad t_1, t_2 \in \mathbf{R}$$
$$\Rightarrow 10y_1 - 30y_2 - 4y_3 + 27 = 0.$$

By the way, using (3.5.4) and the results obtained in the example of Sec. 3.3, one has

$$(y_1, y_2, y_3) = \left(0 \;\; -\frac{1}{2} \;\; 1\right) + \{(2 \;\; 0 \;\; 3) + t_1(-4 \;\; 3 \;\; -7) + t_2(-4 \;\; 1 \;\; 0)\}\begin{pmatrix} -\frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{pmatrix}$$
$$= \left(0 \;\; -\frac{1}{2} \;\; 1\right) + \left(\frac{1}{2} - \frac{3}{2}t_1 + 2t_2 \quad \frac{3}{2} - \frac{1}{2}t_1 + t_2 \quad -\frac{1}{2} - \frac{5}{2}t_2\right)$$
$$= \left(\frac{1}{2} - \frac{3}{2}t_1 + 2t_2 \quad 1 - \frac{1}{2}t_1 + t_2 \quad \frac{1}{2} - \frac{5}{2}t_2\right),$$

just as established above.

Finally, we present the

Relative positions of a straight line and a plane in R³
Given a straight line L and a plane Σ in $\mathbf{R}^3$, respectively, as

$$L: \vec{x} = \vec{a} + t\vec{b}, \quad t \in \mathbf{R}, \quad\text{and}\quad \Sigma: \vec{x} = \vec{c} + t_1\vec{d}_1 + t_2\vec{d}_2, \quad t_1, t_2 \in \mathbf{R},$$

then L and Σ have the following relative positions:

1. Coincident (L ⊆ Σ) ⇔ both $\vec{b}$ and $\vec{a} - \vec{c}$ are linearly dependent on $\vec{d}_1$ and $\vec{d}_2$.
2. Parallel (L ∥ Σ) ⇔ $\vec{b}$ is linearly dependent on $\vec{d}_1$ and $\vec{d}_2$, and $\vec{a} - \vec{c}$ is linearly independent of $\vec{d}_1$ and $\vec{d}_2$.
3. Intersecting (at a unique point) ⇔ $\vec{b}$, $\vec{d}_1$ and $\vec{d}_2$ are linearly independent.    (3.5.5)

See Fig. 3.15.


Fig. 3.15

There is also the

Relative positions of two planes in R³
Given two planes in $\mathbf{R}^3$ as

$$\Sigma_1: \vec{x} = \vec{a} + t_1\vec{b}_1 + t_2\vec{b}_2, \qquad \Sigma_2: \vec{x} = \vec{c} + t_1\vec{d}_1 + t_2\vec{d}_2, \quad t_1, t_2 \in \mathbf{R},$$

then Σ1 and Σ2 have the following relative positions:

1. Coincident (Σ1 = Σ2) ⇔ the vectors $\vec{a} - \vec{c}$, $\vec{d}_1$ and $\vec{d}_2$ are linearly dependent on $\vec{b}_1$ and $\vec{b}_2$.
2. Parallel (Σ1 ∥ Σ2) ⇔ $\vec{d}_1$ and $\vec{d}_2$ are linearly dependent on $\vec{b}_1$ and $\vec{b}_2$, but $\vec{a} - \vec{c}$ is linearly independent of $\vec{b}_1$ and $\vec{b}_2$.
3. Intersecting (along a straight line) ⇔ at least three of the vectors $\vec{b}_1$, $\vec{b}_2$, $\vec{d}_1$ and $\vec{d}_2$ are linearly independent.    (3.5.6)
See Fig. 3.16.

Fig. 3.16

According to (3.4.3), the line determined by the distinct points $\vec{a}_1$ and $\vec{a}_2$ in the space $\mathbf{R}^3$ has the parametric equation

$$\vec{x} = \vec{a}_1 + t(\vec{a}_2 - \vec{a}_1) = (1 - t)\vec{a}_1 + t\vec{a}_2, \quad t \in \mathbf{R}. \tag{3.5.7}$$

This is exactly the same as (2.5.10) where  a2 are points in R2 .


a1 and 
Therefore the definitions and terminology there for
directed segment  
a1 a2 (0 ≤ t ≤ 1),
 
initial point a1 and terminal point a2 ,
interior point (0 < t < 1) and exterior point (t < 0 or t > 1),
endpoints (t = 0 or t = 1), and
1 
middle point ( a1 + 
a2 ), (3.5.8)
2
all are still valid for space line (3.5.7). See also Fig. 2.25.
A tetrahedron

$$\vec{a}_1\vec{a}_2\vec{a}_3\vec{a}_4 \tag{3.5.9}$$

with four non-coplanar points $\vec{a}_1$, $\vec{a}_2$, $\vec{a}_3$ and $\vec{a}_4$ as vertices is the space figure formed by the six edges $\vec{a}_1\vec{a}_2$, $\vec{a}_1\vec{a}_3$, $\vec{a}_1\vec{a}_4$, $\vec{a}_2\vec{a}_3$, $\vec{a}_2\vec{a}_4$ and $\vec{a}_3\vec{a}_4$. See Fig. 3.17. It has the four triangles $\Delta\vec{a}_1\vec{a}_2\vec{a}_3$, $\Delta\vec{a}_2\vec{a}_3\vec{a}_4$, $\Delta\vec{a}_3\vec{a}_4\vec{a}_1$ and $\Delta\vec{a}_4\vec{a}_1\vec{a}_2$ as faces. The plane determined by the points $\vec{a}_1$, $\vec{a}_2$ and $\frac{1}{2}(\vec{a}_3 + \vec{a}_4)$ is called the median plane along the edge $\vec{a}_1\vec{a}_2$. The median planes along $\vec{a}_1\vec{a}_2$, $\vec{a}_1\vec{a}_3$ and $\vec{a}_1\vec{a}_4$ will meet along a line passing through $\vec{a}_1$ and the centroid $\frac{1}{3}(\vec{a}_2 + \vec{a}_3 + \vec{a}_4)$ of $\Delta\vec{a}_2\vec{a}_3\vec{a}_4$. (See Ex. <B> 7.)

Fig. 3.17
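The centroid claim can at least be spot-checked numerically before attempting Ex. <B> 7: the centroid of $\Delta\vec{a}_2\vec{a}_3\vec{a}_4$ is an affine combination of $\vec{a}_2$ and the midpoint of $\vec{a}_3\vec{a}_4$, hence lies on the median plane along $\vec{a}_1\vec{a}_2$. A hedged Python sketch with random (hence generic) vertices:

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2, a3, a4 = rng.random((4, 3))        # random tetrahedron vertices

mid34 = (a3 + a4) / 2                      # midpoint of edge a3a4
centroid = (a2 + a3 + a4) / 3              # centroid of triangle a2a3a4

# centroid - a1 must depend linearly on a2 - a1 and mid34 - a1,
# i.e. the three difference vectors have rank 2, not 3
M = np.array([a2 - a1, mid34 - a1, centroid - a1])
print(np.linalg.matrix_rank(M))            # 2
```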

For more related information, see Sec. 3.6.

Exercises
<A>
1. Prove (3.5.3) in detail.
2. Prove (3.5.5) in detail.
3. Prove (3.5.6) in detail.

4. (continued from Ex. <A> 3 of Sec. 3.3) Let $A = \left(-\frac{1}{2}, 0, 3\right)$, $B = (\sqrt{2}, 1, 6)$ and $C = (0, 1, 9)$ be three given points in $\mathbf{R}^3$.
   (a) Show that $\overrightarrow{OA}$, $\overrightarrow{OB}$ and $\overrightarrow{OC}$ are linearly independent, where O = (0, 0, 0). Hence A, B and C determine a unique plane Σ.
   (b) Find equations of Σ in Γ(O; A1, A2, A3) and Γ(O′; B1, B2, B3), respectively.
   (c) Use (3.5.4) to check the results obtained in (b).

5. In the natural rectangular coordinate system $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$, the following equations
x1 − x2 + 2x3 − 3 = 0,
−2x1 + 3x2 + 4x3 + 12 = 0,
6x1 − 8x2 + 5x3 − 7 = 0
represent, respectively, three different planes in R3 .
(a) Show that the planes intersect at a point. Find out the coordinate
of the point.
(b) Try to choose a new coordinate system Γ(O; A1 , A2 , A3 ), in which
the above three planes are the coordinate planes.

6. According to the natural coordinate system $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$, a line and a plane are given, respectively, as

$$L: \frac{x_1 - 1}{2} = \frac{x_2 + 1}{-3} = \frac{x_3}{5}, \quad\text{and}\quad \Sigma: 2x_1 + x_2 + x_3 - 4 = 0.$$
(a) Show that the line L and the plane Σ intersect at a point. Find out
the coordinate of this point.
(b) Construct another coordinate system Γ(O; A1 , A2 , A3 ) in R3 so
that, according to this new system, the line L is a coordinate axis
and the plane is a coordinate plane. How many choices of such a
new system are possible?

7. Still in $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$, a line

$$L: \frac{x_1 - 0}{1} = \frac{x_2 - 2}{1} = \frac{x_3 - 3}{2}$$

and two planes

$$\Sigma_1: x_1 - x_2 + x_3 = 1, \qquad \Sigma_2: x_1 + 2x_2 + x_3 = 7$$

are given.

(a) Show that Σ1 and Σ2 intersect along a line L′. Find the equation of L′.
(b) Then, show that L and L′ intersect at a point. Find the coordinate of this point.
(c) Construct a new coordinate system Γ(O; A1, A2, A3) in $\mathbf{R}^3$ such that Σ1 and Σ2 become coordinate planes of the new system, which contains the given line L in its third coordinate plane.
8. Let

$$x_1 - 18x_2 + 6x_3 - 5 = 0$$

be the equation of a plane in $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$. Find its equation in Γ(O; A1, A2, A3) as shown in Ex. <A> 4 of Sec. 3.4.
9. (continued from 8) Suppose

$$3y_1 + 4y_2 + 7y_3 = 1$$

is the equation of a plane in Γ(O; A1, A2, A3). Find its equation in $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$.

10. Do the following questions in $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$.
    (a) Given a point $P_0 = (3, 4, -1)$ and two straight lines $\frac{x_1}{3} = x_2 - 1 = -x_3$ and $x_1 + 1 = \frac{x_2 - 1}{2} = x_3 + 1$, find the plane passing through the point $P_0$ and parallel to the two straight lines.
    (b) Prove that the straight lines $\frac{x_1 - 1}{2} = \frac{x_2 + 2}{-3} = \frac{x_3 - 5}{5}$ and $x_1 = 7 + 3t$, $x_2 = 2 + 2t$, $x_3 = 1 - 2t$ lie on a plane simultaneously. Find the equation of the plane.
    (c) Find a plane passing through the point $P_0 = (3, -2, 1)$ and containing the straight line $\frac{x_1 + 1}{2} = \frac{x_2 - 1}{3} = \frac{x_3}{4}$.
    (d) Find a plane containing both the line $\frac{x_1 - 2}{3} = \frac{x_2 + 1}{2} = \frac{x_3 - 3}{-2}$ and the line $\frac{x_1 - 1}{3} = \frac{x_2 - 2}{2} = \frac{x_3 + 3}{-2}$.
line 3 = 2 = −2 .

<B>

1. Suppose

aij , 1 ≤ i, j ≤ 3 and bij , 1 ≤ i, j ≤ 3

are scalars such that the vectors (a12 , a22 , a32 ) and (a13 , a23 , a33 ),
(b12 , b22 , b32 ) and (b13 , b23 , b33 ) are linearly independent, respectively.

Let
x1 = a11 + a12 t1 + a13 t2 , y1 = b11 + b12 t1 + b13 t2 ,
x2 = a21 + a22 t1 + a23 t2 , y2 = b21 + b22 t1 + b23 t2 , t1 , t2 ∈ R.
x3 = a31 + a32 t1 + a33 t2 , y3 = b31 + b32 t1 + b33 t2 ,

Construct, in R3 , two coordinate systems so that the equation of a


plane with respect to either of the systems is the one or the other given
above. How many such systems are possible?
2. Prove that the relative position of a straight line and a plane in R3 is
independent of choices of coordinate systems concerned.
3. Do the same question as in Ex. 2 for two planes.
4. Give three planes in $\mathbf{R}^3$,

$$\Sigma_i: \vec{x} = \vec{a}_i + t_1\vec{b}_{i1} + t_2\vec{b}_{i2}, \quad i = 1, 2, 3, \; t_1, t_2 \in \mathbf{R}.$$

Explain graphically all possible relative positions of the three planes and characterize them by using the vectors $\vec{a}_i$, $\vec{b}_{i1}$, $\vec{b}_{i2}$, i = 1, 2, 3.
Suppose equations of Σi are changed to coordinate forms as

Σi : αi x1 + βi x2 + γi x3 + δi = 0, i = 1, 2, 3.

Now, use the coefficients αi , βi , γi , δi , i = 1, 2, 3, to characterize the


relative positions described above.
5. Do the same question as in Ex. 2 for three planes.
6. Is it possible to represent the whole space R3 as a union of finitely
many or countably infinite planes? Why? Here, it is required that all

the planes should pass through the origin 0 .
7. In Fig. 3.17, show that the three median planes along the edges $\vec{a}_1\vec{a}_2$, $\vec{a}_1\vec{a}_3$ and $\vec{a}_1\vec{a}_4$ meet along the line passing through $\vec{a}_1$ and $\frac{1}{3}(\vec{a}_2 + \vec{a}_3 + \vec{a}_4)$.
8. Let V be a vector subspace of $\mathbf{R}^3$ and $\vec{x}_0$ a fixed point in $\mathbf{R}^3$. Call the set

$$\vec{x}_0 + V = \{\vec{x}_0 + \vec{v} \mid \vec{v} \in V\}$$

an affine subspace of $\mathbf{R}^3$ passing through the point $\vec{x}_0$, obtained by moving the vector space V parallelly along the direction $\vec{x}_0$ up to the point $\vec{x}_0$ (see Fig. 3.18).
(a) Determine all the affine subspaces of R3 .
(b) What are the equations to represent affine subspaces?

Fig. 3.18


9. In $\mathbf{R}^3$ with the natural coordinate system $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$, one uses

$$|\vec{x}| = \left(x_1^2 + x_2^2 + x_3^2\right)^{1/2}$$

to denote the length of a vector $\vec{x} = (x_1, x_2, x_3)$. Then

$$|\vec{x}|^2 = x_1^2 + x_2^2 + x_3^2 = 1$$

is the equation of the unit sphere in $\mathbf{R}^3$ (see Fig. 3.19).

Fig. 3.19


(a) Try to find out all the coordinate systems $\Gamma(\vec{0}; \vec{a}_1, \vec{a}_2, \vec{a}_3)$ in $\mathbf{R}^3$, with basis $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$, so that the unit sphere still has the equation

$$y_1^2 + y_2^2 + y_3^2 = 1,$$

where $[\vec{x}]_B = (y_1, y_2, y_3)$.
(b) Suppose the coordinate system Γ(O; A1 , A2 , A3 ) is the same as in
Ex. <A> 3 of Sec. 3.3. Determine the equation of the unit sphere in
Γ(O; A1 , A2 , A3 ). Then, try to construct a coordinate system
Γ(O; B1 , B2 , B3 ) in which the equation is unchanged. How many
such systems are there?
(c) One views the equation obtained in (b) in the eyes of $\Gamma(\vec{0}; \vec{e}_1, \vec{e}_2, \vec{e}_3)$. Then, what kind of surface does it represent? Try to graph the surface and compute the volume it encloses. Compare


this volume with that of the unit sphere. What is the ratio? Refer
to Secs. 5.4 and 5.10 if necessary.
10. Points

    a1 = (1, 1, 0),  a2 = (0, 1, 1),  a3 = (1, 0, 1)

    in R3 form a coordinate system Γ(0; a1, a2, a3) with basis
    B = {a1, a2, a3}. For a point X ∈ R3, let [X]B = (y1, y2, y3). Then, the
    equation

    y1² + y2² + y3² = 1

    represents a certain kind of surface in R3 (see Fig. 3.20). When viewing
    it in Γ(0; e1, e2, e3), what is the surface? What are the equation of
    the surface and the volume it encloses? Refer to Secs. 5.4 and 5.10 if
    necessary.

[Fig. 3.20: the surface y1² + y2² + y3² = 1 in the coordinate system Γ(0; a1, a2, a3).]
<C> Abstraction and generalization

Read Ex. <C> of Sec. 2.5 and try to do the problems there if you missed them
at that time. Try to extend as many problems in Ex. <B> as possible to
abstract spaces V over a field F, while Exs. <B> 9 and 10 should be set in
Rn or in finite-dimensional inner product spaces.
3.6 Affine and Barycentric Coordinates

Contents here are parallel to those in Sec. 2.6, so only a sketch is given as
follows.
Let a0, a1, a2 and a3 be four points in R3. Then

a0, a1, a2 and a3 are non-coplanar in the affine space R3.
⇔ The vectors a1 − a0, a2 − a0, a3 − a0 are linearly independent
  in the vector space R3.
In this case, a0, a1, a2 and a3 are said to be affinely independent and the
ordered set

B = {a0, a1, a2, a3}  or  {a1 − a0, a2 − a0, a3 − a0}              (3.6.1)

is called an affine basis for R3 with a0 as the base point. See Fig. 3.21.

[Fig. 3.21: the affine basis {a0, a1, a2, a3}, a point x, and the vectors a1 − a0, a2 − a0, a3 − a0.]
For such an affine basis B, the vectorized space Γ(a0; a1, a2, a3)
is a geometric model for R3. Conversely, for a given vectorized space
Γ(a0; a1, a2, a3), the set B = {a0, a1, a2, a3} is an affine basis for R3.
Note that the rectangular affine basis

N = {0; e1, e2, e3}                                                (3.6.2)

is the standard basis for R3 when considered as a vector space.
Fix an affine basis B = {a0, a1, a2, a3} for R3. For any point x ∈ R3,
there exist unique scalars x1, x2 and x3 so that

x − a0 = Σ_{i=1}^{3} xi(ai − a0)

⇒ x = a0 + Σ_{i=1}^{3} xi(ai − a0)
     = (1 − x1 − x2 − x3)a0 + x1 a1 + x2 a2 + x3 a3
     = λ0 a0 + λ1 a1 + λ2 a2 + λ3 a3,                              (3.6.3)

where λ0 = 1 − x1 − x2 − x3, λ1 = x1, λ2 = x2 and λ3 = x3. The ordered
quadruple

(x)B = (λ0, λ1, λ2, λ3),  where λ0 + λ1 + λ2 + λ3 = 1,             (3.6.4)
is called the (normalized) barycentric coordinate of the point x with respect
to the affine basis B. Occasionally, the associated

[x]B = [x − a0]B = (x1, x2, x3) = (λ1, λ2, λ3)                     (3.6.5)

is called the affine coordinate of x, for simplicity.
In particular,

(a0)B = (1, 0, 0, 0) with [a0]B = (0, 0, 0),
(a1)B = (0, 1, 0, 0) with [a1]B = (1, 0, 0),
(a2)B = (0, 0, 1, 0) with [a2]B = (0, 1, 0),
(a3)B = (0, 0, 0, 1) with [a3]B = (0, 0, 1).

Hence,

the coordinate axis a0a1 has equation λ2 = λ3 = 0, and
the coordinate plane a0a1a2 has equation λ3 = 0,

etc.
There are four coordinate planes. They all together divide the space R3
into 2⁴ − 1 = 15 regions. The one with (+, +, +, +) is the interior of the
base tetrahedron ∆a0a1a2a3, whose barycenter is (1/4, 1/4, 1/4, 1/4). See Fig. 3.22.
[Fig. 3.22: the base tetrahedron ∆a0a1a2a3 with interior (+, +, +, +); the edge a0a1 has λ2 = λ3 = 0.]
Notice that ∆a0a1a2a3 can be easily described as the set of the points
(λ0, λ1, λ2, λ3) where λ0 ≥ 0, λ1 ≥ 0, λ2 ≥ 0, λ3 ≥ 0 and λ0 + λ1 +
λ2 + λ3 = 1. In this expression, what does λ3 = 0 mean? How about
λ1 = λ3 = 0?
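For readers who like to experiment numerically, here is a minimal sketch of
computing barycentric coordinates as in (3.6.3) and (3.6.4). It assumes Python
with numpy; the affine basis points chosen below are hypothetical ones:

    import numpy as np

    # A hypothetical affine basis B = {a0, a1, a2, a3}; the vectors
    # a1 - a0, a2 - a0, a3 - a0 must be linearly independent.
    a0 = np.array([0., 0., 0.])
    a1 = np.array([1., 1., 0.])
    a2 = np.array([0., 1., 1.])
    a3 = np.array([1., 0., 1.])

    def barycentric(x):
        """(lambda0, ..., lambda3) of x w.r.t. B, as in (3.6.3)-(3.6.4)."""
        M = np.vstack([a1 - a0, a2 - a0, a3 - a0])   # rows are ai - a0
        # Row convention: x - a0 = (x1, x2, x3) M, so solve M^T against x - a0.
        x1, x2, x3 = np.linalg.solve(M.T, np.asarray(x, float) - a0)
        return (1.0 - x1 - x2 - x3, x1, x2, x3)      # entries sum to 1

    print(barycentric((0.5, 0.5, 0.5)))   # the barycenter: (0.25, 0.25, 0.25, 0.25)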
Let Γ(b0; b1, b2, b3) be another vectorized space in R3 with affine basis
B′ = {b0, b1, b2, b3}.
Then, the change of coordinates from Γ(a0; a1, a2, a3) to
Γ(b0; b1, b2, b3) is

[x − b0]B′ = [a0 − b0]B′ + [x − a0]B A^B_B′,   x ∈ R3
or

(y1, y2, y3) = (p1, p2, p3) + (x1, x2, x3) A^B_B′

or

                            [            0]
(y1 y2 y3 1) = (x1 x2 x3 1) [  A^B_B′    0]            ,           (3.6.6)
                            [            0]
                            [p1  p2  p3  1] 4×4

where

[x − a0]B = (x1, x2, x3),   [x − b0]B′ = (y1, y2, y3),
[a0 − b0]B′ = (p1, p2, p3)  and

         [[a1 − a0]B′]   [β11  β12  β13]
A^B_B′ = [[a2 − a0]B′] = [β21  β22  β23]  is the transition matrix.
         [[a3 − a0]B′]   [β31  β32  β33]

Just like what (2.6.7) and (2.6.8) indicated, a change of coordinates is a


composite of a linear isomorphism followed by a translation. This is the
typical form of an affine transformation (see Sec. 2.8).

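As a small numerical illustration of (3.6.6) — a sketch only, assuming numpy;
the transition matrix and the translation part below are hypothetical numbers:

    import numpy as np

    A = np.array([[1., 0., 0.],          # a hypothetical transition matrix A^B_B'
                  [1., 1., 0.],          # (rows are [ai - a0]_B'); must be invertible
                  [0., 0., 2.]])
    p = np.array([3., -2., 1.])          # [a0 - b0]_B'

    M = np.zeros((4, 4))                 # the 4 x 4 matrix of (3.6.6)
    M[:3, :3] = A
    M[3, :3] = p
    M[3, 3] = 1.0

    x = np.array([1., 2., 3.])           # [x - a0]_B
    y = np.append(x, 1.0) @ M            # (y1, y2, y3, 1), row convention
    print(y[:3], np.allclose(y[:3], p + x @ A))   # linear map, then translation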
Exercises
<A>

1. Generalize Ex. <A> 1 of Sec. 2.6 to R3.
2. Generalize Ex. <A> 2 of Sec. 2.6 to R3.
3. Generalize Ex. <A> 3 of Sec. 2.6 to planes in R3.
4. State and prove the counterparts for lines in R3 of Ex. <A> 3 in Sec. 2.6.
5. Use the results obtained in Exs. 3 and 4 to prove (3.4.5), (3.5.5) and
   (3.5.6).
<B>
Try to extend problems in Ex. <B> of Sec. 2.6 to R3 . Be careful that, in
some cases, lines in R2 should be replaced by planes in R3 here.
<C> Abstraction and generalization
Try to extend the contents of Ex. <B> to abstract finite-dimensional affine
space over a field F. For partial extension to Rn , see Sec. 5.9.5.
3.7 Linear Transformations (Operators)

All the concepts, definitions concerned, linearly algebraic expositions and
results in Sec. 2.7 are still valid in R3. Readers are urged to check them
carefully in the realm of R3, particularly from (2.7.1) to (2.7.4). We will feel
free to use them, including notations such as Ker(f) and Im(f), in this section.
The section is also divided into eight subsections, each of which is a
counterpart of the same numbered subsection in Sec. 2.7. Results obtained
here generalize the corresponding results stated in Sec. 2.7, in the manner
that they look much more like linearly algebraic theorems than those in
Sec. 2.7.
Section 3.7.1: Discusses linear operators in N = {e1, e2, e3} from different
algebraic and geometric points of view. Main topics are kernels, ranks and
mapping properties.
Section 3.7.2: Illustrates basic essential linear operators and their
eigenvalues and eigenvectors, if they exist.
Section 3.7.3: Treats some special linear operators independent of bases
for R3.
Section 3.7.4: Introduces matrix representations of a linear operator with
respect to various bases for R3 and the relations among them.
Section 3.7.5: Various decompositions of a square matrix, such as into
elementary matrices, LU and LDU, etc., and right, left and generalized
inverses, are discussed by examples.
Section 3.7.6: Puts special emphasis on diagonalizable operators, their
characterizations and the variety of their uses.
Section 3.7.7: Discusses Jordan canonical forms for operators with coincident
eigenvalues and their applications in solving special kinds of differential
equations.
Section 3.7.8: Discusses the rational canonical forms for operators, especially
those which do not have all their eigenvalues real.

3.7.1 Linear operators in the Cartesian coordinate system


We have previewed some concepts about linear operators in Ex. <B> 4
of Sec. 3.3 and linear transformations among R, R2 and R3 there in
Exs. <B> 5 and 6. Here in this section, we will treat them in great details.
Results obtained are parallel to the contents of Sec. 2.7.1. In case the pro-
cesses to obtain these results are similar to those counterparts in Sec. 2.7.1,
the details will be left to the readers as exercises.
Fix the vector space R3 with its natural basis N = {e1, e2, e3}
(see Fig. 3.8) throughout the whole section.
Just like (2.7.7), we have

The formula for a linear operator on R3 (in N)
In N = {e1, e2, e3},

1. f : R3 → R3 is a linear operator.
⇔ 2. There exists a (unique) real 3 × 3 matrix A = [aij]3×3 such that, for
x = (x1, x2, x3),

f(x) = xA = (Σ_{i=1}^{3} ai1 xi, Σ_{i=1}^{3} ai2 xi, Σ_{i=1}^{3} ai3 xi),

where, for 1 ≤ i ≤ 3, f(ei) = (ai1, ai2, ai3) is the ith row vector
of A, and

    [f(e1)]   [a11  a12  a13]
A = [f(e2)] = [a21  a22  a23] ,  which is [f]N.                    (3.7.1)
    [f(e3)]   [a31  a32  a33]
In case Rm for m = 1, 2, 3 or any positive integer and Rn for n = 1, 2, 3
or any positive integer, any linear transformation f : Rm → Rn can be
expressed, in terms of the natural bases for Rm and Rn, as

f(x) = xA,   x ∈ Rm,                                               (3.7.2)

where A is a certain real m × n matrix.
In what follows, we adopt the convention that a real 3 × 3 matrix A will
stand for a linear operator on R3 as

x → xA.                                                            (3.7.3)

So does Am×n act as a linear transformation x → xA from Rm to Rn.
Fix A = [aij] ∈ M(3; R).
The kernel Ker(A) and the image Im(A) of A are subspaces of R3. They
could be of dimension zero, one, two or three. Moreover,

dim Ker(A) + dim Im(A) = dim R3 = 3,                               (3.7.4)

where dim Ker(A) is called the nullity of A and dim Im(A) is called the
rank of A, denoted as r(A).
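Before separating into cases, a quick numerical sketch of the row-vector
convention (3.7.3) and of the rank–nullity relation (3.7.4) may help. It
assumes numpy; the matrix is a hypothetical rank-one example:

    import numpy as np

    A = np.array([[ 6., -10.,  2.],      # a hypothetical matrix whose rows
                  [-3.,   5., -1.],      # are proportional, so r(A) = 1
                  [-9.,  15., -3.]])

    x = np.array([1., 2., 3.])
    print(x @ A)                         # the operator acts on row vectors: x -> xA

    rank = np.linalg.matrix_rank(A)      # dim Im(A)
    print(rank, 3 - rank)                # r(A) = 1 and nullity 2, as in (3.7.4)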
We separate into four cases in the following, according to dim Ker(A).

Case 1  dim Ker(A) = 3  Then Ker(A) = R3 and A = O3×3 is the only
possibility. In this case, Im(A) = {0} holds and the rank r(A) = 0.

Case 2  dim Ker(A) = 2  Then A ≠ O3×3. Also,

(1) Ker(A) is a plane through 0 = (0, 0, 0).
⇔ (2) (algebraic and geometric) The homogeneous equations xA = 0, i.e.
the three planes

a1j x1 + a2j x2 + a3j x3 = 0,   1 ≤ j ≤ 3,

coincide with a single plane in R3 (refer to (3.5.6)). See Fig. 3.23.

[Fig. 3.23: the three planes coinciding with a single plane through 0.]
⇔ (3) (algebraic) All subdeterminants of order 2 are equal to zero, namely,

det[ai1j1  ai1j2; ai2j1  ai2j2] = 0  for 1 ≤ i1 < i2 ≤ 3 and 1 ≤ j1 < j2 ≤ 3,

and hence det A = 0 (refer to Ex. <B> 4 of Sec. 3.5). For example,
the point (a21, −a11, 0) lying on that plane will result in

a12 a21 − a22 a11 = −det[a11  a12; a21  a22] = 0, etc.

⇔ (4) (linearly algebraic) A has at least one nonzero row vector, say
a1 = (a11, a12, a13), and there are scalars α and β such that the
other two row vectors ai = (ai1, ai2, ai3), i = 2, 3, satisfy

a2 = α a1  and  a3 = β a1.

So do the three column vectors of A.
⇔ (5) (linearly algebraic) The row rank and the column rank of A are equal
to 1, i.e.

The maximal number of linearly independent row vectors of A
= the maximal number of linearly independent column vectors of A
= 1.
⇔ (6) (geometric) In case A∗2 = αA∗1 and A∗3 = βA∗1, where A∗j =
(a1j, a2j, a3j)∗ for j = 1, 2, 3 are the column vectors, the image Im(A) is the
straight line

〈〈(1, α, β)〉〉 = {t(1, α, β) | t ∈ R},

or x2 = αx1, x3 = βx1.
⇔ (7) The rank r(A) = 1.                                           (3.7.5)
For a general setting, let the kernel

Ker(A) = 〈〈x1, x2〉〉

and take any vector x3 ∈ R3 which is linearly independent of x1 and x2,
so that B = {x1, x2, x3} forms a basis for R3. Then, the range

Im(A) = 〈〈x3 A〉〉

is a fixed line 〈〈v〉〉, v ≠ 0, through 0. It is easy to see that (refer to (3.5.6))

1. A maps each plane x0 + Ker(A), parallel to Ker(A), into a single point
   (or vector) x0 A on 〈〈v〉〉.
2. A maps each plane u + 〈〈u1, u2〉〉, where only one of u1 and u2 is
   linearly dependent on x1 and x2, intersecting Ker(A) along a line, onto
   the whole line 〈〈v〉〉.                                           (3.7.6)

See Fig. 3.24.
[Fig. 3.24: each plane x0 + Ker(A) collapses to the single point x0 A on Im(A) = 〈〈v〉〉 = 〈〈x3 A〉〉; a plane u + 〈〈u1, u2〉〉 meeting Ker(A) in a line is carried onto 〈〈v〉〉.]
Take any two linearly independent vectors v1, v2 in R3 so that
B′ = {v1, v2, x3 A} is a basis for R3. Then,

x1 A = 0 = 0·v1 + 0·v2 + 0·(x3 A),
x2 A = 0 = 0·v1 + 0·v2 + 0·(x3 A),
x3 A =     0·v1 + 0·v2 + 1·(x3 A)

   [x1]     [0  0  0] [ v1 ]
⇒  [x2] A = [0  0  0] [ v2 ]
   [x3]     [0  0  1] [x3 A]

                              [0  0  0]
⇒  [A]^B_B′ = P^B_N A Q^N_B′ = [0  0  0] ,  where
                              [0  0  1]

        [x1]
P^B_N = [x2]  is the transition matrix from B to N, and
        [x3]

         [ v1 ]^(-1)
Q^N_B′ = [ v2 ]      is the transition matrix from N to B′.        (3.7.7)
         [x3 A]

This [A]^B_B′ is the matrix representation of A with respect to the bases B and B′.
There are infinitely many such B and B′ that put A into this canonical form.
The quotient space of all affine subspaces (refer to Ex. <B> 8 of Sec. 3.5)
modulo Ker(A),

R3/Ker(A) = {x + Ker(A) | x ∈ R3},                                 (3.7.8)

is a one-dimensional real vector space which is linearly isomorphic to Im(A).
See Fig. 3.25. Therefore,

dim Ker(A) + dim R3/Ker(A) = dim R3 = 3.
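The construction (3.7.7) can be checked numerically. The following sketch
(assuming numpy) uses a hypothetical rank-one matrix, a basis B built from its
kernel, and a basis B′ containing x3 A:

    import numpy as np

    A = np.array([[ 6., -10.,  2.],
                  [-3.,   5., -1.],
                  [-9.,  15., -3.]])     # rank 1; xA = 0 is 2x1 - x2 - 3x3 = 0

    x1, x2 = np.array([1., 2., 0.]), np.array([0., 3., -1.])   # span Ker(A)
    x3 = np.array([1., 0., 0.])                                # completes B
    P = np.vstack([x1, x2, x3])          # the transition matrix from B to N

    v1, v2 = np.array([1., 0., 0.]), np.array([0., 1., 0.])
    Q = np.linalg.inv(np.vstack([v1, v2, x3 @ A]))   # transition from N to B'

    print(np.round(P @ A @ Q, 10))       # the canonical form diag(0, 0, 1)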
Case 3  dim Ker(A) = 1  Of course, A ≠ O3×3. Also,

(1) (geometric) Ker(A) is a line through 0 = (0, 0, 0).
⇔ (2) (algebraic and geometric) The homogeneous linear equations
xA = 0, i.e. the three planes

Σ_{i=1}^{3} aij xi = 0  for 1 ≤ j ≤ 3,

intersect along a straight line in R3, passing through 0. See Fig. 3.26.
[Fig. 3.25: the one-dimensional quotient space R3/Ker(A) = {x + Ker(A)} and its linear isomorphism onto Im(A).]
[Fig. 3.26: the three planes intersecting along a line through 0: (a) two of them coincide; (b) they are mutually distinct.]

⇔ (3) (algebraic) In Fig. 3.26(a), suppose the planes Σ_{i=1}^{3} aij xi = 0 for
j = 1, 2 coincide while the third plane Σ_{i=1}^{3} ai3 xi = 0 intersects the
former two along a line. Then,

det[ai1  ai2; aj1  aj2] = 0  for 1 ≤ i < j ≤ 3,  and at least one of
det[ai1  ai3; aj1  aj3],  1 ≤ i < j ≤ 3,  is not equal to zero,

and thus det A = 0. In Fig. 3.26(b), at least three of

det[ai1j1  ai1j2; ai2j1  ai2j2],  1 ≤ i1 < i2 ≤ 3,  1 ≤ j1 < j2 ≤ 3,
are not equal to zero. Since the intersecting line of the former two
planes lies on the third one, then

        |a11  a12  a13|
det A = |a21  a22  a23|
        |a31  a32  a33|

      = a13 det[a21  a22; a31  a32] − a23 det[a11  a12; a31  a32]
        + a33 det[a11  a12; a21  a22] = 0.

In conclusion, det A = 0 and at least one of the subdeterminants of
order 2 is not equal to zero.
⇔ (4) (linearly algebraic) Two row (or column) vectors of A are linearly
independent and the three row (or column) vectors are linearly
dependent, say

α1 A∗1 + α2 A∗2 + α3 A∗3 = 0∗

for scalars α1, α2, α3, not all zero, where A∗j for j = 1, 2, 3 are the
column vectors.
⇔ (5) (linearly algebraic)
The row rank of A = the column rank of A = 2.
⇔ (6) (geometric) Adopt the notations in (4). The image Im(A) is the plane
α1 x1 + α2 x2 + α3 x3 = 0.
⇔ (7) The rank r(A) = 2.                                           (3.7.9)
Let

Ker(A) = 〈〈x1〉〉

and take two arbitrary linearly independent vectors x2 and x3 in R3 so
that B = {x1, x2, x3} forms a basis for R3. Then, the range

Im(A) = 〈〈x2 A, x3 A〉〉

is a fixed plane through 0. It is easy to see that (refer to (3.5.5))

1. A maps each line x0 + Ker(A), parallel to Ker(A), into a single point
   x0 A on Im(A).
2. A maps each line v0 + 〈〈u〉〉, where u is linearly independent of x1, not
   parallel to Ker(A), onto a line on Im(A).
3. A maps each plane, parallel to Ker(A), into a line on Im(A). When will
   the image line pass through 0, i.e. be a subspace?
4. A maps each plane, not parallel to Ker(A), onto Im(A) in a one-to-one
   manner.                                                         (3.7.10)

See Fig. 3.27.

[Fig. 3.27: (a) lines x0 + Ker(A), v0 + Ker(A) parallel to Ker(A) collapse to points x0 A, v0 A of Im(A); (b) planes Σ1, Σ2 parallel to Ker(A) map to lines l1′, l2′ on Im(A) = Σ′.]

Take any nonzero vector  v in R3 so that B  = {v,x2 A, 


x3 A} is a basis
for R . Then the matrix representation of A with respect to B and B  is
3

 
0 0 0
[A]B B N 
B = PN AQB = 0 1 0 ,

0 0 1
    −1
x1 v
B
where PN x2  and QN
=  B =  
x2A . (3.7.11)
 
x3 x3 A
The quotient space of R3 modulus Ker(A),
R3 /Ker(A) = {
x + Ker(A) | 
x ∈ R3 } (3.7.12)
is a two-dimensional vector space which is isomorphic to R2 . Also
dim Ker(A) + dim R3 /Ker(A) = dim R3 = 3.
See Fig. 3.28.

[Fig. 3.28: the two-dimensional quotient space R3/Ker(A), isomorphic to the plane Im(A).]

Case 4  dim Ker(A) = 0  Then A ≠ O3×3. Also,

(1) Ker(A) = {0}, i.e. A is one-to-one.
⇔ (2) A is onto, i.e. Im(A) = R3.
⇔ (3) (algebraic and geometric) The homogeneous equation xA = 0 has
only the zero solution 0, i.e. the three planes

Σ_{i=1}^{3} aij xi = 0  for 1 ≤ j ≤ 3

meet at only one point, namely, the origin 0. See Fig. 3.29.

[Fig. 3.29: the three planes meeting at the origin only.]

⇔ (4) (algebraic) The (nonzero) direction


     
a21 a22  a31 a32  a11 a12 
 ,   
a31 a32  a11 a12  , a21 a22 
374 The Three-Dimensional Real Vector Space R3

of the intersection line of the first two planes does not lie on the
third plane. This is equivalent to
det A = 0.
⇔ (5) (linearly algebraic) The three row (or column) vectors of A are linearly
independent.
⇔ (6) (linearly algebraic)
The row rank of A = the column rank of A = 3.
⇔ (7) The rank r(A) = 3.
⇔ (8) (linearly algebraic) A is invertible and hence A: R3 → R3, as a linear
operator, is a linear isomorphism.
⇔ (9) (linearly algebraic) A maps every or a basis {x1, x2, x3} for R3 onto
a basis {x1 A, x2 A, x3 A} for R3.                                 (3.7.13)

In (9), let B = {x1, x2, x3} and B′ = {x1 A, x2 A, x3 A}, which are bases
for R3. Then the matrix representation of A with respect to B and B′ is

                              [1  0  0]
[A]^B_B′ = P^B_N A Q^N_B′ =   [0  1  0] = I3,
                              [0  0  1]

              [x1]               [x1 A]^(-1)
where P^B_N = [x2]  and Q^N_B′ = [x2 A]     .                      (3.7.14)
              [x3]               [x3 A]
As a counterpart of (2.7.9), we have

The general geometric mapping properties of an
invertible linear operator
Let A = [aij]3×3 be an invertible real matrix. Then the linear isomorphism
A: R3 → R3 defined by x → xA and the affine transformation T: R3 → R3
defined by T(x) = x0 + xA (see Sec. 3.8.3) both preserve:

1. Line segment (interior points to interior points) and line.
2. Triangle and parallelogram (interior points to interior points) and plane.
3. Tetrahedron and parallelepiped (interior points to interior points).
4. The relative positions (see (3.4.5), (3.5.5) and (3.5.6)) of straight lines
   and planes.
5. Bounded set.
6. The ratio of signed lengths of line segments along the same or parallel
   lines.
7. The ratio of solid volumes.

Note  Let ai = (ai1, ai2, ai3), i = 1, 2, 3, be the row vectors of A. Denote
by a1a2a3 the parallelepiped (see Fig. 3.2) with vertex at 0 and
side vectors a1, a2 and a3, i.e.

a1a2a3 = {Σ_{i=1}^{3} λi ai | 0 ≤ λi ≤ 1 for 1 ≤ i ≤ 3}.

Then (for details, see Sec. 5.3)

the signed volume of a1a2a3
---------------------------- = det A.
the volume of e1e2e3

They do not necessarily preserve:

a. Angle.
b. (linear) Length.
c. (planar) Area.
d. (solid) Volume, except when det A = ±1.
e. Directions or orientations (see Fig. 3.3): preserving the orientation if
   det A > 0 and reversing the orientation if det A < 0.
                                                                   (3.7.15)
Proofs are left to the readers. In case difficulties arise, just review Sec. 2.8.3
carefully and model after the proofs there, or see Sec. 3.8.3.

Exercises
<A>
1. Prove (3.7.5) and (3.7.6) in detail.
2. Prove (3.7.8) and (3.7.12).
3. Prove (3.7.9) and (3.7.10) in detail.
4. Prove (3.7.13) in detail.
5. Prove (3.7.15) in detail.
6. Let A = [aij] ∈ M(3; R) be a nonzero matrix, considered as the
   linear operator x → xA. Denote by A1∗, A2∗, A3∗ the three row
   vectors of A and by A∗1, A∗2, A∗3 the three column vectors of A. Let
   x = (x1, x2, x3), y = (y1, y2, y3). Notice that

   xA = x1 A1∗ + x2 A2∗ + x3 A3∗ = (x A∗1, x A∗2, x A∗3) = y.

   The following provides a method for determining the kernel Ker(A)
   and the range Im(A) by intuition or inspection.
   (a) Suppose r(A) = 1. We may suppose that A1∗ ≠ 0 and A2∗ = α2 A1∗
       and A3∗ = α3 A1∗ for some scalars α2 and α3. Then

       x1 A1∗ + x2 A2∗ + x3 A3∗ = (x1 + α2 x2 + α3 x3)A1∗ = 0
       ⇔ x1 + α2 x2 + α3 x3 = 0,

       which is the kernel Ker(A). We may suppose that A∗1 ≠ 0∗ and
       A∗2 = β2 A∗1 and A∗3 = β3 A∗1 for some scalars β2 and β3. Then

       A∗2 = β2 A∗1,  A∗3 = β3 A∗1
       ⇔ x A∗2 = β2 x A∗1,  x A∗3 = β3 x A∗1  for all x ∈ R3
       ⇔ y2 = β2 y1,  y3 = β3 y1,

       which is the image Im(A).
   (b) Suppose r(A) = 2. We may suppose that A1∗ and A2∗ are linearly
       independent and A3∗ = α1 A1∗ + α2 A2∗. Then,

       x1 A1∗ + x2 A2∗ + x3 A3∗ = (x1 + α1 x3)A1∗ + (x2 + α2 x3)A2∗ = 0
       ⇔ x1 + α1 x3 = 0,  x2 + α2 x3 = 0,

       which is Ker(A). We may suppose that A∗1 and A∗2 are linearly
       independent and A∗3 = β1 A∗1 + β2 A∗2. Then,

       A∗3 = β1 A∗1 + β2 A∗2
       ⇔ x A∗3 = β1 x A∗1 + β2 x A∗2  for all x ∈ R3
       ⇔ y3 = β1 y1 + β2 y2,

       which is Im(A).
   Try to determine Ker(A) and Im(A) for each of the following matrices:

       [ 6  −10   2]        [12   0  −2]
   A = [−3    5  −1] ;  A = [ 0   4   6] .
       [−9   15  −3]        [ 3  −1  −2]
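As a numerical cross-check of the inspection method in (a) — a sketch assuming
numpy — the first matrix above has A2∗ = −(1/2)A1∗ and A3∗ = −(3/2)A1∗, so
Ker(A) is the plane x1 − (1/2)x2 − (3/2)x3 = 0, while A∗2 = −(5/3)A∗1 and
A∗3 = (1/3)A∗1 give Im(A): y2 = −(5/3)y1, y3 = (1/3)y1:

    import numpy as np

    A = np.array([[ 6., -10.,  2.],
                  [-3.,   5., -1.],
                  [-9.,  15., -3.]])

    print(np.linalg.matrix_rank(A))          # 1
    for x in [np.array([1., 2., 0.]), np.array([3., 0., 2.])]:
        print(x @ A)                         # points of the kernel plane map to 0
    print(np.array([1., 0., 0.]) @ A)        # (6, -10, 2), a generator of Im(A)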
7. Formulate the features of Ker(A) and Im(A) and the geometric mapping
   properties for each of the following linear transformations, in the natural
   bases for the spaces concerned.

   (a) f: R → R3 defined by f(x) = xA, A ∈ M(1, 3; R).
   (b) f: R2 → R3 defined by f(x) = xA, A ∈ M(2, 3; R).
   (c) f: R3 → R defined by f(x) = xA, A ∈ M(3, 1; R).
   (d) f: R3 → R2 defined by f(x) = xA, A ∈ M(3, 2; R).
<B>

The system of non-homogeneous linear equations in three unknowns
x1, x2, x3,

Σ_{i=1}^{3} aij xi = bj,   j = 1, 2, 3,

has a solution if and only if the constant vector b = (b1, b2, b3) lies on the
range Im(A) of the linear operator

f(x) = xA,   A = [aij]3×3.

This is equivalent to saying that the coefficient matrix A and its augmented
matrix [A | b∗] have the same rank, i.e.

r(A) = r([A | b∗]).
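The rank criterion above is easy to test numerically. A sketch, assuming numpy
and a hypothetical singular system: in the row convention, xA = b is solvable
exactly when appending b as an extra row does not raise the rank:

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [0., 1., 1.],
                  [1., 3., 1.]])             # singular: row3 = row1 + row2
    b = np.array([1., 3., 1.])               # = row1 + row2, hence solvable

    solvable = (np.linalg.matrix_rank(A) ==
                np.linalg.matrix_rank(np.vstack([A, b])))
    print(solvable)                          # True; e.g. x = (1, 1, 0) works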
1. Try to use these concepts and methods to redo Ex. <B> 4 of Sec. 3.5,
   but emphasize the following aspects.
   (a) Find all relative positions of the three planes and characterize each
       case by using ranks and determinants.
   (b) What kinds of relative positions will guarantee the existence of a
       solution? And how many solutions are there?
2. Try to use Ex. 1 to solve the following sets of equations and graph the
   planes concerned, if possible.

   (a) 5x1 + 3x2 + 7x3 = 4;        (b) x1 − 3x2 + x3 = 1;
       3x1 + 26x2 + 2x3 = 9;           2x1 − 3x2 = 4;
       7x1 + 2x2 + 10x3 = 5.           x2 + 2x3 = 0.
   (c) 2x1 + 6x2 + 11 = 0;         (d) 2x1 + x2 = 2;
       6x1 + 20x2 − 6x3 + 3 = 0;       3x2 − 2x3 = 4;
       6x1 − 18x3 + 1 = 0.             x2 + 3x3 = 1.
   (e) x1 + x2 + x3 = 6;           (f) x1 + 2x2 − x3 = 5;
       3x1 − 2x2 − x3 = 7;             3x1 − x2 + 2x3 = 2;
       x1 − 4x2 − 3x3 = −5.            2x1 + 11x2 − 7x3 = −2.
   (g) 3x1 + x2 − 3x3 = 0;
       2x1 + 2x2 − 3x3 = 0;
       x1 − 5x2 + 9x3 = 0.
3. Solve the equations:

   (a + 3)x1 − 2x2 + 3x3 = 4;
   3x1 + (a − 3)x2 + 9x3 = b;
   4x1 − 8x2 + (a + 14)x3 = c,

   where a, b, c are constants.
<C> Abstraction and generalization

Do the problems in Ex. <C> of Sec. 2.7.1.
The results in (3.7.5), (3.7.9) and (3.7.13), and the methods used to obtain
them, can be unified and generalized to a linear transformation f : Fm → Fn
or a matrix A = [aij] ∈ M(m, n; F). We proceed as follows.
An (m − 1)-dimensional subspace S of Fm is called a hypersubspace, and
its image x0 + S under the translation x → x0 + x a hyperplane, which is
an (m − 1)-dimensional affine subspace of Fm. These two geometric objects
can be characterized through the concept of a linear functional on Fm.
Suppose {x1, . . . , xm−1} is any fixed basis for the hypersubspace S and
B = {x1, . . . , xm−1, xm} is a basis for Fm. Then the linear functional
f : Fm → F defined by

f(xi) = 0 for 1 ≤ i ≤ m − 1,   f(xm) = 1

has precisely

Ker(f) = S;
Im(f) = F.

To find the expression of f in terms of the natural basis N = {e1, . . . , em}
for Fm, let ei = Σ_{j=1}^{m} aij xj for 1 ≤ i ≤ m, namely [ei]B = (ai1, . . . , aim).
Then

x = (x1, . . . , xm) = Σ_{i=1}^{m} xi ei

⇒ f(x) = Σ_{i=1}^{m} xi f(ei) = Σ_{i=1}^{m} xi Σ_{j=1}^{m} aij f(xj) = Σ_{i=1}^{m} aim xi.
Note that at least one of a1m, a2m, . . . , amm is not equal to zero. As a
consequence,

the hypersubspace S:   Σ_{i=1}^{m} aim xi = 0,  and

the hyperplane x0 + S:   Σ_{i=1}^{m} aim xi = b,                   (3.7.16)

where b = Σ_{i=1}^{m} aim xi0 if x0 = (x10, x20, . . . , xm0).

Given a linear functional f : Fm → F. Since

dim Ker(f) + dim Im(f) = dim Fm = m,

it follows that if f is nonzero, then dim Ker(f) = m − 1 and Ker(f) is a
hypersubspace. Also, for any (nonzero) subspace V of Fm, the dimension
identity

dim V ∩ Ker(f) + dim(V + Ker(f)) = dim V + dim Ker(f)

                     { dim V,      if V ⊆ Ker(f)
⇒ dim V ∩ Ker(f) =   { dim V − 1,  if f is nonzero and V ⊄ Ker(f).
                                                                   (3.7.17)

For a nonzero matrix A = [aij] ∈ M(m, n; F), when considered as a
linear transformation from Fm into Fn defined by x → xA, then

dim Ker(A) + dim Im(A) = dim Fm = m,  and

Ker(A) = ∩_{j=1}^{n} Ker(fj),                                      (3.7.18)

where fj(x) = Σ_{i=1}^{m} aij xi is the linear functional determined by the jth
column vector of A, for each j, 1 ≤ j ≤ n.
Try to do the following problems.

1. What peculiar vectors might Ker(fj), 1 ≤ j ≤ n, possess?
   (a) Suppose dim Ker(f1) = m − 1 and a11 ≠ 0, for simplicity. Show that
       the following vectors

       (a21, −a11, 0, . . . , 0),
       (a31, 0, −a11, . . . , 0),
       ...
       (am1, 0, . . . , 0, −a11)

       form a basis for Ker(f1). What happens if ak1 = 0 for some k, where
       1 ≤ k ≤ m? These facts are still valid for any Ker(fj), 1 ≤ j ≤ n.
   (b) For 1 ≤ i1 < i2 < i3 ≤ m, Ker(f1) ∩ Ker(f2) contains the vector,
       denoted vi1i2i3, whose i1th coordinate is det[ai21  ai22; ai31  ai32],
       whose i2th coordinate is det[ai31  ai32; ai11  ai12], whose i3th
       coordinate is det[ai11  ai12; ai21  ai22], and whose remaining
       coordinates are all zero. Do you see why? Note that it could happen
       that vi1i2i3 = 0. But when? When are all such vi1i2i3 = 0?
   (c) For each 1 ≤ i1 < i2 < · · · < ik < ik+1 ≤ m, the subspace
       ∩_{j=1}^{k} Ker(fj) contains the vector vi1i2···ikik+1 whose

       il th coordinate

                      [ai1,1    ai1,2    · · ·  ai1,k  ]
                      [  ⋮        ⋮               ⋮    ]
       = (−1)^(l+1) det [ail−1,1  ail−1,2  · · ·  ail−1,k]
                      [ail+1,1  ail+1,2  · · ·  ail+1,k]
                      [  ⋮        ⋮               ⋮    ]
                      [aik+1,1  aik+1,2  · · ·  aik+1,k]

       (the k × k matrix formed by the rows i1, . . . , ik+1 of A, with the row
       il deleted, restricted to the columns 1, . . . , k), for 1 ≤ l ≤ k + 1, and

       pth coordinate = 0 for p ≠ il, 1 ≤ p ≤ m and 1 ≤ l ≤ k + 1.
       Try to figure out when vi1i2···ikik+1 = 0 for some 1 ≤ i1 <
       i2 < · · · < ik < ik+1 ≤ m.
       (Note  The inductive processes in (a), (b) and (c) all together
       suggest implicitly an inductive definition for the determinant of order m
       over a field F.)
2. What are the possible dimensions of ∩_{j=1}^{n} Ker(fj) and how can one
   characterize them in each case? Here we suppose that, for each 1 ≤ j ≤ n,
   dim Ker(fj) = m − 1 holds.
   (a) In case n = 2, the following are equivalent:
       (1) The hypersubspaces Ker(f1) and Ker(f2) coincide, i.e.
           Ker(f1) = Ker(f2) holds.
       (2) There exists a nonzero scalar α so that f2 = αf1.
       (3) The ratios of the corresponding coefficients in
           Ker(f1): Σ_{i=1}^{m} ai1 xi = 0 and Ker(f2): Σ_{i=1}^{m} ai2 xi = 0
           are equal, namely,

           a11/a12 = a21/a22 = · · · = am1/am2.
       (4) All the submatrices of order 2 of the matrix

               [a11  a12]
               [a21  a22]
           A = [ ⋮    ⋮ ]
               [am1  am2]

           have zero determinant, namely, for 1 ≤ i1 < i2 ≤ m,

           det[ai11  ai12; ai21  ai22] = 0.

       (5) The two column vectors of A are linearly dependent, and the
           maximal number of linearly independent row vectors of A is 1.

       Then, what happens if Ker(f1) ≠ Ker(f2), i.e. dim(Ker(f1) ∩
       Ker(f2)) = m − 2?
   (b) In case n = 3, the dimension of ∩_{j=1}^{3} Ker(fj) could be m − 1, m − 2
       or m − 3, which is, respectively, the case (3.7.5), (3.7.9) or (3.7.13)
       if m = 3. Try to figure out possible characteristic properties, both
       algebraic and geometric.
   (c) Suppose the matrix A = [aij]m×n has a submatrix of order k whose
       determinant is not equal to zero. Say, for simplicity,

           [a11  a12  · · ·  a1k]
           [a21  a22  · · ·  a2k]
       det [ ⋮    ⋮           ⋮ ] ≠ 0.
           [ak1  ak2  · · ·  akk]

       Then it follows easily that the first k column vectors A∗1, . . . , A∗k
       of A are linearly independent. So are the first k row vectors
       A1∗, . . . , Ak∗. For

       α1 A∗1 + · · · + αk A∗k = 0∗,  where 0∗ ∈ Fm

                        [a11  · · ·  ak1]
       ⇒ (α1 · · · αk)  [ ⋮           ⋮ ] = 0,  where 0 ∈ Fk
                        [a1k  · · ·  akk]

       ⇒ (α1, . . . , αk) = 0 in Fk.

       Furthermore, in this case,

       dim ∩_{j=1}^{k} Ker(fj) = m − k.

       To see this, one might use (3.7.17) k times by noting that
       A∗1, . . . , A∗k are linearly independent. Or, we might observe that

       x = (x1, . . . , xm) ∈ ∩_{j=1}^{k} Ker(fj)
       ⇔ Σ_{i=1}^{m} aij xi = 0  for 1 ≤ j ≤ k
       ⇔ Σ_{i=1}^{k} aij xi = −Σ_{i=k+1}^{m} aij xi  for 1 ≤ j ≤ k
       ⇔ (x1 · · · xk)
                               [ak+1,1  · · ·  ak+1,k] [a11  · · ·  a1k]^(-1)
          = −(xk+1 · · · xm)   [  ⋮              ⋮   ] [ ⋮           ⋮ ]     ,
                               [am1     · · ·  amk  ] [ak1  · · ·  akk]

       which indicates immediately that dim ∩_{j=1}^{k} Ker(fj) = m − k. Also,
       Ex. 1 might be helpful in handling this problem.
   (d) (continuation of (c)) Conversely, suppose dim ∩_{j=1}^{k} Ker(fj) = m − k;
       then there exists at least one 1 ≤ i1 < i2 < · · · < ik ≤ m so that

           [ai1,1  · · ·  ai1,k]
       det [  ⋮            ⋮  ] ≠ 0.
           [aik,1  · · ·  aik,k]

       Suppose, on the contrary, that all such submatrices of order k have
       zero determinant. Let p be the largest integer so that there exists
       at least one submatrix of order p of the matrix [A∗1 · · · A∗k]m×k
       having nonzero determinant. Then p < k holds, and the argument
       in (c) shows that ∩_{j=1}^{k} Ker(fj) would have dimension m − p rather
       than m − k, a contradiction.
   (e) Try to use or model after (c) and (d) to characterize the case that
       dim ∩_{j=1}^{k} Ker(fj) = m − p, where 1 ≤ p ≤ k.
As a conclusion, we figure out our final result as follows (adopting the
notations in (3.7.18)):

1. The linear transformation A: Fm → Fn has kernel of dimension
   dim Ker(A) = m − r.
⇔ 2. (geometric) The n hypersubspaces Ker(fj), 1 ≤ j ≤ n, intersect in
   the (m − r)-dimensional subspace ∩_{j=1}^{n} Ker(fj) = Ker(A).
⇔ 3. (algebraic) r is the largest integer so that some r × r submatrix of A
   has a nonzero determinant.
⇔ 4. (geometric and linearly algebraic)

   The maximal number of linearly independent row vectors of A
   = the maximal number of linearly independent column vectors of A
   = r,

   the former being called the row rank of A and the latter the column
   rank of A.
⇔ 5. (algebraic) row rank of A = column rank of A = r.
⇔ 6. (linearly algebraic) the rank of A = r = dim Im(A), denoted as r(A);
   in case r(A) = m ≤ n,
⇔ 7. A is one-to-one, i.e. Ker(A) = {0}; in case r(A) = n ≤ m,
⇔ 8. A is onto, i.e. Im(A) = Fn; in case m = n and r(A) = n,
⇔ 9. A is a linear isomorphism on Fn.
⇔ 10. A maps every or a basis for Fn onto a basis for Fn.
                                                                   (3.7.19)
For other approaches to these results, refer to Exs. 2 and 24 of Sec. B.7,
see also Secs. B.8 and 5.9.3.
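Characterization 3 of (3.7.19) can be turned directly into a (very inefficient
but instructive) program. A sketch assuming numpy; the 3 × 4 matrix below is a
hypothetical example of rank 2:

    import numpy as np
    from itertools import combinations

    def minor_rank(A, tol=1e-10):
        """Largest k such that some k x k submatrix has nonzero determinant."""
        m, n = A.shape
        for k in range(min(m, n), 0, -1):
            for rows in combinations(range(m), k):
                for cols in combinations(range(n), k):
                    if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                        return k
        return 0

    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 2.],
                  [1., 3., 1., 3.]])             # row3 = row1 + row2
    print(minor_rank(A), np.linalg.matrix_rank(A))   # 2 2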
Let V and W be finite-dimensional vector spaces over the same field.
Suppose dim V = m and B is a basis for V, while dim W = n and C is
a basis for W. The mapping Φ: V → Fm defined by Φ(x) = [x]B, the
coordinate vector of x ∈ V with respect to B, is a linear isomorphism
(refer to (2.3.2) and (3.2.1)). Similarly, Ψ: W → Fn defined by Ψ(y) = [y]C
is a linear isomorphism. Let N and N′ be the natural bases for Fm and Fn,
respectively. For each f ∈ L(V, W), define g ∈ L(Fm, Fn) by g([x]B) =
[f(x)]C. Then g ◦ Φ = Ψ ◦ f reflects the commutativity of the following
diagram:

        f
   V    →    W
  Φ ↓         ↓ Ψ
   Fm   →    Fn
        g

Since [Φ]^B_N [g]^N_N′ = [f]^B_C [Ψ]^C_N′ (see Secs. 2.7.3 and 3.7.3), we can
recapture various properties in (3.7.19) for f via [g]^N_N′, and hence, via [f]^B_C.
3.7.2 Examples

It is supposed or suggested that readers are familiar with the basic examples
presented in Sec. 2.7.2 and with general terminology, such as

1. eigenvalues, eigenvectors, characteristic polynomial,
2. line of invariant points, invariant line, and
3. the Cayley–Hamilton theorem,

stated in the later part of that section.
Given a nonzero matrix A = [aij] ∈ M(3; R), considered as a linear
operator x → xA on R3. According to the rank r(A) = 1, 2, 3 and (3.7.7),
(3.7.11), (3.7.14), A has the following respective canonical forms:

[1  0  0]   [1  0  0]   [1  0  0]
[0  0  0] , [0  1  0] , [0  1  0] .
[0  0  0]   [0  0  0]   [0  0  1]

For an operator A with rank r(A) = 1 or r(A) = 2, the algebraic and
geometric mapping properties are essentially the same as for operators on R2,
after choosing a fixed basis B for R3 where at least one of the basis vectors is
from its kernel. Therefore, we will focus our examples here on operators of
rank 3.
Example 1  The operator (compare with Example 4 in Sec. 2.7.2)

    [λ1  0   0 ]
A = [0   λ2  0 ] ,  where λ1λ2λ3 ≠ 0,
    [0   0   λ3]

has, in N = {e1, e2, e3}, the properties:

1. ei A = λi ei for i = 1, 2, 3. So ei is an eigenvector of A corresponding to
   the eigenvalue λi and 〈〈ei〉〉 is an invariant line (subspace) of A.
2. In case λ1 = λ2 = λ3 = λ, then

   A = λI3

   is called an enlargement with scale λ. In this case, A keeps every line
   passing 0 invariant.
3. In case λ1 ≠ 1 and λ2 = λ3 = 1, A is called a one-way stretch along the
   invariant line 〈〈e1〉〉, while 〈〈e2, e3〉〉 is the plane of invariant points.
4. In case λ1 ≠ 1, λ2 ≠ 1 and λ3 = 1, A is called a two-way stretch along
   〈〈e1〉〉 and 〈〈e2〉〉, while 〈〈e3〉〉 is the line of invariant points.
5. In case λ1 ≠ 1, λ2 ≠ 1 and λ3 ≠ 1, A is simply called a stretch or a
   three-way stretch and 0 is the only invariant point.

This is the simplest linear operator among all. It just maps a vector x =
(x1, x2, x3) to the vector xA = (λ1x1, λ2x2, λ3x3). See Fig. 3.30.
[Fig. 3.30: the stretch x = x1e1 + x2e2 + x3e3 → xA = λ1x1e1 + λ2x2e2 + λ3x3e3 (here λ1 < 0, λ2 > 0, λ3 > 0).]
If A is the matrix representation of a linear operator B on R3 with respect
to a certain basis B = {x1, x2, x3}, namely,

                     [λ1  0   0 ]             [x1]
[B]B = PBP⁻¹ = A =   [0   λ2  0 ] ,  where P = [x2] ,              (3.7.20)
                     [0   0   λ3]             [x3]

then the action of A = [B]B on a vector [x]B = (α1, α2, α3) can be similarly
illustrated, as in Fig. 3.31. In this case, B is called diagonalizable.
[Fig. 3.31: the action of [B]B on [x]B = (α1, α2, α3) in the basis B (here α1 > 0, α2 < 0, α3 > 0).]
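To experiment with diagonalizability as in (3.7.20), one can let the computer
find the basis B. A sketch assuming numpy; the operator is a hypothetical
choice (it also appears in Ex. <A> 4(c) below). Since the text's convention is
xB for row vectors x, we feed numpy the transpose:

    import numpy as np

    B = np.array([[ -9., 4., 4.],
                  [ -8., 3., 4.],
                  [-16., 8., 7.]])           # a diagonalizable operator

    lam, V = np.linalg.eig(B.T)              # columns of V solve B^T v = lambda v
    P = V.T                                  # rows xi then satisfy xi B = lambda_i xi
    print(np.round(lam, 8))                  # eigenvalues -1, -1, 3
    print(np.round(P @ B @ np.linalg.inv(P), 8))   # the diagonal form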
Example 2  The operator (compare with Example 5 in Sec. 2.7.2)

    [0  a  0]
A = [b  0  0] ,  with abc ≠ 0,
    [0  0  c]
has the following properties: the characteristic polynomial is

−(t − c)(t² − ab).

(1) Case 1  ab > 0 and c ≠ √(ab) or −√(ab)

    eigenvalues          eigenvectors
    λ1 = √(ab)           v1 = (√|b|, √|a|, 0)
    λ2 = −√(ab)          v2 = (−√|b|, √|a|, 0)
    λ3 = c               v3 = e3 = (0, 0, 1)

B = {v1, v2, v3} is a basis for R3 and the axes 〈〈v1〉〉, 〈〈v2〉〉, 〈〈v3〉〉 are
invariant lines.
1. Since

        [0  1  0] [b  0  0]
    A = [1  0  0] [0  a  0]
        [0  0  1] [0  0  c]

    in N = {e1, e2, e3}, A can be decomposed as

                         [0  1  0]
    x = (x1, x2, x3) → x [1  0  0] = (x2, x1, x3)
                         [0  0  1]

                   [b  0  0]
    → (x2, x1, x3) [0  a  0] = (bx2, ax1, cx3) = xA,
                   [0  0  c]

    where the first mapping is a reflection with respect to the plane x1 = x2
    while the second one is a stretch (see Example 1). See Fig. 3.32 (and
    Fig. 2.49).
2. In B = {v1, v2, v3},

                     [√(ab)    0     0]            [v1]
    [A]B = PAP⁻¹ =   [0      −√(ab)  0] , where P = [v2] .
                     [0        0     c]            [v3]

    Refer to Fig. 3.31 (and compare with Fig. 2.48). What happens if
    c = √(ab) or −√(ab)?
(2) Case 2  ab < 0

A has only one real eigenvalue c, with associated eigenvector e3. Thus
〈〈e3〉〉 is the only invariant line (subspace). Suppose a > 0 and b < 0;
then −ab > 0 and

    [0   a  0] [−1  0  0]   [1   0  0] [0   a  0]
A = [−b  0  0] [ 0  1  0] = [0  −1  0] [−b  0  0] .
    [0   0  c] [ 0  0  1]   [0   0  1] [0   0  c]

Therefore, in N, A can be decomposed as

                     [0   a  0]     [0   a  0] [−1  0  0]
x = (x1, x2, x3) → x [−b  0  0] → x [−b  0  0] [ 0  1  0] = xA,
                     [0   0  c]     [0   0  c] [ 0  0  1]

where the first mapping is of the type in (1) while the second one is a
reflection with respect to the coordinate plane 〈〈e2, e3〉〉. See Fig. 3.33
(and Fig. 2.49).
[Fig. 3.32: x → (x2, x1, x3) (reflection in the plane x1 = x2) → (bx2, ax1, cx3) = xA.]
Note  A satisfies its characteristic polynomial, namely

(A − cI3)(A² − abI3) = A³ − cA² − abA + abcI3 = O3×3.

Remark  Other matrix representations for the case ab < 0
For simplicity, replace b by −b and then suppose b > 0 in A. In this case,
A satisfies (A − cI3)(A² + abI3) = O. By computation,

     [−ab   0   0 ]                  [0  0  0      ]
A² = [ 0   −ab  0 ]  ⇒  A² + abI3 =  [0  0  0      ] .
     [ 0    0   c²]                  [0  0  c² + ab]
[Fig. 3.33: the composite x → (x2, x1, x3) → (−bx2, ax1, cx3) = xA; 〈〈e3〉〉 is the invariant line.]
Since c² + ab > 0, r(A² + abI3) = 1. As we have experienced in the example
in Sec. 2.7.8, let us consider vectors x ≠ 0 satisfying

x(A² + abI3) = 0.

Such an x is not an eigenvector of A corresponding to c. Hence, x ∈
〈〈e1, e2〉〉 should hold. Actual computation shows that any vector x in
〈〈e1, e2〉〉 satisfies x(A² + abI3) = 0, and conversely. Choose an arbitrarily
fixed nonzero v ∈ 〈〈e1, e2〉〉. Then v and vA are linearly independent,
since

αv + βvA = 0
⇒ αvA + βvA² = αvA − abβv = 0
⇒ (α² + abβ²)v = 0
⇒ (since v ≠ 0 and ab > 0) α = β = 0.

Therefore, B = {v, vA, e3} forms a basis for R3. In B,

vA                  = 0·v + 1·vA + 0·e3,
(vA)A = vA² = −abv  = −ab·v + 0·vA + 0·e3,
e3A = ce3

             [0   a  0]        [ 0   1  0]            [ v]
⇒ [A]B = P   [−b  0  0] P⁻¹ =  [−ab  0  0] , where P = [vA] .
             [0   0  c]        [ 0   0  c]            [e3]
                                                                   (3.7.21)
This is the rational canonical form of A (for details, see Sec. 3.7.8). The
action of [A]B on [x]B is illustrated similarly as in Fig. 3.32 if a and −b are
replaced by 1 and −ab respectively, and e1 and e2 are replaced by v and
vA, which are not necessarily perpendicular to each other.
For another matrix representation, we adopt the method indicated in
Ex. <B> 5 of Sec. 2.7.2. In what follows, the readers are supposed to
have basic knowledge about complex numbers. A has complex eigenvalues
±√(ab) i. Let x = (x1, x2, x3) ∈ C³ be a corresponding eigenvector. Then

xA = √(ab) i x
⇔ (−bx2, ax1, cx3) = √(ab) i (x1, x2, x3)
⇔ bx2 = −√(ab) i x1,  ax1 = √(ab) i x2,
  cx3 = √(ab) i x3  (recall that c² + ab > 0)
⇔ x = t(√b i, √a, 0)  for t ∈ C.

Note that (√b i, √a, 0) = (0, √a, 0) + i(√b, 0, 0). Equating the real and
the imaginary parts of both sides of

((0, √a, 0) + i(√b, 0, 0))A = √(ab) i ((0, √a, 0) + i(√b, 0, 0))

⇒ (0, √a, 0)A = −√(ab)(√b, 0, 0),
  (√b, 0, 0)A = √(ab)(0, √a, 0).
Combined with e3A = ce3, we obtain the matrix representation of A in
the basis C = {(√b, 0, 0), (0, √a, 0), e3} = {√b e1, √a e2, e3}:

             [0   a  0]         [  0     √(ab)  0]
[A]C = Q     [−b  0  0] Q⁻¹ =   [−√(ab)   0     0]
             [0   0  c]         [  0      0     c]

             [ 0  1  0       ]             [√b   0  0]
     = √(ab) [−1  0  0       ] ,  where Q = [0   √a  0] .          (3.7.22)
             [ 0  0  c/√(ab) ]             [0    0  1]

[A]C can be decomposed as, in C,

            [ 0  1  0]        [ 0  1  0] [1  0  0       ]
[x]C → [x]C [−1  0  0] → [x]C [−1  0  0] [0  1  0       ]
            [ 0  0  1]        [ 0  0  1] [0  0  c/√(ab) ]

             [ 0  1  0       ]
→ [x]C √(ab) [−1  0  0       ] = [x]C [A]C = [xA]C,
             [ 0  0  c/√(ab) ]
where the first mapping is a rotation through 90° in the coordinate plane
〈〈√b e1, √a e2〉〉, the second one is a one-way stretch along 〈〈e3〉〉 and the
third one is an enlargement with scale √(ab). The readers are urged to
illustrate these mappings graphically.
Example 3  The operator (compare with Example 6 in Sec. 2.7.2)

    [a  0  0]
A = [b  c  0] ,  where abc ≠ 0,
    [0  0  1]

has the following properties:
Case 1  a ≠ c, c ≠ 1 and a ≠ 1

    eigenvalues       eigenvectors
    λ1 = a            v1 = e1
    λ2 = c            v2 = (b, c − a, 0)
    λ3 = 1            v3 = e3

In the basis B = {v1, v2, v3}, A has the representation

                   [a  0  0]            [v1]
[A]B = PAP⁻¹ =     [0  c  0] , where P = [v2] .
                   [0  0  1]            [v3]

Refer to Fig. 3.31 for the mapping properties of A in B.
Case 2  a ≠ c = 1

    eigenvalues       eigenvectors
    λ1 = a            v1 = e1
    λ2 = 1, 1         v2 = (b, 1 − a, 0) and v3 = e3

In the basis B = {v1, v2, v3},

           [a  0  0]        [a  0  0]            [v1]
[A]B = P   [b  1  0] P⁻¹ =  [0  1  0] , where P = [v2] .
           [0  0  1]        [0  0  1]            [v3]

Refer to Fig. 3.31.
Case 3  a = c = 1
1 is the only eigenvalue of A, with associated eigenvectors e1 and e3.

1. Since b ≠ 0,

        [1  0  0]
    A = [b  1  0]
        [0  0  1]

    is not diagonalizable.
2. The coordinate plane 〈〈e1, e3〉〉 is the plane (subspace) of invariant
   points of A.
3. Since x = (x1, x2, x3) → xA = (x1 + bx2, x2, x3), A moves a point x
   along a line parallel to the axis 〈〈e1〉〉 through a distance in the constant
   proportion b to its distance x2 from the plane 〈〈e1, e3〉〉, i.e.

   ((x1 + bx2) − x1)/x2 = b,

   to the point xA. This A is called a shearing along the plane 〈〈e1, e3〉〉
   with scale factor b. See Fig. 3.34.
4. Hence, every plane v + 〈〈e1, e3〉〉 parallel to 〈〈e1, e3〉〉 is an invariant
   plane.
[Fig. 3.34: the shearing x → xA = (x1 + bx2, x2, x3) along the plane 〈〈e1, e3〉〉.]
Case 4  a = c ≠ 1
a and 1 are eigenvalues of A, with respective eigenvectors e1 and e3.

1. A satisfies its characteristic polynomial −(t − a)²(t − 1), i.e.

   (A − aI3)²(A − I3) = O3×3.

2. Since (A − aI3)(A − I3) ≠ O3×3, A is not diagonalizable (refer to
   Ex. <C> 9 of Sec. 2.7.6 or Secs. 3.7.6 and 3.7.7).
3. Since

        [a  0  0]   [a  0  0] [ 1   0  0]   [a  0  0]   [0  0  0]
    A = [b  a  0] = [0  a  0] [b/a  1  0] = [0  a  0] + [b  0  0] ,
        [0  0  1]   [0  0  1] [ 0   0  1]   [0  0  1]   [0  0  0]

    A can be decomposed, in N, as an enlargement on the coordinate
    plane 〈〈e1, e2〉〉 with scale a followed by a shearing along 〈〈e1, e3〉〉
    with scale b/a. See Fig. 3.34.
Remark  Jordan canonical form of a matrix

Let

    [λ1  0   0 ]
B = [b   λ1  0 ] ,
    [a   c   λ2]

where b ≠ 0 and λ1 ≠ λ2. B has characteristic polynomial −(t − λ1)²(t − λ2)
and hence has eigenvalues λ1, λ1 and λ2. Since

                       [0  0     0   ] [λ1 − λ2  0        0]
(B − λ1I3)(B − λ2I3) = [b  0     0   ] [b        λ1 − λ2  0]
                       [a  c  λ2 − λ1] [a        c        0]

                       [0           0  0]
                     = [b(λ1 − λ2)  0  0] ≠ O3×3,
                       [bc          0  0]

B is not diagonalizable. As we have experienced in Sec. 2.7.7, consider

              [0                0           0         ]
(B − λ1I3)² = [0                0           0         ]
              [bc + a(λ2 − λ1)  c(λ2 − λ1)  (λ2 − λ1)²]

and take a vector v2 satisfying v2(B − λ1I3)² = 0 but v1 = v2(B − λ1I3) ≠ 0.
Say v2 = e2; then

v1 = v2(B − λ1I3) = (b, 0, 0) = b e1.

Solve  x B = λ2 x and take an eigenvector  v3 = (bc − a(λ1 − λ2 ), −c(λ1 −


λ2 ), (λ1 − λ2 ) ). Then C = { v1 , v2 , v3 } is a basis for R3 . Since
2   


v1 B = b
e1 B = bλ1 
e1 = λ1  v1 + 0 · 
v1 = λ1  v2 + 0 · 
v3 ,

v2 B =
e2 B = (b, λ1 , 0) = b v 1 + λ1 · 
e2 = 
e1 + λ1  v2 + 0 · 
v3 ,

v3 = 0 · 
= λ2 
v3 B v1 + 0 · 
v 2 + λ2 · 
v3
   
λ1 0 0 v1
−1  
⇒[B]C = QBQ = 1 λ1 0 , where Q = v2 .  
(3.7.23)

0 0 λ2 v3
This is called the Jordan canonical form of B (for details, see Sec. 3.7.7).
[B]C belongs to Case 4 in Example 3 by noticing that

          [λ1/λ2  0      0]
[B]C = λ2 [1/λ2   λ1/λ2  0] .                                      (3.7.24)
          [0      0      1]
Example 4  The linear operator

    [λ  0  0]
A = [b  λ  0] ,  where bc ≠ 0,
    [0  c  λ]

has the following properties:

1. A satisfies its characteristic polynomial −(t − λ)³, i.e.

   (A − λI3)³ = O3×3,

   but

             [0  0  0]                       [0   0  0]
   A − λI3 = [b  0  0] ≠ O3×3,  (A − λI3)² = [0   0  0] ≠ O3×3.
             [0  c  0]                       [bc  0  0]

2. Hence, A has eigenvalues λ, λ, λ with associated eigenvectors te1, t ∈ R,
   t ≠ 0, and A is not diagonalizable.
3. Notice that A can be written as the sum of the following linear operators:

   A = λI3 + (A − λI3)

       [ 1   0  0]   [0  0  0]
   = λ [b/λ  1  0] + [0  0  0] .
       [ 0   0  1]   [0  c  0]

   Note that 〈〈e1〉〉 is the only invariant line (subspace). See Fig. 3.35 for
   some geometric feeling.
[Fig. 3.35: the image of the unit cube under A with λ = 2, b = c = 1; 〈〈e1〉〉 is the invariant line.]
Remark  Jordan canonical form of a matrix

Let A be as in Example 4. Try to choose a nonzero vector v3 so that

v3(A − λI3)³ = 0,
v2 = v3(A − λI3) ≠ 0,  and
v1 = v2(A − λI3) = v3(A − λI3)² is an eigenvector of A.

Take v3 = e3; then v2 = (0, c, 0) = c e2 and v1 = (bc, 0, 0) = bc e1. Now
B = {v1, v2, v3} is a basis for R3. Thus

v1A = bc e1A = bc(λ, 0, 0) = λv1          = λ·v1 + 0·v2 + 0·v3,
v2A = c e2A = c(b, λ, 0) = bc e1 + λc e2  = 1·v1 + λ·v2 + 0·v3,
v3A = e3A = (0, c, λ) = c e2 + λ e3       = 0·v1 + 1·v2 + λ·v3

                   [λ  0  0]            [bc e1]
⇒ [A]B = PAP⁻¹ =   [1  λ  0] , where P = [c e2 ] .                 (3.7.25)
                   [0  1  λ]            [ e3  ]

[A]B is called the Jordan canonical form of A (for details, see Sec. 3.7.7).
See Fig. 3.35 for λ = 2.
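Jordan forms like (3.7.25) can be cross-checked symbolically. A sketch assuming
sympy, with the concrete values λ = 2, b = c = 1 of Fig. 3.35; sympy uses the
column-vector convention, so we pass the transpose A∗, which has the same
Jordan structure:

    import sympy as sp

    lam, b, c = 2, 1, 1
    A = sp.Matrix([[lam, 0, 0],
                   [b, lam, 0],
                   [0, c, lam]])

    Pcol, J = A.T.jordan_form()          # A^T = Pcol * J * Pcol**(-1)
    sp.pprint(J)                         # a single 3 x 3 Jordan block for 2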
Example 5  The linear operator (compare with Example 7 in Sec. 2.7.2)

    [0  1  0]
A = [b  a  0] ,  where bc ≠ 0,
    [0  0  c]
has the following properties: A has the characteristic polynomial

−(t² − at − b)(t − c).

Case 1  a² + 4b > 0

    eigenvalues                   eigenvectors
    λ1 = (a + √(a² + 4b))/2       v1 = (b, λ1, 0)
    λ2 = (a − √(a² + 4b))/2       v2 = (b, λ2, 0)
    λ3 = c                        v3 = e3

In the basis B = {v1, v2, v3}, A is diagonalizable as

                   [λ1  0   0 ]            [v1]
[A]B = PAP⁻¹ =     [0   λ2  0 ] , where P = [v2] .
                   [0   0   λ3]            [v3]

See Fig. 3.31.
Case 2  a² + 4b = 0 and a ≠ 2c

    eigenvalues       eigenvectors
    λ = a/2, a/2      v = (−a, 2, 0)
    λ3 = c            v3 = e3

1. Since

                  [−a/2  1    0      ]
    A − (a/2)I3 = [b     a/2  0      ] ,
                  [0     0    c − a/2]

                     [0  0  0         ]
    (A − (a/2)I3)² = [0  0  0         ] ,
                     [0  0  (c − a/2)²]
              [−c  1      0]
    A − cI3 = [b   a − c  0] ,  and
              [0   0      0]

                             [ac/2 + b    a/2 − c      0]
    (A − (a/2)I3)(A − cI3) = [b(a/2 − c)  −(ac/2 + b)  0] ≠ O3×3,
                             [0           0            0]

    thus A is not diagonalizable.
2. Now, B = {v, e1, e3} is a basis for R3. Since

    vA = λv,
    e1A = (0, 1, 0) = (1/2)v + λe1,
    e3A = ce3

               [0  1  0]        [λ    0  0]
    ⇒ [A]B = P [b  a  0] P⁻¹ =  [1/2  λ  0]
               [0  0  c]        [0    0  c]

         [λ/c     0    0]            [ v]
     = c [1/(2c)  λ/c  0] , where P = [e1] .
         [0       0    1]            [e3]
For the mapping properties of A in B, please refer to Example 4. Note that
〈〈v, e3〉〉 is an invariant plane (subspace) of A.
See Fig. 3.36 (compare with Fig. 2.54).
[Fig. 3.36: the action of A in the basis B = {v, e1, e3}: each plane (0, 0, x3) + 〈〈v, e1〉〉 is carried into the plane (0, 0, cx3) + 〈〈v, e1〉〉.]
Case 3  a² + 4b < 0
A has only one real eigenvalue c, with associated eigenvector e3. Hence
〈〈e3〉〉 is an invariant line (subspace) of A. Since

    [0  1  0] [b  0  0]   [0  0  0    ]
A = [1  0  0] [0  1  0] + [0  a  0    ] ,
    [0  0  1] [0  0  1]   [0  0  c − 1]

A is the sum of the following two linear operators.

1. The first one is the composite of the reflection with respect to the plane
   x1 = x2 (see Example 2) followed by a one-way stretch along 〈〈e1〉〉 with
   scale b (see Example 1).
2. The second one is a mapping of R3 onto the plane 〈〈e2, e3〉〉 if a ≠ 0,
   c ≠ 1; onto the axis 〈〈e2〉〉 if a ≠ 0 and c = 1; and onto the axis 〈〈e3〉〉
   if a = 0 and c ≠ 1 (see (3.7.34)).

See Fig. 3.37 (compare with Fig. 2.55).
[Fig. 3.37: the two summands: x → (x2, x1, x3) → (bx2, x1, x3), and x → (0, ax2, (c − 1)x3); together they give xA.]
Remark  The rational canonical form of a matrix

Let B3×3 be a linear operator on R3. If there exists a basis
B = {v1, v2, v3} so that the matrix representation is

                   [0  1  0]            [v1]
[B]B = PBP⁻¹ =     [b  a  0] , where P = [v2]  and a² + 4b < 0,
                   [0  0  c]            [v3]
                                                                   (3.7.26)

then [B]B is called the rational canonical form of B. For details, see
Sec. 3.7.8.
To sum up what we have obtained so far in this subsection, and to anticipate
what we will do in Secs. 3.7.6–3.7.8, let A3×3 be a nonzero real matrix
whose characteristic polynomial is

det(A − tI3) = −(t³ + b2t² + b1t + b0) = −(t − λ1)(t² + a1t + a0).

The Cayley–Hamilton theorem says that the characteristic polynomial
annihilates A, i.e.

A³ + b2A² + b1A + b0I3 = (A − λ1I3)(A² + a1A + a0I3) = O3×3.       (3.7.27)

For a direct proof of this matrix identity, see Ex. <A> 3; also refer to
(2.7.19), Ex. <C> 5 of Sec. 2.7.6 and Ex. 4 of Sec. B.10. Note that A is
invertible if and only if

−b0 = det A ≠ 0,  and then  A⁻¹ = −(1/b0)(A² + b2A + b1I3).        (3.7.28)
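Both (3.7.27) and (3.7.28) are easy to verify numerically. A sketch assuming
numpy; the matrix is hypothetical, and np.poly returns the coefficients of
det(tI3 − A) = t³ + b2t² + b1t + b0:

    import numpy as np

    A = np.array([[0., 1., 0.],
                  [2., 1., 0.],
                  [0., 0., 3.]])

    b2, b1, b0 = np.poly(A)[1:]
    CH = A @ A @ A + b2 * (A @ A) + b1 * A + b0 * np.eye(3)
    print(np.round(CH, 10))                        # O3x3: Cayley-Hamilton

    A_inv = -(A @ A + b2 * A + b1 * np.eye(3)) / b0   # (3.7.28); det A = -b0
    print(np.allclose(A_inv @ A, np.eye(3)))          # True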
The canonical forms of a nonzero matrix A3×3 under similarity

Case 1  a1² − 4a0 > 0
t² + a1t + a0 = (t − λ2)(t − λ3) and A has real eigenvalues λ1, λ2, λ3.

1. If λ1, λ2, λ3 are mutually distinct, A is diagonalizable and is similar to

   [λ1       ]
   [   λ2    ] .
   [      λ3 ]

2. If at least two of λ1, λ2, λ3 are equal, see Case 2.

Case 2  a1² − 4a0 = 0
t² + a1t + a0 = (t − λ2)² and A has real eigenvalues λ1, λ2, λ2.

1. If λ1 ≠ λ2 and (A − λ1I3)(A − λ2I3) = O3×3, A is diagonalizable and is
   similar to

   [λ1       ]
   [   λ2    ] .
   [      λ2 ]

2. If λ1 ≠ λ2 and (A − λ1I3)(A − λ2I3) ≠ O3×3, A is not diagonalizable and
   is similar to the Jordan canonical form or the rational canonical form

   [λ2  0   0 ]       [ 0    1    0 ]
   [1   λ2  0 ]  or   [−λ2²  2λ2  0 ] .
   [0   0   λ1]       [ 0    0    λ1]

3. If λ1 = λ2 = λ3 = λ and A − λI3 = O3×3, then A itself is λI3.
4. If λ1 = λ2 = λ3 = λ but A − λI3 ≠ O3×3 and (A − λI3)² = O3×3, then
   A is similar to the Jordan canonical form or the rational canonical form

   [λ  0  0]       [ 0   1   0]
   [1  λ  0]  or   [−λ²  2λ  0] .
   [0  0  λ]       [ 0   0   λ]

5. If λ1 = λ2 = λ3 = λ but A − λI3 ≠ O3×3, (A − λI3)² ≠ O3×3 and
   (A − λI3)³ = O3×3, then A is similar to the Jordan canonical form or
   the rational canonical form

   [λ  0  0]       [0    1     0 ]
   [1  λ  0]  or   [0    0     1 ] .
   [0  1  λ]       [λ³  −3λ²  3λ ]

Case 3  a1² − 4a0 < 0
A is similar to the rational canonical form

   [ 0    1    0 ]
   [−a0  −a1   0 ] .                                               (3.7.29)
   [ 0    0    λ1]
Exercises
<A>

1. For each of the following matrices A (linear operators), do the following
   problems.
   (1) Determine the rank r(A).
   (2) Compute the characteristic polynomial det(A − tI3) and the real
       eigenvalues and the associated eigenvectors.
   (3) Justify the Cayley–Hamilton theorem and use it to compute A⁻¹ if
       A is invertible (see (3.7.27) and (3.7.28)).
   (4) Determine invariant lines or/and planes (subspaces and affine
       subspaces); also, lines or planes of invariant points, if any.
   (5) Illustrate graphically the essential mapping properties of A in
       N = {e1, e2, e3}.
   (6) Try to find a basis B = {v1, v2, v3}, if possible, so that [A]B is in its
       canonical form (see (3.7.29)).
   (7) Redo (5) in B.
   (a) A is one of

       [0  0  0]   [0  0  a]      [0  0  0]
       [0  a  0] , [0  0  0]  or  [0  0  0] ,  where a ≠ 0.
       [0  0  0]   [0  0  0]      [0  a  0]

   (b) A is one of

       [0  a  0]   [0  0  a]      [0  0  a]
       [0  0  0] , [0  0  0]  or  [0  b  0] ,  where ab ≠ 0.
       [b  0  0]   [0  0  b]      [0  0  0]

   (c) A is one of

       [0  a  0]   [0  0  0]   [a  0  0]      [0  0  0]
       [b  0  c] , [a  0  0] , [0  0  0]  or  [0  a  b] ,
       [0  0  0]   [b  c  0]   [0  b  c]      [c  0  0]

       where abc ≠ 0.
   (d) A is one of

       [0  0  a]   [0  0  a]      [0  a  0]
       [b  0  0] , [0  b  0]  or  [b  0  0] ,  where abc ≠ 0.
       [0  c  0]   [c  0  0]      [0  0  c]

       Note that one can assign numerical values to a, b and c so that
       the characteristic polynomial can be handled more easily.
   (e) A is one of

       [ 0  −1  −1]   [−3   3  −2]      [ 0  1  −1]
       [−3  −1  −2] , [−7   6  −3]  or  [−4  4  −2] .
       [ 7   5   6]   [ 1  −1   2]      [−2  1   1]

           [0    1   0]
   (f) A = [0    0   1]  into Jordan canonical form.
           [8  −12   6]

           [−2   1   0]
   (g) A = [ 0  −2   1]  into Jordan and rational canonical forms.
           [ 0   0  −2]
2. Recall that e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1). Let x1 =
   (0, 1, 1), x2 = (1, 0, 1), x3 = (1, 1, 0) and x4 = (1, 1, 1). These seven
   points, together with the origin 0, form the vertices of a unit cube. See
   Fig. 3.38. This cube can be divided into six tetrahedra of equal volumes,
   three of which are shown in the figure. Any such tetrahedron is called a
   part of the cube. A similar situation happens to a parallelepiped.
[Fig. 3.38: the unit cube with vertices 0, e1, e2, e3, x1, x2, x3, x4.]
   (a) Find a linear transformation mapping the tetrahedron ∆0e1e2e3
       one-to-one and onto the tetrahedron ∆0e1x3x4.
       (1) Determine the other vertices of the parallelepiped that contains
           ∆0e1x3x4 as part of it. Find the volume of it.
       (2) How many such linear transformations are there? Write them
           out in matrix form. Are there any connections among them,
           say, via permutation matrices (see (2.7.67))?
       (3) If we replace “onto” by “into the tetrahedron ∆0e1x3x4” so
           that the range spaces contain a vertex, an edge or a face of it,
           find the corresponding linear transformations. Not just one only,
           but many of them!
   (b) Do the same questions as in (a), but onto and into the tetrahedron
       ∆0e1e3x4.
   (c) Do the same questions as in (a), but onto and into the tetrahedron
       ∆e1x2e3x4.
       (Note  Here, “linear” should be replaced by “affine”. By an affine
       transformation, we mean a mapping of the form

       T(x) = x0 + xA,   x0, x ∈ R3,

       where A is a linear operator (refer to (2.8.20) and Sec. 3.8).)
   (d) Do the same questions as in (a), but from ∆e1x3e2x4 onto or into
       ∆x1e2e3x4.
3. Prove the Cayley–Hamilton theorem (3.7.27) by the following methods.
   (a) Use the canonical forms in (3.7.29).
   (b) A3×3 has at least one real eigenvalue λ1, so that there exists a basis
       B = {v1, v2, v3} for R3 in which

                       [λ1   0    0  ]
       [A]B = PAP⁻¹ =  [b21  b22  b23] .
                       [b31  b32  b33]
   (c) Try to simplify for A3×3 what Exs. <B> 5 and <C> 5 of
       Sec. 2.7.6 say.
   (d) Try to model after (2.7.18) for A3×3. Is this a beneficial way to prove
       this theorem?
   (e) Let ϕ(λ) = det(A − λI3), the characteristic polynomial of A. Try the
       following steps.
       (1) Take λ so that ϕ(λ) ≠ 0. Then (see (3.3.2) or Sec. B.6)

           (A − λI3)⁻¹ = (1/ϕ(λ)) adj(A − λI3).

       (2) Each entry of the adjoint matrix adj(A − λI3) is a polynomial
           in λ of degree less than 3. Hence, there exist constant matrices
           B2, B1, B0 ∈ M(3; R) such that

           adj(A − λI3) = B2λ² + B1λ + B0.

       (3) Multiply both sides of adj(A − λI3) · (A − λI3) = ϕ(λ)I3 out and
           equate the corresponding coefficient matrices of λ³, λ², λ and λ⁰
           to obtain

           −B2 = −I3,
           B2A − B1 = −b2I3,
           B1A − B0 = −b1I3,
           B0A = −b0I3.

           Then try to eliminate B2, B1, B0 from the right-hand sides.

   (Note  Methods in (c) and (e) are still valid for n × n matrices over a
   field F.)
4. Let A = [aij]3×3.
   (a) Show that the characteristic polynomial

       det(A − tI3) = −t³ + tr(A)t² + b1t + det A.

       Try to use the aij, 1 ≤ i, j ≤ 3, to express the coefficient b1.
   (b) Prove (3.7.28).
   (c) Let

           [ −9  4  4]
       A = [ −8  3  4] .
           [−16  8  7]

       (1) Try to use (b) to compute A⁻¹.
       (2) Show that A is diagonalizable and find an invertible matrix P
           such that PAP⁻¹ is a diagonal matrix.
       (3) Compute Aⁿ for n = ±2, ±3.
       (4) Compute A⁵ − 5A³ − A² + 23A + 17I3.
<B>

1. Try to prove (3.7.29). For details, refer to Secs. 3.7.6–3.7.8.
2. (Refer to Ex. <B> 1 of Sec. 2.7.2, Ex. 7 of Sec. B.4 and Ex. 5 of
   Sec. B.12.) Let A ∈ M(3; R) be a nilpotent matrix of index 2, namely
   A ≠ O3×3 but A² = O.
   (a) Show that A is not one-to-one, i.e. not invertible.
   (b) If λ is an eigenvalue (it could be a complex number) of A, then it is
       necessarily that λ = 0.
   (c) What is the characteristic polynomial of A? What does its canonical
       form look like?
   (d) Show that

       A ≠ O, A² = O ⇔ Im(A) ⊆ Ker(A).

       Use this to determine all such A. For example, Ker(A) = {0} or R3
       is impossible if A ≠ O, and hence Im(A) = {0} or R3 is impossible
       too. Consider the following cases.
       (1) dim Ker(A) = 2 and Ker(A) = 〈〈x1, x2〉〉. Choose x3 so that
           {x1, x2, x3} is a basis for R3. Then x3A = α1x1 + α2x2 and

                    [0   0   0]            [x1]
           PAP⁻¹ =  [0   0   0] , where P = [x2] .
                    [α1  α2  0]            [x3]

           Is Im(A) = Ker(A) = 〈〈x1, x2〉〉 a possibility?
       (2) dim Ker(A) = 1 will eventually lead to

                    [0   0  0]
           PAP⁻¹ =  [α1  0  0] .
                    [α2  0  0]
3. Let A ∈ M(3; R) be a nilpotent matrix of index 3, i.e. A ≠ O3×3, A² ≠ O
   but A³ = O.
   (a) Show that all the eigenvalues of A are zeros. What is its canonical
       form?
   (b) Show that tr(Aᵏ) = 0 for k = 1, 2, 3, 4, . . .
   (c) Show that

       (1) A ≠ O, A² ≠ O but A³ = O.
       ⇔ (2) 1 = dim Ker(A) < 2 = dim Ker(A²) < 3 = dim Ker(A³).
       ⇔ (3) 2 = r(A) > 1 = r(A²) > 0 = r(A³).

       Hence, show that there exists a vector x ∈ R3 so that xA² ≠ 0, and
       that B = {x, xA, xA²} forms a basis for R3. What is [A]B? See
       Ex. <C> 3 of Sec. 3.7.7 and Ex. 5 of Sec. B.12 for the general
       setting.
4. Does there exist a matrix A ∈ M(3; R) such that Aᵏ ≠ O for k = 1, 2, 3
   but A⁴ = O? If yes, give explicitly such an A; if not, give a precise reason
   (or proof).
5. (Refer to Ex. <B> 2 of Sec. 2.7.2, Ex. 6 of Sec. B.4 and Ex. 7 of Sec. B.7.)
   Let A ∈ M(3; R) be idempotent, i.e. A² = A.
   (a) Show that each eigenvalue of A is either 0 or 1.
   (b) Guess what the possible characteristic polynomials and canonical
       forms for such A are. Can you justify the truth or falsity of your
       statements?
   (c) Show that

       A² = A
       ⇔ Each nonzero vector in Im(A) is an eigenvector of A associated
         to the eigenvalue 1.

       Try to use this to determine all such A up to similarity.
6. (Refer to Ex. <B> 3 of Sec. 2.7.2, Ex. 9 of Sec. B.4 and Ex. 8 of
   Sec. B.7.) Let A ∈ M(3; R) be involutory, i.e. A² = I3.
   (a) Show that each eigenvalue of A is either 1 or −1.
   (b) Consider the following cases:

       (1) A − I3 = O3×3.
       (2) A + I3 = O3×3.
       (3) A − I3 ≠ O3×3, A + I3 ≠ O3×3 but (A − I3)(A + I3) = O.

       Write out the canonical form of A for each case. Prove them!
   (c) Show that

       Ker(A − I3) = Im(A + I3),
       Ker(A + I3) = Im(A − I3),  and
       R3 = Ker(A − I3) ⊕ Ker(A + I3).

       Thus, determine the canonical forms for all such A.
7. (Refer to Ex. <B> 4 of Sec. 2.7.2 and Ex. 9 of Sec. B.7.) Try to show
   that there does not exist any real 3 × 3 matrix A such that

   A² = −I3.

8. (Refer to Ex. <B> 5 of Sec. 2.7.2.) Let A3×3 be a real matrix
   which has only one real eigenvalue λ. Show that there exists a basis
   B = {x1, x2, x3} such that

                   [λ   0    0 ]            [x1]
   [A]B = PAP⁻¹ =  [0   λ1   λ2] , where P = [x2]  and λ2 ≠ 0.
                   [0  −λ2   λ1]            [x3]

   In particular, A has an invariant subspace 〈〈x2, x3〉〉 of dimension two.
<C> Abstraction and generalization

Refer to Ex. <C> of Sec. 2.7.2.

3.7.3 Matrix representations of a linear operator in various bases

The process adopted, the definitions concerned and all the results obtained
in Sec. 2.7.3 can be generalized verbatim to linear operators on R^3. All we
need to do is to change trivially from R^2 to R^3. For example:
    R^2                                      R^3

    f: R^2 → R^2 is a linear operator.       f: R^3 → R^3 is a linear operator.
    B = {a1, a2} is a basis for R^2.         B = {a1, a2, a3} is a basis for R^3.
    [f]^B_C = [ a11 a12; a21 a22 ], etc.     [f]^B_C = [ a11 a12 a13; a21 a22 a23; a31 a32 a33 ]
                                                     = [ [f(a1)]_C; [f(a2)]_C; [f(a3)]_C ], etc.
    I_2 = [ 1 0; 0 1 ]                       I_3 = [ 1 0 0; 0 1 0; 0 0 1 ]
    A = Σ_{i,j=1}^{2} a_ij E_ij              A = Σ_{i,j=1}^{3} a_ij E_ij
    Hom(R^2, R^2) or L(R^2, R^2)             Hom(R^3, R^3) or L(R^3, R^3)
    M(2; R)                                  M(3; R)
    1_{R^2}                                  1_{R^3}
    GL(2; R)                                 GL(3; R)
    O_{2×2}                                  O_{3×3}
    A_{2×2}, B_{2×2}, etc.                   A_{3×3}, B_{3×3}, etc.
We will feel no hesitation in using these converted results for R^3. Recall that
f ∈ Hom(R^3, R^3) is called diagonalizable if there exists a basis B for R^3
so that [f]_B is a diagonal matrix. In this case, [f]_C is similar to a diagonal
matrix for any basis C for R^3, and vice versa.

We list the following results for reference.

The invariance of a square matrix or a linear operator under similarity

Let A_{3×3} (or A_{n×n}) be a real matrix. Then the following are invariants
under similarity:

1. The determinant: det(PAP^{-1}) = det(A).
2. The characteristic polynomial: det(PAP^{-1} − tI_3) = det(A − tI_3), and
   hence the set of eigenvalues.
3. The trace: tr(PAP^{-1}) = tr A.
4. The rank: r(PAP^{-1}) = r(A).

Hence, for a linear operator f: R^3 → R^3 (or f: R^n → R^n) and any fixed
basis B for R^3, the following are well-defined:

1. det f = det[f]_B.
2. det(f − t1_{R^3}) = det([f]_B − tI_3).
3. tr f = tr[f]_B.
4. r(f) = r([f]_B).                                            (3.7.30)
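The invariants in (3.7.30) are easy to test numerically. The following is a minimal
sketch in Python with NumPy (an editorial illustration, not part of the original
text; the integer matrices are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, (3, 3)).astype(float)
    P = rng.integers(-3, 4, (3, 3)).astype(float)
    while abs(np.linalg.det(P)) < 1e-9:              # insist that P be invertible
        P = rng.integers(-3, 4, (3, 3)).astype(float)
    B = P @ A @ np.linalg.inv(P)                     # B is similar to A

    print(np.isclose(np.linalg.det(B), np.linalg.det(A)))       # 1. determinant
    print(np.allclose(np.poly(B), np.poly(A)))                   # 2. characteristic polynomial
    print(np.isclose(np.trace(B), np.trace(A)))                  # 3. trace
    print(np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A))  # 4. rank

All four tests print True, in agreement with (3.7.30).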

In (2.7.43), the following change

r(A) + r(B) − 3 ≤ r(AB) ≤ min{r(A), r(B)} (3.7.31)

is needed for A3×3 and B3×3 . For general case, see Ex. <C> in Sec. 2.7.3.
For A ∈ M(m, n; R) or A ∈ M(m, n; F) where m, n = 1, 2, 3, . . . , as
before in (2.7.46) and (2.7.47), let the

row space: Im(A) or R(A) = {xA | x ∈ Rm },



left kernel: Ker(A) or N (A) = { x ∈ Rm | 

x A = 0 },
column space: Im(A∗ ) or R(A∗ ) = {x A∗ | 
x ∈ Rn },

right kernel: Ker(A∗ ) or N (A∗ ) x A∗ = 0 }.
= { x ∈ Rn | 

(3.7.32)

Then, we have (refer to Sec. 3.7.1 and, in particular, to Ex. <C> there)
408 The Three-Dimensional Real Vector Space R3

The equalities of three ranks of a matrix Am×n :


(1) If A = Om×n , r(A) = 0.
(2) If A = Om×n , then
the row rank of A (i.e. dim Im(A))
= the column rank of A (i.e. dim Im(A∗ ))
= the algebraic rank of A (how to define it?)
= r(A) (called the rank of A).
Notice that 1 ≤ r(A) ≤ min{m, n}.
By the way, in case Pm×m and Qn×n are invertible matrices, then
r(PA) = r(AQ) = r(A),
i.e. operations on A by invertible matrices preserve the rank of A.
(3.7.33)
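As a quick numerical illustration of (3.7.33) (again a hedged Python/NumPy
sketch with arbitrarily chosen matrices, not part of the original text), one can
check that the row rank and the column rank agree and that invertible factors
preserve the rank:

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [2., 4., 1.]])                 # a 2 x 3 matrix with r(A) = 2

    print(np.linalg.matrix_rank(A))              # row rank, dim Im(A)
    print(np.linalg.matrix_rank(A.T))            # column rank, dim Im(A*): the same

    P = np.array([[1., 1.], [0., 1.]])           # invertible 2 x 2
    Q = np.array([[1., 0., 2.],
                  [0., 1., 0.],
                  [0., 0., 1.]])                 # invertible 3 x 3
    print(np.linalg.matrix_rank(P @ A),          # r(PA) = r(A)
          np.linalg.matrix_rank(A @ Q))          # r(AQ) = r(A)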
We end this subsection with the following

Example Let the linear operator f on R^3 be defined as

    f(x) = x A,  where A = [ −1  0   1
                              0  3  −2
                             −2  3   0 ]

in N = {e1, e2, e3}. Let B = {a1, a2, a3} where a1 = (1, 1, 0),
a2 = (1, 0, 1), a3 = (0, 1, 1), and C = {b1, b2, b3} where b1 = (−1, −1, 1),
b2 = (−1, 1, −1), b3 = (1, −1, −1). Do the same questions as in Ex. <A> 1
of Sec. 2.7.3.

Solution Denote by A_{i*} the ith row vector of A for i = 1, 2, 3 and by A_{*j}
the jth column vector for j = 1, 2, 3.

Since A_{3*} = 2A_{1*} + A_{2*} (or A_{*3} = −A_{*1} − (2/3)A_{*2}), the rank r(A) = 2.
By direct computation,

    det[a1; a2; a3] = det[1 1 0; 1 0 1; 0 1 1] = −2;
    det[b1; b2; b3] = det[−1 −1 1; −1 1 −1; 1 −1 −1] = 4;

therefore B and C are bases for R^3.
(a) By using (3.3.5),

    A^C_B = [b1; b2; b3][a1; a2; a3]^{-1}
          = [−1 −1 1; −1 1 −1; 1 −1 −1] · (−1/2)[−1 −1 1; −1 1 −1; 1 −1 −1]
          = −(1/2)[3 −1 −1; −1 3 −1; −1 −1 3],

    A^B_C = [a1; a2; a3][b1; b2; b3]^{-1}
          = [1 1 0; 1 0 1; 0 1 1] · (−1/2)[1 1 0; 1 0 1; 0 1 1]
          = −(1/2)[2 1 1; 1 2 1; 1 1 2].

It happens that

    [a1; a2; a3]^{-1} = −(1/2)[b1; b2; b3]  and hence  [b1; b2; b3]^{-1} = −(1/2)[a1; a2; a3].

By the above matrix relations or by actual computation, we have

    A^C_B A^B_C = I_3.

Similarly,

    A^C_N = [b1; b2; b3][e1; e2; e3]^{-1} = [b1; b2; b3] I_3 = [−1 −1 1; −1 1 −1; 1 −1 −1],
    A^N_B = [e1; e2; e3][a1; a2; a3]^{-1} = I_3 [a1; a2; a3]^{-1} = −(1/2)[−1 −1 1; −1 1 −1; 1 −1 −1]

and hence A^C_B = A^C_N A^N_B holds.
(b) Note that [f]_N = A as given above. Now, by (3.3.4) and (3.3.5),

    [f]_B = [f]^B_B = [ [f(a1)]_B; [f(a2)]_B; [f(a3)]_B ] = [ [a1 A]_B; [a2 A]_B; [a3 A]_B ]
          = [a1; a2; a3] A [a1; a2; a3]^{-1} = A^B_N [f]^N_N A^N_B
          = [1 1 0; 1 0 1; 0 1 1][−1 0 1; 0 3 −2; −2 3 0] · (−1/2)[−1 −1 1; −1 1 −1; 1 −1 −1]
          = −(1/2)[−3 5 −3; 1 5 −7; −6 10 −6].

Similarly,

    [f]_C = [f]^C_C = A^C_N [f]^N_N A^N_C
          = [−1 −1 1; −1 1 −1; 1 −1 −1][−1 0 1; 0 3 −2; −2 3 0] · (−1/2)[1 1 0; 1 0 1; 0 1 1]
          = −(1/2)[−1 0 1; 3 0 −3; −5 4 −3].

Direct computation shows that both [f]_B and [f]_C are of rank 2. By what
we obtained for [f]_B and [f]_C,

    [f]_N = [f]^N_N = A^N_B [f]_B A^B_N = A^N_C [f]_C A^C_N
    ⇒ [f]_C = A^C_N A^N_B [f]_B A^B_N A^N_C = P[f]_B P^{-1},

where

    P = A^C_N A^N_B = A^C_B

as shown above.
(c) By definition, (3.3.4) and (3.3.5),

    [f]^N_B = [ [f(e1)]_B; [f(e2)]_B; [f(e3)]_B ] = [ [A_{1*}]_B; [A_{2*}]_B; [A_{3*}]_B ]
            = A [a1; a2; a3]^{-1}
            = [−1 0 1; 0 3 −2; −2 3 0] · (−1/2)[−1 −1 1; −1 1 −1; 1 −1 −1]
            = −(1/2)[2 0 −2; −5 5 −1; −1 5 −5];

    [f]^N_C = A [b1; b2; b3]^{-1}
            = [−1 0 1; 0 3 −2; −2 3 0] · (−1/2)[1 1 0; 1 0 1; 0 1 1]
            = −(1/2)[−1 0 1; 3 −2 1; 1 −2 3].

Note that both [f]^N_B and [f]^N_C have rank equal to 2. Direct computation
shows that

    [f]^N_B = [f]^N_C A^C_B.
(d) Just like (c),

    [f]^B_C = [ [f(a1)]_C; [f(a2)]_C; [f(a3)]_C ] = [ [a1 A]_C; [a2 A]_C; [a3 A]_C ]
            = [a1; a2; a3] A [b1; b2; b3]^{-1} = A^B_N [f]^N_N A^N_C
            = [1 1 0; 1 0 1; 0 1 1][−1 0 1; 0 3 −2; −2 3 0] · (−1/2)[1 1 0; 1 0 1; 0 1 1]
            = −(1/2)[2 −2 2; 0 −2 4; 4 −4 4]
            = [−1 1 −1; 0 1 −2; −2 2 −2];

    [f]^C_B = A^C_N [f]^N_N A^N_B
            = [−1 −1 1; −1 1 −1; 1 −1 −1][−1 0 1; 0 3 −2; −2 3 0] · (−1/2)[−1 −1 1; −1 1 −1; 1 −1 −1]
            = −(1/2)[2 0 −2; −6 0 6; 8 −10 4]
            = [−1 0 1; 3 0 −3; −4 5 −2].

Note that both [f]^B_C and [f]^C_B have rank equal to 2. Also

    [f]^N_N = A^N_B [f]^B_C A^C_N = A^N_C [f]^C_B A^B_N
    ⇒ [f]^C_B = (A^C_N A^N_B)[f]^B_C (A^C_N A^N_B) = A^C_B [f]^B_C A^C_B.

(e) This is contained in (d).
(f) By direct computation or by using the known results in (c) and (d), it
follows that

    [f]^B_C = A^B_N [f]^N_N A^N_C = A^B_N [f]^N_C,

where, as in (c), [f]^N_C = [f]^N_N A^N_C holds. Similarly, [f]^B_C = [f]^B_N A^N_C.  □
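The computations of this Example are easily verified by machine. Below is a
minimal Python/NumPy check (an editorial illustration; Ma and Mb denote the
matrices with rows a1, a2, a3 and b1, b2, b3 respectively):

    import numpy as np

    A  = np.array([[-1., 0., 1.], [0., 3., -2.], [-2., 3., 0.]])
    Ma = np.array([[1., 1., 0.], [1., 0., 1.], [0., 1., 1.]])
    Mb = np.array([[-1., -1., 1.], [-1., 1., -1.], [1., -1., -1.]])

    ACB = Mb @ np.linalg.inv(Ma)                 # the transition matrix A^C_B
    ABC = Ma @ np.linalg.inv(Mb)                 # the transition matrix A^B_C
    print(np.allclose(ACB @ ABC, np.eye(3)))     # A^C_B A^B_C = I_3

    fB = Ma @ A @ np.linalg.inv(Ma)              # [f]_B as computed in (b)
    fC = Mb @ A @ np.linalg.inv(Mb)              # [f]_C
    print(np.allclose(fC, ACB @ fB @ np.linalg.inv(ACB)))    # [f]_C = P [f]_B P^{-1}
    print(np.linalg.matrix_rank(fB), np.linalg.matrix_rank(fC))   # both equal 2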
Exercises

<A>

1. In N = {e1, e2, e3}, let

       f(x) = x A,  where A = [ 1   1  −1
                                1  −1   0 .
                                2   0   1 ]

   Let B = {a1, a2, a3} where a1 = (−1, 2, 1), a2 = (2, 4, 5), a3 = (1, 2, −3),
   and C = {b1, b2, b3} where b1 = (1, 0, 1), b2 = (1, 2, 4), b3 = (2, 2, −1).
   Model after the example and do the same questions as in Ex. <A> 1 of
   Sec. 2.7.3.
2. Find a linear operator f on R^3 and a basis B for R^3 so that

       [f(x)]_B = [x]_N [ −1  1   1
                           0  2   5 ,   x ∈ R^3.
                          −2  4  −3 ]

   Describe all possible such f and B.
3. Let

       A = [ 1  0  0      and  B = [ 1  0  0
             1  1  0                  1  1  0 .
             0  0  1 ]                0  1  1 ]

   (a) Do there exist a linear operator f on R^3 and two bases B and C for
       R^3 so that [f]_B = A and [f]_C = B? Give a precise reason.
   (b) Show that there exist invertible matrices P and Q so that
       PAQ = B.
4. Find nonzero matrices A3×3 and B3×3 so that AB has each possible
rank. Show that
r(AB) = 3 ⇔ r(A) = r(B) = 3.
5. Generalize Exs. <A> 10 through 21 of Sec. 2.7.3 to R^3 or real 3 × 3 matri-
   ces and prove them. For example, in Ex. 20, the orthogonal complement
   of a subspace S in R^3 is now defined as

       S^⊥ = {x ∈ R^3 | x y* = 0 for each y ∈ S}.
For a nonzero 3 × 3 matrix A, then
(1) Im(A)⊥ = Ker(A∗ ).
(2) Ker(A∗ )⊥ = Im(A), and R3 = Ker(A∗ ) ⊕ Im(A).
(3) Im(A∗ )⊥ = Ker(A).
(4) Ker(A)⊥ = Im(A∗ ), and R3 = Ker(A) ⊕ Im(A∗ ).
   For each of the following matrices A:

       [  2   5   3        [ −1  −2   3         [  1   0  −2
         −6   1  −2 ,         3   6  −9   and      4   6  −3 ,
          2  21  10 ]         2   4  −6 ]         −5  −1   2 ]

   find a basis for each of the subspaces Im(A), Ker(A), Im(A*) and
   Ker(A*), and justify the above relations (1)–(4).
6. Extend (2.7.23), (2.7.28), (2.7.31), (2.7.32) and (2.7.35) to R^3 and prove
   all of them.
7. Prove (3.7.30).
8. Prove (3.7.31).
9. Prove (3.7.33).
<B>

Read back over Sec. 2.7.3 carefully and extend all possible definitions and results
there to linear transformations from R^m to R^n where m, n = 1, 2, 3; in par-
ticular, the cases m = n. Please refer to Secs. B.4, B.6 and B.7 if necessary.
We will be free to use them in what follows if needed.
1. Let N = {1} be the natural basis for R and N' = {e1, e2, e3} that for R^3.

   (a) A mapping f: R → R^3 is a linear transformation if and only if there
       exists a vector a = (a1, a2, a3) in R^3 such that

           f(x) = x a   for x ∈ R,

       namely, [f(x)]_{N'} = [x]_N [f]^N_{N'}, in which

           [f]^N_{N'} = [f(1)]_{N'}  (a 1 × 3 matrix),  where [f(1)]_{N'} = f(1) = a,

       is the matrix representation of f with respect to N and N'.
   (b) Does there exist a basis B for R so that [f]^B_{N'} = b ∈ R^3 but b is
       linearly independent of a? Notice that

           [f]^B_{N'} = [1_R]^B_N [f]^N_{N'},

       where 1_R: R → R is the identity transformation and hence [1_R]^B_N is
       the transition matrix from the basis B to the basis N.
   (c) Do there exist a basis B for R and a basis B' for R^3 so that [f]^B_{B'} = b,
       where b ∈ R^3 is a preassigned vector? If yes, prove it; if no, under
       what conditions will it become true? Notice that

           [f]^B_{B'} = [1_R]^B_N [f]^N_{N'} [1_{R^3}]^{N'}_{B'}.

   (d) For any fixed straight line x = x0 + t v, t ∈ R, in R^3, show
       that there exists an affine transformation mapping R one-to-one
       and onto that line. How many such transformations are there?
2. Let N = {e1, e2} be the natural basis for R^2 and N' = {e1, e2, e3}
   the natural basis for R^3.

   (a) A mapping f: R^2 → R^3 is a linear transformation if and only if
       there exists a real 2 × 3 matrix A = [a_ij] so that

           f(x) = x A,   x ∈ R^2,

       i.e.

           [f(x)]_{N'} = [x]_N A,   where A = [f]^N_{N'} = [ [f(e1)]_{N'}
                                                             [f(e2)]_{N'} ].
   (b) Given a fixed real matrix B_{2×3}, do there exist a basis B for R^2 and
       a basis B' for R^3 so that [f]^B_{B'} = B holds? Notice that [f]^B_{B'} =
       [1_{R^2}]^B_N [f]^N_{N'} [1_{R^3}]^{N'}_{B'}.
   (c) Show that f can be one-to-one but never onto R^3.
   (d) Show that there are infinitely many affine transformations

           T(x) = y0 + x A,   y0 ∈ R^3 fixed and x ∈ R^2,

       where A_{2×3} is of rank 2, mapping R^2 one-to-one and onto any
       preassigned two-dimensional plane in R^3.
   (e) Show that any affine transformation from R^2 into R^3 preserves
       the relative positions of two straight lines in R^2 (see (2.5.9)).
   (f) Let T(x) = y0 + x A be an affine transformation from R^2 into R^3
       (see (d)).
       (1) Show that the image of the unit square with vertices 0, e1,
           e1 + e2 and e2 under T is a parallelogram (see Fig. 3.39).
           Compute the planar area of this parallelogram.
       (2) The image of a triangle Δa1 a2 a3 in R^2 under T is a triangle
           ΔT(a1)T(a2)T(a3) (see Fig. 3.39). Compute

               (the area of ΔT(a1)T(a2)T(a3)) / (the area of Δa1 a2 a3).

       (Fig. 3.39: the images under T of the unit square and of a triangle
       Δa1 a2 a3 in R^2 are a parallelogram and a triangle in R^3.)

       (3) What is the image of the unit circle x1^2 + x2^2 = 1 in R^2 under
           T? How about its area? Refer to Fig. 2.66.
3. The system of three non-homogeneous linear equations in two
   unknowns x1 and x2,

       Σ_{i=1}^{2} a_ij x_i = b_j,   j = 1, 2, 3,

   can be rewritten as

       x A = b,   where A = [ a11  a12  a13
                              a21  a22  a23 ],

       b = (b1, b2, b3) and x = (x1, x2) ∈ R^2.
   (a) Prove that the following are equivalent.
       (1) x A = b has a solution.
       (2) (linearly algebraic) The vector b lies in the range space
           of A, i.e.

               b ∈ Im(A).

       (3) (geometric) The three lines

               l1: a11 x1 + a21 x2 = b1,
               l2: a12 x1 + a22 x2 = b2,
               l3: a13 x1 + a23 x2 = b3

           in R^2 are either all coincident, two coincident and one inter-
           secting, or three intersecting at a single point, but never with any
           two of them intersecting at different points. See Fig. 3.40 (refer to
           Fig. 2.24).
       (4) (algebraic)

           Fig. 3.40(a) ⇔ a11/a1i = a21/a2i = b1/bi for i = 2, 3
                          (infinitely many solutions).
           Fig. 3.40(b) ⇔ a11/a12 = a21/a22 = b1/b2 but a11/a13 ≠ a21/a23
                          (a unique solution).
           (Fig. 3.40: the six possible configurations (a)–(f) of the three lines
           l1, l2, l3 in R^2, from all three coincident in (a) to configurations
           with no common point in (d)–(f).)
           Fig. 3.40(c) ⇔ a11/a12 ≠ a21/a22, a12/a13 ≠ a22/a23, a13/a11 ≠ a23/a21 but

                              Δ = det [ a11  a21  b1
                                        a12  a22  b2   = 0   (a unique solution).
                                        a13  a23  b3 ]

           Fig. 3.40(d) ⇔ a11/a12 ≠ a21/a22, a12/a13 ≠ a22/a23, a13/a11 ≠ a23/a21 but
                          Δ ≠ 0 (no solution).
           Fig. 3.40(e) ⇔ a11/a12 = a21/a22 ≠ b1/b2 but a13/a11 ≠ a23/a21 and
                          Δ ≠ 0 (no solution).
           Fig. 3.40(f) ⇔ a11/a12 = a21/a22 ≠ b1/b2, a12/a13 = a22/a23 ≠ b2/b3,
                          a13/a11 = a23/a21 ≠ b3/b1 but Δ = 0 (no solution).

           What happens if l1 = l2 ∥ l3?

       (5) (linearly algebraic) The rank of the 3 × 3 augmented matrix,
           with b adjoined to A as a third row, is equal to that of the
           coefficient matrix A, i.e.

               r [ A  = r(A).
                   b ]
   (b) In case x A = b has a solution, where A ≠ O_{2×3}. Then

       (1) x A = b has a unique solution.
       ⇔ (2) x A = 0 has only one solution x = 0, i.e. Ker(A) = {0}.
       ⇔ (3) The linear transformation A: R^2 → R^3 is one-to-one, i.e.
             r(A) = 2.
       ⇔ (4) AA*, as a square matrix of order 2, is invertible. Thus, the
             unique solution (refer to Ex. <A> 5(1)) is

                 b A*(AA*)^{-1}.

       On the other hand,

       (1) x A = b has infinitely many solutions.
       ⇔ (2) x A = 0 has infinitely many solutions, i.e. the solution space
             Ker(A) has dim Ker(A) = 1.
       ⇔ (3) The linear transformation A: R^2 → R^3 is not one-to-one,
             i.e. the rank r(A) = 1.
       ⇔ (4) AA* is not invertible and r(AA*) = 1.

       If x0 is a solution of x A = b, then the solution affine subspace

           x0 + Ker(A)

       is the solution set of x A = b (see Fig. 3.41). Among these
       solutions, there exists a unique solution v whose distance to 0 is
       the smallest, i.e.

           |v| = min_{x A = b} |x|.

       The remaining question is how to find v; more explicitly, how
       to determine v via A. If r(A) = 2, then by the former part of (b),
       it is known that v = b A*(AA*)^{-1}.

       (Fig. 3.41: the solution set x0 + Ker(A), an affine line in R^2, and
       the point b A^+ on it closest to the origin.)
   (c) Suppose r(A) = 1. Then r(AA*) = 1 and AA* is not invertible.
       We may suppose a11 ≠ 0. Rewrite A as

           A = [ a11  A12  = BC,
                 a21  A22 ]

       where

           B = [ a11 ,   C = [ 1  a11^{-1}A12 ],   and
                 a21 ]

           A12 = [ a12  a13 ],   A22 = [ a22  a23 ].

       Therefore r(B) = r(C) = 1. Now

           x A = b
           ⇒ x AA* = b A*
           ⇒ x (BC)(BC)* = x B(CC*)B* = b C*B*
           ⇒ x B(CC*)(B*B) = b C*(B*B)
           ⇒ (since B*B and CC* are invertible)  x B = b C*(CC*)^{-1}
           ⇒ (since r(B) = 1)  the required solution is

                 v = b C*(CC*)^{-1}(B*B)^{-1}B*   (Why?).

   (d) Let

           A = [  1   1   1  .
                 −1  −1  −1 ]

       Show that r(A) = 1 and

           A^+ = C*(CC*)^{-1}(B*B)^{-1}B* = (1/6)[ 1  −1
                                                   1  −1 .
                                                   1  −1 ]
4. (continued from Ex. 3) Given A = [a_ij]_{2×3} and b = (b1, b2, b3) ∈ R^3,
   suppose x A = b does not have any solution x ∈ R^2. The problem is to
   find x0 ∈ R^2 so that |b − x0 A| is minimal, i.e.

       |b − x0 A| = min_{x ∈ R^2} |b − x A|.

   Geometrically, this means that x0 A is the orthogonal projection of b
   onto the range space Im(A) and |b − x0 A| is the distance from b to
   Im(A). For details, see Chaps. 4 and 5. See also Fig. 3.42.
   (Fig. 3.42: the orthogonal projection x0 A of b onto the plane Im(A) when
   r(A) = 2, and onto the line Im(A) when r(A) = 1.)
   Now,

       (b − x0 A) ⊥ x A for all x ∈ R^2
       ⇔ (b − x0 A)(x A)* = (b − x0 A)A* x* = 0 for all x ∈ R^2
       ⇔ (b − x0 A)A* = b A* − x0 AA* = 0,  i.e.  x0 AA* = b A*.

   (a) In case r(A) = 2, then r(AA*) = 2 and AA* is invertible. Therefore,

           x0 = b A*(AA*)^{-1}
           ⇒ the projection vector of b on Im(A) is b A*(AA*)^{-1}A.

   (b) In case r(A) = 1, then r(AA*) = 1 and AA* is not invertible.
       Decompose A as in Ex. 3(c) and show that

           x0 = b C*(CC*)^{-1}(B*B)^{-1}B*
           ⇒ the projection vector of b on Im(A) is

               b C*(CC*)^{-1}(B*B)^{-1}B*A = b C*(CC*)^{-1}C.
   (Note Combining Ex. 3 and Ex. 4 together, for A_{2×3} ≠ O_{2×3} the 3 × 2
   matrix

       A^+ = { A*(AA*)^{-1},               if r(A) = 2;
             { C*(CC*)^{-1}(B*B)^{-1}B*,   if r(A) = 1 and A = BC as in Ex. 3(c)

   is called the generalized or pseudoinverse of A. This A^+ has the follow-
   ing properties:

   (1) |b A^+| = min_{x A = b} |x| if x A = b has a solution.
   (2) b A^+ A is the orthogonal projection of b ∈ R^3 on Im(A) and

           |b − b A^+ A| = min_{x ∈ R^2} |b − x A|.

   These results remain valid for a real or complex m × n matrix
   A with m ≤ n. For the general setting in this direction, see Sec. B.8 and
   Fig. B.9; also refer to Ex. <B> of Sec. 2.7.5 and Example 4 in Sec. 3.7.5,
   Secs. 4.5, 5.5.)
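The pseudoinverse of this note can be cross-checked numerically: for the
row-vector convention used here it agrees with the standard Moore–Penrose
inverse computed by NumPy's np.linalg.pinv (a hedged Python sketch, editorial
and not part of the original text):

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [-1., -1., -1.]])              # the matrix of Ex. 3(d), r(A) = 1
    Aplus = np.linalg.pinv(A)                    # the 3 x 2 generalized inverse
    print(np.allclose(Aplus, np.array([[1., -1.], [1., -1.], [1., -1.]]) / 6))

    b = np.array([2., 2., 2.])                   # b = (3, 1) A, so x A = b is solvable
    v = b @ Aplus                                # v = (1, -1), the solution of least norm
    print(v, np.allclose(v @ A, b))
    print(np.linalg.norm(v) <= np.linalg.norm(np.array([3., 1.])))   # |v| = min |x|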
5. Let N = {e1, e2, e3} be the natural basis for R^3 and N' = {1} the one
   for R.

   (a) A mapping f: R^3 → R is a linear transformation, specifically called
       a linear functional, if and only if there exist scalars a1, a2, a3 so that

           f(x) = a1 x1 + a2 x2 + a3 x3 = x [ a1
                                              a2 ,   x ∈ R^3,
                                              a3 ]

       i.e.

           [f(x)]_{N'} = [x]_N [f]^N_{N'},   where [f]^N_{N'} = [ a1
                                                                  a2 .
                                                                  a3 ]

   (b) Given a matrix B = [b_i]_{3×1}, find conditions so that there exist a
       basis B for R^3 and a basis B' for R so that [f]^B_{B'} = B.
   (c) Show that f can be onto but never one-to-one.
   (d) Suppose f: R^3 → R is a linear functional such that Im(f) = R
       holds. Try to define the quotient space

           R^3/Ker(f).

       Show that it is linearly isomorphic to R (refer to Fig. 3.25).
   (e) Let f_j: R^3 → R be the linear functional satisfying

           f_j(e_i) = δ_ij,   1 ≤ i, j ≤ 3.

       Then any linear functional f: R^3 → R can be uniquely expressed as

           f = f(e1)f1 + f(e2)f2 + f(e3)f3.

   (f) The set of all linear functionals from R^3 to R, namely

           (R^3)* = Hom(R^3, R)

       (see Ex. 19 of Sec. B.7), is a three-dimensional real vector space,
       called the (first) dual space of R^3, with {f1, f2, f3} as a basis, which
       is called the dual basis of N in (R^3)*. How does one define the dual
       basis B* for (R^3)* of a basis B for R^3?
   (g) Let S be a subspace (or a nonempty subset) of R^3. The set

           S^0 = {f ∈ (R^3)* | f(x) = 0 for all x ∈ S}

       is a subspace of (R^3)* and is called the annihilator of S in (R^3)*.
       Show that

       (1) dim S + dim S^0 = dim R^3 = 3.
       (2) S1 ⊆ S2 ⇔ S1^0 ⊇ S2^0.
       (3) (S1 + S2)^0 = S1^0 ∩ S2^0.
       (4) (S1 ∩ S2)^0 = S1^0 + S2^0.

       Try to explain (1)–(4) geometrically. Therefore, S → S^0 sets up
       a one-to-one correspondence between the family of subspaces of
       R^3 and the family of subspaces of (R^3)* but reverses the inclusion
       relation. For example,

           two planes intersecting along a line through 0 in R^3
           ↔ two lines generating a plane through 0 in (R^3)*.

       Thus, S → S^0 reflects geometrically the dual properties between
       R^3 and (R^3)*, and the latter is then called the dual space of the
       former.
   (h) The dual space of (R^3)*,

           (R^3)** = ((R^3)*)* = Hom((R^3)*, R),

       is called the second dual space of R^3. For each x ∈ R^3, define a
       mapping x**: (R^3)* → R by

           x**(f) = f(x),   f ∈ (R^3)*.

       Show that

       (1) x** ∈ (R^3)**.
       (2) x → x** sets up a linear isomorphism from R^3 onto (R^3)**
           in a natural way, i.e. independent of the choices of bases for R^3
           and (R^3)**.

       Therefore, each basis for (R^3)* is the dual basis of some basis for R^3.
       Occasionally, x**(f) = f(x) is rewritten as

           ⟨x, f⟩ = ⟨f, x⟩,   x ∈ R^3 and f ∈ (R^3)*,

       which indicates implicitly the duality between R^3 and (R^3)*. Also,
       show that for a nonempty set S in R^3,

           (S^0)^0 = S^00 = ⟨⟨S⟩⟩, the subspace generated by S.
   (i) Let φ: R^3 → R^3 be a linear operator. Define a mapping
       φ*: (R^3)* → (R^3)* by

           φ*(f)(x) = f(φ(x)),   f ∈ (R^3)* and x ∈ R^3;  or
           ⟨x, φ*(f)⟩ = ⟨φ(x), f⟩.

       Then such a φ* is linear and unique, and is called the adjoint or
       the dual of φ. Let B and C be bases for R^3 and B* and C* the
       corresponding dual bases for (R^3)*. Then

           ([φ]^B_C)* = [φ*]^{C*}_{B*},

       namely, [φ*]^{C*}_{B*} is the transpose of [φ]^B_C.
   (j) Discuss R* and (R^2)*. How does one define the adjoint
       φ*: (R^n)* → (R^m)* of a linear transformation φ: R^m → R^n,
       where m, n = 1, 2, 3?

   (Note For a general setting, please refer to Ex. 19 through Ex. 24 of
   Sec. B.7.)
6. Let N = {e1, e2, e3} be the natural basis for R^3 and N' = {e1, e2}
   the one for R^2.

   (a) A mapping f: R^3 → R^2 is a linear transformation if and only if
       there exists a matrix A = [a_ij]_{3×2} such that

           f(x) = x A,   x ∈ R^3,

       namely,

           [f(x)]_{N'} = [x]_N [f]^N_{N'},   where [f]^N_{N'} = A_{3×2}.

   (b) For a matrix B = [b_ij]_{3×2}, find conditions so that there exist a
       basis B for R^3 and a basis B' for R^2 so that [f]^B_{B'} = B.
   (c) Show that f can be onto but never one-to-one.
   (d) Show that R^3/Ker(f) is isomorphic to Im(f).
   (e) Let f: R^3 → R^2 be a linear transformation. Then

       (1) f is onto.
       ⇔ (2) There exists a linear transformation g: R^2 → R^3 so that
             f ◦ g = 1_{R^2}, the identity operator on R^2.
       ⇔ (3) There exist a basis B for R^3 and a basis B' for R^2 so that
             [f]^B_{B'} is left invertible, i.e. there is a matrix B_{2×3} so that
             B[f]^B_{B'} = I_2.
       In this case, f is called right invertible. Use

           f(x) = x A,   where A = [ −5   3
                                      4  −1
                                      2   6 ]

       to justify (1)–(3).
   (f) Let g: R^2 → R^3 be a linear transformation. Then

       (1) g is one-to-one.
       ⇔ (2) There exists a linear transformation f: R^3 → R^2 so that
             f ◦ g = 1_{R^2}, the identity operator on R^2.
       ⇔ (3) There exist a basis B' for R^2 and a basis B for R^3 so that
             [g]^{B'}_B is right invertible, i.e. there is a matrix B_{3×2} so that
             [g]^{B'}_B B = I_2.

       In this case, g is called left invertible. Use

           g(x) = x [ −5   4  2
                       3  −1  6 ]

       to justify (1)–(3).
   (g) Prove the counterparts of (e) and (f) for linear transformations
       f: R^3 → R and g: R → R^3.

   (Note For (e), (f) and (g), general cases can be found in Ex. 5 of
   Sec. B.7 and Ex. 3 of Sec. B.8. Also, refer to Ex. <A> 7 of Sec. 2.7.4
   and Ex. <B> 5(d), (e) of Sec. 3.3.)

   (h) Let f: R^3 → R^2 be defined as

           f(x) = x [  1   0
                       0  −1 ,   x ∈ R^3.
                      −1   1 ]

       Find the image of the unit sphere x1^2 + x2^2 + x3^2 = 1 under f. How
       about the image of the closed unit ball x1^2 + x2^2 + x3^2 ≤ 1? Watch
       the following facts: let y = (y1, y2) = f(x).

       (1) Ker(f) = ⟨⟨(1, 1, 1)⟩⟩. Hence, f is one-to-one on the plane
           x1 + x2 + x3 = 0.
       (2) y1 = x1 − x3, y2 = −x2 + x3 ⇒ x1 = y1 + x3 and x2 = −y2 + x3.
           Substitute these two equations into x1 + x2 + x3 = 0 to obtain
           x3 = (1/3)(y2 − y1) and hence x1 = (1/3)(2y1 + y2),
           x2 = −(1/3)(y1 + 2y2).
       (3) Now, consider the image of the circle x1^2 + x2^2 + x3^2 = 1,
           x1 + x2 + x3 = 0 under f.
   (i) Let f(x) = x A be a linear transformation from R^3 onto R^2. Do
       the same question as in (h) by trying the following method:

       (1) Consider R^3 = Ker(f) ⊕ Ker(f)^⊥ so that f|_{Ker(f)^⊥} is one-
           to-one.
       (2) Use (e) to find a matrix B_{2×3} so that BA = I_2. Then y = x A
           implies that y B ∈ x + Ker(f) for y ∈ R^2, and vice versa.
           Indeed, B = (A*A)^{-1}A*.

       See Fig. 3.43.

       (Fig. 3.43: R^3 decomposed as Ker(f) ⊕ Ker(f)^⊥ and mapped by f
       onto R^2.)
7. The system of two non-homogeneous linear equations in three
   unknowns x1, x2 and x3,

       Σ_{i=1}^{3} a_ij x_i = b_j,   j = 1, 2,

   can be rewritten as

       x A = b,   where A = [ a11  a12
                              a21  a22   ≠ O_{3×2},
                              a31  a32 ]

       b = (b1, b2) ∈ R^2 and x = (x1, x2, x3) ∈ R^3.
   (a) Prove that the following are equivalent.

       (1) x A = b has a solution.
       (2) (linearly algebraic) The vector b lies in the range space of A,
           i.e. b ∈ Im(A).
       (3) (geometric) The two planes

               Σ1: a11 x1 + a21 x2 + a31 x3 = b1,
               Σ2: a12 x1 + a22 x2 + a32 x3 = b2

           in R^3 are either coincident or intersecting along a line, but never
           parallel (see (3.5.6) and Fig. 3.16).
       (4) (algebraic)

           Coincident (Σ1 = Σ2) ⇔ a11/a12 = a21/a22 = a31/a32 = b1/b2;
           intersecting along a line ⇔ at least two of the ratios a11/a12,
           a21/a22 and a31/a32 are not equal.

       (5) (linearly algebraic) The coefficient matrix A and the augmented
           matrix, with b adjoined to A as a further row, have the same rank,
           i.e.

               r [ A  = r(A) = { 1, if coincident;
                   b ]          { 2, if intersecting.

       Therefore, it is worth mentioning that

           x A = b has no solution
           ⇔ the planes Σ1 and Σ2 are parallel
           ⇔ a11/a12 = a21/a22 = a31/a32 ≠ b1/b2
           ⇔ r(A) = 1 < r [ A  = 2.
                            b ]
   (b) In case x A = b has a solution. Then

       (1) The linear transformation A: R^3 → R^2 is onto, i.e. r(A) = 2.
       ⇔ (2) The solution space Ker(A) of x A = 0 is a one-dimensional
             subspace of R^3.
       ⇔ (3) For each point b ∈ R^2, the solution set of x A = b is a one-
             dimensional affine subspace of R^3.
       ⇔ (4) A*A is an invertible 2 × 2 matrix, i.e. r(A*A) = 2. Thus, a
             particular solution of x A = b is

                 b(A*A)^{-1}A*.

       In this case, the solution affine subspace of x A = b is

           b(A*A)^{-1}A* + Ker(A),

       whose direction Ker(A) is perpendicular to the vector b(A*A)^{-1}A*;
       namely, for any x ∈ Ker(A),

           x (b(A*A)^{-1}A*)* = x A(A*A)^{-1} b* = 0 (A*A)^{-1} b* = 0.

       Hence, the point b(A*A)^{-1}A* has the shortest distance, among
       all the points lying on the solution affine subspace, to the origin 0
       (see Fig. 3.44), i.e.

           |b(A*A)^{-1}A*| = min_{x A = b} |x|.

       (Fig. 3.44: the solution set b(A*A)^{-1}A* + Ker(A), an affine line in
       R^3, with b(A*A)^{-1}A* the point on it closest to the origin.)
   (c) In case x A = b has a solution. Then

       (1) The linear transformation A: R^3 → R^2 is not onto, i.e.
           r(A) = 1.
       ⇔ (2) The solution space Ker(A) of x A = 0 is a two-dimensional
             subspace of R^3.
       ⇔ (3) For a point b for which x A = b has a solution, the
             solution set is a two-dimensional affine subspace of R^3.
       ⇔ (4) For a point b for which x A = b has a solution, A*A is
             not invertible and

                 r(A*A) = r(A*A + b*b) = 1.

       If x0 is a particular solution of x A = b, then the affine plane

           x0 + Ker(A)

       is the solution set of x A = b (see Fig. 3.45). There exists a unique
       point v on the plane whose distance to 0 is the smallest, i.e.

           |v| = min_{x A = b} |x|.

       This v is going to be perpendicular to both Ker(A) and x0 + Ker(A).

       (Fig. 3.45: the solution plane b A^+ + Ker(A) in R^3 and the point
       b A^+ on it closest to the origin.)
       To find such a v, we model after Ex. 3(c) and proceed as follows.
       We may suppose a11 ≠ 0 and rewrite A as

           A = BC,   where B = [ a11        and C = [ 1  a11^{-1}A12 ]_{1×2},  with
                                 A21 ]_{3×1}

           A21 = [ a21 ,   A12 = [a12]_{1×1},
                   a31 ]

       so that r(B) = r(C) = 1. Then

           x A = b
           ⇒ x AA* = b A*  or  x B(CC*)B* = b C*B*
           ⇒ x B(CC*)(B*B) = b C*(B*B)
           ⇒ x B(CC*) = b C*  or  x B = b C*(CC*)^{-1}
           ⇒ the required solution is

               v = b C*(CC*)^{-1}(B*B)^{-1}B* = b A^+,

       where A^+ = C*(CC*)^{-1}(B*B)^{-1}B*.

       Notice that, for any x ∈ R^3, x A = 0 if and only if x B = 0, and
       hence, for such x,

           x (b A^+)* = x B(B*B)^{-1}(CC*)^{-1}C b* = 0.

       This means that the vector b A^+ is indeed perpendicular to Ker(A),
       and it is a point lying on the affine plane x0 + Ker(A), since

           (b A^+)A = b C*(CC*)^{-1}(B*B)^{-1}B*BC
                    = b C*(CC*)^{-1}C = x0 B(CC*)(CC*)^{-1}C = x0 BC = x0 A = b.
   (d) Let

           A = [ 1  −1
                 1  −1 .
                 1  −1 ]

       Show that A^+ = (1/6)A*.
8. (continued from Ex. 7) Given A = [a_ij]_{3×2} and b = (b1, b2) ∈ R^2,
   suppose x A = b does not have any solution x ∈ R^3. As in Ex. 4, the
   problem is to find x0 ∈ R^3 so that

       |b − x0 A| = min_{x ∈ R^3} |b − x A|.

   This means that x0 A is the orthogonal projection of b onto the range
   space Im(A) and |b − x0 A| is the distance from b to Im(A). For details,
   see Chaps. 4 and 5. See also Fig. 3.46.

   (Fig. 3.46: b and its orthogonal projection onto the line Im(A) in R^2.)
   According to Ex. 7, r(A) = 1 should hold in this case. Just like Ex. 4,
   we have

       (b − x0 A) ⊥ x A for all x ∈ R^3
       ⇔ x0 AA* = b A*.

   Since r(AA*) = r(A) = 1, model after Ex. 7(c) with A = BC there;
   then

       x0 = b C*(CC*)^{-1}(B*B)^{-1}B* = b A^+
       ⇒ the projection vector of b on Im(A) is

           b C*(CC*)^{-1}(B*B)^{-1}B*A = b C*(CC*)^{-1}C.
   (Note Combining Exs. 7 and 8 together, for A_{3×2} ≠ O_{3×2}, the 2 × 3
   matrix

       A^+ = { (A*A)^{-1}A*,              if r(A) = 2;
             { C*(CC*)^{-1}(B*B)^{-1}B*,  if r(A) = 1 and A = BC as in Ex. 7(c)

   is called the generalized or pseudoinverse of A. This A^+ has the follow-
   ing properties:

   (1) |b A^+| = min_{x A = b} |x| if x A = b has a solution.
   (2) b A^+ A is the orthogonal projection of b on Im(A) and

           |b − b A^+ A| = min_{x ∈ R^3} |b − x A|.

   Similar results hold for a real or complex matrix A_{m×n} with m ≥ n.
   See Sec. B.8; also refer to Exs. <B> and <C> 18 of Sec. 2.7.5 and
   compare with the note in Ex. 4.)
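These formulas, too, can be tested numerically. A short Python/NumPy sketch
(editorial, with an arbitrarily chosen A of rank 2) of the least-norm solution
of x A = b for a 3 × 2 matrix:

    import numpy as np

    A = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])                     # 3 x 2 with r(A) = 2
    b = np.array([1., 2.])

    Aplus = np.linalg.pinv(A)                    # here equals (A*A)^{-1} A*, a 2 x 3 matrix
    v = b @ Aplus                                # v = (0, 1, 1), the least-norm solution
    print(v, np.allclose(v @ A, b))

    k = np.array([1., 1., -1.])                  # (1, 1, -1) spans Ker(A)
    print(np.allclose(k @ A, 0), np.isclose(v @ k, 0))   # v is perpendicular to Ker(A)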
9. Do the same problems as in Ex. 4 and Ex. 8 for A1×2 , A2×1 , A1×3
and A3×1 .
10. (a) n points (x_i, y_i), 1 ≤ i ≤ n, in the plane R^2 are collinear if and
        only if, for a_i = (x_i, y_i) − (x1, y1), 2 ≤ i ≤ n, the matrix

            [ a2
              a3
              ⋮
              an ]

        has rank equal to or less than 1.
    (b) n straight lines a_i1 x1 + a_i2 x2 + b_i = 0, 1 ≤ i ≤ n, in the plane
        R^2 are concurrent at a point if and only if, for a_i = (a_i1, a_i2),
        1 ≤ i ≤ n,

            A = [ a1       and  [A | b*] = [ a1  b1
                  a2                          a2  b2
                  ⋮                           ⋮   ⋮  ,   where b = (b1, . . . , bn),
                  an ]                        an  bn ]

        have the same rank 2; these lines are coincident along a line if and
        only if r(A) = r([A | b*]) = 1.
11. In R^3:
    (a) Find necessary and sufficient conditions for n points (x_i, y_i, z_i),
        1 ≤ i ≤ n, to be coplanar or collinear.
    (b) Find necessary and sufficient conditions for n planes a_i1 x1 + a_i2 x2 +
        a_i3 x3 + b_i = 0, 1 ≤ i ≤ n, to be concurrent at a point, intersecting
        along a line or coincident along a plane.
<C>
Read Ex. <C> of Sec. 2.7.3 and do all the problems there if you missed
them at that time.
Also, do the following problems. Refer to Exs. 19–24 of Sec. B.7, if
necessary.
1. A mapping f: C^3 → C^3 is defined by

       f(x1, x2, x3) = (3ix1 − 2x2 − ix3, ix2 + 2x3, x1 + 4x3).

   (a) In the natural basis N = {e1, e2, e3}, f can be expressed as

           f(x) = [x]_N [f]_N,   where [f]_N = [f]^N_N = [ 3i  0  1
                                                           −2  i  0 .
                                                           −i  2  4 ]

   (b) Let x1 = (1, 0, i), x2 = (−1, 2i, 1), x3 = (2, 1, i). Show that both
       B = {x1, x2, x3} and f(B) = B' = {f(x1), f(x2), f(x3)} are bases
       for C^3. What is [f]^B_{B'}?
   (c) By direct computation, show that

           [f]^B_B = [ (23 − 3i)/2    (1 − 5i)/2    −5 + i
                       (−15 + 9i)/2   (5 + 5i)/2     5 − 5i  .
                        19 − 5i       −5i           −10 + 3i ]

       Compute the transition matrix P^B_N and justify that
       [f]^B_B = P^B_N [f]^N_N P^N_B.
   (d) Compute [f]^{B'}_{B'} and P^{B'}_B and justify that
       [f]^{B'}_{B'} = P^{B'}_B [f]^B_B P^B_{B'}.
   (e) Let g: C^3 → C^3 be defined by

           g(x1, x2, x3) = (x1 + 2ix2, x2 + x3, −ix1 + ix3).

       Compute
       (1) g ◦ f and f ◦ g,
       (2) [g]^B_B, [g]^{B'}_{B'} and [g]^B_{B'}, and
       (3) [g ◦ f]^B_B, [f ◦ g]^B_B and [f ◦ g]^B_{B'}.
2. In P_n(R) (see Secs. A.5 and B.3), let

       N = {1, x, x^2, . . . , x^n},  and
       B = {1, x + x0, (x + x0)^2, . . . , (x + x0)^n},

   where x0 ∈ R is a fixed number. N is called the natural basis for P_n(R).

   (a) Show that B is a basis for P_n(R).
   (b) Let D: P_n(R) → P_n(R) be the differential operator, i.e. D(p) = p'.
       Show that

           [D]_N = [D]^N_N = [ 0  0  0  0  ···  0  0
                               1  0  0  0  ···  0  0
                               0  2  0  0  ···  0  0
                               0  0  3  0  ···  0  0
                               ⋮  ⋮  ⋮  ⋮       ⋮  ⋮
                               0  0  0  0  ···  0  0
                               0  0  0  0  ···  n  0 ]   ((n + 1) × (n + 1)).

   (c) Show that [D]_B = [D]_N.
   (d) Compute the transition matrices P^N_B and P^B_N.
   (e) Compute P^N_B for n = 2 and justify that [D]_B = P^B_N [D]_N P^N_B.
   (f) What happens to (b)–(e) if D is considered as a linear transformation
       from P_n(R) to P_{n−1}(R)?
   (g) Define Φ: P_n(R) → R^{n+1} by

           Φ( Σ_{i=0}^{n} a_i x^i ) = (a0, a1, . . . , an).

       Show that Φ is a linear isomorphism and

           [Φ]^N_{N'} = I_{n+1},

       where N' = {e1, . . . , e_{n+1}} is the natural basis for R^{n+1}. What is

           [Φ ◦ D ◦ Φ^{-1}]^{N'}_{N'},

       where D is as in (b)?
3. In P_2(R), let N = {1, x, x^2} be the natural basis and let

       B  = {x^2 − x + 1, x + 1, x^2 + 1},
       B' = {x^2 + x + 4, 4x^2 − 3x + 2, 2x^2 + 3},
       C  = {x^2 − x, x^2 + 1, x − 1},
       C' = {2x^2 − x + 1, x^2 + 3x − 2, −x^2 + 2x + 1}.

   (a) Use Ex. 2(g) to show that B, B', C and C' are bases for P_2(R).
   (b) Show that {5x^2 − 2x − 3, −2x^2 + 5x + 5} is linearly independent in
       P_2(R) and extend it to form a basis for P_2(R).
   (c) Find a subset of {2x^2 − x, x^2 + 21x − 2, 3x^2 + 5x + 2, 9x − 9} that is
       a basis for P_2(R).
   (d) Compute the transition matrices P^B_C and P^{B'}_{C'}. Let Φ be as in
       Ex. 2(g) and notice that
       (1) Φ transforms a basis B for P_2(R) onto a basis Φ(B) for R^3.
       (2) What is [Φ]^B_{Φ(B)}?
       (3) P^B_C = [Φ]^B_{Φ(B)} P^{Φ(B)}_{Φ(C)} [Φ^{-1}]^{Φ(C)}_C.
   (e) Let T: P_2(R) → P_2(R) be defined by

           T(p)(x) = p'(x) · (3 + x) + 2p(x).

       Show that T is a linear operator. Compute [T]_N and [T]_B and justify
       that [T(p)]_B = [p]_B [T]_B by supposing p(x) = 3 − 2x + x^2.
   (f) Compute [T]^B_C and [T]^{B'}_{C'} and justify that
       [T]^{B'}_{C'} = P^{B'}_B [T]^B_C P^C_{C'}.
   (g) Let U: P_2(R) → R^3 be defined by

           U(a + bx + cx^2) = (a + b, c, a − b).

       Show that U is a linear isomorphism. Use N' = {e1, e2, e3} to
       denote the natural basis for R^3. Compute [U]^N_{N'} and [U]^B_{N'},
       [U^{-1}]^{N'}_B, and justify that

           [U]^B_{N'} [U^{-1}]^{N'}_B = [U^{-1}]^{N'}_B [U]^B_{N'} = I_3,

       i.e. ([U]^B_{N'})^{-1} = [U^{-1}]^{N'}_B.
   (h) Compute [U ◦ T]^N_{N'} and justify that [U ◦ T]^N_{N'} = [T]^N_N [U]^N_{N'}.
   (i) Define V: P_2(R) → M(2; R) by

           V(p) = [ p'(0)   0
                    2p(1)   p'(3) ].

       Show that V is a linear transformation and compute [V]^N_{N'} where
       N' = {E11, E12, E21, E22} is the natural basis for M(2; R) (see
       Sec. B.4). Verify that [V(p)]_{N'} = [p]_N [V]^N_{N'} if p(x) = 4 − 6x + 3x^2.
   (j) Define linear functionals f_i: P_2(R) → R, for i = 1, 2, 3, by

           f1(a + bx + cx^2) = a,
           f2(a + bx + cx^2) = b,
           f3(a + bx + cx^2) = c.

       Show that N* = {f1, f2, f3} is the dual basis of N = {1, x, x^2}
       in P_2(R)*.
   (k) It is easy to see that

           ax^2 + bx + c = (c − a − b)(x^2 − x + 1) + (c − a)(x + 1)
                           + (2a + b − c)(x^2 + 1).

       Then, try to find the dual basis B* of B in P_2(R)*. How about C*?
   (l) Define g_i: P_2(R) → R, for i = 1, 2, 3, by

           g1(p) = ∫_0^1 p(t) dt,
           g2(p) = ∫_0^2 p(t) dt,
           g3(p) = ∫_0^3 p(t) dt.

       Show that {g1, g2, g3} is a basis for P_2(R)*. Try to find a basis for
       P_2(R) so that its dual basis in P_2(R)* is {g1, g2, g3}.
   (m) Let T be as in (e). Compute [T*]^{C*}_{B*} and justify that it is equal
       to ([T]^B_C)*. Try to find T*(f) if f(ax^2 + bx + c) = a + b + c by the
       following methods:
       (1) [T*(f)]_{B*} = [f]_{C*} [T*]^{C*}_{B*}.
       (2) By the definition of T*, for p ∈ P_2(R),

               T*(f)(p) = f(T(p)).

   (n) Let U be as in (g). Describe U*: (R^3)* → (P_2(R))*.
   (o) Let V be as in (i). Describe V*: M(2; R)* → (P_2(R))*.
4. In M(2; R), let

       N = {E11, E12, E21, E22},
       B = { [1 0; 0 1], [1 1; 0 0], [1 0; 1 0], [0 0; 1 1] },
       C = { [0 1; 1 1], [1 0; 1 1], [1 1; 0 1], [1 1; 1 0] }.

   (a) Show that B and C are bases for M(2; R).
   (b) Show that { [−2 5; 1 3], [0 4; 2 −1] } is linearly independent in
       M(2; R) and extend it to a basis for M(2; R).
   (c) How many bases for M(2; R) can be selected from the set

           { [1 1; 0 0], [1 1; 1 1], [0 1; 1 0], [−1 0; 0 1], [0 1; 0 −1], [−1 1; 0 2] }?
   (d) Let Φ: M(2; R) → R^4 be defined by

           Φ [ a11  a12  = (a11, a12, a21, a22).
               a21  a22 ]

       Show that Φ is a linear isomorphism. Hence, Φ(N) = {e1, e2, e3, e4}
       is the natural basis for R^4. What are Φ(B) and Φ(C)? Show that

           [Φ]^N_{Φ(N)} = I_4.

   (e) Compute the transition matrices P^B_N, P^C_N and P^B_C by the
       following methods:
       (1) Direct computation via the definitions.
       (2) P^B_C = [Φ]^B_{Φ(B)} P^{Φ(B)}_{Φ(C)} [Φ^{-1}]^{Φ(C)}_C, etc.
       Verify that

           P^B_C = P^B_N P^N_C.
   (f) Define T: M(2; R) → M(2; R) by

           T(A) = A*   (the transpose of A).

       Show that T is a linear isomorphism. Compute [T]_N, [T]_B, [T]_C and
       [T]^B_C by using the methods indicated in (e):

       (1) Direct computation.
       (2) Via R^4: consider the diagram

               M(2; R) --T--> M(2; R)
                 Φ ↓              ↓ Φ
                 R^4  --T̃-->    R^4

           where T = Φ^{-1} ◦ T̃ ◦ Φ. What is T̃?

       Verify that [T]^B_C = P^B_N [T]_N P^N_C and [T^{-1}]^C_B = ([T]^B_C)^{-1}.
   (g) Compute the dual bases B* and C* of B and C in M(2; R)*. Show
       that

           [T*]^{C*}_{B*} = ([T]^B_C)*

       and compute T*(tr), where tr: M(2; R) → R, defined by tr(A) =
       a11 + a22, is the trace of A.
(h) Show that ±1 are the only eigenvalues of T . Find all eigenvectors
of T corresponding to 1 and −1, respectively.
   (i) Define U: M(2; R) → P_2(R) by

           U [ a11  a12  = (a11 + a12) + 2a22 x + a12 x^2.
               a21  a22 ]

       Determine Ker(U) and the rank r(U). Let N0 = {1, x, x^2} be the
       natural basis for P_2(R) and let B0 and C0 be the B and C of Ex. 3,
       respectively. Do the following problems.

       (1) Compute [U]^N_{N0}, [U]^B_{B0} and [U]^C_{C0}.
       (2) Verify that [U(A)]_{B0} = [A]_B [U]^B_{B0}.
       (3) Verify that [U]^C_{C0} = P^C_N [U]^N_{N0} P^{N0}_{C0}.
       (4) Describe U* and verify that [U*]^{B0*}_{B*} = ([U]^B_{B0})*.
           Compute U*(p) if p(x) = 1 + x + x^2.
       (5) Verify that [U ◦ T]^C_{C0} = [T]^C_B [U]^B_{C0} = [T]^C_C [U]^C_{C0}.
3.7.4 Linear transformations (operators)

Just as Sec. 3.7.3 did for Sec. 2.7.3, the definitions, results and proofs
concerned in Sec. 2.7.4 can be generalized, as claimed there, to higher-
dimensional vector spaces or matrices, in particular to R^n for n = 1, 2, 3.
Here we list the corresponding results in R^3 and leave their proofs to the
readers.

As a generalization of (2.7.50) and (2.7.51), we have the

Projection on R^3

Let f: R^3 → R^3 (or R^n → R^n for n ≥ 2) be a nonzero linear operator with
rank equal to r, 1 ≤ r ≤ 2.

(1) f is a projection of R^3 = V1 ⊕ V2 onto V1 along V2, i.e. for each
    x = x1 + x2 ∈ R^3, where x1 ∈ V1 and x2 ∈ V2,

        f(x) = x1.

    See Fig. 3.47.
⇔ (2) r(f) + r(1_{R^3} − f) = dim R^3 = 3.
⇔ (3) 1_{R^3} − f is a projection of R^3 onto V2 along V1.
⇔ (4) f^2 = f (and hence f is the projection of R^3 onto Im(f) along
      Ker(f)).
⇔ (5) f has only eigenvalues 1 and 0.
(Fig. 3.47: the projection f of R^3 = V1 ⊕ V2 onto V1 along V2, shown for
r(f) = 2 and for r(f) = 1.)
⇔ (6) There exists a basis B = {x1, x2, x3} for R^3 such that

          [f]_B = P[f]_N P^{-1} = [ I_r  0              [ x1
                                    0    0 ]_{3×3},  P =  x2 ,
                                                          x3 ]

      where N = {e1, e2, e3} is the natural basis for R^3.      (3.7.34)

Of course, the zero linear operator is the projection of R^3 onto {0} along
the whole space R^3, while the identity operator 1_{R^3} is the only projection
of R^3 onto itself along {0}.
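In coordinates, a projection is just a matrix P with P^2 = P. A small
Python/NumPy check of (3.7.34) (an editorial sketch; here f(x) = x P projects
R^3 onto V1 = ⟨⟨e1, e2⟩⟩ along V2 = ⟨⟨(1, 1, 1)⟩⟩, so the rows of P are the
images f(e1), f(e2), f(e3)):

    import numpy as np

    P = np.array([[ 1.,  0., 0.],
                  [ 0.,  1., 0.],
                  [-1., -1., 0.]])

    print(np.allclose(P @ P, P))                           # (4): f^2 = f
    print(np.linalg.matrix_rank(P)
          + np.linalg.matrix_rank(np.eye(3) - P) == 3)     # (2): r(f) + r(1 - f) = 3
    print(np.round(np.linalg.eigvals(P), 6))               # (5): eigenvalues 1, 1, 0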
As a counterpart of (2.7.52), we have

The projectionalization

(1) For each linear operator f on R^3 (or on R^n for n ≥ 2), there exists an
    invertible linear operator g on R^3 such that

        (g ◦ f)^2 = g ◦ f,

    i.e. g ◦ f is a projection on R^3.
(2) For any real 3 × 3 (or n × n) matrix A, there exists an invertible matrix
    P_{3×3} such that

        (AP)^2 = AP.                                            (3.7.35)
(2.7.55) and (2.7.56) become

The rank theorem

Let f: R^m → R^n be a linear transformation for m, n = 1, 2, 3 (and for any
m, n ≥ 1). Suppose the rank r(f) = r ≤ min(m, n).
(1) There exist an invertible linear operator g on R^m and an invertible
    linear operator h on R^n so that

        h ◦ f ◦ g(x1, . . . , xm) = { (0, . . . , 0),               if r(f) = 0;
                                    { (x1, . . . , xr, 0, . . . , 0),  if r(f) = r ≥ 1,

    with n − r trailing zeros in the second case. See the diagram below and
    Fig. 3.48.
(2) Let A_{m×n} be a real matrix. Then there exist invertible matrices P_{m×m}
    and Q_{n×n} such that

        PAQ = { O_{m×n},                              if r(A) = 0;
              { [ I_{r×r}        O_{r×(n−r)}
                  O_{(m−r)×r}    O_{(m−r)×(n−r)} ],   if r(A) = r ≥ 1,

    which is called the normal form of A. See Fig. 3.49.        (3.7.36)
To illustrate (1) graphically and, at the same time, to provide a sketch of
its proof, let N = {e1, . . . , em} and N' = {e1, . . . , en} be the respective
natural bases for R^m and R^n. Let B = {x1, . . . , xr, xr+1, . . . , xm} be a basis
for R^m so that {f(x1), . . . , f(xr)} is a basis for Im(f), which is extended
to form a basis C = {f(x1), . . . , f(xr), yr+1, . . . , yn} for R^n. Define linear
operators g: R^m → R^m and h: R^n → R^n by

    g(e_i) = x_i,   1 ≤ i ≤ m,
    h(f(x_i)) = e_i for 1 ≤ i ≤ r  and  h(y_j) = e_j for r + 1 ≤ j ≤ n.

Then h ◦ f ◦ g has the required property. See the following diagram and
Fig. 3.48 for m = 3, n = 2 and r(f) = 1.

             f
    R^m ----------> R^n
    (B)             (C)
    g ↑             ↓ h
    (N)             (N')
        h ◦ f ◦ g
    R^m ----------> R^n
In short, a linear transformation f: R^m → R^n of rank r can be described
as a projection h ◦ f ◦ g mapping R^m onto the subspace generated by the
first r coordinate axes of R^n, after suitably readjusting the coordinate axes
in R^m and R^n.

(Fig. 3.48: the maps f, g, h and h ◦ f ◦ g for m = 3, n = 2, r(f) = 1;
h ◦ f ◦ g sends (x1, x2, x3) to (x1, 0).)

Suppose

    f(x) = x A,

    P = [1_{R^m}]^B_N = [ x1                 Q^{-1} = [1_{R^n}]^C_{N'} = [ f(x1)
                          ⋮                                                ⋮
                          xr       and                                     f(xr)
                          ⋮                                                yr+1
                          xm ]_{m×m}                                       ⋮
                                                                           yn ]_{n×n}.

Then

    PAQ = [ I_r  0  = [f]^B_C = [g]^B_N [f]^N_{N'} [h]^{N'}_C.  (3.7.37)
            0    0 ]

This is (2) in (3.7.36). See Fig. 3.49.
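Numerically, one convenient way to produce such invertible P and Q is through
the singular value decomposition. This is not the basis construction used
above, just a compact computational shortcut (Python/NumPy, editorial sketch):

    import numpy as np

    def normal_form(A, tol=1e-12):
        # invertible P, Q with P A Q = [[I_r, 0], [0, 0]] as in (3.7.36)
        m, n = A.shape
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > tol))
        d = np.ones(m)
        d[:r] = 1.0 / s[:r]                  # rescale the first r rows
        return np.diag(d) @ U.T, Vt.T, r     # P (m x m), Q (n x n), rank r

    A = np.array([[0., 1., -1.], [2., 4., 6.], [2., 6., 4.]])   # the A of Example 2 below
    P, Q, r = normal_form(A)
    print(r, np.round(P @ A @ Q, 10))        # r = 2 and the matrix diag(1, 1, 0)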
The counterpart of (2.7.57) and (2.7.58) is

The nullities and the ranks of iterative linear operators

Let f be a linear operator on R^3.
(Fig. 3.49: the normal form [f] = PAQ of A = [f], realized through the
coordinate changes [1_{R^m}] = P and [1_{R^n}] = Q.)
(1) Then

        dim Ker(f^3) = dim Ker(f^4) = ··· = dim Ker(f^n) = ···,  for n ≥ 3;
        r(f^3) = r(f^4) = ··· = r(f^n) = ···,                    for n ≥ 3.

(2) For any real 3 × 3 matrix A,

        r(A^3) = r(A^4) = ··· = r(A^n) = ···,  for n ≥ 3.

In general, for a linear operator f: R^n → R^n, there exists a least positive
integer k so that

1. Ker(f^k) = Ker(f^{k+1}) = ··· and Im(f^k) = Im(f^{k+1}) = ···, and
2. R^n = Ker(f^k) ⊕ Im(f^k).
                                                                (3.7.38)
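The stabilization of ranks in (3.7.38) is visible immediately by machine
(Python/NumPy, an editorial sketch with an arbitrary test matrix):

    import numpy as np

    A = np.array([[0., 0., 0.],
                  [1., 0., 0.],
                  [0., 0., 1.]])

    ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(A, k)) for k in range(1, 7)]
    print(ranks)            # [2, 1, 1, 1, 1, 1]: here the least k of (3.7.38) is k = 2

For a nilpotent matrix of index 3 the same experiment gives ranks 2, 1, 0, 0, ...,
so in every case the ranks settle by the third power, as claimed.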
Exercises

<A>

1. Prove (3.7.34), (3.7.35), (3.7.36) and (3.7.38).
2. For each of the following matrices A, do the following problems.
   (1) Find an invertible matrix P such that AP is a projection on R^3.
   (2) Find invertible matrices P and Q so that PAQ is in its normal form.
   (3) Use A to justify (3.7.38) and determine the smallest positive integer
       k so that r(A^k) = r(A^n) for n ≥ k.

       (a) A = [ 6   1  −5        (b) A = [ −3  −6   15
                 2  −3   4 .                −1  −2    5 .
                 3   7  −1 ]                 2   4  −10 ]

       (c) A = [ −2   6   3
                  0  12  10 .
                  4   0   4 ]
3. Do Exs. <A> 2 and 5–15 of Sec. 2.7.3 for R^n or real n × n matrices with
   n ≥ 3, and prove them.

<B>

Do Exs. <B> 1–3 of Sec. 2.7.4 in the case of R^n or n × n matrices A for n ≥ 3.

<C> Abstraction and generalization

Read Ex. <C> of Sec. 2.7.4 and do all the problems there.

<D> Applications

Do the following problems.
1. Recall that the (n + 1)-dimensional vector space P_n(R) has the natural
   basis N = {1, x, . . . , x^{n−1}, x^n}. Let D: P_n(R) → P_{n−1}(R) ⊆ P_n(R) be
   the differential operator

       D(p(x)) = p'(x)

   and I: P_{n−1}(R) → P_n(R) the integral operator

       I(q(x)) = ∫_0^x q(t) dt.

   (a) Show that D ◦ I = 1_{P_{n−1}(R)}, the identity operator on P_{n−1}(R),
       and

           [D ◦ I]_{N'} = [I]^{N'}_N [D]^N_{N'} = I_n,

       where N' = {1, x, . . . , x^{n−1}} is the natural basis for P_{n−1}(R).
       Has this anything to do with the Newton–Leibniz theorem?
   (b) Is I ◦ D = 1_{P_n(R)} true? Why? Is any readjustment needed?
   (c) Show that, for 1 ≤ k < n,

           P_n(R) = P_k(R) ⊕ ⟨⟨x^{k+1}, . . . , x^n⟩⟩,

       where P_k(R) is an invariant subspace of D, and ⟨⟨x^{k+1}, . . . , x^n⟩⟩ is
       not. Therefore

           P_n(R)/P_k(R) is isomorphic to ⟨⟨x^{k+1}, . . . , x^n⟩⟩.
   (d) Compute the matrix representations

           [D|_{P_k(R)}]_{N1}   and   [D|_{P_n(R)/P_k(R)}]_{N2},

       where N1 = {1, x, . . . , x^k}, D|_{P_n(R)/P_k(R)} is the linear operator
       mapping the coset x^j + P_k(R) to jx^{j−1} + P_k(R) for k + 1 ≤ j ≤ n,
       and N2 = {x^{k+1} + P_k(R), . . . , x^n + P_k(R)}.
2. Let a0, a1, . . . , an be distinct real numbers.

   (a) Define a linear functional f_i: P_n(R) → R by f_i(p) = p(a_i) for
       0 ≤ i ≤ n. Show that B* = {f0, f1, . . . , fn} is a basis for P_n(R)*,
       the dual space of P_n(R).
   (b) There exists a (unique) basis B = {p0, p1, . . . , pn} for P_n(R) so that
       the B* in (a) is the dual basis of B in P_n(R)*, i.e.

           f_i(p_j) = p_j(a_i) = δ_ij,   0 ≤ i, j ≤ n

       (refer to Ex. <B> 5(h) of Sec. 3.7.3 and Exs. 19 and 21(c) of
       Sec. B.7). p0, p1, . . . , pn are called the Lagrange polynomials asso-
       ciated with a0, a1, . . . , an (see Sec. B.3).
   (c) For any scalars α0, α1, . . . , αn ∈ R, there exists a unique polynomial
       p ∈ P_n(R) such that p(a_i) = α_i for 0 ≤ i ≤ n. In fact,

           p = Σ_{i=0}^{n} p(a_i) p_i = Σ_{i=0}^{n} α_i p_i.

       This is the Lagrange interpolation formula.
   (d) Show that

           ∫_a^b p(t) dt = Σ_{i=0}^{n} p(a_i) ∫_a^b p_i(t) dt.

       Now, divide [a, b] into n equal parts with a_i = a + i(b − a)/n for
       0 ≤ i ≤ n.

       (1) Take n = 1. Show that

               ∫_a^b p(t) dt = ((b − a)/2)[p(a) + p(b)].

           This is the trapezoidal rule for polynomials.
       (2) Take n = 2. Calculate

               ∫_a^b p(t) dt = Σ_{i=0}^{2} p(a_i) ∫_a^b p_i(t) dt,

           which will yield Simpson's rule for polynomials.
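As a check, the Simpson weights can be recovered by integrating the three
Lagrange polynomials directly (Python/NumPy, an editorial sketch; here on
[a, b] = [0, 1]):

    import numpy as np

    a, b = 0.0, 1.0
    nodes = np.array([a, (a + b) / 2, b])
    V = np.vander(nodes, 3, increasing=True)      # rows (1, a_i, a_i^2)

    weights = []
    for i in range(3):
        c = np.linalg.solve(V, np.eye(3)[i])      # coefficients of p_i (p_i(a_j) = delta_ij)
        k = np.arange(1, 4)
        weights.append(np.sum(c * (b**k - a**k) / k))   # exact integral of p_i on [a, b]

    print(np.round(weights, 6))     # [1/6, 2/3, 1/6] = ((b - a)/6) * [1, 4, 1]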
3.7.5 Elementary matrices and matrix factorizations

All the definitions and theoretical results concerned in Sec. 2.7.5 are still
true for real matrices of order 3. Here in this subsection, we use concrete
examples to illustrate the results stated there without rewriting or copy-
ing them.

Elementary matrices of order 3 are as follows.
Type 1:

    E_{(1)(2)} = [ 0  1  0
                   1  0  0  = F_{(1)(2)};   E_{(1)(3)} = F_{(1)(3)};   E_{(2)(3)} = F_{(2)(3)}.
                   0  0  1 ]

Type 2:

    E_{α(1)} = [ α  0  0
                 0  1  0  = F_{α(1)},  α ≠ 0;   E_{α(2)} = F_{α(2)};   E_{α(3)} = F_{α(3)}.
                 0  0  1 ]

Type 3:

    E_{(2)+α(1)} = [ 1  0  0
                     α  1  0  = F_{(1)+α(2)};   E_{(3)+α(1)} = F_{(1)+α(3)};
                     0  0  1 ]

    E_{(1)+α(2)} = F_{(2)+α(1)};   E_{(3)+α(2)} = F_{(2)+α(3)};
    E_{(1)+α(3)} = F_{(3)+α(1)};   E_{(2)+α(3)} = F_{(3)+α(2)}.      (3.7.39)
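Before turning to the examples, here is a minimal Python/NumPy sketch
(editorial, not from the original text; rows are indexed from 0 in the code,
from 1 in the text) of how the three types act: by left multiplication as row
operations, and by right multiplication as the corresponding column operations:

    import numpy as np

    def E_swap(i, j, n=3):                 # type 1: interchange rows i and j
        E = np.eye(n); E[[i, j]] = E[[j, i]]; return E

    def E_scale(i, a, n=3):                # type 2: multiply row i by a != 0
        E = np.eye(n); E[i, i] = a; return E

    def E_add(i, j, a, n=3):               # type 3: add a times row j to row i
        E = np.eye(n); E[i, j] = a; return E

    A = np.array([[1., 1., 0.], [4., 3., -5.], [1., 1., 5.]])
    print(E_add(1, 0, -4.) @ A)            # E_(2)-4(1) A: row 2 minus 4(row 1)
    print(A @ E_add(1, 0, -4.))            # the same matrix on the right acts as
                                           # the column operation F_(1)-4(2)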
For the geometric mapping properties of these elementary matrices, please
refer to the examples in Sec. 3.7.2.

In what follows, we give a series of examples to illustrate systematically
the general results stated in (2.7.68)–(2.7.71), and more.
Example 1 Let

    A = [ 1  1   0
          4  3  −5 .
          1  1   5 ]

(1) Solve A x* = b*, where x = (x1, x2, x3) ∈ R^3 and b = (b1, b2, b3).
(2) Determine if A is invertible. If yes, compute A^{-1} and express A and
    A^{-1} as products of elementary matrices.
(3) Compute det A and det A^{-1}.
(4) Find LU and LDU decompositions of A.
(5) Compute Im(A), Ker(A) and Im(A*), Ker(A*) (see (3.7.32)).
(6) Investigate the geometric mapping properties of A.
Solution Perform elementary row operations on

    [A | b* | I_3] = [ 1  1   0  |  b1  |  1  0  0
                       4  3  −5  |  b2  |  0  1  0
                       1  1   5  |  b3  |  0  0  1 ]

    --E_{(2)−4(1)}, E_{(3)−(1)}-->
        [ 1   1   0  |  b1        |   1  0  0
          0  −1  −5  |  b2 − 4b1  |  −4  1  0                        (*1)
          0   0   5  |  b3 − b1   |  −1  0  1 ]

    --E_{−(2)}, E_{(1/5)(3)}-->
        [ 1  1  0  |  b1            |    1    0   0
          0  1  5  |  4b1 − b2      |    4   −1   0
          0  0  1  |  (b3 − b1)/5   |  −1/5   0  1/5 ]

    --E_{(1)−(2)}, E_{(2)−5(3)}-->
        [ 1  0  −5  |  −3b1 + b2       |  −3    1   0
          0  1   0  |  5b1 − b2 − b3   |   5   −1  −1
          0  0   1  |  (b3 − b1)/5     |  −1/5  0  1/5 ]

    --E_{(1)+5(3)}-->
        [ 1  0  0  |  −4b1 + b2 + b3  |  −4    1   1
          0  1  0  |  5b1 − b2 − b3   |   5   −1  −1 .               (*2)
          0  0  1  |  (b3 − b1)/5     |  −1/5  0  1/5 ]
Stop at (*1):

    [ 1   1   0  |  b1             x1 + x2 = b1
      0  −1  −5  |  b2 − 4b1   ⇒  −x2 − 5x3 = b2 − 4b1
      0   0   5  |  b3 − b1 ]      5x3 = b3 − b1

    ⇒ x1 = −4b1 + b2 + b3,  x2 = 5b1 − b2 − b3,  x3 = (b3 − b1)/5.

This is the solution of the equations A x* = b*. On the other hand,

    E_{(3)−(1)} E_{(2)−4(1)} A = [ 1   1   0
                                   0  −1  −5
                                   0   0   5 ]
    ⇒ A = E^{-1}_{(2)−4(1)} E^{-1}_{(3)−(1)} [1 1 0; 0 −1 −5; 0 0 5]
        = E_{(2)+4(1)} E_{(3)+(1)} [1 1 0; 0 −1 −5; 0 0 5]
        = [1 0 0; 4 1 0; 0 0 1][1 0 0; 0 1 0; 1 0 1][1 1 0; 0 −1 −5; 0 0 5]
        = [1 0 0; 4 1 0; 1 0 1][1 1 0; 0 −1 −5; 0 0 5]        (LU-decomposition)
        = [1 0 0; 4 1 0; 1 0 1][1 0 0; 0 −1 0; 0 0 5][1 1 0; 0 1 5; 0 0 1]
                                                              (LDU-decomposition).

From here, it is easily seen that

    det A = the product of the pivots 1, −1 and 5 = −5,
    det A^{-1} = (det A)^{-1} = −1/5.

Also, the four subspaces (see (3.7.32)) are:

    Im(A)  = ⟨⟨(1, 1, 0), (0, −1, −5), (0, 0, 5)⟩⟩ = R^3;
    Ker(A) = {0};
    Im(A*) = ⟨⟨(1, 0, 0), (1, −1, 0), (0, −5, 5)⟩⟩ = R^3;
    Ker(A*) = {0};

which can also be obtained from (*2) (see Application 8 in Sec. B.5).

Stop at (*2):

A is invertible, since

    E_{(1)+5(3)} E_{(2)−5(3)} E_{(1)−(2)} E_{(1/5)(3)} E_{−(2)} E_{(3)−(1)} E_{(2)−4(1)} A = I_3.

Therefore,

    A^{-1} = [ −4    1   1
                5   −1  −1
              −1/5   0  1/5 ]
           = E_{(1)+5(3)} E_{(2)−5(3)} E_{(1)−(2)} E_{(1/5)(3)} E_{−(2)} E_{(3)−(1)} E_{(2)−4(1)}

    ⇒ det A^{-1} = (1/5) · (−1) = −1/5;
and

    A = E^{-1}_{(2)−4(1)} E^{-1}_{(3)−(1)} E^{-1}_{−(2)} E^{-1}_{(1/5)(3)} E^{-1}_{(1)−(2)} E^{-1}_{(2)−5(3)} E^{-1}_{(1)+5(3)}
      = E_{(2)+4(1)} E_{(3)+(1)} E_{−(2)} E_{5(3)} E_{(1)+(2)} E_{(2)+5(3)} E_{(1)−5(3)}
      = [1 0 0; 4 1 0; 0 0 1][1 0 0; 0 1 0; 1 0 1][1 0 0; 0 −1 0; 0 0 1][1 0 0; 0 1 0; 0 0 5]
        [1 1 0; 0 1 0; 0 0 1][1 0 0; 0 1 5; 0 0 1][1 0 −5; 0 1 0; 0 0 1]

    ⇒ det A = (−1) · 5 = −5.

From the elementary matrix factorization of A, we can recapture the LDU,
and hence the LU, decomposition, since

    E_{(2)+4(1)} E_{(3)+(1)} = [1 0 0; 4 1 0; 1 0 1] = L,
    E_{−(2)} E_{5(3)} = [1 0 0; 0 −1 0; 0 0 5] = D,
    E_{(1)+(2)} E_{(2)+5(3)} E_{(1)−5(3)} = [1 1 0; 0 1 5; 0 0 1] = U.

This is within our reasonable expectation, because in the process of obtain-
ing (*2) we used E_{(2)−4(1)} and E_{(3)−(1)} to transform the original A into
an upper triangular matrix as shown in (*1), and the lower triangle L should
be

    (E_{(3)−(1)} E_{(2)−4(1)})^{-1} = E_{(2)+4(1)} E_{(3)+(1)}.

The LU-decomposition can help in solving A x* = b*. Notice that

    A x* = b*
    ⇔ [1 1 0; 0 −1 −5; 0 0 5] x* = y*  and  [1 0 0; 4 1 0; 1 0 1] y* = b*
    ⇔ x* = [1 1 0; 0 −1 −5; 0 0 5]^{-1}[1 0 0; 4 1 0; 1 0 1]^{-1} b*
    ⇔ x* = A^{-1} b*.

Readers are urged to carry out the actual computations to find the
solution.
The elementary matrices and the LU and LDU decompositions can be used
to help investigate the geometric mapping properties of A, best seen using
GSP. For example, the image of the unit cube under A is the parallelepiped
shown in Fig. 3.50. This parallelepiped can be obtained by applying the
successive mappings E_{(2)+4(1)} to the cube, followed by E_{(3)+(1)}, . . . ,
then by E_{(1)−5(3)}.

(Fig. 3.50: the image of the unit cube under A, a parallelepiped with vertices
including (1, 1, 0), (4, 3, −5), (5, 4, −5), (1, 1, 5), (2, 2, 5), (5, 4, 0) and
(6, 5, 0); drawn at 1/3 scale.)

Also (see Sec. 5.3),

    the signed volume of the parallelepiped = det [ 1  1   0
                                                    4  3  −5  = −5
                                                    1  1   5 ]

    ⇒ (the signed volume of the parallelepiped) / (the volume of the unit cube)
      = det A.

Since det A = −5 < 0, A reverses orientations in R^3.  □
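The factorizations of Example 1 are easy to confirm by machine (Python/NumPy,
an editorial sketch and not part of the original text):

    import numpy as np

    A = np.array([[1., 1., 0.], [4., 3., -5.], [1., 1., 5.]])
    L = np.array([[1., 0., 0.], [4., 1., 0.], [1., 0., 1.]])
    U = np.array([[1., 1., 0.], [0., -1., -5.], [0., 0., 5.]])

    print(np.allclose(L @ U, A))                       # the LU-decomposition
    print(np.isclose(np.linalg.det(A), 1 * (-1) * 5))  # det A = product of pivots = -5

    Ainv = np.array([[-4., 1., 1.], [5., -1., -1.], [-0.2, 0., 0.2]])
    print(np.allclose(np.linalg.inv(A), Ainv))         # the A^{-1} found in (*2)

    b = np.array([1., 2., 3.])                         # any right-hand side
    print(np.allclose(A @ np.linalg.solve(A, b), b))   # solving A x* = b*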
Example 2 Let

    A = [ 0  1  −1
          2  4   6 .
          2  6   4 ]

Do problems similar to those of Example 1.
Solution Perform elementary row operations on

    [A | b* | I_3] = [ 0  1  −1  |  b1  |  1  0  0
                       2  4   6  |  b2  |  0  1  0
                       2  6   4  |  b3  |  0  0  1 ]

    --E_{(1)(2)}-->
        [ 2  4   6  |  b2  |  0  1  0
          0  1  −1  |  b1  |  1  0  0
          2  6   4  |  b3  |  0  0  1 ]

    --E_{(3)−(1)}, E_{(1/2)(1)}-->
        [ 1  2   3  |  b2/2     |  0  1/2  0
          0  1  −1  |  b1       |  1   0   0
          0  2  −2  |  b3 − b2  |  0  −1   1 ]

    --E_{(3)−2(2)}-->
        [ 1  2   3  |  b2/2           |   0  1/2  0
          0  1  −1  |  b1             |   1   0   0                  (*3)
          0  0   0  |  b3 − b2 − 2b1  |  −2  −1   1 ]

    --E_{(1)−2(2)}-->
        [ 1  0   5  |  b2/2 − 2b1     |  −2  1/2  0
          0  1  −1  |  b1             |   1   0   0 .                (*4)
          0  0   0  |  b3 − b2 − 2b1  |  −2  −1   1 ]
Notice that, since the leading entry of the first row of A is zero, an exchange
of row 1 and row 2 is needed as the first row operation.

From (*3),

    A x* = b* has a solution x
    ⇔ E_{(3)−2(2)} E_{(1/2)(1)} E_{(3)−(1)} E_{(1)(2)} A x*
      = [1 2 3; 0 1 −1; 0 0 0] x* = [b2/2; b1; b3 − b2 − 2b1]*
      has a solution x
    ⇔ b3 − b2 − 2b1 = 0.

In this case,

    x1 + 2x2 + 3x3 = b2/2,
    x2 − x3 = b1

    ⇒ x2 = b1 + x3,  x1 = −2b1 + b2/2 − 5x3,  x3 ∈ R arbitrary
    ⇒ x = (−2b1 + b2/2 − 5x3, b1 + x3, x3)
        = (−2b1 + b2/2, b1, 0) + x3(−5, 1, 1),   x3 ∈ R.

Hence, the solution set is the affine line (−2b1 + b2/2, b1, 0) + ⟨⟨(−5, 1, 1)⟩⟩
in R^3, with (−2b1 + b2/2, b1, 0) as a particular solution. It is worth
mentioning that A x* = b* is the system of equations

    x2 − x3 = b1,
    2x1 + 4x2 + 6x3 = b2,
    2x1 + 6x2 + 4x3 = b3.

For this system of equations to have a solution, it is necessary and sufficient
that, after eliminating x1, x2, x3 from the equations,

    b2 − 4b1 − 10x3 + 6b1 + 6x3 + 4x3 = b3,

which is b3 − b2 − 2b1 = 0, as claimed above.

(*3) tells us that A is not invertible. But (*3) does indicate that

    E_{(3)−2(2)} E_{(1/2)(1)} E_{(3)−(1)} (E_{(1)(2)} A) = [ 1  2   3
                                                             0  1  −1   (upper triangular)
                                                             0  0   0 ]

    ⇒ E_{(1)(2)} A = [2 4 6; 0 1 −1; 2 6 4]
      = E^{-1}_{(3)−(1)} E^{-1}_{(1/2)(1)} E^{-1}_{(3)−2(2)} [1 2 3; 0 1 −1; 0 0 0]
      = E_{(3)+(1)} E_{2(1)} E_{(3)+2(2)} [1 2 3; 0 1 −1; 0 0 0]
      = [2 0 0; 0 1 0; 2 2 1][1 2 3; 0 1 −1; 0 0 0]
      = [1 0 0; 0 1 0; 2 2 1][2 4 6; 0 1 −1; 0 0 0]           (LU-decomposition)
      = [1 0 0; 0 1 0; 2 2 1][2 0 0; 0 1 0; 0 0 0][1 2 3; 0 1 −1; 0 0 0]
                                                              (LDU-decomposition).
Refer to (2) in (2.7.69) and the example following it. A can also be
decomposed as follows:

    A --E_{(3)−(2)}--> [0 1 −1; 2 4 6; 0 2 −2]
      --E_{(3)−2(1)}--> [0 1 −1; 2 4 6; 0 0 0]
      --E_{(1)(2)}--> [2 4 6; 0 1 −1; 0 0 0]

    ⇒ A = E^{-1}_{(3)−(2)} E^{-1}_{(3)−2(1)} E^{-1}_{(1)(2)} [2 4 6; 0 1 −1; 0 0 0]
        = [1 0 0; 0 1 0; 2 1 1][0 1 0; 1 0 0; 0 0 1][2 4 6; 0 1 −1; 0 0 0]
                                                              (LPU-decomposition).

From (*4): firstly, (*4) says that A is not invertible. Secondly, (*4) also
says that A x* = b* has a solution if and only if b3 − b2 − 2b1 = 0, with the
solutions x1 = b2/2 − 2b1 − 5x3, x2 = b1 + x3, x3 ∈ R. Thirdly, (*4)
indicates that

    E_{(1)−2(2)} E_{(3)−2(2)} E_{(1/2)(1)} E_{(3)−(1)} E_{(1)(2)} A = [ 1  0   5
                                                                        0  1  −1
                                                                        0  0   0 ]

    ⇒ PA = [ 1  0   5     (the row-reduced echelon matrix of A),
             0  1  −1
             0  0   0 ]

    where P = [ −2  1/2  0  = E_{(1)−2(2)} E_{(3)−2(2)} E_{(1/2)(1)} E_{(3)−(1)} E_{(1)(2)}.
                 1   0   0
                −2  −1   1 ]

Now perform elementary column operations on

    [ PA      [ 1  0   5                         [ 1  0   0
      ----      0  1  −1                           0  1   0
      I_3 ] =   0  0   0    --F_{(3)−5(1)},        0  0   0      [ I_2  0
                --------      F_{(3)+(2)}-->       --------    =   ------
                1  0   0                           1  0  −5        Q    ]
                0  1   0                           0  1   1
                0  0   1 ]                         0  0   1 ]

    ⇒ PAQ = [ I_2  0 ,   where Q = [ 1  0  −5   = F_{(3)−5(1)} F_{(3)+(2)}
              0    0 ]               0  1   1
                                     0  0   1 ]

    (the normal form of A).
From PA = R, the row-reduced echelon matrix of A, it follows that

    Im(A) = ⟨⟨(1, 0, 5), (0, 1, −1)⟩⟩, where the basis vectors are the first
        and second row vectors of PA;
    Ker(A) = ⟨⟨(−2, −1, 1)⟩⟩, where (−2, −1, 1) is the third row vector of P;
    Im(A*) = ⟨⟨(0, 2, 2), (1, 4, 6)⟩⟩, generated by the first and second
        column vectors of A;
    Ker(A*) = ⟨⟨(−5, 1, 1)⟩⟩, generated by the fundamental solution
        (−5, 1, 1) of x A* = 0.

Refer to Application 8 in Sec. B.5.

The various decompositions of A — LU, LDU, LPU, the row-reduced echelon
matrix and the normal form — can be used to study the geometric mapping
properties of A (refer to (3.7.9) and (3.7.10)). Of course, the GSP method
of drawing graphs is strongly convincing, especially when applied stepwise
to each member of an elementary matrix factorization of a matrix.

By computation,

    Ker(A) = ⟨⟨(−2, −1, 1)⟩⟩,
    Im(A) = {(x1, x2, x3) | 5x1 − x2 − x3 = 0} = ⟨⟨e1 A, e2 A⟩⟩
          = ⟨⟨(0, 1, −1), (2, 4, 6)⟩⟩.

Therefore, the parallelepiped with side vectors e1, e2 and (−2, −1, 1) has as
its image under A the parallelogram with side vectors e1 A and e2 A. See
Fig. 3.51.

(Fig. 3.51: the parallelepiped with side vectors e1, e2 and (−2, −1, 1) and
its image under A, the parallelogram with side vectors e1 A = (0, 1, −1) and
e2 A = (2, 4, 6); drawn at 1/2 scale.)
By the way (why? see Sec. 5.3 if needed),

    the volume of the parallelepiped = |det[1 0 0; 0 1 0; −2 −1 1]| = 1,

    the area of the parallelogram = ( det [ ⟨e1 A, e1 A⟩  ⟨e1 A, e2 A⟩ )^{1/2}
                                    (       ⟨e2 A, e1 A⟩  ⟨e2 A, e2 A⟩ ] )
                                  = ( det [  2  −2 )^{1/2} = √108 = 6√3.
                                    (       −2  56 ] )

What is the preimage of the parallelogram spanned by e1 A and e2 A? Readers
are urged to find the image under A of the unit cube with side vectors e1, e2
and e3.

Yet another way to visualize the mapping properties of A is to see whether
A is diagonalizable. By computation, we have

    eigenvalues      eigenvectors
    λ1 = 0           v1 = t(−2, −1, 1),  for t ∈ R and t ≠ 0
    λ2 = 10          v2 = t(12, 31, 29)
    λ3 = −2          v3 = t(0, 1, −1).

So A is diagonalizable (see Case 1 in (3.7.29) or Sec. 3.7.6). Refer to
Figs. 3.30 and 3.31.  □
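The main conclusions of Example 2 can be confirmed as follows (Python/NumPy,
an editorial sketch; eigenvectors here satisfy v A = λv, matching the
row-vector convention of the text):

    import numpy as np

    A = np.array([[0., 1., -1.], [2., 4., 6.], [2., 6., 4.]])

    print(np.linalg.matrix_rank(A))                         # 2: A is not invertible
    print(np.allclose(np.array([-2., -1., 1.]) @ A, 0))     # Ker(A)  = <<(-2, -1, 1)>>
    print(np.allclose(A @ np.array([-5., 1., 1.]), 0))      # Ker(A*) = <<(-5, 1, 1)>>

    print(np.round(np.sort(np.linalg.eigvals(A)), 6))       # eigenvalues -2, 0, 10
    v2 = np.array([12., 31., 29.])
    print(np.allclose(v2 @ A, 10 * v2))                     # v2 A = 10 v2, so A is
                                                            # diagonalizable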

Example 3 Let

    A = [  1  −3   0
          −3   2  −1 .
           0  −1   4 ]

Do problems similar to those of Example 1.
Solution Perform elementary row operations to
\[
[A \mid \vec{b}^* \mid I_3] =
\left(\begin{array}{ccc|c|ccc}
1&-3&0 & b_1 & 1&0&0\\
-3&2&-1 & b_2 & 0&1&0\\
0&-1&4 & b_3 & 0&0&1
\end{array}\right)
\xrightarrow{E_{(2)+3(1)}}
\left(\begin{array}{ccc|c|ccc}
1&-3&0 & b_1 & 1&0&0\\
0&-7&-1 & b_2+3b_1 & 3&1&0\\
0&-1&4 & b_3 & 0&0&1
\end{array}\right)
\]
\[
\xrightarrow{E_{-\frac17(2)}}
\left(\begin{array}{ccc|c|ccc}
1&-3&0 & b_1 & 1&0&0\\
0&1&\frac17 & -\frac{b_2+3b_1}{7} & -\frac37&-\frac17&0\\
0&-1&4 & b_3 & 0&0&1
\end{array}\right)
\xrightarrow{E_{(3)+(2)}}
\left(\begin{array}{ccc|c|ccc}
1&-3&0 & b_1 & 1&0&0\\
0&1&\frac17 & -\frac{b_2+3b_1}{7} & -\frac37&-\frac17&0\\
0&0&\frac{29}{7} & -\frac{b_2+3b_1}{7}+b_3 & -\frac37&-\frac17&1
\end{array}\right) \tag{*5}
\]
\[
\xrightarrow[E_{(1)+3(2)}]{E_{\frac{7}{29}(3)}}
\left(\begin{array}{ccc|c|ccc}
1&0&\frac37 & -\frac{3b_2+2b_1}{7} & -\frac27&-\frac37&0\\
0&1&\frac17 & -\frac{b_2+3b_1}{7} & -\frac37&-\frac17&0\\
0&0&1 & -\frac{b_2+3b_1-7b_3}{29} & -\frac{3}{29}&-\frac{1}{29}&\frac{7}{29}
\end{array}\right)
\]
\[
\xrightarrow[E_{(2)-\frac17(3)}]{E_{(1)-\frac37(3)}}
\left(\begin{array}{ccc|c|ccc}
1&0&0 & -\frac{7b_1+12b_2+3b_3}{29} & -\frac{7}{29}&-\frac{12}{29}&-\frac{3}{29}\\
0&1&0 & -\frac{12b_1+4b_2+b_3}{29} & -\frac{12}{29}&-\frac{4}{29}&-\frac{1}{29}\\
0&0&1 & -\frac{3b_1+b_2-7b_3}{29} & -\frac{3}{29}&-\frac{1}{29}&\frac{7}{29}
\end{array}\right). \tag{*6}
\]

From (*5):
\[
A\vec{x}^* = \vec{b}^*\ \text{has a solution}\ \vec{x}
\Leftrightarrow PA\vec{x}^* = P\vec{b}^*\ \text{has a solution}\ \vec{x},\quad\text{where}
\]
\[
P = E_{(3)+(2)}E_{-\frac17(2)}E_{(2)+3(1)} = \begin{pmatrix} 1&0&0\\ -\frac37&-\frac17&0\\ -\frac37&-\frac17&1 \end{pmatrix}.
\]
In this case,
\[
\left\{\begin{aligned}
x_1 - 3x_2 &= b_1\\
x_2 + \tfrac17x_3 &= -\tfrac17(b_2+3b_1)\\
\tfrac{29}{7}x_3 &= -\tfrac17(b_2+3b_1-7b_3)
\end{aligned}\right.
\ \Rightarrow\
\left\{\begin{aligned}
x_1 &= -\tfrac{1}{29}(7b_1+12b_2+3b_3)\\
x_2 &= -\tfrac{1}{29}(12b_1+4b_2+b_3)\\
x_3 &= -\tfrac{1}{29}(3b_1+b_2-7b_3).
\end{aligned}\right.
\]
On the other hand,
\[
PA = \begin{pmatrix} 1&-3&0\\ 0&1&\frac17\\ 0&0&\frac{29}{7} \end{pmatrix}
\]
implies that $A$ is invertible and
\[
\det(PA) = \det P\cdot\det A = -\frac17\det A = \frac{29}{7}
\ \Rightarrow\ \det A = -29 \ \text{ and }\ \det A^{-1} = -\frac{1}{29}.
\]
Since
\[
P^{-1} = E_{(2)-3(1)}E_{-7(2)}E_{(3)-(2)} = \begin{pmatrix} 1&0&0\\ -3&-7&0\\ 0&-1&1 \end{pmatrix}
\]
\[
\Rightarrow A = \begin{pmatrix} 1&0&0\\ -3&-7&0\\ 0&-1&1 \end{pmatrix}\begin{pmatrix} 1&-3&0\\ 0&1&\frac17\\ 0&0&\frac{29}{7} \end{pmatrix}
= \begin{pmatrix} 1&0&0\\ -3&1&0\\ 0&\frac17&1 \end{pmatrix}\begin{pmatrix} 1&-3&0\\ 0&-7&-1\\ 0&0&\frac{29}{7} \end{pmatrix}\quad\text{(LU-decomposition)}
\]
\[
= \begin{pmatrix} 1&0&0\\ -3&1&0\\ 0&\frac17&1 \end{pmatrix}\begin{pmatrix} 1&0&0\\ 0&-7&0\\ 0&0&\frac{29}{7} \end{pmatrix}\begin{pmatrix} 1&-3&0\\ 0&1&\frac17\\ 0&0&1 \end{pmatrix}\quad\text{(LDL*-decomposition)}.
\]
These decompositions can help in solving the equation $A\vec{x}^* = \vec{b}^*$.
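For instance, here is a minimal sketch (assuming Python with NumPy, and an arbitrary test vector $\vec{b}$) of how the LU-factors reduce $A\vec{x}^* = \vec{b}^*$ to one forward and one back substitution; dedicated triangular solvers would be used in practice, np.linalg.solve is used only to keep the sketch short.

```python
import numpy as np

L = np.array([[1, 0, 0], [-3, 1, 0], [0, 1/7, 1]])
U = np.array([[1, -3, 0], [0, -7, -1], [0, 0, 29/7]])
A = L @ U                      # recovers the A of Example 3
b = np.array([1.0, 2.0, 3.0])  # an arbitrary right-hand side

y = np.linalg.solve(L, b)      # forward substitution: L y = b
x = np.linalg.solve(U, y)      # back substitution:    U x = y
assert np.allclose(A @ x, b)
```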

Moreover,
\[
PAP^* = \begin{pmatrix} 1&-3&0\\ 0&1&\frac17\\ 0&0&\frac{29}{7} \end{pmatrix}
\begin{pmatrix} 1&-\frac37&-\frac37\\ 0&-\frac17&-\frac17\\ 0&0&1 \end{pmatrix}
= \begin{pmatrix} 1&0&0\\ 0&-\frac17&0\\ 0&0&\frac{29}{7} \end{pmatrix}. \tag{*7}
\]
Notice that
\[
P^* = E^*_{(2)+3(1)}E^*_{-\frac17(2)}E^*_{(3)+(2)} = F_{(2)+3(1)}F_{-\frac17(2)}F_{(3)+(2)}.
\]
Therefore, (*7) might be expected beforehand since $A$ is a symmetric matrix. (*7) means that, when performing elementary row operations and column operations of the same types at the same time, we will get a diagonal matrix. In fact
 
\[
A \xrightarrow[F_{(2)+3(1)}]{E_{(2)+3(1)}} E_{(2)+3(1)}AF_{(2)+3(1)} = \begin{pmatrix} 1&0&0\\ 0&-7&-1\\ 0&-1&4 \end{pmatrix}
\xrightarrow[F_{-\frac17(2)}]{E_{-\frac17(2)}} \begin{pmatrix} 1&0&0\\ 0&-\frac17&\frac17\\ 0&\frac17&4 \end{pmatrix}
\xrightarrow[F_{(3)+(2)}]{E_{(3)+(2)}} \begin{pmatrix} 1&0&0\\ 0&-\frac17&0\\ 0&0&\frac{29}{7} \end{pmatrix} \tag{*8}
\]
which is (*7). Interchange row 2 and row 3, and let
\[
P_1 = E_{(2)(3)}P = \begin{pmatrix} 1&0&0\\ -\frac37&-\frac17&1\\ -\frac37&-\frac17&0 \end{pmatrix}.
\]
Therefore
\[
P_1AP_1^* = \begin{pmatrix} 1&0&0\\ 0&\frac{29}{7}&0\\ 0&0&-\frac17 \end{pmatrix}
= \begin{pmatrix} 1&0&0\\ 0&\sqrt{\frac{29}{7}}&0\\ 0&0&\frac{1}{\sqrt7} \end{pmatrix}
\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix}
\begin{pmatrix} 1&0&0\\ 0&\sqrt{\frac{29}{7}}&0\\ 0&0&\frac{1}{\sqrt7} \end{pmatrix}
\]
\[
\Rightarrow QP_1AP_1^*Q^* = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix},\quad\text{where } Q = \begin{pmatrix} 1&0&0\\ 0&\sqrt{\frac{7}{29}}&0\\ 0&0&\sqrt7 \end{pmatrix}.
\]
Now, let
\[
S = QP_1 = \begin{pmatrix} 1&0&0\\ -\frac{3\sqrt7}{7\sqrt{29}}&-\frac{\sqrt7}{7\sqrt{29}}&\frac{\sqrt7}{\sqrt{29}}\\ -\frac{3\sqrt7}{7}&-\frac{\sqrt7}{7}&0 \end{pmatrix}
\]
\[
\Rightarrow SAS^* = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix}\quad (A \text{ is congruent to a diagonal matrix}). \tag{*9}
\]
Hence, the index of A is equal to 2, the signature is equal to 2 − 1 = 1, and
the rank of A is 2 + 1 = 3 (see (2.7.71)).
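By Sylvester's law of inertia, the index and the signature can also be read off from the signs of the eigenvalues of the symmetric matrix $A$, without computing $S$ at all; a short sketch, assuming Python with NumPy:

```python
import numpy as np

A = np.array([[1, -3, 0], [-3, 2, -1], [0, -1, 4]], dtype=float)
eig = np.linalg.eigvalsh(A)                  # eigenvalues of symmetric A
index = int(np.sum(eig > 0))                 # 2 positive eigenvalues
signature = index - int(np.sum(eig < 0))     # 2 - 1 = 1
rank = int(np.sum(np.abs(eig) > 1e-12))      # 3
print(index, signature, rank)
```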
The invertible matrix $S$ in (*9) is not necessarily unique. $A$ is diagonalizable (see Secs. 3.7.6 and 5.7). $A$ has characteristic polynomial $\det(A - tI_3) = -t^3 + 7t^2 - 4t - 29$ and has two positive eigenvalues $\lambda_1, \lambda_2$
and one negative eigenvalue $\lambda_3$. Let $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$ be corresponding unit eigenvectors so that
\[
R = \begin{pmatrix} \vec{v}_1\\ \vec{v}_2\\ \vec{v}_3 \end{pmatrix}
\]
is an orthogonal matrix, i.e. $R^* = R^{-1}$. Then
\[
RAR^{-1} = RAR^* = \begin{pmatrix} \lambda_1&0&0\\ 0&\lambda_2&0\\ 0&0&\lambda_3 \end{pmatrix}
= \begin{pmatrix} \sqrt{\lambda_1}&0&0\\ 0&\sqrt{\lambda_2}&0\\ 0&0&\sqrt{-\lambda_3} \end{pmatrix}
\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix}
\begin{pmatrix} \sqrt{\lambda_1}&0&0\\ 0&\sqrt{\lambda_2}&0\\ 0&0&\sqrt{-\lambda_3} \end{pmatrix}
\]
\[
\Rightarrow S_1AS_1^* = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix},\quad\text{where } S_1 = \begin{pmatrix} \frac{1}{\sqrt{\lambda_1}}&0&0\\ 0&\frac{1}{\sqrt{\lambda_2}}&0\\ 0&0&\frac{1}{\sqrt{-\lambda_3}} \end{pmatrix}\begin{pmatrix} \vec{v}_1\\ \vec{v}_2\\ \vec{v}_3 \end{pmatrix}. \tag{*10}
\]
This $S_1$ is not equal to $S$ in (*9) in general.


From (*6):
(*6) solves $A\vec{x}^* = \vec{b}^*$ immediately, just by inspection. Also, $A$ is invertible and
\[
A^{-1} = E_{(2)-\frac17(3)}E_{(1)-\frac37(3)}E_{(1)+3(2)}E_{\frac{7}{29}(3)}E_{(3)+(2)}E_{-\frac17(2)}E_{(2)+3(1)}
= -\frac{1}{29}\begin{pmatrix} 7&12&3\\ 12&4&1\\ 3&1&-7 \end{pmatrix},
\]
\[
A = E_{(2)-3(1)}E_{-7(2)}E_{(3)-(2)}E_{\frac{29}{7}(3)}E_{(1)-3(2)}E_{(1)+\frac37(3)}E_{(2)+\frac17(3)}.
\]
Note that $A^{-1}$ is also a symmetric matrix. Again, GSP will provide a stepwise graphing of the mapping properties of $A$ according to the above elementary matrix factorization.

To the end, let us preview one of the main applications of the concept of congruence among matrices. Given the quadric
\[
\vec{x}A\vec{x}^* = x_1^2 + 2x_2^2 + 4x_3^2 - 6x_1x_2 - 2x_2x_3 = 1,\quad \vec{x} = (x_1,x_2,x_3)
\]
in $\mathbb{R}^3$. Let
\[
\vec{y} = (y_1,y_2,y_3) = \vec{x}S^{-1} = \text{the coordinate vector of } \vec{x} \text{ in the coordinate system } B \text{ formed by the three row vectors of } S.
\]
Then
\[
\vec{x}A\vec{x}^* = (\vec{x}S^{-1})(SAS^*)(\vec{x}S^{-1})^* = \vec{y}(SAS^*)\vec{y}^* = y_1^2 + y_2^2 - y_3^2 = 1.
\]
This means that, in $B$, the quadric looks like a hyperboloid of one sheet (refer to Fig. 3.90) and can be used as a model for hyperbolic geometry (see Sec. 5.12).
The following examples are concerned with matrices of order $m\times n$ where $m \neq n$.

Example 4 Let
\[
A = \begin{pmatrix} 2&1&0\\ -1&0&1 \end{pmatrix}.
\]

(1) Find the LU decomposition and the normal form of $A$ (see (2.7.70)).
(2) Find a matrix $B_{3\times 2}$ so that $AB = I_2$ (refer to Ex. <A> 7 of Sec. 2.7.4, Ex. 5 of Sec. B.7 and Exs. 3–5 of Sec. B.8).
(3) Try to investigate the geometric mapping properties of $AA^*$ and $A^*A$ (refer to Ex. <B> of Sec. 2.7.5 and Ex. <B> of Sec. 3.7.4).

Caution that, for the understanding of (3), one needs basic knowledge about Euclidean structures of $\mathbb{R}^2$ and $\mathbb{R}^3$; one can refer to Part 2 if necessary.
Solution Perform elementary column operations to
\[
\begin{pmatrix} A\\ \hline I_3 \end{pmatrix}
= \begin{pmatrix} 2&1&0\\ -1&0&1\\ \hline 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}
\xrightarrow[F_{(1)+\frac12(3)}]{F_{\frac12(1)}}
\begin{pmatrix} 1&1&0\\ 0&0&1\\ \hline \frac12&0&0\\ 0&1&0\\ \frac12&0&1 \end{pmatrix}
\xrightarrow[F_{(2)(3)}]{F_{(2)-(1)}}
\begin{pmatrix} 1&0&0\\ 0&1&0\\ \hline \frac12&0&-\frac12\\ 0&0&1\\ \frac12&1&-\frac12 \end{pmatrix}.
\]
This indicates that
\[
A\begin{pmatrix} \frac12&0&-\frac12\\ 0&0&1\\ \frac12&1&-\frac12 \end{pmatrix}
= \begin{pmatrix} 1&0&0\\ 0&1&0 \end{pmatrix} = [I_2\ \ 0],
\]
which is the normal form of $A$. From this, we extract a submatrix $B_{3\times 2}$ so that
\[
AB = I_2,\quad\text{where } B = \begin{pmatrix} \frac12&0\\ 0&0\\ \frac12&1 \end{pmatrix}.
\]
Hence, $A$ is right invertible and $B$ is one of its right inverses. In general, the right inverses $B$ can be found in the following way. Let $B = [\vec{v}_1^*\ \ \vec{v}_2^*]_{3\times 2}$. Then
\[
AB = A[\vec{v}_1^*\ \ \vec{v}_2^*] = [A\vec{v}_1^*\ \ A\vec{v}_2^*] = I_2
\Leftrightarrow A\vec{v}_1^* = \vec{e}_1^*,\quad A\vec{v}_2^* = \vec{e}_2^*.
\]

Suppose $\vec{v}_1 = (x_1,x_2,x_3)$; then
\[
A\vec{v}_1^* = \vec{e}_1^* \Leftrightarrow \begin{cases} 2x_1 + x_2 = 1\\ -x_1 + x_3 = 0 \end{cases}
\Leftrightarrow \vec{v}_1 = (0,1,0) + t_1(1,-2,1)\quad\text{for } t_1 \in \mathbb{R}.
\]
Similarly, if $\vec{v}_2 = (x_1,x_2,x_3)$, then
\[
A\vec{v}_2^* = \vec{e}_2^* \Leftrightarrow \begin{cases} 2x_1 + x_2 = 0\\ -x_1 + x_3 = 1 \end{cases}
\Leftrightarrow \vec{v}_2 = (0,0,1) + t_2(1,-2,1)\quad\text{for } t_2 \in \mathbb{R}.
\]
Therefore, the right inverses are
\[
B = \begin{pmatrix} t_1&t_2\\ 1-2t_1&-2t_2\\ t_1&1+t_2 \end{pmatrix}_{3\times 2}\quad\text{for } t_1, t_2 \in \mathbb{R}. \tag{*11}
\]
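A short sketch (assuming Python with NumPy) that checks every member of the family (*11) really is a right inverse, for a few arbitrary parameter values $t_1, t_2$:

```python
import numpy as np

A = np.array([[2, 1, 0], [-1, 0, 1]], dtype=float)

def right_inverse(t1, t2):
    # the family (*11)
    return np.array([[t1, t2],
                     [1 - 2*t1, -2*t2],
                     [t1, 1 + t2]])

for t1, t2 in [(0.0, 0.0), (1.0, -1.0), (0.3, 2.5)]:
    assert np.allclose(A @ right_inverse(t1, t2), np.eye(2))
```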
On the other hand, perform elementary row operations to
\[
[A \mid \vec{b}^* \mid I_2] =
\left(\begin{array}{ccc|c|cc} 2&1&0&b_1&1&0\\ -1&0&1&b_2&0&1 \end{array}\right)
\xrightarrow{E_{\frac12(1)},\ E_{(2)+(1)}}
\left(\begin{array}{ccc|c|cc} 1&\frac12&0&\frac{b_1}{2}&\frac12&0\\ 0&\frac12&1&b_2+\frac{b_1}{2}&\frac12&1 \end{array}\right) \tag{*12}
\]
\[
\xrightarrow{E_{2(2)},\ E_{(1)-\frac12(2)}}
\left(\begin{array}{ccc|c|cc} 1&0&-1&-b_2&0&-1\\ 0&1&2&2b_2+b_1&1&2 \end{array}\right). \tag{*13}
\]
From (*12),
\[
E_{(2)+(1)}E_{\frac12(1)}A = \begin{pmatrix} 1&\frac12&0\\ 0&\frac12&1 \end{pmatrix}\quad\text{(an echelon matrix)}
\]
\[
\Rightarrow A = E_{2(1)}E_{(2)-(1)}\begin{pmatrix} 1&\frac12&0\\ 0&\frac12&1 \end{pmatrix}
= \begin{pmatrix} 2&0\\ -1&1 \end{pmatrix}\begin{pmatrix} 1&\frac12&0\\ 0&\frac12&1 \end{pmatrix}\quad\text{(LU-decomposition)}.
\]
Refer to (2.7.70).

From (*13), $A\vec{x}^* = \vec{b}^*$ has a solution $\vec{x} = (x_1,x_2,x_3)$
\[
\Leftrightarrow \begin{pmatrix} 1&0&-1\\ 0&1&2 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = \begin{pmatrix} -b_2\\ 2b_2+b_1 \end{pmatrix}\ \text{has a solution}\ \vec{x}
\]
\[
\Rightarrow x_1 = -b_2 + x_3,\quad x_2 = b_1 + 2b_2 - 2x_3,\quad x_3 \in \mathbb{R}
\Rightarrow \vec{x} = (-b_2,\ b_1+2b_2,\ 0) + t(1,-2,1),\quad t \in \mathbb{R}.
\]
Thus the solution set is a one-dimensional affine subspace in $\mathbb{R}^3$. Also, (*13) indicates that
\[
E_{(1)-\frac12(2)}E_{2(2)}E_{(2)+(1)}E_{\frac12(1)}A = \begin{pmatrix} 1&0&-1\\ 0&1&2 \end{pmatrix}\quad\text{(row-reduced echelon matrix)}
\]
\[
\Rightarrow PA = \begin{pmatrix} 1&0&-1\\ 0&1&2 \end{pmatrix},\quad\text{where } P = E_{(1)-\frac12(2)}E_{2(2)}E_{(2)+(1)}E_{\frac12(1)} = \begin{pmatrix} 0&-1\\ 1&2 \end{pmatrix}.
\]
Then, perform elementary column operations to
\[
\begin{pmatrix} PA\\ \hline I_3 \end{pmatrix}
= \begin{pmatrix} 1&0&-1\\ 0&1&2\\ \hline 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}
\xrightarrow[F_{(3)-2(2)}]{F_{(3)+(1)}}
\begin{pmatrix} 1&0&0\\ 0&1&0\\ \hline 1&0&1\\ 0&1&-2\\ 0&0&1 \end{pmatrix}
\]
\[
\Rightarrow PAQ = [I_2\ \ O_{2\times 1}]_{2\times 3},\quad\text{where } Q = \begin{pmatrix} 1&0&1\\ 0&1&-2\\ 0&0&1 \end{pmatrix}.
\]
This is the normal form of $A$.
For part (3), we consider the solvability of the system of equations
\[
\vec{x}A = \vec{b},\quad\text{where } \vec{x} = (x_1,x_2) \in \mathbb{R}^2 \text{ and } \vec{b} = (b_1,b_2,b_3) \in \mathbb{R}^3.
\]
Simple calculation, or (*13), shows that
\[
\begin{aligned}
\operatorname{Ker}(A) &= \{\vec{0}\} = \operatorname{Im}(A^*)^{\perp},\\
\operatorname{Im}(A) &= \{(y_1,y_2,y_3) \in \mathbb{R}^3 \mid y_1 - 2y_2 + y_3 = 0\} = \langle (2,1,0), (0,1,2) \rangle = \langle (1,0,-1), (0,1,2) \rangle = \operatorname{Ker}(A^*)^{\perp},\\
\operatorname{Ker}(A^*) &= \langle (1,-2,1) \rangle = \operatorname{Im}(A)^{\perp},\\
\operatorname{Im}(A^*) &= \langle (2,-1), (1,0) \rangle = \mathbb{R}^2 = \operatorname{Ker}(A)^{\perp},
\end{aligned}
\]
and
\[
AA^* = \begin{pmatrix} 5&-2\\ -2&2 \end{pmatrix},\qquad
(AA^*)^{-1} = \frac16\begin{pmatrix} 2&2\\ 2&5 \end{pmatrix}.
\]
For any fixed $\vec{b} \in \mathbb{R}^3$, $\vec{x}A = \vec{b}$ may have a solution or not according as the distance from $\vec{b}$ to the range space $\operatorname{Im}(A)$ is zero or not (see Ex. <B> 4 of Sec. 3.7.3 and Fig. 3.42). Suppose $\vec{x}_0 \in \mathbb{R}^2$ is such that
\[
|\vec{b} - \vec{x}_0A| = \min_{\vec{x}\in\mathbb{R}^2} |\vec{b} - \vec{x}A|
\]
\[
\begin{aligned}
&\Leftrightarrow (\vec{b} - \vec{x}_0A) \perp \vec{x}A \ \text{ for all } \vec{x} \in \mathbb{R}^2\\
&\Leftrightarrow (\vec{b} - \vec{x}_0A)(\vec{x}A)^* = 0 \ \text{ for all } \vec{x} \in \mathbb{R}^2\\
&\Leftrightarrow \vec{x}_0AA^* = \vec{b}A^*\\
&\Leftrightarrow \vec{x}_0 = \vec{b}A^*(AA^*)^{-1},
\end{aligned} \tag{*14}
\]
and $\vec{x}_0A = \vec{b}A^*(AA^*)^{-1}A$ is the orthogonal projection of $\vec{b}$ on $\operatorname{Im}(A)$. The operator
\[
A^*(AA^*)^{-1}A = \begin{pmatrix} 2&-1\\ 1&0\\ 0&1 \end{pmatrix}\cdot\frac16\begin{pmatrix} 2&2\\ 2&5 \end{pmatrix}\cdot\begin{pmatrix} 2&1&0\\ -1&0&1 \end{pmatrix}
= \frac16\begin{pmatrix} 5&2&-1\\ 2&2&2\\ -1&2&5 \end{pmatrix} \tag{*15}
\]
is the projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A)$ along $\operatorname{Ker}(A^*)$ (see (3.7.34)). In fact, $A^*(AA^*)^{-1}A$ is symmetric and
\[
(A^*(AA^*)^{-1}A)^2 = A^*(AA^*)^{-1}A.
\]
Moreover, it is orthogonal in the sense that
\[
\begin{aligned}
\operatorname{Im}(A^*(AA^*)^{-1}A)^{\perp} &= \operatorname{Ker}(A^*(AA^*)^{-1}A) = \operatorname{Ker}(A^*) = \operatorname{Im}(A)^{\perp},\\
\operatorname{Ker}(A^*(AA^*)^{-1}A)^{\perp} &= \operatorname{Im}(A^*(AA^*)^{-1}A) = \operatorname{Im}(A).
\end{aligned}
\]
See Fig. 3.52. Such an operator is simply called the orthogonal projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A)$ along $\operatorname{Ker}(A^*)$. We have defined $A^*(AA^*)^{-1}$ as the generalized inverse $A^+$ of $A$ (see the Note in Ex. <B> 4 of Sec. 3.7.3 or Sec. B.8). Try to find $A^+$ out of (*11).

[Fig. 3.52: a vector $\vec{b}$, its orthogonal projection $\vec{b}A^*(AA^*)^{-1}A$ on the plane $\operatorname{Im}(A) = \langle (2,1,0), (0,1,2) \rangle$, its reflection $\vec{b}(2A^*(AA^*)^{-1}A - I_3)$, and the normal direction $\operatorname{Ker}(A^*)$.]

Fig. 3.52
A subsequent problem is to find the reflection or symmetric point of $\vec{b}$ with respect to the plane $\operatorname{Im}(A)$ (see Fig. 3.52). It is the point
\[
\vec{b}A^*(AA^*)^{-1}A + (\vec{b}A^*(AA^*)^{-1}A - \vec{b}) = \vec{b}(2A^*(AA^*)^{-1}A - I_3). \tag{*16}
\]
Hence, denote the operator
\[
P_A = 2A^*(AA^*)^{-1}A - I_3 = 2A^+A - I_3
= 2\cdot\frac16\begin{pmatrix} 5&2&-1\\ 2&2&2\\ -1&2&5 \end{pmatrix} - I_3
= \begin{pmatrix} \frac23&\frac23&-\frac13\\ \frac23&-\frac13&\frac23\\ -\frac13&\frac23&\frac23 \end{pmatrix}. \tag{*17}
\]
$P_A$ is an orthogonal matrix, i.e. $P_A^* = P_A^{-1}$, and $P_A$ is symmetric. By simple computation or even by geometric intuition, we have:
\[
\begin{array}{ccl}
\text{eigenvalues of } A^+A & \text{eigenvalues of } P_A & \text{eigenvectors}\\
1 & 1 & \vec{v}_1 = \left(\frac{2}{\sqrt5}, \frac{1}{\sqrt5}, 0\right)\\
1 & 1 & \vec{v}_2 = \left(\frac{1}{\sqrt{30}}, -\frac{2}{\sqrt{30}}, -\frac{5}{\sqrt{30}}\right)\\
0 & -1 & \vec{v}_3 = \left(\frac{1}{\sqrt6}, -\frac{2}{\sqrt6}, \frac{1}{\sqrt6}\right)
\end{array}
\]
$B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is an orthonormal basis for $\mathbb{R}^3$. In $B$,
\[
[A^+A]_B = R(A^+A)R^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{pmatrix},\quad\text{where } R = \begin{pmatrix} \vec{v}_1\\ \vec{v}_2\\ \vec{v}_3 \end{pmatrix} \text{ is orthogonal};
\]
\[
[P_A]_B = RP_AR^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix}.
\]
Try to explain [A+ A]B and [PA ]B graphically.
As compared with $A^+A$,
\[
A^*A = \begin{pmatrix} 2&-1\\ 1&0\\ 0&1 \end{pmatrix}\begin{pmatrix} 2&1&0\\ -1&0&1 \end{pmatrix} = \begin{pmatrix} 5&2&-1\\ 2&1&0\\ -1&0&1 \end{pmatrix}: \mathbb{R}^3 \to \mathbb{R}^3
\]
is not a projection, even though $\operatorname{Im}(A)$ is an invariant subspace of $A^*A$. $A^*A$ has eigenvalues 0, 1 and 6 with corresponding eigenvectors $(1,-2,1)$, $(0,1,2)$ and $(5,2,-1)$. Readers are urged to model after Fig. 3.31 to explain graphically the mapping properties of $A^*A$ and $A^+A$.
What we obtained for this particular A is universally true for any real
2 × 3 matrix of rank 2. We summarize them in
A real matrix $A_{2\times 3}$ of rank 2, its transpose $A^*$ and its generalized inverse $A^+$

(1) 1. $AA^*: \mathbb{R}^2 \to \mathbb{R}^2$ is an invertible linear operator.
    2. $A^*A: \mathbb{R}^3 \to \operatorname{Im}(A) \subseteq \mathbb{R}^3$ is an onto linear transformation with $\operatorname{Im}(A)$ as an invariant subspace.
(2) $AA^* = 1_{\mathbb{R}^2}: \mathbb{R}^2 \to \mathbb{R}^2$ is the identity operator on $\mathbb{R}^2$, considered as the orthogonal projection of $\mathbb{R}^2$ onto itself along $\{\vec{0}\}$.
    ⇔ $A^*A: \mathbb{R}^3 \to \mathbb{R}^3$ is the orthogonal projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A)$ along $\operatorname{Ker}(A^*)$, i.e. $A^*A$ is symmetric and $(A^*A)^2 = A^*A$.
    In general, $A^*$ does not have these properties unless $A^*$ is a right inverse of $A$.
(3) The generalized inverse
\[
A^+ = A^*(AA^*)^{-1}
\]
of $A$ orthogonalizes $A$ both on the right and on the left in the following sense:
    $AA^+ = 1_{\mathbb{R}^2}$ is the orthogonal projection of $\mathbb{R}^2$ onto itself along $\operatorname{Ker}(A) = \{\vec{0}\}$.
    ⇔ $A^+A$ is the orthogonal projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A)$ along $\operatorname{Ker}(A^*)$.
Therefore, $A^+$ can be defined as the inverse of the restricted linear isomorphism $A|_{\operatorname{Im}(A^*)}: \operatorname{Im}(A^*) = \mathbb{R}^2 \to \operatorname{Im}(A)$, i.e.
\[
A^+ = (A|_{\operatorname{Im}(A^*)})^{-1}: \operatorname{Im}(A) \to \operatorname{Im}(A^*),
\]
which makes $A^+A$ an orthogonal projection. (3.7.40)

These results are still valid for any real $A_{m\times n}$ with rank equal to $m$. For a general setting, refer to Ex. 12 of Sec. B.8 and Secs. 4.5, 5.5.

Example 5 Let
\[
A = \begin{pmatrix} 1&0\\ 1&1\\ 1&-1 \end{pmatrix}.
\]
Do the same problems as in Example 4.
Solution Perform elementary row operations to
\[
[A \mid \vec{b}^* \mid I_3] =
\left(\begin{array}{cc|c|ccc} 1&0&b_1&1&0&0\\ 1&1&b_2&0&1&0\\ 1&-1&b_3&0&0&1 \end{array}\right)
\xrightarrow[E_{(3)-(1)}]{E_{(2)-(1)}}
\left(\begin{array}{cc|c|ccc} 1&0&b_1&1&0&0\\ 0&1&b_2-b_1&-1&1&0\\ 0&-1&b_3-b_1&-1&0&1 \end{array}\right)
\]
\[
\xrightarrow{E_{(3)+(2)}}
\left(\begin{array}{cc|c|ccc} 1&0&b_1&1&0&0\\ 0&1&b_2-b_1&-1&1&0\\ 0&0&b_2+b_3-2b_1&-2&1&1 \end{array}\right). \tag{*18}
\]
From (*18),
\[
A\vec{x}^* = \vec{b}^*\ \text{has a solution}\ \vec{x} = (x_1,x_2) \in \mathbb{R}^2 \Leftrightarrow b_2 + b_3 - 2b_1 = 0,
\]
and then the solution is $x_1 = b_1$, $x_2 = b_2 - b_1$. The constrained condition $b_2 + b_3 - 2b_1 = 0$ can also be seen by eliminating $x_1, x_2$ from the set of equations $x_1 = b_1$, $x_1 + x_2 = b_2$, $x_1 - x_2 = b_3$.

(*18) also indicates that
\[
BA = I_2,\quad\text{where } B = \begin{pmatrix} 1&0&0\\ -1&1&0 \end{pmatrix},
\]
i.e. $B$ is a left inverse of $A$. In general, let $B = \begin{pmatrix} \vec{v}_1\\ \vec{v}_2 \end{pmatrix}_{2\times 3}$. Then
\[
BA = \begin{pmatrix} \vec{v}_1\\ \vec{v}_2 \end{pmatrix}A = \begin{pmatrix} \vec{v}_1A\\ \vec{v}_2A \end{pmatrix} = I_2
\Leftrightarrow \vec{v}_1A = \vec{e}_1 \text{ and } \vec{v}_2A = \vec{e}_2.
\]
Suppose $\vec{v}_1 = (x_1,x_2,x_3)$. Then
\[
\vec{v}_1A = \vec{e}_1 \Leftrightarrow \begin{cases} x_1+x_2+x_3 = 1\\ x_2-x_3 = 0 \end{cases}
\Leftrightarrow \vec{v}_1 = (1,0,0) + t_1(-2,1,1),\quad t_1 \in \mathbb{R}.
\]
Similarly, let $\vec{v}_2 = (x_1,x_2,x_3)$. Then
\[
\vec{v}_2A = \vec{e}_2 \Leftrightarrow \begin{cases} x_1+x_2+x_3 = 0\\ x_2-x_3 = 1 \end{cases}
\Leftrightarrow \vec{v}_2 = (-1,1,0) + t_2(-2,1,1),\quad t_2 \in \mathbb{R}.
\]
Thus, the left inverses of $A$ are
\[
B = \begin{pmatrix} 1-2t_1&t_1&t_1\\ -1-2t_2&1+t_2&t_2 \end{pmatrix}_{2\times 3}\quad\text{for } t_1, t_2 \in \mathbb{R}. \tag{*19}
\]
On the other hand, (*18) says
\[
E_{(3)+(2)}E_{(3)-(1)}E_{(2)-(1)}A = PA = \begin{pmatrix} 1&0\\ 0&1\\ 0&0 \end{pmatrix}
\quad\text{(row-reduced echelon matrix and normal form of }A\text{)},
\]
\[
P = E_{(3)+(2)}E_{(3)-(1)}E_{(2)-(1)} = \begin{pmatrix} 1&0&0\\ -1&1&0\\ -2&1&1 \end{pmatrix}
\]
\[
\Rightarrow A = E_{(2)-(1)}^{-1}E_{(3)-(1)}^{-1}E_{(3)+(2)}^{-1}\begin{pmatrix} 1&0\\ 0&1\\ 0&0 \end{pmatrix}
= P^{-1}\begin{pmatrix} 1&0\\ 0&1\\ 0&0 \end{pmatrix}
= \begin{pmatrix} 1&0&0\\ 1&1&0\\ 1&-1&1 \end{pmatrix}\begin{pmatrix} 1&0\\ 0&1\\ 0&0 \end{pmatrix}\quad\text{(LU-decomposition)}.
\]
Refer to (1) in (2.7.70).

To investigate $A^*$ and $A^*A$, consider $\vec{x}A = \vec{b}$ for $\vec{x} = (x_1,x_2,x_3) \in \mathbb{R}^3$ and $\vec{b} = (b_1,b_2) \in \mathbb{R}^2$.
By simple computation or by (*18),
\[
\begin{aligned}
\operatorname{Ker}(A) &= \langle (-2,1,1) \rangle = \operatorname{Im}(A^*)^{\perp},\\
\operatorname{Im}(A) &= \mathbb{R}^2 = \operatorname{Ker}(A^*)^{\perp},\\
\operatorname{Ker}(A^*) &= \{\vec{0}\} = \operatorname{Im}(A)^{\perp},\\
\operatorname{Im}(A^*) &= \{(x_1,x_2,x_3) \in \mathbb{R}^3 \mid 2x_1-x_2-x_3 = 0\} = \langle (1,2,0), (1,0,2) \rangle = \langle (1,1,1), (0,1,-1) \rangle = \operatorname{Ker}(A)^{\perp},
\end{aligned}
\]
and
\[
A^*A = \begin{pmatrix} 3&0\\ 0&2 \end{pmatrix},\qquad
(A^*A)^{-1} = \frac16\begin{pmatrix} 2&0\\ 0&3 \end{pmatrix}.
\]
For any fixed $\vec{b} \in \mathbb{R}^2$, $\vec{x}A = \vec{b}$ always has a particular solution $\vec{b}(A^*A)^{-1}A^*$, and the solution set is
\[
\vec{b}(A^*A)^{-1}A^* + \operatorname{Ker}(A), \tag{*20}
\]
which is a one-dimensional affine subspace of $\mathbb{R}^3$. Among so many solutions, it is $\vec{b}(A^*A)^{-1}A^*$ that has the shortest distance to the origin $\vec{0}$ (see Ex. <B> 7 of Sec. 3.7.3). For simplicity, let
\[
A^+ = (A^*A)^{-1}A^* = \frac16\begin{pmatrix} 2&0\\ 0&3 \end{pmatrix}\begin{pmatrix} 1&1&1\\ 0&1&-1 \end{pmatrix}
= \frac16\begin{pmatrix} 2&2&2\\ 0&3&-3 \end{pmatrix}_{2\times 3}, \tag{*21}
\]
which can be considered as a linear transformation from $\mathbb{R}^2$ into $\mathbb{R}^3$ with range space $\operatorname{Im}(A^*)$. Since
\[
A^+A = I_2,
\]
$A^+$ is a left inverse of $A$.
Therefore, it is reasonable to expect that one of the left inverses shown in (*19) should be $A^+$. Since the range of $A^+$ is $\operatorname{Im}(A^*)$,
\[
B \text{ in (*19) is } A^+
\Leftrightarrow \text{the range space of } B = \operatorname{Im}(A^*)
\Leftrightarrow (1-2t_1, t_1, t_1) \text{ and } (-1-2t_2, 1+t_2, t_2) \text{ are in } \operatorname{Im}(A^*)
\]
\[
\Rightarrow \begin{cases} 2(1-2t_1) - t_1 - t_1 = 0\\ 2(-1-2t_2) - (1+t_2) - t_2 = 0 \end{cases}
\Rightarrow t_1 = \frac13 \ \text{ and }\ t_2 = -\frac12.
\]
In this case, $B$ in (*19) is indeed equal to $A^+$. This $A^+$ is called the generalized inverse of $A$.
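Again this agrees with the Moore–Penrose pseudoinverse; a short sketch assuming Python with NumPy:

```python
import numpy as np

A = np.array([[1, 0], [1, 1], [1, -1]], dtype=float)
A_plus = np.linalg.inv(A.T @ A) @ A.T          # A+ = (A*A)^(-1) A*
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A, np.eye(2))      # A+ A = I2
print(6 * A_plus)   # -> [[2, 2, 2], [0, 3, -3]], matching (*21)
```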
How about $AA^*$?
\[
AA^* = \begin{pmatrix} 1&0\\ 1&1\\ 1&-1 \end{pmatrix}\begin{pmatrix} 1&1&1\\ 0&1&-1 \end{pmatrix}
= \begin{pmatrix} 1&1&1\\ 1&2&0\\ 1&0&2 \end{pmatrix} \tag{*22}
\]
is a linear operator on $\mathbb{R}^3$ with range space $\operatorname{Im}(A^*)$. Actual computation shows that $(AA^*)^2 \neq AA^*$. Therefore, $AA^*$ is not a projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A^*)$ along $\operatorname{Ker}(A)$. Also
\[
\begin{array}{cl}
\text{eigenvalues of } AA^* & \text{eigenvectors}\\
2 & \vec{v}_1 = \left(0, \frac{1}{\sqrt2}, -\frac{1}{\sqrt2}\right)\\
3 & \vec{v}_2 = \left(\frac{1}{\sqrt3}, \frac{1}{\sqrt3}, \frac{1}{\sqrt3}\right)\\
0 & \vec{v}_3 = \left(-\frac{2}{\sqrt6}, \frac{1}{\sqrt6}, \frac{1}{\sqrt6}\right)
\end{array}
\]
indicates that $AA^*$ is not a projection (see (3.7.34)). Notice that
\[
QAA^*Q^{-1} = \begin{pmatrix} 2&0&0\\ 0&3&0\\ 0&0&0 \end{pmatrix},\quad\text{where } Q = \begin{pmatrix} \vec{v}_1\\ \vec{v}_2\\ \vec{v}_3 \end{pmatrix} \text{ is orthogonal}
\]
\[
\Rightarrow QAA^*Q^* = \begin{pmatrix} \sqrt2&0&0\\ 0&\sqrt3&0\\ 0&0&1 \end{pmatrix}\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{pmatrix}\begin{pmatrix} \sqrt2&0&0\\ 0&\sqrt3&0\\ 0&0&1 \end{pmatrix}
\Rightarrow R(AA^*)R^* = \begin{pmatrix} I_2&0\\ 0&0 \end{pmatrix}_{3\times 3},
\]
where
\[
R = \begin{pmatrix} \frac{1}{\sqrt2}&0&0\\ 0&\frac{1}{\sqrt3}&0\\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 0&\frac{1}{\sqrt2}&-\frac{1}{\sqrt2}\\ \frac{1}{\sqrt3}&\frac{1}{\sqrt3}&\frac{1}{\sqrt3}\\ -\frac{2}{\sqrt6}&\frac{1}{\sqrt6}&\frac{1}{\sqrt6} \end{pmatrix}
= \begin{pmatrix} 0&\frac12&-\frac12\\ \frac13&\frac13&\frac13\\ -\frac{2}{\sqrt6}&\frac{1}{\sqrt6}&\frac{1}{\sqrt6} \end{pmatrix}.
\]

Thus, the index of $AA^*$ is 2 and the signature is equal to 2. By the way, we pose the question: what is the preimage of the unit circle (or disk) $y_1^2 + y_2^2 = 1$ (or $\leq 1$) under $A$? Let $\vec{y} = (y_1,y_2) = \vec{x}A$. Then
\[
\begin{aligned}
y_1^2 + y_2^2 = \vec{y}\vec{y}^* = 1
&\Leftrightarrow (\vec{x}A)(\vec{x}A)^* = \vec{x}AA^*\vec{x}^*\\
&= x_1^2 + 2x_2^2 + 2x_3^2 + 2x_1x_2 + 2x_1x_3,\quad\text{in the natural basis for } \mathbb{R}^3\\
&= 2{x'_1}^2 + 3{x'_2}^2,\quad\text{in the basis } \{\vec{v}_1,\vec{v}_2,\vec{v}_3\} = B\\
&= {x''_1}^2 + {x''_2}^2,\quad\text{in the basis } \{R_{1*},R_{2*},R_{3*}\} = C\\
&= 1,
\end{aligned} \tag{*23}
\]
where $(x'_1,x'_2,x'_3) = [\vec{x}]_B = \vec{x}Q^{-1}$ and $(x''_1,x''_2,x''_3) = [\vec{x}]_C = \vec{x}R^{-1}$.

[Fig. 3.53: the line $\operatorname{Ker}(A) = \langle \vec{v}_3 \rangle$ and the plane $\operatorname{Im}(A^*) = \langle \vec{v}_1, \vec{v}_2 \rangle$ in $\mathbb{R}^3$, with $A: \mathbb{R}^3 \to \mathbb{R}^2$ and $A^*$ going back.]

Fig. 3.53
Replace $A^*$ in $AA^*$ by $A^+$ and, in turn, consider
\[
AA^+ = \begin{pmatrix} 1&0\\ 1&1\\ 1&-1 \end{pmatrix}\cdot\frac16\begin{pmatrix} 2&2&2\\ 0&3&-3 \end{pmatrix}
= \frac16\begin{pmatrix} 2&2&2\\ 2&5&-1\\ 2&-1&5 \end{pmatrix}.
\]
$AA^+$ is symmetric and
\[
(AA^+)^2 = AA^+AA^+ = AI_2A^+ = AA^+.
\]
Therefore, $AA^+: \mathbb{R}^3 \to \operatorname{Im}(A^*) \subseteq \mathbb{R}^3$ is the orthogonal projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A^*)$ along $\operatorname{Ker}(A)$. Note that
\[
\begin{aligned}
\operatorname{Im}(AA^+) &= \operatorname{Im}(A^+) = \operatorname{Ker}(A)^{\perp} = \operatorname{Ker}(AA^+)^{\perp},\\
\operatorname{Ker}(AA^+) &= \operatorname{Ker}(A) = \operatorname{Im}(A^*)^{\perp} = \operatorname{Im}(AA^+)^{\perp}.
\end{aligned}
\]
What is the reflection or symmetric point of a point $\vec{x}$ in $\mathbb{R}^3$ with respect to the plane $\operatorname{Im}(A^*)$? Notice that (see (*16))
\[
\vec{x} \in \mathbb{R}^3
\ \to\ \vec{x}AA^+,\ \text{the orthogonal projection of } \vec{x} \text{ on } \operatorname{Im}(A^*)
\ \to\ \vec{x}AA^+ + (\vec{x}AA^+ - \vec{x}) = \vec{x}(2AA^+ - I_3),\ \text{the reflection point}. \tag{*24}
\]
Thus, denote the linear operator
\[
P_A = 2AA^+ - I_3 = 2\cdot\frac16\begin{pmatrix} 2&2&2\\ 2&5&-1\\ 2&-1&5 \end{pmatrix} - \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}
= \begin{pmatrix} -\frac13&\frac23&\frac23\\ \frac23&\frac23&-\frac13\\ \frac23&-\frac13&\frac23 \end{pmatrix}.
\]
$P_A$ is symmetric and orthogonal, i.e. $P_A^* = P_A^{-1}$, and is called the reflection of $\mathbb{R}^3$ with respect to $\operatorname{Im}(A^*)$. A simple calculation shows that
\[
\begin{array}{ccl}
\text{eigenvalues of } AA^+ & \text{eigenvalues of } P_A & \text{eigenvectors}\\
1 & 1 & \vec{u}_1 = \left(\frac{1}{\sqrt5}, \frac{2}{\sqrt5}, 0\right)\\
1 & 1 & \vec{u}_2 = \left(\frac{2}{\sqrt{30}}, -\frac{1}{\sqrt{30}}, \frac{5}{\sqrt{30}}\right)\\
0 & -1 & \vec{u}_3 = \left(-\frac{2}{\sqrt6}, \frac{1}{\sqrt6}, \frac{1}{\sqrt6}\right)
\end{array}
\]
$D = \{\vec{u}_1, \vec{u}_2, \vec{u}_3\}$ is an orthonormal basis for $\mathbb{R}^3$. In $D$,
\[
[AA^+]_D = S(AA^+)S^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{pmatrix},\quad\text{where } S = \begin{pmatrix} \vec{u}_1\\ \vec{u}_2\\ \vec{u}_3 \end{pmatrix} \text{ is orthogonal};
\]
\[
[P_A]_D = SP_AS^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&-1 \end{pmatrix}.
\]
Try to explain $[AA^+]_D$ and $[P_A]_D$ graphically.
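The projection and reflection properties claimed above are easy to confirm numerically; a sketch assuming Python with NumPy:

```python
import numpy as np

A = np.array([[1, 0], [1, 1], [1, -1]], dtype=float)
AAp = A @ np.linalg.pinv(A)       # the 3x3 operator A A+
PA = 2 * AAp - np.eye(3)          # the reflection

assert np.allclose(AAp @ AAp, AAp) and np.allclose(AAp, AAp.T)      # orthogonal projection
assert np.allclose(PA, PA.T) and np.allclose(PA @ PA.T, np.eye(3))  # symmetric and orthogonal
print(np.round(np.linalg.eigvalsh(PA), 6))    # eigenvalues -1, 1, 1
```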
As a counterpart of (3.7.40), we summarize in

A real matrix $A_{3\times 2}$ of rank 2, its transpose $A^*$ and its generalized inverse $A^+$

(1) 1. $AA^*: \mathbb{R}^3 \to \operatorname{Im}(A^*) \subseteq \mathbb{R}^3$ is an onto linear transformation with $\operatorname{Im}(A^*)$ as an invariant subspace.
    2. $A^*A: \mathbb{R}^2 \to \mathbb{R}^2$ is an invertible linear operator.
(2) $AA^*: \mathbb{R}^3 \to \mathbb{R}^3$ is the orthogonal projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A^*)$ along $\operatorname{Ker}(A)$, i.e. $AA^*$ is symmetric and $(AA^*)^2 = AA^*$.
    ⇔ $A^*A = 1_{\mathbb{R}^2}: \mathbb{R}^2 \to \mathbb{R}^2$ is the identity operator on $\mathbb{R}^2$, considered as the orthogonal projection of $\mathbb{R}^2$ onto itself along $\{\vec{0}\}$.
    In general, $A^*$ does not have these properties unless $A^*$ is a left inverse of $A$.
(3) The generalized inverse
\[
A^+ = (A^*A)^{-1}A^*
\]
of $A$ orthogonalizes $A$ both on the right and on the left in the following sense:
    $AA^+: \mathbb{R}^3 \to \mathbb{R}^3$ is the orthogonal projection of $\mathbb{R}^3$ onto $\operatorname{Im}(A^*)$ along $\operatorname{Ker}(A)$.
    ⇔ $A^+A = 1_{\mathbb{R}^2}$ is the orthogonal projection of $\mathbb{R}^2$ onto itself along $\operatorname{Ker}(A^*) = \{\vec{0}\}$.
Therefore, $A^+$ can be defined as the inverse of the linear isomorphism $A|_{\operatorname{Im}(A^*)}: \operatorname{Im}(A^*) \subseteq \mathbb{R}^3 \to \mathbb{R}^2$, i.e.
\[
A^+ = (A|_{\operatorname{Im}(A^*)})^{-1}: \operatorname{Im}(A) = \mathbb{R}^2 \to \operatorname{Im}(A^*).\quad (3.7.41)
\]
These results are still valid for any real $A_{m\times n}$ with rank equal to $n$ (see Ex. 12 of Sec. B.8 and Sec. 5.5).
Exercises
<A>
1. Prove (2.7.68) of Sec. 2.7.5 for $A_{3\times 3}$.
2. Prove (2.7.69) of Sec. 2.7.5 for $A_{3\times 3}$.
3. Prove (2.7.70) for $A_{2\times 3}$ and $A_{3\times 2}$.
4. Prove (2.7.71) for a real symmetric matrix $A_{3\times 3}$. For the invariance of the index and the signature of $A$, try the following methods.
   (1) A case-by-case examination. For example, try to prove that it is impossible for any invertible real matrix $P_{3\times 3}$ to satisfy
\[
P\begin{pmatrix} 1&0&0\\ 0&0&0\\ 0&0&0 \end{pmatrix}P^* = \begin{pmatrix} -1&0&0\\ 0&0&0\\ 0&0&0 \end{pmatrix}.
\]
   (2) See Ex. <B> 3.
5. Prove Ex. <A> 7 of Sec. 2.7.5 for $A_{3\times 3}$.
6. For each of the following matrices $A$:
   (1) Do problems as in Example 1.
   (2) Find the generalized inverse $A^+$ of $A$ and explain it both algebraically and geometrically (one may refer to Exs. <B> 4 and 8 of Sec. 3.7.3, (3.7.40), (3.7.41) and Sec. B.8 if necessary).
\[
\text{(a) } \begin{pmatrix} 3&1&1\\ 2&4&2\\ -1&-1&1 \end{pmatrix}.\quad
\text{(b) } \begin{pmatrix} -1&0&-3\\ 0&1&2\\ -1&-1&-5 \end{pmatrix}.\quad
\text{(c) } \begin{pmatrix} 1&2&-1\\ 2&4&-2\\ 3&6&-3 \end{pmatrix}.
\]
7. For each of the following matrices, refer to Example 2 and do the same problems as in Ex. 6.
\[
\text{(a) } \begin{pmatrix} 0&-2&3\\ 0&1&-4\\ 2&0&5 \end{pmatrix}.\quad
\text{(b) } \begin{pmatrix} 0&3&5\\ -1&2&4\\ -1&-1&-1 \end{pmatrix}.\quad
\text{(c) } \begin{pmatrix} 0&-1&-4\\ 0&2&8\\ 0&1&4 \end{pmatrix}.
\]
8. For each of the following matrices, refer to Example 3 and do the same problems as there.
\[
\text{(a) } \begin{pmatrix} 0&1&1\\ 1&0&1\\ 1&1&0 \end{pmatrix}.\quad
\text{(b) } \begin{pmatrix} 2&1&1\\ 1&2&1\\ 1&1&2 \end{pmatrix}.\quad
\text{(c) } \begin{pmatrix} 2&3&0\\ 3&5&-1\\ 0&-1&2 \end{pmatrix}.\quad
\text{(d) } \begin{pmatrix} -2&-2&0\\ -2&-1&1\\ 0&1&1 \end{pmatrix}.
\]
9. Use (2.7.71) and Ex. 4 to determine the congruence among the following matrices:
\[
\begin{pmatrix} 1&0&1\\ 0&1&2\\ 1&2&1 \end{pmatrix},\quad
\begin{pmatrix} 0&1&2\\ 1&-1&3\\ 2&3&4 \end{pmatrix}\quad\text{and}\quad
\begin{pmatrix} 1&2&3\\ 2&4&5\\ 3&5&6 \end{pmatrix}.
\]
If $A$ and $B$ are congruent, find an invertible matrix $P_{3\times 3}$ so that $B = PAP^*$.
10. Let
\[
A = \begin{pmatrix} 1&1&1\\ 2&1&0 \end{pmatrix}.
\]
Do the same problems as in Example 4. One may also refer to Ex. <B> of Sec. 2.7.5.
11. Model after (*19) and (*21) and try to derive $A^+$ from (*11) in Example 4.
12. Let
\[
A = \begin{pmatrix} 2&1\\ 1&0\\ 1&1 \end{pmatrix}.
\]
Do the same problems as in Example 5.

<B>
1. Prove (3.7.40).
2. Prove (3.7.41).
3. Let $A$ be a nonzero real symmetric matrix of order $n$. Suppose there exist invertible matrices $P_{n\times n}$ and $Q_{n\times n}$ so that $PAP^*$ and $QAQ^*$ are diagonal matrices. Let $p$ and $q$ be the numbers of positive diagonal entries of $PAP^*$ and $QAQ^*$, respectively. Suppose that $p < q$. For simplicity, let $\vec{x}_i = P_{i*}$, the $i$th row vector of $P$ for $1 \leq i \leq n$, and $\vec{y}_j = Q_{j*}$ for $1 \leq j \leq n$. Let the rank $r(A) = r$. Note that $r \geq q > p$.
   (a) Define $f: \mathbb{R}^n \to \mathbb{R}^{r+p-q}$ by
\[
f(\vec{x}) = (\vec{x}A\vec{x}_1^*, \ldots, \vec{x}A\vec{x}_p^*, \vec{x}A\vec{y}_{q+1}^*, \ldots, \vec{x}A\vec{y}_r^*).
\]
Show that $f$ is linear and $r(f) \leq r+p-q$ and hence
\[
\dim\operatorname{Ker}(f) \geq n - (r+p-q) = n - r + (q-p) > n - r.
\]
   (b) There exists a nonzero $\vec{x}_0 \in \operatorname{Ker}(f) - \langle \vec{x}_{r+1}, \ldots, \vec{x}_n \rangle$. Hence, $f(\vec{x}_0) = \vec{0}$ implies that $\vec{x}_0A\vec{x}_i^* = 0$ for $1 \leq i \leq p$ and $\vec{x}_0A\vec{y}_j^* = 0$ for $q+1 \leq j \leq r$.
   (c) Let $\vec{x}_0 = \sum_{i=1}^n a_i\vec{x}_i = \sum_{j=1}^n b_j\vec{y}_j$. Use (b) to show that $a_i = 0$ for $1 \leq i \leq p$ and $b_j = 0$ for $q+1 \leq j \leq r$.
   (d) There exists some $i_0$ with $p+1 \leq i_0 \leq r$ so that $a_{i_0} \neq 0$. Show that
\[
\vec{x}_0A\vec{x}_0^* = \sum_{i=p+1}^{r} a_i^2\,\vec{x}_iA\vec{x}_i^* < 0\quad\text{and}\quad
\vec{x}_0A\vec{x}_0^* = \sum_{j=1}^{q} b_j^2\,\vec{y}_jA\vec{y}_j^* \geq 0,
\]
a contradiction. Hence, $p = q$ should hold.

In fact, the above process can be further simplified as follows. Rewrite $\vec{x} = (x_1,\ldots,x_n) = \vec{y}P = \vec{z}Q$, where $\vec{y} = (y_1,\ldots,y_n)$ and $\vec{z} = (z_1,\ldots,z_n)$, so that
\[
\vec{x}A\vec{x}^* = \vec{y}(PAP^*)\vec{y}^* = y_1^2 + \cdots + y_p^2 - y_{p+1}^2 - \cdots - y_r^2,\quad\text{and}
\]
\[
\vec{x}A\vec{x}^* = \vec{z}(QAQ^*)\vec{z}^* = z_1^2 + \cdots + z_q^2 - z_{q+1}^2 - \cdots - z_r^2.
\]
Then
\[
\vec{y}P = \vec{z}Q \Leftrightarrow \vec{y} = \vec{z}QP^{-1} \Leftrightarrow y_j = \sum_{i=1}^n b_{ij}z_i\ \text{ for } 1 \leq j \leq n.
\]
Now, consider the system of linear equations
\[
y_j = \sum_{i=1}^n b_{ij}z_i = 0\ \text{ for } 1 \leq j \leq p,\qquad z_{q+1} = 0,\ \ldots,\ z_n = 0,
\]
which has a nonzero solution $z_1^*, \ldots, z_q^*, z_{q+1}^* = 0, \ldots, z_n^* = 0$ since $q > p$. Corresponding to this set of solutions, the resulting $\vec{y}$ gives $\vec{x}A\vec{x}^* \leq 0$ while the resulting $\vec{z}$ gives $\vec{x}A\vec{x}^* > 0$, a contradiction. Hence, $p = q$ should hold.
4. Let
\[
A = \begin{pmatrix} 0&0&0&0&1&1&1\\ 0&2&6&2&0&0&4\\ 0&1&3&1&1&0&1\\ 0&1&3&1&2&1&2 \end{pmatrix}_{4\times 7}.
\]
Find invertible matrices $P_{4\times 4}$ and $Q_{7\times 7}$ so that
\[
PAQ = \begin{pmatrix} I_3&0\\ 0&0 \end{pmatrix}_{4\times 7}.
\]
Express $P$ and $Q$ as products of elementary matrices.


5. Suppose $a$, $b$, $c$ and $a'$, $b'$, $c'$ are numbers so that $aa' + bb' + cc' = 0$. Determine the row and column ranks of
\[
A = \begin{pmatrix} 0&c&-b&a'\\ -c&0&a&b'\\ b&-a&0&c'\\ -a'&-b'&-c'&0 \end{pmatrix}
\]
and find invertible matrices $P$ and $Q$ so that $PAQ$ is the normal form of $A$.
6. For each of the following matrices $A_{m\times n}$, do the following problems.
(1) Find an invertible matrix $P$ so that $PA = R$ is the row-reduced echelon matrix of $A$. Use this result to solve $A\vec{x}^* = \vec{b}^*$, where $\vec{x} \in \mathbb{R}^n$ and $\vec{b} \in \mathbb{R}^m$.
(2) Find the row and column ranks of $A$.
(3) Find an invertible matrix $Q$ so that $PAQ$ is the normal form of $A$. Also, express $P$ and $Q$ as products of elementary matrices.
(4) In case $m = n$, i.e. $A$ is a square matrix, determine if $A$ is invertible. If it is, find $A^{-1}$ and express $A$ and $A^{-1}$ as products of elementary matrices, and hence compute $\det A$ and $\det A^{-1}$.
(5) If $A$ is a symmetric matrix, find an invertible matrix $S$ so that $SAS^*$ is a diagonal matrix with diagonal entries equal to 1, $-1$ or 0. Determine the index and the signature of $A$.
(6) Find $\operatorname{Ker}(A)$, $\operatorname{Im}(A)$, $\operatorname{Ker}(A^*)$ and $\operatorname{Im}(A^*)$.
(7) Try to find the generalized inverse $A^+$ of $A$, if possible.
\[
\text{(a) } \begin{pmatrix} i&1-i&0\\ 1&-2&1\\ 1&2i&-1 \end{pmatrix}.\quad
\text{(b) } \begin{pmatrix} 2&-2&1&1&0&0&3\\ 2&-1&-1&0&1&0&5\\ -1&2&2&-1&0&1&6 \end{pmatrix}.
\]
\[
\text{(c) } \begin{pmatrix} 2&2&-1\\ -2&3&2\\ -1&1&-1\\ 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}.\quad
\text{(d) } \begin{pmatrix} 0&0&0&0&1&1&1\\ 0&2&6&2&0&0&4\\ 0&1&3&1&1&0&1\\ 0&1&3&1&2&1&2 \end{pmatrix}.
\]
\[
\text{(e) } \begin{pmatrix} 1&1&1&-3\\ 0&1&0&0\\ 1&1&2&-3\\ 2&2&4&-5 \end{pmatrix}.\quad
\text{(f) } \begin{pmatrix} 1&-3&-1&5&6\\ 2&0&-1&2&3\\ -1&0&2&-1&3\\ 3&1&3&2&1\\ -2&-1&0&1&2 \end{pmatrix}.
\]
\[
\text{(g) } \begin{pmatrix} 1&\frac12&\frac13&\frac14\\ \frac12&\frac13&\frac14&\frac15\\ \frac13&\frac14&\frac15&\frac16\\ \frac14&\frac15&\frac16&\frac17 \end{pmatrix}.\quad
\text{(h) } \begin{pmatrix} 0&1&1&\cdots&1\\ 1&0&1&\cdots&1\\ 1&1&0&\cdots&1\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&1&1&\cdots&0 \end{pmatrix}_{n\times n}.
\]

7. Let
\[
A = \begin{pmatrix} 1&0&-1&2&1\\ -1&1&3&-1&0\\ -2&1&4&-1&3\\ 3&-1&-5&1&-6 \end{pmatrix}.
\]
(a) Prove that the rank $r(A) = 3$ by showing that $A_{*1}$, $A_{*2}$ and $A_{*4}$ are linearly independent and $A_{*3} = -A_{*1} + 2A_{*2}$, $A_{*5} = -3A_{*1} - A_{*2} + 2A_{*4}$. Find $\alpha_1, \alpha_2, \alpha_3$ so that $A_{4*} = \alpha_1A_{1*} + \alpha_2A_{2*} + \alpha_3A_{3*}$.
(b) Show that
\[
\operatorname{Ker}(A) = \langle (0,-1,2,1) \rangle,\qquad
\operatorname{Im}(A) = \langle A_{1*}, A_{2*}, A_{3*} \rangle.
\]
(c) A matrix $B_{4\times 4}$ has the property that $BA = O_{4\times 5}$ if and only if $\operatorname{Im}(B) \subseteq \operatorname{Ker}(A)$. Find all such matrices $B$. Is it possible to have a matrix $B_{4\times 4}$ of rank greater than one so that $BA = O$? Why?
(d) A matrix $C_{5\times 5}$ has the property that $AC = O_{4\times 5}$ if and only if $\operatorname{Im}(A) \subseteq \operatorname{Ker}(C)$. Find all such matrices $C$. Why is $r(C) \leq 2$?
(e) Find matrices $B_{4\times 4}$ so that $BA$ has each possible rank. How about $AC$?
Also, try to do (a) and (b) by using elementary row operations on $A$. Can we solve (c) and (d) by this method? And (e)?
(Note Refer to Ex. <C> of Sec. 2.7.3, Ex. <C> 21 of Sec. 2.7.5 and Sec. B.5.)
8. Suppose
\[
R = \begin{pmatrix} 1&4&0&-1&0&2&-5\\ 0&0&1&-3&0&5&-2\\ 0&0&0&0&1&0&-1\\ 0&0&0&0&0&0&0 \end{pmatrix}_{4\times 7}
\]
is the row-reduced echelon matrix of $A_{4\times 7}$. Determine $A$ if
\[
A_{*1} = \begin{pmatrix} 3\\ 2\\ -9\\ 5 \end{pmatrix},\quad
A_{*3} = \begin{pmatrix} 0\\ 1\\ -2\\ 4 \end{pmatrix}\quad\text{and}\quad
A_{*5} = \begin{pmatrix} -2\\ -3\\ 1\\ -1 \end{pmatrix},
\]
and find an invertible matrix $P_{4\times 4}$ such that $PA = R$.
(Note Refer to Sec. B.5.)
9. Let $A$ be any one of the following matrices
\[
\begin{pmatrix} 1&-1&1&1\\ 1&0&2&1 \end{pmatrix},\quad
\begin{pmatrix} 1&-1&0&0\\ 0&0&1&-1\\ 1&-1&1&-1 \end{pmatrix}\quad\text{and}\quad
\begin{pmatrix} 1&0&0&0\\ -1&1&0&0\\ 0&2&1&1 \end{pmatrix}.
\]
Do problems as in Example 4.
10. Let $A$ be any one of the following matrices
\[
\begin{pmatrix} 1&0\\ -1&1\\ 0&2\\ 3&0 \end{pmatrix},\quad
\begin{pmatrix} 1&1&0\\ 1&1&0\\ 0&1&-1\\ 1&0&1 \end{pmatrix}\quad\text{and}\quad
\begin{pmatrix} 1&0&1\\ 0&-1&0\\ 0&0&-2\\ 1&1&0 \end{pmatrix}.
\]
Do problems as in Example 5.

<C> Abstraction and generalization

Read Ex. <C> of Sec. 2.7.5 and do problems listed there. Also, do the following problems.
1. Let
\[
\begin{aligned}
P_1(x) &= 2 - 3x + 4x^2 - 5x^3 + 2x^4, & P_2(x) &= -6 + 9x - 12x^2 + 15x^3 - 6x^4,\\
P_3(x) &= 3 - 2x + 7x^2 - 9x^3 + x^4, & P_4(x) &= 2 - 8x + 2x^2 - 2x^3 + 6x^4,\\
P_5(x) &= 1 - x - 3x^2 + 2x^3 + x^4, & P_6(x) &= -3 - 18x^2 + 12x^3 + 9x^4,\\
P_7(x) &= -2 + 3x - 2x^2 + x^3, & P_8(x) &= 2 - x - x^2 + 7x^3 - 9x^4
\end{aligned}
\]
be polynomials in $P_4(\mathbb{R})$, which is isomorphic to $\mathbb{R}^5$. Find a subset of $\{P_1, P_2, \ldots, P_8\}$ which is a basis for $\langle P_1, P_2, \ldots, P_8 \rangle$.
2. Let
\[
S = \{(x_1,x_2,x_3,x_4,x_5) \in \mathbb{R}^5 \mid x_1 - x_2 + x_3 - x_4 + x_5 = 0\}
\]
be a subspace of $\mathbb{R}^5$.
(a) Determine $\dim(S)$.
(b) Show that $(1,1,1,1,0) \in S$ and extend it to form a basis for $S$.
3. Let $S$ be the set of solutions of the system of linear equations
\[
\begin{aligned}
3x_1 - x_2 + x_3 - x_4 + 2x_5 &= 0,\\
x_1 - x_2 - x_3 - 2x_4 - x_5 &= 0.
\end{aligned}
\]
(a) $S$ is a subspace of $\mathbb{R}^5$. Determine $\dim(S)$.
(b) Show that $(1,-1,-2,2,0) \in S$ and extend it to form a basis for $S$.
4. The set
\[
S = \left\{ \begin{pmatrix} 1&1\\ 0&1 \end{pmatrix}, \begin{pmatrix} 0&-1\\ -1&1 \end{pmatrix}, \begin{pmatrix} 2&1\\ 1&9 \end{pmatrix}, \begin{pmatrix} 1&-2\\ 4&-2 \end{pmatrix}, \begin{pmatrix} 1&0\\ -1&1 \end{pmatrix}, \begin{pmatrix} -1&2\\ 2&-1 \end{pmatrix} \right\}
\]
generates a subspace $V$ of $M(2;\mathbb{R})$, which is isomorphic to $\mathbb{R}^4$. Find a subset of $S$ that is a basis for $V$.
3.7.6 Diagonal canonical form

We have encountered, up to now, many diagonalizable linear operators on $\mathbb{R}^3$ or real matrices $A_{3\times 3}$. Here in this subsection, we are going to prove (2.7.73) for square matrices of order 3 and hence realize partial results listed in (3.7.29).

Let $A = [a_{ij}]_{3\times 3}$ be a nonzero real matrix. Suppose $A$ is diagonalizable, i.e. there exists an invertible matrix $P_{3\times 3}$ so that
\[
PAP^{-1} = \begin{pmatrix} \lambda_1&0&0\\ 0&\lambda_2&0\\ 0&0&\lambda_3 \end{pmatrix},\quad\text{where } P = \begin{pmatrix} \vec{x}_1\\ \vec{x}_2\\ \vec{x}_3 \end{pmatrix}
\]
\[
\Leftrightarrow \vec{x}_iA = \lambda_i\vec{x}_i\ \text{ for } 1 \leq i \leq 3
\Leftrightarrow \vec{x}_i(A - \lambda_iI_3) = \vec{0}\ \text{ for } 1 \leq i \leq 3. \tag{3.7.42}
\]
Recall that $\lambda_1, \lambda_2$ and $\lambda_3$ are eigenvalues of $A$ and $\vec{x}_1, \vec{x}_2$ and $\vec{x}_3$ are associated eigenvectors of $A$, respectively. Note that $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ is a basis for $\mathbb{R}^3$, consisting entirely of eigenvectors.

For any vector $\vec{x} \in \mathbb{R}^3$, $\vec{x} = \alpha_1\vec{x}_1 + \alpha_2\vec{x}_2 + \alpha_3\vec{x}_3$ for some unique scalars $\alpha_1, \alpha_2$ and $\alpha_3$. Hence, since the factors $A - \lambda_iI_3$ commute with one another,
\[
\begin{aligned}
&\vec{x}(A-\lambda_1I_3)(A-\lambda_2I_3)(A-\lambda_3I_3)\\
&= \alpha_1[\vec{x}_1(A-\lambda_1I_3)](A-\lambda_2I_3)(A-\lambda_3I_3) + \alpha_2[\vec{x}_2(A-\lambda_2I_3)](A-\lambda_1I_3)(A-\lambda_3I_3)\\
&\quad + \alpha_3[\vec{x}_3(A-\lambda_3I_3)](A-\lambda_1I_3)(A-\lambda_2I_3)\\
&= \vec{0} + \vec{0} + \vec{0} = \vec{0}\quad\text{for all } \vec{x} \in \mathbb{R}^3\\
\Rightarrow\ &(A-\lambda_1I_3)(A-\lambda_2I_3)(A-\lambda_3I_3) = O_{3\times 3}. 
\end{aligned} \tag{3.7.43}
\]
A direct matrix computation, as
\[
(A-\lambda_1I_3)(A-\lambda_2I_3)(A-\lambda_3I_3)
= P^{-1}\begin{pmatrix} 0&&\\ &\lambda_2-\lambda_1&\\ &&\lambda_3-\lambda_1 \end{pmatrix}P\cdot
P^{-1}\begin{pmatrix} \lambda_1-\lambda_2&&\\ &0&\\ &&\lambda_3-\lambda_2 \end{pmatrix}P\cdot
P^{-1}\begin{pmatrix} \lambda_1-\lambda_3&&\\ &\lambda_2-\lambda_3&\\ &&0 \end{pmatrix}P
= P^{-1}OP = O,
\]
will work too. This result is a special case of the Cayley–Hamilton theorem, which states that $A$ satisfies its characteristic polynomial
\[
\det(A - tI_3) = -(t-\lambda_1)(t-\lambda_2)(t-\lambda_3).
\]
The eigenvalues $\lambda_1, \lambda_2$ and $\lambda_3$ may not be distinct.
Case 1: $\lambda_1 \neq \lambda_2 \neq \lambda_3 \neq \lambda_1$

If $A_{3\times 3}$ has three distinct eigenvalues $\lambda_1, \lambda_2$ and $\lambda_3$, then their respective eigenvectors $\vec{x}_1, \vec{x}_2$ and $\vec{x}_3$ are linearly independent. For
\[
\alpha_1\vec{x}_1 + \alpha_2\vec{x}_2 + \alpha_3\vec{x}_3 = \vec{0}
\]
⇒ (applying $A$ to both sides) $\alpha_1\lambda_1\vec{x}_1 + \alpha_2\lambda_2\vec{x}_2 + \alpha_3\lambda_3\vec{x}_3 = \vec{0}$
⇒ (eliminating, say, $\vec{x}_1$ from the above two relations) $\alpha_2(\lambda_1-\lambda_2)\vec{x}_2 + \alpha_3(\lambda_1-\lambda_3)\vec{x}_3 = \vec{0}$
⇒ (by inductive assumption) $\alpha_2(\lambda_1-\lambda_2) = \alpha_3(\lambda_1-\lambda_3) = 0$
⇒ (because the $\lambda_i$ are distinct) $\alpha_2 = \alpha_3 = 0$ and hence $\alpha_1 = 0$.

Now $\{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ is a basis for $\mathbb{R}^3$ and thus (3.7.42) holds. In particular, $A$ is diagonalizable and each eigenspace
\[
E_{\lambda_i} = \{\vec{x} \in \mathbb{R}^3 \mid \vec{x}A = \lambda_i\vec{x}\} = \operatorname{Ker}(A - \lambda_iI_3)
\]
is of dimension one.

Case 2: $\lambda_1 = \lambda_2 \neq \lambda_3$

(3.7.43) can be simplified as
\[
(A-\lambda_1I_3)(A-\lambda_3I_3)
= P^{-1}\begin{pmatrix} 0&&\\ &0&\\ &&\lambda_3-\lambda_1 \end{pmatrix}P\cdot
P^{-1}\begin{pmatrix} \lambda_1-\lambda_3&&\\ &\lambda_1-\lambda_3&\\ &&0 \end{pmatrix}P = P^{-1}OP = O. \tag{3.7.44}
\]
In this case, $\vec{x}_1$ and $\vec{x}_2$ are eigenvectors associated to $\lambda_1 = \lambda_2$, and hence $\dim(E_{\lambda_1}) = 2$ while $\dim(E_{\lambda_3}) = 1$.

Conversely, suppose $A$ has three eigenvalues $\lambda_1, \lambda_2$ and $\lambda_3$ with $\lambda_1 = \lambda_2 \neq \lambda_3$ such that $(A-\lambda_1I_3)(A-\lambda_3I_3) = O$ holds. We claim that
\[
\dim(E_{\lambda_1}) = 2\quad\text{and}\quad \dim(E_{\lambda_3}) = 1.
\]
Suppose $\dim(E_{\lambda_3}) = 2$. Since $\dim(E_{\lambda_1}) \geq 1$ and $E_{\lambda_1} \cap E_{\lambda_3} = \{\vec{0}\}$, $\dim(E_{\lambda_1}) = 1$ should hold. In a resulting basis $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ consisting of eigenvectors $\vec{x}_1 \in E_{\lambda_1}$ and $\vec{x}_2, \vec{x}_3 \in E_{\lambda_3}$, $A$ is diagonalizable with diagonal entries $\lambda_1, \lambda_3, \lambda_3$, which contradicts our original assumption. Therefore $\dim(E_{\lambda_3}) = 1$.

Now, $(A-\lambda_1I_3)(A-\lambda_3I_3) = O$ implies that
\[
\operatorname{Im}(A-\lambda_1I_3) \subseteq \operatorname{Ker}(A-\lambda_3I_3)
\]
⇒ (since $A-\lambda_1I_3 \neq O_{3\times 3}$ and $\dim(E_{\lambda_3}) = 1$) $\dim\operatorname{Im}(A-\lambda_1I_3) = r(A-\lambda_1I_3) = 1$
⇒ $\dim\operatorname{Ker}(A-\lambda_1I_3) = \dim E_{\lambda_1} = 3 - 1 = 2$.

Or, by using (3.7.31) (also, refer to Ex. <C> of Sec. 2.7.3),
\[
r(A-\lambda_1I_3) + r(A-\lambda_3I_3) - 3 \leq r(O) = 0
\Rightarrow 1 \leq r(A-\lambda_1I_3) \leq 3-2 = 1
\Rightarrow r(A-\lambda_1I_3) = 1.
\]
Hence $\dim(E_{\lambda_1}) = 2$. Since $\mathbb{R}^3 = E_{\lambda_1} \oplus E_{\lambda_3}$, $A$ is definitely diagonalizable.

Case 3: $\lambda_1 = \lambda_2 = \lambda_3$, say $\lambda$

Then (3.7.43) is simplified as
\[
A - \lambda I_3 = P^{-1}\begin{pmatrix} 0&&\\ &0&\\ &&0 \end{pmatrix}P = P^{-1}OP = O
\Rightarrow A = \lambda I_3, \tag{3.7.45}
\]
i.e. $A$ itself is a scalar matrix. □

We summarize as (refer to (2.7.73)):

The diagonalizability of a nonzero real matrix $A_{3\times 3}$ and its canonical form

Suppose the characteristic polynomial of $A$ is
\[
\det(A - tI_3) = -(t-\lambda_1)(t-\lambda_2)(t-\lambda_3),
\]
where $\lambda_1, \lambda_2$ and $\lambda_3$ are real numbers. Let
\[
E_{\lambda_i} = \operatorname{Ker}(A-\lambda_iI_3) = \{\vec{x} \in \mathbb{R}^3 \mid \vec{x}A = \lambda_i\vec{x}\},\quad i = 1,2,3
\]
be the eigenspace corresponding to $\lambda_i$. Then $A$ is diagonalizable if and only if one of the following cases happens.

(1) $\lambda_1 \neq \lambda_2 \neq \lambda_3 \neq \lambda_1$.
a. The minimal polynomial is $(t-\lambda_1)(t-\lambda_2)(t-\lambda_3)$.
b. Let $E_{\lambda_i} = \langle \vec{x}_i \rangle$ for $1 \leq i \leq 3$. Then $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ is a basis for $\mathbb{R}^3$ and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} \lambda_1&&\\ &\lambda_2&\\ &&\lambda_3 \end{pmatrix},\quad\text{where } P = \begin{pmatrix} \vec{x}_1\\ \vec{x}_2\\ \vec{x}_3 \end{pmatrix}.
\]
Define the following matrices or linear operators:
\[
A_1 = P^{-1}\begin{pmatrix} 1&&\\ &0&\\ &&0 \end{pmatrix}P,\quad
A_2 = P^{-1}\begin{pmatrix} 0&&\\ &1&\\ &&0 \end{pmatrix}P,\quad
A_3 = P^{-1}\begin{pmatrix} 0&&\\ &0&\\ &&1 \end{pmatrix}P.
\]
Then,
1. $\mathbb{R}^3 = E_{\lambda_1} \oplus E_{\lambda_2} \oplus E_{\lambda_3}$.
2. Each $A_i: \mathbb{R}^3 \to \mathbb{R}^3$ is a projection of $\mathbb{R}^3$ onto $E_{\lambda_i}$ along $E_{\lambda_j} \oplus E_{\lambda_k}$ for $1 \leq j < k \leq 3$ and $j, k \neq i$, i.e. $A_i^2 = A_i$.
3. $A_iA_j = O_{3\times 3}$ if $i \neq j$, $1 \leq i, j \leq 3$.
4. $I_3 = A_1 + A_2 + A_3$.
5. $A = \lambda_1A_1 + \lambda_2A_2 + \lambda_3A_3$.
See Fig. 3.31.

(2) $\lambda_1 = \lambda_2 \neq \lambda_3$ and their algebraic multiplicities are equal to their respective geometric dimensions, i.e. $\dim(E_{\lambda_1}) = 2$ and $\dim(E_{\lambda_3}) = 1$.
a. The minimal polynomial is $(t-\lambda_1)(t-\lambda_3)$.
b. Let $E_{\lambda_1} = \langle \vec{x}_1, \vec{x}_2 \rangle$ and $E_{\lambda_3} = \langle \vec{x}_3 \rangle$. Then $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ is a basis for $\mathbb{R}^3$ and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} \lambda_1&&\\ &\lambda_1&\\ &&\lambda_3 \end{pmatrix},\quad\text{where } P = \begin{pmatrix} \vec{x}_1\\ \vec{x}_2\\ \vec{x}_3 \end{pmatrix}.
\]
Define
\[
A_1 = P^{-1}\begin{pmatrix} 1&&\\ &1&\\ &&0 \end{pmatrix}P\quad\text{and}\quad
A_3 = P^{-1}\begin{pmatrix} 0&&\\ &0&\\ &&1 \end{pmatrix}P.
\]
Then,
1. $\mathbb{R}^3 = E_{\lambda_1} \oplus E_{\lambda_3}$.
2. Each $A_i: \mathbb{R}^3 \to \mathbb{R}^3$ is a projection of $\mathbb{R}^3$ onto $E_{\lambda_i}$ along $E_{\lambda_j}$ for $j \neq i$, i.e. $A_i^2 = A_i$ for $i = 1, 3$.
3. $A_1A_3 = A_3A_1 = O_{3\times 3}$.
4. $I_3 = A_1 + A_3$.
5. $A = \lambda_1A_1 + \lambda_3A_3$.
See Fig. 3.31 for $\lambda_1 = \lambda_2$.

(3) $\lambda_1 = \lambda_2 = \lambda_3$, say equal to $\lambda$, and $\dim(E_{\lambda}) = 3$.
a. The minimal polynomial is $t - \lambda$.
b. For any basis $B = \{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$ for $\mathbb{R}^3$,
\[
A = [A]_B = PAP^{-1} = \lambda I_3,\quad\text{where } P = \begin{pmatrix} \vec{x}_1\\ \vec{x}_2\\ \vec{x}_3 \end{pmatrix}.
\]
Namely, $A$ is a scalar matrix. (3.7.46)

Notice that the following matrices
\[
A = \begin{pmatrix} \lambda_1&0&0\\ 1&\lambda_1&0\\ 0&0&\lambda_2 \end{pmatrix},\quad
B = \begin{pmatrix} \lambda&0&0\\ 1&\lambda&0\\ 0&0&\lambda \end{pmatrix}\quad\text{and}\quad
C = \begin{pmatrix} \lambda&0&0\\ 1&\lambda&0\\ 0&1&\lambda \end{pmatrix} \tag{3.7.47}
\]
are not diagonalizable, and their respective minimal polynomials are $(t-\lambda_1)^2(t-\lambda_2)$, $(t-\lambda)^2$ and $(t-\lambda)^3$. For details, see Sec. 3.7.7.

In what follows, we list four further examples. Note that similar matrices have the same characteristic polynomials and hence the same minimal polynomials, but not conversely for matrices of order $n \geq 4$. For example, the matrices
\[
\begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ 0&0&1&0\\ 0&0&1&1 \end{pmatrix}\quad\text{and}\quad
\begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix} \tag{3.7.48}
\]
both have the same characteristic polynomial $(t-1)^4$ and the same minimal polynomial $(t-1)^2$, but they are not similar to each other. (Why? One may prove this by contradiction or refer to Sec. 3.7.7.)
Example 1 Test if
\[
A = \begin{pmatrix} 0&-1&0\\ 3&3&1\\ 1&1&1 \end{pmatrix}\quad\text{and}\quad
B = \begin{pmatrix} -1&4&2\\ -1&3&1\\ -1&2&2 \end{pmatrix}
\]
are similar.

Solution The characteristic polynomials are
\[
\det(A - tI_3) = \begin{vmatrix} -t&-1&0\\ 3&3-t&1\\ 1&1&1-t \end{vmatrix}
= -t(3-t)(1-t) - 1 + 3(1-t) + t = -(t-1)^2(t-2),
\]
\[
\det(B - tI_3) = -(t-1)^2(t-2).
\]
So both are the same and $A$ and $B$ have the common eigenvalues 1, 1 and 2. By computation,
\[
(A - I_3)(A - 2I_3) = \begin{pmatrix} -1&-1&0\\ 3&2&1\\ 1&1&0 \end{pmatrix}\begin{pmatrix} -2&-1&0\\ 3&1&1\\ 1&1&-1 \end{pmatrix}
= \begin{pmatrix} -1&\cdots&\cdots\\ \cdots&\cdots&\cdots\\ \cdots&\cdots&\cdots \end{pmatrix} \neq O,
\]
\[
(B - I_3)(B - 2I_3) = \begin{pmatrix} -2&4&2\\ -1&2&1\\ -1&2&1 \end{pmatrix}\begin{pmatrix} -3&4&2\\ -1&1&1\\ -1&2&0 \end{pmatrix} = O.
\]
So $A$ has the minimal polynomial $(t-1)^2(t-2)$, while $B$ has $(t-1)(t-2)$. Therefore $A$ and $B$ are not similar, and $B$ is diagonalizable while $A$ is not.

Equivalently, $r(A-I_3) = 2$ implies that $\dim E_1 = 3 - r(A-I_3) = 1$, which is not equal to the algebraic multiplicity 2 of the eigenvalue 1; meanwhile $r(B-I_3) = 1$ implies that $\dim E_1 = 3 - r(B-I_3) = 2$. Thus $A$ is not diagonalizable while $B$ is, and they are not similar. □

Example 2 Let $\vec{a}_1 = (-1,1,1)$, $\vec{a}_2 = (1,-1,1)$ and $\vec{a}_3 = (1,1,-1)$.

(1) Try to find linear operators mapping the tetrahedron $\Delta\vec{0}\vec{a}_1\vec{a}_2\vec{a}_3$ onto the tetrahedron $\Delta\vec{0}(-\vec{a}_1)(-\vec{a}_2)(-\vec{a}_3)$. See Fig. 3.54(a).
(2) Try to find a linear operator mapping the tetrahedron $\Delta\vec{0}\vec{a}_1\vec{a}_2\vec{a}_3$ onto the parallelogram spanned by $\vec{a}_1$ and $\vec{a}_2$. See Fig. 3.54(b).

Solution (1) There are six such possible linear operators. The simplest one, say $f_1$, among them is the one that satisfies
\[
f_1(\vec{a}_i) = -\vec{a}_i\quad\text{for } 1 \leq i \leq 3.
\]

[Fig. 3.54: (a) the tetrahedron $\Delta\vec{0}\vec{a}_1\vec{a}_2\vec{a}_3$ and its image $\Delta\vec{0}(-\vec{a}_1)(-\vec{a}_2)(-\vec{a}_3)$; (b) the tetrahedron and the parallelogram with vertices $\vec{0}$, $\vec{a}_1$, $\vec{a}_1+\vec{a}_2 = 2\vec{e}_3$ and $\vec{a}_2$.]

Fig. 3.54

In the natural basis $N = \{\vec{e}_1, \vec{e}_2, \vec{e}_3\}$,
\[
[f_1]_N = \begin{pmatrix} \vec{a}_1\\ \vec{a}_2\\ \vec{a}_3 \end{pmatrix}^{-1}\begin{pmatrix} -1&&\\ &-1&\\ &&-1 \end{pmatrix}\begin{pmatrix} \vec{a}_1\\ \vec{a}_2\\ \vec{a}_3 \end{pmatrix} = -I_3
\Rightarrow f_1(\vec{x}) = -\vec{x} = -\vec{x}I_3.
\]
It is possible that $\vec{a}_1$ and $\vec{a}_2$ are mapped to $-\vec{a}_2$ and $-\vec{a}_1$ respectively while $\vec{a}_3$ goes to $-\vec{a}_3$. Denote by $f_2$ such a linear operator. Then
\[
f_2(\vec{a}_1) = -\vec{a}_2,\quad f_2(\vec{a}_2) = -\vec{a}_1,\quad f_2(\vec{a}_3) = -\vec{a}_3. \tag{*1}
\]
\[
\Rightarrow [f_2]_N = P^{-1}\begin{pmatrix} 0&-1&0\\ -1&0&0\\ 0&0&-1 \end{pmatrix}P,\quad\text{where } P = \begin{pmatrix} \vec{a}_1\\ \vec{a}_2\\ \vec{a}_3 \end{pmatrix} = \begin{pmatrix} -1&1&1\\ 1&-1&1\\ 1&1&-1 \end{pmatrix}
\]
\[
= \frac12\begin{pmatrix} 0&1&1\\ 1&0&1\\ 1&1&0 \end{pmatrix}\begin{pmatrix} 0&-1&0\\ -1&0&0\\ 0&0&-1 \end{pmatrix}\begin{pmatrix} -1&1&1\\ 1&-1&1\\ 1&1&-1 \end{pmatrix}
= \begin{pmatrix} 0&-1&0\\ -1&0&0\\ 0&0&-1 \end{pmatrix}
\]
\[
\Rightarrow f_2(\vec{x}) = \vec{x}[f_2]_N = \vec{x}\begin{pmatrix} 0&-1&0\\ -1&0&0\\ 0&0&-1 \end{pmatrix} = -\vec{x}\begin{pmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{pmatrix}. \tag{*2}
\]
Notice that (*1) is equivalent to
\[
-f_2(\vec{e}_1) + f_2(\vec{e}_2) + f_2(\vec{e}_3) = -\vec{a}_2,\quad
f_2(\vec{e}_1) - f_2(\vec{e}_2) + f_2(\vec{e}_3) = -\vec{a}_1,\quad
f_2(\vec{e}_1) + f_2(\vec{e}_2) - f_2(\vec{e}_3) = -\vec{a}_3
\]
\[
\Rightarrow f_2(\vec{e}_1) + f_2(\vec{e}_2) + f_2(\vec{e}_3) = -(\vec{a}_1 + \vec{a}_2 + \vec{a}_3) = -(1,1,1)
\]
\[
\Rightarrow f_2(\vec{e}_1) = -\vec{e}_2,\quad f_2(\vec{e}_2) = -\vec{e}_1,\quad f_2(\vec{e}_3) = -\vec{e}_3. \tag{*3}
\]
This is just (*2). $f_2$ is diagonalizable. Similarly, both
\[
f_3(\vec{x}) = -\vec{x}\begin{pmatrix} 0&0&1\\ 0&1&0\\ 1&0&0 \end{pmatrix}\quad\text{and}\quad
f_4(\vec{x}) = -\vec{x}\begin{pmatrix} 1&0&0\\ 0&0&1\\ 0&1&0 \end{pmatrix}
\]
are another two such linear operators. The last two linear operators are
\[
f_5(\vec{x}) = -\vec{x}\begin{pmatrix} 0&1&0\\ 0&0&1\\ 1&0&0 \end{pmatrix}\quad\text{and}\quad
f_6(\vec{x}) = -\vec{x}\begin{pmatrix} 0&0&1\\ 1&0&0\\ 0&1&0 \end{pmatrix}.
\]
Both are not diagonalizable. For details, see Sec. 3.7.8.

(2) The parallelogram spanned by $\vec{a}_1$ and $\vec{a}_2$ has its vertices at $\vec{0}$, $\vec{a}_1$, $\vec{a}_1+\vec{a}_2 = 2\vec{e}_3$ and $\vec{a}_2$. Define a linear operator $g: \mathbb{R}^3 \to \mathbb{R}^3$ by
\[
g(\vec{a}_1) = \vec{a}_1,\quad g(\vec{a}_2) = \vec{a}_2,\quad g(\vec{a}_3) = \vec{a}_1 + \vec{a}_2 = 2\vec{e}_3.
\]
A process like that of (*2) or (*3) will lead to
\[
g(\vec{x}) = \vec{x}[g]_N = \vec{x}\begin{pmatrix} \frac12&-\frac12&\frac32\\ -\frac12&\frac12&\frac32\\ 0&0&1 \end{pmatrix}
\quad\text{or}\quad
[g]_B = P[g]_NP^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 1&1&0 \end{pmatrix},
\]
where $P$ is as above and $B = \{\vec{a}_1, \vec{a}_2, \vec{a}_3\}$. $g$ is diagonalizable and
\[
Q[g]_NQ^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{pmatrix},\quad\text{where } Q = \begin{pmatrix} -1&1&1\\ 1&-1&1\\ 1&1&-3 \end{pmatrix}.
\]
$g$ is a projection of $\mathbb{R}^3$ onto the subspace $\langle \vec{a}_1, \vec{a}_2 \rangle$ along $\langle (1,1,-3) \rangle$, as can be visualized in Fig. 3.54(b). The readers are urged to find more such linear operators. □

One of the main advantages of diagonalizable linear operators or matrices $A$ is that it is easy to compute the powers $A^n$ for $n \geq 1$, and for $n < 0$ if $A$ is invertible. More precisely, suppose
\[
A = P^{-1}\begin{pmatrix} \lambda_1&&\\ &\lambda_2&\\ &&\lambda_3 \end{pmatrix}P.
\]
Then:
1. $\det(A) = \lambda_1\lambda_2\lambda_3$.
2. $A$ is invertible $\Leftrightarrow \lambda_1\lambda_2\lambda_3 \neq 0$. In this case,
\[
A^{-1} = P^{-1}\begin{pmatrix} \lambda_1^{-1}&&\\ &\lambda_2^{-1}&\\ &&\lambda_3^{-1} \end{pmatrix}P.
\]
3. Hence
\[
A^n = P^{-1}\begin{pmatrix} \lambda_1^n&&\\ &\lambda_2^n&\\ &&\lambda_3^n \end{pmatrix}P.
\]
4. $\operatorname{tr}(A) = \lambda_1 + \lambda_2 + \lambda_3$.
5. For any polynomial $g(t) \in P_n(\mathbb{R})$,
\[
g(A) = P^{-1}\begin{pmatrix} g(\lambda_1)&&\\ &g(\lambda_2)&\\ &&g(\lambda_3) \end{pmatrix}P. \tag{3.7.49}
\]
These results still hold for any diagonalizable matrix of finite order.

Example 3 Use
\[
A = \begin{pmatrix} 1&-6&4\\ -2&-4&5\\ -2&-6&7 \end{pmatrix}
\]
to justify (3.7.49).

Solution The characteristic polynomial is
\[
\det(A - tI_3) = -(t+1)(t-2)(t-3).
\]
Hence $A$ is diagonalizable. For $\lambda_1 = -1$:
\[
\vec{x}(A + I_3) = (x_1\ x_2\ x_3)\begin{pmatrix} 2&-6&4\\ -2&-3&5\\ -2&-6&8 \end{pmatrix} = \vec{0}
\]
⇒ $\vec{v}_1 = (1,4,-3)$ is an associated eigenvector. For $\lambda_2 = 2$, $\vec{v}_2 = (0,1,-1)$. For $\lambda_3 = 3$, $\vec{v}_3 = (1,0,-1)$. Thus
\[
A = P^{-1}\begin{pmatrix} -1&&\\ &2&\\ &&3 \end{pmatrix}P,\quad\text{where } P = \begin{pmatrix} \vec{v}_1\\ \vec{v}_2\\ \vec{v}_3 \end{pmatrix} = \begin{pmatrix} 1&4&-3\\ 0&1&-1\\ 1&0&-1 \end{pmatrix}.
\]
Now
\[
P^{-1} = \frac12\begin{pmatrix} 1&-4&1\\ 1&-2&-1\\ 1&-4&-1 \end{pmatrix}
\]
\[
\Rightarrow A^n = P^{-1}\begin{pmatrix} (-1)^n&&\\ &2^n&\\ &&3^n \end{pmatrix}P
= \frac12\begin{pmatrix}
(-1)^n + 3^n & 4(-1)^n - 4\cdot 2^n & -3(-1)^n + 4\cdot 2^n - 3^n\\
(-1)^n - 3^n & 4(-1)^n - 2\cdot 2^n & -3(-1)^n + 2\cdot 2^n + 3^n\\
(-1)^n - 3^n & 4(-1)^n - 4\cdot 2^n & -3(-1)^n + 4\cdot 2^n + 3^n
\end{pmatrix}
\]
for $n = 0, \pm1, \pm2, \ldots$. □
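A quick sketch (assuming Python with NumPy) that compares the closed form for $A^n$ above with repeated multiplication:

```python
import numpy as np

A = np.array([[1, -6, 4], [-2, -4, 5], [-2, -6, 7]], dtype=float)

def A_power(n):
    a, b, c = (-1.0)**n, 2.0**n, 3.0**n
    return 0.5 * np.array([
        [a + c, 4*a - 4*b, -3*a + 4*b - c],
        [a - c, 4*a - 2*b, -3*a + 2*b + c],
        [a - c, 4*a - 4*b, -3*a + 4*b + c]])

for n in range(6):
    assert np.allclose(np.linalg.matrix_power(A, n), A_power(n))
```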
In a Markov process (see Application <D3>), one of the main themes is to compute
\[
\lim_{n\to\infty} \vec{x}_0A^n,
\]
where $A$ is a regular stochastic matrix and $\vec{x}_0$ is any initial probability vector. We give such an example.

Example 4 Let
\[
A = \begin{pmatrix} \frac25&\frac{1}{10}&\frac12\\ \frac15&\frac{7}{10}&\frac{1}{10}\\ \frac15&\frac15&\frac35 \end{pmatrix}.
\]
(a) Compute $\lim_{n\to\infty} A^n$.
(b) For any probability vector $\vec{x}_0 = (\alpha_1,\alpha_2,\alpha_3)$, i.e. $\alpha_1 \geq 0$, $\alpha_2 \geq 0$, $\alpha_3 \geq 0$ and $\alpha_1+\alpha_2+\alpha_3 = 1$, compute $\lim_{n\to\infty} \vec{x}_0A^n$.

Note that $A$ is a regular stochastic matrix, i.e. each entry of it is positive and each row vector of it is a probability vector.

Solution The characteristic polynomial is
\[
\det(A - tI_3) = \left(\frac25 - t\right)\left(\frac{7}{10} - t\right)\left(\frac35 - t\right) + \frac{1}{50} + \frac{1}{500}
- \frac{1}{10}\left(\frac{7}{10} - t\right) - \frac{1}{50}\left(\frac35 - t\right) - \frac{1}{50}\left(\frac25 - t\right)
\]
\[
= -\frac{1}{10}(10t^3 - 17t^2 + 8t - 1) = -\frac{1}{10}(t-1)(2t-1)(5t-1).
\]
Thus, $A$ has eigenvalues 1, $\frac12$ and $\frac15$. For $\lambda_1 = 1$:
\[
\vec{x}(A - I_3) = (x_1\ x_2\ x_3)\begin{pmatrix} -\frac35&\frac{1}{10}&\frac12\\ \frac15&-\frac{3}{10}&\frac{1}{10}\\ \frac15&\frac15&-\frac25 \end{pmatrix} = \vec{0}
\]
⇒ $\vec{v}_1 = (5,7,8)$ is an associated eigenvector. For $\lambda_2 = \frac12$, $\vec{v}_2 = (0,1,-1)$; and for $\lambda_3 = \frac15$, $\vec{v}_3 = (3,1,-4)$. Therefore,
\[
A = Q^{-1}\begin{pmatrix} 1&&\\ &\frac12&\\ &&\frac15 \end{pmatrix}Q,\quad\text{where } Q = \begin{pmatrix} 5&7&8\\ 0&1&-1\\ 3&1&-4 \end{pmatrix}.
\]
By computation,
\[
Q^{-1} = -\frac{1}{60}\begin{pmatrix} -3&36&-15\\ -3&-44&5\\ -3&16&5 \end{pmatrix}.
\]
Then
\[
\lim_{n\to\infty} A^n = \lim_{n\to\infty} Q^{-1}\begin{pmatrix} 1&&\\ &(\frac12)^n&\\ &&(\frac15)^n \end{pmatrix}Q
= Q^{-1}\begin{pmatrix} 1&0&0\\ 0&0&0\\ 0&0&0 \end{pmatrix}Q
= \begin{pmatrix} 0.25&0.35&0.40\\ 0.25&0.35&0.40\\ 0.25&0.35&0.40 \end{pmatrix}.
\]
This limit matrix has three equal row vectors
\[
\vec{p} = (0.25, 0.35, 0.40) = \frac{1}{5+7+8}\vec{v}_1 = \frac{1}{20}(5,7,8),
\]
which is the unique probability vector that is an eigenvector associated to the eigenvalue 1 of $A$. For any probability vector $\vec{x}_0 = (\alpha_1,\alpha_2,\alpha_3)$,
\[
\lim_{n\to\infty} \vec{x}_0A^n = \vec{x}_0\lim_{n\to\infty} A^n = \vec{x}_0\begin{pmatrix} \vec{p}\\ \vec{p}\\ \vec{p} \end{pmatrix}
= (\alpha_1+\alpha_2+\alpha_3)\,\vec{p} = \vec{p}.
\]
This guarantees the uniqueness of $\vec{p}$. □
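The limit can also be observed by simply iterating the chain; a minimal sketch assuming Python with NumPy, reproducing the limiting probability vector $(0.25, 0.35, 0.40)$ found above:

```python
import numpy as np

A = np.array([[0.4, 0.1, 0.5],
              [0.2, 0.7, 0.1],
              [0.2, 0.2, 0.6]])
x = np.array([1.0, 0.0, 0.0])   # any initial probability vector
for _ in range(60):
    x = x @ A                   # one step of the Markov chain
print(np.round(x, 6))           # -> [0.25 0.35 0.4]
```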

Exercises
<A>

1. For each of the following matrices A, do the following problems.


(1) Test if A is diagonalizable and justify your claim.
(2) If A is diagonalizable, try to find an invertible matrix P so that
PAP −1 is diagonal and justify (3.7.49).
(3) If $A$ is not diagonalizable, try to decide its canonical form according to (3.7.29).
\[
\text{(a) } \begin{pmatrix} 10&11&3\\ -3&-4&-3\\ -8&-8&-1 \end{pmatrix}.\quad
\text{(b) } \begin{pmatrix} 3&3&3\\ -3&-3&-3\\ 3&3&3 \end{pmatrix}.\quad
\text{(c) } \begin{pmatrix} 0&-2&1\\ 1&3&-1\\ 0&0&1 \end{pmatrix}.
\]
\[
\text{(d) } \begin{pmatrix} -1&1&2\\ 1&2&1\\ 2&1&-1 \end{pmatrix}.\quad
\text{(e) } \begin{pmatrix} 1&-1&4\\ 3&2&-1\\ 2&1&-1 \end{pmatrix}.\quad
\text{(f) } \begin{pmatrix} 1&2&1\\ 0&1&0\\ 1&3&1 \end{pmatrix}.
\]
\[
\text{(g) } \begin{pmatrix} 3&2&-2\\ 2&2&-1\\ 2&1&0 \end{pmatrix}.\quad
\text{(h) } \begin{pmatrix} 4&-3&3\\ 0&1&4\\ 2&-2&1 \end{pmatrix}.\quad
\text{(i) } \begin{pmatrix} 0&0&1\\ 0&1&2\\ 0&0&1 \end{pmatrix}.
\]
2. Show that $f_2$, $f_3$ and $f_4$ in Example 2 are diagonalizable, while $f_5$ and $f_6$ are not diagonalizable. Guess what their canonical forms are.
3. Show that the two matrices in (3.7.48) are not similar.

<B>

1. Prove Ex. <B> 4 of Sec. 2.7.6 for $3\times 3$ matrices.
2. Determine whether each of the following matrices $A$ is diagonalizable. If it is, find an invertible matrix $P_{4\times 4}$ so that $PAP^{-1}$ is diagonal. Also, justify (3.7.49).
\[
\text{(a) } \begin{pmatrix} 1&0&0&1\\ 0&1&1&1\\ 0&0&2&0\\ 0&0&0&2 \end{pmatrix}.\quad
\text{(b) } \begin{pmatrix} 0&-3&1&2\\ -2&1&-1&2\\ -2&1&-1&2\\ -2&-3&1&4 \end{pmatrix}.\quad
\text{(c) } \begin{pmatrix} 2&3&3&2\\ 3&2&2&3\\ 0&0&1&1\\ 0&0&0&1 \end{pmatrix}.
\]
<C> Abstraction and generalization
Read Ex. <C> of Sec. 2.7.6 and do problems there. Also do the following problems.

1. Test the diagonalizability of each of the following linear operators $f$. If $f$ is diagonalizable, find a basis $B$ for which $[f]_B$ is a diagonal matrix.
(a) $f: M(2;\mathbb{R}) \to M(2;\mathbb{R})$ defined by $f(A) = A^*$.
(b) $f: P_3(\mathbb{R}) \to P_3(\mathbb{R})$ defined by $f(p) = p'$, the first derivative of $p$.
(c) $f: P_3(\mathbb{R}) \to P_3(\mathbb{R})$ defined by $f(p)(x) = p(0) + p(1)(1 + x + x^2)$.
(d) $f: P_3(\mathbb{R}) \to P_3(\mathbb{R})$ defined by $f(p) = p + p'$.
(e) Denote by $C^{\infty}(\mathbb{R})$ the vector space of all infinitely differentiable functions on $\mathbb{R}$. Let $V = \langle e^x, xe^x, x^2e^x \rangle$ be the subspace of $C^{\infty}(\mathbb{R})$ generated by $e^x$, $xe^x$ and $x^2e^x$. $f: V \to V$ is defined by
\[
f(p) = p'' - 2p' + p.
\]
(f) $f: P_3(\mathbb{R}) \to P_3(\mathbb{R})$ defined by $f(p)(x) = \int_0^x p'(t)\,dt$.
(g) $f: P_3(\mathbb{R}) \to P_3(\mathbb{R})$ defined by $f(p)(x) = x^2p''(x) - xp'(x)$.
2. To compute the eigenvalues of a matrix of large order, one might adopt numerical methods such as Newton's method or the QR-decomposition of a matrix (see Secs. 4.4 and 5.4). For details, refer to Wilkinson [18] or Strang [15, 16]. In practice and on many occasions, bounds for the eigenvalues are good enough to solve problems. Hirsch (1900) and Gersgörin (1931) gave some beginning results in this direction. For our purposes in Application <D3>, we state Gersgörin's disk theorem as follows: each eigenvalue of a complex matrix $A = [a_{ij}]_{n\times n}$ lies in some disk
\[
|z - a_{jj}| \leq \sum_{i=1}^{n} |a_{ij}| - |a_{jj}|,\quad j = 1, 2, \ldots, n,
\]
in the complex plane $\mathbb{C}$.
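A sketch (assuming Python with NumPy, and an arbitrary sample matrix chosen only for illustration) that checks every eigenvalue falls in one of these column disks:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 1.0],
              [0.1, 0.4, 1.0]])
centers = np.diag(A)
radii = np.abs(A).sum(axis=0) - np.abs(centers)   # column sums, as above
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r + 1e-9 for c, r in zip(centers, radii))
```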


(a) To prove it, suppose $\lambda$ is an eigenvalue of $A$ and $\vec{x} = (x_1,\ldots,x_n)$ is an associated eigenvector. Let $|x_k| \geq |x_i|$ for $i = 1,2,\ldots,n$. Then $x_k \neq 0$. Try to show that
\[
\vec{x}A = \lambda\vec{x} \Leftrightarrow \sum_{i=1}^{n} a_{ij}x_i = \lambda x_j,\quad j = 1,2,\ldots,n,
\]
and hence
\[
|\lambda x_k - a_{kk}x_k| = |x_k||\lambda - a_{kk}| \leq |x_k|\left( \sum_{i=1}^{n} |a_{ik}| - |a_{kk}| \right).
\]
(b) As a consequence of it, show that for any eigenvalue $\lambda$ of $A$,
\[
|\lambda| \leq \min\left\{ \max_{1\leq j\leq n} \sum_{i=1}^{n} |a_{ij}|,\ \max_{1\leq i\leq n} \sum_{j=1}^{n} |a_{ij}| \right\}.
\]

3. Let $A = [a_{ij}]_{n\times n} \in M(n;\mathbb{C})$. Define
\[
\|A\|_r = \max_{1\leq i\leq n} \sum_{j=1}^{n} |a_{ij}|\quad\text{and}\quad \|A\|_c = \max_{1\leq j\leq n} \sum_{i=1}^{n} |a_{ij}|.
\]
(a) Suppose $A$ is a positive matrix, i.e. each entry $a_{ij} > 0$ for $1 \leq i, j \leq n$, and $\lambda$ is an eigenvalue of $A$ such that $|\lambda| = \|A\|_r$ or $\|A\|_c$; then
\[
\lambda = \|A\|_r \ \text{ or }\ \|A\|_c,
\]
respectively. Also, show that $E_{\lambda} = \{\vec{x} \in \mathbb{C}^n \mid \vec{x}A = \lambda\vec{x}\} = \langle (1,1,\ldots,1) \rangle$.
(b) Furthermore, suppose $A$ is a positive stochastic matrix (for the definition, see Ex. <D3>). Then any eigenvalue $\lambda$ of $A$ other than 1 satisfies $|\lambda| < 1$ and $\dim E_1 = 1$, where $E_1 = \{\vec{x} \in \mathbb{C}^n \mid \vec{x}A = \vec{x}\}$.
4. Let $A = [a_{ij}] \in M(n;\mathbb{F})$.
(a) Show that the characteristic polynomial of $A$ is
\[
\det(A - tI_n) = (-1)^nt^n + a_{n-1}t^{n-1} + \cdots + a_kt^k + \cdots + a_1t + a_0
= (-1)^nt^n + \sum_{k=1}^{n}(-1)^{n-k}\left( \sum_{1\leq i_1<\cdots<i_k\leq n}
\begin{vmatrix} a_{i_1i_1}&\cdots&a_{i_1i_k}\\ \vdots&&\vdots\\ a_{i_ki_1}&\cdots&a_{i_ki_k} \end{vmatrix} \right)t^{n-k}.
\]
Namely, the coefficient $a_{n-k}$ of $\det(A - tI_n)$ is equal to $(-1)^{n-k}$ times the sum of the $C^n_k$ principal subdeterminants (see Sec. B.6) of order $k$ of $A$.
(b) If $A$ has $n$ eigenvalues $\lambda_1, \ldots, \lambda_n$ in $\mathbb{F}$ so that
\[
\det(A - tI_n) = (-1)^n(t-\lambda_1)\cdots(t-\lambda_n),
\]
then the elementary symmetric function of order $k$ of $\lambda_1, \ldots, \lambda_n$ is
\[
\sum_{1\leq i_1<\cdots<i_k\leq n} \lambda_{i_1}\cdots\lambda_{i_k} = (-1)^{n-k}a_{n-k}
= \text{sum of the } C^n_k \text{ principal subdeterminants of order } k \text{ of } A.
\]
5. Let $A = [a_{ij}] \in M(n;\mathbb{C})$.
(a) Suppose $\lambda_0$ is a simple root of $\det(A - tI_n) = 0$. Show that there exists some $k$ so that
\[
(A - \lambda_0I_n)\begin{pmatrix} 1&\cdots&k-1&k+1&\cdots&n\\ 1&\cdots&k-1&k+1&\cdots&n \end{pmatrix} \neq 0,
\]
which is the principal subdeterminant of order $n-1$ of $\det(A - \lambda_0I_n)$ obtained by deleting the $k$th row and $k$th column (see Sec. B.6).
(b) Let $\vec{x}_0$ be an eigenvector corresponding to the simple eigenvalue $\lambda_0$. Then $\vec{x}_0$ can be expressed as
\[
(\Delta_1, \Delta_2, \ldots, \Delta_n)
\]
if $\Delta_1 = (A - \lambda_0I_n)\begin{pmatrix} 2&\cdots&n\\ 2&\cdots&n \end{pmatrix} \neq 0$ and $\Delta_i$, $2 \leq i \leq n$, is obtained from $\Delta_1$ by replacing the $(i-1)$st column by $-a_{12}, -a_{13}, \ldots, -a_{1n}$.
6. Let $\vec{x}_0 \in \mathbb{C}^n$ be a fixed vector. Show that
\[
\det(\vec{x}_0^*\vec{x}_0 - \lambda I_n) = (-1)^n[\lambda^n - (\vec{x}_0\vec{x}_0^*)\lambda^{n-1}],
\]
where $\|\vec{x}_0\|^2 = \vec{x}_0\vec{x}_0^* = \langle \vec{x}_0, \vec{x}_0 \rangle$. Try to find an invertible matrix $P_{n\times n}$ so that
\[
P(\vec{x}_0^*\vec{x}_0)P^{-1} = \operatorname{diag}[\langle \vec{x}_0, \vec{x}_0 \rangle, 0, \ldots, 0].
\]
7. Let
\[
A = \begin{pmatrix} 1&1&1&1\\ 1&i&-1&-i\\ 1&-1&1&-1\\ 1&-i&-1&i \end{pmatrix}.
\]
(a) Find the characteristic polynomial, eigenvalues and eigenvectors of $A$. Diagonalize $A$.
(b) Compute $A^k$ and their eigenvalues, for $2 \leq k \leq 4$.
8. Let
\[
A = \begin{pmatrix} a_0&a_1&a_2&a_3\\ -a_1&a_0&-a_3&a_2\\ -a_2&a_3&a_0&-a_1\\ -a_3&-a_2&a_1&a_0 \end{pmatrix}.
\]
Find the characteristic polynomial of $A$ and try to diagonalize $A$, if possible.
9. Find the characteristic polynomial of each of the following matrices and its eigenvalues, if possible:
\[
\begin{pmatrix}
\alpha_1&1&&&&0\\
-1&\alpha_2&1&&&\\
&-1&\alpha_3&1&&\\
&&\ddots&\ddots&\ddots&\\
&&&-1&\alpha_{n-1}&1\\
0&&&&-1&\alpha_n
\end{pmatrix};\qquad
\begin{pmatrix}
0&1&1&0&0&\cdots&0&0\\
0&0&1&1&0&\cdots&0&0\\
0&0&0&1&1&\cdots&0&0\\
\vdots&\vdots&\vdots&\vdots&\vdots&&\vdots&\vdots\\
0&0&0&0&0&\cdots&1&1\\
1&0&0&0&0&\cdots&0&1\\
1&1&0&0&0&\cdots&0&0
\end{pmatrix}_{n\times n};
\]
\[
\begin{pmatrix}
a_1&0&\cdots&0&b_1\\
0&a_2&\cdots&0&b_2\\
\vdots&\vdots&&\vdots&\vdots\\
0&0&\cdots&a_{n-1}&b_{n-1}\\
b_1&b_2&\cdots&b_{n-1}&a_n
\end{pmatrix}.
\]
10. Let
\[
\varphi_0(t) = 1;\qquad \varphi_k(t) = \prod_{j=1}^{k}(t - a_j),\quad 1 \leq k \leq n.
\]
Show that the characteristic polynomial of
\[
\begin{pmatrix}
a_1&1&0&\cdots&0&0\\
0&a_2&1&\cdots&0&0\\
\vdots&\vdots&\vdots&&\vdots&\vdots\\
0&0&0&\cdots&a_{n-1}&1\\
b_0&b_1&b_2&\cdots&b_{n-2}&a_n
\end{pmatrix}
\]
is $(-1)^n\left[\varphi_n(t) - \sum_{k=0}^{n-2} b_k\varphi_k(t)\right]$.
11. Suppose $A = [a_{ij}] \in M(n;\mathbb{F})$ has the characteristic polynomial
\[
\det(A - tI_n) = (-1)^nt^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0 = (-1)^n(t-\lambda_1)\cdots(t-\lambda_n).
\]
Let $s_k = \operatorname{tr}A^k$ denote the trace of $A^k$, $k = 0, 1, 2, \ldots$. Show that if $2 \leq k \leq n$,
\[
(-1)^{n-1}a_{n-1} = s_1,
\]
\[
k\cdot(-1)^{n-k}a_{n-k} = (-1)^{k-1}[s_k + (-1)^na_{n-1}s_{k-1} + \cdots + (-1)^na_{n-k+1}s_1];
\]
and if $k > n$,
\[
s_k + (-1)^na_{n-1}s_{k-1} + (-1)^na_{n-2}s_{k-2} + \cdots + (-1)^na_1s_{k-n+1} + (-1)^na_0s_{k-n} = 0.
\]
These are called Newton's identities.


12. (a) Let A, B ∈ M(2; F). Show that
(1) det(A − tI2 ) = t2 − (tr A)t + det A.
(2) AB + BA = Atr B + Btr A + I2 (tr(AB) − tr A · tr B).
(b) Let A, B ∈ M(3; F). Show that
(1) det(A − tI3 ) = −t3 + (tr A)t2 − tr(adjA)t + det A.
(2) det(A−tB) = det A−tr((adjA)B)t+tr((adjB)A)t2 −(det B)t3 .
(3) det(A + B) = det A + tr((adjA)B) + tr((adjB)A) + det B.

<D1> Application (I): The limit processes and the matrices

Suppose $a_{ij}(t)$ is a real or complex valued function defined on a set $S$ in the plane $\mathbb{R}^2$ or $\mathbb{C}$ for each $1 \leq i \leq m$, $1 \leq j \leq n$. Then
\[
A(t) = [a_{ij}(t)]_{m\times n} \in M(m,n;\mathbb{C}),\quad t \in S.
\]
If $t_0$ is a limit point of $S$ and for each $i, j$, $\lim_{t\to t_0} a_{ij}(t) = a_{ij}$ exists, then
\[
A = [a_{ij}] = \lim_{t\to t_0} A(t)
\]
is called the limit matrix of $A(t)$ as $t \to t_0$. Similarly, one defines
\[
A'(t) = [a'_{ij}(t)]_{m\times n},\qquad
\int_a^b A(t)\,dt = \left[ \int_a^b a_{ij}(t)\,dt \right]_{m\times n}.
\]
Let $A^{(k)} = [a_{ij}^{(k)}]_{m\times n} \in M(m,n;\mathbb{C})$ be a sequence of matrices of the same order. Suppose for each $i, j$, $\lim_{k\to\infty} a_{ij}^{(k)} = a_{ij}$ exists; then
\[
A = [a_{ij}]_{m\times n} = \lim_{k\to\infty} A^{(k)}
\]
is called the limit matrix of the sequence of matrices $A^{(k)}$ as $k \to \infty$.

Do the following problems.
1. Let $A = [a_{ij}]_{n\times n}$. Show that $\det A$ is a polynomial in the $n^2$ variables $a_{ij}$, $1 \leq i, j \leq n$. Hence, $\det A$ is a continuous function of its entries.
2. Try to use Ex. 1 to prove Ex. <C> 8(b) of Sec. 2.7.5.
3. Suppose $A$ and $B$ are $n\times n$ matrices.
(a) If $A$ or $B$ is invertible, show that $AB$ and $BA$ have the same characteristic polynomial.
(b) Use Ex. 1 to prove (a) when both $A$ and $B$ are non-invertible.
4. Prove
(1) $\lim_{k\to\infty}(\alpha A^{(k)}) = \alpha\lim_{k\to\infty} A^{(k)}$, $\alpha \in \mathbb{C}$.
(2) $\lim_{k\to\infty}(A^{(k)} + B^{(k)}) = \lim_{k\to\infty} A^{(k)} + \lim_{k\to\infty} B^{(k)}$.
(3) $\lim_{k\to\infty} PA^{(k)} = P\lim_{k\to\infty} A^{(k)}$ and $\lim_{k\to\infty} A^{(k)}Q = \left(\lim_{k\to\infty} A^{(k)}\right)Q$.
(4) $\lim_{k\to\infty} PA^kP^{-1} = P\left(\lim_{k\to\infty} A^k\right)P^{-1}$.
5. Let $A = [a_{ij}] \in M(n;\mathbb{C})$. Then
(i) $\lim_{k\to\infty} A^k$ exists.
⇔ (ii) a. If $\lambda$ is an eigenvalue of $A$, then either $|\lambda| < 1$ or $\lambda = 1$.
b. If 1 is an eigenvalue of $A$, then the algebraic multiplicity of 1 = the geometric multiplicity of 1.
(a) Try to prove (i) ⇒ (ii) a.
(b) Suppose $A$ is diagonalizable and (ii) a holds. Try to show that $\lim_{k\to\infty} A^k$ exists. (For a complete proof of this result, one needs Jordan canonical forms of matrices; see Sec. 3.7.7 and Ex. <C> 13(e) there.)
(c) Show that $\lim_{k\to\infty}\begin{pmatrix} 1&0\\ 1&1 \end{pmatrix}^k$ does not exist.
(d) Compute $\lim_{k\to\infty} A^k$, where $A$ is either
\[
\begin{pmatrix} 0&-\frac32&\frac12\\ -\frac12&-1&\frac12\\ -\frac52&-\frac{15}{2}&3 \end{pmatrix}\quad\text{or}\quad
\begin{pmatrix} -\frac12&0&0\\ 0&\frac13&1\\ 0&0&-\frac14 \end{pmatrix}.
\]
6. Let $A \in M(n;\mathbb{C})$. Show that
\[
\lim_{k\to\infty} A^k = O
\]
if and only if
\[
\rho(A) = \max_{1\leq j\leq n} |\lambda_j| < 1,
\]
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. $\rho(A)$ is called the spectral radius of $A$.
<D2> Application (II): Matrix exponential, etc.

It is well-known from analysis that the power series
\[
e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!} = 1 + z + \frac{1}{2!}z^2 + \cdots + \frac{1}{k!}z^k + \cdots
\]
converges absolutely on the plane $\mathbb{C}$ and uniformly on every compact subset of $\mathbb{C}$. For any fixed matrix $A = [a_{ij}] \in M(n;\mathbb{C})$, via norms for $A$ such as $\|A\|_r$ and $\|A\|_c$ as defined in Ex. <C> 3, or others (see Ex. <C> 14 of Sec. 3.7.7), it is not hard to show that
\[
\lim_{k\to\infty}\left( I_n + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{k!}A^k \right)\ \text{exists}.
\]
Therefore, we define the exponential of $A$ as this limit matrix and denote it as
\[
e^A = \sum_{k=0}^{\infty} \frac{1}{k!}A^k.
\]
For example,
\[
A = \begin{pmatrix} 1&0&1\\ 2&1&3\\ 1&0&1 \end{pmatrix}
\Rightarrow A = P^{-1}\begin{pmatrix} 0&&\\ &1&\\ &&2 \end{pmatrix}P,\quad\text{where } P = \begin{pmatrix} -1&0&1\\ 3&-1&2\\ 1&0&1 \end{pmatrix}
\]
\[
\Rightarrow \sum_{l=0}^{k} \frac{1}{l!}A^l = P^{-1}\begin{pmatrix} 1&&\\ &\sum_{l=0}^{k}\frac{1}{l!}&\\ &&\sum_{l=0}^{k}\frac{2^l}{l!} \end{pmatrix}P
\Rightarrow e^A = P^{-1}\begin{pmatrix} 1&&\\ &e&\\ &&e^2 \end{pmatrix}P.
\]
Do the following problems.
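A sketch (assuming Python with NumPy) that computes $e^A$ for this example from the eigendata and compares it with a truncated power series:

```python
import numpy as np

A = np.array([[1, 0, 1], [2, 1, 3], [1, 0, 1]], dtype=float)
P = np.array([[-1, 0, 1], [3, -1, 2], [1, 0, 1]], dtype=float)  # rows = eigenvectors
expA = np.linalg.inv(P) @ np.diag(np.exp([0.0, 1.0, 2.0])) @ P

S = np.eye(3)
term = np.eye(3)
for k in range(1, 40):          # partial sums of sum_k A^k / k!
    term = term @ A / k
    S = S + term
assert np.allclose(S, expA)
```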
1. (a) Let $A = \begin{pmatrix} a&b\\ -b&a \end{pmatrix}$, $a, b \in \mathbb{R}$. Show that
\[
e^{tA} = \begin{pmatrix} e^{at}\cos bt & e^{at}\sin bt\\ -e^{at}\sin bt & e^{at}\cos bt \end{pmatrix},\quad t \in \mathbb{R}.
\]
(b) Let
\[
A = \begin{pmatrix} 0&a&-b\\ -a&0&c\\ b&-c&0 \end{pmatrix},\quad a, b, c \in \mathbb{R}.
\]
Show that there exists an invertible matrix $P$ so that
\[
e^A = P^{-1}\begin{pmatrix} e^{ui}&0&0\\ 0&e^{-ui}&0\\ 0&0&1 \end{pmatrix}P,
\]
where $u = \sqrt{a^2+b^2+c^2}$.
2. Suppose $A_{n\times n}$ is diagonalizable and
\[
PAP^{-1} = \begin{pmatrix} \lambda_1&&\\ &\ddots&\\ &&\lambda_n \end{pmatrix}.
\]
Show that
\[
e^A = P^{-1}\begin{pmatrix} e^{\lambda_1}&&\\ &\ddots&\\ &&e^{\lambda_n} \end{pmatrix}P.
\]
For $e^A$ where $A$ has Jordan canonical form, see Ex. <C> 13(a), (d) of Sec. 3.7.7.
3. Give examples to show that, in general,
(1) $e^A\cdot e^B \neq e^B\cdot e^A$.
(2) $e^{A+B} \neq e^A\cdot e^B$.
Suppose $A, B \in M(n;\mathbb{C})$. Show that
\[
e^{t(A+B)} = e^{tA}e^{tB},\quad t \in \mathbb{C},
\]
if and only if $AB = BA$. Hence, in this case,
\[
e^{A+B} = e^Ae^B = e^Be^A.
\]
4. Prove the following.
(1) $\det e^A = e^{\operatorname{tr}(A)}$.
(2) For any $A$, $e^A$ is invertible and
\[
(e^A)^{-1} = e^{-A}.
\]
(3) $e^{PAP^{-1}} = Pe^AP^{-1}$.
(4) If $A$ is skew-symmetric, i.e. $A$ is real and $A^* = -A$, then $e^A$ is orthogonal, i.e. $(e^A)^* = e^{A^*} = e^{-A} = (e^A)^{-1}$.
(5) $\frac{d}{dt}e^{tA} = Ae^{tA} = e^{tA}\cdot A$.

Suppose $A_m \in M(n;\mathbb{C})$ for $m = 1, 2, 3, \ldots$. If $\sum_{m=0}^{k} A_m = S_k$ converges to $A$ as $k \to \infty$ (see Ex. <D1>), then the series $\sum_{m=0}^{\infty} A_m$ is said to converge to the sum $A$, denoted as
\[
A = \sum_{m=0}^{\infty} A_m.
\]
5. Let $A \in M(n;\mathbb{C})$ with spectral radius $\rho(A) < 1$ (see Ex. <D1> 6). Show that
(a) $I_n - A$ is invertible, and
(b) $(I_n - A)^{-1} = \sum_{m=0}^{\infty} A^m$ (note that $A^0 = I_n$).
In case $A$ is a nilpotent matrix of index $k$, then $(I_n - A)^{-1} = I_n + A + \cdots + A^{k-1}$.
6. (a) Let $\|A\|_1 = \sum_{i,j=1}^{n} |a_{ij}|$, where $A = [a_{ij}] \in M(n;\mathbb{C})$. Show that (see also Ex. <C> 14 of Sec. 3.7.7)
(1) $\|A\|_1 \geq 0$, and $= 0 \Leftrightarrow A = O$.
(2) $\|\alpha A\|_1 = |\alpha|\|A\|_1$, $\alpha \in \mathbb{C}$.
(3) $\|A + B\|_1 \leq \|A\|_1 + \|B\|_1$.
(4) $\|AB\|_1 \leq \|A\|_1\|B\|_1$.
(5) $|\vec{x}A|_1 \leq |\vec{x}|_1\|A\|_1$, where $\vec{x} = (x_1,\ldots,x_n) \in \mathbb{C}^n$ and $|\vec{x}|_1 = \sum_{k=1}^{n} |x_k|$.
Hence, $M(n;\mathbb{C})$ endowed with $\|\cdot\|_1$ is a Banach space and is also a Banach algebra.
(b) Suppose $A \in M(n;\mathbb{C})$ is invertible. For any $0 < \varepsilon < 1$, show that any matrix $B$ in the set
\[
\left\{ B \in M(n;\mathbb{C}) \,\Big|\, \|B - A\|_1 < \frac{\varepsilon}{\|A^{-1}\|_1} \right\}
\]
is also invertible. This means that the general linear group $GL(n;\mathbb{C})$ is an open set in $M(n;\mathbb{C})$.
7. (Weyr, 1887) Let $A \in M(n;\mathbb{C})$. Suppose the power series $\sum_{m=0}^{\infty} a_mz^m$ has positive radius $r$ of convergence, where $r = \left( \limsup_{m\to\infty} \sqrt[m]{|a_m|} \right)^{-1}$. Show that
(1) if the spectral radius $\rho(A) < r$, then $\sum_{m=0}^{\infty} a_mA^m$ converges absolutely (namely, $\sum_{m=0}^{\infty} |a_m|\|A^m\|_1 < \infty$); and
(2) if $\rho(A) > r$, then $\sum_{m=0}^{\infty} a_mA^m$ diverges (i.e. does not converge).
In particular, in case $r = +\infty$, for any $A \in M(n;\mathbb{C})$ the power series $\sum_{m=0}^{\infty} a_mA^m$ always converges absolutely. Suppose $\varphi(A) = \sum_{m=0}^{\infty} a_mA^m$, $\rho(A) < r$.
(a) Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$. Then $\varphi(\lambda_1), \ldots, \varphi(\lambda_n)$ are eigenvalues of $\varphi(A)$.
(b) Suppose $A$ is diagonalizable. Then $A$ and $\varphi(A)$ are simultaneously diagonalizable.
(c) Let
\[
A = \begin{pmatrix} \frac12&1\\ 0&\frac12 \end{pmatrix}.
\]
Show that
\[
(I_2 - A)^{-k} = \begin{pmatrix} 2^k&k\cdot 2^{k+1}\\ 0&2^k \end{pmatrix},\quad k \geq 1.
\]
(d) Let
\[
A = \begin{pmatrix} 0&a&-b\\ -a&0&c\\ b&-c&0 \end{pmatrix},\quad a, b, c \in \mathbb{R}.
\]
Show that
\[
\cos A = I_3 - \frac{\cosh u - 1}{u^2}A^2,\qquad \sin A = \frac{\sinh u}{u}A,
\]
where $u^2 = a^2+b^2+c^2$.
8. Let $A \in M(n;\mathbb{C})$ be invertible. If there exists a matrix $X \in M(n;\mathbb{C})$ so that
\[
e^X = A,
\]
then $X$ is called a logarithmic matrix of $A$ and is denoted by
\[
X = \log A.
\]
$\log A$ is a multiple-valued function of $A$.
(a) If $A$ is diagonalizable and
\[
A = P^{-1}\begin{pmatrix} \lambda_1&&\\ &\ddots&\\ &&\lambda_n \end{pmatrix}P,
\]
then
\[
\log A = P^{-1}\begin{pmatrix} \log\lambda_1&&\\ &\ddots&\\ &&\log\lambda_n \end{pmatrix}P.
\]
(b) Suppose $A$ is a lower triangular matrix with diagonal entries all equal to $\lambda \neq 0$. Note that $(A - \lambda I_n)^k = O$ for $k \geq n$. Then
\[
\log A = \log[\lambda I_n + (A - \lambda I_n)]
= (\log\lambda)I_n + \sum_{m=1}^{\infty} \frac{(-1)^{m-1}}{m\lambda^m}(A - \lambda I_n)^m
= (\log\lambda)I_n + \sum_{m=1}^{n-1} \frac{(-1)^{m-1}}{m\lambda^m}(A - \lambda I_n)^m.
\]

(c) Suppose
\[
A = P^{-1}\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{pmatrix}P,
\]
where each $A_j$ is a lower triangular matrix of order $r_j$ with diagonal entries all equal to $\lambda_j$ for $1 \leq j \leq k$, and $r_1 + \cdots + r_k = n$. Then
\[
\log A = P^{-1}\begin{pmatrix} \log A_1 & & 0 \\ & \ddots & \\ 0 & & \log A_k \end{pmatrix}P.
\]
(d) Try to determine log In .
(e) Show that
\[
\log(I_n + A) = \sum_{m=1}^{\infty} \frac{(-1)^{m-1}}{m}A^m, \quad \rho(A) < 1.
\]

(f) Let
\[
A = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}, \quad \lambda \neq 0.
\]
Show that
\[
e^{A} = e^{\lambda}\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \tfrac{1}{2} & 1 & 1 \end{pmatrix}
\quad \text{and} \quad
\log A = \begin{pmatrix} \log \lambda & 0 & 0 \\ \tfrac{1}{\lambda} & \log \lambda & 0 \\ -\tfrac{1}{2\lambda^2} & \tfrac{1}{\lambda} & \log \lambda \end{pmatrix}.
\]
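Part (f) can be spot-checked numerically for a concrete $\lambda$. The sketch below is illustrative only (it assumes SciPy; `logm` returns the principal branch, which agrees with the displayed $\log A$ when $\lambda > 0$):

```python
# Spot-check of (f) for lambda = 3 (assumes NumPy/SciPy).
import numpy as np
from scipy.linalg import expm, logm

lam = 3.0
A = np.array([[lam, 0.0, 0.0],
              [1.0, lam, 0.0],
              [0.0, 1.0, lam]])

E = np.exp(lam) * np.array([[1.0, 0.0, 0.0],
                            [1.0, 1.0, 0.0],
                            [0.5, 1.0, 1.0]])
L = np.array([[np.log(lam),     0.0,         0.0],
              [1/lam,           np.log(lam), 0.0],
              [-1/(2*lam**2),   1/lam,       np.log(lam)]])

print(np.allclose(expm(A), E))   # True
print(np.allclose(logm(A), L))   # True (principal branch, lam > 0)
```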

<D$_3$> Application (III): Markov processes

In a sequential experiment or data collection, each stage contains $n$ states. Let
\[
a_{ij} = \text{the probability that the $i$th state at time $t$ causes the $j$th state at time $t+1$},
\]
for $1 \leq i, j \leq n$. Then $a_{ij} \geq 0$. The resulting matrix
\[
M = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad \text{where } a_{ij} \geq 0 \text{ for } 1 \leq i, j \leq n \text{ and } \sum_{j=1}^{n} a_{ij} = 1 \text{ for } 1 \leq i \leq n,
\]
i.e. each row is a probability vector, is called the transition matrix or stochastic matrix or Markov matrix of this Markov process. $M$ is called positive if each entry $a_{ij} > 0$, and is called regular if some power matrix $M^r$ is positive.
For example, the matrix in Example 4 is a positive, hence regular, stochastic matrix. The following matrices
\[
\begin{pmatrix} 0.5 & 0.5 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ 0 & \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]
are stochastic but not regular.
Do the following problems.
1. Let $M = [a_{ij}]_{n\times n}$ be a non-negative matrix, i.e. each $a_{ij} \geq 0$.
(a) Show that
(1) $M$ is stochastic.
$\Leftrightarrow$ (2) If $\vec{e}_0 = (1, 1, \ldots, 1) \in \mathbb{R}^n$, then $\vec{e}_0M^* = \vec{e}_0$.
$\Leftrightarrow$ (3) If $\vec{p}$ is a probability vector, so is $\vec{p}M$.
(b) If $M$ is stochastic, so are $M^k$ for $k = 0, 1, 2, \ldots$.
(c) If $M_1$ and $M_2$ are stochastic, then both
\[
(1-t)M_1 + tM_2 \quad \text{for } 0 \leq t \leq 1, \quad \text{and} \quad M_1M_2
\]
are stochastic.
2. Let $M = [a_{ij}]_{n\times n}$ be a stochastic matrix.
(a) $M$ has eigenvalue 1.
(b) If $M$ is regular, then each eigenvalue $\lambda$ of $M$ satisfies either $|\lambda| < 1$ or $\lambda = 1$. For the eigenvalue $\lambda = 1$, we have
\[
\dim\{\vec{x} \in \mathbb{R}^n \mid \vec{x}M = \vec{x}\} = 1,
\]
i.e. the geometric multiplicity of 1 is equal to 1 (see Ex. 3(d) below).
3. Let $M = [a_{ij}]_{n\times n}$ be a regular stochastic matrix and $M^k = \big[a^{(k)}_{ij}\big]_{n\times n}$ for $k = 1, 2, 3, \ldots$. Then
(a) $M^k$ has a limit matrix as $k \to \infty$. In fact,
\[
\lim_{k\to\infty} M^k = \begin{pmatrix} p_1 & p_2 & \cdots & p_n \\ p_1 & p_2 & \cdots & p_n \\ \vdots & \vdots & & \vdots \\ p_1 & p_2 & \cdots & p_n \end{pmatrix} = \vec{e}_0^{\,*}\vec{p}_0,
\]
where $\vec{e}_0 = (1, 1, \ldots, 1)$ (see Ex. 1(a)) and $\vec{p}_0 = (p_1, \ldots, p_n)$ is a probability vector with
\[
p_j = \lim_{k\to\infty} a^{(k)}_{ij} \quad \text{for } 1 \leq i \leq n \text{ and } 1 \leq j \leq n.
\]
(b) For any probability vector $\vec{p}$,
\[
\lim_{k\to\infty} \vec{p}M^k = \vec{p}_0
\]
always holds. This $\vec{p}_0$ is called the limiting probability vector of $M$.
(c) In particular, $\vec{p}_0M^k = \vec{p}_0$ for $k = 1, 2, 3, \ldots$. Hence, $\vec{p}_0$ is the unique probability vector that is also an eigenvector associated to the eigenvalue 1 of $M$. In this case, $\vec{p}_0$ is also called a stability vector or steady-state vector for $M$.
(d) Furthermore, as an eigenvalue of $M$,
\[
\text{the algebraic multiplicity of } 1 = \text{the geometric multiplicity of } 1 = 1.
\]
(a) can be proved by using the nested interval theorem in real analysis. (b) and (c) are then consequences of (a) (see also Ex. <D$_1$> 5). While for (d), one needs the Jordan canonical form; see Ex. <C> 15 of Sec. 3.7.7 and Ex. 5 below.
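The convergence asserted in (b) can be watched directly by iterating $\vec{p} \mapsto \vec{p}M$. A minimal sketch, assuming NumPy (the positive stochastic matrix below is an arbitrary illustration, not taken from the text):

```python
# Power iteration: p -> pM converges to the steady-state vector p0.
import numpy as np

M = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])   # positive, hence regular, row-stochastic

p = np.array([1.0, 0.0, 0.0])     # any initial probability vector
for _ in range(200):
    p = p @ M

print(p)                          # the limiting probability vector p0
print(np.allclose(p @ M, p))      # True: p0 M = p0
```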
4. Let
\[
M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Show that $\lim_{k\to\infty} M^k$ does not exist but $\vec{p}_0 = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)$ satisfies $\vec{p}_0M^k = \vec{p}_0$ for $k = 1, 2, \ldots$, and hence $\vec{p}_0$ is a stability vector for $M$. Note that $M$ is not regular.

For $\vec{x} = (x_1, \ldots, x_n) \in \mathbb{C}^n$, let
\[
|\vec{x}|_1 = \sum_{i=1}^{n} |x_i|,
\]
and for $A = [a_{ij}] \in M(n; \mathbb{C})$, let
\[
\|A\|_1 = \sum_{i=1}^{n} |A_{i*}|_1 = \sum_{i,j=1}^{n} |a_{ij}|.
\]
Then, just like Ex. <D$_2$> 6(a), $M(n; \mathbb{C})$ is a Banach algebra with the norm $\|\cdot\|_1$. A vector $\vec{x} = (x_1, \ldots, x_n)$ in $\mathbb{C}^n$ is called positive (or non-negative) if $x_i > 0$ (or $x_i \geq 0$) for $1 \leq i \leq n$, and this is denoted as
\[
\vec{x} > \vec{0} \quad (\text{or } \vec{x} \geq \vec{0}).
\]
Hence, a probability vector is a nonzero non-negative vector. Define
\[
\vec{x} > \vec{y} \quad (\text{or } \vec{x} \geq \vec{y}) \iff \vec{x} - \vec{y} > \vec{0} \quad (\text{or } \geq \vec{0}).
\]
5. Let $A = [a_{ij}] \in M(n; \mathbb{C})$ be a positive matrix.
(a) $A$ has a unique positive eigenvalue $\lambda(A)$, greater than the absolute value of any other eigenvalue of $A$. Also
\[
\lambda(A) = \max\{\lambda \mid \lambda \geq 0 \text{ and there exists a nonzero } \vec{x} \geq \vec{0} \text{ so that } \vec{x}A \geq \lambda\vec{x}\}
\]
\[
= \min\{\lambda \mid \lambda > 0 \text{ and there exists a positive } \vec{x} > \vec{0} \text{ so that } \vec{x}A \leq \lambda\vec{x}\}
\]
\[
= \max_{\substack{\vec{x} \geq \vec{0} \\ \vec{x} \neq \vec{0}}} \min_{1\leq j\leq n} \frac{\sum_{i=1}^{n} a_{ij}x_i}{x_j}
= \min_{\substack{\vec{x} \geq \vec{0} \\ \vec{x} \neq \vec{0}}} \max_{1\leq j\leq n} \frac{\sum_{i=1}^{n} a_{ij}x_i}{x_j}.
\]
(b) There exists a positive eigenvector of $A$ corresponding to $\lambda(A)$. Also,
\[
\text{the geometric multiplicity of } \lambda(A) = \text{the algebraic multiplicity of } \lambda(A) = 1.
\]
(c) $\min_{1\leq j\leq n} \sum_{i=1}^{n} a_{ij} \leq \lambda(A) \leq \max_{1\leq j\leq n} \sum_{i=1}^{n} a_{ij} = \|A\|_c$.
6. (Positive matrix and positive stochastic matrix) Let $A = [a_{ij}]_{n\times n}$ be a positive matrix.
(a) $\vec{x}_0 = (d_1, \ldots, d_n)$ is a positive eigenvector of $A$ corresponding to the largest positive eigenvalue $\lambda(A)$ if and only if there exists a positive Markov matrix $M$ so that
\[
A = \lambda(A)\,D^{-1}M^*D, \quad \text{where } D = \mathrm{diag}[d_1, \ldots, d_n].
\]
Note that $\vec{x}_0$ can be chosen as the unique positive probability vector.
(b) Suppose $\vec{p}_0 = (p_1, \ldots, p_n)$ is the limiting probability vector of $M$ in (a) corresponding to 1 and $\vec{e}_0 = (1, 1, \ldots, 1)$. Then
\[
\lim_{k\to\infty} \frac{A^k}{(\lambda(A))^k} = D^{-1}(\vec{p}_0^{\,*}\vec{e}_0)D = D^{-1}\vec{p}_0^{\,*}\vec{x}_0
\]
exists as a positive matrix. Note that $\vec{e}_0D = \vec{x}_0$, as in (a).
(c) For any vector $\vec{x} \in \mathbb{R}^n$ (in particular, a non-negative vector), the limit vector
\[
\lim_{k\to\infty} \frac{\vec{x}A^k}{(\lambda(A))^k} = \vec{x}D^{-1}(\vec{p}_0^{\,*}\vec{e}_0)D = (\vec{x}D^{-1}\vec{p}_0^{\,*})\vec{x}_0
\]
always exists and is an eigenvector of $A$ corresponding to $\lambda(A)$.


(Note For more information, refer to Chung [38, 39] and Doob [40], or the simpler Kemeny and Snell [41].)

<D$_4$> Application (IV): Differential equations

The fundamental theorem of calculus says that the (simplest) differential equation
\[
\frac{dx}{dt} = ax
\]
has (all) the solutions
\[
x(t) = \alpha e^{at}, \quad \alpha \text{ any constant.}
\]
The initial condition $x(0) = \alpha_0$ will restrict $\alpha$ to be $\alpha_0$.
As a consequence, the homogeneous linear system of differential equations
\[
\frac{dx_1}{dt} = \lambda_1x_1(t), \qquad \frac{dx_2}{dt} = \lambda_2x_2(t),
\]
with initial conditions $x_1(0) = \alpha_{10}$, $x_2(0) = \alpha_{20}$, has the unique solution
\[
x_1(t) = \alpha_{10}e^{\lambda_1t}, \quad x_2(t) = \alpha_{20}e^{\lambda_2t}, \quad t \in \mathbb{R}.
\]
In matrix notation, the system can be written as
\[
\frac{d\vec{x}}{dt} = \vec{x}(t)A, \quad \text{where } \vec{x}(t) = (x_1(t), x_2(t)), \; A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \quad \vec{x}(0) = \vec{x}_0 = (\alpha_{10}, \alpha_{20}),
\]
and the solution is
\[
\vec{x}(t) = \vec{x}_0e^{tA}.
\]
Do the following problems.

1. The homogeneous linear system of differential equations
\[
\frac{dx_j}{dt} = \sum_{i=1}^{n} a_{ij}x_i(t), \quad 1 \leq j \leq n,
\]
with initial conditions $x_1(0) = \alpha_{10}, \ldots, x_n(0) = \alpha_{n0}$, can be written, in matrix notation, as
\[
\frac{d\vec{x}}{dt} = \vec{x}(t)A, \quad \text{where } \vec{x}(t) = (x_1(t), \ldots, x_n(t)) \text{ and } A = [a_{ij}]_{n\times n}, \quad \vec{x}(0) = \vec{x}_0 = (x_1(0), \ldots, x_n(0)).
\]
Then the general solution is
\[
\vec{x}(t) = \vec{\alpha}e^{tA}, \quad \vec{\alpha} \in \mathbb{R}^n \text{ or } \mathbb{C}^n,
\]
and the initial value problem has the unique solution
\[
\vec{x}(t) = \vec{x}_0e^{tA}.
\]
In particular, if $A$ is diagonalizable and
\[
A = P^{-1}DP, \quad \text{where } D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} \text{ and } P = \begin{pmatrix} \vec{v}_1 \\ \vdots \\ \vec{v}_n \end{pmatrix},
\]
then the solution is
\[
\vec{x}(t) = \vec{\alpha}e^{tA} = \vec{\alpha}P^{-1}e^{tD}P = (c_1, c_2, \ldots, c_n)e^{tD}P, \quad \text{where } \vec{\alpha}P^{-1} = (c_1, c_2, \ldots, c_n),
\]
\[
= \sum_{i=1}^{n} c_ie^{\lambda_it}\vec{v}_i \in \langle \vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n \rangle,
\]
where $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ is the fundamental system of the solution space. In case $A$ is not diagonalizable, the Jordan canonical form of $A$ is needed (see Ex. <D> of Sec. 3.7.7).
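As a sanity check of Ex. 1, one can verify numerically that $\vec{x}(t) = \vec{x}_0e^{tA}$ satisfies $d\vec{x}/dt = \vec{x}A$. A sketch assuming SciPy (the matrix $A$ below is an arbitrary illustration, not from the text):

```python
# The solution of dx/dt = xA with x(0) = x0 is x(t) = x0 e^{tA}.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
x0 = np.array([1.0, 0.0])

def x(t):
    return x0 @ expm(t * A)

h, t = 1e-6, 0.8
dx_dt = (x(t + h) - x(t - h)) / (2 * h)          # central difference
print(np.allclose(dx_dt, x(t) @ A, atol=1e-5))   # True
```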
2. For each of the following systems of differential equations, do the following problems.
(1) Find the general solutions, the fundamental system of solutions and the dimension of the solution space.
(2) Solve the initial value problem.
(a) $\dfrac{dx_1}{dt} = -x_2$, $\dfrac{dx_2}{dt} = x_1$, with $x_1(0) = 1$, $x_2(0) = -1$.
(b) $\dfrac{dx_1}{dt} = x_1 + 2x_2$, $\dfrac{dx_2}{dt} = -2x_1 + x_2$, with $x_1(0) = 2$, $x_2(0) = 0$.
(c) $\dfrac{dx_1}{dt} = x_2$, $\dfrac{dx_2}{dt} = x_1$, with $x_1(0) = -2$, $x_2(0) = 1$.
(d) $\dfrac{dx_1}{dt} = -x_2$, $\dfrac{dx_2}{dt} = -x_1$, with $x_1(0) = 2$, $x_2(0) = 0$.
(e) $\dfrac{d\vec{x}}{dt} = \vec{x}A$, where $A = \begin{pmatrix} -1 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & -1 \end{pmatrix}$ and $\vec{x}(0) = (2, 0, 4)$.
(f) $\dfrac{d\vec{x}}{dt} = \vec{x}A$, $\vec{x}(0) = (-1, -2, 3)$, where $A = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}$.
(g) $\dfrac{d\vec{x}}{dt} = \vec{x}A$, $\vec{x}(0) = (0, 0, -2)$, where $A = \begin{pmatrix} 1 & -2 & 2 \\ -1 & 0 & -1 \\ 0 & 2 & -1 \end{pmatrix}$.
(h) $\dfrac{dx_1}{dt} = x_2 + x_3$, $\dfrac{dx_2}{dt} = x_3 + x_1$, $\dfrac{dx_3}{dt} = x_1 + x_2$, with $x_1(0) = 1$, $x_2(0) = 0$, $x_3(0) = -1$.
3. Let
\[
A = \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix} = \lambda I_2 + J, \quad \text{where } J = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.
\]
(a) Show that $J^2 = O$ and hence
\[
(\lambda I_2 + J)^k = \lambda^kI_2 + k\lambda^{k-1}J \quad \text{for } k = 1, 2, 3, \ldots.
\]
(b) Use the definition and (a) to show that
\[
e^{tA} = e^{\lambda t}I_2 + te^{\lambda t}J = \begin{pmatrix} e^{\lambda t} & 0 \\ te^{\lambda t} & e^{\lambda t} \end{pmatrix}.
\]
(c) Solve
\[
\frac{d\vec{x}}{dt} = \vec{x}A.
\]
4. Solve
\[
\frac{d\vec{x}}{dt} = \vec{x}A, \quad \text{where } A = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix} \text{ or } \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}.
\]
5. To solve
\[
\frac{d^3x}{dt^3} - 6\frac{d^2x}{dt^2} + 11\frac{dx}{dt} - 6x = 0,
\]
let $x_1 = x$, $x_2 = \frac{dx}{dt}$, $x_3 = \frac{d^2x}{dt^2}$. Then the original equation is equivalent to
\[
\frac{dx_1}{dt} = x_2, \quad \frac{dx_2}{dt} = x_3, \quad \frac{dx_3}{dt} = \frac{d^3x}{dt^3} = 6x_1 - 11x_2 + 6x_3
\]
\[
\iff \frac{d\vec{x}}{dt} = \vec{x}A, \quad \text{where } A = \begin{pmatrix} 0 & 0 & 6 \\ 1 & 0 & -11 \\ 0 & 1 & 6 \end{pmatrix} \text{ and } \vec{x} = (x_1, x_2, x_3).
\]
The characteristic polynomial of $A$ is $-(t-1)(t-2)(t-3)$. For $\lambda_1 = 1$, the associated eigenvector is $\vec{v}_1 = (1, 1, 1)$; for $\lambda_2 = 2$, $\vec{v}_2 = (1, 2, 4)$; for $\lambda_3 = 3$, $\vec{v}_3 = (1, 3, 9)$. Thus
\[
A = P^{-1}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}P, \quad \text{where } P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}.
\]
Now, the general solution of the system is
\[
\vec{x} = \vec{\alpha}e^{tA} = \vec{\alpha}P^{-1}\begin{pmatrix} e^t & 0 & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{3t} \end{pmatrix}P,
\]
and the solution of the original ordinary differential equation is $c_1e^t + c_2e^{2t} + c_3e^{3t}$ for scalars $c_1, c_2, c_3$. If subject to the initial conditions
\[
x(0) = 1, \quad x'(0) = 0, \quad x''(0) = 1,
\]
then the particular solution is obtained by solving for $c_1, c_2, c_3$ from
\[
c_1 + c_2 + c_3 = 1, \quad c_1 + 2c_2 + 3c_3 = 0, \quad c_1 + 4c_2 + 9c_3 = 1.
\]
The particular solution is $\frac{7}{2}e^t - 4e^{2t} + \frac{3}{2}e^{3t}$.
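The three linear equations for $c_1, c_2, c_3$ form a transposed Vandermonde system in the eigenvalues $1, 2, 3$ and can be solved mechanically. A minimal check, assuming NumPy (not part of the original text):

```python
# Solve for c1, c2, c3 from x(0) = 1, x'(0) = 0, x''(0) = 1.
import numpy as np

V = np.array([[1.0, 1.0, 1.0],    # c1 +  c2 +  c3 = x(0)
              [1.0, 2.0, 3.0],    # c1 + 2c2 + 3c3 = x'(0)
              [1.0, 4.0, 9.0]])   # c1 + 4c2 + 9c3 = x''(0)
c = np.linalg.solve(V, np.array([1.0, 0.0, 1.0]))
print(c)                          # [ 3.5 -4.   1.5], i.e. 7/2, -4, 3/2
```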


6. The $n$th-order linear ordinary differential equation with constant coefficients
\[
\frac{d^nx}{dt^n} + a_{n-1}\frac{d^{n-1}x}{dt^{n-1}} + \cdots + a_1\frac{dx}{dt} + a_0x = 0
\]
is equivalent to the homogeneous linear differential system
\[
\frac{d\vec{x}}{dt} = \vec{x}A, \quad \text{where } A = \begin{pmatrix} 0 & \cdots & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & \ddots & \ddots & 0 & -a_{n-2} \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{pmatrix}
\]
and $\vec{x} = (x_1, \ldots, x_n)$ with $x_1 = x$, $x_2 = \frac{dx}{dt}$, \ldots, $x_n = \frac{d^{n-1}x}{dt^{n-1}}$.
(a) By Ex. 1, the system has the general solution
\[
\vec{x}(t) = \vec{\alpha}e^{tA},
\]
and the initial value problem $\vec{x}(0) = \vec{x}_0$ has the unique solution
\[
\vec{x}(t) = \vec{x}_0e^{tA}.
\]
In fact, let $\vec{v}_i(t) = (b_{i1}(t), b_{i2}(t), \ldots, b_{in}(t))$ be the $i$th row vector of $e^{tA}$; then $\{\vec{v}_1(t), \ldots, \vec{v}_n(t)\}$ forms a fundamental system of solutions and
\[
\vec{x}(t) = \sum_{i=1}^{n} \alpha_i\vec{v}_i(t), \quad \text{where } \vec{\alpha} = (\alpha_1, \ldots, \alpha_n).
\]
(b) The original equation with the initial value conditions
\[
x(0) = c_1, \quad \frac{dx}{dt}(0) = c_2, \quad \ldots, \quad \frac{d^{n-1}x}{dt^{n-1}}(0) = c_n
\]
has the unique solution, for some scalars $\alpha_1, \ldots, \alpha_n$,
\[
x(t) = \sum_{i=1}^{n} \alpha_ib_{i1}(t).
\]
In particular, if the coefficient matrix $A$ has $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then the general solution is
\[
x(t) = \sum_{i=1}^{n} \alpha_ie^{\lambda_it}.
\]
These coefficients $\alpha_1, \ldots, \alpha_n$ can be expressed uniquely in terms of $c_1, c_2, \ldots, c_n$.
7. Model after Ex. 5 and solve each of the following equations.
(a) $\dfrac{d^2x}{dt^2} - x = 0$.
(b) $\dfrac{d^2x}{dt^2} + x = 0$.
(c) $\dfrac{d^3x}{dt^3} + \dfrac{d^2x}{dt^2} - \dfrac{dx}{dt} - x = 0$, with $x(0) = 1$, $x'(0) = 0$, $x''(0) = -1$.
(d) $\dfrac{d^3x}{dt^3} + 3\dfrac{d^2x}{dt^2} + 3\dfrac{dx}{dt} + x = 0$, with $x(0) = x'(0) = x''(0) = 1$.
(e) $\dfrac{d^3x}{dt^3} - \dfrac{dx}{dt} = 0$.
8. Consider the 2nd-order system of ordinary differential equations
\[
\frac{d^2\vec{x}}{dt^2} = \vec{x}A, \quad \vec{x}(0) = \vec{x}_1, \quad \frac{d\vec{x}}{dt}(0) = \vec{x}_2, \quad \text{where } A = \begin{pmatrix} 1 & 5 \\ 2 & 4 \end{pmatrix}.
\]
Since
\[
PAP^{-1} = \begin{pmatrix} 6 & 0 \\ 0 & -1 \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} 2 & 5 \\ 1 & -1 \end{pmatrix}
\]
\[
\Rightarrow \frac{d^2\vec{x}}{dt^2}P^{-1} = \frac{d^2}{dt^2}(\vec{x}P^{-1}) = \vec{x}AP^{-1} = (\vec{x}P^{-1})(PAP^{-1})
\]
\[
\Rightarrow \frac{d^2\vec{y}}{dt^2} = \vec{y}\begin{pmatrix} 6 & 0 \\ 0 & -1 \end{pmatrix}, \quad \text{where } \vec{y} = \vec{x}P^{-1},
\]
whose general solution is
\[
y_1(t) = a_1e^{\sqrt{6}t} + b_1e^{-\sqrt{6}t}, \qquad y_2(t) = a_2\cos t + b_2\sin t.
\]
Then the original equation has the solution
\[
\vec{x}(t) = \vec{y}(t)P = (2y_1(t) + y_2(t),\; 5y_1(t) - y_2(t)).
\]
Finally, use $\vec{x}(0) = \vec{x}_1$ and $\vec{x}'(0) = \vec{x}_2$ to determine the coefficients $a_1, b_1, a_2$ and $b_2$.
9. Model after Ex. 8 and solve the following equations.
(a) $\dfrac{d^2\vec{x}}{dt^2} = \vec{x}\begin{pmatrix} 3 & -2 \\ -2 & 3 \end{pmatrix}$, $\vec{x}(0) = (0, 1)$, $\vec{x}'(0) = (-1, 2)$.
(b) $\dfrac{d^2\vec{x}}{dt^2} = \vec{x}\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 8 & -14 & 7 \end{pmatrix}$, $\vec{x}(0) = (1, 1, 1)$, $\vec{x}'(0) = (-1, -1, 0)$.
(c) $\dfrac{d^2\vec{x}}{dt^2} = \vec{x}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.
10. By introducing
\[
\vec{y}(t) = (x_1(t), x_2(t), x_1'(t), x_2'(t)),
\]
show that the 2nd-order system of differential equations $\frac{d^2\vec{x}}{dt^2} = \vec{x}A$ in Ex. 8 can be expressed as the 1st-order system
\[
\frac{d\vec{y}}{dt} = \vec{y}(t)\begin{pmatrix} 0 & 0 & 1 & 5 \\ 0 & 0 & 2 & 4 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.
\]
Use the method in Ex. 1 on the above equation to solve the original equation. Then, try to use this method to solve the equations in Ex. 9.
(Note For further readings, see Boyce and DiPrima [33] and Farlow [34] concerning dynamical systems.)

3.7.7 Jordan canonical form

For a real matrix $A_{3\times 3}$ with three real eigenvalues, at least two of them coincident, we are going to prove the counterpart of (2.7.74) in this subsection, and hence realize partial results listed in (3.7.29).
Let $A = [a_{ij}]_{3\times 3}$ be a nonzero real matrix.
Suppose $A$ has three real eigenvalues $\lambda_1, \lambda_2$ and $\lambda_3$, at least two of them coincident, and yet $A$ is not diagonalizable (refer to (3.7.46)). Recall that
\[
(A - \lambda_1I_3)(A - \lambda_2I_3)(A - \lambda_3I_3) = O_{3\times 3}
\]
and that the eigenspaces
\[
E_{\lambda_i} = \{\vec{x} \in \mathbb{R}^3 \mid \vec{x}A = \lambda_i\vec{x}\} = \mathrm{Ker}(A - \lambda_iI_3), \quad i = 1, 2, 3,
\]
are invariant subspaces of dimension at least one.

Case 1 Suppose $\lambda_1 = \lambda_2 \neq \lambda_3$ and $(A - \lambda_1I_3)(A - \lambda_3I_3) \neq O$ but
\[
(A - \lambda_1I_3)^2(A - \lambda_3I_3) = O. \tag{$*$1}
\]
Then $E_{\lambda_1} = E_{\lambda_2}$ and $\dim E_{\lambda_3} = 1$ (why? One might refer to Case 2 in Sec. 3.7.6).
Just like the proof shown in Case 1 of Sec. 3.7.6, it follows easily that $E_{\lambda_1} \cap E_{\lambda_3} = \{\vec{0}\}$.
In case $\dim E_{\lambda_1} = 2$, then $\dim(E_{\lambda_1} + E_{\lambda_3}) = 3$ shows that $\mathbb{R}^3 = E_{\lambda_1} \oplus E_{\lambda_3}$. Via Case 2 in Sec. 3.7.6, it follows that $A$ is diagonalizable and thus $(A - \lambda_1I_3)(A - \lambda_3I_3) = O$, contradicting our assumption.
Hence $\dim E_{\lambda_1} = 1$. As a byproduct,
\[
\mathbb{R}^3 \neq E_{\lambda_1} \oplus E_{\lambda_3},
\]
so $A$ is not diagonalizable.
Hence, we introduce the generalized eigenspace
\[
G_{\lambda_1} = \{\vec{x} \in \mathbb{R}^3 \mid \vec{x}(A - \lambda_1I_3)^2 = \vec{0}\} = \mathrm{Ker}((A - \lambda_1I_3)^2)
\]
corresponding to $\lambda_1$. $G_{\lambda_1}$ is an invariant subspace of $\mathbb{R}^3$ and contains the eigenspace $E_{\lambda_1}$ as a subspace. Thus, $\dim G_{\lambda_1} \geq 1$ holds.
We claim that $\dim G_{\lambda_1} = 2$. To see this, notice that ($*$1) holds if and only if
\[
\mathrm{Im}((A - \lambda_1I_3)^2) \subseteq \mathrm{Ker}(A - \lambda_3I_3)
\]
$\Rightarrow$ (since $(A - \lambda_1I_3)^2 \neq O_{3\times 3}$ and $\dim E_{\lambda_3} = 1$) $r((A - \lambda_1I_3)^2) = 1$
$\Rightarrow \dim G_{\lambda_1} = 3 - r((A - \lambda_1I_3)^2) = 3 - 1 = 2$.
Or, by using (3.7.31) (also, refer to Ex. <C> of Sec. 2.7.3),
\[
r((A - \lambda_1I_3)^2) + r(A - \lambda_3I_3) - 3 \leq r(O) = 0
\]
\[
\Rightarrow 1 \leq r((A - \lambda_1I_3)^2) \leq 3 - 2 = 1 \;\Rightarrow\; r((A - \lambda_1I_3)^2) = 1.
\]
This means that $\dim G_{\lambda_1} = 2$.



Furthermore, $G_{\lambda_1} \cap E_{\lambda_3} = \{\vec{0}\}$. In fact, take any $\vec{x}_1 \in G_{\lambda_1}$ and $\vec{x}_3 \in E_{\lambda_3}$ with $\vec{x}_1 \neq \vec{0}$, $\vec{x}_3 \neq \vec{0}$, and suppose
\[
\alpha_1\vec{x}_1 + \alpha_3\vec{x}_3 = \vec{0}
\]
for some scalars $\alpha_1$ and $\alpha_3$. If $\vec{x}_1 \in E_{\lambda_1}$, since $\lambda_1 \neq \lambda_3$, it follows that $\alpha_1 = \alpha_3 = 0$. Now, if $\vec{x}_1 \in G_{\lambda_1} - E_{\lambda_1}$, then $\vec{x}_1(A - \lambda_1I_3) \neq \vec{0}$ and $\vec{x}_1(A - \lambda_1I_3)^2 = \vec{0}$, which indicates that $\vec{x}_1(A - \lambda_1I_3)$ is an eigenvector associated to $\lambda_1$. As a consequence,
\[
\alpha_1\vec{x}_1(A - \lambda_1I_3) + \alpha_3\vec{x}_3(A - \lambda_1I_3) = \vec{0}
\]
\[
\Rightarrow \alpha_1(\lambda_1 - \lambda_3)\vec{x}_1(A - \lambda_1I_3) = \alpha_1\vec{x}_1(A - \lambda_1I_3)(A - \lambda_3I_3)
= -\alpha_3\vec{x}_3(A - \lambda_1I_3)(A - \lambda_3I_3)
= -\alpha_3\vec{x}_3(A - \lambda_3I_3)(A - \lambda_1I_3) = -\alpha_3\vec{0} = \vec{0}
\]
$\Rightarrow$ (since $\lambda_1 \neq \lambda_3$) $\alpha_1 = 0$ $\Rightarrow \alpha_3\vec{x}_3 = \vec{0}$ and hence $\alpha_3 = 0$.
This proves that $G_{\lambda_1} \cap E_{\lambda_3} = \{\vec{0}\}$.
We conclude that
\[
\mathbb{R}^3 = G_{\lambda_1} \oplus E_{\lambda_3}.
\]

Take a vector $\vec{v}_2 \in G_{\lambda_1}$ so that $\vec{v}_1 = \vec{v}_2(A - \lambda_1I_3) \neq \vec{0}$. Then $\vec{v}_1$ is an eigenvector of $A$ associated to $\lambda_1$, i.e. $\vec{v}_1 \in E_{\lambda_1}$. Then $\{\vec{v}_1, \vec{v}_2\}$ is linearly independent and hence forms a basis for $G_{\lambda_1}$. Take any basis $\{\vec{v}_3\}$ for $E_{\lambda_3}$. Together, $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ forms a basis for $\mathbb{R}^3$. Therefore,
\[
\vec{v}_1A = \lambda_1\vec{v}_1, \quad
\vec{v}_2A = \vec{v}_2(A - \lambda_1I_3) + \lambda_1\vec{v}_2 = \vec{v}_1 + \lambda_1\vec{v}_2, \quad
\vec{v}_3A = \lambda_3\vec{v}_3
\]
\[
\Rightarrow \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}A = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 1 & \lambda_1 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}\begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}
\Rightarrow PAP^{-1} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 1 & \lambda_1 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}.
\]

Case 2 Suppose $\lambda_1 = \lambda_2 = \lambda_3$, say equal to $\lambda$, and $A - \lambda I_3 \neq O$ but
\[
(A - \lambda I_3)^2 = O. \tag{$*$2}
\]
Let $E_\lambda$ be the eigenspace and $G_\lambda = \{\vec{x} \in \mathbb{R}^3 \mid \vec{x}(A - \lambda I_3)^2 = \vec{0}\}$ the generalized eigenspace of $A$ corresponding to $\lambda$. Then $\dim E_\lambda \geq 1$ and $E_\lambda \subseteq G_\lambda$ holds. Also, ($*$2) says that $\dim G_\lambda = 3$.
We claim that $\dim E_\lambda = 2$. To see this, notice first that $\dim E_\lambda \leq 2$ since $A - \lambda I_3 \neq O$. Since ($*$2) holds if and only if
\[
\mathrm{Im}(A - \lambda I_3) \subseteq \mathrm{Ker}(A - \lambda I_3),
\]
it is impossible that $\dim E_\lambda = \dim\mathrm{Ker}(A - \lambda I_3) = 1$ holds. Then $\dim E_\lambda = 2$. Or,
\[
r(A - \lambda I_3) + r(A - \lambda I_3) - 3 \leq r(O) = 0
\Rightarrow 1 \leq r(A - \lambda I_3) \leq \frac{3}{2}
\Rightarrow r(A - \lambda I_3) = 1.
\]
This is equivalent to saying that $\dim E_\lambda = 3 - r(A - \lambda I_3) = 3 - 1 = 2$.
Now, take a nonzero vector $\vec{v}_2 \in G_\lambda$ so that $\vec{v}_1 = \vec{v}_2(A - \lambda I_3) \neq \vec{0}$ is an eigenvector of $A$. There exists another nonzero vector $\vec{v}_3$ in $E_\lambda$ which is linearly independent of $\vec{v}_1$. Then $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a basis for $\mathbb{R}^3$ and
\[
\vec{v}_1A = \lambda\vec{v}_1, \quad \vec{v}_2A = \vec{v}_1 + \lambda\vec{v}_2, \quad \vec{v}_3A = \lambda\vec{v}_3
\]
\[
\Rightarrow \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}A = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix}\begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}
\Rightarrow PAP^{-1} = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}.
\]

Case 3 Suppose $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$ and $A - \lambda I_3 \neq O$, $(A - \lambda I_3)^2 \neq O$ but
\[
(A - \lambda I_3)^3 = O, \tag{$*$3}
\]
as it should be. The eigenspace is $E_\lambda$, while the generalized eigenspace is $G_\lambda = \{\vec{x} \in \mathbb{R}^3 \mid \vec{x}(A - \lambda I_3)^3 = \vec{0}\} = \mathbb{R}^3$. Note that $\dim E_\lambda \geq 1$.
We claim that $\dim E_\lambda = 1$. Obviously, it is not true that $\dim E_\lambda = 3$ since $A - \lambda I_3 \neq O$. Suppose that $\dim E_\lambda = 2$ does happen. Take a basis $\{\vec{u}_1, \vec{u}_2\}$ for $E_\lambda$ and extend it to a basis $C = \{\vec{u}_1, \vec{u}_2, \vec{u}_3\}$ for $\mathbb{R}^3$. Then
\[
[A]_C = QAQ^{-1} = \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ c_{31} & c_{32} & c_{33} \end{pmatrix}, \quad \text{where } Q = \begin{pmatrix} \vec{u}_1 \\ \vec{u}_2 \\ \vec{u}_3 \end{pmatrix}
\]
\[
\Rightarrow \det([A]_C - tI_3) = \det(A - tI_3) = (t - \lambda)^2(c_{33} - t) = -(t - \lambda)^3
\Rightarrow c_{33} = \lambda
\]
\[
\Rightarrow \vec{u}_3A = c_{31}\vec{u}_1 + c_{32}\vec{u}_2 + \lambda\vec{u}_3
\Rightarrow \vec{u}_3(A - \lambda I_3) = c_{31}\vec{u}_1 + c_{32}\vec{u}_2 \in E_\lambda
\Rightarrow \vec{u}_3(A - \lambda I_3)^2 = \vec{0}
\]
$\Rightarrow$ (plus $\vec{u}_i(A - \lambda I_3)^2 = \vec{0}$ for $i = 1, 2$) $\vec{x}(A - \lambda I_3)^2 = \vec{0}$ for all $\vec{x} \in \mathbb{R}^3$
$\Rightarrow (A - \lambda I_3)^2 = O$,
which is a contradiction to our original assumption. This proves the claim.
Since $G_\lambda = \mathbb{R}^3$, take any vector $\vec{v}_3 \in G_\lambda$ so that $\vec{v}_2 = \vec{v}_3(A - \lambda I_3) \neq \vec{0}$ and $\vec{v}_1 = \vec{v}_3(A - \lambda I_3)^2 \neq \vec{0}$. Note that $\vec{v}_1 \in E_\lambda$. Also, $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is linearly independent and hence is a basis for $\mathbb{R}^3$. Therefore,
\[
\vec{v}_1A = \lambda\vec{v}_1, \quad
\vec{v}_2A = \vec{v}_2(A - \lambda I_3) + \lambda\vec{v}_2 = \vec{v}_1 + \lambda\vec{v}_2, \quad
\vec{v}_3A = \vec{v}_3(A - \lambda I_3) + \lambda\vec{v}_3 = \vec{v}_2 + \lambda\vec{v}_3
\]
\[
\Rightarrow \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}A = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}\begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}
\Rightarrow A = P^{-1}\begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}P, \quad \text{where } P = \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}.
\]
We summarize as (refer to (3.7.47)):

The Jordan canonical form of a nonzero real matrix $A_{3\times 3}$
Suppose the characteristic polynomial of $A$ is
\[
\det(A - tI_3) = -(t - \lambda_1)(t - \lambda_2)(t - \lambda_3),
\]
where $\lambda_1, \lambda_2$ and $\lambda_3$ are real numbers and at least two of them are coincident. Denote by $E_{\lambda_i} = \mathrm{Ker}(A - \lambda_iI_3)$ the eigenspace for $i = 1, 2, 3$.
(1) $\lambda_1 = \lambda_2 \neq \lambda_3$. Let
\[
G_{\lambda_1} = \mathrm{Ker}((A - \lambda_1I_3)^2)
\]
be the generalized eigenspace associated to $\lambda_1$. Then
1. $(A - \lambda_1I_3)(A - \lambda_3I_3) \neq O$ but $(A - \lambda_1I_3)^2(A - \lambda_3I_3) = O$.
$\Leftrightarrow$ 2. $\dim G_{\lambda_1} = 2$ = the algebraic multiplicity of $\lambda_1$, $\dim E_{\lambda_1} = \dim E_{\lambda_3} = 1$ and $G_{\lambda_1} \cap E_{\lambda_3} = \{\vec{0}\}$.
$\Leftrightarrow$ 3. $\mathbb{R}^3 = G_{\lambda_1} \oplus E_{\lambda_3}$.
In this case, let $\vec{v}_2 \in G_{\lambda_1}$ be such that $\vec{v}_1 = \vec{v}_2(A - \lambda_1I_3) \neq \vec{0}$, and let $\vec{v}_3 \in E_{\lambda_3}$ be a nonzero vector. Then $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a basis for $\mathbb{R}^3$ and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 1 & \lambda_1 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}.
\]
(2) $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$. Then
1. $A - \lambda I_3 \neq O$ but $(A - \lambda I_3)^2 = O$.
$\Leftrightarrow$ 2. $\dim G_\lambda = 3$, where $G_\lambda = \mathrm{Ker}((A - \lambda I_3)^2)$, and $\dim E_\lambda = 2$.
In this case, take $\vec{v}_2 \in G_\lambda$ so that $\vec{v}_1 = \vec{v}_2(A - \lambda I_3) \in E_\lambda$ is a nonzero vector, and take $\vec{v}_3 \in E_\lambda$ which is linearly independent of $\vec{v}_1$. Then $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a basis for $\mathbb{R}^3$ and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}.
\]
(3) $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$. Then
1. $A - \lambda I_3 \neq O$, $(A - \lambda I_3)^2 \neq O$ but $(A - \lambda I_3)^3 = O$.
$\Leftrightarrow$ 2. $\dim G_\lambda = 3$, where $G_\lambda = \mathrm{Ker}((A - \lambda I_3)^3)$, and $\dim E_\lambda = 1$.
In this case, take $\vec{v}_3 \in G_\lambda$ so that $\vec{v}_2 = \vec{v}_3(A - \lambda I_3) \neq \vec{0}$ and $\vec{v}_1 = \vec{v}_3(A - \lambda I_3)^2 \neq \vec{0}$, which is in $E_\lambda$. Then $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a basis for $\mathbb{R}^3$ and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vec{v}_3 \end{pmatrix}. \tag{3.7.50}
\]
For geometric mapping properties of these canonical forms, refer to examples in Sec. 3.7.2. For the general setting concerned with Jordan canonical forms, refer to Sec. B.12. See also Exs. <C> 3, 16, 17 and 21.
Combining (3.7.46) and (3.7.50), we have

The criteria for similarity of two real matrices of order 3
Suppose both $A_{3\times 3}$ and $B_{3\times 3}$ have three real eigenvalues. Then
(1) $A$ and $B$ are similar.
$\Leftrightarrow$ (2) $A$ and $B$ have the same characteristic and minimal polynomials.
$\Leftrightarrow$ (3) $A$ and $B$ have the same Jordan canonical form (up to similarity, i.e. up to a permutation of their eigenvalues). (3.7.51)
(3.7.48) indicates that (2) $\Rightarrow$ (1) is no longer true for square matrices of order $n \geq 4$, but (1) $\Leftrightarrow$ (3) $\Rightarrow$ (2) still holds.
As an example, we continue Example 1 in Sec. 3.7.6.

Example 1 Find the Jordan canonical forms of
\[
A = \begin{pmatrix} 0 & -1 & 0 \\ 3 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} -1 & 4 & 2 \\ -1 & 3 & 1 \\ -1 & 2 & 2 \end{pmatrix} \quad \text{and} \quad
C = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\]
and determine if they are similar.

Solution By direct computation, $A$, $B$ and $C$ have the same characteristic polynomial
\[
\det(A - tI_3) = \det(B - tI_3) = \det(C - tI_3) = -(t - 1)^2(t - 2)
\]
and the respective minimal polynomials
\[
m_A(t) = m_C(t) = (t - 1)^2(t - 2), \quad \text{and} \quad m_B(t) = (t - 1)(t - 2).
\]
Therefore, they have the respective Jordan canonical forms
\[
J_A = J_C = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \quad \text{and} \quad
J_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]
Thus, only $A$ and $C$ are similar.

To find an invertible $P_{3\times 3}$ such that $PAP^{-1} = C$, let us find $R_{3\times 3}$ and $Q_{3\times 3}$ so that $RAR^{-1} = J_A$ and $QCQ^{-1} = J_C$, and hence
\[
RAR^{-1} = QCQ^{-1} \Rightarrow PAP^{-1} = C, \quad \text{where } P = Q^{-1}R.
\]
Now,
\[
A - I_3 = \begin{pmatrix} -1 & -1 & 0 \\ 3 & 2 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\Rightarrow (A - I_3)^2 = \begin{pmatrix} -2 & -1 & -1 \\ 4 & 2 & 2 \\ 2 & 1 & 1 \end{pmatrix} \text{ with rank } r((A - I_3)^2) = 1.
\]
Solve $\vec{x}(A - I_3)^2 = \vec{0}$ and we get $G_1 = \mathrm{Ker}((A - I_3)^2) = \langle (1, 0, 1), (2, 1, 0) \rangle$. Take $\vec{v}_2 = (2, 1, 0)$; then
\[
\vec{v}_1 = \vec{v}_2(A - I_3) = (2\;\;1\;\;0)\begin{pmatrix} -1 & -1 & 0 \\ 3 & 2 & 1 \\ 1 & 1 & 0 \end{pmatrix} = (1\;\;0\;\;1) \in E_1.
\]
Take $\vec{v}_3 = (2, 1, 1) \in E_2 = \mathrm{Ker}(A - 2I_3)$. Then
\[
RAR^{-1} = J_A, \quad \text{where } R = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix}.
\]
On the other hand,
\[
QCQ^{-1} = J_C, \quad \text{where } Q = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} = Q^{-1}.
\]
Hence, the required
\[
P = Q^{-1}R = QR = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.
\]
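The matrices $R$ and $P$ found above can be verified numerically. A minimal sketch, assuming NumPy (not part of the original solution):

```python
# Verify RAR^{-1} = J_A and PAP^{-1} = C for Example 1.
import numpy as np

A  = np.array([[0., -1., 0.], [3., 3., 1.], [1., 1., 1.]])
C  = np.array([[2., 0., 0.], [0., 1., 1.], [0., 0., 1.]])
R  = np.array([[1., 0., 1.], [2., 1., 0.], [2., 1., 1.]])
JA = np.array([[1., 0., 0.], [1., 1., 0.], [0., 0., 2.]])
P  = np.array([[2., 1., 1.], [2., 1., 0.], [1., 0., 1.]])

print(np.allclose(R @ A @ np.linalg.inv(R), JA))  # True
print(np.allclose(P @ A @ np.linalg.inv(P), C))   # True
```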

Example 2 Let $A$ be as in Example 1.
(a) Compute $\det A$ and $A^{-1}$.
(b) Compute $A^n$ for $n = \pm 1, \pm 2, \ldots$.
(c) Compute $e^{A}$ (see Ex. <D$_2$> of Sec. 3.7.6).
(d) Solve the differential equation (refer to Ex. <D$_4$> of Sec. 3.7.6)
\[
\frac{d\vec{x}}{dt} = \vec{x}A.
\]

Solution (a) By using $RAR^{-1} = J_A$ as shown in Example 1, $\det A = \det J_A = 2$, and
\[
A^{-1} = R^{-1}J_A^{-1}R = \begin{pmatrix} 1 & 1 & -1 \\ -2 & -1 & 2 \\ 0 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & \tfrac{1}{2} \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ -1 & 0 & 0 \\ 0 & -\tfrac{1}{2} & \tfrac{3}{2} \end{pmatrix}.
\]
Notice that the above way to compute $A^{-1}$ is not necessarily the best one. As a summary, one can try the following methods to compute $A^{-1}$:
1. direct computation by using the adjoint matrix $\mathrm{adj}\,A$ (see (3.3.2) and Sec. B.6);
2. elementary row operations (see Secs. 3.7.5 and B.5);
3. the Cayley–Hamilton theorem (see, for example, (3.7.28)). (3.7.52)
(b) Partition $J_A$ into Jordan blocks as
\[
J_A = \begin{pmatrix} D & 0 \\ 0 & E \end{pmatrix}, \quad \text{where } D = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}_{2\times 2} \text{ and } E = [2]_{1\times 1}.
\]
Then (refer to Ex. <C> of Sec. 2.7.5 if needed)
\[
J_A^n = \begin{pmatrix} D^n & 0 \\ 0 & E^n \end{pmatrix} \quad \text{for } n = \pm 1, \pm 2, \ldots.
\]
Note that $E^n = [2^n]_{1\times 1}$. To compute $D^n$, notice that
\[
D = I_2 + N, \quad \text{where } N = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\]
$\Rightarrow$ (since $N^k = O$ for $k \geq 2$) $D^2 = I_2 + 2N$, and in general
\[
D^n = I_2 + nN = \begin{pmatrix} 1 & 0 \\ n & 1 \end{pmatrix} \quad \text{for } n \geq 1.
\]
Since $D^{-1} = (I_2 + N)^{-1} = I_2 - N$, the same formula
\[
D^n = I_2 + nN = \begin{pmatrix} 1 & 0 \\ n & 1 \end{pmatrix}
\]
holds for $n \leq -1$. Hence,
\[
J_A^n = \begin{pmatrix} 1 & 0 & 0 \\ n & 1 & 0 \\ 0 & 0 & 2^n \end{pmatrix} \quad \text{for } n = \pm 1, \pm 2, \ldots
\]
\[
\Rightarrow A^n = (R^{-1}J_AR)^n = R^{-1}J_A^nR
= \begin{pmatrix} 1 & 1 & -1 \\ -2 & -1 & 2 \\ 0 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ n & 1 & 0 \\ 0 & 0 & 2^n \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix}
\]
\[
= \begin{pmatrix} 3 + n - 2^{n+1} & 1 - 2^n & 1 + n - 2^n \\ -4 - n + 2^{n+2} & -1 + 2^{n+1} & -2 - n + 2^{n+1} \\ -2 - n + 2^{n+1} & -1 + 2^n & -n + 2^n \end{pmatrix} \quad \text{for } n = \pm 1, \pm 2, \ldots.
\]
(c) By the definition of $e^{tA}$, the partial sum
\[
\sum_{k=0}^{n} \frac{t^k}{k!}A^k = R^{-1}\left(\sum_{k=0}^{n} \frac{t^k}{k!}J_A^k\right)R
= R^{-1}\begin{pmatrix} \sum_{k=0}^{n} \frac{t^k}{k!} & 0 & 0 \\[2pt] \sum_{k=1}^{n} \frac{t^k}{(k-1)!} & \sum_{k=0}^{n} \frac{t^k}{k!} & 0 \\[2pt] 0 & 0 & \sum_{k=0}^{n} \frac{1}{k!}(2t)^k \end{pmatrix}R
\]
\[
\to R^{-1}\begin{pmatrix} e^t & 0 & 0 \\ te^t & e^t & 0 \\ 0 & 0 & e^{2t} \end{pmatrix}R = R^{-1}e^{tJ_A}R \quad \text{as } n \to \infty
\]
\[
\Rightarrow e^{tA} = R^{-1}\begin{pmatrix} e^t & 0 & 0 \\ te^t & e^t & 0 \\ 0 & 0 & e^{2t} \end{pmatrix}R \quad \text{for } t \in \mathbb{R}.
\]

(d) Notice that
\[
\frac{d\vec{x}}{dt} = \vec{x}A
\iff \frac{d\vec{x}}{dt}R^{-1} = \vec{x}AR^{-1} = (\vec{x}R^{-1})(RAR^{-1})
\iff \frac{d\vec{y}}{dt} = \vec{y}J_A, \quad \text{where } \vec{y} = \vec{x}R^{-1}.
\]
Hence, according to Ex. <D$_4$> 1 of Sec. 3.7.6, the solution is $\vec{y}(t) = \vec{\alpha}e^{tJ_A}$, or
\[
\vec{x}(t) = \vec{c}\,e^{tA} = \vec{\alpha}e^{tJ_A}R, \quad \text{where } \vec{c} = \vec{\alpha}R,
\]
\[
= (\alpha_1\;\;\alpha_2\;\;\alpha_3)\begin{pmatrix} e^t & 0 & 0 \\ te^t & e^t & 0 \\ 0 & 0 & e^{2t} \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix}
= (\alpha_1\;\;\alpha_2\;\;\alpha_3)\begin{pmatrix} e^t & 0 & e^t \\ (2+t)e^t & e^t & te^t \\ 2e^{2t} & e^{2t} & e^{2t} \end{pmatrix}
\]
\[
\Rightarrow x_1(t) = [\alpha_1 + (2+t)\alpha_2]e^t + 2\alpha_3e^{2t}, \quad
x_2(t) = \alpha_2e^t + \alpha_3e^{2t}, \quad
x_3(t) = (\alpha_1 + \alpha_2t)e^t + \alpha_3e^{2t},
\]
where $\alpha_1, \alpha_2$ and $\alpha_3$ are arbitrary real constants.
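The closed forms for $A^n$ and $e^{tA}$ obtained above can be cross-checked numerically. A sketch assuming NumPy/SciPy (not part of the original solution):

```python
# Check A^n and e^{tA} of Example 2 against direct computation.
import numpy as np
from scipy.linalg import expm

A = np.array([[0., -1., 0.], [3., 3., 1.], [1., 1., 1.]])
R = np.array([[1., 0., 1.], [2., 1., 0.], [2., 1., 1.]])

def An(n):  # closed form from the Jordan decomposition
    return np.array([[3 + n - 2**(n+1),  1 - 2**n,      1 + n - 2**n],
                     [-4 - n + 2**(n+2), -1 + 2**(n+1), -2 - n + 2**(n+1)],
                     [-2 - n + 2**(n+1), -1 + 2**n,     -n + 2**n]])

print(np.allclose(np.linalg.matrix_power(A, 3), An(3)))   # True
print(np.allclose(np.linalg.inv(A), An(-1)))              # True

t = 0.5
etJ = np.array([[np.exp(t),   0.0,       0.0],
                [t*np.exp(t), np.exp(t), 0.0],
                [0.0,         0.0,       np.exp(2*t)]])
print(np.allclose(expm(t*A), np.linalg.inv(R) @ etJ @ R))  # True
```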



Example 3 Let $A$ be as in Example 1. Find a real matrix $B$ so that
\[
B^2 = A.
\]
Such a $B$ is called a square root of $A$. For the nilpotent matrices
\[
N_1 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad N_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},
\]
show that $N_1^2 = N_2$, so $N_1$ is a square root of $N_2$, but $N_1$ itself does not have any square root.

Solution From Example 1, it is known that
\[
A = R^{-1}J_AR, \quad \text{where } J_A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]
Suppose we can choose a matrix $\tilde{B}_{3\times 3}$ so that $\tilde{B}^2 = J_A$; then the matrix $B = R^{-1}\tilde{B}R$ is a required one, since $B^2 = R^{-1}\tilde{B}^2R = R^{-1}J_AR = A$ holds.
To find $\tilde{B}$ so that $\tilde{B}^2 = J_A$, consider firstly the Jordan block
\[
D = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = I_2 + N, \quad \text{where } N = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
\]
as in Example 2. Recall the power series expansion
\[
\sqrt{1 + x} = \sum_{k=0}^{\infty} C_k^{1/2}x^k, \quad
C_k^{1/2} = \frac{\frac{1}{2}\left(\frac{1}{2} - 1\right)\cdots\left(\frac{1}{2} - k + 1\right)}{k!} \text{ for } k \geq 1, \; C_0^{1/2} = 1,
\]
which converges absolutely on $(-1, 1)$. Substituting $N$ for $x$ in this expansion leads to the polynomial in $N$ (remember that $N^k = O$ for $k \geq 2$)
\[
I_2 + \frac{1}{2}N,
\]
which does satisfy
\[
\left(I_2 + \frac{1}{2}N\right)^2 = I_2 + N + \frac{1}{4}N^2 = I_2 + N = D.
\]
Therefore, we define the matrix
\[
\tilde{B} = \begin{pmatrix} I_2 + \frac{1}{2}N & 0 \\ 0 & \sqrt{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix}.
\]
Then $\tilde{B}^2 = J_A$.
Consequently, define
\[
B = R^{-1}\tilde{B}R = \begin{pmatrix} 1 & 1 & -1 \\ -2 & -1 & 2 \\ 0 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} \frac{7}{2} - 2\sqrt{2} & 1 - \sqrt{2} & \frac{3}{2} - \sqrt{2} \\[2pt] -\frac{9}{2} + 4\sqrt{2} & -1 + 2\sqrt{2} & -\frac{5}{2} + 2\sqrt{2} \\[2pt] -\frac{5}{2} + 2\sqrt{2} & -1 + \sqrt{2} & -\frac{1}{2} + \sqrt{2} \end{pmatrix}.
\]
Then $B^2 = A$ holds.
It is obvious that $N_1^2 = N_2$.
Suppose $N_1$ has a real square root $S$, i.e. $S^2 = N_1$. Then $S$, as a complex matrix, has three complex eigenvalues, all equal to zero. Therefore $S$, as a real matrix, is similar to
\[
N_3 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{or} \quad N_1 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\]
Let $P$ be an invertible matrix so that
\[
PSP^{-1} = N_3 \Rightarrow PN_1P^{-1} = PS^2P^{-1} = N_3^2 = O,
\]
which leads to $N_1 = O$, a contradiction. Similarly,
\[
PSP^{-1} = N_1 \Rightarrow PN_1P^{-1} = PS^2P^{-1} = N_1^2 = N_2
\Rightarrow PN_2P^{-1} = PN_1^2P^{-1} = (PN_1P^{-1})^2 = N_2^2 = O,
\]
which leads to $N_2 = O$, a contradiction. Hence, $N_1$ does not have any real square root.
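The square root $B$ obtained above is easy to confirm numerically; SciPy's `sqrtm` computes a (possibly different) square root by the Schur method. A sketch, assuming NumPy/SciPy (not part of the original solution):

```python
# Confirm B^2 = A for the B of Example 3.
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[0., -1., 0.], [3., 3., 1.], [1., 1., 1.]])
s = np.sqrt(2.0)
B = np.array([[ 3.5 - 2*s,  1 - s,    1.5 - s],
              [-4.5 + 4*s, -1 + 2*s, -2.5 + 2*s],
              [-2.5 + 2*s, -1 + s,   -0.5 + s]])

print(np.allclose(B @ B, A))                 # True: B is a square root of A
S = sqrtm(A)
print(np.allclose(S @ S, A))                 # scipy's root also squares to A
```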

Example 4 Let $A$ be as in Example 1. Show that $A$ and $A^*$ are similar. Find an invertible matrix $P$, even a symmetric one, so that
\[
A^* = PAP^{-1}.
\]

Solution Since $A$ and $A^*$ have the same characteristic and minimal polynomials, according to (3.7.51), they are similar.
By Example 1,
\[
A = R^{-1}J_AR, \quad \text{where } J_A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\Rightarrow A^* = R^*J_A^*(R^*)^{-1}.
\]
Now
\[
J_A^* = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\Rightarrow SJ_A^*S^{-1} = J_A, \quad \text{where } S = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} = E_{(1)(2)} = S^{-1}.
\]
Therefore,
\[
A^* = R^*S^{-1}(SJ_A^*S^{-1})S(R^*)^{-1} = R^*S^{-1}J_AS(R^*)^{-1} = R^*SRAR^{-1}S(R^*)^{-1} = PAP^{-1},
\]
where
\[
P = R^*SR = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 8 & 3 & 4 \\ 3 & 1 & 2 \\ 4 & 2 & 1 \end{pmatrix}
\]
is an invertible symmetric matrix.

Example 5 Let $A$ be as in Example 1. Show that there exist symmetric matrices $A_1$ and $A_2$ so that
\[
A = A_1A_2
\]
and at least one of $A_1$ and $A_2$ is invertible.

Solution Using the results in Example 4,
\[
A = P^{-1}A^*P = A_1A_2, \quad \text{where}
\]
\[
A_1 = P^{-1} = \begin{pmatrix} 3 & -5 & -2 \\ -5 & 8 & 4 \\ -2 & 4 & 1 \end{pmatrix}, \quad \text{and} \quad
A_2 = A^*P = \begin{pmatrix} 0 & 3 & 1 \\ -1 & 3 & 1 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 8 & 3 & 4 \\ 3 & 1 & 2 \\ 4 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 13 & 5 & 7 \\ 5 & 2 & 3 \\ 7 & 3 & 3 \end{pmatrix}.
\]
Even by definition, $A_1$ is symmetric and invertible; also, since $PA = A^*P$, $A_2^* = (A^*P)^* = P^*A = PA = A^*P = A_2$, so $A_2$ is symmetric and invertible in this case.
Let $A_1 = P^{-1}A^*$ and $A_2 = P$; then $A = A_1A_2$ will work too.
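The factorization, including the sign of $A_1 = P^{-1}$, can be confirmed numerically. A minimal sketch assuming NumPy (not part of the original solution):

```python
# Verify the symmetric factorization A = A1 A2 of Example 5.
import numpy as np

A  = np.array([[0., -1., 0.], [3., 3., 1.], [1., 1., 1.]])
P  = np.array([[8., 3., 4.], [3., 1., 2.], [4., 2., 1.]])
A1 = np.linalg.inv(P)    # symmetric since P is
A2 = A.T @ P             # symmetric since PA = A^T P

print(np.allclose(A1, A1.T), np.allclose(A2, A2.T))  # True True
print(np.allclose(A1 @ A2, A))                       # True
```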

As a bonus of this subsection, in what follows we use examples to illustrate Jordan canonical forms for matrices of order larger than 3. For the theoretical development in general, refer to Sec. B.12; for a concise argument, refer to Exs. <C> 3, 16, 17 and 21.

Example 6 Try to analyze the Jordan canonical form
\[
A = \begin{pmatrix} J_1 & & & O \\ & J_2 & & \\ & & J_3 & \\ O & & & J_4 \end{pmatrix}_{8\times 8}, \quad \text{where}
\]
\[
J_1 = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{pmatrix}, \quad
J_2 = [2]_{1\times 1}, \quad
J_3 = \begin{pmatrix} -1 & 0 \\ 1 & -1 \end{pmatrix}, \quad
J_4 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
\]
composed of 4 Jordan blocks of respective sizes $3\times 3$, $1\times 1$, $2\times 2$ and $2\times 2$.
composed of 4 Jordan blocks of respective 3 × 3, 1 × 1, 2 × 2 and 2× 2 orders.

Analysis The characteristic polynomial of A is

det(A − tI8 ) = (t − 2)4 (t + 1)2 t2 .

Remember that we use Ai∗ to denote the ith row vector of A.


For $J_1$ and $J_2$:
\[
\vec{e}_1A = A_{1*} = 2\vec{e}_1 \;\rightarrow\; \vec{e}_1(A - 2I_8) = \vec{e}_3(A - 2I_8)^3 = \vec{0},
\]
\[
\vec{e}_2A = A_{2*} = \vec{e}_1 + 2\vec{e}_2 \;\rightarrow\; \vec{e}_2(A - 2I_8) = \vec{e}_1 = \vec{e}_3(A - 2I_8)^2,
\]
\[
\vec{e}_3A = A_{3*} = \vec{e}_2 + 2\vec{e}_3 \;\rightarrow\; \vec{e}_3(A - 2I_8) = \vec{e}_2,
\]
\[
\vec{e}_4A = A_{4*} = 2\vec{e}_4 \;\rightarrow\; \vec{e}_4(A - 2I_8) = \vec{0}.
\]
Also, by computation or inspection, the ranks are (refer to (3.7.38))
\[
r(A - 2I_8) = 6, \quad r((A - 2I_8)^2) = 5, \quad r((A - 2I_8)^k) = 4 \text{ for } k \geq 3.
\]

These relations provide the following information:
1. The generalized eigenspace
\[
G_2 = \{\vec{x} \in \mathbb{R}^8 \mid \vec{x}(A - 2I_8)^4 = \vec{0}\} = \langle \vec{e}_1, \vec{e}_2, \vec{e}_3, \vec{e}_4 \rangle = \langle \vec{e}_1, \vec{e}_2, \vec{e}_3 \rangle \oplus \langle \vec{e}_4 \rangle
\]
is an invariant subspace of $\mathbb{R}^8$ of dimension 4, the algebraic multiplicity of the eigenvalue 2.
2. The eigenspace $E_2 = \mathrm{Ker}(A - 2I_8) \subseteq G_2$ has the dimension
\[
\dim E_2 = \dim \mathbb{R}^8 - r(A - 2I_8) = 8 - 6 = 2.
\]
Thus, there is a basis for $G_2$ containing exactly two eigenvectors associated to 2.
3. Select a vector $\vec{v}_3$ satisfying $\vec{v}_3(A - 2I_8)^3 = \vec{0}$ but $\vec{v}_3(A - 2I_8)^2 \neq \vec{0}$. Denote $\vec{v}_2 = \vec{v}_3(A - 2I_8)$ and $\vec{v}_1 = \vec{v}_2(A - 2I_8) = \vec{v}_3(A - 2I_8)^2$, an eigenvector in $E_2$. Then
\[
\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}
\]
is an ordered basis for $\mathrm{Ker}[(A - 2I_8)^3 \mid \langle \vec{e}_1, \vec{e}_2, \vec{e}_3 \rangle]$.
4. Take an eigenvector $\vec{v}_4 \in E_2$, linearly independent of $\vec{v}_1$. Then $\{\vec{v}_4\}$ forms a basis for $\mathrm{Ker}[(A - 2I_8) \mid \langle \vec{e}_4 \rangle]$.
5. Combining 3 and 4,
\[
B_2 = \{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}
\]
is a basis for $G_2$ while $E_2 = \langle \vec{v}_1, \vec{v}_4 \rangle$. Notice that $\mathrm{Ker}(A - 2I_8) \subseteq \mathrm{Ker}((A - 2I_8)^2) \subseteq \mathrm{Ker}((A - 2I_8)^3)$, and $\mathbb{R}^8 \supseteq \mathrm{Im}(A - 2I_8) \supseteq \mathrm{Im}((A - 2I_8)^2) \supseteq \mathrm{Im}((A - 2I_8)^3)$, and then
\[
\begin{array}{ll}
\vec{v}_1;\; \vec{v}_4 & \leftarrow \text{total number } 2 = 8 - r(A - 2I_8) = \dim\mathrm{Ker}(A - 2I_8) \\
\vec{v}_2 & \leftarrow \text{total number } 1 = r(A - 2I_8) - r((A - 2I_8)^2) = \dim\mathrm{Ker}((A - 2I_8)^2) - \dim\mathrm{Ker}(A - 2I_8) \\
\vec{v}_3 & \leftarrow \text{total number } 1 = r((A - 2I_8)^2) - r((A - 2I_8)^3) = \dim\mathrm{Ker}((A - 2I_8)^3) - \dim\mathrm{Ker}((A - 2I_8)^2)
\end{array}
\]

Since
\[
\vec{v}_1A = \vec{v}_1(A - 2I_8) + 2\vec{v}_1 = 2\vec{v}_1, \qquad
\vec{v}_2A = \vec{v}_2(A - 2I_8) + 2\vec{v}_2 = \vec{v}_1 + 2\vec{v}_2,
\]
\[
\vec{v}_3A = \vec{v}_3(A - 2I_8) + 2\vec{v}_3 = \vec{v}_2 + 2\vec{v}_3, \qquad
\vec{v}_4A = 2\vec{v}_4
\]
\[
\Rightarrow [A \mid G_2]_{B_2} = \begin{pmatrix} J_1 & \\ & J_2 \end{pmatrix}.
\]
For $J_3$:
\[
\vec{e}_5A = A_{5*} = -\vec{e}_5 \;\rightarrow\; \vec{e}_5(A + I_8) = \vec{e}_6(A + I_8)^2 = \vec{0},
\]
\[
\vec{e}_6A = A_{6*} = \vec{e}_5 - \vec{e}_6 \;\rightarrow\; \vec{e}_6(A + I_8) = \vec{e}_5,
\]
and the ranks are
\[
r(A + I_8) = 7, \quad r((A + I_8)^k) = 6 \text{ for } k \geq 2.
\]
These imply the following:
1. $G_{-1} = \mathrm{Ker}((A + I_8)^2) = \langle \vec{e}_5, \vec{e}_6 \rangle$ is an invariant subspace of dimension 2, the algebraic multiplicity of $-1$ as an eigenvalue of $A$.
2. $E_{-1} = \mathrm{Ker}(A + I_8)$ has the dimension
\[
\dim E_{-1} = \dim \mathbb{R}^8 - r(A + I_8) = 8 - 7 = 1.
\]
Thus, since $E_{-1} \subseteq G_{-1}$, there exists a basis for $G_{-1}$ containing exactly one eigenvector associated to $-1$.
3. Select a vector $\vec{v}_6$ satisfying $\vec{v}_6(A + I_8)^2 = \vec{0}$ but $\vec{v}_6(A + I_8) \neq \vec{0}$. Then $\vec{v}_5 = \vec{v}_6(A + I_8)$ is an eigenvector in $E_{-1}$.
4. Now,
\[
B_{-1} = \{\vec{v}_5, \vec{v}_6\}
\]
is an ordered basis for $G_{-1}$ while $E_{-1} = \langle \vec{v}_5 \rangle$. Notice that
\[
\begin{array}{ll}
\vec{v}_5 & \leftarrow \text{total number } 1 = 8 - r(A + I_8) = \dim\mathrm{Ker}(A + I_8) \\
\vec{v}_6 & \leftarrow \text{total number } 1 = r(A + I_8) - r((A + I_8)^2) = \dim\mathrm{Ker}((A + I_8)^2) - \dim\mathrm{Ker}(A + I_8)
\end{array}
\]
Since
\[
\vec{v}_5A = \vec{v}_5(A + I_8) - \vec{v}_5 = -\vec{v}_5, \qquad
\vec{v}_6A = \vec{v}_6(A + I_8) - \vec{v}_6 = \vec{v}_5 - \vec{v}_6
\]
\[
\Rightarrow [A \mid G_{-1}]_{B_{-1}} = [J_3].
\]

For $J_4$: similarly,
\[
[A \mid G_0]_{B_0} = [J_4],
\]
where $B_0 = \{\vec{v}_7, \vec{v}_8\}$ is an ordered basis for $G_0 = \mathrm{Ker}(A^2)$, with $\vec{v}_7, \vec{v}_8 \in G_0$ and $\vec{v}_7 = \vec{v}_8A \neq \vec{0}$ an eigenvector associated to 0.
Putting together,
\[
B = B_2 \cup B_{-1} \cup B_0 = \{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4, \vec{v}_5, \vec{v}_6, \vec{v}_7, \vec{v}_8\}
\]
is an ordered basis for $\mathbb{R}^8$, called a Jordan canonical basis of $A$, and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} [A \mid G_2]_{B_2} & & \\ & [A \mid G_{-1}]_{B_{-1}} & \\ & & [A \mid G_0]_{B_0} \end{pmatrix}
= \begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & J_3 & \\ & & & J_4 \end{pmatrix},
\]
where
\[
P = \begin{pmatrix} \vec{v}_1 \\ \vdots \\ \vec{v}_8 \end{pmatrix}_{8\times 8}.
\]
Example 7 Find a Jordan canonical basis and the Jordan canonical form for the matrix
\[
A = \begin{pmatrix} 7 & 1 & 2 & 2 \\ 1 & 4 & -1 & -1 \\ -2 & 1 & 5 & -1 \\ 1 & 1 & 2 & 8 \end{pmatrix}.
\]

Solution We follow the process in Example 6. The characteristic polynomial is
\[
\det(A - tI_4) = (t - 6)^4.
\]
So $A$ has only one eigenvalue 6, with algebraic multiplicity 4.
Compute the ranks:
\[
A - 6I_4 = \begin{pmatrix} 1 & 1 & 2 & 2 \\ 1 & -2 & -1 & -1 \\ -2 & 1 & -1 & -1 \\ 1 & 1 & 2 & 2 \end{pmatrix} \Rightarrow r(A - 6I_4) = 2;
\]
\[
(A - 6I_4)^2 = \begin{pmatrix} 0 & 3 & 3 & 3 \\ 0 & 3 & 3 & 3 \\ 0 & -6 & -6 & -6 \\ 0 & 3 & 3 & 3 \end{pmatrix} \Rightarrow r((A - 6I_4)^2) = 1;
\]
\[
r((A - 6I_4)^k) = 0 \quad \text{for } k \geq 3 \text{ (refer to (3.7.38))}.
\]
Therefore, there exists a basis $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$ for $\mathbb{R}^4$ so that
\[
\begin{array}{ll}
\vec{v}_1;\; \vec{v}_4 & \leftarrow \text{total number} = \dim E_6 = 4 - 2 = 2, \\
\vec{v}_2 & \leftarrow \text{total number} = r(A - 6I_4) - r((A - 6I_4)^2) = 1, \\
\vec{v}_3 & \leftarrow \text{total number} = r((A - 6I_4)^2) - r((A - 6I_4)^3) = 1.
\end{array}
\]
Thus, $\langle \vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4 \rangle = \mathrm{Ker}((A - 6I_4)^3)$ and $\langle \vec{v}_1, \vec{v}_4 \rangle = \mathrm{Ker}(A - 6I_4)$. We want to choose such a basis $B$ as a Jordan canonical basis of $A$.
To find a basis for $E_6 = \mathrm{Ker}(A - 6I_4)$, solve
\[
\vec{x}(A - 6I_4) = \vec{0}
\Rightarrow \begin{cases} x_1 + x_2 - 2x_3 + x_4 = 0 \\ x_1 - 2x_2 + x_3 + x_4 = 0 \end{cases}
\Rightarrow x_2 = x_3, \; x_1 = x_3 - x_4
\]
\[
\Rightarrow \vec{x} = (x_3 - x_4, x_3, x_3, x_4) = x_3(1, 1, 1, 0) + x_4(-1, 0, 0, 1).
\]
We may take $\{(1, 1, 1, 0), (-1, 0, 0, 1)\}$ as a basis for $E_6$.
To find a basis for $\mathrm{Ker}((A - 6I_4)^3)$, solve
\[
\vec{x}(A - 6I_4)^3 = \vec{0} \quad \text{and} \quad \vec{x}(A - 6I_4)^2 \neq \vec{0}
\]
$\Rightarrow$ we may choose $\vec{e}_1 = (1, 0, 0, 0)$ as a solution.
Let $\vec{v}_3 = \vec{e}_1$. Define
\[
\vec{v}_2 = \vec{v}_3(A - 6I_4) = \vec{e}_1(A - 6I_4) = (1, 1, 2, 2), \qquad
\vec{v}_1 = \vec{v}_2(A - 6I_4) = (0, 3, 3, 3).
\]
Then $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is linearly independent. Therefore, we may select $\vec{v}_4 = (1, 1, 1, 0)$ or $(-1, 0, 0, 1)$, say the former. Then $G_6 = \mathrm{Ker}((A - 6I_4)^4) = \mathrm{Ker}((A - 6I_4)^3) = \mathbb{R}^4 = \langle \vec{v}_1, \vec{v}_2, \vec{v}_3 \rangle \oplus \langle \vec{v}_4 \rangle$.
Let $B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$. $B$ is a Jordan canonical basis and
\[
[A]_B = PAP^{-1} = \begin{pmatrix} 6 & 0 & 0 & 0 \\ 1 & 6 & 0 & 0 \\ 0 & 1 & 6 & 0 \\ 0 & 0 & 0 & 6 \end{pmatrix}, \quad \text{where } P = \begin{pmatrix} 0 & 3 & 3 & 3 \\ 1 & 1 & 2 & 2 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}.
\]

We can also select a basis for $\mathrm{Ker}((A - 6I_4)^3)$ in a reverse way. Take $\vec{v}_1 = (-1, 0, 0, 1) + (1, 1, 1, 0) = (0, 1, 1, 1)$. Solve
\[
\vec{x}(A - 6I_4) = \vec{v}_1
\Rightarrow \begin{cases} x_1 + x_2 - 2x_3 + x_4 = 0 \\ x_1 - 2x_2 + x_3 + x_4 = 1 \end{cases}
\Rightarrow \vec{v}_2 = \frac{1}{3}(1, 1, 2, 2).
\]
Again, solve
\[
\vec{x}(A - 6I_4) = \vec{v}_2 \Rightarrow \vec{v}_3 = \frac{1}{3}(1, 0, 0, 0).
\]
Then $\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$, where $\vec{v}_4 = (1, 1, 1, 0)$ or $(-1, 0, 0, 1)$, is also a Jordan canonical basis of $A$.
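Either Jordan canonical basis can be verified numerically; for the first one above, a minimal sketch assuming NumPy (not part of the original solution):

```python
# Verify PAP^{-1} is the Jordan canonical form in Example 7.
import numpy as np

A = np.array([[ 7., 1.,  2.,  2.],
              [ 1., 4., -1., -1.],
              [-2., 1.,  5., -1.],
              [ 1., 1.,  2.,  8.]])
P = np.array([[0., 3., 3., 3.],
              [1., 1., 2., 2.],
              [1., 0., 0., 0.],
              [1., 1., 1., 0.]])
J = np.array([[6., 0., 0., 0.],
              [1., 6., 0., 0.],
              [0., 1., 6., 0.],
              [0., 0., 0., 6.]])

print(np.allclose(P @ A @ np.linalg.inv(P), J))  # True
```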
Example 8 Find a Jordan canonical basis and the Jordan canonical form for the matrix
\[
A = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 & 0 \\ -1 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{pmatrix}_{6\times 6}.
\]
Compute $e^{tA}$ and solve
\[
\frac{d\vec{x}}{dt} = \vec{x}A.
\]

Solution The characteristic polynomial is
\[
\det(A - tI_6) = (t - 2)^5(t - 4).
\]
So $A$ has one eigenvalue 2 of multiplicity 5 and another eigenvalue 4 of multiplicity 1.
Compute the ranks:
\[
r(A - 2I_6) = 4, \quad r((A - 2I_6)^2) = 2, \quad r((A - 2I_6)^k) = 1 \text{ for } k \geq 3;
\]
\[
r((A - 4I_6)^k) = 5 \quad \text{for } k \geq 1.
\]
These provide information about the number and the sizes of the Jordan blocks in the Jordan canonical form:
1. For $\lambda = 2$, $\dim G_2 = 5$ and a basis $\{\vec{v}_1, \ldots, \vec{v}_5\}$ must be chosen so that
\[
\begin{array}{ll}
\vec{v}_1;\; \vec{v}_4 & \leftarrow 2 = \dim \mathbb{R}^6 - r(A - 2I_6), \\
\vec{v}_2;\; \vec{v}_5 & \leftarrow 2 = r(A - 2I_6) - r((A - 2I_6)^2), \\
\vec{v}_3 & \leftarrow 1 = r((A - 2I_6)^2) - r((A - 2I_6)^3).
\end{array}
\]
2. For $\lambda = 4$, $\dim E_4 = 1$ and a basis $\{\vec{v}_6\}$ is chosen for $E_4$.
Combining together, $B = \{\vec{v}_1, \ldots, \vec{v}_5, \vec{v}_6\}$ is going to be a Jordan canonical basis of $A$ and, in $B$,
\[
[A]_B = PAP^{-1} = \begin{pmatrix} 2 & 0 & 0 & & & \\ 1 & 2 & 0 & & & \\ 0 & 1 & 2 & & & \\ & & & 2 & 0 & \\ & & & 1 & 2 & \\ & & & & & 4 \end{pmatrix}_{6\times 6}, \qquad
P = \begin{pmatrix} \vec{v}_1 \\ \vdots \\ \vec{v}_5 \\ \vec{v}_6 \end{pmatrix}.
\]
To choose a Jordan canonical basis $B$, solve
\[
\vec{x}(A - 2I_6) = \vec{0}
\Rightarrow x_2 - x_3 = 0, \; x_3 = 0, \; x_5 = x_6 = 0
\Rightarrow \vec{x} = (x_1, 0, 0, x_4, 0, 0) = x_1\vec{e}_1 + x_4\vec{e}_4.
\]
Therefore, $E_2 = \langle \vec{e}_1, \vec{e}_4 \rangle$. Next, solve
\[
\vec{x}(A - 2I_6)^3 = \vec{0} \quad \text{but} \quad \vec{x}(A - 2I_6)^2 \neq \vec{0}
\]
$\Rightarrow x_6 = 0$ but $x_3 \neq 0$.
$\Rightarrow$ Choose $\vec{v}_3 = \vec{e}_3$ (note $\vec{v}_3$ cannot be $\vec{e}_1$ or $\vec{e}_4$).
$\Rightarrow \vec{v}_2 = \vec{v}_3(A - 2I_6) = (-1, 1, 0, 0, 0, 0) = -\vec{e}_1 + \vec{e}_2$.
$\Rightarrow \vec{v}_1 = \vec{v}_2(A - 2I_6) = (1, 0, 0, 0, 0, 0) = \vec{e}_1$.
Really, it is $\vec{v}_1(A - 2I_6) = \vec{0}$. Next, solve
\[
\vec{x}(A - 2I_6)^2 = \vec{0} \quad \text{but} \quad \vec{x}(A - 2I_6) \neq \vec{0}
\]
$\Rightarrow x_3 = 0$, $x_6 = 0$ but at least one of $x_2$ and $x_5$ is not zero.
$\Rightarrow$ Choose $\vec{v}_5 = (0, 0, 0, 0, 1, 0) = \vec{e}_5$ (why not $\vec{e}_2$ and $\vec{e}_4$?)
$\Rightarrow \vec{v}_4 = \vec{v}_5(A - 2I_6) = (0, 0, 0, 1, 0, 0) = \vec{e}_4$.
Therefore, $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ and $\{\vec{v}_4, \vec{v}_5\}$ are the required bases so that
\[
\mathrm{Ker}((A - 2I_6)^5) = \langle \vec{v}_1, \vec{v}_2, \vec{v}_3 \rangle \oplus \langle \vec{v}_4, \vec{v}_5 \rangle.
\]
Finally, solve
\[
\vec{x}(A - 4I_6) = \vec{0}
\Rightarrow -2x_1 + x_2 - x_3 = 0, \;\; -2x_2 + x_3 = 0, \;\; -2x_3 = 0, \;\; -2x_4 + x_5 = 0, \;\; -2x_5 + x_6 = 0
\]
\[
\Rightarrow x_1 = x_2 = x_3 = 0, \; x_5 = 2x_4, \; x_6 = 2x_5 = 4x_4
\Rightarrow \vec{x} = x_4(0, 0, 0, 1, 2, 4)
\Rightarrow \vec{v}_6 = (0, 0, 0, 1, 2, 4).
\]

Combining together,
\[
B = \{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4, \vec{v}_5, \vec{v}_6\}
\]
is a Jordan canonical basis of $A$ and the transition matrix is
\[
P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 2 & 4 \end{pmatrix}_{6\times 6}.
\]

To compute $e^{tA}$: let $J_A = [A]_B$ for simplicity. Then
\[
A = P^{-1}J_AP \Rightarrow \text{(see Ex. <D$_2$> 4 of Sec. 3.7.6)} \quad e^{tA} = P^{-1}e^{tJ_A}P.
\]
The problem reduces to computing $e^{tJ_A}$. Now
\[
J_1 = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{pmatrix} = 2I_3 + N_1, \quad \text{where } N_1 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \text{ with } N_1^3 = O
\]
\[
\Rightarrow J_1^k = 2^kI_3 + C_k^1 \cdot 2^{k-1}N_1 + C_k^2 \cdot 2^{k-2}N_1^2
= \begin{pmatrix} 2^k & 0 & 0 \\ k \cdot 2^{k-1} & 2^k & 0 \\ \frac{k(k-1)}{2!}2^{k-2} & k \cdot 2^{k-1} & 2^k \end{pmatrix} \quad \text{for } k \geq 2
\]
\[
\Rightarrow \sum_{k=0}^{n} \frac{t^k}{k!}J_1^k \to e^{2t}\begin{pmatrix} 1 & 0 & 0 \\ \frac{t}{1!} & 1 & 0 \\ \frac{t^2}{2!} & \frac{t}{1!} & 1 \end{pmatrix} \quad \text{as } n \to \infty
\]
\[
\Rightarrow e^{tJ_1} = e^{2t}\begin{pmatrix} 1 & 0 & 0 \\ t & 1 & 0 \\ \frac{t^2}{2} & t & 1 \end{pmatrix}.
\]

Similarly,
\[
e^{tJ_2} = e^{2t}\begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}, \quad \text{where } J_2 = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}, \quad \text{and} \quad e^{t[4]} = e^{4t}[1]_{1\times 1}.
\]
Therefore,
\[
e^{tJ_A} = \begin{pmatrix} e^{tJ_1} & & \\ & e^{tJ_2} & \\ & & e^{t[4]} \end{pmatrix}
= \begin{pmatrix} e^{2t} & 0 & 0 & & & \\ te^{2t} & e^{2t} & 0 & & & \\ \frac{t^2}{2}e^{2t} & te^{2t} & e^{2t} & & & \\ & & & e^{2t} & 0 & \\ & & & te^{2t} & e^{2t} & \\ & & & & & e^{4t} \end{pmatrix}_{6\times 6}.
\]
The solution of $\frac{d\vec{x}}{dt} = \vec{x}A$ is
\[
\vec{x}(t) = \vec{\alpha}e^{tA}, \quad \text{where } \vec{\alpha} \in \mathbb{R}^6 \text{ is any constant vector.}
\]
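The block formula for $e^{tJ_A}$ and the conjugation $e^{tA} = P^{-1}e^{tJ_A}P$ can be confirmed against SciPy's `expm`. A sketch, illustrative only (assumes NumPy/SciPy):

```python
# Check e^{tA} = P^{-1} e^{tJ_A} P for Example 8.
import numpy as np
from scipy.linalg import expm

A = np.array([[ 2., 0., 0., 0., 0., 0.],
              [ 1., 2., 0., 0., 0., 0.],
              [-1., 1., 2., 0., 0., 0.],
              [ 0., 0., 0., 2., 0., 0.],
              [ 0., 0., 0., 1., 2., 0.],
              [ 0., 0., 0., 0., 1., 4.]])
P = np.array([[ 1., 0., 0., 0., 0., 0.],
              [-1., 1., 0., 0., 0., 0.],
              [ 0., 0., 1., 0., 0., 0.],
              [ 0., 0., 0., 1., 0., 0.],
              [ 0., 0., 0., 0., 1., 0.],
              [ 0., 0., 0., 1., 2., 4.]])
t = 0.3
etJ = np.zeros((6, 6))
etJ[:3, :3]   = np.exp(2*t) * np.array([[1, 0, 0], [t, 1, 0], [t*t/2, t, 1]])
etJ[3:5, 3:5] = np.exp(2*t) * np.array([[1, 0], [t, 1]])
etJ[5, 5]     = np.exp(4*t)
print(np.allclose(expm(t*A), np.linalg.inv(P) @ etJ @ P))  # True
```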

Remark The method in Example 6 to determine the Jordan canonical form of a real matrix $A_{n\times n}$ for $n \geq 2$, once the characteristic polynomial splits into linear factors as
\[
(-1)^n(t - \lambda_1)^{r_1}(t - \lambda_2)^{r_2}\cdots(t - \lambda_k)^{r_k}, \quad r_1 + r_2 + \cdots + r_k = n,
\]
is universally true for matrices over a field, in particular the complex field $\mathbb{C}$.
For the general setting in this direction, see Sec. B.12 and Ex. <C> 21.

Exercises
<A>

1. Try to use the method introduced in Example 6 to reprove (3.7.46) and


(3.7.50) more systematically.
2. For each of the following matrices A, do the following problems.
(1) Find a Jordan canonical basis of A and the Jordan canonical form
of A. Try to use (3.7.32) and Ex. <A> 5 there to find a Jordan
canonical basis and the Jordan canonical form of A∗ .
(2) Compute det A.
(3) In case A is invertible, compute A−1 by as many methods as possi-
ble and compare the advantage of one with the other (see (3.7.52)).
(4) Compute $A^n$ for $n \geq 1$ and $e^{tA}$, and solve the equation
\[
\frac{d\vec{x}}{dt} = \vec{x}A \quad \text{(refer to Ex. <D$_2$> and <D$_4$> of Sec. 3.7.6).}
\]
(5) If A is invertible, find a square root of A (see Example 3).
(6) Determine an invertible symmetric matrix R so that A∗ = RAR −1
(see Example 4).
(7) Decompose A as a product of two symmetric matrices so that at
least one of them is invertible (see Example 5).
     
1 2 3 2 2 −1 0 0 2
(a) 2 4 6 . (b)  0 1 0 . (c) 1 −2 1 .
3 6 9 −1 −2 2 2 0 0

     
0 4 2 2 0 0 −1 1 0
(d) −3 8 3 . (e)  2 2 0 . (f)  0 −1 1 .
4 −8 −2 −2 1 2 0 0 −1
     
−3 3 −2 1 0 1 2 −1 −1
(g) −7 6 −3 . (h)  1 0 2 . (i)  2 −1 −2 .
1 −1 2 −1 0 3 −1 1 2
     
5 −3 2 1 −3 4 4 6 −15

(j) 6 −4 4 . (k) 4 −7 8 . (l) 3 4 −12 .
  
4 −4 5 6 −7 7 2 3 −8

<B>

1. For each of the following matrices A, do the same problems as indicated


in Ex. <A> 2.

   
3 0 0 0 1 0 0 0
 1 3 0 0  1 1 0 0
(a)  
 0 1 3 0 . (b) 
 0 1 2 0 .

−1 1 0 3 −1 0 1 2
   
0 −2 −2 −2 7 1 −2 1
−3 1 1 −3 1 4 1 1
(c) 
 1 −1 −1
. (d)  .
1 2 −1 5 2
2 2 2 4 2 −1 −1 8
 
  2 −2 −2 −2 2
2 0 0 0 −4
−1 0 −2 −6 1
3 1 −1  
(e) 
 0 −1 1
. 
(f)  2 1 3 3 0
 .
0  2 3 3 7 2
1 0 0 3
0 0 0 0 5 5×5
   
1 0 0 0 0 0 5 1 1 0 0 0
0 0 0 5 1 0 0 0
 1 0 0 0   
   
0 1 1 0 1 0 0 0 5 0 0 0
(g)   . (h)   .
0 1 0 1 0 0 0 0 0 5 1 −1
   
1 0 0 0 1 0 0 0 0 0 5 1
0 −1 0 0 −1 1 6×6 0 0 0 0 0 5 6×6

 
−1 0 2 −2 0 0 −1
 0 0 0 0 −1
 1 1 
−1 2 −1 0 0 0
 0 
 
(i)  1 0 −1 2 0 0 1 .
 
 1 0 −1 1 1 0 2
 
 3 0 −6 3 0 1 4
0 0 0 0 0 0 1 7×7

2. A matrix $A \in M(n; \mathbb{R})$ (or $M(n; \mathbb{F})$) is called a nilpotent matrix of index $k$ if $A^{k-1} \neq O$ but $A^k = O$ (see Ex. 7 in Sec. B.4). Verify that each of the following matrices is nilpotent and find its Jordan canonical form (a rank-based check is sketched after the list).
(a) $\begin{pmatrix} 0 & a & b \\ 0 & 0 & c \\ 0 & 0 & 0 \end{pmatrix}$. \quad
(b) $\begin{pmatrix} 1 & -1 & -1 \\ 2 & -2 & -2 \\ -1 & 1 & 1 \end{pmatrix}$. \quad
(c) $\begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$.
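For a nilpotent matrix, the Jordan structure is completely determined by the ranks of its powers (Ex. <C> 3 below makes this precise): $r(A^{i-1}) - r(A^i)$ counts the Jordan blocks of size $\geq i$. A minimal sketch for matrix (c), assuming NumPy (not part of the original exercise):

```python
# Read off the Jordan structure of a nilpotent matrix from rank data.
import numpy as np

A = np.array([[0., 0., 0., 0., 1.,  0.],
              [0., 0., 1., 1., 0., -1.],
              [0., 0., 0., 0., 0.,  0.],
              [0., 0., 0., 0., 0.,  0.],
              [0., 0., 1., 0., 0., -1.],
              [0., 0., 0., 0., 0.,  0.]])

ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(A, i)) for i in range(4)]
print(ranks)                                        # [6, 3, 1, 0]
print([ranks[i] - ranks[i+1] for i in range(3)])    # [3, 2, 1]:
# three blocks in all; two of size >= 2; one of size >= 3,
# i.e. Jordan blocks of sizes 3, 2 and 1.
```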
<C> Abstraction and generalization
Read Sec. B.12 and try your best to do exercises there.
Besides, do the following problems.

1. Find the Jordan canonical form for each of the following matrices.
   
1 −1 O 0 1 O
 1 −1   0 1 
   
 .. ..   .. .. 
(a)  . .  . (b)  . .  .
   
 1 −1   0 1 
O 1 n×n
O 0 n×n

(c) The square power of the matrix in (b).


 
α 0 1 0 O
0 α 0 
 1 
 
 .. .. .. 
(d) 

. . . 
 , n ≥ 3.
 α 0 1
 
 0 α 0
O 0 0 α n×n

 
0 1 0 ··· 0 0  
0 0 1 · · · 0 0 0 0 ··· 0 a1
 
. . .   ··· 0
. . . .. ..  0 0 a2 
. . . . .  .. 
(e)  . . . ..  . (f)  ... ..
.
..
. . .
. . . ..   
. . . . . 0 ··· 0
  an−1 0
0 0 0 · · · 0 1 an 0 ··· 0 0 n×n
1 0 0 ··· 0 0 n×n
 
α0 α1 α2 ··· αn−2 αn−1
αn−1 α α1 ··· αn−3 αn−2 
 0 
αn−2 αn−1 α0 ··· αn−4 αn−3 
(g)  , α0 , α1 , . . . , αn−1 ∈R (or C).
 . .. .. .. .. 
 .. . . . . 
α1 α2 α3 ··· αn−1 α0
 
a0 a1 a2 ··· an−1
µan−1 a0 a1 ··· an−2 
 
µan−2 µan−1 a0 ··· an−3 
(h)  , a0 , a1 , . . . , an−1 , µ ∈ R (or C).
 . .. .. .. 
 .. . . . 
µa1 µa2 µa3 ··· a0
 
c0 c1 c2 c3 c4
c1 c2 + c1 a c3 + c2 a c4 + c3 a c0 + c4 a
 
(i) 
c2 c3 + c2 a c4 + c3 a c0 + c4 a c1  ,
c3 c4 + c3 a c0 + c4 a c1 c2 
c4 c0 + c4 a c1 c2 c3
where c0 , c1 , c2 , c3 , c4 , a ∈ R (or C). Try to extend to a matrix
of order n.
(j) Pn×n is a permutation matrix of order n (see (2.7.67)), namely,
for a permutation σ: {1, 2, . . . , n} → {1, 2, . . . , n},
 
eσ(1)
 .. 
P =  . .

eσ(n)
2. For each of the following linear operators f , do the following problems.
(1) Compute the matrix representation [f ]B , where B is the given
basis.
(2) Find a Jordan canonical basis and the Jordan canonical form
of [f ]B .

(3) Find the corresponding Jordan canonical basis for the original
operator f .

(a) $f: P_2(\mathbb{R}) \to P_2(\mathbb{R})$ defined by $f(p)(x) = (x + 1)p'(x)$, $B = \{x^2 - x + 1, \; x + 1, \; x^2 + 1\}$.
(b) $f: P_2(\mathbb{R}) \to P_2(\mathbb{R})$ defined by $f(p) = p + p' + p''$, $B = \{2x^2 - x + 1, \; x^2 + 3x - 2, \; -x^2 + 2x + 1\}$.
(c) $f: M(2; \mathbb{R}) \to M(2; \mathbb{R})$ defined by $f(A) = A^*$,
\[
B = \left\{ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right\}.
\]

(d) $f: M(2; \mathbb{R}) \to M(2; \mathbb{R})$ defined by $f(A) = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}A$, where $B = \{E_{11}, E_{12}, E_{21}, E_{22}\}$ is the natural basis for $M(2; \mathbb{R})$. What happens if $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ is replaced by $\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}$?
(e) Let $V = \langle e^x, xe^x, x^2e^x, e^{-x}, e^{-2x} \rangle$ be the vector subspace of $C[a, b]$ generated by $B = \{e^x, xe^x, x^2e^x, e^{-x}, e^{-2x}\}$, and let $f: V \to V$ be defined by $f(p) = p'$.
(f) Let $V$ be the subspace, generated by $B = \{1, x, y, x^2, y^2, xy\}$, of the vector space $P(x, y) = \{$polynomial functions over $\mathbb{R}$ in two variables $x$ and $y\}$, and let $f: V \to V$ be defined by $f(p) = \frac{\partial p}{\partial y}$.

3. Let $f$ be a nilpotent operator on an $n$-dimensional vector space $V$ over a field $\mathbb{F}$, i.e. there exists some positive integer $k$ so that $f^k = 0$; the smallest such $k$ is called its index (refer to Ex. <B> 2). Note that the index $k \leq n$ (see (3.7.38)).
(a) Let $B$ be any ordered basis for $V$. Prove that $f$ is nilpotent of index $k$ if and only if $[f]_B$ is nilpotent of index $k$.
(b) Suppose $\mathbb{F} = \mathbb{C}$, the complex field. Prove that the following are equivalent.
(1) $f$ is nilpotent of index $k$.
(2) There exists a sequence of bases $B_1, \ldots, B_k = B$ such that $B_i$ is a basis for $\mathrm{Ker}(f^i)$ and $B_{i+1}$ is an extension of $B_i$ for $1 \leq i \leq k - 1$. Hence $B$ is a basis for $\mathrm{Ker}(f^k) = V$ so that $[f]_B$ is lower triangular with each diagonal entry equal to zero (see Ex. 5 of Sec. B.12).
(3) $f$ has the characteristic polynomial
\[
\det(f - t1_V) = (-1)^nt^n
\]
and the minimal polynomial $t^k$.
(4) For any $\ell$, $1 \leq \ell \leq n$,
\[
\mathrm{tr}(f^\ell) = 0.
\]
(Note (b) is still valid even if $\mathbb{F}$ is an infinite field of characteristic zero. But (4) is no longer true otherwise. For example, in case $\mathbb{F} = \mathbb{Z}_2 = \{0, 1\}$, then
\[
\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\]
satisfies $\mathrm{tr}(f^\ell) = 0$ for $\ell \geq 1$ but it is not nilpotent. See also Ex. <C> 11 of Sec. 2.7.6.)

Suppose $f$ is nilpotent of index $k$. Let
\[
\dim\mathrm{Ker}(f^i) = n_i, \quad 1 \leq i \leq k, \text{ with } n_k = n, \; n_0 = 0, \quad \text{and}
\]
\[
r_i = n_i - n_{i-1} = r(f^{i-1}) - r(f^i), \quad 1 \leq i \leq k.
\]
Note that $r_1 + r_2 + \cdots + r_k = n = \dim V$.


(c) For simplicity, let $V = \mathbb{F}^n$ and $f(\vec{x}) = \vec{x}A$, where $\vec{x} \in \mathbb{F}^n$ and $A \in M(n; \mathbb{F})$. Choose $\vec{x}_1, \ldots, \vec{x}_{r_k} \in \mathrm{Ker}(A^k) \setminus \mathrm{Ker}(A^{k-1})$, linearly independent, so that
\[
\mathrm{Ker}(A^k) = V = \langle \vec{x}_1, \ldots, \vec{x}_{r_k} \rangle \oplus \mathrm{Ker}(A^{k-1}).
\]
Try to show that $\vec{x}_1A, \ldots, \vec{x}_{r_k}A$ are linearly independent and
\[
\langle \vec{x}_1A, \ldots, \vec{x}_{r_k}A \rangle \cap \mathrm{Ker}(A^{k-2}) = \{\vec{0}\}.
\]
Hence, deduce that $r_k + n_{k-2} \leq n_{k-1}$ and thus $r_k \leq r_{k-1}$. In case $r_k < r_{k-1}$, choose linearly independent vectors $\vec{x}_{r_k+1}, \ldots, \vec{x}_{r_{k-1}} \in \mathrm{Ker}(A^{k-1}) \setminus \mathrm{Ker}(A^{k-2})$ so that
\[
\mathrm{Ker}(A^{k-1}) = \langle \vec{x}_1A, \ldots, \vec{x}_{r_k}A, \vec{x}_{r_k+1}, \ldots, \vec{x}_{r_{k-1}} \rangle \oplus \mathrm{Ker}(A^{k-2}).
\]
Repeat the above process to show that $r_{k-1} \leq r_{k-2}$. By induction, this process eventually leads to the following facts.
(1) $r_k \leq r_{k-1} \leq \cdots \leq r_2 \leq r_1$, $r_1 + r_2 + \cdots + r_k = n$. Call $\{r_k, \ldots, r_1\}$ the invariant system of $f$ or $A$.
(2) $\mathrm{Ker}(A^k) = V$ has a basis $B$, arranged in chains:
\[
\begin{array}{llll}
\vec{x}_1, \ldots, \vec{x}_{r_k} & & & \\
\vec{x}_1A, \ldots, \vec{x}_{r_k}A, & \vec{x}_{r_k+1}, \ldots, \vec{x}_{r_{k-1}} & & \\
\quad\vdots & \quad\vdots & & \\
\vec{x}_1A^{k-2}, \ldots, \vec{x}_{r_k}A^{k-2}, & \vec{x}_{r_k+1}A^{k-3}, \ldots, & \ldots, \; \vec{x}_{r_3+1}, \ldots, \vec{x}_{r_2} & \\
\vec{x}_1A^{k-1}, \ldots, \vec{x}_{r_k}A^{k-1}, & \vec{x}_{r_k+1}A^{k-2}, \ldots, & \vec{x}_{r_3+1}A, \ldots, \vec{x}_{r_2}A, & \vec{x}_{r_2+1}, \ldots, \vec{x}_{r_1}
\end{array}
\]

(3) Denote
\[
B_j^{(i)} = \{\vec{x}_jA^{i-1}, \vec{x}_jA^{i-2}, \ldots, \vec{x}_jA, \vec{x}_j\}, \quad 1 \leq i \leq k, \; r_{i+1} + 1 \leq j \leq r_i \text{ with } r_{k+1} = 0;
\]
\[
W_j^{(i)} = \text{the invariant subspace generated by } B_j^{(i)}.
\]
Then $B = B_1^{(k)} \cup \cdots \cup B_{r_k}^{(k)} \cup B_{r_k+1}^{(k-1)} \cup \cdots \cup B_{r_2+1}^{(1)} \cup \cdots \cup B_{r_1}^{(1)}$ and
\[
[A]_B = PAP^{-1} = \mathrm{diag}\Big[A_1^{(k)}, \ldots, A_{r_k}^{(k)}, A_{r_k+1}^{(k-1)}, \ldots, A_{r_2}^{(2)}, A_{r_2+1}^{(1)}, \ldots, A_{r_1}^{(1)}\Big]_{n\times n},
\]
where
\[
A_j^{(i)} = \big[A \mid W_j^{(i)}\big]_{B_j^{(i)}} = \begin{pmatrix} 0 & & & 0 \\ 1 & 0 & & \\ & \ddots & \ddots & \\ 0 & & 1 & 0 \end{pmatrix}_{i\times i}, \quad 2 \leq i \leq k, \; r_{i+1} + 1 \leq j \leq r_i;
\]
\[
A_j^{(1)} = [0]_{1\times 1}, \quad r_2 + 1 \leq j \leq r_1.
\]
Call $[A]_B$ a (Jordan) canonical form of the nilpotent operator $f$ or $A$.
Compare with Ex. 5 of Sec. B.12.
(d) Show that there exists a (nilpotent) operator $g$ on $V$ so that
\[
g^2 = f,
\]
i.e. $g$ is a square root of $f$, if and only if there exist $2k$ non-negative integers $t_1, \ldots, t_{2k}$ satisfying
\[
r_i = t_{2i-1} + t_{2i}, \quad 1 \leq i \leq k; \qquad t_1 \geq t_2 \geq \cdots \geq t_{2k} \geq 0 \text{ and } t_1 + t_2 + \cdots + t_{2k} = n.
\]
In this case, if $g$ is of index $\ell$, then it is understood that $2k \geq \ell$ and $t_j = 0$ if $j > \ell$, so that $\{t_1, \ldots, t_\ell\}$ is the invariant system of $g$.
(Note Note that $r_i \geq 2t_{2i}$ and $r_{i+1} \leq 2t_{2i+1}$, so that
\[
\frac{r_i}{2} \geq t_{2i} \geq t_{2i+1} \geq \frac{r_{i+1}}{2}.
\]
If both $r_i$ and $r_{i+1}$ are odd integers, then
\[
\frac{r_i - 1}{2} \geq t_{2i} \geq t_{2i+1} \geq \frac{r_{i+1} + 1}{2} \Rightarrow r_i \geq r_{i+1} + 2.
\]
Conversely, suppose these inequalities hold. If $r_i$ is even, choose $t_{2i} = t_{2i+1} = \frac{r_i}{2}$; if $r_i$ is odd, choose $t_{2i} = \frac{r_i + 1}{2}$ and $t_{2i+1} = \frac{r_i - 1}{2}$.)
(e) Let A ∈ M(n; C) be nilpotent of index n. Show that A does not
have square root (see Example 3). What happens to a nilpotent
matrix An×n of index n − 1?
(f) Suppose An×n and Bn×n are nilpotent of the same index k, where
k = n or n − 1 and n ≥ 3. Show that A and B are similar. What
happens if 1 ≤ k ≤ n − 2?
(g) Show that a nonzero nilpotent matrix is not diagonalizable.
(h) Find all canonical forms of a nilpotent matrix An×n , where
n = 3, 4, 5 and 6.
4. Let A = [aij ] ∈ M(n; C) be an invertible complex matrix.
(a) Show that A has a square root, i.e. there exists a matrix B ∈
M(n; C) so that B 2 = A (see Example 3).
(b) Let p ≥ 2 be a positive integer, then there exists an invertible
matrix B ∈ M(n; C) so that B p = A.
5. Any matrix $A \in M(n; \mathbb{C})$ is similar to its transpose $A^*$, and there exists an invertible symmetric matrix $P$ so that
\[
A^* = PAP^{-1}
\]
(see Example 4). Hence, $A$ and $A^*$ are diagonalizable at the same time (see Ex. <C> 9(f) of Sec. 2.7.6).
6. Any matrix $A \in M(n; \mathbb{C})$ can be decomposed as a product of two symmetric matrices where at least one of them can be designated as an invertible matrix (see Example 5).
7. Any matrix $A \in M(n; \mathbb{C})$ is similar to a lower triangular matrix whose nonzero nondiagonal entries are arbitrarily small positive numbers. Try to use the following steps:
(1) Let
\[
PAP^{-1} = \begin{pmatrix} J_1 & & O \\ & \ddots & \\ O & & J_k \end{pmatrix}_{n\times n}, \quad \text{where } J_i = \begin{pmatrix} \lambda_i & & & O \\ 1 & \lambda_i & & \\ & \ddots & \ddots & \\ O & & 1 & \lambda_i \end{pmatrix}_{r_i\times r_i}
\]
for $1 \leq i \leq k$.
(2) For any $\varepsilon > 0$, let $E_i = \mathrm{diag}[\varepsilon, \varepsilon^2, \ldots, \varepsilon^{r_i}]$ be a diagonal matrix of order $r_i$. Then
\[
E_iJ_iE_i^{-1} = \begin{pmatrix} \lambda_i & & & 0 \\ \varepsilon & \lambda_i & & \\ & \ddots & \ddots & \\ 0 & & \varepsilon & \lambda_i \end{pmatrix} \quad \text{for } 1 \leq i \leq k.
\]
(3) Denote the pseudo-diagonal matrix $E = \mathrm{diag}[E_1, E_2, \ldots, E_k]$. What is $(EP)A(EP)^{-1}$?

8. A matrix of the form
\[
\begin{pmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_k \end{pmatrix}_{n\times n}, \quad \text{where } J_i = \begin{pmatrix} \lambda & & & 0 \\ 1 & \lambda & & \\ & \ddots & \ddots & \\ 0 & & 1 & \lambda \end{pmatrix}_{r_i\times r_i} \text{ for } 1 \leq i \leq k,
\]
with descending sizes $r_1 \geq r_2 \geq \cdots \geq r_k$, has the characteristic polynomial $(-1)^n(t - \lambda)^n$ and the minimal polynomial $(t - \lambda)^{r_1}$. Prove this and use it to do the following problems.

(a) Find the minimal polynomial of a matrix A in its Jordan canonical


form.
(b) A matrix is diagonalizable if and only if its minimal polynomial
does not have repeated roots (refer to (2.7.73) and Ex. <C> 9(e)
in Sec. 2.7.6).
9. A complex matrix $R$ similar to the matrix
\[
\begin{pmatrix} 0 & 1 & \\ 1 & 0 & \\ & & I_{n-2} \end{pmatrix}_{n\times n}
\]
is called a reflection of $\mathbb{C}^n$ with respect to a 2-dimensional subspace. Then, any involutory matrix $A_{n\times n}$, i.e. $A^2 = I_n$, which is not $I_n$ or $-I_n$, can be decomposed as a product of finitely many reflections. Watch the following steps.
(1) For some invertible $P$ and $1 \leq r \leq n - 1$,
\[
A = P^{-1}\begin{pmatrix} -I_r & 0 \\ 0 & I_{n-r} \end{pmatrix}_{n\times n}P
= P^{-1}\begin{pmatrix} -1 & 0 \\ 0 & I_{n-1} \end{pmatrix}P \cdot P^{-1}\begin{pmatrix} 1 & & \\ & -1 & \\ & & I_{n-2} \end{pmatrix}P \cdots P^{-1}\begin{pmatrix} I_{r-1} & & \\ & -1 & \\ & & I_{n-r} \end{pmatrix}P.
\]
Let $R_i$ denote the $i$th factor on the right, $1 \leq i \leq r$.
(2) Let
\[
Q_i = \mathrm{diag}\left[I_{i-1}, \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, I_{n-i-2}\right]_{n\times n} \quad \text{for } i = 1, 2, \ldots, r.
\]
Then
\[
PR_iP^{-1} = Q_i^{-1}\,\mathrm{diag}\left[I_i, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, I_{n-i-2}\right]Q_i \quad \text{for } 1 \leq i \leq r.
\]
10. A matrix $A_{6\times 6}$ has the characteristic and the minimal polynomials
\[
p(t) = (t - 1)^4(t + 2)^2, \qquad m(t) = (t - 1)^2(t + 2),
\]
respectively. Find the Jordan canonical form of $A$.
11. Suppose a matrix $A_{8\times 8}$ has the characteristic polynomial
\[
(t + 1)^2(t - 1)^4(t - 2)^2.
\]
Find all possible Jordan canonical forms for such an $A$ and compute the minimal polynomial for each case.
12. Suppose a complex matrix $A_{n\times n}$ has all its eigenvalues equal to 1; then any power matrix $A^k$ ($k \geq 1$) is similar to $A$ itself. What happens if $k \leq -1$?
13. Let
\[
J = \begin{pmatrix} \lambda & & & \\ 1 & \lambda & & \\ & \ddots & \ddots & \\ & & 1 & \lambda \end{pmatrix}_{n\times n}
\]
and let $p(x)$ be any polynomial in $P(\mathbb{R})$. Show that
\[
p(J) = \begin{pmatrix}
p(\lambda) & 0 & \cdots & \cdots & 0 & 0 \\[2pt]
\frac{p'(\lambda)}{1!} & p(\lambda) & \cdots & \cdots & 0 & 0 \\[2pt]
\frac{p''(\lambda)}{2!} & \frac{p'(\lambda)}{1!} & \cdots & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & & \vdots & \vdots \\[2pt]
\frac{p^{(n-2)}(\lambda)}{(n-2)!} & \frac{p^{(n-3)}(\lambda)}{(n-3)!} & \cdots & \cdots & p(\lambda) & 0 \\[2pt]
\frac{p^{(n-1)}(\lambda)}{(n-1)!} & \frac{p^{(n-2)}(\lambda)}{(n-2)!} & \cdots & \cdots & \frac{p'(\lambda)}{1!} & p(\lambda)
\end{pmatrix}_{n\times n}.
\]
In particular, compute $J^k$ for $k \geq 1$. Show that
(a)
\[
e^{J} = e^{\lambda}\begin{pmatrix}
1 & 0 & \cdots & \cdots & 0 & 0 \\[2pt]
\frac{1}{1!} & 1 & \cdots & \cdots & 0 & 0 \\[2pt]
\frac{1}{2!} & \frac{1}{1!} & \cdots & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & & \vdots & \vdots \\[2pt]
\frac{1}{(n-2)!} & \frac{1}{(n-3)!} & \cdots & \cdots & 1 & 0 \\[2pt]
\frac{1}{(n-1)!} & \frac{1}{(n-2)!} & \cdots & \cdots & \frac{1}{1!} & 1
\end{pmatrix}.
\]
What is $e^{tJ}$?

(b) limk→∞ J k exists (see Ex. <D1 > of Sec. 3.7.6) if and only if one of
the following holds:
(1) |λ| < 1.
(2) λ = 1 and n = 1.
(c) limk→∞ J k = On×n if |λ| < 1 holds and limk→∞ J k = [1]1×1 if λ = 1
and n = 1.
(d) Let
\[
PAP^{-1} = \begin{pmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_k \end{pmatrix}
\]
be the Jordan canonical form of a matrix $A_{n\times n}$ with $k$ Jordan blocks $J_i$, $1 \leq i \leq k$. Show that
\[
e^{tA} = P^{-1}\begin{pmatrix} e^{tJ_1} & & & 0 \\ & e^{tJ_2} & & \\ & & \ddots & \\ 0 & & & e^{tJ_k} \end{pmatrix}P.
\]
(e) Prove Ex. <D1 > 5 of Sec. 3.7.6.

14. For $A = [a_{ij}] \in M(n; \mathbb{C})$ and $p \geq 1$, define
\[
\|A\|_p = \left(\sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij}|^p\right)^{1/p}, \quad \text{and} \quad \|A\|_\infty = \max_{1\leq i,j\leq n} |a_{ij}|.
\]
Show that, for $1 \leq p \leq \infty$,
(1) $\|A\|_p \geq 0$ and equality holds if and only if $A = O_{n\times n}$.
(2) $\|\alpha A\|_p = |\alpha|\,\|A\|_p$, $\alpha \in \mathbb{C}$.
(3) $\|A + B\|_p \leq \|A\|_p + \|B\|_p$.
Hence, $\|\cdot\|_p$ is called a $p$-norm for $M(n; \mathbb{C})$. See also Exs. <C> 3 and <D$_2$> 6 of Sec. 3.7.6. Also,
(4) $\|AB\|_\infty \leq n\|A\|_\infty \cdot \|B\|_\infty$. Try to use (4) to show that
\[
e^{A} = \lim_{k\to\infty} \sum_{\ell=0}^{k} \frac{1}{\ell!}A^\ell
\]
exists for any $A \in M(n; \mathbb{C})$. See Ex. <D$_2$> of Sec. 3.7.6.

15. Let $M = [a_{ij}] \in M(n; \mathbb{R})$ be a stochastic matrix (see Ex. <D$_3$> of Sec. 3.7.6) and let $PMP^{-1} = J$ be the Jordan canonical form of $M$. Let $\|\cdot\|_\infty$ be as in Ex. 14. Show that:
(1) $\|M^k\|_\infty \leq 1$ for all $k \geq 1$ and hence $\|J^k\|_\infty$ is bounded for all $k \geq 1$.
(2) Each Jordan block in $J$ corresponding to the eigenvalue 1 is of size $1\times 1$, i.e. a matrix of order 1.
(3) $\lim_{k\to\infty} M^k$ exists if and only if, whenever $\lambda$ is an eigenvalue of $M$ with $|\lambda| = 1$, then $\lambda = 1$.
Use these results to prove Ex. <D$_3$> 3(d) of Sec. 3.7.6.
16. Let $A = [a_{ij}] \in M(n; \mathbb{F})$. Show that the following are equivalent.
(a) $A$ is triangularizable, i.e. there exists an invertible matrix $P$ so that $PAP^{-1}$ is a lower or upper triangular matrix.
(b) The characteristic polynomial of $A$ can be factored as a product of linear factors, i.e.
\[
\det(A - tI_n) = (-1)^n(t - \lambda_1)^{r_1}\cdots(t - \lambda_k)^{r_k},
\]
where $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$, $r_i \geq 1$ for $1 \leq i \leq k$, and $r_1 + \cdots + r_k = n$.
For (b) $\Rightarrow$ (a), see Ex. <C> 10(a) of Sec. 2.7.6. As a consequence, any complex square matrix, or real square matrix considered as a complex one, is always triangularizable. Yet, we still can give a more detailed account of (b) $\Rightarrow$ (a) as follows (refer to Ex. 3, and Exs. 2 and 3 of
Sec. B.12). Suppose the minimal polynomial of $A \in M(n; \mathbb{F})$ is
\[
\psi_A(t) = (t - \lambda_1)^{d_1}\cdots(t - \lambda_k)^{d_k}, \quad 1 \leq d_i \leq r_i \text{ for } 1 \leq i \leq k.
\]
(See Ex. 8, and Ex. <C> 9 of Sec. 2.7.6.) Let
\[
f_i(t) = \frac{\psi_A(t)}{(t - \lambda_i)^{d_i}} = \prod_{\substack{\ell=1 \\ \ell\neq i}}^{k} (t - \lambda_\ell)^{d_\ell}, \quad 1 \leq i \leq k.
\]
Then $f_1(t), \ldots, f_k(t)$ are relatively prime. There exist polynomials $g_1(t), \ldots, g_k(t)$ so that
\[
f_1(t)g_1(t) + \cdots + f_k(t)g_k(t) = 1.
\]
(See Sec. A.5.) Hence
\[
f_1(A)g_1(A) + \cdots + f_k(A)g_k(A) = I_n.
\]
Let, for $1 \leq i \leq k$,
\[
E_i = f_i(A)g_i(A), \quad \text{and}
\]
\[
W_i = \{\vec{x} \in \mathbb{F}^n \mid \vec{x}(A - \lambda_iI_n)^{d_i} = \vec{0}\}
= \{\vec{x} \in \mathbb{F}^n \mid \vec{x}(A - \lambda_iI_n)^\ell = \vec{0} \text{ for some positive integer } \ell\}.
\]
Then
(1) Each $W_i$, $1 \leq i \leq k$, is an invariant subspace of $A$ so that
\[
\mathbb{F}^n = W_1 \oplus \cdots \oplus W_k.
\]
(2) $E_i: \mathbb{F}^n \to \mathbb{F}^n$ is a projection (see (3.7.34)) onto $W_i$ along $W_1 \oplus \cdots \oplus W_{i-1} \oplus W_{i+1} \oplus \cdots \oplus W_k$, namely,
\[
E_i^2 = E_i, \qquad E_iE_j = O, \quad i \neq j, \; 1 \leq i, j \leq k.
\]
Also, $A \mid W_i$ has
\[
\text{the characteristic polynomial} = (t - \lambda_i)^{r_i} \text{ or } -(t - \lambda_i)^{r_i}, \quad \text{and the minimal polynomial} = (t - \lambda_i)^{d_i}.
\]
Note that $\dim W_i = r_i$, $1 \leq i \leq k$.


3.7 Linear Transformations (Operators) 549

(3) There exists a basis $B_i$ for $W_i$ so that
$$[E_i]_{B_i} = \begin{bmatrix} \lambda_i & & & 0\\ * & \lambda_i & &\\ \vdots & & \ddots &\\ * & \cdots & * & \lambda_i \end{bmatrix}_{r_i\times r_i} \text{ is lower triangular, } 1\le i\le k.$$
(4) $B = B_1 \cup \cdots \cup B_k$ forms a basis $\{x_1, \ldots, x_n\}$ for $\mathbb{F}^n$ so that
$$[A]_B = PAP^{-1} = \begin{bmatrix} [E_1]_{B_1} & & 0\\ & \ddots &\\ 0 & & [E_k]_{B_k} \end{bmatrix}_{n\times n}, \quad P = \begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix}.$$
In this case,
$$E_i = P^{-1}\begin{bmatrix} 0 & & & & \\ & \ddots & & & \\ & & I_{r_i} & & \\ & & & \ddots & \\ & & & & 0\end{bmatrix}P, \quad 1\le i\le k.$$
17. (continued from Ex. 16) Define
$$D = \sum_{i=1}^{k}\lambda_iE_i = P^{-1}\begin{bmatrix}\lambda_1I_{r_1} & & 0\\ & \ddots &\\ 0 & & \lambda_kI_{r_k}\end{bmatrix}_{n\times n}P,$$
a diagonalizable matrix, and
$$N = A - D = P^{-1}\begin{bmatrix}[E_1]_{B_1}-\lambda_1I_{r_1} & & 0\\ & \ddots &\\ 0 & & [E_k]_{B_k}-\lambda_kI_{r_k}\end{bmatrix}_{n\times n}P,$$
a nilpotent matrix (see Ex. 3).
If $r$ is the least common multiple of $r_1, \ldots, r_k$, then $N^r = O$. Also, since both $D$ and $N$ are polynomials in $A$, $DN = ND$ holds. We summarize as:
Let $A \in M(n; \mathbb{F})$ be a triangularizable matrix. Then there exist unique matrices $D$ and $N$ satisfying:
1. $D$ is diagonalizable and $N$ is nilpotent. Moreover, there exists an invertible matrix $P_{n\times n}$ so that $PDP^{-1}$ is a diagonal matrix and $PNP^{-1}$ is a lower triangular matrix with zero diagonal entries.
2. $A = D + N$, $DN = ND$.
3. Both $D$ and $N$ can be expressed as polynomials in $A$. $A$ and $D$ have the same characteristic polynomial and hence the same set of eigenvalues.

If $A$ is a real matrix having real eigenvalues $\lambda_1, \ldots, \lambda_k$, then $P$ can be chosen to be a real matrix. Suppose
$$A = \begin{bmatrix} 3 & 2 & 2\\ 1 & 2 & 2\\ -1 & -1 & 0\end{bmatrix}.$$
Show that the corresponding
$$D = \begin{bmatrix} 1 & 0 & -2\\ 1 & 2 & 2\\ 0 & 0 & 2\end{bmatrix} \quad\text{and}\quad N = \begin{bmatrix} 2 & 2 & 4\\ 0 & 0 & 0\\ -1 & -1 & -2\end{bmatrix}.$$
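The decomposition can be checked symbolically; the sketch below (my own helper, using sympy's Jordan form) recovers D and N for the matrix above:

```python
from sympy import Matrix, diag

A = Matrix([[3, 2, 2], [1, 2, 2], [-1, -1, 0]])
P, J = A.jordan_form()                    # sympy convention: A == P*J*P**-1
D = P * diag(*[J[i, i] for i in range(J.rows)]) * P.inv()
N = A - D
assert D + N == A and D * N == N * D      # commuting decomposition
assert N**3 == Matrix.zeros(3)            # N is nilpotent
```

Since the D + N decomposition is unique, D and N must coincide with the matrices displayed above.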
18. Suppose $A, B \in M(n; \mathbb{C})$ and $B$ is nilpotent. In case $AB = BA$, show that
$$\det(A + B) = \det A.$$
In case $AB \ne BA$, give an example to show that this identity may fail.
19. (continued from Ex. 17) Suppose $N$ is of index $k$. Then
$$A^m = (D+N)^m = \begin{cases}\displaystyle\sum_{i=0}^{m}\binom{m}{i}D^{m-i}N^i, & m < k\\[6pt] \displaystyle\sum_{i=0}^{k-1}\binom{m}{i}D^{m-i}N^i, & m\ge k\end{cases} = \sum_{i=0}^{k-1}\frac{f^{(i)}(D)}{i!}N^i, \quad\text{if } f(z) = z^m \text{ and } m\ge k.$$
These suggest the following generalizations.
Suppose $A \in M(n; \mathbb{C})$ and $A = D + N$ as in Ex. 17, where $N^k = O$ but $N^{k-1} \ne O$.
(a) For any polynomial $p(z)$,
$$p(A) = p(D) + \sum_{i=1}^{k-1}\frac{p^{(i)}(D)}{i!}N^i,$$
where the former is diagonalizable, the latter is nilpotent, and the two commute.
(b) For any power series $f(z) = \sum_{n=0}^{\infty} a_nz^n$ with positive radius $r$ of convergence, if the spectral radius (see Ex. <D1> 6 of Sec. 3.7.6) satisfies $\rho(A) < r$, then
$$f(A) = f(D) + \sum_{i=1}^{k-1}\frac{f^{(i)}(D)}{i!}N^i,$$
where the former is diagonalizable, the latter is nilpotent, and the two commute.
For example, since $DN = ND$,
$$e^A = e^{D+N} = e^De^N = e^D + e^D(e^N - I_n) = e^D + e^D\sum_{i=1}^{k-1}\frac{1}{i!}N^i.$$
20. (continued from Ex. 19) Suppose $A \in M(n; \mathbb{C})$ is invertible and $\lambda_1, \ldots, \lambda_k$ are all the distinct eigenvalues of $A$, with respective multiplicities $r_1, \ldots, r_k$ where $r_1 + \cdots + r_k = n$. To solve
$$e^X = A, \quad X\in M(n; \mathbb{C}),$$
is equivalent to solving
$$e^{D_1} = D, \qquad e^{D_1}(e^{N_1} - I_n) = N \quad\text{or}\quad e^{N_1} = ND^{-1} + I_n,$$
where $A = D + N$ and $X = D_1 + N_1$ are decompositions as in Ex. 17. Choose an invertible $P_{n\times n}$ so that
$$D = P^{-1}\begin{bmatrix}\lambda_1I_{r_1} & & 0\\ & \ddots &\\ 0 & & \lambda_kI_{r_k}\end{bmatrix}P.$$
Then
$$D_1 = P^{-1}\begin{bmatrix}(\log\lambda_1)I_{r_1} & & 0\\ & \ddots &\\ 0 & & (\log\lambda_k)I_{r_k}\end{bmatrix}P.$$
Recall that each $\log\lambda_j$ is multiple-valued. On the other hand, $(ND^{-1})^k = O$ and $(ND^{-1})^{k-1} \ne O$ hold. Hence
$$N_1 = \log(ND^{-1} + I_n) = ND^{-1} - \frac{1}{2}(ND^{-1})^2 + \cdots + \frac{(-1)^{k-2}}{k-1}(ND^{-1})^{k-1}.$$
When $D_1$ and $N_1$ are given as above, the matrix logarithm of $A$ is
$$\log A = D_1 + N_1,$$
which is multiple-valued. If $a \in \mathbb{C}$ and $a \ne 0$, the matrix power $A^a$ is defined as
$$A^a = e^{a\log A}, \quad A\in GL(n; \mathbb{C}).$$
In particular, if $a = \frac{1}{m}$ where $m \ge 1$ is a positive integer, then the $m$th root $A^{1/m}$ is defined as
$$A^{1/m} = e^{\frac{1}{m}\log A} = e^{\frac{1}{m}(D_1+N_1)} = e^{\frac{1}{m}D_1}e^{\frac{1}{m}N_1} = P^{-1}\begin{bmatrix}\lambda_1^{1/m}I_{r_1} & & 0\\ & \ddots &\\ 0 & & \lambda_k^{1/m}I_{r_k}\end{bmatrix}P\,(I_n + ND^{-1})^{1/m}.$$
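Numerically, one branch of the matrix logarithm is available in scipy; the following sketch (names mine) checks the defining property $e^{\log A} = A$ and the $m$th-root construction:

```python
import numpy as np
from scipy.linalg import expm, logm

A = np.array([[3.0, 2.0, 2.0], [1.0, 2.0, 2.0], [-1.0, -1.0, 0.0]])
X = logm(A)                      # one solution of e^X = A
assert np.allclose(expm(X), A)

m = 3                            # A^(1/m) = e^{(1/m) log A}
R = expm(X / m)
assert np.allclose(np.linalg.matrix_power(R, m), A)
```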
21. Try to use Exs. 16, 17 and 3 to prove that every triangularizable matrix
A ∈ M(n; F) has a Jordan canonical basis B so that [A]B is the Jordan
canonical form of A (refer to Sec. B.12).

<D> Application: Differential equations

For the preliminary explanations, please refer to Ex. <D4> of Sec. 3.7.6. Also, Exs. <C> 13 and 19 are helpful for the computation of $e^{tA}$ in what follows.
We start from a concrete example. Solve
$$\frac{dx_1(t)}{dt} = 2x_1(t) - x_2(t) - x_3(t),\quad \frac{dx_2(t)}{dt} = 2x_1(t) - x_2(t) - 2x_3(t),\quad \frac{dx_3(t)}{dt} = -x_1(t) + x_2(t) + 2x_3(t).$$
Written in matrix form, this system is equivalent to
$$\frac{dx}{dt} = x(t)A, \quad\text{where } A = \begin{bmatrix} 2 & 2 & -1\\ -1 & -1 & 1\\ -1 & -2 & 2\end{bmatrix} \text{ and } x(t) = (x_1(t), x_2(t), x_3(t)). \tag{*4}$$

$A$ has characteristic polynomial $-(t-1)^3$. To compute the ranks:
$$A - I_3 = \begin{bmatrix} 1 & 2 & -1\\ -1 & -2 & 1\\ -1 & -2 & 1\end{bmatrix} \Rightarrow r(A - I_3) = 1;\qquad (A - I_3)^k = O_{3\times 3} \text{ for } k\ge 2 \Rightarrow r((A - I_3)^k) = 0 \text{ for } k\ge 2.$$
Therefore,
$$A = P^{-1}JP, \quad\text{where } J = \begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 0 & 0 & 1\end{bmatrix} \text{ and } P = \begin{bmatrix} 1 & 2 & -1\\ 1 & 0 & 0\\ 1 & 1 & 0\end{bmatrix}.$$
Compute $e^{tJ}$ (see Ex. <C> 13):
$$e^{tJ} = \begin{bmatrix} e^t & 0 & 0\\ te^t & e^t & 0\\ 0 & 0 & e^t\end{bmatrix}.$$
Hence, with $c = (c_1, c_2, c_3) = x(0)P^{-1}$ arbitrary, the general solution is
$$x(t) = c\,e^{tJ}P = (c_1\ c_2\ c_3)\begin{bmatrix} e^t & 0 & 0\\ te^t & e^t & 0\\ 0 & 0 & e^t\end{bmatrix}\begin{bmatrix} 1 & 2 & -1\\ 1 & 0 & 0\\ 1 & 1 & 0\end{bmatrix} = ((c_1+c_2+c_3+c_2t)e^t,\ (2c_1+c_3+2c_2t)e^t,\ -(c_1+c_2t)e^t).$$
If the initial condition $x_1(0) = 0$, $x_2(0) = 0$ and $x_3(0) = 1$ is imposed, then
$$c_1 + c_2 + c_3 = 0,\quad 2c_1 + c_3 = 0,\quad -c_1 = 1 \Rightarrow c_1 = -1,\ c_3 = 2,\ c_2 = -1.$$
The particular solution is
$$x(t) = (-te^t,\ -2te^t,\ (1+t)e^t). \qquad\square$$
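This particular solution can be confirmed directly from $x(t) = x(0)e^{tA}$; a quick numerical check (mine, via scipy):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 2.0, -1.0], [-1.0, -1.0, 1.0], [-1.0, -2.0, 2.0]])
x0 = np.array([0.0, 0.0, 1.0])
for t in np.linspace(0.0, 2.0, 5):
    x = x0 @ expm(t * A)                     # row-vector convention
    assert np.allclose(x, np.array([-t, -2 * t, 1 + t]) * np.exp(t))
```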

Consider the non-homogeneous equation
$$\frac{dx}{dt} = xA + f(t), \quad x(t_0) = c_0, \tag{*5}$$
where $A_{n\times n}$ is in $M(n; \mathbb{C})$. (*5) can be rewritten as
$$\frac{dx}{dt} - xA = f(t)$$
$$\Rightarrow \left(\frac{dx}{dt} - xA\right)e^{-tA} = f(t)e^{-tA}$$
$$\Rightarrow \frac{d}{dt}\left(x(t)e^{-tA}\right) = f(t)e^{-tA}$$
⇒ (integrate both sides entrywise from $t_0$ to $t$; refer to Ex. <D1> of Sec. 3.7.6)
$$x(t)e^{-tA}\Big|_{t=t_0}^{t} = \int_{t_0}^{t}f(t)e^{-tA}\,dt$$
$$\Rightarrow x(t) = c_0e^{(t-t_0)A} + \left(\int_{t_0}^{t}f(t)e^{-tA}\,dt\right)e^{tA}. \tag{*6}$$
This is the solution to (*5). Meanwhile, the solution to the homogeneous equation ((*5) with $f(t) = 0$) is
$$x(t) = c_0e^{(t-t_0)A}.$$
What is the general solution to (*5) without the initial condition?
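Formula (*6) is straightforward to evaluate with numerical quadrature; here is a sketch (helper names are mine; any A, f and c0 can be substituted):

```python
import numpy as np
from scipy.linalg import expm

def solve_inhomogeneous(A, f, c0, t0, t, steps=2000):
    """x(t) = c0 e^{(t-t0)A} + (integral_{t0}^{t} f(s) e^{-sA} ds) e^{tA}"""
    s = np.linspace(t0, t, steps)
    integrand = np.array([f(si) @ expm(-si * A) for si in s])
    integral = np.trapz(integrand, s, axis=0)
    return c0 @ expm((t - t0) * A) + integral @ expm(t * A)

A = np.array([[1.0, 4.0], [2.0, 3.0]])           # matrix of problem 1(a) below
x1 = solve_inhomogeneous(A, lambda t: np.array([1.0, -1.0]),
                         np.array([1.0, 2.0]), 0.0, 1.0)
```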
Do the following problems.
1. For each of the following equations, do the following problems.
(1) Solve the homogeneous equation $\frac{dx}{dt} = xA$, and then the solution with initial value $x(0) = x_0$.
(2) Solve the inhomogeneous equation $\frac{dx}{dt} = xA + f(t)$, and then the solution with initial value $x(0) = x_0$.
(a) $\frac{dx}{dt} = xA + f(t)$, where $A = \begin{bmatrix}1 & 4\\ 2 & 3\end{bmatrix}$, $x(0) = (1, 2)$, $f(t) = (1, -1)$.
(b) $\frac{dx}{dt} = xA + f(t)$, where $A = \begin{bmatrix}1 & -2 & -1\\ -3 & -6 & -4\\ 3 & 13 & 8\end{bmatrix}$, $x(0) = (1, 0, 0)$, $f(t) = (0, -1, 1)$.
(c) $\frac{dx}{dt} = xA + f(t)$, where $A = \begin{bmatrix}3 & 4 & 0 & 0\\ -4 & -5 & 0 & 0\\ 0 & -2 & 3 & 2\\ 2 & 4 & -2 & -1\end{bmatrix}$, $x(0) = (0, 0, 1, 1)$, $f(t) = (1, 1, 1, 1)$.
2. Consider the equation
$$\frac{d^2x}{dt^2} - 3\frac{dx}{dt} + 2x = e^{-3t}, \quad x(1) = 1,\ x'(1) = 0.$$
Rewrite it as
$$\frac{dx}{dt} = xA + f(t), \quad A = \begin{bmatrix} 0 & -2\\ 1 & 3\end{bmatrix},\ f(t) = (0, e^{-3t}),\ x(t) = \left(x(t), \frac{dx}{dt}\right),\ x(1) = (1, 0).$$
Then (see Exs. <D4> 5, 6 of Sec. 3.7.6 and (*6)),
$$e^{tA} = \begin{bmatrix} -e^{2t}+2e^t & -2e^{2t}+2e^t\\ e^{2t}-e^t & 2e^{2t}-e^t\end{bmatrix}$$
$$\Rightarrow f(t)e^{-tA} = (e^{-5t}-e^{-4t},\ 2e^{-5t}-e^{-4t})$$
$$\Rightarrow \int_1^t f(t)e^{-tA}\,dt = \left(\int_1^t(e^{-5t}-e^{-4t})\,dt,\ \int_1^t(2e^{-5t}-e^{-4t})\,dt\right) = \cdots$$
$$\Rightarrow \left(\int_1^t f(t)e^{-tA}\,dt\right)e^{tA} = \left(\frac{1}{20}e^{-3t}+\frac{1}{5}e^{2t-5}-\frac{1}{4}e^{t-4},\ -\frac{2}{5}e^{-3t}+\frac{2}{5}e^{2t-5}-\frac{1}{4}e^{t-4}\right)$$
$$\Rightarrow x(t) = \left(-e^{2(t-1)}+2e^{t-1}+\frac{1}{20}e^{-3t}+\frac{1}{5}e^{2t-5}-\frac{1}{4}e^{t-4},\ \ldots\right)$$
$$\Rightarrow x(t) = -e^{2(t-1)}+2e^{t-1}+\frac{1}{20}e^{-3t}+\frac{1}{5}e^{2t-5}-\frac{1}{4}e^{t-4}.$$
Try to work out the details.
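The closed form can also be cross-checked symbolically; a sketch using sympy's ODE solver (not the book's method):

```python
from sympy import Function, dsolve, exp, symbols, simplify, Rational

t = symbols('t')
x = Function('x')
sol = dsolve(x(t).diff(t, 2) - 3 * x(t).diff(t) + 2 * x(t) - exp(-3 * t),
             x(t), ics={x(1): 1, x(t).diff(t).subs(t, 1): 0})
claimed = (-exp(2 * (t - 1)) + 2 * exp(t - 1) + Rational(1, 20) * exp(-3 * t)
           + Rational(1, 5) * exp(2 * t - 5) - Rational(1, 4) * exp(t - 4))
assert simplify(sol.rhs - claimed) == 0
```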
3. Model after Ex. 2 to solve each of the following equations.
(a) $\frac{d^2x}{dt^2} + 4x = \sin t$, $x(0) = 1$, $x'(0) = 0$.
(b) $\frac{d^3x}{dt^3} - \frac{d^2x}{dt^2} - \frac{dx}{dt} + x = 1$, $x(0) = 0$, $x'(0) = 1$, $x''(0) = -1$.
(c) $\frac{d^3x}{dt^3} - 6\frac{d^2x}{dt^2} - 7\frac{dx}{dt} - 6x = t$, $x(0) = 1$, $x'(0) = 0$, $x''(0) = 1$.
(d) $\frac{d^2x}{dt^2} = 2\frac{dx}{dt} + 5y + 3$, $\frac{dy}{dt} = -\frac{dx}{dt} - 2y$, $x(0) = 0$, $x'(0) = 0$, $y(0) = 1$.
4. Suppose each $a_{ij}(t)$ is a real or complex valued function continuous on $t \ge 0$, for $1 \le i, j \le n$. Let $A(t) = [a_{ij}(t)]_{n\times n}$. Then the differential equation
$$\frac{dx}{dt} = xA(t), \quad x(0) = a\in\mathbb{R}^n \text{ (or } \mathbb{C}^n)$$
has a unique solution
$$x = aX(t), \quad t\ge 0,$$
where $X(t)_{n\times n}$ is the unique solution of the matrix differential equation
$$\frac{dX}{dt} = XA(t), \quad X(0) = I_n.$$
If the matrix equation has a solution $X(t)$, it must be of the form
$$X(t) = I_n + \int_0^t X(s)A(s)\,ds$$
(see Ex. <D1> of Sec. 3.7.6). This suggests the following proof, called the method of successive approximation. Define
$$X_0 = I_n, \qquad X_{m+1} = I_n + \int_0^t X_mA(s)\,ds, \quad m\ge 0.$$
Step 1. Fix $t_1 > 0$. Let $\alpha = \max_{0\le t\le t_1}\|A(t)\|_1$ (for $\|\cdot\|_1$, see Ex. 14). Then
$$\|X_{m+1} - X_m\|_1 \le \int_0^t \|X_m - X_{m-1}\|_1\|A(s)\|_1\,ds \le \alpha\int_0^t \|X_m - X_{m-1}\|_1\,ds$$
$$\Rightarrow \|X_1 - X_0\|_1 \le \alpha t,\quad \|X_2 - X_1\|_1 \le \frac{\alpha^2}{2!}t^2,\quad \ldots,\quad \|X_{m+1} - X_m\|_1 \le \frac{(\alpha t)^{m+1}}{(m+1)!},\ m\ge 1$$
$$\Rightarrow \sum_{m=0}^{\infty}\|X_{m+1} - X_m\|_1 \le \sum_{m=0}^{\infty}\frac{(\alpha t)^{m+1}}{(m+1)!} = e^{\alpha t} - 1 < \infty, \quad 0\le t\le t_1$$
$$\Rightarrow \sum_{m=0}^{\infty}(X_{m+1} - X_m) \text{ converges, i.e. } \lim_{m\to\infty}X_m = X \text{ exists on } [0, t_1] \text{ and hence on } t\ge 0.$$
Step 2. Suppose $Y_{n\times n}$ is another solution. Then
$$X - Y = \int_0^t (X - Y)A(s)\,ds.$$
Since both $X$ and $Y$ are differentiable, they are continuous on $[0, t_1]$. Therefore $\alpha_1 = \max_{0\le t\le t_1}\|X(t) - Y(t)\|_1 < \infty$. Now, for $0 \le t \le t_1$,
$$\|X(t) - Y(t)\|_1 \le \int_0^t\|X(s) - Y(s)\|_1\|A(s)\|_1\,ds \le \alpha_1\int_0^t\|A(s)\|_1\,ds \le \alpha_1\alpha t$$
$$\Rightarrow \|X(t) - Y(t)\|_1 \le \alpha\alpha_1\int_0^t s\,\|A(s)\|_1\,ds \le \alpha^2\alpha_1\int_0^t s\,ds = \frac{\alpha^2\alpha_1}{2!}t^2$$
$$\vdots$$
$$\Rightarrow \|X(t) - Y(t)\|_1 \le \alpha_1\frac{(\alpha t)^m}{m!}, \quad 0\le t\le t_1,\ m\ge 1$$
$$\Rightarrow \|X(t) - Y(t)\|_1 = 0 \text{ on } [0, t_1] \Rightarrow X(t) = Y(t) \text{ on } 0\le t\le t_1 \text{ and hence on } t\ge 0.$$
Thus, such a solution $X$ is unique.
Step 3. Let $x = aX(t)$. Then
$$\frac{dx}{dt} = \frac{d}{dt}(aX(t)) = a\frac{d}{dt}X(t) = aX(t)A(t) = xA(t), \quad t\ge 0, \quad\text{and}\quad x(0) = aX(0) = aI_n = a.$$
Hence, this $x$ is a solution. Just as in Step 2, $x$ can be shown to be unique.
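The successive approximations themselves are easy to run numerically. A minimal sketch (mine; constant A, trapezoidal quadrature) shows the iterates converging to $e^{tA}$:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = np.linspace(0.0, 1.0, 401)
dt = t[1] - t[0]
X = np.broadcast_to(np.eye(2), (t.size, 2, 2)).copy()      # X_0(t) = I
for _ in range(12):                                        # X_1, X_2, ...
    f = X @ A                                              # X_m(s) A
    cum = np.concatenate([np.zeros((1, 2, 2)),
                          np.cumsum((f[1:] + f[:-1]) / 2, axis=0) * dt])
    X = np.eye(2) + cum
assert np.allclose(X[-1], expm(t[-1] * A), atol=1e-5)
```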
5. (continued from Ex. 4) In case $A(t) = A$ is a constant matrix, then
$$X_0 = I_n,\qquad X_1 = I_n + \int_0^t A\,ds = I_n + tA,\qquad \ldots,\qquad X_m = I_n + tA + \frac{1}{2!}t^2A^2 + \cdots + \frac{1}{m!}t^mA^m, \quad t\ge 0.$$
These lead to the following important fact:
$$\frac{dX}{dt} = XA, \quad X(0) = I_n$$
has the unique solution
$$\sum_{m=0}^{\infty}\frac{1}{m!}t^mA^m \underset{\text{(def.)}}{=} e^{tA}.$$
This definition coincides with that given in Ex. <D2> of Sec. 3.7.6. Hence,
$$\frac{dx}{dt} = xA, \quad x(0) = a$$
has the unique solution
$$x = ae^{tA}.$$
These results are useful in linear functional equations, Lie groups and Lie algebras, quantum mechanics, probability, etc.

3.7.8 Rational canonical form

For a real matrix $A_{3\times 3}$ having nonreal eigenvalues, this subsection is going to prove its counterpart of (2.7.75) and hence finishes the investigation of the canonical forms listed in (3.7.29).

Let $A = [a_{ij}]_{3\times 3}$ be a nonzero real matrix. $A$ definitely has at least one real eigenvalue $\lambda$ (refer to Sec. A.5). Hence, its characteristic polynomial is of the form
$$\det(A - tI_3) = -(t-\lambda)(t^2 + a_1t + a_0),$$
where $a_1$ and $a_0$ are real constants. What we really care about is the case that the quadratic polynomial $t^2 + a_1t + a_0$ does not have real zeros, i.e. $a_1^2 - 4a_0 < 0$. But for completeness, we also discuss the cases $a_1^2 - 4a_0 \ge 0$.

As we have learned in Secs. 2.7.6–2.7.8, 3.7.6 and 3.7.7, a canonical form is a description of a matrix representation of a linear operator or square matrix, obtained by describing a certain kind of ordered basis for the space according to the features (e.g. eigenvalues, eigenvectors, etc.) of the given operator or matrix. In this section, we will establish the canonical forms for real $3\times 3$ matrices based on the irreducible monic factors of the characteristic polynomial instead of the eigenvalues (see Sec. A.5).

Four cases are considered as follows.

Case 1. The characteristic polynomial $\det(A - tI_3) = -(t-\lambda_1)(t-\lambda_2)(t-\lambda_3)$, where $\lambda_1, \lambda_2, \lambda_3$ are distinct. The canonical form in this case is nothing new but the diagonal matrix studied in Sec. 3.7.6.

Case 2. $\det(A - tI_3) = -(t-\lambda_1)^2(t-\lambda_2)$ where $\lambda_1 \ne \lambda_2$ but $(A-\lambda_1I_3)(A-\lambda_2I_3) \ne O$. As in Secs. 3.7.6 and 3.7.7, let
$$G_{\lambda_1} = \{x\in\mathbb{R}^3 \mid x(A-\lambda_1I_3)^2 = 0\}\quad\text{(the generalized eigenspace)},$$
$$E_{\lambda_2} = \{x\in\mathbb{R}^3 \mid x(A-\lambda_2I_3) = 0\}\quad\text{(the eigenspace)}.$$
Case 1 in Sec. 3.7.7 showed that
1. $\dim G_{\lambda_1} = 2$, $\dim E_{\lambda_2} = 1$,
2. $G_{\lambda_1} \cap E_{\lambda_2} = \{0\}$,
3. $\mathbb{R}^3 = G_{\lambda_1} \oplus E_{\lambda_2}$,
and a particular basis had been chosen for $G_{\lambda_1}$ and hence a basis $B$ for $\mathbb{R}^3$ so that $[A]_B$ is the Jordan canonical form of $A$.

Here, we try to choose another basis for $G_{\lambda_1}$ so that the induced basis $B$ for $\mathbb{R}^3$ represents $A$ in a canonical form. The central ideas behind this method carry over verbatim to Cases 3 and 4 below.
Notice that
$$(A - \lambda_1I_3)^2 = A^2 - 2\lambda_1A + \lambda_1^2I_3$$
$$\Rightarrow x(A^2 - 2\lambda_1A + \lambda_1^2I_3) = 0 \quad\text{for all } x\in G_{\lambda_1}$$
$$\Rightarrow xA^2 = -\lambda_1^2x + 2\lambda_1xA \quad\text{for all } x\in G_{\lambda_1}.$$
This suggests that $xA^2$ can be represented as a particular linear combination of $x$ and $xA$ for any $x \in G_{\lambda_1}$. It is not necessary that $x$ and $xA$ be linearly independent for every $x \in G_{\lambda_1}$; for example, this fails when $x$ is an eigenvector of $A$ associated to $\lambda_1$. So we try to find, if it exists, a vector $x$ so that $\{x, xA\}$ is linearly independent and hence forms a basis for $G_{\lambda_1}$. This is the basis we want.

Therefore, we pick any vector $v_1$ in $G_{\lambda_1}$ but not in $E_{\lambda_1}$; then $v_1A \in G_{\lambda_1}$ and is linearly independent of $v_1$. Thus
$$G_{\lambda_1} = \langle\langle v_1, v_1A\rangle\rangle, \quad\text{and}\quad E_{\lambda_2} = \langle\langle v_2\rangle\rangle$$
$$\Rightarrow B = \{v_1, v_1A, v_2\} \text{ is a basis for } \mathbb{R}^3.$$

Since
$$v_1A = 0\cdot v_1 + 1\cdot v_1A,\qquad (v_1A)A = v_1A^2 = -\lambda_1^2v_1 + 2\lambda_1v_1A,\qquad v_2A = \lambda_2v_2$$
$$\Rightarrow \begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}A = \begin{bmatrix}0 & 1 & 0\\ -\lambda_1^2 & 2\lambda_1 & 0\\ 0 & 0 & \lambda_2\end{bmatrix}\begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}$$
$$\Rightarrow [A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -\lambda_1^2 & 2\lambda_1 & 0\\ 0 & 0 & \lambda_2\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}.$$
This is the so-called rational canonical form of $A$.

Case 3. $\det(A - tI_3) = -(t-\lambda)^3$ but $A - \lambda I_3 \ne O$.

It might happen that $(A - \lambda I_3)^2 = O$. Then Case 2 in Sec. 3.7.7 indicated that
$$\dim E_\lambda = 2 \quad\text{and}\quad \dim G_\lambda = 3, \quad\text{where } G_\lambda = \operatorname{Ker}(A-\lambda I_3)^2.$$
As in Case 2 above, take a vector $v_1 \in G_\lambda = \mathbb{R}^3$ not in $E_\lambda$, so that $v_1A$ is linearly independent of $v_1$. Since $\dim E_\lambda = 2$, it is possible to choose another vector $v_2 \in E_\lambda$ so that, as in Case 2 above,
$$B = \{v_1, v_1A, v_2\}$$
is a basis for $\mathbb{R}^3$. In $B$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -\lambda^2 & 2\lambda & 0\\ 0 & 0 & \lambda\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}.$$
In case $(A - \lambda I_3)^2 \ne O$ but $(A - \lambda I_3)^3 = O$, Case 3 in Sec. 3.7.7 showed that
$$\dim E_\lambda = 1 \quad\text{and}\quad \dim G_\lambda = 3, \quad\text{where } G_\lambda = \operatorname{Ker}(A-\lambda I_3)^3 = \mathbb{R}^3.$$
Since
$$(A - \lambda I_3)^3 = O$$
$$\Rightarrow x(A^3 - 3\lambda A^2 + 3\lambda^2A - \lambda^3I_3) = 0 \quad\text{for all } x\in G_\lambda$$
$$\Rightarrow xA^3 = \lambda^3x - 3\lambda^2xA + 3\lambda xA^2 \quad\text{for all } x\in G_\lambda,$$
all we need to do is to choose a vector $v \in G_\lambda$ so that $\{v, vA, vA^2\}$ is linearly independent. Since $(A - \lambda I_3)^2 \ne O$, it is possible to choose a vector $v \in G_\lambda = \mathbb{R}^3$ so that
$$v(A - \lambda I_3)^2 \ne 0 \quad(\text{which implicitly implies that } v(A-\lambda I_3)\ne 0)$$
$$\Rightarrow vA^2 \ne -\lambda^2v + 2\lambda vA, \quad\text{and}\quad v(A-\lambda I_3)^2 \text{ is an eigenvector of } A \text{ associated to } \lambda.$$
It follows that
$$B = \{v, vA, vA^2\}$$
is a basis for $\mathbb{R}^3$ (see Ex. <A> 1). In $B$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ \lambda^3 & -3\lambda^2 & 3\lambda\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v\\ vA\\ vA^2\end{bmatrix}.$$
Case 4. $\det(A - tI_3) = -(t-\lambda)(t^2 + a_1t + a_0)$ where $a_1^2 - 4a_0 < 0$. Still denote $E_\lambda = \operatorname{Ker}(A - \lambda I_3)$ and introduce
$$K_\lambda = \{x\in\mathbb{R}^3 \mid x(A^2 + a_1A + a_0I_3) = 0\}.$$
It is obvious that $K_\lambda$ is an invariant subspace of $\mathbb{R}^3$, i.e. $xA \in K_\lambda$ for each $x \in K_\lambda$. Since $a_1^2 - 4a_0 < 0$, $E_\lambda \cap K_\lambda = \{0\}$.

We claim that $\dim K_\lambda = 2$. Take any nonzero vector $v \in K_\lambda$; then $\{v, vA\}$ is linearly independent and hence is a basis for $K_\lambda$. To see this, suppose on the contrary that $v$ and $vA$ are linearly dependent. Then there exist scalars $\alpha$ and $\beta$, not both zero, so that
$$\alpha v + \beta vA = 0$$
$$\Rightarrow \alpha vA + \beta vA^2 = 0$$
$$\Rightarrow (\text{since } vA^2 = -a_1vA - a_0v)\quad -a_0\beta v + (\alpha - a_1\beta)vA = 0$$
$$\Rightarrow \alpha : (-a_0\beta) = \beta : (\alpha - a_1\beta) \text{ and hence } -a_0\beta^2 = \alpha^2 - a_1\alpha\beta$$
$$\Rightarrow \alpha^2 - a_1\alpha\beta + a_0\beta^2 = 0,$$
which is impossible since $a_1^2 - 4a_0 < 0$.

Take any nonzero vector $u \in E_\lambda$. Then
$$B = \{v, vA, u\}$$
is a basis for $\mathbb{R}^3$. In $B$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -a_0 & -a_1 & 0\\ 0 & 0 & \lambda\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v\\ vA\\ u\end{bmatrix}.$$
We summarize as follows (refer to (3.7.46) and compare with (3.7.50)).

The rational canonical form of a nonzero real matrix $A_{3\times 3}$
Suppose the characteristic polynomial of $A$ is
$$\det(A - tI_3) = -(t-\lambda)(t^2 + a_1t + a_0),$$
where $\lambda$, $a_1$ and $a_0$ are real numbers.

(1) In case $a_1^2 - 4a_0 > 0$, then $\det(A - tI_3) = -(t-\lambda_1)(t-\lambda_2)(t-\lambda_3)$ where $\lambda = \lambda_1$ and $\lambda_1, \lambda_2, \lambda_3$ are distinct real numbers. $A$ is diagonalizable. See (1) in (3.7.46).
(2) In case $a_1^2 - 4a_0 = 0$, then $\det(A - tI_3) = -(t-\lambda_1)^2(t-\lambda_2)$ where $\lambda_1$ is a real number and $\lambda = \lambda_2$.
(a) $\lambda_1 \ne \lambda_2$ and $(A-\lambda_1I_3)(A-\lambda_2I_3) = O$. $A$ is diagonalizable. See (2) in (3.7.46).
(b) $\lambda_1 \ne \lambda_2$ but $(A-\lambda_1I_3)(A-\lambda_2I_3) \ne O$. Let
$$G_{\lambda_1} = \operatorname{Ker}(A-\lambda_1I_3)^2 \quad\text{and}\quad E_{\lambda_2} = \operatorname{Ker}(A-\lambda_2I_3).$$
Then
1. $\dim G_{\lambda_1} = 2$, $\dim E_{\lambda_2} = 1$,
2. $G_{\lambda_1} \cap E_{\lambda_2} = \{0\}$,
3. $\mathbb{R}^3 = G_{\lambda_1} \oplus E_{\lambda_2}$.
Take any vector $v_1 \in G_{\lambda_1}\setminus E_{\lambda_1}$, where $E_{\lambda_1} = \operatorname{Ker}(A-\lambda_1I_3)$, and any nonzero vector $v_2 \in E_{\lambda_2}$; then $B = \{v_1, v_1A, v_2\}$ is a basis for $\mathbb{R}^3$ and
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -\lambda_1^2 & 2\lambda_1 & 0\\ 0 & 0 & \lambda_2\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}.$$
(c) $\lambda_1 = \lambda_2 = \lambda$ and $A - \lambda I_3 = O$. $A = \lambda I_3$ is a scalar matrix. See (3) in (3.7.46).
(d) $\lambda_1 = \lambda_2 = \lambda$ but $A - \lambda I_3 \ne O$, $(A - \lambda I_3)^2 = O$. Let
$$G_\lambda = \operatorname{Ker}(A-\lambda I_3)^2 \quad\text{and}\quad E_\lambda = \operatorname{Ker}(A-\lambda I_3).$$
Then $\dim E_\lambda = 2$ and $\dim G_\lambda = 3$. Take any vector $v_1 \in G_\lambda\setminus E_\lambda$, so that $v_1A \notin E_\lambda$ and is linearly independent of $v_1$. In the basis $B = \{v_1, v_1A, v_2\}$ for $\mathbb{R}^3$, where $v_2 \in E_\lambda$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -\lambda^2 & 2\lambda & 0\\ 0 & 0 & \lambda\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}.$$
(e) $\lambda_1 = \lambda_2 = \lambda$ but $(A - \lambda I_3)^2 \ne O$, $(A - \lambda I_3)^3 = O$. Let
$$G_\lambda = \operatorname{Ker}(A-\lambda I_3)^3 \quad\text{and}\quad E_\lambda = \operatorname{Ker}(A-\lambda I_3).$$
Then $\dim E_\lambda = 1$ and $\dim G_\lambda = 3$. Choose a vector $v \in G_\lambda = \mathbb{R}^3$ so that $v(A - \lambda I_3)^2 \ne 0$. Thus, in the basis $B = \{v, vA, vA^2\}$ for $\mathbb{R}^3$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ \lambda^3 & -3\lambda^2 & 3\lambda\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v\\ vA\\ vA^2\end{bmatrix}.$$
(3) In case $a_1^2 - 4a_0 < 0$. Let
$$K_\lambda = \operatorname{Ker}(A^2 + a_1A + a_0I_3) \quad\text{and}\quad E_\lambda = \operatorname{Ker}(A-\lambda I_3).$$
Then
1. $\dim K_\lambda = 2$, $\dim E_\lambda = 1$,
2. $K_\lambda \cap E_\lambda = \{0\}$,
3. $\mathbb{R}^3 = K_\lambda \oplus E_\lambda$.
Take any nonzero vector $v_1 \in K_\lambda$ and any nonzero vector $v_2 \in E_\lambda$. In the basis $B = \{v_1, v_1A, v_2\}$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -a_0 & -a_1 & 0\\ 0 & 0 & \lambda\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v_1\\ v_1A\\ v_2\end{bmatrix}. \tag{3.7.53}$$
For the geometric mapping properties of matrices in rational canonical form, refer to Examples 4 and 5 in Sec. 3.7.2.
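Case (3) of the summary translates directly into a few lines of code. The sketch below (helper names mine; the matrix is that of Example 1 following) builds $P = [v;\ vA;\ u]$ and checks the canonical form:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, -1.0], [0.0, 1.0, 1.0]])
lam, a1, a0 = 1.0, 0.0, 1.0        # det(A - tI) = -(t - 1)(t^2 + 1)

# row-vector kernels: x M = 0  <=>  M^T x^T = 0
K = null_space((A @ A + a1 * A + a0 * np.eye(3)).T).T      # dim 2
E = null_space((A - lam * np.eye(3)).T).T                  # dim 1
v, u = K[0], E[0]
P = np.vstack([v, v @ A, u])
R = P @ A @ np.linalg.inv(P)
assert np.allclose(R, [[0, 1, 0], [-a0, -a1, 0], [0, 0, lam]])
```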

Example 1. Let
$$A = \begin{bmatrix}0 & 0 & 1\\ 1 & 0 & -1\\ 0 & 1 & 1\end{bmatrix}.$$
Find the rational canonical form of $A$. If $A$ is considered as a complex matrix, what happens?

Solution. The characteristic polynomial of $A$ is
$$\det(A - tI_3) = t^2(1-t) + 1 - t = -(t-1)(t^2+1).$$
For $E = \operatorname{Ker}(A - I_3)$: solve
$$x(A - I_3) = 0 \Rightarrow E = \langle\langle(1, 1, 1)\rangle\rangle.$$
For $K = \operatorname{Ker}(A^2 + I_3)$: by computing
$$A^2 + I_3 = \begin{bmatrix}0 & 1 & 1\\ 0 & -1 & 0\\ 1 & 1 & 0\end{bmatrix} + \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix} = \begin{bmatrix}1 & 1 & 1\\ 0 & 0 & 0\\ 1 & 1 & 1\end{bmatrix},$$
then solve
$$x(A^2 + I_3) = 0 \Rightarrow x_1 + x_3 = 0 \Rightarrow K = \{x\in\mathbb{R}^3 \mid x_1 + x_3 = 0\} = \langle\langle(0, 1, 0), (1, 0, -1)\rangle\rangle.$$
Choose $v_1 = (0, 1, 0)$. Then $v_1A = (1, 0, -1)$. Also, take $v_2 = (1, 1, 1)$. Then $B = \{v_1, v_1A, v_2\}$ is a basis for $\mathbb{R}^3$. Therefore,
$$v_1A = 0\cdot v_1 + 1\cdot v_1A + 0\cdot v_2,$$
$$(v_1A)A = v_1A^2 = (0, -1, 0) = -v_1 + 0\cdot v_1A + 0\cdot v_2,$$
$$v_2A = v_2 = 0\cdot v_1 + 0\cdot v_1A + 1\cdot v_2$$
$$\Rightarrow [A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}0 & 1 & 0\\ 1 & 0 & -1\\ 1 & 1 & 1\end{bmatrix}.$$
This is the required canonical form.

On the other hand, if $A$ is considered as a complex matrix,
$$\det(A - tI_3) = -(t-1)(t-i)(t+i)$$
and then $A$ has three distinct complex eigenvalues $i$, $-i$ and 1.
For $\lambda_1 = i$: solve
$$x(A - iI_3) = 0, \quad\text{where } x = (x_1, x_2, x_3)\in\mathbb{C}^3$$
$$\Rightarrow -ix_1 + x_2 = -ix_2 + x_3 = 0$$
$$\Rightarrow x = (-x_3, -ix_3, x_3) = x_3(-1, -i, 1) \quad\text{for } x_3\in\mathbb{C}$$
$$\Rightarrow E_i = \operatorname{Ker}(A - iI_3) = \langle\langle(-1, -i, 1)\rangle\rangle.$$
$E_i$ is a one-dimensional subspace of $\mathbb{C}^3$. For $\lambda_2 = -i$: solve
$$x(A + iI_3) = 0 \Rightarrow ix_1 + x_2 = ix_2 + x_3 = 0$$
$$\Rightarrow x = (-x_3, ix_3, x_3) = x_3(-1, i, 1) \quad\text{for } x_3\in\mathbb{C}$$
$$\Rightarrow E_{-i} = \operatorname{Ker}(A + iI_3) = \langle\langle(-1, i, 1)\rangle\rangle$$
and $\dim E_{-i} = 1$. As before, we already know that $E_1 = \operatorname{Ker}(A - I_3) = \langle\langle(1, 1, 1)\rangle\rangle$. In the basis $C = \{(-1, -i, 1), (-1, i, 1), (1, 1, 1)\}$ for $\mathbb{C}^3$,
$$[A]_C = QAQ^{-1} = \begin{bmatrix}i & 0 & 0\\ 0 & -i & 0\\ 0 & 0 & 1\end{bmatrix}, \quad\text{where } Q = \begin{bmatrix}-1 & -i & 1\\ -1 & i & 1\\ 1 & 1 & 1\end{bmatrix}.$$
Thus, as a complex matrix, $A$ is diagonalizable (refer to (3.7.46)). □
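Both the real and the complex canonical forms above can be verified with exact arithmetic; a short sympy sketch (mine):

```python
from sympy import Matrix, I

A = Matrix([[0, 0, 1], [1, 0, -1], [0, 1, 1]])
P = Matrix([[0, 1, 0], [1, 0, -1], [1, 1, 1]])
assert P * A * P.inv() == Matrix([[0, 1, 0], [-1, 0, 0], [0, 0, 1]])

Q = Matrix([[-1, -I, 1], [-1, I, 1], [1, 1, 1]])
assert Q * A * Q.inv() == Matrix([[I, 0, 0], [0, -I, 0], [0, 0, 1]])
```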
Example 2. Determine the rational canonical form of
$$A = \begin{bmatrix}4 & 2 & 3\\ 2 & 1 & 2\\ -1 & -2 & 0\end{bmatrix}.$$
What happens if $A$ is considered as a complex matrix?

Solution. The characteristic polynomial is
$$\det(A - tI_3) = -(t-1)^2(t-3).$$
Also
$$A - I_3 = \begin{bmatrix}3 & 2 & 3\\ 2 & 0 & 2\\ -1 & -2 & -1\end{bmatrix} \Rightarrow r(A - I_3) = 2 \Rightarrow \dim\operatorname{Ker}(A - I_3) = 1 < 2,$$
$$(A - I_3)^2 = \begin{bmatrix}10 & 0 & 10\\ 4 & 0 & 4\\ -6 & 0 & -6\end{bmatrix} \Rightarrow r((A - I_3)^2) = 1 \Rightarrow \dim\operatorname{Ker}(A - I_3)^2 = 2.$$
Hence $A$ is not diagonalizable, even as a complex matrix. Solve
$$x(A - I_3)^2 = 0 \Rightarrow 5x_1 + 2x_2 - 3x_3 = 0 \Rightarrow \operatorname{Ker}(A - I_3)^2 = \langle\langle(3, 0, 5), (0, 3, 2)\rangle\rangle.$$
Choose $v_1 = (3, 0, 5)$ and compute $v_1A = (7, -4, 9)$. Take $v_2 = (1, 0, 1) \in \operatorname{Ker}(A - 3I_3)$. Then
$$v_1A = 0\cdot v_1 + 1\cdot v_1A + 0\cdot v_2,$$
$$(v_1A)A = v_1A^2 = (11, -8, 13) = 2(7, -4, 9) - (3, 0, 5) = -v_1 + 2v_1A + 0\cdot v_2,$$
$$v_2A = 3v_2 = 0\cdot v_1 + 0\cdot v_1A + 3v_2$$
$$\Rightarrow [A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -1 & 2 & 0\\ 0 & 0 & 3\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}3 & 0 & 5\\ 7 & -4 & 9\\ 1 & 0 & 1\end{bmatrix}. \qquad\square$$
Example 3. Determine the rational canonical forms of
$$A = \begin{bmatrix}2 & 0 & 0\\ 0 & 2 & 1\\ 0 & 0 & 2\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}2 & 1 & 0\\ 0 & 2 & 1\\ 0 & 0 & 2\end{bmatrix}.$$
Solution. Both $A$ and $B$ have the same characteristic polynomial
$$\det(A - tI_3) = \det(B - tI_3) = -(t-2)^3.$$
For $A$:
$$A - 2I_3 = \begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix} \Rightarrow r(A - 2I_3) = 1 \Rightarrow \dim\operatorname{Ker}(A - 2I_3) = 2,$$
$$(A - 2I_3)^2 = O \Rightarrow \dim\operatorname{Ker}(A - 2I_3)^2 = 3.$$
Solve
$$x(A - 2I_3) = 0 \Rightarrow \operatorname{Ker}(A - 2I_3) = \langle\langle(1, 0, 0), (0, 0, 1)\rangle\rangle.$$
Choose $v_1 = (0, 1, 0)$ so that $v_1A = (0, 2, 1)$ is linearly independent of $v_1$. Take $v_2 = (1, 0, 0)$. In the basis $B = \{v_1, v_1A, v_2\}$ for $\mathbb{R}^3$, since $(v_1A)A = v_1A^2 = (0, 4, 4) = -4(0, 1, 0) + 4(0, 2, 1) = -4v_1 + 4v_1A$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}0 & 1 & 0\\ -4 & 4 & 0\\ 0 & 0 & 2\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}0 & 1 & 0\\ 0 & 2 & 1\\ 1 & 0 & 0\end{bmatrix}.$$
For $B$:
$$B - 2I_3 = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix} \Rightarrow r(B - 2I_3) = 2 \Rightarrow \dim\operatorname{Ker}(B - 2I_3) = 1,$$
$$(B - 2I_3)^2 = \begin{bmatrix}0 & 0 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix} \Rightarrow r((B - 2I_3)^2) = 1,$$
$$(B - 2I_3)^3 = O_{3\times 3} \Rightarrow \dim\operatorname{Ker}(B - 2I_3)^3 = 3.$$
Take $v = e_1 = (1, 0, 0)$; then $v(B - 2I_3)^2 = (0, 0, 1) \ne 0$, and so consider $vB = (2, 1, 0)$ and $vB^2 = (4, 4, 1)$. In the basis $C = \{v, vB, vB^2\}$ for $\mathbb{R}^3$,
$$vB^3 = (8, 12, 6) = 6\cdot(4, 4, 1) - 12\cdot(2, 1, 0) + 8\cdot(1, 0, 0) = 8v - 12vB + 6vB^2$$
$$\Rightarrow [B]_C = QBQ^{-1} = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 8 & -12 & 6\end{bmatrix}, \quad\text{where } Q = \begin{bmatrix}1 & 0 & 0\\ 2 & 1 & 0\\ 4 & 4 & 1\end{bmatrix}. \qquad\square$$
To conclude, we use two concrete examples to illustrate the central ideas behind the determination of the rational canonical forms of matrices of order higher than 3. For details, read Sec. B.12.

Example 4. Analyze the rational canonical form
$$A = \begin{bmatrix}0 & 1 & 0 & 0 & & & &\\ 0 & 0 & 1 & 0 & & & &\\ 0 & 0 & 0 & 1 & & & &\\ -1 & -2 & -3 & -2 & & & &\\ & & & & 0 & 1 & &\\ & & & & -1 & -1 & &\\ & & & & & & 0 & 1\\ & & & & & & -1 & 1\end{bmatrix}_{8\times 8} = \begin{bmatrix}R_1 & & 0\\ & R_2 &\\ 0 & & R_3\end{bmatrix}.$$
Analysis. The characteristic polynomial is
$$\det(A - tI_8) = \det(R_1 - tI_4)\det(R_2 - tI_2)\det(R_3 - tI_2) = (t^2+t+1)^2\cdot(t^2+t+1)\cdot(t^2-t+1) = (t^2+t+1)^3(t^2-t+1).$$
For computations concerned with block matrices, refer to Ex. <C> of Sec. 2.7.5. Let $N = \{e_1, e_2, \ldots, e_8\}$ be the natural basis for $\mathbb{R}^8$.
For $R_1$ and $R_2$:
$$e_1A = e_2,\quad e_2A = e_3 = e_1A^2,\quad e_3A = e_4 = e_1A^3,$$
$$e_4A = -e_1 - 2e_2 - 3e_3 - 2e_4 = -e_1 - 2e_1A - 3e_1A^2 - 2e_1A^3 = e_1A^4$$
$$\Rightarrow e_i(A^4 + 2A^3 + 3A^2 + 2A + I_8) = e_i(A^2 + A + I_8)^2 = 0 \quad\text{for } 1\le i\le 4;$$
$$e_5A = e_6,\quad e_6A = -e_5 - e_6 = -e_5 - e_5A, \text{ i.e. } e_5A^2 = -e_5 - e_5A$$
$$\Rightarrow e_i(A^2 + A + I_8) = 0 \quad\text{for } i = 5, 6.$$

Also, compute the ranks as follows:
$$A^2 = \begin{bmatrix}R_1^2 & & 0\\ & R_2^2 &\\ 0 & & R_3^2\end{bmatrix}, \quad R_1^2 = \begin{bmatrix}0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ -1 & -2 & -3 & -2\\ 2 & 3 & 4 & 1\end{bmatrix},\quad R_2^2 = \begin{bmatrix}-1 & -1\\ 1 & 0\end{bmatrix},\quad R_3^2 = \begin{bmatrix}-1 & 1\\ -1 & 0\end{bmatrix}$$
$$\Rightarrow A^2 + A + I_8 = \begin{bmatrix}1 & 1 & 1 & 0 & & & &\\ 0 & 1 & 1 & 1 & & & &\\ -1 & -2 & -2 & -1 & & & &\\ 1 & 1 & 1 & 0 & & & &\\ & & & & 0 & 0 & &\\ & & & & 0 & 0 & &\\ & & & & & & 0 & 2\\ & & & & & & -2 & 2\end{bmatrix} \Rightarrow r(A^2 + A + I_8) = 4;$$
$$(A^2 + A + I_8)^2 = \operatorname{diag}\left(O_{4\times 4},\ O_{2\times 2},\ \begin{bmatrix}-4 & 4\\ -4 & 0\end{bmatrix}\right)_{8\times 8} \Rightarrow r((A^2 + A + I_8)^k) = 2 \text{ for } k\ge 2.$$
These relations together provide the following information:

1. Let $p_1(t) = t^2 + t + 1$. The set
$$K_{p_1} = \{x\in\mathbb{R}^8 \mid x\,p_1(A)^3 = 0\} = \operatorname{Ker}p_1(A)^3 = \langle\langle e_1, \ldots, e_4, e_5, e_6\rangle\rangle = \langle\langle e_1, e_2, e_3, e_4\rangle\rangle \oplus \langle\langle e_5, e_6\rangle\rangle$$
is an invariant subspace of $\mathbb{R}^8$ of dimension equal to
$3\cdot 2 = $ (the algebraic multiplicity 3 of $p_1(t)$) $\cdot$ (the degree of $p_1(t)$) $= 6$.
2. $K_{p_1}$ contains an invariant subspace
$$\langle\langle e_1, e_2, e_3, e_4\rangle\rangle = \langle\langle e_1, e_1A, e_1A^2, e_1A^3\rangle\rangle = C(e_1)$$
which is an $A$-cycle of length 4 = degree of $p_1(t)^2$. Note that $p_1(A)^2$ is an annihilator of $C_A(e_1)$, simply denoted $C(e_1)$, of least degree, i.e.
$$p_1(A)^2\big|_{C(e_1)} = O_{4\times 4}.$$
3. $K_{p_1}$ contains another invariant subspace
$$\langle\langle e_5, e_6\rangle\rangle = \langle\langle e_5, e_5A\rangle\rangle = C(e_5)$$
which is an $A$-cycle of length 2 = degree of $p_1(t)$. Note that $p_1(A)$ is an annihilator of $C_A(e_5)$ of least degree, i.e.
$$p_1(A)\big|_{C(e_5)} = O_{2\times 2}.$$
4. Solve
$$x\,p_1(A) = 0 \quad\text{for } x\in\mathbb{R}^8$$
$$\Rightarrow x = x_3(1, 1, 1, 0, 0, 0, 0, 0) + x_4(-1, 0, 0, 1, 0, 0, 0, 0) + (0, 0, 0, 0, x_5, x_6, 0, 0)$$
$$\Rightarrow \operatorname{Ker}p_1(A) = \langle\langle v_1, v_2, e_5, e_6\rangle\rangle, \quad\text{where } v_1 = (1, 1, 1, 0, 0, 0, 0, 0) \text{ and } v_2 = (-1, 0, 0, 1, 0, 0, 0, 0).$$
5. Solve
$$x\,p_1(A)^2 = 0 \Rightarrow \operatorname{Ker}p_1(A)^2 = \langle\langle e_1, e_2, e_3, e_4, e_5, e_6\rangle\rangle.$$
Hence
$$\operatorname{Ker}p_1(A)^2 = \langle\langle e_1, e_2, e_3, e_4\rangle\rangle \oplus \langle\langle e_5, e_6\rangle\rangle = \langle\langle v_1, v_2, e_3, e_4\rangle\rangle \oplus \langle\langle e_5, e_6\rangle\rangle.$$
6. Take any vector $v \in \operatorname{Ker}p_1(A)^2$ but not in $\operatorname{Ker}p_1(A)$, say $v = e_3 + e_4$. Then
$$vA = (-1, -2, -3, -1, 0, \ldots, 0),\quad vA^2 = (1, 1, 1, -1, 0, \ldots, 0),\quad vA^3 = (1, 3, 4, 3, 0, \ldots, 0),$$
$$vA^4 = (-3, -5, -6, -2, 0, \ldots, 0) = -v - 2vA - 3vA^2 - 2vA^3.$$
Also, $B_v = \{v, vA, vA^2, vA^3\}$ is a basis for the cycle $C(v)$. In $B_v$,
$$[A|_{C(v)}]_{B_v} = R_1.$$
The matrix $R_1$ is called the companion matrix of $p_1(t)^2 = t^4 + 2t^3 + 3t^2 + 2t + 1$. Take any vector $u \in \operatorname{Ker}p_1(A)$, say $u = v_1$. Then
$$uA = (0, 1, 1, 1, 0, 0, 0, 0),\quad uA^2 = (-1, -2, -2, -1, 0, 0, 0, 0) = -u - uA,$$
and $B_u = \{u, uA\}$ is a basis for $C(u)$. In $B_u$,
$$[A|_{C(u)}]_{B_u} = R_2.$$
The matrix $R_2$ is called the companion matrix of $p_1(t) = t^2 + t + 1$.
7. It can be shown that $B_v \cup B_u$ is linearly independent. Hence $B_v \cup B_u$ is a basis for $\operatorname{Ker}p_1(A)^3 = \operatorname{Ker}p_1(A)^2$. Notice the dot diagram

    cycles:        C(v)   C(u)
    first row:      •      •   ← total number 2 = ½[dim R⁸ − r(A² + A + I₈)] = ½·4 = 2
    second row:     •          ← total number 1 = ½[r(A² + A + I₈) − r((A² + A + I₈)²)] = 1
                    ⇓      ⇓
              p₁(t)² produces the annihilator p₁(A)²;  p₁(t) produces the annihilator p₁(A),

where the 2 in ½ is the degree of $p_1(t)$. Also,
$$\dim\operatorname{Ker}p_1(A)^3 = \deg p_1(t)^2 + \deg p_1(t) = 2\cdot 2 + 2\cdot 1 = 2\cdot 3 = (\text{the degree of } p_1(t))\cdot(\text{the number of dots}).$$
Combining these together, we get
$$\left[A\big|_{\operatorname{Ker}p_1(A)^3}\right]_{B_v\cup B_u} = \begin{bmatrix}R_1 & 0\\ 0 & R_2\end{bmatrix}_{6\times 6}. \tag{*1}$$
For $R_3$:
$$e_7A = e_8, \quad e_8A = (0, \ldots, 0, -1, 1) = -e_7 + e_8 = -e_7 + e_7A = e_7A^2$$
$$\Rightarrow e_i(A^2 - A + I_8) = 0 \quad\text{for } i = 7, 8.$$
Also,
$$A^2 - A + I_8 = \begin{bmatrix}1 & -1 & 1 & 0 & & & &\\ 0 & 1 & -1 & 1 & & & &\\ -1 & -2 & -2 & -3 & & & &\\ 3 & 5 & 7 & 4 & & & &\\ & & & & 0 & -2 & &\\ & & & & 2 & 2 & &\\ & & & & & & 0 & 0\\ & & & & & & 0 & 0\end{bmatrix}$$
$$\Rightarrow r(A^2 - A + I_8) = 6 \Rightarrow r((A^2 - A + I_8)^k) = 6 \text{ for } k\ge 2.$$

These facts provide the following information:

1. Set $p_2(t) = t^2 - t + 1$. Then
$$\operatorname{Ker}p_2(A) = \{x\in\mathbb{R}^8 \mid x\,p_2(A) = 0\}$$
is an invariant subspace of dimension 2, the degree of $p_2(t)$.
2. $\operatorname{Ker}p_2(A) = \langle\langle e_7, e_8\rangle\rangle = \langle\langle e_7, e_7A\rangle\rangle = C(e_7)$ is an $A$-cycle of length 2 which is annihilated by $p_2(A)$, i.e.
$$p_2(A)\big|_{C(e_7)} = O_{2\times 2}.$$
3. Solve
$$x\,p_2(A) = 0 \quad\text{for } x\in\mathbb{R}^8 \Rightarrow \operatorname{Ker}p_2(A) = \langle\langle e_7, e_8\rangle\rangle, \quad\text{as it should be by 2.}$$
4. Take any nonzero vector $w \in \operatorname{Ker}p_2(A)$, say $w = \alpha e_7 + \beta e_8$. Then
$$wA = \alpha e_7A + \beta e_8A = \alpha e_8 + \beta(-e_7 + e_8) = -\beta e_7 + (\alpha+\beta)e_8$$
$$\Rightarrow wA^2 = -\beta e_7A + (\alpha+\beta)e_8A = -\beta e_8 + (\alpha+\beta)(-e_7 + e_8) = -(\alpha+\beta)e_7 + \alpha e_8 = -w + wA.$$
Then $B_w = \{w, wA\}$ is a basis for $\operatorname{Ker}p_2(A)$. The matrix
$$\left[A\big|_{\operatorname{Ker}p_2(A)}\right]_{B_w} = R_3 \tag{*2}$$
is called the companion matrix of $p_2(t) = t^2 - t + 1$.
5. Notice that the dot diagram for $C(e_7)$ consists of a single dot:
total number 1 = ½[dim R⁸ − r(A² − A + I₈)] = ½(8 − 6) = 1, and $p_2(t)$ produces the annihilator $p_2(A)$.

Putting (*1) and (*2) together, let
$$B = B_v\cup B_u\cup B_w = \{v, vA, vA^2, vA^3, u, uA, w, wA\}.$$
$B$ is a basis for $\mathbb{R}^8$ and is called a rational canonical basis of $A$. In $B$,
$$[A]_B = PAP^{-1} = \begin{bmatrix}R_1 & & 0\\ & R_2 &\\ 0 & & R_3\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}v\\ vA\\ \vdots\\ wA\end{bmatrix}_{8\times 8},$$
is called the rational canonical form of $A$. □
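A numerical sanity check of Example 4 (my own sketch): assemble A from the companion blocks and verify that the cyclic vectors above really conjugate A back to diag(R1, R2, R3):

```python
import numpy as np
from scipy.linalg import block_diag

R1 = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, -2, -3, -2]])
R2 = np.array([[0, 1], [-1, -1]])
R3 = np.array([[0, 1], [-1, 1]])
A = block_diag(R1, R2, R3).astype(float)

e = np.eye(8)
v = e[2] + e[3]                       # e3 + e4 in the text's 1-based labels
u = e[0] + e[1] + e[2]                # v1 = e1 + e2 + e3
w = e[6]                              # e7
P = np.vstack([v, v @ A, v @ A @ A, v @ np.linalg.matrix_power(A, 3),
               u, u @ A, w, w @ A])
assert np.allclose(P @ A @ np.linalg.inv(P), block_diag(R1, R2, R3))
```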

Example 5. Find a rational canonical basis and the rational canonical form for the matrix
$$A = \begin{bmatrix}0 & 2 & 0 & -6 & 2\\ 1 & -2 & 0 & 0 & 2\\ 1 & 0 & 1 & -3 & 2\\ 1 & -2 & 1 & -1 & 2\\ 1 & -4 & 3 & -3 & 4\end{bmatrix}.$$
What happens if $A$ is considered as a complex matrix?

Solution. The characteristic polynomial of $A$ is
$$\det(A - tI_5) = -(t^2+2)^2(t-2).$$
Then $p_1(t) = t^2 + 2$ has algebraic multiplicity 2 and $p_2(t) = t - 2$ has algebraic multiplicity 1. Therefore
$$\dim\operatorname{Ker}p_1(A)^2 = 4 \quad\text{and}\quad \dim\operatorname{Ker}p_2(A) = 1.$$
To compute the ranks:
$$A^2 = \begin{bmatrix}-2 & 0 & 0 & 0 & 0\\ 0 & -2 & 6 & -12 & 6\\ 0 & 0 & 4 & -12 & 6\\ 0 & 0 & 6 & -14 & 6\\ 0 & 0 & 12 & -24 & 10\end{bmatrix}$$
$$\Rightarrow A^2 + 2I_5 = \begin{bmatrix}0 & 0 & 0 & 0 & 0\\ 0 & 0 & 6 & -12 & 6\\ 0 & 0 & 6 & -12 & 6\\ 0 & 0 & 6 & -12 & 6\\ 0 & 0 & 12 & -24 & 12\end{bmatrix} \Rightarrow r(A^2 + 2I_5) = 1 \Rightarrow r((A^2 + 2I_5)^2) = 1.$$
Since $\frac{1}{2}(\dim\mathbb{R}^5 - r(A^2 + 2I_5)) = \frac{1}{2}(5-1) = 2$, there exists a rational canonical basis $B = \{v_1, v_1A, v_2, v_2A, v_3\}$ of $A$ so that
$$[A]_B = QAQ^{-1} = \begin{bmatrix}0 & 1 & & &\\ -2 & 0 & & &\\ & & 0 & 1 &\\ & & -2 & 0 &\\ & & & & 2\end{bmatrix}, \quad\text{where } Q = \begin{bmatrix}v_1\\ v_1A\\ v_2\\ v_2A\\ v_3\end{bmatrix}. \tag{*3}$$
To find such a rational canonical basis $B$, solve
$$x(A^2 + 2I_5) = 0 \Rightarrow x_2 + x_3 + x_4 + 2x_5 = 0$$
$$\Rightarrow x = (x_1, -x_3 - x_4 - 2x_5, x_3, x_4, x_5) = x_1(1,0,0,0,0) + x_3(0,-1,1,0,0) + x_4(0,-1,0,1,0) + x_5(0,-2,0,0,1),$$
where $x_1, x_3, x_4, x_5 \in \mathbb{R}$.
Take $v_1 = e_1$; then
$$v_1A = (0, 2, 0, -6, 2) \Rightarrow v_1A^2 = (-2, 0, 0, 0, 0) = -2e_1 = -2v_1.$$
Take $v_2 = (0, -1, 1, 0, 0)$, which is linearly independent of $v_1$ and $v_1A$. Then
$$v_2A = (0, 2, 1, -3, 0) \Rightarrow v_2A^2 = (0, 2, -2, 0, 0) = -2v_2.$$
Solve
$$x(A - 2I_5) = 0 \Rightarrow x = (0, 0, x_3, -2x_3, x_3) = x_3(0, 0, 1, -2, 1) \quad\text{for } x_3\in\mathbb{R}.$$
Take the eigenvector $v_3 = (0, 0, 1, -2, 1)$. Then $B = \{v_1, v_1A, v_2, v_2A, v_3\}$ is a rational canonical basis of $A$ and a required $Q$ is
$$Q = \begin{bmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 2 & 0 & -6 & 2\\ 0 & -1 & 1 & 0 & 0\\ 0 & 2 & 1 & -3 & 0\\ 0 & 0 & 1 & -2 & 1\end{bmatrix}.$$
The transpose $A^*$ is the matrix $B$ mentioned in Ex. 15 of Sec. B.12. $PA^*P^{-1}$ is the rational canonical form indicated in (*3). Hence, $A$ and $A^*$ have the same rational canonical form and thus they are similar. In fact $A^* = RAR^{-1}$, where $R = P^{-1}Q$.
Suppose $A$ is considered as a complex matrix. Then
$$\det(A - tI_5) = -(t-\sqrt{2}i)^2(t+\sqrt{2}i)^2(t-2)$$
and $A$ has the rational canonical form
$$\begin{bmatrix}0 & 1 & & & \\ 2 & 2\sqrt{2}i & & &\\ & & 0 & 1 &\\ & & 2 & -2\sqrt{2}i &\\ & & & & 2\end{bmatrix}_{5\times 5}.$$
Readers are urged to find an invertible complex matrix $S_{5\times 5}$ so that $SAS^{-1}$ is the above-mentioned rational canonical form.
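The real canonical form of Example 5 is easy to verify numerically (my own check):

```python
import numpy as np

A = np.array([[0, 2, 0, -6, 2], [1, -2, 0, 0, 2], [1, 0, 1, -3, 2],
              [1, -2, 1, -1, 2], [1, -4, 3, -3, 4]], dtype=float)
Q = np.array([[1, 0, 0, 0, 0], [0, 2, 0, -6, 2], [0, -1, 1, 0, 0],
              [0, 2, 1, -3, 0], [0, 0, 1, -2, 1]], dtype=float)
expected = np.zeros((5, 5))
expected[0:2, 0:2] = [[0, 1], [-2, 0]]     # companion matrix of t^2 + 2
expected[2:4, 2:4] = [[0, 1], [-2, 0]]
expected[4, 4] = 2.0
assert np.allclose(Q @ A @ np.linalg.inv(Q), expected)
```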

Exercises
<A>

1. In Case 3, try to show that $B = \{v, vA, vA^2\}$ is a basis for $\mathbb{R}^3$, where $v \in G_\lambda$ but $v(A - \lambda I_3)^2 \ne 0$.
2. For each of the following matrices $A$, do the following problems.
  2

2. For each of the following matrices A, do the following problems.


(1) Find a rational canonical basis B and a matrix P so that PAP −1 is
the rational canonical form of A.
(2) Find a matrix Q so that QA∗ Q−1 is the rational canonical form of
A. Try to use (3.7.32) and Ex. <A> 5 there to derive the results
from (1).
(3) Show that A and A∗ are similar and find an invertible R so that
A∗ = RAR−1 .
(4) If A is considered as a complex matrix, find the rational canonical
form of A. Is A diagonalizable? If it is, find a matrix S so that
SAS −1 is diagonal.
(a) $\begin{bmatrix}0 & 3 & 1\\ -1 & 3 & 1\\ 0 & 1 & 1\end{bmatrix}$. (b) $\begin{bmatrix}10 & 11 & 3\\ -3 & -4 & -3\\ -8 & -8 & -1\end{bmatrix}$. (c) $\begin{bmatrix}2 & 0 & 0\\ 3 & 2 & 0\\ 0 & 0 & 5\end{bmatrix}$.
(d) $\begin{bmatrix}1 & 1 & 1\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$. (e) $\begin{bmatrix}7 & 3 & 3\\ 0 & 1 & 0\\ -3 & -3 & 1\end{bmatrix}$. (f) $\begin{bmatrix}2 & 0 & 3\\ 0 & 1 & 0\\ 0 & 1 & 2\end{bmatrix}$.
(g) $\begin{bmatrix}3 & 0 & 0\\ 0 & 3 & 1\\ 0 & 0 & 3\end{bmatrix}$. (h) $\begin{bmatrix}3 & 0 & 0\\ 1 & 3 & 0\\ 0 & 1 & 3\end{bmatrix}$.

<B>

1. For each of the following matrices, do the same problems as listed in


Ex. <A> 2.
     
(a) $\begin{bmatrix}2 & 3 & 3 & 5\\ 3 & 2 & 2 & 3\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{bmatrix}$. (b) $\begin{bmatrix}2 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 3 & 1\\ 0 & 0 & 0 & 3\end{bmatrix}$. (c) $\begin{bmatrix}2 & 1 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 3\end{bmatrix}$.

2. In Example 5, find invertible complex matrix S so that SAS −1 is the


rational canonical form over the complex field.
3. Let $V$ be the vector space of all $2 \times 2$ real lower triangular matrices. Define $f\colon V \to V$ by
$$f\left(\begin{bmatrix}a_{11} & 0\\ a_{21} & a_{22}\end{bmatrix}\right) = \begin{bmatrix}7a_{11} + 3a_{21} + 3a_{22} & 0\\ a_{21} & -3a_{11} - 3a_{21} + a_{22}\end{bmatrix}.$$
Show that $f$ is linear and find its rational canonical form and a rational canonical basis.

<C> Abstraction and generalization

Read Sec. B.12 and try to do exercises there.

3.8 Affine Transformations

At this moment, one should turn to Sec. 2.8 for a general description of affine space (2.8.1), affine subspace (2.8.2) and affine transformation or mapping (2.8.3).

R³ will play the role of a three-dimensional real vector space and, at the same time, of an affine space. Results such as (2.8.4) and (2.8.8), including the procedures used to obtain them, can be easily extended to R³.

For example, an affine transformation $T\colon \mathbb{R}^3 \to \mathbb{R}^3$ is geometrically characterized as the one-to-one mapping from R³ onto itself that preserves ratios of signed lengths of line segments along the same or parallel lines in R³. Meanwhile, the affine group on the space R³ is
$$G_a(3;\mathbb{R}) = \{x_0 + f(x) \mid x_0\in\mathbb{R}^3 \text{ and } f\colon \mathbb{R}^3\to\mathbb{R}^3 \text{ is an invertible linear operator}\} \tag{3.8.1}$$
with the identity element $I\colon \mathbb{R}^3 \to \mathbb{R}^3$ defined as the identity operator $1_{\mathbb{R}^3}$, i.e.
$$I(x) = 1_{\mathbb{R}^3}(x) = x \quad\text{for all } x\in\mathbb{R}^3,$$
and the inverse
$$T^{-1}(x) = -f^{-1}(x_0) + f^{-1}(x)$$
of $T(x) = x_0 + f(x)$. Elements of $G_a(3;\mathbb{R})$ are also called affine motions when their geometric aspects are to be emphasized.
This section is divided into five subsections, each a counterpart of the same-numbered subsection of Sec. 2.8.
Section 3.8.1: Use examples to illustrate matrix representations of an affine transformation with respect to various affine bases for R³ and the relations among them.
Section 3.8.2: As an extension of Sec. 3.7.2 to affine transformations, here we introduce the following basic ones along with their geometric and algebraic characterizations: translations; reflections; (one-way, two-way) stretches and enlargement (similarity); shearings; rotations; orthogonal reflections.
Section 3.8.3: Study the affine invariants under the affine group $G_a(3;\mathbb{R})$ and prove (3.7.15) and the geometric interpretation of the determinant of a linear operator on R³.
Section 3.8.4: Here we treat the affine geometry of R³ as a model for Rⁿ or for a finite-dimensional affine space over a field or ordered field:
1. Affine independence and dependence; affine basis; affine coordinates and barycentric coordinates; affine subspaces and their operations, including the intersection theorem; the Menelaus and Ceva theorems.
2. Half space, convex set, simplex and polyhedron.
3. Affine transformations.
4. Connections of R³ with the three-dimensional projective space P³(R).
Section 3.8.5: Introduce the classifications of quadrics in the affine (Euclidean) space R³ and in the projective space P³(R). The algebraic characterization of each quadric and the study of such subjects as the diametral plane, center, tangent plane, pole and polar, etc. will be postponed to Sec. 5.10.

3.8.1 Matrix representations

Just as Sec. 3.7.3 extended Sec. 2.7.2, the processes adopted, the definitions concerned and the results obtained in Sec. 2.8.1 can be easily generalized, almost verbatim, to affine transformations on R³.
For example, with the affine basis $B = \{a_0, a_1, a_2, a_3\}$ with base point $a_0$ (see (3.6.1)) replacing the planar affine basis $B = \{a_0, a_1, a_2\}$, the counterparts of the formulas of Sec. 2.8 read as follows. Let $T(x) = x_0 + f(x)\colon \mathbb{R}^3 \to \mathbb{R}^3$ be an affine transformation.

The counterpart of (2.8.9):
$$[T(x)]_C = [T(a_0)]_C + [x]_B[f]^B_C, \tag{3.8.2}$$
the matrix representation of $T$ with respect to $B$ and $C$. See the Explanation below.

The counterpart of (2.8.10):
$$([T(x)]_C\ \ 1) = ([x]_B\ \ 1)\begin{bmatrix}[f]^B_C & 0\\ [T(a_0)]_C & 1\end{bmatrix}. \tag{3.8.3}$$

The counterpart of $G_a(2;\mathbb{R})$ in (2.8.11):
$$G_a(3;\mathbb{R}) = \left\{\begin{bmatrix}A & 0\\ x_0 & 1\end{bmatrix} \,\middle|\, A\in GL(3;\mathbb{R}) \text{ and } x_0\in\mathbb{R}^3\right\}$$
with identity $\begin{bmatrix}I_3 & 0\\ 0_3 & 1\end{bmatrix}$ and inverse
$$\begin{bmatrix}A & 0\\ x_0 & 1\end{bmatrix}^{-1} = \begin{bmatrix}A^{-1} & 0\\ -x_0A^{-1} & 1\end{bmatrix}. \tag{3.8.4}$$

The counterpart of (2.8.15): $T = f_2 \circ f_1$ with $f_2(x) = [T(x_0) - x_0] + x$, a translation, and $f_1(x) = x_0 + f(x - x_0)$, an affine transformation keeping $x_0$ fixed. (3.8.5)

The counterpart of (2.8.16), the fundamental theorem: For any two affine bases $B = \{a_0, a_1, a_2, a_3\}$ and $C = \{b_0, b_1, b_2, b_3\}$, there exists a unique $T \in G_a(3;\mathbb{R})$ such that
$$T(a_i) = b_i \quad\text{for } 0\le i\le 3. \tag{3.8.6}$$
Explanation of (3.8.2) and (3.8.3):
Let $B = \{a_0, a_1, a_2, a_3\}$ and $C = \{b_0, b_1, b_2, b_3\}$ be two affine bases for $\mathbb{R}^3$ and
$$T(x) = x_0 + f(x)$$
an affine transformation on $\mathbb{R}^3$. Rewrite $T$ as
$$T(x) = T(a_0) + f(x - a_0), \quad\text{where } T(a_0) = x_0 + f(a_0)$$
$$\Rightarrow T(x) - b_0 = T(a_0) - b_0 + f(x - a_0).$$
Consider $B$ and $C$ as bases for the vector space $\mathbb{R}^3$, i.e.
$$B = \{a_1 - a_0,\ a_2 - a_0,\ a_3 - a_0\}, \quad\text{and}\quad C = \{b_1 - b_0,\ b_2 - b_0,\ b_3 - b_0\} \tag{*1}$$
(see (3.6.1)). Then
$$[T(x) - b_0]_C = [T(a_0) - b_0]_C + [f(x - a_0)]_C = [T(a_0) - b_0]_C + [x - a_0]_B[f]^B_C$$
or, in short (see (3.6.5)),
$$[T(x)]_C = [T(a_0)]_C + [x]_B[f]^B_C \tag{3.8.2}$$
which is called the matrix representation of $T$ with respect to the affine bases $B$ and $C$ in the sense of (*1), i.e.

1. $[T(x)]_C = (y_1, y_2, y_3) \Leftrightarrow T(x) - b_0 = \sum_{i=1}^{3} y_i(b_i - b_0)$.
2. $[T(a_0)]_C = (p_1, p_2, p_3) \Leftrightarrow T(a_0) - b_0 = \sum_{i=1}^{3} p_i(b_i - b_0)$.
3. $[x]_B = (x_1, x_2, x_3) \Leftrightarrow x - a_0 = \sum_{i=1}^{3} x_i(a_i - a_0)$.
4. $$[f]^B_C = \begin{bmatrix}[f(a_1 - a_0)]_C\\ [f(a_2 - a_0)]_C\\ [f(a_3 - a_0)]_C\end{bmatrix}_{3\times 3} \tag{*2}$$
where each $f(a_i - a_0)$ is a vector in $\mathbb{R}^3$ and
$$[f(a_i - a_0)]_C = (\alpha_{i1}, \alpha_{i2}, \alpha_{i3}) \Leftrightarrow f(a_i - a_0) = \sum_{j=1}^{3}\alpha_{ij}(b_j - b_0) \quad\text{for } 1\le i\le 3$$
(see (3.6.6)). While, (3.8.3) is just another representation of (3.8.2).
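The 4×4 form (3.8.3)/(3.8.4) is what makes affine maps convenient in code: the map $x \mapsto x_0 + xA$, acting on row vectors, becomes a single matrix acting on $(x, 1)$. A sketch (helper names mine):

```python
import numpy as np

def affine_to_4x4(A, x0):
    M = np.zeros((4, 4))
    M[:3, :3] = A
    M[3, :3] = x0
    M[3, 3] = 1.0
    return M

A = np.array([[4.0, -3.0, 3.0], [0.0, 1.0, 4.0], [2.0, -2.0, 1.0]])
x0 = np.array([1.0, -1.0, 0.0])
M = affine_to_4x4(A, x0)
x = np.array([1.0, 0.0, 1.0])
assert np.allclose(np.append(x, 1.0) @ M, np.append(x0 + x @ A, 1.0))
# composition of affine maps is then matrix multiplication M1 @ M2,
# and the inverse is the block formula in (3.8.4)
```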



Example 1. In the natural affine basis $N = \{0, e_1, e_2, e_3\}$, let the affine transformation $T$ on $\mathbb{R}^3$ be defined as
$$T(x) = x_0 + xA, \quad\text{where } x_0 = (1, -1, 0) \text{ and } A = \begin{bmatrix}4 & -3 & 3\\ 0 & 1 & 4\\ 2 & -2 & 1\end{bmatrix}.$$
Let $B = \{a_0, a_1, a_2, a_3\}$, where $a_0 = (1, 0, 1)$, $a_1 = (2, 2, 2)$, $a_2 = (1, 1, 1)$, $a_3 = (2, 3, 3)$, and $C = \{b_0, b_1, b_2, b_3\}$, where $b_0 = (0, -1, -1)$, $b_1 = (2, -1, -1)$, $b_2 = (1, 1, 0)$, $b_3 = (0, 0, 1)$.
(a) Show that both $B$ and $C$ are affine bases for $\mathbb{R}^3$.
(b) Find the matrix representation of $T$ with respect to $B$ and $C$.
(c) Show that $A$ is diagonalizable. Then try to find an affine basis $D$ for $\mathbb{R}^3$ so that, in $D$, $T$ is as simple as possible.

Solution. (a) Since
$$\det\begin{bmatrix}a_1 - a_0\\ a_2 - a_0\\ a_3 - a_0\end{bmatrix} = \begin{vmatrix}1 & 2 & 1\\ 0 & 1 & 0\\ 1 & 3 & 2\end{vmatrix} = 1 \ne 0,$$
$\{a_0, a_1, a_2, a_3\}$ is affinely independent (see Sec. 3.6) and hence $B$ is an affine basis for $\mathbb{R}^3$. Similarly,
$$\det\begin{bmatrix}b_1 - b_0\\ b_2 - b_0\\ b_3 - b_0\end{bmatrix} = \begin{vmatrix}2 & 0 & 0\\ 1 & 2 & 1\\ 0 & 1 & 2\end{vmatrix} = 6$$
says that $C$ is an affine basis for $\mathbb{R}^3$.
(b) Here we write $B = \{a_1 - a_0, a_2 - a_0, a_3 - a_0\}$ and $C = \{b_1 - b_0, b_2 - b_0, b_3 - b_0\}$ and consider $B$ and $C$ as bases for the vector space $\mathbb{R}^3$. In the formula (see (3.8.2))
$$[T(x)]_C = [T(a_0)]_C + [x]_BA^B_C$$
we need to compute $[T(a_0)]_C$ and $A^B_C$. Now
$$T(a_0) = x_0 + a_0A = (1, -1, 0) + (6, -5, 4) = (7, -6, 4)$$
$$\Rightarrow T(a_0) - b_0 = (7, -6, 4) - (0, -1, -1) = (7, -5, 5)$$
$$\Rightarrow [T(a_0)]_C = (T(a_0) - b_0)[1_{\mathbb{R}^3}]^N_C \quad\text{(refer to (2.7.23) and (3.3.3))},$$
where
$$[1_{\mathbb{R}^3}]^C_N = \begin{bmatrix}b_1 - b_0\\ b_2 - b_0\\ b_3 - b_0\end{bmatrix} = \begin{bmatrix}2 & 0 & 0\\ 1 & 2 & 1\\ 0 & 1 & 2\end{bmatrix} \quad\text{and}\quad [1_{\mathbb{R}^3}]^N_C = \frac{1}{6}\begin{bmatrix}3 & 0 & 0\\ -2 & 4 & -2\\ 1 & -2 & 4\end{bmatrix} = \left([1_{\mathbb{R}^3}]^C_N\right)^{-1}.$$
Hence
$$[T(a_0)]_C = (7\ -5\ 5)\cdot\frac{1}{6}\begin{bmatrix}3 & 0 & 0\\ -2 & 4 & -2\\ 1 & -2 & 4\end{bmatrix} = \frac{1}{6}(36, -30, 30) = (6, -5, 5).$$
On the other hand, $A^B_C = [1_{\mathbb{R}^3}]^B_NA^N_N[1_{\mathbb{R}^3}]^N_C$ means that (see (3.3.4))
$$A^B_C = \begin{bmatrix}a_1 - a_0\\ a_2 - a_0\\ a_3 - a_0\end{bmatrix}A[1_{\mathbb{R}^3}]^N_C = \begin{bmatrix}1 & 2 & 1\\ 0 & 1 & 0\\ 1 & 3 & 2\end{bmatrix}\begin{bmatrix}4 & -3 & 3\\ 0 & 1 & 4\\ 2 & -2 & 1\end{bmatrix}\cdot\frac{1}{6}\begin{bmatrix}3 & 0 & 0\\ -2 & 4 & -2\\ 1 & -2 & 4\end{bmatrix} = \frac{1}{6}\begin{bmatrix}36 & -36 & 54\\ 2 & -4 & 14\\ 49 & -50 & 76\end{bmatrix}.$$
(c) The characteristic polynomial of $A$ is $\det(A - tI_3) = -(t-1)(t-2)(t-3)$. Hence $A$ is diagonalizable and
$$PAP^{-1} = \begin{bmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{bmatrix}, \quad\text{where } P = \begin{bmatrix}4 & -3 & -6\\ 1 & -1 & -1\\ -2 & 2 & 1\end{bmatrix}.$$
Let $v_1 = (4, -3, -6)$, $v_2 = (1, -1, -1)$ and $v_3 = (-2, 2, 1)$ be eigenvectors associated to 1, 2 and 3, respectively. Then $D = \{x_0, x_0 + v_1, x_0 + v_2, x_0 + v_3\}$ is an affine basis for $\mathbb{R}^3$. With $x_0$ as base point,
$$T(x) = x_0 + xA = T(x_0) + (x - x_0)A, \quad\text{where } T(x_0) = x_0 + x_0A$$
$$\Rightarrow T(x) - x_0 = T(x_0) - x_0 + (x - x_0)A$$
$$\Rightarrow [T(x)]_D = [T(x_0)]_D + [(x - x_0)A]_D, \quad\text{i.e.}\quad (T(x) - x_0)P^{-1} = (x_0A)P^{-1} + (x - x_0)P^{-1}(PAP^{-1}).$$
Let
$$[x]_D = (x - x_0)P^{-1} = (\alpha_1, \alpha_2, \alpha_3), \quad\text{and}\quad [T(x)]_D = (T(x) - x_0)P^{-1} = (\beta_1, \beta_2, \beta_3).$$
By computing
$$P^{-1} = \begin{bmatrix}1 & -9 & -3\\ 1 & -8 & -2\\ 0 & -2 & -1\end{bmatrix}$$
$$\Rightarrow [T(x_0)]_D = x_0AP^{-1} = (1\ -1\ 0)\begin{bmatrix}4 & -3 & 3\\ 0 & 1 & 4\\ 2 & -2 & 1\end{bmatrix}\begin{bmatrix}1 & -9 & -3\\ 1 & -8 & -2\\ 0 & -2 & -1\end{bmatrix} = (0\ -2\ -3).$$
Then, in terms of $D$, $T$ has the simplest form
$$[T(x)]_D = (0, -2, -3) + [x]_D\begin{bmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{bmatrix}, \quad\text{or}\quad \beta_1 = \alpha_1,\quad \beta_2 = -2 + 2\alpha_2,\quad \beta_3 = -3 + 3\alpha_3.$$
See Fig. 3.55 and try to explain the geometric mapping behavior of $T$ in $D$. For example, what is the image under $T$ of the parallelepiped $(x_0 + v_1)(x_0 + v_2)(x_0 + v_3)$ with vertex at $x_0$? □

[Fig. 3.55: the affine basis D = {x0, x0 + v1, x0 + v2, x0 + v3} in R³.]
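A numerical spot-check of (c) (my own; it confirms the diagonalization and the base-point formula used above):

```python
import numpy as np

A = np.array([[4.0, -3.0, 3.0], [0.0, 1.0, 4.0], [2.0, -2.0, 1.0]])
x0 = np.array([1.0, -1.0, 0.0])
P = np.array([[4.0, -3.0, -6.0], [1.0, -1.0, -1.0], [-2.0, 2.0, 1.0]])
Pinv = np.linalg.inv(P)
assert np.allclose(P @ A @ Pinv, np.diag([1.0, 2.0, 3.0]))

x = np.array([2.0, 0.0, 1.0])                 # any sample point
T_x = x0 + x @ A
lhs = (T_x - x0) @ Pinv                       # [T(x)]_D
rhs = (x0 @ A) @ Pinv + ((x - x0) @ Pinv) @ np.diag([1.0, 2.0, 3.0])
assert np.allclose(lhs, rhs)
```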

Example 2. Let $a_0 = (1, 1, 1)$, $a_1 = (-1, 1, 1)$, $a_2 = (1, -1, 1)$, $a_3 = (1, 1, -1)$. How many affine transformations are there that map the tetrahedron $\Delta a_0a_1a_2a_3$ onto the tetrahedron $\Delta(-a_0)(-a_1)(-a_2)(-a_3)$? Try to find some of them and rewrite them in the simplest forms, if possible. See Fig. 3.56.

[Fig. 3.56: the cube with vertices (±1, ±1, ±1).]

Solution. According to the fundamental theorem (3.8.6), there are $4! = 24$ different affine transformations that meet the requirement.
Notice that, letting $b_i = -a_i$ for $0 \le i \le 3$,
$$a_1 - a_0 = (-2, 0, 0);\quad b_1 - b_0 = (2, 0, 0) = -(a_1 - a_0),$$
$$a_2 - a_0 = (0, -2, 0);\quad b_2 - b_0 = (0, 2, 0) = -(a_2 - a_0),$$
$$a_3 - a_0 = (0, 0, -2);\quad b_3 - b_0 = (0, 0, 2) = -(a_3 - a_0).$$
The simplest one among all, even by geometric intuition, is the $T_1$ that satisfies
$$T_1(a_i) = b_i = -a_i \quad\text{for } i = 0, 1, 2, 3$$
⇔ the unique invertible linear operator $f_1\colon \mathbb{R}^3 \to \mathbb{R}^3$, where $T_1(x) = b_0 + f_1(x - a_0)$, satisfies
$$f_1(a_j - a_0) = b_j - b_0 = -(a_j - a_0) \quad\text{for } j = 1, 2, 3$$
$$\Rightarrow T_1(x) = b_0 + (x - a_0)A_1, \quad\text{where } A_1 = \begin{bmatrix}-1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{bmatrix}$$
⇒ (since $a_0A_1 = b_0$)
$$T_1(x) = f_1(x) = xA_1 = x\begin{bmatrix}-1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{bmatrix}, \quad\text{or}\quad y_1 = -x_1,\ y_2 = -x_2,\ y_3 = -x_3,$$
where $x = (x_1, x_2, x_3)$ and $y = T_1(x) = (y_1, y_2, y_3)$.
Another is the $T_2$ that satisfies
$$T_2(a_0) = -a_0,\quad T_2(a_1) = -a_1,\quad T_2(a_2) = -a_3,\quad T_2(a_3) = -a_2$$
⇔ the linear operator $f_2\colon \mathbb{R}^3 \to \mathbb{R}^3$ satisfies
$$f_2(a_1 - a_0) = -(a_1 - a_0),\quad f_2(a_2 - a_0) = -(a_3 - a_0),\quad f_2(a_3 - a_0) = -(a_2 - a_0)$$
⇒ (since $a_0A_2 = b_0$)
$$T_2(x) = f_2(x) = xA_2 = x\begin{bmatrix}-1 & 0 & 0\\ 0 & 0 & -1\\ 0 & -1 & 0\end{bmatrix}, \quad\text{or}\quad y_1 = -x_1,\ y_2 = -x_3,\ y_3 = -x_2.$$
If the vertex $a_0$ is preassigned to $b_0 = -a_0$, then the total number of such mappings is $3! = 6$. Each of them is represented by one of the following matrices $A$:
$$\begin{bmatrix}-1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{bmatrix},\ \begin{bmatrix}-1 & 0 & 0\\ 0 & 0 & -1\\ 0 & -1 & 0\end{bmatrix},\ \begin{bmatrix}0 & 0 & -1\\ 0 & -1 & 0\\ -1 & 0 & 0\end{bmatrix},\ \begin{bmatrix}0 & -1 & 0\\ -1 & 0 & 0\\ 0 & 0 & -1\end{bmatrix},\ \begin{bmatrix}0 & 0 & -1\\ -1 & 0 & 0\\ 0 & -1 & 0\end{bmatrix},\ \begin{bmatrix}0 & -1 & 0\\ 0 & 0 & -1\\ -1 & 0 & 0\end{bmatrix}.$$
The corresponding affine transformation is of the form $T(x) = xA$.



Suppose $a_0$ is assigned to $b_1 = -a_1$. Construct a $T_3\colon \mathbb{R}^3 \to \mathbb{R}^3$ satisfying
$$T_3(a_0) = -a_1,\quad T_3(a_1) = -a_0,\quad T_3(a_2) = -a_2,\quad T_3(a_3) = -a_3$$
⇔ the unique invertible linear operator $f_3\colon \mathbb{R}^3 \to \mathbb{R}^3$ satisfies
$$f_3(a_1 - a_0) = b_0 - b_1 = -(a_0 - a_1) = a_1 - a_0, \quad\text{and}$$
$$f_3(a_j - a_0) = b_j - b_1 = -(a_j - a_1) = -(a_j - a_0) + (a_1 - a_0) \quad\text{for } j = 2, 3$$
$$\Rightarrow T_3(x) = b_1 + (x - a_0)A_3, \quad\text{where } A_3 = \begin{bmatrix}1 & 0 & 0\\ 1 & -1 & 0\\ 1 & 0 & -1\end{bmatrix}$$
⇒ (since $a_0A_3 = (3, -1, -1)$ and $b_1 = -a_1 = (1, -1, -1)$)
$$T_3(x) = (-2, 0, 0) + x\begin{bmatrix}1 & 0 & 0\\ 1 & -1 & 0\\ 1 & 0 & -1\end{bmatrix}.$$
Notice that $A_3$ is diagonalizable and hence is similar to
$$\begin{bmatrix}1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1\end{bmatrix} = PA_3P^{-1}, \quad\text{where } P = \begin{bmatrix}1 & 0 & 0\\ 1 & -2 & 0\\ 1 & 0 & -2\end{bmatrix}.$$
Try to find an affine basis $B$ for $\mathbb{R}^3$ so that the linear part $A_3$ of $T_3$ is in this diagonal form.
Suppose $T_4$ satisfies
$$T_4(a_0) = -a_1,\quad T_4(a_1) = -a_2,\quad T_4(a_2) = -a_3,\quad T_4(a_3) = -a_0$$
⇔ the unique invertible linear operator $f_4$ satisfies
$$f_4(a_1 - a_0) = -a_2 + a_1 = -(a_2 - a_1) = -(a_2 - a_0) + (a_1 - a_0),$$
$$f_4(a_2 - a_0) = -a_3 + a_1 = -(a_3 - a_0) + (a_1 - a_0),$$
$$f_4(a_3 - a_0) = -a_0 + a_1 = a_1 - a_0$$
$$\Rightarrow T_4(x) = -a_1 + (x - a_0)A_4, \quad\text{where } A_4 = \begin{bmatrix}1 & -1 & 0\\ 1 & 0 & -1\\ 1 & 0 & 0\end{bmatrix}$$
⇒ (since $-a_1 = (1, -1, -1)$, $a_0A_4 = (3, -1, -1)$)
$$T_4(x) = (-2, 0, 0) + x\begin{bmatrix}1 & -1 & 0\\ 1 & 0 & -1\\ 1 & 0 & 0\end{bmatrix}.$$
There are four others of this type, whose respective linear parts are
$$\begin{bmatrix}1 & 0 & -1\\ 1 & -1 & 0\\ 1 & 0 & 0\end{bmatrix},\ \begin{bmatrix}1 & -1 & 0\\ 1 & 0 & 0\\ 1 & 0 & -1\end{bmatrix},\ \begin{bmatrix}1 & 0 & 0\\ 1 & 0 & -1\\ 1 & -1 & 0\end{bmatrix},\ \begin{bmatrix}1 & 0 & -1\\ 1 & 0 & 0\\ 1 & -1 & 0\end{bmatrix}.$$
Yet, there are another 12 such affine transformations, which are left as Ex. <A> 2. □
Exercises
<A>
1. Prove (3.8.4), (3.8.5) and (3.8.6) in detail.
2. Find the other 12 affine transformations mentioned in Example 2.
3. Let $B = \{a_0, a_1, a_2, a_3\}$, where $a_0 = (2, 1, -1)$, $a_1 = (3, 1, -2)$, $a_2 = (2, 3, 2)$, $a_3 = (3, 2, 0)$, and $C = \{b_0, b_1, b_2, b_3\}$, where $b_0 = (1, 3, 1)$, $b_1 = (0, 3, 2)$, $b_2 = (2, 4, 2)$, $b_3 = (2, 1, 2)$. Show that $B$ and $C$ are affine bases for $\mathbb{R}^3$. For each of the following affine transformations $T$ in the natural affine basis $N = \{0, e_1, e_2, e_3\}$, do the following problems:
(1) Find the matrix representation of $T$ with respect to $B$ and $B$, denoted by $[T]_B$, i.e.
$$[T]_B(x) = [T(x)]_B = [T(a_0)]_B + [x]_BA^B_B.$$
(2) Find the matrix representation $[T]_C$ of $T$ with respect to $C$ and $C$.
(3) Find the matrix representation $[T]^B_C$ of $T$ with respect to $B$ and $C$, i.e.
$$[T]^B_C(x) = [T(x)]_C = [T(a_0)]_C + [x]_BA^B_C.$$
(4) What is $[T]^C_B$? Does it have anything to do with $[T]^B_C$?
(5) Find the (diagonal, Jordan or rational) canonical form of $A$. Thus, construct an affine basis $D$ for $\mathbb{R}^3$ in which $T$ is as simple as possible.
(a) $T(x) = x_0 + xA$, where $x_0 = (1, -1, 1)$ and $A = \begin{bmatrix}0 & 1 & -1\\ -4 & 4 & -2\\ -2 & 1 & 1\end{bmatrix}$.
(b) $T(x) = x_0 + xA$, where $x_0 = (-1, -1, 5)$ and $A = \begin{bmatrix}-3 & -3 & -2\\ -7 & 6 & -3\\ 1 & -1 & 2\end{bmatrix}$.
(c) $T(x) = x_0 + xA$, where $x_0 = (1, 2, 3)$ and $A = \begin{bmatrix}-1 & 3 & 0\\ 0 & -1 & 2\\ 0 & 0 & -1\end{bmatrix}$.
(d) $T(x) = x_0 + xA$, where $x_0 = (-2, 3, 2)$ and $A = \begin{bmatrix}2 & 0 & 0\\ 2 & 2 & 0\\ -2 & 1 & 2\end{bmatrix}$.
(e) $T(x) = x_0 + xA$, where $x_0 = (-2, 1, 0)$ and $A = \begin{bmatrix}0 & 0 & 1\\ 1 & 0 & -1\\ 0 & 1 & 1\end{bmatrix}$.
(f) $T(x) = x_0 + xA$, where $x_0 = (4, -1, -1)$ and $A = \begin{bmatrix}4 & 2 & 3\\ 2 & 1 & 2\\ -1 & -2 & 0\end{bmatrix}$.
(g) $T(x) = x_0 + xA$, where $x_0 = (2, 1, -1)$ and $A = \begin{bmatrix}2 & 1 & 0\\ 0 & 2 & 1\\ 1 & 0 & 2\end{bmatrix}$.
4. Let $T$, $B$ and $D$ be as in Ex. 3(a).
(a) Compute $T^{-1}$ and $[T^{-1}]_B$. Is $[T^{-1}]_B = [T]_B^{-1}$ (refer to (3.3.3))?
(b) Compute $T^2$ and $[T^2]_B$, where $T^2 = T \circ T$.
(c) Try to compute $T^3$ in terms of $D$.
5. Let $T_1$ be as in Ex. 3(b) and $T_2$ be as in Ex. 3(f). Also, $B$ and $C$ are as in Ex. 3.
(a) Compute $[T_2 \circ T_1]_B$. Is this equal to $[T_2]_B[T_1]_B$?
(b) Compute $[T_1 \circ T_2^{-1}]_C$.
(c) Compute $[T_1 \circ T_2]^B_C$.
6. In Fig. 3.56, find all possible affine transformations mapping the tetrahedron $\Delta a_0(-a_1)a_2(-a_3)$ onto the tetrahedron $\Delta a_0a_1a_2a_3$, where $a_1, a_2, \ldots$, etc. are as in Example 2.
7. In Fig. 3.56, find all possible affine transformations mapping the tetrahedron $\Delta a_0a_1a_2a_3$ onto itself. Do they constitute a group under the composite operation? Give precise reasons.
8. Try to extend Exs. <A> 2 through 8 of Sec. 2.8.1 to $\mathbb{R}^3$ and prove your statements.
statements.
<B>
1. We defined the affine group $G_a(3;\mathbb{R})$ with respect to the natural affine basis $N$ (see (3.8.4)). Show that the affine group with respect to an affine basis $B = \{a_0, a_1, a_2, a_3\}$ is the conjugate group
$$\begin{bmatrix}A_0 & 0\\ a_0 & 1\end{bmatrix}_{4\times 4} G_a(3;\mathbb{R}) \begin{bmatrix}A_0 & 0\\ a_0 & 1\end{bmatrix}^{-1}_{4\times 4},$$
where $A_0$ is the transition matrix from $B = \{a_1 - a_0, a_2 - a_0, a_3 - a_0\}$ to $N = \{e_1, e_2, e_3\}$, considered as bases for the vector space $\mathbb{R}^3$.
2. Let $B = \{a_0, a_1, a_2, a_3, a_4\}$, where $a_0 = (-1, -1, 1, -2)$, $a_1 = (0, 0, 2, -1)$, $a_2 = (-1, 0, 2, -1)$, $a_3 = (-1, -1, 2, -1)$, $a_4 = (-1, -1, 1, -1)$, and $C = \{b_0, b_1, b_2, b_3, b_4\}$, where $b_0 = (2, -3, 4, -5)$, $b_1 = (3, -2, 5, -5)$, $b_2 = (3, -3, 5, -4)$, $b_3 = (3, -2, 4, -4)$, $b_4 = (2, -2, 5, -4)$. Show that both $B$ and $C$ are affine bases for $\mathbb{R}^4$. For each of the following affine transformations $T(x) = x_0 + xA$, do the same problems as in Ex. <A> 3.
(a) $x_0 = (2, 3, 5, 7)$, $A = \begin{bmatrix}1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 2 & 0 & 2 & 1\\ -1 & 1 & -1 & 1\end{bmatrix}$.
(b) $x_0 = (0, -2, -2, -2)$, $A = \begin{bmatrix}2 & -1 & 0 & 1\\ 0 & 3 & -1 & 0\\ 0 & 1 & 1 & 0\\ 0 & -1 & 0 & 3\end{bmatrix}$.
(c) $x_0 = (-3, 1, 1, -3)$, $A = \begin{bmatrix}2 & -4 & 2 & 2\\ -2 & 0 & 1 & 3\\ -2 & -2 & 3 & 3\\ -2 & -6 & 3 & 7\end{bmatrix}$.
(d) $x_0 = (2, 0, -2, -2)$, $A = \begin{bmatrix}1 & -2 & 0 & 0\\ 2 & 1 & 0 & 0\\ 1 & 0 & 1 & -2\\ 0 & 1 & 2 & 1\end{bmatrix}$.
3. Try to describe all possible affine transformations on $\mathbb{R}^4$ mapping the four-dimensional tetrahedron (or simplex)
$$\Delta a_0a_1a_2a_3a_4 = \left\{\sum_{i=0}^{4}\lambda_ia_i \;\middle|\; \lambda_i\ge 0 \text{ for } 0\le i\le 4 \text{ and } \sum_{i=0}^{4}\lambda_i = 1\right\}$$
onto another tetrahedron $\Delta b_0b_1b_2b_3b_4$. Try to use matrices to write some of them explicitly.
<C> Abstraction and generalization
Read Ex. <C> of Sec. 2.8.1.

3.8.2 Examples

We are going to extend the contents of Sec. 2.8.2 to the affine space R³. Readers should review Sec. 2.8.2 for a detailed introduction. Remember that R³ also plays the role of a vector space. We shall also need spatial Euclidean concepts such as angles, lengths, areas and volumes in some cases; one can refer to the Introduction and Natural Inner Product in Part 2 and the beginning of Chap. 5 for the preliminary knowledge.
Some terminologies are unified as follows.
Let $T(x) = x_0 + xA$ be an affine transformation on $\mathbb{R}^3$. It is always understood that, letting $A = [a_{ij}]_{3\times 3}$,
$$\det A \ne 0$$
unless otherwise specified. If $x_0 = (b_1, b_2, b_3)$, then the traditional algebraic equivalent of $T$ is written as
$$y_j = \sum_{i=1}^{3}a_{ij}x_i + b_j \quad\text{for } 1\le j\le 3, \tag{3.8.7}$$
where $x = (x_1, x_2, x_3)$ and $y = T(x) = (y_1, y_2, y_3)$.
A point $x \in \mathbb{R}^3$ is called a fixed point or an invariant point of $T$ if
$$T(x) = x. \tag{3.8.8}$$
An affine subspace $S$ of $\mathbb{R}^3$ is called an invariant (affine) subspace of $T$ if
$$T(S)\subseteq S. \tag{3.8.9}$$
If, in addition, each point of $S$ is an invariant point, then $S$ is called an (affine) subspace of invariant points.

Case 1: Translation
Let $x_0$ be a fixed vector in $\mathbb{R}^3$. The mapping
$$T(x) = x_0 + x, \quad x\in\mathbb{R}^3 \tag{3.8.10}$$
is called a translation of $\mathbb{R}^3$ along $x_0$. Refer to Fig. 2.96. The set of all such translations forms a subgroup of $G_a(3;\mathbb{R})$.
Translations preserve all the geometric mapping properties listed in (3.7.15). A translation has no fixed point unless $x_0 = 0$, in which case every point is a fixed point. Any line or plane parallel to $x_0$ is invariant under the translation.

Case 2: Reflection
Suppose a line $OA$ and a plane $\Sigma$ in the space $\mathbb{R}^3$ intersect at the point $O$. For any point $X$ in $\mathbb{R}^3$, draw a line $XP$, parallel to $OA$, intersecting $\Sigma$ at the point $P$, and extend it to a point $X'$ so that $XP = PX'$. See Fig. 3.57. The mapping $T\colon \mathbb{R}^3 \to \mathbb{R}^3$ defined by
$$T(X) = X'$$
is called the (skew) reflection of the space $\mathbb{R}^3$ along the direction $\overrightarrow{OA}$ with respect to the plane $\Sigma$. In case the line $OA$ is perpendicular to the plane $\Sigma$, $T$ is called the orthogonal reflection or symmetric motion of the space $\mathbb{R}^3$ with respect to the plane $\Sigma$. For details, see (3.8.26).

[Fig. 3.57]

Just like what we did in (2.8.22)–(2.8.25), we can summarize as follows.
The reflection
Let $a_0, a_1, a_2$ and $a_3$ be non-coplanar points in the space $\mathbb{R}^3$. Denote by $T$ the reflection of $\mathbb{R}^3$ along the direction $a_3 - a_0$ with respect to the plane $\Sigma = a_0 + \langle\langle a_1 - a_0, a_2 - a_0\rangle\rangle$. Then,

1. In the affine basis $B = \{a_0, a_1, a_2, a_3\}$,
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}.$$
2. In the natural affine basis $N = \{0, e_1, e_2, e_3\}$,
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_BP, \quad\text{where } P = P^B_N = \begin{bmatrix}a_1 - a_0\\ a_2 - a_0\\ a_3 - a_0\end{bmatrix}_{3\times 3}.$$
Also,
(1) A reflection preserves all the properties 1–7, c and d listed in (3.7.15).
(2) An orthogonal reflection, in addition to (1), preserves
a. angle,
b. length,
but reverses the direction. (3.8.11)

The details are left as Ex. <A> 1.
$T$ in (3.8.11) can be rewritten as
$$T(x) = a_0(I_3 - P^{-1}[T]_BP) + xP^{-1}[T]_BP = x_0 + xA, \quad\text{where } x_0 = a_0(I_3 - A) \text{ and } A = P^{-1}[T]_BP. \tag{3.8.12}$$
Notice that $\det A = \det[T]_B = -1$. (3.8.12) suggests how to test whether a given affine transformation $T(x) = x_0 + xA$ is a reflection. Watch the following steps (compare with (2.8.27)); a small computational sketch follows after the list.

1. $\det A = -1$ is a necessary condition.
2. If $A$ has eigenvalues 1, 1 and $-1$, and $A$ is diagonalizable, then $T$ is a reflection if $x(I_3 - A) = x_0$ has a solution.
3. Compute linearly independent eigenvectors $v_1$ and $v_2$ corresponding to 1. Then the solution set of $x(I_3 - A) = x_0$, i.e.
$$\frac{1}{2}(0 + T(0)) + \langle\langle v_1, v_2\rangle\rangle = \frac{1}{2}x_0 + \langle\langle v_1, v_2\rangle\rangle,$$
is the plane of invariant points of $T$.
4. Compute an eigenvector $v_3$ corresponding to $-1$. Then $v_3$ or $-v_3$ is a direction of the reflection $T$. In fact,
$$x_0(A + I_3) = x(I_3 - A)(I_3 + A) = 0 \quad(\text{see (2) in (3.7.46)}),$$
so $x_0$ is a direction if $x_0 \ne 0$; note that $v_3$ and $x_0$ are linearly dependent. (3.8.13)

The details are left to the readers.
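The steps above can be organized into a small test routine; the following sketch (my own helper names) checks the necessary conditions and the solvability condition of steps 1–2:

```python
import numpy as np

def is_reflection(A, x0, tol=1e-10):
    if not np.isclose(np.linalg.det(A), -1.0):
        return False                      # step 1
    w = np.sort(np.linalg.eigvals(A).real)
    if not np.allclose(w, [-1.0, 1.0, 1.0], atol=tol):
        return False                      # step 2: eigenvalues 1, 1, -1
    if not np.allclose((A - np.eye(3)) @ (A + np.eye(3)), 0.0, atol=tol):
        return False                      # diagonalizability
    M = np.eye(3) - A                     # x (I - A) = x0 must be solvable
    return np.linalg.matrix_rank(np.column_stack([M.T, x0])) == \
           np.linalg.matrix_rank(M.T)

A = np.array([[1.0, 0.0, 0.0], [0.0, 5/3, 4/3], [0.0, -4/3, -5/3]])
assert is_reflection(A, np.array([0.0, -2.0, -4.0]))   # Example 1(b) below
```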

Example 1
(a) Find the reflection of $\mathbb{R}^3$ along the direction $v_3 = (-1, 1, -1)$ with respect to the plane $(2, -2, 3) + \langle\langle(0, 1, 0), (0, -1, 1)\rangle\rangle$.
(b) Show that
$$T(x) = x_0 + xA, \quad\text{where } x_0 = (0, -2, -4) \text{ and } A = \begin{bmatrix}1 & 0 & 0\\ 0 & \frac{5}{3} & \frac{4}{3}\\ 0 & -\frac{4}{3} & -\frac{5}{3}\end{bmatrix}$$
is a reflection. Determine its direction and plane of invariant points.
Solution. (a) In the affine basis $B = \{(2, -2, 3), (2, -1, 3), (2, -3, 4), (1, -1, 2)\}$, the required $T$ has the representation
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}.$$
While, in the natural affine basis $N$,
$$T(x) = (2, -2, 3) + (x - (2, -2, 3))P^{-1}[T]_BP,$$
where
$$P = P^B_N = \begin{bmatrix}0 & 1 & 0\\ 0 & -1 & 1\\ -1 & 1 & -1\end{bmatrix} \Rightarrow P^{-1} = \begin{bmatrix}0 & -1 & -1\\ 1 & 0 & 0\\ 1 & 1 & 0\end{bmatrix}.$$
Therefore,
$$P^{-1}[T]_BP = \begin{bmatrix}0 & -1 & -1\\ 1 & 0 & 0\\ 1 & 1 & 0\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}\begin{bmatrix}0 & 1 & 0\\ 0 & -1 & 1\\ -1 & 1 & -1\end{bmatrix} = \begin{bmatrix}-1 & 2 & -2\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}, \quad\text{and}$$
$$(2, -2, 3) - (2, -2, 3)P^{-1}[T]_BP = (2, -2, 3) - (-2, 2, -1) = (4, -4, 4)$$
$$\Rightarrow T(x) = (4, -4, 4) + x\begin{bmatrix}-1 & 2 & -2\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix} \quad\text{for } x\in\mathbb{R}^3, \quad\text{or}$$
$$y_1 = 4 - x_1,\quad y_2 = -4 + 2x_1 + x_2,\quad y_3 = 4 - 2x_1 + x_3.$$
(b) Since $\det A = -1$, it is possible that $T$ is a reflection. To make certain, compute the characteristic polynomial $\det(A - tI_3) = -(t-1)^2(t+1)$. So $A$ has eigenvalues 1, 1 and $-1$. Moreover,
$$(A - I_3)(A + I_3) = \begin{bmatrix}0 & 0 & 0\\ 0 & \frac{2}{3} & \frac{4}{3}\\ 0 & -\frac{4}{3} & -\frac{8}{3}\end{bmatrix}\begin{bmatrix}2 & 0 & 0\\ 0 & \frac{8}{3} & \frac{4}{3}\\ 0 & -\frac{4}{3} & -\frac{2}{3}\end{bmatrix} = O_{3\times 3}$$
indicates that $A$ is diagonalizable, and thus the corresponding $T$ is a reflection if $x(I_3 - A) = x_0$ has a solution. Now
$$x(I_3 - A) = x_0 \Rightarrow x_2 - 2x_3 - 3 = 0 \quad\text{(the plane of invariant points)}.$$
So $T$ is really a reflection.
Take eigenvectors $v_1 = (2, 0, 0)$ and $v_2 = (1, 2, 1)$ corresponding to 1, and $v_3 = (0, 1, 2)$ corresponding to $-1$. Then
$$\frac{1}{2}x_0 + \langle\langle v_1, v_2\rangle\rangle = (0, -1, -2) + \{(2\alpha_1 + \alpha_2,\ 2\alpha_2,\ \alpha_2)\mid \alpha_1, \alpha_2\in\mathbb{R}\}$$
$$\Leftrightarrow x_1 = 2\alpha_1 + \alpha_2,\quad x_2 = -1 + 2\alpha_2,\quad x_3 = -2 + \alpha_2, \quad\text{for } \alpha_1, \alpha_2\in\mathbb{R}$$
$$\Leftrightarrow x_2 - 2x_3 - 3 = 0$$
indeed is the plane of invariant points. Also, $v_3 = -\frac{1}{2}x_0$, or just $x_0$ itself, is the direction of $T$.
In the affine basis $C = \{\frac{1}{2}x_0, \frac{1}{2}x_0 + v_1, \frac{1}{2}x_0 + v_2, \frac{1}{2}x_0 + v_3\}$, whose base point $\frac{1}{2}x_0 = -v_3$ lies on the plane of invariant points,
$$[T(x)]_C = [x]_C\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}.$$
See Fig. 3.58.

[Fig. 3.58]

Remark. The linear operator
$$T(x) = xA, \quad\text{where } A = \begin{bmatrix}0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix} \tag{3.8.14}$$
is a standard orthogonal reflection of $\mathbb{R}^3$ in the direction $v_3 = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 0\right)$ with respect to the plane
$$\langle\langle v_1, v_2\rangle\rangle = \{(x_1, x_2, x_3)\in\mathbb{R}^3 \mid x_1 = x_2\}, \quad\text{where } v_1 = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right) \text{ and } v_2 = e_3 = (0, 0, 1).$$
See Fig. 3.59 and compare with Fig. 3.32 with $a = b = c = 1$. Notice that
$$A = P^{-1}\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}P, \quad\text{where } P = \begin{bmatrix}v_1\\ v_2\\ v_3\end{bmatrix} = \begin{bmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\\ 0 & 0 & 1\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0\end{bmatrix}.$$

[Fig. 3.59]

For details, refer to Case 7 below. □

Case 3 One-way stretch or stretching


Let the line OA intersect the plane Σ at the point O as indicated in Fig. 3.60. Take any fixed scalar k ≠ 0. For any point X in R³, draw the line XP, parallel to OA and intersecting Σ at the point P.

[Fig. 3.60: the line OA meets Σ at O; X moves along XP, parallel to OA, to X′, on the same side of Σ if k > 0 and the opposite side if k < 0.]

Pick the point X′ on XP so that X′P = k·XP in signed length. The mapping T: R³ → R³ defined by
$$T(X) = X'$$

is an affine transformation and is called a one-way stretch with scale factor k of R³ in the direction OA, with the plane Σ as the plane of invariant points. In case OA is perpendicular to Σ, T is called an orthogonal one-way stretch. A one-way stretch has the following features:

1. Each line parallel to the direction is an invariant line.


2. Each plane, parallel to the plane Σ of invariant points, moves to a new
parallel plane with a distance proportional to |k| from the original one.
3. Each line L, not parallel to the plane Σ, intersects with its image line
T (L) at a point lying on Σ.
4. Each plane Σ , not parallel to the plane Σ, intersects with its image
plane T (Σ ) along a line lying on Σ.

See Fig. 3.61.

[Fig. 3.61: a line L and a plane Σ′, not parallel to Σ, meet their images T(L) and T(Σ′) on Σ.]

As a counterpart of (2.8.28), we have

The one-way stretch
Let a₀, a₁, a₂ and a₃ be non-coplanar points in R³. Let T denote the one-way stretch, with scale factor k, of R³ along the direction a₃ − a₀ and with the plane of invariant points Σ: a₀ + ⟨⟨a₁ − a₀, a₂ − a₀⟩⟩. Then,

1. In the affine basis B = {a₀, a₁, a₂, a₃},
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k \end{pmatrix}.$$
2. In the natural affine basis N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_B P, \quad\text{where } P = P_N^B = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}.$$

Also,

(1) A one-way stretch preserves all the properties 1–7 listed in (3.7.15),
(2) but enlarges the volume by the scale factor |k|, and preserves the orientation if k > 0 while reversing it if k < 0.
(3.8.15)

In case k = −1, (3.8.15) reduces to (3.8.13) for reflections.
To test whether an affine transformation T(x) = x₀ + xA is a one-way stretch, follow these steps (refer to (3.8.13) and compare with (2.8.29)):

1. A has eigenvalues 1, 1 and k ≠ 1, and A is diagonalizable.
2. T is a one-way stretch if x(I₃ − A) = x₀ has a solution.
3. Compute linearly independent eigenvectors v₁ and v₂ corresponding to 1; then
$$\frac{1}{1-k}x_0 + \langle\langle v_1, v_2\rangle\rangle = \{x \in R^3 \mid x(I_3 - A) = x_0\}$$
is the plane of invariant points.
4. Compute an eigenvector v₃ corresponding to k; then v₃ is the direction of the stretch. In particular,
$$\frac{1}{1-k}x_0 \quad\text{or}\quad x_0$$
is a direction (up to a nonzero scalar) if x₀ ≠ 0.   (3.8.16)

The details are left to the readers.
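A small NumPy sketch of these steps (our addition, with assumed names; the book gives no code):

    import numpy as np

    def one_way_stretch_data(x0, A, tol=1e-6):
        """If T(x) = x0 + xA is a one-way stretch, return (k, direction v3, point on plane)."""
        A = np.asarray(A, dtype=float)
        w, V = np.linalg.eig(A.T)              # left eigenvectors of A (row convention)
        w = w.real
        ones = np.isclose(w, 1, atol=tol)
        if ones.sum() != 2:                    # step 1: eigenvalues 1, 1 and k != 1
            return None
        k = w[~ones][0]
        if not np.allclose((A - np.eye(3)) @ (A - k * np.eye(3)), 0, atol=tol):
            return None                        # A must be diagonalizable
        p, *_ = np.linalg.lstsq((np.eye(3) - A).T, x0, rcond=None)
        if not np.allclose(p @ (np.eye(3) - A), x0, atol=tol):
            return None                        # step 2: x(I3 - A) = x0 solvable
        v3 = V[:, ~ones][:, 0].real            # step 4: eigenvector for k is the direction
        return k, v3, p

    # Example 2(b) below: k = 2, with direction a multiple of (1, -2, -1).
    A = np.array([[-1., 4, 2], [-1, 3, 1], [-1, 2, 2]])
    print(one_way_stretch_data(np.array([-2., 4, 2]), A))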



Example 2

(a) (See Example 1(a).) Find the one-way stretch with scale factor k = 1
of R3 along the direction  v3 = (−1, 1, −1) with respect to the plane
(−1, 1, 1) +  v2 , where 
v1 ,  v2 = (0, −1, 1).
v1 = (0, 1, 0) and 
(b) Show that
 
−1 4 2
T (
x) = x0 +  x A, where  x0 = (−2, 4, 2) and A = −1 3 1
−1 2 2

represents a one-way stretch. Determine its direction and its plane of


invariant points.

Solution (a) In the affine basis B = {(−1, 1, 1), (−1, 2, 1), (−1, 0, 2), (−2, 2, 0)}, the one-way stretch T is
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k \end{pmatrix}.$$
In the natural affine basis N,
$$T(x) = (-1, 1, 1) + (x - (-1, 1, 1))P^{-1}[T]_B P,$$
where
$$P = P_N^B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & -1 & 1 \\ -1 & 1 & -1 \end{pmatrix} \;\Rightarrow\; P^{-1} = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}.$$
Hence,
$$P^{-1}[T]_B P = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 \\ 0 & -1 & 1 \\ -1 & 1 & -1 \end{pmatrix} = \begin{pmatrix} k & 1-k & -1+k \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
$$(-1, 1, 1) - (-1, 1, 1)P^{-1}[T]_B P = (-1, 1, 1) - (-k, k, 2-k) = (-1+k, 1-k, -1+k)$$
$$\Rightarrow\; T(x) = (-1+k, 1-k, -1+k) + x\begin{pmatrix} k & 1-k & -1+k \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{for } x \in R^3, \text{ or}$$
y₁ = −1 + k + kx₁,
y₂ = 1 − k + (1 − k)x₁ + x₂,
y₃ = −1 + k + (−1 + k)x₁ + x₃,
where x = (x₁, x₂, x₃) and y = T(x) = (y₁, y₂, y₃).
(b) The characteristic polynomial of A is det(A − tI₃) = −(t − 1)²(t − 2). So A has eigenvalues 1, 1 and 2. Since (A − I₃)(A − 2I₃) = O₃ₓ₃ (see Example 1 of Sec. 3.7.6), A is diagonalizable, and the corresponding T will certainly represent a one-way stretch if x(I₃ − A) = x₀ has a solution. Solve x(I₃ − A) = x₀, i.e.
$$(x_1, x_2, x_3)\begin{pmatrix} 2 & -4 & -2 \\ 1 & -2 & -1 \\ 1 & -2 & -1 \end{pmatrix} = (-2, 4, 2)$$
⇒ 2x₁ + x₂ + x₃ = −2, which does have a (in fact, infinitely many) solution.
Therefore, T is a one-way stretch with scale factor 2.
Note that x(A − I₃) = 0 ⇔ 2x₁ + x₂ + x₃ = 0. Take v₁ = (1, −2, 0) and v₂ = (1, 0, −2) as linearly independent eigenvectors corresponding to 1. On the other hand, x(A − 2I₃) = 0 ⇔ 3x₁ + x₂ + x₃ = 4x₁ + x₂ + 2x₃ = 2x₁ + x₂ = 0. Choose v₃ = (1, −2, −1) as an eigenvector corresponding to 2. Since x₀ = −2v₃, both x₀ and v₃ can be chosen as a direction of the one-way stretch T. To find an invariant point of T, let such a point be denoted as αx₀. By the very definition of a one-way stretch along x₀,
αx₀ − T(0) = k(αx₀ − 0) with k = 2
⇒ (since x₀ ≠ 0) α − 1 = kα with k = 2
⇒ α = 1/(1 − k) = −1
⇒ an invariant point is αx₀ = −x₀ = 2v₃
⇒ the plane of invariant points is 2v₃ + ⟨⟨v₁, v₂⟩⟩.

Notice that
x = (x₁, x₂, x₃) ∈ 2v₃ + ⟨⟨v₁, v₂⟩⟩
⇒ x₁ = 2 + α₁ + α₂, x₂ = −4 − 2α₁, x₃ = −2 − 2α₂ for α₁, α₂ ∈ R
⇒ (by eliminating the parameters α₁ and α₂) 2x₁ + x₂ + x₃ = −2
⇒ x(I₃ − A) = x₀,
as we claimed before. In the affine basis C = {2v₃, 2v₃ + v₁, 2v₃ + v₂, 2v₃ + v₃},
$$[T(x)]_C = [x]_C PAP^{-1} = [x]_C[T]_C, \quad\text{where } [T]_C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix},\ P = \begin{pmatrix} 1 & -2 & 0 \\ 1 & 0 & -2 \\ 1 & -2 & -1 \end{pmatrix}.$$
See Fig. 3.62.

[Fig. 3.62: the plane 2x₁ + x₂ + x₃ = 0 through 0 and its translate 2x₁ + x₂ + x₃ = −2 (the plane of invariant points), together with v₁, v₂, v₃ and the affine frame 2v₃, 2v₃ + v₁, 2v₃ + v₂, 2v₃ + v₃.]

We propose the following questions:

Q1 Where is the image of a plane, parallel to 2x₁ + x₂ + x₃ = −2, under T?
Q2 Where is the image of a plane or a line, nonparallel to 2x₁ + x₂ + x₃ = −2, under T? Where do these two planes or lines intersect?
Q3 What is the image of the tetrahedron ∆a₀a₁a₂a₃, where a₀ = 0, a₁ = (1, 0, −1), a₂ = 2v₃ = (2, −4, −2) and a₃ = (−2, 0, 0), under T? What are the volumes of these two tetrahedra?

To answer these questions, we first compute
$$A^{-1} = \frac{1}{2}\begin{pmatrix} 4 & -4 & -2 \\ 1 & 0 & -1 \\ 1 & -2 & 1 \end{pmatrix}.$$
This implies that the inverse affine transformation of T is
$$x = (T(x) - x_0)A^{-1}.$$

For Q1 A plane parallel to 2x₁ + x₂ + x₃ = −2 is of the form 2x₁ + x₂ + x₃ = c, which intercepts the x₁-axis at the point (c/2, 0, 0). Since 2x₁ + x₂ + x₃ = −2 is the plane of invariant points of T, with scale factor 2, the image plane (refer to Fig. 3.62) is
2x₁ + x₂ + x₃ = −2 + 2(c − (−2)) = 2(c + 1),
where c is any constant.
We can prove this analytically as follows.
$$2x_1 + x_2 + x_3 = x\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = c$$
⇒ (let y = T(x) = (y₁, y₂, y₃) temporarily)
$$[(y_1, y_2, y_3) - (-2, 4, 2)]A^{-1}\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = c$$
⇒ (since A⁻¹y₀* = ½y₀* for y₀ = (2, 1, 1), see the Note below)
$$[(y_1, y_2, y_3) - (-2, 4, 2)]\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = 2c$$
⇒ 2y₁ + y₂ + y₃ − (−4 + 4 + 2) = 2c
⇒ (change y₁, y₂, y₃ back to x₁, x₂, x₃ respectively) 2x₁ + x₂ + x₃ = 2(c + 1),
as claimed above. Therefore, such a plane has as its image a plane parallel to itself and, of course, to 2x₁ + x₂ + x₃ = −2.

Note It is well known that Ker(A − I₃)⊥ = Im(A* − I₃) (see (3.7.32) and Ex. <A> 5 of Sec. 3.7.3). Since Ker(A − I₃)⊥ = ⟨⟨(2, 1, 1)⟩⟩, Im(A* − I₃) = ⟨⟨(2, 1, 1)⟩⟩ holds. Since (A* − I₃)(A* − 2I₃) = O₃ₓ₃, y₀(A* − 2I₃) = 0 where y₀ = (2, 1, 1). Therefore
y₀A* = 2y₀
⇒ Ay₀* = 2y₀*
⇒ A⁻¹y₀* = ½y₀*,
which can also easily be shown by direct computation.
For Q2 Take, for simplicity, the plane x₁ − x₂ − x₃ = 5. To find the image plane of this plane under T, observe that
$$x_1 - x_2 - x_3 = (x_1, x_2, x_3)\begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} = 5$$
⇒ (let y = T(x) temporarily)
$$[(y_1, y_2, y_3) - (-2, 4, 2)]A^{-1}\begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} = 5$$
$$\Rightarrow\ (y_1 + 2,\ y_2 - 4,\ y_3 - 2)\begin{pmatrix} 5 \\ 1 \\ 1 \end{pmatrix} = 5(y_1 + 2) + (y_2 - 4) + (y_3 - 2) = 5$$
⇒ (replace y₁, y₂, y₃ by x₁, x₂, x₃) 5x₁ + x₂ + x₃ = 1,
which is the equation of the image plane.
The two planes x₁ − x₂ − x₃ = 5 and 5x₁ + x₂ + x₃ = 1 do intersect, along the line x₁ = 1, x₂ + x₃ = −4, which obviously lies on the plane 2x₁ + x₂ + x₃ = −2. Refer to Fig. 3.61.
For Q3 By computation,
T(a₀) = T(0) = x₀ = (−2, 4, 2),
T(a₁) = (−2, 4, 2) + (1, 0, −1)A = (−2, 4, 2) + (0, 2, 0) = (−2, 6, 2),
T(a₂) = (−2, 4, 2) + (2, −4, −2)A = (−2, 4, 2) + (4, −8, −4) = (2, −4, −2),
T(a₃) = (−2, 4, 2) + (−2, 0, 0)A = (−2, 4, 2) + (2, −8, −4) = (0, −4, −2).
These four points form a tetrahedron ∆T(a₀)T(a₁)T(a₂)T(a₃) whose volume is equal to
$$\det\begin{pmatrix} T(a_1) - T(a_0) \\ T(a_2) - T(a_0) \\ T(a_3) - T(a_0) \end{pmatrix} = \begin{vmatrix} 0 & 2 & 0 \\ 4 & -8 & -4 \\ 2 & -8 & -4 \end{vmatrix} = 16.$$
While ∆a₀a₁a₂a₃ has volume equal to
$$\det\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{vmatrix} 1 & 0 & -1 \\ 2 & -4 & -2 \\ -2 & 0 & 0 \end{vmatrix} = 8.$$
Therefore,
$$\frac{\text{volume of } \Delta T(a_0)\cdots T(a_3)}{\text{volume of } \Delta a_0 \cdots a_3} = \frac{16}{8} = 2 = \det A.$$
See Fig. 3.63.
[Fig. 3.63: the tetrahedron a₀a₁a₂a₃ and its image T(a₀)T(a₁)T(a₂)T(a₃); note that T(a₂) = a₂.]
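For readers who want to reproduce Q3 numerically, here is a short NumPy check (our addition; the library is an assumption) that the volume ratio equals det A = 2:

    import numpy as np

    A  = np.array([[-1., 4, 2], [-1, 3, 1], [-1, 2, 2]])
    x0 = np.array([-2., 4, 2])
    T  = lambda x: x0 + x @ A               # row-vector convention of the text

    a = [np.zeros(3), np.array([1., 0, -1]), np.array([2., -4, -2]), np.array([-2., 0, 0])]
    vol  = np.linalg.det(np.array([a[i] - a[0] for i in (1, 2, 3)]))
    volT = np.linalg.det(np.array([T(a[i]) - T(a[0]) for i in (1, 2, 3)]))
    print(vol, volT, volT / vol, np.linalg.det(A))   # 8.0, 16.0, 2.0, 2.0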

Case 4 Two-way stretch

A two-way stretch is the composite of two one-way stretches whose planes of invariant points intersect along a line; this line becomes the only line of invariant points if both scale factors are different from 1.
As a counterpart of (2.8.30), we have

The two-way stretch
Let a₀, a₁, a₂ and a₃ be non-coplanar points in R³. Denote by T the two-way stretch having scale factor k₁ along a₀ + ⟨⟨a₁ − a₀⟩⟩ and scale factor k₂ along a₀ + ⟨⟨a₂ − a₀⟩⟩, with a₀ + ⟨⟨a₃ − a₀⟩⟩ the only line of invariant points. Then

1. In the affine basis B = {a₀, a₁, a₂, a₃},
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} k_1 & 0 & 0 \\ 0 & k_2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} k_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & k_2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
2. In the natural affine basis N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_B P, \quad\text{where } P = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}.$$

Also,

(1) A two-way stretch preserves all the properties 1–7 listed in (3.7.15),
(2) but enlarges the volume by the scale factor |k₁k₂|, and preserves the orientation if k₁k₂ > 0 while reversing it if k₁k₂ < 0.
(3.8.17)
See Fig. 3.64 (refer to Fig. 2.102).

[Fig. 3.64: (a) the two-way stretch T = diag(k₁, k₂, 1) in N, with the images T(x) for the four sign patterns of k₁, k₂; (b) the same stretch in the frame a₀, a₁, a₂, a₃.]

In case k₁ = k₂ = k ≠ 1, the plane a₀ + ⟨⟨a₁ − a₀, a₂ − a₀⟩⟩ is an invariant plane of T, on which T is an enlargement with scale factor k.
To see if T(x) = x₀ + xA is a two-way stretch, try the following steps:
606 The Three-Dimensional Real Vector Space R3

1. A has eigenvalues 1, k₁ and k₂, where k₁ ≠ 1 and k₂ ≠ 1.
2. If A is diagonalizable (hence, the rank r(I₃ − A) = 2) and the equation x(I₃ − A) = x₀ has a solution, then T is a two-way stretch.
3. Compute an eigenvector v₃ associated to 1. Let x₁ be a solution of x(I₃ − A) = x₀. Then
$$x_1 + \langle\langle v_3\rangle\rangle = \{x \in R^3 \mid x(I_3 - A) = x_0\}$$
is the line of invariant points.
4. Compute an eigenvector v₁ for k₁ and an eigenvector v₂ for k₂, so that v₁ and v₂ are linearly independent in case k₁ = k₂. Then
$$x_1 + \langle\langle v_1, v_2\rangle\rangle$$
is an invariant plane.
(3.8.18)
The details are left to the readers.
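As with the earlier cases, these steps translate directly into a short NumPy sketch (our addition; names and tolerances are assumptions):

    import numpy as np

    def two_way_stretch_data(x0, A, tol=1e-6):
        """If T(x) = x0 + xA is a two-way stretch, return (k1, k2, point x1, axis v3)."""
        A = np.asarray(A, dtype=float)
        w, V = np.linalg.eig(A.T)                   # left eigen-data (row convention)
        w = w.real
        i1 = np.isclose(w, 1, atol=tol)
        if i1.sum() != 1:                           # step 1: eigenvalues 1, k1, k2
            return None
        k1, k2 = w[~i1]
        if np.linalg.matrix_rank(np.eye(3) - A, tol=tol) != 2:
            return None                             # step 2: r(I3 - A) = 2
        x1, *_ = np.linalg.lstsq((np.eye(3) - A).T, x0, rcond=None)
        if not np.allclose(x1 @ (np.eye(3) - A), x0, atol=tol):
            return None                             # x(I3 - A) = x0 must be solvable
        v3 = V[:, i1][:, 0].real                    # step 3: x1 + <<v3>> is the invariant line
        return k1, k2, x1, v3

    # Example 3 below with the choice x0 = (2, -3, 1): k1, k2 = -2, -3 (in some order).
    A = np.array([[-1., 6, -4], [2, 4, -5], [2, 6, -7]])
    print(two_way_stretch_data(np.array([2., -3, 1]), A))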

Example 3 Given the affine transformation
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} -1 & 6 & -4 \\ 2 & 4 & -5 \\ 2 & 6 & -7 \end{pmatrix},$$
try to determine x₀ so that T is a two-way stretch. In this case, determine the line of invariant points and the invariant plane.
Solution A has characteristic polynomial det(A − tI₃) = −(t − 1)(t + 2)(t + 3), so A has eigenvalues −2, −3 and 1, and thus A is diagonalizable. Since
$$I_3 - A = \begin{pmatrix} 2 & -6 & 4 \\ -2 & -3 & 5 \\ -2 & -6 & 8 \end{pmatrix}$$
has rank equal to 2, the range of I₃ − A is of dimension two. The range of I₃ − A is computed from
x(I₃ − A) = y, where x = (x₁, x₂, x₃) and y = (y₁, y₂, y₃)
⇔ y₁ = 2x₁ − 2x₂ − 2x₃, y₂ = −6x₁ − 3x₂ − 6x₃, y₃ = 4x₁ + 5x₂ + 8x₃
⇔ y₁ + y₂ + y₃ = 0
⇔ (replace y₁, y₂, y₃ by x₁, x₂, x₃ respectively)
Im(I₃ − A) = {x = (x₁, x₂, x₃) | x₁ + x₂ + x₃ = 0}.
So, any point x₀ ∈ Im(I₃ − A) will work.
Solving x(A − I₃) = 0, we get the corresponding eigenvector v₃ = (1, 4, −3). Solving x(A + 2I₃) = 0, we get v₁ = (0, 1, −1), and solving x(A + 3I₃) = 0, we get v₂ = (1, 0, −1).
For any x₁ such that x₁(I₃ − A) = x₀ holds, T is a two-way stretch with
the line of invariant points: x₁ + ⟨⟨v₃⟩⟩, and
the invariant plane: x₁ + ⟨⟨v₁, v₂⟩⟩.
For example, take x₀ = (2, −3, 1) and solve x(I₃ − A) = (2, −3, 1), so that
x = (x₁, −3 + 4x₁, 2 − 3x₁) = (0, −3, 2) + x₁(1, 4, −3), x₁ ∈ R
⇒ (0, −3, 2) + ⟨⟨(1, 4, −3)⟩⟩, where x₁ = (0, −3, 2),
is the line L of invariant points. On the other hand, the plane
(0, −3, 2) + ⟨⟨v₁, v₂⟩⟩ = {(α₂, −3 + α₁, 2 − α₁ − α₂) | α₁, α₂ ∈ R}
= {x = (x₁, x₂, x₃) | x₁ + x₂ + x₃ = −1}
is an invariant plane.

Note Careful readers might have observed in Example 2 that, since (A − 2I₃)(A − I₃) = O, the subspaces Im(A − 2I₃), Ker(A − I₃) and ⟨⟨v₁, v₂⟩⟩ are all the same plane 2x₁ + x₂ + x₃ = 0, and hence the plane of invariant points is just a translation of it, namely 2x₁ + x₂ + x₃ = −2. Refer to Fig. 3.62.
This is not accidental. It happens in Example 3 too. Since
(A − I₃)(A + 2I₃)(A + 3I₃) = O₃ₓ₃
⇒ (since r(A − I₃) = 2) Im(A − I₃) = Ker((A + 2I₃)(A + 3I₃))
⇒ (since Ker(A + 2I₃) ⊕ Ker(A + 3I₃) ⊆ Ker((A + 2I₃)(A + 3I₃)))
Im(A − I₃) = Ker(A + 2I₃) ⊕ Ker(A + 3I₃) = ⟨⟨v₁, v₂⟩⟩.
This justifies that both Im(A − I₃) and ⟨⟨v₁, v₂⟩⟩ are the same plane x₁ + x₂ + x₃ = 0, as indicated above. As a consequence, all the planes parallel to it are invariant planes. See Fig. 3.65.
In the affine basis C = {x₁, x₁ + v₁, x₁ + v₂, x₁ + v₃},
$$[T(x)]_C = [x]_C[T]_C, \quad\text{where } [T]_C = \begin{pmatrix} -2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Try to answer the following questions:
Q1 What is the image of a plane parallel to the line of invariant points?
Q2 Is each plane containing the line of invariant points an invariant plane?

[Fig. 3.65: the invariant plane x₁ + x₂ + x₃ = 0, its translate x₁ + x₂ + x₃ = −1, the line of invariant points x₁ + ⟨⟨v₃⟩⟩, and the vectors v₁, v₂, v₃.]

Q3 Where is the intersecting line of a plane, nonparallel to x₁ + ⟨⟨v₁, v₂⟩⟩, with its image?
Q4 Let a₀ = x₁ = (0, −3, 2), a₁ = (−1, −1, −1), a₂ = (1, −2, 1) and a₃ = x₁ + v₃ = (1, 1, −1). Find the image of the tetrahedron ∆a₀a₁a₂a₃ and compute its volume.
For these, note that the inverse transformation of T is
$$x = (y - (2, -3, 1))A^{-1}, \quad\text{where } y = T(x) \text{ and } A^{-1} = \frac{1}{6}\begin{pmatrix} 2 & 18 & -14 \\ 4 & 15 & -13 \\ 4 & 18 & -16 \end{pmatrix}.$$
For Q1 Geometric intuition (see Fig. 3.65) suggests that the image of any such plane is again a plane parallel to L. This can be proved analytically as follows.
A plane Σ parallel to the line L of invariant points has an equation of the form
(−4a₂ + 3a₃)x₁ + a₂x₂ + a₃x₃ = b, where b ≠ −3a₂ + 2a₃, a₂, a₃ ∈ R.
The condition b ≠ −3a₂ + 2a₃ means that L does not lie on Σ. Then
Σ ∥ L
$$\Leftrightarrow\ x\begin{pmatrix} -4a_2 + 3a_3 \\ a_2 \\ a_3 \end{pmatrix} = b$$
$$\Leftrightarrow\ [y - (2, -3, 1)]A^{-1}\begin{pmatrix} -4a_2 + 3a_3 \\ a_2 \\ a_3 \end{pmatrix} = \frac{1}{6}[y - (2, -3, 1)]\begin{pmatrix} 10a_2 - 8a_3 \\ -a_2 - a_3 \\ 2a_2 - 4a_3 \end{pmatrix} = b$$
⇔ (10a₂ − 8a₃)y₁ − (a₂ + a₃)y₂ + (2a₂ − 4a₃)y₃ = 6b + 25a₂ − 17a₃
⇒ (let α₂ = −a₂ − a₃, α₃ = 2a₂ − 4a₃)
(−4α₂ + 3α₃)x₁ + α₂x₂ + α₃x₃ = 6b + 25a₂ − 17a₃,
where 6b + 25a₂ − 17a₃ ≠ −18a₂ + 12a₃ + 25a₂ − 17a₃ = 7a₂ − 5a₃ = −3α₂ + 2α₃. Hence the image plane T(Σ) is again a plane parallel to L and not containing L. See the Note below.

Note (refer to the Note in Example 2) It is easy to see that
Σ ∥ ⟨⟨v₃, v₁⟩⟩ ⇔ a₂ = a₃.
Hence such a Σ has equation −ax₁ + ax₂ + ax₃ = b with a ≠ 0 or, equivalently, x₁ − x₂ − x₃ = b (renaming −b/a as b), with b ≠ 1. Let n = (1, −1, −1) be the normal to these parallel planes. Then
v₃ ∈ Ker(A − I₃) = Im(A* − I₃)⊥ and v₁ ∈ Ker(A + 2I₃) = Im(A* + 2I₃)⊥
⇒ n ∈ Im(A* − I₃) ∩ Im(A* + 2I₃) = Ker(A* + 3I₃)
⇒ nA* = −3n
⇒ An* = −3n*
⇒ A⁻¹n* = −(1/3)n*.
Similarly,
Σ ∥ ⟨⟨v₃, v₂⟩⟩ ⇔ 2a₂ − a₃ = 0
⇒ Σ has equation 2x₁ + x₂ + 2x₃ = b, b ≠ 1
⇒ A⁻¹u* = −(1/2)u*, where u = (2, 1, 2) is the normal vector.
For Q2 The answer is negative in general, except for the planes x₁ + ⟨⟨v₃, v₁⟩⟩ and x₁ + ⟨⟨v₃, v₂⟩⟩, which are invariant planes. But each such plane and its image plane intersect along the line L of invariant points. Q1 answers all these claims.

For Q3 For simplicity, take x₃ = 0 as a sample plane. The plane x₃ = 0 intercepts the line L at the point (2/3, −1/3, 0) and intersects the plane x₁ + ⟨⟨v₁, v₂⟩⟩ along the line x₃ = 0, x₁ + x₂ + x₃ = −1. To find its image under T,
$$x_3 = x\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = 0$$
$$\Leftrightarrow\ [y - (2, -3, 1)]A^{-1}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \frac{1}{6}[y - (2, -3, 1)]\begin{pmatrix} -14 \\ -13 \\ -16 \end{pmatrix} = 0$$
⇒ (replace y by x) 14x₁ + 13x₂ + 16x₃ = 5.
This image plane does intercept the line L at the invariant point (2/3, −1/3, 0), and intersects the original plane x₃ = 0 along the line x₃ = 0, 14x₁ + 13x₂ + 16x₃ = 5.

For Q4 By computation (recall x₀ = (2, −3, 1)),
T(a₀) = x₁ = (0, −3, 2),
T(a₁) = (2, −3, 1) + (−1, −1, −1)A = (2, −3, 1) + (−3, −16, 16) = (−1, −19, 17),
T(a₂) = (2, −3, 1) + (1, −2, 1)A = (2, −3, 1) + (−3, 4, −1) = (−1, 1, 0),
T(a₃) = x₁ + v₃ = (1, 1, −1).
Then,
the signed volume of ∆a₀a₁a₂a₃
$$= \det\begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix} = \begin{vmatrix} -1 & 2 & -3 \\ 1 & 1 & -1 \\ 1 & 4 & -3 \end{vmatrix} = -6, \quad\text{and}$$
the signed volume of ∆T(a₀)T(a₁)T(a₂)T(a₃)
$$= \begin{vmatrix} -1 & -16 & 15 \\ -1 & 4 & -2 \\ 1 & 4 & -3 \end{vmatrix} = -36$$
$$\Rightarrow\ \frac{\text{the signed volume of } \Delta T(a_0)\cdots T(a_3)}{\text{the signed volume of } \Delta a_0 \cdots a_3} = \frac{-36}{-6} = 6 = \det A. \qquad\Box$$

Case 5 (Three-way) stretch


The composite of three one-way stretches with scale factors all different
from 1 is called a three-way stretch or simply a stretch.
As an extension of (3.8.17), we have

The stretch
Let a₀, a₁, a₂ and a₃ be non-coplanar points in R³. Denote by T the stretch with scale factor kᵢ along the line a₀ + ⟨⟨aᵢ − a₀⟩⟩ for i = 1, 2, 3, where k₁, k₂ and k₃ are all nonzero and different from 1. Then

1. In the affine basis B = {a₀, a₁, a₂, a₃},
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} k_1 & 0 & 0 \\ 0 & k_2 & 0 \\ 0 & 0 & k_3 \end{pmatrix} = \begin{pmatrix} k_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & k_2 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k_3 \end{pmatrix}.$$
2. In the natural affine basis N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_B P, \quad\text{where } P = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}.$$

If k₁, k₂, k₃ are mutually distinct, then a₀ is the only invariant point and there is no plane of invariant points. If k₁ = k₂ ≠ k₃, then a₀ + ⟨⟨a₁ − a₀, a₂ − a₀⟩⟩ is an invariant plane. If k₁ = k₂ = k₃ = k, then T is called an enlargement with scale factor k, and any plane parallel to each of the three coordinate planes (see Sec. 3.6) is an invariant plane. In case a₁ − a₀, a₂ − a₀ and a₃ − a₀ are perpendicular to each other, T is called an orthogonal stretch; if, in addition, the lengths |a₁ − a₀| = |a₂ − a₀| = |a₃ − a₀| and k₁ = k₂ = k₃ = k, then T is called a similarity with scale factor k. In general,

(a) A stretch preserves all properties 1–7 listed in (3.7.15),
(b) but enlarges volumes by the scale factor |k₁k₂k₃|, and preserves the orientation if k₁k₂k₃ > 0 while reversing it if k₁k₂k₃ < 0.
(3.8.19)

See Fig. 3.66.


To test whether an affine transformation T(x) = x₀ + xA is a stretch, all one needs to do is to compute the eigenvalues of A and to see if it is diagonalizable (see (3.7.46)).

Case 6 Shearing (Euclidean notions are needed)



Let Σ be a plane in space and let v ≠ 0 be a space vector parallel to Σ. Take any fixed scalar k ≠ 0.

[Fig. 3.66: the stretch x ↦ T(x) in the frame a₀, a₁, a₂, a₃.]

[Fig. 3.67: under a shearing, X moves parallel to Σ to X′.]

Each point X in space moves in the direction v to a new point X′ so that
$$\frac{\text{the signed distance } \overline{XX'} \text{ from } X \text{ to } X'}{\text{the (perpendicular) distance from } X \text{ to } \Sigma} = k.$$
It is understood that, if X ∈ Σ, then X′ = X. Then the mapping T: R³ → R³ defined by
$$T(X) = X'$$
is an affine transformation and is called a shearing with coefficient k in the direction v with respect to the plane Σ of invariant points. Obviously,

1. Any plane parallel to Σ or to the direction is an invariant plane.
2. Points on opposite sides of Σ move in opposite directions v and −v (refer to Fig. 2.105).
3. Each plane which intersects Σ along a line L will intersect its image plane, under T, along the same line L.
4. T preserves the volume of a parallelepiped, and hence the volumes of a tetrahedron and of any solid domain in space (refer to Case 5 in Sec. 2.8.2 and the right figure in Fig. 3.34).
(3.8.20)
The details are left as Ex. <A> 12.

As a counterpart of (2.8.32) in R3 , we have


The shearing
Let a₀, a₁, a₂ and a₃ be non-coplanar points in R³ so that
1. (in lengths) |a₁ − a₀| = |a₂ − a₀| = |a₃ − a₀| = 1, and
2. (perpendicularity) (a₁ − a₀) ⊥ (a₂ − a₀) ⊥ (a₃ − a₀),
i.e. B = {a₀, a₁, a₂, a₃} is an orthonormal affine basis for R³. Denote by T the shearing with coefficient k in the direction a₁ − a₀, with the plane a₀ + ⟨⟨a₁ − a₀, a₃ − a₀⟩⟩ as plane of invariant points. Then

a. In B,
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
b. In N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_B P, \quad\text{where } P = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}.$$
Here P is an orthogonal matrix, i.e. P* = P⁻¹.

Also,
(1) A shearing preserves all the properties 1–7 listed in (3.7.15), and
(2) preserves volumes and orientations.   (3.8.21)

For skew shearings, refer to Ex. <A> 17 of Sec. 2.8.2 and Ex. <B> 1.
To see if an affine transformation T(x) = x₀ + xA is a shearing, try the following steps (refer to (2.8.33)):
1. Compute the eigenvalues of A. In case A has only the eigenvalue 1, with algebraic multiplicity 3, A ≠ I₃ and (A − I₃)² = O, then A is not diagonalizable (see (2) in (3.7.50)), and the associated T is a shearing if x(I₃ − A) = x₀ has a solution.
2. The eigenspace
E = Ker(A − I₃)
has dimension equal to 2. Determine eigenvectors v₁ and v₃ so that |v₁| = |v₃| = 1 and v₁ ⊥ v₃, in order to form an orthonormal basis for E.
3. Take a vector v₂, of unit length and perpendicular to E; then v₂A − v₂ = kv₁ holds, and k is the coefficient.
4. Take a₀ as a solution of x(I₃ − A) = x₀; then v₁ is the direction, and a₀ + ⟨⟨v₁, v₃⟩⟩, i.e. the solution set of x(I₃ − A) = x₀, is the plane of invariant points.
(3.8.22)
The details are left as Ex. <A> 12.
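The following NumPy sketch (our addition; function names are assumptions) carries out steps 1–4 for a given pair (x₀, B):

    import numpy as np

    def shearing_data(x0, B, tol=1e-6):
        """If T(x) = x0 + xB is a shearing, return (coefficient k, direction v1, point a0)."""
        B = np.asarray(B, dtype=float)
        N = B - np.eye(3)
        if np.allclose(N, 0) or not np.allclose(N @ N, 0, atol=tol):
            return None                         # step 1: B != I3 and (B - I3)^2 = O
        a0, *_ = np.linalg.lstsq((np.eye(3) - B).T, x0, rcond=None)
        if not np.allclose(a0 @ (np.eye(3) - B), x0, atol=tol):
            return None                         # x(I3 - B) = x0 must be solvable
        v1 = N[np.argmax(np.linalg.norm(N, axis=1))]     # rows of N span the direction
        v1 = v1 / np.linalg.norm(v1)
        j  = np.argmax(np.linalg.norm(N, axis=0))        # columns of N span Ker(B - I3)^perp
        v2 = N[:, j] / np.linalg.norm(N[:, j])
        k = (v2 @ B - v2) @ v1                  # step 3: v2 B - v2 = k v1
        return k, v1, a0

    # The converse problem in Example 4 below: the coefficient comes out as
    # +sqrt(2) or -sqrt(2), depending on the signs chosen for v1 and v2.
    B = np.array([[2/3, 1/3, 1/3], [-2/3, 5/3, 2/3], [1/3, -1/3, 2/3]])
    print(shearing_data(np.array([1., -1, -1]), B))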

Example 4 Let a₀ = (1, 1, 0), a₁ = (2, 0, −1) and a₂ = (0, −1, 1). Try to construct a shearing with coefficient k ≠ 0 in the direction a₁ − a₀, with the plane through a₀ perpendicular to a₂ − a₀ (i.e. a₀ + ⟨⟨a₂ − a₀⟩⟩⊥) as the plane of invariant points. Note that (a₁ − a₀) ⊥ (a₂ − a₀).
( a0 ).


Solution Since a₁ − a₀ = (1, −1, −1) has length √3, take the unit vector v₁ = (1/√3)(1, −1, −1). Since a₂ − a₀ = (−1, −2, 1) happens to be perpendicular to a₁ − a₀, i.e.
$$(a_2 - a_0)(a_1 - a_0)^* = (-1\ \ {-2}\ \ 1)\begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} = -1 + 2 - 1 = 0,$$
we can choose v₂ to be a₂ − a₀ divided by its length √6, i.e. v₂ = (1/√6)(−1, −2, 1). Then choose a vector v₃ = (α₁, α₂, α₃) of unit length so that
v₃ ⊥ v₁ and v₃ ⊥ v₂
⇒ α₁ − α₂ − α₃ = 0 and −α₁ − 2α₂ + α₃ = 0
⇒ α₁ = α₃ and α₂ = 0
⇒ v₃ = (1/√2)(1, 0, 1).
In the orthonormal affine basis B = {a₀, a₀ + v₁, a₀ + v₂, a₀ + v₃}, the required shearing T has the representation
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \text{ simply denoted as } A.$$

While in N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}AP,$$
where
$$P = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} & -\frac{1}{\sqrt3} \\ -\frac{1}{\sqrt6} & -\frac{2}{\sqrt6} & \frac{1}{\sqrt6} \\ \frac{1}{\sqrt2} & 0 & \frac{1}{\sqrt2} \end{pmatrix} \quad\text{with } P^{-1} = P^*.$$
By computation,
$$P^{-1}AP = \begin{pmatrix} 1 - \frac{k}{3\sqrt2} & \frac{k}{3\sqrt2} & \frac{k}{3\sqrt2} \\ -\frac{2k}{3\sqrt2} & 1 + \frac{2k}{3\sqrt2} & \frac{2k}{3\sqrt2} \\ \frac{k}{3\sqrt2} & -\frac{k}{3\sqrt2} & 1 - \frac{k}{3\sqrt2} \end{pmatrix}, \quad\text{and}$$
$$x_0 = a_0 - a_0P^{-1}AP = a_0P^{-1}(I_3 - A)P = \frac{k}{\sqrt2}(1, -1, -1)$$
$$\Rightarrow\ T(x) = x_0 + xP^{-1}AP.$$
See Fig. 3.68.

[Fig. 3.68: the shearing T, its direction v₁, and the plane of invariant points through a₀ spanned by v₁ and v₃; a point x and its image T(x) are indicated.]


For simplicity, take k = √2 and consider the converse problem. Let
$$T(x) = x_0 + xB, \quad\text{where } x_0 = (1, -1, -1) \text{ and } B = P^{-1}AP = \begin{pmatrix} \frac{2}{3} & \frac{1}{3} & \frac{1}{3} \\ -\frac{2}{3} & \frac{5}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix}.$$
We want to check whether this T is a shearing. Follow the steps in (3.8.22).



1. B has characteristic polynomial
$$\det(B - tI_3) = \frac{1}{27}(-27t^3 + 81t^2 - 81t + 27) = -(t - 1)^3.$$
So B has eigenvalue 1 of algebraic multiplicity 3. Furthermore, B − I₃ ≠ O but
$$(B - I_3)^2 = \begin{pmatrix} -\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ -\frac{2}{3} & \frac{2}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{1}{3} & -\frac{1}{3} \end{pmatrix}^2 = O_{3\times3}.$$
Hence B is not diagonalizable. Solve
x(I₃ − B) = x₀ = (1, −1, −1), where x = (x₁, x₂, x₃)
⇒ x₁ + 2x₂ − x₃ = 3, which has infinitely many solutions.
Pick any such solution, say a₀ = (1, 1, 0). This information guarantees that T is a shearing.
2. Solve
x(B − I₃) = 0
⇒ x₁ + 2x₂ − x₃ = 0.
Pick two orthogonal eigenvectors of unit length, say v₁ = (1/√3)(1, −1, −1) and v₃ = (1/√2)(1, 0, 1).
3. To choose a unit vector v₂ = (α₁, α₂, α₃) so that v₂ ⊥ v₁ and v₂ ⊥ v₃ hold, solve
α₁ − α₂ − α₃ = α₁ + α₃ = 0 ⇒ (α₁, α₂, α₃) = α₁(1, 2, −1).
Take v₂ = (1/√6)(1, 2, −1). Notice that this v₂ is not an eigenvector associated to 1, since dim Ker(B − I₃) = 2. By computation,
$$v_2B = \frac{1}{\sqrt6}(1, 2, -1)\begin{pmatrix} \frac{2}{3} & \frac{1}{3} & \frac{1}{3} \\ -\frac{2}{3} & \frac{5}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix} = \frac{1}{\sqrt6}(-1, 4, 1)$$
$$\Rightarrow\ v_2B - v_2 = \frac{2}{\sqrt6}(-1, 1, 1) = -\sqrt2\,v_1.$$
Therefore, −√2 is the coefficient.

Hence, in C = {a₀, a₀ + v₁, a₀ + v₂, a₀ + v₃},
$$[T(x)]_C = [x]_C[T]_C, \quad\text{where } [T]_C = \begin{pmatrix} 1 & 0 & 0 \\ -\sqrt2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Notice that the v₂ in C is equal to −v₂ in the B mentioned above. This is the reason why we have −√2 here instead of the original √2 (see 2 in (3.8.20)).
Two questions are raised as follows.

Q1 What is the image plane of the plane x₁ + x₂ = 0 under T? Where do they intersect?
Q2 Where is the image of the tetrahedron ∆b₀b₁b₂b₃, where b₀ = (1, 1, 1), b₁ = (−1, 1, 1), b₂ = (1, −1, 1) and b₃ = (1, 1, −1)? How are their volumes related?

We need to use the inverse of T(x) = x₀ + xB, namely,
$$x = (y - x_0)B^{-1}, \quad\text{where } x_0 = (1, -1, -1),\ y = T(x) \text{ and } B^{-1} = P^{-1}A^{-1}P = \frac{1}{3}\begin{pmatrix} 4 & -1 & -1 \\ 2 & 1 & -2 \\ -1 & 1 & 4 \end{pmatrix}.$$

For Q1
$$x_1 + x_2 = x\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 0 \quad\text{if } x = (x_1, x_2, x_3)$$
$$\Rightarrow\ [(y_1, y_2, y_3) - (1, -1, -1)]\cdot\frac{1}{3}\begin{pmatrix} 4 & -1 & -1 \\ 2 & 1 & -2 \\ -1 & 1 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 0$$
⇒ (replace y₁, y₂, y₃ by x₁, x₂, x₃) x₁ + x₂ = 0.
Hence, x₁ + x₂ = 0 and its image plane coincide. This is within our expectation, because the normal vector (1, 1, 0) of x₁ + x₂ = 0 is perpendicular to v₁ = (1/√3)(1, −1, −1), the direction of the shearing T (see 1 in (3.8.20)).

Any plane parallel to x₁ + 2x₂ − x₃ = 3 has equation x₁ + 2x₂ − x₃ = c, where c is a constant. Then
$$x_1 + 2x_2 - x_3 = x\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = c$$
$$\Rightarrow\ [(y_1, y_2, y_3) - (1, -1, -1)]\cdot\frac{1}{3}\begin{pmatrix} 4 & -1 & -1 \\ 2 & 1 & -2 \\ -1 & 1 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = c$$
⇒ y₁ + 2y₂ − y₃ = c,
which means that it is an invariant plane.

For Q2 By computation,
T(b₀) = (1, −1, −1) + (1, 1, 1)B = (1, −1, −1) + (1/3, 5/3, 5/3) = (4/3, 2/3, 2/3),
T(b₁) = (1, −1, −1) + (−1, 1, 1)B = (1, −1, −1) + (−1, 1, 1) = (0, 0, 0),
T(b₂) = (1, −1, −1) + (1, −1, 1)B = (1, −1, −1) + (5/3, −5/3, 1/3) = (8/3, −8/3, −2/3),
T(b₃) = (1, −1, −1) + (1, 1, −1)B = (1, −1, −1) + (−1/3, 7/3, 1/3) = (2/3, 4/3, −2/3).
These four image points form a tetrahedron ∆T(b₀)···T(b₃). Thus,
the signed volume of ∆b₀b₁b₂b₃
$$= \frac{1}{6}\det\begin{pmatrix} b_1 - b_0 \\ b_2 - b_0 \\ b_3 - b_0 \end{pmatrix} = \frac{1}{6}\begin{vmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{vmatrix} = -\frac{4}{3},$$
the signed volume of ∆T(b₀)···T(b₃)
$$= \frac{1}{6}\det\begin{pmatrix} T(b_1) - T(b_0) \\ T(b_2) - T(b_0) \\ T(b_3) - T(b_0) \end{pmatrix} = \frac{1}{6}\begin{vmatrix} -\frac{4}{3} & -\frac{2}{3} & -\frac{2}{3} \\ \frac{4}{3} & -\frac{10}{3} & -\frac{4}{3} \\ -\frac{2}{3} & \frac{2}{3} & -\frac{4}{3} \end{vmatrix} = \frac{1}{6}\cdot\left(-\frac{8}{27}\cdot 27\right) = -\frac{4}{3}.$$
So both have the same volume. □

Case 7 Rotation (Euclidean notions are needed)


Let the line OA be perpendicular to a plane Σ in space R3 , as indicated in
Fig. 3.69. With OA as the line of invariant points, rotate the whole space
through an angle θ so that a point X is moved to its new position X  .
Define a mapping T : R3 → R3 by
T (X) = X  .

X′
A
 Σ

O
X 

axis

Fig. 3.69

Then T is an affine transformation and is called the rotation of the space


with the line OA as axis and through the angle θ. Then,

1. The axis is the only line of invariant points if θ = 0.


2. The rotation in R3 results in a rotation on any plane, perpendicular to
the axis at O, with O as center and through the angle θ (see Fig. 2.108).
3. Any plane containing the axis is rotated through the angle θ just like
the action of opening a door.
(3.8.23)

The details are left to the readers.



As a counterpart of (2.8.34) in space, we have

The rotation
Let a₀, a₁, a₂ and a₃ be non-coplanar points in R³ so that
1. |a₁ − a₀| = |a₂ − a₀| = |a₃ − a₀| = 1, and
2. (a₁ − a₀) ⊥ (a₂ − a₀) ⊥ (a₃ − a₀).
Let T denote the rotation of the space with a₀ + ⟨⟨a₁ − a₀⟩⟩ as axis and through the angle θ. Then

1. In the orthonormal affine basis B = {a₀, a₁, a₂, a₃},
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}.$$
2. In N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_B P, \quad\text{where } P = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}.$$
Here P is an orthogonal matrix, i.e. P* = P⁻¹.

Also,
(a) a rotation not only preserves all the properties 1–7 listed in (3.7.15),
(b) but also preserves length, angle, area, volume and orientation.
(3.8.24)

The details are left as Ex. <A> 17.
To test whether an affine transformation T(x) = x₀ + xA is a rotation in R³, try the following steps (refer to (2.8.35)):

1. Justify that A* = A⁻¹, i.e. A is an orthogonal matrix, and det A = 1. Then the associated T is a rotation if x(I₃ − A) = x₀ has a solution.
2. Solve det(A − I₃) = 0 and see what is the algebraic multiplicity m of the eigenvalue 1, which can only be 1 or 3 (see (3) in (3.7.46) and (2), (3) in (3.7.50)). In case m = 3, A = I₃ holds. If m = 1, determine an eigenvector v₁ of unit length associated to 1.
3. Extend v₁ to an orthonormal basis B = {v₁, v₂, v₃} for R³. Then
$$[A]_B = \begin{pmatrix} 1 & 0 \\ 0 & B \end{pmatrix},$$
where B₂ₓ₂ is an orthogonal matrix with det B = 1, and hence (see (2.8.35))
$$B = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \quad\text{for some } \theta \in R.$$
4. In case θ ≠ 0, i.e. A ≠ I₃, take any solution a₀ of the equation x(I₃ − A) = x₀. Then,
(a) The axis of rotation: a₀ + ⟨⟨v₁⟩⟩.
(b) The (perpendicular) rotational plane: a₀ + ⟨⟨v₂, v₃⟩⟩ = a₀ + Im(A − I₃).
(c) The angle of rotation: θ.
(3.8.25)

The details are left as Ex. <A> 17.
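In NumPy these steps look as follows (a sketch we add for convenience; the angle is recovered from the identity trace A = 1 + 2 cos θ for a 3 × 3 rotation matrix):

    import numpy as np

    def rotation_data(x0, A, tol=1e-6):
        """If T(x) = x0 + xA is a rotation, return (axis point a0, unit axis v1, angle theta)."""
        A = np.asarray(A, dtype=float)
        if not (np.allclose(A @ A.T, np.eye(3), atol=tol)          # step 1: A orthogonal
                and np.isclose(np.linalg.det(A), 1, atol=tol)):    # with det A = 1
            return None
        a0, *_ = np.linalg.lstsq((np.eye(3) - A).T, x0, rcond=None)
        if not np.allclose(a0 @ (np.eye(3) - A), x0, atol=tol):
            return None                                            # x(I3 - A) = x0 solvable
        w, V = np.linalg.eig(A.T)
        v1 = V[:, np.argmin(np.abs(w - 1))].real                   # step 2: axis eigenvector
        v1 = v1 / np.linalg.norm(v1)
        cos_t = (np.trace(A) - 1) / 2
        return a0, v1, np.arccos(np.clip(cos_t, -1, 1))

    # Example 5 below with x0 = 0: axis <<(1, 2, 0)>>, cos(theta) = -2/3.
    A = np.array([[-1/3, 2/3, -2/3], [2/3, 2/3, 1/3], [2/3, -1/3, -2/3]])
    print(rotation_data(np.zeros(3), A))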

Example 5 Given the affine transformation
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} -\frac{1}{3} & \frac{2}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & -\frac{1}{3} & -\frac{2}{3} \end{pmatrix},$$
try to determine those x₀ for which T is a rotation. In this case, determine the axis and the angle of the rotation, and also the rotational plane.

Solution The three row (or column) vectors of A are of unit length and are perpendicular to each other. So A* = A⁻¹ and A is orthogonal. Direct computation shows that
$$\det A = \frac{1}{27}(4 + 4 + 4 + 8 + 8 - 1) = \frac{1}{27}\cdot 27 = 1.$$
Therefore the associated T is a rotation of R³ if x₀ is chosen from Im(I₃ − A).

The characteristic polynomial of A is
$$\det(A - tI_3) = -\frac{1}{3}(t - 1)(3t^2 + 4t + 3),$$
and thus A has the real eigenvalue 1 of multiplicity one. Solve
$$x(A - I_3) = (x_1, x_2, x_3)\begin{pmatrix} -\frac{4}{3} & \frac{2}{3} & -\frac{2}{3} \\ \frac{2}{3} & -\frac{1}{3} & \frac{1}{3} \\ \frac{2}{3} & -\frac{1}{3} & -\frac{5}{3} \end{pmatrix} = 0$$
⇒ 2x₁ − x₂ − x₃ = 0 and 2x₁ − x₂ + 5x₃ = 0
⇒ x = x₁(1, 2, 0) for x₁ ∈ R.
Hence, take the unit eigenvector v₁ = (1/√5)(1, 2, 0).
Let v = (α₁, α₂, α₃). Then
v ⊥ v₁ ⇔ α₁ + 2α₂ = 0.
Therefore, we may choose v₂ = (1/√5)(−2, 1, 0). Again, solving α₁ + 2α₂ = −2α₁ + α₂ = 0, we may choose v₃ = (0, 0, 1) = e₃. Now B = {v₁, v₂, v₃} forms an orthonormal basis for R³.
By computation,
$$v_2A = \frac{1}{3\sqrt5}(4, -2, 5) = -\frac{2}{3}v_2 + \frac{\sqrt5}{3}v_3, \qquad v_3A = \frac{1}{3}(2, -1, -2) = -\frac{\sqrt5}{3}v_2 - \frac{2}{3}v_3$$
$$\Rightarrow\ [A]_B = PAP^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{2}{3} & \frac{\sqrt5}{3} \\ 0 & -\frac{\sqrt5}{3} & -\frac{2}{3} \end{pmatrix}, \quad\text{where } P = \begin{pmatrix} \frac{1}{\sqrt5} & \frac{2}{\sqrt5} & 0 \\ -\frac{2}{\sqrt5} & \frac{1}{\sqrt5} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
To find Im(I₃ − A), let y = x(I₃ − A). Then
y = x(I₃ − A)
⇔ y₁ = (2/3)(2x₁ − x₂ − x₃), y₂ = (1/3)(−2x₁ + x₂ + x₃) and y₃ = (1/3)(2x₁ − x₂ + 5x₃)
⇔ y₁ + 2y₂ = 0.

So the image subspace is the plane x₁ + 2x₂ = 0, with unit normal vector v₁. Note that, since A* = A⁻¹, Im(A − I₃) = Ker(A* − I₃)⊥ = Ker(A⁻¹ − I₃)⊥ = Ker(A − I₃)⊥. This is the theoretical reason why v₁ ⊥ Im(A − I₃) = Ker(A − I₃)⊥ = ⟨⟨v₂, v₃⟩⟩.
Take any point x₀ on Im(A − I₃), say x₀ = 0 for simplicity. Then the axis of rotation is ⟨⟨v₁⟩⟩, the rotational plane is ⟨⟨v₂, v₃⟩⟩ = Im(A − I₃), and the angle of rotation is the θ with cos θ = −2/3 and sin θ = √5/3, i.e. θ = π − tan⁻¹(√5/2). See Fig. 3.70.

[Fig. 3.70: the axis ⟨⟨v₁⟩⟩ and the rotational plane ⟨⟨v₂, v₃⟩⟩ = ⟨⟨v₁⟩⟩⊥, shown against the frame e₁, e₂, e₃.]

We have two questions:

Q1 What is the image of the unit sphere x₁² + x₂² + x₃² = 1 under y = xA?
Q2 What is the image of the cylinder x₁² + x₂² = 1 under y = xA?

For Q1 It is the unit sphere itself, both by geometric intuition and by analytic proof.
For Q2 Geometric intuition tells us that it is still a cylinder, with central axis along the vector e₃A = (2/3, −1/3, −2/3) and the base circle lying on the plane 2x₁ − x₂ − 2x₃ = 0. And computation shows that the image has the equation, a complicated one,
5x₁² + 8x₂² + 5x₃² + 4x₁x₂ + 8x₁x₃ − 4x₂x₃ = 9   (*)
in the eyes of N = {0, e₁, e₂, e₃}.
Hence, we raise a third question concerning Q2 and (*).

Q3 What is the equation of the image cylinder (*) in the orthonormal affine basis C = {0, A₁*, A₂*, A₃*}, where Aᵢ* denotes the ith row vector of A for i = 1, 2, 3?

In B, one needs to compute
$$x\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}x^* = (yP^{-1})(PAP^{-1})^{-1}\,P\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}P^{-1}\,(PAP^{-1})(yP^{-1})^* = 1,$$
where y = xA and yP⁻¹ is the coordinate vector of y in the basis B.
In C, one needs to compute
$$x\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}x^* = (yA^{-1})\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}(yA^{-1})^* = \alpha_1^2 + \alpha_2^2 = 1,$$
where yA⁻¹ = (α₁, α₂, α₃) is the coordinate vector of y = xA in the basis C. Hence, to view the image cylinder, under y = xA, in C is the same as to view the original cylinder in N.

Case 8 Orthogonal reflection (Euclidean notions are needed)


Let us give a detailed account of the orthogonal reflection introduced in Case 2.

The orthogonal projection and reflection
Let a₀, a₁ and a₂ be three non-collinear points in the space R³.

(1) The orthogonal projection of R³ onto the affine subspace S: a₀ + ⟨⟨a₁ − a₀, a₂ − a₀⟩⟩ is the mapping P_proj defined by
$$x \mapsto a_0 + (x - a_0)A_{\mathrm{proj}} = P_{\mathrm{proj}}(x),$$
where
$$A_{\mathrm{proj}} = A^*(AA^*)^{-1}A \quad\text{and}\quad A = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \end{pmatrix}_{2\times3}$$
(see (*14) in Sec. 3.7.5). Note that AA* = I₂ if {a₁ − a₀, a₂ − a₀} is an orthonormal set. See Fig. 3.71.
(2) The orthogonal reflection of R³ with respect to the affine subspace S is the affine transformation T defined by
$$x \mapsto a_0 + (x - a_0)R = T(x),$$
where
$$R = 2A^*(AA^*)^{-1}A - I_3 = 2A_{\mathrm{proj}} - I_3$$
satisfies
1. R is symmetric, i.e. R* = R, and
2. R is orthogonal, i.e. R* = R⁻¹,
and thus R² = I₃ (see (*17) and (*24) in Sec. 3.7.5). See Fig. 3.71.

[Fig. 3.71: P_proj(x) and the reflected point T(x) relative to the plane a₀ + ⟨⟨a₁ − a₀, a₂ − a₀⟩⟩.]
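The matrices A_proj and R of (1) and (2) are easy to build numerically; the following NumPy sketch (our addition, with assumed names) also confirms the stated properties of R:

    import numpy as np

    def projection_and_reflection(a0, a1, a2):
        """Return (Aproj, R) for the plane a0 + <<a1 - a0, a2 - a0>> (row convention)."""
        A = np.array([a1 - a0, a2 - a0], dtype=float)   # the 2 x 3 matrix of (1)
        Aproj = A.T @ np.linalg.inv(A @ A.T) @ A        # A*(AA*)^{-1}A
        R = 2 * Aproj - np.eye(3)                       # 2 Aproj - I3
        return Aproj, R

    a0, a1, a2 = np.zeros(3), np.array([1., 0, 0]), np.array([0., 1, 1])
    Aproj, R = projection_and_reflection(a0, a1, a2)
    # R is symmetric, orthogonal, and an involution (R^2 = I3):
    print(np.allclose(R, R.T), np.allclose(R @ R.T, np.eye(3)), np.allclose(R @ R, np.eye(3)))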

(3) Let B = {a₀, a₁, a₂, a₃} be an orthonormal affine basis for R³ so that S is a₀ + ⟨⟨a₁ − a₀, a₂ − a₀⟩⟩. Let T be the orthogonal reflection defined as in (2). Then,
1. In B,
$$[T(x)]_B = [x]_B[T]_B, \quad\text{where } [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
2. In N = {0, e₁, e₂, e₃},
$$T(x) = a_0 + (x - a_0)P^{-1}[T]_B P, \quad\text{where } P = \begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}.$$
Here P is orthogonal, i.e. P* = P⁻¹. Also, R = P⁻¹[T]_B P.

Also, an orthogonal reflection
(a) preserves all the properties 1–7 listed in (3.7.15), and
(b) preserves length, angle, area and volume but reverses the orientation.   (3.8.26)

The details are left as Ex. <A> 21.
To test whether an affine transformation T(x) = x₀ + xA is an orthogonal reflection, try the following steps:
1. Justify that A is orthogonal and symmetric with det A = −1. Then the associated T is an orthogonal reflection if x(I₃ − A) = x₀ has a solution.
2. A has eigenvalues 1, 1, −1. Two kinds of normalized forms of A are possible.
(a) Take a unit vector v₁ ∈ Ker(A − I₃) and vectors v₂, v₃ ∈ ⟨⟨v₁⟩⟩⊥ so that B = {v₁, v₂, v₃} is an orthonormal basis for R³. Then
$$[A]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & \sin\theta & -\cos\theta \end{pmatrix} \quad\text{for some } \theta \in R.$$
(b) Choose u₁, u₂ ∈ Ker(A − I₃), which has dimension equal to 2, and u₃ ∈ Ker(A + I₃) so that C = {u₁, u₂, u₃} is an orthonormal basis for R³. Then
$$[A]_C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
3. Take any solution a₀ of the equation x(I₃ − A) = x₀. Then,
(a) The direction of the orthogonal reflection: ⟨⟨u₃⟩⟩ = Im(A − I₃).
(b) The plane of invariant points: a₀ + ⟨⟨u₁, u₂⟩⟩ = a₀ + Ker(A − I₃).
(3.8.27)
The details are left as Ex. <A> 21.
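A NumPy rendering of this test (our sketch, with assumed names) reads:

    import numpy as np

    def orthogonal_reflection_data(x0, A, tol=1e-6):
        """If T(x) = x0 + xA is an orthogonal reflection, return (point a0, direction u3)."""
        A = np.asarray(A, dtype=float)
        if not (np.allclose(A, A.T, atol=tol)                      # step 1: A symmetric,
                and np.allclose(A @ A.T, np.eye(3), atol=tol)      # orthogonal,
                and np.isclose(np.linalg.det(A), -1, atol=tol)):   # with det A = -1
            return None
        a0, *_ = np.linalg.lstsq((np.eye(3) - A).T, x0, rcond=None)
        if not np.allclose(a0 @ (np.eye(3) - A), x0, atol=tol):
            return None
        w, V = np.linalg.eigh(A)                                   # eigenvalues -1, 1, 1
        u3 = V[:, np.argmin(w)]                                    # unit eigenvector for -1
        return a0, u3      # plane of invariant points: a0 + <<u3>>^perp

    # Example 6 below with x0 = 0: u3 is a multiple of (1, -1, -1)/sqrt(3).
    A = np.array([[1/3, 2/3, 2/3], [2/3, 1/3, -2/3], [2/3, -2/3, 1/3]])
    print(orthogonal_reflection_data(np.zeros(3), A))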

Example 6 Given the affine transformation
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & -\frac{2}{3} & \frac{1}{3} \end{pmatrix},$$
determine those x₀ for which each such T is an orthogonal reflection, and find the direction and the plane of invariant points.

Solution It is obvious that A is orthogonal and det A = −1. If we pick x₀ ∈ Im(A − I₃), then the associated T is an orthogonal reflection.
The characteristic polynomial of A is
$$\det(A - tI_3) = \frac{1}{27}(-27t^3 + 27t^2 + 27t - 27) = -(t - 1)^2(t + 1),$$
and hence A has the eigenvalue 1 of multiplicity 2 and the further eigenvalue −1. Solve
x(A − I₃) = 0
⇒ x₁ − x₂ − x₃ = 0.
Take unit eigenvectors u₁ = (1/√2)(1, 1, 0) and u₂ = (1/√6)(1, −1, 2), so that u₁ ⊥ u₂.
Solve
x(A + I₃) = 0
⇒ 2x₁ + x₂ + x₃ = 0, x₁ + 2x₂ − x₃ = 0.
Take the unit vector u₃ = (1/√3)(1, −1, −1). Then C = {u₁, u₂, u₃} is an orthonormal basis for R³. In C,
$$[A]_C = QAQ^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad\text{where } Q = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ \frac{1}{\sqrt6} & -\frac{1}{\sqrt6} & \frac{2}{\sqrt6} \\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} & -\frac{1}{\sqrt3} \end{pmatrix}.$$
3

Note that Im(A − I₃) = Ker(A + I₃) = Ker(A − I₃)⊥. So there are three different ways to compute Im(A − I₃) = ⟨⟨(1, −1, −1)⟩⟩. Pick any point x₀ on Im(A − I₃), say x₀ = 0 for simplicity; then the direction of the reflection T = xA is ⟨⟨u₃⟩⟩, and the plane of invariant points is ⟨⟨u₁, u₂⟩⟩ = {x ∈ R³ | x₁ − x₂ − x₃ = 0} = ⟨⟨u₃⟩⟩⊥. See Fig. 3.72(a).
On the other hand, take v₁ = u₁ = (1/√2)(1, 1, 0). Then ⟨⟨v₁⟩⟩⊥ = {x | x₁ + x₂ = 0}. Choose v₂ = (1/√2)(1, −1, 0) and v₃ = e₃ = (0, 0, 1), so that B = {v₁, v₂, v₃} forms an orthonormal basis for R³. Since
v₁A = v₁,
$$v_2A = \frac{1}{\sqrt2}\left(-\frac{1}{3}, \frac{1}{3}, \frac{4}{3}\right) = -\frac{1}{3}v_2 + \frac{2\sqrt2}{3}v_3,$$
$$v_3A = \left(\frac{2}{3}, -\frac{2}{3}, \frac{1}{3}\right) = \frac{2\sqrt2}{3}v_2 + \frac{1}{3}v_3$$
$$\Rightarrow\ [A]_B = PAP^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{3} & \frac{2\sqrt2}{3} \\ 0 & \frac{2\sqrt2}{3} & \frac{1}{3} \end{pmatrix}, \quad\text{where } P = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Also, the submatrix
$$\begin{pmatrix} -\frac{1}{3} & \frac{2\sqrt2}{3} \\ \frac{2\sqrt2}{3} & \frac{1}{3} \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},$$
where cos θ = 1/3 and sin θ = −2√2/3. To interpret T(x) = xA in B, take any point x ∈ R³. Then, letting y = xA,
y = xP⁻¹(PAP⁻¹)P
⇒ [y]_B = [x]_B(PAP⁻¹), i.e., letting [x]_B = (α₁, α₂, α₃) and [y]_B = (β₁, β₂, β₃),
β₁ = α₁,
$$(\beta_2, \beta_3) = (\alpha_2, \alpha_3)\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$
This means that, with the height α₁ of the point x above the plane ⟨⟨v₂, v₃⟩⟩ kept fixed, the orthogonal projection (α₂, α₃) of x on ⟨⟨v₂, v₃⟩⟩ is first reflected with respect to the axis ⟨⟨v₃⟩⟩ to the point (−α₂, α₃), and is then rotated through the angle θ in the plane ⟨⟨v₂, v₃⟩⟩ with 0 as center to the point (β₂, β₃). Therefore the point (β₁, β₂, β₃), where β₁ = α₁, is the coordinate vector of xA = T(x) in B. See Fig. 3.72(b). Compare with Figs. 2.109 and 2.110. □

[Fig. 3.72: (a) the reflection in the plane ⟨⟨u₁, u₂⟩⟩ with direction u₃; (b) the same map in B = {v₁, v₂, v₃}: reflect across ⟨⟨v₃⟩⟩, then rotate through θ in ⟨⟨v₂, v₃⟩⟩.]

Just like (2.8.38) and (2.8.39), an affine transformation T(x) = x₀ + xA on R³ can be expressed as a composite of a finite number of reflections, stretches and shearings, followed by a translation.
For example, let T(x) = x₀ + xA, where A is the one in Example 1 of Sec. 3.7.5. Then A is the composite of the following affine transformations in N = {0, e₁, e₂, e₃}:

1. $\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$: a shearing in the direction e₁ with ⟨⟨e₁, e₃⟩⟩ as the plane of invariant points and coefficient 4.
2. $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$: a shearing in the direction e₁ with ⟨⟨e₁, e₂⟩⟩ as the plane of invariant points and coefficient 1.
3. $\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 5 \end{pmatrix}$: a two-way stretch having scale factor −1 along ⟨⟨e₂⟩⟩ and 5 along ⟨⟨e₃⟩⟩, with ⟨⟨e₁⟩⟩ the line of invariant points. Note that the former factor represents an orthogonal reflection in the direction e₂.
4. $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$: a shearing in the direction e₂ with ⟨⟨e₂, e₃⟩⟩ as the plane of invariant points and coefficient 1.
5. $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{pmatrix}$: a shearing in the direction e₃ with ⟨⟨e₁, e₃⟩⟩ as the plane of invariant points and coefficient 5.
6. $\begin{pmatrix} 1 & 0 & -5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$: a shearing in the direction e₃ with ⟨⟨e₂, e₃⟩⟩ as the plane of invariant points and coefficient −5.
7. x ↦ x₀ + x: a translation along x₀.

A numerical check of this factorization is sketched below. Readers should practice more examples by themselves.
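Since the matrix A of Example 1 of Sec. 3.7.5 is not restated here, the following NumPy sketch (our addition; the library and the ordering convention are assumptions) simply multiplies the listed factors in order, so that the product can be compared with that A:

    import numpy as np

    F1 = np.array([[1, 0, 0], [4, 1, 0], [0, 0, 1]])    # shearing, coefficient 4
    F2 = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])    # shearing, coefficient 1
    F3 = np.diag([1, -1, 5])                            # two-way stretch (-1 and 5)
    F4 = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])    # shearing, coefficient 1
    F5 = np.array([[1, 0, 0], [0, 1, 5], [0, 0, 1]])    # shearing, coefficient 5
    F6 = np.array([[1, 0, -5], [0, 1, 0], [0, 0, 1]])   # shearing, coefficient -5

    A = F1 @ F2 @ F3 @ F4 @ F5 @ F6   # row convention: x A = x F1 F2 ... F6,
                                      # i.e. factor 1 acts first and factor 6 last
    x0 = np.zeros(3)                  # a chosen translation vector (step 7)
    T = lambda x: x0 + x @ A
    print(A)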

Exercises
<A>

1. Prove (3.8.11) and (3.8.13).


2. Let a₀ = (1, 1, 1) and the vectors x₁ = (1, 2, 3), x₂ = (2, 2, 1) and x₃ = (3, 4, 3).

(a) Determine the reflection Tᵢ of R³ in the direction xᵢ with respect to the plane a₀ + ⟨⟨xⱼ, xₖ⟩⟩, where j ≠ k ≠ i, for i = 1, 2, 3.
(b) Describe T₁ ∘ T₂, T₂ ∘ T₃ and T₃ ∘ T₁ and their respective mapping properties.
(c) Compute T₁ ∘ T₂ ∘ T₃, T₂ ∘ T₃ ∘ T₁ and T₃ ∘ T₁ ∘ T₂. Are they coincident? Any geometric interpretation?
3. Let
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} 2 & 3 & 0 \\ -1 & -2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Use this T to justify (3.8.13). For a particular choice of x₀, do the following questions.
(1) Find the image of the plane x₁ − x₂ = c under T.
(2) Find the image of the plane a₁x₁ + a₂x₂ + a₃x₃ = b, where a₁ + a₂ ≠ 0, under T. Where do they intersect?
(3) Find the image of the tetrahedron ∆a₀a₁a₂a₃ and compute their volumes.
4. Prove (3.8.15) and (3.8.16).
5. Let a₀ = (−3, −6, 1) and the vectors x₁ = (1, 2, 2), x₂ = (1, −1, 1) and x₃ = (4, −12, 1).
(a) Determine the one-way stretch Tᵢ of R³ with scale factor kᵢ in the direction xᵢ with respect to the plane a₀ + ⟨⟨xⱼ, xₖ⟩⟩, where j ≠ k ≠ i, for i = 1, 2, 3.
(b) Determine the two-way stretches T₁ ∘ T₂, T₂ ∘ T₃ and T₃ ∘ T₁ and their respective lines of invariant points.
(c) Give a geometric interpretation and an analytic proof to see if T₁ ∘ T₂ is equal to T₂ ∘ T₁.
(d) Compute T₁ ∘ T₂ ∘ T₃, T₂ ∘ T₃ ∘ T₁ and T₃ ∘ T₁ ∘ T₂. Are they equal?
6. Use
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} 4 & -3 & -3 \\ 6 & -5 & -6 \\ 0 & 0 & 1 \end{pmatrix},$$
to justify (3.8.16). For a particular choice of x₀, do the following questions.
(1) Find the image of the plane x₁ + 2x₂ = c under T.
(2) Find the image of the plane a₁x₁ + a₂x₂ + a₃x₃ = b under T, where a₁ − a₂ − a₃ = 0.
(3) Find the image of the plane a₁x₁ + a₂x₂ + a₃x₃ = b under T, where the vector (a₁, a₂, a₃) is linearly independent of (1, 2, 0).
(4) Find the image of the tetrahedron ∆a₀a₁a₂a₃ and the ratio of their volumes.
Do the same problems for x ↦ x₀ + xA*.
7. Prove (3.8.17) and (3.8.18).
8. Let a₀ = (4, −3, −3) and the vectors x₁ = (1, 1, 4), x₂ = (2, −1, −12) and x₃ = (2, 1, 1).
(a) Find the two-way stretch Tᵢ of R³ with scale factor kⱼ along a₀ + ⟨⟨xⱼ⟩⟩ and scale factor kₗ along a₀ + ⟨⟨xₗ⟩⟩, with a₀ + ⟨⟨xᵢ⟩⟩ the line of invariant points, where j ≠ l ≠ i, for i = 1, 2, 3.
(b) Do problems (a), (b) and (d) as in Ex. 5.
9. Use
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} 5 & 4 & 4 \\ -7 & -3 & -1 \\ 7 & 4 & 2 \end{pmatrix},$$
to justify (3.8.18). For a particular choice of x₀, do the following questions.
(1) Do the same questions Q1, Q2 and Q3 as in Example 3.
(2) Find the image of the tetrahedron ∆a₀a₁a₂a₃ under T and compute their volumes.
Do the same problems for x ↦ x₀ + xA*.
10. Prove (3.8.19).
11. Use both
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} 3 & 1 & -1 \\ 1 & 3 & -1 \\ -1 & -1 & 5 \end{pmatrix},$$
and
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} 3 & 0 & 1 \\ 2 & 2 & 2 \\ 1 & 0 & 3 \end{pmatrix},$$
to justify (3.8.19).
12. Prove (3.8.20)–(3.8.22).

   
13. Let a₀ = (4, 2, 1) and the vectors x₁ = (0, 3/5, 4/5), x₂ = (3/5, 16/25, −12/25) and x₃ = (4/5, −12/25, 9/25).
(a) Show that |x₁| = |x₂| = |x₃| = 1 and x₁ ⊥ x₂ ⊥ x₃.
(b) Find the shearing Tᵢ of R³ with coefficient kᵢ ≠ 0 in the direction xᵢ, with a₀ + ⟨⟨xⱼ, xₖ⟩⟩ as the plane of invariant points, where j ≠ k ≠ i, for i = 1, 2, 3.
(c) Do problems (b), (c) and (d) as in Ex. 5.
14. Let a₀ = (2, 2, 1) and the vectors y₁ = (2, 1, 0), y₂ = (0, 1, 1) and y₃ = (2, 0, 2).
(a) Try to construct vectors x₁, x₂ and x₃ from y₁, y₂, y₃ so that |x₁| = |x₂| = |x₃| = 1 and x₁ ⊥ x₂ ⊥ x₃.
(b) Do the same problems as in Ex. 13.
15. For each of the following, do the same problems as in Exs. 13 and 14.
(a) a₀ = (2, 5, 5) and the vectors y₁ = (1, 1, 0), y₂ = (0, 1, 1), y₃ = (1, 0, 1).
(b) a₀ = (−1, −2, 1) and the vectors y₁ = (1, 2, 1), y₂ = (1, 0, 1), y₃ = (1, 0, 2).
(c) a₀ = (2, 5, 5) and the vectors y₁ = (1, 1, 0), y₂ = (2, 0, 1), y₃ = (2, 2, 1).

16. Use
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} 0 & 1 & 1 \\ -2 & 3 & 2 \\ 1 & -1 & 0 \end{pmatrix},$$
to justify (3.8.22). For a particular choice of x₀, do the following questions.
(1) Find the image plane of any plane parallel to the direction of the shearing.
(2) Find the image plane of any plane parallel to the plane of invariant points.
(3) Find the image of the tetrahedron ∆a₀a₁a₂a₃ under the shearing and compute its volume.
17. Prove (3.8.23)–(3.8.25).
18. Use
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ -\frac{1}{\sqrt6} & \frac{1}{\sqrt6} & \frac{2}{\sqrt6} \\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt3} \end{pmatrix},$$
to justify (3.8.25). For a particular choice of x₀, do the following questions.
(1) Find the image of a line, parallel to or intersecting the axis of rotation, under T.
(2) Find the image of a plane containing or parallel to the axis of rotation.
(3) Find the image of a plane perpendicular to the axis of rotation.
(4) Find the image of a plane intersecting the axis of rotation at a point.
(5) Find the image of a tetrahedron ∆a₀a₁a₂a₃ and compute its volume.
(6) Find the image of the unit sphere x₁² + x₂² + x₃² = 1. How about the ellipsoid x₁²/a₁² + x₂²/a₂² + x₃²/a₃² = 1? How about |x₁| + |x₂| + |x₃| = 1?
19. Let
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ -\frac{2}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{2}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix}.$$
Do the same problems as in Ex. 18.
20. Let
$$T(x) = x_0 + xA, \quad\text{where } A = \begin{pmatrix} -\frac{7}{9} & \frac{4}{9} & \frac{4}{9} \\ \frac{4}{9} & -\frac{1}{9} & \frac{8}{9} \\ \frac{4}{9} & \frac{8}{9} & -\frac{1}{9} \end{pmatrix}.$$
Do the same problems as in Ex. 18.
21. Prove (3.8.26) and (3.8.27).
22. Use
$$T(x) = x_0 + xA, \quad\text{where } A = \frac{1}{3}\begin{pmatrix} 1 & -2 & -2 \\ -2 & 1 & -2 \\ -2 & -2 & 1 \end{pmatrix},$$
to justify (3.8.27). For a particular choice of x₀, do the following questions.
(1) Find the image of a line, parallel or skew to the direction of the orthogonal reflection, under T.
(2) Find the image of a plane containing or parallel to the direction of reflection.
(3) Find the image of a plane perpendicular to the direction of reflection.
(4) Find the image of a plane intersecting the plane of invariant points along a line.
(5) Find the image of a tetrahedron ∆a₀a₁a₂a₃ and compute its volume.
(6) Find the images of |x₁| + |x₂| + |x₃| = 1 and x₁² + x₂² + x₃² = 1.
23. Let
$$T(x) = x_0 + xA, \quad\text{where } A = \frac{1}{3}\begin{pmatrix} 2 & -2 & -1 \\ -2 & -1 & -2 \\ -1 & -2 & 2 \end{pmatrix}.$$
Do the same problems as in Ex. 22.
24. Let
$$A = \begin{pmatrix} 0 & \frac{1}{3} & 0 \\ 0 & 0 & \frac{1}{3} \\ 9 & -9 & 3 \end{pmatrix}.$$
(a) Express A as a product of a finite number of elementary matrices, each representing either a reflection, a stretch or a shearing in N = {0, e₁, e₂, e₃}.
(b) Show that A has only one eigenvalue 1, of multiplicity 3, and that (A − I₃)² ≠ O but (A − I₃)³ = O.
(1) Find a basis B = {v₁, v₂, v₃} so that
$$[A]_B = PAP^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
is the Jordan canonical form (see (3) in (3.7.50)).
(2) In B, A can be expressed as a product of shearing matrices, i.e.
$$[A]_B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$
Note that shearing here means skew shearing (refer to Ex. <A> 17 of Sec. 2.8.2 and Ex. <B> 1).
(c) Try to find the image of the cube with vertices at (±1, ±1, ±1) under x ↦ xA by the following methods.
(1) Direct computation.
(2) The factorizations in (b) and (a), respectively.
25. Let
$$A = \begin{pmatrix} -1 & 1 & 0 \\ -4 & 3 & 0 \\ 1 & 0 & 2 \end{pmatrix}.$$
(a) Do as (a) of Ex. 24.
(b) Show that A has eigenvalues 2 and 1, 1, but A is not diagonalizable.
(1) Find a basis B = {v₁, v₂, v₃} so that
$$[A]_B = P^{-1}AP = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
is the Jordan canonical form.
(2) In B, show that A can be expressed as a product of a one-way stretch and a shearing, i.e.
$$[A]_B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$
(c) Do as (c) of Ex. 24.

<B>

1. Try to define skew shearing in R³ (refer to Ex. <A> 17 of Sec. 2.8.2).
2. Model after (1) and (2) in (3.8.26) and try to define the orthogonal reflection of the space R³ with respect to a straight line. Then, show that the orthogonal reflection of R³ with respect to the line
$$\frac{x_1}{a_{11}} = \frac{x_2}{a_{12}} = \frac{x_3}{a_{13}}$$
is given by the matrix
$$R = \frac{2}{a_{11}^2 + a_{12}^2 + a_{13}^2}\begin{pmatrix} \frac{a_{11}^2 - a_{12}^2 - a_{13}^2}{2} & a_{11}a_{12} & a_{11}a_{13} \\ a_{11}a_{12} & \frac{-a_{11}^2 + a_{12}^2 - a_{13}^2}{2} & a_{12}a_{13} \\ a_{11}a_{13} & a_{12}a_{13} & \frac{-a_{11}^2 - a_{12}^2 + a_{13}^2}{2} \end{pmatrix}.$$
3. Show that the orthogonal reflection of R³ with respect to the plane
a₁₁x₁ + a₁₂x₂ + a₁₃x₃ = 0
is given by −R, where R is as in Ex. 2. Explain this geometrically.
4. Try to define
(1) translation,
(2) reflection,
(3) (one-way, two-way, three-way) stretch,
(4) shearing,
(5) rotation, and
(6) orthogonal reflection
on R⁴, and then try to explore their respective features.

<C> Abstraction and generalization

Read Ex. <C> of Sec. 2.8.2.
Try to do Ex. <B> 4 on Fⁿ over a field F, in particular on Rⁿ endowed with the natural inner product.

3.8.3 Affine invariants


We come back to the proof of (3.7.15) in the context of an affine transformation T(x) = x₀ + xA on R³. Recall that A = [aᵢⱼ]₃ₓ₃ is an invertible real matrix.
Only those parts different from the proofs of (2.7.9) (see Sec. 2.8.3) need to be touched here.

2. T preserves planes.
Given a fixed plane (see (1) in (3.5.3))
Σ: x = a + t₁b₁ + t₂b₂, t₁, t₂ ∈ R,
where b₁ and b₂ are linearly independent vectors in R³. To find its image T(Σ) under T, notice that
$$\Sigma:\ x = a + (t_1, t_2)\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$
$$\Rightarrow\ T(x) = x_0 + \left[a + (t_1, t_2)\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right]A = x_0 + aA + (t_1, t_2)\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}A = y_0 + t_1c_1 + t_2c_2, \tag{*}$$
where c₁ = b₁A, c₂ = b₂A, y₀ = T(a) = x₀ + aA, and c₁ and c₂ are linearly independent vectors in R³. The latter is easily seen by showing that
$$(t_1, t_2)\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = (t_1, t_2)\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}A = 0 \quad\text{for some scalars } t_1, t_2$$
⇔ (since A is invertible) (t₁, t₂)(b₁; b₂) = 0
⇔ (since b₁ and b₂ are linearly independent) t₁ = t₂ = 0.
4. T preserves the relative positions of lines and planes.
Given two lines
L₁: x = a₁ + tb₁,
L₂: x = a₂ + tb₂, t ∈ R.
They might be coincident, parallel, intersecting or skew to each other (see (3.4.5)). We only prove the skew case, as follows.
L₁ and L₂ are skew to each other, i.e. b₁, b₂ and a₂ − a₁ are linearly independent
⇔ (since A is invertible) b₁A, b₂A and (a₂ − a₁)A = (x₀ + a₂A) − (x₀ + a₁A) are linearly independent
⇔ (see (*)) the image lines
T(L₁): x = (x₀ + a₁A) + tb₁A,
T(L₂): x = (x₀ + a₂A) + tb₂A, t ∈ R
are skew to each other.
This finishes the proof.
Given a line L and a plane Σ, they might be coincident, parallel or intersecting at a point (see (3.5.5)). Obviously T will preserve these relative positions. Details are left to the readers. The same is true for the relative positions of two planes (see (3.5.6)).
3. T preserves tetrahedra and parallelepipeds.
This is an easy consequence of 4 and 2.
7. T preserves the ratio of solid volumes.

Let a₀, a₁, a₂ and a₃ be four different points in space R³. The set
$$a_0a_1a_2a_3 = \left\{a_0 + \sum_{i=1}^{3}\lambda_i(a_i - a_0)\ \middle|\ 0 \le \lambda_i \le 1 \text{ for } i = 1, 2, 3\right\} \tag{3.8.28}$$
is called a parallelepiped with vertex at a₀ and side vectors a₁ − a₀, a₂ − a₀ and a₃ − a₀, and is called a degenerate one if the side vectors are linearly dependent. See Fig. 3.73 (see also Fig. 3.2). While the tetrahedron (see Secs. 3.5 and 3.6)
$$\Delta a_0a_1a_2a_3 = \left\{\sum_{i=0}^{3}\lambda_i a_i\ \middle|\ 0 \le \lambda_i \le 1 \text{ for } 0 \le i \le 3 \text{ and } \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 = 1\right\} \tag{3.8.29}$$
is contained in the parallelepiped a₀a₁a₂a₃ and has volume one-sixth of the latter. See Fig. 3.73 (also see Fig. 3.17).

[Fig. 3.73: T maps the tetrahedron a₀a₁a₂a₃ to T(a₀)T(a₁)T(a₂)T(a₃); the orientation is preserved if det A > 0 and reversed if det A < 0.]

Thus (see Sec. 5.3),
the signed volume of ∆T(a₀)T(a₁)T(a₂)T(a₃)
$$= \det\begin{pmatrix} T(a_1) - T(a_0) \\ T(a_2) - T(a_0) \\ T(a_3) - T(a_0) \end{pmatrix} = \det\begin{pmatrix} (a_1 - a_0)A \\ (a_2 - a_0)A \\ (a_3 - a_0)A \end{pmatrix} = \det\begin{pmatrix} a_1 - a_0 \\ a_2 - a_0 \\ a_3 - a_0 \end{pmatrix}\det A$$
= (the signed volume of ∆a₀a₁a₂a₃) · det A.   (3.8.30)
From this it follows immediately (refer to (2.8.44))
The geometric interpretation of the determinant
of a linear operator or a square matrix

Let f(x) = xA: R³ → R³ be a linear operator and T(x) = x₀ + xA the associated affine transformation (A is allowed to be non-invertible here, see (2.8.20)) in N = {0, e₁, e₂, e₃}, where A = [aᵢⱼ]₃ₓ₃ is a real matrix. Then the determinant
$$\det f = \det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$
= the signed volume of the parallelepiped 0a₁a₂a₃, where aᵢ = f(eᵢ) = (aᵢ₁, aᵢ₂, aᵢ₃), the ith row of A, for i = 1, 2, 3,
= (the signed volume of 0a₁a₂a₃)/(the volume of the unit cube 0e₁e₂e₃).
Therefore, for any such affine transformation T(x) = x₀ + xA:
1. (the signed volume of the image domain T(Ω))/(the signed volume of a measurable space domain Ω) = det A; in particular, this holds if Ω is a tetrahedron or a parallelepiped.
2. T preserves the orientation ⇔ det A > 0, and reverses the orientation ⇔ det A < 0.
3. The image tetrahedron or parallelepiped is degenerate ⇔ det A = 0.
(3.8.31)

We have encountered many examples concerning (3.8.31) in Secs. 3.8.1 and 3.8.2. No further examples will be presented here.
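Nevertheless, a brief NumPy experiment (our addition; the library is an assumption) makes (3.8.31) concrete: the ratio computed below equals det A for any nondegenerate tetrahedron.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(3, 3)).astype(float)   # any real 3 x 3 matrix

    pts = rng.random((4, 3))            # a random tetrahedron a0 a1 a2 a3
    E = pts[1:] - pts[0]                # edge vectors from a0 (almost surely independent)
    print(np.linalg.det(E @ A) / np.linalg.det(E), np.linalg.det(A))   # equal up to rounding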

Exercises
<A>
1. Prove (3.7.15) in detail by using the following two methods.
(1) Complete the proof shown in the text.
(2) Use matrix factorizations of A; see Secs. 3.7.5 and 3.8.2.
Then give concrete numerical examples to show that a–d are not necessarily true in general.

<B>
Try to formulate a result for R4 similar to (3.7.15) and prove it.
<C> Abstraction and generalization
Read Ex. <C> of Sec. 2.8.3.

3.8.4 Affine geometry


Readers should go back to Sec. 2.8.4 for fundamental concepts about
(planar) affine geometry.
According to F. Klein, the objects we study in spatial affine geometry are
the properties that are invariant under the affine transformations, i.e. under
the group Ga (3; R) of affine transformations (see Sec. 3.8.1). Section 3.8.3
indicates that affine geometry deals with the following topics:

1. Barycenters (see Sec. 3.6) such as centroid or median points.


2. Parallelism.
3. Ratios of each pair of parallel segments (see Sec. 2.8).
4. Collinearity of affine subspaces (i.e. Menelaus theorem).
5. Concurrence of affine subspaces (i.e. Ceva theorem).

But it does not deal with such objects as lengths, angles, areas and volumes
which are in the realm of spatial Euclidean geometry (refer to Part 2).
In what follows we will introduce a sketch of affine geometry for R3 to
the content and by the method that are universally true for more general
affine space over a field or ordered field.

3.8.4.1 Affine independence and dependence (Sec. 3.6 revisited)


A set of points {a0, a1, . . . , ak} in R3 (or any affine space V ) is said to
be affinely independent if the vectors

  a1 − a0, a2 − a0, . . . , ak − a0                               (3.8.32)

are linearly independent in the vector space R3 (or V ), and affinely
dependent otherwise. See Ex. <A> 1.
For R3 , an affinely independent set B = {a0, a1, a2, a3} is called an
affine basis or affine frame, with the point a0 as the base point or origin
and ai − a0 as the ith unit vector or coordinate vector.
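Numerically, affine independence of {a0, . . . , ak} is just the rank of the
matrix whose rows are ai − a0. A minimal sketch, under the assumption
that NumPy is on hand:

    import numpy as np

    def affinely_independent(points, tol=1e-10):
        """True iff {a0, ..., ak} is affinely independent, per (3.8.32)."""
        p = np.asarray(points, dtype=float)
        diffs = p[1:] - p[0]               # rows a_i - a_0
        return np.linalg.matrix_rank(diffs, tol=tol) == len(diffs)

    print(affinely_independent([[0,0,0], [1,0,0], [0,1,0], [0,0,1]]))  # True
    print(affinely_independent([[0,0,0], [1,0,0], [2,0,0]]))           # False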

3.8.4.2 Affine subspaces


Let S be any vector subspace of R3 (or V ) of dimension k for k = 0, 1, 2, 3
and x0 any point in R3 . The set

  Sᵏ = x0 + S = {x0 + v | v ∈ S}                                  (3.8.33)

is called a k-dimensional affine subspace of R3 . In particular,

  a point:               S⁰ = {x0},
  a line:                S¹,
  a plane or hyperplane: S²,
  the space:             S³ = R3 .

For any basis {v1, . . . , vk} for S, the set {x0, x0 + v1, . . . , x0 + vk} is
an affine basis for Sᵏ and can be extended to an affine basis for R3 .

3.8.4.3 Affine coordinates and barycentric coordinates (Sec. 3.6 revisited)

Let B = {a0, a1, . . . , ak} be an affine basis for Sᵏ. Then

  x ∈ Sᵏ
  ⇔ there exist unique scalars x1, x2, . . . , xk ∈ R so that

      x = a0 + Σᵢ₌₁ᵏ xi(ai − a0).                                 (3.8.34)

The associated terminologies are as follows:

1. The affine coordinate (vector) of x with respect to B:
   [x]B = (x1, x2, . . . , xk).
2. The ith coordinate of x in B: xi for 1 ≤ i ≤ k.
3. The ith unit or coordinate point: ai for 1 ≤ i ≤ k.
4. The ith coordinate axis: a0 + ⟨⟨ai − a0⟩⟩ for 1 ≤ i ≤ k.
5. The ith coordinate hyperplane: a0 + ⟨⟨a1 − a0, . . . , ai−1 − a0,
   ai+1 − a0, . . . , ak − a0⟩⟩ for 1 ≤ i ≤ k.
6. For any (k + 1) points pi in Sᵏ, 0 ≤ i ≤ k, let [pi]B =
   (αi1, αi2, . . . , αik) be the coordinate of pi in B. Then the quantity

                               1     ⎡[p1 − p0]B⎤
     V(p0, p1, . . . , pk)  =  ── det⎢    ⋮     ⎥
                               k!    ⎣[pk − p0]B⎦ (k×k)

                               1     ⎡α01 α02 · · · α0k 1⎤
                            =  ── det⎢α11 α12 · · · α1k 1⎥        (3.8.35)
                               k!    ⎢ ⋮   ⋮         ⋮  ⋮⎥
                                     ⎣αk1 αk2 · · · αkk 1⎦

   is called the signed volume with respect to B of the point set
   {p0, p1, . . . , pk}, or of the k-simplex ∆p0p1 · · · pk if p0, p1, . . . , pk
   are affinely independent (see (3.8.38) and (3.8.30)).
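Both determinant forms of (3.8.35) are easy to evaluate numerically. A
sketch assuming NumPy (the triangle is illustrative only):

    import numpy as np
    from math import factorial

    def signed_volume_diff(p):
        # first form of (3.8.35): (1/k!) det of the rows p_i - p_0
        p = np.asarray(p, dtype=float)
        return np.linalg.det(p[1:] - p[0]) / factorial(len(p) - 1)

    def signed_volume_bordered(p):
        # second form: (1/k!) det of the rows (alpha_i1 ... alpha_ik, 1)
        p = np.asarray(p, dtype=float)
        bordered = np.hstack([p, np.ones((len(p), 1))])
        return np.linalg.det(bordered) / factorial(len(p) - 1)

    triangle = [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]   # k = 2 in R^2
    print(signed_volume_diff(triangle))       # 3.0
    print(signed_volume_bordered(triangle))   # 3.0 (for odd k this layout
                                              # of the column of 1s flips the
                                              # sign, so compare with care)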
Refer to Fig. 3.73 and Fig. 3.22.
Suppose a0, a1, . . . , ak are k + 1 points, not necessarily affinely
independent, in R3 . Then a point x ∈ R3 can be expressed as

  x = Σᵢ₌₀ᵏ λi ai, where λ0 + λ1 + · · · + λk = 1

  ⇔ for any fixed point p0 in R3 , the vector x − p0 can be expressed as

      x − p0 = Σᵢ₌₀ᵏ λi(ai − p0).                                 (3.8.36)

The ordered scalars λ0, λ1, . . . , λk with Σᵢ₌₀ᵏ λi = 1 are then called a
barycentric coordinate of x, in the affine subspace a0 + ⟨⟨a1 − a0, . . . ,
ak − a0⟩⟩, with respect to {a0, a1, . . . , ak}. In particular, if B =
{a0, a1, . . . , ak} is affinely independent, such λ0, λ1, . . . , λk are
uniquely determined as long as x lies in the subspace, and hence

  (x)B = (λ0, λ1, . . . , λk) with Σᵢ₌₀ᵏ λi = 1                   (3.8.37)

is called the barycentric coordinate of x with respect to B; x is a
barycenter of B with weights λ0, λ1, . . . , λk.
In case {a0, a1, . . . , ak} is affinely independent in R3 , the set

  ∆a0a1 · · · ak = { Σᵢ₌₀ᵏ λi ai | λi ≥ 0 for 0 ≤ i ≤ k and Σᵢ₌₀ᵏ λi = 1 }
                                                                  (3.8.38)

is called the k-simplex (see (3.8.53) below) with

  vertices: ai for 0 ≤ i ≤ k,
  edges: ai aj for i ≠ j and 0 ≤ i, j ≤ k,
  2-dimensional faces: ∆a0a1a2, ∆a0a1a3, ∆a0a2a3 and ∆a1a2a3 if k = 3.

In particular,

  ∆a0: the point {a0},
  ∆a0a1: the line segment a0a1,
  ∆a0a1a2: the triangle (see Fig. 2.26),
  ∆a0a1a2a3: the tetrahedron (see Figs. 3.17 and 3.74).

Note that a tetrahedron has 4 vertices, 6 edges and 4 faces, which satisfy
the Euler formula V − E + F = 2 for polyhedra (see Ex. <B> 8), and that
its barycenter has barycentric coordinate (1/4, 1/4, 1/4, 1/4).
[Fig. 3.74: the tetrahedron ∆a0a1a2a3.]
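Computing the barycentric coordinate (3.8.37) amounts to solving a small
linear system: Σ λi ai = x together with Σ λi = 1. A sketch assuming
NumPy; the tetrahedron is illustrative only:

    import numpy as np

    def barycentric(x, verts):
        """Barycentric coordinate of x w.r.t. affinely independent verts,
        per (3.8.36)/(3.8.37)."""
        A = np.vstack([np.asarray(verts, float).T,   # columns are the a_i
                       np.ones(len(verts))])         # row enforcing sum = 1
        b = np.append(np.asarray(x, float), 1.0)
        return np.linalg.lstsq(A, b, rcond=None)[0]

    tet = [[0,0,0], [1,0,0], [0,1,0], [0,0,1]]
    print(barycentric([0.25, 0.25, 0.25], tet))   # [0.25 0.25 0.25 0.25]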


Let B = {a0, a1, . . . , ak} be affinely independent in R3 and let x, b0,
b1, . . . , br be points in the subspace a0 + ⟨⟨a1 − a0, . . . , ak − a0⟩⟩ so
that C = {b0, b1, . . . , br} is affinely independent. Denote

  [x]B = (x1, . . . , xk),  (x)C = (λ0, λ1, . . . , λr);
  [bi]B = (αi1, . . . , αik) for 0 ≤ i ≤ r.

Then,

  x ∈ b0 + ⟨⟨b1 − b0, . . . , br − b0⟩⟩

                                    ⎡α01 α02 · · · α0k⎤
  ⇔ (x1 · · · xk) = (λ0 λ1 · · · λr)⎢α11 α12 · · · α1k⎥
                                    ⎢ ⋮   ⋮        ⋮ ⎥
                                    ⎣αr1 αr2 · · · αrk⎦(r+1)×k

or, in short,

                ⎡[b0]B⎤
  [x]B = (x)C ⎢[b1]B⎥           .                                (3.8.39)
                ⎢  ⋮  ⎥
                ⎣[br]B⎦(r+1)×k

3.8.4.4 Operations of affine subspaces

Let Sʳ and Sᵏ be two affine subspaces of R3 (or V ), say

  Sʳ = y0 + S1 and Sᵏ = x0 + S2

for some vector subspaces S1 and S2 of R3 (or V ) of respective dimensions
r and k.
The intersection subspace of Sʳ and Sᵏ is defined to be the intersection
of Sʳ and Sᵏ as subsets of R3 and is denoted by

  Sʳ ∩ Sᵏ.                                                        (3.8.40)

It can be shown easily that

1. Sʳ ∩ Sᵏ ≠ ∅ ⇔ x0 − y0 ∈ S1 + S2;
2. in case Sʳ ∩ Sᵏ ≠ ∅, one may choose x0 = y0 ∈ Sʳ ∩ Sᵏ and

     Sʳ ∩ Sᵏ = x0 + (S1 ∩ S2).                                    (3.8.41)

The sum space of Sʳ and Sᵏ is defined as, and denoted by,

  Sʳ + Sᵏ = ∩{W | W is an affine subspace of R3 containing both Sʳ and Sᵏ}.
                                                                  (3.8.42)

It can be shown that

            ⎧ x0 + (S1 + S2),               if Sʳ ∩ Sᵏ ≠ ∅ and x0 ∈ Sʳ ∩ Sᵏ,
  Sʳ + Sᵏ = ⎨
            ⎩ x0 + ⟨⟨y0 − x0⟩⟩ ⊕ (S1 + S2), if Sʳ ∩ Sᵏ = ∅.       (3.8.43)

The details are left as Ex. <A> 4.

3.8.4.5 Dimension (or intersection) theorem

Let Sʳ = y0 + S1 and Sᵏ = x0 + S2 be affine subspaces of R3 (or of any
finite-dimensional affine space V ).

(a) If Sʳ ∩ Sᵏ ≠ ∅, then

      dim Sʳ + dim Sᵏ = r + k = dim(Sʳ ∩ Sᵏ) + dim(Sʳ + Sᵏ).

(b) If Sʳ ∩ Sᵏ = ∅, define dim ∅ = −1. Then

      dim(Sʳ ∩ Sᵏ) + dim(Sʳ + Sᵏ) = dim Sʳ + dim Sᵏ − dim(S1 ∩ S2).

    In particular,

    1. in case S1 ∩ S2 = {0},
         dim(Sʳ + Sᵏ) = dim Sʳ + dim Sᵏ + 1;
    2. in case S1 ∩ S2 ⊋ {0},
         dim(Sʳ + Sᵏ) < dim Sʳ + dim Sᵏ + 1.                      (3.8.44)

The details are left as Ex. <A> 5.

3.8.4.6 Relative positions of affine subspaces

Suppose Sʳ = y0 + S1 and Sᵏ = x0 + S2 are two affine subspaces of R3
(or V ). For their relative positions, we consider the following four cases.

Case 1 If Sʳ ⊆ Sᵏ (or Sʳ ⊇ Sᵏ) happens, Sʳ is said to be coincident with
a subspace, namely Sʳ itself, of Sᵏ.

Case 2 If Sʳ ∩ Sᵏ = ∅ and S1 ∩ S2 = {0} happen, Sʳ and Sᵏ are said to
be skew to each other. Then, by (3.8.41) and (3.8.44),

  Sʳ and Sᵏ are skew to each other
  ⇔ y0 − x0 ∉ S1 ⊕ S2 (and hence Sʳ + Sᵏ = x0 + ⟨⟨y0 − x0⟩⟩ ⊕ (S1 ⊕ S2))
  ⇔ dim(Sʳ + Sᵏ) = dim Sʳ + dim Sᵏ + 1.                           (3.8.45)

The details are left as Ex. <A> 6.

Case 3 If Sʳ ∩ Sᵏ = ∅ but either S1 ⊇ S2 or S1 ⊆ S2 happens, then Sʳ
and Sᵏ are said to be parallel to each other, denoted as

  Sʳ ∥ Sᵏ.                                                        (3.8.46)

In case S1 ⊆ S2, the translation x → x0 − y0 + x will transform Sʳ =
y0 + S1 onto an affine subspace x0 + S1 of Sᵏ = x0 + S2.

Case 4 If Sʳ ∩ Sᵏ = ∅, S1 ∩ S2 ⊋ {0}, S1 ⊈ S2 and S2 ⊈ S1 happen, then
Sʳ and Sᵏ are neither coincident, skew nor parallel. In this case Sʳ
contains a subspace S1ᵖ = y0 + S1 ∩ S2, where p = dim(S1 ∩ S2), and Sᵏ
contains a subspace S2ᵖ = x0 + S1 ∩ S2, so that

  S1ᵖ ∥ S2ᵖ.

Does this happen in R3 ?

We can use the concepts introduced here to recapture (2.5.9), (3.4.5),
(3.5.5) and (3.5.6).
Take (3.4.5) and (3.5.6) as examples.

Example 1 Determine the relative positions of two lines S1¹ = x0 + S1
and S2¹ = y0 + S2 in R3 (see Fig. 3.13).

Solution Recall that both S1 and S2 are one-dimensional vector subspaces
of R3 and y0 − x0 is a vector in R3 .
In case S1 = S2: then x0 − y0 ∈ S1 implies that S1¹ is coincident with
S2¹, while x0 − y0 ∉ S1 results in the parallelism of S1¹ with S2¹.
In case S1 ∩ S2 = {0}: then the condition x0 − y0 ∈ S1 ⊕ S2 implies that
S1¹ intersects S2¹ at a point, while x0 − y0 ∉ S1 ⊕ S2 implies that S1¹
is skew to S2¹.                                                        □

Example 2 Determine the relative positions of two planes S1² = x0 + S1
and S2² = y0 + S2 in R3 (see Fig. 3.16).

Solution Both S1 and S2 are two-dimensional vector subspaces of R3 .
In case S1 = S2: if x0 − y0 ∈ S1, then S1² = S2² is coincident; if
x0 − y0 ∉ S1, then S1² ∥ S2².
In case S1 ∩ S2 is one-dimensional: no matter whether x0 − y0 ∈ S1 ∩ S2
or not, S1² intersects S2² along a line, namely x0 + S1 ∩ S2 if y0 is
chosen to be equal to x0 .
By the dimension theorem for vector spaces, dim S1 + dim S2 =
dim(S1 ∩ S2) + dim(S1 + S2) (see Ex. <A> 11 of Sec. 3.2 or Sec. B.2).
Since dim(S1 + S2) ≤ 3, we have dim(S1 ∩ S2) ≥ 1. Hence S1 ∩ S2 = {0}
never happens in R3 , and S1² and S2² can never be skew to each other.  □
As a consequence of Examples 1 and 2, in a tetrahedron ∆a0a1a2a3
(see Fig. 3.74), we have the following information:

1. Vertices are skew to each other.
2. Vertex a3 is skew to the face ∆a0a1a2, etc.
3. The edge a0a1 is skew to the edge a2a3 but intersects the face
   ∆a1a2a3, etc.
4. Any two faces intersect along their common edge.

Do you know what happens for ∆a0a1a2a3a4 in R4 ?
To cultivate our geometric intuition, we give one more example.

Example 3 Determine the relative positions of

(a) two straight lines in R4 ,
(b) two (two-dimensional) planes in R4 , and
(c) one (two-dimensional) plane and one (three-dimensional) hyperplane
    in R4 .

Solution
(a) The answer is like Example 1. Why? Prove it.
(b) Let S1² = x0 + S1 and S2² = y0 + S2 be two planes in R4 .
    Besides the three relative positions known from Example 2, namely
    coincident, parallel and intersecting along a line, two further
    possibilities exist in R4 : intersecting at a point, and Case 4 — but
    never skew to each other.
    In case dim(S1 ∩ S2) = 1: then

      dim S1 = dim S2 = 2
      ⇒ dim(S1 + S2) = 2 + 2 − 1 = 3.

    Hence it is possible to choose x0, y0 ∈ R4 so that x0 − y0 ∉ S1 + S2.
    According to (3.8.41), S1² ∩ S2² = ∅ holds. So Case 4 does happen.
    In case S1 ∩ S2 = {0}: then

      dim S1 = dim S2 = 2
      ⇒ dim(S1 + S2) = dim(S1 ⊕ S2) = 2 + 2 − 0 = 4,

    namely, S1 ⊕ S2 = R4 holds. No matter how x0 and y0 are chosen,
    x0 − y0 ∈ R4 is always true. Therefore S1² and S2² intersect at a
    point. Since x0 − y0 ∉ R4 cannot happen in this case, S1² and S2²
    are never skew to each other.
(c) Let S² = x0 + S1 and S³ = y0 + S2, where dim S1 = 2, dim S2 = 3.
    Since

      dim(S1 ∩ S2) = dim S1 + dim S2 − dim(S1 + S2)
                   = 2 + 3 − dim(S1 + S2) = 5 − dim(S1 + S2)

    and 3 ≤ dim(S1 + S2) ≤ 4, dim(S1 ∩ S2) can only be 1 or 2.
    In case dim(S1 ∩ S2) = 1: then dim(S1 + S2) = 4. For any two points
    x0, y0 in R4 , x0 − y0 ∈ S1 + S2 = R4 always holds. Then S² and S³
    intersect along the line x0 + S1 ∩ S2 if y0 = x0 is chosen.
    In case dim(S1 ∩ S2) = 2: then S1 ⊆ S2 and S1 + S2 = S2 holds. For
    any two points x0, y0 ∈ R4 , either

      x0 − y0 ∈ S1 + S2 ⇒ S² and S³ are coincident in the sense S² ⊆ S³,

    or

      x0 − y0 ∉ S1 + S2 = S2 ⇒ S² ∥ S³, parallel to each other.
Remark In R4 , S1¹ and S2² can be only coincident, parallel or
intersecting at a point.
How about S¹ = x0 + S1 and S³ = y0 + S2? Since

  dim(S1 ∩ S2) = 1 + 3 − dim(S1 + S2) = 4 − dim(S1 + S2),

and 3 ≤ dim(S1 + S2) ≤ 4, dim(S1 ∩ S2) can only be 1 or 0.

1. dim(S1 ∩ S2) = 1: then S1 ⊆ S2 and S1 + S2 = S2 hold. For x0, y0
   in R4 , either

     x0 − y0 ∈ S1 + S2 ⇒ S¹ and S³ are coincident in the sense S¹ ⊆ S³,

   or

     x0 − y0 ∉ S1 + S2 ⇒ S¹ ∥ S³.

2. dim(S1 ∩ S2) = 0: then S1 + S2 = S1 ⊕ S2 = R4 . No matter how x0
   and y0 are chosen in R4 , x0 − y0 ∈ S1 + S2 always holds. Then S¹
   and S³ will intersect at one point.

See Exs. <A> 7 and 8 for more practice.
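The case analysis of Cases 1–4 mechanizes nicely in terms of ranks. The
following rough sketch (assuming NumPy; it is not the book's own
procedure) classifies two affine subspaces given by a base point and
rows spanning the direction space:

    import numpy as np

    def relative_position(x0, S1, y0, S2, tol=1e-10):
        # Classify x0 + span(S1) vs y0 + span(S2), per Sec. 3.8.4.6.
        S1 = np.atleast_2d(np.asarray(S1, float))
        S2 = np.atleast_2d(np.asarray(S2, float))
        d = np.asarray(y0, float) - np.asarray(x0, float)
        r1 = np.linalg.matrix_rank(S1, tol=tol)
        r2 = np.linalg.matrix_rank(S2, tol=tol)
        r_sum = np.linalg.matrix_rank(np.vstack([S1, S2]), tol=tol)
        r_cap = r1 + r2 - r_sum                      # dim(S1 cap S2)
        meet = np.linalg.matrix_rank(np.vstack([S1, S2, d]), tol=tol) == r_sum
        if meet and r_cap == min(r1, r2):
            return "coincident (one contains the other)"   # Case 1
        if not meet and r_cap == min(r1, r2):
            return "parallel"                              # Case 3
        if not meet and r_cap == 0:
            return "skew"                                  # Case 2
        return "intersecting" if meet else "Case 4"

    print(relative_position([0,0,0], [[1,0,0]], [0,0,1], [[0,1,0]]))  # skew
    print(relative_position([0,0,0], [[1,0,0]], [0,1,0], [[0,1,0]]))  # intersecting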

3.8.4.7 Menelaus and Ceva theorems

As a generalization of (2.8.45), we have

Menelaus Theorem Suppose {a0, a1, . . . , ak} is affinely independent in
R3 (or in V with dim V < ∞), and put ak+1 = a0 . Let bi be an arbitrary
point on the line segment ai ai+1 other than the end points ai and ai+1,
for 0 ≤ i ≤ k. Let αi ∈ R (or a field F) be such that, for the vectors
aibi = bi − ai and biai+1 = ai+1 − bi,

  αi aibi = biai+1 for 0 ≤ i ≤ k.

Then,

  b0, b1, b2, . . . , bk are affinely dependent
  ⇔ α0α1α2 · · · αk = (−1)ᵏ⁺¹.                                    (3.8.47)

Note that, for k = 2, this is the Menelaus theorem in (2.8.45). For k = 3
in R3 , see Fig. 3.75.
[Fig. 3.75: the points b0, b1, b2, b3 on the edges of the tetrahedron
a0a1a2a3; in panels (a) and (b) the bi are affinely dependent, while in
panel (c) b3 = ∞.]

Sketch of proof Suppose k = 3.
By assumption, α0α1α2α3 ≠ 0 and

  b0 = (α0/(1 + α0)) a0 + (1/(1 + α0)) a1,
  b1 = (α1/(1 + α1)) a1 + (1/(1 + α1)) a2,
  b2 = (α2/(1 + α2)) a2 + (1/(1 + α2)) a3,                          (*1)
  b3 = (α3/(1 + α3)) a3 + (1/(1 + α3)) a0.

The necessity In case Fig. 3.75(a) or (b) happens, by (3.8.36) there
exist scalars λ0, λ1, λ2 so that

  b3 = λ0 b0 + λ1 b1 + λ2 b2, where λ0 + λ1 + λ2 = 1.

By (*1), this implies that

  (λ0α0/(1 + α0)) a0 + (λ0/(1 + α0) + λ1α1/(1 + α1)) a1
    + (λ1/(1 + α1) + λ2α2/(1 + α2)) a2 + (λ2/(1 + α2)) a3
  = (α3/(1 + α3)) a3 + (1/(1 + α3)) a0

  ⇒ λ0α0/(1 + α0) = 1/(1 + α3),  λ2/(1 + α2) = α3/(1 + α3),
    λ0/(1 + α0) + λ1α1/(1 + α1) = 0 and λ1/(1 + α1) + λ2α2/(1 + α2) = 0

  ⇒ λ0 = (1 + α0)/(α0(1 + α3)),  λ2 = α3(1 + α2)/(1 + α3) and
    λ1 = −(1 + α1)/(α0α1(1 + α3))

  ⇒ (since λ0 + λ1 + λ2 = 1) α0α1α2α3 = 1.
In case Fig. 3.75(c) happens, there exist scalars t0, t1, t2, not all
zero, so that

  a3 = a0 + t0 b0 + t1 b1 + t2 b2, where t0 + t1 + t2 = 0,

i.e. the vector a3 − a0 is parallel to the subspace b0 + ⟨⟨b1 − b0, b2 − b0⟩⟩.
Substituting (*1) into the right-hand side of this relation, we get

  a3 = (1 + α0t0/(1 + α0)) a0 + (t0/(1 + α0) + t1α1/(1 + α1)) a1
       + (t1/(1 + α1) + t2α2/(1 + α2)) a2 + (t2/(1 + α2)) a3

  ⇒ 1 + α0t0/(1 + α0) = 0,  t0/(1 + α0) + t1α1/(1 + α1) = 0,
    t1/(1 + α1) + t2α2/(1 + α2) = 0 and t2/(1 + α2) = 1

  ⇒ (since t0 + t1 + t2 = 0) α0α1α2 = −1.

By imagination, the extended line a0a3 will intersect the subspace
b0 + ⟨⟨b1 − b0, b2 − b0⟩⟩ at a point b3 = ∞ at infinity (see Sec. 3.8.4.10
below), and in this situation α3 should be considered as −1. Therefore,

  α0α1α2α3 = 1

still holds.
The sufficiency We are going to prove that

  b0, b1, b2 and b3 are affinely dependent
  ⇔ there exist scalars t0, t1, t2, t3, not all zero, so that

     t0 b0 + t1 b1 + t2 b2 + t3 b3 = 0, where t0 + t1 + t2 + t3 = 0. (*2)

Now, (*1) and (*2) imply that

  (α0t0/(1 + α0) + t3/(1 + α3)) a0 + (t0/(1 + α0) + t1α1/(1 + α1)) a1
    + (t1/(1 + α1) + t2α2/(1 + α2)) a2 + (t2/(1 + α2) + t3α3/(1 + α3)) a3 = 0.

Since α0α1α2α3 = 1, the points b0, b1, b2 and b3 are all distinct and
t0, t1, t2, t3 can be chosen, all nonzero, so that

  t3/t0 = −(1 + α3)α0/(1 + α0),  t0/t1 = −(1 + α0)α1/(1 + α1),
  t1/t2 = −(1 + α1)α2/(1 + α2)  and  t2/t3 = −(1 + α2)α3/(1 + α3).

In this case,

  t0 b0 + t1 b1 + t2 b2 + t3 b3 = 0

does happen, and

  t0 + t1 + t2 + t3 = Σᵢ₌₀³ (1/(1 + αi) + αi/(1 + αi)) ti  (with α4 = α0)
    = (α0t0/(1 + α0) + t3/(1 + α3)) + (t0/(1 + α0) + t1α1/(1 + α1))
      + (t1/(1 + α1) + t2α2/(1 + α2)) + (t2/(1 + α2) + t3α3/(1 + α3))
    = 0.

In case b3 = ∞, then t3 = 0 if and only if α3 = −1. Then the condition
α0α1α2 = −1 would imply that b0, b1 and b2 are coplanar.               □
As a generalization of (2.8.46), we have

Ceva Theorem Let a0, a1, . . . , ak; b0, b1, . . . , bk and α0, α1, . . . , αk be
as in the Menelaus theorem (3.8.47). Suppose k ≥ 2. Let πi be the affine
subspace

  πi : bi + ⟨⟨ai+2 − bi, . . . , ai−1 − bi⟩⟩ for 0 ≤ i ≤ k,

where a−1 = ak, ak+1 = a1 and ak+2 = a2 . Then

  π0, π1, . . . , πk are concurrent at a point
  ⇔ α0α1 · · · αk = 1.                                            (3.8.48)

Note that the case k = 2 is the Ceva theorem in (2.8.46). See Fig. 3.76.

[Fig. 3.76: the subspaces πi in the tetrahedron a0a1a2a3 meeting at a
common point; in the right panel b3 = ∞.]

Sketch of proof Suppose k = 3. Adopt (*1) in what follows.
π0 and π2 intersect along the line b0b2, while π1 and π3 intersect along
the line b1b3 . Then

  π0, π1, π2 and π3 are concurrent at a point
  ⇔ the lines b0b2 and b1b3 intersect at a point
  ⇔ there exist scalars t0 and t1 such that

     (1 − t0) b0 + t0 b2 = (1 − t1) b1 + t1 b3

  ⇔ ((1 − t0)α0/(1 + α0)) a0 + ((1 − t0)/(1 + α0)) a1
       + (t0α2/(1 + α2)) a2 + (t0/(1 + α2)) a3
     = (t1/(1 + α3)) a0 + ((1 − t1)α1/(1 + α1)) a1
       + ((1 − t1)/(1 + α1)) a2 + (t1α3/(1 + α3)) a3

  ⇔ (1 − t0)α0/(1 + α0) = t1/(1 + α3),
     (1 − t0)/(1 + α0) = (1 − t1)α1/(1 + α1),
     t0α2/(1 + α2) = (1 − t1)/(1 + α1)  and  t0/(1 + α2) = t1α3/(1 + α3)

  ⇔ (1 − t1)/t1 = α2α3(1 + α1)/(1 + α3)  and
     t1/(1 − t1) = α0α1(1 + α3)/(1 + α1)

  ⇔ α0α1α2α3 = 1.                                                      □

This finishes the proof.
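For the classical k = 2 case, both theorems are easy to verify
numerically. A sketch assuming NumPy; the triangle and ratios are made
up for illustration:

    import numpy as np

    def ratio_point(p, q, alpha):
        # b on line pq with alpha*(b - p) = q - b, i.e.
        # b = (alpha*p + q)/(1 + alpha), as in (*1)
        return (alpha * p + q) / (1.0 + alpha)

    a0, a1, a2 = np.array([0., 0.]), np.array([4., 0.]), np.array([1., 3.])

    alphas = (2.0, 1.0, -0.5)          # product = -1 = (-1)^(k+1) for k = 2
    b0 = ratio_point(a0, a1, alphas[0])
    b1 = ratio_point(a1, a2, alphas[1])
    b2 = ratio_point(a2, a0, alphas[2])

    # b0, b1, b2 are collinear exactly when this determinant vanishes:
    print(np.linalg.det(np.array([b1 - b0, b2 - b0])))   # ~0.0
    # Changing the product to +1 (e.g. alphas = (2.0, 1.0, 0.5)) yields
    # instead Ceva's concurrency condition (3.8.48).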

3.8.4.8 Half-space, convex set, simplex and polyhedron

Concepts introduced here are still valid for a finite-dimensional affine
space V over an ordered field F.
Let Sᵏ be a k-dimensional affine subspace of R3 (or V ) and, in turn,
Sᵏ⁻¹ be an affine subspace of Sᵏ. Take an affine basis {a0, a1, . . . , ak−1}
for Sᵏ⁻¹ and extend it to an affine basis B = {a0, a1, . . . , ak−1, ak} for
Sᵏ. For x ∈ Sᵏ, let [x]B = (x1, x2, . . . , xk−1, xk) be the affine coordinate
of x with respect to B. Then

  x ∈ Sᵏ⁻¹ ⇔ the kth coordinate xk = 0,

namely, xk = 0 is the equation of Sᵏ⁻¹ in B. Define

  S+ᵏ⁻¹ = {x ∈ Sᵏ | xk > 0},
  S−ᵏ⁻¹ = {x ∈ Sᵏ | xk < 0},                                      (3.8.49)

called the open half-spaces of Sᵏ divided by Sᵏ⁻¹. Both S+ᵏ⁻¹ ∪ Sᵏ⁻¹ and
S−ᵏ⁻¹ ∪ Sᵏ⁻¹ are called the closed half-spaces. For a point x of Sᵏ that
does not lie on Sᵏ⁻¹, the half-space containing x is called the side of x
with respect to Sᵏ⁻¹. Let p and q be two points on a line S¹. The closed
side of q with respect to p is called the (closed) half line or ray from
p to q, denoted by ray pq. The set

  pq = ray pq ∩ ray qp = {x = (1 − t)p + tq | 0 ≤ t ≤ 1}          (3.8.50)

is called the segment joining p and q. Note that pq = qp. See Fig. 3.77.

[Fig. 3.77: the segment pq on a line S¹, and the open half-spaces into
which a hyperplane divides S² and S³.]

A subset C of R3 (or V ) is called convex if the segment joining any two
points of C is contained in C. Obviously,

1. an open or closed half-space of each dimension is convex, and
2. the intersection C = ∩α Cα is convex if each Cα is convex.

Therefore, for any nonempty subset D of R3 (or V ),

  Con(D) = ∩{C | C is convex in R3 and C ⊇ D}                     (3.8.51)

is a convex set, called the convex hull or convex closure of D. Note that

  Con(D) = { Σᵢ₌₀ᵏ λi xi | xi ∈ D, λi ≥ 0 for 0 ≤ i ≤ k and Σᵢ₌₀ᵏ λi = 1,
             where k ≥ 1 is arbitrary }.

In particular, in case D = {a0, a1, . . . , ak} is a finite subset of R3
(or V ), then

  Con(a0, a1, . . . , ak)                                         (3.8.52)

is called a convex cell, with its dimension the dimension of the affine
subspace a0 + ⟨⟨a1 − a0, . . . , ak − a0⟩⟩.
In case {a0, a1, . . . , ak} is affinely independent,

  Con(a0, a1, . . . , ak) = ∆a0a1 · · · ak                        (3.8.53)

is called a k-simplex, as mentioned in (3.8.38). This k-simplex is
contained in the k-dimensional affine subspace Sᵏ: a0 + ⟨⟨a1 − a0, . . . ,
ak − a0⟩⟩. For each i, 0 ≤ i ≤ k, let

  Si+ᵏ = { Σⱼ₌₀ᵏ λj aj | λi > 0 },
  Si−ᵏ = { Σⱼ₌₀ᵏ λj aj | λi < 0 }

(barycentric coordinates with Σⱼ₌₀ᵏ λj = 1 understood) be the open
half-spaces of Sᵏ divided by the face Siᵏ⁻¹, the affine subspace spanned
by {a0, . . . , ai−1, ai+1, . . . , ak}, on which λi = 0. The corresponding
closed half-spaces are Si+ᵏ ∪ Siᵏ⁻¹ and Si−ᵏ ∪ Siᵏ⁻¹. Hence

  ∆a0a1 · · · ak = ∩ᵢ₌₀ᵏ (Si+ᵏ ∪ Siᵏ⁻¹),  and
  Int ∆a0a1 · · · ak = ∩ᵢ₌₀ᵏ Si+ᵏ,                                (3.8.54)

while the latter is called an open k-simplex. As a consequence, R3 (or V )
can be endowed with a suitable topological structure so that R3 (or V )
becomes a Hausdorff topological space.
A subset of R3 (or V ) is called bounded if it is contained in some
simplex. A polyhedron in R3 (or V ) is a bounded subset obtained via a
finite process of constructing intersections and unions from a finite
number of closed half-spaces of various dimensions. In algebraic topology,
it can be shown that any polyhedron admits a simplicial decomposition.
Hence, a polyhedron can also be defined as the set-theoretic union of a
finite number of simplexes.
A convex polyhedron can be characterized algebraically by several linear
inequalities satisfied by the coordinates of the points contained in it.
For example,
in R2 , the inequalities

  2x1 + x2 ≤ 50,
  x1 + 2x2 ≤ 70,
  x1 ≥ 0,
  x2 ≥ 0

all together represent the shaded point set (a polyhedron) shown in
Fig. 3.78.

[Fig. 3.78: the region in the first quadrant of the x1x2-plane bounded by
the lines 2x1 + x2 = 50 (through (25, 0)) and x1 + 2x2 = 70 (through
(0, 35)).]

A general form of linear programming problem is to find values of
x0, x1, . . . , xk that maximize or minimize the function

  f(x0, x1, . . . , xk) = Σᵢ₌₀ᵏ αi xi

subject to the constraint conditions

  Σⱼ₌₀ᵏ aij xj ≤ or ≥ or = bi for 1 ≤ i ≤ m.
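For instance, a linear objective can be optimized over the polyhedron of
Fig. 3.78 with SciPy's linear-programming routine. A sketch under the
assumption that SciPy is available; the objective 3x1 + 2x2 is made up
for illustration:

    from scipy.optimize import linprog

    # Maximize 3*x1 + 2*x2 over the polyhedron of Fig. 3.78:
    #   2*x1 + x2 <= 50,  x1 + 2*x2 <= 70,  x1 >= 0,  x2 >= 0.
    # linprog minimizes, so negate the objective.
    res = linprog(c=[-3, -2],
                  A_ub=[[2, 1], [1, 2]],
                  b_ub=[50, 70],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # optimum at a vertex: [10. 30.], 90.0

The optimum lands on a vertex of the polyhedron (the intersection of the
two bounding lines), which is the geometric heart of linear programming.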

3.8.4.9 Affine mappings and affine transformations

A mapping T: Rm (or V with dim V < ∞) → Rn (or W with dim W < ∞)
is called an affine mapping if

  T(x) = x0 + f(x),                                               (3.8.55)

where f: Rm (or V ) → Rn (or W ) is a linear transformation. In case
Rm = Rn (or V = W ), it is called an affine transformation, and a regular
or proper one if its linear part f is an invertible linear operator (refer
to the convention mentioned in (2.8.20)).
Fix an affine basis B = {a0, a1, . . . , an} for Rn (n = 1, 2, 3, . . .) so
that the same notation B = {a1 − a0, . . . , an − a0} is a basis for the
vector space Rn . Then, as shown in Sec. 3.8.1,

  T(x) = x0 + f(x) (regular), x ∈ Rn
  ⇔ (in B) [T(x)]B = [x0]B + [x]B [f]B; i.e. if

      [x]B = (x1, . . . , xn), meaning x − a0 = Σᵢ₌₁ⁿ xi(ai − a0),
      [x0]B = (b1, . . . , bn),
                         ⎡[f(a1 − a0)]B⎤
      [f]B = [aij]n×n = ⎢      ⋮      ⎥,  and
                         ⎣[f(an − a0)]B⎦
      [T(x)]B = (y1, . . . , yn),

  then

      yj = bj + Σᵢ₌₁ⁿ aij xi for 1 ≤ j ≤ n.                       (3.8.56)

Refer to (3.8.2) and (3.8.7). In particular, a (regular) affine
transformation represented by

  yj = a xj for 1 ≤ j ≤ n and a ≠ 0

is called a similarity or homothety or enlargement (in particular in Rn ;
see Sec. 3.8.3) with the base point a0 as center.
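In coordinates, (3.8.56) is just a vector–matrix multiply plus a
translation, and the 4 × 4 homogeneous form of (3.8.72) below packs both
into a single matrix. A sketch assuming NumPy; the numbers are made up:

    import numpy as np

    x0 = np.array([1.0, -2.0, 0.5])          # translation part
    A  = np.array([[2.0, 0.0, 0.0],          # linear part (row convention)
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 1.0]])

    def T(x):                                # affine transformation (3.8.55)
        return x0 + x @ A

    # Same map as one 4x4 matrix acting on (x1, x2, x3, 1), cf. (3.8.72):
    A_tilde = np.block([[A, np.zeros((3, 1))],
                        [x0.reshape(1, 3), np.ones((1, 1))]])

    x = np.array([3.0, 4.0, 5.0])
    print(T(x))                               # direct form
    print((np.append(x, 1.0) @ A_tilde)[:3])  # homogeneous form, same result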
Let (see (3.8.4))

  Ga(n; R) = {regular affine transformations on Rn}, and
  Ta(n; R) = {translations on Rn}.                                (3.8.57)

Then Ta(n; R) is a group, called the group of translations of order n, and
is a normal subgroup of Ga(n; R), the group of affine transformations of
order n. Also,

1. Ta(n; R), as a vector group (i.e. the additive group of a vector
   space), acts transitively on Rn .
2. Ta(n; R), as an additive group, is isomorphic to Rn .

Hence, the quotient group

  Ga(n; R)/Ta(n; R) is isomorphic to GL(n; R),

the general linear group over R of order n (see Sec. 3.7.3). Furthermore,
for a fixed point x0 in Rn (or V ), the isotropy group at x0

  Ia(x0) = {T ∈ Ga(n; R) | T(x0) = x0}

is a subgroup of Ga(n; R) and is isomorphic to GL(n; R). See Ex. <A> 5
of Sec. 2.8.1.

3.8.4.10 Projectivization of an affine space

Exs. <B> of Secs. 2.6 and 3.6 might help in understanding the material of
this subsection.
Parallelism between affine subspaces of R3 (or V , dim V < ∞) is an
equivalence relation (see Sec. A.1).
The equivalence class of a one-dimensional subspace S¹ is called a point
at infinity and is denoted by S∞⁰. See Fig. 3.79.

[Fig. 3.79: a family of parallel lines S¹ determining one point at
infinity S∞⁰.]

For a fixed subspace Sᵏ, let

  S∞ᵏ⁻¹ = {S∞⁰ | S¹ is a line contained in Sᵏ}
        = the equivalence class of Sᵏ under parallelism           (3.8.58)

and call it a (k − 1)-dimensional subspace at infinity. Note that, for two
subspaces Sᵏ and Aᵏ, Sᵏ ∥ Aᵏ ⇔ S∞ᵏ⁻¹ = A∞ᵏ⁻¹. In particular, S∞² is
called the hyperplane at infinity. See Fig. 3.80.

[Fig. 3.80: points at infinity S∞⁰, a line at infinity S∞¹ and the
hyperplane at infinity S∞².]

The set-theoretic union

  P³(R) = R3 ∪ S∞²                                                (3.8.59)

is supplied with the structure of a projective space:

1. points: elements of P³(R);
2. lines: S¹ ∪ S∞⁰ where S¹ ⊆ R3 , and S∞¹; and
3. planes: S² ∪ S∞¹ where S² ⊆ R3 , and S∞².

P³(R) can be endowed with a linear structure, introduced as follows.
Let R4 be the four-dimensional vector space over the real field R and fix
the natural basis N = {e1, e2, e3, e4} for simplicity. For nonzero vectors
x, y in R4 , we say that x and y are equivalent if there exists a nonzero
scalar λ so that

  y = λx
  ⇔ ⟨⟨y⟩⟩ = ⟨⟨x⟩⟩ as one-dimensional subspaces of R4 .

Then the set of equivalence classes can be considered as the projective
space, i.e.

  P³(R) = {⟨⟨x⟩⟩ | x ∈ R4 and x ≠ 0}.                             (3.8.60)

In case x = (x1, x2, x3, x4), then (x1, x2, x3, x4) or x is called a
homogeneous coordinate or representative vector of the point ⟨⟨x⟩⟩ with
respect to the basis N and is simply considered as the point ⟨⟨x⟩⟩ itself
on many occasions. Define

  an ideal point or point at infinity: (x1, x2, x3, 0),
  an ordinary or affine point: (x1, x2, x3, x4) with x4 ≠ 0, and
  the affine or inhomogeneous coordinate of an affine point:

    (x1/x4, x2/x4, x3/x4, 1) or (x1/x4, x2/x4, x3/x4).

Therefore, in (3.8.59),

  the affine subspace R3 = {(x1, x2, x3, x4) | x4 ≠ 0},
  the hyperplane at infinity S∞² or π∞: x4 = 0, and
  the line: ⟨⟨x1, x2⟩⟩ = ⟨⟨x1⟩⟩ ⊕ ⟨⟨x2⟩⟩,

where x1 and x2 are linearly independent vectors in R4 .


Call Ñ = {e1, e2, e3, e4; e1 + e2 + e3 + e4} the natural projective basis
for P³(R) in (3.8.60), with e1, . . . , e4 as vertices and e1 + · · · + e4 as
unit. Then the projective coordinate of ⟨⟨x⟩⟩ with respect to Ñ is defined
as

  [x]Ñ = [x]N = (x1, x2, x3, x4) if x = Σᵢ₌₁⁴ xi ei               (3.8.61)

and is equal to a homogeneous coordinate of ⟨⟨x⟩⟩.
A one-to-one mapping F from P³(R) onto itself is called a projective
transformation if, for any σ ≠ 0,

                                        ⎡[F(e1)]N⎤
  [F(x)]Ñ = σ[x]Ñ [F]N,  where  [F]N = ⎢[F(e2)]N⎥ = [aij]4×4 is invertible.
                                        ⎢[F(e3)]N⎥
                                        ⎣[F(e4)]N⎦                (3.8.62)

[F]N, or σ[F]N for any σ ≠ 0, is called a matrix representation of F with
respect to Ñ. (3.8.62) can be expressed as

  yj = σ Σᵢ₌₁⁴ aij xi,  σ ≠ 0, 1 ≤ j ≤ 4,

where [F(x)]Ñ = (y1, y2, y3, y4).


Suppose a projective transformation F on P³(R) leaves the hyperplane at
infinity π∞ invariant. Then

  x4 = 0 if and only if y4 = 0
  ⇔ a14x1 + a24x2 + a34x3 = 0 for all x1, x2, x3 ∈ R
  ⇔ a14 = a24 = a34 = 0.

Taking σ = 1/a44 (a44 ≠ 0, why?), the transformation reduces to

  yj = (1/a44) Σᵢ₌₁⁴ aij xi for 1 ≤ j ≤ 3,
  y4 = x4,

which, in terms of inhomogeneous coordinates (i.e. replacing xi by xi/x4 ,
etc.), can be rewritten as

  yj = a4j/a44 + Σᵢ₌₁³ (aij/a44) xi for 1 ≤ j ≤ 3.

This represents an affine transformation on the affine space R3 . The
reverse process tells us that an affine transformation on R3 induces a
projective transformation on P³(R) leaving π∞ invariant.
If, in addition, F leaves each point at infinity invariant, then aij = 0
for 1 ≤ i, j ≤ 3 and i ≠ j, while aii = 1 for 1 ≤ i ≤ 3. In this case, the
corresponding affine transformation is a translation.
We summarize as follows.
The group Gp(3; R) of projective transformations of order 3 over R
The set

  Gp(3; R)

of all projective transformations on P³(R) constitutes a group.

(1) The set of all projective transformations on P³(R) that leave π∞
    invariant forms a subgroup of Gp(3; R) and is isomorphic to Ga(3; R),
    the group of affine transformations on R3 .
(2) The set of all projective transformations on P³(R) that leave each
    point at infinity invariant forms a subgroup of Gp(3; R) and is
    isomorphic to Ta(3; R), the group of translations on R3
    (see (3.8.57)).                                               (3.8.63)

For a detailed account of the projective line, plane and space, refer to
[6, pp. 1–218].
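The criterion a14 = a24 = a34 = 0 is easy to see in action: a projective
matrix violating it carries an ideal point to an affine point. A sketch
assuming NumPy; the matrix is made up for illustration:

    import numpy as np

    def to_point(h):
        """Inhomogeneous coordinates of an affine point <<h>>, h4 != 0."""
        return h[:3] / h[3]

    # A projective transformation with a14 = 1 != 0: it does NOT leave
    # the hyperplane at infinity x4 = 0 invariant.
    F = np.array([[1.0, 0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])

    ideal = np.array([1.0, 2.0, 3.0, 0.0])   # a point at infinity
    image = ideal @ F                        # row convention, as in (3.8.62)
    print(image)                             # [1. 2. 3. 1.] -- x4 != 0, so
    print(to_point(image))                   # the image is the affine point (1, 2, 3)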

Exercises
<A>

1. Suppose {a0, a1, . . . , ak} is affinely independent in R3 as defined in
   (3.8.32). Let π: {0, 1, . . . , k} → {0, 1, . . . , k} be any permutation.
   Show that {aπ(0), aπ(1), . . . , aπ(k)} is affinely independent.
2. Prove (3.8.36).
3. Prove (3.8.39).
4. Prove (3.8.41) and (3.8.43).

5. Prove (3.8.44).
6. Prove (3.8.45).
7. In R3 , determine the relative positions of
   (1) three lines S1¹, S2¹ and S3¹,
   (2) three planes S1², S2² and S3²,
   (3) two lines S1¹, S2¹ and one plane S², and
   (4) one line S¹ and two planes S1², S2².
8. In R5 , determine the relative positions of
   (1) two lines S1¹, S2¹,
   (2) two planes S1², S2²,
   (3) one line S¹ and one plane S²,
   (4) S¹ and S³; S¹ and S⁴,
   (5) S² and S³; S² and S⁴,
   (6) S1³ and S2³; S³ and S⁴, and
   (7) S1⁴ and S2⁴.
9. Prove (3.8.47) in R5 for k = 4.
10. Prove (3.8.48) in R5 for k = 4.
11. Fix two scalars ai, bi ∈ R (or an ordered field) with ai < bi for
    1 ≤ i ≤ 3. The set

      {(x1, x2, x3) ∈ R3 | ai ≤ xi ≤ bi for 1 ≤ i ≤ 3}

    is generally called a parallelotope (especially in Rn ). Show that a
    parallelotope is a polyhedron. Is it convex?
12. Show that any convex cell is a polyhedron.
13. Try to model after Sec. 3.8.4.10 to define the projective line P¹(R)
    and the projective plane P²(R).
14. Let B̃ = {a1, a2, a3, a4, a5} be a set of vectors in R4 so that any
    four of them are linearly independent. Try to use this B̃ to replace
    Ñ in (3.8.61) to reprove (3.8.62) and (3.8.63).
15. State and prove results for P¹(R) and P²(R) similar to (3.8.63).
16. (cross ratio) Let A1 = ⟨⟨x1⟩⟩, A2 = ⟨⟨x2⟩⟩, A3 = ⟨⟨ε1x1 + ε2x2⟩⟩
    and A4 = ⟨⟨α1x1 + α2x2⟩⟩ be four distinct points in P³(R) (or P¹(R)
    or P²(R)). Note that they are collinear. The cross ratio of
    A1, A2, A3 and A4, in this ordering, is defined and denoted by

      (A1, A2; A3, A4) = (α1/ε1) : (α2/ε2) = ε2α1/(ε1α2).

    Let F: P³(R) → P³(R) be a projective transformation. Denote
    Ai′ = F(Ai) for 1 ≤ i ≤ 4. Show that

      (A1, A2; A3, A4) = (A1′, A2′; A3′, A4′)

    holds. This means that the cross ratio is a projective invariant;
    hence the cross ratio plays one of the essential roles in projective
    geometry. Refer to Ex. <A> 4 of Sec. 2.8.4 and Fig. 2.125.
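The invariance in Ex. 16 can be spot-checked numerically. A sketch
assuming NumPy; the points and the invertible matrix F are made up:

    import numpy as np

    def cross_ratio(x1, x2, x3, x4):
        # Write x3, x4 in the pencil of x1, x2 and form eps2*alpha1/(eps1*alpha2).
        B = np.column_stack([x1, x2])
        eps   = np.linalg.lstsq(B, x3, rcond=None)[0]  # x3 = eps1*x1 + eps2*x2
        alpha = np.linalg.lstsq(B, x4, rcond=None)[0]  # x4 = alpha1*x1 + alpha2*x2
        return (eps[1] * alpha[0]) / (eps[0] * alpha[1])

    x1 = np.array([1.0, 0.0, 0.0, 1.0])
    x2 = np.array([0.0, 1.0, 0.0, 1.0])
    x3, x4 = x1 + x2, x1 + 3.0 * x2

    F = np.array([[2., 1., 0., 0.],     # any invertible 4x4 matrix,
                  [0., 1., 0., 0.],     # i.e. a projective transformation
                  [0., 0., 1., 0.],
                  [1., 0., 0., 3.]])

    print(cross_ratio(x1, x2, x3, x4))                  # 1/3
    print(cross_ratio(x1 @ F, x2 @ F, x3 @ F, x4 @ F))  # 1/3 again: invariant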
<B>
The following problems are all true for Rn , n = 1, 2, 3, . . . , even for
n-dimensional affine space V over the real field (or an ordered field). The
readers are encouraged to prove the case n = 3, at least.
1. For a k-simplex ∆a0a1 · · · ak in Rn , define its

   boundary ∂∆ = { Σⱼ₌₀ᵏ λj aj | λj ≥ 0 for 0 ≤ j ≤ k, at least one λj
                   is equal to zero, and Σⱼ₌₀ᵏ λj = 1 },

   interior Int ∆ = { Σⱼ₌₀ᵏ λj aj | λj > 0 for 0 ≤ j ≤ k and
                      Σⱼ₌₀ᵏ λj = 1 },

   and

   exterior Ext ∆ = { Σⱼ₌₀ᵏ λj aj | at least one of λ0, λ1, . . . , λk is
                      less than zero and Σⱼ₌₀ᵏ λj = 1 }.
   (a) In case k = n, show that ∂∆, Int ∆ and Ext ∆ are pairwise disjoint
       and

         Rn = Int ∆ ∪ ∂∆ ∪ Ext ∆.

   (b) In case k = n, show that
       (1) Int ∆ is a convex set,
       (2) any two points in Ext ∆ can be connected by a polygonal
           curve, composed of line segments, which is contained entirely
           in Ext ∆, and
       (3) any polygonal curve connecting a point in Int ∆ to another
           point in Ext ∆ intersects the boundary ∂∆ in at least one
           point.
       See Fig. 3.81. Therefore, Rn is said to be separated by any
       n-simplex into three parts: Int ∆, ∂∆ and Ext ∆.
   (c) In case 0 ≤ k ≤ n − 1, a k-simplex cannot separate Rn in the
       following sense: for any point x0 ∈ ∆a0a1 · · · ak and any point y0
       outside of it but in Rn , there exists a polygonal curve connecting
       x0 to y0 which does not contain points of ∆a0a1 · · · ak except x0 .
       See Fig. 3.82.

   [Fig. 3.81: a 2-simplex ∆a0a1a2 separating R2 , with a point x0 on the
   simplex and a point y0 in its exterior.]

   [Fig. 3.82: a 2-simplex ∆a0a1a2 in R3 and a polygonal curve from x0 on
   the simplex to y0 avoiding the simplex elsewhere.]

2. Let K be a convex set on a k-dimensional subspace Sᵏ of Rn , where
   0 ≤ k ≤ n − 1. Take any point x0 ∉ K. The set

     Con(x0; K) = {(1 − t)x0 + tx | 0 ≤ t ≤ 1 and x ∈ K}

   is called a cone with K as base and x0 as the vertex. See Fig. 3.83.
   Show that Con(x0; K) is a convex set.

   [Fig. 3.83: cones with vertex x0 over a convex base K, both for
   x0 ∉ Sᵏ and for x0 ∈ Sᵏ.]

3. Suppose a0, a1, . . . , an−1 are affinely independent in Rn . Let
   Sⁿ⁻¹ = a0 + ⟨⟨a1 − a0, . . . , an−1 − a0⟩⟩ be the hyperplane spanned by
   a0, a1, . . . , an−1 . Take points an and bn in Rn and denote

     A = Con(a0, a1, . . . , an−1, an),
     B = Con(a0, a1, . . . , an−1, bn).

   (a) Suppose an and bn lie on the same side of Sⁿ⁻¹ (see (3.8.49)).
       Show that

         Int A ∩ Int B ≠ ∅,

       and that A and B have the same orientation, i.e.

         det[a1 − a0; . . . ; an−1 − a0; an − a0]
           · det[a1 − a0; . . . ; an−1 − a0; bn − a0] > 0

       (the vectors listed are the rows of each matrix). See Fig. 3.84(a).
   (b) Suppose an and bn lie on opposite sides of Sⁿ⁻¹. Show that

         A ∩ B = Con(a0, a1, . . . , an−1),

       and that A and B have opposite orientations, i.e.

         det[a1 − a0; . . . ; an−1 − a0; an − a0]
           · det[a1 − a0; . . . ; an−1 − a0; bn − a0] < 0.

       See Fig. 3.84(b).

   [Fig. 3.84: the simplexes A and B over a common base in S², with
   apexes a3 and b3 on the same side in (a) and on opposite sides
   in (b).]

4. Let ∆a0a1 · · · ak be a k-simplex in Rn where k ≥ 2 and let x be its
   barycenter. Suppose xi is the barycenter of its ith face
   ∆a0a1 · · · ai−1ai+1 · · · ak for 0 ≤ i ≤ k.
   (a) Show that the line segments ai xi, 0 ≤ i ≤ k, meet at x.
   (b) Show that xi x = (1/(k + 1)) x ai (in signed length).
   See Fig. 3.85.

   [Fig. 3.85: the segments ai xi of a triangle (k = 2) and of a
   tetrahedron (k = 3) meeting at the barycenter x.]

5. Let x be the barycenter of the k-simplex ∆a0a1 · · · ak in Rn where
   k ≥ 2. Construct k-simplexes as follows:

     A0 = ∆x a1 · · · ak,
     Ai = ∆a0a1 · · · ai−1 x ai+1 · · · ak for 1 ≤ i ≤ k.

   (a) If i ≠ j, show that Ai and Aj have at most one face in common but
       do not have common interior points.
   (b) Show that

         ∆a0a1 · · · ak = A0 ∪ A1 ∪ · · · ∪ Ak.

       In this case, call A0, A1, . . . , Ak the barycentric subdivision of
       ∆a0a1 · · · ak with respect to its barycenter. See Fig. 3.86.

   [Fig. 3.86: the barycentric subdivision of a triangle (k = 2) and of a
   tetrahedron (k = 3).]

6. Let x be the barycenter of the n-simplex ∆a0a1 · · · an in Rn . The
   remaining vertices of a0, a1, . . . , an, after eliminating from them the
   vertices ai1, . . . , aik with 0 ≤ i1 < i2 < · · · < ik ≤ n, span an
   (n − k)-simplex. Use xi1···ik to denote the barycenter of this
   (n − k)-simplex. Take n distinct numbers i1, i2, . . . , in from
   0, 1, 2, . . . , n. Define an affine transformation T: Rn → Rn by

     T(x) = a0,
     T(xi1···ik) = ak for 1 ≤ k ≤ n.

   Let det f denote the determinant of the linear part f of T⁻¹. Show
   that

     det f = (−1)^σ(i0,i1,...,in) / (n + 1)!,

   where (i0, i1, . . . , in) is the permutation of 0, 1, 2, . . . , n defined
   by k → ik for 1 ≤ k ≤ n and 0 → i0 (i0 being different from
   i1, . . . , in), and (−1)^σ(i0,i1,...,in) = 1 or −1 according as
   (i0, i1, . . . , in) is an even or odd permutation.
7. Let a1, . . . , ak be linearly independent in Rn . Fix a point a0 in Rn .
   For given positive scalars c1, . . . , ck, define

     P = { a0 + Σᵢ₌₁ᵏ λi ai | |λi| ≤ ci for 1 ≤ i ≤ k }.

   This parallelotope (see Ex. <A> 11) is sometimes called a
   k-parallelogram with center a0 and side vectors a1, . . . , ak.
   The concerned terminologies are:

     vertex: x = a0 + ε1a1 + · · · + εkak, where ε1 = ±c1, . . . ,
             εk = ±ck;
     face: {x ∈ P | λi = ci or −ci, and |λj| ≤ cj for j ≠ i, 1 ≤ j ≤ k}.

   There are 2ᵏ vertices and 2k faces. The set of all its faces
   constitutes its boundary ∂P. Furthermore, the set

     Int P = {x ∈ P | |λj| < cj for 1 ≤ j ≤ k}

   is called the interior of P and the set Ext P = Rn − P is called the
   exterior of P. See Fig. 3.87.

   [Fig. 3.87: 1-, 2- and 3-parallelograms with center a0 and side
   vectors a1, a2, a3.]

   Use P to replace ∆a0a1 · · · an in Ex. 1 to prove (a)–(c) there.
8. Generalized Euler formula: V − E + F = 2.
   (a) Let ∆a0a1 · · · ak be a k-simplex in Rn . Any s + 1 distinct points
       out of a0, a1, . . . , ak can be used to construct an s-simplex; the
       total number of them is αs = C(k+1, s+1) for 0 ≤ s ≤ k. Show that

         α0 − α1 + α2 − · · · + (−1)ᵏ⁻¹αk−1 + (−1)ᵏαk = 1.

   (b) Let P be as in Ex. 7. Let, for 0 ≤ s ≤ k,

         αs = C(k, s) · 2ᵏ⁻ˢ
            = the number of s-dimensional faces (s-parallelograms) of P.

       Show that

         α0 − α1 + α2 − · · · + (−1)ᵏ⁻¹αk−1 + (−1)ᵏαk = 1.
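Both alternating sums are quick to confirm by machine for small k. A
sketch using only the standard library (the face count C(k, s)·2^(k−s)
for the parallelotope is the usual hypercube count):

    from math import comb

    # Check the alternating sums of Ex. <B> 8 for small k.
    for k in range(1, 8):
        simplex_sum   = sum((-1)**s * comb(k + 1, s + 1) for s in range(k + 1))
        parallelo_sum = sum((-1)**s * comb(k, s) * 2**(k - s) for s in range(k + 1))
        print(k, simplex_sum, parallelo_sum)   # both equal 1 for every k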

<C> Abstraction and generalization

1. Prove the Menelaus theorem (3.8.47) and the Ceva theorem (3.8.48) in
   Rn or any finite-dimensional affine space V over a field.
2. Try to model after Sec. 3.8.4.10 to construct the n-dimensional
   projective space Pⁿ(R) and develop its basic properties.

<D> Application

1. Let ∆a0a1 · · · an be an n-simplex in Rn and α0, α1, . . . , αn be fixed
   real numbers. Define a function f: ∆a0a1 · · · an → R by

     f( Σᵢ₌₀ⁿ λi ai ) = Σᵢ₌₀ⁿ λi αi,

   where λi ≥ 0 for 0 ≤ i ≤ n and λ0 + λ1 + · · · + λn = 1. Show that f
   is uniformly Lipschitzian on ∆a0a1 · · · an, i.e. there exists a
   constant M > 0 so that

     |f(x) − f(y)| ≤ M |x − y|

   for any two points x, y ∈ ∆a0a1 · · · an. Moreover, is this f
   differentiable everywhere in Int ∆? Stepanov's theorem (refer to [32];
   for details, see H. Federer, Geometric Measure Theory
   (Springer-Verlag, 1969)) says that it is differentiable almost
   everywhere. Try to use (3.8.39) and compute its total differential at
   a point where it is differentiable.

3.8.5 Quadrics
Here in this subsection, R3 is considered as an affine space in general and
as a vector space in particular. In some cases, R3 as a Euclidean space
(see Part 2 and Chap. 5) is implicitly understood.
N = {0, e1, e2, e3} always represents the natural affine basis for R3 , and
x = [x]N = (x1, x2, x3) is used.
The set of points x in R3 that satisfy the equation of the second degree
in three real variables x1, x2 and x3 with real coefficients

  b11x1² + b22x2² + b33x3² + 2b12x1x2 + 2b13x1x3
    + 2b23x2x3 + 2b1x1 + 2b2x2 + 2b3x3 + b = 0                    (3.8.64)

is called a quadric surface or surface of the second order or simply
quadric, where the coefficients b11, b22 and b33 are not all zero. Adopt
the natural inner product notation ⟨x, y⟩ = xy*. Then (3.8.64) can be
expressed as

  ⟨x, xB⟩ + 2⟨x, b⟩ + b = 0,

where

      ⎡b11 b12 b13⎤
  B = ⎢b21 b22 b23⎥ is symmetric, i.e. bij = bji for 1 ≤ i, j ≤ 3,
      ⎣b31 b32 b33⎦

  b = (b1, b2, b3).                                               (3.8.65)

According to (1) in (2.7.71), and better referring to Example 3 in
Sec. 3.7.5, there exists an invertible matrix P so that

           ⎡λ1  0  0⎤
  P B P* = ⎢ 0 λ2  0⎥.                                            (*1)
           ⎣ 0  0 λ3⎦

Thus, (3.8.65) can be rewritten as

  (xP⁻¹)(P B P*)(xP⁻¹)* + (xP⁻¹)(bP*)* + b = 0.                   (*2)

Since B = {P1*, P2*, P3*}, composed of the row vectors of P, is a basis
for R3 , set

  xP⁻¹ = (y1, y2, y3) = [x]B.

Then (*2) reduces to, in B,

  λ1y1² + λ2y2² + λ3y3² + c1y1 + c2y2 + c3y3 + b = 0,
  where c = bP* = (c1, c2, c3).                                   (*3)

Therefore, the terms xixj with i ≠ j are eliminated from (3.8.65).
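The reduction (*1)–(*3) can be carried out numerically with an orthogonal
P (eigenvectors of B; cf. Remark 1 below). A sketch assuming NumPy; the
quadric x1² + 4x1x2 + x2² + x3² = 1 is made up for illustration:

    import numpy as np

    B  = np.array([[1.0, 2.0, 0.0],
                   [2.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

    lam, Q = np.linalg.eigh(B)   # B = Q diag(lam) Q^T with Q orthogonal
    P = Q.T                      # rows of P give the new basis

    print(lam)                           # [-1. 1. 3.]: one negative
                                         # eigenvalue, so this quadric is a
                                         # hyperboloid rather than an ellipsoid
    print(np.round(P @ B @ P.T, 12))     # diag(-1, 1, 3), as in (*1)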
Suppose λ1c1 ≠ 0 in (*3). Completing the square in λ1y1² + c1y1,

  λ1y1² + c1y1 = λ1(y1² + (c1/λ1)y1) = λ1(y1 + c1/(2λ1))² − c1²/(4λ1),

suggests that the affine transformation

  x → xP⁻¹ = (c1/(2λ1), 0, 0) + z,  z = (z1, z2, z3),

will reduce (*3) to

  λ1z1² + λ2z2² + λ3z3² + c2z2 + c3z3 + b = 0.                    (*4)

In case λ2c2 ≠ 0 or (and) λ3c3 ≠ 0, the same method can be used to
eliminate the first-order terms concerning x2, x3.
Since B ≠ O3×3, the scalars λ1, λ2 and λ3 in (*3) cannot all be equal to
zero. This means that at least one term of second order is always left, no
matter what affine transformation is applied.
Hence, after a suitable affine transformation, (3.8.65) can be reduced to
one of the following types.

The standard forms of quadrics
In the Cartesian coordinate system N = {e1, e2, e3}, the quadrics (3.8.64)
are classified into the following 17 standard forms, where a1 > 0, a2 > 0,
a3 > 0 and a ≠ 0.

 1. Ellipsoid (Fig. 3.88)                  x1²/a1² + x2²/a2² + x3²/a3² = 1.
 2. Imaginary ellipsoid                    x1²/a1² + x2²/a2² + x3²/a3² = −1.
 3. Point ellipsoid or imaginary
    elliptic cone                          x1²/a1² + x2²/a2² + x3²/a3² = 0.
 4. Hyperboloid of two sheets (Fig. 3.89)  x1²/a1² + x2²/a2² − x3²/a3² = −1.
 5. Hyperboloid of one sheet (Fig. 3.90)   x1²/a1² + x2²/a2² − x3²/a3² = 1.
 6. Elliptic cone (Fig. 3.91)              x1²/a1² + x2²/a2² − x3²/a3² = 0.
 7. Elliptic paraboloid (Fig. 3.92)        x1²/a1² + x2²/a2² + 2ax3 = 0.
 8. Hyperbolic paraboloid (Fig. 3.93)      x1²/a1² − x2²/a2² + 2ax3 = 0.
 9. Elliptic cylinder (Fig. 3.94)          x1²/a1² + x2²/a2² − 1 = 0.
10. Imaginary elliptic cylinder            x1²/a1² + x2²/a2² + 1 = 0.
11. Imaginary intersecting planes          x1²/a1² + x2²/a2² = 0.
12. Hyperbolic cylinder (Fig. 3.95)        x1²/a1² − x2²/a2² = 1.
13. Intersecting planes (Fig. 3.96)        x1²/a1² − x2²/a2² = 0.
14. Parabolic cylinder (Fig. 3.97)         x1² + 2ax2 = 0.
15. Parallel planes (Fig. 3.98)            x1² − a² = 0.
16. Imaginary parallel planes              x1² + a² = 0.
17. Coincident planes (Fig. 3.99)          x1² = 0.

In the Euclidean space R3 , the coefficients of the second-order terms can
be chosen as eigenvalues of B (see Remark 1 below), while in the affine
space R3 they need not be (see Remark 2 below).                   (3.8.66)

[Figs. 3.88–3.99: the ellipsoid, hyperboloids of two and one sheets,
elliptic cone, elliptic and hyperbolic paraboloids, elliptic, hyperbolic
and parabolic cylinders, and intersecting, parallel and coincident planes,
drawn in the coordinate system 0; e1, e2, e3.]

Several remarks are provided.

Remark 1 The standard forms in the Euclidean space R3 .
In (*1), P can be chosen as an orthogonal matrix and hence λ1, λ2 and λ3
are eigenvalues of B (see (5.7.3)). In this case, λ1, λ2 and λ3 are kept
all the way down to (*3) and (*4).
Since λ1, λ2 and λ3 in (*3) cannot all be zero, in case (*3) is of the
form

  λ1y1² + c2y2 + b = 0, where λ1 ≠ 0, c2 ≠ 0,

a translation will reduce it to the form

  λ1z1² + c2z2 = 0
  ⇒ (replacing z1 and z2 by x1 and x2 , respectively)
  x1² + 2ax2 = 0, where 2a = c2/λ1.

This is the standard form 14. The readers should practice all the other
cases.
Instead of affine transformations, in the Euclidean space R3 we use rigid
motions to reduce the quadrics (3.8.64) to the standard forms listed in
(3.8.66). For details, refer to Sec. 5.10.

Remark 2 The standard forms in the affine space R3 .
If we adopt Sylvester's law of inertia (see (2) and (3) in (2.7.71)), let
k be the index of B and r = r(B) the rank of B. Then the quadrics
(3.8.64) have the following canonical forms:

  Type (I) (k, r − k):   Σᵢ₌₁ᵏ xi² − Σᵢ₌ₖ₊₁ʳ xi² = 0,
  Type (II) (k, r − k):  Σᵢ₌₁ᵏ xi² − Σᵢ₌ₖ₊₁ʳ xi² + 1 = 0,         (3.8.67)
  Type (III) (k, r − k): Σᵢ₌₁ᵏ xi² − Σᵢ₌ₖ₊₁ʳ xi² + 2xr+1 = 0,

in the affine space R3 in a strict sense. For example,

  1 (Ellipsoid): Type (II) (0, 3),
  2:             Type (II) (3, 0),
  3:             Type (I)  (3, 0),
  4:             Type (II) (2, 1), etc.

In general, a cone is of Type (I), a parabolic surface is of Type (III),
an elliptic surface is of Type (II) (0, r), a hyperbolic surface is of
Type (II) (k, r − k) where 0 < k < r, and Type (II) (k, 0) represents the
empty set.
But in practice, especially in lower-dimensional affine spaces such as
R2 and R3 , we prefer to use (3.8.66) as our standard forms for quadrics.

Remark 3 Centers; nondegenerate quadrics.
A point c in the space R3 is called a center of a quadric S if for each
point x ∈ S there is another point y ∈ S so that

  c = (1/2)(x + y)                                                (3.8.68)

(refer to (2.8.53)).
According to the standard forms (3.8.66), types 1, 3, 4, 5 and 6 all have
a unique center at 0 = (0, 0, 0); types 9 and 12 have the x3-axis as their
sets of centers; type 15 has the x2x3-coordinate plane as its set of
centers, while every point of type 17 is a center. Types 7, 8 and 14 do
not have any center.
Among these, types 1, 4, 5, 7 and 8 are called nondegenerate or proper
quadrics, and the others degenerate quadrics.

Remark 4 Regular and singular points of a surface (calculus required).
Let F(x1, x2, x3) be a real-valued C∞ (or C³) function defined on an open
subset of the Euclidean space R3 . Roughly speaking, the set of points
satisfying

  F(x1, x2, x3) = 0,  x = (x1, x2, x3),

is called a surface in R3 . For simplicity, we usually say that
F(x1, x2, x3) = 0 is (the equation of) a surface S.
Suppose a certain neighborhood of a point x0 on S is given by a C³
vector-valued mapping

  (u, v) → γ(u, v) = (x1(u, v), x2(u, v), x3(u, v))

so that the tangent vectors at γ(u0, v0) = x0 ,

  ∂γ/∂u (u0, v0) and ∂γ/∂v (u0, v0), are linearly independent;

then x0 is called a regular point of S. A point of S that is not a
regular point is called a singular point.
It can be shown that, in case x0 = γ(u0, v0) is a regular point, there
exist an open neighborhood O of (u0, v0) and a neighborhood γ(O) of x0 so
that

  (u, v) ∈ O → γ(u, v) ∈ γ(O)

is a diffeomorphism (i.e. one-to-one, onto, and both γ and γ⁻¹ are C³).
See Fig. 3.100.

[Fig. 3.100: a parameterization γ of a neighborhood γ(O) of x0 on S, with
the tangent vectors ∂γ/∂u and ∂γ/∂v at (u0, v0).]

Geometrically, this means that the part γ(O) of S is smooth,
single-layered and contains no cusp or self-intersecting subset of S.
By a double point x0 of a quadric F(x1, x2, x3) = F(x) = 0, we mean a
point x0 with the property that F(x + x0) does not contain first-order
terms in x1, x2 and x3 .
A quadric without singular points is called a central quadric if it has a
center or centers. According to (3.8.66), these are types 1, 4, 5, 9, 12
and 15. Noncentral quadrics without singular points are types 7, 8
and 14.
A singular point of a quadric is a double point. The set of singular
points of a quadric is either

1. a point, such as type 6 with 0 as the singular point,
2. a line, such as type 13 with the x3-axis as the singular line, or
3. a plane, such as type 17 with itself as the singular plane.

See Figs. 3.91, 3.96 and 3.99, respectively.

In what follows, we adopt the homogeneous and affine coordinates for R3
introduced in Sec. 3.8.4.10 to determine affine invariants of quadrics
(refer to (2.8.59)).
In homogeneous coordinates (x1, x2, x3, x4), the quadric (3.8.64) is
expressed as

  Σᵢ,ⱼ₌₁⁴ bij xixj = 0, with b4j = bj4 = bj for 1 ≤ j ≤ 3 and b44 = b,
                                                                  (3.8.69)

or (3.8.65) as

  ⟨x, xB̃⟩ = 0, with x = (x1, x2, x3, x4) and

       ⎡b11 b12 b13 b1⎤
  B̃ = ⎢b21 b22 b23 b2⎥ = ⎡B   b*⎤.                                (3.8.70)
       ⎢b31 b32 b33 b3⎥   ⎣b   b ⎦
       ⎣b1  b2  b3  b ⎦

Note that both (3.8.64) and (3.8.65) may be obtained by setting x4 = 1 in
(3.8.69) and (3.8.70), respectively. In this case, the affine coordinate
(x1, x2, x3, 1) is treated as (x1, x2, x3) and is considered as a point in
R3 , still denoted by x. Meanwhile, it is convenient to write (3.8.64) as

                    ⎡x1⎤
  (x1, x2, x3, 1) B̃ ⎢x2⎥ = 0.                                     (3.8.71)
                    ⎢x3⎥
                    ⎣1 ⎦

According to (1) in (3.8.63), an affine transformation T(y) = y0 + yA on
R3 can be written, in homogeneous coordinates, as

  x = yÃ, with
  y = (y1, y2, y3, y4),
  x = T(y) = (x1, x2, x3, x4),

       ⎡A   0⎤   ⎡a11 a12 a13 0⎤
  Ã = ⎢      ⎥ = ⎢a21 a22 a23 0⎥,                                 (3.8.72)
       ⎣y0  1⎦   ⎢a31 a32 a33 0⎥
                 ⎣α1  α2  α3  1⎦

where y0 = (α1, α2, α3) and A = [aij]3×3 is invertible. Or, in the
corresponding affine coordinates,

  (x1, x2, x3, 1) = (y1, y2, y3, 1)Ã,

where

  y = (y1, y2, y3) and x = T(y) = (x1, x2, x3).                   (3.8.73)

Note that Ã is invertible.
By the same process leading to (*14) and beyond in Sec. 2.8.5, we have
the following counterpart of (2.8.59).

The affine invariants of quadrics
For a quadric

  ⟨x, xB⟩ + 2⟨b, x⟩ + b = 0,

the signs or vanishing of

  1. det B, and
             ⎡B   b*⎤
  2. det B̃ = ⎢      ⎥
             ⎣b   b ⎦

are affine invariants. In case these two quantities are positive, the
positiveness of tr B is also an affine invariant.                 (3.8.74)

These three quantities are Euclidean invariants under the rigid motions
on R3 (see Sec. 5.7).
We postpone the characterization of quadrics by means of Euclidean
concepts to Sec. 5.10 (refer to (2.8.52) for quadratic curves), including
there many computational examples.
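The two invariants of (3.8.74) are simple determinants; the sketch below
(assuming NumPy) checks that they keep their signs under a linear change
of variable x = yA, under which B transforms congruently to A B A* (here
b = 0, so the linear part and constant are unchanged):

    import numpy as np

    def quadric_invariants(B, b, b0):
        """Signs of det B and det B-tilde, cf. (3.8.70) and (3.8.74)."""
        B = np.asarray(B, float); b = np.asarray(b, float)
        B_tilde = np.block([[B, b.reshape(3, 1)],
                            [b.reshape(1, 3), np.array([[b0]])]])
        return np.sign(np.linalg.det(B)), np.sign(np.linalg.det(B_tilde))

    # Unit sphere x1^2 + x2^2 + x3^2 - 1 = 0 and an affine image of it:
    B, b, b0 = np.eye(3), np.zeros(3), -1.0
    A = np.array([[1., 2., 0.], [0., 1., 0.], [3., 0., 2.]])  # invertible
    print(quadric_invariants(B, b, b0))              # (1.0, -1.0)
    print(quadric_invariants(A @ B @ A.T, b, b0))    # (1.0, -1.0): same signs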

Remark 5 The standard (or canonical) forms of quadrics in the projective
space P³(R), introduced in (3.8.59) and (3.8.60).
By applying Sylvester's law of inertia (see (2.7.71)) to the symmetric
matrix B̃ in (3.8.70), we obtain the canonical forms of quadrics in P³(R)
as

  Σᵢ₌₁ᵏ xi² − Σᵢ₌ₖ₊₁ʳ xi² = 0,                                    (3.8.75)

where k is the signature of B̃ and r = r(B̃) is the rank of B̃ (compare
with (3.8.67)).
Also, a regular quadric is central or noncentral (see Remark 4) if and
only if its center belongs to the affine space R3 or is a point at
infinity, respectively (refer to Ex. <B> 8 of Sec. 2.8.5).

Exercises
<A>
1. Prove (3.8.66) in detail.
2. Prove (3.8.67) in detail.
3. Prove (3.8.74) in detail.
4. Prove (3.8.75) and the statement in the last paragraph in detail.
APPENDIX A

Some Prerequisites

A.1 Sets
A set is a collection of objects, called members or elements of the set. When
a set is to be referred to more than once, it is convenient to label it, usually,
by a capital letter such as A, B, . . ..
There are two distinct ways to describe a set:

1. By listing the elements of the set between curly brackets { }.


2. By giving a rule or characteristic property, which the elements of the set
must satisfy.

For example, the set of all even positive integers less than 10 is written as
{2, 4, 6, 8} = {6, 2, 8, 4} or

{x | x is an even positive integer less than 10}.

Note that each element of a set is not repeated within the set itself and the
order in which the elements of a set are listed is immaterial.
Some definitions and notations are listed as follows:

1. A ⊆ B or B ⊇ A (A is a subset of B): every element of A is an element
   of B. A ⊊ B (A is a proper subset of B).
2. A = B (A is equal to B): if and only if A ⊆ B and B ⊆ A.
3. ∅ (empty set): the set that contains no elements.
4. x ∈ A: x is an element of the set A.
5. x ∈
/ A: x is not an element of the set A.
6. A ∪ B (the union of A and B): A ∪ B = {x | x ∈ A or x ∈ B}.
7. A ∩ B (the intersection of A and B): A ∩ B = {x | x ∈ A and x ∈ B}.
8. A and B are disjoint if A ∩ B = ∅.
9. A − B (the difference of A by B): {x | x ∈ A and x ∈ / B}.
10. A × B (the Cartesian product of A and B): {(x, y) | x ∈ A and y ∈ B}.

Let ∧ be an index set and {Aλ | λ ∈ ∧} a collection of sets; the union
and intersection of these sets are defined respectively by

  ∪λ∈∧ Aλ = {x | x ∈ Aλ for some λ ∈ ∧}, and
  ∩λ∈∧ Aλ = {x | x ∈ Aλ for all λ ∈ ∧}.

A rule for determining if, for each ordered pair (x, y) of elements of a
set A, x stands in a given relationship to y, is said to define a relation R
on A. A relation R on a set A is called an equivalent relation if the following
three conditions hold.
1. (Reflexivity) xRx (x is in relation R to itself).
2. (Symmetry) xRy ⇒ yRx.
3. (Transitivity) xRy and yRz ⇒ xRz.
Let R be an equivalence relation; x ∼ y is usually written in place of
xRy. For example, "x − y is divisible by a fixed integer" is an
equivalence relation on the set of integers.

A.2 Functions
A function f from a set A into a set B, denoted by
f : A → B,
is a rule that associates each element x ∈ A to a unique element, denoted
by f (x), in B. Equivalently, a function is a set of ordered pairs (as a subset
of A × B) with the property that no two ordered pairs have the same first
element.
Some terminologies are at hand.
1. f (x): the image of x under f .
2. x: a preimage of f (x) under f .
3. A: the domain of f .
4. f (A) = {f (x) | x ∈ A}: the range of f , a subset of B.
5. f −1 (S) = {x ∈ A | f (x) ∈ S}: the preimage of S ⊆ B.
6. f = g (f is equal to g): f (x) = g(x) for all x ∈ A if f : A → B and
g: A → B.
7. f is one-to-one: if f(x) = f(y) implies x = y, or, equivalently, if
   x ≠ y implies f(x) ≠ f(y).
8. f is onto: f(A) = B if f: A → B; f is then said to be onto B.

9. f |S or f | S (restriction of f on a subset S ⊆ A): f |S (x) = f (x) for each


x ∈ S if f : A → B.
Let A, B and C be sets and f: A → B and g: B → C be functions. Note that
the range f(A) of f lies in the domain B of g as a subset. See Fig. A.1.
Then, the composite of g and f is the function g ◦ f: A → C defined by

  (g ◦ f)(x) = g(f(x)), x ∈ A.

[Fig. A.1: the composite g ◦ f carries A through f(A) ⊆ B to
g(f(A)) = (g ◦ f)(A) ⊆ C.]

Usually g ◦ f ≠ f ◦ g, even if both are defined. But the associative law
h ◦ (g ◦ f) = (h ◦ g) ◦ f is true.
The identity function 1A : A → A is the function
1A (x) = x, x ∈ A.
It keeps every element of A fixed.
Suppose that f : A → B is a function such that there exists another
function g: B → A satisfying
g ◦ f = 1A , i.e. g(f (x)) = x, x ∈ A.
Then f is one-to-one, and g is onto and is called a left inverse function of f .
In case
f ◦ g = 1B , i.e. f (g(y)) = y, y∈B
then f is onto, and g is one-to-one and is called a right inverse function of f .
The following are equivalent:
1. f : A → B has a function g: B → A as both left and right inverse
functions, i.e.
g(f (x)) = x, x ∈ A and f (g(y)) = y, y ∈ B.
2. f : A → B is one-to-one and onto.

Such a function f is called invertible. Being unique if f is invertible,


g: B → A is called the inverse function of f and is denoted by
f −1 : B → A.

A.3 Fields
Model after the set of real numbers in the following definition and the
properties henceforth derived.
Definition A field F is a set together with two operations “+” (called
addition) and “·” (called multiplication) defined on it so that, for each pair
of elements a, b ∈ F, there are unique elements a + b and a · b in F for which
the following conditions hold for all a, b, c ∈ F:
(1) Addition
(a) (commutative) a + b = b + a.
(b) (associative) (a + b) + c = a + (b + c).
(c) (identity element) There exists an element 0 ∈ F, called zero,
such that
a + 0 = a.
(d) (inverse element) For each a ∈ F, there exists an element, denoted
by −a, in F such that
a + (−a) = a − a = 0.
(def.)

(2) Multiplication
(a) (commutative) a · b = b · a.
(b) (associative) (a · b) · c = a · (b · c).
(c) (identity element) There exists an element 1 ∈ F, called unity, such
that 1 · a = a.
(d) (inverse element) For each nonzero element a ∈ F, there exists an
element, denoted by a−1 , in F such that
a−1 · a = 1.
(3) Addition and multiplication
(distributive) a · (b + c) = a · b + a · c.
The elements a + b and a · b (also denoted as ab) are called the sum and
product, respectively, of a and b. If b = 0, then a · b−1 also denote as a/b or
a
b and is called division by b.

In a F, the following operational properties hold:


1. Cancellation laws
a + b = c + b ⇒ a = c.
ab = cb and b = 0 ⇒ a = c.
2. Hence, 0 and 1 are unique.
3. a · 0 = 0 for any ∈ F.
4. a · (−b) = (−a) · b = −a · b.
5. (−a) · (−b) = a · b.
Proofs are left to the readers.
Standard fields and their fixed symbols
The Real Field R:
R is an ordered field too and has Archimedean property. Readers of this
book are supposed to be familiar with all the operational properties of R.
The Complex Field C:
The set

  C = {a + ib | a, b ∈ R}, where i = √−1,
with the
addition: (a1 + ib1 ) + (a2 + ib2 ) = (a1 + a2 ) + i(b1 + b2 ), and
multiplication: (a1 + ib1 )(a2 + ib2 ) = (a1 a2 − b1 b2 ) + i(a1 b2 + a2 b1 ),
is a field where
0 = 0 + i0, 1 = 1 + i0
are the respective zero and identity elements and, if z = a + ib ≠ 0
(meaning at least one of a and b is not zero), the multiplicative inverse
is

  1/z = z⁻¹ = (a − ib)/(a² + b²).

C is called the complex field and its elements complex numbers. C cannot
be ordered. A complex number is usually denoted by z = a + ib, with
a = Re z the real part and b = Im z the imaginary part. The following are
common facts in C.
1. z̄ = a − ib is called the conjugate of z = a + ib.
2. z z̄ = a² + b² ≥ 0.
3. Define the absolute value of z by

     |z| = √(z z̄) = √(a² + b²).

Then,

  (a) |z| ≥ 0, and |z| = 0 ⇔ z = 0.
  (b) |z| = |z̄|; |z1z2| = |z2z1| = |z1||z2| and |z1/z2| = |z1|/|z2|.

Geometrically, a complex number z = a + ib is better interpreted as the point with coordinates (a, b) in the Cartesian coordinate plane, or is considered as a plane vector pointing from (0, 0) to the point (a, b).
The Finite Field Ip (p is a prime):
Let p be a fixed prime number. The residue classes of remainders under
division by p
Ip = {0, 1, 2, . . . , p − 1}
together with the usual addition and multiplication of integers modulo p is a
field with finitely many elements.
Ip has finite characteristic p. This means that a sum 1 + 1 + · · · + 1 equals 0 if and only if the number of summands 1 is a multiple of p.
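As a small illustration (added here, not part of the original text), the field operations of Ip can be checked in a few lines of Python; the inverse below uses Fermat's little theorem, a⁻¹ = a^(p−2) (mod p), which holds in any Ip with p prime:

```python
# A minimal sketch of arithmetic in the finite field I_p (here p = 7).
p = 7

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def inv(a):
    # For prime p and a != 0, Fermat's little theorem gives a^(p-1) = 1,
    # so a^(p-2) is the multiplicative inverse of a.
    assert a % p != 0
    return pow(a, p - 2, p)

assert add(5, 4) == 2          # 9 = 2 (mod 7)
assert mul(3, 5) == 1          # 3 and 5 are mutual inverses in I_7
assert inv(3) == 5
assert sum([1] * p) % p == 0   # characteristic p: 1 + ... + 1 (p times) = 0
```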
A field having no finite characteristic is said to be of characteristic 0, such as R, C, the rational field Q and the radical field Q(√2) = {a + b√2 | a, b ∈ Q} with addition and multiplication as in R.
A.4 Groups
Model partially after the addition properties of the set of integers or the
multiplication properties of the set of nonzero real numbers.
Definition A group G is a set on which an operation ◦ is defined that associates to each pair of elements a, b in G a unique element a ◦ b in G, for which the following properties hold for all a, b, c in G:
1. (associative) (a ◦ b) ◦ c = a ◦ (b ◦ c).
2. (identity element) There exists an element e in G so that
a ◦ e = e ◦ a = a.
3. (inverse element) For each a in G, there exists an element a−1 in G
so that
a ◦ a−1 = a−1 ◦ a = e.
If, in addition, a ◦ b = b ◦ a holds for all a, b in G, then the group G is called commutative or abelian.
A nonempty subset S of a group G is called a subgroup if a, b ∈ S implies a ◦ b ∈ S, and a ∈ S implies a⁻¹ ∈ S.
For example, the set of nonzero elements in any field is a group under
the operation of field multiplication, while the field itself is a group under the addition operation.
The set of invertible real n × n matrices forms a nonabelian group
GL(n; R) under matrix multiplication. Various subgroups of GL(n; R), espe-
cially when n = 2, 3, will be introduced in this book.
A.5 Polynomials
A polynomial in indeterminate t with coefficients from a field F is an expression of the form

p(t) = an tⁿ + an−1 tⁿ⁻¹ + · · · + a1 t + a0,

where n is a non-negative integer and an, an−1, . . . , a1, a0 are elements of F. If F = R, p(t) is called a polynomial with real coefficients; if F = C, a polynomial with complex coefficients.
Here are some conventions:

1. If an = · · · = a0 = 0, then p(t) = 0 is called the zero polynomial, and its degree is defined to be −1.
2. The degree of a nonzero polynomial is defined to be the largest exponent
of t in the expression of p(t) with a nonzero coefficient.
3. Two polynomials p(t) and g(t) are equal

p(t) = g(t)

if both have the same degree and the coefficients of like powers of t are
equal.
4. If F is an infinite field (i.e. a field containing infinitely many elements), a polynomial p(t) with coefficients in F is often regarded as a function p: F → F, and p or p(t), t ∈ F, is called a polynomial function.

Furthermore, for a pair of polynomials p(t) as above and q(t) = bm tᵐ + · · · + b1 t + b0 with n ≥ m, the sum p + q of p and q is defined
as the addition
(p + q)(t) = p(t) + q(t)
= an tn + · · · + am+1 tm+1 + (am + bm )tm
+ · · · + (a1 + b1 )t + a0 + b0 .
The scalar product or multiplication αp of a scalar α ∈ F and p is defined as
(αp)(t) = αp(t)
= αan tn + · · · + αa1 t + αa0 .
Therefore, the set of all polynomials with coefficients from a field F
P(F)
is a vector space over F (see Sec. B.1), while the set of such polynomials of
degrees no more than n, a nonnegative integer,
Pn (F)
forms a vector subspace of dimension n + 1.
Let f (t) be a polynomial and g(t) a polynomial of non-negative degree.
Then, there exist unique polynomials q(t) and r(t) such that
1. the degree of r(t) is less than that of g(t), and
2. f (t) = q(t)g(t) + r(t).
This is the so-called Division Algorithm for Polynomials. It follows that,
t − a divides f (t) if and only if f (a) = 0, and such a is called a zero of f (t).
Any polynomial of degree n ≥ 1 has at most n distinct zeros.
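The Division Algorithm translates directly into code. Below is a sketch in plain Python (an illustration added here, not from the text), with polynomials stored as lists of coefficients, highest degree first — a convention chosen for this example:

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while len(r) >= len(g) and any(r):
        c = r[0] / g[0]                 # leading coefficient of next quotient term
        d = len(r) - len(g)             # its degree
        q[len(q) - 1 - d] = c
        # subtract c * t^d * g(t); the leading term cancels
        r = [rc - c * gc for rc, gc in zip(r, g + [Fraction(0)] * d)]
        r = r[1:]
    return q, r

# f(t) = t^3 - 2t + 5 divided by g(t) = t - 1:
q, r = poly_divmod([1, 0, -2, 5], [1, -1])
print(q, r)   # q = t^2 + t - 1, r = 4; indeed f(1) = 4, so t - 1 does not divide f
```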
A polynomial f(t) of positive degree is called irreducible if it cannot be factored as a product of polynomials with coefficients from the same field F, each having positive degree. If f(t) is irreducible and f(t) does not divide another polynomial g(t) with coefficients from the same field, then f(t) and
g(t) are relatively prime. This means that no polynomial of positive degree
can divide both of them. In this case, there exist polynomials f1 (t) and
g1 (t) such that
f (t)f1 (t) + g(t)g1 (t) = 1 (constant polynomial 1).
If an irreducible polynomial f(t) divides the product g(t)h(t) of two polynomials g(t) and h(t) over the same field F, in symbols

f(t) | g(t)h(t),

then f(t) | g(t) or f(t) | h(t).
A.5 Polynomials 689

Unique Factorization Theorem for Polynomials


For any polynomial p(t) ∈ P (F) of positive degree, there exist
1. a unique constant c,
2. unique irreducible polynomials p1 (t), . . . , pn (t) whose leading coefficients
are equal to 1 (the so-called monic polynomial), and
3. unique positive integers r1 , . . . , rn ,
such that
p(t) = cp1 (t)r1 · · · pn (t)rn .

For an infinite field F such as Q, Q(√2), R and C, two polynomials f(t) and g(t) are equal if and only if there are m > n distinct scalars a1, . . . , am such that

f(ai) = g(ai), 1 ≤ i ≤ m,

where n is the larger of the degrees of f(t) and g(t).
In this book, polynomials concerned in most cases are polynomials with
real coefficients and of degree 1, 2 or 3.
For quadratic polynomial p(x) = ax2 + bx + c with a, b, c in R and x as
a real variable,
p(x) = a (x − (−b + √(b² − 4ac))/(2a)) (x − (−b − √(b² − 4ac))/(2a)).
In case
1. b2 − 4ac > 0, p(x) has two distinct real zeros;
2. b2 − 4ac = 0, p(x) has two equal real zeros; and
3. b2 − 4ac < 0, p(x) has two conjugate complex zeros.
As for cubic polynomial p(x) = ax3 + bx2 + cx + d with a, b, c, d in R
and x as a real variable,
1. it always has at least one real zero, and
2. if it has only one real zero, the other two zeros are conjugate complex
numbers.
APPENDIX B

Fundamentals of Algebraic Linear Algebra

This appendix is divided into twelve sections. Among them, Secs. B.1–B.6
are devoted to static structures of vector spaces themselves, while Secs. B.7–
B.12 are mainly concerned with dynamic relations between vector spaces,
namely the study of linear transformations. Most topics are stated in the
realm of finite dimensional vector spaces. From Sec. B.4 on, some exercise
problems are attached as parts of the contents. Few geometric interpreta-
tions of abstract results are touched and the methods adopted are almost
purely algebraic. Essentially no proofs are given. In short, the manner pre-
sented is the one adopted in most present-day linear algebra books.
A vector space V over a field F is defined axiomatically in Sec. B.1, with
Fn , Inp and M(m, n; F) as concrete examples along with subspace operations.
Via the techniques of linear combination, dependence and independence
introduced in Sec. B.2, Sec. B.3 introduces the basis and dimension for
a vector space. The concept of matrices over a field and their algebraic
operations are in Sec. B.4, and the elementary row or column operations
on a matrix are in Sec. B.5. The determinant function on square matrices
is sketched in Sec. B.6.
Section B.7 is devoted to linear transformation (functional, opera-
tor, or isomorphism) and its matrix representation with respect to bases.
Section B.8 investigates a matrix and its transpose, mostly from the view-
point of linear transformations. Inner product spaces with specified linear
operators on them such as orthogonal, normal, etc. are in Sec. B.9. Eigen-
values and eigenvectors are in Sec. B.10, while Sec. B.11 investigates the
diagonalizability of a matrix. For nondiagonalizable matrices, their Jordan
and rational canonical forms are sketched in Sec. B.12. That is all!

B.1 Vector (or Linear) Spaces

Definition A vector or linear space V over a field F consists of a set


(usually, still denoted by V ) on which two operations (called addition and

scalar multiplication, respectively) are defined so that for each pair of elements x, y in V, there is a unique element

x + y (called the sum of x and y)

in V, and for each element α in F and each element x in V, there is a unique element

αx (called the scalar product of x by α)

in V, such that the following conditions hold for all x, y, z ∈ V and α, β ∈ F:
(1) Addition
(a) (commutative) x + y = y + x.
(b) (associative) (x + y) + z = x + (y + z).
(c) (zero vector) There is an element, denoted by 0, in V such that x + 0 = x.
(d) (negative or inverse vector of a vector) For each x ∈ V, there exists another element, denoted by −x, in V such that x + (−x) = 0.
(2) Scalar multiplication
(a) 1x = x.
(b) α(βx) = (αβ)x.
(3) The addition and scalar multiplication satisfy the distributive laws:

(α + β)x = αx + βx,
α(x + y) = αx + αy.
The elements of the field F are called scalars and the elements of the
vector space V are called vectors. The word “vector”, without any practical
meaning such as displacement or acting force, is now being used to describe
any element of a vector space.
If the underlying field F is the real field R or the complex field C, then
the corresponding vector space is called specifically a real or a complex
vector space, respectively.
A vector space will frequently be discussed without explicitly mentioning
its field of scalars.
Some elementary consequences of the definition of a vector space are listed as follows:

1. (cancellation law) If x, y, z ∈ V and x + z = y + z, then x = y.
2. The zero vector 0 is unique.
3. The negative −x of a vector x is unique.
4. αx = 0 ⇔ α = 0 or x = 0.
5. (−α)x = −(αx) = α(−x).
Examples
For a given field F and a positive integer n, define the set of all n-tuples with entries from F as the set

Fn = {(x1, . . . , xn) | xi ∈ F for 1 ≤ i ≤ n}.

If x = (x1, . . . , xn) and y = (y1, . . . , yn) are in Fn, define the operations of componentwise addition and scalar multiplication as

x + y = (x1 + y1, . . . , xn + yn),
αx = (αx1, . . . , αxn), α ∈ F.

Fn is then a vector space over F with

0 = (0, . . . , 0),
−x = (−x1, . . . , −xn).

The following specified vectors and the notations for them will be used throughout the whole book:

ei = (0, . . . , 0, 1, 0, . . . , 0), with 1 in the ith component, 1 ≤ i ≤ n.

In particular, we have vector spaces

Rn, Cn and Inp (p a prime), n ≥ 1.
By Sec. A.5, the set P (F) of all polynomials with coefficients from a
field F is a vector space over F.
Let X be a nonempty set and F a field. The set of all functions from X
into F

F(X, F) = {function f : X → F}
is a vector space over F under the operations f + g and αf defined by


(f + g)(x) = f (x) + g(x),
(αf )(x) = αf (x), α ∈ F, x ∈ X.
In real analysis, the set C[a, b] of continuous functions on an interval [a, b]
forms a vector space. This is an important vector space.
The set M(m, n; F) of m × n matrices with entries from a field F forms
a vector space over F. See Sec. B.4.
A nonempty subset S of a vector space V over a field F is called a vector or linear subspace of V if S is a vector space over F under the original addition and scalar multiplication defined on V. This is equivalent to saying that

1. if x and y are in S, then x + y is in S, and
2. for each α ∈ F and x ∈ S, αx ∈ S.
Then S is simply called a subspace of V .

Any vector space V has the zero subspace {0} and itself as trivial subspaces. Subspaces admit the following three operations:
1. Intersection subspace If S1 and S2 are subspaces of V, then

S1 ∩ S2 = {x ∈ V | x ∈ S1 and x ∈ S2}

is a subspace of V. Actually, for an arbitrary family {Sλ}λ∈Λ of subspaces of V, the intersection

∩λ∈Λ Sλ

is always a subspace of V.
2. Sum subspace If S1 and S2 are subspaces of V, then

S1 + S2 = {x1 + x2 | x1 ∈ S1 and x2 ∈ S2}

is a subspace which has S1 and S2 as its subspaces. In case S1 ∩ S2 = {0}, S1 + S2 is denoted by

S1 ⊕ S2

and is called the direct sum of S1 and S2.
3. Quotient space of V modulo a subspace S For any x ∈ V, the coset of S containing x is the set

x + S = {x + v | v ∈ S}.
Then x + S = y + S if and only if x − y ∈ S. The quotient set

V/S = {x + S | x ∈ V}

forms a vector space over F under the well-defined operations:

(x1 + S) + (x2 + S) = (x1 + x2) + S, and
α(x + S) = αx + S,

with 0 + S = S acting as the zero vector in it.
For example, let m and n be integers with m < n and

F̃m = {(x1, . . . , xm, 0, . . . , 0) | xi ∈ F for 1 ≤ i ≤ m}.

Then F̃m is a subspace of Fn. F̃m can be identified with Fm in the isomorphic sense (see Sec. B.7).
In R2 , let S be the subspace defined by the line ax + by = 0. Then, the
quotient space R2 /S is the set of all lines (including S itself) parallel to S,
which, in turn, can be identified with the line bx − ay = 0 in isomorphic
sense.

B.2 Main Techniques: Linear Combination, Dependence and Independence
How does one construct subspaces of a vector space? And in what simplest forms do they exist?
For any finite number of vectors x1, . . . , xk in a vector space V over a field F and any scalars α1, . . . , αk ∈ F, the finite sum

α1 x1 + · · · + αk xk, or in short Σ_{i=1}^{k} αi xi,

is called a linear combination of the vectors x1, . . . , xk with coefficients α1, . . . , αk.
Let S be any nonempty subset of V. The set of linear combinations of (finitely many) vectors in S

⟨S⟩ = {finite linear combinations of vectors in S}

is a subspace of V and is called the subspace generated or spanned by S. In case S = {x1, . . . , xk} is a finite set, we write

⟨x1, . . . , xk⟩ instead of ⟨{x1, . . . , xk}⟩.
It could happen that most of the vectors in S can be written as linear combinations of elements from a certain special subset of S. Suppose x ∈ S is of the form

x = α1 x1 + · · · + αk xk.

Then, in a problem concerning the set of vectors {x, x1, . . . , xk}, the role of x becomes inessential because it can be recovered from the remaining x1, . . . , xk through the process of linear combination. Note also that the formula can be equivalently written as

αx + α1 x1 + · · · + αk xk = 0,

where α = −1, α1, . . . , αk are not all equal to zero. This latter expression indicates implicitly that one of x, x1, . . . , xk, say x in this case, depends linearly on the others. Hence, here comes naturally the
Definition A nonempty subset S of a vector space V is said to be linearly dependent if there exist a finite number of distinct vectors x1, . . . , xk in S and scalars α1, . . . , αk, not all zero, such that

α1 x1 + · · · + αk xk = 0.
Vectors in S are also said to be linearly dependent.
If S is linearly dependent, then at least one of the vectors in S can be
written as a linear combination of other vectors in S. This vector is said
to be linearly dependent on these others. Thus, those vectors in S that are
linearly dependent on the others in S play no role in spanning the subspace ⟨S⟩ and hence can be eliminated from S without affecting the generation of ⟨S⟩. This process can be continued until what remain in S are not linearly dependent any more. These remaining vectors, if any, are linearly independent and are good enough to span the same ⟨S⟩.
Definition A nonempty subset S of a vector space V is said to be linearly independent if for any finite number of distinct vectors x1, . . . , xk in S, the relation

α1 x1 + · · · + αk xk = 0

holds if and only if the scalars α1, . . . , αk are all zero. In this case, vectors in S are said to be linearly independent.
Note the following trivial facts:

1. { 0 } is linearly dependent.
2. Any nonzero vector alone is linearly independent.
3. Let S1 ⊆ S2 ⊆ V. If S1 is linearly dependent, so is S2. Thus, if S2 is linearly independent, so is S1.

B.3 Basis and Dimension


We start from the vector space Fn stated in Sec. B.1.
Each vector x = (x1, . . . , xn) ∈ Fn can be expressed as a linear combination of e1, . . . , en as

x = x1 e1 + · · · + xn en = Σ_{i=1}^{n} xi ei.

Also, {e1, . . . , en} is linearly independent, i.e.

Σ_{i=1}^{n} xi ei = 0

would imply that x1 = · · · = xn = 0. Such a linearly independent set {e1, . . . , en} that generates the whole space Fn is called a basis for Fn, and the number n of vectors in a basis is called the dimension of Fn.
Vector spaces are divided into finite-dimensional and infinite-dimensional according to whether or not they can be generated by a finite subset of themselves.
A basis B for a vector space V is a linearly independent subset that generates V. Vectors in B are called basis vectors of V. Note that every vector in V can be expressed uniquely as a linear combination of vectors of B.
Every vector space has at least one basis.
The most powerful tool to determine if a basis exists is the following

Steinitz's Replacement Theorem Let {x1, . . . , xn} be a finite set of vectors in a vector space V. Suppose y1, . . . , ym ∈ ⟨x1, . . . , xn⟩ are linearly independent. Then

1. m ≤ n, and
2. m vectors among x1, . . . , xn, say x1, . . . , xm, can be replaced by y1, . . . , ym so that {y1, . . . , ym, xm+1, . . . , xn} still generates the same subspace ⟨x1, . . . , xn⟩, i.e.

⟨y1, . . . , ym, xm+1, . . . , xn⟩ = ⟨x1, . . . , xm, xm+1, . . . , xn⟩.
Now, suppose that V is finite-dimensional and V has at least one nonzero
vector.
Take any nonzero vector x1 ∈ V. Construct the subspace ⟨x1⟩. In case V = ⟨x1⟩, B = {x1} is a basis for V. If ⟨x1⟩ ⊊ V (this means ⟨x1⟩ ⊆ V but ⟨x1⟩ ≠ V), then there exists a vector x2 ∈ V − ⟨x1⟩, and x1, x2 are linearly independent. Construct the subspace ⟨x1, x2⟩. Then either V = ⟨x1, x2⟩ and B = {x1, x2} is a basis for V, or there exists x3 ∈ V − ⟨x1, x2⟩ so that x1, x2, x3 are linearly independent. Continue this procedure; it will stop after a finite number of steps, say n. Therefore,

V = ⟨x1, . . . , xn⟩,

where x1, . . . , xn are linearly independent. Then B = {x1, . . . , xn} is a basis for V, and is called an ordered basis if the ordering x1, . . . , xn is emphasized.
The number of basis vectors is the same, say n, for any two bases for V. This common positive integer is called the dimension of V and is denoted by

dim V = n.

In this case, V is said to be an n-dimensional vector space over a field F. If V = {0}, define dim V = 0 and V is called 0-dimensional.
If W is a subspace of V, then

dim W ≤ dim V

and equality holds if and only if W = V . Furthermore, any basis for W can
be extended to a basis B for V . In other words, there exists another vector
subspace U of V such that

V = W ⊕ U.

For any two subspaces W1 and W2 of V , where dim V is finite

dim(W1 ∩ W2 ) + dim(W1 + W2 ) = dim W1 + dim W2

holds.

If V is infinite-dimensional, it can be proved, by Zorn's lemma from set theory, that V has a basis which is a maximal linearly independent subset of V. Every basis has the same cardinality.
In an elementary book like this one, we will study finite-dimensional vector spaces only, unless otherwise specified.

Example
Let Pn (F), n ≥ 0, be as in Sec. A.5, where F is an infinite field.
The set {1, t, . . . , tn } forms a basis for Pn (F).
Take any fixed but distinct scalars a0, a1, a2, . . . , an in F. The polynomial of degree n

pi(t) = [(t − a0) · · · (t − ai−1)(t − ai+1) · · · (t − an)] / [(ai − a0) · · · (ai − ai−1)(ai − ai+1) · · · (ai − an)]
      = Π_{j=0, j≠i}^{n} (t − aj)/(ai − aj), 0 ≤ i ≤ n,

is uniquely characterized as the polynomial function pi: F → F satisfying

pi(aj) = 1 if i = j, and pi(aj) = 0 if i ≠ j.

The polynomials p0, p1, . . . , pn are called the Lagrange polynomials associated with a0, a1, . . . , an. It is easily seen that {p0, p1, . . . , pn} is a basis for the (n + 1)-dimensional vector space Pn(F). Every polynomial p ∈ Pn(F) is uniquely expressed as

p = Σ_{i=0}^{n} p(ai) pi,

which is called the Lagrange interpolation formula in the sense that, if α0, α1, . . . , αn are arbitrarily given (n + 1) scalars in F, then p = Σ_{i=0}^{n} αi pi is the unique polynomial in Pn(F) having the property that

p(ai) = αi, 0 ≤ i ≤ n.
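A numerical sketch of the interpolation formula over R (added here for illustration; the nodes and values below are made up):

```python
def lagrange(nodes, values):
    """Return p(x) with p(a_i) = alpha_i for the given distinct nodes a_i."""
    def p(x):
        total = 0.0
        for i, (ai, alpha) in enumerate(zip(nodes, values)):
            term = alpha
            for j, aj in enumerate(nodes):
                if j != i:
                    term *= (x - aj) / (ai - aj)   # the factor (x - a_j)/(a_i - a_j) of p_i
            total += term
        return total
    return p

p = lagrange([0, 1, 2], [1, 3, 7])   # the unique p in P_2(R) through (0,1), (1,3), (2,7)
assert [p(a) for a in (0, 1, 2)] == [1.0, 3.0, 7.0]
print(p(3))   # 13.0, since p(t) = t^2 + t + 1 here
```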

B.4 Matrices
Let m and n be positive integers.
An m × n matrix with entries from a field F is an ordered rectangular
array (written with brackets or parentheses) of the form

[ a11 a12 · · · a1n ]
[ a21 a22 · · · a2n ]
[  ·   ·         ·  ]
[ am1 am2 · · · amn ]

where each entry aij, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an element of F. Capital
letters such as A, B and C, etc. are used to denote matrices. The entries
ai1 , ai2 , . . . , ain of the matrix A above compose the ith row of A and is
denoted by
Ai∗ = (ai1 ai2 · · · ain ), 1≤i≤m
and is called a row matrix. Similarly, the entries a1j, a2j, . . . , amj of A compose the jth column of A, denoted by

A∗j = (a1j, a2j, . . . , amj)∗, 1 ≤ j ≤ n,
and is called a column matrix. The entry aij which lies in the ith row and
jth column is called the (i, j) entry of A. Then, the matrix A is often written
in shorthand as
Am×n or A = [aij ]m×n or (aij )m×n or A = [aij ].
Matrices are used to describe route maps in topology and networks,
and to store a large quantity of numerical data on many occasions. They
appear seemingly naturally in the treatment of some geometrical problems.
Actually, if we endow this notation with suitable operations, the static and
dynamic properties of matrices will play the core of study about finite-
dimensional vector spaces in linear algebra.
Matrices with entries in R or C are called real or complex matrices,
respectively.
Two m × n matrices A = [aij ] and B = [bij ] are defined to be equal if
and only if aij = bij for 1 ≤ i ≤ m, 1 ≤ j ≤ n and is denoted as
A = B.
The m × n matrix having each entry aij equal to zero is called the zero
matrix and is denoted by
O.
The n × m matrix obtained by interchanging the m rows and the n columns of an m × n matrix A = [aij] is called the transpose of A and is denoted by

A∗ = [bji], where bji = aij, 1 ≤ i ≤ m, 1 ≤ j ≤ n.
An n × n matrix A = [aij ]n×n is called a square matrix of order n
with aii , 1 ≤ i ≤ n, as its (main) diagonal entries.
The following are some special square matrices.

1. Diagonal matrix aij = 0 for 1 ≤ i, j ≤ n but i ≠ j, i.e. all entries off the main diagonal are zero. Such a matrix is denoted by diag[a11, . . . , ann].


2. Identity matrix In of order n This is the diagonal matrix with diagonal
entries aii = 1, 1 ≤ i ≤ n and is denoted by

 
1 0
 1 
 
In =  .. .
 . 
0 1

3. Scalar matrix αIn with α ∈ F This is the diagonal matrix with a11 = a22 = · · · = ann = α, i.e. αIn = diag[α, α, . . . , α].

4. Upper triangular matrix aij = 0 for 1 ≤ j < i ≤ n, i.e. all entries below the main diagonal are zero:

[ a11 a12 a13 · · · a1n ]
[  0  a22 a23 · · · a2n ]
[  0   0  a33 · · · a3n ]
[  ·   ·   ·   ·    ·   ]
[  0   0   0  · · · ann ]

5. Lower triangular matrix aij = 0 for 1 ≤ i < j ≤ n.
6. Symmetric matrix aij = aji, 1 ≤ i, j ≤ n, i.e. A∗ = A.
7. Skew-symmetric matrix aij = −aji, 1 ≤ i, j ≤ n, i.e. A∗ = −A. For an infinite field F or a finite field with characteristic p ≠ 2, the main diagonal entries aii of a skew-symmetric matrix are all zero.

Let A = [aij ]m×n be a complex matrix. Then Ā = [āij ]m×n is called the
conjugate matrix of A and the n × m matrix

Ā∗ = [āji ]n×m

the conjugate transpose of A.

8. Hermitian matrix A complex square matrix A satisfying

Ā∗ = A

is called Hermitian.
9. Skew-Hermitian matrix This means a complex square matrix A having
the property

Ā∗ = −A.

In this case, the main diagonal entries of A are all pure imaginaries.

Hermitian or skew-Hermitian matrices with real entries are just real sym-
metric or real skew-symmetric matrices, respectively.

Let

M(m, n; F) = {m × n matrices with entries in the field F}.

On M(m, n; F), two operations are defined as follows.

1. Addition For each pair of matrices A = [aij ], B = [bij ] ∈ M(m, n; F), the
sum A + B of A and B is the m × n matrix

A + B = [aij + bij ]m×n

obtained by entry-wise addition of corresponding entries of A and B.


2. Scalar multiplication For each α ∈ F and A = [aij ] ∈ M(m, n; F), the
scalar product of A by α is the m × n matrix

αA = [αaij ]m×n .
B.4 Matrices 703

They enjoy the following properties:

A + B = B + A;
(A + B) + C = A + (B + C);
A+O =A (O is zero m × n matrix);
A + (−A) = O (−A means (−1)A);
1A = A;
α(βA) = (αβ)A;
α(A + B) = αA + αB;
(α + β)A = αA + βA.

Therefore, M(m, n; F) is an mn-dimensional vector space over F with

Eij = the m × n matrix whose (i, j) entry is 1 and whose other entries are all 0, 1 ≤ i ≤ m, 1 ≤ j ≤ n,

as basis vectors.
Owing to the same operational properties, a 1 × n row matrix is also regarded as a row vector in Fn and vice versa, while an m × 1 column matrix is regarded as a column vector in Fm and vice versa.
In order to define matrix multiplication in a reasonable way, we start
from a simple example.
Suppose a resort island has two towns A and B. The island bus company
runs a single bus which operates on two routes:

1. from A to B in either direction;


2. a circular route from B to B along the coast.
See the network in Fig. B.1. By a single-stage journey we mean either A → B or B → A, or B → B, and this can be expressed in matrix form (rows indicate the starting town, columns the ending town):

        (to)
         A  B
(from) A [0  1]
       B [1  1]

[Fig. B.1: the island network — one road joining A and B, and a circular coastal route from B back to B.]

How many routes can a tourist choose for a two-stage journey? If the bus starts at A, the possible routes are A → B → A and A → B → B; if the bus starts at B, they are B → A → B, B → B → A and B → B → B. The situation can be described simply by using matrix notation such as

[0 1][0 1]   [1 1]
[1 1][1 1] = [1 2].
Do you see how the entries 1, 1, 1, 2 of the right-hand matrix came out? Readers are urged to try more complicated examples from daily life.
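The route-counting interpretation is easy to verify numerically (a quick check added here, assuming NumPy is available):

```python
import numpy as np

M = np.array([[0, 1],    # row = starting town, column = ending town
              [1, 1]])   # M[i][j] = number of single-stage journeys i -> j

print(M @ M)
# [[1 1]
#  [1 2]]  -- e.g. the (B, B) entry 2 counts B->A->B and B->B->B.
# More generally, the (i, j) entry of M^n counts the n-stage journeys i -> j.
```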
For an m × n matrix A = [aij]m×n and an n × p matrix B = [bjk]n×p, the product AB of A followed by B is the m × p matrix whose (i, k) entry equals the sum of the entrywise products of the ith row of A and the kth column of B, i.e.

AB = [cik]m×p, where cik = Σ_{j=1}^{n} aij bjk = Ai∗B∗k (by definition).

Note that this multiplication operation is defined only if the number of columns of the left-hand matrix A is equal to the number of rows of the right-hand matrix B; BA is not defined unless p = m. Multiplication
has the following properties:
1. (AB)C = A(BC);
2. A(B + C) = AB + AC, (A + B)C = AC + BC;
3. (Identity matrix) If A ∈ M(m, n; F), then
Im A = AIn = A.
In case m = n, the vector space M(n; F) with this operation of multiplica-
tion between pair of its elements is said to be an associative algebra with
identity In .
The problems in the following exercises are designed for the reader to
increase their abilities in the manipulation of matrices. Some of them will
be discussed in the text.

Exercises
1. Unusual properties of matrix multiplication In M(2; R), or even in M(m, n; F) with m, n ≥ 2, the following happen (refer to (1) in (1.2.4) and Sec. A.3).
(a) AB ≠ BA. For example (writing a matrix row by row, with rows separated by semicolons),

A = [1 1; 0 0] and B = [1 0; 1 0], or

A = [1 1; 0 0; 1 0] (3 × 2) and B = [1 0 0; 1 0 1] (2 × 3).

(b) There exist A ≠ O and B ≠ O for which AB = O. For example,

A = [1 0; 0 0] and B = [0 0; 0 1].

In this case, BA = O holds too.
(c) There exist A ≠ O and B ≠ O so that AB = O but BA ≠ O. For example,

A = [1 0; 0 0] and B = [0 0; 1 1].

(d) There exist A ≠ O, B and C such that AB = AC but B ≠ C. For example,

A = [1 0; 0 0], B = [0 0; 1 1] and C = [0 0; 0 1].
(e) A² = O but A ≠ O, where A² = AA. For example,

A = [0 1; 0 0].
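A NumPy verification of these counterexamples (an illustration added here, not part of the original exercise):

```python
import numpy as np

A = np.array([[1, 1], [0, 0]]); B = np.array([[1, 0], [1, 0]])
print(A @ B)            # [[2 0], [0 0]]
print(B @ A)            # [[1 1], [1 1]]  -- so AB != BA

A = np.array([[1, 0], [0, 0]]); B = np.array([[0, 0], [1, 1]])
print(A @ B)            # the zero matrix, although A != O and B != O
print(B @ A)            # [[0 0], [1 0]] != O

N = np.array([[0, 1], [0, 0]])
print(N @ N)            # the zero matrix: N^2 = O but N != O
```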
2. (continued) Do the following problems.
(a) Construct 2 × 2 matrices A ≠ O, X and Y such that AX = XA = AY = Y A but X ≠ Y.
(b) Construct 2 × 2 matrices A and B with nonzero entries such that AB = O.
(c) Construct 2 × 2 matrices A ≠ O and B ≠ O for which A² + B² = O.
(d) Construct a 2 × 2 matrix A with nonzero entries such that AB ≠ I2 and BA ≠ I2 for any 2 × 2 matrix B.
3. Diagonal matrices.
(a) Show that the following are equivalent: for a square matrix A of
order n,
(1) A commutes with all n × n matrices B, i.e. AB = BA.
(2) A commutes with all Eij , 1 ≤ i, j ≤ n.
(3) A is a scalar matrix.
(b) Any two diagonal matrices (of the same order) commute.
(c) If a matrix A commutes with a diagonal matrix diag[a1, . . . , an] where ai ≠ aj for i ≠ j, then A itself is diagonal.
(d) Show that the set V of all diagonal matrices of order n forms a
vector subspace of M(n; F) with Eii , 1 ≤ i ≤ n, as a basis. Try to
find another subspace W so that
M(n; F) = V ⊕ W.
4. Fix a matrix A ∈ M(n; F). The set of matrices that commute with A
V = {Z ∈ M(n; F) | AZ = ZA}
is a subspace.
 
(a) Let A = [1 0; −1 0]. Show that

V = { [a22 − a21  0; a21  a22] | a21, a22 ∈ F }.
Find a basis for V . What is dim V ?
(b) Let
 
A = [0 1 0; 0 0 1; 0 0 0].
Determine the corresponding V and find a basis for V .
5. Let A ∈ M(n; F) and p ≥ 1 be an integer. The pth power of A is
Ap = AA · · · A (p times)
and A⁰ is defined to be the identity matrix In. Obviously,

AᵖAq = AqAᵖ = Aᵖ⁺q and (Aᵖ)q = Aᵖq

hold for any nonnegative integers p, q. In the following, A = [aij] ∈ M(2; F).
(a) Find all such A so that A2 = O.
(b) All A so that A3 = O.
(c) All A so that Ap = O where p ≥ 3 is an integer.
6. A matrix A ∈ M(n; F) is said to be idempotent if
A2 = A.
(a) Show that the matrices

[1 0; 0 0] and [1 2 2; 0 0 −1; 0 0 1]

are idempotent.
(b) Show that

A = [2 −2 −4; −1 3 4; 1 −2 −3], B = [−1 2 4; 1 −2 −4; −1 2 4]

are idempotent and AB = O. What is BA?
(c) Determine all diagonal matrices in M(n; R) that are idempotent.


(d) If A is idempotent and p is a positive integer, then Ap = A.
(e) Suppose that AB = A and BA = B. Show that A and B are
idempotent.
(f) Suppose A and B are idempotent. Show that A + B is idempotent
if and only if AB + BA = O.
(g) Determine all real 2 × 2 matrices which are idempotent.
7. A matrix A ∈ M(n; F) is called nilpotent if there exists a positive integer
p such that
Ap = O.
Then, for any integer q ≥ p, Aq = O also holds. Hence, the least
positive integer p for which Ap = O is referred to as the degree or index
of nilpotency of A.
(a) Show that

[1 2 3; 1 2 3; −1 −2 −3],  [−4 4 −4; 1 −1 1; 5 −5 5],  [0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0]

are nilpotent of respective degrees 2, 3, 4.
(b) A square matrix A = [aij ]n×n , aij = 0 for 1 ≤ j ≤ i ≤ n, i.e.
an upper triangular matrix having zeros on the main diagonal, is
nilpotent.
(c) Suppose A is nilpotent of degree 2. Show that for any positive
integer p, A(In + A)p = A holds.
(d) If A is a nilpotent n × n matrix and A = O, show that there exists
a nonzero square matrix B so that AB = O. Try to prove the case
that n = 2.
(e) Determine all real nilpotent 2 × 2 matrices of degree 2. Find some
complex nilpotent 2 × 2 matrices.
8. Let A = [aij ] ∈ M(2; R). Solve the matrix equation
XA = O,
where X ∈ M(2; R).
9. A matrix A is called involutory if
A2 = In ,
which is equivalent to (In − A)(In + A) = O.
(a) Show that

[cos θ  sin θ; sin θ  −cos θ], θ ∈ R,

is involutory.
(b) Show that

[3 − 4a  2 − 4a  2 − 4a; −1 + 2a  2a  −1 + 2a; −3 + 2a  −3 + 2a  −2 + 2a], a ∈ C,

is involutory.
(c) Show that (1/2)(In + A) is idempotent if A is involutory.
(d) Determine all real 2 × 2 involutory matrices and find some complex
ones.
10. Find all real 2 × 2 matrices A for which A³ = I2. How much more complicated might it be if A is a complex matrix?
11. Find all real 2 × 2 matrices A such that A4 = I2 .
12. Upper and lower triangular matrices
(a) A product of two upper (or lower) triangular matrices is upper (or
lower) triangular.
(b) Show that
    
[a11 a12; a21 a22] = [1 0; b21 1][c11 c12; 0 c22].
Find out the relations among aij , b21 and cij .
(c) Show that the set of upper triangular matrices
V = {A ∈ M(n; F) | A is upper triangular}
is a vector subspace of M(n; F). Find a basis for V and another
subspace W of M(n; F) such that
M(n; F) = V ⊕ W.
(d) Do (c) for lower triangular matrices.
13. Transpose
(a) Show that the operator ∗: M(m, n; F) → M(n, m; F) defined by
∗(A) = A∗
is a linear isomorphism (see Sec. B.7), i.e. ∗ is 1-1, onto and
(A + B)∗ = A∗ + B ∗ ,
(αA)∗ = αA∗ for α ∈ F.
(b) Show that

(A∗ )∗ = A,
(AB)∗ = B ∗ A∗ .

14. Symmetric and skew-symmetric matrices


(a) Let A = [aij] ∈ M(n; R). Then

A = (1/2)(A + A∗) + (1/2)(A − A∗),

where (1/2)(A + A∗) is symmetric while (1/2)(A − A∗) is skew-symmetric.


(b) A is symmetric and skew-symmetric simultaneously if and only if
A = O.
(c) Find symmetric matrices A and B for which AB is not symmet-
ric. If A and B are symmetric such that AB = BA, then AB is
symmetric. Is AB = BA a necessary condition? Are there similar
results for skew-symmetric matrices?
(d) Suppose A and B are symmetric matrices.
(1) A + B, AB + BA and ABA are symmetric but AB − BA is
skew-symmetric.
(2) αA is symmetric for α ∈ R.
(3) Ap is symmetric for any positive integer p.
(4) P AP ∗ is symmetric for any n × n matrix P .
(5) P AP −1 is symmetric for any orthogonal n × n matrix P (see
Ex. 18 below).
(e) Show that (1), (2), (4) and (5) in (d) still hold if “symmetric” is
replaced by “skew-symmetric”. How about ABA and (3)?
(f) Let

V1 = {A ∈ M(n; R) | A = A∗}, and
V2 = {A ∈ M(n; R) | A = −A∗}.

Show that V1 is a (1/2)n(n + 1)-dimensional subspace of M(n; R), while V2 is (1/2)n(n − 1)-dimensional. Find a basis for V1 and a basis for V2. Then, prove that

M(n; R) = V1 ⊕ V2.
(g) Let

V3 = {A = [aij] ∈ M(n; R) | aij = 0 for 1 ≤ i ≤ j ≤ n}

and V1 be as in (f). Show that V3 is a (1/2)n(n − 1)-dimensional subspace of M(n; R) and

M(n; R) = V1 ⊕ V3.
15. Rank of a matrix (see Sec. B.5 and Ex. 2 of Sec. B.7)
Let A = [aij] ∈ M(m, n; F) and A ≠ O. The maximal number of linearly independent row (or column) vectors of A is defined as the row (or column) rank of A. It can be shown that, for any matrix A ≠ O,

row rank of A = column rank of A.

This common number r(A) is called the rank of A. Define the rank of O to be zero. Therefore 0 ≤ r(A) ≤ min{m, n}. Try to prove this result in case A is a 2 × 3 or 3 × 3 real matrix.
16. Invertible matrix and its inverse matrix
A matrix A = [aij ] ∈ M(n; F) is said to be invertible (refer to (2.4.2))
if there exists another n × n matrix B such that
AB = BA = In .
In this case, B is called the inverse matrix of A and is denoted by
A−1 .
(a) A−1 is unique if A is invertible, and (A−1 )−1 = A.
(b) A is invertible if and only if A∗ is invertible, and (A∗ )−1 = (A−1 )∗ .
(c) If A and B are invertible, then AB is invertible and
(AB)−1 = B −1 A−1 .
(d) The following are equivalent for A = [aij]n×n:
(1) A is invertible.
(2) The row rank of A is n.
(3) The column rank of A is n.
(4) The linear transformation f: Fn → Fn defined by f(x) = xA is a linear isomorphism (see Sec. B.7).
(5) The linear transformation g: Fn → Fn defined by g(y) = Ay (y is considered as a column vector) is a linear isomorphism (see Sec. B.7).

(6) The homogeneous equation xA = 0 (x ∈ Fn) has the zero solution 0 only. So does Ay = 0.
(7) There exists an n × n matrix B such that AB = In.
(8) There exists an n × n matrix B such that BA = In.
(9) The matrix equation XA = O has the zero solution X = O only. So does AY = O.
(10) The matrix equation XA = B always has a unique solution. So does AY = B.
(11) The determinant of A satisfies

det A ≠ 0

and thus det A⁻¹ = (det A)⁻¹.
These results extend those stated in Exs. <A> 2 and <B> 4 of
Sec. 2.4. Try to prove these in case A is a 2 × 2 or 3 × 3 real matrix.
(e) State equivalent conditions for an n × n matrix A to be not invertible; such an A is called singular.
(f) Suppose A is invertible. For positive integer p, Ap is invertible and
(Ap )−1 = (A−1 )p .
Therefore, extend the power of an invertible matrix A to negative
exponent:
A−p = (A−1 )p , p > 0.
(g) Suppose A, B ∈ M(n; F) such that AB is invertible. Then A and B
are invertible.
17. Suppose A and B are invertible. Find such A and B so that A + B is not invertible. If, in addition, A + B is invertible, then show that A⁻¹ + B⁻¹ is invertible and

(A⁻¹ + B⁻¹)⁻¹ = A(A + B)⁻¹B = B(A + B)⁻¹A.
18. Let A ∈ M(n; R) be skew-symmetric.
(a) Then In − A is invertible. Try to prove this directly if n = 2. Is aIn + A invertible for all a ≠ 0 in R?
(b) The matrix
B = (In + A)(In − A)−1
satisfies BB ∗ = B ∗ B = In , i.e. B ∗ = B −1 . A real matrix such as
B is called an orthogonal matrix.
19. Let A ∈ M(m, n; F) and B ∈ M(n, m; F).


(a) In case m = 2 and n = 1, show directly that AB is singular.
(b) Suppose m > n. Then AB is singular.
(c) If the rank r(A) = m < n, then AA∗ is invertible while A∗ A is
singular.
(d) If the rank r(A) = n < m, then AA∗ is singular while A∗ A is
invertible.
(e) AA∗ − aIn is invertible for a < 0 in R if A is a real matrix.
(f) r(AA∗ ) = r(A∗ A) = r(A).
20. Suppose A ∈ M(n; F) and there exists a positive integer k such that
Ak = O. Prove that In − A is invertible and

(In − A)−1 = In + A + A2 + · · · + Ak−1 .

21. (a) If symmetric matrix A is invertible, then A−1 is symmetric.


(b) If skew-symmetric matrix A is invertible, then so is A−1 .

22. If A = diag[a1 , a2 , . . . , an ] is a diagonal matrix, show that A is invertible


if and only if its main diagonal entries ai ≠ 0, 1 ≤ i ≤ n. In this case,

A⁻¹ = diag[a1⁻¹, a2⁻¹, . . . , an⁻¹].

23. An upper (or lower) triangular matrix is invertible if and only if its
main diagonal entries are nonzero.
24. Let A = [aij ], B = [bij ] ∈ M(2; R) or M(2; C).
(a) Compute AB and BA.
(b) Show that AB − BA = [a b; c −a] for some a, b, c ∈ R.
(c) Prove that there does not exist any α ≠ 0 in R such that

AB − BA = αI2.

(d) Show that (AB − BA)2 is a scalar matrix by direct computation.


(e) Calculate (AB − BA)n for all positive integers n ≥ 3.
25. Trace of a square matrix
The mapping tr: M(n; F) → F defined by, for A = [aij ]n×n ,

tr A = Σ_{i=1}^{n} aii,

i.e. the sum of the entries on A's main diagonal, is called the trace of A.
(a) tr is a linear transformation (functional), i.e.


tr(A + B) = tr A + tr B;
tr(αA) = αtr A
for A, B ∈ M(n; F) and α ∈ F.
(b) tr(AB) = tr(BA). In fact, if A = [aij ]n×n and B = [bij ]n×n , then

tr(AB) = Σ_{i,j=1}^{n} aij bji.

(c) If P is invertible, then tr(P AP −1 ) = tr A.


(d) tr A = tr A∗ .
26. In M(n; R), define

⟨A, B⟩ = tr(AB∗).

Then ⟨ , ⟩ has the following properties:

(1) ⟨A, A⟩ ≥ 0 with equality if and only if A = O.
(2) ⟨A, B⟩ = ⟨B, A⟩.
(3) ⟨α1A1 + α2A2, B⟩ = α1⟨A1, B⟩ + α2⟨A2, B⟩ for α1, α2 ∈ R.

It is said that ⟨ , ⟩ defines an inner product on the vector space M(n; R) (see Sec. B.9).
(Note In the complex vector space M(m, n; C), define

⟨A, B⟩ = tr(AB̄∗),

where B̄∗ = [b̄ji] is the conjugate transpose of B = [bij]. Then,

(1) ⟨A, A⟩ ≥ 0 and = 0 ⇔ A = O.
(2) ⟨A, B⟩ = the complex conjugate of ⟨B, A⟩.
(3) ⟨α1A1 + α2A2, B⟩ = α1⟨A1, B⟩ + α2⟨A2, B⟩.

And ⟨ , ⟩ is said to define an inner product on M(m, n; C).)
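A NumPy spot-check of property (1) (an illustration added here, not part of the exercise): ⟨A, A⟩ = tr(AA∗) is the sum of the squares of all entries of A, hence nonnegative.

```python
import numpy as np

A = np.array([[1., -2.], [3., 0.]])
print(np.trace(A @ A.T))   # 14.0
print((A ** 2).sum())      # 14.0, the same number: sum of squares of all entries
```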
27. Miscellanea about trace Do the following problems, at least for n = 2
or n = 3.
(a) If A, B ∈ M(n; F) and A is idempotent, then tr(AB) = tr(ABA).
(b) If tr(AB) = 0 for all matrices B ∈ M(n; F), then A = O.
(c) Suppose both tr A = 0 and tr A2 = 0 for a 2 × 2 matrix A. Then,
A2 = O holds.
(d) If A2 = O for A ∈ M(2; F), then tr A = 0.
(e) If A ∈ M(2; R) is idempotent, then tr A is an integer and 0, 1, 2 are
the only possibilities. How about idempotent matrix A ∈ M(n; R)?
(f) If A ∈ M(2; R) is symmetric and nilpotent, then A = O.
(g) If A1, A2, . . . , Ak are real n × n symmetric matrices and tr(Σ_{i=1}^{k} Ai²) = 0, then A1 = A2 = · · · = Ak = O.
(h) If tr(ABC) = tr(CBA) for all C in M(n; F), then AB = BA.
28. Suppose A ∈ M(2; R) and tr A = 0. Show that there exists an invertible
matrix P such that
 
PAP⁻¹ = [0 α; β 0]

for some α, β ∈ R. Are α, β and P unique? Justify this result if

A = [1 2; 3 −1]
and try to explain it geometrically if possible.
29. Let

V = {A ∈ M(n; F) | tr A = 0}.

Then V is an (n² − 1)-dimensional vector subspace of M(n; F), and {−E11 + Eii | 2 ≤ i ≤ n} ∪ {Eij | 1 ≤ i, j ≤ n, but i ≠ j} forms a basis for V. Also

M(n; F) = V ⊕ ⟨E11⟩.

30. In M(2; R), let

W = {AB − BA | A, B ∈ M(2; R)}.

(a) Show that W is a subspace of the V mentioned in Ex. 29.
(b) Let B = [bij]2×2. Show that

E11B − BE11 = b12E12 − b21E21,
E12B − BE12 = b21(E11 − E22) + (b22 − b11)E12,
E21B − BE21 = b12(−E11 + E22) + (b11 − b22)E21,
E22B − BE22 = −b12E12 + b21E21,

and hence show that

{E11 − E22, E12, E21}

forms a basis for W.


(c) Therefore, W = V holds. This means that, for any C ∈ M(2; R) with tr C = 0, there exist A, B ∈ M(2; R) such that

C = AB − BA.
Note For any field F and n ≥ 2,

{A ∈ M(n; F) | tr A = 0}
= {AB − BA | A, B ∈ M(n; F)}
still holds as subspaces of M(n; F). In case F is a field of characteristic 0 (i.e. 1 + 1 + · · · + 1 ≠ 0 for any finite number of 1's), such as R and C, it is not possible to find matrices A, B ∈ M(n; F) such that

AB − BA = In

(refer to Ex. 25). For a field F of characteristic p (i.e. 1 + 1 + · · · + 1 = 0 for p 1's), such as Ip = {0, 1, . . . , p − 1} where p is a prime, this does happen. For example, in I3 = {0, 1, 2}, let
   
A = [0 1 0; 0 0 1; 0 0 0] and B = [0 0 0; 1 0 0; 0 2 0].

Then

AB − BA = [1 0 0; 0 2 0; 0 0 0] − [0 0 0; 0 1 0; 0 0 2] = [1 0 0; 0 1 0; 0 0 −2] = I3,

since −2 = 1 in I3.
31. Similarity
Two square matrices A, B ∈ M(n; F) are said to be similar if there
exists an invertible matrix P ∈ M(n; F) such that
B = P AP −1 (or A = P −1 BP )
(refer to (2.7.25)). Use A ∼ B to denote that A is similar to B.
(a) Similarity is an equivalence relation (see Sec. A.1) among matrices of the same order. That is,
(1) A ∼ A.
(2) A ∼ B ⇒ B ∼ A.
(3) A ∼ B and B ∼ C ⇒ A ∼ C.
(b) Similarity provides a useful tool to study the geometric behavior
of linear or affine transformations by suitable choices of bases (see
Secs. 2.7.6, 3.7.6, etc.). Algebraically, similarity has many advan-
tages in computation. For example,
(1) Orthogonal similarity keeps symmetric and skew-symmetric


properties of matrices. (See Ex. 6(c) (4)–(6) of Sec. B.9.)
(2) Similarity preserves the determinants of matrices, i.e.
det A = det B.
(3) Similarity raises the power of a matrix easier, i.e.
B n = PAn P −1 ,
where one of An and B n such as diagonal matrices can be easily
computed.
(4) If B = P AP −1 is a scalar matrix, then so is A and AB = BA
holds.
(c) Fibonacci sequence a0 , a1 , a2 , . . . , an , . . . is defined by a
(1) recursive equation: an = an−2 + an−1 , n ≥ 2 with a
(2) boundary condition: a0 = a1 = 1.
How to determine explicitly the general term an, which is expected to be a positive integer? In matrix form,

(an−1 an) = (an−2 an−1)[0 1; 1 1] = (an−3 an−2)[0 1; 1 1]² = · · · = (a0 a1)[0 1; 1 1]ⁿ⁻¹, n ≥ 2.

The problem reduces to computing Aⁿ where

A = [0 1; 1 1].

Does Aⁿ seem to be easily computed? If yes, try it. Fortunately, it can be shown (see Ex. 8 of Sec. B.9 and Sec. B.11) that:
A = [1  (1+√5)/2; 1  (1−√5)/2]⁻¹ [(1+√5)/2  0; 0  (1−√5)/2] [1  (1+√5)/2; 1  (1−√5)/2]

⇒ Aⁿ = [1  (1+√5)/2; 1  (1−√5)/2]⁻¹ [((1+√5)/2)ⁿ  0; 0  ((1−√5)/2)ⁿ] [1  (1+√5)/2; 1  (1−√5)/2].

By simple computation, the general term is

an = (1/√5) [((1 + √5)/2)ⁿ⁺¹ − ((1 − √5)/2)ⁿ⁺¹], n ≥ 0.

Is this really a positive integer? (A numerical check appears right after this exercise.)


(d) Not every square matrix can be similar to a diagonal matrix. For example, there does not exist an invertible matrix P such that

P [1 0; 1 1] P⁻¹

is a diagonal matrix. Why? Is

[1 0; 1 1]ⁿ = [1 0; n 1], n ≥ 1,

correct?
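As a numerical check of Ex. 31(c) (an illustration added here, assuming NumPy is available): the matrix power reproduces the Fibonacci numbers exactly, and Binet's closed formula agrees with it.

```python
import numpy as np

A = np.array([[0, 1], [1, 1]], dtype=object)   # object dtype keeps exact integers

def fib(n):
    """a_n with a_0 = a_1 = 1: after n multiplications, v = (a_n, a_{n+1})."""
    v = np.array([1, 1], dtype=object)
    for _ in range(n):
        v = v.dot(A)
    return v[0]

print([fib(n) for n in range(8)])   # [1, 1, 2, 3, 5, 8, 13, 21]

s5 = 5 ** 0.5
binet = lambda n: round((((1 + s5) / 2) ** (n + 1) - ((1 - s5) / 2) ** (n + 1)) / s5)
assert all(binet(n) == fib(n) for n in range(20))
```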
32. Necessary condition for a matrix to be similar to a diagonal matrix
Let A = [aij ] ∈ M(n; F) and P be an invertible matrix so that
 
PAP⁻¹ = diag[λ1, . . . , λn].
Fix i, 1 ≤ i ≤ n, then

PAP⁻¹ − λiIn = PAP⁻¹ − P(λiIn)P⁻¹ = P(A − λiIn)P⁻¹
             = diag[λ1 − λi, . . . , λi−1 − λi, 0, λi+1 − λi, . . . , λn − λi]

⇒ det(A − λiIn) = 0, 1 ≤ i ≤ n.

This means that the entries λi, 1 ≤ i ≤ n, of the diagonal matrix diag[λ1, . . . , λn] are zeros of the polynomial

det(A − tIn) = det[a11 − t  a12  · · ·  a1n; a21  a22 − t  · · ·  a2n; . . . ; an1  an2  · · ·  ann − t]
             = (−1)ⁿ tⁿ + αn−1 tⁿ⁻¹ + · · · + α1 t + α0,


which is called the characteristic polynomial of A. Note that, if det(A − λIn) = 0, then the homogeneous equation

x(A − λIn) = 0, i.e. xA = λx,

has nonzero solutions (see Ex. 16(d)(11)). Such a nonzero x is called an eigenvector corresponding to the eigenvalue λ. In particular, if xi is the ith row vector of P, then xi ≠ 0 and

xiA = λixi, 1 ≤ i ≤ n.
(a) Justify the above by the example

[3 2; 2 −3] [5 12; 12 −5] [3 2; 2 −3]⁻¹ = [13 0; 0 −13].
(b) Suppose that there exists an invertible matrix P = [pij] such that

P [1 2; 3 4] P⁻¹ = [(5 + √33)/2  0; 0  (5 − √33)/2].

Determine pij, 1 ≤ i, j ≤ 2. Is such a P unique? If yes, why? If no, how many of them are there?
(c) Suppose

[2 7; 0 1] [a11 a12; a21 a22] [2 7; 0 1]⁻¹ = [1 0; 0 −1].

Determine aij, 1 ≤ i, j ≤ 2.
 
(d) Can you say more precisely than you did in Ex. 31(d) why [1 0; 1 1] is not similar to a diagonal matrix?

B.5 Elementary Matrix Operations and Row-Reduced Echelon Matrices
Based on the essence of elimination method of variables in solving systems
of linear equations, we define three types of elementary row (or column)
operations on matrices as follows:
Type 1: Interchanging any two rows (or columns) of a matrix.
Type 2: Multiplying any one row (or column) of a matrix by a nonzero
constant.
Type 3: Adding any constant multiple of a row (or column) of a matrix to
another row (or column).
An n × n elementary matrix of type 1, 2 or 3 is a matrix obtained by


performing an elementary operation on In , of type 1, 2 or 3 respectively.
We adopt the following notations for elementary matrices:

Type 1: E(i)(j) (interchanging the ith row and the jth row, i ≠ j);
Type 2: Eα(i) (multiplying the ith row by α ≠ 0);
Type 3: E(j)+α(i) (adding the α multiple of the ith row to the jth row);

and F(i)(j), Fα(i), F(j)+α(i) for the corresponding elementary matrices obtained by performing elementary column operations. For example, the 2 × 2 elementary matrices are:

E(1)(2) = [0 1; 1 0];  Eα(1) = [α 0; 0 1];  Eα(2) = [1 0; 0 α];
E(2)+α(1) = [1 0; α 1];  E(1)+α(2) = [1 α; 0 1].

Some 3 × 3 elementary matrices are:

E(1)(3) = [0 0 1; 0 1 0; 1 0 0];  Eα(2) = [1 0 0; 0 α 0; 0 0 1];
E(3)+α(1) = [1 0 0; 0 1 0; α 0 1] = F(1)+α(3);  E(1)+α(2) = [1 α 0; 0 1 0; 0 0 1] = F(2)+α(1).

The following are some basic properties of elementary matrices:

1. E(i)(j) = F(i)(j) ; Eα(i) = Fα(i) ; E(j)+α(i) = F(i)+α(j) .


2. The determinants (see Sec. B.6) are det E(i)(j) = −1, det Eα(i) = α;
det E(j)+α(i) = 1.
3. Elementary matrices are invertible. In particular,
−1 −1 −1
E(i)(j) = E(j)(i) ; Eα(i) = E α1 (i) ; E(j)+α(i) = E(j)−α(i) .

4. The matrix obtained by performing an elementary row operation on a


given matrix A, of the respective type E(i)(j) , Eα(i) or E(j)+α(i) , is

E(i)(j) A, Eα(i) A or E(j)+α(i) A,

respectively.
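Property 4 is easy to see in action (a small NumPy illustration added here, not from the text): performing a row operation on A is the same as multiplying A on the left by the corresponding elementary matrix.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

E_swap = np.array([[0, 1], [1, 0]])    # E_(1)(2)
E_scale = np.array([[1, 0], [0, 5]])   # E_5(2)
E_add = np.array([[1, 0], [-3, 1]])    # E_(2)+(-3)(1)

print(E_swap @ A)    # the two rows of A interchanged
print(E_scale @ A)   # second row multiplied by 5
print(E_add @ A)     # [[1 2], [0 -2]]: -3 times row 1 added to row 2
```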
Echelon and row-reduced echelon matrices


Given a matrix A = [aij] ∈ M(m, n; F).
The first nonzero entry of a row is called its leading entry.
A matrix is called an echelon matrix if

1. the leading entries move to the right in successive rows,


2. the entries of the column passing a leading entry are all zero below that
leading entry, and
3. all zero rows, if any, are at the bottom of the matrix.

For example, the following are echelon matrices:

 
[0 −1 3 1]      [2 0 1 5 0]
[0 0 2 1],      [0 −1 3 4 0]
[0 0 0 1]       [0 0 0 −2 1]
                [0 0 0 0 4]
                [0 0 0 0 0].

An echelon matrix is called row-reduced if

1. Every leading entry is 1, and


2. the entries of the column passing a leading entry are all zero above that
leading entry.

For example, of the two matrices

[0 1 −3 0 1/2]      [0 0 1 −11]
[0 0 0 1  2 ],      [1 0 0  17]
[0 0 0 0  0 ]       [0 1 0  −5],
the former is a row-reduced echelon matrix, while the latter is not.


The leading entry of a certain row of a matrix is called a pivot if there is no leading entry above it in the same column. A column of an echelon matrix A in which a pivot appears is called a pivot column, and the corresponding variable in the equation Ax∗ = 0, where x ∈ Fn, is a pivot variable, while all remaining variables are called free variables.
Given a nonzero matrix A ∈ M(m, n; F). After performing a finite number of elementary row operations on A, A can be reduced to a matrix of the following type: there exists a unique positive integer r with 1 ≤ r ≤ min{m, n} and
a sequence of positive integers 1 ≤ k1 < k2 < · · · < kr ≤ n such that

        k1-th       k2-th             kr-th
R = [ 0···0 1 ∗···∗ 0 ∗···∗ ···  0 ∗···∗ ]   ← 1st row
    [ 0···0 0 0···0 1 ∗···∗ ···  0 ∗···∗ ]   ← 2nd row
    [   ·                              ·  ]
    [ 0···0 0 0···0 0 0···0 ···  1 ∗···∗ ]   ← rth row
    [ 0···0 0 0···0 0 0···0 ···  0 0···0 ]   ← rows r + 1, . . . , m      (∗)

where each ∗ could be any scalar and all other entries are zero: the ith row (1 ≤ i ≤ r) has its leading 1 in the ki-th column, each pivot column has zeros in every other position, and the last m − r rows are zero rows. Such an R is unique once A is given, and is called the row-reduced echelon matrix of A.
The following are some basic results about row-reduced echelon
matrices:

1. There exists an invertible m × m matrix P, which is a product of finitely many elementary matrices, such that

PA = R
is the row-reduced echelon matrix.


2. The first r rows R1∗ , . . . , Rr∗ of R are linearly independent. The k1 th
column R∗k1 , . . . , kr th column R∗kr are linearly independent. Therefore,
owing to the invertibility of P , the k1 th column vector A∗k1 , . . . , kr th
column vector A∗kr are linearly independent. Since R has rank r, so

r(A) = r(R) = r.

3. Suppose k ≠ k1, . . . , kr and 1 ≤ k ≤ n. If the kth column vector R∗k of R is the linear combination R∗k = α1R∗k1 + · · · + αrR∗kr, then the kth
column vector A∗k has the linear combination

A∗k = α1 A∗k1 + · · · + αr A∗kr .

This is true because PA∗j = R∗j for 1 ≤ j ≤ n. Let P = [pij]m×m. Then, for r + 1 ≤ i ≤ m, Pi∗A = Σ_{j=1}^{m} pij Aj∗ = Ri∗ = 0 holds.
4. Normal form of a matrix Suppose Am×n ≠ O. Then, there exist an invertible matrix Pm×m and an invertible matrix Qn×n (each a product of finitely many elementary matrices) such that

PAQ = [Ir 0; 0 0],

where r = r(A).
Readers should try their best to give proofs of the above-mentioned results,
at least for 2 × 2 or 3 × 3 or 2 × 3 matrices.

Applications of the row-reduced echelon matrix (a few supplements)

Application 1 To solve a system of homogeneous linear equations
For A ∈ M(m, n; F) and x ∈ Fn, considered as a column vector (here and in Application 2 only), the matrix equation

Ax = 0, where A = [aij]m×n and x = (x1, . . . , xn)∗,
represents a system of m homogeneous linear equations in n unknowns x1, . . . , xn. Let PA = R be row-reduced. Since P is invertible, Ax = 0 is consistent with Rx = 0, i.e. both have the same set of solutions. Let r = r(A) ≥ 1.
1. Let y1, . . . , yn−r be those unknowns among x1, . . . , xn other than xk1, . . . , xkr. Note that y1, . . . , yn−r are free variables.
2. Write out Rx = 0 as

xki = Σ_{j=1}^{n−r} bij yj, 1 ≤ i ≤ r.

3. For each j, 1 ≤ j ≤ n − r, let yj = 1 but yk = 0 if k ≠ j, 1 ≤ k ≤ n − r. The resulting fundamental solution is

vj = (0, . . . , 0, b1j, 0, . . . , 0, b2j, 0, . . . , 0, brj, 0, . . . , 0),

where bij occupies the ki-th component (and the component corresponding to yj equals 1).
4. The vectors v1, . . . , vn−r are linearly independent. The general solution is a linear combination of v1, . . . , vn−r such as

v = Σ_{j=1}^{n−r} yj vj, y1, . . . , yn−r ∈ F.
5. Therefore, the solution space

V = {x ∈ Fn | Ax = 0}

is an (n − r(A))-dimensional subspace of Fn.
Application 2 To solve a system of non-homogeneous linear equations
For a given column vector b ∈ Fm, the matrix equation

Ax = b

is a system of m non-homogeneous linear equations in n unknowns x1, . . . , xn. Ax = b is consistent with Rx = Pb. Suppose r(A) = r.

1. Perform P on the augmented m × (n + 1) matrix [A | b] to obtain

P[A | b] = [PA | Pb] = [R | Pb].
2. Thus, Ax = b has a solution
⇔ r(A) = r([A | b])
⇔ the last (m − r) components P(r+1)∗b, . . . , Pm∗b of the column vector Pb are all equal to zero.

3. In case a solution exists, write Rx = Pb as

xki = Pi∗b + Σ_{j=1}^{n−r} bij yj, 1 ≤ i ≤ r.

4. Letting y1 = · · · = yn−r = 0, a particular solution is

v0 = (0, . . . , 0, P1∗b, 0, . . . , 0, P2∗b, 0, . . . , 0, Pr∗b, 0, . . . , 0),

where Pi∗b occupies the ki-th component.
5. The general solution is

v = v0 + Σ_{j=1}^{n−r} yj vj, y1, . . . , yn−r ∈ F.

6. Therefore, the set of solutions

{x ∈ Fn | Ax = b}

is the (n − r(A))-dimensional affine subspace of Fn

v0 + {x ∈ Fn | Ax = 0} = v0 + V.
See Fig. B.2.

[Fig. B.2: the affine subspace v0 + V, pictured as the translate, by the particular solution v0, of the vector (solution) space V through 0.]

For example, let

A = [2 3 1 4 −9; 1 1 1 1 −3; 1 1 1 2 −5; 2 2 2 3 −8].

Then A has row-reduced echelon matrix

R = [1 0 2 0 −2; 0 1 −1 0 1; 0 0 0 1 −2; 0 0 0 0 0].
Do the following problems:
1. Find invertible matrix P , expressed as a product of finitely many 4 × 4
elementary matrices, so that P A = R.
2. Find invertible matrix Q, expressed as a product of finitely many 5 × 5
elementary matrices, so that
 
PAQ = [I3 0; 0 0].

3. Solve Ax = 0.
4. Find a necessary and sufficient condition for Ax = b to have a solution.
5. In case b = (17, 6, 8, 14)∗, show that Ax = b has a solution. Then, find a particular solution of it and write out the solution set as an affine subspace of R5.
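The example above is easy to check by machine (an illustration added here, assuming SymPy is available); rref() returns the row-reduced echelon matrix together with the pivot columns k1, . . . , kr:

```python
from sympy import Matrix

A = Matrix([[2, 3, 1, 4, -9],
            [1, 1, 1, 1, -3],
            [1, 1, 1, 2, -5],
            [2, 2, 2, 3, -8]])

R, pivots = A.rref()
print(R)               # the matrix R displayed above
print(pivots)          # (0, 1, 3): columns k1 = 1, k2 = 2, k3 = 4 (1-based), so r(A) = 3
print(A.nullspace())   # the fundamental solutions of A x = 0
```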
Application 3 To determine the rank of a nonzero matrix
Application 4 To determine the invertibility of a square matrix and how


to compute the inverse matrix of an invertible matrix
A square matrix A is invertible if and only if its row-reduced echelon matrix
is the identity matrix, i.e.
R = In .
Thus, just perform a finite sequence of elementary row operations on the n × 2n matrix [A | In] until the row-reduced echelon matrix R of A comes out, such as

[A | In] → [E1A | E1] → [E2E1A | E2E1] → · · · → [EkEk−1 · · · E2E1A | EkEk−1 · · · E2E1] = [PA | P] = [R | P],

where P = EkEk−1 · · · E2E1.
If R ≠ In, then A is definitely not invertible. In case R = In, then A is invertible and, at the same time,

A⁻¹ = P = EkEk−1 · · · E2E1.

Application 5 To express an invertible matrix as a product of finitely many elementary matrices
See Application 4; then

A = E1⁻¹E2⁻¹ · · · Ek−1⁻¹Ek⁻¹.

Application 6 To compute the determinant of an invertible matrix
See Application 5; then

det A = det E1⁻¹ · det E2⁻¹ · · · det Ek−1⁻¹ · det Ek⁻¹.
For example,

[1 1 1 −3]   [1 0 0 0] [1 0 0 0] [1 1 0 0] [1 0 1 0] [1 0 0 −3]
[0 1 0  0] = [0 1 0 0]·[0 1 0 0]·[0 1 0 0]·[0 1 0 0]·[0 1 0  0]
[1 1 2 −3]   [0 0 1 0] [1 0 1 0] [0 0 1 0] [0 0 1 0] [0 0 1  0]
[2 2 4 −5]   [0 0 2 1] [0 0 0 1] [0 0 0 1] [0 0 0 1] [0 0 0  1]

and det A = 1, each factor being an elementary matrix of type 3 with determinant 1.
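A NumPy check of this factorization (added here for illustration): the five elementary matrices multiply out to A, and the determinant is indeed 1.

```python
import numpy as np

E1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,2,1]])   # E_(4)+2(3)
E2 = np.array([[1,0,0,0],[0,1,0,0],[1,0,1,0],[0,0,0,1]])   # E_(3)+1(1)
E3 = np.array([[1,1,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])   # E_(1)+1(2)
E4 = np.array([[1,0,1,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])   # E_(1)+1(3)
E5 = np.array([[1,0,0,-3],[0,1,0,0],[0,0,1,0],[0,0,0,1]])  # E_(1)+(-3)(4)

A = E1 @ E2 @ E3 @ E4 @ E5
print(A)                         # [[1 1 1 -3], [0 1 0 0], [1 1 2 -3], [2 2 4 -5]]
print(round(np.linalg.det(A)))   # 1
```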
Application 7 See Sec. 5.9.2 and Ex. 2(f ) of Sec. B.7.


Application 8 To compute the row space R(A), the left kernel N (A), the
column space R(A∗ ) and the right kernel N (A∗ ) to be introduced in Sec. B.8
Adopt the notation in (∗). Then

R(A) = ⟨R1∗, . . . , Rr∗⟩;
R(A∗) = ⟨A∗k1, . . . , A∗kr⟩;
N(A) = ⟨Pr+1,∗, . . . , Pm∗⟩, where PA = R;
N(A∗) = ⟨v1, . . . , vn−r⟩, where v1, . . . , vn−r are the fundamental solutions of Ax∗ = 0 in Application 1 above.
In fact, we are able to determine these four subspaces once we reduce the
original matrix A to its echelon form S, i.e. QA = S for some invertible
matrix Qm×m .

B.6 Determinants
Let A = [a_{ij}] be an n × n matrix with entries from a field F. The determinant of A is an element of F, denoted by
det A,
which can be defined inductively on n as follows:

1. n = 1. The determinant det A of order 1 is
det A = a₁₁.
2. n = 2. The determinant det A of order 2 is
$$\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$
Suppose the determinant of order n − 1 has been defined and A = [a_{ij}]_{n×n} for n ≥ 3 is given. Let A_{ij} be the (n − 1) × (n − 1) matrix obtained from A by deleting the ith row and the jth column. The already-defined determinant det A_{ij} of order n − 1 is called the minor of the (i, j)-entry a_{ij} in the matrix A, and

(−1)^{i+j} det A_{ij},  1 ≤ i, j ≤ n

is called the cofactor of a_{ij} in A.

3. The determinant det A of order n is defined as
$$\det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}$$
and is called the expansion of the determinant det A along the ith row.

This inductive definition for determinants provides, at the same time, a technique for computing a determinant.
[3] gives five different definitions for determinants. One of these defini-
tions is the following
Characteristic Properties of Determinants
There exists a unique function det : M(n; F) → F satisfying the following
properties:

1. (multilinear) det is a linear function of each row of an n × n matrix when the remaining n − 1 rows are held fixed, i.e.
$$\det \begin{pmatrix} A_{1*} \\ \vdots \\ \alpha A_{i*} + B_{i*} \\ \vdots \\ A_{n*} \end{pmatrix} = \alpha \det \begin{pmatrix} A_{1*} \\ \vdots \\ A_{i*} \\ \vdots \\ A_{n*} \end{pmatrix} + \det \begin{pmatrix} A_{1*} \\ \vdots \\ B_{i*} \\ \vdots \\ A_{n*} \end{pmatrix}$$
for 1 ≤ i ≤ n and α ∈ F.
2. (alternating) If Bn×n is obtained by interchanging any two rows of an
n × n matrix A, then

det B = − det A.

3. (unit) For the n × n identity matrix In,

det In = 1.

Try to prove this result for n = 2 or n = 3.



From the very definition for determinants, the following basic properties
can be deduced.

1. The determinants of elementary matrices (see Sec. B.5) are

det E(i)(j) = −1;


det Eα(i) = α;
det E(j)+α(i) = 1.

2. An×n is not invertible (i.e. singular) if and only if

det A = 0

and hence is equivalent to the rank r(A) < n.


3. The det A can be expanded along any row, i.e.
$$\det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}, \quad 1 \le i \le n.$$

4. The interchange of rows and columns does not change the value of
det A, i.e.

det A∗ = det A.

Thus, det A can be expanded along any column.

5. For A = [a^{(1)}_{ij}]_{n×n}, B = [a^{(2)}_{ij}]_{n×n} and α ∈ F,
$$\det(\alpha A) = \alpha^n \det A;$$
$$\det(A + B) = \det\big[a^{(1)}_{ij} + a^{(2)}_{ij}\big] = \sum_{k_1,\ldots,k_n=1}^{2} \begin{vmatrix} a^{(k_1)}_{11} & a^{(k_2)}_{12} & \cdots & a^{(k_n)}_{1n} \\ \vdots & \vdots & & \vdots \\ a^{(k_1)}_{n1} & a^{(k_2)}_{n2} & \cdots & a^{(k_n)}_{nn} \end{vmatrix}$$
(a sum of 2ⁿ terms)

6. Let A = [a_{ij}]_{m×n} and B = [b_{ij}]_{n×m}.

If m = n: det AB = det A · det B.
If m > n: det AB = 0.
If m < n:
$$\det AB = \sum_{1 \le j_1 < j_2 < \cdots < j_m \le n} \begin{vmatrix} a_{1j_1} & \cdots & a_{1j_m} \\ \vdots & & \vdots \\ a_{mj_1} & \cdots & a_{mj_m} \end{vmatrix} \cdot \begin{vmatrix} b_{j_1 1} & \cdots & b_{j_1 m} \\ \vdots & & \vdots \\ b_{j_m 1} & \cdots & b_{j_m m} \end{vmatrix}$$
(a sum of $\binom{n}{m}$ terms)

This is called the Cauchy–Binet formula. A numeric spot-check is sketched below.
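The formula is easy to test numerically. The following sketch assumes NumPy (not part of this text) and sums the products of the m × m subdeterminants over all column selections.

```python
# A sketch, assuming NumPy: spot-check the Cauchy-Binet formula for m = 2, n = 3.
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3
A = rng.integers(-3, 4, size=(m, n)).astype(float)
B = rng.integers(-3, 4, size=(n, m)).astype(float)

lhs = np.linalg.det(A @ B)
rhs = sum(np.linalg.det(A[:, list(cols)]) * np.linalg.det(B[list(cols), :])
          for cols in itertools.combinations(range(n), m))
assert np.isclose(lhs, rhs)   # det AB = sum of C(n, m) = 3 products
```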


To extend the expansion of a determinant along a row or a column to an expansion along several rows or columns, say along the i₁th, . . . , i_kth rows, where 1 ≤ i₁ < · · · < i_k ≤ n, let
$$A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix} = \det \begin{pmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_k} \\ \vdots & & \vdots \\ a_{i_k j_1} & \cdots & a_{i_k j_k} \end{pmatrix}, \quad 1 \le j_1 < \cdots < j_k \le n$$
denote a subdeterminant of order k of det A; it is called principal if i_l = j_l for 1 ≤ l ≤ k. Use i_{k+1}, . . . , i_n, where 1 ≤ i_{k+1} < · · · < i_n ≤ n, to denote those integers among 1, 2, . . . , n that are different from i₁, . . . , i_k. Similarly, j_{k+1}, . . . , j_n are those integers among 1, 2, . . . , n that are different from j₁, . . . , j_k. Then, the subdeterminant of order n − k
$$\tilde{A}\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix} = (-1)^{i_1+\cdots+i_k+j_1+\cdots+j_k} A\begin{pmatrix} i_{k+1} & \cdots & i_n \\ j_{k+1} & \cdots & j_n \end{pmatrix}$$
is called the cofactor of $A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix}$ in det A. By using 5, we have

7. Laplace expansion Given an n × n matrix A = [a_{ij}]. For any positive integer 1 ≤ k ≤ n and 1 ≤ i₁ < · · · < i_k ≤ n,
$$\det A = \sum_{1 \le j_1 < \cdots < j_k \le n} A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix} \tilde{A}\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix}.$$

In most cases, it is not easy to calculate the value of a determinant of order n ≥ 4 by hand. A practical alternative is to use elementary row operations, the so-called Gaussian elimination method, to reduce the n × n matrix to an upper triangular matrix, whose determinant is the product of its main diagonal entries. Also, see Application 6 of Sec. B.5.

The Laplace expansion formula shows that
$$\det \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} = \det A_{11} \cdot \det A_{22},$$
where A₁₁ and A₂₂ are square matrices. Furthermore, if A₁₁ is invertible, then by use of the identity
$$\begin{pmatrix} I_{n_1} & 0 \\ -A_{21}A_{11}^{-1} & I_{n_2} \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ 0 & -A_{21}A_{11}^{-1}A_{12} + A_{22} \end{pmatrix},$$
it follows that
$$\det \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \det A_{11} \cdot \det(-A_{21}A_{11}^{-1}A_{12} + A_{22}).$$
Therefore, if det A₁₁ ≠ 0 and A₁₁A₂₁ = A₂₁A₁₁ holds, then
$$\det \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \det(A_{11}A_{22} - A_{21}A_{12}).$$
Let A = [a_{ij}]_{n×n}. By basic properties 2 and 3 above, the identities
$$\sum_{j=1}^{n} (-1)^{i+j} a_{kj} \det A_{ij} = \delta_{ik} \det A, \quad 1 \le i, k \le n \tag{*}$$
hold. To put these identities in a compact form, we define the adjoint matrix of A as
adj A = [b_{ij}]_{n×n},  b_{ij} = (−1)^{j+i} det A_{ji},  1 ≤ i, j ≤ n.
Thus, (∗) can be written as a single identity
A · adj A = adj A · A = (det A) I_n.
We conclude that a square matrix A is invertible if and only if det A ≠ 0. In this case, the inverse matrix is
$$A^{-1} = \frac{1}{\det A}\, \operatorname{adj} A.$$
Let A = [a_{ij}]_{n×n}. The system of linear equations
A x = b
in n unknowns x₁, . . . , x_n, where x = (x₁, . . . , x_n)∗ ∈ F^n is the n × 1 column vector and b = (b₁, . . . , b_n)∗ ∈ F^n, has a unique solution if and only if det A ≠ 0. Let X_k be the matrix obtained from I_n by replacing its kth column by x. Then
$$AX_k = A[e_1 \cdots e_{k-1}\ x\ e_{k+1} \cdots e_n] = [Ae_1 \cdots Ae_{k-1}\ Ax\ Ae_{k+1} \cdots Ae_n] = [A_{*1} \cdots A_{*,k-1}\ b\ A_{*,k+1} \cdots A_{*n}]$$
$$\Rightarrow \det AX_k = \det A \cdot \det X_k = x_k \det A = \det[A_{*1} \cdots A_{*,k-1}\ b\ A_{*,k+1} \cdots A_{*n}]$$
$$\Rightarrow x_k = \frac{1}{\det A} \det[A_{*1} \cdots A_{*,k-1}\ b\ A_{*,k+1} \cdots A_{*n}], \quad 1 \le k \le n,$$
where [A_{∗1} · · · A_{∗,k−1} b A_{∗,k+1} · · · A_{∗n}] is the matrix obtained from A by replacing its kth column A_{∗k} by b. This is Cramer's Rule for the solution of the equation A x = b. Note that, in this case, x = A^{-1} b.
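Cramer's Rule is as easy to program as it is to state. The sketch below assumes NumPy (not part of this text) and a small 2 × 2 system chosen only for illustration.

```python
# A sketch, assuming NumPy: each unknown x_k is the determinant of A with its
# kth column replaced by b, divided by det A.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

det_A = np.linalg.det(A)          # nonzero here, so the solution is unique
x = np.empty(2)
for k in range(2):
    Ak = A.copy()
    Ak[:, k] = b                  # replace the kth column A_{*k} by b
    x[k] = np.linalg.det(Ak) / det_A

assert np.allclose(A @ x, b)      # agrees with x = A^{-1} b
```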

B.7 Linear Transformations and Their Matrix Representations

Definition Suppose V and W are vector spaces over the same field F. A function (see Sec. A.2)
f: V → W
is called a linear transformation or mapping from V into W if it preserves the linear structures of vector spaces, i.e. for any x, y ∈ V and α ∈ F, the following properties hold:

1. f(αx) = αf(x).
2. f(x + y) = f(x) + f(y).

If, in addition, f is both one-to-one and onto, then f is called a linear isomorphism from V onto W, and V and W are said to be isomorphic. In case W = V, a linear transformation f: V → V is specially called a linear operator; while if W = F, f: V → F is called a linear functional.

Remark
In general situations, both conditions 1 and 2 in the definition are indepen-
dent of each other and hence are needed simultaneously in the definition of
a linear transformation.
For example, define a mapping f: R² (vector space) → R² by
$$f(x) = \begin{cases} x, & \text{if } x_1 x_2 \ge 0 \\ -x, & \text{if } x_1 x_2 < 0, \end{cases}$$
where x = (x₁, x₂) ∈ R². Then f obviously satisfies condition 1 but not 2, and hence is not linear.

When R is considered as a vector space over the rational field Q, R is an infinite-dimensional vector space and has a basis B (see Sec. B.3). Let x₁ and x₂ be distinct elements of B, and define τ: B → R by
$$\tau(x) = \begin{cases} x_1, & \text{if } x = x_2, \\ x_2, & \text{if } x = x_1, \\ x, & \text{otherwise.} \end{cases}$$
Then, there exists a linear transformation f: R (over Q) → R such that f(x) = τ(x) for all x ∈ B. Of course, f is additive, i.e. condition 2 in the definition holds; but for α = x₂/x₁, which is an irrational number, f(αx₁) = f(x₂) = τ(x₂) = x₁ ≠ (x₂/x₁) · x₂ = αf(x₁).

It is worth mentioning that a continuous, or even just bounded, additive mapping f: R (over R itself) → R must be linear.

The existence of linear transformations

The function 0: V → W mapping every vector in V into the zero vector 0 in W, i.e.
0(x) = 0, x ∈ V,
is a linear transformation and is called the zero transformation.

For simplicity, suppose that V is finite-dimensional and dim V = m. Choose any fixed basis B = {x₁, . . . , x_m} for V and any m vectors y₁, . . . , y_m in W. There exists exactly one linear transformation f: V → W such that
f(x_i) = y_i, 1 ≤ i ≤ m.
All we need to do is to define a function f: V → W by assigning f(x_i) = y_i, 1 ≤ i ≤ m, and then extending it linearly to all vectors $\sum_{i=1}^m \alpha_i x_i$ in V by
$$f\left(\sum_{i=1}^{m} \alpha_i x_i\right) = \sum_{i=1}^{m} \alpha_i y_i.$$
This result still holds for an infinite-dimensional space V.


The set of linear transformations from V into W,
L(V, W) or Hom(V, W),
forms a vector space over F, where (f + g)(x) = f(x) + g(x) and (αf)(x) = αf(x), x ∈ V. Moreover, dim L(V, W) = mn if dim V = m and dim W = n (see Ex. 6).



Suppose f ∈ L(V, W ) and g ∈ L(W, U ), then the composite g ◦ f ∈


L(V, U ).

Kernel (space) and range (space) of a linear transformation

Suppose f: V → W is a linear transformation. Then
Kernel: Ker(f) = {x ∈ V | f(x) = 0}, also denoted by N(f),
Range: Im(f) = {f(x) ∈ W | x ∈ V}, also denoted by R(f),
are subspaces of V and W respectively. In case dim V < ∞,
dim Ker(f) + dim Im(f) = dim V
holds, with dim Ker(f) called the nullity of f and dim Im(f) the rank of f. As a consequence, f is one-to-one if and only if Ker(f) = {0}.
Suppose dim V = m < ∞ and dim W = n < ∞. Then,

1. If m < n, f may be one-to-one but can never be onto.
2. If m > n, f may be onto but can never be one-to-one.
3. If m = n, f is one-to-one if and only if f is onto.

In Case 3, such an f: V → W is a linear isomorphism.

Matrix representations of linear transformations


between finite-dimensional spaces
Let dim V = m and dim W = n. Fix a basis B = {a₁, . . . , a_m} for V and a basis C = {b₁, . . . , b_n} for W.

For x ∈ V, there exist unique scalars x₁, . . . , x_m ∈ F such that
$$x = \sum_{i=1}^{m} x_i a_i.$$
The coefficients x₁, . . . , x_m form a vector
[x]_B = (x₁, . . . , x_m) ∈ F^m,
called the coordinate vector of x relative to the basis B. Similarly, y ∈ W has the coordinate vector relative to C
[y]_C = (y₁, . . . , y_n) ∈ F^n
if and only if $y = \sum_{j=1}^{n} y_j b_j$.

Let f: V → W be linear. Then f(a_i) ∈ W and
$$f(a_i) = \sum_{j=1}^{n} a_{ij} b_j, \quad 1 \le i \le m \quad \Rightarrow \quad [f(a_i)]_C = (a_{i1}, a_{i2}, \ldots, a_{in}), \quad 1 \le i \le m.$$
For any $x = \sum_{i=1}^m x_i a_i$,
$$f(x) = \sum_{i=1}^{m} x_i f(a_i) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i a_{ij} b_j = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} x_i a_{ij} \right) b_j$$
$$\Rightarrow [f(x)]_C = \left( \sum_{i=1}^{m} x_i a_{i1}, \ldots, \sum_{i=1}^{m} x_i a_{in} \right) = (x_1 \cdots x_m) \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = [x]_B [f]^B_C,$$
where the m × n matrix
$$[f]^B_C = \begin{pmatrix} [f(a_1)]_C \\ \vdots \\ [f(a_m)]_C \end{pmatrix}$$
is called the matrix representation of f relative to the ordered bases B and C. In case W = V and C = B, simply denote it by [f]_B.
Let B′ = {a′₁, . . . , a′_m} be another basis for V. The identity linear operator 1_V: V → V can be written in matrix form as
[x]_{B′} = [x]_B [1_V]^B_{B′},
where
$$[1_V]^B_{B'} = \begin{pmatrix} [a_1]_{B'} \\ \vdots \\ [a_m]_{B'} \end{pmatrix}$$
is the matrix representation of 1_V relative to B and B′ and is called the change of coordinate matrix or transition matrix changing B into B′. Similarly, for another basis C′ for W and y ∈ W, we have
[y]_{C′} = [y]_C [1_W]^C_{C′}.
Both [1_V]^B_{B′} and [1_W]^C_{C′} are invertible.


What are the possible relations among [f]^B_C, [f]^{B′}_{C′}, [1_V]^B_{B′} and [1_W]^C_{C′}? Since [f(x)]_{C′} = [x]_{B′} [f]^{B′}_{C′}, therefore
$$[f(x)]_{C'} = [f(x)]_C\, [1_W]^C_{C'} = [x]_B\, [f]^B_C\, [1_W]^C_{C'} \quad \text{and} \quad [f(x)]_{C'} = [x]_B\, [1_V]^B_{B'}\, [f]^{B'}_{C'}, \quad x \in V$$
$$\Rightarrow [f]^B_C\, [1_W]^C_{C'} = [1_V]^B_{B'}\, [f]^{B'}_{C'}$$
$$\Rightarrow [f]^{B'}_{C'} = \big([1_V]^B_{B'}\big)^{-1} [f]^B_C\, [1_W]^C_{C'} = [1_V]^{B'}_B\, [f]^B_C\, [1_W]^C_{C'}.$$
This means that the following diagram is commutative.

                [f]^B_C
    V (B)  ─────────────→  W (C)
      │ 1_V                  │ 1_W
      ↓                      ↓
    V (B′) ─────────────→  W (C′)
               [f]^{B′}_{C′}

We summarize the above results as part of

The Relations between L(V, W ) and M(m, n; F)


Let dim V = m, dim W = n and dim U < ∞.

1. For each basis B for V,
x ∈ V → [x]_B ∈ F^m
is a linear isomorphism.
2. For each basis B for V and basis C for W, each f ∈ L(V, W) has a unique matrix representation [f]^B_C relative to B and C:
[f(x)]_C = [x]_B [f]^B_C.
This is equivalent to saying that the following diagram is commutative.

            f
     V  ─────────→  W
    (B)            (C)
   iso │            │ iso
       ↓   [f]^B_C  ↓
     F^m ─────────→ F^n


3. For another basis B′ for V and C′ for W, [f]^{B′}_{C′} and [f]^B_C are related to each other, subject to the changes of coordinate matrices [1_V]^{B′}_B and [1_W]^C_{C′}, as
[f]^{B′}_{C′} = [1_V]^{B′}_B [f]^B_C [1_W]^C_{C′}.
In case W = V, let B = C and B′ = C′; then [f]_B and [f]_{B′} are similar, i.e.
[f]_{B′} = [1_V]^{B′}_B [f]_B ([1_V]^{B′}_B)^{-1}.
4. The mapping (refer to Ex. 6)
f ∈ L(V, W) → [f]^B_C ∈ M(m, n; F)
is a linear isomorphism, i.e.
[f + g]^B_C = [f]^B_C + [g]^B_C and [αf]^B_C = α[f]^B_C for α ∈ F.

5. Suppose f ∈ L(V, W) and g ∈ L(W, U) and D is a basis for U. Then,
[g ∘ f]^B_D = [f]^B_C [g]^C_D.

6. Suppose W = V and B = C. Let
GL(V, V) = {f ∈ L(V, V) | f is invertible}, and
GL(n; F) = {A ∈ M(n; F) | A is invertible}.
The former is a group under the composition of functions, while the latter is a group under the multiplication of matrices. Both are called the general linear group of order n. The mapping, for each basis B for V,
f ∈ GL(V, V) → [f]_B ∈ GL(n; F)
is a group isomorphism, i.e.
[g ∘ f]_B = [f]_B [g]_B and [f^{-1}]_B = ([f]_B)^{-1}.

These results enable us to reduce the study of linear transformations between finite-dimensional vector spaces to the study of matrices, subject to "similarity" (see Ex. 31 in Sec. B.4). More precisely, for a matrix A ∈ M(m, n; F), we consider A as the linear transformation defined by
x ∈ F^m → xA ∈ F^n,
whose matrix representation relative to the natural basis N = {e₁, . . . , e_m} for F^m and the natural basis N′ for F^n is A itself, while relative to another basis B for F^m and basis B′ for F^n it is
$$[1_{F^m}]^B_N\, A\, \big([1_{F^n}]^{B'}_{N'}\big)^{-1}.$$
A numeric illustration of this formula is sketched below.
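The sketch below assumes NumPy (not part of this text). With the B-basis vectors stored as the rows of a matrix M, we have [1_{F^m}]^B_N = M, and the displayed formula becomes M A M′^{-1}.

```python
# A sketch, assuming NumPy, of the change-of-bases formula under the book's
# row-vector convention x -> xA.
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])    # the linear transformation x -> xA
M = np.array([[1.0, 1.0], [1.0, -1.0]])   # rows: a basis B for F^2
M2 = np.array([[2.0, 0.0], [1.0, 1.0]])   # rows: a basis B' for F^2

rep = M @ A @ np.linalg.inv(M2)           # [1]^B_N A ([1]^{B'}_{N'})^{-1}

x_B = np.array([3.0, -2.0])               # coordinates [x]_B
x = x_B @ M                               # the vector x itself
fx_B2 = (x @ A) @ np.linalg.inv(M2)       # [f(x)]_{B'} computed directly
assert np.allclose(x_B @ rep, fx_B2)      # matches [x]_B [f]^B_{B'}
```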

Exercises
Let V and W be vector spaces over the same field F throughout the following
problems.

1. Suppose dim V < ∞ and f ∈ L(V, W ).


(a) If S is a subspace of V , then

dim(f −1 ( 0 ) ∩ S) + dim f (S) = dim S.

Hence dim f (S) ≤ dim S with equality if and only if f −1 ( 0 ) ∩ S =

{ 0 }.
(b) If T is a subspace of W , then

dim f −1 ( 0 ) + dim(f (V ) ∩ T ) = dim f −1 (T ).

In particular, dim(f (V ) ∩ T ) ≤ dim f −1 (T ) with equality if and


 
only if f −1 ( 0 ) = { 0 }, i.e. f is one-to-one.
2. Let A ∈ M(m, n; F) be a nonzero matrix. Consider x ∈ F^m → xA ∈ F^n as a linear transformation and y ∈ F^n → yA∗ ∈ F^m as a linear transformation.
(a) Then (refer to Ex. 15 of Sec. B.4)
the maximal number of linearly independent row vectors of A
= the dimension of the range space {xA | x ∈ F^m}.
This common number is called the row rank of A.


(b) Similarly,
the maximal number of linearly independent column vectors of A
= the dimension of the range space {yA∗ | y ∈ F^n}.
This common number is called the column rank of A.

(c) Therefore,
dim Ker A∗ = n − (the column rank of A).
(d) On the other hand, Ker A∗ = {y ∈ F^n | yA∗ = 0} is the solution space of the system of homogeneous linear equations Ay∗ = 0 in n unknowns. From Application 1 in Sec. B.5, we already know that
dim Ker A∗ = n − (the row rank of A).

(Note The invertible matrix P, for which PA = R is the row-reduced echelon matrix, preserves the row rank of A, because the range space {xA | x ∈ F^m} = {x(PA) | x ∈ F^m} = {xR | x ∈ F^m}.)
(e) Combining (c) and (d), the following holds:

row rank of A = column rank of A.

(f) Let a_i = (a_{i1}, a_{i2}, . . . , a_{im}) ∈ F^m, 1 ≤ i ≤ k ≤ m. Show that a₁, a₂, . . . , a_k are linearly independent if and only if there exist integers 1 ≤ j₁ < j₂ < · · · < j_k ≤ m such that the k × k submatrix
$$\begin{pmatrix} a_{1j_1} & \cdots & a_{1j_k} \\ a_{2j_1} & \cdots & a_{2j_k} \\ \vdots & & \vdots \\ a_{kj_1} & \cdots & a_{kj_k} \end{pmatrix}$$
is invertible.
(g) Let r (1 ≤ r ≤ min{m, n}) denote the largest integer such that some r × r submatrix of A is invertible, or equivalently, has a nonzero determinant. This r is defined to be the rank of A.
(h) Show that

row rank of A = column rank of A = rank of A.

Usually, this common number is denoted by r(A). Note r(O) = 0.


3. (a) Let P ∈ M(k, m; F) and A ∈ M(m, n; F). Show that

r(P ) + r(A) − m ≤ r(P A) ≤ min{r(P ), r(A)}

with
(1) r(PA) = r(A) ⇔ Fm = Im(P ) + Ker(A),

(2) r(PA) = r(P ) ⇔ Im(P ) ∩ Ker(A) = { 0 },
(3) r(PA) = r(P ) + r(A) − m ⇔ Ker(A) ⊆ Im(P ).
(b) Suppose P ∈ M(m; F) and Q ∈ M(n; F) are invertible, and
A ∈ M(m, n; F). Then

r(PAQ) = r(PA) = r(AQ) = r(A).



(c) Suppose A, B ∈ M(m, n; F). Then


r(A + B) ≤ r(A) + r(B)
with equality if and only if Im(A + B) = Im(A) ⊕ Im(B).
4. Suppose A ∈ M(m, n; F), where F is a subfield of C. Show that
r(A) = r(A∗ ) = r(AA∗ ) = r(A∗ A).
5. Let f ∈ L(V, W ).
(a) f is onto ⇔ f is right-invertible, i.e. there exists g ∈ L(W, V )
such that
f ◦ g = 1W .
(b) f is one-to-one ⇔ f is left-invertible, i.e. there exists h ∈ L(W, V )
such that
h ◦ f = 1V .
In case dim V = m and dim W = n, these results are equivalent to saying that, for A ∈ M(m, n; F) considered as the mapping x ∈ F^m → xA ∈ F^n,

(c) A has rank n ⇔ there exists a matrix B ∈ M(n, m; F) such that


BA = In .
(d) A has rank m ⇔ there exists a matrix C ∈ M(n, m; F) such that
AC = Im .
Try to prove (c) and (d) by the concept of row-reduced echelon matrix
or some other methods without recourse to (a) and (b).
6. Suppose dim V = m, dim W = n and B = {x₁, . . . , x_m} is a basis for V and C = {y₁, . . . , y_n} a basis for W. Define f_{ij} ∈ L(V, W) by
f_{ij}(x_k) = δ_{ki} y_j, 1 ≤ i, k ≤ m, 1 ≤ j ≤ n.
Then N = {f_{ij} | 1 ≤ i ≤ m, 1 ≤ j ≤ n} forms a basis for L(V, W). In particular, if f ∈ L(V, W) and $f = \sum_{i,j=1}^{m,n} a_{ij} f_{ij}$, then
[f]^B_C = [a_{ij}]_{m×n}.

Therefore, dim L(V, W ) = dim V · dim W .


7. Suppose dim V = n. Let f ∈ L(V, V ) be an idempotent linear operator
(refer to Ex. 6 of Sec. B.4), i.e.
f2 = f ◦ f = f or f ◦ (f − 1V ) = (f − 1V ) ◦ f = 0.

(a) Show that
R(f) = {x ∈ V | f(x) = x} = Ker(f − 1_V),
R(f − 1_V) = {x ∈ V | f(x) = 0} = Ker(f),
and hence
V = Ker(f − 1_V) ⊕ Ker(f) = R(f − 1_V) ⊕ R(f).
(b) Let {x₁, . . . , x_r} be a basis for R(f) = Ker(f − 1_V) and {x_{r+1}, . . . , x_n} be a basis for Ker(f) = R(f − 1_V), so that B = {x₁, . . . , x_r, x_{r+1}, . . . , x_n} is a basis for V. Then
$$[f]_B = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{n \times n}.$$
8. Suppose F is a field with 1 + 1 ≠ 0 and dim V = n. Let f ∈ L(V, V) be an involutory linear operator (see Ex. 9 of Sec. B.4), i.e.
f² = 1_V or (f − 1_V) ∘ (f + 1_V) = (f + 1_V) ∘ (f − 1_V) = 0.
(a) Then
Ker(f − 1_V) = {x ∈ V | f(x) = x},
Ker(f + 1_V) = {x ∈ V | f(x) = −x}
are subspaces of V. Let {x₁, . . . , x_r} be a basis for Ker(f − 1_V) and extend it to a basis {x₁, . . . , x_r, x_{r+1}, . . . , x_n} for V. Thus, {f(x_{r+1}) − x_{r+1}, . . . , f(x_n) − x_n} is linearly independent in V.
   

(b) It is clear that Ker(f − 1_V) ∩ Ker(f + 1_V) = {0} and, for x ∈ V,
$$x = \frac{1}{2}(x + f(x)) + \frac{1}{2}(x - f(x)).$$
It follows that
V = Ker(f − 1V ) ⊕ Ker(f + 1V ).
Hence {f (
xr+1 ) − xn ) − 
xr+1 , . . . , f ( xn } forms a basis for
R(f − 1V ) = Ker(f + 1V ).
Similarly, R(f + 1V ) = Ker(f − 1V ).
(c) Take a basis {x₁, . . . , x_r} for Ker(f − 1_V) and a basis {x_{r+1}, . . . , x_n} for Ker(f + 1_V), so that B = {x₁, . . . , x_r, x_{r+1}, . . . , x_n} is a basis for V. Then
$$[f]_B = \begin{pmatrix} I_r & 0 \\ 0 & -I_{n-r} \end{pmatrix}_{n \times n},$$
where r is the rank of f + 1_V, which is equal to dim Ker(f − 1_V).

9. Let V be an n-dimensional real vector space and f ∈ L(V, V ) satisfy


f 2 = f ◦ f = −1V .
(a) The complexification of V Define two operations on V as follows:
1. Addition: as the one originally defined on V.
2. Scalar multiplication: for x ∈ V and a, b ∈ R,
(a + bi)x = ax + bf(x).
Then V is an n-dimensional complex vector space.
(b) Now, consider V as a complex vector space. Then f can be regarded as a complex linear operator in the following sense:
f((a + ib)x) = f(ax + bf(x)) = af(x) + bf²(x) = af(x) − bx = (a + bi)f(x), a, b ∈ R.
Therefore f² = −1_V is equivalent to (f − i1_V)(f + i1_V) = 0.
(c) Note that
Ker(f − i1_V) = {x ∈ V | f(x) = ix},
Ker(f + i1_V) = {x ∈ V | f(x) = −ix}
and x = ½(x + if(x)) + ½(x − if(x)) for each x ∈ V. Thus
V = Ker(f − i1_V) ⊕ Ker(f + i1_V).
(d) Suppose r = dim Ker(f − i1_V). There exist vectors x_{r+1}, . . . , x_n ∈ V such that (f − i1_V)(x_j) = f(x_j) − ix_j, r + 1 ≤ j ≤ n, form a basis for the range space R(f − i1_V). At the same time, for r + 1 ≤ j ≤ n, (f − i1_V)(x_j) ∈ Ker(f + i1_V), which has dimension n − r. Therefore,
R(f − i1_V) = Ker(f + i1_V).
Similarly,
R(f + i1_V) = Ker(f − i1_V).
The vectors (f + i1_V)(x_j), r + 1 ≤ j ≤ n, are linearly independent in R(f + i1_V). Then
n − r ≤ r
holds. Similarly, r ≤ n − r. Hence
n = 2r.
In particular, n is even and r = n/2.

(e) By (d), there exist linearly independent vectors x₁, . . . , x_r such that (f − i1_V)(x_j), 1 ≤ j ≤ r, form a basis for R(f − i1_V), and (f + i1_V)(x_j), 1 ≤ j ≤ r, form a basis for R(f + i1_V). Together they form a basis C for V. Note that
$$x_j = \frac{1}{2i}\big[(f(x_j) + ix_j) - (f(x_j) - ix_j)\big],$$
$$f(x_j) = \frac{1}{2}\big[(f(x_j) + ix_j) + (f(x_j) - ix_j)\big], \quad 1 \le j \le r.$$
Hence,
B = {x₁, . . . , x_r, f(x₁), . . . , f(x_r)}
forms a basis for the real space V.
(f) Let C and B be as in (e). When V is considered as a complex vector space, then
$$[f]_C = \begin{pmatrix} iI_r & O \\ O & -iI_r \end{pmatrix}_{n \times n}, \quad r = \frac{n}{2} = \dim \operatorname{Ker}(f - i1_V);$$
while as a real vector space,
$$[f]_B = \begin{pmatrix} O & I_{n/2} \\ -I_{n/2} & O \end{pmatrix}_{n \times n}.$$

10. Let A = [a_{ij}]_{n×n} ∈ M(n; F). Define g, h: M(n; F) → M(n; F) as
g(X) = XA, h(X) = AX
for X ∈ M(n; F). Let N = {E₁₁, E₁₂, . . . , E₁ₙ, . . . , Eₙ₁, . . . , Eₙₙ} be the natural basis for M(n; F).
(a) Show that
$$[g]_N = \begin{pmatrix} A & & & 0 \\ & A & & \\ & & \ddots & \\ 0 & & & A \end{pmatrix}_{n^2 \times n^2}$$
and [g(X)]_N = [X]_N [g]_N. [g]_N, or g, has rank
r([g]_N) = n · r(A),
and hence g is invertible if and only if r(A) = n, i.e. det A ≠ 0. The kernel space Ker(g) has dimension n² − n·r(A) = n(n − r(A)).

(b) Show that
$$[h]_N = \begin{pmatrix} a_{11}I_n & a_{21}I_n & \cdots & a_{n1}I_n \\ a_{12}I_n & a_{22}I_n & \cdots & a_{n2}I_n \\ \vdots & \vdots & & \vdots \\ a_{1n}I_n & a_{2n}I_n & \cdots & a_{nn}I_n \end{pmatrix}_{n^2 \times n^2}$$
and
r([h]_N) = n · r(A),
det[h]_N = (det A)^n.
(c) Let B = {E₁₁, E₂₁, . . . , Eₙ₁, . . . , E₁ₙ, E₂ₙ, . . . , Eₙₙ} be another ordered basis for M(n; F). Show that
[g]_B = [h]∗_N.

11. Suppose T : M(n; F) → M(n; F) is an algebra homomorphism, i.e.


1. T is a linear operator, and
2. T preserves matrix multiplication

T (XY ) = T (X)T (Y ), X, Y ∈ M(n; F).

Then, either T = 0 (zero linear operator) or there exists an invertible


matrix P ∈ M(n; F) such that

T (X) = P XP −1 , X ∈ M(n; F).

We may suppose that T ≠ 0. Prove this result by the following steps.


(a) Show that
T(I_n) = I_n,
T(X^{-1}) = T(X)^{-1} if X is invertible,
T(QXQ^{-1}) = T(Q)T(X)T(Q)^{-1}.


(b) Now $I_n = \sum_{i=1}^{n} T(E_{ii})$. So at least one of T(E_{ii}), 1 ≤ i ≤ n, is not a zero matrix, say T(E₁₁) ≠ O. For some suitable elementary matrices Q₁, Q₂ of type 1, E_{ij} = Q₁E₁₁Q₂ holds for 1 ≤ i, j ≤ n. Hence T(E_{ij}) ≠ O, 1 ≤ i, j ≤ n.
(c) The rank r(T(E_{ii})) = 1, 1 ≤ i ≤ n. Let x₁ ∈ F^n be such that x₁ ≠ 0 and e₁T(E₁₁) = x₁. Let x_i = x₁T(E₁ᵢ), 2 ≤ i ≤ n.

By using E_{1k}E_{ij} = δ_{ik}E_{1j},
$$x_k T(E_{ij}) = x_1 T(E_{1k}) T(E_{ij}) = x_1 T(E_{1k}E_{ij}) = x_1 T(\delta_{ik} E_{1j}) = \delta_{ik}\, x_1 T(E_{1j}) = \delta_{ik}\, x_j, \quad 1 \le i, j, k \le n.$$
In particular, x_i T(E_{i1}) = x₁ ≠ 0 implies that x_i ≠ 0, 2 ≤ i ≤ n. Also x_i T(E_{ii}) = x_i, 2 ≤ i ≤ n. Therefore, {x₁, x₂, . . . , x_n} forms a basis for F^n.
(d) Let
$$P = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$
Then P is invertible and
T(E_{ij}) = PE_{ij}P^{-1}, 1 ≤ i, j ≤ n,
and hence T(X) = PXP^{-1}, X ∈ M(n; F).
12. Prove that the following are equivalent.
(a) V is a k-dimensional subspace of Fm .
(b) There exist linearly independent vectors x₁, x₂, . . . , x_k in V such that
V = ⟨x₁, x₂, . . . , x_k⟩.
(c) There exists an m × n matrix A of rank m − k such that
V = {x ∈ F^m | xA = 0}.
(d) There exists an n × m matrix B of rank k such that
V = {xB | x ∈ F^n}.
Is it possible that n < k or n = k?
13. Suppose V₁ and V₂ are subspaces of V and V = V₁ ⊕ V₂. For each x ∈ V, there exist unique x₁ ∈ V₁ and x₂ ∈ V₂ such that x = x₁ + x₂. Define p: V → V₁ by
p(x) = x₁.
Then p ∈ L(V, V). Such a p has the following geometric properties:
(1) p keeps vectors in V₁ fixed, i.e.
p(x) = x, x ∈ V₁.

 
(2) p maps each vector in V₂ into the zero vector 0, i.e. p(x) = 0 for x ∈ V₂.
(3) p projects each vector x = x₁ + x₂ in V into x₁ along the direction parallel to the complementary subspace V₂ of V₁ in V.

See Fig. B.3. Hence, p is called the projection (operator) of V onto V₁ along V₂. Note that p(p(x)) = p(x) for each x ∈ V, i.e. in short,
p² = p
(refer to Ex. 7). Conversely, if p ∈ L(V, V) is such that p² = p, then
(1) V = Ker(p) ⊕ R(p), where R(p) = {x ∈ V | p(x) = x}, and
(2) p is the projection of V onto R(p) along Ker(p).
Try to explain why 1_V − p is the projection of V onto Ker(p) along R(p).

[Fig. B.3: the projection p of V onto V₁ along V₂: x = p(x) + (x − p(x)) with p(x) = x₁ ∈ V₁ and x − p(x) = x₂ ∈ V₂.]

Fig. B.3

14. Prove that the following are equivalent. In cases (d), (e) and (f), V is
supposed to be finite-dimensional. Let V1 , . . . , Vk be subspaces of V .
(a) V = V₁ ⊕ · · · ⊕ V_k, and V is called the direct sum of V₁, . . . , V_k, if
(1) $V = \sum_{i=1}^{k} V_i$,
(2) $V_i \cap \sum_{j \ne i} V_j = \{0\}$ for each i, 1 ≤ i ≤ k.
(b) $V = \sum_{i=1}^{k} V_i$ and, for any vectors x_i ∈ V_i, 1 ≤ i ≤ k, if x₁ + · · · + x_k = 0, then x₁ = · · · = x_k = 0.
(c) Each vector x ∈ V can be uniquely expressed as x = x₁ + · · · + x_k, where x_i ∈ V_i, 1 ≤ i ≤ k.
where xi ∈ Vi , 1 ≤ i ≤ k.


(d) If Bi is any ordered basis for Vi , 1 ≤ i ≤ k, then B1 ∪ · · · ∪ Bk is an


ordered basis for V .
(e) There exists an ordered basis Bi for Vi , 1 ≤ i ≤ k, such that
B1 ∪ · · · ∪ Bk is an ordered basis for V .


(f) $V = \sum_{i=1}^{k} V_i$ and dim V = dim V₁ + · · · + dim V_k.
(g) There exist linear operators p₁, . . . , p_k ∈ L(V, V) with R(p_i) = V_i for 1 ≤ i ≤ k which satisfy:
(1) p_i is a projection, i.e. p_i² = p_i, 1 ≤ i ≤ k.
(2) p_i ∘ p_j = p_j ∘ p_i = 0 for i ≠ j, 1 ≤ i, j ≤ k, i.e.
$$\operatorname{Ker}(p_i) = \bigoplus_{j \ne i} V_j.$$
(3) 1_V = p₁ + · · · + p_k, i.e.
x = p₁(x) + · · · + p_k(x), x ∈ V.


15. Suppose dim V < ∞ and f ∈ L(V, V ). Prove the following:


(a) Ker(f ) ⊆ Ker(f 2 ) ⊆ · · · ⊆ Ker(f k ) ⊆ Ker(f k+1 ) ⊆ · · · and
R(f ) ⊇ R(f 2 ) ⊇ · · · ⊇ R(f k ) ⊇ R(f k+1 ) ⊇ · · · .
(b) There exists a positive integer k such that R(f m ) = R(f k ) and
Ker(f m ) = Ker(f k ) for any positive integer m ≥ k.
(c) There exists a positive integer n such that

Ker(f n ) ∩ R(f n ) = { 0 }.

16. Let S be a vector subspace of V and V /S be the quotient space (refer


to Sec. B.1).
(a) Define π: V → V /S by

π(x) = x + S, for x ∈ V.
for 

Then π is well-defined and is a linear transformation which is called


the natural projection of V onto V /S.
(b) V = S ⊕ U if and only if the restriction π|U : U → V /S is a linear
isomorphism.
(c) In case dim V < ∞, then

dim V /S = dim V − dim S.

Once dim V /S < ∞, it is called the co-dimension of S in V and is


denoted as co-dim S.
17. Suppose f ∈ L(V, W) and S, T are subspaces of V, W respectively, such that f(S) ⊆ T holds.

(a) Then there exists a unique linear transformation f˜: V /S → W/T


such that the following diagram is commutative.
            f
     V ─────────→ W
  π_V │            │ π_W
      ↓            ↓
    V/S ────────→ W/T
            f̃

where π_V, π_W are the natural projections (see Ex. 16).


(b) In case W = V, T = S, and S is an f -invariant subspace, i.e. f (S) ⊆
S, then there exists a unique linear transformation f˜: V /S → V /S
such that f˜ ◦ π = π ◦ f where π = πV is the projection.
(c) There is a unique linear isomorphism f˜: V /Ker(f ) → R(f )
defined by
f˜(
x + Ker(f )) = f (
x)
so that f = f˜ ◦ π where π: V → V /Ker(f ) is the projection. This
means that every linear transformation can be expressed as the com-
posite of an onto linear transformation and a linear isomorphism.
18. Suppose f ∈ L(V, W) and g ∈ L(V, U) such that Ker(f) ⊆ Ker(g) holds. Try to find h ∈ L(W, U) such that g = h ∘ f. The following construction guarantees the existence of such a linear transformation h, where

π: V → V/Ker(f) is the natural projection;
f̃: V/Ker(f) → R(f) is the linear isomorphism as in Ex. 17(c);
f′: V/Ker(f) → U is the linear transformation such that g = f′ ∘ π; this is possible because Ker(f) ⊆ Ker(g), and f′ is defined by f′(x + Ker(f)) = g(x) for x ∈ V;
h′ = f′ ∘ f̃^{-1}: R(f) → U is linear; and
h: W → U is a linear extension of h′ from R(f) to the whole space W.

Then this h is a required one.


(a) Ker(f ) ⊆ Ker(g) if and only if there exists a linear transformation
h: W → U such that

g = h ◦ f.

In this case, g is said to be decomposable through f .


An element in L(V, F) is specifically called a linear functional.
(b) Suppose f_i ∈ L(V, F) for 1 ≤ i ≤ n. Consider φ = (f₁, . . . , f_n) ∈ L(V, F^n) and use Ex. 17(c); then
$$\text{co-dim} \bigcap_{i=1}^{n} \operatorname{Ker}(f_i) \le n < +\infty.$$

(c) Suppose f, f_i ∈ L(V, F) for 1 ≤ i ≤ n. Then, by (a),
$$\bigcap_{i=1}^{n} \operatorname{Ker}(f_i) \subseteq \operatorname{Ker}(f)$$

if and only if f is a linear combination of f1 , . . . , fn , i.e. there exist


scalars a1 , . . . , an ∈ F such that

f = a1 f1 + · · · + an fn .

(Note This result lays the linearly algebraic foundation for the
Lagrange multiplier method in solving constrained extremum prob-
lems (refer to Ex. <D> 8 of Sec. 5.6).)
19. The vector space L(V, F) is usually denoted by
V∗
and is called the (first) dual space of V. For x ∈ V and f ∈ V∗, the scalar f(x) is also denoted as ⟨x, f⟩.

(a) For any x ∈ V with x ≠ 0, there exists an f ∈ V∗ such that ⟨x, f⟩ ≠ 0; indeed, extending {x} to a basis for V, one may take f with ⟨x, f⟩ ≠ 0 and ⟨y, f⟩ = 0 for every other basis vector y.
(b) Suppose dim V = n and B = { xn } is a basis for V . Then,
x1 , . . . , 
there exists a unique basis B∗ = {f1 , . . . , fn } for V ∗ such that

xi , fj  = δij ,
 1 ≤ i, j ≤ n.

B∗ is called the dual basis of B in V∗. Therefore,
$$x = \sum_{j=1}^{n} \langle x, f_j \rangle\, x_j, \quad x \in V;$$
$$f = \sum_{i=1}^{n} \langle x_i, f \rangle\, f_i, \quad f \in V^*.$$
In particular, dim V∗ = dim V = n, and hence V∗ is isomorphic to V.
(c) Let P_n(F), N = {1, t, t², . . . , t^n} and B = {p₀, p₁, p₂, . . . , p_n} be as in Sec. B.3, where the p_i, 0 ≤ i ≤ n, are the Lagrange polynomials associated with the scalars a₀, a₁, a₂, . . . , a_n ∈ F. Let B∗ = {f₀, f₁, f₂, . . . , f_n} be the dual basis of B in P_n(F)∗. Note that f_i(p) = p(a_i), 0 ≤ i ≤ n, for p ∈ P_n(F). Therefore, by (b), for any p ∈ P_n(F),
$$p = \sum_{j=0}^{n} \langle p, f_j \rangle\, p_j = \sum_{j=0}^{n} f_j(p)\, p_j = \sum_{j=0}^{n} p(a_j)\, p_j \quad \Rightarrow \quad p(t) = \sum_{j=0}^{n} p(a_j)\, p_j(t).$$
In particular, take p = tⁱ; then
$$t^i = \sum_{j=0}^{n} a_j^i\, p_j(t), \quad 0 \le i \le n.$$
The coefficient matrix
$$[a_{ij}] = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ a_0 & a_1 & a_2 & \cdots & a_n \\ \vdots & \vdots & \vdots & & \vdots \\ a_0^n & a_1^n & a_2^n & \cdots & a_n^n \end{pmatrix}$$
is the transition matrix from the basis N to the basis B and hence is invertible. This matrix is called the Vandermonde matrix of order n + 1 and has the determinant
$$\det[a_{ij}] = \prod_{0 \le i < j \le n} (a_j - a_i).$$
In general, $p(t) = \sum_{j=0}^{n} p(a_j) p_j(t)$ is nothing but the Lagrange interpolation formula mentioned in Sec. B.3. Both facts are easy to check numerically, as sketched below.
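The following sketch assumes NumPy (not part of this text); it evaluates the Vandermonde determinant against the product formula and recovers a cubic from its values at the nodes.

```python
# A sketch, assuming NumPy: the Vandermonde determinant and Lagrange interpolation.
import numpy as np

a = np.array([0.0, 1.0, 2.0, 4.0])            # the scalars a_0, ..., a_3
V = np.vander(a, increasing=True).T           # [a_ij] with rows 1, a_j, a_j^2, a_j^3

prod = np.prod([a[j] - a[i]
                for i in range(len(a)) for j in range(i + 1, len(a))])
assert np.isclose(np.linalg.det(V), prod)     # det = product of (a_j - a_i)

p = lambda t: t**3 - t                        # a sample cubic polynomial
coeffs = np.polyfit(a, p(a), deg=len(a) - 1)  # the unique cubic through the data
assert np.allclose(np.polyval(coeffs, 3.0), p(3.0))   # Lagrange interpolation
```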

20. Suppose S is a proper subspace of V such that, for any subspace U satisfying S ⊆ U ⊆ V, necessarily U = S or U = V. Such a subspace S is called a hypersubspace of V.
(a) Let f ∈ V∗ and f ≠ 0. Then, for any x₀ ∈ V \ Ker(f),
V = Ker(f) ⊕ ⟨x₀⟩
holds, and f(αx₀ + Ker(f)) = αf(x₀) for any α ∈ F.
(b) S is a hypersubspace of V if and only if there exists a nonzero
f ∈ V ∗ such that

S = Ker(f ).

(c) Suppose f, g ∈ V∗. Then Ker(f) ⊆ Ker(g) if and only if there exists a scalar α ∈ F such that
g = αf.
In case g = 0, take α = 0; if g ≠ 0, then Ker(f) = Ker(g), and α ≠ 0 can be taken as g(x₀)/f(x₀) for any x₀ ∈ V \ Ker(f).
(Note For f ∈ V∗ and f ≠ 0, the mapping Ker(f) ↔ ⟨f⟩ sets up a one-to-one correspondence between the set of all hypersubspaces of V and the set of all one-dimensional subspaces of V∗.)
(d) Suppose dim V = n. U is a k-dimensional subspace of V if and only if U is the intersection of (n − k) hypersubspaces of V. That is to say, there exist linearly independent f₁, . . . , f_{n−k} ∈ V∗ such that
$$U = \bigcap_{i=1}^{n-k} \operatorname{Ker}(f_i).$$

(e) Suppose dim V = n. Let U and f₁, . . . , f_{n−k} be as in (d). For f ∈ V∗, then
$$U = \bigcap_{i=1}^{n-k} \operatorname{Ker}(f_i) \subseteq \operatorname{Ker}(f)$$
if and only if there exist scalars α₁, . . . , α_{n−k} ∈ F such that
f = α₁f₁ + · · · + α_{n−k}f_{n−k}, i.e. f ∈ ⟨f₁, . . . , f_{n−k}⟩.
21. For a vector space V , the dual space

(V ∗ )∗ = V ∗∗

of its first dual space V ∗ is called the second dual space of V .



(a) For each vector x ∈ V, define x∗∗: V∗ → F by
x∗∗(f) = f(x) = ⟨f, x⟩.
Then x∗∗ ∈ V∗∗.
(b) Define Φ: V → V∗∗ by
Φ(x) = x∗∗.
Then Φ is a one-to-one linear transformation from V into V∗∗.
(c) Suppose dim V = n < ∞. Then Φ: V → V ∗∗ is a natural isomor-
phism. Therefore, every basis for V ∗ is a dual basis of some basis
for V in the sense that

x , f  = f, 
x , x ∈ V and f ∈ V ∗ .


We say V and V∗ are dual to each other relative to the symmetric bilinear functional ⟨, ⟩: V × V∗ → F defined by ⟨x, f⟩ = f(x).
(d) Suppose dim V = n. Let B and B∗, C and C∗ be two pairs of dual bases for V and V∗ respectively. Then
Q = P∗⁻¹,
where P = [1_V]^C_B and Q = [1_{V∗}]^{C∗}_{B∗}.
(e) Let B = {x₁, . . . , x_n} and B∗ = {f₁, . . . , f_n}, C = {y₁, . . . , y_n} and C∗ = {g₁, . . . , g_n}. Let φ, ψ: V → V∗ be the unique linear isomorphisms such that
φ(x_i) = f_i and ψ(y_i) = g_i, for 1 ≤ i ≤ n.
Then, φ = ψ if and only if P = Q, i.e.
P = P∗⁻¹.
22. Let S be a nonempty subset of a vector space V . The set

S⁰ = {f ∈ V∗ | f(x) = 0 for all x ∈ S}
is a vector subspace of V∗ and is called the annihilator of S in V∗. In particular,
{0}⁰ = V∗, V⁰ = {0}.
(a) For a general vector space V , the following hold.
(1) If S is a subspace of V and x ∉ S, then there exists f ∈ S⁰ such that f(x) ≠ 0.


(2) If S ⊆ V is a subspace, then V is isomorphic to the external direct sum space S ⊕ S⁰ = {(x, f) | x ∈ S and f ∈ S⁰}.
(3) Suppose S₁ and S₂ are subspaces of V. Then S₁ = S₂ if and only if S₁⁰ = S₂⁰.
(4) The subspace ⟨S⟩ generated by S satisfies
⟨S⟩ ⊆ (S⁰)⁰ = S⁰⁰.

(b) Suppose dim V = n < ∞. The following hold.
1. Let S be a subspace of V. Then,
dim S + dim S⁰ = dim V.
In this case, for any basis {f₁, . . . , f_k} for S⁰, $S = \bigcap_{i=1}^{k} \operatorname{Ker}(f_i)$.
2. ⟨S⟩ = S⁰⁰ for any nonempty subset S of V.
3. Suppose S₁ and S₂ are subspaces of V. Then,
(S₁ + S₂)⁰ = S₁⁰ ∩ S₂⁰;
(S₁ ∩ S₂)⁰ = S₁⁰ + S₂⁰.

(Note For f ∈ V∗ and f ≠ 0,
Ker(f)⁰ = ⟨f⟩
holds. Hence (refer to the Note in Ex. 20(c)), S → S⁰ sets up a one-to-one correspondence between the family of subspaces of V and the family of subspaces of V∗, but reverses the inclusion relation (i.e. S₁ ⊆ S₂ ⇒ S₁⁰ ⊇ S₂⁰). This reflects geometrically the duality principle between V and V∗.)
23. Suppose dim V = n < ∞.
(a) In case V = S₁ ⊕ S₂, then S₁⁰ is isomorphic to S₂∗ = {f|_{S₂} | f ∈ V∗} and S₂⁰ is isomorphic to S₁∗, and V∗ = S₁⁰ ⊕ S₂⁰ = S₁∗ ⊕ S₂∗.
(b) Conversely, if S is a subspace of V, then V = S ⊕ (S∗)⁰; V∗ = S∗ ⊕ S⁰.
(c) Suppose S is a subspace of V . Then
(1) S 0 is naturally isomorphic to (V /S)∗ .
(2) S ∗ is naturally isomorphic to V ∗ /S 0 .

24. Suppose φ ∈ L(V, W). Then, there exists a unique φ∗ ∈ L(W∗, V∗) such that
φ∗(g) = g ∘ φ, g ∈ W∗,
or
⟨x, φ∗(g)⟩ = ⟨φ(x), g⟩, x ∈ V, g ∈ W∗.
See the following diagram.

        φ          g
    V ─────→ W ─────→ F

    V∗ ←───── W∗
         φ∗

Such a φ∗ is called the adjoint or the dual of φ. For the matrix counterpart, see Sec. B.8. The mapping φ → φ∗ has the following properties:
(φ + ψ)∗ = φ∗ + ψ∗;
(αφ)∗ = αφ∗, for α ∈ F;
(ψ ∘ φ)∗ = φ∗ ∘ ψ∗, if ψ ∈ L(W, U);
φ∗∗ = (φ∗)∗ = φ, if dim V < ∞ and dim W < ∞.
(a) Suppose dim V = m < ∞ and dim W = n < ∞. Let B be a basis for V and B∗ its dual basis in V∗, and C a basis for W and C∗ its dual basis in W∗. Then, for φ ∈ L(V, W),
$$[\varphi^*]^{C^*}_{B^*} = \big([\varphi]^B_C\big)^*,$$
that is, the n × m matrix [φ∗]^{C∗}_{B∗} is the transpose of the m × n matrix [φ]^B_C.
(b) Suppose dim V = m < ∞ and dim W = n < ∞. Then, for any φ ∈ L(V, W),
Ker(φ) = R(φ∗)⁰, R(φ) = Ker(φ∗)⁰;
Ker(φ∗) = R(φ)⁰, R(φ∗) = Ker(φ)⁰;
by using the dimension theorems, dim Ker(φ) + dim R(φ) = dim V = m and dim Ker(φ) + dim Ker(φ)⁰ = dim V = m. See Fig. B.4.
(c) Suppose dim V < ∞ and dim W < ∞. Then, for ϕ ∈ L(V, W ),
r(ϕ) = r(ϕ∗ ),
i.e. ϕ and its adjoint ϕ∗ have the same rank. In matrix terminology,
this means that the row rank of a matrix is equal to its column rank
(refer to Sec. B.5 and Ex. 2).

[Fig. B.4: the kernels and ranges of φ: V → W and φ∗: W∗ → V∗, with R(φ∗) = Ker(φ)⁰ and Ker(φ∗) = R(φ)⁰.]

Fig. B.4

(d) Let f₁, f₂, f₃ ∈ (C⁴)∗ be defined as
f₁(x) = ix₁ + x₂ − 3ix₃ + 4x₄,
f₂(x) = −x₁ + x₃ − ix₄,
f₃(x) = −2x₂ + (1 + i)x₃ + ix₄.
Find the unique subspace S of C⁴ such that S⁰ = ⟨f₁, f₂, f₃⟩. Define f: C⁴ → C³ by f(x) = (f₁(x), f₂(x), f₃(x)) and use it to justify the results stated in (b).
(e) Let S be the subspace of R⁵ generated by
x₁ = (1, −1, 2, 3, 0), x₂ = (−1, 1, 2, 5, 2),
x₃ = (0, 0, −1, −2, 3), x₄ = (2, −2, 3, 4, −1).
Find S⁰.

It is worth saying something more about ⟨x, f⟩, where x ∈ V and f ∈ V∗, in case dim V = n < ∞.

Suppose B = {x₁, . . . , x_n} is a basis for V, and D = {f₁, . . . , f_n} is a basis for V∗ which is not necessarily the dual basis of B in V∗. For $x = \sum_{i=1}^n a_i x_i$ and $f = \sum_{j=1}^n b_j f_j$, by bilinearity of ⟨, ⟩,
$$\langle x, f \rangle = f(x) = \sum_{i,j=1}^{n} a_i b_j \langle x_i, f_j \rangle,$$
which is a bilinear form in a₁, . . . , a_n and b₁, . . . , b_n with coefficient matrix
⟨, ⟩_{B,D} = [⟨x_i, f_j⟩]_{n×n}.
Thus,
⟨x, f⟩ = [x]_B ⟨, ⟩_{B,D} [f]∗_D.
In case D = B∗ is the dual basis of B in V∗, then ⟨, ⟩_{B,B∗} = I_n and
$$\langle x, f \rangle = \sum_{i=1}^{n} a_i b_i = [x]_B\, [f]^*_{B^*}.$$

Just a stone's throw from here is the natural inner product on pairs of vectors in the real vector space Rⁿ (see Sec. B.9). Identify (Rⁿ)∗ with Rⁿ in the sense that the dual basis N∗ = {f₁, . . . , f_n} of the natural basis N = {e₁, . . . , e_n} satisfies f_i = e_i for 1 ≤ i ≤ n, so that $f = \sum_{i=1}^n y_i f_i \in (R^n)^*$ is identified with $\sum_{i=1}^n y_i e_i = y \in R^n$. Then,
$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$$
is the required inner product in Rⁿ; x is said to be perpendicular to y if ⟨x, y⟩ = 0. The isomorphism between Rⁿ and (Rⁿ)∗ guarantees the justification of this viewpoint. See Ex. 5 of Sec. B.9.


Bilinear functionals, and the quadratic forms arising from them, form an important topic in linear algebra by themselves, especially in relation to algebraic geometry. These materials are beyond the scope of an elementary book such as this one.

B.8 A Matrix and its Transpose


The theory concerning the transpose of a linear transformation, as presented in Ex. 24 of Sec. B.7, looks simpler and neater in the eyes of matrices. The pictures here aim to illustrate this by centering around the solution of a system of linear equations.

Let A = [a_{ij}] ∈ M(m, n; F). Then its transpose A∗ ∈ M(n, m; F). For simplicity, a vector x in F^m will be viewed as a 1 × m matrix, so that xA is meaningful both as a matrix product and as the action of the linear transformation A on the vector x. Instead of emphasizing x as a column vector whenever needed, we will use x∗ to represent both a vector and an m × 1 matrix, the transpose of the 1 × m matrix x.

Associated with A are the following two linear transformations:
A: F^m → F^n defined by x → xA,
A∗: F^n → F^m defined by y → yA∗, which is equivalent to saying y∗ → Ay∗,
and four vector subspaces (refer to Application 8 in Sec. B.5 for their computations):
R(A) = {xA | x ∈ F^m}: the row space of A, i.e. Im(A),
N(A) = {x ∈ F^m | xA = 0}: the left kernel space of A, i.e. Ker(A),
R(A∗) = {yA∗ | y ∈ F^n}: the column space of A, i.e. Im(A∗),
N(A∗) = {y ∈ F^n | yA∗ = 0}: the right kernel space of A, i.e. Ker(A∗).
In order to simulate the concept of perpendicularity of vectors in Rⁿ, we will use the notation
S⊥
to denote the annihilator of a nonempty subset S of F^m, instead of S⁰ as we did in Sec. B.7. Also, the dual space (F^m)∗ is always regarded as F^m itself under isomorphism. For any x ∈ F^m and y ∈ N(A∗) ⊆ F^n,
⟨xA, y⟩ = (xA)y∗ = x(Ay∗) = x · 0 = 0
always holds. The following facts come immediately:
1. R(A)⊥ = N(A∗ ) and R(A) = N(A∗ )⊥ , and Fn = N(A∗ ) ⊕ R(A) with
dim N(A∗ ) = n − r, r = dim R(A) = r(A), the rank of A.
2. R(A∗ )⊥ = N(A) and R(A∗ ) = N(A)⊥ , and Fm = N(A) ⊕ R(A∗ ) with
dim N(A) = m − r, r = dim R(A∗ ) = r(A).
For geometric intuition, remember that every vector in the left kernel space is "perpendicular" to every vector in the column space.

For every x ∈ F^m, xA ∈ R(A) ⊆ F^n. On the other hand, there exists y ∈ F^n such that, in the expression
x = yA∗ + (x − yA∗), with yA∗ ∈ R(A∗) and x − yA∗ ∈ N(A),
the decomposition is unique, although there may be many such y. Therefore
xA = (yA∗)A = yA∗A
holds. This implies that
A|_{R(A∗)}: R(A∗) → R(A)

is a linear transformation, and an isomorphism too. This is because, if xA = 0, then x ∈ N(A), which forces yA∗ ∈ N(A) ∩ R(A∗) and in turn yA∗ = 0.
Dual results also hold for A∗.
We summarize these in the following

Theorem Let A ∈ M(m, n; F). Consider A: F^m → F^n as the linear transformation defined by x → xA and A∗: F^n → F^m as y → yA∗.

1. A|_{R(A∗)}: R(A∗) → R(A), defined by
yA∗ → (yA∗)A = yA∗A = xA, y ∈ F^n,
is a linear isomorphism, where x = (x − yA∗) + yA∗ with x − yA∗ ∈ N(A) and yA∗ ∈ R(A∗) for some y ∈ F^n if x is given.
2. A∗|_{R(A)}: R(A) → R(A∗), defined by
xA → (xA)A∗ = xAA∗ = yA∗, x ∈ F^m,
is a linear isomorphism, where y = (y − xA) + xA with y − xA ∈ N(A∗) and xA ∈ R(A) for some x ∈ F^m if y is given.

See Fig. B.5.

[Fig. B.5: the decompositions F^m = N(A) ⊕ R(A∗) and F^n = N(A∗) ⊕ R(A), together with the mutually inverse isomorphisms A|_{R(A∗)}: yA∗ → xA = (yA∗)A and A∗|_{R(A)}: xA → yA∗ = (xA)A∗.]

Fig. B.5

The materials here and the problems in the forthcoming exercises are
briefly taken from [4]. And readers are highly recommended to catch a
glimpse of Gilbert Strang’s article [17].

Exercises
A = [a_{ij}] ∈ M(m, n; F) throughout the following problems unless otherwise noted.

1. Let
$$A = \begin{pmatrix} 1 & -1 & 2 & 3 & 0 \\ -1 & 1 & 2 & 5 & 2 \\ 0 & 0 & -1 & -2 & 3 \\ 2 & -2 & 3 & 4 & -1 \end{pmatrix}.$$
(a) Find R(A) and N(A∗) so that R⁵ = R(A) ⊕ N(A∗).
(b) Find R(A∗) and N(A) so that R⁴ = R(A∗) ⊕ N(A).
(c) Given x = (x₁, x₂, x₃, x₄) ∈ R⁴, determine all possible y = (y₁, y₂, y₃, y₄, y₅) ∈ R⁵ such that x − yA∗ ∈ N(A) and hence (yA∗)A = xA holds.
∗
2. Consider the system of linear equations Ay∗ = b∗, where y ∈ F^n and b ∈ F^m, which is equivalent to yA∗ = b.
(a) Existence of solutions The following are equivalent.
(1) Ay∗ = b∗ has at least one solution for any b ∈ F^m.
(2) The column space R(A∗) = F^m, i.e. r(A) = m.
(3) A is right-invertible, i.e. there exists B ∈ M(n, m; F) such that AB = I_m.
(b) Uniqueness of solutions The following are equivalent.
(1) Ay∗ = b∗ has at most one solution for any b ∈ F^m.
(2) The n column vectors of A form a basis for the column space R(A∗), i.e. r(A) = n ≤ m.
(3) A is left-invertible, i.e. there exists C ∈ M(n, m; F) such that CA = I_n.

(c) Existence and uniqueness of the solution The following are equivalent.
(1) Ay∗ = b∗ has a unique solution for any b ∈ F^m.
(2) r(A) = n = m.
(3) A is invertible.
In this case, the unique solution is y∗ = A^{-1}b∗ or y = b(A∗)^{-1}.
(Note Change Ay∗ = b∗ back to yA∗ = b. We use part of Fig. B.5 to interpret the results obtained here. See Fig. B.6. For any given b ∈ R(A∗) ⊆ F^m, the equation yA∗ = b certainly has a solution x₀A ∈ R(A), and its solution set is the (n − r(A))-dimensional affine subspace x₀A + N(A∗) of F^n.)

[Fig. B.6: the solution set x₀A + N(A∗) of yA∗ = b, a translate of the right kernel N(A∗) in F^n; here b = yA∗ = x₀AA∗ ∈ R(A∗) ⊆ F^m.]

Fig. B.6

3. Right and left invertible matrices of A
(a) Suppose r(A) = m < n. Then the set of all right invertible matrices of A is
R = {(y₁∗ · · · y_m∗) ∈ M(n, m; F) | each y_i ∈ F^n is a solution of Ay∗ = e_i∗ ∈ F^m for 1 ≤ i ≤ m}.
For each i, 1 ≤ i ≤ m, the solution set of Ay∗ = e_i∗ is an (n − m)-dimensional affine subspace of F^n. Therefore, A has infinitely many right invertible matrices if F is an infinite field. In particular, a right invertible matrix is
A∗(AA∗)^{-1}.

(b) Suppose r(A) = n < m. Then the set of all left invertible matrices of A is
$$L = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in M(n, m; F)\ \middle|\ \text{each } x_i \in F^m \text{ is a solution of } xA = e_i \in F^n \text{ for } 1 \le i \le n \right\}.$$
A particular left invertible matrix is
(A∗A)^{-1}A∗.
Both particular one-sided inverses are easy to check numerically; see the sketch below.
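The sketch below assumes NumPy (not part of this text) and uses the two matrices of Ex. 6 below.

```python
# A sketch, assuming NumPy: the particular right and left invertible matrices.
import numpy as np

A = np.array([[1.0, 1.0, -2.0, 3.0],      # r(A) = 2 = m < n = 4
              [1.0, -2.0, 1.0, -1.0]])
right = A.T @ np.linalg.inv(A @ A.T)      # A*(AA*)^{-1}
assert np.allclose(A @ right, np.eye(2))  # AB = I_m

B = np.array([[0.0, 1.0, 1.0],            # r(B) = 3 = n < m = 4
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 1.0, -1.0]])
left = np.linalg.inv(B.T @ B) @ B.T       # (B*B)^{-1}B*
assert np.allclose(left @ B, np.eye(3))   # CB = I_n
```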
4. Solution set of Ay∗ = b∗, where y ∈ F^n and b ∈ F^m, b ≠ 0 Suppose r(A) = r and
$$A = \begin{cases} (A_{11}\ A_{12}), & r(A) = r(A_{11}) = m, \\[4pt] \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, & r(A_{11}) = r < m, \end{cases} \qquad b^* = \begin{cases} b^*, & r(A) = m, \\[4pt] \begin{pmatrix} b_1^* \\ b_2^* \end{pmatrix}, & r(A) = r < m,\ b_1 \in F^r,\ b_2 \in F^{m-r}. \end{cases}$$
(a) Ay∗ = b∗ has a solution if and only if
(1) r(A) = m, or
(2) in case r(A) = r < m, b₂∗ = A₂₁A₁₁^{-1}b₁∗ holds.
(b) The solution set is
$$\left\{ y \in F^n\ \middle|\ y^* = \begin{pmatrix} A_{11}^{-1} b_1^* - A_{11}^{-1} A_{12}\, y_1^* \\ y_1^* \end{pmatrix},\ y_1 \in F^{n-r} \right\}$$
$$= (b_1 (A_{11}^*)^{-1},\ 0)_{1 \times n} + \{(-y_1 A_{12}^* (A_{11}^*)^{-1},\ y_1)_{1 \times n} \in F^n \mid y_1 \in F^{n-r}\},$$
which is an (n − r)-dimensional affine subspace of F^n. Note that, if r(A) = n = r, F^{n−r} = {0}.
5. Solution set of Ay∗ = b∗, where y ∈ F^n and b ∈ F^m, b ≠ 0
(a) Suppose r(A) = m < n. The solution set
{y ∈ F^n | y∗ = Bb∗, where B is any right invertible matrix of A}
is an (n − m)-dimensional affine subspace of F^n with a particular solution
(A∗(AA∗)^{-1}b∗)∗ = b(AA∗)^{-1}A.
(b) Suppose r(A) = n < m. If Ay∗ = b∗ has a solution (i.e. r[A | b∗] = r(A)), then the unique solution is the solution of A∗Ay∗ = A∗b∗, which is
((A∗A)^{-1}A∗b∗)∗ = bA(A∗A)^{-1}
(refer to Ex. 2(b) and Ex. 3(b)).
6. Examples for Exs. 4 and 5.
(a) Let
$$A = [A_{11}\ A_{12}]_{2 \times 4}, \quad \text{where } A_{11} = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix} \text{ and } A_{12} = \begin{pmatrix} -2 & 3 \\ 1 & -1 \end{pmatrix}.$$
Then r(A) = 2 < 4. The right invertible matrices of A are of the form
$$B = \begin{pmatrix} \tfrac{2}{3} + z_1 - \tfrac{5}{3}z_2 & \tfrac{1}{3} + z_3 - \tfrac{5}{3}z_4 \\[2pt] \tfrac{1}{3} + z_1 - \tfrac{4}{3}z_2 & -\tfrac{1}{3} + z_3 - \tfrac{4}{3}z_4 \\[2pt] z_1 & z_3 \\ z_2 & z_4 \end{pmatrix}_{4 \times 2}, \quad z_1, z_2, z_3, z_4 \in F.$$
For b = (b₁, b₂) ∈ F², Ay∗ = b∗ has solutions
$$y = \left( \tfrac{2}{3}b_1 + \tfrac{1}{3}b_2,\ \tfrac{1}{3}b_1 - \tfrac{1}{3}b_2,\ 0,\ 0 \right) + \alpha(1, 1, 1, 0) + \beta\left( -\tfrac{5}{3}, -\tfrac{4}{3}, 0, 1 \right),$$
where α, β ∈ F.
(b) Let
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 4 & 1 & -1 \end{pmatrix}_{4 \times 3}.$$
Then r(A) = 3 < 4. A has left invertible matrices of the form
$$B = \begin{pmatrix} -\tfrac{1}{2} + 2z_1 & \tfrac{1}{2} - z_1 & \tfrac{1}{2} - 3z_1 & z_1 \\[2pt] \tfrac{1}{2} + 2z_2 & -\tfrac{1}{2} - z_2 & \tfrac{1}{2} - 3z_2 & z_2 \\[2pt] \tfrac{1}{2} + 2z_3 & \tfrac{1}{2} - z_3 & -\tfrac{1}{2} - 3z_3 & z_3 \end{pmatrix}, \quad z_1, z_2, z_3 \in F.$$
For b = (b₁, b₂, b₃, b₄) ∈ F⁴, the possible solution of Ay∗ = b∗ is
$$y^* = Bb^* = \begin{pmatrix} -\tfrac{1}{2}b_1 + \tfrac{1}{2}b_2 + \tfrac{1}{2}b_3 + z_1(2b_1 - b_2 - 3b_3 + b_4) \\[2pt] \tfrac{1}{2}b_1 - \tfrac{1}{2}b_2 + \tfrac{1}{2}b_3 + z_2(2b_1 - b_2 - 3b_3 + b_4) \\[2pt] \tfrac{1}{2}b_1 + \tfrac{1}{2}b_2 - \tfrac{1}{2}b_3 + z_3(2b_1 - b_2 - 3b_3 + b_4) \end{pmatrix}.$$
Therefore, Ay∗ = b∗ has a solution if and only if r(A) = r[A | b∗], which forces 2b₁ − b₂ − 3b₃ + b₄ = 0. In this case, the unique solution is
$$y = bA(A^*A)^{-1} = (b_1\ b_2\ b_3\ b_4) \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 4 & 1 & -1 \end{pmatrix} \cdot \frac{1}{60} \begin{pmatrix} 9 & -15 & 9 \\ -15 & 45 & -15 \\ 9 & -15 & 29 \end{pmatrix}$$
$$= \left( -\tfrac{1}{2}b_1 + \tfrac{1}{2}b_2 + \tfrac{1}{2}b_3,\ \tfrac{1}{2}b_1 - \tfrac{1}{2}b_2 + \tfrac{1}{2}b_3,\ \tfrac{1}{2}b_1 + \tfrac{1}{2}b_2 - \tfrac{1}{2}b_3 \right),$$
which coincides with y∗ = Bb∗ or y = bB∗ above.
Exercises 7–14 are concerned with real or complex m × n matrix A =
[aij ]m×n . By the way, the readers are required to possess basic knowledge
about inner product ,  in Rn or Cn (see Sec. B.9, if necessary).
7. The geometric interpretation of the solution b(AA∗)^{-1}A of Ay∗ = b∗ with b ≠ 0 Suppose r(A) = m < n. Then the solution b(AA∗)^{-1}A is the one that makes |y| minimal among the many solutions of Ay∗ = b∗ (see Exs. 4 and 5). That is, the distance from 0 to the solution set of Ay∗ = b∗,
$$\min_{Ay^* = b^*} |y|,$$
is attained at b(AA∗)^{-1}A, with
perpendicular vector: b(AA∗)^{-1}A,
distance: |b(AA∗)^{-1}A|.
See Fig. B.7.

8. The geometric interpretation of bA(A∗A)^{-1} in case r(A) = n < m and b ≠ 0 For a given b ∈ R^m (or C^m), as an approximate solution y ∈ R^n to Ay∗ = b∗, the error function |b∗ − Ay∗| = |b − yA∗| attains its minimum at bA(A∗A)^{-1}. In particular, if Ay∗ = b∗ or

[Fig. B.7: the solution affine subspace in R^n; a general solution bB∗ (B a right invertible matrix of A) and the minimum-norm solution b(AA∗)^{-1}A.]

Fig. B.7

 
yA∗ = b has a solution, then |b − yA∗| attains its minimum 0 at the unique solution bA(A∗A)^{-1}. That is,
$$\min_{y \in \mathbb{R}^n} |b - yA^*|$$
is attained at bA(A∗A)^{-1}. The linear operator A(A∗A)^{-1}A∗: R^m → R(A∗) ⊆ R^m is the orthogonal projection of R^m onto the column space R(A∗) of A, with
projection vector: bA(A∗A)^{-1}A∗ ∈ R(A∗),
distance vector: b − bA(A∗A)^{-1}A∗ ∈ N(A), the left kernel of A, and
distance from b to R(A∗): |b − bA(A∗A)^{-1}A∗|.
See Fig. B.8.
See Fig. B.8.

[Fig. B.8: b ∈ R^m decomposed into the orthogonal projection bA(A∗A)^{-1}A∗ ∈ R(A∗) and the distance vector b − bA(A∗A)^{-1}A∗ ∈ N(A).]

Fig. B.8

  
In short, yA∗ = b is not solvable for a general b ∈ R^m, but bA(A∗A)^{-1} is the optimal solution of yA∗ = b in the sense that it minimizes the quantity |b − yA∗| as y varies in R^n; it is found by solving yA∗A = bA. This is the so-called least squares problem. Take Ex. 6(b) as our example here: |b − yA∗| attains its minimum at
$$bA(A^*A)^{-1} = \frac{1}{60}(b_1\ b_2\ b_3\ b_4) \begin{pmatrix} -6 & 30 & 14 \\ 18 & -30 & 38 \\ -6 & 30 & -6 \\ 12 & 0 & -8 \end{pmatrix}$$
$$= \frac{1}{30}(-3b_1 + 9b_2 - 3b_3 + 6b_4,\ 15b_1 - 15b_2 + 15b_3,\ 7b_1 + 19b_2 - 3b_3 - 4b_4).$$
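This computation is easily cross-checked. The sketch below assumes NumPy (not part of this text) and compares the closed-form minimizer with the library's least squares routine for one particular b.

```python
# A sketch, assuming NumPy: least squares via bA(A*A)^{-1} versus np.linalg.lstsq.
import numpy as np

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 1.0, -1.0]])
b = np.array([1.0, 2.0, 0.0, -1.0])

y = b @ A @ np.linalg.inv(A.T @ A)               # bA(A*A)^{-1}
y_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizes |A y* - b*|
assert np.allclose(y, y_lstsq)
```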
As a summary, it is worth reviewing some of the main results in order to picture what will be going on in the sequel. For A_{m×n}:
(1) If r(A) = m = n, then A has the inverse matrix A^{-1}.
(2) If r(A) = m < n, then A has the right invertible matrix A∗(AA∗)^{-1}.
(3) If r(A) = n < m, then A has the left invertible matrix (A∗A)^{-1}A∗.
What happens if r(A) = r < min(m, n)? To answer this question, Exercises 9–14 investigate the generalized inverse matrix or pseudoinverse matrix introduced by E. H. Moore (1935) and R. Penrose (1953).
9. Geometric mapping behavior of AA∗: R^m → R(A∗) ⊆ R^m The following are equivalent (refer to Fig. B.5).
(a) xAA∗ = yA∗, x ∈ R^m and y ∈ R^n.
(b) y − xA ∈ N(A∗) = R(A)⊥. Therefore, y = (y − xA) + xA, with xA ∈ R(A).
(c) xA is the orthogonal projection (see Sec. B.9) of y onto the row space R(A) of A.
(d) |y − xA| = min_{z∈R^m} |y − zA|, i.e. |y − xA| is the distance from y to the row space R(A).
Try to figure out the corresponding results for the mapping A∗A: R^n → R(A) ⊆ R^n.
10. Characterization of AA∗ : Rm → R(A∗ ) as an orthogonal projection (see
Sec. B.9) The following are equivalent (refer to Fig. B.5).
(a) (geometric) AA∗ : Rm → R(A∗ ) ⊆ Rm is an orthogonal projection
onto the column space R(A∗ ) of A (i.e. AA∗ is symmetric and
(AA∗ )2 = AA∗ ).

(b) (algebraic) A∗AA∗ = A∗, or AA∗|_{R(A∗)} is an identity mapping.
(c) (geometric) y − xA ∈ N(A∗) = R(A)⊥ and x − yA∗ ∈ N(A) = R(A∗)⊥ for x ∈ R^m and y ∈ R^n.
(d) (geometric) min_{xAA∗ = yA∗} |x| = |yA∗| for a given y ∈ R^n and all such x ∈ R^m.
(e) (geometric) For x ∈ R^m and y ∈ R^n such that xAA∗ = yA∗, xA is the orthogonal projection of y on the row space R(A) of A and
xA = yA∗A.
Therefore,
|y − xA| = min_{z∈R^m} |y − zA|.
(f) (algebraic) AA∗A = A, or A∗A|_{R(A)} is an identity mapping.
(g) (geometric) A∗A: R^n → R(A) ⊆ R^n is an orthogonal projection from R^n onto the row space R(A) of A (i.e. A∗A is symmetric and (A∗A)² = A∗A).
In case AA∗: R^m → R^m is not an orthogonal projection, the equivalent statements in Ex. 10 are no longer true and we fall back on what is stated in Ex. 9. For a given A_{m×n}, it is the deviation of the geometric mapping properties of A∗ that keeps AA∗ from being an orthogonal projection. The way to compensate for this shortcoming of A∗ is to replace A∗ by an n × m matrix A⁺ so that AA⁺ recovers many of the equivalent properties stated in Ex. 10. This A⁺ is the Moore–Penrose generalized inverse matrix of A.
11. Equivalent definitions of the generalized inverse matrix Suppose Am×n
is a real or complex matrix. For any real or complex n × m matrix A+ ,
the following possible properties of A and A+ are equivalent.
(a) (algebraic)
(1) AA+ A = A.
(2) A+ AA+ = A+ .
(3) AA+ and A+ A are symmetric matrices (or Hermitian in case
complex matrices).
(b) (geometric) AA⁺ and A⁺A are symmetric, and for x ∈ R^m and y ∈ R^n,
xA − y ∈ N(A⁺), the left kernel of A⁺
⇔ x − yA⁺ ∈ N(A).


(c) (geometric and algebraic) For given b ∈ Rn , then among all solu-

tions or approximate solutions 
x of 
x A = b , it is the vector

x0 = bA+

that satisfies the following restricted conditions simultaneously,



(1) 
x A is the orthogonal projection of b on the row space R(A),
x ∈ R(A∗ ), the column space of A.
(2) 

(d) (geometric and algebraic) For given b ∈ Rn , then among all solu-

tions or approximate solutions 
x of 
x A = b , it is the vector

x0 = bA+

that satisfies
 
(1) | b −  z ∈Rm | b − z A|,
x A| = min 

(2) | x | is the minimum.




 
x0 is usually called the optimal solution of the equations 
xA = b
under the constrained conditions 1 and 2.
Such an A+ exists uniquely once A is given (see Ex. 12), and is called
the generalized inverse (matrix) or pseudoinverse of A.
12. Existence and uniqueness of A⁺ For any permutation σ: {1, 2, . . . , m} → {1, 2, . . . , m}, the m × m matrix
$$\begin{pmatrix} e_{\sigma(1)} \\ \vdots \\ e_{\sigma(m)} \end{pmatrix}_{m \times m}$$
obtained by performing σ on the rows of the identity matrix I_m is called a permutation matrix of order m. For an m × n matrix A of rank r, there exist permutation matrices P_{m×m} and Q_{n×n} such that
$$PAQ = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}_{m \times n},$$
where A₁₁ is an invertible r × r matrix, so that r(A₁₁) = r(A) = r. Then
A = BC, where
$$B = P^{-1} \begin{pmatrix} I_r \\ A_{21}A_{11}^{-1} \end{pmatrix} \quad \text{or} \quad P^{-1} \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix},$$
$$C = (A_{11}\ A_{12})\,Q^{-1} \quad \text{or} \quad (I_r\ A_{11}^{-1}A_{12})\,Q^{-1},$$
respectively.

Note that B_{m×r} and C_{r×n} are such that r(B) = r(C) = r. Therefore, the generalized inverse is
A⁺ = C∗(CC∗)^{-1}(B∗B)^{-1}B∗.
In particular,
$$A^+ = \begin{cases} A^*(AA^*)^{-1}, & \text{if } r(A) = m < n, \\ (A^*A)^{-1}A^*, & \text{if } r(A) = n < m, \\ A^{-1}, & \text{if } r(A) = m = n. \end{cases}$$
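The factorization formula is straightforward to verify numerically. The sketch below assumes NumPy (not part of this text) and uses the rank-2 matrix of Ex. 13(a) below, whose full-rank factorization A = BC can be read off from its rows.

```python
# A sketch, assuming NumPy: A+ = C*(CC*)^{-1}(B*B)^{-1}B* for A = BC.
import numpy as np

A = np.array([[-1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, -1.0, -1.0],
              [1.0, 1.0, -2.0, -3.0]])         # rank 2: row3 = row2 - row1
B = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])
C = A[:2, :]                                    # rows 1 and 2 of A
assert np.allclose(B @ C, A)                    # a full-rank factorization

A_plus = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(A_plus, np.linalg.pinv(A))   # matches the Moore-Penrose inverse
```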
Consider A+ : Rn → Rm as the linear transformation defined by
y →

y A+ . Then
R(A+ ) = R(A∗ ),
R((A+ )∗ ) = R(A),
and, of course, r(A+ ) = r(A). Hence,
(1) AA+ : Rm → Rm is the orthogonal projection of Rm onto the col-
umn space R(A∗ ) of A, i.e.
AA+ is symmetric and (AA+ )2 = AA+ .
(2) A+ A: Rn → Rn is the orthogonal projection of Rn onto the row
space R(A) of A, i.e.
A+ A is symmetric and (A+ A)2 = A+ A.
Remember that Rn = N(A∗ ) ⊕ R(A) and R(A)⊥ = N(A∗ ). (1) suggests
that A+ : Rn → Rm takes the row space R(A) back to the column space
R(A∗ ) and its restriction to the right kernel N(A∗ ) is zero. Because
r(A) = r(A∗ ), its restriction A | R(A∗ ): R(A∗ ) → R(A) is invertible.
Therefore,
A+ = (A | R(A∗ ))−1 : R(A) → R(A∗ )
inverts A | R(A∗ ). I agree with Gilbert Strang’s comment: that this is the
one natural best definition of an inverse (see [17, p. 853]). See Fig. B.9
(compare with Fig. B.5).

Notice that for x ∈ R^m, xAA∗ is not necessarily the orthogonal projection of x onto R(A∗), and hence xAA∗ may fail to equal xAA⁺. If xAA∗ = xAA⁺ for each x ∈ R^m, then AA∗ = AA⁺ holds. This is equivalent to saying that, for y ∈ R^n such that y = (y − xA) + xA, yA∗ = xAA∗, which is equivalent to yA⁺ = xAA⁺ (see Exs. 10 and 11).
Fig. B.9 (schematic, compare with Fig. B.5: $A$ maps $R(A^*) \subseteq \mathbb{R}^m$ isomorphically onto $R(A) \subseteq \mathbb{R}^n$ and annihilates $N(A)$; $A^+$ inverts $A\,|\,R(A^*)$, so $\vec{x}AA^+ = \vec{y}A^+$ is the orthogonal projection of $\vec{x}$ onto $R(A^*)$, while $\vec{x}AA^* = \vec{y}A^*$ need not be.)
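To make items 11 and 12 concrete, here is a minimal numerical sketch, assuming Python with NumPy is available (`numpy.linalg.pinv` computes the Moore–Penrose inverse), checking the Penrose conditions of Ex. 11(a):

```python
import numpy as np

A = np.array([[-1.0, 0, 1, 2],
              [ 0.0, 1, -1, -1],
              [ 1.0, 1, -2, -3]])   # the 3 x 4 matrix of Ex. 13(a) below
Ap = np.linalg.pinv(A)              # Moore-Penrose generalized inverse

# the conditions of Ex. 11(a)
assert np.allclose(A @ Ap @ A, A)        # (1) AA+A = A
assert np.allclose(Ap @ A @ Ap, Ap)      # (2) A+AA+ = A+
assert np.allclose((A @ Ap).T, A @ Ap)   # (3) AA+ symmetric
assert np.allclose((Ap @ A).T, Ap @ A)   # (3) A+A symmetric

print(np.round(27 * Ap, 6))  # compare with the matrix 27*A+ given in Ex. 13(a)
```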

13. Some basic properties of generalized inverses
(1) $(\alpha A)^+ = \frac{1}{\alpha}A^+$, where $\alpha \neq 0$.
(2) $(A^+)^* = (A^*)^+$. Hence $A^+$ is symmetric if $A$ is.
(3) $(A^+)^+ = A$.
(4) Suppose $A_{m\times r}$ and $B_{r\times n}$ satisfy $r(A) = r(B) = r$; then
$$(AB)^+ = B^+A^+.$$
But, in general, $(AB)^+ \neq B^+A^+$ (see (b) below).
(5) Suppose $A_{n\times n}$ satisfies $A^* = A$ and $A^2 = A$. Then
$$A^+ = A.$$
Hence $O_{n\times n}^+ = O$.
(6) $O_{m\times n}^+ = O_{n\times m}$.
(7) Suppose $P_{m\times m}$ and $Q_{n\times n}$ are orthogonal (e.g. permutation) matrices. Then
$$(PAQ)^+ = Q^{-1}A^+P^{-1}.$$

(8) Suppose
$$A = \begin{bmatrix} a_{11} & & & \\ & \ddots & & \\ & & a_{rr} & \\ & & & O \end{bmatrix}_{m\times n}, \quad\text{where } a_{ii} \neq 0 \text{ for } 1 \le i \le r.$$
Then
$$A^+ = \begin{bmatrix} \frac{1}{a_{11}} & & & \\ & \ddots & & \\ & & \frac{1}{a_{rr}} & \\ & & & O \end{bmatrix}_{n\times m}.$$

Do the following two problems.


(a) Let
$$A = \begin{bmatrix} -1 & 0 & 1 & 2 \\ 0 & 1 & -1 & -1 \\ 1 & 1 & -2 & -3 \end{bmatrix}.$$
Show that
$$A^+ = \frac{1}{27}\begin{bmatrix} -9 & -9 & 0 \\ 9 & 0 & -9 \\ -3 & -6 & -3 \\ 6 & 3 & -3 \end{bmatrix}$$
and, for $\vec{b} \in \mathbb{R}^3$, find
$$\min_{\substack{\vec{x}\in\mathbb{R}^3 \\ \vec{x}AA^* = \vec{b}A^*}} |\vec{x}|.$$

(b) Let
$$A = \begin{bmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix}.$$
Show that $r(A) = r(B) = 1$ and compute $(AB)^+$ and $B^+A^+$.


14. Polar decomposition and singular value decomposition of a matrix
Suppose Am×n is a nonzero real matrix with rank r. Then
(1) The symmetric matrix $AA^*$ has m eigenvalues (see Sec. B.10)
$$\lambda_1^2, \lambda_2^2, \ldots, \lambda_r^2, 0, \ldots, 0$$
with corresponding eigenvectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_r, \vec{x}_{r+1}, \ldots, \vec{x}_m \in \mathbb{R}^m$ such that $\vec{x}_iAA^* = \lambda_i^2\vec{x}_i$ for $1 \le i \le r$ and $\vec{x}_iAA^* = \vec{0}$ for $r+1 \le i \le m$, where
$$\lambda_i = |\vec{x}_iA| > 0 \quad\text{for } 1 \le i \le r.$$
(2) The eigenvectors $\{\vec{x}_1, \ldots, \vec{x}_r, \vec{x}_{r+1}, \ldots, \vec{x}_m\}$ form an orthonormal basis for $\mathbb{R}^m = R(A^*) \oplus N(A)$ such that $\{\vec{x}_1,\ldots,\vec{x}_r\}$ is a basis for $R(A^*) = N(A)^\perp$ and $\{\vec{x}_{r+1},\ldots,\vec{x}_m\}$ is a basis for $N(A)$. Thus
$$P = \begin{bmatrix} \vec{x}_1 \\ \vdots \\ \vec{x}_r \\ \vec{x}_{r+1} \\ \vdots \\ \vec{x}_m \end{bmatrix}_{m\times m}$$
is an orthogonal matrix, i.e. $P^* = P^{-1}$.


(3) Take an arbitrary orthonormal basis $\{\vec{y}_{r+1},\ldots,\vec{y}_n\}$ for $N(A^*)$. Then $\left\{\frac{\vec{x}_1A}{\lambda_1},\ldots,\frac{\vec{x}_rA}{\lambda_r},\vec{y}_{r+1},\ldots,\vec{y}_n\right\}$ is an orthonormal basis for $\mathbb{R}^n = R(A) \oplus N(A^*)$, and the basis vectors in that order are eigenvectors of $A^*A$ corresponding to the eigenvalues $\lambda_1^2,\ldots,\lambda_r^2,0,\ldots,0$. Then
$$Q = \begin{bmatrix} \frac{\vec{x}_1A}{\lambda_1} \\ \vdots \\ \frac{\vec{x}_rA}{\lambda_r} \\ \vec{y}_{r+1} \\ \vdots \\ \vec{y}_n \end{bmatrix}_{n\times n}$$
is an orthogonal matrix.
Therefore, A can be written as the following singular value decomposition:
$$A = P^{-1}DQ, \quad\text{where } D = \begin{bmatrix} \lambda_1 & & & \\ & \ddots & & 0 \\ & & \lambda_r & \\ & 0 & & O \end{bmatrix}_{m\times n}.$$
See Fig. B.10 (schematic: relative to the orthonormal bases formed by the $\vec{x}_i$ and the $\vec{x}_iA/\lambda_i$, the map $A$ acts as the diagonal matrix $D$; $P$ and $Q$ carry the standard bases $\{\vec{e}_i\}$ to these eigenvector bases).

For $m = n$, the polar decomposition is $A = NW$, where $N = P^{-1}DP$ is positive semidefinite and $W = P^{-1}Q$ is orthogonal.
(Note This result is usually proved by using the general theory of diagonalizability of symmetric matrices (Ex. 8 of Sec. B.9). [7] provides a direct geometric proof without recourse to that general theory. We will encounter this for n = 2 in Sec. 4.5 and n = 3 in Sec. 5.5.)
Eventually, the generalized inverse of A is
$$A^+ = Q^{-1}\begin{bmatrix} \frac{1}{\lambda_1} & & & \\ & \ddots & & 0 \\ & & \frac{1}{\lambda_r} & \\ & 0 & & O \end{bmatrix}_{n\times m}P.$$
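As an illustration of item 14, the following sketch (assuming NumPy; note that `numpy.linalg.svd` uses the column-vector convention $A = U\Sigma V^T$, the transpose of the row-vector convention used here) recovers $A^+$ from the singular value decomposition:

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(3, 5))   # any real 3 x 5 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

# invert the nonzero singular values to build the pseudoinverse
Ap = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(Ap, np.linalg.pinv(A))

# AA+ and A+A are the two orthogonal projections of item 12
for Pr in (A @ Ap, Ap @ A):
    assert np.allclose(Pr, Pr.T) and np.allclose(Pr @ Pr, Pr)
```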
B.9 Inner Product Spaces


Here in this section, the field F will always denote the real field R or the
complex field C.
Definition An inner product ⟨ , ⟩ on a vector space V over the field F is a function that assigns to each ordered pair of vectors $\vec{x}$ and $\vec{y}$ in V a scalar $\langle\vec{x},\vec{y}\rangle$ in F, such that for all $\vec{x},\vec{y},\vec{z}$ in V and all scalars α in F the following properties hold.

1. (positive) $\langle\vec{x},\vec{x}\rangle \ge 0$, with equality if and only if $\vec{x} = \vec{0}$.
2. (conjugate symmetric) $\langle\vec{y},\vec{x}\rangle = \overline{\langle\vec{x},\vec{y}\rangle}$ (the complex conjugate of $\langle\vec{x},\vec{y}\rangle$); in case F = R, (symmetric) $\langle\vec{y},\vec{x}\rangle = \langle\vec{x},\vec{y}\rangle$.
3. (linear) $\langle\alpha\vec{x}+\vec{y},\vec{z}\rangle = \alpha\langle\vec{x},\vec{z}\rangle + \langle\vec{y},\vec{z}\rangle$.

The scalar $\langle\vec{x},\vec{y}\rangle$ is called the inner product of $\vec{x}$ and $\vec{y}$.
A vector space V over the field F endowed with a specific inner product
,  is called an inner product space and is denoted as (V, , ) or simply as V.
If F = R, V is called a real inner product space, whereas if F = C, a complex
inner product space or unitary space.
Examples For $\vec{x} = (x_1,\ldots,x_n)$ and $\vec{y} = (y_1,\ldots,y_n)$ in $\mathbb{R}^n$ (see Sec. B.1),
$$\langle\vec{x},\vec{y}\rangle = \vec{x}\vec{y}^* = \sum_{k=1}^n x_ky_k$$
is an inner product on $\mathbb{R}^n$, called the natural or standard inner product on the real vector space $\mathbb{R}^n$.
For $\vec{x}$ and $\vec{y} \in \mathbb{C}^n$ (see Sec. B.1), the inner product
$$\langle\vec{x},\vec{y}\rangle = \vec{x}\bar{\vec{y}}^* = \sum_{k=1}^n x_k\bar{y}_k$$
is called the natural or standard inner product on the complex vector space $\mathbb{C}^n$.
Rn and Cn are always endowed with natural inner products unless
otherwise specified.
Another example is shown in Ex. 26 of Sec. B.4.
Let C[a, b] be the complex vector space of complex-valued continuous functions on [a, b]. For f, g ∈ C[a, b],
$$\langle f,g\rangle = \int_a^b f(t)\overline{g(t)}\,dt$$
is an inner product.
$\ell^2$ (square-summable sequences) and $L^2[a,b]$ (square Lebesgue-integrable functions) are important inner product (or Hilbert) spaces in real analysis.
For a complex inner product space V, notice that
$$\langle\vec{x},\beta\vec{y}\rangle = \bar\beta\langle\vec{x},\vec{y}\rangle, \quad \beta \in \mathbb{C},$$
and hence
$$\left\langle\sum_{j=1}^m\alpha_j\vec{x}_j,\ \sum_{k=1}^n\beta_k\vec{y}_k\right\rangle = \sum_{j=1}^m\sum_{k=1}^n\alpha_j\bar\beta_k\langle\vec{x}_j,\vec{y}_k\rangle.$$

In case V is real, β̄ and β̄k are replaced by β and βk and ,  becomes


bilinear.
Define the norm or length of $\vec{x} \in V$ by
$$|\vec{x}| = \sqrt{\langle\vec{x},\vec{x}\rangle}.$$
Call $\vec{x},\vec{y} \in V$ orthogonal or perpendicular to each other if $\langle\vec{x},\vec{y}\rangle = 0$, denoted as
$$\vec{x} \perp \vec{y}.$$

Suppose $\vec{y} \neq \vec{0}$. Then for $\beta \in F$ (remember F = C or R),
$$|\vec{x} - \beta\vec{y}|^2 = |\vec{x}|^2 - \bar\beta\langle\vec{x},\vec{y}\rangle - \beta\langle\vec{y},\vec{x}\rangle + |\beta|^2|\vec{y}|^2 \ge 0.$$
Choosing $\beta = \frac{\langle\vec{x},\vec{y}\rangle}{|\vec{y}|^2}$, the above inequality becomes
$$\left|\vec{x} - \frac{\langle\vec{x},\vec{y}\rangle}{|\vec{y}|^2}\vec{y}\right|^2 = |\vec{x}|^2 - \frac{|\langle\vec{x},\vec{y}\rangle|^2}{|\vec{y}|^2} \ge 0$$
$$\Rightarrow |\langle\vec{x},\vec{y}\rangle| \le |\vec{x}||\vec{y}|,$$
where equality holds if and only if $\vec{y} = \beta\vec{x}$ or $\vec{x} = \beta\vec{y}$ for some $\beta \in F$. This is called the Cauchy–Schwarz inequality. From it the triangle inequality
$$|\vec{x}+\vec{y}| \le |\vec{x}| + |\vec{y}|$$
follows.
 
In case V is a real inner product space, for $\vec{x} \neq \vec{0}$ and $\vec{y} \neq \vec{0}$,
$$-1 \le \frac{\langle\vec{x},\vec{y}\rangle}{|\vec{x}||\vec{y}|} \le +1.$$
Therefore, it is reasonable to define the cosine of the angle θ between $\vec{x}$ and $\vec{y}$ by
$$\cos\theta = \frac{\langle\vec{x},\vec{y}\rangle}{|\vec{x}||\vec{y}|}.$$
We can now reinterpret $\langle\vec{x},\vec{y}\rangle$ as the signed orthogonal projection $|\vec{y}|\cos\theta$ of $\vec{y}$ along $\vec{x}$, multiplied by the length $|\vec{x}|$ of $\vec{x}$ itself. Of course, if at least one of $\vec{x}$ and $\vec{y}$ is a zero vector, $\langle\vec{x},\vec{y}\rangle = |\vec{x}||\vec{y}|\cos\theta$ still holds.
For an inner product space V and $\vec{x},\vec{y} \in V$ with $\vec{x} \neq \vec{0}$, call

1. $\left\langle\vec{y},\frac{\vec{x}}{|\vec{x}|}\right\rangle\frac{\vec{x}}{|\vec{x}|} = \frac{\langle\vec{y},\vec{x}\rangle}{|\vec{x}|^2}\vec{x}$ the orthogonal projection vector of $\vec{y}$ along $\vec{x}$, and
2. $\vec{y} - \frac{\langle\vec{y},\vec{x}\rangle}{|\vec{x}|^2}\vec{x}$ the perpendicular or orthogonal vector from $\vec{y}$ to $\vec{x}$.

See Fig. B.11. Try to use these concepts to catch the idea used in the proof of the Cauchy–Schwarz inequality above.

Fig. B.11 (the decomposition $\vec{y} = \frac{\langle\vec{y},\vec{x}\rangle}{|\vec{x}|^2}\vec{x} + \left(\vec{y} - \frac{\langle\vec{y},\vec{x}\rangle}{|\vec{x}|^2}\vec{x}\right)$ along and perpendicular to $\vec{x}$)

Orthogonality is the main feature in almost all topics concerned with


inner products.

By the very definition, the zero vector $\vec{0}$ is the only vector orthogonal to all vectors in the space V.
A nonempty subset S of V consisting of nonzero vectors is said to be orthogonal if any two distinct elements of S are orthogonal. Such a set is easily seen to be linearly independent.
An orthogonal set S consisting entirely of vectors of unit length is called
orthonormal.
For example, in $\mathbb{R}^n$ or $\mathbb{C}^n$, $N = \{\vec{e}_1,\ldots,\vec{e}_n\}$ satisfies
$$\langle\vec{e}_i,\vec{e}_j\rangle = \delta_{ij}, \quad 1 \le i,j \le n.$$
Therefore N is an orthonormal basis, i.e. a basis that is orthonormal.

Exercises
In what follows, V will always denote an inner product space with inner
product , , usually not particularly mentioned.
1. Suppose dim V = n and $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ is a basis for V. For $\vec{x} = \sum_{i=1}^n\alpha_i\vec{x}_i$ and $\vec{y} = \sum_{i=1}^n\beta_i\vec{x}_i$,
$$\langle\vec{x},\vec{y}\rangle = \sum_{i,j=1}^n\alpha_i\bar\beta_j\langle\vec{x}_i,\vec{x}_j\rangle = [\vec{x}]_BA_B[\bar{\vec{y}}]_B^*,$$
where
$$A_B = [\langle\vec{x}_i,\vec{x}_j\rangle]_{n\times n} = \begin{bmatrix} \langle\vec{x}_1,\vec{x}_1\rangle & \langle\vec{x}_1,\vec{x}_2\rangle & \cdots & \langle\vec{x}_1,\vec{x}_n\rangle \\ \langle\vec{x}_2,\vec{x}_1\rangle & \langle\vec{x}_2,\vec{x}_2\rangle & \cdots & \langle\vec{x}_2,\vec{x}_n\rangle \\ \vdots & \vdots & & \vdots \\ \langle\vec{x}_n,\vec{x}_1\rangle & \langle\vec{x}_n,\vec{x}_2\rangle & \cdots & \langle\vec{x}_n,\vec{x}_n\rangle \end{bmatrix}$$

is called the matrix representation of the inner product ,  related to the


basis B. AB is Hermitian (or symmetric if V is real), i.e. its conjugate
transpose

$$\bar{A}_B^* = A_B \quad (\text{or } A_B^* = A_B)$$
and is positive definite, i.e.
$$\vec{v}A_B\bar{\vec{v}}^* \ge 0 \quad\text{for any } \vec{v} \in F^n,$$
with equality only if $\vec{v} = \vec{0}$. Conversely, for any positive-definite Hermitian matrix A ∈ M(n; C) and a basis B for V,
$$\langle\vec{x},\vec{y}\rangle = [\vec{x}]_BA[\bar{\vec{y}}]_B^*$$
defines an inner product ⟨ , ⟩ on V whose matrix representation is A itself.


Two matrix representations AB and AB of an inner product , 
related to bases B and B  for V are congruent, i.e.

AB = P AB P̄ ∗ ,


where P = [1V ]B 
B is the transition matrix from B to B (see Sec. B.7).
2. The Gram–Schmidt orthogonalization process Suppose $\{\vec{y}_1,\vec{y}_2,\ldots,\vec{y}_k,\ldots\}$ is linearly independent in V.
(a) Let (see Fig. B.11)
$$\vec{x}_1 = \vec{y}_1, \qquad \vec{x}_2 = \vec{y}_2 - \frac{\langle\vec{y}_2,\vec{x}_1\rangle}{|\vec{x}_1|^2}\vec{x}_1, \qquad \ldots, \qquad \vec{x}_k = \vec{y}_k - \sum_{j=1}^{k-1}\frac{\langle\vec{y}_k,\vec{x}_j\rangle}{|\vec{x}_j|^2}\vec{x}_j, \qquad \ldots$$
Then:
1. $\langle\vec{x}_1,\ldots,\vec{x}_k\rangle = \langle\vec{y}_1,\ldots,\vec{y}_k\rangle$ (as subspaces), k ≥ 1.
2. $\vec{x}_{k+1}$ is orthogonal to every vector in $\langle\vec{y}_1,\ldots,\vec{y}_k\rangle$; in symbols, $\vec{x}_{k+1} \perp \vec{y}_1,\ldots,\vec{y}_k$, k ≥ 1; moreover $\vec{x}_{k+1}$ is the orthogonal vector from $\vec{y}_{k+1}$ to the subspace $\langle\vec{y}_1,\ldots,\vec{y}_k\rangle$.
Hence $\{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_k,\ldots\}$ is an orthogonal set and $\left\{\frac{\vec{x}_1}{|\vec{x}_1|},\frac{\vec{x}_2}{|\vec{x}_2|},\ldots,\frac{\vec{x}_k}{|\vec{x}_k|},\ldots\right\}$ is an orthonormal set.
(b) The (n + 1)-dimensional real vector space $P_n(\mathbb{R})$ (see Sec. B.3) has a basis $B = \{1,x,x^2,\ldots,x^n\}$. Use this to show that the $(n+1)\times(n+1)$ matrix
$$\begin{bmatrix} 1 & \frac12 & \frac13 & \cdots & \frac1n & \frac1{n+1} \\ \frac12 & \frac13 & \frac14 & \cdots & \frac1{n+1} & \frac1{n+2} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \frac1n & \frac1{n+1} & \frac1{n+2} & \cdots & \frac1{2n-1} & \frac1{2n} \\ \frac1{n+1} & \frac1{n+2} & \frac1{n+3} & \cdots & \frac1{2n} & \frac1{2n+1} \end{bmatrix}$$
is positive definite. Also, $\{\vec{y}_0,\vec{y}_1,\ldots,\vec{y}_n\}$, where
$$y_0(t) = 1, \quad y_1(t) = t, \quad y_2(t) = t^2 - \frac13, \quad \ldots, \quad y_k(t) = \frac{d^k}{dt^k}(t^2-1)^k, \quad 1 \le k \le n,$$
is an orthogonal basis. How does one get an orthonormal basis?
(c) V has orthogonal (or orthonormal) sets of vectors.
(c) V has orthogonal (or orthonormal) sets of vectors.
(d) Suppose dim V = n < ∞. Then V has an orthonormal basis $B = \{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_n\}$. For any $\vec{x} \in V$,
$$\vec{x} = \sum_{i=1}^n\langle\vec{x},\vec{x}_i\rangle\vec{x}_i,$$
and for any linear operator f: V → V, the matrix representation of f related to B is
$$[f]_B = [\langle f(\vec{x}_i),\vec{x}_j\rangle]_{n\times n}.$$
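A minimal computational sketch of the process in Ex. 2(a), assuming NumPy and the standard inner product on $\mathbb{R}^n$ (the function name `gram_schmidt` is ours):

```python
import numpy as np

def gram_schmidt(ys):
    """Orthogonalize the linearly independent rows ys as in Ex. 2(a)."""
    xs = []
    for y in ys:
        # subtract the orthogonal projection of y onto each previous x
        x = y - sum((y @ x) / (x @ x) * x for x in xs)
        xs.append(x)
    return np.array(xs)

ys = np.array([[1.0, 1, 0], [1, 0, 1], [0, 1, 1]])
xs = gram_schmidt(ys)
print(np.round(xs @ xs.T, 10))   # diagonal matrix: the x_i are mutually orthogonal
```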

If S1 and S2 are nonempty subsets of V such that each vector in S1 is orthogonal to each vector in S2, then S1 and S2 are said to be orthogonal or perpendicular to each other, denoted as
$$S_1 \perp S_2.$$
In particular, if $S_1 = \{\vec{x}\}$, this is written briefly as $\vec{x} \perp S_2$. Let S be a nonempty subset of V. The subspace
$$S^\perp = \{\vec{x} \in V \mid \vec{x} \perp S\}$$
is called the orthogonal complement of S in V and has the property that $(S^\perp)^\perp \supseteq S$, with equality if S is already a subspace.

3. The orthogonal decomposition of an inner product space.
(a) Let S be a subspace of V and $\vec{x}_0 \in V$. Then $\vec{y}_0 \in S$ satisfies
$$|\vec{x}_0 - \vec{y}_0| = \min_{\vec{y}\in S}|\vec{x}_0 - \vec{y}|$$
if and only if $(\vec{x}_0 - \vec{y}_0) \perp S$. In this case, $\vec{y}_0$ is unique and is called the orthogonal projection of $\vec{x}_0$ on S, and $\vec{x}_0 - \vec{y}_0$ the orthogonal vector from $\vec{x}_0$ to S. Moreover, the Pythagorean theorem
$$|\vec{x}_0|^2 = |\vec{y}_0|^2 + |\vec{x}_0 - \vec{y}_0|^2$$
holds.
(b) Suppose $\{\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_n,\ldots\}$ is an orthonormal system in V. Then, for any $\vec{x} \in V$, the optimal approximation inequality
$$\left|\vec{x} - \sum_{i=1}^n\alpha_i\vec{x}_i\right| \ge \left|\vec{x} - \sum_{i=1}^n\langle\vec{x},\vec{x}_i\rangle\vec{x}_i\right| = \left(|\vec{x}|^2 - \sum_{i=1}^n|\langle\vec{x},\vec{x}_i\rangle|^2\right)^{1/2}$$
holds for all $\alpha_1,\ldots,\alpha_n \in F$ and $n \ge 1$, with equality if and only if $\alpha_i = \langle\vec{x},\vec{x}_i\rangle$ for $1 \le i \le n$. That is, the minimum is attained at the orthogonal projection vector $\sum_{i=1}^n\langle\vec{x},\vec{x}_i\rangle\vec{x}_i$ of $\vec{x}$. By the way,
$$\sum_{i=1}^n|\langle\vec{x},\vec{x}_i\rangle|^2 \le |\vec{x}|^2, \quad n \ge 1,$$
is called the Bessel inequality. Try to show that
$$\min_{a,b,c}\int_{-1}^1|t^3 - a - bt - ct^2|^2\,dt = \frac{8}{175}.$$
(c) Suppose S is a finite-dimensional subspace of V . Then
V = S ⊕ S⊥.
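The minimization at the end of (b) can be checked symbolically; a sketch assuming SymPy is available (the optimal a, b, c come from setting the partial derivatives of the quadratic error to zero):

```python
import sympy as sp

t, a, b, c = sp.symbols('t a b c')
err = sp.integrate((t**3 - a - b*t - c*t**2)**2, (t, -1, 1))
sol = sp.solve([sp.diff(err, v) for v in (a, b, c)], (a, b, c))
print(sol)                          # {a: 0, b: 3/5, c: 0}
print(sp.simplify(err.subs(sol)))   # 8/175
```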
4. The orthogonal projection (operator) of V onto a finite-dimensional sub-
space. Suppose S is a finite-dimensional subspace of V .
(a) There exists a unique linear operator p: V → V such that, for $\vec{x} \in V$,
$$\vec{x} = p(\vec{x}) + (\vec{x} - p(\vec{x})), \quad p(\vec{x}) \in S \quad\text{and}\quad (1_V - p)(\vec{x}) = \vec{x} - p(\vec{x}) \in S^\perp.$$

Such an operator p is called the orthogonal projection of V onto


S; meanwhile, 1V − p is the orthogonal projection of V onto S ⊥ .
Moreover,
$$p(\vec{x}) = \vec{x} \iff \vec{x} \in S, \qquad p(\vec{x}) = \vec{0} \iff \vec{x} \in S^\perp,$$
and
$$|\vec{x}|^2 = |p(\vec{x})|^2 + |\vec{x} - p(\vec{x})|^2.$$
(b) Suppose dim V < ∞ and p: V → V is a linear operator. Then the
following are equivalent:
(1) p is the orthogonal projection of V onto its range space S = p(V )
along S ⊥ .
(2) (algebraic) p is self-adjoint and idempotent, i.e.
$$\bar{p}^* = p, \quad p^2 = p.$$
(3) (geometric) p is a projection, i.e. $p^2 = p$ and $|p(\vec{x})| \le |\vec{x}|$ for $\vec{x} \in V$.
(4) (geometric) $p(\vec{x}) \perp (\vec{x} - p(\vec{x}))$, or $|p(\vec{x})|^2 = \langle p(\vec{x}),\vec{x}\rangle$, for $\vec{x} \in V$.
(5) (algebraic and geometric) p is a projection and
$$\mathrm{Im}(p) = \mathrm{Ker}(p)^\perp, \qquad \mathrm{Ker}(p) = \mathrm{Im}(p)^\perp.$$
(c) Suppose dim S = k and $\{\vec{y}_1,\ldots,\vec{y}_k\}$ is a basis for S. Suppose dim V = n and $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ is an orthonormal basis for V. Let
$$A = \begin{bmatrix} [\vec{y}_1]_B \\ \vdots \\ [\vec{y}_k]_B \end{bmatrix}_{k\times n}.$$
Then the orthogonal projection of V onto S is
$$\bar{A}^*(A\bar{A}^*)^{-1}A: V \to S \subseteq V,$$
defined by $[\vec{x}]_B \mapsto [\vec{x}]_B\bar{A}^*(A\bar{A}^*)^{-1}A$. Try to explain what the linear operator $2\bar{A}^*(A\bar{A}^*)^{-1}A - 1_n$ means.
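A small numerical sketch of (c) for a real V = $\mathbb{R}^3$, assuming NumPy (here the rows of A span S):

```python
import numpy as np

A = np.array([[1.0, 0, 1], [0, 1, 1]])        # a basis of S as rows
P = A.T @ np.linalg.inv(A @ A.T) @ A           # matrix of the projection onto S

assert np.allclose(P, P.T) and np.allclose(P @ P, P)   # Ex. 4(b)(2)
R = 2 * P - np.eye(3)
assert np.allclose(R @ R.T, np.eye(3))         # 2P - 1 is the reflection in S
```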
5. Suppose dim V = n.
(a) Riesz representation theorem For each f ∈ V* (the dual space of V, see Ex. 19 of Sec. B.7), there exists a unique $\vec{y} \in V$ such that
$$f(\vec{x}) = \langle\vec{x},\vec{y}\rangle \quad\text{for } \vec{x} \in V.$$
Denote this f temporarily by $f_{\vec{y}}$. Conversely, $f_{\vec{y}}(\,\cdot\,) = \langle\,\cdot\,,\vec{y}\rangle$ belongs to V* for each $\vec{y} \in V$.


(b) Define ⟨ , ⟩ on V* by $\langle f_{\vec{x}},f_{\vec{y}}\rangle = \langle\vec{y},\vec{x}\rangle$. Then V* is an n-dimensional inner product space. Let φ: V → V* be defined by
$$\varphi(\vec{x}) = f_{\vec{x}}.$$
Then,
(1) For each orthonormal basis $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ for V, its dual basis $B^* = \{f_1,\ldots,f_n\}$ is an orthonormal basis for V*. Also, $f_j = f_{\vec{x}_j}$ for 1 ≤ j ≤ n.
(2) φ is conjugate-linear, i.e. for $\vec{x},\vec{y} \in V$ and α ∈ F,
$$\varphi(\alpha\vec{x}+\vec{y}) = \bar\alpha f_{\vec{x}} + f_{\vec{y}} = \bar\alpha\varphi(\vec{x}) + \varphi(\vec{y}).$$
(3) φ preserves inner products, i.e.
$$\langle\varphi(\vec{x}),\varphi(\vec{y})\rangle = \langle\vec{y},\vec{x}\rangle, \quad \vec{x},\vec{y} \in V.$$
Thus we identify $\vec{x}$ with $\varphi(\vec{x}) = f_{\vec{x}}$, and V with V*, and call V a self-dual inner product space. We will adopt this convention when dealing with inner product spaces.
(c) For each linear operator f: V → V, there exists a unique linear operator f*: V → V, called the adjoint of f, such that
$$\langle f(\vec{x}),\vec{y}\rangle = \langle\vec{x},f^*(\vec{y})\rangle, \quad \vec{x},\vec{y} \in V.$$
Moreover, for each orthonormal basis $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$,
$$[f^*]_B = \overline{[f]}_B^*.$$
In case V is real, $[f^*]_B = [f]_B^*$. We have already treated its counterpart for matrices in Exs. 1–14 of Sec. B.8.
(Note The following diagram shows the difference between the dual mapping g: V* → V* as indicated in Ex. 24 of Sec. B.7 and the adjoint f*: V → V discussed here, where φ: V → V* is defined as in (b):
$$\begin{array}{ccc} V & \overset{f}{\underset{f^*}{\rightleftarrows}} & V \\ \varphi\big\downarrow & & \big\downarrow\varphi \\ V^* & \xrightarrow{\ g\ } & V^* \end{array}$$
Notice that $f^* = \varphi^{-1}\circ g\circ\varphi$.)
(d) Suppose f, g: V → V are linear operators. Then
(1) (f + g)∗ = f ∗ + g ∗ .
(2) (αf )∗ = ᾱf ∗ for α ∈ F.
(3) (g ◦ f )∗ = f ∗ ◦ g ∗ .
(4) (f ∗ )∗ = f ∗∗ = f .
(5) 1∗V = 1V .
(e) Some special linear operators. Suppose B is an orthonormal basis for V and f: V → V is a linear operator.
(1) $\langle f(\vec{x}),f(\vec{y})\rangle = \langle f^*(\vec{x}),f^*(\vec{y})\rangle$ for $\vec{x},\vec{y} \in V$
⇔ $f \circ f^* = f^* \circ f$
⇔ $[f]_B\overline{[f]}_B^* = \overline{[f]}_B^*[f]_B$.
Such an f is called a normal operator and $[f]_B$ a normal matrix.
(2) $\langle f(\vec{x}),f(\vec{y})\rangle = \langle\vec{x},\vec{y}\rangle$ for $\vec{x},\vec{y} \in V$
⇔ $f \circ f^* = f^* \circ f = 1_V$
⇔ $[f]_B\overline{[f]}_B^* = I_n$, or $\overline{[f]}_B^* = [f]_B^{-1}$.
Such an f is called a unitary operator and $[f]_B$ a unitary matrix.
(3) $\langle f(\vec{x}),\vec{y}\rangle = \langle\vec{x},f(\vec{y})\rangle$ for $\vec{x},\vec{y} \in V$
⇔ $f^* = f$
⇔ $\overline{[f]}_B^* = [f]_B$.
f is called a Hermitian or self-adjoint operator and $[f]_B$ a Hermitian matrix.
(4) $\langle f(\vec{x}),\vec{y}\rangle = -\langle\vec{x},f(\vec{y})\rangle$ for $\vec{x},\vec{y} \in V$
⇔ $f^* = -f$
⇔ $\overline{[f]}_B^* = -[f]_B$.
f is called a skew-Hermitian operator and $[f]_B$ a skew-Hermitian matrix.
In case V is a real vector space, a unitary operator (matrix) is usually called an orthogonal operator (matrix), a Hermitian operator (matrix) a symmetric operator (matrix), and a skew-Hermitian operator (matrix) a skew-symmetric operator (matrix).

For an n-dimensional complex inner product space V and any fixed orthonormal basis $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ for V, the linear isomorphism
$$\Phi: V \to \mathbb{C}^n$$
defined by $\Phi(\vec{x}) = [\vec{x}]_B$ carries the inner product ⟨ , ⟩ on V into the natural inner product on $\mathbb{C}^n$, and thus preserves inner products; any linear operator f: V → V is transformed into a matrix $[f]_B \in M(n;\mathbb{C})$ such that
$$[f(\vec{x})]_B = [\vec{x}]_B[f]_B, \quad \vec{x} \in V.$$
Conversely, any result concerning $\mathbb{C}^n$ and M(n; C) can be reinterpreted as a corresponding result, unique up to similarity, in V with a fixed orthonormal basis. Henceforth we focus our study on $\mathbb{C}^n$ endowed with the natural inner product, and a matrix A ∈ M(n; C) is considered as a linear operator on $\mathbb{C}^n$ defined by $\vec{x} \mapsto \vec{x}A$. Since R is a subfield of C, a real matrix A ∈ M(n; R) may be considered as a complex matrix on many occasions and hence inherits directly many valuable results from the complex case. One shall refer to Ex. 32 of Sec. B.4 and Sec. B.10 for the concepts of eigenvalues and eigenvectors.
6. Unitary matrices and orthogonal matrices
(a) For a matrix U ∈ M(n; C), the following are equivalent:
(1) U is unitary, i.e. U Ū ∗ = Ū ∗ U = In .
(2) The n row vectors of U form an orthonormal basis for Cn .
(3) The n column vectors of U form an orthonormal basis for Cn .
(4) U (as a linear operator) transforms any orthonormal basis for Cn
into an orthonormal basis for Cn .
(5) U transforms an orthonormal basis for Cn into another one.
(6) U preserves inner products (and hence orthogonality), i.e.
$$\langle\vec{x}U,\vec{y}U\rangle = \langle\vec{x},\vec{y}\rangle, \quad \vec{x},\vec{y} \in \mathbb{C}^n.$$
(7) U preserves lengths, i.e.
$$|\vec{x}U| = |\vec{x}|, \quad \vec{x} \in \mathbb{C}^n.$$
Note that these results are still valid for orthogonal matrix and Rn .
(b) Suppose U and V are unitary matrices. Then
Ū , U ∗ , U −1 and U V
are unitary too. Also, |det U | = 1 and in particular, det P = ±1 if
P is orthogonal.
(c) The eigenvalues λ (see Sec. B.10) of a unitary matrix all have absolute value |λ| = 1, i.e. $\lambda^{-1} = \bar\lambda$.
(Note Let Pn×n be an orthogonal matrix. Then
(1) If det P = 1 and n is odd, or det P = −1 and n is even, P
has eigenvalue 1; whereas if det P = −1, P always has eigen-
value −1. P has no other real eigenvalues except ±1.
(2) If λ is an eigenvalue of P , so is λ−1 . Hence, complex eigenvalues
of P are in conjugate pairs
eiθ , e−iθ (θ ∈ R and 0 < θ < π).
(3) The set of all orthogonal matrices
O(n; R) = {P ∈ M(n; R) | P ∗ = P −1 }
forms a subgroup of GL(n; R), and has {P ∈ O(n; R) | det P = 1}
as its subgroup. These two groups handle the rigid motions in
space Rn .
(4) Symmetric and skew-symmetric matrices are invariant under


orthogonal similarity (see Ex. 31(b) of Sec. B.4)
(5) (Cayley, 1846) Let An×n be a real skew-symmetric matrix. Then
In + A and In − A are invertible and

P = (In − A)(In + A)−1

is orthogonal. P does not have eigenvalue –1 and det P = 1.


(6) Conversely, if orthogonal matrix P does not have −1 as its eigen-
value, then there exists a skew-symmetric matrix

A = (In + P )−1 (In − P )

such that P = (In − A)(In + A)−1 .)


(d) A transformation f: $\mathbb{C}^n \to \mathbb{C}^n$ is called a rigid motion or an isometry if
$$|f(\vec{x}) - f(\vec{y})| = |\vec{x} - \vec{y}|, \quad \vec{x},\vec{y} \in \mathbb{C}^n.$$
For any fixed orthonormal basis B for $\mathbb{C}^n$, there exist a unique unitary matrix U and a point $\vec{x}_0 \in \mathbb{C}^n$ such that
$$[f(\vec{x})]_B = [\vec{x}_0]_B + [\vec{x}]_BU, \quad \vec{x} \in \mathbb{C}^n.$$
(Note A rigid motion f: $\mathbb{R}^n \to \mathbb{R}^n$ is of the form
$$f(\vec{x}) = \vec{x}_0 + \vec{x}P, \quad \vec{x} \in \mathbb{R}^n,$$
where $P_{n\times n}$ is orthogonal; whereas
$$\vec{x} \mapsto \vec{x}_0 + \vec{x}(rP), \quad r > 0,$$
is a similarity (i.e. angle-preserving) transformation.)


(e) Any unitary matrix $U_{n\times n}$ has n (complex) eigenvalues $\lambda_1,\ldots,\lambda_n$ with eigenvectors $\vec{x}_1,\ldots,\vec{x}_n \in \mathbb{C}^n$ such that
$$\vec{x}_jU = \lambda_j\vec{x}_j, \quad |\lambda_j| = 1 \text{ for } 1 \le j \le n,$$
and $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ is an orthonormal basis for $\mathbb{C}^n$. Then
$$[U]_B = QUQ^{-1} = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \quad\text{with}\quad Q = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_n \end{bmatrix}.$$

(f) Suppose A ∈ M(n; C) has rank r = r(A) ≥ 1 and its first r rows are linearly independent. Then there exists a unitary matrix U such that
$$A = BU = U^{-1}C,$$
where B is a lower-triangular matrix with its first r diagonal entries positive and the remaining diagonal entries zero, whereas C is an upper-triangular matrix having the same diagonal entries as B. Such B and C are unique. In case A is invertible, U is unique too.
(g) (Schur, 1909) Every complex square matrix is unitarily similar (or equivalent) to a triangular matrix whose main diagonal entries are its (complex) eigenvalues. That is to say, for A ∈ M(n; C), there exists a unitary matrix U such that
$$UAU^{-1} = \begin{bmatrix} \lambda_1 & & & & \\ b_{21} & \lambda_2 & & 0 & \\ b_{31} & b_{32} & \lambda_3 & & \\ \vdots & \vdots & & \ddots & \\ b_{n1} & b_{n2} & b_{n3} & \cdots & \lambda_n \end{bmatrix}_{n\times n} \quad\text{with}\quad U = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vec{x}_3 \\ \vdots \\ \vec{x}_n \end{bmatrix}.$$
Note that the first row vector $\vec{x}_1$ of U is an eigenvector of A corresponding to the eigenvalue $\lambda_1$. Refer to Ex. <C> 10(a) of Sec. 2.7.6.
7. Normal matrix and its spectral decomposition Let N ∈ M(n; C).
(a) Suppose that N is normal. Then
(1) N has n complex eigenvalues.
(2) Eigenvectors of N corresponding to distinct eigenvalues are
orthogonal.
(3) If $\vec{x}N = \lambda\vec{x}$ for nonzero $\vec{x}$, then $\vec{x}\bar{N}^* = \bar\lambda\vec{x}$.
(b) (Schur and Toeplitz, 1910) N is normal if and only if N is unitarily similar to a diagonal matrix. In fact, let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of N with corresponding eigenvectors $\vec{x}_1,\ldots,\vec{x}_n$, i.e.
$$\vec{x}_jN = \lambda_j\vec{x}_j \quad\text{for } 1 \le j \le n,$$
such that $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ is an orthonormal basis for $\mathbb{C}^n$. Then
$$UNU^{-1} = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}, \qquad U = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_n \end{bmatrix},$$
where U is unitary.
(c) In particular, in case N is normal:
(1) N is unitary ⇔ N is unitarily similar to a diagonal matrix with main diagonal entries of absolute value 1 (see Ex. 6(e)) ⇔ all eigenvalues of N are of absolute value 1.
(2) N is Hermitian ⇔ the n eigenvalues of N are all real. Thus N is unitarily similar to a real diagonal matrix.
(3) N is positive-definite Hermitian, i.e. N is Hermitian and $\vec{x}N\bar{\vec{x}}^* > 0$ for any $\vec{x} \neq \vec{0}$ in $\mathbb{C}^n$ ⇔ the n eigenvalues of N are all positive.
(4) N is skew-Hermitian ⇔ the n eigenvalues of N are all purely imaginary.
Moreover,
(5) N is orthogonal ⇔ N is a real matrix and its eigenvalues are of the form $e^{i\theta}, e^{-i\theta}$, 0 ≤ θ ≤ π.
(6) N is symmetric ⇔ N is a real matrix and all its eigenvalues are real ⇔ N is a real matrix and is orthogonally similar to a real diagonal matrix.
(7) N is positive semidefinite symmetric, i.e. N is symmetric and $\vec{x}N\vec{x}^* \ge 0$ for any $\vec{x} \in \mathbb{R}^n$ ⇔ the n eigenvalues are all nonnegative.

(d) The spectral decomposition theorem for normal matrices. Let $\lambda_1,\ldots,\lambda_r$ be the distinct eigenvalues of a normal matrix N with respective algebraic multiplicities $k_1,\ldots,k_r$, where $k_1+\cdots+k_r = n$. Let
$$W_j = \{\vec{x} \in \mathbb{C}^n \mid \vec{x}N = \lambda_j\vec{x}\}, \quad 1 \le j \le r,$$
be the eigenspace corresponding to $\lambda_j$. Then
(1)
$$\mathbb{C}^n = W_1 \oplus W_2 \oplus \cdots \oplus W_r = W_j \oplus W_j^\perp, \quad 1 \le j \le r, \quad\text{where}\quad W_j^\perp = \bigoplus_{\substack{l=1\\ l\neq j}}^r W_l.$$
(2) The mapping
$$N_j = \prod_{\substack{l=1\\ l\neq j}}^r \frac{N - \lambda_lI_n}{\lambda_j - \lambda_l}: \mathbb{C}^n \to \mathbb{C}^n$$
is the orthogonal projection of $\mathbb{C}^n$ onto $W_j$ for 1 ≤ j ≤ r, which is positive semidefinite Hermitian.
(3) $N_iN_j = \delta_{ij}N_i$ for 1 ≤ i, j ≤ r.
(4) The $N_j$ decompose $I_n$ as
$$I_n = N_1 + \cdots + N_r.$$
(5) The $N_j$ decompose N as
$$N = \lambda_1N_1 + \cdots + \lambda_rN_r.$$
See Fig. B.12.

Fig. B.12 (schematic: $\vec{x} = \vec{x}N_1 + \vec{x}N_2 + \vec{x}N_3$ with $\vec{x}N_j \in W_j$, and $\vec{x}N = \lambda_1\vec{x}N_1 + \lambda_2\vec{x}N_2 + \lambda_3\vec{x}N_3$)

(Note It can be shown that the algebraic multiplicity $k_j$ of $\lambda_j$ is equal to its geometric multiplicity dim $W_j$ (see Sec. B.10). For each j, 1 ≤ j ≤ r, take an orthonormal basis $B_j = \{\vec{x}_{j1},\ldots,\vec{x}_{jk_j}\}$ for $W_j$, so that $B = B_1 \cup B_2 \cup \cdots \cup B_r$ forms an orthonormal basis for $\mathbb{C}^n$. Let
$$U = \begin{bmatrix} \vec{x}_{11} \\ \vdots \\ \vec{x}_{1k_1} \\ \vec{x}_{21} \\ \vdots \\ \vec{x}_{2k_2} \\ \vdots \\ \vec{x}_{r1} \\ \vdots \\ \vec{x}_{rk_r} \end{bmatrix}.$$
Then U is unitary and
$$UNU^{-1} = \begin{bmatrix} \lambda_1I_{k_1} & & 0 \\ & \ddots & \\ 0 & & \lambda_rI_{k_r} \end{bmatrix}.$$
Hence it follows that the orthogonal projection $N_j: \mathbb{C}^n \to W_j$ is
$$N_j = U^{-1}\begin{bmatrix} O & & \\ & I_{k_j} & \\ & & O \end{bmatrix}U,$$
which is equal to $P_j(N)$, where the $P_j(t)$, 1 ≤ j ≤ r, are the Lagrange polynomials associated with $\lambda_1,\lambda_2,\ldots,\lambda_r$ (see Sec. B.3).)
B.9 Inner Product Spaces 789

(e) Prove the following:
(1) If N is normal and $N = \lambda_1N_1 + \cdots + \lambda_rN_r$ is the spectral decomposition as in (d), then for any polynomial f(t),
$$f(N) = \sum_{j=1}^r f(\lambda_j)N_j$$
holds. Therefore, deduce that if $N^l = O$ for some l ≥ 1, then N = O.
(2) If N is normal, then a matrix M commutes with N if and only if M commutes with each $N_j$.
(3) If N is normal, there exists a normal matrix M such that $M^2 = N$.
(4) N is normal if and only if $\bar{N}^* = f(N)$ for some polynomial f. For the necessity, take f(t) to be the Lagrange interpolation polynomial such that $f(\lambda_j) = \bar\lambda_j$ for 1 ≤ j ≤ r.
(5) If N is normal, then a matrix M commutes with N if and only if M commutes with $\bar{N}^*$.
(f) Show that
$$N = \begin{bmatrix} 0 & 1+i & 0 \\ 1+i & 0 & 0 \\ 0 & 0 & i \end{bmatrix}$$
is a normal matrix and use it to justify (d).
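A numerical sketch for (f), assuming NumPy (`.conj().T` is the conjugate transpose $\bar{N}^*$):

```python
import numpy as np

N = np.array([[0, 1 + 1j, 0],
              [1 + 1j, 0, 0],
              [0, 0, 1j]])
Nh = N.conj().T
print(np.allclose(N @ Nh, Nh @ N))   # True: N is normal

w, V = np.linalg.eig(N)              # eigenvalues 1+i, -(1+i), i
print(np.round(w, 6))
# the eigenvalues are distinct, so the unit eigenvectors of this
# normal matrix are automatically orthogonal:
print(np.round(V.conj().T @ V, 6))   # identity up to rounding
```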
8. Let $A = [a_{ij}]_{n\times n}$ be a symmetric real matrix. Then A is
(1) positive definite ⇔ $\vec{x}A\vec{x}^* > 0$ for any $\vec{x} \in \mathbb{R}^n$, $\vec{x} \neq \vec{0}$;
(2) negative definite ⇔ $\vec{x}A\vec{x}^* < 0$ for any $\vec{x} \in \mathbb{R}^n$, $\vec{x} \neq \vec{0}$;
(3) positive semidefinite ⇔ $\vec{x}A\vec{x}^* \ge 0$ for any $\vec{x} \in \mathbb{R}^n$;
(4) negative semidefinite ⇔ $\vec{x}A\vec{x}^* \le 0$ for any $\vec{x} \in \mathbb{R}^n$;
(5) indefinite ⇔ there exist $\vec{x}_1,\vec{x}_2 \in \mathbb{R}^n$ such that $\vec{x}_1A\vec{x}_1^* > 0$ and $\vec{x}_2A\vec{x}_2^* < 0$.
There are corresponding definitions for Hermitian matrices, with $\vec{x} \in \mathbb{C}^n$.
(a) Suppose $A_{n\times n} = [a_{ij}]_{n\times n}$ is symmetric. The following are equivalent.
(1) A is positive definite.
(2) There exists an invertible real matrix $P_{n\times n}$ such that
$$A = PP^*.$$
Moreover, P may be taken to be lower triangular.
(3) All n eigenvalues of A are positive.
(4) (Frobenius, 1894) The leading principal submatrices $A_k$ of A have positive determinants, i.e.
$$\det A_k = \det\begin{bmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{bmatrix} > 0, \quad 1 \le k \le n.$$
(5) There exist a unique real lower-triangular matrix Δ with main diagonal entries all equal to 1 and a diagonal matrix $\mathrm{diag}[d_1,\ldots,d_n]$ such that
$$A = \Delta\begin{bmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{bmatrix}\Delta^*,$$
where ($\det A_k$ as in (4))
$$d_k = \frac{\det A_k}{\det A_{k-1}} > 0 \quad\text{for } 1 \le k \le n; \qquad \det A_0 = 1.$$
Try to figure out criteria for negative definite and semidefinite matrices.
(b) Probably the simplest criterion for a symmetric matrix to be indefinite is that it possesses at least one positive eigenvalue and one negative eigenvalue. Can you figure out other criteria like those in (a)?
(c) Use
$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$
to justify (a).
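A sketch of (c) with NumPy (the Cholesky factor is one valid choice of the lower-triangular P in (a)(2)):

```python
import numpy as np

A = np.array([[2.0, -1, 0], [-1, 2, -1], [0, -1, 2]])

print(np.linalg.eigvalsh(A))                          # 2-sqrt(2), 2, 2+sqrt(2): all > 0
print([np.linalg.det(A[:k, :k]) for k in (1, 2, 3)])  # leading minors 2, 3, 4: all > 0
P = np.linalg.cholesky(A)                             # lower triangular, A = P P^T
print(np.allclose(P @ P.T, A))                        # True
```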

B.10 Eigenvalues and Eigenvectors


Let A = $[a_{ij}]$ ∈ M(n; F) and λ ∈ F. If there exists a nonzero vector $\vec{x} \in F^n$ such that
$$\vec{x}A = \lambda\vec{x},$$
then $\vec{x}$ is called an eigenvector of A corresponding to the eigenvalue λ. An eigenvector is also called a characteristic vector, and an eigenvalue a characteristic value. The eigenvalue λ of a linear transformation f: $F^n \to F^n$ is defined to be that of the matrix $[f]_B$ with respect to any fixed basis B for $F^n$ (see Sec. B.7), while the corresponding eigenvector $\vec{x}$ is the one that satisfies $[\vec{x}]_B[f]_B = \lambda[\vec{x}]_B$, which is equivalent to $f(\vec{x}) = \lambda\vec{x}$. A similar definition is valid for a linear transformation f: V → V where dim V < ∞.

Exercises
1. The following are equivalent:
(1) λ is an eigenvalue of A.
(2) λ is a zero of the characteristic polynomial det(A − tIn ) of A (see
Ex. 32 of Sec. B.4), i.e. det(A−λIn ) = 0 or A−λIn is not invertible.
(3) The kernel Ker(A − λIn ) of the linear transformation A − λIn :
Fn → Fn defined by  x → x (A − λIn ) has dimension ≥ 1.
2. Some basic facts Let λ be an eigenvalue of A with corresponding
eigenvector 
x.
(1) λk is an eigenvalue of Ak for any positive integer k, with the same
eigenvector x.
(2) λ−1 is an eigenvalue of A−1 if A is invertible, with the same
eigenvector x.
(3) The eigenspace corresponding to λ,
$$E_\lambda = \{\vec{x} \in F^n \mid \vec{x}A = \lambda\vec{x}\}\ (\text{including } \vec{0}) = \mathrm{Ker}(A - \lambda I_n),$$
is a subspace of $F^n$ of positive dimension.
Also
(4) Similar matrices have the same characteristic polynomial and hence
the same eigenvalues.
(5) A matrix A and its transpose A∗ have the same characteristic poly-
nomial and eigenvalues.
(6) Let $\det(A - tI_n) = (-1)^nt^n + \alpha_{n-1}t^{n-1} + \cdots + \alpha_1t + \alpha_0$ be the characteristic polynomial of $A = [a_{ij}]_{n\times n}$. Then (refer to Ex. <C> 4 of Sec. 3.7.6)
$$\alpha_{n-1} = (-1)^{n-1}\,\mathrm{tr}\,A, \qquad \alpha_0 = \det A.$$
Hence (added to Ex. 16(d) of Sec. B.4)
A is invertible ⇔ zero is not an eigenvalue of A ⇔ the constant term $\alpha_0 \neq 0$.
3. Some basic facts


(a) The eigenvalues of an upper (or lower) triangular matrix are the
diagonal entries.
(b) Eigenvectors associated with distinct eigenvalues are linearly
independent.
(c) For any A, B ∈ M(n; C) such that B is invertible, there exists a
scalar α ∈ C such that A + αB is not invertible. There exist at
most n distinct such scalars α.
(d) For any A, B ∈ M(n; C), the characteristic polynomials for the
products AB and BA are equal.
4. Cayley–Hamilton Theorem Let A = $[a_{ij}]$ ∈ M(n; F) with characteristic polynomial $\det(A - tI_n) = (-1)^nt^n + \alpha_{n-1}t^{n-1} + \cdots + \alpha_1t + \alpha_0$. Then
$$(-1)^nA^n + \alpha_{n-1}A^{n-1} + \cdots + \alpha_1A + \alpha_0I_n = O.$$
In short, a matrix satisfies its characteristic polynomial. For example, let
$$A = \begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix}.$$
Then the characteristic polynomial is
$$\det(A - tI_2) = \begin{vmatrix} 1-t & -1 \\ 2 & 4-t \end{vmatrix} = (1-t)(4-t) + 2 = t^2 - 5t + 6.$$
Actual computation shows that
$$A^2 = \begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} -1 & -5 \\ 10 & 14 \end{bmatrix}$$
$$\Rightarrow A^2 - 5A + 6I_2 = \begin{bmatrix} -1 & -5 \\ 10 & 14 \end{bmatrix} - 5\begin{bmatrix} 1 & -1 \\ 2 & 4 \end{bmatrix} + 6\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = O.$$
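The same verification can be run numerically; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, -1], [2, 4]])
# characteristic polynomial t^2 - 5t + 6; A must satisfy it
print(A @ A - 5 * A + 6 * np.eye(2))   # zero matrix
# np.poly returns the characteristic polynomial coefficients of a square matrix
print(np.poly(A))                      # [1, -5, 6]
```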

(a) Prove this theorem in cases n = 2, 3.


(b) Justify this theorem by using
$$A = \begin{bmatrix} 3 & -1 & -1 \\ 1 & 0 & -1 \\ -2 & 5 & 4 \end{bmatrix}.$$

(c) Prove this theorem in case A is diagonalizable (see Sec. B.11) and
for general matrix A.
(d) If α0 = det A = 0, then A is invertible (see Ex. 2(6)). Thus


[(−1)n An−1 + αn−1 An−2 + · · · + α2 A + α1 In ]A = −α0 In
shows that the inverse matrix is
1
A−1 = − [(−1)n An−1 + αn−1 An−2 + · · · + α2 A + α1 In ].
α0
 
1 7
(e) Show that A = −6 −43 satisfies A2 +42A−I2 = O. Compute A−1 .
(f) If A ∈ M(3; R) satisfies A5 − 4A4 + 7A3 − 6A2 − 14I2 = O, show
that A is invertible, and express A−1 in terms of A.
(g) If A = [aij ] ∈ M(2; F) is invertible, show that
A−1 = αA + βI2
for some α, β ∈ F. Determine α and β, and use aij ’s to express
entries of A−1 .
(h) If A ∈ M(2; F) satisfies A2 + A + I2 = O, find A3n explicitly in
terms of A and I2 for every integer n, positive or negative.
5. Compute the real or complex eigenvalues of the following matrices and find the associated eigenvectors. Check the Cayley–Hamilton theorem for each matrix:
$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}; \quad \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}; \quad \begin{bmatrix} 4 & -1 \\ 2 & 3 \end{bmatrix}; \quad \begin{bmatrix} \frac12 & \frac15 \\ \frac15 & \frac13 \end{bmatrix};$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}; \quad \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}; \quad \begin{bmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{bmatrix}; \quad \begin{bmatrix} 6 & -3 & -2 \\ 4 & -1 & -2 \\ 10 & -5 & -3 \end{bmatrix};$$
$$AA^* \text{ and } A^*A, \text{ where } A = \begin{bmatrix} -1 & 2 & 0 \\ 3 & 1 & 4 \end{bmatrix}.$$

B.11 Diagonalizability of a Square Matrix


or a Linear Operator
We have touched this concept and some of its related results in quite a few
places before, such as in Exs. 31 and 32 of Sec. B.4, Exs. 7–9 of Sec. B.7,
Ex. 14 of Sec. B.8, Exs. 6–8 of Sec. B.9 and Sec. B.10. Now, we formally
introduce the diagonalizability of a square matrix or a linear operator on a
finite-dimensional vector space.
A square matrix is said to be diagonalizable if it is similar to a diagonal
matrix (see Ex. 31 of Sec. B.4). That is to say, for $A_{n\times n}$, if there exists an invertible matrix P such that
$$PAP^{-1}$$
is a diagonal matrix, then A is called diagonalizable. If dim V < ∞, a linear
operator f : V → V is diagonalizable if there exists a basis B for V such
that [f ]B is diagonalizable.
Suppose λ is an eigenvalue of a matrix An×n . In addition to the
eigenspace Eλ introduced in Ex. 2 of Sec. B.10, we introduce the gener-
alized eigenspace of A corresponding to λ as the subspace

$$G_\lambda = \{\vec{x} \in F^n \mid \vec{x}(A - \lambda I_n)^p = \vec{0} \text{ for some positive integer } p\}.$$
Nonzero vectors in Gλ are called generalized eigenvectors of A correspond-
ing to λ.
Note that Eλ ⊆ Gλ holds. The role played by Gλ will become much
more clear in the next section B.12.
Using division algorithm for polynomials (see Sec. A.5), Cayley–
Hamilton theorem (see Ex. 4 of Sec. B.10) allows us to find a polynomial
p(t) with the following properties:
1. The leading coefficient of p(t) is 1.
2. If g(t) is any polynomial such that g(A) = O, then p(t) divides g(t).
Such a p(t) does exist and has the smallest degree among all such polynomi-
als g(t) and is called the minimal polynomial of the given square matrix A.
Note that the minimal polynomial divides the characteristic polynomial.
For example, the matrix
$$A = \begin{bmatrix} -9 & 4 & 4 \\ -8 & 3 & 4 \\ -16 & 8 & 7 \end{bmatrix}$$
has characteristic polynomial $\det(A - tI_3) = -(t+1)^2(t-3)$ and minimal polynomial $(t+1)(t-3) = t^2 - 2t - 3$. What are the generalized eigenspaces $G_\lambda$ for λ = −1 and λ = 3?
The matrix
$$B = \begin{bmatrix} 6 & -3 & -2 \\ 4 & -1 & -2 \\ 10 & -5 & -3 \end{bmatrix},$$
on the other hand, has characteristic polynomial $-(t-2)(t^2+1)$ and minimal polynomial $(t-2)(t^2+1)$. Readers are strongly urged to work out the generalized eigenspaces $G_\lambda$ for λ = 2 and λ = i, and to compare all possible differences between these and those for A above.
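These two polynomials can be checked symbolically; a sketch assuming SymPy (whose `charpoly` is monic in t, i.e. $\det(tI - A)$):

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[-9, 4, 4], [-8, 3, 4], [-16, 8, 7]])
print(sp.factor(A.charpoly(t).as_expr()))     # (t - 3)*(t + 1)**2

# minimal polynomial: (t + 1)(t - 3) already annihilates A
print((A + sp.eye(3)) * (A - 3 * sp.eye(3)))  # zero matrix
```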
Exercises
1. Prove the following.
(a) A matrix is similar to a scalar matrix αIn if and only if it is αIn
itself.
(b) A diagonalizable matrix having only one eigenvalue is a scalar
matrix. Therefore,
 
  1 0 0
1 0
, 1 1 0 , etc.
1 1
0 1 1

are not diagonalizable.


2. Let
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

be a real matrix. Then, there exist at most two distinct real numbers λ
such that A − λI2 is not invertible. If such a number exists, it should be
an eigenvalue of A.
(a) Suppose A is not a scalar matrix, i.e. $A \neq \alpha I_2$. Then A has two distinct real eigenvalues $\lambda_1$ and $\lambda_2$ if and only if there exists an invertible matrix P such that
$$PAP^{-1} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}.$$
(b) If det A < 0, then A has two distinct real eigenvalues.
(c) If $a_{11} = a_{22}$ with $a_{12} = a_{21} \neq 0$, then the symmetric matrix A has two distinct real eigenvalues and is diagonalizable.
(d) If $a_{11} = a_{22}$ with $a_{12} = -a_{21} \neq 0$, then A has no real eigenvalue and hence $A - \lambda I_2$ is invertible for any real number λ.
(e) In case $A^2 = A$, then
$$\mathbb{R}^2 = \{\vec{x} \in \mathbb{R}^2 \mid \vec{x}A = \vec{x}\} \oplus \{\vec{x} \in \mathbb{R}^2 \mid \vec{x}A = \vec{0}\}.$$

3. Characteristic and minimal polynomials Let p(t) be the minimal poly-


nomial of a given matrix An×n .
(a) A scalar λ is an eigenvalue of A if and only if p(λ) = 0. Hence the characteristic polynomial and the minimal polynomial of A have the same zeros.
(b) In case the characteristic polynomial is

det(A − tIn ) = (−1)n (t − λ1 )r1 · · · (t − λk )rk ,

where λ1 , . . . , λk are distinct eigenvalues of A and r1 ≥ 1, . . . , rk ≥ 1


with r1 + · · · + rk = n, then there exist integers n1 , n2 , . . . , nk such
that 1 ≤ ni ≤ ri for 1 ≤ i ≤ k and the minimal polynomial is

p(t) = (t − λ1 )n1 · · · (t − λk )nk .

(c) Moreover, if ϕ(t) is an irreducible factor polynomial of det(A − tIn ),


then ϕ(t) | p(t) holds.
(d) For any polynomial
$$p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0,$$
the so-called companion matrix of p(t),
$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix},$$
has characteristic polynomial $(-1)^np(t)$ and minimal polynomial p(t) itself.

4. Criteria for diagonalizability For An×n , let

det(A − tIn ) = (−1)n (t − λ1 )r1 · · · (t − λk )rk ,

where λ1 , . . . , λk are distinct eigenvalues of A and r1 ≥ 1, . . . , rk ≥ 1


with r1 + · · · + rk = n. Then the following are equivalent.
(a) A is diagonalizable.
(b) There exists a basis $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ for $F^n$ consisting of eigenvectors of A, so that
$$PAP^{-1} = \begin{bmatrix} \lambda_1I_{r_1} & & 0 \\ & \ddots & \\ 0 & & \lambda_kI_{r_k} \end{bmatrix}, \quad\text{where}\quad P = \begin{bmatrix} \vec{x}_1 \\ \vdots \\ \vec{x}_{r_1} \\ \vdots \\ \vec{x}_{r_1+\cdots+r_{k-1}+1} \\ \vdots \\ \vec{x}_{r_1+\cdots+r_k} \end{bmatrix}_{n\times n}$$
with $\vec{x}_lA = \lambda_i\vec{x}_l$ for $r_1+\cdots+r_{i-1}+1 \le l \le r_1+\cdots+r_{i-1}+r_i$ and 1 ≤ i ≤ k.
(c) Fn is the direct sum of eigenspaces Eλi , 1 ≤ i ≤ k, of A, i.e.

Fn = Eλ1 ⊕ · · · ⊕ Eλk .

(d) Each eigenspace Eλi has dimension

dim Eλi = ri = n − r(A − λi In ),

i.e. the algebraic multiplicity ri of the eigenvalue λi is equal to its


geometric multiplicity dim Eλi for 1 ≤ i ≤ k.
(e) Each eigenspace Eλi of A is equal to the corresponding generalized
eigenspace Gλi for each i, 1 ≤ i ≤ k, i.e.

Eλi = Gλi .
(f) The ranks r(A − λi In ) = r((A − λi In )2 ) for 1 ≤ i ≤ k (this would


imply that Gλi = Ker(A − λi In ) = Eλi ).
(g) The minimal polynomial p(t) is a product of distinct linear factors,
i.e.
p(t) = (t − λ1 ) · · · (t − λk ).
In particular, an n × n matrix having n distinct eigenvalues is diagonal-
izable.
5. The process for diagonalization Suppose A ∈ M(n; F). Then,
(1) Compute the characteristic polynomial det(A−tIn ) and try to factor
it into product of linear factors such as (−1)n (t − λ1 )r1 · · · (t − λk )rk
with distinct λ1 , . . . , λk and r1 + · · · + rk = n.
(2) Use Ex. 4 to test for diagonalizability.
(3) If A is diagonalizable, choose a basis $B_i = \{\vec{x}_{i1},\ldots,\vec{x}_{ir_i}\}$ for the eigenspace $E_{\lambda_i}$ with dimension $r_i$, 1 ≤ i ≤ k. By Ex. 4(c), $B_1 \cup \cdots \cup B_k$ forms a basis for $F^n$.
(4) Form an n × n matrix P with basis vectors in the ordered basis
B1 ∪ · · · ∪ Bk as row vectors, i.e.
 
$$P = \begin{bmatrix} \vec{x}_{11} \\ \vdots \\ \vec{x}_{1r_1} \\ \vdots \\ \vec{x}_{k1} \\ \vdots \\ \vec{x}_{kr_k} \end{bmatrix}.$$
This P is invertible.
(5) Then one can check that
$$PAP^{-1} = \begin{bmatrix} \lambda_1I_{r_1} & & 0 \\ & \ddots & \\ 0 & & \lambda_kI_{r_k} \end{bmatrix}_{n\times n}.$$

For a linear operator f: V → V, where dim V = n, choose a suitable basis B for V and start from the matrix representation $[f]_B$. Use each of the following matrices to justify the process stated above:
$$\begin{bmatrix} 1 & 5 \\ 3 & 4 \end{bmatrix}; \quad \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & 1 \end{bmatrix}; \quad \begin{bmatrix} 1 & 3 & 1 \\ 2 & 2 & 4 \\ 1 & 1 & -1 \end{bmatrix}; \quad \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix};$$
$$\begin{bmatrix} 1 & 0 & -4 \\ 0 & 5 & 4 \\ -4 & 4 & 3 \end{bmatrix}; \quad \begin{bmatrix} 11 & 21 & 3 \\ -4 & -8 & -1 \\ -5 & -11 & 0 \end{bmatrix}; \quad \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$
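For instance, for the symmetric matrix above with rows (2,1,1), (1,2,1), (1,1,2), the process can be mimicked numerically; a sketch assuming NumPy (rows of P are eigenvectors, matching the row-vector convention $PAP^{-1}$):

```python
import numpy as np

A = np.array([[2.0, 1, 1], [1, 2, 1], [1, 1, 2]])
w, V = np.linalg.eigh(A)   # eigenvalues [1, 1, 4]; columns of V are eigenvectors
P = V.T                    # rows as eigenvectors
D = P @ A @ np.linalg.inv(P)
print(np.round(D, 10))     # diag(1, 1, 4)
```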

B.12 Canonical Forms for Matrices: Jordan Form


and Rational Form
As noted before, not every square matrix is diagonalizable.
For a given A ∈ M(n; F), by unique factorization theorem for polyno-
mials (see Sec. A.5), the characteristic polynomial is factored as
det(A − tIn ) = (−1)n p1 (t)r1 · · · pk (t)rk ,
where p1 (t), . . . , pk (t) are irreducible monic polynomials and r1 ≥
1, . . . , rk ≥ 1 are positive integers with r1 deg p1 (t) + · · · + rk deg pk (t) = n.
In case each pi (t) = t − λi is a linear polynomial, a basis can be chosen from
Fn according to which A can be expressed as the so-called Jordan canonical
form; otherwise the rational canonical form.
This section is divided into two subsections.

B.12.1 Jordan canonical form


The main result is the following theorem.
Theorem Suppose A ∈ M(n; F) is a nonzero matrix having characteristic polynomial factored over the field F as
$$\det(A - tI_n) = (-1)^n(t-\lambda_1)\cdots(t-\lambda_n),$$
where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of A, not necessarily distinct. Then there exists a basis $B = \{\vec{x}_1,\ldots,\vec{x}_n\}$ for $F^n$ such that
$$PAP^{-1} = \begin{bmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_k \end{bmatrix}_{n\times n} \quad\text{(called the Jordan canonical form of A)}$$

where each Ji is a square matrix of the form [λj ]1×1 or the form
 
λj 0 0 ··· 0 0
1 λj 0 · · · 0 0
 
 
0 1 λj · · · 0 0
. .. 
. .. .. .  (called a Jordan block corresponding to λj )
. . . · · · .. .
 
0 0 0 · · · λj 0 
0 0 0 · · · 1 λj
for some eigenvalue λj of A and
 
x1
 
P =  ...  .

xn n×n

If $\det(A - tI_n) = (-1)^n(t-\lambda_1)^{r_1}\cdots(t-\lambda_k)^{r_k}$, where $\lambda_1,\ldots,\lambda_k$ are distinct and $r_1+\cdots+r_k = n$, the Jordan canonical form can be put in a neater way (see Ex. 7 below). We will use this factorization of $\det(A - tI_n)$ throughout the exercises.

Exercises
1. Suppose $\mathbb{R}^8$ has a basis $B = \{\vec{x}_1,\ldots,\vec{x}_8\}$ so that, for a matrix $A_{8\times 8}$,
$$PAP^{-1} = \begin{bmatrix} 2 & & & & & & & \\ 1 & 2 & & & & & & \\ & 1 & 2 & & & & & \\ & & & 2 & & & & \\ & & & & 3 & & & \\ & & & & 1 & 3 & & \\ & & & & & & 0 & \\ & & & & & & 1 & 0 \end{bmatrix}_{8\times 8}, \qquad P = \begin{bmatrix} \vec{x}_1 \\ \vdots \\ \vec{x}_8 \end{bmatrix}.$$

(a) Show that det(A − tI8 ) = (t − 2)4 (t − 3)2 t2 . Note that, among the
basis vectors 
x1 , . . . , 
x8 , only 
x1 , 
x4 , 
x5 and 
x7 are eigenvectors of
A corresponding to eigenvalues λ1 = 2, λ2 = 3 and λ3 = 0 with
respective multiplicity 4, 2 and 2 which are the number of times
that eigenvalues appear on the diagonal of P AP −1 .
(b) Determine the eigenspace Eλi and the generalized eigenspace Gλi
(see Sec. B.11) for 1 ≤ i ≤ 3 and see, if any, Eλi = Gλi or not.
(c) For each λi , find the smallest positive integer pi for which
Gλi = Ker((A − λi I8 )pi ), 1 ≤ i ≤ 3.
(d) Show that $\vec{x}_3(A - \lambda_1I_8) = \vec{x}_2$, $\vec{x}_3(A - \lambda_1I_8)^2 = \vec{x}_1$, and that $\vec{x}_1,\vec{x}_2,\vec{x}_3,\vec{x}_4$ are linearly independent; hence
$$G_{\lambda_1} = \langle\vec{x}_1,\vec{x}_2,\vec{x}_3,\vec{x}_4\rangle.$$
Similarly, $G_{\lambda_2} = \langle\vec{x}_5,\vec{x}_6\rangle$ with $\vec{x}_5 = \vec{x}_6(A - \lambda_2I_8)$, and $G_{\lambda_3} = \langle\vec{x}_7,\vec{x}_8\rangle$ with $\vec{x}_7 = \vec{x}_8(A - \lambda_3I_8)$.
   

(e) Let Ai = A | Gλi denote the restriction of the linear operator


A: R8 → R8 to its invariant subspace Gλi for 1 ≤ i ≤ 3. Compute
the following for each i:
1. r(Ai ), r(A2i ), r(A3i ) and r(A4i ).
2. dim Ker(Ali ) for l = 1, 2, 3, 4.
2. Generalized eigenspace (or root space) Suppose λ is an eigenvalue of a
matrix An×n .
(a) Since Ker(A − λIn ) ⊆ Ker(A − λIn )2 ⊆ · · · ⊆ Fn , there exists a
smallest positive integer q such that Ker(A − λIn )l = Ker(A − λIn )q
for all l ≥ q. Hence,
Gλ = Ker(A − λIn )q
is an A-invariant subspace of Fn containing the eigenspace Eλ (see
Ex. 15 of Sec. B.7).
(b) For each generalized eigenvector x ∈ Gλ , there exists a smallest

positive integer p such that 1 ≤ p ≤ q and x (A − λIn )p = 0 . p is
called the order of 
x . The set
{ x (A − λIn ), 
x,  x (A − λIn )2 , . . . , 
x (A − λIn )p−1 }
is linearly independent and is called a cycle of generalized vectors of
A corresponding to λ. Note that  x (A − λIn )p−1 ∈ Eλ .
(c) The restriction A | Gλ of A to the invariant subspace Gλ has the minimal polynomial
$$(t-\lambda)^q.$$
Also, $q \le \dim G_\lambda$ (see Ex. 3).
(d) Suppose λ1 and λ2 are distinct eigenvalues of A. Then

Gλ1 ∩ Gλ2 = { 0 }
and Gλ1 ∪ Gλ2 is linearly independent.
3. First decomposition theorem for the space Suppose λ1 , . . . , λk are dis-


tinct eigenvalues of An×n , i.e.
det(A − tIn ) = (−1)n (t − λ1 )r1 · · · (t − λk )rk
where r1 + · · · + rk = n. Then
Fn = Gλ1 ⊕ · · · ⊕ Gλk
and the restriction A | Gλi has the characteristic polynomial (t − λi )ri
with
ri = dim Gλi
for 1 ≤ i ≤ k.
4. Cyclic invariant subspace Suppose A ∈ M(n; F). For $\vec{x} \in F^n$ with $\vec{x} \neq \vec{0}$, let
$$C(\vec{x}) = \bigcap\{W \mid W \text{ is an } A\text{-invariant subspace of } F^n \text{ containing } \vec{x}\} = \langle\vec{x},\vec{x}A,\vec{x}A^2,\ldots,\vec{x}A^k,\ldots\rangle.$$
Then C( x ) is an A-invariant subspace and is the smallest such one.
C(x ) is called the cycle generated by 
x related to A or A-cycle generated

by x . A polynomial g(t) ∈ P (F) (see Sec. A.5) for which 

x g(A) = 0 is
called an annihilator of  x related to A or just an A-annihilator of  x.
There exists a unique annihilator of  x with minimal degree and leading
coefficient 1, and it is called the minimal polynomial, denoted d x (t), of

x related to A.
(a) Suppose the degree $\deg d_{\vec{x}}(t) = k$. Then
$$\dim C(\vec{x}) = k$$
and $B_{\vec{x}} = \{\vec{x},\vec{x}A,\ldots,\vec{x}A^{k-1}\}$ is a basis for $C(\vec{x})$, for which
$$[A\,|\,C(\vec{x})]_{B_{\vec{x}}} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{k-1} \end{bmatrix}_{k\times k},$$
where $d_{\vec{x}}(t) = t^k + a_{k-1}t^{k-1} + a_{k-2}t^{k-2} + \cdots + a_1t + a_0$.
(b) $(-1)^kd_{\vec{x}}(t)$ is the characteristic polynomial, and $d_{\vec{x}}(t)$ itself the minimal polynomial, of the restriction $A\,|\,C(\vec{x})$.
5. Nilpotent matrix A matrix Bn×n is said to be a nilpotent matrix of


power or degree or index m if B m = O but B m−1 = O for positive
integer m. According to Ex. 2, the restriction of A − λIn to Gλ is
nilpotent of power q.
(a) There exists a nonzero vector $\vec{x}_1 \in F^n$ such that the cycle $C(\vec{x}_1)$ satisfies
(1) $\dim C(\vec{x}_1) = m$, and
(2) the restriction $B\,|\,C(\vec{x}_1)$ is nilpotent of power $m_1 = m$.
(b) There exists a B-invariant subspace V of Fn such that
$$F^n = C(\vec{x}_1) \oplus V$$
and B | V is nilpotent of power m2 ≤ m1 .
(c) There exist nonzero vectors $\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_k \in F^n$ such that
$$F^n = C(\vec{x}_1) \oplus C(\vec{x}_2) \oplus \cdots \oplus C(\vec{x}_k),$$
where,
(1) the restriction B | C(
x i ) is nilpotent of power mi for 1 ≤ i ≤ k
with
m = m1 ≥ m2 ≥ · · · ≥ mk , and dim C(
xi ) = mi ;
(2) $B\,|\,C(\vec{x}_i)$ has minimal polynomial
$$d_i(t) = t^{m_i}, \quad 1 \le i \le k,$$
and has the basis $B_i = \{\vec{x}_iB^{m_i-1}, \vec{x}_iB^{m_i-2}, \ldots, \vec{x}_iB, \vec{x}_i\}$.
Therefore, $\mathcal{B} = B_1 \cup \cdots \cup B_k$ is a basis for $F^n$ and the matrix representation of B related to $\mathcal{B}$ is
$$[B]_{\mathcal{B}} = \begin{bmatrix} N_1 & & 0 \\ & N_2 & \\ & & \ddots \\ 0 & & & N_k \end{bmatrix}_{n\times n},$$
where
$$N_i = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 & 0 \end{bmatrix}_{m_i\times m_i}, \quad 1 \le i \le k.$$
Such a direct sum decomposition of Fn is unique up to the ordering


of C(
x i ). Refer to Ex. <C> 3 of Sec. 3.7.7 for a different treatment.
6. Second decomposition theorem for the space Suppose λ is an eigenvalue of algebraic multiplicity r of a matrix $A_{n\times n}$. Then there exist linearly independent vectors $\vec{x}_1,\ldots,\vec{x}_k \in G_\lambda$, the generalized eigenspace of A corresponding to λ, such that
$$G_\lambda = C(\vec{x}_1) \oplus \cdots \oplus C(\vec{x}_k) = \mathrm{Ker}(A - \lambda I_n)^q,$$
where each $C(\vec{x}_i)$ is the cycle generated by $\vec{x}_i$ related to $(A - \lambda I_n)\,|\,G_\lambda$.
Moreover, let $m_i = \dim C(\vec{x}_i)$ for 1 ≤ i ≤ k and $q = m_1 \ge m_2 \ge \cdots \ge m_k$. Note that $m_1 + \cdots + m_k = r = \dim G_\lambda$. The bases $B_1,\ldots,B_k$ of the cycles $C(\vec{x}_1),\ldots,C(\vec{x}_k)$ can be arranged in a table whose jth column lists $B_j$, from $\vec{x}_j(A-\lambda I_n)^{m_j-1}$ in the 1st row down to $\vec{x}_j$ in the $m_j$th row (shorter cycles simply stop earlier):
$$\begin{array}{cccc} \vec{x}_1(A-\lambda I_n)^{m_1-1} & \vec{x}_2(A-\lambda I_n)^{m_2-1} & \cdots & \vec{x}_k(A-\lambda I_n)^{m_k-1} \\ \vec{x}_1(A-\lambda I_n)^{m_1-2} & \vec{x}_2(A-\lambda I_n)^{m_2-2} & \cdots & \vec{x}_k(A-\lambda I_n)^{m_k-2} \\ \vdots & \vdots & & \vdots \\ \vec{x}_1(A-\lambda I_n) & \vec{x}_2(A-\lambda I_n) & & \\ \vec{x}_1 & \vec{x}_2 & \cdots & \vec{x}_k \end{array}$$
Here $k = \dim E_\lambda$, the dimension of the eigenspace $E_\lambda$ of A corresponding to λ, and $E_\lambda = \langle\vec{x}_1(A-\lambda I_n)^{m_1-1},\ldots,\vec{x}_k(A-\lambda I_n)^{m_k-1}\rangle$.
Therefore, $B = B_1 \cup \cdots \cup B_k$ forms a Jordan canonical basis for $A\,|\,G_\lambda$, and $(t-\lambda)^{m_i}$, 1 ≤ i ≤ k, are called the elementary divisors of $A\,|\,G_\lambda$ (or of A). The matrix representation of $A\,|\,G_\lambda$ with respect to B is
$$[A\,|\,G_\lambda]_B = \begin{bmatrix} J_1 & & 0 \\ & J_2 & \\ & & \ddots \\ 0 & & & J_k \end{bmatrix}_{r\times r}; \qquad J_i = \begin{bmatrix} \lambda & & & & 0 \\ 1 & \lambda & & & \\ 0 & 1 & \lambda & & \\ \vdots & & \ddots & \ddots & \\ 0 & 0 & \cdots & 1 & \lambda \end{bmatrix}_{m_i\times m_i}, \quad 1 \le i \le k.$$
Note that [A | Gλ ]B is a direct sum of k Jordan canonical blocks of


decreasing sizes and m1 + · · · + mk = r = dim Gλ , the algebraic multi-
plicity of λ.

Remark In the table above, let
$$l_j = \text{the number of vectors in the } j\text{th row}, \quad 1 \le j \le m_1.$$
Then
$$l_1 = \dim F^n - r(A - \lambda I_n) = \dim\mathrm{Ker}(A - \lambda I_n) = \dim E_\lambda;$$
$$l_j = r((A - \lambda I_n)^{j-1}) - r((A - \lambda I_n)^j) \quad\text{for } j > 1.$$
This means the size of the table is completely determined by $A\,|\,G_\lambda$, or by A itself.

7. Jordan canonical form for matrix Combined Ex. 3 and Ex. 6, here
comes the main result. Suppose that λ1 , . . . , λk are distinct eigenvalues
of An×n such that

det(A − tIn ) = (−1)n (t − λ1 )r1 · · · (t − λk )rk with


r1 + · · · + rk = n and r1 ≥ r2 ≥ · · · ≥ rk .

Then

(1) Each generalized eigenspace decomposes as
$$G_{\lambda_i} = C(\vec{x}_{i1}) \oplus \cdots \oplus C(\vec{x}_{ik_i}), \quad 1 \le i \le k,$$
such that $\dim C(\vec{x}_{ij}) = m_{ij}$ for $1 \le j \le k_i$ and $m_{i1} \ge m_{i2} \ge \cdots \ge m_{ik_i}$. Choose a basis $B_{ij}$ for each $C(\vec{x}_{ij})$, $1 \le j \le k_i$, so that $B_i = B_{i1} \cup \cdots \cup B_{ik_i}$ is a Jordan canonical basis for $G_{\lambda_i}$. Thus
$$[A\,|\,G_{\lambda_i}]_{B_i}$$
is as in Ex. 6.
(2) Moreover,
$$F^n = G_{\lambda_1} \oplus \cdots \oplus G_{\lambda_k}$$
and $B = B_1 \cup \cdots \cup B_k$ is a Jordan canonical basis for $F^n$. Thus A has the Jordan canonical form
$$[A]_B = PAP^{-1} = \begin{bmatrix} [A\,|\,G_{\lambda_1}]_{B_1} & & 0 \\ & [A\,|\,G_{\lambda_2}]_{B_2} & \\ & & \ddots \\ 0 & & & [A\,|\,G_{\lambda_k}]_{B_k} \end{bmatrix}_{n\times n}$$
where P is the invertible matrix whose row vectors are the basis vectors of B arranged in a definite ordering.

By Ex. 6, the characteristic polynomial is
$$\det(A - tI_n) = (-1)^n \times \text{the product of all elementary divisors of the } A\,|\,G_{\lambda_i},\ 1 \le i \le k,$$
and the minimal polynomial is $(t-\lambda_1)^{m_{11}}(t-\lambda_2)^{m_{21}}\cdots(t-\lambda_k)^{m_{k1}}$.
8. The process of finding Jordan canonical form of a matrix Given An×n ,
the following process is suggested.

(1) Compute $\det(A - tI_n)$ to see if it splits as $(-1)^n(t-\lambda_1)^{r_1}\cdots(t-\lambda_k)^{r_k}$, where $\lambda_1,\ldots,\lambda_k$ are distinct and $r_1+\cdots+r_k = n$.
(2) Note dim Gλi = ri so that the resulted Table for A | Gλi as shown
in Ex. 6 contains ri terms. Then, decide the corresponding numbers
lij of jth row in the Table as indicated inside the Remark beneath
Ex. 6.
(3) Write out the Jordan block [A | Gλi ]Bi where Bi is the Jordan
canonical basis for Gλi consisting of vectors from the Table. Then,
construct the Jordan canonical form
 
[A | Gλ1 ]B1 0
 .. 
[A]B = P AP −1 =  . .
0 [A | Gλk ]Bk

(4) Step 2 indicates how to determine the basis Bi and finally B =


B1 ∪ · · · ∪ Bk when determining lij for 1 ≤ i ≤ k and 1 ≤ j.
For example, let
$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ -1 & 3 & 1 & -1 \\ 0 & -1 & 1 & 0 \\ 1 & 0 & 0 & 3 \end{bmatrix}.$$
The characteristic polynomial is
$$\det(A - tI_4) = (2-t)\det\begin{bmatrix} 3-t & 1 & -1 \\ -1 & 1-t & 0 \\ 0 & 0 & 3-t \end{bmatrix} = (t-2)^3(t-3).$$
Then A has distinct eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 3$ with respective multiplicities 3 and 1.
For λ1 = 2: Note dim $G_{\lambda_1} = 3$ and
$$A - \lambda_1I_4 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & -1 \\ 0 & -1 & -1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \Rightarrow r(A - \lambda_1I_4) = 2;$$
$$(A - \lambda_1I_4)^2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ -2 & 0 & 0 & -2 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \Rightarrow r((A - \lambda_1I_4)^2) = 1.$$
Hence $\mathrm{Ker}(A - \lambda_1I_4)^2 = \{\vec{x} = (x_1,x_2,x_3,x_4) \mid -2x_2 + x_3 + x_4 = 0\} = \langle(1,0,0,0),(0,1,2,0),(0,1,0,2)\rangle$ has dimension 3. Take $\vec{x}_1 = (0,1,0,2)$. Then $\vec{x}_1 \notin \mathrm{Ker}(A - \lambda_1I_4)$, so $\{\vec{x}_1(A-\lambda_1I_4), \vec{x}_1\}$ is linearly independent. Choose $\vec{x}_2 = (1,0,0,0) \in \mathrm{Ker}(A - \lambda_1I_4)$, which is linearly independent of $\vec{x}_1(A-\lambda_1I_4) = (1,1,1,1)$. Then
$$G_{\lambda_1} = \langle\vec{x}_1(A-\lambda_1I_4), \vec{x}_1\rangle \oplus \langle\vec{x}_2\rangle$$
and $B_1 = \{\vec{x}_1(A-\lambda_1I_4), \vec{x}_1, \vec{x}_2\}$ is a basis for $G_{\lambda_1}$. Therefore
$$[A\,|\,G_{\lambda_1}]_{B_1} = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$
For λ2 = 3: Note that dim $G_{\lambda_2} = 1 = \dim E_{\lambda_2}$. Take a corresponding eigenvector $\vec{x}_3 = (1,0,0,1)$ so that $G_{\lambda_2} = \langle\vec{x}_3\rangle$ with basis $B_2 = \{\vec{x}_3\}$. Therefore,
$$[A\,|\,G_{\lambda_2}]_{B_2} = [3].$$
Combining these results, we have the Jordan canonical form
$$PAP^{-1} = \begin{bmatrix} 2 & & & \\ 1 & 2 & & \\ & & 2 & \\ & & & 3 \end{bmatrix}_{4\times 4},$$
where
$$P = \begin{bmatrix} \vec{x}_1(A-\lambda_1I_4) \\ \vec{x}_1 \\ \vec{x}_2 \\ \vec{x}_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 2 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}.$$
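This example can be verified mechanically; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 0, 0, 0], [-1, 3, 1, -1], [0, -1, 1, 0], [1, 0, 0, 3]])
P = np.array([[1.0, 1, 1, 1],   # x1(A - 2I)
              [0, 1, 0, 2],     # x1
              [1, 0, 0, 0],     # x2
              [1, 0, 0, 1]])    # x3
J = P @ A @ np.linalg.inv(P)
print(np.round(J, 10))   # matches the Jordan form displayed above
```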

For each of the following matrices A, find a Jordan canonical basis B so that $[A]_B$ is the Jordan canonical form of A.
(a) $\begin{bmatrix} 13 & -5 & -6 \\ 16 & -7 & -8 \\ 16 & -6 & -7 \end{bmatrix}$. (b) $\begin{bmatrix} 3 & 3 & -2 \\ 0 & -1 & 0 \\ 8 & 6 & 5 \end{bmatrix}$. (c) $\begin{bmatrix} -1 & -5 & 6 \\ 1 & 21 & -26 \\ 1 & 17 & -21 \end{bmatrix}$. (d) $\begin{bmatrix} 4 & -2 & -1 \\ 5 & -2 & -1 \\ -2 & 1 & 1 \end{bmatrix}$.
(e) $\begin{bmatrix} 3 & -2 & -4 \\ 7 & -5 & -10 \\ -3 & 2 & 3 \end{bmatrix}$. (f) $\begin{bmatrix} 3 & -4 & 7 & -17 \\ 1 & -1 & 1 & -6 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$. (g) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 2 & 1 & 0 \\ 4 & 3 & 2 & 1 \end{bmatrix}$. (h) $\begin{bmatrix} 0 & -2 & -2 & -2 \\ -3 & 1 & 1 & -3 \\ 1 & -1 & -1 & 1 \\ 2 & 2 & 2 & 4 \end{bmatrix}$.

9. Similarity of two matrices Two matrices $A_{n\times n}$ and $B_{n\times n}$, each having its Jordan canonical form, are similar if and only if they have the same Jordan canonical form, up to the ordering of their eigenvalues. Use this result to determine which of the following matrices
$$\begin{bmatrix} 0 & -3 & 7 \\ -1 & -1 & 5 \\ -1 & -2 & 6 \end{bmatrix}; \qquad \begin{bmatrix} -3 & -7 & 1 \\ 3 & 6 & -1 \\ -2 & -3 & 2 \end{bmatrix}; \qquad \begin{bmatrix} 0 & -4 & -2 \\ 1 & 4 & 1 \\ -1 & -2 & 1 \end{bmatrix}$$
are similar.
10. Write out all Jordan canonical matrices (up to the orderings of Jordan
blocks and eigenvalues) whose characteristic polynomials are the same
polynomial

(t − 2)4 (t − 3)2 t2 (t + 1).

B.12.2 Rational canonical form


The main result is the following theorem.

Theorem Suppose A ∈ M(n; F) is a nonzero matrix having its character-


istic polynomials factored as

det(A − tIn ) = (−1)n p1 (t)r1 . . . pk (t)rk

where p1 (t), . . . , pk (t) are distinct irreducible monic polynomials and


r1 ≥ 1, . . . , rk ≥ 1 are positive integers. Then there exists a basis
$B = \{\vec{x}_1,\ldots,\vec{x}_n\}$, called a rational canonical basis of A, such that
$$[A]_B = PAP^{-1} = \begin{bmatrix} R_1 & & & \\ & R_2 & & \\ & & \ddots & \\ & & & R_l \end{bmatrix}_{n\times n} \quad\text{(called the rational canonical form of A)}$$

where each Ri is the companion matrix (refer to Ex. 3(d) of Sec. B.11) of
some polynomial p(t)m , where p(t) is a monic divisor of the characteristic
polynomial det(A − tIn ) of A and m is a positive integer, or Ri is a 1 × 1
matrix [λ], where λ is an eigenvalue of A, and
 
$$P = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_n \end{bmatrix}.$$
Exercises (continued)
11. For a matrix $A_{9\times 9}$, suppose $\mathbb{R}^9$ has a basis $B = \{\vec{x}_1,\ldots,\vec{x}_9\}$ so that
$$PAP^{-1} = \begin{bmatrix} 0 & 1 & 0 & 0 & & & & & \\ 0 & 0 & 1 & 0 & & & & & \\ 0 & 0 & 0 & 1 & & & & & \\ -1 & -2 & -3 & -2 & & & & & \\ & & & & 0 & 1 & & & \\ & & & & -1 & -1 & & & \\ & & & & & & 0 & 1 & \\ & & & & & & -1 & 0 & \\ & & & & & & & & 3 \end{bmatrix}_{9\times 9}, \qquad P = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_9 \end{bmatrix}.$$
$PAP^{-1}$ is the rational canonical form of A, with B the corresponding rational canonical basis.
(a) Show that the characteristic polynomial det(A − tI9 ) =
−p1 (t)3 p2 (t)p3 (t) where p1 (t) = t2 + t + 1, p2 (t) = t2 + 1 and
p3 (t) = t − 3 with the consecutive submatrices R1 , R2 , R3 and R4
as the respective companion matrix of p1 (t)2 , p1 (t), p2 (t) and p3 (t).
Among the diagonal entries of P AP −1 , only 3 is an eigenvalue of
A with  x 9 the corresponding eigenvector.
(b) Determine A-invariant subspaces

$$E_{p_i} = \{\vec{x} \in \mathbb{R}^9 \mid \vec{x}p_i(A)^m = \vec{0} \text{ for some positive integer } m\},$$

for i = 1, 2, 3. Try to find a smallest positive integer mi such that


Epi = Ker pi (A)mi for i = 1, 2, 3.
(c) Show that $\vec{x}_1,\ldots,\vec{x}_6 \in E_{p_1}$ and $\vec{x}_2 = \vec{x}_1A$, $\vec{x}_3 = \vec{x}_2A = \vec{x}_1A^2$, $\vec{x}_4 = \vec{x}_1A^3$ and $\vec{x}_6 = \vec{x}_5A$. Also, $\vec{x}_7,\vec{x}_8 \in E_{p_2}$ and $\vec{x}_8 = \vec{x}_7A$. Therefore,
$$E_{p_1} = \langle\vec{x}_1,\vec{x}_1A,\vec{x}_1A^2,\vec{x}_1A^3\rangle \oplus \langle\vec{x}_5,\vec{x}_5A\rangle,$$
$$E_{p_2} = \langle\vec{x}_7,\vec{x}_7A\rangle,$$
$$E_{p_3} = \langle\vec{x}_9\rangle.$$

(d) Let Ai = A | Epi denote the restriction of the linear mapping


A: R9 → R9 to its invariant subspace Epi for 1 ≤ i ≤ 3. Compute
the following for each i
(1) r(Ali ) for l = 1, 2, 3, 4,
(2) dim Ker(Ali ) for l = 1, 2, 3, 4.
12. (compare with Ex. 2) Let p(t) be an irreducible monic factor of


det(A − tIn ). Define

$$E_p = \{\vec{x} \in F^n \mid \vec{x}p(A)^r = \vec{0} \text{ for some positive integer } r\}.$$

(a) Ep is an A-invariant subspace of Fn . Also, there exists a small-


est positive integer m for which

Ep = Ker p(A)m .

(b) The restriction A | Ep has the minimal polynomial

(p(t))m .

Also, $md \le \dim E_p$, where $d = \deg p(t)$ (see Ex. 3(b) of Sec. B.11).
Sec. B.11).
(c) Suppose p1 (t) and p2 (t) are distinct irreducible monic factors
of det(A − tIn ), then

(1) Ep1 ∩ Ep2 = { 0 } and Ep1 ∪ Ep2 is linearly independent.
(2) Ep1 is invariant under p2 (A) and the restriction of p2 (A)
to Ep1 is one-to-one and onto.
Recall the A-cycle invariant subspace C( x ) generated by a nonzero
vector x ∈ Fn (related to A) and its basis Bx = { x , x A, . . . , x A
   k−1
}

if k = dim C( x ) (see Ex. 4).
13. (compare with Ex. 6) Suppose p(t)r is a divisor of the characteristic
polynomial of An×n where p(t) is an irreducible monic polynomial
such that p(t)r+1 is no more a divisor. Let d = deg p(t).
(a) Let V be any A-invariant subspace of $E_p$ and B be a basis for V. Then, for any $\vec{x} \in \mathrm{Ker}\,p(A)$ with $\vec{x} \notin V$, $B \cup B_{\vec{x}}$ is linearly independent.
(b) There exist linearly independent vectors $\vec{x}_1,\ldots,\vec{x}_k$ in $\mathrm{Ker}\,p(A) \subseteq E_p = \mathrm{Ker}\,p(A)^m$ such that $B = B_{\vec{x}_1} \cup \cdots \cup B_{\vec{x}_k}$ forms a basis for $E_p$ and
$$E_p = C(\vec{x}_1) \oplus \cdots \oplus C(\vec{x}_k).$$
(c) Let $p(t)^{m_i}$ be the A-annihilator of $\vec{x}_i$, with $m_1 \ge m_2 \ge \cdots \ge m_k$. Construct the following table
(each column j lists the basis $B_{\vec{x}_j}$ of the cycle $C(\vec{x}_j)$, with $\vec{x}_jp(A)^{m_j} = \vec{0}$; each dot is one basis vector):
$$\begin{array}{cccc} \bullet & \bullet & \cdots & \bullet \\ \bullet & \bullet & \cdots & \bullet \\ \vdots & \vdots & & \vdots \\ \bullet & \bullet & \cdots & \bullet \end{array} \qquad (d\,m_1 \text{ rows}).$$
Notice that $\dim E_p = r\cdot d = m_1d + m_2d + \cdots + m_kd$ and hence
$$r = m_1 + m_2 + \cdots + m_k.$$
Let $l_j$ be the number of dots in the jth row for $1 \le j \le d\,m_1$. Then
$$l_1 = \frac{1}{d}\left[\dim F^n - r(p(A))\right] = \frac{1}{d}\dim\mathrm{Ker}\,p(A),$$
$$l_j = \frac{1}{d}\left[r(p(A)^{j-1}) - r(p(A)^j)\right] \quad\text{for } j > 1,$$
where d is the degree of p(t). Therefore, the size of the table is completely determined by $A\,|\,E_p$, or by A itself. Each dot contributes a companion matrix.
(d) The matrix representation of $A\,|\,E_p$ with respect to B is
$$[A\,|\,E_p]_B = \begin{bmatrix} R_1 & & 0 \\ & R_2 & \\ & & \ddots \\ 0 & & & R_k \end{bmatrix}$$
where each $R_j$, 1 ≤ j ≤ k, is the companion matrix of $p(t)^{m_j}$, and $dr = \dim E_p = d(m_1+\cdots+m_k)$ if $E_p = \mathrm{Ker}(p(A))^r = \mathrm{Ker}(p(A))^m$ as in Ex. 12.
14. (compare with Ex. 7) (Fundamental decomposition theorem for a
    square matrix or linear operator) Let the characteristic polynomial
    of a nonzero matrix An×n be

        det(A − tIn) = (−1)^n p1(t)^{r1} · · · pk(t)^{rk},

    where p1(t), . . . , pk(t) are distinct irreducible monic polynomials
    with respective degrees d1, . . . , dk and d1 ≥ d2 ≥ · · · ≥ dk.
    (a) For 1 ≤ i ≤ k,

            Epi = C(xi1) ⊕ · · · ⊕ C(xiki)

        such that dim C(xij) = di mij for 1 ≤ j ≤ ki and mi1 ≥
        mi2 ≥ · · · ≥ miki, with di ri = dim Epi = (mi1 + · · · + miki) di.
        Take the basis Bij = {xij, xij A, . . . , xij A^{di mij − 1}} for C(xij),
        1 ≤ j ≤ ki. Then Bi = Bi1 ∪ · · · ∪ Biki is a rational canonical
        basis for Epi and

            [A|Epi]_{Bi}

        is the rational canonical form of A|Epi as in Ex. 13.
    (b) Moreover,

            F^n = Ep1 ⊕ · · · ⊕ Epk

        and B = B1 ∪ · · · ∪ Bk is a rational canonical basis for F^n, so
        that the matrix representation of A with respect to B is the
        rational canonical form

            [A]_B = PAP^{−1} = diag[[A|Ep1]_{B1}, [A|Ep2]_{B2}, . . . , [A|Epk]_{Bk}]_{n×n},

        where P is the invertible n × n matrix whose row vectors are the
        basis vectors of B arranged in a definite order.
    (c) For each i, 1 ≤ i ≤ k, the restriction A|Epi has characteristic
        polynomial pi(t)^{ri} and minimal polynomial pi(t)^{mi1}. Therefore,
        the minimal polynomial of A is

            p1(t)^{m11} p2(t)^{m21} · · · pk(t)^{mk1}.
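Part (c) suggests a mechanical route to the minimal polynomial: factor the
characteristic polynomial and, for each irreducible factor pi(t), locate where
the kernels of pi(A)^m stop growing. A self-contained sketch (ours; at_matrix
and minimal_polynomial_of are hypothetical names, not SymPy API):

    # A minimal sketch, assuming SymPy; not part of the original text.
    from sympy import Poly, Symbol, eye, factor_list, prod, zeros

    t = Symbol('t')

    def at_matrix(p, A):
        """Evaluate the polynomial p(t) at the square matrix A (Horner)."""
        n = A.shape[0]
        M = zeros(n, n)
        for c in Poly(p, t).all_coeffs():
            M = M * A + c * eye(n)
        return M

    def minimal_polynomial_of(A):
        """Return p1(t)^{m11} ... pk(t)^{mk1} for the monic irreducible
        factors pi(t) of the characteristic polynomial of A."""
        _, factors = factor_list(A.charpoly(t).as_expr())
        parts = []
        for p, _exp in factors:
            pA = at_matrix(p, A)
            m, M = 1, pA
            while M.rank() > (M * pA).rank():   # Ker pi(A)^m still growing
                m, M = m + 1, M * pA
            parts.append(p**m)
        return prod(parts)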



15. (compare with Ex. 8) Ex. 13(d) indicates how to compute the rational
    canonical form of a matrix. For example, let

            ⎡ 0    1    0    0⎤
        A = ⎢−4   −1   −1   −1⎥ ∈ M(4; R).
            ⎢12    3    6    8⎥
            ⎣−7   −3   −4   −5⎦
    Step 1  Compute the characteristic polynomial.
    By actual computation,

        det(A − tI4) = t^4 + 5t^2 + 6 = (t^2 + 2)(t^2 + 3).

    Let p1(t) = t^2 + 2 and p2(t) = t^2 + 3.
    Step 2  Determine [A|Epi]_{Bi}.
    Now

              ⎡−4   −1   −1   −1⎤
        A^2 = ⎢−1   −3   −1   −2⎥
              ⎢ 4    3    1    5⎥
              ⎣−1   −1   −1   −4⎦

                                   ⎡−2   −1   −1   −1⎤
        ⇒  p1(A) = A^2 + 2I4 =     ⎢−1   −1   −1   −2⎥  with r(A^2 + 2I4) = 2;
                                   ⎢ 4    3    3    5⎥
                                   ⎣−1   −1   −1   −2⎦

                                   ⎡−1   −1   −1   −1⎤
           p2(A) = A^2 + 3I4 =     ⎢−1    0   −1   −2⎥  with r(A^2 + 3I4) = 2.
                                   ⎢ 4    3    4    5⎥
                                   ⎣−1   −1   −1   −1⎦
    Hence, there exist x1 ∈ Ker(A^2 + 2I4) and x2 ∈ Ker(A^2 + 3I4) so
    that B1 = {x1, x1 A} is a basis for Ep1 and B2 = {x2, x2 A} is a
    basis for Ep2. Therefore,

        [A|Ep1]_{B1} = ⎡ 0  1⎤   and   [A|Ep2]_{B2} = ⎡ 0  1⎤
                       ⎣−2  0⎦                        ⎣−3  0⎦

    and the rational canonical form of A is

                           ⎡ 0   1          ⎤        ⎡ x1  ⎤
        [A]_B = PAP^{−1} = ⎢−2   0          ⎥,   P = ⎢ x1 A⎥
                           ⎢         0   1  ⎥        ⎢ x2  ⎥
                           ⎣        −3   0  ⎦        ⎣ x2 A⎦

    where B = B1 ∪ B2 is a rational canonical basis.
    Step 3  Determine B and hence P.
    Let x = (x1, x2, x3, x4). Solving x p1(A) = 0, we get

        Ker p1(A) = ⟨(1, 2, 1, 0), (0, −1, 0, 1)⟩.

    Solving x p2(A) = 0, we get

        Ker p2(A) = ⟨(3, 1, 1, 0), (−1, 0, 0, 1)⟩.

    Choose x1 = (0, −1, 0, 1) and x2 = (−1, 0, 0, 1). Then x1 A =
    (−3, −2, −3, −4) and x2 A = (−7, −4, −4, −5). Thus, the rational
    canonical basis is B = {(0, −1, 0, 1), (−3, −2, −3, −4), (−1, 0, 0, 1),
    (−7, −4, −4, −5)} and

            ⎡ 0   −1    0    1⎤
        P = ⎢−3   −2   −3   −4⎥.
            ⎢−1    0    0    1⎥
            ⎣−7   −4   −4   −5⎦

    Notice that, if A is considered as a complex matrix, then A is
    diagonalizable. Try to find an invertible complex matrix Q4×4 so
    that

                   ⎡√2 i                    ⎤
        QAQ^{−1} = ⎢      −√2 i             ⎥.
                   ⎢             √3 i       ⎥
                   ⎣                  −√3 i ⎦
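The three steps are easy to check numerically; here is a verification sketch
(ours, not the book's), again with vectors acting as rows:

    # A minimal verification sketch, assuming SymPy; not in the original text.
    from sympy import Matrix, Symbol, eye, factor, zeros

    t = Symbol('t')
    A = Matrix([[0, 1, 0, 0], [-4, -1, -1, -1],
                [12, 3, 6, 8], [-7, -3, -4, -5]])

    # Step 1: the characteristic polynomial factors as claimed.
    assert factor(A.charpoly(t).as_expr()) == (t**2 + 2)*(t**2 + 3)

    # Step 3: x1 and x2 lie in the left kernels of p1(A) and p2(A).
    x1, x2 = Matrix([[0, -1, 0, 1]]), Matrix([[-1, 0, 0, 1]])
    assert x1 * (A**2 + 2*eye(4)) == zeros(1, 4)
    assert x2 * (A**2 + 3*eye(4)) == zeros(1, 4)

    # P A P^{-1} is the rational canonical form diag[C(p1), C(p2)].
    P = Matrix.vstack(x1, x1*A, x2, x2*A)
    assert P * A * P.inv() == Matrix([[0, 1, 0, 0], [-2, 0, 0, 0],
                                      [0, 0, 0, 1], [0, 0, -3, 0]])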

    For another example, let

            ⎡ 0    1    1    1    1⎤
            ⎢ 2   −2    0   −2   −4⎥
        B = ⎢ 0    0    1    1    3⎥ ∈ M(5; R).
            ⎢−6    0   −3   −1   −3⎥
            ⎣ 2    2    2    2    4⎦

    Then det(B − tI5) = −(t^2 + 2)^2 (t − 2). Let p1(t) = t^2 + 2 with
    multiplicity r1 = 2 and p2(t) = t − 2 with r2 = 1. Hence dim Ep1 =
    2 · 2 = 4, and r1 = 2 = m1 + · · · + mk implies that m1 = m2 = 1 or
    m1 = 2. Now

              ⎡−2    0    0    0    0⎤
              ⎢ 0   −2    0    0    0⎥
        B^2 = ⎢ 0    6    4    6   12⎥
              ⎢ 0  −12  −12  −14  −24⎥
              ⎣ 0    6    6    6   10⎦

                         ⎡0    0    0    0    0⎤
                         ⎢0    0    0    0    0⎥
        ⇒  B^2 + 2I5 =   ⎢0    6    6    6   12⎥
                         ⎢0  −12  −12  −12  −24⎥
                         ⎣0    6    6    6   12⎦

    with r(p1(B)) = r(B^2 + 2I5) = 1.

    Thus, the first row in the table for Ep1 has (1/2)(5 − r(p1(B))) = 2
    dots. This implies that the only possibility is m1 = m2 = 1, and each
    dot contributes the companion matrix

 
        ⎡ 0  1⎤
        ⎣−2  0⎦

    of p1(t) = t^2 + 2 to the rational canonical form. On the other hand,
    dim Ep2 = 1, and p2(t) contributes the 1 × 1 matrix [2] to the canonical
    form. Combining these, the rational canonical form of B is

                   ⎡ 0   1              ⎤        ⎡ x1  ⎤
                   ⎢−2   0              ⎥        ⎢ x1 B⎥
        PBP^{−1} = ⎢         0   1      ⎥,   P = ⎢ x2  ⎥
                   ⎢        −2   0      ⎥        ⎢ x2 B⎥
                   ⎣                 2  ⎦        ⎣ x3  ⎦

    where x1 = e1 = (1, 0, 0, 0, 0) and x2 = e2 = (0, 1, 0, 0, 0) are
    in Ker p1(B), with x1 B = (0, 1, 1, 1, 1) and x2 B = (2, −2, 0, −2, −4),
    while x3 = (0, 1, 1, 1, 2) ∈ Ker p2(B).
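As before, the stated facts can be verified mechanically (our sketch, not the
book's):

    # A minimal verification sketch, assuming SymPy; not in the original text.
    from sympy import Matrix, Symbol, eye, factor

    t = Symbol('t')
    B = Matrix([[0, 1, 1, 1, 1], [2, -2, 0, -2, -4], [0, 0, 1, 1, 3],
                [-6, 0, -3, -1, -3], [2, 2, 2, 2, 4]])

    assert factor(B.charpoly(t).as_expr()) == (t**2 + 2)**2 * (t - 2)
    assert (B**2 + 2*eye(5)).rank() == 1    # hence l1 = (5 - 1)/2 = 2 dots

    x1, x2 = eye(5)[0, :], eye(5)[1, :]     # e1, e2 lie in Ker p1(B)
    x3 = Matrix([[0, 1, 1, 1, 2]])          # x3*B == 2*x3
    P = Matrix.vstack(x1, x1*B, x2, x2*B, x3)
    print(P * B * P.inv())    # the block-diagonal rational canonical form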
    For each of the following real matrices A, find a rational canonical
    basis B so that [A]_B is the rational canonical form of A.

            ⎡ 6  −3  −2⎤        ⎡1  0  3⎤        ⎡1  −2   0   0⎤
        (a) ⎢ 4  −1  −2⎥;   (b) ⎢2  1  2⎥;   (c) ⎢2   1   0   0⎥;
            ⎣10  −5  −3⎦        ⎣0  0  2⎦        ⎢1   0   1  −2⎥
                                                 ⎣0   1   2   1⎦

            ⎡2  0  0  0⎤
        (d) ⎢1  2  0  0⎥.
            ⎢0  1  2  0⎥
            ⎣0  0  0  2⎦
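A possible starting point (our sketch) is to factor each characteristic
polynomial; the dot counts of Ex. 13(c) then pin down the companion blocks:

    # A minimal sketch, assuming SymPy; not part of the original text.
    from sympy import Matrix, Symbol, factor

    t = Symbol('t')
    exercises = {
        '(a)': Matrix([[6, -3, -2], [4, -1, -2], [10, -5, -3]]),
        '(b)': Matrix([[1, 0, 3], [2, 1, 2], [0, 0, 2]]),
        '(c)': Matrix([[1, -2, 0, 0], [2, 1, 0, 0],
                       [1, 0, 1, -2], [0, 1, 2, 1]]),
        '(d)': Matrix([[2, 0, 0, 0], [1, 2, 0, 0],
                       [0, 1, 2, 0], [0, 0, 0, 2]]),
    }
    for name, M in exercises.items():
        print(name, factor(M.charpoly(t).as_expr()))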
References

On Linear Algebra
[1] (Elementary Geometry), 1998.
[2] (Linear Algebra), 1989.
[3] (1): (Determinants), 1982, 1987.
[4] (2): (Matrices), 1982, 1987.
[5] (3): (Vector and Affine Spaces), 1983.
[6] (4): (Projective Spaces), 1984.
[7] (5): (Inner Product Spaces), 1984.
[8] (Linear Algebra and Theory of Matrices), 1992.
[9] A. R. Amir-Moe’z and A. L. Fass: Elements of Linear Algebra, Pergamon
Press, The MacMillan Co., New York, 1962.
[10] N. V. Efimov and E. R. Rozendorn: Linear Algebra and Multidimensional
Geometry (English translation), Mir Publishers, Moscow, 1975.
[11] A. E. Fekete: Real Linear Algebra, Marcel Dekker, Inc., New York, Basel,
1985.
[12] H. Gupta: Matrices in n-dimensional Geometry, South Asian Publishers,
New Delhi, 1985.
[13] M. Koecher: Lineare Algebra und Analytischer Geometrie, zweite Auflage,
Grundwissen Mathematik 2, Springer-Verlag, Berlin, Heidelberg, New York,
Tokyo, 1985.
[14] P. S. Modenov and A. Parkhomenko: Geometric Transformations, Academic
Press, New York, London, 1965.
[15] G. Strang: Introduction to Linear Algebra, 2nd ed., Wellesley-Cambridge
Press, 1998.
[16] G. Strang: Linear Algebra and its Applications, 3rd ed., Saunders,
Philadelphia, 1988.
[17] G. Strang: The Fundamental Theorem of Linear Algebra, Amer. Math.
Monthly, Vol 100, number 9 (1993), 848–855.
[18] J. H. Wilkinson: The Algebraic Eigenvalue Problem, Oxford University
Press, New York, 1965.
Other Related Sources
History
[19] M. Kline: Mathematical Thought: From Ancient to Modern Times, Oxford
University Press, New York, 1972.

Group
[20] J. J. Rotman: An Introduction to the Theory of Groups, 4th ed., Springer-
Verlag, New York, 1995.

Geometry
[21] H. Eves: A Survey of Geometry, Vol 1, 1963, Vol 2, 1965, Allyn and Bacon,
Inc., Boston.

Differential Geometry, Lie Group


[22] M. P. do Carmo: Differential Geometry of Curves and Surfaces, Prentice–
Hall, Inc., New Jersey, 1976.
[23] C. Chevalley: Theory of Lie Groups, Vol I, Princeton University Press,
Princeton, New Jersey, 1946.
[24] S. Helgason: Differential Geometry, Lie Group and Symmetric Spaces,
Academic Press, New York, San Francisco, London, 1978.
Note. (1982): a popular book in Japan with its informal, nonrigorous
approach to Lie groups.

Fractal Geometry
[25] M. Barnsley: Fractals Everywhere, Academic Press, San Diego, 1988.
[26] B. B. Mandelbrot: The Fractal Geometry of Nature, W. H. Freeman,
New York, 1983.

Matrix Analysis
[27] R. Bellman: Introduction to Matrix Analysis, Siam, Philadelphia, 1995.

Real Analysis (including Differentiable Manifolds)


[28] L. H. Loomis and S. Sternberg: Advanced Calculus, revised ed., Johnes and
Bartlett Publishers, Boston, 1990.
Note. (Theoretical Analysis), Vol I (1977, 1978, 1985, 1993); Vol II
(1981, 1989); Vol III (1982, 1989). These three voluminous books contain
routine materials in Advanced Calculus plus topics in Metric Space,
Functional Analysis, Measure and Integration, and Submanifolds in
Rn up to the Stokes Theorem in Differential Form.

Complex Analysis
[29] L. V. Ahlfors: Complex Analysis, 3rd ed., McGraw-Hill Book Co., New York,
1979.
[30] L. K. Hua: Harmonic Analysis of Functions of Several Complex
Variables in the Classical Domains, Trans. Math. Monographs, Vol 6,
AMS, Providence, R.I., 1963.
[31] O. Lehto and K. I. Virtanen: Quasikonforme Abbildungen, 1965, English


Translation: Quasiconformal Mappings in the Plane, Springer-Verlag, Berlin,
Heidelberg, New York, 1970.
[32] J. Väisälä: Lectures on n-Dimensional Quasiconformal Mappings, Lecture
Notes in Mathematics 229, Springer-Verlag, Berlin, Heidelberg, New York,
1970.

Differential Equations
[33] W. E. Boyce and R. C. DiPrima: Elementary Differential Equations, 7th ed.,
John Wiley & Sons, New York, 2000.
[34] S. J. Farlow: An Introduction to Differential Equations and their Applica-
tions, McGraw-Hill, Inc., New York, 1994.

Fourier Analysis
[35] R. E. Edwards: Fourier Series, A Modern Approach, Vols 1 and 2, Pergamon
Press, Inc., New York, 1964.
[36] E. C. Titchmarsh: Introduction to the Theory of Fourier Integrals, 2nd ed.,
Oxford at the Clarendon Press, 1959.
[37] A. Zygmund: Trigonometric Series, Vols I and II, Cambridge University
Press, New York, 1959.

Markov Chains
[38] K. L. Chung: Elementary Probability Theory with Stochastic
Processes, UTM, Springer-Verlag, New York, 1974.
[39] K. L. Chung: Markov Chains, 2nd ed., Springer-Verlag, 1967.
[40] J. L. Doob: Stochastic Processes, Wiley & Sons, New York, 1953.
[41] J. G. Kemeny and J. L. Snell: Finite Markov Chains, Springer-Verlag,
New York, 1976.
Index of Notations

Notation Definition Pages


{ }              set                                          681
∈                is an element of                             681
∉                is not an element of                         681
⊆                subset                                       681
⊊                proper subset                                681
∪                union                                        681
∩                intersection                                 681
∅                empty set                                    681
A\B or A − B difference set of set A by set B 681
A×B Cartesian product of A and B 681
1A the identity map or function on the 683
set A
δij              Kronecker delta: = 1 if i = j; = 0 if i ≠ j
⇒ implies
⇔ if and only if
 the end of a proof or a solution
Sec.2.7.1(etc.) the section numbered 7.1 in Chapter 2
Sec.A.1(etc.) the section numbered 1 in Appendix A
Sec.B.2(etc.) the section numbered 2 in Appendix B
Fig.1.5(etc.) the figure numbered 5 in Chapter 1


[1] (etc.) Reference [1]


Ex.<A>1 of Sec.2.3 (etc.)     problem 1 in Exercises <A> of Sec.2.3
Ex.<A> or <B> or <C> or <D>   Exercises <A> or <B> or <C> or <D>
A = [aij ]m×n , [aij ] matrix of order m × n : m rows, n 700
or Am×n columns, where 1 ≤ i ≤ m
and 1 ≤ j ≤ n
A = [aij ]n×n or square matrix of order n 700
An×n

A∗               transpose of A = [aij]m×n : [aji]n×m         124, 702
Ā                conjugate matrix of A = [aij] : [āij]        702
Ā∗               conjugate transpose of A:                    702
                 (Ā)∗ = the conjugate of A∗
A + B = [aij + bij ] the sum (addition) of the matrices 702
A and B of the same order
αA = [αaij ] the scalar multiplication of the 702
scalar α and the matrix A
AB = [cik]       the product of A = [aij]m×n and              705
                 B = [bjk]n×p :
                 cik = Σ_{j=1}^{n} aij bjk , 1 ≤ i ≤ m, 1 ≤ k ≤ p
A0 = In the zero power of a square matrix A 120, 701
is defined to be the identity
matrix In
A−1 the inverse of an invertible (square) 47, 55, 336,
matrix A 711
Ap the pth power of a square matrix A, 120, 707
where p ≥ 1 is an integer; negative
integer p is permitted if A is
invertible
A=B equality of two matrices Am×n and 700
Bm×n
A^B_{B′} or [1V]^B_{B′}   the transition matrix from a basis B    47, 337, 735
                 to a basis B′ in a
                 finite-dimensional vector space V;
                 change of coordinates matrix

AA∗ Gram matrix of a matrix Am×n 758(etc.)


PAP^{−1}         matrix similar to An×n                       117, 407, 716
PAP∗             matrix congruent to a symmetric              166, 453, 776
                 matrix An×n , where P is
                 invertible
A →_E EA         the matrix EA obtained by                    152(etc.),
                 performing the elementary row                160, 443, 720
                 operation (matrix) E on the
                 matrix A
A →_F AF         the matrix AF obtained by                    151(etc.), 160,
                 performing the elementary column             449, 720
                 operation (matrix) F on the
                 matrix A

xA Am×n acts as a linear 87, 735,
transformation from Fm to Fn 757(etc.)
defined as  x → x A, where the

vector x = (x1 , x2 , . . . , xm ) is
treated as a 1 × m matrix
(x1 x2 · · · xm ) or [x1 x2 · · · xm ],
the row matrix
[A]^B_{B′}       the matrix representation of Am×n ,          59, 115, 406,
                 as a linear transformation from              735
                 Fm to Fn , w. r. t. a basis B for
                 Fm and a basis B′ for Fn
[A]B = [A]^B_B   in case An×n and B′ = B, when A              58, 86, 103,
                 acts as a linear operator on Fn              117, 389,
                                                              407, 735
Ker(A); N(A)     kernel or null space of the linear           56, 85, 88,
                 transformation A; specifically, the          124, 407,
                 left kernel space of the matrix              734, 757
                 Am×n : {x ∈ Fm | xA = 0}
Im(A); R(A)      image or range space of the linear           56, 85, 88,
                 transformation A; specifically,              124, 407,
                 the row space of                             734, 757
                 Am×n : {xA | x ∈ Fm } ⊆ Fn
Ker(A∗); N(A∗)   right kernel space of                        124, 407, 757
                 Am×n : {x ∈ Fn | xA∗ = 0}
Im(A∗); R(A∗)    column space of                              124, 407, 757
                 Am×n : {xA∗ | x ∈ Fn }

r(A)             rank of Am×n , defined to be the             89, 123, 124,
                 dimension dim Im(A),                         366, 383,
                 specifically called the row rank of          407, 711,
                 A and equal to r(A∗), the column             722, 739
                 rank of A
tr(A)            trace of a square matrix                     122, 407, 713
                 A = [aij]n×n : the sum Σ_{i=1}^{n} aii
                 of the diagonal entries of A
detA; det[A]B determinant associated to the 47, 87, 121,
square matrix An×n ; 704, 727
determinant of linear operator A
Aij the minor of aij in A = [aij ]n×n 336, 727
i+j
(−1) Aij the cofactor of aij in A = [aij ]n×n 336, 728
adjA adjoint matrix of A : [bij ]n×n with 336, 731
bij = (−1)j+i det Aji , 1 ≤ i, j ≤ n
Ai∗              ith row vector (ai1, . . . , ain), also      375, 408, 700
                 treated as a 1 × n row matrix, of
                 A = [aij]m×n for 1 ≤ i ≤ m
A∗j              jth column vector, or m × 1 column           368, 375, 408,
                 matrix [a1j · · · amj]∗, of                  700
                 A = [aij]m×n for 1 ≤ j ≤ n
 
xA = b           A = [aij]m×n ,                               87, 150, 367,
                 x = (x1, . . . , xm) ∈ Fm and                415, 424,
                 b = (b1, . . . , bn) ∈ Fn :                  442, 711
                 Σ_{i=1}^{m} aij xi = bj , 1 ≤ j ≤ n,         (etc.)
                 a system of n linear
                 equations in m unknowns
                 x1, . . . , xm ; homogeneous if
                 b = 0, non-homogeneous if
                 b ≠ 0

xA∗ = b or       A = [aij]m×n , x = (x1, . . . , xn) ∈        152, 415, 424,
Ax∗ = b∗         Fn and b = (b1, . . . , bm) ∈ Fm :           442, 711,
                 Σ_{j=1}^{n} aij xj = bi , 1 ≤ i ≤ m          723, 759
                                                              (etc.)
[A | b∗] or      augmented matrix of the coefficient          150, 152, 156,
[A∗; b]          matrix A of Ax∗ = b∗ or                      377, 416,
                 xA∗ = b, respectively                        425, 724
                                                              (etc.)
 
⎡A  B⎤           block or partitioned matrix                  180
⎣C  D⎦
A+               the generalized or pseudo-inverse of         175, 177, 419,
                 a real or complex matrix Am×n ,              429, 462,
                 which is A∗(AA∗)^{−1} if r(A) = m;           469, 766
                 (A∗A)^{−1}A∗ if r(A) = n;                    (etc.)
                 C∗(CC∗)^{−1}(B∗B)^{−1}B∗ if
                 A = Bm×r Cr×n with
                 r(B) = r(C) = r, where r = r(A)
det(A − tIn)     the characteristic polynomial                106, 399, 407,
                 (−1)^n t^n + αn−1 t^{n−1} + · · · +          491, 719,
                 α1 t + α0 , αn−1 = tr(A),                    791(etc.),
                 α0 = det A, of a matrix An×n                 795
p(A) = Σ_{k=0}^{m} αk A^k                                     108, 128, 403,
                 polynomial matrix of An×n                    486, 496,
                 induced by the polynomial                    499
                 p(t) = Σ_{k=0}^{m} αk t^k . Note A^0 = In


e^A = Σ_{k=0}^{∞} (1/k!) A^k                                  496, 499, 547,
                 matrix exponential of A                      558
ρ(A)             spectral radius of A                         496
lim_{k→∞} A^{(k)}   the limit matrix (if it exists) of        494
                 A^{(k)}, where A^{(k)} is n × n, as k → ∞
   
a , b , x , y (etc.) vector 8, 26, 692
α1 a1 + · · · + αk ak = Σ_{i=1}^{k} αi ai
                 linear combination of vectors                31, 324, 695
                 a1, . . . , ak with coefficients
                 (scalars) α1, . . . , αk

 
a1 a2 (etc.)     directed line segment from point a1          18, 65
                 to point a2 ; line segment with
                 endpoints a1 and a2
∆a1 a2 a3        (affine and Euclidean) triangle with         65, 76
                 vertices at points a1 , a2 and a3 ;
                 base triangle in a barycentric
                 coordinate system for the plane
∆̄a1 a2 a3        oriented triangle                            75
∆a1 a2 a3 a4     a tetrahedron with vertices at               356, 363,
                 points a1 , a2 , a3 and a4 ;                 638, 643
                 4-tetrahedron; 4-simplex; base
                 tetrahedron
∆a0 a1 · · · ak  k-tetrahedron or k-simplex, where            642, 654
                 a0 , a1 , . . . , ak are affinely
                 independent points in an affine
                 space
a0 a1 · · · ak   k-parallelogram with a0 as vertex            61, 446, 638,
                 and side vectors                             661, 667
                 a1 − a0 , . . . , ak − a0 , where
                 a0 , a1 , . . . , ak are affinely
                 independent points in an affine
                 space; k-hyperparallelepiped
B, C, D, N (etc.) basis for a finite-dimensional vector 10, 34, 115,
space V 326, 406,
697 (etc.)
[P]B or [OP]B or [x]B = α
                 coordinate vector of a point P in R,         10
                 or of the vector OP = x, with
                 respect to a basis B
[P]B or [OP]B or [x]B = (x1, x2)
                 coordinate vector of a point P in R2,        34
                 or of the vector OP = x, w. r. t.
                 a basis B = {a1, a2}, namely,
                 x = Σ_{i=1}^{2} xi ai
[P]B or [OP]B or [x]B = (x1, x2, x3)
                 coordinate vector of a point P in R3,        326
                 or of the vector OP = x, w. r. t.
                 a basis B = {a1, a2, a3}, namely,
                 x = Σ_{i=1}^{3} xi ai

B∗               dual basis of a basis B                      420, 750
B = {a0, a1, . . . , ak}
                 affine basis with a0 as base point,          19, 71, 362,
                 where a0, a1, . . . , ak are affinely        640, 641,
                 independent points in Rn                     642
(x)B = (λ0, λ1, . . . , λk),  λ0 + λ1 + · · · + λk = 1
                 (normalized) barycentric                     19, 71, 362,
                 coordinate of the point x w. r. t.           642
                 the affine basis B:
                 x = Σ_{i=0}^{k} λi ai ,  Σ_{i=0}^{k} λi = 1
[x]B = [x − a0]B = (λ1, . . . , λk)
                 affine coordinate of the point x             19, 72, 363, 641
                 w. r. t. the affine basis B:
                 x − a0 = Σ_{i=1}^{k} λi (ai − a0)
(λ1 : λ2 : λ3)   homogeneous area coordinate or               76
                 (nonnormalized) barycentric
                 coordinate

C                complex field                                685
C or C^1         standard one-dimensional complex             29, 693
                 vector space
Cn               standard n-dimensional complex               42, 693
                 vector space
Cf(x) or C(x)    f-cycle subspace generated                   210, 212,
                 by a vector x:                               570, 802
                 ⟨x, f(x), f^2(x), . . .⟩,
                 where f is a linear operator
diag[a11 , . . . , ann ] diagonal matrix 701
dim V dimension of a finite dimensional 43, 698
vector space V

ei = (0, . . . , 0, 1, ith coordinate vector in the 38, 328, 693,
0, . . . , 0) standard basis for Fn (for 697
example, Rn , Cn , Inp ), for n ≥ 2
E(i)(j)          elementary matrix of type I:                 150, 160, 442,
                 interchange of the ith row and jth           720
                 row of In , i ≠ j
Eα(i)            elementary matrix of type II:                150, 160, 442,
                 multiplication of the ith row of In          720
                 by a scalar α ≠ 0

E(j)+α(i)        elementary matrix of type III:               150, 160, 442,
                 addition of an α multiple of the ith         720
                 row to the jth row, i ≠ j
Eλ eigenspace associated to the 193, 199, 210,
eigenvalue λ 479, 791

F field 684
F or F
1
standard one-dimensional vector 693
space over the field F
Fn standard n-dimensional vector 42, 332, 333,
space over the field F, n ≥ 2 693
F(i)(j)          elementary matrix of type I:                 150, 160, 442,
                 interchange of the ith column and            720
                 jth column of In , i ≠ j
Fα(i)            elementary matrix of type II:                150, 160, 442,
                 multiplication of the ith column of          720
                 In by a scalar α ≠ 0
F(j)+α(i)        elementary matrix of type III:               150, 160, 442,
                 addition of an α multiple of the ith         720
                 column to the jth column, i ≠ j
f: A→B function from a set A to a set B 682
−1
f :B→A inverse function of f : A → B which 684
is both one-to-one and onto
f |s or f |S : S → B the restriction of f : A → B to a 683
subset S ⊆ A
f ◦g the composite of g followed by f 683
f: V →W linear transformation from a vector 57, 58, 84,
space V to a vector space W over 366, 732
the same field F; in case W = V ,
called linear operator ; in case
W = F, called linear functional
F(X, F) the vector space of functions form a 333, 693
set X to a field F
Ker(f), N(f)     kernel or null space of                      56, 58, 85,
                 f: {x ∈ V | f(x) = 0}                        124, 366, 407,
                                                              412, 734
Im(f), R(f)      image or range space of                      56, 58, 85,
                 f: {f(x) | x ∈ V}                            124, 366, 407,
                                                              412, 734

f 0 = 1V zero power of a linear operator f 120, 406


f =f ◦f
n n−1
nth power of a linear operator f , 120, 406
n ≥ 1: f 1 = f ; n might be
negative integer if f is invertible
and f −n = (f −1 )n , n ≥ 1
[f]^B_C          matrix representation of                     59, 115, 406,
                 f: V → W , where dim V < ∞                   735
                 and dim W < ∞, w. r. t. a basis
                 B for V and a basis C for W
[f]B = [f]^B_B   matrix representation of a linear            57, 58, 59, 107,
                 operator f w. r. t. a basis B                117, 366, 735
f∗ adjoint operator of a linear 422, 754, 781
operator f on an inner product
space (V, , ); adjoint or dual
operator of
f : V → W : f ∗ ∈ L(W ∗ , V ∗ )
det f = det[f ]B determinant of a linear operator f 121, 407
on V , dim V < ∞
det(f − t1V ) = characteristic polynomial of a 106, 121, 407,
det([f ]B − tIn ) linear operator f on V , 719, 791
dim V = n < ∞ (etc.)
tr(f ) = tr[f ]B trace of f 122, 407, 713
r(f ) = dim Im(f ) rank of a linear transformation f 124, 407, 711,
on V , dim V < ∞ 739
Gλ (f ), Gλ generalized eigenspace associated 210, 224, 516,
to λ 794, 801

GL(n; F)         {A ∈ M(n; F) | A is invertible},             119, 120, 406,
                 called the general linear group of           657, 737
                 order n over F
GL(V, V)         {f ∈ L(V, V) | f is invertible},             119, 120, 406,
                 where dim V = n, also called the             657, 737
                 general linear group of order n
                 on V

 
Ga(n; R)         { ⎡A   O⎤ : A ∈ GL(n; R)                     238, 239, 578,
                   ⎣x0  1⎦                                    580, 656
                 and x0 ∈ Rn }, namely, the set
                 of regular affine transformations,
                 called the real group of affine
                 transformations of order n or
                 on Rn
Gp(2; R)         projective group of order 2 over R           661
Gp(3; R)         projective group on the projective           660
                 space P3(R); projective group of
                 order 3 over R
Γ                intuitive 3-dimensional space, used          319, 322
                 in Sec. 3.1 only
Γ(O; A1, A2, A3) vectorized space                             324
                 {x1 a1 + x2 a2 + x3 a3 | x1, x2, x3 ∈ R}
                 of the space Γ with O as o
                 and ai = OAi , i = 1, 2, 3, as
                 basis vectors, where O, A1, A2
                 and A3 are not coplanar points
                 in Γ, used only in Secs. 3.1–3.6
Hom(V , W ) or the vector space of linear 85, 737
L(V , W ) transformations from vector
space V to vector space W , over
the same field F
Hom(V , V ) or the vector space of linear operators 85, 733
L(V , V ) on V
Hom(V , F) = (first) dual space of V 749
L(V , F) = V ∗
Hom(R2 , R2 ) or 117, 119
L(R2 , R2 )
Hom(R3 , R3 ) or 406
L(R3 , R3 )
Hom(R3 , R) 420
= (R3 )*

In = In×n identity matrix of order n 701


Ip finite field {0, 1, 2, . . . , p − 1}, 686
where p is a prime

Inp standard n-dimensional vector 693


space {(x1 , . . . , xn ) | xi ∈ Ip ,
1 ≤ i ≤ n} over Ip
⟨ , ⟩            inner product on a vector space V            773
                 over R or C
L                intuitive one-dimensional straight           5, 7
                 line, used only in Sec. 1.1
L(O; X)          vectorized space {αx | α ∈ R} of             9, 10
                 the line L with O as o and
                 x = OX as the basis vector,
                 where O and X are distinct
                 points on L, used only in
                 Secs. 1.1–1.4
l∞ or S^1_∞      line at infinity or infinite line or         315, 657
                 ideal line
M(m, n; F)       mn-dimensional vector space                  333, 694, 702
                 {A = [aij]m×n | aij ∈ F, 1 ≤ i ≤ m,
                 1 ≤ j ≤ n} of m × n matrices over F
M(n; F)          M(n, n; F)                                   118, 406, 702,
                                                              705
M(2; R)                                                       118, 119
M(3; R)                                                       406
(M(m, n; F), ⟨,⟩)   ⟨A, B⟩ = tr(AB̄∗) if F = C;                714
                 ⟨A, B⟩ = tr(AB∗) if F = R
N = {e1, . . . , en}   standard or natural basis for Fn;      38, 55, 328,
                 for the inner product spaces Rn              335, 366,
                 and Cn                                       693, 697,
                                                              775
N = {e1, e2}                                                  38, 55
N = {e1, e2, e3}                                              328, 335, 366
N = {o, e1, e2, e3}   standard or natural affine basis        335, 362, 581
                 for R3
Ñ = {e1, e2, e3, e4, e1 + e2 + e3 + e4}
                 standard or natural projective basis         659
                 for P3(R)
[x]N = x         x = (x1, . . . , xn) = Σ_{i=1}^{n} xi ei ∈ Fn

Om×n m × n zero matrix 700



o zero vector 7, 26, 324, 692
{o}

zero subspace 41, 694, 698

P Q(etc.) segment between P and Q on a 6


line; the length of the segment
PQ
−
P Q(etc.) directed segment from P to Q; the 6, 7, 24
−
signed length of P Q;
displacement (position, free)
vector from P to Q, usually
denoted as  x or 
a
P(F) vector space of polynomials over a 333, 688, 698
field F
Pn (F) vector space of polynomials over F 333, 688, 698
whose degree is no more than n
P 1 (R) one-dimensional projective line 661
over R
P 2 (R) = R2 ∪ l∞ two-dimensional projective plane 315, 661
over R
P 3 (R) = R3 ∪ S∞
2
three-dimensional projective space 657, 658
over R

Q rational field 686

R real field 685



RL(O; X) coordinatized space {[P ]B P ∈ L} 11
of the line L w. r. t. the basis
−−
B = { x }, 
x = OX
R1 or R standard one-dimensional real 12
vector space 
R2Σ(O; A1 , A2 ) coordinatized space {[P ]B P ∈ Σ} 35
of the plane Σ w. r. t. the basis
−−
B = { a 2 }, where 
a 1,  a 1 = OA1 ,
 −−
a 2 = OA2 and [P ]B = (x1 , x2 )
R2 standard two-dimensional real 37
vector space


R3Γ(O; A1 , A2 , A3 ) coordinatized space {[P ]B P ∈ Γ} 326
of the space Γ w. r. t. the basis
B = { a 1,  a 3 }, where
a 2, 
 −
a i = OAi , i = 1, 2, 3 and
[P ]B = (x1 , x2 , x3 )
R3 standard three-dimensional real 327
vector space
Rn (n ≥ 2) standard n-dimensional affine, or 42, 327, 693,
Euclidean (inner product) or 773
vector space over the real field R
(R3 )∗ (first) dual space of R3 420
(R3 )∗∗ second dual space of R3 420

S(etc.) subspace of a vector space V 40, 330, 694


S 1 ∩ S2 intersection space of subspaces S1 330, 694
and S2
S1 + S 2 sum space of subspaces S1 and S2 41, 330, 694
S 1 ⊕ S2 direct sum (space) of subspaces 41, 330, 694
S1 and S2
⟨S⟩              subspace generated or spanned by             41, 330, 695
                 a nonempty subset S of a vector
                 space V: {finite linear
                 combinations of vectors in S}
S°               annihilator                                  421, 752
                 {f ∈ V∗ | f(x) = 0 for all x ∈ S}
                 of a nonempty subset S of a
                 vector space V, a subspace of V∗;
                 also denoted as S⊥ in Sec. B.8
S⊥               orthogonal complement                        129, 412, 778
                 {y ∈ V | ⟨y, x⟩ = 0 for all
                 x ∈ S} of a nonempty subset S
                 in an inner product space (V, ⟨,⟩)
S^k = x0 + S     k-dimensional affine subspace of an          640, 724
                 affine or vector space V, where S
                 is a k-dimensional subspace of V
                 and x0 is a point in V:
                 {x0 + v | v ∈ S}
Sr ∩ Sk intersection (affine) subspace of S r 644
and S k

Sr + Sk sum (affine) subspace of S r and S k 644


0 3
S∞ a point at infinity in P (R) 657
k−1
S∞ (k − 1)-dimensional subspace at 657
infinity in P 3 (R)
2
S∞ hyperplane at infinity in P 3 (R) 657
SVD singular value decomposition of a 771
real or complex matrix Am×n
Σ                intuitive two-dimensional plane,             21
                 used only in Sec. 2.1
Σ(O; A1, A2)     vectorized space                             31
                 {x1 a1 + x2 a2 | x1, x2 ∈ R} of the
                 plane Σ with O as o and
                 ai = OAi , i = 1, 2, as basis
                 vectors, where O, A1 and A2 are
                 not collinear points on Σ, used
                 only in Secs. 2.1–2.6

T (
x) =  x 0 + f (
x) affine transformation 67, 83, 236
or x 0 + 

xA

 −−
x = OX(etc.) vector 5, 26, 29, 37,
692
−
x = (−1)
x negative of 
x or inverse of 
x under 6, 26, 27,
addition 29, 692

x +
y sum or addition of vectors 
x and 
y 26, 29, 692
 
αx scalar multiplication of x by 6, 27, 29, 692
scalar α
x − y            x + (−y)                                     27, 692
⟨x⟩              subspace generated or spanned                41
                 by x
⟨x1, x2⟩         subspace generated or spanned                330
                 by x1 and x2
⟨x1, . . . , xk⟩ subspace generated or spanned by             41, 330, 695
                 {x1, . . . , xk}, i.e. ⟨{x1, . . . , xk}⟩
x ↦ x0 + x       translation along the vector x0              67, 73, 236,
                                                              247, 251


x0 + S           image of the subspace S under                67, 359
                 x ↦ x0 + x, an affine subspace
⟨x, y⟩           inner product of x and y                     773
|x| = ⟨x, x⟩^{1/2}   length of x                              360, 774
x ⊥ y            x and y are perpendicular or                 129, 412, 774
                 orthogonal to each other:
                 ⟨x, y⟩ = 0
V, W (etc.)      vector or linear space over a field          691
V∗               (first) dual space of V                      749
V∗∗ = (V∗)∗      second dual space of V                       751
V/S              quotient space of V modulo the               200, 211, 369
                 subspace S                                   (etc.), 695
INDEX

adjoint linear operator, 422, 754, 781 space, 9, 12, 31, 34, 235, 324, 326,
(also called) dual operator in dual 640
space, 422, 754 one-dimensional, 8, 12
self-adjoint or symmetric operator, two-dimensional, 31, 36
782, 786 three-dimensional, 324, 326
adjoint matrix, 336, 731 n-dimensional, 235
affine difference space, 235
basis (or frame), 9, 10, 19, 31, 34, zero vector, 235, 236
71, 239, 324, 326, 362, 580, 640 base point, 235
base point (origin), 9, 10, 19, position vector (with initial
31, 34, 71, 239, 324, 326, point and terminal
362, 580, 640 point), 235
coordinate vector, 10, 34, 326, free vector, 235
640 difference vector, 236
orthonormal affine basis, 613, affine subspace, 236
625 affine transformation or
natural affine basis for R2 , 239 mapping, 237
natural affine basis for R3 , 581 affine motion, 237, 240
coordinate system, 19, 34, 38, 324, subspace, 67, 236, 359, 425, 640,
326, 328, 362, 640 724
affine coordinate, 19, 72, 239, dimension, 235
363, 641 k-dimensional plane, 640
barycentric coordinate, 19, 72, hyperplane, 641
363, 642 dimension (or intersection)
barycenter, 19, 72, 363 theorem, 644
dependence of points, 32, 71, 239, operation of affine
325, 362, 580, 640 subspaces, 643
geometry, 292, 640 intersection, 644
invariant, 19, 90, 286, 374, sum, 644
636, 638, 679 relative positions of affine
group, 238, 240, 580, 581, 656 subspaces, 645
independence of points, 33, 71, 239, coincident, 645
325, 362, 580, 640 parallel, 645
invariant, 19, 90, 286, 310, 374, 636 skew (etc.), 645


transformation or mapping or hyperbolic (orthogonal)


motion, 16, 18, 47, 67, 73, 83, rotation, 283
237, 250, 326, 338, 364, 366, 580, elliptic (orthogonal) rotation,
581, 655 283
regular or proper or affinity, 285
nonsingular, 251, 655 algebra homomorphism, 744
singular, 251 algebra isomorphism, 119, 406
the fundamental theorem, algebraic rank (of a matrix), 125,
241, 580 367(etc.), 383, 407, 739
matrix representation, 239, annihilator, 213, 214, 421, 752, 757,
580, 581 802
invariant point, augmented matrix, 150, 152, 156,
251(etc.), 591(etc.) 180, 377, 416, 425, 724
subspace of invariant area coordinate, 76
points, oriented triangle, 75
251(etc.), 591(etc.) signed area, 75
invariant affine subspace, coordinate or base triangle, 76
251(etc.), 591(etc.) base point, 76
decomposition of affine homogeneous area coordinate, 76
transformation, 241, 580 barycentric coordinate, 76
translation, 67, 247, 251, 591 normalized barycentric
reflection (skew), 251, 591 coordinate, 76
orthogonal reflection, 253, ideal or infinite point, 76
274, 592, 624 affine coordinate system, 76
symmetric motion, 253, 592 base point, 76
one-way stretch, stretching, vertex, 76
255, 596, 597 base triangle, 76
orthogonal one-way stretch, orthogonal or Descartesian or
256, 597 rectangular coordinate system,
two-way stretch, 261, 604 77
orthogonal two-way stretch, signed distance, 78
261
enlargement (scale factor), basis, 9, 10, 31, 34, 71, 324, 326, 330,
262, 605 336, 697
(three-way) stretch, 611 basis vectors, 9, 10, 31, 34, 71, 324,
orthogonal stretch, 611 326, 697
enlargement, 611, 656 ordered basis, 9, 10, 34, 324, 326,
similarity, 611, 656 331, 698
shearing, 265, 266, 611, 612, natural or standard (orthonormal)
613 basis for R2 , 38, 55
skew shearing, 282, 635 natural or standard (orthonormal)
rotation 271, 272, 620 basis for R3 , 328, 335, 366
orthogonal reflection, 274, natural or standard basis for Fn ,
461, 624 693, 697
orthogonal projection, 460, changes of bases (coordinates), 16,
462, 468, 624, 627, 635 46, 117, 337, 406, 735

Steinitz’s replacement theorem, 697 rational canonical form (of a


barycentric coordinate square matrix), 233, 390, 398,
for R (a line), 19 399, 560, 562, 574, 809, 812(etc.)
barycenter (middle point of canonical form of 3 × 3 real
base segment), 19 matrices under similarity, 399
for R2 (a plane), 72 canonical or standard form of
barycenter (of base triangle), quadratic curves, 303
72 canonical or standard form of
base triangle, 72 quadrics: Euclidean, affine,
affine coordinate, 72 projective, 670, 675, 679
for R3 (a space) 363, 642 Cayley–Hamilton theorem, 108, 209,
affine coordinate, 363 213, 221, 399, 402, 477, 792
base tetrahedron, 363 Cauchy–Schwarz inequality, 774
barycenter (of base Ceva theorem, 291, 296, 651
tetrahedron), 363, 642 change of coordinates, 16, 46, 337
coordinate axis, 363 formula, 16, 46, 337
transition matrix or change of
coordinate plane, 363
coordinates matrix, 16, 46, 337,
Bessel inequality, 779
735
bilinear form, 304, 756
characteristic
block or partitioned matrix, 180
of a field, 686
submatrix, 134, 180
polynomial (of a linear operator or
of the same type, 181 square matrix), 106, 193, 212,
addition, 181 384(etc.), 407, 491, 719, 791, 795
scalar multiplication, 181 equation, 106, 384(etc.), 719, 791
product, 181 root (eigenvalue), 105, 384(etc.),
transpose, 181 407, 719, 790
conjugate transpose, 181 vector (eigenvector), 105, 384(etc.),
pseudo-diagonal, lower or upper 407, 719, 790
triangular, 181 co-dimension, 747
Schur’s formula, 182 coefficient matrix, 56, 149(etc.), 152,
rank decomposition theorem (of a 156, 250, 342, 377, 416, 425,
matrix), 184 442(etc.)
Frobenius inequality (for ranks), column
186 rank (of a matrix), 125, 366,
Sylvester inequality (for ranks), 367(etc.), 383, 407
186 space, 125, 407, 757
vector, 125, 367(etc.), 375, 703
canonical form (of a matrix), 559 complete quadrilateral, 299
diagonal canonical form (of a signed ratio, 299
square matrix), 193, 198, 386, harmonic points, 299, 300
399, 479, 718, 797 cross ratio, 299, 300, 661
Jordan canonical form (of a square congruence of matrix, 166, 454
matrix), 222, 224, 393, 395, 399, Sylvester’s law of inertia (of
516, 541, 543, 547–552, 799, symmetric bilinear form or
805(etc.) symmetric matrix), 166, 454

conic, 300, 306 singular value decomposition


central, 306 theorem (SVD), 178, 771
noncentral, 306 Desargues theorem, 296
center, 306 determinant, 727(etc.)
nondegenerated, 306 expansion along a row or a column,
degenerated, 306 728
conjugate (matrix) of a matrix, 702 properties, 728
conjugate transpose, 702, 714 subdeterminant, 730
convex set, 653 principal, 730
convex hull or closure, 653 cofactor, 728, 730
convex cell, 653 minor, 728
coordinate system (also called affine Laplace expansion, 730
basis), 9, 10, 31, 34, 46, 324 adjoint matrix of a square matrix,
origin, 9, 31, 324 731
zero vector, 9, 31, 324 inverse matrix of an invertible
basis (vector), 9, 31, 324 matrix, 731
(associated) vector space, 9, 10, 31, Cramer’s rule, 88, 732
34, 324, 326 real determinant of order 2, 47, 55,
Cartesian or rectangular or natural 106, 121
coordinate system, 38, 328, 335 real determinant of order 3, 336,
affine or oblique coordinate system, 337
38, 328 cofactor, 336
coordinate vector (of a point or a minor, 336
vector w. r. t. a basis) expansion along a row or a
in R, 10 column, 336
in R2 (first, second component), 34 adjoint matrix, 336, 731
in R3 (first, second, third diagonalizable linear operator or
component), 326 matrix, 117, 193, 198, 212, 385,
coordinatized space (w. r. t. an affine 399, 479, 718, 794
basis) eigenvalue or characteristic value
of a straight line, 11 (root), 105, 106, 190, 193, 198,
of a plane, 35 385(etc.), 477, 719, 790
of a (three-dimensional) space, 326 eigenvector or characteristic vector,
cross ratio (of four points), 300, 661 105, 106, 121, 190, 193, 198,
385(etc.), 477, 719, 790
Darboux theorem, 291 eigenspace, 193, 198, 478, 791
decomposition of matrices characteristic polynomial
LU; LDU; LDL∗ ; LPU; LS, 161, (equation), 106, 121, 193, 198,
162, 167, 442(etc.) 216, 407, 437(etc.), 719, 791, 795
lower and upper triangular generalized eigenspace, 210, 224,
decomposition (of a square 516, 794, 801
matrix), 161 generalized eigenvector, 224, 516,
rank decomposition theorem (of a 794
matrix), 184 minimal polynomial, 199, 209, 214,
polar decomposition, 772 479, 794, 795

criteria for diagonalizability, 198, dual space, 420, 749


212, 479, 718, 796 second, 421, 751
algebraic multiplicity, 199,
479, 797 echelon matrix, 164, 443(etc.), 721
geometric multiplicity, 199, leading entry of a row (of a
479, 797 matrix), 721
diagonal canonical form or pivot: 162, 164, 721
decomposition (of a square pivot column (or row), 164,
matrix or linear operator), 199, 721
479, 797 pivot matrix, 162
simultaneously diagonalizable, 209, pivot variable, 721
213 free variable, 721
differential equation, 505 row-reduced echelon matrix, 164,
homogeneous, 505 443(etc.), 721
nonhomogeneous, 508, 552 eigenspace, 193, 210, 224, 478, 479,
linear ODE with constant 516, 791, 794
coefficients, 509 eigenvalue (characteristic value, root),
dimension (of a vector space), 12, 36, 105, 190, 193, 224, 384(etc.), 477,
43, 327, 330, 698 479, 516, 719, 794
finite-dimensional, 697 eigenvector (characteristic vector),
infinite-dimensional, 697 105, 190, 193, 224, 384(etc.), 477,
n-dimensional, 698 479, 516, 719, 791
dimension theorem, 88, 366, 734, elementary divisor, 804
738 elementary matrix, 150, 442, 720
elementary matrix factorization (of
direct sum
an invertible matrix), 161,
of subspaces, 33, 41, 330, 694, 746
444(etc.), 726
of restriction operators, 210, 211
elementary column or row operation,
directed (line) segment, 6, 7, 65, 356
149, 150, 442, 719
signed length, 6
type 1, 149, 150, 442, 719, 720
positive direction, 6
type 2, 149, 150, 442, 719, 720
negative direction, 6 type 3, 149, 150, 442, 719, 720
length, 6 elementary symmetric function (of n
coordinate or parametric variables), 491
representation, 18, 61, 62, 356 elimination method, 148, 442(etc.),
initial point, 18, 65, 356 719(etc.)
terminal point, 18, 65, 356 ellipse, 303, 304, 305
end point, 18, 65, 356 imaginary ellipse, 303, 305
interior point, 18, 65, 356 ellipsoid, 670
exterior point, 18, 65, 356 imaginary ellipsoid, 670
middle point, 19, 65, 356 point ellipsoid, 670
discriminant elliptic cone, 670
of a quadratic equation, 689 elliptic paraboloid, 670
of a quadratic curve, 306 elliptic cylinder, 671
of a quadric, 679 imaginary elliptic cylinder, 671
dual basis, 420, 750 elliptic rotation, 312, 313

transitive one-parameter subgroup generalized or pseudo-inverse (of a


of Ga (2; R), 313 matrix), 176, 419, 429, 642,
enlargement: scale, invariant point, 767(etc.)
95, 247, 262, 385, 611 row space (of a matrix Am×n ),
124, 407, 757
f -cyclic subspace generated by a left kernel, 124, 407, 757
vector 
x , 209, 210, 802 column space, 124, 407, 757
f -invariant subspace, 210 right kernel, 124, 407, 757
field, 684 of A2×3 and Am×n (m ≤ n and
rational (Q), 686 r(A) = m), 419, 456, 462,
real (R), 685 763(etc.), 765(etc.)
complex (C), 685 right invertible, 457, 760
finite field (Ip ), 686 right inverse, 457, 760
infinite field, 686 of A3×2 and Am×n (m ≥ n and
characteristic, 686 r(A) = n), 429, 462, 469,
finite-dimensional vector space, 697 763(etc.), 765(etc.)
free variable, 721 left invertible, 463, 760
function, 682 left inverse, 464, 760
image, preimage, 682
of Am×n (with
domain, range, 682
r(A) = r ≤ min(m, n)), 419, 429,
equal, 682
767, 768
onto, 682
least square problem, 419, 429, 765
one-to-one, 682
optimal solution, 419, 429, 767
restriction, 683
geometric interpretation of a
composite, 683
determinant of order 3, 638
identity, 683
Gersgörin’s disk theorem (for
invertible, 684
eigenvalues), 490
left inverse, 683
right inverse, 683 Gram–Schmidt orthogonalization
inverse, 684 process, 776
group, 686
generalized eigenspace, 210, 224, 479, abelian, commutative, 687
514, 515, 516, 794, 801 subgroup, 245, 687
root space, 801 group isomorphism, 120
cycle (generated by a vector), 209, real general linear group on Rn ,
210, 801 120, 406, 737
order (of a vector), 801 affine group or group of affine
generalized eigenvector, 514, 515, motions on Rn , 238, 240, 580,
516, 794 581, 656
cyclic invariant subspace, 802 projective group or group of
A-cycle (generated by a vector), projective transformations on
210, 570, 802 P 3 (R) n = 2, 3, 660
annihilator, A-annihilator, 213, general linear group on Fn , 737
214, 570, 802
minimal polynomial of a vector  x half-space, 652
related to An×n , 802 open, 653

closed, 653 orthogonal projection, 418,


side (of a point w. r. t. a subspace), 420
653 solution space, 417
half line or ray, 653 solution affine subspace, 417
segment, 653 distance, 418
homogeneous coordinate (of a point generalized or pseudo-inverse,
w. r. t. a basis), 306, 311 419
representative vector, 658 two equation in three unknowns:
homogeneous linear equation, 87, 124, 424(etc.)
377(etc.), 415(etc.), 424(etc.), coefficient matrix, 425
723(etc.) augmented matrix, 425
general theory: 723 solution space, 425, 426
consistent, 723 particular solution, 425, 426
fundamental solution, 723 solution affine subspace, 425,
general solution, 723 426
solution space, 724 distance, 426
hyperbola, 303, 304, 305 orthogonal projection, 428,
hyperbolic rotation, 313 429
transitive one-parameter subgroup generalized or pseudo-inverse,
of Ga (2; R), 313 429
hyperboloid, 670 general theory, 724, 759(etc.)
of two-sheets, 670 augmented matrix, 724
of one-sheet, 456, 670 particular solution, 724
hyperbolic paraboloid, 671 solution affine subspace, 724
hyperbolic cylinder, 671 connection with generalized inverse
hyperplane, 378 of a matrix, 765(etc.)
at infinity, 657 inner product space, 169, 773
hypersubspace, 378, 751 inner product
(abstract) general definition,
idempotent (linear) operator, matrix, 773
111, 707, 740 positive definite, 773
identitical vectors, 7, 25 conjugate symmetric
identity matrix, 701 (complex), symmetric,
indefinite matrix, 789 773
induced quotient operator, 211 linear, 773
infinite-dimensinal vector space, 697 bilinear, 774
infinite or ideal point, 76, 315, 657 real inner product space (V, , ) or
infinite line, 315, 657 V , 773
inhomogeneous or nonhomogeneous complex inner product space
linear equation, 88, 150(etc.), (V, , ) or V , 773
377(etc.), 415(etc.), 424(etc.), unitary space, 773
442(etc.), 724(etc.), 759(etc.) norm or length of a vector, 360, 774
three equations in two unknowns: orthogonal or perpendicular (of
415(etc.) two vectors), 774
coefficient matrix, 416 Cauchy–Schwarz inequality, 774
augmented matrix, 416 triangle inequality, 774

angle (between two nonzero skew-Hermitian operator


vectors), 774 (matrix), 782
signed orthogonal projection (as a orthogonal operator (matrix),
quantity), 774 782
orthogonal projection (of a vector symmetric (or self-adjoint)
on another), 775 operator (matrix), 782
Gram–Schmidt orthogonalization skew-symmetric operator
process, 776 (matrix), 782
orthogonal set or basis, 775 invariant
orthonormal basis, 775 affine, 19, 90, 286, 374, 636, 638,
standard or natural 679
orthonormal basis for Rn or projective, 300, 317, 661
Cn , 775 invariant subspace, 85, 188, 193, 210,
orthogonal complement (of a set), 591, 748
778 line, 188, 591
orthogonal or perpendicular (of point, 90, 591
two sets of vectors), 778
inverse matrix, 47, 55, 336, 338, 711,
matrix representation of an inner
731
product, 776
inverse vector, 9, 13, 26, 692
Hermitian (positive-definite),
invertible matrix, 47, 55, 336, 338,
776
711, 731
symmetric (positive-difinite),
776 left invertible (inverse), 142, 423,
760(etc.)
congruent, 776
right invertible (inverse), 142, 423,
orthogonal decomposition (of an
760(etc.)
inner product space), 778
orthogonal projection vector, invertible (linear) operator or
778 transformation, 16, 35, 47, 84,
336, 338, 732
orthogonal vector, 778
Pythagorean theorem, 778 invariance under similarity
optimal approximation determinant (of a linear operator),
inequality, 779 121, 407
Bessel inequality, 779 characteristic polynomial, 121, 407
orthogonal projection (operator), eigenvalue, 121, 407
779 trace, 121, 407
Riesz representation theorem, 780 rank, 121, 407
conjugate linear, 781 involutory operator or matrix, 112,
self-dual inner product space, 544, 708, 741
781 reflection, 544
adjoint (or conjugate) operator of a irreducible quadratic curve, 306
linear operator, 781
normal operator (matrix), 782 Jordan canonical form, 222, 224, 516,
unitary operator (matrix), 782 541, 549, 799(etc.), 805, 806
Hermitian or self-adjoint Jordan canonical basis, 222, 224,
operator (matrix), 782 516, 528, 529, 549, 806

Jordan block, 520, 525, 546, 800, direction, 61, 346


806 parametric equation w. r. t. a
exponential of, 521, 534, 546 basis, 61, 346
coordinate equation w. r. t. a
kernel (of a linear transformation), basis, 61, 346
56, 85, 124, 366, 383, 407, 734 relative positions of two lines,
left kernel, 124, 407, 757 63, 349
right kernel, 124, 407, 757 coincident, 63, 349
parallel, 63, 349
Lagrange interpolation theorem intersecting, 63, 349
(formula), 441, 699 skew (in R3 ), 349
polynomial, 441, 699 relative positions of a line and
Laplace expansion (of determinent), a plane, 354
730 coincident, 354
LDL∗ -decomposition (of matrix), 162, parallel, 354
442(etc.) intersecting, 354
LDU-decomposition (of matrix), 162, linear combination (of vectors)
442(etc.) in R1 , 8, 9
left invertible matrix, 142, 423, 463, in R2 , 31
760 in R3 , 324
left inverse (of a matrix), 142, 423, in general vector space, 695
464, 761 coefficients (scalars), 8, 9, 31, 324,
left kernel (or nullspace), 124, 407, 695
757 linear dependence (of vectors)
length of a vector, 360, 774 in R1 , 9
Lie algebra, 146 in R2 , 32, 33, 40
product vector (of two vectors), in R3 , 324
146 in general vector space, 696
bilinearity, 146 linear equation, 723, 724
Jacobi identity, 146 consistent equation, 723
Lie product or bracket operation fundamental solution, 723
(of two linear operators), 146 general solution, 723
limit matrix, 451 solution space, 723
line particular solution, 724
in affine or vector space R: solution affine subspace, 724
coordinate representation linear functional
w. r. t. an affine basis, 18 general theory in vector space over
in affine or vector spaces R2 and a field: 84, 378(etc.), 420(etc.),
R3 732, 749(etc.)
coordinate equations of Lagrange multiplier’s method,
coordinate axes in an affine 749
basis (coordinate system), dual space, 420, 749
60, 346, 363 dual basis, 420, 750
parametric equation in vector hyperplane (or
form, 61, 346 hypersubspace), 378, 751
point, 61, 346 second dual basis, 421, 751

natural linear isomorphism, unitary operator (matrix), 782


421, 752 Hermitian operator (matrix),
annihilator, 421, 752 782
symmetric bilinear, 756 skew-Hermitian operator
adjoint or dual operator, 422, (matrix), 782
754 orthogonal operator (matrix),
in inner product space: 780 782
Riesz representation theorem, symmetric or self-adjoint
780 operator (matrix), 782
adjoint operator, 781 skew-symmetric operator
linear independence (of vectors) (matrix), 782
in R1 , 10 canonical form
in R2 , 33, 41 diagonal, 193, 198, 385, 479,
in R3 , 325, 336 796(etc.)
in general vector space, 696 Jordan, 224, 393, 394, 395,
linear isomorphism, 11, 18, 35, 58, 84, 399, 516, 799(etc.), 805
117, 136, 326(etc.), 373, 732, 734 rational, 233, 390, 398, 399,
natural, 421, 752 562, 809(etc.), 812
linear operator, 84, 89, 366, 406, 435, geometric mapping properties, 90,
732, 781 286, 374
invertible (see linear isomorphism), classification of linear operators on
84, 89, 117, 144 R 2 , R3
in general vector space enlargement, 95, 385
iterative linear operator, 120, one-way stretch, 385
141, 438, 737 two-way stretch, 95, 385
adjoint or dual operator, 422, three-way stretch, 385
754 reflection, 96, 387
differential operator, 440 shearing, 100, 392(etc.)
integral operator, 440 invariant
idempotent operator, 110, determinant, 121, 407
707, 740 characteristic polynomial, 121,
involutory operator, 112, 544, 407
708, 741 eigenvalue, 121, 407
nilpotent operator, 110, 215, trace, 122, 407
537, 539, 708, 803 rank, 124, 407
power, index, 215, 537, 539,
708, 803 linear space (see vector space)
Jordan canonical form, 539, linear or vector subspace, simply
542, 803 called subspace, 40, 85, 330, 694,
invariant system, 540, 541 695
square root, 543 proper, 41
matrix and its transpose, 756(etc.) generated or spanned by a set of
in inner product vectors, 41, 330, 695
space (etc.), 781(etc.) sum subspace, 41, 330, 694
adjoint operator, 781 intersection subspace, 330, 694
normal operator (matrix), 782 direct sum subspace, 41, 330, 694

zero subspace, 41, 694 LS-decomposition, 167


invariant subspace, 85, 748 LU-decomposition, 161, 167, 442(etc.)
kernel or null subspace, 85
image or range subspace, 85 Markov (or stochastic) process (or
zero subspace, 85 chain), 487, 501
(whole) space, 85 probability vector, 487, 501
linear transformation (mapping), 11, regular stochastic matrix, 487, 501
16, 18, 35, 47, 56, 57, 58, 84, 87, transition (or stochastic or
116, 136, 326, 338, 343, 366, 406, Markov) matrix, 501
435, 732 positive matrix, 501
kernel or null subspace, 56, 85, 343, positive stochastic matrix, 501, 504
366, 734 nonnegative matrix, 502
image or range subspace, 56, 85, limiting probability vector, 503
343, 366, 734 stability or steady-state vector, 503
nullity, 89, 366, 734 matrix, 37, 46, 47, 54, 73, 86(etc.),
rank, 89, 366, 407, 734 336(etc.), 342, 343, 364,
linear isomorphism, 11, 16, 35, 47, 366(etc.), 399(etc.), 699(etc.)
84, 326, 338, 373, 732, 734 notations, 699
right invertible (inverse), 141 entry (element), 699
left invertible (inverse), 142 ith row, 699
linear operator, 84, 732 jth column, 700
restriction operator, 89, 210 row matrix (vector), 700, 703
direct sum, 211 column matrix (vector), 700, 703
linear functional, 84, 345, 420, 732 equal, 700
decomposable, 749 zero matrix, 700
projection (operator, transpose, 700
transformation), 136, 435 square matrix, 700
projectionalization (of a linear (main) diagonal entries, 700
transformation), 137, 436, diagonal matrix diag[a1 , . . . , an ],
746 701
natural projection, 211, 747 identity matrix In (of order n), 701
matrix representation w. r. t. scalar matrix αIn , 701
bases, 57, 58, 59, 86, 99(etc.), upper triangular matrix, 161, 701
115, 172, 198, 211, 366, 369, 372, lower triangular matrix, 161, 701
374, 407, 413, 580, 735 triangularizable, 547
change of coordinates matrix, decomposition as the sum of a
47, 337, 735 diagonalizable matrix and a
transition matrix, 47, 337, 735 nilpotent matrix, 549, 550
diagonalizable, 117, 193, 198, block or partitioned matrix (see
407, 794 the index), 180
rank theorem (of a linear operations and their properties:
transformation or matrix), 140, 702, 703
436 addition (sum), 702
normal form (of a linear scalar multiplication, 702
transformation or matrix), 140, multiplication, 705
164, 437, 457(etc.) power, 707

conjugate, 702 using determinant and


conjugate transpose, 702 adjoint matrix, 55, 336,
some peculiar matrices 731
symmetric, 700, 710, 782, 786 using Cayley–Hamilton
skew-symmetric, 497, 701, theorem (eigenvalues),
710, 782 108, 399, 793
Hermitian, 701, 782, 786 expression as a product of
skew-Hermitian, 701, 782, 786 finitely many elementary
orthogonal, 712, 782, 783, 786 matrices, 161(etc.),
unitary, 782, 783, 786 442(etc.), 726
normal, 782, 785(etc.) singular (square) matrix, 712
idempotent, 111, 129, 405, 707 generalized or pseudo inverse (see
nilpotent, 110, 215, 404, 522, the index)
537, 539, 708, 803 inner product for m × n real or
degree or index or power, complex matrices, 714
215, 537, 539 similar, 117, 132, 716
involutory, 112, 129, 405, 544, equivalent relation, 716
707 necessary condition for a
adjoint matrix, 336, 731 matrix similar to a diagonal
permutation matrix, 160, 767 matrix, 198, 718
companion matrix, 572, 574, criteria of similarity of real
796 matrices of order 3, 517
elementary matrix (see the necessary and sufficient
index) conditions for similarity,
matrix equation, 712 198, 808
rank, 89, 125, 366, 383, 407, 711, trace, 107, 122, 713, 714
722, 734, 739 matrix representation (of linear
geometric rank: 125, 367(etc.), transformation), 57, 58, 59, 86,
383, 407, 738 116, 366(etc.), 399, 406, 735
row rank, 125, 367(etc.), norm (of a matrix), 490, 498, 546,
383, 407, 711, 738, 757 557
column rank, 125, 367(etc.), square root (of a matrix), 522, 543
383, 407, 711, 738, 757 acting as a linear transformation,
algebraic rank, 125, 368(etc.), 86, 737
383, 407, 739 normal or canonical form (of a
equalities of ranks, 125, matrix), 140, 164, 369, 372, 374,
367(etc.), 383, 407, 711, 739 399, 437, 438, 559, 723
invertible or nonsingular (square) tensor product (of two matrices),
matrix, 47, 55, 336, 338, 339, 711 148
inverse matrix, 47, 55, 336, matrix exponential, 496(etc.), 551,
338, 339, 711 558
computation of inverse spectral radius, 496
matrix, 519 power series of a matrix, 499
using elementary row matrix sine, 499
operations, 161(etc.), matrix cosine, 499
442(etc.), 726 matrix logarithm, 500

matrix polynomial, 551 orthogonally similar, 717, 784


matrix power, 550, 552 Hermitian, symmetric, 782, 786
matrix mth root, 552 positive, negative definite,
Menelaus theorem, 291, 292, 648 786, 789
minimal polynomial, 199, 209, 214, positive, negative semidefinite,
224, 479(etc.), 516, 794, 795 786, 789
multiplicity, 199, 797 indefinite, 789
algebraic, 199, 480, 516, 797 skew-Hermitian, skew-symmetric,
geometric, 199, 480, 516, 797 782
spectral decomposition theorem,
natural or standard basis (see basis) 786
natural coordinate system (see normalized barycentric coordinate
coordinate system) (see also barycentric coordinate),
natural isomorphism, 752 76
natural or standard inner product null space (see also kernel), 89, 366,
(see inner product) 734
natural projection, 211, 747 nullity, 89, 366, 734
natural projective basis, 659
vertex, 659 oblique (or affine) coordinate system,
unit, 659 38, 328
projective coordinate, 659 one-dimensional
n-dimensional vector space, 698 vector space, 12
negative definite matrix, 789
standard, 12
negative semidefinite matrix, 789
affine space, 12
negatively-oriented or left-handed on
one-way stretch, 255, 385, 597
R2 , 246
orientation
Newton identity, 494
on R1 , 6
nilpotent operator or matrix (see
linear operator and matrix) on R2 , 22, 90, 246
noncentral same orientation, 246
conics, 306 opposite orientation, 246
quadrics, 677 positively-oriented or
nondegenerated right-handed, 246
conics, 306 negatively-oriented or
quadrics, 676 left-handed, 246
quadratic surface, 676 oriented, 246
nonhomogeneous linear equation (see anti (or counter) clockwise,
inhomogeneous linear equation) 22, 90, 246
nonsingular matrix (see matrix) clockwise, 22, 90, 246
norm or length of a vector (see inner on R3 , 320
product) right-handed or anticlockwise,
normal form of a matrix, 140, 164, 320
385, 437, 438, 457, 459, 464, 723 left-handed or clockwise, 320
normal operator or matrix (see also orthogonal or perpendicular (of two
linear operator), 782, 785(etc.) vectors), 129, 169, 774
unitarily similar, 785, 786 orthogonal

complement (of a set in an inner 2-parallelogram, i.e. a


product space), 129, 412, 778 parallelogram usually called in
decomposition theorem (of an R2 , 68
inner product space), 778 3-parallelogram, i.e. a
matrix, 266, 272, 313, 461, 613, parallelepiped usually called in
625, 783 R3 , 320, 375, 605(etc.), 638, 667
of order 2, 169, 266, 272, 313 k-parallelogram or
of order 3, 461, 466, 613, 625 k-hyperparallelepiped in Rn
of order n, 783 (0 ≤ k ≤ n), 638, 667
side vectors, 638, 667
characteristic properties,
783 degenerated, 638
parallelotope, 661
rigid motion or isometry,
particular solution (see
784
inhomogeneous linear equation)
orthogonally similar, 786
Pasch axiom, 295
operator, 782, 783 permutation matrix (see matrix), 160,
group of, 783 767
projection, 176, 418, 420, 428, 429, perpendicular or orthogonal of two
460, 462, 468, 469, 624, 764, 779, vectors (see orthogonal or
characterizations, 779 perpendicular)
reflection, 274, 460, 461, 468, 469, pivot, 162, 721
592, 625 matrix, 162
stretch, 261, 597 plane, 351
set or system, 775, 778 in affine or vector space R3 : 351
orthonormal basis, 176, 461, 613, 621, coordinate equations of
625, 775, 776 coordinate planes in an
parabola, 303, 304, 305 affine base (coordinate
parabolic translation, 314 system), 352, 363
transitive one-parameter subgroup passing point or points, 351,
of Ga (2; R), 314 352
paraboloid, 670 directions, 351, 352
elliptic paraboloid, 670 parametric equation in vector
form, 352
hyperbolic paraboloid, 671
parametric equation w. r. t. a
parabolic cylinder, 671
basis, 352
parallel coordinate equation w. r. t. a
lines in affine plane or space, 63, basis, 352
303, 304, 349, 645 relative positions of a line and
two imaginary parallel lines, a plane (see line), 354
303, 304, 305 relative positions of two
planes in affine space, 355, 645, 671 planes, 355, 645
two imaginary parallel planes, coincident, 355, 645
671 parallel, 355, 645
parallel law or invariance (of vectors), intersecting, 355, 645
7, 25 polar decomposition (of a matrix),
parallelogram 772

polyhedron, 654
polynomial, 687
  degree, 687
  equal, 687
  polynomial function, 687
  division algorithm, 688
  irreducible, 688
  relatively prime, 688
  unique factorization theorem, 689
  vector space of polynomials over a field, 688, 693
  vector space of polynomials of degree not greater than n, 688, 698
positive
  matrix, 491, 502, 504
  definite matrix, 786, 789
  semidefinite matrix, 786, 789
positively-oriented or right-handed on R2, 246
projection (operator), 136, 193, 200, 435, 480, 746
projectionalization of a linear operator, 137, 436
projective
  line P1(R), 661
  plane P2(R), 315, 661
    infinite point or point at infinity or ideal point, 315
    infinite line, 315
  space P3(R) or Pn(R), 657, 658
    point at infinity or ideal point, 657, 658
    (k − 1)-dimensional subspace at infinity, 1 ≤ k ≤ n, 657
    hyperplane at infinity, 657
    homogeneous coordinate (w. r. t. a basis), 658
    representative vector, 658
    ideal or infinite point, 658
    ordinary or affine point, 658
    affine or inhomogeneous coordinate (of an affine point), 658
    natural projective basis, 659
      vertex, 659
      unit, 659
    projective coordinate, 659
    projective transformation, 659
      matrix representation, 659
      group of, 660
Pythagorean theorem (or formula), 22, 778
quadratic form, 304, 455
  as sum of complete square terms, 456
quadratic curve, 300
  conics, 300, 306
    degenerated; nondegenerated, 306
    reducible; irreducible, 306
    central; noncentral, 306
  classification in affine geometry, 309
    affinely equivalent, 307
    affine invariants, 310
    canonical or standard forms (types), 303
  classification in projective geometry, 317
  classification in Euclidean geometry, 303, 305
  center, 306
quadric or quadric surface in R3, 456, 668
  canonical or standard forms, 670
  in affine geometry (space R3), 675
    affine invariants, 679
  in projective geometry (space P3(R)), 679
  in Euclidean geometry (space R3), 671
  center, 676
  central quadric, 677
  noncentral quadric, 677
  degenerated, 676
  nondegenerated or proper, 676
  regular point; singular point; double point, 677
  regular surface, 677
quotient space, 200, 211, 250, 372(etc.), 695
range (subspace, space), 56, 85, 734
rank, 89, 124, 366, 368(etc.), 383, 406, 711, 722, 734, 739
rank theorem for matrices or linear operators (see also normal form of a matrix), 140, 164, 436, 438
rank decomposition theorem of a matrix, 184
rational canonical form (see canonical form), 233, 399, 562, 569, 574, 809(etc.)
  A-cycle (of a matrix A), 570, 802
  annihilator, 214, 570, 802
  companion matrix, 571, 574, 796, 802
  rational canonical basis, 233, 562, 574, 813
real field, 12, 685
real general linear group on Rn (see group)
real inner product space (see inner product), 773
real vector space, 29, 692
rectangular coordinate system (see coordinate system), 38, 328
reflection (see affine, transformation), 253, 274, 387, 461, 468, 544, 592
Riesz representation theorem (see inner product space), 780
right inverse, 142, 175, 423, 457, 760
right invertible, 142, 423, 457, 760
right kernel, 124, 407, 757
rotation (see orthogonal, transformation), 272
row
  rank, 125, 367(etc.), 383, 407, 711, 739
  space, 125, 407, 757
  vector, 125, 367(etc.), 375, 703
row-reduced echelon matrix, 164, 443(etc.), 464, 721
  implications and applications, 722–727
  concrete examples, 150(etc.), 442(etc.)
Schur's formula (see block matrix), 182
set, 681
  member; element, 681
  subset; proper subset, 681
  empty set, 681
  union, 681
  intersection, 681
  difference, 681
  Cartesian product, 681
shearing (see affine, transformation), 100, 265, 392, 612
signed length (of a segment), 6
similar matrix (see matrix, similar)
simplex, 642, 654, 662
  0-simplex, i.e. a point, 5
  1-simplex, i.e. a line segment, 5, 18
  2-simplex, usually called a triangle in R2 (see triangle), 65
  3-simplex, usually called a tetrahedron in R3 (see tetrahedron)
  k-simplex or k-tetrahedron (in Rn), 590, 642, 654, 662
  vertex, 642
  edge, 642
  face, 642
  open simplex, 654
  boundary, interior, exterior, 662
  separation, 663
  barycentric subdivision, 666
  generalized Euler formula, 667
  k-dimensional volume, 641
simultaneously
  diagonalizable, 209, 213
  lower or upper triangularizable, 215
singular matrix (see matrix)
singular value (of a matrix), 177, 771
  decomposition theorem (SVD), 178, 771
  polar decomposition (see the index), 772
  generalized inverse (see the index), 176, 766(etc.)
standard
  zero-dimensional vector space {0}, 694, 698
  one-dimensional vector space R, 12
  two-dimensional vector space R2, 37
  three-dimensional vector space R3, 327
  n-dimensional vector space Rn or Cn or Fn, 693, 698
  basis or natural basis (see basis)
  inner product or natural inner product for Rn or Cn (see inner product), 773
stretch or stretching (see affine, transformation)
Sylvester's law of inertia (of a symmetric matrix), 166, 454, 467, 470, 471
  index, 166, 454, 467, 470, 471
  signature, 166, 454, 467, 470, 471
  rank, 166, 454, 467, 470, 471
tensor product (of two matrices), 148
tetrahedron in R3 (see simplex), 356, 363, 401, 638, 642
  vertex, 356
  edge, 356
  face, 356
  median plane, 356
triangle in R2 (see simplex), 65
  vertex, 65
  side, 65
  median, 65
  centroid, 65
transition matrix
  between bases (see change of coordinates)
  in Markov chain (see Markov chain)
translation (see affine, transformation)
transpose (see matrix)
unitary matrix (see inner product space, adjoint operator), 782, 783, 786
  unitarily similar, 786
upper triangular matrix (see matrix), 161, 701
Vandermonde determinant, 750
vector
  line vector, 7
    zero vector, 7
    identical vectors, 7
    parallel invariance of vectors, 7
    scalar multiplication (product), 8, 9
    linear dependence, 9
    linear independence, 1
  plane vector, 24
    zero vector, 24, 27
    identical vectors, 25
    parallelogram law or parallel invariance, 25
    negative of a vector, 26
    addition (sum) of vectors, 26
    scalar multiplication, 27
    subtraction vector, 27
    scalar, 28
    linear combination: coefficient, 31
    linear dependence, 32, 40
    linear independence, 33, 41
  spatial vector, 29, 320
  in abstract sense, 692
    addition, 692
      sum, 692
    scalar multiplication, 692
      scalar, 684, 692
      product, 692
    zero vector, 692
    negative or inverse of a vector, 692
    linear combination: coefficient, 695
    linear dependence, 696
    linear independence, 696
  positive vector, 503
  nonnegative vector, 503
vector or linear space
  one-dimensional real vector space R, 8, 12, 13, 41
  two-dimensional real vector space R2, 28, 31, 34, 37, 41
  three-dimensional real vector space R3, 324, 326, 327
  in abstract sense, 691(etc.)
    subspace (see linear or vector subspace), 694
    quotient space, 695
    Steinitz's replacement theorem, 697
    basis, basis vectors (see basis), 697
    ordered basis, 698
    dimension, 698
    finite-dimensional, n-dimensional, 697, 698
    infinite-dimensional, 697
    complex vector space, 29, 692
    real vector space, 29, 692
vectorized space or vectorization
  of a line, 9
    scalar product, 9
    position vector, 7, 8
    base vector, 9
    zero vector, 9
  of a plane, 30, 31
    addition, 30
    scalar multiplication, 30
    position vector, 30
    basis vector, 31
    zero vector, 31
  of a space, 323
    position vector, 324
    origin, 324
    zero vector, 324
    basis, basis vector, 324