Lecture Notes
These are lecture notes for Math 290-1, the first quarter of MENU: Linear Algebra and Multivariable Calculus, taught at Northwestern University in the fall of 2013. The book used was the
5th edition of Linear Algebra with Applications by Bretscher. Watch out for typos! Comments and
suggestions are welcome.
Contents
September 25, 2013: Introduction to Linear Systems
longer has an x in it, so we've eliminated a variable. Similarly, multiplying the first equation by -2 gives -2x - 4y - 6z = 0 and adding this to the third gives -8y + 5z = -2, and again we've eliminated x. Now consider the system keeping the first equation the same but replacing the second and third with the new ones obtained:

x + 2y + 3z = 0
     4y +  z = 8
    -8y + 5z = -2.
The point is that this new system has precisely the same solutions as the original one! In other
words, row operations do change the actual equations involved but do not change the set of
solutions.
We can keep going. Now we move down to the 4y terms and decide we want to get rid of the -8y below it. We multiply the second equation by 2 and add the result to the third equation to give 7z = 14. Thus we get the new system

x + 2y + 3z = 0
     4y +  z = 8
          7z = 14.
Now we're in business: the third equation tells us that z = 2, substituting this into the second and solving for y gives y = 3/2, and finally substituting these two values into the first equation and solving for x gives x = -9. Thus this system has only one solution:

x = -9,  y = 3/2,  z = 2.

Again, since this method does not change the solutions of the various systems of equations we use, this is also the only solution of our original system.
Now there are multiple ways we could proceed. First, we could add these two equations together and use the result to replace the first equation, giving:

2x + 8y = -4
 5y -  z = -4.
Compared to our original set of equations, these are simpler to work with. The question now
is: what do we do next? Do we keep trying to eliminate variables, or move on to trying to find
the solution(s)? Note that any further manipulations we do cannot possibly eliminate any more
variables, since such operations will introduce a variable weve already eliminated into one of the
equations. We'll see later how we can precisely tell that this is the best we can do. So, let's move
towards finding solutions.
For now, we actually go back to the equations we had after our first manipulations, namely:

2x + 3y + z = 0
     5y - z = -4.
We could instead try to eliminate the y term in the first equation instead of the z term as we did.
This illustrates a general point: there are often multiple ways of solving these systems, and it would
be good if we had a systematic way of doing so. This is what Gauss-Jordan elimination will do for
us. Here, let's just stick with the above equations.
We will express the values of x and y in terms of z. The second equation gives

y = (z - 4)/5.

Plugging this in for y in the first equation and solving for x gives:

x = (-3y - z)/2 = (-3(z - 4)/5 - z)/2 = (12 - 8z)/10.
These equations we've derived imply that our system in fact has infinitely many solutions: for any value we assign to z, setting x equal to (12 - 8z)/10 and y equal to (z - 4)/5 gives a triple of numbers (x, y, z) which form a solution of the original system. Since z is free to take on any value, we call it a free variable. Thus we can express the solutions of our system as

x = (12 - 8z)/10,  y = (z - 4)/5,  z free.
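This one-parameter family of solutions can be checked mechanically; a small sketch, using the two equations 2x + 3y + z = 0 and 5y - z = -4 derived above:

```python
from fractions import Fraction

def solution(z):
    # each value of the free variable z determines x and y
    return Fraction(12 - 8 * z, 10), Fraction(z - 4, 5)

for z in range(-5, 6):
    x, y = solution(z)
    assert 2 * x + 3 * y + z == 0    # first equation
    assert 5 * y - z == -4           # second equation
print("all checks passed")
```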
Warm-Up 2. Find the polynomial function of the form f(x) = a + bx + cx^2 satisfying the condition that its graph passes through (1, 1) and (2, 0) and such that the integral of f(x) from 1 to 2 equals -1.

The point of this problem is understanding what it has to do with linear algebra, and the realization that systems of linear equations show up in many places. In particular, this problem boils down to solving a system of three equations in the three unknown variables a, b, and c. The condition that the graph of f(x) pass through (1, 1) means that f(1) should equal 1, and the condition that the graph pass through (2, 0) means that f(2) should equal 0. Writing out what this means, we get:

f(1) = 1 means a + b + c = 1

and

f(2) = 0 means a + 2b + 4c = 0.
Finally, since the integral of a + bx + cx^2 from 1 to 2 is

ax + bx^2/2 + cx^3/3 evaluated from 1 to 2, which equals a + (3/2)b + (7/3)c,

the condition that the integral of f(x) from 1 to 2 equal -1 gives

a + (3/2)b + (7/3)c = -1.
In other words, the unknown coefficients a, b, c we are looking for must satisfy the system of equations:

a +      b +      c = 1
a +     2b +     4c = 0
a + (3/2)b + (7/3)c = -1.

Thus to find the function we want we must solve this system. We'll leave this for now and come back to it in a bit.
Augmented Matrices. From now on we will work with the augmented matrix of a system of equations rather than the equations themselves. The augmented matrix encodes the coefficients of all the variables as well as the numbers to the right of the equals sign. For instance, the augmented matrix of the system in the first Warm-Up is

[ 2  3  1 | 0 ]
[ 1 -1  1 | 2 ].

The first column encodes the x coefficients, the second the y coefficients, and so on. The vertical lines just separate the values which come from coefficients of variables from the values which come from the right side of the equals sign.
Important. Gauss-Jordan elimination takes a matrix and puts it into a specialized form known as
reduced echelon form, using row operations such as multiplying a row by a nonzero number,
swapping rows, and adding rows together. The key point is to eliminate (i.e. turn into a 0) entries
above and below pivots.
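As a sketch of what Gauss-Jordan elimination does, here is a minimal reduced-echelon-form routine; the helper name `rref` and the use of exact fractions are my choices, not notation from the course:

```python
from fractions import Fraction

def rref(matrix):
    """Row-reduce a matrix (a list of rows) to reduced echelon form."""
    m = [[Fraction(entry) for entry in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue                                       # no pivot in this column
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]    # swap it up
        p = m[pivot_row][col]
        m[pivot_row] = [e / p for e in m[pivot_row]]       # make the pivot 1
        for r in range(rows):                              # clear above and below
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return m

print(rref([[2, 3, 1, 0], [1, -1, 1, 2]]))
```

Running it on the augmented matrix of the first Warm-Up produces rows expressing x + (4/5)z = 6/5 and y - (1/5)z = -4/5, matching the free-variable description found earlier.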
Example 1. We consider the system

 x + 2y -  z + 3w = 0
2x + 4y - 2z      = 3
-2x     + 4z - 2w = 1
3x + 2y      + 5w = 1,

which has augmented matrix

[  1  2 -1  3 | 0 ]
[  2  4 -2  0 | 3 ]
[ -2  0  4 -2 | 1 ]
[  3  2  0  5 | 1 ].
The pivot (i.e. first nonzero entry) of the first row is the 1 in the upper-left corner. Our first goal is to turn every entry below this pivot into a 0. We do this using the row operations:

-2I + II → II,   2I + III → III,   and   -3I + IV → IV,

where the roman numerals denote row numbers and something like -3I + IV → IV means multiply the first row by -3, add that to the fourth row, and put the result into the fourth row. These operations produce

[ 1  2 -1  3 | 0 ]
[ 0  0  0 -6 | 3 ]
[ 0  4  2  4 | 1 ]
[ 0 -4  3 -4 | 1 ],

where we have all zeros below the first pivot.
Now we move to the second row. Ideally we want the pivot in the second row to be diagonally down from the pivot in the first row, but in this case it's not: the -6 is further to the right. So, here a row swap is appropriate in order to get the pivot of the second row where we want it. Swapping the second and fourth rows gives

[ 1  2 -1  3 | 0 ]
[ 0 -4  3 -4 | 1 ]
[ 0  4  2  4 | 1 ]
[ 0  0  0 -6 | 3 ].
Our next goal is to get rid of the entries above and below the pivot -4 of the second row. For this we use the row operations:

II + III → III   and   2I + II → I.

This gives

[ 2  0  1  2 | 1 ]
[ 0 -4  3 -4 | 1 ]
[ 0  0  5  0 | 2 ]
[ 0  0  0 -6 | 3 ].
Now onto the third row and getting rid of entries above and below its pivot 5. Note that the point of swapping the second and fourth rows earlier as opposed to the second and third is that now we already have a zero below the 5, so we only have to worry about the entries above the 5. The next set of row operations (5II - 3III → II and 5I - III → I) give

[ 10   0  0  10 |  3 ]
[  0 -20  0 -20 | -1 ]
[  0   0  5   0 |  2 ]
[  0   0  0  -6 |  3 ].
Finally, we move to the final pivot -6 in the last row and make all entries above it zero using the operations

3I + 5IV → I   and   3II - 10IV → II.

This gives

[ 30   0  0  0 |  24 ]
[  0 -60  0  0 | -33 ]
[  0   0  5  0 |   2 ]
[  0   0  0 -6 |   3 ].
As we wanted, all entries above and below pivots are zero. The final step to get to so-called reduced echelon form is to make all pivots one, by dividing each row by the appropriate value. So, we divide the first row by 30, the second by -60, the third by 5, and the fourth by -6 to get:

[ 1 0 0 0 | 24/30 ]
[ 0 1 0 0 | 33/60 ]
[ 0 0 1 0 |   2/5 ]
[ 0 0 0 1 |  -1/2 ].

This matrix is now in reduced echelon form. Looking at the corresponding system of equations, the point is that we've now eliminated all variables but one in each equation. Right away, writing down this corresponding system we get that

x = 24/30,  y = 33/60,  z = 2/5,  w = -1/2

is the only solution of our original system.
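It is worth double-checking the answer against the original equations; a quick verification with exact fractions (the equations in the comments are the ones the elimination started from):

```python
from fractions import Fraction

# solution read off from the reduced echelon form
x, y, z, w = Fraction(24, 30), Fraction(33, 60), Fraction(2, 5), Fraction(-1, 2)

assert x + 2*y - z + 3*w == 0     # x + 2y - z + 3w = 0
assert 2*x + 4*y - 2*z == 3       # 2x + 4y - 2z = 3
assert -2*x + 4*z - 2*w == 1      # -2x + 4z - 2w = 1
assert 3*x + 2*y + 5*w == 1       # 3x + 2y + 5w = 1
print("solution satisfies all four equations")
```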
[ 1   1    1  |  1 ]
[ 1   2    4  |  0 ]
[ 1  3/2  7/3 | -1 ].

To avoid dealing with fractions, we first multiply the third row by 6. Then row operations give:

[ 1 1  1 |  1 ]    [ 1 1 1 |   1 ]    [ 1 0 -2 |  2 ]
[ 1 2  4 |  0 ] → [ 0 1 3 |  -1 ] → [ 0 1  3 | -1 ]
[ 6 9 14 | -6 ]    [ 0 3 8 | -12 ]    [ 0 0 -1 | -9 ]

and then, after multiplying the third row by -1 and clearing the entries above the final pivot:

[ 1 0 0 |  20 ]
[ 0 1 0 | -28 ]
[ 0 0 1 |   9 ].
The corresponding system of equations is

a = 20,  b = -28,  c = 9,

and we have found our desired unknown values. The conclusion is that the function f(x) = 20 - 28x + 9x^2 is the one satisfying the properties asked for in the second Warm-Up.
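A short check that this f really satisfies all three conditions, with the integral computed exactly from the antiderivative:

```python
from fractions import Fraction

a, b, c = 20, -28, 9                      # f(x) = 20 - 28x + 9x^2

def f(x):
    return a + b * x + c * x ** 2

assert f(1) == 1 and f(2) == 0            # graph passes through (1,1) and (2,0)

# integral of a + bx + cx^2 from 1 to 2 equals a + (3/2)b + (7/3)c
integral = a + Fraction(3, 2) * b + Fraction(7, 3) * c
assert integral == -1
print("f(x) = 20 - 28x + 9x^2 satisfies all three conditions")
```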
using Gauss-Jordan elimination. First, we switch the first two rows in the augmented matrix in order to have 1 in the uppermost position instead of 2, since this will help with computations. The augmented matrix is then

[  1  2  1  4  1 |  5 ]
[  2  4  2  3 -3 |  5 ]
[  3  6  5 10  4 | 14 ]
[ -1 -2  1 -2  4 |  2 ].
Performing the row operations -2I + II → II, -3I + III → III, and I + IV → IV gives

[ 1 2 1  4  1 |  5 ]
[ 0 0 0 -5 -5 | -5 ]
[ 0 0 2 -2  1 | -1 ]
[ 0 0 2  2  5 |  7 ].
Now, there can be no pivot in the second column since the entries of that column in the second, third, and fourth rows are 0. The best place for the next pivot would be the third entry of the second row, so to get a pivot here we switch the second and fourth rows:

[ 1 2 1  4  1 |  5 ]
[ 0 0 2  2  5 |  7 ]
[ 0 0 2 -2  1 | -1 ]
[ 0 0 0 -5 -5 | -5 ].
We perform the row operation -II + III → III:

[ 1 2 1  4  1 |  5 ]
[ 0 0 2  2  5 |  7 ]
[ 0 0 0 -4 -4 | -8 ]
[ 0 0 0 -5 -5 | -5 ].
In usual Gauss-Jordan elimination we would also want to eliminate the 1 above the pivot 2 in the second row, but for now we skip this. To simplify some computations, we next divide the third row by -4 to get

[ 1 2 1  4  1 |  5 ]
[ 0 0 2  2  5 |  7 ]
[ 0 0 0  1  1 |  2 ]
[ 0 0 0 -5 -5 | -5 ].
Performing 5III + IV → IV then gives:

[ 1 2 1 4 1 | 5 ]
[ 0 0 2 2 5 | 7 ]
[ 0 0 0 1 1 | 2 ]
[ 0 0 0 0 0 | 5 ].
Since the last row corresponds to the impossible equation 0 = 5, the original system has no
solutions. Note that we did not have to do a full Gauss-Jordan elimination to determine this.
Important. If you are only interested in determining whether there is a solution, or how many
there are, a full Gauss-Jordan elimination is not needed. Only use a full elimination process when
trying to actually describe all solutions.
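Checking for this kind of impossibility is mechanical once a matrix is in echelon form; a sketch (the helper name is mine):

```python
def is_inconsistent(augmented):
    # a row reading 0 = c with c nonzero means the system has no solutions
    return any(all(entry == 0 for entry in row[:-1]) and row[-1] != 0
               for row in augmented)

# the echelon form reached above: the last row says 0 = 5
echelon = [
    [1, 2, 1, 4, 1, 5],
    [0, 0, 2, 2, 5, 7],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 0, 0, 5],
]
assert is_inconsistent(echelon)
print("no solutions")
```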
Warm-Up 2. We consider the same system as before, only changing the 5 at the end of the first equation to 0. If you follow through the same row operations as before, you end up with the augmented matrix:

[ 1 2 1  4  1 |   5 ]    [ 1 2 1 4 1 | 5 ]
[ 0 0 2  2  5 |   7 ]    [ 0 0 2 2 5 | 7 ]
[ 0 0 0  1  1 |   2 ] → [ 0 0 0 1 1 | 2 ]
[ 0 0 0 -5 -5 | -10 ]    [ 0 0 0 0 0 | 0 ].
We no longer have the issue we had before, so here we will have a solution, and in fact infinitely many. We first do -2I + II → I to get rid of the 1 above the pivot 2, which we skipped in the first Warm-Up:

[ 1 2 1 4 1 | 5 ]    [ -2 -4 0 -6 3 | -3 ]
[ 0 0 2 2 5 | 7 ]    [  0  0 2  2 5 |  7 ]
[ 0 0 0 1 1 | 2 ] → [  0  0 0  1 1 |  2 ]
[ 0 0 0 0 0 | 0 ]    [  0  0 0  0 0 |  0 ].
Next we do 6III + I → I and -2III + II → II:

[ -2 -4 0 0 9 | 9 ]
[  0  0 2 0 3 | 3 ]
[  0  0 0 1 1 | 2 ]
[  0  0 0 0 0 | 0 ],

and finally dividing the first row by -2 and the second row by 2 gives

[ 1 2 0 0 -9/2 | -9/2 ]
[ 0 0 1 0  3/2 |  3/2 ]
[ 0 0 0 1    1 |    2 ]
[ 0 0 0 0    0 |    0 ].
This matrix is now in what's called row-reduced echelon form since all pivots are 1, all entries above and below pivots are zero, and each pivot occurs strictly to the right of any pivot above it. The variables which don't correspond to pivots are the ones we call free variables, and when writing down the general form of the solution we express all pivot variables in terms of the free ones. The rank (i.e. the number of pivots in the reduced echelon form) of this matrix, or of the original one we started with, is 3. This final augmented matrix corresponds to the system:
x1 + 2x2 - (9/2)x5 = -9/2
      x3 + (3/2)x5 = 3/2
           x4 + x5 = 2,

so we get

x1 = -2x2 + (9/2)x5 - 9/2,   x3 = -(3/2)x5 + 3/2,   x4 = -x5 + 2
with x2 and x5 free. In so-called vector form, the general solution is
[ x1 ]   [ -2s + (9/2)t - 9/2 ]
[ x2 ]   [          s         ]
[ x3 ] = [    -(3/2)t + 3/2   ]
[ x4 ]   [        -t + 2      ]
[ x5 ]   [          t         ]
where s and t are arbitrary numbers.
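The vector form can be verified directly against the reduced system; a quick sweep over parameter values, with exact fractions to avoid rounding:

```python
from fractions import Fraction

def general_solution(s, t):
    # the vector form above, with free parameters s = x2 and t = x5
    return (-2 * s + Fraction(9, 2) * t - Fraction(9, 2),  # x1
            s,                                             # x2
            -Fraction(3, 2) * t + Fraction(3, 2),          # x3
            -t + 2,                                        # x4
            t)                                             # x5

for s in range(-3, 4):
    for t in range(-3, 4):
        x1, x2, x3, x4, x5 = general_solution(s, t)
        assert x1 + 2 * x2 - Fraction(9, 2) * x5 == Fraction(-9, 2)
        assert x3 + Fraction(3, 2) * x5 == Fraction(3, 2)
        assert x4 + x5 == 2
print("every (s, t) gives a solution")
```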
Fact about reduced echelon form. It is possible to get from one matrix to another using a sequence of row operations precisely when they have the same reduced echelon form. For instance, since

[ 1 2  3 ]       [ 1 4     3 ]
[ 4 5  6 ]  and  [ 9 1    10 ]
[ 7 8 10 ]       [ 8 9 23838 ]

both have

[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]

as their reduced echelon form, it is possible to get from one to the other by some sequence of row operations.
Relation between rank and number of solutions. Based on the form of the reduced echelon
form of a matrix, there is a strong relation between the rank of a matrix and the number of solutions
of a system having that matrix as its coefficients. For instance, any system where the rank is < the
number of variables cannot possibly have a unique solution. Also, any system where the rank equals
the number of variables cannot possibly have an infinite number of solutions. We will explore this
further later, but check the book for similar facts.
Vectors. A vector is a matrix with one column, and is said to be in Rn when it has n entries. (R is a common notation for the set of real numbers.) For instance,

[ 1 ]                 [ 1 ]
[ 2 ] is in R2,  and  [ 2 ] is in R3.
                      [ 3 ]

We draw vectors as arrows starting at the origin (0, 0) and ending at the point determined by the vector's entries. We add vectors simply by adding the corresponding entries together, and multiply vectors by scalars (i.e. numbers) simply by multiplying each entry of the vector by that scalar.
[   | 1 ]
[ A | 2 ]
[   | 3 ]

has a unique solution? If you think about what the reduced echelon form of A looks like, we know that it should have 3 pivots. However, with 4 columns this means that one column won't have a pivot and so will correspond to a free variable. This means that it is not possible for such a system to have exactly one solution: either it will have no solutions or infinitely many, depending on whether the last row in the reduced echelon form corresponds to 0 = 0 or some impossible equation. The key point is understanding the relation between the number of pivots and the number of solutions.
As a contrast, we ask if there is a 4 × 3 matrix A of rank 3 such that the system with augmented matrix

( A | ~0 ),

where ~0 denotes the zero vector in R4, has a unique solution. In this case, in fact for any such matrix this system will have a unique solution. Again the reduced form will have 3 pivots, but now with A having only 3 columns there won't be a column without a pivot and so no free variables.
Since we started with the augmented piece (i.e. the final column corresponding to the numbers
on the right side of equals signs in the corresponding system) consisting of all zeroes, any row
operations which transform the A part into reduced form will still result in the final column being
all zeroes. Thus there are no contradictions like 0 = 1 and so such a system will always have a
unique solution, namely the one where all variables equal 0.
Geometric meaning of vector addition and scalar multiplication. Given a vector ~x and a scalar r, the vector r~x is parallel to ~x but its length is scaled by a factor of |r|; for negative r the direction is turned around:
Given vectors ~x and ~y , their sum ~x + ~y is the vector which forms the diagonal of the parallelogram
with sides ~x and ~y :
Linear combinations. Recall that previously we saw how to write the system

2x + 3y + z = 0
 x -  y + z = 2

as the vector equation

x [ 2 ] + y [  3 ] + z [ 1 ] = [ 0 ]
  [ 1 ]     [ -1 ]     [ 1 ]   [ 2 ].

The expression on the left is what is called a linear combination of

[ 2 ]   [  3 ]       [ 1 ]
[ 1 ],  [ -1 ],  and [ 1 ].

So, asking whether or not (0, 2) can be expressed as such a linear combination is the same as asking whether or not the corresponding system has a solution. We already know from previous
examples that this system has infinitely many solutions, but let us now understand why this is true
geometrically.
Consider first the simpler vector equation given by

x [ 2 ] + y [  3 ] = [ 0 ]        (1)
  [ 1 ]     [ -1 ]   [ 2 ].

Using the geometric interpretations of vector addition and scalar multiplication, it makes sense that this equation has a unique solution, since we can eyeball that there are specific scalars x and y we can use to scale (2, 1) and (3, -1) and have the results add up to (0, 2). Similarly, the vector equation

y [  3 ] + z [ 1 ] = [ 0 ]        (2)
  [ -1 ]     [ 1 ]   [ 2 ]

has a unique solution for y and z.
Now we go back to our original vector equation. The solution for x and y in equation (1) together with z = 0 gives a solution of

x [ 2 ] + y [  3 ] + z [ 1 ] = [ 0 ]
  [ 1 ]     [ -1 ]     [ 1 ]   [ 2 ].
In the same manner the solution for y and z in equation (2) together with x = 0 gives another solution of this same vector equation. Since this vector equation now has at least two solutions, it in fact must have infinitely many, since we know that any system has either no, one, or infinitely many solutions. This agrees with what we found previously when solving this system algebraically. The point is that now that we've rewritten systems in terms of vectors, we have new geometric ideas and techniques at our disposal when understanding what it means to solve a system of linear equations.
Matrix forms of systems. Continuing on with the same example, the expression

x [ 2 ] + y [  3 ] + z [ 1 ]
  [ 1 ]     [ -1 ]     [ 1 ]

is also what we call the result of multiplying the matrix

[ 2  3  1 ]
[ 1 -1  1 ]

by the vector (x, y, z):

[ 2  3  1 ] [ x ]     [ 2 ]     [  3 ]     [ 1 ]
[ 1 -1  1 ] [ y ] = x [ 1 ] + y [ -1 ] + z [ 1 ].
            [ z ]

Thus the system we are considering can also be written in matrix form as

[ 2  3  1 ] [ x ]   [ 0 ]
[ 1 -1  1 ] [ y ] = [ 2 ].
            [ z ]
Solving this matrix equation for (x, y, z) is the same as solving the original system of equations we
considered, which is also the same as solving the vector equation we had before. The idea we
will expand on in the coming weeks is that understanding more about matrices and such matrix
equations will give us yet another point of view on what it means to solve a system of linear
equations.
Important. Systems of linear equations, vector equations involving linear combinations, and
matrix equations are all different ways of looking at the same type of problem. These different
points of view allow the use of different techniques, and help to make systems more geometric.
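The rule "a matrix times a vector is the corresponding linear combination of its columns" is easy to sanity-check in code; a sketch using the matrix from this example:

```python
def matvec(A, v):
    # row-by-row dot products
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]

def column_combination(A, v):
    # the same product computed as v[0]*(column 0) + v[1]*(column 1) + ...
    result = [0] * len(A)
    for j, coeff in enumerate(v):
        for i in range(len(A)):
            result[i] += coeff * A[i][j]
    return result

A = [[2, 3, 1],
     [1, -1, 1]]

for v in ([1, 0, 0], [0, 1, 0], [1, -1, 3]):
    assert matvec(A, v) == column_combination(A, v)
print("both computations agree")
```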
[  1 ]  [  3 ]       [ -1 ]
[  0 ], [  2 ],  and [ -2 ]?
[ -1 ]  [ -1 ]       [ -1 ]

Recall that a linear combination of these three vectors is an expression of the form

  [  1 ]     [  3 ]     [ -1 ]
a [  0 ] + b [  2 ] + c [ -2 ].
  [ -1 ]     [ -1 ]     [ -1 ]
Setting this equal to the vector in question gives

[ a + 3b - c ]   [ 4 ]
[   2b - 2c  ] = [ 0 ],
[ -a - b - c ]   [ 2 ]

so our problem boils down to asking whether this system has a solution. Reducing the augmented matrix gives

[  1  3 -1 | 4 ]    [ 1 3 -1 | 4 ]    [ 1 3 -1 | 4 ]
[  0  2 -2 | 0 ] → [ 0 2 -2 | 0 ] → [ 0 2 -2 | 0 ],
[ -1 -1 -1 | 2 ]    [ 0 2 -2 | 6 ]    [ 0 0  0 | 6 ]

so we see that there is no solution. Hence the vector (4, 0, 2) is not a linear combination of the three given vectors.
Instead we now ask if the vector (4, 2, -2) is a linear combination of the three given vectors. This is asking whether

  [  1 ]     [  3 ]     [ -1 ]   [  4 ]
a [  0 ] + b [  2 ] + c [ -2 ] = [  2 ]
  [ -1 ]     [ -1 ]     [ -1 ]   [ -2 ]

has a solution, which is the same as asking whether the system

a + 3b -  c = 4
     2b - 2c = 2
-a -  b -  c = -2

has a solution. Reducing the augmented matrix gives

[  1  3 -1 |  4 ]    [ 1 3 -1 | 4 ]    [ 1 3 -1 | 4 ]
[  0  2 -2 |  2 ] → [ 0 2 -2 | 2 ] → [ 0 2 -2 | 2 ],
[ -1 -1 -1 | -2 ]    [ 0 2 -2 | 2 ]    [ 0 0  0 | 0 ]

at which point we know that there will be a solution.
Thus (4, 2, -2) is a linear combination of the three given vectors. To be precise, continuing on and solving this system completely gives a = -1, b = 2, c = 1 as one solution (there are infinitely many others), and you can check that

  [  1 ]     [  3 ]   [ -1 ]
- [  0 ] + 2 [  2 ] + [ -2 ]
  [ -1 ]     [ -1 ]   [ -1 ]

indeed equals (4, 2, -2).
Important. Questions about linear combinations often boil down to solving some system of linear
equations.
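The claimed combination from the example above, and the fact that it is not the only one, can be checked in a couple of lines:

```python
def combo(a, b, c):
    # a, b, c times the three given vectors
    v1, v2, v3 = (1, 0, -1), (3, 2, -1), (-1, -2, -1)
    return tuple(a * p + b * q + c * r for p, q, r in zip(v1, v2, v3))

assert combo(-1, 2, 1) == (4, 2, -2)   # the solution found above
assert combo(1, 1, 0) == (4, 2, -2)    # one of the infinitely many others
print("both combinations give (4, 2, -2)")
```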
(0, 1) is what you get when rotating (1, 0) counterclockwise by 90 degrees, and (-1, 0) is what you get when rotating (0, 1) by the same amount. So we would say that geometrically applying T has the effect of rotating (1, 0) and (0, 1) by 90 degrees. In fact, as we will see, it turns out that this is the effect of T on any possible input: given a vector ~x in R2, the vector T(~x) obtained after applying T is the vector you get when rotating ~x (visualized as an arrow) by 90 degrees.
The above transformation T has the property that for any input vectors ~x and ~y we have
T (~x + ~y ) = T (~x) + T (~y ),
and for any scalar r we have
T (r~x) = rT (~x).
(This just says that A(~x + ~y) = A~x + A~y and A(r~x) = rA~x, which we will see later are general properties of matrix multiplication.) The first equality says that when taking two input vectors, it does not matter whether we add them together first and then apply T to the result, or apply T to the two inputs separately and then add; we will always get the same result. The second equality says that scaling an input vector ~x by r and then applying T is the same as applying T to ~x first and then scaling by r. Both of these properties should make sense geometrically since T after all is nothing but a rotation transformation. These two properties together give us the first definition of what it means to say a function is a linear transformation.
of what it means to say a function is a linear transformation.
First definition of Linear Transformation. A function T from some space Rm to some space
Rn is a linear transformation if it has the properties that
T (~x + ~y ) = T (~x) + T (~y )
for any inputs ~x and ~y in Rm , and
T (r~x) = rT (~x)
for any input ~x in Rm and any scalar r.
Thus S is a linear transformation. In general the formula for a linear transformation should only involve linear (i.e. first power) terms in the input variables (so nothing like x^2, xy, or sin y) and should not have any extra constants added on.

Now, notice that the formula for S here can be rewritten in terms of matrix multiplication as

S [ x ] = [ 2x + 3y ] = [ 2  3 ] [ x ]
  [ y ]   [ 3x - 4y ]   [ 3 -4 ] [ y ].
So, S is actually a matrix transformation, just as the transformation in Example 1 was. It turns
out that this is true of any linear transformation, giving us our second way to define what it means
for a transformation to be linear.
Second definition of Linear Transformation. A function T from Rm to Rn is a linear transformation if there is some n × m matrix A with the property that
T (~x) = A~x,
that is, applying T to an input vector is the same as multiplying that vector by A. We call A the
matrix of the transformation.
Example 4. Consider the function L from R3 to R3 defined by

  [ x ]   [ y ]
L [ y ] = [ z ].
  [ z ]   [ x ]

This is a linear transformation since the above formula is the same as

  [ x ]   [ 0 1 0 ] [ x ]
L [ y ] = [ 0 0 1 ] [ y ].
  [ z ]   [ 1 0 0 ] [ z ]

The matrix of the linear transformation L is thus

[ 0 1 0 ]
[ 0 0 1 ]
[ 1 0 0 ].
The two definitions of linear are the same. The second definition of linear transformation is the one the book gives as the main definition, and then later on it talks about the first definition. I think it is conceptually better to give the first definition as the main one, and to then realize that such things are represented by matrices. Both definitions are equivalent in the sense that a function satisfying one must satisfy the other. In particular, if T is a function from Rm to Rn satisfying the first definition, here is how we can see that it will also satisfy the second.
To keep notation simple we only focus on the case of a transformation T from R3 to R3. The key point is that for any input vector (x, y, z), we can express it as

[ x ]     [ 1 ]     [ 0 ]     [ 0 ]
[ y ] = x [ 0 ] + y [ 1 ] + z [ 0 ].
[ z ]     [ 0 ]     [ 0 ]     [ 1 ]
Then using the properties in the first definition of linear transformation we have:

T(x, y, z) = T( x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1) )
           = T( x(1, 0, 0) ) + T( y(0, 1, 0) ) + T( z(0, 0, 1) )
           = x T(1, 0, 0) + y T(0, 1, 0) + z T(0, 0, 1)

                                                [ x ]
           = [ T(1, 0, 0)  T(0, 1, 0)  T(0, 0, 1) ] [ y ].
                                                [ z ]
This shows that applying T to a vector is the same as multiplying that vector by the matrix whose first column is the result of applying T to (1, 0, 0), second column the result of applying T to (0, 1, 0), and third column the result of applying T to (0, 0, 1). Thus T is a linear transformation according to the second definition, with matrix given by

[ T(1, 0, 0)  T(0, 1, 0)  T(0, 0, 1) ].

A similar reasoning works for T from Rm to Rn in general, not just R3 to R3.

Important. For any linear transformation, the matrix which represents it is always found in the same way: we determine what T does to input vectors with a single entry equal to 1 and all other entries equal to 0, and use the results of these computations as the columns of the matrix.
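The recipe in this Important box translates directly into code; a sketch (the helper name `matrix_of` is mine, not notation from the course):

```python
def matrix_of(T, m):
    """Build the matrix of T : R^m -> R^n from its values on the standard vectors."""
    columns = []
    for j in range(m):
        e = [0] * m
        e[j] = 1            # the j-th standard vector
        columns.append(T(e))
    # the columns, assembled into a list of rows
    return [list(row) for row in zip(*columns)]

# L(x, y, z) = (y, z, x) from Example 4
L = lambda v: (v[1], v[2], v[0])
assert matrix_of(L, 3) == [[0, 1, 0],
                           [0, 0, 1],
                           [1, 0, 0]]
print("matrix recovered from the standard vectors")
```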
Then rearranging terms we have

[ 2 ]   [ 1 ]   [  1 ]
[ 0 ] = [ 1 ] + [ -1 ].
Again using the properties in the first definition of linear transformation, we compute:

T(1, 0) = (1/2) T(2, 0) = (1/2)( T(1, 1) + T(1, -1) ) = (1/6, 1/3),

which is the first column of the matrix of T. The matrix of T is thus

[ 1/6  1/3 ]
[ 1/3  1/3 ]

and you can double check that multiplying this by each of the original input vectors given in the setup indeed results in the corresponding output. The upshot is that now we can compute the result of applying T to any vector simply by multiplying that vector by this matrix, even though we were only given two pieces of information about T to begin with.
Geometric Transformations. Various geometric transformations in 2 and 3 dimensions are linear and so can be represented by matrices. That these are linear follows from using the geometric interpretations of vector addition and scalar multiplication to convince yourselves that

T(~x + ~y) = T(~x) + T(~y)   and   T(r~x) = rT(~x).

Without knowing that these geometric transformations satisfied these two properties we would have a really hard time guessing that they were represented by matrices. However, now knowing that this is the case, in order to find the corresponding matrices all we have to do is determine the result of applying these transformations to the standard vectors (1, 0) and (0, 1) in two dimensions, and their analogs in three dimensions.
Example 1. Let T be the transformation from R2 to R2 which rotates the xy-plane (counterclockwise) by an angle θ. The x- and y-coordinates of the vector obtained by rotating (1, 0) are cos θ and sin θ respectively, so

T [ 1 ] = [ cos θ ]
  [ 0 ]   [ sin θ ].

The x- and y-coordinates of the vector obtained by rotating (0, 1) are -sin θ (negative since the result is in the negative x-direction) and cos θ respectively, so

T [ 0 ] = [ -sin θ ]
  [ 1 ]   [  cos θ ].
  [ 1 ]   [ 1 ]
S [ 0 ] = [ 0 ].
  [ 0 ]   [ 0 ]

Rotating (0, 1, 0) and (0, 0, 1) gives

  [ 0 ]   [   0   ]         [ 0 ]   [    0   ]
S [ 1 ] = [ cos θ ]  and  S [ 0 ] = [ -sin θ ],
  [ 0 ]   [ sin θ ]         [ 1 ]   [  cos θ ]

so the matrix of S is

[ 1    0       0   ]
[ 0  cos θ  -sin θ ]
[ 0  sin θ   cos θ ].
Example 3. The matrix of reflection across the line y = x in R2 is

[ 0 1 ]
[ 1 0 ]

since reflecting (1, 0) across y = x gives (0, 1) and reflecting (0, 1) gives (1, 0). As a check:

[ 0 1 ] [ x ]   [ y ]
[ 1 0 ] [ y ] = [ x ]

is indeed the result of reflecting the vector (x, y) across y = x.
Example 4. Consider the linear transformation L from R2 to R2 which first applies the shear determined by

[ 1 1 ]
[ 0 1 ]

and then scales the result by a factor of 2. (Check the book for a picture of what a shear transformation does.) Starting with (1, 0), the shear transformation gives

[ 1 1 ] [ 1 ]   [ 1 ]
[ 0 1 ] [ 0 ] = [ 0 ],

and then scaling by 2 gives (2, 0). Shearing (0, 1) gives

[ 1 1 ] [ 0 ]   [ 1 ]
[ 0 1 ] [ 1 ] = [ 1 ]

and then scaling by 2 gives (2, 2). Thus the matrix of this combined transformation, which first shears and then scales, is

[ 2 2 ]
[ 0 2 ].
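One way to double-check the combined matrix: multiply the scaling matrix by the shear matrix, with the transformation applied first on the right (composition as a matrix product is taken up shortly). A sketch:

```python
def matmul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

shear = [[1, 1],
         [0, 1]]
scale = [[2, 0],
         [0, 2]]

# first shear, then scale: the shear goes on the right
assert matmul(scale, shear) == [[2, 2], [0, 2]]
print("matches the matrix found from the columns")
```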
Important. You should be familiar with rotations, reflections, shears, and scalings this quarter. Orthogonal projections have a somewhat more complicated formula, and are something we will come back to next quarter.
(-1/√2, 1/√2). Reflecting this across y = x flips its direction, so we get

[  1/√2 ]
[ -1/√2 ].

Thus overall the transformation in question sends (0, 1) to (1/√2, -1/√2). Rotating (1, 0) gives

[ 1/√2 ]
[ 1/√2 ],

and then reflecting this across y = x does nothing since this vector is on this line. Thus overall (1, 0) is sent to (1/√2, 1/√2). The matrix of this combined transformation is thus

[ 1/√2   1/√2 ]
[ 1/√2  -1/√2 ].
Example 1. Consider linear transformations T and S from R2 to itself represented respectively by matrices

[ a b ]        [ m n ]
[ c d ]  and   [ p q ].

We determine the matrix for the composed transformation TS which first applies S and then applies
The product of

[ 1 -1  0 ]       [  0  3 -2 ]
[ 3  2  1 ]  and  [ -1  5  1 ]
[ 2  0  1 ]       [  2 -1  1 ]

is the 3 × 3 matrix given by

[ 1 -1  0 ] [  0  3 -2 ]   [ 1  -2 -3 ]
[ 3  2  1 ] [ -1  5  1 ] = [ 0  18 -3 ].
[ 2  0  1 ] [  2 -1  1 ]   [ 2   5 -3 ]

Again, the first column of the product is the result of multiplying the first matrix by (0, -1, 2), the second column is the first matrix times (3, 5, -1), and the last column is the first matrix times (-2, 1, 1).
[ 0 1 ]
[ 1 0 ].

(This second one is obtained by reflecting (1, 0) across y = x and then doing the same for (0, 1).) Thus, since matrix multiplication corresponds to composition of transformations, the combined transformation of the Warm-Up has matrix equal to the product

[ 0 1 ] [ 1/√2 -1/√2 ]   [ 1/√2   1/√2 ]
[ 1 0 ] [ 1/√2  1/√2 ] = [ 1/√2  -1/√2 ],
agreeing with the matrix we found in the Warm-Up. Again note the order of composition: our
combined transformation first applied the rotation and then the reflection, so the matrix for rotation
is on the right and reflection on the left. (We always read compositions from right to left.)
Example 3. Let

A = [ 1/√2 -1/√2 ]
    [ 1/√2  1/√2 ].

We want to compute A^80, which means A multiplied by itself 80 times. Of course, doing this multiplication by hand 80 times would be crazy. As well, if you start multiplying A by itself a few times you might notice some pattern which would help, but this is still not the most efficient way to approach this. Instead, recognize that A is the matrix for rotation by π/4, so A^80 is the matrix for composing this rotation with itself 80 times. The point is that we know that rotating by π/4 eight times is the same as a rotation by 2π, which geometrically puts a vector back where it started. Thus A^8 should be the matrix for the transformation which leaves a vector untouched; this is called the identity transformation, and its matrix is the identity matrix:

[ 1 0 ]
[ 0 1 ].

Thus A^8 = I2 (the subscript denotes the fact that we are looking at the 2 × 2 identity matrix), and every further eighth power will again result in I2. Since 80 is a multiple of 8, we have A^80 = I2 without having to explicitly multiply A by itself 80 times.

Similarly, A^40 = I2, so A^43 = A^40 A^3 = A^3, which is the matrix for rotation by 3π/4, which is the same as rotating by π/4 three times in a row. Thus

A^43 = [ -1/√2 -1/√2 ]
       [  1/√2 -1/√2 ].
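A numerical check of the A^8 = I2 claim (floating point, so we compare up to a tiny tolerance):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

c = 1 / math.sqrt(2)
A = [[c, -c],
     [c,  c]]          # rotation by pi/4

P = [[1, 0], [0, 1]]
for _ in range(8):
    P = matmul(P, A)   # P becomes A^8

for i in range(2):
    for j in range(2):
        expected = 1 if i == j else 0
        assert abs(P[i][j] - expected) < 1e-12
print("A^8 is the identity (up to rounding)")
```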
Properties and non-properties of matrix multiplication. Just like usual multiplication of
numbers, matrix multiplication is associative:
(AB)C = A(BC)
for any matrices A, B, C for which all products above make sense; it is distributive:
A(B + C) = AB + AC and (A + B)C = AC + BC,
and it has an identity, namely the identity matrix I:
AI = A = IA for any A.
However, matrix multiplication is unlike multiplication of numbers in that it is not necessarily
commutative:
AB does not necessarily equal BA,
and it is possible to multiply nonzero matrices together to get the zero matrix:
AB = 0 does not necessarily mean A = 0 or B = 0.
Here, 0 denotes the zero matrix, which is the matrix with all entries equal to 0.
Example 4. For matrices A, B, it is not necessarily true that

(A + B)(A - B) = A^2 - B^2,

since expanding the left side gives A^2 - AB + BA - B^2, and AB need not equal BA. Next, let

A = [ 0 1 0 ]
    [ 0 0 1 ].
    [ 1 0 0 ]
We compute A^100. The point is that we should not sit down and multiply A by itself 100 times, but rather we should think about what the linear transformation T corresponding to A is actually doing. We have

  [ x ]   [ 0 1 0 ] [ x ]   [ y ]
T [ y ] = [ 0 0 1 ] [ y ] = [ z ],
  [ z ]   [ 1 0 0 ] [ z ]   [ x ]

so we see that T has the effect of shifting the entries of an input vector up by one while moving the first entry down to the end. Thus, T^2 has the effect of doing this twice in a row and T^3 the effect of shifting three times in a row. However, with only three entries in an input vector, after doing the shift three times in a row we're back where we started, so

    [ x ]   [ x ]
T^3 [ y ] = [ y ].
    [ z ]   [ z ]

In other words, T^3 is the identity transformation which does nothing to input vectors. Thus A^3, which is supposed to be the matrix for the composition T^3, equals the matrix for the identity transformation, which is the identity matrix:

      [ 1 0 0 ]
A^3 = [ 0 1 0 ].
      [ 0 0 1 ]
Then A^6 = I, A^9 = I, and so on: every time we take a power that's a multiple of 3 we get the identity, so

A^100 = A^99 A = IA = A = [ 0 1 0 ]
                          [ 0 0 1 ].
                          [ 1 0 0 ]

Of course, you can multiply A by itself three times directly and see that A^3 = I, but it's important to see why this is so from the point of view of composing linear transformations.
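Doing that direct multiplication in code confirms both A^3 = I and, consequently, A^100 = A:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

assert matmul(A, matmul(A, A)) == I    # A^3 = I

P = I
for _ in range(100):
    P = matmul(P, A)
assert P == A                          # A^100 = A
print("A^3 = I and A^100 = A")
```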
Warm-Up 2. Find all matrices B commuting with

[ 2 -1 ]
[ 7  5 ],

i.e. all B such that

B [ 2 -1 ]   [ 2 -1 ]
  [ 7  5 ] = [ 7  5 ] B.
Note that in order for both of these products to be defined B must be 2 × 2. Writing down an arbitrary expression for B:

B = [ a b ]
    [ c d ],

the problem is to find all values of a, b, c, d such that

[ a b ] [ 2 -1 ]   [ 2 -1 ] [ a b ]
[ c d ] [ 7  5 ] = [ 7  5 ] [ c d ].
Multiplying out both sides gives the requirement

[ 2a + 7b  -a + 5b ]   [ 2a - c   2b - d  ]
[ 2c + 7d  -c + 5d ] = [ 7a + 5c  7b + 5d ].

Equating corresponding entries on both sides gives the requirements

2a + 7b = 2a - c
-a + 5b = 2b - d
2c + 7d = 7a + 5c
-c + 5d = 7b + 5d,
so after moving everything to one side of each of these we see that the values of a, b, c, d we are
looking for must satisfy the system
7b + c
7a
=0
3c + 7d = 0
a + 3b
+ d=0
7b c
Hence our question boils down to solving this system of equations. (Imagine that!)
Row-reducing the augmented matrix gives

     0  7 -1  0 | 0       1 0  3/7 -1 | 0
    -7  0 -3  7 | 0   →   0 1 -1/7  0 | 0 ,
     1  3  0 -1 | 0       0 0   0   0 | 0
     0 -7  1  0 | 0       0 0   0   0 | 0
so a and b are determined by the free variables c and d:

    a     -(3/7)c + d
    b        (1/7)c
    c  =        c     ,
    d           d
so our conclusion is that the matrices B which commute with

    2 1
    7 5

are those of the form

        -(3/7)c + d  (1/7)c
    B =      c          d     for any numbers c and d.
Note that taking c = 7 and d = 5 gives

    2 1
    7 5 ,

which makes sense since any matrix indeed commutes with itself.
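We can spot-check the conclusion: for any choice of c and d, the matrix B built from the formula above should commute with the given matrix. A minimal sketch using exact fractions (the helper names here are ad hoc):

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = [[F(2), F(1)], [F(7), F(5)]]

def B_of(c, d):
    # B as found in the Warm-Up: entries -(3/7)c + d, (1/7)c, c, d
    c, d = F(c), F(d)
    return [[-F(3, 7) * c + d, F(1, 7) * c], [c, d]]

for c, d in [(7, 5), (14, 0), (1, 3), (-7, 2)]:
    B = B_of(c, d)
    print(matmul(B, M) == matmul(M, B))  # True each time
```

Note that B_of(7, 5) recovers the matrix M itself, matching the remark above.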
Inverse Transformations. Say that

        1/√2  -1/√2
    A = 1/√2   1/√2

is rotation by π/4. We call rotation by -π/4 the inverse transformation since it undoes what
the first one does. In other words, applying the linear transformation determined by A and then
following it with the inverse transformation always gives you back what you started with. The
matrix for this inverse transformation (rotation by -π/4) is

     1/√2  1/√2
    -1/√2  1/√2 .
Similarly, the inverse of the reflection B = ( 0 1 ; 1 0 ) is that same reflection, since to undo what
a reflection does you simply apply that same reflection again. We say that a reflection is its own inverse.
In general, given a transformation T such that with input ~x you get output ~y :
T (~x) = ~y ,
the inverse transformation (if it exists) is the linear transformation with the property that inputting
~y gives as output ~x.
Invertible Matrices. A (square) matrix A is invertible if there is another (square) matrix B (it
will necessarily be of the same size as A) with the property that AB = I and BA = I. We call
this matrix B the inverse of A and denote it by A^-1 . (It turns out that for square matrices the
requirement that AB = I automatically implies BA = I, but this is not at all obvious.) If A is the
matrix for a linear transformation T , then A^-1 is the matrix for the inverse transformation of T .
Back to previous geometric examples. The geometric examples we just looked at say that

    1/√2 -1/√2 ^-1     1/√2  1/√2
    1/√2  1/√2      = -1/√2  1/√2

and

    0 1 ^-1    0 1
    1 0     =  1 0 ,

which we can double-check directly:

    0 1   0 1     1 0
    1 0   1 0  =  0 1 .
For an invertible 2 × 2 matrix we have the formula

    a b ^-1      1      d -b
    c d      = ad - bc -c  a .

KNOW THIS FORMULA BY HEART. The denominator of the fraction involved is called the
determinant of ( a b ; c d ); we will come back to determinants later. You can verify that this formula
for the inverse of a 2 × 2 matrix is correct by checking that

       1      d -b           a b           1 0
    ad - bc  -c  a   times   c d   equals  0 1 .

Also, note that applying this formula to the geometric 2 × 2 examples above indeed gives what we
claimed were the inverses.
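The 2 × 2 formula is easy to wrap in a tiny function and check with exact arithmetic; a minimal sketch:

```python
from fractions import Fraction as F

def inv2(A):
    """Inverse of a 2x2 matrix via the (d, -b, -c, a)/(ad - bc) formula."""
    (a, b), (c, d) = A
    det = a * d - b * c          # the determinant ad - bc
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[F(d, 1) / det, F(-b, 1) / det],
            [F(-c, 1) / det, F(a, 1) / det]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [7, 5]]
print(matmul(A, inv2(A)))  # the 2x2 identity, as Fractions
```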
Inverses in general. As in the 2 × 2 case, there is an explicit formula for the inverse of any
invertible n × n matrix. However, this formula gets a lot more complicated even in the 3 × 3
case and is NOT worth memorizing. Instead, we compute inverses in general using the following
method, which will most often be much, much quicker.
To find the inverse of an invertible matrix A, set up a big augmented matrix
A | I
with A on the left and the appropriately-sized identity on the right, then start doing row operations
to reduce that A part to the identity while at the same time doing the same operations to the
identity part:
    A | I   →   I | A^-1 .
The matrix you end up with on the right side is the inverse of A. Check the book for an explanation
of why this works.
Note that this process only works if it is indeed possible to reduce A to the identity matrix,
giving us our first way to check whether a given matrix is invertible.
Important. A is invertible if and only if the reduced echelon form of A is the identity matrix,
which can happen if and only if A has full rank, meaning rank equal to the number of rows and
columns.
Example 1. Let A be the matrix

        1  1 0
    A = 3 -2 1 .
        2  0 1
Reducing A | I gives:

    1  1 0 | 1 0 0      1  1 0 |  1 0 0      1  1 0 |  1  0 0
    3 -2 1 | 0 1 0  →   0 -5 1 | -3 1 0  →   0 -5 1 | -3  1 0 .
    2  0 1 | 0 0 1      0 -2 1 | -2 0 1      0  0 3 | -4 -2 5
Note that at this point we know that A is invertible since we can already see that the reduced
echelon form of A will end up with three pivots and will be the 3 × 3 identity matrix. Continuing
on yields
    1  1 0 |  1  0 0      5  0 1 |  2  1 0      15   0 0 | 10 5 -5
    0 -5 1 | -3  1 0  →   0 -5 1 | -3  1 0  →    0 -15 0 | -5 5 -5 ,
    0  0 3 | -4 -2 5      0  0 3 | -4 -2 5       0   0 3 | -4 -2 5
Dividing each row by the appropriate scalar turns the right side into

             2/3  1/3 -1/3
    A^-1 =   1/3 -1/3  1/3 ,
            -4/3 -2/3  5/3
and indeed

    1  1 0    2/3  1/3 -1/3     1 0 0
    3 -2 1    1/3 -1/3  1/3  =  0 1 0
    2  0 1   -4/3 -2/3  5/3     0 0 1

as required of the inverse of A.
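The whole [A | I] → [I | A^-1] procedure can be automated with exact fractions; here is a short sketch (assuming the input matrix is invertible, so a pivot can always be found):

```python
from fractions import Fraction as F

def inverse(A):
    n = len(A)
    # Build the big augmented matrix [A | I].
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # pivot row
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]  # scale pivot entry to 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # The right half is now A^-1.
    return [row[n:] for row in M]

A = [[1, 1, 0], [3, -2, 1], [2, 0, 1]]
Ainv = inverse(A)
print(Ainv[0])  # first row: 2/3, 1/3, -1/3, matching the example above
```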
Example 2. For the matrix

        1 2 3
    B = 4 5 6 ,
        7 8 9

we have

    1 2 3 | 1 0 0      1  2   3 |  1 0 0      1  2  3 |  1  0 0
    4 5 6 | 0 1 0  →   0 -3  -6 | -4 1 0  →   0 -3 -6 | -4  1 0 .
    7 8 9 | 0 0 1      0 -6 -12 | -7 0 1      0  0  0 |  1 -2 1
Since we can now see that the left part will not give us a reduced form with three pivots, we can
stop here: B is not invertible.
so that whether we perform row operations before or after multiplying matrices matters. If A and B
are both invertible, then AB is also invertible (with inverse B^-1 A^-1 ), so in this case rref(A), rref(B),
and rref(AB) are all the identity and here we do have
rref(AB) = rref(A) rref(B).
However, for A = ( 1 1 ; 1 1 ) and B = ( 1 -1 ; -1 1 ) we have

              1 1               1 -1                    0 0
    rref(A) = 0 0 ,  rref(B) =  0  0 ,  and rref(AB) =  0 0 ,
so
    rref(AB) ≠ rref(A) rref(B)
in this case. That is, rref(AB) = rref(A) rref(B) is only sometimes true.
Warm-Up 2. Suppose that A and B are square matrices such that AB = I. We claim that both
A and B are then invertible. Comparing with the definition of what it means for either A or B to
be invertible, the point is that here we only know that multiplying A and B in one order gives the
identity, whereas the definition would require that BA = I as well. It is not at all obvious that just
because AB = I it must also be true that BA = I, and indeed this is only true for square matrices.
A first thought might be to multiply both sides of AB = I on the left by A^-1 to give

    B = A^-1 .

Then multiplying by A on the right gives BA = I, which is what we want. However, this assumes
that A^-1 already exists! We can't start multiplying by A^-1 before we know A is actually invertible,
so this is no good. Instead, we use the fact that a matrix B is invertible if and only if the only
solution to B~x = ~0 is ~x = ~0. Indeed, if this is true then the reduced echelon form of B has to be
the identity, so B will be invertible.
So, we want to show that only solution of B~x = ~0 is the zero vector. Multiplying on the left by
A here gives
A(B~x) = A~0, so (AB)~x = ~0.
But since AB = I this gives ~x = ~0 as we wanted. This means that B is invertible, and hence B^-1
exists. Now we can multiply both sides of AB = I on the right by B^-1 to give

    A = B^-1 , so BA = I

after multiplying on the left by B. Thus AB = I and BA = I so A is invertible as well with inverse
B. Again, note that we said nothing about B^-1 above until we already knew that B was invertible,
and we were able to show that B is invertible using another way of thinking about invertibility.
(More on this to come.)
Remark. As said above, for non-square matrices A and B it is not necessarily true that AB = I
automatically implies BA = I. (In fact, it never does, but to understand this we need to know a
bit more about what the rank of a matrix really means.) For example, for

        1 0 0             1 0
    A = 0 1 0   and   B = 0 1
                          0 0

we have

         1 0               1 0 0
    AB = 0 1   but   BA =  0 1 0 ,
                           0 0 0
so AB = I but BA ≠ I. The matrices

    0 1      k 0            1 0
    1 0  ,   0 1  ,  and    k 1
are examples of what are called elementary matrices because they induce elementary row operations.
There are analogues for matrices of any size.
As an application, we can fully justify why the process we use for computing inverses actually
works. Starting with an invertible A, we know there are row operations which will reduce A to
the identity I. Each of these row operations can be obtained via multiplication by an elementary
matrix, so there are elementary matrices E1 , E2 , . . . , Em such that

    Em · · · E2 E1 A = I.

But then this equation says that Em · · · E2 E1 satisfies the requirement of being the inverse of A,
so A^-1 = Em · · · E2 E1 . We can write this as

    A^-1 = Em · · · E2 E1 I,

where the right side now means we take the operations which reduced A to I and instead perform
them on I; the result is A^-1 , which is precisely what our method for finding inverses says.
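The three kinds of elementary matrices can be seen concretely acting by left multiplication; a small sketch in the 2 × 2 case (with k = 5 as a sample scalar):

```python
# Each elementary matrix, multiplied on the left, performs one row operation.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

M = [[1, 2], [3, 4]]

swap  = [[0, 1], [1, 0]]   # swap the two rows
scale = [[5, 0], [0, 1]]   # multiply the first row by 5
add   = [[1, 0], [5, 1]]   # add 5 times row 1 to row 2

print(matmul(swap, M))   # [[3, 4], [1, 2]]
print(matmul(scale, M))  # [[5, 10], [3, 4]]
print(matmul(add, M))    # [[1, 2], [8, 14]]
```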
Warm-Up 1. Row-reducing the matrix A below in a few steps gives:

        1 2 3      1  2   3      1  2  3
    A = 4 5 6  →   0 -3  -6  →   0 -3 -6 .            (3)
        7 8 9      0 -6 -12      0  0  0
Given this final form, we know that if we had a vector ~b such that the final augmented piece of
the corresponding reduced augmented matrix ended up being something like (1, 1, 1):

    1 2 3 |          1  2  3 | 1
    4 5 6 | ~b  →    0 -3 -6 | 1 ,
    7 8 9 |          0  0  0 | 1
then A~x = ~b would indeed have no solution. The point is that starting from this reduced form we
can work our way backwards to find such a ~b.
Going back to the row operations we did in (3), to undo the last one we multiply the second
row by 2 and add it to the last row:
    1  2  3 | 1      1  2   3 | 1
    0 -3 -6 | 1  →   0 -3  -6 | 1 .
    0  0  0 | 1      0 -6 -12 | 3
To undo the first row operations we did in (3), we now do 4I + II → II and 7I + III → III:

    1  2   3 | 1      1 2 3 | 1
    0 -3  -6 | 1  →   4 5 6 | 5  .
    0 -6 -12 | 3      7 8 9 | 10
Thus, ~b = (1, 5, 10) is an example of a vector ~b such that A~x = ~b has no solution. We'll reinterpret
this fact in terms of the image of A in a bit.
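We can replay this construction forwards and watch the inconsistent row appear; a quick sketch with exact fractions:

```python
from fractions import Fraction as F

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [1, 5, 10]

# Augmented matrix [A | b], then the same row operations as in (3).
M = [[F(x) for x in row] + [F(v)] for row, v in zip(A, b)]
M[1] = [x - 4 * y for x, y in zip(M[1], M[0])]  # II  - 4I
M[2] = [x - 7 * y for x, y in zip(M[2], M[0])]  # III - 7I
M[2] = [x - 2 * y for x, y in zip(M[2], M[1])]  # III - 2II

print(M[2])  # last row is 0 0 0 | 1, i.e. 0 = 1, so Ax = b has no solution
```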
Warm-Up 2. The transpose of an n × n matrix B is the matrix B^T obtained by turning the
columns of B into the rows of B^T . For instance,

    a b c ^T    a d g
    d e f    =  b e h .
    g h i       c f i
We claim that if B is invertible then so is B^T , and the only fact we need is the following identity:

    (AB)^T = B^T A^T for any matrices A and B.

(We'll come back to this identity and work with transposes more later on.) To show that B^T
is invertible we use one of the equivalent characterizations of invertibility from the Amazingly
Awesome Theorem and verify that the only vector satisfying B^T ~x = ~0 is ~x = ~0.
So, start with B^T ~x = ~0. Since B is invertible, B^-1 exists and it has a transpose (B^-1 )^T .
Multiplying both sides of our equation by this gives

    (B^-1 )^T B^T ~x = (B^-1 )^T ~0 = ~0.

Using the identity for transposes stated above, the left-hand side equals (BB^-1 )^T ~x, and BB^-1
is the identity matrix, whose transpose is the identity matrix itself. So the above equation becomes

    (BB^-1 )^T ~x = ~0, i.e. I~x = ~0,

so the only vector satisfying B^T ~x = ~0 is ~x = ~0. This means that B^T is invertible.
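The transpose identity used above is easy to test numerically, including the fact that the order of the factors really does flip; a minimal sketch:

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
print(transpose(matmul(A, B)) == matmul(transpose(A), transpose(B)))  # False: order matters
```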
The kernel of a matrix. The kernel of a matrix A, denoted by ker A, is the space of all solutions
to A~x = ~0. In terms of the linear transformation T determined by A, this is the space of all input
vectors which are sent to ~0 under T .
Example 1. Looking at the matrix A from Warm-Up 1, to find its kernel we must find all solutions
of A~x = ~0. Continuing the row operations started before, we have:
              1  2  3 | 0      1 0 -1 | 0
    A | ~0  → 0 -3 -6 | 0  →   0 1  2 | 0 .
              0  0  0 | 0      0 0  0 | 0
Denoting ~x = (x, y, z), we have that z is free and x = z, y = -2z. Thus solutions of A~x = ~0, i.e.
vectors ~x in the kernel of A, look like:

    x      t          1
    y  =  -2t  =  t  -2  .
    z      t          1
Thus, any vector in ker A is a multiple of (1, -2, 1). The collection of all such multiples is what we
call the span of (1, -2, 1), so we can say that

    ker A = span { (1, -2, 1) }.
In words we would also say that (1, -2, 1) spans the kernel of A. Geometrically, this kernel is the
line containing (1, -2, 1), which is what the set of all multiples of this vector looks like.
Definition of Span. The span of vectors ~v1 , ~v2 , . . . , ~vk is the collection of all linear combinations
of ~v1 , ~v2 , . . . , ~vk ; i.e. all vectors expressible in the form
    c1~v1 + c2~v2 + · · · + ck~vk
for scalars c1 , . . . , ck .
Example 2. We find vectors which span the kernel of

        1 2 3
    B = 2 4 6 .
        3 6 9

Row-reducing:

    1 2 3      1 2 3
    2 4 6  →   0 0 0 .
    3 6 9      0 0 0
When solving B~x = ~0 we would add on an extra augmented column of zeros, but from now on
when finding the kernel of a matrix we will skip this additional step. From the reduced echelon
form above, we can see that the solutions of B~x = ~0 all look like
    x     -2s - 3t
    y  =      s     .
    z         t
To find vectors which span the collection of all vectors which look like this, we factor out each
free variable:

    -2s - 3t        -2        -3
        s      = s   1  + t    0  .
        t            0         1
Thus, any solution of B~x = ~0 is expressible as a linear combination of (-2, 1, 0) and (-3, 0, 1), so
these two vectors span the kernel of B:

    ker B = span { (-2, 1, 0), (-3, 0, 1) }.
Important. To find vectors spanning the kernel of any matrix A, find all solutions of A~x = ~0 and
express all variables in terms of free variables. Then factor out each free variable to express the
solution as a linear combination of some vectors; these vectors span the kernel.
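For the kernel of B above, the recipe's output is easy to double-check: every combination of the two spanning vectors really is sent to ~0 by B. A small sketch:

```python
B = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
v1 = [-2, 1, 0]   # from factoring out the free variable s
v2 = [-3, 0, 1]   # from factoring out the free variable t

def matvec(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

for s, t in [(1, 0), (0, 1), (2, -5), (7, 7)]:
    x = [s * a + t * b for a, b in zip(v1, v2)]
    print(matvec(B, x))  # [0, 0, 0] every time
```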
Example 3. We have

    1  2  0  2  -3  -1       1 2 0  2  0 2
    2  4 -4  3  -5  -1   →   0 0 1 1/4 0 0 .
    3  6  4  7 -11  -5       0 0 0  0  1 1
Thus the kernel of the first matrix consists of things which look like

    x1     -2s - 2t - 2u        -2         -2          -2
    x2           s               1          0           0
    x3       -(1/4)t     =  s    0   +  t  -1/4  +  u   0  ,
    x4           t               0          1           0
    x5          -u               0          0          -1
    x6           u               0          0           1
and so

         1  2  0  2  -3  -1             -2      -2        -2
    ker  2  4 -4  3  -5  -1   = span     1  ,    0    ,    0   .
         3  6  4  7 -11  -5              0     -1/4        0
                                         0       1         0
                                         0       0        -1
                                         0       0         1
The image of a matrix. The image of a matrix A is the collection im A of all possible outputs
of the linear transformation determined by A. More concretely, any such output has the form A~x,
so the image of A consists of all vectors ~b such that A~x = ~b has a solution. Better yet, the product
A~x is expressible as a linear combination of the columns of A, so ~b is in the image of A if and only if ~b is a
linear combination of the columns of A. Thus we can say that
im A = span {columns of A} .
Back to Example 1. The first Warm-Up shows that ~b = (1, 5, 10) is not in the image of

        1 2 3
    A = 4 5 6
        7 8 9

since A~x = ~b has no solution.
Back to Example 2. The image of B from Example 2 is the span of its columns, so

    im B = span { (1, 2, 3), (2, 4, 6), (3, 6, 9) }.
However, we can simplify this description. Note that the second and third vectors in this spanning set
are multiples of the first. This means that any vector which is expressible as a linear combination
of all three is in fact expressible as a linear combination (i.e. multiple) of the first alone. Thus, the
above span is the same as the span of the first column alone, so
         1 2 3            1
    im   2 4 6   = span   2  .
         3 6 9            3
Geometrically, the span of one vector is the line containing that vector, so in this case the image
of B is a line.
Important. If ~b is a linear combination of vectors ~v1 , . . . , ~vk , then

    span { ~v1 , . . . , ~vk , ~b } = span { ~v1 , . . . , ~vk }.
In other words, if one vector is itself in the span of other vectors, throwing that vector away from
our spanning set does not change the overall span.
Final Example. We want to find matrices A and B such that the plane x - 2y + 3z = 0 is at the
same time the kernel of A and the image of B. First, the equation defining this plane is precisely
what it means to say that

    (x, y, z) is in the kernel of ( 1 -2 3 ),

so for the 1 × 3 matrix A = ( 1 -2 3 ), the plane x - 2y + 3z = 0 is the kernel of A. There are
tons of other matrices which work; for instance, the kernel of

    1 -2 3
    1 -2 3
    1 -2 3

is also the plane x - 2y + 3z = 0.
Now, note that we can find vectors which span the plane as follows. From the equation of the
plane we find that x = 2y - 3z, so vectors on the plane look like

    x     2y - 3z        2         -3
    y  =     y      = y  1   +  z   0  .
    z        z           0          1

Hence the plane is equal to the span of

    2         -3
    1   and    0  ,
    0          1

which is equal to the image of the matrix

        2 -3
    B = 1  0 .
        0  1
That is, im B is the plane x - 2y + 3z = 0.
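Both claims are quick to verify: every output of B lies on the plane, i.e. is sent to 0 by A. A sketch:

```python
A = [[1, -2, 3]]                  # the plane is ker A
Bcols = [[2, 1, 0], [-3, 0, 1]]   # columns of B; the plane is im B

# Every combination y*(2,1,0) + z*(-3,0,1) should satisfy x - 2y + 3z = 0.
for y, z in [(1, 0), (0, 1), (3, -2), (-4, 5)]:
    v = [y * c1 + z * c2 for c1, c2 in zip(*Bcols)]
    value = sum(a * x for a, x in zip(A[0], v))
    print(value)  # 0 every time
```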
our Amazingly Awesome Theorem characterizing the various things equivalent to a matrix being
invertible.
The image of A consists of all ~b in Rn such that A~x = ~b has a solution. But since A is invertible,
this equation always has the solution ~x = A^-1 ~b, so any vector in Rn is in the image of A. Thus
im A = Rn .
Conversely, an n n matrix whose image is all of Rn must be invertible, again adding on to our
Amazingly Awesome Theorem.
Amazingly Awesome Theorem, continued. The following are also equivalent to a square n × n
matrix A being invertible:
ker A = {~0}, i.e. the kernel of A consists of only the zero vector
im A = Rn , i.e. the image of A consists of every vector in Rn
Warm-Up 2. Let A be the matrix

        1 -1 2  1
    A = 1 -1 3  4 .
        2 -2 2 -4

We want to find vectors spanning ker A and im A, using as few vectors as possible. First we row
reduce:

    1 -1 2  1      1 -1 2  1      1 -1 0 -5
    1 -1 3  4  →   0  0 1  3  →   0  0 1  3 .
    2 -2 2 -4      0  0 -2 -6     0  0 0  0
Solutions of A~x = ~0 look like

    x1     s + 5t         1         5
    x2       s       = s  1   + t   0  ,
    x3     -3t            0        -3
    x4       t            0         1

so

    ker A = span { (1, 1, 0, 0), (5, 0, -3, 1) }.
Since neither of these vectors is a linear combination of the other (recall that a linear combination
of a single vector is simply a multiple of that vector), throwing one vector away won't give us the
same span, so we need both of these in order to span the entire kernel.
Now, im A is spanned by the columns of A. However, note that the second column is a multiple
of the first so that throwing it away gives the same span as all four columns:
    im A = span { (1, 1, 2), (2, 3, 2), (1, 4, -4) }.
But we're not done yet! The third vector here is actually a linear combination of the first two:

     1          1         2
     4  =  -5   1  +  3   3  ,
    -4          2         2
so that the first two vectors have the same span as all three. (Again, to be clear, we are using the
fact that if ~v3 = a~v1 + b~v2 , then
c1~v1 + c2~v2 + c3~v3 = c1~v1 + c2~v2 + c3 (a~v1 + b~v2 ) = (c1 + c3 a)~v1 + (c2 + c3 b)~v2 ,
so a linear combination of ~v1 , ~v2 , ~v3 can be rewritten as a linear combination of ~v1 and ~v2 alone.)
The first two vectors in the above span are not multiples of each other, so finally we conclude that

    im A = span { (1, 1, 2), (2, 3, 2) }
and that we need both of these vectors to span the entire image.
Important properties of kernels and images. For any matrix A, its kernel and image both
have the following properties:
Both contain the zero vector: ~0 is in ker A since A~0 = ~0 and ~0 is in im A since A~x = ~0 has a
solution,
Adding two things in either one gives something still in that same space: if ~x and ~y are in
ker A then A(~x + ~y ) = A~x + A~y = ~0 + ~0 = ~0 so ~x + ~y is also in ker A, and if ~b1 and ~b2
are in im A, meaning that A~x = ~b1 has a solution ~x1 and A~x = ~b2 has a solution ~x2 , then
A~x = ~b1 + ~b2 has solution ~x1 + ~x2 , so ~b1 + ~b2 is still in im A, and
Scaling something in either one gives something still in that same space: if ~x is in ker A, then
A(c~x) = cA~x = c~0 = ~0 so c~x is also in ker A, and if A~x = ~b has solution ~x, then A(c~x) = cA~x = c~b
so c~b is still in im A.
Definition of a subspace. A collection V of vectors in Rn is a subspace of Rn if it has all of the
following properties:
The zero vector ~0 is in V ,
V is closed under addition in the sense that if ~u and ~v are in V , then ~u + ~v is also in V ,
V is closed under scalar multiplication in the sense that if ~u is in V and c is any scalar, then
c~u is also in V .
Back to kernels and images. So, for an n m matrix, ker A is a subspace of Rm and im A is a
subspace of Rn .
Example 1. Consider a line in R2 which passes through the origin, such as y = x. This line
consists of all vectors ( x, y ) whose coordinates satisfy y = x. We claim that this is a subspace of R2 .
Indeed, ( 0, 0 ) satisfies y = x, so this line contains the zero vector. Given two vectors on this line,
say ( a, a ) and ( b, b ), their sum ( a + b, a + b ) is also on this line since its x and y coordinates satisfy y = x.
of the xy-plane including the nonnegative x and y-axes. We check whether V is a subspace of R2 .
Based on what we said last time, the answer should be no since V is not {~0}, nor a line through
the origin, nor all of R2 .
First, V does contain the zero vector since ( 0, 0 ) satisfies the requirement that its x and y
coordinates are ≥ 0. V is also closed under addition: if ( x, y ) and ( a, b ) are in V (meaning both
coordinates of each are ≥ 0), then so is ( x + a, y + b ) since

    x + a ≥ 0 and y + b ≥ 0.

However, V is not closed under scalar multiplication: ( 1, 1 ) is in V but -2 ( 1, 1 ) is not. Thus V is not
a subspace of R2 .
Warm-Up 2. We now ask whether the following set of vectors in R3 is a subspace of R3 :

    W = { (x, y, z) : x - y + z = 0 and 2x - y - 2z = 0 }.
We can check the subspace conditions one at a time again. For instance, if (x, y, z) and (a, b, c) are in
W , then each satisfies the equations defining W , so (x + a, y + b, z + c) satisfies

    (x + a) - (y + b) + (z + c) = (x - y + z) + (a - b + c) = 0 + 0 = 0

and

    2(x + a) - (y + b) - 2(z + c) = (2x - y - 2z) + (2a - b - 2c) = 0 + 0 = 0.

Thus (x + a, y + b, z + c) also satisfies the equations defining W , so this is in W and hence W is closed
under addition.
However, there is a simpler way to see that W is a subspace of R3 . The equations defining W
say precisely that

                 x
    1 -1  1      y     0
    2 -1 -2      z  =  0 ,

so W is the same as the kernel of

    1 -1  1
    2 -1 -2 .

Since kernels are always subspaces, W is indeed a subspace of R3 .
Example 1. We determine whether the vectors

    1       1       2            0
    2       0       3            3
    2   ,   4   ,   7   ,  and  -4
    3       5       5            1
in R4 are linearly dependent or independent. According to the definition, we must see if any vector
is a linear combination of the rest. For instance, to check if the first is a linear combination of the
other three we ask whether

        1        2          0       1
        0        3          3       2
    a   4  + b   7  + c    -4   =   2
        5        5          1       3
has a solution for a, b, c. Reducing the corresponding
augmented matrix gives:

    1 2  0 | 1      1  2  0 |  1      1 2  0 |  1
    0 3  3 | 2      0  3  3 |  2      0 3  3 |  2
    4 7 -4 | 2  →   0 -1 -4 | -2  →   0 0 -9 | -4 ,
    5 5  1 | 3      0 -5  1 | -2      0 0  0 | -4
from which we can see there is no solution. Thus the first vector in our collection is not a linear
combination of the others, so it is not redundant.
We move on and ask whether the second vector is a linear combination of the rest; that is, does
        1        2          0       1
        2        3          3       0
    a   2  + b   7  + c    -4   =   4
        3        5          1       5
have a solution for a, b, c. As before, we can reduce the corresponding augmented matrix

    1 2  0 | 1
    2 3  3 | 0
    2 7 -4 | 4
    3 5  1 | 5
until we see that this won't have a solution either. So, the second vector in our collection is not
redundant either.
And so on, we can do the same for the third vector and then the fourth. However, note that this
gets pretty tedious, and you can imagine that if we had more vectors in our collection this process
would become way too much work. We will need a better way to check for dependence/independence.
Important. Vectors ~v1 , . . . , ~vk in Rn are linearly independent if and only if the only solution of
    c1~v1 + · · · + ck~vk = ~0
is the zero solution c1 = = ck = 0. So, to test whether some vectors are linearly independent we
set a linear combination of them equal to ~0 and solve for the corresponding coefficients; if all the
coefficients must be 0 the vectors are independent, if at least one is nonzero they are dependent.
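This test amounts to checking that the matrix with the given vectors as columns has full column rank, which a short exact-arithmetic elimination can decide; a sketch:

```python
from fractions import Fraction as F

def rank(M):
    """Number of pivots after Gaussian elimination (exact arithmetic)."""
    M = [[F(x) for x in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Columns are the four vectors from Example 1.
M = [[1, 1, 2, 0],
     [2, 0, 3, 3],
     [2, 4, 7, -4],
     [3, 5, 5, 1]]
print(rank(M))  # 4, so the only combination giving 0 is the zero one
```

Four pivots means the only solution of the homogeneous system is the zero solution, so the vectors are linearly independent.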
Remark. The idea behind this fact is simple: if one vector is a linear combination of the others,
say

    ~v1 = c2~v2 + · · · + ck~vk ,

we can rewrite this as

    -~v1 + c2~v2 + · · · + ck~vk = ~0

with at least one coefficient, namely the coefficient -1 in front of ~v1 , nonzero. Similarly, if we have

    c1~v1 + · · · + ck~vk = ~0

with at least one coefficient nonzero, say c1 , we can use this equation to solve for ~v1 in terms of
the other vectors by moving c1~v1 to one side and dividing by c1 (which is why we need a nonzero
coefficient).
Back to Example 1. Consider the equation

         1         1         2          0      0
         2         0         3          3      0
    c1   2  + c2   4  + c3   7  + c4   -4  =   0 .
         3         5         5          1      0
Reducing the corresponding augmented matrix gives:

    1 1 2  0 | 0      1  1  2  0 | 0
    2 0 3  3 | 0  →   0 -2 -1  3 | 0 ,
    2 4 7 -4 | 0      0  0  2 -1 | 0
    3 5 5  1 | 0      0  0  0  3 | 0
and we see that the only solution to our equation is the zero solution c1 = c2 = c3 = c4 = 0. Hence
the vectors in Example 1 are linearly independent.
Definition of basis. Suppose that V is a subspace of Rn . A collection of vectors ~v1 , . . . , ~vk in V
are said to be a basis of V if:
~v1 , . . . , ~vk span all of V , and
~v1 , . . . , ~vk are linearly independent.
The point is the following: the first condition says that the basis vectors are enough to be able
to describe any vector in V since anything in V can be written as a linear combination of the
basis vectors, and the second condition says that the basis vectors constitute the fewest number of
vectors for which this spanning condition is true.
Important. Intuitively, a basis of V is a minimal spanning set of V .
Standard basis of Rn . The vectors

    ~e1 = (1, 0) and ~e2 = (0, 1)

form a basis of R2 , called the standard basis of R2 . In general, the standard basis of Rn is the
collection of vectors ~e1 , . . . , ~en where ~ei is the vector with 1 in the i-th position and 0s everywhere
else. Expressing an arbitrary vector in Rn as a linear combination of the standard basis of Rn is
easy:

    x1
    x2
    ..   = x1~e1 + x2~e2 + · · · + xn~en .
    .
    xn
Remark. Note that a space can (and will) have more than one possible basis. For example, the
vectors

    1          1
    1   and    2

also form a basis of R2 . Indeed, for any ( x, y ) in R2 the equation

        1         1       x
    c1  1  + c2   2   =   y
has a solution, so these two vectors span all of R2 , and these two vectors are linearly independent
since neither is a multiple of the other. So, although above we defined what we mean by the
standard basis of Rn , it is important to realize that Rn has many other bases. What is common
to them all however is that they will all consist of n vectors; this is related to the notion of
dimension, which we will come back to next time.
Example 2. We determine bases for the kernel and image of

        0 1 2 0 3
    A = 0 2 4 1 2 .
        0 3 6 2 1

There is a standard way of doing this, using the reduced echelon form of A, which is:

    0 1 2 0  3
    0 0 0 1 -4 .
    0 0 0 0  0
We first use this to find vectors spanning the kernel of A. Anything in the kernel looks like

    x1        s            1        0         0
    x2    -2t - 3u         0       -2        -3
    x3  =     t      =  s  0  + t   1  + u    0  ,
    x4       4u            0        0         4
    x5        u            0        0         1
so

    1       0         0
    0      -2        -3
    0   ,   1  , and   0
    0       0          4
    0       0          1
span ker A. Now, note that these vectors are actually linearly independent! Indeed, the first is not
a linear combination of the other two since any combination of the other two will have a zero first
entry, the second is not a linear combination of the other two since any combination of the first
and third will have a zero third entry, and the third is not a linear combination of the first two
since any combination of the first two will have a zero fifth entry. (This will always happen when
using the procedure we've described before for finding vectors which span the kernel of a matrix.)
Thus the three vectors above form a basis of ker A.
To find a basis of im A, we also look at the reduced echelon form. In this reduced form, note
that the second and fourth columns are the pivot columns, meaning columns which contain a
pivot. It turns out that the corresponding columns of the original matrix form a basis of the image
of A! (Well see why next time.) In our case then, the second and fourth columns of A:
1
0
2 and 1
3
2
form a basis of im A.
Important. To find a basis for the kernel or image of a matrix, find the reduced echelon form of
that matrix. Then:
For the kernel, find vectors spanning the kernel as we have done before using the idea of
factoring out free variables. The resulting vectors will be a basis for the kernel.
For the image, take the columns in the original matrix which correspond to the pivot columns
in the echelon form. These columns form a basis for the image.
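The image half of this recipe can be mechanized: row-reduce, record which columns get pivots, and take those columns of the original matrix. A sketch checked on the matrix A of Example 2 (pivot columns 2 and 4, counting from 1):

```python
from fractions import Fraction as F

def pivot_columns(M):
    """Indices (0-based) of the pivot columns of M, via elimination."""
    M = [[F(x) for x in row] for row in M]
    pivots, r = [], 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    return pivots

A = [[0, 1, 2, 0, 3],
     [0, 2, 4, 1, 2],
     [0, 3, 6, 2, 1]]
cols = pivot_columns(A)
print(cols)  # [1, 3], i.e. the second and fourth columns
basis = [[row[c] for row in A] for c in cols]
print(basis)  # [[1, 2, 3], [0, 1, 2]], the basis of im A found above
```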
Warm-Up 1. We determine for which values of k the vectors

     1       0         -1
     2   ,   3  , and    4
    -1       2           k

are linearly independent. Reducing the augmented matrix of the corresponding system gives:

     1 0 -1 | 0      1 0 -1    | 0      1 0  -1    | 0
     2 3  4 | 0  →   0 3  6    | 0  →   0 3   6    | 0 .
    -1 2  k | 0      0 2  k - 1| 0      0 0  k - 5 | 0
Thus our equation has only the zero solution c1 = c2 = c3 = 0 whenever k ≠ 5, so the three given
vectors are linearly independent as long as k ≠ 5.
For k = 5 our vectors are then linearly dependent, and it should be possible to express one as
a linear combination of the rest. Indeed, when k = 5 we have
        1          0         -1      0
    -   2   + 2    3   -      4   =  0 ,
       -1          2          5      0
and this can be used to express any vector in our original collection as a linear combination of the
other two.
Warm-Up 2. We find a basis for the span of

     1       2       1        4         1
     2       4       2        3        -3
     3   ,   6   ,   5   ,   10  , and  4  .
    -1      -2       1       -2         4
Note that for sure the second vector is redundant since it is a multiple of the first, so removing
it from our collection wont change the span. Instead of trying to see which other vectors are
redundant by inspection, we use a more systematic approach. First, note that the span of these
vectors is the same as the image of the matrix
vectors is the same as the image of the matrix

         1  2 1  4  1
         2  4 2  3 -3
    A =  3  6 5 10  4 .
        -1 -2 1 -2  4
So, we are really looking for a basis for this image, and we saw last time how to find one. Reducing
this matrix a bit gives:
     1  2 1  4  1      1 2 1  4 1
     2  4 2  3 -3      0 0 2 -2 1
     3  6 5 10  4  →   0 0 0  4 4 .
    -1 -2 1 -2  4      0 0 0  0 0
1 2 1 2 4
The claim was that the columns in the original matrix which correspond to pivot columns in the
reduced echelon form give a basis for the image. Our matrix so far is not yet in reduced echelon
form, but we can already tell that the first, third, and fourth columns will be the ones containing
pivots in the end. Thus

     1       1        4
     2       2        3
     3   ,   5   ,   10
    -1       1       -2
forms a basis for the image of A, and hence a basis for the span of the original five vectors.
Let us justify a bit why these vectors indeed form a basis for im A. First, they are linearly
independent: the only solution of
     1 1  4
     2 2  3
     3 5 10   ~x = ~0
    -1 1 -2

is ~x = ~0 since row-reducing this matrix ends up giving

    1 1  4
    0 2 -2
    0 0  4 ,
    0 0  0
which are the first, third and fourth columns from the reduced form of A we computed above.
Second, any other vector among our original list is a linear combination of these three: for instance
if we want to write the fifth vector as
     1          1          1          4
    -3   = c1   2   + c2   2   + c3    3  ,
     4          3          5         10
     4         -1          1         -2
the corresponding augmented matrix reduces to

    1 1  4 | 1
    0 2 -2 | 1
    0 0  4 | 4
    0 0  0 | 0
based on the reduced form of A we previously computed. From here we can see that there are scalars
c1 , c2 , c3 which express the fifth vector in our original list as a linear combination of the three we
are claiming form a basis. These ideas carry over for any matrix, which is why our method for finding
a basis for the image of a matrix always works.
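As a check that nothing was lost, the discarded vectors really are combinations of the claimed basis. Back-substituting in the reduced augmented system above gives c1 = -9/2, c2 = 3/2, c3 = 1 for the fifth vector (these particular coefficients are worked out here, not stated in the notes), which we can verify directly; a sketch:

```python
from fractions import Fraction as F

basis = [[1, 2, 3, -1], [1, 2, 5, 1], [4, 3, 10, -2]]   # columns 1, 3, 4
v2 = [2, 4, 6, -2]    # the discarded second vector
v5 = [1, -3, 4, 4]    # the discarded fifth vector

def combo(coeffs, vecs):
    return [sum(c * v[i] for c, v in zip(coeffs, vecs)) for i in range(4)]

print(combo([F(2), F(0), F(0)], basis) == v2)         # True: v2 = 2 * (first)
print(combo([F(-9, 2), F(3, 2), F(1)], basis) == v5)  # True
```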
Number of linearly independent and spanning vectors. For a subspace V of Rn , any linearly
independent set of vectors in V always has fewer (or as many) vectors than any spanning set of
vectors in V :
    # of any linearly independent vectors ≤ # of any spanning vectors.
For instance, say we have four vectors ~v1 , ~v2 , ~v3 , ~v4 in R3 . Since the standard basis of R3 is a
spanning set with three vectors, our four vectors cannot be linearly independent. So, as soon as we
have more than three vectors in R3 they must be linearly dependent. Similarly, anytime we have
fewer than three vectors in R3 , they cannot possibly span all of R3 since the standard basis is a
linearly independent set with three vectors.
The dimension of a subspace. The above fact tells us the following. Say we have two bases
~v1 , . . . , ~vk and ~w1 , . . . , ~wℓ for a subspace V of Rn . Viewing the ~v s as the linearly independent set
and the ~ws as the spanning set, we get that

    k ≤ ℓ.

Switching roles and viewing the ~v s as spanning and the ~ws as linearly independent gives

    ℓ ≤ k.

These two inequalities together imply that k = ℓ, so we come to the conclusion that any two bases
of V must have the same number of vectors! This common number is what we call the dimension
of V , and we denote it by dim V .
Important. Any two bases of a subspace V of Rn have the same number of vectors, and dim V is
equal to the number of vectors in any basis of V .
Simplified basis check. Say dim V = n. Then any n linearly independent vectors must automatically span V . Indeed, if ~v1 , . . . , ~vn were linearly independent in V and did not span V it
would be possible to extend this to a basis of V , giving a basis of V with more than n = dim V
vectors. This is not possible, so ~v1 , . . . , ~vn being linearly independent is enough to guarantee that
they actually form a basis of V .
We can also see this computationally, say in the case V = Rn . If ~v1 , . . . , ~vn are linearly independent vectors in Rn , the only solution of
     |         |
    ~v1  · · · ~vn   ~x = ~0
     |         |
is the zero vector ~x = ~0. (Here, the matrix is the one whose columns are ~v1 , . . . , ~vn .) But, the only
way in which this can be possible is for the reduced echelon form of this matrix to be the identity:
     |         |        1
    ~v1  · · · ~vn  →     1
     |         |            . . .
                                 1
But with this echelon form, it is true that
     |         |
    ~v1  · · · ~vn   ~x = ~b
     |         |
has a solution no matter what ~b is, meaning that any ~b in Rn is a linear combination of the columns
of this matrix. Thus ~v1 , . . . , ~vn span all of Rn and hence form a basis of Rn .
Similarly, any n vectors in Rn which span Rn must automatically be linearly independent and
hence form a basis of Rn .
Important. In an n-dimensional space V , any n linearly independent vectors automatically form
a basis, and any n spanning vectors automatically form a basis.
The dimensions of the kernel and image of a matrix. Let us compute the dimension of the
kernel and image of

         1  2 1  4  1
         2  4 2  3 -3
    A =  3  6 5 10  4 .
        -1 -2 1 -2  4

We previously reduced this in the second Warm-Up to end up with

         1 2 1  4 1
    A →  0 0 2 -2 1 .
         0 0 0  4 4
         0 0 0  0 0
Each pivot column contributes one basis vector for the image, so dim im A = 3. Notice that this
is precisely the same as the rank of A, and we finally have our long-awaited meaning behind the
rank of a matrix: it is simply the dimension of its image! Based on the method we saw last time
for finding a basis for the kernel of a matrix, each free variable contributes one basis vector for the
kernel, so dim ker A equals the number of free variables. In this case, dim ker A = 2.
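The pivot-counting argument can be sketched in code. This is a small illustration (not from the book), using the echelon form reached above with its entries as printed here; the count of pivot columns is dim im A, and the remaining columns correspond to free variables, giving dim ker A:

```python
# Count pivots in an echelon form: dim(im A) = number of pivot columns,
# dim(ker A) = number of columns - number of pivot columns (free variables).
echelon = [
    [1, 2, 1, 4, 1],
    [0, 0, 2, 2, 1],
    [0, 0, 0, 4, 4],
    [0, 0, 0, 0, 0],
]

pivot_cols = []
for row in echelon:
    for j, entry in enumerate(row):
        if entry != 0:
            pivot_cols.append(j)  # first nonzero entry of a nonzero row
            break

rank = len(pivot_cols)              # dim im A
nullity = len(echelon[0]) - rank    # dim ker A
print(pivot_cols, rank, nullity)    # [0, 2, 3] 3 2
```

Whatever the exact signs in A, rank plus nullity always equals the number of columns, which is the rank-nullity relationship.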
Important. For any matrix A, dim ker A + dim im A equals the number of columns of A: dim im A is the rank (the number of pivots) and dim ker A is the number of free variables.
Warm-Up. We determine, depending on the value of t, the dimension of the subspace of R^3 spanned by

    (1, t, 2),  (3, 2t - 2, t + 4),  and  (4, 2 - 4t, t + 10).
(Note that these are not quite the vectors I used in the actual Warm-Up in class, which is what
led to my confusion. I think I dropped the t in the first vector and just used t by mistake; these
vectors above better illustrate what I meant the Warm-Up to show.) We can view this subspace as
the image of the matrix

    [ 1    3       4      ]
    [ t   2t - 2  2 - 4t  ]
    [ 2   t + 4   t + 10  ],
so we are really asking about finding the dimension of the image of this matrix, which is equal to
the rank of this matrix. We row-reduce:

    [ 1    3       4      ]     [ 1   3      4     ]     [ 1   3      4 ]
    [ t   2t - 2  2 - 4t  ]  →  [ 0  t - 2   2     ]  →  [ 0  t - 2   2 ]
    [ 2   t + 4   t + 10  ]     [ 0  t - 2  t + 2  ]     [ 0   0      t ].
Now we can see that our answer will depend on what t is: when t = 0 or t = 2, our matrix has rank 2 and so our original vectors span a 2-dimensional subspace of R^3, while if t ≠ 0 and t ≠ 2 our matrix has rank 3 and the span of the original vectors is 3-dimensional.
Geometrically, when t = 0 or t = 2 our vectors span a plane, while for all other values of t our vectors span all of R^3. In this case, there are no values of t for which our vectors span only a line.
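As a quick sanity check, here is a sketch that samples a few values of t and computes the rank of the triangular form reached above (assuming its entries as reconstructed here):

```python
# Rank of a 3x3 matrix by Gaussian elimination, applied to the triangular
# form [[1,3,4],[0,t-2,2],[0,0,t]] for several sample values of t.
def rank3(m, tol=1e-9):
    m = [row[:] for row in m]
    rank, pivot_row = 0, 0
    for col in range(3):
        for r in range(pivot_row, 3):
            if abs(m[r][col]) > tol:
                m[pivot_row], m[r] = m[r], m[pivot_row]
                for rr in range(pivot_row + 1, 3):
                    factor = m[rr][col] / m[pivot_row][col]
                    m[rr] = [x - factor * y for x, y in zip(m[rr], m[pivot_row])]
                pivot_row += 1
                rank += 1
                break
    return rank

def triangular(t):
    return [[1, 3, 4], [0, t - 2, 2], [0, 0, t]]

ranks = {t: rank3(triangular(t)) for t in [0, 2, 1, -1, 5]}
print(ranks)  # {0: 2, 2: 2, 1: 3, -1: 3, 5: 3}
```

The rank drops to 2 exactly at t = 0 and t = 2, matching the case analysis above.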
Geometric meaning of non-invertibility. Suppose that A is a non-invertible 2 × 2 matrix and consider the transformation T(~x) = A~x. Then rank A is either 0 or 1. When rank A = 0, the image of T is zero-dimensional and so consists of just the origin {~0}; thus in this case the transformation T collapses all of R^2 to a single point. When rank A = 1, the image of T is a 1-dimensional line, so T in this case collapses all of R^2 to this line. Thus, geometrically, a transformation R^2 → R^2 is non-invertible precisely when it collapses 2-dimensional things down to a single point or a line. Analogously, the transformation T corresponding to a non-invertible 3 × 3 matrix will collapse 3-dimensional things down to a single point, a line, or a plane. A matrix is invertible precisely when the corresponding transformation does not do any such collapsing.
Amazingly Awesome Theorem, continued. The following are equivalent to an n × n matrix A being invertible:

The columns of A are linearly independent
The columns of A span all of R^n
The columns of A form a basis of R^n
The kernel of A is zero-dimensional
The image of A is n-dimensional

The first two conditions were actually included in the original version of the Amazingly Awesome Theorem I gave, only we hadn't defined "linearly independent" and "span" at that point. I'm listing them here again for emphasis.
Geometric idea behind coordinates. The idea behind coordinates is the following. Usually we represent vectors in terms of the usual x- and y-axes, which geometrically are the span of (1, 0) and of (0, 1) respectively. Thus usually we are using the standard basis of R^2 to represent vectors. However, there is nothing stopping us from using a different set of axes (i.e. a different basis) for R^2 to represent vectors. Doing so can help to clarify some properties of linear transformations; in particular, we'll see examples where picking the "right" set of axes can make the geometric interpretation of some transformations easier to identify.
Definition of coordinates. Suppose that B = {~v1, . . . , ~vn} is a basis of R^n. We know that we can then express any ~x in R^n as a linear combination of the basis vectors in B:

    ~x = c1~v1 + · · · + cn~vn for some scalars c1, . . . , cn.

The scalars c1, . . . , cn are called the coordinates of ~x relative to B, and the vector [~x]_B = (c1, . . . , cn) is the coordinate vector of ~x relative to B. The usual way of writing a vector just lists its coordinates relative to the standard basis; this is what I mean when I say that usually when we express vectors, we are doing so in terms of the standard basis.
Example 2. Consider the basis B = {(1, 1), (-1, 1)} of R^2. This is a basis because these vectors are linearly independent, and any two linearly independent vectors in a 2-dimensional space automatically span that space. We find the coordinates of any vector (a, b) relative to this basis. We want c1 and c2 satisfying

    (a, b) = c1 (1, 1) + c2 (-1, 1).
We can solve this equation using an augmented matrix as usual, or we can note the following: this equation can be written as

    [ 1  -1 ] [ c1 ]   [ a ]
    [ 1   1 ] [ c2 ] = [ b ],

so we can solve for the coordinates we want by multiplying both sides by the inverse of this matrix! This equation says that our old coordinates a and b are related to our new ones c1 and c2 via multiplication by this matrix; because of this we call

    [ 1  -1 ]
    [ 1   1 ]

the change of basis matrix from the basis B to the standard basis. Since

    [ c1 ]   [ 1  -1 ]^{-1} [ a ]
    [ c2 ] = [ 1   1 ]      [ b ],

we would call

    [ 1  -1 ]^{-1}         [  1  1 ]
    [ 1   1 ]      = (1/2) [ -1  1 ]

the change of basis matrix from the standard basis to B; in other words, it is this inverse matrix which tells us how to move from old (standard) coordinates to new coordinates.
Thus in our case,

    [(a, b)]_B = [ c1 ]         [  1  1 ] [ a ]   [ (a + b)/2 ]
                 [ c2 ] = (1/2) [ -1  1 ] [ b ] = [ (b - a)/2 ].
Important. Given a basis B = {~v1, . . . , ~vn} of R^n, the matrix S whose columns are the basis vectors is called the change of basis matrix from B to the standard basis, and its inverse S^{-1} is the change of basis matrix from the standard basis to B. (Note that S is invertible according to the Amazingly Awesome Theorem since its columns form a basis of R^n.) These matrices have the properties that

    [~x]_old = S [~x]_new  and  [~x]_new = S^{-1} [~x]_old,

where by "new" we mean relative to the basis B and by "old" we mean relative to the standard basis. The book doesn't use the term "change of basis matrix", but I think it is a useful term since it emphasizes the role the matrix S plays in moving between different coordinates.
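Here is a small sketch of this recipe for the basis B = {(1, 1), (-1, 1)} from Example 2, using exact fractions; multiplying by S^{-1} converts standard coordinates into B-coordinates:

```python
# Coordinates relative to B = {(1,1), (-1,1)} via the 2x2 inverse formula.
from fractions import Fraction

def inv2(m):
    # inverse of a 2x2 matrix via the ad - bc formula
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

S = [[1, -1], [1, 1]]        # change of basis matrix from B to standard
S_inv = inv2(S)              # change of basis matrix from standard to B

a, b = 3, 5
c1, c2 = apply(S_inv, (a, b))  # coordinates of (a, b) relative to B
print(c1, c2)                  # 4 1, matching ((a+b)/2, (b-a)/2)
```

As a check, c1 (1, 1) + c2 (-1, 1) = (c1 - c2, c1 + c2) recovers (a, b).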
Example 3. We find the coordinates of (3, 5) relative to the basis B = {(3, 5), (7, 12)} of R^2; that is, we want c1 and c2 such that

    (3, 5) = c1 (3, 5) + c2 (7, 12).

The change of basis matrix in this case is

    S = [ 3   7 ]
        [ 5  12 ],

so the coordinates we want are given by (c1, c2) = S^{-1} (3, 5). Geometrically, the axes determined by our basis vectors are non-perpendicular lines, and these coordinates tell us how far along these axes we must go in order to reach (3, 5).
After all, (1, 1) is the first basis vector already, so we don't even need the second basis vector if we want to express the first as a linear combination of the two. The first column of the matrix of T relative to B is then (1, 0).
We have

    T(-1, 1) = (1, -1)

since anything perpendicular to the line we reflect across just has its direction flipped around. The coordinates of this relative to B are

    [T(-1, 1)]_B = [(1, -1)]_B = (0, -1)
since we don't even need to use the first basis vector in order to express the second as a linear combination of the two. The matrix of T relative to B is thus

    [T]_B = [ [T(1,1)]_B  [T(-1,1)]_B ] = [ 1   0 ]
                                          [ 0  -1 ].
The point of this matrix is the following: say we want to figure out what T does to some vector ~x. We can determine this by taking the coordinate vector of ~x relative to B and multiplying that by [T]_B; the result will be the coordinate vector of T(~x) relative to B:

    [T(~x)]_B = [T]_B [~x]_B.

In this example, the matrix of T relative to B tells us that geometrically T leaves the span{(1, 1)}-coordinate of a vector alone (due to the first column being (1, 0)) but changes the sign of the span{(-1, 1)}-coordinate (due to the second column being (0, -1)), which is precisely what reflection across y = x should do.
The point is that by switching to a "better" set of axes, we have somewhat simplified the geometric description of T. We'll see more and better examples of this next time.
Warm-Up. We find the matrix of the reflection T across the line spanned by (4, 3) relative to the basis B = {(4, 3), (-3, 4)} of R^2. Recall that this matrix is given by

    [T]_B = [ [T(4, 3)]_B  [T(-3, 4)]_B ],

where we take the coordinate vectors of T(4, 3) and T(-3, 4) relative to the basis B. Note that on a previous homework assignment you computed the matrix of this reflection relative to the standard basis, which ended up being

    (1/25) [  7  24 ]
           [ 24  -7 ].

The point is that the matrix of T relative to B is much simpler than this, because the basis B has nice properties with respect to T.
First,

    T(4, 3) = (4, 3)

since (4, 3) is on the line we are reflecting across. The coordinate vector of this relative to B is

    [(4, 3)]_B = (1, 0)

since (4, 3) is itself the first vector in our basis. Next,

    T(-3, 4) = (3, -4)

since any vector perpendicular to the line of reflection simply has its direction changed by multiplying by -1. This has coordinate vector

    [(3, -4)]_B = (0, -1)

since (3, -4) is 0 times the first basis vector plus -1 times the second. The matrix of T relative to B is then

    [T]_B = [ 1   0 ]
            [ 0  -1 ].
As we know from the geometric description of T, this matrix suggests that T leaves the span{(4, 3)}-direction of a vector alone while flipping the span{(-3, 4)}-direction. Also, as we said earlier, this matrix is much simpler than the matrix of T relative to the standard basis.
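We can confirm the relationship between the two matrices mechanically: conjugating the standard-basis reflection matrix by the change of basis matrix S, whose columns are (4, 3) and (-3, 4), should produce the simple diagonal form. A sketch with exact fractions:

```python
# Check [T]_B = S^{-1} A S for the reflection across span{(4,3)},
# with A = (1/25)[[7,24],[24,-7]] as the standard-basis matrix.
from fractions import Fraction

def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

F = Fraction
A = [[F(7, 25), F(24, 25)], [F(24, 25), F(-7, 25)]]  # reflection, standard basis
S = [[F(4), F(-3)], [F(3), F(4)]]                    # columns: the basis B

T_B = matmul(matmul(inv2(S), A), S)
print(T_B == [[1, 0], [0, -1]])  # True
```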
Similarly, we can consider the transformation R which is orthogonal projection of R^2 onto the line spanned by (4, 3). Since this satisfies

    R(4, 3) = (4, 3)  and  R(-3, 4) = (0, 0),

the matrix of R relative to B is

    [R]_B = [ 1  0 ]
            [ 0  0 ].

This again is much simpler than the matrix of R relative to the standard basis, which is

    (1/25) [ 16  12 ]
           [ 12   9 ].
Important. We emphasize that the matrix of T relative to B satisfies

    [T(~x)]_B = [T]_B [~x]_B,

which says that to determine the result of applying T to a vector ~x, we can take the coordinate vector of ~x relative to B and multiply it by [T]_B; the result will be the coordinate vector of T(~x) relative to B. You should view this equation as analogous to T(~x) = A~x, only now we are writing everything in terms of a new basis.
Example 1. We want to come up with a geometric description of the transformation defined by

    T(~x) = [ 13  -6 ] ~x.
            [ -1  12 ]

For a first attempt, we compute:

    T(1, 0) = (13, -1)  and  T(0, 1) = (-6, 12).
If we draw these vectors it is not clear at all what kind of geometric properties T has: it doesn't appear to be a rotation nor a reflection, and it's hard to guess whether it might be some kind of shear or something else. The problem is that we're using the standard basis of R^2 to analyze T.
Instead, let us compute the matrix of T relative to the basis from the first Warm-Up: B = {(2, 1), (-3, 1)}. We have

    T(2, 1) = [ 13  -6 ] [ 2 ]   [ 20 ]
              [ -1  12 ] [ 1 ] = [ 10 ],

which has coordinates 10 and 0 relative to B since it is just 10 times the first basis vector. Also,

    T(-3, 1) = [ 13  -6 ] [ -3 ]   [ -45 ]
               [ -1  12 ] [  1 ] = [  15 ],
which has coordinates 0 and 15 since it is 15 times the second basis vector. The matrix of T relative
to B is thus

    [T]_B = [ 10   0 ]
            [  0  15 ].
Now, what does this tell us? If we consider the axes corresponding to B, the form of this matrix tells us that T acts by scaling the span{(2, 1)}-direction by a factor of 10 and scaling the span{(-3, 1)}-direction by a factor of 15. Thus we do have a pretty nice description of what T does geometrically, which would have been nearly impossible to determine given the original definition of T.
Now, in the first Warm-Up we computed that

    [(1, 2)]_B = (7/5, 3/5).

Recall that the matrix of T relative to B satisfies

    [T(~x)]_B = [T]_B [~x]_B,

which says that multiplying the coordinate vector of ~x by [T]_B gives the coordinate vector of T(~x). Thus we should have

    [T(1, 2)]_B = [ 10   0 ] [ 7/5 ]   [ 70/5 ]   [ 14 ]
                  [  0  15 ] [ 3/5 ] = [ 45/5 ] = [  9 ].
Indeed, we can directly compute T(1, 2) as

    T(1, 2) = [ 13  -6 ] [ 1 ]   [  1 ]
              [ -1  12 ] [ 2 ] = [ 23 ],
and you can check that the coordinates of this relative to B are indeed 14 and 9:

    (1, 23) = 14 (2, 1) + 9 (-3, 1).
Geometrically, in terms of our new axes, scaling 7/5 by 10 and 3/5 by 15 does look like it should give the coordinates of (1, 23).
The point again is that T is much simpler to describe now that we've switched to a new basis.
Remark. A fair question to ask at this point is: how did I know that the basis consisting of (2, 1) and (-3, 1) was the right one to use? We'll come back to this later; the answer is related to what are called eigenvalues and eigenvectors.
Example 2. We find the matrix of

    T(~x) = [ 4  2  2 ]
            [ 2  4  2 ] ~x
            [ 2  2  4 ]

relative to the basis

    (1, 0, -1),  (0, 1, -1),  (1, 1, 1)

of R^3. We have

    T(1, 0, -1) = (2, 0, -2),  T(0, 1, -1) = (0, 2, -2),  and  T(1, 1, 1) = (8, 8, 8),

which respectively have coordinate vectors

    (2, 0, 0),  (0, 2, 0),  and  (0, 0, 8)

relative to our basis. Thus the matrix of T relative to this basis is

    [ 2  0  0 ]
    [ 0  2  0 ]
    [ 0  0  8 ].
We can now see that, geometrically, T scales by a factor of 2 in the direction of (1, 0, -1), it scales by a factor of 2 in the direction of (0, 1, -1), and it scales by a factor of 8 in the direction of (1, 1, 1).
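A direct check of these three scaling claims (a short sketch, with the vectors as reconstructed above):

```python
def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
pairs = [([1, 0, -1], 2), ([0, 1, -1], 2), ([1, 1, 1], 8)]

for v, factor in pairs:
    # each basis vector is mapped to a scalar multiple of itself
    assert matvec(A, v) == [factor * x for x in v]

print("each basis vector is scaled as claimed")
```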
Back to Example 1. Let

    A = [ 13  -6 ]
        [ -1  12 ]

be the matrix of Example 1; we want to now compute A^100. It will be of no use to try to multiply A by itself 100 times, or to multiply it by itself enough times until we notice some kind of pattern. We need a better way to do this.
The key comes from the following equation:

    [ 13  -6 ]   [ 2  -3 ] [ 10   0 ] [ 2  -3 ]^{-1}
    [ -1  12 ] = [ 1   1 ] [  0  15 ] [ 1   1 ]     .
You can certainly multiply out the right-hand side to see why this is true, but I claim that we know it has to be true without doing any further computation, by thinking about what each of these matrices is supposed to represent. Recall that

    S = [ 2  -3 ]
        [ 1   1 ]

is the change of basis matrix from the basis B = {(2, 1), (-3, 1)} of R^2 to the standard basis and that

    B = [ 10   0 ]
        [  0  15 ]

is the matrix of the transformation T(~x) = A~x relative to B. Take a vector ~x and consider SBS^{-1}~x.
First, S^{-1} is the change of basis matrix from the standard basis to B, so S^{-1}~x gives the coordinate vector of ~x relative to B. Now, since B tells us what T does relative to our new basis, multiplying S^{-1}~x by B gives us the coordinate vector of T(~x) relative to B. Finally, multiplying by S takes this coordinate vector and expresses it back in terms of coordinates relative to the standard basis. The end result is that SBS^{-1}~x gives T(~x) expressed in terms of the standard basis, but this is precisely what A~x is supposed to be! In other words, the transformation corresponding to the product SBS^{-1} does the same thing to ~x as does A, so

    A = SBS^{-1}

as claimed.
Definition of similar matrices. Two matrices A and B are said to be similar if there is an invertible matrix S satisfying A = SBS^{-1}.
Important. Similar matrices represent the same linear transformation only with respect to different bases. In particular, for an n × n matrix A and a basis B of R^n, A is similar to the matrix B of the transformation T(~x) = A~x relative to B; i.e.

    A = SBS^{-1}
where S is the change of basis matrix. Raising A = SBS^{-1} to the 100th power makes all the inner S^{-1}S factors cancel, leaving A^100 = SB^100S^{-1}. In our example:

    [ 13  -6 ]^100   [ 2  -3 ] [ 10   0 ]^100 [ 2  -3 ]^{-1}
    [ -1  12 ]     = [ 1   1 ] [  0  15 ]     [ 1   1 ]     ,

which is simple to compute now since powers of a diagonal matrix are easy to compute:

    [ 10   0 ]^100   [ 10^100     0    ]
    [  0  15 ]     = [   0     15^100  ],

so

    A^100 = [ 2  -3 ] [ 10^100     0    ] [ 2  -3 ]^{-1}
            [ 1   1 ] [   0     15^100  ] [ 1   1 ]

          = [ 2·10^100  -3·15^100 ]       [  1  3 ]
            [  10^100    15^100   ] (1/5) [ -1  2 ]

          = (1/5) [ 2·10^100 + 3·15^100    6·10^100 - 6·15^100 ]
                  [  10^100 - 15^100       3·10^100 + 2·15^100 ]
as desired. Note how useful it was to compute the matrix of the transformation corresponding to
A in terms of another well-chosen basis!
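Since Python integers never overflow, we can actually check this closed form for A^100 exactly. A sketch using fast matrix exponentiation:

```python
# Verify the closed form for A^100 with exact integer arithmetic.
def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(m, p):
    # repeated squaring
    result = [[1, 0], [0, 1]]
    while p:
        if p & 1:
            result = matmul(result, m)
        m = matmul(m, m)
        p >>= 1
    return result

A = [[13, -6], [-1, 12]]
A100 = matpow(A, 100)

a, b = 10**100, 15**100
closed_form = [[(2*a + 3*b) // 5, (6*a - 6*b) // 5],
               [(a - b) // 5,     (3*a + 2*b) // 5]]
print(A100 == closed_form)  # True
```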
Thus the first basis vector ~v1 would need to have the property that applying T to it gave a multiple of it; however, rotating a nonzero vector by an angle strictly between 0° and 180° can never produce a multiple of that vector! Similarly, the second basis vector would need to satisfy

    T(~v2) = b~v2,

and again such an equation cannot possibly hold for the type of rotation we are considering. This shows that there will never be a basis of R^2 relative to which the matrix of T has the form

    [ a  0 ]
    [ 0  b ],

so A is not similar to a diagonal matrix. Again, the key point is in realizing what having a diagonal matrix as the matrix of T relative to a basis would mean about what happens when we apply T to those basis vectors.
Warm-Up 2. We claim that any 2 × 2 reflection matrix A is similar to

    [ 1   0 ]
    [ 0  -1 ].

This would require a basis {~v1, ~v2} of R^2 relative to which the matrix of T(~x) = A~x is this diagonal matrix. Again by considering the columns of this matrix as the coordinate vectors of T(~v1) and T(~v2), this means our basis must satisfy

    T(~v1) = ~v1  and  T(~v2) = -~v2.

In this case we can always find such a basis: take ~v1 to be any nonzero vector on the line we are reflecting across and ~v2 to be any nonzero vector perpendicular to the line we are reflecting across. Such vectors indeed satisfy T(~v1) = ~v1 and T(~v2) = -~v2, so the matrix of T relative to the basis {~v1, ~v2} is the diagonal matrix above, as desired. The upshot is that all reflections "look similar" since they all look like this matrix after picking the right basis. This justifies the use of the term "similar" to describe such matrices.
We can also say that any 2 × 2 reflection matrix is also similar to

    [ -1  0 ]
    [  0  1 ].

Indeed, take the basis {~v1, ~v2} from before but switch their order and consider the basis {~v2, ~v1}. Doing so has the effect of switching the columns in the corresponding matrix of T.
Warm-Up 3. Finally, we find the matrix A of the orthogonal projection T of R^3 onto the line span{(0, 1, 1)} relative to the standard basis. Using previous techniques we would have to compute

    T(1, 0, 0),  T(0, 1, 0),  and  T(0, 0, 1),

which give the columns of the matrix we want. This is not so hard in this case, but could be more complicated if we were projecting onto a different line. Here's another way to answer this which in other situations is much simpler.
We know that the matrix A we're looking for will be similar to the matrix B of T relative to any basis of R^3:

    A = SBS^{-1}

where S is the change of basis matrix. Thus we can find A by computing SBS^{-1}. In order to make this worthwhile, we should find a basis of R^3 relative to which B will be simpler; thinking about orthogonal projections geometrically we can always find such a basis: take ~v1 to be a nonzero vector on the line we're projecting onto and ~v2 and ~v3 to be nonzero vectors perpendicular to that line. For instance,
    ~v1 = (0, 1, 1),  ~v2 = (0, 1, -1),  and  ~v3 = (1, 0, 0)

work. For these vectors, we have

    T(~v1) = ~v1,  T(~v2) = ~0,  and  T(~v3) = ~0

since projecting a vector already on a line onto that line leaves that vector alone and projecting a vector perpendicular to a line onto that line gives the zero vector. The coordinates of the above vectors relative to the basis {~v1, ~v2, ~v3} of R^3 are respectively

    (1, 0, 0),  (0, 0, 0),  and  (0, 0, 0).
With the change of basis matrix S given by

    S = [ 0   0  1 ]
        [ 1   1  0 ]
        [ 1  -1  0 ],

we then have

        [ 0   0  1 ] [ 1  0  0 ]
    A = [ 1   1  0 ] [ 0  0  0 ] S^{-1}.
        [ 1  -1  0 ] [ 0  0  0 ]

Computing S^{-1} (say by row-reducing the augmented matrix [ S | I ]) gives

    S^{-1} = [ 0  1/2   1/2 ]
             [ 0  1/2  -1/2 ]
             [ 1   0     0  ],

so

        [ 0   0  1 ] [ 1  0  0 ] [ 0  1/2   1/2 ]   [ 0   0    0  ]
    A = [ 1   1  0 ] [ 0  0  0 ] [ 0  1/2  -1/2 ] = [ 0  1/2  1/2 ]
        [ 1  -1  0 ] [ 0  0  0 ] [ 1   0     0  ]   [ 0  1/2  1/2 ]
is the matrix of T relative to the standard basis. Again, this would not have been hard to find using earlier methods, but this new method might be easier to apply in other situations, say when projecting onto the line spanned by (1, 2, 3) for instance.
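The product SBS^{-1} can be checked mechanically. A sketch with exact fractions (S^{-1} was found by row reduction above; the code verifies SS^{-1} = I before using it):

```python
# Compute A = S B S^{-1} for the projection onto span{(0,1,1)}.
from fractions import Fraction

def matmul(m, n):
    size = len(m)
    return [[sum(m[i][k] * n[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

F = Fraction
S = [[0, 0, 1], [1, 1, 0], [1, -1, 0]]   # columns: v1, v2, v3
S_inv = [[F(0), F(1, 2), F(1, 2)],
         [F(0), F(1, 2), F(-1, 2)],
         [F(1), F(0), F(0)]]
B = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]    # matrix of T relative to {v1, v2, v3}

assert matmul(S, S_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A = matmul(matmul(S, B), S_inv)
print(A == [[0, 0, 0], [0, F(1, 2), F(1, 2)], [0, F(1, 2), F(1, 2)]])  # True
```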
Determinants. The determinant of a (square) matrix is a certain number we compute from that matrix. The determinant of A is denoted by |A| or by det A. This one number will encode much information about A: in particular, it will determine completely whether or not A is invertible. More importantly, it has an important geometric interpretation, which we will come to over the next few lectures.
To start with, the determinant of a 2 × 2 matrix is a number we've seen before in the formula for the inverse of a 2 × 2 matrix:

    det [ a  b ]
        [ c  d ] = ad - bc.

In this case, it is true that a 2 × 2 matrix is invertible if and only if its determinant is nonzero; this will be true in general.
Remark on Section 6.1. The definition of determinants and the method for computing them given in Section 6.1 is ridiculous: technically it is correct, but it makes the idea of a determinant way too complicated. In particular, ignore anything having to do with "patterns" and "inversions". Instead, you should use the method of cofactor (or Laplace) expansion described towards the end of Section 6.2 when computing determinants. The rest of 6.1 contains useful facts that we'll come to, but seriously, I have no idea why the author chose to define determinants in the way he does.
Example 1. We illustrate the method of cofactor expansion by computing the following determinant:

    det [  1  2   3 ]
        [  4  5  -1 ]
        [ -2  3   1 ].

To expand along the first row means we take each entry of the first row and multiply it by the determinant of the matrix left over when we cross out the row and column that entry is in. So, for the entry 1, crossing out the row and column it is in (first row and first column in this case) leaves us with

    [ 5  -1 ]
    [ 3   1 ],

so we take

    1 · det [ 5  -1 ]
            [ 3   1 ]

as part of our cofactor expansion. We do the same with the 2 and 3 in the first row, giving the terms

    2 · det [  4  -1 ]   and   3 · det [  4  5 ]
            [ -2   1 ]                 [ -2  3 ]

in our expansion. The last thing to determine is what to do with the terms we've found: starting with a + sign associated to the upper-left corner entry of our matrix, we alternate between assigning +'s and -'s to all other entries, so in the 3 × 3 case we would have

    [ +  -  + ]
    [ -  +  - ]
    [ +  -  + ].

These signs tell us what to do with the corresponding terms in the cofactor expansion, so our cofactor expansion along the first row becomes

    + 1 · det [ 5  -1 ]  - 2 · det [  4  -1 ]  + 3 · det [  4  5 ]
              [ 3   1 ]            [ -2   1 ]            [ -2  3 ].
We are now down to computing 2 × 2 determinants, which we know how to do; all together we have

    det [  1  2   3 ]
        [  4  5  -1 ] = 1(5 + 3) - 2(4 - 2) + 3(12 + 10) = 8 - 4 + 66 = 70.
        [ -2  3   1 ]
Let us now compute the same determinant, only doing an expansion along the second column. So we look at the terms

    2 · det [  4  -1 ],   5 · det [  1  3 ],   and   3 · det [ 1   3 ]
            [ -2   1 ]            [ -2  1 ]                  [ 4  -1 ],

which we get the same way as before: moving down the second column and multiplying each entry by the determinant of what's left after crossing out the row and column that entry is in. In this case, the first entry in the second column comes with a - sign, and alternating signs down the column gives us

    - 2 · det [  4  -1 ]  + 5 · det [  1  3 ]  - 3 · det [ 1   3 ]
              [ -2   1 ]            [ -2  1 ]            [ 4  -1 ].

Thus

    det [  1  2   3 ]
        [  4  5  -1 ] = -2(4 - 2) + 5(1 + 6) - 3(-1 - 12) = -4 + 35 + 39 = 70,
        [ -2  3   1 ]
agreeing with our answer from when we expanded along the first row.
Important. Performing a cofactor expansion along any row or any column of a matrix will always
give the same value. Choose the row or column which makes computations as simple as possible,
which usually means choose the row or column with the most zeroes.
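Cofactor expansion along the first row translates directly into a short recursive function. This sketch is enough to check the 3 × 3 example above:

```python
# Cofactor (Laplace) expansion along the first row, implemented recursively.
def det(m):
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j+1:] for row in m[1:]]
        sign = (-1) ** j          # the +, -, +, ... pattern along the first row
        total += sign * m[0][j] * det(minor)
    return total

example = [[1, 2, 3], [4, 5, -1], [-2, 3, 1]]
print(det(example))  # 70
```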
Example 2. We compute the determinant of

    [ 3  -4   1   2 ]
    [ 3   0  -1   5 ]
    [ 0   2  -1   0 ]
    [ 1  -3   2  -1 ]

using a cofactor expansion along the third row, since this row has two zeroes in it. Any term coming from a zero entry will automatically be zero, so we only get:

    det [ 3  -4   1   2 ]
        [ 3   0  -1   5 ]            [ 3   1   2 ]            [ 3  -4   2 ]
        [ 0   2  -1   0 ] = -2 · det [ 3  -1   5 ] - 1 · det [ 3   0   5 ].
        [ 1  -3   2  -1 ]            [ 1   2  -1 ]            [ 1  -3  -1 ]
Note that the signs follow the same pattern as before. Now we must compute each of these 3 × 3 determinants, and we do so by again using a cofactor expansion on each. Expanding along the first row in the first and along the second row in the second, we get

    det [ 3   1   2 ]
        [ 3  -1   5 ] = 3 · det [ -1  5 ] - 1 · det [ 3   5 ] + 2 · det [ 3  -1 ]
        [ 1   2  -1 ]           [  2 -1 ]           [ 1  -1 ]           [ 1   2 ]

                      = 3(-9) + 1(8) + 2(7)

                      = -5

and

    det [ 3  -4   2 ]
        [ 3   0   5 ] = -3 · det [ -4   2 ] - 5 · det [ 3  -4 ]
        [ 1  -3  -1 ]            [ -3  -1 ]           [ 1  -3 ]

                      = -3(10) - 5(-5)

                      = -5.
Putting it all together gives

    det [ 3  -4   1   2 ]
        [ 3   0  -1   5 ]
        [ 0   2  -1   0 ] = -2(-5) - 1(-5) = 10 + 5 = 15.
        [ 1  -3   2  -1 ]
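The same recursive cofactor-expansion sketch from before checks this 4 × 4 computation (entries as reconstructed above, so treat the exact signs as an assumption):

```python
# Recursive cofactor expansion along the first row.
def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j+1:] for row in m[1:]])
               for j in range(len(m)))

A = [[3, -4, 1, 2],
     [3, 0, -1, 5],
     [0, 2, -1, 0],
     [1, -3, 2, -1]]
print(det(A))  # 15
```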
Amazingly Awesome Theorem, continued. A square matrix is invertible if and only if its
determinant is not zero. (We will come back to why this is true later.)
Formula for inverses. As in the 2 × 2 case, there is actually a concrete formula for the inverse of any square matrix. This formula looks like

    A^{-1} = (1/det A) · (something pretty complicated),

which helps to explain why we need det A ≠ 0 in order for A to be invertible. The fact that such a concrete formula exists is nice for certain theoretical reasons, but it is not very practical due to the complicated nature of the matrix involved. Indeed, even for 3 × 3 matrices it will be faster to compute inverses using the technique we've previously described. So, we won't say much more about this explicit formula.
Warm-Up 1. We determine the values of λ for which the matrix

    [ λ-1   -3    -3  ]
    [  3   λ+5     3  ]
    [ -3    -3   λ-1  ]

is not invertible. Trying to do this using previous techniques, we would have to perform row operations until we can determine what the rank of this matrix will be. The trouble is that with all those λ's floating around, these row operations will get a little tedious. Instead, we can simply figure out when this matrix will have zero determinant.
Doing a cofactor expansion along the first row, we have:

    det [ λ-1   -3    -3  ]
        [  3   λ+5     3  ] = (λ-1) det [ λ+5   3  ] + 3 det [  3   3  ] - 3 det [  3  λ+5 ]
        [ -3    -3   λ-1  ]             [ -3   λ-1 ]         [ -3  λ-1 ]         [ -3  -3  ]

        = (λ-1)[(λ+5)(λ-1) + 9] + 3[3(λ-1) + 9] - 3[-9 + 3(λ+5)]

        = (λ-1)(λ^2 + 4λ + 4) + 3(3λ + 6) - 3(3λ + 6)

        = (λ-1)(λ+2)^2.

Thus, the given matrix has determinant equal to 0 only for λ = 1 and λ = -2, so these are the only two values of λ for which the matrix is not invertible.
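We can spot-check the factored determinant at a few values of λ (a sketch; the matrix entries are as reconstructed above):

```python
# Spot-check det = (lam - 1)(lam + 2)^2 for the Warm-Up matrix.
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def warmup_matrix(lam):
    return [[lam - 1, -3, -3],
            [3, lam + 5, 3],
            [-3, -3, lam - 1]]

for lam in [-3, -2, 0, 1, 2, 7]:
    assert det3(warmup_matrix(lam)) == (lam - 1) * (lam + 2) ** 2

print("determinant factors as (lam - 1)(lam + 2)^2 at all sampled points")
```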
Warm-Up 2. Recall that the transpose of a matrix A is the matrix A^T obtained by turning the rows of A into the columns of A^T. We claim that for any square matrix A, det A^T = det A. This is actually quite simple: doing a cofactor expansion along row i of A^T is the same as doing a cofactor expansion along column i of A, so both expansions give the same value.
Determinants are linear in columns and rows. To say that determinants are linear in the columns of a matrix means the following; to simplify matters, let's focus on a 3 × 3 matrix, but the general case is similar. Suppose that the second column of a 3 × 3 matrix is written as the sum of two vectors ~a and ~b:

    [ ~v1  ~a + ~b  ~v3 ].

Then

    det [ ~v1  ~a + ~b  ~v3 ] = det [ ~v1  ~a  ~v3 ] + det [ ~v1  ~b  ~v3 ],

so the determinant "breaks up" when splitting the column ~a + ~b into two pieces. What does this have to do with linearity? We can define a linear transformation T from R^3 to R by setting

    T(~x) = det [ ~v1  ~x  ~v3 ].

Then the above property says that T(~a + ~b) = T(~a) + T(~b), which is the first property required in order to say that T is a linear transformation. The second property, T(c~a) = cT(~a), is the second linearity property of determinants:

    det [ ~v1  c~a  ~v3 ] = c det [ ~v1  ~a  ~v3 ],

which says that scalars pull out when multiplied by a single column. Note that if two columns were multiplied by c, then c would pull out twice and we would get c^2 in front.
The same is true no matter which column is written as the sum of two vectors or no matter
which column is scaled by a number, and the same is true if we do this all with rows instead.
Example 1. We find the matrix of the linear transformation T from R^2 to R defined by

    T(a, b) = det [ 3  a ]
                  [ 4  b ].

We have

    T(1, 0) = det [ 3  1 ]               T(0, 1) = det [ 3  0 ]
                  [ 4  0 ] = -4   and                  [ 4  1 ] = 3,

so the matrix of T is A = [ -4  3 ]. Indeed, let's check:

    A [ a ]
      [ b ] = -4a + 3b,   which is the same as   T(a, b) = det [ 3  a ]
                                                               [ 4  b ] = 3b - 4a.
Row (and column) operations and determinants. Determinants behave in pretty simple ways when performing row operations:

swapping two rows changes the sign of the determinant,
scaling a row by a nonzero number multiplies the determinant by that same number, and
adding a multiple of one row to another row does nothing to the determinant.

This last property is the one which makes these observations actually useful, and it gives us a new way to compute determinants. Note that in this last property it is crucial that the row we are replacing is not the row we are scaling; if instead we had scaled the row we replaced, the determinant would change: it would be scaled by that same amount.
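These three rules are easy to check on a concrete matrix; here is a sketch using the 3 × 3 matrix from the earlier cofactor example:

```python
# Check the three row-operation rules on a concrete 3x3 matrix.
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

M = [[1, 2, 3], [4, 5, -1], [-2, 3, 1]]
d = det3(M)

# swapping two rows changes the sign:
swapped = [M[1], M[0], M[2]]
assert det3(swapped) == -d

# scaling a row by k multiplies the determinant by k:
scaled = [[5 * x for x in M[0]], M[1], M[2]]
assert det3(scaled) == 5 * d

# adding a multiple of one row to another does nothing:
added = [M[0], [x + 3 * y for x, y in zip(M[1], M[0])], M[2]]
assert det3(added) == d

print(d)  # 70
```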
Example 2. We compute the determinant of the matrix

    A = [ 3  -4   1   2 ]
        [ 3   0  -1   5 ]
        [ 0   2  -1   0 ]
        [ 1  -3   2  -1 ]

using row operations. We did this last time using a cofactor expansion, and it was a little tedious. Row operations give us a bit of a smoother computation. We will reduce A, keeping track of how the operations we do at each step affect the determinant.
First, we swap the first and fourth rows to get a 1 in the upper-left corner. Note that if instead we used something like 3IV + I → IV, this would change the determinant we're after since we scaled the row we replaced; this is why we're swapping rows first, so that the row additions we do later do not affect the determinant. After the row swap, the determinant of the resulting matrix is the negative of the one before:

    [ 3  -4   1   2 ]     [ 1  -3   2  -1 ]
    [ 3   0  -1   5 ]  →  [ 3   0  -1   5 ]
    [ 0   2  -1   0 ]     [ 0   2  -1   0 ],    det A → -det A.
    [ 1  -3   2  -1 ]     [ 3  -4   1   2 ]
Now we do -3I + II → II and -3I + IV → IV, neither of which changes the determinant:

    [ 1  -3   2  -1 ]
    [ 0   9  -7   8 ]
    [ 0   2  -1   0 ],    -det A → -det A.
    [ 0   5  -5   5 ]

Next let's multiply the last row by 1/5, which scales the determinant by the same amount:

    [ 1  -3   2  -1 ]
    [ 0   9  -7   8 ]
    [ 0   2  -1   0 ],    -det A → -(1/5) det A.
    [ 0   1  -1   1 ]

Now we swap the second and fourth rows, which changes the sign of the determinant:

    [ 1  -3   2  -1 ]
    [ 0   1  -1   1 ]
    [ 0   2  -1   0 ],    -(1/5) det A → (1/5) det A.
    [ 0   9  -7   8 ]

Then -2II + III → III and -9II + IV → IV, which do not change the determinant:

    [ 1  -3   2  -1 ]
    [ 0   1  -1   1 ]
    [ 0   0   1  -2 ],    (1/5) det A → (1/5) det A.
    [ 0   0   2  -1 ]

Finally, we do -2III + IV → IV:

    [ 1  -3   2  -1 ]
    [ 0   1  -1   1 ]
    [ 0   0   1  -2 ],    (1/5) det A → (1/5) det A.
    [ 0   0   0   3 ]

Now, the whole point is that the determinant of this final matrix is super easy to compute: this matrix is upper-triangular, and the determinant of such a matrix is simply the product of its diagonal entries:

    det [ 1  -3   2  -1 ]
        [ 0   1  -1   1 ]
        [ 0   0   1  -2 ] = (1)(1)(1)(3) = 3.
        [ 0   0   0   3 ]

But also, we found above that the determinant of this final matrix is (1/5) det A, so we get

    (1/5) det A = 3.

Thus det A = 15, as we computed using a cofactor expansion last time. So, now we have a new method for computing determinants. Either this way or using a cofactor expansion will always work. Actually, once you get used to using row operations, this method will almost always be faster, but feel free to compute determinants in whatever way you'd like.
Why do row operations affect determinants in these ways? Consider first a swap of the first two rows: why is it that

    det [ a  b  c ]          [ d  e  f ]
        [ d  e  f ] = - det [ a  b  c ] ?
        [ g  h  i ]          [ g  h  i ]

A cofactor expansion along the first row of the first matrix looks like

    a (something) - b (something) + c (something).

Notice that you get the same type of expression when taking a cofactor expansion of the second matrix along its second row, except that all the signs change:

    -a (something) + b (something) - c (something).

The 2 × 2 determinants in this and the previous expression are exactly the same, so this last expression is the negative of the first one, justifying the fact that swapping the first two rows changes the sign of the determinant.
Notice that if we swapped the first and third rows instead, at first glance it seems that doing a cofactor expansion along the first row in the original matrix and a cofactor expansion along the third row of the result give the same expression, since both look like

    a (something) - b (something) + c (something).

However, if you look at the 2 × 2 determinants you get in this case, it turns out that this is where the extra negative signs show up, so that this type of row swap still changes the sign of the determinant.
As for the third type of row operation, consider something like

    [     ~r1      ]         [ ~r1 ]
    [     ~r2      ]   and   [ 3~r1 + ~r2 ],
    [     ~r3      ]         [ ~r3 ]

where the second matrix is the result of doing the row operation 3I + II → II on the first. The linearity property (in the second row in this case) of determinants tells us that:

    det [     ~r1      ]         [ ~r1 ]         [ ~r1 ]
        [ 3~r1 + ~r2   ] = 3 det [ ~r1 ] + det  [ ~r2 ].
        [     ~r3      ]         [ ~r3 ]         [ ~r3 ]

The first matrix on the right is not invertible since it has linearly dependent rows, so its determinant is zero. Hence we're left with

    det [     ~r1      ]        [ ~r1 ]
        [ 3~r1 + ~r2   ] = det [ ~r2 ],
        [     ~r3      ]        [ ~r3 ]

saying that this type of row operation does not change determinants.
Invertibility and nonzero determinants. Now we can justify the fact that a matrix is invertible if and only if its determinant is not zero. Let A be a square matrix. There is some sequence of row operations getting us from A to its reduced echelon form:

    A → · · · → rref(A).

Now, each row operation either changes the sign of the determinant, multiplies it by some nonzero scalar, or does nothing, so the determinant of the final matrix ends up being related to the original determinant by something like:

    det(rref A) = kn · · · k2 k1 (-1)^m det A,

where m is the number of row swaps we do and k1, . . . , kn are the nonzero numbers we scale rows by throughout. None of these are zero, so

    det A = 0 if and only if det(rref A) = 0.

But det(rref A) = 0 if and only if some diagonal entry of rref A is zero, which happens if and only if rref A is not the identity, in which case A is not invertible. Thus A is invertible if and only if det A ≠ 0.
2 1 1
A = 2 0 3
3 1 2
using row operations. First we multiply the last row by −2, which multiplies the determinant by
the same amount:

[2 1 −1; 2 0 3; 3 1 2] → [2 1 −1; 2 0 3; −6 −2 −4],  det A → −2 det A.

Taking −I + II → II and 3I + III → III, neither of which changes the determinant:

[2 1 −1; 2 0 3; −6 −2 −4] → [2 1 −1; 0 −1 4; 0 1 −7],  −2 det A → −2 det A.

Finally, taking II + III → III, which does not change the determinant either:

[2 1 −1; 0 −1 4; 0 1 −7] → [2 1 −1; 0 −1 4; 0 0 −3],  −2 det A → −2 det A.

The determinant of this final matrix is 2(−1)(−3) = 6, which should also equal −2 det A, so det A = −3.
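The bookkeeping in this kind of computation is mechanical enough to automate. Here is a minimal sketch in numpy: it row-reduces a copy of the matrix, tracking only the sign flips from row swaps (row replacements leave the determinant alone), and reads the determinant off the resulting triangular matrix. It is tested on the matrix from the computation above, with signs as I read them.

```python
import numpy as np

def det_by_row_ops(M):
    """Determinant via Gaussian elimination: adding a multiple of one row
    to another changes nothing, while each row swap flips the sign."""
    M = np.array(M, dtype=float)
    n = M.shape[0]
    sign = 1.0
    for j in range(n):
        # Pick the largest available pivot in column j; swap it into place.
        p = j + int(np.argmax(np.abs(M[j:, j])))
        if abs(M[p, j]) < 1e-12:
            return 0.0  # no pivot in this column: determinant is zero
        if p != j:
            M[[j, p]] = M[[p, j]]
            sign = -sign
        # Clear the entries below the pivot (type-III row operations).
        for i in range(j + 1, n):
            M[i] -= (M[i, j] / M[j, j]) * M[j]
    # Determinant of a triangular matrix = product of its diagonal entries.
    return sign * float(np.prod(np.diag(M)))

d = det_by_row_ops([[2, 1, -1], [2, 0, 3], [3, 1, 2]])
```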
Remark. Determinants also have the property that det(AB) = (det A)(det B), which we will
justify soon. For now, let's use this to prove some more basic formulas. First, if A is invertible,
then

det(A⁻¹) = 1/det A,

which follows by taking the determinant of both sides of

AA⁻¹ = I

and using the fact that det(AA⁻¹) = (det A)(det A⁻¹). (Note that the fraction 1/det A is defined
since det A ≠ 0 for an invertible matrix.)

Second, if A and B are similar, so that A = SBS⁻¹ for some invertible matrix S, we have

det A = det(SBS⁻¹) = (det S)(det B)(det S⁻¹) = det B

since det S and det S⁻¹ cancel out according to the previous fact. Hence similar matrices always
have the same determinant.
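Both facts are easy to sanity-check numerically; a quick sketch with numpy (the matrices here are arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0], [2.0, 2.0]])  # det A = 3*2 - 1*2 = 4
S = np.array([[1.0, 1.0], [0.0, 1.0]])  # an invertible change-of-basis matrix

# det(A^-1) = 1 / det(A)
det_A = np.linalg.det(A)
det_A_inv = np.linalg.det(np.linalg.inv(A))

# Similar matrices have equal determinants.
B = S @ A @ np.linalg.inv(S)
det_B = np.linalg.det(B)
```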
Warm-Up 2. Say that A is a 4×4 matrix satisfying A³ = A⁵. We claim that det A must be
0 or ±1. Indeed, taking determinants of both sides of A³ = A⁵ and using the property from the
previous remark gives

(det A)³ = (det A)⁵,

and 0, 1, and −1 are the only numbers satisfying this equality.

To be complete, we give examples showing that each of these determinants is possible. First,
the zero matrix A satisfies A³ = A⁵ and det A = 0. Second, the identity matrix satisfies I³ = I⁵
and det I = 1. For the final possibility, the matrix

B = [−1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]

satisfies B³ = B⁵ and det B = −1. Note that negative the identity matrix doesn't work for this
last example because although (−I)³ = (−I)⁵, the 4×4 matrix −I has determinant +1.
Determinants as areas and volumes. In the 2×2 case, the absolute value of

det [a b; c d] = ad − bc

is the area of the parallelogram with sides formed by (a, c) and (b, d). Similarly, in the 3×3 case,
|det A| is the volume of the parallelepiped with sides formed by the columns of A.
Note that in both of these cases it is the absolute value |det A|, and not just det A itself, which
is interpreted as an area or volume. It makes sense that this should be true: det A itself could be
negative, but areas and volumes cannot be negative.
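To see the area interpretation concretely, here is a small check: the shoelace formula computes the area of the parallelogram with vertices 0, v, v + w, w directly from its vertices, with no determinants in sight, and the answer agrees with |det A| where v and w are the columns of A. (The particular vectors are just an example.)

```python
import numpy as np

def parallelogram_area(v, w):
    """Area of the parallelogram with vertices 0, v, v+w, w,
    computed by the shoelace formula."""
    pts = [(0.0, 0.0), v, (v[0] + w[0], v[1] + w[1]), w]
    s = sum(x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2.0

v, w = (3.0, 1.0), (1.0, 2.0)               # sides of the parallelogram
A = np.array([[v[0], w[0]], [v[1], w[1]]])  # v and w as columns
area = parallelogram_area(v, w)             # 5.0, which equals |det A|
```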
Expansion factors. Say that T is a linear transformation from R² to R² given by T(~x) = A~x and
take some region D in the xy-plane. After applying T to the points of D we obtain a region T(D)
of the xy-plane called the image of D under T.
We want to compare the area of T(D) with that of D, and |det A| is precisely what allows us to
do this: the area of T(D) is |det A| times the area of D. So, |det A| is the factor by which areas
are altered after applying T. Note that areas are indeed expanded when |det A| > 1 but are
actually contracted (or shrunk) when |det A| < 1. Regardless, we will always refer to |det A|
as an expansion factor.
Important. For T (~x) = A~x and a region D, we have
area or volume of T (D) = | det A|(area or volume of D)
where we use area in the 2-dimensional setting and volume in the 3-dimensional setting. This
is the crucial geometric interpretation of determinants.
The sign of the determinant. Before looking at examples, if the absolute value of a determinant
is giving us the expansion factor for the corresponding transformation, it is natural to wonder
what the sign of a determinant tells us. The sign of a determinant also has a nice geometric
interpretation in terms of what's called an orientation, and it is relatively simple to state in the
2×2 case.
Suppose we have two vectors ~v1 and ~v2 where ~v2 occurs counterclockwise to the left of ~v1 ,
meaning you have to move counterclockwise to get from ~v1 to ~v2 ; for instance,
are both examples of when this happens. After we apply a transformation T (~x) = A~x we get two
new vectors T (~v1 ) and T (~v2 ), and we can ask whether T (~v2 ) occurs counterclockwise to the left
of T (~v1 ). The matrix A has positive determinant when this is true, and negative determinant when
it isn't. So, something like
corresponds to a matrix with det A < 0. The technical explanation is that matrices with positive
determinant preserve orientation while those with negative determinant reverse orientation. A
similar explanation works for 3×3 matrices, although it gets a little trickier to talk about what
orientation means in higher dimensions; we'll come back to this later when we do vector calculus
in the spring.
Remark. Recall that similar matrices have the same determinant. Now this makes sense: similar
matrices represent the same linear transformation only with respect to different bases, and thus the
expansion factor for each should be the same and so should the property of orientation preserving
or reversing.
Example 1. Consider the transformation T given by

A = [a b; c d].

Taking D to be the unit square with sides the standard basis vectors ~e1 and ~e2, we have

A~e1 = (a, c) and A~e2 = (b, d).

The image T(D) of the unit square is then the parallelogram with sides (a, c) and (b, d), so the area
of this parallelogram is
of this parallelogram is
area T (D) = | det A|(area D) = | det A|.
Hence | det A| is the area of the parallelogram with sides formed by the columns of A, recovering
the first geometric interpretation of determinants we gave above.
In the first case the area of the region gets larger, so we would need |det A| > 1, while in the second
the area gets smaller, so we would need |det A| < 1. We cannot have a matrix satisfying both, so
there is no such transformation.
Example 4. Suppose that T is a linear transformation from R3 to R3 with matrix A, and that
T sends a cube to a plane. We can ask whether T can then send a sphere of radius 5 to the unit
sphere. We have
volume of plane = | det A|(volume of cube).
But a cube has nonzero volume while a plane has zero volume, so this means that |det A| = 0 and
thus det A = 0. Thus T cannot send a sphere of positive volume to another with positive volume.
Note that A is not invertible, which makes sense since T sends a 3-dimensional cube to a
2-dimensional plane, so T collapses dimension. This means that rank A < 3, and we see that the
geometric interpretation of a determinant as an expansion factor gives us another way to see that
matrices of non-full rank (i.e. noninvertible matrices) must collapse dimension.
Justifying det(AB) = (det A)(det B). Suppose that A and B are 2×2 matrices and take some
region D in R2 . Applying the transformation B gives a region B(D) with
area of B(D) = | det B|(area of D).
Taking the resulting region B(D) and applying the transformation A gives a region A(B(D)) with

area of A(B(D)) = |det A|(area of B(D)) = |det A||det B|(area of D).
Thus the composed transformation has expansion factor | det A|| det B|.
However, the matrix of this composition is equal to the product AB, so the expansion factor is
also | det(AB)| and hence
| det(AB)| = | det A|| det B|.
By considering the cases where each of these determinants are positive or negative, we get that
det(AB) = (det A)(det B).
For instance, suppose that det A and det B are both negative. We want to show that then
(det A)(det B) is positive, so det(AB) should be positive. But this makes sense: if the transformation B reverses orientation and A reverses it right back, the transformation AB will
preserve orientation and so will have positive determinant. Thus
det(AB) = (det A)(det B)
is true when det A and det B are both negative, and the other possibilities are similar to check.
Important. For square matrices A and B of the same size, det(AB) = (det A)(det B).
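A quick numerical spot-check of this product formula (random integer matrices, nothing special about them):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 3)).astype(float)

lhs = np.linalg.det(A @ B)                 # det of the product
rhs = np.linalg.det(A) * np.linalg.det(B)  # product of the dets
```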
Motivation for eigenvalues and eigenvectors. Recall a previous example we did when talking
about coordinates, where we asked for a geometric interpretation of the linear transformation
T(~x) = A~x where

A = [13 −6; −1 12].

We saw that if we use the basis vectors (2, 1) and (−3, 1) for R², the matrix of T relative to this basis
is

[10 0; 0 15],

meaning that T scales the axis span{(2, 1)} by a factor of 10 and the axis span{(−3, 1)} by a factor
of 15. The lingering question is: why is this the right basis to use?

The key fact which made everything work is that these basis vectors satisfy

A(2, 1) = 10(2, 1) and A(−3, 1) = 15(−3, 1).

The first of these equations says that 10 is an eigenvalue of A with eigenvector (2, 1), and the second
says that 15 is an eigenvalue of A with eigenvector (−3, 1). Thus, finding the eigenvalues and
eigenvectors of A is how we could determine the right basis to use above.
Definition of eigenvalues and eigenvectors. Say that A is a square matrix. We say that a
scalar λ is an eigenvalue of A if there is a nonzero vector ~x satisfying A~x = λ~x; in other words,
λ is an eigenvalue of A if

A~x = λ~x has a nonzero solution for ~x.

For such an eigenvalue λ, we call a nonzero vector ~x satisfying this equation an eigenvector of A
corresponding to the eigenvalue λ.
Important. Geometrically, the eigenvectors of a matrix A are those nonzero vectors with the
property that applying the transformation corresponding to A to them results in a multiple of that
vector; i.e. eigenvectors are scaled by the matrix A. In terms of axes, eigenvectors describe the
axes upon which A acts as a scaling, and the eigenvalues of A are the possible scalars describing
these scalings.
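As a concrete check of this definition, here is the matrix from the motivating example (signs as I read them) acting on the vector (2, 1):

```python
import numpy as np

A = np.array([[13.0, -6.0], [-1.0, 12.0]])
v = np.array([2.0, 1.0])

Av = A @ v          # comes out to (20, 10), i.e. 10 * v,
lam = Av[0] / v[0]  # so v is an eigenvector with eigenvalue 10
```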
Remark. Let's be clear about why we require that the eigenvalue/eigenvector equation A~x = λ~x
have a nonzero solution. The point is that for any scalar λ the equation A~x = λ~x always has
at least one solution: ~x = ~0, so without this nonzero requirement any scalar would satisfy the
eigenvalue definition and the notion of an eigenvalue would not be very interesting. The key is that
A~x = λ~x should have a solution apart from ~x = ~0.
Linguistic remark. The term "eigen" comes from the German word for "proper" or "characteristic",
and in older books you might see the phrase "proper value" or "characteristic value" instead
of "eigenvalue". Nowadays the terms eigenvalue and eigenvector are much more standard, but the
old phrasing is what suggests that eigenvalues capture something characteristic about a matrix.
Example 1. Suppose that A is the 2×2 matrix of a reflection across some line L in R². We
determine the eigenvalues of A using only geometry. First, note that we cannot possibly have a
vector satisfying something like
A~x = 2~x
since a reflection will never make a vector twice as long. In fact, since reflections preserve lengths,
the only way in which reflecting a vector ~x could result in a multiple of that vector is when that
vector satisfies either

A~x = ~x or A~x = −~x.

The first equation is satisfied by any nonzero vector on the line L of reflection and the second by any
nonzero vector perpendicular to L. Thus, 1 is an eigenvalue of A and any nonzero ~x on L is an
eigenvector for 1, and −1 is also an eigenvalue of A, where any nonzero ~x perpendicular to L is an
eigenvector for −1.
Example 2. Consider the 2×2 matrix of a rotation by an angle θ with 0 < θ < 180°. As in the case of a
reflection, there is no way that rotating a vector by such an angle could result in a longer vector,
so only 1 or −1 might be possible eigenvalues. However, when rotating by an angle 0 < θ < 180°,
no nonzero vector can be left as is, so no nonzero vector satisfies A~x = ~x, and no nonzero vector
will be flipped completely around, so no nonzero vector satisfies A~x = −~x. Thus neither 1 nor −1
is actually an eigenvalue of A, and we conclude that A has no eigenvalues.
(Actually, we can only conclude that A has no real eigenvalues. After we talk about complex
numbers we'll see that A actually does have eigenvalues; they just happen to both be complex.
This hints at a deep relation between rotations and complex numbers.)
Having 0 as an eigenvalue. Let us note what it would mean for a matrix to have 0 as an
eigenvalue. This requires that there be some nonzero ~x satisfying
A~x = 0 ~x = ~0.
However, there can be such a nonzero vector only when A is not invertible, since this equation
would say that ~x is in the kernel of A. So, saying that a matrix has 0 as an eigenvalue is the
same as saying that it is not invertible, giving us yet another addition to the Amazingly Awesome
Theorem; this is probably the last thing we'll add to this theorem, and hopefully we can now all
see exactly why it is amazingly awesome.
Amazingly Awesome Theorem, continued. A square matrix A is invertible if and only if 0 is
not an eigenvalue of A.
Finding eigenvalues. The question still remains as to how we find eigenvalues of a matrix in
general, where a simple geometric interpretation might not be readily available as in the previous
examples. At first glance it might seem as if finding eigenvalues of a general matrix might be tough
since there are essentially two unknowns in the equation

A~x = λ~x

we must consider: namely, λ and ~x are both unknown. In particular, it seems as though knowing
whether or not λ is an eigenvalue depends on knowing its eigenvectors ahead of time, but knowing
whether or not ~x is an eigenvector depends on knowing its eigenvalue ahead of time. However, it
turns out that we can completely determine the eigenvalues first without knowing anything about
the eigenvectors. Here's why.
To say that λ is an eigenvalue of A means that A~x = λ~x should have a nonzero solution. But
this equation can be rewritten as

A~x − λ~x = ~0, or (A − λI)~x = ~0

after factoring out ~x. (Note that we can't factor out ~x to get (A − λ)~x = ~0 since it does not
make sense to subtract a scalar from a matrix. But this is easy to get around: we write A~x − λ~x as
A~x − λI~x and then we factor out ~x as desired.) So, to say that λ is an eigenvalue of A means that

(A − λI)~x = ~0

should have a nonzero solution. But this is possible precisely when the matrix A − λI is not
invertible! (A nonzero solution ~x of this equation will then be in the kernel of A − λI.) And finally, A − λI
is not invertible precisely when its determinant is zero, so we get that

λ is an eigenvalue of A ⟺ det(A − λI) = 0.

So, to find the eigenvalues of A we must solve the equation det(A − λI) = 0, and this does not
involve eigenvectors at all.
Definition. We call det(A − λI) the characteristic polynomial of A. Thus, the eigenvalues of A
are the roots of its characteristic polynomial.

Important. To find the eigenvalues of a matrix A, write down A − λI and then compute det(A − λI).
Setting this equal to 0 and solving for λ gives the eigenvalues of A.
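In code this recipe becomes: form the coefficients of the characteristic polynomial and find their roots. numpy can do both steps (np.poly returns the characteristic polynomial coefficients of a matrix, and np.roots finds the roots); the matrix here is the one from the motivating example, signs as I read them.

```python
import numpy as np

A = np.array([[13.0, -6.0], [-1.0, 12.0]])

coeffs = np.poly(A)      # characteristic polynomial: lam^2 - 25 lam + 150
eigs = np.roots(coeffs)  # its roots, which are the eigenvalues of A
```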
Example 3. Recall the matrix A = [13 −6; −1 12]. We have

A − λI = [13 −6; −1 12] − [λ 0; 0 λ] = [13−λ −6; −1 12−λ].

Hence

det(A − λI) = (13 − λ)(12 − λ) − 6 = λ² − 25λ + 150

is the characteristic polynomial of A. (Hopefully this example makes it clear why we refer
to this as a polynomial.) Since this factors as (λ − 10)(λ − 15), the roots of the characteristic
polynomial of A are 10 and 15, so 10 and 15 are the eigenvalues of A, precisely as we said in our
motivating example.
Example 4. Let B = [1 3; 1 2]. Then

det(B − λI) = det [1−λ 3; 1 2−λ] = (1 − λ)(2 − λ) − 3 = λ² − 3λ − 1.

According to the quadratic formula, the roots of this are

λ = (3 ± √(9 + 4))/2 = (3 ± √13)/2, so (3 + √13)/2 and (3 − √13)/2

are the eigenvalues of B. In particular, this means that there should be a nonzero vector ~x satisfying

[1 3; 1 2]~x = ((3 + √13)/2)~x,

and such a vector is an eigenvector with eigenvalue (3 + √13)/2. We will see how to find
eigenvectors next time.
Remark. The 2×2 examples above illustrate a general fact about 2×2 matrices: in general, the
characteristic polynomial of A = [a b; c d] is

λ² − (a + d)λ + (ad − bc).

The constant term is det A, and the sum a + d is called the trace of A and is denoted by tr A.
(The trace of any square matrix is the sum of its diagonal entries.) Thus we can rewrite the above
characteristic polynomial as

λ² − (tr A)λ + det A,

a nice formula which may help to simplify finding eigenvalues of 2×2 matrices. However, note
that there is no nice analog of this for larger matrices, so for larger matrices we have to work out
det(A − λI) by hand.
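This formula translates directly into a tiny eigenvalue routine for 2×2 matrices via the quadratic formula; a sketch, tested on the matrix B from Example 4 (as I read it):

```python
import numpy as np

def eigs_2x2(M):
    """Roots of lam^2 - (tr M) lam + det M, by the quadratic formula.
    May return complex values when the discriminant is negative."""
    tr = M[0, 0] + M[1, 1]
    det = M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]
    disc = np.sqrt(complex(tr * tr - 4 * det))
    return (tr + disc) / 2, (tr - disc) / 2

B = np.array([[1.0, 3.0], [1.0, 2.0]])
l1, l2 = eigs_2x2(B)  # (3 + sqrt(13))/2 and (3 - sqrt(13))/2
```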
Example 5. Let C be the matrix

[−6 1 4; −9 0 3; 0 0 1].

We have (using a cofactor expansion along the third row)

det(C − λI) = det [−6−λ 1 4; −9 −λ 3; 0 0 1−λ] = (1 − λ) det [−6−λ 1; −9 −λ] = (1 − λ)(λ + 3)².

Thus the eigenvalues of C are 1 and −3. We will talk later about what it means for a 3×3 matrix
to only have two (real) eigenvalues.
Warm-Up. We find the eigenvalues of

A = [4 2 2; 2 4 2; 2 2 4].

Expanding along the first row:

det(A − λI) = det [4−λ 2 2; 2 4−λ 2; 2 2 4−λ]
= (4 − λ) det [4−λ 2; 2 4−λ] − 2 det [2 2; 2 4−λ] + 2 det [2 4−λ; 2 2]
= (4 − λ)((4 − λ)² − 4) − 2(2(4 − λ) − 4) + 2(4 − 2(4 − λ))
= (4 − λ)(λ² − 8λ + 12) + 4(λ − 2) + 4(λ − 2)
= (4 − λ)(λ − 2)(λ − 6) + 8(λ − 2)
= (λ − 2)[(4 − λ)(λ − 6) + 8]
= (λ − 2)(−λ² + 10λ − 16)
= −(λ − 2)²(λ − 8).
Thus the eigenvalues of A, which are the roots of the characteristic polynomial, are 2 and 8. Since
the factor λ − 2 appears twice in the characteristic polynomial and λ − 8 appears once, we say that the
eigenvalue 2 has algebraic multiplicity 2 and the eigenvalue 8 has algebraic multiplicity 1.
Remark. For the matrix A above, you can check that det A = 32. Note also that this is what you
get when you multiply the eigenvalues of A together, using 2 twice since it has multiplicity 2:
det A = 2 · 2 · 8.
This is true in general: for any square matrix A, det A equals the product of the eigenvalues of
A, taking into account multiplicities and possibly having to use complex eigenvalues, which we'll
talk about later. We can see this using the fact that the eigenvalues of A are the roots of its
characteristic polynomial: if the eigenvalues are λ1, ..., λn, the characteristic polynomial factors as

det(A − λI) = (λ1 − λ)(λ2 − λ) ··· (λn − λ)

and setting λ = 0 in this expression gives det A = λ1 λ2 ··· λn.
This makes sense geometrically: each eigenvalue tells us the amount by which A scales a certain
direction (the direction corresponding to an eigenvector), and so the overall expansion factor
corresponding to A is the product of these individual scaling factors.
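A quick check of this product formula on the matrix A from the Warm-Up:

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

eigs = np.linalg.eigvals(A)         # 2, 2, 8 in some order
prod_of_eigs = float(np.prod(eigs))
det_A = float(np.linalg.det(A))     # both should come out to 32
```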
Finding eigenvectors. Now that we know how to find the eigenvalues of a matrix, the next step is
to find its eigenvectors, which geometrically are the vectors on which your matrix acts as a scaling.
But we essentially worked out how to do this last time: recall that we derived the condition
det(A − λI) = 0 for an eigenvalue λ using the fact that

A~x = λ~x is the same as (A − λI)~x = ~0.

A nonzero vector satisfying the first equation is an eigenvector with eigenvalue λ, so this should be
the same as a nonzero vector satisfying the second equation. Thus the eigenvectors corresponding
to λ are precisely the nonzero vectors in the kernel of A − λI!

Important. For a square matrix A with eigenvalue λ, the eigenvectors of A corresponding to λ
are the nonzero vectors in ker(A − λI). We call this kernel the eigenspace of A corresponding to λ.
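Numerically, one way to get a basis for ker(A − λI) is from the singular value decomposition: the right singular vectors attached to (numerically) zero singular values span the kernel. A sketch, checked on the running example (signs as I read them):

```python
import numpy as np

def eigenspace_basis(A, lam, tol=1e-8):
    """Rows of the returned array form an orthonormal basis
    for ker(A - lam*I), found via the SVD."""
    M = A - lam * np.eye(A.shape[0])
    _, s, Vt = np.linalg.svd(M)
    return Vt[s < tol]

A = np.array([[13.0, -6.0], [-1.0, 12.0]])
basis = eigenspace_basis(A, 10.0)  # one vector, proportional to (2, 1)
```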
[1 −2; 0 0],

so a possible basis for ker(A − 10I) is given by (2, 1). That is,

eigenspace of A corresponding to 10 = span{(2, 1)},

which geometrically is the line passing through (2, 1) and the origin. The claim is that any nonzero
vector on this line is an eigenvector of A with eigenvalue 10, which geometrically means that
anything on this line is scaled by a factor of 10 under the transformation corresponding to A.
For good measure, note that

[13 −6; −1 12](2, 1) = (20, 10) = 10(2, 1),

so (2, 1) is indeed an eigenvector of A with eigenvalue 10.
The eigenvectors corresponding to 15 are the nonzero vectors in the kernel of

A − 15I = [−2 −6; −1 −3].

This matrix reduces to

[1 3; 0 0],

so (−3, 1) gives a basis for ker(A − 15I). For good measure, note that

[13 −6; −1 12](−3, 1) = (−45, 15) = 15(−3, 1),

and similarly anything on the line spanned by (−3, 1) is scaled by 15 under the transformation
corresponding to A.
Remark. The matrix A = [13 −6; −1 12] above is the one we looked at last time when motivating
eigenvalues and eigenvectors, and it came from a previous example dealing with coordinates. If you go back to that
coordinate example (from November 1st on the Week 6 Lecture Notes), it should now be clear why
we used the basis we did in that example: those basis vectors are precisely the eigenvectors we
found above, and give us the directions along which A acts as a scaling!
Example 2. We find bases for the eigenspaces of B = [7 2; −4 1]. First, the characteristic polynomial
of B is

det(B − λI) = det [7−λ 2; −4 1−λ] = λ² − 8λ + 15 = (λ − 5)(λ − 3).

Thus the eigenvalues of B are 5 and 3. For the eigenvalue 5 we have

B − 5I = [2 2; −4 −4] → [1 1; 0 0],

so

(−1, 1)

forms a basis for the eigenspace of B corresponding to 5. For the eigenvalue 3 we have

B − 3I = [4 2; −4 −2] → [2 1; 0 0],

so

(1, −2)

forms a basis for the eigenspace of B corresponding to 3. For good measure, note that

[7 2; −4 1](−1, 1) = 5(−1, 1) and [7 2; −4 1](1, −2) = 3(1, −2),

so our proposed basis eigenvectors are indeed eigenvectors with the claimed eigenvalues.

(Of course, the matrix we got above when reducing B − 3I is not in reduced echelon form; if
you had put it into reduced form you might have gotten (−1/2, 1) as a basis vector. But of course,
this vector and (1, −2) have the same span, so they are both bases for the same eigenspace. I used
the vector I did to avoid fractions, which is something you should be on the lookout for as well.)
Example 3. We find bases for the eigenspaces of the matrix A from the Warm-Up. The eigenvalues
were 2 with algebraic multiplicity 2 and 8 with multiplicity 1. For 2 we have:
A − 2I = [2 2 2; 2 2 2; 2 2 2] → [1 1 1; 0 0 0; 0 0 0],

giving

(−1, 1, 0) and (−1, 0, 1)
as a basis for the eigenspace corresponding to 2. You can check that multiplying A by either of
these does give 2 times that same vector, as should happen if these are eigenvectors with eigenvalue
2. For the eigenvalue 8 we have:
A − 8I = [−4 2 2; 2 −4 2; 2 2 −4] → [−4 2 2; 0 −6 6; 0 0 0],

so

(1, 1, 1)
is a basis for the eigenspace corresponding to 8. (I jumped some steps here which you might want
to fill in. In fact, since I know that this eigenspace will only be 1-dimensional, if I want to get a
basis for this eigenspace all I need to do is find one nonzero vector in ker(A − 8I), and the vector
I gave is such a vector. How did I know that this eigenspace would only be 1-dimensional? More
on that in a bit.)
This matrix was also one we looked at when dealing with coordinates (Example 2 from November
1st), and lo and behold the basis eigenvectors we found here were precisely the basis vectors we
used in that example. Again, this was no accident ;)
Eigenvectors for different eigenvalues are linearly independent. Note in these three examples that in each case the eigenvectors we found for different eigenvalues turned out to be linearly
independent. In fact this is always true: for a square matrix A, if ~v1 , . . . , ~vk are eigenvectors of A
corresponding to distinct eigenvalues 1 , . . . , k , then ~v1 , . . . , ~vk must be linearly independent.
The book has a full justification for this, but to get a feel for it let's just work it out when
k = 3, so we have eigenvectors ~v1, ~v2, ~v3 of A corresponding to the different eigenvalues λ1, λ2, λ3.
To show that ~v1, ~v2, ~v3 are linearly independent, we set up the equation

c1~v1 + c2~v2 + c3~v3 = ~0     (4)

and show that for this to be true all coefficients must be zero. Multiplying this through by A and
using the fact that

A~v1 = λ1~v1, A~v2 = λ2~v2, and A~v3 = λ3~v3,

we get

c1 λ1~v1 + c2 λ2~v2 + c3 λ3~v3 = ~0.

Now, multiplying equation (4) through by λ1 gives

c1 λ1~v1 + c2 λ1~v2 + c3 λ1~v3 = ~0.

Subtracting this from the previous equation gets rid of c1 λ1~v1, giving

c2 (λ2 − λ1)~v2 + c3 (λ3 − λ1)~v3 = ~0.     (5)
Example. We find bases for the eigenspaces of

C = [4 1 0; 0 4 1; 0 0 4].
Here, the only eigenvalue is 4, since the eigenvalues of any upper-triangular (or lower-triangular)
matrix are simply the entries on its diagonal; indeed, the characteristic polynomial of C is (4 − λ)³.
We have

C − 4I = [0 1 0; 0 0 1; 0 0 0],

and a basis for ker(C − 4I), and hence a basis for the eigenspace of C corresponding to 4, is given by

(1, 0, 0).
Geometric multiplicities and eigenbases. Note that something happened in the above example which had not happened in previous examples: even though the eigenvalue 4 has algebraic
multiplicity 3, the dimension of the eigenspace corresponding to 4 is only 1. In previous examples, it was always true that the dimension of each eigenspace was equal to the multiplicity of the
corresponding eigenvalue.
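The gap between the two multiplicities is easy to exhibit numerically. For the matrix C above, the geometric multiplicity of the eigenvalue 4 can be computed as 3 − rank(C − 4I):

```python
import numpy as np

C = np.array([[4.0, 1.0, 0.0],
              [0.0, 4.0, 1.0],
              [0.0, 0.0, 4.0]])

eigs = np.linalg.eigvals(C)  # 4, with algebraic multiplicity 3

# dim ker(C - 4I) = 3 - rank(C - 4I), the geometric multiplicity
geom_mult = 3 - np.linalg.matrix_rank(C - 4.0 * np.eye(3))
```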
Example. We determine, for each of the matrices

A = [5 8 1; 0 0 7; 0 0 −2], B = [−1 0 −25; 3 2 15; 1 0 9], and C = [2 3 0; 4 3 0; 0 0 6],

whether there is an eigenbasis of R³ consisting of its eigenvectors.
First, A is upper-triangular so its eigenvalues are its diagonal entries: 5, 0, −2. The algebraic
multiplicity of each is 1, so the geometric multiplicity of each is also 1. Thus each eigenspace should
only have 1 basis eigenvector. We have:

A − 5I = [0 8 1; 0 −5 7; 0 0 −7] → [0 1 0; 0 0 1; 0 0 0], so a basis for this eigenspace is (1, 0, 0),

A − 0I = [5 8 1; 0 0 7; 0 0 −2] → [5 8 0; 0 0 1; 0 0 0], so a basis for this eigenspace is (−8, 5, 0),

A − (−2)I = [7 8 1; 0 2 7; 0 0 0] → [1 8/7 1/7; 0 1 7/2; 0 0 0], so a basis for this eigenspace is (54, −49, 14).

Note that in the last one if you use the given reduced form to get a basis for ker(A + 2I) you might
end up with

(27/7, −7/2, 1),

which is fine, but to get a cleaner eigenvector I set the free variable equal to 14 to end up with
only integer entries. Note that

[5 8 1; 0 0 7; 0 0 −2](54, −49, 14) = −2(54, −49, 14),

so the vector I used is indeed an eigenvector of A with eigenvalue −2. In this case putting all three
eigenvectors we found into one list gives an eigenbasis of R³.
The characteristic polynomial of B is

det(B − λI) = −(λ − 2)(λ − 4)²,

so 2 is an eigenvalue of algebraic multiplicity 1 and 4 an eigenvalue with algebraic multiplicity
2. Thus we know that the eigenspace corresponding to 2 is 1-dimensional and the eigenspace
corresponding to 4 is either 1- or 2-dimensional. We have:

B − 2I = [−3 0 −25; 3 0 15; 1 0 7] → [1 0 0; 0 0 1; 0 0 0], so a basis for this eigenspace is (0, 1, 0),

B − 4I = [−5 0 −25; 3 −2 15; 1 0 5] → [1 0 5; 0 1 0; 0 0 0], so a basis for this eigenspace is (−5, 0, 1).

In this case, there are only two linearly independent eigenvectors, so there can be no eigenbasis for
R³ consisting of eigenvectors of B.
Finally, the characteristic polynomial of C is

det(C − λI) = −(λ + 1)(λ − 6)²,

so the eigenvalues are −1, with algebraic multiplicity 1, and 6, with algebraic multiplicity 2. We
have:

C − (−1)I = [3 3 0; 4 4 0; 0 0 7] → [1 1 0; 0 0 1; 0 0 0], so a basis for this eigenspace is (−1, 1, 0),

C − 6I = [−4 3 0; 4 −3 0; 0 0 0] → [−4 3 0; 0 0 0; 0 0 0], so a basis for this eigenspace is (3, 4, 0), (0, 0, 1).

In this case, the geometric multiplicity of each eigenvalue is the same as its algebraic multiplicity,
and the three linearly independent eigenvectors we found give an eigenbasis of R³.
Web search rankings. When you search for something on the internet, whatever search engine
you use takes your search terms and goes through its catalog of all possible web pages, picking out
the ones which might in some way be relevant to your search. What then determines the order in
which these resulting pages are presented, with whatever the engine thinks is most relevant being
listed first? The answer heavily depends on the theory of eigenvectors!

To see how eigenvectors naturally come up in such a problem, let's consider a simplified version
of the internet with only three web pages:
where the arrows indicate links from one page to another. The basic assumption is that sites with
links from relevant pages should themselves be relevant, and the more links from relevant pages
a site has the more relevant it is. Let's denote the relevance of page k by xk. The goal is to find
values for these, which then determine the order in which our three pages are listed after a search,
with the one with the largest xk value appearing first.
The assumption that a page's relevance depends on links coming to it from relevant pages turns
into a relation among x1, x2, x3. For instance, page 1 has links from page 2 and from page 3, so x1
should depend on x2 and x3. Since page 2 only has one link coming out of it, its entire relevance
contributes to the relevance of page 1, while since page 3 has two links coming out of it, only half
of its relevance contributes to that of page 1, with the other half contributing to the relevance of
page 2. This gives the relation

x1 = x2 + x3/2.

Similarly, page 2's relevance comes from that of pages 1 and 3, with half of x1 contributing to x2
and half of x3 contributing to x2 since each of pages 1 and 3 has two links coming out of them;
this gives

x2 = x1/2 + x3/2.
Finally,

x3 = x1/2

since page 1 has two links coming out of it. The resulting system

x1 = x2 + x3/2
x2 = x1/2 + x3/2
x3 = x1/2

says precisely that the relevance vector ~x = (x1, x2, x3) satisfies A~x = ~x for the matrix
A = [0 1 1/2; 1/2 0 1/2; 1/2 0 0]; that is, ~x is an eigenvector of A with eigenvalue 1.
This was a really simplified example, but the basic idea works for real-life web searches: ranking
orders are determined by looking at eigenvectors of some matrices whose entries have something to
do with links between pages. All modern search engines somehow use this idea, with various tweaks.
In particular, Google's ranking algorithm, known as PageRank, works as follows. Given some
search terms, take all pages that might have some relevance; this likely gives over a million pages.
Define the matrix A by saying that its ij-th entry is 1 if page i links to page j, and 0 otherwise.
As with the example above, now some modifications are done to weight the relevances we want,
with more links to a page giving that page a higher weight; in its most basic form this amounts to
replacing A by something of the form

D + A

where D is some type of weighting matrix. (The exact nature of D is one of Google's trade
secrets, as are any additional modifications which are done to A.) The claim is that the
rankings determined by Google's search engine come from dominant eigenvectors of D + A, which
are eigenvectors corresponding to the largest eigenvalue. In practice, such eigenvectors are almost
impossible to find directly, even for a computer, since D + A will be some huge (over 1,000,000 ×
1,000,000 in size) matrix, but fortunately there exist good algorithms for approximating the entries
of these dominant eigenvectors.
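The standard such algorithm is power iteration: repeatedly apply the matrix to a starting vector and rescale, and the result turns toward a dominant eigenvector. Here is a sketch on the three-page example above, where the link matrix (my reading of the relations x1 = x2 + x3/2, etc.) has dominant eigenvalue 1:

```python
import numpy as np

# Column j lists how page j's relevance is distributed to the pages it links to.
A = np.array([[0.0, 1.0, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.0, 0.0]])

def power_iteration(M, steps=200):
    """Approximate a dominant eigenvector of M by repeated application."""
    x = np.ones(M.shape[0]) / M.shape[0]
    for _ in range(steps):
        x = M @ x
        x = x / x.sum()  # rescale so the entries sum to 1
    return x

ranks = power_iteration(A)  # approaches (4/9, 3/9, 2/9): page 1 ranks first
```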
The internet would surely be a much different place if it weren't for the existence of eigenvectors
and eigenvalues!
Population models. Suppose we have populations of deer and wolves in some forest, with x1 (t)
denoting the population of deer at time t and x2 (t) the population of wolves at time t. We are
interested in understanding the long-term behavior of these two. The basic assumption is that the
rate at which these change (i.e. the values of their derivatives) depends on the values of both at
any specific time.
For instance, since wolves feed on deer, the rate of change in the population of deer should obey
something like

x1'(t) = (positive)x1(t) + (negative)x2(t)

where the first term comes from deer reproducing (so having a positive effect on population) and
the second from deer lost due to the population of wolves (so having a negative effect on deer
population). Similarly, the rate of change in the population of wolves might be something like

x2'(t) = (positive)x1(t) + (positive)x2(t)
since the more wolves there are the more wolves there will be, and the more food there is the more
wolves there will be. In reality, these are somewhat naive assumptions since there are many other
factors contributing to these populations, and in fact, since a forest can only support so
many wolves, maybe x2'(t) should actually depend negatively on x2(t). Regardless, we'll just use
this basic model.
Suppose that our populations are modeled by

x1'(t) = 13x1(t) − 6x2(t)
x2'(t) = −x1(t) + 12x2(t),

which again probably isn't very realistic, but whatever. This is what is known as a system of linear
differential equations, and we are interested in determining what functions x1(t), x2(t) satisfy these
equations. A key observation is that this system can be written as

[x1'(t); x2'(t)] = [13 −6; −1 12][x1(t); x2(t)].
Suppose that the functions we want have the form

x1(t) = r1 e^(λt) and x2(t) = r2 e^(λt).

Plugging this into the rewritten system gives

λ[r1 e^(λt); r2 e^(λt)] = [13 −6; −1 12][r1 e^(λt); r2 e^(λt)].

Since e^(λt) is never zero, we can divide both sides through by it to get

λ[r1; r2] = [13 −6; −1 12][r1; r2],

which says that the unknowns λ, r1, r2 in our expressions for x1(t) and x2(t) come from eigenvalues
and eigenvectors of the matrix [13 −6; −1 12]! So, finding these eigenvalues and eigenvectors is how
we are able to find solutions of our population model.
This is a matrix we've seen before, where we computed that its eigenvalues were 10 and 15 with
corresponding eigenvectors

(2, 1) and (−3, 1).

Thus we get as solutions of our model:

[x1(t); x2(t)] = [2e^(10t); e^(10t)] and [x1(t); x2(t)] = [−3e^(15t); e^(15t)].

As part of the general theory you would learn about for systems of differential equations in a more
advanced differential equations course, it turns out that the general solution of the system above
looks like

[x1(t); x2(t)] = c1[2e^(10t); e^(10t)] + c2[−3e^(15t); e^(15t)].
This helps us to visualize our solutions and determine the long-term behavior of our system. Plotting such solutions (for varying c1 and c2 ) on the x1 x2 -axes gives something which looks like:
90
The red lines are determined by the directions corresponding to the eigenvectors we found, and
each green curve represents a solution for some specific c1 and c2. The observation is that no matter
which solution we're on (say we're at the orange dot at time t = 0), we will always move towards
the line determined by the eigenvector with eigenvalue 15 as t → ∞, essentially because this is the
larger eigenvalue. So, long-term, no matter what the initial populations of deer and wolves are, the
populations will always approach an ideal balance determined by the eigenvector with eigenvalue 15.
Again, this is all something you would learn more about in any course which heavily uses
differential equations. This was based on a population model, but the same types of models show
up in economics and finance, chemistry, engineering, and pretty much everywhere. We'd be lost in
all these applications were it not for eigenvalues and eigenvectors!
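As a quick sanity check on the discussion above, we can verify the eigen-data of the model numerically. This uses numpy, which is not part of these notes; the matrix, eigenvalues, and eigenvectors are the ones computed above.

```python
import numpy as np

# Coefficient matrix of the population model.
A = np.array([[13.0, -6.0],
              [-1.0, 12.0]])

# Eigenvalues should be 10 and 15.
evals, V = np.linalg.eig(A)
order = np.argsort(evals)
evals, V = evals[order], V[:, order]
print(evals)                      # approximately [10. 15.]

# The eigenvector for 15 is proportional to (3, -1): check A v = 15 v.
v = np.array([3.0, -1.0])
assert np.allclose(A @ v, 15 * v)

# A solution of x' = Ax: x(t) = e^(10t) (2, 1) satisfies x'(t) = A x(t).
t = 0.7
x = np.exp(10 * t) * np.array([2.0, 1.0])
xprime = 10 * np.exp(10 * t) * np.array([2.0, 1.0])
assert np.allclose(A @ x, xprime)
```

If the model's matrix had been entered with different signs, these checks would fail, which makes this a useful way to catch transcription errors.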
Eigenfunctions. The previous type of application suggests a strong relation between derivatives
and matrices, which indeed we will come to next quarter when we do multivariable calculus. But
here's another fun realization.
In this class, all spaces we've dealt with are either Rn or subspaces of Rn. However, linear
algebra works in more general types of settings; in particular, in other contexts we can consider
"spaces of functions" whose elements are themselves functions. Say that V is such a space,
containing say the function f(x) = x^2, or f(x) = e^x, or f(x) = sin x, etc. We can define an
operation D from V to V by

    D(f) = f'.

In other words, D is the transformation which takes as input a function and spits out its derivative. The well-known properties of derivatives which say

    (f + g)' = f' + g'  and  (cf)' = cf' for a scalar c

then become the statements that

    D(f + g) = D(f) + D(g)  and  D(cf) = cD(f),

so that D is actually a linear transformation in this more general context!
Now, observe that since the derivative of e^x is e^x we have

    D(e^x) = e^x,

so e^x is an eigenvector of D with eigenvalue 1! Also, the derivative of e^(2x) is 2e^(2x):

    D(e^(2x)) = 2e^(2x),

so e^(2x) is an eigenvector of D with eigenvalue 2. Such functions are called eigenfunctions of D, and
D is called a differential operator. The study of differential operators and their eigenfunctions has
led to deep advancements in physics, chemistry, economics, and pretty much anywhere differential
equations show up. Again, these are things you would learn about in more advanced courses.
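We can even "see" the eigenfunction property numerically. The sketch below (an illustration, not part of the original notes) approximates the derivative of f(x) = e^(2x) by finite differences and checks that it agrees with 2f, i.e. that f behaves like an eigenfunction of D with eigenvalue 2.

```python
import numpy as np

# Sample f(x) = e^(2x) on a grid.
xs = np.linspace(-1.0, 1.0, 201)
f = np.exp(2 * xs)

# Central-difference approximation of f' on the interior points.
h = xs[1] - xs[0]
fprime = (f[2:] - f[:-2]) / (2 * h)

# f' should match 2f (loose tolerance: the difference quotient is approximate).
assert np.allclose(fprime, 2 * f[1:-1], rtol=1e-3)
```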
Remark. Again, today's lecture was outside the scope of this course, and is purely meant to
illustrate how eigenvectors and eigenvalues show up in various contexts. Hopefully you can now
somewhat better appreciate why we spend time learning about these things!
are similar, as we'll see, but (0, 1) is an eigenvector of the first which is not an eigenvector of the
second. On the other hand, it is true that two matrices with different eigenvalues or multiplicities
cannot be similar.
Warm-Up 2. We find bases for the eigenspaces of

        [ 2  1   0 ]            [ 2  -5  -5 ]
    A = [ 0  2   0 ]   and  B = [ 0   3   1 ].
        [ 0  0  -3 ]            [ 0   1   3 ]

Since A is upper-triangular, its eigenvalues are 2 with algebraic multiplicity 2 and -3 with multiplicity 1. Thus the eigenspace corresponding to 2 is 1- or 2-dimensional, while the eigenspace
corresponding to -3 is 1-dimensional. We have:

             [ 0  1   0 ]      [ 0  1  0 ]
    A - 2I = [ 0  0   0 ]  ->  [ 0  0  1 ],   so a basis for E_2 is (1, 0, 0),
             [ 0  0  -5 ]      [ 0  0  0 ]

             [ 5  1  0 ]       [ 1  0  0 ]
    A + 3I = [ 0  5  0 ]  ->   [ 0  1  0 ],   so a basis for E_(-3) is (0, 0, 1).
             [ 0  0  0 ]       [ 0  0  0 ]
(Note that E_λ = ker(A - λI) is just notation for the eigenspace corresponding to λ.) Putting these
basis vectors together only gives two linearly independent eigenvectors, so there does not exist an
eigenbasis of R3 associated to A.
Using a cofactor expansion along the first column, the characteristic polynomial of B is

                      [ 2-λ  -5    -5  ]
    det(B - λI) = det [  0   3-λ    1  ]
                      [  0    1    3-λ ]

                            [ 3-λ   1  ]
                = (2-λ) det [  1   3-λ ]

                = (2-λ)(λ^2 - 6λ + 8)

                = -(λ - 2)^2 (λ - 4).
Thus the eigenvalues of B are 2 with algebraic multiplicity 2 and 4 with multiplicity 1. We have:

             [ 0  -5  -5 ]      [ 0  1  1 ]
    B - 2I = [ 0   1   1 ]  ->  [ 0  0  0 ],   so a basis for E_2 is (1, 0, 0), (0, 1, -1),
             [ 0   1   1 ]      [ 0  0  0 ]

             [ -2  -5  -5 ]      [ -2  -5  -5 ]
    B - 4I = [  0  -1   1 ]  ->  [  0  -1   1 ],   so a basis for E_4 is (-5, 1, 1).
             [  0   1  -1 ]      [  0   0   0 ]

Putting these basis vectors together gives an eigenbasis of R3 associated to B, meaning that

    (1, 0, 0), (0, 1, -1), (-5, 1, 1)

is a basis of R3 consisting of eigenvectors of B. Note that here the geometric multiplicity of each
eigenvalue is the same as its algebraic multiplicity.
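The multiplicity counts in this Warm-Up can be double-checked numerically. The helper below (a sketch using numpy, not part of the original notes) computes geometric multiplicities as dim ker(M - λI) = n - rank(M - λI).

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, -3.0]])
B = np.array([[2.0, -5.0, -5.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])

def geometric_multiplicity(M, lam):
    """dim ker(M - lam*I) = n - rank(M - lam*I)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n))

# A: eigenvalue 2 has algebraic multiplicity 2 but geometric multiplicity 1.
print(geometric_multiplicity(A, 2))    # 1
print(geometric_multiplicity(A, -3))   # 1

# B: eigenvalue 2 now has geometric multiplicity 2, so B has an eigenbasis.
print(geometric_multiplicity(B, 2))    # 2
print(geometric_multiplicity(B, 4))    # 1

# The claimed basis eigenvectors really are eigenvectors of B.
assert np.allclose(B @ np.array([1.0, 0, 0]), 2 * np.array([1.0, 0, 0]))
assert np.allclose(B @ np.array([0.0, 1, -1]), 2 * np.array([0.0, 1, -1]))
assert np.allclose(B @ np.array([-5.0, 1, 1]), 4 * np.array([-5.0, 1, 1]))
```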
Eigenbases are good. Finally we come to the question: why do we care about eigenbases, and
whether or not a matrix gives rise to one? The answer is one we've been hinting at for a while
now. Consider the matrix B from the Warm-Up, and the associated transformation T(~x) = B~x.
The matrix of T relative to the eigenbasis B we found turns out to be

            [ 2  0  0 ]
    [T]_B = [ 0  2  0 ],
            [ 0  0  4 ]

precisely because each basis vector was an eigenvector of B! Indeed, the fact that T(~vi) = λi ~vi for
each of these basis vectors tells us that the coordinate vector of T(~vi) simply has λi in the i-th
position and zeroes elsewhere, which is why the matrix of T turns out to be diagonal. This is good,
since it says that geometrically T scales the axes corresponding to these specific basis eigenvectors
by an amount equal to the corresponding eigenvalue.
In general, given some transformation T(~x) = A~x, the only possible bases relative to which the
matrix of T is diagonal are ones where each basis vector is an eigenvector of A, since having the
i-th column in the matrix of T relative to this basis be of the form

                            [  0 ]
                            [ .. ]
    i-th column of [T]_B =  [ λi ]
                            [ .. ]
                            [  0 ]

with λi in the i-th position, requires that the i-th basis vector ~vi satisfy A~vi = λi ~vi. In other words,
such a basis must be an eigenbasis corresponding to A!
Definition. A square matrix A is diagonalizable if it is similar to a diagonal matrix; i.e. if there
exists an invertible matrix S and a diagonal matrix D satisfying A = SDS^(-1). To diagonalize
a matrix A means to find such an S and D and to express A as A = SDS^(-1). Geometrically,
diagonalizable matrices are the ones for which there exists a complete set of axes for Rn upon
which the corresponding transformation acts via scalings.
Important. An n x n matrix A is diagonalizable precisely when it gives rise to an eigenbasis of
Rn. Thus, to diagonalize A (if possible):
(i) find all eigenvalues of A,
(ii) find a basis for each eigenspace of A, and
(iii) count the total number of basis eigenvectors you find and see if you have n of them.
If so, A is diagonalizable and A = SDS^(-1) with S being the matrix having the eigenvectors you
found as columns and D being the diagonal matrix with the corresponding eigenvalues down the
diagonal. If you end up with fewer than n basis eigenvectors, A is not diagonalizable.
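The recipe above can be sketched in code. This is a minimal numerical version (using numpy, which is outside these notes): it gathers eigenvectors, checks whether there are n independent ones, and if so returns the S and D of a diagonalization.

```python
import numpy as np

def diagonalize(A, tol=1e-9):
    """Try to write A = S D S^(-1); return (S, D), or None if no eigenbasis exists.

    np.linalg.eig returns one candidate eigenvector per column, so it suffices
    to check that those columns are linearly independent (S invertible).
    """
    evals, S = np.linalg.eig(A)
    n = A.shape[0]
    if np.linalg.matrix_rank(S, tol=tol) < n:
        return None          # fewer than n independent eigenvectors
    return S, np.diag(evals)

# Diagonalizable example from the Warm-Up:
B = np.array([[2.0, -5, -5], [0, 3, 1], [0, 1, 3]])
S, D = diagonalize(B)
assert np.allclose(S @ D @ np.linalg.inv(S), B)

# Non-diagonalizable example: a shear has only one independent eigenvector.
shear = np.array([[1.0, 1.0], [0.0, 1.0]])
print(diagonalize(shear))    # None
```

Deciding linear independence in floating point requires a tolerance, so treat this as a heuristic check rather than a proof.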
Remark. If you only want to determine diagonalizability without explicitly finding an eigenbasis
(i.e. without explicitly diagonalizing it), it is enough to check whether the geometric multiplicity
of each eigenvalue is the same as its algebraic multiplicity: these have to be equal in order for a
matrix to be diagonalizable.
Example 1. Let A = [ 1 3 ; 4 -3 ]. The eigenvalues of A are -5 and 3, so already we know that A is
diagonalizable: the algebraic multiplicity of each eigenvalue is 1 and so the geometric multiplicity
of each eigenvalue must also be 1. Finding a basis for each of ker(A + 5I) and ker(A - 3I) gives

    E_(-5) = span{ (1, -2) }   and   E_3 = span{ (3, 2) }.

Thus (1, -2), (3, 2) forms an eigenbasis for R2 and we can diagonalize A as

    [ 1  3 ]   [  1  3 ] [ -5  0 ] [  1  3 ]^(-1)
    [ 4 -3 ] = [ -2  2 ] [  0  3 ] [ -2  2 ]     .
Note that the order in which we write the eigenvalues in the diagonal matrix D matters: they
should correspond to the order in which we write the eigenvectors as columns of S.
Of course, a diagonalization of A is not unique. For one thing, we can change the order of the
columns of S and the order of the eigenvalues in D:
    [ 1  3 ]   [ 3   1 ] [ 3   0 ] [ 3   1 ]^(-1)
    [ 4 -3 ] = [ 2  -2 ] [ 0  -5 ] [ 2  -2 ]     ,

or we can use different eigenvectors altogether; for instance, (-2, 4) is also an eigenvector of A with
eigenvalue -5 and (9, 6) is another eigenvector with eigenvalue 3, so

    [ 1  3 ]   [ -2  9 ] [ -5  0 ] [ -2  9 ]^(-1)
    [ 4 -3 ] = [  4  6 ] [  0  3 ] [  4  6 ]
as well. There is no preference for one diagonalization over another, except that trying to avoid
fractions might be a good idea.
Example 2. Let B = [ 4 1 ; -1 2 ]. This only has one eigenvalue, namely 3. Since

    B - 3I = [  1   1 ]
             [ -1  -1 ],

we only come up with one basis eigenvector for the eigenspace corresponding to 3. With only
one eigenvalue, there are no other eigenspaces which could produce basis eigenvectors, so B is not
diagonalizable.
Actually, there is a way to see this is true knowing only that B has one eigenvalue. In general,
if A is an n x n diagonalizable matrix with only one eigenvalue λ, then A must equal λI. Indeed,
A diagonalizable gives A = SDS^(-1) with D diagonal, but if λ is the only eigenvalue of A then D
must be D = λI. But then

    A = SDS^(-1) = S(λI)S^(-1) = λ(SIS^(-1)) = λI,

so A must have been A = λI to start with. In the example above, since the only eigenvalue of B
is 3, if B were diagonalizable it would have to equal 3I, which it is not.
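Example 2's claims check out numerically. The sketch below (numpy, not part of the original notes) confirms that B has the single eigenvalue 3, that B - 3I has rank 1 (so E_3 is only 1-dimensional), and that B is certainly not 3I.

```python
import numpy as np

B = np.array([[4.0, 1.0],
              [-1.0, 2.0]])

# Both eigenvalues are 3 (a repeated eigenvalue).
evals = np.linalg.eigvals(B)
assert np.allclose(evals, [3.0, 3.0])

# B - 3I has rank 1, so dim E_3 = 2 - 1 = 1: only one basis eigenvector.
rank = np.linalg.matrix_rank(B - 3 * np.eye(2))
assert rank == 1

# Consistent with the argument above: if B were diagonalizable it would
# have to equal 3I, which it clearly does not.
assert not np.allclose(B, 3 * np.eye(2))
```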
Back to Warm-Up. The matrix A from the second Warm-Up is not diagonalizable, while the
matrix B is. Using the eigenbasis we found associated to B, we can diagonalize B as

    [ 2  -5  -5 ]   [ 1   0  -5 ] [ 2  0  0 ] [ 1   0  -5 ]^(-1)
    [ 0   3   1 ] = [ 0   1   1 ] [ 0  2  0 ] [ 0   1   1 ]
    [ 0   1   3 ]   [ 0  -1   1 ] [ 0  0  4 ] [ 0  -1   1 ]     .
    A = [ 1  0  1 ] [ 2  0  0 ] [ 1  0  1 ]^(-1)
        [ 0  0  1 ] [ 0  2  0 ] [ 0  0  1 ]
        [ 1  1  1 ] [ 0  0  3 ] [ 1  1  1 ]     .
Thus by computing out the right-hand side, we can figure out what A actually is.
BUT, this is totally unnecessary! To save this extra work, I claim that we can actually find
A(3, 1, 2) without knowing A explicitly. Think about what is going on: we have a basis for R3 made up
of eigenvectors of A, and we know that A acts as a scaling on each of those basis vectors. Thus, if
we know how to express the vector (3, 1, 2) in terms of that basis, then we can use linearity properties
to easily determine A(3, 1, 2).
To be clear, since the given vectors form a basis of R3 there are coefficients c1, c2, c3 satisfying

    (3, 1, 2) = c1 (1, 0, 1) + c2 (0, 0, 1) + c3 (1, 1, 1).

Multiplying by A gives

    A(3, 1, 2) = A( c1 (1, 0, 1) + c2 (0, 0, 1) + c3 (1, 1, 1) )
               = c1 A(1, 0, 1) + c2 A(0, 0, 1) + c3 A(1, 1, 1)
               = 2c1 (1, 0, 1) + 2c2 (0, 0, 1) + 3c3 (1, 1, 1)
as desired. Again, note that we still don't even know what A actually is, and yet using the same
method as above we can in fact compute A~x for any possible ~x!
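The method above translates directly into code. This sketch (numpy, not part of the original notes) solves for the coefficients c1, c2, c3 and computes A(3, 1, 2) without ever forming A, then double-checks against the explicit A = SDS^(-1).

```python
import numpy as np

# Eigenvectors of the (unknown) matrix A, as columns, with their eigenvalues.
S = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])     # columns: (1,0,1), (0,0,1), (1,1,1)
lams = np.array([2.0, 2.0, 3.0])

# Express (3, 1, 2) in the eigenbasis: solve S c = (3, 1, 2).
x = np.array([3.0, 1.0, 2.0])
c = np.linalg.solve(S, x)           # c = (2, -1, 1)

# A x = sum_i lam_i * c_i * v_i -- no explicit A needed.
Ax = S @ (lams * c)
print(Ax)                           # [7. 3. 5.]

# Sanity check against the explicit A = S D S^(-1).
A = S @ np.diag(lams) @ np.linalg.inv(S)
assert np.allclose(Ax, A @ x)
```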
Remark. This idea, that we can determine how a (diagonalizable) linear transformation acts
without explicitly knowing that transformation, lies at the core of many important applications
of eigenvalues and eigenvectors. In practice, it is often the case that you have enough data to
determine enough eigenvectors and eigenvalues of some transformation without explicitly knowing
what that transformation is, and if you're lucky this is enough information to do what you want.
In particular, most computations in quantum physics are based on this idea ;)
Warm-Up 1. We determine the values of k for which

    A = [ 0  0   3 ]
        [ 3  k   3 ]
        [ 1  0  -2 ]

is diagonalizable. Expanding det(A - λI) along the second column gives characteristic polynomial
(k - λ)(λ - 1)(λ + 3), so the eigenvalues of A are k, 1, and -3. If k is not 1 or -3, then A has three
distinct eigenvalues and hence is diagonalizable.
If k = 1, then the eigenvalue 1 has algebraic multiplicity 2, but (with k = 1)

    A - I = [ -1  0   3 ]      [ -1  0   3 ]
            [  3  0   3 ]  ->  [  0  0  12 ],
            [  1  0  -3 ]      [  0  0   0 ]

so E_1 is 1-dimensional. Hence we only get one basis eigenvector for λ = 1, and together with the
basis eigenvector for -3 we only get two overall, so A is not diagonalizable.
If k = -3, then again there are two eigenvalues, but now 1 has algebraic multiplicity 1 and
-3 has algebraic multiplicity 2. We will get one basis eigenvector corresponding to 1, and since
(keeping in mind that k = -3)

    A + 3I = [ 3  0  3 ]      [ 3  0  3 ]
             [ 3  0  3 ]  ->  [ 0  0  0 ]
             [ 1  0  1 ]      [ 0  0  0 ]

has a 2-dimensional kernel, E_(-3) is 2-dimensional so we get two basis eigenvectors. These together
with the basis eigenvector for 1 give three in total, so A is diagonalizable.
To summarize, A is diagonalizable for all k ≠ 1. Note however that the reasons differ for k ≠ -3
and k = -3: in the former case there are three distinct eigenvalues, while in the latter there are
only two, but the geometric multiplicity of each eigenvalue agrees with its algebraic multiplicity.
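We can sweep over several values of k and check this conclusion numerically. This is a rough sketch (numpy, not part of the original notes): deciding multiplicities in floating point is delicate, so the tolerances below are heuristic.

```python
import numpy as np

def is_diagonalizable(M, tol=1e-5):
    """Heuristic: geometric multiplicity == algebraic multiplicity for each eigenvalue."""
    n = M.shape[0]
    evals = np.linalg.eigvals(M)
    for lam in evals:
        alg = np.sum(np.abs(evals - lam) < tol)               # nearby eigenvalues
        geo = n - np.linalg.matrix_rank(M - lam * np.eye(n), tol=tol)
        if geo < alg:
            return False
    return True

def A(k):
    return np.array([[0.0, 0.0, 3.0],
                     [3.0, k, 3.0],
                     [1.0, 0.0, -2.0]])

for k in [-5.0, -3.0, 0.0, 1.0, 2.0]:
    print(k, is_diagonalizable(A(k)))
# A(k) should come out diagonalizable for every k tested except k = 1.
```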
Warm-Up 2. Suppose that A is diagonalizable and that A is similar to B. We claim that B must
also be diagonalizable. Indeed, A diagonalizable gives

    A = SDS^(-1)

for some invertible S and diagonal D, while A similar to B gives

    A = PBP^(-1)

for some invertible P. Then

    SDS^(-1) = PBP^(-1),   so   B = P^(-1)SDS^(-1)P = (P^(-1)S)D(P^(-1)S)^(-1),

so B is similar to the diagonal matrix D and is hence diagonalizable.
Computing powers. Why do we care about diagonalizable matrices? Here is perhaps the main
practical reason: if A = SDS^(-1), then

    A^k = SD^kS^(-1).

Indeed, if you write out A^k as (SDS^(-1))^k:

    (SDS^(-1))(SDS^(-1))(SDS^(-1)) · · · (SDS^(-1)),

note that all the S^(-1)S terms cancel out so we're left with the first S, then a bunch of D's, and the
final S^(-1). In addition, if D is diagonal, its powers are easy to compute:

    [ λ1      0  ]k   [ λ1^k        0   ]
    [    ..      ]  = [        ..       ],
    [ 0      λn  ]    [ 0         λn^k  ]

that is, D^k is the diagonal matrix whose entries are the k-th powers of the diagonal entries of D.
Putting this all together gives a relatively easy way to find A^k when A is diagonalizable.
Important. If A is diagonalizable and A = SDS^(-1) with D diagonal, then A^k = SD^kS^(-1) where
D^k is still diagonal with diagonal entries equal to the k-th powers of the diagonal entries of D.
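This fact is easy to test numerically. The sketch below (numpy, not part of the original notes) uses the diagonalization of [ 1 3 ; 4 -3 ] from Example 1 and compares SD^kS^(-1) against repeated multiplication.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [4.0, -3.0]])

# Diagonalization from Example 1: columns of S are the eigenvectors.
S = np.array([[1.0, 3.0],
              [-2.0, 2.0]])
D = np.diag([-5.0, 3.0])
assert np.allclose(S @ D @ np.linalg.inv(S), A)

# A^k via the diagonalization: only the diagonal entries get powered.
k = 7
Ak = S @ np.diag(np.diag(D) ** k) @ np.linalg.inv(S)

# Compare against repeated multiplication.
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```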
Example 1. Last time we diagonalized [ 1 3 ; 4 -3 ] as

    [ 1  3 ]   [  1  3 ] [ -5  0 ] [  1  3 ]^(-1)
    [ 4 -3 ] = [ -2  2 ] [  0  3 ] [ -2  2 ]     .

Then

    [ 1  3 ]k   [  1  3 ] [ -5  0 ]k [  1  3 ]^(-1)
    [ 4 -3 ]  = [ -2  2 ] [  0  3 ]  [ -2  2 ]

                [  1  3 ] [ (-5)^k   0  ] [  1  3 ]^(-1)
              = [ -2  2 ] [   0     3^k ] [ -2  2 ]     .

The right side is now pretty straightforward to compute, and so we get a concrete description of
[ 1 3 ; 4 -3 ]^k for any k > 0.
Example 2. Say we want to solve

    [ 1  -1  1 ]100        [ 4 ]
    [ 2  -2  1 ]    ~x  =  [ 1 ].
    [ 2  -2  1 ]           [ 2 ]

The wrong way to go about this is to try to actually compute this 100th power directly. Instead, the matrix

    [ 1  -1  1 ]
    [ 2  -2  1 ]
    [ 2  -2  1 ]

has three distinct eigenvalues 0, 1, and -1, so it is diagonalizable. Finding a basis for each eigenspace
gives one possible diagonalization as:

    [ 1  -1  1 ]   [ 1  0  1 ] [ 0   0  0 ] [ 1  0  1 ]^(-1)
    [ 2  -2  1 ] = [ 1  1  1 ] [ 0  -1  0 ] [ 1  1  1 ]
    [ 2  -2  1 ]   [ 0  1  1 ] [ 0   0  1 ] [ 0  1  1 ]     .
From this we have

    [ 1  -1  1 ]100   [ 1  0  1 ] [ 0^100     0        0    ] [ 1  0  1 ]^(-1)
    [ 2  -2  1 ]    = [ 1  1  1 ] [   0   (-1)^100     0    ] [ 1  1  1 ]
    [ 2  -2  1 ]      [ 0  1  1 ] [   0       0     1^100   ] [ 0  1  1 ]

                      [ 1  0  1 ] [ 0  0  0 ] [ 1  0  1 ]^(-1)
                    = [ 1  1  1 ] [ 0  1  0 ] [ 1  1  1 ]
                      [ 0  1  1 ] [ 0  0  1 ] [ 0  1  1 ]     .

Computing the inverse here gives

    [ 1  0  1 ]^(-1)   [  0   1  -1 ]
    [ 1  1  1 ]      = [ -1   1   0 ],
    [ 0  1  1 ]        [  1  -1   1 ]
so

    [ 1  -1  1 ]100   [ 1  0  1 ] [ 0  0  0 ] [  0   1  -1 ]
    [ 2  -2  1 ]    = [ 1  1  1 ] [ 0  1  0 ] [ -1   1   0 ]
    [ 2  -2  1 ]      [ 0  1  1 ] [ 0  0  1 ] [  1  -1   1 ]

                      [ 0  0  1 ] [  0   1  -1 ]
                    = [ 0  1  1 ] [ -1   1   0 ]
                      [ 0  1  1 ] [  1  -1   1 ]

                      [ 1  -1  1 ]
                    = [ 0   0  1 ].
                      [ 0   0  1 ]
Thus our equation becomes

    [ 1  -1  1 ]100        [ 1  -1  1 ]        [ 4 ]
    [ 2  -2  1 ]    ~x  =  [ 0   0  1 ] ~x  =  [ 1 ],
    [ 2  -2  1 ]           [ 0   0  1 ]        [ 2 ]

which we can solve fairly straightforwardly, and it turns out that there are no solutions: the last
two rows would require the third entry of ~x to be both 1 and 2 at once. The point is that diagonalizing

    [ 1  -1  1 ]
    [ 2  -2  1 ]
    [ 2  -2  1 ]

gave us a direct way to compute its powers.
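Both the power computation and the inconsistency of the system can be confirmed numerically. A quick sketch (numpy, not part of the original notes):

```python
import numpy as np

M = np.array([[1.0, -1.0, 1.0],
              [2.0, -2.0, 1.0],
              [2.0, -2.0, 1.0]])

# The 100th power, by repeated squaring.
M100 = np.linalg.matrix_power(M, 100)
print(M100)
# [[ 1. -1.  1.]
#  [ 0.  0.  1.]
#  [ 0.  0.  1.]]

# Same answer via the diagonalization M = S D S^(-1).
S = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
D = np.diag([0.0, -1.0, 1.0])
assert np.allclose(S @ np.diag(np.diag(D) ** 100) @ np.linalg.inv(S), M100)

# The system M^100 x = (4, 1, 2) is inconsistent: augmenting the matrix with
# the right-hand side raises the rank, so there are no solutions.
b = np.array([4.0, 1.0, 2.0])
aug_rank = np.linalg.matrix_rank(np.column_stack([M100, b]))
assert np.linalg.matrix_rank(M100) < aug_rank
```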
Remark. The two examples which follow are only meant to illustrate how computing powers of
a matrix might come up in applications, but as said in the intro you will not be expected to know
how to do these types of examples on the final.
Fibonacci numbers. The Fibonacci numbers are the numbers defined as follows: start with 1, 1,
and take as the next term the sum of the previous two terms. So, the first few Fibonacci numbers are

    1, 1, 2, 3, 5, 8, 13, 21, 34, . . . .
Our goal is to find an explicit expression for the n-th Fibonacci number F_n. The key is the equation

    F_(n+2) = F_(n+1) + F_n,

which says precisely that each term is the sum of the two previous terms. Take this equation and
throw in the silly-looking equation F_(n+1) = F_(n+1) to get the system

    F_(n+2) = F_(n+1) + F_n
    F_(n+1) = F_(n+1).
Now, this can be written in matrix form as

    [ F_(n+2) ]   [ 1  1 ] [ F_(n+1) ]
    [ F_(n+1) ] = [ 1  0 ] [ F_n     ],

and thus the matrix [ 1 1 ; 1 0 ] tells us how to move from F_n and F_(n+1) to F_(n+1) and F_(n+2). Similarly,

    [ F_(n+1) ]   [ 1  1 ] [ F_n     ]
    [ F_n     ] = [ 1  0 ] [ F_(n-1) ],

and combining this with our previous equation gives

    [ F_(n+2) ]   [ 1  1 ] [ 1  1 ] [ F_n     ]   [ 1  1 ]2 [ F_n     ]
    [ F_(n+1) ] = [ 1  0 ] [ 1  0 ] [ F_(n-1) ] = [ 1  0 ]  [ F_(n-1) ].
Continuing in this way, we end up with

    [ F_(n+2) ]   [ 1  1 ]n [ F_2 ]   [ 1  1 ]n [ 1 ]
    [ F_(n+1) ] = [ 1  0 ]  [ F_1 ] = [ 1  0 ]  [ 1 ],

so what we want is a way to compute powers of [ 1 1 ; 1 0 ], which is exactly what diagonalization
provides. The characteristic polynomial of [ 1 1 ; 1 0 ] is λ^2 - λ - 1, whose roots are the eigenvalues

    λ+ = (1 + √5)/2   and   λ- = (1 - √5)/2.
We have

    A - λI = [ 1-λ   1 ]
             [  1   -λ ],

and using λ^2 = λ + 1 one checks that (λ+, 1) and (λ-, 1) are eigenvectors corresponding to λ+ and λ-.
Thus [ 1 1 ; 1 0 ] diagonalizes as

    [ 1  1 ]   [ λ+  λ- ] [ λ+   0 ] [ λ+  λ- ]^(-1)
    [ 1  0 ] = [ 1    1 ] [ 0   λ- ] [ 1    1 ]     ,

and so

    [ 1  1 ]n   [ λ+  λ- ] [ λ+^n    0   ] [ λ+  λ- ]^(-1)
    [ 1  0 ]  = [ 1    1 ] [  0    λ-^n  ] [ 1    1 ]     .

Since λ+ - λ- = √5, the inverse here is

    [ λ+  λ- ]^(-1)    1  [  1  -λ- ]
    [ 1    1 ]      = --- [ -1   λ+ ],
                      √5
so

    [ 1  1 ]n    1  [ λ+  λ- ] [ λ+^n    0   ] [  1  -λ- ]
    [ 1  0 ]  = --- [ 1    1 ] [  0    λ-^n  ] [ -1   λ+ ]
                √5

                 1  [ λ+^(n+1) - λ-^(n+1)    -λ- λ+^(n+1) + λ+ λ-^(n+1) ]
              = --- [ λ+^n - λ-^n            -λ- λ+^n + λ+ λ-^n         ].
                √5

Thus

    [ F_(n+2) ]   [ 1  1 ]n [ 1 ]    1  [ λ+^(n+1) - λ-^(n+1)    -λ- λ+^(n+1) + λ+ λ-^(n+1) ] [ 1 ]
    [ F_(n+1) ] = [ 1  0 ]  [ 1 ] = --- [ λ+^n - λ-^n            -λ- λ+^n + λ+ λ-^n         ] [ 1 ],
                                    √5

and multiplying out the right-hand side gives an explicit expression for F_(n+1); after readjusting n
we get an explicit expression for F_n. Note that this expression can still be simplified further, since
for instance

    λ+ λ- = ( (1 + √5)/2 ) ( (1 - √5)/2 ) = -1.

But, you get the idea.
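The two descriptions of F_n — powers of [ 1 1 ; 1 0 ] and the closed form that comes out of the diagonalization (often called Binet's formula) — can be compared directly. A small pure-Python sketch (not part of the original notes):

```python
# F_n two ways: via powers of Q = [[1,1],[1,0]] (exact integer arithmetic),
# and via the closed form produced by the diagonalization above.

def fib_matrix(n):
    # Compute Q^n by repeated multiplication; Q^n = [[F_{n+1}, F_n], [F_n, F_{n-1}]].
    p = [[1, 0], [0, 1]]                   # start from the identity
    for _ in range(n):
        p = [[p[0][0] + p[0][1], p[0][0]],
             [p[1][0] + p[1][1], p[1][0]]] # p = p * Q, multiplied out by hand
    return p[0][1]

def fib_binet(n):
    sqrt5 = 5 ** 0.5
    lp = (1 + sqrt5) / 2                   # lambda_+
    lm = (1 - sqrt5) / 2                   # lambda_-
    return round((lp ** n - lm ** n) / sqrt5)

print([fib_matrix(n) for n in range(1, 10)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34]
assert all(fib_matrix(n) == fib_binet(n) for n in range(1, 40))
```

The rounding in `fib_binet` is only needed because of floating-point error; the exact formula produces integers on the nose.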
Remark. Again, this is not a computation you'd be expected to be able to do on the final since, as
you can see, although fairly straightforward it does get a little messy. The setup above, where we
start with a vector and repeatedly multiply by the same matrix in order to generate new vectors,
is an example of what's called a discrete dynamical system. Diagonalization plays a big role in the
study of such systems, and in a related concept known as a Markov chain. These are topics you
would perhaps come across in some later course, in other departments as well.
Matrix exponentials. Take A = [ 1 3 ; 4 -3 ] to be the matrix from Example 1. We want to compute
the matrix exponential e^A. The first question is: what on Earth is meant by taking e to the power
of a matrix? The answer comes from recalling some calculus and looking at the Taylor series for e^x:

    e^x = Σ_(n=0 to ∞) x^n/n! = 1 + x + x^2/2 + x^3/3! + · · · .

(If you didn't see Taylor series in your calculus course, no worries, this again is just to illustrate
how diagonalization can be useful.) The point is that this series makes sense when we substitute a
matrix in place of x, so we define e^A to be the matrix

    e^A = Σ_(n=0 to ∞) A^n/n! = I + A + (1/2)A^2 + (1/3!)A^3 + · · · .
Writing A = SDS^(-1) as before, each power satisfies A^n = SD^nS^(-1), so plugging into the series gives

    e^A = Se^DS^(-1)

for D = [ -5 0 ; 0 3 ]. This is (fairly) straightforward:

    e^D = I + D + (1/2)D^2 + (1/3!)D^3 + · · ·

        = [ 1  0 ] + [ -5  0 ] + [ (-5)^2/2     0     ] + [ (-5)^3/3!      0     ] + · · ·
          [ 0  1 ]   [  0  3 ]   [    0      3^2/2    ]   [     0       3^3/3!   ]

        = [ 1 + (-5) + (1/2)(-5)^2 + (1/3!)(-5)^3 + · · ·                   0                  ]
          [                   0                      1 + 3 + (1/2)3^2 + (1/3!)3^3 + · · ·      ]

and we can now recognize these diagonal terms as the series expressions for e^(-5) and e^3 respectively.
Thus

    e^D = [ e^(-5)    0  ]
          [   0      e^3 ],

so

    e^A = [  1  3 ] [ e^(-5)    0  ] [  1  3 ]^(-1)
          [ -2  2 ] [   0      e^3 ] [ -2  2 ]     .
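The diagonalization shortcut agrees with summing the series directly. This sketch (numpy, not part of the original notes) compares the two; the truncation at 60 terms is an arbitrary choice that is far more than enough for a matrix of this size.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [4.0, -3.0]])
S = np.array([[1.0, 3.0],
              [-2.0, 2.0]])

# e^A via the diagonalization A = S D S^(-1): exponentiate the diagonal of D.
eD = np.diag([np.exp(-5.0), np.exp(3.0)])
eA = S @ eD @ np.linalg.inv(S)

# e^A via a truncated power series I + A + A^2/2! + ...
series = np.zeros_like(A)
term = np.eye(2)                 # current term A^n / n!
for n in range(1, 60):
    series = series + term
    term = term @ A / n

assert np.allclose(eA, series)
print(eA)
```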
What about matrices which are not diagonalizable? It turns out that every square matrix is similar
(over the complex numbers) to one of the form

    [ λ1  *             ]
    [     λ2  *         ]
    [         ..    *   ]
    [              λn   ],

where all non-diagonal entries are zero except for possibly some 1s in the starred locations right
above the diagonal. Such a matrix is called a Jordan matrix, and if A is similar to this, this
is called the Jordan form of A. The point is that for such matrices, powers are still somewhat
straightforward to compute in a way which is good enough for most applications.
Jordan forms are something you would learn about in a later linear algebra course, such as Math
334. They are related to what are called generalized eigenvectors, which as the name suggests are
generalizations of eigenvectors. As one final fact, we can now answer a question which I'm sure
has been on all of your minds: when are two matrices similar? The answer: two square matrices
are similar if and only if they have the same Jordan form.
Suppose that A is a 4 x 4 matrix with eigenvectors

    (1, 0, 0, 0), (2, 3, 1, 0), (1, 2, 0, 0), (1, 2, 3, 1)

corresponding to the eigenvalues 2, 2, 1, 1 respectively. We want to find A^(-1) and justify along the
way that A is indeed invertible.
First note that the given eigenvectors are linearly independent, which is maybe easier to see by
writing them in the order

    (1, 0, 0, 0), (1, 2, 0, 0), (2, 3, 1, 0), (1, 2, 3, 1).

The matrix with these as columns,

    [ 1  1  2  1 ]
    [ 0  2  3  2 ]
    [ 0  0  1  3 ]
    [ 0  0  0  1 ],

has full rank, so those columns are linearly independent. Hence we know that the eigenspaces E_1 and E_2 are each at least 2-dimensional, since we have at least two
linearly independent eigenvectors in each. Thus the algebraic multiplicities of 1 and 2 are at least
2, so the characteristic polynomial of A looks like

    det(A - λI) = (λ - 1)^m (λ - 2)^n (other stuff),   where m, n ≥ 2.

But A is only 4 x 4, so since the degree of this polynomial has to be 4 there is no choice but to have

    det(A - λI) = (λ - 1)^2 (λ - 2)^2.
Thus 0 is not an eigenvalue of A so A is invertible, and since now we see that the geometric
multiplicity of each eigenvalue equals its algebraic multiplicity, A is diagonalizable.
Using the given eigenvectors, we can diagonalize A as
    A = [ 1  1  2  1 ] [ 2  0  0  0 ] [ 1  1  2  1 ]^(-1)
        [ 0  2  3  2 ] [ 0  1  0  0 ] [ 0  2  3  2 ]
        [ 0  0  1  3 ] [ 0  0  2  0 ] [ 0  0  1  3 ]
        [ 0  0  0  1 ] [ 0  0  0  1 ] [ 0  0  0  1 ]     .
Now, we could multiply this out to find A, and then use that to find A^(-1). However, we can save some
time as follows. Recall that in general the inverse of a product of matrices is the product of the
individual inverses but in reverse order. So, for instance, (BCD)^(-1) = D^(-1)C^(-1)B^(-1). In our case,
writing the diagonalization above as A = SDS^(-1), this means that

    A^(-1) = (SDS^(-1))^(-1) = (S^(-1))^(-1) D^(-1) S^(-1).
But (S^(-1))^(-1) = S, so

    A^(-1) = [ 1  1  2  1 ] [ 2  0  0  0 ]^(-1) [ 1  1  2  1 ]^(-1)
             [ 0  2  3  2 ] [ 0  1  0  0 ]      [ 0  2  3  2 ]
             [ 0  0  1  3 ] [ 0  0  2  0 ]      [ 0  0  1  3 ]
             [ 0  0  0  1 ] [ 0  0  0  1 ]      [ 0  0  0  1 ]

           = [ 1  1  2  1 ] [ 1/2  0   0   0 ] [ 1  1  2  1 ]^(-1)
             [ 0  2  3  2 ] [  0   1   0   0 ] [ 0  2  3  2 ]
             [ 0  0  1  3 ] [  0   0  1/2  0 ] [ 0  0  1  3 ]
             [ 0  0  0  1 ] [  0   0   0   1 ] [ 0  0  0  1 ]     .
(Note that if A = SDS^(-1), then A^(-1) = SD^(-1)S^(-1), so the formula A^k = SD^kS^(-1) for the powers of a
diagonalizable matrix we saw last time works even for negative powers.) The inverse on the right
is

    [ 1  1  2  1 ]^(-1)   [ 1  -1/2  -1/2   3/2 ]
    [ 0  2  3  2 ]      = [ 0   1/2  -3/2   7/2 ]
    [ 0  0  1  3 ]        [ 0    0     1    -3  ]
    [ 0  0  0  1 ]        [ 0    0     0     1  ],
so multiplying out

    A^(-1) = [ 1  1  2  1 ] [ 1/2  0   0   0 ] [ 1  -1/2  -1/2   3/2 ]
             [ 0  2  3  2 ] [  0   1   0   0 ] [ 0   1/2  -3/2   7/2 ]
             [ 0  0  1  3 ] [  0   0  1/2  0 ] [ 0    0     1    -3  ]
             [ 0  0  0  1 ] [  0   0   0   1 ] [ 0    0     0     1  ]

will give us A^(-1). As opposed to finding A first and then A^(-1), this method only requires us to find
one inverse explicitly (namely S^(-1)) using row operations instead of two: S^(-1) and A^(-1).
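The whole computation, including the hand-computed S^(-1) above, can be verified numerically. A quick sketch (numpy, not part of the original notes):

```python
import numpy as np

# Eigenvector matrix S and eigenvalue matrix D from the example.
S = np.array([[1.0, 1.0, 2.0, 1.0],
              [0.0, 2.0, 3.0, 2.0],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 1.0]])
D = np.diag([2.0, 1.0, 2.0, 1.0])

A = S @ D @ np.linalg.inv(S)
Ainv = S @ np.diag(1 / np.diag(D)) @ np.linalg.inv(S)

# A^(-1) built from the eigen-data really is the inverse of A.
assert np.allclose(A @ Ainv, np.eye(4))

# And S^(-1) matches the upper-triangular inverse computed by hand.
Sinv_expected = np.array([[1.0, -0.5, -0.5, 1.5],
                          [0.0, 0.5, -1.5, 3.5],
                          [0.0, 0.0, 1.0, -3.0],
                          [0.0, 0.0, 0.0, 1.0]])
assert np.allclose(np.linalg.inv(S), Sinv_expected)
```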
Example 1. Consider the matrix

    A = [ 0  -1 ]
        [ 1   0 ],

which represents rotation by 90°. Its characteristic polynomial is

    det(A - λI) = λ^2 + 1,

so A has no real eigenvalues. (This makes sense, since a rotation by 90° will turn no vector into a
scalar multiple of itself.) However, A does have two complex eigenvalues: i and -i. We can find
eigenvectors for each of these using the same method as for real eigenvalues.
For λ = i, we have

    A - iI = [ -i  -1 ]
             [  1  -i ].

Now, we could reduce this by multiplying the first row by -i and adding it to the second row, but
instead, since we know this matrix is not invertible, we know that the second row will have to
become all zeroes after reducing, so

    [ -i  -1 ]      [ -i  -1 ]
    [  1  -i ]  ->  [  0   0 ].

Now, for a matrix of this form, finding a nonzero vector in the kernel is easy: we just switch the
two entries in the top row and multiply one by a negative. Thus,

    (1, -i)

is in ker(A - iI), so this is an eigenvector of A with eigenvalue i. As a check:

    [ 0  -1 ] [  1 ]   [ i ]     [  1 ]
    [ 1   0 ] [ -i ] = [ 1 ] = i [ -i ],

so (1, -i) is indeed an eigenvector of A with eigenvalue i. Similarly, the conjugate (1, i) is an
eigenvector of A with eigenvalue -i.
Example 2. Now take A = [ 6 -1 ; 17 -2 ], which has characteristic polynomial λ^2 - 4λ + 5 and
hence complex eigenvalues 2 + i and 2 - i. We have

    A - (2 + i)I = [ 4 - i    -1   ]      [ 4 - i  -1 ]
                   [  17    -4 - i ]  ->  [   0     0 ],

so (1, 4 - i) is an eigenvector of A with eigenvalue 2 + i. Hence (1, 4 + i) is an eigenvector of A with
eigenvalue 2 - i, so we can diagonalize A over C as

    [ 6  -1 ]   [   1       1   ] [ 2+i   0  ] [   1       1   ]^(-1)
    [ 17 -2 ] = [ 4 - i   4 + i ] [  0   2-i ] [ 4 - i   4 + i ]     .
Now, notice that the matrix B = [ 2 -1 ; 1 2 ] has the same eigenvalues as A, and after finding some
eigenvectors we see that we can diagonalize B as

    B = [  1   1 ] [ 2+i   0  ] [  1   1 ]^(-1)
        [ -i   i ] [  0   2-i ] [ -i   i ]     .

Since A and B are both similar to [ 2+i 0 ; 0 2-i ], they are similar to each other! The matrix [ 2 -1 ; 1 2 ]
geometrically represents a rotation combined with some scalings (compare to the rotation matrix
[ cos θ  -sin θ ; sin θ  cos θ ]), so we conclude that A = [ 6 -1 ; 17 -2 ] also represents a rotation combined with scalings.
Remark. In general, a 2 x 2 matrix with complex eigenvalues a ± ib will be similar to

    [ a  -b ]
    [ b   a ],

and so geometrically represents a rotation combined with scalings. This is further evidence that
there is a deep relation between complex numbers and rotations, which you would elaborate more
on in a complex analysis course.
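We can test the claims of Example 2 numerically. In the sketch below (numpy, not part of the original notes), the matrix P is one hypothetical explicit choice of similarity, built from the real and imaginary parts of the eigenvector (1, 4 - i).

```python
import numpy as np

A = np.array([[6.0, -1.0],
              [17.0, -2.0]])

# Complex eigenvalues 2 + i and 2 - i.
evals = np.linalg.eigvals(A)
assert np.allclose(sorted(evals, key=lambda z: z.imag), [2 - 1j, 2 + 1j])

# Check the eigenvector (1, 4 - i) for the eigenvalue 2 + i.
v = np.array([1.0, 4.0 - 1.0j])
assert np.allclose(A @ v, (2 + 1j) * v)

# A is similar to the rotation-scaling matrix [[a, -b], [b, a]] with a=2, b=1.
B = np.array([[2.0, -1.0],
              [1.0, 2.0]])

# One explicit real similarity: columns of P are Re(v) and -Im(v).
P = np.array([[1.0, 0.0],
              [4.0, 1.0]])
assert np.allclose(P @ B @ np.linalg.inv(P), A)
```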
Example 3. Let A be the matrix

    A = [ 1  -2  1 ]
        [ 0   1  3 ]
        [ 0  -3  1 ],

which has characteristic polynomial

    (1 - λ)(λ^2 - 2λ + 10).

Hence the eigenvalues of A are 1 and 1 ± 3i. For λ = 1 we get (1, 0, 0) as an eigenvector. For λ = 1 + 3i
we have

    A - (1 + 3i)I = [ -3i  -2    1  ]      [ -3i  -2   1 ]
                    [  0   -3i   3  ]  ->  [  0   -3i  3 ].
                    [  0   -3   -3i ]      [  0    0   0 ]

Setting the free variable equal to 3i, we find that one possible eigenvector is

    (1 + 2i, 3, 3i).
Hence the conjugate (1 - 2i, 3, -3i) is an eigenvector for the eigenvalue 1 - 3i, and we can
diagonalize A over C as

    A = [ 1   1 + 2i   1 - 2i ] [ 1     0       0    ] [ 1   1 + 2i   1 - 2i ]^(-1)
        [ 0     3        3    ] [ 0   1 + 3i    0    ] [ 0     3        3    ]
        [ 0    3i       -3i   ] [ 0     0     1 - 3i ] [ 0    3i       -3i   ]     .
Note that things get tougher once you move past 2 x 2 matrices with complex eigenvalues!
3-dimensional rotations have axes of rotation. And now, after a full quarter, we can finally
justify something I claimed on the very first day of class, and which I included as part of the
introduction to the class on the syllabus: any 3-dimensional rotation has an axis of rotation. Note
how much we had to develop in order to get to this point!
Say that A is a 3 x 3 rotation matrix. First, we know that A must have at least one real
eigenvalue, since complex eigenvalues come in (conjugate) pairs and a 3 x 3 matrix will have 3
eigenvalues counted with multiplicity. Now, we also know that since A describes a rotation, only
1 and -1 can be real eigenvalues. We claim that 1 must be an eigenvalue. There are two possibilities:
either A has 3 real eigenvalues or it has 1 real eigenvalue.
If A has 3 real eigenvalues and they are all -1, then det A, which is the product of the eigenvalues
of A, would be -1, but we've seen that a rotation must have positive determinant. Hence if A has
3 real eigenvalues at least one of them must be 1.
If A has only 1 real eigenvalue, it has two other complex eigenvalues a ± ib. If -1 were the one real
eigenvalue, then

    det A = (-1)(a - ib)(a + ib) = -(a^2 + b^2)

is negative, which again is not possible. Hence the one real eigenvalue of A must be 1.
Thus either way, 1 is an eigenvalue of A; take ~x to be a corresponding eigenvector. Then the
line spanned by ~x is an axis of rotation for A. Tada!
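The argument can be illustrated concretely. In the sketch below (numpy, not part of the original notes), we build a rotation as an arbitrary composition of rotations about the coordinate axes, confirm it really is a rotation, and then find its axis as the eigenvector for the eigenvalue 1.

```python
import numpy as np

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t), np.cos(t), 0],
                     [0, 0, 1]])

def rot_x(t):
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])

# Composing rotations gives another rotation: A^T A = I and det A = 1.
A = rot_z(0.9) @ rot_x(0.4)
assert np.allclose(A.T @ A, np.eye(3))
assert np.isclose(np.linalg.det(A), 1.0)

# 1 is an eigenvalue; the corresponding eigenvector spans the axis of rotation.
evals, vecs = np.linalg.eig(A)
i = np.argmin(np.abs(evals - 1.0))
assert np.isclose(evals[i].real, 1.0)
axis = vecs[:, i].real
assert np.allclose(A @ axis, axis)   # the axis is fixed by the rotation
```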