MATH 115 Entire Course Lecture Notes
University of Waterloo
December 1, 2020
Week 1: September 8 – September 11

Lecture 1: Complex Numbers - Standard Form (Textbook 9.1)
Lecture 2: Complex Conjugate, Modulus, Geometry (Textbook 9.1)
Lecture 3: Polar Form, Powers of Complex Numbers (Textbook 9.1)
Lecture 4: Complex nth Roots, the Complex Exponential (Textbook 9.1)

Due: Assignment 0 by 8:30am on Friday, September 11

Footnote 1: Only the material regarding complex vectors, conjugates, inner products and norms. The material about vector spaces and the Gram-Schmidt Procedure can be ignored.
Week 4: September 28 – October 2

Lecture 13: Spanning Sets (Textbook 1.2, 1.4)
Lecture 14: Linear Dependence and Linear Independence (Textbook 1.2, 1.4)
Lecture 15: Bases, Subspaces of Rn (Textbook 1.4)
Lecture 16: Bases of Subspaces (Textbook 1.4)

Due: Assignment 3 by 8:30am on Friday, October 2

Footnote 2: Complex Numbers in Electrical Circuit Equations is omitted.
Footnote 3: Only Spanning Problems are covered for now.
Week 7: October 26 – October 30

Lecture 25: Matrix Algebra (Textbook 3.1)
Lecture 26: The Matrix-Vector Product (Textbook 3.1)
Lecture 27: The Fundamental Subspaces Associated with a Matrix, Matrix Multiplication (Textbook 3.1, 3.4)
Lecture 28: Complex Matrices, Application: Directed Graphs

Due: Assignment 5 by 8:30am on Friday, October 30

Footnote 4: The material here is a bit advanced - understanding the lecture notes will be sufficient.
Footnote 5: Omit Linear Mappings for now.
Footnote 6: The lecture notes are better here - the textbook material is too abstract.
Footnote 7: The lecture notes are better here - the textbook material is too advanced.
Week 10: November 16 – November 20

Lecture 37: Cofactors, Adjugates, Inverses, Elementary Row and Column Operations (Textbook 5.1, 5.2, 5.3; see Footnote 8)
Lecture 38: Properties of Determinants (Textbook 5.2)
Lecture 39: Application: Polynomial Interpolation, Determinants and Area (Textbook 5.4)
Lecture 40: Determinants and Volume, Eigenvalues and Eigenvectors, Characteristic Polynomials (Textbook 5.4, 6.1; see Footnote 9)

Due: Assignment 7 by 8:30am on Friday, November 20

Footnote 8: Omit Cramer's Rule.
Footnote 9: Omit The Power Method of Determining Eigenvalues.
Lecture 1
Note that every natural number is an integer, every integer is a rational number (with
denominator equal to 1) and that every rational number is a real number. Consider the
following five equations:
x + 3 = 5        (1)
x + 4 = 3        (2)
2x = 1           (3)
x² = 2           (4)
x² = −2          (5)
Equation (1) has solution x = 2, and thus can be solved using natural numbers. Equation (2) does not have a solution in the natural numbers, but it does have a solution in the integers, namely x = −1. Equation (3) does not have a solution in the integers, but it does have a rational solution of x = 1/2. Equation (4) does not have a rational solution, but it does have a real solution: x = √2. Finally, since the square of any real number is greater than or equal to zero, Equation (5) does not have a real solution. In order to solve this last equation, we will need a "larger" set of numbers.
We introduce a bit of notation here. When we write x ∈ R, we mean that the variable x is a real number. As another example, by p, q ∈ Z, we mean that both p and q are integers. By x ∉ N, we mean that x is not a natural number.
We define the set of complex numbers to be
C = {x + yj | x, y ∈ R}.
Note that mathematicians (and most other humans including the authors of the text) use
i rather than j, however engineers use j since i is often used in the modelling of electric
networks.
Example 1.2.
• 3 = 3 + 0j ∈ C
• 4j = 0 + 4j ∈ C
• 3 + 4j ∈ C
• sin(π/7) + π π j ∈ C
In fact, every x ∈ R can be expressed as x = x + 0j ∈ C, so every real number is a complex
number. However, not every complex number is real, for example, 3 + 4j ∈ / R.
We introduce a little bit more notation here. We just mentioned that every real number is a
complex number. We denote this by R ⊆ C and say that R is a subset of C. We also showed
that not every complex number is a real number, which we denote by C 6⊆ R and say that
C is not a subset of R. From the definitions of natural numbers, integers, rational numbers,
real numbers and complex numbers, we have
N⊆Z⊆Q⊆R⊆C
Definition 1.3. Let z = x + yj ∈ C with x, y ∈ R. We call x the real part of z and y the imaginary part of z, and we write Re(z) = x and Im(z) = y. For example:
• Im(3 − 4j) = −4
It is important to note that Im(3 − 4j) ≠ −4j. By definition, for any z ∈ C we have
Re(z), Im(z) ∈ R, that is, both the real and imaginary parts of a complex number are real
numbers.
Having defined complex numbers, we now look at how the basic algebraic operations of
addition, subtraction, multiplication and division are defined.
Definition 1.5. Two complex numbers z = x + yj and w = u + vj with x, y, u, v ∈ R are
equal if and only if x = u and y = v, that is, if and only if Re(z) = Re(w) and Im(z) = Im(w).
In words, two complex numbers are equal if they have the same real parts and the same
imaginary parts.
Definition 1.6. Let x + yj and u + vj be two complex numbers in standard form. We define addition, subtraction and multiplication as
(x + yj) + (u + vj) = (x + u) + (y + v)j
(x + yj) − (u + vj) = (x − u) + (y − v)j
(x + yj)(u + vj) = (xu − yv) + (xv + yu)j
To add two complex numbers, we simply add the real parts and add the imaginary parts. Subtraction is done similarly. With our definition of multiplication, we can verify that j² = −1:
j² = (0 + 1j)(0 + 1j) = (0·0 − 1·1) + (0·1 + 1·0)j = −1.
There is no need to memorize the formula for multiplication of complex numbers. Using the fact that j² = −1, we can simply do a binomial expansion:
(x + yj)(u + vj) = xu + xvj + yuj + yvj² = (xu − yv) + (xv + yu)j.
Example. With z = 3 − 2j and w = −2 + j, compute z + w, z − w and zw.
Solution. We have
z + w = (3 − 2j) + (−2 + j) = 1 − j
z − w = (3 − 2j) − (−2 + j) = 5 − 3j
zw = (3 − 2j)(−2 + j) = −6 + 3j + 4j − 2j² = −6 + 3j + 4j + 2 = −4 + 7j
We see that addition, subtraction and multiplication are similar to that of real numbers, just
a little more complicated. We now look at division of complex numbers.
Notice that when we divide by a nonzero complex number x + yj, we multiply both the
numerator and denominator by x − yj. This is because (x + yj)(x − yj) = x2 + y 2 ∈ R,
which allows us to put the quotient into standard form. We can now divide any complex
number by any nonzero complex number.
Example 1.9. With z = 3 − 2j and w = −2 + j, compute z/w.
Solution. We have
z/w = (3 − 2j)/(−2 + j) = [(3 − 2j)(−2 − j)] / [(−2 + j)(−2 − j)] = (−6 − 3j + 4j + 2j²)/(4 + 2j − 2j − j²) = (−8 + j)/(4 + 1) = −8/5 + (1/5)j.
Example 1.10. Express
(1 − 2j) − (3 + 4j)
5 − 6j
in standard form.
Solution. We carry out our operations as we would with real numbers.
((1 − 2j) − (3 + 4j))/(5 − 6j) = (−2 − 6j)/(5 − 6j)
                              = [(−2 − 6j)(5 + 6j)] / [(5 − 6j)(5 + 6j)]
                              = (−10 − 12j − 30j − 36j²)/(25 + 36)
                              = (26 − 42j)/61
                              = 26/61 − (42/61)j
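The computations in Examples 1.9 and 1.10 can be checked numerically. The sketch below is an addition to these notes; it relies only on Python's built-in complex type, which happens to use the same j notation as engineers do.

```python
# Quick numerical check of the worked examples above, using Python's
# built-in complex type (Python writes the imaginary unit as j).
z = 3 - 2j
w = -2 + 1j

print(z + w)    # (1-1j)
print(z - w)    # (5-3j)
print(z * w)    # (-4+7j)
print(z / w)    # (-1.6+0.2j), i.e. -8/5 + (1/5)j

# Example 1.10: the result is approximately 26/61 - (42/61)j
print(((1 - 2j) - (3 + 4j)) / (5 - 6j))
```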
Note that for z ∈ C, we have z¹ = z, and for any integer k ≥ 2, z^k = z(z^(k−1)). For z ≠ 0 (here, 0 = 0 + 0j), z⁰ = 1. As usual, 0⁰ is undefined. For any z ∈ C with z ≠ 0, we have z^(−k) = 1/z^k for any positive integer k. In particular, z⁻¹ = 1/z for z ≠ 0.
We now summarize the rules of arithmetic in C. Notice that we’ve used some of these rules
already.
Theorem 1.11 (Properties of Arithmetic in C). Let u, v, z ∈ C with z = x + yj. Then
(1) (u + v) + z = u + (v + z) addition is associative
(5) (uv)z = u(vz) multiplication is associative
These rules show that complex numbers behave much like real numbers with respect to
addition, subtraction, multiplication and division.
Lecture 2
Example 2.1. Find all z ∈ C satisfying z 2 = −7 + 24j.
Solution. Let z = a + bj with a, b ∈ R. Then z² = (a² − b²) + 2abj, so comparing real and imaginary parts gives

a² − b² = −7        (6)
2ab = 24            (7)

From (7), we have that a, b ≠ 0, so b = 24/(2a) = 12/a. Substituting b = 12/a into (6) gives

a² − (12/a)² = −7
a² − 144/a² = −7
a⁴ + 7a² − 144 = 0
(a² + 16)(a² − 9) = 0

Since a ∈ R, we have a² + 16 > 0, so a² = 9, that is, a = 3 or a = −3. From b = 12/a we obtain b = 4 or b = −4 respectively, so the solutions are z = 3 + 4j and z = −3 − 4j.

Example. Find all z ∈ C satisfying z² = −2.

Solution. Let z = a + bj with a, b ∈ R. Then

a² − b² = −2        (8)
2ab = 0             (9)

From (9) we see that a = 0 or b = 0. If a = 0 then (8) reduces to −b² = −2, that is b² = 2. Hence b = √2 or b = −√2. In this case, z = √2 j or z = −√2 j. On the other hand, if b = 0 then a² = −2 which has no solutions since a ∈ R implies that a² ≥ 0. Thus z = √2 j and z = −√2 j are the only solutions.
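As a quick numerical sanity check (an addition to the notes, not part of them), cmath.sqrt returns one complex square root and negating it gives the other; this agrees with the solutions found above.

```python
import cmath

# One square root of -7 + 24j; the other is its negative.
w = cmath.sqrt(-7 + 24j)
print(w, -w)            # (3+4j) (-3-4j)
print(w * w)            # (-7+24j), confirming w squared is -7 + 24j

# The square roots of -2 are purely imaginary: +/- sqrt(2) j
print(cmath.sqrt(-2))   # approximately 1.4142135623730951j
```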
Definition 2.3. The complex conjugate of z = x + yj with x, y ∈ R is z̄ = x − yj.
Example 2.4.
• The conjugate of 1 + 3j is 1 − 3j.
• The conjugate of √2 j is −√2 j.
• The conjugate of −4 is −4.
Theorem 2.5 (Properties of the Complex Conjugate). Let z = x + yj with x, y ∈ R and let w ∈ C. Then
(1) \overline{z̄} = z
(2) z ∈ R ⇐⇒ z̄ = z
(4) \overline{z + w} = z̄ + w̄
(5) \overline{zw} = z̄ w̄
(6) \overline{z^k} = (z̄)^k for k ∈ Z, k ≥ 0, (k ≠ 0 if z = 0)
(7) \overline{z/w} = z̄/w̄ provided w ≠ 0
(8) z + z̄ = 2x = 2 Re(z)
(10) z z̄ = x² + y²
Proof. Write z = x + yj and w = u + vj with x, y, u, v ∈ R.
(4) \overline{z + w} = \overline{(x + yj) + (u + vj)} = \overline{(x + u) + (y + v)j} = (x + u) − (y + v)j = (x − yj) + (u − vj) = z̄ + w̄.
(5) We have
\overline{zw} = \overline{(x + yj)(u + vj)} = \overline{(xu − yv) + (xv + yu)j} = (xu − yv) − (xv + yu)j
and
z̄ w̄ = (x − yj)(u − vj) = (xu − yv) + (−xv − yu)j = (xu − yv) − (xv + yu)j.
(6) This requires a proof technique called induction which we do not cover in MATH 115.
(7) For w ≠ 0,
1/w = 1/(u + vj) = u/(u² + v²) − (v/(u² + v²))j
so
\overline{1/w} = u/(u² + v²) + (v/(u² + v²))j
and
1/w̄ = 1/(u − vj) = u/(u² + v²) + (v/(u² + v²))j
so
\overline{1/w} = 1/w̄.
Now, using (5) we obtain
\overline{z/w} = \overline{z · (1/w)} = z̄ · \overline{1/w} = z̄ · (1/w̄) = z̄/w̄.
Definition 2.6. The modulus of z = x + yj with x, y ∈ R is the nonnegative real number |z| = √(x² + y²).
Example 2.7.
• |1 + j| = √(1² + 1²) = √2
• |3j| = √(0² + 3²) = √9 = 3
• |−4| = √((−4)² + 0²) = √16 = 4
For x ∈ R, we know that since R ⊆ C, x ∈ C. Thus the modulus of x is given by
|x| = |x + 0j| = √(x² + 0²) = √(x²) = |x|,
where the left-hand side is the modulus and the right-hand side is the absolute value. We see that for real numbers x, the modulus of x is the absolute value of x. Thus the modulus is the extension of the absolute value to the complex numbers, which is why we have chosen the same notation for the modulus as the absolute value. We will see shortly that the modulus of a complex number can be interpreted as the size or magnitude of that complex number, just like the absolute value of a real number can be interpreted as the size or magnitude of that real number.
Given two complex numbers, it is natural to ask if one number is greater than the other, for
example, is 1+j < 3j? This is easy to decide for real numbers, but it is actually undefined for
complex numbers.¹⁰ This is where the modulus can help us: we compare complex numbers by comparing their moduli since the modulus of a complex number is real. For instance, |1 + j| = √2 < 3 = |3j| (but we don't say 1 + j < 3j).
Theorem 2.8 (Properties of Modulus). Let z, w ∈ C. Then
(1) |z| = 0 ⇐⇒ z = 0
(2) |z| = |z|
(3) zz = |z|2
(4) |zw| = |z||w|
(5) |z/w| = |z|/|w| provided w ≠ 0
(6) |z + w| ≤ |z| + |w| which is known as the Triangle Inequality
Proof. Let z, w ∈ C.
(1) Assume first that z = 0. Then |z| = √(0² + 0²) = 0. Assume now that z = x + yj is such that |z| = 0. Then √(x² + y²) = 0 and so x² + y² = 0. It follows that x = y = 0 and so z = 0.
Footnote 10: We say that R is ordered by ≤, that is, for x, y ∈ R, x ≤ y or y ≤ x. As defined, ≤ doesn't make sense for complex numbers. One can, however, redefine what ≤ means for complex numbers so that the complex numbers are ordered by ≤, but this is beyond the scope of MATH 115.
(2) |z̄| = |x − yj| = |x + (−y)j| = √(x² + (−y)²) = √(x² + y²) = |z|.
(4) We have
|zw|² = (zw)\overline{(zw)} = (zw)(z̄ w̄) = (z z̄)(w w̄) = |z|²|w|².
Thus |zw|² = (|z||w|)². Since the modulus of a complex number is never negative, we can take square roots of both sides to obtain |zw| = |z||w|.
(5) Similarly,
|z/w|² = (z/w)\overline{(z/w)} = (z/w)(z̄/w̄) = (z z̄)/(w w̄) = |z|²/|w|².
Since the modulus of a complex number is never negative, we can take square roots of both sides to obtain |z/w| = |z|/|w|.
(6) Left as an exercise.
Note that for a complex number z ≠ 0, the modulus and the complex conjugate give us a nice way to write z⁻¹:
z⁻¹ = 1/z = z̄/(z z̄) = z̄/|z|².
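A small numerical illustration (added here, not in the original notes) of the identities z z̄ = |z|² and z⁻¹ = z̄/|z|², using Python's complex type; the particular value of z is arbitrary.

```python
z = 3 - 2j

# z times its conjugate equals |z|^2, a real number.
print(z * z.conjugate())          # (13+0j)
print(abs(z) ** 2)                # 13.0 up to floating-point rounding

# z^(-1) equals conj(z) / |z|^2.
print(1 / z)                      # (0.2307...+0.1538...j)
print(z.conjugate() / abs(z)**2)  # same value
```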
Geometry
Visually, we interpret the set of real numbers as a line. Given that R ⊆ C and that there
are complex numbers that are not real, the set of complex numbers should be “bigger” than
a line. In fact, the set of complex numbers is a plane, much like the xy–plane as shown in
Figure 1. We “identify” the complex number x + yj ∈ C with the point (x, y) ∈ R2 . In
this sense, the complex plane is simply a “relabelling” of the xy–plane. The x–axis in the
xy–plane corresponds to the real axis in the complex plane, which contains the real numbers.
The y–axis of the xy–plane corresponds to the imaginary axis in the complex plane which
contains the purely imaginary numbers. Note we will often label the real axis as “Re” and
the imaginary axis as “Im”.
Figure 1: (a) The xy-plane, known as R². (b) The complex plane C.
We also have a geometric interpretation of the complex conjugate and the modulus as well
which is shown in Figure 2.
Figure 2: Visually interpreting the complex conjugate and the modulus of a complex number.
For z ∈ C, we see that z̄ is a reflection of z in the real axis and that |z| is the distance
between 0 and z. Also note that any complex number w lying on the green circle in Figure 2
satisfies |w| = |z|. If w is inside the green circle, then |w| < |z| and if w is outside the green
circle, then |w| > |z|.
We also gain a geometric interpretation of addition:
We see that the complex numbers 0, z, w and z+w form a parallelogram with the line segment
between 0 and z + w as one of the diagonals. Finally, we look at the triangle determined by
0, z and z + w.
Since the length of any one side of a triangle cannot exceed the sum of the other two sides
(or else the triangle wouldn't "close"), we must have
|z + w| ≤ |z| + |w|
Note that this is not a proof of the Triangle Inequality.
We will require a little more work before we can have a meaningful geometric understanding
of complex multiplication.
Lecture 3
Thus
z = x + yj = (r cos θ) + (r sin θ)j = r(cos θ + j sin θ).
Note that |cos θ + j sin θ| = √(cos²θ + sin²θ) = 1, and as a result, we may understand an argument of a complex number z as giving us a point on a circle of radius 1 to move towards (measured counterclockwise from the positive real axis), while r > 0 tells us how far to move in that direction to reach z. This is illustrated in Figure 6.
Figure 6: Using r and θ to locate a complex number. Here, r > 1.
z = r(cos θ + j sin θ)
Note that unlike standard form, z does not have a unique polar form. Recall that for any
k ∈ Z,
cos θ = cos(θ + 2kπ) and sin θ = sin(θ + 2kπ)
so
r(cos θ + j sin θ) = r(cos(θ + 2kπ) + j sin(θ + 2kπ))
for any k ∈ Z.
Example. Write the following complex numbers in polar form:
(1) 1 + √3 j
(2) 7 + 7j
Footnote 11: We typically write cos θ + j sin θ rather than cos θ + (sin θ)j to avoid the extra brackets. For standard form, we still write x + yj and not x + jy.
Solution.
(1) We have r = |1 + √3 j| = √(1² + (√3)²) = √(1 + 3) = √4 = 2. Thus, factoring r = 2 out of 1 + √3 j gives
1 + √3 j = 2(1/2 + (√3/2)j).
As this is of the form r(cos θ + j sin θ), we have that cos θ = 1/2 and sin θ = √3/2. We thus take θ = π/3 so
1 + √3 j = 2(cos(π/3) + j sin(π/3)).
(2) Since r = |7 + 7j| = √(7² + 7²) = √(2(49)) = 7√2, we have that
7 + 7j = 7√2 (7/(7√2) + (7/(7√2))j) = 7√2 (1/√2 + (1/√2)j)
so cos θ = 1/√2 = √2/2 and sin θ = √2/2. Thus we take θ = π/4 to obtain
7 + 7j = 7√2 (cos(π/4) + j sin(π/4)).
Converting from standard form to polar form is a bit computational, however the next
example shows it is quite easy to convert from polar form back to standard form.
Example 3.3. Write 3(cos(5π/6) + j sin(5π/6)) in standard form.
Solution. We have
3(cos(5π/6) + j sin(5π/6)) = 3(−√3/2 + (1/2)j) = −3√3/2 + (3/2)j.
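For a quick cross-check (an addition to the notes), Python's cmath module converts between standard and polar form: cmath.polar returns the pair (r, θ) and cmath.rect(r, θ) converts back.

```python
import cmath, math

# Standard form -> polar form (the two conversions worked above)
print(cmath.polar(1 + math.sqrt(3) * 1j))   # roughly (2.0, 1.0472), i.e. r = 2, theta = pi/3
print(cmath.polar(7 + 7j))                  # roughly (9.8995, 0.7854), i.e. r = 7*sqrt(2), theta = pi/4

# Polar form -> standard form (Example 3.3)
print(cmath.rect(3, 5 * math.pi / 6))       # roughly (-2.598+1.5j), i.e. -3*sqrt(3)/2 + (3/2)j
```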
As mentioned, polar form is useful for complex multiplication. To see how, we begin by recalling the angle sum formulas
cos(θ1 + θ2) = cos θ1 cos θ2 − sin θ1 sin θ2
sin(θ1 + θ2) = sin θ1 cos θ2 + cos θ1 sin θ2
If
z1 = r1 (cos θ1 + j sin θ1 ) and z2 = r2 (cos θ2 + j sin θ2 )
are two complex numbers in polar form, then
z1z2 = r1(cos θ1 + j sin θ1) · r2(cos θ2 + j sin θ2)
     = r1r2 (cos θ1 + j sin θ1)(cos θ2 + j sin θ2)
     = r1r2 [(cos θ1 cos θ2 − sin θ1 sin θ2) + j(sin θ1 cos θ2 + cos θ1 sin θ2)]
     = r1r2 [cos(θ1 + θ2) + j sin(θ1 + θ2)]
Thus
z1z2 = r1r2 [cos(θ1 + θ2) + j sin(θ1 + θ2)].
This now allows us to understand polar multiplication geometrically. Given a complex
number z = r(cos θ + j sin θ), multiplying by z can be viewed as a counterclockwise rotation
by θ about the number 0 in the complex plane, and a scaling by a factor of r. This is
illustrated in Figure 7. Note that a counterclockwise rotation by θ is a clockwise rotation
by −θ. Thus, if θ = − π4 for example, then multiplication by z can be viewed as a clockwise
rotation by π4 (plus a scaling by a factor of r).
Figure 7: Multiplication of complex numbers in polar form. Note that in this image,
|z1 |, |z2 | > 1 and θ1 , θ2 > 0.
Recall that multiplying complex numbers in standard form requires a binomial expansion
which can be tedious and error prone by hand. Although it is also tedious to convert a
complex number in standard form to polar form, multiplying complex numbers in polar form
is quite simple. We simply multiply the two moduli together, which is just multiplication of
real numbers, and add the arguments together, which is just addition of real numbers.
Example 3.4. Let z1 = 2(cos(π/3) + j sin(π/3)) and z2 = 7√2 (cos(π/4) + j sin(π/4)). Express z1z2 in polar form.
Solution. We have
z1z2 = 2(7√2)[cos(π/3 + π/4) + j sin(π/3 + π/4)] = 14√2 [cos(7π/12) + j sin(7π/12)].
Example 3.5. Let z1 = r1(cos θ1 + j sin θ1) and z2 = r2(cos θ2 + j sin θ2) be two complex numbers in polar form with z2 ≠ 0 (from which it follows that r2 ≠ 0). Show that
z1/z2 = (r1/r2)[cos(θ1 − θ2) + j sin(θ1 − θ2)].
Solution. Recall that
cos(θ1 − θ2) = cos θ1 cos θ2 + sin θ1 sin θ2
sin(θ1 − θ2) = sin θ1 cos θ2 − cos θ1 sin θ2
We have
z1/z2 = [r1(cos θ1 + j sin θ1)] / [r2(cos θ2 + j sin θ2)]
      = (r1/r2) · [(cos θ1 + j sin θ1)(cos θ2 − j sin θ2)] / [(cos θ2 + j sin θ2)(cos θ2 − j sin θ2)]
      = (r1/r2) · [(cos θ1 cos θ2 + sin θ1 sin θ2) + j(sin θ1 cos θ2 − cos θ1 sin θ2)] / (cos²θ2 + sin²θ2)
      = (r1/r2)[cos(θ1 − θ2) + j sin(θ1 − θ2)].
For example, taking z1 = z2 = z = r(cos θ + j sin θ) in the product formula gives
z² = r²[cos(2θ) + j sin(2θ)].
Continuing with this process, it appears that for any positive integer n,
z^n = r^n[cos(nθ) + j sin(nθ)].
Example 3.6. For z = r(cos θ + j sin θ) ≠ 0, show that
z⁻¹ = 1/z = (1/r)[cos(−θ) + j sin(−θ)].
Solution. Note that |1| = 1 and θ = 0 is an argument for 1. Using the result of Example 3.5, we have
z⁻¹ = 1/z = [1(cos 0 + j sin 0)] / [r(cos θ + j sin θ)] = (1/r)[cos(0 − θ) + j sin(0 − θ)] = (1/r)[cos(−θ) + j sin(−θ)].
The above example shows that z^n = r^n[cos(nθ) + j sin(nθ)] holds for n = −1 as well. We have the following important result.
Theorem 3.7 (de Moivre's Theorem). If z = r(cos θ + j sin θ) ≠ 0, then
z^n = r^n[cos(nθ) + j sin(nθ)]
for any n ∈ Z.
Since de Moivre's Theorem is stated for n ∈ Z, we have to allow for n < 0 and thus the restriction that z ≠ 0. It is easy to verify that de Moivre's Theorem holds for z = 0 provided
n ≥ 1. The proof of de Moivre’s Theorem again requires induction so is not included here.
Example 3.8. Compute (2 + 2j)^7 using de Moivre's Theorem and express your answer in standard form.
Solution. We have r = |2 + 2j| = √(4 + 4) = √(2(4)) = 2√2 and so
2 + 2j = 2√2 (2/(2√2) + (2/(2√2))j) = 2√2 (√2/2 + (√2/2)j)
Example 3.9. Compute (1/2 + (√3/2)j)^602 and express your answer in standard form.
Solution. Since r = |1/2 + (√3/2)j| = √(1/4 + 3/4) = 1, we see that
1/2 + (√3/2)j = cos(π/3) + j sin(π/3).
Thus
(1/2 + (√3/2)j)^602 = (cos(π/3) + j sin(π/3))^602
                    = cos(602π/3) + j sin(602π/3)      by de Moivre's Theorem
                    = cos(2π/3) + j sin(2π/3)
                    = −1/2 + (√3/2)j
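As a numerical sanity check (added, not from the notes), Python raises complex numbers to integer powers directly, and the results agree with the de Moivre computations above up to floating-point rounding. The surviving text of Example 3.8 stops before the final answer, so the value printed for (2 + 2j)^7 below is simply what Python reports.

```python
import math

# Example 3.9: (1/2 + (sqrt(3)/2) j)^602 should be -1/2 + (sqrt(3)/2) j
z = 0.5 + (math.sqrt(3) / 2) * 1j
print(z ** 602)          # approximately (-0.5+0.866j)

# Example 3.8: (2 + 2j)^7 computed directly
print((2 + 2j) ** 7)     # (1024-1024j)
```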
It is hopefully apparent that trigonometry will play a role here, so we include the unit circle
in the complex plane. Note that in MATH 115, we use radians to measure angles as opposed
to degrees.
Lecture 4
Let z = r(cos θ + j sin θ) and let w = R(cos φ + j sin φ). From w^n = z we have
[R(cos φ + j sin φ)]^n = r(cos θ + j sin θ).
Using de Moivre's Theorem, we obtain
R^n (cos(nφ) + j sin(nφ)) = r(cos θ + j sin θ).
From this we find that
R^n = r and nφ = θ + 2kπ
for some k ∈ Z. To understand this, notice that since w^n = z, it must be the case that w^n and z have the same modulus and so R^n = r, and that any argument of w^n must be equal to an argument of z plus some integer multiple of 2π. Solving for R and φ gives
R = r^(1/n) and φ = (θ + 2kπ)/n
for some k ∈ Z. Here, r^(1/n) is the nth root of the real number r and is evaluated in the normal way. Thus, for any k ∈ Z, let
wk = r^(1/n) [cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)].
Then
wk^n = (r^(1/n))^n [cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)]^n
     = r [cos(n·(θ + 2kπ)/n) + j sin(n·(θ + 2kπ)/n)]      by de Moivre's Theorem
     = r (cos(θ + 2kπ) + j sin(θ + 2kπ))
     = r (cos θ + j sin θ)
     = z.
Hence wk^n = z for any integer k. It is tempting to think that there will be infinitely many solutions to w^n = z, but in fact we obtain exactly n solutions.
Theorem 4.1. Let z = r(cos θ + j sin θ) be nonzero, and let n be a positive integer. Then the n distinct nth roots of z are given by
wk = r^(1/n) [cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)]
for k = 0, 1, . . . , n − 1.
Example 4.2. Find the 3rd roots of 1, that is, find all w ∈ C such that w3 = 1.
Solution. Here, z = 1 and n = 3. In polar form, 1 = 1(cos 0 + j sin 0) so the 3rd roots of 1 are given by
wk = 1^(1/3) [cos((0 + 2kπ)/3) + j sin((0 + 2kπ)/3)] = cos(2kπ/3) + j sin(2kπ/3),    k = 0, 1, 2.
Thus
w0 = cos 0 + j sin 0 = 1
w1 = cos(2π/3) + j sin(2π/3) = −1/2 + (√3/2)j
w2 = cos(4π/3) + j sin(4π/3) = −1/2 − (√3/2)j
Thus, the 3rd roots of 1 are 1, −1/2 + (√3/2)j and −1/2 − (√3/2)j. This means that
1³ = (−1/2 + (√3/2)j)³ = (−1/2 − (√3/2)j)³ = 1.
Figure 9: The 3rd roots of 1.
Example 4.3. Find all 4th roots of −256 in standard form and plot them in the complex
plane.
Solution. Here, z = −256 and n = 4. We have that −256 = 256(cos π + j sin π) so the 4th roots are given by
wk = 256^(1/4) [cos((π + 2kπ)/4) + j sin((π + 2kπ)/4)] = 4 [cos((π + 2kπ)/4) + j sin((π + 2kπ)/4)],    k = 0, 1, 2, 3.
Thus
w0 = 4(cos(π/4) + j sin(π/4)) = 4(√2/2 + (√2/2)j) = 2√2 + 2√2 j
w1 = 4(cos(3π/4) + j sin(3π/4)) = 4(−√2/2 + (√2/2)j) = −2√2 + 2√2 j
w2 = 4(cos(5π/4) + j sin(5π/4)) = 4(−√2/2 − (√2/2)j) = −2√2 − 2√2 j
w3 = 4(cos(7π/4) + j sin(7π/4)) = 4(√2/2 − (√2/2)j) = 2√2 − 2√2 j
which we plot in the complex plane. Notice again that the roots are evenly spaced out on a circle of radius 4.
Example 4.4. Find the 3rd roots of 4 − 4√3 j. Express your answers in polar form.
Solution. Since |4 − 4√3 j| = 4|1 − √3 j| = 4√(1 + 3) = 4(2) = 8, we have
4 − 4√3 j = 8(4/8 − (4√3/8)j) = 8(1/2 − (√3/2)j) = 8(cos(5π/3) + j sin(5π/3)).
In the last example, it is difficult to write w0, w1 and w2 in standard form without a calculator.
If z = r(cos θ + j sin θ) is the polar form of z ∈ C, then z = rejθ is the complex exponential
form of z. As polar form is not unique, neither is complex exponential form:
Also, recall that de Moivre's Theorem states that for z = r(cos θ + j sin θ) and n ∈ Z, we have that z^n = r^n(cos(nθ) + j sin(nθ)). Thus (re^{jθ})^n = r^n e^{j(nθ)}. Taking r = 1 gives
(e^{jθ})^n = e^{j(nθ)}
e^{jπ} + 1 = 0
which is known as Euler’s Identity and is often regarded as the most beautiful equation in
mathematics because it combines some of the most important quantities mathematicians use
into one tidy little equation:
e − irrational number appearing all over mathematics, particularly in differential equations
π − irrational number important for trigonometry
j − most famous nonreal complex number
1 − the multiplicative identity
0 − the additive identity
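A one-line numerical illustration (an addition to the notes): evaluating e^{jπ} with Python's cmath gives a value within floating-point error of −1.

```python
import cmath, math

# e^{j pi} + 1 should be 0 (up to a tiny floating-point error)
print(cmath.exp(1j * math.pi) + 1)   # approximately 1.2e-16j
```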
Unless specifically asked otherwise, you may use complex exponential form instead of polar
form.
Solution. Since −64 = 64(cos π + j sin π) = 64e^{jπ}, the 6th roots are given by
wk = 64^(1/6) e^{j(π + 2kπ)/6} = 2e^{j(π + 2kπ)/6},    k = 0, 1, 2, 3, 4, 5.
Thus,
w0 = 2e^{jπ/6} = 2(√3/2 + (1/2)j) = √3 + j
w1 = 2e^{jπ/2} = 2(0 + j) = 2j
w2 = 2e^{j5π/6} = 2(−√3/2 + (1/2)j) = −√3 + j
w3 = 2e^{j7π/6} = 2(−√3/2 − (1/2)j) = −√3 − j
w4 = 2e^{j3π/2} = 2(0 − j) = −2j
w5 = 2e^{j11π/6} = 2(√3/2 − (1/2)j) = √3 − j
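The construction in Theorem 4.1 translates directly into a short routine. The sketch below is an addition to the notes (the function name nth_roots is chosen here purely for illustration); it reproduces the six 6th roots of −64 found above.

```python
import cmath

def nth_roots(z, n):
    """Return the n distinct nth roots of a nonzero complex number z,
    following Theorem 4.1: w_k = r^(1/n) e^{j (theta + 2 k pi)/n}."""
    r, theta = cmath.polar(z)
    return [r ** (1 / n) * cmath.exp(1j * (theta + 2 * cmath.pi * k) / n)
            for k in range(n)]

for w in nth_roots(-64, 6):
    print(w)   # approximately sqrt(3)+j, 2j, -sqrt(3)+j, -sqrt(3)-j, -2j, sqrt(3)-j
```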
In a course on complex analysis, one often begins by studying well-known functions from calculus while allowing the variable to be complex. Given z ∈ C, one considers functions such as e^z, sin z, cos z, tan z, ln z and √z. As our work above suggests, these functions behave quite differently when the variable is allowed to be complex. As an example of how different the behaviour is, it can be shown that there exist infinitely many z ∈ C such that sin z = w where w is any given complex number (say, w = 7). However, this is a topic for another course.
Lecture 5
Complex Polynomials
Recall that p(x) = an x^n + an−1 x^(n−1) + · · · + a1 x + a0 is the equation of a polynomial. We call x the variable and a0, a1, . . . , an the coefficients. If an ≠ 0, we say p(x) has degree n. A number c is a root of p(x) if p(c) = 0.
an z^n + an−1 z^(n−1) + · · · + a1 z + a0 = 0
Taking complex conjugates of both sides and using the fact that 0, a0, a1, . . . , an ∈ R, we have
\overline{an z^n + an−1 z^(n−1) + · · · + a1 z + a0} = \overline{0}
\overline{an z^n} + \overline{an−1 z^(n−1)} + · · · + \overline{a1 z} + \overline{a0} = 0
\overline{an}·\overline{z^n} + \overline{an−1}·\overline{z^(n−1)} + · · · + \overline{a1}·\overline{z} + \overline{a0} = 0
an (z̄)^n + an−1 (z̄)^(n−1) + · · · + a1 z̄ + a0 = 0,
that is, z̄ is also a root of p(x).
Thus the roots of p(x) are 0, 4j and −4j. Note that given any of these roots, the complex
conjugate of that root is also a root of p(x).
Note that we require p(x) to be a real polynomial for Theorem 5.2 to hold. The complex polynomial
p(z) = z² + (2 + 3j)z − (5 − j)
has roots 1 − j and −3 − 2j, neither of which is a complex conjugate of the other.
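A small numerical illustration of Theorem 5.2 (added here; it assumes numpy is available, and the cubic x³ + 16x is chosen only because it has the roots 0, 4j and −4j mentioned above): the nonreal roots of a real polynomial appear in conjugate pairs, while the complex polynomial above does not have that property.

```python
import numpy as np

# Real coefficients: x^3 + 16x has roots 0, 4j, -4j (a conjugate pair).
print(np.roots([1, 0, 16, 0]))

# Complex coefficients: z^2 + (2+3j)z - (5-j) has roots 1-j and -3-2j,
# which are not conjugates of one another.
print(np.roots([1, 2 + 3j, -(5 - 1j)]))
```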
Proof. Let z1, z2, z3 ∈ C and assume that |z1| = |z2| = |z3| = 1. Then for each i = 1, 2, 3, zi ≠ 0 and it follows that z̄i ≠ 0. We have
1/z1 + 1/z2 + 1/z3 = z̄1/(z1 z̄1) + z̄2/(z2 z̄2) + z̄3/(z3 z̄3)
                   = z̄1/|z1|² + z̄2/|z2|² + z̄3/|z3|²
                   = z̄1 + z̄2 + z̄3
                   = \overline{z1 + z2 + z3}.
In the above proof, we began by stating our assumptions: z1, z2, z3 ∈ C and |z1| = |z2| = |z3| = 1. We then deduce that since |zi| = 1 for i = 1, 2, 3, zi ≠ 0, from which it follows that z̄i ≠ 0. This justifies why we can multiply each 1/zi term by z̄i/z̄i. From there, we use properties of conjugates to finish showing the conclusion holds. Note that in the proof, we state where we used the hypothesis |z1| = |z2| = |z3| = 1. Additionally, note that at no point in the proof did we assume the conclusion was true – it is incorrect to write the following:
\overline{z1 + z2 + z3} = 1/z1 + 1/z2 + 1/z3
\overline{z1 + z2 + z3} = z̄1/(z1 z̄1) + z̄2/(z2 z̄2) + z̄3/(z3 z̄3)
\overline{z1 + z2 + z3} = z̄1/|z1|² + z̄2/|z2|² + z̄3/|z3|²
\overline{z1 + z2 + z3} = z̄1 + z̄2 + z̄3
as the very first line implies that we are already assuming the conclusion is true when it is
in fact the very statement we want to show is true.
Example 5.5. Let z ∈ C. Show that |Re(z)| + |Im(z)| ≤ √2 |z|.
Solution. Write z = x + yj with x, y ∈ R. Since (√2 |z|)² = 2|z|² = 2|x|² + 2|y|² and
(|Re(z)| + |Im(z)|)² = (|x| + |y|)² = |x|² + 2|x||y| + |y|²
we have
(√2 |z|)² − (|Re(z)| + |Im(z)|)² = 2|x|² + 2|y|² − |x|² − 2|x||y| − |y|²
                                 = |x|² − 2|x||y| + |y|²
                                 = (|x| − |y|)²
                                 = (|Re(z)| − |Im(z)|)²
                                 ≥ 0
Thus (√2 |z|)² − (|Re(z)| + |Im(z)|)² ≥ 0, that is, (|Re(z)| + |Im(z)|)² ≤ (√2 |z|)². Since both |Re(z)| + |Im(z)| and √2 |z| are nonnegative real numbers, we conclude that |Re(z)| + |Im(z)| ≤ √2 |z|.
Recall that for any x ∈ R, √(x²) = |x|. However, if we know that x ≥ 0, then |x| = x, so √(x²) = x in this case. This observation can be useful when dealing with radicals – we often square both sides of an equality (or inequality) if it rids us of radicals, and then take square roots once we are done.
Roundoff Error
Rounding real numbers to a certain number of decimal places is an extremely useful idea.
For example if you knew your exact weight was 123456/2345 kilograms and someone asked
you what your weight was, you wouldn’t likely respond with “123456/2345 kilograms.” You
would more likely use the fact that
123456
≈ 52.6
2345
and say that your weight was 52.6 kilograms. The reason you do this is because 52.6 is easier
to remember and it is more meaningful - it is not immediately clear how big 123456/2345
actually is.
For example, take x = 1.01, so that x^100 ≈ 2.705. If we round x to the nearest integer before computing the power, we instead get
x^100 ≈ 1^100 = 1
and we see that the resulting answers are not very close. We observe that a very small change in x (exactly 0.01) leads to a relatively large change in x^100 (approximately 1.705). This phenomenon is known as roundoff error. We can attempt to avoid roundoff error by not rounding exact answers before using them in further computations.
It might seem like the above roundoff error occurred because of the high power on x, but
consider the following system of equations:
x + y = 2
x + 1.014y = 0
To solve this system, we isolate y in the first equation to get y = 2 − x. Substituting this into the second equation gives x + 1.014(2 − x) = 0, that is, −0.014x = −2.028, so x ≈ 144.86 and y = 2 − x ≈ −142.86.
Now consider the above system where we round the coefficients to the nearest hundredth:
x + y = 2
x + 1.01y = 0
This time y = 2 − x gives x + 1.01(2 − x) = 0, that is, −0.01x = −2.02, so x = 202 and y = −200.
Here we observe an even worse roundoff error than before, and we only changed one of the
coefficients in the original system of equations by 0.004. Note also that there are no high
powers on any of the variables.
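The sketch below (an addition to the notes, assuming numpy is available) solves both versions of the system numerically; the only change is the single coefficient 1.014 versus 1.01, yet the solution moves by roughly 57 in each unknown.

```python
import numpy as np

# Original system:  x + y = 2,  x + 1.014 y = 0
A = np.array([[1.0, 1.0],
              [1.0, 1.014]])
b = np.array([2.0, 0.0])
print(np.linalg.solve(A, b))          # approximately [ 144.857, -142.857]

# Coefficient rounded to the nearest hundredth:  x + 1.01 y = 0
A_rounded = np.array([[1.0, 1.0],
                      [1.0, 1.01]])
print(np.linalg.solve(A_rounded, b))  # approximately [ 202., -200.]
```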
It is extremely important to control roundoff error. For instance, in the previous example,
if x and y represent changes to the amount of drugs being administered to a patient in a
hospital, then this roundoff error could severely harm the patient or cause their death.
So how do we avoid roundoff error? In general, it is not always possible. Many of the numbers
used in real-world applications have many decimal places, so even computers are forced to
round off or truncate values. A course in applied mathematics will introduce students to
sensitivity analysis, where we test how sensitive the output variables are to small changes to
the input variables.
Lecture 6
Vector Algebra
We now begin our study of linear algebra. Mostly we will focus on the “real case”, that is,
linear algebra using real numbers, but we will at times address the “complex case” as well.
We begin with the Cartesian Plane. We choose an origin O and two perpendicular axes
called the x1 −axis and the x2 −axis.12 A point P in this plane is represented by the ordered
pair (p1 , p2 ). We think of p1 as a measure of how far to the right (if p1 > 0) or how far to
the left (if p1 < 0) P is from the x2 −axis and we think of p2 as a measure of how far above
(if p2 > 0) or how far below (if p2 < 0) the x1 −axis P is. This is illustrated in Figure 11.
to be the collection of all such vectors. We refer to x1 and x2 as the entries or components
of the vector. We will define how to add these vectors and multiply them by constants,
two operations which we will see are vital to linear algebra. Of course, these ideas extend
naturally to three-dimensional space and beyond.
Footnote 12: You might be more familiar with the names x-axis and y-axis. However, as we'll see, it's more convenient to call them the x1-axis and the x2-axis.
Figure 12: The Cartesian Plane and the point P (p1 , p2 ).
in Rn are equal if x1 = y1 , x2 = y2 , . . . , xn = yn , that is, if their corresponding entries are
equal, and we write ~x = ~y in this case. Otherwise, we write ~x 6= ~y .
For example,
~0R2 = [0, 0]^T,   ~0R3 = [0, 0, 0]^T,   ~0R4 = [0, 0, 0, 0]^T,   and so on
(column vectors are written here as [· , · , ·]^T so that they fit on one line). Often, we denote the zero vector in Rn simply as ~0 when it is clear that we are talking about ~0Rn to avoid the messy subscript.
Example 6.5.
• [1, 2]^T + [−1, 3]^T = [0, 5]^T
• [1, 2, 3]^T + [2, 3, −2]^T = [3, 5, 1]^T
• [1, 1, 1]^T + [1, 2]^T is not defined as one vector is in R³ while the other is in R².
We have a nice geometric interpretation of vector addition that is similar to what we observed
for the addition of complex numbers. This is illustrated in Figure 14 (compare to Figure
3) where we see that two vectors determine a parallelogram with their sum appearing as a
diagonal of this parallelogram.
Figure 14: Geometrically interpreting vector addition. The figure on the left is in R2 with
vector components labelled on the corresponding axes and the figure on the right is vector
addition viewed for vectors in Rn .
that is, we multiply each entry of ~x by c.
Example 6.7.
• 2 [1, 6, −4, 8]^T = [2, 12, −8, 16]^T
• 0 [−1, −1, 2]^T = [0, 0, 0]^T = ~0
We often refer to c as a scalar, and call c~x a scalar multiple of ~x. Figure 15 helps us
understand geometrically what scalar multiplication of a nonzero vector ~x ∈ R2 looks like.
The picture is similar for ~x ∈ Rn .
Definition 6.8. Two nonzero vectors in Rn are parallel if they are scalar multiples of one
another.
Example 6.9. The vectors
~x = [2, −5]^T and ~y = [−4, 10]^T
are parallel since ~y = −2~x, or equivalently, ~x = −(1/2)~y. The vectors
~u = [−2, −3, −4]^T and ~v = [−2, −1, −13]^T
are not parallel, for ~u = c~v would imply that −2 = −2c, −3 = −c and −4 = −13c, which implies that c = 1, 3, 4/13 simultaneously, which is impossible.
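For readers who want to experiment, the sketch below (an addition to the notes; it assumes numpy is installed) redoes the small vector computations above: componentwise addition, scalar multiplication, and a quick parallelism check.

```python
import numpy as np

# Vector addition and scalar multiplication are componentwise (Examples 6.5 and 6.7).
print(np.array([1, 2, 3]) + np.array([2, 3, -2]))   # [3 5 1]
print(2 * np.array([1, 6, -4, 8]))                  # [ 2 12 -8 16]

# Parallelism check (Example 6.9): x and y are parallel when y is a scalar multiple of x.
x = np.array([2, -5])
y = np.array([-4, 10])
print(np.allclose(y, -2 * x))   # True, so x and y are parallel

u = np.array([-2, -3, -4])
v = np.array([-2, -1, -13])
print(u / v)                    # [1.  3.  0.3077...]; the ratios differ, so u and v are not parallel
```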
Having equipped the set Rn with vector addition and scalar multiplication, we state here a
theorem that gives the resulting properties which we will use often throughout the course.
V3. (~x + ~y) + ~w = ~x + (~y + ~w)    addition is associative
V4. There exists a vector ~0 ∈ Rn such that ~v + ~0 = ~v for every ~v ∈ Rn    zero vector
V5. For each ~x ∈ Rn there exists a (−~x) ∈ Rn such that ~x + (−~x) = ~0    additive inverse
Note that the zero vector of Rn is ~0 = ~0Rn and the additive inverse of ~x ∈ Rn is −~x = (−1)~x.
Many of these properties may seem obvious and it might not be clear as to why they are
stated as a theorem. One of the reasons is that everything we do in this course will follow from these ten properties, so it is important to list them all here. Also, as we proceed
through the course, we will see that vectors in Rn are not the only mathematical objects
that are subject to these properties, and it is quite useful and powerful to understand what
other classes of objects behave the same as vectors in Rn .
We make the following definition that will be important throughout the course.
Definition 6.11. Let ~x1 , ~x2 , . . . , ~xk ∈ Rn and c1 , c2 , . . . , ck ∈ R for some positive integer k.
We call the vector
c1~x1 + c2~x2 + · · · + ck ~xk
a linear combination of the vectors ~x1 , ~x2 , . . . , ~xk .
Note that properties V1 and V6 of Theorem 6.10 together guarantee that if ~x1 , . . . , ~xk ∈ Rn
and c1 , . . . , ck ∈ R, then c1~x1 + c2~x2 + · · · + ck ~xk ∈ Rn , that is, every linear combination
of ~x1 , . . . , ~xk will again be a vector in Rn . Hence we say that Rn is closed under linear
combinations.
Example 6.12. In R³, let
~e1 = [1, 0, 0]^T,   ~e2 = [0, 1, 0]^T,   and   ~e3 = [0, 0, 1]^T.
Given any ~x = [x1, x2, x3]^T ∈ R³, we see that
~x = x1~e1 + x2~e2 + x3~e3.
That is, every ~x ∈ R³ can be expressed as a linear combination of ~e1, ~e2 and ~e3.
Thus far, we have associated vectors in Rn with points. Recall that given a point P (p1, . . . , pn), we associate with it the vector
~p = [p1, . . . , pn]^T ∈ Rn
and view ~p as a directed line segment from the origin to P. Before we continue, we briefly mention that vectors may also be thought of as directed segments between arbitrary points. For example, given two points A and B in the x1x2-plane, we denote the directed line segment from A to B by AB. In this sense, the vector ~p from the origin O to the point P can be denoted as ~p = OP. This is illustrated in Figure 16.
Notice that Figure 16 is in R², but that we can view directed segments between vectors in Rn in a similar way. We realize that there is something special about directed segments from the origin to a point P. In particular, given a point P, the entries in the vector ~p = OP are simply the coordinates of the point P (refer to Figures 12 and 13). Thus we refer to the vector ~p = OP as the position vector of P and we say that ~p is in standard position. Note that in Figure 16, only the vector ~p is in standard position.
Finding a vector from a point A to a point B in Rn is also not difficult. For two points A(a1, a2) and B(b1, b2) we have that
AB = [b1 − a1, b2 − a2]^T = [b1, b2]^T − [a1, a2]^T = OB − OA
which is illustrated in Figure 17.
Figure 17: Finding the components of AB ∈ R².
Now in Rn, given three points A, B and C, we have that
AC = OC − OA = (OB − OA) + (OC − OB) = AB + BC.
Finally, putting everything together, we see that two points A and B and their corresponding position vectors OA and OB determine a parallelogram and that the sum and difference of these vectors determine the diagonals of this parallelogram. This is displayed in Figure 19, where the image on the right is obtained from the one on the left by setting ~u = OB and ~v = OA. Note that by orienting vectors this way, OB − OA = ~u − ~v is not in standard position.
Figure 19: The parallelogram determined by two vectors. The diagonals of the parallelogram
are represented by the sum and difference of the two vectors.
Lecture 7
Example 7.2.
• If ~x = [1, 2]^T ∈ R², then ||~x|| = √(1² + 2²) = √5
• If ~x = [1, 1, 1, 1]^T ∈ R⁴, then ||~x|| = √(1² + 1² + 1² + 1²) = √4 = 2
Example 7.3. Find the distance from A(1, −1, 2) to B(3, 2, 1).
Solution. Since
AB = OB − OA = [3, 2, 1]^T − [1, −1, 2]^T = [2, 3, −1]^T,
the distance from A to B is
||AB|| = √(2² + 3² + (−1)²) = √(4 + 9 + 1) = √14.
Property (3) is known as the Triangle Inequality. As with complex numbers, the Triangle
Inequality has the same interpretation for vectors. Namely, that in the triangle determined
by vectors ~x, ~y and ~x + ~y , the length of any one side of the triangle cannot exceed the sum of
the lengths of the remaining two sides. This is illustrated in Figure 21 (compare to Figures
3 and 4).
Example 7.6.
• ~x = [1, 0]^T is a unit vector since ||~x|| = √(1² + 0²) = 1
• ~x = −(1/√3)[1, 1, 1]^T is a unit vector since ||~x|| = |−1/√3| √(1² + 1² + 1²) = (1/√3)√3 = 1
Consider a nonzero vector ~x ∈ Rn. Then
~y = (1/||~x||) ~x
is a unit vector in the direction of ~x. To see this, note that since ~x ≠ ~0, ||~x|| > 0 by Theorem 7.4(1). Thus ~y is a positive scalar multiple of ~x so ~y is in the same direction as ~x. Now
||~y|| = ||(1/||~x||) ~x|| = (1/||~x||) ||~x|| = 1
so ~y is a unit vector in the direction of ~x.
Example 7.7. Find a unit vector in the direction of ~x = [4, 5, 6]^T.
Solution. Since ||~x|| = √(4² + 5² + 6²) = √(16 + 25 + 36) = √77, we have that
~y = (1/√77)[4, 5, 6]^T = [4/√77, 5/√77, 6/√77]^T
is the desired vector.
We now define the dot product of two vectors in Rn.
Definition 7.8. Let ~x = [x1, . . . , xn]^T and ~y = [y1, . . . , yn]^T be vectors in Rn. The dot product¹³ of ~x and ~y is the real number
~x · ~y = x1y1 + · · · + xnyn.
Example 7.9.
[1, 1, 2]^T · [−3, −4, 5]^T = 1(−3) + 1(−4) + 2(5) = −3 − 4 + 10 = 3.
Footnote 13: The dot product is sometimes referred to as the scalar product or the standard inner product. The term scalar product comes from the fact that the dot product returns a real number, which we call a scalar.
Theorem 7.10 (Properties of Dot Products). Let ~w, ~x, ~y ∈ Rn and c ∈ R.
(1) ~x · ~y ∈ R
(2) ~x · ~y = ~y · ~x
(3) ~x · ~0 = 0
(4) ~x · ~x = ||~x||²
(5) (c~x) · ~y = c(~x · ~y) = ~x · (c~y)
(6) ~w · (~x ± ~y) = ~w · ~x ± ~w · ~y
Proof. We prove (2), (4) and (5). Let c ∈ R and let ~x = [x1, . . . , xn]^T and ~y = [y1, . . . , yn]^T. For (2),
~x · ~y = x1y1 + · · · + xnyn = y1x1 + · · · + ynxn = ~y · ~x.
For (4),
~x · ~x = x1x1 + · · · + xnxn = x1² + · · · + xn² = ||~x||².
For (5),
(c~x) · ~y = (cx1)y1 + · · · + (cxn)yn = c(x1y1 + · · · + xnyn) = c(~x · ~y).
That ~x · (c~y) = c(~x · ~y) is shown similarly.
Property (4) of Theorem 7.10 shows how the norm and dot product are related. Together,
norms and dot products lead to a nice geometric interpretation about angles between vectors.
Given two vectors ~x, ~y ∈ Rn , they determine an angle θ as shown in Figure 22. We restrict
θ to 0 ≤ θ ≤ π to avoid multiple values for θ and to avoid reflex angles.
Theorem 7.11. For two nonzero vectors ~x, ~y ∈ Rn determining an angle θ,
~x · ~y = ||~x|| ||~y|| cos θ.
The idea of the proof is to expand ||~x − ~y||² both directly and using the Law of Cosines; equating the two expressions, subtracting ||~x||² + ||~y||² from both sides and then multiplying both sides by −1/2 gives ~x · ~y = ||~x|| ||~y|| cos θ as required.
As an easy consequence of Theorem 7.11, we have the Cauchy-Schwarz Inequality, which
states that the size of the dot product of two vectors ~x, ~y ∈ Rn cannot exceed the product
of their norms. Note that the Cauchy-Schwarz Inequality holds for any vectors in Rn .
Corollary¹⁴ 7.12 (Cauchy-Schwarz Inequality). For any two vectors ~x, ~y ∈ Rn, we have
|~x · ~y| ≤ ||~x|| ||~y||.
Footnote 14: A corollary is a result that follows from the preceding theorem.
Proof. The inequality holds (with equality) if either ~x = ~0 or ~y = ~0. Thus, we assume both
~x and ~y are nonzero. Then, by Theorem 7.11, ~x · ~y = ||~x|| ||~y|| cos θ. Taking absolute values of both sides gives
|~x · ~y| = ||~x|| ||~y|| |cos θ| ≤ ||~x|| ||~y||
where we have used the fact that |cos θ| ≤ 1 for any θ ∈ R.
The result of Theorem 7.11 can also be rearranged in order to compute the angle determined by two nonzero vectors ~x and ~y. Indeed, for nonzero ~x and ~y we have that ||~x||, ||~y|| > 0 and so from ~x · ~y = ||~x|| ||~y|| cos θ we obtain
cos θ = (~x · ~y) / (||~x|| ||~y||).        (11)
we see from Equation (11) that the sign of cos θ is determined by the sign of ~x · ~y since ||~x|| ||~y|| > 0. Thus
~x · ~y > 0 ⇐⇒ 0 ≤ θ < π/2 ⇐⇒ ~x and ~y determine an acute angle
~x · ~y = 0 ⇐⇒ θ = π/2 ⇐⇒ ~x and ~y are orthogonal
~x · ~y < 0 ⇐⇒ π/2 < θ ≤ π ⇐⇒ ~x and ~y determine an obtuse angle
Example 7.14. For ~x = [1, 2]^T and ~y = [6, −2]^T, we compute
~x · ~y = 1(6) + 2(−2) = 2 > 0
and so ~x and ~y determine an acute angle.
Note that to find the exact angle determined by ~x and ~y in the previous example we compute
cos θ = (~x · ~y)/(||~x|| ||~y||) = 2/(√(1 + 4) √(36 + 4)) = 2/(√5 √40) = 2/√200 = 2/(10√2) = 1/(5√2)
so
θ = cos⁻¹(1/(5√2))
which is our exact answer for θ, as any computer or calculator will return an approximation of this value.
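The same computation done numerically (an added sketch assuming numpy): the dot product, the norms, and the resulting angle.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([6.0, -2.0])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)

print(np.dot(x, y))   # 2.0 > 0, so the angle is acute
print(cos_theta)      # 0.1414... = 1/(5*sqrt(2))
print(theta)          # 1.4289... radians, about 81.87 degrees
```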
Lecture 8
We have defined the norm for any vector in Rn and the dot product for any two vectors in Rn. However, our work with angles determined by vectors has required that our vectors be nonzero thus far. Now since ~x · ~0 = 0 for every ~x ∈ Rn, we define the zero vector to be orthogonal to every vector in Rn. Thus we may simply say that two vectors ~x, ~y ∈ Rn are orthogonal if and only if ~x · ~y = 0 and not insist that ~x, ~y are nonzero. Although the zero vector of Rn is orthogonal to all vectors in Rn, we don't explicitly compute the angle ~0 makes with another vector ~x ∈ Rn since
cos θ = (~x · ~y)/(||~x|| ||~y||)
is not defined if either of ~x or ~y is the zero vector. Thus, we interpret ~x and ~y being orthogonal to mean that their dot product is zero, and if they are both nonzero, then they determine an angle of π/2.
Complex Vectors
The idea here is to extend our work in Rn to vectors whose entries are complex numbers. For z1, . . . , zn ∈ C, we define
~z = [z1, . . . , zn]^T
to be a complex vector (a vector with complex entries) and
Cn = { [z1, . . . , zn]^T | z1, . . . , zn ∈ C }.
Addition and scalar multiplication for vectors in Cn are defined in the same way as for vectors in Rn. However, the norm and the dot product don't behave quite the same way. Note that for a vector ~x ∈ Rn, we have that ||~x|| ∈ R, ||~x|| ≥ 0 and that ||~x||² = ~x · ~x. We would like to extend these operations to Cn in such a way that these properties still hold. For ~z, ~w ∈ Cn, the complex inner product¹⁵ of ~z and ~w is
⟨~z, ~w⟩ = z̄1w1 + · · · + z̄nwn
• For ~z ∈ Cn, we have ⟨~z, ~z⟩ = z̄1z1 + · · · + z̄nzn = |z1|² + · · · + |zn|², which is a nonnegative real number.
• If ~z, ~w ∈ Rn, then ⟨~z, ~w⟩ = ~z · ~w.
Footnote 15: The definition of the complex inner product given here is the one used by engineers. Mathematicians define the complex inner product as ⟨~z, ~w⟩ = z1w̄1 + · · · + znw̄n, that is, engineers put the complex conjugate on the first entry in each term of the sum, whereas mathematicians put it on the second entry. We will use the definition where the conjugate appears on the first entries, but be careful if you pick up a different Linear Algebra text!
For
~z = [z1, . . . , zn]^T ∈ Cn
we define
~z̄ = [z̄1, . . . , z̄n]^T ∈ Cn
from which we see
⟨~z, ~w⟩ = ~z̄ · ~w.
From this, we can see that the complex inner product of ~z, ~w ∈ Cn can be viewed as a dot product of ~z̄ and ~w rather than of ~z and ~w.
Example. For ~z = [2 − 2j, 1 + j]^T and ~w = [2 + j, 3]^T, compute ⟨~z, ~w⟩ and ⟨~w, ~z⟩.
Solution.
⟨~z, ~w⟩ = \overline{(2 − 2j)}(2 + j) + \overline{(1 + j)}(3) = (2 + 2j)(2 + j) + (1 − j)(3) = (2 + 6j) + (3 − 3j) = 5 + 3j
and
⟨~w, ~z⟩ = \overline{(2 + j)}(2 − 2j) + \overline{(3)}(1 + j) = (2 − j)(2 − 2j) + (3)(1 + j) = (2 − 6j) + (3 + 3j) = 5 − 3j
The last example shows us that for ~z, ~w ∈ Cn, ⟨~z, ~w⟩ ≠ ⟨~w, ~z⟩ in general.
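Numerically (an added sketch assuming numpy), np.vdot conjugates its first argument, which matches the engineers' convention used in these notes.

```python
import numpy as np

z = np.array([2 - 2j, 1 + 1j])
w = np.array([2 + 1j, 3 + 0j])

# <z, w> = conj(z1) w1 + conj(z2) w2  (np.vdot conjugates the first argument)
print(np.vdot(z, w))   # (5+3j)
print(np.vdot(w, z))   # (5-3j), the conjugate of <z, w>

# <z, z> is a nonnegative real number and equals ||z||^2.
print(np.vdot(z, z))            # (10+0j)
print(np.linalg.norm(z) ** 2)   # 10.0 up to floating-point rounding
```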
Theorem 8.3 (Properties of Complex Inner Products). Let ~v, ~w, ~z ∈ Cn and α ∈ C. Then
(2) ⟨~z, ~w⟩ = \overline{⟨~w, ~z⟩}
(3) ⟨~v + ~w, ~z⟩ = ⟨~v, ~z⟩ + ⟨~w, ~z⟩ and ⟨~z, ~v + ~w⟩ = ⟨~z, ~v⟩ + ⟨~z, ~w⟩
(4) ⟨α~z, ~w⟩ = \overline{α}⟨~z, ~w⟩ and ⟨~z, α~w⟩ = α⟨~z, ~w⟩
(5) |⟨~z, ~w⟩| ≤ ||~z|| ||~w||    (Cauchy-Schwarz Inequality)
(6) ||~z + ~w|| ≤ ||~z|| + ||~w||    (Triangle Inequality)
and
⟨~z, α~w⟩ = z̄1(αw1) + · · · + z̄n(αwn) = α z̄1w1 + · · · + α z̄nwn = α(z̄1w1 + · · · + z̄nwn) = α⟨~z, ~w⟩.
• From Theorem 8.3(2), we see that the complex inner product does not commute, that is, for ~z, ~w ∈ Cn, ⟨~z, ~w⟩ ≠ ⟨~w, ~z⟩. Of course, if ~z, ~w happen to have all real entries, then ⟨~z, ~w⟩ = ⟨~w, ~z⟩. Thus, when we say the complex inner product doesn't commute, we mean that there exist ~z, ~w ∈ Cn so that ⟨~z, ~w⟩ ≠ ⟨~w, ~z⟩. Of course, we have that ⟨~z, ~w⟩ = \overline{⟨~w, ~z⟩}, so knowing the value of ⟨~z, ~w⟩ allows us to easily compute ⟨~w, ~z⟩.
• It might seem that the complex inner product is "made up" as it was our attempt to make sure ⟨~z, ~z⟩ is a nonnegative real number. However, given that the complex inner product obeys the Triangle Inequality and the Cauchy-Schwarz Inequality as well as many of the other properties dot products satisfy for vectors in Rn, it should be apparent that the complex inner product was the correct choice.
in Cn , however this time, we had to do a little bit of work to make sure everything still
made sense. Near the end of the course, we will extend these ideas further – vector
addition and scalar multiplication can be applied to objects other than the vectors in
Rn and Cn and studying collections of such objects is one of the things that makes
Linear Algebra such an interesting and beautiful branch of mathematics.
Let ~x = [x1, x2, x3]^T and ~y = [y1, y2, y3]^T be two vectors in R³. The cross product¹⁷ of ~x and ~y is
~x × ~y = [x2y3 − y2x3, −(x1y3 − y1x3), x1y2 − y1x2]^T
For example, if ~x = [1, 6, 3]^T and ~y = [−1, 3, 2]^T, then
~x × ~y = [6(2) − 3(3), −(1(2) − (−1)(3)), 1(3) − (−1)(6)]^T = [3, −5, 9]^T.
The formula for ~x × ~y is quite tedious to remember. Here we give a simpler way. For a, b, c, d ∈ R, define the 2 × 2 determinant
|a b; c d| = ad − bc
(where a, b form the first row and c, d the second). To compute ~x × ~y, write the entries of ~x and ~y side by side as two columns. Then:
• for the first entry of ~x × ~y, remove x1 and y1 and compute |x2 y2; x3 y3| = x2y3 − y2x3;
• for the second entry, remove x2 and y2 and compute −|x1 y1; x3 y3| = −(x1y3 − y1x3) (don't forget the "−" sign);
• for the third entry, remove x3 and y3 and compute |x1 y1; x2 y2| = x1y2 − y1x2.
This gives
~x × ~y = [x2y3 − y2x3, −(x1y3 − y1x3), x1y2 − y1x2]^T.
It's a good idea to try this "trick" using the above example.
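A quick check of the example above (an addition to the notes, assuming numpy): np.cross produces the same vector, and its dot products with ~x and ~y are both zero, as expected of a cross product.

```python
import numpy as np

x = np.array([1, 6, 3])
y = np.array([-1, 3, 2])

n = np.cross(x, y)
print(n)                            # [ 3 -5  9]

# The cross product is orthogonal to both factors.
print(np.dot(n, x), np.dot(n, y))   # 0 0
```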
Lecture 9
Theorem 9.1 (Properties of Cross Products). Let ~x, ~y, ~w ∈ R³ and c ∈ R. Then
(1) ~x × ~y ∈ R³
(2) ~x × ~y is orthogonal to both ~x and ~y, that is, (~x × ~y) · ~x = 0 = (~x × ~y) · ~y
(3) ~x × ~0 = ~0 = ~0 × ~x
(4) ~x × ~x = ~0
(5) ~x × ~y = −(~y × ~x)
(7) ~w × (~x ± ~y) = (~w × ~x) ± (~w × ~y)
(8) (~x ± ~y) × ~w = (~x × ~w) ± (~y × ~w)
Proof. We prove (5). Let ~x = [x1, x2, x3]^T and ~y = [y1, y2, y3]^T. Then
~x × ~y = [x2y3 − y2x3, −(x1y3 − y1x3), x1y2 − y1x2]^T = −[y2x3 − x2y3, −(y1x3 − x1y3), y1x2 − x1y2]^T = −(~y × ~x).
Consider ~x = [1, 1, 0]^T, ~y = [0, 1, 0]^T and ~w = [0, 0, 1]^T.
Then
(~x × ~y) × ~w = ([1, 1, 0]^T × [0, 1, 0]^T) × [0, 0, 1]^T = [0, 0, 1]^T × [0, 0, 1]^T = [0, 0, 0]^T
and
~x × (~y × ~w) = [1, 1, 0]^T × ([0, 1, 0]^T × [0, 0, 1]^T) = [1, 1, 0]^T × [1, 0, 0]^T = [0, 0, −1]^T
so we see that (~x × ~y) × ~w ≠ ~x × (~y × ~w). Thus, the cross product is not associative.
Since the cross product is not associative, the expression ~x × ~y × ~w is undefined. We must always include brackets to indicate in which order we should evaluate the cross products as changing the order will change the result. Also note that the cross product is not commutative as ~x × ~y ≠ ~y × ~x. However, since ~x × ~y = −(~y × ~x), we say that the cross product is anti-commutative, that is, changing the order of ~x and ~y in the cross product changes the result by a factor of −1.
Example 9.4. Find a nonzero vector orthogonal to both ~x = [1, 2, 3]^T and ~y = [1, −1, −1]^T. Moreover, show that this vector is orthogonal to any linear combination of ~x and ~y.
Solution. Using Theorem 9.1(2), we have that
~n = ~x × ~y = [1, 2, 3]^T × [1, −1, −1]^T = [1, 4, −3]^T
is orthogonal to both ~x and ~y. Now for any s, t ∈ R,
~n · (s~x + t~y) = s(~n · ~x) + t(~n · ~y) = s(0) + t(0) = 0
so ~n = ~x × ~y is orthogonal to any linear combination of ~x and ~y.
Example 9.4 demonstrates one of the main uses of the cross product in R3 . Given two non
parallel vectors ~x, ~y ∈ R3 , it is quite useful to find a nonzero vector that is orthogonal to both
~x and ~y (and hence to any linear combination of them). Also, we note here that once the
cross product of ~x, ~y ∈ R3 is computed, we can check that our work is correct by verifying
that (~x × ~y ) · ~x = 0 = (~x × ~y ) · ~y .
We now look at how the cross product can be used to compute the area of a parallelogram.
We will need the following result which is stated without proof.
Theorem 9.5 (Lagrange Identity). Let ~x, ~y ∈ R³. Then ||~x × ~y||² = ||~x||² ||~y||² − (~x · ~y)².
Denoting the base by b and the height by h, we see that b = ||~x|| and that h satisfies sin θ = h/||~y||, which gives h = ||~y|| sin θ. Denoting the area of the parallelogram by A, we have
A = bh = ||~x|| ||~y|| sin θ,
and so, using cos θ = (~x · ~y)/(||~x|| ||~y||) together with the Lagrange Identity,
A² = ||~x||² ||~y||² sin²θ = ||~x||² ||~y||² (1 − cos²θ) = ||~x||² ||~y||² − (~x · ~y)² = ||~x × ~y||²,
giving A = ||~x × ~y||.
of the parallelogram that ~x and ~y determine. Our derivation has been for nonzero vectors ~x
and ~y , and we implicitly assumed that ~x and ~y were not parallel in the above diagram. Note
that if ~x and ~y are parallel, then the parallelogram they determine is simply a line segment
(a degenerate parallelogram) and thus the area is zero. Moreover, if any of ~x and ~y are zero,
then the area of the resulting parallelogram is again zero. Note that in these two cases we
have ~x × ~y = ~0, so our formula A = k~x × ~y k holds for any ~x, ~y ∈ R3 .
Example 9.6. Let ~x = [1, 1, 1]^T and ~y = [1, 2, −3]^T. Find
(a) the area of the parallelogram determined by ~x and ~y.
(b) the area of the triangle determined by ~x and ~y.
Solution.
(a) Since
~x × ~y = [1, 1, 1]^T × [1, 2, −3]^T = [−5, 4, 1]^T,
the area of the parallelogram is
A = ||~x × ~y|| = √(25 + 16 + 1) = √42.
(b) The area of the triangle determined by ~x and ~y is half of the area of the parallelogram determined by ~x and ~y, that is, the area of the triangle is (1/2)√42 (see Figure 24).
where m, a, b, c are constants. How do we describe lines in Rn (for example, in R3 )? It might
be tempting to think the above equations are equations of lines in Rn as well, but this is
not the case. Consider the graph of the line x2 = x1 in R2 . This graph consists of all points
(x1 , x2 ) such that x2 = x1 , which yields a line (see Figure 25). If we consider the equation
x2 = x1 in R3 , then we are considering all points (x1 , x2 , x3 ) with the property that x2 = x1 .
Notice that there is no restriction on x3 , so we can take x3 to be any real number. It follows
that the equation x2 = x1 represents a plane in R3 and not a line (see Figure 26).
Figure 26: The graph of x2 = x1 is a plane in R3 . The red line indicates the intersection of
the plane with the x1 x2 −plane.
Note that we require two things to describe a line:
1) A point P on the line,
2) A vector d~ in the direction of the line (called a direction vector for the line).
Definition 9.7. A line in Rn through a point P with direction ~d, where ~d ∈ Rn is nonzero, is given by the vector equation
~x = [x1, . . . , xn]^T = OP + t~d,    t ∈ R.
Figure 27 shows how the line through P with direction ~d is "drawn out" by the vector OP + t~d as t varies from −∞ to ∞.
Figure 27: The line through P with direction ~d and the vector OP + t~d for a few values of t.
We can also think of the equation ~x = OP + t~d as first moving us from the origin to the point P, and then moving from P as far as we like in the direction given by ~d. This is shown in Figure 28.
Example 9.8. Find the vector equation of the line through the points A(1, 1, −1) and
B(4, 0, −3).
Solution. We first find a direction vector for the line. Since the line passes through the points A and B, we take the direction vector to be the vector from A to B. That is,
~d = AB = OB − OA = [4, 0, −3]^T − [1, 1, −1]^T = [3, −1, −2]^T.
Figure 28: An equivalent way to understand the vector equation ~x = OP + t~d.
Hence, using the point A, we have a vector equation for our line:
~x = OA + tAB = [1, 1, −1]^T + t[3, −1, −2]^T,    t ∈ R.
Note that the vector equation for a line is not unique. In fact, in Example 9.8, we could have used the vector BA as our direction vector, and we could have used B as the point on our line to obtain
~x = OB + tBA = [4, 0, −3]^T + t[−3, 1, 2]^T,    t ∈ R.
Indeed, we can use any known point on the line and any nonzero scalar multiple of the
direction vector for the line when constructing the vector equation. Thus, there are infinitely
many vector equations for a line (see Figure 29).
Finally, given one of the vector equations for the line in Example 9.8, we have
~x = [x1, x2, x3]^T = [1, 1, −1]^T + t[3, −1, −2]^T = [1 + 3t, 1 − t, −1 − 2t]^T
so equating entries gives
Figure 29: Two different vector equations for the same line.
x1 = 1 + 3t
x2 = 1 − t, t∈R
x3 = −1 − 2t
which we call the parametric equations of the line. For each choice of t ∈ R, these equations
give the x1 −, x2 − and x3 −coordinates of a point on the line. Note that since the vector
equation for a line is not unique, neither are the parametric equations for a line.
Lecture 10
Example 10.2. Find a vector equation for the plane containing the points A(1, 1, 1),
B(1, 2, 3) and C(−1, 1, 2).
Solution. We compute
AB = OB − OA = [1, 2, 3]^T − [1, 1, 1]^T = [0, 1, 2]^T
AC = OC − OA = [−1, 1, 2]^T − [1, 1, 1]^T = [−2, 0, 1]^T
and note that AB and AC are nonzero and nonparallel. A vector equation is thus
~x = [x1, x2, x3]^T = OA + sAB + tAC = [1, 1, 1]^T + s[0, 1, 2]^T + t[−2, 0, 1]^T,    s, t ∈ R.
The plane from the previous example is shown in Figure 31. We see that by setting either of s, t ∈ R to be zero and letting the other parameter be arbitrary, we obtain vector equations for two lines, each of which lies in the given plane:
~x = OA + sAB = [1, 1, 1]^T + s[0, 1, 2]^T, s ∈ R    and    ~x = OA + tAC = [1, 1, 1]^T + t[−2, 0, 1]^T, t ∈ R.
Figure 32: A plane and two nonparallel lines in it.
We also note that evaluating the right hand side of the above vector equation gives
~x = [x1, x2, x3]^T = [1, 1, 1]^T + s[0, 1, 2]^T + t[−2, 0, 1]^T = [1 − 2t, 1 + s, 1 + 2s + t]^T
so equating entries gives
x1 = 1 − 2t
x2 = 1 + s        s, t ∈ R
x3 = 1 + 2s + t
Finally, we note that as with lines, our vector equation for the plane in Example 10.2 is not unique, as we could have chosen
~x = OB + sBC + tAB,    s, t ∈ R
as the vector equation instead (it is easy to verify that BC and AB are nonzero and nonparallel).
Example 10.3. Find a vector equation of the plane containing the point P (1, −1, −2) and the line with vector equation
~x = [1, 3, −1]^T + r[1, 1, 4]^T,    r ∈ R.
Solution. We construct two vectors lying in the plane. For one, we can take the direction vector of the given line, and for the other, we can take a vector from a known point on the given line to the point P. Thus we let
~u = [1, 1, 4]^T    and    ~v = [1, −1, −2]^T − [1, 3, −1]^T = [0, −4, −1]^T.
Then, since ~u and ~v are nonzero and nonparallel, a vector equation for the plane is
~x = OP + s~u + t~v = [1, −1, −2]^T + s[1, 1, 4]^T + t[0, −4, −1]^T,    s, t ∈ R.
We note that for the vector equation for a plane, we do require ~u and ~v to be nonparallel. If ~u and ~v are parallel, say ~u = c~v for some c ∈ R, then the vector equation we derive is
~x = OP + s~u + t~v = OP + s(c~v) + t~v = OP + (sc + t)~v,
which is the equation of a line rather than a plane.
Definition 10.4. A nonzero vector ~n ∈ R³ is a normal vector for a plane if for any two points P and Q on the plane, ~n is orthogonal to PQ.
We note that given a plane in R³, a normal vector for that plane is not unique, as any nonzero scalar multiple of that vector will also be a normal vector for that plane.
Suppose ~n = [n1, n2, n3]^T is a normal vector for a plane and suppose P (a, b, c) is a given point on this plane. For any point Q(x1, x2, x3), Q lies on the plane if and only if
0 = ~n · PQ = ~n · (OQ − OP) = [n1, n2, n3]^T · [x1 − a, x2 − b, x3 − c]^T = n1(x1 − a) + n2(x2 − b) + n3(x3 − c),
that is, if and only if
n1x1 + n2x2 + n3x3 = n1a + n2b + n3c.
Example 10.6. Find a scalar equation of the plane containing the points A(3, 1, 2), B(1, 2, 3) and C(−2, 1, 3).
Solution. We have three points lying on the plane, so we only need to find a normal vector for the plane. We compute
AB = OB − OA = [1, 2, 3]^T − [3, 1, 2]^T = [−2, 1, 1]^T
AC = OC − OA = [−2, 1, 3]^T − [3, 1, 2]^T = [−5, 0, 1]^T
Figure 34: The normal vector ~n is orthogonal to both AB and AC.
and notice that AB and AC are nonzero nonparallel vectors in R³. We compute
~n = AB × AC = [−2, 1, 1]^T × [−5, 0, 1]^T = [1, −3, 5]^T
and recall that the nonzero vector ~n is orthogonal to both AB and AC. It follows from Example 9.4 that ~n is orthogonal to the entire plane and is thus a normal vector for the plane. Hence, using the point A(3, 1, 2), our scalar equation is
1(x1 − 3) − 3(x2 − 1) + 5(x3 − 2) = 0
which evaluates to
x1 − 3x2 + 5x3 = 10.
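Numerically (an added sketch assuming numpy), the normal vector and the scalar equation of Example 10.6 can be checked as follows; all three given points satisfy the resulting equation.

```python
import numpy as np

A = np.array([3, 1, 2])
B = np.array([1, 2, 3])
C = np.array([-2, 1, 3])

n = np.cross(B - A, C - A)
print(n)                  # [ 1 -3  5]

d = np.dot(n, A)          # right-hand side of n . x = d
print(d)                  # 10

# Every point on the plane satisfies n . x = 10.
for P in (A, B, C):
    print(np.dot(n, P))   # 10, 10, 10
```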
Figure 34 helps us visualize the plane from the previous example.
We make a few remarks about the preceding example here.
• Using the point B or C rather than A to compute the scalar equation would lead to
the same scalar equation as is easily verified.
• As the normal vector for the above plane is not unique, neither is the scalar equation.
In fact, 2~n is also a normal vector for the plane, and using it instead of ~n would lead to
the scalar equation 2x1 − 6x2 + 10x3 = 20, which is just the scalar equation we found
multiplied by a factor of 2.
• From our work above, we see that we can actually compute a vector equation for the plane:
~x = OA + sAB + tAC = [3, 1, 2]^T + s[−2, 1, 1]^T + t[−5, 0, 1]^T,    s, t ∈ R
−→
for example. In fact, given a vector equation ~x = OP + s~u + t~v for a plane in R3
containing a point P , we can compute a normal vector ~n = ~u × ~v .
• Note that in the scalar equation x1 − 3x2 + 5x3 = 10, the coefficients on the variables
x1 , x2 and x3 are exactly the entries in the normal vector as predicted by the formula
for the scalar equation. Thus, if we are given a scalar equation of a different plane, say
3x1 − 2x2 + 5x3 = 72, we can deduce immediately that ~n = (3, −2, 5) is a normal vector for that plane.

To compare the two descriptions of a plane, consider the plane with vector equation
~x = (1, 1, 1) + s (1, 2, 2) + t (1, 1, 3),    s, t ∈ R,
which (taking the cross product of the two direction vectors) has scalar equation 4x1 − x2 − x3 = 2.
Suppose you are asked if the point (2, 6, 0) lies on this plane. Using the scalar equation
4x1 − x2 − x3 = 2, we see that 4(2) − 1(6) − 1(0) = 2 satisfies this equation so we can easily
conclude that (2, 6, 0) lies on the plane. However, if we use the vector equation, we must
determine if there exist s, t ∈ R such that
(1, 1, 1) + s (1, 2, 2) + t (1, 1, 3) = (2, 6, 0),
that is, such that
s + t = 1
2s + t = 5
2s + 3t = −1
With a little work, we can find that the solution19 to this system is s = 4 and t = −3
which again guarantees that (2, 6, 0) lies on the plane. It should be clear that using a scalar
equation is preferable here. On the other hand, if you are asked to find a point that lies
on the plane, then using the vector equation, we may select any two values for s and t (say
s = 0 and t = 0) to conclude that the point (1, 1, 1) lies on the plane. It is not too difficult
to find a point lying on the plane using the scalar equation either - this will likely be done
19
We will look at a more efficient technique to solve systems of equations in a few lectures.
68
by choosing two of x1 , x2 , x3 and then solving for the last, but this does involve a little bit
more math. Thus, the scalar equation is preferable when verifying if a given point lies on a
plane, and the vector equation is preferable when asked to generate points that lie on the
plane.
We have discussed parallel vectors previously, and we can use this definition to define
parallel lines and planes.
Definition 10.7. Two lines in Rn are parallel if their direction vectors are parallel. Two
planes in R3 are parallel if their normal vectors are parallel.
Lecture 11
Projections
Given two vectors ~u, ~v ∈ Rn with ~v 6= ~0, we can write ~u = ~u1 + ~u2 where ~u1 is a scalar
multiple of ~v and ~u2 is orthogonal to ~v . In physics, this is often done when one wishes to
resolve a force into its vertical and horizontal components.
Figure 35: Decomposing ~u ∈ Rn as ~u = ~u1 + ~u2 where ~u1 is parallel to ~v and ~u2 is orthogonal
to ~v .
This is not a new idea. In R2 , we have seen that we can write a vector ~u as a linear
combination of ~e1 = (1, 0) and ~e2 = (0, 1) in a natural way. Figure 36 shows that we are actually
writing a vector ~u ∈ R2 as the sum of a vector parallel to ~e1 and a vector orthogonal to ~e1 .
70
so if we can find t, then we can find ~u1 and then find ~u2 . To find t, we use the fact that
~u2 = ~u − t~v must be orthogonal to ~v . Hence
0 = (~u − t~v ) · ~v = ~u · ~v − t(~v · ~v ) = ~u · ~v − t ||~v ||²
and since ~v ≠ ~0,
t = (~u · ~v ) / ||~v ||² .
Note that from our above work, ~u1 = proj ~v ~u and ~u2 = perp ~v ~u.
Figure 37: Visualizing projections and perpendiculars based on the angle determined by
~u, ~v ∈ Rn .
For example, let ~u = (1, 2, 3) and ~v = (−1, 1, 2).
Then
proj ~v ~u = ( (~u · ~v ) / ||~v ||² ) ~v = ( (−1 + 2 + 6) / (1 + 1 + 4) ) (−1, 1, 2) = (7/6) (−1, 1, 2) = (−7/6, 7/6, 7/3)
and
perp ~v ~u = ~u − proj ~v ~u = (1, 2, 3) − (−7/6, 7/6, 7/3) = (13/6, 5/6, 2/3).
• (perp ~v ~u) · ~v = −13/6 + 5/6 + 4/3 = −8/6 + 8/6 = 0, so perp ~v ~u is orthogonal to ~v .
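The same computation can be carried out numerically. The following is a small sketch in Python
with NumPy (the helper names proj and perp are our own):

import numpy as np

def proj(u, v):
    # projection of u onto the nonzero vector v
    return (u @ v) / (v @ v) * v

def perp(u, v):
    # component of u orthogonal to v
    return u - proj(u, v)

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 1.0, 2.0])

print(proj(u, v))        # [-7/6, 7/6, 7/3]
print(perp(u, v))        # [13/6, 5/6, 2/3]
print(perp(u, v) @ v)    # 0 (up to rounding), so perp(u, v) is orthogonal to v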
Example 11.3. For ~u, ~v ∈ Rn with ~v 6= ~0, prove that proj ~v ~u and perp ~v ~u are orthogonal.
Proof. Writing t = (~u · ~v ) / ||~v ||² so that proj ~v ~u = t~v and perp ~v ~u = ~u − t~v , we have
(proj ~v ~u) · (perp ~v ~u) = (t~v ) · (~u − t~v ) = t (~u · ~v ) − t² ||~v ||² = t (~u · ~v ) − t (~u · ~v ) = 0,
so proj ~v ~u and perp ~v ~u are orthogonal.
Example 11.4. Find the shortest distance from the point P (1, 2, 3) to the line L which
passes through the point P0 (2, −1, 2) with direction vector d~ = (1, 1, −1).
Before we state the solution, we illustrate the situation in Figure 38. Note that the line L and
the point P were plotted arbitrarily, so it is not meant to be accurate. It does however, give
us a way to think about the problem geometrically and inform us as to what computations
we should do.
Solution. We construct the vector from the point P0 lying on the line to the point P which
gives
P0 P = OP − OP0 = (1, 2, 3) − (2, −1, 2) = (−1, 3, 1).
Projecting the vector P0 P onto the direction vector of the line leads to
proj d~ P0 P = ( (P0 P · d~ ) / ||d~ ||² ) d~ = ( (−1 + 3 − 1) / (1 + 1 + 1) ) (1, 1, −1) = (1/3) (1, 1, −1) = (1/3, 1/3, −1/3)
and it follows that
perp d~ P0 P = P0 P − proj d~ P0 P = (−1, 3, 1) − (1/3, 1/3, −1/3) = (−4/3, 8/3, 4/3).
The shortest distance from P to L is then
||perp d~ P0 P || = √(16/9 + 64/9 + 16/9) = √(96/9) = (4√6)/3.
To find the point Q on L closest to P we can compute
OQ = OP0 + proj d~ P0 P = (2, −1, 2) + (1/3, 1/3, −1/3) = (7/3, −2/3, 5/3)
or
OQ = OP − perp d~ P0 P = (1, 2, 3) − (−4/3, 8/3, 4/3) = (7/3, −2/3, 5/3).
In either case, Q(7/3, −2/3, 5/3) is the point on L closest to P .
Now we see that Figure 38 was indeed inaccurate: it suggests that proj d~ P0 P is approximately
(5/2) d~, but our computations show that proj d~ P0 P = (1/3) d~.
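A numerical sketch of the same calculation, again in Python with NumPy (helper names are our own):

import numpy as np

P  = np.array([1.0, 2.0, 3.0])    # the given point
P0 = np.array([2.0, -1.0, 2.0])   # a point on the line L
d  = np.array([1.0, 1.0, -1.0])   # direction vector of L

w     = P - P0                    # the vector P0P
projw = (w @ d) / (d @ d) * d     # projection of P0P onto d
perpw = w - projw                 # component of P0P orthogonal to d

Q = P0 + projw                    # closest point on L to P
dist = np.linalg.norm(perpw)      # shortest distance from P to L

print(Q)      # [ 7/3, -2/3, 5/3 ]
print(dist)   # 4*sqrt(6)/3, approximately 3.266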
Example 11.5. Find the shortest distance from the point P (1, 2, 3) to the plane T with
equation x1 + x2 − 3x3 = −2. Also, find the point Q on T that is closest to P .
Solution. A normal vector for T is ~n = (1, 1, −3). We also need a point on T : taking
x2 = x3 = 0 in the scalar equation gives x1 = −2, so P0 (−2, 0, 0) lies on T . Then
P0 P = OP − OP0 = (1, 2, 3) − (−2, 0, 0) = (3, 2, 3)
Figure 39: Finding the distance from a point to a plane. Note that ||QP || = ||proj ~n P0 P ||.
and
proj ~n P0 P = ( (P0 P · ~n) / ||~n||² ) ~n = ( (3 + 2 − 9) / (1 + 1 + 9) ) (1, 1, −3) = −(4/11) (1, 1, −3).
The shortest distance from P to T is then ||proj ~n P0 P || = (4/11) √11 = 4/√11 .
To find Q we have
OQ = OP − proj ~n P0 P = (1, 2, 3) + (4/11) (1, 1, −3) = (15/11, 26/11, 21/11)
so Q(15/11, 26/11, 21/11) is the point on T closest to P .
Lecture 12
Volumes of Parallelepipeds in R3
Consider three nonzero vectors w, ~ ~x, ~y ∈ R3 such that no one vector is a linear combination
of the other two (that is, w,
~ ~x, ~y are nonzero and nonparallel and no one of them lies on the
plane determined by the other two20 ). These three vectors determine a parallelepiped, which
is the three dimensional analogue of a parallelogram.
The volume of the parallelepiped is the product of its height with the area of its base. We
know that the area of the base is given by k~x ×~y k (which is nonzero since ~x and ~y are nonzero
and nonparallel), and we can find the height by computing the length of the projection of ~w
onto ~x × ~y . Thus, the volume V of the parallelepiped is given by
V = ||proj ~x×~y ~w || ||~x × ~y || = ( |~w · (~x × ~y )| / ||~x × ~y || ) ||~x × ~y || = |~w · (~x × ~y )|.
Example 12.1. Let
~w = (1, 1, 1),  ~x = (1, 1, 2)  and  ~y = (1, 2, −3).
Then
~w · (~x × ~y ) = (1, 1, 1) · ( (1, 1, 2) × (1, 2, −3) ) = (1, 1, 1) · (−7, 5, 1) = −7 + 5 + 1 = −1
so
V = |~w · (~x × ~y )| = | − 1| = 1.
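Here is a brief numerical sketch of the volume computation in Python with NumPy (an
illustration with our own variable names):

import numpy as np

w = np.array([1.0, 1.0, 1.0])
x = np.array([1.0, 1.0, 2.0])
y = np.array([1.0, 2.0, -3.0])

triple = w @ np.cross(x, y)   # scalar triple product w . (x x y)
V = abs(triple)               # volume of the parallelepiped

print(np.cross(x, y))         # [-7, 5, 1]
print(triple, V)              # -1.0 1.0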
• In our derivation of the formula for the volume of the parallelepiped determined by
the vectors ~w, ~x and ~y , there was nothing special about labelling the vectors the way
that we did. We only needed to call one of the vectors ~w, one of them ~x and one of
them ~y . Also, we could have chosen any of the six faces of the parallelepiped to be the
base. Thus, we also have
V = |~w · (~y × ~x)| = |~x · (~w × ~y )| = |~x · (~y × ~w)| = |~y · (~x × ~w)| = |~y · (~w × ~x)| .
• Our derivation also required that no one of the vectors ~w, ~x and ~y was a linear combi-
nation of the others (so no one of the three vectors lay in the plane through the origin
determined by the other two). Suppose one of the vectors is a linear combination of
the others, say ~w is a linear combination of ~x and ~y . Then ~w = s~x + t~y for some
s, t ∈ R (from which we see ~w lies in the plane through the origin determined by ~x
and ~y ). Geometrically, the resulting parallelepiped determined by ~w, ~x and ~y is “flat”,
and thus the volume should be zero. Since ~w lies in the plane determined by ~x and ~y ,
we have that ~w is orthogonal to ~x × ~y and so ~w · (~x × ~y ) = 0, so our derived formula
does indeed return the correct volume. A similar result occurs if ~x or ~y is a linear
combination of the other two vectors. Thus, our formula V = |~w · (~x × ~y )| holds for
any three vectors ~w, ~x, ~y ∈ R3 .
Definition 12.2. A set is a collection of objects. We call the objects elements of the set.21
Example 12.3.
• S = {1, 2, 3} is a set with three elements, namely 1, 2 and 3,
• T = {♥, f (x), {1, 2}, 3},
• ∅ = { }, the set with no elements, which is called the empty set.
We see that one way to describe a set is to list the elements of the set between curly braces
“{” and “}”. The set T shows that a set can have elements other than numbers - the elements
can be functions, other sets, or other symbols. The empty set has no elements in it, and we
normally prefer using ∅ over { } in this case.
Given a set S, we write x ∈ S if x is an element of S, and x ∉ S if x is not an element of S.
Example 12.4. For T = {♥, f (x), {1, 2}, 3}, we have
♥ ∈ T, f (x) ∈ T, {1, 2} ∈ T and 3 ∈ T
but
1∈
/T and 2 ∈
/T
Example 12.5. Here are a few more sets that we know:
• N = {1, 2, 3, . . .},
• Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .},
• Q = { a/b | a, b ∈ Z, b ≠ 0 },
• R is the set of all numbers that are either rational or irrational,
• C = {a + bj | a, b ∈ R},
• Rn = { (x1 , . . . , xn ) | x1 , . . . , xn ∈ R }.
Note that each of these sets contains infinitely many elements. The sets N and Z are defined
by listing their elements (or rather, listing enough elements so that you “get the idea”), the
set R is defined using words, and the sets Q, C and Rn are defined using set builder notation
where an arbitrary element is described. For example, the set
Q = { a/b | a, b ∈ Z, b ≠ 0 }
is understood to mean “Q is the set of all fractions of the form a/b where a and b are integers
and b is nonzero”.
21
This definition is far from the formal definition, and can lead to contradictions if we are not careful. For
our purposes here, however, this definition will be sufficient.
78
Example 12.6. Let S = { (x1 , x2 , x3 ) ∈ R3 | 2x1 − x2 + x3 = 4 }. Is (1, 2, 3) ∈ S?
Solution. Since 2(1) − 2 + 3 = 3 ≠ 4, we have that (1, 2, 3) ∉ S.
We now define two ways that we can combine given sets to create new sets.
Definition 12.7. Let S, T be sets. The union of S and T is the set
S ∪ T = {x | x ∈ S or x ∈ T }
and the intersection of S and T is the set
S ∩ T = {x | x ∈ S and x ∈ T }.
We can visualize the union and intersection of two sets using Venn Diagrams. Although
Venn Diagrams can help us visualize sets, they should never be used as part of a proof of
any statement regarding sets.
(a) A Venn Diagram depicting the union (b) A Venn Diagram depicting the inter-
of two sets S and T . section of two sets S and T .
S ∪ T = {−1, 1, 2, 3, 4, 6, 7}
S ∩ T = {2, 4}
Given two sets S and T , we say that S is a subset of T , and write S ⊆ T , if every element of S
is also an element of T ; otherwise we write S ⊈ T .
Example 12.10. Let S = {1, 2, 4} and T = {1, 2, 3, 4}. Then S ⊆ T since every element of
S is an element of T , but T ⊈ S since 3 ∈ T but 3 ∉ S.
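These set operations have direct counterparts in most programming languages; the following
short Python sketch (our own illustration) mirrors the union, intersection and subset ideas above:

S = {1, 2, 4}
T = {1, 2, 3, 4}

print(S | T)          # union: {1, 2, 3, 4}
print(S & T)          # intersection: {1, 2, 4}
print(S <= T)         # S is a subset of T: True
print(T <= S)         # T is not a subset of S: False
print(set() <= S)     # the empty set is a subset of every set: True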
Figure 42: A Venn diagram showing an instance when S ⊆ T on the left, and an instance
when S 6⊆ T on the right.
Note that it’s important to distinguish between an element of a set and a subset of a set.
For example,
1 ∈ {1, 2, 3} but 1 6⊆ {1, 2, 3}
and
{1} ∈
/ {1, 2, 3} but {1} ⊆ {1, 2, 3}.
More interestingly,
which shows that an element of a set may also be a subset of a set. This last example can
cause students to stumble, so the following may help:
Finally we mention that for any set S, we have that ∅ ⊆ S. This generally seems quite
strange at first. However if ∅ 6⊆ S, then there must be some element x ∈ ∅ such that x ∈
/ S.
But the empty set contains no elements, so we can never show that ∅ is not a subset of S.
Thus we are forced to conclude that ∅ ⊆ S.22
22
The statement ∅ ⊆ S is called vacuously true, that is, it is a true statement simply because we cannot
show that it is false.
80
Example 12.12. Let
S = { c1 (1, 2) + c2 (1, 1) + c3 (2, 3) | c1 , c2 , c3 ∈ R }
and
T = { d1 (1, 2) + d2 (1, 1) | d1 , d2 ∈ R }.
Show that S = T .
Before we give the solution, we note that S is the set of all linear combinations of the vectors
(1, 2), (1, 1) and (2, 3)
Lecture 13
Spanning Sets
Definition 13.1. Let B = {~v1 , . . . , ~vk } be a set of vectors in Rn . The span of B is
Span B = {c1~v1 + · · · + ck~vk | c1 , . . . , ck ∈ R}.
We say that the set Span B is spanned by B and that B is a spanning set for Span B.
It is important to note that there are two sets here: B and Span B. Given that B =
{~v1 , . . . , ~vk } is a set of vectors in Rn , Span B is simply the set of all linear combinations
of the vectors ~v1 , . . . , ~vk . To show that a vector ~x ∈ Rn belongs to Span B, we must show
that we can express ~x as a linear combination of ~v1 , . . . , ~vk . As an example, note that for
i = 1, . . . , k,
~vi = 0~v1 + · · · + 0~vi−1 + 1~vi + 0~vi+1 + · · · + 0~vk
from which we see that ~vi ∈ Span B for i = 1, . . . , k. This shows that B ⊆ Span B.
Example 13.2. Determine whether or not
(2, 3) ∈ Span { (4, 5), (3, 3) }.
Solution. For c1 , c2 ∈ R, consider (2, 3) = c1 (4, 5) + c2 (3, 3). Comparing entries gives the
system of equations
2 = 4c1 + 3c2
3 = 5c1 + 3c2 .
Subtracting the first equation from the second gives c1 = 1, and substituting c1 = 1 into either
equation and solving for c2 gives c2 = −2/3. Thus
(2, 3) = 1 (4, 5) − (2/3) (3, 3)
and so (2, 3) can be expressed as a linear combination of (4, 5) and (3, 3), which allows us to
conclude that
(2, 3) ∈ Span { (4, 5), (3, 3) }.
Example 13.3. Determine whether or not
(1, 2, 3) ∈ Span { (1, 0, 1), (1, 1, 0) }.
Solution. For c1 , c2 ∈ R, consider (1, 2, 3) = c1 (1, 0, 1) + c2 (1, 1, 0). Comparing entries gives
the system of equations
1 = c1 + c2
2 = c2
3 = c1
It is clear that the last two equations give c1 = 3 and c2 = 2, but from the first equation we
have c1 + c2 = 3 + 2 = 5 ≠ 1, so our system cannot have a solution. Here we see that (1, 2, 3)
cannot be expressed as a linear combination of (1, 0, 1) and (1, 1, 0) and we conclude that
(1, 2, 3) ∉ Span { (1, 0, 1), (1, 1, 0) }.
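Checking whether a vector lies in the span of a set of vectors amounts to asking whether a
linear system is consistent, which we can also do numerically. A small Python/NumPy sketch
(our own illustration; a more systematic method appears in Lecture 20):

import numpy as np

def in_span(b, vectors, tol=1e-10):
    # True if b is (numerically) a linear combination of the given vectors
    A = np.column_stack(vectors)                 # the vectors become the columns of A
    c, *_ = np.linalg.lstsq(A, b, rcond=None)    # best-fit coefficients
    return np.allclose(A @ c, b, atol=tol)

# Example 13.2: (2, 3) is in Span{(4, 5), (3, 3)}
print(in_span(np.array([2.0, 3.0]),
              [np.array([4.0, 5.0]), np.array([3.0, 3.0])]))          # True

# Example 13.3: (1, 2, 3) is not in Span{(1, 0, 1), (1, 1, 0)}
print(in_span(np.array([1.0, 2.0, 3.0]),
              [np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0, 0.0])])) # False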
Given a set of vectors ~v1 , . . . , ~vk , we now try to understand what Span {~v1 , . . . , ~vk } looks like
geometrically.
Example 13.4. Describe the subset
S = Span { (1, 2, 3) }
of R3 geometrically.
Solution. By definition,
S = { s (1, 2, 3) | s ∈ R }.
Thus, ~x ∈ S if and only if
~x = s (1, 2, 3)
for some s ∈ R. The equation
~x = s (1, 2, 3),    s ∈ R
is called a vector equation for S. But we see that this is simply a vector equation for a line in
R3 through the origin. Hence, S is a line through the origin with direction vector (1, 2, 3).
Example 13.5. Describe the subset
S = Span { (1, 0, 1), (1, 1, 0) }
of R3 geometrically.
Solution. By definition,
S = { s (1, 0, 1) + t (1, 1, 0) | s, t ∈ R }.
Since the vectors (1, 0, 1) and (1, 1, 0)
are not scalar multiples of one another, we see that S is a plane in R3 through the origin23 .
23
The set S is from Example 13.3. In light of what we have observed here, Example 13.3 shows us that
the point P (1, 2, 3) does not lie on the plane S.
84
Example 13.6. Let
S = Span { (1, 0, 0), (1, 1, 0), (1, 2, 1) }.
Show that S = R3 .
Solution. We show S = R3 by showing that S ⊆ R3 and that R3 ⊆ S. To see that S ⊆ R3 ,
note that
(1, 0, 0), (1, 1, 0), (1, 2, 1) ∈ R3
and that S contains all linear combinations of these three vectors. Since R3 is closed under
linear combinations (see V 1 and V 6 from Theorem 6.10), every vector in S must be a vector
in R3 , so S ⊆ R3 . Now, let
~x = (x1 , x2 , x3 ) ∈ R3
and for c1 , c2 , c3 ∈ R consider
(x1 , x2 , x3 ) = c1 (1, 0, 0) + c2 (1, 1, 0) + c3 (1, 2, 1) = (c1 + c2 + c3 , c2 + 2c3 , c3 ).
We have the system of equations
x 1 = c1 + c2 + c3
x2 = c2 + 2c3
x3 = c3
The last equation gives c3 = x3 , and from the second equation we have that
c2 = x2 − 2c3 = x2 − 2x3 .
Finally, from the first equation, we see
c1 = x1 − c2 − c3 = x1 − (x2 − 2x3 ) − x3 = x1 − x2 + x3 .
Thus
(x1 , x2 , x3 ) = (x1 − x2 + x3 ) (1, 0, 0) + (x2 − 2x3 ) (1, 1, 0) + x3 (1, 2, 1)
and it follows that ~x ∈ S so R3 ⊆ S. Hence, S = R3 .
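A quick way to confirm numerically that three vectors span R3 is to check that the matrix with
those vectors as columns has rank 3. A short NumPy sketch (our own illustration):

import numpy as np

A = np.column_stack([[1, 0, 0], [1, 1, 0], [1, 2, 1]])   # the spanning vectors as columns
print(np.linalg.matrix_rank(A))                          # 3, so the columns span R^3

# the coefficients found above, checked for a particular x
x = np.array([2.0, -1.0, 4.0])
c = np.linalg.solve(A, x)
print(c)        # matches (x1 - x2 + x3, x2 - 2*x3, x3) = (7, -9, 4)
print(A @ c)    # recovers x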
It would seem (at least in R3 ) that the span of one vector gives a line through the origin,
the span of two vectors gives a plane through the origin, and the span of three vectors gives
all of R3 . Unfortunately this is not always true as the next example shows.
Example 13.7. Describe the subset
S = Span { (1, 0, 0), (0, 1, 0), (1, 1, 0) }
of R3 geometrically.
Solution. By definition,
S = { c1 (1, 0, 0) + c2 (0, 1, 0) + c3 (1, 1, 0) | c1 , c2 , c3 ∈ R }.
Noting that (1, 1, 0) = (1, 0, 0) + (0, 1, 0), any ~x ∈ S can be written as
~x = c1 (1, 0, 0) + c2 (0, 1, 0) + c3 ( (1, 0, 0) + (0, 1, 0) ) = (c1 + c3 ) (1, 0, 0) + (c2 + c3 ) (0, 1, 0).
Setting d1 = c1 + c3 and d2 = c2 + c3 , we see that
~x = d1 (1, 0, 0) + d2 (0, 1, 0),    d1 , d2 ∈ R
is also a vector equation for S. Since the vectors
(1, 0, 0) and (0, 1, 0)
are not scalar multiples of one another, we see that S is a plane in R3 through the origin.
From
~x = d1 (1, 0, 0) + d2 (0, 1, 0) = (d1 , d2 , 0),
we clearly24 see that S is the x1 x2 −plane of R3 .
In the previous example, one of the vectors in the spanning set for S was a linear combination
of the other vectors in that spanning set. We saw that we could remove that vector from the
spanning set and the resulting smaller set would still span S. It was important to do this as
it allowed us to understand that S was geometrically a plane in R3 through the origin.
Theorem 13.8. Let ~v1 , . . . , ~vk ∈ Rn . One of these vectors, say ~vi , can be expressed as a
linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk if and only if
Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }.
We make a comment here before giving the proof. The theorem we need to prove is a double
implication as evidenced by the words if and only if.25 Thus we must prove two implications:
1. If ~vi can be expressed as a linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk , then
Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }
2. If Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }, then ~vi can be expressed as a
linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk .
The result of this theorem is that the two statements
“ ~vi can be expressed as a linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk ”
and
“ Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }”
are equivalent, that is, they are both true or they are both false. The proof that follows is
often not understood after just the first reading - it takes a bit of time to understand, so
don’t be discouraged if you need to read it a few times before it begins to make sense.
24
Please be very careful if you use “clearly”, as what seems clear to you might not be clear to someone
else. Many marks have been lost by students due to “clearly” being used when their work was not clear at
all.
25
We sometimes write ⇐⇒ to mean “if and only if”. To prove a statement of the form A ⇐⇒ B, we must
prove the two implications A =⇒ B and B =⇒ A, and so we call A ⇐⇒ B a double implication.
87
Proof. Without loss of generality26 , we assume i = k. To simplify the writing of the proof,
we let
A = Span {~v1 , . . . , ~vk }    and    B = Span {~v1 , . . . , ~vk−1 }.
To prove the first implication, assume that ~vk can be expressed as a linear combination of
~v1 , . . . , ~vk−1 . Then there exist c1 , . . . , ck−1 ∈ R such that
~vk = c1~v1 + · · · + ck−1~vk−1 .    (12)
We must show that A = B. Let ~x ∈ A. Then there exist d1 , . . . , dk−1 , dk ∈ R such that
~x = d1~v1 + · · · + dk−1~vk−1 + dk~vk
and we make the substitution for ~vk using Equation (12) to obtain
~x = d1~v1 + · · · + dk−1~vk−1 + dk (c1~v1 + · · · + ck−1~vk−1 ) = (d1 + dk c1 )~v1 + · · · + (dk−1 + dk ck−1 )~vk−1
from which we see that ~x can be expressed as a linear combination of ~v1 , . . . , ~vk−1 and it
follows that ~x ∈ B. Hence A ⊆ B. Now let ~y ∈ B. Then there exist a1 , . . . , ak−1 ∈ R such
that
~y = a1~v1 + · · · + ak−1~vk−1
= a1~v1 + · · · + ak−1~vk−1 + 0~vk
and we have that ~y can be expressed as a linear combination of ~v1 , . . . , ~vk from which it
follows that ~y ∈ A. We have that B ⊆ A and combined with A ⊆ B we conclude that
A = B.
To prove the second implication, we now assume that A = B and we must show that
~vk can be expressed as a linear combination of ~v1 , . . . , ~vk−1 . Since vk ∈ A (recall that
~vk = 0~v1 + · · · + 0~vk−1 + 1~vk ) and A = B, we have ~vk ∈ B. Thus, there exist b1 , . . . , bk−1 ∈ R
such that ~vk = b1~v1 + · · · + bk−1~vk−1 as required.
88
Since " # " # " # " # " #
5 1 1 2 0
=5 =5 +0 +0
0 0 0 4 1
Theorem 13.8 gives (" # " # " #)
1 2 0
S = Span , ,
0 4 1
and since " # " # " #
2 1 0
=2 +4
4 0 1
it again follows from Theorem 13.8 that
(" # " #)
1 0
S = Span ,
0 1
and since " # " #
1 0
and
0 1
are not scalar multiples of one another, we cannot remove either of them from the spanning
set without changing the span. A vector equation for S is
" # " #
1 0
~x = c1 + c2 , c1 , c2 ∈ R.
0 1
Combining the vectors on the right gives
" #
c1
~x = .
c2
and it is clear that S = R2 .
Regarding the last example, the vector(s) that were chosen to be removed from the spanning
set depended on us noticing that some were linear combinations of others. Of course, we
could have noticed that " # " # " #
1 1 2 0
= −2
0 2 4 1
and concluded that (" # " # " #)
5 2 0
S = Span , ,
0 4 1
and then continued from there. Indeed, any of
S = Span { (1, 0), (2, 4) } = Span { (5, 0), (2, 4) } = Span { (5, 0), (0, 1) } = Span { (2, 4), (0, 1) }
are also correct descriptions of S where the spanning sets cannot be further reduced.
89
Lecture 14
Solution. Let c1 , c2 ∈ R and consider
" # " # " #
2 −1 0
c1 + c2 = .
3 2 0
−1 0 1 0
We obtain
c1 + 2c2 + c3 = 0
c2 + c3 = 0
−c1 + c3 = 0
From the third equation, we see that c1 = c3 and from the second equation we have c2 = −c3 .
Substituting into the first equation gives
c3 + 2(−c3 ) + c3 = 0
which holds for any value of c3 . Thus we let c3 = t, where t ∈ R is any scalar. We then have
c1 = t, c2 = −t and c3 = t, t ∈ R.
91
In the last example, we saw that the set B was linearly dependent. We showed that
1 2 1 0
t 0 − t 1 + t 1 = 0 .
−1 0 1 0
−1 0 1 0
−1 0 1
and use Theorem 13.8 to conclude that
1 2 1 2
1
Span B = Span 0 , 1 , 1 = Span 1 , 1
−1 0 1 0 1
In this case we could solve for any vector on the left hand side of Equation (13) in terms of
the other two to alternatively arrive at
2 1 1
1 2 1
1 1
1 = 0 + 1 =⇒ Span 0 , 1 1 = Span
, ,
0 1
0 −1 1 −1 0 1 −1 1
or
1 2 1
1 2 1
1 2
1 = 1 − 0 =⇒ Span 0 , 1 , 1 = Span 0 , 1 .
1 0 −1 −1 0 1 −1 0
Example 14.4. Show that the set
C = { (1, 0, −1), (1, 1, 1) }
is linearly independent.
Solution. For c1 , c2 ∈ R, consider
c1 (1, 0, −1) + c2 (1, 1, 1) = (0, 0, 0).
c1 + c2 = 0
c2 = 0
−c1 + c2 = 0
We see from the second equation that c2 = 0 and substituting c2 = 0 into both the first and
third equations each gives c1 = 0. Thus we have only the trivial solution c1 = c2 = 0 and we
conclude that C is linearly independent.
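Numerically, a set of vectors is linearly independent exactly when the matrix having those
vectors as columns has rank equal to the number of vectors. A short NumPy sketch (our own
illustration):

import numpy as np

def is_independent(vectors):
    # True if the given list of equal-length vectors is linearly independent
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# the set C above is linearly independent
print(is_independent([np.array([1, 0, -1]), np.array([1, 1, 1])]))                       # True

# the earlier set {(1, 0, -1), (2, 1, 0), (1, 1, 1)} is linearly dependent
print(is_independent([np.array([1, 0, -1]), np.array([2, 1, 0]), np.array([1, 1, 1])]))  # False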
Theorem 14.5. A set of vectors {~v1 , . . . , ~vk } in Rn is linearly dependent if and only if
~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }
for some i = 1, . . . , k.
Proof. Assume first that the set {~v1 , . . . , ~vk } in Rn is linearly dependent. Then there exist
c1 , . . . , ck ∈ R, not all zero, such that
c1~v1 + · · · + ck~vk = ~0.
Without loss of generality, assume that ci 6= 0. Then we may isolate for ~vi on one side of the
equation:
c1 ci−1 ci+1 ck
~vi = − ~v1 − · · · − ~vi−1 − ~vi+1 − · · · − ~vk
ci ci ci ci
which shows that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }. To prove the other implication, we
assume that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk } for some i = 1, . . . , k. Then there exist
d1 , . . . , di−1 , di+1 , . . . , dk ∈ R such that
~vi = d1~v1 + · · · + di−1~vi−1 + di+1~vi+1 + · · · + dk~vk .
Then
d1~v1 + · · · + di−1~vi−1 + (−1)~vi + di+1~vi+1 + · · · + dk~vk = ~0
and since the coefficient on ~vi is −1 ≠ 0, the set {~v1 , . . . , ~vk } is linearly dependent.
Given a spanning set of vectors {~v1 , . . . , ~vk } in Rn for a set S, we can now consider the vector
equation
c1~v1 + · · · + ck~vk = ~0. (14)
If the only solution to (14) is the trivial solution (c1 = · · · = ck = 0), then {~v1 , . . . , ~vk } is lin-
early independent. It follows that removing any vector from {~v1 , . . . , ~vk } will leave a smaller
spanning set that no longer spans S. If, on the other hand, there exists a nontrivial solution
to (14) where say, ci 6= 0, then we can solve for ~vi in (14) to express ~vi as a linear combination
of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk which shows that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }. It follows
that we can remove ~vi from the spanning set and the resulting set {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }
will still span S by Theorem 13.8.
We conclude with a few more examples involving linear dependence and linear independence.
Example 14.6. Consider the set {~v1 , . . . , ~vk , ~0} of vectors in Rn . Then
which shows that {~v1 , . . . , ~vk , ~0} is linearly dependent. Note that any subset of Rn containing
~0 ∈ Rn will be linearly dependent.
Example 14.7. Let ~v1 , ~v2 , ~v3 ∈ Rn be such that {~v1 , ~v2 , ~v3 } is linearly independent. Prove
that {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent.
Proof. We must prove that the set {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent. To do
so, we consider the vector equation
c1 + c2 + c3 = 0
c2 + c3 = 0
c3 = 0
We see that c3 = 0 and it follows that c2 = 0 and then that c1 = 0. Hence we have only the
trivial solution, so our set {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent.
Example 14.8. Let {~v1 , . . . , ~vk } be a linearly independent set of vectors in Rn . Prove that
{~v1 , . . . , ~vk−1 } is linearly independent.
94
Proof. It is given that {~v1 , . . . , ~vk } is linearly independent. Suppose for a contradiction that
{~v1 , . . . , ~vk−1 } is linearly dependent. Then there exist c1 , . . . , ck−1 , not all zero, such that
c1~v1 + · · · + ck−1~vk−1 = ~0
and hence
c1~v1 + · · · + ck−1~vk−1 + 0~vk = ~0,
which shows that {~v1 , . . . , ~vk } is linearly dependent, since not all of c1 , . . . , ck−1 are zero. But
this is a contradiction since we were given that {~v1 , . . . , ~vk } is linearly independent. Hence,
our supposition that {~v1 , . . . , ~vk−1 } is linearly dependent was incorrect. This leaves only that
{~v1 , . . . , ~vk−1 } is linearly independent, as required.
In the previous example, we used a proof technique known as Proof by Contradiction. When
using proof by contradiction, you are proving a statement is true by proving that it cannot
be false. In the proof, we had to show that {~v1 , . . . , ~vk−1 } was linearly independent. The
set {~v1 , . . . , ~vk−1 } is either linearly independent or linearly dependent, but not both.
Instead of proving that {~v1 , . . . , ~vk−1 } was linearly independent directly, we supposed that it
was linearly dependent. From that supposition, we argued until we arrived at {~v1 , . . . , ~vk }
being linearly dependent, which was impossible since we were given that {~v1 , . . . , ~vk } was
linearly independent. We arrived at a contradiction: if the set {~v1 , . . . , ~vk−1 } was linearly
dependent, then the set {~v1 , . . . , ~vk } was both linearly independent and linearly dependent.
Thus we showed that {~v1 , . . . , ~vk−1 } cannot be linearly dependent, and so must be linearly
independent (which is what we were asked to prove).
It follows from the last example that every nonempty subset of a linearly independent set is
also linearly independent. Of course, we should consider the empty set, since it is a subset
of every set. As the empty set contains no vectors, we cannot exhibit vectors from the
empty set that form a linearly dependent set. Thus, the empty set is (vacuously) linearly
independent. Thus, we can now say that given any linearly independent set B, every subset
of B is linearly independent as well.
95
Lecture 15
Bases29
Having discussed spanning and linear independence, we now combine the two ideas and
consider linearly independent spanning sets.
which shows that ~x ∈ Span B and we conclude that R2 ⊆ Span B. Since [ 10 ] , [ 01 ] ∈ R2 and
R2 is closed under linear combinations, we have that Span B ⊆ R2 . Hence Span B = R2 . To
show that B is linearly independent, let c1 , c2 ∈ R and consider
" # " # " # " #
0 1 0 c1
= c1 + c2 = .
0 0 1 c2
Definition 15.3. For i = 1, . . . , n, let ~ei ∈ Rn be the vector whose ith entry is 1 and whose
other n − 1 entries are 0. The set
{~e1 , . . . , ~en }
is a basis for Rn , called the standard basis for Rn .
Example 15.4. In R2 the standard basis is
{~e1 , ~e2 } = { (1, 0), (0, 1) }
and in R3 the standard basis is
{~e1 , ~e2 , ~e3 } = { (1, 0, 0), (0, 1, 0), (0, 0, 1) }.
It should now be clear how to write out the standard basis for R4 , R5 and so on. Note that
we have seen the standard basis for R3 before in Example 6.12. It is important to realize
that the definition of ~ei depends on n: in the previous example, ~e1 and ~e2 are expressed in
two different ways depending on the value of n. In general it will be clear from the context
what ~e1 , . . . , ~en are, that is, what the value of n is. As with Example 6.12, it is easy to write
any vector in Rn as a linear combination of the standard basis vectors for Rn .
Example 15.5. Is
B = { (1, 2, 0), (−1, 2, 1) }
a basis for R3 ?
Solution. Let ~x = (x1 , x2 , x3 ) ∈ R3 and for c1 , c2 ∈ R, consider
(x1 , x2 , x3 ) = c1 (1, 2, 0) + c2 (−1, 2, 1).
We obtain the system of equations
c1 − c2 = x 1
2c1 + 2c2 = x2
c2 = x 3
From the last equation we see that c2 = x3 and substitution into the second equation gives
c1 = (x2 − 2x3 )/2.
Substituting our values for c1 and c2 into the first equation gives
(x2 − 2x3 )/2 − x3 = x1
and simplifying leads to
2x1 − x2 + 4x3 = 0.
From this, we deduce that ~x = (x1 , x2 , x3 ) ∈ Span B if and only if 2x1 − x2 + 4x3 = 0, and it
follows that Span B ≠ R3 , so B cannot be a basis for R3 . For example, since
2(1) − 1 + 4(1) = 5 ≠ 0, we have (1, 1, 1) ∉ Span B.
Note that in the previous example, B is linearly independent since it contains only two
vectors which are not scalar multiples of one another. A vector equation for Span B is
~x = c1 (1, 2, 0) + c2 (−1, 2, 1),    c1 , c2 ∈ R
which we recognize as the vector equation for a plane in R3 . Indeed, taking the cross product
of the vectors in B yields a normal vector for this plane, and leads to the scalar equation
2x1 − x2 + 4x3 = 0.
Given a set S, we would like to find a basis for S if possible30 . Thus, we would like to find a
linearly independent set B ⊆ S such that Span B = S. A good reason to require Span B = S
is that we would like to be able to write every vector in S (and only those vectors in S) as a
linear combination of the vectors in B. The reason for requiring linear independence may
not be so clear.
Theorem 15.6. If B = {~v1 , . . . , ~vk } is a basis for a set S ⊆ Rn , then every ~x ∈ S can be
expressed as a linear combination of ~v1 , . . . , ~vk in a unique way.
Proof. Since B is a basis for S, S = Span B and so every ~x ∈ S can be expressed as a linear
combination of the vectors in B. Thus we only need to show that this expression is unique.
Suppose for c1 , d1 , . . . , ck , dk ∈ R we have two ways to express ~x as a linear combination of
~v1 , . . . , ~vk :
~x = c1~v1 + · · · + ck~vk and ~x = d1~v1 + · · · + dk~vk .
Then
c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk
and so
(c1 − d1 )~v1 + · · · + (ck − dk )~vk = ~0.
Since B is linearly independent, we have that c1 − d1 = · · · = ck − dk = 0, that is, ci = di
for i = 1, . . . , k, which shows any ~x ∈ S can be expressed uniquely as a linear combination
of the vectors in B.
We thus think of a basis B for a subset S of Rn as a minimal spanning set in the sense that
B spans S, but since B is linearly independent, we cannot remove a vector from B as we
would obtain a set that no longer spans S.
Subspaces of Rn
We now address which subsets of Rn admit a basis. If a subset S of Rn has a basis, then
S = Span B for some set B = {~v1 , . . . , ~vk }. It follows that ~v1 , . . . , ~vk ∈ S and that S is closed
under linear combinations. We’ve seen that Rn itself is closed under linear combinations,
so we expect that subsets of Rn with a basis to act very much like Rn itself under vector
addition and scalar multiplication. Thus we have the following definition:
30
We will soon see exactly which subsets of Rn can have a basis, but the work we’ve done thus far might
lead you to guess which sets. 98
Definition 15.7. A subset S of Rn is a subspace of Rn if for every ~w, ~x, ~y ∈ S and c, d ∈ R
we have
S1 ~x + ~y ∈ S closed under addition
S2 ~x + ~y = ~y + ~x addition is commutative
S3 (~x + ~y ) + ~w = ~x + (~y + ~w) addition is associative
S4 ~0 ∈ S zero vector
S5 For each ~x ∈ S there exists a (−~x) ∈ S such that ~x + (−~x) = ~0 additive inverse
S6 c~x ∈ S closed under scalar multiplication
S7 c(d~x) = (cd)~x scalar multiplication is associative
S8 (c + d)~x = c~x + d~x distributive law
S9 c(~x + ~y ) = c~x + c~y distributive law
S10 1~x = ~x scalar multiplicative identity
This should seem similar to Theorem 6.10. In fact, if we replace S with Rn in the above def-
inition, then we have Theorem 6.10, so we see immediately that Rn is itself a subspace of Rn .
99
Lecture 16
Theorem 16.1 (Subspace Test). Let S be a nonempty subset of Rn . If for every ~x, ~y ∈ S
and for every c ∈ R, we have that ~x + ~y ∈ S and c~x ∈ S, then S is a subspace of Rn .
Example 16.2. Show that
S = { (x1 , x2 , x3 ) ∈ R3 | x1 + x2 = 0 and x2 − x3 = 0 }
is a subspace of R3 .
Solution. First note that S is nonempty since ~0 ∈ S. Let
~x = (x1 , x2 , x3 )    and    ~y = (y1 , y2 , y3 )
be two vectors in S.
show that
~x + ~y = (x1 + y1 , x2 + y2 , x3 + y3 )
belongs to S by showing that (x1 + y1 ) + (x2 + y2 ) = 0 and that (x2 + y2 ) − (x3 + y3 ) = 0.
We have
(x1 + y1 ) + (x2 + y2 ) = (x1 + x2 ) + (y1 + y2 ) = 0 + 0 = 0
and
(x2 + y2 ) − (x3 + y3 ) = (x2 − x3 ) + (y2 − y3 ) = 0 + 0 = 0
so ~x + ~y ∈ S. For any c ∈ R, we must next show that
c~x = (cx1 , cx2 , cx3 )
belongs to S. We have
cx1 + cx2 = c(x1 + x2 ) = c(0) = 0
and
cx2 − cx3 = c(x2 − x3 ) = c(0) = 0
so c~x ∈ S. By the Subspace Test, S is a subspace of R3 .
Then we have that
~x + ~y = c1 (1, 3) + c2 (1, 3) = (c1 + c2 ) (1, 3)
so ~x + ~y ∈ S and for any c ∈ R,
c~x = c ( c1 (1, 3) ) = (cc1 ) (1, 3)
so c~x ∈ S. By the Subspace Test, S is a subspace of R2 .
Bases of Subspaces
We have discussed that every subspace S of Rn can be expressed as S = Span {~v1 , . . . , ~vk }
for some ~v1 , . . . , ~vk . Thus {~v1 , . . . , ~vk } is a spanning set for S. Removing any dependencies
from the set {~v1 , . . . , ~vk } will leave us with a linearly independent spanning set for S, that
is, a basis for S. We now look at how to find a basis for a subspace of Rn .
102
Example 16.7. Find a basis for the subspace
S = { (x1 , x2 , x3 ) ∈ R3 | x1 + x2 = 0 and x2 − x3 = 0 }
of R3 .
Solution. Let ~x = (x1 , x2 , x3 ) ∈ S. Then x1 + x2 = 0 and x2 − x3 = 0 and thus x1 = −x2 and
x3 = x2 . It follows that
~x = (x1 , x2 , x3 ) = (−x2 , x2 , x2 ) = x2 (−1, 1, 1).
Thus S ⊆ Span { (−1, 1, 1) }. Now since (−1, 1, 1) ∈ S and since S is closed under linear
combinations31 , we have that Span { (−1, 1, 1) } ⊆ S and so Span { (−1, 1, 1) } = S. Hence the set
B = { (−1, 1, 1) }
is a spanning set for S. Since B consists of a single nonzero vector, B is linearly independent
and is hence a basis for S.
Note that once we obtain a basis for S, we see that S is the set of all linear combinations (or
in this case, scalar multiples) of the vector (−1, 1, 1). Thus S is a line through the origin with
direction vector (−1, 1, 1).
When finding a spanning set for a subspace S of Rn , we choose an arbitrary ~x ∈ S and try
to “decompose” ~x as a linear combination of some ~v1 , . . . , ~vk ∈ S. This then shows that
S ⊆ Span {~v1 , . . . , ~vk }. Technically, we should also show that Span {~v1 , . . . , ~vk } ⊆ S, but this
is trivial as S is a subspace and thus contains all linear combinations of ~v1 , . . . , ~vk . Thus for
a subspace S of Rn , S ⊆ Span {~v1 , . . . , ~vk } implies that S = Span {~v1 , . . . , ~vk }, and we don’t
normally show (or even mention) that Span {~v1 , . . . , ~vk } ⊆ S.
31
Properties S1 and S6 from the definition of a subspace of Rn combine to give that every linear combi-
nation of vectors from S is again in S.
Example 16.8. Consider the subspace
S = { (a − b, b − c, c − a) | a, b, c ∈ R }
of R3 . Find a basis for S.
Solution. For any a, b, c ∈ R we can write
(a − b, b − c, c − a) = a (1, 0, −1) + b (−1, 1, 0) + c (0, −1, 1).
Thus
S = Span { (1, 0, −1), (−1, 1, 0), (0, −1, 1) }.
Now since
(0, −1, 1) = −(1, 0, −1) − (−1, 1, 0)
we have from Theorem 13.8 that
S = Span { (1, 0, −1), (−1, 1, 0) }
so
B = { (1, 0, −1), (−1, 1, 0) }
is a spanning set for S. Moreover, since neither vector in B is a scalar multiple of the other,
B is linearly independent and hence a basis for S.
Note that we now see that S is a plane through the origin and a vector equation for S is
~x = s (1, 0, −1) + t (−1, 1, 0),    s, t ∈ R.
Lecture 17
• Let ~v1 ∈ Rn be such that {~v1 } is linearly independent32 . The set with vector equation
~x = p~ + c1~v1 ,    c1 ∈ R
is a line in Rn through the point P .
• Let ~v1 , ~v2 ∈ Rn be such that {~v1 , ~v2 } is linearly independent33 . The set with vector
equation
~x = p~ + c1~v1 + c2~v2 , c1 , c2 ∈ R
is a plane in Rn through the point P .
Definition 17.2. For some positive integer k ≤ n − 1, let ~v1 , . . . , ~vk ∈ Rn be such that
{~v1 , . . . , ~vk } is linearly independent. The set with vector equation
~x = p~ + c1~v1 + · · · + ck~vk ,    c1 , . . . , ck ∈ R
is called a k−flat in Rn through the point P .
Thus in Rn , a 1−flat is a line and a 2−flat is a plane, both of which we have seen before.
We may think of a 3−flat as a “three dimensional plane”, but we aren’t normally able to
visualize such things in higher dimensions. We also mention that a 0−flat is simply a point
and has vector equation ~x = p~. The last type of k−flat that we have encountered already is
an (n − 1)−flat, known as a hyperplane:
Definition 17.3. Let ~v1 , . . . , ~vn−1 ∈ Rn be such that {~v1 , . . . , ~vn−1 } is linearly independent.
The set with vector equation
~x = p~ + c1~v1 + · · · + cn−1~vn−1 ,    c1 , . . . , cn−1 ∈ R
is called a hyperplane in Rn through the point P .
Note that the definition of a hyperplane depends on n, so how we geometrically interpret a
hyperplane depends on n. For example, in R2 , a hyperplane is a 1−flat, or a line. In R3 , a
hyperplane is a 2−flat, or a plane. The reason we are concerned with hyperplanes is that
they are the only k−flats in Rn that have scalar equations. Indeed, a scalar equation for a
line in R2 is of the form ax1 + bx2 = c for some a, b, c ∈ R and a scalar equation for a plane
in R3 is of the form ax1 + bx2 + cx3 = d for some a, b, c, d ∈ R. Hyperplanes will play a role
when we study systems of equations and their geometric interpretations shortly.
Finally, using our new terminology, we can now give a simple geometric description of all of
the subspaces of Rn : they are exactly the k−flats through the origin for k = 0, 1, . . . , n − 1
along with Rn itself. We note that in Rn , a k−flat with vector equation ~x = p~+c1~v1 +· · ·+ck~vk
is a subspace of Rn if and only if p~ ∈ Span {~v1 , . . . , ~vk }.
106
Definition 17.7. If an orthogonal set B is a basis for a subspace S of Rn , then B is an
orthogonal basis for S.
If B = {~v1 , . . . , ~vk } is an orthogonal basis of a subspace S of Rn and ~x ∈ S, then, since B is
a basis for S, there exist c1 , . . . , ck ∈ R such that ~x = c1~v1 + · · · + ck~vk . For any i = 1, . . . , k
it follows from B being an orthogonal set that
~vi · ~x = ~vi · (c1~v1 + · · · + ck~vk ) = c1 (~vi · ~v1 ) + · · · + ck (~vi · ~vk ) = ci (~vi · ~vi ) = ci ||~vi ||²
since ~vi · ~vj = 0 whenever j ≠ i. As ~vi ≠ ~0, we can solve for ci to obtain
ci = (~vi · ~x) / ||~vi ||² .
Hence, we can compute the coefficients that are used to express ~x as a linear combination of
the vectors in B directly, that is, without solving a system of equations. Also note that we
can solve for the coefficients independently of one another.
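A short numerical sketch of this coefficient formula in Python with NumPy (our own
illustration), using the orthogonal basis from Example 17.8 below:

import numpy as np

def coefficients(x, basis):
    # coefficients of x with respect to an orthogonal basis (a list of vectors)
    return [(x @ v) / (v @ v) for v in basis]

x = np.array([-1.0, 1.0])
B = [np.array([1.0, 3.0]), np.array([6.0, -2.0])]

c = coefficients(x, B)
print(c)                           # [0.2, -0.2], i.e. 1/5 and -1/5
print(c[0] * B[0] + c[1] * B[1])   # recovers x = (-1, 1)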
Example 17.8. Let ~x = (−1, 1) and
B = { (1, 3), (6, −2) }.
Since (1, 3) · (6, −2) = 6 − 6 = 0, B is an orthogonal set of nonzero vectors and hence an
orthogonal basis for R2 . Using the formula above,
c1 = ( ~x · (1, 3) ) / ||(1, 3)||² = (−1 + 3)/10 = 1/5    and    c2 = ( ~x · (6, −2) ) / ||(6, −2)||² = (−6 − 2)/40 = −1/5
so
~x = (1/5) (1, 3) − (1/5) (6, −2).
Orthonormal Sets and Bases
Example 17.10. The standard basis {~e1 , . . . , ~en } for Rn is an orthonormal set (and an
orthonormal basis for Rn ). The set
{ (1/√3, 1/√3, 1/√3), (1/√6, −2/√6, 1/√6), (−1/√2, 0, 1/√2) }
is also an orthonormal basis for R3 .
Note that the condition k~vi k = 1 excludes the zero vector from any orthonormal set. It
follows that any orthonormal set is an orthogonal set of nonzero vectors34 , and as such, must
be linearly independent by Theorem 17.6.
108
Example 17.11. From before,
B = { (1, 3), (6, −2) }
is an orthogonal basis for R2 . Obtain an orthonormal basis C for R2 from B and express
~x = [ −1
1 ] as a linear combination of the vectors in C.
Lecture 18
a1 x 1 + a2 x 2 + · · · + an x n = b
3x1 + 2x2 − x3 = 3
2x1 + x3 = −1
3x2 − 4x3 = 4
The number aij is the coefficient of xj in the ith equation and bi is the constant term in the
ith equation. Each of the m equations is a scalar equation of a hyperplane in Rn .
Definition 18.3. A vector
s1
.
~s = .. ∈ Rn
sn
110
is a solution to a system of m equations in n variables if all m equations are satisfied when
we set xj = sj for j = 1, . . . , n. The set of all solutions to a system of equations is called the
solution set.
We may view the solution set of a system of m equations in n variables as the intersection
of the m hyperplanes determined by the system.
Example 18.4. Solving the system of two linear equations in two variables
a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2
can be viewed as finding the points of intersection of the two lines with scalar equations
a11 x1 + a12 x2 = b1 and a21 x1 + a22 x2 = b2 . Figure 43 shows the possible outcomes.
We see that a system of two equations in two variables can have no solutions, exactly one
solution or infinitely many solutions. Figure 44 shows a similar situation when we consider a
system of three equations in three variables, which we may view geometrically as intersecting
three planes in R3 . Indeed we will see that for any linear system of m equations in n variables,
we will obtain either no solutions, exactly one solution, or infinitely many solutions.
Definition 18.5. We call a linear system of equations consistent if it has at least one
solution. Otherwise, we call the linear system inconsistent.
Example 18.6. Solve the linear system
x1 + 3x2 = −1
x 1 + x2 = 3
Solution. To begin, we will eliminate x1 in the second equation by subtracting the first
equation from the second:
!
x1 + 3x2 = −1 Subtract the first x1 + 3x2 = −1
−→ −→
x1 + x2 = 3 equation from the second −2x2 = 4
111
Figure 44: Number of solutions resulting from intersecting three planes. Note that there are
other ways to arrange these planes to obtain the given number of solutions.
Finally we eliminate x2 from the first equation by subtracting the second equation from the
first equation three times:
!
x1 + 3x2 = −1 Subtract 3 times the second x1 = 5
−→ −→
x2 = −2 equation from the first x2 = −2
which we refer to as the parametric form of the solution, the vector form of the solution and
the point form of the solution respectively.
Notice that when we write a system of equations, we always list the variables in order and
that when we solve a system of equations, we are ultimately concerned with the coefficients
and constant terms. Thus, we can write the above systems of equations and the subsequent
operations we used to solve the system more compactly:
" # " # " # " #
1 3 −1 −→ 1 3 −1 −→ 1 3 −1 R1 −3R2 1 0 5
1 1 3 R2 −R1 0 −2 4 − 12 R2 0 1 −2 −→ 0 1 −2
112
so " # " #
x1 5
=
x2 −2
as above. We call " #
1 3
1 1
the coefficient matrix 35 of the linear system, which is often denoted by A. The vector
" #
−1
3
is the constant matrix (or constant vector) of the linear system and will be denoted by ~b.
Finally " #
1 3 −1
1 1 3
is the augmented matrix of the linear system, and will be denoted by [ A | ~b ].
From the previous example, we see that by taking the augmented matrix of a linear system
of equations, we can “reduce” it to an augmented matrix of a simpler system from which we
can “read off” the solution. Notice that by doing this, we are simply removing the variables
from the system (since we know x1 is always the first variable and x2 is always the second
variable), and treating the equations as rows of the augmented matrix. Thus, the operation
R2 − R1 written to the right of the second row of an augmented matrix means that we are
subtracting the first row from the second to obtain a new second row which would appear
in the next augmented matrix.
We are allowed to perform the following Elementary Row Operations (EROs) to the aug-
mented matrix of a linear system of equations:
• Swap two rows
• Add a scalar multiple of one row to another
• Multiply any row by a nonzero scalar
We say that two systems are equivalent if they have the same solution set. A system derived
from a given system by performing elementary row operations on its augmented matrix will
be equivalent to the given system. Thus elementary row operations allow us to reduce a
complicated system to one that is easier to solve. In the previous example, since
" # " #
1 3 −1 1 0 5
−→
1 1 3 0 1 −2
35
A matrix will be formally defined in Lecture 25 - for now, we view them as rectangular arrays of numbers
used to represent systems of linear equations.
113
the systems they represent
x1 + 3x2 = −1 x1 = 5
and
x 1 + x2 = 3 x2 = −2
must have the same solution set. Clearly, the second system is easier to solve as we can
simply read off the solution.
Example 18.7. Solve the linear system of equations
2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10
Solution. To solve this system, we perform elementary row operations to the augmented
matrix:
[ 2 1 9 | 31 ]          [ 1 0 3 | 10 ]           [ 1 0 3 | 10 ]
[ 0 1 2 |  8 ]  R1↔R3→  [ 0 1 2 |  8 ]  R3−2R1→  [ 0 1 2 |  8 ]  R3−R2→
[ 1 0 3 | 10 ]          [ 2 1 9 | 31 ]           [ 0 1 3 | 11 ]

[ 1 0 3 | 10 ]  R1−3R3   [ 1 0 0 | 1 ]
[ 0 1 2 |  8 ]  R2−2R3→  [ 0 1 0 | 2 ]
[ 0 0 1 |  3 ]           [ 0 0 1 | 3 ]
We thus have
x1 = 1,  x2 = 2,  x3 = 3,    or equivalently (x1 , x2 , x3 ) = (1, 2, 3),
as our solution.
Note that the augmented matrix
[ 1 0 3 | 10 ]
[ 0 1 2 |  8 ]
[ 0 0 1 |  3 ]
corresponds to the linear system of equations
x1 + 3x3 = 10
x2 + 2x3 = 8
x3 = 3
114
From here, we can see that x3 = 3. We can then use the second equation to solve for x2 and
then the first equation to solve for x1 :
x2 = 8 − 2x3 = 8 − 2(3) = 8 − 6 = 2
x1 = 10 − 3x3 = 10 − 3(3) = 10 − 9 = 1
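For larger systems it is convenient to let a computer carry out the row reduction. A brief
sketch (our own illustration) in Python, using NumPy to solve the system and SymPy to
row reduce the augmented matrix completely:

import numpy as np
from sympy import Matrix

A = np.array([[2.0, 1.0, 9.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0]])
b = np.array([31.0, 8.0, 10.0])

print(np.linalg.solve(A, b))     # [1. 2. 3.]

# fully row reduce the augmented matrix [A | b]
aug = Matrix([[2, 1, 9, 31], [0, 1, 2, 8], [1, 0, 3, 10]])
print(aug.rref()[0])             # rows (1 0 0 1), (0 1 0 2), (0 0 1 3)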
Lecture 19
1 0 3 10 0 0 1 3
In both cases, we chose our elementary row operations in order to get to the augmented
matrices on the right, and this is the “form” that we are looking for.
Definition 19.1.
• The first nonzero entry in each row of a matrix is called a leading entry (or a pivot).
• A matrix is in Row Echelon Form (REF) if
(1) All rows whose entries are all zero appear below all rows that contain nonzero entries,
(2) Each leading entry is to the right of the leading entries above it.
• A matrix is in Reduced Row Echelon Form (RREF) if it is in REF and
(3) Each leading entry is a 1 called a leading one,
(4) Each leading one is the only nonzero entry in its column.
Note that by definition, if a matrix is in RREF, then it is in REF.
When row reducing the augmented matrix of a linear system of equations, we aim first to
reduce the augmented matrix to REF. Once we have reached an REF form, we may either
use back substitution, or continue using elementary row operations until we reach RREF
where we can simply read off the solution.
From our last example, we rewrite the steps and circle the leading entries:
2 1 9 31 −→ 1 0 3 10 −→ 1 0 3 10 −→
0 1 2 8 R1 ↔R3 0 1 2 8 0 1 2 8
1 0 3 10 2 1 9 31 R3 −2R1 0 1 3 11 R3 −R2
116
1 0 3 10 R1 −3R3 1 0 0 1
0 1 2 8 0 1 0 2
R2 −2R3
0 0 1 3 −→ 0 0 1 3
| {z } | {z }
REF REF and RREF
We point out here that any matrix has many REFs, but the RREF is always unique for any
matrix.
Example 19.2. Solve the linear system of equations
3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20
Solution. We use elementary row operations to carry the augmented matrix of the system
to RREF.
3 1 0 10 R1 −R2 1 0 −1 4 −→ 1 0 −1 4 −→
2 1 1 6 −→ 2 1 1 6 R2 −2R1 0 1 3 −2
0 0 0 0
x1 − x3 = 4
x2 + 3x3 = −2
0 = 0
The last equation is clearly always true, and from the first two equations, we can solve for
x1 and x2 respectively to obtain
x1 = 4 + x3
x2 = −2 − 3x3
with no restriction on x3 . Setting x3 = t, where t ∈ R, the solution is
(x1 , x2 , x3 ) = (4, −2, 0) + t (1, −3, 1),    t ∈ R.
Geometrically, we view solving the above system of equations as finding those points in R3
that lie on the three planes 3x1 + x2 = 10, 2x1 + x2 + x3 = 6 and −3x1 + 4x2 + 15x3 = −20.
Notice that the solution we obtained
x1 4 1
x2 = −2 + t −3 , t ∈ R
x3 0 1
is the vector equation of a line in R3 . Hence we see that the three planes intersect in a line,
and we have found the vector equation for that line. See Figure 45.
Figure 45: The intersection of the three planes in R3 is a line. Note that the planes may not
be arranged exactly as shown.
That our solution was a line in R3 was a direct consequence of the fact that there were no
restrictions on the variable x3 and that as a result, our solutions for x1 and x2 depended on
x3 . This motivates the following definition.
118
1 0 −1 4
0 1 3 −2
0 0 0 0
| {z }
REF (RREF actually)
0 0 0
As the first two columns of R have leading entries (leading ones in this case), we have that
x1 and x2 are leading variables. There is no leading entry in the third column of R, so x3 is
a free variable.
When solving a system, if there are free variables, then each free variable is assigned a
different parameter, and then the leading variables are solved for in terms of the parameters.
The existence of a free variable guarantees that there will be infinitely many solutions to the
linear system of equations.
Example 19.4. Solve the linear system of equations
x1 + 6x2 − x4 = −1
x3 + 2x4 = 7
Solution. We have that the augmented matrix for this system of linear equations
" #
1 6 0 −1 −1
0 0 1 2 7
is already in RREF. The leading entries are in the first and third columns, so x1 and x3
are leading variables while x2 and x4 are free variables. We will assign x2 and x4 different
parameters. We have
x1 = −1 − 6s + t
x2 = s
, s, t ∈ R
x3 = 7 − 2t
x4 = t
or as a vector equation
x1 −1 −6 1
x 0 1 0
2
= + s + t , s, t ∈ R
x3 7 0 −2
x4 0 0 1
which we recognize as the equation of a plane in R4 .
119
In the previous example, note that when we reached RREF and we begin to find the values
of x1 , x2 , x3 and x4 that will give the solution, it was easiest to solve for x4 first, then x3
followed by x2 and finally x1 .
Example 19.5. Solve the linear system of equations
2x1 + 12x2 − 8x3 = −4
2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7
Solution. We have
[  2  12 −8 | −4 ]          [ 2 12 −8 | −4 ]          [ 2 12 −8 | −4 ]
[  2  13 −6 | −5 ]  R2−R1→  [ 0  1  2 | −1 ]    −→    [ 0  1  2 | −1 ]
[ −2 −14  4 |  7 ]  R3+R1   [ 0 −2 −4 |  3 ]  R3+2R2  [ 0  0  0 |  1 ]
The last row corresponds to the equation 0x1 + 0x2 + 0x3 = 1, which has no solution, so the
system is inconsistent and has no solutions.
Figure 46: Three nonparallel planes that have no common point of intersection.
120
Keeping track of our leading entries in the last example, we see that as soon as we obtain a
row of the form
[ 0 · · · 0 | c ]
with c ≠ 0, then the system is inconsistent. Thus, there is no need to continue row operations
in this case. Note that in a row of the form [ 0 ··· 0 | c ] with c 6= 0, the entry c is a leading
entry. Thus, a leading entry appearing in the last column of an augmented matrix indicates
that the system of linear equations is inconsistent.
All of our work for systems of linear equations can easily be generalized to the complex case.
Solution. Our method to solve this system is no different than in the real case. We take the
augmented matrix of the system and use elementary row operations to carry it to RREF.
Note that our elementary row operations now involve multiplying a row by a complex number
and adding a complex multiple of one row to another, in addition to swapping two distinct
rows.
j −1 −1 −1 + j −1 −→ j −1 −1 −1 + j −1 −jR1
0 0 1 1+j 2+j 0 0 1 1 + j 2 + j −→
0 0 −1 − j −2j −1 − 3j R3 +(1+j)R2 0 0 0 0 0
1 j 0 2 1−j
0 0 1 1+j 2+j
0 0 0 0 0
121
We see that the system is consistent and that z1 and z3 are leading variables while z2 and
z4 are free variables. Thus
z1 = (1 − j) − js − 2t
z2 = s
s, t ∈ C
z3 = (2 + j) − (1 + j)t
z4 = t
or
z1 1−j −j −2
z2 0 1 0
= + s + t , s, t ∈ C
z3 2+j 0 −1 − j
z4 0 0 1
Note that when we are dealing with a complex system of linear equations, our parameters
should be complex numbers rather than just real numbers.
122
Lecture 20
Example 20.1. Determine if
31 2
1 9
8 ∈ Span 0 , 1 , 2 .
10 1 0 3
10 1 0 3
which leads to the system of equations with augmented matrix
2c1 + c2 + 9c3 = 31 2 1 9 31
c2 + 2c3 = 8 −→ 0 1 2 8
c1 + 3c3 = 10 1 0 3 10
We’ve seen this system before - it is the system from Example 18.7 with x1 , x2 , x3 replaced
with c1 , c2 , c3 . Thus we have that the system is consistent and c1 = 1, c2 = 2 and c3 = 3.
Hence,
31 2 1 9
8 = 0 + 2 1 + 3 2
10 1 0 3
and
31 2
1 9
8 ∈ Span 0 1 2 .
, ,
10 1 0 3
Note that the coefficient matrix of the system in the previous example is
2 1 9
0 1 2
1 0 3
and that its columns are the vectors from the above spanning set.
Theorem 20.2. Let ~v1 , . . . , ~vk ∈ Rn . Then ~b ∈ Span {~v1 , . . . , ~vk } if and only if the system
with augmented matrix h i
~
~v1 · · · ~vk b
is consistent. Note that ~v1 , . . . , ~vk , ~b are the columns of the augmented matrix.
123
Example 20.3. From Example 19.2, the linear system of equations
3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20
is consistent so we have that
10
3 1 0
6 ∈ Span 2 , 1 , 1 .
−20 −3 4 15
We observe that we now have a couple of ways to view a linear system of equations:
• In terms of its rows, where we can think of the system geometrically as intersecting
hyperplanes. The solution can then be interpreted as a description of this intersection
(if the system is inconsistent, then there is no intersection and the solution set is
empty).
• In terms of the columns (of the augmented matrix), where we think of the system
algebraically as determining if a vector ~b is in the span of a given set of vectors. The
answer is affirmative if the system is consistent and the solution tells us how to write
~b as a linear combination of the given vectors, and the answer is negative if the system
is inconsistent.
124
Rank
After solving numerous systems of equations, we are beginning to see the importance of
leading entries in an REF of the augmented matrix of the system. This motivates the
following definition.
Definition 20.5. The rank of a matrix A, denoted by rank (A), is the number of leading
entries in any REF of A.
Note that although we don’t prove it here, given a matrix and any two of its REFs, the
number of leading entries in both of these REFs will be the same. This means that our
definition of rank actually makes sense.
Example 20.6. Consider the following three matrices A, B and C along with one of their
REFs. Note that A and B are being viewed as augmented matrices for a linear system of
equations, while C is being viewed as a coefficient matrix.
2 1 9 31 1 0 3 10
A = 0 1 2 8 −→ 0 1 2 8
1 0 3 10 0 0 1 3
" # " #
2 0 1 3 4 1 1 4 −13 −5
B= −→
5 1 6 −7 3 0 -2 −7 29 14
" # " #
1 2 3 1 2 3
C= −→
2 4 6 0 0 0
Note that the requirement that a matrix be in REF before counting leading entries is im-
portant. The matrix " #
1 2 3
C=
2 4 6
has two leading entries, but rank (C) = 1.
Note that if a matrix has m rows and n columns, then rank (A) ≤ min{m, n}, the minimum
of m and n. This follows from the definition of leading entries and REF: there can be at
most one leading entry in each row and each column.
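In NumPy the rank of a matrix can be computed directly, which gives a quick numerical check
of hand row reduction. A short sketch (our own illustration) using the matrices A, B and C
from Example 20.6:

import numpy as np

A = np.array([[2, 1, 9, 31], [0, 1, 2, 8], [1, 0, 3, 10]])
B = np.array([[2, 0, 1, 3, 4], [5, 1, 6, -7, 3]])
C = np.array([[1, 2, 3], [2, 4, 6]])

print(np.linalg.matrix_rank(A))   # 3
print(np.linalg.matrix_rank(B))   # 2
print(np.linalg.matrix_rank(C))   # 1  (the second row is twice the first)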
The next theorem is useful to analyze systems of equations and will appear throughout the
course.
125
Theorem 20.7 (System-Rank Theorem). Let [ A | ~b ] be the augmented matrix of a system
of m linear equations in n variables.
(1) The system is consistent if and only if rank (A) = rank [ A | ~b ]
(2) If the system is consistent, then the number of parameters in the general solution is
the number of variables minus the rank of A:
# of parameters = n − rank (A).
(3) The system is consistent for all ~b ∈ Rm if and only if rank (A) = m.
We don’t prove the System-Rank Theorem here. However, we will look at some of the
systems we have encountered thus far and show that they each satisfy all three parts of the
System-Rank Theorem.
Example 20.8. From Example 18.7, the system of m = 3 linear equations in n = 3 variables
2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10
has augmented matrix
[ 2 1 9 | 31 ]          [ 1 0 0 | 1 ]
[ 0 1 2 |  8 ]   −→     [ 0 1 0 | 2 ]
[ 1 0 3 | 10 ]          [ 0 0 1 | 3 ]
and solution
(x1 , x2 , x3 ) = (1, 2, 3).
From the System-Rank Theorem we see that
(1) rank (A) = 3 = rank [ A | ~b ] so the system is consistent.
(2) # of parameters = n − rank (A) = 3 − 3 = 0, so there are no parameters in the solution
and the solution is unique.
(3) rank (A) = 3 = m so the system will be consistent for any ~b ∈ R3 , that is, the system
2x1 + x2 + 9x3 = b1
x2 + 2x3 = b2
x1 + 3x3 = b3
will be consistent (with a unique solution) for any choice of b1 , b2 , b3 ∈ R.
126
Example 20.9. From Example 19.2, the system of m = 3 linear equations in n = 3 variables
3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20
has augmented matrix
[  3 1  0 |  10 ]          [ 1 0 −1 |  4 ]
[  2 1  1 |   6 ]   −→     [ 0 1  3 | −2 ]
[ −3 4 15 | −20 ]          [ 0 0  0 |  0 ]
and solution
(x1 , x2 , x3 ) = (4, −2, 0) + t (1, −3, 1),    t ∈ R.
From the System-Rank Theorem, we have
(1) rank (A) = 2 = rank [ A | ~b ] so the system is consistent.
(2) # of parameters = n − rank (A) = 3 − 2 = 1, so there is 1 parameter in the solution
(infinitely many solutions).
(3) rank (A) = 2 ≠ 3 = m, so the system will not be consistent for every ~b ∈ R3 , that is,
the system
3x1 + x2 = b1
2x1 + x2 + x 3 = b2
−3x1 + 4x2 + 15x3 = b3
will be inconsistent for some choice of b1 , b2 , b3 ∈ R.
Example 20.10. From Example 19.4, the system of m = 2 linear equations in n = 4 variables
x1 + 6x2 − x4 = −1
x3 + 2x4 = 7
has augmented matrix
[ A | ~b ] = [ 1 6 0 −1 | −1 ]
             [ 0 0 1  2 |  7 ]
and solution
(x1 , x2 , x3 , x4 ) = (−1, 0, 7, 0) + s (−6, 1, 0, 0) + t (1, 0, −2, 1),    s, t ∈ R.
From the System-Rank Theorem,
(1) rank (A) = 2 = rank [ A | ~b ]) so the system is consistent.
(2) # of parameters = n − rank (A) = 4 − 2 = 2 so there are 2 parameters in the solution
(infinitely many solutions).
(3) rank (A) = 2 = m, so the system will be consistent for every ~b ∈ R2 , that is, the system
x1 + 6x2 − x 4 = b1
x3 + 2x4 = b2
will be consistent (with infinitely many solutions) for any choice of b1 , b2 ∈ R.
Example 20.11. From Example 19.5, the system of m = 3 linear equations in n = 3
variables
2x1 + 12x2 − 8x3 = −4
2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7
has augmented matrix
2 12 −8 −4 2 12 −8 −4
[ A | ~b ] = 2 13 −6 −5 −→ 0 1 2 −1
−2 −14 4 7 0 0 0 1
and is inconsistent. From the System-Rank Theorem, we see
(1) rank (A) = 2 < 3 = rank [ A | ~b ] , so the system is inconsistent.
(2) as the system is inconsistent, the System-Rank Theorem does not apply here.
(3) rank (A) = 2 < 3 = m so the system will not be consistent for every ~b ∈ R3 . Indeed,
as our work shows, the system is not consistent for ~b = (−4, −5, 7).
In our last example, it is tempting to think that the system [ A | ~b ] will be inconsistent for
every ~b ∈ R3 , however, this is not the case. If we take ~b = ~0, then our system becomes
2x1 + 12x2 − 8x3 = 0
2x1 + 13x2 − 6x3 = 0
−2x1 − 14x2 + 4x3 = 0
It isn’t difficult to see that x1 = x2 = x3 = 0 is a solution, so that this system is indeed
consistent. Of course, the question now is for which ~b ∈ R3 is this system consistent.
Example 20.12. Find an equation that b1 , b2 , b3 ∈ R must satisfy so that the system
2x1 + 12x2 − 8x3 = b1
2x1 + 13x2 − 6x3 = b2
−2x1 − 14x2 + 4x3 = b3
is consistent.
Solution. We look at the augmented matrix of this system, and carry it to REF.
[  2  12 −8 | b1 ]          [ 2 12 −8 | b1      ]          [ 2 12 −8 | b1             ]
[  2  13 −6 | b2 ]  R2−R1→  [ 0  1  2 | b2 − b1 ]    −→    [ 0  1  2 | b2 − b1        ]
[ −2 −14  4 | b3 ]  R3+R1   [ 0 −2 −4 | b3 + b1 ]  R3+2R2  [ 0  0  0 | −b1 + 2b2 + b3 ]
Since rank (A) = 2, we require rank [ A | ~b ] = 2 for consistency. Thus, we require that
−b1 + 2b2 + b3 = 0.
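We can reproduce this symbolically; the sketch below (our own illustration) uses SymPy to
apply the same row operations to the augmented matrix with symbolic constants:

from sympy import Matrix, symbols

b1, b2, b3 = symbols('b1 b2 b3')
M = Matrix([[2, 12, -8, b1],
            [2, 13, -6, b2],
            [-2, -14, 4, b3]])

M.row_op(1, lambda v, j: v - M[0, j])       # R2 - R1
M.row_op(2, lambda v, j: v + M[0, j])       # R3 + R1
M.row_op(2, lambda v, j: v + 2 * M[1, j])   # R3 + 2 R2

print(M.row(2))   # [0, 0, 0, -b1 + 2*b2 + b3], so consistency requires -b1 + 2*b2 + b3 = 0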
Lecture 21
Our last examples showed that the System-Rank Theorem did indeed use the rank of a
matrix to predict whether or not a system was consistent, and if it was consistent, how
many parameters the solution would have. Of course, we already knew the answers to these
problems as we had previously solved those systems. Here we look at another example of
using the System-Rank Theorem to predict how many solutions a system will have based
on the values of the coefficients in the system. In this situation, we are not concerned with
what the solutions are, but simply if solutions exist and how many solutions there are.
Example 21.1. Determine the values of k, ` ∈ R for which the following system has a unique
solution, no solutions, or infinitely many solutions:
2x1 + 6x2 = 5
4x1 + (k + 15)x2 = ` + 8
We carry [ A | ~b ] to REF:
[ 2      6 | 5     ]          [ 2     6 | 5     ]
[ 4 k + 15 | ` + 8 ]  R2−2R1→ [ 0 k + 3 | ` − 2 ]
If k + 3 ≠ 0, that is, if k ≠ −3, then rank (A) = 2 = rank [ A | ~b ], so the system is consistent
with 2 − rank (A) = 0 parameters and hence has a unique solution. If k + 3 = 0 and ` − 2 ≠ 0,
that is, if k = −3 and ` ≠ 2, then rank (A) = 1 < 2 = rank [ A | ~b ] so the system is inconsis-
tent and thus has no solutions. If ` − 2 = 0, that is if ` = 2, then rank (A) = 1 = rank [ A | ~b ]
so the system is consistent with 2−rank (A) = 2−1 = 1 parameter. Hence we have infinitely
many solutions.
In summary,
Unique Solution : k 6= −3
No Solutions : k = −3 and ` 6= 2
Infinitely Many Solutions : k = −3 and ` = 2
130
Definition 21.2. A linear system of m equations in n variables is underdetermined if n > m,
that is, if it has more variables than equations.
Example 21.3. The linear system of equations
x1 + x 2 − x3 + x4 − x5 = 1
x1 − x2 − 3x3 + 2x4 + 2x5 = 7
is underdetermined.
Theorem 21.4. A consistent underdetermined linear system of equations has infinitely many
solutions.
Proof. Consider a consistent underdetermined linear system of m equations in n variables
with augmented matrix [ A | ~b ]. Since rank (A) ≤ min{m, n} = m, the system will have
n − rank (A) ≥ n − m > 0 parameters and so will have infinitely many solutions.
Definition 21.5. A linear system of m equations in n variables is overdetermined if n < m,
that is, if it has more equations than variables.
Example 21.6. The linear system of equations
−2x1 + x2 = 2
x1 − 3x2 = 4
3x1 + 2x2 = 7
is overdetermined.
Note that overdetermined systems are often inconsistent. Indeed, the system in the previous
example is inconsistent. To see why this is, consider for example, three lines in R2 (so a
system of three equations in two variables like the one in the previous example). When
chosen arbitrarily, it is generally unlikely that all three lines would intersect in a common
point and hence we would generally expect no solutions.
131
A system of linear equations is homogeneous if the constant term in every equation is zero.
As this is still a linear system of equations, we use our usual techniques to solve such systems.
However, notice that x1 = x2 = · · · = xn = 0 satisfies each equation in the homogeneous
system, and thus ~0 ∈ Rn is a solution to this system, called the trivial solution. As every
homogeneous system has a trivial solution, we see immediately that homogeneous linear
systems of equations are always consistent.
Example 21.9. Solve the homogeneous system of linear equations
x1 + x2 + x3 = 0
3x2 − x3 = 0
Solution. We have
" # " # " #
1 1 1 0 −→ 1 1 1 0 R1 −R2 1 0 4/3 0
0 3 −1 0 1
R
3 2
0 1 −1/3 0 −→ 0 1 −1/3 0
so
x1 = − 34 t x1 −4/3
x2 = 13 t, t ∈ R or x2 = t 1/3 , t ∈ R.
x3 = t x3 1
We make a few remarks about this example:
• Note that taking t = 0 gives the trivial solution. However, as our system was underde-
termined, we have infinitely many solutions. Indeed, the solution set is actually a line
through the origin.
• We could also write the solution as
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = s\begin{bmatrix} -4 \\ 1 \\ 3 \end{bmatrix}, \ s \in \mathbb{R}, \]
where s = t/3. Hence we can let the parameter “absorb” the factor of 1/3. This is not
necessary, but is useful if one wishes to eliminate fractions.
• When working with homogeneous systems of linear equations, notice that the aug-
mented matrix [ A | ~0 ] will always have the last column containing all zero entries.
Thus, it is common to row reduce only the coefficient matrix.
132
Example 21.10. If we solve the system
x1 + x2 + x3 = 1
3x2 − x3 = 3
we obtain
" # " # " #
1 1 1 1 −→ 1 1 1 1 R1 −R2 1 0 4/3 0
0 3 −1 3 1
R
3 2
0 1 −1/3 1 −→ 0 1 −1/3 1
so
x1 = − 34 t x1 0 −4/3
x2 = 1 + 13 t, t∈R or x2 = 1 + t 1/3 , t ∈ R.
x3 = t x3 0 1
Note that the solution to the associated homogeneous system (from Example 21.9) is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix}, \ t \in \mathbb{R} \]
so we view the homogeneous solution from Example 21.9 as a line, say L0 , through the
origin, and the solution
from Example 21.10 as a line, say L1 , through P (0, 1, 0) parallel
to L0 . We refer to [ 0 1 0 ]T as a particular solution to the system in Example 21.10 and note
that in general, the solution to a non-homogeneous system of linear equations is a particular
solution plus the solution to the associated homogeneous system of linear equations, provided
the non-homogeneous system of linear equations is consistent.
In the present example,
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}}_{\substack{\text{particular} \\ \text{solution}}} + \underbrace{t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix}}_{\substack{\text{associated} \\ \text{homogeneous} \\ \text{solution}}}, \quad t \in \mathbb{R}. \]
Example 21.11. Consider the system
x1 + 6x2 − x4 = −1
x3 + 2x4 = 7
133
We know from Example 19.4 that the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 7 \\ 0 \end{bmatrix} + s\begin{bmatrix} -6 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}, \]
while the solution to the associated homogeneous system
x1 + 6x2 − x4 = 0
x3 + 2x4 = 0
is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} -6 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}, \]
which we recognize as a plane through the origin in R4 since the two vectors appearing in
the solution are nonzero and nonparallel.
From Examples 21.9 and 21.11 we saw that our solutions sets were lines and planes through
the origin which we recognize as subspaces. The following theorem shows that the solution
set to any homogeneous system in n variables will indeed be a subspace of Rn .
Theorem 21.12. Let S be the solution set to a homogeneous system of m linear equations
in n variables. Then S is a subspace of Rn .
Proof. Since the system has n variables, S ⊆ Rn and since the system is homogeneous, ~0 ∈ S
so S is nonempty. Now let
\[ \vec y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \quad\text{and}\quad \vec z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} \]
be vectors in S. To show that S is closed under vector addition and scalar multiplication, it
is enough to consider one arbitrary equation of the system:
a1 x1 + · · · + an xn = 0.
Since ~y and ~z are solutions to the system, they satisfy this equation, so
a1 y1 + · · · + an yn = 0 = a1 z1 + · · · + an zn .
134
It follows that
a1 (y1 + z1 ) + · · · + an (yn + zn ) = a1 y1 + · · · + an yn + a1 z1 + · · · + an zn = 0 + 0 = 0
so ~y + ~z satisfies any equation of the system and thus ~y + ~z ∈ S. For c ∈ R,
a1 (cy1 ) + · · · + an (cyn ) = c(a1 y1 + · · · + an yn ) = c(0) = 0
so c~y ∈ S. Hence S is a subspace of Rn .
Note that we call the solution set of a homogeneous system the solution space of the system.
Example 21.13. Solve the homogeneous system of linear equations
4x1 − 2x2 + 3x3 + 5x4 = 0
8x1 − 4x2 + 6x3 + 11x4 = 0
−4x1 + 2x2 − 3x3 − 7x4 = 0
Solution. Row reducing the coefficient matrix to RREF gives
\[ \begin{bmatrix} 4 & -2 & 3 & 5 \\ 8 & -4 & 6 & 11 \\ -4 & 2 & -3 & -7 \end{bmatrix} \xrightarrow[R_3+R_1]{R_2-2R_1} \begin{bmatrix} 4 & -2 & 3 & 5 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -2 \end{bmatrix} \xrightarrow{R_3+2R_2} \begin{bmatrix} 4 & -2 & 3 & 5 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & -1/2 & 3/4 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
so
\[ \begin{aligned} x_1 &= \tfrac12 s - \tfrac34 t \\ x_2 &= s \\ x_3 &= t \\ x_4 &= 0 \end{aligned} \qquad s, t \in \mathbb{R}, \quad\text{or}\quad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} 1/2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -3/4 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
Taking
\[ B = \left\{ \begin{bmatrix} 1/2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -3/4 \\ 0 \\ 1 \\ 0 \end{bmatrix} \right\}, \]
we can express the solution set S of our homogeneous system of linear equations as S =
Span B. As B contains two vectors that are not scalar multiples of one another, we have
that B is a basis for S. We see that S is a plane through the origin in R4 .
135
Lecture 22
Consider the homogeneous system of linear equations
x 1 + x2 + x3 + 4x5 = 0
x4 + 2x5 = 0
The coefficient matrix is already in RREF, and setting x2 = t1 , x3 = t2 and x5 = t3 gives
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = t_1\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + t_2\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_3\begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix} \]
with t1 , t2 , t3 ∈ R, so
\[ B = \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix} \right\} \]
is a spanning set for the solution space S of the system. We check B for linear independence.
Note however that the variables x2 , x3 and x5 are free variables. If we consider the second,
third and fifth entries in vectors of our spanning set
\[ B = \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix} \right\} \]
we see that each vector has a 1 where the other two vectors have zeros in that same position.
Thus no vector in B is in the span of the others, and so B is linearly independent by Theorem
14.5. Hence B is a basis for the solution space S.
36
Remember that for homogeneous systems of linear equations, we normally row reduce just the coefficient
matrix.
136
Theorem 22.1. Let [ A | ~b ] be the augmented matrix for a consistent system of m linear
equations in n variables. If rank (A) = k < n, then the general solution of the system is of
the form
~x = d~ + t1~v1 + · · · + tn−k~vn−k
where d~ ∈ Rn , t1 , . . . , tn−k ∈ R and the set {~v1 , . . . , ~vn−k } ⊆ Rn is linearly independent. In
particular, the solution set is an (n − k)−flat in Rn .
Note that if rank (A) = n in the above theorem, then there are n − n = 0 parameters and so
our solution ~x = d~ is unique.
When solving a homogeneous system of linear equations, we see that the spanning set for
the solution space we find by solving the system is linearly independent. However, given
an arbitrary spanning set B for a subspace of Rn , we cannot assume that B is linearly
independent, and so we must still check. We now show a faster way to do so.
Consider
\[ \vec v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 3 \end{bmatrix}, \quad \vec v_2 = \begin{bmatrix} 2 \\ 2 \\ 4 \\ 6 \end{bmatrix}, \quad \vec v_3 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \quad\text{and}\quad \vec v_4 = \begin{bmatrix} 5 \\ 7 \\ 12 \\ 17 \end{bmatrix} \]
and let B = {~v1 , ~v2 , ~v3 , ~v4 } and S = Span B. We wish to find a basis B 0 for S with B 0 ⊆ B.
That is, find a linearly independent subset B 0 of B with Span B 0 = S. For c1 , c2 , c3 , c4 ∈ R,
considering
c1~v1 + c2~v2 + c3~v3 + c4~v4 = ~0
gives a homogeneous system whose coefficient matrix we carry to RREF:
\[ \begin{bmatrix} 1 & 2 & 1 & 5 \\ 1 & 2 & 2 & 7 \\ 2 & 4 & 3 & 12 \\ 3 & 6 & 4 & 17 \end{bmatrix} \xrightarrow[\substack{R_2-R_1 \\ R_3-2R_1 \\ R_4-3R_1}]{} \begin{bmatrix} 1 & 2 & 1 & 5 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 1 & 2 \end{bmatrix} \xrightarrow[\substack{R_1-R_2 \\ R_3-R_2 \\ R_4-R_2}]{} \begin{bmatrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
We see that c2 and c4 are free variables so we obtain nontrivial solutions to the system and
hence B is linearly dependent. Our work with bases thus far has shown us that since we can
find solutions with c2 6= 0 and c4 6= 0, we can remove one of ~v2 or ~v4 from B and then test
the resulting smaller set for linear independence. We show here that we can simply remove
both ~v2 and ~v4 and arrive at B 0 = {~v1 , ~v3 } as our basis for S immediately.
To begin, note that c1 and c3 were leading variables in the above system. Using our work
above, we see that by considering the homogeneous system
c1~v1 + c3~v3 = ~0
137
we obtain
\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \]
which has only the trivial solution, so {~v1 , ~v3 } is linearly independent. If we try to write ~v4 as
a linear combination of ~v1 , ~v2 and ~v3 , we obtain the system with augmented matrix
\[ \begin{bmatrix} 1 & 2 & 1 & 5 \\ 1 & 2 & 2 & 7 \\ 2 & 4 & 3 & 12 \\ 3 & 6 & 4 & 17 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
The system is consistent (with infinitely many solutions), so ~v4 ∈ Span {~v1 , ~v2 , ~v3 } and so by
Theorem 13.8, Span {~v1 , ~v2 , ~v3 , ~v4 } = Span {~v1 , ~v2 , ~v3 } so we “discard” ~v4 . Now, if we try to
express ~v2 as a linear combination of ~v1 , we obtain the system with augmented matrix
\[ \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \]
which is also consistent (with a unique solution) so ~v2 ∈ Span {~v1 } ⊆ Span {~v1 , ~v3 } and we
have that Span {~v1 , ~v2 , ~v3 } = Span {~v1 , ~v3 } by Theorem 13.8. We will thus “discard” ~v2 . In
summary, we’ve shown
S = Span B = Span {~v1 , ~v2 , ~v3 , ~v4 } = Span {~v1 , ~v2 , ~v3 } = Span {~v1 , ~v3 }
with {~v1 , ~v3 } linearly independent. Hence B 0 = {~v1 , ~v3 } is a basis for S.
Thus, we see that given a spanning set B = {~v1 , . . . , ~vk } for a subspace S of Rn , to find a
basis B 0 for S with B 0 ⊆ B, we construct the matrix [ ~v1 · · · ~vk ] which we carry to (reduced)
row echelon form. For i = 1, . . . , k, take ~vi ∈ B 0 if and only if the ith column of any REF of
our matrix has a leading entry. We also see that for ~vj ∉ B 0 , ~vj can be expressed as a linear
combination of the vectors in {~v1 , . . . , ~vj−1 } ∩ B 0 .
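The procedure just described is easy to automate. The sketch below (Python with sympy, purely illustrative) applies it to the vectors ~v1 , ~v2 , ~v3 , ~v4 considered above: the pivot columns of the RREF tell us which of the original vectors to keep.

```python
from sympy import Matrix

v1 = Matrix([1, 1, 2, 3])
v2 = Matrix([2, 2, 4, 6])
v3 = Matrix([1, 2, 3, 4])
v4 = Matrix([5, 7, 12, 17])

M = Matrix.hstack(v1, v2, v3, v4)   # columns are the spanning vectors
R, pivots = M.rref()
print(pivots)                        # (0, 2): the 1st and 3rd columns have leading entries
basis = [M[:, j] for j in pivots]    # so B' = {v1, v3}
print(basis)
```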
Example 22.2. Let
\[ B = \left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \\ -7 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ -9 \end{bmatrix} \right\}. \]
Find a basis B 0 for S = Span B with B 0 ⊆ B.
138
Solution. We have
\[ \begin{bmatrix} 1 & 1 & 1 & 3 \\ -1 & 2 & 5 & 6 \\ 1 & -3 & -7 & -9 \end{bmatrix} \xrightarrow[R_3-R_1]{R_2+R_1} \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 3 & 6 & 9 \\ 0 & -4 & -8 & -12 \end{bmatrix} \xrightarrow{R_3+\frac43 R_2} \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 3 & 6 & 9 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
As only the first two columns of an REF of our matrix contain leading entries, the first two
vectors in B comprise B 0 , that is
\[ B' = \left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix} \right\}. \]
Carrying the matrix all the way to RREF gives
\[ \begin{bmatrix} 1 & 1 & 1 & 3 \\ -1 & 2 & 5 & 6 \\ 1 & -3 & -7 & -9 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \tag{$*$} \]
Note that the third and fourth columns of the RREF do not contain leading ones. We see
that those vectors in B not taken in B 0 satisfy
\[ \begin{bmatrix} 1 \\ 5 \\ -7 \end{bmatrix} = -1\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix} \quad\text{since}\quad \begin{bmatrix} 1 & 1 & 1 \\ -1 & 2 & 5 \\ 1 & -3 & -7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{(omit the 4th columns from the matrices in ($*$))} \]
\[ \begin{bmatrix} 3 \\ 6 \\ -9 \end{bmatrix} = 0\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix} \quad\text{since}\quad \begin{bmatrix} 1 & 1 & 3 \\ -1 & 2 & 6 \\ 1 & -3 & -9 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{(omit the 3rd columns from the matrices in ($*$))} \]
Dimension
Let S be a subspace of Rn and B = {~v1 , ~v2 } be a basis for S. If C = {~w1 , ~w2 , ~w3 } is a set of
vectors in S, then C must be linearly dependent. To see this, note that since B is a basis
for S, Theorem 15.6 gives that there are unique a1 , a2 , b1 , b2 , c1 , c2 ∈ R so that
\[ \vec w_1 = a_1\vec v_1 + a_2\vec v_2, \qquad \vec w_2 = b_1\vec v_1 + b_2\vec v_2 \qquad\text{and}\qquad \vec w_3 = c_1\vec v_1 + c_2\vec v_2. \]
139
For t1 , t2 , t3 ∈ R, consider t1 ~w1 + t2 ~w2 + t3 ~w3 = ~0. Then
\[ \vec 0 = t_1(a_1\vec v_1 + a_2\vec v_2) + t_2(b_1\vec v_1 + b_2\vec v_2) + t_3(c_1\vec v_1 + c_2\vec v_2) = (a_1t_1 + b_1t_2 + c_1t_3)\vec v_1 + (a_2t_1 + b_2t_2 + c_2t_3)\vec v_2 . \]
Since B is linearly independent, this forces
a1 t1 + b1 t2 + c1 t3 = 0
a2 t1 + b2 t2 + c2 t3 = 0
which is an underdetermined homogeneous system of two equations in the three unknowns t1 , t2 , t3 , so it has nontrivial solutions. Hence C is linearly dependent.
of R3 had basis
\[ B = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\} \]
so dim(S) = 2.
140
Theorem 22.8. If S is a k−dimensional subspace of Rn with k > 0, then
belong to S. Since ~v1 and ~v2 are nonzero and nonparallel, we have that {~v1 , ~v2 } is a linearly
independent set of two vectors in S. Since dim(S) = 2, we have that S = Span {~v1 , ~v2 } by
Theorem 22.8(3). Thus {~v1 , ~v2 } is a basis for S.
Note that we must know dim(S) before we use Theorem 22.8. In the previous example, we
could not have used the linear independence of {~v1 , ~v2 } to conclude that S = Span {~v1 , ~v2 }
if we weren’t told the dimension of S.
141
Lecture 23
We now begin to look at some applications of systems of linear equations.
H2 + O2 −→ H2 O
The process by which molecules combine to form new molecules is called a chemical reaction.
Note that each hydrogen molecule is composed of two hydrogen atoms, each oxygen molecule
is composed of two oxygen atoms, and that each water molecule is composed of two hydrogen
atoms and one oxygen atom. Our goal is to balance this chemical reaction, that is, compute
how many hydrogen molecules and how many oxygen molecules are needed so that there are
the same number of atoms of each type both before and after the chemical reaction takes
place. By inspection, we find that
2H2 + O2 −→ 2H2 O
That is, two hydrogen molecules and one oxygen molecule combine to create two water
molecules. Before this chemical reaction takes place, there are four hydrogen atoms and
two oxygen atoms. After the reaction, there are again four hydrogen atoms and two oxygen
atoms. Thus we have balanced the chemical reaction.
Consider next the reaction for photosynthesis, in which carbon dioxide (CO2 ) and water (H2 O) combine to produce glucose (C6 H12 O6 ) and oxygen (O2 ). Letting x1 , x2 , x3 and x4 denote the number of molecules of each type, we must balance
x1 CO2 + x2 H2 O −→ x3 C6 H12 O6 + x4 O2
Equating the number of atoms of each type before and after the reaction gives the equations
C: x1 = 6x3
O : 2x1 + x2 = 6x3 + 2x4
H: 2x2 = 12x3
142
Moving all variables to the left in each equation gives the homogeneous system
x1 − 6x3 = 0
2x1 + x2 − 6x3 − 2x4 = 0
2x2 − 12x3 = 0
Row reducing the augmented matrix of this system to RREF gives
\[ \begin{bmatrix} 1 & 0 & -6 & 0 & 0 \\ 2 & 1 & -6 & -2 & 0 \\ 0 & 2 & -12 & 0 & 0 \end{bmatrix} \xrightarrow[\frac12 R_3]{R_2-2R_1} \begin{bmatrix} 1 & 0 & -6 & 0 & 0 \\ 0 & 1 & 6 & -2 & 0 \\ 0 & 1 & -6 & 0 & 0 \end{bmatrix} \xrightarrow{R_3-R_2} \begin{bmatrix} 1 & 0 & -6 & 0 & 0 \\ 0 & 1 & 6 & -2 & 0 \\ 0 & 0 & -12 & 2 & 0 \end{bmatrix} \xrightarrow{-\frac12 R_3} \begin{bmatrix} 1 & 0 & -6 & 0 & 0 \\ 0 & 1 & 6 & -2 & 0 \\ 0 & 0 & 6 & -1 & 0 \end{bmatrix} \]
\[ \xrightarrow[R_2-R_3]{R_1+R_3} \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 6 & -1 & 0 \end{bmatrix} \xrightarrow{\frac16 R_3} \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -1/6 & 0 \end{bmatrix} \]
We see that for t ∈ R,
x1 = t, x2 = t, x3 = t/6 and x4 = t
There are infinitely many solutions to the homogeneous system. However, since we cannot
have a fractional number of molecules, we require that x1 , x2 , x3 and x4 be nonnegative
integers. This implies that t should be an integer multiple of 6. Moreover, we wish to have
the simplest (or smallest) solution, so we will take t = 6. This gives x1 = x2 = x4 = 6 and
x3 = 1. Thus,
6CO2 + 6H2 O −→ C6 H12 O6 + 6O2
balances the chemical reaction.
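Balancing a reaction is exactly a nullspace computation, so it can also be delegated to a computer. Here is a short illustrative Python/sympy sketch for the photosynthesis reaction above; the call to lcm simply clears denominators to obtain the smallest whole-number solution.

```python
from math import lcm
from sympy import Matrix

# Variables x1, x2, x3, x4 count CO2, H2O, C6H12O6 and O2 molecules.
# Rows balance carbon, oxygen and hydrogen atoms (products moved to the left).
A = Matrix([[1, 0, -6, 0],
            [2, 1, -6, -2],
            [0, 2, -12, 0]])

direction = A.nullspace()[0]                         # (1, 1, 1/6, 1) up to scaling
scale = lcm(*[int(entry.q) for entry in direction])  # clear the denominators
print(direction * scale)                             # (6, 6, 1, 6)
```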
Example 23.1. The fermentation of sugar is a chemical reaction given by the following
equation:
C6 H12 O6 −→ CO2 + C2 H5 OH
where C6 H12 O6 is glucose, CO2 is carbon dioxide and C2 H5 OH is ethanol37 . Balance this
chemical reaction.
Solution. Let x1 denote the number of C6 H12 O6 molecules, x2 the number of CO2 molecules
and x3 the number of C2 H5 OH molecules. We obtain
x1 C6 H12 O6 −→ x2 CO2 + x3 C2 H5 OH
Equating the number of atoms of each type before and after the reaction gives the equations
C : 6x1 = x2 + 2x3
O : 6x1 = 2x2 + x3
H : 12x1 = 6x3
37
Ethanol is also denoted by C2 H6 O and CH3 CH2 OH
143
which leads to the homogeneous system of equations
6x1 − x2 − 2x3 = 0
6x1 − 2x2 − x3 = 0
12x1 − 6x3 = 0
Row reducing the augmented matrix to RREF gives
\[ \begin{bmatrix} 6 & -1 & -2 & 0 \\ 6 & -2 & -1 & 0 \\ 12 & 0 & -6 & 0 \end{bmatrix} \xrightarrow[R_3-2R_1]{R_2-R_1} \begin{bmatrix} 6 & -1 & -2 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 2 & -2 & 0 \end{bmatrix} \xrightarrow[-R_2]{R_3+2R_2} \begin{bmatrix} 6 & -1 & -2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -1/2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
Thus, for t ∈ R,
x1 = t/2, x2 = t and x3 = t
Taking t = 2 gives the smallest nonnegative integer solution, x1 = 1, x2 = 2 and x3 = 2, and we conclude that
C6 H12 O6 −→ 2CO2 + 2C2 H5 OH
balances the chemical reaction.
Example 23.2. Four industries A1 , A2 , A3 and A4 each burn coal, and each unit of coal burned releases the following numbers of units of the pollutants Pb, SO2 and NO2 into the atmosphere:
Industry   A1   A2   A3   A4
Pb          1    0    1    7
SO2         2    1    2    9
NO2         0    2    2    0
By law, an industry may burn at most 45 units of coal per day.
The CAAG (Clean Air Action Group) has just leaked a government report that claims that
on one day last year, 250 units of Pb, 550 units of SO2 and 400 units of NO2 were measured
in the atmosphere. An inspector reported that A3 did not break the law on that day. Which
industry (or industries) broke the law on that day?
144
Solution. Let ai denote the number of units of coal burned by Industry Ai , for i = 1, 2, 3, 4.
Using the above table, we account for each of the pollutants on that day.
Pb : a1 + a3 + 7a4 = 250
SO2 : 2a1 + a2 + 2a3 + 9a4 = 550
NO2 : 2a2 + 2a3 = 400
Row reducing the augmented matrix to RREF gives
\[ \begin{bmatrix} 1 & 0 & 1 & 7 & 250 \\ 2 & 1 & 2 & 9 & 550 \\ 0 & 2 & 2 & 0 & 400 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 0 & 2 & 100 \\ 0 & 1 & 0 & -5 & 50 \\ 0 & 0 & 1 & 5 & 150 \end{bmatrix} \]
so a1 = 100 − 2t, a2 = 50 + 5t, a3 = 150 − 5t and a4 = t,
where t ∈ R. Now we look for conditions on t. We know A3 did not break that law, so
0 ≤ a3 ≤ 45, that is,
0 ≤ 150 − 5t ≤ 45
−150 ≤ −5t ≤ −105
30 ≥ t ≥ 21
It immediately follows that A4 didn’t break that law as a4 = t. Looking at A2 , we have
21 ≤ t ≤ 30
105 ≤ 5t ≤ 150
155 ≤ 50 + 5t ≤ 200
155 ≤ a2 ≤ 200
so A2 definitely broke the law. Looking at A1 , we have
21 ≤ t ≤ 30
−42 ≥ −2t ≥ −60
58 ≥ 100 − 2t ≥ 40
58 ≥ a1 ≥ 40
so it is possible that A1 broke the law, but we cannot be sure without more information.
145
Example 23.3. An engineering company has three divisions (Design, Production, Testing)
with a combined annual budget of $1.5 million. Production has an annual budget equal to
the combined annual budgets of Design and Testing. Testing requires a budget of at least
$80 000. What is the Production budget and the maximum possible budget for the Design
division?
Solution. Let x1 denote the annual Design budget, x2 the annual Production budget, and
x3 the annual Testing budget. It follows that x1 + x2 + x3 = 1 500 000. Since the annual
Production budget is equal the the combined Design and Testing budgets, we have x2 =
x1 + x3 . This gives the system of equations
x1 + x2 + x3 = 1 500 000
x1 − x2 + x3 = 0
This gives
x1 = 750 000 − t, x2 = 750 000, x3 = t
where t ∈ R. We know that the Testing budget requires at least $80 000 and can re-
ceive no more than $750 000 (since Testing shares a budget of $750 000 with Design). Thus
80 000 ≤ t ≤ 750 000. It follows that
80 000 ≤ x3 ≤ 750 000 and 0 ≤ x1 = 750 000 − t ≤ 670 000.
Hence the Production budget is $750 000 and the maximum Design budget is $670 000.
146
Lecture 24
Junction Rule: At each of the junctions (or nodes) in the network, the flow into that
junction must equal the flow out of that junction.
Our goal is to achieve a network such that every junction obeys the Junction Rule. We say
that such a system is in a steady state or equilibrium.
Figure 47 below gives an example of a network with four nodes, A, B, C and D, and eight
directed line segments. We wish to compute all possible values of f1 , f2 , f3 and f4 so that
the system is in equilibrium.
147
Using the Junction Rule at each node, we construct the following table:
Rearranging each of the above four linear equations leads to the following system:
f1 + f4 = 40
f1 + f2 = 50
f2 + f3 = 60
f3 + f4 = 50
We find that
f1 = 40 − t, f2 = 10 + t, f3 = 50 − t and f4 = t
where t ∈ R. We see that there are infinitely many values for f1 , f2 , f3 and f4 so that the
system is in equilibrium. Note that a negative solution for one of the variables means that the
flow is in the opposite direction than the one indicated in the diagram. Depending on what
the network is representing, we may require that each of f1 , f2 , f3 and f4 be nonnegative.
In this case,
f1 ≥ 0 =⇒ 40 − t ≥ 0 =⇒ t ≤ 40
f2 ≥ 0 =⇒ 10 + t ≥ 0 =⇒ t ≥ −10
f3 ≥ 0 =⇒ 50 − t ≥ 0 =⇒ t ≤ 50
f4 ≥ 0 =⇒ t≥0
Here, we see that 0 ≤ t ≤ 40. There may be more constraints on f1 , f2 , f3 and f4 . For exam-
ple, if the flows in the above network represent the number of automobiles moving between
148
the junctions, then we further require f1 , f2 , f3 and f4 to be integers. In our example, this
would make t = 0, 1, 2, . . . 40, giving us 41 possible solutions.
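As a quick illustration (not part of the original example), one can simply enumerate the parameter t to list all 41 nonnegative integer equilibrium flows found above.

```python
# Parametric solution of the network: f1 = 40 - t, f2 = 10 + t, f3 = 50 - t, f4 = t.
solutions = []
for t in range(0, 41):                    # 0 <= t <= 40 keeps every flow nonnegative
    f = (40 - t, 10 + t, 50 - t, t)
    assert all(x >= 0 for x in f)
    solutions.append(f)

print(len(solutions))     # 41
print(solutions[0])       # (40, 10, 50, 0)
print(solutions[-1])      # (0, 50, 10, 40)
```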
When using linear algebra to model real world problems, we must be able to interpret our
solutions in terms of the problem it is modelling. This includes incorporating any real world
restrictions imposed by the system we are modelling.
Example 24.1. Consider four train stations labelled A, B, C and D. In the figure below, the
directed line segments represent train tracks to and from stations, and the numbers represent
the number of trains travelling on that track per day. Assume the tracks are one-way, so
trains may not travel in the other direction.
a) Find all possible values of f1 , f2 , f3 , f4 and f5 so that the system is in equilibrium.
b) Suppose the tracks from A to C and from D to A are closed due to maintenance. Is it
still possible for the system to be in equilibrium?
Solution.
a) We construct a table:
149
Rearranging gives the linear system of equations
f1 − f4 + f5 = 5
f1 − f2 = −10
f2 − f3 + f5 = 10
f3 − f4 = 5
which we carry to RREF
\[ \begin{bmatrix} 1 & 0 & 0 & -1 & 1 & 5 \\ 1 & -1 & 0 & 0 & 0 & -10 \\ 0 & 1 & -1 & 0 & 1 & 10 \\ 0 & 0 & 1 & -1 & 0 & 5 \end{bmatrix} \xrightarrow{R_2-R_1} \begin{bmatrix} 1 & 0 & 0 & -1 & 1 & 5 \\ 0 & -1 & 0 & 1 & -1 & -15 \\ 0 & 1 & -1 & 0 & 1 & 10 \\ 0 & 0 & 1 & -1 & 0 & 5 \end{bmatrix} \xrightarrow{-R_2} \begin{bmatrix} 1 & 0 & 0 & -1 & 1 & 5 \\ 0 & 1 & 0 & -1 & 1 & 15 \\ 0 & 1 & -1 & 0 & 1 & 10 \\ 0 & 0 & 1 & -1 & 0 & 5 \end{bmatrix} \]
\[ \xrightarrow{R_3-R_2} \begin{bmatrix} 1 & 0 & 0 & -1 & 1 & 5 \\ 0 & 1 & 0 & -1 & 1 & 15 \\ 0 & 0 & -1 & 1 & 0 & -5 \\ 0 & 0 & 1 & -1 & 0 & 5 \end{bmatrix} \xrightarrow{R_4+R_3} \begin{bmatrix} 1 & 0 & 0 & -1 & 1 & 5 \\ 0 & 1 & 0 & -1 & 1 & 15 \\ 0 & 0 & -1 & 1 & 0 & -5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{-R_3} \begin{bmatrix} 1 & 0 & 0 & -1 & 1 & 5 \\ 0 & 1 & 0 & -1 & 1 & 15 \\ 0 & 0 & 1 & -1 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]
giving
f1 = 5 + s − t, f2 = 15 + s − t, f3 = 5 + s, f4 = s and f5 = t
for integers s, t (as we cannot have fractional trains). Moreover, as trains cannot go
the other way, we immediately have
f1 ≥0 =⇒ 5 + s − t ≥ 0 =⇒ s − t ≥ −5
f2 ≥0 =⇒ 15 + s − t ≥ 0 =⇒ s − t ≥ −15
f3 ≥0 =⇒ 5+s≥0 =⇒ s ≥ −5
f4 ≥0 =⇒ s≥0
f5 ≥0 =⇒ t≥0
so we have s, t ≥ 0 and s − t ≥ −5.
b) Assume the tracks from A to C and from D to A are closed. This forces f4 = f5 = 0.
From our previous solution, we have that s = t = 0. Since s − t = 0 ≥ −5, this is a
valid solution. We have
f1 = 5, f2 = 15, f3 = 5, f4 = 0 and f5 = 0
Notice here we have a unique solution.
150
Application: Electrical Networks
Consider the following electrical network shown in Figure 48:
It consists of voltage sources, resistors and wires. A voltage source (often a battery) provides
an electromotive force V measured in volts. This electromotive force moves electrons through
the network along a wire at a rate we refer to as current I measured in amperes (or amps).
The resistors (lightbulbs for example) are measured in ohms Ω, and serve to retard the
current by slowing the flow of electrons. The intersection point between three or more wires
is called a node. The nodes break the wires up into short paths between two nodes. Every
such path can have a different current, and the arrow on each path is called a reference
direction. Pictured here is a voltage source (left) and a resistor (right) between two nodes.
One remark about voltage sources. If a current passes through a battery supplying V volts
from the “−” to the “+”, then there is a voltage increase of V volts. If the current passes
through the same battery from the “+” to the “−”, then there is a voltage drop (decrease)
of V volts.
151
Our aim is to compute the currents I1 , I2 and I3 in Figure 48. The following laws will be
useful.
Ohm’s Law The potential difference V across a resistor is given by V = IR, where I is the
current and R is the resistance.
Note that the reference direction is important when using Ohm’s Law. A current I travelling
across a resistor of 10Ω in the reference direction will result in a voltage drop of 10I while
the same current travelling across the same resistor against the reference direction will result
in a voltage gain of 10I.
Kirchhoff’s Laws
1. Conservation of Energy: Around any closed voltage loop in the network, the algebraic
sum of voltage drops and voltage increases caused by resistors and voltage sources is
zero.
2. Conservation of Charge: At each node, the total inflow of current equals the total
outflow of current.
Kirchhoff’s Laws will be used to derive a system of equations that we can solve in order to find
the currents. The Conservation of Energy requires using Ohm’s Law. Returning to Figure
48, we can now solve for I1 , I2 and I3 . Notice that there is an upper loop, and a lower loop.
We may choose any orientation we like for either loop. Given the reference directions, we
will use a clockwise orientation for the upper loop and a counterclockwise orientation for the
lower loop. We will compute the voltage increases and drops as we move around both loops.
Conservation of Energy says the voltage drops must equal the voltage gains around each loop.
For the upper loop, we can start at node A. Moving clockwise, we first have a voltage gain
of 5 from the battery, then a voltage drop of 5I1 at the 5Ω resistor and a 10I2 voltage drop
at the 10Ω resistor. Thus
5 = 5I1 + 10I2 , that is, 5I1 + 10I2 = 5. (15)
For the lower loop, we can again start at node A. Moving counterclockwise, we have a
voltage drop of 5I3 followed by a voltage increase of 10 and finally a voltage drop of 10I2 .
We have
10 = 5I3 + 10I2 , that is, 10I2 + 5I3 = 10. (16)
Finally, the Conservation of Charge at node A gives
I1 − I2 + I3 = 0 (17)
152
Note that at node B we obtain the same equation, so including it would be redundant.
Combining equations (15), (16) and (17) gives the system of equations
I1 − I2 + I3 = 0
5I1 + 10I2 = 5
10I2 + 5I3 = 10
Carrying the augmented matrix of this system to RREF,
\[ \begin{bmatrix} 1 & -1 & 1 & 0 \\ 5 & 10 & 0 & 5 \\ 0 & 10 & 5 & 10 \end{bmatrix} \xrightarrow[\frac15 R_2,\ \frac15 R_3]{R_2-5R_1} \begin{bmatrix} 1 & -1 & 1 & 0 \\ 0 & 3 & -1 & 1 \\ 0 & 2 & 1 & 2 \end{bmatrix} \xrightarrow{R_2-R_3} \begin{bmatrix} 1 & -1 & 1 & 0 \\ 0 & 1 & -2 & -1 \\ 0 & 2 & 1 & 2 \end{bmatrix} \xrightarrow[R_3-2R_2]{R_1+R_2} \begin{bmatrix} 1 & 0 & -1 & -1 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & 5 & 4 \end{bmatrix} \]
\[ \xrightarrow{\frac15 R_3} \begin{bmatrix} 1 & 0 & -1 & -1 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & 1 & 4/5 \end{bmatrix} \xrightarrow[R_2+2R_3]{R_1+R_3} \begin{bmatrix} 1 & 0 & 0 & -1/5 \\ 0 & 1 & 0 & 3/5 \\ 0 & 0 & 1 & 4/5 \end{bmatrix} \]
we see that I1 = −1/5 amps, I2 = 3/5 amps and I3 = 4/5 amps. Notice that I1 is negative.
This simply means that our reference direction for I1 in Figure 48 is incorrect and the cur-
rent flows in the opposite direction there. Note that the reference directions may be assigned
arbitrarily.
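Since the circuit equations form a square system with a unique solution, they can also be solved numerically. A small Python sketch (numpy assumed; purely a check on the hand computation above):

```python
import numpy as np

# Equations (17), (15) and (16) for the circuit in Figure 48:
#   I1 - I2 + I3 = 0,   5*I1 + 10*I2 = 5,   10*I2 + 5*I3 = 10
A = np.array([[1, -1, 1],
              [5, 10, 0],
              [0, 10, 5]], dtype=float)
b = np.array([0, 5, 10], dtype=float)

I = np.linalg.solve(A, b)
print(I)    # [-0.2  0.6  0.8], i.e. I1 = -1/5, I2 = 3/5, I3 = 4/5 amps
```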
Note that there is actually a third loop in Figure 48: the loop that travels along the outside
of the network. If we start at node A and travel clockwise around this loop, we first have
a voltage increase of 5, then a voltage drop of 5I1 , then another voltage drop of 10 (as we
pass through the 10V battery from “+” to “−”) and finally a voltage increase of 5I3 (as we
pass through the 5Ω resistor in the opposite reference direction for I3 ). As voltage increases
equal voltage drops, we have 5 + 5I3 = 5I1 + 10, or 5I1 − 5I3 = −5. However, this is just
Equation (16) subtracted from Equation (15). Including this equation in our above system
of equations would only result in an extra row of zeros when we carried the resulting system
of equations to RREF. This will be true in general, and shows that when computing current
in an electrical network, we only need to consider the “smallest” loops.
Another note is that we chose to orient the upper loop in the clockwise direction and the
lower loop in the counterclockwise direction. This was totally arbitrary (but made sense
given the reference directions). We could have changed either of the directions. Of course,
as we saw in the previous paragraph, we have to consider which way our orientation will
cause the current to flow through a battery, and how to handle resistors if our orientation
has us moving in the opposite direction of a reference direction.
153
One last thing to notice here is that since I1 is negative, the current is actually flowing
backwards through the 5V battery. This can happen in a poorly designed electrical network
- the 10V battery is too strong and actually forces the current to travel through the 5V
battery in the wrong direction. Too much current being forced through a battery in the
wrong direction will lead to a fire.
Solution. We begin by using the Conservation of Energy on each of the three smallest closed
loops. Going clockwise around the left loop starting at A, we see a voltage drop of 20I2 , a
voltage gain of 10 and then a drop of 20I1 . This gives
20I1 + 20I2 = 10 or 2I1 + 2I2 = 1
Traversing the middle loop clockwise starting at A, we have a voltage drop of 20I3 followed
by a gain of 20I2 (note the we pass the resistor between A and C in the opposite direction
of I2 ). We obtain
20I2 = 20I3 or I2 − I3 = 0
Moving clockwise around the right loop starting at B, we observe a voltage gain of 20,
followed by a drop of 20I5 and then a gain of 20I3 leading to
20I5 = 20 + 20I3 or I3 − I5 = −1
Next, we apply the Conservation of Charge to the nodes A, B, C and D (in that order) to
obtain the equations
I1 − I2 − I4 =0
I3 − I4 + I5 =0
I1 − I2 − I6 =0
I3 + I5 − I6 =0
154
Finally, we have constructed the system of equations
2I1 + 2I2 = 1
I2 − I3 = 0
I3 − I5 = −1
I1 − I2 − I4 = 0
I3 − I4 + I5 = 0
I1 − I2 − I6 = 0
I3 + I5 − I6 = 0
Carrying the augmented matrix of this system to RREF (the intermediate row operations are lengthy and are omitted here) gives
\[ \begin{bmatrix} 2 & 2 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & -1 & 0 & -1 \\ 1 & -1 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 1 & -1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 5/8 \\ 0 & 1 & 0 & 0 & 0 & 0 & -1/8 \\ 0 & 0 & 1 & 0 & 0 & 0 & -1/8 \\ 0 & 0 & 0 & 1 & 0 & 0 & 3/4 \\ 0 & 0 & 0 & 0 & 1 & 0 & 7/8 \\ 0 & 0 & 0 & 0 & 0 & 1 & 3/4 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]
Finally, we see
I1 = 5/8 amps, I2 = −1/8 amps, I3 = −1/8 amps,
I4 = 3/4 amps, I5 = 7/8 amps, I6 = 3/4 amps.
In particular, the reference arrows for I2 and I3 are pointing in the wrong direction.
156
Lecture 25
Matrix Algebra
We first encountered matrices when we solved systems of equations, where we performed
elementary row operations to the augmented matrix or the coefficient matrix of the system.
Here, we look at matrices as their own algebraic objects, and we will find that they are not
so different from vectors in Rn .
Definition 25.1. An m × n matrix A is a rectangular array with m rows and n columns.
The entry in the ith row and jth column will be denoted by aij , that is38
\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn} \end{bmatrix} \]
which we sometimes abbreviate as A = [aij ] when the size of the matrix is known. Two
m × n matrices A and B are equal if aij = bij for all i = 1, . . . , m and j = 1, . . . , n, and we
write A = B. The set of all m × n matrices with real entries is denoted by Mm×n (R).
For a matrix A ∈ Mm×n (R), we say that A has size m × n and call aij the (i, j)−entry of A.
Note that we may write (A)ij instead of aij . If m = n, we say that A is a square matrix.
Example 25.2. Let
\[ A = \begin{bmatrix} 1 & 2 \\ 6 & 4 \\ 3 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 \\ 0 & \sin\pi \end{bmatrix} \]
Then A is a 3 × 2 matrix and B is a 2 × 2 square matrix.
Definition 25.3. The m × n matrix with all zero entries is called a zero matrix, denoted by
0m×n , or just 0 if the size is clear. Note that the matrix B in the previous example is the
2 × 2 zero matrix.
Definition 25.4. For A, B ∈ Mm×n (R) we define matrix addition as
(A + B)ij = (A)ij + (B)ij
and for c ∈ R, scalar multiplication is defined by
(cA)ij = c(A)ij
38
We normally use an uppercase letter to denote a matrix, such as A, B or C. We will then use aij , bij or
cij , respectively, to denote the entry in the ith row and jth column.
157
Example 25.5. Find a, b, c ∈ R such that
\[ \begin{bmatrix} a & b & c \end{bmatrix} - 2\begin{bmatrix} c & a & b \end{bmatrix} = \begin{bmatrix} -3 & 3 & 6 \end{bmatrix} \]
Solution. Since
\[ \begin{bmatrix} a & b & c \end{bmatrix} - 2\begin{bmatrix} c & a & b \end{bmatrix} = \begin{bmatrix} a-2c & b-2a & c-2b \end{bmatrix} \]
we require
a − 2c = −3
−2a + b = 3
−2b + c = 6
\[ \begin{bmatrix} 1 & 0 & -2 & -3 \\ -2 & 1 & 0 & 3 \\ 0 & -2 & 1 & 6 \end{bmatrix} \xrightarrow{R_2+2R_1} \begin{bmatrix} 1 & 0 & -2 & -3 \\ 0 & 1 & -4 & -3 \\ 0 & -2 & 1 & 6 \end{bmatrix} \xrightarrow[-\frac17 R_3]{R_3+2R_2} \begin{bmatrix} 1 & 0 & -2 & -3 \\ 0 & 1 & -4 & -3 \\ 0 & 0 & 1 & 0 \end{bmatrix} \xrightarrow[R_2+4R_3]{R_1+2R_3} \begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 0 \end{bmatrix} \]
so a = b = −3 and c = 0.
Note that for any A ∈ Mm×n (R) and any c ∈ R we have that (cA)ij = c(A)ij = c aij .
Example 25.6. Let c ∈ R and A ∈ Mm×n (R) be such that cA = 0m×n . Prove that either
c = 0 or A = 0m×n .
Proof. Since cA = 0m×n , we have that
c aij = (cA)ij = 0 for every i = 1, . . . , m and j = 1, . . . , n. (18)
If c = 0, then the result holds, so we assume c ≠ 0. But then from (18), we see that aij = 0
for every i = 1, . . . , m and j = 1, . . . , n, that is, A = 0m×n .
The next theorem is very similar to Theorem 6.10, and shows that under our operations of
addition and scalar multiplication, matrices behave very similarly to vectors.
158
V3. (A + B) + C = A + (B + C) (addition is associative)
V4. There exists a matrix 0m×n ∈ Mm×n (R) such that A + 0m×n = A for every
A ∈ Mm×n (R) (zero matrix)
V5. For each A ∈ Mm×n (R) there exists a (−A) ∈ Mm×n (R) such that A + (−A) = 0m×n
(additive inverse)
For example, if
\[ A = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 4 & 8 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 4 & 2 \\ -1 & 3 \end{bmatrix}, \]
then
\[ A^T = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}, \quad B^T = \begin{bmatrix} 1 \\ 4 \\ 8 \end{bmatrix} \quad\text{and}\quad C^T = \begin{bmatrix} 4 & -1 \\ 2 & 3 \end{bmatrix}. \]
(3) (A + B)T = AT + B T
159
Solution. Using Theorem 25.10, we have
\[ \left( 2A^T - 3\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \right)^{T} = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} \]
\[ \left(2A^T\right)^T - \left(3\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}\right)^{T} = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} \qquad \text{by (3)} \]
\[ 2\left(A^T\right)^T - 3\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}^{T} = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} \qquad \text{by (4)} \]
\[ 2A - 3\begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} \qquad \text{by (2)} \]
\[ 2A = \begin{bmatrix} 2 & 3 \\ -1 & 2 \end{bmatrix} + \begin{bmatrix} 3 & -3 \\ 6 & 3 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 5 & 5 \end{bmatrix} \]
\[ A = \frac12\begin{bmatrix} 5 & 0 \\ 5 & 5 \end{bmatrix} = \begin{bmatrix} 5/2 & 0 \\ 5/2 & 5/2 \end{bmatrix} \]
A matrix A ∈ Mn×n (R) is symmetric if AT = A. For example, if
\[ A = \begin{bmatrix} 1 & 6 \\ 6 & 9 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 4 & 5 \\ 3 & 6 & 7 \end{bmatrix}, \]
then
\[ A^T = \begin{bmatrix} 1 & 6 \\ 6 & 9 \end{bmatrix} = A \quad\text{and}\quad B^T = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 4 & 6 \\ 3 & 5 & 7 \end{bmatrix} \neq B, \]
so A is symmetric while B is not.
Example 25.14. Prove that if A, B ∈ Mn×n (R) are symmetric, then sA + tB is symmetric
for any s, t ∈ R.
160
Proof. Since A and B are symmetric, we have that AT = A and B T = B. We must show
that (sA + tB)T = sA + tB. We have
(sA + tB)T = (sA)T + (tB)T = sAT + tB T = sA + tB
so sA + tB is symmetric.
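This fact is easy to sanity-check numerically. The sketch below (Python/numpy; the particular matrix B and the scalars are made up for illustration) confirms that a linear combination of two symmetric matrices is again symmetric.

```python
import numpy as np

A = np.array([[1, 6],
              [6, 9]])        # symmetric
B = np.array([[0, 2],
              [2, -5]])       # symmetric (entries chosen arbitrarily)
s, t = 3, -2

C = s * A + t * B
print(np.array_equal(C, C.T))  # True: (sA + tB)^T = sA + tB
```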
161
Lecture 26
Consider the system of linear equations
x1 + 3x2 − 2x3 = −7
−x1 − 4x2 + 3x3 = 8
Let
\[ A = \begin{bmatrix} 1 & 3 & -2 \\ -1 & -4 & 3 \end{bmatrix}, \quad \vec x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad\text{and}\quad \vec b = \begin{bmatrix} -7 \\ 8 \end{bmatrix}, \]
and let
\[ \vec a_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \vec a_2 = \begin{bmatrix} 3 \\ -4 \end{bmatrix} \quad\text{and}\quad \vec a_3 = \begin{bmatrix} -2 \\ 3 \end{bmatrix} \]
be the columns of A so that A = [ ~a1 ~a2 ~a3 ]. Now our above system is consistent if and
only if we can find x1 , x2 , x3 ∈ R so that
" # " # " # " # " #
~b = −7 x 1 + 3x 2 − 2x 3 1 3 −2
= = x1 + x2 + x3
8 −x1 − 4x2 + 3x3 −1 −4 3
= x1~a1 + x2~a2 + x3~a3
that is, the system is consistent if and only if ~b ∈ Span {~a1 , ~a2 , ~a3 }. This is simply Theorem
20.2 which in this case states that ~b ∈ Span {~a1 , ~a2 , ~a3 } if and only if the system with
augmented matrix [ ~a1 ~a2 ~a3 ~b ] is consistent. We make the following definition.
Definition 26.1. Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R) (it follows that ~a1 , . . . , ~an ∈ Rm ) and
~x = [ x1 · · · xn ]T ∈ Rn . Then the vector A~x is defined by
A~x = x1~a1 + x2~a2 + · · · + xn~an .
162
Using this definition, we can rewrite our above system as
" # x1 " #
1 3 −2 −7
x2 =
−1 −4 3 8
x3
or more simply as
A~x = ~b.
Example 26.2.
\[ \begin{bmatrix} 1 & 5 \\ -1 & 2 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \end{bmatrix} = (-1)\begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix} + 2\begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 9 \\ 5 \\ 4 \end{bmatrix}, \]
so ~x = [ −1 2 ]T is a solution to the system
x1 + 5x2 = 9
−x1 + 2x2 = 5
−2x1 + x2 = 4
Notice in the previous example that the entries in the solution ~x to the system A~x = ~b are
the coefficients that express ~b as a linear combination of the columns of the coefficient matrix
A.
Theorem 26.3.
(1) Every linear system of equations can be expressed as A~x = ~b for some matrix A and
some vector ~b,
(2) The system A~x = ~b is consistent if and only if ~b can be expressed as a linear combina-
tion of the columns of A,
(3) If ~a1 , . . . , ~an are the columns of A ∈ Mm×n (R) and ~x = [ x1 · · · xn ]T , then ~x
satisfies A~x = ~b if and only if x1~a1 + · · · + xn~an = ~b.
It’s important to keep the sizes of our matrices and vectors in mind: in the product A~x = ~b,
A is m × n, ~x ∈ Rn and ~b ∈ Rm . For example, if A has two columns and ~x = [ 1 4 −1 ]T ∈ R3 ,
then A~x is not defined since ~x ∉ R2 .
163
Example 26.4.
\[ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \end{bmatrix} - 1\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
This shows that for A ∈ Mm×n (R) and ~x ∈ Rn with A 6= 0m×n and ~x 6= ~0Rn we are not
guaranteed that A~x is nonzero.
Recall Theorem 21.12 which states that the solution set for a homogeneous system of equa-
tions in n variables is a subspace of Rn , called the solution space. We proved Theorem 21.12,
but we prove it again here using our new notation for systems of equations. Note how much
more concise the proof now is.
Example 26.6. Let A ∈ Mm×n (R) and ~x ∈ Rn . Let S denote the solution set to the
homogeneous system of linear equations A~x = ~0. Show S is a subspace of Rn .
Proof. Since ~x ∈ Rn , we have that S ⊆ Rn , and since A~0Rn = ~0, ~0Rn ∈ S so S is nonempty.
Suppose ~y , ~z ∈ S. Then A~y = ~0 = A~z and
A(~y + ~z) = A~y + A~z = ~0 + ~0 = ~0
so ~y + ~z ∈ S. For any c ∈ R,
A(c~y ) = cA~y = c ~0 = ~0
so c~y ∈ S. Hence S is a subspace of Rn .
164
We return now to examine the matrix-vector product. We have seen that A~x can be viewed
as a linear combination of the columns of A which has allowed us to talk about systems of
equations. Writing out and evaluating a linear combination can be tedious, and we will see
that dot products can simplify the task. If we compute A~x where
\[ A = \begin{bmatrix} 1 & -1 & 6 \\ 0 & 2 & 1 \\ 4 & -3 & 2 \end{bmatrix} \quad\text{and}\quad \vec x = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \]
then we have
\[ A\vec x = 1\begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix} + 1\begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix} + 2\begin{bmatrix} 6 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1(1)+1(-1)+2(6) \\ 1(0)+1(2)+2(1) \\ 1(4)+1(-3)+2(2) \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \\ 5 \end{bmatrix}. \]
If we define
\[ \vec r_1 = \begin{bmatrix} 1 \\ -1 \\ 6 \end{bmatrix}, \quad \vec r_2 = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \quad\text{and}\quad \vec r_3 = \begin{bmatrix} 4 \\ -3 \\ 2 \end{bmatrix} \]
then
\[ A\vec x = \begin{bmatrix} \vec r_1 \cdot \vec x \\ \vec r_2 \cdot \vec x \\ \vec r_3 \cdot \vec x \end{bmatrix}. \]
In general, given A ∈ Mm×n (R), there are vectors ~r1 , . . . , ~rm ∈ Rn so that
\[ A = \begin{bmatrix} \vec r_1^{\,T} \\ \vdots \\ \vec r_m^{\,T} \end{bmatrix}, \]
and the ith entry of A~x is then the dot product ~ri · ~x.
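Both descriptions of A~x — as a linear combination of the columns of A and as dot products of the rows of A with ~x — give the same vector, which the following illustrative Python/numpy sketch verifies for the example above.

```python
import numpy as np

A = np.array([[1, -1, 6],
              [0, 2, 1],
              [4, -3, 2]])
x = np.array([1, 1, 2])

column_combination = sum(x[j] * A[:, j] for j in range(A.shape[1]))
row_dot_products = np.array([A[i, :] @ x for i in range(A.shape[0])])

print(column_combination)    # [12  4  5]
print(row_dot_products)      # [12  4  5]
print(A @ x)                 # [12  4  5], numpy's built-in matrix-vector product
```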
Definition 26.7. The n × n identity matrix, denoted by In (or In×n or just I if the size is
clear) is the square matrix of size n × n with aii = 1 for i = 1, 2, . . . , n (these entries make
up what we call the main diagonal of the matrix) and zeros elsewhere.
165
For example,
\[ I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = [\,\vec e_1\ \vec e_2\,], \quad I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = [\,\vec e_1\ \vec e_2\ \vec e_3\,], \quad I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = [\,\vec e_1\ \vec e_2\ \vec e_3\ \vec e_4\,]. \]
39 Note that A~x = B~x is equivalent to (A − B)~x = ~0, and we have seen in Example 26.4 that we can have
C~x = ~0 even when C ≠ 0m×n and ~x ≠ ~0, so A~x = B~x does not imply that A = B.
166
Lecture 27
Definition 27.2. Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R). The column space of A is the subset
of Rm defined by
Col (A) = Span {~a1 , . . . , ~an }.
Note that the nullspace of A is simply the solution space of the homogeneous system of
equations A~x = ~0 and is hence a subspace of Rn by Theorem 21.12. Since the column space
of A is simply the span of the columns of A, we have that Col (A) is a subspace of Rm by
Example 16.6. Similarly, Row (A) is a subspace of Rn .
From our previous work, we know that the system A~x = ~b is consistent if and only if ~b is a
linear combination of the columns of A. Now we can say that A~x = ~b is consistent if and
only if ~b ∈ Col (A).
Let A ∈ Mm×n (R). We already know how to find a basis for the nullspace of A, and since
the column space of A is simply the span of the columns of A, finding a basis for Col (A)
amounts to removing dependencies among the columns of A, which is a method we have
previously derived. But how do we find a basis for the row space of A?
Theorem 27.4. Let A ∈ Mm×n (R). If R is obtained from A by a series of elementary row
operations, then Row (R) = Row (A).
167
Proof. Let A ∈ Mm×n (R) with rows ~r1T , . . . , ~rmT . It is sufficient to show that Row (A) is
unchanged by each of the three elementary row operations. Let 1 ≤ i, j ≤ m with i 6= j. If
we swap the ith row and jth row of A, then the row space of the resulting matrix will be
spanned by
~r1 , . . . , ~ri−1 , ~rj , ~ri+1 , . . . , ~rj−1 , ~ri , ~rj+1 , . . . , ~rm
(we’ve shown the case for i < j, the case j < i being similar) and it’s not difficult to see that
Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~ri−1 , ~rj , ~ri+1 , . . . , ~rj−1 , ~ri , ~rj+1 , . . . , ~rm }. (19)
If we add k times the ith row of A to the jth row of A, then the resulting matrix will have
a row space spanned by
~r1 , . . . , ~rj−1 , ~rj + k~ri , ~rj+1 , . . . , ~rm
and it’s not difficult to show that
Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~rj−1 , ~rj + k~ri , ~rj+1 , . . . , ~rm }. (20)
Finally, if we multiply the ith row of A by a nonzero scalar k ∈ R, then the row space of the
resulting matrix will be spanned by
~r1 , . . . , ~ri−1 , k~ri , ~ri+1 , . . . , ~rm
and it’s not difficult to show that
Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~ri−1 , k~ri , ~ri+1 , . . . , ~rm }. (21)
Together, equations (19), (20) and (21) show that if R is obtained from A by a series of
elementary row operations, then Row (R) = Row (A).
It follows from Theorem 27.4 that to find a basis for Row (A), it is sufficient to find a basis
for Row (R). The next example will show that this is quite easy if R is the reduced row
echelon form of A.
Example 27.5. Let
\[ A = \begin{bmatrix} 1 & 1 & 5 & 1 \\ 1 & 2 & 7 & 2 \\ 2 & 3 & 12 & 3 \end{bmatrix} \]
Find a basis for Null (A), Col (A) and Row (A), and state the dimensions of each of these
subspaces.
Solution. Carrying A to RREF gives
\[ \begin{bmatrix} 1 & 1 & 5 & 1 \\ 1 & 2 & 7 & 2 \\ 2 & 3 & 12 & 3 \end{bmatrix} \xrightarrow[R_3-2R_1]{R_2-R_1} \begin{bmatrix} 1 & 1 & 5 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 1 \end{bmatrix} \xrightarrow[R_3-R_2]{R_1-R_2} \begin{bmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
168
The solution to the homogeneous system A~x = ~0 is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} -3 \\ -2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}, \]
so
\[ B_1 = \left\{ \begin{bmatrix} -3 \\ -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 0 \\ 1 \end{bmatrix} \right\} \]
is a basis for Null (A) and dim(Null (A)) = 2. Also, as only the first two columns of the
RREF of A have leading entries,
\[ B_2 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\} \]
is a basis for Col (A) and dim(Col (A)) = 2. Theorem 27.4 tells us that the rows of the
reduced row echelon form of A span Row (A), so
\[ \mathrm{Row}\,(A) = \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \right\}. \]
Since each of the nonzero vectors in our spanning set for Row (A) has a 1 where the others
have a zero, the nonzero vectors in our spanning set are linearly independent and still span
Row (A). Hence
\[ B_3 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix} \right\} \]
is a basis for Row (A) and dim(Row (A)) = 2.
Note that if R is the reduced row echelon form of any matrix A, then the nonzero rows of
R will each contain a 1 where the other rows will have a zero and so the nonzero rows of R
will be a linearly independent set and hence a basis for Row (A).40
40
With a bit more thought, one realizes that the nonzero rows of any row echelon form of A form a basis
for Row (A).
169
Note that to find a basis for Col (A), we carry A to any row echelon form R (preferably
reduced row echelon form, particularly if we also seek a basis for Null (A)) and look for the
columns of R with leading entries. The corresponding columns of A will form a basis for
Col (A). To find a basis for Row (A), we simply take nonzero rows of R. It follows that for
any A ∈ Mm×n (R),
dim(Col (A)) = rank (A) = dim(Row (A)).
Also, by the System–Rank Theorem, dim(Null (A)) = n − rank (A).
Example 27.6. Find a basis for Null (A), Col (A) and Row (A) where
\[ A = \begin{bmatrix} 1 & 2 & 1 & 3 & 4 \\ 3 & 6 & 2 & 6 & 9 \\ -2 & -4 & 1 & 1 & -1 \end{bmatrix} \]
Solution. Carrying A to RREF gives
\[ \begin{bmatrix} 1 & 2 & 1 & 3 & 4 \\ 3 & 6 & 2 & 6 & 9 \\ -2 & -4 & 1 & 1 & -1 \end{bmatrix} \xrightarrow[R_3+2R_1]{R_2-3R_1} \begin{bmatrix} 1 & 2 & 1 & 3 & 4 \\ 0 & 0 & -1 & -3 & -3 \\ 0 & 0 & 3 & 7 & 7 \end{bmatrix} \xrightarrow[R_1+R_2]{R_3+3R_2} \begin{bmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & -1 & -3 & -3 \\ 0 & 0 & 0 & -2 & -2 \end{bmatrix} \]
\[ \xrightarrow[-\frac12 R_3]{-R_2} \begin{bmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \xrightarrow{R_2-3R_3} \begin{bmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \]
We have
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}, \quad\text{so}\quad B_1 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\} \]
is a basis for Null (A) showing that dim(Null (A)) = 2. As the first, third and fourth columns
of the RREF of A have leading entries,
\[ B_2 = \left\{ \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ 1 \end{bmatrix} \right\} \]
170
is a basis for Col (A) and dim(Col (A)) = 3. Finally, the nonzero rows of the reduced row
echelon form of A give
\[ B_3 = \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \right\} \]
as a basis for Row (A), and dim(Row (A)) = 3.
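For checking answers, a computer algebra system can produce bases for all three subspaces at once. A minimal Python/sympy sketch (illustrative only; sympy's rowspace works from a row echelon form, so its basis vectors may differ from the RREF rows while spanning the same space) applied to the matrix of Example 27.6:

```python
from sympy import Matrix

A = Matrix([[1, 2, 1, 3, 4],
            [3, 6, 2, 6, 9],
            [-2, -4, 1, 1, -1]])

print(A.nullspace())     # 2 basis vectors, so dim(Null(A)) = 2
print(A.columnspace())   # columns 1, 3 and 4 of A, so dim(Col(A)) = 3
print(A.rowspace())      # 3 nonzero rows, so dim(Row(A)) = 3
print(A.rank())          # 3
```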
Matrix Multiplication
We now extend the matrix–vector product to matrix multiplication.
Definition 27.7. If A ∈ Mm×n (R) and B = [ ~b1 · · · ~bk ] ∈ Mn×k (R), then the matrix
product AB is the m × k matrix
AB = [ A~b1 · · · A~bk ].
Example 27.8. Let
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & -1 & 1 \end{bmatrix} \quad\text{and}\quad B = [\,\vec b_1\ \vec b_2\,] = \begin{bmatrix} 1 & 2 \\ 1 & -1 \\ 2 & 2 \end{bmatrix}. \]
Then
\[ A\vec b_1 = \begin{bmatrix} 1 & 2 & 3 \\ -1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 9 \\ 0 \end{bmatrix} \quad\text{and}\quad A\vec b_2 = \begin{bmatrix} 1 & 2 & 3 \\ -1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 1 \end{bmatrix} \]
so
\[ AB = [\,A\vec b_1\ A\vec b_2\,] = \begin{bmatrix} 9 & 6 \\ 0 & 1 \end{bmatrix}. \]
171
In general, for the product AB to be defined, the number of columns of A must equal
the number of rows of B. If this is the case, then A ∈ Mm×n (R) and B ∈ Mn×k (R) and
AB ∈ Mm×k (R).
The above method to multiply matrices can be quite tedious. As with the matrix–vector
product, we can simplify the task using dot products. For
\[ A = \begin{bmatrix} \vec r_1^{\,T} \\ \vdots \\ \vec r_m^{\,T} \end{bmatrix} \in M_{m\times n}(\mathbb{R}) \quad\text{and}\quad B = [\,\vec b_1\ \cdots\ \vec b_k\,] \in M_{n\times k}(\mathbb{R}), \]
we see that ~ri ∈ Rn for i = 1, . . . , m and ~bj ∈ Rn for j = 1, . . . , k so the dot product ~ri · ~bj is
defined. Then the (i, j)−entry of AB is ~ri · ~bj .
Example 27.9. Let
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 & 3 \\ 4 & -2 & 1 \end{bmatrix}. \]
Then
\[ AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 1 & 3 \\ 4 & -2 & 1 \end{bmatrix} = \begin{bmatrix} 1(1)+2(4) & 1(1)+2(-2) & 1(3)+2(1) \\ 3(1)+4(4) & 3(1)+4(-2) & 3(3)+4(1) \end{bmatrix} = \begin{bmatrix} 9 & -3 & 5 \\ 19 & -5 & 13 \end{bmatrix} \]
In the previous example, note that A ∈ M2×2 (R) and B ∈ M2×3 (R) so AB ∈ M2×3 (R).
However, the number of columns of B is not equal to the number of rows of A, so the
product BA is not defined.
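The size requirement is easy to see in code: numpy happily forms AB for the matrices of Example 27.9 but refuses to form BA. (This snippet is an aside, not part of the original notes.)

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])           # 2 x 2
B = np.array([[1, 1, 3],
              [4, -2, 1]])       # 2 x 3

print(A @ B)                     # defined: [[ 9 -3  5], [19 -5 13]]
try:
    B @ A                        # 2 x 3 times 2 x 2: columns of B != rows of A
except ValueError as err:
    print("BA is not defined:", err)
```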
172
Lecture 28
Example 28.1. Let
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}. \]
Then
\[ AB = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 0 & 0 \end{bmatrix} \]
from which we see that AB ≠ BA despite the products AB and BA both being defined and
having the same size.
Examples 27.9 and 28.1 show us that matrix multiplication is not commutative. That is,
given two matrices A and B such that AB is defined, the product BA may not be defined,
and even if it is, BA may not be equal to AB (in fact, BA need not have the same size as
AB: consider A ∈ M2×3 (R) and B ∈ M3×2 (R)).
For example, if A = [ 1 2 ; 3 4 ] and B = [ 1 1 ; −1 2 ], then
\[ (AB)^T = \left( \begin{bmatrix} -1 & 5 \\ -1 & 11 \end{bmatrix} \right)^{T} = \begin{bmatrix} -1 & -1 \\ 5 & 11 \end{bmatrix}, \]
but
\[ A^T B^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 6 & 6 \end{bmatrix} \neq (AB)^T, \]
and in fact
\[ B^T A^T = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} -1 & -1 \\ 5 & 11 \end{bmatrix} = (AB)^T. \]
173
41
Theorem 28.3. Let c ∈ R and A, B, C be matrices so that the following are defined
(7) (AB)T = B T AT
Note that since we defined matrix products in terms of the matrix vector product, we have
that (3) holds for the matrix vector product also: A(B~x) = (AB)~x where ~x has the same
number of entries as B has columns. We also note that (7) can be generalized as
(A1 A2 · · · Ak )T = ATk · · · AT2 AT1 .
Solution. We have
Note:
• A(3B − C) = 3AB − AC, that is, when distributing, A must remain on the left
• (A − 2B)C = AC − 2BC, that is, when distributing, C must remain on the right
174
Proof. Since C commutes with both A and B, we have that AC = CA and BC = CB. Thus
Complex Matrices
We denote the set of m×n matrices with complex entries by Mm×n (C). The rules of addition,
scalar multiplication, matrix-vector product, matrix multiplication and transpose derived for
real matrices also hold for complex matrices.
For example, let
\[ A = \begin{bmatrix} j & 2-j \\ 4+j & 1-2j \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & j \\ 2j & 1-j \end{bmatrix}. \]
Then
\[ AB = \begin{bmatrix} j & 2-j \\ 4+j & 1-2j \end{bmatrix}\begin{bmatrix} 1 & j \\ 2j & 1-j \end{bmatrix} = \begin{bmatrix} 2+5j & -3j \\ 8+3j & -2+j \end{bmatrix} \]
\[ BA = \begin{bmatrix} 1 & j \\ 2j & 1-j \end{bmatrix}\begin{bmatrix} j & 2-j \\ 4+j & 1-2j \end{bmatrix} = \begin{bmatrix} -1+5j & 4 \\ 3-3j & 1+j \end{bmatrix} \]
from which we see that AB ≠ BA, so multiplication of complex matrices also doesn’t
commute.
Note that if
\[ A = \begin{bmatrix} \vec r_1^{\,T} \\ \vdots \\ \vec r_m^{\,T} \end{bmatrix} \in M_{m\times n}(\mathbb{C}) \quad\text{and}\quad B = [\,\vec b_1\ \cdots\ \vec b_k\,] \in M_{n\times k}(\mathbb{C}), \]
then ~ri ∈ Cn for i = 1, . . . , m and ~bj ∈ Cn for j = 1, . . . , k so the dot product ~ri · ~bj is defined.
Thus, the (i, j)−entry of AB is ~ri · ~bj . It’s important to note that we use the dot product
here, and not the complex inner product.
175
Definition 28.7. Let A = [aij ] ∈ Mm×n (C). Then the conjugate of A is
\[ \overline{A} = [\,\overline{a_{ij}}\,]. \]
Example 28.8.
\[ \begin{bmatrix} 1+j & 1-2j & j \\ 2 & -j & 3+j \end{bmatrix}^{*} = \begin{bmatrix} 1-j & 2 \\ 1+2j & j \\ -j & 3-j \end{bmatrix} \]
For
\[ A = \begin{bmatrix} j & 1+j \\ 1+j & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 2-j \\ 2+j & 6 \end{bmatrix}, \]
we have
\[ A^{*} = \begin{bmatrix} -j & 1-j \\ 1-j & 3 \end{bmatrix} \neq A \quad\text{and}\quad B^{*} = \begin{bmatrix} 3 & 2-j \\ 2+j & 6 \end{bmatrix} = B. \]
(2) (A + B)∗ = A∗ + B ∗
(4) (AB)∗ = B ∗ A∗
(5) (A~z)∗ = ~z ∗ A∗
176
Application: Adjacency Matrices for Directed Graphs
A directed graph (or digraph) is a set of vertices and a set of directed edges between some of
the pairs of vertices. We may move from one vertex in the directed graph to another vertex
if there is a directed edge pointing in the direction we wish to move. Consider the directed
graph below:
This graph has four vertices, V1 , V2 , V3 and V4 . A directed edge between two vertices Vi and
Vj is simply the arrow pointing from Vi to Vj . As seen in the figure, we may have a directed
edge from a vertex to the same vertex (see V1 ), an edge may be directed in both directions
(see V2 and V3 ) and there may be more than one directed edge from one vertex to another
(see V3 and V4 ).
One question we may ask is in how many distinct ways can we get from V1 to V4 travelling
along exactly 3 directed edges, that is, how many distinct 3−edged paths are there from V1
to V4 ? A little counting reveals that there are 6 distinct such paths:
V1 −→ V1 −→ V3 −→ V4 (upper edge from V3 to V4 )
V1 −→ V1 −→ V3 −→ V4 (lower edge from V3 to V4 )
V1 −→ V1 −→ V2 −→ V4
V1 −→ V2 −→ V3 −→ V4 (upper edge from V3 to V4 )
V1 −→ V2 −→ V3 −→ V4 (lower edge from V3 to V4 )
V1 −→ V3 −→ V2 −→ V4
177
Note that each time we move from V3 to V4 , we specify which directed edge we are taking
since there is more than one. We could alternatively label each directed edge as we have the
vertices. However, we are more concerned with counting the number of paths and not with
actually listing them all out.
Counting may seem easy, but what if we were asked to find all distinct 20−edged paths
from V1 to V4 ? After months of counting, you would find 2 584 875 distinct paths. Clearly,
counting the paths one-by-one is not the best method.
Consider the 4 × 4 matrix A whose (i, j)−entry is the number of directed edges from Vi to
Vj . Then
\[ A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 2 \\ 1 & 0 & 0 & 0 \end{bmatrix} \]
We compute
\[ A^2 = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 1 & 0 & 2 \\ 2 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \quad\text{and}\quad A^3 = \begin{bmatrix} 4 & 3 & 3 & 6 \\ 3 & 1 & 2 & 1 \\ 3 & 3 & 2 & 2 \\ 1 & 2 & 2 & 3 \end{bmatrix} \]
and note that the (1, 4)−entry of A3 is 6 which is the number of distinct 3−edged paths
from V1 to V4 . In fact, the (i, j)−entry of A3 gives the number of distinct 3−edged paths
from Vi to Vj for any i and j with 1 ≤ i, j ≤ 4.
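Counting long paths is exactly the kind of computation a computer should do. The sketch below (Python/numpy, illustrative) raises the adjacency matrix above to the 3rd and 20th powers and reads off the (1, 4)-entries quoted in the text.

```python
import numpy as np

A = np.array([[1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 2],
              [1, 0, 0, 0]])

A3 = np.linalg.matrix_power(A, 3)
A20 = np.linalg.matrix_power(A, 20)
print(A3[0, 3])     # 6 distinct 3-edged paths from V1 to V4
print(A20[0, 3])    # 2584875 distinct 20-edged paths from V1 to V4
```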
Definition 28.13. Consider a directed graph with n vertices V1 , V2 , . . . , Vn . The adjacency
matrix of the directed graph is the n × n matrix A whose (i, j)−entry is the number of
directed edges from Vi to Vj .
Theorem 28.14. Consider a directed graph with n vertices V1 , V2 , . . . , Vn . For any positive
integer k, the number of distinct k−edged paths from Vi to Vj is given by the (i, j)−entry of
Ak .
Proof. 43 The result is true for k = 1 since the (i, j)−entry of A1 = A is by definition the
number of distinct 1−edged paths from Vi to Vj . Assume now that the result is true for
some positive integer k. Denote the (i, j)−entry of Ak by $a_{ij}^{(k)}$ so that the number of distinct
k−edged paths from Vi to Vj is $a_{ij}^{(k)}$. Consider the (i, j)−entry of Ak+1 , denoted by $a_{ij}^{(k+1)}$.
We have
\[ a_{ij}^{(k+1)} = \sum_{\ell=1}^{n} a_{i\ell}^{(k)}\, a_{\ell j} = a_{i1}^{(k)} a_{1j} + a_{i2}^{(k)} a_{2j} + \cdots + a_{in}^{(k)} a_{nj}. \]
43
The proof technique used here is called induction. Although you will not be asked to give a proof by
induction, we include this proof here as it illustrates why Ak gives the number of k−edged paths between
the vertices of a directed graph.
178
Note that every (k + 1)−edged path from Vi to Vj is of the form
Vi −→ · · · −→ V` −→ Vj ,
that is, a k−edged path from Vi to some vertex V` followed by a single directed edge from V`
to Vj . For each ` = 1, . . . , n, the number of such paths is $a_{i\ell}^{(k)} a_{\ell j}$, and summing over `
counts every (k + 1)−edged path from Vi to Vj exactly once. Hence $a_{ij}^{(k+1)}$ is the number of
distinct (k + 1)−edged paths from Vi to Vj , and the result follows by induction.
Example 28.15. An airline company offers flights between the cities of Toronto, Beijing,
Paris and Sydney. You can fly between these cities as you like, except that there is no
flight from Beijing to Sydney, and there is no flight between Toronto and Sydney in either
direction.
(a) If you depart from Toronto, how many distinct sequences of flights can you take if you
plan to arrive in Beijing after no more than 5 flights? (You may arrive at Beijing in
less than 5 flights and then leave, provided you end up back in Beijing after no later
than the 5th flight).
(b) Suppose you wish to depart from Sydney and arrive in Beijing after the 5th flight. In
how many ways can this be done so that your second flight takes you to Toronto?
(c) Suppose you wish to depart from Sydney and arrive in Beijing after the 5th flight. In
how many ways can this be done so that you visit Toronto at least once?
Solution. We denote the four cities as vertices, and place a directed arrow between two cities
if we can fly between the two cities in that direction. We label Toronto as V1 , Beijing as V2 ,
Paris as V3 and Sydney as V4 . We obtain the following directed graph:
179
We construct the adjacency matrix A as
\[ A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix} \]
(a) Since the (1, 2)−entry of Ak gives the number of distinct ways to fly from Toronto to
Beijing using k flights, we simply add the (1, 2)−entries of A, A2 , A3 , A4 and A5 . We
have 1 + 1 + 4 + 7 + 19 = 32. Thus, there are 32 distinct ways to fly from Toronto to
Beijing using no more than 5 flights.
(b) Here, we need to fly from Sydney to Beijing using exactly 5 flights. The (4, 2)−entry of
A5 tells us that there are 19 ways to do this. However, we must pass through Toronto
after the second flight. This restriction implies our final answer should be no greater
than 19. We will compute the number of ways to fly from Sydney to Toronto in two
flights, and then the number of ways to fly from Toronto to Beijing in three flights,
and finally multiply our results together to get the final answer. Thus we compute
\[ a_{41}^{(2)} \cdot a_{12}^{(3)} = 2 \cdot 4 = 8 \]
There are 8 ways to fly from Sydney to Beijing in 5 flights, stopping in Toronto after
the second flight.
(c) Here it is tempting to count the number of flights from Sydney that pass through
Toronto after the first flight, then the number of flights that pass through Toronto
after the second flight, third flight and fourth flight, then add the results, that is, to
compute
\[ a_{41}^{(1)} \cdot a_{12}^{(4)} + a_{41}^{(2)} \cdot a_{12}^{(3)} + a_{41}^{(3)} \cdot a_{12}^{(2)} + a_{41}^{(4)} \cdot a_{12}^{(1)} = 0 \cdot 7 + 2 \cdot 4 + 2 \cdot 1 + 8 \cdot 1 = 18 \]
and conclude that there are 18 such flights. However, any sequence of flights that visits
Toronto twice is “double-counted” in this sum, so there should be less than
18 such flights. To avoid this double-counting, we will instead count the number of
ways to fly from Sydney to Beijing without visiting Toronto, and we will accomplish
this by removing Toronto from our directed graph:
180
This leads to a new adjacency matrix
\[ B = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix} \]
The (4, 2)−entry of B 5 shows that there are 5 distinct ways to fly from Sydney to Beijing
in 5 flights without stopping in Toronto. Since the (4, 2)-entry of A5 shows there are
19 ways to fly from Sydney to Beijing in 5 flights, there must be 19 − 5 = 14 distinct
ways to fly from Sydney to Beijing in 5 flights while visiting Toronto at least once.
181
Lecture 29
182
Definition 29.1. A vector ~s ∈ Rn is called a probability vector if the entries in the vector are
nonnegative and sum to 1. A square matrix is called stochastic if its columns are probability
vectors. Given a stochastic matrix P , a Markov Chain is a sequence of probability vectors
~s0 , ~s1 , ~s2 , . . . where
~sk+1 = P~sk
for every nonnegative integer k. In a Markov Chain, the probability vectors ~sk are called
state vectors.
Now suppose that for k = 0 (the moment the zombie apocalypse begins - referred to by
survivors as “Z–Day”), everyone is still human. Thus, a person is a human with probability
1 and a person is a zombie with probability 0. This gives
" #
1
~s0 =
0
showing that one day after the start of the zombie apocalypse, 1/2 of the population are
humans while the other 1/2 of the population are now zombies. Now
" #" # " # " #
1/2 1/4 1/2 3/8 0.37500
~s2 = P~s1 = = =
1/2 3/4 1/2 5/8 0.62500
" #" # " # " #
1/2 1/4 3/8 11/32 0.34375
~s3 = P~s2 = = =
1/2 3/4 5/8 21/32 0.65625
183
It appears that the sequence ~s0 , ~s1 , ~s2 , . . . is converging44 to
\[ \vec s = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}. \]
To algebraically determine any steady-state vectors in our example above, we start with
P~s = ~s. Then
P~s − ~s = ~0
P~s − I~s = ~0
(P − I)~s = ~0
so that we have a homogeneous system. Note the introduction of the identity matrix I above.
It might be tempting to go from P~s − ~s = ~0 to (P − 1)~s = ~0, but since P is a matrix and 1
is a number, P − 1 is not defined. Computing the coefficient matrix P − I and row reducing
gives
" # " # " #
−1/2 1/4 −→ −1/2 1/4 −2R1 1 −1/2
1/2 −1/4 R2 +R1 0 0 −→ 0 0
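Both the convergence of the Markov Chain and the steady-state computation can be checked numerically; the following Python/numpy sketch (an illustration, not part of the notes) iterates ~sk+1 = P~sk and also solves (P − I)~s = ~0 together with s1 + s2 = 1.

```python
import numpy as np

P = np.array([[0.5, 0.25],
              [0.5, 0.75]])

s = np.array([1.0, 0.0])          # s0: everyone starts out human
for _ in range(20):
    s = P @ s
print(s)                          # approaches [1/3, 2/3]

# Steady state: append the condition s1 + s2 = 1 to (P - I)s = 0.
M = np.vstack([P - np.eye(2), np.ones((1, 2))])
rhs = np.array([0.0, 0.0, 1.0])
steady, *_ = np.linalg.lstsq(M, rhs, rcond=None)
print(steady)                     # [0.3333..., 0.6666...]
```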
184
Now, what happens if we change our initial state vector ~s0 ? If we let ~s0 = [ h0 z0 ]T and
recall that h0 + z0 = 1, we obtain that z0 = 1 − h0 , so ~s0 = [ h0 1 − h0 ]T . It is a good
exercise to show that by repeatedly using Equation (24), we obtain
\[ \vec s_k = \begin{bmatrix} \dfrac{h_0}{4^k} + \dfrac13 - \dfrac{1}{3\cdot 4^k} \\[2mm] -\dfrac{h_0}{4^k} + \dfrac23 + \dfrac{1}{3\cdot 4^k} \end{bmatrix}. \]
Since both h0 /4k and 1/(3 · 4k ) tend to zero as k tends to infinity, we see that for any initial
state vector ~s0 , the sequence ~s0 , ~s1 , ~s2 , . . . tends to
\[ \vec s = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}. \]
This means that once the zombie apocalypse begins, in the long-run for a city of 100 000
people, we can expect 33 333 humans and 66 667 zombies each day, and that this long-term
outcome does not depend on the initial state ~s0 . Note that once this steady-state is achieved
humans are still turning into zombies, and zombies are still reverting back to humans each
day, but that the number of humans turning into zombies is equal to the number of zombies
turning into humans. It’s worth noting that once the steady-state is achieved, there would
actually be 33 333.3̄ humans and 66 666.6̄ zombies. We have rounded our final answers due
to the real-world constraints that we cannot have fractional humans.
Note that in our above zombie example, our stochastic matrix P had a unique steady-state
and that the Markov Chain converged to this steady-state regardless of the initial state vec-
tor ~s0 . The next two examples show that this is not always the case.
Example 29.3. The n × n identity matrix I is a stochastic matrix, and for any state vector
~s ∈ Rn , I~s = ~s. This shows that every state vector ~s ∈ Rn is a steady-state vector for I.
Thus we do not have a unique steady-state vector.
Example 29.4. For our second example, consider the stochastic matrix
\[ Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]
Then for any state vector ~s = [ s1 s2 ]T ,
\[ Q\vec s = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} s_2 \\ s_1 \end{bmatrix}. \]
185
In order for ~s to be a steady-state vector, we require Q~s = ~s, and so we have that s1 = s2 =
1/2. Thus we have a unique steady-state vector (which we also could have found by solving
the homogeneous system (Q − I)~s = ~0 as above). However, if we take the initial state vector
~s0 = [ 1 0 ]T , we find
\[ \vec s_1 = Q\vec s_0 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\text{and}\quad \vec s_2 = Q\vec s_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \vec s_0 \]
so that the Markov Chain doesn’t converge to the steady-state with this initial state. In
fact, the Markov Chain converges to the steady-state only when ~s0 = [ 1/2 1/2 ]T .
Clearly, the stochastic matrix P from the zombie apocalypse example is special in the sense
that P has a unique steady-state vector and any Markov Chain will converge to this steady-
state regardless of the initial state vector chosen. This is because the matrix P is regular.
Definition 29.5. An n × n stochastic matrix P is called regular if for some positive integer
k, the matrix P k has all positive entries.
Since a stochastic matrix has all entries between 0 and 1 inclusive, a stochastic matrix P
fails to be regular when P k contains a zero entry for every positive integer k. Clearly,
" #
1/2 1/4
P = P1 =
1/2 3/4
is regular as all entries are positive. The n × n identity matrix is not regular since for any
positive integer k, I k = I contains zero entries. The matrix
" #
0 1
Q=
1 0
186
is not regular since for any positive integer k,
" #
1 0
, if k is even,
0 1
Qk =
" #
0 1
, if k is odd
1 0
for k = 0, 1, 2, . . . and
\[ \vec s = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix} \]
for our steady-state vector.
187
Lecture 30
Matrix Inverses45
We have seen that like real numbers, we can multiply matrices. For real numbers, we know
that 1 is the multiplicative identity since 1(x) = x = x(1) for any x ∈ R. We also know that
if x, y ∈ R are such that xy = 1 = yx, then x and y are multiplicative inverses of each other,
and we say that they are both invertible. We have recently seen that for an n × n matrix A,
IA = A = AI where I is the n × n identity matrix which shows that I is the multiplicative
identity for Mn×n (R). It is then natural to ask that for a given a matrix A, does there exist
a matrix B so that AB = I = BA?46
Definition 30.1. Let A ∈ Mn×n (R). If there exists a B ∈ Mn×n (R) such that
AB = I = BA
then we say that A is invertible and that B is an inverse of A.
so A is not invertible.
45
When we say inverse here, we mean multiplicative inverse. Given any matrix A ∈ Mm×n (R), the additive
inverse of A is −A, which is both easy to compute and not very interesting to study.
46
The requirement that AB = BA imposes the condition that A be a square matrix.
188
Notice that in the previous example, A is a nonzero matrix that fails to be invertible. This
might be surprising since for a real number x, we know that x being invertible is equivalent
to x being nonzero. Clearly this is not the case for n × n matrices.
By the above definition, to show that B ∈ Mn×n (R) is an inverse of A ∈ Mn×n (R), we must
check that both AB = I and BA = I. The next theorem shows that if AB = I, then it
follows that BA = I (or equivalently, if BA = I then it follows that AB = I) so that we
may verify only one of AB = I and BA = I to conclude that B is an inverse of A.
Proof. Let A, B ∈ Mn×n (R) be such that AB = I. We first show that rank (B) = n. Let
~x ∈ Rn be such that B~x = ~0. Since AB = I,
$$\vec{x} = I\vec{x} = (AB)\vec{x} = A(B\vec{x}) = A\vec{0} = \vec{0},$$
so ~x = ~0 is the only solution to the homogeneous system B~x = ~0. Thus, rank (B) = n by
the System–Rank Theorem(2).
We next show that BA = I. Let ~y ∈ Rn . Since rank (B) = n and B has n rows, the
System–Rank Theorem(3) guarantees that we can find ~x ∈ Rn such that ~y = B~x. Then
$$(BA)\vec{y} = (BA)(B\vec{x}) = B(AB)\vec{x} = BI\vec{x} = B\vec{x} = \vec{y} = I\vec{y}.$$
Since this holds for every ~y ∈ Rn , we conclude that BA = I.
Finally, since BA = I, it follows that rank (A) = n by the first part of our proof with the
roles of A and B interchanged.
We have now proven that if A ∈ Mn×n (R) is invertible, then rank (A) = n. It follows that
the reduced row echelon form of A is I. We now prove that if A is invertible, then the inverse
of A is unique.
Theorem 30.5. Let A ∈ Mn×n (R) be invertible. If B, C ∈ Mn×n (R) are both inverses of A,
then B = C.
Proof. Assume for A, B, C ∈ Mn×n (R) that both B and C are inverses of A. Then BA = I
and AC = I. We have
B = BI = B(AC) = (BA)C = IC = C.
Hence, if A is invertible, the inverse of A is unique, and we denote this inverse by A−1 .
189
Theorem 30.6. Let A, B ∈ Mn×n (R) be invertible and let c ∈ R with c 6= 0. Then
(1) (cA)−1 = 1c A−1
Consider A ∈ M3×3 (R). If A is invertible, then there exists an X = [ ~x1 ~x2 ~x3 ] ∈ M3×3 (R)
such that
AX = I
47
Don’t you even think about writing A−1 = 1/A. This is wrong, as 1/A is not even defined.
190
A[ ~x1 ~x2 ~x3 ] = [ ~e1 ~e2 ~e3 ]
[ A~x1 A~x2 A~x3 ] = [ ~e1 ~e2 ~e3 ]
Thus
A~x1 = ~e1 , A~x2 = ~e2 and A~x3 = ~e3 .
We have three systems of equations with the same coefficient matrix, so we construct an
augmented matrix
[ A | ~e1 ~e2 ~e3 ] = [ A | I ]
We must consider two cases when solving this system. First, if the reduced row echelon form
of A is I, then
[ A | I ] −→ [ I | ~b1 ~b2 ~b3 ]
where B = [ ~b1 ~b2 ~b3 ] ∈ M3×3 (R) is the matrix that I reduces to under the same elemen-
tary row operations that carry A to I. From this, we see that ~b1 is the solution to A~x1 = ~e1 ,
~b2 is the solution to A~x2 = ~e2 and ~b3 is the solution to A~x3 = ~e3 , that is,
Thus, for A ∈ Mn×n (R), to see if A is invertible (and to compute A−1 if A is invertible),
carry the matrix [ A | I ] to reduced row echelon form. If the reduced row echelon form of
[ A | I ] is [ I | B ] for some B ∈ Mn×n (R), then B = A−1 , but if the reduced row echelon form
of A is not I, then A is not invertible. This is known as the Matrix Inversion Algorithm.
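As an illustration only (this code is not part of the notes), the Matrix Inversion Algorithm can be sketched in Python with NumPy: augment A with I, row reduce, and read off A−1 when the left block becomes I. The function name inverse_via_rref is our own.

```python
import numpy as np

def inverse_via_rref(A):
    """Try to carry [A | I] to [I | A^{-1}] by row reduction.
    Returns A^{-1}, or None if the RREF of A is not I."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])             # the augmented matrix [A | I]
    for col in range(n):
        pivot = np.argmax(np.abs(M[col:, col])) + col   # choose a nonzero pivot
        if np.isclose(M[pivot, col], 0.0):
            return None                        # no pivot: RREF of A is not I
        M[[col, pivot]] = M[[pivot, col]]      # swap rows
        M[col] /= M[col, col]                  # scale the pivot row to get a leading 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]     # clear the rest of the column
    return M[:, n:]

A = np.array([[2.0, 3.0], [4.0, 5.0]])
print(inverse_via_rref(A))   # [[-2.5, 1.5], [2.0, -1.0]], matching Example 30.7 below
```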
Example 30.7. Let " #
2 3
A= .
4 5
Find A−1 if it exists.
Solution. We have
" # " # " #
2 3 1 0 −→ 2 3 1 0 R1 +3R2 2 0 −5 3 1
R
2 1
4 5 0 1 R2 −2R1 0 −1 −2 1 −→ 0 −1 −2 1 −R2
" #
1 0 −5/2 3/2
0 1 2 −1
So A is invertible (since the reduced row echelon form of A is I) and
" #
−5/2 3/2
A−1 = .
2 −1
191
Example 30.8. Let " #
1 2
A= .
2 4
Find A−1 if it exists.
Solution. We have
" # " #
1 2 1 0 −→ 1 2 1 0
2 4 0 1 R2 −2R1 0 0 −2 1
1 2 −2
Find A−1 if it exists.
Solution. We have
1 0 −1 1 0 0 −→ 1 0 −1 1 0 0 −→
1 1 −2 0 1 0 R2 −R1 0 1 −1 −1 1 0
1 2 −2 0 0 1 R3 −R1 0 2 −1 −1 0 1 R3 −2R2
1 0 −1 1 0 0 R1 +R3 1 0 0 2 −2 1
0 1 −1 −1 1 0 R2 +R3 0 1 0 0 −1 1
0 0 1 1 −2 1 −→ 0 0 1 1 −2 1
1 −2 1
Note that if you find A to be invertible and you compute A−1 , then you can check your work
by ensuring that AA−1 = I.
192
Lecture 31
For example, consider the matrices
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}
\qquad\text{and}\qquad
CA = \begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}.$$
So AB = CA but B ≠ C.
193
The previous example shows that we do not have mixed cancellation. This is a direct result of
matrix multiplication not being commutative. From AB = CA, we can obtain B = A−1 CA,
and since B ≠ C, we have C ≠ A−1 CA. Note that we cannot cancel A and A−1 here.
Example 31.3. For A, B ∈ Mn×n (R) with A, B and A + B invertible, do we have that
$$(A + B)^{-1} = A^{-1} + B^{-1}?$$
The answer is no in general. For example, take A = B = I. Then A + B = 2I, so (A + B)−1 = (2I)−1 = ½I, while
$$A^{-1} + B^{-1} = I^{-1} + I^{-1} = I + I = 2I,$$
and ½I ≠ 2I.
Theorem 31.4 (Invertible Matrix Theorem). Let A ∈ Mn×n (R). The following are equiva-
lent.
(1) A is invertible
(4) For all ~b ∈ Rn , the system A~x = ~b is consistent and has a unique solution
(8) AT is invertible
194
In particular, for A invertible, the system A~x = ~b has a unique solution. We can solve for ~x
using our matrix algebra:
A~x = ~b
A−1 A~x = A−1~b
I~x = A−1~b
~x = A−1~b
For example, with $A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}$ from Example 30.7 and $\vec{b} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$,
$$\vec{x} = A^{-1}\vec{b} = \begin{bmatrix} -5/2 & 3/2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 4 \\ -1 \end{bmatrix} = \begin{bmatrix} -23/2 \\ 9 \end{bmatrix}.$$
Of course we could have solved the above system A~x = ~b by row reducing the augmented
matrix $[\, A \mid \vec{b}\,] \longrightarrow [\, I \mid A^{-1}\vec{b}\,]$. Note that to find A−1 we row reduced [ A | I ] −→ [ I | A−1 ] and
that the elementary row operations used in both cases are the same.
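A quick numerical check of this computation, as a sketch only (not from the notes), assuming NumPy is available: np.linalg.solve carries out the row-reduction approach, while multiplying by the inverse reproduces ~x = A−1~b.

```python
import numpy as np

A = np.array([[2.0, 3.0], [4.0, 5.0]])
b = np.array([4.0, -1.0])

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b
x_via_solve   = np.linalg.solve(A, b)  # row reduces [A | b]; no explicit inverse formed
print(x_via_inverse, x_via_solve)      # both give [-11.5, 9.0], i.e. [-23/2, 9]
```

In practice np.linalg.solve is preferred when only the solution of one system is needed, since forming the inverse does strictly more work.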
Linear Transformations
Recall that a function is a rule that assigns to every element in one set (called the domain
of the function) a unique element in another set (called the codomain 48 of the function).
Given sets U and V we write f : U → V to indicate that f is a function with domain U
and codomain V , and it is understood that to each element u ∈ U , the function f assigns
a unique element v ∈ V . We say that f maps u to v and that v is the image of u under f .
We typically write v = f (u). See Figure 49.
195
(a) A function with domain U and codomain V. (b) This fails to be a function from U to V for two reasons: it doesn’t assign an image in V to all points in U, and it assigns to one point in U more than one image in V.
Figure 49: An example of a function (on the left) and something that fails to be a function (on the right).
Definition 31.6. For A ∈ Mm×n (R), the function fA : Rn → Rm defined by fA (~x) = A~x
for every ~x ∈ Rn is called the matrix transformation corresponding to A. We call Rn the
domain of fA and Rm the codomain of fA . We say that fA maps ~x to A~x and say that A~x
is the image of ~x under fA .
• The subscript A in fA is merely to indicate that the function depends on the matrix
A. If we change the matrix A, we change the function fA .
• For A ∈ Mm×n (R), we have that fA : Rn → Rm . This is a result of how we defined the
matrix-vector product.
196
and more generally,
$$f_A(x_1, x_2, x_3) = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1 + 2x_2 + 3x_3,\; x_1 - x_2 + x_3).$$
Since for A ∈ Mm×n (R), the function fA sends vectors in Rn to vectors in Rm , we should be
writing
$$f_A\!\left(\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right) = A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix},$$
but as functions are often viewed as sending points to points, we will prefer the notation
$$f(x_1, \ldots, x_n) = (y_1, \ldots, y_m) \qquad\text{or}\qquad f(x_1, \ldots, x_n) = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$
Theorem 31.8. Let A ∈ Mm×n (R) and let fA be the matrix transformation corresponding
to A. For every ~x, ~y ∈ Rn and for every c ∈ R,
$$f_A(\vec{x} + \vec{y}) = A(\vec{x} + \vec{y}) = A\vec{x} + A\vec{y} = f_A(\vec{x}) + f_A(\vec{y})$$
and
$$f_A(c\vec{x}) = A(c\vec{x}) = cA\vec{x} = cf_A(\vec{x}).$$
197
Thus matrix transformations preserve vector sums and scalar multiplication. Combin-
ing these two results shows that matrix transformations preserve linear combinations: for
~x1 , . . . , ~xk ∈ Rn and c1 , . . . , ck ∈ R,
$$f_A(c_1\vec{x}_1 + \cdots + c_k\vec{x}_k) = c_1 f_A(\vec{x}_1) + \cdots + c_k f_A(\vec{x}_k).$$
Functions which preserve linear combinations are called linear transformations or linear
mappings.
It follows immediately from Theorem 31.8 that every matrix transformation is a linear trans-
formation.
L(~0Rn ) = ~0Rm ,
that is, a linear transformation always sends the zero vector of the domain to the zero vector
of the codomain. By taking s = −1 and t = 0, we see that
L(−~x) = −L(~x)
Linear transformations are important throughout mathematics – in fact, we have seen them
in calculus.49 For differentiable functions f, g : R → R, and s, t ∈ R we have
d d d
(sf (x) + tg(x)) = s f (x) + t g(x).
dx dx dx
Example 31.10. Show that L : R2 → R2 defined by L(x1 , x2 ) = (x1 − x2 , 2x1 + x2 ) is a linear transformation.
Solution. For ~x = (x1 , x2 ), ~y = (y1 , y2 ) ∈ R2 and s, t ∈ R, we have
L(s~x + t~y ) = L(sx1 + ty1 , sx2 + ty2 )
= ((sx1 + ty1 ) − (sx2 + ty2 ), 2(sx1 + ty1 ) + (sx2 + ty2 ))
= (sx1 − sx2 , 2sx1 + sx2 ) + (ty1 − ty2 , 2ty1 + ty2 )
= s(x1 − x2 , 2x1 + x2 ) + t(y1 − y2 , 2y1 + y2 )
= sL(~x) + tL(~y ),
and so L is a linear transformation.
Example 31.11. Show that L : R2 → R defined by L(~x) = ∥~x∥ is not linear.
Solution. To show that L is not linear, we must exhibit two vectors ~x, ~y ∈ R2 and two
scalars s, t ∈ R such that L(s~x + t~y ) 6= sL(~x) + tL(~y ). We know that the norm does not
generally preserve sums, so we will take s = t = 1 and choose two nonzero nonparallel vectors
~x, ~y ∈ R2 . Consider
$$\vec{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad\text{and}\qquad \vec{y} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
Then
$$L(\vec{x} + \vec{y}) = L(1, 1) = \left\lVert \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\rVert = \sqrt{2}$$
and
$$L(\vec{x}) + L(\vec{y}) = L(1, 0) + L(0, 1) = \left\lVert \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\rVert + \left\lVert \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\rVert = 1 + 1 = 2.$$
As we have found vectors ~x, ~y ∈ R2 such that L(~x + ~y ) 6= L(~x) + L(~y ), we conclude that L
is not linear.
199
Lecture 32
Example 32.1. Show that L : R3 → R2 defined by L(x1 , x2 , x3 ) = (x1 + x2 + x3 , x23 + 3) is
not linear.
Solution. Consider
1 0
~x = 0 and ~y = 1 .
0 0
Then
L(~x + ~y ) = L(1, 1, 0) = (2, 3)
but
L(~x) + L(~y ) = L(1, 0, 0) + L(0, 1, 0) = (1, 3) + (1, 3) = (2, 6),
which shows that L is not linear.
Recall that a linear transformation always maps the zero vector of the domain to the
zero vector of the codomain. Thus in Example 32.1, we could have quickly noticed that
L(0, 0, 0) = (0, 3) 6= (0, 0) and concluded immediately that L was not linear. Note however,
that a function sending the zero vector of the domain to the zero vector of the codomain
does not guarantee that the function is linear – see Example 31.11.
Example 32.2. Suppose L : R2 → R4 is a linear transformation with L(1, 2) = (1, 2, 3, 4) and L(2, 3) = (1, 4, 0, −1). Since (3, 5) = (1, 2) + (2, 3) and L is linear,
L(3, 5) = L(1, 2) + L(2, 3) = (1, 2, 3, 4) + (1, 4, 0, −1) = (2, 6, 3, 3).
200
" # " #
1 2 x1 R1 −2R2 1 0 −3x1 + 2x2
0 1 2x1 − x2 −→ 0 1 2x1 − x2
Thus, by knowing just L(1, 2) and L(2, 3) we can compute L(~x) for any ~x ∈ R2 . Also note
that
−x1 + x2 −1 1 " #
2x1 2 0 x1
L(x1 , x2 ) = =
−9x1 + 6x2 −9 6 x2
−14x1 + 9x2 −14 9
which shows that L is a matrix transformation.
Recall that Theorem 31.8 guarantees that every matrix transformation from Rn to Rm is a
linear transformation. We also noticed that the linear transformations from Examples 31.10
and 32.2 were matrix transformations, so it is natural to ask if every linear transformation
from Rn to Rm is a matrix transformation. The following theorem shows the answer is yes.
Theorem 32.3. If L : Rn → Rm is a linear transformation, then L is a matrix transformation with corresponding matrix
$$[\,L\,] = \begin{bmatrix} L(\vec{e}_1) & L(\vec{e}_2) & \cdots & L(\vec{e}_n) \end{bmatrix}.$$
201
Given a linear transformation L : Rn → Rm , we refer to [ L ] ∈ Mm×n (R) as the standard
matrix of L. Theorems 31.8 and 32.3 combine to give that a transformation is linear if and
only if it is a matrix transformation.
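Theorem 32.3 also suggests a direct way to build the standard matrix numerically: apply L to each standard basis vector and use the images as columns. A minimal sketch (not from the notes; the function name is ours), assuming NumPy:

```python
import numpy as np

def standard_matrix(L, n):
    """Build [L] column by column as [ L(e_1)  ...  L(e_n) ]."""
    return np.column_stack([L(e) for e in np.eye(n)])

# The linear transformation from Example 32.2, using the formula derived above.
L = lambda x: np.array([-x[0] + x[1], 2*x[0], -9*x[0] + 6*x[1], -14*x[0] + 9*x[1]])
print(standard_matrix(L, 2))
# [[ -1.   1.]
#  [  2.   0.]
#  [ -9.   6.]
#  [-14.   9.]]
```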
Example 32.4. Let d~ ∈ R2 be nonzero and define L : R2 → R2 by L(~x) = proj d~ ~x for every
~x ∈ R2 . Show L is linear, and then find the standard matrix of L with $\vec{d} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$.
Computing $L(\vec{e}_1) = \operatorname{proj}_{\vec{d}}\vec{e}_1 = \frac{\vec{e}_1 \cdot \vec{d}}{\lVert\vec{d}\rVert^2}\,\vec{d} = \begin{bmatrix} 1/10 \\ -3/10 \end{bmatrix}$ and $L(\vec{e}_2) = \begin{bmatrix} -3/10 \\ 9/10 \end{bmatrix}$ gives $[\,L\,] = \begin{bmatrix} 1/10 & -3/10 \\ -3/10 & 9/10 \end{bmatrix}$.
Note that if we take $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ for example, we can compute the projection of ~x onto $\vec{d} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$ as
$$L(\vec{x}) = \operatorname{proj}_{\vec{d}}\vec{x} = \begin{bmatrix} 1/10 & -3/10 \\ -3/10 & 9/10 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1/2 \\ 3/2 \end{bmatrix},$$
that is, we can compute projections using matrix multiplication.
202
~ Note that
Figure 50: Reflecting ~x in a line through the origin with direction vector d.
2proj d~ ~x − ~x = ~x − 2perp d~ ~x
Solution. We first show that L is linear using the fact that proj d~ ~x is linear. For ~x, ~y ∈ R2
and s, t ∈ R we have
and so " #
0 1
[ L ] = [ L(~e1 ) L(~e2 ) ] = .
1 0
Note that in R2 , the line through the origin with direction vector d~ = [ 11 ] has scalar equation
x2 = x1 . For any ~y = [ yy12 ] ∈ R2 ,
" #" # " #
0 1 y1 y2
L(~y ) = =
1 0 y2 y1
from which we see that reflecting a vector in the line x2 = x1 simply swaps the coordinates
of that vector.
203
Example 32.6. Let L : R3 → R3 be defined by L(~x) = x − 2proj ~n ~x where ~n ∈ R3 is
a nonzero vector. Figure 51 shows that L represents a reflection in the plane through the
origin with normal vector ~n. Show that L is linear, and find the standard matrix of L if the
plane has scalar equation x1 − x2 + 2x3 = 0.
Figure 51: Reflecting ~x in a plane through the origin with normal vector ~n.
Solution. We first show that L is linear using the fact that projections are linear. For
~x, ~y ∈ R3 , and s, t ∈ R,
L(s~x + t~y ) = (s~x + t~y ) − 2proj ~n (s~x + t~y )
= s~x + t~y − 2(s proj ~n ~x + t proj ~n ~y )
= s(~x − 2proj ~n ~x) + t(~y − 2proj ~n ~y )
= sL(~x) + tL(~y )
and so L is linear. Now for the plane x1 − x2 + 2x3 = 0, we have $\vec{n} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$. We compute
$$L(\vec{e}_1) = \vec{e}_1 - 2\operatorname{proj}_{\vec{n}}\vec{e}_1 = \vec{e}_1 - 2\,\frac{\vec{e}_1 \cdot \vec{n}}{\lVert\vec{n}\rVert^2}\,\vec{n} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - \frac{2(1)}{6}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 1/3 \\ -2/3 \end{bmatrix}$$
$$L(\vec{e}_2) = \vec{e}_2 - 2\,\frac{\vec{e}_2 \cdot \vec{n}}{\lVert\vec{n}\rVert^2}\,\vec{n} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - \frac{2(-1)}{6}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}$$
$$L(\vec{e}_3) = \vec{e}_3 - 2\,\frac{\vec{e}_3 \cdot \vec{n}}{\lVert\vec{n}\rVert^2}\,\vec{n} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - \frac{2(2)}{6}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -2/3 \\ 2/3 \\ -1/3 \end{bmatrix}$$
and so the standard matrix of L is
$$[\,L\,] = \begin{bmatrix} L(\vec{e}_1) & L(\vec{e}_2) & L(\vec{e}_3) \end{bmatrix} = \begin{bmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \end{bmatrix}.$$
204
In the last two examples, we required the objects we were reflecting in (a line and a plane)
to be through the origin. The reason for this is because if our line or plane does not contain
the origin, then our transformation would not send the zero vector to the zero vector and
thus not be linear.
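For projections and reflections through the origin, the standard matrices can also be written compactly as $\frac{1}{\vec{d}\cdot\vec{d}}\,\vec{d}\,\vec{d}^{\,T}$ and $I - \frac{2}{\vec{n}\cdot\vec{n}}\,\vec{n}\,\vec{n}^{\,T}$ respectively, which is just the projection formula used above written in matrix form. A rough NumPy sketch (not part of the notes; the function names are ours) reproducing the matrices from Examples 32.4 and 32.6:

```python
import numpy as np

def proj_matrix(d):
    """Standard matrix of x |-> proj_d(x), namely d d^T / (d . d)."""
    d = np.asarray(d, dtype=float)
    return np.outer(d, d) / np.dot(d, d)

def reflect_matrix(n):
    """Standard matrix of x |-> x - 2 proj_n(x): reflection in the
    line/plane through the origin with normal vector n."""
    n = np.asarray(n, dtype=float)
    return np.eye(len(n)) - 2 * np.outer(n, n) / np.dot(n, n)

print(proj_matrix([-1, 3]))        # [[0.1, -0.3], [-0.3, 0.9]]   (Example 32.4)
print(reflect_matrix([1, -1, 2]))  # matches [L] from Example 32.6
```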
205
Lecture 33
We are seeing that linear transformations (or equivalently, matrix transformations) give us a
way to geometrically understand the matrix–vector product. We have seen that projections
and reflections are both linear transformations, and we now look at some additional linear
transformations that are common in many fields, such as computer graphics.
For θ ∈ R, let Rθ : R2 → R2 denote the counterclockwise rotation about the origin through the angle θ, and write ~x = (r cos φ, r sin φ), where r ∈ R satisfies r = ∥~x∥ ≥ 0 and φ ∈ R is the angle ~x makes with the positive x1 −axis
measured counterclockwise (if ~x = ~0, then r = 0 and we may take φ to be any real number).
See Figure 52.
Since Rθ (~x) is obtained from rotating ~x counterclockwise about the origin, it is clear that
kRθ (~x)k = r and that Rθ (~x) makes an angle of θ + φ with the positive x1 −axis (this is
illustrated in Figure 52). Thus using the angle-sum formulas for sine and cosine, we have
" #
r cos(φ + θ)
Rθ (~x) =
r sin(φ + θ)
" #
r(cos φ cos θ − sin φ sin θ)
=
r(sin φ cos θ + cos φ sin θ)
" #
cos θ(r cos φ) − sin θ(r sin φ)
=
sin θ(r cos φ) + cos θ(r sin φ)
" #" #
cos θ − sin θ r cos φ
=
sin θ cos θ r sin φ
206
" #
cos θ − sin θ
= ~x
sin θ cos θ
and we see that Rθ is a matrix transformation and thus a linear transformation. We also see
that " #
cos θ − sin θ
[ Rθ ] = .
sin θ cos θ
Example 33.1. Find the vector that results from rotating ~x = [ 12 ] counterclockwise about
the origin by an angle of π6 .
Solution. We have
" #" # " √ #" # " √ #
cos π6 − sin π6 1 3/2 −1/2 1 1 3−2
R π6 (~x) = [ R π6 ]~x = = √ = √ .
sin π6 cos π6 2 1/2 3/2 2 2 1−2 3
Note that a clockwise rotation about the origin by an angle of θ is simply a counterclockwise
rotation about the origin by an angle of −θ. Thus a clockwise rotation by θ is given by the
linear transformation
" # " #
cos(−θ) − sin(−θ) cos θ sin θ
[ R−θ ] = =
sin(−θ) cos(−θ) − sin θ cos θ
where we have used the fact that cos θ is an even function (cos(−θ) = cos θ) and sin θ is an
odd function (sin(−θ) = − sin θ).
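A short numerical check (not from the notes), assuming NumPy: the rotation matrix reproduces Example 33.1, and composing with the rotation by −θ returns the original vector.

```python
import numpy as np

def rotation_matrix(theta):
    """Standard matrix [R_theta] of the counterclockwise rotation by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

x = np.array([1.0, 2.0])
print(rotation_matrix(np.pi / 6) @ x)
# [(sqrt(3)-2)/2, (1+2*sqrt(3))/2] ≈ [-0.134, 2.232], as in Example 33.1
print(rotation_matrix(-np.pi / 6) @ rotation_matrix(np.pi / 6) @ x)   # back to [1, 2]
```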
We briefly mention that we can generalize these results for rotations about a coordinate axis
in R3 . Consider50
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad
B = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad
C = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Then A, B and C are the standard matrices of rotations through the angle θ about the x1 −, x2 − and x3 −axes, respectively.
207
In fact, we can rotate about any line through the origin in R3 , but finding the standard
matrix of such a transformation is beyond the scope of this course.
We next look at stretches and compressions. For t a positive real number, let
" #
t 0
A=
0 1
If t > 1, then we say that L is a stretch in the x1 −direction by a factor of t, and if 0 < t < 1,
we say that L is a compression in the x1 −direction by a factor of t. A stretch in the x1 −direction is illustrated
in Figure 53.
Note the requirement that t > 0. If t = 0, then L is actually a projection onto the x2 −axis,
and if t < 0, then L is a reflection in the x2 −axis followed by a stretch or compression by a
factor of −t > 0. A stretch or compression in the x2 −direction is defined in a similar way.
and define L(~x) = B~x for every ~x ∈ R2 . Then L is a matrix transformation and thus a linear
transformation. For ~x = [ xx12 ],
" #" # " #
t 0 x1 tx1
L(~x) = = = t~x.
0 t x2 tx2
208
We see that L(~x) is simply a scalar multiple of ~x. We call L a dilation if t > 1 and we call
L a contraction if 0 < t < 1. If t = 1, then B is the identity matrix and L(~x) = ~x. Figure
54 illustrates a dilation.
209
Note that a shear in the x2 −direction (or a vertical shear) by a factor of s ∈ R has standard
matrix " #
1 0
.
s 1
(cL)(~x) = cL(~x)
for every ~x ∈ Rn .
and
(−2)L(~x) = −2(2x1 + x2 , x1 − x2 + x3 ) = (−4x1 − 2x2 , −2x1 + 2x2 − 2x3 ).
51
This definition works for any two functions L, M : Rn → Rm
210
Notice that in the previous example, L + M and −2L are both linear transformations as
well. In fact, since L and M are linear transformations, we have that for any ~x = (x1 , x2 , x3 ) ∈ R3
(L + M )(~x) = L(~x) + M (~x) = [ L ]~x + [ M ]~x = [ L ] + [ M ] ~x
" # " #! x1 " # x1
2 1 0 0 0 1 2 1 1
= + x2 = x2
1 −1 1 1 2 3 2 1 4
x3 x3
" #
2x1 + x2 + x3
=
2x1 + x2 + 4x3
and
" x1 #
2 1 0
(−2L)(~x) = −2L(~x) = −2[ L ]~x = −2 x2
1 −1 1
x3
" # " #
2x1 + x2 −4x1 − 2x2
= −2 =
x1 − x2 + x3 −2x1 + 2x2 − 2x3
[ L + M ] = [ L ] + [ M ] and [ cL ] = c[ L ].
Proof. We prove the result for cL. For any ~x, ~y ∈ Rn and s, t ∈ R, we have
from which we see that [ cL ] = c[ L ] by the Matrices Equal Theorem (Theorem 26.10).
211
Aside from adding and scaling linear transformations, we can also compose them.
Definition 33.6. Let L : Rn → Rm and M : Rm → Rp be (linear) transformations. The
composition M ◦ L : Rn → Rp is defined by
(M ◦ L)(~x) = M (L(~x))
for every ~x ∈ Rn .
The composition of two transformations is illustrated in Figure 56. It is important to note
that in order for M ◦ L to be defined, the domain of M must contain the codomain of L.
Notice that M ◦ L is also a linear transformation with domain R3 and codomain R2 . In fact,
computing the standard matrices for L and M gives
" # " #
1 1 0 1 −3
[L] = and [ M ] =
0 1 1 2 0
212
Theorem 33.8. Let L : Rn → Rm and M : Rm → Rp be linear transformations. Then
M ◦ L : Rn → Rp is a linear transformation and
[ M ◦ L ] = [ M ][ L ].
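Theorem 33.8 is easy to check numerically: composing the transformations gives the same result as multiplying their standard matrices. A minimal sketch (not part of the notes), using the matrices [L] and [M] computed in the example above:

```python
import numpy as np

L = np.array([[1, 1, 0],
              [0, 1, 1]])          # [L], so L : R^3 -> R^2
M = np.array([[1, -3],
              [2,  0]])            # [M], so M : R^2 -> R^2

ML = M @ L                          # [M o L] = [M][L], a 2 x 3 matrix
x = np.array([1, 2, 3])
print(M @ (L @ x), ML @ x)          # both give [-12, 6]: the computations agree
```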
213
Lecture 34
Example 34.1. Let L : R2 → R2 be a counterclockwise rotation about the origin by an
angle of π/4 and let M : R2 → R2 be a projection onto the x1 −axis. Find the standard
matrices for M ◦ L and L ◦ M .
Solution. Since L and M are linear, we have
" # " √ √ #
cos π/4 − sin π/4 2/2 − 2/2
[L] = = √ √
sin π/4 cos π/4 2/2 2/2
" #
h i 1 0
[ M ] = proj ~e1 ~e1 proj ~e1 ~e2 =
0 0
and thus
" #" √ √ # " √ √ #
1 0 2/2 − 2/2 2/2 − 2/2
[ M ◦ L ] = [ M ][ L ] = √ √ =
0 0 2/2 2/2 0 0
" √ √ #" # " √ #
2/2 − 2/2 1 0 2/2 0
[ L ◦ M ] = [ L ][ M ] = √ √ = √ .
2/2 2/2 0 0 2/2 0
We notice in the previous example that although M ◦ L and L ◦ M are both defined,
[ M ◦ L ] 6= [ L ◦ M ] from which we conclude that M ◦ L and L ◦ M are not the same
linear transformation, that is, L and M do not commute.
Example 34.2. Let L, M : R2 → R2 be linear transformations defined by L(x1 , x2 ) =
(2x1 + x2 , x1 + x2 ) and M (x1 , x2 ) = (x1 − x2 , −x1 + 2x2 ). Find [ M ◦ L ] and [ L ◦ M ].
Solution. Since L and M are linear, we have
" #
h i 2 1
[ L ] = L(~e1 ) L(~e2 ) =
1 1
" #
h i 1 −1
[ M ] = M (~e1 ) M (~e2 ) =
−1 2
and thus
" #" # " #
1 −1 2 1 1 0
[ M ◦ L ] = [ M ][ L ] = =
−1 2 1 1 0 1
" #" # " #
2 1 1 −1 1 0
[ L ◦ M ] = [ L ][ M ] = = .
1 1 −1 2 0 1
We see that [ M ◦ L ] = I = [ L ◦ M ], so M ◦ L = L ◦ M .
214
Inverse Linear Transformations
We have studied invertible matrices, and have seen that the inverse is only defined for
square matrices. We now study invertible linear transformations, which will only be defined
for linear operators on Rn .52
Definition 34.3. The linear transformation Id : Rn → Rn defined by Id(~x) = ~x for every
~x ∈ Rn is called the identity transformation.
Clearly, h i h i
[ Id ] = Id(~e1 ) · · · Id(~en ) = ~e1 · · · ~en = I.
M is the inverse of L ⇐⇒ M ◦ L = Id = L ◦ M
⇐⇒ [ M ◦ L ] = [ Id ] = [ L ◦ M ]
⇐⇒ [ M ][ L ] = I = [ L ][ M ]
⇐⇒ [ M ] is the inverse of [ L ].
[ L−1 ] = [ L ]−1 .
215
Note that we have just shown that [ Rθ ]−1 = [ R−θ ], that is,
" #−1 " #
cos(θ) − sin(θ) cos θ sin θ
= .
sin(θ) cos(θ) − sin θ cos θ
but this is quite tedious. Indeed, understanding what multiplication by a square matrix does
geometrically can give us a fast way to decide if the matrix is invertible, and if so, what the
inverse of that matrix is.
Then
" # 1 " #
1−j j 2 2+j
L(1, j, 1 + j) = j = .
2+j 3 1+j 2 + 6j
1+j
216
We wish to know the coordinates of the endpoint of the arm. We begin with the portion
of the arm with length `2 , assuming it is based at the origin and is lying on the x1 -axis so
its end has coordinates (`2 , 0). We perform a counterclockwise rotation by θ2 , followed by a
translation by the vector [ `1 0 ]T (which can be thought as inserting the remaining portion
of the arm with its base at the origin and laying parallel to the x1 -axis), and then rotate
counterclockwise by θ1 .
217
In terms of transformations, we obtain
" # " # " #" #!
cos θ1 − sin θ1 `1 cos θ2 − sin θ2 `2
+ =
sin θ1 cos θ1 0 sin θ2 cos θ2 0
" # " # " #!
cos θ1 − sin θ1 `1 `2 cos θ2
= +
sin θ1 cos θ1 0 `2 sin θ2
" #" #
cos θ1 − sin θ1 `1 + `2 cos θ2
=
sin θ1 cos θ1 `2 sin θ2
" #
`1 cos θ1 + `2 cos θ1 cos θ2 − `2 sin θ1 sin θ2
=
`1 sin θ1 + `2 sin θ1 cos θ2 + `2 cos θ1 sin θ2
" #
`1 cos θ1 + `2 (cos θ1 cos θ2 − sin θ1 sin θ2 )
=
`1 sin θ1 + `2 (sin θ1 cos θ2 + cos θ1 sin θ2 )
" #
`1 cos θ1 + `2 cos(θ1 + θ2 )
=
`1 sin θ1 + `2 sin(θ1 + θ2 )
where we have used the angle-sum formulas. We see the endpoint of the arm has coordinates
(x1 , x2 ) = `1 cos θ1 + `2 cos(θ1 + θ2 ), `1 sin θ1 + `2 sin(θ1 + θ2 )
Note that although our transformation is linear, it was constructed using translations (a
shift in the direction of a nonzero vector), which are nonlinear transformations. We can see
the nonlinearity of a translation: for `1 6= 0 we define
" # " #
x1 `1
L(x1 , x2 ) = +
x2 0
Below we introduce homogeneous coordinates. These coordinates require adding a 1 onto the
coordinates of a vector in Rn thus giving a vector in Rn+1 . We see that in these coordinates,
we actually can compute a translation using matrix multiplication.
Definition 34.8. To each point (x1 , x2 ) in the x1 x2 −plane there is a corresponding point
(x1 , x2 , 1) lying in the plane x3 = 1 in R3 . We call the coordinates (x1 , x2 , 1) the homogeneous
coordinates of the point (x1 , x2 ).
218
For a, b ∈ R, not both zero, consider a transformation L : R2 → R2 defined by
" # " # " #
x1 a x1 + a
L(x1 , x2 ) = + =
x2 b x2 + b
This is not a linear transformation, but using homogeneous coordinates, we are able to apply
the transformation by matrix multiplication:
$$\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix} = \begin{bmatrix} x_1 + a \\ x_2 + b \\ 1 \end{bmatrix}.$$
Also, for any linear transformation L : R2 → R2 defined by L(~x) = A~x, where A ∈ M2×2 (R),
we can construct the 3 × 3 matrix
$$\begin{bmatrix} A & \vec{0} \\ \vec{0}^{\,T} & 1 \end{bmatrix},$$
which applies L in homogeneous coordinates. Returning to the robot arm, the endpoint of the arm can now be computed entirely by matrix multiplication:
$$\begin{bmatrix} \cos\theta_1 & -\sin\theta_1 & 0 \\ \sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & \ell_1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta_2 & -\sin\theta_2 & 0 \\ \sin\theta_2 & \cos\theta_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \ell_2 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 & 0 \\ \sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \ell_1 + \ell_2\cos\theta_2 \\ \ell_2\sin\theta_2 \\ 1 \end{bmatrix}$$
$$= \begin{bmatrix} \ell_1\cos\theta_1 + \ell_2\cos\theta_1\cos\theta_2 - \ell_2\sin\theta_1\sin\theta_2 \\ \ell_1\sin\theta_1 + \ell_2\sin\theta_1\cos\theta_2 + \ell_2\cos\theta_1\sin\theta_2 \\ 1 \end{bmatrix}
= \begin{bmatrix} \ell_1\cos\theta_1 + \ell_2\cos(\theta_1 + \theta_2) \\ \ell_1\sin\theta_1 + \ell_2\sin(\theta_1 + \theta_2) \\ 1 \end{bmatrix}.$$
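The homogeneous-coordinate version is convenient in code, since every step becomes a matrix product. A rough Python/NumPy sketch (not from the notes; the helper names are ours) with sample values for ℓ1, ℓ2, θ1, θ2:

```python
import numpy as np

def rot_h(theta):
    """3x3 homogeneous-coordinate matrix for a rotation by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def trans_h(a, b):
    """3x3 homogeneous-coordinate matrix for the translation by (a, b)."""
    return np.array([[1, 0, a], [0, 1, b], [0, 0, 1]])

# Endpoint of the two-segment arm: rotate, translate by (l1, 0), rotate again.
l1, l2, t1, t2 = 2.0, 1.0, np.pi / 3, np.pi / 6
end = rot_h(t1) @ trans_h(l1, 0) @ rot_h(t2) @ np.array([l2, 0, 1])
print(end[:2])
# Agrees with (l1*cos(t1) + l2*cos(t1+t2), l1*sin(t1) + l2*sin(t1+t2)):
print(l1*np.cos(t1) + l2*np.cos(t1+t2), l1*np.sin(t1) + l2*np.sin(t1+t2))
```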
219
Another interesting use for linear transformations is the differentiation of polynomials. If we
write our polynomials as
p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 + an xn
where n is a positive integer, then we may represent the polynomial as a vector in Rn+1
a0
a1
a2
p(x) −→ .
..
a
n−1
an
Since
d
p(x) = a1 + 2a2 x + · · · + (n − 1)an−1 xn−2 + nan xn−1
dx
we have
a1
2a2
d 3a3
p(x) −→ .
dx .
.
na
n
0
h iT
In fact, given an arbitrary vector a0 a1 a2 · · · an−1 an ∈ Rn+1 , we have that
a1 a0
0 1 0 ··· 0 0
2a2 a1
0 0 2 ··· 0 0
3a3 a2
.. ... ..
. =
.. . . .
..
0 0 0 ··· 0 n
na a
n n−1
0 0 0 ··· 0 0
0 an
Example 34.10. We can represent
d
(3 − 2x + 4x2 − 7x3 ) = −2 + 8x − 21x2
dx
as
$$\begin{bmatrix} -2 \\ 8 \\ -21 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \\ 4 \\ -7 \end{bmatrix}.$$
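A small sketch (not part of the notes) that builds the differentiation matrix for polynomials of degree at most n and applies it to the polynomial of Example 34.10; the function name diff_matrix is ours, and NumPy is assumed.

```python
import numpy as np

def diff_matrix(n):
    """(n+1) x (n+1) matrix sending (a_0, ..., a_n) to the coefficients of p'(x)."""
    D = np.zeros((n + 1, n + 1))
    for k in range(1, n + 1):
        D[k - 1, k] = k            # d/dx of a_k x^k contributes k*a_k to the x^{k-1} term
    return D

p = np.array([3, -2, 4, -7])       # 3 - 2x + 4x^2 - 7x^3
print(diff_matrix(3) @ p)          # [-2, 8, -21, 0], i.e. -2 + 8x - 21x^2
```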
220
Lecture 35
Solution. We compute
from which we deduce that ~x1 , ~x2 ∈ Ker (L) and ~x3 ∈
/ Ker (L).
Solution. To see if ~y1 ∈ Range (L), we try to find ~x = [ xx12 ] ∈ R2 such that L(~x) = ~y1 . Thus
we need
L(x1 , x2 ) = (x1 + x2 , 2x1 + x2 , 3x2 ) = (2, 3, 3).
53
The kernel of L can also be called the nullspace of L, denoted by Null (L).
221
This leads to a system of equations
x1 + x2 = 2
2x1 + x2 = 3
3x2 = 3
Carrying the augmented matrix of this system to reduced row echelon form gives
1 1 2 −→ 1 1 2 R1 +R2 1 0 1 −→ 1 0 1
2 1 3 R2 −2R1 0 −1 −1 −→ 0 −1 −1 −R2 0 1 1
0 3 3 0 3 3 R3 +3R1 0 0 0 0 0 0
from which we see that x1 = x2 = 1 and so L(1, 1) = (2, 3, 3). Thus ~y1 ∈ Range (L). For
~y2 , we seek ~x = [ xx12 ] ∈ R2 such that L(x1 , x2 ) = (1, 1, 2). A similar computation leads to a
system of equations with augmented matrix
1 1 1 1 1 1
2 1 1 −→ 0 −1 −1 .
0 3 2 0 0 −1
As this system is inconsistent, there is no ~x = [ xx12 ] ∈ R2 such that L(x1 , x2 ) = (1, 1, 2) and
so ~y2 ∈
/ Range (L).
As one might expect, the kernel and range for a linear transformation are both subspaces.
Proof.
(1) By definition, Ker (L) ⊆ Rn and since L is linear, L(~0Rn ) = ~0Rm so ~0Rn ∈ Ker (L) and
Ker (L) is nonempty. For ~x, ~y ∈ Ker (L), we have that L(~x) = ~0 = L(~y ). Then, since
L is linear
L(~x + ~y ) = L(~x) + L(~y ) = ~0 + ~0 = ~0
so ~x + ~y ∈ Ker (L) and Ker (L) is closed under vector addition. For c ∈ R, we again
use the linearity of L to obtain
L(c~x) = cL(~x) = c ~0 = ~0
showing that c~x ∈ Ker (L) so that Ker (L) is closed under scalar multiplication. Hence,
Ker (L) is a subspace of Rn .
222
(2) By definition, Range (L) ⊆ Rm and since L is linear, L(~0Rn ) = ~0Rm so ~0Rm ∈ Range (L)
and Range (L) is nonempty. For ~x, ~y ∈ Range (L), there exist ~u, ~v ∈ Rn such that
~x = L(~u) and ~y = L(~v ). Then since L is linear,
[ L ] = [ L(~e1 ) · · · L(~en ) ]
and that L(~x) = [ L ]~x for every ~x ∈ Rn . Thus we may view the kernel of L as the nullspace
of [ L ] and the range of L as the column space of [ L ].
Note that in Example 35.4, to see if ~y1 ∈ Range (L), we are ultimately checking if the
linear system of equations [ L ]~x = ~y1 is consistent, that is, if ~y1 ∈ Col ([ L ]).
Example 35.7. Leth Li: R3 → R3 be a projection onto the line through the origin with
1
direction vector d~ = 1 . Find a basis for Ker (L) and Range (L).
1
If ~x ∈ Ker (L), then L(~x) = [ L ]~x = ~0. Carrying [ L ] to reduced row echelon form gives
1/3 1/3 1/3 1 1 1
1/3 1/3 1/3 −→ 0 0 0
223
and we see that
−1 −1
~x = s 1 + t 0 , s, t ∈ R
0 1
so
−1 −1
1 , 0
0 1
is a basis for Ker (L). To find a basis for Range (L), we find all vectors ~y ∈ Rm for which
there exists a ~x ∈ Rn with L(~x) = ~y . But this is equivalent to finding all ~y ∈ Rm for which
the system [ L ]~x = ~y is consistent, and the system [ L ]~x = ~y is consistent if and only if
~y ∈ Col ([ L ]). Hence we simply seek a basis for Col ([ L ]). From our work above, we see
that the reduced row echelon form of [ L ] has a leading one in the first column only, and so
a basis for Range (L) is
1/3
1/3 .
1/3
Example 35.8. Find a basis for Ker (L) and Range (L) where L is the linear transformation
satisfying
L(x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ).
Solution. We have
" #
h i 1 1 0
[L] = L(~e1 ) L(~e2 ) L(~e3 ) = .
0 1 1
1
and so
1
−1
1
224
is a basis for Ker (L). As the reduced row echelon form of [ L ] has leading ones in the first
two columns, a basis for Range (L) is
(" # " #)
1 1
, .
0 1
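Bases for the kernel and range can also be obtained with a computer algebra system. The sketch below (not from the notes) uses SymPy, whose nullspace and columnspace methods return the same bases found in Example 35.8 (up to ordering and scaling).

```python
from sympy import Matrix

L = Matrix([[1, 1, 0],
            [0, 1, 1]])            # [L] from Example 35.8

print(L.nullspace())               # basis for Ker(L): one vector, (1, -1, 1)
print(L.columnspace())             # basis for Range(L): the first two columns of [L]
```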
In Example 35.8,
h 1 inote that geometrically, Ker (L) is a line through the origin with direction
vector d~ = −1 , which is a 1−dimensional subspace of R3 and that Range (L) = R2 .
1
Figure 58 gives a more general geometric interpretation of the domain and range of a linear
transformation from Rn to Rm .
Figure 58: Visualizing the kernel and the range of a linear transformation.
Note that an equivalent definition is that L is one-to-one if ~x1 6= ~x2 implies L(~x1 ) 6= L(~x2 ).
Thus a one-to-one transformation cannot send distinct elements of the domain to the same
element in the range, and it follows that each element of the range is the image of at most
one element in the domain. This is illustrated in Figure 59.
54
Again, this definition holds for any function from Rn to Rm . In fact, if n = m = 1, then we have a
function from R to R, and in this case the definition amounts to the horizontal line test often seen in a
calculus course.
225
(a) An example of a one-to-one transforma- (b) An example of a transformation (or func-
tion (or function). tion) that is not one-to-one. Note that ~y1 is
the image of both ~x1 and ~x2 , but ~x1 6= ~x2 .
Figure 59: For a one-to-one transformation, every element of the codomain is the image of
at most one element from the domain.
Assume now that Ker (L) = {~0}. If ~x1 , ~x2 ∈ Rn are such that L(~x1 ) = L(~x2 ), then using the
linearity of L, we have
~0 = L(~x1 ) − L(~x2 ) = L(~x1 − ~x2 )
and so ~x1 − ~x2 ∈ Ker (L). Since Ker (L) = {~0}, we see that ~x1 − ~x2 = ~0, that is, ~x1 = ~x2 and
so L is one-to-one.
Given that the kernel of a linear transformation from Rn to Rm is simply the nullspace of
the standard matrix, we can actually use the rank of the standard matrix to determine if a
linear transformation is one-to-one.
Theorem 35.11. Let L : Rn → Rm be a linear transformation. Then L is one-to-one if and
only if rank ([ L ]) = n.
Proof. Since L : Rn → Rm is a linear transformation, [ L ] ∈ Mm×n (R). Then
226
Example 35.12. Consider the linear transformations L : R2 → R3 and M : R2 → R2
defined by
L(x1 , x2 ) = (x1 , x2 − x1 , x2 )
M (x1 , x2 ) = (x1 + x2 , 2x1 + 2x2 )
0 1 0 0
and we see that rank ([ L ]) = 2 = n and thus L is one-to-one. In the case of M , n = 2 and
the standard matrix for M is
" # " #
1 1 1 1
[M ] = −→
2 2 0 0
227
Lecture 36
Recall that a linear transformation L : Rn → Rm is one-to-one if L(~x1 ) = L(~x2 ) implies
that ~x1 = ~x2 for every ~x1 , ~x2 ∈ Rn . Recall also that this means that every element in the
codomain of L is the image of at most one element in the domain, and that knowing the rank
of [ L ] allows us to verify if L is one-to-one. We now look for a condition that guarantees
that every element in the codomain is the image of at least one element from the domain.
Definition 36.1. Let L : Rn → Rm be a (linear) transformation. L is called onto (or
surjective) if for every ~y ∈ Rm there exists an ~x ∈ Rn such that L(~x) = ~y .
It is clear that Range (L) ⊆ Rm . It follows that if L : Rn → Rm is onto, then Range (L) = Rm .
Figure 60 gives an illustration of an onto transformation.
Figure 60: For an onto transformation, every element of the codomain is the image of at
least one element from the domain.
The next theorem shows that we can use the rank of the standard matrix of a linear trans-
formation to determine if the linear transformation is onto.
Theorem 36.2. Let L : Rn → Rm be a linear transformation. Then L is onto if and only if
rank ([ L ]) = m.
Proof. Since L : Rn → Rm is a linear transformation, [ L ] ∈ Mm×n (R). Then
L is onto ⇐⇒ for every ~y ∈ Rm there exists a ~x ∈ Rn such that L(~x) = ~y
⇐⇒ [ L ]~x = ~y is consistent for every ~y ∈ Rm
⇐⇒ rank ([ L ]) = m by the System-Rank Theorem (3).
Example 36.3. Let L : R3 → R2 and M : R2 → R3 be linear transformations defined by
L(x1 , x2 , x3 ) = (x1 + x2 − x3 , x2 + x3 )
M (x1 , x2 ) = (x1 , x2 , 0)
Determine which of L and M are onto.
228
Solution. In the case of L, m = 2. The standard matrix for L is
" # " #
1 1 −1 1 0 −2
[L] = −→
0 1 1 0 1 1
and we see rank ([ L ]) = 2 = m, and thus L is onto. In the case of M , m = 3 and the
standard matrix for M is
1 0
[M ] = 0 1
0 0
and thus rank ([ M ]) = 2 < 3 = m. Hence M is not onto.
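Theorems 35.11 and 36.2 reduce the one-to-one and onto questions to a rank computation, which is easy to automate. A minimal sketch (not from the notes; the function names are ours), assuming NumPy:

```python
import numpy as np

def is_one_to_one(A):
    """L is one-to-one iff rank([L]) = n, the number of columns of [L]."""
    return np.linalg.matrix_rank(A) == A.shape[1]

def is_onto(A):
    """L is onto iff rank([L]) = m, the number of rows of [L]."""
    return np.linalg.matrix_rank(A) == A.shape[0]

L = np.array([[1, 1, -1], [0, 1, 1]])      # [L] from Example 36.3
M = np.array([[1, 0], [0, 1], [0, 0]])     # [M] from Example 36.3
print(is_onto(L), is_onto(M))              # True False
print(is_one_to_one(L), is_one_to_one(M))  # False True
```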
(1) If {~x1 , . . . , ~xk } is linearly independent and L is one-to-one, then {L(~x1 ), . . . , L(~xk )} is
linearly independent.
(2) If Span {~x1 , . . . , ~xk } = Rn and L is onto, then Span {L(~x1 ), . . . , L(~xk )} = Rm .
Proof.
(1) Assume L is one-to-one and that {~x1 , . . . , ~xk } is linearly independent. For scalars
c1 , . . . , ck ∈ R consider
c1 L(~x1 ) + · · · + ck L(~xk ) = ~0.
We must show that c1 = · · · = ck = 0. Since L is linear, we have
L(c1~x1 + · · · + ck ~xk ) = ~0
and thus, c1~x1 +· · ·+ck ~xk ∈ Ker (L). Since L is one-to-one, Ker (L) = {~0} by Theorem
35.10. Hence
c1~x1 + · · · + ck ~xk = ~0
and since {~x1 , . . . , ~xk } is linearly independent, c1 = · · · = ck = 0. Hence {L(~x1 ), · · · , L(~xk )}
is linearly independent.
(2) Assume L is onto and that Span {~x1 , . . . , ~xk } = Rn . Let ~y ∈ Rm . We must show that
~y can be expressed as a linear combination of L(~x1 ), . . . , L(~xk ). Since L is onto, there
exists an ~x ∈ Rn so that L(~x) = ~y . As Span {~x1 , . . . , ~xk } = Rn , there are c1 , . . . , ck ∈ R
so that
~x = c1~x1 + · · · + ck ~xk .
Then since L is linear
229
and we see that ~y ∈ Span {L(~x1 ), . . . , L(~xk )}. This shows
230
Determinants, Adjugates and Matrix Inverses
We return now to studying matrices. Previously, we used the Matrix Inversion Algorithm
to both decide if an n × n matrix A was invertible and to compute A−1 if A was invertible.
Now we study a number associated to an n × n matrix A, called the determinant and we
will see how the determinant is related to the invertibility. We begin with a 2 × 2 matrix.
231
Theorem 36.9. Let A ∈ M2×2 (R). Then
A(adj A) = (det A)I = (adj A)A
Moreover, A is invertible if and only if det A ≠ 0 and in this case
1
A−1 = adj A
det A
Proof. Let " #
a b
A= ∈ M2×2 (R)
c d
Then " #
d −b
det A = ad − bc and adj A =
−c a
Now " #" # " #
a b d −b ad − bc 0
A(adj A) = = = (det A)I
c d−c a 0 ad − bc
" #" # " #
d −b a b ad − bc 0
(adj A)A = = = (det A)I
−c a c d 0 ad − bc
Assume then that det A 6= 0. From
A(adj A) = (det A)I = (adj A)A
we obtain
1 1
A adj A = I = adj A A
det A det A
so
1
A−1 = adj A.
det A
Thus det A ≠ 0 implies that A is invertible and gives our formula for A−1 . We now show that if A is
invertible, then det A 6= 0. Assume for a contradiction that det A = 0. Since A is invertible,
A 6= 0 so at least one of a, b, c, d are not zero. Since
A(adj A) = (det A)I = 0I = 0,
we have " # " # " # " #
d 0 −b 0
A = and A = .
−c 0 a 0
Since not all of a, b, c, d are zero, we have that either
" # " # " # " #
d 0 −b 0
6= or 6=
−c 0 a 0
from which we see that the homogeneous system A~x = ~0 has a nontrivial solution, so A is
not invertible by the Invertible Matrix Theorem. This is a contradiction, so our assumption
that det A = 0 was incorrect, and we must have det A 6= 0.
232
Lecture 37
We now turn our attention to computing the determinant of an n × n matrix. We will see
that the definition of the determinant of an n × n matrix is recursive - to compute such
a determinant, we will compute n determinants of size (n − 1) × (n − 1). This can be
quite tedious by hand, so we will also begin to explore how elementary row (and column)
operations can greatly reduce our work.
Definition 37.1. Let A ∈ Mn×n (R) and let A(i, j) be the (n − 1) × (n − 1) matrix obtained
from A by deleting the ith row and jth column of A. The (i, j)-cofactor of A, denoted by
Cij , is
Cij = (−1)i+j det A(i, j).
Example 37.2. Let
1 −2 3
A= 1 0 4
4 1 1
then the (3, 2)-cofactor of A is
1 3
C32 = (−1)3+2 det A(3, 2) = (−1)5 = (−1)(4 − 3) = −1
1 4
1 3
C22 = (−1)2+2 det A(2, 2) = (−1)4 = 1(1 − 12) = −11.
4 1
Definition 37.3. Let A ∈ Mn×n (R). For any i = 1, . . . , n, we define the determinant of A
as
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin
which we refer to as a cofactor expansion of A along the ith row of A. Equivalently, for any
j = 1, . . . , n,
det A = a1j C1j + a2j C2j + · · · + anj Cnj
which we refer to as a cofactor expansion of A along the jth column of A.
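The recursive nature of this definition is easy to see in code. The following plain-Python sketch (not part of the notes; the function name is ours) computes a determinant by cofactor expansion along the first row. It is fine for small matrices but far too slow for large ones, which is one reason the row-operation techniques introduced later matter.

```python
def det_cofactor(A):
    """Determinant of a square matrix (list of lists) by cofactor expansion
    along the first row, computed recursively."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det_cofactor(minor)  # (-1)^(1+(j+1)) = (-1)^j
    return total

print(det_cofactor([[1, -2, 3], [1, 0, 4], [4, 1, 1]]))   # -31, the matrix of Example 37.2
```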
Note that we can do a cofactor expansion along any row or column we choose. This is
illustrated in the next example.
Example 37.4. Compute det A where
1 2 −3
A = 4 −5 6
−7 8 9
233
Solution. Doing a cofactor expansion along the first row (recall that the signs (−1)^{i+j} follow the checkerboard pattern $\begin{smallmatrix} + & - & + \\ - & + & - \\ + & - & + \end{smallmatrix}$) gives
$$\det A = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix}
= 1\begin{vmatrix} -5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} + (-3)\begin{vmatrix} 4 & -5 \\ -7 & 8 \end{vmatrix}
= 1(-45 - 48) - 2(36 + 42) - 3(32 - 35) = -93 - 156 + 9 = -240.$$
Example 37.5. Compute det B by doing a cofactor expansion along the first column, where
$$B = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 3 & 4 \\ 5 & 6 & -7 \end{bmatrix}.$$
Solution. We have
$$\det B = \begin{vmatrix} 1 & 0 & -2 \\ 0 & 3 & 4 \\ 5 & 6 & -7 \end{vmatrix}
= 1(-1)^{1+1}\begin{vmatrix} 3 & 4 \\ 6 & -7 \end{vmatrix} + 0(-1)^{2+1}\begin{vmatrix} 0 & -2 \\ 6 & -7 \end{vmatrix} + 5(-1)^{3+1}\begin{vmatrix} 0 & -2 \\ 3 & 4 \end{vmatrix}$$
$$= 1(-21 - 24) + 0 + 5(0 + 6) = -45 + 30 = -15$$
234
Example 37.6. Find det A if
1 2 −1 3
1 2 0 4
A=
0 0 0 3
−1 1 2 1
1 2 −1 3
1 2 −1
1 2 0 4
det A = = −3 1 2 0
0 0 0 3
−1 1 2
−1 1 2 1
To evaluate the determinant of the 3 × 3 matrix, we can do a cofactor expansion along the
third column. This gives
!
1 2 1 2
det A = −3 −1 +2
−1 1 1 2
= −3(−1(1 + 2) + 2(2 − 2))
= −3(−3 + 0)
=9
3. The adjugate of A is
adj A = [Cij ]T ∈ Mn×n (R)
3 4 5
235
Solution.
T
1 2 1 2 1 1
−
4 5 3 5 3 4
T
−3 1 1 −3 2 1
2 3 1 3 1 2
− 4 5
adj A = − 2 −4 2 = 1 −4
=
1
3 5 3 4
1 1 −1 1 2 −1
2 3 1 3 1 2
−
1 2 1 2 1 1
Note that in the previous example, we computed all of the cofactors of A. Thus we can
easily compute the determinant of A by doing a cofactor expansion along, say, the first row:
1 2 3
det A = 1 1 2 = a11 C11 + a12 C12 + a13 C13 = 1(−3) + 2(1) + 3(1) = 2.
3 4 5
Note that
1 2 3 −3 2 1 2 0 0
A(adj A) = 1 1 2 1 −4 1 = 0 2 0 = 2I = (det(A))I
3 4 5 1 2 −1 0 0 2
−3 2 1 1 2 3 2 0 0
(adj A)A = 1 −4 1 1 1 2 = 0 2 0 = 2I = (det(A))I
1 2 −1 3 4 5 0 0 2
a31 C11 + a32 C12 + a33 C13 a31 C21 + a32 C22 + a33 C23 a31 C31 + a32 C32 + a33 C33
236
and
a11 C11 + a21 C21 + a31 C31 a12 C11 + a22 C21 + a32 C31 a13 C11 + a23 C21 + a33 C31
(adj A)A = a11 C12 + a21 C22 + a31 C32 a12 C12 + a22 C22 + a32 C32 a13 C12 + a23 C22 + a33 C32
a11 C13 + a21 C23 + a31 C33 a12 C13 + a22 C23 + a32 C33 a13 C13 + a23 C23 + a33 C33
The (1, 1)−, (2, 2)− and (3, 3)− entries of A(adj A) are respectively the cofactor expansions
along the first, second and third rows of A, and thus are each equal to det A. The (1, 1)−,
(2, 2)− and (3, 3)− entries of (adj A)A are respectively the cofactor expansions along the first,
second and third columns of A, and are thus each equal to det A. The entries of A(adj A)
and (adj A)A that are not on the main diagonal look like cofactor expansions, but they are
not (they are sometimes called false determinants). These always evaluate to zero.
The following theorem generalizes Theorem 36.9 for n × n matrices. The proof is similar and
is thus omitted.
1 2 4
1 4 1 4 1 1
det A = 1 −1 +2
2 4 1 4 1 2
= 1(4 − 8) − 1(4 − 4) + 2(2 − 1)
= −4 + 2
= −2
237
Then
T
1 4 1 4 1 1
−
2 4 1 4 1 2
T
−4 0 1 −4 0 2
1 2 1 2 1 1
− 2 4
adj A = − 2 −1 = 0 2 −2
=
0
1 4 1 2
2 −2 0 1 −1 0
1 2 1 2 1 1
−
1 4 1 4 1 1
so
−4 0 2 2 0 −1
1
A−1 =− 0 2 −2 = 0 −1 1
2
1 −1 0 −1/2 1/2 0
Although we have developed determinants for square real matrices, determinants are also
defined for square complex matrices, and the computations are identical as those for real
matrices.
Solution. We have
and
" #
4 −2 + j
adj A =
−3j 1+j
so
" # " #
4
1 4 −2 + j + 85 j − 45 − 35 j
A−1 = = 5
6
.
1 − 2j −3j 1+j 5
− 35 j − 15 + 35 j
238
Elementary Row/Column Operations
After computing several determinants, we see that having a row or column consisting of
mostly zeros greatly simplifies our work. We now investigate how a determinant changes after
a matrix has elementary row operations (or elementary column operations55,56 performed on
it). Our goal is to use these operations to introduce rows and/or columns with many zero
entries.
Example 37.12. Consider
" # " # " # " #
1 2 2 1 1 3 2 2
A= , B= , C= and D =
1 4 4 1 1 5 2 4
and notice that B, C and D can each be derived from A by exactly one elementary column
operation.
" # " #
1 2 −→ 2 1
A= =B and det B = − det A
1 4 C1 ↔C2 4 1
" # " #
1 2 −→ 1 3
A= = C and det C = det A
1 4 C1 +C2 →C2 1 5
" # " #
1 2 2C1 →C1 2 2
A= = D and det D = 2 det A
1 4 −→ 2 4
It appears that the determinant changes predictably under these elementary column opera-
tions (the same holds for elementary row operations).
Theorem 37.13. Let A ∈ Mn×n (R).
(1) If A has a row (or column) of zeros, then det A = 0.
(2) If B is obtained from A by swapping two distinct rows (or two distinct columns), then
det B = − det A.
(3) If B is obtained from A by adding a multiple of one row to another row (or a multiple
of one column to another column) then det B = det A.
55
Elementary column operations are the same as elementary row operations, but performed on the columns.
One may think of performing an elementary column operation on A as performing an elementary row
operation on AT .
56
When solving a linear system of equations by carrying the augmented matrix to reduced row echelon
form, you must perform elementary row operations, and not elementary column operations.
239
(4) If two distinct rows of A (or two distinct columns of A) are equal, then det A = 0.
Note: Do not perform elementary row operations and elementary column operations at the
same time. In particular, do not add a multiple of a row to a column, or swap a row with a
column. If you need to do both types of operations, do the row operations in one step and
the column operations in another.
240
Lecture 38
We now use elementary row and column operations to simplify the taking of determinants.
7 8 10
Solution. Rather than immediately doing a cofactor expansion, we will perform elementary
row operations to A to introduce two zeros in the first column, and then do a cofactor
expansion along that column.
1 2 3 = 1 2 3
−3 −6
det A = 4 5 6 R2 −4R1 0 −3 −6 =1
−6 −11
7 8 10 R3 −7R1 0 −6 −11
Of course, we could now evaluate the 2 × 2 determinant, but to include another example, we
will instead multiply the first column by a factor of −1/3 and then evaluate the simplified
determinant.
−3 −6 − 31 C1 →C1 1 −6
det A = (−3) = (−3)(−11 + 12) = −3.
−6 −11 = 2 −11
A couple of things to note here. First, we are using “=” rather than “−→” when we perform
our elementary operations on A. This is because we are really working with determinants,
and provided we are making the necessary adjustments mentioned in Theorem 37.13, we will
maintain equality. Secondly, when we performed the operation − 31 C1 → C1 , a factor of −3
appeared rather than a factor of −1/3. To see why this is, consider
" # " #
−3 −6 1 −6
C= and B =
−6 −11 2 −11
241
which is why we have
−3 −6 1 −6
= (−3)
−6 −11 2 −11
We normally view this type of row or column operation as “factoring out” of that row or
column, and we omit writing this type of operation as we reduce.
1 c c2
Show that det A = (b − a)(c − a)(c − b).
Solution. We again introduce two zeros into the first column by performing elementary row
operations on A, and then do a cofactor expansion along that column.
1 a a2 = 1 a a2
(b − a) (b − a)(b + a)
det A = 1 b b2 R2 −R1 0 b − a b 2 − a2 =1
(c − a) (c − a)(c + a)
1 c c2 R3 −R1 0 c − a c 2 − a2
1 b+a
= (b − a)(c − a)
1 c+a
= (b − a)(c − a)(c + a − b − a)
= (b − a)(c − a)(c − b)
(b − a) (b − a)(b + a) 1 b+a
= (b − a)(c − a) (25)
(c − a) (c − a)(c + a) 1 c+a
results from removing a factor of b − a from the first row of the determinant on the left, and
removing a factor of c − a from the second row. These correspond to the row operations
1 1
R → R1 and c−a
b−a 1
R2 → R2 . It is natural to ask what happens if a = b or a = c since it
would appear that we are dividing by zero in these cases. However, if a = b or a = c, we see
that both sides of (25) evaluate to zero, so that we still have equality.
1 x x
For what values of x ∈ R does A fail to be invertible?
242
Solution. A fails to be invertible exactly when det A = 0. Thus we have
x x 1 R1 −xR3 0 x − x 2 1 − x2
x(1 − x) (1 + x)(1 − x)
0= x 1 x R2 −xR3 0 1 − x2 x − x 2 =1
(1 + x)(1 − x) x(1 − x)
1 x x = 1 x x
x 1+x
= (1 − x)2
1+x x
= (1 − x)2 (x2 − (1 + x)2 ) = (1 − x)2 (x2 − 1 − 2x − x2 )
= −(1 − x)2 (1 + 2x)
so A is not invertible exactly when −(1−x)2 (1+2x) = 0, that is, when x = 1 or x = −1/2.
Example 38.4. Compute det A if
1 0 0 0
2 3 0 0
A=
4 5 6 0
7 8 9 10
Solution.
1 0 0 0
3 0 0
2 3 0 0 6 0
det A = =1 5 6 0 = 1(3) = 1(3)(6)(10) = 180
4 5 6 0 9 10
8 9 10
7 8 9 10
Note that in the previous example, det A is just the product of the entries on the main
diagonal57
Definition 38.5. Let A ∈ Mm×n (R). A is called upper triangular if every entry below the
main diagonal is zero. A is called lower triangular if every entry above the main diagonal is
zero.
Example 38.6. The matrices
4 −7 1 2 3 " #
0 0
0 3 , 0 4 10 and
0 0
0 0 0 0 −2
57
Recall that for A = [aij ] ∈ Mm×n (R), the main diagonal of A consists of the entries a11 , a22 , . . . , akk
with k being the minimum of m and n.
243
are upper triangular, and the matrices
" # 0 0 0 " #
3 0 0 0 0
, 1 2 0 and
2 −4 0 0 0
−1 3 4
Properties of Determinants
det(kA) = k n det A.
Solution. We have
and
−1 5
det(AB) = = −11 − (−5) = −6.
−1 11
for any A, B ∈ Mn×n (R). This means that even though A and B do not commute in general,
we are guaranteed that det(AB) = det(BA).
244
Note that Theorem 38.10 extends to more than two matrices. For A1 , A2 , . . . , Ak ∈ Mn×n (R),
and
It follows that
det(Ak ) = (det A)k
for any integer k where k ≤ 0 requires that A be invertible.
Recalling Theorem 30.6, we have that for A1 , A2 , . . . , Ak ∈ Mn×n (R) invertible, the product
A1 A2 · · · Ak is invertible and
245
Note that since det(AT ) = det(A) for a square matrix A, we see why we may perform column
operations on A when computing det(A) – column operations performed on A are just row
operations performed on AT .
Example 38.14. If det(A) = 3, det(B) = −2 and det(C) = 4 for A, B, C ∈ Mn×n (R), find
det(A2 B T C −1 B 2 (A−1 )2 )
Solution. We have
$$\det(A^2 B^T C^{-1} B^2 (A^{-1})^2) = (\det A)^2 (\det B)\,\frac{1}{\det C}\,(\det B)^2\,\frac{1}{(\det A)^2} = \frac{(\det B)^3}{\det C} = \frac{(-2)^3}{4} = -2.$$
246
Lecture 39
Example 39.1. Find a cubic polynomial p(x) whose graph passes through each of the points
(−2, −5), (−1, 4), (1, 4) and (3, 60).
Solving the system gives a0 = 3, a1 = −2, a2 = 1 and a3 = 2, that is, p(x) = 3−2x+x2 +2x3 .
More generally, given n data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), we construct a polynomial
p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 such that p(xi ) = yi for each i = 1, . . . , n. This gives
the system of equations
247
whose matrix equation is
1 x1 x21 · · · xn−1
1 a0 y1
1 x2 x2 · · · xn−1
2
a1 y2
2
=
.. .. .. .. .. .. (26)
. . . . . .
2 n−1
1 xn xn · · · xn an−1 yn
1 x3 x23
that is, det A is the product of the terms (xj − xi ) where j > i and i, j both lie between
1 and n inclusively. It follows that the n × n Vandermonde matrix is invertible if and only
if x1 , x2 , . . . , xn are all distinct and that in this case, Equation (26) has a unique solution.
This shows the following:
Theorem 39.2. For the n data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) where x1 , x2 , . . . , xn are
all distinct, there exists a unique polynomial
248
Example 39.3. A car manufacturing company uses a wind tunnel to test the force due to
air resistance experienced by the car windshield. The following data was collected:
Air velocity (m/s) 20 33 45
Force on windshield (N) 200 310 420
Construct a quadratic polynomial to model this data, and use it to predict the force due to
air resistance from a wind speed of 40m/s.
Solution. Let p(x) = a0 +a1 x+a2 x2 where a0 , a1 , a2 ∈ R. Using our data points (20, 200), (33, 310)
and (45, 420) we obtain the system of equations in matrix notation
1 20 400 a0 200
1 33 1089 a1 = 310
1 45 2025 a2 420
The determinant of the coefficient matrix is (45 − 20)(45 − 33)(33 − 20) = 25 · 12 · 13 = 3900
and the adjugate is
T
33 1089 1 1089 1 33
−
45 2025 1 2025 1 45
T
17 820 −936 12
− 20 400 1 400 1 20
− = −22 500 1625 −25
45 2025 1 2025 1 45
8 580 −689 13
20 400 1 400 1 20
−
33 1089 1 1089 1 33
17 820 −22 500 8 580
= −936 1625 −689
12 −25 13
so
a0 17 820 −22 500 8 580 200 642/13
1
a1 = −936 1625 −689 310 = 209/30
3900
a2 12 −25 13 420 11/390
Thus
642 209 11 2
p(x) = + x+ x
13 30 390
When x = 40, we have
642 209 11 2 14 554
p(40) = + 40 + 40 = ≈ 373.18
13 30 390 39
When the air velocity is 40 m/s, the windshield experiences approximately 373.18N of force.
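The interpolation can be reproduced numerically by setting up the Vandermonde system of Equation (26) and solving it. A short sketch (not from the notes), assuming NumPy:

```python
import numpy as np

x = np.array([20.0, 33.0, 45.0])
y = np.array([200.0, 310.0, 420.0])

# Vandermonde matrix with columns 1, x, x^2 (increasing powers), as in Equation (26).
V = np.vander(x, 3, increasing=True)
a = np.linalg.solve(V, y)                    # coefficients a_0, a_1, a_2
print(a)                                     # approximately [49.38, 6.97, 0.0282]
print(a @ np.array([1.0, 40.0, 40.0**2]))    # p(40) ≈ 373.18
```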
249
Determinants and Area
Let " # " #
u1 v1
~u = and ~v =
u2 v2
h u1 i h v1 i
2
be vectors in R . Recall that [ uu12 ] 6= u2 and [ vv12 ] 6=
, and that the parallelogram
v2
0 0 h u1 i
determined by [ uu12 ] and [ vv12 ] is a subset of R2 while the parallelogram determined by u2
h v1 i 0
3
and 2 is a subset of R . However, these two parallelograms do have the same area. See
v
0
Figure 62.
Figure 62: A parallelogram determined by ~u, ~v ∈ R2 on the left, and its “realization” lying
in the x1 x2 −plane of R3 on the right.
0 0 u1 v2 − v1 u2
" #
u1 v1
= |u1 v2 − v1 u2 | = det = det[ ~u ~v ] .
u2 v2
58
We need to be careful here: we explicitly write “det” when indicating a determinant since we are using
“| · · · |” to indicate absolute value and not the determinant. Mathematics often uses the same notation to
mean different things in different settings, and so we must be careful in cases such as this when such notation
could be interpreted in several ways.
250
Example 39.4. The area of the parallelogram determined by the vectors
" # " #
1 3
~u = and ~v =
2 4
is " #
1 3
A = det = |4 − 6| = | − 2| = 2.
2 4
Example 39.5. Let ~u, ~v ∈ R2 determine a parallelogram with area equal to 4. Let
L : R2 → R2 be a linear transformation with standard matrix
" #
1 2
[L] = .
−1 1
Figure 63: The parallelogram determined by ~u and ~v on the left and its image under the
linear transformation L on the right.
251
Although we have focused on parallelograms, our work generalizes to any shape in R2 . For
example, consider a circle of radius r = 1 centred at the origin in R2 . The area of this circle
is Acircle = πr2 = π(1)2 = π. If we consider a stretch in the x1 −direction by a factor of 2,
then we are considering the linear transformation L : R2 → R2 with standard matrix
" #
2 0
[L] = .
0 1
The image of our circle under L is called an ellipse, and this ellipse has area
Figure 64 depicts our circle and the resulting ellipse, and shows that our result for the area
of the ellipse is consistent with the actual formula for the area of an ellipse.
Figure 64: A circle of radius 1 centred at the origin on the left, and its image under the
linear transformation L on the right.
252
Lecture 40
u3 v3 w3
be three vectors in R3 . From before, we know the volume, V , of the parallelepiped they
determine is given by
V = |~u · (~v × w)|.
~
Working with the components of ~u, ~v and w
~ gives
V = |~u · (~v × w)|
~
u1 v1 w1
= u2 · v2 × w2
u3 v3 w3
u1 v2 w3 − w2 v3
= u2 · −(v1 w3 − w1 v3 )
u3 v1 w2 − w1 v2
u1 v1 w1
= det u2 v2 w2
u3 v3 w3
h i
= det ~u ~v w ~ .
As in R2 , our work generalizes to any shape in R3 . For example, consider a sphere of radius
r = 1 centred at the origin in R3 . The volume of this sphere is Vsphere = 34 πr3 = 43 π(1)3 = 43 π.
If we consider a stretch in the x2 −direction by a factor of 2 and a stretch in the x3 −direction
by a factor of 3, then we have the linear transformation L : R3 → R3 with standard matrix
1 0 0
[L] = 0 2 0 .
0 0 3
253
The image of our sphere under L is an ellipsoid, and this ellipsoid has volume
4
Vellipsoid = det[ L ] Vsphere = |6| π = 8π.
3
Figure 65 illustrates our sphere and the resulting ellipsoid, and shows that our result for the
volume of the ellipsoid is consistent with the actual formula for the volume of an ellipsoid.
Figure 65: A sphere of radius 1 centred at the origin on the left, and its image under the
linear transformation L on the right.
254
then " #" # " #
−3/5 4/5 1 1
A~x = = = 1~x,
4/5 3/5 2 2
and so λ = 1 is an eigenvalue of A and ~x = [ 12 ] is a corresponding eigenvector.
Example 40.3. Let L : R2 → R2 be a reflection in the x2 −axis. Then we know L is a linear
transformation and " #
−1 0
A = [L] =
0 1
is the standard matrix of L. Thinking geometrically, we see that the reflection of ~e1 in
the x2 −axis is −~e1 , that is, A~e1 = −~e1 = (−1)~e1 so λ = −1 is an eigenvalue of A with
corresponding eigenvector ~e1 . Similarly, we see A~e2 = ~e2 = 1~e2 , so λ = 1 is an eigenvalue
of A with corresponding eigenvector ~e2 . In fact, any nonzero vector lying on the x1 −axis is
an eigenvector corresponding to λ = −1 and any nonzero vector lying on the x2 −axis is an
eigenvector corresponding to λ = 1.
How do we find eigenvalues and eigenvectors for A ∈ Mn×n (R)? For a nonzero vector ~x and
scalar λ, we have that λ is an eigenvalue of A with corresponding eigenvector ~x if and only
if
A~x = λ~x ⇐⇒ A~x − λ~x = ~0 ⇐⇒ A~x − λI~x = ~0 ⇐⇒ (A − λI)~x = ~0.
Thus we will consider the homogeneous system (A − λI)~x = ~0. Since ~x 6= ~0, we require
nontrivial solutions to this system, and since A − λI is an n × n matrix, the Invertible
Matrix Theorem gives that A − λI cannot be invertible, and so det(A − λI) = 0. This
verifies the following theorem.
Theorem 40.4. Let A ∈ Mn×n (R). A number λ is an eigenvalue of A if and only if λ satisfies
the equation
det(A − λI) = 0.
If λ is an eigenvalue of A, then the eigenvectors of A corresponding to λ are precisely the nonzero solutions of the homogeneous system of equations
(A − λI)~x = ~0.
Theorem 40.4 indicates that to find the eigenvalues and corresponding eigenvectors of an
n × n matrix A, we first find all scalars λ so that det(A − λI) = 0 which will be our
eigenvalues. Then for each eigenvalue λ of A, we find the nullspace of A − λI by solving the
homogeneous system (A − λI)~x = ~0. The nonzero vectors of Null (A − λI) will be the set of
eigenvectors of A corresponding to λ. We make the following definition.
Definition 40.5. Let A ∈ Mn×n (R). The characteristic polynomial of A is
CA (λ) = det(A − λI).
We note that λ is an eigenvalue of A if and only if CA (λ) = 0. As we will see, CA (λ) is
indeed a polynomial. Since A ∈ Mn×n (R), CA (λ) will have real coefficients, but may have
non–real roots.
255
Lecture 41
Example 41.1. Find the eigenvalues and all corresponding eigenvectors for the matrix
" #
1 2
A= .
−1 4
1−λ 2
CA (λ) = det(A − λI) = = (1 − λ)(4 − λ) − 2(−1)
−1 4 − λ
= 4 − 5λ + λ2 + 2 = λ2 − 5λ + 6 = (λ − 2)(λ − 3).
Now λ is an eigenvalue of A if and only if CA (λ) = 0, that is, if and only if (λ − 2)(λ − 3) = 0.
Thus λ1 = 2 and λ2 = 3 are the eigenvalues of A. To find the eigenvectors of A corresponding
to λ1 = 2, we solve the homogeneous system (A − 2I)~x = ~0.
" # " # " #
−1 2 −→ −1 2 −R1 1 −2
A − 2I =
−1 2 R2 −R1 0 0 −→ 0 0
so " # " #
2t 2
~x = =t , t ∈ R.
t 1
Thus the eigenvectors of A corresponding to λ1 = 2 are
" #
2
t , t ∈ R, t 6= 0.
1
256
Definition 41.2. Let λ be an eigenvalue of A ∈ Mn×n (R). The set containing all of the eigen-
vectors of A corresponding to λ together with the zero vector of Rn is called the eigenspace
of A corresponding to λ, and is denoted by Eλ (A). It follows that
Eλ (A) = Null (A − λI)
and is hence a subspace of Rn .
Thus we seek a basis for each eigenspace Eλ (A) of A. Once we have a basis for Eλ (A), we can
construct all eigenvectors of A corresponding to λ by taking all non-zero linear combinations
of these basis vectors.
Note that we can verify our work is correct by ensuring that our basis vectors for each
eigenspace satisfy the equation A~x = λ~x for the corresponding eigenvalue λ:
" # " #" # " # " #
2 1 2 2 4 2
A = = =2
1 −1 4 1 2 1
" # " #" # " # " #
1 1 2 1 3 1
A = = =3 .
1 −1 4 1 3 1
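Eigenvalues and eigenvectors can also be computed numerically. A brief sketch (not part of the notes), assuming NumPy; note that np.linalg.eig scales its eigenvectors to unit length, so they are scalar multiples of the eigenvectors found above.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [-1.0, 4.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)          # 2.0 and 3.0 (not necessarily in that order)
print(eigvecs)          # columns are eigenvectors, parallel to [2, 1] and [1, 1]
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))   # True, True: each column satisfies Av = lambda v
```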
Example 41.3. Find the eigenvalues and a basis for each eigenspace of A where
0 1 1
A = 1 0 1 .
1 1 0
Solution. We begin by computing the characteristic polynomial of A, using elementary row
operations to aid in our computations.
−λ 1 1 R1 +λR2 0 1 − λ2 1 + λ
CA (λ) = det(A − λI) = 1 −λ 1 = 1 −λ 1
1 1 −λ R3 −R2 0 1 + λ −λ − 1
and performing a cofactor expansion along the first column and factoring entries as needed
leads to
(1 + λ)(1 − λ) 1+λ 1−λ 1
= (−1) = (−1)(1 + λ)2
1+λ −(1 + λ) 1 −1
257
= (−1)(λ + 1)2 ((1 − λ)(−1) − 1) = (−1)(λ + 1)2 (λ − 2).
Hence the eigenvalues of A are λ1 = −1 and λ2 = 2. For λ1 = −1, we solve (A + I)~x = ~0.
1 1 1 −→ 1 1 1
A + I = 1 1 1 R2 −R1 0 0 0
1 1 1 R3 −R1 0 0 0
so
−s − t −1 −1
~x = s = s 1 + t 0 , s, t ∈ R.
t 0 1
Hence a basis for Eλ1 (A) is
−1
−1
B1 = 1 , 0 .
0 1
A − 2I = 1 −2 1 −→ 1 −2 1 1 −2 1 −→
1 1 −2 R3 −R2 0 3 −3 R3 +R1 0 0 0
0 1 −1 −→ 0 1 −1 1 0 −1
R1 ↔R2
1 −2 1 R2 +2R1 1 0 −1 0 1 −1
−→
0 0 0 0 0 0 0 0 0
so
t 1
~x = t = t 1 , t ∈ R.
t 1
Hence a basis for Eλ2 (A) is
1
B2 = 1 .
1
Note that in the last example, the matrix A was 3 × 3 and the characteristic polynomial of
A was of degree 3. This is true in general: for A ∈ Mn×n (R), CA (λ) will be of degree n.
Notice also in the last example that we only had two eigenvalues: λ1 = −1 (which was a
double–root of CA (λ)) and λ2 = 2 (which was a single–root of CA (λ)).
258
Definition 41.4. Let A ∈ Mn×n (R) with eigenvalue λ. The algebraic multiplicity of λ,
denoted by aλ , is the number of times λ appears as a root of CA (λ).60
In our previous example, λ1 = −1 and λ2 = 2 were the only two eigenvalues of A, and we
observed that
aλ1 = 2 and aλ2 = 1.
Also in our last example, we see that dim(Eλ1 (A)) = 2 and dim(Eλ2 (A)) = 1.
Definition 41.5. Let A ∈ Mn×n (R) with eigenvalue λ. The geometric multiplicity of λ,
denoted by gλ , is the dimension of the eigenspace Eλ (A).
Again from our previous example, we have that,
gλ1 = 2 and gλ2 = 1.
The next theorem states a relationship between the algebraic and geometric multiplicities of
an eigenvalue. The proof is omitted as it is beyond the scope of this course.
Theorem 41.6. For any A ∈ Mn×n (R) and any eigenvalue λ of A,
1 ≤ gλ ≤ aλ ≤ n.
Example 41.7. Find the eigenvalues of A and a basis for each eigenspace where
" #
1 0
A=
5 1
Solution. We have
1−λ 0
CA (λ) = det(A − λI) = = (1 − λ)2
5 1−λ
which shows that λ1 = 1 is the only eigenvalue of A and aλ1 = 2. We solve (A − I)~x = ~0.
" # " # " #
0 0 −→ 0 0 R1 ↔R2 1 0
A−I =
5 0 1
R
5 2
1 0 −→ 0 0
so " # " #
0 0
~x = =t , t∈R
t 1
Thus (" #)
0
1
is a basis for Eλ1 (A), and we see gλ1 = 1 < 2 = aλ1 .
We see from this example that the geometric multiplicity of an eigenvalue can be less than
its algebraic multiplicity. We also notice that for a square upper or lower triangular matrix,
the eigenvalues of A are the entries on the main diagonal of A.
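The gap between geometric and algebraic multiplicity seen in Example 41.7 can be observed
with SymPy's eigenvects, which reports each eigenvalue together with its algebraic multiplicity
and a basis for its eigenspace (an illustrative sketch only; SymPy is not part of the course).

    import sympy as sp

    A = sp.Matrix([[1, 0],
                   [5, 1]])

    # each entry is (eigenvalue, algebraic multiplicity, basis for the eigenspace)
    for lam, alg_mult, basis in A.eigenvects():
        print(lam, alg_mult, len(basis))      # expected: 1  2  1, i.e. g = 1 < 2 = a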
60
We can find the algebraic multiplicities of the eigenvalues of a matrix from the factorization of its
characteristic polynomial. In Example 41.3, we saw that CA (λ) = (−1)(λ + 1)2 (λ − 2). The exponent of “2”
on the λ + 1 term means that λ1 = −1 has algebraic multiplicity 2 while the exponent of “1” on the λ − 2
means that λ2 = 2 has algebraic multiplicity 1.
Lecture 42
Given A ∈ Mn×n (R) we have seen that CA (λ) is a real polynomial of degree n. However, we
have seen before that a real polynomial can have non-real roots, and it thus follows that a
real matrix can have non-real eigenvalues.
Example 42.1. Let

    A = [ 1  −1 ]
        [ 1   1 ].
Find the eigenvalues of A, and for each eigenvalue, find one corresponding eigenvector.
Solution. We have

    CA(λ) = det(A − λI) = | 1−λ   −1  |
                          |  1    1−λ | = (1 − λ)² + 1 = λ² − 2λ + 2.

By the quadratic formula, the roots of CA(λ) are λ = (2 ± √(4 − 8))/2 = 1 ± j, so the
eigenvalues of A are λ1 = 1 − j and λ2 = 1 + j. Solving (A − (1 − j)I)~x = ~0 shows that

    [ −j ]
    [  1 ]

is an eigenvector corresponding to λ1 = 1 − j, and solving (A − (1 + j)I)~x = ~0 shows that

    [ j ]
    [ 1 ]

is an eigenvector corresponding to λ2 = 1 + j.
As a reminder, we can check our work:
" #" # " # " #
1 −1 −j −1 − j −j
= = (1 − j)
1 1 1 1−j 1
" #" # " # " #
1 −1 j −1 + j j
= = (1 + j)
1 1 1 1+j 1
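A quick numerical check of this complex eigenpair computation is sketched below (illustrative
only; NumPy writes the imaginary unit as 1j where these notes use j).

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [1.0,  1.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)                        # expected: 1+1j and 1-1j, a conjugate pair

    # check A x = lambda x for each (complex) eigenpair
    for lam, x in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ x, lam * x))    # True, True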
Recall from Theorem 5.2 that if a real polynomial has a complex root z, then its conjugate z̄ is
also a root of the polynomial. Thus it follows that if a real n × n matrix A has a complex
eigenvalue λ, then λ̄ is also an eigenvalue of A, which is exactly what we observed in the
previous example. Moreover, we observed that if ~x is an eigenvector of a real n × n matrix A
corresponding to a complex eigenvalue λ, then the conjugate vector ~x̄ is an eigenvector of A
corresponding to the complex eigenvalue λ̄.
Diagonalization
Note that for our following discussions about diagonalization, it is assumed that our matrices
are square matrices with real entries, and that the eigenvalues (and thus eigenvectors) of
our matrices are real. Our work does generalize naturally to real matrices with complex
eigenvalues, and even to complex matrices with complex eigenvalues, but we do not pursue
this here.
Definition 42.2. An n × n matrix D such that dij = 0 for all i 6= j is called a diagonal
matrix and is denoted by D = diag(d11 , . . . , dnn ).
Example 42.3. The matrices
" # " # 1 0 0
1 0 0 0
, , 0 2 0
0 1 0 0
0 0 3
are diagonal matrices. Note that diagonal matrices are both upper and lower triangular
matrices.
Lemma 42.4. If D = diag(d11 , . . . , dnn ) and E = diag(e11 , . . . , enn ) then it follows
1) D + E = diag(d11 + e11 , . . . , dnn + enn )
2) D^k = diag(d11^k , . . . , dnn^k ) for any positive integer k.
In fact, this holds for any integer k provided none of d11 , . . . , dnn are zero, that is, if D is
invertible.
Definition 42.5. An n × n matrix A is diagonalizable if there exists an n × n invertible
matrix P and an n × n diagonal matrix D so that P −1 AP = D. In this case, we say that P
diagonalizes A to D.
It is important to note that P −1 AP = D does not imply that A = D in general. This
is because matrix multiplication does not commute, so we cannot cancel P and P −1 in the
expression P −1 AP = D. However, given two n×n matrices A and B such that P −1 AP = B,
it can be shown that A and B have many similarities.
Theorem 42.6. If A, B are n × n matrices such that P −1 AP = B for some invertible n × n
matrix P , then
1) det A = det B,
2) A and B have the same eigenvalues,
3) rank (A) = rank (B),
4) tr (A) = tr (B) where
tr (A) = a11 + · · · + ann
is called the trace of A.
This motivates the following definition.
Definition 42.7. If A and B are n × n matrices such that P −1 AP = B for some n × n
invertible matrix P , then A and B are said to be similar.
In light of Definition 42.7, we can restate Definition 42.5 by saying that an n × n matrix is
diagonalizable if it is similar to a diagonal matrix.
We now consider how to determine if a square matrix A is diagonalizable, and to find the
invertible matrix P that diagonalizes A (provided A is indeed diagonalizable). Suppose A
is an n × n matrix whose distinct eigenvalues are λ1 , . . . , λk with algebraic multiplicities
aλ1 , . . . , aλk . Since CA (λ) is a polynomial of degree n, it has exactly n roots (counting
complex roots and repeated roots). Thus aλ1 + · · · + aλk = n. From Theorem 41.6, we
have that 1 ≤ gλ ≤ aλ ≤ n for any eigenvalue λ of A so k ≤ gλ1 + · · · + gλk ≤ n. In fact,
gλ1 + · · · + gλk = n if and only if gλi = aλi for each i = 1, . . . , k.
Lemma 42.8. Let A be an n × n matrix and let λ1 , . . . , λk be distinct eigenvalues of A. If Bi
is a basis for the eigenspace Eλi (A) for i = 1, . . . , k, then B = B1 ∪ B2 ∪ · · · ∪ Bk is linearly
independent.
Lemma 42.8 simply states that if we have bases for eigenspaces corresponding to the distinct
eigenvalues of an n × n matrix A and we construct a set B that contains all of those basis
vectors, then the set B will be linearly independent. Since the number of vectors in each
basis Bi is gλi , the set B contains gλ1 + · · · + gλk vectors, and k ≤ gλ1 + · · · + gλk ≤ n. If there are in fact n
vectors in B, then B is a basis for Rn consisting of eigenvectors of A. The following theorem
gives us a condition under which A is diagonalizable.
Theorem 42.9 (Diagonalization Theorem). An n × n matrix A is diagonalizable if and only
if there exists a basis for Rn consisting of eigenvectors of A.
Proof. We first assume that A is diagonalizable. Then there exists an invertible matrix
P = [ ~x1 · · · ~xn ] and a diagonal matrix D = diag(λ1 , . . . , λn ) such that P −1 AP = D,
that is, such that AP = P D. Thus

    AP = A[ ~x1 · · · ~xn ] = [ A~x1 · · · A~xn ]   and   P D = [ ~x1 · · · ~xn ] diag(λ1 , . . . , λn ) = [ λ1~x1 · · · λn~xn ].
We see that A~xi = λi~xi for i = 1, . . . , n, and since P = [ ~x1 · · · ~xn ] is invertible, it follows
from the Invertible Matrix Theorem that the set {~x1 , . . . , ~xn } is a basis for Rn so that ~xi 6= ~0
for i = 1, . . . , n. Thus {~x1 , . . . , ~xn } is a basis for Rn consisting of eigenvectors of A.
We now assume that there is a basis {~x1 , . . . , ~xn } of Rn consisting of eigenvectors of A. Then
for each i = 1, . . . n, A~xi = λi~xi for some eigenvalue λi of A. It follows from the Invertible
Matrix Theorem that P = [ ~x1 · · · ~xn ] is invertible and thus
P −1 AP = P −1 [ A~x1 · · · A~xn ]
= P −1 [ λ1~x1 · · · λn~xn ]
= P −1 [ λ1 P~e1 · · · λn P~en ]
= P −1 P [ λ1~e1 · · · λn~en ]
= diag(λ1 , . . . , λn )
Thus P −1 AP is a diagonal matrix, so A is diagonalizable.
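The content of the Diagonalization Theorem is easy to see numerically: let the columns of P be
eigenvectors of A and check that P⁻¹AP comes out diagonal. The sketch below (illustrative
only, using NumPy) does this for the matrix of Example 41.1.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [-1.0, 4.0]])

    eigenvalues, P = np.linalg.eig(A)            # columns of P are eigenvectors of A
    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))                       # diagonal entries 2 and 3 (in some order)
    print(np.allclose(D, np.diag(eigenvalues)))  # True: P^(-1) A P = diag(eigenvalues)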
Lecture 43
Example 43.1. Diagonalize the matrix
" #
1 2
A= .
−1 4
and
(" #)
2
is a basis for Eλ1 (A) so λ1 = 2 has geometric multiplicity gλ1 = 1
1
(" #)
1
is a basis for Eλ2 (A) so λ2 = 3 has geometric multiplicity gλ2 = 1.
1
We see that aλ1 = gλ1 and aλ2 = gλ2 and so A is diagonalizable by Corollary 42.10.62 We
take

    P = [ 2  1 ]
        [ 1  1 ]
and have that P diagonalizes A, that is
" #
2 0
P −1 AP = diag(2, 3) = = D.
0 3
Note that P and D are not unique. We could have chosen P = [ 1 2 ; 1 1 ], whose columns are
the same eigenvectors taken in the opposite order, which would have
diagonalized A to D = diag(3, 2). Moreover, we can use the vectors from any bases for the
eigenspaces of A, not just the ones we found in Example 41.1.
62
In this case, we don’t even need to compute the geometric multiplicities of A to conclude that A is
diagonalizable - since the two eigenvalues of A are distinct and A is a 2×2 matrix, Corollary 42.11 immediately
tells us that A is diagonalizable.
Example 43.2. Diagonalize the matrix
    A = [ 0  1  1 ]
        [ 1  0  1 ]
        [ 1  1  0 ].

Solution. From Example 41.3, the eigenvalues of A are λ1 = −1 with aλ1 = 2 and λ2 = 2 with
aλ2 = 1, and
    { [ −1 ]   [ −1 ] }
    { [  1 ] , [  0 ] }   is a basis for Eλ1(A), so λ1 = −1 has geometric multiplicity gλ1 = 2;
    { [  0 ]   [  1 ] }

    { [ 1 ] }
    { [ 1 ] }   is a basis for Eλ2(A), so λ2 = 2 has geometric multiplicity gλ2 = 1.
    { [ 1 ] }
Since aλ1 = gλ1 and aλ2 = gλ2 , we see that A is diagonalizable so we take

    P = [ −1  −1  1 ]
        [  1   0  1 ]
        [  0   1  1 ]

which diagonalizes A to

    D = diag(−1, −1, 2) = [ −1   0  0 ]
                          [  0  −1  0 ]
                          [  0   0  2 ].
Again, it’s a good idea to check P −1 AP = D even though it’s a bit more work to compute
P −1 for a 3 × 3 matrix.
We note that for a diagonalizable matrix A, the ith column of P is an eigenvector of A which
must correspond to the eigenvalue lying in the ith column of D. Thus, when A is diagonal-
izable, we normally construct P which then allows us to easily write out the diagonal matrix
D based on how we constructed P . We also note that an eigenvalue λ of a diagonalizable
matrix A appears in the diagonal matrix D aλ times.
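SymPy can carry out this diagonalization with exact arithmetic, which makes the suggested
check P⁻¹AP = D painless. This is an illustrative sketch only; the P that SymPy returns may
use different eigenspace bases than ours, which is fine by the remark above.

    import sympy as sp

    A = sp.Matrix([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])

    P, D = A.diagonalize()         # P and D satisfy P**(-1) * A * P == D
    print(D)                       # diagonal entries -1, -1, 2 (in some order)
    print(P.inv() * A * P == D)    # True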
Example 43.3. Recall from Example 41.7 that
" #
1 0
A=
5 1
has eigenvalues λ1 = 1 with aλ1 = 2. However, a basis for Eλ1 (A) is
(" #)
0
1
so gλ1 = 1 6= 2 = aλ1 . Hence A is not diagonalizable. This means that we cannot find two
linearly independent eigenvectors of A to form an invertible 2 × 2 matrix P .
Powers of Matrices
A useful application of diagonalizing is computing high powers of a matrix. Suppose A is
an n × n diagonalizable matrix. Then P −1 AP = D for some n × n invertible P and n × n
diagonal matrix D. Then A = P DP −1 and
A2 = P DP −1 P DP −1 = P DIDP −1 = P D2 P −1
Similarly, A3 = P D3 P −1 and more generally, Ak = P Dk P −1 for any positive integer k.
Although computing a high power of an arbitrary matrix is nearly impossible by inspection,
Lemma 42.4 states that to compute a positive power of a diagonal matrix, one need only
raise each of the diagonal entries to that power.
Example 43.4. Find Ak for any positive integer k where
" #
1 2
A= .
−1 4
Solution. From Example 43.1, A is diagonalizable with
" # " #
2 1 2 0
P = and D = .
1 1 0 3
Thus
    A^k = P D^k P⁻¹

        = [ 2  1 ] [ 2^k   0  ] [  1  −1 ]
          [ 1  1 ] [  0   3^k ] [ −1   2 ]

        = [ 2^(k+1)  3^k ] [  1  −1 ]
          [   2^k    3^k ] [ −1   2 ]

        = [ 2^(k+1) − 3^k   (2)3^k − 2^(k+1) ]
          [   2^k − 3^k       (2)3^k − 2^k   ].
Note that we can verify our work is reasonable - taking k = 1 gives
" # " #
1+1 1 1 1+1
2 − 3 (2)3 − 2 1 2
A1 = = = A.
21 − 31 (2)31 − 21 −1 4
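The formula A^k = P D^k P⁻¹ can also be spot-checked for several values of k by comparing it
with repeated matrix multiplication. A small NumPy sketch (illustrative only, not part of the
notes):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [-1.0, 4.0]])
    P = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
    P_inv = np.linalg.inv(P)

    for k in range(1, 6):
        Dk = np.diag([2.0**k, 3.0**k])              # D^k: raise each diagonal entry to the k-th power
        via_formula = P @ Dk @ P_inv                # A^k = P D^k P^(-1)
        direct = np.linalg.matrix_power(A, k)       # A multiplied by itself k times
        print(k, np.allclose(via_formula, direct))  # True for every k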
Example 43.5. Find A^k for any positive integer k where

    A = [  3  −4 ]
        [ −2   1 ].

Solution.
    CA(λ) = | 3−λ   −4  |
            | −2    1−λ | = (3 − λ)(1 − λ) − 8 = λ² − 4λ + 3 − 8 = λ² − 4λ − 5 = (λ − 5)(λ + 1)
so λ1 = −1 and λ2 = 5 are the eigenvalues of A. We see aλ1 = 1 = aλ2 , that is, the 2 × 2
matrix A has two distinct eigenvalues, so we are guaranteed that A is diagonalizable by
Corollary 42.11. For λ1 = −1,
" # " # " #
4 −4 −→ 4 −4 1
4
R 1 1 −1
A+I =
−2 2 R2 + 21 R1 0 0 −→ 0 0
so (" #)
1
1
is a basis for Eλ1 (A). For λ2 = 5,
" # " # " #
−2 −4 −→ −2 −4 1
− 2 R1 1 2
A − 5I =
−2 −4 R2 −R1 0 0 −→ 0 0
so (" #)
−2
1
is a basis for Eλ2 (A). Now, let
" # " #
1 −2 −1 0
P = from which it follows that D = .
1 1 0 5
Then

    P⁻¹ = (1/3) [  1  2 ]
                [ −1  1 ]
and
" #" # " #
1 −2 (−1)k 0 1 1 2
Ak = P Dk P −1 =
1 1 0 5k 3 −1 1
" #" #
1 (−1)k (−2)5k 1 2
=
3 (−1)k 5k −1 1
" #
1 (−1)k + (2)5k 2(−1)k − (2)5k
= .
3 (−1)k − 5k 2(−1)k + 5k
We can use the eigenvalues of an n × n matrix A to compute the determinant and trace of
A. Suppose A has k distinct eigenvalues λ1 , . . . , λk with algebraic multiplicities aλ1 , . . . , aλk .
Then aλ1 + · · · + aλk = n and the characteristic polynomial of A is of the form

    CA(λ) = det(A − λI) = (λ1 − λ)^(aλ1) (λ2 − λ)^(aλ2) · · · (λk − λ)^(aλk).

Taking λ = 0 gives

    det A = CA(0) = λ1^(aλ1) λ2^(aλ2) · · · λk^(aλk).
Thus, det A is the product of the eigenvalues of A where each eigenvalue λ of A appears in
the product aλ times. With a bit more work, one can show that
    tr A = λ1 aλ1 + · · · + λk aλk = Σ_{i=1}^{k} λi aλi ,
that is, the trace of A is the sum of the eigenvalues of A where each eigenvalue λ of A appears
in the sum aλ times.
For example, the matrix

    A = [ 0  1  1 ]
        [ 1  0  1 ]
        [ 1  1  0 ]

from Example 41.3 has eigenvalues λ1 = −1 and λ2 = 2 with aλ1 = 2 and aλ2 = 1. Thus

    det A = (−1)²(2) = 2   and   tr A = 2(−1) + 1(2) = 0.
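Both identities are easy to confirm numerically for this matrix (an illustrative NumPy sketch,
not part of the notes).

    import numpy as np

    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    eigenvalues = np.linalg.eigvals(A)
    print(np.allclose(np.prod(eigenvalues), np.linalg.det(A)))  # det A = product of eigenvalues
    print(np.allclose(np.sum(eigenvalues), np.trace(A)))        # tr A  = sum of eigenvalues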
Lecture 44
Vector Spaces
Back in Lecture 6, we defined the operations of vector addition and scalar multiplication for
vectors in Rn . Theorem 6.10 then gave ten properties that vectors in Rn obey under our
definitions of vector addition and scalar multiplication. The notion of a vector space is to
consider a set V of objects with an operation of addition and scalar multiplication defined
upon them such that a similar set of properties as those stated in Theorem 6.10 also hold.
As an example, in Lecture 25, we defined addition and scalar multiplication for matrices in
Mm×n (R), and Theorem 25.7 showed that the same ten properties held for matrices under
these two operations.
Definition 44.1. A set V with an operation of addition, denoted ~x + ~y , and an operation of
scalar multiplication, denoted c~x for c ∈ R, is called a vector space over R if for every ~v , ~x, ~y ∈ V
and for every c, d ∈ R
V1: ~x + ~y ∈ V
V2: ~x + ~y = ~y + ~x
V3: (~x + ~y ) + ~v = ~x + (~y + ~v )
V4: There exists a vector ~0 ∈ V, called the zero vector, so that ~x + ~0 = ~x for every ~x ∈ V.
V5: For every ~x ∈ V there exists a (−~x) ∈ V so that ~x + (−~x) = ~0
V6: c~x ∈ V
V7: c(d~x) = (cd)~x
V8: (c + d)~x = c~x + d~x
V9: c(~x + ~y ) = c~x + c~y
V10: 1~x = ~x
We call the elements of V vectors 63 .
Note that in the above definition, “over R” means that our scalars are real numbers. Later,
we will briefly mention vector spaces over C. Until then, all vector spaces are over R and we
will simply say “vector space”.
Example 44.2. We have seen that Rn , subspaces of Rn , and Mm×n (R) all satisfy these
properties and are thus vector spaces with the usual definition of addition and scalar multi-
plication.
Example 44.3. The set L(Rn , Rm ) of linear transformations from Rn to Rm is a vector
space with the standard definition of addition and scalar multiplication.
63
The textbook uses a boldface x to denote a vector in a vector space V.
Example 44.4. Let a, b ∈ R with a < b. With the standard addition and scalar multiplica-
tion,
Example 44.5. The set of discontinuous functions f : R → R with the standard addition
and scalar multiplication is not a vector space. To see this, consider
    f1 (x) = { 1,  x ≥ 0            f2 (x) = { 0,  x ≥ 0
             { 0,  x < 0    and              { 1,  x < 0.

Both f1 and f2 are discontinuous, but (f1 + f2)(x) = 1 for every x ∈ R and is thus continuous.
Hence, V1 fails: the set of discontinuous functions
is not closed under addition and is thus not a vector space.
Our work involving spanning sets, linear independence and linear dependence, bases and
subspaces all carry over naturally to vector spaces. We restate those definitions here for an
arbitrary vector space V.
Definition 44.6. Let B = {~v1 , . . . , ~vk } be a set of vectors in a vector space V. The
span of B is
Span B = {c1~v1 + · · · + ck~vk | c1 , . . . , ck ∈ R}.
The set Span B is spanned by B, and B is a spanning set for Span B.
Definition 44.7. Let B = {~v1 , . . . , ~vk } be a set of vectors in a vector space V. We say that
B is linearly dependent if there exist c1 , . . . , ck ∈ R, not all zero so that
    ~0 = c1~v1 + · · · + ck~vk .

We say that B is linearly independent if the only solution to ~0 = c1~v1 + · · · + ck~vk
is c1 = · · · = ck = 0.
Theorem 44.9 (Subspace Test). Let S be a nonempty subset of V. If for every ~x, ~y ∈ S,
and for every c ∈ R, we have that ~x + ~y ∈ S and c~x ∈ S, then S is a subspace of V.
Definition 44.10. Let S be a subspace of V, and let B = {~v1 , . . . , ~vk } be a set of vectors in
S. Then B is a basis for S if B is linearly independent and S = Span B. If S = {~0}, then we
define B = ∅ to be a basis for S.
Example 44.11. Let B ∈ Mn×n (R) be fixed and let S = {A ∈ Mn×n (R) | AB = 0n×n }. Show
that S is a subspace of Mn×n (R).
Solution. By definition, S ⊆ Mn×n (R), and since 0n×n B = 0n×n , we have that 0n×n ∈ S and
S is nonempty. Let A1 , A2 ∈ S. Then A1 B = 0n×n = A2 B so

    (A1 + A2 )B = A1 B + A2 B = 0n×n + 0n×n = 0n×n

and thus A1 + A2 ∈ S. Also, for any c ∈ R, (cA1 )B = c(A1 B) = c 0n×n = 0n×n , so cA1 ∈ S.
By the Subspace Test, S is a subspace of Mn×n (R).
Example 44.12. Determine whether S = {A ∈ M2×2 (R) | A² = A} is a subspace of M2×2 (R).
Solution. S is not a subspace of M2×2 (R). To see this, note that I ∈ S since I² = I. However,
since (2I)² = 4I ≠ 2I, the matrix 2I ∉ S, so S is not closed under scalar multiplication
(property V6 fails).
Consider the set

    B = { [ 1  0 ] , [ 0  1 ] , [ 0  0 ] , [ 0  0 ] }
        { [ 0  0 ]   [ 0  0 ]   [ 1  0 ]   [ 0  1 ] }

in M2×2 (R). To see that B is linearly independent, suppose c1 , c2 , c3 , c4 ∈ R satisfy

    c1 [ 1  0 ] + c2 [ 0  1 ] + c3 [ 0  0 ] + c4 [ 0  0 ] = [ 0  0 ].
       [ 0  0 ]      [ 0  0 ]      [ 1  0 ]      [ 0  1 ]   [ 0  0 ]
This gives

    [ c1  c2 ]   [ 0  0 ]
    [ c3  c4 ] = [ 0  0 ]
and so clearly c1 = c2 = c3 = c4 = 0 and thus B is linearly independent. Also note that for
any [ a b ; c d ] ∈ M2×2 (R),

    [ a  b ]     [ 1  0 ]     [ 0  1 ]     [ 0  0 ]     [ 0  0 ]
    [ c  d ] = a [ 0  0 ] + b [ 0  0 ] + c [ 1  0 ] + d [ 0  1 ]
so Span B = M2×2 (R). Thus B is a basis for M2×2 (R), called the standard basis for M2×2 (R).
Since B has 4 vectors, dim(M2×2 (R)) = 4.
We construct the standard basis for Mm×n (R) similarly, so dim(Mm×n (R)) = mn.
Example 44.15. Let
(" # " # " # " #)
1 1 1 1 0 1 1 0
B= , , ,
0 1 1 0 1 1 1 1
Show B is a basis for M2×2 (R) and express A = [ 13 24 ] as a linear combination of the vectors
(matrices) in B.
Solution. For c1 , c2 , c3 , c4 ∈ R, consider
" # " # " # " # " #
1 2 1 1 1 1 0 1 1 0
= c1 + c2 + c3 + c4
3 4 0 1 1 0 1 1 1 1
Equating corresponding entries gives the system
    c1 + c2      + c4 = 1
    c1 + c2 + c3      = 2
         c2 + c3 + c4 = 3
    c1      + c3 + c4 = 4
which we carry to reduced row echelon form.
    [ 1  1  0  1 | 1 ]        [ 1  0  0  0 |  1/3 ]
    [ 1  1  1  0 | 2 ]        [ 0  1  0  0 | −2/3 ]
    [ 0  1  1  1 | 3 ]   −→   [ 0  0  1  0 |  7/3 ]
    [ 1  0  1  1 | 4 ]        [ 0  0  0  1 |  4/3 ]
so c1 = 1/3, c2 = −2/3, c3 = 7/3, c4 = 4/3 and
" # " # " # " # " #
1 2 1 1 1 2 1 1 7 0 1 4 1 0
= − + + .
3 4 3 0 1 3 1 0 3 1 1 3 1 1
Also, since the coefficient matrix reduces to I, the resulting homogeneous system derived
from

    c1 [ 1  1 ] + c2 [ 1  1 ] + c3 [ 0  1 ] + c4 [ 1  0 ] = [ 0  0 ]
       [ 0  1 ]      [ 1  0 ]      [ 1  1 ]      [ 1  1 ]   [ 0  0 ]
has only the trivial solution, so B is linearly independent. Since B has 4 vectors and
dim(M2×2 (R)) = 4, Span B = M2×2 (R) so B is a basis for M2×2 (R).
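Finding the coefficients c1 , . . . , c4 is just a 4 × 4 linear system, so it can be checked
numerically by flattening each matrix of B into a column of entries. A NumPy sketch
(illustrative only, not part of the notes):

    import numpy as np

    # column j holds the entries (1,1), (1,2), (2,1), (2,2) of the j-th matrix in B
    coeff = np.array([[1, 1, 0, 1],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1],
                      [1, 0, 1, 1]], dtype=float)
    rhs = np.array([1, 2, 3, 4], dtype=float)   # entries of A = [ 1 2 ; 3 4 ]

    c = np.linalg.solve(coeff, rhs)
    print(c)                                    # expected: [ 1/3, -2/3, 7/3, 4/3 ]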
Example 44.16. Let S = {A ∈ M2×2 (R) | Aᵀ = A} be a subspace of M2×2 (R). Find a basis
for S.
Solution. Let

    A = [ a1  a2 ]
        [ a3  a4 ]

be in S. Then Aᵀ = A so

    [ a1  a3 ]   [ a1  a2 ]
    [ a2  a4 ] = [ a3  a4 ],

which gives a3 = a2 . Thus

    A = [ a1  a2 ]      [ 1  0 ]      [ 0  1 ]      [ 0  0 ]
        [ a2  a4 ] = a1 [ 0  0 ] + a2 [ 1  0 ] + a4 [ 0  1 ],

so S = Span { [ 1 0 ; 0 0 ], [ 0 1 ; 1 0 ], [ 0 0 ; 0 1 ] }. One checks easily that this spanning
set is linearly independent, so it is a basis for S, and dim(S) = 3.
Lecture 45
Definition 45.1. The set

    P(R) = { a0 + a1 x + a2 x² + · · · | a0 , a1 , a2 , . . . ∈ R, where only finitely many of the ai are nonzero }

is the set of all polynomials in x with real coefficients. With the standard addition and scalar
multiplication of polynomials, P(R) is a vector space over R.
Note that B = {1, x, x2 , . . . } is the standard basis for P (R). We see then that P (R) is
infinite dimensional !
Consider the set B = {1, 1 + x, 1 + x + x²} in P2 (R). To show that B is linearly independent,
suppose c1 , c2 , c3 ∈ R satisfy

    c1 (1) + c2 (1 + x) + c3 (1 + x + x²) = 0.
Rearranging gives

    (c1 + c2 + c3 ) + (c2 + c3 )x + c3 x² = 0.

Thus
c1 + c2 + c3 = 0
c2 + c3 = 0
c3 = 0
and we see that c3 = 0 which implies that c2 = 0 which in turn gives c1 = 0. Thus B is linearly
independent. Since B has 3 elements and dim(P2 (R)) = 3, we see that Span B = P2 (R) and
so B is a basis for P2 (R).
Next, consider the set B = {1 + x, 1 − x, 1, 2x, x + x²} in P2 (R). For c1 , . . . , c5 ∈ R, consider

    c1 (1 + x) + c2 (1 − x) + c3 (1) + c4 (2x) + c5 (x + x²) = 0.
Rearranging gives
c1 + c2 + c3 = 0
c1 − c2 + 2c4 + c5 = 0
c5 = 0
We see immediately that this system is underdetermined and thus has nontrivial solutions.
This allows us to conclude that B is a linearly dependent set. Carrying the coefficient matrix
of our system to reduced row echelon form gives
    [ 1   1  1  0  0 ]        [ 1  1   1    0    0   ]        [ 1  0  1/2   1  0 ]
    [ 1  −1  0  2  1 ]   −→   [ 0  1  1/2  −1  −1/2  ]   −→   [ 0  1  1/2  −1  0 ]
    [ 0   0  0  0  1 ]        [ 0  0   0    0    1   ]        [ 0  0   0    0  1 ]
From any of the above row echelon forms, we can see that there are leading entries in the
first, second and fifth columns. Thus we can tell that 1 and 2x can be expressed as linear
combinations of 1 + x and 1 − x, but from the reduced row echelon form, we see that
    1 = (1/2)(1 + x) + (1/2)(1 − x)

    2x = 1(1 + x) − 1(1 − x).
It follows that

    B′ = {1 + x, 1 − x, x + x²}

is a linearly independent subset of B with Span B′ = Span B. Since B′ has 3 elements and
dim(P2 (R)) = 3, Span B′ = P2 (R), so B′ is a basis for P2 (R).
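The row reduction used here can be reproduced exactly with SymPy, which also reports the
pivot columns (an illustrative sketch only; the rows of the matrix below are the coefficient
equations for 1, x and x², and the columns correspond to 1 + x, 1 − x, 1, 2x and x + x²).

    import sympy as sp

    M = sp.Matrix([[1,  1, 1, 0, 0],
                   [1, -1, 0, 2, 1],
                   [0,  0, 0, 0, 1]])

    rref, pivot_columns = M.rref()
    print(rref)              # the reduced row echelon form computed above
    print(pivot_columns)     # (0, 1, 4): pivots in the columns for 1+x, 1-x and x+x^2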
Example 45.6. Consider the subspace S = {p(x) ∈ P2 (R) | p(2) = 0} of P2 (R). Find a basis
for S.
Solution: Let p(x) ∈ S. Then p(2) = 0 so x − 2 is a factor of p(x). Since p(x) ∈ P2 (R), there
are a, b ∈ R so that p(x) = (x − 2)(ax + b) = ax2 + bx − 2ax − 2b = a(x2 − 2x) + b(x − 2) so
S = Span {x2 − 2x, x − 2}. Since neither x2 − 2x nor x − 2 is a scalar multiple of the other,
{x2 − 2x, x − 2} is linearly independent and thus a basis for S.
Lecture 46
Example 46.1. Let V = {x ∈ R | x > 0}. Then under the standard operations of addition
and scalar multiplication of real numbers, V is not a vector space over R since, for example,
V6 fails. To see this, note that 2 ∈ V and −1 ∈ R, but that (−1)2 = −2 ∉ V. We also note
that V4 and V5 fail. However, we define a new addition, ⊕, and a new scalar multiplication, ⊙,
as follows: for all x, y ∈ V and for all c ∈ R
    x ⊕ y = xy
    c ⊙ x = x^c.

One can verify that all ten vector space axioms hold; for example,

    V2:  x ⊕ y = xy = yx = y ⊕ x
    V8:  (c + d) ⊙ x = x^(c+d) = x^c x^d = x^c ⊕ x^d = (c ⊙ x) ⊕ (d ⊙ x)
    V10: 1 ⊙ x = x^1 = x.
This shows that V is a vector space over R.
This example serves to show that it can be possible to redefine the operations of vector
addition and scalar multiplication to make a set a vector space. The resulting vector space
is quite bizarre: we saw that 1 is the zero vector64 of V, and the additive inverse of x ∈ V is 1/x.
We don’t always use ⊕ and to denote vector addition and scalar multiplication, even if
these definitions have been redefined. If the definitions are clearly understood, then we can
use the standard notation.
64
This does not imply that 1 = 0. It simply says that under our new rules of vector addition and scalar
multiplication, 1 plays the role of the zero vector, that is, x ⊕ 1 = x for every x ∈ V. Note
that as defined, 0 ∉ V.
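To make the redefined operations concrete, here is a tiny Python sketch (purely illustrative)
that implements ⊕ and ⊙ on the positive reals and spot-checks a few of the axioms numerically.

    import math

    def vadd(x, y):      # "vector addition":        x (+) y = xy
        return x * y

    def smul(c, x):      # "scalar multiplication":  c (.) x = x**c
        return x ** c

    x, y, c, d = 2.0, 5.0, 3.0, -1.5

    print(math.isclose(vadd(x, y), vadd(y, x)))                        # V2
    print(math.isclose(smul(c + d, x), vadd(smul(c, x), smul(d, x))))  # V8
    print(math.isclose(smul(1, x), x))                                 # V10
    print(vadd(x, 1.0) == x)                        # 1 acts as the zero vector
    print(math.isclose(vadd(x, smul(-1, x)), 1.0))  # x**(-1) acts as -x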
Theorem 46.2. If V is a vector space, then for every ~x ∈ V,
1) 0~x = ~0
2) −~x = (−1)~x.
Proof of (2). For ~x ∈ V
(−1)~x = (−1)~x + ~0 by V4
= (−1)~x + (~x + (−~x)) by V5
= ((−1)~x + ~x) + (−~x) by V3
= ((−1)~x + 1~x) + (−~x) by V10
= ((−1) + 1)~x + (−~x) by V8
= 0~x + (−~x) since −1 + 1 = 0
= ~0 + (−~x) by part (1) above
= (−~x) + ~0 by V2
= −~x by V4
As an illustration of Theorem 46.2, we consider V from Example 46.1. For any x ∈ V,

    0 ⊙ x = x^0 = 1   ← the zero vector

and

    −x = (−1) ⊙ x = x^(−1) = 1/x,

which is consistent with Theorem 46.2.
Vector Spaces over C
• Cn is a vector space over65 C. For ~z ∈ Cn ,
    ~z = [ z1 ]
         [ ⋮  ] = z1~e1 + · · · + zn~en .
         [ zn ]
We call {~e1 , . . . , ~en } (where ~ei is the ith column of the n × n identity matrix) the
standard basis for Cn , so dim(Cn ) = n.
• Mm×n (C) is a vector space over C. The standard basis for Mm×n (C) is the same as for
Mm×n (R), so dim(Mm×n (C)) = mn.
• Pn (C) (the set of polynomials of degree at most n) is a vector space over C. The
standard basis is {1, x, . . . , xn } (here, x ∈ C), so dim(Pn (C)) = n + 1.
The notions of subspace, span, linear independence, basis and dimension are handled the
same way as for real vector spaces.
65
The expression “over C” means our scalars are complex numbers.
THE END
55
A 4-dimensional cube, often called a tesseract or a hypercube. The same hypercube is depicted on the
cover of these notes, but is viewed from a different angle.