Arsdigita University Month 0: Mathematics For Computer Science
$$ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. $$
We will look at several examples of computing derivatives by this definition.
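To make the limit definition concrete, here is a minimal Python sketch (my own illustration, with $f(x) = x^2$ as an assumed example) that approximates $f'(3)$ by shrinking $\Delta x$:
```python
# Approximate f'(x) from the limit definition by shrinking delta_x.
def numerical_derivative(f, x, delta_x):
    return (f(x + delta_x) - f(x)) / delta_x

f = lambda x: x ** 2                       # example function
for dx in (0.1, 0.01, 0.0001):
    print(dx, numerical_derivative(f, 3.0, dx))   # approaches f'(3) = 6
```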
2.4. Velocity and Rates of Change. In recitation, we will see how derivatives appear
in the real world as the velocity of a position function, and as a rate of change. We will
look at several problems of this nature.
Lecture 3: Differentiation methods.
In this lecture, we will see several methods for computing derivatives, so that we do not
always have to use the definition of the derivative in order to do computations.
3.1. Rules for differentiation. Here is a list of differentiation rules. We will prove some
of these in class, and use them to compute some difficult derivatives.
1. $\frac{d}{dx} c = 0$ for any constant $c \in \mathbb{R}$.
2. $\frac{d}{dx}(x^n) = n x^{n-1}$.
3. $\frac{d}{dx}(c \cdot f(x)) = c \, \frac{d}{dx}(f(x))$, for any constant $c$.
4. $\frac{d}{dx}(f(x) + g(x)) = \frac{d}{dx}(f(x)) + \frac{d}{dx}(g(x))$.
5. The product rule. $\frac{d}{dx}(f(x) \cdot g(x)) = g(x)\frac{d}{dx}(f(x)) + f(x)\frac{d}{dx}(g(x))$.
6. The quotient rule. $\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{g(x)\frac{d}{dx}(f(x)) - f(x)\frac{d}{dx}(g(x))}{(g(x))^2}$.
7. The chain rule. $\frac{d}{dx}(f(g(x))) = f'(g(x))\, g'(x) = \frac{df}{du}\frac{du}{dx}$, where $u = g(x)$.
8. The power rule. $\frac{d}{dx}(u^n) = n u^{n-1} \frac{du}{dx}$.
9. $\frac{d}{dx}(\sin(x)) = \cos(x)$.
10. $\frac{d}{dx}(\cos(x)) = -\sin(x)$.
11. $\frac{d}{dx}(e^x) = e^x$.
12. $\frac{d}{dx}(a^x) = \ln(a)\, a^x$.
13. $\frac{d}{dx}(\ln(x)) = \frac{1}{x}$.
14. $\frac{d}{dx}(\log_a(x)) = \frac{1}{\ln(a)\, x}$.
We will also discuss implicit functions and implicit differentiation.
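Several of these rules can be checked symbolically. The following is a small sketch of mine, assuming the sympy library is available; it verifies the product rule and rule 12 for an example pair of functions:
```python
import sympy as sp

x, a = sp.symbols('x a', positive=True)
f = sp.sin(x)
g = sp.exp(x)

# Product rule: d/dx(f*g) = g*f' + f*g'
lhs = sp.diff(f * g, x)
rhs = g * sp.diff(f, x) + f * sp.diff(g, x)
print(sp.simplify(lhs - rhs))                              # 0

# Rule 12: d/dx(a**x) = ln(a) * a**x
print(sp.simplify(sp.diff(a**x, x) - sp.log(a) * a**x))    # 0
```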
3.2. Graphing functions. Derivatives can be used to graph functions. The first derivative
tells us whether a function is increasing or decreasing. The second derivative tells us about
the concavity of the function. This will be discussed in recitation.
Lecture 4: Max-min problems. Taylor series.
Today, we will be using derivatives to solve some real-world problems.
4.1. Max-min problems. The idea with max-min problems is that a function f(x) defined
on an interval [a, b] will take a maximum or minimum value in one of three places:
1. where $f'(x) = 0$,
2. at a, or
3. at b.
Thus, if we are given a problem such as the following, we know where to look for maxima,
minima, the biggest, the smallest, the most, the least, etc.
Example. Suppose we have 100 yards of fencing, and we want to enclose a rectangular
garden with maximum area. What is the largest area we can enclose?
We will discuss methods for solving problems like this one.
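Here is a minimal symbolic sketch of the standard approach to the fencing example (assuming sympy): write the area in terms of one side, set the derivative to zero, and evaluate.
```python
import sympy as sp

x = sp.symbols('x', positive=True)
area = x * (50 - x)              # 100 yards of fence: 2x + 2y = 100, so y = 50 - x

critical = sp.solve(sp.diff(area, x), x)   # where A'(x) = 0
print(critical)                            # [25]
print(area.subs(x, critical[0]))           # 625 square yards
```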
4.2. Approximations. We can also use the derivative to compute approximate values of functions.
Suppose we want to compute
$$ \int f(x)\, dx = F(x). $$
Notice that a constant can always be added to F(x), since $\frac{d}{dx} c = 0$! We will compute several antiderivatives.
Taking antiderivatives can be quite difficult. There is no general product rule or chain rule, for example. We have some new rules, however.
5.3. Change of variables. The change of variables formula is the antidifferentiation version of the chain rule. It can be stated as follows.
$$ \int f(g(x))\, g'(x)\, dx = \int f(u)\, du, \quad \text{where } u = g(x). $$
There is also an antidifferentiation version of the product rule. We will discuss this at a later time.
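A small sketch (assuming sympy) of the change of variables formula on the example $u = g(x) = x^2$, comparing direct integration with integrating in $u$ and substituting back:
```python
import sympy as sp

x, u = sp.symbols('x u')
lhs = sp.integrate(sp.cos(x**2) * 2*x, x)       # integrate f(g(x)) g'(x) dx directly
rhs = sp.integrate(sp.cos(u), u).subs(u, x**2)  # integrate f(u) du, then substitute back
print(sp.simplify(lhs - rhs))                   # 0 (they agree up to a constant)
```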
Lecture 6: Area under a curve. Fundamental Theorem of Calculus.
I claimed at the beginning of the course that integral calculus attempted to answer a question involving the area under a curve. We will see how this is related to antidifferentiation
today.
6.1. Riemann sums. We can approximate the area between a function and the x-axis
using Riemann sums. The idea is that we use rectangles to estimate the area under the
curve. We sum up the area of the rectangles, and take the limit as the rectangles get
more and more narrow. This leads us to the left-hand and right-hand Riemann sums. The
left-hand Riemann sum is
$$ \int_a^b f(x)\, dx = \lim_{\Delta x \to 0} \sum_{k=0}^{n-1} f(x_k)\, \Delta x. $$
The right-hand Riemann sum is
$$ \int_a^b f(x)\, dx = \lim_{\Delta x \to 0} \sum_{k=1}^{n} f(x_k)\, \Delta x. $$
These both converge to the area we are looking for, so long as f(x) is a continuous function.
We will make several computations using Riemann sums and look at a few more examples.
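As a numerical illustration (a sketch of mine, assuming numpy), the left-hand and right-hand Riemann sums for $\int_0^1 x^2\, dx = 1/3$ both approach $1/3$ as the rectangles narrow:
```python
import numpy as np

def riemann(f, a, b, n, left=True):
    dx = (b - a) / n
    k = np.arange(0, n) if left else np.arange(1, n + 1)   # left uses x_0..x_{n-1}, right uses x_1..x_n
    return np.sum(f(a + k * dx)) * dx

f = lambda x: x ** 2
for n in (10, 100, 1000):
    print(n, riemann(f, 0, 1, n, left=True), riemann(f, 0, 1, n, left=False))
```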
These Riemann sums seem to have no relation to the antiderivatives discussed above,
although we have used the same notation. It turns out that these two ideas are related.
6.2. The Fundamental Theorem of Calculus. We will prove the following theorem.
Theorem 6.2.1. If f(x) is a continuous function, and F(x) is an antiderivative of f(x), then
$$ \int_a^b f(x)\, dx = F(b) - F(a). $$
We will discuss the implications of this theorem, and look at several examples. Furthermore,
we will discuss geometric versus algebraic area.
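A quick check of the theorem on an example (assuming sympy): for $f(x) = \cos(x)$ on $[0, \pi/2]$, the definite integral agrees with $F(b) - F(a)$ where $F(x) = \sin(x)$.
```python
import sympy as sp

x = sp.symbols('x')
f = sp.cos(x)
F = sp.integrate(f, x)                        # an antiderivative, sin(x)

a, b = 0, sp.pi / 2
definite = sp.integrate(f, (x, a, b))         # integral from a to b
print(definite, F.subs(x, b) - F.subs(x, a))  # both equal 1
```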
Lecture 7: Second Fundamental Theorem of Calculus. Integration by
parts. The area between two curves.
7.1. Discontinuities and Integration. First, let's talk about one horrific example where
everything seems to go wrong. This will be an example of why we need a function f(x) to
be continuous in order to integrate. Consider the function
$$ f(x) = \begin{cases} 0 & x \text{ is a rational number,} \\ 1 & x \text{ is an irrational number.} \end{cases} $$
We can't graph this function. The problem is, between every two rational points, there is
an irrational point, and between every two irrational points, there is a rational point. It
turns out that this function is also impossible to integrate. We will discuss why.
7.2. The Second Fundamental Theorem of Calculus. When we were taking derivatives, we never really came across functions we couldn't differentiate. From our little bit of experience with taking indefinite integrals (the ones of the form $\int f(x)\, dx$), we have seen that antidifferentiation can be harder. The Second Fundamental Theorem of Calculus states that
$$ \frac{d}{dx} \int_a^x f(t)\, dt = f(x). $$
This is a by-product of the proof we gave yesterday of the First Fundamental Theorem
of Calculus.
7.3. Integration by Parts. I talked about forming the inverse of the Product Rule
from differentiation to make a rule about taking integrals. This is known as integration by
parts, and the rule is
$$ \int u\, dv = u \cdot v - \int v\, du. $$
We will discuss this rule further, how to apply it, and some tricks for knowing which choices
to make for u and v.
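A short symbolic sketch (assuming sympy) of integration by parts for $\int x e^x\, dx$, with the choices $u = x$ and $dv = e^x\, dx$, compared against direct integration:
```python
import sympy as sp

x = sp.symbols('x')
u = x
v = sp.exp(x)            # v = integral of dv = e^x dx

by_parts = u * v - sp.integrate(v * sp.diff(u, x), x)   # u*v - integral of v du
direct   = sp.integrate(x * sp.exp(x), x)
print(sp.simplify(by_parts - direct))                   # 0
```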
Figure 2. We're trying to compute the shaded area between f(x) and g(x) from x = a to x = b.
7.4. The area between two curves. Finding the area between two curves looks pretty
difficult. Suppose f(x) > g(x); see the figure. If we think about it, though, this is not a
hard problem at all: all we have to do is take the area under f(x) on [a, b], and subtract
from that the area under g(x) on [a, b]. Thus, the area between f(x) and g(x) on [a, b] is
$$ \int_a^b f(x)\, dx - \int_a^b g(x)\, dx = \int_a^b (f(x) - g(x))\, dx. $$
We will discuss some subtleties with this rule.
Lecture 8: Volumes of solids of rotation. Integration techniques.
8.1. Volumes of solids of rotation. Today, we'll talk about two methods to compute
the volumes of rotational solids: volumes by discs and volumes by cylindrical shells.
Let us consider for a moment the intuitive idea behind a definite integral. The definite
integral $\int_a^b f(x)\, dx$ is supposed to be an infinitesimal sum (denoted by $\int_a^b$) along the x-axis
(denoted by $dx$) of slices with height f(x). The methods we will present to compute
volumes of rotation will have a similar intuitive idea.
8.1.1. Volume by slices. Consider a function f(x) from a to b. We can take the region
between this curve and the x-axis and rotate it around the x-axis. We get a solid of
revolution.
The goal is to compute the volume of this solid. We have seen some examples of this
already in recitation. The first method we will use to compute a volume like this is volume by slices.
Figure 3. This shows the slice at the point x. We say that this slice has area A(x).
The volume is the same as infinitesimally adding up the area A(x) of the slices, as shown in the figure. So
$$ \text{Volume} = \int_a^b A(x)\, dx. $$
But this slice is just a circle with radius f(x), so $A(x) = \pi r^2 = \pi f(x)^2$. Thus
$$ \text{Volume} = \int_a^b \pi f(x)^2\, dx. $$
8.1.2. Volume by shells. Again, we consider a solid of revolution. In this case, instead of
slicing up our solid, we think of it as filled by paper-thin cylinders. Thus, we will
often use this method when the solid whose volume we wish to determine is described by a
function rotated around the y-axis rather than the x-axis. We can think of the volume as
the infinitesimal sum of the surface areas of the cylinders. A cylinder of radius r and height h has
$$ \text{Surface Area} = 2\pi r h. $$
Thus, when computing the volume by cylindrical shells, we have
$$ \text{Volume} = \int_a^b 2\pi x\, f(x)\, dx. $$
Care is required to figure out what the appropriate height function is, and what bounds
of integration to use. Furthermore, it is often useful to find a volume such as one of these
by integrating with respect to y rather than x. We will see several examples of the cases
to which these slices and cylindrical shells apply.
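As a numerical sanity check (my own sketch, assuming numpy and scipy), the disc formula recovers the volume of a sphere and the shell formula recovers the volume of a cone:
```python
import numpy as np
from scipy.integrate import quad

R, h = 2.0, 3.0

# Discs: rotate f(x) = sqrt(R^2 - x^2) about the x-axis -> sphere of radius R.
disc, _ = quad(lambda x: np.pi * (R**2 - x**2), -R, R)
print(disc, 4/3 * np.pi * R**3)

# Shells: rotate the region under f(x) = h*(1 - x/R), 0 <= x <= R, about the y-axis -> cone.
shell, _ = quad(lambda x: 2 * np.pi * x * h * (1 - x / R), 0, R)
print(shell, np.pi * R**2 * h / 3)
```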
8.2. Integration using partial fractions. A rational function is the quotient of two
polynomials, $\frac{P(x)}{Q(x)}$. It is proper if
$$ \deg(P(x)) < \deg(Q(x)) $$
and improper if
$$ \deg(P(x)) \geq \deg(Q(x)). $$
An improper rational function can always be reduced to a polynomial plus a proper rational
function. For example,
$$ \frac{x^4}{x^4 - 1} = \frac{(x^4 - 1) + 1}{x^4 - 1} = 1 + \frac{1}{x^4 - 1}, \quad \text{and} \quad \frac{x^5}{x^4 - 1} = \frac{(x^5 - x) + x}{x^4 - 1} = x + \frac{x}{x^4 - 1}. $$
Furthermore, we can simplify or combine terms:
$$ \frac{2}{x + 3} + \frac{6}{x - 2} = \frac{2(x - 2) + 6(x + 3)}{(x - 2)(x + 3)} = \frac{8x + 14}{x^2 + x - 6}. $$
How can we go backwards? This is most easily illustrated with an example. If we start with
$$ \frac{2x - 11}{(x + 2)(x - 3)} = \frac{A}{x + 2} + \frac{B}{x - 3}, $$
how do we find A and B? We know that A and B must satisfy
$$ A(x - 3) + B(x + 2) = 2x - 11, \quad \text{so} \quad (A + B)x + (2B - 3A) = 2x - 11, $$
so
$$ A + B = 2, \qquad 2B - 3A = -11. $$
We can solve this system, or we can substitute $x = 3, -2$ in $A(x - 3) + B(x + 2) = 2x - 11$ to get
$$ A(3 - 3) + B(3 + 2) = 2 \cdot 3 - 11 \;\Rightarrow\; 5B = -5 \;\Rightarrow\; B = -1, $$
and
$$ A(-2 - 3) + B(-2 + 2) = 2(-2) - 11 \;\Rightarrow\; -5A = -15 \;\Rightarrow\; A = 3. $$
So
$$ \frac{2x - 11}{(x + 2)(x - 3)} = \frac{3}{x + 2} - \frac{1}{x - 3}. $$
What is this good for? Suppose we want to integrate
$$ \int \frac{2x - 11}{(x + 2)(x - 3)}\, dx. $$
We might try to substitute $u = x^2 - x - 6$, but we would soon see that $(2x - 11)\, dx$ is not
$du$. So instead, we can split this up:
$$ \int \frac{2x - 11}{(x + 2)(x - 3)}\, dx = \int \frac{3}{x + 2}\, dx - \int \frac{1}{x - 3}\, dx. $$
With the substitutions $u = x + 2$, $du = dx$ and $w = x - 3$, $dw = dx$, this becomes
$$ \int \frac{3}{u}\, du - \int \frac{1}{w}\, dw = 3 \ln|x + 2| - \ln|x - 3| + c = \ln\frac{|x + 2|^3}{|x - 3|} + c. $$
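The decomposition and the integral above can be checked symbolically; here is a sketch assuming sympy (note that sympy omits the absolute values and the constant of integration):
```python
import sympy as sp

x = sp.symbols('x')
expr = (2*x - 11) / ((x + 2) * (x - 3))

print(sp.apart(expr))                 # 3/(x + 2) - 1/(x - 3)
print(sp.integrate(expr, x))          # 3*log(x + 2) - log(x - 3)
```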
A couple of cautionary notes about partial fractions:
1. If there is a term $(x - a)^m$ in the denominator, we will get a sum of the form
$$ \frac{A_1}{x - a} + \frac{A_2}{(x - a)^2} + \dots + \frac{A_m}{(x - a)^m}, $$
where the $A_i$ are all constants.
2. If there is a quadratic factor $ax^2 + bx + c$ in the denominator, it gives a term
$$ \frac{Ax + B}{ax^2 + bx + c}. $$
8.3. Differential Equations. Finally, if there is time, we will talk a bit about differential
equations. A differential equation is an equation that relates a function and its derivatives.
For example, consider
$$ \frac{dy}{dx} = xy. $$
The goal is to determine what y is as a function of x. If we treat the term $\frac{dy}{dx}$ as a fraction,
we can separate the variables and put all terms with y and dy on the left-hand side, and
all terms with x and dx on the right-hand side. Upon separating, we get
$$ \frac{dy}{y} = x\, dx. $$
We can now integrate each side to get
$$ \ln|y| = \int \frac{dy}{y} = \int x\, dx = \frac{x^2}{2} + c. $$
Thus, exponentiating both sides, we get
$$ y = C e^{x^2/2}, $$
where C is a constant. Differential equations that can be solved in this manner are called
differential equations with variables separable. There are many other differential equations
which cannot be solved in this way. Many courses in mathematics are devoted to solving
different kinds of differential equations. We will not spend any more time on them in this
course.
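A quick sketch (assuming sympy) that solves the same separable equation $\frac{dy}{dx} = xy$ with dsolve:
```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x), x * y(x)), y(x))
print(sol)    # y(x) = C1*exp(x**2/2)
```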
Lecture 9: Trigonometric substitution. Multivariable functions.
9.1. Trigonometric substitutions. Today, we'll discuss trigonometric substitutions a
little further. To use the method of trigonometric substitution, we will make use of the
following identities.
$$ 1 - \sin^2\theta = \cos^2\theta, \qquad 1 + \cot^2\theta = \csc^2\theta, \qquad \sec^2\theta - 1 = \tan^2\theta. $$
If we see an integral with the function $\sqrt{a^2 - x^2}$ in the integrand, with a a constant, we
can make the substitution $x = a \sin\theta$. Then we have
$$ \sqrt{a^2 - x^2} = \sqrt{a^2 - a^2\sin^2\theta} = \sqrt{a^2\cos^2\theta} = a\cos\theta. $$
Notice that in this case, $dx = a\cos\theta\, d\theta$. We make similar substitutions in the remaining
cases. If we see a function of the form $\sqrt{x^2 - a^2}$ in the integrand, we can make the substitution
$x = a\sec\theta$. If we see a function of the form $\sqrt{x^2 + a^2}$ in the integrand, we can make
the substitution $x = a\cot\theta$.
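As a check (a sketch of mine, assuming sympy), the substitution $x = a\sin\theta$ leads to the classic result $\int_0^a \sqrt{a^2 - x^2}\, dx = \frac{\pi a^2}{4}$, which sympy confirms directly:
```python
import sympy as sp

x = sp.symbols('x')
a = sp.symbols('a', positive=True)

result = sp.integrate(sp.sqrt(a**2 - x**2), (x, 0, a))
print(sp.simplify(result))            # pi*a**2/4, a quarter of the circle's area
```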
9.2. Cavalieri's Principle. Next, we'll discuss Cavalieri's Principle. This relates to the
disc method we discussed yesterday, although Cavalieri, a contemporary of Galileo, stated
the result centuries before the advent of calculus. We will see that this principle is true
using calculus.
Theorem 9.2.1. If two solids A and B have the same height, and if cross-sections parallel
to their bases have the same corresponding areas, then A and B have the same volume.
This is clearly true if we take the volumes by discs. Since the areas of the discs are the
same, the volume that we compute when we integrate will be the same.
9.3. Multivariable Functions. Finally, there will be a brief introduction to multivariable
functions. We will discuss graphing multivariable functions and multiple integrals, as time
permits.
Given a multivariable function z = f(x, y), we can graph its level curves. These are
the curves that we obtain by setting z = a, for constants a. As a varies, we get all of the
level curves of the function, which we graph in the xy-plane. For example, we can draw
the level curves of the function $z = x^2 + y^2$. These will just be the circles $a = x^2 + y^2$.
Figure 4. This shows the level curves of the function $z = x^2 + y^2$.
Using the level curves, we can get a three-dimensional view of the function.
Figure 5. This shows the three-dimensional graph of the function $z = x^2 + y^2$. This graph is known as a paraboloid.
Now instead of a tangent line to a curve, we have a tangent plane to a surface. We can still compute partial
derivatives, but there are many directions in which we can differentiate; we now have
directional derivatives. Finally, we can also compute the volume under a surface z = f(x, y)
over a region R. To do this, we can break R up into many small squares, and add up lots
of rectangular solids, taking their height to be some average value of the function. We can
make exactly the same Riemann sums as before, but this time instead of just dx, we have
dA = dx dy = dy dx.
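A tiny numerical sketch (assuming numpy) of such a two-dimensional Riemann sum for the volume under $z = x^2 + y^2$ over the unit square, whose exact value is $2/3$:
```python
import numpy as np

n = 400
xs = (np.arange(n) + 0.5) / n          # midpoints of the small squares
X, Y = np.meshgrid(xs, xs)
dA = (1.0 / n) ** 2                    # dA = dx * dy

volume = np.sum((X**2 + Y**2) * dA)
print(volume)                          # close to 2/3
```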
Lecture 10: Linear algebra
Today, we make a total break from calculus and start linear algebra. We (probably)
won't take another derivative or integral this month! In linear algebra, we will discuss a
special class of functions, linear functions
$$ f : \mathbb{R}^n \to \mathbb{R}^m. $$
We will begin by discussing vectors, properties of vectors, and matrices.
10.1. Vectors. We have vectors in $\mathbb{R}^n$. For example,
$$ \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \\ 3 \end{pmatrix} \in \mathbb{R}^5. $$
In general, we will think of vectors in $\mathbb{R}^n$ as column vectors, but we may write them as row
vectors in order to save space. We will use v and w to represent vectors. We add vectors
component-wise. That is,
$$ \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \\ 3 \end{pmatrix} + \begin{pmatrix} 2 \\ 0 \\ 5 \\ -4 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 + 2 \\ 2 + 0 \\ 1 + 5 \\ 0 - 4 \\ 3 + 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 6 \\ -4 \\ 4 \end{pmatrix}. $$
We can multiply a vector by a real number $a \in \mathbb{R}$, a scalar, and we do this component-wise
as well. We cannot multiply two vectors, but we can take their dot product.
$$ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \cdot \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \sum_{i=1}^n x_i y_i. $$
The dot product of two vectors is a real number. The length of a vector v is $||v|| = \sqrt{v \cdot v}$.
A unit vector is a vector v with $||v|| = 1$. If we have two vectors v and w in $\mathbb{R}^n$, then the
angle $\theta$ between them can be computed by
$$ \cos\theta = \frac{v \cdot w}{||v||\, ||w||}. $$
This is known as the Law of Cosines. Notice that if the vectors are perpendicular, then
the cosine is 0, so the dot product of the vectors must be 0. We also have the Schwarz
Inequality, which says
$$ v \cdot w \leq ||v||\, ||w||. $$
Note that this follows from the Law of Cosines.
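A short numerical sketch (assuming numpy) of the dot product, the length, and the angle formula for two small example vectors:
```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([2.0, 0.0, 1.0])

dot = v @ w                                   # dot product
length_v = np.linalg.norm(v)                  # ||v|| = sqrt(v . v)
cos_theta = dot / (np.linalg.norm(v) * np.linalg.norm(w))
print(dot, length_v, np.degrees(np.arccos(cos_theta)))   # angle in degrees
```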
10.2. Matrices. A matrix is an $m \times n$ array. An $m \times n$ matrix has m rows and n columns.
We will use a matrix to store data. The (i, j) entry of a matrix is the entry in the $i^{\text{th}}$ row
and $j^{\text{th}}$ column. Consider the matrix
$$ \begin{pmatrix} 342 & 216 & 47 \\ 312 & 287 & 29 \end{pmatrix}. $$
The first row might represent sales in August 2000, and the second sales in August 1999.
The first column might represent gallons of milk, the second loaves of bread, and the third
heads of lettuce. So long as we remember what each row and column represents, we need
only keep track of the entries in each position.
As there were vector operations, we also have matrix operations. We can add two matrices,
so long as they are the same size. As before, addition is entry-wise. For example,
$$ \begin{pmatrix} 1 & 2 \\ -2 & 0 \end{pmatrix} + \begin{pmatrix} -2 & 3 \\ 7 & 4 \end{pmatrix} = \begin{pmatrix} 1 - 2 & 2 + 3 \\ -2 + 7 & 0 + 4 \end{pmatrix}. $$
We can also multiply a matrix by a scalar. Again, this is entry-wise. Finally, we can
multiply two matrices together, so long as they have the right sizes. We can multiply an
$m \times n$ matrix by an $n \times p$ matrix to get an $m \times p$ matrix. The way this happens is: if we are
multiplying $A \cdot B$ to get C, then the (i, j) entry of C is the dot product of the $i^{\text{th}}$ row of A
with the $j^{\text{th}}$ column of B. Notice that the dimension restrictions on A and B are such that
these two vectors are the same size, so their dot product is well-defined. For example,
$$ \begin{pmatrix} 1 & 2 \\ -2 & 0 \end{pmatrix} \begin{pmatrix} -2 & 3 \\ 7 & 4 \end{pmatrix} = \begin{pmatrix} 1 \cdot (-2) + 2 \cdot 7 & 1 \cdot 3 + 2 \cdot 4 \\ -2 \cdot (-2) + 0 \cdot 7 & -2 \cdot 3 + 0 \cdot 4 \end{pmatrix}. $$
We can also use a matrix to represent the coefficients of linear equations. For example,
we might have the system of equations
$$ \begin{pmatrix} 342 & 216 & 47 \\ 312 & 287 & 29 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1041 \\ 998 \end{pmatrix}. $$
We will often refer to this form as $A \cdot x = b$, where A is the coefficient matrix, x is the
vector of variables, and b is the desired solution vector.
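A small sketch (assuming numpy) of these matrix operations and of the grocery system; since that system has two equations in three unknowns, the sketch uses least squares to produce one solution:
```python
import numpy as np

A = np.array([[1, 2], [-2, 0]])
B = np.array([[-2, 3], [7, 4]])
print(A + B)           # entry-wise sum
print(A @ B)           # matrix product: [[12, 11], [4, -6]]

# The 2x3 grocery system is underdetermined; lstsq picks one solution with C @ x = b.
C = np.array([[342.0, 216.0, 47.0], [312.0, 287.0, 29.0]])
b = np.array([1041.0, 998.0])
x, *_ = np.linalg.lstsq(C, b, rcond=None)
print(x, C @ x)        # C @ x reproduces b
```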
Lecture 11: Gaussian elimination. Matrix operations. Inverses.
Today, we will further explore matrices.
11.1. Gaussian Elimination. First, we will learn an algorithm for solving a system of
linear equations. This Gaussian elimination will come up later in the algorithms course,
when you have to figure out its time complexity. Suppose you have the following system of
equations.
$$ \begin{aligned} x + 2y - z &= 2 \\ x + y + z &= 3 \end{aligned} $$
Subtracting the first equation from the second gives a new system with the same solutions.
$$ \begin{aligned} x + 2y - z &= 2 \\ -y + 2z &= 1 \end{aligned} $$
This system has infinitely many solutions. We can let the variable z take any value, and
then x and y are determined. The existence of infinitely many solutions can also occur in
the case of three equations in three unknowns. If a coefficient matrix A is $n \times n$, or square,
then we say that it is singular if $A \cdot x = b$ has infinitely many solutions or no solutions.
Before we give the algorithm for Gaussian elimination, we will define the elementary row
operations for a matrix. An elementary row operation on an $m \times n$ matrix A is one of the
following.
1. Swapping two rows.
2. Multiplying a row by a constant $c \in \mathbb{R}$.
3. Adding c times row i of A to d times row j of A.
Now, given a system of equations such as
$$ \begin{aligned} x + 2y - z &= 2 \\ x + y + z &= 3, \end{aligned} $$
the augmented matrix associated to this system is
$$ \left( \begin{array}{ccc|c} 1 & 2 & -1 & 2 \\ 1 & 1 & 1 & 3 \end{array} \right). $$
In general, for a system $A \cdot x = b$, the augmented matrix is $\left( A \mid b \right)$. To find the inverse of a
square matrix A, we can use row operations to transform the augmented matrix $\left( A \mid I \right)$ into $\left( I \mid B \right)$.
Then the matrix B is the inverse of A. We will talk about how to prove this in class.
A square matrix is not always invertible. It turns out that a matrix is not invertible if
and only if it is singular.
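Here is a compact sketch of Gauss–Jordan elimination in Python with numpy (my own illustration, not the course's code): row reduce $(A \mid I)$ to $(I \mid B)$, and the right half is $A^{-1}$ when A is invertible.
```python
import numpy as np

def invert(A):
    """Gauss-Jordan: row reduce [A | I] to [I | B]; then B = A^{-1}."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting (swap rows)
        if np.isclose(M[pivot, col], 0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                           # scale pivot row so the pivot is 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]          # clear the rest of the column
    return M[:, n:]

A = np.array([[2.0, 1.0], [5.0, 3.0]])
print(invert(A))                 # [[ 3, -1], [-5, 2]]
print(invert(A) @ A)             # identity
```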
Lecture 12: Matrices. Factorization A = L U. Determinants.
12.1. Facts about matrices. Here are some interesting facts about matrices, their
multiplication, addition, and inverses. Each of the following is true when the dimensions of the
matrices are such that the appropriate multiplications and additions make sense.
1. It is not always true that AB = BA!
2. There is an associative law, A(BC) = (AB)C.
3. There is also a distributive law A(B +C) = AB +AC.
4. $(AB)^{-1} = B^{-1}A^{-1}$.
There are also some important types of matrices. We say that an entry of a matrix is on
the diagonal if it is the (i, i) entry for some i. A matrix is a diagonal matrix if all of its
non-diagonal entries are zero. For example, the matrix
$$ \begin{pmatrix} d_1 & 0 & 0 & \dots & 0 \\ 0 & d_2 & 0 & \dots & 0 \\ 0 & 0 & \ddots & \dots & 0 \\ 0 & 0 & 0 & \dots & d_n \end{pmatrix} $$
is a diagonal matrix. Diagonal matrices have some nice properties. They commute with
each other (that is, CD = DC), and they are easy to multiply. The identity matrix I is a
special diagonal matrix: all its diagonal entries are 1. Another type of matrix is an upper
triangular matrix. This is a matrix which has zeroes in every entry below and to the left
of the diagonal. A lower triangular matrix is a matrix which has zeroes in every entry
above and to the right of the diagonal. Finally, the transpose of an $m \times n$ matrix A is an
$n \times m$ matrix $A^T$, where the rows and columns are interchanged. In this operation, the
(i, j)-entry becomes the (j, i)-entry. If a matrix has $A = A^T$, then we say A is symmetric.
Notice that $(A^T)^T = A$, and that $(AB)^T = B^T A^T$.
12.2. Factorization of A. We will discuss the factorization
$$ A = L \cdot U $$
of a matrix A into a lower triangular matrix L times an upper triangular matrix U. It turns
out that we already have the matrix U. This is the matrix that is the result of Gaussian
elimination. If we keep track of our row operations, then we can also determine L. Note
that if you need to perform a row exchange during elimination, then the factorization will
be of the form $P \cdot A = L \cdot U$. We will discuss how to find L and P if necessary in class.
Finally, in the case that A is a symmetric matrix, the factorization $A = L \cdot U$ becomes
$$ A = L \cdot D \cdot L^T, $$
where D is a diagonal matrix, and L is lower triangular (and hence $L^T$ is upper triangular).
Our ultimate goal is to factor a matrix A into the form
$$ A = C \cdot D \cdot C^{-1}, $$
where C is an invertible matrix, and D is a diagonal matrix. This factorization will allow
us to compute powers of A easily, since
$$ A^n = (C \cdot D \cdot C^{-1})^n = C \cdot D^n \cdot C^{-1}. $$
12.3. Determinants. As the name suggests, the determinant of a matrix will determine
something about the matrix: if det(A) = 0, then A is a singular matrix. If A is a $2 \times 2$
matrix, then
$$ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, $$
and $\det(A) = ad - bc$. We will define the determinant for a larger square matrix recursively.
Some important properties of the determinant are the following.
1. det(I) = 1 for I the identity matrix.
2. If two rows of A are the same, then det(A) = 0.
3. If a row of A is all zeroes, then det(A) = 0.
4. If D is a diagonal matrix, then det(D) is the product of the diagonal entries.
5. If A is upper or lower triangular, then det(A) is the product of the diagonal entries.
6. det(AB) = det(A) det(B).
Lecture 13: Vector spaces.
13.1. Application of factorization. We have seen three algorithms thus far. They can
be described as follows.
1. Gaussian elimination.
(a) Ax = b gives an augmented matrix $(A \mid b)$.
(b) Use row operations to transform $(A \mid b)$ into $(U \mid c)$, where U is an upper triangular matrix.
(c) Back solve to find solution(s) x.
2. Gauss-Jordan elimination.
(a) Ax = b gives an augmented matrix $(A \mid b)$.
(b) Use row operations to transform $(A \mid b)$ into $(I \mid d)$.
(c) Read off answer(s) x = d.
3. $A = L \cdot U$ factorization.
(a) Ax = b. Just consider A for a minute.
(b) Factor $A = L \cdot U$.
(c) $Ax = L(Ux) = b$.
(i) Forward substitute to solve $Lz = b$.
(ii) Back solve $Ux = z$.
We will discuss each of these algorithms, and see why the last algorithm is the fastest.
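A brief sketch (assuming scipy) of the $A = L \cdot U$ approach: factor once, then reuse the factorization for several right-hand sides, which is the reason it is the fastest of the three in practice.
```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])

P, L, U = lu(A)                     # A = P @ L @ U (P records any row exchanges)
print(np.allclose(A, P @ L @ U))    # True

factored = lu_factor(A)             # factor once ...
for b in (np.array([5.0, -2.0, 9.0]), np.array([1.0, 0.0, 0.0])):
    x = lu_solve(factored, b)       # ... then forward/back substitute per right-hand side
    print(np.allclose(A @ x, b))    # True
```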
13.2. Abstract vector spaces. We are now going to take a step towards the more abstract
side of linear algebra. We are going to define a vector space, something like $\mathbb{R}^n$. A vector
space (over $\mathbb{R}$) is a set V along with two operations, addition of vectors and scalar
multiplication, such that for all vectors $v, w, z \in V$ and scalars $c, d \in \mathbb{R}$,
1. $v + w \in V$ and $v + w = w + v$.
2. $(v + w) + z = v + (w + z)$.
3. There is a zero-vector 0 such that $0 + v = v$.
4. For every v, there is an additive inverse $-v$ such that $v + (-v) = 0$.
5. $c \cdot v \in V$ and $c \cdot (d \cdot v) = (c \cdot d) \cdot v$.
6. $c \cdot (v + w) = c \cdot v + c \cdot w$.
7. $(c + d) \cdot v = c \cdot v + d \cdot v$.
Of these, the most important ones are numbers 1 and 5. Most of the other properties
are implied by these two. The most important thing to check is that for any two vectors
$v, w \in V$, any linear combination $c \cdot v + d \cdot w$ of them is also in V.
In class, we will see many examples of vector spaces. We will continue our discussion
of vector spaces by defining subspaces. A subset $W \subseteq V$ of a vector space V is a vector
subspace if it is also a vector space in its own right. A map $L : V_1 \to V_2$ is a linear
transformation if $L(c \cdot v + d \cdot w) = c \cdot L(v) + d \cdot L(w)$. The kernel or nullspace of a linear
transformation is the subset $W_1 \subseteq V_1$ such that $L(w) = 0$ for all $w \in W_1$. We write the
kernel as $\ker(L)$ or $N(L)$. The nullspace of a linear transformation is a subspace.
Lecture 14: Solving a system of linear equations.
14.1. More vector spaces. First, we will go over several examples of vector spaces. The
most important are the row space and column space of a matrix. Given a matrix A, the
row space R(A) is the vector space consisting of all linear combinations of the row vectors
of A. The column space C(A) is the vector space consisting of all linear combinations of the
column vectors of A.
14.2. Solving systems of linear equations. We will discuss the complete solution to a
system of linear equations,
$$ A \cdot x = b. $$
To find the complete solution, we must put the matrix into reduced row echelon form. A
matrix is in reduced row echelon form if it satisfies the following conditions.
1. All rows consisting entirely of zeroes, if there are any, are the bottom rows of the
matrix.
2. If rows i and i +1 are successive rows that do not consist entirely of zeroes, the leading
non-zero entry in row i + 1 is to the right of the leading non-zero entry in row i.
3. If row i does not consist entirely of zeroes, then its leading non-zero entry is 1.
4. If a column has a leading non-zero entry (a 1!) in it in some row, then every other
entry in that column is zero.
The columns that contain a leading entry are called pivot columns. The columns that do
not contain leading entries are called free columns. The rows that are not entirely zeroes
are pivot rows. The rank of a matrix A is the number of pivot columns (or rows) that it has
when it is transformed into reduced row echelon form. We write rank(A) = r. The pivot
variables are the variables that correspond to the pivot columns. The free variables are the
variables corresponding to the free columns.
To find a particular solution $x_p$ to $Ax = b$, we put the augmented matrix $(A \mid b)$ into reduced row echelon form.
The vectors
$$ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix} $$
form the standard basis of $\mathbb{R}^2$. This is an orthogonal basis.
Finally, two subspaces $W_1$ and $W_2$ inside V are orthogonal subspaces if for every $w_1 \in W_1$
and $w_2 \in W_2$, $w_1$ and $w_2$ are orthogonal. Suppose that A is an $m \times n$ matrix with rank
r. Then the row space R(A) and the nullspace N(A) are orthogonal subspaces of $\mathbb{R}^n$, and
the column space C(A) and the left nullspace $N(A^T)$ are orthogonal subspaces of $\mathbb{R}^m$.
16.2. Orthonormality. We say that a basis $B = \{q_1, \dots, q_n\}$ of $\mathbb{R}^n$ is an orthonormal
basis if every pair of vectors is orthogonal, and if each vector has length 1. That is,
$$ q_i \cdot q_j = \begin{cases} 0 & i \neq j \\ 1 & i = j. \end{cases} $$
The standard basis for $\mathbb{R}^2$ described above is in fact an orthonormal basis. Orthonormal
bases are particularly easy to work with because of the following theorem.
Theorem 16.2.1. If $B = \{q_1, \dots, q_n\}$ is an orthonormal basis of $\mathbb{R}^n$ and $v \in \mathbb{R}^n$ is any vector, then
$$ v = c_1 q_1 + \dots + c_n q_n, $$
where $c_i = v \cdot q_i$ is the dot product of v and $q_i$ for all i.
We say that a matrix is an orthogonal matrix if its columns form an orthonormal basis
of the column space of the matrix. We will be particularly interested in square orthogonal
matrices. The name orthogonal is a bit of a misnomer. Unfortunately, the name has
mathematical roots, and so although we would like to call this an orthonormal matrix, we
will not. Suppose
$$ Q = \begin{pmatrix} q_1 & q_2 & \dots & q_n \end{pmatrix}. $$
Given a basis $\{v_1, \dots, v_n\}$, the Gram-Schmidt process sets $w_1 = v_1$ and
$$ w_2 = v_2 - \frac{v_2 \cdot w_1}{w_1 \cdot w_1}\, w_1. $$
We continue in this manner, and let
$$ w_i = v_i - \frac{v_i \cdot w_1}{w_1 \cdot w_1}\, w_1 - \frac{v_i \cdot w_2}{w_2 \cdot w_2}\, w_2 - \dots - \frac{v_i \cdot w_{i-1}}{w_{i-1} \cdot w_{i-1}}\, w_{i-1}. $$
At this stage, we could stop, and we would have an orthogonal basis $\{w_1, \dots, w_n\}$ such that
the subspace spanned by $\{v_1, \dots, v_k\}$ is the same as the subspace spanned by $\{w_1, \dots, w_k\}$
for all k. To turn this into an orthonormal basis, we normalize, and set
$$ q_i = \frac{w_i}{||w_i||}. $$
This basis $\{q_1, \dots, q_n\}$ is the desired orthonormal basis. We will prove in class that this is
indeed an orthonormal basis.
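A short sketch (assuming numpy) of the Gram–Schmidt process just described, followed by normalization, on three example vectors:
```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of independent vectors, then normalize them."""
    ws = []
    for v in vectors:
        w = v.astype(float)
        for u in ws:
            w = w - (v @ u) / (u @ u) * u       # subtract the projection onto each earlier w
        ws.append(w)
    return [w / np.linalg.norm(w) for w in ws]

qs = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                   np.array([1.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 1.0])])
Q = np.column_stack(qs)
print(np.allclose(Q.T @ Q, np.eye(3)))          # True: the q_i are orthonormal
```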
16.4. Change of basis matrices. Let B be the standard basis of $\mathbb{R}^n$. That is,
$B = \{e_1, \dots, e_n\}$, where $e_i$ is the vector that has a 1 in the $i^{\text{th}}$ position, and zeroes elsewhere.
Let $B' = \{v_1, \dots, v_n\}$ and $B'' = \{w_1, \dots, w_n\}$ be two other bases of $\mathbb{R}^n$. We will now discuss
the change of basis matrices. Suppose we have a vector
$$ v = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \mathbb{R}^n. $$
Then we can write this vector as
$$ \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = c_1 e_1 + \dots + c_n e_n = I \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. $$
We may want to write our vector in terms of $B'$. To do this, we need to find constants
$b_1, \dots, b_n$ such that
$$ v = b_1 v_1 + \dots + b_n v_n = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, $$
where $(v_1 \ \dots \ v_n)$ denotes the matrix whose columns are the vectors $v_i$.
Solving this equation, we get that
$$ \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix}^{-1} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. $$
Thus, we say that the change of basis matrix from B to $B'$ is
$$ M_{B \to B'} = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix}^{-1}. $$
The change of basis matrix from $B'$ to B is
$$ M_{B' \to B} = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix}. $$
Finally, if we want to change basis from $B'$ to $B''$, then the change of basis matrix is
$$ M_{B' \to B''} = M_{B \to B''}\, M_{B' \to B}. $$
We seem to have multiplied in the wrong order. We will discuss this in class, and compute
several examples.
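A numerical sketch (assuming numpy) of the change of basis computation: the columns of the matrix are the basis vectors of $B'$, and converting standard coordinates into $B'$-coordinates uses the inverse.
```python
import numpy as np

# A basis B' of R^2, stored as the columns of V.
V = np.array([[1.0, 1.0],
              [0.0, 2.0]])

c = np.array([3.0, 4.0])            # coordinates of v in the standard basis
b = np.linalg.inv(V) @ c            # coordinates of the same vector in B'
print(b)                            # [1, 2]
print(V @ b)                        # back to [3, 4]: b_1*v_1 + b_2*v_2 = v
```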
Lecture 17: Eigenvalues, eigenvectors and diagonalization.
17.1. Eigenvalues and eigenvectors. For a matrix A, the eigenvalues of A are the real
numbers $\lambda$ such that
$$ A \cdot v = \lambda \cdot v $$
for some nonzero vector v. Let's explore this equation further.
$$ A \cdot v = \lambda \cdot v = (\lambda I) \cdot v \;\Rightarrow\; Av - \lambda I v = 0 \;\Rightarrow\; (A - \lambda I) \cdot v = 0 $$
This last line will have a nonzero solution v if and only if
$$ \det(A - \lambda I) = 0. $$
Thus, we find the eigenvalues of A by finding roots of the polynomial $\det(A - \lambda I)$.
An eigenvector of a matrix A is a vector v such that
$$ A \cdot v = \lambda \cdot v, $$
for some real number $\lambda$. Once we know what the eigenvalues are, we can find associated
eigenvectors.
Real eigenvalues do not necessarily exist. The eigenvalues are the roots of a degree n polynomial,
and so there are at most n eigenvalues, and there can be no real eigenvalues at all.
If we have n eigenvalues, they may not be distinct. If we have n distinct eigenvalues, then
if we choose one eigenvector for each one, these will form a basis for $\mathbb{R}^n$.
17.2. Diagonalization. If A is a matrix, and there is a diagonal matrix D and an invertible
matrix P such that
$$ A = P \cdot D \cdot P^{-1}, $$
then we say that A is diagonalizable.
Theorem 17.2.1. A matrix A is diagonalizable if it has n distinct eigenvalues. In this
case, if $\lambda_1, \dots, \lambda_n$ are the eigenvalues and $v_1, \dots, v_n$ are the eigenvectors, then
$$ A = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \ddots & & 0 \\ 0 & \dots & 0 & \lambda_n \end{pmatrix} \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix}^{-1}. $$
In the case when the eigenvalues are not distinct, the matrix may or may not be diagonalizable.
In this case, there is still a canonical form known as the Jordan canonical form.
We will discuss this in class. Some matrices may not be diagonalizable over $\mathbb{R}$, but they are
over $\mathbb{C}$. We will see an application of diagonalization tomorrow.
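A brief sketch (assuming numpy) that finds eigenvalues and eigenvectors of an example matrix and checks the factorization $A = P \cdot D \cdot P^{-1}$:
```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)       # columns of P are eigenvectors
D = np.diag(eigvals)
print(eigvals)                                          # 5 and 2 (in some order)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))         # True: A = P D P^{-1}
print(np.allclose(A @ P[:, 0], eigvals[0] * P[:, 0]))   # True: A v = lambda v
```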
Lecture 18: Fibonacci numbers. Markov matrices.
18.1. Fibonacci numbers. In the year 1202, the Italian mathematician Fibonacci published
a book in which he defined a certain sequence of numbers. The so-called Fibonacci
numbers are defined by the relation
$$ F_n = F_{n-1} + F_{n-2}, $$
with $F_0 = 1$ and $F_1 = 1$. This sequence is
$$ 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \dots. $$
We can use diagonalization to write a closed form equation for the $n^{\text{th}}$ Fibonacci number.
If we put two successive values in a vector,
$$ \begin{pmatrix} F_{n-1} \\ F_{n-2} \end{pmatrix}, $$
then we can write
$$ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} F_{n-1} \\ F_{n-2} \end{pmatrix} = \begin{pmatrix} F_{n-1} + F_{n-2} \\ F_{n-1} \end{pmatrix} = \begin{pmatrix} F_n \\ F_{n-1} \end{pmatrix}. $$
Thus,
$$ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} F_1 \\ F_0 \end{pmatrix} = \begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix}, $$
so if we could easily compute the power
$$ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n, $$
we could easily compute the $n^{\text{th}}$ Fibonacci number. If we diagonalize the matrix
$$ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, $$
then we can easily compute its $n^{\text{th}}$ power. We will do this in class, and come up with the formula
$$ F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1} \right]. $$
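A short sketch (assuming numpy) comparing the matrix-power computation with the closed form above, using the lecture's convention $F_0 = F_1 = 1$:
```python
import numpy as np

A = np.array([[1, 1], [1, 0]])

def fib(n):
    """F_n with F_0 = F_1 = 1, via the n-th power of the matrix."""
    v = np.linalg.matrix_power(A, n) @ np.array([1, 1])
    return int(v[1])

phi, psi = (1 + 5**0.5) / 2, (1 - 5**0.5) / 2
closed = lambda n: round((phi**(n + 1) - psi**(n + 1)) / 5**0.5)

print([fib(n) for n in range(8)])      # [1, 1, 2, 3, 5, 8, 13, 21]
print([closed(n) for n in range(8)])   # the closed form gives the same values
```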
18.2. Markov matrices. We will briefly discuss Markov matrices. This is a place where
linear algebra can be applied. A Markov matrix represents a scenario where certain outcomes
occur with various probabilities, and these probabilities change over time. In some cases,
a steady-state is approached, and we will see exactly when. We will use eigenvectors and
eigenvalues to determine what that steady-state is.
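A small sketch (assuming numpy, with a made-up two-state transition matrix whose columns sum to 1) of finding the steady state as the eigenvector for eigenvalue 1:
```python
import numpy as np

# Hypothetical Markov matrix: columns are probability distributions (they sum to 1).
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

eigvals, eigvecs = np.linalg.eig(M)
k = np.argmin(np.abs(eigvals - 1.0))          # pick the eigenvalue closest to 1
steady = eigvecs[:, k] / eigvecs[:, k].sum()  # rescale so the entries sum to 1
print(steady)                                 # [2/3, 1/3]
print(M @ steady)                             # unchanged: this is the steady state
```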
© Tara S. Holm, MMI. May be distributed as per http://opencontent.org.