Linear Algebra Study Guide
Eduardo Corona (other authors as they join in)
November 2, 2008
Contents

1 Vector Spaces and Matrix Operations
2 Linear Operators
3 Diagonalizable Operators
  3.1 The Rayleigh Quotient and the Min-Max Theorem
  3.2 Gershgorin's Discs Theorem
4 Hilbert Space Theory: Inner Product, Orthogonal Projection and Adjoint Operators
  4.1 Orthogonal Projection
  4.2 The Gram-Schmidt Process and QR Factorization
  4.3 Riesz Representation Theorem and the Adjoint Operator
5 Normal and Self-Adjoint Operators: Spectral Theorems and Related Results
  5.1 Unitary Operators
  5.2 Positive Operators and Square Roots
6 Singular Value Decomposition and the Moore-Penrose Generalized Inverse
  6.1 Singular Value Decomposition
  6.2 The Moore-Penrose Generalized Inverse
  6.3 The Polar Decomposition
7 Matrix Norms and Low Rank Approximation
  7.1 The Frobenius Norm
  7.2 Operator Norms
  7.3 Low Rank Matrix Approximation
8 Generalized Eigenvalues, the Jordan Canonical Form and $e^{A}$
  8.1 The Generalized Eigenspace $E_\lambda^\infty$
  8.2 A Method to Compute the Jordan Form: The Points Diagram
  8.3 Applications: Matrix Powers and Power Series
9 Nilpotent Operators
10 Other Important Matrix Factorizations
11 Other Topics (which appear in past exams)
12 Yet More Topics I Can Think Of
1 Vector Spaces and Matrix Operations
2 Linear Operators
Definition 1 Let $V, W$ be vector spaces over a field $F$ (usually $F = \mathbb{R}$ or $\mathbb{C}$). Then $\mathcal{L}(V, W) = \{T : V \to W \mid T \text{ is linear}\}$. In particular, $\mathcal{L}(V, V) = \mathcal{L}(V)$ is the space of linear operators on $V$, and $\mathcal{L}(V, F) = V^*$ is its algebraic dual.
Definition 2 (Important subspaces) Given a subspace $W' \subseteq W$, $T^{-1}(W') \subseteq V$ is a subspace. In particular, we are interested in $T^{-1}(0) = \mathrm{Ker}(T)$. Also, if $S \subseteq V$ is a subspace, then $T(S) \subseteq W$ is a subspace. We are most interested in $T(V) = \mathrm{Ran}(T)$.
Theorem 3 Let $V, W$ be vector spaces over $F$ with $\dim(V) = n$ and $\dim(W) = m$. Given a basis $B = \{u_1, \dots, u_n\}$ of $V$ and a basis $B' = \{v_1, \dots, v_m\}$ of $W$, to each $T \in \mathcal{L}(V, W)$ we can associate a matrix $[T]_{B',B}$ such that:
\[ T u_i = a_{1i} v_1 + \dots + a_{mi} v_m \quad \forall i \in \{1, \dots, n\}, \qquad [T]_{B',B} = (a_{ji}) \in M_{m \times n}(F) \]
That is, the $i$-th column of $[T]_{B',B}$ collects the coordinates of $T u_i$ in the basis $B'$. Conversely, given a matrix $A \in M_{m \times n}(F)$, there is a unique $T \in \mathcal{L}(V, W)$ such that $A = [T]_{B',B}$.
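As a concrete illustration of Theorem 3 (a minimal sketch assuming NumPy; the differentiation map and the monomial basis are our own choice of example, not taken from the guide), the matrix of a linear map is built column by column from the images of the basis vectors:

    import numpy as np

    # Column i of [T] holds the coordinates of T(u_i) in the target basis.
    # Illustration: T = d/dx on polynomials of degree <= 2, monomial basis {1, x, x^2}.
    def deriv_coeffs(c):
        # c = (c0, c1, c2) represents c0 + c1*x + c2*x^2; return coords of its derivative
        c0, c1, c2 = c
        return np.array([c1, 2 * c2, 0.0])

    basis = np.eye(3)                                  # coordinate vectors of 1, x, x^2
    T = np.column_stack([deriv_coeffs(u) for u in basis])
    print(T)                                           # [[0,1,0],[0,0,2],[0,0,0]]
    print(T @ np.array([3.0, 2.0, 1.0]))               # derivative of 3 + 2x + x^2 -> 2 + 2x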
Proposition 4 Given $T \in \mathcal{L}(V)$, there exist bases $B$ and $B'$ of $V$ such that:
\[ [T]_{B',B} = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} \]
$B$ is constructed by completing a basis of $\mathrm{Ker}(T)$ to a basis of $V$ (listing the completing vectors first), and $B'$ by extending $\{T(u) : u \in B \setminus \mathrm{Ker}(T)\}$ to a basis of $V$.
Theorem 5 (Rank and Nullity) $\dim(V) = \dim(\mathrm{Ker}(T)) + \dim(\mathrm{Ran}(T))$. $\nu(T) = \dim(\mathrm{Ker}(T))$ is known as the nullity of $T$, and $r(T) = \dim(\mathrm{Ran}(T))$ as the rank of $T$.
Change of Basis: Let $V, W$ be vector spaces over $F$, $B$ and $\tilde{B}$ bases of $V$, $B'$ and $\tilde{B}'$ bases of $W$. Then there exist invertible change-of-coordinates matrices $P$ and $Q$ such that:
\[ [T]_{\tilde{B}',\tilde{B}} = P\, [T]_{B',B}\, Q \]
In the operator case ($W = V$, using the same basis on both sides), this reduces to the similarity relation $[T]_{\tilde{B}} = P\, [T]_{B}\, P^{-1}$. This means that if two matrices are similar, they represent the same linear operator in different bases, which further justifies why key properties of matrices are preserved under similarity.
3 Diagonalizable Operators
If $W = V$ (so $T$ is a linear operator), it is natural to require that both bases $B$ and $B'$ be the same. In this case, it is no longer true in general that we can find a basis $B$ such that the corresponding matrix is diagonal. However, if there exists a basis $B$ such that $[T]_B$ is a diagonal matrix, we say $T$ is diagonalizable.
Definition 6 Let $V$ be a vector space over $F$ and $T \in \mathcal{L}(V)$. $\lambda \in F$ is an eigenvalue of $T$ if there exists a nonzero vector $v$ such that $Tv = \lambda v$. All nonzero vectors $v$ for which this holds are known as eigenvectors of $T$ (associated with $\lambda$).
We can immediately derive from this definition that the existence of the eigenpair $(\lambda, v)$ (eigenvalue $\lambda$ and corresponding eigenvector $v$) is equivalent to the existence of a nonzero solution to
\[ (T - \lambda I)v = 0 \]
This in turn tells us that the eigenvalues of $T$ are those $\lambda$ such that the operator $T - \lambda I$ is not invertible. After selecting a basis $B$ for $V$, this also means:
\[ \det([T]_B - \lambda I) = 0 \]
which is called the characteristic equation of $T$. We notice this equation does not depend on the choice of basis $B$, since the determinant is invariant under similarity:
\[ \det(PAP^{-1} - \lambda I) = \det(P(A - \lambda I)P^{-1}) = \det(A - \lambda I) \]
Solving this equation amounts to finding the complex roots of a polynomial in $\lambda$. We know this to be a genuinely hard problem for $n \geq 5$ (there is no general formula in radicals), and a numerically ill-conditioned one at that.
Definition 7 Let $V$ be a vector space over $F$, $T \in \mathcal{L}(V)$, and $\lambda$ an eigenvalue of $T$. Then $E_\lambda = \{v \in V \mid Tv = \lambda v\}$ is the eigenspace for $\lambda$.
Theorem 8 Let $V$ be a finite-dimensional vector space over $F$ and $T \in \mathcal{L}(V)$. The following are equivalent:
i) $T$ is diagonalizable
ii) $V$ has a basis of eigenvectors of $T$
iii) There exist subspaces $W_1, \dots, W_n$ such that $\dim(W_i) = 1$, $T(W_i) \subseteq W_i$, and $V = \bigoplus_{i=1}^{n} W_i$
iv) $V = \bigoplus_{i=1}^{k} E_{\lambda_i}$, with $\lambda_1, \dots, \lambda_k$ the eigenvalues of $T$
v) $\sum_{i=1}^{k} \dim(E_{\lambda_i}) = \dim(V)$
Proposition 9 If $V$ is a vector space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$, then $T$ has at least one eigenvalue (this is a corollary of the Fundamental Theorem of Algebra, applied to the characteristic equation).
Theorem 10 (Schur factorization) Let $V$ be a finite-dimensional vector space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$. There always exists a basis $B$ such that $[T]_B$ is upper triangular.
3.1 The Rayleigh Quotient and the Min-Max Theorem
3.2 Gershgorin's Discs Theorem
Although calculating the eigenvalues of a large matrix is a very difficult problem (computationally and analytically), it is very easy to come up with regions of the complex plane in which all the eigenvalues of a particular operator $T$ must lie. This technique was first devised by the Russian mathematician Semyon Aranovich Gershgorin (1901-1933):
Theorem 11 (Gershgorin, 1931) Let $A = (a_{ij}) \in M_n(\mathbb{C})$. For each $i \in \{1, \dots, n\}$, we define the $i$-th "radius of $A$" as $r_i(A) = \sum_{j \neq i} |a_{ij}|$ and the $i$-th Gershgorin disc as
\[ D_i(A) = \{ z \in \mathbb{C} \mid |z - a_{ii}| \leq r_i(A) \} \]
Then, if we define $\sigma(A) = \{\lambda \mid \lambda \text{ is an eigenvalue of } A\}$, it follows that:
\[ \sigma(A) \subseteq \bigcup_{i=1}^{n} D_i(A) \]
That is, all eigenvalues of $A$ must lie inside one or more Gershgorin discs.
Proof. Let $\lambda$ be an eigenvalue of $A$ and $v$ an associated eigenvector. We fix $i$ as the index of the coordinate of $v$ with maximum modulus, that is, $|v_i| \geq |v_k|$ for all $k$; necessarily $|v_i| \neq 0$. Then
\[ Av = \lambda v \implies \lambda v_i = \sum_j a_{ij} v_j \implies (\lambda - a_{ii}) v_i = \sum_{j \neq i} a_{ij} v_j \]
\[ \implies |\lambda - a_{ii}|\, |v_i| \leq \sum_{j \neq i} |a_{ij}|\, |v_i| \implies \lambda \in D_i(A) \]
Now, we know that $A$ represents a linear operator $T \in \mathcal{L}(\mathbb{C}^n)$, and that its eigenvalues are therefore invariant under transposition of $A$ and under similarity. Therefore:
Corollary 12 Let $A = (a_{ij}) \in M_n(\mathbb{C})$. Then, for every invertible $P$,
\[ \sigma(A) \subseteq \bigcup_{i=1}^{n} D_i(P A P^{-1}) \]
and hence $\sigma(A)$ lies in the intersection of these unions over all invertible $P$.
Of course, if $A$ is diagonalizable, one of these $P$'s is the one for which $P A P^{-1}$ is diagonal, and then the Gershgorin discs degenerate to the $n$ points we are looking for. However, if we do not want to compute the eigenvalues, we can still use this to come up with a fine heuristic to shrink the region given by the union of the Gershgorin discs: we can use permutation matrices or diagonal matrices as our $P$'s to get a "reasonable region". This result also hints at the fact that, if we perturb a matrix $A$, the eigenvalues change continuously.
The Gershgorin disc theorem is also a quick way to prove that $A$ is invertible if it is strictly diagonally dominant, and it also provides results when the Gershgorin discs of $A$ are pairwise disjoint (namely, that there must then be exactly one eigenvalue per Gershgorin disc).
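As a quick numerical illustration (a minimal sketch assuming NumPy; the test matrix is an arbitrary choice), the discs are cheap to compute and every eigenvalue returned by np.linalg.eigvals indeed falls in at least one of them:

    import numpy as np

    def gershgorin_discs(A):
        """Return (centers, radii): center a_ii and radius sum_{j != i} |a_ij| for each row."""
        A = np.asarray(A)
        centers = np.diag(A)
        radii = np.abs(A).sum(axis=1) - np.abs(centers)
        return centers, radii

    # Example matrix (arbitrary choice for illustration)
    A = np.array([[4.0, 1.0, 0.2],
                  [0.5, -3.0, 0.1],
                  [0.1, 0.4, 1.0]])

    centers, radii = gershgorin_discs(A)
    eigs = np.linalg.eigvals(A)

    # Every eigenvalue must lie in at least one disc |z - a_ii| <= r_i
    for lam in eigs:
        assert np.any(np.abs(lam - centers) <= radii + 1e-12)
    print(centers, radii, eigs)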
4 Hilbert Space Theory: Inner Product, Orthogonal Projection and Adjoint Operators
Definition 13 Let $V$ be a vector space over $F$. An inner product on $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ such that:
1) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle \quad \forall u, v, w \in V$
2) $\langle cu, v \rangle = c \langle u, v \rangle \quad \forall u, v \in V,\ c \in F$
3) $\langle u, v \rangle = \overline{\langle v, u \rangle}$
4) $\langle u, u \rangle \geq 0$, and $\langle u, u \rangle = 0 \iff u = 0$
By definition, every inner product induces a natural norm on $V$, given by $\|v\|_V = \sqrt{\langle v, v \rangle}$.
Definition 14 We say $u$ and $v$ are orthogonal, written $u \perp v$, if $\langle u, v \rangle = 0$.
Some important identities:
1. Pythagoras' Theorem: $u \perp v \implies \|u + v\|^2 = \|u\|^2 + \|v\|^2$
2. Cauchy-Bunyakovsky-Schwarz: $|\langle u, v \rangle| \leq \|u\|\, \|v\|$ for all $u, v \in V$, with equality $\iff u = cv$
3. Parallelogram: $\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2)$ for all $u, v \in V$
4. Polarization:
\[ \langle u, v \rangle = \tfrac{1}{4}\left( \|u + v\|^2 - \|u - v\|^2 \right) \quad \forall u, v \in V \text{ if } F = \mathbb{R} \]
\[ \langle u, v \rangle = \tfrac{1}{4} \sum_{k=1}^{4} i^k \left\| u + i^k v \right\|^2 \quad \forall u, v \in V \text{ if } F = \mathbb{C} \]
In fact, identities 3 and 4 (parallelogram and polarization) give both necessary and sufficient conditions for a norm to be induced by some inner product. In this fashion, we can prove that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are not induced by an inner product by showing that the parallelogram identity fails.
Definition 15 $v \in V$ is said to be of unit norm if $\|v\| = 1$.
Definition 16 A subset $S \subseteq V$ is said to be orthogonal if the elements of $S$ are mutually orthogonal (perpendicular).
Definition 17 $S$ is orthonormal if it is orthogonal and its elements are of unit norm.
If $S$ is orthogonal (and does not contain $0$), then it is automatically LI (linearly independent). Intuitively, we can think of orthogonal vectors as vectors which do not "cast a shadow" on each other, and therefore point in completely exclusive directions. We have the following property: if $S = \{v_1, \dots, v_n\}$ is orthogonal, then for all $v \in \mathrm{span}(S)$:
\[ v = c_1 v_1 + \dots + c_n v_n, \qquad c_i = \frac{\langle v, v_i \rangle}{\langle v_i, v_i \rangle} \quad \forall i \]
Thus, we can obtain the coefficient for each element of $S$ independently, by computing the inner product with the corresponding $v_i$. Furthermore, if $S$ is orthonormal:
\[ c_i = \langle v, v_i \rangle \quad \forall i \]
These coefficients are also called the abstract Fourier coefficients.
Theorem 18 (Bessel's Inequality) Let $\{v_1, \dots, v_n\}$ be an orthonormal set and $v \in V$. Then:
\[ \sum_{i=1}^{n} |\langle v, v_i \rangle|^2 \leq \|v\|^2 \]
with equality $\iff v \in \mathrm{span}(\{v_i\}_{i=1}^{n})$.
4.1 Orthogonal Projection
This last result suggests that, for an orthogonal set $S$, in order to retrieve the component of $v$ going "in the $i$-th direction", we only need to compute $\frac{\langle v, v_i \rangle}{\langle v_i, v_i \rangle} v_i$. This is in fact the projection of our vector $v$ in the direction of the vector $v_i$, or the "shadow" that $v$ casts on the direction of $v_i$. We shall define this more generally, and see that we can define projection operators which give us the component of a vector in a given subspace of $V$:
Definition 19 Let $S \subseteq V$. We define the orthogonal complement $S^{\perp} = \{v \in V \mid \langle v, s \rangle = 0 \ \forall s \in S\}$. If $W \subseteq V$ is a closed subspace, then $W \oplus W^{\perp} = V$ and $(W^{\perp})^{\perp} = W$ (always true in finite dimension).
Definition 20 Let $W \subseteq V$ be a subspace. Then we define $P_W \in \mathcal{L}(V)$ such that, if $v = v_W + v_{W^\perp}$, then $P_W(v) = v_W$. We can also define this operator by its action on a suitable basis of $V$: if we take $B_W = \{u_1, \dots, u_p\}$ a basis of $W$ and $B_{W^\perp} = \{u_{p+1}, \dots, u_n\}$ a basis of $W^{\perp}$, then $B = B_W \cup B_{W^\perp}$ is a basis of $V$ and:
\[ P_W(u_i) = u_i \quad \forall i \in \{1, \dots, p\}, \qquad P_W(u_j) = 0 \quad \forall j \in \{p+1, \dots, n\}, \qquad [P_W]_B = \begin{pmatrix} I_p & 0 \\ 0 & 0 \end{pmatrix} \]
From this, a myriad of properties of $P_W$ can be deduced (a small numerical check follows this list):
1. $P_W^2 = P_W$: this follows easily from the fact that $P_W w = w$ for all $w \in W$.
2. $\mathrm{Ran}(P_W) = W$ and $\mathrm{Ker}(P_W) = W^{\perp}$.
3. $v - P_W v \in W^{\perp}$ for all $v \in V$: we can deduce this directly from the definition, or compute the inner product with any member of $W$. It also follows from the picture one can draw in $\mathbb{R}^2$ or $\mathbb{R}^3$: if we remove the "shadow" cast by a vector, all that is left is the orthogonal component. This additionally tells us that:
\[ P_{W^{\perp}} = I - P_W \]
4. $\|v - P_W v\| \leq \|v - w\|$ for all $w \in W$: this is a very strong result: it tells us the orthogonal projection is the best approximation to $v$ by vectors in $W$. This is a key result which justifies the use of the projection in applications such as least squares, polynomial interpolation and approximation, Fourier series, etc. In fact, this result can be extended to projection onto convex sets in Hilbert spaces.
5. $\|P_W v\| \leq \|v\|$ for all $v \in V$: this tells us the projection is a contraction. In particular, we know that $\|P_W\| = 1$ (unless $W = \{0\}$), since there are vectors (namely, those in $W$) for which equality holds.
6. $\langle P_W u, v \rangle = \langle u, P_W v \rangle$ for all $u, v \in V$ ($P_W$ is "self-adjoint"). This can be proved explicitly using the unique decomposition of $u$ and $v$ as sums of components in $W$ and $W^{\perp}$. In particular, this also tells us that the matrix which represents $P_W$ is symmetric / self-adjoint as well, provided we choose a basis of orthonormal vectors.
7. It can be shown that properties (1) and (4), (1) and (5), or (1) and (6) completely characterize the orthogonal projection. That is, from these properties alone we can deduce the rest, and the operator $P$ has to be the orthogonal projection onto its range.
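A small numerical check of these properties (a sketch assuming NumPy; the subspace is spanned by two random vectors of our choosing): build $P_W = QQ^*$ from an orthonormal basis $Q$ of $W$ and verify idempotence, symmetry, and the contraction and best-approximation properties.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 2))          # columns span a 2-dimensional subspace W of R^5
    Q, _ = np.linalg.qr(A)                   # orthonormal basis of W (reduced QR)
    P = Q @ Q.T                              # orthogonal projection onto W

    v = rng.standard_normal(5)
    assert np.allclose(P @ P, P)             # P^2 = P
    assert np.allclose(P, P.T)               # P is self-adjoint
    assert np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12        # contraction
    # best-approximation property: P v is at least as close to v as a basis column is
    assert np.linalg.norm(v - P @ v) <= np.linalg.norm(v - A[:, 0])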
4.2 The Gram-Schmidt Process and QR Factorization
We can ask whether, for any basis of $W$, there exists a procedure to turn it into an orthonormal basis. The Gram-Schmidt process does exactly this, and as a by-product it also gives us a very useful matrix factorization, the QR factorization:
Theorem 21 (Gram-Schmidt) If $\{u_i\}_{i=1}^{n}$ is a linearly independent set, there exists an orthonormal set $\{w_i\}_{i=1}^{n}$ such that $\mathrm{span}(\{w_i\}_{i=1}^{n}) = \mathrm{span}(\{u_i\}_{i=1}^{n})$. It can be constructed through the following process:
\[ v_1 = u_1, \qquad v_k = u_k - \sum_{j=1}^{k-1} \langle u_k, w_j \rangle\, w_j = P_{\mathrm{span}(\{w_i\}_{i=1}^{k-1})^{\perp}}(u_k) \quad (k = 2, \dots, n) \]
\[ w_1 = \frac{v_1}{\|v_1\|}, \qquad w_k = \frac{v_k}{\|v_k\|} \]
Furthermore, by completing $\{w_i\}_{i=1}^{n}$ to a full basis of $V$ (if $n \leq \dim V$), we can always obtain an orthonormal basis of $V$ following this process.
Theorem 22 (QR Factorization) Let $A$ be an $m \times n$ matrix of full column rank, with columns $\{u_k\}_{k=1}^{n}$. Then, by applying Gram-Schmidt to the columns of $A$ (augmenting them to obtain a full basis if $n < m$), we obtain the following:
\[ u_k = \|v_k\|\, w_k + \sum_{j=1}^{k-1} \langle u_k, w_j \rangle\, w_j \quad \forall k \]
If we write this in matrix form, where $Q$ is the matrix with columns $\{w_i\}$ (by definition, an orthogonal / unitary matrix) and $R$ is the upper triangular matrix with $R_{kk} = \|v_k\|$, $R_{jk} = \langle u_k, w_j \rangle$ for $j < k$, and zeros below the diagonal, we obtain $A = QR$. That is,
\[ A = \begin{pmatrix} Q_1 \mid Q_2 \end{pmatrix} \begin{pmatrix} R_1 \\ 0 \end{pmatrix} = Q_1 R_1, \]
where $Q_1$ has the same column space as $A$.
This factorization is very useful to solve linear systems of equations (there are efficient ways to compute QR, namely the Householder algorithm and other sparse or incomplete QR routines) because, once computed, the system $Ax = b$ is equivalent to solving:
\[ Rx = Q^* b \]
which can be rapidly solved through backward substitution (since $R$ is upper triangular). Also, the QR factorization is extensively used to obtain simpler formulas for certain matrix products that appear in applications such as OLS and smoothing splines.
A relevant result regarding this matrix factorization is that, although it is not unique in general, if we have $A = Q_1 R_1 = Q_2 R_2$, then it can be shown that $D = R_2 R_1^{-1}$ is a diagonal, unitary matrix.
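A minimal sketch of classical Gram-Schmidt producing the reduced factorization $A = Q_1 R_1$ (assuming NumPy; in practice one would rather call np.linalg.qr, which uses Householder reflections and is numerically more stable):

    import numpy as np

    def gram_schmidt_qr(A):
        """Classical Gram-Schmidt on the columns of A (assumed full column rank).
        Returns Q (orthonormal columns) and upper triangular R with A = Q @ R."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for k in range(n):
            v = A[:, k].copy()
            for j in range(k):
                R[j, k] = Q[:, j] @ A[:, k]   # <u_k, w_j>
                v -= R[j, k] * Q[:, j]
            R[k, k] = np.linalg.norm(v)       # ||v_k||
            Q[:, k] = v / R[k, k]
        return Q, R

    A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    Q, R = gram_schmidt_qr(A)
    assert np.allclose(Q @ R, A)
    assert np.allclose(Q.T @ Q, np.eye(2))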
4.3 Riesz Representation Theorem and the Adjoint Operator
For any linear operator $T \in \mathcal{L}(V, W)$, we can obtain a related operator $T^* \in \mathcal{L}(W, V)$ called the adjoint operator, which has very interesting properties. This operator becomes even more relevant in applications to vector spaces of infinite dimension. It is defined as follows:
Definition 23 Let $T \in \mathcal{L}(V, W)$. Then the adjoint operator $T^* \in \mathcal{L}(W, V)$ is defined by the following functional relation:
\[ \langle Tv, w \rangle_W = \langle v, T^* w \rangle_V \quad \forall v \in V,\ w \in W \]
If we choose orthonormal bases $B$ and $B'$ for $V$ and $W$, then the matrix that represents the adjoint is the conjugate transpose of the matrix that represents $T$. We get:
\[ \langle Ax, y \rangle_{\mathbb{R}^m} = \langle x, A^* y \rangle_{\mathbb{R}^n} \quad \forall x \in \mathbb{R}^n,\ y \in \mathbb{R}^m \]
where $A = [T]_{B',B}$ and $A^* = [T^*]_{B,B'} = ([T]_{B',B})^*$.
The existence and uniqueness of this operator is given by the Riesz Representation Theorem for Hilbert spaces:
Theorem 24 (Riesz Representation) Let $V$ be a Hilbert space over $F$, and $T \in \mathcal{L}(V, F)$ a continuous linear functional (an element of the topological dual). Then there exists a unique $v \in V$ such that:
\[ Tu = \langle u, v \rangle \quad \forall u \in V \]
Therefore, the adjoint operator is always well defined by the functional relation we have outlined, starting from the linear functional $L_w(v) = \langle Tv, w \rangle_W$ obtained by fixing each $w \in W$.
Remark 25 Here is a quick application of the adjoint operator and the orthogonal projection operator. Let $A \in M_{m \times n}(F)$, and let $Ax = b$ be a system of linear equations. Then the least squares solution to this system is given by the solution of:
\[ A x_0 = P_{\mathrm{Ran}(A)}(b) \]
since we are projecting $b$ onto the column space of $A$, and we know this is the best approximation we can have using linear combinations of the columns of $A$. Using properties of the projection operator, we now know that:
\[ \langle Ax,\ b - P_{\mathrm{Ran}(A)}(b) \rangle = 0 \quad \forall x \implies \langle Ax,\ b - A x_0 \rangle = 0 \quad \forall x \]
Now, using the adjoint of $A$, we find:
\[ \langle x,\ A^* b - A^* A x_0 \rangle = 0 \quad \forall x \]
So this means $A^* b = A^* A x_0$ (the normal equations), and therefore, if $A^* A$ is invertible,
\[ x_0 = (A^* A)^{-1} A^* b \]
Incidentally, this also tells us that the projection matrix onto the column space of $A$ is given by $P_{\mathrm{Ran}(A)} = A (A^* A)^{-1} A^*$.
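A minimal sketch (assuming NumPy; the data are random placeholders) comparing the normal-equations solution with np.linalg.lstsq, which solves the same problem more stably through orthogonal factorizations:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((8, 3))          # tall, full column rank
    b = rng.standard_normal(8)

    # Normal equations: x0 = (A^T A)^{-1} A^T b  (fine here; ill-advised when A is ill-conditioned)
    x0 = np.linalg.solve(A.T @ A, A.T @ b)
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(x0, x_ls)

    # Projection of b onto Ran(A), and orthogonality of the residual to the columns of A
    P = A @ np.linalg.solve(A.T @ A, A.T)    # P = A (A^T A)^{-1} A^T
    assert np.allclose(P @ b, A @ x0)
    assert np.allclose(A.T @ (b - A @ x0), 0)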
Properties of the Adjoint ($T, S \in \mathcal{L}(V, W)$):
1. $(T + S)^* = T^* + S^*$
2. $(cT)^* = \bar{c}\, T^*$
3. $(T^*)^* = T$
4. $I_V^* = I_V$ (the identity is self-adjoint)
5. $(ST)^* = T^* S^*$
6. If $B$ and $B'$ are orthonormal bases of $V$ and $W$, then $[T^*]_{B,B'} = ([T]_{B',B})^*$ (be careful: this formula is only guaranteed when the bases are orthonormal).
The most important property of the adjoint, however, provides us with an explicit relation between the kernels and the images of $T$ and $T^*$. These relations can be deduced directly from the definition, and provide us with comprehensive tools to study the spaces $V$ and $W$.
Theorem 26 ("Fundamental Theorem of Linear Algebra II") Let \, \
nite dimentional Hilbert spaces, T 1(\, \). Then:
1cr(T
+
) = 1a:(T)
J
1a:(T
+
) = 1cr(T)
J
Thus, we can always write \ = 1cr(T)1a:(T
+
) and \ = 1cr(T
+
)1a:(T).
Proof. (1cr(T
+
) = 1a:(T)
J
): Let 1cr(T
+
), and any Tn 1a:(T), then
1a:(T)
J
== Tn, = 0 \n == n, T
+
= 0 \n == 1cr(T
+
).
The proof of the second statement can be found replacing T and T
+
above.
A couple of results that follow from this one are:
1. $T$ is injective $\iff T^*$ is onto
2. $\mathrm{Ker}(T^* T) = \mathrm{Ker}(T)$, and thus $r(T^* T) = r(T) = r(T^*)$ (rank).
5 Normal and Self-Adjoint Operators: Spectral Theorems and Related Results

Depending on the field $F$ we are working with, we can obtain "field sensitive" theorems that characterize diagonalizable operators. In particular, we are interested in the cases where the field is either $\mathbb{R}$ or $\mathbb{C}$. This discussion will also yield important results on isometric, unitary and positive operators.
Definition 27 $T \in \mathcal{L}(V)$ is said to be a self-adjoint operator if $T = T^*$. If $F = \mathbb{R}$, this is equivalent to saying that $[T]_B$ is symmetric, and if $F = \mathbb{C}$, that $[T]_B$ is Hermitian (equal to its conjugate transpose), where $B$ is any orthonormal basis.
Definition 28 $T \in \mathcal{L}(V)$ is said to be normal if it commutes with its adjoint, that is, if $TT^* = T^* T$.
Remark 29 If $F = \mathbb{R}$, then an operator $T$ is normal $\iff$ there exists an orthonormal basis $B$ of $V$ such that $[T]_B$ is a block diagonal matrix, with blocks of size 1 and blocks of size 2 which are scalar multiples of rotation matrices.
First, we introduce a couple of interesting results on self-adjoint and normal
operators:
Proposition 30 Let $T \in \mathcal{L}(V)$ with $F = \mathbb{C}$. Then there exist unique self-adjoint operators $T_1$ and $T_2$ such that $T = T_1 + i T_2$. $T$ is then self-adjoint $\iff T_2 = 0$, and is normal $\iff T_1 T_2 = T_2 T_1$. These operators are given by:
\[ T_1 = \frac{T + T^*}{2}, \qquad T_2 = \frac{T - T^*}{2i} \]
Proposition 31 If $T \in \mathcal{L}(V)$ is normal, then $\mathrm{Ker}(T) = \mathrm{Ker}(T^*)$ and $\mathrm{Ran}(T) = \mathrm{Ran}(T^*)$.
The most important properties of these families of operators, however, have
to do with the spectral information we can retrieve:
Proposition 32 Let $T \in \mathcal{L}(V)$ be self-adjoint, with $F = \mathbb{C}$. If $\lambda$ is an eigenvalue of $T$, then $\lambda \in \mathbb{R}$.
Proof. For $u$ an eigenvector, we have:
\[ \lambda \langle u, u \rangle = \langle Tu, u \rangle = \langle u, Tu \rangle = \bar{\lambda} \langle u, u \rangle \implies \lambda = \bar{\lambda} \]
Proposition 33 Let $T \in \mathcal{L}(V)$ with $F = \mathbb{C}$. $T$ is self-adjoint $\iff \langle Tv, v \rangle \in \mathbb{R}$ for all $v \in V$.
Proof. ($\implies$) Using self-adjointness and the properties of the inner product:
\[ \langle Tv, v \rangle = \langle v, Tv \rangle = \overline{\langle Tv, v \rangle} \quad \forall v \in V \]
This in particular tells us that the Rayleigh quotient of such an operator is always real, and we can also rederive the last proposition from it.
Proposition 34 If $T \in \mathcal{L}(V)$ is a normal operator, then:
i) $\|Tv\| = \|T^* v\|$ for all $v \in V$
ii) $T - \mu I$ is normal for all $\mu \in \mathbb{C}$
iii) $v$ is an eigenvector of $T$ with eigenvalue $\lambda$ $\iff$ $v$ is an eigenvector of $T^*$ with eigenvalue $\bar{\lambda}$
Proof. (i): $\langle Tv, Tv \rangle = \langle v, T^* T v \rangle = \langle v, T T^* v \rangle = \langle T^* v, T^* v \rangle$
(ii): $(T - \mu I)^*(T - \mu I) = T^* T - \mu T^* - \bar{\mu} T + |\mu|^2 I = T T^* - \mu T^* - \bar{\mu} T + |\mu|^2 I = (T - \mu I)(T - \mu I)^*$
(iii): $(T - \lambda I)v = 0 \iff \|(T - \lambda I)^* v\| = 0$ (by i and ii) $\iff (T^* - \bar{\lambda} I)v = 0$
Theorem 35 (Spectral Theorem, $F = \mathbb{C}$ Version) Let $V$ be a finite dimensional Hilbert space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$. $V$ has an orthonormal basis of eigenvectors of $T$ $\iff$ $T$ is normal.
Proof. ($\Longleftarrow$) By Schur's factorization, there exists a basis of $V$ such that $[T]_B$ is upper triangular. By Gram-Schmidt, we can turn this basis into an orthonormal one $Q$, and by studying the QR factorization, we see the resulting matrix is still upper triangular. Since the basis is now orthonormal and $T$ is normal, $[T]_Q$ is a normal, upper triangular matrix. This necessarily implies $[T]_Q$ is diagonal (we can see this by comparing the norms of corresponding rows and columns of $[T]_Q$, and concluding the off-diagonal entries have to be zero in order for $[T]_Q [T]_Q^* = [T]_Q^* [T]_Q$ to hold).
($\Longrightarrow$) If this is the case, then we have an orthonormal basis $Q$ and a diagonal matrix $\Lambda$ such that $[T]_Q = \Lambda$. Since a diagonal matrix is always normal, it follows that $T$ is a normal operator.
Theorem 36 (Spectral Theorem, $F = \mathbb{R}$ Version) Let $V$ be a finite dimensional Hilbert space over $\mathbb{R}$ and $T \in \mathcal{L}(V)$. $V$ has an orthonormal basis of eigenvectors of $T$ $\iff$ $T$ is self-adjoint.
Proof. We follow the proof for the complex case, noting that, since $F = \mathbb{R}$, both Schur's factorization and Gram-Schmidt yield matrices with real entries. Finally, a diagonal matrix with real entries is always self-adjoint (since this only means it is symmetric). Alternatively, we can apply the theorem for the complex case and use the properties of self-adjoint operators.
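A quick numerical check of the real spectral theorem (assuming NumPy; the matrix is an arbitrary symmetric example): np.linalg.eigh returns an orthonormal eigenbasis, from which the spectral resolution $T = \sum_i \lambda_i P_i$ can be reassembled.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4))
    A = B + B.T                               # symmetric, hence self-adjoint / normal

    lams, Q = np.linalg.eigh(A)               # real eigenvalues, orthonormal eigenvectors
    assert np.allclose(Q.T @ Q, np.eye(4))    # orthonormal basis of eigenvectors

    # Spectral resolution: A = sum_i lambda_i * P_i, with P_i = q_i q_i^T rank-one projections
    A_rebuilt = sum(lam * np.outer(q, q) for lam, q in zip(lams, Q.T))
    assert np.allclose(A_rebuilt, A)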
In any case, we then have the following powerful properties:
1. $V = \bigoplus_{i=1}^{k} E_{\lambda_i}$ and $(E_{\lambda_i})^{\perp} = \bigoplus_{j \neq i} E_{\lambda_j}$ for all $i$
2. If we denote $P_i = P_{E_{\lambda_i}}$, then $P_i P_j = \delta_{ij} P_i$
3. (Spectral Resolution of the Identity)
\[ I_V = \sum_{i=1}^{k} P_i \]
4. (Spectral Resolution of $T$)
\[ T = \sum_{i=1}^{k} \lambda_i P_i \]
These properties characterize all unitarily diagonalizable operators (normal operators when $F = \mathbb{C}$, self-adjoint operators when $F = \mathbb{R}$) on finite dimensional Hilbert spaces. Some important results that follow from this are:
Theorem 37 (Cayley-Hamilton) Let $T \in \mathcal{L}(V)$, with $V$ a finite dimensional Hilbert space. If $p$ is the characteristic polynomial of $T$, then $p(T) = 0$.
Theorem 38 Let $V$ be a finite dimensional Hilbert space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$ normal. Then there exists a polynomial $p \in \mathbb{C}[x]$ such that $p(T) = T^*$. This polynomial can be found by solving the Lagrange interpolation problem $p(\lambda_i) = \bar{\lambda}_i$, $i = 1, \dots, k$.
We also have the following properties for $T$ normal, which we can now deduce using the spectral decomposition of $T$. These properties basically tell us that, if $T$ is normal, we can operate with it almost as if it were a number, through its spectral representation.
1. If $q$ is a polynomial, then $q(T) = \sum_{i=1}^{k} q(\lambda_i)\, P_i$
5.2 Positive Operators and Square Roots

Definition A self-adjoint operator $T \in \mathcal{L}(V)$ is called positive if $\langle Tx, x \rangle \geq 0$ for all $x \in V$.
Remark 45 If $F = \mathbb{C}$, we can remove the assumption that $T$ is self-adjoint (the condition $\langle Tx, x \rangle \geq 0$ for all $x$ then forces self-adjointness).
Remark 46 The operators $T^* T$ and $T T^*$ are always positive. In fact, it can be shown that any positive operator $T$ is of the form $S S^*$. This is a general version of the famous Cholesky factorization for symmetric positive definite matrices.
Proposition 47 $T$ is a positive operator $\iff$ $T$ is self-adjoint and all its eigenvalues are real and non-negative.
Some properties of positive operators:
1. If $T, U \in \mathcal{L}(V)$ are positive operators, then $T + U$ is positive
2. $T \in \mathcal{L}(V)$ is positive $\implies cT$ is positive for all $c \geq 0$
3. $T \in \mathcal{L}(V)$ is positive and invertible $\implies T^{-1}$ is positive
4. $T \in \mathcal{L}(V)$ is positive $\implies T^2$ is positive (the converse is false in general)
5. If $T, U \in \mathcal{L}(V)$ are positive operators, then $TU = UT$ implies $TU$ is positive. Here we use heavily that $TU = UT$ implies there is a basis of vectors which are simultaneously eigenvectors of $T$ and $U$.
Definition 48 Let $T \in \mathcal{L}(V)$. We say $S$ is a square root of $T$ if $S^2 = T$.
We note that, in general, the square root is not unique. For example, the identity has an infinite number of square roots: permutations, reflections and rotations by 180 degrees all square to $I$.
1
, .., n
n
as columns, and if is the matrix in '
nn
(1) with all zeros
16
except for
II
= o
I
for i _ r, then:
= l\
+
=
_
_
[ [ [
n
1
n
n
[ [ [
_
_
_
_
_
_
_
_
o
1
0 0
0
.
.
. 0 0
.
.
.
.
.
. o
:
0
0 0 0
n:
_
_
_
_
_
_
_
_
_
+
1
.
.
.
+
n
_
_
_
This factorization is known as the Singular Value Decomposition, or SVD fac-
torization of .
We know that, for the system of equations $Ax = b$, the best approximation is given by the solution of $A^* A x = A^* b$. By using the SVD, we can always compute the solution with minimum norm.
Given an SVD of $A$, $A = U \Sigma V^*$, we have the following:
\[ \|Ax - b\| = \|U \Sigma V^* x - b\| = \|\Sigma V^* x - U^* b\| \]
since $U$ is a unitary matrix. Therefore, all we need is to minimize $\|\Sigma y - c\|$, and then solve for $x = V y$, where $c = U^* b$. However, it is clear that:
\[ \|\Sigma y - c\|^2 = \sum_{i=1}^{r} |\sigma_i y_i - c_i|^2 + \sum_{i=r+1}^{m} |c_i|^2 \]
which is minimized precisely when $y_i = \frac{c_i}{\sigma_i}$ for $i \leq r$, and its minimum value is $\sum_{i=r+1}^{m} |c_i|^2$. If we want the $y$ with minimum norm, all we have to do is set the rest of its coordinates to zero.
Now, solving for $x$: if we define
\[ \Sigma^{\dagger} = \begin{pmatrix} 1/\sigma_1 & & & \\ & \ddots & & \\ & & 1/\sigma_r & \\ & & & 0 \end{pmatrix} \in M_{n \times m}(F), \]
then the solution to this problem is given by:
\[ x = V y = V \Sigma^{\dagger} c = (V \Sigma^{\dagger} U^*)\, b \]
From the properties of least squares and this last formula, we already know that the matrix $V \Sigma^{\dagger} U^*$ does the following:
1. If $b \in \mathrm{Ran}(A)$ (the system is consistent), then it gives us the solution to $Ax = b$ with minimum norm. For any $x \in \mathbb{R}^n$, we know we can write $x = P_{\mathrm{Ker}(A)}\, x + P_{\mathrm{Ker}(A)^{\perp}}\, x$. Since $A(P_{\mathrm{Ker}(A)}\, x) = 0$, $(V \Sigma^{\dagger} U^*)\, b$ is the unique solution lying in $\mathrm{Ker}(A)^{\perp}$.
2. If $b \notin \mathrm{Ran}(A)$ (the system is inconsistent), then it projects $b$ onto $\mathrm{Col}(A)$, and then gives us the unique solution to $Ax = P_{\mathrm{Ran}(A)}\, b$ in $\mathrm{Ker}(A)^{\perp}$.
3. We can also deduce this from the fact that, by the construction of the SVD, the Fundamental Theorem of Linear Algebra II, and $A^* = V \Sigma^* U^*$: $\{v_1, \dots, v_r\}$ is a basis for $\mathrm{Ker}(A)^{\perp}$, $\{v_{r+1}, \dots, v_n\}$ for $\mathrm{Ker}(A)$, $\{u_1, \dots, u_r\}$ for $\mathrm{Ran}(A)$, and $\{u_{r+1}, \dots, u_m\}$ for $\mathrm{Ran}(A)^{\perp}$.
6.2 The Moore-Penrose Generalized Inverse
Theorem 51 (Moore-Penrose Generalized Inverse) Let $V, W$ be finite dimensional Hilbert spaces over $F$ and $T \in \mathcal{L}(V, W)$ with rank $r$. There exists a unique linear operator $T^{\dagger} \in \mathcal{L}(W, V)$, which we call the Moore-Penrose generalized inverse (or pseudoinverse, for short), such that
\[ T^{\dagger}\big|_{\mathrm{Ran}(T)} = S^{-1}, \qquad T^{\dagger}\big|_{\mathrm{Ran}(T)^{\perp}} = 0, \]
where $S : \mathrm{Ker}(T)^{\perp} \to \mathrm{Ran}(T)$ is the restriction of $T$, which is an isomorphism. As an extension of this inverse, it has the following properties:
\[ T^{\dagger} T = P_{\mathrm{Ker}(T)^{\perp}}, \qquad T T^{\dagger} = P_{\mathrm{Ran}(T)} \]
Finally, if we have an SVD of $T$, the pseudoinverse $T^{\dagger}$ can be computed as:
\[ T^{\dagger} u_j = \frac{1}{\sigma_j} v_j \ \text{ for } j \leq r, \qquad T^{\dagger} u_j = 0 \ \text{ for } j > r \]
In matrix form, if $A = [T]$ and $A^{\dagger} = [T^{\dagger}]$ in the corresponding orthonormal bases, then:
\[ A^{\dagger} = V \Sigma^{\dagger} U^* \]
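A minimal numerical sketch (assuming NumPy; the matrix is a random rank-deficient example): build $A^{\dagger} = V \Sigma^{\dagger} U^*$ from the SVD, compare with np.linalg.pinv, and check the two projection identities.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4, rank 2

    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    r = np.sum(s > 1e-10)                       # numerical rank

    Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
    Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])
    A_pinv = Vh.T @ Sigma_pinv @ U.T            # V Sigma^dagger U^*

    assert np.allclose(A_pinv, np.linalg.pinv(A))
    # A^dagger A = projection onto Ker(A)^perp,  A A^dagger = projection onto Ran(A)
    P_row = A_pinv @ A
    P_col = A @ A_pinv
    assert np.allclose(P_row @ P_row, P_row) and np.allclose(P_row, P_row.T)
    assert np.allclose(P_col @ P_col, P_col) and np.allclose(P_col, P_col.T)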
The following properties can be obtained for the SVD and the pseudoinverse:
1. Let $A \in M_{m \times n}(\mathbb{C})$. Then $(A^{\dagger})^* = (A^*)^{\dagger}$ and $(A^{\dagger})^{\dagger} = A$.

6.3 The Polar Decomposition

Any square matrix $A$ can be written as $A = WP$, where $W$ is unitary and $P$ is positive semidefinite. Given an SVD $A = U \Sigma V^*$, we can take $W = U V^*$ and $P = V \Sigma V^*$.
Proof. $A = U \Sigma V^* = (U V^*)(V \Sigma V^*) = WP$. As a product of unitary matrices, $W$ is unitary, and the positivity of $P$ follows from the fact that $\Sigma$ is diagonal with non-negative entries.
Some useful results that follow from this decomposition are:
1. $A = WP$ is normal $\iff W P^2 = P^2 W$
2. Using the fact that a positive matrix has a unique positive square root, we can use the previous result to conclude that $A = WP$ is normal $\iff WP = PW$
3. If $A = WP$, then $\det(P) = |\det A|$ and $\det(W) = e^{i \arg(\det A)}$ (the latter when $A$ is invertible).
The polar decomposition, which can be extended to linear operators in infinite dimensions, basically tells us that we can view any linear operator as the composition of a partial isometry and a positive operator.
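A short sketch (assuming NumPy; A is a random square example) computing the polar factors $W = UV^*$ and $P = V \Sigma V^*$ from the SVD and checking their defining properties:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))

    U, s, Vh = np.linalg.svd(A)
    W = U @ Vh                         # unitary (orthogonal) factor
    P = Vh.T @ np.diag(s) @ Vh         # positive semidefinite factor

    assert np.allclose(W @ P, A)
    assert np.allclose(W.T @ W, np.eye(4))          # W unitary
    assert np.all(np.linalg.eigvalsh(P) >= -1e-12)  # P positive semidefinite
    assert np.isclose(np.linalg.det(P), abs(np.linalg.det(A)))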
7 Matrix Norms and Low Rank Approximation
Matrices are very versatile: they can be seen as rearranged vectors in $\mathbb{R}^{m \times n}$, we can identify a group of matrices with some mathematical object, or we can just take them as members of the vector space of linear transformations from $F^n$ to $F^m$. In any case, it is very useful to have a notion of what a matrix norm is.
7.1 The Frobenius Norm
If we consider matrices as members of $\mathbb{R}^{m \times n}$, it is then natural to endow them with the usual Euclidean norm and inner product:
\[ \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}, \qquad \langle A, B \rangle_F = \sum_{i,j} a_{ij} \overline{b_{ij}} \]
Or equivalently, we can write:
\[ \|A\|_F = \sqrt{\mathrm{tr}(A^* A)}, \qquad \langle A, B \rangle_F = \mathrm{tr}(B^* A) \]
In any case, we conclude that the space $(M_{m \times n}(F), \|\cdot\|_F)$ is a Hilbert space. This norm has the following properties:
1. $\|Ax\|_{\mathbb{R}^m} \leq \|A\|_F\, \|x\|_{\mathbb{R}^n}$ (Lipschitz condition). In particular, this condition tells us that any linear operator in $M_{m \times n}(F)$ is continuous.
2. For $A$ and $B$ such that $AB$ makes sense, $\|AB\|_F \leq \|A\|_F\, \|B\|_F$
3. Given an SVD $A = U \Sigma V^*$, $\|A\|_F^2 = \mathrm{tr}(A^* A) = \mathrm{tr}(\Sigma^* \Sigma) = \sum_{i=1}^{r} \sigma_i^2$
4. Given $A$ unitarily diagonalizable, we can reinterpret the spectral decomposition as follows: $A = Q \Lambda Q^* = \sum_{i=1}^{n} \lambda_i\, q_i q_i^* = \sum_{i=1}^{n} \lambda_i Z_i$, where $\{Z_i\}_{i=1}^{n}$ is an orthonormal set in $(M_{n \times n}(F), \|\cdot\|_F)$. Also, given an SVD of $A \in M_{m \times n}(F)$, $A = U \Sigma V^* = \sum_{i=1}^{r} \sigma_i\, (u_i v_i^*) = \sum_{i=1}^{r} \sigma_i Z_i$.
5. (Pythagoras' Theorem) $\mathrm{Ran}(A) \perp \mathrm{Ran}(B) \implies A \perp B$ in the Frobenius inner product, and $\|A + B\|_F^2 = \|A\|_F^2 + \|B\|_F^2$ (not true for general matrix norms)
6. (Pseudoinverse, revisited) $\|\Sigma^{\dagger}\|_2 = \frac{1}{\sigma_r}$. In general, we have $\|A^{\dagger}\|_2 = \frac{1}{\sigma_r}$.
7.2 Operator Norms

We also have formulas for the operator norms induced by the 1 and $\infty$ vector norms:
\[ \|A\|_1 = \max_{j = 1, \dots, n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(maximum } \|\cdot\|_1 \text{ norm of the columns)} \]
\[ \|A\|_{\infty} = \max_{i = 1, \dots, m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(maximum } \|\cdot\|_1 \text{ norm of the rows)} \]
We observe that $\|A\|_1 = \|A^*\|_{\infty}$.
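A small check of these formulas (assuming NumPy; np.linalg.norm with ord=1 and ord=inf computes exactly these induced norms, ord='fro' the Frobenius norm, and ord=2 the largest singular value):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 3))

    col_sums = np.abs(A).sum(axis=0)   # ||.||_1 of each column
    row_sums = np.abs(A).sum(axis=1)   # ||.||_1 of each row

    assert np.isclose(np.linalg.norm(A, 1), col_sums.max())
    assert np.isclose(np.linalg.norm(A, np.inf), row_sums.max())
    assert np.isclose(np.linalg.norm(A, 1), np.linalg.norm(A.T, np.inf))   # ||A||_1 = ||A*||_inf
    assert np.isclose(np.linalg.norm(A, 'fro'),
                      np.sqrt(np.sum(np.linalg.svd(A, compute_uv=False) ** 2)))
    assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False).max())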
7.3 Low Rank Matrix Approximation:
We have now seen that the SVD provides us with tools to compute matrix norms and to derive results about the vector space of matrices of a given size. In a way that is completely analogous to the theory of general Hilbert spaces, this leads us to a theory of matrix approximation: by eliminating the singular values that are not significant, we produce an approximation of $A$ that has lower rank. This can immediately be seen as truncating a "Fourier series" of $A$, using the orthonormal basis suggested by the SVD.
Let $A = U \Sigma V^* = \sum_{i=1}^{r} \sigma_i\, (u_i v_i^*) = \sum_{i=1}^{r} \sigma_i Z_i$, where the $Z_i = u_i v_i^*$ form an orthonormal set in $M_{m \times n}(F)$ (with respect to the Frobenius inner product), as before. Then it becomes evident that:
\[ \sigma_i = \langle A, Z_i \rangle_F \]
That is, the $\sigma_i$ (and all of the entries of $\Sigma$) are the Fourier coefficients of $A$ with respect to this particular orthonormal basis. We notice that, since the $Z_i$ are outer products of two vectors, $\mathrm{rank}(Z_i) = 1$ for all $i$.
Now, it is often the case that $A$ will be "noisy", either because it represents the pixels of a blurred image, or because it is a transformation that involves some noise. However, as in other instances of filtering or approximation schemes, we expect the noise to be of "high frequency", or equivalently, we know that the signal-to-noise ratio decreases in proportion to $\sigma_i$. Therefore, by truncating the series after a certain $\sigma_k$, the action of $A$ remains almost intact, but we often get rid of a significant amount of "noise". Also, using results on abstract Fourier series, we can derive the fact that this truncation is the best approximation to $A$ of rank $k$ in the Frobenius norm, that is:
1. If $A_k = \sum_{i=1}^{k} \sigma_i Z_i$, then
\[ \|A - A_k\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 = \min_{\mathrm{rank}(B) = k} \|A - B\|_F^2, \]
and the relative error of the approximation is $\left( \sum_{i=k+1}^{r} \sigma_i^2 \right) \big/ \left( \sum_{i=1}^{r} \sigma_i^2 \right)$.
2. The matrix $A_k$ is the result of $k$ successive best rank-one approximations to $A$.
3. $A_k$ is also an optimal rank-$k$ approximation under the $\|\cdot\|_2$ norm, with minimum value $\sigma_{k+1}$.
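A minimal sketch of truncated-SVD approximation (assuming NumPy; the matrix is a random example), checking the error formulas stated above (often called the Eckart-Young theorem) in both the Frobenius and the 2-norm:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((6, 5))
    U, s, Vh = np.linalg.svd(A, full_matrices=False)

    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]     # best rank-k approximation

    assert np.linalg.matrix_rank(A_k) == k
    assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
    assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
    rel_err = np.sum(s[k:] ** 2) / np.sum(s ** 2)    # relative Frobenius error
    print(f"rank-{k} relative error: {rel_err:.3f}")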
8 Generalized Eigenvalues, the Jordan Canonical Form and $e^{A}$

8.1 The Generalized Eigenspace $E_{\lambda}^{\infty}$

Let $V$ be a finite dimensional vector space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$ with eigenvalues $\{\lambda_i\}_{i=1}^{k}$. It is either the case that the algebraic and the geometric multiplicity of each $\lambda_i$ coincide ($T$ is diagonalizable), or that, for some $\lambda$, $gm(\lambda) < am(\lambda)$. In the latter case, the problem is that the eigenspaces fail to span the entire space. We can then consider powers of the operator $(T - \lambda I)$, and since $\mathrm{Ker}((T - \lambda I)^m) \subseteq \mathrm{Ker}((T - \lambda I)^{m+1})$ for all $m$ (the space "grows"), we can define the following:
\[ E_{\lambda}^{\infty} = \{ v \in V : (T - \lambda I)^m v = 0 \text{ for some } m \in \mathbb{N} \} \]
These generalized eigenspaces have the following properties:
1. $E_{\lambda} \subseteq E_{\lambda}^{\infty}$ for every eigenvalue $\lambda$ of $T$ (by definition)
2. $E_{\lambda}^{\infty} \subseteq V$ is a subspace and is invariant under the action of $T$: $T(E_{\lambda}^{\infty}) \subseteq E_{\lambda}^{\infty}$
3. If $\dim(V) < \infty$, then $\dim(E_{\lambda}^{\infty}) = am(\lambda)$
4. $E_{\lambda_1}^{\infty} \cap E_{\lambda_2}^{\infty} = \{0\}$ for $\lambda_1 \neq \lambda_2$
Theorem 54 (Generalized Eigenvector Decomposition) Let $V$ be a vector space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$. If $\{\lambda_i\}_{i=1}^{k}$ are the eigenvalues of $T$ and the characteristic polynomial $p(\lambda)$ splits in the field $F$, then:
\[ V = \bigoplus_{i=1}^{k} E_{\lambda_i}^{\infty} \]
Proof.
Theorem 55 (Jordan Canonical Form Theorem) Under the conditions of the generalized eigenvector decomposition, there exists a basis $B$ such that $[T]_B$ is block diagonal, and its blocks are Jordan canonical forms, that is:
\[ [T]_B = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_k \end{pmatrix} \]
where each $J_i$ is a Jordan canonical form. A Jordan canonical form is in turn also a block diagonal matrix, composed of Jordan blocks, which are matrices of the form:
\[ J^{(\ell)}_{\lambda_i} = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & 0 & & \lambda_i \end{pmatrix} \]
The number of blocks in $J_i$ coincides with the geometric multiplicity of $\lambda_i$ ($\dim(E_{\lambda_i})$). Also, the maximum size of these blocks is the first $m$ for which $\mathrm{Ker}((T - \lambda_i I)^m) = \mathrm{Ker}((T - \lambda_i I)^{m+1}) = E_{\lambda_i}^{\infty}$.
Proof.
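The Jordan form is not something one computes in floating point (it is discontinuous in the matrix entries), but for small exact matrices SymPy can produce it. A minimal sketch (the matrix is an arbitrary non-diagonalizable example of our choosing, and Matrix.jordan_form from SymPy is used):

    from sympy import Matrix, eye

    # A 3x3 matrix with eigenvalue 2 of algebraic multiplicity 3 and geometric multiplicity 2,
    # so its Jordan form has two blocks for lambda = 2 (sizes 2 and 1).
    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 2]])

    P, J = A.jordan_form()          # A = P * J * P**-1, J block diagonal with Jordan blocks
    assert A == P * J * P.inv()
    print(J)
    # Maximum block size = first m with Ker((A - 2I)^m) stabilizing; here (A - 2I)^2 = 0
    assert (A - 2 * eye(3)) ** 2 == Matrix.zeros(3, 3)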
8.2 A Method to Compute the Jordan Form: The Points Diagram
8.3 Applications: Matrix Powers and Power Series
9 Nilpotent Operators
10 Other Important Matrix Factorizations
1. LDU Factorization
2. Cholesky
11 Other Topics (which appear in past exams)
1. Limits with Matrices
2. Symplectic Matrices
3. Perron Frobenius and the Theory of Matrices with Positive Entries.
4. Markov Chains
5. Graph Adjacency Matrices and the Graph Laplacian. Dijkstra and Floyd-Warshall.
6. Matrices of rank $k$ and the Sherman-Morrison-Woodbury formula (for the inverse of rank-$k$ updates of a matrix)
12 Yet more Topics I can think of
1. Symmetric Positive Semidefinite Matrices and the Variance-Covariance Matrix
2. Krylov Subspaces: CG, GMRES and Lanczos Algorithms
3. Toeplitz and Wavelet Matrices
4. Polynomial Interpolation