
Lecture Notes on

Undergraduate Math
Kevin Zhou
kzhou7@gmail.com

These notes are a review of the basic undergraduate math curriculum, focusing on the content most
relevant for physics. The primary sources were:

• Oxford’s Mathematics lecture notes, particularly notes on M2 Analysis, M1 Groups, A2 Metric


Spaces, A3 Rings and Modules, A5 Topology, and ASO Groups. The notes by Richard Earl
are particularly clear and written in a modular form.

• Rudin, Principles of Mathematical Analysis. The canonical introduction to real analysis; terse
but complete. Presents many results in the general setting of metric spaces rather than R.

• Ablowitz and Fokas, Complex Variables. Quickly covers the core material of complex analysis,
then introduces many practical tools; indispensable for an applied mathematician.

• Artin, Algebra. A good general algebra textbook that interweaves linear algebra and focuses
on nontrivial, concrete examples such as crystallography and quadratic number fields.

• David Skinner’s lecture notes on Methods. Provides a general undergraduate introduction to


mathematical methods in physics, a bit more careful with mathematical details than typical.

• Munkres, Topology. A clear, if somewhat dry introduction to point-set topology. Also includes
a bit of algebraic topology, focusing on the fundamental group.

• Renteln, Manifolds, Tensors, and Forms. A textbook on differential geometry and algebraic
topology for physicists. Very clean and terse, with many good exercises.

Some sections are quite brief, and are intended as a telegraphic review of results rather than a full
exposition. The most recent version is here; please report any errors found to kzhou7@gmail.com.
Contents

1 Metric Spaces
   1.1 Definitions
   1.2 Compactness
   1.3 Sequences
   1.4 Series

2 Real Analysis
   2.1 Continuity
   2.2 Differentiation
   2.3 Integration
   2.4 Properties of the Integral
   2.5 Uniform Convergence

3 Complex Analysis
   3.1 Analytic Functions
   3.2 Multivalued Functions
   3.3 Contour Integration
   3.4 Laurent Series
   3.5 Application to Real Integrals
   3.6 Conformal Transformations
   3.7 Additional Topics

4 Linear Algebra
   4.1 Exact Sequences
   4.2 The Dual Space
   4.3 Determinants
   4.4 Endomorphisms

5 Groups
   5.1 Fundamentals
   5.2 Group Homomorphisms
   5.3 Group Actions
   5.4 Composition Series
   5.5 Semidirect Products

6 Rings
   6.1 Fundamentals
   6.2 Quotient Rings and Field Extensions
   6.3 Factorization
   6.4 Modules
   6.5 The Structure Theorem

7 Point-Set Topology
   7.1 Definitions
   7.2 Closed Sets and Limit Points
   7.3 Continuous Functions
   7.4 The Product Topology
   7.5 The Metric Topology

8 Algebraic Topology
   8.1 Constructing Spaces
   8.2 The Fundamental Group
   8.3 Group Presentations
   8.4 Covering Spaces

9 Methods for ODEs
   9.1 Differential Equations
   9.2 Eigenfunction Methods
   9.3 Distributions
   9.4 Green’s Functions
   9.5 Variational Principles

10 Methods for PDEs
   10.1 Separation of Variables
   10.2 The Fourier Transform
   10.3 The Method of Characteristics
   10.4 Green’s Functions for PDEs

11 Approximation Methods
   11.1 Asymptotic Series
   11.2 Asymptotic Evaluation of Integrals
   11.3 Matched Asymptotics
   11.4 Multiple Scales
   11.5 WKB Theory

1 Metric Spaces
1.1 Definitions
We begin with some basic definitions. Throughout, we let E be a subset of a fixed set X.

• A set X is a metric space if it has a distance function d(p, q) which is positive definite (except
for d(p, p) = 0), symmetric, and satisfies the triangle inequality.

• A neighborhood of p is the set Nr (p) of all q with d(p, q) < r for some radius r > 0.
Others define a neighborhood as any set that contains one of these neighborhoods, which are
instead called “the open ball of radius r about p”. This is equivalent for proofs; the important
part is that neighborhoods always contain points “arbitrarily close” to p.

• A point p is a limit point of E if every neighborhood of p contains a point q ≠ p in E. If p is
not a limit point but is in E, then p is an isolated point.

• E is closed if every limit point of E is in E. Intuitively, this means E “contains all its edges”.
The closure E̅ of E is the union of E and the set of its limit points.

• A point p is an interior point of E if there is a neighborhood N of p such that N ⊂ E. Note
that interior points must be in E itself, while limit points need not be.

• E is open if every point of E is an interior point of E. Intuitively, E “doesn’t have edges”.

• E is bounded if there exist M and q so that d(p, q) < M for all p ∈ E.

• E is dense in X if every point of X is a limit point of E or a point of E, or both.

• The interior E° of E is the set of all interior points of E, or equivalently the union of all open
sets contained in E.

Example. We give some simple examples in R with the usual metric.

• Finite subsets of R cannot have any limit points or interior points, so they are trivially closed
and not open.

• The set (0, 1]. The limit points are [0, 1], so the set is not closed. The interior points are (0, 1),
so the set is not open.

• The set of points 1/n for positive integers n. The single limit point is 0, so the set is not closed.

• All points. This set is trivially open and closed.

• The interval [1, 2] in the restricted space [1, 2] ∪ [3, 4]. This is both open and closed. Generally,
this happens when a set contains “all of a connected component”.

As seen from the last example above, whether a set is closed or open depends on the space, so if we
wanted to be precise, we would say “closed in X” rather than just “closed”.
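Note. The definition of a limit point is easy to check mechanically. As a quick numerical illustration (a Python sketch of our own, not part of the development), 0 is a limit point of the set {1/n} above, since every neighborhood of 0 contains a point of the set:

```python
# 0 is a limit point of {1/n : n a positive integer}: every neighborhood
# N_r(0), no matter how small the radius r, contains a point of the set other than 0.
for r in [0.1, 0.01, 0.001, 1e-6]:
    n = int(1 / r) + 2  # chosen large enough that 0 < 1/n < r
    assert 0 < 1 / n < r
```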

Example. There are many examples of metrics besides the usual one.

• For any set S, we may define the discrete metric

d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.

Note that in this case, the closed ball of radius 1 about p is not the closure of the open ball of
radius 1 about p.

• A metric on a vector space can be defined from a norm, which can in turn be defined from
an inner product. (However, not every norm comes from an inner product.) For example,
for continuous functions f : [a, b] → R we have the inner product

⟨f, g⟩ = ∫_a^b f (t)g(t) dt

which gives the norm ‖f‖ = √⟨f, f⟩ and the metric

d2 (f, g) = ‖f − g‖ = ( ∫_a^b (f (t) − g(t))^2 dt )^{1/2} .

• Alternatively, we could use the metric

d∞ (f, g) = sup_{x ∈ [a,b]} |f (x) − g(x)|.

These are both special cases of the family of L^p metrics.
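Note. These metrics are easy to experiment with numerically. The following Python sketch (our own illustration; the names d2 and d_inf and the sample functions are ad hoc) approximates both metrics for functions on [0, 1] and checks the triangle inequality:

```python
import math

def d2(f, g, a=0.0, b=1.0, n=10_000):
    # L2 metric: square root of the integral of (f - g)^2, via a midpoint Riemann sum
    h = (b - a) / n
    total = sum((f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h)) ** 2 for i in range(n))
    return math.sqrt(total * h)

def d_inf(f, g, a=0.0, b=1.0, n=10_000):
    # sup metric, approximated by maximizing over a grid
    return max(abs(f(a + i * (b - a) / n) - g(a + i * (b - a) / n)) for i in range(n + 1))

f = lambda x: x
g = lambda x: x * x
zero = lambda x: 0.0

# On [0, 1]: d2(f, g) = sqrt(1/30), and d_inf(f, g) = 1/4, attained at x = 1/2
assert abs(d2(f, g) - math.sqrt(1 / 30)) < 1e-6
assert abs(d_inf(f, g) - 0.25) < 1e-6

# The triangle inequality holds in both metrics (small slack for float roundoff)
assert d2(f, zero) <= d2(f, g) + d2(g, zero) + 1e-9
assert d_inf(f, zero) <= d_inf(f, g) + d_inf(g, zero) + 1e-9
```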

We now consider some fundamental properties of open and closed sets.

• E is open if and only if its complement E^c is closed.
Heuristically, this proof works because open and closed are ‘for all’ and ‘there exists’ properties,
and taking the complement swaps them. Specifically, if q is an interior point of E, then E
contains all points sufficiently close to q. But if q is a limit point of E^c , there exist points
arbitrarily close to q that are in E^c . Only one of these can be true, giving the result.

• Arbitrary unions of open sets are open, because interior points stay interior points when we
add more points. By taking the complement, arbitrary intersections of closed sets are closed.

• Finite intersections of open sets are open, because we can take intersections of the relevant
neighborhoods. This breaks down for infinite intersections because the neighborhoods can
shrink down to nothing (e.g. let En = (−1/n, 1/n)). By taking the complement, finite unions
of closed sets are closed. Infinite unions don’t work because they can create new limit points.

Prop. The closure E̅ is the smallest closed set containing E.

Proof. The idea is that all limit points of E̅ must already be limit points of E. Formally, let p
be a limit point of E̅. Then any neighborhood N containing p contains some q ∈ E̅. Since
neighborhoods are open, N must contain a neighborhood N′ of q, which then must contain some
element of E. Thus p is a limit point of E.
To see that E̅ is the smallest possibility, note that adding more points never subtracts limit
points. Therefore any closed F ⊃ E must contain all the limit points of E.

Prop. For Y ⊂ X, the open sets E of Y are precisely Y ∩ G for open sets G of X.

Proof. If G is open in X, then Y ∩ G is open in Y , since neighborhoods in Y are just neighborhoods
in X intersected with Y . Now consider the converse. Starting with E open in Y , for each p ∈ E
pick a radius r_p so that the neighborhood of radius r_p about p in Y lies in E, and let G be the
union of the corresponding neighborhoods in X. Then G is an open set of X because it is the
union of open sets, and E = Y ∩ G by construction.

Note. Topological spaces further abstract by throwing away the metric but retaining the structure
of the open sets. A topological space is a set X along with a set T of subsets of X, called the open
sets of X, such that T is closed under all unions and finite intersections, and contains both X itself
and the null set. The closed sets are defined as the complements of open sets. The rest of our
definitions hold as before, if we think of a neighborhood of a point x as any open set containing x.
For a subspace Y ⊂ X, we use the above proposition in reverse, defining the open sets in Y by
those in X. The resulting topology is called the subspace topology.

Note. An isometry between two metric spaces X and Y is a bijection that preserves the metric.
However, topological properties only depend on the open set structure, so we define a homeomor-
phism to be a bijection that is continuous with a continuous inverse; this ensures that it induces
a bijection between the topologies of X and Y . As we’ll see below, many important properties
such as continuity depend only on the topology, so we are motivated to find topological invariants,
properties preserved by homeomorphisms, to classify spaces.

1.2 Compactness
Compactness is a property that generalizes “finiteness” or “smallness”. Though its definition is
somewhat unintuitive, it turns out to be quite useful.

• An open cover of a set E in a metric space X is a set of open sets Gi of X so that their union
contains E. For example, one open cover of E could be the set of all neighborhoods of radius r
of every point in E.

• K is compact if every open cover of K contains a finite subcover. For example, all finite sets
are compact. Since we only made reference to the open sets, not the metric, compactness is a
topological invariant.

• Let K ⊂ Y ⊂ X. Then K is compact in X iff it is compact in Y , so we can refer to compactness
as an absolute property, independent of the containing space.
Proof: essentially, this is because we can transfer open covers of K in Y and of K in X back
and forth, using the previous proposition. Thus if we can pick a finite subcover in one, we can
pick the analogous subcover in the other.

• All compact sets are closed. Intuitively, consider the interval (0, 1/2) in R. Then the open cover
(1/n, 1) has no finite subcover; we can get ‘closer and closer’ to the open boundary.
Proof: let K ⊂ X be compact; we will show K c is open. Fixing p ∈ K c , define the open cover
consisting of the balls with radius d(p, q)/2 for all q ∈ K. Consider a finite subcover and let
dmin be the minimum radius of any ball in it. Then there is a neighborhood of radius dmin /2 of
p containing no points of K.

• All compact subsets of a metric space are bounded. This follows by taking an open cover
consisting of larger and larger balls.

• Closed subsets of compact sets are compact.


Proof: let F ⊂ K ⊂ X with F closed and K compact. Take an open cover of F , and add F c
to get an open cover of K. Then a finite subcover of K yields a finite subcover of F .

• Intersections of compact sets are compact. This follows from the previous two results.

Note. The overall intuition found above is that compactness is a notion of ‘smallness’. An open
boundary is not ‘small’ because it is essentially the same as a boundary at infinity, from the
standpoint of open covers. We see that compactness is useful for proofs because the finiteness of a
subcover allows us to take least or greatest elements; we show some more examples of this below.

Example. Let K be a compact metric space. Then for any ε > 0, there exists an N so that every
set of N distinct points in K includes at least two points with distance less than ε between them.
To show this, consider the open cover consisting of all neighborhoods of radius ε/2. Then there’s
a finite subcover, with M elements, centered at points pi . For N > M , we are done by the
pigeonhole principle.

Example. Let K be a compact metric space. Then K has a subset that is dense and at most
countable. To prove this, consider the open cover of all neighborhoods of radius 1. Take a finite
subcover centered at a set of points P1 . Then every point in K is within a distance of 1 from some
point of P1 . Next construct P2 using radius 1/2, and so on. Then P = ∪n Pn is dense and at most
countable.

Lemma. All k-cells in Rk are compact.

Proof. This is the key lemma that uses special properties of R. For simplicity, we consider the
case k = 1, showing that all intervals [a, b] are compact. Let U be an open cover of [a, b] and define

W = {x ∈ [a, b] : finite subcover of U exists for [a, x]}, c = sup(W ).

First we show c ∈ W . Let U be a member of the cover containing c. Since U is open, it includes
(c − δ, c + δ) for some δ > 0. On the other hand, by the definition of the supremum there must be
some element w ∈ W inside this range. Then we have a finite subcover of [a, c] by taking U along
with the finite subcover for [a, w].
Next, by a similar argument, if x ∈ W and x < b, there must be δ > 0 so that x + δ ∈ W . Hence
we have a contradiction unless c = b, giving the result. The generalization to arbitrary k is similar.
Note that we used the least upper bound property of R by assuming c ∈ R.

Theorem (Heine-Borel). For any E ⊂ Rk , E is closed and bounded if and only if it is compact.

Proof. We have already shown the reverse direction above. For the forward direction, note that if
E is bounded it is a subset of a k-cell, and closed subsets of compact spaces are compact.

1.3 Sequences
We begin by defining convergence of a sequence.

• A sequence (pn ) in a metric space X converges to a point p ∈ X if, for every ε > 0, there is an
integer N so that if n ≥ N , then d(pn , p) < ε. This may also be written

lim_{n→∞} pn = p.

If (pn ) doesn’t converge, it diverges. Note that convergence depends on X. If X is “missing”
the right point, then an otherwise convergent sequence may diverge.

• More generally, in the context of a topological space, a sequence (pn ) converges to p ∈ X iff
every neighborhood of p contains all but finitely many of the pn .

• Sequences can only converge to one point; this is proven by considering neighborhoods of radius
ε/2 and using the triangle inequality.

• If a sequence converges, it must be bounded. This is because only finitely many points lie
outside any given neighborhood of the limit point p, and finite sets are bounded.

• If E ⊂ X and p is a limit point of E, then there is a sequence (pn ) in E that converges to p.
Conversely, a convergent sequence with range in E converges to a point in the closure E̅.

• A space is sequentially compact if every sequence has a convergent subsequence; for metric
spaces, compactness implies sequential compactness.
To see this, let (xk ) be a sequence in the compact metric space X and let

Tn = {xk : k > n}, Un = X \ T̅n

where the bar denotes closure. If (xk ) had no convergent subsequence, then no point would lie
in every T̅n , so the intersection of the T̅n would be empty. Then the Un would be an open cover
with no finite subcover, contradicting compactness.

• It can be shown that sequential compactness in a metric space implies compactness, though
this does not hold for a general topological space.

Example. Consider the set of bounded real sequences ℓ∞ with the sup metric. The unit cube

C = {(xk ) : |xk | ≤ 1}

is closed and bounded, but it is not compact, because the sequence

e1 = (1, 0, . . .), e2 = (0, 1, 0, . . .), e3 = (0, 0, 1, 0, . . .)

has no convergent subsequence.

Next, we specialize to Euclidean space, recovering some familiar results.

• Bounded monotonic sequences of real numbers converge to their least upper/greatest lower
bounds, essentially by definition.

• If (sn ) converges to s and (tn ) converges to t, then

(sn + tn ) → s + t, (csn ) → cs, (c + sn ) → c + s, (sn tn ) → st, (1/sn ) → 1/s (if s ≠ 0 and sn ≠ 0).

The proofs are easy except for the last two, where we must work to bound the error. For the
fourth, we can factor

sn tn − st = (sn − s)(tn − t) + s(tn − t) + t(sn − s).

To get an O(ε) error on the left, we can use a √ε error for the first term.

• If sn ≤ tn for all n, then s ≤ t. To prove it, consider (tn − sn ). The range is bounded below by
0, so the closure of the range can’t contain any negative numbers.

• All of the above works similarly for vectors in Rk , and limits can be taken componentwise;
the proof is to just use ε/√k for each component. In particular, xn · yn → x · y, since the dot
product is built from componentwise products and sums.

• Since compactness implies sequential compactness, we have the Bolzano-Weierstrass theorem,
which states that every bounded sequence in Rk has a convergent subsequence.

Next, we introduce Cauchy sequences. They are useful because they allow us to say some things
about convergence without specifying a limit.

• A sequence (pn ) in a metric space X is Cauchy if, for every ε > 0 there is an integer N such
that d(pn , pm ) < ε if n, m ≥ N . Note that unlike regular convergence, this definition depends
on the metric structure.

• The diameter of E is the supremum of the set of distances d(p, q) with p, q ∈ E. Then (pn ) is
Cauchy iff the diameters dn of the tails pn , pn+1 , . . . tend to zero.

• All convergent sequences are Cauchy, because if all later points get within ε/2 of the limit, then
they are within ε of each other.

• A metric space in which every Cauchy sequence converges is complete; intuitively, these spaces
have no ‘missing limit points’. Moreover, every closed subset E of a complete metric space X
is complete, since Cauchy sequences in E are also Cauchy sequences in X.

• Compact metric spaces are complete. This is because compactness implies sequential compact-
ness, and a convergent subsequence of a Cauchy sequence is sufficient to guarantee the Cauchy
sequence converges.

• The space Rk is complete, because all Cauchy sequences are bounded, and hence inside a k-cell.
Since k-cells are compact, we can apply the previous fact. The completeness of R is one of its
most important properties, and it is what suits it for doing calculus better than Q.

• Completeness is not a topological invariant, because (0, 1) is not complete while R is; it depends
on the details of the metric. However, the property of “complete metrizability” is a topological
invariant; a topological space is completely metrizable if there exists a metric which yields the
topology and under which the space is complete.
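Note. The incompleteness of Q can be made concrete (a Python sketch of our own, not part of the development): the Newton iteration for √2 produces a Cauchy sequence of rationals whose limit is irrational, so the sequence converges in R but has no limit in Q.

```python
from fractions import Fraction

# Newton iteration x -> (x + 2/x)/2 starting from a rational stays rational,
# and produces a Cauchy sequence whose limit in R is sqrt(2).
x = Fraction(2)
seq = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    seq.append(x)

# Consecutive gaps shrink rapidly, so the sequence is Cauchy in Q ...
gaps = [abs(float(seq[i + 1] - seq[i])) for i in range(len(seq) - 1)]
assert all(later < earlier for earlier, later in zip(gaps, gaps[1:]))

# ... but the would-be limit satisfies x^2 = 2, which no rational does,
# so the sequence only converges in the completion R.
assert abs(float(seq[-1]) ** 2 - 2) < 1e-12
```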

Finally, we introduce some convenient notation for limits.



• For a real sequence, we write sn → ∞ if, for every real M , there is an integer N so that sn ≥ M
for all n ≥ N . We’ll now count ±∞ as a possible subsequential limit.

• Denote E as the set of subsequential limits of a real sequence (sn ), and write

s* = lim sup_{n→∞} sn = sup E, s_* = lim inf_{n→∞} sn = inf E.

It can be shown that E is closed, so it contains s* and s_*. The sequence converges iff s* = s_*.

Example. For a sequence containing all rationals in arbitrary order, every real number is a sub-
sequential limit, so s* = ∞ and s_* = −∞. For the sequence ak = (−1)^k (k + 1)/k, we have s* = 1
and s_* = −1.
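Note. Numerically, lim sup and lim inf can be estimated as the sup and inf of a late tail of the sequence. A quick Python check (our own illustration) for the second sequence above:

```python
# a_k = (-1)^k (k+1)/k clusters at +1 (even k) and -1 (odd k)
a = [(-1) ** k * (k + 1) / k for k in range(1, 10_001)]

# approximate lim sup / lim inf by the sup / inf over a late tail
tail = a[5000:]
assert abs(max(tail) - 1) < 1e-3   # s* = lim sup = 1
assert abs(min(tail) + 1) < 1e-3   # s_* = lim inf = -1
```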

The notation we’ve defined above will be useful for analyzing series. For example, a series might
contain several geometric subsequences; for convergence, we care about the one with the largest
ratio, which can be extracted with lim sup.

1.4 Series
Given a sequence (an ), we say the sum of the series ∑n an , if it exists, is

lim_{n→∞} sn , sn = a1 + . . . + an .

That is, the sum of a series is the limit of its partial sums. We quickly review convergence tests.

• Cauchy convergence test: since R is complete, we can replace convergence with the Cauchy
property. Then ∑n an converges iff, for every ε > 0, there is an integer N so that for all
m ≥ n ≥ N , |an + . . . + am | ≤ ε.

• Limit test: taking m = n, the above becomes |an | ≤ ε, which means that if ∑n an converges,
then an → 0. This is a much weaker version of the above.

• A monotonic sequence converges iff it’s bounded. Then a series of nonnegative terms converges
iff the partial sums form a bounded sequence.
• Comparison test: if |an | < cn for n > N0 for a fixed N0 , and ∑n cn converges, then ∑n an
converges. We prove this by plugging directly into the Cauchy criterion.

• Divergence test: taking the contrapositive, we can prove a series diverges if we can bound it
from below by a divergent series.

• Geometric series: for 0 ≤ x < 1, ∑n x^n = 1/(1 − x), so the series an = x^n converges. To prove
this, write the partial sums using the geometric series formula, then take the limit explicitly.
P P n
• Cauchy condensation test: let a1 ≥ a2 ≥ . . . ≥ 0. Then ∑n an converges iff ∑n 2^n a_{2^n} does. This
surprisingly implies that we only need a small number of the terms to determine convergence.
Proof: since the terms are all nonnegative, convergence is equivalent to the partial sums being
bounded. Now group the terms an in two different ways:

a1 + (a2 + a3 ) + . . . + (a_{2^k} + . . . + a_{2^{k+1}−1}) ≤ a1 + 2a2 + . . . + 2^k a_{2^k} = ∑_{n=0}^{k} 2^n a_{2^n},

a1 + a2 + (a3 + a4 ) + . . . + (a_{2^{k−1}+1} + . . . + a_{2^k}) ≥ (1/2) a1 + a2 + 2a4 + . . . + 2^{k−1} a_{2^k} = (1/2) ∑_{n=0}^{k} 2^n a_{2^n}.

Then the partial sums of ∑n an and ∑n 2^n a_{2^n} are within a constant multiple of each other,
so each converges iff the other does. As an application, ∑n 1/n diverges.
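Note. For the harmonic series, the standard grouping bounds the partial sums below by H(2^k) ≥ 1 + k/2, so they are unbounded. A quick Python check of this bound (our own illustration):

```python
# Partial sums of the harmonic series, H(n) = 1 + 1/2 + ... + 1/n
def H(n):
    return sum(1.0 / i for i in range(1, n + 1))

# Grouping 1/(2^{j-1}+1) + ... + 1/2^j >= 1/2 gives H(2^k) >= 1 + k/2,
# so the partial sums are unbounded and the series diverges.
for k in range(1, 15):
    assert H(2 ** k) >= 1 + k / 2
```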

Next, we apply our basic tests to more specific situations.


• p-series: ∑n 1/n^p converges iff p > 1.
Proof: for p ≤ 0, the terms don’t go to zero. Otherwise, apply Cauchy condensation, giving the
series ∑k 2^k /2^{kp} = ∑k 2^{(1−p)k} , and use the geometric series test.

• Ratio test: let ∑n an have an ≠ 0. Then the series converges if α = lim sup_{n→∞} |an+1 /an | < 1.
The proof is simply by comparison to a geometric series.

• Root test: let α = lim sup_{n→∞} |an |^{1/n} . Then ∑n an converges if α < 1 and diverges if α > 1.
The proof is similar to the ratio test: for sufficiently large n, we can bound the terms by a
geometric series.

• Dirichlet’s theorem: let An = ∑_{k=0}^{n} ak and Bn = ∑_{k=0}^{n} bk . Then if (An ) is bounded and (bn )
is monotonically decreasing with lim_{n→∞} bn = 0, then ∑n an bn converges.
Proof: we use ‘summation by parts’,

∑_{n=p}^{q} an bn = ∑_{n=p}^{q−1} An (bn − bn+1 ) + Aq bq − Ap−1 bp .

The result follows immediately by the comparison test.

• Alternating series test: if |ci | ≥ |ci+1 | and lim_{n→∞} cn = 0 and the ci alternate in sign, then
∑n cn converges.
Proof: this is a special case of Dirichlet’s theorem; it also follows from the Cauchy criterion.
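Note. For instance, the alternating harmonic series 1 − 1/2 + 1/3 − . . . satisfies these hypotheses, so it converges (to log 2, though the test itself doesn’t identify the limit). A Python check of our own:

```python
import math

# Terms 1/n decrease to zero and the signs alternate, so the series converges.
# The error of a truncated alternating series is at most the first omitted term.
N = 100_000
s = sum((-1) ** (n + 1) / n for n in range(1, N + 1))
assert abs(s - math.log(2)) < 1 / (N + 1)
```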

Example. The series ∑_{n≥2} 1/(n(log n)^p ) converges iff p > 1, by the Cauchy condensation test.
The general principle is that Cauchy condensation can be used to remove a layer of logarithms, or
convert a p-series to a geometric series.
Example. Tricking the ratio test. Consider the series that alternates between 3^{−n} and 2^{−n}. Then
half of the ratios are large, so the ratio test is inconclusive. However, the root test works, giving
α = 1/2 < 1. Essentially, the two tests do the same thing, but the root test is more powerful
because it doesn’t just look at ‘local’ information.
Note. The ratio and root tests come from the geometric series test, which in turn comes from the
limit test. That is, fundamentally, they aren’t doing anything deeper than seeing if the terms blow
up. The only stronger tools we have are the Cauchy condensation test, which gives us the p-series
test, and Dirichlet’s theorem.
Example. The Fourier p-series is defined as

∑_{k=1}^{∞} cos(kx)/k^p .

By comparison with p-series, it converges for p > 1. For 0 < p ≤ 1, use the Dirichlet theorem with
an = cos nx and bn = 1/n^p . Using geometric series, we can show that (An ) is bounded as long as x
is not a multiple of 2π, giving convergence.
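Note. The boundedness of An = ∑ cos(kx) can be seen numerically; summing the geometric series e^{ikx} gives the bound |An | ≤ 1/|sin(x/2)|. A Python sketch of our own, at the arbitrary point x = 1:

```python
import math

# Partial sums A_n = cos(x) + cos(2x) + ... + cos(nx) for x = 1,
# compared against the bound 1/|sin(x/2)| from summing the geometric series e^{ikx}
x = 1.0
bound = 1 / abs(math.sin(x / 2))
A, worst = 0.0, 0.0
for k in range(1, 10_000):
    A += math.cos(k * x)
    worst = max(worst, abs(A))
assert worst <= bound  # the partial sums stay bounded, as Dirichlet's theorem needs
```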
Next, we extend to the complex numbers and consider power series; note that our previous results
continue to work when the absolute value is replaced by the complex norm. Given a sequence (cn )
of complex numbers, the series ∑n cn z^n is called a power series.

Theorem. Let α = lim sup_{n→∞} |cn |^{1/n} . Then the power series ∑n cn z^n converges when |z| < R
and diverges when |z| > R, where R = 1/α is called the radius of convergence.
Proof. Immediate by the root test.

Example. We now give some example applications of the theorem.

• The series ∑n z^n has R = 1. If |z| = 1, the series diverges by the limit test.

• The series ∑n z^n /n has R = 1. We’ve already shown it diverges if z = 1. However, it converges
for all other z on the boundary, as this is just a variant of the Fourier p-series.

• The series ∑n z^n /n^2 has R = 1 and converges for all z on the boundary by the p-series test.

• The series ∑n z^n /n! has R = ∞ by the ratio test.

As stated earlier, divergence of power series is not subtle; the terms become unbounded.
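Note. The radius of convergence can be estimated numerically from the root test. A Python sketch of our own, for cn = 1/2^n , where R = 2:

```python
# Estimate R = 1 / limsup |c_n|^(1/n) for c_n = 1/2^n:
# here |c_n|^(1/n) = 1/2 for every n, so R = 2 exactly.
n = 400
c = 1 / 2 ** n
alpha = c ** (1 / n)  # -> 1/2, up to float roundoff
R = 1 / alpha
assert abs(R - 2) < 1e-9
```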
We say the series ∑n an converges absolutely if ∑n |an | converges.

• Many properties that intuitively hold for convergent series really require absolute convergence;
often the absolute values appear from a triangle-inequality argument.

• All absolutely convergent series are convergent, because |∑ ai | ≤ ∑ |ai | by the triangle inequality.

• Power series are absolutely convergent within their radius of convergence, because the root test
only considers absolute values.
Prop. Let ∑n an = A and ∑n bn = B with ∑n an converging absolutely. Then the product series
∑n cn defined by

cn = ∑_{k=0}^{n} ak bn−k

converges to AB. This definition is motivated by multiplication of power series.

Proof. Let βn = Bn − B. We’ll pull out the terms we want from Cn , plus an error term,

Cn = a0 b0 + . . . + (a0 bn + . . . + an b0 ) = a0 Bn + . . . + an B0 .

Pulling out An B gives

Cn = An B + a0 βn + . . . + an β0 ≡ An B + γn .

We want to show that γn → 0. Let α = ∑n |an |. For any ε > 0, choose N so that |βn | ≤ ε for all
n ≥ N . Then separate the error term into

|γn | ≤ |β0 an + . . . + βN an−N | + |βN+1 an−N−1 + . . . + βn a0 |.

The first term goes to zero as n → ∞ (since an → 0), and the second is bounded by εα. Since ε
was arbitrary, we’re done.
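Note. A numerical illustration of the proposition (our own, in Python): multiplying the series for e by itself. Here cn = ∑k 1/(k!(n − k)!) = 2^n /n! by the binomial theorem, and the product series sums to e^2 = AB.

```python
import math

# a_n = b_n = 1/n! sums (absolutely) to e; the Cauchy product should sum to e^2
N = 30
a = [1 / math.factorial(n) for n in range(N)]
c = [sum(a[k] * a[n - k] for k in range(n + 1)) for n in range(N)]

# binomial theorem: c_n = sum_k 1/(k! (n-k)!) = 2^n / n!
assert all(abs(cn - 2 ** n / math.factorial(n)) < 1e-12 for n, cn in enumerate(c))

# and the product series converges to A * B = e^2
assert abs(sum(c) - math.e ** 2) < 1e-12
```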

Note. Series that converge but not absolutely are conditionally convergent. The Riemann rear-
rangement theorem states that for such series, the terms can always be reordered to approach any
desired limit; the idea is to take just enough positive terms to get over it, then enough negative
terms to get under it, and alternate.

2 Real Analysis
2.1 Continuity
We begin by defining limits for functions between metric spaces X and Y .

• Let f map E ⊂ X into Y , and let p be a limit point of E. Then we write

lim_{x→p} f (x) = q

if, for every ε > 0 there is a δ > 0 such that for all x ∈ E with 0 < dX (x, p) < δ, we have
dY (f (x), q) < ε. We also write f (x) → q as x → p.

• This definition is completely indifferent to f (p) itself, which could even be undefined.

• In terms of sequences, an equivalent definition of limits is that

lim_{n→∞} f (pn ) = q

for every sequence (pn ) in E so that pn ≠ p and lim_{n→∞} pn = p.

• By the same proofs as for sequences, limits are unique, and in R they add/multiply/divide as
expected.

We now use this limit definition to define continuity.

• We say that f is continuous at p if

lim_{x→p} f (x) = f (p).

In the case where p is not a limit point of the domain E, we also say f is continuous at p. If f
is continuous at all points of E, then we say f is continuous on E.

• None of our definitions care about E c , so we’ll implicitly restrict X to the domain E for all
future statements.

• If f maps X into Y , and g maps range F ⊂ Y into Z, and f is continuous at p and g is


continuous at f (p), then g ◦ f is continuous at p. We prove by using the definition twice.

• Continuity for functions f : R → R is preserved under arithmetic operations the way we expect,
by the results above. The function f (x) = x is continuous, as we can choose δ = ε. Hence poly-
nomials and rational functions are continuous. The absolute value function is also continuous;
we can choose δ = ε by the triangle inequality. This can be generalized to functions from R to
Rk , which are continuous iff all the components are.

Now we connect continuity to topology. Note that if we were dealing with a topological space rather
than a metric space, the following condition would be used to define continuity.

Theorem. A map f : X → Y is continuous on X iff f −1 (V ) is open in X for all open sets V in Y .



Proof. The key idea is that every point of an open set is an interior point. Assume f is continuous
on X, and let p ∈ f −1 (V ) and q = f (p). The continuity condition states that
f (Nδ (p)) ⊂ Nε (q)
for some δ, given any ε. Choosing ε so that Nε (q) ⊂ V , this shows that p is an interior point of
f −1 (V ), giving the result. The converse is similar.

Corollary. If f is continuous, then f −1 takes closed sets to closed sets; this follows from taking
the complement of the previous theorem.
Corollary. A function f is continuous iff, for every subset S ⊂ X, we have f (cl S) ⊂ cl(f (S)),
where cl denotes the closure. The forward direction follows from the previous corollary, and exhibits
the intuitive notion that continuous functions keep nearby points together.
Example. Using the definition of continuity, it is easy to show that the circle x2 + y 2 = 1 is closed,
because this is the inverse image of the closed set {1} under the continuous function f (x, y) = x2 +y 2 .
Similarly, the region x2 + xy + y 2 < 1 is open, and so on. In general continuity is one of the most
practical ways to show that a set is open or closed.
We now relate continuity to compactness.

• Let f : X → Y be continuous on X. Then if X is compact, f (X) is compact.


Proof: take an open cover {Vα } of f (X). Then {f −1 (Vα )} is an open cover of X. Picking a
finite subcover and applying f gives a finite subcover of f (X).
• EVT: let f be a continuous real function on a compact metric space X, and let
M = sup_{p∈X} f (p), m = inf_{p∈X} f (p).

Then there exist points p, q ∈ X so that f (p) = M and f (q) = m.


Proof: let E = f (X). Then E is compact, so closed and bounded. By the definition of sup and
inf, M and m are either in E or are limit points of E. Since E is closed, E must contain them.
• Compactness is required for the EVT because it rules out asymptotes (e.g. 1/x on (0, ∞)). This
is another realization of the ‘smallness’ compactness guarantees.

Next, we relate continuity to connectedness, another topological property.

• A metric space X is disconnected if it may be written as X = A ∪ B where A and B are disjoint,


nonempty, open subsets of X. We say X is connected if it is not disconnected. Since it depends
only on the open set structure, connectedness is a topological invariant.
• The interval [a, b] is connected. To show this, note that disconnectedness is equivalent to the
existence of a closed and open, nonempty proper subset. Let C be such a subset and let a ∈ C
without loss of generality. Define
W = {x ∈ [a, b] : [a, x] ⊂ C}, c = sup W.
Then c ∈ [a, b], which is the crucial step that does not work for Q. For any ε > 0 there
exists x ∈ W so that x ∈ (c − ε, c], which implies [a, c − ε] ⊂ C. Since C is closed, this implies
c ∈ W . On the other hand, if x ∈ C and x < b, then since C is open, there exists an ε > 0 so
that x + ε ∈ C. Hence if c < b, we have a contradiction, so we must have c = b and [a, b] = C.

• More generally, the connected subsets of R are precisely the intervals, while any subset of Q
with more than one point is disconnected.

• Let f : X → Y be continuous and a bijection from a compact metric space X onto Y . Then
f −1 is continuous on Y .
Proof: let V be open in X. Then V c is closed in X, hence compact, so f (V c ) is compact and
hence closed in Y . Since f is a bijection, f (V c ) = f (V )c , so f (V ) is open, giving the result.

• Let f : X → Y be continuous on X. Then if E ⊂ X is connected, so is f (E). This is proved


directly from the definition of connectedness.

• IVT: let f be a continuous real function defined on [a, b]. Then if f (a) < f (b) and c ∈
(f (a), f (b)), there exists a point x ∈ (a, b) such that f (x) = c. This follows immediately from
the above fact, because intervals are connected.
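The IVT is also the engine behind the bisection method for root finding. A minimal sketch, with the function and interval chosen purely for illustration:

```python
# Bisection: if f is continuous on [a, b] with f(a) < 0 < f(b), the IVT
# guarantees a root in (a, b), and repeatedly halving the interval while
# keeping a sign change homes in on it.
def bisect(f, a, b, tol=1e-12):
    """Assumes f is continuous with f(a) < 0 < f(b)."""
    while b - a > tol:
        m = (a + b) / 2
        if f(m) < 0:
            a = m
        else:
            b = m
    return (a + b) / 2

# x^3 - x - 1 is negative at x = 1 and positive at x = 2, so a root lies between.
root = bisect(lambda x: x**3 - x - 1, 1.0, 2.0)
```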

• A set S ⊂ Rn is path-connected if, given any a, b ∈ S there is a continuous map γ : [0, 1] → S


such that γ(0) = a and γ(1) = b.

• Path connectedness implies connectedness. To see this, note that connectedness of S is equivalent
to all continuous functions f : S → Z being constant. Now consider the map f ◦ γ : [0, 1] → Z
for any continuous f . It is continuous, and its domain is connected, so its value is constant and
f (γ(0)) = f (γ(1)). Then f (a) = f (b) for all a, b ∈ S.

• All open connected subsets of Rn are path connected. However, in general connected sets are
not necessarily path connected. The standard example is the Topologist’s sine curve

X = A ∪ B, A = {(x, sin(1/x)) : x > 0}, B = {(0, y) : y ∈ R}.

The two path components are A and B.

Now we define a stronger form of continuity that’ll come in handy later.

• We say f : X → Y is uniformly continuous on X if, for every ε > 0, there exists δ > 0 so that

dX (p, q) < δ implies dY (f (p), f (q)) < ε

for all p, q ∈ X. That is, we can use the same δ for every point. For example, 1/x is continuous
but not uniformly continuous on (0, ∞) because it gets arbitrarily steep.

• A function f : X → Y is Lipschitz continuous if there exists a constant K > 0 so that

dY (f (p), f (q)) ≤ KdX (p, q).

Lipschitz continuity implies uniform continuity, by choosing δ = ε/2K, and can be an easy way
to establish uniform continuity.

• Let f : X → Y be continuous on X. Then if X is compact, f is uniformly continuous on X.


Proof: for a given ε, let δp be a corresponding δ to show continuity at the point p. The set
of neighborhoods Nδp (p) form an open cover of X. Take a finite subcover and let δmin be the
minimum δp used. Then a suitable fraction of δmin works for uniform continuity.

Example. The metric spaces [0, 1] and [0, 1) are not homeomorphic. Suppose that h : [0, 1] → [0, 1)
is such a homeomorphism. Then the map
1/(1 − h(x))
is a continuous, unbounded function on the compact set [0, 1], which contradicts the EVT.

2.2 Differentiation
In this section we define derivatives for functions on the real line; the situation is more complicated
in higher dimensions.

• Let f be defined on [a, b]. Then for x ∈ [a, b], define the derivative
f ′ (x) = lim_{t→x} (f (t) − f (x))/(t − x).
If f ′ is defined at a point/set, we say f is differentiable at that point/set.

• Note that our definition defines differentiability at all x that are limit points of the domain of
f , and hence includes the endpoints a and b. In more general applications, though, we’ll prefer
to talk about differentiability only on open sets, where we can ‘approach from all directions’.

• Differentiability implies continuity, because

f (t) − f (x) = [(f (t) − f (x))/(t − x)] · (t − x)

and taking the limit t → x gives zero.

• The linearity of the derivative and the product rule can be derived by manipulating the difference
quotient. For example, if h = f g, then

(h(t) − h(x))/(t − x) = (f (t)(g(t) − g(x)) + g(x)(f (t) − f (x)))/(t − x)

which gives the product rule.

• By the definition, the derivative of 1 is 0 and the derivative of x is 1. Using the above rules
gives the power rule, (d/dx) x^n = n x^(n−1) .
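The limit definition can be probed numerically: the difference quotient should approach the derivative as the step shrinks. A small sketch (names hypothetical) checking the power rule at one point:

```python
# The difference quotient (f(x+h) - f(x))/h approaches f'(x) as h -> 0.
# For f(x) = x^3 at x = 2, the power rule gives f'(2) = 3 * 2^2 = 12.
def diff_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda x: x**3
quotients = [diff_quotient(f, 2.0, 10.0**(-k)) for k in range(1, 6)]
errors = [abs(q - 12.0) for q in quotients]
# Shrinking h shrinks the error roughly in proportion to h.
```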

• Chain Rule: suppose f is continuous on [a, b], f ′ (x) exists at some point x ∈ [a, b], g is defined on
an interval I that contains the range of f , and g is differentiable at f (x). Then if h(t) = g(f (t)),
then
h′ (x) = g ′ (f (x))f ′ (x).
To prove this, we isolate the error terms,

f (t) − f (x) = (t − x)(f ′ (x) + u(t)), g(s) − g(y) = (s − y)(g ′ (y) + v(s)).

By definition, u(t) → 0 as t → x and v(s) → 0 as s → f (x). Now the total error is

h(t) − h(x) = g(f (t)) − g(f (x)) = (t − x)(f ′ (x) + u(t))(g ′ (f (x)) + v(f (t))).

Taking t → x gives the result; note that we need continuity of f to ensure
that f (t) → f (x).

• Inverse Rule: if f has a differentiable inverse f −1 , then


(d/dx) f −1 (x) = 1/f ′ (f −1 (x)).

This can be derived by applying the chain rule to f ◦ f −1 .

We now introduce the generalized mean value theorem, which is extremely useful in proofs.

• We say a function f : X → R has a local maximum at p if there exists δ > 0 so that f (q) ≤ f (p)
for all q ∈ X with d(p, q) < δ.

• Given a function f : [a, b] → R, if f has a local maximum at x ∈ (a, b) and f ′ (x) exists, then
f ′ (x) = 0.
Proof: sequences approaching from the right give f ′ (x) ≤ 0, because the difference quotient is
nonpositive once we get within δ of x. Similarly, sequences from the left give f ′ (x) ≥ 0.

• Some sources define a “critical point” as a point x where f 0 (x) = 0, f 0 (x) doesn’t exist, or x is
an endpoint of the domain. The point of this definition is that these critical points are all the
points that could have local extrema.

• Rolle: if f is continuous on [a, b] and differentiable on (a, b), and f (a) = f (b), then there is a
point x ∈ (a, b) so that f ′ (x) = 0.
Proof: if f is constant, we’re done. Otherwise, suppose f (t) > f (a) for some t ∈ (a, b). Then
by the EVT, there is an x ∈ (a, b) that achieves the maximum, which means f ′ (x) = 0. If f (a)
is the maximum, we do the same reasoning with the minimum.

• Generalized MVT: if f and g are continuous real functions on [a, b] which are differentiable in
(a, b), then there is a point x ∈ (a, b) such that

[f (b) − f (a)] g ′ (x) = [g(b) − g(a)] f ′ (x).

Proof: apply Rolle’s theorem to

h(t) = [f (b) − f (a)] g(t) − [g(b) − g(a)] f (t),

which satisfies h(a) = h(b) = f (b)g(a) − f (a)g(b).

• Intuitively, if we consider the curve parametrized by (f (t), g(t)), the generalized MVT states
that some tangent line to the curve is parallel to the line connecting the endpoints.

• MVT: setting g(x) = x in the generalized MVT, there is a point x ∈ (a, b) so that

f (b) − f (a) = (b − a)f ′ (x).

• One use of the MVT is that it allows us to connect the derivative at a point, which is local,
with function values on a finite interval. For example, we can use it to show that if f ′ (x) ≥ 0,
then f is monotonically increasing.

• The MVT doesn’t apply for vector-valued functions, as there’s too much ‘freedom in direction’.
The closest thing we have is the bound

|f (b) − f (a)| ≤ (b − a)|f ′ (x)|

for some x ∈ (a, b).
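The point promised by the MVT can be exhibited numerically. A sketch, with an example function chosen for illustration, that solves f ′(x) = (f (b) − f (a))/(b − a) by bisection:

```python
import math

# MVT: some x in (a, b) has f'(x) equal to the average slope of f over
# [a, b]. Here f(x) = x^3 on [0, 2], so the average slope is 8/2 = 4,
# and 3 x^2 = 4 at x = 2/sqrt(3).
a, b = 0.0, 2.0
f = lambda x: x**3
fprime = lambda x: 3 * x**2
avg_slope = (f(b) - f(a)) / (b - a)

lo, hi = a, b                  # fprime is increasing here, so bisect it
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if fprime(mid) < avg_slope:
        lo = mid
    else:
        hi = mid
x_star = (lo + hi) / 2
```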



Theorem (L’Hospital). Let f and g be real and differentiable in (a, b) with g ′ (x) ≠ 0 for all
x ∈ (a, b). Suppose f ′ (x)/g ′ (x) → A as x → a. Then if f (x) → 0 and g(x) → 0 as x → a, or if
|g(x)| → ∞ as x → a, then f (x)/g(x) → A as x → a.
Theorem (Taylor). Suppose f is a real function on [a, b], f (n−1) is continuous on [a, b], and f (n) (t)
exists for all t ∈ (a, b). Let α and β be distinct points in [a, b], and let

P (t) = Σ_{k=0}^{n−1} f (k) (α) (t − α)^k / k!.

Then there exists a point x between α and β so that

f (β) = P (β) + f (n) (x) (β − α)^n / n!.

This bounds the error of a polynomial approximation in terms of the maximum value of f (n) (x).
Proof. Let M be the number defined by

f (β) = P (β) + M (β − α)^n

and define the function

g(t) = f (t) − P (t) − M (t − α)^n .

By construction, g satisfies the properties

g(α) = g ′ (α) = . . . = g (n−1) (α) = 0, g(β) = 0, g (n) (t) = f (n) (t) − n!M.

Then we wish to show that g (n) (t) = 0 for some t between α and β. Applying Rolle’s theorem
to g gives a point x1 ∈ (α, β) where g ′ (x1 ) = 0. Repeating this for g ′ on the interval (α, x1 ),
using g ′ (α) = 0, gives a point x2 where g ′′ (x2 ) = 0, and so on, giving the result.

Corollary. Under the same conditions as above, we have

f (x) = P (x) + ε(x)(x − α)^(n−1)

where ε(x) → 0 as x → α.
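The remainder bound is easy to check numerically. A sketch for f = sin about α = 0, where every derivative is bounded by 1, so the degree n − 1 Taylor polynomial errs by at most |x|^n /n!:

```python
import math

def sin_taylor(x, m):
    """Taylor polynomial of sin about 0 with m nonzero terms (degree 2m - 1)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(m))

# With n = 2m, Taylor's theorem bounds |sin(x) - P(x)| by |x|^(2m) / (2m)!.
x = 1.2
bounds_hold = all(
    abs(math.sin(x) - sin_taylor(x, m)) <= x**(2*m) / math.factorial(2*m)
    for m in range(1, 8)
)
```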

2.3 Integration
In this section, we define integration over intervals on the real line.

• A partition P of the interval [a, b] is a finite set of points x0 , . . . , xn with

a = x0 ≤ x1 ≤ . . . ≤ xn−1 ≤ xn = b.

We write ∆xi = xi − xi−1 .

• Let f be a bounded real function defined on [a, b]. Then for a partition P , define

Mi = sup_{[xi−1 ,xi ]} f (x), mi = inf_{[xi−1 ,xi ]} f (x)

and
U (P, f ) = Σi Mi ∆xi , L(P, f ) = Σi mi ∆xi .

• Define the upper and lower Riemann integrals as

∫̄_a^b f dx = inf U (P, f ), ∫̲_a^b f dx = sup L(P, f )

where the inf and sup are taken over all partitions P . These quantities are always defined if
f is bounded, because this implies that Mi and mi are bounded, which implies the upper and
lower integrals are. Conversely, our notion of integration doesn’t make sense if f isn’t bounded,
though we’ll find a way to accommodate this later.

• If the upper and lower integrals are equal, we say f is Riemann-integrable on [a, b], write f ∈ R,
and denote their common value as ∫_a^b f dx.

• Given a monotonically increasing function α on [a, b], define

∆αi = α(xi ) − α(xi−1 ), U (P, f, α) = Σi Mi ∆αi , L(P, f, α) = Σi mi ∆αi

and the upper and lower integrals analogously. If they are the same, we say f is integrable
with respect to α, write f ∈ R(α), and denote their common value as ∫_a^b f dα. This is the
Riemann-Stieltjes integral, with the Riemann integral as the special case α(x) = x.

Next, we find the conditions for integrability. Below, we always let f be real and bounded, and α
be monotonically increasing, on the interval [a, b].

• P ∗ is a refinement of P if P ∗ ⊃ P (i.e. we only split existing intervals into smaller ones). Given
two partitions P1 and P2 , their common refinement is P ∗ = P1 ∪ P2 .

• Refining a partition increases L and decreases U . This is clear by considering a refinement that
adds exactly one extra interval.

• The lower integral is not greater than the upper integral. For any two partitions P1 and P2 ,

L(P1 , f, α) ≤ L(P ∗ , f, α) ≤ U (P ∗ , f, α) ≤ U (P2 , f, α).

Taking sup over P1 and inf over P2 on both sides of this inequality gives the result.

• Therefore, f ∈ R(α) on [a, b] iff, for every ε > 0, there exists a partition so that

U (P, f, α) − L(P, f, α) < ε.

This follows immediately from the previous point, and will serve as a useful criterion for
integrability: we seek to construct partitions that give us an arbitrarily small ‘error’ ε.

• If U (P, f, α) − L(P, f, α) < ε, then we have

Σi |f (si ) − f (ti )| ∆αi < ε

where si and ti are arbitrary points in [xi−1 , xi ]. Moreover, if the integral exists,

|Σi f (ti )∆αi − ∫_a^b f dα| < ε.

We can use these basic results to prove integrability theorems. We write ∆α = α(b) − α(a).

• If f is continuous on [a, b], then f ∈ R(α) on [a, b].
Proof: since [a, b] is compact, f is uniformly continuous. Then for any ε > 0, there is a δ > 0
so that |x − t| < δ implies |f (x) − f (t)| < ε. Choosing a partition with ∆xi < δ, the difference
between the upper and lower sums is at most ε∆α, and taking ε to zero gives the result.

• If f is monotonic on [a, b] and α is continuous on [a, b], then f ∈ R(α).


Proof: by the IVT, we can choose a partition so that ∆αi = ∆α/n. By telescoping the sum,
the error is bounded by (∆α/n)(f (b) − f (a)). Taking n to infinity gives the result.

• If f is bounded on [a, b] and has only finitely many points of discontinuity, none of which are
also points of discontinuity of α, then f ∈ R(α).
Proof: choose a partition so that each point of discontinuity is in the interior of a segment
[ui , vi ], where these segments’ ∆αi values add up to ε. Then f is continuous on the compact
set [a, b] \ ∪(ui , vi ), so applying the previous theorem gives an O(ε) error.
The segments with discontinuities contribute at most 2M ε, where M = sup |f (x)| is finite.
Then the overall error is O(ε) as desired.

• Suppose f ∈ R(α) on [a, b], m ≤ f (x) ≤ M on [a, b], φ is continuous on [m, M ], and h(x) =
φ(f (x)) on [a, b]. Then h ∈ R(α) on [a, b]. That is, continuous functions preserve integrability.

Example. The function

f (x) = 0 if x ∈ Q, and f (x) = 1 otherwise,

is not Riemann integrable, because the upper integral is (b − a) and the lower integral is zero.
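By contrast, for an integrable function the upper and lower sums squeeze together under refinement. A sketch (uniform partitions; f increasing, so the endpoint values realize Mi and mi):

```python
# Upper and lower sums of f(x) = x^2 on [0, 1] over a uniform partition.
# Since f is increasing here, Mi and mi occur at the right and left
# endpoints; refining the partition drives U - L = (f(1) - f(0))/n to 0.
def riemann_sums(f, a, b, n):
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    upper = sum(f(xs[i + 1]) * dx for i in range(n))
    lower = sum(f(xs[i]) * dx for i in range(n))
    return upper, lower

u, l = riemann_sums(lambda x: x**2, 0.0, 1.0, 1000)
# Both bracket the true value 1/3; for the function above, U - L would
# instead stay equal to b - a on every partition.
```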

2.4 Properties of the Integral


Below, we assume that all functions are integrable whenever applicable.

• Integration is linear,

∫_a^b (f1 + f2 ) dα = ∫_a^b f1 dα + ∫_a^b f2 dα, ∫_a^b cf dα = c ∫_a^b f dα.

Proof: first, we prove that f = f1 + f2 is integrable. For any partition, we have

L(P, f1 , α) + L(P, f2 , α) ≤ L(P, f, α) ≤ U (P, f, α) ≤ U (P, f1 , α) + U (P, f2 , α).

Pick partitions for f1 and f2 with error ε/2. Then by the inequality above, their common
refinement P has error at most ε for f1 + f2 , as desired. Moreover, using the inequality again,

∫ f dα ≤ U (P, f, α) < ∫ f1 dα + ∫ f2 dα + ε.

Repeating this argument with fi replaced with −fi gives the desired result.

• If f1 (x) ≤ f2 (x) on [a, b], then

∫_a^b f1 dα ≤ ∫_a^b f2 dα.

• Integration ranges add,

∫_a^c f dα + ∫_c^b f dα = ∫_a^b f dα.

• ML inequality: if |f (x)| ≤ M on [a, b], then

|∫_a^b f dα| ≤ M (α(b) − α(a)).

• Integration is also linear in α,

∫_a^b f d(α1 + α2 ) = ∫_a^b f dα1 + ∫_a^b f dα2 , ∫_a^b f d(cα) = c ∫_a^b f dα.

As before, the integrals on the left exist if the ones on the right do.

• Products of integrable functions are integrable.


Proof: we use an algebraic trick. Let these functions be f and g. Since φ(t) = t2 is continuous,
f 2 and g 2 are integrable, but then so is

4f g = (f + g)2 − (f − g)2

A similar trick works with maximum and minima, as max(f, g) = (f + g)/2 + |f − g|/2.

• If f is integrable, then so is |f |, and

|∫_a^b f dα| ≤ ∫_a^b |f | dα.

Proof: for integrability, compose with φ(t) = |t|. The inequality follows from ±f ≤ |f |.

The reason we used the Riemann-Stieltjes integral is because the choice of α gives us more flexibility.
In particular, the Riemann-Stieltjes integral contains infinite series as a special case.

• Define the unit step function I as

I(x) = 0 for x ≤ 0, and I(x) = 1 for x > 0.

• If a < s < b, and f is bounded on [a, b] and continuous at s, and α(x) = I(x − s), then

∫_a^b f dα = f (s).

• If cn ≥ 0 and Σn cn converges, (sn ) is a sequence of distinct points in (a, b), and f is
continuous on [a, b], then for α(x) = Σn cn I(x − sn ) we have

∫_a^b f dα = Σn cn f (sn ).

Proof: the series on the right-hand side converges by comparison to Σn M cn where M =
sup |f (x)|. We need to show that it converges to the desired integral; to do this, consider
truncating the series after N terms so that the remaining cn add up to at most ε, and let

αN (x) = Σ_{n=0}^{N} cn I(x − sn ).

Then ∫ f dαN is at most M ε away from ∫ f dα by the ML inequality, while the truncated series
is at most M ε away from the full series. Taking ε to zero gives the result.

Note. These results show that equations from physics like


I = ∫ x^2 dm

make sense; with the Riemann-Stieltjes integral, this equation holds whether the masses are contin-
uous or discrete, or both; the function m(x) is the amount of mass to the left of x.
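A numerical sketch of this (all names hypothetical): the Stieltjes integral against m(x) splits into a density part plus point-mass contributions.

```python
# I = ∫ x^2 dm for a mass distribution with a continuous density rho(x)
# plus discrete point masses: the Stieltjes integral reduces to an
# ordinary integral plus a sum, exactly as in the text.
def moment_of_inertia(rho, point_masses, a, b, n=100000):
    dx = (b - a) / n
    # midpoint rule for the continuous part, ∫ x^2 rho(x) dx
    continuous = sum(((a + (i + 0.5) * dx) ** 2) * rho(a + (i + 0.5) * dx) * dx
                     for i in range(n))
    discrete = sum(m * x**2 for x, m in point_masses)
    return continuous + discrete

# Uniform rod of density 1 on [0, 1] plus a unit point mass at x = 1:
# expect 1/3 + 1 = 4/3.
I = moment_of_inertia(lambda x: 1.0, [(1.0, 1.0)], 0.0, 1.0)
```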

• Let α increase monotonically and let α be differentiable with α′ ∈ R on [a, b]. Let f be bounded
on [a, b]. Then

∫_a^b f dα = ∫_a^b f (x)α′ (x) dx

where one integral exists if and only if the other does.
Proof: we relate the integrals using the MVT. For all partitions P , we can use the MVT to
choose ti in each interval so that
∆αi = α′ (ti )∆xi .
Now consider taking si in each interval to yield the upper sum

U (P, f, α) = Σi f (si )∆αi = Σi f (si )α′ (ti )∆xi .

Now, since α′ is integrable, we can choose P so that U (P, α′ ) − L(P, α′ ) < ε. Then we have

Σi |α′ (si ) − α′ (ti )|∆xi < ε

which implies that

|U (P, f, α) − U (P, f α′ )| ≤ M ε

where M = sup |f (x)|. Therefore the upper integrals (if they exist) must coincide; similarly the
lower integrals must, giving the result.

• Let ϕ be a strictly increasing continuous function that maps [A, B] onto [a, b]. Let α be
monotonically increasing on [a, b] and f ∈ R(α) on [a, b]. Let

β(y) = α(ϕ(y)), g(y) = f (ϕ(y)).

Then g ∈ R(β) on [A, B] with

∫_A^B g dβ = ∫_a^b f dα.
Proof: ϕ gives a one-to-one correspondence between partitions of [a, b] and [A, B]. Correspond-
ing partitions have identical upper and lower sums, so the integrals must be equal.

Note. The first proof above shows another common use of the MVT: pinning down specific points
where an ‘on average’ statement is true. Having these points in hand makes the rest of the proof
more straightforward.

Note. A set A ⊂ R has measure zero if, for every ε > 0 there exists a countable collection of open
intervals {(ai , bi )} such that

A ⊂ ∪i (ai , bi ), Σi (bi − ai ) < ε.

That is, the “length” of the set is arbitrarily small. Lebesgue’s theorem states that a bounded real
function is Riemann integrable if and only if its set of discontinuities has measure zero.

Next, we relate integration and differentiation.

• Let f ∈ R on [a, b]. For x ∈ [a, b], let

F (x) = ∫_a^x f (t) dt.

Then F is continuous on [a, b], and if f is continuous at x0 , then F is differentiable at x0 with

F ′ (x0 ) = f (x0 ).

Proof: F is continuous by the ML inequality and the fact that f is bounded. The second part
also follows by the ML inequality: by continuity, we can bound f (u) − f (x0 ) when u is close to
x0 , which bounds the difference between the difference quotient of F at x0 and f (x0 ) by an
arbitrarily small amount.

• FTC: if f ∈ R on [a, b] and F is differentiable on [a, b] with F ′ = f , then

∫_a^b f (x) dx = F (b) − F (a).

Proof: choose a partition P so that U (P, f ) − L(P, f ) < ε. By the MVT, we can choose points
ti in each interval so that

F (xi ) − F (xi−1 ) = f (ti )∆xi , so that Σi f (ti )∆xi = F (b) − F (a)

by telescoping. Then both the upper and lower sums are within ε of F (b) − F (a), and taking ε
to zero gives the result.

• Vector ML inequality: for f : [a, b] → Rk and f ∈ R(α), we have

|∫_a^b f dα| ≤ ∫_a^b |f | dα.

Proof: first, we must show that |f | is integrable; this follows because it can be built from
squaring, addition, square root, and norm, all of which are continuous. (The square root
function is continuous because it is the inverse of the square on the compact interval [0, M ] for
any M .) To show the bound, let y = ∫ f dα. Then

|y|^2 = Σi yi ∫ fi dα = ∫ y · f dα ≤ ∫ |y||f | dα

by Cauchy-Schwarz. Canceling |y| from both sides gives the result.

Note. The proofs above show some common techniques: using the ML inequality to bound an
error to zero, and using the MVT to get concrete points to work with.

2.5 Uniform Convergence


Next, we establish some useful technical results using uniform convergence.

• A sequence of functions fn : X → Y converges pointwise to f : X → Y if, for every x ∈ X,

lim_{n→∞} fn (x) = f (x).

One must treat pointwise convergence with caution; the problems boil down to the fact that
two limits may not commute. For instance, the pointwise limit of continuous functions may not
be continuous.

• Integration and pointwise convergence don’t commute. Let fn : [0, 1] → R where fn (x) = n^2 on
(0, 1/n) and 0 otherwise. Then

lim_{n→∞} ∫_0^1 fn (x) dx = lim_{n→∞} n = ∞, but ∫_0^1 lim_{n→∞} fn (x) dx = 0.

An analogous statement holds for integration and series summation.
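This counterexample is easy to check directly; a small sketch:

```python
# f_n = n^2 on (0, 1/n) and 0 elsewhere: the pointwise limit is 0, but
# the integrals ∫ f_n = n^2 * (1/n) = n diverge.
def f(n, x):
    return float(n * n) if 0.0 < x < 1.0 / n else 0.0

def integral(n, m=100000):
    """Midpoint Riemann sum of f_n over [0, 1]; the exact value is n."""
    dx = 1.0 / m
    return sum(f(n, (i + 0.5) * dx) for i in range(m)) * dx

# At any fixed x > 0, f_n(x) = 0 once n > 1/x, so the pointwise limit is 0.
tail = [f(n, 0.3) for n in range(4, 20)]
```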

• Differentiation and pointwise convergence don’t commute. Let fn (x) = sin(n^2 x)/n, so fn → 0
uniformly. But fn′ (x) = n cos(n^2 x), so fn′ (π/4) is unbounded as n → ∞.
An analogous statement holds for differentiation and series summation.

• A sequence of functions fn : X → Y converges uniformly on X to f : X → Y if, for all ε > 0,
there exists an N so that for n > N , we have

dY (fn (x), f (x)) < ε

for all x ∈ X. That is, just as in the definition of uniform continuity, we may use the same N
for all points. For concreteness, we specialize to X ⊂ R and Y = R with the standard metric.

• An alternative way to write the criterion for uniform convergence is that

αn = sup_{x∈X} |fn (x) − f (x)| → 0

as n → ∞. It is clear that uniform convergence implies pointwise convergence.


We now establish properties of uniform convergence. All of our functions below map X → R.
• If (fn ) converges uniformly on X to f and the fn are continuous, then f is continuous.
Proof: we will show f is continuous at p ∈ X. Fix ε > 0. For x near p so that |x − p| < δ,

|f (x) − f (p)| ≤ |f (x) − fN (x)| + |fN (x) − fN (p)| + |fN (p) − f (p)|

and we are done if we can show the right-hand side is bounded by ε. We may first choose N so
the first and third terms are bounded by ε/3, by the definition of uniform convergence. Next, we
choose δ so the second term is bounded by ε/3, since fN is continuous, giving the result.
• Uniform convergence also comes in a “Cauchy” variant: (fn ) converges uniformly on X if and
only if, for all ε > 0, there exists an N so that for n, m > N ,

|fn (x) − fm (x)| < ε

for all x ∈ X. This follows from the completeness of R.
• If (fn ) converges uniformly on X to f and the fn are integrable, then f is.
Proof: for any ε > 0, f is within ε of fn for sufficiently large n. Then the upper and lower
sums of f are within ε(b − a) of those of fn , so the upper and lower integrals of f differ by
arbitrarily little, giving the result. In particular, the integral of f must be the limit of the
sequence (∫ fn ).
• If (fn ) converges pointwise on [a, b] to f , the fn are differentiable on (a, b), and the fn′ are
continuous and converge uniformly to a bounded function g on (a, b), then f is differentiable
and f ′ = g.
Proof: the simplest proof uses integration. Taking the result

∫_a^x fn′ (t) dt = fn (x) − fn (a)

in the limit n → ∞, and using the previous result, we have

∫_a^x g(t) dt = f (x) − f (a).

On the other hand, since g is continuous, the left-hand side is a differentiable function F (x)
with F ′ = g. Hence by differentiating both sides, g = f ′ as desired.
We now apply these results to power series.
• Similarly, uniform convergence can be defined for series. For a set of real-valued functions uk ,
the series Σ uk converges pointwise/uniformly on X if and only if (fn ) does, where

fn = u1 + . . . + un .

By the above, if Σ uk converges uniformly and the uk are continuous, then Σ uk is continuous.
The same holds with “integrable” in place of “continuous”, as well as “differentiable” if the uk′
are continuous and Σ uk′ converges uniformly. In these cases, differentiation and integration
can be performed term by term.

• Uniform convergence for series also comes in a Cauchy variant. The series Σ uk converges
uniformly on X if and only if, for all ε > 0, there exists an N so that for n > m > N ,

|um+1 (x) + . . . + un (x)| < ε

for all x ∈ X.
• Weierstrass M -test: the series Σ uk converges uniformly on X if there exist real constants Mk
so that for all k and x ∈ X,

|uk (x)| ≤ Mk , where Σ Mk converges.

This follows directly from the previous point, because Σ Mk is a Cauchy series. This condition
is stronger than necessary, because each Mk depends on the largest value uk (x) takes anywhere,
but in practice is quite useful.
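A numerical sketch of the M-test in action, for the series Σ sin(kx)/k^2 with Mk = 1/k^2 (sampling grid chosen arbitrarily):

```python
import math

# For u_k(x) = sin(k x)/k^2 we may take M_k = 1/k^2, and sum M_k converges,
# so the series converges uniformly: the tail beyond N is bounded by
# sum_{k>N} 1/k^2 < 1/N at every x simultaneously.
def tail(x, N, K=2000):
    """Partial tail sum_{k=N+1}^{K} sin(k x) / k^2."""
    return sum(math.sin(k * x) / k**2 for k in range(N + 1, K + 1))

N = 50
xs = [i * 0.01 for i in range(629)]        # samples covering [0, 2*pi]
worst_tail = max(abs(tail(x, N)) for x in xs)
# worst_tail stays below 1/N, no matter which x is worst.
```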

• As we saw earlier, a power series Σk ck x^k has a radius of convergence R so that it converges
absolutely for |x| < R, and diverges for |x| > R.

• For any δ with 0 < δ < R, Σk ck x^k converges uniformly on [−R + δ, R − δ]. This simply follows
from the Weierstrass M -test, using Mk = |ck (R − δ)^k |, where Σ Mk converges by the root test.
Note that the power series does not necessarily converge uniformly on (−R, R). One simple
example is Σ x^k , which has R = 1. However, the “up to δ” result here will be good enough
because we can take δ arbitrarily small.

• As a result, the power series Σk ck x^k defines a continuous function f on (−R, R). In particular,
this establishes the continuity of various functions such as exp(x), sin(x), and cos(x). The
reason that the “up to δ” issue above doesn’t matter is that continuity is a local condition,
which holds at individual points, while uniform convergence is global. Another way of saying
this is that a function is continuous on an arbitrary union of domains where it is continuous,
but this doesn’t hold for uniform convergence.

• Similarly, the power series Σk ck x^k defines a differentiable function f on (−R, R) which can
be differentiated term by term. This takes some technical work, as we must show Σk k ck x^(k−1)
converges uniformly. Repeating this argument, f is infinitely differentiable on (−R, R).

• Weierstrass’s polynomial approximation theorem states that for any continuous f : [a, b] → R,
there exists a sequence (Pn ) of real polynomials which converges uniformly to f .
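One constructive proof uses Bernstein polynomials. A sketch, assuming nothing beyond the standard library (the example f is continuous but not smooth):

```python
import math

# Bernstein polynomials B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
# converge uniformly to any continuous f on [0, 1], giving a constructive
# version of the Weierstrass approximation theorem.
def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)                 # continuous but not smooth
xs = [i / 200 for i in range(201)]
sup_error = lambda n: max(abs(bernstein(f, n, x) - f(x)) for x in xs)
# The sup-norm error decreases as n grows.
```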

3 Complex Analysis
3.1 Analytic Functions
Let f (z) = u(x, y) + iv(x, y) where z = x + iy and u and v are real.

• The derivative of a complex function f (z) is defined by the usual limit definition. We say a
complex function is analytic/holomorphic at a point z0 if it is differentiable in a neighborhood
of z0 .

• Approaching along the x and y-directions respectively, we have

f ′ (z) = ux + ivx , f ′ (z) = −iuy + vy .

Thus, for the derivative to be defined we must have

ux = vy , vx = −uy

which are the Cauchy-Riemann equations. These are also sufficient when the partial derivatives
are continuous, since then any other directional derivative is a linear combination of these two.
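The equations are easy to verify numerically; a sketch for f (z) = z^2, where u = x^2 − y^2 and v = 2xy (test point chosen arbitrarily):

```python
# Check the Cauchy-Riemann equations u_x = v_y, v_x = -u_y for f = z^2
# by central finite differences at an arbitrary point.
def partials(g, x, y, h=1e-6):
    gx = (g(x + h, y) - g(x - h, y)) / (2 * h)
    gy = (g(x, y + h) - g(x, y - h)) / (2 * h)
    return gx, gy

u = lambda x, y: x**2 - y**2        # real part of (x + iy)^2
v = lambda x, y: 2 * x * y          # imaginary part

ux, uy = partials(u, 1.3, -0.7)
vx, vy = partials(v, 1.3, -0.7)
# u_x = v_y and v_x = -u_y hold up to finite-difference error.
```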

• Assuming that f is twice differentiable, both u and v are harmonic functions, uxx + uyy =
vxx + vyy = 0, by the equality of mixed partials.

• The level curves of u and v are orthogonal, because

∇u · ∇v = ux vx + uy vy = −ux uy + ux uy = 0.

In particular, this means that u solves Laplace’s equation when conductor surfaces are given
by level curves of v.

• Changing coordinates to polar gives an alternate form of the Cauchy-Riemann equations,


ur = (1/r) vθ , vr = −(1/r) uθ

where the derivative is
f ′ (z) = e^(−iθ) (ur + ivr ).

• Locally, a complex function differentiable at z0 satisfies ∆f ≈ f ′ (z0 )∆z. Thus the function
looks like a local ‘scale and twist’ of the complex plane, which provides some intuition. For
example, z̄ is not differentiable because it behaves like a ‘flip’ and twist.

• The mapping z 7→ f (z) takes harmonic functions to harmonic functions as long as f is differen-
tiable with f 0 (z) 6= 0. This is because the harmonic property (‘function value equal to average
of neighbors’) is invariant under rotation and scaling.

• Conformal transformations are maps of the plane which preserve angles; all holomorphic func-
tions with nonzero derivative produce such a transformation.

• A domain is an open, simply connected region in the complex plane. We say a complex function
is analytic in a region if it is analytic in a domain containing that region. If a function is
analytic everywhere, it is called entire.

• Using the formal coordinate transformation from (x, y) to (z, z̄) yields the Wirtinger derivatives,

∂z = (1/2)(∂x − i∂y ), ∂z̄ = (1/2)(∂x + i∂y ).

The Cauchy-Riemann equations are equivalent to

∂z̄ f = 0.

Similarly, we say f is antiholomorphic if ∂z f = 0. The Wirtinger derivatives satisfy a number
of intuitive properties, such as ∂z (zz ∗ ) = z ∗ .
As an example, we consider ideal fluid flow.
• The flow of a fluid is described by a velocity field v = (v1 , v2 ). Ideal fluid flow is steady,
nonviscous, incompressible, and irrotational. The latter two conditions translate to ∇ · v =
∇ × v = 0, which in terms of components are
∂x v1 + ∂y v2 = ∂x v2 − ∂y v1 = 0.
We are switching our derivative notation to avoid confusion with the subscripts.
• The zero curl condition can be satisfied automatically by using a velocity potential, v = ∇φ.
It is also useful to define a stream function ψ, so that
v1 = ∂x φ = ∂y ψ, v2 = ∂y φ = −∂x ψ
in which case incompressibility is also automatic.
• Since φ and ψ satisfy the Cauchy-Riemann equations, they can be combined into an analytic
complex velocity potential
Ω(z) = φ(x, y) + iψ(x, y).

• Since the level sets of ψ are orthogonal to those of φ, level sets of the stream function ψ are
streamlines. The derivative of Ω is the complex velocity,

Ω′ (z) = ∂x φ + i∂x ψ = ∂x φ − i∂y φ = v1 − iv2 .
The boundary conditions are typically of the form ‘constant velocity at infinity’ (which requires
φ to approach a linear function) and ‘zero velocity normal to an obstacle’ (which requires ψ to
be constant on its surface).
Example. The uniform flow Ω(z) = v0 e−iθ0 z. The real part is
φ(x, y) = v0 (cos θ0 x + sin θ0 y)
giving a velocity of v = v0 (cos θ0 , sin θ0 ).
Example. Flow past a cylinder. Consider the velocity potential
Ω(z) = v0 (z + a2 /z), φ = v0 (r + a2 /r) cos θ, ψ = v0 (r − a2 /r) sin θ.
At infinity, the flow has uniform velocity v0 to the right. Since ψ = 0 on r = a, this potential
describes flow past a cylindrical obstacle. To get intuition for this result, note that φ also serves
as an electric potential in the case of a cylindrical conductor at r = a, in a uniform background
field. We know that the cylinder is polarized, producing a dipole moment, and corresponding dipole
potential cos θ/r2 = x/r3 . For the fluid flow there is one less power of r since the situation is
two-dimensional.
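As a quick numerical check of this potential (a Python sketch; `Omega` and `dOmega` are our names for Ω and Ω′):

```python
import cmath

v0, a = 2.0, 1.0
Omega = lambda z: v0 * (z + a * a / z)          # complex potential
dOmega = lambda z: v0 * (1 - a * a / (z * z))   # Omega'(z) = v1 - i v2

# The stream function psi = Im(Omega) vanishes on the cylinder |z| = a ...
for k in range(12):
    z = a * cmath.exp(1j * (0.1 + 0.5 * k))
    assert abs(Omega(z).imag) < 1e-12

# ... and far away the velocity approaches the uniform flow (v0, 0).
assert abs(dOmega(1e6 + 1e6j) - v0) < 1e-9
```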

Example. Using a conformal transformation. The complex potential Ω(z) = z 2 has stream function
2xy, and hence xy = 0 is a streamline; hence this potential describes flow at a rectangular corner.
An alternate solution is to apply conformal transformation to the boundary condition. If we define
z = w1/2 , with z = x + iy and w = u + iv, then the boundary x = 0, y = 0 is mapped to v = 0.
This problem is solved by the uniform flow Ω(w) = w, and transforming back gives the result.

3.2 Multivalued Functions


Multivalued functions arise in complex analysis as the inverses of single-valued functions.

• Consider w = z 1/2 , defined to be the ‘inverse’ of z = w2 . For every z, there are two values of
w, which are opposites of each other. In polar coordinates,

w = r^{1/2} e^{iθp/2} e^{nπi}

where θp is restricted to lie in [0, 2π) and n = 0, 1 indexes the two possible values. The surprise
is that if we go in a loop around the origin, we can move from n = 0 to n = 1, and vice versa!

• We say z = 0 is a branch point; a loop traversed around a branch point can change the value of
a multivalued function. Similarly, the point z = ∞ is a branch point, as can be seen by taking
z = 1/t and going around the point t = 0.

• A multivalued function can be rendered single-valued and continuous in a subset of the plane
by choosing a branch. Often this is done by removing a curve, called a ‘branch cut’, from the
plane. In the case above, the branch cut is arbitrary, but must join the branch points z = 0
and z = ∞. This prevents curves from going around either of the branch points. (Generally,
but not always, branch cuts connect pairs of branch points.)

• Using stereographic projection, the branch points for w = z 1/2 are the North and South poles,
and the branch cut connects them.

• A second example is the logarithm function,

log z = log |z| + iθp + 2nπi

where n ∈ Z, and we take the logarithm of a real number to be single-valued. This function has
infinitely many branches, with a branch point at z = 0. It also has a branch point at z = ∞,
by considering log 1/z = − log z.

• For a rational power z^{m/l} with m and l relatively prime, we have

z^{m/l} = e^{(m/l) log z} = exp[(m/l)(log r + iθp)] exp[2πimn/l]

so that there are l distinct branches. For an irrational power, there are infinitely many branches.

Example. An explicit branch of the logarithm. Defining

w = log z, z = x + iy, w = u + iv

we have

e^{2u} = x^2 + y^2,  tan v = y/x.

The first can be easily inverted to yield u = log(x2 + y 2 )/2, which is single-valued because the real
log is, but the second is more subtle. For the inverse tangent of a real number, we customarily take
the branch so that the range is (−π/2, π/2). Then to maintain continuity of v, we set

v = tan−1 (y/x) + Ci , C1 = 0, C2 = C3 = π, C4 = 2π

in the ith quadrant. Then the branch cut is along the positive x axis. Finally, we differentiate:

d/dz log z = ux + i vx = (x − iy)/(x^2 + y^2) = 1/z

as expected.
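The quadrant-by-quadrant constants can be checked directly (a Python sketch; `log_branch` is our name, and the formula assumes x ≠ 0):

```python
import math, cmath

def log_branch(z):
    """The branch of log with imaginary part in [0, 2*pi), from the real formulas above."""
    x, y = z.real, z.imag
    u = 0.5 * math.log(x * x + y * y)
    v = math.atan(y / x)          # principal branch, range (-pi/2, pi/2); assumes x != 0
    if x > 0 and y >= 0:
        v += 0.0                  # first quadrant: C1 = 0
    elif x < 0:
        v += math.pi              # second and third quadrants: C2 = C3 = pi
    else:
        v += 2 * math.pi          # fourth quadrant: C4 = 2*pi
    return complex(u, v)

for z in (1 + 2j, -3 + 1j, -2 - 2j, 1 - 1j):
    w = log_branch(z)
    assert abs(cmath.exp(w) - z) < 1e-12                            # exp inverts it
    assert abs(w.imag - (cmath.phase(z) % (2 * math.pi))) < 1e-12   # v = arg z in [0, 2*pi)
```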

Example. A more complicated multivalued function. Let w = cos−1 z. We have

cos w = z = (e^{iw} + e^{−iw})/2
and solving this as a quadratic in eiw yields

eiw = z + i(1 − z 2 )1/2 → w(z) = −i log(z + i(1 − z 2 )1/2 ).

The function thus has two sources of multivaluedness. We have branch points at z = ±1 due to
the square root. There are no branch points due to the logarithm at finite z, because its argument
is never zero, but there is a branch point at infinity (as can be seen by substituting t = 1/z).
Intuitively, these branch points come from the fact that the cosine of x is the same as the cosine of
2π − x (for the square root) and the cosine of x + 2π (for the logarithm).

Example. Explicit construction of a more complicated branch. Consider

w = [(z − a)(z − b)]1/2 .

There are branch points at z = a and z = b, though one can check by setting t = 1/z that there is
no branch point at infinity. (Intuitively, going around the ‘point at infinity’ is the same as going around
both finite branch points, each of which contribute a phase of π.) To explicitly set a branch, define

z − b = r1 eiθ1 , z − a = r2 eiθ2

so that w ∝ e^{i(θ1 + θ2)/2}. A branch is thus specified by a choice of the ranges of θ1 and θ2 . For example, we may choose to
restrict 0 ≤ θi < 2π, which gives a branch cut between a and b, as shown below.

An alternative choice can send the branch cut through the point at infinity, which is more easily
visualized using stereographic projection. Similar reasoning can be used to handle any function
made of products of (z − xi )k .
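We can verify numerically that this choice of branch is continuous across the real axis outside [a, b] but flips sign across the segment itself (a Python sketch; the helper name `w_branch` is ours):

```python
import math, cmath

a, b = -1.0, 2.0

def w_branch(z):
    """Branch of ((z - a)(z - b))**(1/2) with both angles restricted to [0, 2*pi)."""
    t1 = cmath.phase(z - b) % (2 * math.pi)
    t2 = cmath.phase(z - a) % (2 * math.pi)
    r = math.sqrt(abs(z - a) * abs(z - b))
    return r * cmath.exp(0.5j * (t1 + t2))

eps = 1e-9j
# Continuous across the real axis to the right of b and to the left of a ...
assert abs(w_branch(3 + eps) - w_branch(3 - eps)) < 1e-4
assert abs(w_branch(-2 + eps) - w_branch(-2 - eps)) < 1e-4
# ... but flips sign across (a, b): that segment is the branch cut.
assert abs(w_branch(0.5 + eps) + w_branch(0.5 - eps)) < 1e-4
```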

Note. Branches can be visualized geometrically as sheets of Riemann surfaces, which are generated
by gluing copies of the complex plane together along branch cuts. The logarithm has an infinite
‘spiral staircase’ of such sheets, with each winding about the origin bringing us to the next.

Example. More flows. The potential Ω(z) = k log(z) with k real describes a source or sink at the
origin. Its derivative 1/z describes a dipole, i.e. a source and sink immediately adjacent.
By contrast, the potential Ω(z) = ik log(z) describes circulation about the origin. Here, the
multivaluedness of log(z) is crucial, because if the velocity potential were single-valued, then it
would be impossible to have net circulation along any path; instead going around the origin takes
us to another branch of the logarithm. (In multivariable calculus, we say that zero curl does not
imply that a function is a gradient, if the domain is not simply connected. Here, we can retain the
gradient function at the cost of making it multivalued.)

3.3 Contour Integration


Next, we turn to defining integration.

• A contour C in the complex plane can be parametrized as z(t). We will choose to work with
piecewise smooth contours, i.e. those where z 0 (t) is piecewise continuous.

• For convenience, we may sometimes require that C be simple, i.e. that it does not intersect
itself. This ensures that C winds about every point at most once.

• The contour integral of f along C is defined as

∫_C f(z) dz = ∫_a^b f(z(t)) z′(t) dt.

All the usual properties of integration apply; in particular the result is independent of parametriza-
tion. In the piecewise smooth case, we simply define the integral by splitting C into smooth
pieces.

• The ML inequality states that the magnitude of any contour integral is bounded by the product
of the supremum of |f (z)| on the contour and the length of the contour.

• Cauchy’s theorem: if f is analytic in a simply connected domain D, and f′ is continuous, then
along a simple closed contour C in D,

∮_C f(z) dz = 0.

Proof: in components, we have

∫_C f(z) dz = ∫_C (u dx − v dy) + i(v dx + u dy).

We can then apply Green’s theorem to the real and imaginary parts. Applying the Cauchy-
Riemann equations, the ‘curl’ is zero, giving the result. We need the simply connected hypothesis
to ensure that C does not contain points outside of D.

• Goursat’s theorem: Cauchy’s theorem holds without the assumption that f 0 is continuous.
Proof sketch: we break the integral down into the sum of integrals over contours with arbitrarily
small size. By Taylor’s theorem, the function can be expanded as the sum of a constant, linear,
and sublinear term within each small contour. The integrals of the first two vanish, while the
contributions of the sublinear terms go to zero in the limit of small contours.

• As a result, every analytic f in a simply connected domain has a primitive, i.e. a function F
with F′ = f, where for a contour C from a to b,

∫_C f(z) dz = F(b) − F(a).
We can construct the function F by simply choosing any contour connecting a to b.

Example. We integrate f (z) = 1/z over an arbitrary closed contour which winds around the origin
once. (Equivalently, any simple closed contour containing the origin.) Since f is analytic everywhere
besides the origin, we may freely deform the contour so that it becomes a small circle of radius r
about the origin. Then

∫_C dz/z = ∫_0^{2π} (i r e^{iθ})/(r e^{iθ}) dθ = 2πi.

This result can be thought of as due to having a multivalued primitive F (z) = log z, or due to the
hole at the origin. The analogous calculation for 1/z^n gives zero for n ≠ 1, as there are single-valued
primitives proportional to z^{1−n}.
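Both statements are easy to confirm numerically with a discretized contour (a Python sketch; `contour_integral` is our helper):

```python
import math, cmath

def contour_integral(f, zs):
    """Trapezoid-rule approximation of the integral of f(z) dz along the closed polyline zs."""
    return sum(0.5 * (f(zs[k]) + f(zs[k + 1])) * (zs[k + 1] - zs[k])
               for k in range(len(zs) - 1))

N = 4000
circle = [0.7 * cmath.exp(2j * math.pi * k / N) for k in range(N + 1)]

assert abs(contour_integral(lambda z: 1 / z, circle) - 2j * math.pi) < 1e-5
# 1/z^2 has the single-valued primitive -1/z, so its closed integral vanishes.
assert abs(contour_integral(lambda z: 1 / z ** 2, circle)) < 1e-5
```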

Example. Complex fluid flow again. The circulation along a curve and flow out of a curve are

Γ = ∫_C (vx dx + vy dy),  F = ∫_C (vx dy − vy dx).

Combining these, we find

Γ + iF = ∫_C Ω′(z) dz

where Ω is the complex velocity potential. This also provides some general intuition: multiplying by i
makes the circulation and flux switch places.

Example. Let P (z) be a polynomial with degree n and n simple roots, and let C be a simple
closed contour. We wish to evaluate

I = (1/2πi) ∮_C P′(z)/P(z) dz.

First note that if P(z) = A ∏_i (z − ai), then

P′(z)/P(z) = Σ_i 1/(z − ai).

Every root is thus a simple pole, so the integral is simply the number of roots in C. One way to
think of this is that the integrand is really d(log P ), and here log P has logarithmic branch points
at every root, each of which gives a change of 2πi.
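This root-counting integral can be checked numerically (a Python sketch; the polynomial and helper names are ours):

```python
import math, cmath

# P(z) = (z - 1)(z + 2)(z - 3i), with simple roots at 1, -2, 3i.
P = lambda z: (z - 1) * (z + 2) * (z - 3j)
dP = lambda z: (z + 2) * (z - 3j) + (z - 1) * (z - 3j) + (z - 1) * (z + 2)

def count_roots(R, N=20000):
    """(1/2*pi*i) * contour integral of P'/P over the circle |z| = R."""
    total = 0j
    for k in range(N):
        z0 = R * cmath.exp(2j * math.pi * k / N)
        z1 = R * cmath.exp(2j * math.pi * (k + 1) / N)
        total += 0.5 * (dP(z0) / P(z0) + dP(z1) / P(z1)) * (z1 - z0)
    return total / (2j * math.pi)

assert abs(count_roots(1.5) - 1) < 1e-3   # only z = 1 inside
assert abs(count_roots(2.5) - 2) < 1e-3   # z = 1 and z = -2
assert abs(count_roots(4.0) - 3) < 1e-3   # all three roots
```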

Example. Consider the integral

I = ∫_0^∞ e^{ix^2} dx.

We consider a contour integral of e^{iz^2} with a line from the origin to R, an arc to Re^{iπ/4}, and a line
back to the origin. The arc is exponentially suppressed and does not contribute in the limit R → ∞,
while the total integral is zero since the integrand is analytic. Thus

I = e^{iπ/4} ∫_0^∞ e^{−r^2} dr = e^{iπ/4} √π/2.

More generally, this shows that the standard Gaussian integral formula holds for any complex σ 2
as long as the integral converges.
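We can test this claim numerically for a complex width parameter (a Python sketch; `gauss` is our helper):

```python
import math, cmath

def gauss(a, L=10.0, N=20000):
    """Trapezoid approximation of the integral of exp(-a x^2) over [-L, L]."""
    h = 2 * L / N
    vals = [cmath.exp(-a * (-L + k * h) ** 2) for k in range(N + 1)]
    return h * (sum(vals) - 0.5 * vals[0] - 0.5 * vals[-1])

# For Re(a) > 0 the integral converges to sqrt(pi/a) (principal branch),
# even for complex a, i.e. an oscillatory Gaussian.
a = 1 - 1j
assert abs(gauss(a) - cmath.sqrt(math.pi / a)) < 1e-6
```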

Next, we introduce some more theoretical results.

• Cauchy’s integral formula states that if f (z) is analytic in and on a simple closed contour C,

f(z) = (1/2πi) ∮_C f(ξ)/(ξ − z) dξ.

Thus the value of an analytic function is determined by the values of points around it. The
proof is to deform the contour to a small circle about ξ = z, where the pole gives f (z). The
error term goes to zero by continuity and the ML inequality.

• As a corollary, if f (z) is analytic in and on C, then all of its derivatives exist, with

f^{(k)}(z) = (k!/2πi) ∮_C f(ξ)/(ξ − z)^{k+1} dξ.

Proof: we consider k = 1 first. The difference quotient is

(f(z + h) − f(z))/h = (1/2πi)(1/h) ∮_C f(ξ) [1/(ξ − (z + h)) − 1/(ξ − z)] dξ.

This gives the desired result, plus an error term

R = (h/2πi) ∮_C f(ξ) dξ/((ξ − z)^2 (ξ − z − h)).

For |ξ − z| > δ and |h| < δ/2, the integral is bounded by ML. Since h goes to zero, R goes
to zero as well. This also serves as a proof that f′(z) exists. The cases k > 1 are handled
inductively by similar reasoning.

• Intuitively, if we represent a complex function as a Taylor series, the general formulas above
simply pluck out individual terms of this series by shifting them over to 1/z.

• Applying the ML inequality above, with C a circle of radius R centered at z, yields the bound

|f^{(n)}(z)| ≤ n! M / R^n

where M is the maximum of |f (z)| on C.

• Liouville’s theorem: a bounded entire function must be constant.


Proof: suppose f is bounded and apply the bound above for n = 1. Then |f 0 (z)| ≤ M/R, and
taking R to infinity shows that f 0 (z) = 0, so f is constant.

• Morera: if f (z) is continuous in a domain D, and the integral of f around every closed contour
in D is zero, then f (z) is analytic in D.
Proof: we may construct a primitive F (z) by integration, with F′(z) = f (z). Since F is
holomorphic and hence automatically twice-differentiable, f is analytic.

• Fundamental theorem of algebra: every nonconstant polynomial P (z) has a root in C.


Proof: assume P has no roots. Since |P (z)| → ∞ for |z| → ∞, the function 1/P (z) is bounded
and entire, and hence constant by Liouville’s theorem. Then P (z) is constant.

• Mean value property: if f (z) is analytic on the set |z − z0 | ≤ r, then

f(z0) = (1/2π) ∫_0^{2π} f(z0 + re^{iθ}) dθ.

Intuitively, this is because the components of f are harmonic functions. It also follows directly
from Cauchy’s integral formula; the contour integral along the boundary is

f(z0) = (1/2πi) ∮_C f(z)/(z − z0) dz = (1/2πi) ∫_0^{2π} [f(z0 + re^{iθ})/(re^{iθ})] i re^{iθ} dθ = (1/2π) ∫_0^{2π} f(z0 + re^{iθ}) dθ.

As a corollary, if |f | has a relative maximum at some point, then f must be constant in a


neighborhood of that point.

• Maximum modulus: suppose f (z) is analytic in a bounded connected region A. If f is continuous


on A and its boundary, then either f is constant or the maximum of |f | occurs only on the
boundary of A.
Proof: the assumptions ensure |f | has an absolute maximum on A and its boundary by the
extreme value theorem. If the maximum is in the interior of A, then f is constant by the mean
value property.
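The derivative formula above lends itself to a direct numerical test (a Python sketch; `nth_derivative` is our helper):

```python
import math, cmath

def nth_derivative(f, z, n, r=1.0, N=8000):
    """f^(n)(z) via the Cauchy formula (n!/2*pi*i) * integral of f(xi)/(xi - z)^(n+1)."""
    total = 0j
    for k in range(N):
        xi = z + r * cmath.exp(2j * math.pi * k / N)
        dxi = 1j * (xi - z) * (2 * math.pi / N)   # d(xi) for this step of the circle
        total += f(xi) / (xi - z) ** (n + 1) * dxi
    return math.factorial(n) * total / (2j * math.pi)

# All derivatives of exp equal exp itself.
z = 0.3 + 0.2j
for n in (1, 2, 3):
    assert abs(nth_derivative(cmath.exp, z, n) - cmath.exp(z)) < 1e-8
```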

Example. We evaluate the integral

∮_C dz/(z^2 (1 − z))

around a small counterclockwise circle centered at z = 0. Naively, one might think the answer is
zero since the pole at z = 0 is a double pole, but 1/(1 − z) expands to 1 + z + . . .. Then the piece
with a simple pole is z/z^2, giving 2πi. Another approach is to use Cauchy’s integral formula with
f (z) = 1/(1 − z), which gives

(1/2πi) ∮_C f(z)/z^2 dz = f′(0) = 1

as expected.

3.4 Laurent Series


We begin by reviewing Taylor series. For simplicity, we center all series about z = 0.

• Previously, we have shown that a power series

f(z) = Σ_{n=0}^∞ an z^n,  α = lim sup_{n→∞} |an|^{1/n}

converges for |z| < R = 1/α and diverges for |z| > R. It is uniformly convergent on any disk
|z| ≤ r < R, so we may perform term-by-term integration and differentiation. For example,
the power series

Σ_{n=1}^∞ n an z^{n−1}

converges to f′(z), and also has radius of convergence R.

• We would like to show that a function’s Taylor series converges to the function itself. For an
infinitely-differentiable real function, Taylor’s theorem states that the error of omitting the nth
and higher terms is bounded as

error at x ≤ max_{x0 ∈ [0,x]} |f^{(n)}(x0)| x^n / n!.

One can show this error goes to zero as n goes to infinity for common real functions, such as
the exponential.

• For a complex differentiable function f , the Taylor series of f automatically converges to f


within its radius of convergence. This is a consequence of Cauchy’s integral formula.
To see this, let the Taylor series of f centered at zero have radius of convergence R. We consider
a circular contour of radius r2 < R and let |z| < r1 < r2 . Then

f(z) = (1/2πi) ∮ f(ξ)/(ξ − z) dξ = (1/2πi) ∮ Σ_{n=0}^∞ f(ξ) z^n/ξ^{n+1} dξ

where the geometric series is convergent since r1 < r2 . In particular, it is uniformly convergent,
so we can exchange the sum and the integral, giving

f(z) = Σ_{n=0}^∞ [(1/2πi) ∮ f(ξ)/ξ^{n+1} dξ] z^n = Σ_{n=0}^∞ [f^{(n)}(0)/n!] z^n.

Taking r1 arbitrarily close to R gives the result.

• Therefore, we say a function is analytic at a point if it admits a Taylor series about that point
with positive radius of convergence, and is equal to its Taylor series in a neighborhood of that
point. We have shown that a complex differentiable function is automatically analytic and thus
use the terms interchangeably.

• A function is singular at a point if it is not analytic there.

– The function log z has a singularity at z = 0 since it diverges there.



– More subtly, e^{−1/z^2} has a singularity at z = 0 since it is not equal to its Taylor series in
any neighborhood of z = 0.

• A singularity of a function is isolated if there is a neighborhood of that point, excluding the


singular point itself, where the function is analytic.

– The function 1/ sin(π/z) has singularities at z = 0 and z = 1/n for integer n, and hence
the singularity at z = 0 is not isolated.
– As a real function, the singularity at x = 0 of log x is not isolated since log x is not defined
for x < 0. As a single-valued complex function, the same holds because log z requires a
branch cut starting at z = 0.

• More generally, suppose that f (z) is complex differentiable in a region R, z0 ∈ R, and the disk
of radius r about z0 is contained in R. Then f converges to its Taylor series about z0 inside
this disk. The proof of this statement is the same as above, just for general z0 .

• The zeroes of an analytic function, real or complex, are isolated. We simply expand in a Taylor
series about the zero at z = z0 and pull out factors of z − z0 until the series is nonzero at z0 .
The remaining series is nonzero in a neighborhood of z0 by continuity.

Next, we turn to Laurent series.

• Suppose f (z) is analytic on the annulus A = {r1 < |z| < r2 }. Then we claim f (z) may be
written as a Laurent series

f(z) = Σ_{n=1}^∞ bn/z^n + Σ_{n=0}^∞ an z^n

where the two parts are called the singular/principal and analytic/regular parts, and converge
to analytic functions for |z| > r1 and |z| < r2 , respectively.

• The proof is similar to our earlier proof for Taylor series. Let z ∈ A and consider the contour
consisting of a counterclockwise circle C1 of radius greater than |z| and a clockwise circle C2 of
radius less than |z|, both lying within the annulus. By Cauchy’s integral formula, writing both
circles counterclockwise,

f(z) = (1/2πi) ∮_{C1} f(ξ)/(ξ − z) dξ − (1/2πi) ∮_{C2} f(ξ)/(ξ − z) dξ = (1/2πi) ∮_{C1} Σ_{n=0}^∞ f(ξ) z^n/ξ^{n+1} dξ + (1/2πi) ∮_{C2} Σ_{n=0}^∞ f(ξ) ξ^n/z^{n+1} dξ

where both geometric series (in z/ξ and ξ/z, respectively) are convergent. These give the analytic
and singular pieces of the Laurent series, respectively.

• From this proof we also read off integral expressions for the coefficients,

an = (1/2πi) ∮ f(ξ)/ξ^{n+1} dξ,  bn = (1/2πi) ∮ f(ξ) ξ^{n−1} dξ.
Unlike for Taylor series, none of these coefficients can be expressed in terms of derivatives of f .

• In practice, we use series expansions and algebraic manipulations to determine Laurent series,
though we must use series that converge in the desired annulus.

• Suppose f (z) has an isolated singularity at z0 , so it has a Laurent series expansion about z0 .

– If all of the bn are zero, then z0 is a removable singularity. We may define f (z0 ) = a0 to
make f analytic at z0 . Note that by the ML inequality, this is guaranteed if f is bounded near z0 .
– If a finite number of the bn are nonzero, we say z0 is a finite pole of f (z). If bk is the highest
nonzero coefficient, the pole has order k. A finite pole with order 1 is a simple pole, or
simply a pole. The residue Res(f, z0 ) of a finite pole is b1 .
– Finite poles are nice, because functions with only finite poles can be made analytic by
multiplying them with polynomials.
– If an infinite number of the bn are nonzero, z0 is an essential singularity. For example, z = 0
is an essential singularity of e^{−1/z^2}. Essential singularities behave very badly; Picard’s
theorem states that the function takes on all possible values infinitely many times, with at
most one exception, in any neighborhood of z0 .
– A function that is analytic on some region with the exception of a set of poles of finite
order is called meromorphic.
– Note that all of these definitions apply only to isolated poles. For example, the logarithm
has a branch cut starting at z = 0, so the order of this singularity is not defined.

Example. The function f (z) = 1/(z(z − 1)) has poles at z = 0 and z = 1, and hence has a Laurent
series about z = 0 for 0 < |z| < 1 and 1 < |z| < ∞. In the first case, the result can be found by
geometric series,

f(z) = −(1/z) · 1/(1 − z) = −(1/z)(1 + z + z^2 + . . .).

We see that the residue of the pole at z = 0 is −1. In the second case, this series expansion does
not converge; we instead expand in 1/z for the completely different series

f(z) = 1/(z^2 (1 − 1/z)) = (1/z^2)(1 + 1/z + 1/z^2 + . . .).

In particular, note that there is no 1/z term because the residues of the two (simple) poles cancel
out, as can be seen by partial fractions; we cannot use this Laurent series to compute the residue
of the z = 0 pole.
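The coefficient formulas above make this concrete: integrating over a circle inside each annulus picks out that annulus's Laurent coefficients (a Python sketch; `laurent_coeff` is our helper):

```python
import math, cmath

f = lambda z: 1 / (z * (z - 1))

def laurent_coeff(f, n, r, N=20000):
    """c_n = (1/2*pi*i) * integral of f(xi)/xi^(n+1) over the circle |xi| = r."""
    total = 0j
    for k in range(N):
        xi = r * cmath.exp(2j * math.pi * k / N)
        total += f(xi) / xi ** (n + 1) * (1j * xi) * (2 * math.pi / N)
    return total / (2j * math.pi)

# Annulus 0 < |z| < 1: f = -1/z - 1 - z - ..., so the 1/z coefficient is -1.
assert abs(laurent_coeff(f, -1, 0.5) + 1) < 1e-8
# Annulus |z| > 1: f = 1/z^2 + 1/z^3 + ..., with no 1/z term at all.
assert abs(laurent_coeff(f, -1, 2.0)) < 1e-8
assert abs(laurent_coeff(f, -2, 2.0) - 1) < 1e-8
```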

Example. Going to the complex plane gives insight into why some real Taylor series fail. First,
consider f (x) = 1/(1 + x2 ) about x = 0. This Taylor series breaks down for |x| ≥ 1 even though
the function itself is not singular at all. This is explained by the poles at z = ±i in the complex
plane, which set the radius of convergence.
As another example, e^{−1/x^2} does not appear to be pathological on the real line at first glance.
One can see that it is not analytic because its high-order derivatives blow up, but an easier way is
to note that when approached along the imaginary axis, the function becomes e^{1/x^2}, which diverges
very severely at x = 0.

Next, we give some methods for computing residues, all proven with Laurent series.

• If f has a pole of order n at z0 , then

Res(f, z0) = lim_{z→z0} (1/(n−1)!) (d/dz)^{n−1} [(z − z0)^n f(z)],

which for a simple pole reduces to Res(f, z0) = lim_{z→z0} (z − z0) f(z).

• If f has a simple pole at z0 and g is analytic at z0 , then


Res(f g, z0 ) = g(z0 )Res(f, z0 ).

• If g(z) has a simple zero at z0 , then 1/g(z) has a simple pole at z0 with residue 1/g 0 (z0 ).
• In practice, we can find the residue of a function defined from functions with Laurent series
expansions by taking the Laurent series of everything, expanding, and finding the 1/z term.
• Suppose that f is analytic in a region R except for a set of isolated singularities. Then if C is
a closed curve in R that doesn’t go through any of the singularities,

∮_C f(z) dz = 2πi Σ (residues of f in C, counted with multiplicity).

This is the residue theorem, and it can be shown by deforming the contour to a set of small
circles about each singularity, expanding in a Laurent series about each one, and using the
Cauchy integral formula.
Example. Find the residue at z = 0 of f (z) = sinh(z)e^z/z^5. The answer is the z^4 term of the
Laurent series of sinh(z)e^z, and

sinh(z)e^z = (z + z^3/3! + . . .)(1 + z + z^2/2! + z^3/3! + . . .) = . . . + (1/3! + 1/3!) z^4 + . . .

giving the residue 1/3.
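As an independent check, sinh(z)e^z = (e^{2z} − 1)/2, whose z^4 coefficient is 2^4/(2 · 4!) = 1/3, and a numerical contour integral agrees (a Python sketch; `residue_at_zero` is our helper):

```python
import math, cmath

f = lambda z: cmath.sinh(z) * cmath.exp(z) / z ** 5

def residue_at_zero(f, r=0.5, N=20000):
    """Residue as (1/2*pi*i) * contour integral of f over the small circle |z| = r."""
    total = 0j
    for k in range(N):
        z = r * cmath.exp(2j * math.pi * k / N)
        total += f(z) * (1j * z) * (2 * math.pi / N)
    return total / (2j * math.pi)

assert abs(residue_at_zero(f) - 1 / 3) < 1e-8
```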
Example. The function cot(z) = cos(z)/ sin(z) has a simple pole of residue 1 at z = nπ for all
integers n. To see this, note that sin(z) has simple zeroes at z = nπ and its derivative is cos(z), so
1/ sin(z) has residues of 1/ cos(nπ). Multiplying by cos(z), which is analytic everywhere, cancels
these factors giving a residue of 1.
Example. Compute the contour integral along the unit circle of z^2 sin(1/z). There is an essential
singularity at z = 0, but this doesn’t change the computation. The Laurent series for sin(1/z) is

sin(1/z) = 1/z − 1/(3! z^3) + . . .

which gives z^2 sin(1/z) a residue of −1/6, so the integral is −πi/3.
Example. The residue at infinity. Suppose that f is analytic in C with a finite number of singular-
ities, and a curve C encloses every singularity once. Then the contour integral along C is 2πi times
the sum of all the residues. On the other hand, we can formally think of the interior of the contour
as the exterior; then we get the same result if we postulate a pole at infinity with residue

Res(f, ∞) = −(1/2πi) ∮_C f(z) dz.

To compute this quantity, substitute z = 1/w to find

Res(f, ∞) = (1/2πi) ∮_C f(1/w)/w^2 dw

where C is now negatively oriented. Now, f (1/w) has no poles inside the curve C, so the only
possible pole is at w = 0. Then

Res(f, ∞) = −Res(f(1/w)/w^2, 0)
which may be much easier to compute. Under this language, z has a simple pole at infinity, while
ez has an essential singularity at infinity.

3.5 Application to Real Integrals


In this section, we apply our theory to the evaluation of real integrals.

• In order to express real integrals over the real line in terms of contour integrals, we will have to
close the contour. This is easy if the decay is faster than 1/|z| in either the upper or lower-half
plane by the ML inequality.

• Another common situation is when the function is oscillatory, e.g. it is of the form f (z)e^{ikz}. If
|f (z)| does not decay faster than 1/|z|, the ML inequality does not suffice. However, since the
oscillation on the real axis translates to decay in the imaginary direction, if we use a square
contour bounded by z = ±L, the vertical sides are bounded by |f (L)|/k and the top side is
exponentially small, so the contributions vanish as L → ∞ as desired.

Example. We compute

I = ∫_{−∞}^∞ dx/(x^4 + 1).

We close the contour in the upper-half plane; the decay is O(1/|z|^4) so the semicircle does not
contribute. The two poles are at z1 = e^{iπ/4} and z2 = e^{3iπ/4}. An easy way to compute the residues
is with L’Hopital’s rule,

Res(f, z1) = lim_{z→z1} (z − z1)/(1 + z^4) = lim_{z→z1} 1/(4z^3) = e^{−3iπ/4}/4,  Res(f, z2) = e^{−iπ/4}/4,  I = π/√2.
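This value matches direct numerical integration (a Python sketch; the tail correction 2/(3L^3) approximates the omitted |x| > L contribution):

```python
import math

f = lambda x: 1 / (x ** 4 + 1)

L, N = 50.0, 20000
h = 2 * L / N
I = sum(f(-L + k * h) for k in range(1, N)) * h + 0.5 * (f(-L) + f(L)) * h
I += 2 / (3 * L ** 3)   # tail: integrand ~ x^-4 for |x| > L

assert abs(I - math.pi / math.sqrt(2)) < 1e-4
```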
Example. For b > 0, we compute

I = ∫_{−∞}^∞ cos(x)/(x^2 + b^2) dx.
For convenience we replace cos(x) with eix and take the real part at the end. Now, the function
decays faster than 1/|z| in the upper-half plane, so we close the contour there. The contour contains
the pole at z = ib which has residue e−b /2ib, giving I = πe−b /b.

Example. Integrals over angles can be replaced with contour integrals over the unit circle. We let

z = e^{iθ},  dz = iz dθ,  cos θ = (z + 1/z)/2,  sin θ = (z − 1/z)/(2i).

For example, we can compute

I = ∫_0^{2π} dθ/(1 + a^2 − 2a cos θ),  |a| ≠ 1.

Making the above substitutions and some simplifications, we have

I = ∮_C dz/(−ia(z − a)(z − 1/a)) = 2π/(a^2 − 1) for |a| > 1, and −2π/(a^2 − 1) for |a| < 1.
It is clear this method works for any trigonometric integral over [0, 2π).
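For instance, the result above matches a direct Riemann sum over θ (a Python sketch; `trig_integral` is our helper):

```python
import math

def trig_integral(a, N=20000):
    """Riemann sum of the integral of 1/(1 + a^2 - 2a cos(theta)) over [0, 2*pi)."""
    h = 2 * math.pi / N
    return sum(h / (1 + a * a - 2 * a * math.cos(k * h)) for k in range(N))

assert abs(trig_integral(0.5) - 2 * math.pi / (1 - 0.5 ** 2)) < 1e-6   # |a| < 1
assert abs(trig_integral(3.0) - 2 * math.pi / (3.0 ** 2 - 1)) < 1e-6   # |a| > 1
```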

Example. An integral with a branch cut. Consider

I = ∫_0^∞ x^{1/3}/(1 + x^2) dx.

We will place the branch cut on the positive real axis, so that for z = reiθ , we have

z 1/3 = r1/3 eiθ/3 , 0 ≤ θ < 2π.

We choose a keyhole contour that avoids the branch cut.

The desired integral I is the integral over C1 , while the integrals over Cr and CR go to zero. The
integral over C2 is on the other end of the branch cut, and hence is −e^{2πi/3} I. Finally, including the
contributions of the two poles gives I = π/√3.
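The value π/√3 can be confirmed numerically; substituting x = u^3 removes the fractional power, leaving a smooth integrand (a Python sketch, with our own tail correction):

```python
import math

# With x = u^3: integral of x^(1/3)/(1 + x^2) dx = integral of 3u^3/(1 + u^6) du over [0, inf).
g = lambda u: 3 * u ** 3 / (1 + u ** 6)

L, N = 200.0, 40000
h = L / N
I = sum(g(k * h) for k in range(1, N)) * h + 0.5 * (g(0) + g(L)) * h
I += 3 / (2 * L ** 2)   # tail: integrand ~ 3 u^-3 for u > L

assert abs(I - math.pi / math.sqrt(3)) < 1e-4
```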
Example. The Cauchy principal value. We compute

I = ∫_{−∞}^∞ sin(x)/x dx.

This is the imaginary part of the contour integral

I′ = ∫_C e^{iz}/z dz

where the contour along the real line is closed by a semicircle. The integrand blows up along the
contour, since it goes through a pole; to fix this, we define the principal value of the integral I′
to be the limit limr→0 I′(r) where a circle of radius r about the origin is deleted from the contour.
This is equal to I because the integrand sin(x)/x is not singular at the origin; in more general cases
where the original integrand is singular, the value of the integral is defined as the principal value.

Now consider the contour above. In the limit r → 0, we have I′ = πi because it picks up “half of
the pole”, giving I = π. More generally, if the naive contour “slices” through a pole, the principal
value picks up i times the residue times the angle subtended.
Note. The idea of a principal value works for both real and complex integrals. In the case of a real
integral, we delete a small segment centered about the divergence. The principal value also exists
for integrals with bounds at ±∞, by setting the bounds to −R and R and taking R → ∞.

3.6 Conformal Transformations


In this section, we apply conformal transformations.

• A conformal map on the complex plane f (z) is a map so that the tangent vectors at any point z0
are mapped to the tangent vectors at f (z0 ) by a nonzero scaling and proper rotation. Informally,
this means that conformal maps preserve angles.

• As we’ve seen, f (z) is automatically conformal if it is holomorphic with nonzero derivative; the
scaling and rotation factor is f 0 (z0 ).

• The Riemann mapping theorem states that if a region A is simply connected, and not the entire
complex plane, then there exists a bijective conformal map between A and the unit disk; we
say the regions are conformally equivalent.

• The proof is rather technical, but is useful to note a few specific features.

– We cannot take A = C, by Liouville’s theorem.


– There are three real degrees of freedom in the map, which corresponds to the fact that
there is a three-parameter family of maps from the unit disk to itself.
– If A is bounded by a simple closed curve, which may pass through the point at infinity, we
may use these degrees of freedom to specify the images of three boundary points.
– Alternatively, we may specify the image of one interior point of A, and the image of a
direction at that point.
– In practice, we could use “canonical domains” other than the unit disc; one common one
is the upper half-plane, in which case we usually fix points to map to 0, 1, and ∞.
– The theorem guarantees the mapping is conformal in the interior of A, but not necessarily
its boundary, where singularities are needed to smooth out corners and cusps.
– Since conformal maps preserve angles, including their orientation, a curve traversing ∂A
with the interior of A to its right maps to a curve traversing the image of ∂A satisfying the
same property.

• A useful set of conformal transformations are the fractional linear transformations, or Mobius
transformations

T(z) = (az + b)/(cz + d),  ad − bc ≠ 0.

Note that when ad − bc = 0, then T (z) is constant. Mobius transformations can also be taken
to act on the extended complex plane, with

T(∞) = a/c,  T(−d/c) = ∞.
They are bijective on the extended complex plane, and conformal everywhere except z = −d/c.

• When c = 0, we get scalings and rotations. The map T (z) = 1/z flips circles inside and outside
of the unit circle. As another example,

T(z) = (z − i)/(z + i)

maps the real axis to the unit circle, and hence maps the upper half-plane to the unit disk.

• In general, a Mobius transformation maps generalized circles to generalized circles, where
generalized circles include straight lines. To show this, note that it is true for scaling and
rotation, so we only need to prove it for inversions, which can be done in components. For
example, inversion maps a circle passing through the origin to a line that doesn’t.

• A very useful fact is that Mobius transformations can be identified with matrices,

T(z) ↔ [a b; c d]

so that composition of Mobius transformations is matrix multiplication. Since we can always
scale ad − bc to one, and then further multiply all the coefficients by −1, the set of Mobius
transformations is P SL2 (C) = SL2 (C)/{±I}.

• The subset of Mobius transformations that map the upper half-plane to itself turn out to be
the ones where a, b, c, and d are all real, and ad − bc = 1. Then the group of conformal
automorphisms of the upper half-plane contains P SL2 (R).

• In fact, these are all of the conformal automorphisms of the upper half-plane. To prove this,
one typically shows using the Schwarz lemma that the conformal automorphisms of the disk
take the form

T(z) = λ (z − a)/(āz − 1),  |λ| = 1,  |a| < 1

and then notes that the upper half-plane is conformally equivalent to the disk.

• Given any three distinct points (z1 , z2 , z3 ), there exists a Mobius transformation that maps
them to (w1 , w2 , w3 ). To see this, note that we can map (z1 , z2 , z3 ) to (0, 1, ∞) by

T(z) = ((z − z1)(z2 − z3))/((z − z3)(z2 − z1))

and this map is invertible, giving the result.
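This map is simple to exercise directly (a Python sketch; `to_zero_one_inf` is our name):

```python
def to_zero_one_inf(z1, z2, z3):
    """The Mobius map T above, with T(z1) = 0, T(z2) = 1, T(z3) = infinity."""
    return lambda z: ((z - z1) * (z2 - z3)) / ((z - z3) * (z2 - z1))

z1, z2, z3 = 2 + 1j, -1j, 4 + 0j
T = to_zero_one_inf(z1, z2, z3)

assert abs(T(z1)) < 1e-12
assert abs(T(z2) - 1) < 1e-12
assert abs(T(z3 + 1e-9)) > 1e6    # T blows up at z3, its pole
```

Composing the inverse of one such map with another then carries any triple (z1, z2, z3) to any triple (w1, w2, w3), as in the text.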

Note. A little geometry. The reflection of a point in a line is the unique point so that any generalized circle that goes through both points intersects the line perpendicularly. We define the reflection of a point in a generalized circle in the same way. To prove that this reflection is unique, note that since Mobius transformations preserve generalized circles and angles, they preserve the reflection property; hence we can use a Mobius transformation to map a given circle to a line, then use uniqueness of reflection in a line.
Reflection in a circle is called inversion in the context of Euclidean geometry. Our “inversion” map z ↦ 1/z is close, but it actually corresponds to an inversion about the unit circle followed by a reflection about the real axis. The inversion alone would be z ↦ 1/z̄.

Example. Suppose two circles C1 and C2 do not intersect; we would like to construct a conformal
mapping that makes them concentric. To do this, let z1 and z2 be reflections of each other in both
circles – it is easier to see such points exist by mapping C1 to a line and then mapping back. Now,
by a conformal transformation we can map z1 to zero and z2 to infinity, which means both image circles must be centered at zero.

Example. Find a map from the upper half-plane with a semicircle removed to a quarter-plane.

We will use a Mobius transformation. The trick is to look at how the boundary must be mapped. Label the corners of the region A = −1 and C = 1, with B on the semicircle and D on the real axis. We have right angles at A and C, but only one right angle in the image; we can achieve this by mapping A to infinity and C to zero, so

z ↦ ζ = (z − 1)/(z + 1).
To verify the boundary is correct, we note that ABC and CDA are still generalized circles after
the mapping, and verify that B and D are mapped into the imaginary and real axes, respectively.
More generally, if we need to change the angle at the origin by a factor α, we can compose with the map z ↦ z^α.
Example. Map the upper half plane to itself, permuting the points (0, 1, ∞). We must use Mobius
maps with real coefficients. Since orientation is preserved, we can only perform even permutations.
The answers are

ζ = 1/(1 − z),   (0, 1, ∞) ↦ (1, ∞, 0)

and

ζ = (z − 1)/z,   (0, 1, ∞) ↦ (∞, 0, 1).
Example. The Dirichlet problem is to find a harmonic function on a region A given specified values
on the boundary of A. For example, let A be the unit disk with boundary condition
u(e^{iθ}) = 1 for 0 < θ < π,   u(e^{iθ}) = 0 for π < θ < 2π.

The problem can be solved by conformal mapping. The map T (z) = (z − i)/(z + i) sends the real axis to the unit circle and the upper half-plane to the unit disk, so T −1 maps A to the upper half-plane, with boundary condition u(x) = θ(−x) where θ is the step function. An explicit solution there is u = arg(z)/π = Im(log z)/π.
More generally, consider a piecewise constant boundary condition u(e^{iθ}). Then the conformally transformed solution is a sum of pieces of the form log(z − x0 ). An arbitrary boundary condition translates to a weighted integral of log(z − x) over real x.
Example. The general case of flow around a circle. Suppose f (z) is a complex velocity potential. Singularities of the potential correspond to sources or vortices. If there are no singularities for |z| < R, then the Milne-Thomson circle theorem states that

Φ(z) = f (z) + f̄ (R²/z)

is a potential for a flow with a streamline on |z| = R and the same singularities, where f̄ denotes f with its Taylor coefficients conjugated (equivalently, f̄ (w) is the conjugate of f (w̄)). This is what the potential would be if we introduced a circular obstacle but kept everything else the same. We’ve already seen the specific example of uniform flow around a circle, where f (z) = z.

To see this, note that f (z) may be expanded in a Taylor series

f (z) = a0 + a1 z + a2 z² + . . .

which converges for |z| ≤ R. Then f̄ (R²/z) has a Laurent series

f̄ (R²/z) = ā0 + ā1 (R²/z) + ā2 (R²/z)² + . . .

which converges for |z| ≥ R, so no new physical singularities are introduced by adding it. To see that |z| = R is a streamline, note that

Φ(Re^{iθ}) = f (Re^{iθ}) + f̄ (Re^{−iθ}) = 2 Re f (Re^{iθ}) ∈ R.

Then the stream function Im Φ has a level set on |z| = R, namely zero.
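The streamline property is easy to verify numerically; the sketch below uses an example potential of our own choosing, with a singularity at z = 3 outside the circle |z| = 1.

```python
import numpy as np

R = 1.0
f = lambda z: z + 1.0 / (z - 3.0)   # sample potential, singular only outside |z| = R

def Phi(z):
    # Milne-Thomson: the image term is the conjugate of f evaluated at R^2 / conj(z)
    return f(z) + np.conj(f(R**2 / np.conj(z)))

theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = R * np.exp(1j * theta)
# the stream function Im(Phi) vanishes on |z| = R
assert np.max(np.abs(Phi(circle).imag)) < 1e-12
```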
Example. The map f (z) = e^{iz} takes the half-strip Im(z) > 0, Re(z) ∈ (−π/2, π/2) to the right half-disc. In general, since the complex exponential is 2π-periodic, it is useful for mapping from strips. The logarithm f (z) = log z maps to strips: for example, it takes the upper half-plane to the strip Im(z) ∈ (0, π), and the upper half-disc to the left half of this strip.
Example. The Joukowski map is

f (z) = (1/2)(z + 1/z).
This map takes the unit disc to the entire complex plane; to see this, we simply note that the unit circle is mapped to the slit x ∈ [−1, 1]. This does not contradict the Riemann mapping theorem, because f (z) is singular at z = 0. We create corners at z = ±1, which is acceptable because f ′ vanishes at these points. Since the Joukowski map obeys f (z) = f (1/z), the region outside the unit disc is also mapped to the complex plane. The Joukowski transform is useful in aerodynamics, because it maps off-center circles to shapes that look like airfoils. The flow past these airfoils can be solved by applying the inverse transform, since the flow around a circular cylinder is known analytically.

3.7 Additional Topics


Next, we introduce the argument principle, which is useful for counting poles and zeroes.

• Previously, we have restricted to simple closed curves, as these wind about any point at most once. However, we may now define the winding number or index

Ind(γ, z0 ) = (1/2πi) ∫γ dz/(z − z0 )

for any closed curve γ that does not pass through z0 . This follows from Cauchy’s integral theorem; intuitively, the integrand is d(log(z − z0 )) and hence counts the number of windings by the net phase change.
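A direct numerical check of the index, using our own discretization and a circle traversed twice:

```python
import numpy as np

def winding_number(z, dz, z0, n=4096):
    # Riemann-sum estimate of (1 / 2 pi i) times the integral of dz / (z - z0)
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    return np.mean(dz(t) / (z(t) - z0)) / (2j * np.pi)

z  = lambda t: np.exp(4j * np.pi * t)               # unit circle traversed twice
dz = lambda t: 4j * np.pi * np.exp(4j * np.pi * t)  # its derivative in t
assert abs(winding_number(z, dz, 0) - 2) < 1e-6     # winds twice around 0
assert abs(winding_number(z, dz, 3 + 0j)) < 1e-6    # does not wind around 3
```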
• For an integer power f (z) = zⁿ and a curve with Ind(γ, 0) = 1, we have

∫γ f ′ (z)/f (z) dz = 2πin.

This is because the integrand is df /f , so it counts the winding number of f about the origin along the curve. Moreover, (f g)′ /(f g) = f ′ /f + g ′ /g, so other zeroes or poles contribute additively.

• Formalizing this result, for a meromorphic function f and a simple closed curve γ not going through any of its zeroes or poles, we have the argument principle

∫γ f ′ (z)/f (z) dz = 2πi(zeroes minus poles) = 2πi Ind(f ◦ γ, 0)

where the zeroes and poles are weighted by their order.

• Rouche’s theorem states that for meromorphic functions f and h and a simple closed curve γ
not going through any of their poles, if |h| < |f | everywhere on γ, then
Ind(f ◦ γ, 0) = Ind((f + h) ◦ γ, 0).
Intuitively, this follows from the picture of a ‘dog on a short leash’ held by a person walking around a tree. It can be shown using the argument principle: interpolating between f and f + h, the condition |h| < |f | ensures f + th never vanishes on γ, so the index varies continuously, and being an integer it must stay the same.

• A useful corollary of Rouche’s theorem is the case of holomorphic f and h, which gives

(zeroes of f in γ) = (zeroes of f + h in γ).

For example, to show that z⁵ + 3z + 1 has all five of its zeroes within |z| = 2, take f = z⁵ and h = 3z + 1; on the circle, |f | = 32 > 7 ≥ |h|, so z⁵ + 3z + 1 has the same number of zeroes inside as z⁵ , namely five.
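We can confirm the count for this example numerically, by evaluating the argument-principle integral on |z| = 2 (an illustrative check, not part of the proof):

```python
import numpy as np

p  = lambda z: z**5 + 3*z + 1
dp = lambda z: 5*z**4 + 3

t = np.linspace(0.0, 1.0, 8192, endpoint=False)
z = 2.0 * np.exp(2j * np.pi * t)            # the circle |z| = 2
dz = 4j * np.pi * np.exp(2j * np.pi * t)    # its derivative in t
count = np.mean(dp(z) / p(z) * dz) / (2j * np.pi)
assert abs(count - 5) < 1e-6                # all five zeroes lie inside

# cross-check with a direct root finder
assert np.all(np.abs(np.roots([1, 0, 0, 0, 3, 1])) < 2)
```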

• This same reasoning provides a different proof of the fundamental theorem of algebra. We let f (z) ∝ zⁿ be the highest-order term in the polynomial and let h be the rest. Then within a sufficiently large circle, f + h must contain n zeroes.

Next, we discuss analytic continuation.

• Suppose that f is holomorphic in a connected region R and vanishes on a sequence of distinct points {wi } with a limit point in R. Then f is identically zero.
To see this, suppose that f is not identically zero. Then it has a Taylor series expansion about the limit point, but we’ve shown that the zeroes of a nonzero function with a Taylor series are isolated, a contradiction.

• As a corollary, if f and g are holomorphic on a connected region R and agree on a set of points
with a limit point in R, then they are equal. An analytic continuation of a real function is
a holomorphic function that agrees with it on the real axis; this result ensures that analytic
continuation is unique, at least locally.

• One must be more careful globally. For example, consider the two branches of the logarithm with branch cuts along the positive and negative real axes, respectively. The two functions agree in the first quadrant, but we cannot conclude they agree in the fourth quadrant, because the region where they are both defined is the complex plane minus the real axis, which is not connected.

• These global issues are addressed by the monodromy theorem, which states that analytic continuation is unique (i.e. independent of path) if the domain we use is simply connected. This does not help for the logarithm: it is nonanalytic at the origin, so any domain winding around the origin fails to be simply connected.

• As another example, the factorial function doesn’t have a unique analytic continuation, because
the set of positive integers doesn’t have a limit point. But the gamma function, defined as an
integral expression for positive real arguments, does have a unique analytic continuation. (This
statement is sometimes mangled to “the gamma function is the unique analytic continuation
of the factorial function”, which is incorrect.)

• Consider a Taylor series with radius of convergence R. This defines a holomorphic function
within a disk of radius R and hence can be analytically continued, e.g. by taking the Taylor
series about a different point in the disk.

• As an example where this fails, consider f (z) = z + z² + z⁴ + z⁸ + . . ., which has radius of convergence 1. The function satisfies the recurrence relation f (z) = z + f (z²), which implies that f (1) is divergent. By repeatedly applying this relation, we see that f (z) is divergent if z^(2ⁿ) = 1 for some n, so the divergences are dense on the boundary of the unit disk. These divergences form a ‘natural boundary’ beyond which analytic continuation is not possible.

4 Linear Algebra
4.1 Exact Sequences
In this section, we rewrite basic linear algebra results using exact sequences. For simplicity, we only
work with finite-dimensional vector spaces.

• Consider vector spaces Vi and maps ϕi : Vi → Vi+1 , which define a sequence

. . . → Vi−1 −ϕi−1→ Vi −ϕi→ Vi+1 → . . . .

We say the sequence is exact at Vi if im ϕi−1 = ker ϕi . The general intuition is that this means Vi is ‘made up’ of its neighbors Vi−1 and Vi+1 .

• We write 0 for the zero-dimensional vector space. For any other vector space V , there is only
one possible linear map from V to 0, or from 0 to V .

• A short exact sequence is an exact sequence of the form

0 → V1 −ϕ1→ V2 −ϕ2→ V3 → 0.

The sequence is exact at V1 iff ϕ1 is injective, and exact at V3 iff ϕ2 is surjective.

• As an example, the exact sequence

0 → V1 −ϕ→ V2 → 0

requires ϕ to be an isomorphism.

• If T : V → W is surjective, then we have the exact sequence

0 → ker T −i→ V −T→ W → 0

where i is the inclusion map.

• Given this short exact sequence, there exists a linear map S : W → V so that T ◦ S = 1. We
say the exact sequence splits, and that S is a section of T . To see why S exists, take any basis
{fi } of W . Then there exist ei so that T (ei ) = fi , and we simply define S(fi ) = ei .

• Using the splitting, we have the identity

V = ker T ⊕ S(W )

which is a refinement of the rank-nullity theorem; this makes it clear exactly how V is determined by its neighbors in the short exact sequence. Note that for any short exact sequence we have dim V2 = dim V1 + dim V3 , but by using the splitting, we get a direct decomposition of V itself.

• It is tempting to write V = ker T ⊕ W , but this is technically incorrect because W is not a


subspace of V . We will often ignore this distinction below.

Example. Quotient spaces. Given a subspace U ⊂ V we define the equivalence relation v ∼ w if v − w ∈ U . The set of equivalence classes [v] is called V /U and we define the projection map π : V → V /U by π(v) = [v]. Then we have an exact sequence

0 → U −i→ V −π→ V /U → 0

which implies dim(V /U ) = dim V − dim U .

Example. The kernel of T : V → W measures the failure of T to be injective; the cokernel coker T = W/ im T measures the failure of T to be surjective. Then we have the exact sequence

0 → ker T −i→ V −T→ W −π→ coker T → 0

where π projects out im T .

Example. Exact sequences and chain complexes. Consider a chain complex with boundary operator ∂. The condition im ϕi−1 ⊂ ker ϕi states that the composition ϕi ◦ ϕi−1 takes everything to zero, so ∂² = 0. The condition ker ϕi ⊂ im ϕi−1 implies that the homology is trivial. Thus, homology measures the failure of the chain complex to be exact.

Next, we prove a familiar theorem using the language of exact sequences.

Example. We claim every space with a symmetric nondegenerate bilinear form g has an orthonormal basis, i.e. a set {vi } where g(vi , vj ) = ±δij . We prove only the real case for simplicity. Let dim V = k and suppose we have an orthonormal set of k − 1 vectors ei , spanning a subspace W . Defining the projection map

π(v) = Σi g(ei , ei )g(ei , v)ei ,   summing over i = 1, . . . , k − 1,

we have the exact sequence

0 → W⊥ −i→ V −π→ W → 0

where W ⊥ = ker π is the orthogonal complement of W . Now, we claim that g is nondegenerate when restricted to W ⊥ . To see this, note that if w1 ∈ W ⊥ satisfies g(w1 , w2 ) = 0 for all w2 ∈ W ⊥ , then since also g(w1 , w) = 0 for all w ∈ W , we have g(w1 , v) = 0 for all v ∈ V , so w1 must be zero by nondegeneracy. The result follows by induction.
We can also give a more direct proof. Given a basis {vi }, define the Gram matrix G to have components

Gij = g(vi , vj ).

In the context of physics, this is simply the metric in matrix form. Then the form is nondegenerate if and only if G has trivial nullspace, since

Gx = 0  ↔  g(Σi xi vi , vj ) = 0 for all j.

By the spectral theorem, we can choose a basis so that G is diagonal; by the result above, its diagonal entries are nonzero, so we can scale them to be ±1. This yields the desired basis. Sylvester’s theorem states that the numbers of +1’s and −1’s in the final form of G are unique. We say g is an inner product if it is positive definite, i.e. there are no −1’s.
The determinant of the Gram matrix, called the Grammian, is a useful concept. For example, in an inner product space, a collection of vectors {vi } is linearly independent if and only if its Grammian is nonzero.
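For instance, in R³ with the standard inner product (a small sketch using numpy):

```python
import numpy as np

def grammian(vectors):
    # determinant of the Gram matrix G_ij = <v_i, v_j>
    V = np.array(vectors, dtype=float)
    return np.linalg.det(V @ V.T)

independent = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
dependent   = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
assert abs(grammian(independent)) > 1e-9   # nonzero: vectors independent
assert abs(grammian(dependent)) < 1e-9     # zero: vectors dependent
```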

4.2 The Dual Space


Next, we consider dual spaces and dual maps.

• Let the dual space V ∗ be the set of linear functionals f on V . For finite-dimensional V , V and V ∗ are isomorphic, but there is no natural isomorphism between them.

• For infinite-dimensional V , V and V ∗ are generally not isomorphic. One important exception
is when V is a Hilbert space, which is crucial in quantum mechanics.

• For finite-dimensional V we have V ∗∗ = V , with the natural isomorphism v ↦ (f ↦ f (v)).

• When an inner product is given, we can define an isomorphism ψ between V and V ∗ by

v ↦ fv ,   fv (·) = g(v, ·).

By nondegeneracy, ψ is injective; since V and V ∗ have the same dimension, this implies it is surjective as well.

• In the context of a complex vector space, there are some extra subtleties: the form can only be
linear in one argument, say the second, and is antilinear in the other. Then the map ψ indeed
maps vectors to linear functionals, but it does so at the cost of being antilinear itself.

• The result above also holds for (infinite-dimensional) Hilbert spaces, where it is called the Riesz
lemma; it is useful in quantum mechanics.

• Given a linear map A : V → W , there is a dual map A∗ : W ∗ → V ∗ defined by

(A∗ f )(v) = f (Av).

The dual map is often called the transpose map. To see why, pick arbitrary bases of V and W with the corresponding dual bases of V ∗ and W ∗ . Then in components,

fi Aij vj = (A∗ f )j vj = (A∗ )ji fi vj

which implies that (A∗ )ji = Aij . That is, expressed in terms of matrices in the appropriate bases, they are transposes of each other.

• Given an inner product g on V and a linear map A : V → V , there is another linear map
A† : V → V called the adjoint of A, defined by

g(A† w, v) = g(w, Av).

By working in an orthonormal basis and expanding in components, the matrix elements satisfy

(A† )ij = Āji

where the bar denotes complex conjugation, so that the matrices are conjugate transposes.

• In the case where V = W and V is a real vector space, the matrix representations of the dual
and adjoint coincide, but they are very different objects. In quantum mechanics, we switch
between a map and its dual constantly, but taking the adjoint has a nontrivial effect.

4.3 Determinants
We now review some facts about determinants.
• Define the ij cofactor of a matrix A to be Ãij = (−1)^{i+j} det A(i|j), where A(i|j) is A with its ith row and j th column removed. Define the adjugate matrix adj A to have elements

(adj A)ij = Ãji .

• By induction, we can show that the determinant satisfies the Laplace expansion formula

det A = Σj Aij Ãij .

More generally, we have

Σj Aij Ãkj = δik det A

where we get a zero result when i ≠ k because we are effectively taking the determinant of a matrix with two identical rows.
• Therefore, in matrix form, we have

A(adj A) = (adj A)A = (det A)I

so that the adjugate gives a formula for the inverse, when it exists! When it doesn’t exist, det A = 0, so both sides are simply zero.
• Applying this formula to Ax = b, we have x = (adj A)b/ det A. Taking components gives
Cramer’s rule
xi = det A(i) / det A
where A(i) is A with the ith column replaced with b.
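Both formulas are straightforward to check numerically; the sketch below builds the adjugate from cofactors for a sample 3 × 3 matrix of our own choosing.

```python
import numpy as np

def adjugate(A):
    # (adj A)_ij is the ji cofactor: (-1)^(i+j) det of A with row j, column i removed
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# A (adj A) = (det A) I
assert np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(3))

# Cramer's rule: x_i = det(A with column i replaced by b) / det A
x = np.empty(3)
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b
    x[i] = np.linalg.det(Ai) / np.linalg.det(A)
assert np.allclose(A @ x, b)
```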
• The Laplace expansion formula gives us a formula for the derivative of the determinant,

∂(det A)/∂Aij = Ãij .

In the case det A ≠ 0, this gives the useful result

∂(det A)/∂Aij = (det A)(A−1 )ji .
Note. The final result above can also be derived by the identity
log det A = tr log A.
Taking the variation of both sides,

δ(log det A) = tr[log(A + δA) − log A] = tr(A−1 δA)

which implies

∂(log det A)/∂A = (A−1 )T
in agreement with our result. The crucial step is the simplification of the log, which is not valid
in general, but works because of the cyclic property of the trace. More precisely, if we expand the
logarithm order by order (keeping only terms up to first-order in δA), the cyclic property always
allows us to bring the factor of δA to the back, so A and δA effectively commute.
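A finite-difference check of the derivative formula, with a test matrix and step size of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
i, j, eps = 1, 2, 1e-6

# central difference in the (i, j) entry
Ap, Am = A.copy(), A.copy()
Ap[i, j] += eps
Am[i, j] -= eps
numeric = (np.linalg.det(Ap) - np.linalg.det(Am)) / (2 * eps)

# analytic: d(det A)/dA_ij = (det A) (A^{-1})_{ji}
analytic = np.linalg.det(A) * np.linalg.inv(A)[j, i]
assert abs(numeric - analytic) < 1e-4
```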

4.4 Endomorphisms
An endomorphism is a linear map from a vector space V to itself. The set of such endomorphisms
is called End(V ) in math, and the set of linear operators on V in physics. We write abstract
endomorphisms with Greek letters; for example, the identity map ι has matrix representation I.

• Two matrix representations of an endomorphism differ by conjugation by a change of basis


matrix, and any two matrices related this way are called similar.
• We define the trace and determinant of an endomorphism by the trace and determinant of any
matrix representation; this does not depend on the basis chosen.
• We define the λ-eigenspace of α as E(λ) = ker(α − λι).
• We define the characteristic polynomial of α by
χα (t) = det(tι − α).
This is a monic polynomial with degree dim V , and its roots correspond to eigenvalues. Similarly,
we can define the characteristic polynomial of a matrix as χA (t) = det(tI −A), and it is invariant
under basis change.
• The eigenspaces E(λi ) are independent. To prove this, suppose that Σi xi = 0 where xi ∈ E(λi ). Then we may project out all but one component: applying ∏k≠j (α − λk ι) to the sum gives

∏k≠j (α − λk ι)(Σi xi ) = ∏k≠j (λj − λk ) xj ∝ xj .

Since the left-hand side is zero and the eigenvalues are distinct, xj must be zero for all j, giving the result.
• We say α is diagonalizable when its eigenspaces span all of V , i.e. V = ⊕i Ei . Equivalently, α
has a diagonal matrix representation, produced by choosing a basis of eigenvectors.

Diagonalizability is an important property. To approach it, we introduce the minimal polynomial.

• Polynomial division: for any polynomials f and g, we may write f (t) = q(t)g(t) + r(t) where
deg r < deg g.
• As a corollary, whenever f has a root λ, we can extract a linear factor f (t) = (t − λ)g(t). The
fundamental theorem of algebra tells us that f will always have at least one root; repeating
this shows that all polynomials split into linear factors in C.
• The endomorphism α is diagonalizable if and only if there is a nonzero polynomial p(t) with distinct linear factors such that p(α) = 0. Intuitively, each such linear factor (t − λi ) projects away the eigenspace Ei , and since p(α) = 0, the Ei must span all of V .
Proof: The forward direction is simple, since a diagonalizable α satisfies p(α) = 0 for p(t) = ∏i (t − λi ) over its distinct eigenvalues. For the backward direction, we define projection operators. Let the roots of p be λi and let

qj (t) = ∏i≠j (t − λi )/(λj − λi ),   so that qj (λi ) = δij .

Then q(t) = Σj qj (t) = 1, since q − 1 vanishes at every λi and has lower degree than p. Now define the operators πj = qj (α). Since (α − λj ι)πj ∝ p(α) = 0, the image of πj lies in the λj eigenspace. Since Σj πj = q(α) = ι, the eigenspaces span V .

• Define the minimal polynomial of α to be the nonzero monic polynomial mα (t) of least degree so that mα (α) = 0. Such polynomials exist with degree bounded by n², since End(V ) has dimension n².

• For any polynomial p, p(α) = 0 if and only if mα divides p.


Proof: using division, we have p(t) = q(t)mα (t) + r(t). Plugging in α, we have r(α) = 0, but r
has smaller degree than mα , so it must be zero, giving the result.

• As a direct corollary, the endomorphism α is diagonalizable if and only if mα (t) is a product of


distinct linear factors.

• Every eigenvalue is a root of the minimal polynomial, and vice versa.

Example. Intuition for the above results. Consider the matrices

A = ( 1 0 ; 0 1 ),   B = ( 1 1 ; 0 1 ).

Then A satisfies t − 1, but B does not; instead its minimal polynomial is (t − 1)². To understand this, note that

C = ( 0 1 ; 0 0 )

has minimal polynomial t², and its action consists of taking the basis vectors ê2 → ê1 → 0, which
is why it requires two powers of t to vanish. This matrix is not diagonalizable because the only
possible eigenvalue is zero, but only ê1 is an eigenvector; ê2 is a ‘generalized eigenvector’ that instead
eventually maps to zero. As we’ll see below, such generalized eigenvectors are the only obstacle to
diagonalizability.

Prop (Schur). Let V be a finite-dimensional complex vector space and let α ∈ End(V ). Then there
is a basis where α is upper triangular.

Proof. By the FTA, the characteristic polynomial has a root, and hence there is an eigenvector.
By taking this as our first basis element, all entries in the first column are zero except for the first.
Quotienting out the eigenspace gives the result by induction.

Theorem (Cayley-Hamilton). Let V be a finite-dimensional vector space over F and let α ∈ End(V ).
Then χα (α) = 0, so mα divides χα .

Proof. [F = C] We use Schur’s theorem, and let α be represented by an upper triangular matrix A with diagonal elements λi . Then χα (t) = ∏i (t − λi ). Applying the factor (α − λn ι) sets the basis vector ên to zero.
Subsequently applying the factor (α − λn−1 ) sets the basis vector ên−1 to zero, and does not map
anything to ên since A is upper triangular. Repeating this logic, χα (α) sets every basis vector to
zero, giving the result. This also proves the Cayley-Hamilton theorem for F = R, because every real matrix can be regarded as a complex one.
Proof. [General F] A tempting false proof of the Cayley-Hamilton theorem is to simply directly
substitute t = A in det(tI − A). This doesn’t make sense, but we can make it make sense by
explicitly expanding the characteristic polynomial. Let B = tI − A. Then

adj B = Bn−1 tn−1 + . . . + B1 t + B0 .



Using B(adj B) = (det B)I = χA (t)I, we have

(tI − A)(Bn−1 tn−1 + . . . + B0 ) = (tn + an−1 tn−1 + . . . + a0 )I

where the ai are the coefficients of the characteristic polynomial. Expanding term by term,

An Bn−1 = An , An−1 Bn−2 − An Bn−1 = an−1 An−1 , ... , −AB0 = a0 In .

Adding these equations together, the left-hand sides telescope, giving the result.
Proof. [Continuity] Use the fact that Cayley-Hamilton is obvious for diagonalizable matrices, con-
tinuity of χα , and the fact that diagonalizable matrices are dense in the space of matrices. This is
the shortest proof, but has the disadvantage of requiring much more setup.
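The theorem is easy to test numerically on a random matrix (an illustrative floating-point check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))

# coefficients of chi_A(t) = det(tI - A), highest power first
coeffs = np.poly(A)

# evaluate chi_A(A) by Horner's scheme
result = np.zeros_like(A)
for c in coeffs:
    result = result @ A + c * np.eye(5)
assert np.max(np.abs(result)) < 1e-8   # chi_A(A) = 0 up to roundoff
```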

Example. The minimal polynomial of

A = ( 1 0 −2 ; 0 1 1 ; 0 0 2 ).

We know the characteristic polynomial is (t − 1)2 (t − 2), and that both 1 and 2 are eigenvalues.
Thus by Cayley-Hamilton the minimal polynomial is (t − 1)a (t − 2) where a is 1 or 2. A direct
calculation shows that a = 1 works; hence A is diagonalizable.
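Verifying the calculation (a one-line numerical check):

```python
import numpy as np

A = np.array([[1.0, 0.0, -2.0],
              [0.0, 1.0,  1.0],
              [0.0, 0.0,  2.0]])
I = np.eye(3)

# (t - 1)(t - 2) already annihilates A, so a = 1 and A is diagonalizable
assert np.allclose((A - I) @ (A - 2 * I), np.zeros((3, 3)))
```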
Next, we move to Jordan normal form.

• Let λ be an eigenvalue of α. Its algebraic multiplicity aλ is its multiplicity as a root of χα (t).


Its geometric multiplicity is gλ = dim Eα (λ). We also define cλ as its multiplicity as a root of
mα (t).

• If aλ = gλ for all λ, then α is diagonalizable. As shown earlier, this is equivalent to cλ = 1 for


all eigenvalues λ.

• As we’ll see, the source of nondiagonalizability is Jordan blocks, i.e. matrices of the form

Jn (λ) = λIn + Kn

where Kn has ones directly above the main diagonal. These blocks have gλ = 1 but aλ = cλ = n.
A matrix is in Jordan normal form if it is block diagonal with Jordan blocks.

• It can be shown that every matrix is similar to one in Jordan normal form. A sketch of the
proof is to split the vector space into ‘generalized eigenspaces’ (the nullspaces of (A − λI)k for
sufficiently high k), so that we can focus on a single eigenvalue, which can be shifted to zero
without loss of generality.

Example. All possible Jordan normal forms of 3×3 matrices. We have the diagonalizable examples,

diag(λ1 , λ2 , λ3 ), diag(λ1 , λ2 , λ2 ), diag(λ1 , λ1 , λ1 ),

as well as the non-diagonalizable forms

λ1 ⊕ J2 (λ2 ),   λ1 ⊕ J2 (λ1 ),   J3 (λ1 )

written as direct sums of Jordan blocks.

The minimal polynomials are (t − λ1 )(t − λ2 )2 , (t − λ1 )2 , and (t − λ1 )3 , while the characteristic


polynomials can be read off the main diagonal. In general, aλ is the total dimension of all Jordan
blocks with eigenvalue λ, cλ is the dimension of the largest Jordan block, and gλ is the number of
Jordan blocks. The dimension of the λ eigenspace is gλ , while the dimension of the λ generalized
eigenspace is aλ .

Example. The prototype for a Jordan block is a nilpotent endomorphism that takes

ê1 ↦ ê2 ↦ ê3 ↦ 0

for basis vectors êi . Now consider an endomorphism that takes

ê1 , ê2 ↦ ê3 ↦ 0.

At first glance it seems this can’t be put in Jordan form, but it can, because it takes ê1 − ê2 ↦ 0. Thus there are actually two Jordan blocks!

Example. Solving the differential equation ẋ = Ax for a general matrix A. The method of normal
modes is to diagonalize A, from which we can read off the solution x(t) = eAt x(0). More generally,
the best we can do is Jordan normal form, and the exponential of a Jordan block contains powers of
t, so generally the amplitude will grow polynomially. Note that this doesn’t happen for mass-spring
systems, because there the equivalent of A must be antisymmetric by Newton’s third law, so it is
diagonalizable.
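The polynomial growth from a Jordan block can be seen directly in its matrix exponential. A small sketch using a truncated power series (exact here, since the block is nilpotent):

```python
import numpy as np

def expm_series(M, terms=30):
    # matrix exponential via the power series sum over M^k / k!
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

t = 2.0
J = np.array([[0.0, 1.0], [0.0, 0.0]])   # Jordan block J_2(0)
# exp(tJ) = I + tJ: the off-diagonal entry grows linearly in t
assert np.allclose(expm_series(t * J), np.array([[1.0, t], [0.0, 1.0]]))
```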

5 Groups
5.1 Fundamentals
We begin with the basic definitions.

• A group G is a set with an associative binary operation, so that there is an identity e which
satisfies ea = ae = a for all a ∈ G, and for every element a there is an inverse a−1 so that
aa−1 = a−1 a = e. A group is abelian if the operation is commutative.

• There are many important basic examples of groups.

– Any field F is an abelian group under addition, while F∗ , which omits the zero element, is an abelian group under multiplication.
– The set of n × n invertible real matrices forms the group GL(n, R) under matrix multipli-
cation, and it is not abelian.
– A group is cyclic if all elements are powers g k of a fixed group element g. The nth cyclic
group Cn is the cyclic group with n elements.
– The dihedral group D2n is the set of symmetries of a regular n-gon. It is generated by
rotations r by 2π/n and a reflection s and hence has 2n elements, of the form rk or srk .
We may show this using the relations rn = s2 = 1 and srs = r−1 .

• We can construct new groups from old.

– The direct product group G × H has the operation

(g1 , h1 )(g2 , h2 ) = (g1 g2 , h1 h2 ).

For example, there are two groups of order 4, which are C4 and the Klein four group C2 ×C2 .
– A subgroup H ⊆ G is a subset of G closed under the group operations. For example,
Cn ⊆ D2n and C2 ⊆ D2n .
– Note that intersections of subgroups are subgroups. The subgroup generated by a subset S of G, written ⟨S⟩, is the smallest subgroup of G that contains S. One may also consider the subgroup generated by a group element, ⟨g⟩.

• A group isomorphism φ : G → H is a bijection so that φ(g1 g2 ) = φ(g1 )φ(g2 ).

• The order of a group |G| is the number of elements it contains, while the order of a group
element g is the smallest integer k so that g k = e.

• An equivalence relation ∼ on a set S is a binary relation that is reflexive, symmetric, and


transitive. The set is thus partitioned into equivalence classes; the equivalence class of a ∈ S is
written as a or [a].

• Two elements in a group g1 and g2 are conjugate if there is a group element h so that g1 = hg2 h−1 .
Conjugacy is an equivalence relation and hence splits the group into conjugacy classes.

One of the most important examples is the permutation group.

• The symmetric group Sn is the set of bijections S → S of a set S with n elements, conventionally
written as S = {1, 2, . . . , n}, where the group operation is composition.

• An element σ of Sn can be written in the two-row notation

( 1 2 . . . n ; σ(1) σ(2) . . . σ(n) ),

where the second row lists the images of the first.

There is an ambiguity of notation, because for σ, τ ∈ Sn the product στ can refer to doing the
permutation σ first, as one would expect naively, or to doing τ first, because one would write
σ(τ (i)) for the image of element i. We choose the former option.

• It is easier to write permutations using cycle notation. For example, a 3-cycle (123) denotes
a permutation that maps 1 → 2 → 3 → 1 and fixes everything else. All group elements are
generated by 2-cycles, also called transpositions.

• Any permutation can be written as a product of disjoint cycles. The cycle type is the set
of lengths of these cycles, and conjugacy classes in Sn are specified by cycle type, because
conjugation merely ‘relabels the numbers’.

• Specifically, suppose there are ki cycles of length ℓi . Then the number of permutations with this cycle type is

n! / ∏i (ℓi )^{ki} ki !
where the first term in the denominator accounts for shuffling within a cycle (since (123) is
equivalent to (231)) and the second accounts for exchanging cycles of the same length (since
(12)(34) is equivalent to (34)(12)).
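We can check this formula by brute force for small n (an illustrative sketch, with a `cycle_type` helper of our own):

```python
from itertools import permutations
from math import factorial

def cycle_type(perm):
    # sorted lengths of the disjoint cycles of a permutation of {0, ..., n-1}
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

# permutations of S_5 with two 2-cycles and one fixed point
count = sum(1 for p in permutations(range(5)) if cycle_type(p) == (1, 2, 2))

# formula: 5! / (1^1 1! * 2^2 2!) = 15
assert count == factorial(5) // (1**1 * factorial(1) * 2**2 * factorial(2))
```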

• Every permutation can be represented by a permutation matrix. A permutation is even if its permutation matrix has determinant +1. Hence by properties of determinants, even and odd permutations are products of an even or odd number of transpositions.

• The subgroup of even permutations is the alternating group An ⊆ Sn . Note that every even
permutation is paired with an odd one, by multiplying by an arbitrary transposition, so |An | =
n!/2. For n ≥ 4, An is not abelian since (123) and (124) don’t commute.

• Finally, some conjugacy classes break in half when passing from Sn to An . For example, (123)
and (132) are not conjugate in A4 , because if σ −1 (123)σ = (132), then (1σ 2σ 3σ) = (132),
which means σ is odd.

Next, we turn to the group theory of the integers Z.

• The integers are the cyclic group of infinite order. To make this very explicit, we may define
an isomorphism φ(g k ) = k for generator g.

• Any subgroup of a cyclic group is cyclic. Let G = ⟨g⟩ and let H ⊆ G be a nontrivial subgroup. Then if n is the minimum natural number so that g^n ∈ H, we claim H = ⟨g^n ⟩. For an arbitrary element g^a ∈ H, we may use the division algorithm to write a = qn + r with 0 ≤ r < n, and hence g^r ∈ H. Then we have a contradiction unless r = 0.

• In particular, this means the subgroups of Z are nZ. One can show

⟨m, n⟩ = ⟨gcf(m, n)⟩,   ⟨m⟩ ∩ ⟨n⟩ = ⟨lcm(m, n)⟩.



We then immediately have Bezout’s lemma, i.e. there exist integers u and v so that

um + vn = gcf(m, n).

We can then establish the usual properties, e.g. if x|m and x|n then x| gcf(m, n).

• The Chinese remainder theorem states that if gcf(m, n) = 1, then

Cmn ≅ Cm × Cn .

Specifically, if g and h generate Cm and Cn , we claim (g, h) generates Cm × Cn . It suffices to


show (g, h) has order mn. Clearly its order divides mn. Now suppose that (g k , hk ) = e. Then
m|k and n|k, and by Bezout’s lemma um + vn = 1. But then we have

mn|umk + vnk = k

so mn divides the order, and hence they are equal.
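The generator claim can be spot-checked computationally: writing Cm × Cn additively as Zm × Zn, the element (1, 1) should have order mn exactly when gcf(m, n) = 1. A quick sketch:

```python
def pair_order(m, n):
    """Order of the element (1, 1) in Z_m x Z_n, written additively."""
    k = 1
    while (k % m, k % n) != (0, 0):
        k += 1
    return k

# Coprime orders: (1, 1) has order m*n, so it generates all of Z_m x Z_n.
assert pair_order(4, 9) == 36
# Non-coprime orders: the order is only lcm(m, n), so the product is not cyclic.
assert pair_order(4, 6) == 12
```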

• We write Zn for the set of equivalence classes where a ∼ b if n|(a − b). Both addition and
multiplication are well defined on these classes. Under addition, Zn is simply a cyclic group Cn .

• Multiplication is more complicated. By Bezout’s lemma, m ∈ Zn has a multiplicative inverse
if and only if gcf(m, n) = 1, and we call m a unit. Hence if n is prime, then Zn is a field. In
general the set of units forms a group Z∗n under multiplication.
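As an illustration, the units of Z12 can be listed directly and closure under multiplication checked by brute force (a sketch; units is our own helper name):

```python
from math import gcd

def units(n):
    """The unit group Z_n*: residues coprime to n."""
    return {k for k in range(1, n) if gcd(k, n) == 1}

U = units(12)
assert U == {1, 5, 7, 11}
# Closure under multiplication mod 12; here every unit happens to be its own inverse.
assert all((a * b) % 12 in U for a in U for b in U)
assert all((a * a) % 12 == 1 for a in U)
```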

Next, we consider Lagrange’s theorem.

• Let H be a subgroup of G. We define the left and right cosets

gH = {gh : h ∈ H}, Hg = {hg : h ∈ H}

and write G/H to denote the set of (left) cosets. In general, gH ≠ Hg.

• We see gH and kH are the same coset if k −1 g ∈ H. This is an equivalence relation, so the
cosets partition the group. Moreover, all cosets have the same size because the map h 7→ gh is
a bijection between H and gH. Thus we have

|G| = |G/H| · |H|.

In particular, we have Lagrange’s theorem, |H| divides |G|.

• By considering the cyclic group generated by any group element, the order of any group element
divides |G|. In particular, all groups with prime order are cyclic.

• Fermat’s little theorem states that for a prime p where p does not divide a,

ap−1 ≡ 1 mod p.

This is simply because the order of a in Z∗p divides p − 1.



• In general, |Z∗n | = φ(n) where φ is the totient function, which satisfies

φ(p) = p − 1, φ(pk ) = pk−1 (p − 1), φ(mn) = φ(m)φ(n) if gcf(m, n) = 1.

Then Euler’s theorem generalizes Fermat’s little theorem to

aφ(n) ≡ 1 mod n

where gcf(a, n) = 1.
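Both the multiplicativity of φ and Euler's theorem are easy to spot-check numerically (a sketch; phi here is computed naively as the number of units of Zn):

```python
from math import gcd

def phi(n):
    """Euler's totient, computed naively as the number of units of Z_n."""
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

# Multiplicativity on coprime arguments: phi(36) = phi(4) * phi(9).
assert phi(36) == phi(4) * phi(9) == 12
# Euler's theorem: a^phi(n) = 1 mod n for every unit a.
assert all(pow(a, phi(20), 20) == 1 for a in range(1, 20) if gcd(a, 20) == 1)
```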

• Wilson’s theorem states that for a prime p,

(p − 1)! ≡ −1 mod p.

To see this, note that the only elements that are their own inverses are ±1. All other elements
pair off with their inverses and contribute 1 to the product.

• If G has even order, then it has an element of order 2, by the same reasoning as before: some
element must be its own inverse by parity.

• This result allows us to classify groups of order 2p for prime p ≥ 3. There must be an element
x of order 2. Furthermore, not all elements can have order 2, or else the group would be (Z2 )^n,
so there is an element y of order p. Since p is odd, x ∉ hyi, so the group is G = hyi ∪ xhyi.
The product yx must be one of these elements, and it can’t be a power of y, so yx = xy^j. Then
odd powers of yx all carry a power of x, so yx must have even order. If it has order 2p, then
G ≅ C2p . Otherwise, it has order 2, so (yx)^2 = y^{j+1} = 1, implying j = p − 1, so G ≅ D2p .

• The group D2n can be presented in terms of generators and relations,

D2n = hr, s : rn = s2 = e, sr = r−1 si.

In general, when one is given a group in this form, one simply uses the relations to reduce
strings of the generators, called words, as far as possible. The remaining words that cannot be
reduced further form the group elements.

Example. So far we’ve classified all groups up to order 7, where order 6 follows from the work
above. The groups of order 8 are

C8 , C2 × C4 , C2 × C2 × C2 , D8 , Q8

where Q8 is the quaternion group. The quaternions are numbers of the form

q = a + bi + cj + dk, a, b, c, d ∈ R

obeying the rules


i2 = j2 = k2 = ijk = −1.
The group Q8 is identified with the subset {±1, ±i, ±j, ±k}.

5.2 Group Homomorphisms


Next, we consider maps between groups.

• A group homomorphism φ : G → H is a map so that

φ(g1 g2 ) = φ(g1 )φ(g2 )

and an isomorphism is simply a bijective homomorphism. An automorphism of G is an isomorphism
from G to G; the automorphisms form a group Aut(G) under composition. An endomorphism of G
is a homomorphism from G to G. A monomorphism is an injective homomorphism and an
epimorphism is a surjective homomorphism.

• There are many basic examples of homomorphisms.

– If H ⊆ G, we have inclusion ι : H → G with ι(h) = h.


– The trivial map φ(g) = e.
– The projections π1 : G1 × G2 → G1 , (g1 , g2 ) 7→ g1 , and π2 : G1 × G2 → G2 , (g1 , g2 ) 7→ g2 .
– The sign map sgn : Sn → {±1} which gives the sign of a permutation.
– The determinant det : GL(n, R) → R∗ , and the trace tr : Mn (R) → R where the operation
on Mn (R) is addition.
– The map log : (0, ∞) → R, which is moreover an isomorphism.
– The map φ : G → G given by φ(g) = g 2 , which is a homomorphism if and only if G is abelian.
– Conjugation is an automorphism, φh (g) = hgh−1 .
– All homomorphisms φ : Z → Z are of the form φ(x) = nx, because homomorphisms are
completely determined by how they map the generators.

• We say H is a normal subgroup of G, and write H E G if

gH = Hg for all g ∈ G

or equivalently if g −1 hg ∈ H for all g ∈ G, h ∈ H. Since conjugation is akin to a “basis


change”, a normal subgroup “looks the same from all directions”. Normality depends on how H
is embedded in G, not just on H itself. A group is simple if it has no nontrivial proper normal subgroups.
In an abelian group, all subgroups are normal.

• For a group homomorphism φ : G → H, define the kernel and image by

ker φ = {g ∈ G : φ(g) = e} E G, im φ = {φ(g) : g ∈ G} ⊆ H.

Note that φ is constant on cosets of ker φ.

• Normal subgroups are unions of conjugacy classes. This can place strong constraints on normal
subgroups by counting arguments.

• If |G/H| = 2 then H E G. This is because the left and right cosets eH and He must coincide,
and hence the other left and right coset also coincide. For example, An E Sn and SO(n) E O(n).

• We define the center of G as

Z(G) = {g ∈ G : gh = hg for all h ∈ G}.

Then Z(G) E G.

Next, we construct quotient groups.

• For H E G, we may define a group operation on G/H by

(g1 H)(g2 H) = (g1 g2 )H

and hence make G/H into a quotient group. This rule is consistent because

(g1 H)(g2 H) = g1 HHg2 = g1 Hg2 = g1 g2 H.

Conversely, the consistency of this rule implies H E G, because

(g −1 hg)H = (g −1 H)(hH)(gH) = (g −1 H)(eH)(gH) = (g −1 g)H = H

which implies that g −1 hg ∈ H.

• The idea of a quotient construction is to ‘mod out’ by H, leaving a simpler structure, or


equivalently identify elements of G by an equivalence relation. In terms of sets, there are no
restrictions, but we need H E G to preserve group structure.

• If H E G, it is the kernel of a homomorphism from G, namely

π : G → G/H, π(g) = gH.

• We give a few examples of quotient groups below.

– We have Z/nZ ≅ Zn almost by definition.
– We have Sn /An ≅ C2 .
– For the rotation generator r of D2n , D2n /hri ≅ C2 .
– We have C∗ /S 1 ≅ (0, ∞) because we remove the complex phase.
– Let AGL(n, R) denote the group of affine maps f (x) = Ax + b where A ∈ GL(n, R). If T
is the subgroup of translations, G/T ≅ GL(n, R).

• The first isomorphism theorem states that for a group homomorphism φ : G → H,

G/ ker φ ≅ im φ

via the isomorphism


g(ker φ) 7→ φ(g).
It is straightforward to verify this is indeed an isomorphism. As a corollary,

|G| = | ker φ| · | im φ|.

• We give a few examples of this theorem below.



– For det : GL(n, R) → R∗ we have GL(n, R)/SL(n, R) ≅ R∗ .
– For φ : Z → Z with φ(x) = nx, we have Z ≅ nZ.
– For φ : Z → Zn given by φ(x) = x mod n, we have Z/nZ ≅ Zn .

• The first isomorphism theorem can also be used to classify all homomorphisms φ : G → H. We
first determine the normal subgroups of G, as these are the potential kernels. For each normal
subgroup N , we count the number n(N ) of subgroups in H isomorphic to G/N . Finally, we
determine Aut(G/N ). Then the number of homomorphisms is
X
n(N ) · |Aut(G/N )|.
N

This is because all such homomorphisms have the form

G −π→ G/N −ι→ I

where π maps g 7→ gN and ι is an isomorphism from G/N onto a subgroup I ⊆ H with
I ≅ G/N , of which there are |Aut(G/N )| possibilities.

There are also additional isomorphism theorems.

• For a group G, if H ⊆ G and N E G, then HN = {hn|h ∈ H, n ∈ N } is a subgroup of G. This
is because N H = HN , so HN HN = H(N H)N = H(HN )N = (HH)(N N ) = HN .

• The second isomorphism theorem states that for H ⊆ G and N E G, then H ∩ N E H and

HN/N ≅ H/(H ∩ N ).

The first statement follows because both N and H are closed under conjugation by elements of
H. As for the second, we consider

H −i→ HN −→ HN/N

where i is the inclusion map and the second map is the quotient. The composition is surjective
with kernel H ∩ N , so the result follows from the first isomorphism theorem.

• Let N E G and K E G with K ⊆ N . Then N/K E G/K and

(G/K)/(N/K) ≅ G/N.

The first statement follows because

(gK)−1 (nK)(gK) = g −1 KnKgK = g −1 ngK ∈ N/K

since K is normal in G. Now consider the composition of quotient maps

G → G/K → (G/K)/(N/K).

The composition is surjective with kernel N , giving the result.



• Conversely, let K E G and write Ḡ = G/K. For any subgroup H̄ ⊆ Ḡ there exists H ⊆ G with
H̄ = H/K, defined by
H = {h ∈ G|hK ∈ H̄}.
Note that in this definition, H̄ is comprised of cosets. Moreover, if H̄ E Ḡ then H E G.

• As a corollary, given K E G there is a one-to-one correspondence H 7→ H̄ = H/K between
subgroups of G containing K, and subgroups of G/K, which preserves normality. This is a
sense in which structure is preserved upon quotienting.

Example. We will use the running example of G = S4 . Let H = S3 ⊆ S4 by acting on the first
three elements only, and let N = V4 E S4 . Then HN = S4 and H ∩ N = {e}, so the second
isomorphism theorem states
S4 /V4 ≅ S3 .
Next, let N = A4 E S4 and K = V4 E S4 . We may compute G/K ≅ S3 and N/K ≅ A3 , so the
third isomorphism theorem states
S3 /A3 ≅ C2 .
Example. The symmetric groups Sn are not simple, because An E Sn . However, An is simple for
n ≥ 5. For example, for A5 the conjugacy classes have sizes

60 = 1 + 20 + 15 + 12 + 12

where the factors of 12 come from splitting the 24 5-cycles. A normal subgroup is a union of
conjugacy classes containing the identity, but no subset of these numbers including the 1 sums
to a nontrivial proper divisor of 60. In fact, A5 is the smallest non-abelian simple group.
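The class equation 60 = 1 + 20 + 15 + 12 + 12 can be verified by brute force over all 60 even permutations (a sketch, representing permutations as tuples):

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(i) = p[q[i]], for permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def is_even(p):
    """Count inversions; even permutations have an even number of them."""
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]
assert len(A5) == 60

# Collect conjugacy class sizes under conjugation by A5: 60 = 1 + 15 + 20 + 12 + 12.
sizes, seen = [], set()
for g in A5:
    if g not in seen:
        cls = {compose(compose(s, g), inverse(s)) for s in A5}
        seen |= cls
        sizes.append(len(cls))
assert sorted(sizes) == [1, 12, 12, 15, 20]
```

The two classes of size 12 are the split 5-cycles; conjugating by all of S5 instead would merge them into one class of 24.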
Note. As we’ll see below, the simple groups are the “atoms” of group theory. The finite simple
groups have been classified; the only possibilities are:
• A cyclic group of prime order Cp .

• An alternating group An for n ≥ 5.

• A finite group of Lie type such as PSL(n, q) for n > 2 or q > 3.

• One of 26 sporadic groups, including the Monster and Baby Monster groups.

5.3 Group Actions


Next, we consider group actions.

• A left action of a group G on a set S is a map

ρ : G × S → S, g · s ≡ ρ(g, s)

obeying the axioms


e · s = s, g · (h · s) = (gh) · s
for all s ∈ S and g, h ∈ G. A right action would have the order in the second axiom reversed.

• All groups have a left action on themselves by g · h = gh and by conjugation, g · h = ghg −1 . As


we’ve already seen, there is a left action of G on the left cosets G/H by g1 · (g2 H) = (g1 g2 )H,
though this only descends to a left action of G/H on itself when H E G.

• The orbit and stabilizer of s ∈ S are defined as

Orb(s) = {g · s : g ∈ G} ⊂ S, Stab(s) = {g ∈ G : g · s = s} ⊆ G.

In particular, Stab(s) is a subgroup of G, and the orbits partition S. If there is only one orbit,
we say the action is transitive. Also, if two elements lie in the same orbit, their stabilizers are
conjugate.

• For example, GL(n, R) acts on matrices and column vectors Rn by matrix multiplication, and
on matrices by conjugation; in the latter case the orbits correspond to Jordan normal forms.
Also note that GL(n, R) has a left action on column vectors but a right action on row vectors.

• The symmetry group D2n acts on the vertices of a regular n-gon. Affine transformations of
the plane act on shapes in the plane, and the orbits are congruence classes. Geometric group
actions such as these were the original motivation for group theory.

• The orbit-stabilizer theorem states that

|G| = | Stab(s)| · |Orb(s)|.

This is because there is a bijection between the cosets of Stab(s) and the elements of
Orb(s), explicitly g Stab(s) 7→ g · s, which implies |G|/| Stab(s)| = |Orb(s)|. That is, a
transitive group action corresponds to a group action on the set of cosets of the stabilizer.
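The counting in the orbit-stabilizer theorem is easy to check on a small example, say the dihedral group of the square acting on its four vertices (a sketch; the group is built as a set of permutation tuples):

```python
def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

r = (1, 2, 3, 0)   # rotation by 90 degrees
s = (0, 3, 2, 1)   # reflection through the diagonal fixing vertices 0 and 2
G = {(0, 1, 2, 3)}
frontier = [r, s]
while frontier:    # close under right-multiplication by the generators
    g = frontier.pop()
    if g not in G:
        G.add(g)
        frontier += [compose(g, r), compose(g, s)]
assert len(G) == 8

orbit = {g[0] for g in G}            # orbit of vertex 0, since g · 0 = g(0)
stab = {g for g in G if g[0] == 0}   # stabilizer of vertex 0
assert len(orbit) == 4 and len(stab) == 2
assert len(G) == len(orbit) * len(stab)   # |G| = |Orb(s)| · |Stab(s)|
```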

• This is a generalization of Lagrange’s theorem, because in the case H ⊆ G, the action of G on


G/H by g · (kH) = (gk)H has Stab(H) = H and Orb(H) = G/H, so |G| = |G/H| · |H|. What
we’ve additionally learned is that in the general case, |Orb(s)| divides |G|.

• Define the centralizer of g ∈ G by

CG (g) = {h ∈ G : gh = hg}.

Also let C(g) be the conjugacy class of g. Applying the orbit-stabilizer theorem to the group
action of conjugation,
|G| = |CG (g)| · |C(g)|.
This gives an alternate method for finding |C(g)|, or for finding |G|.

Example. Let GT be the tetrahedral group, the set of rotational symmetries of the four vertices
of a tetrahedron. The stabilizer of a particular vertex v consists of the identity and two rotations,
and the action is transitive, so
|GT | = 3 · 4 = 12.
Similarly, for the cube, the stabilizer of a vertex consists of the identity and the 120◦ and 240◦
rotations about a space diagonal through the vertex, so

|GC | = 3 · 8 = 24.

We could also have done the calculation looking at the orbit and stabilizer of edges or faces.

Example. If |G| = pr , then G has a nontrivial center. The conjugacy class sizes are powers of p,
and the class of the identity has size 1, so there must be more classes of size 1, yielding a nontrivial
center. In the case |G| = p2 , let x be a nontrivial element in the center. If the order of x is p2 , then
G ≅ Cp2 . If not, it has order p. Consider another element y with order p, not generated by x. Then
the p2 group elements xi y j form the whole group, so G ≅ Cp × Cp .
Example. Cauchy’s theorem states that for any finite group G and prime p dividing |G|, G has
an element of order p. To see this, consider the set

S = {(g1 , g2 , . . . , gp ) ∈ Gp |g1 g2 . . . gp = e}.

Then |S| = |G|p−1 , because the first p − 1 elements can be chosen freely, while the last element is
determined by the others. The group Cp with generator σ acts on S by

σ · (g1 , g2 , . . . , gp ) = (g2 , . . . , gp , g1 ).

By the Orbit-Stabilizer theorem, the orbits have size 1 or p, and the orbits partition the set. Since
(e, . . . , e) is an orbit of size 1, there must be other orbits of size 1, corresponding to an element g
with g p = e.

Orbits can also be used in counting problems.

• Let G act on S and let N be the number of orbits Oi . Then

N = (1/|G|) ∑_{g∈G} |fix(g)|, where fix(g) = {s ∈ S : g · s = s}.

To see this, note that we can count the pairs (g, s) so that g · s = s by summing over group
elements or set elements, giving
X X
|fix(g)| = | Stab(s)|.
g∈G s∈S

Next, applying the orbit-stabilizer theorem,

∑_{s∈S} | Stab(s)| = ∑_{i=1}^{N} ∑_{s∈Oi} | Stab(s)| = ∑_{i=1}^{N} ∑_{s∈Oi} |G|/|Oi | = N |G|

as desired. This result is called Burnside’s lemma.

• Note that if g and h are conjugate, then |fix(g)| = |fix(h)|, so the right-hand side can also be
evaluated by summing over conjugacy classes.

• Note that every action of G on a set S is associated with a homomorphism

ρ : G → Sym(S)

which is called a representation of G. For example, when S is a vector space and G acts by
linear transformations, then ρ is a representation as used in physics.

• The representation is faithful if G is isomorphic to im ρ. Equivalently, it is faithful if only the


identity element acts trivially.

• A group’s action on itself by left multiplication is faithful, so every finite group G is isomorphic
to a subgroup of S|G| . This is called Cayley’s theorem.

Example. Find the number of ways to color a triangle’s edges with n colors, up to rotation and
reflection. We consider the dihedral group D6 acting on the triangle, and want to find the number
of orbits. Burnside’s lemma gives

N = (1/6)(n^3 + 3n^2 + 2n)

where we summed over the trivial conjugacy class, the conjugacy class of the rotations, and the
conjugacy class of the reflections. This is indeed the correct answer, with no casework required.
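The Burnside count (n^3 + 3n^2 + 2n)/6 can be confirmed by brute-force orbit counting over all colorings (a sketch; the six permutations below are the symmetries of the triangle acting on its three edges):

```python
from itertools import product

# The dihedral group of the triangle acting on the edge set {0, 1, 2} is all of S3.
perms = [(0, 1, 2), (1, 2, 0), (2, 0, 1), (0, 2, 1), (2, 1, 0), (1, 0, 2)]

def count_colorings(n):
    """Count edge colorings with n colors, up to rotation and reflection."""
    seen, orbits = set(), 0
    for c in product(range(n), repeat=3):
        if c not in seen:
            orbits += 1
            seen |= {tuple(c[p[i]] for i in range(3)) for p in perms}
    return orbits

# Matches Burnside's count (n^3 + 3n^2 + 2n)/6 for small n.
assert [count_colorings(n) for n in range(1, 5)] == [1, 4, 10, 20]
assert all(count_colorings(n) == (n**3 + 3*n**2 + 2*n) // 6 for n in range(1, 5))
```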

Example. Find the number of ways to paint the faces of a rectangular box black or white, where
the three side lengths are distinct. The rotational symmetries are C2 × C2 , corresponding to the
identity and 180◦ rotations about the x, y, and z axes. Then

N = (1/4)(2^6 + 3 · 2^4 ) = 28.
Example. Find the number of ways to make a bracelet with 3 red beads, 2 blue beads, and 2 white
beads. Here the symmetry group is D14 , imagining the beads as occupying the vertices of a regular
heptagon, and there are 7!/3!2!2! = 210 bracelets without accounting for the symmetry. Then
N = (1/14)(210 + 6(0) + 7(3!)) = 18.
Example. Find the number of ways to color the faces of a cube with n colors. The relevant
symmetry group is GC . Note that we have a homomorphism ρ : GC → S4 by considering how GC
acts on the four space diagonals of the cube. In fact, it is straightforward to check that ρ is an
isomorphism, so GC ≅ S4 . This makes it easy to count the conjugacy classes. We have

24 = 1 + 3 + 6 + 6 + 8

where the 3 corresponds to double transpositions or rotations of π about opposing faces’ midpoints,
the first 6 corresponds to 4-cycles or rotations of π/2 about opposing faces’ midpoints, the second
6 corresponds to transpositions or rotations of π about opposing edges’ midpoints, and the 8
corresponds to 3-cycles or rotations of 2π/3 about space diagonals. By Burnside’s lemma,

N = (1/24)(n^6 + 3n^4 + 6n^3 + 6n^3 + 8n^2 ).

By similar reasoning, we have a homomorphism ρ : GT → S4 by considering how GT acts on the
four vertices of the tetrahedron, and |GT | = 12, so GT ≅ A4 .

5.4 Composition Series


First, we look more carefully at generators and relations.

• For a group G and a subset S of G, we defined the subgroup hSi ⊆ G to be the smallest subgroup
of G containing S. However, it is not clear how this definition works for infinite groups, nor
immediately clear why it is unique. A better definition is to let hSi be the intersection of all
subgroups of G that contain S.

• We say a group G is finitely generated if there exists a finite subset S of G so that hSi = G.
All groups of uncountable order are not finitely generated. Also, the positive rationals under
multiplication are countable but not finitely generated because there are infinitely many primes.

• Suppose we have a set S called an alphabet, and define a corresponding set S −1 , so the element
x ∈ S corresponds to x−1 ∈ S −1 . A word w is a finite sequence w = x1 . . . xn where each
xi ∈ S ∪ S −1 . The empty sequence is denoted by ∅.

• We may contract words by canceling adjacent pairs of the form xx−1 or x−1 x for x ∈ S. It is
somewhat fiddly to prove, but intuitively clear, that every word w can be uniquely transformed
into a reduced word [w] which does not admit any such contractions.

• The set of reduced words is a group under concatenation, called the free group F (S) generated
by S. Here F (S) is indeed a group because

[[ww0 ]w00 ] = [w[w0 w00 ]]

by the uniqueness of reduced words; both are equal to [ww0 w00 ].
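Reduction can be implemented with a single stack pass; by uniqueness of the reduced word, the scan order does not matter. A sketch, encoding a generator as a lowercase letter and its inverse as the corresponding uppercase letter:

```python
def reduce_word(w):
    """Cancel adjacent inverse pairs, e.g. 'a' against 'A' (its inverse)."""
    out = []
    for c in w:
        if out and out[-1] == c.swapcase():
            out.pop()          # an adjacent pair x x^-1 or x^-1 x cancels
        else:
            out.append(c)
    return "".join(out)

assert reduce_word("abBA") == ""        # a b b^-1 a^-1 reduces to the empty word
assert reduce_word("aabBAc") == "ac"
assert reduce_word("abAB") == "abAB"    # the commutator is already reduced
```

Concatenating two strings and reducing implements the free group operation on reduced words.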

Free groups are useful because we can use them to formalize group presentations.

• Given any set S, group G, and mapping f : S → G, there is a unique homomorphism φ : F (S) →
G so that φ ◦ i = f , where i : S → F (S) is the canonical inclusion which takes x ∈ S to the
corresponding generator of F (S). That is, the triangle formed by f , i, and φ commutes.

• To see this, we define

φ(x1^{ε1} . . . xn^{εn}) = f (x1 )^{ε1} . . . f (xn )^{εn}

where εi = ±1. It is clear this is a homomorphism, and it is unique because φ(x) = f (x) for
every x ∈ S, and a homomorphism is determined by its action on the generators.

• Taking S to be a generating set for G, and f to be inclusion, this implies every group is a
quotient of a free group.

• Let B be a subset of a group G. The normal subgroup generated by B is the intersection of all
normal subgroups of G that contain B, and is denoted by hhBii.

• More precisely, we have


hhBii = hgbg −1 : g ∈ G, b ∈ Bi
which explicitly means that hhBii consists of elements of the form

∏_{i=1}^{n} gi bi^{εi} gi^{−1} .

To prove this, denote this set as N . It is clear that N ⊆ hhBii, so it suffices to show that N E G.
The only nontrivial check is closure under conjugation, which works because

g (∏_{i=1}^{n} gi bi^{εi} gi^{−1}) g^{−1} = ∏_{i=1}^{n} (ggi ) bi^{εi} (ggi )^{−1}

which lies in N .
• Let X be a set and let R be a subset of F (X). We define the group with presentation hX|Ri
to be F (X)/hhRii. We need to use hhRii because the relation w = e implies gwg −1 = e.
• For any group G, there is a canonical homomorphism F (G) → G by sending every generator of
F (G) to the corresponding group element. Letting R(G) be the kernel, we have G ≅ F (G)/R(G),
and hence we define the canonical presentation for G to be hG|R(G)i.
This is a very inefficient presentation, which we mention because it uses no arbitrary choices.
• Free groups also characterize homomorphisms. Let hX|Ri and H be groups. A map f : X → H
induces a homomorphism φ : F (X) → H. This descends to a homomorphism hX|Ri → H if
and only if R ⊆ ker φ.

Next, we turn to composition series.

• A composition series for a group G is a sequence of subgroups


{e} E G1 E . . . E Gn−1 E Gn = G
so that each composition factor Gi+1 /Gi is simple, or equivalently each Gi is a maximal proper
normal subgroup of Gi+1 . By induction, every finite group has a composition series.
• Composition series are not unique. For example, we have
{e} E C2 E C4 E C12 , {e} E C3 E C6 E C12 , {e} E C2 E C6 E C12 .
The composition factors are C2 , C2 , and C3 in each case, but in a different order.
• Composition series do not determine the group. For example, A4 has composition series
{e} E C2 E V4 E A4
with composition factors C2 , C2 , and C3 . There are actually three distinct composition series
here, since V4 has three C2 subgroups. The composition factors don’t say how they fit together.
• The group Z, which is infinite, does not have a composition series.
• The Jordan–Hölder theorem states that all composition series for a finite group G have the
same length, with the same composition factors. Consider the two composition series
{e} E G1 E . . . E Gr−1 E Gr = G, {e} E H1 E . . . E Hs−1 E Hs = G.
We prove the theorem by induction on r. If Gr−1 = Hs−1 , then we are done. Otherwise, note
that Gr−1 Hs−1 E G. Now, by the definition of a composition series Gr−1 cannot contain Hs−1 ,
so Gr−1 Hs−1 must be strictly larger than Gr−1 . But by the definition of a composition series
again, that means we must have Gr−1 Hs−1 = G. Let K = Gr−1 ∩ Hs−1 E G.

• The next step in the proof is to ‘quotient out’ by K. By the second isomorphism theorem,

G/Gr−1 ≅ Hs−1 /K, G/Hs−1 ≅ Gr−1 /K

so Gr−1 /K and Hs−1 /K are simple. Since K has a composition series, we have composition
series

{e} E K1 E . . . E Kt−1 E K E Gr−1 , {e} E K1 E . . . E Kt−1 E K E Hs−1 .

By induction, the former series is equivalent to

{e} E G1 E . . . E Gr−1

which means that t = r − 2. By induction again, the latter series is equivalent to

{e} E H1 E . . . E Hs−1

which proves that r = s.

• Next, we append the factor G to the end of these series. By the second isomorphism theorem,
the composition series

{e} E K1 E . . . E Kt−1 E K E Gr−1 E G, {e} E K1 E . . . E Kt−1 E K E Hs−1 E G

are equivalent. Then our original two composition series are equivalent, completing the proof.

• Note that if G is finite and abelian, its composition factors are also abelian, and hence must be
cyclic of prime order. In particular, for G = Cn this proves the fundamental theorem of arithmetic.

• If H E G with G finite, then the composition factors of G are the union of those of H and
G/H. We showed this as a corollary when discussing the isomorphism theorems. In particular,
if X and Y are simple, the only two composition series of X × Y are

{e} E X E X × Y, {e} E Y E X × Y.

• A finite group G is solvable if every composition factor is a cyclic group of prime order, or
equivalently, abelian. Burnside’s theorem states that all groups of order pn q m for primes p
and q are solvable, while the Feit-Thompson theorem states that all groups of odd order are
solvable.

5.5 Semidirect Products


Finally, as a kind of converse, we see how groups can be built up by combining groups.

• We already know how to combine groups using the direct product, but this is uninteresting.
Suppose a group were of the form G = G1 G2 for two disjoint subgroups G1 and G2 . Then
every group element can be written in the form g1 g2 , but it is unclear how we would write the
product of two elements (g1 g2 )(g10 g20 ) in this form. The problem is resolved if one of the Gi is
normal in G, motivating the following definition.

• Let G be a group with H ⊆ G and N E G. We say G is an internal semidirect product of H
and N and write
G = N ⋊ H
if G = N H and H ∩ N = {e}.

• The semidirect product generalizes the direct product. If we also have H E G, then G ≅ N × H.
To see this, note that every group element can be written uniquely in the form nh. Letting
nh = (n1 h1 )(n2 h2 ), we have

nh = (n1 h1 n2 h1 −1 )(h1 h2 ) = (n1 n2 )(n2 −1 h1 n2 h2 ).

By normality of N and H, both these expressions are already in the form nh. Then we have
n = n1 h1 n2 h1 −1 = n1 n2 , which implies h1 n2 = n2 h1 , giving the result.

• We’ve already seen several examples of the semidirect product.

– We have D2n = hσi ⋊ hτ i where σ generates rotations and τ is a reflection. Note that a
nonabelian group arises from the semidirect product of abelian groups.
– We have Sn = An ⋊ hσi for any transposition σ.
– We have S4 = V4 ⋊ S3 , which we found earlier.

• To understand the multiplication rule in a semidirect product, letting nh = (n1 h1 )(n2 h2 ) again,

nh = n1 h1 n2 h2 = (n1 h1 n2 h1 −1 )(h1 h2 )

which implies that

(n1 , h1 ) ◦ (n2 , h2 ) = (n1 φh1 (n2 ), h1 h2 ), φh (g) = hgh−1 .

That is, the multiplication law is like that of a direct product, but the multiplication in N is
“twisted” by conjugation by H. The map h 7→ φh gives a group homomorphism H → Aut(N ).

• This allows us to define the semidirect product of two groups without referring to a larger group,
i.e. an external semidirect product. Specifically, for two groups H and N and a homomorphism

φ : H → Aut(N )

we may define (N ⋊ H, ◦) to consist of the set of pairs (n, h) with group operation

(n1 , h1 ) ◦ (n2 , h2 ) = (n1 φ(h1 )(n2 ), h1 h2 ).

Then it is straightforward to check that N ⋊ H is an internal semidirect product of the
subgroups H̃ = {(e, h)} and Ñ = {(n, e)}. The direct product is just the case of trivial φ.

Example. Let Cn = hai and C2 = hbi. Let φ : C2 → Aut(Cn ) satisfy φ(b)(a) = a−1 . Then
Cn ⋊φ C2 ≅ D2n . To see this, note that an = b2 = e and

ba = (e, b) ◦ (a, e) = (φ(b)(a), b) = a−1 b

which is the other relation of D2n .
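The twisted multiplication law can be coded directly, and the dihedral relations checked (a sketch; sd_mult is our own name for (n1, h1) ◦ (n2, h2) in Zn ⋊ Z2 with φ(1) acting as negation):

```python
def sd_mult(x, y, n):
    """(a, b) ◦ (c, d) = (a + (-1)^b c mod n, b + d mod 2) in Z_n ⋊ Z_2."""
    (a, b), (c, d) = x, y
    return ((a + (-1) ** b * c) % n, (b + d) % 2)

n = 5
a, b = (1, 0), (0, 1)   # rotation and reflection generators
# The dihedral relation b a = a^-1 b:
assert sd_mult(b, a, n) == sd_mult((n - 1, 0), b, n)
# a has order n and b has order 2:
x = a
for _ in range(n - 1):
    x = sd_mult(x, a, n)
assert x == (0, 0) and sd_mult(b, b, n) == (0, 0)
```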



Example. An automorphism of Zn must map 1 to another generator, so

Aut(Zn ) ≅ U (Zn )

where U (Zn ) is the group of units of the ring Zn , i.e. the numbers k with gcf(k, n) = 1. For example,
suppose we classify semidirect products Z3 ⋊ Z3 . Then

Aut(Z3 ) ≅ {1, 2} ≅ Z2

since the automorphism that maps 1 7→ 2 is negation. However, since the only homomorphism
Z3 → Z2 is the trivial map, the only possible semidirect product is Z3 × Z3 .
Next consider Z3 ⋊ Z4 . There is one nontrivial homomorphism Z4 → Z2 , which maps 1 mod 4
to negation. Hence

(n1 mod 3, h1 mod 4) ◦ (n2 mod 3, h2 mod 4) = (n1 + (−1)^{h1} n2 mod 3, h1 + h2 mod 4).

This is easier to understand in terms of generators. Defining

x = (1 mod 3, 0 mod 4), y = (0 mod 3, 1 mod 4)

we have relations x3 = y 4 = e and yx = x−1 y. This is a group of order 12 we haven’t seen before.
Example. We know that S4 = V4 ⋊ S3 . To see this as an external semidirect product, note that

Aut(V4 ) ≅ S3 = Sym({1, 2, 3})

since the three non-identity elements a, b, and c can be permuted. Writing the other factor of S3
as Sym({a, b, c}), the required homomorphism is the one induced by mapping a ↔ 1, b ↔ 2, c ↔ 3.
We now discuss the group extension problem.

• Let A, B, and G be groups. Then

{e} → A −i→ G −π→ B → {e}

is a short exact sequence if i is injective, π is surjective, and im i = ker π. Note that i(A) =
ker π E G and by the first isomorphism theorem, B ≅ G/A.
• In general, we say that an extension of A by B is a group G with a normal subgroup K ≅ A, with
G/K ≅ B. This is equivalent to the exactness of the above sequence. Hence the classification
of extensions of A by B is equivalent to classifying groups G where we know G/A ≅ B.
• The short exact sequence shown above splits if there is a group homomorphism j : B → G so
that π ◦ j = idB , and this occurs if and only if G ≅ A ⋊ B. For the forward direction, note that
if the sequence splits, then j is injective and im j ≅ B. Since im i ∩ im j = {e}, G ≅ A ⋊ B. To
show explicitly that G is an external semidirect product, we use

φ : B → Aut(A), φ(b)(a) = i−1 (j(b) i(a) j(b−1 )).

Example. The extensions of C2 = hai by C2 = hbi are

{e} → C2 → C2 × C2 → C2 → {e}

along with the nontrivial extension

{e} → C2 −i→ C4 = hci −π→ C2 → {e}

where i(a) = c2 and π(c) = b. The latter short exact sequence does not split. Hence even very
simple extensions can fail to be semidirect products.

6 Rings
6.1 Fundamentals
We begin with the basic definitions.

• A ring R is a set with two binary operations + and ×, so that R is an abelian group under the
operation + with identity element 0 ∈ R, and × is associative and distributes over +,

(a + b)c = ac + bc, a(b + c) = ab + ac

for all a, b, c ∈ R. If multiplication is commutative, we say the ring is commutative. Most


intuitive rules of arithmetic hold, with the notable exception that multiplication is not invertible.

• A ring R has an identity if there is an element 1 ∈ R where a1 = 1a = a, and 1 ≠ 0. If the


latter were not true, then everything would collapse down to the zero element. Most rings we
study will be commutative rings with an identity (CRIs).

• Here we give some fundamental examples of rings.

– Any field F is a CRI. The polynomials F[x] also form a CRI. More generally given any
ring R, the polynomials R[x] also form a ring. We may also define polynomial rings with
several variables, R[x1 , . . . , xn ].
– The integers Z, the Gaussian integers Z[i], and Zn are CRIs. The quaternions H form a
noncommutative ring.
– The set Mn (F) of n × n matrices over F is a ring, which implies End(V ) = Hom(V, V ) is a
ring for a vector space V .
– For an n×n matrix A, the set of polynomials evaluated on A, denoted F[A], is a commutative
subring of Mn (F). Note that the matrix A may satisfy nontrivial relations; for instance if
A2 = −I, then R[A] ≅ C.
– The space of bounded real sequences ℓ∞ is a CRI under componentwise addition and
multiplication, as is the set of continuous functions C(R). In general for a set S and
ring R we may form a ring RS out of functions f : S → R.
– The power set P(X) of a set X is a CRI where the multiplication operation is intersection,
and the addition operation is XOR, written as A∆B = (A \ B) ∪ (B \ A). Then the additive
inverse of each subset is itself. For a finite set, P(X) ≅ (Z2 )|X| .

• Polynomial rings over fields are familiar. However, we will be interested in polynomial rings
over rings, which are more subtle. For example, in Z8 [x] we have

(2x)(4x) = 8x2 = 0

so multiplication is not invertible. Moreover the quadratic x2 − 1 has four roots 1, 3, 5, 7, and
hence can be factored in two ways,

x2 − 1 = (x − 1)(x + 1) = (x − 3)(x − 5).

Much of our effort will be directed at finding when properties of C[x] carry over to general
polynomial rings.
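The arithmetic in the Z8 [x] example above is quick to verify numerically (a minimal sketch):

```python
# Roots of x^2 - 1 in Z_8: every odd residue squares to 1 mod 8.
assert [x for x in range(8) if (x * x - 1) % 8 == 0] == [1, 3, 5, 7]

# Both factorizations agree with x^2 - 1 at every point of Z_8
# (and coefficient-wise: (x-3)(x-5) = x^2 - 8x + 15 ≡ x^2 - 1 mod 8).
for x in range(8):
    assert ((x - 1) * (x + 1)) % 8 == (x * x - 1) % 8
    assert ((x - 3) * (x - 5)) % 8 == (x * x - 1) % 8

# Zero divisors: (2x)(4x) = 8x^2 = 0 in Z_8[x].
assert all((2 * x * 4 * x) % 8 == 0 for x in range(8))
```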

• A subring S ⊆ R is a subset of a ring R that is closed under + and ×. This implies 0 ∈ S. For
example, as in group theory, we always have the trivial subrings {0} and R. Given any subset
X ⊂ R, the subring generated by X is the smallest subring containing it.

• In a ring R, we say a nonzero element a ∈ R is a zero divisor if there exist nonzero b, c ∈ R so


that ab = ca = 0. An integral domain R is a CRI with no zero divisors.

• If R is an integral domain, then cancellation works: if a 6= 0 and ab = ac, then b = c. This is


because 0 = ab − ac = a(b − c), which implies b − c = 0.

• In a ring R with identity, an element a ∈ R is a unit if there exists a b ∈ R so that ab = ba = 1. If such a b exists, we write it as a−1 . The set of units R∗ forms a group under multiplication.

• We now give a few examples of these definitions.

– All fields are integral domains where every element is a unit.


– The integers Z form an integral domain with units ±1. The Gaussian integers Z[i] form an
integral domain with units ±1, ±i.
– In H there are no zero divisors, but it is not an integral domain, because it is not commutative.
– In Mn (R), the nonzero singular matrices are zero divisors, and the invertible matrices are
the units.
– In P(X), every nonempty proper set is a zero divisor and the only unit is X.
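The claims about P(X) can be checked exhaustively on a small set; the following sketch (an illustration, not from the text) models subsets of X = {0, 1, 2} as frozensets, with symmetric difference as addition and intersection as multiplication.

```python
from itertools import chain, combinations

X = frozenset({0, 1, 2})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]

add = lambda a, b: a ^ b   # symmetric difference, the ring addition
mul = lambda a, b: a & b   # intersection, the ring multiplication

# every element is its own additive inverse: A + A = 0
assert all(add(a, a) == frozenset() for a in subsets)

# every nonempty proper subset is a zero divisor: A * (X \ A) = 0
assert all(mul(a, X - a) == frozenset() for a in subsets if a and a != X)

# X is the multiplicative identity, and it is the only unit
assert all(mul(X, a) == a for a in subsets)
assert [a for a in subsets if any(mul(a, b) == X for b in subsets)] == [X]
```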

6.2 Quotient Rings and Field Extensions


6.3 Factorization
6.4 Modules
6.5 The Structure Theorem

7 Point-Set Topology
7.1 Definitions
We begin with the fundamentals, skipping content covered when we considered metric spaces.

Definition. A topological space is a set X and a topology T of subsets of X, whose elements are called the open sets of X. The topology must include ∅ and X and be closed under finite intersections and arbitrary unions.

Example. The topology containing all subsets of X is called the discrete topology, and the one
containing only X and ∅ is called the indiscrete/trivial topology.

Example. The finite complement topology Tf is the set of subsets U of X such that X − U is
either finite or all of X. The set of finite subsets U of X (plus X itself) fails to be a topology, since
it’s instead closed under arbitrary intersections and finite unions; taking the complement flips this.

Definition. Let T and T ′ be two topologies on X. If T ′ ⊃ T , then T ′ is finer than T . If the reverse is true, we say T ′ is coarser than T . If either is true, we say T and T ′ are comparable.

Definition. A basis B for a topology on X is a set of subsets of X, called basis elements, such that

• For every x ∈ X, there is at least one basis element B containing x.

• If x belongs to the intersection of two basis elements B1 and B2 , then there is a basis element
B3 containing x such that B3 ⊂ B1 ∩ B2 .

The topology T generated by B is the set of unions of elements of B. Conversely, B is a basis for T
if every element of T can be written as a union of elements of B.

Prop. The set of subsets generated by a basis B is a topology.

Proof. Most properties hold automatically, except for closure under finite intersections. It suffices
to consider the intersection of two sets, U1 , U2 ∈ T . Let x ∈ U1 ∩ U2 . We know there is a basis
element B1 ⊂ U1 that contains x, and a basis element B2 ⊂ U2 that contains x. Then there is a B3
containing x contained in B1 ∩ B2 , which is in U1 ∩ U2 . Then U1 ∩ U2 ∈ T , as desired.
Describing a topological space by a basis fits better with our intuitions. For example, the topology generated by B′ is finer than the topology generated by B iff every element of B can be written as the union of elements of B′. Intuitively, we “smash rocks (basis elements) into pebbles”.
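On a finite set, the topology generated by a basis can be built explicitly, which makes the proposition easy to test; a sketch (with an arbitrarily chosen set and basis, not from the text):

```python
from itertools import combinations

def topology_from_basis(X, basis):
    """Collect all unions of subcollections of the basis (the empty union gives {})."""
    opens = set()
    for r in range(len(basis) + 1):
        for sub in combinations(basis, r):
            opens.add(frozenset().union(*sub))
    return opens

X = frozenset(range(4))
basis = [frozenset({0}), frozenset({0, 1}), frozenset({2, 3}), X]
T = topology_from_basis(X, basis)

assert frozenset() in T and X in T
assert all(a | b in T for a in T for b in T)   # closed under unions
assert all(a & b in T for a in T for b in T)   # closed under finite intersections
```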

Example. The collection of one-point subsets is a basis for the discrete topology. The collection of
(open) circles is a basis for the “usual” topology of R2 , as is the collection of open rectangles. We’ll
formally show this later.

Example. Topologies on R. The standard topology on R has basis (a, b) for all real a < b, and
we’ll implicitly mean this topology whenever we write R. The lower limit topology on R, written
Rl , is generated by basis [a, b). The K-topology on R, written RK , is generated by open intervals
(a, b) and sets (a, b) − K, where K = {1/n | n ∈ Z+ }.
Both of these topologies are strictly finer than R. For x ∈ (a, b), we have x ∈ [x, b) ⊂ (a, b), so
Rl is finer; since there is no open interval containing x in [x, b), it is strictly finer. Similarly, there
is no open interval containing 0 in (−1, 1) − K, so RK is strictly finer.

Definition. A subbasis S for a topology on X is a set of subsets of X whose union is X. The topology it generates is the set of unions of finite intersections of elements of S.
Definition. Let X be an ordered set with more than one element. The order topology on X is
generated by a basis B containing all open intervals (a, b), and the intervals [a0 , b) and (a, b0 ] where
a0 and b0 are the smallest and largest elements of X, if they exist.
It’s easy to check B is a basis, as the intersection of two intervals is either empty or another interval.
Prop. The order topology on X contains the open rays

(a, +∞) = {x | x > a}, (−∞, a) = {x | x < a}.

Proof. Consider (a, +∞). If X has a largest element b0 , then (a, +∞) = (a, b0 ] is a basis element. Otherwise, (a, +∞) is the union of all basis elements of the form (a, x) for x > a.

Example. The order topology on R is just the usual topology. The order topology on R2 in the
dictionary order contains all open intervals of the form (a × b, c × d) where a < c or a = c and b < d.
It’s sufficient to take the intervals of the second type as a basis, since we can recover intervals of
the first type by taking unions of rays.
Example. The set X = {1, 2} × Z+ in the dictionary order looks like a1 , a2 . . . ; b1 , b2 , . . .. However,
the order topology on X is not the discrete topology, because it doesn’t contain {b1 }! All open sets
containing b1 must contain some ai .
Definition. If X and Y are topological spaces, the product topology on X × Y is generated by the
basis B containing all sets of the form U × V , where U and V are open in X and Y .
We can’t use B itself as the topology, since the union of product sets is generally not a product set.
Prop. If B and C are bases for X and Y , the set of products D = {B × C | B ∈ B, C ∈ C} is a basis
for the product topology on X × Y .
Proof. We must show that any U × V can be written as the union of members of D. For any x × y ∈ U × V , we have basis elements B ⊂ U containing x and C ⊂ V containing y. Then B × C ⊂ U × V and contains x × y, as desired.

Example. The standard topology on R2 is the product topology on R × R.


We can also find a subbasis for the product topology. Let π1 : X × Y → X denote projection onto
the first factor and let π2 : X × Y → Y be projection onto the second factor. If U is open in X,
then π1−1 (U ) = U × Y is open in X × Y .
Prop. The collection

S = {π1−1 (U ) | U open in X} ∪ {π2−1 (V ) | V open in Y }

is a subbasis for the product topology on X × Y . Intuitively, the basis contains rectangles, and the
subbasis contains strips.
Proof. Since every element of S is open in the product topology, we don’t get any extra open sets.
We know we get every open set because intersecting two strips gives a rectangle, so we can get every
basis element.

Definition. Let X be a topological space with topology T and let Y ⊂ X. Then


TY = {Y ∩ U | U ∈ T }
is the subspace topology on Y . Under this topology, Y is called a subspace of X.
We show TY is a topology using the distributive properties of ∩ and ∪. We have to be careful about
phrasing; if U ⊂ Y , we say U is open relative to Y if U ∈ TY and U is open relative to X if U ∈ T .
Lemma. If Y ⊂ X and B is a (sub)basis for T on X, BY = {B ∩ Y | B ∈ B} is a (sub)basis for TY .
Lemma. Let Y be a subspace of X. If U is open in Y and Y is open in X, then U is open in X.
Prop. If A is a subspace of X and B is a subspace of Y , then the product topology on A × B is the
same as the topology A × B inherits as a subspace of X × Y . (Product and subspace commute.)
Proof. We show their bases are equal. Every basis element of the topology X × Y is of the form
U × V for U open in X and V open in Y . Then the basis elements for the subspace topology A × B
of the form
(U × V ) ∩ (A × B) = (U ∩ A) × (V ∩ B).
But the basis elements of the subspace topologies on A and B are of the form U ∩ A and V ∩ B by our lemma, so this is just the basis for the product topology on A × B. Thus the topologies are the same.
The same result doesn’t hold for the order topology: if X has the order topology and Y is a subset of X, the subspace topology on Y need not be the same as the order topology it inherits from X.
Example. Let Y be the subset [0, 1] of R in the subspace topology. Then the basis has elements
of the form (a, b) for a, b ∈ Y , but also elements of the form [0, b) and (a, 1], which are not open
in R. This illustrates our above lemma. However, the order topology on Y does coincide with its
subspace topology.
Now let Y be the subset [0, 1) ∪ {2} of R. Then {2} is an open set in the subspace topology, but
it isn’t open in the order topology. (But it would be if Y were the subset [0, 1] ∪ {2}.)
Example. Let I = [0, 1]. The set I × I in the dictionary order topology is called the ordered square,
denoted Io2 . However, it is not the same as the subspace topology on I × I (as a subspace of the
dictionary order topology on R × R), since in the latter, {1/2} × (1/2, 1] is open.
In both examples above, the subspace topology looks strange because the intersection operation
chops up open sets into closed ones. We will show that if this never happens, the topologies coincide.
Prop. Let a subset Y of X be convex in X if, for every pair of points a < b in Y , all points in the interval (a, b) of X are in Y . If Y is convex in an ordered set X, the order topology and subspace topology on Y coincide.
Proof. We will show they contain each others’ subbases. We know Yord has a subbasis of rays in
Y , and Ysub has a subbasis consisting of the intersection of Y with rays in X.
Consider the intersection of ray (a, +∞) in X with Y . If a ∈ Y , we get a ray in Y . If a ∉ Y ,
then by convexity, a is either a lower or upper bound on Y , in which case we get all of Y or nothing.
Thus Yord contains Ysub .
Now consider a ray in Y , (a, +∞). This is just the intersection of Y with the ray (a, +∞) in X,
so Ysub contains Yord , giving the result.
In the future, we’ll assume that a subset Y of X is given the subspace topology, regardless of the
topology on X.

7.2 Closed Sets and Limit Points


Prop. Let Y be a subspace of X. If A is closed in Y and Y is closed in X, then A is closed in X.

Prop. Let Y be a subspace of X and let A ⊂ Y . Then the closure of A in Y is Ā ∩ Y , where Ā denotes the closure of A in X.

Proof. Let B denote the closure of A in Y . Since B is closed in Y , B = Y ∩ U where U is closed in X and contains A. Then Ā ⊂ U , so Ā ∩ Y ⊂ B. Next, since Ā is closed in X, Ā ∩ Y is closed in Y and contains A, so B ⊂ Ā ∩ Y . These two inclusions show the result.
Now we give a convenient way to find the closure of a set. Say that a set A intersects a set B if A ∩ B is not empty, and say U is a neighborhood of a point x if U is an open set containing x.

Theorem. Let A ⊂ X. Then x ∈ Ā iff every neighborhood of x intersects A. If X has a basis, the theorem is also true if we only use basis elements as neighborhoods.

Proof. Consider the contrapositive. Suppose x has a neighborhood U that doesn’t intersect A. Then X − U is closed and contains A, so Ā ⊂ X − U , so x ∉ Ā. Conversely, if x ∉ Ā, then X − Ā is a neighborhood of x that doesn’t intersect A.
Restricting to basis elements works because if U is a neighborhood of x, then by definition, there is a basis element B ⊂ U that contains x.

Definition. If A ⊂ X, we say x ∈ X is a limit point of A if it belongs to the closure of A − {x}. Equivalently, every neighborhood of x contains a point of A other than x itself; intuitively, there are points of A “arbitrarily close” to x.

Theorem. Let A ⊂ X and let A′ be the set of limit points of A. Then Ā = A ∪ A′.

Proof. The limit point criterion is stricter than the closure criterion above, so A′ ⊂ Ā, giving A ∪ A′ ⊂ Ā. To show the reverse, let x ∈ Ā. If x ∈ A, we’re done; otherwise, every neighborhood of x intersects A at a point other than x, so x ∈ A′. Then Ā ⊂ A ∪ A′.

Corollary. A subset of a topological space is closed iff it contains all its limit points.

Example. If A ⊂ R is the interval (0, 1], then Ā = [0, 1], but the closure of A in the subspace Y = (0, 2) is (0, 1]. We can also show that the closure of Q is R, and the closure of Z+ is Z+ itself. Note that Z+ has no limit points.

In a general topological space, intuitive statements about closed sets that hold in R may not
hold. For example, let X = {a, b} and T = {{}, {a}, {a, b}}. Then the one-point set {a} isn’t closed,
since it has b as a limit point!
Similarly, statements about convergence fail. Given a sequence of points xi ∈ X, we say the
sequence converges to x ∈ X if, for every neighborhood U of x, there is a positive integer N so that
xn ∈ U for all n ≥ N . Then the one-point sequence a, a, . . . converges to both a and b!
The problem is that the points a and b are “too close together”, so close that we can’t topologically
tell them apart. We add a new, mild axiom to prevent this from happening.
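The two-point example above can be checked mechanically; a quick sketch (not in the original notes):

```python
X = {"a", "b"}
T = [set(), {"a"}, {"a", "b"}]   # the topology from the example

def neighborhoods(x):
    return [U for U in T if x in U]

# b is a limit point of A = {a}: every neighborhood of b meets A - {b}
A = {"a"}
assert all(U & (A - {"b"}) for U in neighborhoods("b"))

# the constant sequence a, a, ... converges to both a and b, since every
# neighborhood of either point contains a
assert all("a" in U for U in neighborhoods("a"))
assert all("a" in U for U in neighborhoods("b"))
```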

Definition. A topological space X is Hausdorff if, for every two distinct points x1 , x2 ∈ X, there
exist disjoint neighborhoods of x1 and x2 . Then the points are “housed off” from each other.

Prop. Every finite point set in a Hausdorff space is closed.



Proof. It suffices to show this for a one-point set, {x0 }. If x ≠ x0 , then x has a neighborhood that doesn’t contain x0 . Then x is not in the closure of {x0 }, by definition.
This condition, called the T1 axiom, is even weaker than the Hausdorff axiom.

Prop. Let X satisfy the T1 axiom and let A ⊂ X. Then x is a limit point of A iff every neighborhood
of x contains infinitely many points of A.

Proof. Suppose the neighborhood U of x contains finitely many points of A − {x}, and call this finite set A0 . Since A0 is closed, U ∩ (X − A0 ) is a neighborhood of x disjoint from A − {x}, so x is not a limit point of A.
If every neighborhood U of x contains infinitely many points of A, then every such neighborhood
contains at least one point of A − {x}, so x is a limit point of A.

Prop. If X is a Hausdorff space, sequences in X have unique limits.

Proof. Let xn → x and y ≠ x. Then x and y have disjoint neighborhoods U and V . Since all but
finitely many xn are in U , the same cannot be true of V , so xn does not converge to y.

Prop. Every order topology is Hausdorff, and the Hausdorff property is preserved by products and
subspaces.

7.3 Continuous Functions


Example. Let f : R → R be continuous. Then given x0 ∈ R and ε > 0, f −1 ((f (x0 ) − ε, f (x0 ) + ε)) is open in R. Since this set contains x0 , it must contain a basis element (a, b) about x0 , so it contains (x0 − δ, x0 + δ) for some δ. Thus, if f is continuous, |x − x0 | < δ implies |f (x) − f (x0 )| < ε, the standard continuity criterion. The two are equivalent.

Example. Let f : R → Rl be the identity function f (x) = x. Then f is not continuous, because the inverse image of the open set [a, b) of Rl is not open in R.

Definition. Let f : X → Y be injective and continuous and let Z = f (X), so the restriction
f 0 : X → Z is bijective. If f 0 is a homeomorphism, we say f is a topological imbedding of X in Y .

Example. The topological spaces (−1, 1) and R are homeomorphic. Define F : (−1, 1) → R and its inverse G as

F (x) = x/(1 − x2 ), G(y) = 2y/(1 + (1 + 4y 2 )1/2 ).

Because F is order-preserving and bijective, it carries basis elements of (−1, 1) to basis elements of R, so it is a homeomorphism. Alternatively, we can show F and G are continuous using facts from calculus.

Example. Define f : [0, 1) → S 1 by f (t) = (cos 2πt, sin 2πt). Then f is bijective and continuous.
However, f −1 is not, since f sends the open set [0, 1/4) to a non-open set. This makes sense, since
our two sets are topologically distinct.

As in real analysis, we now give rules for constructing continuous functions.

Prop. Let X and Y be topological spaces.

• The constant function is continuous.



• Compositions of continuous functions are continuous.

• Let A be a subspace of X. The inclusion function j : A → X is continuous, and the restriction of a continuous f : X → Y to A, f |A : A → Y , is continuous.

• (Range) Let f : X → Y be continuous. If Z is a subspace of Y containing f (X), the function g : X → Z obtained by restricting the range of f is continuous. If Z is a space having Y as a subspace, the function h : X → Z obtained by expanding the range of f is also continuous.

• (Local criterion) The map f : X → Y is continuous if X can be written as the union of open
sets Uα so that f |Uα is continuous for each α.

• (Pasting) Let X = A ∪ B where A and B are closed in X. If f : A → Y and g : B → Y are continuous and agree on A ∩ B, then they combine to yield a continuous function h : X → Y .

Proof. Most of these properties are straightforward, so we only prove the last one. Let C be a
closed subset of Y . Then h−1 (C) = f −1 (C) ∪ g −1 (C). These sets are closed in A and B respectively,
and hence closed in X. Then h−1 (C) is closed in X.

Example. The pasting lemma also works if A and B are both open, since the local criterion applies. However, it can fail if one of the sets is open and the other is closed. Consider the real line and let A = (−∞, 0) and let B = [0, ∞), with f (x) = x − 2 and g(x) = x + 2. These functions are continuous on A and B respectively, but pasting them yields a function discontinuous at x = 0.

Prop. Write f : A → X × Y as f (a) = (f1 (a), f2 (a)). Then f is continuous iff the coordinate
functions f1 and f2 are. This is another manifestation of the universal property of the product.

Proof. If f is continuous, the composition fi = πi ◦ f is continuous. Conversely, let f1 and f2 be continuous. We will show the inverse image of each basis element is open. By set theory, f −1 (U × V ) = f1−1 (U ) ∩ f2−1 (V ), which is open since it’s the intersection of two open sets.
This theorem is useful in vector calculus; for example, a vector field is continuous iff its components
are.

7.4 The Product Topology


We now generalize the product topology to arbitrary Cartesian products.

Definition. Given an index set J and a set X, a J-tuple of elements of X is a function x : J → X. We also write x as (xα )α∈J . Denote the set of such J-tuples as X J .

Definition. Given an indexed family of sets {Aα }α∈J , let X = ∪α∈J Aα and define their Cartesian product ∏α∈J Aα as the subset of X J where xα ∈ Aα for each α ∈ J.

Definition. Let {Xα }α∈J be an indexed family of topological spaces, and let Uα denote an arbitrary open set in Xα .

• The box topology on ∏ Xα has basis elements of the form ∏ Uα .

• The product topology on ∏ Xα has subbasis elements of the form πα−1 (Uα ), for arbitrary α.

We’ve already seen that in the finite case, these two definitions are equivalent. However, they differ in the infinite case, because subbasis elements only generate open sets under finite intersections. Then the basis elements of the product topology are of the form ∏ Uα , where Uα = Xα for all but finitely many values of α. We prefer the product topology, for the following reason.
Prop. Write f : A → ∏ Xα as f (a) = (fα (a))α∈J . If ∏ Xα has the product topology, then f is continuous iff the coordinate functions fα are.

Proof. If f is continuous, the composition fα = πα ◦ f is continuous. Conversely, let the fα be continuous. We will show the inverse image of each subbasis element is open. The inverse image of πβ−1 (Uβ ) is fβ−1 (Uβ ), which is open in A by the continuity of fβ .

Example. The above proposition doesn’t hold for the box topology. Consider Rω and let f (t) = (t, t, . . .). Then each coordinate function is continuous, but the inverse image of the basis element

B = (−1, 1) × (−1/2, 1/2) × (−1/3, 1/3) × · · ·

is not open, because it contains the point zero, but no basis element (−δ, δ) about the point zero. This is inherently because open sets are not closed under infinite intersections.
In the future, whenever we consider ∏ Xα , we will implicitly give it the product topology. The box topology will sometimes be used to construct counterexamples.

Prop. The following results hold for ∏ Xα in either the box or product topologies.

• If Aα is a subspace of Xα , then ∏ Aα is a subspace of ∏ Xα if both are given the box or product topologies.

• If each Xα is Hausdorff, so is ∏ Xα .

• Let Aα ⊂ Xα . Then the closure of ∏ Aα is ∏ Āα .

• Let Xα have basis Bα . Then the sets ∏ Bα with Bα ∈ Bα form a basis for the box topology. The same collection of sets, where Bα = Xα for all but a finite number of α, is a basis for the product topology. Thus the box topology is finer than the product topology.

7.5 The Metric Topology


Definition. If X is a metric space with metric d, the collection of all ε-balls

Bd (x, ε) = {y | d(x, y) < ε}

is a basis for a topology on X, called the metric topology induced by d. We say a topological space is metrizable if it can be induced by a metric on the underlying set, and call a metrizable space together with its metric a metric space.

Metric spaces correspond nicely with our intuitions from analysis. For example, using the basis above, a set U is open if, for every y ∈ U , U contains an ε-ball centered at y. Different choices of metric may yield the same topology; properties dependent on such a choice are not topological properties.

Example. The metric d(x, y) = 1 (for x ≠ y) generates the discrete topology.

Example. The metric d(x, y) = |x − y| on R generates the standard topology on R, because its
basis elements (x − , x + ) are the same as those of the order topology, (a, b).

Example. Boundedness is not a topological property. Let X be a metric space with metric d. A subset A of X is bounded if the set of distances d(a1 , a2 ) with a1 , a2 ∈ A has an upper bound. If A is bounded, its diameter is

diam A = sup{d(a1 , a2 ) | a1 , a2 ∈ A}.

The standard bounded metric d̄ on X is defined by

d̄(x, y) = min(d(x, y), 1).

Then every set is bounded if we use the metric d̄, but d and d̄ generate the same topology! Proof: we may use the set of ε-balls with ε < 1 as a basis for the metric topology. These sets are identical for d and d̄.
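The claim that d and the standard bounded metric generate the same topology can be illustrated numerically (a sketch, not from the text): for ε < 1 the ε-balls of the two metrics agree.

```python
d = lambda x, y: abs(x - y)
dbar = lambda x, y: min(d(x, y), 1)   # the standard bounded metric

pts = [i / 10 for i in range(-20, 21)]
for x in pts:
    for eps in (0.1, 0.5, 0.99):      # any eps < 1 works
        assert {y for y in pts if d(x, y) < eps} == \
               {y for y in pts if dbar(x, y) < eps}

# but with dbar every set is bounded: all distances are at most 1
assert max(dbar(x, y) for x in pts for y in pts) <= 1
```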

We now show that Rn is metrizable.

Definition. Given x = (x1 , . . . , xn ) ∈ Rn , we define the Euclidean metric d2 as

d2 (x, y) = ‖x − y‖2 , ‖x‖2 = (x12 + . . . + xn2 )1/2 .

We may also define other metrics with a general exponent; in particular,

d∞ (x, y) = max{|x1 − y1 |, . . . , |xn − yn |}.



8 Algebraic Topology
8.1 Constructing Spaces
8.2 The Fundamental Group
8.3 Group Presentations
8.4 Covering Spaces

9 Methods for ODEs


9.1 Differential Equations
In this section, we will focus on techniques for solving linear ordinary differential equations (ODEs).

• Our problems will be of the form


Ly(x) = f (x), L = Pn ∂ n + . . . + P0 , a≤x≤b
where L is a linear differential operator and f is the forcing function.
• There are several ways we can specify a solution. When the independent variable x represents
time, we often use initial conditions, specifying y and its derivatives at x = a. When x represents
space, we often use boundary conditions, which constrain y and its derivatives at x = a or
x = b.
• We will consider only linear boundary conditions, i.e. those of the form

Σn an y (n) (x0 ) = γ, x0 ∈ {a, b}.

The boundary condition is homogeneous if γ is zero. Boundary value problems are more subtle
than initial value problems, because a given set of boundary conditions may admit no solutions
or infinitely many. As such, we will completely ignore the boundary conditions for now.
• By the linearity of L, the general solution consists of a solution to the equation plus any solution
to the homogeneous equation, which has f = 0 . The solutions to the homogeneous equation
form an n-dimensional vector space. For simplicity we will focus on the case n = 2 below.
• The simplest way to check if a set of solutions to the homogeneous equation is linearly dependent is to evaluate the Wronskian. For n = 2 it is

W (y1 , y2 ) = y1 y2′ − y2 y1′ ,

the determinant of the matrix with rows (y1 , y2 ) and (y1′ , y2′ ); the generalization to arbitrary n is straightforward. If the solutions are linearly dependent, then the Wronskian vanishes.
• The converse to the above statement is a bit subtle. It is clearly true if the Pi are all constants. However, if P2 (x0 ) = 0 for some x0 , then y′′ is not determined at that point; hence two solutions may be dependent for x < x0 but become independent for x > x0 . If P2 (x) never vanishes, the converse is indeed true.
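These Wronskian facts are easy to spot-check numerically; the sketch below (not part of the notes) approximates the derivatives by central differences.

```python
import math

def wronskian(y1, y2, x, h=1e-5):
    """W(y1, y2) = y1 y2' - y2 y1', with derivatives by central differences."""
    d = lambda g: (g(x + h) - g(x - h)) / (2 * h)
    return y1(x) * d(y2) - y2(x) * d(y1)

# sin and cos are independent solutions of y'' + y = 0, with W = -1 everywhere
for x in (0.0, 0.7, 2.0):
    assert abs(wronskian(math.sin, math.cos, x) + 1) < 1e-6

# dependent solutions have vanishing Wronskian
assert abs(wronskian(math.sin, lambda t: 3 * math.sin(t), 0.7)) < 1e-6
```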
• For constant coefficients, the homogeneous solutions may be found by guessing exponentials.
In the case where Pn ∝ xn , all terms have the same power, so we may guess a power xm .
• Another useful trick is reduction of order. Suppose one solution y1 (x) is known. We guess a solution of the form

y(x) = v(x)y1 (x).

Plugging this in, all terms proportional to v cancel because y1 satisfies the ODE, giving

P2 (2v′y1′ + v′′y1 ) + P1 v′y1 = 0

which is a first-order ODE in v′.

Next, we introduce variation of parameters to solve the inhomogeneous equation.

• Given homogeneous solutions y1 (x) and y2 (x), we guess an inhomogeneous solution

y(x) = c1 (x)y1 (x) + c2 (x)y2 (x).

We impose the condition c1′ y1 + c2′ y2 = 0, so we have

y′ = c1 y1′ + c2 y2′ , y′′ = c1 y1′′ + c2 y2′′ + c1′ y1′ + c2′ y2′

and the condition ensures that no second derivatives of the ci appear.

• Plugging this into the ODE we find

Ly = P2 (c1′ y1′ + c2′ y2′ ) = f

where many terms drop out since y1 and y2 are homogeneous solutions.

• We are left with a system of two first-order ODEs for the ci , which are solvable. By solving the system, we find

c1′ = −f y2 /(P2 W ), c2′ = f y1 /(P2 W )

where W is again the Wronskian. Then the general solution is

y(x) = −y1 (x) ∫^x f (t)y2 (t)/(P2 (t)W (t)) dt + y2 (x) ∫^x f (t)y1 (t)/(P2 (t)W (t)) dt.

As before, there are issues if P2 (t) ever vanishes, so we assume it doesn’t. The constants of integration from the unspecified lower bounds allow the addition of an arbitrary homogeneous solution.

• So far we haven’t accounted for boundary conditions. Consider the simple case y(a) = y(b) = 0.
We choose homogeneous solutions obeying

y1 (a) = y2 (b) = 0.

Then the boundary conditions require

c2 (a) = c1 (b) = 0

which fixes the unique solution

y(x) = y1 (x) ∫_x^b f (t)y2 (t)/(P2 (t)W (t)) dt + y2 (x) ∫_a^x f (t)y1 (t)/(P2 (t)W (t)) dt.

We can also write this in terms of a Green’s function g(x, t),

y(x) = ∫_a^b g(x, t)f (t) dt, g(x, t) = (1/(P2 (t)W (t))) × { y1 (t)y2 (x) for t ≤ x; y2 (t)y1 (x) for x ≤ t }.

Similar methods work for any homogeneous boundary conditions.
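As a sanity check (not from the notes), we can test the Green's function formula on the simplest case y′′ = f with y(0) = y(1) = 0, where y1 = x, y2 = x − 1, and P2 = W = 1; for f = 1 the exact solution is x(x − 1)/2.

```python
def green(x, t):
    # g(x,t) = y1(t) y2(x) for t <= x, y2(t) y1(x) for x <= t, with P2 W = 1
    return t * (x - 1) if t <= x else (t - 1) * x

def solve(f, x, steps=2000):
    """Trapezoid-rule approximation of y(x) = integral of g(x,t) f(t) dt."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * green(x, t) * f(t)
    return total * h

# matches the exact solution x(x-1)/2, including the boundary values
for x in (0.0, 0.25, 0.5, 1.0):
    assert abs(solve(lambda t: 1.0, x) - x * (x - 1) / 2) < 1e-4
```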



9.2 Eigenfunction Methods


We begin by reviewing Fourier series.

• Fourier series are defined for functions f : S 1 → C, parametrized by θ ∈ [−π, π). We define the Fourier coefficients

f̂n = (1/2π)(einθ , f ) ≡ (1/2π) ∫_0^2π e−inθ f (θ) dθ.

We then claim that

f (θ) = Σn∈Z f̂n einθ .

Before continuing, we investigate whether this sum converges to f , if it converges at all.

• One can show that the Fourier series converges to f for continuous functions with bounded continuous derivatives. Fejer’s theorem states that one can always recover f from the f̂n as long as f is continuous except at finitely many points, though it makes no statement about the convergence of the Fourier series. One can also show that the Fourier series converges to f as long as Σn |f̂n | converges.

• The Fourier coefficients for the sawtooth function f (θ) = θ are

f̂n = 0 for n = 0, f̂n = (−1)n+1 /in otherwise.

At the discontinuity, the Fourier series converges to the average of f (π + ) and f (π − ). This always happens: to show that, simply add the sawtooth to any function with a discontinuity to remove it, then apply linearity.
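These coefficients can be confirmed numerically; the sketch below (not in the notes) uses a plain trapezoid rule for the integral over [−π, π].

```python
import cmath, math

def fourier_coeff(f, n, steps=20000):
    """Trapezoid-rule approximation of (1/2pi) * integral of e^{-in theta} f."""
    h = 2 * math.pi / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        theta = -math.pi + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * cmath.exp(-1j * n * theta) * f(theta)
    return total * h / (2 * math.pi)

saw = lambda theta: theta
assert abs(fourier_coeff(saw, 0)) < 1e-4
for n in (1, 2, 3, -1, -2):
    assert abs(fourier_coeff(saw, n) - (-1) ** (n + 1) / (1j * n)) < 1e-4
```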

• Integration makes Fourier series ‘nicer’ by dividing f̂n by in, while differentiation does the opposite. In particular, a discontinuity appears as 1/n decay of the Fourier coefficients (as shown for the sawtooth), so a discontinuity of f (k) appears as 1/nk+1 decay. For a smooth function, the Fourier coefficients fall faster than any power.

• Right next to a discontinuity, the truncated Fourier series displays an overshoot by about 18%,
called the Gibbs-Wilbraham phenomenon. The width of the overshoot region goes to zero as
more terms are added, but the maximum extent of the overshoot remains the same; this shows
that the Fourier series converges pointwise rather than uniformly. (The phenomenon can be
shown explicitly for the square wave; this extends to all other discontinuities by linearity.)

• Computing the norm-squared of f in position space and Fourier space gives Parseval’s identity,

∫_−π^π |f (θ)|2 dθ = 2π Σk∈Z |f̂k |2 .

This is simply the fact that the map f (x) → f̂n is unitary.
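For the sawtooth, both sides of Parseval's identity can be evaluated directly (a numerical sketch, not in the notes): the left side is ∫ θ2 dθ = 2π 3 /3, and |f̂n |2 = 1/n2 for each n ≠ 0.

```python
import math

lhs = 2 * math.pi ** 3 / 3               # integral of theta^2 over [-pi, pi)
# |f_n|^2 = 1/n^2 for n != 0, summed over positive and negative n
rhs = 2 * math.pi * 2 * sum(1 / n ** 2 for n in range(1, 200_000))
assert abs(lhs - rhs) / lhs < 1e-4       # partial sums approach pi^2/6
```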

• Parseval’s theorem also gives error bounds: the mean-squared error from cutting off a Fourier
series is proportional to the length of the remaining Fourier coefficients. In particular, the best
possible approximation of a function f (in terms of mean-squared error) using only a subset of
the Fourier coefficients is obtained by simply truncating the Fourier series.

Fourier series are simply changes of basis in function space, and linear differential operators are
linear operators in function space.

• We are interested in solving the eigenfunction problem

Lyi (x) = λi yi (x)

along with homogeneous boundary conditions. Generically, there will be infinitely many eigen-
functions, allowing us to construct a solution to the inhomogeneous problem by linearity.

• We define the inner product on the function space as

(u, v) = ∫_a^b u(x)v(x) dx.

Note there is no conjugation because we only work with real functions.

• We wish to define the adjoint L∗ of a linear operator L by

(Ly, w) = (y, L∗ w).

We could then get an explicit expression for L∗ using integration by parts. However, generally
we end up with boundary terms, which don’t have the correct form.

• Suppose that we have certain homogeneous boundary conditions on y. Demanding that the
boundary terms vanish will induce homogeneous boundary conditions on w. If L = L∗ and the
boundary conditions stay the same, the problem is self-adjoint. If only L = L∗ , then we call L
self-adjoint, or Hermitian.

Example. We take L = ∂ 2 with y(a) = 0, y′(b) − 3y(b) = 0. Then we have

∫_a^b wy′′ dx = (wy′ − w′y)|_a^b + ∫_a^b yw′′ dx.

Hence we have L∗ = ∂ 2 , and the induced boundary conditions are

w′(b) − 3w(b) = 0, w(a) = 0.

Hence the problem is self-adjoint.
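We can confirm this numerically on [a, b] = [0, 1] (a sketch, not from the notes): f (x) = x − 2x2 and g(x) = x3 both satisfy the boundary conditions y(0) = 0 and y′(1) = 3y(1), and (Lf, g) = (f, Lg) holds.

```python
def inner(u, v, steps=4000):
    """Trapezoid rule for the inner product on [0, 1]."""
    h = 1.0 / steps
    s = 0.0
    for k in range(steps + 1):
        x = k * h
        w = 0.5 if k in (0, steps) else 1.0
        s += w * u(x) * v(x)
    return s * h

f   = lambda x: x - 2 * x * x     # f(0) = 0, f'(1) = -3 = 3 f(1)
fpp = lambda x: -4.0              # f'' = Lf
g   = lambda x: x ** 3            # g(0) = 0, g'(1) = 3 = 3 g(1)
gpp = lambda x: 6 * x             # g'' = Lg

assert abs(inner(fpp, g) - inner(f, gpp)) < 1e-5   # (Lf, g) = (f, Lg)
```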

Now we focus on the eigenfunctions.

• Eigenfunctions of the adjoint problem have the same eigenvalues as the original problem. That is, if Ly = λy, there is a w so that L∗ w = λw. This is intuitive if we think of L∗ as the transpose of L, though we won’t prove it formally here.

• Eigenfunctions with different eigenvalues are orthogonal. Specifically, let

Lyj = λj yj , Lyk = λk yk

where the latter yields L∗ wk = λk wk . Then if λj ≠ λk , we have ⟨yj , wk ⟩ = 0. This follows from the same proof as for matrices.

• To solve a general inhomogeneous boundary value problem, we solve the eigenvalue problem (subject to homogeneous boundary conditions) as well as the adjoint eigenvalue problem, to obtain (λj , yj , wj ). To obtain a solution for Ly = f (x) we assume

y = Σi ci yi (x).

We then solve for the coefficients by projection,

⟨f, wk ⟩ = ⟨Ly, wk ⟩ = ⟨y, L∗ wk ⟩ = ⟨y, λk wk ⟩ = λk ck ⟨yk , wk ⟩

from which we may find ck .

• Finally, consider the case of inhomogeneous boundary conditions. Such a problem can always
be split into an inhomogeneous problem with homogeneous boundary conditions, and a homoge-
neous problem with inhomogeneous boundary conditions. Since solving homogeneous problems
tends to be easier, this case isn’t much harder.

Example. Consider the inhomogeneous problem

y 00 = f (x), 0 ≤ x ≤ 1, y(0) = α, y(1) = β.

Performing the decomposition described above, the homogeneous boundary conditions are simply
y(0) = y(1) = 0, so the eigenfunctions are

yk (x) = sin(kπx), λk = −k 2 π 2 , k = 1, 2, . . . .

The problem is self-adjoint, so yk = wk and we have

ck = ⟨f, wk ⟩/(λk ⟨yk , wk ⟩) = −(2/k2 π2 ) ∫_0^1 f (x) sin(kπx) dx.

To handle the inhomogeneous boundary conditions, we simply add on (β − α)x + α.
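For a concrete instance of this procedure, take f(x) = 1 with α = β = 0; the series built from the coefficients c_k above should converge to the exact solution y = (x² − x)/2. A minimal numerical sketch:

```python
import numpy as np

xx = 0.5
ks = np.arange(1, 2001)
# c_k = -2 \int_0^1 sin(k pi x) dx / (k^2 pi^2),
# using \int_0^1 sin(k pi x) dx = (1 - (-1)^k)/(k pi) for f = 1
ck = -2 * (1 - (-1.0)**ks) / (ks * np.pi) / (ks**2 * np.pi**2)
y_series = np.sum(ck * np.sin(ks * np.pi * xx))
y_exact = (xx**2 - xx) / 2
print(y_series, y_exact)   # both ≈ -0.125
```
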

• For most applications, we’re interested in second-order linear differential operators,

L = P(x) d²/dx² + R(x) d/dx − Q(x),   Ly = 0.

• We may simplify L using the method of integrating factors,

(1/P(x)) L = d²/dx² + (R(x)/P(x)) d/dx − Q(x)/P(x) = e^{−∫^x R(t)/P(t) dt} (d/dx) ( e^{∫^x R(t)/P(t) dt} (d/dx) ) − Q(x)/P(x).

Assuming P(x) ≠ 0, the equation Ly = 0 is equivalent to (1/P(x)) Ly = 0. Hence any L can
be taken to have the form

L = d/dx ( p(x) d/dx ) − q(x)
without loss of generality. Operators in this form are called Sturm-Liouville operators.

• Sturm-Liouville operators are self-adjoint under the inner product


(f, g) = ∫_a^b f(x)* g(x) dx

provided that the functions on which they act obey appropriate boundary conditions. To see
this, apply integration by parts for

(Lf, g) − (f, Lg) = [ p(x) ( (df*/dx) g − f* (dg/dx) ) ]_a^b.

• There are several possible boundary conditions that ensure the boundary term vanishes. For
example, we can demand
f(a)/f'(a) = c_a,   f(b)/f'(b) = c_b
for constants ca and cb , for all functions f . Alternatively, we can demand periodicity,

f(a) = f(b),   f'(a) = f'(b).

Another possibility is that p(a) = p(b) = 0, in which case the term automatically vanishes.
Naturally, we always assume the functions are smooth.

Next, we consider the eigenfunctions of the Sturm-Liouville operators.

• A function y(x) is an eigenfunction of L with eigenvalue λ and weight function w(x) if

Ly(x) = λw(x)y(x).

The weight function must be real, nonnegative, and have finitely many zeroes on the domain
[a, b]. It isn’t necessary, as we can remove it by redefining y and L, but it will be convenient.

• We define the inner product with weight w to be


(f, g)_w = ∫_a^b f*(x) g(x) w(x) dx

so that (f, g)w = (f, wg) = (wf, g). The conditions on the weight function are chosen so that
the inner product remains nondegenerate, i.e. (f, f )w = 0 implies f = 0. We take the weight
function to be fixed for each problem.

• By the usual proof, if L is self-adjoint, then the eigenvalues λ are real. Moreover, since everything
is real except for the functions themselves, f ∗ is an eigenfunction if f is. Thus we can always
switch basis to Re f and Im f , so the eigenfunctions can be chosen real.

• Moreover, eigenfunctions with different eigenvalues are orthogonal, as

λi (fj , fi )w = (fj , Lfi ) = (Lfj , fi ) = λj (fj , fi )w .

Thus we can construct an orthonormal set {Y_n(x)} from eigenfunctions y_n(x) by setting
Y_n = y_n / √((y_n, y_n)_w).

• One can show that the eigenvalues form a countably infinite sequence {λn } with |λn | → ∞
as n → ∞, and that the eigenfunctions Yn (x) form a complete set for functions satisfying the
given boundary conditions. Thus we may always expand such a function f as

f(x) = Σ_n f_n Y_n(x),   f_n = (Y_n, f)_w = ∫_a^b Y_n*(x) f(x) w(x) dx.

From now on we ignore convergence issues for infinite sums.


• Parseval's identity carries over, as

(f, f)_w = Σ_n |f_n|².
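Parseval's identity can be checked numerically; a sketch, taking w = 1 on [0, 1], orthonormal eigenfunctions Y_n = √2 sin(nπx), and the (hypothetical) test function f = x(1 − x):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x * (1 - x)
# coefficients f_n = (Y_n, f)_w with Y_n = sqrt(2) sin(n pi x) and w = 1
fn = [quad(lambda x, n=n: np.sqrt(2) * np.sin(n * np.pi * x) * f(x), 0, 1)[0]
      for n in range(1, 200)]
lhs = quad(lambda x: f(x)**2, 0, 1)[0]   # (f, f) = 1/30
rhs = sum(c**2 for c in fn)
print(lhs, rhs)   # both ≈ 0.033333
```
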

Example. We choose periodic boundary conditions on [−L, L] with L = d2 /dx2 and w(x) = 1.
Solving the eigenfunction equation
y 00 (x) = λy(x)
gives solutions

y_n(x) = exp(inπx/L),   λ_n = −(nπ/L)²,   n ∈ Z.
Thus we’ve recovered the Fourier series.
Example. Consider the differential equation
(1/2) H'' − x H' = −λH,   x ∈ ℝ
subject to the condition that H(x) grows sufficiently slowly at infinity, to ensure inner products
exist. Using the method of integrating factors, we rewrite the equation in Sturm-Liouville form,
 
d/dx ( e^{−x²} dH/dx ) = −2λ e^{−x²} H(x).

This is now an eigenfunction equation with weight function w(x) = e^{−x²}. Thus weight functions
naturally arise when converting general second-order linear differential operators to Sturm-Liouville
form. The solutions are the Hermite polynomials,
H_n(x) = (−1)^n e^{x²} (d^n/dx^n) e^{−x²}
and they are orthogonal with respect to the weight function w(x).
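This orthogonality can be checked with scipy (a sketch; eval_hermite uses the physicists' convention matching the formula above, with norm ∫ H_n² e^{−x²} dx = 2ⁿ n! √π):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_hermite
from math import factorial

def inner(m, n):
    # (H_m, H_n) with weight w(x) = exp(-x^2)
    return quad(lambda x: eval_hermite(m, x) * eval_hermite(n, x) * np.exp(-x**2),
                -np.inf, np.inf)[0]

print(inner(2, 3))   # ≈ 0 by orthogonality
print(inner(3, 3), 2**3 * factorial(3) * np.sqrt(np.pi))   # both ≈ 85.08
```
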
Example. Consider the inhomogeneous equation
Lφ(x) = w(x)F (x)
where F (x) is a forcing term. Expanding in the eigenfunctions yields the particular solution

φ_p(x) = Σ_n ( (Y_n, F)_w / λ_n ) Y_n(x).

Alternatively, expanding this as an integral and defining f (x) = w(x)F (x), we have

φ_p(x) = ∫_a^b G(x, ξ) f(ξ) dξ,   G(x, ξ) = Σ_n Y_n(x) Y_n*(ξ) / λ_n.
The function G is called a Green’s function, and it provides a formal inverse to L. It gives the
response at x to forcing at ξ.

9.3 Distributions
We now take a detour by defining distributions, as the Dirac delta ‘function’ will be needed later.

• Given a domain Ω, we choose a class of test functions D(Ω). The test functions are required to
be infinitely smooth and have compact support; one example is
ψ(x) = e^{−1/(1−x²)} for |x| < 1,   ψ(x) = 0 otherwise.

A distribution T is a linear map T : D(Ω) → ℝ given by T : φ ↦ T[φ]. The set of distributions
is written as D′(Ω), the dual space of D(Ω). It is a vector space under the usual operations.

• We can define the product of a distribution and a test function by

(ψT )[φ] = T [ψφ].

However, there is no way to multiply distributions together.

• The simplest type of distribution is an integrable function f : Ω → R, where we define the


action by the usual inner product of functions,
f[φ] = (f, φ) = ∫ f(x) φ(x) dV.

However, the most important example is the Dirac delta ‘function’,

δ[φ] = φ(0)

which cannot be thought of this way. Though we often write the Dirac δ-function under integrals,
we always implicitly think of it as a functional of test functions.

• The Dirac δ-function can also be defined as the limit of a sequence of distributions, e.g.
G_n(x) = n e^{−n²x²} / √π.

In terms of functions, the limit limn→∞ Gn (x) does not exist. But if we view the functions
as distributions, we have limn→∞ (Gn , φ) = φ(0) for each φ, giving a limiting distribution, the
Dirac delta.

• Next, we can define the derivative of a distribution by integration by parts,

T'[φ] = −T[φ'].

This trick means that distributions are infinitely differentiable, despite being incredibly badly
behaved! For example, δ'[φ] = −φ'(0). As another example, the step function Θ(x) is not
differentiable as a function, but as a distribution,

Θ'[φ] = −Θ[φ'] = φ(0) − φ(∞) = φ(0)

which gives Θ' = δ.

• The Dirac δ-function obeys


δ(f(x)) = Σ_i δ(x − x_i) / |f'(x_i)|

where the x_i are the roots of f. This can be shown nonrigorously by treating the delta function
as an ordinary function and using integration rules; it can also be proven entirely within
distribution theory.

• The Fourier series of the Dirac δ-function on [−L, L] is


δ(x) = (1/2L) Σ_{n∈Z} e^{inπx/L}.

Again, the right-hand side must be thought of as a limit of a series of distributions. When
integrated against a test function φ(x), it extracts the sum of the Fourier coefficients φ̂n , which
yields φ(0).

• Similarly, we can expand the Dirac δ-function in any basis of orthonormal functions,
δ(x − ξ) = Σ_n c_n Y_n(x),   c_n = ∫_a^b Y_n*(x) δ(x − ξ) w(x) dx = Y_n*(ξ) w(ξ).

This gives the expansion


δ(x − ξ) = w(ξ) Σ_n Y_n*(ξ) Y_n(x) = w(x) Σ_n Y_n*(ξ) Y_n(x)

where we can replace w(ξ) with w(x) since δ(x − ξ) is zero for all x ≠ ξ. To check this expression,
note that if g(x) = Σ_m d_m Y_m(x), then

∫_a^b g*(x) δ(x − ξ) dx = Σ_{m,n} Y_n*(ξ) d_m* ∫_a^b w(x) Y_m*(x) Y_n(x) dx = Σ_m d_m* Y_m*(ξ) = g*(ξ).

We will apply the eigenfunction expansion of the Dirac δ-function to Green’s functions below.
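A short numerical sketch of the limiting procedure described above: integrating the Gaussians G_n(x) = n e^{−n²x²}/√π against a (hypothetical) test function approaches φ(0) as n grows.

```python
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.cos(x) * np.exp(-x**2)   # smooth, rapidly decaying, phi(0) = 1
for n in [1, 4, 16]:
    Gn = lambda x: n * np.exp(-(n * x)**2) / np.sqrt(np.pi)
    val = quad(lambda x: Gn(x) * phi(x), -6, 6, points=[0])[0]
    print(n, val)   # approaches phi(0) = 1 as n grows
```
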

Note. Principal value integrals. Suppose we wanted to view the function 1/x as a distribution.
This isn't possible directly because of the divergence at x = 0, but we can use the principal value

P(1/x)[f(x)] = lim_{ε→0+} ( ∫_{−∞}^{−ε} f(x)/x dx + ∫_ε^∞ f(x)/x dx ).

All the integrals here are real, but for many applications, f (x) will be a meromorphic complex
function. Then we can simply evaluate the principal value integral by taking a contour that goes
around the pole at x = 0 by a semicircle, and closes at infinity.

Note. We may also regulate 1/x by adding an imaginary part to x. The Sokhotsky formula is

lim_{ε→0+} 1/(x + iε) = P(1/x) − iπ δ(x)
where both sides do not converge as functions, but merely as distributions. This can be shown
straightforwardly by integrating both sides against a test function and taking real and imaginary
parts; note that we cannot assume the test function is analytic and use contour integration.
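Doing exactly that integration numerically is a quick check; with the even test function f(x) = e^{−x²} the principal value term vanishes by symmetry, so the integral of f(x)/(x + iε) should approach −iπ f(0). A sketch:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2)
eps = 1e-2
# 1/(x + i eps) = (x - i eps)/(x^2 + eps^2), split into real and imaginary parts
re = quad(lambda x: f(x) * x / (x**2 + eps**2), -10, 10, points=[0], limit=200)[0]
im = quad(lambda x: -f(x) * eps / (x**2 + eps**2), -10, 10, points=[0], limit=200)[0]
print(re, im)   # re ≈ 0 (the principal value), im ≈ -pi
```
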

Example. A Kramers-Kronig relation. Suppose that our test function f(x) is analytic in the
upper-half plane and decays sufficiently quickly there. Then applying 1/(x + iε) to f(x) gives zero by
contour integration, so we have

P ∫_{−∞}^∞ f(x)/x dx = iπ f(0)

by the Sokhotsky formula. In particular, this relates the real and imaginary parts of f (x).

Note. One has to be careful with performing algebra with distributions. Suppose that xa(x) = 1
where a(x) is a distribution, and both sides are regarded as distributions. Then dividing by x is
not invertible; we instead have
a(x) = P(1/x) + A δ(x)
where A is not determined. This is important for Green’s functions below.

9.4 Green’s Functions


Next, we consider Green’s functions for second-order ODEs. They are used to solve problems with
forcing terms.

• We consider linear differential operators of the form

L = α(x) d²/dx² + β(x) d/dx + γ(x)
defined on [a, b], and wish to solve the problem Ly(x) = f (x) where f (x) is a forcing term.
For mechanical systems, such terms represent literal forces; for first-order systems such as heat,
they represent sources.

• We define the Green’s function G(x, ξ) of L to satisfy

LG = δ(x − ξ)

where L always acts solely on x. To get a unique solution, we must also set boundary conditions;
for concreteness we choose G(a, ξ) = G(b, ξ) = 0.

• The Green’s function G(x, ξ) is the response to a δ-function source at ξ. Regarding the equation
above as a matrix equation, it is the inverse of L, and the solution to the problem with general
forcing is
y(x) = ∫_a^b G(x, ξ) f(ξ) dξ.
Here, the integral is just a continuous variant of matrix multiplication. The differential operator
L can be thought of the same way; its matrix elements are derivatives of δ-functions.

• To construct the Green’s function, take a basis of solutions {y1 , y2 } to the homogeneous equation
(i.e. no forcing term) such that y1 (a) = 0 and y2 (b) = 0. Then we must have
G(x, ξ) = A(ξ) y_1(x) for x < ξ,   G(x, ξ) = B(ξ) y_2(x) for x > ξ.

• Next, we need to join these solutions together at x = ξ. We know that LG has only a δ-function
singularity at x = ξ. Hence the singularity must be provided by the second derivative, or else we
would get stronger singularities; then the first derivative has a discontinuity while the Green’s
function itself is continuous. Explicitly,

G(x = ξ⁻, ξ) = G(x = ξ⁺, ξ),   ∂G/∂x|_{x=ξ⁺} − ∂G/∂x|_{x=ξ⁻} = 1/α(ξ).

• Solving the resulting equations gives


G(x, ξ) = (1/(α(ξ) W(ξ))) × { y_1(x) y_2(ξ) for a ≤ x < ξ;   y_2(x) y_1(ξ) for ξ < x ≤ b }.

Here, W = y_1 y_2' − y_2 y_1' is the Wronskian, and it is nonzero because the solutions form a basis.

• This reasoning fully generalizes to higher order ODEs. For an nth order ODE, we have a basis
of n solutions, a discontinuity in the (n − 1)th derivative, and n − 1 continuity conditions.

• If the boundary conditions are inhomogeneous, we use the linearity trick again: we solve the
problem with inhomogeneous boundary conditions but no forcing (using our earlier methods),
and with homogeneous boundary conditions with forcing.

• We can also compute the Green's function in terms of the eigenfunctions. Letting
G(x, ξ) = Σ_n Ĝ_n(ξ) Y_n(x), and expanding LG = δ(x − ξ) gives

w(x) Σ_n Ĝ_n(ξ) λ_n Y_n(x) = w(x) Σ_n Y_n(x) Y_n*(ξ)

which implies Ĝn (ξ) = Yn∗ (ξ)/λn . This is the same result we found several sections earlier.

• Note that the coefficients Ĝn (ξ) are singular if λn = 0. This is simply a manifestation of the
fact that Ax = b has no unique solution if A has a zero eigenvalue.

• For example, consider Ly = y'' + y on [0, a] with boundary conditions y(0) = y(a) = 0.
Generically, there are no zero eigenvalues, but in the case a = nπ we have the zero mode y = sin(x).
Thus, when we’re dealing with boundary conditions it can be difficult to see whether a solution
is unique; it must be treated on a case-by-case basis. Note that the invertibility of L depends
on the boundary conditions; though the operator L is fixed, the space on which it acts is
determined by the boundary conditions.

• Green’s functions can be defined for a variety of boundary conditions. For example, when time
is the independent variable with t ∈ [t₀, ∞), then we might take y(t₀) = y'(t₀) = 0. Then the
Green’s function G(t, τ ) must be zero until t = τ , giving the retarded Green’s function. Using
a ‘final’ condition instead would give the advanced Green’s function.
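As a concrete sketch of the construction above, take L = d²/dx² on [0, 1] with G(0, ξ) = G(1, ξ) = 0. Then y₁ = x, y₂ = x − 1, α = W = 1, and convolving G with the forcing f = 1 should reproduce the exact solution y = (x² − x)/2:

```python
import numpy as np
from scipy.integrate import quad

def G(x, xi):
    # Green's function of d^2/dx^2 built from y1 = x, y2 = x - 1, alpha = W = 1
    return x * (xi - 1) if x < xi else xi * (x - 1)

for x in [0.25, 0.5, 0.75]:
    y = quad(lambda xi: G(x, xi) * 1.0, 0, 1, points=[x])[0]
    print(y, (x**2 - x) / 2)   # the pairs agree
```
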

9.5 Variational Principles


In this section, we consider some problems involving minimizing a functional
F[y] = ∫_α^β f(y, y', x) dx.

The Euler-Lagrange equation gives

∂f/∂y − d/dx ( ∂f/∂y' ) = 0

for fixed endpoints. When f does not depend explicitly on x, Noether's theorem yields

f − y' ∂f/∂y' = const.

This quantity is also called the first integral.

Example. The path of a light ray in the xz plane with n(z) = a − bz. Here, the functional is
the total time, and we parametrize the path by z(x). Then

f = dt/dx = n(z) √(1 + z'²)

which has no explicit x-dependence, giving the first integral (a − bz)/√(1 + z'²). Separating and
integrating shows that the path is a catenary; if instead n² were linear in z we would get a parabola,
while n(z) ∝ 1/z would give a circle.

Example. The brachistochrone. A bead slides on a frictionless wire from (0, 0) to (x, y) with y
positive in the downward direction. We have
f = dt/dx ∝ √((1 + y'²)/y)

which yields the first integral 1/√(y(1 + y'²)). Separating and integrating, then parametrizing
appropriately gives

x = c(θ − sin θ),   y = c(1 − cos θ)
which is a cycloid.
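We can evaluate the time functional numerically to confirm the cycloid beats a straight chute; a sketch with g = 1, c = 1 and (hypothetical) endpoint (π, 2):

```python
import numpy as np
from scipy.integrate import quad

g = 1.0
# Cycloid x = theta - sin(theta), y = 1 - cos(theta); the time integrand
# sqrt((x'^2 + y'^2)/(2 g y)) d(theta) works out to the constant 1/sqrt(g).
T_cycloid = quad(lambda th: np.sqrt(((1 - np.cos(th))**2 + np.sin(th)**2)
                                    / (2 * g * (1 - np.cos(th)))), 1e-8, np.pi)[0]

# Straight line y = 2x/pi from (0,0) to (pi, 2), with dt = sqrt((1 + y'^2)/(2 g y)) dx.
T_line = quad(lambda x: np.sqrt((1 + (2 / np.pi)**2) / (2 * g * 2 * x / np.pi)),
              0, np.pi)[0]
print(T_cycloid, T_line)   # ≈ 3.1416 < 3.7242, so the cycloid is faster
```
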

Example. The isoperimetric problem: maximize the area enclosed by a curve with fixed perimeter.
To handle this constrained variation, we use Lagrange multipliers. In general, if we have the
constraint P [y] = c, then we extremize the functional

Φ[y] = F [y] − λ(P [y] − c)

without constraint, then pick λ to satisfy the constraint. (For multiple constraints, we just add one
term for each constraint, with a different λi .) In this case, the area and perimeter are
A[y] = ∮_C y(x) dx,   P[y] = ∮_C √(1 + (y')²) dx

where x is integrated from α to β (for the top half), then back down from β to α (for the bottom
half). We must extremize the functional

f[y] = y − λ√(1 + y'²)

and the Euler-Lagrange equation applies because there are no endpoints. We thus have the first
integral y − λ/√(1 + (y')²), which can be separated and integrated to show the solution is a circle.

As an application, we consider Noether’s theorem.



• We consider a one-parameter family of transformations parametrized by s. To first order,

q → q + s δq,   q̇ → q̇ + s δq̇.

Note that δq̇ = d(δq)/dt, because we are varying along paths, on which q̇ and q are related.

• For this transformation to be a symmetry, the Lagrangian must change by a total derivative,
as this preserves stationary paths of the action,
 
δL = s ( (∂L/∂q) δq + (∂L/∂q̇) δq̇ ) = s dK/dt.

Applying the Euler-Lagrange equations, on shell we have

s dK/dt = s d/dt ( (∂L/∂q̇) δq )   →   d/dt ( (∂L/∂q̇) δq − K ) = 0.
This is Noether’s theorem.

• To get a shortcut for finding a conserved quantity, promote s to a function s(t). Then we pick
up an extra term,
 
δL = s ( (∂L/∂q) δq + (∂L/∂q̇) δq̇ ) + ṡ (∂L/∂q̇) δq = s dK/dt + ṡ (∂L/∂q̇) δq

where K is defined as above. Simplifying,

δL = d/dt (sK) + ṡ ( (∂L/∂q̇) δq − K )
so that the conserved quantity is the coefficient of ṡ. This procedure can be done without
knowing K beforehand; the point is to simplify the variation into the sum of a total derivative
and a term proportional to ṡ, which is only possible when we are considering a real symmetry.

• We can also phrase the shortcut differently. Suppose we can get the variation in the form

δL = s K̇ + ṡ J.

Applying the product rule and throwing away a total derivative,

δL ∼ s (K̇ − J̇)

and the variation of the action must vanish on-shell for any variation, including a variation from
a general s(t). Then we need K̇ − J̇ = 0, so K − J is conserved. This is simply a rephrasing of
the previous method. (Note that we can always write δL as linear in s and ṡ, but the coefficient
of s will only be a total derivative when we are dealing with a symmetry.)

• The same setup can be done in Hamiltonian mechanics, where the action is
I[q, p] = ∫ ( p q̇ − H(q, p) ) dt

and q and p are varied independently, with fixed endpoints for q. This is distinct from the
Lagrangian picture, where q and q̇ cannot be varied independently on paths, even if they are
off-shell. In the Hamiltonian picture, q̇ and p are related only on on-shell paths.

Example. Time translational symmetry. We perform a time shift δq = q̇, giving


dK/dt = (∂L/∂q) q̇ + (∂L/∂q̇) q̈ = dL/dt − ∂L/∂t.

If time translational symmetry holds, ∂L/∂t = 0, giving K = L and the conserved quantity

H = q̇ (∂L/∂q̇) − L.
On the other hand, using our shortcut method in Hamiltonian mechanics,

q → q + sq̇, q̇ → q̇ + ṡq̇ + sq̈, p → p + sṗ

giving the variation


δI = ∫ ( s ṗ q̇ + s p q̈ + ṡ p q̇ − (∂H/∂q) s q̇ − (∂H/∂p) s ṗ ) dt = ∫ ( d/dt (s p q̇ − sH) + ṡ H ) dt

where we used ∂H/∂t = 0. We then directly read off the conserved quantity H.

We can also handle functionals of functions with multiple arguments, in which case the Euler-
Lagrange equation gives partial differential equations. Note that this is different from functionals
of multiple functions, in which case we get multiple Euler-Lagrange equations.

Example. A minimal surface is a surface of minimal area satisfying some boundary conditions.
The functional is

F[y] = ∫ dx_1 dx_2 √(1 + y_1² + y_2²),   y_i = ∂y/∂x_i
which can be seen by rotating into a coordinate system where y2 = 0. Denoting the integrand as f ,
the Euler-Lagrange equation is
d/dx_i ( ∂f/∂y_i ) = ∂f/∂y
and the right-hand side is zero. Simplifying gives the minimal surface equation

(1 + y_1²) y_{22} + (1 + y_2²) y_{11} − 2 y_1 y_2 y_{12} = 0.

If the first derivatives are small, this reduces to Laplace’s equation ∇2 y = 0.

Example. Functionals like the one above are common in field theories. For example, the action
for waves on a string is

S[y] = ∫ dx dt (1/2) ( ρ ẏ² − T y'² ).
Using our Euler-Lagrange equation above, there is no dependence on y, giving
d/dx (−T y') + d/dt (ρ ẏ) = 0
which yields the wave equation. It can be somewhat confusing to treat x and t on the same footing
in this way, so sometimes it’s easier to set the variation to zero directly.

10 Methods for PDEs


10.1 Separation of Variables
We begin by studying Laplace’s equation,

∇2 ψ = 0.

Later, we will apply our results to the study of the heat, wave, and Schrödinger equations,

K ∇²ψ = ∂ψ/∂t,   c² ∇²ψ = ∂²ψ/∂t²,   −∇²ψ + V(x) ψ = i ∂ψ/∂t.
Separating the time dimension in these equations will often yield a Helmholtz equation in space,

∇2 ψ + k 2 ψ = 0.

Finally, an important variant of the wave equation is the massive Klein-Gordon equation,

c² ∇²ψ − m²ψ = ∂²ψ/∂t².
As shown in electromagnetism, the solution to Laplace’s equation is unique given Dirichlet or
Neumann boundary conditions. We always work in a compact spatial domain Ω.
Example. In two dimensions, Laplace’s equation is equivalent to
∂²ψ/(∂z ∂z̄) = 0

where z = x + iy. Thus the general solution is ψ(x, y) = φ(z) + χ(z̄) where φ and χ are holomorphic
and antiholomorphic. For example, suppose we wish to solve Laplace’s equation inside the unit disc
subject to ψ = f (θ) on the boundary. We may write the boundary condition as a Fourier series,
f(θ) = Σ_{n∈Z} f̂_n e^{inθ}.

Now note that at |z| = 1, z^n and z̄^n reduce to e^{inθ} and e^{−inθ}. Thus the solution inside the disc is

ψ(x, y) = f̂_0 + Σ_{n=1}^∞ ( f̂_n z^n + f̂_{−n} z̄^n )

which is indeed the sum of a holomorphic and an antiholomorphic function. Similarly, to get a bounded
solution outside the disc, we simply flip the powers.
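A numerical sketch of this construction: read off the Fourier coefficients of some (hypothetical) boundary data with an FFT, evaluate the series at an interior point, and compare with the exact harmonic function.

```python
import numpy as np

N = 256
theta = 2 * np.pi * np.arange(N) / N
f = np.cos(3 * theta) + 0.5 * np.sin(theta)   # boundary data on the unit circle
fhat = np.fft.fft(f) / N                       # Fourier coefficients \hat f_n

r, th = 0.7, 1.0
z = r * np.exp(1j * th)
psi = fhat[0] + sum(fhat[n] * z**n + fhat[-n] * np.conj(z)**n
                    for n in range(1, N // 2))
exact = r**3 * np.cos(3 * th) + 0.5 * r * np.sin(th)
print(psi.real, exact)   # the two agree
```
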
Next, we introduce the technique of separation of variables.

• Suppose the boundary conditions are given in a three-dimensional rectangular region. Then it
is convenient to separate in Cartesian coordinates. Writing

ψ(x, y, z) = X(x)Y (y)Z(z)

and plugging into Laplace’s equation gives


X''(x)/X(x) + Y''(y)/Y(y) + Z''(z)/Z(z) = 0.

• Thus every term must be independently constant, so


X'' = −λX,   Y'' = −µY,   Z'' = (λ + µ)Z.

• Generally, we see that separation converts PDEs into individual Sturm-Liouville problems, with
a specified relation between the eigenvalues (in this case, they must sum to zero). Each solution
is a normal mode of the system – we’ve seen this vocabulary before, applied to eigenvalues in
time. Homogeneous boundary conditions (e.g. ‘zero on this surface’) then give constraints on
the allowed eigenvalues.

• Finally, we arrive at a set of allowed solutions and superpose them to satisfy a set of given
inhomogeneous boundary conditions. This is often simplified by the orthogonality of the
eigenfunctions; we project the inhomogeneous term onto each one.

We now apply the same principle, but in spherical polar coordinates.

• In spherical coordinates, the Laplacian is


∇² = (1/r²) ∂_r (r² ∂_r) + (1/(r² sin θ)) ∂_θ (sin θ ∂_θ) + (1/(r² sin²θ)) ∂_φ².
For simplicity, we consider only axisymmetric solutions with no φ dependence.

• Separating ψ(r, θ) = R(r)Θ(θ) yields the equations


   
d/dθ ( sin θ dΘ/dθ ) + λ sin θ Θ = 0,   d/dr ( r² dR/dr ) − λR = 0.

• For the angular equation, we substitute x = cos θ, so that x ∈ [−1, 1], giving
 
d/dx ( (1 − x²) dΘ/dx ) = −λΘ.

This is a Sturm-Liouville equation, which is self-adjoint because p(±1) = 0, with weight function
w(x) = 1. The solutions are hence orthogonal on [−1, 1].

• The solutions are the Legendre polynomials, obeying the Rodrigues formula

P_ℓ(x) = (1/(2^ℓ ℓ!)) (d^ℓ/dx^ℓ) (x² − 1)^ℓ,   λ = ℓ(ℓ + 1),   ℓ = 0, 1, . . . .

They can be found by guessing a series solution and demanding the series truncates to a
finite-degree polynomial. An explicit calculation shows that

∫_{−1}^1 P_m(x) P_ℓ(x) dx = (2/(2ℓ + 1)) δ_{mℓ}.
As in the previous example, any axisymmetric boundary condition on a sphere can be expanded
in Legendre polynomials.

• Finally, the radial equation has solution


R_ℓ(r) = A_ℓ r^ℓ + B_ℓ / r^{ℓ+1}.
If we demand our solution to decay at r → ∞, or to be regular at r = 0, then we can throw
out the A` or B` .

• As an application, applying our results to the field of a point charge gives the multipole
expansion, where ` = 0 is the monopole, ` = 1 is the dipole, and so on.

• Allowing for dependence on φ, the φ equation has solution Φ(φ) = eimφ for integer m, while
the θ equation yields an associated Legendre function; the radial equation remains the same.
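The Rodrigues formula and the normalization above can be verified symbolically with sympy; a minimal sketch:

```python
import sympy as sp

x = sp.symbols('x')
for l in range(5):
    # Rodrigues formula: P_l(x) = (1/(2^l l!)) d^l/dx^l (x^2 - 1)^l
    rodrigues = sp.diff((x**2 - 1)**l, x, l) / (2**l * sp.factorial(l))
    assert sp.expand(rodrigues - sp.legendre(l, x)) == 0
    # normalization \int_{-1}^1 P_l^2 dx = 2/(2l + 1)
    assert sp.integrate(rodrigues**2, (x, -1, 1)) == sp.Rational(2, 2*l + 1)
print("Rodrigues formula and normalization verified for l = 0, ..., 4")
```
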

In cylindrical coordinates, we encounter Bessel functions in the radial equation.



• Separating ψ = R(r)Θ(θ)Z(z), we find that Θ(θ) = e^{inθ} and Z(z) = e^{−√µ z}, while the radial
equation becomes

r² R'' + r R' + (µr² − n²) R = 0.
Converting to the Sturm-Liouville form gives

d/dr ( r dR/dr ) − (n²/r) R = −µ r R

which has the weight function w(r) = r.

• The eigenvalue µ doesn’t matter because it simply sets the length scale. Eliminating it by

setting x = r µ gives Bessel’s equation of order n,

d2 R dR
x2 2
+x + (x2 − n2 )R = 0.
dx dx
The solutions are the Bessel functions Jn (x) and Yn (x).

• The Bessel functions of the first kind, Jn (x), are regular at the origin, but the Yn (x) are not;
thus we can ignore them if we care about the region x → 0.

• For small x, we have


J_n(x) ∼ x^n,   Y_n(x) ∼ x^{−n}

while for large x, we have

J_n(x) ∼ cos(x − nπ/2 − π/4)/√x,   Y_n(x) ∼ sin(x − nπ/2 − π/4)/√x.

The decrease 1/√x is consistent with our intuition for a cylindrical wave.

• We also encounter Bessel functions in two-dimensional problems in polar coordinates after


separating out time; in that case time plays the same role that z does here.

• Solving the Helmholtz equation in three dimensions (again, often encountered by separating
out time) yields the spherical Bessel functions jn (x) and yn (x). They behave somewhat like
regular Bessel functions of order n + 1/2, but fall as 1/x for large x instead.
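The large-x asymptotics can be checked against scipy; the constant omitted in the forms above is √(2/π) (a sketch):

```python
import numpy as np
from scipy.special import jv, yn

x, n = 50.0, 2
pref = np.sqrt(2 / (np.pi * x))   # full asymptotic prefactor sqrt(2/(pi x))
j_asym = pref * np.cos(x - n * np.pi / 2 - np.pi / 4)
y_asym = pref * np.sin(x - n * np.pi / 2 - np.pi / 4)
print(jv(n, x), j_asym)   # agree to a few parts in 10^3 at x = 50
print(yn(n, x), y_asym)
```
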

Next, we turn to the heat equation. Since it involves time, we write its solutions as Φ, while ψ is
reserved for space only.

• For positive diffusion constant K, the heat equation ‘spreads heat out’, so it is only defined for
t ∈ [0, ∞). If we try to follow the time evolution backwards, we generically get singularities at
finite time.

• The heat flux is −K∇Φ. Generally, we can show that the total heat ∫ Φ dV is conserved as long
as no heat flux goes through the boundary.

• Another useful property is that if Φ(x, t) solves the heat equation, then so does Φ(λx, λ2 t),
as can be checked explicitly. Then the time dependence of any solution can be written as a
function of the similarity variable η = x/√(Kt).

• For the one-dimensional heat equation, ∂Φ/∂t = K ∂²Φ/∂x², we can write the solution as
Φ(x, t) = F(η)/√(Kt). Then the equation reduces to

2F' + ηF = const.

This shows that the normalized solution with F'(0) = 0 is

G(x, t) = exp(−x²/4Kt) / √(4πKt).
This is called the heat kernel, or the fundamental solution of the heat equation; at t = 0 it
limits to δ(x). Convolving it with the state at time t0 gives the state at time t0 + t.

• Separating out time, Φ = T (t)ψ(r) gives the Helmholtz equation,

∇²ψ = −λψ,   T(t) = e^{−λKt},   λ > 0.

That is, high eigenvalues are quickly suppressed. For example, if we work on the line, where the
spatial solutions are exponentials, and recall the decay properties of Fourier series, evolution
under the heat equation for an infinitesimal time removes discontinuities!

• Since the heat equation involves time, we must also supply an initial condition along with
standard spatial boundary conditions. We now prove uniqueness for Dirichlet conditions in
time and space. Let Φ1 and Φ2 be solutions and let δΦ be their difference. Then
(d/dt) ∫_Ω (δΦ)² dV ∝ ∫_Ω (δΦ) ∇²δΦ dV = − ∫_Ω (∇δΦ)² dV ≤ 0

where we integrated by parts and applied the boundary conditions to remove the surface term.
Then the left-hand side is decreasing, but it starts at zero by the initial conditions, so it is
always zero. (We can also show this by separating variables.)

• The spatial domain Ω must be compact for the integrals above to exist. For example, in an
infinite domain we can have heat forever flowing in from infinity, giving a nonunique solution.
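The convolution property of the heat kernel mentioned above implies the semigroup identity G(·, t₁) ∗ G(·, t₂) = G(·, t₁ + t₂), which is easy to check numerically; a sketch with K = 1:

```python
import numpy as np

K, t1, t2 = 1.0, 0.3, 0.5
x = np.linspace(-30, 30, 6001)
dx = x[1] - x[0]
G = lambda x, t: np.exp(-x**2 / (4 * K * t)) / np.sqrt(4 * np.pi * K * t)

# discrete convolution approximating \int G(x - y, t1) G(y, t2) dy
conv = np.convolve(G(x, t1), G(x, t2), mode='same') * dx
err = np.max(np.abs(conv - G(x, t1 + t2)))
print(err)   # tiny discretization error
```
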

Example. The cooling of the Earth. We model the Earth as a sphere of radius R with an isotropic
heat distribution and initial conditions

Φ(r, 0) = Φ0 for r < R, Φ(R, t) = 0 for t > 0

so that the Earth starts with a uniform temperature, with zero temperature at the surface (i.e. outer
space). We separate variables by Φ(r, t) = R(r)T (t) giving
 
d/dr ( r² dR/dr ) = −λ² r² R,   dT/dt = −λ² K T.

The radial equation has sinusoids decaying as 1/r for solutions,


R(r) = B_λ sin(λr)/r + C_λ cos(λr)/r.
For regularity at r = 0, we require Cλ = 0. To satisfy the homogeneous boundary condition, we set
λ = nπ/R, giving the solution

Φ(r, t) = (1/r) Σ_{n∈Z} A_n sin(nπr/R) exp(−n²π²Kt/R²).

We then choose the coefficients A_n to fit the inhomogeneous initial condition. At time t = 0,

r Φ₀ = Σ_{n∈Z} A_n sin(nπr/R)   →   A_n = (Φ₀/R) ∫_0^R sin(nπr/R) r dr = (−1)^{n+1} Φ₀ R/(nπ).

The solution is not valid for r > R because the thermal diffusivity K changes, from the value for
rock to the value for air.
Note. Solving problems involving the wave equation is rather similar; the only difference is that
we get oscillation in time rather than exponential decay, and that we need both an initial position
and velocity. To prove uniqueness, we use the energy functional
E = (1/2) ∫_Ω φ̇² + c² (∇φ)² dV
which is positive definite and conserved. Then the difference of two solutions has zero initial energy,
so it must be zero.
Note. There is no fundamental difference between initial conditions and (spatial) boundary con-
ditions: they both are conditions on the boundary of the spacetime region where the PDE holds;
Dirichlet and Neumann boundary conditions correspond exactly to initial positions and velocities.
However, in practice they are treated differently because the time condition is ‘one-sided’: while we
can specify that a rope is held at both of its ends, we usually can’t specify where it’ll be both now
and in the future. As a result, while we often need only one (two-sided) boundary condition to get
uniqueness, we need as many initial conditions as there are time derivatives.
Note. In our example above, the initial condition is inhomogeneous and the boundary condition is
homogeneous. But if both were inhomogeneous, our method would fail because we wouldn’t have
any conditions to constrain the eigenvalues. In this case the trick is to use linearity, which turns
the problem into the sum of two problems, each with one homogeneous condition.

10.2 The Fourier Transform


Fourier transforms extend Fourier series to nonperiodic functions f : R → C.

• We define the Fourier transform f̃ = F[f] by

f̃(k) = ∫ e^{−ikx} f(x) dx.

All integrals in this section are over the real line. The Fourier transform is linear, and obeys

F[f(x − a)] = e^{−ika} f̃(k),   F[e^{iℓx} f(x)] = f̃(k − ℓ),   F[f(cx)] = f̃(k/c)/|c|.

• Defining the convolution of two functions as


(f ∗ g)(x) = ∫ f(x − y) g(y) dy

the Fourier transform satisfies


F [f ∗ g] = F [f ]F [g].

• Finally, the Fourier transform converts differentiation to multiplication,

F[f'(x)] = ik f̃(k).

This allows differential equations with forcing to be rewritten nicely. If L(∂)y(x) = f (x),

F [L(∂)y] = L(ik)ỹ(k), ỹ(k) = f˜(k)/L(ik).

• The Fourier transform can be inverted by

f(x) = (1/2π) ∫ e^{ikx} f̃(k) dk.

This can be derived by taking the continuum limit of the Fourier series. In particular,

f(−x) = (1/2π) F[f̃(k)]

which implies that F⁴ = (2π)². Intuitively, a Fourier transform is a rotation in (x, p) phase
space by 90 degrees.

• Parseval's theorem carries over, as

(f, f) = (1/2π) (f̃, f̃).
This expression also holds replacing the second f with g, as unitary transformations preserve
inner products.

• Defining the Fourier transform of a δ-function requires some more distribution theory, but
naively we have F [δ(x)] = 1, with the inverse Fourier transform implying the integral
∫ e^{−ikx} dx = 2πδ(k).

This result only makes sense in terms of distributions. As corollaries, we have

F[δ(x − a)] = e^{−ika},   F[e^{iℓx}] = 2πδ(k − ℓ)

which imply

F[cos(ℓx)] = π(δ(k + ℓ) + δ(k − ℓ)),   F[sin(ℓx)] = iπ(δ(k + ℓ) − δ(k − ℓ)).
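A sketch verifying the inversion and derivative properties for the Gaussian f = e^{−x²/2}, whose transform under this convention is f̃(k) = √(2π) e^{−k²/2}:

```python
import numpy as np
from scipy.integrate import quad

def ft(g, k):
    # \tilde g(k) = \int e^{-ikx} g(x) dx, via two real integrals
    re = quad(lambda x: np.cos(k * x) * g(x), -np.inf, np.inf)[0]
    im = quad(lambda x: -np.sin(k * x) * g(x), -np.inf, np.inf)[0]
    return re + 1j * im

k = 1.3
f = lambda x: np.exp(-x**2 / 2)
fp = lambda x: -x * np.exp(-x**2 / 2)   # f'(x)
Ff, Ffp = ft(f, k), ft(fp, k)
print(Ff, np.sqrt(2 * np.pi) * np.exp(-k**2 / 2))   # agree
print(Ffp, 1j * k * Ff)                             # F[f'] = ik \tilde f(k)
```
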



Example. The Fourier transform of a step function Θ(x) is subtle. In general, the Fourier trans-
forms of ordinary functions can be distributions, because functions in Fourier space are only linked
to observable quantities in real space via integration. Naively, we would have 1/ik since δ is the
derivative of Θ, but this is incorrect because dividing by k gives us extra δ(k) terms we haven’t
determined. Instead, we add an infinitesimal damping Θ(x) → Θ(x) e^{−εx}, giving

FΘ = lim_{ε→0+} 1/(ε + ik) = P(1/ik) + πδ(k)

by the Sokhotsky formula. As a consistency check, we have

F[Θ(−x)] = −P(1/ik) + πδ(k)
and the two sum to 2πδ(k), which is indeed the Fourier transform of 1.

Note. There is an alternative way to think about the Fourier transform of the step function. For
any function f (x), split
f (x) = f+ (x) + f− (x)
where the two terms have support for positive and negative x respectively. Then take the Fourier
transform of each piece. The point of this split is that for nice functions, the Fourier integral
Z ∞
˜
f+ (k) = f+ (x)eikx dx
0

will converge as long as Im k is sufficiently large; note we are now thinking of k as complex-valued.
The Fourier transform can be inverted as long as we follow a contour across the complex k plane in
this region of large Im k. For the step function, we hence have
FΘ = 1/(ik),   Im k > 0.
The expression is not valid at Im k = 0, so we cannot integrate along this axis. This removes the
ambiguity of whether we cross the pole above or below, at the cost of having to keep track of where
in the complex plane FΘ is defined. Often, as here, we can analytically continue f˜+ and f˜− to a
much greater region of the complex plane. A Fourier inversion contour is then valid as long as it
passes above all the singularities of f˜+ and below those of f˜− . In a more general situation, there
could also be branch cuts that obstruct the contour.

Example. Solving a differential equation by Fourier transform. Let (∂² + m²)φ(x) = −ρ(x). In
the naive approach, we have

(k² − m²)φ̃(k) = ρ̃(k)

from which we conclude the Green’s function is

G̃(k) = 1/(k² − m²).

Then, to find the solution to the PDE, we perform the inverse Fourier transform for

φ(x) = (1/2π) ∫ e^{ikx} ρ̃(k)/(k² − m²) dk.

However, this integral does not exist, so we must resort to performing a contour integral around the
poles. This ad hoc procedure makes more sense using distribution theory. We can’t really divide
by k² − m² since G̃(k) is a distribution, so instead

G̃(k) = P(1/(k² − m²)) + g₁δ(k − m) + g₂δ(k + m)

with g₁ and g₂ undetermined, reflecting the fact that the Green’s function is not uniquely defined
without boundary conditions. By the Sokhotsky formula, we can go back and forth between the
principal value and the iε regulator at the cost of modifying g₁ and g₂. This is extremely useful
because of the link between causality and analyticity, as we saw for the Kramers-Kronig relations.
In particular, the retarded and advanced Green’s functions are just

G̃ret(k) = 1/(k² − m² − iεk),   G̃adv(k) = 1/(k² − m² + iεk)

with no need for more delta function terms at all. Similarly, if we had a PDE instead, the general
Green’s function would be

G̃(k) = P(1/(k² − m²)) + g(k)δ(k² − m²)

and the function g(k) must be determined by boundary conditions.
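Closing the inversion contour for the −iεk prescription gives the kernel G(x) = −Θ(x) sin(mx)/m, supported at x > 0 (a standard result, derived here rather than stated in the text). A quick numerical sketch checks its defining property (∂² + m²)G = −δ weakly, by integrating against the arbitrary test function ψ(x) = e^{−x²}:

```python
import numpy as np

m = 1.0
x = np.linspace(0.0, 12.0, 1_200_001)   # Theta(x) restricts the support to x > 0
dx = x[1] - x[0]

G = -np.sin(m * x) / m                  # candidate retarded kernel on x > 0
psi = np.exp(-x**2)                     # smooth test function
psi2 = (4 * x**2 - 2) * np.exp(-x**2)   # its second derivative

# weak form of (d^2/dx^2 + m^2) G = -delta(x):
# integral of G (psi'' + m^2 psi) dx should equal -psi(0) = -1
val = np.sum(G * (psi2 + m**2 * psi)) * dx
```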
Example. Solving another differential equation using a Fourier transform in the complex plane.
We consider Airy’s equation
d²y/dx² + xy = 0.
We write the solution as a generalized Fourier integral
y(x) = ∫_Γ g(ζ) e^{xζ} dζ.

Plugging this in and integrating by parts, we have


[g(ζ)e^{xζ}]_{∂Γ} + ∫_Γ (ζ²g(ζ) − g′(ζ)) e^{xζ} dζ = 0

which must vanish for all x. The first term is evaluated at the endpoints of the contour. For the
second term to vanish for all x, we must have
g′(ζ) = ζ²g(ζ),   g(ζ) = C e^{ζ³/3}.

At this point, this might seem strange, as we were supposed to have two independent solutions. But
note that in order for g(ζ)exζ to vanish at the endpoints, the contour must go to infinity in one of
the unshaded regions below.

If we take a contour that starts and ends in the same region, then we will get zero by Cauchy’s
theorem. Then there are two independent contours, starting in one region and ending in another,
giving the two independent solutions; all others are related by summation or negation. Of course,
the integrals cannot be performed in closed form, but for large x the integrals are amenable to
saddle point approximation.

Note. The discrete Fourier transform applies to functions defined on Zₙ and is useful for computing.
It’s independent of the Fourier series we considered earlier; their common property of a discrete
spectrum comes from the compactness of the domains S¹ and Zₙ. More generally, we can perform
Fourier analysis on any Abelian group, or even any compact, possibly non-Abelian group.
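As a concrete sketch (not from the text), the DFT on Z_N is X_k = Σₙ xₙ e^{−2πikn/N}; numpy’s fft computes exactly this sum, and the discrete Parseval identity Σ|xₙ|² = (1/N)Σ|X_k|² mirrors the continuum one:

```python
import numpy as np

N = 16
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

# DFT straight from the definition: X_k = sum_n x_n exp(-2*pi*i*k*n/N)
n = np.arange(N)
X = np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])
```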

Example. Fourier transforms are useful for linear time-translation invariant (LTI) systems, LI = O.
These are more general than linear differential operators, as L might integrate I or impose a time
delay. However, their response is local in frequency space, because if L(eiωt ) = O(t), then

L(e^{iω(t−t₀)}) = O(t − t₀) = O(t) e^{−iωt₀}

which shows that O(t) ∝ eiωt . Thus we can write

Õ(ω) = Ĩ(ω) R̃(ω)

where R̃ is called the transfer function or system function. Taking an inverse Fourier transform
gives O(t) = (I ∗ R)(t), so R behaves like a Green’s function; it is called the response function.
As an explicit example, consider the case
Σ_{i=0}^n aᵢ dⁱO(t)/dtⁱ = I(t)

where R is simply a Green’s function. In this case we have

R̃(ω) = 1/(a₀ + a₁(iω) + ··· + aₙ(iω)ⁿ) = (1/aₙ) ∏_{j=1}^J 1/(iω − cⱼ)^{kⱼ} = Σ_{j=1}^J Σ_{m=1}^{kⱼ} Γ_{mj}/(iω − cⱼ)ᵐ

where the cj are the roots of the polynomial and the kj are their multiplicities, and we used partial
fractions in the last step. In the case m = 1, we recall the result from the example above,
F[e^{αt}Θ(t)] = 1/(iω − α),   Re(α) < 0.
Therefore, using the differentiation rule, we have
F[(tᵐe^{αt}/m!)Θ(t)] = 1/(iω − α)^{m+1},   Re(α) < 0

which provides the general solution for R(t). We see that oscillatory/exponential solutions appear
as poles in the complex plane, while higher-order singularities provide higher-order resonances.
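The transform pair used here is easy to verify directly; below is a minimal numerical sketch (the values α = −1 and ω = 2 are arbitrary choices):

```python
import numpy as np

alpha, omega = -1.0, 2.0
t = np.linspace(0.0, 40.0, 400_001)
dt = t[1] - t[0]

# F[e^{alpha t} Theta(t)] = integral_0^infty e^{(alpha - i omega) t} dt
vals = np.exp((alpha - 1j * omega) * t)
num = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dt   # trapezoidal rule
exact = 1.0 / (1j * omega - alpha)                     # = 1/(1 + 2i)
```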

Example. Stabilization by negative feedback. Consider a system function R̃(ω). We say the system
is stable if it doesn’t have exponentially growing modes; this corresponds to R̃(ω) having no poles
in the upper half-plane. Now suppose we attempt to stabilize a system by adding negative feedback,

feeding the output scaled by −r and time delayed by t0 back into the input. Defining the feedback
factor k = re^{iωt₀}, the new system function is

R̃loop(ω) = R̃(ω)/(1 + kR̃(ω))

by the geometric series formula; this result is called Black’s formula. Then the new poles are given
by the zeroes of 1 + kR̃(ω).
The Nyquist criterion is a graphical method for determining whether the new system is stable.
We consider a contour C along the real axis and closed along the upper half-plane, encompassing all
poles and zeroes of R̃(ω). The Nyquist plot is a plot of kR̃(ω) along C. By the argument principle,
the number of times the Nyquist plot wraps around −1 is equal to the number of poles P of R̃(ω)
in the upper-half plane minus the number of zeroes of kR̃(ω) + 1 in the upper-half plane. Then the
system is stable if the Nyquist plot wraps around −1 exactly P times. This is useful since we only
need to know P, not the location of the poles or the number of zeroes.
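The winding count itself is easy to compute. The sketch below is an illustration, not the text’s setup: it takes the hypothetical R̃(ω) = 1/(ω − i)², which has a double pole (P = 2) in the upper half-plane, and a constant feedback factor k, and counts how many times 1 + kR̃ winds around 0 (equivalently, kR̃ around −1) along a contour closed in the upper half-plane, matching Z − P from the argument principle:

```python
import numpy as np

def winding(vals):
    # net change of arg(vals) around a closed curve, in units of 2*pi
    phase = np.unwrap(np.angle(vals))
    return (phase[-1] - phase[0]) / (2 * np.pi)

R = 30.0
line = np.linspace(-R, R, 200_001)                       # real axis
arc = R * np.exp(1j * np.linspace(0.0, np.pi, 100_001))  # close in the UHP
omega = np.concatenate([line, arc])

Rt = 1.0 / (omega - 1j)**2   # double pole at omega = i, so P = 2
# zeros of 1 + k*Rt sit at omega = i(1 +- sqrt(k)):
# k = 4: only omega = 3i is in the UHP, so Z - P = -1
# k = 1/4: omega = 1.5i and 0.5i are both in the UHP, so Z - P = 0
w4 = winding(1 + 4.0 * Rt)
wq = winding(1 + 0.25 * Rt)
```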
Note. Causality is ‘built in’ to the Fourier transform. As we’ve seen in the above examples, damping
that occurs forward in time (as required by Re(α) < 0) automatically yields singularities only in
the upper-half plane, and causal/retarded Green’s functions that vanish for t < 0.
In general, the Green’s functions returned by the Fourier transform are regular for |t| → ∞,
which serves as an extra implicit boundary condition. For example, for the damped harmonic
oscillator we have
G̃(ω) = 1/(ω₀² − ω² + iγω)
which yields a unique G(t, τ ), because the advanced solution (which blows up at t → −∞) has been
thrown out. On the other hand, for the undamped harmonic oscillator,
G̃(ω) = 1/(ω₀² − ω²)
the Fourier inversion integral diverges, so G(t, τ ) cannot be defined. We must specify a ‘pole
prescription’, which corresponds to an infinitesimal damping. Forward damping gives the retarded
Green’s function, and reverse damping gives the advanced Green’s function. Note that there’s no
analogue of the Feynman Green’s function; that appears in field theory because there are both
positive and negative-energy modes.

10.3 The Method of Characteristics


We begin by stepping back and reconsidering initial conditions and boundary conditions.

• Initial conditions and boundary conditions specify the value of a function φ and/or its derivatives,
on a surface of codimension 1. In general, such information is called Cauchy data, and solving
a PDE along with given Cauchy data is called a Cauchy problem.

• A Cauchy problem is well-posed if there exists a unique solution which depends continuously
on the Cauchy data. We’ve seen that the existence and uniqueness problem can be subtle.

• We have already seen that the backwards heat equation is ill-posed. Another example is
Laplace’s equation on the upper-half plane with boundary conditions
φ(x, 0) = 0,   ∂yφ(x, 0) = g(x),   g(x) = sin(Ax)/A.

In this case the solution is

φ(x, y) = sin(Ax) sinh(Ay)/A²
which diverges in the limit A → ∞, through the exponential dependence in sinh(Ay), even
though g(x) continuously approaches zero.

The method of characteristics helps us formalize how solutions depend on Cauchy data.

• We begin with the case of a first order PDE in R2 ,

α(x, y)∂x φ + β(x, y)∂y φ = f (x, y).

Such a PDE is called quasi-linear: it is linear in the derivatives of φ, but the coefficients α and β
and the source f may depend on x and y (and, in the general quasi-linear case, on φ itself) in a
nonlinear way.

• Defining the vector field u = (α, β), the PDE becomes

u · ∇φ = f.

The vector field u defines a family of integral curves, called characteristic curves,

Ct (s) = {x(s, t), y(s, t)}

where s is the parameter along the curve and t identifies the curve, satisfying

∂x/∂s|ₜ = α|_{Cₜ},   ∂y/∂s|ₜ = β|_{Cₜ}.

• In the (s, t) coordinates, the PDE becomes a family of ODEs,



∂φ/∂s|ₜ = f|_{Cₜ}.

Therefore, for a unique solution to exist, we must specify Cauchy data at exactly one point
along each characteristic curve, i.e. along a curve B transverse to the characteristic curves. The
value of the Cauchy data at that point determines the value of φ along the entire curve. Each
curve is completely independent of the rest!

Example. The 1D wave equation is (∂x2 − ∂t2 )φ = 0, which contains both right-moving and left-
moving waves. The simpler equation (∂x − ∂t )φ = 0 only contains right-moving waves; the charac-
teristic curves are x − t = const.

Example. We consider the explicit example

e^x ∂ₓφ + ∂yφ = 0,   φ(x, 0) = cosh x.

The vector field (ex , 1) has characteristics satisfying


dx/ds = e^x,   dy/ds = 1
which imply
e^{−x} = −s + c,   y = s + d

where the constants c and d reflect freedom in the parametrizations of s and t. To fix s, we
demand that the characteristic curves pass through B at s = 0. To fix t, we parametrize B itself
by (x, y) = (t, 0). This yields
e^{−x} = −s + e^{−t},   y = s
and the solution is simply φ(s, t) = cosh t. Inverting gives the result

φ(x, y) = cosh log(y + e^{−x}).

We could also add an inhomogeneous term on the right without much more effort.
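The solution above can be confirmed numerically; this sketch checks the PDE by centered finite differences at a few arbitrarily chosen points, plus the boundary condition:

```python
import math

def phi(x, y):
    # the solution found above
    return math.cosh(math.log(y + math.exp(-x)))

h = 1e-5
residuals = []
for x, y in [(0.3, 0.5), (-1.0, 2.0), (1.5, 0.1)]:
    phi_x = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    phi_y = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    residuals.append(abs(math.exp(x) * phi_x + phi_y))   # e^x phi_x + phi_y = 0

boundary_err = max(abs(phi(x, 0.0) - math.cosh(x)) for x in [-1.0, 0.0, 2.0])
```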

Next, we generalize to the case of second-order PDEs, which yield new features.

• Consider a general second-order linear differential operator

L = a^{ij}(x) ∂ᵢ∂ⱼ + b^{i}(x) ∂ᵢ + c(x),   x ∈ Rⁿ

where we choose a^{ij} to be symmetric. We define the symbol of L to be

σ(x, k) = a^{ij}(x) kᵢkⱼ + b^{i}(x) kᵢ + c(x).

We similarly define the symbol of a PDE of general order.

• The principal part of the symbol, σᴾ(x, k), is the leading term. In the second-order case it is
an x-dependent quadratic form,

σᴾ(x, k) = kᵀ A k.

• We classify L by the eigenvalues of A. The operator L is

– elliptic if the eigenvalues all have the same sign (e.g. Laplace)
– hyperbolic if all but one of the eigenvalues have the same sign (e.g. wave)
– ultrahyperbolic if there is more than one eigenvalue with each sign (requires d ≥ 4)
– parabolic if there is a zero eigenvalue (i.e. the quadratic form is degenerate) (e.g. heat)

• We will focus on the two-dimensional case, where we have

A = [ a  b ]
    [ b  c ]

and L is elliptic if ac − b² > 0, hyperbolic if ac − b² < 0, and parabolic if ac − b² = 0. The
names come from the conic section traced out by the level sets of σᴾ in Fourier space.

• When the coefficients are constant, the Fourier transform of L is the symbol σ(ik). Another
piece of intuition is that the principal part of the symbol dominates when the solution is rapidly
varying.

• From our previous work, we’ve seen that typically we need:

– Dirichlet or Neumann boundary conditions on a closed surface, for elliptic equations


– Dirichlet and Neumann boundary conditions on an open surface, for hyperbolic equations
– Dirichlet or Neumann boundary conditions on an open surface, for parabolic equations

Generically, stricter boundary conditions will not have solutions, or will have solutions that
depend very sensitively on them.
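The classification is mechanical enough to code up; the sketch below reads off the eigenvalue signs of A (the example matrices for the Laplace, wave, and heat operators are the obvious choices, with constants set to 1):

```python
import numpy as np

def classify(A):
    # classify by the eigenvalue signs of the principal symbol matrix A
    evals = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    pos = np.sum(evals > 1e-12)
    neg = np.sum(evals < -1e-12)
    if pos + neg < len(evals):
        return "parabolic"        # a zero eigenvalue: degenerate quadratic form
    if min(pos, neg) == 0:
        return "elliptic"         # all eigenvalues of one sign
    if min(pos, neg) == 1:
        return "hyperbolic"       # all but one of one sign
    return "ultrahyperbolic"      # more than one of each sign, needs d >= 4

laplace = [[1, 0], [0, 1]]        # d_xx + d_yy
wave = [[1, 0], [0, -1]]          # d_tt - d_xx
heat = [[-1, 0], [0, 0]]          # -d_xx in (x, t) coordinates
```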

Now we apply the method of characteristics for second-order PDEs.

• In this case, the Cauchy data consists of the value of φ on a surface Γ along with the normal
derivative ∂n φ. Let ti denote the other directions. In order to propagate the Cauchy data to a
neighboring surface, we need to know the normal second derivative ∂n ∂n φ.

• Since we know φ on all of Γ, we know ∂ti ∂tj φ and ∂n ∂ti φ. To attempt to find ∂n ∂n φ we use
the PDE, which is
a^{ij} ∂²φ/∂xⁱ∂xʲ = known.

Therefore, we know the value of a^{nn} ∂ₙ∂ₙφ, which gives the desired result unless a^{nn} is zero.

• We define a characteristic surface Σ to be one whose normal vector nµ obeys aµν nµ nν = 0.


Then we can propagate forward the Cauchy data on Γ as long as it is nowhere tangent to a
characteristic surface.

• A characteristic surface has codimension one. In two dimensions, characteristics are thus curves,
and an equation is hyperbolic, parabolic, or elliptic at a point if it has two, one, or zero characteristic
curves through that point.

Example. The wave equation is the archetypal hyperbolic equation. It’s easiest to see its charac-
teristic curves in ‘light-cone’ coordinates ξ± = x ± ct, where it becomes

∂²φ/∂ξ₊∂ξ₋ = 0.
Then the characteristic curves are curves of constant ξ± . Information is propagated along these
curves in the sense that the general solution is f (ξ+ ) + g(ξ− ). On the other hand, the value of φ at
a point depends on all the initial Cauchy data in its past light cone; the ‘domain of dependence’ is
instead bounded by characteristic curves.

10.4 Green’s Functions for PDEs


We now find Green’s functions for PDEs, using the Fourier transform. We begin with the case of
an unbounded spatial domain.

• We consider the Cauchy problem for the heat equation on Rn × [0, ∞),
D∇²φ = ∂φ/∂t,   φ(x, t = 0) = f(x),   lim_{x→∞} φ(x, t) = 0.

To do this, we find the solution for initial condition δ(x) (called the fundamental solution) by
Fourier transform in space, giving
Sₙ(x, t) = F⁻¹[e^{−Dk²t}] = e^{−x²/4Dt}/(4πDt)^{n/2}.
The general solution is given by convolution with the fundamental solution. As expected, the
position x only enters through the similarity variable x2 /t. We also note that the heat equation
is nonlocal, as Sn (x, t) is nonzero for arbitrarily large x at arbitrarily small t.

• We can also solve the heat equation with forcing and homogeneous initial conditions,
∂φ/∂t − D∇²φ = F(x, t),   φ(x, t = 0) = 0.
In this case, we want to find a Green’s function G(x, t, y, τ ) representing the response to a δ-
function source at (y, τ). Duhamel’s principle states that it is simply related to the fundamental
solution,
G(x, t, y, τ ) = Θ(t − τ )Sn (x − y, t − τ ).
To understand this, note that we can imagine starting time at t = τ + ε. In this case, we don’t
see the δ-function driving; instead, we see its outcome, a δ-function initial condition at y. The
general solution is given by convolution with the Green’s function.

• In both cases, a time direction is picked out by specifying φ(t = 0) and solving for φ at times
t > 0. In particular, this forces us to get the retarded Green’s function.

• As another example, we consider the forced wave equation on Rn × (0, ∞) for n = 3,

∂²φ/∂t² − c²∇²φ = F,   φ(t = 0) = ∂ₜφ(t = 0) = 0.
Taking the spatial Fourier transform, the Green’s function satisfies

(∂²/∂t² + k²c²) G̃(k, t, y, τ) = e^{−ik·y} δ(t − τ).
Applying the initial condition and integrating gives
G̃(k, t, y, τ) = Θ(t − τ) e^{−ik·y} sin(kc(t − τ))/(kc).
This result holds in all dimensions.

• To take the Fourier inverse, we perform the k integration in spherical coordinates, but the final
angular integration is only nice in odd dimensions. In three dimensions, we find
G(x, t, y, τ) = δ(|x − y| − c(t − τ))/(4πc|x − y|)
so that a force at the origin makes a shell that propagates at speed c. In one dimension, we
instead have G(x, t, y, τ) ∼ Θ(c(t − τ) − |x − y|), so we find a raised region whose boundary
propagates at speed c. In even dimensions, we can’t perform the e^{ikr cos θ} dθ integral in closed
form. Instead, we find a boundary that propagates with speed c with a long tail behind it.

• Another way to phrase this is that in one dimension, the instantaneous force felt a long distance
from the source is a delta function, just like the source. In three dimensions, it is the derivative.
Then in two dimensions, it is the half-derivative, but this is not a local operation.

• The same result can be found by a temporal Fourier transform, or a spacetime Fourier transform.
In the latter case, imposing the initial condition to get the retarded Green’s function is a little
more subtle, requiring a pole prescription.

• For the wave equation, Duhamel’s principle relates the Green’s function to the solution for an
initial velocity but zero initial position.
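The fundamental solution can be checked directly. This sketch (n = 1, D = 1, with arbitrary sample points) verifies the heat equation by finite differences and the conservation of ∫Sₙ dx:

```python
import math

D = 1.0

def S(x, t):
    # fundamental solution of the 1D heat equation
    return math.exp(-x**2 / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

# PDE check: dS/dt = D d^2S/dx^2 at an arbitrary interior point
x0, t0, h = 0.7, 0.5, 1e-4
S_t = (S(x0, t0 + h) - S(x0, t0 - h)) / (2 * h)
S_xx = (S(x0 + h, t0) - 2 * S(x0, t0) + S(x0 - h, t0)) / h**2
pde_residual = abs(S_t - D * S_xx)

# total heat is conserved: the integral of S over x equals 1 at any t > 0
mass = sum(S(-20.0 + 0.001 * i, t0) for i in range(40001)) * 0.001
```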

The Green’s function is simply related to the fundamental solution only on an unbounded domain.
In the case of a bounded domain Ω, Green’s functions must additionally satisfy boundary conditions
on ∂Ω. However, it is still possible to construct a Green’s function using a fundamental solution.
Example. The method of images. Consider Laplace’s equation defined on a half-space with
homogeneous Dirichlet boundary conditions φ = 0. The fundamental solution is the field of a point
charge. The Green’s function can be constructed by putting another point charge with opposite
charge, ‘reflected’ in the plane; choosing the same charge would work for homogeneous Neumann
boundary conditions.
The exact same reasoning works for the wave equation. Dirichlet boundary conditions correspond
to a hard wall, and we imagine an upside-down ‘ghost wave’ propagating the other way. Similarly,
for the heat equation, Neumann boundary conditions correspond to an insulating barrier, and we
can imagine a reflected, symmetric source of heat.
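The half-space construction is easy to verify numerically. In the sketch below (an illustration with arbitrary source position, using the convention ∇²G = δ so the free-space kernel is −1/(4πr)), a unit source at height z = 2 is paired with an opposite-sign image at z = −2:

```python
import numpy as np

source = np.array([0.0, 0.0, 2.0])    # point source above the plane z = 0
image = np.array([0.0, 0.0, -2.0])    # reflected, opposite-sign image

def G(r):
    # Dirichlet Green's function for the half-space z > 0
    d_src = np.linalg.norm(r - source)
    d_img = np.linalg.norm(r - image)
    return -1.0 / (4 * np.pi * d_src) + 1.0 / (4 * np.pi * d_img)

# G vanishes on the boundary plane z = 0, as required
rng = np.random.default_rng(1)
pts = rng.standard_normal((5, 3))
pts[:, 2] = 0.0
boundary_max = max(abs(G(p)) for p in pts)
```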
For less symmetric domains, Green’s functions require much more work to construct. We consider
the Poisson equation as an extended example.

• We begin with finding the fundamental solution to Poisson’s equation,


∇²Gₙ(x) = δⁿ(x).

Applying rotational symmetry and integrating over a ball of radius r,

1 = ∫_{B_r} ∇²Gₙ dV = ∫_{∂B_r} ∇Gₙ · dS = ∫_{S^{n−1}} r^{n−1} (dGₙ/dr) dΩₙ.
Denoting Aₙ as the area of the (n − 1)-dimensional sphere, we have

Gₙ(x) = x/2 + c₁   (n = 1),
Gₙ(x) = (log x)/(2π) + c₂   (n = 2),
Gₙ(x) = −1/(Aₙ(n − 2)x^{n−2}) + cₙ   (n ≥ 3).

For n ≥ 3 the constant can be set to zero if we require Gn → 0 for x → ∞. Otherwise, we need
additional constraints. We then define Gn (x, y) = Gn (x − y), which is the response at x to a
source at y.

• Next, we turn to solving the Poisson equation on a compact domain Ω. We begin with deriving
some useful identities. For any regular functions φ, ψ : Ω → R,
∫_{∂Ω} φ∇ψ · dS = ∫_Ω ∇·(φ∇ψ) dV = ∫_Ω φ∇²ψ + (∇φ)·(∇ψ) dV

by the divergence theorem. This is Green’s first identity. Antisymmetrizing gives


∫_Ω φ∇²ψ − ψ∇²φ dV = ∫_{∂Ω} (φ∇ψ − ψ∇φ) · dS

which is Green’s second identity.

• Next, we set ψ(x) = Gn (x, y) and ∇2 φ(x) = −F (x), giving Green’s third identity
φ(y) = −∫_Ω Gₙ(x, y)F(x) dV + ∫_{∂Ω} (φ(x)∇Gₙ(x, y) − Gₙ(x, y)∇φ(x)) · dS

where we used a delta function to do an integral, and all derivatives are with respect to x.

• At this point it looks like we’re done, but the problem is that generally we can only specify φ or
∇φ · n̂ at the boundary, not both. Once one is specified, the other is determined by uniqueness,
so the equation above is really an expression for φ in terms of itself, not a closed form for φ.

• For concreteness, suppose we take Dirichlet boundary conditions φ|∂Ω = g. We define a Dirichlet
Green’s function G = Gn + H where H satisfies Laplace’s equation throughout Ω and G|∂Ω = 0.
Then using Green’s third identity gives
φ(y) = ∫_{∂Ω} g(x)∇G(x, y) · dS − ∫_Ω G(x, y)F(x) dV

which is the desired closed-form expression! Of course, at this point the hard task is to construct
H, but at the very least this problem has no source terms.

• As a concrete example, we can construct an explicit form for H whenever the method of images
applies. For example, for a half-space it is the field of a reflected opposite charge.

• Similarly, we can construct a Neumann Green’s function. There is a subtlety here, as the
integral of ∇φ · dS must be equal to the integral of the driving F , by Gauss’s law. If this doesn’t
hold, no solution exists.

• The surface terms can be given a physical interpretation. Suppose we set φ|∂Ω = 0 in Green’s
third identity, corresponding to grounding the surface ∂Ω. At the surface, we have

(∇φ) · n̂ ∝ E⊥ ∝ ρ

which means that the surface term is just accounting for the field of the screening charges.

• Similarly, we can interpret the surface term in our final result, when we turn on a potential
φ|∂Ω = g. To realize this, we make ∂Ω the inner surface of a very thin capacitor. The outer
surface ∂Ω0 , just outside ∂Ω, is grounded. The surfaces are split into parallel plates and hooked
up to batteries with emf g(x), giving locally opposite charge densities on ∂Ω0 and ∂Ω. Then
the potential g can be thought of as coming from nearby opposite sheets of charge. The term
∇G describes such sources, by thinking of the derivative as a finite difference.
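The n = 2 fundamental solution can be spot-checked numerically: away from the origin it should be harmonic, and the flux of ∇G₂ through any circle around the origin should be 1. A minimal sketch (the evaluation point and circle radius are arbitrary):

```python
import math

def G2(x, y):
    # 2D fundamental solution, log(r)/(2 pi)
    return math.log(math.hypot(x, y)) / (2 * math.pi)

# harmonic away from the origin: five-point Laplacian ~ 0
x0, y0, h = 0.6, 0.8, 1e-3
lap = (G2(x0 + h, y0) + G2(x0 - h, y0) + G2(x0, y0 + h) + G2(x0, y0 - h)
       - 4 * G2(x0, y0)) / h**2

# Gauss's law: flux of grad(G2) through a circle of radius r is 1
n, r, dr = 20000, 1.5, 1e-6
flux = 0.0
for i in range(n):
    th = 2 * math.pi * i / n
    c, s = math.cos(th), math.sin(th)
    # radial derivative times arc length element r dtheta
    dGdr = (G2((r + dr) * c, (r + dr) * s) - G2((r - dr) * c, (r - dr) * s)) / (2 * dr)
    flux += dGdr * r * (2 * math.pi / n)
```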
110 11. Approximation Methods

11 Approximation Methods
11.1 Asymptotic Series
We illustrate the ideas behind perturbation theory with some algebraic equations with a small
parameter ε, before moving onto differential equations. We begin with some motivating examples
which will bring us to asymptotic series.
Example. Solve the equation

x² + εx − 1 = 0.
The exact solution is

x = −ε/2 ± √(1 + ε²/4) = { 1 − ε/2 + ε²/8 + ···,   −1 − ε/2 − ε²/8 + ··· }.

This series converges for |ε| < 2 and rapidly if ε is small; it is a model example of the perturbation
method. Now we show two ways to find the series without already knowing the exact answer.
First, rearrange the equation to the form x = f(x),

x = ±√(1 − εx).

Then we may use successive approximations,

xₙ₊₁ = √(1 − εxₙ).

The starting point x₀ can be chosen to be an exact solution when ε = 0, in this case x₀ = 1. Then

x₁ = √(1 − ε),   x₂ = √(1 − ε(1 − ε/2))
and so on. The xₙ term matches the series up to the εⁿ term. To see why, note that if the desired
fixed point is x*, then

xₙ₊₁ − x* = f(xₙ) − x* = f(x* + xₙ − x*) − x* ≈ (xₙ − x*) f′(x*).

Near the fixed point we have f′(x*) ≈ −ε/2, so the error decreases by a factor of ε every iteration.
The most important part of this method is to choose f so that f′(x*) is small, ensuring rapid
convergence. For instance, if we had f′(x*) ∼ 1 − ε instead, convergence could be very slow.
Second, expand about one of the roots when ε = 0 in a series in ε,

x = 1 + εx₁ + ε²x₂ + ···.

By plugging this into the equation, expanding in powers of ε, and setting each coefficient to zero, we
may determine the xᵢ iteratively. This tends to be easier when working to higher orders. In general,
one might need to expand in a different variable than ε, but this works for regular problems.
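Both procedures are quick to run; here is a minimal sketch of the iteration (with the arbitrary choice ε = 0.1) converging to the exact root, and of the truncated series’ accuracy:

```python
import math

eps = 0.1
exact = -eps / 2 + math.sqrt(1 + eps**2 / 4)   # the root near x = 1

x = 1.0                                        # x0 = exact root at eps = 0
for _ in range(50):
    x = math.sqrt(1 - eps * x)                 # x_{n+1} = sqrt(1 - eps x_n)

series = 1 - eps / 2 + eps**2 / 8              # truncated perturbation series
```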
Example. Solve the equation

εx² + x − 1 = 0.

This is more subtle because there are two roots for any ε > 0, but only one root for ε = 0. Problems
where the ε → 0 limit differs in an important way from the ε = 0 case are called singular. The exact
solutions are

x = (−1 ± √(1 + 4ε))/(2ε) = { 1 − ε + 2ε² + ···,   −1/ε − 1 + ε − 2ε² + ··· }

where the series converges for |ε| < 1/4. We see the issue is that one root diverges to infinity. We
can capture it using the expansion method by starting the series with ε⁻¹,

x = x₋₁/ε + x₀ + εx₁ + ···.

This also captures the regular root, in the case x₋₁ = 0. However, we again only knew to start the
series at 1/ε by using the exact solution.
We can arrive at the same conclusion by changing variables by a rescaling,

x = X/ε,   X² + X − ε = 0.

This is now a regular problem which can be handled as above. Again, the difficult part is choosing
the right rescaling to accomplish this. Consider the general rescaling x = δX, which gives

εδ²X² + δX − 1 = 0.

The rescaling is good if the formerly singular root becomes O(1). We would thus like at least two
of the quantities (εδ², δ, 1) to be similar in size, with the rest much smaller. This gives a regular
perturbation problem, where the similar terms give an O(1) root, and the rest perturb it slightly. By
casework, this only happens for δ ∼ 1 and δ ∼ 1/ε, giving the regular and singular roots respectively.
This method is called finding the “dominant balance” or “distinguished limit”.

Example. Solve the equation

(1 − ε)x² − 2x + 1 = 0.

We see that when ε = 0 we have a double root x = 1. Naively taking

x = 1 + εx₁ + ε²x₂

we immediately find the equations

ε⁰:  0 = 0,   ε¹:  0 = 1.

To see the problem, consider one of the exact solutions,

x = 1/(1 − ε^{1/2}) = 1 + ε^{1/2} + ε + ε^{3/2} + ···.

Hence we should have expanded in powers of ε^{1/2},

x = 1 + ε^{1/2}x_{1/2} + εx₁ + ···.

Setting the coefficient of ε^{n/2} to zero determines x_{(n−1)/2}.


To find the expansion sequence in general, we suppose

x = 1 + δ₁x₁,   δ₁(ε) ≪ 1

and substitute it in. Simplifying, we find

δ₁²x₁² − ε − 2εδ₁x₁ − εδ₁²x₁² = 0.



We now apply dominant balance again. The last two terms are always subleading, so balancing the
first two gives δ₁ = ε^{1/2}, from which we determine x₁ = 1. At this point we could guess the next
term is O(ε), but to be safe we could repeat the procedure, setting

x = 1 + ε^{1/2} + δ₂x₂,   δ₂(ε) ≪ ε^{1/2}.

However, this rapidly gets more complicated for higher orders.


Finally, we could use the iterative method. We choose

xₙ₊₁ = 1 ± ε^{1/2}xₙ

which ensures rapid convergence. Taking the positive root and starting with x₀ = 1 gives

x₁ = 1 + ε^{1/2},   x₂ = 1 + ε^{1/2} + ε,   ....

Example. Solve the equation

x e^{−x} = ε.

One root is near x = 0 and is easy to approximate, as we may expand the exponential in a series;
the other becomes large as ε → 0. The expansion series is not obvious, so we use the iterative
procedure. We know that when x = L ≡ log 1/ε,

x e^{−x} = εL ≫ ε.

On the other hand, when x = 2L,

x e^{−x} = 2ε²L ≪ ε.
Hence the desired solution is approximately L. The easiest way to proceed is with the iterative
method. We rearrange the equation to

xₙ₊₁ = L + log xₙ

and choose x₀ = L. Then, omitting absolute value signs for brevity,

x₁ = L + log L,   x₂ = L + log(L + log L) = L + log L + log(1 + (log L)/L).

The final logarithm can be expanded in a series, and continuing gives us an expansion with terms of
the form (log L)ᵐ/Lⁿ. Even for tiny ε, L is not very large, and log L isn’t either. Hence the series
converges very slowly.
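Running the iteration (with ε = 10⁻⁶, an arbitrary choice) shows both the fast convergence of the iterates and the mediocre quality of the truncated expansion:

```python
import math

eps = 1e-6
L = math.log(1 / eps)

x = L
for _ in range(100):
    x = L + math.log(x)        # x_{n+1} = L + log(x_n)

residual = abs(x * math.exp(-x) - eps)   # converged x solves x e^{-x} = eps
two_terms = L + math.log(L)              # truncated expansion L + log L
```

The two-term expansion is still off by about a percent, even though ε is tiny.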

Since we are working with expansions more general than convergent power series, we formalize them
as asymptotic expansions.

• We say f = O(g) as ε → 0 if there exist K and ε₀ so that |f| < K|g| for all ε < ε₀.

• We say f = o(g) as ε → 0 if f/g → 0 as ε → 0.

• A set of functions {φₙ(ε)} is an asymptotic sequence as ε → 0 if, for each n and i > 0,
φₙ₊ᵢ(ε) = o(φₙ(ε)) as ε → 0.

• A function f(ε) has an asymptotic expansion with respect to the asymptotic sequence {φₙ(ε)}
as ε → 0 if there exist constants aₙ so that

f(ε) ∼ Σₙ aₙφₙ(ε)

which stands for

f(ε) = Σ_{n=0}^N aₙφₙ(ε) + o(φ_N(ε))

for all N.

• Given {φn }, the coefficients an of f are unique. This is easily proven by induction. However,
the converse is not true: the coefficients an don’t determine f . Just like ordinary power series,
we may be missing terms that are smaller than any of the φn .

• The above definition of asymptotic expansion implies that as ε → 0, for all N ≥ 0,

lim_{ε→0} [f(ε) − Σ_{n=0}^N fₙ(ε)] / f_N(ε) = 0

where fₙ(ε) = aₙφₙ(ε). That is, unlike the regular definition of convergence, we take ε → 0
rather than N → ∞.

• Asymptotic series may be integrated term by term. However, they may not be differentiated term
by term, because unlike power series, the functions fₙ(ε) may be quite singular (e.g. ε cos(1/ε))
and grow much larger than expected upon differentiating.

• Asymptotic series may be plugged into each other, but some care must be taken. For example,
taking the exponential of only the leading terms of a series may give a completely wrong result;
we must instead take all terms of order 1 or higher.

• As we’ve seen above, the terms in an asymptotic series can get quite complicated. However, it
is at least true that functions obtained by a finite number of applications of +, −, ×, ÷, exp,
and log may always be ordered; these are called Hardy’s logarithmico-exponential functions.

Example. Often an asymptotic expansion works better than a convergent power series. We have

erf(z) = (2/√π) ∫₀^z e^{−t²} dt = (2/√π) ∫₀^z Σ_{n=0}^∞ (−t²)ⁿ/n! dt = (2/√π) Σ_{n=0}^∞ (−1)ⁿ z^{2n+1}/((2n + 1) n!)

where all manipulations above are valid since the series has an infinite radius of convergence.
However, for large z the series converges very slowly, and many terms in the series are much larger
than the final result, so roundoff error affects the accuracy.
A better series can be constructed by noting
erf(z) = 1 − (2/√π) ∫_z^∞ e^{−t²} dt.

We now integrate by parts using


∫_z^∞ e^{−t²} dt = ∫_z^∞ (2t e^{−t²})/(2t) dt = e^{−z²}/(2z) − ∫_z^∞ e^{−t²}/(2t²) dt.

Iterating this procedure gives


erf(z) = 1 − (e^{−z²}/(z√π)) (1 − 1/(2z²) + 3!!/(2z²)² − 5!!/(2z²)³ + ···).
This series diverges for all z, with radius of convergence zero. However, it is an asymptotic series
as z → ∞. For large z, cutting off the series even at a few terms gives a very good approximation.
For any fixed z, the series eventually diverges as more terms are included; generally the optimal
truncation is to stop at the smallest term.
One might worry that asymptotic series don’t give a guarantee of quality, since the series can get
worse as more terms are used, but in practical terms, the usual definition of convergence doesn’t
guarantee quality either. In physics, our expansion parameters will usually be much closer to zero
than our number of terms will be to infinity, so using an asymptotic series will be more accurate.
And in numerics, the roundoff errors due to the large terms in a convergent series can make the
result inaccurate no matter how many terms we take.
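A quick sketch of this tradeoff at z = 3 (an arbitrary choice), comparing a few terms of the divergent series against the library erf:

```python
import math

z = 3.0
u = 2 * z**2
# four terms of the asymptotic series: coefficients 1, 1!!, 3!!, 5!!
series = 1 - 1 / u + 3 / u**2 - 15 / u**3
approx = 1 - math.exp(-z**2) / (z * math.sqrt(math.pi)) * series
err = abs(approx - math.erf(z))
```

Four terms already give about eight digits here; taking many more terms would eventually make the result worse, as discussed above.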

11.2 Asymptotic Evaluation of Integrals


Now we turn to some techniques for asymptotic evaluation of integrals. As we’ve seen above, the
simplest method is repeated integration by parts.
Example. If f(ε) is smooth near ε = 0, then

f(ε) = f(0) + ∫₀^ε f′(x) dx.

Integrating by parts gives

f(ε) = f(0) + [(x − ε)f′(x)]₀^ε + ∫₀^ε (ε − x) f″(x) dx.

It’s not hard to see that by repeating this, we just recover the Taylor series.
Example. We would like to evaluate
I(x) = ∫ₓ^∞ e^{−t⁴} dt

in the limit x → ∞. Integrating by parts,


I(x) = −(1/4) ∫ₓ^∞ (1/t³) d(e^{−t⁴})/dt dt = e^{−x⁴}/(4x³) − (3/4) ∫ₓ^∞ e^{−t⁴}/t⁴ dt.

This is the beginning of an asymptotic series because the remainder term is at most I(x)/x⁴, and
the ratio vanishes as x → ∞. For large x, even the first term alone is a good approximation.
Example. As a trickier example, we evaluate
I(x) = ∫₀^x t^{−1/2} e^{−t} dt

in the limit x → ∞. However, the simplest approach


I(x) = [−t^{−1/2} e^{−t}]₀^x − (1/2) ∫₀^x t^{−3/2} e^{−t} dt

gives a singular boundary term. Instead, we evaluate

I(x) = I(∞) − ∫ₓ^∞ t^{−1/2} e^{−t} dt,   I(∞) = Γ(1/2) = √π.
The second term may be integrated by parts, giving

I(x) = √π − e^{−x}/√x + (1/2) ∫ₓ^∞ t^{−3/2} e^{−t} dt
which is the start of an asymptotic series. In general, integration by parts fails if the endpoints
yield contributions larger than the original integral itself. The reason such large contributions can
appear is that every round of integration by parts makes the remaining integral more singular at
t = 0 by differentiating the t−1/2 .
Example. We evaluate

I(x) = ∫₀^∞ e^{−xt²} dt
in the limit x → ∞. Naive integration by parts yields a singular boundary term and an infinite
remaining integral. In fact, integration by parts cannot possibly work here because the exact answer
is √π/(2√x), a fractional power. Integration by parts also doesn’t work if the dominant contribution
is from an interior point rather than an endpoint, which would have occurred if the lower bound
were not 0.
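The quoted exact answer is just the Gaussian integral, which we can confirm numerically (a sketch; x = 5 and the cutoff t = 10 are arbitrary choices):

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

x = 5.0
# at the cutoff t = 10 the integrand is e^{-500}, utterly negligible
numeric = simpson(lambda t: math.exp(-x * t**2), 0.0, 10.0)
closed_form = math.sqrt(math.pi) / (2 * math.sqrt(x))
print(numeric, closed_form)
```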
Laplace’s method may be used to find
$$I(x) = \int_a^b f(t)\, e^{x\phi(t)}\, dt$$
in the limit x → ∞, where f(t) and φ(t) are real and continuous.
Example. Find the asymptotic behavior of
$$I(x) = \int_0^{10} \frac{e^{-xt}}{1+t}\, dt$$
as x → ∞. For high x the integrand is localized near t = 0. Hence we split
$$I(x) = \int_0^\epsilon \frac{e^{-xt}}{1+t}\, dt + O(e^{-\epsilon x}), \qquad \frac{1}{x} \ll \epsilon \ll 1.$$

Concretely, we could take $\epsilon = 1/\sqrt{x}$. For the remaining integral, change variable to s = xt to yield
$$I(x) \sim \frac{1}{x} \int_0^{\epsilon x} \frac{e^{-s}}{1 + s/x}\, ds.$$
Since s/x is small in the entire integration range, we Taylor expand the denominator for
$$I(x) \sim \frac{1}{x} \int_0^{\epsilon x} e^{-s} \sum_{n=0}^\infty \frac{(-s)^n}{x^n}\, ds = \sum_{n=0}^\infty \frac{1}{x^{n+1}} \int_0^{\epsilon x} (-s)^n e^{-s}\, ds.$$

By extending the upper limit of integration to infinity, we pick up $O((\epsilon x)^n e^{-\epsilon x})$ error terms. Also,
by interchanging the order of summation and integration, we have produced an asymptotic series,
$$I(x) \sim \sum_{n=0}^\infty \frac{1}{x^{n+1}} \int_0^\infty (-s)^n e^{-s}\, ds = \sum_{n=0}^\infty \frac{(-1)^n n!}{x^{n+1}}.$$
Note that we could have gotten an easier, better bound by extending the upper bound of integration
to infinity at the start, but we do things in this order to show the general technique.
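The divergence of this series, and the fact that truncating it near its smallest term gives the best accuracy, can be seen numerically (a sketch; x = 10 is an arbitrary test value):

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

x = 10.0
numeric = simpson(lambda t: math.exp(-x * t) / (1 + t), 0.0, 10.0)

def partial_sum(N):
    """First N terms of the asymptotic series sum_n (-1)^n n! / x^(n+1)."""
    return sum((-1)**n * math.factorial(n) / x**(n + 1) for n in range(N))

errors = [abs(partial_sum(N) - numeric) for N in range(1, 25)]
best = min(range(24), key=lambda i: errors[i])
print(best + 1, errors[best])  # optimal truncation occurs near N ~ x terms
```

Past the optimal truncation point the partial sums get steadily worse, since the terms n!/xⁿ⁺¹ eventually grow without bound.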

Now we develop Laplace’s method formally.

• Laplace’s method is justified by Watson’s lemma: if f (t) is continuous on [0, b] and has the
asymptotic expansion

$$f(t) \sim t^\alpha \sum_{n=0}^\infty a_n t^{\beta n}$$
as $t \to 0^+$, where α > −1 and β > 0, then
$$I(x) = \int_0^b f(t) e^{-xt}\, dt \sim \sum_{n=0}^\infty \frac{a_n \Gamma(\alpha + \beta n + 1)}{x^{\alpha + \beta n + 1}}$$

as x → +∞. The conditions α > −1 and β > 0 ensure the integral converges, and in the case
b = ∞ we also require f (t) = O(ect ) for some constant c at infinity. Watson’s lemma can also
be used to justify the methods below.

• In the case where the asymptotic series for f is uniformly convergent in a neighborhood of the
origin, then Watson’s lemma may be established by interchanging the order of integration and
summation. Otherwise, we cut off the sums at a finite number of terms and simply show the
error terms are sufficiently small to have an asymptotic series.

• We now consider the general integral


$$I(x) = \int_a^b f(t)\, e^{x\phi(t)}\, dt.$$

The dominant contribution comes from the maximum of φ(t), which can occur at the endpoints
or at an interior point. We’ll find only the leading contribution in each case.

• First, suppose the maximum is at t = a, and set a = 0 for simplicity. As in the example,
$$I(x) = \int_0^\epsilon f(t)\, e^{x\phi(t)}\, dt + \int_\epsilon^b f(t)\, e^{x\phi(t)}\, dt, \qquad x^{-1} \ll \epsilon \ll x^{-1/2}.$$

Then the second term is $O(e^{\epsilon x \phi'(0)})$ smaller than the first, and hence negligible if $\epsilon x \gg 1$.

• In the first term we assume we can expand φ(t) and f(t) in the asymptotic series
$$\phi(t) \sim \phi(0) + t\phi'(0) + \ldots, \qquad f(t) \sim f(0) + tf'(0) + \ldots$$
where generically φ′(0) ≠ 0. Changing variables to s = xt,
$$I(x) \sim \frac{e^{x\phi(0)}}{x} \int_0^{\epsilon x} \left( f(0) + \frac{s}{x} f'(0) \right) e^{s\phi'(0) + s^2\phi''(0)/2x + \ldots}\, ds.$$
Given that $s^2/x \ll 1$, which is equivalent to $\epsilon \ll x^{-1/2}$, the second-order term in the integral
can be neglected. Similarly, the (s/x)f′(0) term may be neglected.

• Now the upper bound of integration can be extended to ∞ with exponentially small error, for
$$I(x) \sim -\frac{f(a)\, e^{x\phi(a)}}{x\phi'(a)}.$$
There are also higher-order corrections which we can compute by taking higher-order terms in
the series. The overall error once these corrections are taken care of is exponentially small.

• Maxima at interior points are a bit more subtle since φ′ vanishes there. In this case suppose
the maximum is at c = 0 for simplicity, and split the integral as
$$I(x) = \int_a^{-\epsilon} f(t)\, e^{x\phi(t)}\, dt + \int_{-\epsilon}^{\epsilon} f(t)\, e^{x\phi(t)}\, dt + \int_\epsilon^b f(t)\, e^{x\phi(t)}\, dt.$$

As before the first and third terms are exponentially small, and negligible if $x\epsilon^2 \gg 1$, where
the different scaling occurs because the linear term φ′(0) vanishes.

• Within the second integral we expand
$$\phi(t) \sim \phi(0) + \frac{t^2}{2}\phi''(0) + \frac{t^3}{6}\phi'''(0) + \ldots, \qquad f(t) \sim f(0) + tf'(0) + \ldots$$
where generically φ″(0) ≠ 0. Changing variables to $s = \sqrt{x}\, t$,
$$I(x) \sim \frac{e^{x\phi(0)}}{\sqrt{x}} \int_{-\sqrt{x}\,\epsilon}^{\sqrt{x}\,\epsilon} \left( f(0) + \frac{s}{\sqrt{x}} f'(0) + \ldots \right) e^{s^2\phi''(0)/2 + s^3\phi'''(0)/6\sqrt{x} + \ldots}\, ds.$$
For the leading term to dominate, we need $\sqrt{x}\,\epsilon/\sqrt{x} \ll 1$ and $(\sqrt{x}\,\epsilon)^3/\sqrt{x} \ll 1$. The latter is more
stringent, and putting together our constraints gives
$$x^{-1/2} \ll \epsilon \ll x^{-1/3}.$$

• Finally, incurring another exponentially small error by extending the integration bounds to
±∞, we conclude that
$$I(x) \sim f(c)\, e^{x\phi(c)} \sqrt{\frac{2\pi}{-x\phi''(c)}}.$$
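Both the endpoint and interior-point results are easy to sanity-check numerically. The sketch below uses the arbitrary test cases φ(t) = −t on [0, 1] (endpoint maximum at a = 0, leading term 1/x) and φ(t) = cos t on [−π/2, π/2] (interior maximum at c = 0, φ″(0) = −1), with x = 40:

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

x = 40.0

# endpoint maximum: f(t) = cos t, phi(t) = -t, so -f(0) e^{x phi(0)} / (x phi'(0)) = 1/x
end_numeric = simpson(lambda t: math.exp(-x * t) * math.cos(t), 0.0, 1.0)
end_formula = 1.0 / x

# interior maximum: f(t) = 1, phi(t) = cos t, c = 0, phi''(0) = -1
int_numeric = simpson(lambda t: math.exp(x * math.cos(t)), -math.pi / 2, math.pi / 2)
int_formula = math.exp(x) * math.sqrt(2 * math.pi / x)

print(end_numeric / end_formula, int_numeric / int_formula)  # both near 1
```

The remaining discrepancies are the algebraic O(1/x) corrections mentioned above.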

Now we turn to the method of stationary phase.

• The method of stationary phase is used for integrals of the form


$$I(x) = \int_a^b f(t)\, e^{ix\psi(t)}\, dt$$
where ψ(t) is real.


• The rigorous foundation of the method is the Riemann-Lebesgue lemma: if the integral $\int_a^b f(t)\, dt$
is absolutely convergent and ψ(t) is continuously differentiable on [a, b] and not constant on
any subinterval of [a, b], then
$$\int_a^b f(t)\, e^{ix\psi(t)}\, dt \to 0$$
as x → ∞.

• The Riemann-Lebesgue lemma makes it easy to get leading endpoint contributions. For instance,
$$I(x) = \int_0^1 \frac{e^{ixt}}{1+t}\, dt = -\frac{ie^{ix}}{2x} + \frac{i}{x} - \frac{i}{x} \int_0^1 \frac{e^{ixt}}{(1+t)^2}\, dt$$

and the Riemann-Lebesgue lemma ensures the remaining term is subleading.



• As in Laplace’s method, it’s more subtle to find contributions from interior points. We get a
large contribution at every point where ψ′ vanishes, since we don’t get rapid phase cancellation in
that region. Concretely, suppose the only such point is ψ′(c) = 0. We split the integral as
$$I(x) = \int_a^{c-\epsilon} f(t)\, e^{ix\psi(t)}\, dt + \int_{c-\epsilon}^{c+\epsilon} f(t)\, e^{ix\psi(t)}\, dt + \int_{c+\epsilon}^b f(t)\, e^{ix\psi(t)}\, dt$$

for $\epsilon \ll 1$. For the first term, we integrate by parts to find
$$\int_a^{c-\epsilon} f(t)\, e^{ix\psi(t)}\, dt = \left[ \frac{f(t)}{ix\psi'(t)}\, e^{ix\psi(t)} \right]_a^{c-\epsilon} + \text{subleading} = O\!\left( \frac{1}{x\psi'(c-\epsilon)} \right) = O\!\left( \frac{1}{x\epsilon\,\psi''(c)} \right).$$

We pick up a similar contribution from the second term. Note that unlike Laplace’s method,
these error terms are only algebraically small, not exponentially small.

• For the second term, we expand
$$f(t) \sim f(c) + (t-c)f'(c) + \ldots, \qquad \psi(t) \sim \psi(c) + \frac{(t-c)^2}{2}\psi''(c) + \frac{(t-c)^3}{6}\psi'''(c) + \ldots.$$

Plugging this in and changing variables to $s = x^{1/2}(t - c)$ we get
$$\frac{e^{ix\psi(c)}}{x^{1/2}} \int_{-x^{1/2}\epsilon}^{x^{1/2}\epsilon} \left( f(c) + \frac{s}{x^{1/2}} f'(c) + \ldots \right) e^{i s^2 \psi''(c)/2 + i s^3 \psi'''(c)/6x^{1/2} + \ldots}\, ds.$$

The third-derivative term in the exponent is smaller by a factor of $s^3/x^{1/2}$, so it is subleading if
$\epsilon \ll x^{-1/3}$. Similarly, the f′(c) term is smaller by a factor of $s/x^{1/2}$, so it is subleading if $\epsilon \ll 1$.

• Therefore, the leading term is
$$\frac{f(c)\, e^{ix\psi(c)}}{x^{1/2}} \int_{-x^{1/2}\epsilon}^{x^{1/2}\epsilon} e^{i s^2 \psi''(c)/2}\, ds.$$

When we extend the limits of integration to ±∞, we pick up $O(1/x\epsilon)$ error terms as before.
The integral can then be done by contour integration, rotating the contour to yield a Gaussian
integral, to conclude
$$I(x) = \frac{\sqrt{2\pi}\, f(c)\, e^{ix\psi(c)}\, e^{\pm i\pi/4}}{x^{1/2} |\psi''(c)|^{1/2}} + O(1/x\epsilon).$$
In order for this to be the leading term, it must be greater than $O(1/x\epsilon)$, and hence $\epsilon \gg x^{-1/2}$.

• Putting our most stringent constraints together, we require
$$x^{-1/2} \ll \epsilon \ll x^{-1/3}$$
just as for Laplace’s method for an interior point. Unfortunately it’s difficult to improve the
approximation, because the next terms involve nonlocal contributions.
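Both the endpoint and stationary-point results above can be sanity-checked numerically. The sketch below verifies the two-term endpoint expansion of the integral $\int_0^1 e^{ixt}/(1+t)\, dt$ from the earlier bullet, and the leading stationary-point formula for f(t) = 1 and ψ(t) = t² on [−1, 1] (so c = 0, ψ″(c) = 2, and the phase is +π/4); the values x = 200 and x = 500 are arbitrary test choices:

```python
import math, cmath

def simpson(f, a, b, n=40000):
    """Composite Simpson's rule; works for complex-valued integrands too."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

# endpoint contributions: I(x) ~ i/x - i e^{ix}/(2x), with O(1/x^2) remainder
x1 = 200.0
end_numeric = simpson(lambda t: cmath.exp(1j * x1 * t) / (1 + t), 0.0, 1.0)
end_terms = 1j / x1 - 1j * cmath.exp(1j * x1) / (2 * x1)

# stationary point: I(x) ~ sqrt(pi/x) e^{i pi/4}, with O(1/x) endpoint corrections
x2 = 500.0
sp_numeric = simpson(lambda t: cmath.exp(1j * x2 * t * t), -1.0, 1.0)
sp_formula = math.sqrt(math.pi / x2) * cmath.exp(1j * math.pi / 4)

print(abs(end_numeric - end_terms), abs(sp_numeric - sp_formula) / abs(sp_formula))
```

Note that the stationary-point error is only algebraically small, as claimed above.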

Finally, we consider the method of steepest descents.



• Laplace’s method and the method of stationary phase are really just special cases of the method
of steepest descents, which is for contour integrals of the form
$$I(x) = \int_C f(t)\, e^{x\phi(t)}\, dt.$$
We might think naively that the greatest contribution comes from the maximum of Re φ, but
this is incorrect due to the rapid phase oscillations. Similarly, regions of stationary phase
may have negligible magnitude.

• To get more insight, write φ = u + iv. The Cauchy-Riemann equations tell us that u and v are
harmonic functions with (∇u) · (∇v) = 0. Hence the landscape of u consists of hills and valleys
at infinity, along with saddle points. Assuming the contour goes to infinity, it must follow a
path where u → −∞ at infinity.

• Now consider deforming the path so that v is constant. Then the path is parallel to ∇u, so
it generically follows paths of steepest descent. Since u goes to −∞ at infinity, there must be
points where u′ = 0 along the contour, i.e. saddle points of φ, with each point giving a contribution
by Laplace’s method. Note that if we instead took u constant we would use the method of stationary
phase, but this is less useful because the higher-order terms are much harder to compute.

• In general we have some flexibility in the contour. Since the contribution is local, we only need
to know which saddle points it passes through, and which poles we cross. This is also true
computationally: switching to something close to the steepest descent contour makes numerical
evaluation much easier, but we don’t have to compute the exact contour for this to work.

• One might worry how to determine which saddle points are relevant. If all the zeroes of φ′
are simple, there is no problem because each saddle point is only connected to one valley; the
relevant saddle points are exactly those connected to the valley at the endpoints at infinity. We
are free in principle to deform the contour to pass through other saddle points, but we’d pick
up errors from the regions of high u that are much larger than the value of the integral.

Example. The gamma function for x ≫ 1. We may define the gamma function by
$$\frac{1}{\Gamma(x)} = \frac{1}{2\pi i} \int_C e^t t^{-x}\, dt$$
where C is a contour which starts at t = −∞ − ia, encircles the branch cut which we take to lie
along the negative real axis, and ends at t = −∞ + ib. Rewriting the integrand as $e^{t - x\log t}$, there is a
saddle at t = x. But since x is large, it’s convenient to rescale,
$$\frac{1}{\Gamma(x)} = \frac{1}{2\pi i\, x^{x-1}} \int_C e^{x(s - \log s)}\, ds, \qquad t = xs.$$
Defining φ(s) = s − log s, the saddle is now at s = 1. The steepest descent contour passes through
s = 1 vertically. Near this point we have
$$\phi(s) \sim 1 + \frac{(s-1)^2}{2} - \frac{(s-1)^3}{3} + \ldots.$$
Rescaling by $u = \sqrt{x}\,(s - 1)$ we have
$$\frac{1}{\Gamma(x)} \sim \frac{e^x}{2\pi i\, x^{x-1}\sqrt{x}} \int e^{u^2/2 - u^3/3\sqrt{x} + \ldots}\, du.$$

As usual, we extend the range of integration to infinity, giving
$$\frac{1}{\Gamma(x)} \sim \frac{e^x}{2\pi i\, x^{x-1/2}} \int e^{u^2/2}\, du = \frac{e^x}{\sqrt{2\pi}\, x^{x-1/2}}$$
where the integral converges since u ranges from −i∞ to i∞. This is the usual Stirling’s
approximation, but we can get increasingly accurate approximations by going to higher order.
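This can be confirmed against the exact gamma function (a quick check; x = 20 is an arbitrary test value, and the relative error should be about 1/12x):

```python
import math

x = 20.0
stirling = math.exp(x) / (math.sqrt(2 * math.pi) * x**(x - 0.5))  # asymptotic 1/Gamma(x)
exact = 1.0 / math.gamma(x)
print(stirling / exact)  # ~ 1 + 1/(12 x)
```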

Example. The Airy function for x ≫ 1. The Airy function is defined by
$$\operatorname{Ai}(x) = \frac{1}{2\pi} \int_C e^{i(t^3/3 + xt)}\, dt.$$

Dividing the plane into six sextants like quadrants, the integrand only decays in the first, third, and
fifth sextants, and the contour starts at infinity in the third sextant and ends at infinity in the first.
Differentiating the exponent shows the saddle points are at $t = \pm i x^{1/2}$. Rescaling $t = x^{1/2} z$,
$$\operatorname{Ai}(x) = \frac{x^{1/2}}{2\pi} \int_C e^{x^{3/2}\phi(z)}\, dz, \qquad \phi(z) = i(z^3/3 + z).$$

The steepest descent contour goes through the saddle point z = i but not z = −i, giving
$$\operatorname{Ai}(x) \sim \frac{e^{-2x^{3/2}/3}}{2\sqrt{\pi}\, x^{1/4}}.$$

Now consider Ai(−x) for x ≫ 1. In this case the saddle points are at z = ±1 and both are relevant.
Adding the two contributions gives
$$\operatorname{Ai}(-x) \sim \frac{1}{\sqrt{\pi}\, x^{1/4}} \cos\left( \frac{\pi}{4} - \frac{2x^{3/2}}{3} \right).$$

The fact that there are two different asymptotic expansions for different regimes is called the Stokes
phenomenon. If we view Ai(z) as a function on the complex plane, these regions are separated by
Stokes and anti-Stokes lines.

11.3 Matched Asymptotics


11.4 Multiple Scales
11.5 WKB Theory
