Undergraduate Math
Kevin Zhou
kzhou7@gmail.com
These notes are a review of the basic undergraduate math curriculum, focusing on the content most
relevant for physics. The primary sources were:
• Rudin, Principles of Mathematical Analysis. The canonical introduction to real analysis; terse
but complete. Presents many results in the general setting of metric spaces rather than R.
• Ablowitz and Fokas, Complex Variables. Quickly covers the core material of complex analysis,
then introduces many practical tools; indispensable for an applied mathematician.
• Artin, Algebra. A good general algebra textbook that interweaves linear algebra and focuses
on nontrivial, concrete examples such as crystallography and quadratic number fields.
• Munkres, Topology. A clear, if somewhat dry introduction to point-set topology. Also includes
a bit of algebraic topology, focusing on the fundamental group.
• Renteln, Manifolds, Tensors, and Forms. A textbook on differential geometry and algebraic
topology for physicists. Very clean and terse, with many good exercises.
Some sections are quite brief, and are intended as a telegraphic review of results rather than a full
exposition. The most recent version is here; please report any errors found to kzhou7@gmail.com.
Contents
1 Metric Spaces
1.1 Definitions
1.2 Compactness
1.3 Sequences
1.4 Series
2 Real Analysis
2.1 Continuity
2.2 Differentiation
2.3 Integration
2.4 Properties of the Integral
2.5 Uniform Convergence
3 Complex Analysis
3.1 Analytic Functions
3.2 Multivalued Functions
3.3 Contour Integration
3.4 Laurent Series
3.5 Application to Real Integrals
3.6 Conformal Transformations
3.7 Additional Topics
4 Linear Algebra
4.1 Exact Sequences
4.2 The Dual Space
4.3 Determinants
4.4 Endomorphisms
5 Groups
5.1 Fundamentals
5.2 Group Homomorphisms
5.3 Group Actions
5.4 Composition Series
5.5 Semidirect Products
6 Rings
6.1 Fundamentals
6.2 Quotient Rings and Field Extensions
6.3 Factorization
6.4 Modules
6.5 The Structure Theorem
7 Point-Set Topology
7.1 Definitions
7.2 Closed Sets and Limit Points
7.3 Continuous Functions
7.4 The Product Topology
7.5 The Metric Topology
8 Algebraic Topology
8.1 Constructing Spaces
8.2 The Fundamental Group
8.3 Group Presentations
8.4 Covering Spaces
1 Metric Spaces
1.1 Definitions
We begin with some basic definitions. Throughout, we let E be a subset of a fixed set X.
• A set X is a metric space if it has a distance function d(p, q) which is positive definite (except for d(p, p) = 0), symmetric, and satisfies the triangle inequality.
• A neighborhood of p is the set Nr (p) of all q with d(p, q) < r for some radius r > 0.
Others define a neighborhood as any set that contains one of these neighborhoods, which are
instead called “the open ball of radius r about p”. This is equivalent for proofs; the important
part is that neighborhoods always contain points “arbitrarily close” to p.
• E is closed if every limit point of E is in E. Intuitively, this means E "contains all its edges". The closure Ē of E is the union of E and the set of its limit points.
• The interior E° of E is the set of all interior points of E, or equivalently the union of all open sets contained in E.
• Finite subsets of R cannot have any limit points or interior points, so they are trivially closed
and not open.
• The set (0, 1]. The limit points are [0, 1], so the set is not closed. The interior points are (0, 1),
so the set is not open.
• The set of points 1/n for n ∈ Z⁺. The single limit point is 0, so the set is not closed.
• The interval [1, 2] in the restricted space [1, 2] ∪ [3, 4]. This is both open and closed. Generally,
this happens when a set contains “all of a connected component”.
As seen from the last example above, whether a set is closed or open depends on the space, so if we
wanted to be precise, we would say “closed in X” rather than just “closed”.
Example. There are many examples of metrics besides the usual one. For instance, the discrete metric sets d(p, q) = 1 for all p ≠ q, and d(p, p) = 0. Note that in this case, the closed ball of radius 1 about p is the whole space, which is not the closure of the open ball of radius 1 about p (which is just {p}).
• A metric on a vector space can be defined from a norm, which can in turn be defined from an inner product. (However, a norm does not necessarily come from a valid inner product.) For example, for continuous functions f : [a, b] → R we have the inner product
\langle f, g \rangle = \int_a^b f(t) g(t) \, dt
which gives the norm \|f\| = \sqrt{\langle f, f \rangle} and the metric
d_2(f, g) = \|f - g\| = \left( \int_a^b (f(t) - g(t))^2 \, dt \right)^{1/2}.
• Arbitrary unions of open sets are open, because interior points stay interior points when we
add more points. By taking the complement, arbitrary intersections of closed sets are closed.
• Finite intersections of open sets are open, because we can take intersections of the relevant neighborhoods. This breaks down for infinite intersections because the neighborhoods can shrink down to nothing, e.g. let E_n = (−1/n, 1/n). By taking the complement, finite unions of closed sets are closed. Infinite unions don't work because they can create new limit points.
Prop. For Y ⊂ X, the open sets E of Y are precisely Y ∩ G for open sets G of X.
Proof. If G is open in X, then Y ∩ G is open in Y, since neighborhoods in Y are just neighborhoods in X intersected with Y. Now consider the converse. Starting with E open in Y, we construct G by taking the union, over all points p ∈ E, of neighborhoods in X whose intersection with Y lies in E. Then G is an open set of X because it is the union of open sets, and E = Y ∩ G by construction.
Note. Topological spaces further abstract by throwing away the metric but retaining the structure
of the open sets. A topological space is a set X along with a set T of subsets of X, called the open
sets of X, such that T is closed under all unions and finite intersections, and contains both X itself
and the null set. The closed sets are defined as the complements of open sets. The rest of our
definitions hold as before, if we think of a neighborhood of a point x as any open set containing x.
For a subspace Y ⊂ X, we use the above proposition in reverse, defining the open sets in Y by
those in X. The resulting topology is called the subspace topology.
Note. An isometry between two metric spaces X and Y is a bijection that preserves the metric.
However, topological properties only depend on the open set structure, so we define a homeomor-
phism to be a bijection that is continuous with a continuous inverse; this ensures that it induces
a bijection between the topologies of X and Y . As we’ll see below, many important properties
such as continuity depend only on the topology, so we are motivated to find topological invariants,
properties preserved by homeomorphisms, to classify spaces.
1.2 Compactness
Compactness is a property that generalizes “finiteness” or “smallness”. Though its definition is
somewhat unintuitive, it turns out to be quite useful.
• An open cover of a set E in a metric space X is a set of open sets Gi of X so that their union
contains E. For example, one open cover of E could be the set of all neighborhoods of radius r
of every point in E.
• K is compact if every open cover of K contains a finite subcover. For example, all finite sets
are compact. Since we only made reference to the open sets, not the metric, compactness is a
topological invariant.
• All compact sets are closed. Intuitively, consider the interval (0, 1/2) in R. Then the open cover by the sets (1/n, 1) has no finite subcover; we can get 'closer and closer' to the open boundary.
Proof: let K ⊂ X be compact; we will show K^c is open. Fixing p ∈ K^c, define the open cover of K consisting of the balls N_{d(p,q)/2}(q) for all q ∈ K. Consider a finite subcover and let d_min be the minimum radius of any ball in it. Then the neighborhood of radius d_min/2 of p contains no points of K.
• All compact subsets of a metric space are bounded. This follows by taking an open cover
consisting of larger and larger balls.
• Intersections of compact sets are compact. This follows from the previous two results, since the intersection is a closed subset of a compact set, and closed subsets of compact sets are compact.
Note. The overall intuition found above is that compactness is a notion of ‘smallness’. An open
boundary is not ‘small’ because it is essentially the same as a boundary at infinity, from the
standpoint of open covers. We see that compactness is useful for proofs because the finiteness of a
subcover allows us to take least or greatest elements; we show some more examples of this below.
Example. Let K be a compact metric space. Then for any ε > 0, there exists an N so that every set of N distinct points in K includes at least two points with distance less than ε between them. To show this, consider the open cover consisting of all neighborhoods of radius ε/2. Then there's a finite open subcover, with M elements, centered at points p_i. For N > M, we are done by the pigeonhole principle.
Example. Let K be a compact metric space. Then K has a subset that is dense and at most countable. To prove this, consider the open cover of all neighborhoods of radius 1. Take a finite subcover centered at a set of points P_1. Then points in P_1 are within a distance of 1 from any point in K. Next construct P_2 using radius 1/2, and so on. Then P = \bigcup_n P_n is dense and at most countable.
Lemma. Every k-cell (a product of k closed intervals) is compact.
Proof. This is the key lemma that uses special properties of R. For simplicity, we consider the case k = 1, showing that all intervals [a, b] are compact. Let U be an open cover of [a, b] and define c = sup{x ∈ [a, b] : [a, x] has a finite subcover}. Some element of U contains c, and hence contains (c − δ, c + δ) for some δ > 0; adding this element to a finite subcover of [a, c − δ] covers past c unless c = b, so c = b and [a, b] has a finite subcover.
Theorem (Heine–Borel). For any E ⊂ R^k, E is closed and bounded if and only if it is compact.
Proof. We have already shown the reverse direction above. For the forward direction, note that if E is bounded it is a subset of a k-cell, and closed subsets of compact spaces are compact.
1.3 Sequences
We begin by defining convergence of a sequence.
• A sequence (p_n) in a metric space X converges to a point p ∈ X if, for every ε > 0, there is an integer N so that if n ≥ N, then d(p_n, p) < ε. This may also be written
\lim_{n \to \infty} p_n = p.
• More generally, in the context of a topological space, a sequence (pn ) converges to p ∈ X iff
every neighborhood of p contains all but finitely many of the pn .
• Sequences can only converge to one point; this is proven by considering neighborhoods of radius ε/2 and using the triangle inequality.
• If a sequence converges, it must be bounded. This is because only finitely many points lie
outside any given neighborhood of the limit point p, and finite sets are bounded.
• A topological space is sequentially compact if every infinite sequence has a convergent subse-
quence, and compactness implies sequential compactness.
To see this, let (x_k) be a sequence, let X_n be the closure of the tail {x_k : k ≥ n}, and let U_n = X_n^c.
Assuming the space is not sequentially compact, the intersection of all the X_n is empty, so the U_n are an open cover with no finite subcover, so the space is not compact.
• It can be shown that sequential compactness in a metric space implies compactness, though
this does not hold for a general topological space.
Example. Consider the set of bounded real sequences ℓ^∞ with the sup metric. The unit cube
C = {(x_k) : |x_k| ≤ 1}
is closed and bounded, but not compact: the 'unit vector' sequences e_1 = (1, 0, 0, ...), e_2 = (0, 1, 0, ...), and so on all have distance 1 from each other, so they have no convergent subsequence. Hence the Heine–Borel theorem fails in infinite-dimensional spaces.
• Bounded monotonic sequences of real numbers converge to their least upper/greatest lower
bounds, essentially by definition.
The proofs are easy except for the last two, where we must work to bound the error. For the fourth, we can factor
s_n t_n − st = (s_n − s)(t_n − t) + s(t_n − t) + t(s_n − s),
where each term on the right goes to zero.
• If sn ≤ tn for all n, then s ≤ t. To prove it, consider (tn − sn ). The range is bounded below by
0, so the closure of the range can’t contain any negative numbers.
Next, we introduce Cauchy sequences. They are useful because they allow us to say some things
about convergence without specifying a limit.
• A sequence (p_n) in a metric space X is Cauchy if, for every ε > 0 there is an integer N such that d(p_n, p_m) < ε if n, m ≥ N. Note that unlike regular convergence, this definition depends on the metric structure.
• The diameter of E is the supremum of the set of distances d(p, q) with p, q ∈ E. Then (p_n) is Cauchy iff the limit of the diameters d_n of the tails p_n, p_{n+1}, . . . is zero.
• All convergent sequences are Cauchy, because if we get within ε/2 of the limit, then the points themselves are within ε of each other.
• A metric space in which every Cauchy sequence converges is complete; intuitively, these spaces
have no ‘missing limit points’. Moreover, every closed subset E of a complete metric space X
is complete, since Cauchy sequences in E are also Cauchy sequences in X.
• Compact metric spaces are complete. This is because compactness implies sequential compactness, and a convergent subsequence of a Cauchy sequence is sufficient to guarantee the Cauchy sequence is convergent.
• The space Rk is complete, because all Cauchy sequences are bounded, and hence inside a k-cell.
Since k-cells are compact, we can apply the previous fact. The completeness of R is one of its
most important properties, and it is what suits it for doing calculus better than Q.
• Completeness is not a topological invariant, because (0, 1) is not complete while R is; it depends on the details of the metric. However, the property of "complete metrizability" is a topological invariant; a topological space is completely metrizable if there exists a metric which yields the topology and under which the space is complete.
• For a real sequence, we write s_n → ∞ if, for every real M, there is an integer N so that s_n ≥ M for all n ≥ N. We'll now count ±∞ as a possible subsequential limit.
• Denote E as the set of subsequential limits of a real sequence (s_n), and write
s^* = \sup E = \limsup_{n\to\infty} s_n, \quad s_* = \inf E = \liminf_{n\to\infty} s_n.
It can be shown that E is closed, so it contains s^* and s_*. The sequence converges iff s^* = s_*.
Example. For a sequence containing all rationals in arbitrary order, every real number is a subsequential limit, so s^* = ∞ and s_* = −∞. For the sequence a_k = (−1)^k (k + 1)/k, we have s^* = 1 and s_* = −1.
The notation we’ve defined above will be useful for analyzing series. For example, a series might
contain several geometric subsequences; for convergence, we care about the one with the largest
ratio, which can be extracted with lim sup.
1.4 Series
Given a sequence (a_n), we say the sum of the series \sum_n a_n, if it exists, is
\lim_{n \to \infty} s_n, \quad s_n = a_1 + \ldots + a_n.
That is, the sum of a series is the limit of its partial sums. We quickly review convergence tests.
• Cauchy convergence test: since R is complete, we can replace convergence with the Cauchy property. Then \sum_n a_n converges iff, for every ε > 0, there is an integer N so that for all m ≥ n ≥ N, |a_n + \ldots + a_m| ≤ ε.
• Limit test: taking m = n, the above becomes |a_n| ≤ ε, which means that if \sum_n a_n converges, then a_n → 0. This is a much weaker version of the above.
• A monotonic sequence converges iff it’s bounded. Then a series of nonnegative terms converges
iff the partial sums form a bounded sequence.
• Comparison test: if |a_n| ≤ c_n for n > N_0 for a fixed N_0, and \sum_n c_n converges, then \sum_n a_n converges. We prove this by plugging directly into the Cauchy criterion.
• Divergence test: taking the contrapositive, we can prove a series diverges if we can bound it
from below by a divergent series.
• Geometric series: for 0 ≤ x < 1, \sum_n x^n = 1/(1 − x), so the series a_n = x^n converges. To prove this, write the partial sums using the geometric series formula, then take the limit explicitly.
• Cauchy condensation test: let a_1 ≥ a_2 ≥ \ldots ≥ 0. Then \sum_n a_n converges iff \sum_n 2^n a_{2^n} does. This surprisingly implies that we only need a small number of the terms to determine convergence.
Proof: since the terms are all nonnegative, convergence is equivalent to the partial sums being bounded. Now group the terms in two different ways:
a_1 + (a_2 + a_3) + \ldots + (a_{2^k} + \ldots + a_{2^{k+1}-1}) \le a_1 + 2a_2 + \ldots + 2^k a_{2^k} = \sum_{n=0}^{k} 2^n a_{2^n},
a_1 + a_2 + (a_3 + a_4) + \ldots + (a_{2^{k-1}+1} + \ldots + a_{2^k}) \ge \frac{1}{2} a_1 + a_2 + 2a_4 + \ldots = \frac{1}{2} \sum_{n=0}^{k} 2^n a_{2^n}.
Then the sequences of partial sums of \sum_n a_n and \sum_n 2^n a_{2^n} are within a constant multiple of each other, so each converges iff the other does. As an application, \sum_n 1/n diverges.
• Ratio test: let \sum_n a_n have a_n ≠ 0. Then the series converges if α = \limsup_{n\to\infty} |a_{n+1}/a_n| < 1. The proof is simply by comparison to a geometric series.
• Root test: let α = \limsup_{n\to\infty} \sqrt[n]{|a_n|}. Then \sum_n a_n converges if α < 1 and diverges if α > 1. The proof is similar to the ratio test: for sufficiently large n, we can bound the terms by a geometric series.
• Dirichlet’s theorem: let An = nk=0 ak and Bn = nk=0 bk . Then if (An ) is bounded and (bn )
P P
P
is monotonically decreasing with limn→∞ bn = 0, then an bn converges.
Proof: we use ‘summation by parts’,
q
X q−1
X
an bn = An (bn − bn+1 ) + Aq bq − Ap−1 bp .
n=p n=p
• Alternating series test: if |c_i| ≥ |c_{i+1}| and \lim_{n\to\infty} c_n = 0 and the c_i alternate in sign, then \sum_n c_n converges.
Proof: this is a special case of Dirichlet's theorem; it also follows from the Cauchy criterion.
Example. The series \sum_{n>1} 1/(n(\log n)^p) converges iff p > 1, by the Cauchy condensation test. The general principle is that Cauchy condensation can be used to remove a layer of logarithms, or convert a p-series to a geometric series.
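Explicitly, applying condensation to a_n = 1/(n (\log n)^p) gives
2^n a_{2^n} = \frac{2^n}{2^n (\log 2^n)^p} = \frac{1}{(n \log 2)^p},
a p-series up to a constant, which converges iff p > 1.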
Example. Tricking the ratio test. Consider the series whose terms alternate between 3^{-n} and 2^{-n}. Then half of the ratios are large, so the ratio test is inconclusive. However, the root test works, giving α = 1/2 < 1. Essentially, the two tests do the same thing, but the root test is more powerful because it doesn't just look at 'local' information.
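Concretely, take a_n = 2^{-n} for even n and a_n = 3^{-n} for odd n. For odd n the ratios blow up,
\left| \frac{a_{n+1}}{a_n} \right| = \frac{2^{-(n+1)}}{3^{-n}} = \frac{1}{2} \left( \frac{3}{2} \right)^n \to \infty,
so \limsup_n |a_{n+1}/a_n| = ∞, while \sqrt[n]{|a_n|} just alternates between 1/2 and 1/3, giving α = 1/2.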
Note. The ratio and root tests come from the geometric series test, which in turn comes from the limit test. That is, fundamentally, they aren't doing anything deeper than seeing if the terms blow up. The only stronger tools we have are the Cauchy condensation test, which gives us the p-series test, and Dirichlet's theorem.
Example. The Fourier p-series is defined as
\sum_{k=1}^{\infty} \frac{\cos(kx)}{k^p}.
By comparison with p-series, it converges for p > 1. For 0 < p ≤ 1, use Dirichlet's theorem with a_n = \cos(nx) and b_n = 1/n^p. Using geometric series, we can show that (A_n) is bounded as long as x is not a multiple of 2π, giving convergence.
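The boundedness of (A_n) can be made explicit by summing the geometric series with ratio e^{ix}:
|A_n| = \left| \sum_{k=0}^{n} \cos(kx) \right| = \left| \mathrm{Re} \, \frac{e^{i(n+1)x} - 1}{e^{ix} - 1} \right| \le \frac{2}{|e^{ix} - 1|},
which is finite whenever x is not a multiple of 2π.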
Next, we extend to the complex numbers and consider power series; note that our previous results continue to work when the absolute value is replaced by the complex norm. Given a sequence (c_n) of complex numbers, the series \sum_n c_n z^n is called a power series.
Theorem. Let α = \limsup_{n\to\infty} \sqrt[n]{|c_n|}. Then the power series \sum_n c_n z^n converges when |z| < R and diverges when |z| > R, where R = 1/α is called the radius of convergence.
Proof. Immediate by the root test.
• The series \sum_n z^n/n has R = 1; it diverges at z = 1, but converges for all other z on the boundary, as this is just a variant of the Fourier p-series.
• The series \sum_n z^n/n^2 has R = 1 and converges for all z on the boundary by the p-series test.
As stated earlier, divergence of power series is not subtle; the terms become unbounded.
We say the series \sum_n a_n converges absolutely if \sum_n |a_n| converges.
• Many properties that intuitively hold for convergent series really require absolute convergence;
often the absolute values appear from a triangle-inequality argument.
• All absolutely convergent series are convergent, because |\sum a_i| ≤ \sum |a_i| by the triangle inequality, applied to the tails in the Cauchy criterion.
• Power series are absolutely convergent within their radius of convergence, because the root test
only considers absolute values.
Prop. Let \sum_n a_n = A and \sum_n b_n = B, with \sum_n a_n converging absolutely. Then the product series \sum_n c_n defined by
c_n = \sum_{k=0}^{n} a_k b_{n-k}
converges to AB. This definition is motivated by multiplication of power series.
Proof. Let β_n = B_n − B. We'll pull out the terms we want from C_n, plus an error term,
C_n = a_0 b_0 + \ldots + (a_0 b_n + \ldots + a_n b_0) = a_0 B_n + \ldots + a_n B_0.
Pulling out A_n B gives
C_n = A_n B + a_0 β_n + \ldots + a_n β_0 ≡ A_n B + γ_n.
We want to show that γ_n → 0. Let α = \sum_n |a_n|. For some ε > 0, choose N so that |β_n| ≤ ε for all n ≥ N. Then separate the error term into
|γ_n| ≤ |β_0 a_n + \ldots + β_N a_{n-N}| + |β_{N+1} a_{n-N-1} + \ldots + β_n a_0|.
The first term goes to zero as n → ∞, since it has finitely many terms and a_n → 0, while the second is bounded by εα. Since ε was arbitrary, we're done.
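As a quick example of the product formula, squaring the geometric series for |z| < 1 gives c_n = \sum_{k=0}^{n} 1 \cdot 1 = n + 1, so
\left( \sum_{n=0}^{\infty} z^n \right)^2 = \sum_{n=0}^{\infty} (n+1) z^n = \frac{1}{(1-z)^2},
consistent with differentiating the geometric series term by term.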
Note. Series that converge but not absolutely are conditionally convergent. The Riemann rear-
rangement theorem states that for such series, the terms can always be reordered to approach any
desired limit; the idea is to take just enough positive terms to get over it, then enough negative
terms to get under it, and alternate.
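This greedy procedure is easy to carry out numerically; below is a minimal sketch in Python for the alternating harmonic series, where the function name and term count are just for illustration.

    # Rearrange the conditionally convergent series 1 - 1/2 + 1/3 - 1/4 + ...
    # to approach an arbitrary target: greedily take positive terms (1, 1/3, 1/5, ...)
    # until the partial sum exceeds the target, then negative terms (-1/2, -1/4, ...)
    # until it drops below, and repeat. Since the terms go to zero, the
    # overshoot shrinks, and the partial sums converge to the target.
    def rearranged_sum(target, num_terms=100000):
        pos, neg = 1, 2      # next odd (positive) and even (negative) denominators
        total = 0.0
        for _ in range(num_terms):
            if total <= target:
                total += 1.0 / pos   # get over the target
                pos += 2
            else:
                total -= 1.0 / neg   # get under the target
                neg += 2
        return total

    print(rearranged_sum(3.14159))   # approaches any chosen target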
2 Real Analysis
2.1 Continuity
We begin by defining limits for functions between metric spaces X and Y. Let f : E → Y with E ⊂ X, and let p be a limit point of E. We write
\lim_{x \to p} f(x) = q
if, for every ε > 0 there is a δ > 0 such that for all x ∈ E with 0 < d_X(x, p) < δ, we have d_Y(f(x), q) < ε. We also write f(x) → q as x → p.
• This definition is completely indifferent to f (p) itself, which could even be undefined.
• Limits can be characterized by sequences: \lim_{x\to p} f(x) = q iff, for every sequence (p_n) in E with p_n ≠ p and p_n → p, we have
\lim_{n \to \infty} f(p_n) = q.
• By the same proofs as for sequences, limits are unique, and in R they add/multiply/divide as
expected.
We say f is continuous at p if, for every ε > 0, there is a δ > 0 such that d_Y(f(x), f(p)) < ε for all x ∈ E with d_X(x, p) < δ. When p is a limit point of E, this says \lim_{x\to p} f(x) = f(p); in the case where p is not a limit point of the domain E, f is automatically continuous at p. If f is continuous at all points of E, then we say f is continuous on E.
• None of our definitions care about E^c, so we'll implicitly restrict X to the domain E for all future statements.
• Continuity for functions f : R → R is preserved under arithmetic operations the way we expect, by the results above. The function f(x) = x is continuous, as we can choose δ = ε. Hence polynomials and rational functions are continuous. The absolute value function is also continuous; we can choose δ = ε by the triangle inequality. This can be generalized to functions from R to R^k, which are continuous iff all the components are.
Now we connect continuity to topology. Note that if we were dealing with a topological space rather than a metric space, the following condition would be used to define continuity.
Theorem. A function f : X → Y is continuous iff f^{-1}(V) is open in X for every open set V ⊂ Y.
Proof. The key idea is that every point of an open set is an interior point. Assume f is continuous on X, and let p ∈ f^{-1}(V) and q = f(p). The continuity condition states that
f(N_δ(p)) ⊂ N_ε(q)
for some δ, given any ε. Choosing ε so that N_ε(q) ⊂ V, this shows that p is an interior point of f^{-1}(V), giving the result. The converse is similar.
Corollary. If f is continuous, then f^{-1} takes closed sets to closed sets; this follows from taking the complement of the previous theorem.
Corollary. A function f is continuous iff, for every subset S ⊂ X, we have f(\bar{S}) ⊂ \overline{f(S)}. The forward direction follows from the previous corollary, and exhibits the intuitive notion that continuous functions keep nearby points together.
Example. Using the definition of continuity, it is easy to show that the circle x^2 + y^2 = 1 is closed, because this is the inverse image of the closed set {1} under the continuous function f(x, y) = x^2 + y^2. Similarly, the region x^2 + xy + y^2 < 1 is open, and so on. In general, continuity is one of the most practical ways to show that a set is open or closed.
We now relate continuity to compactness and connectedness.
• More generally, the connected subsets of R are the intervals, while almost every subset of Q is
disconnected.
• IVT: let f be a continuous real function defined on [a, b]. Then if f(a) < f(b) and c ∈ (f(a), f(b)), there exists a point x ∈ (a, b) such that f(x) = c. This follows immediately from the above fact, because intervals are connected.
• Path connectedness implies connectedness. To see this, note that connectedness of S is equivalent to all continuous functions f : S → Z being constant. Given a path γ : [0, 1] → S from a to b, consider the map f ∘ γ : [0, 1] → Z for any continuous f. It is continuous, and its domain is connected, so its value is constant and f(γ(0)) = f(γ(1)). Then f(a) = f(b) for all a, b ∈ S.
• All open connected subsets of R^n are path connected. However, in general connected sets are not necessarily path connected. The standard example is the topologist's sine curve, the graph of sin(1/x) for x ∈ (0, 1] together with the segment {0} × [−1, 1], which is connected but not path connected.
• We say f : X → Y is uniformly continuous on X if, for every ε > 0, there exists δ > 0 so that
d_Y(f(p), f(q)) < ε whenever d_X(p, q) < δ
for all p, q ∈ X. That is, we can use the same δ for every point. For example, 1/x is continuous but not uniformly continuous on (0, ∞) because it gets arbitrarily steep.
• We say f is Lipschitz continuous if there is a constant K with d_Y(f(p), f(q)) ≤ K d_X(p, q) for all p, q ∈ X. Lipschitz continuity implies uniform continuity, by choosing δ = ε/2K, and can be an easy way to establish uniform continuity.
Example. The metric spaces [0, 1] and [0, 1) are not homeomorphic. Suppose that h : [0, 1] → [0, 1) is such a homeomorphism. Then the map
x \mapsto \frac{1}{1 − h(x)}
is a continuous, unbounded function on [0, 1], which contradicts the EVT, since a continuous function on a compact set is bounded.
2.2 Differentiation
In this section we define derivatives for functions on the real line; the situation is more complicated
in higher dimensions.
• Let f be defined on [a, b]. Then for x ∈ [a, b], define the derivative
f'(x) = \lim_{t \to x} \frac{f(t) − f(x)}{t − x}.
If f' is defined at a point/set, we say f is differentiable at that point/set.
• Note that our definition defines differentiability at all x that are limit points of the domain of
f , and hence includes the endpoints a and b. In more general applications, though, we’ll prefer
to talk about differentiability only on open sets, where we can ‘approach from all directions’.
• The linearity of the derivative and the product rule can be derived by manipulating the difference quotient. For example, if h = fg, then
\frac{h(t) − h(x)}{t − x} = \frac{f(t)(g(t) − g(x)) + g(x)(f(t) − f(x))}{t − x},
which gives the product rule.
• By the definition, the derivative of 1 is 0 and the derivative of x is 1. Using the above rules gives the power rule, (d/dx)(x^n) = n x^{n−1}.
• Chain rule: suppose f is continuous on [a, b], f'(x) exists at some point x ∈ [a, b], g is defined on an interval I that contains the range of f, and g is differentiable at f(x). Then if h(t) = g(f(t)), then
h'(x) = g'(f(x)) f'(x).
To prove this, we isolate the error terms,
f(t) − f(x) = (t − x)(f'(x) + u(t)), \quad g(s) − g(y) = (s − y)(g'(y) + v(s)),
where u(t) → 0 as t → x and v(s) → 0 as s → y = f(x). Then
h(t) − h(x) = g(f(t)) − g(f(x)) = (t − x)(f'(x) + u(t))(g'(f(x)) + v(f(t))).
Taking t → x makes u(t) and v(f(t)) vanish, giving the result; note that we need continuity of f to ensure that f(t) → f(x).
We now introduce the generalized mean value theorem, which is extremely useful in proofs.
• We say a function f : X → R has a local maximum at p if there exists δ > 0 so that f(q) ≤ f(p) for all q ∈ X with d(p, q) < δ.
• Given a function f : [a, b] → R, if f has a local maximum at x ∈ (a, b) and f'(x) exists, then f'(x) = 0.
Proof: sequences approaching from the right give f'(x) ≤ 0, because the difference quotient is nonpositive once we get within δ of x. Similarly, sequences from the left give f'(x) ≥ 0.
• Some sources define a "critical point" as a point x where f'(x) = 0, f'(x) doesn't exist, or x is an endpoint of the domain. The point of this definition is that these critical points are all the points that could have local extrema.
• Rolle: if f is continuous on [a, b] and differentiable on (a, b), and f(a) = f(b), then there is a point x ∈ (a, b) so that f'(x) = 0.
Proof: if f is constant, we're done. Otherwise, suppose f(t) > f(a) for some t ∈ (a, b). Then by the EVT, there is an x ∈ (a, b) that achieves the maximum, which means f'(x) = 0. If f(a) is the maximum, we do the same reasoning with the minimum.
• Generalized MVT: if f and g are continuous real functions on [a, b] which are differentiable in (a, b), then there is a point x ∈ (a, b) such that
(f(b) − f(a)) g'(x) = (g(b) − g(a)) f'(x).
• Intuitively, if we consider the curve parametrized by (f (t), g(t)), the generalized MVT states
that some tangent line to the curve is parallel to the line connecting the endpoints.
• MVT: setting g(x) = x in the generalized MVT, there is a point x ∈ (a, b) so that
f(b) − f(a) = (b − a) f'(x).
• One use of the MVT is that it allows us to connect the derivative at a point, which is local, with function values on a finite interval. For example, we can use it to show that if f'(x) ≥ 0, then f is monotonically increasing.
• The MVT doesn't apply for vector valued functions, as there's too much 'freedom in direction'. The closest thing we have is the bound
|f(b) − f(a)| ≤ (b − a) \sup_{x \in (a,b)} |f'(x)|.
Theorem (L'Hospital). Let f and g be real and differentiable in (a, b) with g'(x) ≠ 0 for all x ∈ (a, b). Suppose f'(x)/g'(x) → A as x → a. Then if f(x) → 0 and g(x) → 0 as x → a, or if g(x) → ∞ as x → a, then f(x)/g(x) → A as x → a.
Theorem (Taylor). Suppose f is a real function on [a, b], f^{(n−1)} is continuous on [a, b], and f^{(n)}(t) exists for all t ∈ (a, b). Let α and β be distinct points in [a, b], and let
P(t) = \sum_{k=0}^{n-1} \frac{f^{(k)}(α)}{k!} (t − α)^k.
Then there exists a point x between α and β such that
f(β) = P(β) + \frac{f^{(n)}(x)}{n!} (β − α)^n.
This bounds the error of a polynomial approximation in terms of the maximum value of f^{(n)}(x).
Proof. Let M be the number such that f(β) = P(β) + M(β − α)^n, and define g(t) = f(t) − P(t) − M(t − α)^n, so that
g(α) = g'(α) = \ldots = g^{(n−1)}(α) = 0, \quad g(β) = 0, \quad g^{(n)}(t) = f^{(n)}(t) − n! M.
Then we wish to show that g^{(n)}(t) = 0 for some t between α and β. Applying Rolle's theorem gives a point x_1 between α and β where g'(x_1) = 0. Repeating this for g' on the interval between α and x_1 gives a point x_2 where g''(x_2) = 0, and so on, giving the result.
In particular, if f^{(n)} is bounded near α, we can write f(x) = P(x) + ε(x)(x − α)^{n−1}, where ε(x) → 0 as x → α.
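As a worked example of the error bound, for f(x) = e^x expanded about α = 0, the remainder after the first n terms satisfies
\left| e^x − \sum_{k=0}^{n-1} \frac{x^k}{k!} \right| \le \frac{e^{|x|} |x|^n}{n!},
which goes to zero as n → ∞ for every fixed x, so the exponential equals its Taylor series everywhere.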
2.3 Integration
In this section, we define integration over intervals on the real line.
• A partition P of [a, b] is a finite set of points
a = x_0 ≤ x_1 ≤ \ldots ≤ x_{n−1} ≤ x_n = b,
and we write Δx_i = x_i − x_{i−1}.
• Let f be a bounded real function defined on [a, b]. Then for a partition P, define
M_i = \sup_{[x_{i−1}, x_i]} f, \quad m_i = \inf_{[x_{i−1}, x_i]} f
and
U(P, f) = \sum_i M_i Δx_i, \quad L(P, f) = \sum_i m_i Δx_i.
The upper and lower integrals are then \inf_P U(P, f) and \sup_P L(P, f), where the inf and sup are taken over all partitions P. These quantities are always defined if f is bounded, because this implies that the M_i and m_i are bounded, which implies the upper and lower integrals are. Conversely, our notion of integration doesn't make sense if f isn't bounded, though we'll find a way to accommodate this later.
• If the upper and lower integrals are equal, we say f is Riemann-integrable on [a, b], write f ∈ R, and denote their common value as \int_a^b f \, dx.
• More generally, given a monotonically increasing function α on [a, b], we can define Δα_i = α(x_i) − α(x_{i−1}), the sums
U(P, f, α) = \sum_i M_i Δα_i, \quad L(P, f, α) = \sum_i m_i Δα_i,
and the upper and lower integrals analogously. If they are the same, we say f is integrable with respect to α, write f ∈ R(α), and denote their common value as \int_a^b f \, dα. This is the Riemann–Stieltjes integral, with the Riemann integral as the special case α(x) = x.
Next, we find the conditions for integrability. Below, we always let f be real and bounded, and α
be monotonically increasing, on the interval [a, b].
• P ∗ is a refinement of P if P ∗ ⊃ P (i.e. we only split existing intervals into smaller ones). Given
two partitions P1 and P2 , their common refinement is P ∗ = P1 ∪ P2 .
• Refining a partition increases L and decreases U. This is clear by considering a refinement that adds exactly one extra point.
• The lower integral is not greater than the upper integral. For any two partitions P_1 and P_2 with common refinement P^*,
L(P_1, f, α) ≤ L(P^*, f, α) ≤ U(P^*, f, α) ≤ U(P_2, f, α).
Taking sup over P_1 and inf over P_2 on both sides of this inequality gives the result.
• Therefore, f ∈ R(α) on [a, b] iff, for every ε > 0, there exists a partition so that
U(P, f, α) − L(P, f, α) < ε.
This follows immediately from the previous point, and will serve as a useful criterion for integrability: we seek to construct partitions that give us an arbitrarily small 'error' ε.
• If U(P, f, α) − L(P, f, α) < ε, then
\sum_i |f(s_i) − f(t_i)| Δα_i < ε,
where s_i and t_i are arbitrary points in [x_{i−1}, x_i]. Moreover, if the integral exists,
\left| \sum_i f(t_i) Δα_i − \int_a^b f \, dα \right| < ε.
2.4 Properties of the Integral
We can use these basic results to prove integrability theorems. We write Δα = α(b) − α(a).
• If f is bounded on [a, b] and has only finitely many points of discontinuity, none of which are also points of discontinuity of α, then f ∈ R(α).
Proof: choose a partition so that each point of discontinuity is in the interior of a segment [u_i, v_i], where these segments' Δα_i values add up to ε. Then f is continuous on the compact set [a, b] \setminus \bigcup_i (u_i, v_i), so applying the previous theorem gives an O(ε) error there.
The segments with discontinuities contribute at most 2Mε, where M = \sup |f(x)| is finite. Then the overall error is O(ε) as desired.
• Suppose f ∈ R(α) on [a, b], m ≤ f (x) ≤ M on [a, b], φ is continuous on [m, M ], and h(x) =
φ(f (x)) on [a, b]. Then h ∈ R(α) on [a, b]. That is, continuous functions preserve integrability.
• Integration is linear,
\int_a^b (f_1 + f_2) \, dα = \int_a^b f_1 \, dα + \int_a^b f_2 \, dα, \quad \int_a^b cf \, dα = c \int_a^b f \, dα.
To show the first, note that for any partition,
L(P, f_1, α) + L(P, f_2, α) ≤ L(P, f_1 + f_2, α) ≤ U(P, f_1 + f_2, α) ≤ U(P, f_1, α) + U(P, f_2, α).
Pick partitions for f_1 and f_2 with error ε/2. Then by the inequality above, their common refinement P has error at most ε for f_1 + f_2, as desired. Moreover, using the inequality again,
\int (f_1 + f_2) \, dα ≤ U(P, f_1 + f_2, α) < \int f_1 \, dα + \int f_2 \, dα + ε.
Repeating this argument with f_i replaced with −f_i gives the desired result.
As before, the integrals on the left exist if the ones on the right do.
• If f, g ∈ R(α), then fg ∈ R(α), since squaring preserves integrability and
4fg = (f + g)^2 − (f − g)^2.
A similar trick works with maxima and minima, as max(f, g) = (f + g)/2 + |f − g|/2.
• If f ∈ R(α), then |f| ∈ R(α), with |\int_a^b f \, dα| ≤ \int_a^b |f| \, dα.
Proof: for integrability, compose with φ(t) = |t|. The inequality follows from ±f ≤ |f|.
The reason we used the Riemann-Stieltjes integral is because the choice of α gives us more flexibility.
In particular, the Riemann-Stieltjes integral contains infinite series as a special case.
• If a < s < b, f is bounded on [a, b] and continuous at s, and α(x) = I(x − s), where I is the unit step function, then
\int_a^b f \, dα = f(s).
• If c_n ≥ 0 and \sum_n c_n converges, (s_n) is a sequence of distinct points in (a, b), and f is continuous on [a, b], then
α(x) = \sum_n c_n I(x − s_n) \quad \text{gives} \quad \int_a^b f \, dα = \sum_n c_n f(s_n).
Proof: the series on the right-hand side converges by comparison to \sum_n M c_n, where M = \sup |f(x)|. We need to show that it converges to the desired integral; to do this, consider truncating the series after N terms so that the rest of the terms add up to ε, and let
α_N(x) = \sum_{n=0}^{N} c_n I(x − s_n).
Then \int f \, dα_N is at most Mε away from \int f \, dα by the ML inequality, while the truncated series is at most Mε away from the full series. Taking ε to zero gives the result.
Note. In physics, we often compute averages over a mass distribution, such as the center of mass \bar{x} = \int x \, dm / \int dm. If some of the mass is pointlike, a mass density doesn't make sense; with the Riemann–Stieltjes integral, this equation holds whether the masses are continuous or discrete, or both; the function m(x) is the amount of mass to the left of x.
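For example, a uniform rod of density λ on [0, 1] with an extra point mass m_0 at x = 1/2 has m(x) = λx + m_0 I(x − 1/2), and
\bar{x} = \frac{\int_0^1 x \, dm}{\int_0^1 dm} = \frac{λ/2 + m_0/2}{λ + m_0} = \frac{1}{2},
as expected by symmetry, with the point mass handled by the step-function rule above.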
• Let α increase monotonically and be differentiable with α' ∈ R on [a, b]. Let f be bounded on [a, b]. Then
\int_a^b f \, dα = \int_a^b f(x) α'(x) \, dx
where one integral exists if and only if the other does.
Proof: we relate the integrals using the MVT. For all partitions P, we can use the MVT to choose t_i in each interval so that
Δα_i = α'(t_i) Δx_i.
Now consider taking s_i in each interval to yield the upper sum
U(P, f, α) = \sum_i f(s_i) Δα_i = \sum_i f(s_i) α'(t_i) Δx_i.
Now, since α' is integrable, we can choose P so that U(P, α') − L(P, α') < ε. Then we have
\sum_i |α'(s_i) − α'(t_i)| Δx_i < ε,
so U(P, f, α) differs from the upper sum \sum_i f(s_i) α'(s_i) Δx_i for fα' by at most Mε, where M = \sup |f|. Hence the upper integrals agree, and similarly for the lower integrals.
• Let ϕ be a strictly increasing continuous function that maps [A, B] onto [a, b]. Let α be monotonically increasing on [a, b] and f ∈ R(α) on [a, b]. Let β = α ∘ ϕ and g = f ∘ ϕ. Then g ∈ R(β) on [A, B], with
\int_A^B g \, dβ = \int_a^b f \, dα,
since partitions of [A, B] correspond exactly to partitions of [a, b] under ϕ, with identical sums.
Note. The first proof above shows another common use of the MVT: pinning down specific points
where an ‘on average’ statement is true. Having these points in hand makes the rest of the proof
more straightforward.
Note. A set A ⊂ R has measure zero if, for every ε > 0 there exists a countable collection of open intervals {(a_i, b_i)} such that
A ⊂ \bigcup_i (a_i, b_i), \quad \sum_i (b_i − a_i) < ε.
That is, the "length" of the set is arbitrarily small. Lebesgue's theorem states that a bounded real function is Riemann integrable if and only if its set of discontinuities has measure zero.
• Fundamental theorem of calculus: if f ∈ R on [a, b] and F is differentiable on [a, b] with F' = f, then \int_a^b f(x) \, dx = F(b) − F(a).
Proof: choose a partition P so that U(P, f) − L(P, f) < ε. By the MVT, we can choose points t_i in each interval so that
F(x_i) − F(x_{i−1}) = f(t_i) Δx_i, \quad \text{so} \quad \sum_i f(t_i) Δx_i = F(b) − F(a).
Then both the upper and lower integrals are within ε of F(b) − F(a), and taking ε to zero gives the result.
• For vector-valued f : [a, b] → R^k with each component f_i ∈ R(α), we have |f| ∈ R(α) and
\left| \int_a^b f \, dα \right| ≤ \int_a^b |f| \, dα.
Proof: first, we must show that |f| is integrable; this follows because it can be built from squaring, addition, and square root, all of which are continuous. (The square root function is continuous because it is the inverse of the square on the compact interval [0, M] for any M.) To show the bound, let y = \int f \, dα. Then
|y|^2 = \sum_i y_i \int f_i \, dα = \int (y \cdot f) \, dα ≤ \int |y| |f| \, dα,
and dividing by |y| gives the result.
Note. The proofs above show some common techniques: using the ML inequality to bound an
error to zero, and using the MVT to get concrete points to work with.
2.5 Uniform Convergence
One must treat pointwise convergence with caution; the problems boil down to the fact that two limits may not commute. For instance, the pointwise limit of continuous functions may not be continuous.
• Integration and pointwise convergence don't commute. Let f_n : [0, 1] → R where f_n(x) = n^2 on (0, 1/n) and 0 otherwise. Then
\lim_{n\to\infty} \int_0^1 f_n(x) \, dx = \lim_{n\to\infty} n = ∞, \quad \int_0^1 \lim_{n\to\infty} f_n(x) \, dx = 0.
• Differentiation and pointwise convergence don't commute. Let f_n(x) = \sin(n^2 x)/n, so f_n → 0 pointwise. But f_n'(x) = n \cos(n^2 x), so f_n'(0) = n → ∞.
An analogous statement holds for differentiation and series summation.
• We say f_n : X → Y converges uniformly to f on X if, for every ε > 0, there is an integer N so that for all n ≥ N,
d_Y(f_n(x), f(x)) < ε
for all x ∈ X. That is, just as in the definition of uniform continuity, we may use the same N for all points. For concreteness, we specialize to X ⊂ R and Y = R with the standard metric.
• Uniform convergence for series also comes in a Cauchy variant. The series \sum u_k converges uniformly on X if and only if, for all ε > 0, there exists an N so that for n > m > N,
|u_{m+1}(x) + \ldots + u_n(x)| < ε
for all x ∈ X.
• Weierstrass M-test: the series \sum u_k converges uniformly on X if there exist real constants M_k so that for all k and x ∈ X,
|u_k(x)| ≤ M_k, \quad \sum_k M_k \text{ converges}.
This follows directly from the previous point, because \sum M_k is a Cauchy series. This condition is stronger than necessary, because each M_k depends on the largest value u_k(x) takes anywhere, but in practice is quite useful; a worked example is given after this list.
• For any δ with 0 < δ < R, \sum_k c_k x^k converges uniformly on [−R + δ, R − δ]. This simply follows from the M-test with M_k = |c_k| (R − δ)^k.
• As a result, the power series \sum_k c_k x^k defines a continuous function f on (−R, R). In particular, this establishes the continuity of various functions such as exp(x), sin(x), and cos(x). The reason that the "up to δ" issue above doesn't matter is that continuity is a local condition, which holds at individual points, while uniform convergence is global. Another way of saying this is that a function is continuous on an arbitrary union of domains where it is continuous, but this doesn't hold for uniform convergence.
• Similarly, the power series \sum_k c_k x^k defines a differentiable function f on (−R, R) which can be differentiated term by term. This takes some technical work, as we must show \sum_k k c_k x^{k−1} has the same radius of convergence, which follows from \lim_{k\to\infty} k^{1/k} = 1.
• Weierstrass’s polynomial approximation theorem states that for any continuous f : [a, b] → R,
there exists a sequence (Pn ) of real polynomials which converges uniformly to f .
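As the worked example of the M-test promised above, the series \sum_k \sin(kx)/k^2 converges uniformly on all of R, since
\left| \frac{\sin(kx)}{k^2} \right| \le \frac{1}{k^2} = M_k, \quad \sum_k \frac{1}{k^2} < ∞,
so its sum is a continuous function of x.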
3 Complex Analysis
3.1 Analytic Functions
Let f (z) = u(x, y) + iv(x, y) where z = x + iy and u and v are real.
• The derivative of a complex function f (z) is defined by the usual limit definition. We say a
complex function is analytic/holomorphic at a point z0 if it is differentiable in a neighborhood
of z0 .
• Taking the limit defining f'(z_0) along the real and imaginary directions and equating the results gives
u_x = v_y, \quad v_x = −u_y,
which are the Cauchy–Riemann equations. These are also a sufficient condition (when the partial derivatives are continuous), as any other directional derivative can be computed by a linear combination of these two. A quick check with f(z) = z^2 is given after this list.
• Assuming that f is twice differentiable, both u and v are harmonic functions, uxx + uyy =
vxx + vyy = 0, by the equality of mixed partials.
• The gradients of u and v are orthogonal, since by the Cauchy–Riemann equations,
∇u · ∇v = u_x v_x + u_y v_y = −u_x u_y + u_x u_y = 0.
In particular, this means that u solves Laplace's equation with the right boundary conditions when conductor surfaces are given by level curves of u, with the field lines given by level curves of v.
• Locally, a complex function differentiable at z_0 satisfies Δf ≈ f'(z_0) Δz. Thus the function looks like a local 'scale and twist' of the complex plane, which provides some intuition. For example, z̄ is not differentiable because it behaves like a 'flip' and twist.
• The mapping z 7→ f (z) takes harmonic functions to harmonic functions as long as f is differen-
tiable with f 0 (z) 6= 0. This is because the harmonic property (‘function value equal to average
of neighbors’) is invariant under rotation and scaling.
• Conformal transformations are maps of the plane which preserve angles; all holomorphic func-
tions with nonzero derivative produce such a transformation.
• A domain is an open, simply connected region in the complex plane. We say a complex function
is analytic in a region if it is analytic in a domain containing that region. If a function is
analytic everywhere, it is called entire.
• Using the formal coordinate transformation from (x, y) to (z, z̄) yields the Wirtinger derivatives,
∂_z = \frac{1}{2}(∂_x − i ∂_y), \quad ∂_{z̄} = \frac{1}{2}(∂_x + i ∂_y).
The Cauchy–Riemann equations are equivalent to
∂_{z̄} f = 0.
Similarly, we say f is antiholomorphic if ∂_z f = 0. The Wirtinger derivatives satisfy a number of intuitive properties, such as ∂_z(z z̄) = z̄.
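As the quick check promised above, take f(z) = z^2 = (x^2 − y^2) + i(2xy), so u = x^2 − y^2 and v = 2xy. Then
u_x = 2x = v_y, \quad v_x = 2y = −u_y,
so f is entire, with f'(z) = u_x + i v_x = 2x + 2iy = 2z. By contrast, for f(z) = z̄ we have u = x and v = −y, so u_x = 1 ≠ v_y = −1, and the Cauchy–Riemann equations fail everywhere.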
As an example, we consider ideal fluid flow.
• The flow of a fluid is described by a velocity field v = (v1 , v2 ). Ideal fluid flow is steady,
nonviscous, incompressible, and irrotational. The latter two conditions translate to ∇ · v =
∇ × v = 0, which in terms of components are
∂x v1 + ∂y v2 = ∂x v2 − ∂y v1 = 0.
We are switching our derivative notation to avoid confusion with the subscripts.
• The zero curl condition can be satisfied automatically by using a velocity potential, v = ∇φ.
It is also useful to define a stream function ψ, so that
v1 = ∂x φ = ∂y ψ, v2 = ∂y φ = −∂x ψ
in which case incompressibility is also automatic.
• Since φ and ψ satisfy the Cauchy-Riemann equations, they can be combined into an analytic
complex velocity potential
Ω(z) = φ(x, y) + iψ(x, y).
• Since the level sets of ψ are orthogonal to those of φ, level sets of the stream function ψ are streamlines. The derivative of Ω is the complex velocity,
Ω'(z) = ∂_x φ + i ∂_x ψ = ∂_x φ − i ∂_y φ = v_1 − i v_2.
The boundary conditions are typically of the form ‘constant velocity at infinity’ (which requires
φ to approach a linear function) and ‘zero velocity normal to an obstacle’ (which requires ψ to
be constant on its surface).
Example. The uniform flow Ω(z) = v_0 e^{−iθ_0} z. The real part is
φ(x, y) = v_0 (x \cos θ_0 + y \sin θ_0),
giving a velocity of v = v_0 (\cos θ_0, \sin θ_0).
Example. Flow past a cylinder. Consider the velocity potential
Ω(z) = v_0 (z + a^2/z), \quad φ = v_0 (r + a^2/r) \cos θ, \quad ψ = v_0 (r − a^2/r) \sin θ.
At infinity, the flow has uniform velocity v0 to the right. Since ψ = 0 on r = a, this potential
describes flow past a cylindrical obstacle. To get intuition for this result, note that φ also serves
as an electric potential in the case of a cylindrical conductor at r = a, in a uniform background
field. We know that the cylinder is polarized, producing a dipole moment, and corresponding dipole
potential \cos θ/r^2 = x/r^3. For the fluid flow there is one less power of r, since the situation is
two-dimensional.
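As a sanity check, the complex velocity here is
Ω'(z) = v_0 \left(1 − \frac{a^2}{z^2}\right),
which tends to v_0 at infinity and vanishes at z = ±a, the stagnation points where the flow meets the cylinder head-on.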
Example. Using a conformal transformation. The complex potential Ω(z) = z^2 has stream function 2xy, and hence xy = 0 is a streamline; hence this potential describes flow at a rectangular corner. An alternate solution is to apply a conformal transformation to the boundary condition. If we define z = w^{1/2}, with z = x + iy and w = u + iv, then the boundary x = 0, y = 0 is mapped to v = 0. This problem is solved by the uniform flow Ω(w) = w, and transforming back gives the result.
3.2 Multivalued Functions
• Consider w = z^{1/2}, defined to be the 'inverse' of z = w^2. For every z ≠ 0, there are two values of w, which are opposites of each other. In polar coordinates,
w = r^{1/2} e^{i(θ_p + 2πn)/2},
where θ_p is restricted to lie in [0, 2π) and n = 0, 1 indexes the two possible values. The surprise is that if we go in a loop around the origin, we can move from n = 0 to n = 1, and vice versa!
• We say z = 0 is a branch point; a loop traversed around a branch point can change the value of
a multivalued function. Similarly, the point z = ∞ is a branch point, as can be seen by taking
z = 1/t and going around the point t = 0.
• A multivalued function can be rendered single-valued and continuous in a subset of the plane
by choosing a branch. Often this is done by removing a curve, called a ‘branch cut’, from the
plane. In the case above, the branch cut is arbitrary, but must join the branch points z = 0
and z = ∞. This prevents curves from going around either of the branch points. (Generally,
but not always, branch cuts connect pairs of branch points.)
• Using stereographic projection, the branch points for w = z 1/2 are the North and South poles,
and the branch cut connects them.
• Next consider the logarithm. Writing z = r e^{iθ_p}, we have
w = \log z = \log r + i(θ_p + 2πn),
where n ∈ Z, and we take the logarithm of a real number to be single-valued. This function has infinitely many branches, with a branch point at z = 0. It also has a branch point at z = ∞, by considering \log 1/z = −\log z.
• Writing
w = \log z, \quad z = x + iy, \quad w = u + iv,
we have
e^{2u} = x^2 + y^2, \quad \tan v = \frac{y}{x}.
The first can be easily inverted to yield u = \log(x^2 + y^2)/2, which is single-valued because the real log is, but the second is more subtle. For the inverse tangent of a real number, we customarily take the branch so that the range is (−π/2, π/2). Then to maintain continuity of v, we set
v = \tan^{−1}(y/x) + C_i, \quad C_1 = 0, \ C_2 = C_3 = π, \ C_4 = 2π
in the ith quadrant. Then the branch cut is along the positive x axis. Finally, we differentiate, finding
\frac{d}{dz} \log z = u_x + i v_x = \frac{x − iy}{x^2 + y^2} = \frac{1}{z}
as expected.
• Next consider w = \cos^{−1} z, which satisfies
\cos w = z = \frac{e^{iw} + e^{−iw}}{2},
and solving this as a quadratic in e^{iw} yields
w = −i \log\left(z + (z^2 − 1)^{1/2}\right).
The function thus has two sources of multivaluedness. We have branch points at z = ±1 due to
the square root. There are no branch points due to the logarithm at finite z, because its argument
is never zero, but there is a branch point at infinity (as can be seen by substituting t = 1/z).
Intuitively, these branch points come from the fact that the cosine of x is the same as the cosine of
2π − x (for the square root) and the cosine of x + 2π (for the logarithm).
• Next consider w = ((z − a)(z − b))^{1/2}. There are branch points at z = a and z = b, though one can check by setting t = 1/z that there is no branch point at infinity. (Intuitively, going around the 'point at infinity' is the same as going around both finite branch points, each of which contribute a phase of π.) To explicitly set a branch, define
z − b = r_1 e^{iθ_1}, \quad z − a = r_2 e^{iθ_2}
so that w ∝ e^{i(θ_1 + θ_2)/2}. A branch is thus specified by a choice of the ranges of the θ_i. For example, we may choose to restrict 0 ≤ θ_i < 2π, which gives a branch cut between a and b.
An alternative choice can send the branch cut through the point at infinity, which is more easily visualized using stereographic projection. Similar reasoning can be used to handle any function made of products of factors (z − x_i)^{k_i}.
Note. Branches can be visualized geometrically as sheets of Riemann surfaces, which are generated
by gluing copies of the complex plane together along branch cuts. The logarithm has an infinite
‘spiral staircase’ of such sheets, with each winding about the origin bringing us to the next.
Example. More flows. The potential Ω(z) = k log(z) with k real describes a source or sink at the
origin. Its derivative 1/z describes a dipole, i.e. a source and sink immediately adjacent.
By contrast, the potential Ω(z) = ik log(z) describes circulation about the origin. Here, the
multivaluedness of log(z) is crucial, because if the velocity potential were single-valued, then it
would be impossible to have net circulation along any path; instead going around the origin takes
us to another branch of the logarithm. (In multivariable calculus, we say that zero curl does not
imply that a function is a gradient, if the domain is not simply connected. Here, we can retain the
gradient function at the cost of making it multivalued.)
3.3 Contour Integration
• A contour C in the complex plane can be parametrized as z(t) for t ∈ [a, b]. We will choose to work with piecewise smooth contours, i.e. those where z'(t) is piecewise continuous.
• For convenience, we may sometimes require that C be simple, i.e. that it does not intersect itself. This ensures that C winds about every point at most once.
• The contour integral of f along C is
\int_C f(z) \, dz = \int_a^b f(z(t)) z'(t) \, dt.
All the usual properties of integration apply; in particular the result is independent of parametrization. In the piecewise smooth case, we simply define the integral by splitting C into smooth pieces.
• The ML inequality states that the magnitude of any contour integral is bounded by the product of the supremum of |f(z)| on the contour and the length of the contour.
• Cauchy's theorem: if f is analytic (with continuous f') in a simply connected domain D, then \oint_C f(z) \, dz = 0 for any closed contour C in D. To prove this, write dz = dx + i \, dy and expand the integrand.
We can then apply Green's theorem to the real and imaginary parts. Applying the Cauchy–Riemann equations, the 'curl' is zero, giving the result. We need the simply connected hypothesis to ensure that C does not contain points outside of D.
• Goursat's theorem: Cauchy's theorem holds without the assumption that f' is continuous.
Proof sketch: we break the integral down into the sum of integrals over contours with arbitrarily
small size. By Taylor’s theorem, the function can be expanded as the sum of a constant, linear,
and sublinear term within each small contour. The integrals of the first two vanish, while the
contributions of the sublinear terms go to zero in the limit of small contours.
• As a result, every analytic f in a simply connected domain has a primitive, i.e. a function F with F' = f, with
\int_C f(z) \, dz = F(b) − F(a)
for any contour C from a to b. We can construct the function F by simply choosing any contour connecting a to b.
Example. We integrate f (z) = 1/z over an arbitrary closed contour which winds around the origin
once. (Equivalently, any simple closed contour containing the origin.) Since f is analytic everywhere
besides the origin, we may freely deform the contour so that it becomes a small circle of radius r
about the origin. Then
\int_C \frac{dz}{z} = \int_0^{2π} \frac{i r e^{iθ}}{r e^{iθ}} \, dθ = 2πi.
This result can be thought of as due to having a multivalued primitive F(z) = \log z, or due to the hole at the origin. The analogous calculation for 1/z^n gives zero for n ≠ 1, as there are single-valued primitives proportional to 1/z^{n−1}.
Example. Complex fluid flow again. The circulation along a curve and the flow out of a curve are
Γ = \int_C v_x \, dx + v_y \, dy, \quad F = \int_C v_x \, dy − v_y \, dx,
which combine into the single complex integral Γ + iF = \int_C Ω'(z) \, dz.
Example. Let P (z) be a polynomial with degree n and n simple roots, and let C be a simple
closed contour. We wish to evaluate
I = \frac{1}{2πi} \oint_C \frac{P'(z)}{P(z)} \, dz.
First note that if P(z) = A \prod_i (z − a_i), then
\frac{P'(z)}{P(z)} = \sum_i \frac{1}{z − a_i}.
Every root is thus a simple pole, so the integral is simply the number of roots in C. One way to
think of this is that the integrand is really d(log P ), and here log P has logarithmic branch points
at every root, each of which gives a change of 2πi.
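For a concrete check, take P(z) = z^2 − 1 and C the circle |z| = 2. Then
\frac{1}{2πi} \oint_C \frac{2z}{z^2 − 1} \, dz = \frac{1}{2πi} \oint_C \left( \frac{1}{z − 1} + \frac{1}{z + 1} \right) dz = 2,
counting both roots z = ±1 inside C.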
Note. A similar contour-deformation argument shows that the standard Gaussian integral formula holds for any complex σ^2, as long as the integral converges.
• Cauchy's integral formula states that if f(z) is analytic in and on a simple closed contour C,
f(z) = \frac{1}{2πi} \oint_C \frac{f(ξ)}{ξ − z} \, dξ.
Then the value of an analytic function is determined by the values of points around it. The proof is to deform the contour to a small circle about ξ = z, where the pole gives f(z). The error term goes to zero by continuity and the ML inequality.
• As a corollary, if f(z) is analytic in and on C, then all of its derivatives exist, with
f^{(k)}(z) = \frac{k!}{2πi} \oint_C \frac{f(ξ)}{(ξ − z)^{k+1}} \, dξ.
For k = 1, expand the difference quotient (f(z + h) − f(z))/h using the integral formula; the remainder R relative to the k = 1 integral is proportional to h. For |ξ − z| > δ and |h| < δ/2, the integral is bounded by ML, so as h goes to zero, R goes to zero as well. This also serves as a proof that f'(z) exists. The cases k > 1 are handled inductively by similar reasoning.
• Intuitively, if we represent a complex function as a Taylor series, the general formulas above
simply pluck out individual terms of this series by shifting them over to 1/z.
• Morera: if f(z) is continuous in a domain D, and all closed contour integrals of f are zero, then f(z) is analytic in D.
Proof: we may construct a primitive F(z) by integration, with F'(z) = f(z). Since F is analytic and hence automatically twice-differentiable, f is analytic.
• The mean value property: the value of an analytic function at z_0 is the average of its values on a circle centered at z_0. Intuitively, this is because the components of f are harmonic functions. It also follows directly from Cauchy's integral formula; parametrizing the boundary as z = z_0 + re^{iθ},
f(z_0) = \frac{1}{2πi} \oint_C \frac{f(z)}{z − z_0} \, dz = \frac{1}{2πi} \int_0^{2π} \frac{f(z_0 + re^{iθ})}{re^{iθ}} \, i re^{iθ} \, dθ = \frac{1}{2π} \int_0^{2π} f(z_0 + re^{iθ}) \, dθ.
3.4 Laurent Series
First, recall that a power series \sum_n a_n z^n with α = \limsup_n |a_n|^{1/n} converges for |z| < R = 1/α and diverges for |z| > R. It is uniformly convergent on |z| ≤ R − δ for any δ > 0, so we may perform term-by-term integration and differentiation. For example, the power series
\sum_{n=1}^{\infty} n a_n z^{n−1}
has the same radius of convergence, and is equal to f'(z) there.
• We would like to show that a function’s Taylor series converges to the function itself. For an
infinitely-differentiable real function, Taylor’s theorem states that the error of omitting the nth
and higher terms is bounded as
\text{error at } x \le \max_{x_0 \in [0, x]} \frac{|f^{(n)}(x_0)|}{n!} |x|^n.
One can show this error goes to zero as n goes to infinity for common real functions, such as
the exponential.
For complex analytic functions, we can instead use the Cauchy integral formula: taking the contour to be a circle of radius r_2 about the origin and |z| = r_1 < r_2, we expand
\frac{1}{ξ − z} = \sum_{n=0}^{\infty} \frac{z^n}{ξ^{n+1}},
where the geometric series is convergent since r_1 < r_2. In particular, it is uniformly convergent, so we can exchange the sum and the integral, giving
f(z) = \sum_{n=0}^{\infty} \left( \frac{1}{2πi} \oint \frac{f(ξ)}{ξ^{n+1}} \, dξ \right) z^n = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n.
• Therefore, we say a function is analytic at a point if it admits a Taylor series about that point
with positive radius of convergence, and is equal to its Taylor series in a neighborhood of that
point. We have shown that a complex differentiable function is automatically analytic and thus
use the terms interchangeably.
– More subtly, e^{−1/z^2} has a singularity at z = 0, since it is not equal to its Taylor series in any neighborhood of z = 0.
– The function 1/ sin(π/z) has singularities at z = 0 and z = 1/n for integer n, and hence
the singularity at z = 0 is not isolated.
– As a real function, the singularity at x = 0 of log x is not isolated since log x is not defined
for x < 0. As a single-valued complex function, the same holds because log z requires a
branch cut starting at z = 0.
• More generally, suppose that f (z) is complex differentiable in a region R, z0 ∈ R, and the disk
of radius r about z0 is contained in R. Then f converges to its Taylor series about z0 inside
this disk. The proof of this statement is the same as above, just for general z0 .
• The zeroes of an analytic function, real or complex, are isolated. We simply expand in a Taylor
series about the zero at z = z0 and pull out factors of z − z0 until the series is nonzero at z0 .
The remaining series is nonzero in a neighborhood of z0 by continuity.
• Suppose f(z) is analytic on the annulus A = {r_1 < |z| < r_2}. Then we claim f(z) may be written as a Laurent series
f(z) = \sum_{n=1}^{\infty} \frac{b_n}{z^n} + \sum_{n=0}^{\infty} a_n z^n,
where the two parts are called the singular/principal and analytic/regular parts, respectively, and converge to analytic functions for |z| > r_1 and |z| < r_2, respectively.
• The proof is similar to our earlier proof for Taylor series. Let z ∈ A and consider the contour
consisting of a counterclockwise circle C1 of radius greater than |z| and a clockwise circle C2 of
radius less than |z|, both lying within the annulus. By Cauchy’s integral formula,
$$f(z) = \frac{1}{2\pi i} \oint_{C_1 - C_2} \frac{f(\xi)}{\xi - z}\, d\xi = \frac{1}{2\pi i} \oint_{C_1} \sum_{n=0}^{\infty} \frac{f(\xi)}{\xi^{n+1}}\, z^n\, d\xi - \frac{1}{2\pi i} \oint_{C_2} \sum_{n=0}^{\infty} \frac{f(\xi)}{z^{n+1}}\, \xi^n\, d\xi$$
where both geometric series are convergent. These give the analytic and singular pieces of the
Laurent series, respectively.
• From this proof we also read off integral expressions for the coefficients,
$$a_n = \frac{1}{2\pi i} \oint \frac{f(\xi)}{\xi^{n+1}}\, d\xi, \qquad b_n = \frac{1}{2\pi i} \oint f(\xi)\, \xi^{n-1}\, d\xi.$$
Unlike for Taylor series, none of these coefficients can be expressed in terms of derivatives of f .
• In practice, we use series expansions and algebraic manipulations to determine Laurent series,
though we must use series that converge in the desired annulus.
• Suppose f (z) has an isolated singularity at z0 , so it has a Laurent series expansion about z0 .
– If all of the bn are zero, then z0 is a removable singularity. We may define f (z0) = a0 to
make f analytic at z0. Note that this is guaranteed if f is bounded near z0, by the ML inequality.
– If a finite number of the bn are nonzero, we say z0 is a finite pole of f (z). If bk is the highest
nonzero coefficient, the pole has order k. A finite pole with order 1 is a simple pole, or
simply a pole. The residue Res(f, z0 ) of a finite pole is b1 .
– Finite poles are nice, because functions with only finite poles can be made analytic by
multiplying them with polynomials.
– If an infinite number of the bn are nonzero, z0 is an essential singularity. For example, z = 0
is an essential singularity of $e^{-1/z^2}$. Essential singularities behave very badly; Picard's
theorem states that a function takes on all possible values infinitely many times, with at most
one exception, in any neighborhood of z0.
– A function that is analytic on some region with the exception of a set of poles of finite
order is called meromorphic.
– Note that all of these definitions apply only to isolated singularities. For example, the logarithm
has a branch cut starting at z = 0, so the order of this singularity is not defined.
Example. The function f (z) = 1/(z(z − 1)) has poles at z = 0 and z = 1, and hence has a Laurent
series about z = 0 for 0 < |z| < 1 and 1 < |z| < ∞. In the first case, the result can be found by
geometric series,
$$f(z) = -\frac{1}{z}\, \frac{1}{1 - z} = -\frac{1}{z}\left(1 + z + z^2 + \cdots\right).$$
We see that the residue of the pole at z = 0 is −1. In the second case, this series expansion does
not converge; we instead expand in 1/z for the completely different series
$$f(z) = \frac{1}{z^2(1 - 1/z)} = \frac{1}{z^2}\left(1 + \frac{1}{z} + \frac{1}{z^2} + \cdots\right).$$
In particular, note that there is no 1/z term because the residues of the two (simple) poles cancel
out, as can be seen by partial fractions; we cannot use this Laurent series to compute the residue
of the z = 0 pole.
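To make the cancellation explicit, note the partial fraction decomposition

$$\frac{1}{z(z-1)} = \frac{1}{z-1} - \frac{1}{z},$$

so the residues at z = 1 and z = 0 are +1 and −1; for |z| > 1 both terms expand in powers of 1/z, and the two 1/z contributions cancel.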
Example. Going to the complex plane gives insight into why some real Taylor series fail. First,
consider f (x) = 1/(1 + x2 ) about x = 0. This Taylor series breaks down for |x| ≥ 1 even though
the function itself is not singular at all. This is explained by the poles at z = ±i in the complex
plane, which set the radius of convergence.
As another example, $e^{-1/x^2}$ does not appear to be pathological on the real line at first glance.
One can see that it is not analytic because its high-order derivatives blow up, but an easier way is
to note that when approached along the imaginary axis, the function becomes $e^{1/x^2}$, which diverges
very severely at x = 0.
Next, we give some methods for computing residues, all proven with Laurent series.
• If g(z) has a simple zero at z0, then 1/g(z) has a simple pole at z0 with residue 1/g'(z0).
• In practice, we can find the residue of a function defined from functions with Laurent series
expansions by taking the Laurent series of everything, expanding, and finding the 1/z term.
• Suppose that f is analytic in a region R except for a set of isolated singularities. Then if C is
a closed curve in R that doesn't go through any of the singularities,

$$\oint_C f(z)\, dz = 2\pi i \sum \left(\text{residues of } f \text{ inside } C, \text{ counted with multiplicity}\right).$$
This is the residue theorem, and it can be shown by deforming the contour to a set of small
circles about each singularity, and expanding in Laurent series about each one and using the
Cauchy integral formula.
Example. Find the residue at z = 0 of f (z) = sinh(z)e^z/z^5. The answer is the z^4 term of the
Laurent series of sinh(z)e^z, and

$$\sinh(z)e^z = \left(z + \frac{z^3}{3!} + \cdots\right)\left(1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots\right) = \cdots + \left(\frac{1}{3!} + \frac{1}{3!}\right)z^4 + \cdots$$

giving the residue 1/3. As a check, $\sinh(z)e^z = (e^{2z} - 1)/2$, whose z^4 coefficient is $2^4/(2 \cdot 4!) = 1/3$.
Example. The function cot(z) = cos(z)/ sin(z) has a simple pole of residue 1 at z = nπ for all
integers n. To see this, note that sin(z) has simple zeroes at z = nπ and its derivative is cos(z), so
1/ sin(z) has residues of 1/ cos(nπ). Multiplying by cos(z), which is analytic everywhere, cancels
these factors, giving a residue of 1.
Example. Compute the contour integral along the unit circle of z 2 sin(1/z). There is an essential
singularity at z = 0, but this doesn’t change the computation. The Laurent series for sin(1/z) is
$$\sin(1/z) = \frac{1}{z} - \frac{1}{3!}\frac{1}{z^3} + \cdots$$
which gives a residue of −1/6.
Example. The residue at infinity. Suppose that f is analytic in C with a finite number of singular-
ities, and a curve C encloses every singularity once. Then the contour integral along C is the sum
of all the residues. On the other hand, we can formally think of the interior of the contour as the
exterior; then we get the same result if we postulate a pole at infinity with residue
$$\mathrm{Res}(f, \infty) = -\frac{1}{2\pi i} \oint_C f(z)\, dz.$$
To compute this quantity, substitute z = 1/w to find
$$\mathrm{Res}(f, \infty) = \frac{1}{2\pi i} \oint_C \frac{f(1/w)}{w^2}\, dw$$
where C is now negatively oriented. Now, f (1/w) has no poles inside the curve C, so the only
possible pole is at w = 0. Then
$$\mathrm{Res}(f, \infty) = -\mathrm{Res}\left(\frac{f(1/w)}{w^2},\ 0\right)$$

which may be much easier to compute. In this language, z has a simple pole at infinity, while
e^z has an essential singularity at infinity.
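As a quick consistency check, take f (z) = 1/z, whose only finite residue is Res(f, 0) = 1. Then

$$\mathrm{Res}(f, \infty) = -\mathrm{Res}\left(\frac{f(1/w)}{w^2},\ 0\right) = -\mathrm{Res}\left(\frac{1}{w},\ 0\right) = -1,$$

consistent with the fact that the residues over the extended plane, including the one at infinity, sum to zero.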
• In order to express real integrals over the real line in terms of contour integrals, we will have to
close the contour. This is easy if the decay is faster than 1/|z| in either the upper or lower-half
plane by the ML inequality.
• Another common situation is when the function is oscillatory, e.g. it is of the form f (z)e^{ikz}. If
|f (z)| does not decay faster than 1/|z|, the ML inequality does not suffice. However, since the
oscillation on the real axis translates to decay in the imaginary direction, if we use a square
contour bounded by Re z = ±L, the vertical sides are bounded by |f (L)|/k and the top side is
exponentially small, so the contributions vanish as L → ∞ as desired.
Example. We compute

$$I = \int_{-\infty}^{\infty} \frac{dx}{x^4 + 1}.$$
We close the contour in the upper-half plane; the decay is O(1/|z|4 ) so the semicircle does not
contribute. The two poles are at z1 = eiπ/4 and z2 = e3iπ/4 . An easy way to compute the residues
is with L'Hopital's rule,

$$\mathrm{Res}(f, z_1) = \lim_{z \to z_1} \frac{z - z_1}{1 + z^4} = \lim_{z \to z_1} \frac{1}{4z^3} = \frac{e^{-3i\pi/4}}{4}, \qquad \mathrm{Res}(f, z_2) = \frac{e^{-i\pi/4}}{4}, \qquad I = \frac{\pi}{\sqrt{2}}.$$
Example. For b > 0, we compute

$$I = \int_{-\infty}^{\infty} \frac{\cos(x)}{x^2 + b^2}\, dx.$$
For convenience we replace cos(x) with e^{ix} and take the real part at the end. Now, the function
decays faster than 1/|z| in the upper-half plane, so we close the contour there. The contour contains
the pole at z = ib, which has residue e^{-b}/2ib, giving I = πe^{-b}/b.
Example. Integrals over angles can be replaced with contour integrals over the unit circle. We let
$$z = e^{i\theta}, \qquad dz = iz\, d\theta, \qquad \cos\theta = \frac{z + 1/z}{2}, \qquad \sin\theta = \frac{z - 1/z}{2i}.$$
For example, we can compute
$$I = \int_0^{2\pi} \frac{d\theta}{1 + a^2 - 2a\cos\theta}, \qquad |a| \ne 1.$$
It is clear this method works for any trigonometric integral over [0, 2π).
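For this particular integral, a sketch of the computation: the substitution above gives

$$I = \oint_{|z|=1} \frac{dz/(iz)}{1 + a^2 - a(z + 1/z)} = \oint_{|z|=1} \frac{dz}{-ia(z - a)(z - 1/a)}.$$

For |a| < 1 only the pole at z = a lies inside, with residue $\frac{1}{-ia(a - 1/a)} = \frac{1}{i(1 - a^2)}$, giving I = 2π/(1 − a²); for |a| > 1 the pole at z = 1/a gives I = 2π/(a² − 1). In both cases I = 2π/|1 − a²|.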
We will place the branch cut on the positive real axis, so that for z = re^{iθ} we have 0 ≤ θ < 2π.
The desired integral I is the integral over C1, while the integrals over Cr and CR go to zero. The
integral over C2 is on the other end of the branch cut, and hence is −e^{2πi/3} I. Finally, including
the contributions of the two poles gives I = π/√3.
Example. The Cauchy principal value. We compute
$$I = \int_{-\infty}^{\infty} \frac{\sin(x)}{x}\, dx.$$
This is the imaginary part of the contour integral
$$I' = \int_C \frac{e^{iz}}{z}\, dz$$
where the contour along the real line is closed by a semicircle. The integrand blows up along the
contour, since it goes through a pole; to fix this, we define the principal value of the integral I′
to be the limit lim_{r→0} I′(r), where a circle of radius r about the origin is deleted from the contour.
This is equal to I because the integrand sin(x)/x is not singular at the origin; in more general cases
where the original integrand is singular, the value of the integral is defined as the principal value.
Now consider the contour above. In the limit r → 0, we have I′ = πi because it picks up "half of
the pole", giving I = π. More generally, if the naive contour "slices" through a pole, the principal
value picks up i times the residue times the angle subtended.
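A minimal sketch of the "half of the pole" computation: parametrize the small indentation over the pole as z = re^{iθ} with θ running from π to 0. Then

$$\int_{C_r} \frac{e^{iz}}{z}\, dz = \int_{\pi}^{0} \frac{e^{ire^{i\theta}}}{re^{i\theta}}\, ire^{i\theta}\, d\theta \xrightarrow{r \to 0} -i\pi.$$

Since the full closed contour encloses no poles, the principal value must cancel this contribution, giving I′ = iπ and hence I = π.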
Note. The idea of a principal value works for both real and complex integrals. In the case of a real
integral, we delete a small segment centered about the divergence. The principal value also exists
for integrals with bounds at ±∞, by setting the bounds to −R and R and taking R → ∞.
• A conformal map on the complex plane f (z) is a map so that the tangent vectors at any point z0
are mapped to the tangent vectors at f (z0 ) by a nonzero scaling and proper rotation. Informally,
this means that conformal maps preserve angles.
• As we’ve seen, f (z) is automatically conformal if it is holomorphic with nonzero derivative; the
scaling and rotation factor is f 0 (z0 ).
• The Riemann mapping theorem states that if a region A is simply connected, and not the entire
complex plane, then there exists a bijective conformal map between A and the unit disk; we
say the regions are conformally equivalent.
• The proof is rather technical, but it is useful to note a few specific features.
• A useful set of conformal transformations are the fractional linear transformations, or Mobius
transformations

$$T(z) = \frac{az + b}{cz + d}, \qquad ad - bc \ne 0.$$
Note that when ad − bc = 0, then T (z) is constant. Mobius transformations can also be taken
to act on the extended complex plane, with

$$T(\infty) = \frac{a}{c}, \qquad T(-d/c) = \infty.$$
They are bijective on the extended complex plane, and conformal everywhere except z = −d/c.
• When c = 0, we get compositions of scalings, rotations, and translations. The map T (z) = 1/z
flips circles inside and outside of the unit circle. As another example,

$$T(z) = \frac{z - i}{z + i}$$
maps the real axis to the unit circle, and hence maps the upper half-plane to the unit disk.
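To verify this last claim, note that any real x is equidistant from i and −i, so

$$|T(x)| = \frac{|x - i|}{|x + i|} = 1 \quad (x \in \mathbb{R}), \qquad T(i) = 0,$$

and hence the real axis lands on the unit circle, while the upper half-plane, which contains i, fills the interior of the disk.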
• A very useful fact is that Mobius transformations can be identified with matrices,

$$T(z) = \frac{az + b}{cz + d} \quad \leftrightarrow \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

under which composition of Mobius transformations corresponds to matrix multiplication;
matrices differing by an overall scaling give the same transformation.
• The subset of Mobius transformations that map the upper half-plane to itself turn out to be
the ones where a, b, c, and d are all real, and ad − bc = 1. Then the group of conformal
automorphisms of the upper half-plane contains PSL2(R).
• In fact, these are all of the conformal automorphisms of the upper half-plane. To prove this,
one typically shows using the Schwarz lemma that the conformal automorphisms of the disk
take the form

$$T(z) = \lambda\, \frac{z - a}{\bar{a}z - 1}, \qquad |\lambda| = 1, \quad |a| < 1$$
and then notes that the upper half-plane is conformally equivalent to the disk.
• Given any three distinct points (z1, z2, z3), there exists a Mobius transformation that maps
them to any three distinct points (w1, w2, w3). To see this, note that we can map (z1, z2, z3)
to (0, 1, ∞) by

$$T(z) = \frac{(z - z_1)(z_2 - z_3)}{(z - z_3)(z_2 - z_1)}$$

and then compose with the inverse of the analogous map taking (w1, w2, w3) to (0, 1, ∞).
Note. A little geometry. The reflection of a point in a line is the unique point so that any generalized
circle that goes through both points intersects the line perpendicularly. We define the reflection of
a point in a generalized circle in the same way. To prove that this reflection is unique, note that
since Mobius transformations preserve generalized circles and angles, they preserve the reflection
property; however we can use a Mobius transformation to map a given circle to a line, then use
uniqueness of reflection in a line.
Reflection in a circle is called inversion in the context of Euclidean geometry. Our "inversion"
map z ↦ 1/z is close, but it actually corresponds to an inversion about the unit circle followed by
a reflection about the real axis. The inversion alone would be $z \mapsto 1/\bar{z}$.
Example. Suppose two circles C1 and C2 do not intersect; we would like to construct a conformal
mapping that makes them concentric. To do this, let z1 and z2 be reflections of each other in both
circles – it is easier to see such points exist by mapping C1 to a line and then mapping back. Now,
by a conformal transformation we can map z1 to zero and z2 to infinity, which means both image
circles must be centered at zero.
Example. Find a map from the upper half-plane with a semicircle removed to a quarter-plane.
We will use a Mobius transformation. The trick is to look at how the boundary must be mapped.
We have right angles at A and C, but only one right angle in the image; we can achieve this by
mapping A to infinity and C to zero, so
$$z \mapsto \zeta = \frac{z - 1}{z + 1}.$$
To verify the boundary is correct, we note that ABC and CDA are still generalized circles after
the mapping, and verify that B and D are mapped into the imaginary and real axes, respectively.
More generally, if we need to change the angle at the origin by a factor α, we can compose with
the power map z ↦ z^α.
Example. Map the upper half plane to itself, permuting the points (0, 1, ∞). We must use Mobius
maps with real coefficients. Since orientation is preserved, we can only perform even permutations.
The answers are

$$\zeta = \frac{1}{1 - z}, \qquad (0, 1, \infty) \mapsto (1, \infty, 0)$$

and

$$\zeta = \frac{z - 1}{z}, \qquad (0, 1, \infty) \mapsto (\infty, 0, 1).$$
z
Example. The Dirichlet problem is to find a harmonic function on a region A given specified values
on the boundary of A. For example, let A be the unit disk with boundary condition
$$u(e^{i\theta}) = \begin{cases} 1 & 0 < \theta < \pi, \\ 0 & \pi < \theta < 2\pi. \end{cases}$$
The problem can be solved by conformal mapping. We apply the inverse of T (z) = (z − i)/(z + i),
which maps the real axis to the unit circle. Then A maps to the upper half-plane with boundary
condition u(x) = θ(−x), and an explicit solution is u(x, y) = arg(z)/π = Im(log(z))/π.
More generally, consider a piecewise constant boundary condition u(eiθ ). Then the conformally
transformed solution is a sum of pieces of the form log(z − x0 ). An arbitrary boundary condition
translates to a weighted integral of log(z − x) over real x.
Example. The general case of flow around a circle. Suppose f (z) is a complex velocity potential.
Singularities of the potential correspond to sources or vortices. If there are no singularities for
|z| < R, then the Milne-Thomson circle theorem states that

$$g(z) = f(z) + \bar{f}(R^2/z), \qquad \bar{f}(z) \equiv \overline{f(\bar{z})}$$

is a potential for a flow with a streamline on |z| = R and the same singularities; it is what the
potential would be if we introduced a circular obstacle but kept everything else the same. We've
already seen the specific example of uniform flow around a circle, where f (z) = z.
• Previously, we have restricted to simple closed curves, as these wind about any point at most
once. However, we may now define the winding number or index

$$\mathrm{Ind}(\gamma, z_0) = \frac{1}{2\pi i} \oint_\gamma \frac{dz}{z - z_0}$$

for any closed curve γ that does not pass through z0. That this counts windings follows from
Cauchy's integral theorem; intuitively, the integrand is d(log(z − z0)) and hence counts the
number of windings by the net phase change.
• For any integer power f (z) = z^n, we have

$$\oint_\gamma \frac{f'(z)}{f(z)}\, dz = 2\pi i n \quad \text{when} \quad \mathrm{Ind}(\gamma, 0) = 1.$$
This is because the integrand is df /f , so it counts the winding number of f about the origin
along the curve. Moreover, (f g)′/(f g) = f ′/f + g′/g, so other zeroes or poles contribute
additively.
• Formalizing this result, for a meromorphic function f and a simple closed curve γ not going
through any of the poles or zeroes, we have the argument principle

$$\oint_\gamma \frac{f'(z)}{f(z)}\, dz = 2\pi i\, (\text{zeroes} - \text{poles}) = 2\pi i\, \mathrm{Ind}(f \circ \gamma, 0)$$

where zeroes and poles are counted with multiplicity.
• Rouche’s theorem states that for meromorphic functions f and h and a simple closed curve γ
not going through any of their poles, if |h| < |f | everywhere on γ, then
Ind(f ◦ γ, 0) = Ind((f + h) ◦ γ, 0).
Intuitively, this follows from the picture of a ‘dog on a short leash’ held by a person walking
around a tree. It can be shown using the argument principle; interpolating between f and f + h,
the integral varies continuously, so the index must stay the same.
• A useful corollary of Rouche's theorem is the case of holomorphic f and h, which gives

$$\text{zeroes of } f \text{ in } \gamma = \text{zeroes of } f + h \text{ in } \gamma.$$

For example, suppose we wish to show that z^5 + 3z + 1 has all five of its zeroes within |z| = 2;
see the estimate below.
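A sketch of this standard estimate: take f (z) = z^5 and h(z) = 3z + 1. On the circle |z| = 2,

$$|f(z)| = 32 > 7 \ge |3z + 1| = |h(z)|,$$

so by Rouche's theorem, z^5 + 3z + 1 has as many zeroes inside |z| = 2 as z^5 does, namely five.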
• This same reasoning provides a different proof of the fundamental theorem of algebra. We let
f (z) ∝ z n be the higher-order term in the polynomial and let h be the rest. Then within a
sufficiently large circle, f + h must contain n zeroes.
• As a corollary of the fact that the zeroes of analytic functions are isolated, if f and g are
holomorphic on a connected region R and agree on a set of points with a limit point in R, then
they are equal. An analytic continuation of a real function is a holomorphic function that agrees
with it on the real axis; this result ensures that analytic continuation is unique, at least locally.
• One must be more careful globally. For example, consider the two branches of the logarithm
with branch cuts along the positive and negative real axes, respectively. The two functions agree
in the first quadrant, but we cannot conclude they agree in the fourth quadrant, because the
region where they are both defined is the complex plane minus the real axis, which is not connected.
• These global issues are addressed by the monodromy theorem, which states that analytic
continuation is unique (i.e. independent of path) if the domain we use is simply connected. This
does not hold for the logarithm, because it is nonanalytic at the origin.
• As another example, the factorial function doesn’t have a unique analytic continuation, because
the set of positive integers doesn’t have a limit point. But the gamma function, defined as an
integral expression for positive real arguments, does have a unique analytic continuation. (This
statement is sometimes mangled to “the gamma function is the unique analytic continuation
of the factorial function”, which is incorrect.)
• Consider a Taylor series with radius of convergence R. This defines a holomorphic function
within a disk of radius R and hence can be analytically continued, e.g. by taking the Taylor
series about a different point in the disk.
• As an example where this fails, consider $f(z) = z + z^2 + z^4 + z^8 + \cdots = \sum_n z^{2^n}$, which
has radius of convergence 1. The function satisfies the recurrence relation f (z) = z + f (z^2),
which implies that f (1) is divergent. By repeatedly applying this relation, we see that f (z) is
divergent if $z^{2^n} = 1$ for some n, so the divergences are dense on the boundary of the unit
disk. These divergences form a 'natural boundary' beyond which analytic continuation is not
possible.
4 Linear Algebra
4.1 Exact Sequences
In this section, we rewrite basic linear algebra results using exact sequences. For simplicity, we only
work with finite-dimensional vector spaces.
Consider a sequence of vector spaces and linear maps

$$\cdots \to V_{i-1} \xrightarrow{\varphi_{i-1}} V_i \xrightarrow{\varphi_i} V_{i+1} \to \cdots.$$

We say the sequence is exact at Vi if im ϕi−1 = ker ϕi. The general intuition is that this means Vi
is 'made up' of its neighbors Vi−1 and Vi+1.
• We write 0 for the zero-dimensional vector space. For any other vector space V , there is only
one possible linear map from V to 0, or from 0 to V . For example, exactness of the sequence
$0 \to V \xrightarrow{\varphi} W \to 0$ requires ϕ to be an isomorphism.
• Given a short exact sequence $0 \to U \to V \xrightarrow{T} W \to 0$, there exists a linear map
S : W → V so that T ◦ S = 1. We say the exact sequence splits, and that S is a section of T .
To see why S exists, take any basis {fi} of W . Then there exist ei so that T (ei) = fi, and we
simply define S(fi) = ei.
Then we may decompose

$$V = \ker T \oplus S(W)$$

which is a refinement of the rank-nullity theorem; this makes it clear exactly how V is determined
by its neighbors in the short exact sequence. Note that we always have dim V = dim U + dim W ,
but by using the splitting, we get a direct decomposition of V itself.
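A one-line way to see the decomposition, assuming the section S from the previous bullet: any v ∈ V splits as

$$v = \underbrace{\big(v - S(Tv)\big)}_{\in\, \ker T} + \underbrace{S(Tv)}_{\in\, S(W)},$$

where the first piece lies in ker T since T(v − S(Tv)) = Tv − Tv = 0, and the two pieces overlap only at zero.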
Example. Exact sequences and chain complexes. Consider the chain complex with boundary
operator ∂. The condition im ϕi−1 ⊂ ker ϕi states that the composition ϕi ◦ ϕi−1 takes everything
to zero, so ∂ 2 = 0. The condition ker ϕi ⊂ im ϕi−1 implies that the homology is trivial. Thus,
homology measures the failure of the chain complex to be exact.
Example. We claim every space with a symmetric nondegenerate bilinear form g has an orthonormal
basis, i.e. a set {vi } where g(vi , vj ) = ±δij . We prove only the real case for simplicity. Let dim V = k
and suppose we have an orthonormal set of k − 1 vectors ei. Defining the projection map

$$\pi(v) = \sum_{i=1}^{k-1} g(e_i, e_i)\, g(e_i, v)\, e_i,$$

a vector of the form v − π(v) is orthogonal to all of the ei. Now let G be the Gram matrix with
entries Gij = g(ei, ej); then Gv = 0 is equivalent to $g(\sum_i v_i e_i, e_j) = 0$ for all j, which by
nondegeneracy forces v = 0.
By the spectral theorem, we can choose a basis so that G is diagonal; by the result above, its diagonal
entries are nonzero, so we can scale them to be ±1. This yields the desired basis. Sylvester's law
of inertia states that the numbers of +1's and −1's in the final form of G are each unique. We say
g is an inner product if it is positive definite, i.e. there are no −1's.
The determinant of the Gram matrix, called the Grammian, is a useful concept. For example, for
any collection of vectors {vi} in an inner product space, the vectors are independent if and only if
the Grammian is nonzero.
• Let the dual space V ∗ be the set of linear functionals f on V . For finite-dimensional V , V and
V ∗ are isomorphic but there is no natural map between them.
• For infinite-dimensional V , V and V ∗ are generally not isomorphic. One important exception
is when V is a Hilbert space, which is crucial in quantum mechanics.
• Given a nondegenerate bilinear form g on V , define ψ : V → V ∗ by ψ(v) = g(v, ·).
By nondegeneracy, ψ is injective; since V and V ∗ have the same dimension, this implies it is
surjective as well.
• In the context of a complex vector space, there are some extra subtleties: the form can only be
linear in one argument, say the second, and is antilinear in the other. Then the map ψ indeed
maps vectors to linear functionals, but it does so at the cost of being antilinear itself.
• The result above also holds for (infinite-dimensional) Hilbert spaces, where it is called the Riesz
lemma; it is useful in quantum mechanics.
• Given a linear map A : V → W , the dual map A∗ : W ∗ → V ∗ is defined by A∗(f ) = f ◦ A.
The dual map is often called the transpose map. To see why, pick arbitrary bases of V and W
with the corresponding dual bases of V ∗ and W ∗. Then in components,

$$(A^* f^i)(e_j) = f^i(A e_j) = A_{ij},$$

which implies that (A∗)ji = Aij. That is, expressed in terms of matrices in the appropriate bases,
they are transposes of each other.
• Given an inner product g on V and a linear map A : V → V , there is another linear map
A† : V → V called the adjoint of A, defined by

$$g(A^\dagger v, w) = g(v, A w).$$

By working in an orthonormal basis and expanding in components, the matrix elements satisfy

$$A^\dagger_{ij} = A^*_{ji}$$
• In the case where V = W and V is a real vector space, the matrix representations of the dual
and adjoint coincide, but they are very different objects. In quantum mechanics, we switch
between a map and its dual constantly, but taking the adjoint has a nontrivial effect.
4.3 Determinants
We now review some facts about determinants.
• Define the ij cofactor of a matrix A to be $\tilde{A}_{ij} = (-1)^{i+j} \det A(i|j)$, where A(i|j) is A with
its ith row and jth column removed. Define the adjugate matrix adj A to have elements

$$(\mathrm{adj}\, A)_{ij} = \tilde{A}_{ji}.$$

• By induction, we can show that the determinant satisfies the Laplace expansion formula

$$\det A = \sum_{j=1}^n A_{ij}\, \tilde{A}_{ij}.$$
4.4 Endomorphisms
An endomorphism is a linear map from a vector space V to itself. The set of such endomorphisms
is called End(V ) in math, and the set of linear operators on V in physics. We write abstract
endomorphisms with Greek letters; for example, the identity map ι has matrix representation I.
• Eigenvectors with distinct eigenvalues are linearly independent. Given a relation $\sum_j x_j v_j = 0$
with α vj = λj vj, apply the operator $\prod_{i \ne k}(\alpha - \lambda_i \iota)$ to isolate the kth term.
For the left-hand side to be zero, xj must be zero for all j, giving the result.
• We say α is diagonalizable when its eigenspaces span all of V , i.e. V = ⊕i Ei . Equivalently, α
has a diagonal matrix representation, produced by choosing a basis of eigenvectors.
• Polynomial division: for any polynomials f and g, we may write f (t) = q(t)g(t) + r(t) where
deg r < deg g.
• As a corollary, whenever f has a root λ, we can extract a linear factor f (t) = (t − λ)g(t). The
fundamental theorem of algebra tells us that f will always have at least one root; repeating
this shows that all polynomials split into linear factors in C.
• The endomorphism α is diagonalizable if and only if there is a nonzero polynomial p(t) with
distinct linear factors such that p(α) = 0. Intuitively, each such linear factor (t − λi) projects
away the eigenspace Ei, and since p(α) = 0, the Ei must span all of V .
Proof: The backward direction is simple. To prove the forward direction, we define projection
operators. Let the roots be λi and let

$$q_j(t) = \prod_{i \ne j} \frac{t - \lambda_i}{\lambda_j - \lambda_i} \quad\Rightarrow\quad q_j(\lambda_i) = \delta_{ij}.$$

Then $q(t) = \sum_j q_j(t) = 1$. Now define the operators πj = qj(α). Since (α − λj ι)πj ∝ p(α) = 0,
πj projects onto the λj eigenspace. Since the projectors sum to $\sum_j \pi_j = q(\alpha) = \iota$, the
eigenspaces span V .
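As a concrete instance, suppose α² = ι, i.e. p(t) = t² − 1 = (t − 1)(t + 1). The recipe above gives

$$\pi_\pm = \frac{\iota \pm \alpha}{2}, \qquad \pi_+ + \pi_- = \iota, \qquad \alpha\, \pi_\pm = \pm \pi_\pm,$$

so any involution (such as a reflection) is diagonalizable, with eigenvalues ±1.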
• Define the minimal polynomial of α to be the non-zero monic polynomial mα (t) of least degree
so that mα (α) = 0. Such polynomials exist with degree bounded by n2 , since End(V ) has
dimension n2 .
Example. Let

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Then A satisfies t − 1, but B does not; instead its minimal polynomial is (t − 1)^2. To understand
this, note that

$$C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
has minimal polynomial t2 , and its action consists of taking the basis vectors ê2 → ê1 → 0, which
is why it requires two powers of t to vanish. This matrix is not diagonalizable because the only
possible eigenvalue is zero, but only ê1 is an eigenvector; ê2 is a ‘generalized eigenvector’ that instead
eventually maps to zero. As we’ll see below, such generalized eigenvectors are the only obstacle to
diagonalizability.
Prop (Schur). Let V be a finite-dimensional complex vector space and let α ∈ End(V ). Then there
is a basis where α is upper triangular.
Proof. By the FTA, the characteristic polynomial has a root, and hence there is an eigenvector.
By taking this as our first basis element, all entries in the first column are zero except for the first.
Quotienting out the eigenspace gives the result by induction.
Theorem (Cayley-Hamilton). Let V be a finite-dimensional vector space over F and let α ∈ End(V ).
Then χα (α) = 0, so mα divides χα .
Proof. [F = C] We use Schur's theorem, and let α be represented with A, which has diagonal
elements λi. Then $\chi_\alpha(t) = \prod_i (t - \lambda_i)$. Applying the factor (α − λn) sets the basis vector ên to zero.
Subsequently applying the factor (α − λn−1 ) sets the basis vector ên−1 to zero, and does not map
anything to ên since A is upper triangular. Repeating this logic, χα (α) sets every basis vector to
zero, giving the result. This also proves the Cayley-Hamilton theorem for F = R, because every
real polynomial can be regarded as a complex one.
Proof. [General F] A tempting false proof of the Cayley-Hamilton theorem is to simply directly
substitute t = A in det(tI − A). This doesn't make sense, but we can make it make sense by
explicitly expanding the characteristic polynomial. Let B = tI − A, so that

$$\mathrm{adj}(B)\, B = \det(B)\, I = \chi_\alpha(t)\, I.$$

Since the entries of adj(B) are polynomials in t of degree at most n − 1, we may write
$\mathrm{adj}(B) = \sum_{i=0}^{n-1} B_i t^i$, giving

$$\sum_i B_i\, t^{i+1} - \sum_i B_i A\, t^i = \sum_i a_i t^i\, I$$

where the ai are the coefficients of the characteristic polynomial. Expanding term by term, the
coefficient of t^i gives B_{i−1} − B_i A = a_i I. Multiplying the ith equation by A^i on the right and
adding these equations together, the left-hand sides telescope, giving the result.
Proof. [Continuity] Use the fact that Cayley-Hamilton is obvious for diagonalizable matrices, con-
tinuity of χα , and the fact that diagonalizable matrices are dense in the space of matrices. This is
the shortest proof, but has the disadvantage of requiring much more setup.
We know the characteristic polynomial is (t − 1)2 (t − 2), and that both 1 and 2 are eigenvalues.
Thus by Cayley-Hamilton the minimal polynomial is (t − 1)a (t − 2) where a is 1 or 2. A direct
calculation shows that a = 1 works; hence A is diagonalizable.
Next, we move to Jordan normal form.
• As we’ll see, the source of nondiagonalizability is Jordan blocks, i.e. matrices of the form
Jn (λ) = λIn + Kn
where Kn has ones directly above the main diagonal. These blocks have gλ = 1 but aλ = cλ = n.
A matrix is in Jordan normal form if it is block diagonal with Jordan blocks.
• It can be shown that every matrix is similar to one in Jordan normal form. A sketch of the
proof is to split the vector space into ‘generalized eigenspaces’ (the nullspaces of (A − λI)k for
sufficiently high k), so that we can focus on a single eigenvalue, which can be shifted to zero
without loss of generality.
Example. All possible Jordan normal forms of 3×3 matrices. We have the diagonalizable examples,
as well as

$$\begin{pmatrix} \lambda_1 & & \\ & \lambda_2 & 1 \\ & & \lambda_2 \end{pmatrix}, \qquad \begin{pmatrix} \lambda_1 & & \\ & \lambda_1 & 1 \\ & & \lambda_1 \end{pmatrix}, \qquad \begin{pmatrix} \lambda_1 & 1 & \\ & \lambda_1 & 1 \\ & & \lambda_1 \end{pmatrix}.$$
Example. The prototype for a Jordan block is a nilpotent endomorphism that takes
At first glance it seems this can’t be put in Jordan form, but it can because it takes ê1 − ê2 → 0.
Thus there are actually two Jordan blocks!
Example. Solving the differential equation ẋ = Ax for a general matrix A. The method of normal
modes is to diagonalize A, from which we can read off the solution x(t) = e^{At} x(0). More generally,
the best we can do is Jordan normal form, and the exponential of a Jordan block contains powers of
t, so generally the amplitude will grow polynomially. Note that this doesn't happen for mass-spring
systems, because there the equivalent of A must be symmetric by Newton's third law, so it is
diagonalizable.
5 Groups
5.1 Fundamentals
We begin with the basic definitions.
• A group G is a set with an associative binary operation, so that there is an identity e which
satisfies ea = ae = a for all a ∈ G, and for every element a there is an inverse a−1 so that
aa−1 = a−1 a = e. A group is abelian if the operation is commutative.
– Any field F is an abelian group under addition, while F∗, which omits the zero element, is an
abelian group under multiplication.
– The set of n × n invertible real matrices forms the group GL(n, R) under matrix multipli-
cation, and it is not abelian.
– A group is cyclic if all elements are powers g k of a fixed group element g. The nth cyclic
group Cn is the cyclic group with n elements.
– The dihedral group D2n is the set of symmetries of a regular n-gon. It is generated by
rotations r by 2π/n and a reflection s and hence has 2n elements, of the form rk or srk .
We may show this using the relations rn = s2 = 1 and srs = r−1 .
For example, there are two groups of order 4, which are C4 and the Klein four group C2 ×C2 .
– A subgroup H ⊆ G is a subset of G closed under the group operations. For example,
Cn ⊆ D2n and C2 ⊆ D2n .
– Note that intersections of subgroups are subgroups. The subgroup generated by a subset
S of G, called ⟨S⟩, is the smallest subgroup of G that contains S. One may also consider
the subgroup generated by a group element, ⟨g⟩.
• The order of a group |G| is the number of elements it contains, while the order of a group
element g is the smallest integer k so that g k = e.
• Two elements in a group g1 and g2 are conjugate if there is a group element h so that g1 = hg2 h−1 .
Conjugacy is an equivalence relation and hence splits the group into conjugacy classes.
• The symmetric group Sn is the set of bijections S → S of a set S with n elements, conventionally
written as S = {1, 2, . . . , n}, where the group operation is composition.
There is an ambiguity of notation, because for σ, τ ∈ Sn the product στ can refer to doing the
permutation σ first, as one would expect naively, or to doing τ first, because one would write
σ(τ (i)) for the image of element i. We choose the former option.
• It is easier to write permutations using cycle notation. For example, a 3-cycle (123) denotes
a permutation that maps 1 → 2 → 3 → 1 and fixes everything else. All group elements are
generated by 2-cycles, also called transpositions.
• Any permutation can be written as a product of disjoint cycles. The cycle type is the set
of lengths of these cycles, and conjugacy classes in Sn are specified by cycle type, because
conjugation merely ‘relabels the numbers’.
• Specifically, suppose there are ki cycles of length ℓi. Then the number of permutations with
this cycle type is

$$\frac{n!}{\prod_i \ell_i^{k_i}\, k_i!}$$

where the first term in the denominator accounts for shuffling within a cycle (since (123) is
equivalent to (231)) and the second accounts for exchanging cycles of the same length (since
(12)(34) is equivalent to (34)(12)).
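For a quick check of the formula, count the double transpositions in S4, which have k = 2 cycles of length ℓ = 2:

$$\frac{4!}{2^2 \cdot 2!} = 3,$$

matching the explicit list (12)(34), (13)(24), (14)(23).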
• The subgroup of even permutations is the alternating group An ⊆ Sn . Note that every even
permutation is paired with an odd one, by multiplying by an arbitrary transposition, so |An | =
n!/2. For n ≥ 4, An is not abelian since (123) and (124) don’t commute.
• Finally, some conjugacy classes break in half when passing from Sn to An. For example, (123)
and (132) are not conjugate in A4, because if σ^{-1}(123)σ = (132), then $(1^\sigma\, 2^\sigma\, 3^\sigma) = (132)$,
which means σ is odd.
• The integers are the cyclic group of infinite order. To make this very explicit, we may define
an isomorphism φ(g k ) = k for generator g.
• Any subgroup of a cyclic group is cyclic. Let G = ⟨g⟩ and H ⊆ G. Then if n is the minimum
natural number so that g^n ∈ H, we claim H = ⟨g^n⟩. For an arbitrary element g^a ∈ H, we may
use the division algorithm to write a = qn + r, and hence g^r ∈ H. Then we have a contradiction
unless r = 0.
We then immediately have Bezout’s lemma, i.e. there exist integers u and v so that
um + vn = gcf(m, n).
We can then establish the usual properties, e.g. if x|m and x|n then x| gcf(m, n).
• If gcf(m, n) = 1, then

$$C_{mn} \cong C_m \times C_n.$$

To see this, note that if an element of Cm × Cn has order k, then m|k and n|k; writing
um + vn = 1, we have mn | umk + vnk = k, so a pair of generators has order mn.
• We write Zn for the set of equivalence classes where a ∼ b if n|(a − b). Both addition and
multiplication are well defined on these classes. Under addition, Zn is simply a cyclic group Cn .
• Given a subgroup H ⊆ G, define the left cosets gH = {gh : h ∈ H}, and write G/H to denote
the set of (left) cosets. In general, gH ≠ Hg.
• We see gH and kH are the same coset if k^{-1}g ∈ H. This is an equivalence relation, so the
cosets partition the group. Moreover, all cosets have the same size because the map h ↦ gh is
a bijection between H and gH. Thus we have Lagrange's theorem,

$$|G| = |H| \cdot |G/H|.$$
• By considering the cyclic group generated by any group element, the order of any group element
divides |G|. In particular, all groups with prime order are cyclic.
• Fermat's little theorem states that for a prime p where p does not divide a,

$$a^{p-1} \equiv 1 \pmod p.$$

More generally, Euler's theorem states that

$$a^{\varphi(n)} \equiv 1 \pmod n$$

where gcf(a, n) = 1 and ϕ is Euler's totient function. Both follow from the previous bullet
applied to the group of units mod n.
• Wilson's theorem states that for a prime p,

$$(p - 1)! \equiv -1 \pmod p.$$
To see this, note that the only elements that are their own inverses are ±1. All other elements
pair off with their inverses and contribute 1 to the product.
• If G has even order, then it has an element of order 2, by the same reasoning as before: some
element must be its own inverse by parity.
• This result allows us to classify groups of order 2p for prime p ≥ 3. There must be an element
x of order 2. Furthermore, not all elements can have order 2, or else the group would be (Z2)^n,
so there is an element y of order p. Since p is odd, x ∉ ⟨y⟩, so the group is G = ⟨y⟩ ∪ x⟨y⟩.
The product yx must be one of these elements, and it can't be a power of y, so yx = xy^j. Then
odd powers of yx all carry a power of x, so yx must have even order. If it has order 2p, then
G ≅ C_{2p}. Otherwise, it has order 2, so (yx)^2 = y^{j+1} = 1, implying j = p − 1, so G ≅ D_{2p}.
In general, when one is given a group in this form, one simply uses the relations to reduce
strings of the generators, called words, as far as possible. The remaining set that cannot be
reduced form the group elements.
Example. So far we’ve classified all groups up to order 7, where order 6 follows from the work
above. The groups of order 8 are
C8 , C2 × C4 , C2 × C2 × C2 , D8 , Q8
where Q8 is the quaternion group. The quaternions are numbers of the form

$$q = a + bi + cj + dk, \qquad a, b, c, d \in \mathbb{R}$$

where i² = j² = k² = ijk = −1; the quaternion group is Q8 = {±1, ±i, ±j, ±k}.
• A subgroup H ⊆ G is normal, written H ⊴ G, if

$$gH = Hg \quad \text{for all } g \in G.$$
• Normal subgroups are unions of conjugacy classes. This can place strong constraints on normal
subgroups by counting arguments.
• If |G/H| = 2 then H ⊴ G. This is because the left and right cosets eH and He must coincide,
and hence the other left and right coset also coincide. For example, An ⊴ Sn and SO(n) ⊴ O(n).
• Define the center Z(G) = {g ∈ G : gh = hg for all h ∈ G}. Then Z(G) ⊴ G.
• For H ⊴ G, we may define multiplication on cosets by (g1 H)(g2 H) = (g1 g2)H, and hence
make G/H into a quotient group. This rule is consistent because H is normal.
• The first isomorphism theorem states that for a homomorphism φ : G → H,

$$G/\ker\varphi \cong \mathrm{im}\,\varphi$$
• The first isomorphism theorem can also be used to classify all homomorphisms φ : G → H. We
first determine the normal subgroups of G, as these are the potential kernels. For each normal
subgroup N , we count the number n(N ) of subgroups in H isomorphic to G/N . Finally, we
determine Aut(G/N ). Then the number of homomorphisms is
$$\sum_N n(N) \cdot |\mathrm{Aut}(G/N)|.$$
• The second isomorphism theorem states that for H ⊆ G and N ⊴ G, then H ∩ N ⊴ H and

$$\frac{HN}{N} \cong \frac{H}{H \cap N}.$$
The first statement follows because both N and H are closed under conjugation by elements of
H. As for the second, we consider

$$H \xrightarrow{\ i\ } HN \to HN/N$$
where i is the inclusion map and the second map is a quotient. The composition is surjective
with kernel H ∩ N , so the result follows from the first isomorphism theorem.
• The third isomorphism theorem states that for K ⊴ G and N ⊴ G with K ⊆ N,

$$(G/K)/(N/K) \cong G/N.$$

This follows by applying the first isomorphism theorem to the composition

$$G \to G/K \to (G/K)/(N/K),$$

which is surjective with kernel N.
• Conversely, let K ⊴ G and let $\bar{G} = G/K$ with $\bar{H} \subseteq \bar{G}$. Then there exists H ⊆ G with $\bar{H} = H/K$,
defined by

$$H = \{h \in G \,|\, hK \in \bar{H}\}.$$

Note that in this definition, $\bar{H}$ is comprised of cosets. However, if $\bar{H} \trianglelefteq \bar{G}$ then H ⊴ G.
Example. We will use the running example of G = S4. Let H = S3 ⊆ S4 by acting on the first
three elements only, and let N = V4 ⊴ S4. Then HN = S4 and H ∩ N = {e}, so the second
isomorphism theorem states

$$S_4/V_4 \cong S_3.$$

Next, let N = A4 ⊴ S4 and K = V4 ⊴ S4. We may compute G/K ≅ S3 and N/K ≅ A3, so the
third isomorphism theorem states

$$S_3/A_3 \cong C_2.$$
Example. The symmetric groups Sn are not simple, because An ⊴ Sn. However, An is simple for
n ≥ 5. For example, for A5 the conjugacy classes have sizes

$$60 = 1 + 20 + 15 + 12 + 12$$

where the factors of 12 come from splitting the 24 5-cycles. A normal subgroup would be a union
of conjugacy classes containing the identity, with total size dividing 60, and there is no way to
pick such a subset of these numbers. In fact, A5 is the smallest non-abelian simple group.
Note. As we’ll see below, the simple groups are the “atoms” of group theory. The finite simple
groups have been classified; the only possibilities are:
• A cyclic group of prime order Cp.
• An alternating group An for n ≥ 5.
• A group of Lie type.
• One of 26 sporadic groups, including the Monster and Baby Monster groups.
• A group action of G on a set S is a map

$$\rho : G \times S \to S, \qquad g \cdot s \equiv \rho(g, s)$$

satisfying g · (h · s) = (gh) · s and e · s = s. We define the orbit and stabilizer of s ∈ S by

$$\mathrm{Orb}(s) = \{g \cdot s : g \in G\} \subseteq S, \qquad \mathrm{Stab}(s) = \{g \in G : g \cdot s = s\} \subseteq G.$$
In particular, Stab(s) is a subgroup of G, and the orbits partition S. If there is only one orbit,
we say the action is transitive. Also, if two elements lie in the same orbit, their stabilizers are
conjugate.
• For example, GL(n, R) acts on matrices and column vectors Rn by matrix multiplication, and
on matrices by conjugation; in the latter case the orbits correspond to Jordan normal forms.
Also note that GL(n, R) has a left action on column vectors but a right action on row vectors.
• The symmetry group D2n acts on the vertices of a regular n-gon. Affine transformations of
the plane act on shapes in the plane, and the orbits are congruence classes. Geometric group
actions such as these were the original motivation for group theory.
• The orbit-stabilizer theorem states that |G| = |Orb(s)| · |Stab(s)|.
This is because there is a bijection between the cosets of Stab(s) and the elements of
Orb(s), explicitly by g Stab(s) ↦ g · s, which implies |G|/|Stab(s)| = |Orb(s)|. That is, a
transitive group action corresponds to a group action on the set of cosets of the stabilizer.
• Define the centralizer of a group element g to be

$$C_G(g) = \{h \in G : gh = hg\}.$$
Also let C(g) be the conjugacy class of g. Applying the orbit-stabilizer theorem to the group
action of conjugation,
|G| = |CG (g)| · |C(g)|.
This gives an alternate method for finding |C(g)|, or for finding |G|.
Example. Let GT be the tetrahedral group, the set of rotational symmetries of the four vertices
of a tetrahedron. The stabilizer of a particular vertex v consists of the identity and two rotations,
and the action is transitive, so
|GT | = 3 · 4 = 12.
Similarly, for the cube, the stabilizer of a vertex consists of the identity and the 120◦ and 240◦
rotations about a space diagonal through the vertex, so
|GC | = 3 · 8 = 24.
We could also have done the calculation looking at the orbit and stabilizer of edges or faces.
Example. If |G| = p^r, then G has a nontrivial center. The conjugacy class sizes are powers of p,
and the class of the identity has size 1, so there must be more classes of size 1, yielding a nontrivial
center. In the case |G| = p^2, let x be a nontrivial element in the center. If the order of x is p^2, then
G ≅ C_{p^2}. If not, it has order p. Consider another element y with order p, not generated by x. Then
the p^2 group elements x^i y^j form the whole group, so G ≅ C_p × C_p.
Example. Cauchy's theorem states that for any finite group G and prime p dividing |G|, G has
an element of order p. To see this, consider the set

$$S = \{(g_1, \ldots, g_p) : g_1 g_2 \cdots g_p = e\}.$$

Then |S| = |G|^{p-1}, because the first p − 1 elements can be chosen freely, while the last element is
determined by the others. The group Cp with generator σ acts on S by
determined by the others. The group Cp with generator σ acts on S by
σ · (g1 , g2 , . . . , gp ) = (g2 , . . . , gp , g1 ).
By the Orbit-Stabilizer theorem, the orbits have size 1 or p, and the orbits partition the set. Since
(e, . . . , e) is an orbit of size 1, there must be other orbits of size 1, corresponding to an element g
with g p = e.
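The counting step can be made explicit: since the orbit sizes are 1 or p,

$$|S| = |G|^{p-1} \equiv \#\{\text{orbits of size } 1\} \pmod p,$$

and p divides |S|, so the number of fixed tuples (g, g, \ldots, g) with g^p = e is a positive multiple of p; in particular it exceeds 1.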
• Burnside's lemma states that the number of orbits of a group action equals

$$\#\{\text{orbits}\} = \frac{1}{|G|} \sum_{g \in G} |\mathrm{fix}(g)|,$$

where fix(g) = {s ∈ S : g · s = s}. To see this, note that we can count the pairs (g, s) so that
g · s = s by summing over group elements or set elements, giving

$$\sum_{g \in G} |\mathrm{fix}(g)| = \sum_{s \in S} |\mathrm{Stab}(s)|.$$

By the orbit-stabilizer theorem, each orbit contributes |G| to the right-hand side.
• Note that if g and h are conjugate, then |fix(g)| = |fix(h)|, so the right-hand side can also be
evaluated by summing over conjugacy classes.
• A group action of G on S is equivalent to a homomorphism

$$\rho : G \to \mathrm{Sym}(S)$$

which is called a representation of G. For example, when S is a vector space and G acts by
linear transformations, then ρ is a representation as used in physics.
• A group’s action on itself by left multiplication is faithful, so every finite group G is isomorphic
to a subgroup of S|G| . This is called Cayley’s theorem.
Example. Find the number of ways to color a triangle's edges with n colors, up to rotation and
reflection. We consider the dihedral group D6 acting on the triangle, and want to find the number
of orbits. Burnside's lemma gives

$$N = \frac{1}{6}\left(n^3 + 3n^2 + 2n\right)$$

where we summed over the trivial conjugacy class, the conjugacy class of the rotations, and the
conjugacy class of the reflections. This is indeed the correct answer, with no casework required.
Example. Find the number of ways to paint the faces of a rectangular box black or white, where
the three side lengths are distinct. The rotational symmetries are C2 × C2, corresponding to the
identity and 180° rotations about the x, y, and z axes. Each 180° rotation fixes 2^4 colorings, so

$$N = \frac{1}{4}\left(2^6 + 3 \cdot 2^4\right) = 28.$$
Example. Find the number of ways to make a bracelet with 3 red beads, 2 blue beads, and 2 white
beads. Here the symmetry group is D14, imagining the beads as occupying the vertices of a regular
heptagon, and there are 7!/3!2!2! = 210 bracelets without accounting for the symmetry. Then

$$N = \frac{1}{14}\left(210 + 6(0) + 7(3!)\right) = 18.$$
Example. Find the number of ways to color the faces of a cube with n colors. The relevant
symmetry group is GC. Note that we have a homomorphism ρ : GC → S4 by considering how GC
acts on the four space diagonals of the cube. In fact, it is straightforward to check that ρ is an
isomorphism, so GC ≅ S4. This makes it easy to count the conjugacy classes. We have

$$24 = 1 + 3 + 6 + 6 + 8$$

where the 3 corresponds to double transpositions or rotations of π about opposing faces' midpoints,
the first 6 corresponds to 4-cycles or rotations of π/2 about opposing faces' midpoints, the second
6 corresponds to transpositions or rotations of π about opposing edges' midpoints, and the 8
corresponds to 3-cycles or rotations of 2π/3 about space diagonals. By Burnside's lemma,

$$N = \frac{1}{24}\left(n^6 + 3n^4 + 6n^3 + 6n^3 + 8n^2\right).$$
By similar reasoning, we have a homomorphism ρ : GT → S4 by considering how GT acts on the
four vertices of the tetrahedron, and |GT| = 12, so GT ≅ A4.
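As a sanity check of the face-coloring formula, setting n = 2 gives

$$N = \frac{2^6 + 3 \cdot 2^4 + 6 \cdot 2^3 + 6 \cdot 2^3 + 8 \cdot 2^2}{24} = \frac{240}{24} = 10,$$

the familiar count of two-colorings of a cube's faces.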
• For a group G and a subset S of G, we defined the subgroup ⟨S⟩ ⊆ G to be the smallest subgroup
of G containing S. However, it is not clear how this definition works for infinite groups, nor
immediately clear why it is unique. A better definition is to let ⟨S⟩ be the intersection of all
subgroups of G that contain S.
• We say a group G is finitely generated if there exists a finite subset S of G so that ⟨S⟩ = G.
All groups of uncountable order are not finitely generated. Also, Q∗ under multiplication is
countable but not finitely generated, because there are infinitely many primes.
• Suppose we have a set S called an alphabet, and define a corresponding set S −1 , so the element
x ∈ S corresponds to x−1 ∈ S −1 . A word w is a finite sequence w = x1 . . . xn where each
xi ∈ S ∪ S −1 . The empty sequence is denoted by ∅.
• We may contract words by canceling adjacent pairs of the form xx^{-1} or x^{-1}x for x ∈ S. It is
somewhat fiddly to prove, but intuitively clear, that every word w can be uniquely transformed
into a reduced word [w] which does not admit any such contractions.
• The set of reduced words is a group under concatenation (followed by reduction), called the
free group F (S) generated by S. Here F (S) is indeed a group because the operation is
associative, the empty word ∅ is an identity, and the inverse of a reduced word is obtained by
reversing it and inverting each letter.
Free groups are useful because we can use them to formalize group presentations.
• Given any set S, group G, and mapping f : S → G, there is a unique homomorphism φ : F (S) →
G so that the diagram

$$\begin{array}{ccc} S & \xrightarrow{\ f\ } & G \\ {\scriptstyle i} \downarrow & \nearrow {\scriptstyle \varphi} & \\ F(S) & & \end{array}$$

commutes, where i : S → F (S) is the canonical inclusion which takes x ∈ S to the corresponding
generator of F (S).
• Taking S to be a generating set for G, and f to be inclusion, this implies every group is a
quotient of a free group.
• Let B be a subset of a group G. The normal subgroup generated by B is the intersection of all
normal subgroups of G that contain B, and is denoted by ⟨⟨B⟩⟩. Explicitly, it is the set of
finite products of conjugates of elements of B and their inverses,

$$\langle\langle B \rangle\rangle = \left\{ \prod_{i=1}^n g_i\, b_i^{\epsilon_i}\, g_i^{-1} : g_i \in G,\ b_i \in B,\ \epsilon_i = \pm 1 \right\}.$$

To prove this, denote this set as N . It is clear that N ⊆ ⟨⟨B⟩⟩, so it suffices to show that N ⊴ G.
The only nontrivial check is closure under conjugation, which works because

$$g \left( \prod_{i=1}^n g_i\, b_i^{\epsilon_i}\, g_i^{-1} \right) g^{-1} = \prod_{i=1}^n (g g_i)\, b_i^{\epsilon_i}\, (g g_i)^{-1}$$

which lies in N .
• Let X be a set and let R be a subset of F (X). We define the group with presentation ⟨X|R⟩
to be F (X)/⟨⟨R⟩⟩. We need to use ⟨⟨R⟩⟩ because the relation w = e implies gwg^{-1} = e.
• For any group G, there is a canonical homomorphism F (G) → G by sending every generator of
F (G) to the corresponding group element. Letting R(G) be the kernel, we have G ≅ F (G)/R(G),
and hence we define the canonical presentation for G to be

$$\langle G \,|\, R(G) \rangle.$$
This is a very inefficient presentation, which we mention because it uses no arbitrary choices.
• Free groups also characterize homomorphisms. Let ⟨X|R⟩ and H be groups. A map f : X → H
induces a homomorphism φ : F (X) → H. This descends to a homomorphism ⟨X|R⟩ → H if
and only if R ⊂ ker φ.
• The next step in the proof is to 'quotient out' by K. By the second isomorphism theorem,

$$G/G_{r-1} \cong H_{s-1}/K, \qquad G/H_{s-1} \cong G_{r-1}/K$$

so G_{r-1}/K and H_{s-1}/K are simple. Since K has a composition series, we have composition
series

$$\{e\} \trianglelefteq G_1 \trianglelefteq \cdots \trianglelefteq G_{r-1}, \qquad \{e\} \trianglelefteq H_1 \trianglelefteq \cdots \trianglelefteq H_{s-1}$$
• Next, we append the factor G to the end of these series. By the second isomorphism theorem,
the composition series
are equivalent. Then our original two composition series are equivalent, completing the proof.
• Note that if G is finite and abelian, its composition factors are also, and hence must be cyclic
of prime order. In particular, for G = Cn this proves the fundamental theorem of arithmetic.
• If H ⊴ G with G finite, then the composition factors of G are the union of those of H and
G/H. We showed this as a corollary when discussing the isomorphism theorems. In particular,
if X and Y are simple, the only two composition series of X × Y are

$$\{e\} \trianglelefteq X \trianglelefteq X \times Y, \qquad \{e\} \trianglelefteq Y \trianglelefteq X \times Y.$$
• A finite group G is solvable if every composition factor is a cyclic group of prime order, or
equivalently, abelian. Burnside’s theorem states that all groups of order pn q m for primes p
and q are solvable, while the Feit-Thompson theorem states that all groups of odd order are
solvable.
• We already know how to combine groups using the direct product, but this is uninteresting.
Suppose a group were of the form G = G1 G2 for two disjoint subgroups G1 and G2 . Then
every group element can be written in the form g1 g2 , but it is unclear how we would write the
product of two elements (g1 g2 )(g10 g20 ) in this form. The problem is resolved if one of the Gi is
normal in G, motivating the following definition.
• For disjoint subgroups N, H ⊆ G with N ⊴ G and G = N H, we write G = N ⋊ H and call G
a semidirect product. The semidirect product generalizes the direct product: if we also have
H ⊴ G, then G ≅ N × H.
To see this, note that every group element can be written uniquely in the form nh. Letting
nh = (n1 h1)(n2 h2), we have

$$nh = (n_1 h_1 n_2 h_1^{-1})(h_1 h_2) = (n_1 n_2)(n_2^{-1} h_1 n_2 h_2).$$

By normality of N and H, both these expressions are already in the form nh. Then we have
n = n1 h1 n2 h1^{-1} = n1 n2, which implies h1 n2 = n2 h1, giving the result.
– We have D2n = ⟨σ⟩ ⋊ ⟨τ⟩ where σ generates rotations and τ is a reflection. Note that a
nonabelian group arises from the semidirect product of abelian groups.
– We have Sn = An ⋊ ⟨σ⟩ for any transposition σ.
– We have S4 = V4 ⋊ S3, which we found earlier.
• To understand the multiplication rule in a semidirect product, letting nh = (n1 h1)(n2 h2) again,

$$nh = n_1 h_1 n_2 h_2 = (n_1\, h_1 n_2 h_1^{-1})\, h_1 h_2.$$

That is, the multiplication law is like that of a direct product, but the multiplication in N is
"twisted" by conjugation by H. The map h ↦ φh, where φh(n) = h n h^{-1}, gives a group
homomorphism H → Aut(N ).
• This allows us to define the semidirect product of two groups without referring to a larger group,
i.e. an external semidirect product. Specifically, for two groups H and N and a homomorphism
φ : H → Aut(N )
we may define (N ⋊ H, ◦) to consist of the set of pairs (n, h) with group operation

$$(n_1, h_1) \circ (n_2, h_2) = \left(n_1\, \varphi_{h_1}(n_2),\ h_1 h_2\right).$$
Example. Let Cn = ⟨a⟩ and C2 = ⟨b⟩. Let φ : C2 → Aut(Cn) satisfy φ(b)(a) = a^{-1}. Then
Cn ⋊φ C2 ≅ D2n. To see this, note that a^n = b^2 = e and b a b^{-1} = a^{-1}, which are exactly
the defining relations of D2n.
6 Rings
6.1 Fundamentals
We begin with the basic definitions.
• A ring R is a set with two binary operations + and ×, so that R is an abelian group under the
operation + with identity element 0 ∈ R, and × is associative and distributes over +,

$$a(b + c) = ab + ac, \qquad (a + b)c = ac + bc.$$

If × is also commutative and has an identity 1, we call R a commutative ring with identity (CRI).
– Any field F is a CRI. The polynomials F[x] also form a CRI. More generally given any
ring R, the polynomials R[x] also form a ring. We may also define polynomial rings with
several variables, R[x1 , . . . , xn ].
– The integers Z, the Gaussian integers Z[i], and Zn are CRIs. The quaternions H form a
noncommutative ring.
– The set Mn (F) of n × n matrices over F is a ring, which implies End(V ) = Hom(V, V ) is a
ring for a vector space V .
– For an n×n matrix A, the set of polynomials evaluated on A, denoted F[A], is a commutative
subring of Mn(F). Note that the matrix A may satisfy nontrivial relations; for instance if
A^2 = −I, then R[A] ≅ C.
– The space of bounded real sequences ℓ^∞ is a CRI under componentwise addition and
multiplication, as is the set of continuous functions C(R). In general, for a set S and
ring R we may form a ring R^S out of functions f : S → R.
– The power set P(X) of a set X is a CRI where the multiplication operation is intersection,
and the addition operation is XOR, written as A∆B = (A \ B) ∪ (B \ A). Then the additive
inverse of each subset is itself. For a finite set, P(X) ≅ (Z2)^{|X|}.
• Polynomial rings over fields are familiar. However, we will be interested in polynomial rings
over rings, which are more subtle. For example, in Z8[x] we have

$$(2x)(4x) = 8x^2 = 0$$

so there are zero divisors. Moreover the quadratic x^2 − 1 has four roots 1, 3, 5, 7, and hence
can be factored in two ways,

$$x^2 - 1 = (x - 1)(x - 7) = (x - 3)(x - 5).$$

Much of our effort will be directed at finding when properties of C[x] carry over to general
polynomial rings.
• A subring S ⊆ R is a subset of a ring R that is closed under + and ×. This implies 0 ∈ S. For
example, as in group theory, we always have the trivial subrings {0} and R. Given any subset
X ⊂ R, the subring generated by X is the smallest subring containing it.
7 Point-Set Topology
7.1 Definitions
We begin with the fundamentals, skipping content covered when we considered metric spaces.
Example. The topology containing all subsets of X is called the discrete topology, and the one
containing only X and ∅ is called the indiscrete/trivial topology.
Example. The finite complement topology Tf is the set of subsets U of X such that X − U is
either finite or all of X. The set of finite subsets U of X (plus X itself) fails to be a topology, since
it’s instead closed under arbitrary intersections and finite unions; taking the complement flips this.
Definition. A basis B for a topology on X is a set of subsets of X, called basis elements, such that
• Every x ∈ X belongs to at least one basis element.
• If x belongs to the intersection of two basis elements B1 and B2, then there is a basis element
B3 containing x such that B3 ⊂ B1 ∩ B2.
The topology T generated by B is the set of unions of elements of B. Conversely, B is a basis for T
if every element of T can be written as a union of elements of B.
Proof. Most properties hold automatically, except for closure under finite intersections. It suffices
to consider the intersection of two sets, U1 , U2 ∈ T . Let x ∈ U1 ∩ U2 . We know there is a basis
element B1 ⊂ U1 that contains x, and a basis element B2 ⊂ U2 that contains x. Then there is a B3
containing x contained in B1 ∩ B2 , which is in U1 ∩ U2 . Then U1 ∩ U2 ∈ T , as desired.
Describing a topological space by a basis fits better with our intuitions. For example, the topology
generated by B′ is finer than the topology generated by B if every element of B can be written as
the union of elements of B′. Intuitively, we "smash rocks (basis elements) into pebbles".
Example. The collection of one-point subsets is a basis for the discrete topology. The collection of
(open) circles is a basis for the “usual” topology of R2 , as is the collection of open rectangles. We’ll
formally show this later.
Example. Topologies on R. The standard topology on R has basis (a, b) for all real a < b, and
we’ll implicitly mean this topology whenever we write R. The lower limit topology on R, written
Rl , is generated by basis [a, b). The K-topology on R, written RK , is generated by open intervals
(a, b) and sets (a, b) − K, where K = {1/n | n ∈ Z+ }.
Both of these topologies are strictly finer than R. For x ∈ (a, b), we have x ∈ [x, b) ⊂ (a, b), so
Rl is finer; since there is no open interval containing x in [x, b), it is strictly finer. Similarly, there
is no open interval containing 0 in (−1, 1) − K, so RK is strictly finer.
Prop. In the order topology, open rays such as (a, +∞) are open sets.
Proof. Consider (a, +∞). If X has a largest element b0, then (a, +∞) = (a, b0] is a basis element,
and we're done. Otherwise, it is the union of all basis elements of the form (a, x) for x > a.
Example. The order topology on R is just the usual topology. The order topology on R2 in the
dictionary order contains all open intervals of the form (a × b, c × d) where a < c or a = c and b < d.
It’s sufficient to take the intervals of the second type as a basis, since we can recover intervals of
the first type by taking unions of rays.
Example. The set X = {1, 2} × Z+ in the dictionary order looks like a1 , a2 . . . ; b1 , b2 , . . .. However,
the order topology on X is not the discrete topology, because it doesn’t contain {b1 }! All open sets
containing b1 must contain some ai .
Definition. If X and Y are topological spaces, the product topology on X × Y is generated by the
basis B containing all sets of the form U × V , where U and V are open in X and Y .
We can’t use B itself as the topology, since the union of product sets is generally not a product set.
Prop. If B and C are bases for X and Y , the set of products D = {B × C | B ∈ B, C ∈ C} is a basis
for the product topology on X × Y .
Proof. We must show that any U × V can be written as the union of members of D. For any
x × y ∈ U × V , we have basis elements B ⊂ U containing x and C ⊂ V containing y. Then
B × C ⊂ U × V and contains x × y, as desired.
Prop. The collection

$$\mathcal{S} = \{\pi_1^{-1}(U) \,|\, U \text{ open in } X\} \cup \{\pi_2^{-1}(V) \,|\, V \text{ open in } Y\}$$

is a subbasis for the product topology on X × Y . Intuitively, the basis contains rectangles, and the
subbasis contains strips.
Proof. Since every element of S is open in the product topology, we don’t get any extra open sets.
We know we get every open set because intersecting two strips gives a rectangle, so we can get every
basis element.
Theorem. Let A ⊂ X. Then x ∈ Ā iff every neighborhood of x intersects A. If X has a basis, the
theorem is also true if we only use basis elements as neighborhoods.
Proof. Consider the contrapositive. Suppose x has a neighborhood U that doesn't intersect A.
Then X − U is closed, so Ā ⊂ X − U , so x ∉ Ā. Conversely, if x ∉ Ā, then X − Ā is a neighborhood
of x that doesn't intersect A.
Restricting to basis elements works because if U is a neighborhood of x, then by definition, there
is a basis element B ⊂ U that contains x.
Theorem. Let A′ be the set of limit points of A. Then Ā = A ∪ A′.
Proof. The limit point criterion is stricter than the closure criterion above, so A′ ⊂ Ā, giving
A ∪ A′ ⊂ Ā. To show the reverse, let x ∈ Ā. If x ∈ A, we're done; otherwise, every neighborhood
of x intersects A in a point other than x, so x ∈ A′. Then Ā ⊂ A ∪ A′.
Corollary. A subset of a topological space is closed iff it contains all its limit points.
Example. If A ⊂ R is the interval (0, 1], then Ā = [0, 1], but the closure of A in the subspace
Y = (0, 2) is (0, 1]. We can also show that the closure of Q is all of R, while Z+ is its own closure.
Note that Z+ has no limit points.
In a general topological space, intuitive statements about closed sets that hold in R may not
hold. For example, let X = {a, b} and T = {{}, {a}, {a, b}}. Then the one-point set {a} isn’t closed,
since it has b as a limit point!
Similarly, statements about convergence fail. Given a sequence of points xi ∈ X, we say the
sequence converges to x ∈ X if, for every neighborhood U of x, there is a positive integer N so that
xn ∈ U for all n ≥ N . Then the one-point sequence a, a, . . . converges to both a and b!
The problem is that the points a and b are “too close together”, so close that we can’t topologically
tell them apart. We add a new, mild axiom to prevent this from happening.
Definition. A topological space X is Hausdorff if, for every two distinct points x1 , x2 ∈ X, there
exist disjoint neighborhoods of x1 and x2 . Then the points are “housed off” from each other.
Prop. In a Hausdorff space, every finite point set is closed.
Proof. It suffices to show this for a one-point set, {x₀}. If x ≠ x₀, then x has a neighborhood that
doesn't contain x₀. Then x is not in the closure of {x₀}, by definition.
This condition, called the T1 axiom, is even weaker than the Hausdorff axiom.
Prop. Let X satisfy the T1 axiom and let A ⊂ X. Then x is a limit point of A iff every neighborhood
of x contains infinitely many points of A.
Proof. Suppose the neighborhood U of x contains finitely many points of A − {x}, and call this
finite set A₀. Since A₀ is closed, U ∩ (X − A₀) is a neighborhood of x disjoint from A − {x}, so x is
not a limit point of A.
If every neighborhood U of x contains infinitely many points of A, then every such neighborhood
contains at least one point of A − {x}, so x is a limit point of A.
Prop. In a Hausdorff space, a sequence of points converges to at most one point.
Proof. Let xₙ → x and y ≠ x. Then x and y have disjoint neighborhoods U and V . Since all but
finitely many xₙ are in U , the same cannot be true of V , so xₙ does not converge to y.
Prop. Every order topology is Hausdorff, and the Hausdorff property is preserved by products and
subspaces.
Example. Let f : R → Rl be the identity function f (x) = x. Then f is not continuous, because
the inverse image of the open set [a, b) of Rl is not open in R.
Definition. Let f : X → Y be injective and continuous and let Z = f (X), so the restriction
f 0 : X → Z is bijective. If f 0 is a homeomorphism, we say f is a topological imbedding of X in Y .
Example. The topological spaces (−1, 1) and R are homeomorphic. Define F : (−1, 1) → R and its
inverse G as
$$F(x) = \frac{x}{1 - x^2}, \qquad G(y) = \frac{2y}{1 + (1 + 4y^2)^{1/2}}.$$
Because F is order-preserving and bijective, it puts the basis elements of (−1, 1) and R in
correspondence, so it is a homeomorphism. Alternatively, we can show F and G are continuous
using facts from calculus.
Example. Define f : [0, 1) → S 1 by f (t) = (cos 2πt, sin 2πt). Then f is bijective and continuous.
However, f −1 is not, since f sends the open set [0, 1/4) to a non-open set. This makes sense, since
our two sets are topologically distinct.
• (Local criterion) The map f : X → Y is continuous if X can be written as the union of open
sets Uα so that f |Uα is continuous for each α.
• (Pasting lemma) Let X = A ∪ B, where A and B are closed in X, and let f : A → Y and
g : B → Y be continuous maps that agree on A ∩ B. Then f and g combine to give a continuous
map h : X → Y .
Proof. Most of these properties are straightforward, so we only prove the last one. Let C be a
closed subset of Y . Then h−1 (C) = f −1 (C) ∪ g −1 (C). These sets are closed in A and B respectively,
and hence closed in X. Then h−1 (C) is closed in X.
Example. The pasting lemma also works if A and B are both open, since the local criterion applies.
However, it can fail if A is open and B is closed. Consider the real line and let A = (−∞, 0)
and let B = [0, ∞), with f (x) = x − 2 and g(x) = x + 2. These functions are continuous on A and
B respectively, but pasting them yields a function discontinuous at x = 0.
Prop. Write f : A → X × Y as f (a) = (f1 (a), f2 (a)). Then f is continuous iff the coordinate
functions f1 and f2 are. This is another manifestation of the universal property of the product.
Definition. Let {Xα }α∈J be an indexed family of topological spaces, and let Uα denote an arbitrary
open set in Xα .
• The box topology on ∏ Xα has basis elements of the form ∏ Uα.
• The product topology on ∏ Xα has subbasis elements of the form πα⁻¹(Uα), for arbitrary α.
We’ve already seen that in the finite case, these two definitions are equivalent. However, they differ
in the infinite case, because subbasis elements only generate open sets under finite intersections.
Then the basis elements of the product topology are of the form ∏ Uα, where Uα = Xα for all but
finitely many values of α. We prefer the product topology, for the following reason.
Prop. Write f : A → ∏ Xα as f (a) = (fα(a))α∈J. If ∏ Xα has the product topology, then f is
continuous iff the coordinate functions fα are.
Example. The above proposition doesn't hold for the box topology. Consider Rω and let f (t) =
(t, t, . . .). Then each coordinate function is continuous, but the inverse image of the basis element
$$B = (-1, 1) \times \left(-\tfrac{1}{2}, \tfrac{1}{2}\right) \times \left(-\tfrac{1}{3}, \tfrac{1}{3}\right) \times \cdots$$
is not open, because it contains the point zero, but no basis element (−δ, δ) about the point zero.
This is inherently because open sets are not closed under infinite intersections.
In the future, whenever we consider ∏ Xα, we will implicitly give it the product topology. The box
topology will sometimes be used to construct counterexamples.
Prop. The following results hold for ∏ Xα in either the box or product topologies.
• If Aα is a subspace of Xα, then ∏ Aα is a subspace of ∏ Xα if both are given the box or product
topologies.
• If each Xα is Hausdorff, so is ∏ Xα.
• Let Aα ⊂ Xα. Then the closure of ∏ Aα is ∏ Āα.
• Let Xα have basis Bα. Then the sets ∏ Bα with Bα ∈ Bα form a basis for the box topology. The
same collection of sets, where additionally Bα = Xα for all but a finite number of α, is a basis for
the product topology. Thus the box topology is finer than the product topology.
Definition. Let d be a metric on the set X. The collection of all ε-balls Bd(x, ε) = {y | d(x, y) < ε},
for x ∈ X and ε > 0, is a basis for a topology on X, called the metric topology induced by d. We
say a topological space is metrizable if it can be induced by a metric on the underlying set, and call
a metrizable space together with its metric a metric space.
Metric spaces correspond nicely with our intuitions from analysis. For example, using the basis above,
a set U is open iff, for every y ∈ U , U contains an ε-ball centered at y. Different choices of metric
may yield the same topology; properties dependent on such a choice are not topological properties.
Example. The metric d(x, y) = |x − y| on R generates the standard topology on R, because its
basis elements (x − ε, x + ε) are the same as those of the order topology, (a, b).
Example. Boundedness is not a topological property. Let X be a metric space with metric d. A
subset A of X is bounded if the set of distances d(a₁, a₂) with a₁, a₂ ∈ A has an upper bound. If A
is bounded, its diameter is
$$\mathrm{diam}\, A = \sup_{a_1, a_2 \in A} d(a_1, a_2).$$
Now define the standard bounded metric d̄(x, y) = min(d(x, y), 1). Then every set is bounded if
we use the metric d̄, but d and d̄ generate the same topology! Proof: we may use the set of ε-balls
with ε < 1 as a basis for the metric topology. These sets are identical for d and d̄.
8 Algebraic Topology
8.1 Constructing Spaces
8.2 The Fundamental Group
8.3 Group Presentations
8.4 Covering Spaces
9 Methods for ODEs
The boundary condition is homogeneous if γ is zero. Boundary value problems are more subtle
than initial value problems, because a given set of boundary conditions may admit no solutions
or infinitely many. As such, we will completely ignore the boundary conditions for now.
• By the linearity of L, the general solution consists of a particular solution to the equation plus
any solution to the homogeneous equation, which has f = 0. The solutions to the homogeneous
equation form an n-dimensional vector space. For simplicity we will focus on the case n = 2 below.
• The simplest way to check if a set of solutions to the homogeneous equation is linearly dependent
is to evaluate the Wronskian. For n = 2 it is
$$W(y_1, y_2) = \det\begin{pmatrix} y_1 & y_2 \\ y_1' & y_2' \end{pmatrix} = y_1 y_2' - y_2 y_1'$$
and the generalization to arbitrary n is straightforward. If the solutions are linearly dependent,
then the Wronskian vanishes.
• The converse to the above statement is a bit subtle. It is clearly true if the Pi are all constants.
However, if P2 (x0 ) = 0 for some x0 , then y 00 is not determined at that point; hence two solutions
may be dependent for x < x0 but become independent for x > x0 . If P2 (x) never vanishes, the
converse is indeed true.
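Note. The Wronskian criterion is easy to check numerically. The following is a minimal sketch in
Python (assuming NumPy; the ODE y″ + y = 0 and its solutions are chosen purely for illustration):

import numpy as np

x = np.linspace(0, 10, 101)
y1, dy1 = np.sin(x), np.cos(x)       # first homogeneous solution and its derivative
y2, dy2 = np.cos(x), -np.sin(x)      # second homogeneous solution and its derivative
W = y1 * dy2 - y2 * dy1              # Wronskian y1 y2' - y2 y1'
print(np.allclose(W, -1.0))          # True: nonzero everywhere, so independent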
• For constant coefficients, the homogeneous solutions may be found by guessing exponentials.
In the case where Pᵢ ∝ xⁱ, all terms have the same power, so we may guess a power xᵐ.
• Another useful trick is reduction of order. Suppose one solution y₁(x) is known. We guess a
solution of the form
$$y(x) = v(x)\, y_1(x).$$
Plugging this in, all terms proportional to v cancel because y₁ satisfies the ODE, giving
$$P_2(2v'y_1' + v''y_1) + P_1 v' y_1 = 0$$
which is a first-order ODE in v′.
• To handle the inhomogeneous equation, we use variation of parameters: guess y(x) = c₁(x)y₁(x) +
c₂(x)y₂(x) and impose the auxiliary constraint c₁′y₁ + c₂′y₂ = 0. Substituting into the ODE then
gives P₂(c₁′y₁′ + c₂′y₂′) = f , where many terms drop out since y₁ and y₂ are homogeneous solutions.
• We are left with a system of two first-order ODEs for the ci , which are solvable. By solving the
system, we find
$$c_1' = -\frac{f y_2}{P_2 W}, \qquad c_2' = \frac{f y_1}{P_2 W}$$
where W is again the Wronskian. Then the general solution is
$$y(x) = -y_1(x)\int^x \frac{f(t)\, y_2(t)}{P_2(t)\, W(t)}\, dt + y_2(x)\int^x \frac{f(t)\, y_1(t)}{P_2(t)\, W(t)}\, dt.$$
As before, there are issues if P₂(t) ever vanishes, so we assume it doesn't. The constants of
integration from the unspecified lower bounds allow the addition of an arbitrary homogeneous
solution.
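Note. As a sanity check of the formula above, here is a minimal numerical sketch in Python
(assuming NumPy and SciPy; the choices y″ + y = 1, y₁ = sin, y₂ = cos, so that P₂ = 1 and
W = −1, are illustrative):

import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(0, np.pi, 2001)
f = np.ones_like(x)                                   # forcing f(x) = 1
y1, y2, P2, W = np.sin(x), np.cos(x), 1.0, -1.0
I1 = cumulative_trapezoid(f * y2 / (P2 * W), x, initial=0)
I2 = cumulative_trapezoid(f * y1 / (P2 * W), x, initial=0)
y = -y1 * I1 + y2 * I2                                # the formula above
print(np.allclose(y, 1 - np.cos(x), atol=1e-5))       # exact particular solution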
• So far we haven't accounted for boundary conditions. Consider the simple case y(a) = y(b) = 0.
We choose homogeneous solutions obeying
$$y_1(a) = y_2(b) = 0$$
and fix the constants of integration so that
$$c_2(a) = c_1(b) = 0$$
which gives y(a) = c₂(a)y₂(a) = 0 and y(b) = c₁(b)y₁(b) = 0, as desired.
• Fourier series are defined for functions f : S¹ → C, parametrized by θ ∈ [−π, π). We define the
Fourier coefficients
$$\hat{f}_n = \frac{1}{2\pi}(e^{in\theta}, f) \equiv \frac{1}{2\pi}\int_0^{2\pi} e^{-in\theta} f(\theta)\, d\theta.$$
We then claim that
$$f(\theta) = \sum_{n\in\mathbb{Z}} \hat{f}_n e^{in\theta}.$$
• One can show that the Fourier series converges to f for continuous functions with bounded
continuous derivatives. Fejer's theorem states that one can always recover f from the f̂ₙ as
long as f is continuous except at finitely many points, though it makes no statement about the
convergence of the Fourier series. One can also show that the Fourier series converges to f as
long as Σₙ |f̂ₙ| converges.
At the discontinuity, the Fourier series converges to the average of f (π + ) and f (π − ). This
always happens: to show that, simply add the sawtooth to any function with a discontinuity
to remove it, then apply linearity.
• Integration makes Fourier series ‘nicer’ by dividing fˆn by in, while differentiation does the
opposite. In particular, a discontinuity appears as 1/n decay of the Fourier coefficients (as
shown for the sawtooth), so a discontinuity of f (k) appears as 1/nk+1 decay. For a smooth
function, the Fourier coefficients fall faster than any power.
• Right next to a discontinuity, the truncated Fourier series displays an overshoot by about 18%,
called the Gibbs-Wilbraham phenomenon. The width of the overshoot region goes to zero as
more terms are added, but the maximum extent of the overshoot remains the same; this shows
that the Fourier series converges pointwise rather than uniformly. (The phenomenon can be
shown explicitly for the square wave; this extends to all other discontinuities by linearity.)
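Note. The overshoot is easy to see numerically. A minimal sketch in Python (assuming NumPy; the
square wave and the truncation orders are illustrative): the partial sums of the unit square wave,
(4/π) Σ_{n odd} sin(nθ)/n, peak near 1.179 no matter how many terms are kept.

import numpy as np

for N in [51, 501, 5001]:
    n = np.arange(1, N, 2)                           # odd harmonics up to N
    theta = np.linspace(1e-6, 6 * np.pi / N, 4001)   # zoom in near the jump at 0
    S = (4 / np.pi) * np.sin(np.outer(theta, n)) @ (1 / n)
    print(N, S.max())                                # -> approaches about 1.179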
• Computing the norm-squared of f in position space and Fourier space gives Parseval's identity,
$$\int_{-\pi}^{\pi} |f(\theta)|^2\, d\theta = 2\pi \sum_{k\in\mathbb{Z}} |\hat{f}_k|^2.$$
This is simply the fact that the map f (x) → f̂ₙ is unitary.
• Parseval’s theorem also gives error bounds: the mean-squared error from cutting off a Fourier
series is proportional to the length of the remaining Fourier coefficients. In particular, the best
possible approximation of a function f (in terms of mean-squared error) using only a subset of
the Fourier coefficients is obtained by simply truncating the Fourier series.
Fourier series are simply changes of basis in function space, and linear differential operators are
linear operators in function space.
• To solve an inhomogeneous problem Ly = f , it is useful to first study the eigenvalue problem
Ly = λy along with homogeneous boundary conditions. Generically, there will be infinitely many eigen-
functions, allowing us to construct a solution to the inhomogeneous problem by linearity.
• The adjoint L∗ of L is defined by (w, Ly) = (L∗w, y). We could then get an explicit expression
for L∗ using integration by parts. However, generally we end up with boundary terms, which
don't have the correct form.
• Suppose that we have certain homogeneous boundary conditions on y. Demanding that the
boundary terms vanish will induce homogeneous boundary conditions on w. If L = L∗ and the
boundary conditions stay the same, the problem is self-adjoint. If only L = L∗ , then we call L
self-adjoint, or Hermitian.
• Eigenfunctions of the adjoint problem have the same eigenvalues as the original problem. That
is, if Ly = λy, there is a w so that L∗ w = λw. This is intuitive thinking of L∗ as the transpose
of L, though we can’t formally prove it.
• Consider eigenfunctions
$$Ly_j = \lambda_j y_j, \qquad Ly_k = \lambda_k y_k$$
where the latter yields an adjoint eigenfunction obeying L∗wₖ = λₖwₖ. Then if λⱼ ≠ λₖ, then
⟨yⱼ, wₖ⟩ = 0. This follows from the same proof as for matrices.
• To solve a general inhomogeneous boundary value problem, we solve the eigenvalue problem
(subject to homogeneous boundary conditions) as well as the adjoint eigenvalue problem, to
obtain (λⱼ, yⱼ, wⱼ). To obtain a solution for Ly = f (x) we assume
$$y = \sum_i c_i y_i(x).$$
Taking the inner product of Ly = f with wⱼ and using the orthogonality above gives
cⱼ = ⟨wⱼ, f⟩/(λⱼ⟨wⱼ, yⱼ⟩).
• Finally, consider the case of inhomogeneous boundary conditions. Such a problem can always
be split into an inhomogeneous problem with homogeneous boundary conditions, and a homoge-
neous problem with inhomogeneous boundary conditions. Since solving homogeneous problems
tends to be easier, this case isn’t much harder.
Performing the decomposition described above, the homogeneous boundary conditions are simply
y(0) = y(1) = 0, so the eigenfunctions are
$$y_k(x) = \sin(k\pi x), \qquad \lambda_k = -k^2\pi^2, \qquad k = 1, 2, \ldots.$$
• Consider a general second-order linear differential operator and its homogeneous equation,
$$L = P(x)\frac{d^2}{dx^2} + R(x)\frac{d}{dx} - Q(x), \qquad Ly = 0.$$
Dividing by P (x) and using an integrating factor, we can write
$$\frac{1}{P(x)}L = \frac{d^2}{dx^2} + \frac{R(x)}{P(x)}\frac{d}{dx} - \frac{Q(x)}{P(x)} = e^{-\int^x R(t)/P(t)\,dt}\, \frac{d}{dx}\, e^{\int^x R(t)/P(t)\,dt}\, \frac{d}{dx} - \frac{Q(x)}{P(x)}.$$
Assuming P (x) ≠ 0, the equation Ly = 0 is equivalent to (1/P (x))Ly = 0. Hence any L can
be taken to have the form
$$L = \frac{d}{dx}\, p(x)\, \frac{d}{dx} - q(x)$$
without loss of generality. Operators in this form are called Sturm-Liouville operators.
• Sturm-Liouville operators are self-adjoint with respect to the inner product (f, g) = ∫ₐᵇ f∗g dx,
provided that the functions on which they act obey appropriate boundary conditions. To see
this, apply integration by parts for
$$(Lf, g) - (f, Lg) = \left[\, p(x)\left( \frac{df^*}{dx}\, g - f^*\, \frac{dg}{dx} \right) \right]_a^b.$$
• There are several possible boundary conditions that ensure the boundary term vanishes. For
example, we can demand
$$f(a)/f'(a) = c_a, \qquad f(b)/f'(b) = c_b$$
for constants c_a and c_b, for all functions f . Alternatively, we can demand periodicity,
$$f(a) = f(b), \qquad f'(a) = f'(b), \qquad p(a) = p(b).$$
Another possibility is that p(a) = p(b) = 0, in which case the term automatically vanishes.
Naturally, we always assume the functions are smooth.
• More generally, we allow a weight function w(x) in the eigenvalue problem,
$$Ly(x) = \lambda\, w(x)\, y(x).$$
The weight function must be real, nonnegative, and have finitely many zeroes on the domain
[a, b]. It isn't necessary, as we can remove it by redefining y and L, but it will be convenient.
• We correspondingly define the weighted inner product
$$(f, g)_w = \int_a^b f^*(x)\, g(x)\, w(x)\, dx$$
so that (f, g)w = (f, wg) = (wf, g). The conditions on the weight function are chosen so that
the inner product remains nondegenerate, i.e. (f, f )w = 0 implies f = 0. We take the weight
function to be fixed for each problem.
• By the usual proof, if L is self-adjoint, then the eigenvalues λ are real. Moreover, since everything
is real except for the functions themselves, f∗ is an eigenfunction if f is. Thus we can always
switch basis to Re f and Im f , so the eigenfunctions can be chosen real. In particular, we can
construct an orthonormal set {Yₙ(x)} from eigenfunctions yₙ(x) by setting Yₙ = yₙ/√((yₙ, yₙ)).
• One can show that the eigenvalues form a countably infinite sequence {λₙ} with |λₙ| → ∞
as n → ∞, and that the eigenfunctions Yₙ(x) form a complete set for functions satisfying the
given boundary conditions. Thus we may always expand such a function f as
$$f(x) = \sum_{n=1}^{\infty} f_n Y_n(x), \qquad f_n = (Y_n, f)_w = \int_a^b Y_n^*(x)\, f(x)\, w(x)\, dx.$$
Example. We choose periodic boundary conditions on [−L, L] with L = d²/dx² and w(x) = 1.
Solving the eigenfunction equation
$$y''(x) = \lambda y(x)$$
gives solutions
$$y_n(x) = \exp(in\pi x/L), \qquad \lambda_n = -\left(\frac{n\pi}{L}\right)^2, \qquad n \in \mathbb{Z}.$$
Thus we’ve recovered the Fourier series.
Example. Consider the differential equation
$$\frac{1}{2}H'' - xH' = -\lambda H, \qquad x \in \mathbb{R}$$
subject to the condition that H(x) grows sufficiently slowly at infinity, to ensure inner products
exist. Using the method of integrating factors, we rewrite the equation in Sturm-Liouville form,
$$\frac{d}{dx}\left( e^{-x^2}\, \frac{dH}{dx} \right) = -2\lambda\, e^{-x^2} H(x).$$
This is now an eigenfunction equation with weight function w(x) = e^{−x²}. Thus weight functions
naturally arise when converting general second-order linear differential operators to Sturm-Liouville
form. The solutions are the Hermite polynomials,
$$H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}$$
and they are orthogonal with respect to the weight function w(x).
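Note. A minimal numerical check in Python (assuming NumPy and SciPy): Gauss-Hermite
quadrature, whose weights absorb the factor e^{−x²}, verifies the orthogonality relation
∫ Hₘ(x)Hₙ(x)e^{−x²} dx = √π 2ⁿ n! δₘₙ.

import numpy as np
from scipy.special import eval_hermite, factorial

x, w = np.polynomial.hermite.hermgauss(60)   # nodes and weights for weight e^{-x^2}
for m in range(4):
    for n in range(4):
        I = np.sum(w * eval_hermite(m, x) * eval_hermite(n, x))
        exact = np.sqrt(np.pi) * 2**n * factorial(n) * (m == n)
        assert np.isclose(I, exact)
print("orthogonal")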
Example. Consider the inhomogeneous equation
$$L\phi(x) = w(x)\, F(x)$$
where F (x) is a forcing term. Expanding in the eigenfunctions yields the particular solution
$$\phi_p(x) = \sum_{n=1}^{\infty} \frac{(Y_n, F)_w}{\lambda_n}\, Y_n(x).$$
Alternatively, expanding this as an integral and defining f (x) = w(x)F (x), we have
$$\phi_p(x) = \int_a^b G(x, \xi)\, f(\xi)\, d\xi, \qquad G(x, \xi) = \sum_{n=1}^{\infty} \frac{Y_n(x)\, Y_n^*(\xi)}{\lambda_n}.$$
The function G is called a Green’s function, and it provides a formal inverse to L. It gives the
response at x to forcing at ξ.
9.3 Distributions
We now take a detour by defining distributions, as the Dirac delta ‘function’ will be needed later.
• Given a domain Ω, we choose a class of test functions D(Ω). The test functions are required to
be infinitely smooth and have compact support; one example is
$$\psi(x) = \begin{cases} e^{-1/(1-x^2)} & |x| < 1, \\ 0 & \text{otherwise}. \end{cases}$$
• A distribution T is a continuous linear functional on the test functions. Any sufficiently regular
function f (x) defines a distribution via T [φ] = ∫ f (x)φ(x) dx, but there are also distributions,
such as the Dirac delta
$$\delta[\phi] = \phi(0)$$
which cannot be thought of this way. Though we often write the Dirac δ-function under integrals,
we always implicitly think of it as a functional of test functions.
• The Dirac δ-function can also be defined as the limit of a sequence of distributions, e.g.
$$G_n(x) = \frac{n\, e^{-n^2 x^2}}{\sqrt{\pi}}.$$
In terms of functions, the limit limₙ→∞ Gₙ(x) does not exist. But if we view the functions
as distributions, we have limₙ→∞ (Gₙ, φ) = φ(0) for each φ, giving a limiting distribution, the
Dirac delta.
• Motivated by integration by parts, we define the derivative of a distribution by
$$T'[\phi] = -T[\phi'].$$
This trick means that distributions are infinitely differentiable, despite being incredibly badly
behaved! For example, δ′[φ] = −φ′(0). As another example, the step function Θ(x) is not
differentiable as a function, but as a distribution,
$$\Theta'[\phi] = -\Theta[\phi'] = -\int_0^\infty \phi'(x)\, dx = \phi(0)$$
which gives Θ′ = δ.
• As an example, the Fourier series of the δ-function is
$$\delta(\theta) = \frac{1}{2\pi}\sum_{n\in\mathbb{Z}} e^{in\theta}.$$
Again, the right-hand side must be thought of as a limit of a series of distributions. When
integrated against a test function φ(x), it extracts the sum of the Fourier coefficients φ̂ₙ, which
yields φ(0).
• Similarly, we can expand the Dirac δ-function in any basis of orthonormal functions,
$$\delta(x - \xi) = \sum_n c_n Y_n(x), \qquad c_n = \int_a^b Y_n^*(x)\, \delta(x - \xi)\, w(x)\, dx = Y_n^*(\xi)\, w(\xi)$$
where we can replace w(ξ) with w(x), since δ(x−ξ) is zero for all x ≠ ξ. To check this expression,
note that if g(x) = Σₘ dₘYₘ(x), then
$$\int_a^b g^*(x)\, \delta(x - \xi)\, dx = \sum_{m,n} Y_n^*(\xi)\, d_m^* \int_a^b w(x)\, Y_m^*(x)\, Y_n(x)\, dx = \sum_m d_m^* Y_m^*(\xi) = g^*(\xi).$$
We will apply the eigenfunction expansion of the Dirac δ-function to Green’s functions below.
Note. Principal value integrals. Suppose we wanted to view the function 1/x as a distribution.
This isn't possible directly because of the divergence at x = 0, but we can use the principal value
$$P\frac{1}{x}[f(x)] = \lim_{\epsilon \to 0^+} \left( \int_{-\infty}^{-\epsilon} \frac{f(x)}{x}\, dx + \int_{\epsilon}^{\infty} \frac{f(x)}{x}\, dx \right).$$
All the integrals here are real, but for many applications, f (x) will be a meromorphic complex
function. Then we can simply evaluate the principal value integral by taking a contour that goes
around the pole at x = 0 by a semicircle, and closes at infinity.
Note. We may also regulate 1/x by adding an imaginary part to x. The Sokhotsky formula is
$$\lim_{\epsilon \to 0^+} \frac{1}{x + i\epsilon} = P\frac{1}{x} - i\pi\delta(x)$$
where both sides do not converge as functions, but merely as distributions. This can be shown
straightforwardly by integrating both sides against a test function and taking real and imaginary
parts; note that we cannot assume the test function is analytic and use contour integration.
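Note. The Sokhotsky formula can be checked numerically by exactly this procedure. A minimal
sketch in Python (assuming NumPy and SciPy; the test function φ(x) = e^{−x²} and the value of
ε are illustrative):

import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x**2)
eps = 1e-3
re, _ = quad(lambda x: x * phi(x) / (x**2 + eps**2), -50, 50, points=[0], limit=200)
im, _ = quad(lambda x: -eps * phi(x) / (x**2 + eps**2), -50, 50, points=[0], limit=200)
# Here phi is even, so the principal value integral vanishes by symmetry,
# and the imaginary part should approach -pi * phi(0) = -pi.
print(re, im, -np.pi)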
Example. A Kramers-Kronig relation. Suppose that our test function f (x) is analytic in the
lower-half plane and decays sufficiently quickly. Then applying 1/(x + iε) to f (x) gives zero by
contour integration, so we have
$$P\int_{-\infty}^{\infty} \frac{f(x)}{x}\, dx = i\pi f(0)$$
by the Sokhotsky formula. In particular, this relates the real and imaginary parts of f (x).
Note. One has to be careful with performing algebra with distributions. Suppose that xa(x) = 1
where a(x) is a distribution, and both sides are regarded as distributions. Then dividing by x is
not invertible; we instead have
$$a(x) = P\frac{1}{x} + A\delta(x)$$
where A is not determined. This is important for Green’s functions below.
• We now consider the second-order linear differential operator
$$L = \alpha(x)\frac{d^2}{dx^2} + \beta(x)\frac{d}{dx} + \gamma(x)$$
defined on [a, b], and wish to solve the problem Ly(x) = f (x) where f (x) is a forcing term.
For mechanical systems, such terms represent literal forces; for first-order systems such as heat,
they represent sources.
• We define the Green's function G(x, ξ) by
$$LG = \delta(x - \xi)$$
where L always acts solely on x. To get a unique solution, we must also set boundary conditions;
for concreteness we choose G(a, ξ) = G(b, ξ) = 0.
• The Green’s function G(x, ξ) is the response to a δ-function source at ξ. Regarding the equation
above as a matrix equation, it is the inverse of L, and the solution to the problem with general
forcing is
Z b
y(x) = G(x, ξ)f (ξ) dξ.
a
Here, the integral is just a continuous variant of matrix multiplication. The differential operator
L can be thought of the same way; its matrix elements are derivatives of δ-functions.
• To construct the Green’s function, take a basis of solutions {y1 , y2 } to the homogeneous equation
(i.e. no forcing term) such that y1 (a) = 0 and y2 (b) = 0. Then we must have
$$G(x, \xi) = \begin{cases} A(\xi)\, y_1(x) & x < \xi, \\ B(\xi)\, y_2(x) & x > \xi. \end{cases}$$
• Next, we need to join these solutions together at x = ξ. We know that LG has only a δ-function
singularity at x = ξ. Hence the singularity must be provided by the second derivative, or else we
would get stronger singularities; then the first derivative has a discontinuity while the Green’s
function itself is continuous. Explicitly,
$$G(x = \xi^-, \xi) = G(x = \xi^+, \xi), \qquad \left.\frac{\partial G}{\partial x}\right|_{x=\xi^+} - \left.\frac{\partial G}{\partial x}\right|_{x=\xi^-} = \frac{1}{\alpha(\xi)}.$$
Solving these conditions gives
$$A(\xi) = \frac{y_2(\xi)}{\alpha(\xi)\, W(\xi)}, \qquad B(\xi) = \frac{y_1(\xi)}{\alpha(\xi)\, W(\xi)}.$$
Here, W = y₁y₂′ − y₂y₁′ is the Wronskian, and it is nonzero because the solutions form a basis.
(A numerical check of this construction is sketched after this list.)
• This reasoning fully generalizes to higher order ODEs. For an nth order ODE, we have a basis
of n solutions, a discontinuity in the (n − 1)th derivative, and n − 1 continuity conditions.
• If the boundary conditions are inhomogeneous, we use the linearity trick again: we solve the
problem with inhomogeneous boundary conditions but no forcing (using our earlier methods),
and with homogeneous boundary conditions with forcing.
• We can also compute the Green's function in terms of the eigenfunctions. Letting G(x, ξ) =
Σₙ Ĝₙ(ξ)Yₙ(x) and expanding LG = δ(x − ξ) gives
$$w(x)\sum_n \hat{G}_n(\xi)\, \lambda_n\, Y_n(x) = w(x)\sum_n Y_n(x)\, Y_n^*(\xi)$$
which implies Ĝn (ξ) = Yn∗ (ξ)/λn . This is the same result we found several sections earlier.
• Note that the coefficients Ĝn (ξ) are singular if λn = 0. This is simply a manifestation of the
fact that Ax = b has no unique solution if A has a zero eigenvalue.
• Green’s functions can be defined for a variety of boundary conditions. For example, when time
is the independent variable with t ∈ [t0 , ∞), then we might take y(t0 ) = y 0 (t0 ) = 0. Then the
Green’s function G(t, τ ) must be zero until t = τ , giving the retarded Green’s function. Using
a ‘final’ condition instead would give the advanced Green’s function.
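Note. A minimal numerical check of the construction above, in Python (assuming NumPy and
SciPy; the operator L = d²/dx² on [0, 1] with G(0, ξ) = G(1, ξ) = 0 is illustrative). Here y₁ = x,
y₂ = x − 1, α = 1, and W = 1, so G(x, ξ) = x(ξ − 1) for x < ξ and ξ(x − 1) for x > ξ.

import numpy as np
from scipy.integrate import trapezoid

x = np.linspace(0, 1, 401)
xi = x
G = np.where(x[:, None] < xi[None, :],
             x[:, None] * (xi[None, :] - 1),    # A(xi) y1(x) branch
             xi[None, :] * (x[:, None] - 1))    # B(xi) y2(x) branch
y = trapezoid(G, xi, axis=1)                    # y(x) = int G(x, xi) f(xi) dxi, f = 1
print(np.allclose(y, (x**2 - x) / 2))           # solution of y'' = 1, y(0) = y(1) = 0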
$$f = \frac{dt}{dx} = n(z)\sqrt{1 + z'^2}$$
which has no explicit x-dependence, giving the first integral (a − bz)/√(1 + z′²). Separating and
integrating shows that the path is a parabola; a linear n(z) would give a circle.
Example. The brachistochrone. A bead slides on a frictionless wire from (0, 0) to (x, y) with y
positive in the downward direction. We have
$$f = \frac{dt}{dx} \propto \sqrt{\frac{1 + (y')^2}{y}}$$
which yields the first integral 1/√(y(1 + y′²)). Separating and integrating, then parametrizing ap-
propriately gives
$$x = c(\theta - \sin\theta), \qquad y = c(1 - \cos\theta)$$
which is a cycloid.
Example. The isoperimetric problem: maximize the area enclosed by a curve with fixed perimeter.
To handle this constrained variation, we use Lagrange multipliers. In general, if we have the
constraint P [y] = c, then we extremize the functional F [y] − λP [y] without constraint, then pick λ
to satisfy the constraint. (For multiple constraints, we just add one term for each constraint, with
a different λᵢ.) In this case, the area and perimeter are
$$A[y] = \oint_C y(x)\, dx, \qquad P[y] = \oint_C \sqrt{1 + (y')^2}\, dx$$
where x is integrated from α to β (for the top half), then back down from β to α (for the bottom
half). We must extremize the functional
$$f[y] = y - \lambda\sqrt{1 + y'^2}.$$
• Consider transforming the coordinates by an infinitesimal amount proportional to s,
$$q \to q + s\,\delta q, \qquad \dot{q} \to \dot{q} + s\,\delta\dot{q}.$$
Note that δq̇ = d(δq)/dt, because we are varying along paths, on which q̇ and q are related.
• For this transformation to be a symmetry, the Lagrangian must change by a total derivative,
as this preserves stationary paths of the action,
$$\delta L = s\left( \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot{q}}\,\delta\dot{q} \right) = s\,\frac{dK}{dt}.$$
Applying the Euler-Lagrange equations, on shell we have
$$s\,\frac{dK}{dt} = s\,\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}}\,\delta q \right) \quad\to\quad \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}}\,\delta q - K \right) = 0.$$
This is Noether’s theorem.
• To get a shortcut for finding a conserved quantity, promote s to a function s(t). Then we pick
up an extra term,
$$\delta L = s\left( \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot{q}}\,\delta\dot{q} \right) + \dot{s}\,\delta q\,\frac{\partial L}{\partial \dot{q}} = s\,\frac{dK}{dt} + \dot{s}\,\delta q\,\frac{\partial L}{\partial \dot{q}}$$
where K is defined as above. Simplifying,
$$\delta L = \frac{d}{dt}(sK) + \dot{s}\left( \delta q\,\frac{\partial L}{\partial \dot{q}} - K \right)$$
so that the conserved quantity is the coefficient of ṡ. This procedure can be done without
knowing K beforehand; the point is to simplify the variation into the sum of a total derivative
and a term proportional to ṡ, which is only possible when we are considering a real symmetry.
• We can also phrase the shortcut differently. Suppose we can get the variation in the form
$$\delta L = s\dot{K} + \dot{s}J$$
and the variation of the action must vanish on-shell for any variation, including a variation from
a general s(t). Then we need K̇ − J̇ = 0, so K − J is conserved. This is simply a rephrasing of
the previous method. (Note that we can always write δL as linear in s and ṡ, but the coefficient
of s will only be a total derivative when we are dealing with a symmetry.)
• The same setup can be done in Hamiltonian mechanics, where the action is
$$I[q, p] = \int \left( p\dot{q} - H(q, p) \right) dt$$
and q and p are varied independently, with fixed endpoints for q. This is distinct from the
Lagrangian picture where q and q̇ cannot be varied independently on paths, even if they are
off-shell. In the Hamiltonian picture, q̇ and p are only related on on-shell paths.
Example. Time translation, with δq = q̇. If time translational symmetry holds, ∂L/∂t = 0, giving
K = L and the conserved quantity
$$H = \dot{q}\,\frac{\partial L}{\partial \dot{q}} - L.$$
On the other hand, using our shortcut method in Hamiltonian mechanics,
$$\delta(p\dot{q} - H) = s\,\frac{d}{dt}(p\dot{q} - H) + \dot{s}\,p\dot{q}$$
where we used ∂H/∂t = 0. We then directly read off the conserved quantity H.
We can also handle functionals of functions with multiple arguments, in which case the Euler-
Lagrange equation gives partial differential equations. Note that this is different from functionals
of multiple functions, in which case we get multiple Euler-Lagrange equations.
Example. A minimal surface is a surface of minimal area satisfying some boundary conditions.
The functional is
$$F[y] = \int dx_1\, dx_2\, \sqrt{1 + y_1^2 + y_2^2}, \qquad y_i = \frac{\partial y}{\partial x_i}$$
which can be seen by rotating into a coordinate system where y2 = 0. Denoting the integrand as f ,
the Euler-Lagrange equation is
$$\frac{d}{dx_i}\frac{\partial f}{\partial y_i} = \frac{\partial f}{\partial y}$$
and the right-hand side is zero. Simplifying gives the minimal surface equation
$$(1 + y_2^2)\,y_{11} - 2y_1 y_2\, y_{12} + (1 + y_1^2)\,y_{22} = 0, \qquad y_{ij} = \frac{\partial^2 y}{\partial x_i\,\partial x_j}.$$
Example. Functionals like the one above are common in field theories. For example, the action
for waves on a string is
$$S[y] = \frac{1}{2}\int dx\, dt\, \left( \rho\dot{y}^2 - T y'^2 \right).$$
Using our Euler-Lagrange equation above, there is no dependence on y, giving
$$\frac{d}{dx}(-T y') + \frac{d}{dt}(\rho\dot{y}) = 0$$
which yields the wave equation. It can be somewhat confusing to treat x and t on the same footing
in this way, so sometimes it’s easier to set the variation to zero directly.
10 Methods for PDEs
We begin with Laplace's equation,
$$\nabla^2\psi = 0.$$
Later, we will apply our results to the study of the heat, wave, and Schrodinger equations,
$$K\nabla^2\psi = \frac{\partial\psi}{\partial t}, \qquad c^2\nabla^2\psi = \frac{\partial^2\psi}{\partial t^2}, \qquad -\nabla^2\psi + V(x)\psi = i\frac{\partial\psi}{\partial t}.$$
Separating the time dimension in these equations will often yield a Helmholtz equation in space,
$$\nabla^2\psi + k^2\psi = 0.$$
Finally, an important variant of the wave equation is the massive Klein-Gordon equation,
$$c^2\nabla^2\psi - m^2\psi = \frac{\partial^2\psi}{\partial t^2}.$$
As shown in electromagnetism, the solution to Laplace’s equation is unique given Dirichlet or
Neumann boundary conditions. We always work in a compact spatial domain Ω.
Example. In two dimensions, Laplace’s equation is equivalent to
$$\frac{\partial^2\psi}{\partial z\,\partial\bar{z}} = 0$$
where z = x + iy. Thus the general solution is ψ(x, y) = φ(z) + χ(z̄), where φ and χ are holomorphic
and antiholomorphic. For example, suppose we wish to solve Laplace’s equation inside the unit disc
subject to ψ = f (θ) on the boundary. We may write the boundary condition as a Fourier series,
$$f(\theta) = \sum_{n\in\mathbb{Z}} \hat{f}_n e^{in\theta}.$$
Now note that at |z| = 1, zⁿ and z̄ⁿ reduce to e^{inθ} and e^{−inθ}. Thus the solution inside the
disc is
$$\psi(x, y) = \hat{f}_0 + \sum_{n=1}^{\infty}\left( \hat{f}_n z^n + \hat{f}_{-n} \bar{z}^n \right)$$
which is indeed the sum of a holomorphic and antiholomorphic function. Similarly, to get a bounded
solution outside the disc, we simply flip the powers.
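Note. A minimal numerical sketch in Python (assuming NumPy; the boundary data f (θ) = cos θ,
whose exact solution is ψ = x, is illustrative). The FFT supplies the Fourier coefficients of the
boundary data, and the series above is summed at an interior point.

import numpy as np

M = 256
theta = 2 * np.pi * np.arange(M) / M
fhat = np.fft.fft(np.cos(theta)) / M           # approximates (1/2pi) int e^{-in theta} f
z = 0.3 + 0.4j                                 # an interior point, |z| < 1
n = np.arange(1, M // 2)
psi = fhat[0] + np.sum(fhat[n] * z**n + fhat[-n] * np.conj(z)**n)
print(psi.real, z.real)                        # both approximately 0.3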
Next, we introduce the technique of separation of variables.
• Suppose the boundary conditions are given in a three-dimensional rectangular region. Then it
is convenient to separate in Cartesian coordinates. Writing ψ = X(x)Y (y)Z(z), Laplace's equation
splits into an ODE for each factor, with the three separation constants summing to zero.
• Generally, we see that separation converts PDEs into individual Sturm-Liouville problems, with
a specified relation between the eigenvalues (in this case, they must sum to zero). Each solution
is a normal mode of the system – we’ve seen this vocabulary before, applied to eigenvalues in
time. Homogeneous boundary conditions (e.g. ‘zero on this surface’) then give constraints on
the allowed eigenvalues.
• Finally, we arrive at a set of allowed solutions and superpose them to satisfy a set of given
inhomogeneous boundary conditions. This is often simplified by the orthogonality of the
eigenfunctions; we project the inhomogeneous term onto each one.
• For the angular equation, we substitute x = cos θ, so that x ∈ [−1, 1], giving
$$\frac{d}{dx}\left( (1 - x^2)\frac{d\Theta}{dx} \right) = -\lambda\Theta.$$
This is a Sturm-Liouville equation, which is self-adjoint because p(±1) = 0, with weight function
w(x) = 1. The solutions are hence orthogonal on [−1, 1].
• The solutions are the Legendre polynomials, obeying the Rodrigues formula
$$P_\ell(x) = \frac{1}{2^\ell\, \ell!}\frac{d^\ell}{dx^\ell}(x^2 - 1)^\ell, \qquad \lambda = \ell(\ell+1), \qquad \ell = 0, 1, \ldots.$$
They can be found by guessing a series solution and demanding the series truncates to a
finite-degree polynomial. An explicit calculation shows that
$$\int_{-1}^{1} P_m(x)\, P_\ell(x)\, dx = \frac{2}{2\ell+1}\,\delta_{m\ell}.$$
As in the previous example, any axisymmetric boundary condition on a sphere can be expanded
in Legendre polynomials.
• As an application, applying our results to the field of a point charge gives the multipole
expansion, where ` = 0 is the monopole, ` = 1 is the dipole, and so on.
• Allowing for dependence on φ, the φ equation has solution Φ(φ) = eimφ for integer m, while
the θ equation yields an associated Legendre function; the radial equation remains the same.
• Separating the Helmholtz equation in plane polar coordinates, with angular dependence e^{inφ},
gives the radial equation
$$\frac{d}{dr}\left( r\,\frac{dR}{dr} \right) - \frac{n^2}{r}\,R = -\mu r R.$$
• The eigenvalue µ doesn't matter because it simply sets the length scale. Eliminating it by
setting x = r√µ gives Bessel's equation of order n,
$$x^2\frac{d^2R}{dx^2} + x\frac{dR}{dx} + (x^2 - n^2)R = 0.$$
The solutions are the Bessel functions Jn (x) and Yn (x).
• The Bessel functions of the first kind, Jn (x), are regular at the origin, but the Yn (x) are not;
thus we can ignore them if we care about the region x → 0.
• Solving the Helmholtz equation in three dimensions (again, often encountered by separating
out time) yields the spherical Bessel functions jn (x) and yn (x). They behave somewhat like
regular Bessel functions of order n + 1/2, but fall as 1/x for large x instead.
Next, we turn to the heat equation. Since it involves time, we write its solutions as Φ, while ψ is
reserved for space only.
• For positive diffusion constant K, the heat equation ‘spreads heat out’, so it is only defined for
t ∈ [0, ∞). If we try to follow the time evolution backwards, we generically get singularities at
finite time.
• The heat flux is K∇Φ. Generally, we can show that the total heat ∫ Φ dV is conserved as long
as no heat flux goes through the boundary.
• Another useful property is that if Φ(x, t) solves the heat equation, then so does Φ(λx, λ²t),
as can be checked explicitly. Then the time dependence of any solution can be written as a
function of the similarity variable η = x/√(Kt).
Substituting the ansatz Φ = t^{−1/2}F (η) into the heat equation and integrating once gives
2F′ + ηF = const. Taking the constant to be zero and normalizing yields
$$G(x, t) = \frac{\exp(-x^2/4Kt)}{\sqrt{4\pi K t}}.$$
This is called the heat kernel, or the fundamental solution of the heat equation; at t = 0 it
limits to δ(x). Convolving it with the state at time t₀ gives the state at time t₀ + t.
That is, high eigenvalues are quickly suppressed. For example, if we work on the line, where the
spatial solutions are exponentials, and recall the decay properties of Fourier series, evolution
under the heat equation for an infinitesimal time removes discontinuities!
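Note. Time evolution by convolution with the heat kernel can be sketched numerically in Python
(assuming NumPy; the Gaussian initial condition is illustrative, since then the exact answer is
another Gaussian whose variance grows by 2Kt):

import numpy as np

K, t = 1.0, 0.5
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
phi0 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)              # initial data, variance 1
kernel = np.exp(-x**2 / (4 * K * t)) / np.sqrt(4 * np.pi * K * t)
phi = np.convolve(phi0, kernel, mode="same") * dx          # state at time t
s2 = 1 + 2 * K * t                                         # variances add
exact = np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
print(np.allclose(phi, exact, atol=1e-6))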
• Since the heat equation involves time, we must also supply an initial condition along with
standard spatial boundary conditions. We now prove uniqueness for Dirichlet conditions in
time and space. Let Φ1 and Φ2 be solutions and let δΦ be their difference. Then
$$\frac{d}{dt}\int_\Omega (\delta\Phi)^2\, dV \propto \int_\Omega (\delta\Phi)\,\nabla^2(\delta\Phi)\, dV = -\int_\Omega (\nabla\delta\Phi)^2\, dV \leq 0$$
where we integrated by parts and applied the boundary conditions to remove the surface term.
Then the left-hand side is decreasing, but it starts at zero by the initial conditions, so it is
always zero. (We can also show this by separating variables.)
• The spatial domain Ω must be compact for the integrals above to exist. For example, in an
infinite domain we can have heat forever flowing in from infinity, giving a nonunique solution.
Example. The cooling of the Earth. We model the Earth as a sphere of radius R with an isotropic
heat distribution and initial conditions
$$\Phi(r, t = 0) = \Theta_0, \qquad \Phi(R, t) = 0$$
so that the Earth starts with a uniform temperature, with zero temperature at the surface (i.e. outer
space). We separate variables by Φ(r, t) = R(r)T (t), giving
$$\frac{d}{dr}\left( r^2\,\frac{dR}{dr} \right) = -\lambda^2 r^2 R, \qquad \frac{dT}{dt} = -\lambda^2 K T.$$
We then choose the coefficients Aₙ to fit the inhomogeneous initial condition. At time t = 0,
$$r\Theta_0 = \sum_n A_n \sin\frac{n\pi r}{R} \quad\to\quad A_n = \frac{2\Theta_0}{R}\int_0^R \sin\left(\frac{n\pi r}{R}\right) r\, dr = (-1)^{n+1}\,\frac{2\Theta_0 R}{n\pi}.$$
The solution is not valid for r > R because the thermal diffusivity K changes, from the value for
rock to the value for air.
Note. Solving problems involving the wave equation is rather similar; the only difference is that
we get oscillation in time rather than exponential decay, and that we need both an initial position
and velocity. To prove uniqueness, we use the energy functional
$$E = \frac{1}{2}\int_\Omega \left( \dot\phi^2 + c^2(\nabla\phi)^2 \right) dV$$
which is positive definite and conserved. Then the difference of two solutions has zero initial energy,
so it must be zero.
Note. There is no fundamental difference between initial conditions and (spatial) boundary con-
ditions: they both are conditions on the boundary of the spacetime region where the PDE holds;
Dirichlet and Neumann boundary conditions correspond exactly to initial positions and velocities.
However, in practice they are treated differently because the time condition is ‘one-sided’: while we
can specify that a rope is held at both of its ends, we usually can’t specify where it’ll be both now
and in the future. As a result, while we often need only one (two-sided) boundary condition to get
uniqueness, we need as many initial conditions as there are time derivatives.
Note. In our example above, the initial condition is inhomogeneous and the boundary condition is
homogeneous. But if both were inhomogeneous, our method would fail because we wouldn’t have
any conditions to constrain the eigenvalues. In this case the trick is to use linearity, which turns
the problem into the sum of two problems, each with one homogeneous condition.
All integrals in this section are over the real line. We define the Fourier transform by
$$\tilde{f}(k) = \mathcal{F}[f(x)] = \int f(x)\, e^{-ikx}\, dx.$$
The Fourier transform is linear, and obeys
$$\mathcal{F}[f(x - a)] = e^{-ika}\tilde{f}(k), \qquad \mathcal{F}[e^{i\ell x} f(x)] = \tilde{f}(k - \ell), \qquad \mathcal{F}[f(cx)] = \frac{\tilde{f}(k/c)}{|c|}.$$
• Differentiation becomes multiplication,
$$\mathcal{F}[f'(x)] = ik\tilde{f}(k).$$
This allows differential equations with forcing to be rewritten nicely: if L(∂)y(x) = f (x), then
L(ik)ỹ(k) = f̃(k), so ỹ(k) = f̃(k)/L(ik).
• Defining the Fourier transform of a δ-function requires some more distribution theory, but
naively we have F[δ(x)] = 1, with the inverse Fourier transform implying the integral
$$\int e^{-ikx}\, dx = 2\pi\delta(k)$$
which implies, for example, F[1] = 2πδ(k) and F[e^{iℓx}] = 2πδ(k − ℓ).
Example. The Fourier transform of a step function Θ(x) is subtle. In general, the Fourier trans-
forms of ordinary functions can be distributions, because functions in Fourier space are only linked
to observable quantities in real space via integration. Naively, we would have 1/ik since δ is the
derivative of Θ, but this is incorrect because dividing by k gives us extra δ(k) terms we haven't
determined. Instead, we add an infinitesimal damping Θ(x) → Θ(x)e^{−εx}, giving
$$\mathcal{F}[\Theta] = \lim_{\epsilon\to 0^+} \frac{1}{\epsilon + ik} = P\frac{1}{ik} + \pi\delta(k)$$
by the Sokhotsky formula. As a consistency check, we have
$$\mathcal{F}[\Theta(-x)] = -P\frac{1}{ik} + \pi\delta(k)$$
and the two sum to 2πδ(k), which is indeed the Fourier transform of 1.
Note. There is an alternative way to think about the Fourier transform of the step function. For
any function f (x), split
f (x) = f+ (x) + f− (x)
where the two terms have support for positive and negative x respectively. Then take the Fourier
transform of each piece. The point of this split is that for nice functions, the Fourier integral
$$\tilde{f}_+(k) = \int_0^\infty f_+(x)\, e^{ikx}\, dx$$
will converge as long as Im k is sufficiently large; note we are now thinking of k as complex-valued.
The Fourier transform can be inverted as long as we follow a contour across the complex k plane in
this region of large Im k. For the step function, we hence have
$$\mathcal{F}[\Theta] = \frac{1}{ik}, \qquad \mathrm{Im}\, k > 0.$$
The expression is not valid at Im k = 0, so we cannot integrate along this axis. This removes the
ambiguity of whether we cross the pole above or below, at the cost of having to keep track of where
in the complex plane FΘ is defined. Often, as here, we can analytically continue f˜+ and f˜− to a
much greater region of the complex plane. A Fourier inversion contour is then valid as long as it
passes above all the singularities of f˜+ and below those of f˜− . In a more general situation, there
could also be branch cuts that obstruct the contour.
However, this integral does not exist, so we must resort to performing a contour integral around the
poles. This ad hoc procedure makes more sense using distribution theory. We can't really divide
by k² + m² since G̃(k) is a distribution, so instead
$$\tilde{G}(k) = P\frac{1}{k^2 + m^2} + g_1\delta(k - m) + g_2\delta(k + m)$$
with g₁ and g₂ undetermined, reflecting the fact that the Green's function is not uniquely defined
without boundary conditions. By the Sokhotsky formula, we can go back and forth between the
principal value and the iε regulator at the cost of modifying g₁ and g₂. This is extremely useful
because of the link between causality and analyticity, as we saw for the Kramers-Kronig relations.
In particular, the retarded and advanced Green's functions are just
$$\tilde{G}_{\rm ret}(k) = \frac{1}{k^2 - m^2 - i\epsilon k}, \qquad \tilde{G}_{\rm adv}(k) = \frac{1}{k^2 - m^2 + i\epsilon k}$$
with no need for more delta function terms at all. Similarly, if we had a PDE instead, the general
Green's function would be
$$\tilde{G}(k) = P\frac{1}{k^2 + m^2} + g(k)\,\delta(k^2 - m^2)$$
and the function g(k) must be determined by boundary conditions.
Example. Solving another differential equation using a Fourier transform in the complex plane.
We consider Airy’s equation
$$\frac{d^2y}{dx^2} + xy = 0.$$
We write the solution as a generalized Fourier integral
$$y(x) = \int_\Gamma g(\zeta)\, e^{x\zeta}\, d\zeta.$$
Plugging this in and integrating by parts gives
$$\left[ g(\zeta)\, e^{x\zeta} \right]_{\partial\Gamma} + \int_\Gamma \left( \zeta^2 g(\zeta) - g'(\zeta) \right) e^{x\zeta}\, d\zeta = 0$$
which must vanish for all x. The first term is evaluated at the endpoints of the contour. For the
second term to vanish for all x, we must have
$$g'(\zeta) = \zeta^2 g(\zeta), \qquad g(\zeta) = Ce^{\zeta^3/3}.$$
At this point, this might seem strange, as we were supposed to have two independent solutions. But
note that in order for g(ζ)e^{xζ} to vanish at the endpoints, the contour must go to infinity in one of
the sectors where Re ζ³ < 0.
If we take a contour that starts and ends in the same region, then we will get zero by Cauchy’s
theorem. Then there are two independent contours, starting in one region and ending in another,
giving the two independent solutions; all others are related by summation or negation. Of course,
the integrals cannot be performed in closed form, but for large x the integrals are amenable to
saddle point approximation.
Note. The discrete Fourier transform applies to functions defined on Zn and is useful for computing.
It’s independent of the Fourier series we considered earlier; their common property of a discrete
spectrum comes from the compactness of the domains S 1 and Zn . More generally, we can perform
Fourier analysis on any Abelian group, or even any compact, possibly non-Abelian group.
Example. Fourier transforms are useful for linear time-translation invariant (LTI) systems, LI = O.
These are more general than linear differential operators, as L might integrate I or impose a time
delay. However, their response is local in frequency space: L(e^{iωt}) = R̃(ω)e^{iωt} for some
function R̃, so
$$\tilde{O}(\omega) = \tilde{I}(\omega)\,\tilde{R}(\omega)$$
where R̃ is called the transfer function or system function. Taking an inverse Fourier transform
gives O(t) = (I ∗ R)(t), so R behaves like a Green's function; it is called the response function.
As an explicit example, consider the case
$$\sum_{i=0}^{n} a_i\, \frac{d^i O(t)}{dt^i} = I(t)$$
for which
$$\tilde{R}(\omega) = \frac{1}{a_0 + a_1(i\omega) + \cdots + a_n(i\omega)^n} = \frac{1}{a_n}\prod_{j=1}^{J}\frac{1}{(i\omega - c_j)^{k_j}} = \sum_{j=1}^{J}\sum_{m=1}^{k_j}\frac{\Gamma_{mj}}{(i\omega - c_j)^m}$$
where the cj are the roots of the polynomial and the kj are their multiplicities, and we used partial
fractions in the last step. In the case m = 1, we recall the result from the example above,
$$\mathcal{F}[e^{\alpha t}\Theta(t)] = \frac{1}{i\omega - \alpha}, \qquad \mathrm{Re}(\alpha) < 0.$$
Therefore, using the differentiation rule, we have
$$\mathcal{F}[(t^m e^{\alpha t}/m!)\,\Theta(t)] = \frac{1}{(i\omega - \alpha)^{m+1}}, \qquad \mathrm{Re}(\alpha) < 0$$
which provides the general solution for R(t). We see that oscillatory/exponential solutions appear
as poles in the complex plane, while higher-order singularities provide higher-order resonances.
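Note. A minimal numerical sketch of the pole picture, in Python (assuming NumPy; the coefficients
are illustrative): the modes of Σ aᵢ dⁱO/dtⁱ = I decay iff every root cⱼ of the characteristic
polynomial Σ aᵢcⁱ has Re cⱼ < 0.

import numpy as np

a = [2.0, 3.0, 1.0]            # a0, a1, a2, i.e. O'' + 3 O' + 2 O = I
roots = np.roots(a[::-1])      # np.roots expects the highest power first
print(roots)                   # -> [-2., -1.]
print(np.all(roots.real < 0))  # True: all poles give decaying modes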
Example. Stabilization by negative feedback. Consider a system function R̃(ω). We say the system
is stable if it doesn’t have exponentially growing modes; this corresponds to R̃(ω) having no poles
in the upper half-plane. Now suppose we attempt to stabilize a system by adding negative feedback,
feeding the output scaled by −r and time delayed by t0 back into the input. Defining the feedback
factor k = re^{iωt₀}, the new system function is
$$\tilde{R}_{\rm loop}(\omega) = \frac{\tilde{R}(\omega)}{1 + k\tilde{R}(\omega)}$$
by the geometric series formula; this result is called Black's formula. Then the new poles are given
by the zeroes of 1 + kR̃(ω).
The Nyquist criterion is a graphical method for determining whether the new system is stable.
We consider a contour C along the real axis and closed along the upper half-plane, encompassing all
poles and zeroes of R̃(ω). The Nyquist plot is a plot of R̃(ω) along C. By the argument principle,
the number of times the Nyquist plot wraps around −1 is equal to the number of poles P of R̃(ω)
in the upper-half plane minus the number of zeroes of k R̃(ω) + 1 in the upper-half plane. Then the
system is stable if the Nyquist plot wraps around −1 exactly P times. This is useful since we only
need to know P , not the location of the poles or the number of zeroes.
Note. Causality is ‘built in’ to the Fourier transform. As we’ve seen in the above examples, damping
that occurs forward in time (as required by Re(α) < 0) automatically yields singularities only in
the upper-half plane, and causal/retarded Green’s functions that vanish for t < 0.
In general, the Green’s functions returned by the Fourier transform are regular for |t| → ∞,
which serves as an extra implicit boundary condition. For example, for the damped harmonic
oscillator we have
$$\tilde{G}(\omega) = \frac{1}{\omega_0^2 - \omega^2 - i\gamma\omega}$$
which yields a unique G(t, τ ), because the advanced solution (which blows up at t → −∞) has been
thrown out. On the other hand, for the undamped harmonic oscillator,
$$\tilde{G}(\omega) = \frac{1}{\omega_0^2 - \omega^2}$$
the Fourier inversion integral diverges, so G(t, τ ) cannot be defined. We must specify a ‘pole
prescription’, which corresponds to an infinitesimal damping. Forward damping gives the retarded
Green’s function, and reverse damping gives the advanced Green’s function. Note that there’s no
analogue of the Feynman Green’s function; that appears in field theory because there are both
positive and negative-energy modes.
• Initial conditions and boundary conditions specify the value of a function φ and/or its derivatives,
on a surface of codimension 1. In general, such information is called Cauchy data, and solving
a PDE along with given Cauchy data is called a Cauchy problem.
• A Cauchy problem is well-posed if there exists a unique solution which depends continuously
on the Cauchy data. We’ve seen that the existence and uniqueness problem can be subtle.
• We have already seen that the backwards heat equation is ill-posed. Another example is
Laplace’s equation on the upper-half plane with boundary conditions
$$\phi(x, 0) = 0, \qquad \partial_y\phi(x, 0) = g(x), \qquad g(x) = \frac{\sin(Ax)}{A}.$$
The solution is φ(x, y) = sin(Ax) sinh(Ay)/A², which for any y > 0 grows without bound as A → ∞,
even though the Cauchy data g(x) uniformly goes to zero; the solution doesn't depend continuously
on the data.
The method of characteristics helps us formalize how solutions depend on Cauchy data.
• Consider the first-order PDE
$$\alpha(x, y)\,\partial_x\phi + \beta(x, y)\,\partial_y\phi = f(x, y).$$
Such a PDE is called quasi-linear, because it is linear in φ, but the functions α and β are not
linear in x and y.
• The equation states that u · ∇φ = f , where u = (α, β). The vector field u defines a family of
integral curves, called characteristic curves Cₜ, parametrized as (x(s, t), y(s, t)), where s is the
parameter along the curve and t identifies the curve, satisfying
$$\left.\frac{\partial x}{\partial s}\right|_t = \alpha|_{C_t}, \qquad \left.\frac{\partial y}{\partial s}\right|_t = \beta|_{C_t}.$$
Along each characteristic curve, the PDE reduces to the ODE dφ/ds = f .
Therefore, for a unique solution to exist, we must specify Cauchy data at exactly one point
along each characteristic curve, i.e. along a curve B transverse to the characteristic curves. The
value of the Cauchy data at that point determines the value of φ along the entire curve. Each
curve is completely independent of the rest!
Example. The 1D wave equation is (∂x2 − ∂t2 )φ = 0, which contains both right-moving and left-
moving waves. The simpler equation (∂x − ∂t )φ = 0 only contains right-moving waves; the charac-
teristic curves are x − t = const.
Example. Consider the PDE
$$e^x\,\partial_x\phi + \partial_y\phi = 0, \qquad \phi(x, 0) = \cosh x.$$
Integrating the characteristic equations ∂x/∂s = eˣ and ∂y/∂s = 1 gives
$$e^{-x} = -s + c, \qquad y = s + d$$
where the constants c and d reflect freedom in the parametrizations of s and t. To fix s, we
demand that the characteristic curves pass through B at s = 0. To fix t, we parametrize B itself
by (x, y) = (t, 0). This yields
$$e^{-x} = -s + e^{-t}, \qquad y = s$$
and the solution is simply φ(s, t) = cosh t. Inverting gives the result
$$\phi(x, y) = \cosh\left( \log(e^{-x} + y) \right).$$
We could also add an inhomogeneous term on the right without much more effort.
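Note. The characteristics can also be followed numerically. A minimal sketch in Python (assuming
NumPy and SciPy; the starting point t = 0.7 and arc length 0.3 are illustrative) checks that φ is
constant along a characteristic and matches the closed form above:

import numpy as np
from scipy.integrate import solve_ivp

def rhs(s, xy):                              # dx/ds = e^x, dy/ds = 1
    return [np.exp(xy[0]), 1.0]

t = 0.7                                      # the curve through (t, 0)
sol = solve_ivp(rhs, [0, 0.3], [t, 0.0], rtol=1e-10, atol=1e-10)
x, y = sol.y[:, -1]                          # a point further along the curve
print(np.cosh(t))                            # phi carried along the characteristic
print(np.cosh(np.log(np.exp(-x) + y)))       # closed-form value at (x, y): equal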
Next, we generalize to the case of second-order PDEs, which yield new features.
• The symbol σ(x, k) of a differential operator is obtained by replacing each derivative ∂/∂xᵢ with
kᵢ. The principal part of the symbol, σᴾ(x, k), is the leading term. In the second-order case it is
an x-dependent quadratic form,
$$\sigma^P(x, k) = k^T A k.$$
The equation is then classified by the eigenvalues of A; it is called
– elliptic if the eigenvalues all have the same sign (e.g. Laplace)
– hyperbolic if all but one of the eigenvalues have the same sign (e.g. wave)
– ultrahyperbolic if there is more than one eigenvalue with each sign (requires d ≥ 4)
– parabolic if there is a zero eigenvalue (i.e. the quadratic form is degenerate) (e.g. heat)
• When the coefficients are constant, the Fourier transform of L is the symbol σ(ik). Another
piece of intuition is that the principal part of the symbol dominates when the solution is rapidly
varying.
Generically, stricter boundary conditions will not have solutions, or will have solutions that
depend very sensitively on them.
• In this case, the Cauchy data consists of the value of φ on a surface Γ along with the normal
derivative ∂n φ. Let ti denote the other directions. In order to propagate the Cauchy data to a
neighboring surface, we need to know the normal second derivative ∂n ∂n φ.
• Since we know φ on all of Γ, we know ∂ti ∂tj φ and ∂n ∂ti φ. To attempt to find ∂n ∂n φ we use
the PDE, which is
$$a_{ij}\,\frac{\partial^2\phi}{\partial x_i\,\partial x_j} = \text{known}.$$
Therefore, we know the value of ann ∂n ∂n φ, which gives the desired result unless ann is zero.
• A surface on which aₙₙ = 0 is called characteristic: Cauchy data on it cannot be propagated off
it. Generically, a characteristic surface has dimension one. In two dimensions, they are lines, and
an equation is hyperbolic, parabolic, or elliptic at a point if it has two, one, or zero characteristic
curves through that point.
Example. The wave equation is the archetypal hyperbolic equation. It's easiest to see its charac-
teristic curves in 'light-cone' coordinates ξ± = x ± ct, where it becomes
$$\frac{\partial^2\phi}{\partial\xi_+\,\partial\xi_-} = 0.$$
Then the characteristic curves are curves of constant ξ± . Information is propagated along these
curves in the sense that the general solution is f (ξ+ ) + g(ξ− ). On the other hand, the value of φ at
a point depends on all the initial Cauchy data in its past light cone; the ‘domain of dependence’ is
instead bounded by characteristic curves.
• We consider the Cauchy problem for the heat equation on Rⁿ × [0, ∞),
$$D\nabla^2\phi = \frac{\partial\phi}{\partial t}, \qquad \phi(x, t = 0) = f(x), \qquad \lim_{x\to\infty}\phi(x, t) = 0.$$
To do this, we find the solution for initial condition δ(x) (called the fundamental solution) by
Fourier transform in space, giving
$$S_n(x, t) = \mathcal{F}^{-1}[e^{-Dk^2 t}] = \frac{e^{-x^2/4Dt}}{(4\pi D t)^{n/2}}.$$
The general solution is given by convolution with the fundamental solution. As expected, the
position x only enters through the similarity variable x²/t. We also note that the heat equation
is nonlocal, as Sₙ(x, t) is nonzero for arbitrarily large x at arbitrarily small t.
• We can also solve the heat equation with forcing and homogeneous initial conditions,
$$\frac{\partial\phi}{\partial t} - D\nabla^2\phi = F(x, t), \qquad \phi(x, t = 0) = 0.$$
In this case, we want to find a Green's function G(x, t, y, τ) representing the response to a δ-
function source at (y, τ). Duhamel's principle states that it is simply related to the fundamental
solution,
$$G(x, t, y, \tau) = \Theta(t - \tau)\, S_n(x - y,\, t - \tau).$$
To understand this, note that we can imagine starting time at t = τ + ε. In this case, we don't
see the δ-function driving; instead, we see its outcome, a δ-function initial condition at y. The
general solution is given by convolution with the Green's function.
• In both cases, a time direction is picked out by specifying φ(t = 0) and solving for φ at times
t > 0. In particular, this forces us to get the retarded Green’s function.
• Similarly, consider the forced wave equation,
$$\frac{\partial^2\phi}{\partial t^2} - c^2\nabla^2\phi = F, \qquad \phi(t = 0) = \partial_t\phi(t = 0) = 0.$$
Taking the spatial Fourier transform, the Green's function satisfies
$$\left( \frac{\partial^2}{\partial t^2} + k^2 c^2 \right)\tilde{G}(k, t, y, \tau) = e^{-ik\cdot y}\,\delta(t - \tau).$$
Applying the initial condition and integrating gives
$$\tilde{G}(k, t, y, \tau) = \Theta(t - \tau)\, e^{-ik\cdot y}\, \frac{\sin(kc(t - \tau))}{kc}.$$
This result holds in all dimensions.
• To take the Fourier inverse, we perform the k integration in spherical coordinates, but the final
angular integration is only nice in odd dimensions. In three dimensions, we find
$$G(x, t, y, \tau) = -\frac{\delta(|x - y| - c(t - \tau))}{4\pi c\,|x - y|}$$
so that a force at the origin makes a shell that propagates at speed c. In one dimension, we
instead have G(x, t, y, τ) ∼ Θ(c(t − τ) − |x − y|), so we find a raised region whose boundary
propagates at speed c. In even dimensions, we can't perform the ∫ e^{ikr cos θ} dθ integral. Instead,
we find a boundary that propagates with speed c with a long tail behind it.
• Another way to phrase this is that in one dimension, the instantaneous force felt a long distance
from the source is a delta function, just like the source. In three dimensions, it is the derivative.
Then in two dimensions, it is the half-derivative, but this is not a local operation.
• The same result can be found by a temporal Fourier transform, or a spacetime Fourier transform.
In the latter case, imposing the initial condition to get the retarded Green’s function is a little
more subtle, requiring a pole prescription.
• For the wave equation, Duhamel’s principle relates the Green’s function to the solution for an
initial velocity but zero initial position.
The Green’s function is simply related to the fundamental solution only on an unbounded domain.
In the case of a bounded domain Ω, Green’s functions must additionally satisfy boundary conditions
on ∂Ω. However, it is still possible to construct a Green’s function using a fundamental solution.
Example. The method of images. Consider Laplace’s equation defined on a half-space with
homogeneous Dirichlet boundary conditions φ = 0. The fundamental solution is the field of a point
charge. The Green’s function can be constructed by putting another point charge with opposite
charge, ‘reflected’ in the plane; choosing the same charge would work for homogeneous Neumann
boundary conditions.
The exact same reasoning works for the wave equation. Dirichlet boundary conditions correspond
to a hard wall, and we imagine an upside-down ‘ghost wave’ propagating the other way. Similarly,
for the heat equation, Neumann boundary conditions correspond to an insulating barrier, and we
can imagine a reflected, symmetric source of heat.
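Note. A minimal sketch of the image construction in Python (assuming NumPy; the source point
and boundary test points are illustrative), for the upper half-space z > 0 with Dirichlet conditions:
the free-space solution −1/(4π|x − y|) plus an opposite image charge vanishes on the plane z = 0.

import numpy as np

def G(x, y):
    ybar = np.array([y[0], y[1], -y[2]])     # image point, reflected in the plane
    return (-1 / (4 * np.pi * np.linalg.norm(x - y))
            + 1 / (4 * np.pi * np.linalg.norm(x - ybar)))

y = np.array([0.0, 0.0, 2.0])                # source inside the half-space
for x in [np.array([1.0, 0.5, 0.0]), np.array([-3.0, 2.0, 0.0])]:
    print(G(x, y))                           # -> 0.0 on the boundary plane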
For less symmetric domains, Green’s functions require much more work to construct. We consider
the Poisson equation as an extended example.
• The free-space fundamental solution obeys ∇²Gₙ(x) = δ(x); in three dimensions, for instance,
Gₙ(x) = −1/(4π|x|) up to a constant. For n ≥ 3 the constant can be set to zero if we require
Gₙ → 0 for x → ∞. Otherwise, we need additional constraints. We then define Gₙ(x, y) =
Gₙ(x − y), which is the response at x to a source at y.
• Next, we turn to solving the Poisson equation on a compact domain Ω. We begin with deriving
some useful identities. For any regular functions φ, ψ : Ω → R, Green's first identity is
$$\int_{\partial\Omega} \phi\nabla\psi\cdot dS = \int_\Omega \nabla\cdot(\phi\nabla\psi)\, dV = \int_\Omega \left( \phi\nabla^2\psi + (\nabla\phi)\cdot(\nabla\psi) \right) dV.$$
Antisymmetrizing in φ and ψ gives Green's second identity,
$$\int_\Omega \left( \phi\nabla^2\psi - \psi\nabla^2\phi \right) dV = \int_{\partial\Omega} \left( \phi\nabla\psi - \psi\nabla\phi \right)\cdot dS.$$
• Next, we set ψ(x) = Gₙ(x, y) and ∇²φ(x) = −F (x), giving Green's third identity
$$\phi(y) = -\int_\Omega G_n(x, y)\, F(x)\, dV + \int_{\partial\Omega} \left( \phi(x)\nabla G_n(x, y) - G_n(x, y)\nabla\phi(x) \right)\cdot dS$$
where we used a delta function to do an integral, and all derivatives are with respect to x.
• At this point it looks like we’re done, but the problem is that generally we can only specify φ or
∇φ · n̂ at the boundary, not both. Once one is specified, the other is determined by uniqueness,
so the equation above is really an expression for φ in terms of itself, not a closed form for φ.
• For concreteness, suppose we take Dirichlet boundary conditions φ|∂Ω = g. We define a Dirichlet
Green’s function G = Gn + H where H satisfies Laplace’s equation throughout Ω and G|∂Ω = 0.
Then using Green’s third identity gives
$$\phi(y) = \int_{\partial\Omega} g(x)\,\nabla G(x, y)\cdot dS - \int_\Omega G(x, y)\, F(x)\, dV$$
which is the desired closed-form expression! Of course, at this point the hard task is to construct
H, but at the very least this problem has no source terms.
• As a concrete example, we can construct an explicit form for H whenever the method of images
applies. For example, for a half-space it is the field of a reflected opposite charge.
• Similarly, we can construct a Neumann Green’s function. There is a subtlety here, as the
integral of ∇φ · dS must be equal to the integral of the driving F , by Gauss’s law. If this doesn’t
hold, no solution exists.
• The surface terms can be given a physical interpretation. Suppose we set φ|∂Ω = 0 in Green’s
third identity, corresponding to grounding the surface ∂Ω. At the surface, we have
(∇φ) · n̂ ∝ E⊥ ∝ ρ
which means that the surface term is just accounting for the field of the screening charges.
• Similarly, we can interpret the surface term in our final result, when we turn on a potential
φ|∂Ω = g. To realize this, we make ∂Ω the inner surface of a very thin capacitor. The outer
surface ∂Ω0 , just outside ∂Ω, is grounded. The surfaces are split into parallel plates and hooked
up to batteries with emf g(x), giving locally opposite charge densities on ∂Ω0 and ∂Ω. Then
the potential g can be thought of as coming from nearby opposite sheets of charge. The term
∇G describes such sources, by thinking of the derivative as a finite difference.
11 Approximation Methods
11.1 Asymptotic Series
We illustrate the ideas behind perturbation theory with some algebraic equations with a small
parameter ε, before moving onto differential equations. We begin with some motivating examples
which will bring us to asymptotic series.
Example. Solve the equation
$$x^2 + \epsilon x - 1 = 0.$$
The exact solution is
$$x = -\frac{\epsilon}{2} \pm \sqrt{1 + \frac{\epsilon^2}{4}} = \begin{cases} 1 - \epsilon/2 + \epsilon^2/8 + \ldots \\ -1 - \epsilon/2 - \epsilon^2/8 - \ldots \end{cases}$$
This series converges for |ε| < 2, and rapidly if ε is small; it is a model example of the perturbation
method. Now we show two ways to find the series without already knowing the exact answer.
First, rearrange the equation to the form x = f(x),

x = ±√(1 − εx).

The starting point x0 can be chosen to be an exact solution when ε = 0, in this case x0 = 1. Then

x1 = √(1 − ε),  x2 = √(1 − ε√(1 − ε)),
and so on. The iterate xn matches the series up to the εⁿ term. To see why, note that if the desired
fixed point is x∗, then

x_{n+1} − x∗ = f(x_n) − f(x∗) ≈ f′(x∗)(x_n − x∗).

Near the fixed point we have f′(x∗) ≈ −ε/2, so the error decreases by a factor of ε every iteration.
The most important part of this method is to choose f so that f′(x∗) is small, ensuring rapid
convergence. For instance, if we had f′(x∗) ∼ 1 − ε instead, convergence could be very slow.
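As a quick numerical illustration of this convergence rate, here is a minimal Python sketch (the
value of ε is an arbitrary choice):

    import numpy as np

    # Minimal sketch of the fixed-point iteration x_{n+1} = sqrt(1 - eps*x_n)
    # for x^2 + eps*x - 1 = 0, starting from the eps = 0 root x_0 = 1.
    eps = 0.1
    exact = -eps/2 + np.sqrt(1 + eps**2/4)   # the root near x = 1

    x = 1.0
    for n in range(6):
        x = np.sqrt(1 - eps * x)
        print(n + 1, x, abs(x - exact))      # error shrinks by a factor ~eps per step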
Second, expand about one of the roots when ε = 0 in a series in ε,

x = 1 + εx1 + ε²x2 + ....

By plugging this into the equation, expanding in powers of ε, and setting each coefficient to zero, we
may determine the xi iteratively. This tends to be easier when working to higher orders. In general,
one might need to expand in a different variable than ε, but this works for regular problems.
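A computer algebra system can carry out this matching mechanically. The following sympy sketch
(illustrative, not part of the original text) recovers the coefficients of the root near x = 1:

    import sympy as sp

    # Minimal sketch: determine x1, x2 by substituting x = 1 + eps*x1 + eps^2*x2
    # into x^2 + eps*x - 1 = 0 and matching powers of eps.
    eps, x1, x2 = sp.symbols('epsilon x1 x2')
    x = 1 + eps*x1 + eps**2*x2
    eqn = sp.expand(x**2 + eps*x - 1)

    c1 = eqn.coeff(eps, 1)                 # O(eps) coefficient: 2*x1 + 1
    s1 = sp.solve(c1, x1)[0]               # x1 = -1/2
    c2 = eqn.coeff(eps, 2).subs(x1, s1)    # O(eps^2) coefficient, with x1 substituted
    s2 = sp.solve(c2, x2)[0]               # x2 = 1/8
    print(s1, s2)                          # matches x = 1 - eps/2 + eps^2/8 + ...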
Example. Solve the equation
εx² + x − 1 = 0.

This is more subtle because there are two roots for any ε > 0, but only one root for ε = 0. Problems
where the ε → 0 limit differs in an important way from the ε = 0 case are called singular. The exact
solutions are

x = (−1 ± √(1 + 4ε))/(2ε) = 1 − ε + 2ε² + ... for the upper sign, −1/ε − 1 + ε − 2ε² + ... for the lower,
where the series converges for |ε| < 1/4. We see the issue is that one root diverges to infinity. We
can capture it using the expansion method by starting the series with ε^{−1},

x = x_{−1}/ε + x0 + εx1 + ....

This also captures the regular root, in the case x_{−1} = 0, while x_{−1} = −1 gives the singular
root. However, we again only knew to start the series at 1/ε by using the exact solution.
We can arrive at the same conclusion by changing variables by a rescaling,

x = X/ε,  X² + X − ε = 0.
This is now a regular problem which can be handled as above. Again, the difficult part is choosing
the right rescaling to accomplish this. Consider the general rescaling x = δX, which gives

εδ²X² + δX − 1 = 0.

The rescaling is good if the formerly singular root becomes O(1). We would thus like at least two
of the quantities (εδ², δ, 1) to be similar in size, with the rest much smaller. This gives a regular
perturbation problem, where the similar terms give an O(1) root, and the rest perturb it slightly. By
casework, this only happens for δ ∼ 1 and δ ∼ 1/ε, giving the regular and singular roots respectively.
This method is called finding the “dominant balance” or “distinguished limit”.
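As a sanity check, a short numerical sketch (with an illustrative value of ε) compares both series
against the numerically computed roots:

    import numpy as np

    # Minimal sketch: compare the two roots of eps*x^2 + x - 1 = 0 with the
    # regular series 1 - eps + 2*eps^2 and the singular series -1/eps - 1 + eps.
    eps = 0.01
    roots = np.roots([eps, 1, -1])          # numpy solves the quadratic exactly
    regular = 1 - eps + 2*eps**2
    singular = -1/eps - 1 + eps
    print(sorted(roots), regular, singular)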
Example. Solve the equation (x − 1)² = εx. The naive expansion

x = 1 + εx1 + ε²x2 + ...

fails: plugging it in and collecting powers of ε gives

ε⁰: 0 = 0,  ε¹: 0 = 1,

a contradiction. We instead expand in an unknown power of ε,

x = 1 + δ1(ε) x1,  ε ≪ δ1(ε) ≪ 1.

We now apply dominant balance again. The term εδ1x1 is always subleading, so balancing the
other two terms gives δ1 = ε^{1/2}, from which we determine x1 = ±1. At this point we could guess the next
term is O(ε), but to be safe we could repeat the procedure, setting

x_{n+1} = 1 ± ε^{1/2} √(x_n),

which ensures rapid convergence. Taking the positive root and starting with x0 = 1 gives the series
x = 1 + ε^{1/2} + ε/2 + ....

Example. Solve the equation x e^{−x} = ε for small ε, focusing on the root with x ≫ 1. Defining
L = log(1/ε), the equation can be rearranged as x = L + log x, suggesting the iteration

x_{n+1} = L + log x_n.

The final logarithm can be expanded in a series, and continuing gives us an expansion with terms of
the form (log L)^m / L^n. Even for tiny ε, L is not very large, and log L isn't either. Hence the series
converges very slowly.
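A minimal Python sketch (with an illustrative ε) confirms the point: even for ε = 10⁻⁶ we only
have L ≈ 14 and log L ≈ 2.6, so successive terms of the expansion shrink slowly, even though the
iteration itself settles down quickly.

    import numpy as np

    # Minimal sketch: iterate x_{n+1} = L + log(x_n) for the large root of
    # x e^{-x} = eps, with L = log(1/eps).
    eps = 1e-6
    L = np.log(1/eps)

    x = L
    for n in range(6):
        x = L + np.log(x)
        print(n + 1, x, x*np.exp(-x)/eps)   # the last ratio -> 1 at the true root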
Since we are working with expansions more general than convergent power series, we formalize them
as asymptotic expansions.
• We say f = O(g) as ε → 0 if there exist K and ε0 so that |f| < K|g| for all ε < ε0.
• We say f = o(g) as ε → 0 if f/g → 0 as ε → 0.
• A set of functions {φn(ε)} is an asymptotic sequence as ε → 0 if, for each n and i > 0,
φ_{n+i}(ε) = o(φn(ε)) as ε → 0.
• A function f(ε) has an asymptotic expansion with respect to the asymptotic sequence {φn(ε)}
as ε → 0, written

f(ε) ∼ Σ_n an φn(ε),

if there exist constants an so that f(ε) − Σ_{n=0}^{N} an φn(ε) = o(φN(ε)) as ε → 0, for all N.
• Given {φn }, the coefficients an of f are unique. This is easily proven by induction. However,
the converse is not true: the coefficients an don’t determine f . Just like ordinary power series,
we may be missing terms that are smaller than any of the φn .
• Equivalently, writing fn(ε) = an φn(ε) for the terms of the expansion, the defining condition is

lim_{ε→0} [f(ε) − Σ_{n=0}^{N} fn(ε)] / fN(ε) = 0.

That is, unlike the regular definition of convergence, we take ε → 0 rather than N → ∞.
• Asymptotic series may be integrated term by term. However, they may not be differentiated term
by term, because unlike power series, the functions fn(ε) may be quite singular (e.g. cos(1/ε))
and grow much larger than expected upon differentiating.
• Asymptotic series may be plugged into each other, but some care must be taken. For example,
taking the exponential of only the leading terms of a series may give a completely wrong result;
we must instead take all terms of order 1 or higher.
• As we’ve seen above, the terms in an asymptotic series can get quite complicated. However, it
is at least true that functions obtained by a finite number of applications of +, −, ×, ∇·, exp,
and log may always be ordered; these are called Hardy’s logarithmico-exponential functions.
Example. Often an asymptotic expansion works better than a convergent power series. We have
erf(z) = (2/√π) ∫_0^z e^{−t²} dt = (2/√π) ∫_0^z Σ_{n=0}^∞ (−t²)ⁿ/n! dt = (2/√π) Σ_{n=0}^∞ (−1)ⁿ z^{2n+1} / ((2n + 1) n!)
where all manipulations above are valid since the series has an infinite radius of convergence.
However, for large z the series converges very slowly, and many terms in the series are much larger
than the final result, so roundoff error affects the accuracy.
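This is easy to see numerically; the following minimal Python sketch (with z = 5 as an illustrative
choice) sums the Taylor series in double precision:

    import math

    # Minimal sketch: partial sums of the Taylor series for erf(z) at z = 5.
    # Individual terms reach ~1e8 before the series settles down to erf(5) ~ 1,
    # so fixed-precision arithmetic loses many digits to cancellation.
    z = 5.0
    total, term_max = 0.0, 0.0
    for n in range(120):
        term = (2/math.sqrt(math.pi)) * (-1)**n * z**(2*n+1) / ((2*n+1) * math.factorial(n))
        total += term
        term_max = max(term_max, abs(term))
    print(total, math.erf(z), term_max)     # the largest term dwarfs the O(1) answer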
A better series can be constructed by noting
erf(z) = 1 − (2/√π) ∫_z^∞ e^{−t²} dt.
It’s not hard to see that by repeating this, we just recover the Taylor series.
Example. We would like to evaluate
I(x) = ∫_x^∞ e^{−t⁴} dt.
By extending the upper limit of integration to infinity, we pick up O(xⁿ e^{−x}) error terms. Also,
by interchanging the order of summation and integration, we have produced an asymptotic series,

I(x) ∼ Σ_n (1/x^{n+1}) ∫_0^∞ (−s)ⁿ e^{−s} ds = Σ_n (−1)ⁿ n! / x^{n+1}.
Note that we could have gotten an easier, better bound by extending the upper bound of integration
to infinity at the start, but we do things in this order to show the general technique.
• Laplace’s method is justified by Watson’s lemma: if f (t) is continuous on [0, b] and has the
asymptotic expansion
f(t) ∼ t^α Σ_{n=0}^∞ an t^{βn}
as t → 0+ , where α > −1 and β > 0, then
I(x) = ∫_0^b f(t) e^{−xt} dt ∼ Σ_{n=0}^∞ an Γ(α + βn + 1) / x^{α+βn+1}
as x → +∞. The conditions α > −1 and β > 0 ensure the integral converges, and in the case
b = ∞ we also require f(t) = O(e^{ct}) for some constant c at infinity. Watson's lemma can also
be used to justify the methods below; a quick numerical check is sketched after this list.
• In the case where the asymptotic series for f is uniformly convergent in a neighborhood of the
origin, then Watson’s lemma may be established by interchanging the order of integration and
summation. Otherwise, we cut off the sums at a finite number of terms and simply show the
error terms are sufficiently small to have an asymptotic series.
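The numerical check promised above: for f(t) = 1/(1 + t) we have α = 0, β = 1, and an = (−1)ⁿ,
so Watson's lemma predicts I(x) ∼ Σ (−1)ⁿ n!/x^{n+1}. A minimal Python sketch (parameter
values illustrative):

    import math
    from scipy.integrate import quad

    # Minimal sketch: Watson's lemma for f(t) = 1/(1+t), which predicts
    # I(x) ~ sum_n (-1)^n n! / x^(n+1) as x -> infinity.
    x = 10.0
    exact, _ = quad(lambda t: math.exp(-x*t) / (1 + t), 0, math.inf)

    partial = 0.0
    for n in range(6):
        partial += (-1)**n * math.factorial(n) / x**(n + 1)
        print(n, partial, exact)            # the first few terms already agree well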
We now consider Laplace's method, for integrals of the form I(x) = ∫_a^b f(t) e^{xφ(t)} dt with
x ≫ 1. The dominant contribution comes from the maximum of φ(t), which can occur at the endpoints
or at an interior point. We'll find only the leading contribution in each case.
• First, suppose the maximum is at t = a, and set a = 0 for simplicity. As in the example, we
split the integral as

I(x) = ∫_0^ε f(t) e^{xφ(t)} dt + ∫_ε^b f(t) e^{xφ(t)} dt,  x^{−1} ≪ ε ≪ x^{−1/2}.

Then the second term is a factor O(e^{εxφ′(0)}) smaller than the first, and hence negligible if xε ≫ 1.
• In the first term we assume we can expand φ(t) and f(t) in the asymptotic series

φ(t) ∼ φ(a) + (t − a)φ′(a) + ...,  f(t) ∼ f(a) + (t − a)f′(a) + ....

• Now the upper bound of integration can be extended to ∞ with exponentially small error;
performing the resulting integral gives

I(x) ∼ −f(a) e^{xφ(a)} / (xφ′(a)).
There are also higher-order corrections which we can compute by taking higher-order terms in
the series. The overall error once these corrections are taken care of is exponentially small.
• Maxima at interior points are a bit more subtle since φ′ vanishes there. In this case suppose
the maximum is at c = 0 for simplicity, and split the integral as

I(x) = ∫_a^{−ε} f(t) e^{xφ(t)} dt + ∫_{−ε}^{ε} f(t) e^{xφ(t)} dt + ∫_ε^b f(t) e^{xφ(t)} dt.

As before the first and third terms are exponentially small, and negligible if xε² ≫ 1, where
the different scaling occurs because the linear term φ′(0) vanishes. In the middle term we expand
φ(t) ∼ φ(0) + (t²/2) φ″(0) + (t³/6) φ‴(0) + ...,  f(t) ∼ f(0) + tf′(0) + ...
where generically φ″(0) ≠ 0. Changing variables to s = √x t,

I(x) ∼ (e^{xφ(0)}/√x) ∫_{−ε√x}^{ε√x} (f(0) + (s/√x) f′(0) + ...) e^{s²φ″(0)/2 + s³φ‴(0)/(6√x) + ...} ds.
For the leading term to dominate, we need ε√x/√x ≪ 1 and (ε√x)³/√x ≪ 1. The latter is more
stringent, and putting together our constraints gives

x^{−1/2} ≪ ε ≪ x^{−1/3}.
• Finally, incurring another exponentially small error by extending the integration bounds to
±∞, we conclude that

I(x) ∼ f(c) e^{xφ(c)} √(2π / (−xφ″(c))).
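As a quick check of this formula, here is a minimal Python sketch (the choices f(t) = cos t,
φ(t) = −t², and x = 50 are illustrative):

    import math
    from scipy.integrate import quad

    # Minimal sketch: Laplace's method for I(x) = int_{-1}^{1} cos(t) e^{-x t^2} dt,
    # i.e. f(t) = cos(t), phi(t) = -t^2 with an interior maximum at c = 0.
    x = 50.0
    exact, _ = quad(lambda t: math.cos(t) * math.exp(-x * t**2), -1, 1)
    leading = math.cos(0) * math.sqrt(2*math.pi / (2*x))   # f(c) sqrt(2 pi / (-x phi''(c)))
    print(exact, leading)                                   # agree up to O(1/x) corrections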
• The Riemann-Lebesgue lemma makes it easy to get leading endpoint contributions. For instance,
integrating by parts,

I(x) = ∫_0^1 e^{ixt}/(1 + t) dt = −ie^{ix}/(2x) + i/x − (i/x) ∫_0^1 e^{ixt}/(1 + t)² dt

where the remaining integral is o(1/x) by the Riemann-Lebesgue lemma, so the boundary terms
give the leading behavior.
• As in Laplace's method, it's more subtle to find contributions from interior points. We get a
large contribution at every point where ψ′ vanishes, since we don't get rapid phase cancellation in
that region. Concretely, for an integral I(x) = ∫_a^b f(t) e^{ixψ(t)} dt, suppose the only such
point is ψ′(c) = 0. We split the integral as

I(x) = ∫_a^{c−ε} f(t) e^{ixψ(t)} dt + ∫_{c−ε}^{c+ε} f(t) e^{ixψ(t)} dt + ∫_{c+ε}^b f(t) e^{ixψ(t)} dt.
We pick up a similar contribution from the second term. Note that unlike Laplace’s method,
these error terms are only algebraically small, not exponentially small.
Expanding about t = c in the second term and rescaling by s = x^{1/2}(t − c) gives

(e^{ixψ(c)}/x^{1/2}) ∫_{−εx^{1/2}}^{εx^{1/2}} (f(c) + (s/x^{1/2}) f′(c) + ...) e^{i s²ψ″(c)/2 + i s³ψ‴(c)/(6x^{1/2}) + ...} ds.
When we extend the limits of integration to ±∞, we pick up O(1/x) error terms as before.
The integral can then be done by contour integration, rotating the contour to yield a Gaussian
integral, to conclude

I(x) = √(2π) f(c) e^{ixψ(c)} e^{±iπ/4} / (x^{1/2} |ψ″(c)|^{1/2}) + O(1/x),

where the sign in the phase matches the sign of ψ″(c). In order for this to be the leading term,
it must be greater than the O(1/x) error, and hence we need ε ≫ x^{−1/2}. Combining our constraints,

x^{−1/2} ≪ ε ≪ x^{−1/3},

just as for Laplace's method for an interior point. Unfortunately it's difficult to improve the
approximation, because the next terms involve nonlocal contributions.
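As a quick check of the stationary phase formula, here is a minimal Python sketch (the choices
ψ(t) = t², f = 1, and x = 200 are illustrative):

    import cmath, math
    from scipy.integrate import quad

    # Minimal sketch: stationary phase for I(x) = int_{-1}^{1} e^{i x t^2} dt,
    # with psi(t) = t^2, interior stationary point c = 0, psi''(c) = 2, f = 1.
    x = 200.0
    re, _ = quad(lambda t: math.cos(x * t**2), -1, 1, limit=200)
    im, _ = quad(lambda t: math.sin(x * t**2), -1, 1, limit=200)
    exact = complex(re, im)

    # psi''(c) > 0, so the phase is +i pi/4.
    leading = math.sqrt(2*math.pi) * cmath.exp(1j*math.pi/4) / math.sqrt(2*x)
    print(exact, leading)                   # agree up to O(1/x) endpoint corrections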
• Laplace’s method and the method of stationary phase are really just special cases of the method
of steepest descents, which is for contour integrals of the form
I(x) = ∫_C f(t) e^{xφ(t)} dt.
We might think naively that the greatest contribution comes from the maximum of Re φ, but
this is incorrect due to the rapid phase oscillations. Similarly, regions of stationary phase
may have negligible magnitude.
• To get more insight, write φ = u + iv. The Cauchy-Riemann equations tell us that u and v are
harmonic functions with (∇u) · (∇v) = 0. Hence the landscape of u consists of hills and valleys
at infinity, along with saddle points. Assuming the contour goes to infinity, it must follow a
path where u → −∞ at infinity.
• Now consider deforming the path so that v is constant. Then the path is parallel to ∇u, so
it generically follows paths of steepest descent. Since u goes to −∞ at infinity, there must be
points where ∇u = 0 on the contour, with each point giving a contribution by Laplace's method.
Note that if we instead took u constant we would use the method of stationary phase, but this
is less useful because the higher-order terms are much harder to compute.
• In general we have some flexibility in the contour. Since the contribution is local, we only need
to know which saddle points it passes through, and which poles we cross. This is also true
computationally: switching to something close to the steepest descent contour makes numerical
evaluation much easier, but we don’t have to compute the exact contour for this to work.
• One might worry how to determine which saddle points are relevant. If all the zeroes of φ′
are simple, there is no problem because each saddle point is only connected to one valley; the
relevant saddle points are exactly those connected to the valley at the endpoints at infinity. We
are free in principle to deform the contour to pass through other saddle points, but we’d pick
up errors from the regions of high u that are much larger than the value of the integral.
Example. The gamma function for x ≫ 1. We may define the gamma function by

1/Γ(x) = (1/2πi) ∫_C e^t t^{−x} dt
where C is a contour which starts at t = −∞ − ia, encircles the branch cut, which we take to lie
along the negative real axis, and ends at t = −∞ + ib. Rewriting the integrand as e^{t − x log t}, there is a
saddle at t = x. But since x is large, it's convenient to rescale,
1/Γ(x) = (1/(2πi x^{x−1})) ∫_C e^{x(s − log s)} ds,  t = xs.
Defining φ(s) = s − log s, the saddle is now at s = 1. The steepest descent contour passes through
s = 1 vertically. Near this point we have
φ(s) ∼ 1 + (s − 1)²/2 − (s − 1)³/3 + ....
Rescaling by u = √x (s − 1), we have

1/Γ(x) ∼ (e^x / (2πi x^{x−1} √x)) ∫ e^{u²/2 − u³/(3√x) + ...} du.

Evaluating the Gaussian integral along the vertical contour gives ∫ e^{u²/2} du = i√(2π), so

Γ(x) ∼ √(2π) x^{x−1/2} e^{−x},

which is Stirling's formula.
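A minimal Python sketch (sample values illustrative) checks Stirling's formula directly:

    import math

    # Minimal sketch: the saddle-point (Stirling) approximation
    # Gamma(x) ~ sqrt(2 pi) x^(x - 1/2) e^(-x) obtained above.
    for x in [5.0, 10.0, 20.0]:
        stirling = math.sqrt(2*math.pi) * x**(x - 0.5) * math.exp(-x)
        print(x, math.gamma(x), stirling, math.gamma(x)/stirling)  # ratio -> 1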
Example. The Airy function for x ≫ 1, which we take to be defined by

Ai(x) = (1/2π) ∫_C e^{i(t³/3 + xt)} dt.

Dividing the plane into six sextants like quadrants, the integrand only decays in the first, third, and
fifth sextants, and the contour starts at infinity in the third sextant and ends at infinity in the first.
Differentiating the exponent shows the saddle points are at t = ±ix^{1/2}. Rescaling t = x^{1/2} z,

Ai(x) = (x^{1/2}/2π) ∫_C e^{x^{3/2} φ(z)} dz,  φ(z) = i(z³/3 + z).
2π C
The steepest descent contour goes through the saddle point z = i but not z = −i, giving
3/2
e−2x /3
Ai(x) ∼ √ 1/4 .
2 πx
Now consider Ai(−x) for x ≫ 1. In this case the saddle points are at z = ±1 and both are relevant.
Adding the two contributions gives

Ai(−x) ∼ (1/(√π x^{1/4})) cos(π/4 − 2x^{3/2}/3).
The fact that there are two different asymptotic expansions for different regimes is called the Stokes
phenomenon. If we view Ai(z) as a function on the complex plane, these regions are separated by
Stokes and anti-Stokes lines.
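A minimal Python sketch (with an illustrative sample point) compares both asymptotic forms
against scipy's Airy function:

    import numpy as np
    from scipy.special import airy

    # Minimal sketch: compare Ai(x) and Ai(-x) against their asymptotic forms.
    x = 8.0
    ai_plus = np.exp(-2*x**1.5/3) / (2*np.sqrt(np.pi)*x**0.25)
    ai_minus = np.cos(np.pi/4 - 2*x**1.5/3) / (np.sqrt(np.pi)*x**0.25)
    print(airy(x)[0], ai_plus)              # decaying expansion for x >> 1
    print(airy(-x)[0], ai_minus)            # oscillatory expansion for -x << -1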