Invitation To Classical Analysis Compress
Invitation To Classical Analysis Compress
Invitation To Classical Analysis Compress
Sally
The UNDERGRADUATE TEXTS 17
SERIES
Invitation
to Classical
Analysis
Peter Duren
Invitation
to Classical
Analysis
Peter Duren
2010 Mathematics Subject Classification. Primary 11–01, 11B68, 26–01, 33–01, 34–01,
40E05, 40–01, 41–01, 42–01.
QA320.D87 2012
515.7–dc23
2011045853
Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy a chapter for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for such
permission should be addressed to the Acquisitions Department, American Mathematical Society,
201 Charles Street, Providence, Rhode Island 02904-2294 USA. Requests can also be made by
e-mail to reprint-permission@ams.org.
c 2012 by the American Mathematical Society. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
10 9 8 7 6 5 4 3 2 1 17 16 15 14 13 12
Jacob Bernoulli
Augustin-Louis Cauchy
Wikimedia Commons. Public Domain. Courtesy of the Archives of the Mathematisches Forschungsinstitut Ober- Wikimedia Commons. Public Domain.
wolfach. © Universität Göttingen Sammlung Sternwarte.
Leonard Euler
Bernard Bolzano
Joseph-Louis Lagrange
Wikimedia Commons. Public Domain. Wikimedia Commons. Public Domain. Wikimedia Commons. Public Domain.
Georg Cantor
Joseph Fourier
Bernhard Riemann
Archives of the Mathematisches Forschungsinstitut Oberwolfach. Wikimedia Commons. Public Domain. Wikimedia Commons. Public Domain.
Karl Weierstrass
Niels Henrik Abel
Preface xi
v
vi Contents
xi
xii Preface
more challenging, and consequently more rewarding. Hints are often pro-
vided. A few of the problems, not really exercises, come with an asterisk
and an invitation to consult the literature. References for each topic are
grouped at the end of the relevant chapter.
Historical notes are sprinkled throughout the book. To put a human
face on the mathematics, the book includes capsule scientific biographies of
the major players and a gallery of portraits. Some historical notes also shed
light on the origin and evolution of mathematical ideas. A few discussions
of physical applications serve the same purpose.
Many friends and colleagues helped to shape the book. I am especially
indebted to Dick Askey, Martin Chuaqui, Dima Khavinson, Jeff Lagarias,
Hugh Montgomery, Brad Osgood, Bill Ross, and Harold Shapiro for math-
ematical suggestions and encouragement as the writing progressed. Alex
Lapanowski, a student in the undergraduate honors program at Michigan,
read large parts of the manuscript and made valuable suggestions. Dragan
Vukotić and his students Irina Arévalo and Diana Giraldo also read portions
of the manuscript and spotted a number of small errors. I am enormously
grateful to David Ullrich, who read the manuscript carefully, checked many
of the exercises, pointed out errors and inaccuracies, and suggested impor-
tant improvements in the exposition. In particular, he devised the relatively
simple proof of Hilbert’s inequality presented in Chapter 4. Thanks to all of
these readers the book is better, but any remaining faults are the author’s
responsibility.
In the final stages of preparation the AMS production staff made expert
contributions to the book. Special thanks go to the editor Ina Mette for her
continual encouragement and willingness, for better or worse, to accommo-
date my peculiar wishes in regard to content and format. Finally, I must
acknowledge the important role of my wife Gay. Without her support the
book would not have been written.
Peter Duren
Chapter 1
Basic Principles
1
2 1. Basic Principles
n(n + 1)
1 +2 +3 + ···+ n = .
2
1(1+1)
Then P1 is clearly true, since 2 = 1. Suppose now that Pn is true for
some n ∈ N. Then
n(n + 1) (n + 1)(n + 2)
1 + 2 + 3 + · · · + n + (n + 1) = +n+1= ,
2 2
which says that Pn+1 is true. Thus the formula holds for all n ∈ N.
To give another example, let Pn be the inequality n2 ≤ 2n . The inequal-
ity is easily verified for n = 1 and n = 2, but it is false for n = 3, since
32 = 9 > 8 = 23 . However, it is again true for n = 4, and an inductive
argument shows that it remains true for all n ≥ 4. Indeed, suppose that
n2 ≤ 2n for some n ≥ 4. Then
2
n+1 1 2 n
2
(n + 1) = n ≤ 1+
2
2 ≤ 2n+1 ,
n n
which shows that the truth of Pn implies that of Pn+1 , provided that n ≥ 4.
Therefore, since P4 is true, it follows that Pn is true for all n ≥ 4.
Other applications of induction appear in exercises at the end of this
chapter.
common divisor; that is, if there is no integer k ≥ 2 such that m/k and n/k
are integers. The set of all rational numbers is denoted by Q.
The symbol R denotes the set of all real numbers. We will take for
granted the existence of the system of real numbers with its familiar prop-
erties of addition and multiplication. It is well known that the rational
numbers are everywhere dense in R. In other words, if a, b ∈ R and a < b,
then a < r < b for some r ∈ Q.
However,
√ not every real number is rational. To confirm this, let us show
that 2 is irrational; more precisely, there is no rational number r such that
r2 = 2. Suppose, for purpose of contradiction, that some rational number
r has the property r2 = 2. We may take r > 0 and express it in lowest
terms by r = m/n, where m, n ∈ N. Then r2 = 2 implies m2 = 2n2 , so
that m2 is even. Therefore, m is even and so m2 = (2k)2 = 4k 2 for some
k ∈ N. Hence n2 = 2k 2 and n2 is even, so that n is also even. Since m and
n are both even, they have a common divisor 2, contrary to our assumption
that r = m/n is in lowest terms. This contradiction
√ shows that no rational
2
number r can have the property r = 2. Thus 2 is irrational.
Like the rationals, the irrational numbers are everywhere dense in R. In
fact,
√ it follows from the density of the rationals that the rational multiples
of 2 are everywhere dense.
A number x0 ∈ R is said to be algebraic if it satisfies an equation
P (x) = a0 + a1 x + a2 x2 + · · · + an xn = 0
Every positive rational number must appear exactly once in the sequence.
This proves that Q is a countable set.
On the other hand, the set R of all real numbers is not countable. To
see this, it suffices to show that the real numbers in the interval 0 ≤ x < 1
do not form a countable set. The idea of the proof is quite ingenious and
is due to Georg Cantor (1845–1918), the founder of modern set theory.
(More of Cantor’s ideas will be discussed in Chapter 12.) Cantor’s proof
proceeds as follows. Suppose, for purpose of contradiction, that the set
[0, 1) = {x ∈ R : 0 ≤ x < 1} is countable, so that all of its members can be
arranged in a sequence
Here xn = 0.dn1 dn2 dn3 . . . is the decimal expansion of the number xn , chosen
to end in a sequence of 0’s (not 9’s) in cases of ambiguity. We will reach
a contradiction by displaying a number in [0, 1) that is not on the list.
If 0 ≤ dnn ≤ 4, set bn = 7. If 5 ≤ dnn ≤ 9, set bn = 2. Then the
number y = 0.b1 b2 b3 . . . lies in [0, 1) but does not appear in the purported
enumeration. Indeed, y = xn for each n because the nth digit of its decimal
expansion differs from that of xn by at least 3, so that |y − xn | > 10−n .
This shows that no sequence of numbers in [0, 1) can exhaust the entire
interval, and so [0, 1) is uncountable. Therefore, the set of all real numbers
is uncountable.
1.3. Completeness principles 5
The real number system has a very important property, known as com-
pleteness, that manifests itself in various ways and lies at the very founda-
tion of analysis. One form of completeness is the least upper bound principle,
whose statement requires us first to recall some terminology. A nonempty
set S ⊂ R is said to be bounded above if there is a number y ∈ R such that
x ≤ y for all x ∈ S. Such a number y is called an upper bound of S. A
number M is called the least upper bound or supremum of S if M is an upper
bound and M ≤ y for every upper bound y. This is indicated by writing
M = sup S. Similarly, a set S ⊂ R is bounded below if there is a number
y ∈ R, called a lower bound of S, such that y ≤ x for all x ∈ S. A number m
is called the greatest lower bound or infimum of S if y ≤ m for every lower
bound y. For this we write m = inf S. A set S ⊂ R is said to be bounded if
it is bounded both above and below.
in some sense equivalent to it. We begin with sequences of real numbers and
the concept of convergence.
A sequence of real numbers can be defined as a mapping from N into
R, a real-valued function defined on the positive integers. It is customarily
written as {xn } = {x1 , x2 , x3 , . . . }, where xn ∈ R for each n ∈ N. The se-
quence is said to converge to a limit L ∈ R if to each ε > 0 there corresponds
a number N such that |xn − L| < ε for all n ≥ N . This is tantamount to
saying that for each prescribed discrepancy ε the numbers xn are within
distance ε of L for all but a finite number of indices n. It is easy to see
that a sequence can have at most one limit. The range of a sequence is its
set of values {xn : n ∈ N}. A sequence is said to be bounded if its range
is a bounded set. Every convergent sequence is bounded, but the converse
is false. The convergence of a sequence {xn } to a limit L is indicated by
writing L = limn→∞ xn or simply xn → L as n → ∞. A sequence that does
not converge is called divergent.
If xn → L and yn → M , then (xn + yn ) → L + M and xn yn → LM . If
also M = 0, then yn = 0 for all n sufficiently large and xn /yn → L/M .
More generally, the notation xn → +∞ (or simply xn → ∞) as n → ∞
indicates that to each a ∈ R there corresponds a number N such that xn > a
for all n ≥ N . This is expressed by saying that the sequence {xn } tends to
+∞. Thus it diverges in a specific way. Similarly, the notation xn → −∞
means that for each a ∈ R there is a number N such that xn < a for all
n ≥ N.
A sequence is called monotonic if it is either nondecreasing or nonin-
creasing; that is, if either
x1 ≤ x2 ≤ x3 ≤ . . . or x1 ≥ x2 ≥ x3 ≥ . . . .
|α − β| ≤ |α − an | + |an − bn | + |bn − β|
= |α − an | + 2−n (b − a) + |bn − β| < ε
for all n sufficiently large. Let ξ denote the common value of α and β, so
that
lim an = lim bn = ξ .
n→∞ n→∞
The final step in the proof is to show that ξ is a cluster point of S. But
for each given ε > 0 we see that [an , bn ] ⊂ [ξ − ε, ξ + ε] for all n sufficiently
large. Indeed, since {an } increases to ξ and {bn } decreases to ξ, there exists
8 1. Basic Principles
{x ∈ R : 0 < |x − ξ| < ε}
Proof. The proof will consist of three main steps. The first step is to show
that every Cauchy sequence is bounded. To prove this, choose for instance
ε = 1 and refer to the definition of a Cauchy sequence to conclude that for
some integer N the inequality |xn − xN | < 1 holds for all n > N . Therefore,
for all n > N . But this implies that the sequence is bounded, since it follows
1.3. Completeness principles 9
that
|xn | ≤ max |x1 |, |x2 |, . . . , |xN |, |xN | + 1
for all n ∈ N.
The second step is to apply the sequential form of the Bolzano–
Weierstrass theorem. Since a Cauchy sequence is bounded, it must have
a convergent subsequence {xnk }.
The final step is to observe that if a Cauchy sequence has a convergent
subsequence, then in fact the full sequence is convergent. Indeed, if xnk → L
as k → ∞, then for each ε > 0 we see that
ε
|xn − L| ≤ |xn − xnk | + |xnk − L| < |xn − xnk | +
2
for all indices nk sufficiently large. But because {xn } is a Cauchy sequence,
it then follows that
ε ε
|xn − L| < + = ε
2 2
for all n sufficiently large. This shows that xn → L as n → ∞, which
completes the proof.
Nr (x0 ) = {x ∈ R : |x − x0 | < r}
is open. The union of any collection of open sets is open. The intersection of
any finite collection of open sets is open. Similarly, an arbitrary intersection
of closed sets is closed, and every finite union of closed sets is closed. It can
be proved (cf. Exercise 11) that a set is closed if and only if it contains all
of its cluster points.
Nested Sets Theorem. Let F1 , F2 , . . . be a sequence of nonempty closed
nested in the sense that Fn+1 ⊂ Fn
bounded sets of real numbers which are
for every n ∈ N. Then the intersection ∞ n=1 Fn is nonempty.
The theorem says that under the stated hypotheses the sets must con-
tain some common element. This may seem obvious, but if either of the
10 1. Basic Principles
Nr (p) = {x ∈ R : |x − p| < r}
F = {Nr (p) : p ∈ Q , r ∈ Q} .
The theorem says that whenever a closed bounded set of real numbers
is contained in the union of some collection of open sets, then it is actually
contained in the union of finitely many of those sets. The statement be-
comes false if either of the adjectives “closed” or “bounded” is omitted. For
instance, the collection
C = {( n1 , 1) : n ∈ N}
is an open covering of the open set (0, 1), and in fact ∞ 1
n=1 ( n , 1) = (0, 1), yet
no finite subcollection of C is a covering of (0, 1). To give another example,
the collection
C = {(−n, n) : n ∈ N}
is an open covering of any set S ⊂ R, but no finite subcollection of C can be
a covering of any unbounded set S.
Proof of theorem. We will show first that every countable open covering
of a closed bounded set contains a finite subcovering. Thus we assume that
a closed bounded set S is contained in the union of a countable family of
open sets G1 , G2 , . . . . In other words,
∞
S⊂ Gk .
k=1
n
We want to show that S ⊂ k=1 Gk for some n ∈ N. Let Hn = nk=1 Gk .
Then each set Hn is open and we see that H1 ⊂ H2 ⊂ H3 ⊂ . . . and S ⊂
∞
where n ∈ N and
n n! n(n − 1) · · · (n − k + 1)
= =
k k!(n − k)! k!
lim rn = 0 .
n→∞
Proof. Since (−r)n = (−1)n rn , it suffices to suppose that 0 < r < 1. Write
1
r = 1+a , where a > 0. Then na ≤ (1 + a)n and it follows that
1 1
0 < rn = n
≤ →0 as n → ∞ .
(1 + a) na
√ √
Proof. Since n 1/a = 1/ n a, it suffices to consider a > 1. Then n a > 1,
√
and we can write n a = 1 + hn for some hn > 0. It follows that
a
a = (1 + hn )n ≥ n hn , so that 0 < hn < → 0.
n
√
This shows that hn → 0 as n → ∞, which says that n
a → 1 as n → ∞.
Example 3.
√
n
lim n = 1.
n→∞
√
n
√
Proof. Again n > 1, so we can write n n = 1 + hn for some hn > 0.
Then
n(n − 1) 2 n(n − 1) 2
n = (1 + hn )n ≥ 1 + nhn + hn > hn , for n ≥ 2 ,
2! 2
Example 4. For each pair of numbers p > 0 and r with |r| < 1,
lim np rn = 0 .
n→∞
For an arbitrary sequence {xn } of real numbers, the upper and lower
limits
lim sup xn and lim inf xn
n→∞ n→∞
will now be defined and characterized in three equivalent ways. We will see
that
lim sup xn = lim inf xn = lim xn
n→∞ n→∞ n→∞
if the sequence {xn } converges. With suitable interpretation, the same will
be true if xn → +∞ or xn → −∞. However, the “lim sup” and “lim inf ”
are more general quantities that help to specify the eventual behavior of
divergent sequences that do not tend to +∞ or −∞.
First some conventions for unbounded sequences. If {xn } is not bounded
above, we write
lim sup xn = +∞ .
n→∞
lim inf xn = −∞ .
n→∞
for every sequence {xn }, with the usual interpretation of the symbol “≤” if
either of the quantities is infinite. Equality occurs if and only if the sequence
{xn } converges or tends to +∞ or −∞. For bounded sequences, equality
16 1. Basic Principles
occurs if and only if the set S contains only one number, which must then
be the limit of the sequence.
Another simple consequence of the definitions is that
Theorem. If a sequence {xn } is bounded above and does not tend to −∞,
then the number
β = lim sup xn = sup S
n→∞
Proof. We need only discuss the description of “lim sup”. Let us show
first that β = lim supn→∞ xn satisfies the conditions (i) and (ii). If (i)
fails for some ε > 0, then the inequality xn ≥ β + ε holds for infinitely
many indices n. This implies that xnk ≥ β + ε for some subsequence {xnk }.
But by hypothesis {xn } is bounded above, so the subsequence {xnk } is also
bounded above. Hence by the Bolzano–Weierstrass theorem it has a further
subsequence that converges to a limit λ ≥ β + ε. Thus λ belongs to the
set S of subsequential limits of {xn } and λ > β, which contradicts the fact
that β = sup S is an upper bound for S. This shows that β satisfies (i).
If (ii) fails to hold for some ε > 0, then xn ≤ β − ε for all but a finite
number of indices n, and so β − ε is an upper bound for S, which violates
1.4. Numerical sequences 17
the definition of β as the least upper bound of S. Thus we have proved that
β = lim supn→∞ xn satisfies both (i) and (ii). Conversely, it is clear that at
most one number β can satisfy both (i) and (ii) for each ε > 0, so that a
number with this property must be lim supn→∞ xn .
The proof for “lim inf ” can be carried out in a similar way, or the result
can be deduced from the relation lim inf n→∞ xn = − lim supn→∞ (−xn ) .
The condition (i) for “lim sup” is equivalent to saying that for each
ε > 0 there exists a number N (depending on ε) such that xn < β + ε for
all n ≥ N . A similar remark applies to “lim inf ”.
For a third formulation of upper and lower limits, one can show that
lim sup xn = lim sup xk and lim inf xn = lim inf xk ,
n→∞ n→∞ k≥n n→∞ n→∞ k≥n
and observe that yn+1 ≤ yn , whereas zn+1 ≥ zn . Thus both limits exist by
the monotone boundedness theorem. It is left as an exercise to identify the
limits as lim sup xn and lim inf xn , respectively.
The following theorem is useful in the study of infinite series.
Theorem. For any sequence {rn } of positive numbers,
√ rn+1
(i) lim sup n rn ≤ lim sup
n→∞ n→∞ rn
and
rn+1 √
(ii) lim inf ≤ lim inf n rn .
n→∞ rn n→∞
Proof. We will treat only the inequality (i). (The proof of the inequal-
ity (ii) is similar and is left as an exercise.) We may suppose that β =
lim supn→∞ rn+1 /rn is finite. Then for each ε > 0 there is a number N such
that 0 < rn+1 /rn < β + ε for all n ≥ N . For n > N it follows that
rN +1 rN +2 rn
rn = rN · · ··· < rN (β + ε)n−N , and so
rN rN +1 rn−1
√n
rn < n rN (β + ε)−N (β + ε) < β + 2ε
√
for all n sufficiently large, since n a → 1 for each fixed a > 0. It follows that
√
lim sup n rn ≤ β + 2ε for each ε > 0 ,
n→∞
√
which implies that lim supn→∞ n rn ≤ β.
18 1. Basic Principles
n
1 − rn+1
sn = rk = , r = 1 .
1−r
k=0
The series converges if and only if |r| < 1, in which case its sum is s = 1
1−r .
If a series converges, then an = sn − sn−1 → s − s = 0 as n → ∞. In
other words, a necessary condition for convergence of an infinite series ak
is that its general term ak tend to zero as k → ∞. However, this condition
is far from sufficient.
Infinite series with terms of constant sign are easiest to analyze. If ak ≥ 0
for all k, then sn ≤ sn+1 and the partial sums form a nondecreasing sequence.
By the monotone boundedness theorem, then, the series converges if and
only if its partial sums are bounded above. For an example, let ak = 1/k
and observe that
1 1 1 1 1 1 1
s2m = 1 + + + + + + + + ...
2 3 4 5 6 7 8
1 1
+ + ··· + m
2m−1 + 1 2
1 1 1 1 m
≥ 1 + + 2 · + 4 · + · · · + 2m−1 · m = 1 + →∞
2 4 8 2 2
as m → ∞, which shows that the series ∞ k=1 1/k diverges. On the other
hand, if ak = 1/k 2 , then
1 1 1 1 1 1
s2m = 1 + + + + + + + ...
22 32 42 52 62 72
1 1 1
+ + ··· + m + 2m
2 2(m−1) (2 − 1) 2 2
1 1 1 1
≤ 1 + 2 · 2 + 4 · 2 + · · · + 2m−1 · 2(m−1) + 2m
2 4 2 2
1 1 1 1
= 1 + + + · · · + m−1 + 2m < 2 ,
2 4 2 2
∞
so the sequence {sn } of partial sums is bounded and the series ∞ k=1p 1/k
2
converges. In similar manner it can be seen that the series k=1 1/k con-
verges if p > 1 and diverges if p ≤ 1.
Series with positive decreasing terms are often analyzed most easily by
comparison with an integral. To be more specific, suppose f (x) is a positive
1.5. Infinite series 19
n n
n−1
f (k) ≤ f (x) dx ≤ f (k) , n = 2, 3, . . . ,
k=2 1 k=1
∞
which shows that the sum ∞ k=1 f (k) and the integral 1 f (x) dx converge
or diverge together.
A series ak is said to be absolutely convergent if |ak | is convergent.
Root Test. The infinite series ak is convergent if
lim sup k
|ak | < 1 and divergent if lim sup k
|ak | > 1 .
k→∞ k→∞
20 1. Basic Principles
Proofs. In view of the theorem at the end of Section 1.4, the validity of
the ratio test follows from that of the root test. More specifically, the con-
dition for convergence in the ratio test implies that in the root test, and the
conditions for divergence are similarly related. Therefore, it will suffice to
prove the validity of the root test.
Let β = lim supk→∞ k |ak | and suppose that β < 1. Then for any
number r with β < r < 1 there is an index N such that k |ak | ≤ r for all k ≥
N . Thus |ak | ≤ rk for all k ≥ N and the series ak is absolutely
kconvergent,
hence convergent, by comparison with the geometricseries r . If β > 1,
choose any number ρ with 1 < ρ < β and note that |ak | > ρ for infinitely
k
Although the root test is more powerful than the ratio test, the latter is
often more convenient to apply, especially when factorials are involved.
The question of convergence becomes more delicate when the series is
not absolutely convergent. The basic theorem on series with alternating
signs goes back to Leibniz and is a useful test for convergence. Gottfried
Wilhelm Leibniz (1646–1716) is known as a founder of calculus, along with
Isaac Newton (1643–1727).
Leibniz’ Alternating Series Theorem. Suppose that a1 ≥ a2 ≥ a3 ≥
· · · ≥ 0 and limk→∞ ak = 0 . Then the infinite series
∞
(−1)k+1 ak = a1 − a2 + a3 − a4 + . . .
k=1
is convergent.
Proof. Let
n
sn = (−1)k+1 ak
k=1
denote the partial sums. The strategy is to apply the monotonicity to show
that each of the subsequences {s2n } and {s2n+1 } is convergent, then to use
the hypothesis that ak → 0 to see that the two limits agree.
Observe first that
s2 ≤ s4 ≤ s6 ≤ s8 ≤ . . . .
1.5. Infinite series 21
Similarly,
s2n+3 = s2n+1 − (a2n+2 − a2n+3 ) ≤ s2n+1 ,
so that
s1 ≥ s3 ≥ s5 ≥ s7 ≥ . . . .
In other words, the sequence {s2n } is nondecreasing, whereas the sequence
{s2n+1 } is nonincreasing. To see that both subsequences are bounded, note
that
s2 ≤ s2n ≤ s2n + a2n+1 = s2n+1 ≤ s1 .
Therefore, by the monotone boundedness theorem, each of the subsequences
{s2n } and {s2n+1 } is convergent. Let
Now invoke the hypothesis that ak → 0 to show that the two limits agree:
From this it is easy to deduce that the full sequence {sn } converges to the
same limit s.
where the anm are real numbers. Such a double ∞series is said to con-
verge to the sum s if for each fixed n the series m=1 anm converges to
asum An and ∞ n=1 A n = s. The double series is absolutely convergent if
∞ ∞
n=1 |a
m=1 nm | is convergent.
22 1. Basic Principles
∞
∞ ∞
∞
anm or anm
n=1 m=1 m=1 n=1
is absolutely convergent, then both are convergent and their sums are equal.
Proof of theorem.
∞ Suppose first that anm ≥ 0 for all n, m = 1, 2, . . . , and
∞
assume
∞ that n=1 m=1 anm is (absolutely)
∞ convergent. In other words,
m=1 anm = An < ∞ for each n, and n=1 An converges. Then anm ≤ An
and so
∞ ∞
Bm = anm ≤ An < ∞ for each m .
n=1 n=1
∞
Also, an1 +an2 +· · ·+anm ≤ An , so that B1 +B2 +· · ·+Bm ≤ n=1 An < ∞.
It follows that
∞
∞ ∞
∞
∞
∞
B= anm = Bm ≤ An = anm = A .
m=1 n=1 m=1 n=1 n=1 m=1
as claimed.
Now set δ = 12 min{δ(x1 ), δ(x2 ), . . . , δ(xn )}, and observe that δ > 0. Choose
arbitrary points x, t ∈ S with |x − t| < δ. Then x ∈ J(xk ) for some k, since
the sets J(x1 ), . . . , J(xn ) form a covering of S. Hence x ∈ I(xk ). But the
point t also belongs to the interval I(xk ), in view of the inequality
Because the points x and t belong to the same interval I(xk ), we conclude
that
ε ε
|f (x) − f (t)| ≤ |f (x) − f (xk )| + |f (xk ) − f (t)| < + = ε,
2 2
as desired. This proves that f is uniformly continuous on S.
if to each ε > 0 there corresponds a δ > 0 such that |f (x) − L| < ε for all
x ∈ S with 0 < a − x < δ.
A limit of special importance is the derivative
f (x) − f (x0 )
f (x0 ) = lim ,
x→x0 x − x0
where f is a function defined in an open interval containing the point x0 .
If the derivative exists, the function f is said to be differentiable at the
point x0 . A function is said to be differentiable on an open set S if it has
a derivative at each point of S. One-sided derivatives can be defined as
one-sided limits of the difference quotient.
The familiar rules of differentiation are easily deduced. The derivative
of the sum of two differentiable functions is the sum of their derivatives,
26 1. Basic Principles
the product f (x)g(x) has derivative f (x)g(x) + f (x)g (x), and the quotient
f (x)/g(x) has derivative
f (x)g(x) − f (x)g (x)
g(x)2
wherever g(x) = 0. The “chain rule” for differentiation of the composition
h(x) = f (g(x)) is more subtle but is found to be h (x) = f (g(x))g (x).
Every student of calculus learns that the maximum and minimum of
a function are likely to occur at a critical point; that is, at a point where
the derivative vanishes. This principle has a more precise statement, and
it applies more generally to local maxima and minima. If a function f is
defined on an open set S containing a point x0 , we say that it has a local
maximum at x0 if f (x) ≤ f (x0 ) for all x in some neighborhood
Nδ (x0 ) = {x ∈ R : |x − x0 | < δ} ⊂ S .
A local minimum is defined similarly. Here is the precise statement.
Theorem. If a function f is defined on an open set S and has a local
maximum or local minimum at a point x0 ∈ S, and if f is differentiable at
x0 , then f (x0 ) = 0.
Proof. It suffices to consider a local minimum. By hypothesis, f (x) ≥
f (x0 ) for all x in some neighborhood Nδ (x0 ). Consequently, the difference
quotient
f (x) − f (x0 )
g(x) = , x = x0 ,
x − x0
defined in Nδ (x0 ) except at x0 , satisfies the inequalities g(x) ≥ 0 for x > x0
and g(x) ≤ 0 for x < x0 . Since f is differentiable at x0 , it follows that
f (x0 ) = lim g(x) ≥ 0 and f (x0 ) = lim g(x) ≤ 0 .
x→x0 + x→x0 −
Hence f (x 0) = 0, as claimed.
f (b) − f (a)
f (ξ) = for some ξ ∈ (a, b) .
b−a
so that L(a) = f (a) and L(b) = f (b). Then apply Rolle’s theorem to the
function g(x) = f (x) − L(x).
The mean value theorem can be applied to show that a function with
zero derivative throughout an interval must be constant. More generally, if
the derivatives of two functions coincide in some interval, the two functions
must differ by a constant. Another corollary of the mean value theorem is
that a function with positive derivative in an interval is strictly increasing.
Similarly, a function with negative derivative is strictly decreasing.
n
n
U (f, P ) = Mk (xk − xk−1 ) and L(f, P ) = mk (xk − xk−1 )
k=1 k=1
28 1. Basic Principles
In other words, the supremum of the lower sums taken over all partitions is
less than or equal to the infimum of the upper sums. If equality occurs, the
function f is said to be Riemann integrable over the interval [a, b], and its
integral is defined to be the common value, written as
b
f (x) dx = sup L(f, P ) = inf U (f, P ) .
a P P
Equivalently, we can say that f is Riemann integrable if and only if for each
ε > 0 there exists a partition P such that U (f, P ) − L(f, P ) < ε.
Not every bounded function is Riemann integrable. Consider, for in-
stance, the function defined by f (x) = 1 if x is rational and f (x) = 0 if x
is irrational. Then L(f, P ) = 0 and U (f, P ) = b − a for every partition P ,
since both the rationals and the irrationals are everywhere dense. Thus f is
not Riemann integrable.
On the other hand, every continuous function on an interval [a, b] is
integrable. To see this, we apply the theorem (proved in the previous section)
that every continuous function on a closed bounded interval is uniformly
continuous there. Thus the pointwise continuity of a function f implies that
for each ε > 0 there is a δ > 0 such that |f (x) − f (t)| < ε/(b − a) for all
pairs of points x, t ∈ [a, b] with |x − t| < δ. Now choose a partition P with n
points and all subintervals of length less than δ. Then Mk − mk ≤ ε/(b − a)
and so
n
U (f, P ) − L(f, P ) = (Mk − mk )(xk − xk−1 ) ≤ ε ,
k=1
The product f g is also integrable. If f and g are integrable and f (x) ≤ g(x)
on [a, b], then
b b
f (x) dx ≤ g(x) dx .
a a
In particular, the inequality
b b
f (x) dx ≤ |f (x)| dx
a a
Proofs are omitted here, but can be found in introductory texts such as Ross
[7].
It is often possible to calculate an integral directly from the definition.
The technique was known to Archimedes, who essentially applied a limiting
process to calculate the volumes of geometric solids such as the cone and
sphere. As a simple illustration, let us calculate the integral of f (x) = x2
over the interval [0, 1]. Let Pn denote the partition of [0, 1] with n subinter-
vals of equal length 1/n. Then
n
1 k − 1 2 (n − 1)(2n − 1)
n
L(f, Pn ) = f (xk−1 )(xk − xk−1 ) = =
n n 6n2
k=1 k=1
and
2
1
n n
k (n + 1)(2n + 1)
U (f, Pn ) = f (xk )(xk − xk−1 ) = = ,
n n 6n2
k=1 k=1
30 1. Basic Principles
where the formula for the sum of squares (cf. Exercise 1) has been invoked.
Since both L(f, Pn ) and U (f, Pn ) converge to 1/3 as n → ∞, we conclude
1
that 0 x2 dx = 1/3.
Riemann’s original definition of the integral was based not on upper
and lower sums but on “Riemann sums”, as they are now called. Given a
partition P of [a, b] in the form a = x0 < x1 < x2 < · · · < xn = b, select an
arbitrary point ξk in each subinterval [xk−1 , xk ] and let ξ = (ξ1 , ξ2 , . . . , ξn ).
Then the corresponding Riemann sum is
n
S(f, P, ξ) = f (ξk )(xk − xk−1 ) .
k=1
The intermediate value theorem (cf. Section 1.6) for continuous func-
tions can be applied to establish a corresponding theorem for integrals.
which shows that F is (uniformly) continuous in [a, b]. For x0 ∈ (a, b), write
x
F (x) − F (x0 ) 1
−f (x0 ) = f (t)−f (x0 ) dt , x ∈ (a, b), x = x0 .
x − x0 x − x0 x0
Let ε > 0 be given. If f is continuous at x0 , then for some δ > 0 the
inequality |f (x) − f (x0 )| < ε holds for all x ∈ [a, b] with |x − x0 | < δ.
Therefore,
F (x) − F (x0 )
− f (x0 ) < ε if 0 < |x − x0 | < δ ,
x − x0
which proves that F is differentiable at x0 and F (x0 ) = f (x0 ).
32 1. Basic Principles
(b). Since g is integrable, for each ε > 0 there is a partition P such that
U (g , P ) − L(g , P ) < ε .
On the other hand, an application of the mean value theorem to the subin-
terval [xk−1 , xk ] of this partition P gives
Because ε > 0 was prescribed arbitrarily, this gives the desired result.
|f (x) − f (x0 )| ≤ |f (x) − fn (x)| + |fn (x) − fn (x0 )| + |fn (x0 ) − f (x0 )|
holds for all x ∈ S. By the uniform convergence, we infer that there exists
a number N (depending only on ε) such that
ε ε
|f (x) − f (x0 )| < + |fn (x) − fn (x0 )| + for all x ∈ S
3 3
whenever n ≥ N . Choose any n ≥ N and apply the continuity of fn to
conclude that for some δ > 0 the inequality |f (x) − f (x0 )| < ε holds for all
x ∈ S with |x − x0 | < δ. Thus f is continuous at x0 .
34 1. Basic Principles
An infinite series ∞ n=1 un (x) of functions defined on a set S ⊂ R is
understood to be uniformly
convergent on S to a sum s(x) if its sequence
of partial sums sn (x) = nk=1 uk (x) converges uniformly to s(x) on S. The
previous theorem has an equivalent expression for series.
Corollary. If an infinite series ∞ n=1 un (x) of continuous functions con-
verges uniformly on a set S, then its sum is continuous on S.
When a sequence of integrable functions fn (x) converges pointwise to
f (x) on an interval [a, b], the limit function f need not be integrable. If
b
f is integrable, it may well happen that the integrals a fn (x) dx do not
b
converge to a f (x) dx. Consider, for example, fn (x) = ngn (x), where gn (x)
is the “triangle function” defined above. The sequence {fn (x)} converges
pointwise (but not uniformly) on [0, 1] to f (x) ≡ 0, yet
1 1
fn (x) dx = 1 for every n, whereas f (x) dx = 0 .
0 0
A similar example is given in Exercise 31. The next theorem shows, however,
that integrals behave as expected under uniform convergence.
Theorem. If the functions f1 , f2 , . . . are integrable over an interval [a, b]
and fn (x) → f (x) uniformly on [a, b], then f is integrable and
b b
f (x) dx = lim fn (x) dx .
a n→∞ a
Proof. The continuity of f and ∂f∂x on the compact set R ⊂ R implies that
2
are continuous on the interval [a, b]. Therefore, for each point x ∈ [a, b] we
have
x x d d x
∂f ∂f
Ψ(t) dt = (t, y) dy dt = (t, y) dt dy
a a c ∂t c a ∂t
d
= f (x, y) − f (a, y) dy = Φ(x) − Φ(a) .
c
so that Φ (x) = Ψ(x). Further details and related results can be found, for
instance, in the book by Burkill and Burkill [2].
if n, m ≥ N .
1.8. Uniform convergence 37
Proof. If {fn (x)} is a uniform Cauchy sequence on S, then clearly {fn (x0 )}
is a numerical Cauchy sequence for each x0 ∈ S. Thus by the Cauchy
criterion, the sequence {fn (x)} converges pointwise on S to a function f (x).
To see that the convergence is uniform, apply the uniform Cauchy property:
n
sn (x) = uk (x) , n = 1, 2, . . . .
k=1
if m is sufficiently large, since the series ∞k=1 Mk converges. Thus {sn (x)}
is a uniform Cauchy sequence, and so it converges uniformly on S.
Dini’s Theorem. Suppose that the functions f1 (x), f2 (x), . . . are contin-
uous on a closed bounded interval [a, b], and f1 (x) ≥ f2 (x) ≥ · · · ≥ 0 for
all x ∈ [a, b]. If fn (x) → 0 pointwise for each x ∈ [a, b], then fn (x) → 0
uniformly in [a, b].
Proof. By assumption, either fn (x) ≤ fn+1 (x) ≤ f (x) or fn (x) ≥ fn+1 (x) ≥
f (x). Define gn (x) = f (x) − fn (x) or gn (x) = fn (x) − f (x), respectively, to
obtain a sequence of continuous functions gn (x) that decreases pointwise to
0. The theorem then says that gn (x) → 0 uniformly in [a, b] as n → ∞.
Proof. Let ε > 0 be given. Suppose first that 0 ≤ fn (x) ≤ f (x). Since f is
integrable, we can choose a number R > 0 for which
−R ∞
ε
0≤ f (x) dx + f (x) dx < ,
−∞ R 2
1.9. Historical remarks 39
for all n. On the other hand, fn (x) → f (x) uniformly on [−R, R] by Dini’s
theorem, so that
R
ε
0≤ [f (x) − fn (x)] dx <
−R 2
for all n sufficiently large. Addition of these inequalities now shows that
∞
0≤ [f (x) − fn (x)] dx < ε
−∞
for sufficiently large n, which is the desired result. The case where 0 ≤
f (x) ≤ fn (x) is treated in similar fashion, with the integrability of f1 allow-
ing the reduction to a finite interval [−R, R].
Proof. Since un (x) ≥ 0, the partial sums sn (x) increase to s(x), so the
conclusion follows from Corollary 2.
for later treatments. In those texts Euler made functions, not curves, the
primary objects of study. However, his definition of a function was rather
restrictive and he accepted without question the notion of an “infinitely
small” quantity known as a differential. His proofs and derivations of formu-
las tended to be nothing more than plausible arguments, not at all rigorous
by modern standards. Joseph-Louis Lagrange (1736–1813), best known for
his fundamental work in mechanics, tried to free calculus from differentials
and limits by placing heavy emphasis on power series. His attempt was not
successful, but he did give the first clear analysis of Taylor series expansion.
In 1821, Augustin-Louis Cauchy (1789–1857) published the first rigorous
textbook in calculus, a Cours d’analyse written for his students at the École
Royale Polytechnique in Paris. There he gave clear definitions of limits and
continuity, essentially the definitions that are commonly accepted today. He
formulated the notion of a Cauchy sequence and he introduced the defi-
nition of integral that was later refined and completed by Riemann. The
book is a landmark in the history of mathematics. However, Cauchy failed
to distinguish pointwise continuity from uniform continuity, and pointwise
convergence from uniform convergence, making some of his arguments falla-
cious. Cauchy, like Euler, published a huge volume of work in many areas of
mathematical science. Perhaps his greatest achievement was to develop the
theory of functions of a complex variable, including the calculus of residues.
The dominant figure in the second half of the 19th century was Karl
Weierstrass (1815–1897). His career was truly remarkable. For some years
he taught at a Gymnasium, a German secondary school, carrying out high-
level mathematical research in obscurity. Then at age 39 he published a
groundbreaking paper in Crelle’s Journal. It was an immediate sensation.
Two years later, Weierstrass had moved to a professorship at the University
of Berlin. There his lectures contained many original results and attracted
students from all over the world, many of whom became outstanding math-
ematicians in later years. His lectures on introductory analysis perfected
the program that Cauchy had initiated, bringing new rigor and clarity to
calculus while introducing further results.
Bernard Bolzano (1781–1848), a Bohemian monk in Prague, was a con-
temporary of Cauchy who independently obtained many of the same results
in the foundations of analysis. In 1817 he published a proof of the inter-
mediate value theorem, often called “Bolzano’s theorem”. In the same long
paper, Bolzano introduced the least upper bound principle and “Cauchy’s
criterion” for convergence of a sequence. However, his paper appeared in
an obscure journal published in Prague, far from the scientific centers of
the day, and there is no evidence that Cauchy knew of Bolzano’s ideas be-
fore writing his own Cours d’analyse a few years later. As suggested by
1.10. Metric spaces 41
The leading example is (R, d), the set of real numbers equipped with
the usual metric, or distance function, d(x, y) = |x − y|. A more general
example is (Rn , dn ), the n-dimensional Euclidean space with metric
1/2
n
dn (x, y) = (xk − yk )2 , x = (x1 , . . . , xn ) , y = (y1 , . . . , yn ) .
k=1
The first of these two metric spaces is complete, but the second is not.
An extensive theory of metric spaces has been developed, and there are
many examples important in analysis. Further information can be found in
the book by Rudin [8] and in other texts.
The need for an extension of the real number system is already apparent
in the problem of solving a quadratic equation ax2 + bx + c = 0, where
a, b, c ∈ R and a = 0. According to the quadratic formula, the solutions
have the form √
−b ± b2 − 4ac
x= .
2a
Of course, it may well happen that b2 − 4ac < 0, in which case the square-
root is undefined and the equation has no solutions in the algebra of real
numbers. In this case, however, it is fruitful to interpret the quadratic
formula as providing a pair of solutions
√
b 4ac − b2 √
− ±i , where i = −1 .
2a 2a
In effect, this amounts to extending the algebra of real numbers to the
algebra of complex numbers z = x + iy, where x and y are real numbers.
It should be emphasized that in the process of enlarging the set R of
real numbers to the set C of complex numbers, the algebraic operations of
addition and multiplication are also extended to C. The sum and product
of two complex numbers z = x + iy and w = u + iv are defined by
These expressions result from formal manipulations, assuming that the imag-
inary number i is subject to the usual rules of arithmetic. In this way the
algebra of real numbers is extended to the algebra of complex numbers,
preserving the basic properties of the arithmetic operations. Specifically,
addition and multiplication are both commutative and associative, and mul-
tiplication is distributive over addition: z(w1 + w2 ) = zw1 + zw2 .
A complex number z = x + iy is said to be real if y = 0, and purely
imaginary if x = 0. The real part of z is Re{z} = x and the imaginary
part is Im{z} = y. The complex conjugate of z is z = x − iy. Conjugation
commutes with addition and multiplication: z +w = z + w and zw = z w.
The modulus of a complex number z is |z| = x2 + y 2 , a generalization
of the absolute value of a real number. Observe that z z = |z|2 and that
|z| = |z|. Two useful formulas are
1 1
Re{z} = (z + z) and Im{z} = (z − z) .
2 2i
It may be noted that |Re{z}| ≤ |z| and |Im{z}| ≤ |z|. A simple calculation
shows that
In other words, the product of two complex numbers is the number whose
modulus is the product of their moduli and whose argument is the sum of
their arguments. In symbols,
|α − β| ≤ |α − γ| + |γ − β| , where α, β, γ ∈ C .
1.11. Complex numbers 45
Then the inequality says that the distance from α to β is no greater than
the distance from α to γ plus the distance from γ to β. The three points
α, β, γ can be viewed as vertices of a triangle.
In fact the formula
|α − β|2 = |α|2 + |β|2 − 2 Re{αβ}
is none other than the law of cosines. To see this, view the three points 0,
α, and β as vertices of a triangle. Then the sides have lengths |α|, |β|, and
|α − β|. Let θ = arg{α} and ϕ = arg{β}, where 0 < θ − ϕ < π. Then
the angle at the origin, the vertex opposite the side of length |α − β|, is
ψ = θ − ϕ. But Re{αβ} = |α||β| cos(θ − ϕ), so the formula does reduce to
the law of cosines: c2 = a2 + b2 − 2ab cos ψ.
Another consequence of the product formula is the de Moivre formula
(cos θ + i sin θ)n = cos nθ + i sin nθ ,
which is a compact expression for a family of trigonometric identities. In
view of the binomial theorem it gives, for instance
cos 5θ = cos5 θ − 10 cos3 θ sin2 θ + 5 cos θ sin4 θ .
The product formula appears more natural when the polar form of a
complex number is expressed in exponential form. Euler’s formula eiθ =
cos θ + i sin θ results from a formal comparison of Taylor series expansions.
(See Chapter 3, Exercise 23.) If we write z = reiθ and w = ρeiϕ , the
product formula is zw = rρei(θ+ϕ) . It is possible to define the exponential
of a complex number as a power series, but for the moment we may simply
regard eiθ as a notation for cos θ + i sin θ, with the property eiθ eiϕ = ei(θ+ϕ)
that is characteristic
of an exponential function. Observe also that eiθ = e−iθ
iθ
and e = 1, and that
d iθ
e = − sin θ + i cos θ = ieiθ .
dθ
The quotient of two complex numbers a + ib and c + id = 0 can be
defined as the solution z of the equation (c + id)z = a + ib. To see that
this equation has a unique solution, let z = x + iy and write the complex
equation in real form as a pair of linear equations
cx − dy = a , dx + cy = b .
This system has the unique solution
ac + bd bc − ad
x= 2 2
, y= 2 .
c +d c + d2
The quotient can be calculated by writing
a + ib (a + ib)(c − id) ac + bd bc − ad
= = 2 +i 2 .
c + id (c + id)(c − id) c + d2 c + d2
In more concise form, z/w = z w/|w|2 .
46 1. Basic Principles
Exercises
1. Use mathematical induction to prove that
n(n + 1)(2n + 1)
(a) 12 + 22 + · · · + n2 = , n = 1, 2, . . . ,
6
(b) 13 + 23 + · · · + n3 = (1 + 2 + · · · + n)2 , n = 1, 2, . . . .
(b) Prove that the inequality holds more generally for all x > −1.
4. Prove the inequality | sin nθ| ≤ n| sin θ| , n = 1, 2, . . . .
5. The Fibonacci sequence {xn } is defined by
x0 = 0, x1 = 1, x2 = 1, x3 = 2, x4 = 3, x5 = 5, x6 = 8, . . . .
Here the general rule is that each number in the sequence is the sum of the
two previous numbers: xn+2 = xn+1 + xn for n = 0, 1, 2, . . . .
(a) Show that the recurrence relation xn+2 = xn+1 + xn has solutions
of the form xn = αn for exactly two real constants α. Find this pair of
numbers.
(b) Show that the Fibonacci sequence has the form
xn = A α n + B β n , n = 0, 1, 2, . . . ,
7. Prove that the collection of all infinite sequences of 0’s and 1’s is un-
countable.
8. Prove that the collection of all sets of positive integers is uncountable.
9. Prove that the set of all algebraic numbers is countable and deduce the
existence of transcendental numbers.
Hint. Use the fact that a polynomial of degree n has at most n real roots.
10. (a) Show that the reciprocal of each nonzero algebraic number is
algebraic.
(b) Show that the square root of each positive algebraic number is
algebraic.
11. Show that a set of real numbers is closed if and only if it contains all of
its cluster points.
12. For any set S of real numbers, the derived set S is defined as the set
all cluster points of S. (It may be empty.) Show that for every set S ⊂ R
the set S is closed.
13. Let {sn } be a sequence of real numbers and let
s1 + s2 + · · · + sn
σn =
n
be its sequence of arithmetic means.
(a) If {sn } converges to s, show that {σn } also converges to s.
(b) Give an example to show that the converse is false: the sequence
{σn } may converge although the sequence {sn } does not.
14. Without appeal to Stirling’s formula, calculate
1
lim (n!)1/n .
n→∞ n
Suggestion. Apply the last theorem in Section 1.4.
15. Suppose a1 ≥ a2 ≥ a3 ≥ · · · > 0. Prove that the series
∞
∞
an converges if and only if 2k a2k converges.
n=1 k=0
17. If x1 = 1 and
√
xn+1 = 1 + xn , n = 1, 2, . . . ,
prove that the sequence {xn } converges and find its limit.
48 1. Basic Principles
18. Given numbers a1 and b1 with 0 < a1 < b1 , define the sequences {an }
and {bn } inductively by
an + bn
an+1 = an bn and bn+1 = , n = 1, 2, . . . .
2
Show that {an } and {bn } both converge, and to the same limit.
√
Hint. First observe that ab ≤ a+b2 for all a > 0 and b > 0.
19. Using the results of Section 1.4, verify the unrestricted limits
(a) lim xp rx = 0 for p > 0 and 0 < r < 1 ,
x→∞
(b) lim x−p log x = 0 for p > 0 ,
x→∞
(c) lim xp log x = 0 for p > 0 ,
x→0+
(d) lim xx = 1 .
x→0+
24. Prove the second inequality in the theorem at the end of Section 1.4:
rn+1 √
lim inf ≤ lim inf n rn .
n→∞ rn n→∞
28. Apply the generalized mean value theorem of the preceding exercise to
establish l’Hôpital’s rule in the following form. If f and g are differentiable
in an open interval (a, b) and f (x)/g (x) → L as x → a+, and if f (x) → 0
and g(x) → 0 as x → a+. then f (x)/g(x) → L as x → a+.
29. Verify that the function defined by
2
x sin x1 , x = 0
f (x) =
0, x = 0.
is differentiable everywhere on R, but its derivative is not continuous at the
origin.
30. The preceding example shows that a derivative may exist but fail to be
continuous. Show nevertheless that a derivative always has the intermediate
value property. Specifically, prove that if f is differentiable on an interval
[a, b] and f (a) < λ < f (b), then f (ξ) = λ for some ξ ∈ (a, b).
Hint. Show that g(x) = f (x) − λx attains its minimum value at a point
ξ ∈ (a, b).
31. For 0 ≤ x ≤ 1, define the functions fn (x) = n2 xn (1 − x), where
n = 1, 2, . . . .
(a) Show that limn→∞ fn (x) = 0 for each x ∈ [0, 1].
n
(b) Show directly, by consideration of fn ( n+1 ), that the convergence is
not uniform on [0, 1].
1
(c) Show that limn→∞ 0 fn (x)dx = 1, and conclude again that the
convergence of fn (x) to 0 is not uniform on [0, 1].
32. Let the functions f1 (x), f2 (x), . . . be continuous on the interval [0, 1]
and suppose that fn (x) → f (x) uniformly on [0, 1] as n → ∞. Show that
fn (1/n) → f (0).
33. Apply the Bolzano–Weierstrass theorem instead of the Heine–Borel
theorem to show that a function continuous at each point of a closed bounded
set S ⊂ R is uniformly continuous on S.
50 1. Basic Principles
Show that
∞
∞ ∞
∞
anm = −2 but anm = 0 .
n=1 m=1 m=1 n=1
Explain why this example does not contradict Cauchy’s double series theo-
rem (see Section 1.5).
References
[1] Umberto Bottazzini, The Higher Calculus: A History of Real and Complex
Analysis from Euler to Weierstrass, Springer-Verlag, New York, 1986.
This chapter is devoted to a study of some special sequences that arise often
in analysis. We begin with two familiar sequences which converge to e, the
base of natural logarithms. As a byproduct of the analysis, the number e is
found to be irrational. This prompts an elementary proof that π is irrational
as well. Further topics include Euler’s constant γ and the infinite product
formulas of Vieta and Wallis. The chapter concludes with a crown jewel of
classical analysis, Stirling’s approximation to the factorial function.
51
52 2. Special Sequences
be the nth partial sum, and observe that sn < sn+1 , since all terms are
positive. On the other hand, the inequality k! ≥ 2k−1 shows that
1 1 1
sn ≤ 1 + 1 + + 2 + · · · + n−1 < 3
2 2 2
for all n. Thus the sequence {sn } is monotonic and bounded, and is therefore
convergent. We denote its limit, or the sum of the infinite series, by e.
Now consider the expressions
1 n
xn = 1 + , n = 1, 2, 3, . . . .
n
In order to study the behavior of the sequence {xn }, we can use a hand
calculator to compute
The data suggest that {xn } is an increasing sequence. Our next step will be
to show that this is actually true.
According to the binomial theorem, we can write
n k
1 n n 1
xn = 1 + =
n k n
k=0
2
1 n(n − 1) 1 n(n − 1)(n − 2) 1 3
=1+n + +
n 2! n 3! n
n
n(n − 1)(n − 2) · · · 1 1
+ ··· +
n! n
1 1 1 1 2
=1+1+ 1− + 1− 1−
2! n 3! n n
1 1 2 n−1
+ ··· + 1− 1− ··· 1 − .
n! n n n
A similar expansion of xn+1 gives
n+1
n+1 k
1 n+1 1
xn+1 = 1+ = .
n+1 k n+1
k=0
Observe now that for each fixed k in the range 2 ≤ k ≤ n, the term
k
n+1 1 1 1 2 k−1
= 1− 1− ··· 1 −
k n+1 k! n+1 n+1 n+1
k
1 1 2 k−1 n 1
≥ 1− 1− ··· 1 − = .
k! n n n k n
2.1. The number e 53
n+1
1
Furthermore, the expansion for xn+1 contains an extra term n+1 > 0,
corresponding to k = n + 1. These two inequalities combine to show that
xn+1 > xn for n = 1, 2, 3, . . . .
The above expansion for xn also shows that
1 1 1
xn ≤ 1 + 1 + + + ··· + = sn < 3 .
2! 3! n!
Therefore, x1 < x2 < x3 < · · · < 3, and so the sequence {xn } is monotonic
and bounded, hence convergent. We are going to prove that
lim xn = e ,
n→∞
the limit of the sequence {sn }. Since xn ≤ sn for all n, it follows that
lim xn ≤ lim sn = e .
n→∞ n→∞
On the other hand, for each fixed m ≤ n the sum in the previous calcu-
lation can be truncated to give
1 1 1 1 2
xn ≥ 1 + 1 + 1− + 1− 1−
2! n 3! n n
1 1 2 m−1
+ ··· + 1− 1− ··· 1 − .
m! n n n
lim xn ≥ lim sm = e .
n→∞ m→∞
1 1
lim xn = lim sn = e = 1 + 1 + + + ... .
n→∞ n→∞ 2! 3!
1 1 1
e − sn = + + +...
(n + 1)! (n + 2)! (n + 3)!
2
1 1 1 1
< 1+ + + ... = ,
(n + 1)! n+1 n+1 n! n
1
0 < e − sn < for n = 1, 2, 3, . . . .
n! n
a0 + a1 x + a2 x2 + · · · + an xn = 0
2.2. Irrationality of π
We now digress from the theme of this chapter to prove that the number
π is irrational. This fact lies intrinsically deeper than the irrationality of e,
and was proved by more sophisticated methods before Ivan Niven [5] found
the remarkably elementary proof that will be presented here. In fact, the
proof yields the stronger result that π 2 is irrational.
Consider the polynomial
1 n
f (x) = x (1 − x)n ,
n!
where n is a positive integer to be specified later. This function satisfies
1
0 < f (x) < for 0 < x < 1 .
n!
It is easy to see that each of the derivatives f (k) (0) is an integer. Indeed,
f (k) (0) = 0 for 0 ≤ k < n, and for k ≥ n a calculation shows that the kth
derivative of xn (1 − x)n at the origin is an integer divisible by n!. (This
remains true if the factor (1 − x)n is replaced by any other polynomial
with integer coefficients.) By the symmetry relation f (1 − x) = f (x), it
follows that every derivative f (k) (1) is also an integer. Observe finally that
f (k) (x) ≡ 0 for all k > 2n, since f is a polynomial of degree 2n.
Now suppose, for purpose of contradiction, that π 2 is rational, so that
π 2 = p/q for some positive integers p and q. Define the polynomial
g(x) = q n π 2n f (x) − π 2n−2 f (2) (x) + π 2n−4 f (4) (x) − · · · + (−1)n f (2n) (x) ,
and note that both g(0) and g(1) are integers under the supposition that
π 2 = p/q. Because of the structure of g, we see that
d
g (x) sin πx − πg(x) cos πx = g (x) + π 2 g(x) sin πx
dx
= q n π 2n+2 f (x) sin πx = π 2 pn f (x) sin πx .
Consequently,
1 1
1
πp n
f (x) sin πx dx = g (x) sin πx − g(x) cos πx = g(1) + g(0) ,
0 π 0
56 2. Special Sequences
which is an integer. On the other hand, since 0 < f (x) < 1/n! for 0 < x < 1 ,
we find that 1
n πpn
0 < πp f (x) sin πx dx < <1
0 n!
if n is chosen sufficiently large. But this is impossible, because there is no
integer between 0 and 1. Thus the assumption that π 2 is rational has led to
a contradiction, and so we conclude that π 2 is irrational, which implies that
π is irrational.
The transcendence of π has an interesting application to classical geom-
etry. It settles once and for all the ancient problem of “squaring the circle”
with straight edge and compass. Given an arbitrary circle, the problem is
to construct a square of the same area. Since the circle has area πr2 , this
amounts to starting with a line segment of unit length and constructing a
√
segment of length π. A segment of length π could then be constructed,
since from any segment of length it is possible to construct a segment of
length 2 . But it can be shown that the length of every segment constructible
with straight edge and compass, starting with a segment of unit length, is
an algebraic number. (A good reference for this fact is the book by Courant
and Robbins [1].) Therefore, if it were possible to square the circle with
straight edge and compass, the number π would have to be algebraic. But
π is transcendental, so it is impossible to square the circle.
It is named for Leonhard Euler, who first discussed it in 1734. The number
γ is an important constant that occurs frequently in mathematical formulas.
The existence of the limit is not obvious. Our aim is to prove that the limit
exists and to determine its approximate numerical value.
Consider the curve y = 1/x for 1 ≤ x ≤ n, where n = 2, 3, . . . . The area
under the curve is given by
n
1
An = dx = log n .
1 x
Now construct rectangular boxes of heights 1/k over the intervals [k, k + 1],
as shown in Figure 1.
2.3. Euler’s constant 57
x
1 2 3 4 ... n
Since
1 1 1
≤ ≤ for k ≤ x ≤ k + 1 ,
k+1 x k
it follows that
k+1 k+1 k+1
1 1 1 1 1
= dx ≤ dx ≤ dx =
k+1 k k+1 k x k k k
n−1 n 1 n−1
1 1
≤ dx ≤ .
k+1 1 x k
k=1 k=1
The existence of Euler’s constant has now been established, and the proof
shows that 0 ≤ γ ≤ 1.
In fact, it is visually clear from Figure 2 that γ is slightly larger than 12 .
Our next task is to derive quantitative bounds on γ by estimating the area
k+1
1 1
αk = − dx
k k x
of the region in the kth box that lies above the curve y = 1/x. Since the
curve is convex, that region contains a triangle of area
1 1 1
−
2 k k+1
1 1
− 1 ,
k k+ 2
2.3. Euler’s constant 59
1
k k+2 k+1
constructed by drawing the tangent line to the curve at the point where
x = k + 12 . (See Figure 3 and note that the trapezoid above the tangent line
is obtained by removing the lower trapezoid from the entire rectangle.)
2
lim (b1 b2 · · · bn ) = ,
n→∞ π
where
1 1
b1 = 2 and bn+1 = 2 + 12 bn , n = 1, 2, . . . .
x sin θ
lim 2n sin = x, since lim = 1.
n→∞ 2n θ→0 θ
2.5. Wallis product formula 61
1 1 1 2
(2) lim 1− 2 1− 2 ··· 1 − =
n→∞ 2 4 (2n)2 π
and
22n (n!)2 √
(3) lim √ = π.
n→∞ (2n)! n
The factors in (2) are simply the reciprocals of those in (1). To see that (1)
2n → 1, the formula (1)
and (3) are equivalent, observe first that because 2n+1
implies
62 2. Special Sequences
22 42 62 · · · (2n − 2)2 π
lim (2n) = .
n→∞ 3 5 7 · · · (2n − 1)
2 2 2 2 2
Taking square roots, we deduce that
π 2 · 4 · 6 · · · (2n − 2) √
= lim 2n
2 n→∞ 3 · 5 · 7 · · · (2n − 1)
22 42 62 · · · (2n − 2)2 (2n)2
= lim √
n→∞ (2n)! 2n
1 22n (n!)2
= √ lim √ ,
2 n→∞ (2n)! n
which shows that (1) implies (3). Since all steps are reversible, a similar
argument shows that (3) implies (1).
Taken in the version (3), the Wallis product√formula may be regarded
as a weak form of Stirling’s formula n! ∼ nn e−n 2πn , or
n!
lim √ = 1,
n→∞ nn e−n 2πn
which will be proved in the next section.
For a proof of the product formula (1), consider the integrals
π/2
In = sinn x dx , n = 0, 1, 2, . . . .
0
Observe that I0 = π
2 and I1 = 1. To compute In for n ≥ 2, write
I2n I2n−1 2n + 1
1< < = → 1,
I2n+1 I2n+1 2n
which shows that I2n /I2n+2 → 1 as n → ∞. This completes the proof of the
Wallis product formula (1).
The formula is actually a special case of a much more general relation.
The sine function has the infinite product representation
∞
x2
sin πx = πx 1− 2 , x ∈ R,
n
n=1
1
as will be proved in Chapter 8. Set x = 2 to obtain the Wallis product
formula (2).
On the other hand, the area can be estimated geometrically. Since the
logarithm is a concave function, the curve y = log x lies above each of its
chords connecting successive points (k, log k), for k = 1, 2, . . . , n. Thus An
is larger than the sum of areas of the trapezoids under those line segments.
The total area of the trapezoids is
Tn = 1
2 log 2 + 12 (log 2 + log 3) + · · · + 12 (log(n − 1) + log n)
= log 2 + log 3 + · · · + log(n − 1) + 1
2 log n
= log(n!) − 1
2 log n .
x
1 2 3 4 ... n
Now let αk denote the area of the small region bounded by the curve
y = log x and the line segment joining the two points (k, log k) and (k +
1, log(k + 1)), for k = 1, 2, . . . , n − 1. Then the total area under the curve is
An = Tn + En , where En = α1 + α2 + · · · + αn−1 .
Inserting the expressions for An and Tn , we can write this relation in the
form
log(n!) = n + 12 log n − n + 1 − En ,
or
n! = Cn nn+ 2 e−n ,
1
where Cn = e1−En .
The sequence {En } is increasing, since each term αk is positive. We
now show that the sequence {En } has an upper bound and is therefore
convergent. In order to estimate αk , we construct the tangent line to the
curve y = log x at the point where x = k + 12 (see Figure 5) and compare
areas:
1 1
αk < log k + − log k + log(k + 1)
2 2
1 k + 12 1 k+1
= log − log
2 k 2 k + 12
1 1 1 1
= log 1 + − log 1 +
2 2k 2 2k + 1
1 1 1 1
< log 1 + − log 1 + .
2 2k 2 2(k + 1)
k k + 1/2 k
n−1
n−1
1
1
1
1
En = αk < log 1 + − log 1 +
2 2k 2 2(k + 1)
k=1 k=1
1 3 1 1 1 3
= log − log 1 + < log ,
2 2 2 2n 2 2
since
the dominant series telescopes. Thus En increases to a finite limit
E = ∞ k=1 αk , and so Cn = e
1−En decreases to a limit C = e1−E > 0. In
Exercises
x2
x− < log(1 + x) < x , x > 0,
2
Exercises 67
Conclude that n
1 1
lim 1− = .
n→∞ n e
(b) Find
1 2 n
lim 1+ 2 1+ 2 ··· 1 + 2 .
n→∞ n n n
x2 x2 x3
x− < log(1 + x) < x − + , x > 0.
2 2 3
(This is an improvement for small x.) Apply the result to calculate
n 2
1
lim 1+ e−n .
n→∞ n
3. (a) Evaluate
1/n
1 (2n)!
lim
n→∞ n n!
by showing, directly from the definition of an integral, that the logarithm
2
of the expression tends to 1 log x dx. Calculate the integral.
(b) Check your result with the help of Stirling’s formula.
5. Write e = sn + rn , where sn = 1 + 1
1! + 1
2! +··· + 1
n! .
1 1
(a) Show that < n!rn < .
n+1 n
(b) Calculate lim n sin(2πn!e) .
n→∞
6. Test the following infinite series for convergence and absolute conver-
gence. Explain your reasoning.
∞
n!en
(a) , p ∈ R.
nn+p
n=1
∞
a
(b) (−1)n sin , a > 0.
n
n=1
∞
1
(c) 1 − cos .
n
n=1
2 π sin x
x ≤ sin x ≤ x for 0 ≤ x ≤ ; lim = 1.
π 2 x→0 x
1 1 1
1− + − + · · · = log 2 .
2 3 4
1 1 1 1 1 1 1 1 3
1+ − + + − + + − + · · · = log 2 .
3 2 5 7 4 9 11 6 2
9. Show that
n
1 1 γ
lim − log n = + log 2 .
n→∞ (2k + 1) 2 2
k=0
11. Let
n
1 1
xn = − log n and yn = xn − .
k 2n
k=1
(a) Show that the sequence {xn } is decreasing and {yn } is increasing,
so that yn < γ < xn for n = 1, 2, . . . .
Exercises 69
culate xn for n = 10, 000. Use the result to give the bounds 0.57721 < γ <
0.57726 and thus to calculate γ = 0.5772 . . . to 4 decimal places.
2n
2n
n
k+1 log k log k log 2k
(−1) = −2 ,
k k 2k
k=1 k=1 k=1
√
3 3
1
2
1
2 + 1
2 · 1
2
1
2 + 1
2
1
2 + 1
2 · 1
2
1
2 + 1
2
1
2 + 1
2
1
2 + 1
2 · ··· =
1
2 .
4π
π/2
In = sinn x dx , n = 1, 2, . . . ,
0
2n 22n
∼√ as n → ∞ .
n πn
π
sin2n+1 x < sin2n x < sin2n−1 x , 0<x< ,
2
2 2
2 (2n)!! 1 (2n)!!
<π< , n = 1, 2, . . . ,
2n + 1 (2n − 1)!! n (2n − 1)!!
where
References
[1] Richard Courant and Herbert Robbins, What is Mathematics?, Oxford Univer-
sity Press, London and New York, 1941.
[2] Charles Hermite, “Sur la fonction exponentielle”, Comptes Rendus Acad. Sci.
Paris 77 (1873), 18–24, 74–79, 226–233, 285–293.
[4] Ferdinand von Lindemann, “Über die Zahl π”, Math. Annalen 20 (1882), 213–
225.
[6] Ivan Niven, Numbers: Rational and Irrational , New Mathematical Library,
Volume 1, Random House, New York-Toronto, 1961.
This chapter begins with a review of the basic properties of power series. The
more delicate problem of boundary behavior is then addressed with a dis-
cussion of Abel’s theorem. In the converse direction, Taylor’s formula with
remainder is developed and applied to the expansion of certain functions in
power
series. The results then allow an elementary evaluation of Euler’s sum
1/k = π 2 /6. The chapter concludes with Weierstrass’s famous example
2
The number a is called the center of the power series, and the numbers cn
are its coefficients. The first problem is to find the set of numbers x ∈ R for
which the series converges.
Obviously, the series converges for x = a since every term then vanishes
except possibly for c0 = f (a). It need not converge anywhere else, but if
it does converge at some point x0 = a, then it will converge in the interval
|x − a| < |x0 − a|, uniformly in each closed subinterval |x − a| ≤ ρ < |x0 − a|.
73
74 3. Power Series and Related Topics
To see this, observe first that the convergence at x0 implies that the terms
cn (x0 − a)n tend to zero as n → ∞, so that
1
R= , where β = lim sup n
|cn | .
β n→∞
A power series can also be differentiated term by term within its interval
of convergence. We have the following theorem.
Theorem. The sum of a power series
∞
f (x) = cn (x − a)n , |x − a| < R ,
n=0
Because the derivative f (x) is given by another power series with the
same radius of convergence, the theorem shows that f (x) exists and is
expressed by the twice-differentiated power series. Continuing in this way,
we see that the sum of a power series has derivatives of all orders, calculated
by successive term-by-term differentiation.
converges at the point x = 1. It then follows that the power series converges
for all x in the interval (−1, 1], and Abel’s theorem asserts that the sum
function f is continuous at the point 1. Since the sum of a power series
is known to be continuous within its interval of convergence (−R, R), the
theorem as stated is of interest only when R = 1.
Here are some examples. The function f (x) = log(1 + x) has the power
series expansion
1− 1
2 + 1
3 − 1
4 + · · · = lim log(1 + x) = log 2 .
x→1−
1− 1
3 + 1
5 − 1
7 + · · · = lim tan−1 x = tan−1 1 = π
4 .
x→1−
On the other hand, the converse of Abel’s theorem is false in the absence
of additional hypotheses. In other words, the existence of the limit of f (x)
as x → 1− does not imply the convergence of the power series at x = 1. For
instance,
1
1 − x + x2 − x3 + · · · = → 12
1+x
as x → 1−, but the series 1 − 1 + 1 − 1 + . . . is divergent.
The proof of Abel’s theorem makes use of a device known as Abel sum-
mation or summation by parts, the discrete analogue of integration by parts.
It can be expressed as follows.
Let {ak } and {bk } be arbitrary sequences of real numbers, and let
Lemma.
sn = nk=0 ak . Then for 0 ≤ m < n,
n
n−1
ak bk = sk (bk − bk+1 ) + sn bn − sm−1 bm ,
k=m k=m
where s−1 = 0.
78 3. Power Series and Related Topics
Proof.
n
n
n
n−1
ak bk = (sk − sk−1 )bk = sk bk − sk bk+1
k=m k=m k=m k=m−1
n−1
= sk (bk − bk+1 ) + sn bn − sm−1 bm .
k=m
n
Proof of Abel’s theorem. Again let sn = k=0 ak denote the partial
sums of the series, so that sn → s as n → ∞. For |x| < 1, let
∞
n
k
f (x) = ak x , and let fn (x) = ak x k
k=0 k=0
denote the partial sums of the power series. Then by the lemma,
n−1
fn (x) = sk (1 − x)xk + sn xn .
k=0
Fixing x in the interval (0, 1) and letting n → ∞, we arrive at the expression
∞
f (x) = (1 − x) sk xk , 0 < x < 1,
k=0
since sn → s and xn → 0. But
∞
(1 − x) xk = 1 , |x| < 1 ,
k=0
so we can write
∞
f (x) − s = (1 − x) (sn − s)xn , 0 < x < 1.
n=0
Given ε > 0, choose N such that |sn − s| < 2ε for all n ≥ N . Then
N −1 ∞
|f (x) − s| ≤ (1 − x) |sn − s|x +
n
|sn − s|x n
n=0 n=N
N −1
< (1 − x) |sn − s| + ε
2 < ε
2 + ε
2 =ε
n=0
if x is sufficiently close to 1. In other words, there is a number δ > 0 such
that
|f (x) − s| < ε when 0 < 1−x < δ.
This proves the theorem.
has radius of convergence R, where 0 < R < ∞, and if the series converges
for x = R, then limx→R− f (x) = f (R). In other words, the function f (x) is
left-continuous at x = R.
For a proof, we need only consider the function g(x) = f (Rx), whose
power series development has radius of convergence 1.
Abel’s theorem is much easier to prove in the special case where all of
the coefficients an are of one sign. Then the result is a direct consequence
of the Weierstrass M-test for uniform convergence. To see this, observe
that
|an xn | ≤ an for |x| ≤ 1 if an ≥ 0, so the convergence of the series ∞
n=0 an
implies the uniform convergence of the power series
∞
f (x) = an x n
n=0
in the closed interval [−1, 1]. Therefore, the sum f (x) is continuous in [−1, 1]
and is in particular continuous at the point 1, as Abel’s theorem asserts.
The technique of Abel summation has many applications. For instance,
it can be used to obtain the following result.
Theorem. If an infinite series ∞ n=1 an has bounded partial sums, and if
n∞} is a sequence of positive numbers that decrease to zero, then the series
{b
n=1 an bn is convergent.
Proof. By hypothesis, the partial sums sn = nk=1 ak have the property
|sn | ≤ M for some constant M and all n. An Abel summation gives
n
n−1
ak bk = sk (bk − bk+1 ) + sn bn .
k=1 k=1
Since sn bn → 0 as n → ∞ and
n−1
n−1
|sk (bk − bk+1 )| ≤ M (bk − bk+1 ) = M (b1 − bn ) ≤ M b1 ,
k=1 k=1
∞
it follows that the series k=1 ak bk converges.
n
cn = an−k bk = an b0 + an−1 b1 + · · · + a0 bn , n = 0, 1, 2, . . . .
k=0
gives
∞
∞
an xn bn x n
n=0 n=0
= a0 + a1 x + a2 x2 + . . . b0 + b1 x + b2 x2 + . . .
= a0 b0 + (a1 b0 + a0 b1 )x + (a2 b0 + a1 b1 + a0 b2 )x2 + . . .
= c 0 + c 1 x + c 2 x2 + . . . ,
The Cauchy product of two convergent series need not converge. Con-
sider for example the series
∞
∞
(−1)n 1 1 1
an = √ = 1 − √ + √ − √ +... ,
n=0 n=0
n+1 2 3 4
82 3. Power Series and Related Topics
But n 2 n 2 n 2
(n − k + 1)(k + 1) = +1 − −k ≤ +1 ,
2 2 2
so that
n
2 2(n + 1)
|cn | ≥ = ≥ 1, n = 0, 1, 2, . . . .
n+2 n+2
k=0
Hence the series cn diverges, since its terms do not tend to 0.
The preceding example involves a series an that is convergent but not
absolutely convergent. If an and bn are both absolutely convergent,
it is permissible to rearrange terms to show that their Cauchy product is
convergent. A theorem of Mertens says that absolute convergence of only
one of the series allows the same conclusion.
Mertens’ Theorem. Suppose the series an and bn converge to sums
A
and B, respectively. Let c n be the
Cauchy product of the two series. If
an is absolutely convergent, then cn converges to sum C = AB.
Proof. Let
n
n
n
An = ak , Bn = bk , and Cn = ck
k=0 k=0 k=0
for all n sufficiently large. This proves that σn → 0, and it follows that
Cn → AB as n → ∞, which proves the theorem.
Franz Josef Mertens (1840–1927) is best known for his work in num-
ber theory. As a student at the University of Berlin, he attended lectures
by Kummer, Kronecker, and Weierstrass. Later he held professorships at
universities in Krakow, Graz, and Vienna.
in that interval, or at least in some smaller open interval about the point a?
This is a much more delicate question, and the answer in general is no. A
counterexample is outlined in Exercise 6.
The formal expansion (2) of a function f into power series is known as the
Taylor series of f at the point a. It is so named after Brooke Taylor (1685–
1731), an English mathematician who recorded it as early as 1715, without
considering the question of convergence. The problem of convergence of
the Taylor series to a given function was first addressed by Colin Maclaurin
(1698–1746) in 1742 and was successfully analyzed by Joseph-Louis Lagrange
(1736–1813) in his book Théorie des fonctions analytiques, published in
1797.
Although the problem of Taylor series representation is fully understood
only in the context of functions of a complex variable, sufficient conditions
for validity of the expansion are available through an explicit formula for the
difference between f (x) and the nth partial sum of its Taylor series. This
expression is commonly called Taylor’s formula with remainder, but its first
clear statement appears in the work of Lagrange.
Here is a derivation of Taylor’s formula. Let f (x) have continuous deriva-
tives of all orders on some open interval containing a given point a ∈ R. By
the fundamental theorem of calculus,
x
f (x) = f (a) + f (t) dt for x near a.
a
Now hold x fixed and integrate by parts, letting u(t) = f (t) and v(t) =
−(x − t) in the usual notation. This leads to
x x
f (x) = f (a) + f (t)v (t) dt = f (a) + f (a)(x − a) + f (t)(x − t) dt .
a a
where
n
1 (k)
Sn (x) = f (a)(x − a)k
k!
k=0
and x
1
Rn (x) = f (n+1) (t)(x − t)n dt .
n! a
3.4. Taylor’s formula with remainder 85
This is Taylor’s formula. The polynomial Sn (x) is the nth partial sum of
the Taylor series of f at the point a, and Rn (x) is called the remainder.
In order to prove that Sn (x) converges to f (x) as n → ∞, or that f has
the Taylor series expansion
∞
1 (k)
f (x) = f (a)(x − a)k ,
k!
k=0
Lemma. Let g(t) and h(t) be continuous functions on an interval [a, b], and
suppose that h(t) ≥ 0. Then
b b
g(t)h(t) dt = g(ξ) h(t) dt
a a
To derive the Lagrange form of the remainder from the integral repre-
sentation, we take g(t) = f (n+1) (t) and h(t) = (x − t)n in the lemma to
conclude that
x
1 (n+1) (x − a)n+1
Rn (x) = f (ξ) (x − t)n dt = f (n+1) (ξ)
n! a (n + 1)!
(x − ξ)n
Rn (x) = f (n+1) (ξ) (x − a)
n!
for some ξ between a and x. Specifically, this means that a < ξ < x or
x < ξ < a.
Here are two examples.
86 3. Power Series and Related Topics
Example 1. Let f (x) = sin x and let a = 0. We are going to show that the
familiar Taylor series of the sine function actually converges to sin x for every
x ∈ R. Since it is a power series, we can then conclude that the convergence
is uniform on each bounded subset of R, but this will also follow directly
from our estimate of the remainder in Taylor’s formula. We find that
x3 x5 x7
x− + − + ... .
3! 5! 7!
It is easy to see (by the ratio test, for instance) that this infinite series
converges for all x ∈ R, but convergence alone does not imply that its sum
is sin x. To prove this, we have to verify that Rn (x) → 0 as n → ∞ for each
fixed x ∈ R. Referring to the Lagrange form of the remainder and noting
that f (n+1) (ξ) is equal to ± sin ξ or ± cos ξ, we see that |f (n+1) (ξ)| ≤ 1, so
that
|x|n+1
|Rn (x)| ≤ →0 as n → ∞ .
(n + 1)!
A similar analysis verifies the expansions
x2 x4 x6 x2 x3
cos x = 1 − + − + ... and ex = 1 + x + + +...
2! 4! 6! 2! 3!
for all x ∈ R.
Example 2. Next let f (x) = log(1 + x) for x > −1, and again take a = 0.
A calculation gives
Thus the formal Taylor series of log(1 + x) about the origin is found to be
x − 12 x2 + 13 x3 − 14 x4 + . . . .
Sn (x) = x − 12 x2 + 13 x3 − · · · + (−1)n+1 n1 xn
and
x n
1 x
x−t 1
Rn (x) = f (n+1)
(t) (x − t) dt = (−1)
n n
dt .
n! 0 0 1+t 1+t
3.5. Newton’s binomial series 87
We now apply the inequality x−t 1+t ≤ |x|, which can be seen to hold for all
t between 0 and x. (See Exercise 4.) It provides the estimate
x
1
|Rn (x)| ≤ |x|n dt = |x|n log(1 + |x|) ,
0 1+t
1 (1 + ξ)−n+1
Rn (1) = f (n+1) (ξ) = (−1)n
(n + 1)! n+1
for some ξ in the interval 0 < ξ < 1. Thus |Rn (1)| ≤ n+1
1
→ 0, which shows
that the power series converges to f (1) when x = 1. Observe that we have
given another derivation, without appeal to Abel’s theorem, of the formula
1− 1
2 + 1
3 − 1
4 + · · · = log 2 .
is a generalized
α binomial coefficient. Observe that if α is a positive integer,
then k = 0 for all k ≥ α + 1 and Newton’s binomial series reduces to the
standard binomial theorem.
We can derive Newton’s formula as a Taylor series expansion about the
origin of the function f (x) = (1 + x)α , where α = 0, 1, 2, . . . and x > −1.
Our assumption that α is not a positive integer or zero implies that αk = 0
for all k. Successive differentiations produce the formula
1 (k) α
f (x) = (1 + x)α−k , k = 1, 2, . . . .
k! k
Thus
n
α
Sn (x) = 1 + xk
k
k=1
and
x n
x−t α(α − 1) · · · (α − n)
Rn (x) = rn (1 + t)α−1 dt , where rn = .
0 1+t n!
which implies that for each fixed number ρ > 1, the inequality |rn+1 /rn | ≤ ρ
holds for all n ≥ N . Thus |rn | ≤ Cρn for all n, where C is a constant.
x−t
Appealing to the elementary inequality 1+t ≤ |x| used in the previous
section, we can estimate Rn (x) by
x
n α−1
|Rn (x)| ≤ |rn | |x| (1 + t) dt .
0
Given x ∈ (−1, 1), choose ρ > 1 such that ρ|x| < 1. Then
In fact, the proof shows that for α > 0 the series converges uniformly
to (1 + x)α in the closed interval −1 ≤ x ≤ 1. To see this, note that
3.6. Composition of power series 89
Then, as we saw in Section 3.3, Mertens’ theorem can be used to prove that
the product has the power series expansion
∞
n
f (x)g(x) = c n xn , where cn = an−k bk ,
n=0 k=0
convergent at least in the interval |x| < R = min{R1 , R2 }. In fact, the radius
of convergence R may well be larger than both R1 and R2 . For instance,
consider the pair of functions
1+x 1−x
f (x) = and g(x) = .
1−x 1+x
90 3. Power Series and Related Topics
are power series developments in some interval |x| < r. Then |g(x)| < r if x
is sufficiently small, say |x| < ρ ≤ r, and the composition
∞
h(x) = f (g(x)) = an [g(x)]n
n=1
3.6. Composition of power series 91
where bnm = 0 for all m < n. Therefore, the composite function has the
form
∞
∞
(3) h(x) = an bnm xm .
n=1 m=1
and choose a positive number ρ1 ≤ ρ small enough that |G(x)| < r whenever
|x| < ρ1 . Then the composition
∞
H(x) = F (G(x)) = |an |[G(x)]n
n=1
is convergent. But since |bnm | ≤ Bnm , this implies that the double series
(3) is absolutely convergent for |x| < ρ1 , which completes the proof that the
composite function h(x) = f (g(x)) can be expanded in power series in some
neighborhood of the origin, and that the formal method for calculating the
coefficients is valid.
Finally, it must be acknowledged that from the viewpoint of complex
function theory, some paradoxes encountered in the real domain are readily
explained, and the theory becomes much more transparent. In particular,
the tools of complex analysis provide easy proofs of the existence of power
series developments for the product, quotient, and composition of two func-
tions known to have power series expansions. However, the actual calcula-
tion of coefficients in those developments involves essentially the same tricks
discussed here for real power series. Thus the argument presented in this
section, justifying the formal calculation of coefficients in the composition
of power series on the basis of Cauchy’s double series theorem, is also of
interest for complex power series and is readily adapted to that setting.
1
and set x = 2 to obtain
1 ∞
1 1 1
log 2 = + + + ··· = ,
2 8 24 k2k
k=1
accurate to 6 decimal places. Later he used what is now called the “Euler-
Maclaurin summation formula” (developed in Chapter 11 of this book) to
compute the sum to 20 decimal places. Then in 1735 Euler solved the Basel
problem, making the sensational discovery that the sum of the series is π 2 /6.
Euler’s first derivation of the sum π 2 /6 was not at all rigorous. Other
mathematicians criticized his method, but they could hardly question the
result because π 2 /6 matched the true sum numerically to 20 decimal places!
(Euler’s “proof ” is described at the end of Section 8.5.)
Soon after his initial triumph, Euler evaluated the more general sums
∞
2n−1 π 2n
1 n+1 2
= (−1) B2n , n = 1, 2, 3, . . . ,
k 2n (2n)!
k=1
1 ∞
x
= Bn xn , |x| < 1 .
e −1
x n!
n=0
B2 = 1
6 , B4 = − 30
1
, B6 = 1
42 , B8 = − 30
1
, B10 = 5
66 .
Thus
∞
∞
∞
1 π2 1 π4 1 π6
= , = , = , etc.
k2 6 k4 90 k6 945
k=1 k=1 k=1
We will evaluate these more general sums in Chapter 11, which is devoted
to Bernoulli numbers and their applications.
Many proofs of Euler’s basic relation 1/k 2 = π 2 /6 have been found,
but none is really simple. The standard proofs use advanced methods such
94 3. Power Series and Related Topics
or
∞ ∞
3 1 1
= ,
4 n2 (2k + 1)2
n=1 k=0
in the special case where α = −1/2. Our previous calculation (cf. Section
3.5) led to the expression
− 12 (−1)n (2n)!
= , n = 1, 2, . . . .
n 22n (n!)2
22n (n!)2 √
lim √ = π.
n→∞ (2n)! n
3.7. Euler’s sum 95
Second Proof. Here is a totally different approach, based upon the curious
identity
n
2 kπ n(2n − 1)
(4) cot = , n = 1, 2, . . . .
2n + 1 3
k=1
(See Chapter 4, Section 4.1.) Take reciprocals and squares to infer that
(2n + 1)2 1
n
n(2n − 1) n(2n − 1)
< 2 2
<n+ .
3 π k 3
k=1
The last step of the argument used the fact that a monic polynomial of
degree m with roots x1 , x2 , . . . , xm has the factorization
xm + am−1 xm−1 + · · · + a1 x + a0 = (x − x1 )(x − x2 ) · · · (x − xm ) .
By expanding the product and equating coefficients of xm−1 , one sees that
the sum of the roots is
m
xk = −am−1 .
k=1
In the same way as before, this formula then leads to the general result
∞
2n−1 π 2n
1 n+1 2
= (−1) B2n , n = 1, 2, 3, . . . .
k 2n (2n)!
k=1
where a is an odd integer and b is a number in the interval 0 < b < 1 such
that ab > 1 + 3π/2. Since 0 < b < 1, it follows from the Weierstrass M-test
that this series of continuous functions converges uniformly on the whole
real line, so that its sum is everywhere continuous. The assumption that
ab > 1 + 3π/2 is needed for the more technical proof that the function is
nowhere differentiable. We will come to the details presently.
After Weierstrass published his construction, many other examples were
devised. One idea was to look at a general function of the form
∞
f (x) = bk ϕ(ak x) ,
k=0
where ϕ is a suitable periodic function. In a scholarly paper written in 1918,
Knopp [9] discussed this approach among others, and gave many references
to earlier literature. A few years later, van der Waerden [16] discovered a
relatively simple example of this type. In modified form presented by Rudin
[12], van der Waerden’s example is
∞
(8) f (x) = (3/4)k ϕ(4k x) ,
k=0
is the nth partial sum of the series and rn (x) is the remainder. For an
arbitrarily chosen point x, let
sn (x + h) − sn (x) rn (x + h) − rn (x)
φn (h) = and ψn (h) =
h h
denote the respective difference quotients. By the mean value theorem,
n−1
φn (h) = sn (t) = − ak πbk sin(ak πt)
k=0
n−1
(ab)n − 1 π(ab)n
(9) |φn (h)| ≤ π (ab)k = π < .
ab − 1 ab − 1
k=0
Now hold n fixed and let m be the integer closest to an x, so that |an x −
m| ≤ 12 . (In case of ambiguity, when an x is a half-integer, take for instance
m = an x + 12 .) Let λn = an x − m and define the increment
1 − λn m+1
hn = n
= − x,
a an
so that 0 < hn ≤ 32 a−n . Since a is an odd integer, we see that
cos ak π(x + hn ) = cos ak−n (m + 1)π = (−1)m+1 for k ≥ n .
1 k
∞
ψn (hn ) = b cos ak π(x + hn ) − cos ak πx
hn
k=n
(−1)m+1 k
∞
= b 1 + cos(ak−n λn π) .
hn
k=n
Every term in this infinite series is nonnegative, so we can discard all but
the first term to obtain the estimate
bn bn
|ψn (hn )| ≥ [1 + cos(λn π)] ≥ ≥ 23 (ab)n ,
hn hn
3.8. Continuous nowhere differentiable functions 101
since |λn | ≤ 1
2 and 0 < hn ≤ 32 a−n .
The final step is to invoke the estimate (9) and conclude that
f (x + hn ) − f (x)
= |φn (hn ) + ψn (hn )|
hn
≥ |ψn (hn )| − |φn (hn )|
n 2 π
≥ (ab) − .
3 ab − 1
Because ab > 1 + 3π/2, the lower bound is positive and we see that
f (x + hn ) − f (x)
→∞ as n → ∞ .
hn
Exercises
1. Find the radius of convergence of the power series ∞ n! n
n=1 nn x .
2. Directly from the geometric series ∞ n=0 x = 1/(1 − x) , calculate the
n
sums
∞
x
(a) nxn = ,
(1 − x)2
n=1
∞
x(1 + x)
(b) n2 xn = ,
(1 − x)3
n=1
5. Show that
1+x
1
log = x + 13 x3 + 15 x5 + . . . , |x| < 1 .
2 1−x
Use the result to calculate log 11
9 = 0.20067 . . . by choosing only the first
3 nonzero terms. Estimate the remainder in the Taylor series expansion to
show that this numerical value is correct to 5 decimal places. Finally, check
the number on a calculator or computer and record it to 8 or 10 decimal
places.
6. Consider the function f (x) = e−1/x for x > 0 and f (x) = 0 for x ≤ 0.
Show that f is of class C ∞ on R , and that f (n) (0) = 0 for n = 0, 1, 2, . . . .
Conclude that f cannot be expanded into Taylor series at the origin.
Exercises 103
First prove that the series converges for |x| < 1, and let f (x) denote its sum.
Verify the identity
α−1 α−1 α
+ =
k k−1 k
and use it to show that (1 + x)f (x) = αf (x). Then solve the differential
equation to conclude that f (x) = (1 + x)α .
∞
√ (−1)k+1 (2k)! k
1+x=1+ x , |x| < 1 .
22k (k!)2 (2k − 1)
k=1
∞
(−1)k (2k)! √
1+ = 1/ 2 .
22k (k!)2
k=1
11. Suppose that the functions u1 (x), u2 (x), . . . are continuous on a set
E ⊂ R, and there is a constant M such that
n
uk (x) ≤ M , n = 1, 2, . . . ,
k=1
*20. (a) Show that the van der Waerden function f defined by (8) is
continuous on the whole real line.
(b) Show that the function ϕ in (8) satisfies the Lipschitz condition
|ϕ(s) − ϕ(t)| ≤ |s − t|.
(c) For any fixed number x ∈ R and any integer n > 0, define the
number δn = ± 12 4−n , where the sign is chosen so that no integer lies strictly
between 4n x and 4n (x + δn ). Next define the difference quotients
ϕ(4k (x + δn )) − ϕ(4k x)
Qk = , k = 0, 1, 2, . . . ,
δn
and show that Qk = 0 for every k > n.
(d) Show that |Qk | ≤ 4k for 0 ≤ k ≤ n and that |Qn | = 4n .
(e) Show that the difference quotients of f satisfy
f (x + δn ) − f (x) n−1
≥ 3n 4−n Qn −
3k 4−k Qk ≥ 12 (3n + 1) ,
δn
k=0
23. The theory of power series can be extendedto the complex plane.
For any complex coefficients cn the power series ∞ n
n=0 cn z in a complex
variable z can be shown to converge for every point in some disk |z| < R
References 107
and to diverge whenever |z| > R. (Here, as for real power series, we adopt
the convention that 0 ≤ R ≤ ∞.) For instance, this analysis allows us to
define the exponential function
z2 z3
ez = 1 + z + + + ...
2! 3!
for all complex numbers z. Use this definition to derive Euler’s formula
References
[1] T. M. Apostol, “Another elementary proof of Euler’s formula for ζ(2n) ”, Amer.
Math. Monthly 80 (1973), 425–431.
[2] Raymond Ayoub, “Euler and the zeta function”, Amer. Math. Monthly 81
(1974), 1067–1086.
[3] R. P. Boas, A Primer of Real Functions, Third edition, Mathematical Associa-
tion of America, Washington, D.C., 1981.
[4] B. R. Choe, “An elementary proof of ∞ 2 2
n=1 1/n = π /6 ”, Amer. Math. Monthly
94 (1987), 662–663.
[5] Joseph Gerver, “The differentiability of the Riemann function at certain rational
multiples of π”, Amer. J. Math. 92 (1970), 33–55.
[6] Joseph Gerver, “More on the differentiability of the Riemann function”, Amer.
J. Math. 93 (1971), 33–41.
[7] G. H. Hardy, “Weierstrass’s non-differentiable function” Trans. Amer. Math.
Soc. 17 (1916), 301–325.
∞
[8] F. Holme, “En enkel beregning av k=1 1/k2 ”, Nordisk Mat. Tidskr. 18 (1970),
91–92; 120. [Norwegian, English summary]
[9] Konrad Knopp, “Ein einfaches Verfahren zur Bildung stetiger nirgends differen-
zierbarer Funktionen”, Math. Zeitschrift 2 (1918), 1–26.
[10] Gerhard Kowalewski, “Über Bolzanos nichtdifferenzierbare stetige Funktion”,
Acta Math. 44 (1923), 315–319.
∞
[11] I. Papadimitriou, “A simple proof of the formula k=1 k−2 = π 2 /6 ”, Amer.
Math. Monthly 80 (1973), 424–425.
[12] Walter Rudin, Principles of Mathematical Analysis, Third edition, McGraw–
Hill, New York, 1976.
108 3. Power Series and Related Topics
The proof is quite easy. The function f (x) = ex − x − 1 has the value
f (0) = 0 and derivative f (x) = ex − 1, with f (0) = 0. But ex is an
increasing function, so f (x) > 0 for x > 0 and f (0) < 0 for x < 0. Thus
f (x) decreases to zero for x < 0 and increases from zero for x > 0, so that
f (x) > 0 for all x = 0. In other words, 1 + x < ex for all x = 0.
The next inequality is
2 π
x < sin x < x , 0<x< ,
π 2
as illustrated by Figure 2.
109
110 4. Inequalities
0 π/2
θ
0 1
d
as h → 0. In other words, {sin x} = cos x. Now observe that
dx
d sin x x cos x − sin x π
= < 0, 0<x< ,
dx x x2 2
sin x
since x < tan x. Thus the function g(x) = decreases from g(0) = 1 to
x
g(π/2) = 2/π , so that
2 sin x π
< < 1, 0<x< ,
π x 2
which was to be proved.
112 4. Inequalities
(1 + x)p ≥ 1 + px , p ≥ 1 , x > −1 .
Then h (x) < 0 for −1 < x < 0, whereas h (x) > 0 for x > 0. Since h(0) = 0,
it follows that h(x) ≥ 0 for all x > −1, which is the desired result.
a · b = a1 b1 + a2 b2 + · · · + an bn .
1/2
The norm of a is defined to be a = a21 + a22 + · · · + a2n , so that a·· a =
a2 . Cauchy’s inequality then says that |a · b| ≤ ab.
4.2. Cauchy’s inequality 113
a × b = (a2 b3 − a3 b2 , a3 b1 − a1 b3 , a1 b2 − a2 b1 ) ,
so that
1
3 3
a × b = 2
(aj bk − ak bj ) =
2
(aj bk − ak bj )2 .
2
j<k j=1 k=1
1 1
The inverse function is found to be x = y 1/(p−1) = y q−1 , since p + q = 1.
The area under the curve x = y q−1 , for 0 ≤ y ≤ b, is
b
1 q
y q−1 dy = b .
0 q
But it is clear geometrically (see Figure 4) that the sum of these two areas
is greater than or equal to the area of the rectangle 0 ≤ x ≤ a, 0 ≤ y ≤ b.
Hence
1 1
ab ≤ ap + bq ,
p q
with equality if and only if ap−1 = b, so that the curve y = xp−1 passes
through the vertex (a, b) of the rectangle. The condition for equality reduces
to ap = bq , as claimed.
y y
b
b
x x
a a
n
n
|ak |p = |bk |q = 1 ,
k=1 k=1
because the general case can then be deduced by normalization. After this
reduction, Young’s inequality gives
n
1 n
1
n
1 1
ak bk ≤ |ak | +
p
|bk |q = + = 1
p q p q
k=1 k=1 k=1
n
1/p n 1/q
= |ak |p |bk |q ,
k=1 k=1
and apply Hölder’s inequality to each of the sums on the right-hand side.
Since the conjugate index of p is q = p/(p − 1), we find
n
n 1/p
n (p−1)/p
|ak ||ak + bk | p−1
≤ |ak | p
|ak + bk | p
,
k=1 k=1 k=1
and similarly
n
n 1/p
n (p−1)/p
|bk ||ak + bk | p−1
≤ |bk | p
|ak + bk | p
.
k=1 k=1 k=1
where the last step again used the result for n = 2. This proves the inequality
for all indices n = 2k .
If n is not a power of 2, then 2k−1 < n < 2k for some integer k. Apply
the inequality for index 2k to the numbers
a1 , a2 , . . . , an , An , An , . . . , An ,
f ∞ = sup |f (x)| .
x∈R
The proofs are essentially the same as for sums and are left as exercises.
Equality occurs in Hölder’s inequality if and only if f (x)g(x) has constant
sign and |f (x)|p = λ|g(x)|q .
The integral analogue of Cauchy’s inequality, or Hölder’s inequality for
p = q = 2, is
1/2 1/2
∞ ∞ ∞
f (x)g(x) dx ≤ |f (x)| dx
2
|g(x)| dx
2
.
−∞ −∞ −∞
for positive functions f (x) on the interval [a, b]. Under the assumption that
f and log f are Riemann integrable, a proof can be based on the classical
arithmetic–geometric mean inequality for sums, approximating the integrals
by their corresponding Riemann sums. However, as we will see presently,
the result depends only on the convexity of the exponential function and is a
special case of a much more general inequality, known as Jensen’s inequality.
Geometrically, these inequalities say that the graph of the function lies on
or below every chord. Jensen’s inequality, in most primitive form, asserts
that a similar inequality then holds for more general means. Here, then, is
the discrete form of the inequality.
n+1
n
tk
tk xk = tn+1 xn+1 + (1 − tn+1 ) xk
1 − tn+1
k=1 k=1
4.5. Jensen’s inequality 121
which is the desired result for n + 1. This proves Jensen’s inequality. The
case of equality is left as an exercise.
It is not difficult to formulate an appropriate analogue for integrals. In
continuous form, Jensen’s inequality may be expressed as follows.
Jensen’s Inequality (continuous form). Let ϕ(y) be a convex function
on an interval J ⊂ R . Suppose the function y = f (x) is integrable over an
interval I ⊂ R , with f (I) ⊂ J. Let w(x) ≥ 0 be a weight function on I with
integral I w(x) dx = 1. Then
ϕ f (x)w(x) dx ≤ ϕ(f (x))w(x) dx .
I I
whenever a < x < y < b and 0 < t < 1. To see this, let s = 1 − t and write
tx+sy y
ϕ(tx + sy) − tϕ(x) − sϕ(y) = t ϕ (u) du − s ϕ (u) du
x tx+sy
≤ ts(y − x)ϕ (tx + sy) − st(y − x)ϕ (tx + sy) = 0 ,
holds, with strict inequality unless one of the sequences {aj } or {bk } is iden-
tically zero. The constant π cannot be replaced by any smaller number.
In other words, the bound is best possible but is attained only in trivial
cases.
Hilbert’s inequality has found applications to real and complex analysis
and to analytic number theory. Hilbert originally included the result in
4.6. Hilbert’s inequality 123
lectures at Göttingen on integral equations, but his proof was first published
in the 1908 dissertation of his student Hermann Weyl. Actually, Hilbert
obtained a weaker inequality with constant 2π, and several years later Issai
Schur constructed another proof that led to the sharp constant π. Since that
time many proofs have been found, but none is entirely simple. (See, for
instance, Hardy–Littlewood–Pólya [4], Oleszkiewicz [5], and Steele [6].) The
proof presented here was discovered by David Ullrich [7] and seems relatively
simple and straightforward. The strategy is to begin with a continuous form
of the theorem and to deduce the discrete form from it.
∞
∞
f (x) = aj ψj (x) and g(y) = bk ψk (y) .
j=1 k=1
Then
∞
∞
f 22 = a2j , g22 = b2k , and
j=1 k=1
∞ ∞ ∞
∞
f (x)g(y)
dxdy = wjk aj bk , where
0 0 x+y
j=1 k=1
k j
1
wjk = dxdy .
k−1 j−1 x + y
It is obvious that wjk > 1/(j + k), and in fact the convexity of the integrand
implies that wjk > 1/(j + k − 1). Specifically, for each fixed y > 0, we see
that
j
1 1
dx > , j = 1, 2, . . . ,
j−1 x + y y + j − 12
by comparing the area under the curve with the area under the tangent line
at the midpoint x = j − 12 . A similar estimate then shows that
k
1 1 1
(3) wjk > dy > 1 = j+k−1.
k−1 y+j− 1
2 (k − 1
2) + (j − 2 )
Therefore, if neither {aj } nor {bk } is the zero-sequence, it follows from (2)
that
∞ ∞ ∞
∞ ∞ ∞
aj bk |f (x)g(y)|
< wjk |aj bk | = dxdy
j + k − 1 x+y
j=1 k=1 j=1 k=1 0 0
∞ 1/2 ∞ 1/2
≤ π f 2 g2 = π aj2 2
bk ,
j=1 k=1
which proves (1) with strict inequality. Note that (3) gives strict inequality
because |aj bk | > 0 for some pair of indices j and k.
4.6. Hilbert’s inequality 125
Proof of sharpness. The next step is to show that the constant π is best
possible in the continuous form of Hilbert’s inequality, then to adapt the
calculations to deduce that π is best possible in the discrete form as well.
For the first purpose it is useful to choose the functions
√
1/ x for 1 ≤ x ≤ R
(4) f (x) = g(x) = hR (x) =
0 elsewhere in (0, ∞) ,
The problem is then to obtain an effective lower bound for the last integral
in terms of R. We propose to show that
√ √
R R
4
(5) dudv > π log R − π log 2 − 8 .
1 1 u2 + v 2
Putting these results into (6), we arrive at the inequality (5), which shows
that the constant π is sharp in the continuous form (2) of Hilbert’s inequality.
In order to show that π is also best possible in the discrete form (1), we
choose √
1/ j for j = 1, 2, . . . , N
aj = bj =
0 for j > N ,
where N is a large integer, and let f (x) = g(x) = hN +1 (x) as defined in (4).
Then
∞
∞
f (x) ≤ aj ψj+1 (x) and g(y) ≤ bk ψk+1 (y)
j=1 k=1
where
k+1 j+1
1 1 1
wj+1,k+1 = dxdy < < .
k j x+y j+k j+k−1
Exercises
(1 + x)n ≥ 1 + nx , x > −1 ,
for n = 1, 2, . . . .
(b) Generalize the result by proving that
n
(1 + a1 )(1 + a2 ) · · · (1 + an ) ≥ 1 + ak
k=1
15. (a) Show that if a sequence a = (a1 , a2 , . . . ) belongs to the space p for
some p < ∞, then a ∈ q for all q with p < q < ∞.
Exercises 129
holds, with strict inequality unless one of the sequences {aj } or {bk } is
identically zero. The constant π/ sin(π/p) cannot be replaced by any smaller
number.
mb1 ≤ a1 b1 + a2 b2 + · · · + an bn ≤ M b1 .
18. By successive integrations of the inequality cos x ≤ 1, show that for all
x ≥ 0,
x2 x3 x2 x4
sin x ≤ x , cos x ≥ 1 − , sin x ≥ x − , cos x ≤ 1 − + ,
2! 3! 2! 4!
and in general that
x3 x4k+1 x2 x4k+2
sin x ≤ x − +··· + , cos x ≥ 1 − + ···− ,
3! (4k + 1)! 2! (4k + 2)!
x3 x4k+3 x2 x4k+4
sin x ≥ x − +··· − , cos x ≤ 1 − + ···+
3! (4k + 3)! 2! (4k + 4)!
for each k = 0, 1, 2, . . . and all x ≥ 0. Prove by induction.
19. Let f (x) ≥ 0 on the interval [0, 1], and suppose it has a finite integral
1
A = 0 f (x) dx. Prove that
1
1+ A2 ≤ 1 + f (x)2 dx ≤ 1 + A .
0
References
[1] E. F. Beckenbach and R. Bellman, Inequalities, Springer–Verlag, New York,
1965.
[2] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Second edition, Cam-
bridge University Press, Cambridge, U.K., 1952.
[3] Nicholas D. Kazarinoff, Analytic Inequalities, Holt, Rinehart and Winston, New
York, 1961; reprinted by Dover Publications, Mineola, N.Y., 2003.
[4] D. S. Mitrinović, Elementary Inequalities, P. Noordhoff, Groningen, The Nether-
lands, 1964.
[5] Krzysztof Oleszkiewicz, “An elementary proof of Hilbert’s inequality”, Amer.
Math. Monthly 100 (1993), 276–280.
[6] J. Michael Steele, The Cauchy–Schwarz Master Class: An Introduction to the
Art of Mathematical Inequalities, Cambridge University Press, Cambridge, 2004.
[7] David Ullrich, “A simple elementary proof of Hilbert’s inequality”, Amer. Math.
Monthly, to appear.
Chapter 5
Infinite Products
Infinite products, like infinite series, arise often and are natural objects
in mathematical analysis. The convergence theory of infinite products is
closely related to that of infinite series but is more subtle. In this chapter
we develop standard criteria for convergence and uniform convergence of
infinite products.
where ak are real numbers, comes down to the behavior of the sequence of
partial products
n
pn = (1 + ak ) = (1 + a1 )(1 + a2 ) · · · (1 + an )
k=1
as n tends to infinity. The reason for writing the factors in the form (1 + ak )
will become apparent shortly.
It turns out that the “right” notion of convergence is not
the obvious
one. It would seem natural to declare an infinite product ∞ k=1 (1 + ak )
convergent if its partial products pn form a convergent sequence, but that
131
132 5. Infinite Products
and the infinite product is then assigned the value p. The infinite product is
declared to be divergent if it is not convergent; that is, if the sequence {pn }
either diverges or tends to zero. If pn → 0, the product is said to diverge to
zero.
One advantage of insisting on a nonzero limit is that convergence of
an infinite product implies convergence of the corresponding product of re-
ciprocal factors, since the sequence {1/pn } then also converges to a finite,
nonzero limit. Another important consequence is that the factors of a con-
vergent product tend to 1, just as the terms of a convergent series tend to
0. Specifically, if an infinite product converges to p , then
1 + an = pn /pn−1 → p/p = 1 as n → ∞ ,
1 k2 − 1 (k − 1)(k + 1)
1 + ak = 1 − 2
= 2
= ,
k k k2
and again the partial product telescopes to give
n
1 1·3 2·4 3·5 (n − 1)(n + 1) 1 n+1 1
pn = 1− 2 = · 2 · 2 ··· = · →
k 22 3 4 n2 2 n 2
k=2
5.1. Basic concepts 133
Our two
examples suggest the possibility that convergence of an infinite
product (1 + ak ) is equivalent to convergence of the corresponding infinite
series ak . This is actually true if the numbers ak are all of the same sign,
as we now proceed to show.
Theorem 1. If ak ≥ 0 for all k, then the product ∞ k=1 (1 + ak ) converges
∞
if and only if the series k=1 ak converges. The same is true if ak ≤ 0 for
all k.
n
Proof. nAgain let pn = k=1 (1 + ak ) denote the partial products, and let
sn = k=1 ak denote the partial sums of the related series. Suppose first
that ak ≥ 0 for all k. Then 1 ≤ pn ≤ pn+1 and sn ≤ sn+1 , so by the
monotone boundedness theorem, either of the sequences {pn } or {sn } is
convergent if and only if it is bounded above. But in view of the inequality
1 + x ≤ ex ,
1 + sn ≤ pn ≤ esn , n = 1, 2, . . . .
1 ≤ (1 + x)(1 − 2x) , − 12 ≤ x ≤ 0 ,
shows that
1
≤ 1 + ak .
1 − 2ak
134 5. Infinite Products
However,
the infinite product (1 − 2ak ) converges, since −2ak ≥ 0 and the
series
ak converges. Thus the above inequality shows that the product
(1 + ak ) is also convergent. This completes the proof.
Note that
1 1 1 1 1
1− √ 1+ √ + = 1 − 3/2 and 1− √ → 1,
k k k k k
Proof. Suppose first that the product converges. Then the partial products
pn → p = 0 as n → ∞. In particular, the partial products are bounded away
from zero: |pn | ≥ b for some b > 0 and all n. For each ε > 0, the Cauchy
condition says that
In other words, the partial products are bounded away from both zero and
infinity. Therefore, for each ε > 0 it follows from (1) that
for all m and n greater than some N ≥ M . This shows that {pn } is a Cauchy
sequence, and so pn → p as n → ∞. Finally, p = 0 since the partial products
pn are bounded away from zero. Thus the infinite product is convergent.
We can now
discuss the question of absolute convergence. An infi-
nite product (1 + ak ) is said to be absolutely convergent if the product
136 5. Infinite Products
(1 + |ak |) is convergent. In view of Theorem 1, a product is absolutely
convergent if and only if its associated series ak is absolutely convergent.
A convergent product need not be absolutely convergent, as simple examples
show. For instance, the product
1 1 1 1 1 1
1+ 1− 1+ 1− 1+ 1− ···
2 2 3 3 4 4
suggests that the product and its associated series of logarithms will converge
or diverge together. Under an assumption to guarantee that the logarithms
are well-defined, this turns out to be correct.
∞
Theorem 4. If ak > −1 for all k, then the ∞ infinite product k=1 (1 + ak )
converges if and only if the infinite series k=1 log(1 + ak ) converges.
pn → eL = 0, and the infinite product converges. The proof shows that the
value of the product is the exponentiated sum of the series, as expected.
(2) 1 2
3x < x − log(1 + x) < x2 , |x| < δ ,
for some positive constant δ < 1 sufficiently small. This is a simple conse-
quence of the limit
x − log(1 + x) 1
lim = .
x→0 x2 2
2
If the series ak converges, then ak → 0 and so |ak | < δ for all indices k
larger than some number N . The inequality (2) then gives
converges, then so does the other. In view of Theorem 4, this completes the
proof.
If the divergent
product of Example 3 is written in the first
form, with
a
k = −1/k, then a 2 < ∞, so the simultaneous divergence of
k (1+ak ) and
ak is in accordance with Theorem 5. On the other hand, if the product
is
written in the second form, then a2 = ∞ and so the convergence of ak
k
and divergence of (1 + ak ) is beyond the jurisdiction
of Theorem 5. For
Example 4, where 2 the (1 + ak ) converges but the ak diverges, we see
similarly that ak = ∞.
138 5. Infinite Products
2
If the product (1 + ak ) and the sum ak both converge,
2 then ak
also converges (cf. Exercise 5). However, the convergence of ak is by no
means necessary for √ the simultaneous divergence
of product and sum. For
example, let ak = 1/ k. Then the product (1 + ak ), the sum ak , and
the sum 2
ak all diverge.
for all real numbers x. Note that the infinite product converges for each fixed
x, by Theorem 1. It vanishes precisely for x = 0, ±1, ±2, . . . , which we know
are the zeros of the function sin πx. An infinite product representation of this
type can be viewed as a generalization of the factorization of a polynomial
according to its (real) zeros.
Consider now a general infinite product of the form
∞
(4) 1 + gk (x) ,
k=1
∞
Theorem 7. If the
∞series
k=1 |g
k (x)| converges uniformly on a set E ⊂ R,
then the product k=1 1 + gk (x) converges uniformly on E.
Proof. Let
n
n
sn (x) = |gk (x)| and Fn (x) = 1 + |gk (x)| .
k=1 k=1
have the property (5), so Theorem 6 ensures that the infinite product con-
verges uniformly on E.
Exercises 141
Exercises
1. (a) Calculate partial products and show directly that both infinite
products in Example 3 diverge to zero.
(b) Carry out the details to show that the infinite product in Example
4 is convergent.
2. Calculate partial products and show that
∞
2 1
1− = .
k(k + 1) 3
k=2
3. Show that
∞
(−1)k+1
1+ = 1.
k
k=1
∞
4. Let {xk } be a
sequence of real numbers for which 2
k=1 xk < ∞. Prove
that the product ∞ k=1 cos xk converges.
2
5. Adapt the proof
of Theorem 5 to show
that the sum ak converges
whenever the sum ak and the product (1 + ak ) both converge.
6. Prove that
∞
∞
4 1 9 1
1− 2 = and 1− 2 = .
k 6 k 20
k=3 k=4
and prove that the infinite product converges. (Its numerical value is known
to be approximately 8.700 .)
is continuous on the whole real line, without using the fact that s(x) =
sin πx.
ix
11. Recall that sinh x = 12 ex − e−x and sin x = 2i 1
e − e−ix . Use the
formula (3) for sin πx to conclude, at least formally, that
∞
x2
sinh πx = πx 1+ 2 .
k
k=1
13. Find a general formula for the partial products and show that
∞
k3 − 1 2
3
= .
k +1 3
k=2
1
(1 + x)(1 + x2 )(1 + x4 )(1 + x8 ) · · · = , |x| < 1 .
1−x
16. For each real number α > 0, show that the generalized binomial coeffi-
cients
α α(α − 1) · · · (α − n + 1)
=
n n!
tend to zero as n → ∞.
Reference
[1] Konrad Knopp, Theory and Application of Infinite Series, Second English edi-
tion, Blackie & Son, London and Glasgow, 1951; reprinted by Dover Publications,
Mineola, NY, 1990.
Chapter 6
Approximation by
Polynomials
6.1. Interpolation
The main focus of this chapter is a theorem of Weierstrass [15] that
every continuous function on a closed bounded interval can be approximated
uniformly by algebraic polynomials; that is, by functions of the form
P (x) = a0 + a1 x + a2 x2 + · · · + an xn ,
where n is a positive integer and the coefficients ak are real numbers. The
result is important because it often reduces a problem about continuous
functions to the corresponding problem for polynomials. Broadly speaking,
the theorem is a prototype for a large collection of results in approximation
theory.
To put the approximation problem in better perspective, we begin with
145
146 6. Approximation by Polynomials
a0 + a1 x0 + a2 x20 + . . . + an xn0 = y0
a0 + a1 x1 + a2 x21 + . . . + an xn1 = y1
···
a0 + a1 xn + a2 x2n + . . . + an xnn = yn
values are known (or have been measured) at a finite set of points. If a
function f is continuous over an interval [a, b], and x0 , x1 , . . . , xn are distinct
points in that interval, the Lagrange interpolation
n
f (xj ) (x)
P (x) = ,
(xj )(x − xj )
j=0
can be viewed as an approximation to f (x) over the interval [a, b]. But how
good is the approximation? Certainly it is exact if f is itself a polynomial
of degree ≤ n, but in the absence of further information there will be no
control away from the nodes xk , and the error
as measured in the uniform norm may be quite large. The error can be
controlled, however, if f is known to be sufficiently smooth.
Theorem 1. Suppose a function f has n + 1 continuous derivatives on the
interval [a, b], and let P be its Lagrange interpolation polynomial with nodes
x0 , x1 , . . . , xn . Then
1
|f (x) − P (x)| ≤ f (n+1) ∞ |(x)| ,
(n + 1)!
where (x) = (x − x0 )(x − x1 ) · · · (x − xn ).
Proof. Fix an arbitrary point x in [a, b], not equal to any of the nodes xk .
Define the function
g(t) = f (t) − P (t) − c (t) ,
where c = [f (x) − P (x)]/(x), so that g(x) = 0. It is clear that g(xk ) = 0
for k = 0, 1, . . . , n. Thus g has at least n + 2 distinct zeros in [a, b]. By
Rolle’s theorem, the derivative g (t) vanishes at least once between each pair
of zeros, so g has at least n + 1 zeros. Continuing the argument by taking
successive derivatives, we find ultimately that g (n+1) (ξ) = 0 at some point
ξ ∈ (a, b). But P is a polynomial of degree ≤ n and is a monic polynomial
of degree n + 1, so P (n+1) (t) ≡ 0 and (n+1) (t) ≡ (n + 1)! . Therefore, the
conclusion is that f (n+1) (ξ) = c(n + 1)! , or
1
f (x) − P (x) = f (n+1) (ξ) (x) ,
(n + 1)!
which yields the desired result.
The question now arises as to how the nodes should be situated to min-
imize the error of the Lagrange approximation, as measured by the uniform
148 6. Approximation by Polynomials
Lemma 1. For n > 0 the Chebyshev polynomial Tn (x) has simple zeros at
the points
2k − 1
xk = cos π , k = 1, 2, . . . , n .
2n
In the interval [−1, 1], the local maxima of |Tn (x)| occur at the points
ξk = cos(kπ/n) , k = 0, 1, . . . , n ,
d
n
Tn (x) = cos n cos−1 x = √ sin n cos−1 x = 0
dx 1 − x2
for
We can now show that among all monic polynomials of fixed degree, the
normalized Chebyshev polynomials have smallest uniform norm.
(x) = (x − x1 )(x − x2 ) · · · (x − xn )
It is clear a priori that for minimum norm all of the points xk must lie in
the interval [−1, 1], since |(x)| is a product of distances. Another interpre-
tation of the theorem is that among all polynomials of degree ≤ n − 1, the
polynomial
Pn (x) = xn − 21−n Tn (x)
gives the best uniform approximation to the function xn over the interval
[−1, 1].
Therefore, Q(ξk ) < Tn (ξk ) when k is even, and Q(ξk ) > Tn (ξk ) when k is
odd. This implies that the difference T n (x) − Q(x) vanishes at some point in
each of the n intervals (ξ1 , ξ0 ), (ξ2 , ξ1 ), . . . , (ξn , ξn−1 ). But this is impossible,
because Tn and Q are both monic polynomials of degree n, so T n − Q is a
polynomial of degree ≤ n − 1 and can have at most n − 1 zeros. Thus we
have arrived at a contradiction, which proves the theorem.
6.2. Weierstrass approximation theorem 151
Karl Weierstrass (1815–1897) was 70 years old at the time of his discov-
ery. In fact, the theorem was discovered independently at almost the same
time by Carl Runge (1856–1927), who had been a student of Weierstrass
five years earlier. Runge obtained the result as part of his groundbreaking
work on approximation theory in the complex domain.
The theorem of Weierstrass is remarkable because continuous functions
can have a much more complicated behavior than polynomials. For instance,
in 1872 Weierstrass had constructed examples of continuous functions that
are not differentiable at any point. If the function f is sufficiently smooth,
it can be approximated locally by partial sums of its Taylor series, but
in general these polynomials will not provide the global approximation of
the Weierstrass theorem. A more promising idea is to construct from f its
Lagrange interpolation polynomials at n equally spaced points in the interval
[a, b]. As n → ∞, it is reasonable to expect this sequence of polynomials
to converge uniformly to f . However, these interpolation polynomials need
not converge pointwise to f , even if f has derivatives of all orders. In
fact, the interpolation polynomials can fail to be uniformly bounded. This
surprising state of affairs makes the Weierstrass theorem appear all the more
remarkable.
Runge gave a striking example in 1901 that illustrates the failure of
approximation by Lagrange interpolation polynomials. For the function
f (x) = 1/(1+x2 ), Runge considered the interpolation polynomials at equally
spaced nodes over the interval [−5, 5] and showed that they are unbounded
for c < |x| < 5, where c = 3.63 . . . . Sergei Bernstein showed in 1912 that for
the function f (x) = |x| in the interval [−1, 1] the interpolation polynomials
at equally spaced nodes converge only at the points −1, 0, and 1. One might
suspect, in light of the results discussed in the preceding section, that the
bad behavior in these examples is the fault of equally spaced nodes, and that
nodes at the scaled roots of Chebyshev polynomials would produce better
results. Indeed, this is true when the target function f is sufficiently smooth;
for instance, when it has a continuous derivative. However, it is a theorem
of Georg Faber, proved in 1914, that for any preassigned sequence of nodes
152 6. Approximation by Polynomials
Lebesgue’s proof. This proof has a striking feature. It reduces the ap-
proximation of an arbitrary continuous function f to the approximation of
a single function, namely f (x) = |x|. The reduction proceeds as follows. If a
function f is continuous at each point of the closed bounded interval [a, b], it
is uniformly continuous there. Therefore, f can be approximated uniformly
by a continuous piecewise linear function g whose graph is a polygonal path
connecting finitely many points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) on the curve
y = f (x), where
a = x1 < x2 < · · · < xn = b .
ϕk (x) = |x − xk | + (x − xk )
k−1
c0 + 2 cj (xk − xj ) = yk , k = 1, 2, . . . , n ,
j=1
recalling that the series converges uniformly in the interval [−1, 1]. (See
Chapter 3, Section 3.5.) Now replace x by 1 − t2 to see that
|t| = 1 − 12 (1 − t2 ) − 18 (1 − t2 )2 − . . . ,
and the convergence is again uniform in [−1, 1]. This shows that |t| can be
approximated uniformly by partial sums of the series, which are polynomials
in t, and the proof is complete.
Kn (x) = cn (1 − x2 )n , n = 1, 2, 3, . . . ,
(2n + 1)!
cn = 2n+1 2
∼ n/π , n → ∞,
2 (n!)
x
-1 1
√
but the more elementary estimate cn < n will suffice for our purpose.
Recall first that the inequality (1 + t)n ≥ 1 + nt holds for all t > −1, a fact
easily verified by induction. Hence
For each fixed x the number Pn (x) may be viewed as a “weighted average”
of the numbers f (t) as t ranges over the interval [0, 1], with greatest weight
attached (when n is large) to the values of t near x. Intuitively, then, it is
to be expected that Pn (x) → f (x) as n → ∞. Because Kn is a polynomial,
it is easy to see that Pn is also a polynomial.
In order to prove that Pn (x) → f (x) uniformly in [0, 1], it is convenient
first to extend the given function by setting f (x) = 0 for all x outside the
interval [0, 1]. The extended function is continuous on R because of our
initial assumption that f (0) = f (1) = 0. Since Kn is an even function, the
interval of integration can be shifted to give
1 1−x
Pn (x) = Kn (t − x)f (t) dt = Kn (t)f (x + t) dt .
0 −x
1
since −1 Kn (t) dt = 1. Now write |f (x)| ≤ M in [0, 1], and invoke the
basic theorem that a function continuous at each point of a closed bounded
interval is uniformly continuous there. Thus to each ε > 0 there corresponds
a number δ > 0 such that
we find
1
|Pn (x) − f (x)| = Kn (t)[f (x + t) − f (x)] dt
−1
1
≤ Kn (t)|f (x + t) − f (x)| dt
−1
−δ δ
≤ 2M Kn (t) dt + Kn (t)|f (x + t) − f (x)| dt
−1 −δ
1
+ 2M Kn (t) dt
δ
δ
√
≤ 4M n(1 − δ ) + ε
2 n
Kn (t) dt
−δ
√
< 4M n(1 − δ 2 )n + ε < 2ε
The main idea of Landau’s proof, convolution with a peaking kernel, was
already present (as Landau [4] acknowledged) in the original proof given by
Weierstrass [15]. But instead of a polynomial peaking kernel, Weierstrass
used the kernel e−(x/a) with a > 0, for which
2
∞
√
e−(x/a) dx = a π .
2
−∞
Lemma 2.
n
(a) b(k; n, x) = 1 ;
k=0
n
(b) (k − nx)2 b(k; n, x) = nx(1 − x) .
k=0
The proof of (a) is the simple observation that by the binomial theorem,
the sum is equal to [x + (1 − x)]n = 1. The sum in (b) is a calculation of the
variance of the binomial distribution; the proof is deferred for the moment.
In view of (a), we have
n
Bn (x) − f (x) = [f (k/n) − f (x)] b(k; n, x) .
k=0
The proof that Bn (x) tends uniformly to f (x) appeals again to the uni-
form continuity of f on [0, 1]. For each ε > 0, there is a δ > 0 such that
|f (x) − f (y)| < ε for all pairs of points x, y ∈ [0, 1] with |x − y| < δ. Again
suppose |f (x)| ≤ M for all x in the interval [0, 1], and write
n
|Bn (x) − f (x)| ≤ |f (k/n) − f (x)| b(k; n, x) = S1 + S2 ,
k=0
k
where the sum S1 extends over all integers k (0 ≤ k ≤ n) with − x < δ
n
and the sum S2 is taken over those integers k with nk − x ≥ δ. By the
uniform continuity of f and part (a) of the lemma, we see that S1 < ε. On
the other hand,
n
(k/n − x)2
S2 ≤ 2M b(k; n, x) ≤ 2M b(k; n, x)
δ2
|n
k
−x|≥δ k=0
2M
n
2M x(1 − x) 2M
= 2 2
(k − nx)2 b(k; n, x) = 2
≤ 2 <ε
δ n δ n δ n
k=0
for all n ≥ N > 2M/(δ 2 ε), where part (b) of the lemma has been used.
Observe that N depends only on ε, so we have shown that
whenever n ≥ N . This shows that Bn (x) → f (x) uniformly in [0, 1], which
completes Bernstein’s proof of the Weierstrass approximation theorem.
6.4. Bernstein polynomials 159
n
n
n
1 = [x + (1 − x)] = n
x (1 − x)
k n−k
= b(k; n, x) ,
k
k=0 k=0
n n
n−1
μ= k b(k; n, x) = nx xk−1 (1 − x)n−k
k−1
k=0 k=1
n−1
= nx b(j; n − 1, x) = nx ,
j=0
n
n
(k − nx) b(k; n, x) =
2
(k 2 − 2kμ + μ2 )b(k; n, x)
k=0 k=0
n
= k 2 b(k; n, x) − μ2 .
k=0
But
n
n
2
k b(k; n, x) = μ + k(k − 1)b(k; n, x)
k=0 k=2
n
n−2
= μ + n(n − 1)x 2
xk−2 (1 − x)n−k
k−2
k=2
n−2
= μ + n(n − 1)x2 b(j; n − 2, x)
j=0
which combines with the previous formula to yield the desired result.
The Bernstein polynomials have the remarkable property that for func-
(ν)
tions f with continuous νth derivative, Bn (x) → f (ν) (x) uniformly in [0, 1].
(See Exercise 14.) The books by Cheney [2], Davis [3], and Lorentz [6] may
be consulted for this and other facts about Bernstein polynomials.
160 6. Approximation by Polynomials
can be made arbitrarily small with some choice of polynomial P . The theo-
rem suggests the problem of finding the best approximation by polynomials
of fixed degree. More precisely, it is of interest to determine the quantity
En (f ) = inf f − P ∞ ,
P
Before embarking on the proof, let us remark that the Chebyshev poly-
nomials provide a special case of this alternating property. As previously
noted, Theorem 2 can be interpreted as saying that among all polynomials
of degree ≤ n − 1,
P (x) = xn − 21−n Tn (x)
gives the best uniform approximation to the function f (x) = xn over the
interval [−1, 1]. Lemma 1 then confirms that
f (x) − P (x) = 21−n Tn (x)
attains its maximum absolute value in the interval [−1, 1], with alternating
signs at n + 1 points ξk = cos(kπ/n) , for k = 0, 1, . . . , n. Note that n is
replaced here by n − 1.
162 6. Approximation by Polynomials
denote the subintervals in which |ϕ| attains the value δ , with numbering
chosen so that
S = [a, b] \ (I1 ∪ I2 ∪ · · · ∪ Im ) .
σkj +1 = −σkj , j = 1, 2, . . . , ν .
P − Q = (f − Q) − (f − P )
It is clear that the set of all polynomials P (x) in one real variable is an
algebra that separates points in any interval [a, b] and does not vanish at
any point, so the classical approximation theorem of Weierstrass is a special
case of the Stone–Weierstrass theorem. The set of polynomials P (F (x)) of
a fixed function F is a more general example of an algebra that does not
vanish at any point, but it separates points only if F is univalent on E. As
a special case, the set of all even polynomials is an algebra that does not
separate points in any symmetric interval [−a, a].
Recall that a subset of a metric space is said to be compact if it has the
Heine–Borel property: each open covering contains a finite subcovering. For
the sake of simplicity, we will prove the Stone–Weierstrass theorem under
the special assumption that E is a closed bounded subset of the real line,
but the argument generalizes with little change to arbitrary metric spaces.
A simple algebraic lemma will be needed.
ϕ = [g − g(x1 )] h2 , ψ = [g − g(x2 )] h1 .
Proof of theorem. Let B denote the uniform closure of A ; that is, the set
of all functions uniformly approximable by functions in A. More precisely,
g ∈ B if for each ε > 0 there exists f ∈ A such that |g(x) − f (x)| < ε for
all x ∈ E. It is easy to see that B is an algebra of continuous functions. We
are to prove that B is the algebra of all continuous functions on E.
The first step is to show that |f | ∈ B whenever f ∈ B. To see this, let
M = sup{|f (x)| : f ∈ E} and let ε > 0. By the Weierstrass approximation
theorem, there is a polynomial P such that
max(f, g) = 12 (f + g) + 12 |f − g| and
min(f, g) = 1
2 (f + g) − 2 |f
1
− g|
By the continuity of ht and f , it follows that ht (x) > f (x) − ε for all x in
some open set Ut containing t. But the collection of sets {Ut : t ∈ E} forms
an open covering of the compact set E, so the Heine–Borel theorem says
that
E ⊂ Ut1 ∪ Ut2 ∪ · · · ∪ Utn
for some finite set of points t1 , t2 , . . . , tn ∈ E. Now define
Then h ∈ B and it has the properties h(s) = f (s) and h(x) > f (x) − ε for
all x ∈ E, by construction of the functions ht and the neighborhoods Ut .
Thus with gs = h our claim is established.
The final step is to show that every continuous function is uniformly
approximable by functions in B, or equivalently that f ∈ B. The proof
involves another appeal to the Heine–Borel theorem. Given a function f
continuous on E, a point s ∈ E, and a number ε > 0, we have constructed
a function gs ∈ B for which gs (s) = f (s) and gs (x) > f (x) − ε for all x ∈ E.
By continuity, the inequality gs (x) < f (x) + ε persists in some open set Vs
containing s. The sets Vs form an open covering of E, so there is a finite
subcovering:
E ⊂ Vs1 ∪ Vs2 ∪ · · · ∪ Vsm
for some finite collection of points s1 , s2 , . . . , sm . Now define
Then g ∈ B and g(x) < f (x) + ε for all x ∈ E, by the construction of g and
the sets Vs . On the other hand, g(x) > f (x) − ε for all x ∈ E because each
function gs has this property. Combination of the two inequalities gives
Of course, the power series need not converge at any point x0 = 0, since
that would make its sum f infinitely differentiable in the interval (−x0 , x0 ),
whereas f is not required to have a derivative at any point.
One particular case of Theorem 8 is worthy of note. Taking a0 = a1 =
· · · = an = 0, we infer that each continuous function f on an interval [−r, r]
with f (0) = 0 can be approximated uniformly by polynomials of the form
rn+1 ε
rn+1 + rn+2 + · · · = < .
1−r 2
Note that Corollary 2 fails unless r < 1. Consider for instance the
function f (x) = x/2 on the interval [−1, 1].
170 6. Approximation by Polynomials
Corollary
∞ 3 (Fekete’s theorem). There exists a single power series
k whose partial sums s (x) = n k
a
k=1 k x n k=1 ak x approximate every func-
tion f continuous on an arbitrary interval [−r, r], provided only that f (0) =
0. More precisely, to each such function f there corresponds an increas-
ing sequence {n1 , n2 , . . . } of positive integers such that snj (x) → f (x) as
j → ∞, uniformly in −r ≤ x ≤ r.
Proof. It follows directly from the Weierstrass theorem that each func-
tion f continuous on an interval [a, b] can be approximated uniformly with
any prescribed accuracy by polynomials with rational coefficients (cf. Ex-
ercise 19). The set of polynomials with rational coefficients is countable.
Let {Q1 (x), Q2 (x), . . . } be an enumeration of all polynomials with rational
coefficients that vanish at the origin: Q1 (0) = Q2 (0) = · · · = 0. Choose
a polynomial P1 (x) with P1 (0) = 0 such that |P1 (x) − Q1 (x)| < 1 for all
x ∈ [−1, 1]. Appealing to Pál’s theorem (Theorem 8), next choose a poly-
nomial P2 (x) that begins with P1 (x) and has the approximation property
1
|P2 (x) − Q2 (x)| < for all x ∈ [−2, 2] .
2
Now proceed inductively. Having chosen P1 , P2 , . . . , Pn−1 , choose a polyno-
mial Pn (x) that begins with Pn−1 (x) and satisfies
1
|Pn (x) − Qn (x)| < for all x ∈ [−n, n] .
n
Then P1 (x), P2 (x), . . . are partial sums of a power series with the desired
property. To see this, suppose f (x) is continuous on an interval [−r, r]
and f (0) = 0. Choose an increasing sequence {nk } of indices such that
Qnk (x) → f (x) uniformly on [−r, r] as k → ∞. Then by construction,
[Pnk (x) − Qnk (x)] → 0 and so Pnk (x) → f (x) uniformly on [−r, r].
Exercises
1. Use induction to calculate the Vandermonde determinant
1 x0 x20 . . . xn0
1 x1 x21 . . . xn1
. . = (xk − xj ) .
. . .
.. ..
.
. .
j<k
1 xn x2n . . . xnn
Suggestion. Moving from right to left, subtract xn times the kth column
from the (k + 1)st column, then remove common factors and reduce the
problem to the calculation of a Vandermonde determinant with n replaced
by n − 1.
172 6. Approximation by Polynomials
6. (a) Show that the Chebyshev polynomials of second kind satisfy the
differential equation
8. Show that the polynomials Tn (x) and Un (x) are even functions when n
is even and odd functions when n is odd.
|f (x) − P (x)| < ε , |f (x) − P (x)| < ε , . . . , |f (n) (x) − P (n) (x)| < ε
Bn (x; 3) = 2
(n − 1)(n − 2)x3 + 3(n − 1)x2 + x .
n
In particular, B1 (x; 3) = x and B2 (x; 3) = 14 (3x2 + x).
Hint. Write k 3 = k(k − 1)(k − 2) + 3k 2 − 2k.
(c) Show by induction that Bn (x; m) has degree n for n ≤ m and degree
m for n > m.
(d) Generalize (c) by showing that if f is any polynomial of degree m,
then its Bernstein polynomial Bn (x) has degree n for n ≤ m and degree
≤ m for n > m.
Exercises 175
n−1
n−1
k+1
k
Bn (x; f ) =n f −f xk (1 − x)n−1−k ,
k n n
k=0
(b) If f has a continuous derivative in [0, 1], show that gn (x) converges
uniformly to f (x) as n → ∞.
(c) Deduce that |Bn−1 (x; gn ) − Bn−1 (x; f )| → 0 uniformly in [0, 1].
Hint. Apply the result of Exercise 12.
(d) Conclude that if f is continuously differentiable, then Bn (x; f ) →
f (x) uniformly in [0, 1].
*(e) By a similar method, prove more generally that if f has a continuous
(ν)
derivative of order ν, then Bn (x; f ) → f (ν) (x) as n → ∞, uniformly in
[0, 1].
Reference: Lorentz [6].
15. Let I n denote the closed unit cube in Rn , consisting of all points
(x1 , x2 , . . . , xn ) with 0 ≤ xj ≤ 1 for j = 1, 2, . . . , n. If a function
f (x1 , x2 , . . . , xn ) is continuous on I n , show that it can be uniformly ap-
proximated on I n by polynomials; that is, by finite linear combinations of
the monomials xk11 xk22 · · · xknn .
n
P (x) = a0 + (ak cos kx + bk sin kx)
k=1
17. Let f be continuous on the interval [0, 2π] with f (0) = f (2π), and form
the Poisson integral
2π
1 1 − r2
u(r, θ) = f (t) dt , 0 ≤ r < 1.
2π 0 1 − 2r cos(θ − t) + r2
Show that u(r, θ) → f (θ) as r → 1, uniformly for θ ∈ [0, 2π]. Give two
proofs, one patterned after Landau’s proof of Weierstrass’s theorem, the
other invoking the trigonometric form of Weierstrass’s theorem.
18. Carry out the details in the deduction of Corollary 1 from Pál’s theorem
(Theorem 8).
21. Show directly, without appeal to the Müntz–Szász theorem, that for
each integer m > 0, every continuous function can be approximated uni-
formly on the interval [0, 1] by polynomials of the form
References
[1] S. N. Bernstein, “Démonstration du théorème de Weierstrass fondée sur le calcul
des probabilités”, Comm. Soc. Math. Kharkow 13 (1912), 1–2.
[2] E. W. Cheney, Introduction to Approximation Theory, McGraw–Hill, New York,
1966.
[3] Philip J. Davis, Interpolation and Approximation, Blaisdell, New York, 1963.
[4] Edmund Landau, “Über die Approximation einer stetigen Funktion durch eine
ganze rationale Funktion”, Rend. Circ. Mat. Palermo 25 (1908), 337–345.
[5] Henri Lebesgue, “Sur l’approximation des fonctions”, Bull. Sci. Math. 22 (1898),
278–287.
[6] G. G. Lorentz, Bernstein Polynomials, University of Toronto Press, Toronto,
1953.
References 177
[7] Chaim Müntz, “Über den Approximationssatz von Weierstrass”, in H.A. Schwarz
Festschrift, Mathematische Abhandlungen, Berlin, 1914, pp. 303–312.
[8] Julius Pál, “Über eine Anwendung des Weierstrass-schen Satzes von der Annä-
herung stetiger Funktionen durch Polynome”, Tôhoku Math. J. 5 (1914), 8–9.
[9] Julius Pál, “Zwei kleine Bemerkungen”, Tôhoku Math. J. 6 (1914), 42–43.
[10] A. Pincus, “Weierstrass and approximation theory”, J. Approx. Theory 107
(2000), 1–66.
[11] Walter Rudin, Principles of Mathematical Analysis, Third edition,
McGraw–Hill, New York, 1976.
[12] M. H. Stone, “The generalized Weierstrass approximation theorem”, Math.
Magazine 21 (1948), 167–184; 237–254.
[13] Otto Szász, “Über die Approximation stetiger Funktionen durch lineare
Aggregate von Potenzen”, Math. Annalen 77 (1916), 482–496.
[14] Charles de la Vallée-Poussin, Leçons sur l’approximation des fonctions d’une
variable réelle, Gauthier–Villars, Paris, 1919; reprinted 1952.
[15] Karl Weierstrass, “Über die analytische Darstellbarkeit sogenannter willkür-
licher Functionen einer reellen Veränderlichen”, Sitzungsberichte der Königlich
Preussischen Akademie der Wissenschaften zu Berlin, 1885, pp. 633–639, 789–805.
Chapter 7
Tauberian Theorems
179
180 7. Tauberian Theorems
However, the possible existence of the Abel limit when a series diverges
suggests
a natural way to assign it a generalized sum. For example, the
series ∞ n=0 (−1) n is divergent, but
∞
1 1
f (x) = (−1)n xn = → as x → 1− ,
1+x 2
n=0
so we may say that the series is Abel summable to the sum 12 . Observe
that
the extended notion of sum retains its linearity. In other words,
if an is
Abel summable to A and bn is Abel summable to B, then (an + bn ) is
summable to A + B and can is summable to cA for any constant c. Also,
Abel’s theorem guarantees that the Abel sum of a convergent series exists
and is equal to the ordinary sum.
A similar technique for summation of divergent series can be based on
the averages of partial sums. Let
sn = a0 + a1 + · · · + an
denote the partial sums of the series, and let
s0 + s1 + · · · + sn
σn =
n+1
denote their arithmetic means, also known as the Cesàro means. The series
an is said to be Cesàro summable to the sum σ if if σn → σ as n → ∞.
Recall (cf. Chapter 1, Exercise 13) that σn → s whenever sn → s, but
the sequence {σn } may converge when {sn } does not. For instance, the
divergent series ∞ n
n=0 (−1) has partial sums sn = 1 when n is even and
sn = 0 when n is odd, so it is Cesàro summable to the sum 12 , which is the
same as its Abel sum. The series ∞ n=1 (−1)
n+1 n is not Cesàro summable,
N
ε
≤ (n + 1)|σn − σ| + ,
(1 − x)2
n=0
and so
∞
N
an xn − σ ≤ (1 − x)2 (n + 1)|σn − σ| + ε < 2ε
n=0 n=0
Many other summation procedures have been introduced, and the cor-
responding “Abelian theorems” proved, asserting that whenever a series is
convergent or summable by some method, it must be summable to the same
sum by another more powerful method. For instance, a series is said to be
Borel summable if the limit
∞
−x 1
lim e sn xn
x→∞ n!
n=0
exists. The method is named for Émile Borel, who introduced it in 1899 and
pointed out the corresponding Abelian theorem, that a convergent series is
Borel summable to its ordinary sum (cf. Exercise 2). Borel summability
arises naturally in complex function theory, especially in problems of analytic
continuation.
182 7. Tauberian Theorems
exists, where
x log(1/x)
L(x) = for 0 < x < 1, and L(1) = 1 ,
1−x
Briefly, Tauber’s theorem asserts that if a series is Abel summable and its
terms satisfy the additional condition nan → 0, then the series is convergent.
The theorem uncovers a remarkable phenomenon. Not only does the Abel
method fail to sum series that diverge too rapidly, but it also fails to sum
series whose divergence is too slow. For example, if |an | ≤ 1/(n log n) ,
Tauber’s theorem tells us that the series an is not Abel summable unless
it is convergent. The same is true if |an | ≤ 1/n , but the proof is based on a
stronger form of Tauber’s theorem requiring only that the sequence {nan }
be bounded. More about this later.
7.3. Theorems of Hardy and Littlewood 183
n
Proof of Tauber’s theorem. With the notation sn = k=0 ak for the
partial sums, we can write
n ∞
sn − f (x) = ak (1 − x ) −
k
ak x k .
k=1 k=n+1
to conclude that
n ∞
|sn − f (x)| ≤ (1 − x) k|ak | + |ak |xk .
k=1 k=n+1
Now let ε > 0 be given. By hypothesis, we may choose n large enough that
k|ak | < ε for all k > n. Then
∞
∞
∞
1 k ε k ε
|ak |xk < ε x < x = .
k n n(1 − x)
k=n+1 k=n+1 0
1
n
|sn − f (xn )| ≤ k|ak | + ε
n
k=1
for n sufficiently large. But since k|ak | → 0, it follows that the arithmetic
means
1
n
k|ak |
n
k=1
Hardy’s Theorem. If the infinite series ∞ n=0 an is Cesàro summable
and {nan } is bounded, then the series is convergent.
Proof. Suppose that g(x) has a jump-discontinuity at c ∈ (0, 1), and let
g(c+) and g(c−) denote the right- and left-hand limits. Suppose without
loss of generality that g(c−) ≤ g(c+). For fixed δ ∈ (0, c), let (x) be the
linear function such that
(c − δ) = g(c − δ) + ε/2 , (c) = g(c+) + ε/2 .
Define
g(x) + ε/2 if 0 ≤ x < c − δ or c < x ≤ 1
φ(x) =
max{(x), g(x) + ε/2} if c − δ ≤ x ≤ c .
Then φ(x) is continuous in [0, 1] and φ(x) ≥ g(x)+ε/2. Choose a polynomial
Q(x) such that |Q(x) − φ(x)| < ε/2 for all x ∈ [0, 1]. Then Q(x) > g(x) and
1
Q(x) − g(x) dx < ε
0
if δ is sufficiently small. A similar construction produces a polynomial
1
P (x) < g(x) with 0 [g(x) − P (x)] dx < ε.
as claimed. Invoking the lemma, we can now infer from (1) that
∞
1
(2) lim (1 − x) sn xn g(xn ) = A g(t) dt
x→1− 0
n=0
Let xN = e−1/N and observe that xnN ≥ 1/e if and only if n ≤ N , so that
∞
N
sn xnN g(xnN ) = sn = (N + 1)σN .
n=0 n=0
as N → ∞, since xN → 1. But
m
P (x) = bk x k with P (0) = 0 ,
k=1
so that
N ∞
sN = an = an g(xnN ) , where xN = e−1/N .
n=0 n=0
we will apply the relation (3) to a polynomial P with the properties P (0) =
0, P (1) = 1, and P (t) ≥ g(t) for 0 ≤ t ≤ 1. For this purpose we choose a
polynomial Q with
g(t) − t
Q(t) ≥ = h(t) , 0 < t < 1,
t(1 − t)
1 − xn
= 1 + x + · · · + xn−1 ≤ n for 0 ≤ x < 1.
1−x
since φ(t) has a continuous extension to the interval [0, 1] except for the
jump-discontinuity at t = 1/e. Indeed, for any fixed x ∈ (0, 1), the integral
is approximated by its Riemann sum
∞
∞
φ(xn )
(x − x
n n+1
) = (1 − x) φ(xn ) ,
xn
n=1 n=1
Since Q(t) ≥ h(t), the last integral is positive, and in view of the lemma
it can be made arbitrarily small by suitable choice of the polynomial Q.
Putting everything together and recalling from (3) that
∞
lim an P (xn ) = A ,
x→1−
n=0
Another appeal to the lemma shows that the last integral can be made
arbitrarily close to zero with suitable choice of the polynomial Q, which
gives (5). Combining (4) and (5), we see that
∞
lim an g(xn ) = A .
x→1−
n=0
f (x) = x − x2 + x4 − x8 + x16 − . . . .
Does f (x) have a limit as x approaches 1 from below? If so, what is the
limit?
and
f (x) = x − (x2 − x4 ) − (x8 − x16 ) − · · · < x
for 0 < x < 1. In particular, f (x) is bounded in the interval [0, 1]. The
identity f (x) = x − f (x2 ) shows that if f (x) has a limit as x → 1−, the
limit must be 12 . Iteration of the identity shows that f (x4 ) < f (x), which
suggests that f (x) is an increasing function in the interval 0 < x < 1. If
7.5. Hardy’s power series 191
so, then by the monotone boundedness theorem f (x) does have a limit, and
f (x) → 12 as x → 1−. Figure 1 displays the graph of
30
k
(−1)k x2 , 0 < x < 1,
k=0
0.5
0.4
0.3
0.2
0.1
that Hardy had studied in 1907. The most astounding feature of the high-
indices theorem is that lacunarity alone serves as a Tauberian condition.
For Hardy’s power series we see k
that nk = 2 is sufficiently lacunary, and
the partial sums of the series ak are alternately equal to 0 and 1, so the
high-indices theorem shows that f (x) cannot tend to a limit as x → 1−.
A proof of the high-indices theorem is beyond the scope of this book,
but can be found for instance in the book by Korevaar [10], p. 50 ff. Al-
ternatively, we can appeal to the more elementary theorem of Hardy and
Littlewood, as stated at the end of Section 7.3, to show that Hardy’s sum
does not tend to a limit as x → 1−. For the function
∞
∞
k
f (x) = an xn = (−1)k x2 ,
n=1 k=0
0.52
0.51
0.49
Exercises
1. Show that the series
∞
(−1)n+1 n = 1 − 2 + 3 − 4 + . . .
n=1
Thus a k k
n = (−1) if n = 2 and an = 0 otherwise. Show directly that the
series an is not Cesàro summable by computing
1 2
lim inf σn = 3 and lim sup σn = 3 .
n→∞ n→∞
4. Show that
∞
2 1
(−1)k xk → as x → 1 − .
2
k=0
Suggestion. Show that the series an is Cesàro summable, where an =
k 2
(−1) if n = k and an = 0 otherwise.
5. (a) Show directly, without appeal to a Tauberian theorem, that the
infinite series ∞ 1
n=1 n is not Cesàro summable.
(b) Show directly that the same series is not Abel summable.
194 7. Tauberian Theorems
6. Show that for an ≥ 0, the series an is Abel summable only if it is
convergent.
7. (a) Prove the Abelian theorem for Lambert summability. In other words,
show that if
∞
∞
an = s , then lim an L(xn ) = s ,
x→1−
n=0 n=0
References
[1] E. R. Berlekamp and J. P. Buhler (editors), Puzzles Column, Emissary, Math-
ematical Sciences Research Institute, Berkeley, Fall 2004 and Spring 2005.
[2] G. Frobenius, “Ueber die Leibnitzsche Reihe”, J. Reine Angew. Math. 89 (1880),
262–264.
[3] G. H. Hardy, “On certain oscillating series”, Quart. J. Math. 38 (1907), 269–288.
[4] G. H. Hardy, “Theorems relating to the summability and convergence of slowly
oscillating series”, Proc. London Math. Soc. 8 (1910), 301–320.
[5] G. H. Hardy, Divergent Series, Clarendon Press, Oxford, 1949.
[6] G. H. Hardy and J. E. Littlewood, “Tauberian theorems concerning power series
and Dirichlet’s series whose coefficients are positive”, Proc. London Math. Soc. 13
(1914), 174–191.
[7] G. H. Hardy and J. E. Littlewood, “A further note on the converse of Abel’s
theorem”, Proc. London Math. Soc. 25 (1926), 219–236.
References 195
Power series expansions of functions are known to play a vital role in anal-
ysis. In this chapter we turn to another kind of expansion, a sum of sines
and cosines known as a Fourier series. It is fundamentally different from a
Taylor series expansion in that the coefficients are determined not by dif-
ferentiation but by integration, exploiting a property of orthogonality; and
the function to be represented need not be differentiable and may even have
points of discontinuity. Here we develop criteria for convergence and Cesàro
summability of Fourier series, with applications to specific examples. The
discussion then shifts to a continuous analogue, the Fourier transform and its
inversion, leading ultimately to the remarkable Poisson summation formula.
∂ 2 u 1 ∂u 1 ∂2u
+ + =0
∂r2 r ∂r r2 ∂θ2
197
198 8. Fourier Series
in polar coordinates.
Now if f (θ) = sin nθ or cos nθ for n = 1, 2, . . . , this boundary-value
problem can be solved by inspection. The solutions are u(r, θ) = rn sin nθ
and rn cos nθ, respectively. But the Laplace equation is linear, so a principle
of superposition applies. This means that for any constant c, the solution
with boundary function cf (θ) is cu(r, θ), and if v(r, θ) is the solution cor-
responding to another boundary function g(θ), then the u + v solves the
problem for boundary function f + g.
All of this suggests that the boundary-value problem will be solved for
an arbitrary periodic function f (θ) if that function can be expanded into an
infinite series of the form
∞
1
(1) f (θ) = 2 a0 + (an cos nθ + bn sin nθ) ,
n=1
An expansion of the form (1) is called a Fourier series, after the French
mathematician and physicist Joseph Fourier (1768–1830), who conceived the
method and developed it in Théorie analytique de la chaleur, a revolution-
ary monograph on heat conduction published in 1822. Fourier’s arguments
were not rigorous, and his ideas were not readily accepted. In particular,
the notion that a discontinuous function could be represented as a sum of
sines and cosines seemed inconceivable and was greeted with skepticism. Ul-
timately the paradoxes inherent in Fourier series played an important role in
leading Cauchy and others to develop more precise concepts in mathematical
analysis, thereby placing the entire apparatus on a more secure foundation.
Fourier series are the prototype for a variety of orthogonal expansions
that arise in similar manner from problems of mathematical physics. For ex-
ample, expansions into series of Bessel functions and Legendre polynomials
will be encountered in Chapter 13 of this book.
Joseph Fourier had a colorful career that combined science with public
service. His early work on the theory of algebraic equations was interrupted
by the French Revolution of 1789. He then became embroiled in politics and
was actually condemned to the guillotine at one time. After the Revolution
he taught for a while at the new École Polytechnique in Paris, then was
appointed as a scientific advisor in Napoleon’s expedition to Egypt. Upon
return to France, he became a civil administrator in Grenoble, during which
8.2. Orthogonality relations 199
time (around 1805) he began the work on heat diffusion that would culminate
in his famous book of 1822.
Since the time of Fourier, an extensive mathematical theory of Fourier se-
ries has emerged, with broad applications not only to mathematical physics,
but to such fields as coding theory, signal transmission, data storage and re-
trieval, crystallography, medical imaging, and pure mathematics itself. The
mathematical theory will be our focus in this chapter.
Suppose now that for some coefficients ak and bk the trigonometric series
∞
(3) f (x) = 12 a0 + (ak cos kx + bk sin kx)
k=1
converges uniformly in the interval [−π, π] to a sum f (x). (The reason for
the 12 will become apparent shortly.) Because of the uniform convergence,
the function f is continuous and we can integrate term by term to see that
π π
1
f (x) dx = 2 a0 = πa0 .
−π −π
200 8. Fourier Series
The numbers
1 π
an = f (x) cos nx dx , n = 0, 1, 2, . . . ,
π −π
(4) π
1
bn = f (x) sin nx dx , n = 1, 2, . . .
π −π
are called the Fourier coefficients of the function f , and the trigonometric
series of the form (3) with these coefficients is known as the Fourier series
of f . As our calculations show, the Fourier series of a continuous function f
is the only trigonometric series of the form (3) that can converge uniformly
to f in the interval [−π, π]. In particular, no function can have more than
one uniformly convergent trigonometric series expansion.
A much more delicate question is whether the Fourier series of a given
continuous periodic function will actually converge to the function, uni-
formly or even pointwise. Before turning to this important question, we
shall consider a problem of best “mean-square” approximation.
where
π the orthogonality relations (2) have been used to calculate the integral
2
−π T (x) dx. Now let
n
(6) sn (x) = 12 a0 + (ak cos kx + bk sin kx)
k=1
denote the nth partial sum of the Fourier series of f . With the choice of
trigonometric polynomial T (x) = sn (x), the expression (5) reduces to
π π
n
(7) [f (x) − sn (x)]2 dx = f (x)2 dx − 12 πa20 − π (a2k + b2k ) .
−π −π k=1
n π
1 2
2 πa0 +π (ak + bk ) ≤
2 2
f (x)2 dx ,
k=1 −π
202 8. Fourier Series
π
since −π [f (x) − sn (x)]2 dx ≥ 0. Letting n → ∞, one sees that the infinite
series converges and that
∞
π
(8) 1 2
2 πa0 + π (ak + bk ) ≤
2 2
f (x)2 dx .
k=1 −π
an identity known as Parseval’s relation. This will follow from the equation
(7) if we can show that
π
(9) lim [f (x) − sn (x)]2 dx = 0 .
n→∞ −π
For this purpose we will appeal to the trigonometric form of the Weierstrass
approximation theorem, stated here for easy reference.
Weierstrass Approximation Theorem. Let f (x) be continuous on the
interval [−π, π], with f (−π) = f (π). Then for each ε > 0 there is a trigono-
metric polynomial T (x) such that |f (x) − T (x)| < ε for −π ≤ x ≤ π.
The trigonometric form of the Weierstrass approximation theorem is
closely related to the algebraic form, as presented in Chapter 6. A proof is
outlined in Exercise 6. Another proof will be given later in this chapter as
a corollary of Fejér’s theorem. To show that the relation (9) holds for every
function f that is square-integrable over [−π, π], we note first that such a
function can be approximated in mean by a function g that is continuous
on [−π, π] and has the property g(−π) = g(π). Combining this fact with
the Weierstrass approximation theorem, we see that for each ε > 0 there is
a trigonometric polynomial T such that
π
[f (x) − T (x)]2 dx < ε .
−π
since we have shown that the Fourier polynomial sn gives the best mean-
square approximation to f among all trigonometric polynomials of degree n
or lower. This verifies (9), which proves the Parseval relation.
The function
sin(n + 12 )x
n
1
Dn (x) = = 2 + cos kx
2 sin 12 x k=1
is known as the Dirichlet kernel. It has the property
1 π
(10) Dn (x) dx = 1 , n = 1, 2, . . . ,
π −π
π
since −π cos kx dx = 0 for k = 1, 2, . . . . The Dirichlet kernel is an even
function, periodic with period 2π. Thus Dn (−x) = Dn (x) and Dn (x+2π) =
Dn (x). A graph of the Dirichlet kernel for n = 10 is shown in Figure 1.
x
−π π
With the help of Lemma 1, we may now derive a useful formula for the
nth partial sum sn (x) of the Fourier series of a function f , as defined by
(6). We take f to be integrable over the interval [−π, π], with the property
f (−π) = f (π), and we extend it to the whole real line as a periodic function
with period 2π. Introducing the formulas (4) for the Fourier coefficients of
f , we can write
1 π
1
n
1
= 1
2 + cos k(x − t) f (t) dt
π −π k=1
π π
1 1
= Dn (x − t)f (t) dt = Dn (t)f (x + t) dt ,
π −π π −π
since the Dirichlet kernel Dn is even and both Dn and f have period 2π.
This relation
1 π
(11) sn (x) = Dn (t)f (x + t) dt
π −π
is known as the Dirichlet formula for the nth partial sum of a Fourier series.
In view of the property (10), we have deduced that
1 π
(12) sn (x) − f (x) = Dn (t)[f (x + t) − f (x)] dt .
π −π
for some constant C > 0, so this method of proof cannot succeed. What is
worse, it is actually possible for the Fourier series of a continuous periodic
function to diverge at some points. Thus in order to prove that the Fourier
series converges to the function at a given point, we shall have to impose
hypotheses stronger than continuity. We will assume that the function sat-
isfies a smoothness condition near the point where convergence is to occur.
The precise statement is as follows.
8.4. Convergence of Fourier series 205
so the integral
π
1
(14) ϕx (t) sin(n + 12 )t dt
π −π
206 8. Fourier Series
can be regarded as a sum of Fourier sine and cosine coefficients of the square-
integrable functions ϕx (t) cos 12 t and ϕx (t) sin 12 t, respectively. Thus by the
corollary to Bessel’s inequality, the integral (14) tends to zero as n → ∞. It
therefore follows from (13) that sn (x) → f (x) as n → ∞, which completes
the proof.
It may be remarked that the proof remains valid under the weaker
smoothness hypothesis
|f (x + t) − f (x)| ≤ C |t|α , |t| < δ ,
for some exponent α > 12 . That still ensures the square-integrablity of ϕx (t),
so its Fourier coefficients tend to zero.
There is a more general form of the convergence theorem that applies to
functions with jump discontinuities. If the function f has one-sided limits
f (x+) and f (x−) at a point x, and it is sufficiently smooth in both of the one-
sided neighborhoods of x, then its Fourier series converges to the average
of the two limits. This may be viewed as a corollary of the convergence
theorem. Here is a more precise statement.
Corollary. Let f be periodic with period 2π and square-integrable over
[−π, π]. At some point x ∈ R, suppose f has one-sided limits
f (x+) = lim f (x + t) , f (x−) = lim f (x − t) .
t→0, t>0 t→0, t>0
and ⎧
⎨ 0, −π ≤ t ≤ 0
ψx (t) = f (x + t) − f (x+)
⎩ , 0 < t ≤ π,
2 sin 12 t
we conclude as before that sn (x) → 12 [f (x+) + f (x−)] as n → ∞.
8.5. Examples
In calculating the Fourier coefficients of a given function f it is often
useful to take advantage of special symmetries. For instance, if f is an even
function, then f (x) sin nx is odd, and it is clear without calculation that the
Fourier sine coefficients of f must vanish: bn = 0 for all n. Also, the Fourier
cosine coefficients take the form
2 π
an = f (x) cos nx dx n = 0, 1, 2, . . . .
π 0
Example 1. For a first example, consider the function f (x) = |x| for
−π ≤ x ≤ π, and let it be extended periodically to the whole line so that
208 8. Fourier Series
In order to verify (15), we have only to check that the right-hand side is
indeed the formal Fourier series of the function f . We may use the trigono-
metric identity
for n = 1, 2, . . . , and
1 π 1 π 2
a0 = f (x) dx = cos cx dx = sin cπ .
π −π π −π cπ
8.5. Examples 211
Since the series in (16) converges uniformly in each interval [−a, a] with
0 < a < 1, we may integrate term by term over an interval [ε, x], where
0 < ε < x < 1, to obtain
∞ 2
sin πx sin πε x − n2
log − log = log .
πx πε ε2 − n2
n=1
In fact, the formula (17) holds for all x ∈ R. To see this, we need only
show that the right-hand side is periodic with period 2. In other words,
if p(x) denotes the infinite product (which converges for every x ∈ R), we
want to show that p(x + 2) = p(x). But the partial product
x2 x2
pn (x) = πx 1 − x 2
1 − 2 ··· 1 − 2
2 n
can be rewritten as
(−1)n π
pn (x) = (x − n) · · · (x − 1)x(x + 1) · · · (x + n) ,
(n!)2
which gives the relation
(x + n + 1)(x + n + 2)
pn (x + 2) = pn (x) , n = 1, 2, . . . .
(x − n + 1)(x − n)
Letting n tend to infinity, we conclude that p(x + 2) = p(x).
212 8. Fourier Series
This was Euler’s original derivation, although his discovery of the product
formula for the sine function amounted only to a plausible argument.
denote the partial sums of that Fourier series. Let ξn denote the first positive
point where the function sn (x) attains a local maximum. An inspection of
Figure 4 suggests the conjecture that the sequence {sn (ξn )} tends to a limit
larger than π/2 as n → ∞. In other words, the overshoot persists in the
limit.
To prove the conjecture, first calculate the derivative
n
sn (x) = cos kx = Dn (x) − 12 ,
k=1
where
sin(n + 12 )x
Dn (x) =
2 sin 12 x
is the Dirichlet kernel, as found in Lemma 1. Thus the equation sn (x) = 0
is equivalent to
sin(n + 12 )x = sin 12 x .
8.6. Gibbs’ phenomenon 213
for k = 0, ±1, ±2, . . . . Hence the positive critical points of sn (x) in the
interval (0, π] have the form
2kπ
x= , k = 1, 2, . . . , [n/2] ,
n
and
(2k + 1)π
x= , k = 0, 1, 2, . . . , [n/2] ,
n+1
where [x] denotes the integer part of x. The critical points in the first
group are found to be local minima and those in the second group are local
maxima. Thus the first positive local maximum of sn (x) occurs at the point
π
x = ξn = .
n+1
Let us now investigate the behavior of
n
1 kπ
sn (ξn ) = sin
k n+1
k=1
n+1 n+1
n+1 kπ π
sn (ξn ) = sin = f (xk ) Δxk ,
kπ n+1 n+1
k=1 k=1
where
sin x kπ π
f (x) = , xk = , and Δxk = .
x n+1 n+1
The last sum is none other than a Riemann sum for the integral
π π
sin x
f (x) dx = dx
0 0 x
π 2π 3π 4π
sin x
Figure 5. Graph of x .
In the limit, then, the overshoot amounts to about 9% of the total jump π.
Although a particular example has served to illustrate the Gibbs phe-
nomenon, it is important to emphasize that the phenomenon is quite general.
It is true that our analysis has taken advantage of special features of the
sawtooth function, and a similar analysis applies to the square wave of Ex-
ample 2 (see Exercise 10). However, the phenomenon is not peculiar to
special examples and will manifest itself at any jump-discontinuity. To be
more precise, suppose that a periodic function g has a jump-discontinuity
at a point x0 , so that g(x0 +) − g(x0 −) = c > 0, where g(x0 +) and g(x0 −)
denote the right-hand and left-hand limits of g(x), respectively, at x0 . Then,
if f is the sawtooth function of Example 3, the function
c
h(x) = g(x) − f (x − x0 )
π
has a continuous extension to x0 and the partial sums of its Fourier series
are
c
un (x) = tn (x) − sn (x − x0 ) ,
π
8.7. Arithmetic means of partial sums 215
1
n
(18) σn (x) = sk (x) , n = 1, 2, . . . ,
n+1
k=0
Lemma 2.
n
sin2 12 (n + 1)x
sin(k + 12 )x = , n = 1, 2, . . . .
k=0
sin 12 x
n
1
2 sin 12 x sin(k + 12 )x = 1 − cos(n + 1)x = 2 sin2 2 (n + 1)x .
k=0
sin(k + 12 )t
Dk (t) = , k = 0, 1, 2, . . . ,
2 sin 12 t
1
n π n
1 1
σn (x) = sk (x) = Dk (t) f (x + t) dt
n+1 n+1 π −π
(19) k=0 k=0
π
1
= Kn (t)f (x + t) dt ,
π −π
where
1 sin2 12 (n + 1)x
Kn (x) = .
n + 1 2 sin2 12 x
The function Kn (x) is called the Fejér kernel. Figure 6 shows a plot for
n = 10.
8.7. Arithmetic means of partial sums 217
x
Π Π
1
n
Kn (x) = Dk (x) ,
n+1
k=0
Like the Dirichlet kernel, the Fejér kernel peaks at the origin as n increases,
but it has the crucial advantage that Kn (x) ≥ 0 for all x. Since sin2 12 x ≥
(x/π)2 , we see that
π2
(21) 0 ≤ Kn (x) ≤ , δ ≤ |x| ≤ π ,
2(n + 1)δ 2
for every n sufficiently large. This proves that σn (x) → f (x) uniformly on
R as n → ∞.
The representation formula (19) for the Fejér means, together with the
property (20), shows at once that
Hence there can be no Gibbs phenomenon for the Fejér means, no over-
shooting near points of discontinuity. The dramatic difference in mode of
convergence is illustrated in Figure 7, where the Fejér means σn (x) of the
discontinuous sawtooth function
− 12 (π + x) , −π ≤ x < 0
(23) f (x) = 1
2 (π − x) , 0<x≤π
are shown for n = 10 and 20. This should be compared with the Gibbs
phenomenon displayed by the partial sums sn (x), as shown in Figure 4.
With the help of the property (22) of the Fejér means, we can now prove
that the partial sums sn (x) of the function (23) are uniformly bounded. The
estimate (22) gives |σn (x)| ≤ π/2. On the other hand, a calculation shows
that
1
n
|sn (x) − σn (x)| = sin kx < 1 .
n+1
k=1
Thus |sn (x)| ≤ 1 + π/2 for all x ∈ R. The details are left as an exercise.
1 1 1
gm (x) = + cos x + cos 2x + · · · + cos(m − 1)x
m m−1 m−2
1 1
− cos(m + 1)x − cos(m + 2)x − · · · − cos 2mx
2 m
m
1 m
sin kx
= cos(m − k)x − cos(m + k)x = 2 sin mx
k k
k=1 k=1
∞
1
(24) F (x) = gm (x) .
k2 k
k=1
1
Recall that the partial sums of the series k sin kx are uniformly bounded,
as we saw at the end of the preceding section. Thus the functions gm (x)
are also uniformly bounded, and the Weierstrass M-test shows that the
infinite series (24) converges uniformly on R. Therefore, the sum F (x) is a
continuous function on the real line, periodic with period 2π. With suitable
choice of the sequence {mk }, we will see that the Fourier series of F is
divergent at the origin; in fact, its partial sums are unbounded there.
220 8. Fourier Series
Let smn (x) denote the nth partial sum of the Fourier series of gm (x).
Then smn (x) = gm (x) for all n ≥ 2m, since gm is a trigonometric polynomial
of degree 2m. In view of the uniform convergence, the Fourier coefficients of
F can be calculated by integrating the relevant infinite series term by term.
This says that each Fourier coefficient of F is the sum of corresponding
Fourier coefficients of the terms of the series (24). It follows that the nth
partial sum Sn (x) of the Fourier series of F is the sum of the nth partial
sums of the terms. In other words,
∞
1
(25) Sn (x) = sm n (x) .
k2 k
k=1
In particular,
∞
1
Sn (0) = sm n (0) .
k2 k
k=1
Observe now that smn (0) ≥ 0 for every pair of indices m and n, and that
m
1 1 dx
smm (0) = 1 + + · · · + > = log m .
2 m 1 x
3
Now choose mk = 2k to infer that
1 3
Smk (0) > k log 2 = k log 2 ,
k2
so that the partial sums of F are unbounded at the origin.
The function F (x) can be translated to obtain a function F (x−r) whose
Fourier series diverges at a specified point r. Then by forming a suitable
weighted sum of the functions F (x − rk ) corresponding to an arbitrary se-
quence rk , we can construct a continuous function whose Fourier series di-
verges at every point rk . For instance, the sequence rk can be taken to be
an enumeration of the rational numbers, and we then obtain a continuous
function whose Fourier series diverges at a countable dense set of points.
On the other hand, Lennart Carleson proved in 1966 that the Fourier
series of every continuous function converges almost everywhere. In other
words, the points of divergence (if any) constitute a set of measure zero.
8.9. Fourier transforms 221
where
2π 2π
1 1
an = f (t) cos nt dt and bn = f (t) sin nt dt
π 0 π 0
Therefore,
n
1 n
ikt
ck e = a0 + (ak cos kt + bk sin kt) = sn (t)
2
k=−n k=1
is the nth partial sum of the Fourier series of f , expressed in complex nota-
tion.
Recall now that ei(s+t) = eis eit , by the addition formulas for sine and
cosine. Differentiation gives
d iat
e = −a sin at + ia cos at = iaeiat .
dt
222 8. Fourier Series
The complex form of a Fourier series can be obtained more directly through
the orthogonality relations
2π 2π
int −imt i(n−m)t
0 for m = n
e e dt = e dt =
0 0 2π for m = n .
Formally, if a complex-valued function has an expansion of the form (26),
multiplication by e−imt and integration shows that the coefficients cn are
given by (27).
In what follows we will be dealing with complex-valued functions f (t)
defined on the real line R and Riemann integrable over each bounded inter-
val. For p > 0 we will write f ∈ Lp to indicate that |f (t)|p is integrable over
R . We will denote by L∞ the set of all bounded locally integrable functions
on R .
The Fourier transform
∞
f (x) = e−ixt f (t) dt , −∞ < x < ∞ ,
−∞
Example 3. Next let f (t) = e−a|t| for some constant a > 0. Then since f
is an even function, we see that
∞ ∞ ∞
f (x) = e −ixt −a|t|
e dt = e −a|t|
cos xt dt = 2 e−at cos xt dt .
−∞ −∞ 0
Then
|x|
2π 1 − for |x| ≤ R
f(x) = R
0 for |x| > R .
Example 6. If f (t) = a
a2 +t2
for some constant a > 0, then f(x) = πe−a|x| .
plays a central role in probability theory, is essentially its own Fourier trans-
form. Here is a slight generalization.
Example 7. If f (t) = e−at
2 /2
for some constant a > 0, then
f(x) = 2π/a e−x
2 /2a
.
Theorem. Let f ∈ L1 and let f(x) denote its Fourier transform. Then
(a) f(x) is uniformly continuous on R .
(b) f(x) → 0 as |x| → ∞ .
(c) If g(t) = tf (t) and g ∈ L1 , then f is differentiable and f (x) =
− i
g (x) .
(d) If f is differentiable and f ∈ L1 , then f has Fourier transform
f (x) = ixf(x) .
as desired.
as given in Example 7. Since f (t) = −at e−at /2 is also integrable, Parts (c)
2
and (d) of the preceding theorem both apply and show that
Thus f satisfies the differential equation af (x) + xf(x) = 0, which implies
that
d x2 /2a
ex /2a f(x) = C
2
e f (x) = 0 , and so
dx
for some constant C. But
∞ ∞
C = f (0) = e −at2 /2
dt = 2/a e−s ds = 2π/a ,
2
−∞ −∞
which shows that f(x) = 2π/a e −x2 /2a , as claimed. (For calculation of
the last integral, see Section 9.1.)
The convolution f ∗ g of two functions f, g ∈ L1 is defined by the integral
∞
(f ∗ g)(t) = f (s)g(t − s) ds .
−∞
It is easily seen that the integral converges absolutely if one of the functions
f or g is bounded. The same is true, by the Cauchy-Schwarz inequality, if
f and g are both in the space L2 . The operation of convolution behaves
much like multiplication of numbers. It is commutative and associative, and
distributive over addition:
f ∗g =g∗f, f ∗ (g ∗ h) = (f ∗ g) ∗ h , f ∗ (g + h) = f ∗ g + f ∗ h .
Perhaps the most important property of the Fourier transform is that it con-
verts convolution to ordinary pointwise multiplication: f ∗ g(x) = f(x) g(x).
For a proof, we can interchange the order of integration to write
∞ ∞
f ∗ g(x) = e −ixt
f (s)g(t − s) ds dt
−∞ −∞
∞ ∞
= e −ixs
f (s) e−ix(t−s) g(t − s) dt ds = f(x)
g (x) .
−∞ −∞
228 8. Fourier Series
On the other hand, the operation of convolution does not admit a multi-
plicative identity element. In other words, there is no function g ∈ L1 with
the property that f ∗ g = f for every function f ∈ L1 . If there were such a
function, Fourier transformation would give in particular f(x) g (x) = f(x)
√
for the function f (t) = e −t 2 /2
, with transform f (x) = 2π e −x2 /2
= 0. But
that implies g (x) ≡ 1, contradicting the fact that g (x) → 0 as |x| → ∞,
as shown in the preceding theorem. To circumvent the difficulty, it is of-
ten useful to implement
∞ an “approximate identity”, a sequence of functions
gn ∈ L such that −∞ gn (t)dt = 1 and f ∗ gn → f in some sense for every
1
Let I = limt→∞ A(t). Given ε > 0, choose M large enough that |A(t) − I| <
ε/2 for all t > M . Then for every R > M sufficiently large, we have
R M R
1 1 1
A(t) dt − I ≤ |A(t) − I| dt + |A(t) − I| dt < ε ,
R R 0 R M
0
But
δ δ ∞
|gR (s)| ds = R |g(Rs)| ds ≤ |g(s)| ds < ∞ , and
−δ −δ −∞
∞ ∞
|gR (s)| ds = |g(s)| ds → 0 as R → ∞ .
δ δR
230 8. Fourier Series
−δ
Similarly, −∞ |gR (s)| ds → 0 as R → ∞. It follows that |(f ∗gR )(t)−f (t)| ≤
ε for all R sufficiently large, and so (f ∗ gR )(t) → f (t) as R → ∞. If f is
uniformly continuous on the whole real line, the same argument shows that
the convergence is uniform.
where
2
g(s) = (1 − cos s) and gR (s) = R g(Rs) .
s2
Here the interchange of the order of integration is justified by the absolute
integrability of the integrand, and Example 2 in the last section has been
applied to calculate the Fourier transform of the function 1 − |x|
R .
We are now in position to apply Lemma 2. Integration by parts gives
∞ ∞ ∞
1 1
g(s) ds = 2 2
(1 − cos s) ds = 4 (1 − cos s) ds
−∞ −∞ s 0 s2
∞
sin s
=4 ds = 2π .
0 s
8.10. Inversion of Fourier transforms 231
(See Exercise 12 for calculation of the last integral.) Thus Lemma 2, ad-
justed by the factor 2π, shows that
∞
f (s)gR (t − s) ds = (f ∗ gR )(t) → 2πf (t) as R → ∞ ,
−∞
It may be noted that the more general form of Parseval’s relation follows
easily from the particular form just stated, by means of the polarization
identity (cf. Exercise 28).
converges uniformly in each bounded interval. Suppose also that the series
∞
f(2πn)
n=−∞
Proof. Observe first that under the hypothesis of uniform convergence the
sum g(t) is continuous and is periodic with period 1 ; that is, g(t + 1) = g(t)
for all t ∈ R . The Fourier coefficients of g, adapted to the change of scale,
are
1 ∞
1
−2πint
cn = g(t) e dt = f (t + k) e−2πint dt
0 k=−∞ 0
∞
k+1 ∞
= f (t) e−2πint dt = f (t) e−2πint dt = f(2πn) ,
k=−∞ k −∞
converges absolutely and uniformly to a continuous sum h(t) for which h(t +
1) ≡ h(t). Since both h and g are continuous periodic functions with the
same Fourier coefficients, it follows that h(t) ≡ g(t). (See Exercise 21.)
Therefore,
∞
∞
(31) g(t) = f (t + n) = f(2πn) e2πint .
n=−∞ n=−∞
As a corollary, we see that if f(x) = 0 for all x outside the open interval
(−2π, 2π), then
∞ ∞
f (t) dt = f(0) = f (n) .
−∞ n=−∞
and so
∞
∞
√ √
f (2πn) = (1/ t) e−πn /t = (1/ t) ϑ(1/t) .
2
n=−∞ n=−∞
Consequently, the remarkable identity
√
ϑ(1/t) = t ϑ(t) ,
known as Jacobi’s inversion formula, follows from the Poisson summation
formula.
For another application, let
sin πt 2
f (t) = .
πt
According to Example 5, this function has Fourier transform
|x|
1 − 2π for |x| ≤ 2π
f(x) =
0 for |x| > 2π .
In particular, f(2πn) = 0 for all n except n = 0, and f(0) = 1. Hence the
Poisson summation formula in the form (31) evaluates the sum
∞
sin π(n + t) 2
= 1, t ∈ R.
n=−∞
π(n + t)
But sin2 π(n + t) = sin2 πt for all n, so the expansion reduces to
∞
sin πt 2 1
= 1,
π n=−∞
(n + t)2
which implies that
∞
1 π 2
=
n=−∞
(n + t)2 sin πt
if t is not an integer. For t = 12 this says that
∞
1 π2
= ,
(2n + 1)2 8
n=0
which is equivalent to Euler’s sum ∞ 2 2
n=1 1/n = π /6.
The Poisson summation formula can also be applied to prove the sam-
pling theorem, a result of great importance in signal analysis. Here a signal
is understood to be a continuous function f (t) of time t, where −∞ < t < ∞.
It is said to be band-limited if its Fourier transform vanishes outside some
bounded interval. Band-limited signals are analogous to trigonometric poly-
nomials and have similar structural properties.
8.11. Poisson summation formula 235
∞ ∞
1 π sin π(t − n)
= f (n) cos(t − n)x dx = f (n) .
n=−∞
π 0 n=−∞
π(t − n)
236 8. Fourier Series
The article by Higgins [5] may be consulted for further information about
the cardinal series. The book by Folland [3] offers a broad discussion of
Fourier transforms and their applications to problems of physics and engi-
neering.
Siméon-Denis Poisson (1781–1840) was a student at École Polytechnique
in Paris, where Laplace and Lagrange recognized his talent and promoted
his cause. At age 25 he became a professor there, the successor to Fourier.
He made important contributions to mathematical physics (electricity and
magnetism), celestial mechanics, and probability theory (the Poisson dis-
tribution). The Poisson summation formula was known to Poisson, among
others, as early as 1823.
Exercises
1 1 1 π2 1 1 1 π2
1+ + + + · · · = and 1− + − + · · · = .
22 32 42 6 22 32 42 12
3. Show that
∞
(−1)n
πt csc πt = 1 + 2t2 , t = 0, ±1, ±2, . . . .
t2 − n2
n=1
4. Show that
∞ ∞
2 4 cos 2nx 8 sin2 nx
| sin x| = − = , −∞ < x < ∞ .
π π 4n2 − 1 π 4n2 − 1
n=1 n=1
Hint. 1
2n−1 − 1
2n+1 = 2
4n2 −1
.
Exercises 237
5. (a) If a function f (x) is periodic with period 2π, and if f has a continuous
derivative of order k, show that the Fourier coefficients of f satisfy
for some constant C > 0. Conclude that the Fourier series converges uni-
formly to f (x).
(b) Show that every continuous periodic function can be approximated
uniformly by a continuous piecewise linear function with the same period.
(c) Deduce the trigonometric form of the Weierstrass approximation the-
orem.
7. Prove that π
4
|Dn (t)| dt ∼ log n , n → ∞,
−π π
where Dn (t) is the Dirichlet kernel.
of Example 3 in Section 8.5 have the property sn (x) > 0 for 0 < x < π.
Hint. Proceed by induction. Assuming that sn−1 (x) > 0 for 0 < x < π,
investigate the sum sn (x) at a local minimum.
(b) Show that |sn (x)| ≤ 1 + π/2 for all x ∈ R.
Suggestion. Use the Fejér means σn (x) as indicated at the end of Section
8.7.
238 8. Fourier Series
n
sin 2(n + 1)x
cos(2k + 1)x = , n = 1, 2, . . . .
2 sin x
k=0
(b) For the Fourier series of the square wave function of Example 2,
show that the partial sums
4 1
n
s2n+1 (x) = sin(2k + 1)x
π 2k + 1
k=0
kπ
x= , k = 1, 2, . . . , 2n + 1 ,
2(n + 1)
in the interval (0, π), which are alternately local maxima and local minima.
(c) By a direct method similar to that of Section 8.6, show that
π 2 π sin x
lim s2n+1 = dx = 1.1789 . . . ,
n→∞ 2(n + 1) π 0 x
11. For the Fourier series of the discontinuous sawtooth function of Exam-
ple 3, show that the partial sums exhibit an “undershoot” as well as the
“overshoot” of the Gibbs phenomenon. Specifically, for the first positive
local minimum λn = 2π/n, show that
2π
sin x π
lim sn (λn ) = dx = 1.4181 . . . = (0.9028 . . . ) .
n→∞ 0 x 2
Writing
π π
1 1 π sin(n + 12 )x
1 − x
1
Dn (x) dx = sin(n + 2 )x dx + dx ,
0 0 2 sin 2 x 0 x
13. If f (x) is a continuous function of period 2π, except for a jump discon-
tinuity at a point x0 where it has one-sided limits f (x0 +) and f (x0 −), show
that its Fejér means have the property
lim σn (x) = 12 [f (x0 +) + f (x0 −)] .
n→∞
and apply Abel summation to the second sum to conclude that |sn (x)| ≤
(π + 2)M .
16. (a) If an+1 ≤ an and an → 0 as n → ∞, show that the series
∞
1
2 a0 + an cos nx
n=1
17. (a) By expanding the given function into Fourier series, show that
∞ 1
1 (π − 1)x , 0 ≤ x ≤ 1
sin n sin nx = 21
2 (π − x) , 1 ≤ x ≤ π.
n2
n=1
Conclude that
∞
∞
sin n sin n 2 π−1
= = .
n n 2
n=1 n=1
1 − r2
P (r, θ) = ,
1 − 2r cos θ + r2
verify the infinite series expansion
∞
P (r, θ) = 1 + 2 rn cos nθ , 0 ≤ r < 1.
n=1
Note that this formula can be viewed either as a Fourier expansion of P (r, θ)
for fixed r or as a Taylor series expansion for fixed θ. Suggestion: Show that
1 + reiθ
P (r, θ) = Re .
1 − reiθ
(c) Use integral expressions for the Fourier coefficients an and bn to es-
tablish the Poisson formula
2π
1
u(r, θ) = P (r, θ − t)f (t) dt , 0 ≤ r < 1.
2π 0
19. (a) If ϕ(x) is a function of bounded variation over the interval [−π, π],
integrate by parts to show that its Fourier coefficients
1 π 1 π
an = cos nx ϕ(x) dx and bn = sin nx ϕ(x) dx
π −π π −π
20. (a) Suppose that a function f (x) is continuous and periodic with period
2π on the real line, and its Fourier coefficients satisfy an = O(1/n) and
bn = O(1/n) as n → ∞. Invoke Hardy’s Tauberian theorem to show that
the Fourier series converges to f (x) uniformly on R.
(b) Conclude that if f (x) is a continuous function of bounded variation
over [−π, π] with f (−π) = f (π), then the Fourier series converges to f (x)
uniformly in [−π, π].
21. Let f (x) be a continuous function with period 2π, and suppose that all
of its Fourier coefficients vanish:
π π π
f (x) dx = 0 and f (x) cos nx dx = f (x) sin nx dx = 0
−π −π −π
23. Suppose that a function f (x) has a continuous derivative on the interval
2π
[0, 2π] and that f (0) = f (2π) and 0 f (x)dx = 0. Use Parseval’s relation
to show that
2π 2π
[f (x)]2 dx ≤ [f (x)]2 dx ,
0 0
with strict inequality unless f (x) = A cos x + B sin x for some constants A
and B. The result is known as Wirtinger’s inequality.
(a) Apply Green’s theorem to show that the area of the region enclosed
by C is
∞
A= x dy = π n(an dn − bn cn ) .
C n=1
∞
π n2 (a2n + b2n + c2n + d2n ) = 2π .
n=1
(c) Combine the results of parts (a) and (b) to deduce that
∞
∞
2(π − A) = π (n2 − n)(a2n + b2n + c2n + d2n ) + π n[(an − dn )2 + (bn + cn )2 ] .
n=1 n=1
26. Refer to Example 3 in Section 8.9 for the Fourier transform of f (t) =
e−a|t| , where a > 0.
(a) Calculate the Fourier transform of f (t).
(b) Use the Parseval relation to calculate the integral
∞
x2 π
2 2 2 2
dx = , a > 0, b > 0.
−∞ (x + a )(x + b ) a+b
27. Calculate the Fourier transform of the function f (t) = sint t , and explain
your reasoning. Note that f is Riemann integrable but |f | is not. In other
words, f ∈
/ L1 .
244 8. Fourier Series
31. Derive the sampling theorem from the Poisson summation formula.
32. Generalize the sampling theorem by showing that if f(x) = 0 for |x| ≥ a,
then
∞
sin(at − nπ)
f (t) = f (nπ/a) .
n=−∞
at − nπ
References
[1] Robert Baillie, Solution to Advanced Problem 6241 (also proposed by R. Baillie),
Amer. Math. Monthly 87 (1980), 496–497.
[2] R. P. Boas, Jr., “Summation formulas and band-limited signals”, Tôhoku Math.
J. 24 (1972), 121–125.
[3] Gerald B. Folland, Fourier Analysis and Its Applications, American Mathemat-
ical Society, Providence, Rhode Island, 1992.
[4] Edwin Hewitt and Robert E. Hewitt, “The Gibbs–Wilbraham phenomenon:
An episode in Fourier analysis”, Archive for History of Exact Sciences 21 (1979),
129–160.
[5] J. R. Higgins, “Five short stories about the cardinal series”, Bull. Amer. Math.
Soc. 12 (1985), 45–89.
References 245
[6] A. Hurwitz, “Sur le problème des isopérimètres”, Comptes Rendus Acad. Sci.
Paris 132 (1901), 401–403.
[7] Dunham Jackson, Fourier Series and Orthogonal Polynomials, Mathematical
Association of America, Washington, D.C., Carus Monograph No. 6, 1941; reprinted
by Dover Publications, Mineola, NY, 2004.
[8 ] T.W. Körner, Fourier Analysis, Cambridge University Press, Cambridge, U.K.,
1988.
[9] W. Rogosinski, Fourier Series, Chelsea Publishing Co., New York, 1950. [Eng-
lish translation of Fouriersche Reihen, Sammlung Göschen, W. de Gruyter, Berlin,
1930.]
[10] David V. Widder, Advanced Calculus, Second edition, Prentice–Hall, Engle-
wood Cliffs, N.J., 1961; reprinted by Dover Publications, Mineola, NY, 1989.
[11] A. Zygmund, Trigonometric Series, Second edition, Cambridge University
Press, Cambridge, U.K., 1959.
Chapter 9
The Gamma Function
247
248 9. The Gamma Function
x
R
2R
x
1 2 3
y
2
x
1 2 3
for n = 1, 2, . . . , or
Γ(x + n)
Γ(x) = , x > 0.
x(x + 1) · · · (x + n − 1)
The last formula allows the definition of Γ(x) to be extended to the interval
x > −n, provided that x is not a negative integer or 0. Since n is an
arbitrary positive integer, this extends the definition of Γ(x) to the entire
real line, excluding the singular points x = 0, −1, −2, . . . .
Γ(x)Γ(y)
(5) B(x, y) = .
Γ(x + y)
which results from the substitution t = cos2 θ. The argument is very similar
to the calculation of the probability integral (1). The proof begins with a
comparison of integrals of the positive function
over the same three regions as before, as displayed in Figure 1. Here, how-
ever, x and y are fixed positive parameters, whereas u and v are the variables
of integration. After passing to polar coordinates with the substitutions
u = r cos θ and v = r sin θ, we arrive at the formulas
π/2 R
e−r r2x+2y−1 dr
2x−1 2y−1 2
(cos θ) (sin θ) dθ
0 0
R R
e−u u2x−1 du e−v v 2y−1 dv
2 2
≤
0 0
√
π/2 2R
e−r r2x+2y−1 dr .
2
≤ (cos θ)2x−1 (sin θ)2y−1 dθ
0 0
Many proofs have been found. We will give a particularly elementary proof
that uses the basic connection (5) between the beta and gamma functions,
plus the special formula
To prove (8), observe first that because of the symmetry of the integrand
in (4) when x = y, we can write
1/2
x−1
B(x, x) = 2 t(1 − t) dt , x > 0.
0
The existence of the limit is not obvious and is part of the assertion. The
result is called the Gauss product formula after Carl Friedrich Gauss (1777–
1855), although it was known to Euler. In fact, this was Euler’s original
construction in 1729, but Gauss rediscovered the formula and recognized its
importance.
One approach to (13) is to observe that
1
Γ(x)Γ(n + 1)
tx−1 (1 − t)n dt = B(x, n + 1) =
0 Γ(x + n + 1)
n! Γ(x) n!
= = .
(x + n)(x + n − 1) · · · xΓ(x) x(x + 1) . . . (x + n)
The substitution of t/n for t then leads to the representation
n ∞
n! nx t n
= t x−1
1− dt = gn (t)tx−1 dt ,
x(x + 1) · · · (x + n) 0 n 0
where n
1 − nt , for 0 ≤ t ≤ n
gn (t) =
0, for n < t < ∞ .
−t
Because gn (t) increases to the limit e as n → ∞, Dini’s theorem (see
Section 1.8) ensures that the integrals converge and
∞
n! nx
lim = e−t tx−1 dt = Γ(x) .
n→∞ x(x + 1) · · · (x + n) 0
This concludes the proof of the Gauss product formula.
It is now a short step to the infinite product representation of the gamma
function.
256 9. The Gamma Function
∞
1 γx
x −x/n
= xe 1+ e
Γ(x) n
n=1
is Euler’s constant.
x −x/n x x x2
1+ e = 1+ 1 − + 2 + ···
n n n 2n
x2 1
=1− 2 +O
2n n3
and the series 1/n2 converges. In fact, the product can be shown to
converge uniformly on each bounded subset of R, so that it represents a
continuous function on R. Note that the product is equal to zero at the
points x = 0, −1, −2, . . . where the gamma function is infinite.
1 x(x + 1) · · · (x + n)
= lim
Γ(x) n→∞ n! nx
x x x −x log n
= lim x 1 + 1+ ··· 1 + e
n→∞ 1 2 n
n
x −x/k
= lim xex(1+ 2 +···+ n −log n)
1 1
1+ e
n→∞ k
k=1
∞
x
= xeγx 1+ e−x/k .
k
k=1
The proof shows that the Gauss product formula (13) is essentially the
same as the infinite product representation of 1/Γ(x).
9.8. Bohr–Mollerup theorem 257
Γ(n + a) √
lim = 2π ,
n+a− 21
n→∞ n e−n
or
Γ(n + a) √
lim = 2π , 0 < a ≤ 1,
n+a− 21
n→∞ (n + a) e−(n+a)
n
since 1 + na → ea uniformly in the interval 0 < a ≤ 1. This proves
Stirling’s formula for the gamma function.
Geometrically, this says that the graph of f lies beneath every chord.
We begin by proving that log Γ(x) is convex, a property evident from its
graph in Figure 3. Suppose 0 < a < b and 0 < r < 1. Then by definition
∞
Γ(ra + (1 − r)b) = e−t tra+(1−r)b−1 dt
∞
0
−t a−1 r −t b−1 1−r
= e t e t dt
0
∞ r ∞ 1−r
−t a−1 −t b−1
≤ e t dt e t dt = Γ(a)r Γ(b)1−r ,
0 0
where Hölder’s inequality has been applied with the conjugate indices p =
1/r and q = 1/(1 − r). Taking logarithms, we conclude that
which shows that log Γ(x) is a convex function. (See also Exercise 26.)
Harald Bohr and Johannes Mollerup [3] discovered that the gamma func-
tion is actually characterized by its logarithmic convexity. Emil Artin [2]
gave an elegant presentation of their argument and clarified the role of log-
arithmic convexity.
Proof. The hypotheses G(x + 1) = xG(x) and G(1) = 1 imply that G(n +
1) = n! for n = 1, 2, . . . . For any positive integer n and for 0 < x ≤ 1, we
express
n + x = (1 − x)n + x(n + 1)
as a convex combination of n and n + 1. Then by the convexity hypothesis,
or
G(n + x) ≤ G(n)1−x G(n + 1)x = n! nx−1 .
In a similar way, the convex combination
n + 1 = x(n + x) + (1 − x)(n + x + 1)
mx−1/2 x x + 1
x+m−1
G(x) = Γ Γ ···Γ
(2π)(m−1)/2 m m m
is equal to Γ(x). In view of the Bohr–Mollerup theorem, it will suffice to
show that log G(x) is convex, G(x + 1) = xG(x), and G(1) = 1. The
logarithmic convexity follows at once from that of the gamma function. To
see that G(x+1) = xG(x), it is convenient to write G(x) = αm gm (x), where
m−1/2
αm =
(2π)(m−1)/2
260 9. The Gamma Function
and x x + 1
x+m−1
gm (x) = mx Γ Γ ···Γ .
m m m
Now observe that
x+m x
gm (x + 1) = m Γ Γ gm (x) = xgm (x) ,
m m
x x
since Γ x+m
m = Γ m + 1 = m
x
Γ m . Thus G(x + 1) = xG(x).
For this purpose we apply the Gauss product formula (13) to write
k n! nk/m mn+1
Γ = lim , k = 1, 2, . . . , m .
m n→∞ k(k + m)(k + 2m) · · · (k + nm)
But
(m + nm)! 1 2 m
= 1+ 1+ ··· 1 + →1
(nm)!(nm)m nm nm nm
as n → ∞, and so the expression (19) reduces to
m
1 2 (n!)m mmn
Γ Γ ···Γ = lim .
m m m n→∞ (nm)! n(m−1)/2
√
Now Stirling’s formula n! ∼ nn e−n 2πn can be applied to give
so that (1 + s2 )F (s) = 1.
Next make a change of variables to obtain
∞ ∞
Γ(a) = ta−1 e−t dt = xa ta−1 e−xt dt , x > 0, a > 0.
0 0
Take ε > 0 and use this form of the gamma function to write
∞ ∞ ∞
1
e−εx x−a sin x dx = e−εx sin x ta−1 e−xt dt dx
0 Γ(a)
0 ∞ ∞ 0
1
= ta−1 e−(ε+t)x sin x dx dt
Γ(a) 0 0
∞
1 1
= ta−1 dt ,
Γ(a) 0 1 + (ε + t)2
where the formula (21) has been invoked. To justify the interchange in order
of integration, observe that the exponential factor e−εx makes the integrand
absolutely integrable over the first quadrant of the (x, t) plane.
Now appeal to the integral analogue of Abel’s theorem (Section 3.2) to
conclude that
∞ ∞
−a
x sin x dx = lim e−εx x−a sin x dx
0 ε→0 0
∞ a−1
1 t
= dt .
Γ(a) 0 1 + t2
262 9. The Gamma Function
But the last integral is a beta function in disguise. Let u = t2 and refer to
Exercise 13 to write
∞ a−1
1 ∞ u 2 −1
a
t 1
2
dt = du = B( a2 , 1 − a2 ) .
0 1+t 2 0 1+u 2
B( a2 , 1 − a2 ) = Γ( a2 ) Γ(1 − a2 ) = π csc(πa/2) ,
More generally,
∞
xa−1 sin bx dx = b−a Γ(a) sin(πa/2) , 0 < a < 1, b > 0.
0
Exercises
3. (a) Prove that the function Γ(x) has derivatives of all orders at every
point x > 0, and that its nth derivative is given by
∞
Γ(n) (x) = e−t tx−1 (log t)n dt , n = 1, 2, . . . .
0
(b) Show that Γ (x) > 0 for x > 0, so that the curve y = Γ(x) is convex.
(c) Show that Γ(x) attains a minimum value for x > 0 at a point x0 in
the interval 1 < x0 < 2. Show further that Γ(x) is decreasing in the interval
(0, x0 ) and increasing in (x0 , ∞).
4. Show that
√
(2n)! π
Γ n + 12 = , n = 1, 2, . . . .
4n n!
Exercises 263
12. Find the area of the region bounded by the hypocycloid x2/3 + y 2/3 = 1.
u
13. Use the substitution t = 1+uto derive the formula
∞
ux−1
B(x, y) = du .
0 (1 + u)x+y
14. Apply the formula of the preceding exercise to calculate the integral
∞
x3 1
7
dx = .
0 (1 + x) 60
15. Calculate the integral
∞ −a
x π
dx = , 0 < a < 1.
0 1+x sin πa
More generally, calculate
∞ m−1
x π
n
dx = , 0 < m < n,
0 1+x n sin(mπ/n)
a result known to Euler. (Here m and n need not be integers.)
16. Compare the infinite product representation of Γ(x) with that of the
sine function to obtain another proof of the Euler reflection formula.
17. Prove that Γ (1) = −γ, where γ is Euler’s constant. Conclude that
∞
γ=− e−t log t dt .
0
Hint. Take logarithmic derivatives in the infinite product formula for Γ(x).
Justify the term-by-term differentiation.
18. Calculate the integral
∞ ∞
log x log n 1
x
dx = −γ log 2 + (−1)n = − (log 2)2 .
0 1+e n 2
n=1
Hint. Expand 1/(1 + ex ) = e−x /(1 + e−x ) into geometric series and integrate
term by term (justify), then refer to Exercise 13 of Chapter 2 for evaluation
of the infinite series.
Exercises 265
which is assumed to converge for all real numbers s larger than some number
s0 . For each exponent a > −1, show that the Laplace transform of f (t) = ta
is
Γ(a + 1)
F (s) = , s > 0.
sa+1
Show that L(f ∗ g) = L(f )L(g). In other words, show that the Laplace
transform of a convolution is the product of transforms.
21. If f (t) = tx−1 and g(t) = ty−1 for some x > 0 and y > 0, show that the
convolution h = f ∗ g has the form h(t) = tx+y−1 B(x, y). By taking Laplace
transforms, conclude that
Γ(x) Γ(y)
B(x, y) = .
Γ(x + y)
22. Use Stirling’s formula for the gamma function to show that
Γ(n + c − a) Γ(n + c − b)
lim =1
n→∞ Γ(n + c) Γ(n + c − a − b)
27. Apply Stirling’s formula to verify the Gauss product formula (13) in
the special case where x is a positive integer.
References
In this chapter we discuss two unrelated topics that lie at the intersection
of analysis with number theory. First we consider the notion of equidistri-
bution, a classical topic that has found applications beyond number theory
to such areas as probability, functional analysis, and topological algebra.
(The German and French terms are Gleichverteilung and équirépartition,
respectively. Another English term is uniform distribution.) Our discussion
will focus on a beautiful criterion for equidistribution (modulo 1) due to
Hermann Weyl.
Our second topic is the Riemann zeta function, which has intimate con-
nections with the prime numbers. We develop Euler’s product formula for
the zeta function, then give two relatively elementary derivations of the
functional equation, which involves the gamma function. In truth a full
appreciation of the functional equation requires some familiarity with com-
plex analysis, which guarantees the uniqueness of an analytic continuation.
Otherwise, however, the derivations are entirely self-contained.
10.1. Equidistributed sequences
Loosely speaking, a sequence of points is said to be equidistributed over
a given set if each subset receives its proper share of points. To make the
notion precise, let us consider the special case of a numerical sequence {αn }
in the interval [0, 1). The reason for choosing a half-open interval will become
apparent later. For any subinterval I ⊂ [0, 1), let ν(n, I) denote the number
269
270 10. Two Topics in Number Theory
ν(n, I)
lim = |I|
n→∞ n
for every interval I ⊂ [0, 1), where |I| denotes the length of I. The existence
of each such limit is part of the requirement.
For example, it is intuitively clear that the sequence
0 0 1 0 1 2 0 1 2 3
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, . . .
is equidistributed over [0, 1), although the proof requires a bit of effort. At
the other extreme, a convergent sequence can never be equidistributed. In
fact, it is easy to see that an equidistributed sequence must be everywhere
dense; that is, it must have a subsequence that converges to each point of the
interval [0, 1]. If not, then some interval I ⊂ [0, 1) of positive length is free
of all but a finite number of points αn in the sequence, and so ν(n, I)/n → 0
as n → ∞. Since |I| = 0, this violates the requirement for equidistribution.
In general it is not easy to determine whether a given sequence is equidis-
tributed. Our main purpose is to develop an effective criterion. Actually,
our discussion will focus on a modular notion of equidistribution. Two num-
bers x and y are said to be congruent modulo m, where m > 0 is a prescribed
number, if their difference x − y is an integer multiple of m. Such a con-
gruence is indicated by writing x ≡ y(mod m). The integer part of any real
number x is defined to be the greatest integer less than or equal to x, and is
denoted by [x]. The number x = x − [x] is called the fractional part of x.
In particular, 0 ≤ x < 1 and x ≡ x(mod 1) for every x ∈ R. Hence the
fractional part of a number x may be viewed as the unique representative
in the interval [0, 1) of the equivalence class determined by x.
An arbitrary numerical sequence {αn } is said to be equidistributed mod-
ulo 1 if its sequence {αn } of fractional parts is uniformly distributed in
[0, 1). If ξ is a fixed rational number, the sequence defined by αn = nξ
has periodic fractional parts and therefore is certainly not equidistributed
modulo 1. On the other hand, it is a remarkable fact that for any irrational
number ξ, the sequence {ξ, 2ξ, 3ξ, . . . } is equidistributed modulo 1. This
generalizes a classical result of Kronecker, who showed that the fractional
parts of each such sequence are everywhere dense in [0, 1]. Many elementary
proofs are now known. We will obtain a proof later as a corollary of Weyl’s
criterion.
Meanwhile, however, we digress to give a direct proof of Kronecker’s
theorem. Since ξ is irrational, the fractional parts nξ are all distinct. In
10.2. Weyl’s criterion 271
for every interval I ⊂ [0, 1). If the sequence {αn } is equidistributed modulo
1, the relation (1) can be generalized by linearity to periodic extensions of
arbitrary step functions on [0, 1). Here a step function is understood to be
a finite linear combination of characteristic functions of intervals in [0, 1).
This in turn implies the validity of (1) for each bounded Riemann integrable
function f that is periodic with period 1. To see this, recall that by the
definition of Riemann integrability, any such function f can be approximated
below and above by periodic extensions g and h of step functions on [0, 1).
More precisely, for each prescribed ε > 0, there are periodic extensions g
and h of step functions on [0, 1) for which g(x) ≤ f (x) ≤ h(x) and 0 ≤
272 10. Two Topics in Number Theory
Since the equation (1) holds for both g and h, we see that
1 1
1
n
lim g(αk ) = g(x) dx > f (x) dx − ε ,
n→∞ n 0 0
k=1
so that 1
1 1
n n
f (αk ) ≥ g(αk ) > f (x) dx − ε
n n 0
k=1 k=1
for all n sufficiently large. But ε was chosen arbitrarily, so the last two
inequalities combine to show that (1) holds for f as well. In summary, we
have shown that whenever a sequence {αn } is equidistributed modulo 1, it
has the property (1) for every periodic function f that is Riemann integrable
over [0, 1). In particular, the limit in (1) exists.
The most familiar examples of nonconstant Riemann integrable func-
tions of period 1 are sin 2πmx and cos 2πmx for integers m > 0. Apply-
ing the formula (1) to these functions, and using Euler’s formula eiθ =
cos θ + i sin θ to simplify the writing, we conclude that if a sequence {αn } is
equidistributed modulo 1, then
1 2πimαk
n
lim e = 0, m = 1, 2, . . . .
n→∞ n
k=1
1 2πimαk
n
(2) lim e =0
n→∞ n
k=1
Proof. We have already seen that (2) holds whenever the sequence {αk } is
equidistributed modulo 1. For the converse, suppose the sequence {αk } has
the property (2). It then follows by linearity that the relation (1) holds for
every trigonometric polynomial
N
f (x) = a0 + (am cos 2πmx + bm sin 2πmx) ,
m=1
since it holds trivially for constant functions. But according to the trigono-
metric version of the Weierstrass approximation theorem, every continuous
function that is periodic with period 1 can be approximated uniformly by
such polynomials. In fact, given any ε > 0, we can approximate such a
function f above and below by trigonometric polynomials t and u so that
t(x) ≤ f (x) ≤ u(x) , f (x) < t(x) + ε , and f (x) > u(x) − ε .
Similarly,
1 1
1 1
n n
lim inf f (αk ) ≥ lim t(αk ) = t(x) dx > f (x) dx − ε .
n→∞ n n→∞ n 0 0
k=1 k=1
Since ε was chosen arbitrarily, it follows that the limit exists and
1
1
n
lim f (αk ) = f (x) dx .
n→∞ n 0
k=1
Thus the relation (1) holds for every continuous function f that is periodic
with period 1.
Finally, for any given interval I ⊂ [0, 1), let f be the periodic extension
of the characteristic function of I. It is to be shown that (1) holds for
each such function f . But f can be approximated in integral norm above
and below by continuous periodic functions, and since we have shown that
such functions satisfy (1), an argument similar to the above leads to the
conclusion that f satisfies (1) as well. This completes the proof that the
sequence {αk } is equidistributed modulo 1.
(2) of Weyl’s theorem is satisfied for αk = kξ. But for αk = kξ the sum is a
geometric series, which can be evaluated as
n
e2πim(n+1)ξ − e2πimξ
e2πimkξ = .
e2πimξ − 1
k=1
since the sums telescope. Dividing the last inequality by n and using the
hypothesis that n Δαn → ∞, we conclude that (2) holds. To see that
(αn+1 − α1 ) /n → 0, observe that those expressions are arithmetic means
of the sequence {Δαn }, which tends to 0. Because (2) holds for each m =
1, 2, . . . , Weyl’s theorem shows that the sequence {αn } is equidistributed
modulo 1.
It is now a short step to a special case which Pólya and Szegő [7] attribute
to Fejér.
Proof. By the mean value theorem, Δαn = g (xn ) for some point xn ∈
(n, n+1), and so the sequence {αn } satisfies the hypotheses of the preceding
theorem.
√
Fejér’s theorem shows for instance that the sequences { n log n},
{(log n)2 }, and {n/ log n} are equidistributed modulo 1. The discrete form of
276 10. Two Topics in Number Theory
the theorem is more flexible. For example, it shows that the partial sums of
√
the divergent series ∞n=1 1/ n constitute a sequence that is equidistributed
modulo 1.
On the other hand, the sequence {log n} is not equidistributed modulo
1, as can be seen from the following partial converse of Fejér’s theorem.
1 2πiαk
n
σn = e →0 as n → ∞ .
n
k=1
Euler showed that ζ(2) = π 2 /6 and he found a general expression for ζ(2m),
where 2m is any even integer, in terms of the Bernoulli numbers. (Details
will be given in Chapter 11.) Some closely related series are
∞
∞ ∞
1 1 1 1
= − = 1 − ζ(x)
(2k + 1)x nx (2k)x 2x
k=0 n=1 k=1
and
∞
∞
∞
(−1)n+1 1 1 1
= −2 = 1 − x−1 ζ(x).
nx nx (2k)x 2
n=1 n=1 k=1
The zeta function has intrinsic connections with number theory. For
instance, it is not hard to see that
∞
ζ(x)2 = d(n) n−x , x > 1,
n=1
At a more basic level, the zeta function is an essential tool in the study
of prime numbers. An integer k is a divisor of n if n = km for some integer
m. An integer p > 1 is said to be prime if it has no positive divisors except
1 and p. The first few primes are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, . . . .
An integer larger than 1 that is not prime is called composite. It is not
difficult to see that every composite number has a prime divisor.
It was known to the ancient Greeks that there are infinitely many prime
numbers. Here is Euclid’s elegant proof. Suppose, on the contrary, that
there were only a finite number of primes: p1 , p2 , . . . , pn . Then since the
number N = p1 p2 · · · pn + 1 is greater than every prime pk , it is composite
and therefore has a prime divisor. But clearly N is not divisible by any of
the primes pk . This contradiction shows that the number of primes cannot
be finite.
Each positive integer has a unique prime factorization. For instance,
60 = 22 · 3 · 5.
278 10. Two Topics in Number Theory
and the fundamental theorem of arithmetic. For each fixed x > 1 and
integers M > N ≥ 2 we have
1 1 1
∞
1
1 + x + 2x + · · · + M x ≤ = ζ(x) .
p p p nx
p≤N n=1
N
1
1 1 1
≤ 1 + x + 2x + · · · + M x
nx p p p
n=1 p≤N
1
1
≤ −x
≤ .
1−p p
1 − p−x
p≤N
N
1
1
≤
n=1
n
p≤N
1− 1
p
280 10. Two Topics in Number Theory
for some positive constants A and B. Then in 1896, Jacques Hadamard and
Charles de la Vallée Poussin used results of Riemann on the zeta function
to give independent proofs of the asymptotic relation (3), now known as the
prime number theorem.
An equivalent form of the prime number theorem is that pn ∼ n log n
as n → ∞. To see this, note that there are exactly n primes less than or
equal to pn , which says that π(pn ) = n. Since pn → ∞ as n → ∞, the
prime number theorem in the form (3) implies that π(pn ) ∼ pn / log pn , so
that pn ∼ n log pn . In other words, pn /(n log pn ) → 1, and it follows that
But log log pn / log pn → 0, so this shows that log pn ∼ log n, which allows us
to conclude that pn ∼ n log n. A similar argument shows that the asymptotic
relation pn ∼ n log n implies (3). The details are left as an exercise.
Further discussion of prime numbers can be found for instance in the
book by Tenenbaum and Mendès France [8]. The article by Bateman and
Diamond [1] gives a nice historical account of the prime number theorem.
Observe now that each of the terms on the right-hand side is defined and
continuous for all x > 0, except for the singularity of 1/(x − 1) at x = 1.
Consequently, since Γ(x) is defined and Γ(x) > 0 for x > 0, we have extended
the definition of ζ(x) to all x > 0, with a singularity at x = 1. Moreover,
we see that ζ(x) − 1/(x − 1) is continuous in the interval (0, ∞) because
Γ(1) = 1.
This extension of the zeta function into the interval (0, 1) may appear
rather arbitrary, but in the context of functions of a complex variable the
extension is found to be uniquely determined. If z is a complex number, the
formulas
∞ ∞
−z
ζ(z) = n and Γ(z) = e−t tz−1 dt
n=1 0
282 10. Two Topics in Number Theory
define ζ(z) and Γ(z) as analytic functions in the half-planes Re{z} > 1 and
Re{z} > 0, respectively. A generalization of the process just described then
extends ζ(z) to an analytic function in the half-plane Re{z} > 0, except
for a pole at z = 1. But a basic principle of complex analysis says that an
analytic extension can be given in at most one way.
In the language of analytic number theory, the vertical strip 0 < Re{z} <
1 is called the critical strip for the zeta function. The Riemann hypothesis,
perhaps the most famous unsolved problem in mathematics, is the conjecture
that all zeros of ζ(z) in the critical strip are situated on the line Re{z} = 12 ,
known as the critical line. The Riemann hypothesis, if true, would have
important implications in number theory. In particular, it would lead to a
sharper form of the prime number theorem.
For 0 < x < 1 this equation exhibits a certain symmetry of the extended
function ζ(x) about the point x = 12 . For x > 1 it defines a natural ex-
tension of the zeta function to the negative real axis, the restriction of its
analytic continuation to the left half-plane Re{z} < 0. Since the left-hand
side is continuous and positive for x > 1, the functional equation (5) shows
that the same is true for Γ(x/2)ζ(x) when x < 0. But the gamma func-
tion is continuous and nonvanishing on the negative real line except for its
singularities at the negative integers, so the extended zeta function has the
same properties except that the singularities of Γ(x/2) at the negative even
integers must be canceled by zeros of ζ(x). In other words, the functional
equation shows that the zeta function is continuous and nonvanishing on the
negative real line except that ζ(x) = 0 at the points x = −2, −4, −6, . . . .
These points are sometimes called the “trivial zeros” of the zeta function.
It is known that infinitely many zeros lie on the critical line, and that only
the trivial zeros lie outside the critical strip. Computer calculations have
located millions of zeros on the critical line and none elsewhere in the critical
strip, providing overwhelming numerical evidence in favor of the Riemann
hypothesis, but no proof has been found.
We will give two distinct proofs of the functional equation. Both are
elementary, but a full appreciation of either argument requires a little back-
ground in complex analysis; in particular, some familiarity with analytic
functions and the uniqueness of an analytic continuation.
10.5. Functional equation 283
The first proof is due to G. H. Hardy [3, 4]. It begins with the Fourier
series
∞
sin(2n + 1)t π
(6) F (t) = = (−1)m , mπ < t < (m + 1)π ,
2n + 1 4
n=0
derived at the end of the last chapter (Section 9.9). Deferring a justification
of the term-by-term integration, we obtain
∞ ∞
∞
x−1 1
t F (t) dt = tx−1 sin(2n + 1)t dt
0 2n + 1 0
n=0
∞
1
= Γ(x) sin(πx/2)
(2n + 1)x+1
n=0
= Γ(x) sin(πx/2) 1 − 2−x−1 ζ(x + 1) , 0 < x < 1.
On the other hand, taking account of the sum of the Fourier series (6), we
find that
∞ ∞ (m+1)π
x−1 π m
t F (t) dt = (−1) tx−1 dt
0 4 mπ
m=0
∞
π x+1
= 1+ (−1) (m + 1) − m
m x x
.
4x
m=1
The last series converges for 0 < x < 1, by Leibniz’ alternating series theo-
rem, since the sequence {(m + 1)x − mx } decreases to zero as m → ∞. But
for x < 0 the same sequence increases to zero, so the series again converges.
In fact, the convergence is uniform in each interval −∞ < x ≤ b < 1, so the
sum
∞
S(x) = 1 + (−1)m (m + 1)x − mx
m=1
is well defined and continuous in the interval (−∞, 1). Trivially, S(0) = 1.
Observe now that for x < −1 the sum is
S(x) = 2 (1x − 2x + 3x − . . . ) = 2 1 − 2x+1 ζ(−x) .
284 10. Two Topics in Number Theory
By analytic continuation, this relation S(x) = 2 1 − 2x+1 ζ(−x) extends
to all x < 1 and effectively defines ζ(−x) for 0 ≤ x < 1. In particular, for
0 < x < 1 the two calculations of the integral can be equated to obtain
π x+1
1 − 2x+1 ζ(−x) = Γ(x) sin(πx/2) 1 − 2−x−1 ζ(x + 1) ,
2x
which reduces to
2x π x+1 ζ(−x) = Γ(x + 1) sin(πx/2) ζ(x + 1) .
Replacing x by x − 1, we arrive at the functional equation
(8) ζ(1 − x) = 21−x π −x cos(πx/2)Γ(x)ζ(x)
for the zeta function, an equivalent form of (5). The details of this equiva-
lence are left as an exercise.
It remains to justify the interchange of summation and integration:
∞ ∞ ∞
sin(2n + 1)t
(9) tx−1 F (t) dt = tx−1 dt , 0 < x < 1.
0 0 2n + 1
n=0
Observe first that the partial sums of the series
∞
∞ ∞
sin(2n + 1)t sin nt sin 2nt
= −
2n + 1 n 2n
n=0 n=1 n=1
are uniformly bounded. (See Chapter 8, Exercise 15.) On the other hand, it
can be shown by the technique of Abel summation (cf. Chapter 3, Exercise
12) that this series converges uniformly in each closed set that contains no
multiple of π (cf. Chapter 8, Exercise 16). These two facts allow us to
conclude that
R ∞ R
x−1 sin(2n + 1)t
t F (t) dt = tx−1 dt , 0 < x < 1,
0 0 2n + 1
n=0
for arbitrary R < ∞. Hence the proof of (9) reduces to showing that
∞
∞
1
(10) lim tx−1 sin(2n + 1)t dt = 0 .
R→∞ 2n + 1 R
n=0
But an integration by parts gives
∞ x−1
R cos(2n + 1)R
t x−1
sin(2n + 1)t dt =
2n + 1
R
∞
x−1
+ t x−2
cos(2n + 1)t dt
2n + 1 R
∞
Rx−1 1−x 2 Rx−1
≤ + tx−2 dt = ,
2n + 1 2n + 1 R 2n + 1
and (10) follows. This completes the proof of (9) and justifies the term-by-
term integration used to derive the functional equation for the zeta function.
10.5. Functional equation 285
where
∞
e−n
2 πt
(12) ψ(t) = .
n=1
+ tx/2−1 ψ(t) dt
1
∞ ∞
1
= + s−x/2−1/2 ψ(s) ds + tx/2−1 ψ(t) dt ,
x(x − 1) 1 1
But it is easy to see that ψ(t) ≤ Ce−πt for some constant C > 0 and all
t ≥ 1, and so the last integral in (14) converges for every x ∈ R. Thus the
relation (14) defines a natural extension of the zeta function to the whole
real line, except for the singularity at x = 1. A simple calculation reveals
that the right-hand side of (14) is unchanged if x is replaced by 1 − x, and
so the functional equation (5) follows.
The essence of the proof is to derive the functional equation for the zeta
function from Jacobi’s inversion formula for the theta function. On the other
hand, it is possible to reverse the argument and derive Jacobi’s formula
from the functional equation. Thus the two relations can be regarded as
equivalent.
Readers familiar with complex function theory may wish to consult the
book by Titchmarsh [10], where both of the preceding proofs of the func-
tional equation can be found, together with several others, in the more
natural setting of the complex plane. Further details of Hardy’s proof are
given in Titchmarsh [9], Section 4.45.
Bernhard Riemann had an unusually fertile imagination. Although he
published only a few papers, each one introduced new ideas of fundamental
importance. Born in Hanover (now part of Germany) in 1826, he entered
the University of Göttingen at age 19 but soon transferred to Berlin, where
he studied under Dirichlet and Jacobi. Returning to Göttingen in 1849, he
became a student of Gauss and completed an inaugural dissertation on geo-
metric aspects of complex function theory, including the Riemann mapping
theorem and the concept of a Riemann surface. His Habilitationschrift in
1853 focused on functions representable as the sum of a trigonometric se-
ries and led to him to formalize the notion of a Riemann integral. In his
famous Habilitationsvortrag of 1854 he presented the fundamental ideas of
Riemannian geometry. When Gauss died in 1855, Dirichlet was appointed
his successor, then when Dirichlet died four years later the position passed
to Riemann. At that point Riemann produced his celebrated paper on the
zeta function, viewing it not as a function of a real variable as Euler had
done, but of a complex variable. There he derived the functional equation,
stated the Riemann hypothesis, and showed its relevance to the distribution
of prime numbers. After several years of ill health, Riemann succumbed to
tuberculosis in 1866, two months short of his fortieth birthday.
Exercises
1. Show that the sequence
0 0 1 0 1 2 0 1 2 3
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, . . .
4. Show that the sequence {cos n} is everywhere dense in the interval [−1, 1].
In particular,
8. Show that the sequence {(log n)p } is not equidistributed modulo 1 when
p ≤ 1.
References
[1] P. T. Bateman and H. G. Diamond, “A hundred years of prime numbers”, Amer.
Math. Monthly 103 (1996), 729–741.
[2] H. Davenport, The Higher Arithmetic: An Introduction to the Theory of Num-
bers, Eighth edition, Cambridge University Press, Cambridge, U.K., 2008.
[3] G. H. Hardy, “A new proof of the functional equation for the zeta-function”,
Matematisk Tidsskrift B, 1922, 71–73.
[4] G. H. Hardy, “On the integration of Fourier series”, Messenger of Math. 51
(1922), 186–192.
[5] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers,
Third edition, Oxford University Press, Oxford, 1954.
[6] L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, New York,
Wiley, 1974; reprinted by Dover Publications, Mineola, NY, 2006.
[7] G. Pólya and G. Szegő, Aufgaben und Lehrsätze aus der Analysis, Band 1.
Vierte Auflage, Springer-Verlag, Heidelberg, 1970; English edition: Problems and
Theorems in Analysis, Volume 1, Springer-Verlag, New York, 1972.
[8] G. Tenenbaum and M. Mendès France, Les Nombres Premiers, Presses Univer-
sitaires de France, Paris, 1997; English translation: The Prime Numbers and Their
Distribution, American Mathematical Society, Providence, RI, 2000.
[9] E. C. Titchmarsh, The Theory of Functions, Second edition, Oxford University
Press, London, 1939.
[10] E. C. Titchmarsh, The Theory of the Riemann Zeta-Function, Oxford Univer-
sity Press, London, 1951.
[11] Hermann Weyl, “Über die Gleichverteilung von Zahlen mod. Eins”, Math.
Annalen 77 (1916), 313–352.
Chapter 11
Bernoulli Numbers
291
292 11. Bernoulli Numbers
Bn ∞
x
= 1 − 1
x + xn .
ex − 1 2 n!
n=2
is an even function; it has the property f (−x) = f (x). This implies that
its power series contains only even powers of x. In other words, B3 = B5 =
B7 = · · · = 0.
The Bernoulli numbers of even index are best calculated with the help
of a recurrence relation. Multiplying both sides of the generating relation
(1) by the function
∞
1 n
ex − 1 = x ,
n!
n=1
we find that
∞
∞
∞
Bn 1 n
n
x= x x = c n xn ,
n! n!
n=0 n=1 n=1
where
B0 B1 B2 Bn−1
cn = + + +··· + .
0! n! 1! (n − 1)! 2! (n − 2)! (n − 1)! 1!
n−1
n
(3) Bk = 0 , n = 2, 3, . . . .
k
k=0
1
1
24 B0 + 16 B1 + 14 B2 + B3 = 1
24 − 1
12 + 1
24 + 16 B3 = 0 ,
6
11.1 Calculation of Bernoulli numbers 293
1
5 B0 + B1 + 2B2 + 2B3 + B4 = 0 ,
so that
B4 = − 15 + 1
2 − 1
3 = − 30
1
.
More generally, the equation c2n+1 = 0 takes the form
1
n
1 (2n)!
(4) − + B2k = 0 ,
2n + 1 2 (2n − 2k + 1)! (2k)!
k=1
1
7 − 1
2 + 3B2 + 5B4 + B6 = 0 ,
which gives
B6 = − 17 + 1
2 − 1
2 + 1
6 = 1
42 .
For n = 4 the relation (4) is
1
9 − 1
2 + 4B2 + 14B4 + 28
3 B6 + B8 = 0 ,
and so
B8 = − 19 + 1
2 − 2
3 + 7
15 − 2
9 = − 30
1
.
For convenient reference, here is a summary of results:
B0 = 1 , B1 = − 12 , B2 = 1
6 , B4 = − 30
1
, B6 = 1
42 , B8 = − 30
1
.
B10 = 5
66 , B12 = − 2730
691
, B14 = 7
6 , B16 = − 3617
510 , etc.
n(n + 1)(2n + 1)
12 + 22 + 32 + · · · + n2 = ,
6
2
n(n + 1)
13 + 23 + 33 + · · · + n3 = , ...
2
are also easy to verify by induction, but are perhaps more difficult to dis-
cover. One method of discovery is to guess by analogy with the integral
of xp that the sum 1p + 2p + · · · + np will be a polynomial in n of degree
p + 1 and to determine the coefficients by interpolation of numerical data.
However, this method does not lead to a general formula for the sum. Jacob
Bernoulli found such a formula in terms of Bernoulli numbers.
Bernoulli’s formula is
p+1
1 p+1
n
p
(5) k = Bp−j+1 (n + 1)j , p = 1, 2, . . .
p+1 j
k=1 j=1
where
p+1 (p + 1)!
=
j (p − j + 1)! j!
is a binomial coefficient. If p = 2, for instance, the formula (5) reduces to
n
n(n + 1)(2n + 1)
k 2 = B2 (n + 1) + B1 (n + 1)2 + 13 B0 (n + 1)3 = .
6
k=1
n
n
e(n+1)x − 1
e kx
= (ex )k = , x = 0 .
ex − 1
k=0 k=0
11.3. Euler’s sums 295
Comparing the two expressions, and recalling the generating relation (1),
we see that
n
x
(n+1)x
∞ xp+1
kp = x e −1
p! e −1
p=0 k=0
∞ ∞⎛ ⎞
Bk (n + 1)j ∞
= xk ⎝ xj ⎠ = ap xp+1 ,
k! j!
k=0 j=1 p=0
where
p+1
1
ap = Bp−j+1 (n + 1)j .
(p − j + 1)! j!
j=1
n
p+1
p!
k p = p! ap = Bp−j+1 (n + 1)j , p = 1, 2, . . . .
(p − j + 1)! j!
k=0 j=1
(See Section 3.7.) Euler solved the problem in 1735 with the sensational
discovery that the sum of the series is π 2 /6. Shortly thereafter, Euler also
found that
∞
∞
1 π4 1 π6
(6) = , = ,
n4 90 n6 945
n=1 n=1
B2k2 is a Bernoulli
where number. Observe that for k = 1 the sum (7) reduces
to 1/n = π 2 /6, since B2 = 1/6.
296 11. Bernoulli Numbers
Euler’s formula 1/n2 = π 2 /6 has been verified by many methods,
some quite elementary, but no proof is really simple. Two elementary proofs
were presented in Section 3.7 of this book. Euler based a proof on the infinite
product formula for the sine function (cf. Section 8.5). His result often
appears as a special case of a Fourier expansion, but then it depends on a
theorem ensuring that the Fourier series actually converges to the function
at the point in question. In fact, Fourier series provide a key to the proof of
Euler’s general formula (7), since Fourier expansion of the function cos cx
leads to the formula
∞
2t2
πt cot πt = 1 + ,
t2 − n2
n=1
Now recall Euler’s formula eit = cos t + i sin t and its consequences:
Equating the coefficients of t2k in the two expansions (8) and (10) for
πt cot πt, we arrive at Euler’s formula (7).
A more direct version of the proof can be based instead on the expression
for πt coth πt obtained in Exercise 29 of Chapter 8 as an application of the
11.4. Bernoulli polynomials 297
For t = 0 this reduces to the generating relation (1) for the Bernoulli num-
bers, and so bn (0) = Bn . To see that bn (t) is a polynomial of degree n,
multiply the power series
∞ k
t
etx = 1 + xk
k!
k=1
by that of (1) and compare with (13) to calculate the coefficients b0 (t) ≡ 1
and
bn (t) Bn−k
n
= tk , n = 1, 2, . . . ,
n! (n − k)! k!
k=0
298 11. Bernoulli Numbers
or
n
n
(14) bn (t) = Bn−k tk .
k
k=0
for n = 2, 3, . . . , where the sum extends over all integers k, positive and
negative, except for k = 0. The same formula holds for n = 1 but is restricted
to the open interval 0 < t < 1, since the periodic extension of b1 (t) = t − 12
is discontinuous at the endpoints. Recall that bn (0) = bn (1) = Bn for every
n ≥ 2.
To verify the expansion (17), write
∞
bn (t) = cnk e2πikt , 0 < t < 1,
k=−∞
This completes the inductive argument and shows that the formula holds
for each k = 0 and all n ≥ 1. Hence the polynomial bn (t) has the Fourier
expansion (17).
For n = 2m and t = 0, the expression (17) reduces to
∞
2(2m)! 1
B2m = b2m (0) = − ,
(2πi)2m k 2m
k=1
so that
∞
2m
1 m+1 (2π)
= (−1) B2m .
k 2m 2(2m)!
k=1
Thus we have found another proof of Euler’s formula (7) for the value of the
zeta function at an even integer.
In general, the formula (17) takes the form
∞
2(2m)! cos 2πkt
(18) b2m (t) = (−1) m+1
, 0 ≤ t ≤ 1,
(2π)2m k 2m
k=1
for odd indices n = 2m+1, with the proviso that (19) holds only for 0 < t < 1
when m = 0.
suppose a function f (t) is defined on the whole real line and has derivatives
of all orders. Introduce the Bernoulli polynomial
b1 (t) = t − 1
2 = 12 b2 (t)
Now integrate by parts again, invoking the relations b2 (0) = b2 (1) = B2 and
b2 (t) = 13 b3 (t), to write
1
1 B2 1 1
f (t) dt = f (0) + f (1) + f (0) − f (1) + b3 (t)f (t) dt .
0 2 2! 3! 0
The last integral is now transformed in the same way, making repeated use
of the relations
1
bk (t) = b (t) and bk (0) = bk (1) = Bk ,
k + 1 k+1
and recalling that Bk = 0 for odd indices k ≥ 3. The result is
1 1
1 1 1 1
b3 (t)f (t) dt = b3 (t)f (t) − b (t)f (t) dt
3! 0 3! 0 4! 0 4
1 1 1 1
= − b4 (t)f (t) + b4 (t)f (4) (t) dt
4! 0 4! 0
B4
1 1
= f (0) − f (1) − b5 (t)f (5) (t) dt .
4! 5! 0
so that Pn (t) = bn (t) for 0 ≤ t ≤ 1 and Pn (t+1) = Pn (t) for all t ∈ R. These
functions Pn (t) are called the Bernoulli periodic functions. They retain the
properties Pn (t) = nPn−1 (t) and Pn (j) = Pn (j +1) = Bn , and so the formula
(20) is generalized to
(21)
j+1
1
B2k (2k−1)
f (t) dt = f (j) + f (j + 1) + f (j) − f (2k−1) (j + 1)
j 2 (2k)!
k=1
j+1
1
− P2+1 (t)f (2+1)(t) dt , = 1, 2, . . . .
(2 + 1)! j
Finally, the formulas (21) for intervals [j, j + 1] can be added to obtain
a general formula for any interval [m, n] where m and n are integers with
m < n. It is
n
n−1
f (t) dt = 12 f (m) + f (j) + 12 f (n)
m j=m+1
(22)
B2k (2k−1)
+ f (m) − f (2k−1) (n)
(2k)!
k=1
n
1
− P2+1 (t)f (2+1)(t) dt , = 1, 2, . . . .
(2 + 1)! m
This is the Euler–Maclaurin summation formula. It is valid for any function
f that has continuous derivatives up to order 2 + 1 on the interval [m, n].
Integration by parts gives the alternate form
n n
1 1
(23) P2 (t)f (t) dt = −
(2)
P2+1 (t)f (2+1)(t) dt
(2)! m (2 + 1)! m
for the remainder. It is often more convenient for estimation because the
inequality |Pn (t)| ≤ |Bn | holds for even indices n (cf. Exercise 15).
Apply the Euler–Maclaurin formula (22) with m = 1 and f (t) = 1/t. The
derivatives are
k!
f (k) (t) = (−1)k , k = 1, 2, . . . ,
tk+1
11.6. Applications of Euler–Maclaurin formula 303
∞ √
B2k 1
(25) 1− + P2+1 (t) t−(2+1) dt = log 2π
(2k − 1)(2k) 2 + 1 1
k=1
The infinite series in (27) diverges for every fixed value of n, but any partial
sum approximates the left-hand side as n → ∞ with error tending to zero
at the same rate as the first neglected term.
The formal infinite series in the asymptotic formula (27) is known as
Stirling’s series. The exponential form is
∞
n! B2k 1
√ exp
n −n 2πn (2k − 1)(2k) n2k−1
(28) n e k=1
1 1 139 571
1+ + 2
− 3
− + ... .
12 n 288 n 51840 n 2488320 n4
Exercises
1. Use Bernoulli’s formula (5) to derive the relation
n
3 n(n + 1) 2
k = , n = 1, 2, . . . .
2
k=1
3. Use Euler’s formula to check the sums (6) for ζ(2), ζ(4), and ζ(6), and
to calculate the sum
∞
1 π8
ζ(8) = = .
n8 9450
n=1
9. Deduce the formulas (11) and (12) from Euler’s formula (7).
10. Show that the tangent function has the Taylor series expansion
∞
4n (4n − 1)
tan x = (−1)n+1 B2n x2n−1
(2n)!
n=1
π
= x + 13 x3 + 2 5
15 x + 17 7
315 x + ... , |x| < .
2
Hint. tan x = cot x − 2 cot 2x. For radius of convergence, use Exercise 4.
Exercises 307
13. Apply the Fourier expansion (18) of the Bernoulli polynomial b2n (t)
and Euler’s formula (12) to show that
b2n 12 = (21−2n − 1)B2n , n = 1, 2, . . . ,
a formula that holds trivially for odd indices (cf. Exercise 12). Conclude
that |b2n ( 12 )| < |B2n | for n = 1, 2, . . . .
1
14 . Show that b4n+1 (t) < 0 and b4n+3 (t) > 0 for 0 < t < 2, where
n = 0, 1, 2, . . . .
Suggestion. Proceed by induction and consider bn (t).
15. Prove that |b2n (t)| < |B2n | for 0 < t < 1 , n = 1, 2, . . . .
Suggestion. Use a computer to plot the polynomials and see what needs to
be proved. Mathematica will plot the nth Bernoulli polynomial bn (t) over
the interval 0 ≤ t ≤ 1 from the code Plot[BernoulliB[n,t], {t,0,1}].
16. Derive the formula bn 12 = (21−n − 1)Bn directly from the generating
relation (13), by appeal to the expansion (9) for t coth t.
Hint. coth x − coth 2x = csch 2x .
19. Show that Bernoulli’s formula (5) for the sum of positive powers is a
special case of the Euler–Maclaurin summation formula.
308 11. Bernoulli Numbers
20. Carry out the calculations to derive the numerical formula (28) from
Stirling’s series (27).
References
[1] Konrad Knopp, Theory and Application of Infinite Series, Second English edi-
tion, Blackie & Son, London and Glasgow, 1951; reprinted by Dover Publications,
Mineola, NY, 1990.
[2] H. L. Montgomery and R. C. Vaughn, Multiplicative Number Theory I. Classical
Theory, Cambridge University Press, Cambridge, U.K., 2007.
[3] G. Rza̧dkowski, “A short proof of the explicit formula for Bernoulli numbers”,
Amer. Math. Monthly 111 (2004), 432–434.
[4] V. S. Varadarajan, Euler Through Time: A New Look at Old Themes, American
Mathematical Society, Providence, RI, 2006.
Chapter 12
The Cantor Set
In this chapter we explore some of Georg Cantor’s ideas about abstract sets,
including the notion of cardinality. The centerpiece of our discussion is Can-
tor’s famous “middle-thirds” set, an uncountable set of Lebesgue measure
zero that is a valuable source of counterexamples in analysis. For instance,
the Cantor set is the basis for construction of the “devil’s staircase”, de-
scribed by a continuous nondecreasing function that is not constant, yet
has zero derivative at almost every point. The Cantor set also provides the
underlying concept for construction of a space-filling curve, as presented at
the end of the chapter. We begin, however, with an extended discussion of
cardinal numbers.
309
310 12. The Cantor Set
ideas were revolutionary, and they met with considerable resistance at the
time, notably from Cantor’s former teacher Kronecker, who firmly believed
that mathematical proofs should be constructive. The opposition was so
intense that Cantor was unable to obtain a proper academic position until
1879, when he became a professor at the University of Halle. He remained
in Halle for the rest of his career.
Two abstract sets (not necessarily sets of real numbers) are said to have
the same cardinality, or the same cardinal number, if they are bijectively
equivalent; that is, if they can be put in one-to-one correspondence. Thus
two finite sets have the same cardinality n if both sets contain exactly n
elements. Each set can then be put in one-to-one correspondence with the
set {1, 2, . . . , n} of the first n positive integers. A set is said to be countable
or denumerable if either it is finite or it can be put in one-to-one correspon-
dence with the set N = {1, 2, 3, . . . } of all positive integers. In the latter
case it is called countably infinite. Clearly, any subset of a countable set
is countable. According to Cantor’s notation, a countably infinite set has
cardinality ℵ0 , usually pronounced “aleph-naught” or “aleph-null”. (Aleph
is the first letter of the Hebrew alphabet.) Thus the set Q of all rational
numbers has cardinality ℵ0 . Cantor also proved that the set of all algebraic
numbers is countable; it too has cardinality ℵ0 . Cardinal numbers of infinite
sets are sometimes called transfinite numbers.
Observe that two infinite sets may have the same cardinality even though
one is a proper subset of the other. For instance, the set {2, 4, 6, . . . } of all
even integers is in obvious one-to-one correspondence with the set N through
the mapping n ↔ 2n. Cantor introduced the diagonal method to prove that
the set R of all real numbers is not countable. (See Section 1.2.) In other
words, its cardinality is larger than ℵ0 , and it is in this sense “more infinite”
than its subset Q. The cardinality of R is customarily denoted by c and is
called the cardinality of the continuum.
The notion that c is larger than ℵ0 can be formulated in greater gener-
ality. The cardinal number of an abstract set A is denoted by |A|. If B is
another set and there exists an injective mapping f : A → B, then we say
that |A| ≤ |B|. If in fact there exists a injection of A onto B, or a bijection,
then |A| = |B|. If |A| ≤ |B| but no such bijection exists, then we say that
|A| < |B|. In this sense we know that ℵ0 < c.
It is worth remarking that the existence of an injective mapping f : A →
B is equivalent to the existence of a surjective mapping g : B → A. In other
words, there is a one-to-one mapping of A into B if and only if there is a
mapping of B onto A.
For finite cardinal numbers, or ordinary positive integers n and m, the
two relations n ≤ m and m ≤ n imply n = m. This implication can be shown
12.1. Cardinal numbers 311
The result is very useful because it allows us to conclude that two sets
have the same cardinality without explicitly constructing a bijection. The
theorem is sometimes called the Cantor–Schröder–Bernstein theorem or the
Cantor–Bernstein theorem, or simply Bernstein’s theorem. After Cantor
made the conjecture, Ernst Schröder (1841–1902) and Felix Bernstein (1878–
1956) gave independent proofs in 1896 and 1897, respectively. Bernstein
presented his proof, at age 19, in Cantor’s seminar in Halle.
Note that x ∈
/ E implies x ∈ / E0 , so x ∈ g(B) and g −1 (x) is well defined.
We claim that h is the required bijection.
To show that h is a one-to-one mapping, suppose on the contrary that
f (x1 ) = g −1 (x2 ) for some elements x1 ∈ E and x2 ∈
/ E. But x1 ∈ E implies
x1 ∈ En for some index n, so
The property (i) implies that g(y) = g(f (x)) for all x ∈ E. In particular,
g(y) = g(f (x)) for all x ∈ En , where n = 0, 1, 2, . . . . Therefore, g(y) ∈
/ En+1
312 12. The Cantor Set
Cantor’s Theorem. The power set of every nonempty set A has cardinal-
ity |P(A)| > |A|.
Proof of theorem. First note that |A| ≤ |P(A)|, since we can construct
an injection of A into P(A) by assigning the singleton subset {x} to any
element x ∈ A. To show that the cardinality of P(A) is strictly larger than
that of A, let f : A → P(A) be an arbitrary injection, and define the set
B ∈ P(A) by
B = {x ∈ A : x ∈
/ f (x)} .
Then g(A) has a binary expansion ending is a string of ones precisely when
N \ A is finite. The corresponding binary expansion of the same number
g(A) − 1 that ends in a string of zeros will occur when A is finite, which is
part of the case where N \ A is infinite. These considerations show that g
is an injection of P(N) into the interval [0, 2], so that P(N) has cardinality
|P(N)| ≤ |[0, 2]| = c. Thus |P(N)| = c, by the Schröder–Bernstein theorem.
The proof is left as an exercise. On the basis of the lemma, the measure
of a bounded open set E ⊂ R can be defined by
m(E) = m(Ik ) , where E= Ik
k k
where the supremum is taken over all closed sets F contained in A and
the infimum extends over all bounded open sets E that contain A. If A is
measurable, its Lebesgue measure m(A) is defined as the common value of
the supremum and infimum.
Lebesgue measure has many desirable properties. If A and B are mea-
surable sets with A ⊂ B, it is easily seen that m(A) ≤ m(B). For arbitrary
measurable sets A and B, it is possible to prove that A ∪ B is measurable
and
m(A ∪ B) ≤ m(A) + m(B) ,
with equality if A ∩ B = ∅. More generally, the union of any collection of
measurable sets A1 , A2 , . . . is measurable and
m Ak ≤ m(Ak ) ,
k k
12.3. The Cantor set 315
0 1 2 1 2 7 8 1
9 9 3 3 9 9
The nested sets theorem guarantees that C is not the empty set, but this
is also evident by inspection, since each set Fn contains all of the endpoints
0, 1, 13 , 23 , 19 , 29 , 79 , 89 , 1 2 7 8
27 , 27 , 27 , 27 , ...
But for each ε > 0 we can choose n large enough that 2(2/3)n < ε. This
shows that C has measure zero.
Our next aim is to show that the Cantor set is uncountable. By virtue
of its construction, Fn consists precisely of those points x that have some
ternary expansion
∞
x = 0.a1 a2 a3 · · · = ak 3−k
k=1
with ak = 0 or 2 for k = 1, 2, . . . , n. Thus a point x belongs to the Cantor
set if and only if it has a ternary expansion of the form x = 0.a1 a2 a3 · · ·
with ak = 0 or 2 for all k. The endpoints of removed intervals are triadic
rationals, numbers of the form m/3n . These points have two distinct ternary
expansions, exactly one of which has all digits ak = 0 or 2. For example,
7
9 = 0.21000 · · · = 0.20222 . . . .
This operation defines a mapping x
→ y of the Cantor set onto the interval
[0, 1]. However, the mapping is not injective. For instance,
x= 7
9 = 0.20222 · · · → y = 0.10111 · · · = 3
4 and
x= 8
9 = 0.22000 · · · → y = 0.11000 · · · = 3
4 .
12.4. The Cantor–Scheeffer function 317
In any case, at most two points x ∈ C correspond in this manner to the same
point y ∈ [0, 1]. This shows that a proper subset of C is bijectively equiv-
alent to the interval [0, 1], and so has the cardinality of the continuum. In
particular, the Cantor set is uncountable, a result that can also be obtained
by a diagonal argument similar to the proof that the set of real numbers is
uncountable.
To show that the full Cantor set has the cardinality of the continuum,
we may apply the Schröder–Bernstein theorem. Since C ⊂ [0, 1], there is
an obvious injection of C into [0, 1]. To construct an injective mapping of
[0, 1] into C, represent an arbitrary point y ∈ [0, 1] by its binary expansion
y = 0.b1 b2 b3 . . . , adopting the convention that in cases of ambiguity the
expansion shall end in an infinite string of zeros. Let ak = 2bk and define
∞
x = 0.a1 a2 a3 · · · = ak 3−k
k=1
x = 0.a1 a2 a3 . . .
removed from [0, 1] in the construction of the Cantor set, it is not difficult
to see that f (x1 ) = f (x2 ). The definition of f is then extended to the whole
interval [0, 1] by setting f (x) ≡ f (x1 ) in each removed interval x1 ≤ x ≤ x2 .
The resulting function can then be shown to be continuous and to have the
other stated properties.
However, there is an alternate approach that is more geometric and
perhaps more transparent. We will produce the function f (x) as the uniform
limit of a sequence of continuous nondecreasing functions fn (x) with values
fn (0) = 0 and fn (1) = 1. First define
⎧ 3
⎪
⎨ 2x , 0 ≤ x ≤ 13
3 ≤x≤ 3
1 1 2
f1 (x) =
⎪ 2,
⎩ 3
2x − 2 , 3 ≤ x ≤ 1.
1 2
y
1
1
2
x
1 2 1
3 3
y
1
3
4
1
2
1
4
x
1 2 1 2 7 8 1
9 9 3 3 9 9
y
1
7
8
3
4
5
8
1
2
3
8
1
4
1
8
x
1 2 1 2 7 8 1
9 9 3 3 9 9
Since the Cantor set has measure zero, this shows that f (x) = 0 almost
everywhere on the interval [0, 1].
The curve in the graph of the Cantor–Scheeffer function is sometimes
called the devil’s staircase. The function was popularized by Lebesgue, who
adopted it as an example of a continuous nonconstant monotonic function
that is singular in the sense that f (x) = 0 almost everywhere. The con-
struction can be modified to produce a continuous singular function that
is strictly monotonic; in other words, it is strictly increasing on every open
subinterval of [0, 1].
x = ϕ(t) , y = ψ(t) , a ≤ t ≤ b,
where ϕ and ψ are continuous functions. More precisely, the curve is defined
as an equivalence class of such representations, since the actual choice of pa-
rameter is unimportant. The reason for such a pedantic definition is that the
12.5. Space-filling curves 321
more intuitive notion is not entirely adequate. It would seem more natural
to view a curve as a continuous image of a line segment, a connected path
of points in the plane. However, such an image can have totally unexpected
form. In 1890, Giuseppe Peano (1858–1932) astounded the mathematical
world by constructing a continuous curve that passes through every point
of a square!
Soon after Peano [4] produced his example, David Hilbert and E. H.
Moore found different constructions. Today such curves are known as space-
filling curves or Peano curves. Lebesgue based another construction on the
Cantor set, and I. J. Schoenberg [6] modified Lebesgue’s construction to
produce a very elegant example, which we now proceed to describe.
First define the function
⎧
⎪
⎨ 0, 0≤t≤ 1
3
g(t) = 3t − 1 , 1
≤t≤ 2
⎪
⎩
3 3
1, 2
3 ≤ t ≤ 1.
Let g(−t) = g(t) and g(t + 2) = g(t), so that g is a continuous even function
of period 2. Its graph is shown in Figure 3.
The curve is now defined by the parametric representation
1 1 1
x = ϕ(t) = g(t) + 2 g(32 t) + 3 g(34 t) + . . .
2 2 2
1 1 1
y = ψ(t) = g(3t) + 2 g(3 t) + 3 g(35 t) + . . . ,
3
2 2 2
y
1
t
1 2 3 4 5
g(3n t0 ) = bn+1 , n = 0, 1, 2, . . . .
To see this, observe that 3n t0 differs from ∞ k=1 an+k 3
−k by an even integer,
because all of the coefficients ak are even. Since g(t + 2) = g(t), this shows
that ⎛ ⎞
∞
g(3n t0 ) = g ⎝ an+j 3−j ⎠ = bn+1 , n = 0, 1, 2, . . . .
j=1
Exercises
1. (a) The Cartesian product A × B of two sets A and B is the set of all
ordered pairs (a, b) with a ∈ A and b ∈ B. If A and B are countable sets,
prove that A × B is countable.
(b) Prove that every countable union of countable sets is countable. More
precisely,
∞ if each of the sets E1 , E2 , . . . is countable, then their union E =
E
n=1 n is also countable.
4. Let A be an arbitrary set and let P(A) denote its power set, the set of
all subsets of A. Show that there is a bijection between P(A) and the set of
all mappings f : A → {0, 1}, the set consisting of the two numbers 0 and 1.
5. Prove that every open set E ⊂ R has a unique representation E = In
as a union of a countable collection of open intervals In .
7. For any bounded set E ⊂ R, the derived set E consists of all cluster
points of E. We know that E is always a closed set, although it may be
empty. A bounded set E ⊂ R is said to be perfect if E = E . Thus every
closed interval [a, b] is a perfect set. Prove that the Cantor set C is a perfect
set.
∞
(b) Show that the intersection F = n=1 Fn is an uncountable closed set
of measure zero.
∞(c) Show that F has measure zero if and only if the infinite product
n=1 (2ξn ) diverges to 0.
(d) Exhibit
∞ a sequence {ξk } with 0 < ξk < 12 for which the infinite
product n=1 (2ξn ) converges. Thus obtain a closed set of positive measure
with empty interior.
11. Show that the “devil’s staircase” curve given by the graph of the Cantor–
Scheeffer function, y = f (t) for 0 ≤ t ≤ 1, has arclength 2. Note the failure
of the usual formula
1
1 + f (t)2 dt
0
for arclength. Note also the failure of the fundamental theorem of calculus,
since
1
f (t) dt = f (1) − f (0) .
0
Here both integrals are taken in the Lebesgue sense. Two functions that
differ only on a set of measure zero have equal Lebesgue integrals.
References 325
References
[1] Ralph P. Boas, A Primer of Real Functions, Third edition, Mathematical
Association of America, Washington, DC, 1981.
[2] E. Hille and J. D. Tamarkin, “Remarks on a known example of a monotone
continuous function”, Amer. Math. Monthly 36 (1929), 255–264.
[3] E. Kamke, Theory of Sets, English translation, Dover Publications, New York,
1950.
[4] G. Peano, “Sur une courbe qui remplit toute une aire plane”, Math Annalen 36
(1890), 157–160.
[5] Hans Sagan, “An elementary proof that Schoenberg’s space-filling curve is
nowhere differentiable”, Math. Magazine 65 (1992), 125–128.
[6] I. J. Schoenberg, “On the Peano curve of Lebesgue”, Bull. Amer. Math. Soc. 44
(1938), 519.
Chapter 13
Differential Equations
Is there a function y = ϕ(x) that satisfies the differential equation and the
initial condition, and is there only one such function?
327
328 13. Differential Equations
Proof of theorem. The first step is to observe that the initial-value prob-
lem (1) is equivalent to the integral equation
x
(2) ϕ(x) = y0 + f (t, ϕ(t)) dt .
x0
In other words, any continuously differentiable function ϕ that solves the
initial-value problem (1) must satisfy the integral equation (2), and con-
versely any continuous solution of the integral equation must be continuously
differentiable and satisfy (1). Thus our problem reduces to showing that the
integral equation (2) has a continuous solution, and that this solution is
unique.
Turning first to the question of existence, we will show that the equation
(2) has a continuous solution ϕ(x) in the interval |x − x0 | < δ. The strategy
is to obtain the solution as a uniform limit of continuous functions defined
inductively by ϕ0 (x) ≡ y0 and
x
(3) ϕn+1 (x) = y0 + f (t, ϕn (t)) dt . |x − x0 | < δ ,
x0
for n = 0, 1, 2, . . . . For convenience we will restrict attention to the half-
interval [x0 , x0 + δ). Similar arguments apply to the interval (x0 − δ, x0 ].
The first step is to show that each of the functions ϕn (x) is well defined
and continuous on [x0 , x0 + δ), and that
(4) |ϕn (x) − y0 | ≤ M (x − x0 ) , x0 ≤ x < x0 + δ .
Obviously the constant function ϕ0 (x) has these properties. Proceeding
inductively, suppose that some function ϕn (x) has the stated properties.
Then since δ ≤ b/M it follows that (x, ϕn (x)) ∈ R for all x ∈ [x0 , x0 + δ),
so that ϕn+1 (x) is well defined and continuous on this interval and
x
|ϕn+1 (x) − y0 | ≤ |f (t, ϕn (t))| dt ≤ M (x − x0 ) .
x0
Thus every function ϕn (x) has the stated properties.
The proof of uniform convergence will rely on the inequality
M Cn
(5) |ϕn+1 (x) − ϕn (x)| ≤ (x − x0 )n+1 , x0 ≤ x < x0 + δ ,
(n + 1)!
for n = 0, 1, 2, . . . , where C is the Lipschitz constant of the function f (x, y).
For n = 0 the inequality (5) is the same as (4). To prove it for general n,
we apply the iterative definition (3) to see that
x
|ϕn+2 (x) − ϕn+1 (x)| ≤ |f (t, ϕn+1 (t)) − f (t, ϕn (t))| dt
x0
x
≤C |ϕn+1 (t) − ϕn (t))| dt .
x0
330 13. Differential Equations
which shows that (5) holds for the next integer n + 1 as well. This proves
(5) for all n.
In particular, the inequality (5) gives the uniform estimate
M C n δ n+1
|ϕn+1 (x) − ϕn (x)| ≤ , n = 0, 1, 2, . . .
(n + 1)!
in the interval [x0 , x0 + δ) . From this it follows that {ϕn (x)} is a uniform
Cauchy sequence, hence it converges uniformly on [x0 , x0 + δ) to some con-
tinuous function ϕ(x). Alternatively, the Weierstrass M-test shows that
n
ϕn+1 (x) = y0 + ϕk+1 (x) − ϕk (x)
k=0
Consequently, we can pass to the limit in the iterative relation (3) and con-
clude that ϕ(x) satisfies the integral equation (2). This proves the existence
of a solution of (2), hence of the initial-value problem (1).
Uniqueness is proved in a similar way. If ϕ(x) and ψ(x) are solutions of
(2), then
x
|ϕ(x) − ψ(x)| ≤ |f (t, ϕ(t)) − f (t, ψ(t))| dt
x0
x
≤C |ϕ(t) − ψ(t)| dt , x0 ≤ x < x0 + δ .
x0
Since |ϕ(x) − x0 | < b and |ψ(x) − x0 | < b, so that |ϕ(x) − ψ(x)| < 2b, it
follows that
Iteration gives
x
2b C 2
|ϕ(x) − ψ(x)| ≤ 2b C 2
(t − x0 ) dt = (x − x0 )2
x0 2!
and in general
2b C n 2b C n δ n
|ϕ(x) − ψ(x)| ≤ (x − x0 )n ≤ , x0 ≤ x < x0 + δ ,
n! n!
for n = 1, 2, . . . . Letting n → ∞, we conclude that ϕ(x) ≡ ψ(x) in the
interval [x0 , x0 + δ). This proves the uniqueness of a solution of (2), hence
of (1), and completes the proof of the theorem.
M eCδ C n δ n+1
|ϕ(x) − ϕn (x)| ≤ , |x − x0 | < δ , n = 0, 1, 2, . . . .
(n + 1)!
there. Then the theorem asserts the existence of a unique solution y = ϕ(x)
to the initial-value problem (6), valid in some interval |x − x0 | < δ. Picard’s
iterative proof can be adapted to this more general situation. The problem
(6) is converted to an equivalent integral equation
x
y(x) = b + f (t, y(t)) dt ,
x0
332 13. Differential Equations
The details are similar to the scalar case and are left as an exercise.
The last theorem applies in particular to linear systems
where the functions ajk = ajk (x) are continuous in some neighborhood of
x0 .
A single linear differential equation of higher order can be viewed as a
special case of a first-order linear system. Consider a differential equation
of the form
z1 = y , z2 = y , . . . , zn = y (n−1) .
and the initial conditions (8) become z(x0 ) = b, where z = (z1 , . . . , zn ) and
b = (b1 , . . . , bn ). Consequently, the existence and uniqueness theorem for
first-order linear systems implies the existence and uniqueness of a solution
to the linear differential equation (7) under the initial conditions (8).
13.2. Wronskians 333
13.2. Wronskians
Recall that a set of functions f1 (x), f2 (x), . . . , fn (x) is said to be linearly
independent over an open interval I ⊂ R if no linear combination
vanishes throughout I, except for the trivial combination with all coefficients
ck = 0. It is equivalent to require that no function in the set be expressible
as a linear combination of the others. The functions are said to be linearly
dependent if some nontrivial combination vanishes identically in I.
If all of the functions fn (x) have derivatives up to order n − 1 on I, their
linear dependence implies that the system of linear equations
y = c1 y1 + c2 y2 + · · · + cn yn
But the differential equation is linear, so y is a solution with the same data
at x0 as the trivial solution z(x) ≡ 0. By the uniqueness of the solution
to an initial-value problem, it follows that y(x) = z(x) = 0 for all x ∈ I.
In other words, the functions y1 , . . . , yn are linearly dependent on I if their
Wronskian vanishes at any point of the interval.
In summary, these results demonstrate a sharp dichotomy for sets of
solutions y1 , . . . , yn of the same differential equation (7). Either these func-
tions are linearly dependent and their Wronskian vanishes everywhere on
the interval I, or they are linearly independent and their Wronskian van-
ishes nowhere on I. There is no middle ground. The same dichotomy will be
apparent from an explicit formula for the Wronskian that we now proceed
to derive.
The case n = 2 is the most important, and will be considered first. Let
y1 and y2 be solutions of a differential equation
W = y1 y2 − y1 y2
= (py1 + qy1 )y2 − y1 (py2 + qy2 )
= p(y1 y2 − y1 y2 ) = −pW .
so that
x
(10) W (x) = W (x0 ) exp − p(t)dt .
x0
The formula (10) is known as the Wronskian relation for the differential
equation (9). Its most remarkable feature is its dependence only on the
coefficient function p, not on the solutions y1 and y2 , except for the value
of their Wronskian at a single point. The formula shows again that either
W (x) = 0 for all x ∈ I or W (x) ≡ 0 on I.
For a set of solutions y1 , y2 , . . . , yn of the general differential equation
(7) of order n, a similar calculation shows that the Wronskian W (x) =
W (x; y1 , . . . , yn ) satisfies the equation W = −a1 W , and so
x
(11) W (x) = W (x0 ) exp − a1 (t)dt .
x0
It then follows from the uniqueness property that z(x) ≡ y(x) for this choice
of coefficients. In other words, every solution of (7) is a linear combination
of the set of linearly independent solutions y1 , . . . , yn .
In the special case where the equation (7) has constant coefficients, there
are well known procedures for explicit construction of a complete set of lin-
early independent solutions. The general solution is then expressible as some
linear combination of these functions. In the case of variable coefficients, the
calculation of solutions is generally not so easy. The problem is even more
difficult near points where the coefficients have singularities. In the next
section we discuss methods for finding solutions in the form of power series.
336 13. Differential Equations
y + 2xy + 2y = 0 .
We look for a solution in the form of a power series y = ∞ k
k=0 ck x with
positive radius of convergence. Term-by-term differentiation gives
∞
∞
y = kck xk−1 , y = k(k − 1)ck xk−2 .
k=1 k=2
Putting these expressions into the differential equation, we are led to the
requirement that
∞
∞
∞
(k + 2)(k + 1)ck+2 xk + 2 kck xk + 2 c k xk = 0 .
k=0 k=0 k=0
This imposes the equivalent condition that the coefficients of all powers xk
must vanish, so that
or
2
(13) ck+2 = − ck , k = 0, 1, 2, . . . .
k+2
It is now clear that the first two coefficients c0 and c1 can be chosen ar-
bitrarily, and then the recurrence relation (13) will determine the others.
Specifically, we find from (13) that
1 1 1 1
c2 = −c0 , c4 = − c2 = c0 , c6 = − c4 = − c0 , . . . ;
2 2 3 3!
2 2 2 2
c3 = − c1 , c5 = − c3 = c1 , . . . ,
3 5 1·3·5
13.3. Power series solutions 337
and in general
(−1)k (−1)k 2k
c2k = c0 and c2k+1 = c1 , k = 1, 2, 3, . . . .
k! 1 · 3 · 5 · · · (2k + 1)
Both series are easily seen to converge on the whole real line, so that the
term-by-term differentiation is justified and the series represent actual solu-
tions to the differential equation. In fact, the first series sums to the function
e−x .
2
1
xy + y + xy = 0 , or y + y + y = 0.
x
Here x0 = 0 is a singular point. Nevertheless, we look for a power series
solution of the form
∞
y(x) = c k xk .
k=0
Inserting this series and its derivatives into Bessel’s equation, we find that
∞
∞
∞
k(k − 1)ck xk−1 + kck xk−1 + ck xk+1 = 0 ,
k=2 k=1 k=0
or
∞
c1 + (k + 2)2 ck+2 + ck xk+1 = 0 .
k=0
1
ck+2 = − ck , k = 0, 1, 2, . . . .
(k + 2)2
338 13. Differential Equations
Since c1 = 0, the recurrence relation implies that ck = 0 for all odd indices
k. With the choice c0 = 1, the relation gives
1 1 1
c2 = − , c4 = , c6 = − ,... ,
22 22 42 22 42 62
so that, in standard notation,
x2 x4 x6
y(x) = J0 (x) = 1 − + − + ...
22 22 42 22 42 62
is a solution of Bessel’s equation. This power series converges on the whole
real line and is called a Bessel function of first kind. Its graph is shown in
Figure 1.
Because of the requirement that c1 = 0, the only power series solutions
are constant multiples of J0 (x). The search for a second solution leads us to
a general discussion of regular singular points and the method of Frobenius.
We will now describe this method in detail, applying it to specific equations
but omitting a proof of its general validity. For a more complete theoretical
discussion, the reader is referred to the book by Coddington and Levinson
[4].
A singular point x0 of the differential equation (12) is called a regular
singular point if (x − x0 )p(x) and (x − x0 )2 q(x) have power series expansions
about x0 :
∞
∞
(x − x0 )p(x) = ak (x − x0 )k and (x − x0 )2 q(x) = bk (x − x0 )k ,
k=0 k=0
0.8
0.6
0.4
0.2
5 10 15 20 25 30
-0.2
-0.4
xy + y + xy = 0
where the real number α and the coefficients ck are to be determined. Formal
calculations give
∞
∞
y = (k + α)ck xk+α−1 and y = (k + α)(k + α − 1)ck xk+α−2 ,
k=0 k=0
so that
∞
∞
∞
j−1 k+α−1
py = aj x (k + α)ck x = aj (k + α)ck xn+α−2
j=0 k=0 n=0 j+k=n
and
∞
∞
∞
j−2 k+α
qy = bj x ck x = bj ck xn+α−2 .
j=0 k=0 n=0 j+k=n
y + py + qy =F (α)xα−2
∞
n−1
+ F (n + α)cn + (k + α)an−k + bn−k ck xn+α−2 ,
n=1 k=0
n−1
(14) F (n + α)cn = − (k + α)an−k + bn−k ck , n = 1, 2, . . . .
k=0
It can be proved that each of these formal power series actually converges
in some neighborhood of the regular singular point x0 = 0, and therefore
represents an actual solution to the differential equation.
Case 2. α1 and α2 are real and α1 = α2 . Then only one solution can
be obtained by the process just described. However, F (α1 ) = F (α1 ) = 0,
since α1 is a double root of the polynomial F (α). This observation allows
us to obtain a second (independent) solution by a process of differentiation
with respect to α. For arbitrary real α near α1 , let the functions c0 (α) = 1,
c1 (α), c2 (α), . . . be defined recursively through the equation (14). Then each
coefficient cn (α) is a rational function of α, the quotient of two polynomials.
Consider the function
∞
y = y(x, α) = xα cn (α) xn .
n=0
Because of the way the coefficients cn (α) were constructed, we see that
where the subscripts denote partial derivatives. Once more it is clear that
y(x, α1 ) satisfies the differential equation, but we can say more. Differenti-
ation with respect to α gives
y + py + qy = (α − α1 )F (α) xα−2
∞
+ F (n + α)cn + (αan + bn )(α − α1 )
n=1
n−1
+ (k + α)an−k + bn−k ck xn+α−2 ,
k=1
where the last sum is absent for n = 1. For α near α1 , we now use the
relation
n−1
(15) F (n + α)cn = −(αan + bn )(α − α1 ) − (k + α)an−k + bn−k ck
k=1
F (m + α) = (α − α1 )(α + m − α1 ) ,
342 13. Differential Equations
0.5
5 10 15 20 25 30
-0.5
-1
-1.5
Both power series converge for all x ∈ R. The functions J0 (x) and Y0 (x) are
known as Bessel functions of first and second kinds, respectively. Figure 2
shows the graph of Y0 (x). It should be remarked that conventions differ and
the definition of Y0 (x) is not entirely standard in the literature.
converges and satisfies the prescribed initial conditions u(r, 0) = f (r) and
ut (r, 0) = g(r). Formally, these requirements reduce to
∞
∞
an J0 (λn r) = f (r) and λn bn J0 (λn r) = g(r) .
n=1 n=1
For a proof of (17), recall that yn (x) = J0 (λn x) satisfies the differential
equation
d
xyn + λ2n xyn = 0 .
dx
Multiply the equation by ym and integrate by parts to see that
1 1
d
2
λn xyn (x)ym (x) dx = − ym (x) xyn
0 0 dx
1 1
= − xyn (x)ym (x) + xyn (x)ym
(x) dx
0 0
1
= xyn (x)ym
(x) dx ,
0
x2 y + xy + (x2 − n2 )y = 0 ,
(−1)k x n+2k
∞
Jn (x) =
k!(n + k)! 2
(18) k=0
xn x2 x4
= n 1− + − ...
2 n! 2(2n + 2) 2 · 4(2n + 2)(2n + 4)
and
π
1
(20) Jn (x) = cos(x sin θ − nθ) dθ
π 0
0.6
0.4
0.2
5 10 15 20 25 30
-0.2
Each of the representations (19) and (20) for the Bessel function Jn (x) can
be verified by showing that the right-hand side satisfies Bessel’s differential
equation and is finite and properly normalized at the origin (see Exercises
10, 13, and 14). The representation (20) is often called Bessel’s formula. A
standard calculation in the theory of functions of a complex variable (contour
integration to find the coefficients in a Laurent expansion) produces Bessel’s
formula as a consequence of the remarkable generating relation
∞
x 1
(21) e 2 (t− t ) = Jn (x) tn .
n=−∞
∞
∞
∞
Jn (x) t =n n
Jn (x) t + (−1)n Jn (x) t−n
n=−∞ n=0 n=1
∞
∞
(−1)k x n+2k
= tn
k!(n + k)! 2
n=0 k=0
(−1)n+k x n+2k −n
∞
∞
+ t = S1 + S2 .
k!(n + k)! 2
n=1 k=0
∞ [m/2]
(−1)k m
m−2k x
S1 = 1 + t and
k!(m − k)! 2
m=1 k=0
(−1)m−k 2k−m x m
∞ [(m−1)/2]
S2 = t ,
k!(m − k)! 2
m=1 k=0
348 13. Differential Equations
where [α] denotes the largest integer less than or equal to α. On the other
hand, the binomial theorem gives
m
m
1 m!
t− = (−1)k tm−2k
t k!(m − k)!
k=0
[m/2]
m!
[(m−1)/2]
m!
= (−1)k tm−2k + (−1)m−j t2j−m ,
k!(m − k)! (m − j)!j!
k=0 j=0
which is the desired result (21). (For another proof, see Exercise 15.)
It was essentially through the generating relation that the functions now
called Bessel functions originally arose in 1817, when Bessel was studying
perturbations of astronomical orbits. Other mathematicians had previously
encountered special cases of Bessel functions, but Bessel developed their
properties in a systematic treatise published in 1824.
Friedrich Wilhelm Bessel (1784–1846) had an unusual life history. He
left school at age 14 to become apprentice to a shipping merchant in Bre-
men. Thereafter he was mainly self-educated. Attracted by the problem
of determining longitude at sea, he developed a keen interest in astronomy.
His work drew the attention of professional astronomers, and at age 26 he
became director of the observatory in Königsberg, where he remained for
the rest of his career. Using sophisticated mathematical methods, he made
important contributions to astronomy, including improved data on the po-
sitions of stars.
where a, b, and c are real parameters. There are many important special
cases. It has regular singular points at 0 and 1, and (with suitable interpre-
tation) at ∞, and it is in a sense the most general linear differential equation
of second order with three regular singular points. In the notation of Section
13.3,
c − (a + b + 1)x ab x
xp(x) = and x2 q(x) = − ,
1−x 1−x
13.5. Hypergeometric functions 349
Since the coefficient of every power xk must vanish, this gives the recurrence
relation
a(a + 1) · · · (a + k − 1) b(b + 1) · · · (b + k − 1)
γk = , k = 1, 2, . . . .
k! c(c + 1) · · · (c + k − 1)
In terms of the gamma function, the shifted factorial is (a)k = Γ(a + k)/Γ(a) .
An application of the ratio test confirms that the power series has radius of
convergence equal to 1 unless either a or b is a negative integer, in which
case the series terminates and F (x) is a hypergeometric polynomial.
350 13. Differential Equations
Other examples are complete elliptic integrals and Jacobi polynomials, which
include the Legendre polynomials and Chebyshev polynomials as special
cases.
Gauss made the first systematic study of hypergeometric functions in a
famous paper of 1812, but Euler had found a basic integral representation
in 1769.
Proof. The restriction of parameters to c > b > 0 ensures that the integral
converges. For fixed x with |x| < 1, expand the last factor into binomial
series:
∞
−a −a
(1 − xt) = (−xt)k ,
k
k=0
where
−a (−a)(−a − 1) · · · (−a − k + 1) (−1)k (a)k
= = .
k k! k!
We will give two proofs. The first uses Euler’s integral representation and
requires the additional assumption that c > b > 0. Euler’s representation is
easily seen to give
1
Γ(c)
lim F (a, b; c; x) = tb−1 (1 − t)c−a−b−1 dt
x→1− Γ(b) Γ(c − b) 0
Γ(c) Γ(c) Γ(c − a − b)
= B(b, c − a − b) = .
Γ(b) Γ(c − b) Γ(c − a) Γ(c − b)
Because the condition c > a + b ensures that the series converges at x = 1,
we can infer from Abel’s theorem that its sum agrees with its Abel limit, so
the Gauss summation formula is proved for c > b > 0.
The second proof shows that the condition c > b > 0 is superfluous.
Assuming only that c > a + b, we begin by establishing the identity
(c − a)(c − b)
(25) F (a, b; c; 1) = F (a, b; c + 1; 1) .
c(c − a − b)
For a proof of (25), denote the terms of the two series by
(a)k (b)k (a)k (b)k
Ak = and Bk = ,
k! (c)k k! (c + 1)k
respectively. Then
(a + k)(b + k) c
Ak+1 = Ak and Bk = Ak ,
(k + 1)(c + k) c+k
and a straightforward calculation confirms that
c(c − a − b)Ak = (c − a)(c − b)Bk + c kAk − (k + 1)Ak+1 .
Consequently,
n
n
c(c − a − b) Ak = (c − a)(c − b) Bk − c(n + 1)An+1 ,
k=0 k=0
since the last sum telescopes. But because c > a+b, the asymptotic relation
(24) shows that
(n + 1)An+1 ∼ M na+b−c → 0 as n → ∞ ,
13.5. Hypergeometric functions 353
(c − a)n (c − b)n
F (a, b; c; 1) = F (a, b; c + n; 1) ,
(c)n (c − a − b)n
or
(26)
Γ(c − a) Γ(c − b) Γ(c − a + n) Γ(c − b + n)
F (a, b; c; 1) = F (a, b; c + n; 1)
Γ(c) Γ(c − a − b) Γ(c + n) Γ(c − a − b + n)
for n = 1, 2, . . . . Now pass to the limit as n → ∞. An application of
Stirling’s formula for the gamma function (cf. Chapter 9, Exercise 22) shows
that
Γ(c − a + n) Γ(c − b + n)
lim = 1.
n→∞ Γ(c + n) Γ(c − a − b + n)
t(1 − t)y + [a + b + 1
2 − (a + b + 1)t]y − ab y = 0 .
x(1 − x)Y + [a + b + 1
2 − (2a + 2b + 1)x]Y − 4ab Y = 0 .
y1 (x2 ) = 0, it follows that y1 (x1 ) > 0 and y1 (x2 ) < 0; while y2 (x1 ) ≥ 0
and y2 (x2 ) ≥ 0. Observe now that because y1 and y2 satisfy their respective
differential equations, we have the identity
p(y2 y1 − y1 y2 ) = y2 (py1 ) − y1 (py2 ) = (q2 − q1 )y1 y2 .
Integration gives
x2 x2
p(y2 y1 − y1 y2 ) = q2 (x) − q1 (x) y1 (x)y2 (x) dx > 0 ,
x1 x1
in view of the hypothesis that q1 (x) ≤ q2 (x) with strict inequality in some
subinterval, and the assumptions that y1 (x) > 0 and y2 (x) > 0 in (x1 , x2 ).
On the other hand, since y1 vanishes at both x1 and x2 , the left-hand side
reduces to
x2
p y2 y1 = p(x2 )y2 (x2 )y1 (x2 ) − p(x1 )y2 (x1 )y1 (x1 ) ≤ 0 ,
x1
by virtue of the standing assumption that p(x) > 0 and the supposed in-
equalities y2 (x1 ) ≥ 0, y2 (x2 ) ≥ 0, y1 (x1 ) > 0, and y1 (x2 ) < 0. This con-
tradicts the previous conclusion that the right-hand side is positive, thereby
invalidating our initial supposition that y2 (x) = 0 throughout the interval
(x1 , x2 ). Consequently, y2 (x) = 0 for some point x in (x1 , x2 ), as the theorem
asserts.
u , which are the same as those of J0 (x) = x−1/2 u(x) , to behave for large x
like those of sin x and cos x.
These notions are made precise by appeal to the Sturm comparison
theorem. Since u(x) satisfies the differential equation u + qu = 0 with
we can use Sturm’s theorem to compare the zeros of u(x) with those of
cos x, for instance, which satisfies the differential equation y + y = 0. The
theorem guarantees that u(x) has at least one zero between each pair of
zeros of cos x, which lie apart at distance π. We conclude in particular
that J0 (x) has infinitely many positive zeros λn , which can be ordered by
0 < λ1 < λ2 < . . . with λn → ∞ as n → ∞, since the zeros can have no
finite cluster point.
In fact, we can say more. Let y1 = cos(x − δ) be a solution to the
differential equation y + y = 0 which vanishes together with J0 at some
point λn . Then the next zero of y1 will occur at λn + π. But by the
√
comparison theorem, the function y2 (x) = u(x) = xJ0 (x) must vanish
between successive zeros of y1 . This shows that λn < λn+1 < λn + π, or
0 < λn+1 − λn < π for n = 1, 2, . . . .
But we can say even more. By reversing the roles of the two differential
equations, we can apply the Sturm comparison theorem to prove that λn+1 −
λn → π as n → ∞. For this it will suffice to show that for each ε > 0 there
is a number N such that λn+1 − λn > π/(1 + ε) for all n ≥ N . Given ε > 0,
choose N so large that
Now define y2 (x) = cos((1 + ε)x − δ), choosing δ so that y2 (λn ) = 0. Then
since y2 satisfies the equation y + (1 + ε)2 y = 0, the Sturm comparison
√
theorem ensures that y2 (x) will vanish again before u(x) = xJ0 (x) does.
But the zeros of y2 are spaced at distance π/(1 + ε) apart, so this says that
for all n ≥ N . Because ε > 0 was chosen arbitrarily, and we have already
found that λn+1 − λn < π for all n, this implies that λn+1 − λn → π as
n → ∞.
358 13. Differential Equations
Note that λ12 − λ11 ≈ 3.14128 , which is already very close to π ≈ 3.14159 .
The data suggest that λn+1 − λn increases to π, and this can be verified
analytically. (See Exercises 28 and 29.)
The hypotheses guarantee that y2 (t) > 0 in (x1 , x2 ), and so it follows from
the Sturm comparison theorem that y1 (t) > 0 as well. Thus, in view of the
13.7. Refinements of Sturm’s theory 359
Therefore, the function y1 /y2 has positive derivative in this interval. But
y1 (x)/y2 (x) → 1 as x → x1 + (apply l’Hôpital’s rule if y1 (x1 ) = y2 (x1 ) = 0),
so we conclude that y1 (x)/y2 (x) > 1 in (x1 , x2 ), as claimed.
Since (pq) (x) > 0 on (a, b) by hypothesis, this shows that φ(x) is decreasing
on the interval [a, b]. In particular, if ξn are successive local maxima of
|y(x)|, so that a < ξ1 < ξ2 < · · · < b, then φ(ξ1 ) > φ(ξ2 ) > . . . . But
φ(ξn ) = y(ξn )2 , because y (ξn ) = 0 at each local maximum of |y(x)|. Thus
|y(ξ1 )| > |y(ξ2 )| > . . . , as the theorem asserts. Note that the inequalities
extend to the endpoints a or b if y (a) = 0 or y (b) = 0, and if p(x)/q(x)
tends to a finite limit as x → a+ or x → b−, respectively.
local maxima of |J0 (x)| in (0, ∞), then |J0 (ξ1 )| > |J0 (ξ2 )| > . . . . In fact,
it is easy to see from the differential equation that the points ξn interlace
with the zeros λn of J0 , since J0 (ξn ) = −J0 (ξn ) = 0 at each of these critical
points, so that J0 has a local maximum at ξn if J0 (ξn ) > 0 and a local
minimum if J0 (ξn ) < 0, as the graph in Figure 1 suggests. Recall that J0
cannot vanish at a critical point because it is a solution of a linear differential
equation of second order. Finally, we note that J0 has a local maximum the
origin, since J0 (0) = 0 and J0 (0) = − 12 < 0. Also J0 (0) = 1 > 0, and the
Sonin–Pólya theorem extends to show that J0 (0) > |J0 (ξ1 )| despite the fact
that the function p(x) = x vanishes at the origin, because p(x)/q(x) ≡ 1. In
particular, it follows that |J0 (x)| ≤ 1 for all x ≥ 0.
A similar elementary argument shows that J0 (x) → 0 as x → ∞. In
√
fact, it may be expected that J0 (x) decays like 1/ x, because as we have
√
seen, the substitution u(x) = x y(x) transforms Bessel’s equation to u +
1 + 4x1 2 u = 0, whose solutions behave like sinusoids for large x. To make
these notions precise, write the transformed equation as
1
u + Qu = 0 , where Q(x) = 1 + 2 ,
4x
and consider the function
ψ(x) = u (x)2 + Q(x)u(x)2 .
A simple calculation shows that ψ (x) = Q (x)u(x)2 ≤ 0, which implies that
ψ(x) is nonincreasing over the interval (0, ∞). In particular, if νn are the
successive local maxima of |u(x)|, with 0 < ν1 < ν2 < . . . , then
Q(νn )u(νn )2 = ψ(νn ) ≤ ψ(ν1 ) = Q(ν1 )u(ν1 )2 , n = 1, 2, . . . ,
since u (ν
n ) =√0. It follows that u(x) remains bounded as x → ∞, so that
J0 (x) = O(1/ x) as x → ∞. In particular, J0 (x) → 0 as x → ∞.
A more general form of the preceding argument appears in an article
by Duren and Muldoon [5]. Our presentation of the Sonin-Pólya theorem
is adapted from the book of Kreider, Kuller, and Ostberg [8]. For further
refinements of the Sturm theory, the reader is referred to Birkhoff and Rota
[2] and the classic book by Ince [6], and to the article by Laforgia and
Muldoon [9], which gives additional references.
Exercises
1. For the initial-value problem y = f (x, y), y(x0 ) = y0 , verify the error
estimate
M eCδ C n δ n+1
|ϕ(x) − ϕn (x)| ≤ , |x − x0 | < δ ,
(n + 1)!
as stated after the proof of the existence and uniqueness theorem in Section
13.1.
Exercises 361
2. Carry out the details in the proof of existence and uniqueness of a solution
to the initial-value problem (6) for first-order linear systems of differential
equations.
3. Show that the functions f (x) = x and g(x) = x2 are linearly independent
over the interval (−1, 1), yet their Wronskian vanishes at the origin.
4. Verify the Wronskian relation (11) for solutions to a differential equation
(7) of order n.
Hint. The derivative of an n × n determinant is equal to the sum of n
determinants, each involving the derivatives of one row. Loosely speaking,
the determinant can be differentiated row by row.
5. Use the method of reduction of order to derive a second solution, in-
dependent of J0 (x), for Bessel’s equation xy + y + xy = 0. Set y = J0 v
and derive a linear differential equation of first order for v. Solve the latter
equation to obtain
1
Y0 (x) = J0 (x) dx = J0 (x) log x + 14 x2 − 128
3 4
x + ... .
xJ0 (x)2
6. (a) Find the general power series solution y = ∞ k
k=0 ck x of Legendre’s
equation
(1 − x2 )y − 2xy + λy = 0 .
Show that the series is convergent for |x| < 1.
(b) Show that the Legendre equation has a (nontrivial) polynomial so-
lution if and only if λ = n(n + 1) for some integer n = 0, 1, 2, . . . . When
λ = n(n + 1), show that the polynomial solution is unique up to a constant
factor and has degree n. With standard normalization (choice of constant
factor), this solution is called the nth Legendre polynomial and is denoted
by Pn (x). The standard normalization is to require that Pn (1) = 1. Show
that Pn (−x) = (−1)n Pn (x), or in other words that the polynomial is an
even function if n is even, odd if n is odd.
(c) Verify that
d
(1 − x2 )Pn (x) = −n(n + 1)Pn (x) ,
dx
and use this relation to establish the orthogonality property
1
Pn (x) Pm (x) dx = 0 , n = m .
−1
Note. Legendre’s differential equation and Legendre polynomial expansions
(orthogonal series expansions analogous to Fourier series) occur naturally
when solving Laplace’s equation in a spherical region. The book by Jackson
[7] is a good reference.
362 13. Differential Equations
(b) Show that every zero of Pn (x) in the interval (−1, 1) is simple.
Hint. Remember that Pn (x) satisfies a differential equation.
(c) Prove that the Legendre polynomial Pn (x) has exactly n (distinct)
zeros in the interval (−1, 1).
Hint. Suppose that Pn (x) = 0 in (−1, 1) only at points x1 , x2 , . . . , xk for
k < n, and apply the result of (a) to the polynomial Q(x) = (x − x1 )(x −
x2 ) · · · (x − xk ) .
8. (a) For any constant λ > 0, show that
1
x J0 (λx)2 dx = 12 J0 (λ)2 + J0 (λ)2 .
0
Hint. Transform the first integral by substituting t = λx. Then show that
y = J0 (t) satisfies
d2 2 2
2ty 2 = t (y + y ) .
dt
(b) Generalize the result by proving that
1
x Jn (λx)2 dx = 12 Jn (λ)2 + 1 − n2
λ2
Jn (λ)2
0
10. (a) For n = 1, 2, . . . , show that Bessel’s equation x2 y +xy +(x2 −n2 )y =
0 has a regular singular point at the origin with indicial equation α2 −n2 = 0.
(b) Derive the series expansion (18) for Jn (x) as a solution to Bessel’s
equation, and show that it is the only solution, up to constant multiples,
that is bounded near the origin.
Exercises 363
15. Write
x 1 xt x
e 2 (t− t ) = e 2 e− 2t
and multiply the power series expansions of the two exponential functions
to recover the generating relation (21) for Bessel functions.
16. Differentiate the generating relation (21) term by term to obtain proofs
of the identities
x
Jn (x) = 12 Jn−1 (x) − Jn+1 (x) and n Jn (x) = Jn−1 (x) + Jn+1 (x) ,
2
where n = 1, 2, . . . . Justify the term-by-term differentiation.
364 13. Differential Equations
x1−c F (a − c + 1, b − c + 1; 2 − c; x)
1+x
(a) log = 2x F 12 , 1; 32 ; x2 ,
1−x
(b) (1 + x)−2a + (1 − x)−2a = 2 F a, a + 12 ; 12 ; x2 ,
√ −2a
(c) 1+ 1−x = 2−2a F a, a + 12 ; 2a + 1; x .
30. Generalizing the device used to study of zeros of the Bessel function J0 ,
show that for some choice of function ϕ(x) > 0 the transformation y = ϕu
converts a differential equation y + py + qy = 0 to an equation of the form
u + Qu = 0. This is known as Liouville’s normal form. Note that the zeros
of y(x) are the same as those of u(x).
References
[2] Garrett Birkhoff and G.-C. Rota, Ordinary Differential Equations, 4th edition,
Wiley, New York, 1989.
[12] C. Sturm, “Mémoire sur les équations différentielles linéaires du second ordre”,
J. Math. Pures Appl. 1 (1836), 106–186.
[13] G. N. Watson, A Treatise on the Theory of Bessel Functions, Second edition,
Cambridge University Press, London, 1966.
Chapter 14
Elliptic Integrals
The study of elliptic integrals dates back at least to the early 18th century,
when they arose in connection with problems of mechanics and classical ge-
ometry. They comprise a large class of nonelementary integrals whose calcu-
lation reduces to standard forms that were tabulated at an early stage. The
standard elliptic integrals enjoy some nice transformation properties and
have a close, totally unexpected connection with the arithmetic–geometric
mean. They also satisfy the Legendre relation, an elegant identity that pro-
vides an effective method for numerical calculation of π. These topics will
be the focus of this final chapter.
369
370 14. Elliptic Integrals
sin2 2θ 2
ds2 = dr2 + r2 dθ2 = 2
dθ + r2 dθ2
r
1 − r4 1
= 2
+ r dθ2 = 2 dθ2 ,
2
r r
which gives the total arclength
π/4 1
dθ dr
4 √ =4 √ .
0 cos 2θ 0 1 − r4
The arclength integrals for the ellipse and lemniscate are examples of
what are now called elliptic integrals. This term refers to integrals of the
form
R(t, p(t)) dt ,
and π/2 √
1
1 − k 2 t2
E = E(k) = 1 − k 2 sin2 θ dθ = √ dt ,
0 0 1 − t2
14.2. Fagnano’s duplication formula 371
√
for 0 < k < 1. The parameter k is called the modulus, and k = 1 − k 2 is
known as the complementary modulus. Symmetrically related to K and E
are K and E , defined respectively by K (k) = K(k ) and E (k) = E(k ).
It is customary to use the notation K (k) and E (k) despite possible misin-
terpretation as derivatives. The corresponding indefinite integrals are called
incomplete elliptic integrals. The complete elliptic integral of third kind is
π/2
dθ
Π(ν, k) =
0 (1 + ν sin2 θ) 1 − k 2 sin2 θ
1
dt
= √ √ .
0 (1 + νt2 ) 1 − t2 1 − k 2 t2
√ √ π/2
dθ √ 1
dx
K(1/ 2) = 2 = 2 √ ,
0 2 − sin2 θ 0 1 − x4
with the substitution x = cos θ. The lemniscate integral can also be ex-
pressed in terms of the gamma function (see Exercise 2). The arclength of
an ellipse is given by a complete elliptic integral of second kind (see Exercise
1).
0.8
0.6
0.4
0.2
for small x and y, which is similar to Euler’s formula for the lemniscate
integral.
Clearly, it would be awkward to base a study of trigonometry on the
inverse trigonometric functions. It is the sine and cosine that have the
elegant addition formulas, plus another key property: periodicity. Yet after
Euler’s work another half-century elapsed before anyone thought seriously
of inverting the incomplete elliptic integrals and basing the theory on their
inverse functions, now known as elliptic functions. Legendre took some
preliminary steps in this direction. Gauss recorded the idea in his diary but
did not publish it. In 1827, Abel published the first account of the theory
of elliptic functions. He viewed them as functions of a complex variable and
found them to be doubly periodic, with two independent complex periods.
Around the same time, Carl Jacobi (1804–1851) made similar innovations,
which he first published also in 1827. Abel died in 1829 at age 26, but
Jacobi gradually developed the theory that revolutionized the subject. For
many years, Legendre had labored to produce what was to have been a
two-volume treatise [7] on the theory of elliptic integrals, complete with
extensive numerical tables. The first volume was finally published in 1825
and the second in 1826, but Legendre soon realized that much of his work had
been rendered obsolete by the brilliant insights of the young mathematicians
Abel and Jacobi. He might well have reacted with dismay, but to his credit
Legendre embraced the new ideas with great enthusiasm and presented them
in coordination with his earlier work as extensive supplements comprising a
third volume of the treatise.
Given an arbitrary pair of real numbers a and b with 0 < b < a, calculate
their arithmetic and geometric means
√
a1 = 12 (a + b) and b1 = ab .
Then 0 < b < b1 < a1 < a. Now iterate the process, inductively defining
an+1 = 12 (an + bn ) and bn+1 = an bn , n = 1, 2, . . . .
Then
0 < b < b1 < b2 < · · · < bn < an < · · · < a2 < a1 < a .
Each of the sequences {an } and {bn } is monotonic and bounded, hence
convergent. Let
α = lim an , β = lim bn .
n→∞ n→∞
Observe that
1
α = lim an+1 = lim (an + bn ) = 12 (α + β) ,
n→∞ n→∞ 2
The AGM algorithm was discovered by Lagrange around the year 1784.
Gauss rediscovered it in 1791, at the age of 14. The algorithm converges
rapidly, allowing easy numerical calculation of the AGM.
On May 30, 1799, Gauss made a startling discovery. On that day he
recorded a note in his diary to the effect that the two numbers
1
1 2 dx
√ and √
M ( 2, 1) π 0 1 − x4
Here is the unexpected connection between the AGM and elliptic inte-
grals of first kind.
√
Theorem 1. For 0 < k < 1 and k = 1 − k 2 ,
1 2
= K(k ) .
M (1, k) π
The theorem provides an explicit formula for the AGM of any two posi-
tive numbers. More importantly, it offers an efficient method for the numer-
ical calculation of elliptic integrals of first kind. Elliptic integrals of second
kind can be computed by a similar process. The extensive tables of elliptic
integrals in Legendre’s treatise [7] were assembled in this manner.
The proof of the theorem will depend on the fact that the elliptic inte-
gral and the AGM have essentially the same transformation property. It is
convenient to express this relation in terms of the function
π/2
dθ 1 b
I(a, b) = = K , 0 < b < a.
0 a2 cos2 θ + b2 sin2 θ a a
Deferring the proof of the lemma, let us observe first that the theorem
is an immediate consequence.
Proof of Theorem 1. Recall the notation {an } and {bn } for the sequences
of arithmetic and geometric means converging to the common limit M (a, b).
In this notation, Lemma 1 says that I(a, b) = I(a1 , b1 ). Iterating this iden-
tity, we infer that I(a, b) = I(an , bn ) for n = 1, 2, . . . . Because the function
I(a, b) is continuous, it follows that
Since I(1, k) = K (k) = K(k ) for 0 < k < 1, this is the desired result.
Lemma 2. For 0 < k < 1, the complete elliptic integral of first kind satisfies
√
2 k
(i) K = (1 + k)K(k)
1+k
1−k 1+k
(ii) K = K(k ) .
1+k 2
The three identities in Lemmas 1 and 2 are all equivalent. To see this,
observe first that the conjugate modulus of
√
1−k 2 k
= is = .
1+k 1+k
Inversion gives
√
1− 1+k 1 2
k= , = , k = .
1+ 2 1+ 1+
Therefore, the identity (ii) of Lemma 2 reduces to
√
1 2
K() = K ,
1+ 1+
which is (i) with k replaced by . Thus (i) and (ii) are equivalent.
To see that Lemma 1 is another version of the same identity, let k = b/a
and note that the relation
1 b
I(a, b) = K
a a
implies
√ √
√ 2 2 ab 1 2 2 k
I 12 (a + b), ab = K = K .
a+b a+b a 1+k 1+k
Corollary. For 0 < k < 1, the complete elliptic integral of second kind
satisfies
√
2 k 2
(iii) E = E(k) − (1 − k) K(k)
1+k 1+k
1−k 1 k
(iv) E = E(k ) + K(k ) .
1+k 1+k 1+k
dK E − k2 K
= ,
dk kk 2
which comes from
√ the power series for K and E (see Exercise 9). Define the
modulus = 2 k/(1 + k) and apply the formulas
1−k d 1−k
= , =√
1+k dk k(1 + k)2
√
to differentiate the composite function K(2 k/(1 + k)). Specifically,
dK d d d dK
= K() = (1 + k)K(k) = K(k) + (1 + k) ,
d dk dk dk dk
or
E() − 2 K() 1−k E(k) − k 2 K(k)
√ = K(k) + (1 + k) ,
2 k(1 + k)2 kk 2
which reduces to
1+k 1−k 1 1
E() − K(k) = E(k) − K(k) ,
2k(1 − k) 2k k(1 − k) k
since K() = (1 + k)K(k) by Lemma 2. Further simplification leads to the
relation (iii). The identity (iv) can be verified either by algebraic manip-
ulations from (iii) or by differentiating (ii). In fact, as with (i) and (ii),
the relations (iii) and (iv) are equivalent forms of the same identity. The
details are left as an exercise.
corresponds to the expression I(a, b) introduced earlier. For 0 < b < a and
k = b/a, we find that J(a, b) = a E(k ) The substitution θ = π2 − t shows
that J(b, a) = J(a, b). Note that the arclength of the ellipse
x2 y 2
+ 2 =1
a2 b
is given by 4J(a, b) (see Exercise 1). It will be convenient to use the following
analogue of Lemma 1, which is closely related to Landen’s transformation
for E.
Proof. Let k = b/a and recall the relations I(a, b) = a1 K(k ) and J(a, b) =
a E(k ). Then by Landen’s transformation for E, as given by identity (iv)
in the corollary to Lemma 2, we have
√
√
2 ab
2 J 12 (a + b), ab = 2 12 (a + b) E
a+b
√
2 k
= a(1 + k)E
1+k
1−k
= a(1 + k)E = a E(k ) + kK(k )
1+k
= J(a, b) + ab I(a, b) .
In the notation of the AGM algorithm,
∞ n 2 let c n = a2n − b2n . It can be
shown that the infinite series n=1 2 cn converges (see Exercise 7). Here is
a formula for computing elliptic integrals of second kind.
Theorem 2. For 0 < k < 1, let {an } and {bn } be the sequences generated
by theAGM algorithm, beginning with a = a0 = 1 and b = b0 = k . Let
cn = a2n − b2n , so that c0 = k. Then
∞
E(k) = 1− 2n−1 c2n K(k) .
n=0
380 14. Elliptic Integrals
E(k) = J(1, k )
N
= 1− 2 cn K(k) + 2N +1 J(aN +1 , bN +1 ) − a2N +1 K(k) .
n−1 2
n=0
Much more information about the AGM and its applications can be
found in the book by Borwein and Borwein [2]. The articles by Cox [3, 4]
focus on the remarkable connections that Gauss found between the AGM
and functions of a complex variable. Almqvist and Berndt [1] give an en-
tertaining account of historical developments related to elliptic integrals.
Watson [9] discusses the early history, with emphasis on the contributions
of Fagnano and Landen. The book by Stillwell [8] presents similar material
in broader historical perspective.
14.4. The Legendre relation 381
Carl Friedrich Gauss (1777–1855) has been called the prince of math-
ematicians. He made groundbreaking advances in many scientific fields,
including number theory, probability and statistics, celestial mechanics, spe-
cial functions, differential geometry, electricity and magnetism, and poten-
tial theory. Born in Braunschweig to parents of humble means and little
education, Gauss was a true prodigy whose conspicuous talent gained him
financial support for higher study in Braunschweig and at the University of
Göttingen. He spent most of his career in Göttingen, where he was direc-
tor of the Sternwarte (astronomical observatory). For a concise account of
Gauss’s life and work, the biography by Hall [6] is recommended.
dK E − k2 K dE E−K
= and = ,
dk kk 2 dk k
shows that dP/dk = 0, so P is constant. In other words,
π
K E + EK − K K = , 0 < k < 1.
2
This remarkable identity has become known as the Legendre relation.
Legendre seems to have discovered his relation by accident, and his proof
is not illuminating. It verifies the Legendre relation but does not explain
it. In his treatise [7] of 1825, Legendre reproduced the same calculations
without adding any new insights. Later, when Abel and Jacobi introduced
their theory of elliptic functions, and when Weierstrass developed another
version of the theory, the Legendre relation emerged in a natural way. There
is, however, a more elementary approach which exploits the fact that K and
K are solutions of the same hypergeometric differential equation and so
satisfy a Wronskian relation. It turns out that the Wronskian relation is
none other than the Legendre relation.
382 14. Elliptic Integrals
Here are the details. Recall that the Gauss hypergeometric function is
defined by
ab 1 a(a + 1)b(b + 1) 2
F (a, b; c; x) = 1 + x+ x +... ,
c 2! c(c + 1)
where a, b, c are real parameters with c = 0, −1, −2, . . . , and |x| < 1. The
function y = F (a, b; c; x) satisfies the hypergeometric equation
x(1 − x)y + [c − (a + b + 1)x]y − ab y = 0 .
Under the special condition a + b + 1 = 2c, it is easily verified that if some
function y(x) is a solution, then so is y(1 − x). (See Chapter 13, Exercise
22.)
The complete elliptic integrals K and E are hypergeometric functions
of x = k 2 . Specifically,
π
π
K(k) = F 12 , 12 ; 1; k 2 , E(k) = F − 12 , 12 ; 1; k 2 .
2 2
These formulas can be derived by expanding the integrands in binomial
series and integrating term by term. As in the proof of Lemma 2, this
process leads to the representation
∞
π −1/2 2 2n
K(k) = 1+ k .
2 n
n=1
and
1 1 −1/2
F − 12 , 12 ; 1; k 2 = t (1 − t)−1/2 (1 − k 2 t)1/2 dt
π 0
2 1 2
= (1 − u2 )−1/2 (1 − k 2 u2 )1/2 du = E(k) ,
π 0 π
1 1
W (x) = W ( 12 ) , 0 < x < 1.
4 x(1 − x)
But the Wronskian relation says that x(1 − x)W (x) is constant, so it follows
that Legendre’s expression K E + E K − K K is constant.
To show that the constant is π/2, it is simplest to compute the limit √ of
Legendre’s
√ expression
√ as k → 0. An
√ alternative is to calculate K(1/ 2) =
K (1/ 2) and E(1/ 2) = E (1/ 2) in terms of the gamma function and
then show that
√ √ √ π
2K(1/ 2)E(1/ 2) − K(1/ 2)2 = .
2
The details are left as an exercise.
√ √
Because the elliptic integrals K(1/ 2) and E(1/ 2) can be computed
by the rapidly converging AGM algorithm, the Legendre relation offers an
effective method for numerical calculation of π. For further information, see
the book by Borwein and Borwein [2]. For an exposition of the Legendre
relation as it occurs in the theory of elliptic functions, see also the paper by
Duren [5].
Exercises
1. Show that for 0 < b < a the arclength of the ellipse
x2 y 2
+ 2 =1 is 4a E (b/a) ,
a2 b
where E (k) = E(k ) is a complete elliptic integral of second kind. Use the
parametric representation x = a cos t, y = b sin t.
2. (a) Evaluate the lemniscate integral
1 √
dt 1
√ to show that K(1/ 2) = √ Γ( 14 )2 .
0 1 − t4 4 π
(b) Confirm Gauss’s conjecture (made on the basis of numerical evidence)
that
1 2 1 dx
√ = √ .
M ( 2, 1) π 0 1 − x4
√
3. Use the substitution u = 1 − t2 to show that
√ √ 1
1 + u2
2 E(1/ 2) = √ du .
0 1 − u4
Deduce that
√ 1
E(1/ 2) = √ Γ( 14 )2 + 4 Γ( 34 )2 .
8 π
√ √
4. Show that values of K(1/ 2) and E(1/ 2) found in the preceding exer-
cises are consistent with the Legendre relation.
Hint. Use the Euler reflection formula for the gamma function.
Exercises 385
(b) Use the power series expansions of K(k) and E(k) to verify the formula
dK E − k2 K
= .
dk kk 2
Discussion. In contrast with dE/dk in Exercise 8, the expression for dK/dk
is not readily calculated by differentiation under the integral sign. The power
series calculation is somewhat laborious, but a remarkable combination of
terms gives the result.
10. Verify the Landen transformation for complete elliptic integrals of sec-
ond kind as given by the formula (iv) in the corollary to Lemma 2.
11. As early as 1738, Euler discovered the formula
1 1
dx x2 dx π
√ √ = .
0 1 − x4 0 1 − x4 4
(a) Show
√ that this is a special case of the Legendre relation, with k =
k = 1/ 2.
386 14. Elliptic Integrals
References
387
388 Index of Names
Viète, François, 60
Wallis, John, 61
Weierstrass, Karl, 40, 98, 151, 156, 381
Weyl, Hermann, 123, 271
Wilbraham, Henry, 212
Wronski, Josef, 333
Subject Index
389
390 Subject Index
5 Patrick M. Fitzpatrick
Advanced Calculus, Second Edition
2006
4 Gerald B. Folland
Fourier Analysis and Its Applications
1992
3 Bettina Richmond and Thomas Richmond
A Discrete Transition to Advanced Mathematics
2004
2 David Kincaid and Ward Cheney
Numerical Analysis: Mathematics of Scientific Computing, Third Edition
2002
1 Edward D. Gaughan
Introduction to Analysis, Fifth Edition
1998
This book gives a rigorous treatment of selected topics in
classical analysis, with many applications and examples.
Sally
The This series was founded by the highly respected
mathematician and educator, Paul J. Sally, Jr.