Contractions: 3.1 Metric Spaces
Contractions
In this chapter we discuss contractions, which are of fundamental importance in analysis and are essential tools for proving properties of ODEs. Before we discuss them, we first need to introduce some background on metric spaces.
2. d(x, y) = 0 ⇔ x = y
26 CHAPTER 3. CONTRACTIONS
A point x ∈ X is called an accumulation point of a set A ⊂ X if every ball B(x, ε) intersects A \ {x}. The set of accumulation points of A is called the derived set A′. A set A is closed if A′ ⊂ A, and the closure of A is Ā = A ∪ A′. A is called perfect if A = A′.
We say that a sequence {x_n}_{n∈N} converges to x ∈ X if for all ε > 0 there exists N ∈ N such that d(x_n, x) < ε for all n ≥ N. We say that two sequences {x_n}_{n∈N} and {y_n}_{n∈N} converge exponentially (or with exponential speed) to each other if d(x_n, y_n) < c d^n for some c > 0 and 0 ≤ d < 1. The sequence {x_n}_{n∈N} is a Cauchy sequence if for all ε > 0 there exists N ∈ N such that d(x_i, x_j) < ε whenever i, j ≥ N.
A metric space is called complete if every Cauchy sequence converges in it.
Examples of complete metric spaces are R^n with the (usual) Euclidean metric, and all closed subsets of R^n with this metric.
A condition of the type (3.2.1) is called a Lipschitz condition, where K ≥ 0 is called the
Lipschitz constant. Contractions are thus Lipschitz maps with a Lipschitz constant that is
smaller than 1.
We now formulate the central result about contractions.
Theorem 3.2.2 (Contraction mapping theorem). Let X be a complete metric space and let F : X → X be a contraction. Then F has a unique fixed point, and under iteration of F all points of X converge to it with exponential speed.
Proof. Iterating d(F(x), F(y)) ≤ K d(x, y) gives
$$d(F^n(x), F^n(y)) \le K^n d(x, y) \tag{3.2.2}$$
with x, y ∈ X and n ∈ N. Thus (F^n(x))_{n∈N} is a Cauchy sequence, because with m > n we have
$$d(F^m(x), F^n(x)) \le \sum_{k=0}^{m-n-1} d(F^{n+k+1}(x), F^{n+k}(x)) \le \sum_{k=0}^{m-n-1} K^{n+k} d(F(x), x) \le \frac{K^n}{1-K}\, d(F(x), x),$$
and K^n → 0 as n → ∞. In the last step we used the fact that with 0 ≤ K < 1 it follows that
$$\sum_{k=0}^{m-n-1} K^k \le \sum_{k=0}^{\infty} K^k = \frac{1}{1-K}.$$
Thus the limit lim_{n→∞} F^n(x) exists, because Cauchy sequences converge in X. We denote this limit by x_0. By (3.2.2), lim_{n→∞} d(F^n(x), F^n(y)) = 0 for all x, y ∈ X, so under iteration of F all points of X converge to the same point: if the orbit of x converges to x_0, then so does the orbit of any y ∈ X.
3.3. THE DERIVATIVE TEST 27
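To see that x_0 is a fixed point of F, note that
$$d(x_0, F(x_0)) \le d(x_0, F^n(x)) + d(F^n(x), F^{n+1}(x)) + d(F^{n+1}(x), F(x_0)) \le (1+K)\, d(x_0, F^n(x)) + K^n d(x, F(x))$$
for all x ∈ X and n ∈ N. The right-hand side of this inequality tends to zero as n → ∞, and hence F(x_0) = x_0. This fixed point is unique: if also F(y_0) = y_0, then d(x_0, y_0) = d(F(x_0), F(y_0)) ≤ K d(x_0, y_0), which forces d(x_0, y_0) = 0 since K < 1. Finally, letting m → ∞ in the Cauchy estimate above gives d(F^n(x), x_0) ≤ K^n d(F(x), x)/(1 − K), so the convergence is exponential.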
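The proof suggests a practical stopping rule: letting m → ∞ in the Cauchy estimate bounds the distance from the nth iterate to the fixed point by K^n d(F(x), x)/(1 − K). A minimal Python sketch of this, assuming X is a closed subset of R with d(x, y) = |x − y| and that the caller supplies a valid Lipschitz constant K < 1 (the function name `iterate_contraction` is our own), could look as follows. The test map cos on [0, 1] is an illustrative choice: it maps [0, 1] into itself and |cos′(x)| = |sin x| ≤ sin 1 < 1 there.

```python
import math


def iterate_contraction(F, x, K, tol=1e-12, max_iter=10_000):
    """Iterate x -> F(x) on a subset of R with d(x, y) = |x - y|.

    Stops once the a priori bound K**n * |F(x) - x| / (1 - K) on the
    distance to the fixed point drops below tol. Assumes (without
    checking) that F is a contraction with Lipschitz constant K.
    """
    if not 0 <= K < 1:
        raise ValueError("a contraction needs 0 <= K < 1")
    d0 = abs(F(x) - x)  # d(F(x), x) for the starting point
    xn = x
    for n in range(max_iter):
        # error bound for the current iterate xn = F^n(x)
        if K**n * d0 / (1 - K) < tol:
            return xn, n
        xn = F(xn)
    raise RuntimeError("tolerance not reached within max_iter steps")


# cos maps [0, 1] into itself with Lipschitz constant sin(1) < 1
x_star, steps = iterate_contraction(math.cos, 1.0, K=math.sin(1.0))
print(f"fixed point ~ {x_star:.10f} after {steps} iterations")
```

Because K enters only through the stopping rule, overestimating K merely costs extra iterations, whereas underestimating it can stop the iteration before the stated accuracy is reached.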
Proof. First we show that if |F′(x)| ≤ K for all x ∈ I, then F is Lipschitz with Lipschitz constant K. By the Mean Value Theorem, for any two points x, y ∈ I there exists a c between x and y such that
$$d(F(x), F(y)) = |F(x) - F(y)| = |F'(c)(x - y)| = |F'(c)|\, d(x, y) \le K d(x, y).$$
Since F′ is continuous on the closed, bounded interval I, the maximum of |F′(x)| over I is attained at some point x_0 ∈ I, and |F′(x_0)| < 1. Taking K = |F′(x_0)| shows that F is a contraction.
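The first step of the proof (a bound |F′| ≤ K implies the Lipschitz inequality) can be checked numerically. The sketch below, with the hypothetical helper `max_abs_derivative`, samples |F′| on a uniform grid of a closed interval. Sampling only estimates the maximum from below, so this is a heuristic rather than a rigorous bound (here it is essentially exact because the maximum of |sin| on [0, 1] sits at the endpoint 1). It then tests d(F(x), F(y)) ≤ K d(x, y) on random pairs for the illustrative choice F = cos on I = [0, 1].

```python
import math
import random


def max_abs_derivative(dF, a, b, samples=10_001):
    """Estimate max |F'(x)| over [a, b] by sampling a uniform grid.

    Sampling can miss the true maximum, so this is a heuristic
    estimate, not a rigorous bound.
    """
    step = (b - a) / (samples - 1)
    return max(abs(dF(a + i * step)) for i in range(samples))


# F = cos on I = [0, 1], so F'(x) = -sin(x)
K = max_abs_derivative(lambda x: -math.sin(x), 0.0, 1.0)

# check |F(x) - F(y)| <= K |x - y| on random pairs (small slack for rounding)
rng = random.Random(0)
pairs = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(1000)]
lipschitz_ok = all(
    abs(math.cos(x) - math.cos(y)) <= K * abs(x - y) + 1e-12 for x, y in pairs
)
print(K, lipschitz_ok)
```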
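Remark 3.3.2. The conclusion of Proposition 3.3.1 does not necessarily hold if the domain of F is taken to be the entire real line. For example, F(x) = √(1 + x²) satisfies |F′(x)| = |x|/√(1 + x²) < 1 for all x ∈ R, yet F(x) > |x| ≥ x, so F has no fixed point.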
Example 3.3.3 (Fibonacci’s rabbits). Leonardo Pisano, better known as Fibonacci, tried to understand how many pairs of rabbits can be grown from one pair in one year. He figured out that each pair breeds a pair every month, but a newborn pair only breeds in the second month after birth. Let b_n denote the number of rabbit pairs at time n. Let b_0 = 1; in the first month this pair breeds one pair, so b_1 = 2. At time n = 2, again one pair is bred (by the original pair; the pair born at time n = 1 does not yet have the required age to breed). Hence b_2 = b_1 + b_0, and subsequently b_{n+1} = b_n + b_{n−1}. Expecting the growth to be exponential, we would like to see how fast these numbers grow by calculating a_n = b_{n+1}/b_n. Namely, if b_n ∼ c d^n as n → ∞ for some c, d, then b_{n+1}/b_n → d. We have
$$a_{n+1} = \frac{b_{n+2}}{b_{n+1}} = \frac{1}{a_n} + 1.$$
Thus {a_n}_{n∈N} is the orbit of a_0 = b_1/b_0 = 2 under the map g(x) = 1/x + 1. We have g′(x) = −x^{−2}, so |g′(x)| > 1 for 0 < x < 1 and g is not a contraction on (0, ∞). But we note that a_0 = 2 and consider the map g on the closed interval [3/2, 2]. We have g(3/2) = 5/3 and g(2) = 3/2, and since g is decreasing, g([3/2, 2]) = [3/2, 5/3] ⊂ [3/2, 2].
Furthermore, for x ∈ [3/2, 2] we have |g′(x)| = 1/x² ≤ 4/9 < 1, so that g is a contraction on [3/2, 2]. Hence, by the contraction mapping theorem, g has a unique fixed point in [3/2, 2], and lim_{n→∞} a_n exists and equals this fixed point. The fixed point satisfies x = 1/x + 1, i.e. x² − x − 1 = 0, and the only positive root of this equation is x = (1 + √5)/2.
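The convergence can be observed numerically. The following Python sketch builds the rabbit counts from b_0 = 1, b_1 = 2 and b_{n+1} = b_n + b_{n−1}, compares the ratios b_{n+1}/b_n with the orbit of g starting at b_1/b_0, and checks that both approach (1 + √5)/2. The number of steps (30) is an arbitrary illustrative choice.

```python
import math


def g(x):
    """The map g(x) = 1/x + 1 from the example."""
    return 1.0 / x + 1.0


# rabbit pairs: b_0 = 1, b_1 = 2, b_{n+1} = b_n + b_{n-1}
b = [1, 2]
for _ in range(30):
    b.append(b[-1] + b[-2])

# ratios a_n = b_{n+1} / b_n, computed directly from the counts
ratios = [b[i + 1] / b[i] for i in range(len(b) - 1)]

# the same numbers as the orbit of a_0 = b_1 / b_0 under g
orbit = [b[1] / b[0]]
for _ in range(len(ratios) - 1):
    orbit.append(g(orbit[-1]))

golden = (1 + math.sqrt(5)) / 2
print(orbit[:4])  # first few iterates
print(abs(orbit[-1] - golden))  # distance to the fixed point
```

Every iterate stays in [3/2, 2], the interval on which g was shown to be a contraction.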
Example 3.3.4 (Newton’s method). Finding the roots (preimages of zero) of a function F : R → R is difficult in general. Newton’s method is an approach to find such roots through iteration. The idea is rather straightforward. Suppose x_0 is a guess for a root. We would like to improve our guess by choosing a better approximation x_1. We write the first-order Taylor expansion of F at x_1 in terms of our knowledge about F at x_0: F(x_1) ≈ F(x_0) + F′(x_0)(x_1 − x_0). By setting F(x_1) = 0 (our aim), we obtain from the Taylor expansion that
$$x_1 = G(x_0), \qquad G(x) := x - \frac{F(x)}{F'(x)}. \tag{3.3.1}$$
We note that a fixed point y of G corresponds to a root of F if F′(y) ≠ 0. We call a fixed point y of a differentiable map G superattracting if G′(y) = 0. We have the following result.
Proposition 3.3.5. If |F′(x)| > δ for some δ > 0 and |F″(x)| < M for some M < ∞ on a neighbourhood of a root r (satisfying F(r) = 0), then r is a superattracting fixed point of G (cf. (3.3.1)).
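Proof. We observe that G′(x) = F(x)F″(x)/(F′(x))². Since F(r) = 0 and F′(r) ≠ 0, we have G(r) = r and G′(r) = 0. Moreover, on the given neighbourhood |G′(x)| ≤ M|F(x)|/δ², which tends to 0 as x → r by continuity of F. Hence |G′| ≤ 1/2 on some closed interval J = [r − η, r + η], so that |G(x) − r| = |G(x) − G(r)| ≤ |x − r|/2 for x ∈ J. Thus G maps J into itself and is a contraction on J.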
Note that if we consider the map G : C → C instead of G : R → R, the iterates behave in
a much more complicated way.
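Superattraction shows up as quadratic convergence: near r, each error is roughly a constant times the square of the previous one. A minimal Python sketch of the iteration x ↦ G(x) = x − F(x)/F′(x), using the illustrative choice F(x) = x² − 2 with root r = √2 (so F′(r) = 2√2 ≠ 0), records the errors so this can be seen. The function name `newton` and the stopping rule on |x_{n+1} − x_n| are our own choices.

```python
import math


def newton(F, dF, x0, tol=1e-14, max_iter=50):
    """Iterate G(x) = x - F(x)/F'(x) from x0.

    Returns the last iterate and the list of all iterates. Stops when
    consecutive iterates differ by less than tol.
    """
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        slope = dF(x)
        if slope == 0:
            raise ZeroDivisionError("F'(x) = 0: the Newton step is undefined")
        x_new = x - F(x) / slope
        xs.append(x_new)
        if abs(x_new - x) < tol:
            break
    return xs[-1], xs


root, history = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
errors = [abs(x - math.sqrt(2.0)) for x in history]
for e in errors:
    # each error is roughly a constant times the square of the previous one,
    # until machine precision is reached
    print(f"{e:.3e}")
```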
There is a higher-dimensional version of this result, which requires us to introduce the notion of the derivative DF of a map F : R^m → R^m:
$$DF(x)y = \lim_{\varepsilon \to 0} \frac{F(x + \varepsilon y) - F(x)}{\varepsilon}.$$
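Making a Taylor expansion of F in ε, and denoting F = (F_1, …, F_m), where F_i denotes the ith component of the map, we obtain
$$F_i(x + \varepsilon y) = F_i(x) + \varepsilon\, \nabla F_i(x) \cdot y + O(\varepsilon^2), \qquad i = 1, \dots, m,$$
yielding that (DF(x)y)_i = ∇F_i(x) · y. In other words, DF(x) is a linear map from R^m to R^m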
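which we may represent by the so-called Jacobian matrix
$$DF(x) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1}(x) & \cdots & \frac{\partial F_1}{\partial x_m}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1}(x) & \cdots & \frac{\partial F_m}{\partial x_m}(x) \end{pmatrix},$$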
where x_i denotes the ith component of the vector x = (x_1, …, x_m). For this derivation to be meaningful, the partial derivatives ∂F_i/∂x_j must exist for all i, j = 1, …, m. If one of these does not exist, then the map F is not differentiable.
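The limit definition of DF(x)y suggests approximating the Jacobian column by column, since column j is DF(x)e_j for the jth unit vector e_j. The Python sketch below does this with a fixed forward-difference step ε = 10⁻⁶ (an illustrative choice, so the result carries an O(ε) error) and compares it with the exact Jacobian of the hypothetical example map F(x_1, x_2) = (x_1² x_2, sin x_1 + x_2).

```python
import math


def jacobian_fd(F, x, eps=1e-6):
    """Forward-difference approximation of the Jacobian of F: R^m -> R^m.

    Column j approximates DF(x) e_j = lim (F(x + eps e_j) - F(x)) / eps.
    Assumes F returns a list of length m = len(x).
    """
    m = len(x)
    Fx = F(x)
    J = [[0.0] * m for _ in range(m)]
    for j in range(m):
        xp = list(x)
        xp[j] += eps  # x + eps * e_j
        Fxp = F(xp)
        for i in range(m):
            J[i][j] = (Fxp[i] - Fx[i]) / eps
    return J


def F(x):
    # hypothetical example map F(x1, x2) = (x1^2 x2, sin x1 + x2)
    return [x[0] ** 2 * x[1], math.sin(x[0]) + x[1]]


x = [1.0, 2.0]
J_approx = jacobian_fd(F, x)
# exact Jacobian: rows (dF_i/dx_1, dF_i/dx_2)
J_exact = [[2 * x[0] * x[1], x[0] ** 2], [math.cos(x[0]), 1.0]]
print(J_approx)
print(J_exact)
```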
3.4. THE INVERSE AND IMPLICIT FUNCTION THEOREMS 29
For completeness, we now state the derivative test in R^m without proof. Recall that a strictly convex set C ⊂ R^m is a set C such that for all a, b ∈ C̄, the line segment with endpoints a, b is entirely contained in C, except possibly for one or both endpoints. Also, the norm ||A|| of a linear map A is defined by ||A|| := max_{|v|=1} |A(v)|.
We note that this result is in agreement with the fact that equilibria of linear autonomous
ODEs with all eigenvalues having negative real part are asymptotically stable (with exponential
convergence).
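With respect to the Euclidean norm on vectors, the norm ||A|| defined above equals the largest singular value of A, that is, the square root of the largest eigenvalue of AᵀA. A minimal pure-Python sketch, assuming this Euclidean setting, estimates it by power iteration on AᵀA; the starting vector of ones is an arbitrary choice and must not be orthogonal to the dominant eigenvector of AᵀA. Since |Ax − Ay| ≤ ||A|| |x − y|, a matrix with ||A|| < 1 defines a linear contraction x ↦ Ax.

```python
import math


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def operator_norm(A, iters=200):
    """Estimate ||A|| = max_{|v|=1} |A v| (Euclidean norms) by power
    iteration on A^T A, whose largest eigenvalue is ||A||**2."""
    At = transpose(A)
    v = [1.0] * len(A[0])  # arbitrary starting vector
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm_w = math.sqrt(sum(c * c for c in w))
        if norm_w == 0.0:
            raise ValueError("A v = 0 for the current vector (e.g. A = 0)")
        v = [c / norm_w for c in w]  # keep |v| = 1
    return math.sqrt(sum(c * c for c in matvec(A, v)))


print(operator_norm([[1.0, 1.0], [0.0, 1.0]]))  # approx (1 + sqrt(5)) / 2
print(operator_norm([[0.5, 0.2], [0.1, 0.4]]))  # below 1: x -> A x is a contraction
```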
Theorem 3.4.1 (Inverse function theorem in R). Suppose I ⊂ R is an open interval and F : I → R is a differentiable function. If a ∈ I is such that F′(a) ≠ 0 and F′ is continuous at a, then F is invertible on a neighbourhood U of a, and for all x ∈ U we have (F^{−1})′(y) = 1/F′(x), where y = F(x).
Proof. The proof is by application of the contraction mapping theorem. We consider the map
$$\varphi_y(x) = x + \frac{y - F(x)}{F'(a)}$$
on I. Fixed points of φ_y are solutions of our problem, since φ_y(x) = x if and only if F(x) = y.
We now show that φ_y is a contraction on some closed neighbourhood of a ∈ I. Then, by the contraction mapping theorem, φ_y has a unique fixed point there, and hence, for y close enough to F(a), there exists a unique x in this neighbourhood such that F(x) = y.
Let A = F′(a) and α := |A|/2. By continuity of F′ at a there is an ε > 0 such that, with W := (a − ε, a + ε) ⊂ I, we have |F′(x) − A| < α for x in the closure W̄ of W.
To see that φ_y is a contraction on W̄, we observe that for x ∈ W̄ we have
$$|\varphi_y'(x)| = \left|1 - \frac{F'(x)}{A}\right| = \frac{|A - F'(x)|}{|A|} < \frac{\alpha}{|A|} = \frac{1}{2}.$$
Now, using Proposition 3.3.1 we obtain |φ_y(x) − φ_y(x′)| ≤ |x − x′|/2 for all x, x′ ∈ W̄.
We also need to show that φ_y(W̄) ⊂ W̄ for y sufficiently close to b := F(a). Let δ = |A|ε/2 and V = (b − δ, b + δ). Then for y ∈ V we have
$$|\varphi_y(a) - a| = \left| a + \frac{y - F(a)}{A} - a \right| = \frac{|y - b|}{|A|} < \frac{\delta}{|A|} = \frac{\varepsilon}{2}.$$
So if x ∈ W̄, then
$$|\varphi_y(x) - a| \le |\varphi_y(x) - \varphi_y(a)| + |\varphi_y(a) - a| \le \frac{|x - a|}{2} + \frac{\varepsilon}{2} \le \varepsilon,$$
and hence φ_y(x) ∈ W̄.
Hence, if y ∈ V, then φ_y : W̄ → W̄ has a unique fixed point G(y) ∈ W, which depends continuously on y.
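The proof is constructive: for y ∈ V, iterating φ_y starting from a converges to G(y) = F^{−1}(y). A minimal Python sketch, assuming (without checking) that y is close enough to F(a) for φ_y to map W̄ into itself, uses the illustrative choice F = sin and a = 0, so that F′(a) = 1 and the exact local inverse is arcsin. It also compares a central difference quotient of G with 1/F′(G(y)), the derivative formula stated in Theorem 3.4.1. The name `local_inverse`, the step h and the tolerances are our own choices.

```python
import math


def local_inverse(F, dF_a, a, y, tol=1e-13, max_iter=1000):
    """Solve F(x) = y near a by iterating phi_y(x) = x + (y - F(x)) / F'(a).

    Assumes y is close enough to F(a) that phi_y is a contraction of a
    closed neighbourhood of a into itself; this is not checked here.
    """
    x = a
    for _ in range(max_iter):
        x_new = x + (y - F(x)) / dF_a  # one application of phi_y
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence: y may be too far from F(a)")


def G(y):
    # local inverse of sin near a = 0, where F'(0) = cos(0) = 1
    return local_inverse(math.sin, 1.0, 0.0, y)


y = 0.3
x = G(y)
h = 1e-5
dG = (G(y + h) - G(y - h)) / (2 * h)  # central difference quotient
print(x, math.asin(y))
print(dG, 1.0 / math.cos(x))  # (F^{-1})'(y) = 1/F'(x)
```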
Next we prove that the inverse is differentiable: for y = F(x) ∈ V we will show that G′(y) = 1/B, where B := F′(G(y)).
Hence, we have
$$\frac{|h|}{2} \le \left|\frac{k}{A}\right| < \frac{|k|}{\alpha} \quad\text{and}\quad \frac{1}{|k|} < \frac{2}{\alpha |h|}.$$
Since G(y + k) − G(y) − k/B = h − k/B = −(F(x + h) − F(x) − Bh)/B, we obtain
Remark 3.4.2. The above proof may look rather technical, but one should keep in mind that the geometric picture is rather straightforward. Consider the graph y = F(x). The condition that F′(a) ≠ 0 (together with the continuity of F′ at a) implies that the graph is locally monotonically increasing or decreasing near (x, y) = (a, F(a)). For F to be invertible, we need the graph y = F(x) to be viewable also as the graph of x as a function of y. Crucially, this requires that locally each point y in the range has a unique preimage x in the domain. In the graph, this means that the curve y = F(x), when rotated by 90°, still has the form of the graph of a function near y = F(a). Problems arise only when F has a local minimum or maximum at a, which implies that F′(a) = 0. In that case, clearly F is not locally invertible near this point.
Example 3.4.4. Let F(x) = sin(x). We have F′(0) = 1, so F is invertible near 0. Being assured that the inverse exists locally, it makes sense to derive a Taylor expansion of it. Let G = F^{−1} be defined in a small neighbourhood of F(0) = 0, where it satisfies
We can interpret this as saying that A(x, y) = 0 implicitly defines a map L : R^p → R^m such that A(Ly, y) = 0. The crucial condition transpiring from this manipulation is that A_1 needs to be invertible.
The IFT asserts that this property extends naturally to nonlinear maps F : R^m × R^p → R^m in the neighbourhood of a point (a, b) where F(a, b) = 0; the corresponding condition is that D_1F(a, b), the derivative with respect to the first variable, is invertible. The IFT is closely related to the Inverse Function Theorem, and can be derived directly from it.
Example 3.4.8. Let F : R × R → R be given by F(x, λ) = sin(x) + λ. We know that F(0, 0) = 0 and would like to know whether roots exist near x = 0 if λ is small. Since D_1F(0, 0) = cos(0) = 1 ≠ 0, the IFT asserts that if λ is small, there exists a unique x(λ) near 0 such that F(x(λ), λ) = 0.
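In this example the root is explicitly x(λ) = −arcsin(λ), so a numerical solution can be checked. The sketch below, in the spirit of the map φ_y in the proof of Theorem 3.4.1, iterates x ↦ x − F(x, λ)/D_1F(0, 0) for a few small values of λ (chosen for illustration) and estimates x′(0), which implicit differentiation of F(x(λ), λ) = 0 predicts to be −D_2F(0, 0)/D_1F(0, 0) = −1. The helper name `solve_x` is hypothetical.

```python
import math


def solve_x(lam, x0=0.0, tol=1e-14, max_iter=200):
    """Root of F(x, lam) = sin(x) + lam near x = 0, found by iterating
    x -> x - F(x, lam) / D1F(0, 0), where D1F(0, 0) = cos(0) = 1."""
    x = x0
    for _ in range(max_iter):
        x_new = x - (math.sin(x) + lam)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence; lam may be too large")


lams = [-0.2, -0.1, 0.0, 0.1, 0.2]
xs = [solve_x(lam) for lam in lams]
print([(lam, x, -math.asin(lam)) for lam, x in zip(lams, xs)])

# implicit differentiation predicts x'(0) = -D2F(0,0) / D1F(0,0) = -1
slope = (solve_x(1e-6) - solve_x(-1e-6)) / 2e-6
print(slope)
```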
Example 3.4.9 (Persistence of transverse intersections). Consider two curves in the plane R², given in parametrized form by f, g : R → R². Then the intersection points of these curves correspond to roots of the map h : R² → R² with h(s, t) = f(s) − g(t). Suppose the curves intersect at f(s) = g(t) with (s, t) = (0, 0). Writing f(s) = (f_1(s), f_2(s))^T and g(t) = (g_1(t), g_2(t))^T, we obtain
$$Dh(0, 0) = \begin{pmatrix} \frac{df_1}{ds}(0) & -\frac{dg_1}{dt}(0) \\ \frac{df_2}{ds}(0) & -\frac{dg_2}{dt}(0) \end{pmatrix}.$$
The first column vector is the tangent vector to the curve f, and the second column is minus the tangent vector to the curve g. Namely, thinking of the tangent as the best linear approximation to the curve, we find
$$f(s) = f(0) + s\,\frac{df}{ds}(0) + O(s^2),$$
so that indeed df/ds(0) = (df_1/ds(0), df_2/ds(0))^T is the tangent vector at s = 0.
Suppose now that the curves depend smoothly on some parameter λ ∈ R, yielding parametrizations f_λ and g_λ. Then the intersections are given by roots of h_λ(s, t) = f_λ(s) − g_λ(t). Suppose further that at λ = 0 there is an intersection of the curves at (s, t) = (0, 0). We would like to understand what happens to this intersection if λ is perturbed away from 0.
It follows from the IFT that if h_0(0, 0) = 0 and Dh_0(0, 0) is nonsingular, then for sufficiently small λ there exist smooth functions s(λ) and t(λ) such that h_λ(s(λ), t(λ)) = 0, and these functions describe the unique solutions near (0, 0). We refer to this locally smooth variation of the intersection point as persistence.
The condition that Dh_0(0, 0) is nonsingular is related to transversality. We call the linear subspace generated by the tangent vector to the curve f transversal to the linear subspace generated by the tangent vector to the curve g if these tangent vectors span R². Two vectors span R² exactly when they are linearly independent, which is equivalent to the nonsingularity condition det(Dh_0(0, 0)) ≠ 0 (changing the sign of the second column does not affect this). We call the intersection of the two curves transverse if the corresponding tangent vectors span R².
We thus obtain the result that transverse intersections of curves in the plane are persistent. This illustrates a more general theorem stating that transverse intersections are persistent. It turns out, moreover, that intersections of curves are typically transverse.
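Persistence can be observed numerically. The Python sketch below uses two hypothetical curves chosen for illustration, f_λ(s) = (s, s + λ) and g(t) = (t, t² − t), which intersect transversally at the origin for λ = 0: the tangent vectors (1, 1) and (1, −1) span R², and det Dh_0(0, 0) = 2 ≠ 0. For small λ it locates the intersection by Newton's method in two variables (a standard choice, not one prescribed by the text) applied to h_λ(s, t) = f_λ(s) − g(t), and compares the result with the exact root s = t = 1 − √(1 + λ), obtained by solving s² − 2s − λ = 0.

```python
import math


def h(s, t, lam):
    """h_lambda(s, t) = f_lambda(s) - g(t) for the hypothetical curves
    f_lambda(s) = (s, s + lam) and g(t) = (t, t**2 - t)."""
    return (s - t, (s + lam) - (t * t - t))


def Dh(s, t):
    """Jacobian of h: columns are df/ds = (1, 1) and -dg/dt = (-1, 1 - 2t)."""
    return ((1.0, -1.0), (1.0, 1.0 - 2.0 * t))


def intersect(lam, s=0.0, t=0.0, tol=1e-14, max_iter=50):
    """Newton's method for h_lambda(s, t) = 0, started at (0, 0)."""
    for _ in range(max_iter):
        (a, b), (c, d) = Dh(s, t)
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("Dh is singular: intersection not transverse")
        r1, r2 = h(s, t, lam)
        # solve Dh (ds, dt)^T = -(r1, r2)^T by Cramer's rule
        ds = (-r1 * d + b * r2) / det
        dt = (-a * r2 + c * r1) / det
        s, t = s + ds, t + dt
        if abs(ds) + abs(dt) < tol:
            return s, t
    raise RuntimeError("no convergence; lam may be too large")


for lam in (-0.1, -0.01, 0.0, 0.01, 0.1):
    s, t = intersect(lam)
    print(lam, s, t, 1.0 - math.sqrt(1.0 + lam))
```

The intersection point moves smoothly with λ, as the IFT predicts for a transverse intersection.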
Remark 3.4.10. We note that the Inverse and Implicit Function Theorems can be proven not only in R^m but also in more general Banach spaces (complete normed vector spaces). There are many important examples of (infinite-dimensional) function spaces that are Banach spaces.