
Contractions: 3.1 Metric Spaces


Chapter 3

Contractions

In this chapter we discuss contractions, which are of fundamental importance in analysis
and are essential tools for proving properties of ODEs. Before we discuss them, we first
need to introduce some background on the setting of metric spaces.

3.1 Metric spaces


Definition 3.1.1. Consider a set X, and a distance function d : X × X → R satisfying
1. d(x, y) = d(y, x) (symmetric),

2. d(x, y) = 0 ⇔ x = y

3. d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)


for all x, y, z ∈ X. Then (X, d) is called a metric space.
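Note that it follows from the above conditions that the distance function is non-negative:
taking z = x in the triangle inequality gives 2d(x, y) = d(x, y) + d(y, x) ≥ d(x, x) = 0, so
d(x, y) ≥ 0. Together with condition 2, d is therefore positive definite.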
We introduce a few elementary notions in metric spaces. We define the open r-ball at x ∈ X
as
B(x, r) := {y ∈ X | d(x, y) < r}.
A set A ⊂ X is called bounded if it is contained in an r-ball for some r < ∞, and open if for
all x ∈ A there exists an r such that B(x, r) ⊂ A. The interior of a set is the union of all
its open subsets. Any open subset of X containing x is called a neighbourhood of x ∈ X. A
point x ∈ X is a boundary point of a subset A ⊂ X if for all neighbourhoods U of x, we have
U ∩ A ≠ ∅ and U \ A ≠ ∅. The boundary ∂A of A ⊂ X is the set of all boundary points of A.
The closure of A is defined as the set

Ā := {x ∈ X | B(x, r) ∩ A ≠ ∅, ∀r > 0}.

A set A is closed if A = Ā, and A ⊂ X is dense in X if Ā = X. A set A is nowhere dense if
its closure has empty interior.


A point x ∈ X is called an accumulation point of a set A ⊂ X if all balls B(x, ε) intersect
A \ {x}. The set of accumulation points of A is called the derived set A′. A set A is closed if
and only if A′ ⊂ A, and Ā = A ∪ A′. A is called perfect if A = A′.
We say that a sequence {x_n}_{n∈N} converges to x ∈ X if ∀ε > 0 there exists N ∈ N such that
∀n ≥ N we have d(x_n, x) < ε. We say that two sequences {x_n}_{n∈N} and {y_n}_{n∈N} converge
exponentially (or with exponential speed) to each other if d(x_n, y_n) < c d^n for some c > 0 and
0 ≤ d < 1 (here d denotes a number, not the metric). A sequence {x_n}_{n∈N} is a Cauchy sequence
if ∀ε > 0 there exists N ∈ N such that d(x_i, x_j) < ε whenever i, j ≥ N.
A metric space is called complete if every Cauchy sequence converges in it.
Examples of complete metric spaces are R^n with the (usual) Euclidean metric, and all closed
subsets of R^n with this metric.

3.2 The Contraction Mapping Theorem


Definition 3.2.1 (Contraction). A map F : X → X, where (X, d) is a metric space, is a
contraction if there exists K with 0 ≤ K < 1 such that

d(F (x), F (y)) ≤ Kd(x, y), ∀x, y ∈ X. (3.2.1)

A condition of the type (3.2.1) is called a Lipschitz condition, where K ≥ 0 is called the
Lipschitz constant. Contractions are thus Lipschitz maps with a Lipschitz constant that is
smaller than 1.
We now formulate the central result about contractions.
Theorem 3.2.2 (Contraction mapping theorem). Let X be a complete metric space, and F :
X → X be a contraction. Then F has a unique fixed point, and under the action of iterates of
F : X → X, all points converge with exponential speed to it.
Proof. Iterating d(F(x), F(y)) ≤ K d(x, y) gives

d(F^n(x), F^n(y)) ≤ K^n d(x, y), (3.2.2)

with x, y ∈ X and n ∈ N. Thus (F^n(x))_{n∈N} is a Cauchy sequence, because with m > n we have

\[ d(F^m(x), F^n(x)) \le \sum_{k=0}^{m-n-1} d(F^{n+k+1}(x), F^{n+k}(x)) \le \sum_{k=0}^{m-n-1} K^{n+k}\, d(F(x), x) \le \frac{K^n}{1-K}\, d(F(x), x) \]

and K^n → 0 as n → ∞. In the last step we used the fact that with 0 ≤ K < 1 it follows that

\[ \sum_{k=0}^{m-n-1} K^k \le \sum_{k=0}^{\infty} K^k = \frac{1}{1-K}. \]

Thus the limit lim_{n→∞} F^n(x) exists because Cauchy sequences converge in X. We denote
the limit x_0. By (3.2.2) under iteration by F all points in X converge to the same point as
lim_{n→∞} d(F^n(x), F^n(y)) = 0 for all x, y ∈ X so that if x converges to x_0 then so does any
y ∈ X.

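It remains to be shown that x_0 is a fixed point of F: F(x_0) = x_0. Using the triangle
inequality we have

d(x_0, F(x_0)) ≤ d(x_0, F^n(x)) + d(F^n(x), F^{n+1}(x)) + d(F^{n+1}(x), F(x_0))
≤ (1 + K) d(x_0, F^n(x)) + K^n d(x, F(x)),

for all x ∈ X and n ∈ N. The right-hand side of this inequality tends to zero as n → ∞, and
hence F(x_0) = x_0. The fixed point is unique, since any fixed point y satisfies y = F^n(y) → x_0,
and the convergence is exponential because d(F^n(x), x_0) = d(F^n(x), F^n(x_0)) ≤ K^n d(x, x_0)
by (3.2.2).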
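
The estimate used in the proof can be checked numerically. The following is a minimal sketch, assuming the example map F = cos on [0, 1] (not taken from the text): it maps [0, 1] into itself and satisfies |F′(x)| = |sin x| ≤ sin(1) < 1 there, so K = sin(1) is an admissible contraction constant. The helper name orbit is hypothetical. Letting m → ∞ in the Cauchy estimate gives d(F^n(x), x_0) ≤ K^n d(F(x), x)/(1 − K), which is the bound compared against below.

```python
import math

def orbit(F, x, n):
    """Return the list [x, F(x), ..., F^n(x)]."""
    points = [x]
    for _ in range(n):
        points.append(F(points[-1]))
    return points

# Assumed example: F = cos on [0, 1], with contraction constant K = sin(1).
K = math.sin(1.0)
xs = orbit(math.cos, 1.0, 200)
x_star = xs[-1]                    # numerical stand-in for the fixed point x_0
first_step = abs(xs[1] - xs[0])    # d(F(x), x)

checks = []
for n in (1, 5, 10, 20):
    error = abs(xs[n] - x_star)                    # d(F^n(x), x_0)
    bound = K ** n / (1.0 - K) * first_step        # K^n / (1 - K) * d(F(x), x)
    checks.append((n, error, bound))
    print(n, error, bound)
```

The bound is far from sharp here, because the true contraction rate near the fixed point is smaller than sin(1); the point of the sketch is only that the observed error never exceeds it.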

3.3 The derivative test


We show that in the case of a differentiable map F : X → X, one can use the derivative to
prove that it is a contraction (on some bounded closed subset of the phase space).
We first consider the situation that F : I → I, where I ⊂ R is a closed bounded interval.

Proposition 3.3.1. Let I ⊂ R be a closed bounded interval, and F : I → I a continuously
differentiable (C^1) function with |F′(x)| < 1 for all x ∈ I. Then F is a contraction.

Proof. First we show that if |F′(x)| ≤ K for all x ∈ I then F is Lipschitz with Lipschitz
constant K. By the Mean Value Theorem, for any two points x, y ∈ I there exists a c between x
and y such that

d(F(x), F(y)) = |F(x) − F(y)| = |F′(c)(x − y)| = |F′(c)| d(x, y) ≤ K d(x, y).

Since F′ is continuous and I is closed and bounded, the maximum of |F′(x)| is attained at some
point x_0 ∈ I, and |F′(x_0)| < 1. Hence we may take K := |F′(x_0)| < 1.

Remark 3.3.2. The conclusion of Proposition 3.3.1 does not necessarily apply if the domain of
F is taken to be the entire real line.
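
A possible illustration of this remark, under the assumption of a specific counterexample that does not appear in the text: F(x) = sqrt(1 + x^2) on R has |F′(x)| = |x|/sqrt(1 + x^2) < 1 everywhere, but sup |F′| = 1, so no constant K < 1 works, and F(x) > x for all x, so there is no fixed point. The sketch below samples the derivative and follows one orbit, which grows like sqrt(n) instead of converging.

```python
import math

def F(x):
    # Assumed counterexample (not from the text): F(x) = sqrt(1 + x^2) on all of R
    return math.sqrt(1.0 + x * x)

def dF(x):
    # F'(x) = x / sqrt(1 + x^2), strictly between -1 and 1
    return x / math.sqrt(1.0 + x * x)

samples = [-100.0, -1.0, 0.0, 1.0, 100.0]
slopes = [abs(dF(x)) for x in samples]   # all below 1, but approaching 1 for large |x|
gaps = [F(x) - x for x in samples]       # all positive, so F(x) = x has no solution here

# Orbit of 0: F^n(0) = sqrt(n), which is unbounded.
x = 0.0
trajectory = [x]
for _ in range(10000):
    x = F(x)
    trajectory.append(x)
print(slopes, trajectory[-1])
```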

Example 3.3.3 (Fibonacci's rabbits). Leonardo Pisano, better known as Fibonacci, tried to
understand how many pairs of rabbits can be grown from one pair in one year. He figured out
that each pair breeds a pair every month, but a newborn pair only breeds in the second month
after birth. Let b_n denote the number of rabbit pairs at time n. Let b_0 = 1; in the first
month they breed one pair, so b_1 = 2. At time n = 2, again one pair is bred (from the one that
was around at time n = 1; the other one does not yet have the required age to breed). Hence,
b_2 = b_1 + b_0. Subsequently, b_{n+1} = b_n + b_{n−1}. Expecting the growth to be exponential,
we would like to see how fast these numbers grow, by calculating a_n = b_{n+1}/b_n. Namely, if
b_n ≈ c d^n as n → ∞ for some c, d then b_{n+1}/b_n → d. We have
\[ a_{n+1} = \frac{b_{n+2}}{b_{n+1}} = \frac{1}{a_n} + 1. \]

Thus {a_n}_{n∈N} is the orbit of a_0 = b_1/b_0 = 2 under the map g(x) = 1/x + 1. We have
g′(x) = −x^{−2}, so |g′(x)| ≥ 1 for 0 < x ≤ 1. Thus g is not a contraction on (0, ∞). But we
note that a_0 = 2 and consider the map g on the closed interval [3/2, 2]. We have
g(3/2) = 5/3 > 3/2 and g(2) = 3/2. Since g is decreasing, g([3/2, 2]) = [3/2, 5/3] ⊂ [3/2, 2].
Furthermore, for x ∈ [3/2, 2] we have |g′(x)| = 1/x^2 ≤ 4/9 < 1 so that g is a contraction
on [3/2, 2]. Hence, by the contraction mapping theorem, there exists a unique fixed point, so
lim_{n→∞} a_n exists. The limit is a fixed point of g(x), yielding x^2 − x − 1 = 0. The only
positive root of this equation is x = (1 + √5)/2.
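
The convergence of the ratios can be observed directly. The sketch below is only an illustration: it builds the rabbit counts with b_0 = 1 and b_1 = 2 as in the example, forms a_n = b_{n+1}/b_n, and compares with (1 + √5)/2. The variable names are arbitrary.

```python
import math

def g(x):
    # The map from the example: g(x) = 1/x + 1
    return 1.0 / x + 1.0

# Rabbit counts with b_0 = 1, b_1 = 2 and b_{n+1} = b_n + b_{n-1}
b = [1, 2]
while len(b) < 40:
    b.append(b[-1] + b[-2])

ratios = [b[n + 1] / b[n] for n in range(len(b) - 1)]   # a_n = b_{n+1} / b_n
golden = (1.0 + math.sqrt(5.0)) / 2.0
errors = [abs(a - golden) for a in ratios]

for n in range(6):
    print(n, ratios[n], errors[n])
```

Each error should be at most 4/9 of the previous one, in line with the bound |g′(x)| ≤ 4/9 on [3/2, 2].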
Example 3.3.4 (Newton's method). Finding the roots (preimages of zero) of a function F :
R → R is difficult in general. Newton's method is an approach to find such roots through
iteration. The idea is rather straightforward. Suppose x_0 is a guess for a root. We would like
to improve our guess by choosing an improved approximation x_1. We write the first order Taylor
expansion of F at x_1 in terms of our knowledge about F at x_0: F(x_1) ≈ F(x_0) + F′(x_0)(x_1 − x_0).
By setting F(x_1) = 0 (our aim), we obtain from the Taylor expansion that

x_1 = x_0 − F(x_0)/F′(x_0) =: G(x_0). (3.3.1)

We note that a fixed point y of G corresponds to a root of F if F′(y) ≠ 0. We call a fixed point
y of a differentiable map G superattracting if G′(y) = 0. We have
Proposition 3.3.5. If |F′(x)| > δ for some δ > 0 and |F″(x)| < M for some M < ∞ on a
neighbourhood of a root r (satisfying F(r) = 0), then r is a superattracting fixed point of G (cf.
(3.3.1)).
Proof. We observe that G′(x) = F(x)F″(x)/(F′(x))^2, so that |G′(x)| ≤ M|F(x)|/δ^2 near r and
G′(r) = 0 since F(r) = 0. By continuity of F we have |G′| < 1 on a sufficiently small closed
interval around r, so G is a contraction on a neighbourhood of r.
Note that if we consider the map G : C → C instead of G : R → R, the iterates behave in
a much more complicated way.
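
The iteration (3.3.1) is short enough to try out. The following is a sketch under the assumption of a test function chosen for illustration, F(x) = x^2 − 2 with root √2; the function name newton and its parameters are hypothetical, not part of the text.

```python
def newton(F, dF, x0, steps):
    """Iterate G(x) = x - F(x)/F'(x) from (3.3.1) and return the whole orbit."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - F(x) / dF(x))
    return xs

def F(x):
    # Assumed test function (not from the text)
    return x * x - 2.0

def dF(x):
    return 2.0 * x

xs = newton(F, dF, 1.0, 6)
errors = [abs(x - 2.0 ** 0.5) for x in xs]
for n, e in enumerate(errors):
    print(n, xs[n], e)
```

The printed errors roughly square at each step, which is the practical meaning of the fixed point being superattracting: since G′(r) = 0, the convergence is faster than any fixed exponential rate K^n.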
There is a higher dimensional version of this result, which requires us to introduce the notion
of the derivative DF of a map F : R^m → R^m:

\[ DF(x)y = \lim_{\varepsilon \to 0} \frac{F(x + \varepsilon y) - F(x)}{\varepsilon}. \]

Making a Taylor expansion of F in ε, and denoting F = (F_1, ..., F_m) where F_i denotes the ith
component of the map, we obtain

F_i(x + εy) = F_i(x) + ε∇F_i(x) · y + o(ε),

yielding that (DF(x)y)_i = ∇F_i(x) · y. In other words, DF(x) is a linear map from R^m to R^m
which we may represent by the so-called Jacobian matrix

\[ DF(x) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1}(x) & \cdots & \frac{\partial F_1}{\partial x_m}(x) \\ \vdots & & \vdots \\ \frac{\partial F_m}{\partial x_1}(x) & \cdots & \frac{\partial F_m}{\partial x_m}(x) \end{pmatrix} \]

where x_i denotes the ith component of the vector x = (x_1, ..., x_m). For this derivation to be
meaningful, we need the first derivative of F_i with respect to x_j for all i, j = 1, ..., m to exist.
If one of these does not exist then the map F is not differentiable.
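
The limit defining DF(x)y suggests a simple numerical approximation: take y to be the jth basis vector and a small ε to estimate the jth column of the Jacobian matrix. The sketch below does this for a hypothetical sample map R^2 → R^2 (chosen only for illustration) and compares with the Jacobian computed by hand. The step size eps = 1e-7 is an assumption, and the result is only accurate to roughly that order.

```python
import math

def jacobian_forward(F, x, eps=1e-7):
    """Approximate DF(x) by the difference quotient (F(x + eps*e_j) - F(x)) / eps."""
    m = len(x)
    Fx = F(x)
    J = [[0.0] * m for _ in range(m)]
    for j in range(m):
        shifted = list(x)
        shifted[j] += eps                      # x + eps * e_j
        Fs = F(shifted)
        for i in range(m):
            J[i][j] = (Fs[i] - Fx[i]) / eps    # approximates dF_i/dx_j
    return J

def sample_map(x):
    # Hypothetical map, not from the text: F(x1, x2) = (x1^2 * x2, x1 + sin(x2))
    return [x[0] ** 2 * x[1], x[0] + math.sin(x[1])]

def sample_jacobian(x):
    # Exact Jacobian of sample_map, computed by hand
    return [[2.0 * x[0] * x[1], x[0] ** 2], [1.0, math.cos(x[1])]]

point = [1.0, 2.0]
J_numeric = jacobian_forward(sample_map, point)
J_exact = sample_jacobian(point)
print(J_numeric)
print(J_exact)
```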

For completeness, we now state the derivative test in R^n without proof. Recall that a
strictly convex set C ⊂ R^n is a set C such that for all a, b ∈ C̄, the line segment with endpoints
a, b is entirely contained in C, except possibly for one or both endpoints. Also, the norm
||A|| of a linear map A is defined by ||A|| := max_{|v|=1} |A(v)|.

Theorem 3.3.6. If C ⊂ R^n is an open strictly convex set, C̄ its closure, F : C̄ → C̄
differentiable on C and continuous on C̄ with ||DF|| ≤ K < 1 on C, then F has a unique fixed
point x_0 ∈ C̄ and d(F^n(x), x_0) ≤ K^n d(x, x_0) for every x ∈ C̄.

We note that this result is in agreement with the fact that equilibria of linear autonomous
ODEs with all eigenvalues having negative real part are asymptotically stable (with exponential
convergence).

3.4 The Inverse and Implicit Function Theorems


The inverse function theorem says that if a differentiable map has invertible derivative at some
point, then the map is invertible near that point. It is thus related to "linearizability": if the
linearization of a map at a point is invertible, then so is the nonlinear map in a neighbourhood
of this point.
We first consider the simplest version of the inverse function theorem, in R.

Theorem 3.4.1 (Inverse function theorem in R). Suppose I ⊂ R is an open interval and
F : I → R is a differentiable function. If a ∈ I is such that F′(a) ≠ 0 and F′ is continuous at a,
then F is invertible on a neighbourhood U of a and for all x ∈ U we have (F^{−1})′(y) = 1/F′(x),
where y = F(x).

Proof. The proof is by application of the contraction mapping theorem. We consider the map

\[ \varphi_y(x) = x + \frac{y - F(x)}{F'(a)} \]

on I. Fixed points of φ_y are solutions of our problem since φ_y(x) = x if and only if F(x) = y.
We now show that φ_y is a contraction in some closed neighbourhood of a ∈ I. Then by the
contraction mapping theorem, φ_y has a unique fixed point, and hence there exists a unique x
such that F(x) = y for y close enough to F(a).
Let A = F′(a) and α := |A|/2. By continuity of F′ at a there is an ε > 0 such that with
W := (a − ε, a + ε) ⊂ I we have |F′(x) − A| < α for x in the closure W̄ of W.
To see that φ_y is a contraction on W̄ we observe that if x ∈ W̄ we have

\[ |\varphi_y'(x)| = \left| 1 - \frac{F'(x)}{A} \right| = \left| \frac{A - F'(x)}{A} \right| < \frac{\alpha}{|A|} = \frac{1}{2}. \]

Now, arguing as in the proof of Proposition 3.3.1, we obtain |φ_y(x) − φ_y(x′)| ≤ |x − x′|/2 for
all x, x′ ∈ W̄.

We also need to show that φ_y(W̄) ⊂ W̄ for y sufficiently close to b := F(a). Let δ = |A|ε/2
and V = (b − δ, b + δ). Then for y ∈ V we have

\[ |\varphi_y(a) - a| = \left| a + \frac{y - F(a)}{A} - a \right| = \left| \frac{y - b}{A} \right| < \frac{\delta}{|A|} = \frac{\varepsilon}{2}. \]

So if x ∈ W̄ then

\[ |\varphi_y(x) - a| \le |\varphi_y(x) - \varphi_y(a)| + |\varphi_y(a) - a| \le \frac{|x - a|}{2} + \frac{\varepsilon}{2} \le \varepsilon, \]

and hence φ_y(x) ∈ W̄.
Hence, if y ∈ V then φ_y : W̄ → W̄ has a unique fixed point G(y) ∈ W which depends
continuously on y.
Next we prove that the inverse is differentiable: for y = F(x) ∈ V we will show that
G′(y) = 1/B where B := F′(G(y)).
Let U := G(V) = W ∩ F^{−1}(V), which is open. Take y + k = F(x + h) ∈ V. Then

\[ \frac{|h|}{2} \ge |\varphi_y(x + h) - \varphi_y(x)| = \left| h + \frac{F(x) - F(x + h)}{A} \right| = \left| h - \frac{k}{A} \right| \ge |h| - \left| \frac{k}{A} \right|. \]

Hence, we have

\[ \frac{|h|}{2} \le \left| \frac{k}{A} \right| < \frac{|k|}{\alpha} \quad\text{and}\quad \frac{1}{|k|} < \frac{2}{\alpha |h|}. \]

Since G(y + k) − G(y) − k/B = h − k/B = −(F(x + h) − F(x) − Bh)/B we obtain

\[ \frac{|G(y + k) - G(y) - k/B|}{|k|} < \frac{2}{|B|\alpha}\, \frac{|F(x + h) - F(x) - Bh|}{|h|} \to 0 \quad\text{as } |h| \le |k|/\alpha \to 0. \]

This proves that G′(y) = 1/B.
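
The map φ_y from the proof also gives a practical way to compute the inverse. The following is a minimal sketch, assuming the example F = sin with a = 0, so that A = F′(0) = 1 and b = F(a) = 0; the function name solve_by_contraction and the step count are hypothetical choices. With these assumptions α = 1/2, and ε = 1 works in the proof because |cos x − 1| < 1/2 for |x| ≤ 1, which gives δ = 1/2.

```python
import math

def solve_by_contraction(F, A, y, x_start, steps=60):
    """Iterate phi_y(x) = x + (y - F(x)) / A, where A stands for F'(a)."""
    x = x_start
    for _ in range(steps):
        x = x + (y - F(x)) / A
    return x

# Assumed example: F = sin, a = 0, A = F'(0) = 1, and a target y inside V = (-1/2, 1/2)
y = 0.3
x = solve_by_contraction(math.sin, 1.0, y, 0.0)
print(x, math.sin(x), math.asin(y))
```

Note that the iteration only uses the derivative at the single point a, unlike Newton's method, which re-evaluates the derivative at every step.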

Remark 3.4.2. The above proof may look rather technical, but one should keep in mind that
the geometrical picture is rather straightforward. Consider the graph y = F(x). The condition
that F′(a) ≠ 0 implies that the graph is locally monotonically increasing or decreasing near
(x, y) = (a, F(a)). For F to be invertible, we need the property that the graph y = F(x) can
also be seen as a graph of x as a function of y. Crucially we need for this the property that
locally each point (y) in the range has a unique preimage (x) in the domain. In the graph,
this means that the curve y = F(x), when rotated by 90°, still has the form of a graph of a
function near y = F(a). Problems arise only when F has a local minimum or maximum at a, which
implies that F′(a) = 0. In that case, clearly F is not locally invertible near this point.

Remark 3.4.3. In Theorem 3.4.1, if F is C^r then it can be shown that F^{−1} is C^r as well.

Example 3.4.4. Let F(x) = sin(x). We have F′(0) = 1. Hence, F is invertible near 0.
Being assured of the fact that the inverse locally exists, it makes sense to derive a Taylor
expansion of it. Let G = F^{−1} be defined in a small neighbourhood of F(0) = 0, where it satisfies
G(sin(x)) = x. We obtain a Taylor expansion of G by substituting the Taylor expansion of
sin(x) and that of G in this equation and resolving the equality at each order in x. We write
G(y) = ay + by^2 + cy^3 + dy^4 + O(y^5) and sin(x) = x − (1/6)x^3 + O(x^5). Matching Taylor
coefficients we obtain a = 1, b = 0, c = 1/6 and d = 0 so that

\[ F^{-1}(x) = x + \frac{1}{6}x^3 + O(x^5). \]

(Note that without the knowledge about the inverse function theorem, one could still try to
find a Taylor expansion of the inverse, but one would not know - in principle - whether this
expansion would converge and thus whether this was the expansion of an existing inverse.)
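
As a quick sanity check of the truncated expansion, one can compare it with the library arcsine. The sketch below assumes nothing beyond the expansion G(y) = y + y^3/6 + O(y^5) from the example; the name G_truncated is hypothetical. If the remainder is really of order y^5, then remainder/y^5 should stay bounded as y shrinks.

```python
import math

def G_truncated(y):
    # Truncated expansion from the example: G(y) = y + y^3/6, with an O(y^5) remainder
    return y + y ** 3 / 6.0

scaled_remainders = []
for y in (0.2, 0.1, 0.05):
    remainder = math.asin(y) - G_truncated(y)
    scaled_remainders.append(remainder / y ** 5)
    print(y, remainder, remainder / y ** 5)
```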
Without too much difficulty (replacing some numbers by linear maps and some absolute
values by matrix norms) a similar result can be proven for maps of R^m. (We leave this as an
exercise.)
Theorem 3.4.5 (Inverse function theorem in R^m). Suppose O ⊂ R^m is open, F : O → R^m
differentiable, and DF is invertible at a point a ∈ O and continuous at a. Then there exist
neighbourhoods U ⊂ O of a and V of b := F(a) ∈ R^m such that F is a bijection from U to V
[i.e. F is one-to-one on U and F(U) = V]. The inverse G : V → U of F is differentiable with
DG(y) = (DF(G(y)))^{−1}. Furthermore, if F is C^r on U, then so is its inverse (on V).
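Example 3.4.6. Consider the map F : R^2 → R^2 defined by

\[ F\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^2 - y \\ -x \end{pmatrix}. \]

Then

\[ DF(x, y) = \begin{pmatrix} 2x & -1 \\ -1 & 0 \end{pmatrix}, \]

from which it follows that DF(x, y) is invertible for all (x, y) ∈ R^2 since det(DF(x, y)) = −1,
and noninvertibility would require that this determinant is equal to zero. The fact that the
derivative is invertible for all (x, y) ∈ R^2 guarantees only that F is locally invertible near
each point; it does not by itself imply that F is invertible on all of R^2. In this example,
however, F is indeed globally invertible, and the inverse of F can be computed to be

\[ F^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ y^2 - x \end{pmatrix}. \]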
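
The inverse of the map F(x, y) = (x^2 − y, −x) from this example can be verified by composing the two maps. The sketch below is only a spot check at a few arbitrarily chosen points, with the inverse taken to be (u, v) ↦ (−v, v^2 − u); it is not a proof.

```python
def F(p):
    # Map from Example 3.4.6: F(x, y) = (x^2 - y, -x)
    x, y = p
    return (x * x - y, -x)

def F_inv(q):
    # Inverse: solving u = x^2 - y, v = -x gives x = -v, y = v^2 - u
    u, v = q
    return (-v, v * v - u)

points = [(0.0, 0.0), (1.5, -2.0), (-3.0, 0.25), (10.0, 7.0)]
round_trips = [F_inv(F(p)) for p in points]
reverse_trips = [F(F_inv(p)) for p in points]
print(round_trips)
print(reverse_trips)
```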
We now turn our attention to a result that is closely related to the inverse function theorem.
The Implicit Function Theorem (IFT) establishes, under the assumption of some conditions
on derivatives, that if we can solve an equation for a particular parameter value, then there
is a solution for nearby parameters as well. We illustrate the principle with a linear map
A : R^m × R^p → R^m. We write A := (A_1, A_2), where A_1 : R^m → R^m and A_2 : R^p → R^m are
linear. Suppose we pick y ∈ R^p and want to find x ∈ R^m so that A(x, y) = 0. To see when
this can be done, write A(x, y) = A_1 x + A_2 y = 0 as

A(x, y) = 0 ⇔ x = −(A_1)^{−1} A_2 y =: Ly. (3.4.1)



We can interpret this as saying that A(x, y) = 0 implicitly defines a map L : R^p → R^m such
that A(Ly, y) = 0. The crucial condition transpiring from this manipulation is that A_1 needs
to be invertible.
The IFT asserts that this property naturally extends to nonlinear maps F : R^m × R^p → R^m,
in the neighbourhood of a point (a, b) where F(a, b) = 0, the corresponding condition being
that D_1F(a, b) (denoting the derivative with respect to the first variable) is invertible. The
IFT is closely related to the Inverse Function Theorem, and can be derived directly from it.

Theorem 3.4.7 (Implicit Function Theorem in R^m). Let O ⊂ R^m × R^p be open and F : O → R^m
a C^r map. If there is a point (a, b) ∈ O such that F(a, b) = 0 and D_1F(a, b) is invertible,
then there are open neighbourhoods U ⊂ O of (a, b) and V ⊂ R^p of b such that for every y ∈ V
there exists a unique x =: G(y) ∈ R^m with (x, y) ∈ U and F(x, y) = 0. Furthermore, G is C^r
and DG(y) = −(D_1F(x, y))^{−1} D_2F(x, y).

Proof. The map H(x, y) := (F(x, y), y) : O → R^m × R^p is C^r, and DH(a, b)(x, y) =
(D_1F(a, b)x + D_2F(a, b)y, y). This is equal to (0, 0) only if y = 0 and D_1F(a, b)x = 0,
which implies that x = 0 if D_1F(a, b) is invertible. Hence DH(a, b) is invertible and by the
Inverse Function Theorem there are open neighbourhoods U ⊂ O of (a, b) and W ⊂ R^m × R^p
of (0, b) such that H : U → W is invertible with C^r inverse H^{−1} : W → U. Thus, for any
y ∈ V := {y ∈ R^p | (0, y) ∈ W} there exists an x =: G(y) ∈ R^m such that (x, y) ∈ U and
H(x, y) = (0, y), or equivalently F(x, y) = 0.
Now (G(y), y) = (x, y) = H^{−1}(0, y) and hence G is C^r. To find DG(b), let γ(y) :=
(G(y), y). Then F(γ(y)) = 0 and hence DF(γ(y))Dγ(y) = 0 by the chain rule. For y = b
this gives D_1F(a, b)DG(b) + D_2F(a, b) = DF(a, b)Dγ(b) = 0, completing the proof.

Example 3.4.8. Let F : R × R → R where F(x, λ) = sin(x) + λ. We know that F(0, 0) = 0 and
would like to know about the existence of roots near x = 0 if λ is small. Since D_1F(0, 0) = 1 ≠ 0
the IFT asserts that if λ is small, there exists a unique x(λ) near 0 such that F(x(λ), λ) = 0.
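
For F(x, λ) = sin(x) + λ the root branch can be followed numerically and its slope compared with the derivative formula DG = −(D_1F)^{−1} D_2F from the theorem. This is a sketch under stated assumptions: the root is found by a Newton iteration in x (one possible choice, not prescribed by the text), the slope is estimated by a central difference with a hypothetical step dlam, and the names solve_x, slope_fd and slope_ift are illustrative.

```python
import math

def F(x, lam):
    return math.sin(x) + lam

def solve_x(lam, x=0.0, steps=50):
    """Newton iteration in x for fixed lam, started at the known root x = 0 of F(., 0)."""
    for _ in range(steps):
        x = x - F(x, lam) / math.cos(x)    # D_1 F(x, lam) = cos(x)
    return x

lam, dlam = 0.2, 1e-6
x = solve_x(lam)
slope_fd = (solve_x(lam + dlam) - solve_x(lam - dlam)) / (2.0 * dlam)
slope_ift = -1.0 / math.cos(x)             # -(D_1 F)^{-1} D_2 F, with D_2 F = 1
print(x, slope_fd, slope_ift)
```

In this particular example the branch is also known in closed form, x(λ) = −arcsin(λ), which gives an independent check.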

Example 3.4.9 (Persistence of transverse intersections). Consider two curves in the plane
R^2. Let them have the parametrized form f, g : R → R^2. Then the intersection points of
these curves are roots of the map h : R^2 → R^2 with h(s, t) = f(s) − g(t). Suppose they
have an intersection f(s) = g(t) at (s, t) = (0, 0). Writing f(s) = (f_1(s), f_2(s))^T and
g(t) = (g_1(t), g_2(t))^T we obtain

\[ Dh(0, 0) = \begin{pmatrix} \frac{df_1}{ds}(0) & -\frac{dg_1}{dt}(0) \\ \frac{df_2}{ds}(0) & -\frac{dg_2}{dt}(0) \end{pmatrix}. \]

The first column vector is the tangent vector to the curve of f and the second column vector is
(up to sign) the tangent vector to the curve of g. Namely, thinking of the tangent as the best
linear approximation to the curve, we find

\[ f(s) = f(0) + s\,\frac{df}{ds}(0) + O(s^2), \]

so that indeed df/ds(0) = (df_1/ds(0), df_2/ds(0))^T is the tangent vector at s = 0.

Suppose now that the curves depend smoothly on some parameter λ ∈ R, yielding
parametrizations f_λ and g_λ; then the intersections are given by roots of
h_λ(s, t) = f_λ(s) − g_λ(t). Suppose that at λ = 0 there is an intersection of the curves at
(s, t) = (0, 0). We would like to understand what happens to this intersection if λ is
perturbed away from 0.
It follows from the IFT that if h_0(0, 0) = 0 and Dh_0(0, 0) is nonsingular, then for sufficiently
small λ there exist smooth functions s(λ) and t(λ) so that h_λ(s(λ), t(λ)) = 0, and these
functions describe the unique solutions near (0, 0). We refer to this locally smooth variation of
the intersection point as persistence.
The condition that Dh_0(0, 0) is nonsingular is related to transversality. We call the linear
subspace generated by the tangent vector to the curve for f transversal to the linear subspace
generated by the tangent vector to the curve for g if these tangent vectors span R^2. The latter
holds if and only if these vectors are linearly independent, which is identical to the
nonsingularity condition that det(Dh) ≠ 0. We call the intersection of the two curves transverse
if the corresponding tangent vectors span R^2.
We thus obtain the result that transverse intersections of curves in the plane are persistent.
This is an illustration of a more general theorem concerning the fact that transverse intersections
are persistent. It actually turns out that typically intersections of curves are transverse.
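
Persistence can be watched in a concrete case. The sketch below assumes two hypothetical curves that are not from the text: the parabola f_λ(s) = (s, s^2 + λ) and the line g(t) = (t, −t). At λ = 0 they meet at (s, t) = (0, 0) with tangent vectors (1, 0) and (1, −1), which span R^2, so the intersection is transverse. The root of h_λ is followed with a two-dimensional Newton iteration (one possible solver) and compared with the closed form s = t = (−1 + sqrt(1 − 4λ))/2 that holds for these particular curves.

```python
import math

def h(s, t, lam):
    # Assumed curves (hypothetical): f_lam(s) = (s, s^2 + lam), g(t) = (t, -t)
    return (s - t, s * s + lam + t)

def Dh(s, t):
    # Columns are df/ds and -dg/dt
    return ((1.0, -1.0), (2.0 * s, 1.0))

def intersection(lam, steps=30):
    """2D Newton iteration for h_lam(s, t) = 0, started at the lam = 0 intersection."""
    s, t = 0.0, 0.0
    for _ in range(steps):
        (a, b), (c, d) = Dh(s, t)
        h1, h2 = h(s, t, lam)
        det = a * d - b * c
        # Solve Dh * (ds, dt) = -(h1, h2) by Cramer's rule
        ds = (-h1 * d + b * h2) / det
        dt = (-a * h2 + c * h1) / det
        s, t = s + ds, t + dt
    return s, t

results = {}
for lam in (-0.1, 0.0, 0.05, 0.1):
    results[lam] = intersection(lam)
    print(lam, results[lam])
```

The intersection parameters move smoothly away from (0, 0) as λ varies, as the IFT predicts for a transverse intersection.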

Remark 3.4.10. We note that the Inverse and Implicit Function Theorems can be proven not
only in R^m but also in more general Banach spaces (which are complete normed vector spaces).
There are many important examples of (infinite dimensional) function spaces that are Banach
spaces.