
Math-UA.326.001: Analysis II Notes For The Inverse Function Theorem


Tim Austin
803 Warren Weaver Hall
tim@cims.nyu.edu
http://cims.nyu.edu/tim

1 The Contraction Mapping Principle

Suppose that $E \subseteq \mathbb{R}^n$ is closed and that $f : E \to E$ is a function.

Definition 1 (Fixed point). A fixed point for $f$ is a point $x \in E$ such that $f(x) = x$.

Definition 2 (Contraction). The function $f : E \to E$ is a contraction if there is a constant $c < 1$ such that

\[ \|f(x) - f(y)\| \le c\|x - y\| \quad \forall x, y \in E. \]

Remark 1. Intuitively, this definition means that $f$ makes all distances shorter; this is where the name "contraction" comes from. However, be aware that the definition of a contraction is a little bit stronger than just asking that $\|f(x) - f(y)\| < \|x - y\|$ whenever $x \ne y$, because the constant $c < 1$ is fixed independent of $x$ and $y$. We adopt the stronger definition for the sake of the Contraction Mapping Principle, stated below. If $f$ has only this weaker property of making distances shorter, it might not satisfy that principle: see Homework 8.
Lemma 3. Contractions are uniformly continuous.

Proof. For any $\varepsilon > 0$, if we let $\delta = \varepsilon$ then

\[ \|x - y\| < \delta \implies \|f(x) - f(y)\| \le c\|x - y\| < c\delta < \delta = \varepsilon. \]

Theorem 4 (The Contraction Mapping Principle). If $E \subseteq \mathbb{R}^n$ is nonempty and closed and $f : E \to E$ is a contraction, then $f$ has a unique fixed point. If $x$ is the fixed point and $y \in E$ is any other point, then the sequence of images

\[ y_1 = y, \quad y_2 = f(y), \quad y_3 = f(y_2) = f(f(y)), \quad \ldots \]

converges to $x$, and we have

\[ \|y - x\| \le \frac{\|y - f(y)\|}{1 - c}. \tag{1} \]

In prose, the last part of this theorem means that if $y$ is "approximately fixed", in that $\|y - f(y)\| < \varepsilon$ for some small $\varepsilon$, then $y$ is actually close to the genuine fixed point, in that $\|y - x\| < \varepsilon/(1 - c)$.
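As a rough numerical illustration of the iteration and of estimate (1), here is a minimal Python sketch. The map f(x) = cos(x)/2 on E = R and the helper name `iterate_to_fixed_point` are assumptions chosen for illustration; they do not come from the notes. Since |f'(x)| = |sin(x)|/2 is at most 1/2, the constant c = 1/2 works for this map.

```python
import math

def iterate_to_fixed_point(f, y, tol=1e-12, max_iter=1000):
    """Iterate y, f(y), f(f(y)), ... until successive terms differ by at most tol."""
    for _ in range(max_iter):
        y_next = f(y)
        if abs(y_next - y) <= tol:
            return y_next
        y = y_next
    raise RuntimeError("iteration did not converge")

# Hypothetical example map (an assumption, not from the notes).
def f(x):
    return 0.5 * math.cos(x)

c = 0.5  # contraction constant for this particular f
x_star = iterate_to_fixed_point(f, 0.0)

# Estimate (1): |y - x| <= |y - f(y)| / (1 - c) for any starting point y.
for y in (-3.0, 0.0, 0.4, 10.0):
    lhs = abs(y - x_star)
    rhs = abs(y - f(y)) / (1 - c)
    print(f"y={y:6.2f}  |y-x|={lhs:.6f}  bound={rhs:.6f}")
```

In each printed row the distance to the fixed point should not exceed the bound, whatever starting point is used.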

Proof. Let $y \in E$ and define the sequence $(y_n)_{n=1}^{\infty}$ as above. We will first show that $y_n$ converges to a fixed point for which (1) holds, and then show that the fixed point is unique.
If $f(y) = y$, then we have already found a fixed point, so assume this is not the case. Then $\|y_2 - y_1\| > 0$.
Because $f$ is a contraction, for any $n$ one has

\[ \|y_{n+2} - y_{n+1}\| = \|f(y_{n+1}) - f(y_n)\| \le c\|y_{n+1} - y_n\|, \]

and now by induction on $n$ this implies

\[ \|y_{n+2} - y_{n+1}\| \le c^n \|y_2 - y_1\| \quad \forall n \ge 1. \]

Combined with the triangle inequality, this gives a bound on the distance between any two terms of the sequence as follows: if $k \ge m + 1$, then

\[ \|y_k - y_m\| \le \sum_{n=m}^{k-1} \|y_{n+1} - y_n\| \le \sum_{n=m}^{k-1} c^{n-1}\|y_2 - y_1\| = c^{m-1}(1 + c + \cdots + c^{k-m-1})\|y_2 - y_1\|. \]

Since one always has

\[ 1 + c + \cdots + c^{k-m-1} \le \sum_{i=0}^{\infty} c^i = \frac{1}{1 - c}, \]

the above inequality implies

\[ \|y_k - y_m\| \le \frac{c^{m-1}}{1 - c}\|y_2 - y_1\|. \tag{2} \]

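It follows that for every $\varepsilon > 0$, if one chooses $N \ge 1$ so large that

\[ c^{N-1} < \frac{(1 - c)\varepsilon}{\|y_2 - y_1\|} \]

(which is possible because $c < 1$, so $c^{N-1} \to 0$ as $N \to \infty$), then whenever $k, m \ge N$ one has

\[ \|y_k - y_m\| \le \frac{c^{N-1}}{1 - c}\|y_2 - y_1\| < \varepsilon. \]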

Therefore, $(y_n)_{n=1}^{\infty}$ is a Cauchy sequence. By the completeness of $\mathbb{R}^n$, it has a limit $x$; and because $E$ is closed, that limit must be in $E$.
Thus, $y_n \to x$ as $n \to \infty$. Since $f$ is continuous this implies $f(y_n) \to f(x)$ as $n \to \infty$. However, the construction of the sequence $y_n$ gives

\[ f(y_n) = y_{n+1}, \]

so this is just the same sequence with the index $n$ shifted by 1. It must therefore have the same limit $x$, and so

\[ f(x) = x. \]

This gives a fixed point $x$. Also, applying (2) with $m = 1$ gives

\[ \|y - y_k\| \le \frac{1}{1 - c}\|y_1 - y_2\| = \frac{1}{1 - c}\|y - f(y)\| \]

for every $k \ge 2$, so letting $k \to \infty$ this proves the inequality (1).
Lastly, to prove uniqueness of the fixed point, suppose that $x, x' \in E$ are both fixed by $f$. Then

\[ \|x - x'\| = \|f(x) - f(x')\| \le c\|x - x'\|, \]

and since $c < 1$ this is possible only if $\|x - x'\| = 0$, hence $x = x'$.
Corollary 5. Suppose that $f : \overline{B_r(a)} \to \overline{B_r(a)}$ is a differentiable function and $c < 1$ is a constant such that

\[ \|Df(z)\| \le c \quad \forall z \in B_r(a). \]

Then $f$ is a contraction with the same constant $c$, so has a unique fixed point in $\overline{B_r(a)}$.

Proof. By the inequality deduced from the Mean Value Theorem, our assumption implies that

\[ \|f(x) - f(y)\| \le c\|x - y\| \quad \forall x, y \in B_r(a). \]

Now suppose that $x, y \in \overline{B_r(a)}$. Then we may choose sequences $(x_n)_{n=1}^{\infty}$ and $(y_n)_{n=1}^{\infty}$ in $B_r(a)$ such that $x_n \to x$ and $y_n \to y$ as $n \to \infty$. Since $f$ is differentiable, it is also continuous, and therefore $f(x_n) \to f(x)$ and $f(y_n) \to f(y)$, and so also

\[ \|f(x) - f(y)\| = \lim_{n \to \infty} \|f(x_n) - f(y_n)\| \le c \lim_{n \to \infty} \|x_n - y_n\| = c\|x - y\|. \]

2 The Inverse Function Theorem

Suppose that $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear function represented by the $(n \times n)$-matrix $B$. Sometimes we think of such a $T$ as a change of coordinate system, but in this case we want $T$ to be invertible, so that every point in one coordinate system corresponds to exactly one point in the other. This happens if and only if the matrix $B$ is invertible, and that in turn holds if and only if $\det B \ne 0$. Sometimes we will also write $\det T \ne 0$, where the determinant of a linear function is simply defined to be the determinant of its representing matrix.
So, provided $\det T \ne 0$, one has a bijection between $\mathbb{R}^n$ and itself. Of course its inverse is just represented by $B^{-1}$.
Now, recall that if $V \subseteq \mathbb{R}^n$ is open, $a \in V$ and $f : V \to \mathbb{R}^n$ is a function of class $C^1$, then on a small enough ball $B_\delta(a)$ the function $f$ is well approximated by the constant $f(a)$ plus the linear function $Df(a)$ applied to $x - a$. So if the linear function $Df(a)$ is invertible, we might hope that $f|_{B_\delta(a)}$ itself is also invertible, at least on this small ball, with a well-behaved inverse. This is essentially the content of the Inverse Function Theorem.
Example 1. One cannot prove a theorem about an inverse for the whole function $f$. If we assume standard properties of the functions sin and cos (which will be treated later in the course), then a simple example is the following: let

\[ V = \{(x, y) \in \mathbb{R}^2 \mid x > 0\}, \]

and let $f : V \to \mathbb{R}^2$ be the function

\[ f(x, y) = (x \sin y, x \cos y). \]

One may check easily that it is differentiable with the derivative

\[ Df(x, y) \text{ represented by } \begin{pmatrix} \sin y & x \cos y \\ \cos y & -x \sin y \end{pmatrix}. \]

This is invertible everywhere (its determinant is $x(-\sin^2 y - \cos^2 y) = -x$, which is nonzero on $V$), and it varies continuously with $(x, y)$. However, $f$ is not bijective: indeed,

\[ f(x, y) = f(x, y + 2\pi) \quad \forall (x, y) \in V. \]

See Remark 11.45 in Wade for another example.
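The two claims in Example 1 can be spot-checked numerically. The following Python sketch evaluates the determinant of the matrix above and compares f at (x, y) and (x, y + 2π); the sample point (1.7, 0.3) and the function name `jacobian_det` are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    return (x * math.sin(y), x * math.cos(y))

def jacobian_det(x, y):
    # Determinant of [[sin y, x cos y], [cos y, -x sin y]].
    return math.sin(y) * (-x * math.sin(y)) - (x * math.cos(y)) * math.cos(y)

x, y = 1.7, 0.3  # arbitrary sample point with x > 0
print(jacobian_det(x, y))               # agrees with -x up to rounding
print(f(x, y), f(x, y + 2 * math.pi))   # same point up to rounding
```

So the derivative is invertible at the sample point, while two distinct points of V are sent to the same image.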


Theorem 6 (Inverse Function Theorem). Suppose that $V \subseteq \mathbb{R}^n$ is open, $f : V \to \mathbb{R}^n$ is $C^1$, and $a \in V$. Suppose also that $\det Df(a) \ne 0$. Then there are open subsets $U \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^n$ such that

• $a \in U$ and $f(a) \in W$;
• the restriction $f|_U$ is a bijection $U \to W$;
• the inverse function $g = (f|_U)^{-1} : W \to U$ is of class $C^1$ with derivative satisfying

\[ Dg(y) = \big(Df(g(y))\big)^{-1} \quad \forall y \in W. \tag{3} \]
Example 2. Of course, the assumption that $\det Df(a) \ne 0$ is essential: if we drop this, then any non-invertible linear function $T : \mathbb{R}^n \to \mathbb{R}^n$ gives an example which cannot have an inverse on any nonempty open set.
Example 3. The assumption that $f$ is not just differentiable, but actually of class $C^1$, is also essential. Assuming again standard properties of sin, let $f : \mathbb{R} \to \mathbb{R}$ be the function

\[ f(x) = x + 2x^2 \sin(1/x) \text{ for } x \ne 0, \quad f(0) = 0. \]

Then $f$ is differentiable and $f'(0) = 1$. However, there is no open set $U \ni 0$ such that $f|_U$ is a bijection from $U$ to another open set. This is because the function $f$ oscillates too fast as $x \to 0$, so that for any $\varepsilon > 0$ one finds that it still oscillates infinitely many times on the interval $(-\varepsilon, \varepsilon)$, and hence cannot be injective on that interval. This does not contradict Theorem 6, because for this $f$ we have

\[ f'(x) = 1 + 4x \sin(1/x) - 2\cos(1/x) \quad (x \ne 0), \]

which has no limit as $x \to 0$, so $f'$ is not continuous at $x = 0$. This example is analyzed more carefully in Remark 11.44 in Wade.
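To see the oscillation in Example 3 concretely, one can evaluate the derivative formula at points accumulating at 0. This is only a sketch: the sample points 1/(2πk) and 1/((2k+1)π) are our own choice, picked because cos(1/x) equals +1 and -1 there respectively.

```python
import math

def fprime(x):
    # Derivative of f(x) = x + 2 x^2 sin(1/x), valid for x != 0.
    return 1 + 4 * x * math.sin(1 / x) - 2 * math.cos(1 / x)

for k in (1, 10, 100, 1000):
    x_neg = 1 / (2 * math.pi * k)        # cos(1/x) = 1 here, so f' is about -1
    x_pos = 1 / ((2 * k + 1) * math.pi)  # cos(1/x) = -1 here, so f' is about 3
    print(k, fprime(x_neg), fprime(x_pos))
```

The derivative takes both signs arbitrarily close to 0, so f is not monotone on any interval around 0; a continuous injective function on an interval would have to be monotone.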

The proof of Theorem 6 will take several steps. Most of these will actually go towards
proving the following slightly weaker theorem:
Theorem 7. Suppose that $V \subseteq \mathbb{R}^n$ is open, $f : V \to \mathbb{R}^n$ is $C^1$, and $a \in V$. Suppose also that $\det Df(a) \ne 0$. Then there are open subsets $U \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^n$ such that

• $a \in U$ and $f(a) \in W$;
• the restriction $f|_U$ is a bijection $U \to W$;
• the inverse function $g = (f|_U)^{-1} : W \to U$ is differentiable at $f(a)$ with derivative given by

\[ Dg(f(a)) = \big(Df(a)\big)^{-1}. \tag{4} \]

The only difference between these theorems is in the last conclusion: Theorem 6
promises that g is differentiable everywhere in W , with continuous derivative, and
gives a formula for the derivative. On the other hand, Theorem 7 promises only that g
is differentiable at the particular point f (a), and says nothing about its differentiability
elsewhere.
We will first prove Theorem 7 and then show how Theorem 6 follows from it. Assuming that $V$, $f$ and $a$ are data satisfying the hypotheses of those theorems, the proofs will involve the following steps:

1. By applying some symmetries, we may assume for the proof of Theorem 7 that $a = f(a) = 0$ and $Df(0)$ is the identity mapping.
2. By restricting to a small enough $U_1 \subseteq V$, $U_1 \ni 0$, we may arrange that $f|_{U_1}$ is injective.
3. By restricting further to some $U \subseteq U_1$, $U \ni 0$, we may arrange that $f|_U$ is bijective between $U$ and some open set $W$ containing 0. The key new conclusion here is that both $U$ and $W$ are open. By the end of Step 2, we will already know that $f|_{U_1}$ is a bijection $U_1 \to f(U_1)$, but we won't know that $f(U_1)$ is open.
4. Next we can prove that the inverse function $g : W \to U$ is also differentiable at 0 with derivative equal to the identity mapping; this will complete the proof of Theorem 7.
5. Finally, we can show why this implies that $g$ is actually differentiable on the whole of $W$, with derivative given by (3); this will complete the proof of Theorem 6.

2.1 Step 1: simplifying the problem

We begin by considering the weaker conclusion of Theorem 7. First we will argue that
for the proof of this, we may also assume without loss of generality that

(i) $a = 0 \in V$,
(ii) f (0) = 0, and
(iii) Df (0) = I, the identity function.

To see this, assume for now that we know the conclusion of Theorem 7 whenever
these extra assumptions also hold. We will show how these extra assumptions can be
removed one-by-one.
Lemma 8. If the conclusion of Theorem 7 holds whenever we also have (i), (ii) and (iii), then it holds whenever we have just (i) and (ii).

Proof. Suppose that $V$ and $f$ satisfy the assumptions of Theorem 7 and also (i) and (ii) above, but not necessarily (iii). Let

\[ T := Df(0), \]

so by the assumptions of Theorem 7 itself $T$ is invertible.
Now let $f_1 = T^{-1} \circ f$, so this is still a function $V \to \mathbb{R}^n$. Note that it is equivalent to write $f = T \circ f_1$. Since $f$ is differentiable, the Chain Rule tells us that $f_1$ is also differentiable, with

\[ Df_1(x) = D(T^{-1})(f(x)) \, Df(x) = T^{-1} Df(x). \]

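Moreover, this proves that $f_1$ is also $C^1$, because

\[ \|T^{-1}Df(x) - T^{-1}Df(y)\|_{\mathrm{op}} = \|T^{-1}(Df(x) - Df(y))\|_{\mathrm{op}} \le \|T^{-1}\|_{\mathrm{op}} \|Df(x) - Df(y)\|_{\mathrm{op}}, \]

so if

\[ \|Df(x) - Df(y)\|_{\mathrm{op}} < \varepsilon \]

then

\[ \|T^{-1}Df(x) - T^{-1}Df(y)\|_{\mathrm{op}} < \|T^{-1}\|_{\mathrm{op}} \, \varepsilon. \]

In addition,

\[ Df_1(0) = T^{-1}Df(0) = T^{-1}T = I. \]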

So $f_1$ satisfies (i), (ii) and (iii), so by our assumption we know that the conclusion of Theorem 7 holds for $f_1$: there are open subsets $U_1 \subseteq V$ containing 0 and $W_1$ containing 0 such that

• $f_1|_{U_1}$ is a bijection $U_1 \to W_1$, and
• the inverse function $g_1 := (f_1|_{U_1})^{-1} : W_1 \to U_1$ is differentiable at 0 with derivative equal to $(Df_1(0))^{-1} = I$.

Now let $U = U_1$, $W = T(W_1)$, and let $g = g_1 \circ T^{-1}$. Then $W$ is open, because it equals $(T^{-1})^{-1}(W_1)$ and $T^{-1}$ is continuous on $\mathbb{R}^n$ and $W_1$ is open. Also, $g$ is defined on $W$, because if $y \in W$ then $T^{-1}(y) \in W_1$ and $g_1$ is defined on $W_1$.
Next, $g$ is the inverse of $f|_U$, because:

• if $x \in U$ then

\[ g(f(x)) = g_1(T^{-1}(f(x))) = g_1(T^{-1}(T(f_1(x)))) = g_1(f_1(x)) = x, \]

and
• if $y \in W$, then

\[ f(g(y)) = T(f_1(g(y))) = T(f_1(g_1(T^{-1}(y)))) = T(T^{-1}(y)) = y. \]

Lastly, another appeal to the Chain Rule shows that $g$, like $g_1$, is differentiable at 0, with derivative given by

\[ Dg(0) = Dg_1(T^{-1}(0)) \, T^{-1} = Dg_1(0) \, T^{-1} = T^{-1}, \]

as required for the conclusion of Theorem 7.
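The normalization in Lemma 8 can be tried out on a concrete map. In the Python sketch below, the quadratic map f on R^2 and every helper name are assumptions made for illustration. The code forms f_1 = T^{-1} ∘ f with T = Df(0) and estimates Df_1(0) by central differences, which should return approximately the identity matrix.

```python
def f(p):
    # Hypothetical example with f(0) = 0 and Df(0) = [[2, 1], [1, 3]].
    x, y = p
    return (2 * x + y + x * x, x + 3 * y + y * y)

T = ((2.0, 1.0), (1.0, 3.0))  # Df(0) for the f above

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

T_inv = inverse_2x2(T)

def f1(p):
    # The normalized map f_1 = T^{-1} o f.
    return apply(T_inv, f(p))

def jacobian_at_origin(func, h=1e-6):
    # Central differences; each column is one partial derivative.
    cols = []
    for e in ((h, 0.0), (0.0, h)):
        plus = func(e)
        minus = func((-e[0], -e[1]))
        cols.append(((plus[0] - minus[0]) / (2 * h),
                     (plus[1] - minus[1]) / (2 * h)))
    # Transpose the list of columns into rows.
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))

print(jacobian_at_origin(f1))  # approximately the identity matrix
```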

Removing the other assumptions, (i) and (ii), is easier, and we only sketch the proofs.

Lemma 9. If the conclusion of Theorem 7 holds whenever we also have (i), (ii), then it holds whenever we have just (i).

Sketch proof. Suppose that $V$ and $f$ satisfy the assumptions of Theorem 7 and that $a = 0$. Define a new function $f_1 : V \to \mathbb{R}^n$ by

\[ f_1(x) = f(x) - f(0). \]

Alternatively, we may write this as the composition

\[ f_1 = A \circ f, \]

where $A : \mathbb{R}^n \to \mathbb{R}^n$ is the translation

\[ A(y) = y - f(0). \]

Now another repeated appeal to the Chain Rule shows that if $U_1 \subseteq V$, $W_1 \subseteq \mathbb{R}^n$ and $g_1 : W_1 \to U_1$ are subsets and an inverse function as provided by Theorem 7 for $f_1$, then the suitable subsets and inverse function for $f$ itself are

\[ U = U_1, \quad W = \{y + f(0) \mid y \in W_1\} = A^{-1}(W_1) \]

and

\[ g = g_1 \circ A. \]

Lemma 10. If the conclusion of Theorem 7 holds whenever we also have (i), then it holds in general.

Sketch proof. Suppose that $V$ and $f$ satisfy the assumptions of Theorem 7. Define a new set

\[ V_1 = \{x - a \mid x \in V\} \]

and a new function $f_1 : V_1 \to \mathbb{R}^n$ by

\[ f_1(x) = f(x + a), \]

so $0 \in V_1$ and $f_1(0) = f(a)$. We may write these in terms of the translation

\[ A(x) = x + a \]

as $V_1 = A^{-1}(V)$ and $f_1 = f \circ A$. Another repeated appeal to the Chain Rule shows that if $U_1 \subseteq V_1$, $W_1 \subseteq \mathbb{R}^n$ and $g_1 : W_1 \to U_1$ are the subsets and inverse function provided by Theorem 7 for $f_1$, then the suitable subsets and inverse function for $f$ itself are

\[ U = A(U_1), \quad W = W_1 \]

and

\[ g = A \circ g_1. \]

In conclusion, we have reduced the proof of Theorem 7 to the proof of the following:
Theorem 11. Suppose that $V \subseteq \mathbb{R}^n$ is open, $f : V \to \mathbb{R}^n$ is $C^1$, $0 \in V$, and that $f(0) = 0$ and $Df(0) = I$. Then there are open subsets $U \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^n$ such that

• $0 \in U$ and $0 \in W$;
• the restriction $f|_U$ is a bijection $U \to W$;
• the inverse function $g = (f|_U)^{-1} : W \to U$ is differentiable at 0 with derivative equal to $I$.

2.2 Step 2: injectivity

For now we continue to focus on Theorem 7, or rather its special case Theorem 11.
For every $\varepsilon > 0$ there is some $\delta(\varepsilon) > 0$ such that

\[ B_{2\delta(\varepsilon)}(0) \subseteq V \]

(because $V$ is open) and

\[ \|Df(x) - I\|_{\mathrm{op}} < \varepsilon \quad \forall x \in B_{2\delta(\varepsilon)}(0) \]

(because $f$ is of class $C^1$ and $Df(0) = I$).
(Notice that we use a ball of radius $2\delta(\varepsilon)$, not just $\delta(\varepsilon)$. This will lighten the notation later. Of course it makes no difference formally, since we may choose $\delta(\varepsilon)$ however small we need.)
Lemma 12. With $\delta(\varepsilon)$ as above, $f$ is injective when restricted to the open ball $B_r(0)$ with $r = 2\delta(1/2)$.

Proof. Suppose that $x, y \in B_r(0)$ and that $f(x) = f(y)$. Let $u = y - x$. By the higher-dimensional Mean Value Theorem (11.32 in Wade), there is some $z \in B_r(0)$ such that

\[ u \cdot (f(y) - f(x)) = u \cdot (Df(z)(y - x)). \]

Of course, by assumption the left-hand side here is 0. On the other hand, writing $Df(z)$ as $I + (Df(z) - I)$, the right-hand side is equal to

\[
\begin{aligned}
(y - x) \cdot (y - x) &+ (y - x) \cdot (Df(z) - I)(y - x) \\
&= \|y - x\|^2 + (y - x) \cdot (Df(z) - I)(y - x) \\
&\ge \|y - x\|^2 - \big|(y - x) \cdot (Df(z) - I)(y - x)\big| \\
&\ge \|y - x\|^2 - \|y - x\| \, \|Df(z) - I\|_{\mathrm{op}} \, \|y - x\| \\
&\ge \|y - x\|^2 - \tfrac{1}{2}\|y - x\|^2 = \tfrac{1}{2}\|y - x\|^2.
\end{aligned}
\]

Therefore $\|y - x\| = 0$, and so $y = x$.

2.3 Step 3: bijectivity

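Now assume the hypotheses of Theorem 11, and assume also that $B_r(0) \subseteq V$ is such that $f|_{B_r(0)}$ is injective, as provided by the previous lemma.
We next prove that $f$ must hit a whole open ball around 0.
Lemma 13. For any $\varepsilon < 1/2$, the following holds:

(i) $f\big(\overline{B_{\delta(\varepsilon)}(0)}\big) \supseteq B_{\delta(\varepsilon)/2}(0)$, and

(ii) if $h \in B_{\delta(\varepsilon)/2}(0)$, and if $x \in \overline{B_{\delta(\varepsilon)}(0)}$ satisfies $f(x) = h$, then

\[ \|x - h\| \le \frac{\varepsilon}{1 - \varepsilon}\|h\|. \]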

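Proof. This is the point at which we will apply the Contraction Mapping Principle.
For part (i), suppose that $h \in B_{\delta(\varepsilon)/2}(0)$. We need to find some $x \in \overline{B_{\delta(\varepsilon)}(0)}$ such that

\[ f(x) = h. \tag{5} \]

Now, since $f$ is $C^1$ and $Df(0) = I$, our intuition is that $f$ should be very well approximated by the identity function close to 0. Define $r(x) = f(x) - x$, so that $f$ is equal to $I$ plus the "remainder" $r$, which should be very small.
By the sum rule,

\[ Dr(z) = Df(z) - I, \]

so for $z \in \overline{B_{\delta(\varepsilon)}(0)} \subseteq B_{2\delta(\varepsilon)}(0)$ one must have $\|Dr(z)\|_{\mathrm{op}} \le \varepsilon$. In terms of $r$, we seek a point $x$ such that

\[ h = x + r(x), \quad \text{i.e.} \quad x = -r(x) + h. \]

Thus, we have re-arranged the problem into finding a fixed point for the function

\[ G(x) = -r(x) + h. \]

On the one hand, this function satisfies

\[ DG(x) = -Dr(x), \]

which has operator norm at most $\varepsilon$ on $\overline{B_{\delta(\varepsilon)}(0)}$. On the other hand, for any $x \in \overline{B_{\delta(\varepsilon)}(0)}$, we have

\[ \|G(x)\| \le \|r(x)\| + \|h\| = \|r(x) - r(0)\| + \|h\| \le \varepsilon\|x\| + \delta(\varepsilon)/2 < \delta(\varepsilon), \]

because $\varepsilon < 1/2$, so also $G(x) \in B_{\delta(\varepsilon)}(0)$.
Therefore, by Corollary 5, the function $G$ is a contraction from $\overline{B_{\delta(\varepsilon)}(0)}$ to itself, so it has a unique fixed point $x$. This is the desired solution to equation (5), so we have proved (i).
To prove (ii), let $h \in B_{\delta(\varepsilon)/2}(0)$ and let $x$ be the pre-image of $h$ obtained above. The point $h$ itself satisfies

\[ \|G(h) - h\| = \|f(h) - h\| = \|r(h)\| \le \varepsilon\|h\|, \]

by the Mean Value Theorem inequality used in Corollary 5, applied to $r$ (recall that $r(0) = 0$). Therefore, applying the inequality (1) proved as part of the Contraction Mapping Principle to $G$, with $y = h$ and $c = \varepsilon$, we have

\[ \|h - x\| \le \frac{\varepsilon}{1 - \varepsilon}\|h\|, \]

as required.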
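The proof of Lemma 13 is constructive: it solves f(x) = h by iterating G(x) = -r(x) + h. Below is a small Python sketch of that iteration for a hypothetical map on R^2 with f(0) = 0 and Df(0) = I. The map, the radius 0.2 and the value eps = 0.08 are assumptions chosen so that the operator norm of Dr is at most eps on the closed ball of radius 0.2.

```python
import math

def f(p):
    # Hypothetical example: f(0) = 0, Df(0) = I.
    x, y = p
    return (x + 0.2 * y * y, y + 0.2 * x * x)

def r(p):
    # Remainder r(x) = f(x) - x; here Dr = [[0, 0.4 y], [0.4 x, 0]].
    fx, fy = f(p)
    return (fx - p[0], fy - p[1])

def solve(h, iterations=60):
    # Iterate G(x) = h - r(x), starting from y = h as in part (ii).
    x = h
    for _ in range(iterations):
        rx = r(x)
        x = (h[0] - rx[0], h[1] - rx[1])
    return x

eps = 0.08            # bound on the operator norm of Dr on the ball of radius 0.2
h = (0.05, -0.03)     # norm about 0.058, less than 0.2 / 2
x = solve(h)

norm_h = math.hypot(h[0], h[1])
dist_x_h = math.hypot(x[0] - h[0], x[1] - h[1])
print(x, f(x))                              # f(x) reproduces h
print(dist_x_h, eps / (1 - eps) * norm_h)   # part (ii): first number <= second
```

Starting the iteration at h itself mirrors the choice y = h in the proof of part (ii).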

Corollary 14. With the same hypotheses as above, there are open sets $U \subseteq B_r(0)$ and $W \subseteq \mathbb{R}^n$ such that $0 \in U$ and $0 \in W$, and $f|_U$ is a bijection $U \to W$.

Proof. Let $\varepsilon = 1/3$ (actually any value in $(0, 1/2)$ will do) and let $W = B_{\delta(\varepsilon)/2}(0)$. Since we chose $r = 2\delta(1/2) > \delta(\varepsilon)$, we have

\[ B_r(0) \supseteq \overline{B_{\delta(\varepsilon)}(0)}, \]

and hence also

\[ f(B_r(0)) \supseteq f\big(\overline{B_{\delta(\varepsilon)}(0)}\big) \supseteq W, \]

by the previous lemma. So now letting

\[ U = B_r(0) \cap f^{-1}(W), \]

it follows that $f|_U$ is injective (since $U \subseteq B_r(0)$, and the restriction of $f$ to that larger set is already injective by Lemma 12). On the other hand, part (i) of Lemma 13 implies that $f(U) = W$, so $f|_U$ is also surjective onto $W$. Lastly, $f^{-1}(W)$ is open (as the inverse image of an open set under a continuous map) and so $U$ is also open. Thus $f|_U$ is a bijection $U \to W$, as required.

2.4 Step 4: differentiability of the inverse at the origin

Having found the subsets $U \subseteq V$ and $W = B_{\delta(\varepsilon)/2}(0)$ in Step 3, we may now let $g : W \to U$ be the inverse of $f|_U$. In particular, since $f(0) = 0$, so also $g(0) = 0$.
We can now complete the proof of Theorem 11.
Lemma 15. The function $g$ is differentiable at 0 with derivative equal to $I$.

Proof. This is a matter of checking the definition. Since $g(0) = 0$, we must show that

\[ \frac{\|g(h) - g(0) - h\|}{\|h\|} = \frac{\|g(h) - h\|}{\|h\|} \to 0 \]

as $h \to 0$.
To see this, recall from Lemma 13 that if $\varepsilon < 1/2$ and $h \in B_{\delta(\varepsilon)/2}(0)$, then there is a unique point $x \in \overline{B_{\delta(\varepsilon)}(0)}$ such that $f(x) = h$. Because of that uniqueness, we must have $x = g(h)$. However, now part (ii) of Lemma 13 gives that

\[ \|g(h) - h\| \le \frac{\varepsilon}{1 - \varepsilon}\|h\| < 2\varepsilon\|h\| \quad (h \ne 0), \]

since $1 - \varepsilon > 1/2$. Therefore, for any $\eta \in (0, 1)$, if we choose $\gamma = \delta(\eta/2)/2$, then for any $h \in B_\gamma(0)$ with $h \ne 0$ we have

\[ \|g(h) - h\| < 2(\eta/2)\|h\| \implies \frac{\|g(h) - h\|}{\|h\|} < \eta, \]

so since we could choose $\eta > 0$ arbitrarily small, this ratio tends to 0 as $h \to 0$.
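A one-dimensional sanity check of Lemma 15 can be written in a few lines of Python. The example f(x) = x + x^2 is an assumption made for illustration; it satisfies f(0) = 0 and f'(0) = 1, and its local inverse near 0 has the closed form g(h) = (-1 + sqrt(1 + 4h))/2. The ratio |g(h) - h|/|h| should shrink as h tends to 0.

```python
import math

def g(h):
    # Local inverse of f(x) = x + x**2 near 0 (valid for h > -1/4).
    return (-1 + math.sqrt(1 + 4 * h)) / 2

ratios = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    ratio = abs(g(h) - h) / abs(h)
    ratios.append(ratio)
    print(h, ratio)  # the ratio shrinks roughly in proportion to h
```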

This completes the proof of Theorem 11, and so, as explained in Step 1, it also com-
pletes the proof of Theorem 7.

2.5 Step 5: the inverse is continuous and of class $C^1$

Having proved Theorem 7, in the last step we show how to deduce Theorem 6 from it.
Lemma 16. Suppose that $f$, $V$, $a$ and also $U$ and $W$ are as in Theorem 6. Then the inverse function $g = (f|_U)^{-1} : W \to U$ is differentiable at every point of $W$, with derivative given by

\[ Dg(y) = \big(Df(g(y))\big)^{-1} \quad \forall y \in W. \tag{6} \]

Proof. The key point is that for any $x \in U$, if we let $y = f(x)$, then we may simply apply Theorem 7 again at this new point to find further open subsets $U_1 \subseteq U$ and $W_1 \subseteq W$ such that $f|_{U_1}$ is a bijection $U_1 \to W_1$, and such that the inverse function $g_1 : W_1 \to U_1$ is differentiable at $y = f(x)$ with

\[ Dg_1(y) = \big(Df(x)\big)^{-1}. \tag{7} \]

However, now for each $z \in W_1$ we have $f(g(z)) = z$ (because $g$ is inverse to $f|_U$) and also $f(g_1(z)) = z$ (because $g_1$ is inverse to $f|_{U_1}$), so since every point of $W$ has a unique pre-image in $U$ under $f$ we must actually have $g(z) = g_1(z)$ for all $z \in W_1$. Thus, $g_1$ must simply equal the restriction $g|_{W_1}$. Since $W_1$ is open, this means that $g$ and $g_1$ agree on some open ball around $y$, and therefore $g$ is also differentiable at $y$ and equation (7) actually gives

\[ Dg(y) = \big(Df(x)\big)^{-1}. \]

Since $x$ was arbitrary, this completes the proof.
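Formula (6) can be compared against a finite-difference derivative in one variable. The Python sketch below again uses the illustrative map f(x) = x + x^2 with local inverse g(y) = (-1 + sqrt(1 + 4y))/2; both the map and the helper names are assumptions, not part of the notes.

```python
import math

def f_prime(x):
    return 1 + 2 * x  # derivative of f(x) = x + x**2

def g(y):
    # Local inverse of f on the branch through 0 (valid for y > -1/4).
    return (-1 + math.sqrt(1 + 4 * y)) / 2

def g_prime_numeric(y, step=1e-6):
    # Central-difference approximation to g'(y).
    return (g(y + step) - g(y - step)) / (2 * step)

for y in (0.0, 0.3, 1.0):
    # Formula (6) in one variable reads g'(y) = 1 / f'(g(y)).
    print(y, g_prime_numeric(y), 1 / f_prime(g(y)))
```

The two printed columns should agree to within the accuracy of the finite-difference approximation.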

Corollary 17. The inverse function $g : W \to U$ output by Theorem 7 is continuous.

Proof. We have just seen that it is differentiable on $W$, and differentiability at a point implies continuity at that point (Theorem 11.13 in Wade).

The only part of Theorem 6 that still needs to be proved is that the inverse function $g : W \to U$ obtained in Theorem 7 is actually of class $C^1$, i.e., its derivative $Dg(y)$ is continuous as a function of $y$. However, (6) already gives us a formula for this derivative, so we simply have to prove that the expression in that formula is continuous. This comes from the following.
Lemma 18. Suppose that $T : \mathbb{R}^n \to \mathbb{R}^n$ is linear and invertible and that $\varepsilon > 0$. Then there is some $\delta > 0$ (depending on $T$ and $\varepsilon$) such that if $S : \mathbb{R}^n \to \mathbb{R}^n$ is linear and $\|S - T\|_{\mathrm{op}} < \delta$, then $S$ is also invertible and $\|S^{-1} - T^{-1}\|_{\mathrm{op}} < \varepsilon$.

Proof. First we will prove that if $\delta < 1/\|T^{-1}\|_{\mathrm{op}}$ and $\|S - T\|_{\mathrm{op}} < \delta$, then $S$ is also invertible.
Suppose that $S(x) = S(y)$. Letting $u = x - y$, this means $S(u) = 0$. For injectivity, we need to prove that $u = 0$. To see this, observe that

\[ T(u) = S(u) + (T - S)(u) = (T - S)(u), \]

so

\[ \|T(u)\| \le \|S - T\|_{\mathrm{op}} \|u\|. \]

On the other hand, we always have

\[ \|u\| = \|T^{-1}(T(u))\| \le \|T^{-1}\|_{\mathrm{op}} \|T(u)\|, \]

so combining these inequalities gives

\[ \frac{1}{\|T^{-1}\|_{\mathrm{op}}}\|u\| \le \|S - T\|_{\mathrm{op}} \|u\|. \]

Since, on the other hand, $\|S - T\|_{\mathrm{op}} < 1/\|T^{-1}\|_{\mathrm{op}}$, this is possible only if $u = 0$.

This proves that $S$ is injective. Since $S$ is linear, it is represented by an $(n \times n)$-matrix $B$, and now the rank-nullity formula implies that $B$ has an inverse if and only if it has full rank, if and only if there are no non-zero solutions to the equation $Bu = 0$. That's what we have just proved, so the bijectivity of $S$ follows.
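Finally, suppose that $\|S - T\|_{\mathrm{op}} < \delta < 1/\|T^{-1}\|_{\mathrm{op}}$, and also that

\[ u \in \mathbb{R}^n, \quad x = S^{-1}(u) \quad \text{and} \quad y = T^{-1}(u). \]

Then $y - x = T^{-1}S(x) - x = T^{-1}(S - T)(x)$, and so

\[ \|y - x\| \le \|T^{-1}\|_{\mathrm{op}} \|S - T\|_{\mathrm{op}} \|x\| \le \delta\|T^{-1}\|_{\mathrm{op}} \|x\| \le \delta\|T^{-1}\|_{\mathrm{op}} (\|x - y\| + \|y\|). \]

Since $\delta\|T^{-1}\|_{\mathrm{op}} < 1$, we may re-arrange this to obtain

\[ \|y - x\| \le \frac{\delta\|T^{-1}\|_{\mathrm{op}}}{1 - \delta\|T^{-1}\|_{\mathrm{op}}}\|y\| \le \frac{\delta\|T^{-1}\|_{\mathrm{op}}^2}{1 - \delta\|T^{-1}\|_{\mathrm{op}}}\|u\|, \]

where the last step uses $\|y\| = \|T^{-1}(u)\| \le \|T^{-1}\|_{\mathrm{op}}\|u\|$.

Given $\varepsilon > 0$, we may choose $\delta > 0$ so small that

\[ \delta < \frac{1}{\|T^{-1}\|_{\mathrm{op}}} \quad \text{and also} \quad \frac{\delta\|T^{-1}\|_{\mathrm{op}}^2}{1 - \delta\|T^{-1}\|_{\mathrm{op}}} < \varepsilon, \]

and now the above shows that

\[ \|S - T\|_{\mathrm{op}} < \delta \implies \|S^{-1}(u) - T^{-1}(u)\| < \varepsilon\|u\| \quad \forall u \ne 0. \]

This completes the proof.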
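The estimate behind Lemma 18 can be checked on a concrete pair of 2x2 matrices. In the Python sketch below, the matrices T and S and all helper names are illustrative assumptions. Writing d for the operator norm of S - T and t for the operator norm of T^{-1}, the argument above (with d in place of delta) gives that the operator norm of S^{-1} - T^{-1} is at most d t^2 / (1 - d t) whenever d t < 1. The operator norm of a 2x2 matrix is computed as the square root of the largest eigenvalue of M^T M.

```python
import math

def op_norm(m):
    # Operator (spectral) norm of a 2x2 matrix via the eigenvalues of m^T m.
    (a, b), (c, d) = m
    p, q, s = a * a + c * c, a * b + c * d, b * b + d * d
    return math.sqrt(((p + s) + math.sqrt((p - s) ** 2 + 4 * q * q)) / 2)

def inverse(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def sub(m, n):
    return tuple(tuple(x - y for x, y in zip(rm, rn)) for rm, rn in zip(m, n))

T = ((2.0, 1.0), (1.0, 3.0))       # hypothetical invertible matrix
S = ((2.05, 0.98), (1.01, 2.96))   # a small perturbation of T

d = op_norm(sub(S, T))             # plays the role of ||S - T||_op
t = op_norm(inverse(T))            # ||T^{-1}||_op
assert d * t < 1                   # hypothesis needed for the estimate

bound = d * t * t / (1 - d * t)
actual = op_norm(sub(inverse(S), inverse(T)))
print(actual, bound)               # actual should not exceed bound
```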

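Corollary 19. The function $Dg$ given by (6) is continuous.

Proof. We assumed $f$ is of class $C^1$, so $Df$ is continuous, and $g$ is continuous by Corollary 17, so the composition $y \mapsto Df(g(y))$ is continuous on $W$. For any $\varepsilon > 0$ and $y \in W$, the preceding Lemma now gives some $\delta > 0$ (depending on $\varepsilon$ and on $\|(Df(g(y)))^{-1}\|_{\mathrm{op}}$), such that

\[ \|S - Df(g(y))\|_{\mathrm{op}} < \delta \implies S \text{ invertible and } \|S^{-1} - (Df(g(y)))^{-1}\|_{\mathrm{op}} < \varepsilon. \]

Having found this $\delta > 0$, the continuity of $Df \circ g$ gives some $\gamma > 0$ such that

\[ \|z - y\| < \gamma \implies \|Df(g(z)) - Df(g(y))\|_{\mathrm{op}} < \delta, \]

and hence

\[ \|Dg(z) - Dg(y)\|_{\mathrm{op}} = \|(Df(g(z)))^{-1} - (Df(g(y)))^{-1}\|_{\mathrm{op}} < \varepsilon. \]

Since $\varepsilon$ was arbitrary, this proves that $Dg$ is continuous.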
