Math-UA.326.001: Analysis II Notes For The Inverse Function Theorem
Tim Austin
803 Warren Weaver Hall
tim@cims.nyu.edu
http://cims.nyu.edu/tim
Remark 1. Intuitively, this definition means that $f$ makes all distances shorter; this
is where the name "contraction" comes from. However, be aware that the definition of
a contraction is a little stronger than just asking that $\|f(x) - f(y)\| < \|x - y\|$
whenever $x \neq y$, because the constant $c < 1$ is fixed independently of $x$ and $y$. We adopt
the stronger definition for the sake of the Contraction Mapping Principle, stated below.
If $f$ has only this weaker property of making distances shorter, it might not satisfy that
principle: see Homework 8.
Lemma 3. Contractions are uniformly continuous.
converges to $x$, and we have
\[ \|y - x\| \le \frac{\|y - f(y)\|}{1 - c}. \tag{1} \]
In prose, the last part of this theorem means that if $y$ is approximately fixed, in that
$\|y - f(y)\| < \varepsilon$ for some small $\varepsilon$, then $y$ is actually close to the genuine fixed point,
in that $\|y - x\| < \varepsilon/(1 - c)$.
Proof. Let $y \in E$ and define the sequence $(y_n)_{n=1}^\infty$ as above. We will first show that
$y_n$ converges to a fixed point for which (1) holds, and then show that the fixed point is
unique.
If $f(y) = y$, then we have already found a fixed point, so assume this is not the case.
Then $\|y_2 - y_1\| > 0$.
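Because $f$ is a contraction, for any $n$ one has
\[ \|y_{n+1} - y_n\| \le c\|y_n - y_{n-1}\| \le \cdots \le c^{n-1}\|y_2 - y_1\|. \]
Combined with the triangle inequality, this gives a bound on the distance between any
two terms of the sequence as follows: if $k \ge m + 1$, then
\[ \|y_k - y_m\| \le \sum_{n=m}^{k-1} \|y_{n+1} - y_n\| \le \sum_{n=m}^{k-1} c^{n-1}\|y_2 - y_1\|
   = c^{m-1}(1 + c + \cdots + c^{k-m-1})\|y_2 - y_1\|. \]
Bounding the geometric sum by $1/(1 - c)$ gives
\[ \|y_k - y_m\| \le \frac{c^{m-1}}{1 - c}\|y_2 - y_1\|. \tag{2} \]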
(which is possible because $c < 1$ so $c^{N-1} \to 0$ as $N \to \infty$), then whenever
$k, m \ge N$ one has
\[ \|y_k - y_m\| \le \frac{c^{N-1}}{1 - c}\|y_2 - y_1\| < \varepsilon. \]
Therefore, $(y_n)_{n=1}^\infty$ is a Cauchy sequence. By the completeness of $\mathbb{R}^n$, it has a limit
$x$; and because $E$ is closed, that limit must be in $E$.
Thus, $y_n \to x$ as $n \to \infty$. Since $f$ is continuous this implies $f(y_n) \to f(x)$ as
$n \to \infty$. However, the construction of the sequence $y_n$ gives
\[ f(y_n) = y_{n+1}, \]
so this is just the same sequence with the index $n$ shifted by 1. It must therefore have
the same limit $x$, and so
\[ f(x) = x. \]
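The iteration in this proof can be tried numerically. The following is a minimal sketch, not part of the notes: it assumes the hypothetical contraction $f(x) = \frac{1}{2}\cos x$ on $E = \mathbb{R}$ (so $c = 1/2$, since $|f'(x)| \le 1/2$), iterates $y_{n+1} = f(y_n)$, and checks the estimate (1) at a few starting points. The function names and the tolerance are illustrative choices.

```python
import math

C = 0.5  # contraction constant of the hypothetical example below


def f(x):
    """Hypothetical contraction on R: |f'(x)| = 0.5*|sin x| <= 0.5."""
    return 0.5 * math.cos(x)


def iterate_to_fixed_point(func, y, tol=1e-14, max_iter=10_000):
    """Iterate y_{n+1} = func(y_n) until successive terms differ by less than tol."""
    for _ in range(max_iter):
        y_next = func(y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("iteration did not converge")


x_star = iterate_to_fixed_point(f, 1.0)


def bound_holds(y, slack=1e-12):
    """Check inequality (1): |y - x| <= |y - f(y)| / (1 - c), up to rounding slack."""
    return abs(y - x_star) <= abs(y - f(y)) / (1 - C) + slack


print(x_star, all(bound_holds(y) for y in (-3.0, 0.0, 0.4, 2.0, 10.0)))
```

The small `slack` term only absorbs floating-point rounding; the inequality itself is the one stated in (1).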
Proof. By the inequality deduced from the Mean Value Theorem, our assumption implies that
\[ \|f(x) - f(y)\| \le c\|x - y\| \quad \forall x, y \in B_r(a). \]
Now suppose that $x, y \in \overline{B_r(a)}$. Then we may choose sequences $(x_n)_{n=1}^\infty$ and
$(y_n)_{n=1}^\infty$ in $B_r(a)$ such that $x_n \to x$ and $y_n \to y$ as $n \to \infty$. Since $f$ is
differentiable, it is also continuous, and therefore $f(x_n) \to f(x)$ and $f(y_n) \to f(y)$
and so also,
2 The Inverse Function Theorem
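Theorem 6. Suppose that $V \subseteq \mathbb{R}^n$ is open, $f : V \to \mathbb{R}^n$
is $C^1$, and $a \in V$. Suppose also that $\det Df(a) \neq 0$. Then there are open subsets
$U \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^n$ such that
$a \in U$ and $f(a) \in W$;
the restriction $f|_U$ is a bijection $U \to W$;
the inverse function $g = (f|_U)^{-1} : W \to U$ is of class $C^1$ with derivative
satisfying
\[ Dg(y) = Df(g(y))^{-1} \quad \forall y \in W. \tag{3} \]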
Example 2. Of course, the assumption that $\det Df(a) \neq 0$ is essential: if we drop
this, then any non-invertible linear function $T : \mathbb{R}^n \to \mathbb{R}^n$ gives an example which
cannot have an inverse on any nonempty open set.
Example 3. The assumption that $f$ is not just differentiable, but actually of class $C^1$,
is also essential. Assuming again standard properties of $\sin$, let $f : \mathbb{R} \to \mathbb{R}$ be the
function
\[ f(x) = x + 2x^2 \sin(1/x) \quad (x \neq 0), \qquad f(0) = 0. \]
Then $f$ is differentiable and $f'(0) = 1$. However, there is no open set $U \ni 0$ such that
$f|_U$ is a bijection from $U$ to another open set. This is because the function $f$ oscillates
too fast as $x \to 0$, so that for any $\varepsilon > 0$ one finds that it still oscillates infinitely many
times on the interval $(-\varepsilon, \varepsilon)$, and hence cannot be injective on that interval. This does
not contradict Theorem 6, because for this $f$ we have, for $x \neq 0$,
\[ f'(x) = 1 + 4x \sin(1/x) - 2\cos(1/x), \]
which is not continuous at $x = 0$. This example is analyzed more carefully in Remark
11.44 in Wade.
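The oscillation in Example 3 can be seen numerically. The sketch below is an illustration only: it evaluates the derivative formula above at the sample points $x = 1/(2\pi k)$ and $x = 1/((2k+1)\pi)$ (these points are our choice, not taken from the notes), where $\cos(1/x)$ equals $1$ and $-1$ respectively. The derivative is close to $-1$ at the first family and close to $3$ at the second, so $f'$ changes sign arbitrarily close to $0$.

```python
import math


def f_prime(x):
    """Derivative of f(x) = x + 2x^2 sin(1/x), valid for x != 0."""
    return 1 + 4 * x * math.sin(1 / x) - 2 * math.cos(1 / x)


for k in (1, 10, 100, 1000):
    x_neg = 1 / (2 * math.pi * k)        # cos(1/x) = 1 here, so f'(x) is near -1
    x_pos = 1 / ((2 * k + 1) * math.pi)  # cos(1/x) = -1 here, so f'(x) is near 3
    print(k, f_prime(x_neg), f_prime(x_pos))
```

Since $f'$ takes both signs on every interval $(-\varepsilon, \varepsilon)$, $f$ is not monotone there, which is consistent with the failure of injectivity described in the example.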
The proof of Theorem 6 will take several steps. Most of these will actually go towards
proving the following slightly weaker theorem:
Theorem 7. Suppose that $V \subseteq \mathbb{R}^n$ is open, $f : V \to \mathbb{R}^n$ is $C^1$, and $a \in V$. Suppose
also that $\det Df(a) \neq 0$. Then there are open subsets $U \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^n$ such
that
$a \in U$ and $f(a) \in W$;
the restriction $f|_U$ is a bijection $U \to W$;
the inverse function $g = (f|_U)^{-1} : W \to U$ is differentiable at $f(a)$ with
derivative given by
\[ Dg(f(a)) = Df(a)^{-1}. \tag{4} \]
The only difference between these theorems is in the last conclusion: Theorem 6
promises that g is differentiable everywhere in W , with continuous derivative, and
gives a formula for the derivative. On the other hand, Theorem 7 promises only that g
is differentiable at the particular point f (a), and says nothing about its differentiability
elsewhere.
We will first prove Theorem 7 and then show how Theorem 6 follows from it. Assuming
that $V$, $f$ and $a$ are data satisfying the hypotheses of those theorems, the proofs will
involve the following steps:
1. By applying some symmetries, we may assume for the proof of Theorem 7 that
$a = f(a) = 0$ and $Df(0)$ is the identity mapping.
2. By restricting to a small enough $U_1 \subseteq V$, $U_1 \ni 0$, we may arrange that $f|_{U_1}$ is
injective.
3. By restricting further to some $U \subseteq U_1$, $U \ni 0$, we may arrange that $f|_U$ is
bijective between $U$ and some open set $W$ containing $0$. The key new conclusion
here is that both $U$ and $W$ are open. By the end of Step 2, we will already know
that $f|_{U_1}$ is a bijection $U_1 \to f(U_1)$, but we won't know that $f(U_1)$ is open.
4. Next we can prove that the inverse function $g : W \to U$ is also differentiable
at $0$ with derivative equal to the identity mapping; this will complete the proof
of Theorem 7.
5. Finally, we can show why this implies that $g$ is actually differentiable on the
whole of $W$, with derivative given by (3); this will complete the proof of
Theorem 6.
We begin by considering the weaker conclusion of Theorem 7. First we will argue that
for the proof of this, we may also assume without loss of generality that
(i) $a = 0 \in V$,
(ii) $f(0) = 0$, and
(iii) $Df(0) = I$, the identity function.
To see this, assume for now that we know the conclusion of Theorem 7 whenever
these extra assumptions also hold. We will show how these extra assumptions can be
removed one-by-one.
Lemma 8. If the conclusion of Theorem 7 holds whenever we also have (i), (ii) and
(iii), then it holds whenever we have just (i) and (ii).
Proof. Suppose that $V$ and $f$ satisfy the assumptions of Theorem 7 and also (i) and (ii)
above, but not necessarily (iii). Let
\[ T := Df(0), \]
so by the assumptions of Theorem 7 itself $T$ is invertible.
Now let $f_1 = T^{-1} \circ f$, so this is still a function $V \to \mathbb{R}^n$. Note that it is equivalent
to write $f = T \circ f_1$. Since $f$ is differentiable, the Chain Rule tells us that $f_1$ is also
differentiable, with
\[ Df_1(x) = D(T^{-1})(f(x))\, Df(x) = T^{-1} Df(x). \]
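Moreover, this proves that $f_1$ is also $C^1$, because
\[ \|T^{-1} Df(x) - T^{-1} Df(y)\|_{\mathrm{op}} \le \|T^{-1}\|_{\mathrm{op}} \|Df(x) - Df(y)\|_{\mathrm{op}}, \]
so if
\[ \|Df(x) - Df(y)\|_{\mathrm{op}} < \varepsilon \]
then
\[ \|T^{-1} Df(x) - T^{-1} Df(y)\|_{\mathrm{op}} < \|T^{-1}\|_{\mathrm{op}}\, \varepsilon. \]
In addition,
\[ Df_1(0) = T^{-1} Df(0) = T^{-1} T = I. \]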
So $f_1$ satisfies (i), (ii) and (iii), so by our assumption we know that the conclusion of
Theorem 7 holds for $f_1$: there are open subsets $U_1 \subseteq V$ containing $0$ and $W_1$ containing
$0$ such that
if $x \in U$ then
and
if $y \in W$, then
Lastly, another appeal to the Chain Rule shows that g, like g1 , is differentiable at 0,
with derivative given by
Removing the other assumptions, (i) and (ii), is easier, and we only sketch the proofs.
Lemma 9. If the conclusion of Theorem 7 holds whenever we also have (i), (ii), then
it holds whenever we have just (i).
Sketch proof. Suppose that $V$ and $f$ satisfy the assumptions of Theorem 7 and that
$a = 0$. Define a new function $f_1 : V \to \mathbb{R}^n$ by
\[ f_1 = A \circ f, \quad\text{where}\quad A(y) = y - f(0). \]
Now another repeated appeal to the Chain Rule shows that if $U_1 \subseteq V$, $W_1 \subseteq \mathbb{R}^n$ and
$g_1 : W_1 \to U_1$ are subsets and an inverse function as provided by Theorem 7 for $f_1$,
then the suitable subsets and inverse function for $f$ itself are
\[ U = U_1, \qquad W = \{y + f(0) \mid y \in W_1\} = A^{-1}(W_1) \]
and
\[ g = g_1 \circ A. \]
Lemma 10. If the conclusion of Theorem 7 holds whenever we also have (i), then it
holds in general.
Sketch proof. Suppose that $V$ and $f$ satisfy the assumptions of Theorem 7. Define a
new set
\[ V_1 = \{x - a \mid x \in V\} \]
and a new function $f_1 : V_1 \to \mathbb{R}^n$ by
\[ f_1(x) = f(x + a), \]
\[ A(x) = x + a \]
In conclusion, we have reduced the proof of Theorem 7 to the proof of the following:
Theorem 11. Suppose that $V \subseteq \mathbb{R}^n$ is open, $f : V \to \mathbb{R}^n$ is $C^1$, $0 \in V$, and that
$f(0) = 0$ and $Df(0) = I$. Then there are open subsets $U \subseteq \mathbb{R}^n$ and $W \subseteq \mathbb{R}^n$ such
that
$0 \in U$ and $0 \in W$;
the restriction $f|_U$ is a bijection $U \to W$;
the inverse function $g = (f|_U)^{-1} : W \to U$ is differentiable at $0$ with derivative
equal to $I$.
For now we continue to focus on Theorem 7, or rather its special case Theorem 11.
For every $\varepsilon > 0$ there is some $\delta(\varepsilon) > 0$ such that
\[ B_{2\delta(\varepsilon)}(0) \subseteq V \]
(because $V$ is open) and
\[ \|Df(x) - I\|_{\mathrm{op}} < \varepsilon \quad \forall x \in B_{2\delta(\varepsilon)}(0) \]
(because $f$ is of class $C^1$ and $Df(0) = I$).
(Notice that we use a ball of radius $2\delta(\varepsilon)$, not just $\delta(\varepsilon)$. This will lighten the notation
later. Of course it makes no difference formally, since we may choose $\delta(\varepsilon)$ however
small we need.)
Lemma 12. With $\delta(\varepsilon)$ as above, $f$ is injective when restricted to the open ball $B_r(0)$
with $r = 2\delta(1/2)$.
Proof. Suppose that $x, y \in B_r(0)$ and that $f(x) = f(y)$. Let $u = y - x$. By the
higher-dimensional Mean Value Theorem (11.32 in Wade), there is some $z \in B_r(0)$
such that
\[ u \cdot (f(y) - f(x)) = u \cdot (Df(z)(y - x)). \]
Of course, by assumption the left-hand side here is $0$. On the other hand, writing $Df(z)$
as $I + (Df(z) - I)$, the right-hand side is equal to
\begin{align*}
(y - x) \cdot (y - x) + (y - x) \cdot (Df(z) - I)(y - x)
  &= \|y - x\|^2 + (y - x) \cdot (Df(z) - I)(y - x) \\
  &\ge \|y - x\|^2 - \|y - x\|\, \|Df(z) - I\|_{\mathrm{op}}\, \|y - x\| \\
  &\ge \|y - x\|^2 - \tfrac{1}{2}\|y - x\|^2 = \tfrac{1}{2}\|y - x\|^2.
\end{align*}
Therefore $\|y - x\| = 0$, and so $y = x$.
2.3 Step 3: bijectivity
Now assume the hypotheses of Theorem 11, and assume also that $B_r(0) \subseteq V$ is such
that $f|_{B_r(0)}$ is injective, as provided by the previous lemma.
We next prove that $f$ must hit a whole open ball around $0$.
Lemma 13. For any $\varepsilon < 1/2$, the following holds:
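Proof. This is the point at which we will apply the Contraction Mapping Principle.
For part (i), suppose that $h \in B_{\delta(\varepsilon)/2}(0)$. We need to find some $x \in B_{\delta(\varepsilon)}(0)$ such
that
\[ f(x) = h. \tag{5} \]
Now, since $f$ is $C^1$ and $Df(0) = I$, our intuition is that $f$ should be very well approximated
by the identity function close to $0$. Define $r(x) = f(x) - x$, so that $f$ is equal
to $I$ plus the remainder $r$, which should be very small.
By the sum rule,
\[ Dr(z) = Df(z) - I, \]
so for $z \in \overline{B_{\delta(\varepsilon)}(0)} \subseteq B_{2\delta(\varepsilon)}(0)$ one must have $\|Dr(z)\|_{\mathrm{op}} \le \varepsilon$. In terms of $r$, we
seek a point $x$ such that
\[ x + r(x) = h, \quad\text{that is,}\quad x = -r(x) + h. \]
Thus, we have re-arranged the problem into finding a fixed point for the function
\[ G(x) = -r(x) + h. \]
Its derivative is
\[ DG(x) = -Dr(x), \]
which has operator norm at most $\varepsilon$ on $\overline{B_{\delta(\varepsilon)}(0)}$. On the other hand, for any
$x \in \overline{B_{\delta(\varepsilon)}(0)}$, we have
\[ \|G(x)\| \le \|r(x)\| + \|h\| = \|r(x) - r(0)\| + \|h\| \le \varepsilon\|x\| + \delta(\varepsilon)/2 < \delta(\varepsilon). \]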
Therefore, the function $G$ is a contraction from $\overline{B_{\delta(\varepsilon)}(0)}$ to itself, so it has a unique
fixed point $x$. This is the desired solution to equation (5), so we have proved (i).
To prove (ii), let $h \in B_{\delta(\varepsilon)/2}(0)$ and let $x$ be the pre-image of $h$ obtained above. The
point $h$ itself satisfies
\[ \|f(h) - h\| = \|r(h)\| \le \varepsilon\|h\|, \]
by Corollary 5. Therefore, applying the inequality (1) proved as part of the Contraction
Mapping Principle with $y = h$, we have
\[ \|h - x\| \le \frac{\varepsilon}{1 - \varepsilon}\|h\|, \]
as required.
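The fixed-point reformulation used in this proof doubles as a practical way to solve $f(x) = h$. The sketch below is an illustration under stated assumptions, not the author's construction: it takes the hypothetical map $f(x_1, x_2) = (x_1 + x_2^2,\; x_2 + x_1^2)$ on $\mathbb{R}^2$, which satisfies $f(0) = 0$ and $Df(0) = I$, and iterates $x \mapsto h - (f(x) - x)$ starting from $x = h$. For this $f$ the remainder has $\|Dr(z)\|_{\mathrm{op}} \le 0.2$ whenever $\|z\| \le 0.1$, so we use $\varepsilon = 0.2$ and $\delta(\varepsilon) = 0.1$ for the check of part (ii).

```python
import math


def f(x):
    """Hypothetical C^1 map on R^2 with f(0) = 0 and Df(0) = I."""
    x1, x2 = x
    return (x1 + x2 ** 2, x2 + x1 ** 2)


def solve(h, steps=60):
    """Iterate G(x) = h - r(x), where r(x) = f(x) - x, starting from x = h."""
    x = h
    for _ in range(steps):
        fx = f(x)
        r = (fx[0] - x[0], fx[1] - x[1])   # remainder r(x) = f(x) - x
        x = (h[0] - r[0], h[1] - r[1])     # one step of the contraction G
    return x


h = (0.03, -0.02)                          # ||h|| < delta/2 = 0.05
x = solve(h)
fx = f(x)
residual = math.hypot(fx[0] - h[0], fx[1] - h[1])
distance = math.hypot(x[0] - h[0], x[1] - h[1])
eps = 0.2
print(x, residual, distance <= eps / (1 - eps) * math.hypot(*h))
```

The last printed value checks the estimate $\|h - x\| \le \frac{\varepsilon}{1-\varepsilon}\|h\|$ from part (ii) for this particular $h$.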
Corollary 14. With the same hypotheses as above, there are open sets $U \subseteq B_r(0)$ and
$W \subseteq \mathbb{R}^n$ such that $0 \in U$ and $0 \in W$, and $f|_U$ is a bijection $U \to W$.
Proof. Let $\varepsilon = 1/3$ (actually any value in $(0, 1/2)$ will do) and let $W = B_{\delta(\varepsilon)/2}(0)$.
Since we chose $r = 2\delta(1/2) > \delta(\varepsilon)$, we have $B_{\delta(\varepsilon)}(0) \subseteq B_r(0)$. Letting
\[ U = B_r(0) \cap f^{-1}(W), \]
it follows that $f|_U$ is injective (since $U \subseteq B_r(0)$, and the restriction of $f$ to that larger
set is already injective by Lemma 12). On the other hand, part (i) of Lemma 13 implies
that $f(U) = W$, so $f|_U$ is also surjective. Lastly, $f^{-1}(W)$ is open (as the inverse image
of an open set under a continuous map) and so $U$ is also open. Thus $f|_U$ is a bijection
$U \to W$, as required.
Having found the subsets $U \subseteq V$ and $W = B_{\delta(\varepsilon)/2}(0)$ in Step 3, we may now let
$g : W \to U$ be the inverse of $f|_U$. In particular, since $f(0) = 0$, so also $g(0) = 0$.
We can now complete the proof of Theorem 11.
Lemma 15. The function g is differentiable at 0 with derivative equal to I.
Proof. This is a matter of checking the definition. Since $g(0) = 0$, we must show that
\[ \frac{\|g(h) - h\|}{\|h\|} \to 0 \]
as $h \to 0$.
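To see this, recall from Lemma 13 that if $\varepsilon < 1/2$ and $h \in B_{\delta(\varepsilon)/2}(0)$, then there is a
unique point $x \in B_{\delta(\varepsilon)}(0)$ such that $f(x) = h$. Because of that uniqueness, we must
have $x = g(h)$. However, now part (ii) of Lemma 13 gives that
\[ \|g(h) - h\| \le \frac{\varepsilon}{1 - \varepsilon}\|h\| < 2\varepsilon\|h\|, \]
since $1 - \varepsilon > 1/2$. Therefore, for any $\eta \in (0, 1)$, if we choose $\rho = \delta(\eta/2)/2$, then for any
nonzero $h \in B_\rho(0)$ we have
\[ \|g(h) - h\| < 2(\eta/2)\|h\| = \eta\|h\| \quad\Longrightarrow\quad \frac{\|g(h) - h\|}{\|h\|} < \eta. \]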
This completes the proof of Theorem 11, and so, as explained in Step 1, it also
completes the proof of Theorem 7.
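The limit checked in Lemma 15 can be observed in a one-dimensional case where the inverse is known in closed form. The sketch below assumes the hypothetical example $f(x) = x + x^2$ (so $f(0) = 0$ and $f'(0) = 1$), whose local inverse near $0$ is $g(h) = \frac{-1 + \sqrt{1 + 4h}}{2}$, and tabulates $|g(h) - h| / |h|$ for shrinking $h$.

```python
import math


def g(h):
    """Local inverse near 0 of the hypothetical map f(x) = x + x^2."""
    return (-1 + math.sqrt(1 + 4 * h)) / 2


hs = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [abs(g(h) - h) / abs(h) for h in hs]

for h, ratio in zip(hs, ratios):
    print(h, ratio)  # the ratio shrinks roughly in proportion to h
```

Here $g(h) - h$ is of order $h^2$, so the ratio tends to $0$, matching the statement that $g$ is differentiable at $0$ with derivative $I$.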
Having proved Theorem 7, in the last step we show how to deduce Theorem 6 from it.
Lemma 16. Suppose that $f$, $V$, $a$ and also $U$ and $W$ are as in Theorem 6. Then the
inverse function $g = (f|_U)^{-1} : W \to U$ is differentiable at every point of $W$, with
derivative given by
\[ Dg(y) = Df(g(y))^{-1} \quad \forall y \in W. \tag{6} \]
Proof. The key point is that for any $x \in U$, if we let $y = f(x)$, then we may simply
apply Theorem 7 again at this new point to find further open subsets $U_1 \subseteq U$ and
$W_1 \subseteq W$ such that $f|_{U_1}$ is a bijection $U_1 \to W_1$, and such that the inverse function
$g_1 : W_1 \to U_1$ is differentiable at $y = f(x)$ with
\[ Dg_1(y) = Df(x)^{-1}. \tag{7} \]
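As a sanity check on the derivative formula (6), one can compare a finite-difference derivative of an explicit inverse with $1/f'(g(y))$ in one dimension. The sketch below again assumes the hypothetical map $f(x) = x + x^2$ with local inverse $g(y) = \frac{-1 + \sqrt{1 + 4y}}{2}$; the sample point $y = 0.3$ and the step size are arbitrary illustrative choices.

```python
import math


def f_prime(x):
    """Derivative of the hypothetical map f(x) = x + x^2."""
    return 1 + 2 * x


def g(y):
    """Local inverse of f near 0."""
    return (-1 + math.sqrt(1 + 4 * y)) / 2


def g_prime_numeric(y, step=1e-6):
    """Central finite-difference approximation to g'(y)."""
    return (g(y + step) - g(y - step)) / (2 * step)


y = 0.3
lhs = g_prime_numeric(y)      # numerical Dg(y)
rhs = 1 / f_prime(g(y))       # Df(g(y))^{-1}, as in formula (6)
print(lhs, rhs)
```

The two printed values agree up to the finite-difference error, which is what (6) predicts in the case $n = 1$.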
Corollary 17. The inverse function $g : W \to U$ output by Theorem 7 is continuous.
The only part of Theorem 6 that still needs to be proved is that the inverse function
$g : W \to U$ obtained in Theorem 7 is actually of class $C^1$, i.e., its derivative $Dg(y)$
is continuous as a function of $y$. However, (6) already gives us a formula for this
derivative, so we simply have to prove that the expression in that formula is continuous.
This comes from the following.
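Lemma 18. Suppose that $T : \mathbb{R}^n \to \mathbb{R}^n$ is linear and invertible and that $\varepsilon > 0$.
Then there is some $\delta > 0$ (depending on $T$) such that if $S : \mathbb{R}^n \to \mathbb{R}^n$ is linear and
$\|S - T\|_{\mathrm{op}} < \delta$, then $S$ is also invertible and $\|S^{-1} - T^{-1}\|_{\mathrm{op}} < \varepsilon$.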
Proof. First we will prove that if $\delta < 1/\|T^{-1}\|_{\mathrm{op}}$ and $\|S - T\|_{\mathrm{op}} < \delta$, then $S$ is also
invertible.
Suppose that $S(x) = S(y)$. Letting $u = x - y$, this means $S(u) = 0$. For injectivity,
we need to prove that $u = 0$. To see this, observe that
\[ T(u) = S(u) + (T - S)(u) = (T - S)(u), \]
so
\[ \|T(u)\| \le \|S - T\|_{\mathrm{op}} \|u\|. \]
On the other hand, we always have
\[ \|u\| = \|T^{-1}(T(u))\| \le \|T^{-1}\|_{\mathrm{op}} \|T(u)\|, \]
so combining these inequalities gives
\[ \frac{1}{\|T^{-1}\|_{\mathrm{op}}}\|u\| \le \|S - T\|_{\mathrm{op}} \|u\|. \]
Since, on the other hand, $\|S - T\|_{\mathrm{op}} < \frac{1}{\|T^{-1}\|_{\mathrm{op}}}$, this is possible only if $u = 0$.
Since $\delta\|T^{-1}\|_{\mathrm{op}} < 1$, we may re-arrange this to obtain
Having found this $\delta > 0$, the continuity of $Df$ gives some $\gamma > 0$ such that
and hence