Lecture Notes On Multivariable Calculus
We study functions

\[
f : \mathbb{R}^n \to \mathbb{R}^m ,
\]

and ask questions such as:

• Is the zero locus f^{-1}(0) a smooth subset of R^n in a suitable sense, for example a smooth surface in R^3?
The first major new idea is to define the derivative at a point as a linear map, which
we can think of as giving a first-order approximation to the behaviour of the function
near that point. A key theme will be that, subject to suitable nondegeneracy assump-
tions, the derivative at a point will give qualitative information about the function on
a neighbourhood of the point. In particular, the Inverse Function Theorem will tell us
that invertibility of the derivative at a point (as a linear map) will actually guarantee
local invertibility of the function in a neighbourhood.
The results of this course are foundational for much of applied mathematics and
mathematical physics, and also for geometry. For example, we shall meet the concept of
a smooth manifold in Rn (intuitively, a generalisation to higher dimensions of a smooth
surface in R3 ), and use our theorems to obtain a criterion for when the locus defined
by a system of nonlinear equations is a manifold. Manifolds are the setting for much
of higher-dimensional geometry and mathematical physics and in fact the concepts of
differential (and integral) calculus that we study in this course can be developed on
general manifolds. The Part B course Geometry of Surfaces and the Part C course
Differentiable Manifolds develop these ideas further.
2 Differentiation of functions of several
variables
2.1 Introduction
In this chapter we will extend the concept of differentiability of a function of one variable
to the case of a function of several variables. We first recall the definitions for a function
of one variable.
A function f : I ⊆ R → R is differentiable at x ∈ I if

\[
f'(x) := \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]
exists. Equivalently, we can say that f is differentiable at x ∈ I if there exists a linear map∗ L : R → R such that
\[
\lim_{h \to 0} \frac{f(x+h) - f(x) - Lh}{h} = 0. \tag{2.1}
\]
In this case, L is given by L : h ↦ f'(x) · h.
Another way of writing (2.1) is
\[
f(x+h) - f(x) - Lh = R_f(h) \quad\text{with}\quad R_f(h) = o(|h|), \ \text{i.e.}\ \lim_{h \to 0} \frac{R_f(h)}{|h|} = 0. \tag{2.2}
\]
This definition is more suitable for the multivariable case, where h is now a vector, so it
does not make sense to divide by h.
Similarly, a function f : I ⊆ R → R^m is differentiable at x ∈ I if

\[
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]
exists.
∗ Here and in what follows, we will often write Lh instead of L(h) if L is a linear map.
It is easily seen that f is differentiable at x ∈ I if and only if f_i : I ⊆ R → R is differentiable at x ∈ I for all i = 1, …, m. Also, f is differentiable at x ∈ I if and only if there exists a linear map L : R → R^m such that

\[
\lim_{h \to 0} \frac{f(x+h) - f(x) - Lh}{h} = 0.
\]
You may check yourself that this defines a norm (and hence a metric). For the proof of
the triangle inequality you will need to use the Cauchy-Schwarz inequality.
We shall also use the matrix (Hilbert–Schmidt) norm

\[
\|C\| = \Big( \sum_{i,j=1}^{n} C_{ij}^2 \Big)^{1/2},
\]

which satisfies

\[
\|AB\| \le \|A\| \, \|B\| .
\]
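As a quick sanity check, the following Python sketch (with arbitrarily chosen 2×2 matrices) computes the Hilbert–Schmidt norm and verifies the submultiplicativity bound numerically:

```python
import math

def hs_norm(M):
    """Hilbert-Schmidt norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v ** 2 for row in M for v in row))

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, -1.0], [1.0, 0.5]]

# Submultiplicativity: ||AB|| <= ||A|| ||B||
assert hs_norm(mat_mul(A, B)) <= hs_norm(A) * hs_norm(B)
```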
2.2 Partial derivatives
We are going to consider† functions f : Ω ⊆ Rn → Rm . Here and in what follows we
always assume that Ω is an open and connected subset of Rn (a domain).
We first consider a few examples of such functions.
For example, the Euclidean norm

\[
x \mapsto |x| = \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2} .
\]
We shall sometimes use the concepts of graphs and level sets. Let f : Ω ⊆ R2 → R be
a function. Then the graph of f , given by
Γf = {(x, y) ∈ Ω × R | y = f (x)},
† We will use the shorthand "f : Ω ⊆ R^n → R^m" to mean that "Ω is a domain in R^n and f : Ω → R^m is a function".
Definition 2.2.2. (Partial derivative) Let f : Ω ⊆ R^n → R. We say that the i-th partial derivative of f at x ∈ Ω exists, if

\[
\frac{\partial f}{\partial x_i}(x) = \lim_{t \to 0} \frac{f(x + t e_i) - f(x)}{t}
\]
exists, where ei is the i-th unit vector. In other words, the i-th partial derivative is the
derivative of g(t) = f (x + tei ) at t = 0.
Other common notations for the i-th partial derivative of f at x are
∂i f (x), Di f (x), ∂xi f (x), fxi (x) or f,i (x).
We will mostly use ∂i f (x). If f : Ω ⊆ R2 → R we often write f = f (x, y) and denote the
partial derivatives by ∂x f and ∂y f respectively.
Example 2.2.3. a) Let f : R^n → R be given by f(x) = |x|. Then for x ≠ 0 we have
\[
\frac{1}{t}\big( |x + t e_i| - |x| \big) = \frac{1}{t} \cdot \frac{|x + t e_i|^2 - |x|^2}{|x + t e_i| + |x|} = \frac{1}{t} \cdot \frac{2 t x_i + t^2}{|x + t e_i| + |x|} = \frac{2 x_i + t}{|x + t e_i| + |x|} \to \frac{x_i}{|x|}
\]
as t → 0. Hence, for x ≠ 0, the function f has partial derivatives, given by ∂_i f(x) = x_i / |x|. Notice that no partial derivative of f exists at x = 0.
b) Let f(x) = g(r) with r(x) = |x| and differentiable g : [0, ∞) → R. Then, for x ≠ 0, by the Chain Rule from Mods Analysis 2, we find
\[
\partial_i f(x) = g'(r) \, \partial_i r(x) = g'(r) \, \frac{x_i}{|x|} = \frac{g'(r)}{r} \, x_i .
\]
The following example shows that, surprisingly, functions whose partial derivatives all
exist are in general not continuous.
Example 2.2.4. Let f : R^2 → R be given by

\[
f(x, y) = \begin{cases} \dfrac{xy}{(x^2 + y^2)^2} & \text{for } (x, y) \neq (0, 0), \\[4pt] 0 & \text{for } (x, y) = (0, 0). \end{cases}
\]
• But: f is not continuous at (x, y) = (0, 0). To see this, consider the behaviour of the function on the line {(t, t) : t ∈ R}. On this line, f(t, t) = 1/(4t^2), which tends to ∞ as t → 0.
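The two claims above can be checked numerically; the following Python sketch evaluates the difference quotients along the axes and the values along the diagonal (the step sizes are ad hoc choices):

```python
def f(x, y):
    # the function from Example 2.2.4
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x ** 2 + y ** 2) ** 2

# f vanishes on both coordinate axes, so both partial derivatives
# at the origin exist and equal 0:
t = 1e-6
assert (f(t, 0.0) - f(0.0, 0.0)) / t == 0.0
assert (f(0.0, t) - f(0.0, 0.0)) / t == 0.0

# ...yet f blows up along the diagonal: f(t, t) = 1/(4 t^2)
for t in (0.1, 0.01, 0.001):
    assert abs(f(t, t) - 1.0 / (4.0 * t ** 2)) < 1e-6
```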
This shows that existence of partial derivatives is not the correct notion of differen-
tiability in higher dimensions. We shall see later that the correct generalized version
of differentiability, using the ideas concerning linear maps from the beginning of this
chapter, will imply continuity of the function. We shall also see that functions with
continuous partial derivatives are differentiable in the generalised sense and hence are
also continuous. In our example above, the partial derivatives ∂i f (x) are not continuous
at x = 0.
Before we define differentiability, we shall make some more remarks about partial derivatives.
The partial derivative is a special case of the directional derivative which we will now
define.
Example 2.2.6. Let f : R^n → R be given by f(x) = |x| and let x, v ∈ R^n \ {0}. Then

\[
\partial_v f(x) = \frac{d}{dt}\Big|_{t=0} |x + t v| = \frac{d}{dt}\Big|_{t=0} \Big( \sum_{i=1}^{n} (x_i + t v_i)^2 \Big)^{1/2} = \frac{1}{2|x|} \sum_{i=1}^{n} 2 x_i v_i = \sum_{i=1}^{n} \frac{x_i}{|x|} v_i = \Big\langle \frac{x}{|x|}, v \Big\rangle ,
\]

where ⟨·, ·⟩ denotes the usual scalar product in R^n. If v = e_i we recover the formula ∂_i |x| = x_i / |x| from Example 2.2.3.
Definition 2.2.7. (Gradient) Let f : Ω ⊆ Rn → R and assume that all partial deriva-
tives exist at x ∈ Ω. We call the vector field ∇f (x) ∈ Rn given by
\[
\nabla f(x) = \begin{pmatrix} \partial_1 f(x) \\ \vdots \\ \partial_n f(x) \end{pmatrix}
\]
the gradient of f at x.
Note that the directional derivative is related to the gradient via ∂v f (x) = h∇f (x), vi.
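The relation ∂_v f(x) = ⟨∇f(x), v⟩ can be tested numerically; the sketch below (in Python, with an arbitrarily chosen point and direction) compares a central difference quotient for f(x) = |x| with the inner product:

```python
import math

def norm(x):
    return math.sqrt(sum(xi ** 2 for xi in x))

def directional_derivative(g, x, v, h=1e-6):
    # central difference approximation of (d/dt) g(x + t v) at t = 0
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (g(xp) - g(xm)) / (2 * h)

x = [3.0, 4.0]                          # |x| = 5
v = [1.0, -2.0]
grad = [xi / norm(x) for xi in x]       # gradient of |.| is x/|x|
inner = sum(gi * vi for gi, vi in zip(grad, v))

assert abs(directional_derivative(norm, x, v) - inner) < 1e-6
```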
Example 2.2.8. If f(x) = |x| and x ≠ 0 then

\[
\nabla f(x) = \begin{pmatrix} x_1 / |x| \\ \vdots \\ x_n / |x| \end{pmatrix} = \frac{x}{|x|}
\]

and ∂_v f(x) = ⟨∇f(x), v⟩.
We can write the Jacobian matrix in terms of the gradients of the components:
\[
Df(x) = \begin{pmatrix} (\nabla f_1(x))^T \\ \vdots \\ (\nabla f_m(x))^T \end{pmatrix} ,
\]
∂j1 . . . ∂jk f (x) = ∂j1 (∂j2 . . . ∂jk )f (x) where j1 , . . . , jk ∈ {1, . . . , n}.
Notice that j_1, …, j_k are not necessarily distinct, and that (a priori) their order is important. A common notation is

\[
\frac{\partial^k f(x)}{\partial x_{j_1} \cdots \partial x_{j_k}} \quad\text{or even}\quad \partial_{j_1 j_2 \cdots j_k} f(x).
\]
b) Let C k (Ω, Rm ) be the set of continuous functions f : Ω → Rm whose partial deriva-
tives exist up to order k for all x ∈ Ω and are continuous in Ω. If m = 1, we write
C k (Ω). Given a domain Θ ⊂ Rm , we will sometimes write C k (Ω, Θ) to be the set
of functions f ∈ C k (Ω, Rm ) with f (Ω) ⊂ Θ.
Proposition 2.3.2. (Symmetry of second derivatives) Let f : Ω ⊆ R^n → R be such that the second partial derivatives exist and are continuous. Then

\[
\partial_i \partial_j f(x) = \partial_j \partial_i f(x)
\]

for all x ∈ Ω and all i, j ∈ {1, …, n}.
Proof. (Not examinable) Let ∂_j^t f(x) = \frac{f(x + t e_j) - f(x)}{t} be the difference quotient of f in x_j. By definition,

\[
\partial_i \partial_j f(x) = \lim_{s \to 0} \Big( \lim_{t \to 0} \partial_i^s \partial_j^t f(x) \Big).
\]
We need to show that both limits can be interchanged. By the Intermediate Value
Theorem we have for all functions g : Ω → R, for which ∂i g(x) exists, that
Corollary 2.3.3. Suppose that f ∈ C k (Ω). Then all partial derivatives up to order k
can be interchanged.
The following example shows that the condition in Proposition 2.3.2 that the second
partial derivatives must be continuous is indeed necessary.
\[
f(x, y) = \begin{cases} xy \, \dfrac{x^2 - y^2}{x^2 + y^2} & \text{for } (x, y) \neq (0, 0); \\[4pt] 0 & \text{for } (x, y) = (0, 0). \end{cases}
\]
One can show that f ∈ C^1(R^2), but ∂_x ∂_y f(0, 0) = 1 and ∂_y ∂_x f(0, 0) = −1.
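The asymmetry of the mixed partials at the origin can also be observed numerically. The following Python sketch approximates them by nested central difference quotients (the two step sizes are ad hoc choices, taken at different scales):

```python
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x ** 2 - y ** 2) / (x ** 2 + y ** 2)

h_in, h_out = 1e-7, 1e-3

def fx(y):   # central-difference approximation of (∂x f)(0, y)
    return (f(h_in, y) - f(-h_in, y)) / (2 * h_in)

def fy(x):   # central-difference approximation of (∂y f)(x, 0)
    return (f(x, h_in) - f(x, -h_in)) / (2 * h_in)

dydx = (fx(h_out) - fx(-h_out)) / (2 * h_out)    # approximates ∂y ∂x f(0,0)
dxdy = (fy(h_out) - fy(-h_out)) / (2 * h_out)    # approximates ∂x ∂y f(0,0)

assert abs(dydx - (-1.0)) < 1e-3
assert abs(dxdy - 1.0) < 1e-3
```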
2.4 Differentiability
We will now introduce the notion of the (total) derivative which is based on the idea
that the function can be approximated at a point by a linear map.
1. Let f : R^n → R^m be given by f(x) = Ax for some matrix A ∈ M_{m×n}(R). Then

\[
f(x+h) - f(x) = A(x+h) - Ax = Ax + Ah - Ax = Ah .
\]

So in this case f(x+h) − f(x) is exactly given by the linear term Ah and the remainder term R_f(h) is zero. So f is differentiable and the linear map L = df(x) is given by df(x) : h ↦ Ah.
2. Let C = (c_{ij}) ∈ M_{n×n}(R) be symmetric and let f : R^n → R be the quadratic form corresponding to C, that is f(x) = x^T C x = ⟨x, Cx⟩. Letting h ∈ R^n, we see:

\[
f(x+h) - f(x) = \langle x+h, C(x+h) \rangle - \langle x, Cx \rangle = \langle x, Cx \rangle + \langle h, Cx \rangle + \langle x, Ch \rangle + \langle h, Ch \rangle - \langle x, Cx \rangle = 2 \langle Cx, h \rangle + \langle h, Ch \rangle ,
\]

where we use the fact that ⟨x, Ch⟩ is a scalar, so

\[
\langle x, Ch \rangle = x^T C h = (x^T C h)^T = h^T C^T x = h^T C x = \langle h, Cx \rangle
\]

as C is symmetric. Hence a candidate for df(x) is (2Cx)^T, as (2Cx)^T h = 2⟨Cx, h⟩. Indeed,

\[
\frac{\big| f(x+h) - f(x) - (2Cx)^T h \big|}{|h|} = \frac{|\langle h, Ch \rangle|}{|h|} \le \frac{|h| \, |Ch|}{|h|} \le \|C\| \, |h| \to 0 \quad \text{for } h \to 0 ,
\]

where \|C\| = \big( \sum_{i,j=1}^{n} c_{ij}^2 \big)^{1/2}. Thus f is differentiable at every x ∈ R^n and df(x) = (2Cx)^T, that is df(x)h = (2Cx)^T h = ⟨2Cx, h⟩.
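Numerically, the candidate derivative (2Cx)^T can be tested by checking that the remainder shrinks quadratically; a minimal Python sketch with an arbitrarily chosen symmetric matrix C and point x:

```python
# f(x) = <x, Cx> for a symmetric 2x2 matrix C (values chosen arbitrarily)
C = [[2.0, 1.0], [1.0, 3.0]]

def f(x):
    return sum(x[i] * C[i][j] * x[j] for i in range(2) for j in range(2))

x = [1.0, -2.0]
grad = [2.0 * sum(C[i][j] * x[j] for j in range(2)) for i in range(2)]  # 2Cx

# the remainder f(x+h) - f(x) - <2Cx, h> equals <h, Ch> = O(|h|^2)
for s in (1e-2, 1e-3):
    h = [0.6 * s, -0.8 * s]                      # |h| = s
    lin = sum(g * hi for g, hi in zip(grad, h))
    rem = f([x[0] + h[0], x[1] + h[1]]) - f(x) - lin
    assert abs(rem) < 4.0 * s ** 2               # |<h,Ch>| <= ||C|| |h|^2
```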
3. f : Mn×n (R) → Mn×n (R), f (A) = A2 .
Let H ∈ M_{n×n}(R). Then

\[
f(A+H) - f(A) = (A+H)(A+H) - A^2 = AH + HA + H^2 .
\]

The linear term AH + HA is a candidate for the derivative:

\[
\frac{\big| f(A+H) - f(A) - (AH + HA) \big|}{|H|} = \frac{|H^2|}{|H|} \le |H| \to 0 \quad \text{as } H \to 0 .
\]

Hence f is differentiable at every A ∈ M_{n×n}(R) with df(A)H = AH + HA.
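The same kind of check works for f(A) = A²; in the Python sketch below (with an arbitrarily chosen matrix A) the remainder f(A+H) − f(A) − (AH + HA) equals H², and is therefore O(|H|²):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def hs_norm(M):
    return sum(v ** 2 for row in M for v in row) ** 0.5

A = [[1.0, 2.0], [0.0, -1.0]]
A2 = mat_mul(A, A)

for s in (1e-2, 1e-3):
    H = [[s, 0.5 * s], [-s, 0.25 * s]]
    cand = mat_add(mat_mul(A, H), mat_mul(H, A))     # df(A)H = AH + HA
    fAH = mat_mul(mat_add(A, H), mat_add(A, H))      # f(A + H)
    rem = [[fAH[i][j] - A2[i][j] - cand[i][j] for j in range(2)]
           for i in range(2)]
    # the remainder is exactly H^2, hence of size O(||H||^2):
    assert hs_norm(rem) <= hs_norm(H) ** 2 + 1e-12
```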
There is a nice general formula for the differential df (x) in terms of the Jacobian
matrix Df (x).
Proposition 2.4.4. If f : Ω ⊆ R^n → R^m is differentiable at x ∈ Ω, then f is continuous at x, the partial derivatives ∂_1 f(x), …, ∂_n f(x) exist, and

\[
df(x)h = Df(x)h
\]
for all h ∈ R^n. That is, with h = \sum_{i=1}^{n} h_i e_i we have

\[
\begin{pmatrix} df_1(x)h \\ \vdots \\ df_m(x)h \end{pmatrix} = \begin{pmatrix} \partial_1 f_1(x) & \cdots & \partial_n f_1(x) \\ \vdots & & \vdots \\ \partial_1 f_m(x) & \cdots & \partial_n f_m(x) \end{pmatrix} \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} .
\]
In other words, the Jacobian matrix Df (x) is the representation of df (x) with respect to
the given basis e1 , . . . , en .
Proof. It suffices to prove the statement for m = 1. Continuity of f at x follows from

\[
\lim_{h \to 0} \big( f(x+h) - f(x) \big) = \lim_{h \to 0} \big( Lh + R_f(h) \big) = 0 .
\]

To show that the partial derivatives exist, choose h = t e_i. Then differentiability of f at x implies

\[
\frac{1}{t} \big( f(x + t e_i) - f(x) \big) - L e_i \to 0 \quad \text{as } t \to 0 .
\]

Hence ∂_i f(x) = L e_i. Since h = \sum_{i=1}^{n} h_i e_i, we find Lh = \sum_{i=1}^{n} h_i L e_i = \sum_{i=1}^{n} h_i \partial_i f(x).
This can also be seen as the definition of the gradient. The gradient of f is the
vector field ∇f such that Lh = h∇f, hi for all h ∈ Rn .
Proof. It suffices to consider the case m = 1. Let h ∈ R^n; a candidate for the derivative is Lh = \sum_{k=1}^{n} \partial_k f(x) h_k.

Let x_0 = x and x_k = x + \sum_{j=1}^{k} h_j e_j, so that x_n = x + h. Then f(x_k) − f(x_{k−1}) = f(x_{k−1} + h_k e_k) − f(x_{k−1}), and the Mean Value Theorem (for functions of one variable)
implies f (xk ) − f (xk−1 ) = ∂k f (xk−1 + θk hk ek )hk with θk ∈ [0, 1]. Hence
\[
\begin{aligned}
\frac{|f(x+h) - f(x) - Lh|}{|h|} &= \frac{1}{|h|} \Big| \sum_{k=1}^{n} \big( f(x_k) - f(x_{k-1}) \big) - \sum_{k=1}^{n} \partial_k f(x) h_k \Big| \\
&= \frac{1}{|h|} \Big| \sum_{k=1}^{n} \partial_k f(x_{k-1} + \theta_k h_k e_k) h_k - \sum_{k=1}^{n} \partial_k f(x) h_k \Big| \\
&\le \frac{|h|}{|h|} \Big( \sum_{k=1}^{n} \big( \partial_k f(x_{k-1} + \theta_k h_k e_k) - \partial_k f(x) \big)^2 \Big)^{1/2} \\
&\to 0 \quad \text{for } h \to 0 ,
\end{aligned}
\]

since ∂_k f is continuous at x.
Proof. We define A = dg(x) and B = df(g(x)). We need to show that d(f ∘ g)(x) = BA. Since g and f are differentiable, we have

\[
g(x+h) = g(x) + Ah + R_g(h) \quad\text{with}\quad R_g(h) = o(|h|),
\]

and

\[
f(y + \eta) = f(y) + B\eta + R_f(\eta) \quad\text{with}\quad \eta \in \mathbb{R}^m , \ R_f(\eta) = o(|\eta|).
\]

We now choose η = g(x+h) − g(x) = Ah + R_g(h). Then

\[
f(g(x+h)) = f(g(x)) + B\eta + R_f(\eta) ,
\]

so

\[
f(g(x+h)) = f(g(x)) + B\big( Ah + R_g(h) \big) + R_f\big( Ah + R_g(h) \big) .
\]

It remains to show that R(h) := B R_g(h) + R_f(Ah + R_g(h)) = o(|h|). To that end notice that
and, for sufficiently small |h|,
Hence

\[
\frac{R_f(Ah + R_g(h))}{|h|} \to 0 \quad \text{for } |h| \to 0 .
\]
Remark 2.6.5. The direction of ∇f(x) is the direction of steepest ascent at x, and −∇f(x) is the direction of steepest descent. Indeed, consider any v ∈ R^n with |v| = 1. Then

\[
|df(x)v| = \big| \langle \nabla f(x), v \rangle \big| \le |\nabla f(x)| \, |v| = |\nabla f(x)| ,
\]

and equality holds if v = ∇f(x)/|∇f(x)|.
2.7 Mean Value Theorems
Our goal in this section is to use information about the derivative of a function to obtain
information about the function itself.
Remark 2.7.1. In the case n = 1 we know the following Mean Value Theorem for a differentiable function f : f(x) − f(y) = f'(ξ)(x − y) for some ξ ∈ (x, y). We cannot generalize this, however, to vector-valued functions, since in general we get a different ξ for every component. The Fundamental Theorem of Calculus does not have this disadvantage:

\[
f(y) - f(x) = \int_x^y f'(\xi) \, d\xi
\]

is also true for vector-valued functions, but requires f' to be integrable.
We are now going to prove some versions of the Mean Value Theorem for functions of
several variables.
Proof. Let γ(t) = tx + (1 − t)y, t ∈ [0, 1], and F(t) = f(γ(t)). Then f(x) = F(1) and f(y) = F(0). The Chain Rule implies that F is differentiable and

\[
\frac{d}{dt} F(t) = df(\gamma(t)) \, \gamma'(t) .
\]

By the Mean Value Theorem for n = 1 there exists τ ∈ (0, 1) such that F(1) − F(0) = F'(τ). Hence

\[
f(x) - f(y) = df(\gamma(\tau)) \, (x - y) .
\]
Proof. Connect two arbitrary points by a polygon and apply the Mean Value Theorem
to each part.
Proof. Use the Fundamental Theorem of Calculus and the Chain Rule for F(t) = f(γ(t)).
Remark 2.7.5. Another version: let x ∈ Ω, ξ ∈ R^n and suppose that x + tξ ∈ Ω for all t ∈ [0, 1]. Then

\[
f(x + \xi) - f(x) = \int_0^1 df(x + t\xi) \, \xi \, dt .
\]
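This identity is easy to test numerically: approximate the integral by a quadrature rule and compare with the left hand side. A Python sketch with an arbitrarily chosen f, x and ξ (midpoint rule with N subintervals):

```python
import math

def f(x1, x2):
    return math.sin(x1) * x2 + x1 ** 2       # any C^1 function will do

def df_xi(x1, x2, xi1, xi2):
    # df(x)ξ = ∂1 f · ξ1 + ∂2 f · ξ2, with the partials computed by hand
    return (math.cos(x1) * x2 + 2.0 * x1) * xi1 + math.sin(x1) * xi2

x, xi = (0.4, 1.3), (0.9, -0.7)

# midpoint-rule approximation of the integral of df(x + tξ)ξ over [0, 1]
N = 20000
integral = sum(df_xi(x[0] + (k + 0.5) / N * xi[0],
                     x[1] + (k + 0.5) / N * xi[1],
                     xi[0], xi[1])
               for k in range(N)) / N

lhs = f(x[0] + xi[0], x[1] + xi[1]) - f(x[0], x[1])
assert abs(lhs - integral) < 1e-7
```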
Proposition 2.7.6. Let Ω ⊆ R^n be open and convex, i.e. for all points x, y ∈ Ω the line segment [x; y] is contained in Ω. Suppose that f ∈ C^1(Ω, R^m) and sup_{x∈Ω} |Df(x)| ≤ K. Then we have for all x, y ∈ Ω that

\[
|f(x) - f(y)| \le K \, |x - y| .
\]
Proof. Exercise.
3 The Inverse Function Theorem and the
Implicit Function Theorem
The Inverse Function Theorem and the Implicit Function Theorem are two of the most
important theorems in Analysis. The Inverse Function Theorem tells us when we can
locally invert a function; the Implicit Function Theorem tells us when a function is given
implicitly as a function of other variables. We will discuss both theorems in Rn here,
but they are also valid in basically the same form in infinite-dimensional spaces (more
precisely, in Banach spaces). Their applications are vast and we can only get a glimpse
of their significance in this course.
The flavour of both results is similar: we linearise the problem at a point by taking
the derivative df . Now, subject to a suitable nondegeneracy condition on df , we obtain a
result that works on a neighbourhood of the point. In this way we go from an infinitesimal
statement to a local (but not a global) result.
The theorems are equivalent; the classical approach, however, is to prove first the
Inverse Function Theorem via the Contraction Mapping Fixed Point Principle and then
deduce the Implicit Function Theorem from it. The proof of the Inverse Function The-
orem is however lengthy and technical and we do not have the time to go through it in
this lecture course. We recommend the books by Spivak (Calculus on Manifolds) [6] or Krantz and Parks (The Implicit Function Theorem: History, Theory and Applications, Birkhäuser), where you can also find an elementary (but still not short) proof of the Implicit Function Theorem which does not use the Inverse Function Theorem. The latter then follows directly as a corollary from the Implicit Function Theorem.
In these lecture notes we first prove the Implicit Function Theorem in the simplest
setting, which is the case of two variables. We then state carefully the Implicit Function
Theorem and the Inverse Function Theorem in higher dimensions, deduce the Implicit
Function Theorem from the Inverse Function Theorem, and give some examples of ap-
plications.
The Implicit Function Theorem describes conditions under which certain variables can
be written as functions of the others. In R2 it can be stated as follows.
Theorem 3.1.1. (Implicit Function Theorem in R^2) Let Ω ⊆ R^2 be open and f ∈ C^1(Ω). Let (x_0, y_0) ∈ Ω and assume that

\[
f(x_0, y_0) = 0 \quad\text{and}\quad \frac{\partial f}{\partial y}(x_0, y_0) \neq 0 .
\]

Then there exist open intervals I, J ⊆ R with x_0 ∈ I, y_0 ∈ J and a unique function g : I → J such that y_0 = g(x_0) and

f(x, y) = 0 if and only if y = g(x), for all (x, y) ∈ I × J.

Furthermore, g ∈ C^1(I) with

\[
g'(x_0) = - \frac{\frac{\partial f}{\partial x}(x_0, y_0)}{\frac{\partial f}{\partial y}(x_0, y_0)} . \tag{3.1}
\]
Remark 3.1.2. Obviously, an analogous result is true if \frac{\partial f}{\partial x}(x_0, y_0) ≠ 0.
Proof. (Not examinable) Without loss of generality we can assume that \frac{\partial f}{\partial y}(x_0, y_0) > 0. Due to the continuity of \frac{\partial f}{\partial y} we can also assume, by making Ω smaller if necessary, that

\[
\frac{\partial f}{\partial y}(x, y) \ge \delta > 0 \quad \text{for all } (x, y) \in \Omega . \tag{3.2}
\]
As a consequence we can find y1 < y0 < y2 such that f (x0 , y1 ) < 0 < f (x0 , y2 ) and due
to the continuity of f we can find an open interval I containing x0 such that
f (x, y1 ) < 0 < f (x, y2 ) for all x ∈ I . (3.3)
The Intermediate Value Theorem and (3.2) imply that for each x ∈ I there exists a
unique y ∈ (y1 , y2 ) =: J such that f (x, y) = 0. Denote this y by g(x). The continuity of
f and the uniqueness of y also imply that g is continuous.
To complete the proof of the theorem, we need to show that g is continuously differ-
entiable in I and that (3.1) holds. With the notation y = g(x) we find
\[
f(x+s, y+t) - f(x, y) = s \, \frac{\partial f}{\partial x}(x, y) + t \, \frac{\partial f}{\partial y}(x, y) + \varepsilon(s, t) \sqrt{s^2 + t^2} , \tag{3.4}
\]

with ε(s, t) → 0 as (s, t) → 0. We now choose t = g(x + s) − g(x), so that the left hand side in (3.4) vanishes, and obtain

\[
t \, \frac{\partial f}{\partial y}(x, y) = -s \, \frac{\partial f}{\partial x}(x, y) - \varepsilon(s, t) \sqrt{s^2 + t^2} . \tag{3.5}
\]

We rearrange to obtain

\[
\left| \frac{t}{s} + \frac{\partial f}{\partial x}(x, y) \Big/ \frac{\partial f}{\partial y}(x, y) \right| \le \frac{|\varepsilon|}{\big| \frac{\partial f}{\partial y}(x, y) \big|} \left( 1 + \frac{|t|}{|s|} \right) .
\]
Thus, if we can show that |t|/|s| ≤ C as s → 0, then we can let s → 0 in the above inequality to find that g'(x) indeed exists for all x ∈ I. For (x, y) = (x_0, y_0) we find the formula in (3.1), and the properties of f and g also imply that g' is continuous.

We still need to show that |t|/|s| ≤ C. We obtain from (3.5) that

\[
|t| \, \frac{\partial f}{\partial y}(x, y) \le |s| \, \Big| \frac{\partial f}{\partial x}(x, y) \Big| + |\varepsilon| \big( |s| + |t| \big) .
\]

We can now choose |s| so small that |ε|/δ ≤ 1/2, and then

\[
\frac{|t|}{|s|} \le 2 \, \Big| \frac{\partial f}{\partial x}(x, y) \Big/ \frac{\partial f}{\partial y}(x, y) \Big| + 1 .
\]
Example 3.1.3. In the example at the beginning of this section we have f(x, y) = x^2 + y^2 − 1. The theorem tells us that this relation defines y as a function of x in a neighbourhood of any point where ∂f/∂y is nonzero, that is, in a neighbourhood of the points other than (±1, 0).
Example 3.1.4. We show that for sufficiently small a > 0 there exists a function g ∈ C^1(−a, a) with g(0) = 0 such that

\[
g(x)^2 \, x + 2x^2 e^{g(x)} - g(x) = 0 \quad \text{for all } x \in (-a, a).
\]

Indeed, define f : R^2 → R via f(x, y) = y^2 x + 2x^2 e^y − y. Then f(0, 0) = 0 and ∂_y f(0, 0) = −1. Hence the Implicit Function Theorem implies the existence of the function g as
claimed. Furthermore we can compute g'(0) = −∂_x f(0, 0)/∂_y f(0, 0) = 0. Of course, we cannot hope for an explicit expression for g, but the Implicit Function Theorem tells us quite easily that such a g exists.
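Although g has no explicit expression, it can be computed numerically. The Python sketch below finds g(x) by bisection (the bracket [−0.5, 0.5] is an ad hoc choice that works for small |x|) and confirms g(0) = 0 and g'(0) = 0:

```python
import math

def f(x, y):
    return y ** 2 * x + 2.0 * x ** 2 * math.exp(y) - y

def g(x, tol=1e-12):
    """Solve f(x, y) = 0 for y near 0 by bisection (valid for small |x|,
    since df/dy is close to -1 there, so f is decreasing in y)."""
    lo, hi = -0.5, 0.5                   # f(x, lo) > 0 > f(x, hi) for small |x|
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(x, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

assert abs(g(0.0)) < 1e-10                      # g(0) = 0
s = 1e-4
assert abs((g(s) - g(-s)) / (2 * s)) < 1e-3     # difference quotient: g'(0) = 0
```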
• R^n = R^k × R^m ∋ (x_1, …, x_k, y_1, …, y_m) =: (x, y)
• f : Ω ⊆ R^n → R^m, (x_0, y_0) ∈ Ω ⊆ R^n, f(x_0, y_0) =: z_0
It is instructive to consider first the linear case. Let f(x, y) = Ax + By with A ∈ M_{m×k}(R) and B ∈ M_{m×m}(R). If B is invertible then the equation f(x, y) = Ax_0 + By_0 =: z_0 can be solved for y via

\[
y = B^{-1} \big( z_0 - Ax \big) .
\]
In the general case we write the Jacobian of f in block form as Df(x, y) = \big( D_x f(x, y) \ \big|\ D_y f(x, y) \big), where

\[
D_x f(x, y) = \Big( \frac{\partial f_j}{\partial x_i} \Big) \in M_{m \times k}(\mathbb{R}) \quad (j = 1, \dots, m; \ i = 1, \dots, k)
\]

and

\[
D_y f(x, y) = \Big( \frac{\partial f_j}{\partial y_i} \Big) \in M_{m \times m}(\mathbb{R}) \quad (j = 1, \dots, m; \ i = 1, \dots, m) .
\]
Then
If the remainder term were zero, then we would have f (x, y) = f (x0 , y0 ) = z0 iff
Hence there exists a function g(x) such that f(x, y) = z_0 iff y = g(x), as desired.
In the nonlinear case, of course the remainder term is nonzero. The Implicit Function
Theorem is the statement that we can still conclude the existence of such a function g,
subject to the nondegeneracy condition that Dy f is invertible.
Theorem 3.2.1. (The Implicit Function Theorem) Let f : Ω ⊆ Rk+m → Rm , where
n = k + m, f ∈ C 1 (Ω, Rm ) and let (x0 , y0 ) ∈ Ω with z0 = f (x0 , y0 ). If Dy f (x0 , y0 )
is invertible then there exist open neighbourhoods U of x0 and V of y0 , and a function
g ∈ C 1 (U, V ) such that
Furthermore,

\[
Dg(x_0) = - \big( D_y f(x_0, y_0) \big)^{-1} D_x f(x_0, y_0) .
\]
Example 3.2.2.
a) (Nonlinear system of equations)
Consider the system of equations

\[
f(x, y_1, y_2) = \begin{pmatrix} x^3 + y_1^3 + y_2^3 - 7 \\ x y_1 + y_1 y_2 + y_2 x + 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} .
\]
The function f is zero at the point (2, −1, 0) and
\[
Df(x, y_1, y_2) = \begin{pmatrix} 3x^2 & 3y_1^2 & 3y_2^2 \\ y_1 + y_2 & x + y_2 & x + y_1 \end{pmatrix} ,
\]

hence

\[
D_y f(2, -1, 0) = \begin{pmatrix} 3 & 0 \\ 2 & 1 \end{pmatrix} \quad\text{with}\quad \det D_y f(2, -1, 0) = 3 \neq 0 .
\]
The Implicit Function Theorem implies that there exist open neighbourhoods I of
2 and V ⊆ R2 of (−1, 0) and a continuously differentiable function g : I → V , with
g(2) = (−1, 0), such that
f (x, y1 , y2 ) = 0 ⇔ y = (y1 , y2 ) = g(x) = (g1 (x), g2 (x))
for all x ∈ I, y ∈ V . Furthermore, the derivative of g at x0 = 2 is given by
\[
Dg(2) = - \begin{pmatrix} 3 & 0 \\ 2 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 12 \\ -1 \end{pmatrix} = - \frac{1}{3} \begin{pmatrix} 1 & 0 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} 12 \\ -1 \end{pmatrix} = \begin{pmatrix} -4 \\ 9 \end{pmatrix} .
\]
b) Let f : R^4 → R^2 be given by f(x, y, u, v) = (x^2 + uy + e^v, \ 2x + u^2 - uv). Consider the point (2, 5, −1, 0), at which f(2, 5, −1, 0) = (0, 5)^T. The Jacobian matrix of f is

\[
Df(x, y, u, v) = \begin{pmatrix} 2x & u & y & e^v \\ 2 & 0 & 2u - v & -u \end{pmatrix} .
\]

Hence

\[
D_{(u,v)} f(x, y, u, v) = \begin{pmatrix} y & e^v \\ 2u - v & -u \end{pmatrix} \quad\text{and}\quad D_{(u,v)} f(2, 5, -1, 0) = \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} .
\]

Since det D_{(u,v)} f(2, 5, −1, 0) = 7 ≠ 0, the Implicit Function Theorem implies that there exist open neighbourhoods U ⊂ R^2 of (2, 5) and V ⊂ R^2 of (−1, 0) and a function g ∈ C^1(U, V) with g(2, 5) = (−1, 0) and f(x, y, g(x, y)) = (0, 5)^T for all (x, y) ∈ U. We can also compute that

\[
Dg(2, 5) = - \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 4 & -1 \\ 2 & 0 \end{pmatrix} = - \frac{1}{7} \begin{pmatrix} 2 & -1 \\ 18 & -2 \end{pmatrix} .
\]
c) (Writing a surface locally as graph)
Let h : R^3 → R with h(x, y, z) = xy − z log y + e^{xz} − 1. Can we represent the 'surface' given by h(x, y, z) = 0 locally in a neighbourhood of (0, 1, 1) in one of the forms x = f(y, z), y = g(x, z) or z = p(x, y)? The Jacobian matrix of h is Dh(x, y, z) = (y + z e^{xz}, \ x − z/y, \ −\log y + x e^{xz}), and thus Dh(0, 1, 1) = (2, −1, 0).
Hence, the Implicit Function Theorem tells us that we can represent the surface
locally as x = f (y, z) or y = g(x, z), but it does not tell us whether we can do it
in the form z = p(x, y). In fact, one can show that the latter is not possible.
Now, if the remainder term were not present, then we could just invert the function by
Of course, this will only be true if f is itself linear! The content of the Inverse Function
Theorem is that for general differentiable f there will still be a local inverse, that is, an
inverse defined on a neighbourhood of x0 , provided the Jacobian Df (x0 ) is invertible.
Example 3.3.1. Let f : R → R be given by f (x) = x2 .
• For x_0 > 0 or x_0 < 0 we have that f is invertible in a neighbourhood of x_0.
Example 3.3.3. Here is a simple example of a function f : (−1, 1) → (−1, 1) which is bijective, with f ∈ C^1, but such that f^{-1} is not differentiable on (−1, 1).

Let f : (−1, 1) → (−1, 1) be given by f(x) = x^3. Obviously f is bijective with inverse f^{-1} : (−1, 1) → (−1, 1) given by f^{-1}(y) = y^{1/3}. Furthermore, f ∈ C^∞(−1, 1), but f^{-1} is not differentiable at 0. Hence, f is not a diffeomorphism.
We sometimes informally think of a diffeomorphism as a ‘smooth change of coordi-
nates’.
We can now state our theorem.
Example 3.3.6.
a) f : R → R, f (x) = x2 . If x0 > 0 we can choose U = (0, ∞), if x0 < 0 we can choose
U = (−∞, 0).
b) Let f : R^+ × R → R^2 be given by f(r, φ) = (r cos φ, r sin φ). Then

\[
Df(r, \varphi) = \begin{pmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{pmatrix} \quad\text{with}\quad \det Df(r, \varphi) = r > 0 .
\]

Hence f is locally invertible everywhere, but not globally (in fact f is 2π-periodic in the φ variable). The local inverse can be computed: writing f(r, φ) =: (x, y) ∈ R^2 and letting

\[
U = \Big\{ (r, \varphi) \ \Big|\ \varphi \in \Big( -\frac{\pi}{2}, \frac{\pi}{2} \Big) \Big\} \quad\text{and}\quad V = \{ (x, y) \in \mathbb{R}^2 \mid x > 0 \} ,
\]

the inverse f^{-1} : V → U is given by r = \sqrt{x^2 + y^2} and φ = arctan(y/x).
c) The following important example is one we encountered in Example 2.2.1. The exponential function exp : C → C given by z ↦ e^z is, in real coordinates, the map (x, y) ↦ (e^x cos y, e^x sin y). The Jacobian is

\[
Df(x, y) = \begin{pmatrix} e^x \cos y & -e^x \sin y \\ e^x \sin y & e^x \cos y \end{pmatrix}
\]
and det Df(x, y) = e^{2x}(cos^2 y + sin^2 y) = e^{2x}, which never vanishes. Hence Df is always invertible, and the Inverse Function Theorem tells us that the exponential map is a local diffeomorphism. However, it is not a global diffeomorphism as it is not bijective. The map is periodic in y with period 2π; equivalently, exp(z + 2πi) = exp(z). (For those of you who have seen the concept in topology, the exponential map is a covering map from C onto C \ {0}.)
d) Let f : (0, ∞) × R → R2 be given by f (x, y) = (cosh x cos y, sinh x sin y) =: (u, v).
Then
\[
Df(x, y) = \begin{pmatrix} \sinh x \cos y & -\cosh x \sin y \\ \cosh x \sin y & \sinh x \cos y \end{pmatrix} .
\]
Hence det Df(x, y) = sinh^2 x + sin^2 y, and thus det Df(x, y) > 0 for all x > 0, y ∈ R.
As a consequence of the Inverse Function Theorem we have that f is locally a
diffeomorphism for all (x, y). (The function f is not a global diffeomorphism as it
is periodic in y.)
Notice that for fixed x > 0 the image f(x, ·) describes an ellipse with semi-axes of length cosh x > 1 and sinh x respectively. Hence f((0, ∞) × R) = R^2 \ {(u, 0) | |u| ≤ 1}.
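The identity det Df(x, y) = sinh²x + sin²y follows from cosh² = 1 + sinh² and cos² + sin² = 1; a quick numerical confirmation in Python (the sample points are arbitrary):

```python
import math

def det_Df(x, y):
    # determinant of [[sinh x cos y, -cosh x sin y], [cosh x sin y, sinh x cos y]]
    return (math.sinh(x) * math.cos(y)) ** 2 + (math.cosh(x) * math.sin(y)) ** 2

for x, y in ((0.5, 1.0), (2.0, -0.3), (0.1, 3.0)):
    expected = math.sinh(x) ** 2 + math.sin(y) ** 2
    assert abs(det_Df(x, y) - expected) < 1e-10 * (1.0 + expected)
    assert det_Df(x, y) > 0.0          # positive for every x > 0
```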
We conclude by deducing the Implicit Function Theorem from the Inverse Function
Theorem.
Recall the setup of the Implicit Function Theorem. We have a C^1 map f : R^{k+m} → R^m. We write a point of R^{k+m} as (x, y) with x ∈ R^k and y ∈ R^m, and we assume the m × m submatrix D_y f of Df at (x_0, y_0) is invertible. We want to locally find a function g such that f(x, y) = 0 iff y = g(x).
In order to apply the Inverse Function Theorem we extend f to a function R^{k+m} → R^{k+m}: explicitly, we let

\[
F(x, y) = (x, f(x, y)).
\]

Now

\[
DF = \begin{pmatrix} I & 0 \\ D_x f & D_y f \end{pmatrix} ,
\]
so as we are assuming invertibility of Dy f , we have that DF is invertible at (x0 , y0 ). The
Inverse Function Theorem now tells us F has a local differentiable inverse h : (x, y) ↦ (h_1(x, y), h_2(x, y)). We have

\[
(x, y) = F \circ h(x, y) = \big( h_1(x, y), \ f(h(x, y)) \big),
\]

so h_1(x, y) = x, and hence h(x, y) = (x, h_2(x, y)) with f(x, h_2(x, y)) = f ∘ h(x, y) = y. In particular, f(x, h_2(x, 0)) = 0, and we can take g(x) = h_2(x, 0).
In fact, this way of approaching the Implicit Function Theorem also yields the following
useful theorem.
\[
f \circ h(x_1, \dots, x_n) = (x_{n-m+1}, \dots, x_n)
\]
Proof. This is basically contained in the proof of the Implicit Function Theorem above.
After applying a permutation of coordinates (which is a diffeomorphism of Rn ) we can
assume that the m × m matrix formed from the m last columns of Df (a) is invertible.
Now the proof we saw above shows the existence of h such that f ◦ h(x, y) = y, as
required.
4 Submanifolds in Rn and constrained
minimisation problems
We are now going to introduce the notion of submanifolds of Rn , which are generalisa-
tions to general dimensions of smooth surfaces in R3 .
4.1 Submanifolds in Rn
Let us begin by looking at hypersurfaces in Rn , that is, zero loci of functions f : Rn → R.
These can be very complicated and singular in general, but we expect that for a generic
choice of f we get a smooth set. We can make this precise using the Implicit Function
Theorem as follows.
Let f : R^n → R be a C^1-function and M = {x ∈ R^n | f(x) = 0} = f^{-1}{0} its zero set. If Df(a) ≠ 0 for some a ∈ M, then we know from the Implicit Function Theorem that we can represent M in a neighbourhood of a as the graph of a function of n − 1 variables (after a suitable reordering of coordinates), x_n = h(x_1, …, x_{n−1}). In this way, a neighbourhood of a is seen to be diffeomorphic to an open set in R^{n−1}. We could also see this via the result at the end of the previous section, because this tells us that after a diffeomorphism we can reduce to the case when the map is a projection.

If this kind of behaviour holds for all points a ∈ M, we say M is an (n − 1)-dimensional submanifold of R^n. So we know that if Df(x) is nonzero for all x ∈ M, then M is an (n − 1)-dimensional submanifold of R^n.
We are now going to generalize this definition to k-dimensional submanifolds of Rn ,
for general k.
The next proposition, which is a consequence of the Implicit Function Theorem, will
tell us that a submanifold can be locally represented as a graph of a differentiable
function.
(2) For each x ∈ M we can, after suitably relabelling the coordinates, write x = (z0 , y0 )
with z0 ∈ Rk , y0 ∈ Rn−k and find an open neighbourhood U of z0 in Rk , an open
neighbourhood V of y0 in Rn−k , and a map g ∈ C 1 (U, V ) with g(z0 ) = y0 , such
that
M ∩ (U × V ) = {(z, g(z)) | z ∈ U }.
Remark 4.1.4. In (2) it is important that we remember that the statement is true
only after relabelling the coordinates. For instance, consider again the unit circle S^1 = {(x_1, x_2) | x_1^2 + x_2^2 = 1} in R^2. If x = (1, 0) ∈ S^1 we have

\[
S^1 \cap \big( (0, 2) \times (-1, 1) \big) = \big\{ (\sqrt{1 - z^2}, \, z) \ \big|\ |z| < 1 \big\} .
\]
Proof. We first show that (1) implies (2): After possibly relabelling the coordinates we
can write x as x = (z0 , y0 ) such that Dy f (x) is invertible. Then property (2) follows
from the Implicit Function Theorem.
Now assume that (2) is satisfied. Define Ω = U × V and f ∈ C^1(Ω, R^{n−k}) via

\[
f(z, y) = y - g(z) .
\]

Then M ∩ Ω = f^{-1}{0} and Df(z, y) = (−Dg(z), \mathrm{Id}_{n−k}). It follows that rank Df(z, y) = n − k.
Remark 4.1.5. In fact it is possible to use these ideas to define abstract k-dimensional
manifolds without reference to an embedding in Rn . Roughly speaking, such a manifold
is a space covered by open sets (‘charts’), each homeomorphic to an open set in Rk , such
that the charts fit together smoothly in a suitable sense. It is in fact possible to transfer
the machinery of differential and integral calculus to such abstract manifolds. These
ideas are explored further in the Part C course on Differentiable Manifolds.
Example 4.1.6.
a) (Curves in R2 )
i) The unit circle in R2 :
∗ A definition as the level set of a function is given by {(x, y) ∈ R2 |
x2 + y 2 − 1 = 0}. Note that this is f −1 (0) where f (x, y) = x2 + y 2 − 1, and
that Df (x, y) = (2x, 2y) which has rank 1 at all points of the unit circle
(in fact, at all points except the origin). So the circle is a 1-dimensional
submanifold of R2 .
∗ A local representation as the graph of a function is, for example, y(x) = √(1 − x^2) for x ∈ (−1, 1) (the upper half of the circle; the lower half is given by y(x) = −√(1 − x^2)).
∗ A parametrisation is given by γ : [0, 2π) → R2 ; γ(t) = (cos t, sin t).
ii)
\[
M = \{ (x, y) \in \mathbb{R}^2 \mid x^3 - y^2 - 1 = 0 \}
\]

defines a one-dimensional submanifold (a regular curve) in R^2. Here, Df(x, y) = (3x^2, −2y), which again has rank 1 at all points of M.
iii) Consider now

\[
M = \{ (x, y) \in \mathbb{R}^2 \mid x^2 - y^2 = 0 \} .
\]
Now M = f −1 (0) where f (x, y) = x2 − y 2 , and Df (x, y) = (2x, −2y) has rank
1 except at the origin, which is on the curve. So the curve has the submanifold
property away from the origin, but not at the origin itself. Geometrically, we
can see that the curve is the union of the two lines y = x and y = −x, which
meet at (0, 0). So away from the origin the curve looks like a 1-dimensional
submanifold, but this breaks down at the origin.
b) (Ellipsoids)
An ellipsoid is given by

\[
M = \Big\{ x \in \mathbb{R}^3 \ \Big|\ f(x) = \frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} + \frac{x_3^2}{c^2} - 1 = 0 \Big\}
\]

for some a, b, c > 0. We check that this defines a two-dimensional submanifold of R^3. Indeed,

\[
Df(x) = 2 \Big( \frac{x_1}{a^2}, \ \frac{x_2}{b^2}, \ \frac{x_3}{c^2} \Big)
\]

and thus Df(x) = 0 if and only if x = 0, but 0 ∉ M.
c) (Torus)
Let n = 3 and k = 2. For 0 < r < R a torus is given by
\[
T = \Big\{ (x, y, z) \in \mathbb{R}^3 \ \Big|\ f(x, y, z) = \big( \sqrt{x^2 + y^2} - R \big)^2 + z^2 - r^2 = 0 \Big\} .
\]
That is, the torus consists of the points in R3 which have distance r to a circle
with radius R. The defining function f is continuously differentiable away from
the z-axis. However, when r < R the torus does not contain any point on the
z-axis. We calculate
\[
Df(x, y, z) = \Big( 2 \big( \sqrt{x^2 + y^2} - R \big) \frac{x}{\sqrt{x^2 + y^2}}, \ 2 \big( \sqrt{x^2 + y^2} - R \big) \frac{y}{\sqrt{x^2 + y^2}}, \ 2z \Big)
\]
d) (Orthogonal group)
We claim that

\[
O(n) = \{ X \in M_{n \times n}(\mathbb{R}) \mid X^T X = \mathrm{Id} \}
\]

is a submanifold of R^{n^2} of dimension n(n − 1)/2.

To see this, let

\[
S(n) = \{ X \in M_{n \times n}(\mathbb{R}) \mid X^T = X \}
\]

and define f : M_{n×n}(R) → S(n) by

\[
f(X) = X^T X .
\]

Then

\[
df(X)H = H^T X + X^T H \in S(n).
\]
It remains to show that for all X ∈ O(n) the map df (X) is surjective. Let Z ∈ S(n)
and define H := \tfrac{1}{2} XZ. Then

\[
df(X)H = \frac{1}{2} \big( Z^T X^T X + X^T X Z \big) = \frac{1}{2} \big( Z^T + Z \big) = Z .
\]

Hence the range of df(X) is S(n), thus rank df(X) = dim S(n) = n(n + 1)/2, and O(n) is a submanifold of dimension k = n^2 − n(n + 1)/2 = n(n − 1)/2.
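Both steps of this argument can be checked numerically for n = 2, where rotation matrices give points of O(2). A Python sketch (the test matrices H and Z are arbitrary, with Z symmetric):

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

t = 0.7
X = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]   # rotation, X in O(2)

def df(X, H):
    # df(X)H = H^T X + X^T H
    P, Q = mat_mul(transpose(H), X), mat_mul(transpose(X), H)
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

# 1) df(X)H always lands in the symmetric matrices S(n)
H = [[0.3, -1.1], [0.8, 0.5]]
S = df(X, H)
assert abs(S[0][1] - S[1][0]) < 1e-12

# 2) surjectivity onto S(n): H = XZ/2 is mapped back to Z
Z = [[1.0, 2.0], [2.0, -3.0]]                                   # symmetric
H = [[0.5 * v for v in row] for row in mat_mul(X, Z)]
S = df(X, H)
assert all(abs(S[i][j] - Z[i][j]) < 1e-12 for i in range(2) for j in range(2))
```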
This is an example of a Lie group, a manifold which is also a group, such that the
group operations of multiplication and inversion are given by differentiable maps.
Many symmetry groups in physical problems turn out to be Lie groups. This
important topic linking geometry and algebra is the subject of the Part C course
Lie Groups. There are also many excellent books on the subject [1, 2, 4].
We now define an important concept for manifolds, the tangent space at a point of
the manifold.
Definition 4.1.7. (Tangent vector, tangent space, normal vector) Let M ⊆ Rn be a
k-dimensional submanifold of Rn .
1. We call v ∈ Rn a tangent vector to M at x ∈ M , if there exists a C 1 -function
γ : (−ε, ε) → Rⁿ, such that γ(t) ∈ M for all t ∈ (−ε, ε), γ(0) = x and γ′(0) = v.
2. The set of all tangent vectors to M at x is called the tangent space to M at x, and
we denote it by Tx M .
3. We call w ∈ Rⁿ a normal vector to M at x ∈ M if ⟨w, v⟩ = 0 for all v ∈ Tx M.
Thus the set of all normal vectors to M at x is precisely the orthogonal complement
Tx M ⊥ of Tx M in Rn .
Next we prove the generalisation of the property that ‘the gradient is perpendicular
to the level sets of a function’ (see Corollary 2.6.4). This result in particular also shows
that Tx M is indeed a k-dimensional vector space and as a consequence that the space
of normal vectors is an (n − k)-dimensional vector space.
Proposition 4.1.8. Let M be a k-dimensional submanifold of Rn . Let Ω be an open
subset of Rn and let f ∈ C 1 (Ω, Rn−k ) be such that M ∩ Ω = f −1 {0} and rankDf (x) =
n − k for all x ∈ Ω. Then we have
Tx M = ker Df (x),
for all x ∈ M ∩ Ω, that is the tangent space equals the kernel of Df (x).
Proof. We first claim that Tx M ⊆ ker Df (x):
Indeed, let v ∈ Tx M , then there exists γ : (−ε, ε) → M such that
γ(0) = x and γ′(0) = v.
It follows that f(γ(t)) = 0 for all t ∈ (−ε, ε). Hence

    0 = d/dt f(γ(t)) = Df(γ(t)) γ′(t),

and for t = 0 we find 0 = Df(x)v, hence v ∈ ker Df(x).
Now recall that, possibly after a suitable relabelling, we can assume in view of Propo-
sition 4.1.3 that x = (z0 , y0 ) ∈ Rk × Rn−k and that there exist open subsets U ⊆ Rk
with z0 ∈ U and V ⊆ Rn−k with y0 ∈ V and a function g ∈ C 1 (U, V ) with g(z0 ) = y0
such that
M ∩ (U × V ) = {(z, g(z)) | z ∈ U }.
We define G : U → Rn by G(z) = (z, g(z)) and for an arbitrary ξ ∈ Rk and sufficiently
small ε we let γ : (−ε, ε) → M be given by
γ(t) = G(z0 + tξ) .
Then γ′(t) = DG(z0 + tξ)ξ and

    γ′(0) = DG(z0)ξ   with   DG(z0) = ( Idk ; Dg(z0) ),

the n × k matrix whose top k × k block is Idk and whose bottom block is Dg(z0).
Hence im DG(z0) ⊆ Tx M and thus we have shown so far that im DG(z0) ⊆ Tx M ⊆ ker Df(x). But DG(z0) is clearly injective, hence dim im DG(z0) = k = n − rank Df(x) = dim ker Df(x). Hence im DG(z0) = ker Df(x) = Tx M.
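Proposition 4.1.8 can be illustrated on the unit sphere, taking f(x) = |x|² − 1 so that Df(x) = 2x: the velocity of any curve on the sphere lies in ker Df(x). A minimal numerical sketch (the particular curve is chosen for illustration):

```python
import numpy as np

# Sphere M = f^{-1}{0} with f(x) = |x|^2 - 1, so Df(x) = 2x and the
# proposition predicts T_x M = ker Df(x) = {v : <x, v> = 0}.
x = np.array([0.0, 0.0, 1.0])        # a point of M (the north pole)
Df_x = 2 * x

# a curve on M through x with gamma(0) = x
gamma = lambda t: np.array([np.sin(t), 0.0, np.cos(t)])

# finite-difference approximation of the tangent vector gamma'(0) = (1, 0, 0)
h = 1e-6
v = (gamma(h) - gamma(-h)) / (2 * h)

assert abs(Df_x @ v) < 1e-6          # v lies in ker Df(x), as predicted
```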
Example 4.1.9.
Tx M = { v ∈ Rⁿ | 2⟨v, Ax⟩ = 0 }
We have seen that O(n) is a submanifold of R^(n²) of dimension ½ n(n − 1) and we
also have Id ∈ O(n). With df (X)H = X T H + H T X and df (Id)H = H T + H it
follows
    T_Id M = { H ∈ Mn×n(R) | Hᵀ + H = 0 },

that is, the tangent space at Id is the space of skew-symmetric matrices.
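The conclusion that tangent vectors at Id are skew-symmetric can be seen concretely in O(2), using the curve of planar rotations through the identity:

```python
import numpy as np

# A curve in O(2) through the identity: gamma(t) is rotation by angle t
def gamma(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

assert np.allclose(gamma(0.0), np.eye(2))                  # gamma(0) = Id
assert np.allclose(gamma(0.3).T @ gamma(0.3), np.eye(2))   # gamma(t) in O(2)

# finite-difference approximation of the tangent vector gamma'(0)
h = 1e-6
v = (gamma(h) - gamma(-h)) / (2 * h)

# v is skew-symmetric, in accordance with T_Id O(2)
assert np.allclose(v + v.T, np.zeros((2, 2)), atol=1e-6)
```

Here γ′(0) is (approximately) the matrix with rows (0, −1) and (1, 0), the basic skew-symmetric 2 × 2 matrix.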
In fact, the tangent space to a general Lie group at the identity element carries a
very rich algebraic structure, beyond the basic vector space structure it has as a
tangent space. This structure is that of a Lie algebra, a vector space V which also
carries a skew-symmetric bilinear map V × V → V satisfying a certain identity
called the Jacobi identity. Writing the bilinear map, as is traditional, using bracket notation [X, Y], the Jacobi identity is

    [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.
For matrix Lie groups such as O(n), the bracket is actually given by [X, Y ] =
XY − Y X (check that this does indeed satisfy the Lie algebra axioms!). For
further material on Lie algebras, we refer the reader to the books [3, 2] and also
the Part C course Lie Algebras.
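For the matrix bracket [X, Y] = XY − YX, both the skew-symmetry and the Jacobi identity can be checked directly; the following sketch does so for random 3 × 3 matrices:

```python
import numpy as np

rng = np.random.default_rng(1)

def bracket(X, Y):
    return X @ Y - Y @ X

# three random 3x3 matrices
X, Y, Z = (rng.standard_normal((3, 3)) for _ in range(3))

# skew-symmetry of the bracket
assert np.allclose(bracket(X, Y), -bracket(Y, X))

# the Jacobi identity [X,[Y,Z]] + [Y,[Z,X]] + [Z,[X,Y]] = 0
J = (bracket(X, bracket(Y, Z))
     + bracket(Y, bracket(Z, X))
     + bracket(Z, bracket(X, Y)))
assert np.allclose(J, np.zeros((3, 3)))
```

Expanding the triple products shows that all terms cancel identically, which is exactly the exercise suggested in the text.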
4.2 Extremal problems with constraints
We now consider an application of these ideas to the study of constrained extremisation
problems. We know from elementary vector calculus that the critical points of a func-
tion on Rn are the points where the gradient vanishes. We now want to consider the
more subtle problem of extremising a function subject to a constraint, that is, finding
the extrema of a function on some subset of Euclidean space defined by one or more
equations.
Let us first consider the simplest case, where we have two functions f, g ∈ C 1 (R2 ),
and our goal is to minimise (or maximise) g under the constraint that f (x, y) = 0.
We can often (under some assumptions on f ) think of the set
Γ = {(x, y) ∈ R2 | f (x, y) = 0}
as a curve in R². Let (x0, y0) be such that for some ε > 0 and all (x, y) ∈ Γ ∩ Bε(x0, y0) we have g(x, y) ≥ g(x0, y0).
Suppose that ∇f (x0 , y0 ) 6= 0, and assume without loss of generality that ∂y f (x0 , y0 ) 6=
0. The Implicit Function Theorem guarantees that we can represent Γ in an open
neighbourhood of (x0 , y0 ) as (x, ϕ(x)) for x ∈ I, where I is an open interval with x0 ∈
I, ϕ ∈ C 1 (I) and ϕ(x0 ) = y0 . The tangent to Γ at (x, ϕ(x)) is given by the vector
(1, ϕ′(x)), and since the gradient is perpendicular to the level sets we have

    (1, ϕ′(x)) ⊥ ∇f(x, ϕ(x)).
Define G(x) = g(x, ϕ(x)) and consider the point (x0, y0) at which g has a local minimum on Γ. Then, by Fermat's Theorem and the Chain Rule,

    0 = G′(x0) = ∂x g(x0, y0) + ∂y g(x0, y0) ϕ′(x0) = ⟨∇g(x0, y0), (1, ϕ′(x0))⟩.

Hence ∇g(x0, y0) is perpendicular to the tangent direction of Γ at (x0, y0), and so must be a scalar multiple of ∇f(x0, y0): there exists λ ∈ R with ∇g(x0, y0) = λ∇f(x0, y0). The following theorem generalises this observation.
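This orthogonality can be illustrated on a concrete instance (chosen here for illustration, not taken from the notes): g(x, y) = x + y on the unit circle f(x, y) = x² + y² − 1 = 0, whose minimiser is (−1/√2, −1/√2) by inspection. At this point ∇g and ∇f are indeed parallel:

```python
import numpy as np

# Constraint f(x, y) = x^2 + y^2 - 1 = 0 (the unit circle) and
# objective g(x, y) = x + y; the constrained minimiser is known
# by inspection to be (x0, y0) = (-1/sqrt(2), -1/sqrt(2)).
grad_g = lambda x, y: np.array([1.0, 1.0])
grad_f = lambda x, y: np.array([2 * x, 2 * y])

x0 = y0 = -1.0 / np.sqrt(2.0)

gg, gf = grad_g(x0, y0), grad_f(x0, y0)

# the 2D cross product vanishes iff the two gradients are parallel
assert abs(gg[0] * gf[1] - gg[1] * gf[0]) < 1e-12

# the Lagrange multiplier lambda with grad g = lambda * grad f
lam = gg[0] / gf[0]
assert np.allclose(gg, lam * gf)
```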
Theorem 4.2.1. (Theorem on Lagrange multipliers) Let Ω ⊆ Rn be open, g ∈ C 1 (Ω)
and f ∈ C 1 (Ω, Rn−k ). If x0 ∈ f −1 {0} is a local extremum of g on f −1 {0}, that is there
exists an open neighbourhood V of x0 such that for all x ∈ V which satisfy f (x) = 0 we
have
g(x) ≥ g(x0 ) (or g(x) ≤ g(x0 )),
and if rank Df(x0) = n − k, then there exist λ1, . . . , λn−k ∈ R such that

    ∇g(x0) = λ1 ∇f1(x0) + · · · + λn−k ∇fn−k(x0).
Proof. If V is sufficiently small then we have for all x ∈ V that rankDf (x) = n − k,
hence M = f −1 {0} ∩ V is a k-dimensional submanifold of Rn . For a v ∈ Tx0 M let
γ : (−ε, ε) → M be a C¹-function such that γ(0) = x0 and γ′(0) = v. The function g ◦ γ has a local extremum at t = 0. Thus

    0 = d/dt g(γ(t))|t=0 = ⟨∇g(γ(t)), γ′(t)⟩|t=0 = ⟨∇g(x0), v⟩
and thus ∇g(x0 ) ∈ (Tx0 M )⊥ . Furthermore we have for all x ∈ M and i = 1, . . . , n − k
that
fi (x) = 0, and thus in particular ∇fi (x0 ) ⊥ Tx0 M.
Since rank Df(x0) = n − k the vectors ∇fi(x0) are linearly independent and form a basis of (Tx0 M)⊥. Hence there exist λ1, . . . , λn−k such that ∇g(x0) = λ1 ∇f1(x0) + · · · + λn−k ∇fn−k(x0).
Example 4.2.2.
We compute that Df(x, y, z) has rows (1, 1, 1) and (2x, 2y, 2z), and conclude that rank Df(x, y, z) = 2 for all (x, y, z) ∈ M (as (1, 1, 1) ∉ M).
Hence M is a one-dimensional submanifold of R3 . Furthermore M is compact, g is
continuous and hence g attains its infimum and supremum on M . Theorem 4.2.1
implies that there exist λ1, λ2 ∈ R such that at an extremal point (x, y, z) we have
∇g(x, y, z) = λ1 ∇f1 (x, y, z) + λ2 ∇f2 (x, y, z). Thus, we find for an extremal point
(x, y, z) that
    (5, 1, −3) = λ1 (1, 1, 1) + λ2 (2x, 2y, 2z)

and f(x, y, z) = 0. In a similar way one can use Theorem 4.2.1 to prove the inequality of arithmetic and geometric means: for a1, . . . , an > 0,

    (a1 a2 · · · an)^(1/n) ≤ (1/n) (a1 + a2 + · · · + an).
The details of the proof are an exercise on the second problem sheet.
For a symmetric matrix A ∈ Mn×n(R) consider g(x) = ⟨x, Ax⟩ on the unit sphere M = { x ∈ Rⁿ | f(x) = |x|² − 1 = 0 }. For every x ∈ M we have

    ∇f(x) = 2x ≠ 0.
M is compact, so g attains its supremum on M at a point x0. By Theorem 4.2.1 there exists λ ∈ R such that ∇g(x0) = λ∇f(x0), that is, Ax0 = λx0. This implies that x0 is an eigenvector of A with eigenvalue λ. Thus, we have shown that every real symmetric n × n matrix has a real eigenvalue. We also find that g(x0) = ⟨x0, Ax0⟩ = λ⟨x0, x0⟩ = λ. Since g(x0) is the maximal value of g on M, this also implies that λ is the largest eigenvalue of A.
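This argument can be checked against numerical linear algebra: for a random symmetric matrix, the maximum of g(x) = ⟨x, Ax⟩ over unit vectors is the largest eigenvalue. A sketch (the matrix and trial vectors are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# a random real symmetric matrix
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2.0

lams, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order
x0 = vecs[:, -1]                  # unit eigenvector for the largest eigenvalue

assert np.allclose(A @ x0, lams[-1] * x0)   # A x0 = lambda x0

# g(x) = <x, Ax> on the unit sphere never exceeds the largest eigenvalue
for _ in range(200):
    x = rng.standard_normal(5)
    x /= np.linalg.norm(x)
    assert x @ A @ x <= lams[-1] + 1e-10
```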
Bibliography
[1] T. Bröcker and T. tom Dieck, Representations of compact Lie groups, Graduate
Texts in Mathematics, Springer, 1985.
[2] R. Carter, G. Segal and I. Macdonald, Lectures on Lie groups and Lie algebras,
LMS Student Texts, CUP, 1995.
[4] B. Hall, Lie groups, Lie algebras and representations : an elementary introduction,
Graduate Texts in Mathematics, Springer, 2003.
[5] S.G. Krantz and H. Parks, The implicit function theorem, Birkhäuser, 2002.