Mathematical Analysis – Module 2

Bocconi University – course number 30543

Taught during the spring 2022 (version: February 3, 2022)

Hugo LAVENANT
Contents

1 The Euclidean space $\mathbb{R}^d$
1.1 The space $\mathbb{R}^d$
1.2 Dot product and distances
1.3 Determinant and cross product
1.4 Domains of $\mathbb{R}^d$

I Differentiation

2 Parametric curves
2.1 Definition and representation
2.2 Derivatives
2.3 Taylor expansion and local behavior
2.4 A remark on the mean value theorem
2.5 Curves in polar coordinates

3 Notions of topology in $\mathbb{R}^d$
3.1 Limits
3.2 Neighborhood, interior and closure
3.3 Open and closed sets

4 Continuity and differentiability for functions from $\mathbb{R}^2$ to $\mathbb{R}$
4.1 Definition and representation
4.2 Continuity
4.3 Complement: continuity, openness and closedness
4.4 Partial derivatives
4.5 Differential
4.6 A particular case of the chain rule

5 Functions from $\mathbb{R}^d$ to $\mathbb{R}^p$: vector fields and change of coordinates
5.1 Definition
5.2 Continuous function
5.3 Differentiability
5.4 The chain rule

6 Higher order derivatives
6.1 Partial derivatives
6.2 Schwarz’s theorem
6.3 Taylor expansion of second order
6.4 Complement: the algebraic structure of higher order derivatives

II Integration

7 Path integrals
7.1 Integral of a scalar field
7.2 Independence of the parametrization
7.3 Length of a curve
7.4 Integral of a vector field

8 Integrals of functions of several variables
8.1 Topological preliminaries
8.2 Definition of the integral
8.3 Fubini’s theorem
8.4 Change of variables

Chapter 1

The Euclidean space $\mathbb{R}^d$

In most of the course, we will deal with functions of several variables whose codomain is not necessarily the real line, that is, functions which are defined on and/or take their values in the set $\mathbb{R}^d$. Formally, $\mathbb{R}^d$ is the space of tuples of $d$ real numbers. In the first semester, you studied this space from the point of view of linear algebra; in the rest of the course we will deal with its topological and differential properties.

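1.1 The space $\mathbb{R}^d$

Definition 1.1. We write $\mathbb{R}^d$ for the space of $d$-tuples of real numbers. That is, an element $\mathbf{x}$ of $\mathbb{R}^d$ is an ordered collection of $d$ real numbers $\mathbf{x} = (x_1, x_2, \ldots, x_d)$.

Note that the convention in these notes is that vectors are written in a bold font. In practice, we represent $\mathbb{R}^d$ by a plane when $d = 2$, and by the 3-dimensional space when $d = 3$. Moreover, in the case of $\mathbb{R}^2$ a typical point is $(x, y)$ while in $\mathbb{R}^3$ it is $(x, y, z)$. The canonical basis of $\mathbb{R}^d$ will be denoted by $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d)$, that is, $\mathbf{e}_i$ is the vector in $\mathbb{R}^d$ whose components are all equal to $0$ except for the $i$-th one, which is equal to $1$.
The space $\mathbb{R}^d$ is a vector space, that is, we can add vectors of $\mathbb{R}^d$ componentwise: if $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ then
\[ \mathbf{x} + \mathbf{y} = \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_d \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_d + y_d \end{pmatrix}. \]
We will write a vector interchangeably as a row or a column vector, using the latter when it makes the operations clearer. Moreover, we can multiply a vector by a scalar by multiplying each component by the scalar: if $\mathbf{x} \in \mathbb{R}^d$ and $a \in \mathbb{R}$,
\[ a\mathbf{x} = a \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} := \begin{pmatrix} a x_1 \\ \vdots \\ a x_d \end{pmatrix}. \]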

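Example 1.2. Let’s look at the function $f : \mathbb{R} \to \mathbb{R}^2$ defined by
\[ f(t) = \begin{pmatrix} 3 - 2t \\ -1 + t \end{pmatrix}. \]
This is a function which takes a single number $t \in \mathbb{R}$ and outputs a point $\mathbf{x} = f(t) = (3 - 2t, -1 + t)$. Actually, we can decompose it as
\[ f(t) = \begin{pmatrix} 3 - 2t \\ -1 + t \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \end{pmatrix}. \]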

Figure 1.1: Summing vectors and multiplying them by a scalar.

Figure 1.2: Plotting values of $f(t) = (3 - 2t, -1 + t)$. The image $f(\mathbb{R})$ is the line through $f(0) = (3, -1)$ directed by $(-2, 1)$, since $f(t) = f(0) + t(-2, 1)$.

How to plot $f(t)$? With the decomposition above, the procedure reads as: (i) start from the point $(3, -1)$; (ii) then attach to this point the vector $t(-2, 1)$, which is colinear to $(-2, 1)$; (iii) at the end of this vector you find $f(t)$.
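
The three-step procedure can be sketched in a few lines of Python. This is only an illustration: the function name `f` mirrors the example, while `base` and `direction` are names introduced here, and the signs of the coordinates are read off Figure 1.2.

```python
def f(t):
    """Evaluate f(t) = (3, -1) + t * (-2, 1) from Example 1.2."""
    base = (3, -1)        # step (i): the starting point f(0)
    direction = (-2, 1)   # step (ii): the vector that gets scaled by t
    # step (iii): the tip of the arrow t * direction attached at base
    return (base[0] + t * direction[0], base[1] + t * direction[1])


for t in (0, 1, 2):
    print(t, f(t))
```

Evaluating `f` at a few values of `t` and joining the resulting points is exactly how the line in Figure 1.2 is drawn.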
As the example above shows, an element $\mathbf{x} \in \mathbb{R}^d$ can be thought of either as a point, that is, as a location in space, or as a vector, that is, as an arrow which has a direction and a length (but no fixed “origin”). Specifically, $(3, -1)$ is a point and $t(-2, 1)$ is a vector. In the first case we represent it as an actual point, while in the second case we represent it with an arrow (by convention an arrow joining the origin $\mathbf{0}$ to the point $\mathbf{x}$, but the origin of the vector does not matter). However, just by looking at the coordinates, that is, at the mathematical object $\mathbf{x} \in \mathbb{R}^d$, it is not possible to distinguish between the two interpretations. The take-home message is: when doing formal manipulations, think of elements of $\mathbb{R}^d$ only as collections of numbers; to picture them, sometimes think of them as points and sometimes as vectors (the difference between “points” and “vectors” depends on the context).

1.2 Dot product and distances

In this section we define the (canonical) dot product, also called scalar product or inner product (the words “dot product”, “scalar product” and “inner product” all refer to the same object in our context). We specify “canonical” here because on $\mathbb{R}^d$ there exists more than one dot product, but we work with the “most natural” one. A study of vector spaces endowed with arbitrary dot products can be done but is outside the scope of this course.
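Definition 1.3. If $\mathbf{x}, \mathbf{y}$ belong to $\mathbb{R}^d$, we define their dot product $\mathbf{x} \cdot \mathbf{y} \in \mathbb{R}$ by
\[ \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{d} x_i y_i. \]

The dot product between two vectors is a scalar, that is, a real number.
Example 1.4. If $\mathbf{x} = (2, -1, 3)$ and $\mathbf{y} = (1, 0, -2)$ then
\[ \mathbf{x} \cdot \mathbf{y} = (2 \times 1) + ((-1) \times 0) + (3 \times (-2)) = 2 + 0 - 6 = -4. \]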
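
As a quick check of the definition, here is a minimal Python sketch of the canonical dot product. The helper name `dot` is introduced here for illustration, and the vectors of Example 1.4 are read as $\mathbf{x} = (2, -1, 3)$ and $\mathbf{y} = (1, 0, -2)$.

```python
def dot(x, y):
    """Canonical dot product of two vectors given as sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("both vectors must have the same number of components")
    return sum(xi * yi for xi, yi in zip(x, y))


x = (2, -1, 3)
y = (1, 0, -2)
print(dot(x, y))  # -4, as computed in Example 1.4
```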

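Remark 1.5. With $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d)$ the canonical basis of $\mathbb{R}^d$ there holds $\mathbf{x} \cdot \mathbf{e}_i = x_i$, that is, taking the dot product with the $i$-th element of the canonical basis yields the $i$-th coordinate.
The following properties of the dot product are in fact the ones required for a function $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ to be called a dot product.
Proposition 1.6. Let $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^d$ and $a, b \in \mathbb{R}$. Then
(i) Symmetry: $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$.
(ii) Linearity with respect to the first variable: $(a\mathbf{x} + b\mathbf{y}) \cdot \mathbf{z} = a(\mathbf{x} \cdot \mathbf{z}) + b(\mathbf{y} \cdot \mathbf{z})$.
(iii) Positive-definiteness: $\mathbf{x} \cdot \mathbf{x} \geq 0$, and $\mathbf{x} \cdot \mathbf{x} = 0$ if and only if $\mathbf{x} = \mathbf{0}$.
Note that by combining (i) and (ii) we get $\mathbf{x} \cdot (a\mathbf{y} + b\mathbf{z}) = a(\mathbf{x} \cdot \mathbf{y}) + b(\mathbf{x} \cdot \mathbf{z})$, that is, linearity with respect to the second variable. The map $(\mathbf{x}, \mathbf{y}) \mapsto \mathbf{x} \cdot \mathbf{y}$ is said to be bilinear.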
Proof. The proofs of (i) and (ii) follow from the definition. Then for (iii) notice that
\[ \mathbf{x} \cdot \mathbf{x} = \sum_{i=1}^{d} (x_i)^2. \]

We get a sum of squares which are all non-negative, and the sum is equal to $0$ if and only if each term is equal to $0$.
Definition 1.7. Two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^d$ are said to be orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$.
This corresponds to the intuitive notion: the lines directed by $\mathbf{x}$ and $\mathbf{y}$ are perpendicular.
Definition 1.8. The norm, or length, of a vector $\mathbf{x}$ is denoted $\|\mathbf{x}\|$ and defined by
\[ \|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}}. \]
In coordinates,
\[ \|\mathbf{x}\| = \sqrt{\sum_{i=1}^{d} (x_i)^2}. \]

This corresponds to the everyday notion of length. Note that the norm of a vector is the “length” of the arrow representing it. However, to compute a distance between points, one has to compute the length of the vector joining the two points. That is, the distance between $\mathbf{x}$ and $\mathbf{y}$ is $\|\mathbf{x} - \mathbf{y}\|$, the norm of the vector $\mathbf{x} - \mathbf{y}$.
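
The distinction between the norm of a vector and the distance between two points can be made concrete with a short Python sketch. The helper names `norm` and `distance` are introduced here for illustration only.

```python
import math


def norm(x):
    """Euclidean norm: the square root of the sum of the squared components."""
    return math.sqrt(sum(xi * xi for xi in x))


def distance(x, y):
    """Distance between the points x and y: the norm of the vector x - y."""
    return norm([xi - yi for xi, yi in zip(x, y)])


print(norm((3, 4)))              # 5.0
print(distance((1, 1), (4, 5)))  # 5.0, the vector joining the points is (-3, -4)
```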
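Proposition 1.9 (Cauchy-Schwarz inequality). Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$. Then
\[ |\mathbf{x} \cdot \mathbf{y}| \leq \|\mathbf{x}\| \|\mathbf{y}\|, \]
and there is equality if and only if $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent.

Proof. If $\mathbf{y} = \mathbf{0}$ then both sides are $0$, and $\mathbf{x}, \mathbf{y}$ are linearly dependent, thus the result holds. Let’s now assume $\mathbf{y} \neq \mathbf{0}$, which is equivalent to $\|\mathbf{y}\| > 0$.
The idea is to look at the real-valued function $f : t \in \mathbb{R} \mapsto \|\mathbf{x} + t\mathbf{y}\|^2$. It is actually a polynomial of degree exactly $2$ (as $\|\mathbf{y}\| > 0$):
\[ f(t) = (\mathbf{x} + t\mathbf{y}) \cdot (\mathbf{x} + t\mathbf{y}) = (\mathbf{x} \cdot \mathbf{x}) + 2t(\mathbf{x} \cdot \mathbf{y}) + t^2 (\mathbf{y} \cdot \mathbf{y}). \]

On the other hand, we always have $f(t) \geq 0$ as the dot product is positive-definite. Thus, by the criterion for non-negativity of a polynomial of degree $2$, its discriminant satisfies
\[ \Delta = (2\, \mathbf{x} \cdot \mathbf{y})^2 - 4 (\mathbf{x} \cdot \mathbf{x})(\mathbf{y} \cdot \mathbf{y}) \leq 0. \]

Rearranging the terms and using the definition of the norm yields the result after taking the square root.
Moreover, if there is equality in the Cauchy-Schwarz inequality then $\Delta = (2\, \mathbf{x} \cdot \mathbf{y})^2 - 4 (\mathbf{x} \cdot \mathbf{x})(\mathbf{y} \cdot \mathbf{y}) = 0$, thus the polynomial $f$ has a root: there exists $t_0 \in \mathbb{R}$ such that $f(t_0) = 0$. But for such a $t_0$ there holds $f(t_0) = \|\mathbf{x} + t_0 \mathbf{y}\|^2 = 0$, thus $\|\mathbf{x} + t_0 \mathbf{y}\| = 0$, hence $\mathbf{x} + t_0 \mathbf{y} = \mathbf{0}$: this shows that $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent. Conversely, if $\mathbf{x} = \lambda \mathbf{y}$ for some $\lambda \in \mathbb{R}$, then both sides of the inequality are equal to $|\lambda| \|\mathbf{y}\|^2$.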
Remark 1.10. In coordinates this reads:
\[ \left| \sum_{i=1}^{d} x_i y_i \right| \leq \sqrt{\sum_{i=1}^{d} (x_i)^2} \, \sqrt{\sum_{i=1}^{d} (y_i)^2}, \]

and it is actually not obvious that this inequality holds. We hope you can appreciate the neat proof of Cauchy-Schwarz above.
Remark 1.11. The geometric definition of the dot product is
\[ \mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta), \]

where $\theta$ is the angle between the vectors $\mathbf{x}$ and $\mathbf{y}$. This is consistent with Definition 1.7 as, if $\mathbf{x}, \mathbf{y} \neq \mathbf{0}$, then $\mathbf{x} \cdot \mathbf{y} = 0$ if and only if $\cos(\theta) = 0$, that is $\theta = \pm \pi/2$ (modulo $2\pi$).
Here, with our axiomatic construction, we have started from the definition of the dot product with the formula $\sum_i x_i y_i$. To recover a geometric interpretation we then define the angle $\theta$ in such a way that the formula holds. Specifically, one defines the angle $\theta$ between two non-zero vectors by
\[ \theta = \arccos\left( \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|} \right). \]

The Cauchy-Schwarz inequality guarantees that $\frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|} \in [-1, 1]$, thus it can be written as the cosine of an angle. Actually, this defines $\theta \in [0, \pi]$. What is missing is the sign of the angle, which comes from an orientation of the space (and cannot be obtained from the dot product alone).
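
The definition of the angle through $\arccos$ translates directly into code. The following Python sketch is illustrative only: the names `dot`, `norm` and `angle` are introduced here, and the clamping step is a numerical safeguard against rounding errors, not part of the mathematical definition.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle in [0, pi] between two non-zero vectors, via the arccos formula."""
    nx, ny = norm(x), norm(y)
    if nx == 0 or ny == 0:
        raise ValueError("the angle is only defined for non-zero vectors")
    cosine = dot(x, y) / (nx * ny)
    # Cauchy-Schwarz guarantees -1 <= cosine <= 1 exactly; floating-point
    # rounding may push it slightly outside, so we clamp before acos.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


print(angle((1, 0), (0, 1)))  # orthogonal vectors: pi / 2
print(angle((2, 0), (5, 0)))  # positively colinear vectors: 0.0
```

Note that the result never carries a sign: swapping the two arguments gives the same angle, in line with the remark on orientation.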
The Cauchy-Schwarz inequality is the central one when dealing with dot products. One of its consequences is the triangle inequality.
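Proposition 1.12 (Triangle inequality). If $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ then
\[ \|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|. \]
Proof. We compute the square of the left-hand side:
\[ \|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + 2 (\mathbf{x} \cdot \mathbf{y}) + \mathbf{y} \cdot \mathbf{y} = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2 (\mathbf{x} \cdot \mathbf{y}). \]

Then we use Cauchy-Schwarz on the dot product, and recognize the square of the right-hand side:
\[ \|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2 (\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2 \leq \|\mathbf{x}\|^2 + 2 \|\mathbf{x}\| \|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2. \]

Taking the square root on each side yields the conclusion.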

Figure 1.3: Dot product and angle between vectors: $\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta)$.

Figure 1.4: Geometric interpretation of the triangle inequality $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$ (Proposition 1.12).

The key properties of the norm $\| \cdot \|$ are summarized in the following proposition.
Proposition 1.13. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and $a \in \mathbb{R}$. Then

(i) Triangle inequality: $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$.

(ii) Homogeneity: $\|a\mathbf{x}\| = |a| \, \|\mathbf{x}\|$.

(iii) Positivity: $\|\mathbf{x}\| \geq 0$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

Proof. We have already proved (i) in Proposition 1.12. The proof of (ii) follows from $(a\mathbf{x}) \cdot (a\mathbf{x}) = a^2 (\mathbf{x} \cdot \mathbf{x})$ and the definition of the norm. Finally, (iii) is just a rewriting of (iii) of Proposition 1.6.

A useful consequence of (ii) is the following: if $\mathbf{x} \neq \mathbf{0}$, then $\frac{\mathbf{x}}{\|\mathbf{x}\|}$ is the only vector which is positively colinear to $\mathbf{x}$ and of norm $1$.
You may have already seen the last theorem of this section in a geometry course; we rephrase it here in our language.

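Proposition 1.14 (Pythagoras’s theorem). Let $\mathbf{x}$ and $\mathbf{y}$ be two orthogonal vectors in $\mathbb{R}^d$. Then
\[ \|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2. \]

Proof. This is obtained from the following formula, valid for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, that we already saw in the proof of the triangle inequality:
\[ \|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\, \mathbf{x} \cdot \mathbf{y}. \]
Then to get Pythagoras’s theorem one just uses the definition of orthogonality, which yields $\mathbf{x} \cdot \mathbf{y} = 0$.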

Figure 1.5: Law of cosines, see Remark 1.15. With $\theta = \widehat{BAC}$, the identity $\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\, \mathbf{x} \cdot \mathbf{y}$ reads $BC^2 = AB^2 + AC^2 - 2\, AB \, AC \cos(\widehat{BAC})$.

Figure 1.6: Plane of Example 1.16, shown with a point on it and its normal vector $(3, 2, 0)$.

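Remark 1.15. As we saw in the proof of Pythagoras’s theorem, we can say something even if $\mathbf{x}$ and $\mathbf{y}$ are not orthogonal. Indeed, using $\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta)$, we can see that for any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$
\[ \|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2 \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta). \]

If $A, B, C$ are points in $\mathbb{R}^d$ and $\mathbf{x} = \overrightarrow{AB}$ and $\mathbf{y} = \overrightarrow{AC}$, so that $\mathbf{x} - \mathbf{y} = \overrightarrow{CB}$, this corresponds to the identity
\[ BC^2 = AB^2 + AC^2 - 2\, AB \, AC \cos(\widehat{BAC}), \]

which is known in geometry as “the law of cosines” (see Figure 1.5). Pythagoras’s theorem corresponds to the case $\widehat{BAC} = \frac{\pi}{2}$.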

Example 1.16. As an application of what we saw, we can write the equation in $\mathbb{R}^3$ of the plane passing through a point $\mathbf{x}_0$ and with a normal vector $\mathbf{n}$. By definition, this is the set of points $\mathbf{x}$ such that the vector joining $\mathbf{x}_0$ to $\mathbf{x}$ is orthogonal to $\mathbf{n}$. Mathematically, it reads
\[ \{ \mathbf{x} \in \mathbb{R}^3 : (\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0 \}. \]
For instance, a point $\mathbf{x} = (x, y, z)$ belongs to the plane passing through $(-2, 1, 0)$ and normal to $(3, 2, 0)$ if and only if
\[ 3(x + 2) + 2(y - 1) = 0. \]
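
The membership test $(\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0$ can be sketched in Python as follows. The names `dot` and `on_plane` and the tolerance `tol` are introduced here for illustration; the point is taken to be $(-2, 1, 0)$ and the normal $(3, 2, 0)$, which is one reading of the example (signs are an assumption).

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def on_plane(x, x0, n, tol=1e-12):
    """True if x lies on the plane through x0 with normal n, i.e. (x - x0) . n = 0."""
    difference = [xi - x0i for xi, x0i in zip(x, x0)]
    return abs(dot(difference, n)) <= tol


x0 = (-2, 1, 0)
n = (3, 2, 0)
print(on_plane((-2, 1, 7), x0, n))  # True: z is free since n has no z-component
print(on_plane((0, -2, 0), x0, n))  # True: 3*(0 + 2) + 2*(-2 - 1) = 0
print(on_plane((0, 0, 0), x0, n))   # False: 3*2 + 2*(-1) = 4
```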

Figure 1.7: In $\mathbb{R}^2$ (left), $|\det(\mathbf{x}, \mathbf{y})|$ corresponds to the area of the parallelogram delimited by the dashed purple lines. In $\mathbb{R}^3$ (right), $|\det(\mathbf{x}, \mathbf{y}, \mathbf{z})|$ corresponds to the volume of the prism delimited by the dashed purple lines.

1.3 Determinant and cross product

We recall that given a family $(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)})$ of $d$ vectors in $\mathbb{R}^d$, one can compute their determinant $\det(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)})$. It corresponds to the determinant of the matrix whose columns are the $\mathbf{x}^{(i)}$:
\[ \det(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)}) = \det \begin{pmatrix} x^{(1)}_1 & x^{(2)}_1 & \cdots & x^{(d)}_1 \\ x^{(1)}_2 & x^{(2)}_2 & \cdots & x^{(d)}_2 \\ \vdots & \vdots & & \vdots \\ x^{(1)}_d & x^{(2)}_d & \cdots & x^{(d)}_d \end{pmatrix}. \]

Importantly, the absolute value of $\det(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)})$ corresponds to the $d$-dimensional volume of the solid built on the family $(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)})$, see Figure 1.7. The sign then depends on the orientation of the family. What is remarkable is that these $d$-dimensional volumes have a clean algebraic expression (namely the formula for the determinant that you learned in algebra).
Now let’s turn to the object of this section: in $\mathbb{R}^3$, one can define the cross product of two vectors, and it gives a vector (and not a scalar). As for the dot product, we start from an analytical expression before giving a geometrical interpretation.
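Definition 1.17. Let $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{y} = (y_1, y_2, y_3)$ be two vectors of $\mathbb{R}^3$. We define their cross product $\mathbf{x} \times \mathbf{y} \in \mathbb{R}^3$ as
\[ \mathbf{x} \times \mathbf{y} = \begin{pmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{pmatrix}. \]
Importantly, we can express the different components with determinants as
\[ (\mathbf{x} \times \mathbf{y})_1 = \det \begin{pmatrix} x_2 & y_2 \\ x_3 & y_3 \end{pmatrix}, \quad (\mathbf{x} \times \mathbf{y})_2 = -\det \begin{pmatrix} x_1 & y_1 \\ x_3 & y_3 \end{pmatrix}, \quad \text{and} \quad (\mathbf{x} \times \mathbf{y})_3 = \det \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}. \]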

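Proposition 1.18 (Characterization of the cross product). If $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ belong to $\mathbb{R}^3$ then
\[ \det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}. \]
Proof. It follows from the expansion of the determinant along its third column: indeed,
\[ \det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \det \begin{pmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{pmatrix} = z_1 \det \begin{pmatrix} x_2 & y_2 \\ x_3 & y_3 \end{pmatrix} - z_2 \det \begin{pmatrix} x_1 & y_1 \\ x_3 & y_3 \end{pmatrix} + z_3 \det \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix} = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}. \]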
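
The identity $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}$ is easy to check numerically. The Python sketch below is only an illustration: `cross`, `dot` and `det3` are names introduced here, and `det3` expands the $3 \times 3$ determinant along the first row, a different route from the proof, so that the comparison is a genuine check.

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def cross(x, y):
    """Cross product in R^3, following the coordinate formula of Definition 1.17."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)


def det3(x, y, z):
    """Determinant of the 3x3 matrix whose columns are x, y, z (first-row expansion)."""
    return (x[0] * (y[1] * z[2] - y[2] * z[1])
            - y[0] * (x[1] * z[2] - x[2] * z[1])
            + z[0] * (x[1] * y[2] - x[2] * y[1]))


x, y, z = (1, 2, 3), (4, 5, 6), (7, 8, 10)
print(cross(x, y))                           # (-3, 6, -3)
print(det3(x, y, z), dot(cross(x, y), z))    # -3 -3
```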

Figure 1.8: Example of cross products. The area of the parallelogram generated by $\mathbf{x}$ and $\mathbf{y}$ is $\|\mathbf{x}\| \|\mathbf{y}\| |\sin(\theta)| = \|\mathbf{x} \times \mathbf{y}\|$.

Remark 1.19. Alternatively, one can define $\mathbf{x} \times \mathbf{y}$ as the unique vector such that $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}$ holds for all $\mathbf{z}$, and then retrieve Definition 1.17 as a consequence.
Though the formula for the cross product is a bit cumbersome, it has a nice geometrical interpretation. Actually, it is remarkable that the geometric “definition” below has an analytical counterpart.

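Proposition 1.20. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^3$ and let $\mathbf{x} \times \mathbf{y}$ be their cross product. Then

(i) $\mathbf{x} \times \mathbf{y}$ is orthogonal to both $\mathbf{x}$ and $\mathbf{y}$.

(ii) The norm $\|\mathbf{x} \times \mathbf{y}\|$ is equal to $\|\mathbf{x}\| \|\mathbf{y}\| |\sin(\theta)|$ where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$. That is, $\|\mathbf{x} \times \mathbf{y}\|$ coincides with the area of the parallelogram generated by $\mathbf{x}$ and $\mathbf{y}$.

(iii) There holds $\det(\mathbf{x}, \mathbf{y}, \mathbf{x} \times \mathbf{y}) \geq 0$; in particular, if $\det(\mathbf{x}, \mathbf{y}, \mathbf{x} \times \mathbf{y}) \neq 0$ then $(\mathbf{x}, \mathbf{y}, \mathbf{x} \times \mathbf{y})$ is a basis with positive orientation.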

Remark 1.21. You can convince yourself that the three properties of Proposition 1.20 define a unique vector. Indeed, if $\mathbf{x}$ and $\mathbf{y}$ are not colinear (if they are, then $\mathbf{x} \times \mathbf{y} = \mathbf{0}$) then they define a plane, and there exists only one line passing through $\mathbf{0}$ and normal to this plane. Thanks to (i) we know that $\mathbf{x} \times \mathbf{y}$ must lie on this line. Then (ii) gives us the norm of $\mathbf{x} \times \mathbf{y}$, and (iii) its direction.
Remark 1.22. Before doing the proof, let us expand on what we mean in (iii) by “positive” orientation. Take three vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$ of $\mathbb{R}^3$ which form a basis. In particular, we must have $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) \neq 0$. We say that the basis has positive (resp. negative) orientation if $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) > 0$ (resp. $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) < 0$). Intuitively, we can move between two bases with the same orientation thanks to a physical deformation of the space, while to move between two bases with different orientations we need a reflection, that is, to “reflect one of the bases in a mirror”.

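Proof of Proposition 1.20. We recall that for all $\mathbf{z} \in \mathbb{R}^3$
\[ \det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}. \]

Taking $\mathbf{z} = \mathbf{x}$, the left-hand side vanishes (it is a determinant with two identical vectors), thus $(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{x} = 0$. Similarly, $(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{y} = 0$. This proves (i) of Proposition 1.20.
Then for (iii) we take $\mathbf{z} = \mathbf{x} \times \mathbf{y}$ and use the positivity of the dot product.
Finally, to get (ii), one takes a non-zero $\mathbf{z}$ on the line perpendicular to the plane spanned by $\mathbf{x}$ and $\mathbf{y}$. Indeed, the prism spanned by $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ is a right prism with a base of area $\|\mathbf{x}\| \|\mathbf{y}\| |\sin(\theta)|$ and height $\|\mathbf{z}\|$, hence its volume is $|\det(\mathbf{x}, \mathbf{y}, \mathbf{z})| = \|\mathbf{x}\| \|\mathbf{y}\| \|\mathbf{z}\| |\sin(\theta)|$. As $\mathbf{z}$ and $\mathbf{x} \times \mathbf{y}$ are colinear, $|(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}| = \|\mathbf{x} \times \mathbf{y}\| \|\mathbf{z}\|$. Thus $\|\mathbf{x}\| \|\mathbf{y}\| \|\mathbf{z}\| |\sin(\theta)| = \|\mathbf{x} \times \mathbf{y}\| \|\mathbf{z}\|$, and dividing by $\|\mathbf{z}\|$ yields (ii).
From the explicit expression of the definition we get the following identities for the cross product. Some of them (in particular the linearity) would have been hard to prove from the geometric characterization of Proposition 1.20 alone.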

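Proposition 1.23. Let $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^3$ and $a, b \in \mathbb{R}$. Then

(i) Antisymmetry: $\mathbf{x} \times \mathbf{y} = -\mathbf{y} \times \mathbf{x}$.

(ii) Linearity: $(a\mathbf{x} + b\mathbf{y}) \times \mathbf{z} = a(\mathbf{x} \times \mathbf{z}) + b(\mathbf{y} \times \mathbf{z})$.

(iii) Linear dependence: $\mathbf{x} \times \mathbf{y} = \mathbf{0}$ if and only if $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent.

Again, by combining (i) and (ii) we also get linearity with respect to the second variable.

Proof. The points (i) and (ii) are straightforward to check from the explicit expression of Definition 1.17. To prove the last point (iii), we first notice that if $\mathbf{x}, \mathbf{y}$ are linearly dependent then $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = 0$ for all $\mathbf{z}$. Taking $\mathbf{z} = \mathbf{x} \times \mathbf{y}$ and using Proposition 1.18, we see that $\|\mathbf{x} \times \mathbf{y}\|^2 = 0$, that is $\mathbf{x} \times \mathbf{y} = \mathbf{0}$. On the other hand, if $\mathbf{x}, \mathbf{y}$ are linearly independent, then there exists at least one $\mathbf{z}$ such that $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) \neq 0$. For this $\mathbf{z}$, we have (again thanks to Proposition 1.18) $\mathbf{z} \cdot (\mathbf{x} \times \mathbf{y}) \neq 0$, which implies that $\mathbf{x} \times \mathbf{y} \neq \mathbf{0}$.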

Example 1.24. Let’s look for an equation of the plane passing through the points $A = (3, 0, 1)$, $B = (0, 1, 1)$ and $C = (-1, 2, -1)$.
We look for a vector normal to the plane. It must be normal to both $\overrightarrow{AB} = (-3, 1, 0)$ and $\overrightarrow{AC} = (-4, 2, -2)$. Thus it is colinear to
\[ \overrightarrow{AB} \times \overrightarrow{AC} = \begin{pmatrix} -2 \\ -6 \\ -2 \end{pmatrix} = -2 \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}. \]

Thus a point $(x, y, z)$ belongs to the plane if and only if the vector joining $A$ to $(x, y, z)$ is normal to $(1, 3, 1)$, which reads
\[ (x - 3) + 3y + (z - 1) = 0. \]
This gives the equation of the plane. As a safety check, $A$, $B$ and $C$ indeed satisfy the equation above.
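
The same recipe (normal vector from a cross product, then the safety check) can be sketched in Python. The helper names `sub`, `dot`, `cross` and `plane_through` are introduced here for illustration, and the coordinates of $A$, $B$, $C$ are read as $(3, 0, 1)$, $(0, 1, 1)$ and $(-1, 2, -1)$ (the signs are an assumption).

```python
def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def cross(x, y):
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)


def plane_through(a, b, c):
    """Return (n, k) such that the plane through a, b, c is {x : n . x = k}."""
    n = cross(sub(b, a), sub(c, a))
    if n == (0, 0, 0):
        raise ValueError("the three points are aligned, no unique plane")
    return n, dot(n, a)


A, B, C = (3, 0, 1), (0, 1, 1), (-1, 2, -1)
n, k = plane_through(A, B, C)
print(n, k)  # (-2, -6, -2) -8, i.e. x + 3y + z = 4 after dividing by -2
print(all(dot(n, p) == k for p in (A, B, C)))  # safety check: True
```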

1.4 Domains of $\mathbb{R}^d$
We will use the word domain to talk about subsets of $\mathbb{R}^d$. There are usually two ways to define a domain of $\mathbb{R}^d$.

• By parametrization, that is, as the image of a function. For instance, we can look at
\[ D = \left\{ \mathbf{x} \in \mathbb{R}^2 \text{ such that } \exists t \in \mathbb{R}, \ \mathbf{x} = \begin{pmatrix} 3 - 2t \\ -1 + t \end{pmatrix} \right\}. \]

Here $D$ is the set of $\mathbf{x}$ which can be written $\mathbf{x} = f(t)$ with $f(t) = (3 - 2t, -1 + t)$. The variable $t$ is usually called the parameter.

• By an equation, that is, as the set of points which satisfy some equation. For instance, we can define
\[ D = \left\{ (x, y, z) \in \mathbb{R}^3 \text{ such that } 3x + 2y = -4 \right\}. \]

As discussed above in Example 1.16, this is a plane normal to $(3, 2, 0)$ and passing through $(-2, 1, 0)$.

The distinction is not really sound from a logical point of view, because any domain $D$ can be parametrized by the identity function (that is, the function defined on $D$ which maps every $\mathbf{x}$ to itself). So by “parametrization”, we mean “simple parametrization”, that is, a parametrization that is easy to manipulate.
Notice that it is easy to check if a given point belongs to a domain defined by an equation (just check if the equation is satisfied), while it is easy to find one point belonging to a domain defined by parametrization (just take one value of the parameter). On the other hand, each reverse task (to find a point which belongs to a domain defined by an equation, or to check if a point belongs to a domain defined by parametrization) can be much more difficult and usually amounts to solving an equation.
Example 1.25. Let’s define
\[ D = \left\{ (x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1 \right\}. \]
This is a domain defined by an equation. We recognize $x^2 + y^2 = \|(x, y)\|^2$, thus $D$ is made of the points whose norm is exactly $1$, that is, such that the distance to the origin is $1$: in other words, $D$ is the unit circle in $\mathbb{R}^2$. But note that by trigonometry we can build a definition of $D$ by parametrization. Indeed,
\[ D = \left\{ (x, y) \in \mathbb{R}^2 : \exists t \in \mathbb{R}, \ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix} \right\}. \]

Example 1.26. Let’s take $D$ the domain of $\mathbb{R}^2$ parametrized as
\[ D = \left\{ (x, y) \in \mathbb{R}^2 : \exists t \in \mathbb{R}, \ \begin{cases} x = 3 - 2t, \\ y = -1 + t \end{cases} \right\}, \]
which we already encountered in Example 1.2. We have seen that this is a line passing through $(3, -1)$ and directed by $(-2, 1)$. In particular, a normal vector to this line is $(-1, -2)$ (because $(-1, -2)$ is the image of $(-2, 1)$ by the rotation of angle $\pi/2$). Thus $D$ is also characterized as the set of points $(x, y)$ such that the vector joining $(3, -1)$ to $(x, y)$ is normal to $(-1, -2)$, that is, by the equation
\[ -(x - 3) - 2(y + 1) = 0, \]

which reads $D = \{ (x, y) \in \mathbb{R}^2 : x + 2y = 1 \}$.
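
The two descriptions of this line make different tasks easy, as discussed earlier in the section. The Python sketch below illustrates this under the reading $f(t) = (3 - 2t, -1 + t)$ and $x + 2y = 1$; the names `f`, `satisfies_equation` and `parameter_of` are introduced here for illustration.

```python
def f(t):
    """Parametrization of the line: producing points is immediate."""
    return (3 - 2 * t, -1 + t)


def satisfies_equation(p):
    """Equation of the line: testing membership is immediate."""
    x, y = p
    return x + 2 * y == 1


def parameter_of(p):
    """Recover t with f(t) = p, or None if p is not on the line.

    This is the 'reverse task': it amounts to solving the system
    x = 3 - 2t, y = -1 + t for t.
    """
    x, y = p
    if not satisfies_equation(p):
        return None
    return y + 1  # from y = -1 + t; x = 3 - 2t then holds automatically


print(all(satisfies_equation(f(t)) for t in range(-5, 6)))  # True
print(parameter_of((1, 0)))   # 1
print(parameter_of((0, 0)))   # None: (0, 0) is not on the line
```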

Example 1.27. On the other hand, let’s consider in $\mathbb{R}^3$ the domain $D$ defined by (a set of) equations:
\[ D = \left\{ (x, y, z) \in \mathbb{R}^3 : \begin{cases} x + y + z = 1, \\ 2x - 3y + z = 1 \end{cases} \right\}. \]

The domain $D$ is the intersection of two planes which are not parallel, so it should be a line. The first plane has $(1, 1, 1)$ as a normal vector while the second has $(2, -3, 1)$ as a normal vector. Thus a vector parallel to the line $D$ must be orthogonal to both $(1, 1, 1)$ and $(2, -3, 1)$, thus it must be parallel to
\[ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \times \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ -5 \end{pmatrix}. \]
Moreover, we must find at least one point in $D$. For instance, $(0, 0, 1)$ works. We conclude that $D$ is the line passing through $(0, 0, 1)$ and parallel to $(4, 1, -5)$, which reads
\[ D = \left\{ (x, y, z) \in \mathbb{R}^3 : \exists t \in \mathbb{R}, \ \begin{cases} x = 4t, \\ y = t, \\ z = 1 - 5t \end{cases} \right\}. \]
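
The passage from equations to a parametrization can be checked with a short Python sketch. It assumes the two planes are $x + y + z = 1$ and $2x - 3y + z = 1$ (one reading of the example, the signs being an assumption); `dot`, `cross` and `line` are names introduced here.

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def cross(x, y):
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)


n1, k1 = (1, 1, 1), 1     # first plane:  n1 . x = k1
n2, k2 = (2, -3, 1), 1    # second plane: n2 . x = k2

direction = cross(n1, n2)  # orthogonal to both normals, hence parallel to the line
point = (0, 0, 1)          # one particular point lying on both planes


def line(t):
    """Point of the line with parameter t: point + t * direction."""
    return tuple(p + t * d for p, d in zip(point, direction))


print(direction)  # (4, 1, -5)
for t in (-1, 0, 2):
    x = line(t)
    print(x, dot(n1, x) == k1, dot(n2, x) == k2)  # both equations hold
```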
Below is a list of some standard domains in $\mathbb{R}^2$ and $\mathbb{R}^3$.

Figure 1.9: Left: A generic line in $\mathbb{R}^2$, passing through $\mathbf{x}_0$, parallel to $\mathbf{a}$ and normal to $\mathbf{n}$. Right: An ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ in $\mathbb{R}^2$.

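Lines. In general, a line in $\mathbb{R}^d$ is characterized by a point $\mathbf{x}_0$ through which the line passes and a vector $\mathbf{a} \neq \mathbf{0}$ parallel to the line. Then, it reads
\[ D = \left\{ \mathbf{x} \in \mathbb{R}^d : \exists t \in \mathbb{R}, \ \mathbf{x} = \mathbf{x}_0 + t\mathbf{a} \right\}. \]

In $\mathbb{R}^2$, a line can also be characterized by a normal vector $\mathbf{n}$, which should be such that $\mathbf{a} \cdot \mathbf{n} = 0$. If $\mathbf{a} = (a_1, a_2)$, then a normal vector to it is $(-a_2, a_1)$. Then, the equation of the line is
\[ D = \left\{ \mathbf{x} \in \mathbb{R}^2 : (\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0 \right\}. \]

As we have seen, in $\mathbb{R}^3$, a line can also be seen as an intersection of two planes (which are not parallel).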

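Planes. A plane in $\mathbb{R}^3$ is characterized by a point $\mathbf{x}_0$ and a normal vector $\mathbf{n} \neq \mathbf{0}$. Then the equation of the plane is
\[ D = \left\{ \mathbf{x} \in \mathbb{R}^3 : (\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0 \right\}. \]
Actually, in $\mathbb{R}^d$ the equation above defines what is called a hyperplane: a hyperplane is a line in $\mathbb{R}^2$, a plane in $\mathbb{R}^3$, and in general an object of dimension $d - 1$ in $\mathbb{R}^d$.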

Circles. The domain
\[ D = \{ (x, y) \in \mathbb{R}^2 : (x - x_0)^2 + (y - y_0)^2 = r^2 \} \]
is the circle of center $(x_0, y_0)$ and of radius $r$. Indeed, $(x - x_0)^2 + (y - y_0)^2 = \|(x, y) - (x_0, y_0)\|^2$. A parametrization of this domain is
\[ D = \{ (x_0 + r \cos(t), y_0 + r \sin(t)) : t \in \mathbb{R} \}. \]

Ellipses. If $a, b > 0$ we can consider the domain
\[ D = \left\{ (x, y) \in \mathbb{R}^2 : \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \right\}, \]
which is an ellipse centered at $(0, 0)$ with width $2a$ and height $2b$. If $a = b$ then it coincides with the circle of center $(0, 0)$ and of radius $a$. A parametrization of this domain is
\[ D = \{ (a \cos(t), b \sin(t)) : t \in \mathbb{R} \}. \]

Spheres. The domain
\[ \{ (x, y, z) \in \mathbb{R}^3 : (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 = r^2 \} \]
is the sphere of center $(x_0, y_0, z_0)$ and of radius $r$. To see it, note that $(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2$ is the squared Euclidean distance between $(x, y, z)$ and $(x_0, y_0, z_0)$.

Figure 1.10: Cylinder and cones in $\mathbb{R}^3$. The cylinder shown is $x^2 + y^2 = 9$ and the cone is $\sqrt{x^2 + y^2} = |z|$; for a point $(x, y, z)$, the length $\sqrt{x^2 + y^2}$ is its distance to the vertical axis.

Cones and cylinders. If $(x, y, z) \in \mathbb{R}^3$, then $\sqrt{x^2 + y^2}$ is the distance of this point to the vertical axis. From it we can for instance define the domain
\[ D = \left\{ (x, y, z) \in \mathbb{R}^3 : x^2 + y^2 = r^2 \right\}, \]
which is an (infinite) cylinder of revolution around the vertical axis, of radius $r$. We can also consider
\[ D = \left\{ (x, y, z) \in \mathbb{R}^3 : \sqrt{x^2 + y^2} = |z| \right\}, \]

which gives a cone. See Figure 1.10 for an illustration.

Remark 1.28. Let $D$ be a domain of $\mathbb{R}^d$ defined by an equation $F(x_1, x_2, \ldots, x_d) = 0$. Then the image of $D$ by the translation of vector $\mathbf{y} = (y_1, y_2, \ldots, y_d)$ is the domain $D'$ defined by the equation $F(x_1 - y_1, x_2 - y_2, \ldots, x_d - y_d) = 0$. Indeed, a point $\mathbf{z}$ belongs to $D'$ if and only if $\mathbf{z} = \mathbf{x} + \mathbf{y}$ for some $\mathbf{x} \in D$, that is, if and only if $\mathbf{z} - \mathbf{y} = \mathbf{x}$ belongs to $D$.
As an example, you can look at circles or spheres. In $\mathbb{R}^2$, the equation of the unit circle, centered at the origin, is
\[ x^2 + y^2 = 1. \]
On the other hand, the circle centered at $(1, 2)$ and of radius $1$, which is the image of the unit circle by the translation of vector $(1, 2)$, has for equation
\[ (x - 1)^2 + (y - 2)^2 = 1 \]
(and not $(x + 1)^2 + (y + 2)^2 = 1$).
Finally, one can combine these “standard” domains by taking intersections.
Definition 1.29. If $D_1, D_2$ are two domains of $\mathbb{R}^d$, their intersection $D_1 \cap D_2$ is defined as the set of $\mathbf{x}$ which belong to both $D_1$ and $D_2$.
Example 1.30. In $\mathbb{R}^2$, let’s find the intersection of the circle of center $(-1, 2)$ and radius $5$ with the line passing through $(4, 0)$ and with direction $(1, 1)$.
We write $(x_0, y_0) = (-1, 2)$. The circle of center $(x_0, y_0)$ and radius $5$ is made of the points such that $\|(x, y) - (x_0, y_0)\| = 5$. By squaring the equality and expanding it, this corresponds to the points $(x, y)$ such that $(x + 1)^2 + (y - 2)^2 = 25$, that is, such that
\[ x^2 + y^2 + 2x - 4y = 20. \]
On the other hand, we can represent the line by parametrization: it is given by the set of $(x, y)$ such that there exists $t \in \mathbb{R}$ with
\[ \begin{cases} x = 4 + t, \\ y = t. \end{cases} \]
Thus a point $(x, y)$ belongs to the line if $(x, y) = (4 + t, t)$ and it belongs to the circle if $x^2 + y^2 + 2x - 4y = 20$. This gives us that $t$ must satisfy the equation $(4 + t)^2 + t^2 + 2(4 + t) - 4t = 20$, which reads
\[ 2t^2 + 6t + 4 = 0. \]

This equation has two solutions $t = -2$ and $t = -1$, which yield two points in the intersection: $(2, -2)$ and $(3, -1)$.
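
The method of this example (substitute the parametrization of the line into the equation of the circle, then solve a quadratic in $t$) can be sketched in Python. The function name `circle_line_intersection` and its arguments are introduced here for illustration, and the data are read as center $(-1, 2)$, radius $5$, point $(4, 0)$ and direction $(1, 1)$ (the signs are an assumption).

```python
import math


def circle_line_intersection(center, radius, point, direction):
    """Points of the line {point + t * direction} lying on the given circle.

    Substituting the parametrization into |x - center|^2 = radius^2 gives
    a * t^2 + b * t + c = 0 with the coefficients computed below.
    """
    px, py = point[0] - center[0], point[1] - center[1]
    dx, dy = direction
    a = dx * dx + dy * dy
    if a == 0:
        raise ValueError("the direction vector must be non-zero")
    b = 2 * (px * dx + py * dy)
    c = px * px + py * py - radius * radius
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # the line misses the circle
    root = math.sqrt(discriminant)
    # A set removes the duplicate when the line is tangent (discriminant == 0).
    parameters = sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})
    return [(point[0] + t * dx, point[1] + t * dy) for t in parameters]


print(circle_line_intersection((-1, 2), 5, (4, 0), (1, 1)))
# [(2.0, -2.0), (3.0, -1.0)]
```

With these data the quadratic is $2t^2 + 6t + 4 = 0$, so the sketch reproduces the hand computation.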

Part I

Differentiation

Chapter 2

Parametric curves

In this chapter we study vector valued functions of one variable, that is, functions $\gamma : \mathbb{R} \to \mathbb{R}^d$, which are also called parametric curves. Since they can be represented as a collection of real-valued functions, a good part of the analysis boils down to the study of real-valued functions. However, there are a few differences, and parametric curves carry information with a more geometric flavor.

2.1 Definition and representation

We consider $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ where $I$ is a subset of $\mathbb{R}$. We write
\[ \gamma(t) = \begin{pmatrix} \gamma_1(t) \\ \gamma_2(t) \\ \vdots \\ \gamma_d(t) \end{pmatrix}, \]
where each $\gamma_i : I \to \mathbb{R}$ is a real-valued function, called the $i$-th coordinate function. Note that we usually use $t \in I$ for the argument of the function $\gamma$, as it is thought of as the time.
Remark 2.1. If the domain $I$ is not specified, one takes the largest one such that all coordinate functions are defined. For instance, if $\gamma$ is defined by $\gamma(t) = (\gamma_1(t), \gamma_2(t)) = (\sqrt{1 - t}, 1/t)$ then the domain is $I = (-\infty, 0) \cup (0, 1]$. Indeed, $\gamma_1$ has domain $(-\infty, 1]$ while $\gamma_2$ has domain $(-\infty, 0) \cup (0, +\infty)$. The intersection of the two domains yields $I$.
Contrary to the case $d = 1$ (that is, the case of a real-valued function), it is more difficult to visualize a vector valued function. Let’s take for instance the function $\gamma : \mathbb{R} \to \mathbb{R}^2$ defined by
\[ \gamma(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix}. \]

The different representations discussed below are plotted in Figure 2.1. A first option would be to plot the graph of each of the coordinate functions: we would get two graphs of real-valued functions. However, this does not capture the idea that the function is valued in the space $\mathbb{R}^2$. A second alternative is to plot the graph of the function as a subset of $\mathbb{R}^{d+1}$.
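Definition 2.2. If $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is a vector valued function, its graph is the subset of $\mathbb{R}^{d+1}$ made of the points $\mathbf{x} \in \mathbb{R}^{d+1}$ which can be written
\[ \mathbf{x} = \begin{pmatrix} t \\ \gamma(t) \end{pmatrix} = \begin{pmatrix} t \\ \gamma_1(t) \\ \gamma_2(t) \\ \vdots \\ \gamma_d(t) \end{pmatrix} \]
for some $t \in I$.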

Figure 2.1: Three ways to represent the parametric curve $t \mapsto (\cos(t), \sin(t))$. The left one does not inform about the geometric content, the one in the center contains too much information, thus we prefer to stick to the one on the right (the image of the function).

Note that we “gain” one dimension in the representation. The graph of an $\mathbb{R}^2$-valued function is a
subset of $\mathbb{R}^3$ whereas the graph of a real-valued function is a subset of $\mathbb{R}^2$ (the latter case is what you
saw during the first term).
Usually the representation as a graph carries too much information, and one prefers to restrict to the
image of the function.
Definition 2.3. If $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is a vector-valued function, its image is the subset of $\mathbb{R}^d$ made of
the points $x \in \mathbb{R}^d$ which can be written $x = \gamma(t)$ for some $t \in I$.
The image is sometimes called a parametric curve and the function $\gamma$ is a parametrization of the
curve.

Remark 2.4. Actually, by looking at Definitions 2.2 and 2.3, one can see that the graph of a curve
$\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is the same as the image of the curve $\theta : I \to \mathbb{R}^{d+1}$ defined by $\theta(t) = (t, \gamma(t))$ for $t \in I$.
That is, the concept of image is more general than the one of graph.
On the other hand, retrieving the image from the graph amounts to projecting the graph on the last
$d$ coordinates (that is, one “forgets” about the $t$ variable).
Remark 2.5. The image of a function is also a legitimate concept for real-valued functions but it usually does
not contain enough information. For instance, the image of the function $g : t \in \mathbb{R} \mapsto \cos(t) \in \mathbb{R}$ is the set
$[-1, 1]$; compare with the image of the function $\gamma : t \in \mathbb{R} \mapsto (\cos(t), \sin(t))$, which is the unit circle of $\mathbb{R}^2$.
One of the two representations has more geometric content than the other!
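Importantly, different functions can have the same image. Let’s consider the two functions $\gamma, \theta : \mathbb{R} \to \mathbb{R}^2$
defined by
\[
\gamma(t) = \begin{pmatrix} 3 + 2t \\ 1 - t \end{pmatrix}
\quad \text{and} \quad
\theta(t) = \begin{pmatrix} 5 - 2t \\ t \end{pmatrix}.
\]
Then $\gamma(t) = \theta(1 - t)$, hence the images of the two functions are the same. Another example could be the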
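functions $\gamma, \theta : \mathbb{R} \to \mathbb{R}^2$ and $\omega : (0, +\infty) \to \mathbb{R}^2$ defined by
\[
\gamma(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix},
\quad
\theta(t) = \begin{pmatrix} \cos(2t) \\ \sin(2t) \end{pmatrix}
\quad \text{and} \quad
\omega(t) = \begin{pmatrix} \cos(\ln(t)) \\ \sin(\ln(t)) \end{pmatrix}.
\]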

Then $\gamma$, $\theta$ and $\omega$ have the same image. We will explore it more in Chapter 7, but these functions
correspond to the same curve (the unit circle of $\mathbb{R}^2$) traveled at different speeds. Note also that trying
to represent the graph of $\omega$ would likely be too intricate: the image usually contains
enough information to understand what’s happening, while remaining easy to read.
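The claim about $\gamma$, $\theta$ and $\omega$ can be probed numerically. The sketch below is only an illustration, not part of the course material: the helper `on_unit_circle`, the sample times and the tolerance are assumptions made for this example. It checks that sampled values of the three parametrizations lie on the unit circle, and that $\theta(t) = \gamma(2t)$, which is one way to see that $\theta$ travels the same curve twice as fast.

```python
import math

def gamma(t):
    return (math.cos(t), math.sin(t))

def theta(t):
    return (math.cos(2 * t), math.sin(2 * t))

def omega(t):
    # Only defined for t > 0 because of the logarithm.
    return (math.cos(math.log(t)), math.sin(math.log(t)))

def on_unit_circle(point, tol=1e-12):
    # Hypothetical helper: tests whether the point has norm 1 up to rounding.
    x, y = point
    return abs(math.hypot(x, y) - 1.0) < tol

# Sample times, all positive so that omega is defined.
samples = [0.1 * k for k in range(1, 200)]

all_on_circle = all(
    on_unit_circle(f(t)) for f in (gamma, theta, omega) for t in samples
)

# theta at time t sits at the same point as gamma at time 2t.
same_point = all(
    math.isclose(theta(t)[i], gamma(2 * t)[i], abs_tol=1e-12)
    for t in samples
    for i in (0, 1)
)
```

Sampling only shows that each image is contained in the unit circle at the sampled times; that each image is the whole circle still requires the argument given in the text.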

2.2 Derivatives
We now switch to the definition of the differential properties of the curve. We will see later why the
definitions given here can be seen as particular cases of more principled definitions. However, to study
parametric curves one does not need to rely on these abstract definitions and can just use the ones of the
real-valued case.

Definition 2.6. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$. The function $\gamma$ is said to be continuous at a point $t \in I$ if all the
coordinate functions are continuous at $t$. It is continuous over $I$ if it is continuous at every point of $I$,
that is if all the coordinate functions are continuous over $I$.

Hence, checking the continuity of $\gamma$ amounts to checking the continuity of the coordinate functions,
which are themselves real-valued functions (thus all the concepts and theorems of Mathematical Analysis
– Module 1 apply). Checking differentiability and computing derivatives is similar.

Definition 2.7. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$. The function is differentiable at the point $t \in I$ if all the coordinate
functions are differentiable at $t$. If this is the case, the derivative at the point $t$ is the vector of $\mathbb{R}^d$ denoted
by $\gamma'(t)$ and defined by
\[
\gamma'(t) = \begin{pmatrix} \gamma_1'(t) \\ \gamma_2'(t) \\ \vdots \\ \gamma_d'(t) \end{pmatrix}.
\]
The function is differentiable over $I$ if it is differentiable at every point of $I$.

Remark 2.8. In physics, when $t$ represents the time, the notation $\dot{\gamma}$ rather than $\gamma'$ is used to denote the
derivative of $\gamma$.

Definition 2.9. If $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is differentiable at a point $t \in I$, the scalar quantity $\|\gamma'(t)\| \in \mathbb{R}$ is
called the speed of the curve at $t$.

Note that the derivative $\gamma' : I \to \mathbb{R}^d$ is still a vector-valued function, whereas the speed is a real-valued
one. Differentiating vector-valued functions amounts to differentiating them component-wise, so again all
the rules you learned in Mathematical Analysis – Module 1 still hold.
Example 2.10. Let’s again look at $\gamma(t) = (\cos(t), \sin(t))$ defined on $\mathbb{R}$. Then the function $\gamma : \mathbb{R} \to \mathbb{R}^2$
is differentiable on $\mathbb{R}$ with $\gamma'(t) = (-\sin(t), \cos(t))$. In particular the speed is $\|\gamma'(t)\| = 1$ and does not
depend on $t$. You have a function with constant speed but non-constant velocity vector: the direction of
the velocity vector changes.
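Since derivatives are taken coordinate by coordinate, they can be approximated coordinate by coordinate as well. The following sketch uses a central finite difference, a numerical technique that is not part of these notes; the helper names `derivative` and `speed`, the step `h` and the sample time are assumptions for illustration. It compares the approximation with the closed form of Example 2.10.

```python
import math

def gamma(t):
    return (math.cos(t), math.sin(t))

def derivative(curve, t, h=1e-6):
    """Central finite difference, applied to each coordinate function."""
    forward, backward = curve(t + h), curve(t - h)
    return tuple((f - b) / (2 * h) for f, b in zip(forward, backward))

def speed(curve, t):
    """Norm of the (approximate) velocity vector."""
    return math.hypot(*derivative(curve, t))

t = 0.7                                  # arbitrary sample time
approx = derivative(gamma, t)            # numerical velocity vector
exact = (-math.sin(t), math.cos(t))      # formula from Example 2.10
approx_speed = speed(gamma, t)           # should be close to 1
```

The two vectors agree up to the discretization error, and the computed speed stays close to 1 whatever the value of `t`.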
Example 2.11. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^2$ be a differentiable function. Prove that the real-valued function $g : I \to \mathbb{R}$
defined by $g(t) = \|\gamma(t)\|^2$ for $t \in I$ is differentiable and that $g'(t) = 2\,\gamma'(t) \cdot \gamma(t)$.
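One possible route for this exercise, given here only as a sketch, is to expand the squared norm in coordinates and to use the usual rules for real-valued functions:

```latex
\[
g(t) = \|\gamma(t)\|^2 = \gamma_1(t)^2 + \gamma_2(t)^2,
\qquad
g'(t) = 2\,\gamma_1(t)\,\gamma_1'(t) + 2\,\gamma_2(t)\,\gamma_2'(t)
      = 2\,\gamma'(t) \cdot \gamma(t).
\]
```

Differentiability of $g$ follows because each $\gamma_i^2$ is a product of differentiable real-valued functions.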

Finally, let us mention that the definition of the derivative extends to higher order derivatives.

Definition 2.12. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ be a vector-valued function defined on an interval $I$ of $\mathbb{R}$ and $k \geq 0$
an integer. The function $\gamma$ is said to be $k$ times differentiable over $I$ if all the coordinate functions are $k$ times
differentiable over $I$. In such a case, the $k$-th derivative is the vector-valued function $\gamma^{(k)} : I \to \mathbb{R}^d$
defined by, for $t \in I$,
\[
\gamma^{(k)}(t) = \begin{pmatrix} \gamma_1^{(k)}(t) \\ \gamma_2^{(k)}(t) \\ \vdots \\ \gamma_d^{(k)}(t) \end{pmatrix}.
\]
The function $\gamma$ is said to be of class $C^k$ if it is $k$ times differentiable and $\gamma^{(k)}$ is continuous over $I$. It is
equivalent to require that all the coordinate functions are of class $C^k$ over $I$.

The second derivative $\gamma''$, sometimes called the acceleration, is of utmost importance, especially in
physics (and in this case it is denoted by $\ddot{\gamma}$). Indeed, if $\gamma : \mathbb{R} \to \mathbb{R}^3$ denotes the position of a particle over
time, Newton’s second law states that $m \ddot{\gamma} = F$ where $m$ is the mass of the particle and $F$ is the sum of
all the forces acting on the particle.


Figure 2.2: Image of a curve in $\mathbb{R}^2$ (in red) together with its first two derivatives $\gamma'(t_0)$ and $\gamma''(t_0)$ and the tangent line
at the point $\gamma(t_0)$.

2.3 Taylor expansion and local behavior

In this section we investigate the local behavior of a curve around a point. This is related to the derivatives
of the curve. We start by giving the definition of the tangent and then analyze in detail the behavior
around a cusp, before giving a more systematic analysis based on Taylor expansion.
In the first semester, you have seen that the derivative of a function $\mathbb{R} \to \mathbb{R}$ is linked to the tangent
to the graph of the function. Here this is still the case, but we are rather interested in the tangent to the
image.
Definition 2.13 (Tangent to a curve). Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ be differentiable at $t_0 \in I$. If $\gamma'(t_0) \neq 0$,
the line of $\mathbb{R}^d$ passing through $\gamma(t_0)$ and with direction $\gamma'(t_0)$, that is the image of the function
\[
h \in \mathbb{R} \mapsto \gamma(t_0) + h \gamma'(t_0) \in \mathbb{R}^d,
\]
is called the tangent line to the image of $\gamma$ at the point $\gamma(t_0)$.

We will see below how this definition can be justified with the concept of Taylor expansion.
Definition 2.14. If $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$, a point $t \in I$ where $\gamma$ is differentiable and $\gamma'(t) \neq 0$ is called a
regular point.
Example 2.15. Why do we impose $\gamma'(t_0) \neq 0$ in Definition 2.13? Because otherwise the local behavior of
the curve can be involved. Let’s consider the case of the function $\gamma : \mathbb{R} \to \mathbb{R}^2$ defined by $\gamma(t) = (t^2, t^3)$
whose image is plotted in Figure 2.3. This function is differentiable everywhere on $\mathbb{R}$ and its velocity
vector vanishes for $t = 0$: this is the only irregular point.
If $t \geq 0$ then $\gamma_2(t) = t^3 = (\gamma_1(t))^{3/2}$. On the other hand, if $t \leq 0$ then $\gamma_2(t) = t^3 = -(\gamma_1(t))^{3/2}$. Thus
the image of $\gamma$ on $[0, +\infty)$ is the set $\{(x, y) : y = x^{3/2} \text{ and } x \geq 0\}$ while the image of $\gamma$ on $(-\infty, 0]$ is the
set $\{(x, y) : y = -x^{3/2} \text{ and } x \geq 0\}$. Drawing the image, one can see that there is a “cusp” at the origin:
there is no tangent line at this point.
On the other hand, the origin $(0, 0)$ is nothing else than $\gamma(0)$ and $t = 0$ is an irregular point: so
visually we see that the only irregular point is where we cannot draw a tangent line.
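The two branches and the irregular point of Example 2.15 can be checked on sampled values. This is a minimal sketch for illustration; the sample grid and the tolerance are arbitrary choices and not part of the notes.

```python
import math

def gamma(t):
    return (t ** 2, t ** 3)

def gamma_prime(t):
    return (2 * t, 3 * t ** 2)

# Sample times in [-2, 2], including t = 0.
ts = [k / 10 for k in range(-20, 21)]

# Every sampled point (x, y) of the image satisfies |y| = x^(3/2) with x >= 0.
on_branches = all(
    gamma(t)[0] >= 0
    and math.isclose(abs(gamma(t)[1]), gamma(t)[0] ** 1.5, abs_tol=1e-12)
    for t in ts
)

# The sign of y is the sign of t: upper branch for t > 0, lower branch for t < 0.
sign_matches = all(gamma(t)[1] * t >= 0 for t in ts)

# Sample times at which the velocity vector vanishes.
irregular = [t for t in ts if gamma_prime(t) == (0, 0)]
```

On this grid the only time with vanishing velocity is `t = 0`, in agreement with the example.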
As we just saw in Example 2.15, the local behavior of a parametric curve can be quite intricate
around an irregular point. We now analyze this behavior in more detail, starting with the concept of a
Taylor expansion. Unsurprisingly, it corresponds to a Taylor expansion coordinate per coordinate.

Figure 2.3: Image of the function studied in Example 2.15. The image of $\gamma(t) = (t^2, t^3)$ is the set
$\{(x, y) : |y| = x^{3/2} \text{ and } x \geq 0\}$, with a cusp at $t = 0$ because $\gamma'(0) = 0$.

Proposition 2.16. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ and $t_0 \in I$. If $\gamma$ is $k$ times differentiable over $I$ then
\[
\gamma(t_0 + h) = \gamma(t_0) + h \gamma'(t_0) + \frac{h^2}{2} \gamma''(t_0) + \ldots + \frac{h^k}{k!} \gamma^{(k)}(t_0) + o(h^k).
\]
Here $o(h^k)$ denotes a function $g : J \subset \mathbb{R} \to \mathbb{R}^d$, defined on a neighborhood of 0, such that each coordinate
function $g_i$, $i \in \{1, 2, \ldots, d\}$, belongs to $o(h^k)$ (that is $g_i(h) = h^k \tilde{g}_i(h)$ with $\tilde{g}_i$ which goes to 0 as $h \to 0$).
The intuition is that for $h$ small the function $h \mapsto \gamma(t_0 + h)$ behaves as the principal part of the Taylor
expansion (the part without the $o(h^k)$). In particular, the image of $\gamma$ is close to the image of the principal
part of the Taylor expansion. Below are some consequences of this principle, illustrated in Figure 2.4.

• If $\gamma'(t_0) \neq 0$ then $\gamma$ admits the Taylor expansion
\[
\gamma(t_0 + h) = \gamma(t_0) + h \gamma'(t_0) + o(h)
\]
thus the image of $\gamma$ is close to the line $h \mapsto \gamma(t_0) + h \gamma'(t_0)$: this is the tangent line. This corresponds
to a regular point as discussed before.
• If $\gamma'(t_0) \neq 0$ and $\gamma''(t_0)$ is not colinear to $\gamma'(t_0)$ then
\[
\gamma(t_0 + h) = \gamma(t_0) + h \gamma'(t_0) + \frac{h^2}{2} \gamma''(t_0) + o(h^2)
\]
thus in the frame $(\gamma'(t_0), \gamma''(t_0))$ centered at $\gamma(t_0)$, the image of $\gamma$ is close to the parabola $y = x^2/2$.
This is called a biregular point.
• Actually if $\gamma''(t_0)$ is colinear to $\gamma'(t_0)$ then the image of $\gamma$ could “cross” the tangent. Indeed, if
$\gamma''(t_0) = \lambda \gamma'(t_0)$ is colinear to $\gamma'(t_0)$ but $\gamma'''(t_0)$ is not then
\[
\gamma(t_0 + h) = \gamma(t_0) + \left( h + \lambda \frac{h^2}{2} \right) \gamma'(t_0) + \frac{h^3}{6} \gamma'''(t_0) + o(h^3).
\]
Thus in the frame $(\gamma'(t_0), \gamma'''(t_0))$ centered at $\gamma(t_0)$, the image of $\gamma$ is close to the curve $y = x^3/6$.
This is called an inflexion point.
• If $\gamma'(t_0) = 0$ but $\gamma''(t_0)$ and $\gamma'''(t_0)$ are not colinear then
\[
\gamma(t_0 + h) = \gamma(t_0) + \frac{h^2}{2} \gamma''(t_0) + \frac{h^3}{6} \gamma'''(t_0) + o(h^3).
\]

Figure 2.4: Generic behavior around a biregular point, an inflexion point and an ordinary cusp. (Each panel
shows the point $\gamma(t_0)$ with the relevant derivative vectors among $\gamma'(t_0)$, $\gamma''(t_0)$, $\gamma'''(t_0)$, and the tangent where it exists.)

This means that in the frame $(\gamma''(t_0), \gamma'''(t_0))$ centered at $\gamma(t_0)$ the image of $\gamma$ is close to the image
of $h \mapsto (h^2/2, h^3/6)$. Similarly to Example 2.15, the curve has a cusp, called an ordinary cusp.
In the general case, to analyze a singularity, one finds the two smallest integers $1 \leq p < q$ such that
$\gamma^{(p)}(t_0)$ and $\gamma^{(q)}(t_0)$ are not colinear. Then the nature of the singularity depends on the parity of $p$ and
$q$ (the oddness or evenness).
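As a worked illustration of this recipe, here is a sketch for the curve of Example 2.15 at $t_0 = 0$; the computation is elementary and can be checked by hand.

```latex
\[
\gamma(t) = (t^2, t^3), \qquad
\gamma'(0) = (0, 0), \quad \gamma''(0) = (2, 0), \quad \gamma'''(0) = (0, 6).
\]
\[
p = 2, \quad q = 3, \qquad
\gamma(0) + \frac{h^2}{2}\,\gamma''(0) + \frac{h^3}{6}\,\gamma'''(0) = (h^2, h^3).
\]
```

Here $\gamma'(0)$ vanishes while $\gamma''(0)$ and $\gamma'''(0)$ are not colinear, so the point is an ordinary cusp. In this particular case the principal part coincides with $\gamma(h)$ itself, and its coordinates in the frame $(\gamma''(0), \gamma'''(0))$ are $(h^2/2, h^3/6)$.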
Remark 2.17. In Figure 2.4, notice that $\gamma'(t_0)$ is not necessarily orthogonal to $\gamma''(t_0)$. Actually, it happens
if and only if $\gamma$ has constant speed. More precisely, if $I$ is an interval and $\gamma : I \to \mathbb{R}^d$ is of class $C^2$, then [$\gamma'(t)$ is orthogonal to
$\gamma''(t)$ for all $t \in I$] if and only if [the function $t \mapsto \|\gamma'(t)\|$ is constant]. To see that, note that [the function
$t \mapsto \|\gamma'(t)\|$ is constant] if and only if $g : t \mapsto \|\gamma'(t)\|^2$ is constant. On the other hand, $g'(t) = 2\,\gamma'(t) \cdot \gamma''(t)$.
Thus we conclude by using the property that a function on an interval is constant if and only if its derivative vanishes.

2.4 A remark on the mean value theorem

The mean value theorem is a key result to link a function with its derivative. For a real-valued function
$f : [a, b] \to \mathbb{R}$ defined on a bounded interval $[a, b]$ of $\mathbb{R}$, if $f$ is continuous on $[a, b]$ and
differentiable on $(a, b)$ then there exists $c \in (a, b)$ such that
\[
\frac{f(b) - f(a)}{b - a} = f'(c). \tag{2.1}
\]
If in addition $f'$ is continuous over $[a, b]$ then we have a more precise expression
\[
f(b) - f(a) = \int_a^b f'(t) \, dt. \tag{2.2}
\]

For vector-valued curves, the short message is that (2.1) fails while (2.2) still holds.
Example 2.18. Let’s take $\gamma : \mathbb{R} \to \mathbb{R}^2$ defined by $\gamma(t) = (\cos(t), \sin(t))$. Then $\gamma$ is of class $C^\infty$ (that is of
class $C^k$ for every $k \geq 1$). Moreover, $\gamma'(t) = (-\sin(t), \cos(t))$. Note that
\[
\gamma(2\pi) - \gamma(0) = 0,
\]
however there exists no $c \in (0, 2\pi)$ such that $\gamma'(c) = \frac{\gamma(2\pi) - \gamma(0)}{2\pi} = 0$, as $\|\gamma'\|$ is constant and equal to 1.
The reason is that if we look at the first coordinate function $\gamma_1 : \mathbb{R} \to \mathbb{R}$, there exists $c_1 = \pi \in (0, 2\pi)$
such that $\gamma_1'(c_1) = 0$ (this is the mean value theorem for $\gamma_1$). Similarly, there exists $c_2 = \pi/2 \in (0, 2\pi)$
such that $\gamma_2'(c_2) = 0$ (this is the mean value theorem for $\gamma_2$). But we cannot find $c \in (0, 2\pi)$ such that
both $\gamma_1'(c)$ and $\gamma_2'(c)$ vanish. This shows that (2.1) cannot hold for functions $\gamma : \mathbb{R} \to \mathbb{R}^d$.

Figure 2.5: Curve in polar coordinates $r = g(\theta)$ (in red) with the vectors $e_r(\theta)$ and $e_\theta(\theta)$. The point
$\gamma(\theta) = g(\theta) e_r(\theta)$ is at distance $g(\theta)$ from the origin.

On the other hand, (2.2) still holds provided one defines, for a continuous function $\theta : [a, b] \to \mathbb{R}^d$,
the integral
\[
\int_a^b \theta(t) \, dt
\]
as the vector in $\mathbb{R}^d$ whose $i$-th coordinate is $\int_a^b \theta_i(t) \, dt$, where $\theta_i : [a, b] \to \mathbb{R}$ is the $i$-th coordinate function
of $\theta$. Then (2.2) holds simply by writing it coordinate per coordinate.
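To make the contrast between (2.1) and (2.2) concrete, the sketch below integrates $\gamma'$ coordinate by coordinate for the circle of Example 2.18. The midpoint rule, the helper `integrate` and the number of subintervals are assumptions chosen for this illustration, not something prescribed by the notes.

```python
import math

def gamma(t):
    return (math.cos(t), math.sin(t))

def gamma_prime(t):
    return (-math.sin(t), math.cos(t))

def integrate(curve, a, b, n=10_000):
    """Midpoint rule applied to each coordinate function separately."""
    step = (b - a) / n
    totals = [0.0] * len(curve(a))
    for k in range(n):
        value = curve(a + (k + 0.5) * step)
        for i, component in enumerate(value):
            totals[i] += component * step
    return tuple(totals)

a, b = 0.0, 2 * math.pi

# Identity (2.2): gamma(b) - gamma(a) equals the integral of gamma'.
lhs = tuple(gb - ga for gb, ga in zip(gamma(b), gamma(a)))
rhs = integrate(gamma_prime, a, b)

# Identity (2.1) would need a time c with gamma'(c) = 0,
# but the speed never gets below 1 on the sampled times.
min_speed = min(
    math.hypot(*gamma_prime(a + k * (b - a) / 1000)) for k in range(1001)
)
```

Both sides of (2.2) are numerically zero here, while the sampled speed stays equal to 1, so no sampled time can satisfy (2.1).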

2.5 Curves in polar coordinates

To end this chapter, let us briefly mention curves in polar coordinates, which are a particular class of
functions valued in $\mathbb{R}^2$. This corresponds to curves of the form $r = g(\theta)$ where $r$ is the distance to the origin and
$\theta$ the angle made with the horizontal axis. In equations, this is defined as follows:
Definition 2.19. Let $I \subset \mathbb{R}$ and $g : I \to [0, +\infty)$ a function defined on it taking non-negative values.
The curve $r = g(\theta)$ is the vector-valued function $\gamma : I \to \mathbb{R}^2$ defined by, for all $\theta \in I$,
\[
\gamma(\theta) = g(\theta) \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}. \tag{2.3}
\]

Remark 2.20. By standard properties about products of continuous and differentiable functions, one can
check that if the function $g$ is of class $C^k$ for some integer $k \geq 1$, then the corresponding function $\gamma$ is of
class $C^k$. The converse actually holds.
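To compute derivatives of the function $\gamma$, it is useful to introduce the following functions $e_r, e_\theta$ which
are functions defined on $\mathbb{R}$ and valued in $\mathbb{R}^2$. For $\theta \in \mathbb{R}$, we define
\[
e_r(\theta) = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}
\quad \text{and} \quad
e_\theta(\theta) = \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix}.
\]
Thus (2.3) reads $\gamma(\theta) = g(\theta) e_r(\theta)$.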

Lemma 2.21. Both functions $e_r, e_\theta$ are of class $C^\infty$, there holds $e_r \cdot e_\theta = 0$ on $\mathbb{R}$ and moreover
$e_r' = e_\theta$ and $e_\theta' = -e_r$.
Proof. Straightforward computations.
This leads to the following formula for the speed and acceleration of a function defined via polar
coordinates.


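Proposition 2.22. Let $I$ be an interval of $\mathbb{R}$ and $g : I \to [0, +\infty)$ a function of class $C^2$. We define the
function $\gamma : I \to \mathbb{R}^2$ via (2.3), that is $\gamma = g \, e_r$. Then the function $\gamma$ is of class $C^2$ and
\[
\gamma' = g' e_r + g \, e_\theta \quad \text{and} \quad \gamma'' = (g'' - g) e_r + 2 g' e_\theta.
\]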

Proof. It relies on the differentiation (twice) of the identity $\gamma = g \, e_r$ together with Lemma 2.21. If you’re
not confident, you can do it coordinate per coordinate.
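The formulas of Proposition 2.22 can also be sanity-checked numerically. In the sketch below the choice $g(\theta) = 1 + \cos(\theta)$ (a cardioid, which is non-negative), the finite-difference helpers, the step sizes and the sample angle are all assumptions made for this illustration.

```python
import math

def g(theta):
    return 1 + math.cos(theta)      # hypothetical choice of g, with g >= 0

def g_prime(theta):
    return -math.sin(theta)

def g_second(theta):
    return -math.cos(theta)

def e_r(theta):
    return (math.cos(theta), math.sin(theta))

def e_theta(theta):
    return (-math.sin(theta), math.cos(theta))

def gamma(theta):
    return tuple(g(theta) * c for c in e_r(theta))

def gamma_prime_formula(theta):
    # gamma' = g' e_r + g e_theta
    return tuple(g_prime(theta) * r + g(theta) * s
                 for r, s in zip(e_r(theta), e_theta(theta)))

def gamma_second_formula(theta):
    # gamma'' = (g'' - g) e_r + 2 g' e_theta
    return tuple((g_second(theta) - g(theta)) * r + 2 * g_prime(theta) * s
                 for r, s in zip(e_r(theta), e_theta(theta)))

def first_difference(curve, t, h=1e-5):
    """Central difference approximation of the first derivative."""
    return tuple((p - m) / (2 * h) for p, m in zip(curve(t + h), curve(t - h)))

def second_difference(curve, t, h=1e-4):
    """Central difference approximation of the second derivative."""
    return tuple((p - 2 * c + m) / h ** 2
                 for p, c, m in zip(curve(t + h), curve(t), curve(t - h)))

theta0 = 0.9   # arbitrary sample angle
error_first = max(abs(a - b) for a, b in
                  zip(first_difference(gamma, theta0), gamma_prime_formula(theta0)))
error_second = max(abs(a - b) for a, b in
                   zip(second_difference(gamma, theta0), gamma_second_formula(theta0)))
```

Both errors are of the size expected from finite differences, which supports the formulas without replacing the proof.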
Remark 2.23. It is sometimes more convenient to write the function as a complex-valued curve, that
is $\gamma(\theta) = g(\theta) e^{i\theta}$ (where $i \in \mathbb{C}$ is a complex number such that $i^2 = -1$), but we will not cover this point of
view in this course.

Chapter 3

Notions of topology in Rd

Topology denotes the field of mathematics studying proximity in general spaces. It is concerned with
questions like: “What does it mean for a sequence to converge?”, “What does it mean for a function to
be continuous?”. In R and even Rd , there is a canonical way to answer these questions that we will see in
this chapter. A trend during the 20th century was to design topological notions for much more general
spaces than Rd . Even though we will not tackle this issue this year (but you will in the next years), this
has an impact on the content of this chapter as the notions are presented in such a way that they extend
easily to more general frameworks.
As we have seen before, with $\mathbb{R}^d$ comes a norm, that is, $\|x\| = \sqrt{\sum_{i=1}^d (x_i)^2}$ measures the length of
the vector $x$. From this norm one can define the distance between points in $\mathbb{R}^d$: the distance between $x$
and $y$ is nothing else than $\|x - y\|$. If $d = 1$, it boils down to $|x - y|$, the absolute value of the difference of the two
real numbers.
From the triangle inequality $\|x + y\| \leq \|x\| + \|y\|$ (see Proposition 1.12), one can write a triangle
inequality for distances which reads
\[
\|x - y\| \leq \|x - z\| + \|z - y\|.
\]
Topology can be studied in the more general context of metric spaces, which are spaces endowed with
a metric satisfying some axioms including the triangle inequality.

3.1 Limits
We consider in this section sequences in $\mathbb{R}^d$, that we denote by $(x_n)_{n \in \mathbb{N}}$. This corresponds to points in
$\mathbb{R}^d$ indexed by $n \in \mathbb{N}$, or, said differently, to a function $\mathbb{N} \to \mathbb{R}^d$.
Definition 3.1. Let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}^d$ and $a \in \mathbb{R}^d$. We say that the sequence $(x_n)_{n \in \mathbb{N}}$ converges
to $a \in \mathbb{R}^d$ if and only if the real-valued sequence $(\|x_n - a\|)_{n \in \mathbb{N}}$ converges to 0.
In other words, $(x_n)_{n \in \mathbb{N}}$ converges to $a \in \mathbb{R}^d$ if the distance between $x_n$ and $a$ converges to 0 as
$n \to +\infty$. The characterization with quantifiers is the following.
Proposition 3.2 ($\varepsilon$-$\delta$ characterization of the limit). The sequence $(x_n)_{n \in \mathbb{N}}$ converges to $a \in \mathbb{R}^d$ if and
only if
\[
\forall \varepsilon > 0, \ \exists N \in \mathbb{N}, \ \forall n \geq N, \quad \|x_n - a\| \leq \varepsilon.
\]
Proof. This is just about copying the definition that a real-valued sequence converges to 0.
This is to be compared with the definition of a real-valued sequence $(x_n)_{n \in \mathbb{N}}$ converging to $a \in \mathbb{R}$: it reads
\[
\forall \varepsilon > 0, \ \exists N \in \mathbb{N}, \ \forall n \geq N, \quad |x_n - a| \leq \varepsilon.
\]
Thus, the only difference is that absolute values have been replaced by the norm. There is another
characterization: just looking at what’s happening coordinate per coordinate.

Figure 3.1: Illustration of the characterization in Proposition 3.2. (The picture shows the closed ball $B_c(a, \varepsilon)$ of
center $a$ and radius $\varepsilon$, together with the first term $x_1$ of the sequence.)

Proposition 3.3 (Coordinate-wise characterization of the limit). Let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}^d$ and
$a \in \mathbb{R}^d$. For $i \in \{1, 2, \ldots, d\}$ we write $x_{i,n}$ for the $i$-th coordinate of the vector $x_n$. Then the sequence
$(x_n)_{n \in \mathbb{N}}$ converges to $a \in \mathbb{R}^d$ if and only if for all $i \in \{1, 2, \ldots, d\}$, the sequence $(x_{i,n})_{n \in \mathbb{N}}$ converges to $a_i$.
Proof. Implication ($\Rightarrow$). We assume that $(x_n)_{n \in \mathbb{N}}$ converges to $a \in \mathbb{R}^d$. Fix $i \in \{1, 2, \ldots, d\}$ and note that
\[
\|x_n - a\| = \sqrt{\sum_{j=1}^d (x_{j,n} - a_j)^2} \geq \sqrt{(x_{i,n} - a_i)^2} = |x_{i,n} - a_i|.
\]
As the left hand side goes to 0, so does the right hand side.
Implication ($\Leftarrow$). We start again from the expression
\[
\|x_n - a\| = \sqrt{\sum_{i=1}^d (x_{i,n} - a_i)^2}.
\]
Each of the sequences $x_{i,n} - a_i$ goes to 0 when $n \to +\infty$ for $i \in \{1, 2, \ldots, d\}$. Thus so do the sequences
$(x_{i,n} - a_i)^2$ because the square of a sequence going to 0 also goes to 0. A finite sum of sequences going
to 0 also goes to 0, and then when we take the square root we still get a sequence which goes to 0. This
shows that the sequence $\|x_n - a\|$ goes to 0 when $n \to +\infty$ and concludes the proof.
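The inequality used in the first implication can be observed on an explicit sequence. The sequence $x_n = (1/n, 1 + 1/n^2)$ with limit $a = (0, 1)$ is a hypothetical example chosen for this sketch, as are the helper names and the ranks that are sampled.

```python
import math

def x(n):
    # Hypothetical sequence in R^2, chosen for illustration; it converges to a = (0, 1).
    return (1 / n, 1 + 1 / n ** 2)

a = (0.0, 1.0)

def norm_gap(n):
    """The distance ||x_n - a||."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x(n), a)))

def coordinate_gaps(n):
    """The list of |x_{i,n} - a_i| for each coordinate i."""
    return [abs(xi - ai) for xi, ai in zip(x(n), a)]

ranks = [1, 10, 100, 1000, 10_000]

# Each coordinate gap is bounded by the norm (small slack for rounding).
bounded_by_norm = all(
    max(coordinate_gaps(n)) <= norm_gap(n) + 1e-15 for n in ranks
)

# The distances to a along the sampled ranks.
norms = [norm_gap(n) for n in ranks]
```

Along the sampled ranks the distance to `a` decreases towards 0, and each coordinate gap stays below it, as in the proof.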

Proposition 3.4 (Operations on limits). Let $(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ be two sequences in $\mathbb{R}^d$ which converge
to $a$ and $b$ respectively. Then,

(i) The sequence $(x_n + y_n)_{n \in \mathbb{N}}$ converges to $a + b$.

(ii) The sequence $(x_n \cdot y_n)_{n \in \mathbb{N}}$ converges to $a \cdot b$.

(iii) If $d = 3$ then the sequence $(x_n \times y_n)_{n \in \mathbb{N}}$ converges to $a \times b$.

(iv) The sequence $(\|x_n\|)_{n \in \mathbb{N}}$ converges to $\|a\|$.

Proof. This is direct using the characterization of the limit coordinate per coordinate in Proposition 3.3,
and then relying on operations on limits for sequences in $\mathbb{R}$.


3.2 Neighborhood, interior and closure

After studying limits, we will study subsets of Rd and the notions of neighborhood, interior, closure,
boundary.
For $a \in \mathbb{R}^d$ and $r \geq 0$, we define the following domains of $\mathbb{R}^d$:
\[
B_o(a, r) = \{x : \|x - a\| < r\} \quad \text{and} \quad B_c(a, r) = \{x : \|x - a\| \leq r\}.
\]
They are called respectively the open and closed ball of center $a$ and radius $r$. Using the triangle inequality
and the homogeneity of the norm, you can prove some inclusions of balls: if $a, b \in \mathbb{R}^d$ and $r, s > 0$ then
$B_c(a, r) \subset B_c(b, s)$ if and only if $s \geq r + \|b - a\|$.
Note that Proposition 3.2 can be read as: a sequence $(x_n)_{n \in \mathbb{N}}$ converges to $a \in \mathbb{R}^d$ if and only if for
every $\varepsilon > 0$, there exists an integer $N$ such that all the terms of the sequence after rank $N$ belong to
$B_c(a, \varepsilon)$.
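The inclusion criterion for closed balls can be turned into a one-line test. The sketch below is an illustration under stated assumptions: the helper `farthest_point_from` is hypothetical and returns the point of $B_c(a, r)$ that lies farthest from $b$, which is the point that decides whether the inclusion holds.

```python
import math

def dist(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def closed_ball_included(a, r, b, s):
    """Criterion from the text: B_c(a, r) is inside B_c(b, s) iff s >= r + ||b - a||."""
    return s >= r + dist(a, b)

def farthest_point_from(a, r, b):
    """Point of B_c(a, r) farthest from b (hypothetical helper)."""
    d = dist(a, b)
    if d == 0:
        # Every point of the sphere is at distance r from b = a; pick one.
        return (a[0] + r,) + tuple(a[1:])
    # Move from a by r in the direction pointing away from b.
    return tuple(ai + r * (ai - bi) / d for ai, bi in zip(a, b))

a, r, b = (0.0, 0.0), 1.0, (1.0, 0.0)
worst = farthest_point_from(a, r, b)   # here the point (-1, 0)
```

The point `worst` is at distance `r + dist(a, b)` from `b`, so it belongs to $B_c(b, s)$ exactly when the criterion holds.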
Definition 3.5 (Neighborhood). Let $x \in \mathbb{R}^d$ and $V \subset \mathbb{R}^d$. We say that $V$ is a neighborhood of $x$ if there
exists $\varepsilon > 0$ such that $B_c(x, \varepsilon) \subset V$. With quantifiers this corresponds to:
\[
\exists \varepsilon > 0, \ \forall y \in \mathbb{R}^d, \quad \|x - y\| \leq \varepsilon \Rightarrow y \in V.
\]
In particular, if $V$ is a neighborhood of $x$ then $x \in V$. A neighborhood of $x$ is a subset of $\mathbb{R}^d$ which
contains all points which are sufficiently close to $x$.
Example 3.6. Let’s take $a = (1, 0) \in \mathbb{R}^2$. Then $V = \{(x, y) : x \geq 0\}$ is a neighborhood of $a$: indeed,
$B_c(a, \varepsilon) \subset V$ for all $\varepsilon \in (0, 1)$. On the other hand, $V = \{(x, y) : x \geq 1\}$ is not a neighborhood of $a$
(exercise), and neither is $V = \{(x, y) : y \geq 2\}$ (as $a$ does not even belong to $V$).
With this vocabulary, we can reformulate what it means for a sequence to converge.
Proposition 3.7 (Characterization of the limit in terms of neighborhood). Let $(x_n)_{n \in \mathbb{N}}$ be a sequence
in $\mathbb{R}^d$ and $a \in \mathbb{R}^d$. Then the sequence $(x_n)_{n \in \mathbb{N}}$ converges to $a$ if and only if for every neighborhood $V$ of $a$,
there exists $N \in \mathbb{N}$ such that $x_n \in V$ for all $n \geq N$.
Proof. Implication ($\Rightarrow$). We assume that $(x_n)_{n \in \mathbb{N}}$ converges to $a$ and we take $V$ a neighborhood of $a$.
Then by definition there exists $\varepsilon > 0$ such that $B_c(a, \varepsilon) \subset V$. By definition of the limit, there exists
$N \in \mathbb{N}$ such that $x_n \in B_c(a, \varepsilon)$ for all $n \geq N$. This concludes the implication as $B_c(a, \varepsilon) \subset V$.
Implication ($\Leftarrow$). We assume that for every neighborhood $V$ of $a$, there exists $N \in \mathbb{N}$ such that $x_n \in V$
for all $n \geq N$. In particular, if $\varepsilon > 0$, taking $V = B_c(a, \varepsilon)$ (which is a neighborhood of $a$) we see that
there exists $N \in \mathbb{N}$ such that $x_n \in V = B_c(a, \varepsilon)$ for all $n \geq N$. This is enough to say that $(x_n)_{n \in \mathbb{N}}$
converges to $a$.
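Definition 3.8 (Interior). Let $V \subset \mathbb{R}^d$. We call interior of $V$, and write $\mathring{V}$, the subset of $\mathbb{R}^d$ made of
the points $x \in \mathbb{R}^d$ such that $V$ is a neighborhood of $x$. That is, $x \in \mathring{V}$ if and only if $V$ is a neighborhood of $x$. With
quantifiers:
\[
x \in \mathring{V} \iff \exists \varepsilon > 0, \ \forall y \in \mathbb{R}^d, \quad \|x - y\| \leq \varepsilon \Rightarrow y \in V.
\]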

Example 3.9. This notion is indeed what we have in mind when we speak of interior. For instance, the
interior of the set $V = \{(x, y) \in \mathbb{R}^2 : y \leq 3\} \subset \mathbb{R}^2$ is $\mathring{V} = \{(x, y) \in \mathbb{R}^2 : y < 3\}$. Or the interior of
$B_c(a, r)$ is $B_o(a, r)$ for any $a \in \mathbb{R}^d$ and $r > 0$.
Let’s state some straightforward properties of the interior whose proof is quite direct and left as an
exercise.
Proposition 3.10. Let $V, W$ be two subsets of $\mathbb{R}^d$ and $\mathring{V}, \mathring{W}$ be their interiors.
(i) There always holds $\mathring{V} \subset V$.
(ii) If $V \subset W$ then $\mathring{V} \subset \mathring{W}$.
A concept that is tightly linked to the one of interior is the one of closure (it’s not apparent at first
glance but will be proved in Proposition 3.17).

Figure 3.2: Illustration of the definition of the interior of a domain $D$. (The points $x_i$ for $i = 1, \ldots, 4$ all belong to the
interior of $D$, with a different “$\varepsilon$” for each $i$.)

Figure 3.3: Illustration of the closure of a set, see Example 3.13. (Domain $D = \{(x, y) : y > 1\}$; the picture shows a point in $D$
and in the closure, a point in the closure but not in $D$, a point not in the closure, and a sequence that converges to a point in the closure.)

Definition 3.11 (Closure). Let $V$ be a subset of $\mathbb{R}^d$. We say that $x$ belongs to the closure of $V$, and we
write $x \in \overline{V}$, if and only if there exists a sequence $(y_n)_{n \in \mathbb{N}}$ in $\mathbb{R}^d$ such that $y_n \in V$ for all $n \in \mathbb{N}$ and the
sequence $(y_n)_{n \in \mathbb{N}}$ converges to $x$.

The definition of closure is more intricate because it is about the existence of a sequence. Note that
one can still prove the following properties, similar to the ones of the interior.

Proposition 3.12. Let $V, W$ be two subsets of $\mathbb{R}^d$ and $\overline{V}, \overline{W}$ be their closures.

(i) There always holds $V \subset \overline{V}$.

(ii) If $V \subset W$ then $\overline{V} \subset \overline{W}$.
Proof. For (i) it is enough to take $x \in V$ and then the constant sequence $(y_n)_{n \in \mathbb{N}}$ defined by $y_n = x$ for
all $n \in \mathbb{N}$: it clearly converges to $x$ and $y_n = x \in V$ for all $n \in \mathbb{N}$.
For the second point, if $x \in \overline{V}$ and $(y_n)_{n \in \mathbb{N}}$ is a sequence in $V$ which converges to $x$, then $y_n \in V$ so
$y_n \in W$ for all $n \in \mathbb{N}$, which proves that $x \in \overline{W}$.

Example 3.13. Let $V = \{(x, y) : y > 1\} \subset \mathbb{R}^2$, see Figure 3.3. Then $\overline{V} = \{(x, y) : y \geq 1\}$. Indeed, let
$(x_0, y_0)$ be a point with $y_0 \geq 1$. If $y_0 > 1$, then $(x_0, y_0)$ already belongs to $V$. If $y_0 = 1$, then one can define the sequence
$(x_n, y_n)_{n \geq 1}$ by $x_n = x_0$ and $y_n = 1 + 1/n$ for $n \geq 1$. One can check that this sequence converges to
$(x_0, y_0) = (x_0, 1)$ while $y_n = 1 + 1/n > 1$ for all $n \geq 1$, so $(x_n, y_n) \in V$ for all $n \geq 1$. This shows that $\overline{V}$
contains $\{(x, y) : y \geq 1\}$. Finally, if $(x_n, y_n)_{n \in \mathbb{N}}$ is any converging sequence which belongs to $V$, for
any $n$ there holds $y_n > 1$. Passing to the limit, $\lim_n y_n \geq 1$. Thus the closure of $V$ is necessarily included
in $\{(x, y) : y \geq 1\}$.
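The sequence used in Example 3.13 is easy to write down explicitly. The sketch below is only an illustration: the value of $x_0$ and the number of terms are arbitrary choices. It checks that every term lies in $V$ while the limit does not.

```python
def in_V(point):
    """Membership test for V = {(x, y) : y > 1}."""
    return point[1] > 1

x0 = 2.0                                   # arbitrary first coordinate
sequence = [(x0, 1 + 1 / n) for n in range(1, 1001)]
limit = (x0, 1.0)

all_in_V = all(in_V(p) for p in sequence)  # every term belongs to V
limit_in_V = in_V(limit)                   # but the limit does not
last_gap = abs(sequence[-1][1] - limit[1]) # distance of the last term to the limit
```

The limit point belongs to the closure of $V$ without belonging to $V$, which is exactly the situation of a point with $y_0 = 1$ in the example.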

Figure 3.4: Example of a domain together with some points in its interior, its closure, its complement.
(Domain $D = \{(x, y) : x > 1 \text{ and } x + y \leq 3\}$; the picture shows a point in the interior of $D$, a point in $D$ and on the boundary of $D$
but not in the interior, a point not in $D$ but in its closure, and a point neither in $D$ nor in its closure.)

Example 3.14. On the other hand, let’s check that if $a \in \mathbb{R}^d$ and $r \geq 0$, the closure of the closed ball
$B_c(a, r)$ is itself. A set is always contained in its closure. Conversely, let $x$ be in the closure of
$B_c(a, r)$. There exists a sequence $(y_n)_{n \in \mathbb{N}}$ in $B_c(a, r)$ which converges to $x$. By Proposition 3.4,
\[
\lim_{n \to +\infty} \|y_n - a\| = \|x - a\|.
\]
For $n \in \mathbb{N}$, as $y_n \in B_c(a, r)$ there holds $\|y_n - a\| \leq r$. Passing to the limit in this inequality, $\|x - a\| \leq r$.
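Definition 3.15 (Boundary). If $V \subset \mathbb{R}^d$, its (topological) boundary $\partial V$ is defined as $\partial V = \overline{V} \setminus \mathring{V}$.
Example 3.16. If $V = \{(x, y) : y > 1\} \subset \mathbb{R}^2$, then its boundary is the line defined by $y = 1$. This is
the same boundary as the set $W = \{(x, y) : y \geq 1\} \subset \mathbb{R}^2$. Actually, in this case, $\mathring{W} = \mathring{V} = V$ while
$\overline{W} = \overline{V} = W$ and $\partial V = \partial W = W \setminus V$.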
Finally, we finish by stating a key property between interior and closure: they are linked by
“complementation”. If $V \subset \mathbb{R}^d$, we write $V^c = \mathbb{R}^d \setminus V$ for its complement, that is the set of points in $\mathbb{R}^d$ that
do not belong to $V$.
Proposition 3.17 (Link between interior and closure). Let $V$ be a subset of $\mathbb{R}^d$. Then
\[
\mathring{V} \cup \overline{V^c} = \mathbb{R}^d, \tag{3.1}
\]
and the union is disjoint. As a consequence,
\[
\overline{V^c} = (\mathring{V})^c \quad \text{and} \quad (\overline{V})^c = (V^c)^\circ.
\]
Said with words, the closure of the complement is the complement of the interior and vice versa.

Example 3.18. Try to write it for instance in the case $V = \{(x, y) : y > 1\} \subset \mathbb{R}^2$. You can also take a
look at Figure 3.5.
Proof. Identity (3.1). Let’s take $x \in \mathbb{R}^d$. If $x \in \mathring{V}$, then there is nothing to prove. On the other hand,
assume that $x \notin \mathring{V}$, then we want to prove that $x \in \overline{V^c}$. For each $n \geq 1$, we know that $B_c(x, 1/n)$ is not
included in $V$ (if not, $x \in \mathring{V}$). Let’s take $y_n$ a point in $V^c \cap B_c(x, 1/n)$. In particular $\|x - y_n\| \leq 1/n$,
thus the sequence $(y_n)_{n \geq 1}$ converges to $x$. As moreover $y_n \in V^c$ for all $n \geq 1$, we conclude that $x \in \overline{V^c}$.
In (3.1), the union is disjoint. Let’s take $x$ which belongs to both $\mathring{V}$ and $\overline{V^c}$. We want to reach a
contradiction. As $x \in \mathring{V}$, there exists $\varepsilon > 0$ such that $B_c(x, \varepsilon) \subset V$. On the other hand, as $x \in \overline{V^c}$, there
exists a sequence $(y_n)_{n \in \mathbb{N}}$ such that all elements belong to $V^c$, and which converges to $x$. In particular,
$y_n$ belongs to $B_c(x, \varepsilon) \subset V$ for $n$ large enough. This contradicts $y_n \in V^c$ for all $n$.
Deducing the other identities from (3.1). That $\overline{V^c} = (\mathring{V})^c$ is really a rewriting of (3.1). Then to
prove the last identity we introduce $W = V^c$, so that we can write the identity $\overline{V^c} = (\mathring{V})^c$ as $\overline{W} = ((W^c)^\circ)^c$.
Taking then the complement, and as $(D^c)^c = D$ for any set $D \subset \mathbb{R}^d$, we get $(\overline{W})^c = (W^c)^\circ$. But as $V$ can
be any set, so can $W$.

Figure 3.5: Illustration of one of the identities in Proposition 3.17 about the link between interior, closure
and complementation. (For the domain $V = B_c(x, r)$, the picture shows the disjoint union $\mathring{V} \cup \overline{V^c} = \mathbb{R}^d$.)

3.3 Open and closed sets

We end this chapter with the definition of the following classes of sets.
Definition 3.19 (Open and closed sets). A subset $V$ of $\mathbb{R}^d$ is said to be open if $\mathring{V} = V$. It is said to be closed if
$\overline{V} = V$.
With the vocabulary above, a set is open if it is equal to its interior, that is if it is a neighborhood of
all of its points. With quantifiers, $V$ is open if and only if
\[
\forall x \in V, \ \exists \varepsilon > 0, \quad B_c(x, \varepsilon) \subset V.
\]
On the other hand, a set is closed if it coincides with its closure. This can be interpreted as being “stable
by limits”. Indeed, $V \subset \mathbb{R}^d$ is closed if and only if: for every sequence $(y_n)_{n \in \mathbb{N}}$ which converges to some
$x \in \mathbb{R}^d$, if $y_n \in V$ for all $n \in \mathbb{N}$ then $x \in V$. That is, the limit of a sequence in $V$ still stays in $V$.
Example 3.20. Open balls are open, closed balls are closed (exercise).
The counterpart of Proposition 3.17 is the following.
Proposition 3.21 (Link between open and closed sets). The complement of an open set is closed, and
the complement of a closed set is open.
Proof. This is a direct consequence of Proposition 3.17. Indeed, let’s take $V$ an open set (that is such
that $\mathring{V} = V$). There holds
\[
\overline{V^c} = (\mathring{V})^c = V^c.
\]
This exactly means that $V^c$ is closed. The proof that the complement of a closed set is open is done in a
similar way.
Proposition 3.22. Let $V$ be a subset of $\mathbb{R}^d$. Then $\mathring{V}$ is open, and it is the largest open set contained in $V$.
On the other hand $\overline{V}$ is closed, and it is the smallest closed set containing $V$.
Proof. Let’s first prove that $\mathring{V}$ is open. If $x \in \mathring{V}$, then there exists $\varepsilon > 0$ such that $B_c(x, \varepsilon) \subset V$,
in particular $B_o(x, \varepsilon) \subset V$. Taking the “interior” on both sides of the inclusion (specifically: using
Proposition 3.10 (ii)) and as open balls are open, $B_o(x, \varepsilon) \subset \mathring{V}$. In particular $B_c(x, \varepsilon/2) \subset \mathring{V}$ and this
is enough to conclude that $x$ belongs to the interior of $\mathring{V}$, thus $\mathring{V}$ is open. Moreover, let $W \subset V$ be an
open set. Using again Proposition 3.10 (ii), we see that $\mathring{W} \subset \mathring{V}$, but $\mathring{W} = W$ by openness of $W$, which
shows $W \subset \mathring{V}$. That is, every open set $W$ contained in $V$ is contained in the interior of $V$: the latter is
the largest open set contained in $V$.


Then, to prove that $\overline{V}$ is closed, we rather reason with Proposition 3.17 and Proposition 3.21. Indeed,
$\overline{V}$ is the complement of $(V^c)^\circ$ (Proposition 3.17), that is, it is the complement of an open set by what we
just proved. Using Proposition 3.21, the complement of an open set is a closed set. Then, proving the
minimizing property of $\overline{V}$ can be done similarly to $\mathring{V}$, with Proposition 3.12 which shows that $\overline{V} \subset W$
for any closed set $W$ which contains $V$.

Chapter 4

Continuity and differentiability for

functions from R2 to R

In this chapter, we study functions $f : D \subset \mathbb{R}^2 \to \mathbb{R}$, that is, functions defined over a domain of $\mathbb{R}^2$ and
valued in $\mathbb{R}$. These functions map each point of their domain of definition to a scalar value.
All the concepts of this chapter can be extended in a straightforward way to functions of more than
2 variables (but still real-valued). We prefer to keep it to $d = 2$ in this chapter for the sake of clarity. We
will use $(x, y)$ as the notation for a generic point in $\mathbb{R}^2$.
First, let’s motivate the study of such functions with “concrete” examples.
• Some formulas can be read as functions of two variables. For instance, the volume of a cylinder of
height $x$ and radius $y$ is $\pi x y^2$. We can define the function $f : (x, y) \mapsto \pi x y^2$ which expresses the
volume of a cylinder as a function of its dimensions.
• With a physical flavor. In an ideal gas, there holds $PV = nRT$, with $P$ the pressure, $V$ the volume,
$n$ the number of moles and $T$ the temperature (and $R$ a physical constant). Thus we could
define $f(P, T) = RT/P$ which gives the volume occupied by one mole of ideal gas as a function of
the temperature and the pressure.
• Or a function of two variables can be read from some data. For instance $f(x, y)$ is the temperature
measured at a longitude $x$ and latitude $y$ at the surface of the earth.

4.1 Definition and representation

Definition 4.1. A real-valued function on $\mathbb{R}^2$ is the data of $D \subseteq \mathbb{R}^2$, which is the domain of definition, and $f$, which associates to every point $(x, y)$ of $D$ a real number $f(x, y) \in \mathbb{R}$.
Given an analytic expression, to determine the domain of definition one uses the same rules as for functions of one variable.
Example 4.2. Let $f : (x, y) \mapsto \ln(x^2 + y^2 - 9) + \sqrt{y}$. Then the canonical domain of definition of $f$ is $D = \{(x, y) : x^2 + y^2 - 9 > 0 \text{ and } y \geq 0\}$. It is the intersection of the complement of a centered disk (here the closed disk of radius 3) with a half space.
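As a small illustration of how the domain of Example 4.2 is read off from the expression, here is a minimal Python sketch. The function name `in_domain` is an assumption made for this example; it only encodes the two constraints $x^2 + y^2 - 9 > 0$ and $y \geq 0$.

```python
def in_domain(x: float, y: float) -> bool:
    """True when (x, y) satisfies both constraints of Example 4.2."""
    outside_disk = x**2 + y**2 - 9 > 0   # the argument of ln must be positive
    upper_half = y >= 0                  # the argument of the square root must be nonnegative
    return outside_disk and upper_half

print(in_domain(4.0, 1.0))   # outside the disk, upper half plane
print(in_domain(1.0, 1.0))   # inside the disk of radius 3
print(in_domain(4.0, -1.0))  # lower half plane
```

Points on the circle $x^2 + y^2 = 9$ are excluded because the inequality coming from the logarithm is strict.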
Representing a function of several variables is not that easy, and usually it's harder to visualize than a function of a single variable.
Definition 4.3 (Graph of a function). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of 2 variables. Its graph is the subset of $\mathbb{R}^3$ made of the points $(x, y, z)$ such that $(x, y) \in D$ and $z = f(x, y)$.
In other words, the height $z$ corresponds to the value $f(x, y)$ of the function. In general, the graph of a function $D \subseteq \mathbb{R}^d \to \mathbb{R}$ is a subset of $\mathbb{R}^{d+1}$, so it's clearly hard to picture as soon as $d > 2$.
Another way to represent the function is through its level sets, which are subsets of $\mathbb{R}^2$.

Bocconi University – course 30543 (Mathematical Analysis module 2)

Figure 4.1: Two examples of functions of two variables from “real” data. Left: altitude as a function of the
position, specifically only the level sets are represented (image found on Wikipedia, material originally
coming from the United States Geological Survey). Right: temperature as a function of the position
(coming from the website openweathermap.org). Note that in both cases one does not represent the
graph of the function: on the right one uses colors, on the left one just plots the level sets.

Definition 4.4 (Level sets). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of 2 variables. The level set (at the level $k \in \mathbb{R}$) is the subset of $\mathbb{R}^2$ made of the points $(x, y) \in D$ such that $f(x, y) = k$.

The generic situation is that a level set is a curve in $\mathbb{R}^2$, but it's not always the case: think of a constant function on $\mathbb{R}^2$, whose level sets are all empty but for one which is $\mathbb{R}^2$.
Remark 4.5. For physical quantities, the level sets are usually called iso[something]. For instance, the
level sets of constant temperature are the isotherms, the level sets of constant pressure are the isobars,
etc.
Example 4.6. Let's take the example of a linear (or rather affine) function. We consider $f : (x, y) \in \mathbb{R}^2 \mapsto ax + by + c$ where $a, b, c \in \mathbb{R}$ are fixed.
Then the graph of $f$, which is a subset of $\mathbb{R}^3$, is characterized by the equation

$$ax + by - z + c = 0.$$

That is, it is a plane with normal vector $(a, b, -1)$. On the other hand, the level set at the level $k \in \mathbb{R}$ is the line in $\mathbb{R}^2$ defined by
$$ax + by = k - c,$$
which (when $(a, b) \neq (0, 0)$) is a line normal to the vector $(a, b)$ and directed by $(-b, a)$. Actually, the vector $(a, b)$ represents the direction in which the function "increases the most", the vector $(-a, -b)$ represents the direction in which the function "decreases the most", while $(-b, a)$ represents the direction in which the function does not change.
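The geometric claims of Example 4.6 can be checked numerically. The sketch below is only an illustration: the coefficients `a`, `b`, `c` are hypothetical values, and the "direction of largest increase" is found by brute force over sampled unit directions rather than by any method from the notes.

```python
import math

a, b, c = 2.0, -1.0, 0.5  # hypothetical coefficients of the affine function

def f(x, y):
    return a * x + b * y + c

def increase(theta):
    """Change of f after a unit step from the origin in the direction of angle theta."""
    return f(math.cos(theta), math.sin(theta)) - f(0.0, 0.0)

# Sample unit directions and locate the one giving the largest increase.
thetas = [2 * math.pi * i / 3600 for i in range(3600)]
best = max(thetas, key=increase)
best_dir = (math.cos(best), math.sin(best))

# Unit vector in the direction of (a, b).
norm = math.hypot(a, b)
grad_dir = (a / norm, b / norm)

# Moving along (-b, a) stays on the same level set.
along_level = f(1.0 - b, 2.0 + a) - f(1.0, 2.0)

print(best_dir, grad_dir, along_level)
```

Up to the resolution of the sampling, `best_dir` agrees with the normalized vector $(a, b)$, while a step along $(-b, a)$ leaves the value of $f$ unchanged.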
Finally, by composition with a curve, one can go back to a usual function $\mathbb{R} \to \mathbb{R}$. Indeed, if $\gamma : I \to \mathbb{R}^2$ where $I$ is an interval of $\mathbb{R}$ and $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$, then provided that $\gamma(t) \in D$ for all $t \in I$ one can define $f \circ \gamma : t \in I \mapsto f(\gamma_1(t), \gamma_2(t)) \in \mathbb{R}$. However, a lesson of the rest of this chapter will be that it's sometimes hard to understand a function by looking only at its behavior when restricted to curves.

4.2 Continuity
Let's turn to the definition of continuity. We will give different characterizations, but we start with an $\varepsilon$-$\delta$ definition.


Figure 4.2: Take $f : (x, y) \mapsto 3x^2 + y^2$. On the left is the graph of the function as a subset of $\mathbb{R}^3$. On the right are some level sets, that is, subsets of $\mathbb{R}^2$ defined by $f(x, y) = k$ for some $k \in \mathbb{R}$.


Figure 4.3: Illustration of the definition of continuity for a function of two variables.

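Definition 4.7 (Continuous function). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function. We say that $f$ is continuous at the point $\mathbf{x} \in D$ if
$$\forall \varepsilon > 0, \ \exists \delta > 0, \ \forall \mathbf{y} \in D, \quad \|\mathbf{y} - \mathbf{x}\| \leq \delta \ \Rightarrow \ |f(\mathbf{x}) - f(\mathbf{y})| \leq \varepsilon.$$
That is, $f$ is continuous at the point $\mathbf{x}$ if for all $\varepsilon > 0$ one can find a $\delta > 0$ such that the image of $B_c(\mathbf{x}, \delta) \cap D$ is included in $[f(\mathbf{x}) - \varepsilon, f(\mathbf{x}) + \varepsilon] = B_c(f(\mathbf{x}), \varepsilon)$.
The function $f$ is continuous over $D$ if it is continuous at every point of $D$.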

There is a sequential characterization of continuity: continuity of a function can be characterized using only converging sequences.

Proposition 4.8 (Sequential characterization of continuity). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function. The function $f$ is continuous at a point $\mathbf{x} \in D$ if and only if, for every sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ which converges to $\mathbf{x}$ and such that $\mathbf{y}_n \in D$ for all $n \in \mathbb{N}$, the sequence $(f(\mathbf{y}_n))_{n \in \mathbb{N}}$ converges to $f(\mathbf{x})$.

Loosely speaking, a function $f$ is continuous over $D$ if it commutes with the limit, that is if

$$\lim_{n \to +\infty} f(\mathbf{y}_n) = f\left( \lim_{n \to +\infty} \mathbf{y}_n \right)$$

at least for every sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ in $D$ which converges to a point in $D$.


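Proof. Let's first assume that $f$ is continuous at a point $\mathbf{x} \in D$ and let $(\mathbf{y}_n)_{n \in \mathbb{N}}$ be a sequence which converges to $\mathbf{x}$ and such that $\mathbf{y}_n \in D$ for all $n \in \mathbb{N}$. We want to show that $f(\mathbf{y}_n)$ converges to $f(\mathbf{x})$ as $n \to +\infty$. Let $\varepsilon > 0$. By continuity of $f$, we can find $\delta > 0$ such that, for $\mathbf{y} \in D$, $\|\mathbf{y} - \mathbf{x}\| \leq \delta$ implies $|f(\mathbf{y}) - f(\mathbf{x})| \leq \varepsilon$. As $(\mathbf{y}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{x}$, we can find $N$ such that if $n \geq N$ then $\|\mathbf{y}_n - \mathbf{x}\| \leq \delta$. Combining these two assertions, if $n \geq N$ then $|f(\mathbf{y}_n) - f(\mathbf{x})| \leq \varepsilon$. This is enough to show that $(f(\mathbf{y}_n))_{n \in \mathbb{N}}$ converges to $f(\mathbf{x})$.
To prove the converse we reason by contraposition and assume that $f$ is not continuous at a point $\mathbf{x} \in D$. With quantifiers, that means

$$\exists \varepsilon > 0, \ \forall \delta > 0, \ \exists \mathbf{y} \in D, \quad \|\mathbf{y} - \mathbf{x}\| \leq \delta \ \text{ and } \ |f(\mathbf{x}) - f(\mathbf{y})| > \varepsilon.$$

Let's fix such an $\varepsilon > 0$. For every $n \geq 1$, taking $\delta = 1/n$ we can find $\mathbf{y}_n \in D$ such that $\|\mathbf{y}_n - \mathbf{x}\| \leq 1/n$ and $|f(\mathbf{x}) - f(\mathbf{y}_n)| > \varepsilon$. In particular the sequence $(\mathbf{y}_n)_{n \geq 1}$ converges to $\mathbf{x}$, but $(f(\mathbf{y}_n))_{n \geq 1}$ cannot converge to $f(\mathbf{x})$ because $|f(\mathbf{x}) - f(\mathbf{y}_n)| > \varepsilon$ for all $n \geq 1$.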

This characterization is the one to use to prove that a function is continuous when this is not a "delicate" case. For instance, with Proposition 4.8 as well as Proposition 3.3 (which shows that a sequence in $\mathbb{R}^2$ converges if and only if it does so coordinate by coordinate) you can show that the functions

px, yq ÞÑ lnpx2 y 2 9q
?y, px, yq ÞÑ 3x 2y, px, yq ÞÑ expxpx4
2
y2 q
1

are all continuous (over its domain of definition for the first one, over $\mathbb{R}^2$ for the last two).

Proposition 4.9 (Operations on continuous functions). Let $f, g : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be two functions. Assume that $f, g$ are continuous at a point $\mathbf{x} \in D$. Then the functions $f + g$, $fg$ and $f/g$ (if $g$ does not vanish at $\mathbf{x}$) are also continuous at $\mathbf{x}$.
As a consequence, the functions $f + g$, $fg$ and $f/g$ (if $g$ does not vanish on $D$) are continuous over $D$ provided $f$ and $g$ are.

Proof. This is a consequence of Proposition 4.8 once we know that sums, products and quotients of converging sequences converge to the sums, products and quotients of the limits.

Let’s prove a first result about the composition of continuous functions, we will see more general
results later in the course.

Proposition 4.10. Let $\gamma : I \subseteq \mathbb{R} \to \mathbb{R}^2$ be a parametric curve and $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ a function of two variables, where $I$ is an interval of $\mathbb{R}$ while $D$ is a domain of $\mathbb{R}^2$. Assume furthermore that $\gamma(t) \in D$ for all $t \in I$. If $\gamma$ is continuous over $I$ and $f$ is continuous over $D$, then $f \circ \gamma : I \to \mathbb{R}$ is continuous.

Proof. We rely on the sequential characterization. Let $t \in I$ and $(t_n)_{n \in \mathbb{N}}$ a sequence in $I$ which converges to $t$. Then $\gamma(t_n)$ converges to $\gamma(t)$ in $\mathbb{R}^2$ (to check this, put together Definition 2.6 with Proposition 3.3). Thus, by the sequential characterization (Proposition 4.8), the sequence $(f(\gamma(t_n)))_{n \in \mathbb{N}}$ converges to $f(\gamma(t))$. This is enough to prove the claim.
However, it can happen that $f \circ \gamma$ is continuous for many curves $\gamma$ while $f$ is not continuous.
Example 4.11. Let's take $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x, y) = \begin{cases} 1 & \text{if } y = x^2 \text{ and } (x, y) \neq (0, 0), \\ 0 & \text{otherwise.} \end{cases}$$

This function is not continuous at $(0, 0)$. Indeed, for every $\delta > 0$ the ball centered at $(0, 0)$ and of radius $\delta$ contains points other than the origin such that $y = x^2$, thus there exists $(x, y) \in B_c((0, 0), \delta)$ such that $|f(x, y) - f(0, 0)| = |1 - 0| = 1$. Another way to see the discontinuity is to take the sequence $((1/n, 1/n^2))_{n \geq 1}$, which converges to $(0, 0)$ while $f(1/n, 1/n^2) = 1$ does not converge to $f(0, 0) = 0$.
On the other hand, if $\gamma$ is the parametrization of a straight line with $\gamma(0) = (0, 0)$, then $f \circ \gamma$ is continuous at $t = 0$. Indeed, let $(v, w) \in \mathbb{R}^2$ be a nonzero vector and $\gamma : t \in \mathbb{R} \mapsto t(v, w) = (tv, tw)$ be a parametrization of a line directed by $(v, w)$. If $v = 0$ or $w = 0$ then the image of $\gamma$ is the $Oy$ or the $Ox$ axis and $f(\gamma(t)) = 0$ for all $t \in \mathbb{R}$. If not then
$$f(\gamma(t)) = \begin{cases} 1 & \text{if } t = w/v^2, \\ 0 & \text{otherwise.} \end{cases}$$

Thus for a given $(v, w)$, the map $f \circ \gamma$ vanishes on a neighborhood of $t = 0$, hence is continuous at $t = 0$. Actually $f \circ \gamma$ is even differentiable on a neighborhood of $0$. In other words, the restriction of $f$ to a straight line passing through the origin is always continuous at the origin, but $f$ itself is discontinuous at $(0, 0)$.

Figure 4.4: Example of the discontinuous function considered in Example 4.11. The function equals 1 on the blue curve except at $(0, 0)$ and equals 0 elsewhere; its restriction to a line through the origin is continuous at $(0, 0)$.
The take-home message from this is that it's not enough to look at the behavior of $f$ along straight lines passing through the origin to understand what's going on.
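The behavior described in Example 4.11 can be observed directly. The following Python sketch is an illustration under stated assumptions: exact rational arithmetic (`fractions.Fraction`) is used so that the test $y = x^2$ is not spoiled by rounding, and the direction $(v, w) = (1, 2)$ is a hypothetical choice.

```python
from fractions import Fraction

def f(x, y):
    """Function of Example 4.11: 1 on the parabola y = x^2 minus the origin, 0 elsewhere."""
    return 1 if (y == x * x and (x, y) != (0, 0)) else 0

# Along the parabola, approaching the origin: f stays equal to 1.
on_parabola = [f(Fraction(1, n), Fraction(1, n * n)) for n in range(1, 50)]

# Along the line t -> t(v, w), approaching the origin: f is 0 for small t.
v, w = Fraction(1), Fraction(2)  # hypothetical direction
on_line = [f(v / n, w / n) for n in range(1, 50)]

print(set(on_parabola), set(on_line), f(0, 0))
```

The line only meets the parabola (away from the origin) at $t = w/v^2$, which here is $t = 2$, so the values along the line near $t = 0$ are all equal to $f(0, 0) = 0$, whereas the values along the parabola are not.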
Let's end this section with a word on linear functions. A linear function from $\mathbb{R}^2$ to $\mathbb{R}$ can be written

$$L(x, y) = ax + by$$

where $a, b$ are two real numbers corresponding to $L(1, 0)$ and $L(0, 1)$ respectively. In particular, using for instance Proposition 4.8:
Proposition 4.12. A linear function defined on $\mathbb{R}^2$ and valued in $\mathbb{R}$ is always continuous.

4.3 Complement: continuity, openness and closedness

This section is not something you could be tested on, but is rather here to give you a complement on the
notion of continuity and a more abstract characterization.
Definition 4.13 (Inverse image). If $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ and $V \subseteq \mathbb{R}$, we define $f^{-1}(V)$, the inverse image of $V$, as the subset of the points $\mathbf{x} \in D$ such that $f(\mathbf{x}) \in V$.
The concept of inverse image does not apply only to functions of two variables. The main result of this section is the following.
Proposition 4.14. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function defined on $\mathbb{R}^2$. Then $f$ is continuous over $\mathbb{R}^2$ if and only if the inverse image of every open subset of $\mathbb{R}$ is an open subset of $\mathbb{R}^2$. That is, $f$ is continuous over $\mathbb{R}^2$ if and only if for all $V \subseteq \mathbb{R}$ open, the set $f^{-1}(V)$ is open.
Proof. Implication ($\Rightarrow$). Assume $f$ is continuous and let us take $V \subseteq \mathbb{R}$ open, as well as $\mathbf{x} \in f^{-1}(V)$, that is, $f(\mathbf{x}) \in V$. As $V$ is open, there exists $\varepsilon > 0$ such that $B_c(f(\mathbf{x}), \varepsilon) \subseteq V$. By continuity of $f$ at $\mathbf{x}$, there exists $\delta > 0$ such that if $\mathbf{y} \in B_c(\mathbf{x}, \delta)$ then $f(\mathbf{y}) \in B_c(f(\mathbf{x}), \varepsilon) \subseteq V$. Thus $f^{-1}(V)$ contains $f^{-1}(B_c(f(\mathbf{x}), \varepsilon))$ which contains $B_c(\mathbf{x}, \delta)$. This is enough to conclude that $\mathbf{x}$ belongs to the interior of $f^{-1}(V)$, thus $f^{-1}(V)$ is open.
Implication ($\Leftarrow$). On the other hand assume that $f^{-1}(V)$ is open for any open $V$ and let's take $\mathbf{x} \in \mathbb{R}^2$. We fix $\varepsilon > 0$. We know that $V = B_o(f(\mathbf{x}), \varepsilon)$ is open, thus $f^{-1}(V)$ is also open. As it contains $\mathbf{x}$, we know that we can find $\delta > 0$ such that $B_o(\mathbf{x}, \delta) \subseteq f^{-1}(V)$. The latter inclusion implies that $|f(\mathbf{x}) - f(\mathbf{y})| < \varepsilon$ as soon as $\|\mathbf{y} - \mathbf{x}\| < \delta$, and it is enough to conclude to the continuity of $f$ at $\mathbf{x}$. We finish by noticing that $\mathbf{x} \in \mathbb{R}^2$ is arbitrary.

There is actually a similar characterization with the inverse image of closed sets. We first recall
without proof the following property from set theory: inverse image and complementation “commute”.

Lemma 4.15. Let $f : \mathbb{R}^2 \to \mathbb{R}$ and $V \subseteq \mathbb{R}$. Then $f^{-1}(V^c) = (f^{-1}(V))^c$.

Proposition 4.16. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function defined on $\mathbb{R}^2$. Then $f$ is continuous over $\mathbb{R}^2$ if and only if the inverse image of every closed subset of $\mathbb{R}$ is a closed subset of $\mathbb{R}^2$.

Proof. The proof is left as an exercise but can be thought of as a corollary of Proposition 4.14 and Proposition 3.21 from the previous chapter, which shows that closed sets are nothing else than the complements of open sets (and of course use Lemma 4.15 just stated above).

Remark 4.17. The direct image of an open set is in general not open, and the direct image of a closed set is in general not closed; this is already the case for functions of one variable. For instance, for the function $f : x \mapsto \cos(x)$, the direct image by $f$ of any open interval of length larger than $2\pi$ is $[-1, 1]$, which is not open (actually it is closed).

4.4 Partial derivatives

The first idea to differentiate a function of two or more variables is to differentiate it along a line to get
back to a function of one variable. This leads to the concept of partial derivative.

Definition 4.18 (Partial derivative). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$. We define the partial derivative in the $x$ direction at a point $(x_0, y_0) \in D$ as, if it exists,

$$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}.$$

Equivalently, the partial derivative in the $x$ direction at a point $(x_0, y_0)$ is the derivative at 0 of the function $h \mapsto f(x_0 + h, y_0)$.
We define in a similar way the partial derivative in the $y$ direction at a point $(x_0, y_0) \in D$ as

$$\frac{\partial f}{\partial y}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}.$$

In other words, we only look at the derivative of a function defined on $\mathbb{R}$, namely $f \circ \gamma$ where $\gamma$ is the parametrization of a line passing through $(x_0, y_0)$ with direction $(1, 0)$ or $(0, 1)$. More generally, if $\mathbf{u} \in \mathbb{R}^2$ is any unit vector (that is $\|\mathbf{u}\| = 1$), we can define

$$\frac{\partial f}{\partial \mathbf{u}}(x_0, y_0) = \lim_{h \to 0} \frac{f((x_0, y_0) + h\mathbf{u}) - f(x_0, y_0)}{h} \tag{4.1}$$

the directional derivative, that is, the partial derivative in the direction of $\mathbf{u}$. If $\mathbf{u}$ is not zero but is not a unit vector, the directional derivative corresponds to the one with respect to $\mathbf{u}/\|\mathbf{u}\|$, that is the unit vector sharing the same direction as $\mathbf{u}$. The partial derivatives defined above correspond to $\mathbf{u} = \mathbf{e}_1$ or $\mathbf{e}_2$, the two vectors of the canonical basis.
Remark 4.19. In practice, to compute a partial derivative, one "freezes" the other variable and uses the derivation rules of single variable calculus. For instance, if $f$ is defined by $f(x, y) = \ln(x^2 + y^2 + 1) + 3xy^2$ then
$$\frac{\partial f}{\partial x}(x, y) = \frac{2x}{x^2 + y^2 + 1} + 3y^2 \quad \text{and} \quad \frac{\partial f}{\partial y}(x, y) = \frac{2y}{x^2 + y^2 + 1} + 6xy. \tag{4.2}$$

Remark 4.20. The notation $(\partial f/\partial x)(x, y)$ is a bit ambiguous because the second $x$ denotes the point at which we differentiate while the first $x$ indicates the variable with respect to which we differentiate. A more proper notation could be $\partial_1 f(x, y)$ to indicate that we differentiate $f$ with respect to its first coordinate.
As an example, let's look at the function $f$ defined on $\mathbb{R}^2$ by

$$f(x, y) = xy,$$

and let's look at the two quantities

$$\frac{\partial f}{\partial x}(x, x) \quad \text{and} \quad \frac{d}{dx}[f(x, x)].$$
Specifically the first quantity is the partial derivative with respect to the first variable evaluated at $(x, x)$, while the second one is the derivative of the function $g : x \mapsto f(x, x) = x^2$ which is a function of a single variable. Doing the computations, one finds
$$\frac{\partial f}{\partial x}(x, x) = x \quad \text{and} \quad \frac{d}{dx}[f(x, x)] = 2x,$$
and these two quantities differ, at least for $x \neq 0$.
Importantly, the fact that a function has partial derivatives does not imply much: the function can even be discontinuous! Indeed, if you take the function $f$ of Example 4.11, then you can check that the partial derivatives of the function at $(0, 0)$ in the two directions $x$ and $y$ exist (the function is constant and equal to 0 on the two axes). On the other hand, this function is discontinuous! The partial derivatives do not encode enough information, because they are only about the behavior of the function on lines parallel to the axes.
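The distinction made in Remark 4.20 for $f(x, y) = xy$ can be made concrete with two difference quotients. This is a minimal sketch: the names `d1f` and `diagonal_derivative` are assumptions of the example, standing for $\partial_1 f$ and for the derivative of $x \mapsto f(x, x)$.

```python
def f(x, y):
    return x * y

def d1f(x, y, h=1e-6):
    """Partial derivative with respect to the first slot (second slot frozen)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def diagonal_derivative(x, h=1e-6):
    """Derivative of the one-variable function g(x) = f(x, x) = x^2."""
    return (f(x + h, x + h) - f(x - h, x - h)) / (2 * h)

x = 3.0
print(d1f(x, x), diagonal_derivative(x))  # close to x and to 2x respectively
```

In the first quotient only the first argument moves; in the second both arguments move together, which is why the two results differ.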

4.5 Differential
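We now turn to the concept of differentiability. We will start with a first definition which is not the most standard but is easier to grasp, and is also the one used to check on examples that a function is differentiable. At the end of the section we will give one which is more abstract, but is also the one which makes sense in more general contexts.
First, as a preliminary result, let's define what we will mean by $o(\|\mathbf{h}\|)$.
Lemma 4.21. Let $g : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be defined on a neighborhood of $0$. Then the following are equivalent:
(i) There holds $g(0) = 0$ and the function $\mathbf{h} \mapsto g(\mathbf{h})/\|\mathbf{h}\|$ converges to 0 as $\mathbf{h} \to 0$, that is

$$\forall \varepsilon > 0, \ \exists \delta > 0, \ \forall \mathbf{h} \in D \setminus \{0\}, \quad \|\mathbf{h}\| \leq \delta \ \Rightarrow \ \frac{|g(\mathbf{h})|}{\|\mathbf{h}\|} \leq \varepsilon.$$

(ii) The following holds:

$$\forall \varepsilon > 0, \ \exists \delta > 0, \ \forall \mathbf{h} \in D, \quad \|\mathbf{h}\| \leq \delta \ \Rightarrow \ |g(\mathbf{h})| \leq \varepsilon \|\mathbf{h}\|.$$

(iii) There exists a function $\omega : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ such that $\omega(0) = 0$, $\omega$ is continuous at $0$ and such that $g(\mathbf{h}) = \|\mathbf{h}\| \omega(\mathbf{h})$ on $D$.
If this holds, we say that $g$ is "small o of $\mathbf{h}$", which we write $g(\mathbf{h}) = o(\mathbf{h})$.
Proof. The equivalence between (i) and (ii) is a matter of multiplying or dividing by $\|\mathbf{h}\|$. Then the equivalence between (i) and (iii) relies on defining
$$\omega(\mathbf{h}) = \begin{cases} \dfrac{g(\mathbf{h})}{\|\mathbf{h}\|} & \text{if } \mathbf{h} \neq 0, \\ 0 & \text{if } \mathbf{h} = 0, \end{cases}$$

and then using Definition 4.7.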


Now let's consider a point $(x_0, y_0)$ and a function $f : \mathbb{R}^2 \to \mathbb{R}$ for which both partial derivatives exist at $(x_0, y_0)$. Note that we can read it as:

$$f(x_0 + h, y_0) \approx f(x_0, y_0) + h \frac{\partial f}{\partial x}(x_0, y_0), \qquad f(x_0, y_0 + k) \approx f(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0),$$
where the two relations correspond respectively to the existence of $\partial f/\partial x$ and $\partial f/\partial y$. The precise statement would involve a Taylor expansion (see (4.3) below); it is the characterization of derivatives for functions of one variable that you have seen in Mathematical Analysis – Module 1. Differentiability is the possibility to combine these two estimates together, that is, to write:

$$f(x_0 + h, y_0 + k) \approx f(x_0, y_0) + h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
More specifically, the $\approx$ will be replaced by an equality up to an $o(\mathbf{h})$ term as defined in Lemma 4.21.
Definition 4.22 (Differentiability, first definition). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. We say that $f$ is differentiable at $(x_0, y_0) \in D$ if both partial derivatives of $f$ at $(x_0, y_0)$ exist and there exists $r > 0$ such that, for all $(h, k) \in B_c(0, r)$ there holds $(x_0 + h, y_0 + k) \in D$ and
$$f(x_0 + h, y_0 + k) = f(x_0, y_0) + h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0) + o((h, k)).$$
In this case, we call differential of $f$ at $(x_0, y_0)$, and write $Df_{(x_0, y_0)}$, the linear map from $\mathbb{R}^2$ to $\mathbb{R}$ defined by: for $(h, k) \in \mathbb{R}^2$,

$$Df_{(x_0, y_0)}(h, k) = h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
The equation defining differentiability should be read:

$$f(x_0 + h, y_0 + k) = \underbrace{f(x_0, y_0)}_{\text{constant term}} + \underbrace{h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0)}_{\text{linear function of } (h, k)} + \underbrace{o((h, k))}_{\text{remainder}},$$

that is, $f$ is well described in a neighborhood of $(x_0, y_0)$ by the sum of the constant term $f(x_0, y_0)$ and a linear function (which we call the differential).
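To see what "well described by a constant plus a linear function" means in practice, one can watch the remainder divided by $\|(h, k)\|$ shrink. The sketch below is an illustration only: it uses the polynomial $f(x, y) = 3x^2 + y^2 - xy$ at the point $(1, 1)$ as an assumed test function, with partial derivatives computed by hand ($6x - y$ and $2y - x$).

```python
import math

def f(x, y):
    return 3 * x**2 + y**2 - x * y

x0, y0 = 1.0, 1.0
fx, fy = 6 * x0 - y0, 2 * y0 - x0   # partial derivatives at (1, 1): 5 and 1

def remainder_ratio(h, k):
    """|f(x0+h, y0+k) - f(x0, y0) - Df(h, k)| divided by the norm of (h, k)."""
    remainder = f(x0 + h, y0 + k) - f(x0, y0) - (h * fx + k * fy)
    return abs(remainder) / math.hypot(h, k)

# Shrink the increment along a fixed direction: the ratio should go to 0.
ratios = [remainder_ratio(0.6 * s, -0.8 * s) for s in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)
```

For this particular polynomial the remainder is exactly $3h^2 + k^2 - hk$, so the ratio decreases linearly with the size of the increment, which is the behavior required of an $o((h, k))$ term.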
At this point it may be useful to make the analogy with functions of one variable. Note that if $f : \mathbb{R} \to \mathbb{R}$ then its derivative at the point $x_0$ is defined as the limit when $h \to 0$ of
$$\frac{f(x_0 + h) - f(x_0)}{h}.$$
If $f$ is defined over $\mathbb{R}^2$, then both $x_0$ and $h$ are vectors and it is not clear what the expression above means: there is no canonical definition of $1/h$, the inverse of a vector¹. We rather look at this equivalent definition of derivative: if $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x_0 \in \mathbb{R}$ then
$$f(x_0 + h) = f(x_0) + f'(x_0) h + o(h), \tag{4.3}$$
where $o(h)$ is a quantity that goes to 0 faster than $h$ as $h \to 0$. In some sense, the only thing that we have done is "multiplying by $h$". But this definition can be extended to functions defined over $\mathbb{R}^2$. Indeed, in this case $f'(x_0) h$ is interpreted as a linear function $h \mapsto f'(x_0) h$. For a function of two variables, the term $h \mapsto f'(x_0) h$ is replaced by the linear map

$$(h, k) \mapsto h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
Let's introduce some lighter notation. We write $\mathbf{x}_0 = (x_0, y_0)$ while the vector $(h, k)$ is $\mathbf{h}$. Thus we write $Df_{\mathbf{x}_0}(\mathbf{h})$ in place of $Df_{(x_0, y_0)}(h, k)$. Actually, it is convenient to introduce a notation for the vector of partial derivatives, which is called the gradient.

¹ A notable exception is when $\mathbb{R}^2$ is identified with $\mathbb{C}$, the set of complex numbers. Then one can divide by $h$, which is a complex number. This gives rise to the field of complex analysis which we will not discuss at all.


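Definition 4.23 (Gradient). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. If $f$ is differentiable at $\mathbf{x}_0$, we call gradient of $f$ at $\mathbf{x}_0$, and we write $\nabla f(\mathbf{x}_0)$ or $\operatorname{grad} f(\mathbf{x}_0)$, the vector
$$\nabla f(\mathbf{x}_0) = \begin{pmatrix} \dfrac{\partial f}{\partial x}(\mathbf{x}_0) \\[2mm] \dfrac{\partial f}{\partial y}(\mathbf{x}_0) \end{pmatrix} \in \mathbb{R}^2.$$
In particular, $Df_{\mathbf{x}_0}(\mathbf{h}) = \nabla f(\mathbf{x}_0) \cdot \mathbf{h}$ and we can write

$$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot \mathbf{h} + o(\mathbf{h}).$$

Following Example 4.6, the gradient gives the direction in which the differential "increases the most". As the differential approximates the function, the interpretation is that $\nabla f(\mathbf{x}_0)$ gives the direction in which the function $f$ increases the most if one starts from the point $\mathbf{x}_0 \in \mathbb{R}^2$.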

Now let’s unpack some consequences from the notion of differentiability.

Proposition 4.24 (Differentiability implies continuity). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. If $f$ is differentiable at $\mathbf{x}_0$ then $f$ is continuous at $\mathbf{x}_0$.

Proof. Recall that we can write $f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h})$. The function $f$ being continuous at $\mathbf{x}_0$ is equivalent to $g : \mathbf{h} \mapsto f(\mathbf{x}_0 + \mathbf{h})$ being continuous at $0$. But $g$ is the sum of $f(\mathbf{x}_0)$ (which does not depend on $\mathbf{h}$ hence is continuous), $Df_{\mathbf{x}_0}$ (which is continuous, see Proposition 4.12) and $o(\mathbf{h})$. The latter goes to 0 as $\mathbf{h}$ tends to $0$, as seen for instance in Lemma 4.21, (ii).

Just as in one dimension the derivative gives the tangent line to the graph of $f$, the differential gives the tangent plane to the graph of $f$. First let us notice that by writing $\mathbf{h} = \mathbf{y} - \mathbf{x}_0$, one can write the Taylor expansion as
$$f(\mathbf{y}) = f(\mathbf{x}_0) + Df_{\mathbf{x}_0}(\mathbf{y} - \mathbf{x}_0) + o(\mathbf{y} - \mathbf{x}_0).$$
The graph of the principal part of the Taylor expansion corresponds to the tangent plane.

Definition 4.25 (Tangent plane). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. If $f$ is differentiable at $\mathbf{x}_0$, we call tangent plane at $(\mathbf{x}_0, f(\mathbf{x}_0))$ to the graph of $f$ the graph of the function $\mathbf{y} \mapsto f(\mathbf{x}_0) + Df_{\mathbf{x}_0}(\mathbf{y} - \mathbf{x}_0)$.
In coordinates, if $\mathbf{x}_0 = (x_0, y_0)$ then the tangent plane at $(x_0, y_0, f(x_0, y_0))$ is the plane defined as the set of $(x, y, z) \in \mathbb{R}^3$ satisfying the equation

$$\frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) + (f(x_0, y_0) - z) = 0.$$

The explicit expression for the tangent plane can be obtained thanks to Definition 4.22 (which gives the expression of the differential in terms of partial derivatives) and Example 4.6 (which gives the expression of the graph of an affine function).
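The tangent plane equation can be turned into a short computation. The sketch below assumes, for illustration, the function $f(x, y) = 3x^2 + y^2 - xy$ and the base point $(1, 1)$, where $f(1, 1) = 3$; the helper names `grad_f` and `tangent_plane` are hypothetical.

```python
def f(x, y):
    return 3 * x**2 + y**2 - x * y

def grad_f(x, y):
    """Partial derivatives of f, computed by hand."""
    return (6 * x - y, 2 * y - x)

def tangent_plane(x0, y0):
    """Return the affine map (x, y) -> f(x0, y0) + Df_(x0,y0)(x - x0, y - y0)."""
    z0 = f(x0, y0)
    gx, gy = grad_f(x0, y0)
    return lambda x, y: z0 + gx * (x - x0) + gy * (y - y0)

plane = tangent_plane(1.0, 1.0)
normal = (*grad_f(1.0, 1.0), -1.0)   # (df/dx, df/dy, -1) is normal to the plane
print(plane(1.0, 1.0), normal)
print(f(1.01, 0.99) - plane(1.01, 0.99))  # small: second order in the increment
```

The vector `normal` is read off from the plane equation exactly as in Example 4.6, with $(a, b)$ replaced by the two partial derivatives.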

We then move to a result which gives a practical way to justify differentiability. Indeed, a priori if we read Definition 4.22 one should first compute the partial derivatives, and then check that the difference between $f$ and its candidate differential is small, that is, is $o(\mathbf{h})$. As emphasized before (Example 4.11), if the partial derivatives exist at a point then the function is not necessarily differentiable. However this becomes the case if the partial derivatives exist at every point and are continuous: this is the point of the following result.

Theorem 4.26 (Existence and continuity of partial derivatives implies differentiability). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. We assume that the partial derivatives exist at every point of $D$, and that the functions $\frac{\partial f}{\partial x} : D \to \mathbb{R}$ and $\frac{\partial f}{\partial y} : D \to \mathbb{R}$ are continuous (as functions of two variables) on $D$. Then the function $f$ is differentiable at every point of $D$.
In such a case, the function is said to be of class $C^1$.


Figure 4.5: Example of the graph of the function $f(x, y) = 3x^2 + y^2 - xy$ (in red) with its tangent plane at the point $A = (1, 1, 3)$ and the normal vector to the plane $(\partial f/\partial x, \partial f/\partial y, -1)$ (in green).


Figure 4.6: Idea of the proof of Theorem 4.26. We decompose $f(x_0 + h, y_0 + k) - f(x_0, y_0)$ (in blue) as two variations (in red and green) aligned with the axes, and for each of them we use the mean value theorem for a function of one variable.


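Proof. Let $\mathbf{x}_0 = (x_0, y_0) \in D$ be fixed and take $\mathbf{h} = (h, k)$. We look at
$$\Delta(\mathbf{h}) = f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - h \frac{\partial f}{\partial x}(\mathbf{x}_0) - k \frac{\partial f}{\partial y}(\mathbf{x}_0)$$
and we want to show that the quantity $\Delta(\mathbf{h})$ is an $o(\mathbf{h})$. To that end, we will rewrite $\Delta(\mathbf{h})$ using the mean value theorem.
Specifically we write

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = f(x_0 + h, y_0 + k) - f(x_0, y_0) = [f(x_0 + h, y_0 + k) - f(x_0 + h, y_0)] + [f(x_0 + h, y_0) - f(x_0, y_0)].$$
As $D$ is open, we can find $r > 0$ such that $\mathbf{x}_0 + B_c(0, r) \subseteq D$. For any $\mathbf{h} = (h, k) \in B_c(0, r)$ with $h \neq 0$ and $k \neq 0$, as the partial functions $s \mapsto f(x_0 + s, y_0)$ and $s \mapsto f(x_0 + h, s)$ are of class $C^1$, we can use the mean value theorem and see that
$$\frac{f(x_0 + h, y_0 + k) - f(x_0 + h, y_0)}{k} = \frac{\partial f}{\partial y}(x_0 + h, y') \quad \text{and} \quad \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h} = \frac{\partial f}{\partial x}(x', y_0),$$
where $x'$ lies between $x_0$ and $x_0 + h$ and $y'$ lies between $y_0$ and $y_0 + k$. Thus plugging this into the first equality we derived,

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = h \frac{\partial f}{\partial x}(x', y_0) + k \frac{\partial f}{\partial y}(x_0 + h, y'). \tag{4.4}$$
(If $h = 0$ or $k = 0$ the corresponding bracket vanishes and (4.4) still holds with $x' = x_0$ or $y' = y_0$.) Now we want to replace the points $(x', y_0)$ and $(x_0 + h, y')$ by $(x_0, y_0)$: we will do that up to a small error thanks to the continuity of the partial derivatives.
Let $\varepsilon > 0$. By continuity of $\partial f/\partial x$ and $\partial f/\partial y$, we can find $\delta > 0$ (which we take smaller than $r$) such that, if $\|\mathbf{h}'\| \leq \delta$ then $|\partial f/\partial x(\mathbf{x}_0 + \mathbf{h}') - \partial f/\partial x(\mathbf{x}_0)| \leq \varepsilon$ and $|\partial f/\partial y(\mathbf{x}_0 + \mathbf{h}') - \partial f/\partial y(\mathbf{x}_0)| \leq \varepsilon$. So now let's fix $\mathbf{h} \in B_c(0, \delta)$. Applying the result above for $\mathbf{h}' = (h, y' - y_0)$ and $\mathbf{h}' = (x' - x_0, 0)$, both of norm at most $\|\mathbf{h}\| \leq \delta$, we discover that

$$\left| \frac{\partial f}{\partial y}(x_0 + h, y') - \frac{\partial f}{\partial y}(x_0, y_0) \right| \leq \varepsilon \quad \text{and} \quad \left| \frac{\partial f}{\partial x}(x', y_0) - \frac{\partial f}{\partial x}(x_0, y_0) \right| \leq \varepsilon.$$

Then plugging this bound into (4.4) and playing with the triangle inequality we see:

$$|\Delta(\mathbf{h})| = \left| h \left( \frac{\partial f}{\partial x}(x', y_0) - \frac{\partial f}{\partial x}(x_0, y_0) \right) + k \left( \frac{\partial f}{\partial y}(x_0 + h, y') - \frac{\partial f}{\partial y}(x_0, y_0) \right) \right| \leq |h| \varepsilon + |k| \varepsilon \leq \sqrt{2}\, \varepsilon \|\mathbf{h}\|,$$
where the last inequality is Cauchy-Schwarz (Proposition 1.9). Thus we fall in (ii) of Lemma 4.21, and it shows that $\Delta(\mathbf{h}) = o(\mathbf{h})$, which concludes the proof.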
One corollary of this result is the following stability result for functions of class $C^1$.
Proposition 4.27. Let $f, g : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be two functions of class $C^1$ defined on an open subset of $\mathbb{R}^2$. Then the functions $f + g$ and $fg$ are of class $C^1$. If $g$ does not vanish on $D$, the function $f/g$ is of class $C^1$.
Proof. It is enough to prove that the functions $f + g$, $fg$ and $f/g$ have partial derivatives and that these partial derivatives are continuous. The first property derives from calculus for functions of one variable (the sum, product, quotient of differentiable functions is differentiable), and for the second property we can use Proposition 4.9 which shows that sums, products and quotients of continuous functions over $\mathbb{R}^2$ are continuous.
Finally, we conclude this section with a second, equivalent definition of differentiability which does not involve partial derivatives: it is enough for the function to be approximated by a linear map, and then necessarily this linear map is the differential and can be written with the help of partial derivatives. In most textbooks, this is the definition that you will encounter directly, as it is more compact and more "elegant" than Definition 4.22.


Proposition 4.28 (Differentiability, second definition). Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. Then $f$ is differentiable at $\mathbf{x}_0 = (x_0, y_0) \in D$ if and only if there exist $r > 0$ and a linear map $L : \mathbb{R}^2 \to \mathbb{R}$ such that, for $\mathbf{h}$ belonging to $B_c(0, r)$ there holds $\mathbf{x}_0 + \mathbf{h} \in D$ and

$$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + L(\mathbf{h}) + o(\mathbf{h}).$$

In this case, the partial derivatives of $f$ at $(x_0, y_0)$ exist and we necessarily have $L = Df_{\mathbf{x}_0}$, that is,

$$L(h, k) = h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
Proof. The direction ($\Rightarrow$) is direct: if $f$ is differentiable according to Definition 4.22, then indeed such an $L$ exists and is nothing else than $Df_{\mathbf{x}_0}$.
On the other hand, the converse direction ($\Leftarrow$) is where there is something to do. So let's assume that there exists such an $L$. We claim that it implies that the partial derivatives at $(x_0, y_0)$ exist. Indeed, taking $\mathbf{h} = (h, 0)$ for $h$ small we find

$$f(x_0 + h, y_0) = f(x_0, y_0) + L(h, 0) + o(h) = f(x_0, y_0) + h L(1, 0) + o(h)$$

by linearity of $L$. It implies that the function $h \mapsto f(x_0 + h, y_0)$ is differentiable at 0 with derivative given by $L(1, 0)$. Thus $\partial f/\partial x(x_0, y_0)$ exists and is equal to $L(1, 0)$. Similarly, testing with $\mathbf{h} = (0, h)$, we find that $\partial f/\partial y(x_0, y_0)$ exists and is equal to $L(0, 1)$. Thus, by linearity of $L$,

$$L(h, k) = h L(1, 0) + k L(0, 1) = h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0) = Df_{\mathbf{x}_0}(\mathbf{h}).$$

Finally, we simply write $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - L(\mathbf{h}) = f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - Df_{\mathbf{x}_0}(\mathbf{h})$, and as the left hand side is $o(\mathbf{h})$ by assumption, so is the right hand side. We conclude to the differentiability of $f$.

4.6 A particular case of the chain rule

Let us conclude with a particular case of the chain rule: we will differentiate a function along a curve and draw some consequences from it. We start with the main proposition.
Proposition 4.29. Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$, and let $\gamma : I \to \mathbb{R}^2$ be defined on an interval $I$ and such that $\gamma(t) \in D$ for all $t \in I$. We assume that $\gamma$ is differentiable at $t_0 \in I$ and that $f$ is differentiable at $\gamma(t_0) \in D$. Then the function $g : t \in I \mapsto f(\gamma(t))$ is differentiable at $t_0$ and
$$g'(t_0) = Df_{\gamma(t_0)}(\gamma'(t_0)) = \nabla f(\gamma(t_0)) \cdot \gamma'(t_0).$$
Written in coordinates, this reads

$$g'(t_0) = \gamma_1'(t_0) \frac{\partial f}{\partial x}(\gamma_1(t_0), \gamma_2(t_0)) + \gamma_2'(t_0) \frac{\partial f}{\partial y}(\gamma_1(t_0), \gamma_2(t_0)).$$
Proof. Thanks to the definition of the differential, we can write for a real number $h \neq 0$ small enough,

$$\frac{g(t_0 + h) - g(t_0)}{h} = \frac{f(\gamma(t_0 + h)) - f(\gamma(t_0))}{h} = Df_{\gamma(t_0)}\left( \frac{\gamma(t_0 + h) - \gamma(t_0)}{h} \right) + \frac{1}{h}\, o(\gamma(t_0 + h) - \gamma(t_0)).$$

Next we send $h$ to 0. There holds

$$\lim_{h \to 0} Df_{\gamma(t_0)}\left( \frac{\gamma(t_0 + h) - \gamma(t_0)}{h} \right) = Df_{\gamma(t_0)}(\gamma'(t_0))$$

by definition of $\gamma'$ and continuity of $Df_{\gamma(t_0)}$. On the other hand, to handle the "small o" let us write it $o(\mathbf{h}) = \|\mathbf{h}\| \omega(\mathbf{h})$ where $\omega$ is a function which goes to 0 as $\mathbf{h}$ goes to $0$. Then
$$\frac{1}{h}\, o(\gamma(t_0 + h) - \gamma(t_0)) = \frac{\|\gamma(t_0 + h) - \gamma(t_0)\|}{h}\, \omega(\gamma(t_0 + h) - \gamma(t_0)).$$
When $h \to 0$, then $\omega(\gamma(t_0 + h) - \gamma(t_0))$ goes to 0 (as $\gamma$ is continuous at $t_0$) while $\frac{\|\gamma(t_0 + h) - \gamma(t_0)\|}{h}$ remains bounded (it tends to $\|\gamma'(t_0)\|$ if $h$ stays positive and to $-\|\gamma'(t_0)\|$ if $h$ stays negative). Thus we have the product of a term which is bounded by a term which goes to 0: the whole thing goes to 0. This concludes the proof.

Figure 4.7: Level sets of a function (in blue) with the gradient $\nabla f(\mathbf{x}_0)$ (in red) being orthogonal to the level set $\{\mathbf{x} : f(\mathbf{x}) = f(\mathbf{x}_0)\}$ at the point $\mathbf{x}_0$ (in green).
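The formula of Proposition 4.29 can be compared with a difference quotient of $g = f \circ \gamma$. The sketch below is an illustration under assumptions: the test function $f(x, y) = 3x^2 + y^2 - xy$, the curve $\gamma(t) = (\cos t, \sin t)$ and the point $t_0 = 0.4$ are all hypothetical choices made for this example.

```python
import math

def f(x, y):
    return 3 * x**2 + y**2 - x * y

def grad_f(x, y):
    """Partial derivatives of f, computed by hand."""
    return (6 * x - y, 2 * y - x)

def gamma(t):
    return (math.cos(t), math.sin(t))

def gamma_prime(t):
    return (-math.sin(t), math.cos(t))

def g(t):
    return f(*gamma(t))

t0, h = 0.4, 1e-6
numeric = (g(t0 + h) - g(t0 - h)) / (2 * h)   # difference quotient of g at t0
gx, gy = grad_f(*gamma(t0))
vx, vy = gamma_prime(t0)
chain = gx * vx + gy * vy                     # grad f(gamma(t0)) . gamma'(t0)
print(numeric, chain)
```

The two printed numbers agree up to the accuracy of the difference quotient.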
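A first consequence is the following. If a function is differentiable, then all the directional derivatives can be expressed in terms of the differential, that is, of the two partial derivatives.
Proposition 4.30. Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$ which is differentiable at a point $\mathbf{x}_0 \in D$. If $\mathbf{u} \in \mathbb{R}^2$ is a unit vector (that is $\|\mathbf{u}\| = 1$), then $f$ is differentiable at $\mathbf{x}_0$ in the direction $\mathbf{u}$ and
$$\frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_0) = Df_{\mathbf{x}_0}(\mathbf{u}) = \nabla f(\mathbf{x}_0) \cdot \mathbf{u},$$

where we recall that the directional derivative is defined in (4.1). In coordinates, if $\mathbf{u} = (u_1, u_2) \in \mathbb{R}^2$,
$$\frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_0) = u_1 \frac{\partial f}{\partial x}(\mathbf{x}_0) + u_2 \frac{\partial f}{\partial y}(\mathbf{x}_0).$$
Proof. As seen in (4.1), denoting $\gamma : t \mapsto \mathbf{x}_0 + t\mathbf{u}$, then $\partial f/\partial \mathbf{u}(\mathbf{x}_0)$ is nothing else than the derivative of the function $f \circ \gamma$ at $t = 0$. Thus we directly apply Proposition 4.29.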
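Proposition 4.30 says that a directional derivative is a dot product with the gradient. A minimal numerical check, assuming the same illustrative function $f(x, y) = 3x^2 + y^2 - xy$, the point $(1, 1)$ and an arbitrarily chosen unit vector (all hypothetical choices), could look as follows.

```python
import math

def f(x, y):
    return 3 * x**2 + y**2 - x * y

def grad_f(x, y):
    """Partial derivatives of f, computed by hand."""
    return (6 * x - y, 2 * y - x)

def directional(x0, y0, u, h=1e-6):
    """Central difference approximation of the limit defining the directional derivative."""
    forward = f(x0 + h * u[0], y0 + h * u[1])
    backward = f(x0 - h * u[0], y0 - h * u[1])
    return (forward - backward) / (2 * h)

x0, y0 = 1.0, 1.0
u = (math.cos(1.0), math.sin(1.0))   # an arbitrary unit vector
gx, gy = grad_f(x0, y0)
numeric = directional(x0, y0, u)
formula = gx * u[0] + gy * u[1]      # grad f(x0) . u
print(numeric, formula)
```

Only two partial derivatives are needed to recover the derivative in every direction, which is the content of the proposition.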
Another consequence of Proposition 4.29 is to interpret the gradient as the vector to which level sets
are orthogonal. One way to phrase a rigorous statement is the following.
Proposition 4.31. Let f : D R2 Ñ R be a function of two variables of class C 1 defined on an open
set D. Let γ : I R Ñ R2 which takes its value in a fixed level set of f . Then ∇f pγ ptqq γ 1 ptq 0 for
all t P I
Proof. Immediate from Proposition 4.29 once we notice that f γ is a constant function (this is equivalent
to say that γ is included in a level set).
The way to read $\nabla f(\gamma(t)) \cdot \gamma'(t) = 0$ is as follows: $\gamma'(t)$ is the direction of the tangent to the curve $\gamma$, which is included in the level set; and this direction is orthogonal to $\nabla f(\gamma(t))$, the gradient of the function. So for instance if the level set can be written as the image of a $C^1$ curve, then the tangent to the curve is orthogonal to the gradient. This is actually not that surprising: as mentioned in Example 4.6, for an affine function $(x, y) \mapsto ax + by + c$, the vector $(a, b)$ is orthogonal to level sets. In the case of a general function, the vector $(a, b)$ of the linear approximation around a point $x_0$ is nothing else than the gradient $\nabla f(x_0)$.
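As a quick sanity check of Proposition 4.31, one can take an explicit level set. In the sketch below, the function $f(x, y) = x^2 + 4y^2$ and the curve $\gamma(t) = (2\cos t, \sin t)$, which parametrizes the level set $\{f = 4\}$, are assumptions chosen for illustration.

```python
import math

def f(x, y):
    # Illustrative function: its level set {f = 4} is an ellipse
    return x**2 + 4 * y**2

def grad_f(x, y):
    # Gradient computed by hand
    return (2 * x, 8 * y)

def gamma(t):
    # Curve contained in the level set {f = 4}
    return (2 * math.cos(t), math.sin(t))

def gamma_prime(t):
    # Derivative of the curve, computed by hand
    return (-2 * math.sin(t), math.cos(t))

dots = []
levels = []
for i in range(8):
    t = i * math.pi / 4
    x, y = gamma(t)
    gx, gy = grad_f(x, y)
    vx, vy = gamma_prime(t)
    dots.append(gx * vx + gy * vy)  # should vanish
    levels.append(f(x, y))          # should stay equal to 4

max_abs_dot = max(abs(d) for d in dots)
print(max_abs_dot, levels)
```

The dot products vanish (up to rounding) at every sampled time, while the value of `f` along the curve stays equal to 4.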

Chapter 5

Function from $\mathbb{R}^d$ to $\mathbb{R}^p$: vector fields and change of coordinates

In this chapter, we study the more general case of functions defined on a subset of $\mathbb{R}^d$ and valued in $\mathbb{R}^p$. This encompasses Chapter 2 (with $d = 1$) and Chapter 4 (with $d = 2$ and $p = 1$).
The question of continuity is very similar to Chapter 4 and we will not do the proofs in detail. Differentiability is a little bit more involved but the general idea is the same: the differential $Df_{x_0}$ of a function $f : \mathbb{R}^d \to \mathbb{R}^p$ at a point $x_0$ is a linear function $\mathbb{R}^d \to \mathbb{R}^p$ such that $f(x_0 + h) - f(x_0)$ is approximated by $Df_{x_0}(h)$. The difference is that the differential is now represented not by the gradient vector but by a matrix, called the Jacobian matrix. The main theoretical result of this chapter is the chain rule: abstractly the result is very neat, as the differential of the composition is the composition of the differentials.

5.1 Definition
Definition 5.1. A function $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ is the data of $D \subset \mathbb{R}^d$, which is the domain of definition, and of $f$, which associates to every point $x$ of $D$ a vector $f(x) \in \mathbb{R}^p$.

In coordinates, one can write for $x = (x_i)_{1 \le i \le d}$ that
$$f(x) = f(x_1, x_2, \ldots, x_d) = \begin{pmatrix} f_1(x_1, x_2, \ldots, x_d) \\ f_2(x_1, x_2, \ldots, x_d) \\ \vdots \\ f_p(x_1, x_2, \ldots, x_d) \end{pmatrix} \in \mathbb{R}^p.$$

That is, $f$ is given by $(f_j)_{1 \le j \le p}$, a collection of $p$ functions all defined on the same domain $D$ and valued in $\mathbb{R}$. Similarly to the case of parametric curves, to plot the function or understand what's going on, thinking of $f$ as a collection of real-valued functions doesn't tell the whole story.
Representing such a function is not easy: the representation that one chooses depends on the context and the message that one wants to convey.

Definition 5.2 (Graph of a function). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a vector valued function of $d$ variables. Its graph is the subset of $\mathbb{R}^{d+p}$ made of the points $(x, y)$ such that $x \in D \subset \mathbb{R}^d$, $y \in \mathbb{R}^p$ and $y = f(x)$.

Definition 5.3 (Image of a function). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a vector valued function of $d$ variables. Its image is the subset of $\mathbb{R}^p$ made of the points $y$ such that $y = f(x)$ for some $x \in D$.

However one cannot represent the graph as soon as $d + p > 3$. When $d = p = 2$, there are possible workarounds.

• If $f(x) \in \mathbb{R}^2$ is thought of as a vector, we can plot the domain $D$ and at some points $x \in D$ plot arrows corresponding to $f(x) \in \mathbb{R}^2$. Then $f$ is called a vector field, see Figure 5.1.

Bocconi University – course 30543 (Mathematical Analysis module 2)

Figure 5.1: Representation of the function $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f(x, y) = (x^2 - y^2 - 4, \, 2xy)$, as a vector field. One attaches the vector $f(x, y)$ to the point $(x, y)$ for different values of $(x, y) \in \mathbb{R}^2$.

• If $f(x) \in \mathbb{R}^2$ is rather thought of as a point, one can grid the domain $D$ and show what it is mapped to. This makes more sense when $f$ is thought of as a change of coordinates, see Figure 5.2.
• If $x$ belongs to a subset of $\mathbb{R}^2$ and $f(x) \in \mathbb{R}^3$, instead of a parametric curve (which corresponds to the case $x \in \mathbb{R}$), we have a parametric surface and one can represent the image of $f$, that is, the subset of $\mathbb{R}^3$ defined as $\{y \in \mathbb{R}^3 : \exists x \in \mathbb{R}^2, \ f(x) = y\}$. See Figure 5.3 for such an example.
Similarly to the case of real-valued functions, by composition with a curve, one can go back to a parametric curve $\mathbb{R} \to \mathbb{R}^p$. Indeed, if $\gamma : I \to \mathbb{R}^d$ where $I$ is an interval of $\mathbb{R}$ and $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$, then provided that $\gamma(t) \in D$ for all $t \in I$ one can define $f \circ \gamma : t \in I \mapsto f(\gamma(t)) \in \mathbb{R}^p$. That is, $f \circ \gamma$ is a parametric curve.

5.2 Continuous function

This section will not contain many proofs because we basically already did them in Chapter 4. Most of the time the only thing required is to take the proofs of the previous chapter and replace absolute values by norms.
Definition 5.4 (Continuous function). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function. We say that $f$ is continuous at the point $x \in D$ if
$$\forall \varepsilon > 0, \ \exists \delta > 0, \ \forall y \in D, \quad \|y - x\| \le \delta \Rightarrow \|f(x) - f(y)\| \le \varepsilon.$$
The function $f$ is continuous over $D$ if it is continuous at every point of $D$.
Note that in the expression above $\|y - x\|$ and $\|f(x) - f(y)\|$ are Euclidean norms, but in different Euclidean spaces ($\mathbb{R}^d$ for the former and $\mathbb{R}^p$ for the latter). The sequential characterization works in the same way and the proof is identical to the one in Chapter 4.
Proposition 5.5 (Sequential characterization of continuity). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function. The function $f$ is continuous at a point $x \in D$ if and only if, for every sequence $(y_n)_{n \in \mathbb{N}}$ which converges to $x$ and such that $y_n \in D$ for all $n \in \mathbb{N}$, the sequence $(f(y_n))_{n \in \mathbb{N}}$ converges to $f(x)$.
Note that in the statement above, “$(y_n)_{n \in \mathbb{N}}$ converges to $x$” holds in the domain $D \subset \mathbb{R}^d$ while “$(f(y_n))_{n \in \mathbb{N}}$ converges to $f(x)$” holds in the codomain $\mathbb{R}^p$.
Thanks to this sequential characterization we draw some consequences.


Figure 5.2: Representation of a function $f : [0, \infty) \times [0, \infty) \to \mathbb{R}^2$ as a “change of coordinates”. Here the function is $f(r, \theta) = (r \cos(\theta), r \sin(\theta))$ and corresponds to polar coordinates. One shows how the domain is “distorted” into the codomain. Specifically, one plots a grid on the domain (with $r$ on the horizontal axis and $\theta$ on the vertical axis) and plots the image of this grid on the codomain. The colors are chosen in such a way that they are preserved by $f$, e.g. a subset which is in blue in the domain is mapped to something in blue in the codomain.

Figure 5.3: Image of a function $f : \mathbb{R}^2 \to \mathbb{R}^3$, here a torus. The function $f$ is $f(u, v) = ((R + r \cos(u)) \cos(v), \ (R + r \cos(u)) \sin(v), \ r \sin(u))$ where $(r, R) = (1, 3)$ are the inner and outer radius of the torus. One plots only the points $(x, y, z)$ which can be written $(x, y, z) = f(u, v)$ for some $(u, v)$.


Proposition 5.6 (A composition of continuous functions is continuous). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ and $g : U \subset \mathbb{R}^p \to \mathbb{R}^q$ be two functions. We assume that $f(x) \in U$ for all $x \in D$, so that $g \circ f : D \to \mathbb{R}^q$ is well-defined. If $f$ is continuous at $x$ and $g$ is continuous at $f(x)$, then $g \circ f$ is continuous at $x$.
In particular, if $f$ is continuous over $D$ and $g$ is continuous over $U$, then $g \circ f$ is continuous over $D$.
Proof. The proof is left as an exercise and relies on the sequential characterization, but it is basically the
same as for Proposition 4.10 from the previous chapter.
We also have the following consequence: one can study continuity independently for each coordinate
function.
Proposition 5.7. Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function and write $f = (f_j)_{1 \le j \le p}$ for the coordinate functions, that is, each $f_j$ is defined on $D$ and valued in $\mathbb{R}$. Then the function is continuous at $x$ (respectively continuous over $D$) if and only if each of the $f_j$ for $j \in \{1, 2, \ldots, p\}$ is continuous at $x$ (respectively continuous over $D$).
Proof. The idea is to use the sequential characterization: the function $f$ is continuous at $x$ if and only if for every sequence $(y_n)_{n \in \mathbb{N}}$ in $D$,
$$(y_n)_{n \in \mathbb{N}} \text{ converges to } x \ \Rightarrow \ (f(y_n))_{n \in \mathbb{N}} \text{ converges to } f(x).$$
Then, using Proposition 3.3 from Chapter 3, we can rewrite the right hand side of the implication and obtain
$$(y_n)_{n \in \mathbb{N}} \text{ converges to } x \ \Rightarrow \ \forall j \in \{1, 2, \ldots, p\}, \ (f_j(y_n))_{n \in \mathbb{N}} \text{ converges to } f_j(x).$$
We can take the $\forall j$ out of the implication, and then switch the order of the two quantifiers (the one on $j$ and the one on the sequence). This reads as follows. The function $f$ is continuous at $x$ if and only if
$$\forall j \in \{1, 2, \ldots, p\}, \ \text{for every sequence } (y_n)_{n \in \mathbb{N}} \text{ in } D, \quad \left[ (y_n)_{n \in \mathbb{N}} \text{ converges to } x \ \Rightarrow \ (f_j(y_n))_{n \in \mathbb{N}} \text{ converges to } f_j(x) \right].$$
But then, for each $j$, the latter statement is exactly the sequential characterization of continuity of the real-valued function $f_j$ that we have seen in the previous chapter: see Proposition 4.8. This is enough to conclude the proof.
So the study of continuity for a vector valued function is nothing else than the study of the continuity of its coordinate functions. In particular, if the dimension $d$ of the domain is larger than or equal to 2, then all the weird examples of discontinuous functions (see for instance Example 4.11) can be reproduced. Note also that, for simplicity, we adopted the characterization of continuity we just proved as a definition in the case of curves (see Definition 2.6).
Remark 5.8. The same characterization as the one in Section 4.3 works. That is, a function $f : \mathbb{R}^d \to \mathbb{R}^p$ is continuous if and only if the inverse image of any open set is an open set, and if and only if the inverse image of any closed set is a closed set.

5.3 Differentiability
For the case of real-valued functions, we have seen first the definition of partial derivatives and then the differential. We will follow the same path, without doing the proofs as they can be adapted (for instance by working coordinate by coordinate in the codomain $\mathbb{R}^p$). The difference is that partial derivatives become vectors and the differential is represented by a matrix called the Jacobian matrix.
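Definition 5.9 (Partial derivative). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function and write $f = (f_j)_{1 \le j \le p}$ for the coordinate functions, that is, each $f_j$ is defined on $D$ and valued in $\mathbb{R}$. Let $i \in \{1, 2, \ldots, d\}$. If all the $f_j$ have a partial derivative at $x_0 \in D$ in the direction $x_i$, we define the partial derivative of $f$ in the $x_i$ direction at the point $x_0$, and write it $\frac{\partial f}{\partial x_i}(x_0)$, as the vector
$$\frac{\partial f}{\partial x_i}(x_0) = \left( \frac{\partial f_j}{\partial x_i}(x_0) \right)_{1 \le j \le p} \in \mathbb{R}^p.$$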

No surprise, taking partial derivatives is equivalent to taking partial derivatives coordinate per coordinate
of the codomain. Let us introduce directly the Jacobian matrix, which is the neat algebraic structure to
store derivatives.


Definition 5.10 (Jacobian matrix). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function of $d$ variables defined on an open set $D$. We write $f = (f_j)_{1 \le j \le p}$ for the coordinate functions, that is, each $f_j$ is defined on $D$ and valued in $\mathbb{R}$. If all partial derivatives of $f$ at a point $x_0$ exist, we define $J_f(x_0)$ the Jacobian matrix of $f$ at $x_0$: it is the matrix with $p$ rows and $d$ columns defined by
$$\forall i \in \{1, 2, \ldots, d\}, \ \forall j \in \{1, 2, \ldots, p\}, \quad [J_f(x_0)]_{ji} = \frac{\partial f_j}{\partial x_i}(x_0). \tag{5.1}$$

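For instance, for $f : \mathbb{R}^2 \to \mathbb{R}^2$ the Jacobian matrix is given by
$$\begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} \\[2mm] \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} \end{pmatrix}.$$
That is, row indices correspond to the codomain while column indices are for the domain. In fact, the columns of the Jacobian matrix are the vectors $\partial f/\partial x_i(x_0)$.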
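To make the row/column convention concrete, here is a minimal sketch which approximates the Jacobian matrix column by column with central finite differences. The helper name `jacobian` and the step `eps` are assumptions made for this illustration; the function is $f(x, y) = (x^2 - y^2 - 4, 2xy)$, the vector field of Figure 5.1, whose exact Jacobian is $\begin{pmatrix} 2x & -2y \\ 2y & 2x \end{pmatrix}$.

```python
def f(x):
    # f(x, y) = (x^2 - y^2 - 4, 2xy), from R^2 to R^2
    return [x[0] ** 2 - x[1] ** 2 - 4.0, 2.0 * x[0] * x[1]]

def jacobian(func, x, eps=1e-6):
    # Hypothetical helper: finite-difference Jacobian with p rows and d columns.
    # Column i approximates the vector of partial derivatives in direction x_i.
    d = len(x)
    p = len(func(x))
    J = [[0.0] * d for _ in range(p)]
    for i in range(d):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        fp, fm = func(xp), func(xm)
        for j in range(p):
            J[j][i] = (fp[j] - fm[j]) / (2 * eps)
    return J

J = jacobian(f, [1.0, 2.0])
for row in J:
    print(row)
```

At the point $(1, 2)$ the rows should be close to $(2, -4)$ and $(4, 2)$.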
We move to the definition of differentiability, but let's discuss it before giving the formal definition. In the previous chapter, we defined it through the Taylor expansion:
$$f(x_0 + h) = f(x_0) + Df_{x_0}(h) + o(h),$$
where $Df_{x_0}$ was the differential: it is a linear function which reads (for a function from $\mathbb{R}^2$ to $\mathbb{R}$)
$$Df_{x_0}(h, k) = h \frac{\partial f}{\partial x}(x_0) + k \frac{\partial f}{\partial y}(x_0).$$
The natural extension is to have now
$$Df_{x_0}(h) = h_1 \frac{\partial f}{\partial x_1}(x_0) + h_2 \frac{\partial f}{\partial x_2}(x_0) + \ldots + h_d \frac{\partial f}{\partial x_d}(x_0).$$
Now notice that $Df_{x_0}(h)$ belongs to $\mathbb{R}^p$ (as each partial derivative $\frac{\partial f}{\partial x_i}$ is a vector in $\mathbb{R}^p$), and that we sum $d$ terms instead of 2. Actually, the differential becomes in this case a linear map from $\mathbb{R}^d$ to $\mathbb{R}^p$. Moreover, this can be written in a more compact form
$$h_1 \frac{\partial f}{\partial x_1}(x_0) + h_2 \frac{\partial f}{\partial x_2}(x_0) + \ldots + h_d \frac{\partial f}{\partial x_d}(x_0) = J_f(x_0) h, \tag{5.2}$$
as the matrix vector product between the Jacobian matrix $J_f(x_0)$ and the vector $h$.

Definition 5.11 (Differentiability for functions of several variables). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function of $d$ variables defined on an open set $D$. We say that $f$ is differentiable at $x_0 \in D$ if there exists $r > 0$ such that for all $h \in B_c(0, r)$ there holds $x_0 + h \in D$ and
$$f(x_0 + h) = f(x_0) + J_f(x_0) h + o(h),$$
where $o(h)$ means that each of the coordinate functions is a $o(h)$ in the sense of Lemma 4.21.
In this case, we call differential of $f$ at $x_0$, and write $Df_{x_0}$, the linear map $\mathbb{R}^d \to \mathbb{R}^p$ represented by the matrix $J_f(x_0)$, that is, defined by $h \mapsto J_f(x_0) h$.

Given (5.2), this definition coincides with Definition 4.22 in the case $(d, p) = (2, 1)$. To make the parallel more explicit, let us directly state the analogue of Proposition 4.28.

Proposition 5.12 (Differentiability, second definition). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function of $d$ variables defined on an open set $D$. Then $f$ is differentiable at $x_0 \in D$ if and only if there exist $r > 0$ and a linear map $L : \mathbb{R}^d \to \mathbb{R}^p$ such that, for $h$ belonging to $B_c(0, r)$, there holds $x_0 + h \in D$ and
$$f(x_0 + h) = f(x_0) + L(h) + o(h).$$
In this case, we necessarily have $L(h) = Df_{x_0}(h) = J_f(x_0) h$.


Figure 5.4: Tangent plane to the image of a function $f : \mathbb{R}^2 \to \mathbb{R}^3$ as discussed in Example 5.15. The image of the function, that is the surface $f(\mathbb{R}^2)$, is in green. At the point $(u_0, v_0)$, the two lines parallel to the $u$ and $v$ axes (in red and blue) are mapped by $f$ to two curves (also in red and blue) included in the surface and whose tangents at $f(u_0, v_0)$ are the lines directed by $\partial f/\partial u(u_0, v_0)$ and $\partial f/\partial v(u_0, v_0)$. The tangent plane (in purple) contains these two vectors, hence is normal to $\partial f/\partial u(u_0, v_0) \times \partial f/\partial v(u_0, v_0)$.

Proof. Again, the direction $(\Rightarrow)$ is straightforward.

For $(\Leftarrow)$, as $L$ is linear we know that it can be written $L(h) = Mh$, where $M$ is nothing else than the matrix of the map $L$ in the canonical basis. From linear algebra, we know that the $i$-th column of $M$ is $L(e_i)$, where $e_i$ is the $i$-th vector of the canonical basis. Using the expansion satisfied by $L$ with the increment $t e_i$, for $t \in \mathbb{R}$ small, we can write
$$f(x_0 + t e_i) = f(x_0) + L(t e_i) + o(t e_i) = f(x_0) + t L(e_i) + o(t),$$
in such a way that $L(e_i)$ is the derivative at $0$ of $t \mapsto f(x_0 + t e_i)$. The latter expression exactly coincides by definition with the vector $\partial f/\partial x_i(x_0)$. Thus, $M = J_f(x_0)$. Once we arrive at this point, the last step is identical to the case of the previous chapter: $f(x_0 + h) - f(x_0) - J_f(x_0) h = f(x_0 + h) - f(x_0) - L(h)$, and the right hand side is $o(h)$ by assumption, so the left hand side is too.
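The meaning of the remainder $o(h)$ can be observed numerically: the ratio $\|f(x_0 + h) - f(x_0) - J_f(x_0) h\| / \|h\|$ should go to $0$ with $\|h\|$. The sketch below uses the illustrative function $f(x, y) = (x^2 - y^2 - 4, 2xy)$ (the field of Figure 5.1) and increments $h$ of decreasing norm along a fixed direction; the point and the direction are assumptions chosen for the example.

```python
import math

def f(x, y):
    # Illustrative function from R^2 to R^2
    return (x * x - y * y - 4.0, 2.0 * x * y)

def jacobian_times(x, y, h1, h2):
    # J_f(x, y) = [[2x, -2y], [2y, 2x]] (computed by hand), applied to h = (h1, h2)
    return (2 * x * h1 - 2 * y * h2, 2 * y * h1 + 2 * x * h2)

x0, y0 = 1.0, 2.0
scales = [1e-1, 1e-2, 1e-3]
ratios = []
for s in scales:
    h1, h2 = 0.6 * s, 0.8 * s  # increment of Euclidean norm s
    fx = f(x0 + h1, y0 + h2)
    f0 = f(x0, y0)
    lin = jacobian_times(x0, y0, h1, h2)
    rem = (fx[0] - f0[0] - lin[0], fx[1] - f0[1] - lin[1])
    ratios.append(math.hypot(rem[0], rem[1]) / math.hypot(h1, h2))

print(ratios)
```

For this particular quadratic function the remainder is $(h_1^2 - h_2^2, 2 h_1 h_2)$, whose norm is $\|h\|^2$, so the printed ratios should be close to the norms `scales` themselves.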

In this abstract setting this characterization of the differential is word for word the one for functions $\mathbb{R}^2 \to \mathbb{R}$: that's where there is a reward for abstraction! On the other hand, for practical computations the present case is more involved because one needs to store a matrix rather than just the gradient vector. For the record, let's state a proposition analogous to one of the previous chapter (without proof, the proof being almost identical to the one of Proposition 4.24).

Proposition 5.13. Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function defined on an open set $D$. If $f$ is differentiable at $x_0$ then $f$ is continuous at $x_0$.

Remark 5.14 (Link between differential and derivative). Let $f = (f_1, \ldots, f_p) : I \subset \mathbb{R} \to \mathbb{R}^p$ be a parametric curve, that is, a function of one variable valued in $\mathbb{R}^p$. If all the functions $f_j$ are differentiable, we can define the derivative at a point $t$ as $f'(t) = (f_1'(t), \ldots, f_p'(t)) \in \mathbb{R}^p$, see Definition 2.7. On the other hand, the differential at the same point $t \in I$ is the linear map $\mathbb{R} \to \mathbb{R}^p$ defined by
$$h \in \mathbb{R} \mapsto Df_t(h) = h f'(t) = \begin{pmatrix} h f_1'(t) \\ \vdots \\ h f_p'(t) \end{pmatrix} \in \mathbb{R}^p.$$
So the derivative is a vector while the differential is a linear map $\mathbb{R} \to \mathbb{R}^p$. One can recover the derivative from the differential by $f'(t) = Df_t(1)$, that is, the differential applied to the “vector” $1 \in \mathbb{R}$.


Example 5.15 (Tangent to a surface). Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ be a “parametric surface” and write $(u, v)$ for the variables in the domain. We look at the image of $f$, which we call $S$. It is the subset of $\mathbb{R}^3$ defined as $S = \{(x, y, z) \in \mathbb{R}^3 : \exists (u, v) \in \mathbb{R}^2, \ f(u, v) = (x, y, z)\}$. An example was already given in Figure 5.3.
Let $(u_0, v_0) \in \mathbb{R}^2$ and assume that $f$ is differentiable at $(u_0, v_0)$. The tangent plane to the surface is the image of the principal part of the Taylor expansion, that is, of $h \mapsto f(u_0, v_0) + Df_{(u_0, v_0)}(h)$. More explicitly, the tangent plane to the surface is the image of the map
$$(h, k) \in \mathbb{R}^2 \mapsto f(u_0, v_0) + h \frac{\partial f}{\partial u}(u_0, v_0) + k \frac{\partial f}{\partial v}(u_0, v_0) \in \mathbb{R}^3,$$
where here $\partial f/\partial u(u_0, v_0)$ and $\partial f/\partial v(u_0, v_0)$ are vectors in $\mathbb{R}^3$. Generically, it is the parametric equation of a plane in $\mathbb{R}^3$. It is a plane which contains the point $f(u_0, v_0)$ and whose direction contains the vectors $\partial f/\partial u(u_0, v_0)$ and $\partial f/\partial v(u_0, v_0)$, thus a normal direction is given by the cross product $\partial f/\partial u(u_0, v_0) \times \partial f/\partial v(u_0, v_0)$. In summary, a point $\mathbf{x} = (x, y, z)$ belongs to the tangent plane at $(u_0, v_0)$ if and only if
$$(\mathbf{x} - f(u_0, v_0)) \cdot \left( \frac{\partial f}{\partial u}(u_0, v_0) \times \frac{\partial f}{\partial v}(u_0, v_0) \right) = 0.$$
We refer to Figure 5.4 for an illustration. This reasoning actually works when the cross product $\partial f/\partial u(u_0, v_0) \times \partial f/\partial v(u_0, v_0)$ is non zero (which is equivalent to the rank of $Df_{(u_0, v_0)}$ being equal to 2); such points are called regular points. In the case where $\partial f/\partial u(u_0, v_0) \times \partial f/\partial v(u_0, v_0) = 0$ the image of the function $f$ can have cusps and singular points. There are more types of singular points for surfaces than for curves, however this is out of the scope of these notes.
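The construction of Example 5.15 can be tried on the torus of Figure 5.3. The sketch below assumes the parametrization $f(u, v) = ((R + r\cos u)\cos v, (R + r\cos u)\sin v, r\sin u)$ with $(r, R) = (1, 3)$; the partial derivatives are computed by hand, and the point $(u_0, v_0)$ and the step `delta` are arbitrary choices made for the illustration.

```python
import math

r, R = 1.0, 3.0  # radii of the torus (as in Figure 5.3)

def f(u, v):
    return ((R + r * math.cos(u)) * math.cos(v),
            (R + r * math.cos(u)) * math.sin(v),
            r * math.sin(u))

def df_du(u, v):
    # Partial derivative in u, computed by hand
    return (-r * math.sin(u) * math.cos(v),
            -r * math.sin(u) * math.sin(v),
            r * math.cos(u))

def df_dv(u, v):
    # Partial derivative in v, computed by hand
    return (-(R + r * math.cos(u)) * math.sin(v),
            (R + r * math.cos(u)) * math.cos(v),
            0.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

u0, v0 = 0.5, 1.2
p0 = f(u0, v0)
n = cross(df_du(u0, v0), df_dv(u0, v0))  # normal to the tangent plane

def plane_offset(point):
    # Left-hand side of the tangent plane equation (zero on the plane)
    return dot([pi - qi for pi, qi in zip(point, p0)], n)

delta = 1e-3
offset = plane_offset(f(u0 + delta, v0 + delta))
print(n, offset)
```

The normal `n` is orthogonal to both partial derivatives, and a point of the surface at parameter distance `delta` from $(u_0, v_0)$ leaves the tangent plane only at order `delta**2`.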
The same counterexample (Example 4.11) holds: even if partial derivatives exist at a point, the
function is not necessarily continuous nor differentiable. But we can prove differentiability if partial
derivatives exist and are continuous over D. We will not prove this theorem, its proof would consist in
a tedious adaptation of Theorem 4.26 with more cumbersome notations as the dimension of the domain
and codomain have increased. But the underlying idea is the same.
Theorem 5.16. Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function of $d$ variables defined on an open set $D$. We write $f = (f_j)_{1 \le j \le p}$ for the coordinate functions, that is, each $f_j$ is defined on $D$ and valued in $\mathbb{R}$. We assume that the partial derivatives exist at every point of $D$ and that the functions $\partial f_j/\partial x_i$ are continuous on $D$ for all $i \in \{1, 2, \ldots, d\}$ and $j \in \{1, 2, \ldots, p\}$. Then the function $f$ is differentiable at every point of $D$.
In such a case, the function is said to be of class $C^1$.

5.4 The chain rule

We conclude this chapter with the chain rule about the differential of the composition. The abstract
statement is very clean, getting the formulas right on concrete examples can be more tedious.
Theorem 5.17 (Chain rule). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function of $d$ variables defined on an open set $D$. Let $g : U \subset \mathbb{R}^p \to \mathbb{R}^q$ be a function of $p$ variables defined on an open set $U$. We assume that $f(x) \in U$ for all $x \in D$ in such a way that the function $g \circ f : x \in D \mapsto g(f(x)) \in \mathbb{R}^q$ is well defined.
If $f$ is differentiable at $x_0 \in D$ and $g$ is differentiable at $f(x_0) \in U$ then $g \circ f$ is differentiable at $x_0$ and, for $h \in \mathbb{R}^d$,
$$D(g \circ f)_{x_0}(h) = \left( Dg_{f(x_0)} \circ Df_{x_0} \right)(h) = Dg_{f(x_0)}\left( Df_{x_0}(h) \right).$$
In particular, if $J_f(x_0)$ is the Jacobian matrix of $f$ at $x_0$ and $J_g(f(x_0))$ is the Jacobian matrix of $g$ at $f(x_0)$ then the Jacobian matrix of $g \circ f$ at $x_0$ is the matrix product $J_g(f(x_0)) J_f(x_0)$.
Proof. We simply compose the Taylor expansions of the functions. Indeed, for $k \in \mathbb{R}^p$ one can write the following expansion for $g$:
$$g(f(x_0) + k) = g(f(x_0)) + Dg_{f(x_0)}(k) + o(k).$$
We apply it for $k = f(x_0 + h) - f(x_0) = Df_{x_0}(h) + o(h)$. It yields
$$g(f(x_0 + h)) = g(f(x_0)) + Dg_{f(x_0)}(Df_{x_0}(h)) + Dg_{f(x_0)}(o(h)) + o\left( Df_{x_0}(h) + o(h) \right).$$


We now have to understand what's happening to the small $o$. For the first one, $Dg_{f(x_0)}(o(h))$, we write, for a function $\omega : \mathbb{R}^d \to \mathbb{R}^p$ (or more precisely defined only on a neighborhood of $0$) which goes to $0$ as $h \to 0$,
$$Dg_{f(x_0)}(o(h)) = Dg_{f(x_0)}(\|h\| \omega(h)) = \|h\| \, Dg_{f(x_0)}(\omega(h))$$
by linearity of $Dg_{f(x_0)}$. By continuity of $Dg_{f(x_0)}$, the quantity $Dg_{f(x_0)}(\omega(h))$ goes to $0$ when $h \to 0$, and that allows us to conclude that $Dg_{f(x_0)}(o(h))$ is a $o(h)$.
On the other hand, for the second one, $o(Df_{x_0}(h) + o(h))$, we rather show (ii) of Lemma 4.21. First, note that there exists a constant $C$ such that $\|Df_{x_0}(h)\| \le C \|h\|$: this can be checked coordinate per coordinate, the constant $C$ depending on the entries of the matrix representing $Df_{x_0}$. Thus, using also (ii) of Lemma 4.21 with $\varepsilon = 1$, we find $r > 0$ such that for all $h \in B_c(0, r)$, there holds
$$\|Df_{x_0}(h) + o(h)\| \le (C + 1) \|h\|.$$
Then, fix $\varepsilon > 0$. Using (ii) of Lemma 4.21 for the “outer” small $o$, we find $\delta$ such that if $\|k\| \le \delta$ then $\|o(k)\| \le \frac{\varepsilon}{C + 1} \|k\|$. We apply it to $k = Df_{x_0}(h) + o(h)$: if $\|h\| \le \min(r, \delta/(C + 1))$ then we can see that
$$\|o(Df_{x_0}(h) + o(h))\| \le \varepsilon \|h\|.$$
This is enough to justify that $o(Df_{x_0}(h) + o(h))$ is a $o(h)$.
In the end, we end up writing
$$g(f(x_0 + h)) = g(f(x_0)) + Dg_{f(x_0)}(Df_{x_0}(h)) + o(h),$$
and note that the map $h \mapsto Dg_{f(x_0)}(Df_{x_0}(h))$ is linear as a composition of linear maps. This is enough, thanks to Proposition 5.12, to identify $Dg_{f(x_0)} \circ Df_{x_0}$ as the differential of $g \circ f$.
The formula for the composition of the Jacobian matrices comes from the property that composition of linear functions corresponds to matrix multiplication of the corresponding matrices.
If we write it in full coordinates, let $f = (f_1, \ldots, f_p)$ while $g = (g_1, \ldots, g_q)$. We also write $y \in \mathbb{R}^p$ for $y = f(x)$ the argument of $g$, that is, the partial derivatives of $g_k$ are the $\partial g_k/\partial y_j$. As stated in Theorem 5.17 the chain rule reads $J_{g \circ f}(x) = J_g(f(x)) J_f(x)$. If we expand that in terms of coordinates, the entry $(k, i)$ of $J_{g \circ f}(x)$ is (by definition of the matrix product)
$$[J_{g \circ f}(x)]_{ki} = \sum_{j=1}^{p} [J_g(f(x))]_{kj} \, [J_f(x)]_{ji}.$$
Thus using the definition of the Jacobian matrices, see (5.1), we end up with
$$\frac{\partial (g \circ f)_k}{\partial x_i}(x) = \frac{\partial}{\partial x_i} \left[ g_k\left( f_1(x_1, \ldots, x_d), \ldots, f_p(x_1, \ldots, x_d) \right) \right] = \sum_{j=1}^{p} \frac{\partial g_k}{\partial y_j}(f(x)) \, \frac{\partial f_j}{\partial x_i}(x). \tag{5.3}$$
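The matrix form of the chain rule can be tested numerically. In the sketch below, the maps `f` (from $\mathbb{R}^2$ to $\mathbb{R}^3$) and `g` (from $\mathbb{R}^3$ to $\mathbb{R}^2$) are assumptions chosen for illustration, and `jacobian` and `matmul` are hypothetical helpers written for this example: the Jacobian of $g \circ f$ obtained by finite differences is compared with the product $J_g(f(x_0)) J_f(x_0)$.

```python
def f(x):
    # Illustrative map f : R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def g(y):
    # Illustrative map g : R^3 -> R^2
    return [y[0] + y[1] * y[2], y[0] * y[2]]

def jacobian(func, x, eps=1e-6):
    # Finite-difference Jacobian: rows index the codomain, columns the domain
    d = len(x)
    p = len(func(x))
    J = [[0.0] * d for _ in range(p)]
    for i in range(d):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        fp, fm = func(xp), func(xm)
        for j in range(p):
            J[j][i] = (fp[j] - fm[j]) / (2 * eps)
    return J

def matmul(A, B):
    # Entry (k, i) is the sum over j of A[k][j] * B[j][i]
    return [[sum(A[k][j] * B[j][i] for j in range(len(B)))
             for i in range(len(B[0]))]
            for k in range(len(A))]

x0 = [1.0, 2.0]
J_f = jacobian(f, x0)                      # 3 rows, 2 columns
J_g = jacobian(g, f(x0))                   # 2 rows, 3 columns
J_comp = jacobian(lambda x: g(f(x)), x0)   # 2 rows, 2 columns
J_prod = matmul(J_g, J_f)

max_gap = max(abs(J_comp[k][i] - J_prod[k][i])
              for k in range(2) for i in range(2))
print(J_comp, J_prod, max_gap)
```

Note how the sizes fit: a $q \times p$ matrix times a $p \times d$ matrix gives the $q \times d$ Jacobian of the composition, and the sum over `j` in `matmul` is the sum over $j$ in (5.3).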
Example 5.18. Using this general formula and as an exercise, recover the result of Proposition 4.29 which corresponds to $d = 1$, $p = 2$ and $q = 1$.
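Example 5.19. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a real valued function of $x, y$. We consider the function $g : \mathbb{R}^2 \to \mathbb{R}$ given by
$$g(r, \theta) = f(r \cos(\theta), r \sin(\theta)),$$
which corresponds to the function $f$ viewed in “polar coordinates”. Then, applying the chain rule,
$$\frac{\partial g}{\partial r}(r, \theta) = \cos(\theta) \frac{\partial f}{\partial x}(r \cos(\theta), r \sin(\theta)) + \sin(\theta) \frac{\partial f}{\partial y}(r \cos(\theta), r \sin(\theta))$$
and
$$\frac{\partial g}{\partial \theta}(r, \theta) = -r \sin(\theta) \frac{\partial f}{\partial x}(r \cos(\theta), r \sin(\theta)) + r \cos(\theta) \frac{\partial f}{\partial y}(r \cos(\theta), r \sin(\theta)).$$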
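The two formulas of Example 5.19 can be compared with finite differences of $g$. In the sketch below the function $f(x, y) = x^2 y$, the point $(r_0, \theta_0)$ and the step `eps` are assumptions chosen for illustration.

```python
import math

def f(x, y):
    # Illustrative function of the Cartesian variables
    return x * x * y

def df_dx(x, y):
    return 2 * x * y

def df_dy(x, y):
    return x * x

def g(r, theta):
    # f viewed in polar coordinates
    return f(r * math.cos(theta), r * math.sin(theta))

def dg_dr(r, theta):
    # Chain rule formula for the derivative in r
    x, y = r * math.cos(theta), r * math.sin(theta)
    return math.cos(theta) * df_dx(x, y) + math.sin(theta) * df_dy(x, y)

def dg_dtheta(r, theta):
    # Chain rule formula for the derivative in theta
    x, y = r * math.cos(theta), r * math.sin(theta)
    return -r * math.sin(theta) * df_dx(x, y) + r * math.cos(theta) * df_dy(x, y)

r0, t0, eps = 2.0, 0.3, 1e-6
num_r = (g(r0 + eps, t0) - g(r0 - eps, t0)) / (2 * eps)
num_t = (g(r0, t0 + eps) - g(r0, t0 - eps)) / (2 * eps)

print(num_r, dg_dr(r0, t0))
print(num_t, dg_dtheta(r0, t0))
```

Each printed pair should agree up to the finite-difference error.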
We conclude this chapter by another consequence of the chain rule: functions of class $C^1$ are stable by composition.
Proposition 5.20. Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function of $d$ variables defined on an open set $D$. Let $g : U \subset \mathbb{R}^p \to \mathbb{R}^q$ be a function of $p$ variables defined on an open set $U$. We assume that $f(x) \in U$ for all $x \in D$ in such a way that the function $g \circ f : x \in D \mapsto g(f(x)) \in \mathbb{R}^q$ is well defined.
If $f$ and $g$ are of class $C^1$, then so is $g \circ f$.
Proof. Thanks to Theorem 5.17, we know that $g \circ f$ is differentiable everywhere on $D$. Thus $g \circ f$ has partial derivatives everywhere on $D$, which are actually given by the formula (5.3). As $f$ and $g$ are of class $C^1$, all the partial derivatives of $f$ and $g$ are continuous, thus so are the partial derivatives of $g \circ f$ (because products, sums and compositions of continuous functions are continuous).

Chapter 6

Higher order derivatives

In the previous chapters we have investigated differentiability, that is, we have only differentiated the function once. In this chapter we will study higher order derivatives. Though the higher order partial derivatives are quite easy to describe, what can be more intricate is the algebraic structure in which to store these derivatives, but we won't discuss this aspect. The main theorem of this chapter is Schwarz's theorem, which states that the order in which partial derivatives are taken does not matter. We will then move to the Taylor expansion of order 2 (recall that the Taylor expansion of order 1 is the differential that we studied in the two previous chapters). It is possible to consider Taylor expansions of higher order, however the formulas become heavier, in particular without the help of the right algebraic tools.
In this chapter, we will restrict to real-valued functions $f : D \subset \mathbb{R}^d \to \mathbb{R}$ defined on an open subset $D$ of $\mathbb{R}^d$. The case where the codomain is $\mathbb{R}^p$ and not $\mathbb{R}$ can be obtained by decomposing an $\mathbb{R}^p$-valued function as a collection of $p$ real-valued functions and does not bring any conceptual novelty.

6.1 Partial derivatives

To define higher order partial derivatives, we simply reason by induction: for instance if $f = f(x, y)$ then we define
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right).$$
The definition below just makes what is written above more formal. We actually do not strive for the sharpest definition and directly define what it means to be of class $C^k$ over a domain (rather than defining differentiability of order $k$ at a single point). Also, by restricting to this framework, we avoid pathological cases like Example 4.11.
Definition 6.1 (Higher order derivative). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a real-valued function defined on an open set $D$. We say that the function is of class $C^2$ over $D$ if it is of class $C^1$ and all the partial derivatives $\partial f/\partial x_i : D \to \mathbb{R}$ for $i \in \{1, 2, \ldots, d\}$ are of class $C^1$. In this case we define for $i, j \in \{1, 2, \ldots, d\}$
$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} \left( \frac{\partial f}{\partial x_j} \right),$$
in such a way that this defines a real-valued function $D \to \mathbb{R}$.
More generally, for any sequence of indices $i_1, i_2, \ldots, i_k$ we define the partial derivative of order $k$ as
$$\frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_k}} = \frac{\partial}{\partial x_{i_1}} \left( \frac{\partial}{\partial x_{i_2}} \left( \cdots \frac{\partial}{\partial x_{i_{k-1}}} \left( \frac{\partial f}{\partial x_{i_k}} \right) \right) \right).$$
By induction we say that a function is of class $C^k$ over $D$ if it is of class $C^1$ and all of its (first order) partial derivatives are of class $C^{k-1}$.
Whereas for a function $\mathbb{R} \to \mathbb{R}$ (or $\mathbb{R} \to \mathbb{R}^p$) the derivative of order $k$ at a point $t$ is a single number (or a single vector), here we have a collection of $d^k$ partial derivatives (actually fewer because of Schwarz's theorem, see below).


Figure 6.1: Illustration of the identity (6.1): a finite difference involving the values of $f$ on the boundary of a rectangle with corners $(x_0, y_0)$, $(x_0 + h, y_0)$, $(x_0, y_0 + k)$ and $(x_0 + h, y_0 + k)$ (in red and blue) can be expressed as $\partial^2 f/\partial y \partial x$ or $\partial^2 f/\partial x \partial y$, a priori at two different points $(x_0 + s, y_0 + t)$ and $(x_0 + s', y_0 + t')$, but at points included in the dashed rectangle.

6.2 Schwarz’s theorem

A priori, the order in which we take partial derivatives matters in the definition. However, for functions of class $C^k$, this is not the case. We will only prove this theorem for a function of two variables but it extends directly to the more general case of $d$ variables and $k$ derivatives: the order in which we take derivatives does not matter for a function of class $C^k$.
Theorem 6.2 (Schwarz's theorem). Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables of class $C^2$ on an open set $D$. Then, for every point $x_0$ of $D$ there holds
$$\frac{\partial^2 f}{\partial x \partial y}(x_0) = \frac{\partial^2 f}{\partial y \partial x}(x_0).$$
In this result it is important that $f$ is of class $C^2$, that is, that the functions $\partial^2 f/\partial x \partial y$ and $\partial^2 f/\partial y \partial x$ are continuous. There exist counterexamples where the partial derivatives of second order exist everywhere and are not symmetric (but in this case they are not continuous).
Proof. Let $\mathbf{x}_0 = (x_0, y_0) \in D$ and let $r > 0$ be such that $B_c(\mathbf{x}_0, r) \subset D$. We introduce for $(h, k) \in B_c(0, r)$, at least if $h$ and $k$ are not $0$, the function $F$ defined by
$$F(h, k) = \frac{f(x_0 + h, y_0 + k) + f(x_0, y_0) - f(x_0 + h, y_0) - f(x_0, y_0 + k)}{hk}.$$
It was chosen because of the following identities:
$$\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) = \lim_{h \to 0} \lim_{k \to 0} F(h, k) \quad \text{while} \quad \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) = \lim_{k \to 0} \lim_{h \to 0} F(h, k)$$
(only the order of the limits is interchanged) but we won't use them. (The identities above can be checked by using the definition of the derivative for a function of one variable as the limit of the slope.)
Rather we use the mean value theorem for functions of one variable. Writing the mean value theorem for the function $s \mapsto g(s) = f(x_0 + s, y_0 + k) - f(x_0 + s, y_0)$ (which is of class $C^1$ because $f$ is), we see that
$$F(h, k) = \frac{g(h) - g(0)}{hk} = \frac{1}{k} g'(s)$$
for some $s \in (0, h)$. Given the expression of $g$, it reads:
$$F(h, k) = \frac{g(h) - g(0)}{hk} = \frac{1}{k} \left[ \frac{\partial f}{\partial x}(x_0 + s, y_0 + k) - \frac{\partial f}{\partial x}(x_0 + s, y_0) \right].$$

Now that we have this $s$ (which depends on $h$ and $k$), we look at the function $\varphi : t \mapsto \frac{\partial f}{\partial x}(x_0 + s, y_0 + t)$ (which depends on $s$) and we notice that, by the mean value theorem (which we can apply because $\varphi$ is of class $C^1$ as $f$ is of class $C^2$):
$$F(h, k) = \frac{1}{k} \left[ \frac{\partial f}{\partial x}(x_0 + s, y_0 + k) - \frac{\partial f}{\partial x}(x_0 + s, y_0) \right] = \frac{\varphi(k) - \varphi(0)}{k} = \varphi'(t)$$
for some $t \in (0, k)$. Given the expression of $\varphi$, this reads:
$$F(h, k) = \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t).$$
In conclusion, we have proved: for every $(h, k) \in B_c(0, r)$, there exists $(s, t) \in (0, h) \times (0, k) \subset B_c(0, r)$ such that:
$$F(h, k) = \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t).$$
Reasoning symmetrically (that is, applying the mean value theorem first in the $y$ variable and then in the $x$ variable), for every $(h, k) \in B_c(0, r)$, there exists $(s', t') \in (0, h) \times (0, k) \subset B_c(0, r)$ such that:
$$F(h, k) = \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t').$$
In conclusion, eliminating the function $F$: for every $(h, k) \in B_c(0, r)$, there exist two points $(s, t)$ and $(s', t')$ in $(0, h) \times (0, k)$ such that:
$$\frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t) = \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t'), \tag{6.1}$$
see Figure 6.1 for an illustration.
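Eventually we conclude with continuity: let us fix $\varepsilon > 0$. By continuity of the second derivatives of $f$, there exists $\delta > 0$ (which we can take smaller than $r$) such that, for every point $(h', k') \in B_c(0, \delta)$, there holds
$$\left| \frac{\partial^2 f}{\partial x \partial y}(x_0 + h', y_0 + k') - \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \right| \le \varepsilon \quad \text{and} \quad \left| \frac{\partial^2 f}{\partial y \partial x}(x_0 + h', y_0 + k') - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| \le \varepsilon.$$
Now fix $(h, k) \in B_c(0, \delta)$ such that $h$ and $k$ are not $0$. Using (6.1), then the triangle inequality, then the estimates above for $(h', k') = (s', t')$ and $(h', k') = (s, t)$, we see:
$$\begin{aligned} \left| \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| &= \left| \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) - \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t') + \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t) - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| \\ &\le \left| \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) - \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t') \right| + \left| \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t) - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| \le 2 \varepsilon. \end{aligned}$$
As $\varepsilon$ is arbitrary, we have proved what we wanted to.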
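The quantity $F(h, k)$ used in the proof is easy to evaluate numerically, and one can observe that it approaches the common value of the two mixed derivatives. In the sketch below the function `f`, the point $(x_0, y_0)$ and the step sizes are assumptions chosen for illustration; the mixed derivative is computed by hand.

```python
import math

def f(x, y):
    # Illustrative function of class C^2 (an assumption for the example)
    return x**3 * y**2 + math.sin(x * y)

def mixed_exact(x, y):
    # Mixed second derivative computed by hand:
    # 6 x^2 y + cos(xy) - xy sin(xy), whichever order of differentiation is used
    return 6 * x * x * y + math.cos(x * y) - x * y * math.sin(x * y)

def F(x0, y0, h, k):
    # Finite difference over the corners of the rectangle [x0, x0+h] x [y0, y0+k]
    return (f(x0 + h, y0 + k) + f(x0, y0)
            - f(x0 + h, y0) - f(x0, y0 + k)) / (h * k)

x0, y0 = 1.0, 0.5
exact = mixed_exact(x0, y0)
errors = [abs(F(x0, y0, s, s) - exact) for s in (1e-2, 1e-3, 1e-4)]
print(exact, errors)
```

The errors shrink as the rectangle shrinks, in agreement with the fact that $F(h, k)$ is a second derivative evaluated at a point of the rectangle.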
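Example 6.3. As an application, let's show a necessary condition for a field to be a gradient. Let's assume that we have $g : \mathbb{R}^2 \to \mathbb{R}^2$ a vector field over $\mathbb{R}^2$ of class $C^1$. The question is: is it possible to find $f : \mathbb{R}^2 \to \mathbb{R}$ such that $g = \nabla f$? That is, can we find $f$ such that
$$g = \begin{pmatrix} g_x \\ g_y \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\[2mm] \dfrac{\partial f}{\partial y} \end{pmatrix} \, ?$$
Well if this is the case, applying Schwarz's theorem we find that
$$\frac{\partial g_x}{\partial y} = \frac{\partial g_y}{\partial x} \tag{6.2}$$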
By Bx

as the left hand side is $\partial^2 f/\partial y \partial x$ while the right hand side is $\partial^2 f/\partial x \partial y$. But (6.2) depends only on $g$. So we see that a necessary condition for $g$ to be a gradient field is that (6.2) holds, that is, the “equality of the cross derivatives”.
For instance, the function $(x, y) \mapsto (x \cos(y), x \sin(y))$ (change of variables in polar coordinates) cannot be written as the gradient of a function as it does not satisfy (6.2).
Actually, if the domain of definition is (for instance) $\mathbb{R}^2$, then the converse holds, that is, any vector field $g$ of class $C^1$ such that (6.2) holds can be written $g = \nabla f$, that is, as the gradient of a $C^2$ function.
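The necessary condition (6.2) is simple to test numerically. The sketch below approximates $\partial g_x/\partial y - \partial g_y/\partial x$ by central finite differences for two fields: the polar map $(x, y) \mapsto (x\cos y, x\sin y)$ of the example, and the gradient of $f(x, y) = x^2 y$, which is an extra illustration not taken from the notes. The helper name `cross_derivative_gap` is hypothetical.

```python
import math

def cross_derivative_gap(g, x, y, eps=1e-6):
    # Approximates d(g_x)/dy - d(g_y)/dx at (x, y) by central differences;
    # this vanishes when condition (6.2) holds at (x, y)
    dgx_dy = (g(x, y + eps)[0] - g(x, y - eps)[0]) / (2 * eps)
    dgy_dx = (g(x + eps, y)[1] - g(x - eps, y)[1]) / (2 * eps)
    return dgx_dy - dgy_dx

def polar_map(x, y):
    # The field of the example: not a gradient field
    return (x * math.cos(y), x * math.sin(y))

def grad_example(x, y):
    # Gradient of f(x, y) = x^2 y (illustrative choice)
    return (2 * x * y, x * x)

gap_polar = cross_derivative_gap(polar_map, 1.0, 1.0)
gap_grad = cross_derivative_gap(grad_example, 1.0, 1.0)
print(gap_polar, gap_grad)
```

For the polar map the gap at $(1, 1)$ is $-x\sin y - \sin y = -2\sin(1)$, which is not zero, while for the gradient field it vanishes up to rounding.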
Though we have proved Schwarz's theorem for functions from $\mathbb{R}^2$ to $\mathbb{R}$, it holds in a more general context (and the proof is basically by induction with the help of Theorem 6.2).

Theorem 6.4 (Schwarz's theorem, general case). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^k$ defined on an open set $D$. Let $\sigma : \{1, 2, \ldots, k\} \to \{1, 2, \ldots, k\}$ be a permutation (that is, a bijective map). Then, for every point $x_0$ of $D$ and any sequence of indices $i_1, i_2, \ldots, i_k$, there holds
$$\frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_k}}(x_0) = \frac{\partial^k f}{\partial x_{i_{\sigma(1)}} \partial x_{i_{\sigma(2)}} \cdots \partial x_{i_{\sigma(k)}}}(x_0).$$

To give an example, if you have a function of 3 variables $f = f(x, y, z)$ which is of class $C^4$, then for instance there holds
$$\frac{\partial^4 f}{\partial x \partial y \partial x \partial z} = \frac{\partial^4 f}{(\partial x)^2 \partial y \partial z} = \frac{\partial^4 f}{\partial y \partial z (\partial x)^2},$$
and actually many more identities: the order in which you take partial derivatives does not matter.

Whereas the first derivatives are stored in the gradient, the compact notation for the second derivatives
is to store them in the Hessian matrix. We will see in the next section on Taylor expansion why storing
second order partial derivatives this way makes sense.

Definition 6.5 (Hessian matrix). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^2$ over an open domain $D$
of $\mathbb{R}^d$. We define the Hessian matrix of $f$ at a point $x_0 \in D$, denoted $H_f(x_0)$, as the $d \times d$ matrix whose
entry for the $i$-th row and $j$-th column is
\[
\frac{\partial^2 f}{\partial x_i \partial x_j}(x_0).
\]
Thanks to Theorem 6.2, this matrix is symmetric, that is, $[H_f(x_0)]_{ij} = [H_f(x_0)]_{ji}$ for all $i, j$.
For instance, in the case of a function $f : \mathbb{R}^2 \to \mathbb{R}$ it reads
\[
H_f = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[3mm] \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}.
\]
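To make the definition concrete, here is a minimal sketch that approximates the Hessian matrix by central finite differences and lets one observe its symmetry. The function name `hessian`, the step size and the test function $f(x, y) = x^3 y + y^2$ are assumptions made for illustration only.

```python
def hessian(func, point, step=1e-4):
    """Finite-difference approximation of the Hessian of func at point.
    Entry [i][j] estimates the second partial derivative in x_i and x_j."""
    d = len(point)

    def shifted(i, di, j, dj):
        # Evaluate func after moving coordinate i by di and coordinate j by dj.
        q = list(point)
        q[i] += di
        q[j] += dj
        return func(q)

    matrix = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            matrix[i][j] = (
                shifted(i, step, j, step)
                - shifted(i, step, j, -step)
                - shifted(i, -step, j, step)
                + shifted(i, -step, j, -step)
            ) / (4 * step * step)
    return matrix

# Assumed test function: f(x, y) = x^3 y + y^2,
# whose exact Hessian is [[6xy, 3x^2], [3x^2, 2]].
H = hessian(lambda p: p[0] ** 3 * p[1] + p[1] ** 2, [1.0, 2.0])
print(H)
```

At the point $(1, 2)$ the exact Hessian is $\begin{pmatrix} 12 & 3 \\ 3 & 2 \end{pmatrix}$, and the two off-diagonal entries of the approximation agree, as Schwarz's theorem predicts.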
Remark 6.6 (Counting derivatives). As you can see with the Hessian matrix, for a function $f : \mathbb{R}^2 \to \mathbb{R}$,
one needs to compute only 3 partial derivatives of second order instead of 4 thanks to Schwarz's theorem.
If you have a function from $\mathbb{R}^d \to \mathbb{R}$, the number of distinct second order derivatives you need to compute is $\frac{d(d+1)}{2}$,
which is the same as the dimension of the set of symmetric matrices.
A more tricky question of combinatorics is the following: for a function $f : \mathbb{R}^d \to \mathbb{R}$ and given
Schwarz's theorem, how many distinct derivatives does one need to compute to get all the derivatives of order
$k$? This is for sure less than $d^k$ (the naive estimate), and actually the answer is
\[
\binom{d + k - 1}{k} = \frac{(d + k - 1)!}{(d - 1)! \, k!}.
\]
But this is not an easy result (well, depends what is considered easy in combinatorics!) and out of the
scope of these lecture notes.
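The count can be verified by brute force for small values. The sketch below, under the assumption that a derivative of order $k$ is identified (thanks to Schwarz's theorem) with an unordered choice of $k$ indices among $d$ with repetition allowed, enumerates these choices and compares with the binomial formula. The function name is illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def count_distinct_derivatives(d, k):
    """Count unordered selections of k indices among d (repetition allowed),
    i.e. the order-k partial derivatives that remain distinct after
    applying Schwarz's theorem."""
    return sum(1 for _ in combinations_with_replacement(range(d), k))

for d, k in [(2, 2), (3, 2), (3, 4), (4, 3)]:
    print(d, k, count_distinct_derivatives(d, k), comb(d + k - 1, k))
```

For $k = 2$ the formula reduces to $d(d+1)/2$, the count given for the Hessian matrix.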

6.3 Taylor expansion of second order

Previously we have seen only the first order Taylor expansion of $f$, which reads $f(x_0 + h) = f(x_0) + Df_{x_0}(h) + o(h)$.
If a function is of class $C^k$, then it admits a Taylor expansion of order $k$. However,
writing this expansion is heavy because there is more than one “direction” in which to take derivatives
and the algebraic structure becomes intricate. We will focus in these notes on the Taylor expansion of
order 2 which involves the Hessian matrix. The proof will not be that complicated, as we import results
from functions of one variable.
We begin with the following lemma about the higher order derivative of a function on a line.

Lemma 6.7. Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^2$ defined on an open subset $D$ of $\mathbb{R}^d$. Let
$x = (x_i)_{1 \le i \le d} \in D$ and $h = (h_i)_{1 \le i \le d}$ be two vectors of $\mathbb{R}^d$, and let $r > 0$ be such that $x + th \in D$ for all
$t \in (-r, r)$. Then the function $g : t \in (-r, r) \mapsto f(x + th)$ is of class $C^2$ and
\[
g'(t) = \sum_{i=1}^d h_i \frac{\partial f}{\partial x_i}(x + th), \qquad
g''(t) = \sum_{i=1}^d \sum_{j=1}^d h_i h_j \frac{\partial^2 f}{\partial x_i \partial x_j}(x + th).
\]
Proof. For the first derivative this is actually nothing else than Proposition 4.29. For the second one, one
just differentiates the first one using again Proposition 4.29 (as well as Theorem 6.2 to exchange the role
of the indexes $i$ and $j$).

Then we can move to the main result of this section.

Proposition 6.8 (Second order Taylor expansion for a function of several variables). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$
be a function of class $C^2$ defined on an open subset $D$ of $\mathbb{R}^d$. Let $x_0 \in D$ and $r > 0$ such that $B_c(x_0, r) \subset D$.
Then, for $h = (h_i)_{1 \le i \le d} \in B_c(0, r)$ there holds
\[
f(x_0 + h) = f(x_0) + \sum_{i=1}^d h_i \frac{\partial f}{\partial x_i}(x_0) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d h_i h_j \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) + o(\|h\|^2),
\]
where $o(\|h\|^2) = \|h\|^2 \omega(h)$, and $\omega : B_c(0, r) \to \mathbb{R}$ is a function which goes to $0$ when $h$ goes to $0$.

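Proof. Let us define
\[
\Delta(h) = f(x_0 + h) - f(x_0) - \sum_{i=1}^d h_i \frac{\partial f}{\partial x_i}(x_0) - \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d h_i h_j \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0),
\]
our goal is to show that $\Delta(h)$ is $o(\|h\|^2)$.

We apply the Taylor expansion with explicit remainder to the function $g : t \mapsto f(x_0 + th)$ for $t = 1$: it
reads
\[
f(x_0 + h) = g(1) = g(0) + g'(0) + \frac{1}{2} g''(t),
\]
for some $t \in (0, 1)$ (which depends on $h$). Thanks to Lemma 6.7 we know what $g'(0)$ and $g''(t)$ are, while
$g(0) = f(x_0)$. Thus, we can write
\[
\begin{aligned}
\Delta(h) &= f(x_0 + h) - f(x_0) - \sum_{i=1}^d h_i \frac{\partial f}{\partial x_i}(x_0) - \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d h_i h_j \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \\
&= \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d h_i h_j \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + th) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right)
\end{aligned}
\]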

Figure 6.2: Example of graphs of functions $f : \mathbb{R}^2 \to \mathbb{R}$ such that the gradient at $(0, 0)$ vanishes, that is,
the plane $Oxy$ (in gray) is tangent at $(0, 0)$. Then the local behavior of the function around the tangent
plane can change depending on the Hessian at $(0, 0)$. Left: Hessian $H_f(0, 0) = \begin{pmatrix} 0.4 & 0 \\ 0 & 1.2 \end{pmatrix}$, which gives
a graph above the tangent plane. Center: Hessian $H_f(0, 0) = \begin{pmatrix} -1.2 & 0 \\ 0 & -0.4 \end{pmatrix}$, which gives a graph below
the tangent plane. Right: a diagonal Hessian $H_f(0, 0)$ whose entries have absolute values $1.2$ and $0.4$ and opposite signs, which gives a graph crossing the tangent
plane; this is called a saddle point.

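for some $t \in (0, 1)$ which may depend on $h$. Note that
\[
|h_i h_j| = \|h\|^2 \, \frac{|h_i|}{\|h\|} \, \frac{|h_j|}{\|h\|} \le \|h\|^2.
\]
So we can bound:
\[
|\Delta(h)| \le \frac{\|h\|^2}{2} \sum_{i=1}^d \sum_{j=1}^d \left| \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + th) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right|.
\]
For each $i, j$, the function $h \mapsto \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + th) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0)$ goes to $0$ as $h \to 0$ by continuity of
the second order derivatives. As a sum of functions going to $0$ also goes to $0$, it concludes the proof.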
The Taylor expansion looks quite heavy, and it is. For a function $f : \mathbb{R}^2 \to \mathbb{R}$, this reads
\[
f(x_0 + h, y_0 + k) = f(x_0, y_0) + h \frac{\partial f}{\partial x} + k \frac{\partial f}{\partial y} + \frac{h^2}{2} \frac{\partial^2 f}{\partial x^2} + hk \frac{\partial^2 f}{\partial x \partial y} + \frac{k^2}{2} \frac{\partial^2 f}{\partial y^2} + o(h^2 + k^2),
\]
where here all the partial derivatives of $f$ are taken at the point $(x_0, y_0)$. Some examples of the graph of
the function $f$ (when $(x_0, y_0) = (0, 0)$ and $\nabla f(0, 0) = 0$) are given in Figure 6.2.
In general, if one wants to write in a compact way, then one uses the Hessian matrix: actually the
Taylor expansion can be written
\[
f(x_0 + h) = f(x_0) + \nabla f(x_0) \cdot h + \frac{1}{2} h \cdot (H_f(x_0) h) + o(\|h\|^2).
\]
Here, $h \cdot (H_f(x_0) h)$ is the dot product between the vector $h$ and the vector $H_f(x_0) h$, which is the
multiplication between the matrix $H_f(x_0)$ and the vector $h$.
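The meaning of the remainder $o(\|h\|^2)$ can be observed numerically: the error of the second order expansion, divided by $\|h\|^2$, should shrink as $h$ shrinks. The sketch below is only an illustration under assumed choices (the function $f(x, y) = e^x \sin y$, the base point $(0.3, 0.7)$ and the direction $(1, 2)$ are not from the notes); the gradient and Hessian are hard-coded from the exact derivatives of that function.

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def second_order_taylor(x0, y0, h, k):
    """Second order Taylor polynomial of f at (x0, y0), evaluated at (h, k)."""
    ex, s, c = math.exp(x0), math.sin(y0), math.cos(y0)
    grad = (ex * s, ex * c)                       # (df/dx, df/dy)
    hess = ((ex * s, ex * c), (ex * c, -ex * s))  # symmetric Hessian matrix
    linear = grad[0] * h + grad[1] * k
    quadratic = 0.5 * (hess[0][0] * h * h + 2 * hess[0][1] * h * k + hess[1][1] * k * k)
    return f(x0, y0) + linear + quadratic

x0, y0 = 0.3, 0.7
ratios = []
for scale in (1e-1, 1e-2, 1e-3):
    h, k = scale * 1.0, scale * 2.0
    error = abs(f(x0 + h, y0 + k) - second_order_taylor(x0, y0, h, k))
    ratios.append(error / (h * h + k * k))  # error divided by ||h||^2

print(ratios)
```

The printed ratios decrease roughly in proportion to the size of the increment, which is the behavior expected of the function $\omega(h)$ in Proposition 6.8 when $f$ is smooth.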

6.4 Complement: the algebraic structure of higher order derivatives
This section is not something you could be tested on, but is rather here to give you a complement on the
algebraic structure of higher order derivatives.
Recall that the differential of f : Rd Ñ R at a point x0 is a linear map Rd Ñ R. Then, partial
derivatives are interpreted as the coefficient of the matrix representing this linear map in the canonical

basis. The goal of this section is to explain how this language translates to higher order derivatives.
Thus we fix $f : D \subset \mathbb{R}^d \to \mathbb{R}$ a function of class $C^k$ defined on an open set $D$, as well as $x_0 \in D$.
The differential of order $k$ of $f$ at $x_0$, that we write $D^k f_{x_0}$, is a $k$-linear map from $(\mathbb{R}^d)^k \to \mathbb{R}$. That is,
it takes as argument $h^{(1)}, h^{(2)}, \ldots, h^{(k)}$ a family of $k$ vectors in $\mathbb{R}^d$ and outputs
\[
D^k f_{x_0}(h^{(1)}, h^{(2)}, \ldots, h^{(k)}) \in \mathbb{R}.
\]
In addition, it is linear in each of its arguments $h^{(j)}$, when keeping the others fixed. Intuitively, the
quantity $D^k f_{x_0}(h^{(1)}, \ldots, h^{(k)})$ corresponds to “first differentiating in $h^{(k)}$, then in $h^{(k-1)}$, up to $h^{(1)}$”.
The link with the partial derivatives is that it can be expressed as
\[
D^k f_{x_0}(h^{(1)}, h^{(2)}, \ldots, h^{(k)}) = \sum_{i_1, i_2, \ldots, i_k} h^{(1)}_{i_1} h^{(2)}_{i_2} \cdots h^{(k)}_{i_k} \frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_k}}(x_0),
\]
where the sum is taken among all $k$-tuples of indexes $i_1, i_2, \ldots, i_k$, that is, over $\{1, 2, \ldots, d\}^k$.
Schwarz's theorem reads as: the mapping $D^k f_{x_0} : (\mathbb{R}^d)^k \to \mathbb{R}$ is not only $k$-linear but also symmetric,
that is
\[
D^k f_{x_0}(h^{(1)}, h^{(2)}, \ldots, h^{(k)}) = D^k f_{x_0}(h^{(\sigma(1))}, h^{(\sigma(2))}, \ldots, h^{(\sigma(k))})
\]
for every permutation $\sigma : \{1, 2, \ldots, k\} \to \{1, 2, \ldots, k\}$.
Eventually one can write a Taylor expansion at any order which reads, if $f$ is of class $C^{k+1}$:
\[
f(x_0 + h) = f(x_0) + \sum_{j=1}^k \frac{1}{j!} D^j f_{x_0}(\underbrace{h, h, \ldots, h}_{j \text{ times}}) + o(\|h\|^k),
\]
where $D^j f_{x_0}(h, h, \ldots, h)$ means we apply the $j$-linear form $D^j f_{x_0}$ to the collection of $j$ vectors all equal
to $h$.
Remark 6.9 (Counting derivatives, continued). Remark 6.6 about “counting derivatives” can be re-read
through the prism of linear algebra. It actually states that the linear space of all symmetric $k$-linear forms
from $(\mathbb{R}^d)^k \to \mathbb{R}$ has dimension (as a vector space) $\binom{d + k - 1}{k} = \frac{(d + k - 1)!}{(d - 1)! \, k!}$.
Remark 6.10 (Polarization formulas). It can be surprising that in the Taylor expansion we only evaluate
$D^k f_{x_0}$ on the diagonal, that is, when all arguments are equal to the same vector. Actually a symmetric
$k$-linear form is entirely determined by its value on the diagonal. Let us illustrate why in the simple case
of a 2-linear form; the general case is algebraically more involved but conceptually similar. If we take
$L : (h^{(1)}, h^{(2)}) \in (\mathbb{R}^d)^2 \to \mathbb{R}$ which is 2-linear and symmetric then
\[
L(h^{(1)}, h^{(2)}) = \frac{1}{2} L(h^{(1)} + h^{(2)}, h^{(1)} + h^{(2)}) - \frac{1}{2} L(h^{(1)}, h^{(1)}) - \frac{1}{2} L(h^{(2)}, h^{(2)})
\]
as it can be checked by linearity and symmetry. Thus, the knowledge of $L(h, h)$ for every $h \in \mathbb{R}^d$ is
enough to reconstruct $L(h^{(1)}, h^{(2)})$ for all $h^{(1)}, h^{(2)} \in \mathbb{R}^d$.
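The polarization identity for a symmetric 2-linear form is easy to test on an example. In the sketch below the form is represented by a symmetric matrix, $L(u, v) = u \cdot (Mv)$; the matrix, the vectors and the function names are arbitrary illustrative choices.

```python
def bilinear(matrix, u, v):
    """Evaluate the 2-linear form L(u, v) = sum_ij u_i M_ij v_j."""
    return sum(u[i] * matrix[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def polarization(matrix, u, v):
    """Reconstruct L(u, v) using only values of L on the diagonal."""
    w = [a + b for a, b in zip(u, v)]
    return (
        0.5 * bilinear(matrix, w, w)
        - 0.5 * bilinear(matrix, u, u)
        - 0.5 * bilinear(matrix, v, v)
    )

# Arbitrary symmetric matrix and vectors, chosen only for the check.
M = [[2.0, -1.0, 0.0], [-1.0, 3.0, 4.0], [0.0, 4.0, 1.0]]
u = [1.0, 2.0, -1.0]
v = [0.5, -3.0, 2.0]

direct = bilinear(M, u, v)
from_diagonal = polarization(M, u, v)
print(direct, from_diagonal)
```

Both numbers agree up to rounding. If the matrix were not symmetric, the right hand side would instead return the symmetrized value $\frac{1}{2}(L(u, v) + L(v, u))$.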

Part II

Integration

Chapter 7

Path integrals

In this chapter we are interested in the integral of a function $f$ defined over $\mathbb{R}^d$ on the image of a
parametric curve $\gamma : I \to \mathbb{R}^d$, that we will write
\[
\int_\gamma f.
\]
The function $f$ can be a scalar field, with a particular case corresponding to the constant function equal
to 1 which yields the length of the curve $\gamma$. Or it can be valued in $\mathbb{R}^d$ (that is $f : \mathbb{R}^d \to \mathbb{R}^d$ with the same
$d$ for the domain and codomain) and thought of as a vector field. In physics, this would correspond to the
work of a force $f$ along a trajectory $\gamma$. In the case when the vector field is the gradient of a function, we
recover the fundamental theorem of calculus: the integral of the derivative of the function coincides with
the difference between the values of the function at the boundary.
One key property of these two integrals is their invariance under reparametrization: the speed at
which the curve is traveled is not important, only the image matters. The quantity $\int_\gamma f$ is a geometric
quantity, which depends only on the image of $\gamma$.
In the rest of the chapter we take $I = [a, b]$ a closed and bounded interval of $\mathbb{R}$ and $\gamma : I \to \mathbb{R}^d$ a $C^1$
curve. We also take $f$ a function defined on a domain $D$ of $\mathbb{R}^d$ and such that $\gamma(I)$ is included in $D$.
In this chapter all the integrals we define boil down to integrals of functions of one variable, so there
is no need to rebuild a theory of integration.

7.1 Integral of a scalar field

Definition 7.1 (Path integral of a scalar field). Let $\gamma : I = [a, b] \to \mathbb{R}^d$ be a curve and $f : \mathbb{R}^d \to \mathbb{R}$ a scalar
function. If $\gamma$ is of class $C^1$ and $f$ is continuous, we define
\[
\int_\gamma f \, ds = \int_a^b f(\gamma(t)) \, \|\gamma'(t)\| \, dt.
\]
If $\gamma$ is only piecewise $C^1$, that is, there exists $t_0 = a \le t_1 \le t_2 \le \ldots \le t_N = b$ such that $\gamma$ is continuous
over $[a, b]$ and of class $C^1$ on each segment $[t_{i-1}, t_i]$ for $i = 1, \ldots, N$, then
\[
\int_\gamma f \, ds = \sum_{i=1}^N \int_{t_{i-1}}^{t_i} f(\gamma(t)) \, \|\gamma'(t)\| \, dt.
\]

The definition makes sense: as $\gamma$ is of class $C^1$, the function $\gamma'$ is continuous; and the function
$t \mapsto f(\gamma(t))$ is continuous as a composition of continuous functions (Proposition 4.10). Thus we are
integrating the continuous function $t \mapsto f(\gamma(t)) \, \|\gamma'(t)\|$ over a segment.
The quantity $ds$ stands for the infinitesimal arc length. Indeed, the definition above basically reads
“$ds = \|\gamma'(t)\| \, dt$”, and the right hand side is the distance traveled on the curve during a time $dt$. To make
this more formal, one can indeed prove a “Riemann sum” representation which is illustrated in Figure
7.1.
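Since the definition reduces to a one-dimensional integral, it can be evaluated with any quadrature rule. The following is a minimal sketch using the midpoint rule; the function name `scalar_path_integral`, the number of subdivisions and the example curve (a circle of radius 2) are assumptions made for illustration, and the derivative $\gamma'$ is supplied by hand.

```python
import math

def scalar_path_integral(f, gamma, gamma_prime, a, b, n=20000):
    """Midpoint-rule approximation of the integral over [a, b]
    of f(gamma(t)) * ||gamma'(t)||."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        speed = math.sqrt(sum(c * c for c in gamma_prime(t)))  # ||gamma'(t)||
        total += f(gamma(t)) * speed * dt
    return total

# Illustrative curve: circle of radius 2 centred at the origin.
def circle(t):
    return (2 * math.cos(t), 2 * math.sin(t))

def circle_prime(t):
    return (-2 * math.sin(t), 2 * math.cos(t))

# f = 1 gives the length of the curve; f(x, y) = x^2 gives a weighted integral.
length = scalar_path_integral(lambda p: 1.0, circle, circle_prime, 0.0, 2 * math.pi)
weighted = scalar_path_integral(lambda p: p[0] ** 2, circle, circle_prime, 0.0, 2 * math.pi)
print(length, weighted)
```

For this circle the exact values are $4\pi$ for the length and $\int_0^{2\pi} 4\cos^2(t) \cdot 2 \, dt = 8\pi$ for the weighted integral, which the approximation reproduces closely.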


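Figure 7.1: Illustration of Proposition 7.2, drawn for $N = 6$ with the points $\gamma(a) = \gamma(t_0^N)$, $\gamma(t_1^N)$, $\gamma(t_2^N)$, $\ldots$, $\gamma(b) = \gamma(t_N^N)$; the piece between $\gamma(t_1^N)$ and $\gamma(t_2^N)$ contributes $\|\gamma(t_2^N) - \gamma(t_1^N)\| \, f(\gamma(t_2^N))$ to the integral. The integral $\int_\gamma f \, ds$ can be seen as a sum of contributions of
the form “length” $\times$ $f$. If $f = 1$, we recover the length of the curve.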

Proposition 7.2. Let $\gamma : [a, b] \to \mathbb{R}^d$ be a $C^1$ curve and $f : \mathbb{R}^d \to \mathbb{R}$ a scalar continuous function. For
$N \ge 1$ we write $t_k^N = a + k(b - a)/N$, that is, $\{t_0^N, t_1^N, \ldots, t_N^N\}$ are ordered and evenly distributed on $[a, b]$.
Then there holds
\[
\int_\gamma f \, ds = \lim_{N \to +\infty} \sum_{k=1}^N f\big(\gamma(t_k^N)\big) \, \|\gamma(t_k^N) - \gamma(t_{k-1}^N)\|.
\]
We could actually replace $f(\gamma(t_k^N))$ by $f(\gamma(t'))$ for any $t' \in [t_{k-1}^N, t_k^N]$, it does not change the validity of the
result.
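The Riemann sum of Proposition 7.2 only uses values of $\gamma$, not of $\gamma'$, and can be sketched directly. The example below is an illustration under assumed choices: the parabola $\gamma(t) = (t, t^2)$ on $[0, 1]$ and $f(x, y) = x$, for which the integral $\int_0^1 t \sqrt{1 + 4t^2} \, dt = (5^{3/2} - 1)/12$ can be computed by hand.

```python
import math

def chord_riemann_sum(f, gamma, a, b, n):
    """Sum over k of f(gamma(t_k)) * ||gamma(t_k) - gamma(t_{k-1})||
    for the evenly spaced points t_k = a + k (b - a) / n."""
    total = 0.0
    previous = gamma(a)
    for k in range(1, n + 1):
        current = gamma(a + k * (b - a) / n)
        total += f(current) * math.dist(previous, current)  # chord length
        previous = current
    return total

def parabola(t):
    return (t, t * t)

exact = (5 ** 1.5 - 1) / 12  # value of the integral of t * sqrt(1 + 4 t^2) over [0, 1]
errors = [
    abs(chord_riemann_sum(lambda p: p[0], parabola, 0.0, 1.0, n) - exact)
    for n in (10, 100, 1000)
]
print(errors)
```

The errors shrink as $N$ grows, in agreement with the limit stated in the proposition. With $f = 1$ the same sum is the length of the polygonal line through the points $\gamma(t_k^N)$.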
Sketch of the proof. Using the Riemann sum representation of the integral for the continuous (scalar
valued) function $t \mapsto f(\gamma(t)) \|\gamma'(t)\|$, we have:
\[
\int_\gamma f \, ds = \int_a^b f(\gamma(t)) \|\gamma'(t)\| \, dt = \lim_{N \to +\infty} \sum_{k=1}^N f\big(\gamma(t_k^N)\big) \, \|\gamma'(t_k^N)\| \, (t_k^N - t_{k-1}^N), \tag{7.1}
\]
but this is not exactly the result we want.

To get the result we will prove the following additional claim: with the assumptions and notations of
the Proposition, for all $\varepsilon > 0$ there exists $N_0 \in \mathbb{N}$ such that if $N \ge N_0$ then for $k \in \{1, 2, \ldots, N\}$
\[
\Big| \|\gamma(t_k^N) - \gamma(t_{k-1}^N)\| - \|\gamma'(t_k^N)\| \, (t_k^N - t_{k-1}^N) \Big| \le (t_k^N - t_{k-1}^N) \, \varepsilon. \tag{7.2}
\]
This is the quantified version of “$ds = \|\gamma'(t)\| \, dt$”. If the claim is proved then for $N \ge N_0$, by using the
triangle inequality to sum the inequalities above,
\[
\Bigg| \underbrace{\sum_{k=1}^N f\big(\gamma(t_k^N)\big) \, \|\gamma(t_k^N) - \gamma(t_{k-1}^N)\|}_{\text{(I)}} - \underbrace{\sum_{k=1}^N f\big(\gamma(t_k^N)\big) \, \|\gamma'(t_k^N)\| \, (t_k^N - t_{k-1}^N)}_{\text{(II)}} \Bigg| \le \sup_{\gamma(I)} |f| \; (b - a) \, \varepsilon.
\]
The term (I) is the one in the statement of the Proposition, while in (II) we recognize, thanks to (7.1),
something which converges to $\int_\gamma f \, ds$. Then there is a bit of quantification to play with, but it yields the
result. Note that $\sup_{\gamma(I)} |f| < +\infty$ as the function $f \circ \gamma : I \to \mathbb{R}$ is continuous over the segment $[a, b]$,
thus it attains its maximum and minimum.

It remains to prove the claim, that is, (7.2). We start by choosing $\varepsilon > 0$. We can use the “mean value
theorem” (in its integral form) and write the following equality between vectors:
\[
\gamma(t_k^N) - \gamma(t_{k-1}^N) = \int_{t_{k-1}^N}^{t_k^N} \gamma'(t) \, dt.
\]
As $\gamma'$ is uniformly continuous on the segment $[a, b]$ (as each of its coordinate functions is uniformly
continuous), we can find $N_0$ such that if $N \ge N_0$ then
\[
\|\gamma'(t) - \gamma'(t_k^N)\| \le \varepsilon
\]
for all $t \in [t_{k-1}^N, t_k^N]$. In particular, using the triangle inequality for integrals,
\[
\begin{aligned}
\left\| \gamma(t_k^N) - \gamma(t_{k-1}^N) - (t_k^N - t_{k-1}^N) \gamma'(t_k^N) \right\|
&= \left\| \int_{t_{k-1}^N}^{t_k^N} \gamma'(t) \, dt - (t_k^N - t_{k-1}^N) \gamma'(t_k^N) \right\| \\
&= \left\| \int_{t_{k-1}^N}^{t_k^N} \big( \gamma'(t) - \gamma'(t_k^N) \big) \, dt \right\| \\
&\le \int_{t_{k-1}^N}^{t_k^N} \left\| \gamma'(t) - \gamma'(t_k^N) \right\| dt \le (t_k^N - t_{k-1}^N) \, \varepsilon.
\end{aligned}
\]
Then to finally get the claim, we use that $\big| \|a\| - \|b\| \big| \le \|a - b\|$, which is true for any vectors $a, b \in \mathbb{R}^d$
(this is a consequence of the triangle inequality) and that we apply to $a = \gamma(t_k^N) - \gamma(t_{k-1}^N)$ and
$b = (t_k^N - t_{k-1}^N) \gamma'(t_k^N)$. This concludes the claim (7.2), hence the proof.

7.2 Independence of the parametrization

A key property of this integral is its invariance under reparametrization: only the curve matters, not the
speed at which it is traveled. We first need to define what it means to travel the same curve at different
speeds.

Definition 7.3. If $I, J$ are two intervals of $\mathbb{R}$ and $k \ge 1$ is an integer, we call diffeomorphism of class
$C^k$ a function $\varphi : I \to J$ of class $C^k$ which is bijective and such that the inverse function $\varphi^{-1} : J \to I$ is
also of class $C^k$.

A simple characterization of diffeomorphisms is the following.

Proposition 7.4. If $I, J$ are two intervals of $\mathbb{R}$ and $k \ge 1$ is an integer, a function $\varphi : I \to J$ of class
$C^k$ which is bijective is a diffeomorphism if and only if $\varphi' : I \to \mathbb{R}$ does not vanish.

Proof. Single variable calculus.

Actually, by the intermediate value theorem, if $\varphi'$ does not vanish then it does not change sign, thus
a diffeomorphism is either increasing or decreasing.
Example 7.5. As examples of increasing diffeomorphisms, one can take $\varphi : t \in \mathbb{R} \mapsto 2t \in \mathbb{R}$ or $\psi : t \in
(0, +\infty) \mapsto \ln t \in \mathbb{R}$. As an example of a decreasing diffeomorphism, one can take $\eta : t \in [0, 1] \mapsto 1 - t \in [0, 1]$.
As an example of a bijective function which is not a diffeomorphism, there is $\xi : t \in \mathbb{R} \mapsto t^3 \in \mathbb{R}$. Indeed,
$\xi'(0) = 0$ and the inverse function $\xi^{-1} : t \mapsto t^{1/3}$ is not differentiable at $t = 0$.

Definition 7.6. A $C^k$ oriented curve $\Gamma$ is the data of $(I, \gamma)$ where $I$ is an interval of $\mathbb{R}$ and $\gamma : I \to \mathbb{R}^d$
is a $C^k$ function. The function $\gamma$ is called a parametrization of the curve.
Let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^k$ ($k \ge 1$) defined on $I, J$ two intervals of $\mathbb{R}$.
We say that $(I, \gamma)$ and $(J, \omega)$ represent the same oriented curve if there exists $\varphi : I \to J$ an increasing
diffeomorphism of class $C^k$ such that $\gamma = \omega \circ \varphi$, that is, if for all $t \in I$, $\gamma(t) = \omega(\varphi(t))$.

Example 7.7. Let $I = [0, 2\pi]$ and $\gamma : t \mapsto (\cos(t), \sin(t))$, which gives a parametrization of the unit circle.
Then $J = [0, \pi]$ and $\omega : t \mapsto (\cos(2t), \sin(2t))$ are such that $(I, \gamma)$ and $(J, \omega)$ define the same oriented
curve (namely, the unit circle traveled once in the counterclockwise direction). In this case the function
$\varphi : I \to J$ is defined by $\varphi(t) = t/2$.
Technically speaking, an oriented curve is an equivalence class of $(I, \gamma)$ where the equivalence relation
is given by “representing the same oriented curve”. What there is to remember is that an oriented curve is
more than the image of a $C^k$ function, it is the image with a parametrization. Here oriented means that
we impose the diffeomorphism to be increasing: we remember in which orientation the curve is traveled
by the function, but not at which speed. By the definition of a diffeomorphism, this relation is symmetric
in $(I, \gamma)$ and $(J, \omega)$: the definition of diffeomorphism was built for that!
Importantly, let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^k$ ($k \ge 1$) defined on $I, J$ two
intervals of $\mathbb{R}$. Let $\varphi : I \to J$ be an increasing diffeomorphism of class $C^k$ such that $\gamma = \omega \circ \varphi$. Then for all
$t \in I$,
\[
\gamma'(t) = \varphi'(t) \, \omega'(\varphi(t)).
\]
That is, written with the coordinate functions,
\[
\begin{pmatrix} \gamma_1'(t) \\ \vdots \\ \gamma_d'(t) \end{pmatrix} = \varphi'(t) \begin{pmatrix} \omega_1'(\varphi(t)) \\ \vdots \\ \omega_d'(\varphi(t)) \end{pmatrix}.
\]
This can be proved precisely by writing the equation coordinate by coordinate and using the rule of
single variable calculus about the derivative of a composite function. Note that $\varphi'(t)$ is a scalar while
$\gamma'(t)$, $\omega'(\varphi(t))$ are vectors in $\mathbb{R}^d$. In particular, as $\varphi'(t) > 0$ (because $\varphi$ is an increasing diffeomorphism), the vectors are
colinear and in the same direction.
The key result about path integrals is the following: the value of the integral over an
oriented curve does not depend on the parametrization.
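Proposition 7.8 (Path integral is independent of the parametrization). Let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be
two functions of class $C^1$ defined on $I, J$ two bounded intervals of $\mathbb{R}$. If they represent the same oriented
curve, for any continuous $f : \mathbb{R}^d \to \mathbb{R}$,
\[
\int_\gamma f \, ds = \int_\omega f \, ds.
\]
Proof. Actually the proof is quite simple and relies on a change of variables in the integral. We take
$\varphi : I \to J$ a $C^1$ increasing diffeomorphism such that $\gamma = \omega \circ \varphi$. In particular, from $\gamma'(t) = \varphi'(t) \, \omega'(\varphi(t))$
and as $\varphi' > 0$, we see that $\|\gamma'(t)\| = |\varphi'(t)| \, \|\omega'(\varphi(t))\| = \varphi'(t) \, \|\omega'(\varphi(t))\|$. Substituting in the definition:
\[
\begin{aligned}
\int_\gamma f \, ds &= \int_I f(\gamma(t)) \, \|\gamma'(t)\| \, dt \\
&= \int_I f(\omega(\varphi(t))) \, \varphi'(t) \, \|\omega'(\varphi(t))\| \, dt \\
&= \int_J f(\omega(r)) \, \|\omega'(r)\| \, dr = \int_\omega f \, ds,
\end{aligned}
\]
where the second to last equality is the change of variables $r = \varphi(t)$.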
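The invariance can be observed on the two parametrizations of the unit circle from Example 7.7. The sketch below is an illustration only: the integrator (midpoint rule), its name and the scalar field $f(x, y) = x^2 + 3y$ are assumed choices, and the derivatives of $\gamma$ and $\omega$ are written by hand.

```python
import math

def scalar_path_integral(f, curve, curve_prime, a, b, n=20000):
    """Midpoint-rule approximation of the integral over [a, b]
    of f(curve(t)) * ||curve'(t)||."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        speed = math.sqrt(sum(c * c for c in curve_prime(t)))
        total += f(curve(t)) * speed * dt
    return total

def f(p):
    # Assumed scalar field for the check.
    return p[0] ** 2 + 3 * p[1]

# gamma on I = [0, 2 pi]: the unit circle at unit speed.
gamma = lambda t: (math.cos(t), math.sin(t))
gamma_prime = lambda t: (-math.sin(t), math.cos(t))

# omega on J = [0, pi]: the same oriented circle traveled twice as fast.
omega = lambda t: (math.cos(2 * t), math.sin(2 * t))
omega_prime = lambda t: (-2 * math.sin(2 * t), 2 * math.cos(2 * t))

integral_gamma = scalar_path_integral(f, gamma, gamma_prime, 0.0, 2 * math.pi)
integral_omega = scalar_path_integral(f, omega, omega_prime, 0.0, math.pi)
print(integral_gamma, integral_omega)
```

Both values approximate $\int_0^{2\pi} (\cos^2 t + 3 \sin t) \, dt = \pi$: the doubled speed of $\omega$ is exactly compensated by the halved length of its interval.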

7.3 Length of a curve

In particular, when the function $f$ is constant and equal to 1, the integral $\int_\gamma ds$ yields by definition the
length of the curve.
Definition 7.9. If $\gamma : I \to \mathbb{R}^d$ is a curve of class $C^1$ (or continuous and piecewise $C^1$), we define its
length as
\[
\int_\gamma ds = \int_I \|\gamma'(t)\| \, dt.
\]


So computing the length of a curve is nothing else than computing an integral. However, in practice
computations are quite hard to do. For instance, there is no simple formula for the length of an ellipse,
where we recall that a parametrization is given by $t \in [0, 2\pi] \mapsto (a \cos(t), b \sin(t))$ for some $a, b > 0$.
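Even without a closed formula, the length of an ellipse is easy to approximate from the definition. Here is a minimal sketch, assuming a midpoint rule on the integral of $\|\gamma'(t)\| = \sqrt{a^2 \sin^2 t + b^2 \cos^2 t}$; the function name and the sample semi-axes are illustrative.

```python
import math

def ellipse_length(a, b, n=20000):
    """Midpoint-rule approximation of the length of t -> (a cos t, b sin t)
    over [0, 2 pi], i.e. the integral of sqrt(a^2 sin^2 t + b^2 cos^2 t)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2) * dt
    return total

circle_case = ellipse_length(1.0, 1.0)   # a = b = 1 is the unit circle
ellipse_case = ellipse_length(2.0, 1.0)
print(circle_case, ellipse_case)
```

For $a = b = 1$ the integrand is constant and one recovers $2\pi$. For $a = 2$, $b = 1$ the value lies strictly between $\pi(a + b)$ and $2\pi\sqrt{(a^2 + b^2)/2}$, two classical bounds for the perimeter of an ellipse.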
The following result is just a particular case of Proposition 7.8.
Proposition 7.10. Let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^1$ defined on $I, J$ two bounded
intervals of $\mathbb{R}$. If they represent the same oriented curve, then the length of $(I, \gamma)$ is the same as the
length of $(J, \omega)$.
The notion of length enables us to choose a distinguished parametrization: the one which has unit speed.
Definition 7.11. Let $\Gamma$ be an oriented curve and let $(I, \gamma)$ be one of its parametrizations. The parametrization
is said to be normal if $\|\gamma'(t)\| = 1$ for all $t \in I$.
For a normal parametrization, the length of the curve between $\gamma(a)$ and $\gamma(b)$ (for $a \le b$ both in $I$)
is $b - a$. Sometimes the curve is called “parametrized by arclength”, because the parameter $t$ is now
interpreted as the arclength.
Proposition 7.12. Let $\Gamma$ be an oriented curve of class $C^k$ and $(I, \gamma)$ one of its parametrizations. We
assume that all points of $\gamma$ are regular, that is, $\gamma'(t) \neq 0$ for all $t \in I$. Then there exists $\omega : J \to \mathbb{R}^d$ a
function of class $C^k$ such that $(J, \omega)$ is a normal parametrization of $\Gamma$.
Proof. We build the change of parameter $\varphi$ with the help of the length. Let $t_0 \in I$ and for $t \in I$ we define
\[
\varphi(t) = \int_{t_0}^t \|\gamma'(s)\| \, ds.
\]
As $\gamma$ is of class $C^k$ and $\gamma'$ never vanishes, one can check that $\|\gamma'\|$ is of class $C^{k-1}$. Thus the function $\varphi$ is
of class $C^k$ on $I$. Moreover, $\varphi' = \|\gamma'\|$ never vanishes (by regularity of $\gamma$) and is positive, thus $\varphi$ is an increasing $C^k$ diffeomorphism
onto $J = \varphi(I)$. We define $\omega : J \to \mathbb{R}^d$ by $\omega = \gamma \circ \varphi^{-1}$. As $\varphi^{-1} : J \to I$ is also of class $C^k$, this defines a
function $\omega$ of class $C^k$ and $\gamma = \omega \circ \varphi$. Hence $(I, \gamma)$ and $(J, \omega)$ represent the same oriented curve.
Eventually, from $\gamma' = \varphi' \, (\omega' \circ \varphi)$, by taking the norm and as $\varphi'(t) = \|\gamma'(t)\| > 0$ for all $t \in I$, there
holds
\[
\|\gamma'(t)\| = \varphi'(t) \, \|\omega'(\varphi(t))\| = \|\gamma'(t)\| \, \|\omega'(\varphi(t))\|.
\]
Simplifying by $\|\gamma'(t)\|$, we conclude that $\|\omega'(\varphi(t))\| = 1$ for all $t \in I$. By bijectivity of $\varphi$, we see that
$\|\omega'(r)\| = 1$ for all $r \in J$: the parametrization $(J, \omega)$ is normal.

7.4 Integral of a vector field

We move to the definition of the integral of a vector field along a path. We take $f : \mathbb{R}^d \to \mathbb{R}^d$ a vector
field (with $d$ the dimension of the domain being the same as the $d$ of the codomain).
Definition 7.13 (Path integral of a vector field). Let $\gamma : I \to \mathbb{R}^d$ be a $C^1$ curve defined on a closed
bounded interval $I = [a, b]$ and $f : \mathbb{R}^d \to \mathbb{R}^d$ be a continuous function. We define the real number $\int_\gamma f \cdot ds$ as
\[
\int_\gamma f \cdot ds = \int_I f(\gamma(t)) \cdot \gamma'(t) \, dt = \sum_{i=1}^d \int_I f_i(\gamma(t)) \, \gamma_i'(t) \, dt.
\]
Similarly to Definition 7.1, it can be extended to the case of $\gamma$ continuous and piecewise $C^1$ curves by
concatenation.
Again, one can check that the integral is well defined as $t \mapsto f(\gamma(t)) \cdot \gamma'(t)$ is a composition and sum of
continuous functions. Note that we integrate the dot product between the vector $f(\gamma(t))$ and the vector
$\gamma'(t)$ so that in the end we integrate a scalar quantity and the final outcome is a scalar. So here $ds$ is
the “vectorial infinitesimal increment” $\gamma'(t) \, dt$. In some sense, the integral measures on average whether the
displacement $\gamma'$ is aligned with the vector field. Again this can be interpreted with the help of a Riemann
sum.
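As in the scalar case, the definition is a one-dimensional integral that can be approximated directly. The following sketch uses a midpoint rule; the function name `vector_path_integral` and the example (the rotating field $f(x, y) = (-y, x)$ along the unit circle) are assumptions made for illustration.

```python
import math

def vector_path_integral(field, gamma, gamma_prime, a, b, n=20000):
    """Midpoint-rule approximation of the integral over [a, b]
    of the dot product field(gamma(t)) . gamma'(t)."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        dot = sum(fi * vi for fi, vi in zip(field(gamma(t)), gamma_prime(t)))
        total += dot * dt
    return total

# Unit circle traveled counterclockwise.
gamma = lambda t: (math.cos(t), math.sin(t))
gamma_prime = lambda t: (-math.sin(t), math.cos(t))

# Rotating field: everywhere tangent to the circle, aligned with gamma'.
rotating = lambda p: (-p[1], p[0])
# Radial field: everywhere orthogonal to gamma'.
radial = lambda p: (p[0], p[1])

work_rotating = vector_path_integral(rotating, gamma, gamma_prime, 0.0, 2 * math.pi)
work_radial = vector_path_integral(radial, gamma, gamma_prime, 0.0, 2 * math.pi)
print(work_rotating, work_radial)
```

The rotating field is aligned with the displacement everywhere, so the dot product equals 1 and the integral is $2\pi$; the radial field is orthogonal to the displacement and its integral vanishes.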


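Figure 7.2: Illustration of Proposition 7.14, drawn for $N = 6$ with the points $\gamma(a) = \gamma(t_0^N)$, $\gamma(t_1^N)$, $\gamma(t_2^N)$, $\ldots$, $\gamma(b) = \gamma(t_N^N)$ and the vector field evaluated at $\gamma(t_1^N)$; the piece between $\gamma(t_1^N)$ and $\gamma(t_2^N)$ contributes $\big(\gamma(t_2^N) - \gamma(t_1^N)\big) \cdot f(\gamma(t_1^N))$ to the integral. The integral $\int_\gamma f \cdot ds$ can be seen as a sum of contributions of
the form “displacement” $\cdot$ $f$. Compared to Proposition 7.14 we have made a little shift in the indexes
(that does not change the validity of the proposition), that is, we sum $f(\gamma(t_{k-1}^N)) \cdot \big(\gamma(t_k^N) - \gamma(t_{k-1}^N)\big)$ instead of
$f(\gamma(t_k^N)) \cdot \big(\gamma(t_k^N) - \gamma(t_{k-1}^N)\big)$.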

Proposition 7.14. Let $\gamma : [a, b] \to \mathbb{R}^d$ be a $C^1$ curve and $f : \mathbb{R}^d \to \mathbb{R}^d$ a continuous vector field. For $N \ge 1$
we write $t_k^N = a + k(b - a)/N$, that is, $\{t_0^N, t_1^N, \ldots, t_N^N\}$ are ordered and evenly distributed on $[a, b]$. Then
there holds
\[
\int_\gamma f \cdot ds = \lim_{N \to +\infty} \sum_{k=1}^N f\big(\gamma(t_k^N)\big) \cdot \big( \gamma(t_k^N) - \gamma(t_{k-1}^N) \big).
\]
We will not sketch the proof as it is very similar to the one of Proposition 7.2. Roughly, the idea is
that $\gamma(t_k^N) - \gamma(t_{k-1}^N) \approx (t_k^N - t_{k-1}^N) \, \gamma'(t_k^N)$ and then one has to quantify properly.

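Remark 7.15 (A useful bound). A link between the integral of a scalar and vector field is the following:
if $f : \mathbb{R}^d \to \mathbb{R}^d$ is continuous and $\gamma : I \to \mathbb{R}^d$ is a $C^1$ curve
\[
\left| \int_\gamma f \cdot ds \right| \le \int_\gamma \|f\| \, ds.
\]
Indeed, using the triangle inequality and then Cauchy Schwarz:
\[
\left| \int_\gamma f \cdot ds \right| = \left| \int_I f(\gamma(t)) \cdot \gamma'(t) \, dt \right| \le \int_I \left| f(\gamma(t)) \cdot \gamma'(t) \right| dt \le \int_I \|f(\gamma(t))\| \, \|\gamma'(t)\| \, dt = \int_\gamma \|f\| \, ds.
\]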

Similarly to the scalar case, one can also prove the independence of the integral from the parametrization.

Proposition 7.16 (Path integral is independent of the parametrization (case of a vector field)). Let
$\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^1$ defined on $I, J$ two bounded intervals of $\mathbb{R}$. If they
represent the same oriented curve, for any continuous $f : \mathbb{R}^d \to \mathbb{R}^d$,
\[
\int_\gamma f \cdot ds = \int_\omega f \cdot ds.
\]
Proof. The proof is left as an exercise and is very similar to the case of Proposition 7.8.


Figure 7.3: Illustration of Corollary 7.18: the integral of a gradient $f = \nabla g$ (in green) on a closed curve with $\gamma(a) = \gamma(b)$ (in red)
is zero, that is, $\int_\gamma \nabla g \cdot ds = 0$. Note that in this drawing it is not apparent that the vector field $f$ is the gradient of a scalar
function.

Eventually, let us prove the analogue of the fundamental theorem of calculus which relates the integral
of the derivative of the function to the function. We start by recalling it for a function $\mathbb{R} \to \mathbb{R}$; you have
seen this result in Mathematical Analysis – Module 1. When you have a function $f : \mathbb{R} \to \mathbb{R}$ which is of
class $C^1$ then for every bounded interval $[a, b]$ there holds
\[
\int_a^b f'(t) \, dt = f(b) - f(a). \tag{7.3}
\]
There is a similar result in this case, but now what you should put inside the integral is the gradient of
a scalar function, that you see as a vector field.

Proposition 7.17 (Path integral of a gradient field). Let $g : \mathbb{R}^d \to \mathbb{R}$ be a scalar function of class $C^1$ and
$\gamma : I = [a, b] \to \mathbb{R}^d$ a $C^1$ curve. Then
\[
\int_\gamma \nabla g \cdot ds = g(\gamma(b)) - g(\gamma(a)).
\]
That is, the path integral of $f = \nabla g$ on $\gamma$ is $[g \circ \gamma]_a^b$.

Proof. By definition
\[
\int_\gamma \nabla g \cdot ds = \int_a^b \nabla g(\gamma(t)) \cdot \gamma'(t) \, dt.
\]
By Proposition 4.29, the function $t \mapsto \nabla g(\gamma(t)) \cdot \gamma'(t)$ is nothing else than the derivative (with respect
to $t$) of the function $t \mapsto g(\gamma(t))$. Thus we can use (7.3) for the function $t \mapsto g(\gamma(t))$ which is defined on
$I = [a, b]$ and valued in $\mathbb{R}$.

A particularly useful case is the following: if the curve is closed, that is, if $\gamma(b) = \gamma(a)$, then the
integral of a gradient over the curve vanishes.

Corollary 7.18. Let $g : \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^1$ and $\gamma : I = [a, b] \to \mathbb{R}^d$ a $C^1$ closed curve
(meaning $\gamma(a) = \gamma(b)$). Then
\[
\int_\gamma \nabla g \cdot ds = 0.
\]


Proof. We just apply the previous proposition and notice that $g(\gamma(b)) = g(\gamma(a))$ as $\gamma(a) = \gamma(b)$.
Example 7.19. Let us consider a constant vector field, that is, we fix $u \in \mathbb{R}^d$ and we define $f : x \mapsto u$ for
all $x \in \mathbb{R}^d$. Let $\gamma : [a, b] \to \mathbb{R}^d$ be a curve of class $C^1$, we want to compute $\int_\gamma f \cdot ds$. One quick way is to
notice that $f = \nabla g$ provided we define $g : x \mapsto u \cdot x$. Thus
\[
\int_\gamma f \cdot ds = \int_\gamma u \cdot ds = g(\gamma(b)) - g(\gamma(a)) = u \cdot \gamma(b) - u \cdot \gamma(a) = u \cdot (\gamma(b) - \gamma(a)).
\]
This could also have been obtained by working coordinate per coordinate.
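Proposition 7.17 and Corollary 7.18 can be checked numerically on an example. The sketch below is an illustration under assumed choices: the potential $g(x, y) = x^2 y + y$ with gradient $(2xy, x^2 + 1)$, the open curve $t \mapsto (t, t^3)$ on $[0, 1]$ and the unit circle as closed curve; the integrator is a plain midpoint rule.

```python
import math

def vector_path_integral(field, gamma, gamma_prime, a, b, n=20000):
    """Midpoint-rule approximation of the integral over [a, b]
    of the dot product field(gamma(t)) . gamma'(t)."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        dot = sum(fi * vi for fi, vi in zip(field(gamma(t)), gamma_prime(t)))
        total += dot * dt
    return total

# Assumed potential g(x, y) = x^2 y + y and its gradient.
g = lambda p: p[0] ** 2 * p[1] + p[1]
grad_g = lambda p: (2 * p[0] * p[1], p[0] ** 2 + 1)

# Open curve from (0, 0) to (1, 1).
cubic = lambda t: (t, t ** 3)
cubic_prime = lambda t: (1.0, 3 * t ** 2)
open_integral = vector_path_integral(grad_g, cubic, cubic_prime, 0.0, 1.0)
endpoint_difference = g(cubic(1.0)) - g(cubic(0.0))

# Closed curve: the unit circle.
circle = lambda t: (math.cos(t), math.sin(t))
circle_prime = lambda t: (-math.sin(t), math.cos(t))
closed_integral = vector_path_integral(grad_g, circle, circle_prime, 0.0, 2 * math.pi)

print(open_integral, endpoint_difference, closed_integral)
```

The integral along the open curve matches $g(1, 1) - g(0, 0) = 2$, and the integral along the closed curve vanishes up to rounding.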
We conclude this chapter with two remarks: one about standard results about integrals that were not
put in the core of the chapter not to overload it, and a complement about a concept that we do not cover
but that you may encounter in other context.
Remark 7.20 (Some identities). As everything boils down to one-dimensional integrals, you can directly
import some results and formulas, like linearity. For instance, you can check the following formulas: if
$f, g$ are scalar functions and $\gamma$ is a curve while $a, b \in \mathbb{R}$,
\[
\int_\gamma (af + bg) \, ds = a \int_\gamma f \, ds + b \int_\gamma g \, ds.
\]
The same holds if $f, g$ are vector valued functions:
\[
\int_\gamma (af + bg) \cdot ds = a \int_\gamma f \cdot ds + b \int_\gamma g \cdot ds.
\]

Remark 7.21 (Complement: 1-form). You may encounter what is called 1-forms. From a notation
perspective, a 1-form $\omega$ on $\mathbb{R}^d$ reads
\[
\omega = \sum_{i=1}^d f_i(x) \, dx_i,
\]
where each $f_i$ is a function from $\mathbb{R}^d \to \mathbb{R}$. Thus a 1-form is a collection of $d$ real-valued functions, it is
nothing else than a function $f : \mathbb{R}^d \to \mathbb{R}^d$ whose coordinate functions are the $f_i$. For all practical purposes,
you can read the integral of a 1-form on a curve $\gamma : I \to \mathbb{R}^d$, usually denoted $\int_\gamma \omega$, as the integral of the
vector field $f$ on the curve $\gamma$:
\[
\int_\gamma \omega = \int_\gamma f \cdot ds.
\]
So why introduce this notation of 1-form? To be precise, a 1-form is a function $\omega : \mathbb{R}^d \to L(\mathbb{R}^d)$, where
$L(\mathbb{R}^d)$ is the set of linear forms on $\mathbb{R}^d$ (the set of linear maps from $\mathbb{R}^d$ to $\mathbb{R}$). The paradigmatic example
of a 1-form is the differential of a function $f : \mathbb{R}^d \to \mathbb{R}$. Then one defines
\[
\int_\gamma \omega = \int_I \omega(\gamma(t))[\gamma'(t)] \, dt.
\]
In the formula above, $\omega(\gamma(t))[\gamma'(t)]$ is the linear form $\omega(\gamma(t))$ evaluated at $\gamma'(t)$.
However, in $\mathbb{R}^d$ there is a canonical isomorphism between $L(\mathbb{R}^d)$ and $\mathbb{R}^d$ which is given by the dot
product. For every $L \in L(\mathbb{R}^d)$ there is a unique vector $x_L$ such that $\forall x \in \mathbb{R}^d$, $L(x) = x \cdot x_L$, and the
mapping $L \leftrightarrow x_L$ is an isomorphism of vector spaces between $L(\mathbb{R}^d)$ and $\mathbb{R}^d$. Actually, the $i$-th component
of $x_L$ is obtained as $L(e_i)$. Thus instead of looking at a 1-form, one can use this isomorphism to see it as
a vector valued function $\mathbb{R}^d \to \mathbb{R}^d$. This reasoning actually holds as soon as you have a dot product (if
the space is infinite dimensional you need to impose a metric assumption: the space must be complete
for the distance generated by the norm).
In summary: as you have a dot product on $\mathbb{R}^d$, there is a canonical isomorphism between $L(\mathbb{R}^d)$ and
$\mathbb{R}^d$ and it is too much effort for not a great reward to distinguish between $L(\mathbb{R}^d)$ and $\mathbb{R}^d$, thus 1-forms and
vector fields can be thought of as the same objects. But in some other contexts it makes sense to distinguish
between a vector space $V$ and the space $L(V)$ of linear forms over it, and in this case one distinguishes between
1-forms and vector fields.

Chapter 8

Integrals of functions of several variables

In this chapter we study the integral of functions of several variables. Though we will stick to functions
of 2 variables, the definitions we present can be extended to functions of $d$ variables at the price of more
cumbersome notations and a less intuitive geometric representation. If $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ is a function of
2 variables, we want to give a meaning to $\iint_D f(x, y) \, dx \, dy$ which can be interpreted as the volume under
the graph of $f$, as well as present techniques to compute this quantity.
The technical aspects of this chapter are not the most important ones, we will skip most of the proofs
as they are quite heavy. Actually a good framework to perform integration is the one given by measure
theory but this is out of the scope of these notes. The important take-home message of this chapter is not
the formal definitions, but rather the rules describing how to manipulate and compute these integrals.
You will apply these rules during the TA sessions on examples.
Specifically, in addition to the basic properties that one can expect from any definition of the integral
(linearity and positivity), the two important tools to compute integrals are: Fubini’s theorem, which
reduces the computation of a bidimensional integral to the one of two unidimensional integrals in a row;
and the change of variables formula, where the Jacobian matrix will play an important role.

8.1 Topological preliminaries

We begin this chapter with additional results on topology, namely we prove Heine’s theorem (continuous
functions on a compact set are uniformly continuous) which will be needed to justify that continuous
functions are indeed Riemann-integrable.

Definition 8.1 (Compact set). Let K Rd be a domain of Rd . We say that K is compact if it is closed
(see Chapter 3) and bounded (that is, there exists a constant C such that }x} ¤ C for all x P K).

Example 8.2. In Rd , the closed balls Bc px, rq with x P Rd and r P r0, 8q are compact.
The main interest of compact set is the Bolzano Weierstrass theorem: if a sequence lies in a compact
set, up to extraction it converges to a point in the set.

Theorem 8.3 (Bolzano-Weierstrass). Let $K$ be a compact set of $\mathbb{R}^d$ and $(x_n)_{n \in \mathbb{N}}$ a sequence such that
$x_n \in K$ for all $n \in \mathbb{N}$. Then there exists $\varphi : \mathbb{N} \to \mathbb{N}$ strictly increasing such that the sequence $(x_{\varphi(n)})_{n \in \mathbb{N}}$
converges to a point $x \in K$.

Proof. We will actually go back to the one-dimensional case. For simplicity we only take $d = 2$ (that is,
a sequence in $\mathbb{R}^2$). Let us write $x_n = (x_{1,n}, x_{2,n})$. Let $C$ be a number such that $\|x_n\| \le C$ for all $n \in \mathbb{N}$.
Note that for all $n \in \mathbb{N}$,

$$|x_{1,n}| \le \|x_n\| \le C \quad \text{and} \quad |x_{2,n}| \le \|x_n\| \le C.$$

Bocconi University – course 30543 (Mathematical Analysis module 2)

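As the sequence $(x_{1,n})_{n \in \mathbb{N}}$ is bounded, there exists an extraction, that is, $\varphi_1 : \mathbb{N} \to \mathbb{N}$ strictly increasing,
such that $(x_{1,\varphi_1(n)})_{n \in \mathbb{N}}$ converges to a limit $x_1$ (Bolzano-Weierstrass in $\mathbb{R}$). Then as the sequence
$(x_{2,\varphi_1(n)})_{n \in \mathbb{N}}$ is also bounded, there exists $\varphi_2 : \mathbb{N} \to \mathbb{N}$ strictly increasing such that $(x_{2,\varphi_1(\varphi_2(n))})_{n \in \mathbb{N}}$
converges to a limit $x_2$.
We write $\varphi = \varphi_1 \circ \varphi_2$. By what we just said, $(x_{1,\varphi(n)})_{n \in \mathbb{N}}$ and $(x_{2,\varphi(n)})_{n \in \mathbb{N}}$ converge to $x_1$
and $x_2$ respectively (the first one because it is a subsequence of a convergent sequence). Thanks to Proposition 3.3, this implies that $(x_{\varphi(n)})_{n \in \mathbb{N}}$ converges in $\mathbb{R}^2$ to $x = (x_1, x_2)$. Finally,
as $K$ is closed and $x_{\varphi(n)} \in K$ for every $n \in \mathbb{N}$, there holds $x \in K$.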

We have introduced the notion of compactness mainly for Heine's theorem, which states that a
continuous function on a compact set is uniformly continuous.

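Theorem 8.4 (Heine). Let $K \subset \mathbb{R}^d$ be a compact set and $f : K \to \mathbb{R}$ a continuous function. Then $f$ is
uniformly continuous, that is

$$\forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x, y \in K, \quad \|x - y\| \le \delta \ \Rightarrow\ |f(x) - f(y)| \le \varepsilon.$$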

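Proof. Assume by contradiction that it is not the case:

$$\exists \varepsilon > 0,\ \forall \delta > 0,\ \exists x, y \in K, \quad \|x - y\| \le \delta \ \text{ and } \ |f(x) - f(y)| > \varepsilon.$$

So let us fix such an $\varepsilon > 0$. From the assertion above, for $\delta = 1/n$, we can find $x_n$ and $y_n$ in $K$ such that
$\|x_n - y_n\| \le 1/n$ but $|f(x_n) - f(y_n)| > \varepsilon$.
By the Bolzano-Weierstrass theorem, we know that there exists an extraction $\varphi : \mathbb{N} \to \mathbb{N}$ such that
$(x_{\varphi(n)})_{n \in \mathbb{N}}$ converges to $x \in K$. From $\|x_n - y_n\| \le 1/n$, it is easy to conclude that the sequence $(y_{\varphi(n)})_{n \in \mathbb{N}}$
also converges to $x$. By continuity of $f$ at $x \in K$,

$$\lim_{n \to +\infty} f(x_{\varphi(n)}) = \lim_{n \to +\infty} f(y_{\varphi(n)}) = f(x).$$

This contradicts $|f(x_{\varphi(n)}) - f(y_{\varphi(n)})| > \varepsilon$, which should hold for any $n \in \mathbb{N}$.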

Remark 8.5. Note that in the statement of Theorem 8.4, one can take a vector-valued function (that
is, $f : K \to \mathbb{R}^p$ for $p \ge 1$) and the conclusion still holds. For instance, one can reason coordinate by
coordinate in the codomain.

As an additional result, the following statement guaranteeing existence of extrema is similar to the
version for functions defined over $\mathbb{R}$, and the proof is similar.

Theorem 8.6 (Extrema of continuous functions over compact sets). Let $K$ be a compact set of $\mathbb{R}^d$ and
$f : K \to \mathbb{R}$ a continuous function. Then the function $f$ is bounded over $K$ and attains its extrema, that
is, there exist $x_{\min}$ and $x_{\max}$ in $K$ such that

$$f(x_{\min}) = \inf_{x \in K} f(x) \quad \text{and} \quad f(x_{\max}) = \sup_{x \in K} f(x).$$

Proof. We only prove it for the minimum. Let $m = \inf_{x \in K} f(x)$. By definition of the infimum,
there exists a sequence $(x_n)_{n \in \mathbb{N}}$ of points in $K$ such that

$$\lim_{n \to +\infty} f(x_n) = m.$$

By the Bolzano-Weierstrass theorem, there exists an “extraction” $\varphi : \mathbb{N} \to \mathbb{N}$ and a point $x_{\min} \in K$ such that
$(x_{\varphi(n)})_{n \in \mathbb{N}}$ converges to $x_{\min}$. By continuity of $f$,

$$f(x_{\min}) = f\left( \lim_{n \to +\infty} x_{\varphi(n)} \right) = \lim_{n \to +\infty} f(x_{\varphi(n)}) = m.$$
Chapter 8. Integrals of functions of several variables

Figure 8.1: The integral of a non-negative function $f : \mathbb{R}^2 \to \mathbb{R}$ over a domain $D$ is the volume of the
region between the graph $z = f(x, y)$ of $f$ (in blue) and the domain $D$ (in red), seen as a subset of $\mathbb{R}^3$.

8.2 Definition of the integral

Intuitively, the integral of a non-negative function $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ is the volume of the region
$\{(x, y, z) \in \mathbb{R}^3 : (x, y) \in D \text{ and } 0 \le z \le f(x, y)\}$, that is, of the region below the graph of $f$. See Figure 8.1 for an illustration. If $f$
takes negative values, the volume above the graph is counted negatively for the regions where $f(x, y) \le 0$.
To define this, we will follow a similar strategy as for integrals of functions of one variable: we discretize
the domain $D$. However, instead of cutting an interval into smaller intervals, we decompose here the domain
$D$ as a union of rectangles.
Let us start with the domain being a rectangle. Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$
a function defined over $D$. We will cut the rectangle into rectangles aligned with the axes. So let us start
with one-dimensional partitions.

Definition 8.7 (1-dimensional partition). A partition $P$ of $[a, b]$ is the data of $N + 1$ ordered real numbers
$a = x_0 < x_1 < x_2 < \ldots < x_N = b$. The step-size of the partition $\Delta(P)$ is the maximal value of $x_i - x_{i-1}$
for $i \in \{1, 2, \ldots, N\}$.

With a partition of $[a, b]$ and a partition of $[c, d]$ it is easy to build a partition of $[a, b] \times [c, d]$, see the
bottom of Figure 8.2 and the following definition.

Definition 8.8 (Product of partitions). If $P_1 = (x_0, x_1, \ldots, x_N)$ is a partition of $[a, b]$ and
$P_2 = (y_0, y_1, \ldots, y_M)$ is a partition of $[c, d]$, we call $P_1 \otimes P_2$ the partition of $D = [a, b] \times [c, d]$ made of
the rectangles

$$R_{ij} = \{(x, y) \in \mathbb{R}^2 : x_{i-1} \le x \le x_i \text{ and } y_{j-1} \le y \le y_j\}$$

for $i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, M\}$. The area of the rectangle $R_{ij}$ is $A(R_{ij}) = (x_i - x_{i-1})(y_j - y_{j-1})$.

Once equipped with a partition of $[a, b] \times [c, d]$, one can define a lower and an upper approximation of
the integral. It corresponds, on each rectangle $R$, to sandwiching $f$ between its lower bound $\inf_{(x,y) \in R} f(x, y)$
and its upper bound $\sup_{(x,y) \in R} f(x, y)$.

Definition 8.9 (Lower and upper sums). Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a
function defined over $D$. We assume that $f$ is bounded, that is, there exists $C \in \mathbb{R}$ such that $|f(x, y)| \le C$
for all $(x, y) \in D$. For $P_1$ and $P_2$ partitions of $[a, b]$ and $[c, d]$ respectively, we define respectively the


Figure 8.2: Approximation of the volume with rectangles. The domain $D$ is partitioned into rectangles
starting from a partition $P_1$ of $[a, b]$ (in dark blue) and a partition $P_2$ of $[c, d]$ (in green). For each
rectangle of the partition, we build two cuboids corresponding to the lower (orange) and upper (purple)
approximation of the graph of $f$ (in blue). Here we have represented the cuboids only for two different
rectangles of the partition $P_1 \otimes P_2$ (labelled $R_{11}$ and $R_{25}$). The lower (resp. upper) sum is then the sum of the volumes of all
cuboids with orange top (resp. purple top).

lower and upper sums by:

$$L(f, P_1 \otimes P_2) = \sum_{R \in P_1 \otimes P_2} A(R) \inf_{(x,y) \in R} f(x, y),$$

$$U(f, P_1 \otimes P_2) = \sum_{R \in P_1 \otimes P_2} A(R) \sup_{(x,y) \in R} f(x, y).$$

Note that we always have $L(f, P_1 \otimes P_2) \le U(f, P_1 \otimes P_2)$: it comes from the inequality $\inf_{(x,y) \in R} f(x, y) \le
\sup_{(x,y) \in R} f(x, y)$, which we multiply by $A(R)$ and then sum over $R \in P_1 \otimes P_2$. Then to define the integral, we take the supremum
and infimum over all partitions.
Definition 8.10 (Integral over a rectangle). Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a
function defined over $D$. We assume that $f$ is bounded, that is, there exists $C \in \mathbb{R}$ such that $|f(x, y)| \le C$
for all $(x, y) \in D$. We say that $f$ is Riemann-integrable if

$$\sup_{P_1, P_2} L(f, P_1 \otimes P_2) = \inf_{P_1, P_2} U(f, P_1 \otimes P_2),$$

where the supremum and the infimum are taken over all $P_1, P_2$ partitions of $[a, b]$ and $[c, d]$ respectively.
If this is the case, the common value of the supremum and the infimum is called the integral of $f$ over
the rectangle $D$ and denoted

$$\iint_D f \quad \text{or} \quad \iint_{[a,b] \times [c,d]} f(x, y)\,dx\,dy.$$
Remark 8.11. In order to prove that $f$ is integrable, it is enough to show that for all $\varepsilon > 0$ there exist
partitions $P_1, P_2$ such that $L(f, P_1 \otimes P_2) \ge U(f, P_1 \otimes P_2) - \varepsilon$.
Remark 8.12. In the one-dimensional case, there is an orientation associated to a segment. That
is, if $a, b$ are real numbers then both $\int_a^b f$ and $\int_b^a f$ are defined (for instance for a continuous $f :
[\min(a, b), \max(a, b)] \to \mathbb{R}$), and there holds $\int_a^b f = -\int_b^a f$. In the case of integrals of functions of
two variables, this is no longer the case: in the definition of the integral of $f$ on $[a, b] \times [c, d]$ we always
assume that $a \le b$ and $c \le d$. If $\iint_D f < 0$ then necessarily there are regions where $f$ takes negative
values.
Now that we have a definition, let us check that it is not empty, that is, there are functions which are
Riemann-integrable: actually all the continuous functions are.

Proposition 8.13 (Continuous functions are Riemann-integrable over rectangles). Let $D = [a, b] \times
[c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a continuous function. Then $f$ is Riemann-integrable over $D$.
Proof. Observe that $D$ is compact. Thus we know that $f$ is bounded thanks to Theorem 8.6.
Let $\varepsilon > 0$. Following Remark 8.11 it is enough to show that there exist partitions $P_1, P_2$ such that
$L(f, P_1 \otimes P_2) \ge U(f, P_1 \otimes P_2) - \varepsilon$.
From the uniform continuity of $f$ (Theorem 8.4), there exists $\delta > 0$ such that if $\|(x, y) - (x', y')\| \le \delta$ then
$|f(x, y) - f(x', y')| \le \varepsilon / ((b - a)(d - c))$.
Now we choose $P_1$ and $P_2$ uniform partitions with step-size smaller than $\delta / \sqrt{2}$. In particular, with
this choice, if $R$ is a rectangle in $P_1 \otimes P_2$ then $\|(x, y) - (x', y')\| \le \delta$ for all $(x, y), (x', y') \in R$. As a
consequence of the uniform continuity,

$$\inf_{(x,y) \in R} f(x, y) + \frac{\varepsilon}{(b - a)(d - c)} \ge \sup_{(x,y) \in R} f(x, y).$$

Multiplying by the area of $R$ and summing over $R \in P_1 \otimes P_2$ (the areas add up to $(b - a)(d - c)$), we reach the conclusion $L(f, P_1 \otimes P_2) \ge
U(f, P_1 \otimes P_2) - \varepsilon$.
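
To make the lower and upper sums concrete, here is a minimal numerical sketch. It is not part of the course material: the function name `lower_upper_sums` and the test function $f(x, y) = xy$ are illustrative choices. The sketch assumes that $f$ is nondecreasing in each variable on the rectangle, so that on every small rectangle the infimum is attained at the lower-left corner and the supremum at the upper-right corner; for a general continuous $f$ these bounds would have to be computed differently.

```python
def lower_upper_sums(f, a, b, c, d, n, m):
    """Lower and upper sums of f on [a, b] x [c, d] for uniform partitions.

    Assumption (not checked): f is nondecreasing in x and in y, so that
    inf f on a rectangle is f(lower-left corner) and sup f is f(upper-right corner).
    """
    dx = (b - a) / n
    dy = (d - c) / m
    area = dx * dy
    lower = 0.0
    upper = 0.0
    for i in range(1, n + 1):
        x_lo, x_hi = a + (i - 1) * dx, a + i * dx
        for j in range(1, m + 1):
            y_lo, y_hi = c + (j - 1) * dy, c + j * dy
            lower += area * f(x_lo, y_lo)  # A(R) * inf of f over R
            upper += area * f(x_hi, y_hi)  # A(R) * sup of f over R
    return lower, upper


if __name__ == "__main__":
    # f(x, y) = x * y on [0, 1] x [0, 1]; the integral is 1/4.
    for n in (10, 100):
        low, up = lower_upper_sums(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0, n, n)
        print(n, low, up, up - low)
```

As the partitions are refined, the gap between the two sums shrinks, which is exactly the criterion of Remark 8.11.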

Though this result is satisfying, it is not enough as functions could be integrated on more complicated
domains (this is in contrast with the one-dimensional case where integrating over a segment covers already
a wide variety of cases). More generally, we can define the integral of a function over a domain which is
not a rectangle: the trick is to extend the function by defining it equal to 0 outside of the domain.
We restrict to domains $D$ which are bounded, that is, there exists a constant $C$ such that $\|x\| \le C$ for
all $x \in D$.

Definition 8.14 (Integral over a domain). Let $D \subset \mathbb{R}^2$ be a bounded domain and $f : D \to \mathbb{R}$ a bounded
function defined over $D$. We define the function $\tilde{f} : \mathbb{R}^2 \to \mathbb{R}$ by

$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in D, \\ 0 & \text{otherwise,} \end{cases}$$

that is, we extend $f$ by 0 outside of $D$. Let $\tilde{D}$ be a rectangle containing $D$. We say that $f$ is Riemann
integrable over $D$ if $\tilde{f}$ is Riemann integrable over $\tilde{D}$, and in this case we define

$$\iint_D f = \iint_{\tilde{D}} \tilde{f}.$$

Both the notion of being Riemann integrable and the value of the integral do not depend on the rectangle
$\tilde{D}$ (as long as it contains $D$).

A particular case of this definition corresponds to f being the constant function equal to 1: in this
case we recover the notion of area.

Definition 8.15 (Indicator function). Let $D \subset \mathbb{R}^2$. We denote by $1_D : \mathbb{R}^2 \to \mathbb{R}$ the indicator function
of $D$, defined by

$$1_D(x) = \begin{cases} 1 & \text{if } x \in D, \\ 0 & \text{otherwise.} \end{cases}$$


Figure 8.3: Example of the domain $D$ defined in Proposition 8.17, delimited by the graphs $y = \psi(x)$ and $y = \varphi(x)$ for $x \in [a, b]$.

Definition 8.16 (Jordan measurable set, area). Let $D \subset \mathbb{R}^2$ be a bounded set. Let $\tilde{D}$ be a rectangle
containing $D$. We say that $D$ is Jordan measurable if $1_D$ is Riemann integrable over $\tilde{D}$. In this case,
we define the area of $D$ as

$$A(D) = \iint_{\tilde{D}} 1_D.$$

Both the notion of being Jordan measurable and of area do not depend on the rectangle $\tilde{D}$ (as long as it
contains $D$).

Of course, one thing that is difficult is to find sharp criteria for a domain to be Jordan measurable.
Indeed, the function $1_D$ is not continuous so we cannot use Proposition 8.13. Let us give one example of
a domain which is Jordan measurable.
Proposition 8.17. Let $\psi, \varphi : [a, b] \to \mathbb{R}$ be two functions defined on an interval $[a, b]$ and valued in $\mathbb{R}$. We
assume that $\psi(x) \le \varphi(x)$ for all $x \in [a, b]$. We define the domain

$$D = \left\{ (x, y) \in \mathbb{R}^2 : x \in [a, b] \text{ and } \psi(x) \le y \le \varphi(x) \right\},$$

see Figure 8.3 for an illustration. If the functions $\psi$ and $\varphi$ are continuous, then the domain $D$ is Jordan
measurable.
Proof. We follow again Remark 8.11. Let $\varepsilon > 0$. From the uniform continuity of $\psi, \varphi$ we can find $\delta > 0$
such that $|x - x'| \le \delta$ implies $|\psi(x) - \psi(x')| \le \varepsilon$ and $|\varphi(x) - \varphi(x')| \le \varepsilon$.
Now we choose $P_1$ and $P_2$ uniform partitions with step-size smaller than $\delta$ and $\varepsilon$ respectively. Let
$[x_{i-1}, x_i]$ be one element of the partition $P_1$. By uniform continuity, the variations of $\psi$ and $\varphi$ are at most $\varepsilon$
on $[x_{i-1}, x_i]$. Thus the image of $\psi$ and $\varphi$ on $[x_{i-1}, x_i]$ is included in at most 4 elements of the partition
$P_2$, and on each of these elements the function $1_D$ cannot vary by more than 1. For the rest of the elements
of $P_2$, the function $1_D$ is constant over them. Moreover, each element of $P_2$ has length at most $\varepsilon$. Thus:

$$\sum_{I_j \in P_2} A([x_{i-1}, x_i] \times I_j) \left( \sup_{[x_{i-1}, x_i] \times I_j} 1_D - \inf_{[x_{i-1}, x_i] \times I_j} 1_D \right) \le 4 \max_{I_j \in P_2} \mathrm{Length}(I_j)\,(x_i - x_{i-1}) \le 4 \varepsilon (x_i - x_{i-1}).$$

Summing this over $i$, one can conclude that

$$L(1_D, P_1 \otimes P_2) \ge U(1_D, P_1 \otimes P_2) - 4(b - a)\varepsilon,$$

which is enough to yield integrability.
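
The following sketch illustrates Definition 8.16 numerically on one specific domain, the closed unit disk, which is of the form of Proposition 8.17 with $\psi(x) = -\sqrt{1 - x^2}$ and $\varphi(x) = \sqrt{1 - x^2}$ on $[-1, 1]$. The function name `disk_area_bounds` is an illustrative choice and not from the notes. On a rectangle $R$, the supremum of $1_D$ is 1 exactly when $R$ meets $D$ and the infimum is 1 exactly when $R \subset D$; for the disk both conditions can be tested through the nearest and farthest point of $R$ from the origin.

```python
import math


def disk_area_bounds(n):
    """Lower and upper sums of the indicator of the closed unit disk,
    for the uniform n x n partition of the enclosing square [-1, 1] x [-1, 1]."""
    h = 2.0 / n
    lower = 0.0
    upper = 0.0
    for i in range(n):
        x_lo, x_hi = -1.0 + i * h, -1.0 + (i + 1) * h
        x_near = min(max(0.0, x_lo), x_hi)   # point of [x_lo, x_hi] closest to 0
        x_far = max(abs(x_lo), abs(x_hi))    # largest |x| on [x_lo, x_hi]
        for j in range(n):
            y_lo, y_hi = -1.0 + j * h, -1.0 + (j + 1) * h
            y_near = min(max(0.0, y_lo), y_hi)
            y_far = max(abs(y_lo), abs(y_hi))
            if x_far ** 2 + y_far ** 2 <= 1.0:
                lower += h * h  # rectangle contained in the disk: inf of 1_D is 1
            if x_near ** 2 + y_near ** 2 <= 1.0:
                upper += h * h  # rectangle meets the disk: sup of 1_D is 1
    return lower, upper


if __name__ == "__main__":
    for n in (50, 200):
        low, up = disk_area_bounds(n)
        print(n, low, up, math.pi)
```

The two sums bracket the area of the disk, and the gap is carried by the rectangles meeting the boundary circle, in line with the estimate in the proof above.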

Remark 8.18. The sharp criterion for Jordan measurability, which we will not prove, is the following. A set
$D$ is Jordan measurable if and only if its topological boundary $\partial D$ has zero measure, that is, for every $\varepsilon > 0$
we can include $\partial D$ in a union of (potentially overlapping) rectangles so that the sum of the areas of the
rectangles is smaller than $\varepsilon$.

Let's mix these two definitions: in some sense, the regularity of the domain and the regularity of the
functions can be studied separately. Then to combine them we can use the following proposition.

Proposition 8.19 (Continuous functions are integrable over Jordan measurable domains). Let $D \subset \mathbb{R}^2$
be a domain which is compact and Jordan measurable. Let $f : D \to \mathbb{R}$ be a continuous function. Then $f$ is
Riemann-integrable over $D$.

Proof. We will not do it properly because it's a bit tedious, but the idea is the following. Let us recall that $\tilde{f}$
is the function which coincides with $f$ on $D$ and which is 0 outside of $D$. When you take a fine partition
$P_1 \otimes P_2$ of a rectangle $\tilde{D}$ containing $D$, there are three kinds of rectangles:

• Those which do not intersect $D$: over these ones you integrate the zero function so the difference
between the sup and the inf of $\tilde{f}$ is 0.

• Those which intersect both $D$ and its complement $D^c$: the total area that they cover is small because the domain
$D$ is Jordan measurable.

• Those which are fully contained in $D$: over these ones the difference between the sup and the inf of $\tilde{f}$
coincides with the difference between the sup and inf of $f$, which is small because $f$ is uniformly continuous on the compact set $D$ (Theorem 8.4).

Summing all these estimates leads to an estimate of the form $L(\tilde{f}, P_1 \otimes P_2) \ge U(\tilde{f}, P_1 \otimes
P_2) - \varepsilon$ and yields integrability.

Remark 8.20. The sharpest criterion is that a bounded function is Riemann-integrable if and only if its
set of discontinuity points has zero measure, that is, for every $\varepsilon > 0$ we can include it in a union of
(potentially overlapping) rectangles so that the sum of the areas of the rectangles is smaller than $\varepsilon$.

To conclude this section, let us state some useful properties of the integral. In some sense these are
the minimal properties that every reasonable definition of the integral should satisfy.

Proposition 8.21. Let $D \subset \mathbb{R}^2$ be a bounded domain and $f, g : D \to \mathbb{R}$ be two Riemann integrable functions
over $D$. Let also $a, b \in \mathbb{R}$ be two scalars.

(i) Linearity: the function $af + bg$ is Riemann integrable over $D$ and

$$\iint_D (af + bg) = a \iint_D f + b \iint_D g.$$

(ii) Positivity: if $f \ge 0$ over $D$ then $\iint_D f \ge 0$.

(iii) Monotonicity: if $f \le g$ everywhere on $D$ then $\iint_D f \le \iint_D g$.

Proof. Left as an exercise. The idea is to work at the level of upper and lower sums, but this can be a
bit tedious and quantification can be delicate. Note that (iii) is a consequence of (i) and (ii).

Remark 8.22. The first point can be rewritten more abstractly: the space of Riemann integrable functions
over a given domain is a vector space, and the mapping $f \mapsto \iint_D f$ is a linear form on it.


Figure 8.4: Illustration of Theorem 8.23. We decompose the volume under the graph as a sum of tiny
“y-slices” (only two are represented in green). The contribution of the slice between $y_0$ and $y_0 + \Delta y$ to the total is
$\left( \int_a^b f(x, y_0)\,dx \right) \Delta y$, an integral with respect to the $x$ variable. Summing all these contributions (that is, integrating in $y$) yields the total
integral of $f$.

8.3 Fubini’s theorem

Fubini's theorem is one of the main tools to compute integrals of functions of several variables, as it reduces
the problem to computing integrals of functions of one variable. We will state it only for continuous functions as it
simplifies the proof, but the assumptions can be loosened.
As a preliminary, let us state “Fubini's theorem” for finite sums: if $(u_{ij})_{ij}$ is a family of real numbers
indexed by $i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, M\}$ then

$$\sum_{i=1}^{N} \sum_{j=1}^{M} u_{ij} = \sum_{j=1}^{M} \sum_{i=1}^{N} u_{ij}. \tag{8.1}$$

Indeed the two sums correspond to the same set of indexes $\{1, 2, \ldots, N\} \times \{1, 2, \ldots, M\}$ but enumerated
in two different ways. Fubini's theorem is the same kind of identity, but for integrals, and its proof in
the end boils down to using (8.1) for the lower and upper sums.
Theorem 8.23 (Fubini for rectangles). Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$
a continuous function. Then the functions $y \in [c, d] \mapsto \int_a^b f(x, y)\,dx$ and $x \in [a, b] \mapsto \int_c^d f(x, y)\,dy$ are
continuous and

$$\iint_D f = \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx.$$

Let us first state and prove a lemma which is of independent interest and which corresponds to the first
part of the theorem.
Lemma 8.24. Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a continuous function. Then
the functions $y \in [c, d] \mapsto \int_a^b f(x, y)\,dx$ and $x \in [a, b] \mapsto \int_c^d f(x, y)\,dy$ are continuous.
Proof. The function $f$ is uniformly continuous on $D$ (Heine's theorem, see Theorem 8.4). Let $\varepsilon > 0$, and
take $\delta > 0$ such that $\|(x, y) - (x', y')\| \le \delta$ implies $|f(x, y) - f(x', y')| \le \varepsilon$. In particular, if we fix $x \in [a, b]$
then

$$|y - y'| \le \delta \ \Rightarrow\ |f(x, y) - f(x, y')| \le \varepsilon.$$

Integrating this inequality in $x \in [a, b]$ and using the triangle inequality for integrals,

$$|y - y'| \le \delta \ \Rightarrow\ \left| \int_a^b f(x, y)\,dx - \int_a^b f(x, y')\,dx \right| \le (b - a)\varepsilon.$$

This proves that the function $y \in [c, d] \mapsto \int_a^b f(x, y)\,dx$ is (uniformly) continuous. By permuting the roles
of $x$ and $y$, we also see that $x \in [a, b] \mapsto \int_c^d f(x, y)\,dy$ is uniformly continuous.

Remark 8.25. This lemma can fail if $f$ is only Riemann-integrable, that is, $f$ Riemann-integrable does not
imply that the functions $y \in [c, d] \mapsto \int_a^b f(x, y)\,dx$ and $x \in [a, b] \mapsto \int_c^d f(x, y)\,dy$ are Riemann-integrable:
see for instance Exercise 7.3.3 of the textbook Vector Calculus by Baxandall and Liebeck. This is one
of the “weaknesses” of the Riemann theory of integration (that is solved by the Lebesgue theory of
integration).
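Proof of Theorem 8.23. We start by choosing $\varepsilon > 0$. Let $P_1, P_2$ be partitions of $[a, b]$ and $[c, d]$ such that
$L(f, P_1 \otimes P_2) \ge U(f, P_1 \otimes P_2) - \varepsilon$. We write $P_1 = (x_0, x_1, \ldots, x_N)$ and $P_2 = (y_0, y_1, \ldots, y_M)$. For
$i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, M\}$ we write

$$l_{ij} = \min_{[x_{i-1}, x_i] \times [y_{j-1}, y_j]} f \quad \text{and} \quad u_{ij} = \max_{[x_{i-1}, x_i] \times [y_{j-1}, y_j]} f$$

for the minimal and maximal value of $f$, in such a way that

$$L(f, P_1 \otimes P_2) = \sum_{i=1}^{N} \sum_{j=1}^{M} (x_i - x_{i-1})(y_j - y_{j-1})\, l_{ij}, \qquad U(f, P_1 \otimes P_2) = \sum_{i=1}^{N} \sum_{j=1}^{M} (x_i - x_{i-1})(y_j - y_{j-1})\, u_{ij}.$$

We concentrate on the lower sum. Now, this is the important point: as we have finite sums, we can
first sum in $j$ and then sum in $i$ (that is identity (8.1)), that is:

$$L(f, P_1 \otimes P_2) = \sum_{i=1}^{N} (x_i - x_{i-1}) \left( \sum_{j=1}^{M} (y_j - y_{j-1})\, l_{ij} \right).$$

Let's fix an index $i$. Then for each $x \in [x_{i-1}, x_i]$, by using the definition of Riemann sums for
one-dimensional integrals,

$$\sum_{j=1}^{M} (y_j - y_{j-1})\, l_{ij} \le \int_c^d f(x, y)\,dy.$$

Then we take the minimum in $x$ on the right-hand side, that is

$$\sum_{j=1}^{M} (y_j - y_{j-1})\, l_{ij} \le \min_{x \in [x_{i-1}, x_i]} \int_c^d f(x, y)\,dy.$$

Next we sum in $i$, and again we use the definition of Riemann sums for one-dimensional integrals (but
this time for the function $x \mapsto \int_c^d f(x, y)\,dy$, which is Riemann-integrable thanks to Lemma 8.24):

$$\sum_{i=1}^{N} (x_i - x_{i-1}) \left( \sum_{j=1}^{M} (y_j - y_{j-1})\, l_{ij} \right) \le \sum_{i=1}^{N} (x_i - x_{i-1}) \min_{x \in [x_{i-1}, x_i]} \int_c^d f(x, y)\,dy \le \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx,$$

where we recall that the left-hand side is nothing else than $L(f, P_1 \otimes P_2)$. Doing the same reasoning with
the upper sum, one can see on the other hand that

$$U(f, P_1 \otimes P_2) = \sum_{i=1}^{N} (x_i - x_{i-1}) \left( \sum_{j=1}^{M} (y_j - y_{j-1})\, u_{ij} \right) \ge \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx.$$

To conclude we use the assumption that the partitions $P_1, P_2$ were chosen in such a way that $L(f, P_1 \otimes
P_2) \ge U(f, P_1 \otimes P_2) - \varepsilon$. It yields

$$\left| L(f, P_1 \otimes P_2) - \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx \right| \le \varepsilon \quad \text{and} \quad \left| U(f, P_1 \otimes P_2) - \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx \right| \le \varepsilon.$$

By definition of the integral, $\iint_D f$ also lies between $L(f, P_1 \otimes P_2)$ and $U(f, P_1 \otimes P_2)$, so it differs from the iterated integral by at most $\varepsilon$. As $\varepsilon > 0$ can be chosen arbitrarily small, we get the equality

$$\iint_{[a,b] \times [c,d]} f = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx.$$

We can then permute the roles of $x$ and $y$ to get the other equality.
Example 8.26. Let $D$ be the rectangle delimited by $0 \le x \le 2$ and $0 \le y \le 1$. We would like to compute

$$\iint_D \left( xy + 2x + xy^3 \right) dx\,dy.$$

Using Fubini, we find

$$\iint_D \left( xy + 2x + xy^3 \right) dx\,dy = \int_0^1 \left( \int_0^2 \left( xy + 2x + xy^3 \right) dx \right) dy = \int_0^1 \left[ \frac{y x^2}{2} + x^2 + \frac{y^3 x^2}{2} \right]_{x=0}^{2} dy = \int_0^1 \left( 2y + 4 + 2y^3 \right) dy = \left[ y^2 + 4y + \frac{y^4}{2} \right]_{y=0}^{1} = \frac{11}{2}.$$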
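
As a sanity check of Example 8.26, the iterated integral can be approximated numerically. The sketch below is only an illustration and not part of the notes: the helper name `iterated_midpoint` and the grid sizes are arbitrary choices, and it uses the midpoint rule in each variable, which is a different approximation from the lower and upper sums of Definition 8.9.

```python
def iterated_midpoint(f, a, b, c, d, n=200, m=200):
    """Approximate int_c^d ( int_a^b f(x, y) dx ) dy with the midpoint rule."""
    dx = (b - a) / n
    dy = (d - c) / m
    total = 0.0
    for j in range(m):
        y = c + (j + 0.5) * dy
        inner = 0.0
        for i in range(n):
            x = a + (i + 0.5) * dx
            inner += f(x, y) * dx      # inner integral in x, for fixed y
        total += inner * dy            # outer integral in y
    return total


def integrand(x, y):
    # Integrand of Example 8.26.
    return x * y + 2.0 * x + x * y ** 3


if __name__ == "__main__":
    approx = iterated_midpoint(integrand, 0.0, 2.0, 0.0, 1.0)
    print(approx, 11.0 / 2.0)
```

The approximation should be close to the exact value $11/2$ obtained by hand above.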

A useful case is when a function is the product of two functions, as the double integral becomes a
product of two one-dimensional integrals. Note that it is a particular case, and not every function can
be expressed as a product of a function of $x$ and a function of $y$. Note also that it works only when the
domain $D$ is a rectangle.
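Corollary 8.27. Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $g : [a, b] \to \mathbb{R}$, $h : [c, d] \to \mathbb{R}$ two
continuous functions of one variable. We define $f : D \to \mathbb{R}$ by $f(x, y) = g(x) h(y)$. Then

$$\iint_D f = \left( \int_a^b g(x)\,dx \right) \left( \int_c^d h(y)\,dy \right).$$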

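Proof. We use (twice) the linearity of the integral: starting from Theorem 8.23,

$$\iint_D f = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx = \int_a^b \left( \int_c^d g(x) h(y)\,dy \right) dx = \int_a^b g(x) \left( \int_c^d h(y)\,dy \right) dx = \left( \int_a^b g(x)\,dx \right) \left( \int_c^d h(y)\,dy \right).$$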

Example 8.28. Redo Example 8.26 using the linearity of the integral and the corollary we just proved.
Let us also state a theorem of Fubini for a domain which is not a rectangle. We will not do the proof
because it is more involved, but the basic idea is the same: go back to a partition of the domain into little
rectangles and sum over the rectangles in the right order.


Figure 8.5: Plot of the domain in Example 8.30: $D$ is the shaded domain between the curves $y = e^x$ and $y = \frac{x}{2} + 1$, boundaries included. It corresponds to one as described in Proposition 8.17.

Theorem 8.29 (Fubini for more general domains). Let $\psi, \varphi : [a, b] \to \mathbb{R}$ be two continuous functions defined
on an interval $[a, b]$ and valued in $\mathbb{R}$. We assume that $\psi(x) \le \varphi(x)$ for all $x \in [a, b]$. We define the domain

$$D = \left\{ (x, y) \in \mathbb{R}^2 : x \in [a, b] \text{ and } \psi(x) \le y \le \varphi(x) \right\}.$$

We take $f : D \to \mathbb{R}$ a continuous function. Then the function $x \in [a, b] \mapsto \int_{\psi(x)}^{\varphi(x)} f(x, y)\,dy$ is continuous
and

$$\iint_D f = \int_a^b \left( \int_{\psi(x)}^{\varphi(x)} f(x, y)\,dy \right) dx.$$

Of course we can make a similar statement if the domain $D$ has a nice decomposition in the $Oy$ direction.
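Example 8.30. Let $D$ be the domain delimited by the curves $x = 2$, $y = e^x$ and $y = \frac{1}{2} x + 1$. Let's
compute $\iint_D xy\,dx\,dy$. Though we are integrating a product of functions, the domain is not a rectangle
so the resulting integral is not the product of the integrals. After drawing the domain, see Figure 8.5, we
realize that we are in the framework of the theorem with $a = 0$, $b = 2$, $\psi(x) = \frac{1}{2} x + 1$ and $\varphi(x) = e^x$.
Thus

$$\iint_D xy\,dx\,dy = \int_0^2 \left( \int_{\frac{1}{2}x + 1}^{\exp(x)} xy\,dy \right) dx = \int_0^2 x \left[ \frac{y^2}{2} \right]_{y = 1 + \frac{1}{2}x}^{y = \exp(x)} dx = \frac{1}{2} \int_0^2 x \left( e^{2x} - \left( 1 + \frac{1}{2} x \right)^2 \right) dx$$

$$= \frac{1}{2} \int_0^2 \left( x e^{2x} - x - x^2 - \frac{x^3}{4} \right) dx = \frac{1}{2} \left[ \frac{1}{2} x e^{2x} - \frac{1}{4} e^{2x} - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{16} \right]_0^2 = \frac{3}{8} e^4 - \frac{65}{24},$$

where you can notice that we have used one integration by parts. As you can see, applying Fubini is just
one step; there is a tedious computation left.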
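
Since the computation in Example 8.30 is error-prone, a numerical cross-check is useful. The sketch below approximates the iterated integral of Theorem 8.29 by the midpoint rule, with the inner bounds depending on $x$. The helper name `integral_between_graphs` and the grid sizes are illustrative assumptions, not part of the notes; the value used for comparison is $\frac{3}{8} e^4 - \frac{65}{24}$, obtained by evaluating the antiderivative of the example between 0 and 2.

```python
import math


def integral_between_graphs(f, a, b, psi, phi, n=400, m=50):
    """Approximate int_a^b ( int_{psi(x)}^{phi(x)} f(x, y) dy ) dx by the midpoint rule."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        y_lo, y_hi = psi(x), phi(x)      # the inner bounds depend on x
        dy = (y_hi - y_lo) / m
        inner = 0.0
        for j in range(m):
            y = y_lo + (j + 0.5) * dy
            inner += f(x, y) * dy
        total += inner * dx
    return total


def example_8_30():
    # Domain of Example 8.30: 0 <= x <= 2 and x/2 + 1 <= y <= exp(x).
    return integral_between_graphs(
        lambda x, y: x * y, 0.0, 2.0, lambda x: 0.5 * x + 1.0, math.exp
    )


if __name__ == "__main__":
    closed_form = 3.0 / 8.0 * math.exp(4.0) - 65.0 / 24.0
    print(example_8_30(), closed_form)
```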


Figure 8.6: Understanding the Jacobian factor. On the left, in the $(u, v)$ plane, a small rectangle of area $\Delta u \Delta v$ (in red). On
the right, in the $(x, y)$ plane, the image of the rectangle by the change of variables $(x, y) = \varphi(u, v)$: it gives a “curved rectangle”
(also in red). This curved rectangle is approximated by the parallelogram with sides $\Delta u \frac{\partial \varphi}{\partial u}$ and $\Delta v \frac{\partial \varphi}{\partial v}$ (in
purple): this is precisely what differentiability means. The area of this parallelogram is $|\Delta u \Delta v \det D\varphi|$.
The link between areas and determinant was recalled in Chapter 1, see Figure 1.7.

8.4 Change of variables

Changing the variables is the other main tool to compute integrals of functions of several variables (usually
it happens in the following order: first doing a change of variables, and then ending with an integral that
Fubini's theorem can compute, see examples below). Let's assume that we have a function $f(x, y)$ and
we want to do a change of variables $(x, y) = \varphi(u, v)$ where $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$.
Let's first recall what happens in the one-dimensional case. We have $f : \mathbb{R} \to \mathbb{R}$, and we do the change of
variables $x = \varphi(u)$. The formula in this case is: for $a, b \in \mathbb{R}$,

$$\int_{\varphi(a)}^{\varphi(b)} f(x)\,dx = \int_a^b f(\varphi(u))\, \varphi'(u)\,du, \tag{8.2}$$

it holds for instance if $f$ is continuous and $\varphi$ is of class $C^1$. Informally, one can think about it as
“$dx = \varphi'(u)\,du$”. This makes sense: locally, an infinitesimal variation of length $du$ yields a variation in
$x = \varphi(u)$ of $dx = \varphi'(u)\,du$. The question is to understand what it becomes for functions $\varphi$ from $\mathbb{R}^2$ to $\mathbb{R}^2$.
Let's take $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ and think $(x, y) = \varphi(u, v)$. Locally, at a point $(u, v)$ the map $\varphi$ is well
approximated by its differential $D\varphi(u, v)$ (note that we use a slightly different notation compared to the
previous chapters, it should rather be $D\varphi_{(u,v)}$ but that would be too cumbersome). Then, a linear map
represented by a $2 \times 2$ matrix $M$, that is $(u, v) \mapsto M(u, v)$, distorts areas by multiplying them by a factor
$|\det(M)|$ (we have added an absolute value because areas here are not considered to be oriented). Combining
these two ideas, and see Figure 8.6 for an illustration, we arrive at the conclusion that one should write
$dx\,dy = |\det D\varphi(u, v)|\,du\,dv$. That is, in a change of variables one has to plug in the “determinant of the
Jacobian matrix” to take into account volume distortion. Note that in coordinates, $|\det D\varphi(u, v)|$ is the
real number defined as

$$|\det D\varphi(u, v)| = \left| \det \begin{pmatrix} \dfrac{\partial \varphi_1}{\partial u}(u, v) & \dfrac{\partial \varphi_1}{\partial v}(u, v) \\[2mm] \dfrac{\partial \varphi_2}{\partial u}(u, v) & \dfrac{\partial \varphi_2}{\partial v}(u, v) \end{pmatrix} \right| = \left| \frac{\partial \varphi_1}{\partial u}(u, v) \frac{\partial \varphi_2}{\partial v}(u, v) - \frac{\partial \varphi_1}{\partial v}(u, v) \frac{\partial \varphi_2}{\partial u}(u, v) \right|. \tag{8.3}$$

We state below a rigorous theorem, though we will not prove it (the proof can be very tedious and
amounts to quantifying properly what was written above).
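
The heuristic that a small square of area $h^2$ is mapped to a region of area roughly $|\det D\varphi|\,h^2$ can be observed numerically. The sketch below uses a hypothetical map $\varphi(u, v) = (u^2 - v^2, 2uv)$, chosen only for illustration (it does not appear in the notes), whose Jacobian determinant is $4(u^2 + v^2)$. The image of the square is approximated by the quadrilateral joining the images of its four corners, whose area is computed with the shoelace formula.

```python
def phi(u, v):
    # Hypothetical example map, chosen only for illustration.
    return (u * u - v * v, 2.0 * u * v)


def jacobian_det(u, v):
    # Exact determinant of D phi for the map above:
    # (2u)(2u) - (-2v)(2v) = 4 (u^2 + v^2).
    return 4.0 * (u * u + v * v)


def polygon_area(points):
    # Shoelace formula: unsigned area of a polygon given by its vertices in order.
    s = 0.0
    for k in range(len(points)):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % len(points)]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def area_ratio(u, v, h):
    # Ratio between the area of the quadrilateral joining the images of the
    # corners of the square [u, u + h] x [v, v + h] and the area h^2 of the square.
    corners = [phi(u, v), phi(u + h, v), phi(u + h, v + h), phi(u, v + h)]
    return polygon_area(corners) / (h * h)


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        print(h, area_ratio(1.0, 0.5, h), jacobian_det(1.0, 0.5))
```

As $h$ decreases, the ratio approaches the value of the Jacobian determinant at the corner $(u, v)$, which is the informal content of “$dx\,dy = |\det D\varphi(u, v)|\,du\,dv$”.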

Theorem 8.31 (Change of variables for integrals of 2 variables). Let $U$ be a Jordan measurable domain
and $\varphi : \tilde{U} \to \mathbb{R}^2$ be a function defined on an open set $\tilde{U}$ containing $U$. We assume that $\varphi$ is injective,
of class $C^1$ and that $\det D\varphi$ does not vanish. We define $D = \varphi(U)$ and take $f : D \to \mathbb{R}$ a continuous
function. Then $D$ is Jordan measurable and

$$\iint_D f(x, y)\,dx\,dy = \iint_U f(\varphi(u, v))\, |\det D\varphi(u, v)|\,du\,dv.$$

In the identity above $|\det D\varphi(u, v)|$ is the absolute value of the determinant of the Jacobian matrix of $\varphi$ at the point $(u, v)$ as
defined in (8.3).

Figure 8.7: Computation of the area of an ellipse, see Example 8.35. An ellipse $D$ (in blue) can be seen as
the image of a disk $U$ (in red) by a linear map $\varphi(u, v) = (au, bv)$.

Remark 8.32. Importantly we assume that $\varphi$ is injective whereas it was not the case in (8.2): indeed,
in one dimension the integrals are signed, and if $\varphi$ is not injective then $\varphi'$ changes sign and there will
be compensations that ensure (8.2) holds. As mentioned in Remark 8.12 such an effect cannot happen
for two-dimensional integrals (and this is also linked to the absolute value around $\det D\varphi$), thus one has to
assume that $\varphi$ is injective.
Remark 8.33. The assumption that $\det D\varphi \neq 0$ is needed for the proof but it's more a technical issue than
a conceptual one, as this kind of assumption can be removed in the Lebesgue theory of integration. A
function $\varphi : U \to \mathbb{R}^2$ of class $C^1$ such that $\varphi$ is injective and $\det D\varphi \neq 0$ is actually a $C^1$-diffeomorphism
onto its image $D = \varphi(U)$, in the sense that $\varphi^{-1} : D \to U$ is also of class $C^1$.

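Corollary 8.34. Let $M$ be a $2 \times 2$ matrix with $\det M \neq 0$. We define $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ by

$$\varphi(u, v) = M \begin{pmatrix} u \\ v \end{pmatrix}.$$

If $U$ is a Jordan measurable domain, we define $D = \varphi(U)$; it is also a Jordan measurable domain.
Moreover, if $f : D \to \mathbb{R}$ is continuous, there holds

$$\iint_D f(x, y)\,dx\,dy = |\det M| \iint_U f\left( M \begin{pmatrix} u \\ v \end{pmatrix} \right) du\,dv.$$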

Proof. We apply Theorem 8.31. If $\det M \neq 0$ then we know that the map $\varphi$ is a bijection. Moreover,
$|\det D\varphi|$ is constant and equal to $|\det M|$.
Example 8.35 (Area of an ellipse). Let us compute the area of an ellipse, that is, we fix $a, b > 0$ and we
want to compute the area of

$$D = \left\{ (x, y) \in \mathbb{R}^2 : \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \right\}.$$

Let's define $U$ as the unit disk, that is, $U = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$. We also define $M$ the $2 \times 2$
matrix given by

$$M = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}.$$

We associate to it $\varphi : (u, v) \mapsto M(u, v) = (au, bv)$. Then $D = \varphi(U)$ (exercise: prove it rigorously), thus

$$A(D) = \iint_D dx\,dy = |\det M| \iint_U du\,dv = |\det M|\, A(U).$$

As $A(U) = \pi$ (this is geometry) and $|\det M| = ab$, we conclude that the area of the ellipse with parameters
$a, b$ is $\pi a b$.
A central change of variables is the one in polar coordinates, which is useful when the domain of
integration and/or the function have some radial symmetry.
Corollary 8.36 (Change of variables in polar coordinates). Let $D$ be a Jordan measurable domain of
$\mathbb{R}^2$. We define the domain $U$ as

$$U = \left\{ (r, \theta) \in [0, +\infty) \times [0, 2\pi) : \begin{pmatrix} r \cos(\theta) \\ r \sin(\theta) \end{pmatrix} \in D \right\}.$$

The domain $U$ is Jordan measurable and if the function $f$ is continuous over $D$ there holds

$$\iint_D f(x, y)\,dx\,dy = \iint_U f(r \cos(\theta), r \sin(\theta))\, r\,dr\,d\theta.$$

Proof. Behind the proof is the function $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ defined as

$$\varphi(r, \theta) = \begin{pmatrix} r \cos(\theta) \\ r \sin(\theta) \end{pmatrix}.$$

This function is injective and of class $C^1$ on $(0, +\infty) \times [0, 2\pi)$. Note that we do not fall exactly in
the framework of Theorem 8.31 as $\varphi$ is not injective on $\{0\} \times [0, 2\pi)$. However, as the region where $r = 0$
has an area of 0 (it corresponds to a line in $\mathbb{R}^2$), it does not pose a problem.
Then we compute the Jacobian matrix of $\varphi$, which yields:

$$D\varphi(r, \theta) = \begin{pmatrix} \cos(\theta) & -r \sin(\theta) \\ \sin(\theta) & r \cos(\theta) \end{pmatrix},$$

and a straightforward computation leads to

$$\det D\varphi(r, \theta) = r \cos^2(\theta) + r \sin^2(\theta) = r.$$

It vanishes for $r = 0$ but as we said, as this region has an area of zero it does not change the value of the
integral.
Finally, up to this technical issue that we have to “remove” the region where $r = 0$, we use
Theorem 8.31.
Example 8.37. Let $D$ be the unit disk in $\mathbb{R}^2$, that is, $D = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$. Let's
compute

$$\iint_D \frac{dx\,dy}{1 + x^2 + y^2}.$$

Both the geometry of the domain and the presence of $x^2 + y^2 = r^2$ invite us to do a change of variables in
polar coordinates.
Let's take $U = [0, 1] \times [0, 2\pi)$ in such a way that $D = \varphi(U)$ for $\varphi(r, \theta) = (r \cos(\theta), r \sin(\theta))$.
Applying Corollary 8.36 with $x^2 + y^2 = r^2$,

$$\iint_D \frac{dx\,dy}{1 + x^2 + y^2} = \iint_U \frac{r}{1 + r^2}\,dr\,d\theta.$$

Now the domain $U$ is a rectangle so we can use Fubini! Recognizing in $r / (1 + r^2)$ the derivative of
$\frac{1}{2} \ln(1 + r^2)$ we get

$$\iint_U \frac{r}{1 + r^2}\,dr\,d\theta = \int_0^{2\pi} \left( \int_0^1 \frac{r}{1 + r^2}\,dr \right) d\theta = \int_0^{2\pi} \left[ \frac{1}{2} \ln(1 + r^2) \right]_{r=0}^{1} d\theta = \int_0^{2\pi} \frac{\ln(2)}{2}\,d\theta = \pi \ln(2).$$
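
The result of Example 8.37 can be checked numerically in two ways: in polar coordinates, with the extra factor $r$ of Corollary 8.36, and directly in Cartesian coordinates, extending the integrand by 0 outside the disk as in Definition 8.14. The sketch below is an illustration only; the function names and grid sizes are arbitrary choices, and both approximations use the midpoint rule.

```python
import math


def f(x, y):
    # Integrand of Example 8.37.
    return 1.0 / (1.0 + x * x + y * y)


def polar_integral(n_r=200, n_theta=200):
    """Midpoint rule on U = [0, 1] x [0, 2*pi) for f(r cos t, r sin t) * r."""
    dr = 1.0 / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            # The factor r is |det D phi(r, theta)| for the polar map.
            total += f(r * math.cos(theta), r * math.sin(theta)) * r * dr * dtheta
    return total


def cartesian_integral(n=400):
    """Midpoint rule on the square [-1, 1]^2 for f extended by 0 outside the unit disk."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += f(x, y) * h * h
    return total


if __name__ == "__main__":
    print(polar_integral(), cartesian_integral(), math.pi * math.log(2.0))
```

The polar version converges much faster here because the integrand is smooth on the rectangle $U$, whereas the Cartesian version has to resolve the jump of the extended function across the unit circle.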

265 pages
Classnotes Ma1101
No ratings yet
Classnotes Ma1101
139 pages
Complete
No ratings yet
Complete
216 pages
Classnotes Ma1101 PDF
No ratings yet
Classnotes Ma1101 PDF
134 pages
MECH2407 Lec 2021
100% (1)
MECH2407 Lec 2021
81 pages
104m
No ratings yet
104m
159 pages
Lectures Compressed (3985) MAT135
No ratings yet
Lectures Compressed (3985) MAT135
300 pages
ST208 - Chapters 1-6
No ratings yet
ST208 - Chapters 1-6
315 pages
Support 1 Annee 2023-24
No ratings yet
Support 1 Annee 2023-24
139 pages
Mul Downey
No ratings yet
Mul Downey
189 pages
MH1810 Notes 2023 (Part1)
No ratings yet
MH1810 Notes 2023 (Part1)
82 pages
Semiparametric Regression for the Social Sciences
From Everand
Semiparametric Regression for the Social Sciences
Luke John Keele
3/5 (1)
Stochastic Methods and their Applications to Communications: Stochastic Differential Equations Approach
From Everand
Stochastic Methods and their Applications to Communications: Stochastic Differential Equations Approach
Serguei Primak
No ratings yet
Applied Computational Fluid Dynamics Techniques: An Introduction Based on Finite Element Methods
From Everand
Applied Computational Fluid Dynamics Techniques: An Introduction Based on Finite Element Methods
Rainald Löhner
No ratings yet
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods
From Everand
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods
Joseph Keshet
No ratings yet
Location-Based Services: Fundamentals and Operation
From Everand
Location-Based Services: Fundamentals and Operation
Axel Küpper
No ratings yet
Presentations with LaTeX: Which package, which command, which syntax?
From Everand
Presentations with LaTeX: Which package, which command, which syntax?
Herbert Voß
No ratings yet
SDH / SONET Explained in Functional Models: Modeling the Optical Transport Network
From Everand
SDH / SONET Explained in Functional Models: Modeling the Optical Transport Network
Huub van Helvoort
No ratings yet
Basic Research and Technologies for Two-Stage-to-Orbit Vehicles: Final Report of the Collaborative Research Centres 253, 255 and 259
From Everand
Basic Research and Technologies for Two-Stage-to-Orbit Vehicles: Final Report of the Collaborative Research Centres 253, 255 and 259
Dieter Jacob
No ratings yet
Control Systems
From Everand
Control Systems
Francisco Luis Pagola y de las Heras
No ratings yet
Lecture 13 - Least Squares
No ratings yet
Lecture 13 - Least Squares
28 pages
Upsc Cse Free Test Series 2024
No ratings yet
Upsc Cse Free Test Series 2024
6 pages
DLD - Ch.1 Notes PDF
No ratings yet
DLD - Ch.1 Notes PDF
35 pages
Core Mathematics C3: Pearson Edexcel GCE
No ratings yet
Core Mathematics C3: Pearson Edexcel GCE
32 pages
Week 7 Practice Questions-1
No ratings yet
Week 7 Practice Questions-1
4 pages
Blade Element Theory
No ratings yet
Blade Element Theory
6 pages
FALLSEM2018-19 - MAT5009 - TH - TT531 - VL2018191004951 - Reference Material I - 01 - MAT 5009 - ADVANCED COMPUTER ARITHMETIC PDF
No ratings yet
FALLSEM2018-19 - MAT5009 - TH - TT531 - VL2018191004951 - Reference Material I - 01 - MAT 5009 - ADVANCED COMPUTER ARITHMETIC PDF
13 pages
Control System Kec602
No ratings yet
Control System Kec602
3 pages
The System Building Blueprint
No ratings yet
The System Building Blueprint
8 pages
Modeling CO2 Storage in Aquifers With A Fully-Coup
No ratings yet
Modeling CO2 Storage in Aquifers With A Fully-Coup
17 pages
Chapter 8 Review Activity Word
No ratings yet
Chapter 8 Review Activity Word
6 pages
MST121 Course Guide
No ratings yet
MST121 Course Guide
12 pages
Mechanical Engineering - Yale University
No ratings yet
Mechanical Engineering - Yale University
7 pages
Paper 3
No ratings yet
Paper 3
36 pages
Vol-IV 2 of 2
No ratings yet
Vol-IV 2 of 2
103 pages
1 s2.0 S0142112324000239 Main
No ratings yet
1 s2.0 S0142112324000239 Main
13 pages
10 1 1 703 8121 PDF
No ratings yet
10 1 1 703 8121 PDF
41 pages
There Are Some Laws of Indices Which Are As Follows
No ratings yet
There Are Some Laws of Indices Which Are As Follows
50 pages
Scaffold Erection
No ratings yet
Scaffold Erection
65 pages
Banking - RRB PO Prelims Full Length Mock Test 1 PYQ - English
No ratings yet
Banking - RRB PO Prelims Full Length Mock Test 1 PYQ - English
36 pages
Joseph, Kristen - thesis
No ratings yet
Joseph, Kristen - thesis
57 pages
Learner Guide For Cambridge IGCSE Physics 0972
No ratings yet
Learner Guide For Cambridge IGCSE Physics 0972
48 pages
Ellipseregular 2
No ratings yet
Ellipseregular 2
4 pages
Physics Homework 62
100% (1)
Physics Homework 62
4 pages
Bmos Mentoring Scheme (Senior Level) December 2012 (Sheet 3) Solutions
No ratings yet
Bmos Mentoring Scheme (Senior Level) December 2012 (Sheet 3) Solutions
6 pages
Std08 II Maths EM PDF
No ratings yet
Std08 II Maths EM PDF
76 pages
G10-Q3-DLL-W5
100% (1)
G10-Q3-DLL-W5
5 pages
Mathematics of Public Key Cryptography 1st Edition by Steven Galbraith 1107013925 9781107013926 - The 2025 ebook edition is available with updated content
100% (4)
Mathematics of Public Key Cryptography 1st Edition by Steven Galbraith 1107013925 9781107013926 - The 2025 ebook edition is available with updated content
76 pages