Lecture Notes MA 2022
Hugo LAVENANT
Contents
I Differentiation
2 Parametric curves
2.1 Definition and representation
2.2 Derivatives
2.3 Taylor expansion and local behavior
2.4 A remark on the mean value theorem
2.5 Curves in polar coordinates
3 Notions of topology in $\mathbb{R}^d$
3.1 Limits
3.2 Neighborhood, interior and closure
3.3 Open and closed sets
Bocconi University – course 30543 (Mathematical Analysis module 2)
II Integration
7 Path integrals
7.1 Integral of a scalar field
7.2 Independence of the parametrization
7.3 Length of a curve
7.4 Integral of a vector field
Chapter 1
In most of the course, we will deal with functions of several variables whose codomain is not
necessarily the real line, that is, functions which are defined on and/or take their values in the set $\mathbb{R}^d$.
Formally, $\mathbb{R}^d$ is the space of tuples of $d$ real numbers. In the first semester, you studied this space
from the point of view of linear algebra; in the rest of the course we will deal with its topological and
differential properties.
Note that the convention in these notes is that vectors are written in a bold font. In practice, we
represent $\mathbb{R}^d$ by a plane when $d = 2$, and by the 3-dimensional space when $d = 3$. Moreover, in the case
of $\mathbb{R}^2$ a typical point is $(x, y)$ while in $\mathbb{R}^3$ it is $(x, y, z)$. The canonical basis of $\mathbb{R}^d$ will be denoted by
$(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d)$, that is, $\mathbf{e}_i$ is the vector in $\mathbb{R}^d$ whose components are all equal to 0 except for the $i$-th one,
which is equal to 1.
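The space $\mathbb{R}^d$ is a vector space, that is, we can add vectors of $\mathbb{R}^d$ by adding them componentwise: if
$\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ then
\[
\mathbf{x} + \mathbf{y} = \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_d \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_d + y_d \end{pmatrix}.
\]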
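We will write a vector indifferently as a row or a column vector, using the latter when it makes the operations
clearer. Moreover, we can multiply a vector by a scalar by multiplying each component by the scalar: if
$\mathbf{x} \in \mathbb{R}^d$ and $a \in \mathbb{R}$,
\[
a\mathbf{x} = a \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} := \begin{pmatrix} a x_1 \\ \vdots \\ a x_d \end{pmatrix}.
\]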
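\[
f(t) = \begin{pmatrix} 3 - 2t \\ 1 + t \end{pmatrix}.
\]
This is a function which takes a single number $t \in \mathbb{R}$ and outputs a point $\mathbf{x} = f(t) = (3 - 2t, 1 + t)$.
Actually, we can decompose it as
\[
f(t) = \begin{pmatrix} 3 - 2t \\ 1 + t \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \end{pmatrix}.
\]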
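[Figure: the sum $\mathbf{x} + \mathbf{y}$ of two vectors and the scalar multiples $2\mathbf{y}$ and $-\frac{1}{2}\mathbf{x}$; the line $f(\mathbb{R})$ with the point $f(1)$ and the vectors $(-2, 1)$ and $t(-2, 1)$.]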
How to plot $f(t)$? With the decomposition above, the procedure reads as follows: (i) start from the point $(3, 1)$;
(ii) attach to this point the vector $t(-2, 1)$, which is colinear to $(-2, 1)$; (iii) at the end of this
vector you find $f(t)$.
Actually, in the example above, an element $\mathbf{x} \in \mathbb{R}^d$ can be thought of either as a point, that is as a
location in space, or as a vector, that is as an arrow which has a direction and a length (but no fixed
“origin”). Specifically, $(3, 1)$ is a point and $t(-2, 1)$ is a vector. In the first case we represent it as an
actual point, while in the second case we represent it with an arrow (by convention an arrow joining the
origin $\mathbf{0}$ to the point $\mathbf{x}$, but the origin of the vector does not matter). However, just by looking at the
coordinates, that is at the mathematical object $\mathbf{x} \in \mathbb{R}^d$, it is not possible to distinguish between the two
interpretations. The take-home message is: when doing formal manipulations, only think of elements of
$\mathbb{R}^d$ as collections of numbers; to picture them, sometimes picture them as points and sometimes
as vectors (the difference between “points” and “vectors” depends on the context).
Chapter 1. The Euclidean space Rd
here canonical because on $\mathbb{R}^d$ there exists more than one dot product, but we work here with the “most
natural” one. A study of vector spaces endowed with arbitrary dot products can be done but is beyond
the scope of this course.
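Definition 1.3. If $\mathbf{x}, \mathbf{y}$ belong to $\mathbb{R}^d$, we define their dot product $\mathbf{x} \cdot \mathbf{y} \in \mathbb{R}$ by
\[
\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{d} x_i y_i.
\]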
The dot product between two vectors is a scalar, that is, a real number.
Example 1.4. If x p2, 1, 3q and y p1, 0, 2q then
Remark 1.5. With $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d)$ the canonical basis of $\mathbb{R}^d$ there holds $\mathbf{x} \cdot \mathbf{e}_i = x_i$, that is, taking the dot
product with the $i$-th element of the canonical basis yields the $i$-th coordinate.
The following properties of the dot product are in fact the ones required for a function $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$
to be called a dot product.
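Proposition 1.6. Let $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^d$ and $a, b \in \mathbb{R}$. Then
(i) Symmetry: $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$.
(ii) Linearity with respect to the first variable: $(a\mathbf{x} + b\mathbf{y}) \cdot \mathbf{z} = a(\mathbf{x} \cdot \mathbf{z}) + b(\mathbf{y} \cdot \mathbf{z})$.
(iii) Positive-definiteness: $\mathbf{x} \cdot \mathbf{x} \ge 0$, and $\mathbf{x} \cdot \mathbf{x} = 0$ if and only if $\mathbf{x} = \mathbf{0}$.
Note that by combining (i) and (ii) we get $\mathbf{x} \cdot (a\mathbf{y} + b\mathbf{z}) = a(\mathbf{x} \cdot \mathbf{y}) + b(\mathbf{x} \cdot \mathbf{z})$, that is, linearity with respect
to the second variable. The map $(\mathbf{x}, \mathbf{y}) \mapsto \mathbf{x} \cdot \mathbf{y}$ is said to be bilinear.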
Proof. The proof of (i) and (ii) follows from the definition. Then for (iii) notice that
\[
\mathbf{x} \cdot \mathbf{x} = \sum_{i=1}^{d} (x_i)^2.
\]
We get a sum of squares which are all non-negative, and the sum is equal to 0 if and only if each term is
equal to 0.
Definition 1.7. Two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^d$ are said to be orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$.
This corresponds to the intuitive notion: the lines directed by $\mathbf{x}$ and $\mathbf{y}$ are perpendicular.
Definition 1.8. The norm, or length, of a vector $\mathbf{x}$ is denoted $\|\mathbf{x}\|$ and defined by
\[
\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}}.
\]
In coordinates,
\[
\|\mathbf{x}\| = \sqrt{\sum_{i=1}^{d} (x_i)^2}.
\]
This corresponds to the notion of length that one knows from everyday life. Note that the norm of a
vector is the “length” of the arrow representing it. However, to compute a distance between points, one
has to compute the length of the vector joining the two points. That is, the distance between $\mathbf{x}$ and $\mathbf{y}$ is
$\|\mathbf{x} - \mathbf{y}\|$, the norm of the vector $\mathbf{x} - \mathbf{y}$.
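The definitions of the dot product, the norm and the distance translate directly into a few lines of code. The following is a minimal sketch in Python, not part of the original notes; the function names `dot`, `norm` and `distance` and the sample vectors are illustrative assumptions.

```python
import math

def dot(x, y):
    """Canonical dot product of two vectors of the same dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean norm: the square root of x . x."""
    return math.sqrt(dot(x, x))

def distance(x, y):
    """Distance between the points x and y: the norm of x - y."""
    return norm([xi - yi for xi, yi in zip(x, y)])

# Illustrative vectors (not taken from the notes).
u = [3.0, 4.0]
v = [0.0, 1.0]
print(dot(u, v))       # 4.0
print(norm(u))         # 5.0
print(distance(u, v))  # norm of (3, 3), that is sqrt(18)
```

Note that `distance` is built from `norm`, which is itself built from `dot`, mirroring the order of the definitions above.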
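Proposition 1.9 (Cauchy-Schwarz inequality). Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$. Then
\[
|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\| \|\mathbf{y}\|,
\]
and there is equality if and only if $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent.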
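Proof. If $\mathbf{y} = \mathbf{0}$ then both sides are 0, and $\mathbf{x}, \mathbf{y}$ are linearly dependent, thus the result holds. Let's now
assume $\mathbf{y} \neq \mathbf{0}$, which is equivalent to $\|\mathbf{y}\| > 0$.
The idea is to look at the real-valued function $f : t \in \mathbb{R} \mapsto \|\mathbf{x} + t\mathbf{y}\|^2$. It is
actually a polynomial of degree exactly 2 (as $\|\mathbf{y}\| > 0$):
\[
f(t) = (\mathbf{x} + t\mathbf{y}) \cdot (\mathbf{x} + t\mathbf{y}) = t^2 (\mathbf{y} \cdot \mathbf{y}) + 2t (\mathbf{x} \cdot \mathbf{y}) + (\mathbf{x} \cdot \mathbf{x}).
\]
On the other hand, we always have $f(t) \ge 0$ as the dot product is positive-definite! Thus by the criterion
for non-negativity of a polynomial of degree 2, its discriminant is non-positive:
\[
\Delta = (2\, \mathbf{x} \cdot \mathbf{y})^2 - 4 (\mathbf{x} \cdot \mathbf{x})(\mathbf{y} \cdot \mathbf{y}) \le 0.
\]
Reshuffling the terms and using the definition of the norm yields the result after taking the square root.
Moreover, if there is equality in the Cauchy-Schwarz inequality then $\Delta = (2\, \mathbf{x} \cdot \mathbf{y})^2 - 4 (\mathbf{x} \cdot \mathbf{x})(\mathbf{y} \cdot \mathbf{y}) = 0$,
thus the polynomial $f$ has a root: there exists $t_0 \in \mathbb{R}$ such that $f(t_0) = 0$. But for such a $t_0$ there holds
$f(t_0) = \|\mathbf{x} + t_0 \mathbf{y}\|^2 = 0$, thus $\|\mathbf{x} + t_0 \mathbf{y}\| = 0$, hence $\mathbf{x} + t_0 \mathbf{y} = \mathbf{0}$: this shows that $\mathbf{x}$ and $\mathbf{y}$ are linearly
dependent.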
Remark 1.10. In coordinates this reads:
\[
\left| \sum_{i=1}^{d} x_i y_i \right| \le \sqrt{\sum_{i=1}^{d} (x_i)^2} \, \sqrt{\sum_{i=1}^{d} (y_i)^2},
\]
and it's actually not obvious that the inequality holds. We hope you can appreciate the neat proof of
Cauchy-Schwarz above.
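Remark 1.11. The geometric definition of the dot product is
\[
\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta),
\]
where $\theta$ is the angle between the vectors $\mathbf{x}$ and $\mathbf{y}$. This is consistent with Definition 1.7 as, if $\mathbf{x}, \mathbf{y} \neq \mathbf{0}$,
then $\mathbf{x} \cdot \mathbf{y} = 0$ if and only if $\cos(\theta) = 0$, that is $\theta = \pi/2$ (modulo $\pi$).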
Here, with our axiomatic construction, we have started from the definition of the dot product with the
formula $\sum_i x_i y_i$. To recover a geometric interpretation we then define the angle $\theta$ in such a way that the
formula holds. Specifically, one defines the angle $\theta$ between two non-zero vectors by
\[
\theta = \arccos\left( \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|} \right).
\]
The Cauchy-Schwarz inequality guarantees that $\frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|} \in [-1, 1]$, thus it can be written as the cosine of an
angle. Actually, this defines $\theta \in [0, \pi]$. What is missing is the sign of the angle, which comes
from an orientation of the space (and cannot be obtained just with the dot product).
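As an illustration of the arccos formula, here is a minimal Python sketch (not part of the original notes; the function name `angle` is an assumption). The clamping step is only there because floating-point rounding can push the quotient slightly outside $[-1, 1]$, even though Cauchy-Schwarz forbids it in exact arithmetic.

```python
import math

def angle(x, y):
    """Angle in [0, pi] between two nonzero vectors, computed with arccos."""
    dot_xy = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("the angle is only defined for nonzero vectors")
    cosine = dot_xy / (norm_x * norm_y)
    # Cauchy-Schwarz gives |cosine| <= 1 in exact arithmetic;
    # the clamp only absorbs floating-point rounding.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)

print(angle([1.0, 0.0], [0.0, 2.0]))    # orthogonal vectors: pi/2
print(angle([1.0, 1.0], [-2.0, -2.0]))  # opposite directions: approximately pi
```

As the text points out, the returned value is always in $[0, \pi]$: the function cannot tell a clockwise angle from a counterclockwise one.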
The Cauchy-Schwarz inequality is the central one when dealing with dot products. One of its
consequences is the triangle inequality.
Proposition 1.12 (Triangle inequality). If $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ then
\[
\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|.
\]
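Proof. We square all terms: indeed
\[
\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \|\mathbf{x}\|^2 + 2\, \mathbf{x} \cdot \mathbf{y} + \|\mathbf{y}\|^2.
\]
Then we use Cauchy-Schwarz on the dot product, and recognize the square of the right-hand side:
\[
\|\mathbf{x} + \mathbf{y}\|^2 \le \|\mathbf{x}\|^2 + 2 \|\mathbf{x}\| \|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.
\]
Taking the square root yields the result.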
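[Figure: the triangle inequality $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$, illustrated with the vectors $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{x} + \mathbf{y}$.]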
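The key properties of the norm $\|\cdot\|$ are summarized in the following proposition.
Proposition 1.13. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and $a \in \mathbb{R}$. Then
(i) $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$.
(ii) $\|a\mathbf{x}\| = |a| \, \|\mathbf{x}\|$.
(iii) $\|\mathbf{x}\| \ge 0$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.
Proof. We have already proved (i) in Proposition 1.12. The proof of (ii) follows from $(a\mathbf{x}) \cdot (a\mathbf{x}) = a^2 (\mathbf{x} \cdot \mathbf{x})$
and the definition of the norm. Finally, (iii) is just a rewriting of (iii) of Proposition 1.6.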
A useful consequence of (ii) is the following: if $\mathbf{x} \neq \mathbf{0}$, then $\frac{\mathbf{x}}{\|\mathbf{x}\|}$ is the only vector which is positively
colinear to $\mathbf{x}$ and of norm 1.
You may have already seen the last theorem of this section in a geometry course, but we rephrase it
here in our language.
Proposition 1.14 (Pythagoras’s theorem). Let $\mathbf{x}$ and $\mathbf{y}$ be two orthogonal vectors in $\mathbb{R}^d$. Then
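[Figure 1.5: the law of cosines in a triangle $ABC$ with $\mathbf{x} = AB$, $\mathbf{y} = AC$ and $\theta = \widehat{BAC}$: the identity $\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\, \mathbf{x} \cdot \mathbf{y}$ reads $BC^2 = AB^2 + AC^2 - 2\, AB \, AC \cos(\widehat{BAC})$.]
[Figure: the plane passing through the point $(-2, 0, 1)$ and normal to the vector $(3, 2, 0)$.]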
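Remark 1.15. As we saw in the proof of the Theorem, we can say something even if $\mathbf{x}$ and $\mathbf{y}$ are not
orthogonal. Indeed, using $\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta)$, we can see that for any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$
\[
\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2 \|\mathbf{x}\| \|\mathbf{y}\| \cos(\theta),
\]
which in a triangle $ABC$ reads
\[
BC^2 = AB^2 + AC^2 - 2\, AB \, AC \cos(\widehat{BAC}),
\]
and is known in geometry as “the law of cosines” (see Figure 1.5). Pythagoras’s theorem corresponds
to the case $\widehat{BAC} = \frac{\pi}{2}$.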
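Example 1.16. As an application of what we saw, we can write the equation in $\mathbb{R}^3$ of the plane passing
through a point $\mathbf{x}_0$ and with a normal vector $\mathbf{n}$. By definition, this is the set of points $\mathbf{x}$ such that the
vector joining $\mathbf{x}_0$ to $\mathbf{x}$ is orthogonal to $\mathbf{n}$. Mathematically, it reads
\[
\{ \mathbf{x} \in \mathbb{R}^3 : (\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0 \}.
\]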
For instance, the set a point x px, y, zq belongs to the plane passing through p2, 1, 0q and normal to
p3, 2, 0q if and only if
3px 2q 2py 1q 0.
Figure 1.7: In $\mathbb{R}^2$ (left), $|\det(\mathbf{x}, \mathbf{y})|$ corresponds to the area of the parallelogram delimited by the dashed
purple lines. In $\mathbb{R}^3$ (right), $|\det(\mathbf{x}, \mathbf{y}, \mathbf{z})|$ corresponds to the volume of the prism delimited by the dashed
purple lines.
Importantly, the absolute value of $\det(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)})$ corresponds to the $d$-dimensional volume of the solid
built on the vectors $(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(d)})$, see Figure 1.7. The sign then depends on the orientation of the family.
What is remarkable is that these $d$-dimensional volumes have a clean algebraic expression (namely the
formula for the determinant that you learned in algebra).
Now let’s turn to the object of this section: in $\mathbb{R}^3$, one can define the cross product of two vectors,
and it gives a vector (and not a scalar). As for the dot product, we start from an analytical expression
before giving a geometrical interpretation.
Definition 1.17. Let $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{y} = (y_1, y_2, y_3)$ be two vectors of $\mathbb{R}^3$. We define their cross
product $\mathbf{x} \times \mathbf{y} \in \mathbb{R}^3$ as
\[
\mathbf{x} \times \mathbf{y} = \begin{pmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{pmatrix}.
\]
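Importantly, we can express the different components with determinants as
\[
(\mathbf{x} \times \mathbf{y})_1 = \det \begin{pmatrix} x_2 & y_2 \\ x_3 & y_3 \end{pmatrix}, \quad
(\mathbf{x} \times \mathbf{y})_2 = -\det \begin{pmatrix} x_1 & y_1 \\ x_3 & y_3 \end{pmatrix}, \quad \text{and} \quad
(\mathbf{x} \times \mathbf{y})_3 = \det \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}.
\]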
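The coordinate formula for the cross product is easy to check numerically. The following Python sketch is not part of the original notes; the function names and the sample vectors are illustrative assumptions. It computes $\mathbf{x} \times \mathbf{y}$ from the three components and verifies on one example that the result is orthogonal to both $\mathbf{x}$ and $\mathbf{y}$.

```python
def cross(x, y):
    """Cross product of two vectors of R^3, from the coordinate formula."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    return [x2 * y3 - x3 * y2,
            x3 * y1 - x1 * y3,
            x1 * y2 - x2 * y1]

def dot(x, y):
    """Canonical dot product."""
    return sum(a * b for a, b in zip(x, y))

# The first two vectors of the canonical basis give the third one.
e1, e2 = [1, 0, 0], [0, 1, 0]
print(cross(e1, e2))  # [0, 0, 1]

# Illustrative vectors (not taken from the notes).
x, y = [1, 2, 3], [4, 5, 6]
n = cross(x, y)
print(n)                     # [-3, 6, -3]
print(dot(n, x), dot(n, y))  # 0 0
```

Swapping the arguments, `cross(y, x)`, flips the sign of every component.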
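[Figure: the cross product $\mathbf{x} \times \mathbf{y}$ of two vectors $\mathbf{x}$ and $\mathbf{y}$, shown with the canonical basis $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$; the area of the parallelogram generated by $\mathbf{x}$ and $\mathbf{y}$ is $\|\mathbf{x}\| \|\mathbf{y}\| |\sin(\theta)| = \|\mathbf{x} \times \mathbf{y}\|$.]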
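Remark 1.19. One can instead define $\mathbf{x} \times \mathbf{y}$ as the unique vector such that $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}$ holds
for all $\mathbf{z}$, and then retrieve Definition 1.17 as a consequence.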
Though the formula for the cross product is a bit cumbersome, it has a nice geometrical interpretation.
Actually, it is remarkable that the geometric “definition” below has an analytical counterpart.
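(ii) The norm $\|\mathbf{x} \times \mathbf{y}\|$ is equal to $\|\mathbf{x}\| \|\mathbf{y}\| |\sin(\theta)|$ where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$. That is, $\|\mathbf{x} \times \mathbf{y}\|$
coincides with the area of the parallelogram generated by $\mathbf{x}$ and $\mathbf{y}$.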
Remark 1.21. You can convince yourself that the three properties of Proposition 1.20 define a unique
vector. Indeed, if $\mathbf{x}$ and $\mathbf{y}$ are not colinear (if they are, then $\mathbf{x} \times \mathbf{y} = \mathbf{0}$), then they define a plane and
there exists only one line passing through $\mathbf{0}$ and normal to this plane. Thanks to (i) we know that $\mathbf{x} \times \mathbf{y}$
must lie on this line. Then (ii) gives us the norm of $\mathbf{x} \times \mathbf{y}$, and (iii) its direction.
Remark 1.22. Before doing the proof, let us expand on what we mean in (iii) by “positive” orientation. Take
$\mathbf{x}, \mathbf{y}, \mathbf{z}$ three vectors of $\mathbb{R}^3$ which form a basis. In particular, we must have $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) \neq 0$. We say that
the basis has positive (resp. negative) orientation if $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) > 0$ (resp. $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) < 0$). Intuitively,
we can move between two bases with the same orientation thanks to a physical deformation of the space,
while to move between two bases with different orientations we need a reflection, that is, to “reflect one
of the bases in a mirror”.
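\[
\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}.
\]
Taking $\mathbf{z} = \mathbf{x}$, the left-hand side vanishes (it is a determinant with two identical vectors), thus
$(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{x} = 0$. Similarly, $(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{y} = 0$. This proves (i) of Proposition 1.20.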
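Then for (iii) we take $\mathbf{z} = \mathbf{x} \times \mathbf{y}$ and use the positivity of the dot product.
Finally, to get (ii), one needs to take $\mathbf{z}$ on the line perpendicular to the plane spanned by $\mathbf{x}$
and $\mathbf{y}$. Indeed, the prism spanned by $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ is a right prism with a base of area $\|\mathbf{x}\| \|\mathbf{y}\| |\sin(\theta)|$ and
height $\|\mathbf{z}\|$, hence its volume is $|\det(\mathbf{x}, \mathbf{y}, \mathbf{z})| = \|\mathbf{x}\| \|\mathbf{y}\| \|\mathbf{z}\| |\sin(\theta)|$. As $\mathbf{z}$ and $\mathbf{x} \times \mathbf{y}$ are colinear,
$|(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}| = \|\mathbf{x} \times \mathbf{y}\| \|\mathbf{z}\|$. Thus $\|\mathbf{x}\| \|\mathbf{y}\| \|\mathbf{z}\| |\sin(\theta)| = \|\mathbf{x} \times \mathbf{y}\| \|\mathbf{z}\|$, and dividing by $\|\mathbf{z}\|$ yields (ii).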
From the explicit expression of the definition we get the following identities for the cross product;
some of them (in particular the linearity) would have been hard to prove from the geometric
characterization of Proposition 1.20 alone.
(i) Antisymmetry: $\mathbf{x} \times \mathbf{y} = -\mathbf{y} \times \mathbf{x}$.
Again, by combining (i) and (ii) we also get linearity with respect to the second variable.
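Proof. The points (i) and (ii) are straightforward to check from the explicit expression of Definition 1.17.
To prove the last point (iii), we first notice that if $\mathbf{x}, \mathbf{y}$ are linearly dependent then $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) = 0$ for
all $\mathbf{z}$. Taking $\mathbf{z} = \mathbf{x} \times \mathbf{y}$ and using Proposition 1.18, we see that $\|\mathbf{x} \times \mathbf{y}\| = 0$, that is $\mathbf{x} \times \mathbf{y} = \mathbf{0}$. On the
other hand, if $\mathbf{x}, \mathbf{y}$ are linearly independent, then there exists at least one $\mathbf{z}$ such that $\det(\mathbf{x}, \mathbf{y}, \mathbf{z}) \neq 0$.
For this $\mathbf{z}$, we have (again thanks to Proposition 1.18) $\mathbf{z} \cdot (\mathbf{x} \times \mathbf{y}) \neq 0$, which implies that $\mathbf{x} \times \mathbf{y} \neq \mathbf{0}$.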
Example 1.24. Let’s look for an equation of the plane passing through the points A p3, 0, 1q, B p0, 1, 1q
and C p1, 2, 1q.
ÝÝÑ
We look for a vector normal to the plane. It must be normal to both AB p3, 1, 0q and AC
ÝÑ
p4, 2, 2q. Thus it is normal to
ÝÑ ÝÑ 2
Ý 1
AB AC 6
2 3
.
2 1
Thus a point px, y, z q belongs to the plane if and only if the vector joining A to px, y, z q is normal to
p1, 3, 1q which reads
px 3q 3y pz 1q 0.
This gives the equation of the plane. As a safety check, A, B and C indeed satisfy the equation above.
1.4 Domains of $\mathbb{R}^d$
We will use the word domain to talk about subsets of $\mathbb{R}^d$. There are usually two ways to define a domain
of $\mathbb{R}^d$.
Here $D$ is the set of $\mathbf{x}$ which can be written $\mathbf{x} = f(t)$ with $f(t) = (3 - 2t, 1 + t)$. The variable $t$ is
usually called the parameter.
• By an equation, that is as the set of points which satisfy some equation. For instance, we can define
(
D px, y, zq P R3 such that 3x 4 .
2y
As discussed above in Example 1.16, this is a plane normal to p3, 2, 0q and passing through p2, 0, 1q.
The distinction is not really sound from a logical point of view, because any domain $D$ can be
parametrized by the identity function (that is, the function defined on $D$ which maps every $\mathbf{x}$ to
itself). So by “parametrization” we mean “simple parametrization”, that is, a parametrization that is easy
to manipulate.
Notice that it is easy to check if a given point belongs to a domain defined by an equation (just
check if the equation is satisfied), while it is easy to find one point belonging to a domain defined by
parametrization (just take one value of the parameter). On the other hand, each reverse task (to find a point
which belongs to a domain defined by an equation, or to check if a point belongs to a domain defined by
parametrization) can be much more difficult and usually amounts to solving an equation.
Example 1.25. Let’s define
\[
D = \left\{ (x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1 \right\}.
\]
This is a domain defined by an equation. We recognize $x^2 + y^2 = \|(x, y)\|^2$, thus $D$ is made of points
whose norm is exactly 1, that is, points whose distance to the origin is 1: in other words, $D$ is the unit
circle in $\mathbb{R}^2$. But note that by trigonometry we can build a definition of $D$ by parametrization. Indeed,
\[
D = \left\{ (x, y) \in \mathbb{R}^2 : \exists t \in \mathbb{R}, \ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix} \right\}.
\]
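The two descriptions of the unit circle can be compared in code: the equation gives a membership test, while the parametrization gives a way to produce points. This is a minimal Python sketch, not part of the original notes; the function names and the tolerance `tol` (needed because floating-point arithmetic is not exact) are assumptions.

```python
import math

def on_unit_circle(x, y, tol=1e-12):
    """Membership test based on the equation x^2 + y^2 = 1."""
    return abs(x * x + y * y - 1.0) <= tol

def circle_point(t):
    """Point of the unit circle associated with the parameter t."""
    return (math.cos(t), math.sin(t))

# Points produced by the parametrization satisfy the equation.
points = [circle_point(2 * math.pi * k / 8) for k in range(8)]
print(all(on_unit_circle(x, y) for x, y in points))  # True

# Checking a given point is immediate with the equation.
print(on_unit_circle(0.5, 0.5))  # False
```

Checking membership with the parametrization alone would require solving $\cos(t) = x$, $\sin(t) = y$ for $t$, which is the harder "reverse task".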
px 3q 2py 1q 0,
The domain D is the intersection of two planes which are not colinear, it should be a line. The first plane
has p1, 1, 1q has a normal vector while the second has p2, 3, 1q as normal vector. Thus a vector parallel
to the line D must be orthogonal to both p1, 1, 1q and p2, 3, 1q, thus it must be parallel to
1 2 4
1
3
1
.
1 1 5
Moreover, we must find at least one point in D. For instance, p0, 0, 1q works. We conclude that D is the
line passing through p0, 0, 1q and parallel to p4, 1, 5q, this reads
$ $ ,
'
& '
&x 4t, / .
D 'px, y, zq P R3 : Dt P R, :
'
y t, /
% %
z 1 5t. -
Below is a list of some standard domains in R2 and R3 .
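[Figure: the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, with the points $(-a, 0)$, $(a, 0)$ and $(0, b)$ marked; a line through $\mathbf{x}_0$ directed by $\mathbf{a}$ and a plane through $\mathbf{x}_0$ with normal vector $\mathbf{n}$.]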
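Lines In general, a line in $\mathbb{R}^d$ is characterized by a point $\mathbf{x}_0$ through which the line passes and a vector
$\mathbf{a} \neq \mathbf{0}$ parallel to the line. Then, it reads
\[
D = \left\{ \mathbf{x} \in \mathbb{R}^d : \exists t \in \mathbb{R}, \ \mathbf{x} = \mathbf{x}_0 + t\mathbf{a} \right\}.
\]
As we have seen, in $\mathbb{R}^3$ a line can also be seen as an intersection of two planes (which are not parallel).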
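Planes A plane in $\mathbb{R}^3$ is characterized by a point $\mathbf{x}_0$ and a normal vector $\mathbf{n} \neq \mathbf{0}$. Then the equation of
the plane is
\[
D = \left\{ \mathbf{x} \in \mathbb{R}^3 : (\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0 \right\}.
\]
Actually, in $\mathbb{R}^d$ the equation above defines what is called a hyperplane: a hyperplane is a line in $\mathbb{R}^2$, a
plane in $\mathbb{R}^3$, and in general an object of dimension $d - 1$ in $\mathbb{R}^d$.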
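Lines are naturally given by a parametrization and hyperplanes by an equation, and both are short to write in code. The sketch below is in Python and is not part of the original notes; the function names, the tolerance and the numerical data are illustrative assumptions. The direction of the line is chosen orthogonal to the normal vector, so every point of the line should belong to the plane.

```python
def line_point(x0, a, t):
    """Point x0 + t*a of the line through x0 with direction a."""
    return [p + t * d for p, d in zip(x0, a)]

def in_hyperplane(x, x0, n, tol=1e-12):
    """Test the equation (x - x0) . n = 0, up to a rounding tolerance."""
    value = sum((xi - pi) * ni for xi, pi, ni in zip(x, x0, n))
    return abs(value) <= tol

# Illustrative data (not taken from the notes).
x0 = [1.0, 0.0, 2.0]   # a point of the plane
n = [0.0, 0.0, 1.0]    # normal vector of the plane
a = [1.0, 1.0, 0.0]    # direction orthogonal to n

print(all(in_hyperplane(line_point(x0, a, t), x0, n) for t in [-2.0, 0.0, 3.5]))  # True
print(in_hyperplane([1.0, 0.0, 3.0], x0, n))  # False
```

The same `in_hyperplane` function works in any dimension, since it only uses the dot product.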
\[
\left\{ (x, y, z) \in \mathbb{R}^3 : (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 = r^2 \right\}
\]
is the sphere of center $(x_0, y_0, z_0)$ and of radius $r$. To see it, note that $(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2$
is the squared Euclidean distance between $(x, y, z)$ and $(x_0, y_0, z_0)$.
[Figure: the cylinder $x^2 + y^2 = 9$ and the cone $\sqrt{x^2 + y^2} = |z|$; for a point $(x, y, z)$, the length $\sqrt{x^2 + y^2}$ is marked.]
Cones and cylinders If $(x, y, z) \in \mathbb{R}^3$, then $\sqrt{x^2 + y^2}$ is the distance of this point to the vertical axis.
From it we can for instance define the domain
\[
D = \left\{ (x, y, z) \in \mathbb{R}^3 : x^2 + y^2 = r^2 \right\},
\]
which is an (infinite) cylinder of revolution around the vertical axis of radius $r$. We can also consider
\[
D = \left\{ (x, y, z) \in \mathbb{R}^3 : \sqrt{x^2 + y^2} = |z| \right\}.
\]
px 1q2 py 2q2 1
(and not px 1q2 py 2q2 1).
Finally, one can combine these “standard” domains by taking intersections.
Definition 1.29. If $D_1, D_2$ are two domains of $\mathbb{R}^d$, their intersection $D_1 \cap D_2$ is defined as the set of $\mathbf{x}$
which belong to both $D_1$ and $D_2$.
Example 1.30. In R2 , let’s find the intersection of the circle of center p1, 2q and radius 5 with the line
passing through p4, 0q and with direction p1, 1q.
We write px0 , y0 q p2, 1q. The circle of center px0 , y0 q and radius 5 is made of points such that
}px, yq px0 , y0 q} 5. By squaring the equality and expanding it, this corresponds to points px, yq such
that px 1q2 py 2q2 25 that is such that
x2 y2 2x 4y 20.
On the other hand, we can have a representation by parametrization of the line: it is given by the set
of px, y q such that there exists t P R with
#
x 4 t,
y t
Thus a point px, y q belongs the the line if px, y q p4 t, tq and it belongs to the circle if x2 y 2 2x 4y
20. This gives us that t must satisfy the equation p4 tq2 ptq2 2p4 tq 4ptq 20 which reads
2t2 6t 4 0.
This equation has two solutions t 2 and t 1, which yields two points on the intersection: p2, 2q
and p3, 1q.
Part I
Differentiation
Chapter 2
Parametric curves
In this chapter we study vector-valued functions of one variable, that is functions $\gamma : \mathbb{R} \to \mathbb{R}^d$, which are
also called parametric curves. Since they can be represented as a collection of real-valued functions, a good
part of the analysis boils down to the study of real-valued functions. However, there are a few differences,
and parametric curves carry information with a more geometric flavor.
\[
\gamma(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix}.
\]
The different representations discussed below are plotted in Figure 2.1. A first option would be to plot
the graph of each of the coordinate functions: we would get two graphs of real-valued functions. However,
this does not capture the idea that the function is valued in the space $\mathbb{R}^2$. A second option is to plot
the graph of the function as a subset of $\mathbb{R}^{d+1}$.
Definition 2.2. If $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is a vector-valued function, its graph is the subset of $\mathbb{R}^{d+1}$
made of
the points $\mathbf{x} \in \mathbb{R}^{d+1}$ which can be written
\[
\mathbf{x} = \begin{pmatrix} t \\ \gamma_1(t) \\ \gamma_2(t) \\ \vdots \\ \gamma_d(t) \end{pmatrix} = \begin{pmatrix} t \\ \gamma(t) \end{pmatrix}
\]
for some $t \in I$.
Figure 2.1: Three ways to represent the parametric curve $t \mapsto (\cos(t), \sin(t))$. The left one does not
inform about the geometric content, the one in the center contains too much information, thus we prefer
to stick to the one on the right (the image of the function).
Note that we “gain” one dimension in the representation. The graph of an $\mathbb{R}^2$-valued function is a
subset of $\mathbb{R}^3$, whereas the graph of a real-valued function is a subset of $\mathbb{R}^2$ (the latter case is what you
saw during the first term).
Usually the representation as a graph carries too much information, and one prefers to restrict to the
image of the function.
Definition 2.3. If $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is a vector-valued function, its image is the subset of $\mathbb{R}^d$ made of
the points $\mathbf{x} \in \mathbb{R}^d$ which can be written $\mathbf{x} = \gamma(t)$ for some $t \in I$.
The image is sometimes called a parametric curve and the function γ is a parametrization of the
curve.
Remark 2.4. Actually, by looking at Definitions 2.2 and 2.3, one can see that the graph of a curve
$\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$ is the same as the image of the curve $\theta : I \to \mathbb{R}^{d+1}$ defined by $\theta(t) = (t, \gamma(t))$ for $t \in I$.
That is, the concept of image is more general than the one of graph.
On the other hand, to retrieve the image from the graph amounts to projecting the graph on the last
$d$ coordinates (that is, one “forgets” about the $t$ variable).
Remark 2.5. The image of a function is also a legitimate concept for real-valued functions, but it usually does
not contain enough information. For instance, the image of the function $g : t \in \mathbb{R} \mapsto \cos(t) \in \mathbb{R}$ is the set
$[-1, 1]$; compare with the image of the function $\gamma : t \in \mathbb{R} \mapsto (\cos(t), \sin(t))$, which is the unit circle of $\mathbb{R}^2$.
One of the two representations has more geometric content than the other!
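Importantly, different functions can have the same image. Let’s consider the two functions $\gamma, \theta : \mathbb{R} \to
\mathbb{R}^2$ defined by
\[
\gamma(t) = \begin{pmatrix} 3 - 2t \\ 1 + t \end{pmatrix} \quad \text{and} \quad \theta(t) = \begin{pmatrix} 5 - 2t \\ t \end{pmatrix}.
\]
Then $\gamma(t) = \theta(1 + t)$, hence the images of the two functions are the same. Another example could be the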
functions $\gamma, \theta : \mathbb{R} \to \mathbb{R}^2$ and $\omega : (0, \infty) \to \mathbb{R}^2$ defined by
Then $\gamma$, $\theta$ and $\omega$ have the same image. We will explore it more in Chapter 7, but these functions
correspond to the same curve (the unit circle of $\mathbb{R}^2$) traveled at different speeds. Note also that trying
to represent the graph of $\omega$ would likely be too intricate: the image usually contains
enough information to understand what’s happening, while the representation remains easy to read.
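To make the idea of "same image, different speeds" concrete, here is a small Python sketch. It is not part of the original notes, and the two parametrizations below are a hypothetical choice made for the illustration: `gamma` travels the unit circle at speed 1 and `theta` travels it twice as fast, so that `theta(t)` equals `gamma(2 * t)`.

```python
import math

def gamma(t):
    """Unit circle traveled at speed 1."""
    return (math.cos(t), math.sin(t))

def theta(t):
    """Same circle traveled twice as fast (hypothetical choice)."""
    return (math.cos(2 * t), math.sin(2 * t))

for t in [0.0, 0.3, 1.0, 2.5]:
    gx, gy = gamma(2 * t)
    tx, ty = theta(t)
    # Each point reached by theta is also reached by gamma, at another time.
    assert abs(gx - tx) < 1e-12 and abs(gy - ty) < 1e-12
    # Both points lie on the unit circle.
    assert abs(tx * tx + ty * ty - 1.0) < 1e-12

print("theta(t) equals gamma(2t) at all sampled parameters")
```

A finite sample cannot prove that two images coincide; here the identity holds for every `t` because the two formulas only differ by the change of parameter `t -> 2t`.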
2.2 Derivatives
We now switch to the definition of the differential properties of the curve. We will see later why the
definitions given here can be seen as particular cases of more principled definitions. However, to study
parametric curves one does not need to rely on these abstract definitions and can just use the ones of the
real-valued case.
Definition 2.6. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$. The function $\gamma$ is said to be continuous at a point $t \in I$ if all the
coordinate functions are continuous at $t$. It is continuous over $I$ if it is continuous at every point of $I$,
that is, if all the coordinate functions are continuous over $I$.
Hence, checking the continuity of $\gamma$ amounts to checking the continuity of the coordinate functions,
which are themselves real-valued functions (thus all the concepts and theorems of Mathematical Analysis
– Module 1 apply). Checking differentiability and computing derivatives is similar.
Definition 2.7. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^d$. The function is differentiable at the point $t \in I$ if all the coordinate
functions are differentiable at $t$. If this is the case, the derivative at the point $t$ is the vector of $\mathbb{R}^d$ denoted
by $\gamma'(t)$ and defined by
\[
\gamma'(t) = \begin{pmatrix} \gamma_1'(t) \\ \gamma_2'(t) \\ \vdots \\ \gamma_d'(t) \end{pmatrix}.
\]
The function is differentiable over $I$ if it is differentiable at every point of $I$.
Remark 2.8. In physics, when $t$ represents the time, the notation $\dot{\gamma}$ rather than $\gamma'$ is used to denote the
derivative of $\gamma$.
Note that the derivative $\gamma' : I \to \mathbb{R}^d$ is still a vector-valued function, whereas the speed is a real-valued
one. Differentiating vector-valued functions amounts to differentiating them componentwise, so again all
the rules you learned in Mathematical Analysis – Module 1 still hold.
Example 2.10. Let’s again look at $\gamma(t) = (\cos(t), \sin(t))$ defined on $\mathbb{R}$. Then the function $\gamma : \mathbb{R} \to \mathbb{R}^2$
is differentiable on $\mathbb{R}$ with $\gamma'(t) = (-\sin(t), \cos(t))$. In particular the speed is $\|\gamma'(t)\| = 1$ and does not
depend on $t$. You have a function with constant speed but non-constant velocity vector: the direction of
the velocity vector changes.
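Since differentiation is done coordinate by coordinate, a numerical check of this example is straightforward. The following Python sketch is not part of the original notes; it approximates the derivative with a central finite difference (an assumption made for the illustration, with step `h`), so the values agree with $(-\sin(t), \cos(t))$ only up to a small numerical error.

```python
import math

def gamma(t):
    """The curve of the example: t -> (cos(t), sin(t))."""
    return (math.cos(t), math.sin(t))

def derivative(curve, t, h=1e-6):
    """Central finite difference, applied coordinate by coordinate."""
    forward, backward = curve(t + h), curve(t - h)
    return tuple((f - b) / (2 * h) for f, b in zip(forward, backward))

def speed(curve, t):
    """Norm of the (approximate) velocity vector."""
    return math.sqrt(sum(c * c for c in derivative(curve, t)))

for t in [0.0, 1.0, 2.0]:
    vx, vy = derivative(gamma, t)
    # The velocity changes with t while the speed stays close to 1.
    print(round(vx, 6), round(vy, 6), round(speed(gamma, t), 6))
```

The printed velocity vectors differ from one value of `t` to the next, while the last column stays at 1 up to rounding.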
Example 2.11. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^2$ be a differentiable function. Prove that the real-valued function $g : I \to \mathbb{R}$
defined by $g(t) = \|\gamma(t)\|^2$ for $t \in I$ is differentiable and that $g'(t) = 2\, \gamma'(t) \cdot \gamma(t)$.
Finally, let us mention that we can extend the definition of the derivative to higher-order derivatives.
The function $\gamma$ is said to be of class $C^k$ if it is $k$ times differentiable and $\gamma^{(k)}$ is continuous over $I$. It is
equivalent to require that all the coordinate functions are of class $C^k$ over $I$.
The second derivative $\gamma''$, sometimes called the acceleration, is of utmost importance, especially in
physics (where it is denoted by $\ddot{\gamma}$). Indeed, if $\gamma : \mathbb{R} \to \mathbb{R}^3$ denotes the position of a particle over
time, Newton’s second law states that $m\ddot{\gamma} = F$, where $m$ is the mass of the particle and $F$ is the sum of
all the forces acting on the particle.
Figure 2.2: Image of a curve in $\mathbb{R}^2$ (in red) together with its first two derivatives and the tangent line at
one point.
\[
h \in \mathbb{R} \mapsto \gamma(t_0) + h\gamma'(t_0) \in \mathbb{R}^d,
\]
thus the image of $\gamma$ is close to the line $h \mapsto \gamma(t_0) + h\gamma'(t_0)$: this is the tangent line. This corresponds
to a regular point as discussed before.
• If $\gamma'(t_0) \neq 0$ and $\gamma''(t_0)$ is not colinear to $\gamma'(t_0)$ then
\[
\gamma(t_0 + h) = \gamma(t_0) + h\gamma'(t_0) + \frac{h^2}{2} \gamma''(t_0) + o(h^2),
\]
thus in the frame $(\gamma'(t_0), \gamma''(t_0))$ centered at $\gamma(t_0)$, the image of $\gamma$ is close to the parabola $y = x^2/2$.
This is called a biregular point.
• Actually, if $\gamma''(t_0)$ is colinear to $\gamma'(t_0)$ then the image of $\gamma$ could “cross” the tangent. Indeed, if
$\gamma''(t_0) = \lambda\gamma'(t_0)$ is colinear to $\gamma'(t_0)$ but $\gamma'''(t_0)$ is not, then
\[
\gamma(t_0 + h) = \gamma(t_0) + \left( h + \lambda \frac{h^2}{2} \right) \gamma'(t_0) + \frac{h^3}{6} \gamma'''(t_0) + o(h^3).
\]
Thus in the frame $(\gamma'(t_0), \gamma'''(t_0))$ centered at $\gamma(t_0)$, the image of $\gamma$ is close to the curve $y = x^3/6$.
This is called an inflexion point.
• If $\gamma'(t_0) = 0$ but $\gamma''(t_0)$ and $\gamma'''(t_0)$ are not colinear then
\[
\gamma(t_0 + h) = \gamma(t_0) + \frac{h^2}{2} \gamma''(t_0) + \frac{h^3}{6} \gamma'''(t_0) + o(h^3).
\]
Figure 2.4: Generic behavior around a biregular point, an inflexion point and an ordinary cusp.
This means that in the frame $(\gamma''(t_0), \gamma'''(t_0))$ centered at $\gamma(t_0)$ the image of $\gamma$ is close to the image
of $h \mapsto (h^2/2, h^3/6)$. Similarly to Example 2.15, the curve has a cusp, called an ordinary cusp.
In the general case, to analyze a singularity, one finds the two smallest integers $1 \le p < q$ such that
$\gamma^{(p)}(t_0)$ and $\gamma^{(q)}(t_0)$ are not colinear. Then the nature of the singularity depends on the parity of $p$ and
$q$ (whether they are odd or even).
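The case distinction above (biregular point, inflexion point, ordinary cusp) can be written as a small decision procedure for curves in $\mathbb{R}^2$. The following Python sketch is not part of the original notes: the function names, the tolerance and the three test curves are assumptions, and the derivatives at $t_0$ are supplied by hand rather than computed. Colinearity of two vectors of $\mathbb{R}^2$ is tested with the $2 \times 2$ determinant.

```python
def det2(u, v):
    """Determinant of two vectors of R^2; it vanishes iff they are colinear."""
    return u[0] * v[1] - u[1] * v[0]

def is_zero(u, tol=1e-12):
    return abs(u[0]) <= tol and abs(u[1]) <= tol

def classify(d1, d2, d3, tol=1e-12):
    """Classify a point from the first three derivatives of the curve there."""
    if not is_zero(d1, tol):
        if abs(det2(d1, d2)) > tol:
            return "biregular point"
        if abs(det2(d1, d3)) > tol:
            return "inflexion point"
        return "regular point, higher derivatives needed"
    if abs(det2(d2, d3)) > tol:
        return "ordinary cusp"
    return "singular point, higher derivatives needed"

# Hand-computed derivatives at t0 = 0 for three illustrative curves.
print(classify((1, 0), (0, 2), (0, 0)))  # t -> (t, t^2): biregular point
print(classify((1, 0), (0, 0), (0, 6)))  # t -> (t, t^3): inflexion point
print(classify((0, 0), (2, 0), (0, 6)))  # t -> (t^2, t^3): ordinary cusp
```

The two fallback answers correspond to the general case mentioned in the text, where one has to look for the two smallest orders $p < q$ with non-colinear derivatives.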
Remark 2.17. In Figure 2.4, notice that $\gamma'(t_0)$ is not necessarily orthogonal to $\gamma''(t_0)$. Actually, this happens at every point
if and only if $\gamma$ has constant speed. More precisely, if $\gamma : I \to \mathbb{R}^d$ is of class $C^2$, then [$\gamma'(t)$ is orthogonal to
$\gamma''(t)$ for all $t \in I$] if and only if [the function $t \mapsto \|\gamma'(t)\|$ is constant]. To see that, note that [the function
$t \mapsto \|\gamma'(t)\|$ is constant] if and only if $g : t \mapsto \|\gamma'(t)\|^2$ is constant. On the other hand, $g'(t) = 2\, \gamma'(t) \cdot \gamma''(t)$.
Thus we conclude by using the property that a function is constant on an interval if and only if its derivative vanishes.
For vector-valued curves, the short message is that (2.1) fails while (2.2) still holds.
Example 2.18. Let’s take $\gamma : \mathbb{R} \to \mathbb{R}^2$ defined by $\gamma(t) = (\cos(t), \sin(t))$. Then $\gamma$ is of class $C^\infty$ (that is, of
class $C^k$ for every $k \ge 1$). Moreover, $\gamma'(t) = (-\sin(t), \cos(t))$. Note that
\[
\gamma(2\pi) - \gamma(0) = 0,
\]
Figure 2.5: Curve in polar coordinates $r = g(\theta)$ (in red) with the vectors $\mathbf{e}_r$ and $\mathbf{e}_\theta$.
On the other hand, (2.2) still holds provided one defines, for a continuous function $\theta : [a, b] \to \mathbb{R}^d$,
the integral
\[
\int_a^b \theta(t) \, dt
\]
as the vector in $\mathbb{R}^d$ whose $i$-th coordinate is $\int_a^b \theta_i(t) \, dt$, where $\theta_i : [a, b] \to \mathbb{R}$ is the $i$-th coordinate function
of $\theta$. Then (2.2) holds simply by writing it coordinate by coordinate.
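The coordinate-wise definition of the integral can be sketched in a few lines of Python. This is not part of the original notes: the midpoint rule, the number of steps and the function names are assumptions made for the illustration. The integrand is $\gamma'(t) = (-\sin(t), \cos(t))$, and the results can be compared with $\gamma(b) - \gamma(a)$ for $\gamma(t) = (\cos(t), \sin(t))$.

```python
import math

def integrate(curve, a, b, steps=10_000):
    """Midpoint rule applied to each coordinate of a vector-valued function."""
    h = (b - a) / steps
    dim = len(curve(a))
    total = [0.0] * dim
    for k in range(steps):
        value = curve(a + (k + 0.5) * h)
        for i in range(dim):
            total[i] += value[i] * h
    return total

def gamma_prime(t):
    """Derivative of t -> (cos(t), sin(t))."""
    return (-math.sin(t), math.cos(t))

# Over a full turn both coordinates are close to 0 = gamma(2*pi) - gamma(0).
print(integrate(gamma_prime, 0.0, 2 * math.pi))

# Over a quarter turn the result is close to gamma(pi/2) - gamma(0) = (-1, 1).
print(integrate(gamma_prime, 0.0, math.pi / 2))
```

Nothing in `integrate` is specific to dimension 2: it reads the dimension from the first evaluation of the function.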
\[
\gamma(\theta) = g(\theta) \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}. \tag{2.3}
\]
Remark 2.20. By standard properties of products of continuous and differentiable functions, one can
check that if the function $g$ is of class $C^k$ for some integer $k \ge 1$, then the corresponding function $\gamma$ is of
class $C^k$. The converse actually holds.
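To compute derivatives of the function $\gamma$, it is useful to introduce the following functions $\mathbf{e}_r, \mathbf{e}_\theta$, which
are defined on $\mathbb{R}$ and valued in $\mathbb{R}^2$. For $\theta \in \mathbb{R}$, we define
\[
\mathbf{e}_r(\theta) = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} \quad \text{and} \quad \mathbf{e}_\theta(\theta) = \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix}.
\]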
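A curve in polar coordinates is easy to evaluate once $\mathbf{e}_r$ is available. The sketch below is in Python and is not part of the original notes; the radius function `g` (a cardioid, which takes values in $[0, \infty)$) and the function names are hypothetical choices. It evaluates $\gamma = g\, \mathbf{e}_r$ and checks numerically that $\mathbf{e}_r(\theta)$ and $\mathbf{e}_\theta(\theta)$ are orthogonal and that $\|\gamma(\theta)\| = g(\theta)$.

```python
import math

def e_r(theta):
    return (math.cos(theta), math.sin(theta))

def e_theta(theta):
    return (-math.sin(theta), math.cos(theta))

def g(theta):
    """Hypothetical radius function (a cardioid), with values in [0, 2]."""
    return 1.0 + math.cos(theta)

def polar_curve(theta):
    """gamma(theta) = g(theta) * e_r(theta)."""
    r = g(theta)
    er = e_r(theta)
    return (r * er[0], r * er[1])

for theta in [0.0, 0.5, 2.0, 3.0]:
    er, et = e_r(theta), e_theta(theta)
    x, y = polar_curve(theta)
    # e_r and e_theta are orthogonal unit vectors.
    assert abs(er[0] * et[0] + er[1] * et[1]) < 1e-12
    # Since g >= 0, the distance to the origin equals g(theta).
    assert abs(math.hypot(x, y) - g(theta)) < 1e-12

print(polar_curve(0.0))  # (2.0, 0.0)
```

The second check relies on `g` being non-negative; for a radius function changing sign, the distance to the origin would be `abs(g(theta))`.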
Proposition 2.22. Let $I$ be an interval of $\mathbb{R}$ and $g : I \to [0, \infty)$ a function of class $C^2$. We define the
function $\gamma : I \to \mathbb{R}^2$ via (2.3), that is, $\gamma = g\, \mathbf{e}_r$. Then the function $\gamma$ is of class $C^2$ and
Chapter 3
Notions of topology in $\mathbb{R}^d$
Topology denotes the field of mathematics studying proximity in general spaces. It is concerned with
questions like: “What does it mean for a sequence to converge?”, “What does it mean for a function to
be continuous?”. In $\mathbb{R}$ and even $\mathbb{R}^d$, there is a canonical way to answer these questions that we will see in
this chapter. A trend during the 20th century was to design topological notions for much more general
spaces than $\mathbb{R}^d$. Even though we will not tackle this issue this year (but you will in the next years), this
has an impact on the content of this chapter, as the notions are presented in such a way that they extend
easily to more general frameworks.
As we have seen before, $\mathbb{R}^d$ comes with a norm, that is, $\|\mathbf{x}\| = \sqrt{\sum_{i=1}^{d} (x_i)^2}$ measures the length of
the vector $\mathbf{x}$. From this norm one can define the distance between points in $\mathbb{R}^d$: the distance between $\mathbf{x}$
and $\mathbf{y}$ is nothing else than $\|\mathbf{x} - \mathbf{y}\|$. If $d = 1$, it boils down to $|x - y|$, the absolute value of the difference of the two
real numbers.
From the triangle inequality }x y} ¤ }x} }y} (see Proposition 1.12), one can write a triangle
inequality for distances which reads
}x y} ¤ }x z} }z y}.
Topology can be studied in the more general context of metric spaces, which are spaces endowed with
a metric satisfying some axioms including the triangle inequality.
3.1 Limits
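We consider in this section sequences in $\mathbb{R}^d$, which we denote by $(\mathbf{x}_n)_{n \in \mathbb{N}}$. This corresponds to points in $\mathbb{R}^d$ indexed by $n \in \mathbb{N}$, or, said differently, to a function $\mathbb{N} \to \mathbb{R}^d$.
Definition 3.1. Let $(\mathbf{x}_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}^d$ and $\mathbf{a} \in \mathbb{R}^d$. We say that the sequence $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a} \in \mathbb{R}^d$ if and only if the real-valued sequence $(\|\mathbf{x}_n - \mathbf{a}\|)_{n \in \mathbb{N}}$ converges to $0$.
In other words, $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a} \in \mathbb{R}^d$ if the distance between $\mathbf{x}_n$ and $\mathbf{a}$ converges to $0$ as $n \to \infty$. The characterization with quantifiers is the following.
Proposition 3.2 ($\varepsilon$-$\delta$ characterization of the limit). The sequence $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a} \in \mathbb{R}^d$ if and only if
$$\forall \varepsilon > 0, \ \exists N \in \mathbb{N}, \ \forall n \geq N, \ \|\mathbf{x}_n - \mathbf{a}\| \leq \varepsilon.$$
Proof. This is just about copying the definition that a real-valued sequence converges to $0$.
This is to be compared with the definition of a real-valued sequence $(x_n)_{n \in \mathbb{N}}$ converging to $a \in \mathbb{R}$: it reads
$$\forall \varepsilon > 0, \ \exists N \in \mathbb{N}, \ \forall n \geq N, \ |x_n - a| \leq \varepsilon.$$
Thus, the only difference is that absolute values have been replaced by the norm. There is another characterization: just looking at what is happening coordinate by coordinate.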
[Figure: points $\mathbf{x}_1$, $\mathbf{x}_2$, ..., $\mathbf{x}_N$ of a sequence and the closed ball $B_c(\mathbf{a}, \varepsilon)$ of radius $\varepsilon$ centered at $\mathbf{a}$.]
Proposition 3.3 (Coordinate-wise characterization of the limit). Let $(\mathbf{x}_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}^d$ and $\mathbf{a} \in \mathbb{R}^d$. For $i \in \{1, 2, \ldots, d\}$ we write $x_{i,n}$ for the $i$-th coordinate of the vector $\mathbf{x}_n$. Then the sequence $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a} \in \mathbb{R}^d$ if and only if for all $i \in \{1, 2, \ldots, d\}$, the sequence $(x_{i,n})_{n \in \mathbb{N}}$ converges to $a_i$.
Proof. Implication $(\Rightarrow)$. We assume that $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a} \in \mathbb{R}^d$. Fix $i \in \{1, 2, \ldots, d\}$ and note that
$$\|\mathbf{x}_n - \mathbf{a}\| = \sqrt{\sum_{j=1}^d (x_{j,n} - a_j)^2} \geq \sqrt{(x_{i,n} - a_i)^2} = |x_{i,n} - a_i|.$$
As the left hand side goes to $0$, so does the right hand side.
Implication $(\Leftarrow)$. We start again from the expression
$$\|\mathbf{x}_n - \mathbf{a}\| = \sqrt{\sum_{i=1}^d (x_{i,n} - a_i)^2}.$$
Each of the sequences $x_{i,n} - a_i$ goes to $0$ when $n \to \infty$ for $i \in \{1, 2, \ldots, d\}$. Thus so do the sequences $(x_{i,n} - a_i)^2$ because the square of a sequence going to $0$ also goes to $0$. A finite sum of sequences going to $0$ also goes to $0$, and then when we take the square root we still get a sequence which goes to $0$. This shows that the sequence $\|\mathbf{x}_n - \mathbf{a}\|$ goes to $0$ when $n \to \infty$ and concludes the proof.
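To make Proposition 3.3 concrete, here is a minimal numerical sketch in Python. The sequence `x(n)`, the limit `a` and the helper `norm` are illustrative choices which do not come from the notes; the printed values only suggest the convergence, they do not prove it.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a tuple of floats."""
    return math.sqrt(sum(c * c for c in v))

def x(n):
    """Hypothetical sequence in R^2 which converges to a = (1, 0)."""
    return (1.0 + 1.0 / n, math.sin(n) / n)

a = (1.0, 0.0)

for n in (1, 10, 100, 1000):
    diff = tuple(c - ac for c, ac in zip(x(n), a))
    coord_gaps = [abs(c) for c in diff]
    # Each coordinate gap is bounded by the norm (implication =>),
    # and the norm is controlled by the coordinate gaps (implication <=).
    print(n, norm(diff), coord_gaps)
```

Both the norm of the difference and each coordinate gap shrink together as `n` grows, which is the content of the two implications.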
Proposition 3.4 (Operations on limits). Let $(\mathbf{x}_n)_{n \in \mathbb{N}}$ and $(\mathbf{y}_n)_{n \in \mathbb{N}}$ be two sequences in $\mathbb{R}^d$ which converge to $\mathbf{a}$ and $\mathbf{b}$ respectively. Then,
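$$\exists \varepsilon > 0, \ \forall \mathbf{y} \in \mathbb{R}^d, \ \|\mathbf{x} - \mathbf{y}\| \leq \varepsilon \Rightarrow \mathbf{y} \in V.$$
In particular, if $V$ is a neighborhood of $\mathbf{x}$ then $\mathbf{x} \in V$. A neighborhood of $\mathbf{x}$ is a subset of $\mathbb{R}^d$ which contains all points which are sufficiently close to $\mathbf{x}$.
Example 3.6. Let's take $\mathbf{a} = (1, 0) \in \mathbb{R}^2$. Then $V = \{(x, y) : x \geq 0\}$ is a neighborhood of $\mathbf{a}$: indeed, $B_c(\mathbf{a}, \varepsilon) \subset V$ for all $\varepsilon \in (0, 1)$. On the other hand, $V = \{(x, y) : x \geq 1\}$ is not a neighborhood of $\mathbf{a}$ (exercise), and neither is $V = \{(x, y) : y \geq 2\}$ (as $\mathbf{a}$ does not even belong to $V$).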
With this vocabulary, we can reformulate what it means for a sequence to converge.
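Proposition 3.7 (Characterization of the limit in terms of neighborhood). Let $(\mathbf{x}_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}^d$ and $\mathbf{a} \in \mathbb{R}^d$. Then the sequence $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a}$ if and only if for every neighborhood $V$ of $\mathbf{a}$, there exists $N \in \mathbb{N}$ such that $\mathbf{x}_n \in V$ for all $n \geq N$.
Proof. Implication $(\Rightarrow)$. We assume that $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a}$ and we take $V$ a neighborhood of $\mathbf{a}$. Then by definition there exists $\varepsilon > 0$ such that $B_c(\mathbf{a}, \varepsilon) \subset V$. By definition of the limit, there exists $N \in \mathbb{N}$ such that $\mathbf{x}_n \in B_c(\mathbf{a}, \varepsilon)$ for all $n \geq N$. This concludes the implication as $B_c(\mathbf{a}, \varepsilon) \subset V$.
Implication $(\Leftarrow)$. We assume that for every neighborhood $V$ of $\mathbf{a}$, there exists $N \in \mathbb{N}$ such that $\mathbf{x}_n \in V$ for all $n \geq N$. In particular, if $\varepsilon > 0$, taking $V = B_c(\mathbf{a}, \varepsilon)$ (which is a neighborhood of $\mathbf{a}$) we see that there exists $N \in \mathbb{N}$ such that $\mathbf{x}_n \in V = B_c(\mathbf{a}, \varepsilon)$ for all $n \geq N$. This is enough to say that $(\mathbf{x}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{a}$.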
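Definition 3.8 (Interior). Let $V \subset \mathbb{R}^d$. We call interior of $V$, and write $\mathring{V}$, the subset of $\mathbb{R}^d$ made of $\mathbf{x} \in \mathbb{R}^d$ such that $V$ is a neighborhood of $\mathbf{x}$. That is, $\mathbf{x} \in \mathring{V}$ if and only if $V$ is a neighborhood of $\mathbf{x}$. With quantifiers:
$$\mathbf{x} \in \mathring{V} \iff \exists \varepsilon > 0, \ \forall \mathbf{y} \in \mathbb{R}^d, \ \|\mathbf{x} - \mathbf{y}\| \leq \varepsilon \Rightarrow \mathbf{y} \in V.$$
Example 3.9. This notion is indeed what we have in mind when we speak of interior. For instance, the interior of the set $V = \{(x, y) \in \mathbb{R}^2 : y \leq 3\} \subset \mathbb{R}^2$ is $\mathring{V} = \{(x, y) \in \mathbb{R}^2 : y < 3\}$. Or the interior of $B_c(\mathbf{a}, r)$ is $B_o(\mathbf{a}, r)$ for any $\mathbf{a} \in \mathbb{R}^d$ and $r > 0$.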
Let’s state some straightforward properties of the interior whose proof is quite direct and left as an
exercise.
Proposition 3.10. Let $V, W$ be two subsets of $\mathbb{R}^d$ and $\mathring{V}, \mathring{W}$ be their interiors.
(i) There always holds $\mathring{V} \subset V$.
(ii) If $V \subset W$ then $\mathring{V} \subset \mathring{W}$.
A concept that is tightly linked to the one of interior is the one of closure (it’s not apparent at first
glance but will be proved in Proposition 3.17).
Definition 3.11 (Closure). Let $V$ be a subset of $\mathbb{R}^d$. We say that $\mathbf{x}$ belongs to the closure of $V$, and we write $\mathbf{x} \in \overline{V}$, if and only if there exists a sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ in $\mathbb{R}^d$ such that $\mathbf{y}_n \in V$ for all $n \in \mathbb{N}$ and the sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{x}$.
The definition of closure is more intricate because it is about the existence of a sequence. Note that one can still prove the following properties, similar to those of the interior.
Example 3.13. Let $V = \{(x, y) : y > 1\} \subset \mathbb{R}^2$, see Figure 3.3. Then $\overline{V} = \{(x, y) : y \geq 1\}$. Indeed, let $(x_0, y_0)$ be such that $y_0 \geq 1$. If $y_0 > 1$, then $(x_0, y_0)$ already belongs to $V$. If $y_0 = 1$, then one can define the sequence $(x_n, y_n)_{n \geq 1}$ by $x_n = x_0$ and $y_n = 1 + 1/n$ for $n \geq 1$. One can check that this sequence converges to $(x_0, y_0) = (x_0, 1)$ while $y_n = 1 + 1/n > 1$ for all $n \geq 1$, so $(x_n, y_n) \in V$ for all $n \geq 1$. This shows that $\overline{V}$ contains $\{(x, y) : y \geq 1\}$. Finally, if $(x_n, y_n)_{n \in \mathbb{N}}$ is any converging sequence whose terms belong to $V$, for any $n$ there holds $y_n > 1$. Passing to the limit, $\lim_n y_n \geq 1$. Thus the closure of $V$ is necessarily included in $\{(x, y) : y \geq 1\}$.
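The sequence used in Example 3.13 can be checked numerically. The sketch below is only an illustration: the helper `in_V` and the value of `x0` are assumptions introduced here, and only finitely many terms are tested.

```python
def in_V(p):
    """Membership test for V = {(x, y) : y > 1}."""
    return p[1] > 1.0

x0 = 0.5  # arbitrary first coordinate (illustrative choice)

# Terms (x0, 1 + 1/n) for n = 1, ..., 1000: all of them lie in V.
seq = [(x0, 1.0 + 1.0 / n) for n in range(1, 1001)]
all_in_V = all(in_V(p) for p in seq)

# First coordinates are constant, so the distance to (x0, 1) is |y_n - 1|.
dist_to_limit = [abs(p[1] - 1.0) for p in seq]

limit_in_V = in_V((x0, 1.0))  # the limit itself is not in V
print(all_in_V, limit_in_V, dist_to_limit[-1])
```

Every term is in $V$ and the distance to $(x_0, 1)$ shrinks, while the limit point is not in $V$: it belongs to the closure only.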
Figure 3.4: Example of a domain together with some points in its interior, its closure, its complement.
Example 3.14. On the other hand, let's check that if $\mathbf{a} \in \mathbb{R}^d$ and $r \geq 0$, the closure of the closed ball $B_c(\mathbf{a}, r)$ is itself. A set is always contained in its closure. On the other hand, let $\mathbf{x}$ be in the closure of $B_c(\mathbf{a}, r)$. There exists $(\mathbf{y}_n)_{n \in \mathbb{N}}$ a sequence in $B_c(\mathbf{a}, r)$ which converges to $\mathbf{x}$. By Proposition 3.4,
$$\mathring{V} \cup \overline{V^c} = \mathbb{R}^d, \tag{3.1}$$
$$\overline{V^c} = (\mathring{V})^c \quad \text{and} \quad (\overline{V})^c = (V^c)^{\circ}.$$
Said with words, the closure of the complement is the complement of the interior and vice versa.
Example 3.18. Try to write it for instance in the case $V = \{(x, y) : y > 1\} \subset \mathbb{R}^2$. You can also take a look at Figure 3.5.
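Proof. Identity (3.1). Let's take $\mathbf{x} \in \mathbb{R}^d$. If $\mathbf{x} \in \mathring{V}$, then there is nothing to prove. On the other hand, assume that $\mathbf{x} \notin \mathring{V}$; then we want to prove that $\mathbf{x} \in \overline{V^c}$. For each $n \geq 1$, we know that $B_c(\mathbf{x}, 1/n)$ is not included in $V$ (otherwise $\mathbf{x} \in \mathring{V}$). Let's take $\mathbf{y}_n$ a point in $V^c \cap B_c(\mathbf{x}, 1/n)$. In particular $\|\mathbf{x} - \mathbf{y}_n\| \leq 1/n$, thus the sequence $(\mathbf{y}_n)_{n \geq 1}$ converges to $\mathbf{x}$. As moreover $\mathbf{y}_n \in V^c$ for all $n \geq 1$, we conclude that $\mathbf{x} \in \overline{V^c}$.
In (3.1), the union is disjoint. Let's take $\mathbf{x}$ which belongs to both $\mathring{V}$ and $\overline{V^c}$. We want to reach a contradiction. As $\mathbf{x} \in \mathring{V}$, there exists $\varepsilon > 0$ such that $B_c(\mathbf{x}, \varepsilon) \subset V$. On the other hand, as $\mathbf{x} \in \overline{V^c}$, there exists a sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ such that all elements belong to $V^c$, and which converges to $\mathbf{x}$. In particular, $\mathbf{y}_n$ belongs to $B_c(\mathbf{x}, \varepsilon) \subset V$ for $n$ large enough. This contradicts $\mathbf{y}_n \in V^c$ for all $n$.
Deducing the other identities from (3.1). That $\overline{V^c} = (\mathring{V})^c$ is really a rewriting of (3.1). Then to prove the last identity we introduce $W = V^c$, so that we can write the second one $\overline{V^c} = (\mathring{V})^c$ as $\overline{W} = ((W^c)^{\circ})^c$. Taking then the complement, and as $(D^c)^c = D$ for any set $D \subset \mathbb{R}^d$, we get $(\overline{W})^c = (W^c)^{\circ}$. But as $V$ can be any subset of $\mathbb{R}^d$, so can $W$, which gives the last identity.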
[Figure 3.5 labels: domain $V = B_c(\mathbf{x}, r)$; $\mathring{V} \cup \overline{V^c} = \mathbb{R}^d$, disjoint union.]
Figure 3.5: Illustration of one of the identities in Proposition 3.17 about the link between interior, closure and complementation.
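$$\forall \mathbf{x} \in V, \ \exists \varepsilon > 0, \ B_c(\mathbf{x}, \varepsilon) \subset V.$$
On the other hand, a set is closed if it coincides with its closure. This can be interpreted as being “stable by limits”. Indeed, $V \subset \mathbb{R}^d$ is closed if and only if: for every sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ which converges to some $\mathbf{x} \in \mathbb{R}^d$, if $\mathbf{y}_n \in V$ for all $n \in \mathbb{N}$ then $\mathbf{x} \in V$. That is, the limit of a sequence in $V$ still stays in $V$.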
Example 3.20. Open balls are open, closed balls are closed (exercise).
The counterpart of Proposition 3.17 is the following.
Proposition 3.21 (Link between open and closed sets). The complement of an open set is closed, and the complement of a closed set is open.
Proof. This is a direct consequence of Proposition 3.17. Indeed, let's take $V$ an open set (that is, such that $\mathring{V} = V$). There holds
$$\overline{V^c} = (\mathring{V})^c = V^c.$$
This exactly means that $V^c$ is closed. Proving that the complement of a closed set is open is done in a similar way.
Proposition 3.22. Let $V$ be a subset of $\mathbb{R}^d$. Then $\mathring{V}$ is open; it is the largest open set contained in $V$. On the other hand $\overline{V}$ is closed; it is the smallest closed set containing $V$.
Proof. Let's first prove that $\mathring{V}$ is open. If $\mathbf{x} \in \mathring{V}$, then there exists $\varepsilon > 0$ such that $B_c(\mathbf{x}, \varepsilon) \subset V$, in particular $B_o(\mathbf{x}, \varepsilon) \subset V$. Taking the “interior” on both sides of the inclusion (specifically: using Proposition 3.10 (ii)) and as open balls are open, $B_o(\mathbf{x}, \varepsilon) \subset \mathring{V}$. In particular $B_c(\mathbf{x}, \varepsilon/2) \subset \mathring{V}$ and this is enough to conclude that $\mathbf{x}$ belongs to the interior of $\mathring{V}$, thus $\mathring{V}$ is open. Moreover, let $W \subset V$ be an open set. Using again Proposition 3.10 (ii), we see that $\mathring{W} \subset \mathring{V}$, but $\mathring{W} = W$ by openness of $W$, which shows $W \subset \mathring{V}$. That is, every open set $W$ contained in $V$ is contained in the interior of $V$: the latter is the largest open set contained in $V$.
Then, to prove that $\overline{V}$ is closed, we rather reason with Proposition 3.17 and Proposition 3.21. Indeed, $\overline{V}$ is the complement of $(V^c)^{\circ}$ (Proposition 3.17), that is, it is the complement of an open set by what we just proved. Using Proposition 3.21, the complement of an open set is a closed set. Then, proving the minimality property of $\overline{V}$ can be done similarly to $\mathring{V}$, with Proposition 3.12 which shows that $\overline{V} \subset W$ for any closed set $W$ which contains $V$.
Chapter 4
In this chapter, we study functions $f : D \subset \mathbb{R}^2 \to \mathbb{R}$, that is, functions defined over a domain of $\mathbb{R}^2$ and valued into $\mathbb{R}$. These functions map each point of their domain of definition to a scalar value.
All the concepts of this chapter can be extended in a straightforward way to functions of more than 2 variables (but still real-valued). We prefer to keep it to $d = 2$ in this chapter for the sake of clarity. We will use $(x, y)$ as the notation for a generic point in $\mathbb{R}^2$.
First, let’s motivate the study of such functions with “concrete” examples.
• Some formulas can be read as functions of two variables. For instance, the volume of a cylinder of height $x$ and radius $y$ is $\pi x y^2$. We can define the function $f : (x, y) \mapsto \pi x y^2$ which expresses the volume of a cylinder as a function of its dimensions.
• With a physical flavor. In an ideal gas, there holds $PV = nRT$, with $P$ the pressure, $V$ the volume, $n$ the number of moles and $T$ the temperature (and $R$ a physical constant). Thus we could define $f(P, T) = RT/P$ which gives the volume occupied by one mole of ideal gas as a function of the temperature and the pressure.
• Or a function of two variables can be read from some data. For instance $f(x, y)$ is the temperature measured at a longitude $x$ and latitude $y$ at the surface of the earth.
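The first two examples above translate directly into code. The following Python sketch is illustrative only: the function names and the numerical value used for the constant $R$ (in SI units) are assumptions introduced here, not part of the notes.

```python
import math

R = 8.314462618  # ideal gas constant in J/(mol K); value assumed here

def cylinder_volume(x, y):
    """Volume of a cylinder of height x and radius y."""
    return math.pi * x * y ** 2

def molar_volume(P, T):
    """Volume (m^3) occupied by one mole of ideal gas at pressure P (Pa)
    and temperature T (K), from P V = n R T with n = 1."""
    return R * T / P

print(cylinder_volume(2.0, 1.0))       # height 2, radius 1
print(molar_volume(101325.0, 273.15))  # about 0.0224 m^3
```

In both cases the output is a single real number determined by a pair of real inputs, which is exactly what a function from a subset of $\mathbb{R}^2$ to $\mathbb{R}$ is.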
Figure 4.1: Two examples of functions of two variables from “real” data. Left: altitude as a function of the
position, specifically only the level sets are represented (image found on Wikipedia, material originally
coming from the United States Geological Survey). Right: temperature as a function of the position
(coming from the website openweathermap.org). Note that in both cases one does not represent the
graph of the function: on the right one uses colors, on the left one just plots the level sets.
Definition 4.4 (Level sets). Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ be a function of 2 variables. The level set (at the level $k \in \mathbb{R}$) is the subset of $\mathbb{R}^2$ made of points $(x, y) \in D$ such that $f(x, y) = k$.
The generic situation is that a level set is a curve in $\mathbb{R}^2$, but it is not always the case: think of a constant function, whose level sets are all empty but for one which is $\mathbb{R}^2$.
Remark 4.5. For physical quantities, the level sets are usually called iso[something]. For instance, the
level sets of constant temperature are the isotherms, the level sets of constant pressure are the isobars,
etc.
Example 4.6. Let’s take the example of a linear (rather affine) function. We consider $f : (x, y) \in \mathbb{R}^2 \mapsto ax + by + c$ where $a, b, c \in \mathbb{R}$ are fixed.
Then the graph of $f$, which is a subset of $\mathbb{R}^3$, is characterized by the equation
$$ax + by - z + c = 0.$$
That is, it is a plane with normal vector $(a, b, -1)$. On the other hand, the level set at the level $k \in \mathbb{R}$ is the line in $\mathbb{R}^2$ defined by
$$ax + by = k - c,$$
it is a line which is normal to the vector $(a, b)$ and directed by $(-b, a)$. Actually, the vector $(a, b)$ represents the direction in which the function “increases the most”, the vector $(-a, -b)$ represents the direction in which the function “decreases the most” while $(-b, a)$ represents the direction in which the function does not change.
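The three directions of Example 4.6 can be tested on a concrete affine function. In the sketch below the coefficients, the base point and the step `t` are arbitrary illustrative values, and `make_affine` is a helper introduced only for this example.

```python
def make_affine(a, b, c):
    """Return the affine function (x, y) -> a x + b y + c."""
    return lambda x, y: a * x + b * y + c

a, b, c = 2.0, 1.0, -3.0   # illustrative coefficients
f = make_affine(a, b, c)
x0, y0 = 0.5, 1.5          # illustrative base point
t = 0.1                    # step size

# Step along (a, b): the value increases by t (a^2 + b^2).
along_gradient = f(x0 + t * a, y0 + t * b) - f(x0, y0)
# Step along (-a, -b): the value decreases by the same amount.
against_gradient = f(x0 - t * a, y0 - t * b) - f(x0, y0)
# Step along (-b, a): we stay on the same level set.
along_level = f(x0 - t * b, y0 + t * a) - f(x0, y0)

print(along_gradient, against_gradient, along_level)
```

For an affine function these identities are exact (up to rounding), whatever the base point and the step.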
Finally, by composition with a curve, one can go back to a usual function $\mathbb{R} \to \mathbb{R}$. Indeed, if $\gamma : I \to \mathbb{R}^2$ where $I$ is an interval of $\mathbb{R}$ and $f : D \subset \mathbb{R}^2 \to \mathbb{R}$, then provided that $\gamma(t) \in D$ for all $t \in I$ one can define $f \circ \gamma : t \in I \mapsto f(\gamma_1(t), \gamma_2(t)) \in \mathbb{R}$. However, a lesson of the rest of this chapter will be that it is sometimes hard to understand a function by looking only at its behavior when restricted to curves.
4.2 Continuity
Let’s turn to the definition of continuity. We will give different characterizations, but we start with an $\varepsilon$-$\delta$ definition.
Figure 4.2: Take f : px, y q ÞÑ 3x2 y 2 . On the left is the graph of the function as a subset of R3 . On the
right are some level sets, that is subsets of R2 defined by f px, y q k for some k P R.
Figure 4.3: Illustration of the definition of continuity for a function of two variables.
Loosely speaking, a function f is continuous over D if it commutes with the limit, that is if
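Proof. Let’s first assume that $f$ is continuous at a point $\mathbf{x} \in D$ and let $(\mathbf{y}_n)_{n \in \mathbb{N}}$ be a sequence which converges to $\mathbf{x}$ and such that $\mathbf{y}_n \in D$ for all $n \in \mathbb{N}$. We want to show that $f(\mathbf{y}_n)$ converges to $f(\mathbf{x})$ as $n \to \infty$. Let $\varepsilon > 0$. By continuity of $f$, we can find $\delta > 0$ such that $\|\mathbf{y} - \mathbf{x}\| \leq \delta$ implies $|f(\mathbf{y}) - f(\mathbf{x})| \leq \varepsilon$. As $(\mathbf{y}_n)_{n \in \mathbb{N}}$ converges to $\mathbf{x}$, we can find $N$ such that if $n \geq N$ then $\|\mathbf{y}_n - \mathbf{x}\| \leq \delta$. Thus combining these two assertions, if $n \geq N$ then $|f(\mathbf{y}_n) - f(\mathbf{x})| \leq \varepsilon$. This is enough to show that $(f(\mathbf{y}_n))_{n \in \mathbb{N}}$ converges to $f(\mathbf{x})$.
On the other hand, to prove the converse we reason by contraposition and we assume that $f$ is not continuous at a point $\mathbf{x}$. That means, with quantifiers,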
This characterization is the one one should use to prove that a function is continuous when this is
not a “delicate” case. For instance, with Proposition 4.8 as well as Proposition 3.3 (which shows that a
sequence in R2 converges if and only if it does coordinate per coordinate) you can show that the functions
px, yq ÞÑ lnpx2 y 2 9q
?y, px, yq ÞÑ 3x 2y, px, yq ÞÑ expxpx4
2
y2 q
1
are all continuous (over its domain of definition for the first one, over R2 for the two last ones).
Proof. This is a consequence of Proposition 4.8 once we know that sums, products and quotients of
converging sequences converge to sums, product and quotients of the limits.
Let’s prove a first result about the composition of continuous functions, we will see more general
results later in the course.
Proof. We rely on the sequential characterization. Let $t \in I$ and $(t_n)_{n \in \mathbb{N}}$ a sequence which converges to $t$. Then $\gamma(t_n)$ converges to $\gamma(t)$ in $\mathbb{R}^2$ (to check that, put together Definition 2.6 with Proposition 3.3). Thus, by sequential characterization (Proposition 4.8) the sequence $(f(\gamma(t_n)))_{n \in \mathbb{N}}$ converges to $f(\gamma(t))$. This is enough to prove the claim.
However, it can happen that $f \circ \gamma$ is continuous for many curves $\gamma$ while $f$ is not continuous.
Example 4.11. Let’s take $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x, y) = \begin{cases} 1 & \text{if } y = x^2 \text{ and } (x, y) \neq (0, 0), \\ 0 & \text{otherwise.} \end{cases}$$
This function is not continuous at $(0, 0)$. Indeed, for every $\delta > 0$ the ball centered at $(0, 0)$ and of radius $\delta$ contains points other than the origin such that $y = x^2$, thus there exists $(x, y) \in B_c((0, 0), \delta)$ such that $|f(x, y) - f(0, 0)| = |1 - 0| = 1$. Another way to see the discontinuity is to take the sequence $((1/n, 1/n^2))_{n \geq 1}$, which converges to $(0, 0)$ while $f(1/n, 1/n^2) = 1$ does not converge to $f(0, 0) = 0$.
On the other hand, if $\gamma$ is the parametrization of a straight line with $\gamma(0) = (0, 0)$, then $f \circ \gamma$ is continuous at $0$. Indeed, let $(v, w) \in \mathbb{R}^2$ be a non-zero vector and $\gamma : t \in \mathbb{R} \mapsto t(v, w) = (tv, tw)$ be a parametrization of a line directed by $(v, w)$. If $v = 0$ or $w = 0$ then the image of $\gamma$ is the $Oy$ or the $Ox$ axis and $f(\gamma(t)) = 0$ for all $t \in \mathbb{R}$. If not then
$$f(\gamma(t)) = \begin{cases} 1 & \text{if } t = w/v^2, \\ 0 & \text{otherwise.} \end{cases}$$
Thus for a given $(v, w)$, the map $f \circ \gamma$ is continuous at $t = 0$. Actually $f \circ \gamma$ is even differentiable on a neighborhood of $0$. In other words, the restriction of $f$ to a straight line passing through the origin is always continuous at the origin, but $f$ itself is discontinuous at $(0, 0)$.
[Figure: the curve of Example 4.11, on which $f = 1$ except at $(0, 0)$ where $f(0, 0) = 0$; $f(x, y) = 0$ away from the curve.]
The take-home message from this is that it’s not enough to look at the behavior of f across straight
lines passing through the origin to understand what’s going on.
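The behavior described in Example 4.11 can be observed numerically. The sketch below is an illustration under assumptions: the direction `(v, w)` and the sample points are arbitrary choices, and the test `y == x * x` on floating-point numbers is reliable here only because the second coordinate is computed as exactly `x * x`.

```python
def f(x, y):
    """1 on the parabola y = x^2 away from the origin, 0 elsewhere."""
    return 1.0 if (y == x * x and (x, y) != (0.0, 0.0)) else 0.0

# Along the parabola, approaching the origin: f stays equal to 1.
along_parabola = [f(1.0 / n, (1.0 / n) * (1.0 / n)) for n in range(1, 6)]

# Along the straight line t -> (t v, t w), approaching the origin: f is 0
# as soon as t is different from w / v^2 (here w / v^2 = 2).
v, w = 1.0, 2.0
along_line = [f(t * v, t * w) for t in (2.0 ** -k for k in range(1, 6))]

print(along_parabola, along_line, f(0.0, 0.0))
```

The values along the parabola stay at 1 while the value at the origin is 0, which is the discontinuity; along the line, the values near the origin are all 0.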
Let’s end this section with a word on linear functions. A linear function from $\mathbb{R}^2$ to $\mathbb{R}$ can be written
$$L(x, y) = ax + by$$
where $a, b$ are two real numbers corresponding to $L(1, 0)$ and $L(0, 1)$ respectively. In particular, using for instance Proposition 4.8:
Proposition 4.12. A linear function defined on $\mathbb{R}^2$ and valued in $\mathbb{R}$ is always continuous.
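that we can find $\delta > 0$ such that $B_o(\mathbf{x}, \delta) \subset f^{-1}(V)$. The latter inclusion implies that $|f(\mathbf{x}) - f(\mathbf{y})| < \varepsilon$ as soon as $\|\mathbf{y} - \mathbf{x}\| < \delta$, and it is enough to conclude the continuity of $f$ at $\mathbf{x}$. We finish by noticing that $\mathbf{x} \in \mathbb{R}^2$ is arbitrary.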
There is actually a similar characterization with the inverse image of closed sets. We first recall
without proof the following property from set theory: inverse image and complementation “commute”.
Proof. The proof is left as an exercise but can be thought as a corollary of Proposition 4.14 and Proposition
3.21 from the previous chapter, which shows that closed sets are nothing else than the complement of
open sets (and of course use Lemma 4.15 just stated above).
Remark 4.17. The direct image of an open set is in general not open, and the direct image of a closed set is in general not closed, but it is already the case for functions of one variable. For instance, for the function $f : x \mapsto \cos(x)$, the direct image of any open interval of length larger than $2\pi$ by $f$ is $[-1, 1]$, which is not open (actually it is closed).
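Definition 4.18 (Partial derivative). Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$. We define the partial derivative in the $x$ direction at a point $(x_0, y_0) \in \mathbb{R}^2$ as, if it exists,
$$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}.$$
Equivalently, the partial derivative in the $x$ direction at a point $(x_0, y_0) \in \mathbb{R}^2$ is the derivative at $0$ of the function $h \mapsto f(x_0 + h, y_0)$.
We define in a similar way the partial derivative in the $y$ direction at a point $(x_0, y_0) \in \mathbb{R}^2$ as
$$\frac{\partial f}{\partial y}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}.$$
In other words, we only look at the derivative of a function defined on $\mathbb{R}$, namely $f \circ \gamma$ where $\gamma$ is the parametrization of a line passing through $(x_0, y_0)$ with direction $(1, 0)$ or $(0, 1)$. More generally, if $\mathbf{u} \in \mathbb{R}^2$ is any unit vector (that is $\|\mathbf{u}\| = 1$) in $\mathbb{R}^2$, we can define
$$\frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_0) = \lim_{t \to 0} \frac{f(\mathbf{x}_0 + t\mathbf{u}) - f(\mathbf{x}_0)}{t}, \tag{4.1}$$
the partial derivative in the direction of $\mathbf{u}$. If $\mathbf{u}$ is not zero but is not a unit vector, the directional derivative corresponds to the one with respect to $\mathbf{u}/\|\mathbf{u}\|$, that is the unit vector sharing the same direction as $\mathbf{u}$. The partial derivatives defined above correspond to $\mathbf{u} = \mathbf{e}_1$ or $\mathbf{e}_2$, the two vectors of the canonical basis.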
Remark 4.19. In practice, to compute a partial derivative, one “freezes” the other variable and uses derivation rules for single variable calculus. For instance, if $f$ is defined by $f(x, y) = \ln(x^2 + y^2 + 1) + 3xy^2$ then
$$\frac{\partial f}{\partial x}(x, y) = \frac{2x}{x^2 + y^2 + 1} + 3y^2 \quad \text{and} \quad \frac{\partial f}{\partial y}(x, y) = \frac{2y}{x^2 + y^2 + 1} + 6xy. \tag{4.2}$$
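The “freezing” recipe can be cross-checked numerically with finite differences. The sketch below is only an illustration: it takes $f(x, y) = \ln(x^2 + y^2 + 1) + 3xy^2$ as a working example (the signs are an assumption), and the helpers `partial_x`, `partial_y` and the step `h` are choices made here.

```python
import math

def f(x, y):
    # Illustrative function (signs assumed): ln(x^2 + y^2 + 1) + 3 x y^2
    return math.log(x * x + y * y + 1.0) + 3.0 * x * y * y

def partial_x(g, x, y, h=1e-6):
    """Central finite difference in x: y is frozen."""
    return (g(x + h, y) - g(x - h, y)) / (2.0 * h)

def partial_y(g, x, y, h=1e-6):
    """Central finite difference in y: x is frozen."""
    return (g(x, y + h) - g(x, y - h)) / (2.0 * h)

def exact_partials(x, y):
    """Partial derivatives obtained with single-variable derivation rules."""
    s = x * x + y * y + 1.0
    return (2.0 * x / s + 3.0 * y * y, 2.0 * y / s + 6.0 * x * y)

x0, y0 = 0.7, -1.2  # arbitrary test point
print(partial_x(f, x0, y0), partial_y(f, x0, y0), exact_partials(x0, y0))
```

The finite differences agree with the closed-form expressions up to a small discretization error.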
Remark 4.20. The notation $(\partial f / \partial x)(x, y)$ is a bit ambiguous because the second $x$ denotes the point at which we differentiate while the first $x$ indicates the variable with respect to which we differentiate. A more proper notation could be $\partial_1 f(x, y)$ to indicate that we differentiate $f$ with respect to its first coordinate.
As an example, let’s look at the function f defined on R2 by
f px, y q xy,
4.5 Differential
We now turn to the concept of differentiability. We will start with a first definition which is not the most
standard but is easier to grasp, and also the one that one uses to check that a function is differentiable
on examples. At the end of the section we will give one which is more abstract, but is also the one which
makes sense in more general context.
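First, as a preliminary result, let’s define what we will mean by $o(\|\mathbf{h}\|)$.
Lemma 4.21. Let $g : D \subset \mathbb{R}^2 \to \mathbb{R}$ be defined on a neighborhood of $\mathbf{0}$. Then the following are equivalent:
(i) There holds $g(\mathbf{0}) = 0$ and the function $\mathbf{h} \mapsto g(\mathbf{h})/\|\mathbf{h}\|$ converges to $0$ as $\mathbf{h} \to \mathbf{0}$, that is
$$\forall \varepsilon > 0, \ \exists \delta > 0, \ \forall \mathbf{h} \in D \setminus \{\mathbf{0}\}, \ \|\mathbf{h}\| \leq \delta \Rightarrow \frac{|g(\mathbf{h})|}{\|\mathbf{h}\|} \leq \varepsilon.$$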
Now let’s consider a point $(x_0, y_0)$ and a function $f : \mathbb{R}^2 \to \mathbb{R}$ for which both partial derivatives exist. Note that we can read it as:
$$f(x_0 + h, y_0) \simeq f(x_0, y_0) + h \frac{\partial f}{\partial x}(x_0, y_0), \qquad f(x_0, y_0 + k) \simeq f(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0),$$
where the two equations correspond to respectively existence of $\partial f / \partial x$ and $\partial f / \partial y$. The precise statement would involve Taylor expansion (see (4.3) below), it is the characterization of derivatives for functions of one variable that you have seen in Mathematical Analysis – Module 1. Differentiability is the possibility to combine these two estimates together, that is, to write:
$$f(x_0 + h, y_0 + k) \simeq f(x_0, y_0) + h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
More specifically, the $\simeq$ will be replaced by a $o(\mathbf{h})$ as defined in Lemma 4.21.
Definition 4.22 (Differentiability, first definition). Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. We say that $f$ is differentiable at $(x_0, y_0) \in D$ if both partial derivatives of $f$ at $(x_0, y_0)$ exist and there exists $r > 0$ such that, for all $(h, k) \in B_c(\mathbf{0}, r)$ there holds $(x_0 + h, y_0 + k) \in D$ and
$$f(x_0 + h, y_0 + k) = f(x_0, y_0) + h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0) + o((h, k)).$$
In this case, we call differential of $f$ at $(x_0, y_0)$, and write $Df_{(x_0, y_0)}$, the linear map from $\mathbb{R}^2$ to $\mathbb{R}$ defined by: for $(h, k) \in \mathbb{R}^2$
$$Df_{(x_0, y_0)}(h, k) = h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
The equation defining differentiability should be read:
$$f(x_0 + h, y_0 + k) = \underbrace{f(x_0, y_0)}_{\text{constant term}} + \underbrace{h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0)}_{\text{linear function of } (h, k)} + \underbrace{o((h, k))}_{\text{remainder}},$$
that is, $f$ is well described in a neighborhood of $(x_0, y_0)$ by the sum of the constant term $f(x_0, y_0)$ and a linear function (which we call the differential).
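One can observe the remainder becoming negligible on a concrete case. The following sketch is an illustration under assumptions: the function `f`, its hand-computed partial derivatives, the base point and the direction `(0.6, 0.8)` are all choices made here, not taken from the notes.

```python
import math

def f(x, y):
    # Illustrative smooth function: x^2 y + sin(y)
    return x * x * y + math.sin(y)

def fx(x, y):
    return 2.0 * x * y           # partial derivative in x

def fy(x, y):
    return x * x + math.cos(y)   # partial derivative in y

x0, y0 = 1.0, 2.0

ratios = []
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    h, k = 0.6 * s, 0.8 * s      # increment of norm s
    remainder = f(x0 + h, y0 + k) - f(x0, y0) - h * fx(x0, y0) - k * fy(x0, y0)
    # Differentiability means that remainder / ||(h, k)|| goes to 0.
    ratios.append(abs(remainder) / math.hypot(h, k))

print(ratios)
```

The ratio shrinks roughly proportionally to the norm of the increment, which is the numerical signature of a remainder that is $o((h, k))$.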
At this point it may be useful to make the analogy with functions of one variable. Note that if $f : \mathbb{R} \to \mathbb{R}$ then its derivative at the point $x_0$ is defined as the limit when $h \to 0$ of
$$\frac{f(x_0 + h) - f(x_0)}{h}.$$
If $f$ is defined over $\mathbb{R}^2$, then both $\mathbf{x}_0$ and $\mathbf{h}$ are vectors and it is not clear what the expression above means: there is no canonical definition of $1/\mathbf{h}$, the inverse of a vector$^1$. We rather look at this equivalent definition of derivative: if $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x_0 \in \mathbb{R}$ then
$$f(x_0 + h) = f(x_0) + f'(x_0) h + o(h), \tag{4.3}$$
where $o(h)$ is a quantity that goes to $0$ faster than $h$ as $h \to 0$. In some sense, the only thing that we have done is “multiplying by $h$”. But this definition can be extended to functions defined over $\mathbb{R}^2$. Indeed, in this case $f'(x_0) h$ is interpreted as a linear function $h \mapsto f'(x_0) h$. For a function of two variables, the term $h \mapsto f'(x_0) h$ is replaced by the linear map
which is a complex number. This gives rise to the field of complex analysis which we will not discuss at all.
Following Example 4.6, the gradient gives the direction where the differential “increases the most”. As the differential approximates the function, the interpretation is that $\nabla f(\mathbf{x}_0)$ gives the direction in which the function $f$ increases the most if one starts from the point $\mathbf{x}_0 \in \mathbb{R}^2$.
Proof. Recall that we can write $f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h})$. The function $f$ being continuous at $\mathbf{x}_0$ is equivalent to $g : \mathbf{h} \mapsto f(\mathbf{x}_0 + \mathbf{h})$ being continuous at $\mathbf{0}$. But $g$ is the sum of $f(\mathbf{x}_0)$ (which does not depend on $\mathbf{h}$ hence is continuous), $Df_{\mathbf{x}_0}$ (which is continuous, see Proposition 4.12) and $o(\mathbf{h})$. The latter goes to $0$ as $\mathbf{h}$ tends to $\mathbf{0}$ as seen for instance in Lemma 4.21, (ii).
As in one dimension the derivative gives the tangent line to the graph of $f$, the differential gives the tangent plane to the graph of $f$. First let us notice that by writing $\mathbf{h} = \mathbf{y} - \mathbf{x}_0$, one can write the Taylor expansion as
$$f(\mathbf{y}) = f(\mathbf{x}_0) + Df_{\mathbf{x}_0}(\mathbf{y} - \mathbf{x}_0) + o(\mathbf{y} - \mathbf{x}_0).$$
The graph of the principal part of the Taylor expansion corresponds to the tangent plane.
$$\frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) + (f(x_0, y_0) - z) = 0.$$
The explicit expression for the tangent plane can be obtained thanks to Definition 4.22 (which gives the expression of the differential in terms of partial derivatives) and Example 4.6 (which gives the expression of the graph of an affine function).
We then move to a result justifying differentiability. Indeed, a priori if we read Definition 4.22 one should first compute the partial derivatives, and then check that the difference between $f$ and its candidate differential is small, that is, is $o(\mathbf{h})$. As emphasized before (Example 4.11), if the partial derivatives exist at a point then the function is not necessarily differentiable. However this becomes the case if the partial derivatives exist at every point and are continuous, this is the point of the following result.
Theorem 4.26 (Existence and continuity of partial derivatives implies differentiability). Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$. We assume that the partial derivatives exist at every point of $D$, and that the functions $\frac{\partial f}{\partial x} : D \to \mathbb{R}$ and $\frac{\partial f}{\partial y} : D \to \mathbb{R}$ are continuous (as functions of two variables) on $D$. Then the function $f$ is differentiable at every point of $D$.
In such a case, the function is said to be of class $C^1$.
Figure 4.5: Example of the graph of the function f px, y q 3x2 y 2 xy (in red) with its tangent plane
at the point $A = (1, 1, 3)$ and the normal vector to the plane $(\partial f / \partial x, \partial f / \partial y, -1)$ (in green).
[Figure 4.6 labels: $f(x_0 + h, y_0 + k) - f(x_0, y_0)$; $f(x_0 + h, y_0) - f(x_0, y_0) = h \frac{\partial f}{\partial x}(x', y_0)$; $f(x_0 + h, y_0 + k) - f(x_0 + h, y_0) = k \frac{\partial f}{\partial y}(x_0 + h, y')$.]
Figure 4.6: Idea of the proof of Theorem 4.26. We decompose $f(x_0 + h, y_0 + k) - f(x_0, y_0)$ (in blue) as two variations (in red and green) aligned with the axes, and for each of them we use the mean value theorem for a function of one variable.
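Proof. Let $\mathbf{x}_0 = (x_0, y_0) \in D$ be fixed and take $\mathbf{h} = (h, k)$. We look at
$$\Delta(\mathbf{h}) = f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - h \frac{\partial f}{\partial x}(\mathbf{x}_0) - k \frac{\partial f}{\partial y}(\mathbf{x}_0)$$
and we want to show that the quantity $\Delta(\mathbf{h})$ is a $o(\mathbf{h})$. To that end, we will rewrite $\Delta(\mathbf{h})$ using the mean value theorem.
Specifically we write
$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = h \frac{\partial f}{\partial x}(x', y_0) + k \frac{\partial f}{\partial y}(x_0 + h, y'), \tag{4.4}$$
where $x'$ lies between $x_0$ and $x_0 + h$ and $y'$ lies between $y_0$ and $y_0 + k$. This follows from splitting the increment as $[f(x_0 + h, y_0) - f(x_0, y_0)] + [f(x_0 + h, y_0 + k) - f(x_0 + h, y_0)]$ and applying the mean value theorem for functions of one variable to each bracket.
Now we want to replace the points $(x', y_0)$ and $(x_0 + h, y')$ by $(x_0, y_0)$: we will do that up to a small error thanks to the continuity of the partial derivatives.
Let $\varepsilon > 0$. By continuity of $\partial f / \partial x$ and $\partial f / \partial y$, we can find $\delta > 0$ such that, if $\|\mathbf{h}'\| \leq \delta$ then $|\partial f / \partial x(\mathbf{x}_0 + \mathbf{h}') - \partial f / \partial x(\mathbf{x}_0)| \leq \varepsilon$ and $|\partial f / \partial y(\mathbf{x}_0 + \mathbf{h}') - \partial f / \partial y(\mathbf{x}_0)| \leq \varepsilon$. So now let’s fix $\mathbf{h} \in B_c(\mathbf{0}, \delta)$. Applying the result above for $\mathbf{h}' = (h, y' - y_0)$ and $\mathbf{h}' = (x' - x_0, 0)$ we discover that
$$\left| \frac{\partial f}{\partial y}(x_0 + h, y') - \frac{\partial f}{\partial y}(x_0, y_0) \right| \leq \varepsilon \quad \text{and} \quad \left| \frac{\partial f}{\partial x}(x', y_0) - \frac{\partial f}{\partial x}(x_0, y_0) \right| \leq \varepsilon.$$
Then plugging this bound into (4.4) and playing with the triangle inequality we see:
$$\begin{aligned} |\Delta(\mathbf{h})| &= \left| f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - h \frac{\partial f}{\partial x}(\mathbf{x}_0) - k \frac{\partial f}{\partial y}(\mathbf{x}_0) \right| \\ &= \left| h \left( \frac{\partial f}{\partial x}(x', y_0) - \frac{\partial f}{\partial x}(x_0, y_0) \right) + k \left( \frac{\partial f}{\partial y}(x_0 + h, y') - \frac{\partial f}{\partial y}(x_0, y_0) \right) \right| \\ &\leq |h| \varepsilon + |k| \varepsilon \leq \sqrt{2} \, \varepsilon \|\mathbf{h}\|, \end{aligned}$$
where the last line is Cauchy-Schwarz (Proposition 1.9). Thus we fall in (iii) of Lemma 4.21, and it shows that $\Delta(\mathbf{h}) = o(\mathbf{h})$ which concludes the proof.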
One corollary of this result is the following stability result for functions of class C 1 .
Proposition 4.27. Let $f, g : D \subset \mathbb{R}^2 \to \mathbb{R}$ be two functions of class $C^1$ defined on an open subset of $\mathbb{R}^2$. Then the functions $f + g$ and $fg$ are of class $C^1$. If $g$ does not vanish on $D$, the function $f/g$ is of class $C^1$.
Proof. It is enough to prove that the functions $f + g$, $fg$ and $f/g$ have partial derivatives and that these partial derivatives are continuous. The first property derives from calculus for functions of one variable (the sum, product, quotient of differentiable functions is differentiable), and for the second property we can use Proposition 4.9 which shows that sum, product, quotient of continuous functions over $\mathbb{R}^2$ are continuous.
Finally, we conclude this section with a second equivalent definition of differentiability which does not involve partial derivatives: it is enough for the function to be approximated by a linear map, and then necessarily this linear map is the differential and can be written with the help of partial derivatives. In most textbooks, it is directly this definition that you will encounter, as it is more compact and more “elegant” than Definition 4.22.
In this case, the partial derivatives of $f$ at $(x_0, y_0)$ exist and we necessarily have $L = Df_{\mathbf{x}_0}$, that is,
$$L(h, k) = h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0).$$
Proof. The direction $(\Rightarrow)$ is direct: if $f$ is differentiable according to Definition 4.22, then indeed such $L$ exists and is nothing else than $Df_{\mathbf{x}_0}$.
On the other hand, the converse direction $(\Leftarrow)$ is where there is something to do. So let’s assume that there exists such an $L$. We claim that it implies that the partial derivatives at $(x_0, y_0)$ exist. Indeed, taking $\mathbf{h} = (h, 0)$ for $h$ small we find
$$f(x_0 + h, y_0) = f(x_0, y_0) + L(h, 0) + o(h) = f(x_0, y_0) + h L(1, 0) + o(h)$$
by linearity of $L$. It implies that the function $h \mapsto f(x_0 + h, y_0)$ is differentiable at $0$ with derivative given by $L(1, 0)$. Thus $\partial f / \partial x(x_0, y_0)$ exists and is equal to $L(1, 0)$. Similarly, testing with $\mathbf{h} = (0, h)$, we find that $\partial f / \partial y(x_0, y_0)$ exists and is equal to $L(0, 1)$. Thus, by linearity of $L$,
$$L(h, k) = h L(1, 0) + k L(0, 1) = h \frac{\partial f}{\partial x}(x_0, y_0) + k \frac{\partial f}{\partial y}(x_0, y_0) = Df_{\mathbf{x}_0}(\mathbf{h}).$$
Finally, we simply write $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - L(\mathbf{h}) = f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - Df_{\mathbf{x}_0}(\mathbf{h})$, and the left hand side is $o(\mathbf{h})$ by assumption, so is the right hand side. We conclude that $f$ is differentiable.
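\[ \lim_{h \to 0} Df_{\gamma(t_0)} \left( \frac{\gamma(t_0 + h) - \gamma(t_0)}{h} \right) = Df_{\gamma(t_0)}(\gamma'(t_0)) \]
by definition of $\gamma'$ and continuity of $Df_{\gamma(t_0)}$. On the other hand, to handle the “small o” let us write it $\|\mathbf{h}\| \omega(\mathbf{h})$ where $\omega$ is a function which goes to $0$ as $\mathbf{h}$ goes to $\mathbf{0}$. Then
\[ \frac{1}{h} \, o(\gamma(t_0 + h) - \gamma(t_0)) = \frac{\|\gamma(t_0 + h) - \gamma(t_0)\|}{h} \, \omega(\gamma(t_0 + h) - \gamma(t_0)). \]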
Chapter 4. Continuity and differentiability for functions from R2 to R
Figure 4.7: Level sets of a function (in blue) with the gradient $\nabla f(\mathbf{x}_0)$ (in red) being orthogonal to the level set at the point $\mathbf{x}_0$ (in green).
When $h \to 0$, then $\omega(\gamma(t_0 + h) - \gamma(t_0))$ goes to $0$ (as $\gamma$ is continuous at $t_0$) while $\|\gamma(t_0 + h) - \gamma(t_0)\| / h$ goes to $\pm \|\gamma'(t_0)\|$ (actually it tends to $\|\gamma'(t_0)\|$ if $h$ stays positive and $-\|\gamma'(t_0)\|$ if $h$ stays negative) and remains bounded. Thus we have the product of a bounded term by a term which goes to $0$: the whole thing goes to $0$. This concludes the proof.
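A first consequence is the following. If a function is differentiable, then all the directional derivatives can be expressed in terms of the differential, that is, of the two partial derivatives.
Proposition 4.30. Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables defined on an open set $D$ which is differentiable at a point $\mathbf{x}_0 \in D$. If $\mathbf{u} \in \mathbb{R}^2$ is a unit vector (that is, $\|\mathbf{u}\| = 1$), then $f$ is differentiable at $\mathbf{x}_0$ in the direction $\mathbf{u}$ and
\[ \frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_0) = Df_{\mathbf{x}_0}(\mathbf{u}) = \nabla f(\mathbf{x}_0) \cdot \mathbf{u}, \]
where we recall that the directional derivative is defined in (4.1). In coordinates, if $\mathbf{u} = (u_1, u_2) \in \mathbb{R}^2$,
\[ \frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_0) = u_1 \frac{\partial f}{\partial x}(\mathbf{x}_0) + u_2 \frac{\partial f}{\partial y}(\mathbf{x}_0). \]
Proof. As seen in (4.1), denoting $\gamma : t \mapsto \mathbf{x}_0 + t\mathbf{u}$, then $\partial f / \partial \mathbf{u}(\mathbf{x}_0)$ is nothing else than the derivative of the function $f \circ \gamma$ at $t = 0$. Thus we directly apply Proposition 4.29.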
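As a numerical sanity check of Proposition 4.30, here is a minimal sketch comparing a difference quotient of $t \mapsto f(\mathbf{x}_0 + t\mathbf{u})$ with $\nabla f(\mathbf{x}_0) \cdot \mathbf{u}$. The function `f`, the point `x0` and the direction `u` are illustrative assumptions, not taken from the notes.

```python
import math

def f(x, y):
    # Illustrative smooth function (an assumption, not from the notes).
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Gradient of f computed by hand: (df/dx, df/dy).
    return (2 * x * y, x**2 + math.cos(y))

def directional_derivative(func, x0, u, t=1e-6):
    # Symmetric difference quotient of t -> func(x0 + t u) at t = 0.
    fwd = func(x0[0] + t * u[0], x0[1] + t * u[1])
    bwd = func(x0[0] - t * u[0], x0[1] - t * u[1])
    return (fwd - bwd) / (2 * t)

x0 = (1.0, 2.0)
theta = 0.7
u = (math.cos(theta), math.sin(theta))  # unit vector

g = grad_f(*x0)
numeric = directional_derivative(f, x0, u)
formula = g[0] * u[0] + g[1] * u[1]  # gradient dotted with u
print(numeric, formula)
```

The two printed numbers should agree up to the discretization error of the difference quotient.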
Another consequence of Proposition 4.29 is to interpret the gradient as the vector to which level sets are orthogonal. One way to phrase a rigorous statement is the following.
Proposition 4.31. Let $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables of class $C^1$ defined on an open set $D$. Let $\gamma : I \subset \mathbb{R} \to \mathbb{R}^2$ be a differentiable curve which takes its values in a fixed level set of $f$. Then $\nabla f(\gamma(t)) \cdot \gamma'(t) = 0$ for all $t \in I$.
Proof. Immediate from Proposition 4.29 once we notice that $f \circ \gamma$ is a constant function (this is equivalent to saying that $\gamma$ is included in a level set).
The way to read $\nabla f(\gamma(t)) \cdot \gamma'(t) = 0$ is as follows: $\gamma'(t)$ is the direction of the tangent to the curve $\gamma$, which is included in the level set; and this direction is orthogonal to $\nabla f(\gamma(t))$, the gradient of the function. So for instance if the level set can be written as the image of a $C^1$ curve, then the tangent to the curve is orthogonal to the gradient. This is actually not that surprising: as mentioned in Example 4.6, for an affine function $(x, y) \mapsto ax + by + c$, the vector $(a, b)$ is orthogonal to level sets. In the case of a general function, the vector $(a, b)$ of the linear approximation around a point $\mathbf{x}_0$ is nothing else than the gradient $\nabla f(\mathbf{x}_0)$.
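To illustrate Proposition 4.31 on a concrete case, the sketch below takes the (assumed, illustrative) function $f(x, y) = x^2 + 4y^2$, whose level set $\{f = 4\}$ is an ellipse parametrized by $\gamma(t) = (2\cos t, \sin t)$, and checks that $\nabla f(\gamma(t)) \cdot \gamma'(t)$ vanishes at a few sample times.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x^2 + 4 y^2.
    return (2 * x, 8 * y)

def gamma(t):
    # Parametrization of the level set {f = 4} (an ellipse).
    return (2 * math.cos(t), math.sin(t))

def gamma_prime(t):
    # Derivative of gamma, computed by hand.
    return (-2 * math.sin(t), math.cos(t))

dots = []
for i in range(8):
    t = 2 * math.pi * i / 8
    gx, gy = grad_f(*gamma(t))
    vx, vy = gamma_prime(t)
    dots.append(gx * vx + gy * vy)  # gradient dotted with the tangent

print(max(abs(d) for d in dots))
```

The printed maximum should be zero up to rounding, which is the orthogonality between the gradient and the tangent to the level set.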
Chapter 5
In this chapter, we study the more general case of functions defined on a subset of $\mathbb{R}^d$ and valued in $\mathbb{R}^p$. This encompasses Chapter 2 (with $d = 1$) and Chapter 4 (with $d = 2$ and $p = 1$).
The question of continuity is very similar to Chapter 4 and we will not do the proofs in detail. Differentiability is a little bit more involved but the general idea is the same: the differential $Df_{\mathbf{x}_0}$ of a function $f : \mathbb{R}^d \to \mathbb{R}^p$ at a point $\mathbf{x}_0$ is a linear function $\mathbb{R}^d \to \mathbb{R}^p$ such that $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ is approximated by $Df_{\mathbf{x}_0}(\mathbf{h})$. However, the differential is now represented not by the gradient vector but by a matrix, called the Jacobian matrix. The main theoretical result of this chapter is the chain rule: abstractly the result is very neat, as the differential of the composition is the composition of the differentials.
5.1 Definition
Definition 5.1. A function $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ is the data of $D \subset \mathbb{R}^d$, which is the domain of definition, and $f$, which associates to every point $\mathbf{x}$ of $D$ a vector $f(\mathbf{x}) \in \mathbb{R}^p$.
That is, $f$ is given by $(f_j)_{1 \le j \le p}$, a collection of $p$ functions all defined on the same domain $D$ and valued in $\mathbb{R}$. Similarly to the case of parametric curves, to plot the function or understand what is going on, thinking of $f$ as a collection of real-valued functions does not tell the whole story.
Representing such a function is not easy: the representation that one chooses depends on the context and the message that one wants to convey.
Definition 5.2 (Graph of a function). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a vector-valued function of $d$ variables. Its graph is the subset of $\mathbb{R}^{d+p}$ made of the points $(\mathbf{x}, \mathbf{y})$ such that $\mathbf{x} \in D \subset \mathbb{R}^d$, $\mathbf{y} \in \mathbb{R}^p$ and $\mathbf{y} = f(\mathbf{x})$.
Definition 5.3 (Image of a function). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a vector-valued function of $d$ variables. Its image is the subset of $\mathbb{R}^p$ made of the points $\mathbf{y}$ such that $\mathbf{y} = f(\mathbf{x})$ for some $\mathbf{x} \in D$.
However one cannot represent the graph as soon as $d + p > 3$. When $d = p = 2$, there are possible workarounds.
• If $f(\mathbf{x}) \in \mathbb{R}^2$ is thought of as a vector, we can plot the domain $D$ and at some points $\mathbf{x} \in D$ plot arrows corresponding to $f(\mathbf{x}) \in \mathbb{R}^2$. Then $f$ is called a vector field, see Figure 5.1.
Figure 5.1: Representation of a function $f : \mathbb{R}^2 \to \mathbb{R}^2$ as a vector field, here $f(x, y) = \begin{pmatrix} x^2 - y^2 - 4 \\ 2xy \end{pmatrix}$. One attaches the vector $f(x, y)$ to the point $(x, y)$ for different values of $(x, y) \in \mathbb{R}^2$.
• If $f(\mathbf{x}) \in \mathbb{R}^2$ is rather thought of as a point, one can grid the domain $D$ and show what it is mapped to. This makes more sense when $f$ is thought of as a change of coordinates, see Figure 5.2.
• If $\mathbf{x}$ belongs to a subset of $\mathbb{R}^2$ and $f(\mathbf{x}) \in \mathbb{R}^3$, instead of a parametric curve (which corresponds to the case $x \in \mathbb{R}$), we have a parametric surface and one can represent the image of $f$, that is, the subset of $\mathbb{R}^3$ defined as $\{ \mathbf{y} \in \mathbb{R}^3 : \exists \mathbf{x} \in \mathbb{R}^2, \ f(\mathbf{x}) = \mathbf{y} \}$. See Figure 5.3 for such an example.
Similarly to the case of real-valued functions, by composition with a curve, one can go back to a parametric curve $\mathbb{R} \to \mathbb{R}^p$. Indeed, if $\gamma : I \to \mathbb{R}^d$ where $I$ is an interval of $\mathbb{R}$ and $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$, then provided that $\gamma(t) \in D$ for all $t \in I$ one can define $f \circ \gamma : t \in I \mapsto f(\gamma(t)) \in \mathbb{R}^p$. That is, $f \circ \gamma$ is a parametric curve.
Chapter 5. Function from Rd to Rp : vector fields and change of coordinates
Figure 5.3: Image of a function $f : \mathbb{R}^2 \to \mathbb{R}^3$, here a torus. The function $f$ is $f(u, v) = ((R + r\cos(u))\cos(v), (R + r\cos(u))\sin(v), r\sin(u))$ where $(r, R) = (1, 3)$ are the inner and outer radius of the torus. One plots only the points $(x, y, z)$ which can be written $(x, y, z) = f(u, v)$ for some $(u, v)$.
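The parametrization of Figure 5.3 can be checked numerically. The sketch below evaluates $f(u, v)$ on a grid and verifies that every image point satisfies the standard implicit equation of a torus, $(\sqrt{x^2 + y^2} - R)^2 + z^2 = r^2$. This implicit equation is a classical fact that is not stated in the notes, and the names `torus` and `implicit_residual` are illustrative.

```python
import math

R, r = 3.0, 1.0  # outer and inner radius, as in Figure 5.3

def torus(u, v):
    # Parametric surface f(u, v) valued in R^3.
    return ((R + r * math.cos(u)) * math.cos(v),
            (R + r * math.cos(u)) * math.sin(v),
            r * math.sin(u))

def implicit_residual(x, y, z):
    # Vanishes exactly on the torus (classical implicit equation).
    return (math.hypot(x, y) - R) ** 2 + z ** 2 - r ** 2

n = 12
residuals = [
    implicit_residual(*torus(2 * math.pi * i / n, 2 * math.pi * j / n))
    for i in range(n)
    for j in range(n)
]
print(max(abs(res) for res in residuals))
```

All residuals should vanish up to rounding, which confirms that the image of $f$ is contained in the torus.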
\[ (\mathbf{y}_n)_{n \in \mathbb{N}} \text{ converges to } \mathbf{x} \ \Rightarrow \ \forall j \in \{1, 2, \dots, p\}, \ (f_j(\mathbf{y}_n))_{n \in \mathbb{N}} \text{ converges to } f_j(\mathbf{x}). \]
We can take the $\forall j$ out of the implication, and then switch the order of the two quantifiers. This reads as follows. The function $f$ is continuous at $\mathbf{x}$ if and only if, for all $j \in \{1, 2, \dots, p\}$ and for every sequence $(\mathbf{y}_n)_{n \in \mathbb{N}}$ in $D$,
\[ (\mathbf{y}_n)_{n \in \mathbb{N}} \text{ converges to } \mathbf{x} \ \Rightarrow \ (f_j(\mathbf{y}_n))_{n \in \mathbb{N}} \text{ converges to } f_j(\mathbf{x}). \]
But then for the latter implications we use the characterization of continuity for functions $f : \mathbb{R}^d \to \mathbb{R}$ that we have seen in the previous chapter: see Proposition 4.8. This is enough to conclude the proof.
So the study of continuity for a vector-valued function is nothing else than the study of the continuity of its coordinate functions. In particular, if the dimension $d$ of the domain is larger than or equal to $2$, then all the weird examples of discontinuous functions (see for instance Example 4.11) can be reproduced. Note also that, for simplicity, we adopted the characterization of continuity we just proved as a definition in the case of curves (see Definition 2.6).
Remark 5.8. The same characterization as the one in Section 4.3 works. That is, a function $f : \mathbb{R}^d \to \mathbb{R}^p$ is continuous if and only if the inverse image of every open set is an open set, and if and only if the inverse image of every closed set is a closed set.
5.3 Differentiability
For the case of real-valued functions, we have seen first the definition of partial derivatives and then the differential. We will follow the same path (without doing the proofs, as they can be adapted for instance by working coordinate by coordinate in the codomain $\mathbb{R}^p$). The difference is that partial derivatives become vectors and the differential is represented by a matrix called the Jacobian matrix.
Definition 5.9 (Partial derivative). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^p$ be a function and write $f = (f_j)_{1 \le j \le p}$ for the coordinate functions, that is, each $f_j$ is defined on $D$ and valued in $\mathbb{R}$. Let $i \in \{1, 2, \dots, d\}$. If all the $f_j$ have a partial derivative at a point $\mathbf{x}_0 \in D$ in the direction $x_i$, we define the partial derivative of $f$ in the $x_i$ direction at $\mathbf{x}_0$, and write it $\frac{\partial f}{\partial x_i}(\mathbf{x}_0)$, as the vector $\left( \frac{\partial f_j}{\partial x_i}(\mathbf{x}_0) \right)_{1 \le j \le p} \in \mathbb{R}^p$.
No surprise, taking partial derivatives is equivalent to taking partial derivatives coordinate by coordinate in the codomain. Let us introduce directly the Jacobian matrix, which is the neat algebraic structure to store derivatives.
\[ \forall i \in \{1, 2, \dots, d\}, \ \forall j \in \{1, 2, \dots, p\}, \quad [J_f(\mathbf{x}_0)]_{ji} = \frac{\partial f_j}{\partial x_i}(\mathbf{x}_0). \tag{5.1} \]
That is, row indices correspond to the codomain while column indices are for the domain. In fact, the columns of the Jacobian matrix are the vectors $\partial f / \partial x_i (\mathbf{x}_0)$.
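To make the index convention of (5.1) concrete, here is a minimal sketch that builds the Jacobian matrix column by column from difference quotients and compares it with the matrix computed by hand, for the vector field $f(x, y) = (x^2 - y^2 - 4, 2xy)$ of Figure 5.1. The helper `jacobian_numeric` is hypothetical and only meant as an illustration.

```python
def f(x, y):
    # Vector field of Figure 5.1, valued in R^2.
    return (x**2 - y**2 - 4, 2 * x * y)

def jacobian_exact(x, y):
    # Row j = coordinate function f_j, column i = variable x_i.
    return [[2 * x, -2 * y],
            [2 * y, 2 * x]]

def jacobian_numeric(func, point, eps=1e-6):
    # Column i is a centered difference quotient approximating df/dx_i.
    d = len(point)
    p = len(func(*point))
    J = [[0.0] * d for _ in range(p)]
    for i in range(d):
        plus, minus = list(point), list(point)
        plus[i] += eps
        minus[i] -= eps
        f_plus, f_minus = func(*plus), func(*minus)
        for j in range(p):
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * eps)
    return J

x0 = (1.5, -0.5)
J_num = jacobian_numeric(f, x0)
J_ex = jacobian_exact(*x0)
print(J_num)
print(J_ex)
```

The entry `J[j][i]` stores $\partial f_j / \partial x_i$, so each pass of the outer loop fills one column, in line with the remark that the columns are the partial derivative vectors.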
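We move to the definition of differentiability, but let us discuss it before giving the formal definition. In the previous chapter, we defined it through the Taylor expansion:
\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h}), \]
where $Df_{\mathbf{x}_0}$ was the differential. It is a linear function from $\mathbb{R}^2$ to $\mathbb{R}$ which reads (for a function from $\mathbb{R}^2$ to $\mathbb{R}$)
\[ Df_{\mathbf{x}_0}(h, k) = h \frac{\partial f}{\partial x}(\mathbf{x}_0) + k \frac{\partial f}{\partial y}(\mathbf{x}_0). \]
The natural extension is to have now
\[ Df_{\mathbf{x}_0}(\mathbf{h}) = h_1 \frac{\partial f}{\partial x_1}(\mathbf{x}_0) + h_2 \frac{\partial f}{\partial x_2}(\mathbf{x}_0) + \dots + h_d \frac{\partial f}{\partial x_d}(\mathbf{x}_0). \]
Now notice that $Df_{\mathbf{x}_0}(\mathbf{h})$ belongs to $\mathbb{R}^p$ (as each partial derivative $\frac{\partial f}{\partial x_i}$ is a vector in $\mathbb{R}^p$), and that we sum $d$ terms instead of $2$. Actually, the differential becomes in this case a linear operator from $\mathbb{R}^d$ to $\mathbb{R}^p$. Moreover, this can be written in a more compact form
\[ h_1 \frac{\partial f}{\partial x_1}(\mathbf{x}_0) + h_2 \frac{\partial f}{\partial x_2}(\mathbf{x}_0) + \dots + h_d \frac{\partial f}{\partial x_d}(\mathbf{x}_0) = J_f(\mathbf{x}_0) \mathbf{h}, \tag{5.2} \]
as the matrix vector product between the Jacobian matrix $J_f(\mathbf{x}_0)$ and the vector $\mathbf{h}$.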
where $o(\mathbf{h})$ means that each of the coordinate functions is a $o(\mathbf{h})$ in the sense of Lemma 4.21.
In this case, we call differential of $f$ at $\mathbf{x}_0$, and write $Df_{\mathbf{x}_0}$, the linear map $\mathbb{R}^d \to \mathbb{R}^p$ represented by the matrix $J_f(\mathbf{x}_0)$, that is, defined by $\mathbf{h} \mapsto J_f(\mathbf{x}_0)\mathbf{h}$.
Given (5.2), this definition coincides with Definition 4.22 in the case $(d, p) = (2, 1)$. To make the parallel more explicit, let us directly state the analogue of Proposition 4.28.
Figure 5.4: Tangent plane to the image of a function $f : \mathbb{R}^2 \to \mathbb{R}^3$ as discussed in Example 5.15. The image of the function is the surface in green. At the point $(u_0, v_0)$, the two lines parallel to the axes (in red and blue) are mapped by $f$ to two curves (also in red and blue) included in the surface and whose tangents at $f(u_0, v_0)$ are the lines directed by $\partial f / \partial u (u_0, v_0)$ and $\partial f / \partial v (u_0, v_0)$. The tangent plane (in purple) contains these two vectors, hence is normal to $\partial f / \partial u (u_0, v_0) \times \partial f / \partial v (u_0, v_0)$.
in such a way that $L(\mathbf{e}_i)$ is the derivative at $0$ of $h \mapsto f(\mathbf{x}_0 + h\mathbf{e}_i)$. The latter expression exactly coincides by definition with the vector $\partial f / \partial x_i (\mathbf{x}_0)$. Thus, $M = J_f(\mathbf{x}_0)$. Once we arrive at this point, the last step is identical to the case of the previous chapter: $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - J_f(\mathbf{x}_0)\mathbf{h} = f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - L(\mathbf{h})$, and the right hand side is $o(\mathbf{h})$ by assumption, so the left hand side is too.
In this abstract setting this characterization of the differential is word for word the one for functions $\mathbb{R}^2 \to \mathbb{R}$: that is where there is a reward for abstraction! On the other hand, for practical computations the present case is more involved because one needs to store a matrix rather than just the gradient vector.
For the record, let us state a proposition analogous to the one of the previous chapter (without proof, as the proof is almost identical to the one of Proposition 4.24).
Remark 5.14 (Link between differential and derivative). Let $f = (f_1, \dots, f_p) : I \subset \mathbb{R} \to \mathbb{R}^p$ be a parametric curve, that is, a function of one variable valued in $\mathbb{R}^p$. If all the functions $f_j$ are differentiable, we can define the derivative at a point $t$ as $f'(t) = (f_1'(t), \dots, f_p'(t)) \in \mathbb{R}^p$, see Definition 2.7. On the other hand, the differential at the same point $t \in I$ is the linear map $\mathbb{R} \to \mathbb{R}^p$ defined by
\[ h \in \mathbb{R} \mapsto Df_t(h) = h f'(t) = \begin{pmatrix} h f_1'(t) \\ \vdots \\ h f_p'(t) \end{pmatrix} \in \mathbb{R}^p. \]
So the derivative is a vector while the differential is a linear map $\mathbb{R} \to \mathbb{R}^p$. One can recover the derivative from the differential by $f'(t) = Df_t(1)$, that is, the differential applied to the “vector” $1 \in \mathbb{R}$.
Example 5.15 (Tangent to a surface). Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ be a “parametric surface” and write $(u, v)$ for the variables in the domain. We look at the image of $f$ which we call $S$. It is the subset of $\mathbb{R}^3$ defined as $S = \{ (x, y, z) \in \mathbb{R}^3 : \exists (u, v) \in \mathbb{R}^2, \ f(u, v) = (x, y, z) \}$. An example was already given in Figure 5.3.
Let $(u_0, v_0) \in \mathbb{R}^2$ and assume that $f$ is differentiable at $(u_0, v_0)$. The tangent plane to the surface is the image of the principal part of the Taylor expansion, that is, of $\mathbf{h} \mapsto f(u_0, v_0) + Df_{(u_0, v_0)}(\mathbf{h})$. More explicitly, the tangent plane to the surface is the image of the map
\[ (h, k) \in \mathbb{R}^2 \mapsto f(u_0, v_0) + h \frac{\partial f}{\partial u}(u_0, v_0) + k \frac{\partial f}{\partial v}(u_0, v_0) \in \mathbb{R}^3, \]
where here $\partial f / \partial u (u_0, v_0)$ and $\partial f / \partial v (u_0, v_0)$ are vectors in $\mathbb{R}^3$. Generically, it is the parametric equation of a plane in $\mathbb{R}^3$. It is a plane which contains the point $f(u_0, v_0)$ and whose direction contains the vectors $\partial f / \partial u (u_0, v_0)$ and $\partial f / \partial v (u_0, v_0)$, thus a normal direction is given by the cross product $\partial f / \partial u (u_0, v_0) \times \partial f / \partial v (u_0, v_0)$. In summary, a point $\mathbf{x} = (x, y, z)$ belongs to the tangent plane at $(u_0, v_0)$ if and only if
\[ \left( \mathbf{x} - f(u_0, v_0) \right) \cdot \left( \frac{\partial f}{\partial u}(u_0, v_0) \times \frac{\partial f}{\partial v}(u_0, v_0) \right) = 0. \]
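The description of the tangent plane in Example 5.15 can be tested on the torus of Figure 5.3. A plane through a point $\mathbf{p}_0$ with normal $\mathbf{n}$ is the set of $\mathbf{x}$ with $(\mathbf{x} - \mathbf{p}_0) \cdot \mathbf{n} = 0$; the sketch below takes $\mathbf{n}$ to be the cross product of the two partial derivatives (computed by hand) and checks this on a point of the parametrized tangent plane. The base point `(u0, v0)` and the coefficients `(h, k)` are arbitrary illustrative choices.

```python
import math

R, r = 3.0, 1.0  # torus radii, as in Figure 5.3

def f(u, v):
    return ((R + r * math.cos(u)) * math.cos(v),
            (R + r * math.cos(u)) * math.sin(v),
            r * math.sin(u))

def df_du(u, v):
    # Partial derivative in u, a vector of R^3 (computed by hand).
    return (-r * math.sin(u) * math.cos(v),
            -r * math.sin(u) * math.sin(v),
            r * math.cos(u))

def df_dv(u, v):
    # Partial derivative in v, a vector of R^3 (computed by hand).
    return (-(R + r * math.cos(u)) * math.sin(v),
            (R + r * math.cos(u)) * math.cos(v),
            0.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u0, v0 = 0.4, 1.1
p0, a, b = f(u0, v0), df_du(u0, v0), df_dv(u0, v0)
n = cross(a, b)  # normal direction to the tangent plane

# A point of the tangent plane, written with the parametric equation.
h, k = 0.3, -0.2
x = tuple(p0[i] + h * a[i] + k * b[i] for i in range(3))
residual = dot(tuple(x[i] - p0[i] for i in range(3)), n)
print(dot(a, n), dot(b, n), residual)
```

All three printed values should vanish up to rounding: the normal is orthogonal to both tangent vectors, hence to every point of the plane shifted by $f(u_0, v_0)$.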
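\[ g(f(\mathbf{x}_0 + \mathbf{h})) = g(f(\mathbf{x}_0)) + Dg_{f(\mathbf{x}_0)}(Df_{\mathbf{x}_0}(\mathbf{h})) + Dg_{f(\mathbf{x}_0)}(o(\mathbf{h})) + o\left( Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h}) \right). \]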
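We now have to understand what is happening to the small o. For the first one, $Dg_{f(\mathbf{x}_0)}(o(\mathbf{h}))$, we write, for a function $\omega : \mathbb{R}^d \to \mathbb{R}^p$ (or more precisely defined only on a neighborhood of $\mathbf{0}$) which goes to $\mathbf{0}$ as $\mathbf{h} \to \mathbf{0}$,
\[ Dg_{f(\mathbf{x}_0)}(o(\mathbf{h})) = Dg_{f(\mathbf{x}_0)}(\|\mathbf{h}\| \omega(\mathbf{h})) = \|\mathbf{h}\| \, Dg_{f(\mathbf{x}_0)}(\omega(\mathbf{h})) \]
by linearity of $Dg_{f(\mathbf{x}_0)}$. By continuity of $Dg_{f(\mathbf{x}_0)}$, the quantity $Dg_{f(\mathbf{x}_0)}(\omega(\mathbf{h}))$ goes to $\mathbf{0}$ when $\mathbf{h} \to \mathbf{0}$, which allows us to conclude that $Dg_{f(\mathbf{x}_0)}(o(\mathbf{h}))$ is a $o(\mathbf{h})$.
On the other hand, for the second one, $o\left( Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h}) \right)$, we rather show (ii) of Lemma 4.21. First, note that there exists a constant $C$ such that $\|Df_{\mathbf{x}_0}(\mathbf{h})\| \le C \|\mathbf{h}\|$: this can be checked coordinate by coordinate, and the coefficient $C$ depends on the entries of the matrix representing $Df_{\mathbf{x}_0}$. Thus, using also (ii) of Lemma 4.21 with $\varepsilon = 1$, we find $r > 0$ such that for all $\mathbf{h} \in B_c(\mathbf{0}, r)$, there holds
\[ \|Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h})\| \le (C + 1) \|\mathbf{h}\|. \]
Then, fix $\varepsilon > 0$. Using (ii) of Lemma 4.21 for the “outer” small o, we find $\delta$ such that if $\|\mathbf{k}\| \le \delta$ then $\|o(\mathbf{k})\| \le \frac{\varepsilon}{C + 1} \|\mathbf{k}\|$. We apply it to $\mathbf{k} = Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h})$: if $\|\mathbf{h}\| \le \min(r, \delta / (C + 1))$ then we can see that
\[ \|o(Df_{\mathbf{x}_0}(\mathbf{h}) + o(\mathbf{h}))\| \le \varepsilon \|\mathbf{h}\|. \]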
Thus using the definition of the Jacobian matrices, see (5.1), we end up with
We conclude this chapter with another consequence of the chain rule: functions of class $C^1$ are stable by composition.
Proof. Thanks to Proposition 5.17, we know that $g \circ f$ is differentiable everywhere on $D$. Thus $g \circ f$ has partial derivatives everywhere on $D$, which are actually given by the formula (5.3). As $f$ and $g$ are of class $C^1$, all the partial derivatives of $f$ and $g$ are continuous, thus so are the partial derivatives of $g \circ f$ (because products, sums and compositions of continuous functions are continuous).
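In matrix form, the chain rule is usually written $J_{g \circ f}(\mathbf{x}_0) = J_g(f(\mathbf{x}_0)) \, J_f(\mathbf{x}_0)$ (the composition of linear maps is the product of their matrices). The sketch below checks this numerically under stated assumptions: $f$ is the vector field of Figure 5.1, while $g$, the helper names and the base point are illustrative choices that do not come from the notes.

```python
import math

def f(x, y):
    # Inner function R^2 -> R^2 (vector field of Figure 5.1).
    return (x**2 - y**2 - 4, 2 * x * y)

def g(a, b):
    # Outer function R^2 -> R^2 (illustrative choice).
    return (a * b, a + math.sin(b))

def jac_f(x, y):
    return [[2 * x, -2 * y], [2 * y, 2 * x]]

def jac_g(a, b):
    return [[b, a], [1.0, math.cos(b)]]

def matmul(A, B):
    # Plain matrix product of two lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def jacobian_numeric(func, point, eps=1e-6):
    # Centered difference quotients, one column per variable.
    d, p = len(point), len(func(*point))
    J = [[0.0] * d for _ in range(p)]
    for i in range(d):
        plus, minus = list(point), list(point)
        plus[i] += eps
        minus[i] -= eps
        f_plus, f_minus = func(*plus), func(*minus)
        for j in range(p):
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * eps)
    return J

def g_after_f(x, y):
    return g(*f(x, y))

x0 = (1.0, 0.5)
chain = matmul(jac_g(*f(*x0)), jac_f(*x0))  # J_g(f(x0)) J_f(x0)
numeric = jacobian_numeric(g_after_f, x0)   # Jacobian of the composition
max_gap = max(abs(chain[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

Note that $J_g$ is evaluated at $f(\mathbf{x}_0)$ and not at $\mathbf{x}_0$, which is the usual source of mistakes when applying the formula by hand.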
Chapter 6
In the previous chapters we have investigated differentiability, that is, we have only differentiated the function once. In this chapter we will study higher order derivatives. Though the higher order partial derivatives are quite easy to describe, what can be more intricate is the algebraic structure in which to store these derivatives, but we will not discuss this aspect. The main theorem of this chapter is Schwarz’s theorem, which states that the order in which partial derivatives are taken does not matter. We will then move to the Taylor expansion of order 2 (recall that the Taylor expansion of order 1 is the differential that we studied in the two previous chapters). It is possible to consider Taylor expansions of higher order, however the formulas become heavier, in particular without the help of the right algebraic tools.
In this chapter, we will restrict to real-valued functions $f : D \subset \mathbb{R}^d \to \mathbb{R}$ defined on an open subset $D$ of $\mathbb{R}^d$. The case where the codomain is $\mathbb{R}^p$ and not $\mathbb{R}$ can be obtained by decomposing an $\mathbb{R}^p$-valued function as a collection of $p$ real-valued functions and brings no conceptual novelty.
\[ \frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \dots \partial x_{i_k}} = \frac{\partial}{\partial x_{i_1}} \frac{\partial}{\partial x_{i_2}} \dots \frac{\partial}{\partial x_{i_{k-1}}} \frac{\partial f}{\partial x_{i_k}}. \]
By induction we say that a function is of class $C^k$ over $D$ if it is of class $C^1$ and all of its (first order) partial derivatives are of class $C^{k-1}$.
Whereas for a function $\mathbb{R} \to \mathbb{R}$ (or $\mathbb{R} \to \mathbb{R}^p$) the derivative of order $k$ at a point $t$ is a single number (or a single vector), here we have a collection of $d^k$ partial derivatives (actually fewer because of Schwarz’s theorem, see below).
Figure 6.1: Illustration of the identity (6.1): a finite difference involving the values of $f$ on the boundary of a rectangle (in red and blue) can be expressed as $\partial^2 f / \partial y \partial x$ or $\partial^2 f / \partial x \partial y$, a priori at two different points, but at points included in the dashed rectangle.
\[ \frac{\partial^2 f}{\partial x \partial y}(\mathbf{x}_0) = \frac{\partial^2 f}{\partial y \partial x}(\mathbf{x}_0). \]
In this result it is important that $f$ is of class $C^2$, that is, that the functions $\partial^2 f / \partial x \partial y$ and $\partial^2 f / \partial y \partial x$ are continuous. There exist counterexamples where the partial derivatives of second order exist everywhere and are not symmetric (but in this case they are not continuous).
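Proof. Let $\mathbf{x}_0 = (x_0, y_0) \in D$ and let $r > 0$ be such that $B_c(\mathbf{x}_0, r) \subset D$. We introduce on $B_c(\mathbf{0}, r)$, at least if $h$ and $k$ are not $0$, the function $F$ defined by
\[ F(h, k) = \frac{f(x_0 + h, y_0 + k) + f(x_0, y_0) - f(x_0 + h, y_0) - f(x_0, y_0 + k)}{hk}. \]
It was chosen because of the following identities: writing $g : s \mapsto f(x_0 + s, y_0 + k) - f(x_0 + s, y_0)$ and applying the mean value theorem to $g$, there exists $s \in (0, h)$ such that
\[ F(h, k) = \frac{g(h) - g(0)}{hk} = \frac{1}{k} \left( \frac{\partial f}{\partial x}(x_0 + s, y_0 + k) - \frac{\partial f}{\partial x}(x_0 + s, y_0) \right). \]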
Chapter 6. Higher order derivatives
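Now that we have this $s$ (which depends on $h$ and $k$), we look at the function $\tilde{h} : t \mapsto \frac{\partial f}{\partial x}(x_0 + s, y_0 + t)$ (which depends on $s$) and we notice that, by the mean value theorem (which we can apply because $\tilde{h}$ is of class $C^1$ as $f$ is of class $C^2$):
\[ F(h, k) = \frac{1}{k} \left( \frac{\partial f}{\partial x}(x_0 + s, y_0 + k) - \frac{\partial f}{\partial x}(x_0 + s, y_0) \right) = \frac{\tilde{h}(k) - \tilde{h}(0)}{k} = \tilde{h}'(t) \]
for some $t \in (0, k)$. Given the expression of $\tilde{h}$, this reads:
\[ F(h, k) = \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t). \]
In conclusion, we have proved: for every $(h, k) \in B_c(\mathbf{0}, r)$, there exists $(s, t) \in (0, h) \times (0, k) \subset B_c(\mathbf{0}, r)$ such that:
\[ F(h, k) = \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t). \]
Reasoning symmetrically (that is, applying the mean value theorem first in the $y$ variable and then in the $x$ variable), for every $(h, k) \in B_c(\mathbf{0}, r)$, there exists $(s', t') \in (0, h) \times (0, k) \subset B_c(\mathbf{0}, r)$ such that:
\[ F(h, k) = \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t'). \]
In conclusion, eliminating the function $F$: for every $(h, k) \in B_c(\mathbf{0}, r)$, there exist two points $(s, t)$ and $(s', t')$ in $(0, h) \times (0, k)$ such that:
\[ \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t) = \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t'), \tag{6.1} \]
see Figure 6.1 for an illustration.
Finally we conclude with continuity: let us fix $\varepsilon > 0$. By continuity of the second derivatives of $f$, there exists $\delta > 0$ such that, for every point $(h', k') \in B_c(\mathbf{0}, \delta)$, there holds
\[ \left| \frac{\partial^2 f}{\partial x \partial y}(x_0 + h', y_0 + k') - \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \right| \le \varepsilon \quad \text{and} \quad \left| \frac{\partial^2 f}{\partial y \partial x}(x_0 + h', y_0 + k') - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| \le \varepsilon. \]
Now fix $(h, k) \in B_c(\mathbf{0}, \delta)$ such that $h$ and $k$ are not $0$. The points $(s, t)$ and $(s', t')$ given by (6.1) lie in $(0, h) \times (0, k)$, hence in $B_c(\mathbf{0}, \delta)$. Using (6.1), the triangle inequality, and the two bounds above with $(h', k') = (s', t')$ and $(h', k') = (s, t)$ respectively, we see:
\begin{align*}
\left| \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right|
&= \left| \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) - \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t') + \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t) - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| \\
&\le \left| \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) - \frac{\partial^2 f}{\partial x \partial y}(x_0 + s', y_0 + t') \right| + \left| \frac{\partial^2 f}{\partial y \partial x}(x_0 + s, y_0 + t) - \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \right| \le 2\varepsilon.
\end{align*}
As $\varepsilon$ is arbitrary, we have proved what we wanted to.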
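The finite difference $F(h, k)$ used in the proof can also be observed numerically: as $(h, k) \to (0, 0)$ it approaches the common value of the two mixed derivatives. The sketch below is only an illustration, with an assumed test function `f` whose mixed derivative is computed by hand.

```python
import math

def f(x, y):
    # Illustrative C^2 function (an assumption, not from the notes).
    return x**3 * y**2 + math.sin(x * y)

def mixed_exact(x, y):
    # d/dy (df/dx) computed by hand; d/dx (df/dy) gives the same formula.
    return 6 * x**2 * y + math.cos(x * y) - x * y * math.sin(x * y)

def F(x0, y0, h, k):
    # Finite difference over the corners of the rectangle, as in the proof.
    return (f(x0 + h, y0 + k) + f(x0, y0)
            - f(x0 + h, y0) - f(x0, y0 + k)) / (h * k)

x0, y0 = 1.0, 2.0
errors = [abs(F(x0, y0, s, s) - mixed_exact(x0, y0))
          for s in (1e-1, 1e-2, 1e-3)]
print(errors)
```

The errors shrink roughly proportionally to the size of the rectangle, consistent with the continuity argument that closes the proof.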
Example 6.3. As an application, let us show a necessary condition for a field to be a gradient. Let us assume that we have a vector field $g : \mathbb{R}^2 \to \mathbb{R}^2$ of class $C^1$. The question is: is it possible to find $f : \mathbb{R}^2 \to \mathbb{R}$ such that $g = \nabla f$? That is, can we find $f$ such that
\[ g = \begin{pmatrix} g_x \\ g_y \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\[2mm] \dfrac{\partial f}{\partial y} \end{pmatrix} ? \]
Well, if this is the case, applying Schwarz’s theorem we find that
\[ \frac{\partial g_x}{\partial y} = \frac{\partial g_y}{\partial x} \tag{6.2} \]
as the left hand side is $\partial^2 f / \partial y \partial x$ while the right hand side is $\partial^2 f / \partial x \partial y$. But (6.2) depends only on $g$. So we see that a necessary condition for $g$ to be a gradient field is that (6.2) holds, that is, the “equality of the cross derivatives”.
For instance, the function $(x, y) \mapsto (x\cos(y), x\sin(y))$ (change of variables in polar coordinates) cannot be written as the gradient of a function as it does not satisfy (6.2).
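The condition (6.2) is easy to test numerically. The sketch below estimates $\partial g_x / \partial y - \partial g_y / \partial x$ with difference quotients, first for the polar map of the example (where the gap is $-x\sin y - \sin y \neq 0$ in general) and then for an actual gradient field. The potential $x^2 y$ and the function names are illustrative assumptions.

```python
import math

def cross_derivative_gap(g, x, y, eps=1e-6):
    # Approximates d(g_x)/dy - d(g_y)/dx at (x, y) by centered differences.
    dgx_dy = (g(x, y + eps)[0] - g(x, y - eps)[0]) / (2 * eps)
    dgy_dx = (g(x + eps, y)[1] - g(x - eps, y)[1]) / (2 * eps)
    return dgx_dy - dgy_dx

def polar(x, y):
    # Change of variables in polar coordinates, seen as a vector field.
    return (x * math.cos(y), x * math.sin(y))

def grad_field(x, y):
    # Gradient of the illustrative potential f(x, y) = x^2 y.
    return (2 * x * y, x**2)

gap_polar = cross_derivative_gap(polar, 1.0, 0.5)
gap_grad = cross_derivative_gap(grad_field, 1.0, 0.5)
print(gap_polar, gap_grad)
```

A nonzero gap at a single point is enough to rule out a gradient field, whereas a zero gap at sampled points is only consistent with (6.2) and proves nothing by itself.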
Actually, if the domain of definition is (for instance) $\mathbb{R}^2$, then the converse holds, that is, any vector field $g$ of class $C^1$ such that (6.2) holds can be written $g = \nabla f$, that is, as the gradient of a $C^2$ function.
Though we have proved Schwarz’s theorem for functions from $\mathbb{R}^2$ to $\mathbb{R}$, it holds in a more general context (and the proof is basically by induction with the help of Theorem 6.2).
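Theorem 6.4 (Schwarz’s theorem, general case). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^k$ defined on an open set $D$, and let $\sigma : \{1, 2, \dots, k\} \to \{1, 2, \dots, k\}$ be a permutation (that is, a bijective map). Then, for every point $\mathbf{x}_0$ of $D$ and any sequence of indices $i_1, i_2, \dots, i_k$, there holds
\[ \frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \dots \partial x_{i_k}}(\mathbf{x}_0) = \frac{\partial^k f}{\partial x_{i_{\sigma(1)}} \partial x_{i_{\sigma(2)}} \dots \partial x_{i_{\sigma(k)}}}(\mathbf{x}_0). \]
To give an example, if you have a function of 3 variables $f = f(x, y, z)$ which is of class $C^4$, then for instance there holds
\[ \frac{\partial^4 f}{\partial x \partial y \partial x \partial z} = \frac{\partial^4 f}{(\partial x)^2 \partial y \partial z} = \frac{\partial^4 f}{\partial y \partial z (\partial x)^2}, \]
and actually many more identities: the order in which you take partial derivatives does not matter.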
Whereas the first derivatives are stored in the gradient, the compact notation for the second derivatives is to store them in the Hessian matrix. We will see in the next section on Taylor expansion why storing second order partial derivatives this way makes sense.
Definition 6.5 (Hessian matrix). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^2$ over an open domain $D$ of $\mathbb{R}^d$. We define the Hessian matrix of $f$ at a point $\mathbf{x}_0 \in D$, denoted $H_f(\mathbf{x}_0)$, as the $d \times d$ matrix whose entry for the $i$-th row and $j$-th column is
\[ \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0). \]
Thanks to Theorem 6.2, this matrix is symmetric, that is, $[H_f(\mathbf{x}_0)]_{ij} = [H_f(\mathbf{x}_0)]_{ji}$ for all $i, j$.
For instance, in the case of a function $f : \mathbb{R}^2 \to \mathbb{R}$ it reads
\[ H_f = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2mm] \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}. \]
Remark 6.6 (Counting derivatives). As you can see with the Hessian matrix, for a function $f : \mathbb{R}^2 \to \mathbb{R}$, one needs to compute only 3 partial derivatives of second order instead of 4 thanks to Schwarz’s theorem. If you have a function from $\mathbb{R}^d$ to $\mathbb{R}$, the number of distinct second order derivatives you need to compute is $\frac{d(d+1)}{2}$, which is the same as the dimension of the set of symmetric matrices.
A trickier question of combinatorics is the following: for a function $f : \mathbb{R}^d \to \mathbb{R}$ and given Schwarz’s theorem, how many distinct derivatives does one need to compute to get all the derivatives of order $k$? This is for sure less than $d^k$ (the naive estimate), and actually the answer is
\[ \binom{d + k - 1}{k} = \frac{(d + k - 1)!}{(d - 1)! \, k!}. \]
But this is not an easy result (well, it depends on what is considered easy in combinatorics!) and it is out of the scope of these lecture notes.
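The count in Remark 6.6 can be verified by brute force for small values. Up to reordering, a partial derivative of order $k$ is determined by a multiset of $k$ indices in $\{1, \dots, d\}$, so the sketch below enumerates these multisets with the standard library and compares the count with the binomial coefficient. The function name is an illustrative choice.

```python
from itertools import combinations_with_replacement
from math import comb

def count_distinct_derivatives(d, k):
    # One multiset of k indices among d variables per distinct derivative.
    return sum(1 for _ in combinations_with_replacement(range(d), k))

for d, k in [(2, 2), (3, 2), (3, 4), (4, 3)]:
    brute = count_distinct_derivatives(d, k)
    formula = comb(d + k - 1, k)
    print(d, k, brute, formula, d**k)
```

For $k = 2$ the count reduces to $d(d+1)/2$, the number of independent entries of a symmetric $d \times d$ matrix.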
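\[ g'(t) = \sum_{i=1}^d h_i \frac{\partial f}{\partial x_i}(\mathbf{x} + t\mathbf{h}), \]
\[ g''(t) = \sum_{i=1}^d \sum_{j=1}^d h_i h_j \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x} + t\mathbf{h}). \]
Proof. For the first derivative this is actually nothing else than Proposition 4.29. For the second one, one just differentiates the first one using again Proposition 4.29 (as well as Theorem 6.2 to exchange the role of the indices $i$ and $j$).
Proposition 6.8 (Second order Taylor expansion for a function of several variables). Let $f : D \subset \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^2$ defined on an open subset $D$ of $\mathbb{R}^d$. Let $\mathbf{x}_0 \in D$ and $r > 0$ such that $B_c(\mathbf{x}_0, r) \subset D$. Then, for $\mathbf{h} = (h_i)_{1 \le i \le d} \in B_c(\mathbf{0}, r)$ there holds
\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \sum_{i=1}^d h_i \frac{\partial f}{\partial x_i}(\mathbf{x}_0) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d h_i h_j \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0) + o(\|\mathbf{h}\|^2) \]
where $o(\|\mathbf{h}\|^2) = \|\mathbf{h}\|^2 \omega(\mathbf{h})$, and $\omega : B_c(\mathbf{0}, r) \to \mathbb{R}$ is a function which goes to $0$ when $\mathbf{h}$ goes to $\mathbf{0}$.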
Figure 6.2: Example of graphs of functions $f : \mathbb{R}^2 \to \mathbb{R}$ such that the gradient at $(0, 0)$ vanishes, that is, the plane $Oxy$ (in gray) is tangent at $(0, 0)$. Then the local behavior of the function around the tangent plane can change depending on the Hessian at $(0, 0)$. Left: Hessian $H_f(0, 0) = \begin{pmatrix} 0.4 & 0 \\ 0 & 1.2 \end{pmatrix}$ which gives a graph above the tangent plane. Center: Hessian $H_f(0, 0) = \begin{pmatrix} -1.2 & 0 \\ 0 & -0.4 \end{pmatrix}$ which gives a graph below the tangent plane. Right: a diagonal Hessian $H_f(0, 0)$ with diagonal entries of absolute values $1.2$ and $0.4$ and of opposite signs, which gives a graph crossing the tangent plane; this is called a saddle point.
the second order derivatives. As a sum of functions going to 0 also goes to 0, it concludes the proof.
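The Taylor expansion looks quite heavy, and it is. For a function $f : \mathbb{R}^2 \to \mathbb{R}$, this reads
\[ f(x_0 + h, y_0 + k) = f(x_0, y_0) + h \frac{\partial f}{\partial x} + k \frac{\partial f}{\partial y} + \frac{h^2}{2} \frac{\partial^2 f}{\partial x^2} + hk \frac{\partial^2 f}{\partial x \partial y} + \frac{k^2}{2} \frac{\partial^2 f}{\partial y^2} + o(h^2 + k^2), \]
where here all the partial derivatives of $f$ are taken at the point $(x_0, y_0)$. Some examples of the graph of the function $f$ (when $(x_0, y_0) = (0, 0)$ and $\nabla f(0, 0) = \mathbf{0}$) are given in Figure 6.2.
In general, if one wants to write it in a compact way, then one uses the Hessian matrix: actually the Taylor expansion can be written
\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot \mathbf{h} + \frac{1}{2} \mathbf{h} \cdot (H_f(\mathbf{x}_0)\mathbf{h}) + o(\|\mathbf{h}\|^2). \]
Here, $\mathbf{h} \cdot (H_f(\mathbf{x}_0)\mathbf{h})$ is the dot product between the vector $\mathbf{h}$ and the vector $H_f(\mathbf{x}_0)\mathbf{h}$, which is the product of the matrix $H_f(\mathbf{x}_0)$ with the vector $\mathbf{h}$.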
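The meaning of the remainder $o(\|\mathbf{h}\|^2)$ can be observed numerically: the error of the second order expansion, divided by $\|\mathbf{h}\|^2$, should go to $0$ with $\mathbf{h}$. The sketch below does this for an assumed test function whose gradient and Hessian are computed by hand; the function, the base point and the direction are illustrative choices.

```python
import math

def f(x, y):
    # Illustrative C^2 function (an assumption, not from the notes).
    return math.exp(x) * math.cos(y) + x * y**2

def grad_f(x, y):
    return (math.exp(x) * math.cos(y) + y**2,
            -math.exp(x) * math.sin(y) + 2 * x * y)

def hess_f(x, y):
    # Symmetric 2 x 2 Hessian matrix, computed by hand.
    fxy = -math.exp(x) * math.sin(y) + 2 * y
    return [[math.exp(x) * math.cos(y), fxy],
            [fxy, -math.exp(x) * math.cos(y) + 2 * x]]

def taylor2(x0, h):
    # f(x0) + grad . h + (1/2) h . (H h)
    g, H = grad_f(*x0), hess_f(*x0)
    Hh = (H[0][0] * h[0] + H[0][1] * h[1], H[1][0] * h[0] + H[1][1] * h[1])
    return (f(*x0) + g[0] * h[0] + g[1] * h[1]
            + 0.5 * (h[0] * Hh[0] + h[1] * Hh[1]))

x0 = (0.3, -0.2)
ratios = []
for s in (1e-1, 1e-2, 1e-3):
    h = (s, 2 * s)
    err = abs(f(x0[0] + h[0], x0[1] + h[1]) - taylor2(x0, h))
    ratios.append(err / (h[0]**2 + h[1]**2))  # error divided by ||h||^2
print(ratios)
```

The ratios decrease roughly linearly with the scale `s`, which is what one expects when the function is in fact smoother than $C^2$.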
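basis. The goal of this section is to explain how this language translates to higher order derivatives.
Thus we fix a function $f : D \subset \mathbb{R}^d \to \mathbb{R}$ of class $C^k$ defined on an open set $D$, as well as $\mathbf{x}_0 \in D$.
The differential of order $k$ of $f$ at $\mathbf{x}_0$, which we write $D^k f_{\mathbf{x}_0}$, is a $k$-linear map from $(\mathbb{R}^d)^k$ to $\mathbb{R}$. That is, it takes as argument a family $\mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \dots, \mathbf{h}^{(k)}$ of $k$ vectors in $\mathbb{R}^d$ and outputs a real number
\[ D^k f_{\mathbf{x}_0}(\mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \dots, \mathbf{h}^{(k)}) \in \mathbb{R}. \]
In addition, it is linear in each of its arguments $\mathbf{h}^{(j)}$, when keeping the others fixed. Intuitively, the quantity $D^k f_{\mathbf{x}_0}(\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(k)})$ corresponds to “first differentiating in $\mathbf{h}^{(k)}$, then in $\mathbf{h}^{(k-1)}$, up to $\mathbf{h}^{(1)}$”. The link with the partial derivatives is that it can be expressed as
\[ D^k f_{\mathbf{x}_0}(\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(k)}) = \sum_{i_1, i_2, \dots, i_k} h^{(1)}_{i_1} h^{(2)}_{i_2} \dots h^{(k)}_{i_k} \frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \dots \partial x_{i_k}}(\mathbf{x}_0), \]
where the sum is taken among all $k$-tuples of indices $i_1, i_2, \dots, i_k$, that is, over $\{1, 2, \dots, d\}^k$.
Schwarz’s theorem reads as: the mapping $D^k f_{\mathbf{x}_0} : (\mathbb{R}^d)^k \to \mathbb{R}$ is not only $k$-linear but also symmetric, that is,
\[ D^k f_{\mathbf{x}_0}(\mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \dots, \mathbf{h}^{(k)}) = D^k f_{\mathbf{x}_0}(\mathbf{h}^{(\sigma(1))}, \mathbf{h}^{(\sigma(2))}, \dots, \mathbf{h}^{(\sigma(k))}) \]
for every permutation $\sigma : \{1, 2, \dots, k\} \to \{1, 2, \dots, k\}$.
Finally one can write a Taylor expansion at any order which reads, if $f$ is of class $C^{k+1}$:
\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \sum_{j=1}^k \frac{1}{j!} D^j f_{\mathbf{x}_0}(\underbrace{\mathbf{h}, \mathbf{h}, \dots, \mathbf{h}}_{j \text{ times}}) + o(\|\mathbf{h}\|^k), \]
where $D^j f_{\mathbf{x}_0}(\mathbf{h}, \mathbf{h}, \dots, \mathbf{h})$ means we apply the $j$-linear form $D^j f_{\mathbf{x}_0}$ to the collection of $j$ vectors all equal to $\mathbf{h}$.
Remark 6.9 (Counting derivatives, continued). Remark 6.6 about “counting derivatives” can be re-read through the prism of linear algebra. It actually states that the linear space of all symmetric $k$-linear forms from $(\mathbb{R}^d)^k$ to $\mathbb{R}$ has dimension (as a vector space) $\binom{d + k - 1}{k} = \frac{(d + k - 1)!}{(d - 1)! \, k!}$.
Remark 6.10 (Polarization formulas). It can be surprising that in the Taylor expansion we only evaluate $D^k f_{\mathbf{x}_0}$ on the diagonal, that is, when all arguments are equal to the same vector. Actually a symmetric $k$-linear form is entirely determined by its values on the diagonal. Let us illustrate why in the simple case of a 2-linear form; the general case is algebraically more involved but conceptually similar. If we take $L : (\mathbf{h}^{(1)}, \mathbf{h}^{(2)}) \in (\mathbb{R}^d)^2 \mapsto L(\mathbf{h}^{(1)}, \mathbf{h}^{(2)}) \in \mathbb{R}$ which is 2-linear and symmetric then
\[ L(\mathbf{h}^{(1)}, \mathbf{h}^{(2)}) = \frac{1}{2} L(\mathbf{h}^{(1)} + \mathbf{h}^{(2)}, \mathbf{h}^{(1)} + \mathbf{h}^{(2)}) - \frac{1}{2} L(\mathbf{h}^{(1)}, \mathbf{h}^{(1)}) - \frac{1}{2} L(\mathbf{h}^{(2)}, \mathbf{h}^{(2)}) \]
as can be checked by linearity and symmetry. Thus, the knowledge of $L(\mathbf{h}, \mathbf{h})$ for every $\mathbf{h} \in \mathbb{R}^d$ is enough to reconstruct $L(\mathbf{h}^{(1)}, \mathbf{h}^{(2)})$ for all $\mathbf{h}^{(1)}, \mathbf{h}^{(2)} \in \mathbb{R}^d$.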
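The polarization formula of Remark 6.10 can be checked on a concrete symmetric 2-linear form. In the sketch below, $L(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot (M\mathbf{b})$ for a symmetric matrix $M$ (think of $M$ as a Hessian); the matrix and the two vectors are arbitrary illustrative values.

```python
def bilinear(M, a, b):
    # L(a, b) = a . (M b), a 2-linear form; symmetric when M is symmetric.
    return sum(a[i] * M[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

M = [[2.0, -1.0, 0.5],
     [-1.0, 3.0, 0.0],
     [0.5, 0.0, 1.0]]  # symmetric matrix (illustrative)
h1 = [1.0, 2.0, -1.0]
h2 = [0.5, -3.0, 2.0]
s = [x + y for x, y in zip(h1, h2)]  # h1 + h2

lhs = bilinear(M, h1, h2)
rhs = (0.5 * bilinear(M, s, s)
       - 0.5 * bilinear(M, h1, h1)
       - 0.5 * bilinear(M, h2, h2))
print(lhs, rhs)
```

The right hand side only uses values of $L$ on the diagonal, yet it reproduces the off-diagonal value. If $M$ were not symmetric, the formula would instead return the symmetrized value $\frac{1}{2}(L(\mathbf{a}, \mathbf{b}) + L(\mathbf{b}, \mathbf{a}))$.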
Part II
Integration
Chapter 7
Path integrals
In this chapter we are interested in the integral of a function $f$ defined over $\mathbb{R}^d$ on the image of a parametric curve $\gamma : I \to \mathbb{R}^d$ that we will write
\[ \int_\gamma f. \]
The function $f$ can be a scalar field, with a particular case corresponding to the constant function equal to $1$ which yields the length of the curve $\gamma$. Or it can be valued in $\mathbb{R}^d$ (that is, $f : \mathbb{R}^d \to \mathbb{R}^d$ with the same $d$ for the domain and codomain) and thought of as a vector field. In physics, this would correspond to the work of a force $f$ along a trajectory $\gamma$. In the case when the vector field is the gradient of a function, we recover the fundamental theorem of calculus: the integral of the derivative of the function coincides with the difference between the values of the function at the boundary.
One key property of these two integrals is their invariance under reparametrization: the speed at which the curve is traveled is not important, only the image matters. The quantity $\int_\gamma f$ is a geometric quantity, which depends only on the image of $\gamma$.
In the rest of the chapter we take $I = [a, b]$ a closed and bounded interval of $\mathbb{R}$ and $\gamma : I \to \mathbb{R}^d$ a $C^1$ curve. We also take $f$ a function defined on a domain $D$ of $\mathbb{R}^d$ and such that $\gamma(I)$ is included in $D$.
In this chapter all the integrals we define boil down to integrals of functions of one variable, so there is no need to rebuild a theory of integration.
The definition makes sense: as $\gamma$ is of class $C^1$, the function $\gamma'$ is continuous; and the function $t \mapsto f(\gamma(t))$ is continuous as a composition of continuous functions (Proposition 4.10). Thus we are integrating the continuous function $t \mapsto f(\gamma(t)) \|\gamma'(t)\|$ over a segment.
The quantity $ds$ stands for the infinitesimal arc length. Indeed, the definition above basically reads “$ds = \|\gamma'(t)\| \, dt$”, and the right hand side is the distance traveled on the curve during a time $dt$. To make this more formal, one can indeed prove a “Riemann sum” representation which is illustrated in Figure 7.1.
Figure 7.1: Illustration of Proposition 7.2 (here with $N = 6$, from $\gamma(a) = \gamma(t^N_0)$ to $\gamma(b) = \gamma(t^N_N)$). The integral $\int_\gamma f \, ds$ can be seen as a sum of contributions of the form “length” $\times f$. If $f = 1$, we recover the length of the curve.
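Proposition 7.2. Let $\gamma : [a, b] \to \mathbb{R}^d$ be a $C^1$ curve and $f : \mathbb{R}^d \to \mathbb{R}$ a continuous scalar function. For $N \ge 1$ we write $t^N_k = a + k(b - a)/N$, that is, $\{t^N_0, t^N_1, \dots, t^N_N\}$ are ordered and evenly distributed on $[a, b]$. Then
\[ \lim_{N \to +\infty} \sum_{k=1}^N f(\gamma(t^N_k)) \, \|\gamma(t^N_k) - \gamma(t^N_{k-1})\| = \int_\gamma f \, ds. \]
We could actually replace $f(\gamma(t^N_k))$ by $f(\gamma(t'))$ for any $t' \in [t^N_{k-1}, t^N_k]$; it does not change the validity of the result.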
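Sketch of the proof. Using the Riemann sum representation of the integral for the continuous (scalar valued) function $t \mapsto f(\gamma(t)) \|\gamma'(t)\|$, we have:
\[ \int_\gamma f \, ds = \int_a^b f(\gamma(t)) \|\gamma'(t)\| \, dt = \lim_{N \to +\infty} \sum_{k=1}^N f(\gamma(t^N_k)) \, \|\gamma'(t^N_k)\| \, (t^N_k - t^N_{k-1}). \tag{7.1} \]
We claim that for every $\varepsilon > 0$ there exists $N_0$ such that, for all $N \ge N_0$ and all $k \in \{1, \dots, N\}$,
\[ \left| \, \|\gamma(t^N_k) - \gamma(t^N_{k-1})\| - (t^N_k - t^N_{k-1}) \|\gamma'(t^N_k)\| \, \right| \le (t^N_k - t^N_{k-1}) \, \varepsilon. \tag{7.2} \]
This is the quantified version of “$ds = \|\gamma'(t)\| \, dt$”. If the claim is proved then for $N \ge N_0$, by using the triangle inequality to sum the inequalities above,
\[ \Bigg| \underbrace{\sum_{k=1}^N f(\gamma(t^N_k)) \, \|\gamma(t^N_k) - \gamma(t^N_{k-1})\|}_{\text{(I)}} - \underbrace{\sum_{k=1}^N f(\gamma(t^N_k)) \, \|\gamma'(t^N_k)\| \, (t^N_k - t^N_{k-1})}_{\text{(II)}} \Bigg| \le \sup_{\gamma(I)} |f| \, (b - a) \, \varepsilon. \]
The term (I) is the one in the statement of the Proposition, while in (II) we recognize, thanks to (7.1), something which converges to $\int_\gamma f \, ds$. Then there is a bit of quantification to play with, but it yields the result. Note that $\sup_{\gamma(I)} |f| < +\infty$ as the function $f \circ \gamma : I \to \mathbb{R}$ is continuous over the segment $[a, b]$, thus it attains its maximum and minimum.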
It remains to prove the claim, that is, (7.2). We start by choosing $\varepsilon > 0$. We can use the “mean value
theorem” (in its integral form) and write the following equality between vectors:
\[ \gamma(t_k^N) - \gamma(t_{k-1}^N) = \int_{t_{k-1}^N}^{t_k^N} \gamma'(t) \, dt. \]
As $\gamma'$ is uniformly continuous on the segment $[a, b]$ (as each of its coordinate functions is uniformly
continuous), we can find $N_0$ such that if $N \ge N_0$ then
\[ \|\gamma'(t) - \gamma'(t_k^N)\| \le \varepsilon \]
for all $t \in [t_{k-1}^N, t_k^N]$. In particular, using the triangle inequality for integrals,
\begin{align*}
\Big\| \gamma(t_k^N) - \gamma(t_{k-1}^N) - (t_k^N - t_{k-1}^N) \gamma'(t_k^N) \Big\|
&= \Big\| \int_{t_{k-1}^N}^{t_k^N} \gamma'(t) \, dt - (t_k^N - t_{k-1}^N) \gamma'(t_k^N) \Big\| \\
&= \Big\| \int_{t_{k-1}^N}^{t_k^N} \big( \gamma'(t) - \gamma'(t_k^N) \big) \, dt \Big\| \\
&\le \int_{t_{k-1}^N}^{t_k^N} \big\| \gamma'(t) - \gamma'(t_k^N) \big\| \, dt \le (t_k^N - t_{k-1}^N) \, \varepsilon.
\end{align*}
Then to finally get the claim, we use that $\big| \|a\| - \|b\| \big| \le \|a - b\|$, which is true for any vectors $a, b \in \mathbb{R}^d$
(this is a consequence of the triangle inequality), and that we apply to $a = \gamma(t_k^N) - \gamma(t_{k-1}^N)$ and
$b = (t_k^N - t_{k-1}^N) \gamma'(t_k^N)$. This concludes the claim (7.2), hence the proof.
Definition 7.3. If $I, J$ are two intervals of $\mathbb{R}$ and $k \ge 1$ is an integer, we call diffeomorphism of class
$C^k$ a function $\phi : I \to J$ of class $C^k$ which is bijective and such that the inverse function $\phi^{-1} : J \to I$ is
also of class $C^k$.
Actually, by the intermediate value theorem, if $\phi'$ does not vanish then it does not change sign, thus
a diffeomorphism is either increasing or decreasing.
Example 7.5. As an example of an increasing diffeomorphism, one can take $\phi : t \in \mathbb{R} \mapsto 2t \in \mathbb{R}$ or
$\psi : t \in (0, +\infty) \mapsto \ln t \in \mathbb{R}$. As an example of a decreasing diffeomorphism, one can take $\eta : t \in [0, 1] \mapsto 1 - t \in [0, 1]$.
As an example of a bijective function which is not a diffeomorphism, there is $\xi : t \in \mathbb{R} \mapsto t^3 \in \mathbb{R}$. Indeed,
$\xi'(0) = 0$ and the inverse function $\xi^{-1} : t \mapsto t^{1/3}$ is not differentiable at $t = 0$.
Definition 7.6. A $C^k$ oriented curve $\Gamma$ is the data of $(I, \gamma)$ where $I$ is an interval of $\mathbb{R}$ and $\gamma : I \to \mathbb{R}^d$
is a $C^k$ function. The function $\gamma$ is called a parametrization of the curve.
Let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^k$ ($k \ge 1$) defined on $I, J$ two intervals of $\mathbb{R}$.
We say that $(I, \gamma)$ and $(J, \omega)$ represent the same oriented curve if there exists $\phi : I \to J$ an increasing
diffeomorphism of class $C^k$ such that $\gamma = \omega \circ \phi$, that is, if for all $t \in I$, $\gamma(t) = \omega(\phi(t))$.
Example 7.7. Let $I = [0, 2\pi]$ and $\gamma : t \mapsto (\cos(t), \sin(t))$, which gives a parametrization of the unit circle.
Then $J = [0, \pi]$ and $\omega : t \mapsto (\cos(2t), \sin(2t))$ are such that $(I, \gamma)$ and $(J, \omega)$ define the same oriented
curve (namely, the unit circle traveled once in the counterclockwise direction). In this case the function
$\phi : I \to J$ is defined by $\phi(t) = t/2$.
Technically speaking, an oriented curve is an equivalence class of pairs $(I, \gamma)$, where the equivalence relation
is given by “representing the same oriented curve”. What should be remembered is that an oriented curve is
more than the image of a $C^k$ function: it is the image together with a parametrization. Here oriented means that
we impose the diffeomorphism to be increasing: we remember in which orientation the curve is traveled
by the function, but not at which speed. By the definition of a diffeomorphism, this relation is symmetric
in $(I, \gamma)$ and $(J, \omega)$: the definition of diffeomorphism was built for that!
Importantly, let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^k$ ($k \ge 1$) defined on $I, J$ two
intervals of $\mathbb{R}$. Let $\phi : I \to J$ be an increasing diffeomorphism of class $C^k$ such that $\gamma = \omega \circ \phi$. Then for all
$t \in I$,
\[ \gamma'(t) = \phi'(t) \, \omega'(\phi(t)). \]
That is, written with the coordinate functions,
\[ \begin{pmatrix} \gamma_1'(t) \\ \vdots \\ \gamma_d'(t) \end{pmatrix} = \phi'(t) \begin{pmatrix} \omega_1'(\phi(t)) \\ \vdots \\ \omega_d'(\phi(t)) \end{pmatrix}. \]
This can be proved precisely by writing the equation coordinate by coordinate and using the rule of
single variable calculus about the derivative of a composite function. Note that $\phi'(t)$ is a scalar while
$\gamma'(t)$, $\omega'(\phi(t))$ are vectors in $\mathbb{R}^d$. In particular, as $\phi'(t) > 0$ (because $\phi$ is increasing), the vectors are
colinear and in the same direction.
The key result about path integrals is the following: the value of the integral over an
oriented curve does not depend on the parametrization.
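Proposition 7.8 (Path integral is independent of the parametrization). Let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be
two functions of class $C^1$ defined on $I, J$ two bounded intervals of $\mathbb{R}$. If they represent the same oriented
curve, then for any continuous $f : \mathbb{R}^d \to \mathbb{R}$,
\[ \int_\gamma f \, ds = \int_\omega f \, ds. \]
Proof. Actually the proof is quite simple and relies on a change of variables in the integral. We take
$\phi : I \to J$ a $C^1$ increasing diffeomorphism such that $\gamma = \omega \circ \phi$. In particular, from $\gamma'(t) = \phi'(t) \omega'(\phi(t))$
and as $\phi' > 0$, we see that $\|\gamma'(t)\| = |\phi'(t)| \, \|\omega'(\phi(t))\| = \phi'(t) \|\omega'(\phi(t))\|$. Substituting in the definition,
and then using the change of variables $r = \phi(t)$:
\begin{align*}
\int_\gamma f \, ds &= \int_I f(\gamma(t)) \, \|\gamma'(t)\| \, dt \\
&= \int_I f(\omega(\phi(t))) \, \phi'(t) \|\omega'(\phi(t))\| \, dt \\
&= \int_J f(\omega(r)) \, \|\omega'(r)\| \, dr = \int_\omega f \, ds.
\end{align*}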
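As a numerical sanity check of Proposition 7.8, the sketch below integrates the same scalar field along the two parametrizations of the unit circle from Example 7.7. The field $f(x, y) = x^2 + 3y$ and the midpoint-rule helper `scalar_path_integral` are hypothetical choices made for illustration only; the derivatives of the parametrizations are supplied by hand.

```python
import math

def scalar_path_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f(gamma(t)) * ||gamma'(t)|| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        dx, dy = dgamma(t)
        total += f(gamma(t)) * math.hypot(dx, dy) * h
    return total

f = lambda p: p[0] ** 2 + 3.0 * p[1]          # hypothetical test field f(x, y) = x^2 + 3y

gamma = lambda t: (math.cos(t), math.sin(t))                    # on I = [0, 2*pi]
dgamma = lambda t: (-math.sin(t), math.cos(t))
omega = lambda t: (math.cos(2 * t), math.sin(2 * t))            # on J = [0, pi]
domega = lambda t: (-2 * math.sin(2 * t), 2 * math.cos(2 * t))

I_gamma = scalar_path_integral(f, gamma, dgamma, 0.0, 2 * math.pi)
I_omega = scalar_path_integral(f, omega, domega, 0.0, math.pi)
print(I_gamma, I_omega)   # both should be close to pi
```

The curve $\omega$ travels twice as fast on an interval half as long, and the factor $\|\omega'\| = 2$ compensates exactly for the shorter interval.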
So computing the length of a curve is nothing else than computing an integral. However, in practice
computations are quite hard to do. For instance, there is no simple formula for the length of an ellipse,
where we recall that a parametrization is given by $t \in [0, 2\pi] \mapsto (a \cos(t), b \sin(t))$ for some $a, b > 0$.
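Even without a closed formula, the length of an ellipse is easy to approximate numerically, since it is the integral of $\|\gamma'(t)\| = \sqrt{a^2 \sin^2(t) + b^2 \cos^2(t)}$ over $[0, 2\pi]$. The following is a minimal sketch using a midpoint rule; the function name `ellipse_length` and the number of subdivisions are arbitrary choices.

```python
import math

def ellipse_length(a, b, n=20000):
    """Approximate the length of t -> (a cos t, b sin t), t in [0, 2*pi], by a midpoint rule."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        # gamma'(t) = (-a sin t, b cos t); we integrate its norm.
        total += math.hypot(-a * math.sin(t), b * math.cos(t)) * h
    return total

circle = ellipse_length(1.0, 1.0)    # sanity check: a circle of radius 1 has length 2*pi
ellipse = ellipse_length(2.0, 1.0)
print(circle, ellipse)
```

For $a = b$ the integrand is constant and the usual formula $2\pi a$ is recovered; for $a \neq b$ the integral is an elliptic integral, which is why no elementary formula is available.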
The following result is just a particular case of Proposition 7.8.
Proposition 7.10. Let $\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^1$ defined on $I, J$ two bounded
intervals of $\mathbb{R}$. If they represent the same oriented curve, then the length of $(I, \gamma)$ is the same as the
length of $(J, \omega)$.
The notion of length makes it possible to choose a distinguished parametrization: the one which has unit speed.
Definition 7.11. Let $\Gamma$ be an oriented curve and let $(I, \gamma)$ be one of its parametrizations. The parametrization
is said to be normal if $\|\gamma'(t)\| = 1$ for all $t \in I$.
For a normal parametrization, the length of the curve between $\gamma(a)$ and $\gamma(b)$ (for $a \le b$ both in $I$)
is $b - a$. Sometimes the curve is called “parametrized by arclength”, because the parameter $t$ is now
interpreted as the arclength.
Proposition 7.12. Let $\Gamma$ be an oriented curve of class $C^k$ and $(I, \gamma)$ one of its parametrizations. We
assume that all points of $\gamma$ are regular, that is, $\gamma'(t) \neq 0$ for all $t \in I$. Then there exists $\omega : J \to \mathbb{R}^d$ a
function of class $C^k$ such that $(J, \omega)$ is a normal parametrization of $\Gamma$.
Proof. We build the reparametrization $\phi$ with the help of the length. Let $t_0 \in I$ and for $t \in I$ we define
\[ \phi(t) = \int_{t_0}^t \|\gamma'(s)\| \, ds. \]
As $\gamma$ is of class $C^k$ and $\gamma'$ never vanishes, one can check that $\|\gamma'\|$ is of class $C^{k-1}$. Thus the function $\phi$ is
of class $C^k$ on $I$. Moreover, $\phi' = \|\gamma'\|$ never vanishes (by regularity of $\gamma$), thus $\phi$ is a $C^k$ diffeomorphism
onto $J = \phi(I)$, and it is increasing as $\phi' > 0$. We define $\omega : J \to \mathbb{R}^d$ by $\omega = \gamma \circ \phi^{-1}$. As $\phi^{-1} : J \to I$ is also of class $C^k$, this defines a
function $\omega$ of class $C^k$ and $\gamma = \omega \circ \phi$. Hence $(I, \gamma)$ and $(J, \omega)$ represent the same oriented curve.
Finally, from $\gamma' = \phi' \, (\omega' \circ \phi)$, by taking the norm and as $\phi'(t) = \|\gamma'(t)\| > 0$ for all $t \in I$, there
holds
\[ \|\gamma'(t)\| = \phi'(t) \|\omega'(\phi(t))\| = \|\gamma'(t)\| \, \|\omega'(\phi(t))\|. \]
Simplifying by $\|\gamma'(t)\|$, we conclude that $\|\omega'(\phi(t))\| = 1$ for all $t \in I$. By bijectivity of $\phi$, we see that
$\|\omega'(r)\| = 1$ for all $r \in J$: the parametrization $(J, \omega)$ is normal.
Similarly to Definition 7.1, it can be extended to the case of continuous and piecewise $C^1$ curves $\gamma$ by
concatenation.
Again, one can check that the integral is well defined as $t \mapsto f(\gamma(t)) \cdot \gamma'(t)$ is a composition and sum of
continuous functions. Note that we integrate the dot product between the vector $f(\gamma(t))$ and the vector
$\gamma'(t)$ so that in the end we integrate a scalar quantity and the final outcome is a scalar. So here $ds$ is
the “vectorial infinitesimal increment” $\gamma'(t) \, dt$. In some sense, the integral measures on average whether the
displacement $\gamma'$ is aligned with the vector field. Again this can be interpreted with the help of a Riemann
sum.
Figure 7.2: Illustration of Proposition 7.14 (drawn for $N = 6$; the contribution to the integral shown in the drawing is $(\gamma(t_2^N) - \gamma(t_1^N)) \cdot f(\gamma(t_1^N))$). The integral $\int_\gamma f \cdot ds$ can be seen as a sum of contributions of
the form “displacement” $\cdot f$. Compared to Proposition 7.14 we have made a little shift in the indexes
(that does not change the validity of the proposition), that is, we sum $f(\gamma(t_{k-1}^N)) \cdot \big( \gamma(t_k^N) - \gamma(t_{k-1}^N) \big)$ instead of
$f(\gamma(t_k^N)) \cdot \big( \gamma(t_k^N) - \gamma(t_{k-1}^N) \big)$.
Proposition 7.14. Let $\gamma : [a, b] \to \mathbb{R}^d$ be a $C^1$ curve and $f : \mathbb{R}^d \to \mathbb{R}^d$ a continuous vector field. For $N \ge 1$
we write $t_k^N = a + k(b - a)/N$, that is, $\{t_0^N, t_1^N, \ldots, t_N^N\}$ are ordered and evenly distributed on $[a, b]$. Then
there holds
\[ \int_\gamma f \cdot ds = \lim_{N \to +\infty} \sum_{k=1}^N f(\gamma(t_k^N)) \cdot \big( \gamma(t_k^N) - \gamma(t_{k-1}^N) \big). \]
We will not sketch the proof as it is very similar to the one of Proposition 7.2. Roughly, the idea is
that $\gamma(t_k^N) - \gamma(t_{k-1}^N) \simeq (t_k^N - t_{k-1}^N) \, \gamma'(t_k^N)$ and then one has to quantify properly.
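The sum of “displacement $\cdot$ field” contributions in Proposition 7.14 can also be computed directly. In the sketch below, the curve (the unit circle traveled counterclockwise) and the vector field $f(x, y) = (-y, x)$ are hypothetical examples chosen because $f(\gamma(t)) \cdot \gamma'(t) = 1$, so that the integral equals $2\pi$.

```python
import math

def displacement_sum(field, gamma, a, b, n):
    """Sum of field(gamma(t_k)) . (gamma(t_k) - gamma(t_{k-1})) on an even grid of [a, b]."""
    total = 0.0
    prev = gamma(a)
    for k in range(1, n + 1):
        cur = gamma(a + k * (b - a) / n)
        fx, fy = field(cur)
        total += fx * (cur[0] - prev[0]) + fy * (cur[1] - prev[1])  # dot product with the displacement
        prev = cur
    return total

gamma = lambda t: (math.cos(t), math.sin(t))   # unit circle, counterclockwise
rotation = lambda p: (-p[1], p[0])             # hypothetical field f(x, y) = (-y, x)

approx = displacement_sum(rotation, gamma, 0.0, 2 * math.pi, 5000)
exact = 2 * math.pi
print(approx, exact)
```

Traveling the circle clockwise instead would flip the sign of every displacement, hence of the sum: unlike the integral of a scalar field, this integral depends on the orientation.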
Remark 7.15 (A useful bound). A link between the integral of a scalar and vector field is the following:
if $f : \mathbb{R}^d \to \mathbb{R}^d$ is continuous and $\gamma : I \to \mathbb{R}^d$ is a $C^1$ curve, then
\[ \left| \int_\gamma f \cdot ds \right| \le \int_\gamma \|f\| \, ds. \]
Similarly to the scalar case, one can also prove the independence of the integral from the parametrization.
Proposition 7.16 (Path integral is independent of the parametrization (case of a vector field)). Let
$\gamma : I \to \mathbb{R}^d$ and $\omega : J \to \mathbb{R}^d$ be two functions of class $C^1$ defined on $I, J$ two bounded intervals of $\mathbb{R}$. If they
represent the same oriented curve, then for any continuous $f : \mathbb{R}^d \to \mathbb{R}^d$,
\[ \int_\gamma f \cdot ds = \int_\omega f \cdot ds. \]
Proof. The proof is left as an exercise and is very similar to the one of Proposition 7.8.
Figure 7.3: Illustration of Corollary 7.18: the integral of a gradient $f = \nabla g$ (in green) on a closed curve (in red), that is, with $\gamma(a) = \gamma(b)$,
is zero: $\int_\gamma \nabla g \cdot ds = 0$. Note that in this drawing it is not apparent that the vector field $f$ is the gradient of a scalar
function.
Finally, let us prove the analogue of the fundamental theorem of calculus, which relates the integral
of the derivative of a function to the function itself. We start by recalling it for a function $\mathbb{R} \to \mathbb{R}$; you have
seen this result in Mathematical Analysis – Module 1. When you have a function $f : \mathbb{R} \to \mathbb{R}$ which is of
class $C^1$ then for every bounded interval $[a, b]$ there holds
\[ \int_a^b f'(t) \, dt = f(b) - f(a). \tag{7.3} \]
There is a similar result in this case, but now what you should put inside the integral is the gradient of
a scalar function, that you see as a vector field.
Proposition 7.17 (Path integral of a gradient field). Let $g : \mathbb{R}^d \to \mathbb{R}$ be a scalar function of class $C^1$ and
$\gamma : I = [a, b] \to \mathbb{R}^d$ a $C^1$ curve. Then
\[ \int_\gamma \nabla g \cdot ds = g(\gamma(b)) - g(\gamma(a)). \]
By Proposition 4.29, the function $t \mapsto \nabla g(\gamma(t)) \cdot \gamma'(t)$ is nothing else than the derivative (with respect
to $t$) of the function $t \mapsto g(\gamma(t))$. Thus we can use (7.3) for the function $t \mapsto g(\gamma(t))$ which is defined on
$I = [a, b]$ and valued in $\mathbb{R}$.
A particularly useful case is the following: if the curve is closed, that is, if $\gamma(b) = \gamma(a)$, then the
integral of a gradient over the curve vanishes.
Proof. We just apply the previous proposition and notice that $g(\gamma(b)) = g(\gamma(a))$ as $\gamma(a) = \gamma(b)$.
Example 7.19. Let us consider a constant vector field, that is, we fix $u \in \mathbb{R}^d$ and we define $f : x \mapsto u$ for
all $x \in \mathbb{R}^d$. Let $\gamma : [a, b] \to \mathbb{R}^d$ be a curve of class $C^1$; we want to compute $\int_\gamma f \cdot ds$. One quick way is to
notice that $f = \nabla g$ provided we define $g : x \mapsto u \cdot x$. Thus
\[ \int_\gamma f \cdot ds = \int_\gamma u \cdot ds = g(\gamma(b)) - g(\gamma(a)) = u \cdot \gamma(b) - u \cdot \gamma(a) = u \cdot (\gamma(b) - \gamma(a)). \]
This could also have been obtained by working coordinate per coordinate.
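The identity $\int_\gamma \nabla g \cdot ds = g(\gamma(b)) - g(\gamma(a))$ of Proposition 7.17 can be tested numerically. The sketch below uses a hypothetical potential $g(x, y) = x^2 y$, an arc of parabola, and the unit circle as a closed curve; the helper `vector_path_integral` is a plain midpoint rule and not a method from the notes.

```python
import math

def vector_path_integral(field, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of the integral of field(gamma(t)) . gamma'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        fx, fy = field(gamma(t))
        dx, dy = dgamma(t)
        total += (fx * dx + fy * dy) * h
    return total

g = lambda p: p[0] ** 2 * p[1]                      # hypothetical potential g(x, y) = x^2 y
grad_g = lambda p: (2 * p[0] * p[1], p[0] ** 2)     # its gradient, computed by hand

# Open curve: arc of parabola from (0, 0) to (1, 1).
parabola = lambda t: (t, t ** 2)
dparabola = lambda t: (1.0, 2 * t)
lhs = vector_path_integral(grad_g, parabola, dparabola, 0.0, 1.0)
rhs = g(parabola(1.0)) - g(parabola(0.0))

# Closed curve: the unit circle, for which the integral of a gradient should vanish.
circle = lambda t: (math.cos(t), math.sin(t))
dcircle = lambda t: (-math.sin(t), math.cos(t))
closed = vector_path_integral(grad_g, circle, dcircle, 0.0, 2 * math.pi)
print(lhs, rhs, closed)
```

Only the endpoints matter for a gradient field: any other $C^1$ curve from $(0, 0)$ to $(1, 1)$ should give the same value.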
We conclude this chapter with two remarks: one about standard results about integrals that were not
put in the core of the chapter so as not to overload it, and a complement about a concept that we do not cover
but that you may encounter in other contexts.
Remark 7.20 (Some identities). As everything boils down to one-dimensional integrals, you can directly
import some results and formulas, like linearity. For instance, you can check the following formula: if
$f, g$ are scalar functions and $\gamma$ is a curve while $a, b \in \mathbb{R}$,
\[ \int_\gamma (a f + b g) \, ds = a \int_\gamma f \, ds + b \int_\gamma g \, ds. \]
Remark 7.21 (Complement: 1-form). You may encounter what is called 1-forms. From a notation
perspective, a 1-form $\omega$ on $\mathbb{R}^d$ reads
\[ \omega = \sum_{i=1}^d f_i(x) \, dx_i, \]
where each $f_i$ is a function from $\mathbb{R}^d$ to $\mathbb{R}$. Thus a 1-form is a collection of $d$ real-valued functions; it is
nothing else than a function $f : \mathbb{R}^d \to \mathbb{R}^d$ whose coordinate functions are the $f_i$. For all practical purposes,
you can read the integral of a 1-form on a curve $\gamma : I \to \mathbb{R}^d$, usually denoted $\int_\gamma \omega$, as the integral of the
vector field $f$ on the curve $\gamma$:
\[ \int_\gamma \omega = \int_\gamma f \cdot ds. \]
So why introduce this notation of 1-form? To be precise, a 1-form is a function $\omega : \mathbb{R}^d \to L(\mathbb{R}^d)$, where
$L(\mathbb{R}^d)$ is the set of linear forms on $\mathbb{R}^d$ (the set of linear maps from $\mathbb{R}^d$ to $\mathbb{R}$). The paradigmatic example
of a 1-form is the differential of a function $f : \mathbb{R}^d \to \mathbb{R}$. Then one defines
\[ \int_\gamma \omega = \int_I \omega(\gamma(t))[\gamma'(t)] \, dt. \]
In the formula above, $\omega(\gamma(t))[\gamma'(t)]$ is the linear form $\omega(\gamma(t))$ evaluated at $\gamma'(t)$.
However, in $\mathbb{R}^d$ there is a canonical isomorphism between $L(\mathbb{R}^d)$ and $\mathbb{R}^d$ which is given by the dot
product. For every $L \in L(\mathbb{R}^d)$ there is a unique vector $x_L$ such that $\forall x \in \mathbb{R}^d$, $L(x) = x \cdot x_L$, and the
mapping $L \leftrightarrow x_L$ is an isomorphism of vector spaces between $L(\mathbb{R}^d)$ and $\mathbb{R}^d$. Actually, the $i$-th component
of $x_L$ is obtained as $L(e_i)$. Thus instead of looking at a 1-form, one can use this isomorphism to see it as
a vector valued function $\mathbb{R}^d \to \mathbb{R}^d$. This reasoning actually holds as soon as you have a dot product (if
the space is infinite dimensional you need to impose a metric assumption: the space must be complete
for the distance generated by the norm).
In summary: as you have a dot product on $\mathbb{R}^d$, there is a canonical isomorphism between $L(\mathbb{R}^d)$ and
$\mathbb{R}^d$, and distinguishing between $L(\mathbb{R}^d)$ and $\mathbb{R}^d$ is too much effort for too little reward; thus 1-forms and
vector fields can be thought of as the same objects. But in some other contexts it makes sense to distinguish
between a vector space $V$ and the space $L(V)$ of linear forms over it, and in this case one distinguishes between
1-forms and vector fields.
Chapter 8
In this chapter we study the integral of functions of several variables. Though we will stick to functions
of 2 variables, the definitions we present can be extended to functions of $d$ variables at the price of more
cumbersome notations and a less intuitive geometric representation. If $f : D \subset \mathbb{R}^2 \to \mathbb{R}$ is a function of
2 variables, we want to give a meaning to $\iint_D f(x, y) \, dx \, dy$, which can be interpreted as the volume under
the graph of $f$, as well as present techniques to compute this quantity.
The technical aspects of this chapter are not the most important ones, we will skip most of the proofs
as they are quite heavy. Actually a good framework to perform integration is the one given by measure
theory but this is out of the scope of these notes. The important take-home message of this chapter is not
the formal definitions, but rather the rules describing how to manipulate and compute these integrals.
You will apply these rules during the TA sessions on examples.
Specifically, in addition to the basic properties that one can expect from any definition of the integral
(linearity and positivity), the two important tools to compute integrals are: Fubini’s theorem, which
reduces the computation of a bidimensional integral to the one of two unidimensional integrals in a row;
and the change of variables formula, where the Jacobian matrix will play an important role.
Definition 8.1 (Compact set). Let $K \subset \mathbb{R}^d$ be a domain of $\mathbb{R}^d$. We say that $K$ is compact if it is closed
(see Chapter 3) and bounded (that is, there exists a constant $C$ such that $\|x\| \le C$ for all $x \in K$).
Example 8.2. In $\mathbb{R}^d$, the closed balls $B_c(x, r)$ with $x \in \mathbb{R}^d$ and $r \in [0, +\infty)$ are compact.
The main interest of compact sets is the Bolzano–Weierstrass theorem: if a sequence lies in a compact
set, up to extraction it converges to a point in the set.
Theorem 8.3 (Bolzano–Weierstrass). Let $K$ be a compact set of $\mathbb{R}^d$ and $(x_n)_{n \in \mathbb{N}}$ a sequence such that
$x_n \in K$ for all $n \in \mathbb{N}$. Then there exists $\phi : \mathbb{N} \to \mathbb{N}$ strictly increasing such that the sequence $(x_{\phi(n)})_{n \in \mathbb{N}}$
converges to a point $x \in K$.
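Proof. We will actually go back to the one dimensional case. For simplicity we only take $d = 2$ (that is,
a sequence in $\mathbb{R}^2$). Let us write $x_n = (x_{1,n}, x_{2,n})$. Let $C$ be a number such that $\|x_n\| \le C$ for all $n \in \mathbb{N}$.
Note that for all $n \in \mathbb{N}$,
\[ |x_{1,n}| \le \|x_n\| \le C \quad \text{and} \quad |x_{2,n}| \le \|x_n\| \le C. \]
As the sequence $(x_{1,n})_{n \in \mathbb{N}}$ is bounded, there exists an extraction, that is, $\phi_1 : \mathbb{N} \to \mathbb{N}$ strictly increasing,
such that $(x_{1,\phi_1(n)})_{n \in \mathbb{N}}$ converges to a limit $x_1$ (Bolzano–Weierstrass in $\mathbb{R}$). Then as the sequence
$(x_{2,\phi_1(n)})_{n \in \mathbb{N}}$ is also bounded, there exists $\phi_2 : \mathbb{N} \to \mathbb{N}$ strictly increasing such that $(x_{2,\phi_1(\phi_2(n))})_{n \in \mathbb{N}}$
converges to a limit $x_2$.
We write $\phi = \phi_1 \circ \phi_2$. By what we just said, $(x_{1,\phi(n)})_{n \in \mathbb{N}}$ and $(x_{2,\phi(n)})_{n \in \mathbb{N}}$ converge to respectively $x_1$
and $x_2$ (the first one as a subsequence of a convergent sequence). Thanks to Proposition 3.3, it implies that $(x_{\phi(n)})_{n \in \mathbb{N}}$ converges in $\mathbb{R}^2$ to $x = (x_1, x_2)$. Finally,
as $K$ is closed and $x_{\phi(n)} \in K$ for every $n \in \mathbb{N}$, there holds $x \in K$.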
We have introduced the notion of compactness mainly for the Heine theorem, which states that a
continuous function on a compact set is uniformly continuous.
Theorem 8.4 (Heine). Let $K \subset \mathbb{R}^d$ be a compact set and $f : K \to \mathbb{R}$ a continuous function. Then $f$ is
uniformly continuous, that is,
\[ \forall \varepsilon > 0, \ \exists \delta > 0, \ \forall x, y \in K, \quad \|x - y\| \le \delta \ \Rightarrow \ |f(x) - f(y)| \le \varepsilon. \]
Remark 8.5. Note that in the statement of Theorem 8.4, one can take a vector valued function (that
is, $f : K \to \mathbb{R}^p$ for $p \ge 1$) and the conclusion still holds. For instance, one can reason coordinate per
coordinate in the codomain.
As an additional result, the following statement guaranteeing existence of extrema is similar to the
version for functions defined over $\mathbb{R}$, and the proof is similar.
Theorem 8.6 (Extrema of continuous functions over compact sets). Let $K$ be a compact set of $\mathbb{R}^d$ and
$f : K \to \mathbb{R}$ a continuous function. Then the function $f$ is bounded over $K$ and attains its extrema, that
is, there exist $x_{\min}$ and $x_{\max}$ in $K$ such that
\[ \forall x \in K, \quad f(x_{\min}) \le f(x) \le f(x_{\max}). \]
Proof. We only prove it for the minimum. Let $m = \inf_{x \in K} f(x)$. By definition of the infimum, it means
that there exists a sequence $(x_n)_{n \in \mathbb{N}}$ of points in $K$ such that
\[ \lim_{n \to +\infty} f(x_n) = m. \]
By the Bolzano–Weierstrass theorem, there exists an “extraction” $\phi : \mathbb{N} \to \mathbb{N}$ and a point $x_{\min} \in K$ such that
$(x_{\phi(n)})_{n \in \mathbb{N}}$ converges to $x_{\min}$. By continuity of $f$,
\[ f(x_{\min}) = \lim_{n \to +\infty} f(x_{\phi(n)}) = m. \]
Figure 8.1: The integral of a non-negative function $f : \mathbb{R}^2 \to \mathbb{R}$ over a domain $D$ is the volume of the
region between the graph $z = f(x, y)$ of $f$ (in blue) and the domain $D$ (in red), seen as a subset of $\mathbb{R}^3$.
Definition 8.7 (1 dimensional partition). A partition $P$ of $[a, b]$ is the data of $N + 1$ ordered real numbers
$a = x_0 < x_1 < x_2 < \ldots < x_N = b$. The step-size of the partition $\Delta(P)$ is the maximal value of $x_i - x_{i-1}$
for $i \in \{1, 2, \ldots, N\}$.
With a partition of $[a, b]$ and a partition of $[c, d]$ it is easy to build a partition of $[a, b] \times [c, d]$, see the
bottom of Figure 8.2 and the following definition.
Definition 8.9 (Lower and upper sums). Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a
function defined over $D$. We assume that $f$ is bounded, that is, there exists $C \in \mathbb{R}$ such that $|f(x, y)| \le C$
for all $(x, y) \in D$. For $P_1$ and $P_2$ partitions of $[a, b]$ and $[c, d]$ respectively, we define respectively the
lower sum and the upper sum as
\[ L(f, P_1 \otimes P_2) = \sum_{R \in P_1 \otimes P_2} A(R) \inf_{(x, y) \in R} f(x, y), \]
\[ U(f, P_1 \otimes P_2) = \sum_{R \in P_1 \otimes P_2} A(R) \sup_{(x, y) \in R} f(x, y), \]
where $A(R)$ denotes the area of the rectangle $R$.
Figure 8.2: Approximation of the volume with rectangles. The domain $D$ is partitioned into rectangles
starting from a partition $P_1$ of $[a, b]$ (in dark blue) and a partition $P_2$ of $[c, d]$ (in green). For each
rectangle of the partition, we build two cuboids corresponding to the lower (orange) and upper (purple)
approximation of the graph of $f$ (in blue). Here we have represented the cuboids only for two different
rectangles in the partition $P_1 \otimes P_2$. The lower (resp. upper) sum is then the sum of the volumes of all
cuboids with orange top (resp. purple top).
Note that we always have $L(f, P_1 \otimes P_2) \le U(f, P_1 \otimes P_2)$: it comes from the inequality $\inf_{(x, y) \in R} f(x, y) \le
\sup_{(x, y) \in R} f(x, y)$ that we then sum over $R \in P_1 \otimes P_2$. Then to define the integral, we take the supremum
and infimum over all partitions.
Definition 8.10 (Integral over a rectangle). Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a
function defined over $D$. We assume that $f$ is bounded, that is, there exists $C \in \mathbb{R}$ such that $|f(x, y)| \le C$
for all $(x, y) \in D$. We say that $f$ is Riemann-integrable if
\[ \sup_{P_1, P_2} L(f, P_1 \otimes P_2) = \inf_{P_1, P_2} U(f, P_1 \otimes P_2), \]
where the supremum and the infimum are taken over all $P_1, P_2$ partitions of $[a, b]$ and $[c, d]$ respectively.
If this is the case, the common value of the supremum and the infimum is called the integral of $f$ over
the rectangle $D$ and denoted
\[ \iint_D f \quad \text{or} \quad \iint_{[a, b] \times [c, d]} f(x, y) \, dx \, dy. \]
Remark 8.11. In order to prove that $f$ is integrable, it is enough to show that for all $\varepsilon > 0$ there exist
partitions $P_1, P_2$ such that $L(f, P_1 \otimes P_2) \ge U(f, P_1 \otimes P_2) - \varepsilon$.
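To make the lower and upper sums concrete, here is a small sketch that computes them on uniform partitions. It relies on a simplifying assumption that is not part of the notes: the function is nondecreasing in each variable, so that on every rectangle the infimum sits at the bottom-left corner and the supremum at the top-right corner. The function $f(x, y) = x + y^2$ on $[0, 1] \times [0, 1]$ is a hypothetical example.

```python
def lower_upper_sums(f, a, b, c, d, n, m):
    """Lower and upper sums of f on [a, b] x [c, d] for uniform partitions with n and m pieces.

    Assumption: f is nondecreasing in x and in y, so on each rectangle the infimum
    is attained at the bottom-left corner and the supremum at the top-right corner.
    """
    dx, dy = (b - a) / n, (d - c) / m
    lower = upper = 0.0
    for i in range(n):
        for j in range(m):
            x0, y0 = a + i * dx, c + j * dy
            lower += f(x0, y0) * dx * dy             # A(R) * inf of f over R
            upper += f(x0 + dx, y0 + dy) * dx * dy   # A(R) * sup of f over R
    return lower, upper

f = lambda x, y: x + y * y   # hypothetical monotone example; its integral over the unit square is 5/6
for n in (4, 16, 64):
    low, up = lower_upper_sums(f, 0.0, 1.0, 0.0, 1.0, n, n)
    print(n, low, up, up - low)
```

The gap `up - low` shrinks as the partitions get finer, which is exactly the criterion of Remark 8.11.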
Remark 8.12. In the one-dimensional case, there is an orientation associated to a segment. That
is, if $a, b$ are real numbers then both $\int_a^b f$ and $\int_b^a f$ are defined (for instance for a continuous $f :
[\min(a, b), \max(a, b)] \to \mathbb{R}$), and there holds $\int_a^b f = -\int_b^a f$. In the case of integrals of functions of
two variables, this is no longer the case: in the definition of the integral of $f$ on $[a, b] \times [c, d]$ we always
assume that $a \le b$ and $c \le d$. If $\iint_D f < 0$ then necessarily there are regions where $f$ takes negative
values.
Now that we have a definition, let us check that it is not empty, that is, there are functions which are
Riemann-integrable: actually all the continuous functions are.
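Proposition 8.13 (Continuous functions are Riemann-integrable over rectangles). Let $D = [a, b] \times
[c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a continuous function. Then $f$ is Riemann-integrable over $D$.
Proof. Observe that $D$ is compact. Thus we know that $f$ is bounded thanks to Theorem 8.6.
Let $\varepsilon > 0$. Following Remark 8.11 it is enough to show that there exist partitions $P_1, P_2$ such that
$L(f, P_1 \otimes P_2) \ge U(f, P_1 \otimes P_2) - \varepsilon$.
From the uniform continuity of $f$ (Theorem 8.4), there exists $\delta > 0$ such that if $\|(x, y) - (x', y')\| \le \delta$ then
$|f(x, y) - f(x', y')| \le \varepsilon / ((b - a)(d - c))$.
Now we choose $P_1$ and $P_2$ uniform partitions with step-size smaller than $\delta / \sqrt{2}$. In particular, with
this choice if $R$ is a rectangle in $P_1 \otimes P_2$ then $\|(x, y) - (x', y')\| \le \delta$ for all $(x, y), (x', y') \in R$. As a
consequence of the uniform continuity,
\[ \sup_{(x, y) \in R} f(x, y) - \inf_{(x, y) \in R} f(x, y) \le \frac{\varepsilon}{(b - a)(d - c)}. \]
Multiplying by $A(R)$ and summing over all $R \in P_1 \otimes P_2$, whose areas add up to $(b - a)(d - c)$, we obtain $U(f, P_1 \otimes P_2) - L(f, P_1 \otimes P_2) \le \varepsilon$.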
Though this result is satisfying, it is not enough as functions could be integrated on more complicated
domains (this is in contrast with the one-dimensional case where integrating over a segment covers already
a wide variety of cases). More generally, we can define the integral of a function over a domain which is
not a rectangle: the trick is to extend the function by defining it equal to 0 outside of the domain.
We restrict to domains $D$ which are bounded, that is, there exists a constant $C$ such that $\|x\| \le C$ for
all $x \in D$.
Definition 8.14 (Integral over a domain). Let $D \subset \mathbb{R}^2$ be a bounded domain and $f : D \to \mathbb{R}$ a bounded
function defined over $D$. We define the function $\tilde{f} : \mathbb{R}^2 \to \mathbb{R}$ by
\[ \tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in D, \\ 0 & \text{otherwise,} \end{cases} \]
that is, we extend $f$ by 0 outside of $D$. Let $\tilde{D}$ be a rectangle containing $D$. We say that $f$ is Riemann
integrable over $D$ if $\tilde{f}$ is Riemann integrable over $\tilde{D}$, and in this case we define
\[ \iint_D f = \iint_{\tilde{D}} \tilde{f}. \]
Both the notion of being Riemann integrable and the value of the integral do not depend on the rectangle
$\tilde{D}$ (as long as it contains $D$).
A particular case of this definition corresponds to f being the constant function equal to 1: in this
case we recover the notion of area.
[Figure 8.3: the domain $D$ lying between the graphs $y = \psi(x)$ and $y = \phi(x)$, for $x$ between $a$ and $b$.]
Definition 8.16 (Jordan measurable set, area). Let $D \subset \mathbb{R}^2$ be a bounded set. Let $\tilde{D}$ be a rectangle
containing $D$. We say that $D$ is Jordan measurable if the indicator function $1_D$ is Riemann integrable over $\tilde{D}$. In this case,
we define the area of $D$ as
\[ A(D) = \iint_{\tilde{D}} 1_D. \]
Both the notion of being Jordan measurable and of area do not depend on the rectangle $\tilde{D}$ (as long as it
contains $D$).
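As an illustration of this definition, the sketch below computes lower and upper sums of the indicator function $1_D$ when $D$ is the closed unit disk and $\tilde{D} = [-1, 1] \times [-1, 1]$ is cut into an $n \times n$ uniform grid. The choice of the disk and the function name `disk_area_bounds` are assumptions made for the example. On a rectangle, the infimum of $1_D$ is 1 exactly when the rectangle is contained in $D$, and the supremum is 1 exactly when the rectangle meets $D$.

```python
import math

def disk_area_bounds(n):
    """Lower and upper sums of the indicator of the closed unit disk on an n x n grid of [-1, 1]^2."""
    h = 2.0 / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            x0, x1 = -1 + i * h, -1 + (i + 1) * h
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            # Distance from the origin to the farthest and to the nearest point of the rectangle.
            far = math.hypot(max(abs(x0), abs(x1)), max(abs(y0), abs(y1)))
            near_x = 0.0 if x0 <= 0.0 <= x1 else min(abs(x0), abs(x1))
            near_y = 0.0 if y0 <= 0.0 <= y1 else min(abs(y0), abs(y1))
            near = math.hypot(near_x, near_y)
            if far <= 1.0:     # rectangle contained in the disk: inf of 1_D is 1
                lower += h * h
            if near <= 1.0:    # rectangle meets the disk: sup of 1_D is 1
                upper += h * h
    return lower, upper

lower, upper = disk_area_bounds(200)
print(lower, math.pi, upper)
```

The difference `upper - lower` is the total area of the rectangles that touch the boundary circle, and it goes to 0 as the grid is refined; the common limit is the area $\pi$ of the disk.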
Of course, one thing that is difficult is to find sharp criteria for a domain to be Jordan measurable.
Indeed, the function $1_D$ is not continuous so we cannot use Proposition 8.13. Let us give one example of
a domain which is Jordan measurable.
Proposition 8.17. Let $\psi, \phi : [a, b] \to \mathbb{R}$ be two functions defined on an interval $[a, b]$ and valued in $\mathbb{R}$. We
assume that $\psi(x) \le \phi(x)$ for all $x \in [a, b]$. We define the domain
\[ D = \left\{ (x, y) \in \mathbb{R}^2 \ : \ x \in [a, b] \text{ and } \psi(x) \le y \le \phi(x) \right\}, \]
see Figure 8.3 for an illustration. If the functions $\psi$ and $\phi$ are continuous, then the domain $D$ is Jordan
measurable.
Proof. We follow again Remark 8.11. Let $\varepsilon > 0$. From the uniform continuity of $\psi, \phi$ we can find $\delta > 0$
such that $|x - x'| \le \delta$ implies $|\psi(x) - \psi(x')| \le \varepsilon$ and $|\phi(x) - \phi(x')| \le \varepsilon$.
Now we choose $P_1$ and $P_2$ uniform partitions with step-size smaller than $\delta$ and $\varepsilon$ respectively. Let
$[x_{i-1}, x_i]$ be one element of the partition $P_1$. By uniform continuity, the variations of $\psi$ and $\phi$ are at most $\varepsilon$
on $[x_{i-1}, x_i]$. Thus the image of $\psi$ and $\phi$ on $[x_{i-1}, x_i]$ is included in at most 4 elements of the partition
$P_2$, and on each of these elements the function $1_D$ cannot vary by more than 1. For the rest of the elements
of $P_2$, the function $1_D$ is constant over them. Moreover, each element of $P_2$ has length at most $\varepsilon$. Thus:
\begin{align*}
\sum_{I_j \in P_2} A([x_{i-1}, x_i] \times I_j) \left( \sup_{[x_{i-1}, x_i] \times I_j} 1_D - \inf_{[x_{i-1}, x_i] \times I_j} 1_D \right)
&\le 4 \max_{I_j \in P_2} \mathrm{Length}(I_j) \, (x_i - x_{i-1}) \\
&\le 4 \varepsilon (x_i - x_{i-1}).
\end{align*}
Remark 8.18. The sharp criterion for Jordan measurability that we will not prove is the following. A set
$D$ is Jordan measurable if and only if its topological boundary has zero measure, that is, for every $\varepsilon > 0$
we can include $\partial D$ in a union of (potentially overlapping) rectangles so that the sum of the areas of the
rectangles is smaller than $\varepsilon$.
Let’s mix these two definitions: in some sense, the regularity of the domain and the regularity of the
functions can be studied separately. Then to combine we can use the following proposition.
Proposition 8.19 (Continuous functions are integrable over Jordan measurable domains). Let $D \subset \mathbb{R}^2$
be a domain which is compact and Jordan measurable. Let $f : D \to \mathbb{R}$ be a continuous function. Then $f$ is
Riemann-integrable over $D$.
Proof. We will not do it properly because it is a bit tedious, but the idea is the following. Let us recall that $\tilde{f}$
is the function which coincides with $f$ on $D$ and which is 0 outside of $D$. When you take a fine partition
$P_1 \otimes P_2$ of a rectangle $\tilde{D}$ containing $D$, there are three kinds of rectangles:
• Those which do not intersect $D$: over these ones you integrate the zero function so the difference
between the sup and the inf of $\tilde{f}$ is 0.
• Those which intersect both $D$ and $D^c$: the total area that they cover is small because the domain
$D$ is Jordan measurable.
• Those which are fully inside $D$: over these ones the difference between the sup and the inf of $\tilde{f}$
coincides with the difference between the sup and inf of $f$, which is small because $f$ is uniformly continuous.
Summing all these estimates leads to an estimate of the form $L(\tilde{f}, P_1 \otimes P_2) \ge U(\tilde{f}, P_1 \otimes
P_2) - \varepsilon$ and yields integrability.
Remark 8.20. The sharpest criterion is that a bounded function is Riemann-integrable if and only if its
set of discontinuity points has zero measure, that is, for every $\varepsilon > 0$ we can include it in a union of
(potentially overlapping) rectangles so that the sum of the areas of the rectangles is smaller than $\varepsilon$.
To conclude this section, let us state some useful properties of the integral. In some sense these are
the minimal properties that every reasonable definition of the integral should satisfy.
Proposition 8.21. Let $D \subset \mathbb{R}^2$ be a bounded domain and $f, g : D \to \mathbb{R}$ be two Riemann integrable functions
over $D$. Let also $a, b \in \mathbb{R}$ be two scalars.
(i) Linearity: the function $a f + b g$ is Riemann integrable over $D$ and $\iint_D (a f + b g) = a \iint_D f + b \iint_D g$.
(ii) Positivity: if $f \ge 0$ over $D$ then $\iint_D f \ge 0$.
(iii) Monotonicity: if $f \le g$ everywhere on $D$ then $\iint_D f \le \iint_D g$.
Proof. Left as an exercise. The idea is to work at the level of upper and lower sums, but this can be a
bit tedious and quantification can be delicate. Note that (iii) is a consequence of (i) and (ii).
Remark 8.22. The first point can be rewritten more abstractly: the space of Riemann integrable functions
over a given domain is a vector space, and the mapping $f \mapsto \iint_D f$ is a linear form on it.
Figure 8.4: Illustration of Theorem 8.23. We decompose the volume under the graph as a sum of tiny
“y-slices” (only two are represented in green). The contribution of each slice, between $y_0$ and $y_0 + \Delta y$, to the total is an integral
with respect to the $x$ variable, namely $\left( \int_a^b f(x, y_0) \, dx \right) \Delta y$. Summing all these contributions (that is, integrating in $y$) yields the total
integral of $f$.
\[ \sum_{i=1}^N \sum_{j=1}^M u_{ij} = \sum_{j=1}^M \sum_{i=1}^N u_{ij}. \tag{8.1} \]
Indeed the two sums correspond to the same set of indexes $\{1, 2, \ldots, N\} \times \{1, 2, \ldots, M\}$ but enumerated
in two different ways. Fubini’s theorem is the same kind of identity, but for integrals, and its proof in
the end boils down to using (8.1) for the lower and upper sums.
Theorem 8.23 (Fubini for rectangles). Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$
a continuous function. Then the functions $y \in [c, d] \mapsto \int_a^b f(x, y) \, dx$ and $x \in [a, b] \mapsto \int_c^d f(x, y) \, dy$ are
continuous and
\[ \iint_D f = \int_c^d \left( \int_a^b f(x, y) \, dx \right) dy = \int_a^b \left( \int_c^d f(x, y) \, dy \right) dx. \]
Let us first state and prove a Lemma which is of independent interest and which corresponds to the first
part of the theorem.
Lemma 8.24. Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $f : D \to \mathbb{R}$ a continuous function. Then
the functions $y \in [c, d] \mapsto \int_a^b f(x, y) \, dx$ and $x \in [a, b] \mapsto \int_c^d f(x, y) \, dy$ are continuous.
Proof. The function $f$ is uniformly continuous on $D$ (Heine’s theorem, see Theorem 8.4). Let $\varepsilon > 0$, and
take $\delta > 0$ such that $\|(x, y) - (x', y')\| \le \delta$ implies $|f(x, y) - f(x', y')| \le \varepsilon$. In particular, if we fix $x \in [a, b]$
then
\[ |y - y'| \le \delta \ \Rightarrow \ |f(x, y) - f(x, y')| \le \varepsilon. \]
Integrating this inequality in $x \in [a, b]$ and using the triangle inequality for integrals,
\[ |y - y'| \le \delta \ \Rightarrow \ \left| \int_a^b f(x, y) \, dx - \int_a^b f(x, y') \, dx \right| \le (b - a) \varepsilon. \]
This proves that the function $y \in [c, d] \mapsto \int_a^b f(x, y) \, dx$ is (uniformly) continuous. By permuting the role
of $x$ and $y$, we also see that $x \in [a, b] \mapsto \int_c^d f(x, y) \, dy$ is uniformly continuous.
Remark 8.25. This lemma can fail if $f$ is only Riemann-integrable, that is, $f$ Riemann-integrable does not
imply that the functions $y \in [c, d] \mapsto \int_a^b f(x, y) \, dx$ and $x \in [a, b] \mapsto \int_c^d f(x, y) \, dy$ are Riemann-integrable:
see for instance Exercise 7.3.3 of the textbook Vector Calculus by Baxandall and Liebeck. This is one
of the “weaknesses” of the Riemann theory of integration (that is solved by the Lebesgue theory of
integration).
Proof of Theorem 8.23. We start by choosing ε ¡ 0. Let P1 , P2 partitions of ra, bs and rc, ds such that
Lpf, P1 b P2 q ¥ U pf, P1 b P2 q ε. We write P1 px0 , x1 , . . . , xN q and P2 py0 , y1 , . . . , yM q. For
i t1, 2, . . . , N u and j P t1, 2, . . . , M u we write
¸
N ¸
M ¸
N ¸
M
Lpf, P1 b P2 q pxi xi1 qpyj yj1 qlij , U pf, P1 b P2 q pxi xi1 qpyj yj1 quij .
i 1j 1
i 1j 1
We concentrate on the lower sum. Now, this is the important point, as we have finite sums we can really
first sum in j and then sum in i (that is identity (8.1)), that is:
¸
N ¸
M
Lpf, P1 b P2 q pxi xi1 q pyj yj1 qlij .
i 1 j 1
Let’s fix an index i. Then for each x P rxi1 , xi s, and by using the definition of Riemann sum for one
dimensional integrals,
¸
M »d
pyj yj1 qlij ¤ f px, y q dy.
j 1 c
¸
M »d
pyj yj1 qlij ¤ xPrxmin,x s f px, y q dy.
j 1 1
i i c
Next we sum in $i$, and again we use the definition of Riemann sum for one dimensional integrals (but this time for the function $x \mapsto \int_c^d f(x, y)\,dy$ which is Riemann-integrable thanks to Lemma 8.24):
\[ \sum_{i=1}^{N} (x_i - x_{i-1}) \sum_{j=1}^{M} (y_j - y_{j-1})\, l_{ij} \leq \sum_{i=1}^{N} (x_i - x_{i-1}) \min_{x \in [x_{i-1}, x_i]} \int_c^d f(x, y)\,dy \leq \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx, \]
where we recall that the left hand side is nothing else than $L(f, P_1 \otimes P_2)$. Doing the same reasoning with the upper sum, one can see on the other hand that
\[ U(f, P_1 \otimes P_2) = \sum_{i=1}^{N} (x_i - x_{i-1}) \sum_{j=1}^{M} (y_j - y_{j-1})\, u_{ij} \geq \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx. \]
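To conclude we use the assumption that the partitions $P_1$, $P_2$ were chosen in such a way that $L(f, P_1 \otimes P_2) \geq U(f, P_1 \otimes P_2) - \varepsilon$. It yields
\[ \left| L(f, P_1 \otimes P_2) - \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx \right| \leq \varepsilon \quad \text{and} \quad \left| U(f, P_1 \otimes P_2) - \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx \right| \leq \varepsilon. \]
As $\iint_D f$ also lies between the lower sum and the upper sum, it differs from the iterated integral by at most $\varepsilon$. Since $\varepsilon$ is arbitrary, the two are equal. We can then permute the role of $x$ and $y$ to get the other equality.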
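The sandwich between lower and upper sums used in this proof can be observed numerically. The following is a minimal Python sketch, not part of the original notes: the function `f`, the unit square and the uniform partitions are hypothetical choices, and the helper assumes that `f` is nondecreasing in each variable, so that on every small rectangle the minimum sits at the lower-left corner and the maximum at the upper-right corner.

```python
def lower_upper_sums(f, a, b, c, d, n, m):
    """Lower and upper sums of f on a uniform n x m product partition of [a, b] x [c, d].

    Assumption (hypothetical helper): f is nondecreasing in x and in y, so the
    extreme values on each cell are attained at two opposite corners.
    """
    dx, dy = (b - a) / n, (d - c) / m
    lower = upper = 0.0
    for i in range(1, n + 1):
        inner_lower = inner_upper = 0.0  # sums in j for a fixed index i
        for j in range(1, m + 1):
            inner_lower += dy * f(a + (i - 1) * dx, c + (j - 1) * dy)
            inner_upper += dy * f(a + i * dx, c + j * dy)
        lower += dx * inner_lower  # then sum in i
        upper += dx * inner_upper
    return lower, upper


def f(x, y):
    # Hypothetical integrand, nondecreasing in x and y on the unit square.
    return x + y * y


lower, upper = lower_upper_sums(f, 0.0, 1.0, 0.0, 1.0, 100, 100)
# Iterated integral of x + y^2 over [0, 1] x [0, 1], computed by hand: 1/2 + 1/3.
iterated = 0.5 + 1.0 / 3.0
print(lower, iterated, upper)
```

The iterated integral is trapped between the two sums, and refining the partition shrinks the gap between them.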
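Example 8.26. Let $D$ be the rectangle delimited by $0 \leq x \leq 2$ and $0 \leq y \leq 1$. We would like to compute
\[ \iint_D (xy + 2x + xy^3)\,dxdy. \]
Using Theorem 8.23,
\[
\begin{aligned}
\iint_D (xy + 2x + xy^3)\,dxdy &= \int_0^1 \left( \int_0^2 (xy + 2x + xy^3)\,dx \right) dy \\
&= \int_0^1 \left[ \frac{y x^2}{2} + x^2 + \frac{y^3 x^2}{2} \right]_{x=0}^{2} dy \\
&= \int_0^1 (2y + 4 + 2y^3)\,dy \\
&= \left[ y^2 + 4y + \frac{1}{2} y^4 \right]_{y=0}^{1} = 1 + 4 + \frac{1}{2} = \frac{11}{2}.
\end{aligned}
\]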
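As a sanity check of this computation, here is a short Python sketch (an illustration added to the notes, not the author's method). It approximates the two iterated integrals of $xy + 2x + xy^3$ over $[0, 2] \times [0, 1]$ with a midpoint rule; the helper `midpoint_1d` and the number of subintervals are hypothetical choices.

```python
def midpoint_1d(g, lo, hi, n):
    """Midpoint Riemann sum of g over [lo, hi] with n equal subintervals."""
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))


def f(x, y):
    # Integrand of Example 8.26.
    return x * y + 2 * x + x * y**3


n = 200
# First integrate in x over [0, 2], then in y over [0, 1].
x_then_y = midpoint_1d(lambda y: midpoint_1d(lambda x: f(x, y), 0.0, 2.0, n), 0.0, 1.0, n)
# Same computation with the order of the two integrations swapped.
y_then_x = midpoint_1d(lambda x: midpoint_1d(lambda y: f(x, y), 0.0, 1.0, n), 0.0, 2.0, n)

print(x_then_y, y_then_x)  # both should be close to 11/2
```

Both orders of integration give approximately the same number, as the theorem predicts.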
A useful case is when the function is a product of two functions of one variable, since the double integral then becomes a product of two one-dimensional integrals. Note that this is a particular case: not every function can be expressed as a product of a function of $x$ and a function of $y$. Note also that it works only when the domain $D$ is a rectangle.
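Corollary 8.27. Let $D = [a, b] \times [c, d] \subset \mathbb{R}^2$ be a rectangle and $g : [a, b] \to \mathbb{R}$, $h : [c, d] \to \mathbb{R}$ two continuous functions of one variable. We define $f : D \to \mathbb{R}$ by $f(x, y) = g(x) h(y)$. Then
\[ \iint_D f = \left( \int_a^b g(x)\,dx \right) \left( \int_c^d h(y)\,dy \right). \]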
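Proof. We use (twice) the linearity of the integral: starting from Theorem 8.23,
\[
\begin{aligned}
\iint_D f &= \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx = \int_a^b \left( \int_c^d g(x) h(y)\,dy \right) dx \\
&= \int_a^b g(x) \left( \int_c^d h(y)\,dy \right) dx = \left( \int_a^b g(x)\,dx \right) \left( \int_c^d h(y)\,dy \right).
\end{aligned}
\]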
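The factorization in this proof already holds at the level of Riemann sums on a product grid. The Python sketch below illustrates it; the factors `g` and `h_fun`, the rectangle $[0, 1] \times [0, 2]$ and the helper `midpoint_1d` are hypothetical choices made for the illustration.

```python
import math


def midpoint_1d(g, lo, hi, n):
    """Midpoint Riemann sum of g over [lo, hi] with n equal subintervals."""
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))


# Hypothetical factors: g(x) = cos(x) on [0, 1] and h(y) = exp(y) on [0, 2].
g = math.cos
h_fun = math.exp

n = 200
# Iterated integral of f(x, y) = g(x) * h(y) over the rectangle.
double_integral = midpoint_1d(
    lambda x: midpoint_1d(lambda y: g(x) * h_fun(y), 0.0, 2.0, n), 0.0, 1.0, n
)
# Product of the two one-dimensional integrals.
product = midpoint_1d(g, 0.0, 1.0, n) * midpoint_1d(h_fun, 0.0, 2.0, n)
# Closed form computed by hand: sin(1) * (e^2 - 1).
exact = math.sin(1.0) * (math.exp(2.0) - 1.0)

print(double_integral, product, exact)
```

The first two numbers agree up to rounding, because the discrete double sum factors exactly like the integral; both approximate the closed form.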
Example 8.28. Redo Example 8.26 using the linearity of the integral and the corollary we just proved.
Let us also state a theorem of Fubini for a domain which is not a rectangle. We will not give the proof because it is more involved, but the basic idea is the same: go back to a partition of the domain into little rectangles and sum over the rectangles in the right order.
Figure 8.5: Plot of the domain in Example 8.30 (axes $Ox$ and $Oy$). It corresponds to one as described in Proposition 8.17.
Theorem 8.29 (Fubini for more general domains). Let $\psi, \varphi : [a, b] \to \mathbb{R}$ be two continuous functions defined on an interval $[a, b]$ and valued in $\mathbb{R}$. We assume that $\psi(x) \leq \varphi(x)$ for all $x \in [a, b]$. We define the domain
\[ D = \left\{ (x, y) \in \mathbb{R}^2 : x \in [a, b] \text{ and } \psi(x) \leq y \leq \varphi(x) \right\}. \]
We take $f : D \to \mathbb{R}$ a continuous function. Then the function $x \in [a, b] \mapsto \int_{\psi(x)}^{\varphi(x)} f(x, y)\,dy$ is continuous and
\[ \iint_D f = \int_a^b \left( \int_{\psi(x)}^{\varphi(x)} f(x, y)\,dy \right) dx. \]
Of course we can make a similar statement if the domain $D$ has a nice decomposition in the $Oy$ direction.
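Example 8.30. Let $D$ be the domain delimited by the curves $x = 2$, $y = e^x$ and $y = \frac{1}{2} x + 1$. Let's compute $\iint_D xy\,dxdy$. Though we are integrating a product of functions, the domain is not a rectangle so the resulting integral is not the product of the integrals. After drawing the domain, see Figure 8.5, we realize that we are in the framework of the theorem with $a = 0$, $b = 2$, $\psi(x) = \frac{1}{2} x + 1$ and $\varphi(x) = e^x$. Thus
\[
\begin{aligned}
\iint_D xy\,dxdy &= \int_0^2 \left( \int_{\frac{1}{2}x + 1}^{e^x} xy\,dy \right) dx = \int_0^2 \left[ x \frac{y^2}{2} \right]_{y = \frac{1}{2}x + 1}^{e^x} dx = \int_0^2 \frac{1}{2} x \left( e^{2x} - \left( \frac{1}{2} x + 1 \right)^2 \right) dx \\
&= \frac{1}{2} \int_0^2 \left( x e^{2x} - x - x^2 - \frac{x^3}{4} \right) dx \\
&= \frac{1}{2} \left[ \frac{1}{2} x e^{2x} - \frac{1}{4} e^{2x} - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{16} \right]_0^2 = \frac{3}{8} e^4 - \frac{65}{24},
\end{aligned}
\]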
where you can notice that we have used one integration by parts. As you can see, applying Fubini is just
one step, there is a tedious computation left.
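Since the computation is tedious, a numerical cross-check is reassuring. The Python sketch below (an added illustration; the helper `midpoint_1d` and the grid sizes are hypothetical choices) applies the iterated formula of Theorem 8.29 with $\psi(x) = \frac{1}{2}x + 1$ and upper bound $e^x$, and compares the result with the value $\frac{3}{8} e^4 - \frac{65}{24}$ obtained by carrying out the integration by hand.

```python
import math


def midpoint_1d(g, lo, hi, n):
    """Midpoint Riemann sum of g over [lo, hi] with n equal subintervals."""
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))


def inner(x):
    # Integral of x*y in y between psi(x) = x/2 + 1 and exp(x), for a fixed x.
    return midpoint_1d(lambda y: x * y, 0.5 * x + 1.0, math.exp(x), 50)


# Outer integral in x over [a, b] = [0, 2].
numeric = midpoint_1d(inner, 0.0, 2.0, 400)
# Value obtained by hand from the antiderivative in the example.
closed_form = 3.0 / 8.0 * math.exp(4.0) - 65.0 / 24.0

print(numeric, closed_form)
```

Note that the bounds of the inner integral depend on $x$, which is why the result is not a product of two one-dimensional integrals.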
Figure 8.6: Understanding the Jacobian factor. On the left (axes $u$ and $v$), small rectangle of area $\Delta u \Delta v$ (in red). On the right (axes $x$ and $y$), image of the rectangle by the change of variables $(x, y) = \varphi(u, v)$, it gives a “curved rectangle” (also in red). This curved rectangle is approximated by the parallelogram with sides $\Delta u \frac{\partial \varphi}{\partial u}$ and $\Delta v \frac{\partial \varphi}{\partial v}$ (in purple): this is precisely what differentiability means. The area of this parallelogram is $|\Delta u \Delta v \det D\varphi|$. The link between areas and determinant was recalled in Chapter 1, see Figure 1.7.
it holds for instance if $f$ is continuous and $\varphi$ is of class $C^1$. Informally, one can think about it as “$dx = \varphi'(u)\,du$”. This makes sense: locally, an infinitesimal variation of length $du$ yields a variation in $x = \varphi(u)$ of $dx = \varphi'(u)\,du$. The question is to understand what it becomes for functions $\varphi$ from $\mathbb{R}^2$ to $\mathbb{R}^2$.
Let's take $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ and think $(x, y) = \varphi(u, v)$. Locally, at a point $(u, v)$ the map $\varphi$ is well approximated by its differential $D\varphi(u, v)$ (note that we use a slightly different notation compared to the previous chapters, it should rather be $D\varphi_{(u,v)}$ but that would be too cumbersome). Moreover, a linear map represented by a $2 \times 2$ matrix $M$, that is $(u, v) \mapsto M(u, v)$, distorts areas by multiplying them by a factor $|\det(M)|$ (we have added an absolute value because areas here are not considered to be oriented). Combining these two ideas, see Figure 8.6 for an illustration, we arrive at the conclusion that one should write “$dxdy = |\det D\varphi(u, v)|\,dudv$”. That is, in a change of variables one has to plug in the “determinant of the Jacobian matrix” to take into account volume distortion. Note that in coordinates, $|\det D\varphi(u, v)|$ is the real number defined as
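\[ |\det D\varphi(u, v)| = \left| \det \begin{pmatrix} \dfrac{\partial \varphi_1}{\partial u}(u, v) & \dfrac{\partial \varphi_1}{\partial v}(u, v) \\[2mm] \dfrac{\partial \varphi_2}{\partial u}(u, v) & \dfrac{\partial \varphi_2}{\partial v}(u, v) \end{pmatrix} \right| = \left| \frac{\partial \varphi_1}{\partial u}(u, v) \frac{\partial \varphi_2}{\partial v}(u, v) - \frac{\partial \varphi_1}{\partial v}(u, v) \frac{\partial \varphi_2}{\partial u}(u, v) \right|. \tag{8.3} \]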
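To make formula (8.3) concrete, the following Python sketch approximates the four partial derivatives by central finite differences and combines them as in (8.3), before taking the absolute value. It is only an illustration: the helper `jacobian_det`, the step `h`, the sample points and the two maps (polar coordinates and a hypothetical diagonal stretch) are choices made here, not part of the notes.

```python
import math


def jacobian_det(phi, u, v, h=1e-6):
    """Central finite-difference approximation of det D(phi) at the point (u, v)."""
    # Partial derivatives of (phi_1, phi_2) with respect to u, then with respect to v.
    d_u = [(p - m) / (2 * h) for p, m in zip(phi(u + h, v), phi(u - h, v))]
    d_v = [(p - m) / (2 * h) for p, m in zip(phi(u, v + h), phi(u, v - h))]
    # (d phi_1/du)(d phi_2/dv) - (d phi_1/dv)(d phi_2/du), as in (8.3).
    return d_u[0] * d_v[1] - d_v[0] * d_u[1]


def polar(r, theta):
    # Polar coordinates map (r, theta) -> (x, y).
    return (r * math.cos(theta), r * math.sin(theta))


def stretch(u, v, a=3.0, b=2.0):
    # Hypothetical linear map with matrix diag(a, b).
    return (a * u, b * v)


det_polar = jacobian_det(polar, 1.5, 0.7)       # expected to be close to r = 1.5
det_stretch = jacobian_det(stretch, 0.2, -0.4)  # expected to be close to a * b = 6

print(det_polar, det_stretch)
```

For a linear map the determinant does not depend on the point, whereas for polar coordinates it grows with $r$: small rectangles far from the origin are stretched more.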
We state below a rigorous theorem, though we will not prove it (the proof can be very tedious and amounts to quantifying properly what was written above).
Theorem 8.31 (Change of variables for integral of 2 variables). Let $U$ be a Jordan measurable domain and $\varphi : \tilde{U} \to \mathbb{R}^2$ be a function defined on an open set $\tilde{U}$ containing $U$. We assume that $\varphi$ is injective, of class $C^1$ and that $\det D\varphi$ does not vanish. We define $D = \varphi(U)$ and take $f : D \to \mathbb{R}$ a continuous function. Then $D$ is Jordan measurable and
\[ \iint_D f(x, y)\,dxdy = \iint_U f(\varphi(u, v))\, |\det D\varphi(u, v)|\,dudv. \]
In the identity above $|\det D\varphi(u, v)|$ is the absolute value of the determinant of the Jacobian matrix of $\varphi$ at the point $(u, v)$, as defined in (8.3).
Figure 8.7: Computation of the area of an ellipse, see Example 8.35. An ellipse $D$ (in blue) can be seen as the image of a circle $U$ (in red) by a linear map.
Remark 8.32. Importantly we assume that $\varphi$ is injective whereas it was not the case in (8.2): indeed, in one dimension the integrals are signed, and if $\varphi$ is not injective then $\varphi'$ changes sign and there will be compensations that ensure (8.2) to hold. As mentioned in Remark 8.12 such an effect cannot happen for two dimensional integrals (and this is also linked to the absolute value around $\det D\varphi$), thus one has to assume that $\varphi$ is injective.
Remark 8.33. The assumption that $\det D\varphi \neq 0$ is needed for the proof but it's more a technical issue than a conceptual one, as this kind of assumption can be removed in the Lebesgue theory of integration. A function $\varphi : U \to \mathbb{R}^2$ of class $C^1$ such that $\varphi$ is injective and $\det D\varphi \neq 0$ is actually a $C^1$-diffeomorphism onto its image $D = \varphi(U)$, in the sense that $\varphi^{-1} : D \to U$ is also of class $C^1$.
\[ \varphi(u, v) = M \begin{pmatrix} u \\ v \end{pmatrix}. \]
Proof. We apply Theorem 8.31. If $\det M \neq 0$ then we know that the map $\varphi$ is a bijection. Moreover, $|\det D\varphi|$ is constant and equal to $|\det M|$.
Example 8.35 (Area of an ellipse). Let us compute the area of an ellipse, that is, we fix $a, b > 0$ and we want to compute the area of
\[ D = \left\{ (x, y) \in \mathbb{R}^2 : \frac{x^2}{a^2} + \frac{y^2}{b^2} \leq 1 \right\}. \]
Let's define $U$ as the unit disk, that is, $U = \{ (x, y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1 \}$. We also define $M$ the $2 \times 2$ matrix given by
\[ M = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}. \]
We associate to it the map $\varphi : (u, v) \mapsto M(u, v) = (au, bv)$. Then $D = \varphi(U)$ (exercise: prove it rigorously) thus
\[ A(D) = \iint_D dxdy = \iint_U |\det M|\,dudv = |\det M|\, A(U). \]
As $A(U) = \pi$ (this is geometry) and $|\det M| = ab$, we conclude that the area of the ellipse with parameters $a, b$ is $\pi ab$.
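The scaling of areas by $|\det M| = ab$ can be checked with a crude grid count. In the Python sketch below (an added illustration; the helper `area_by_grid`, the semi-axes `a`, `b` and the grid size are hypothetical choices), the grid used for the ellipse is the image by $M$ of the grid used for the disk, so the same cells are counted and only the area of each cell changes.

```python
import math


def area_by_grid(inside, x_half, y_half, n):
    """Approximate the area of a set contained in [-x_half, x_half] x [-y_half, y_half]
    by counting the centres of an n x n grid of cells that belong to the set."""
    dx, dy = 2 * x_half / n, 2 * y_half / n
    count = 0
    for i in range(n):
        x = -x_half + (i + 0.5) * dx
        for j in range(n):
            y = -y_half + (j + 0.5) * dy
            if inside(x, y):
                count += 1
    return count * dx * dy


a, b = 3.0, 2.0  # hypothetical semi-axes
n = 400
# Unit disk U, inside the square [-1, 1] x [-1, 1].
area_disk = area_by_grid(lambda u, v: u * u + v * v <= 1.0, 1.0, 1.0, n)
# Ellipse D = M(U), inside the rectangle [-a, a] x [-b, b].
area_ellipse = area_by_grid(lambda x, y: (x / a) ** 2 + (y / b) ** 2 <= 1.0, a, b, n)

print(area_disk, area_ellipse, math.pi * a * b)
```

The ratio of the two approximate areas is $ab$, and the second one is close to $\pi ab$.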
A central change of variables is the one in polar coordinates, which is useful when the domain of integration and/or the function have some radial symmetry.
Corollary 8.36 (Change of variables in polar coordinates). Let $D$ be a Jordan measurable domain of $\mathbb{R}^2$. We define the domain $U$ as
\[ U = \left\{ (r, \theta) \in [0, \infty) \times [0, 2\pi) : \begin{pmatrix} r \cos(\theta) \\ r \sin(\theta) \end{pmatrix} \in D \right\}. \]
The domain $U$ is Jordan measurable and if the function $f$ is continuous over $D$ there holds
\[ \iint_D f(x, y)\,dxdy = \iint_U f(r \cos(\theta), r \sin(\theta))\, r\,drd\theta. \]
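Proof. We consider the map
\[ \varphi(r, \theta) = \begin{pmatrix} r \cos(\theta) \\ r \sin(\theta) \end{pmatrix}. \]
Then this function is injective and of class $C^1$ on $(0, \infty) \times [0, 2\pi)$. Note that we do not fall exactly in the framework of Theorem 8.31 as $\varphi$ is not injective on $\{0\} \times [0, 2\pi)$. However, as the region where $r = 0$ has an area of $0$ (it corresponds to a line in $\mathbb{R}^2$), it does not pose a problem.
Then we compute the Jacobian matrix of $\varphi$, it yields:
\[ D\varphi(r, \theta) = \begin{pmatrix} \cos(\theta) & -r \sin(\theta) \\ \sin(\theta) & r \cos(\theta) \end{pmatrix}, \]
so that $|\det D\varphi(r, \theta)| = r \cos^2(\theta) + r \sin^2(\theta) = r$, which is the factor appearing in the formula.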
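Now the domain $U$ is a rectangle so we can use Fubini! Recognizing in $r/(1 + r^2)$ the derivative of $\frac{1}{2} \ln(1 + r^2)$ we get
\[ \iint_U \frac{r}{1 + r^2}\,drd\theta = \int_0^{2\pi} \left( \int_0^1 \frac{r}{1 + r^2}\,dr \right) d\theta = \int_0^{2\pi} \left[ \frac{1}{2} \ln(1 + r^2) \right]_{r=0}^{1} d\theta = \int_0^{2\pi} \frac{\ln(2)}{2}\,d\theta = \pi \ln(2). \]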
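The value $\pi \ln(2)$ can be cross-checked numerically. In the Python sketch below, the polar side integrates $r/(1 + r^2)$ over the rectangle $[0, 1] \times [0, 2\pi)$ exactly as above. The Cartesian side is an assumption made for the illustration: it integrates $1/(1 + x^2 + y^2)$ over the unit disk, which is one function whose polar form produces this integrand. The helper `midpoint_1d` and the grid sizes are hypothetical choices.

```python
import math


def midpoint_1d(g, lo, hi, n):
    """Midpoint Riemann sum of g over [lo, hi] with n equal subintervals."""
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))


n = 200
# Polar side: the inner integral in r does not depend on theta.
inner = midpoint_1d(lambda r: r / (1.0 + r * r), 0.0, 1.0, n)
polar_value = midpoint_1d(lambda theta: inner, 0.0, 2.0 * math.pi, n)

# Cartesian side (assumed integrand): sum 1 / (1 + x^2 + y^2) over the centres
# of the grid cells that fall inside the unit disk.
m = 400
cell = 2.0 / m
cartesian_value = 0.0
for i in range(m):
    x = -1.0 + (i + 0.5) * cell
    for j in range(m):
        y = -1.0 + (j + 0.5) * cell
        if x * x + y * y <= 1.0:
            cartesian_value += cell * cell / (1.0 + x * x + y * y)

exact = math.pi * math.log(2.0)
print(polar_value, cartesian_value, exact)
```

The polar computation is both simpler and more accurate here: the domain becomes a rectangle, so no cell straddles the boundary.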