Implicit function theorem

In multivariable calculus, the implicit function theorem^[a] is a tool that allows relations to be converted to functions of several real variables. It does so by representing the relation as the graph of a function. There may not be a single function whose graph can represent the entire relation, but there may be such a function on a restriction of the domain of the relation. The implicit function theorem gives a sufficient condition to ensure that there is such a function.

More precisely, given a system of $m$ equations $f_{i}(x_{1},\ldots ,x_{n},y_{1},\ldots ,y_{m})=0,\ i=1,\ldots ,m$ (often abbreviated into $F(x,y)=0$ ), the theorem states that, under a mild condition on the partial derivatives (with respect to each $y_{i}$ ) at a point, the $m$ variables $y_{i}$ are differentiable functions of the $x_{j}$ in some neighborhood of the point. As these functions generally cannot be expressed in closed form, they are implicitly defined by the equations, and this motivated the name of the theorem.^[1]

In other words, under a mild condition on the partial derivatives, the set of zeros of a system of equations is locally the graph of a function.

History

Augustin-Louis Cauchy (1789–1857) is credited with the first rigorous form of the implicit function theorem. Ulisse Dini (1845–1918) generalized the real-variable version of the implicit function theorem to the context of functions of any number of real variables.^[2]

First example

If we define the function $f(x,y)=x^{2}+y^{2}$ , then the equation $f(x,y)=1$ cuts out the unit circle as the level set $\{(x,y)\mid f(x,y)=1\}$ . There is no way to represent the unit circle as the graph of a function of one variable $y=g(x)$ because for each choice of $x\in (-1,1)$ , there are two choices of $y$ , namely $\pm {\sqrt {1-x^{2}}}$ .

However, it is possible to represent part of the circle as the graph of a function of one variable. If we let $g_{1}(x)={\sqrt {1-x^{2}}}$ for $-1\leq x\leq 1$ , then the graph of $y=g_{1}(x)$ provides the upper half of the circle. Similarly, if $g_{2}(x)=-{\sqrt {1-x^{2}}}$ , then the graph of $y=g_{2}(x)$ gives the lower half of the circle.

The purpose of the implicit function theorem is to tell us that functions like $g_{1}(x)$ and $g_{2}(x)$ almost always exist, even in situations where we cannot write down explicit formulas. It guarantees that $g_{1}(x)$ and $g_{2}(x)$ are differentiable, and it even works in situations where we do not have a formula for $f(x,y)$ .
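As a rough illustration of this point, the sketch below evaluates the two local branches of the circle numerically without using the closed-form square root, by running Newton's method in $y$ for a fixed $x$. The names `circle_residual`, `implicit_y`, the tolerance and the iteration cap are illustrative assumptions, not part of the theorem.

```python
def circle_residual(x, y):
    """The circle relation written as x^2 + y^2 - 1 = 0."""
    return x * x + y * y - 1.0


def d_residual_dy(x, y):
    """Partial derivative of the residual with respect to y."""
    return 2.0 * y


def implicit_y(x, y_start, tol=1e-12, max_iter=50):
    """Solve circle_residual(x, y) = 0 for y by Newton's method.

    Hypothetical helper: the starting value y_start selects the branch,
    and the iteration assumes the y-derivative does not vanish.
    """
    y = y_start
    for _ in range(max_iter):
        slope = d_residual_dy(x, y)
        if slope == 0.0:
            raise ZeroDivisionError("y-derivative vanished; Newton step undefined")
        step = circle_residual(x, y) / slope
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")


# Starting near (0, 1) follows the upper branch, near (0, -1) the lower one.
upper = implicit_y(0.3, y_start=1.0)
lower = implicit_y(0.3, y_start=-1.0)
```

Starting the iteration at $y=0$ raises an error, which mirrors the fact that no such local function of $x$ exists at the points where the $y$-derivative vanishes.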

Definitions

Let $f:\mathbb {R} ^{n+m}\to \mathbb {R} ^{m}$ be a continuously differentiable function. We think of $\mathbb {R} ^{n+m}$ as the Cartesian product $\mathbb {R} ^{n}\times \mathbb {R} ^{m},$ and we write a point of this product as $(\mathbf {x} ,\mathbf {y} )=(x_{1},\ldots ,x_{n},y_{1},\ldots ,y_{m}).$ Starting from the given function $f$ , our goal is to construct a function $g:\mathbb {R} ^{n}\to \mathbb {R} ^{m}$ whose graph $({\textbf {x}},g({\textbf {x}}))$ is precisely the set of all $({\textbf {x}},{\textbf {y}})$ such that $f({\textbf {x}},{\textbf {y}})={\textbf {0}}$ .

As noted above, this may not always be possible. We will therefore fix a point $({\textbf {a}},{\textbf {b}})=(a_{1},\dots ,a_{n},b_{1},\dots ,b_{m})$ which satisfies $f({\textbf {a}},{\textbf {b}})={\textbf {0}}$ , and we will ask for a $g$ that works near the point $({\textbf {a}},{\textbf {b}})$ . In other words, we want an open set $U\subset \mathbb {R} ^{n}$ containing ${\textbf {a}}$ , an open set $V\subset \mathbb {R} ^{m}$ containing ${\textbf {b}}$ , and a function $g:U\to V$ such that the graph of $g$ satisfies the relation $f={\textbf {0}}$ on $U\times V$ , and that no other points within $U\times V$ do so. In symbols,

$\{(\mathbf {x} ,g(\mathbf {x} ))\mid \mathbf {x} \in U\}=\{(\mathbf {x} ,\mathbf {y} )\in U\times V\mid f(\mathbf {x} ,\mathbf {y} )=\mathbf {0} \}.$

To state the implicit function theorem, we need the Jacobian matrix of $f$ , which is the matrix of the partial derivatives of $f$ . Abbreviating $(a_{1},\dots ,a_{n},b_{1},\dots ,b_{m})$ to $({\textbf {a}},{\textbf {b}})$ , the Jacobian matrix is

$(Df)(\mathbf {a} ,\mathbf {b} )=\left[{\begin{array}{ccc|ccc}{\frac {\partial f_{1}}{\partial x_{1}}}(\mathbf {a} ,\mathbf {b} )&\cdots &{\frac {\partial f_{1}}{\partial x_{n}}}(\mathbf {a} ,\mathbf {b} )&{\frac {\partial f_{1}}{\partial y_{1}}}(\mathbf {a} ,\mathbf {b} )&\cdots &{\frac {\partial f_{1}}{\partial y_{m}}}(\mathbf {a} ,\mathbf {b} )\\\vdots &\ddots &\vdots &\vdots &\ddots &\vdots \\{\frac {\partial f_{m}}{\partial x_{1}}}(\mathbf {a} ,\mathbf {b} )&\cdots &{\frac {\partial f_{m}}{\partial x_{n}}}(\mathbf {a} ,\mathbf {b} )&{\frac {\partial f_{m}}{\partial y_{1}}}(\mathbf {a} ,\mathbf {b} )&\cdots &{\frac {\partial f_{m}}{\partial y_{m}}}(\mathbf {a} ,\mathbf {b} )\end{array}}\right]=\left[{\begin{array}{c|c}X&Y\end{array}}\right]$

where $X$ is the matrix of partial derivatives in the variables $x_{i}$ and $Y$ is the matrix of partial derivatives in the variables $y_{j}$ . The implicit function theorem says that if $Y$ is an invertible matrix, then there are $U$ , $V$ , and $g$ as desired. Writing all the hypotheses together gives the following statement.

Statement of the theorem

Let $f:\mathbb {R} ^{n+m}\to \mathbb {R} ^{m}$ be a continuously differentiable function, and let $\mathbb {R} ^{n+m}$ have coordinates $({\textbf {x}},{\textbf {y}})$ . Fix a point $({\textbf {a}},{\textbf {b}})=(a_{1},\dots ,a_{n},b_{1},\dots ,b_{m})$ with $f({\textbf {a}},{\textbf {b}})=\mathbf {0}$ , where $\mathbf {0} \in \mathbb {R} ^{m}$ is the zero vector. If the Jacobian matrix with respect to $\mathbf {y}$ (the right-hand panel of the Jacobian matrix shown in the previous section),

$J_{f,\mathbf {y} }(\mathbf {a} ,\mathbf {b} )=\left[{\frac {\partial f_{i}}{\partial y_{j}}}(\mathbf {a} ,\mathbf {b} )\right],$

is invertible, then there exists an open set $U\subset \mathbb {R} ^{n}$ containing ${\textbf {a}}$ such that there exists a unique function $g:U\to \mathbb {R} ^{m}$ with $g(\mathbf {a} )=\mathbf {b}$ and $f(\mathbf {x} ,g(\mathbf {x} ))=\mathbf {0} ~{\text{for all}}~\mathbf {x} \in U$ . Moreover, $g$ is continuously differentiable and, denoting the left-hand panel of the Jacobian matrix shown in the previous section by

$J_{f,\mathbf {x} }(\mathbf {a} ,\mathbf {b} )=\left[{\frac {\partial f_{i}}{\partial x_{j}}}(\mathbf {a} ,\mathbf {b} )\right],$

the Jacobian matrix of partial derivatives of $g$ in $U$ is given by the matrix product:^[3]

$\left[{\frac {\partial g_{i}}{\partial x_{j}}}(\mathbf {x} )\right]_{m\times n}=-\left[J_{f,\mathbf {y} }(\mathbf {x} ,g(\mathbf {x} ))\right]_{m\times m}^{-1}\,\left[J_{f,\mathbf {x} }(\mathbf {x} ,g(\mathbf {x} ))\right]_{m\times n}$
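The derivative formula can be checked on a small system. The following sketch assumes an illustrative example with $n=1$ and $m=2$, namely $f_{1}=y_{1}+y_{2}-x$ and $f_{2}=y_{1}y_{2}-1$, which was chosen only because it can be solved explicitly for $x>2$; the function names are hypothetical. It evaluates $-J_{f,\mathbf {y} }^{-1}J_{f,\mathbf {x} }$ by hand-coded 2 × 2 inversion.

```python
import math


def jac_y(x, y1, y2):
    """Right-hand panel: partial derivatives of (f1, f2) in (y1, y2)."""
    return [[1.0, 1.0],
            [y2, y1]]


def jac_x(x, y1, y2):
    """Left-hand panel: partial derivatives of (f1, f2) in x (a 2 x 1 matrix)."""
    return [[-1.0],
            [0.0]]


def implicit_derivative(x, y1, y2):
    """Return [dy1/dx, dy2/dx] = -inverse(jac_y) * jac_x at the given point."""
    (a, b), (c, d) = jac_y(x, y1, y2)
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("jac_y is singular; the theorem does not apply here")
    inv = [[d / det, -b / det],
           [-c / det, a / det]]
    (p,), (q,) = jac_x(x, y1, y2)
    return [-(inv[0][0] * p + inv[0][1] * q),
            -(inv[1][0] * p + inv[1][1] * q)]


def g_explicit(x):
    """Explicit solution of the example system (roots of t^2 - x t + 1), x > 2."""
    r = math.sqrt(x * x - 4.0)
    return ((x + r) / 2.0, (x - r) / 2.0)


x0 = 2.5
y1_0, y2_0 = g_explicit(x0)          # the point (2.5, 2.0, 0.5) satisfies f = 0
dg = implicit_derivative(x0, y1_0, y2_0)
```

At $x=2$ the two roots coincide, the determinant $y_{1}-y_{2}$ of the right-hand panel vanishes, and the sketch raises an error instead of returning a derivative.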

Higher derivatives

If, moreover, $f$ is analytic or continuously differentiable $k$ times in a neighborhood of $({\textbf {a}},{\textbf {b}})$ , then one may choose $U$ in order that the same holds true for $g$ inside $U$ .^[4] In the analytic case, this is called the analytic implicit function theorem.

Proof for 2D case

Suppose $F:\mathbb {R} ^{2}\to \mathbb {R}$ is a continuously differentiable function defining a curve $F(\mathbf {r} )=F(x,y)=0$ . Let $(x_{0},y_{0})$ be a point on the curve. The statement of the theorem above can be rewritten for this simple case as follows:

Theorem — If $\left.{\frac {\partial F}{\partial y}}\right|_{(x_{0},y_{0})}\neq 0$ then in a neighbourhood of the point $(x_{0},y_{0})$ we can write $y=f(x)$ , where $f$ is a real function.

Proof. Since $F$ is differentiable we write the differential of $F$ through partial derivatives: $\mathrm {d} F=\operatorname {grad} F\cdot \mathrm {d} \mathbf {r} ={\frac {\partial F}{\partial x}}\mathrm {d} x+{\frac {\partial F}{\partial y}}\mathrm {d} y.$

Since we are restricted to movement on the curve $F=0$ , we have $\mathrm {d} F=0$ . By assumption, ${\tfrac {\partial F}{\partial y}}\neq 0$ in a neighbourhood of the point $(x_{0},y_{0})$ (since ${\tfrac {\partial F}{\partial y}}$ is continuous at $(x_{0},y_{0})$ and $\left.{\tfrac {\partial F}{\partial y}}\right|_{(x_{0},y_{0})}\neq 0$ ). Therefore we have a first-order ordinary differential equation: $\partial _{x}F\,\mathrm {d} x+\partial _{y}F\,\mathrm {d} y=0,\quad y(x_{0})=y_{0}.$

Now we are looking for a solution to this ODE in an open interval around $x_{0}$ on which, at every point, $\partial _{y}F\neq 0$ . Since $F$ is continuously differentiable, and by the assumption, we have $|\partial _{x}F|<\infty ,|\partial _{y}F|<\infty ,\partial _{y}F\neq 0.$

From this we know that ${\tfrac {\partial _{x}F}{\partial _{y}F}}$ is continuous and bounded on both ends. From here we know that $-{\tfrac {\partial _{x}F}{\partial _{y}F}}$ is Lipschitz continuous in both $x$ and $y$ . Therefore, by the Cauchy–Lipschitz theorem, there exists a unique $y(x)$ that is the solution to the given ODE with the initial condition $y(x_{0})=y_{0}$ . Q.E.D.
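The ODE used in the proof can also be integrated numerically. The sketch below assumes the unit circle $F(x,y)=x^{2}+y^{2}-1$ as the curve, starts at $(x_{0},y_{0})=(0,1)$, and uses the classical fourth-order Runge–Kutta scheme; the function names and the step count are illustrative choices.

```python
def slope(x, y):
    """Right-hand side dy/dx = -F_x / F_y for F(x, y) = x^2 + y^2 - 1."""
    return -(2.0 * x) / (2.0 * y)


def integrate_curve(x0, y0, x_end, steps=1000):
    """Integrate dy/dx = slope(x, y) from x0 to x_end with classical RK4.

    Assumes F_y stays non-zero along the path, as in the proof.
    """
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + h / 2.0, y + h * k1 / 2.0)
        k3 = slope(x + h / 2.0, y + h * k2 / 2.0)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y


# Follow the curve from (0, 1) to x = 0.5; the result approximates sqrt(1 - 0.25).
y_end = integrate_curve(0.0, 1.0, 0.5)
```

The integration cannot be continued through $x=1$, where $\partial _{y}F=0$ and the slope is undefined.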

The circle example

Let us go back to the example of the unit circle. In this case n = m = 1 and $f(x,y)=x^{2}+y^{2}-1$ . The matrix of partial derivatives is just a 1 × 2 matrix, given by $(Df)(a,b)={\begin{bmatrix}{\dfrac {\partial f}{\partial x}}(a,b)&{\dfrac {\partial f}{\partial y}}(a,b)\end{bmatrix}}={\begin{bmatrix}2a&2b\end{bmatrix}}$

Thus, here, the $Y$ in the statement of the theorem is just the number $2b$ ; the linear map defined by it is invertible if and only if $b\neq 0$ . By the implicit function theorem we see that we can locally write the circle in the form $y=g(x)$ for all points where $y\neq 0$ . For $(\pm 1,0)$ we run into trouble, as noted before. The implicit function theorem may still be applied to these two points, by writing $x$ as a function of $y$ , that is, $x=h(y)$ ; now the graph of the function will be $\left(h(y),y\right)$ , since where $b=0$ we have $a=\pm 1$ , so that $\partial f/\partial x=2a\neq 0$ and the conditions to locally express the function in this form are satisfied.

The implicit derivative of y with respect to x, and that of x with respect to y, can be found by totally differentiating the implicit function $x^{2}+y^{2}-1$ and equating to 0: $2x\,dx+2y\,dy=0,$ giving ${\frac {dy}{dx}}=-{\frac {x}{y}}$ and ${\frac {dx}{dy}}=-{\frac {y}{x}}.$

Application: change of coordinates

Suppose we have an $m$ -dimensional space, parametrised by a set of coordinates $(x_{1},\ldots ,x_{m})$ . We can introduce a new coordinate system $(x'_{1},\ldots ,x'_{m})$ by supplying $m$ functions $h_{1},\ldots ,h_{m}$ , each being continuously differentiable. These functions allow us to calculate the new coordinates $(x'_{1},\ldots ,x'_{m})$ of a point, given the point's old coordinates $(x_{1},\ldots ,x_{m})$ , using $x'_{1}=h_{1}(x_{1},\ldots ,x_{m}),\ldots ,x'_{m}=h_{m}(x_{1},\ldots ,x_{m})$ . One might want to verify if the opposite is possible: given coordinates $(x'_{1},\ldots ,x'_{m})$ , can we 'go back' and calculate the same point's original coordinates $(x_{1},\ldots ,x_{m})$ ? The implicit function theorem will provide an answer to this question. The (new and old) coordinates $(x'_{1},\ldots ,x'_{m},x_{1},\ldots ,x_{m})$ are related by $f=0$ , with

$f(x'_{1},\ldots ,x'_{m},x_{1},\ldots ,x_{m})=(h_{1}(x_{1},\ldots ,x_{m})-x'_{1},\ldots ,h_{m}(x_{1},\ldots ,x_{m})-x'_{m}).$

Now the Jacobian matrix of $f$ at a certain point $(a,b)$ , where $a=(x'_{1},\ldots ,x'_{m}),b=(x_{1},\ldots ,x_{m})$ , is given by

$(Df)(a,b)=\left[{\begin{matrix}-1&\cdots &0\\\vdots &\ddots &\vdots \\0&\cdots &-1\end{matrix}}\left|{\begin{matrix}{\frac {\partial h_{1}}{\partial x_{1}}}(b)&\cdots &{\frac {\partial h_{1}}{\partial x_{m}}}(b)\\\vdots &\ddots &\vdots \\{\frac {\partial h_{m}}{\partial x_{1}}}(b)&\cdots &{\frac {\partial h_{m}}{\partial x_{m}}}(b)\\\end{matrix}}\right.\right]=[-I_{m}|J],$

where $I_{m}$ denotes the $m\times m$ identity matrix, and $J$ is the $m\times m$ matrix of partial derivatives, evaluated at $(a,b)$ . (In the above, these blocks were denoted by $X$ and $Y$ . As it happens, in this particular application of the theorem, neither matrix depends on $a$ .) The implicit function theorem now states that we can locally express $(x_{1},\ldots ,x_{m})$ as a function of $(x'_{1},\ldots ,x'_{m})$ if $J$ is invertible. Demanding that $J$ is invertible is equivalent to $\det J\neq 0$ ; thus we see that we can go back from the primed to the unprimed coordinates if the determinant of the Jacobian $J$ is non-zero. This statement is also known as the inverse function theorem.

Example: polar coordinates

As a simple application of the above, consider the plane, parametrised by polar coordinates $(R,\theta )$ . We can go to a new coordinate system (Cartesian coordinates) by defining functions $x(R,\theta )=R\cos \theta$ and $y(R,\theta )=R\sin \theta$ . This makes it possible, given any point $(R,\theta )$ , to find corresponding Cartesian coordinates $(x,y)$ . When can we go back and convert Cartesian into polar coordinates? By the previous example, it is sufficient to have $\det J\neq 0$ , with $J={\begin{bmatrix}{\frac {\partial x(R,\theta )}{\partial R}}&{\frac {\partial x(R,\theta )}{\partial \theta }}\\{\frac {\partial y(R,\theta )}{\partial R}}&{\frac {\partial y(R,\theta )}{\partial \theta }}\\\end{bmatrix}}={\begin{bmatrix}\cos \theta &-R\sin \theta \\\sin \theta &R\cos \theta \end{bmatrix}}.$ Since $\det J=R$ , conversion back to polar coordinates is possible if $R\neq 0$ . So it remains to check the case $R=0$ . It is easy to see that in case $R=0$ , our coordinate transformation is not invertible: at the origin, the value of $\theta$ is not well-defined.
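A minimal sketch of this example follows, assuming the standard-library functions `math.hypot` and `math.atan2` for the return trip. The function names are illustrative, and choosing the branch of $\theta$ in $(-\pi ,\pi ]$ is an assumption made here, not something the theorem prescribes.

```python
import math


def to_cartesian(r, theta):
    """Forward map (R, theta) -> (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


def jacobian_det(r, theta):
    """Determinant of the matrix J of partial derivatives; equals R analytically."""
    a, b = math.cos(theta), -r * math.sin(theta)
    c, d = math.sin(theta), r * math.cos(theta)
    return a * d - b * c


def to_polar(x, y):
    """Local inverse (x, y) -> (R, theta), with theta taken in (-pi, pi]."""
    r = math.hypot(x, y)
    if r == 0.0:
        # det J = 0 here: theta is not determined at the origin.
        raise ValueError("polar angle is undefined at the origin")
    return r, math.atan2(y, x)


det = jacobian_det(2.0, 0.7)
r_back, theta_back = to_polar(*to_cartesian(2.0, 0.7))
```

The round trip recovers $(R,\theta )$ only for angles inside the chosen branch, which reflects that the inverse supplied by the theorem is local rather than global.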

Generalizations

Banach space version

Based on the inverse function theorem in Banach spaces, it is possible to extend the implicit function theorem to Banach-space-valued mappings.^[5]^[6]

Let X, Y, Z be Banach spaces. Let the mapping $f : X \times Y \to Z$ be continuously Fréchet differentiable. If $(x_{0},y_{0})\in X\times Y$ , $f(x_{0},y_{0})=0$ , and $y\mapsto Df(x_{0},y_{0})(0,y)$ is a Banach space isomorphism from Y onto Z, then there exist neighbourhoods U of x₀ and V of y₀ and a Fréchet differentiable function g : U → V such that f(x, g(x)) = 0 and f(x, y) = 0 if and only if y = g(x), for all $(x,y)\in U\times V$ .

Implicit functions from non-differentiable functions

Various forms of the implicit function theorem exist for the case when the function f is not differentiable. It is standard that local strict monotonicity suffices in one dimension.^[7] The following more general form was proven by Kumagai based on an observation by Jittorntrum.^[8]^[9]

Consider a continuous function $f:\mathbb {R} ^{n}\times \mathbb {R} ^{m}\to \mathbb {R} ^{n}$ such that $f(x_{0},y_{0})=0$ . If there exist open neighbourhoods $A\subset \mathbb {R} ^{n}$ and $B\subset \mathbb {R} ^{m}$ of x₀ and y₀, respectively, such that, for all y in B, $f(\cdot ,y):A\to \mathbb {R} ^{n}$ is locally one-to-one, then there exist open neighbourhoods $A_{0}\subset \mathbb {R} ^{n}$ and $B_{0}\subset \mathbb {R} ^{m}$ of x₀ and y₀, such that, for all $y\in B_{0}$ , the equation f(x, y) = 0 has a unique solution $x=g(y)\in A_{0},$ where g is a continuous function from B₀ into A₀.
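For a concrete one-dimensional instance ($n=m=1$), the sketch below assumes the illustrative function $f(x,y)=2x+|x|-y$, which is continuous and strictly increasing in $x$ but not differentiable at $x=0$. The solution $x=g(y)$ is found by bisection; the bracket, the tolerance and the function names are assumptions made for this example.

```python
def f(x, y):
    """Continuous, strictly increasing in x, not differentiable at x = 0."""
    return 2.0 * x + abs(x) - y


def solve_x(y, lo=-10.0, hi=10.0, tol=1e-12):
    """Find the unique x in [lo, hi] with f(x, y) = 0 by bisection.

    Relies only on continuity and strict monotonicity in x, not on derivatives.
    """
    if f(lo, y) > 0.0 or f(hi, y) < 0.0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, y) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Here g(y) = y / 3 for y >= 0 and g(y) = y for y < 0:
# continuous, with a kink at y = 0.
x_pos = solve_x(3.0)
x_neg = solve_x(-2.0)
```

The resulting $g$ is continuous but not differentiable at $y=0$, which is consistent with the statement above promising only continuity.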

Collapsing manifolds

Perelman’s collapsing theorem for 3-manifolds, the capstone of his proof of Thurston's geometrization conjecture, can be understood as an extension of the implicit function theorem.^[10]

Notes

^ Also called Dini's theorem by the Pisan school in Italy. In the English-language literature, Dini's theorem is a different theorem in mathematical analysis.

References

^ Chiang, Alpha C. (1984). Fundamental Methods of Mathematical Economics (3rd ed.). McGraw-Hill. pp. 204–206. ISBN 0-07-010813-7.
^ Krantz, Steven; Parks, Harold (2003). The Implicit Function Theorem. Modern Birkhäuser Classics. Birkhäuser. ISBN 0-8176-4285-4.
^ de Oliveira, Oswaldo (2013). "The Implicit and Inverse Function Theorems: Easy Proofs". Real Anal. Exchange. 39 (1): 214–216. arXiv:1212.2066. doi:10.14321/realanalexch.39.1.0207. S2CID 118792515.
^ Fritzsche, K.; Grauert, H. (2002). From Holomorphic Functions to Complex Manifolds. Springer. p. 34. ISBN 9780387953953.
^ Lang, Serge (1999). Fundamentals of Differential Geometry. Graduate Texts in Mathematics. New York: Springer. pp. 15–21. ISBN 0-387-98593-X.
^ Edwards, Charles Henry (1994) [1973]. Advanced Calculus of Several Variables. Mineola, New York: Dover Publications. pp. 417–418. ISBN 0-486-68336-2.
^ Kudryavtsev, Lev Dmitrievich (2001) [1994], "Implicit function", Encyclopedia of Mathematics, EMS Press
^ Jittorntrum, K. (1978). "An Implicit Function Theorem". Journal of Optimization Theory and Applications. 25 (4): 575–577. doi:10.1007/BF00933522. S2CID 121647783.
^ Kumagai, S. (1980). "An implicit function theorem: Comment". Journal of Optimization Theory and Applications. 31 (2): 285–288. doi:10.1007/BF00934117. S2CID 119867925.
^ Cao, Jianguo; Ge, Jian (2011). "A simple proof of Perelman's collapsing theorem for 3-manifolds". J. Geom. Anal. 21 (4): 807–869. arXiv:1003.2215. doi:10.1007/s12220-010-9169-5. S2CID 514106.