Concave and Convex Functions
1 Basic Definitions
Washington University
March 27, 2018
(b) f is strictly concave iff for any a, b ∈ C and any θ ∈ (0, 1), the above
inequality is strict.
(b) f is strictly convex iff for any a, b ∈ C and any θ ∈ (0, 1), the above
inequality is strict.
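The defining inequality, f(θa + (1 − θ)b) ≥ θf(a) + (1 − θ)f(b), can be spot-checked numerically. The following is a minimal sketch for N = 1; the helper name `satisfies_concavity` and the sample grid are illustrative assumptions, not part of the notes. A grid check can refute concavity but cannot prove it.

```python
import math
from itertools import product

def satisfies_concavity(f, points, thetas, strict=False):
    """Check f(θa + (1-θ)b) >= θ f(a) + (1-θ) f(b) on sample points (N = 1).

    With strict=True the inequality must be strict whenever a != b.
    """
    for a, b, theta in product(points, points, thetas):
        if a == b:
            continue
        lhs = f(theta * a + (1 - theta) * b)
        rhs = theta * f(a) + (1 - theta) * f(b)
        if lhs < rhs or (strict and lhs == rhs):
            return False
    return True

# Dyadic sample points and weights, so the affine case is exact in floating point.
points = [0.25 * i for i in range(1, 9)]   # 0.25, 0.5, ..., 2.0
thetas = [0.25, 0.5, 0.75]

def affine(x):
    return 2 * x + 1

def square(x):
    return x * x
```

On this grid the square root passes both the weak and the strict test, the affine function passes only the weak test, and the square fails.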
f is both concave and convex iff for any a, b ∈ R^N and any θ ∈ (0, 1), f(θa +
(1 − θ)b) = θf(a) + (1 − θ)f(b). A function f is affine iff there is a 1 × N matrix
A and a number y∗ ∈ R such that for all x ∈ C, f(x) = Ax + y∗. f is linear iff it is
affine with y∗ = 0.
Proof.
2. ⇐. Let y ∗ = f (0) and let g(x) = f (x) − y ∗ , so that g(0) = 0. Since f is both
concave and convex, so is g.
This work is licensed under the Creative Commons Attribution-NonCommercial-
• Claim: for any a ∈ RN , for any γ ≥ 0, g(γa) = γg(a).
The claim is trivially true for γ equal to either 0 or 1. Suppose γ ∈ (0, 1).
Then g(γa) = g(γa + (1 − γ)0) = γg(a) + (1 − γ)g(0) = γg(a).
On the other hand, if γ > 1, then 1/γ ∈ (0, 1) and hence g(a) =
g((1/γ)γa + (1 − 1/γ)0) = (1/γ)g(γa) + (1 − 1/γ)g(0) = (1/γ)g(γa).
Multiplying through by γ gives γg(a) = g(γa).
• Claim: for any a, b ∈ RN , g(a + b) = g(a) + g(b).
g(a + b) = g((1/2)(2a) + (1/2)(2b)) = (1/2)g(2a) + (1/2)g(2b) = g(a) +
g(b), where the last equality comes from the previous claim.
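As a sanity check on the two claims, the sketch below verifies homogeneity and additivity of g = f − f(0) for one affine f, using exact rational arithmetic. This only illustrates what the claims assert; it is not the proof, which runs in the other direction. The matrix `A`, the constant `y_star`, and the helper names are hypothetical.

```python
from fractions import Fraction as F

A = [F(2), F(-3)]   # hypothetical 1 x N matrix with N = 2
y_star = F(5)       # hypothetical constant term

def f(x):
    """Affine function f(x) = Ax + y*."""
    return sum(a_i * x_i for a_i, x_i in zip(A, x)) + y_star

def g(x):
    """g(x) = f(x) - f(0), so that g(0) = 0."""
    return f(x) - f([F(0)] * len(x))

def scale(c, x):
    return [c * x_i for x_i in x]

def add(x, y):
    return [x_i + y_i for x_i, y_i in zip(x, y)]

a = [F(1, 2), F(4)]
b = [F(-3), F(7, 5)]
gammas = [F(0), F(1, 3), F(1), F(7, 2)]

# First claim: g(γa) = γ g(a) for γ >= 0.
homogeneous = all(g(scale(gam, a)) == gam * g(a) for gam in gammas)

# Second claim, via the midpoint step g(a + b) = (1/2) g(2a) + (1/2) g(2b).
midpoint = F(1, 2) * g(scale(2, a)) + F(1, 2) * g(scale(2, b))
additive = g(add(a, b)) == midpoint == g(a) + g(b)
```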
For N = 1, the next result says that a function is concave iff, informally, its
slope is weakly decreasing. If the function is differentiable then the implication is
that the derivative is weakly decreasing.
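The informal "slope is weakly decreasing" statement can be illustrated by computing chord slopes over consecutive sample points. This is a sketch under the assumption that the slopes in question are chord slopes (f(x1) − f(x0))/(x1 − x0); the function names are illustrative.

```python
import math

def chord_slopes(f, xs):
    """Slopes of the chords joining consecutive sample points (xs increasing)."""
    return [(f(x1) - f(x0)) / (x1 - x0) for x0, x1 in zip(xs, xs[1:])]

def weakly_decreasing(values):
    return all(later <= earlier for earlier, later in zip(values, values[1:]))

xs = [0.5, 1.0, 2.0, 4.0, 8.0]

def affine(x):
    return 3 * x - 1

log_slopes = chord_slopes(math.log, xs)   # concave: slopes fall
exp_slopes = chord_slopes(math.exp, xs)   # convex: slopes rise
affine_slopes = chord_slopes(affine, xs)  # affine: slopes are constant
```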
Proof. Take any a, b, c ∈ C with a < b < c. Since b − a > 0 and c − b > 0, the first inequality
in (1) holds iff
Proof. Suppose that f is concave. I will show that hyp f is convex. Take any
z1, z2 ∈ hyp f and any θ ∈ [0, 1]. Then there are a, b ∈ C and y1, y2 ∈ R such
that z1 = (a, y1), z2 = (b, y2), with f(a) ≥ y1 and f(b) ≥ y2. By concavity of f,
f(θa + (1 − θ)b) ≥ θf(a) + (1 − θ)f(b). Since f(a) ≥ y1 and f(b) ≥ y2, it follows
that f(θa + (1 − θ)b) ≥ θy1 + (1 − θ)y2. The latter says that the point
θz1 + (1 − θ)z2 = (θa + (1 − θ)b, θy1 + (1 − θ)y2) lies in hyp f,
as was to be shown. The other directions are similar.
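The argument can be mirrored numerically: take sample points on or below the graph and check that their convex combinations stay in the hypograph. A minimal sketch for N = 1, with illustrative helper names; the sample values are dyadic so the arithmetic is exact.

```python
from itertools import product

def in_hypograph(f, x, y):
    """(x, y) is in hyp f iff y <= f(x)."""
    return y <= f(x)

def hypograph_closed_under_mixing(f, xs, drops, thetas):
    """Mix sample points (x, f(x) - d) of hyp f and test membership of each mix."""
    pts = [(x, f(x) - d) for x in xs for d in drops]
    for (a, y1), (b, y2), t in product(pts, pts, thetas):
        x = t * a + (1 - t) * b
        y = t * y1 + (1 - t) * y2
        if not in_hypograph(f, x, y):
            return False
    return True

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
drops = [0.0, 0.5, 2.0]        # vertical distance below the graph
thetas = [0.25, 0.5, 0.75]

def concave_example(x):
    return -x * x

def convex_example(x):
    return x * x
```

For the convex example the check fails, for instance at the midpoint of (−1, 1) and (1, 1), which is (0, 1) and lies above the graph.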
Footnote 2. "Hypo" means "under" and "epi" means "over." A hypodermic needle goes under your skin,
the top layer of which is your epidermis.
2 Concavity, Convexity, and Continuity.
Theorem 5. Let C ⊆ RN be non-empty, open and convex and let f : C → R be
either convex or concave. Then f is continuous.
Proof. Let f be concave. Consider first the case N = 1. Theorem 3 implies that
for any a, b, c ∈ C, with a < b < c, the graph of f is sandwiched between the graphs
of two lines through the point (b, f (b)), one line through the points (a, f (a)) and
(b, f (b)) and the other through the points (b, f (b)) and (c, f (c)). Explicitly, Theorem
3 implies that for all x ∈ [a, b],
If f (c) ≤ f (a), then the analog of (3) holds with f (b) − f (c) on the right-hand side.
Inequality (3) may be obvious from the “sandwich” characterization above, but
for the sake of completeness, here is a detailed argument. First, note that concavity,
together with f(a) ≤ f(c), implies that f(b) ≥ f(a). Next, the first inequality in (1) implies that for all
x ∈ [a, b],
\[ f(b) - f(x) \le \frac{f(b) - f(a)}{b - a}\,(b - x) \le f(b) - f(a), \tag{4} \]
where the second inequality in (4) follows since f (b)−f (a) ≥ 0 and since (b−x)/(b−
a) is non-negative and has a maximum value of 1 on [a, b]. Similarly, the second
inequality in (1) implies that
\[ -(f(b) - f(x)) \le -\frac{f(c) - f(b)}{c - b}\,(b - x) \le f(b) - f(a), \tag{5} \]
where the second inequality in (5) follows trivially if −(f (c) − f (b)) ≤ 0, since
f (b) − f (a) ≥ 0; if −(f (c) − f (b)) > 0, then the inequality follows since −f (c) ≤
−f (a) and since (b − x)/(b − a) is non-negative and has a maximum value of 1 on
[a, b]. Similar arguments apply if x ∈ [b, c]. Combining all this gives inequality
(3).
To complete the proof of continuity, take any x∗ ∈ C and consider the (hyper)
cube formed by the 2N vertices of the form x∗ + (1/t)en and x∗ − (1/t)en , where en
is the unit vector for coordinate n and where t ∈ {1, 2, . . . } is large enough that this
cube lies in C. Let vt be the vertex that minimizes f across the 2N vertices. For
any x in the cube, concavity implies f (vt ) ≤ f (x). Take any x in the cube. Then for
the line segment given by the intersection of the cube with the line through x and
x∗ , inequality (3) implies, since f (vt ) is less than or equal to the minimum value of
f along this line segment, and since x∗ lies in the center of this line segment.
Since there are only a finite number of coordinates, it is possible to find a
subsequence {v_{t_k}} lying on a single axis. Continuity in the 1-dimensional case, established
above, implies f(v_{t_k}) → f(x∗). Inequality (6) then implies continuity at x∗.
Finally, for convex f , −f is concave, hence −f is continuous, and f is continuous
iff −f is continuous.
For functions defined on non-open sets, continuity can fail at the boundary. In
particular, if the domain is a closed interval in R, then concave functions can jump
down at end points and convex functions can jump up.
Example 1. Let C = [0, 1] and define
\[ f(x) = \begin{cases} -x^2 & \text{if } x > 0, \\ -1 & \text{if } x = 0. \end{cases} \]
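Example 1 can be checked directly: the function below satisfies the concavity inequality on a grid in [0, 1] that includes the end point, yet its values near 0 approach 0 while f(0) = −1. This is a sketch; the grid and variable names are illustrative choices.

```python
def f(x):
    """Example 1 on C = [0, 1]: -x^2 for x > 0, with a downward jump to -1 at x = 0."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    return -x**2 if x > 0 else -1.0

grid = [i / 8 for i in range(9)]      # 0, 1/8, ..., 1 (exact in floating point)
thetas = [0.25, 0.5, 0.75]

# Concavity inequality on the grid, including pairs that involve the end point 0.
concave_on_grid = all(
    f(t * a + (1 - t) * b) >= t * f(a) + (1 - t) * f(b)
    for a in grid for b in grid for t in thetas
)

# Values along a sequence x -> 0 from the right tend to 0, not to f(0) = -1.
values_near_zero = [f(10.0 ** -k) for k in range(1, 7)]
```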
2. f is convex iff for any x∗ , x ∈ C
Proof. If f is concave then for any x, x∗ ∈ C, x ≠ x∗, and any θ ∈ (0, 1), f(θx +
(1 − θ)x∗) ≥ θf(x) + (1 − θ)f(x∗), or, dividing by θ and rearranging,
\[ f(x) - f(x^*) \le \frac{f(x^* + \theta(x - x^*)) - f(x^*)}{\theta}. \]
Taking the limit of the right-hand side as θ ↓ 0 and rearranging yields inequality
(7).
Conversely, consider any a, b ∈ C, take any θ ∈ (0, 1), and let x∗ = θa + (1 − θ)b.
Note that a − x∗ = −(1 − θ)(b − a) and b − x∗ = θ(b − a). Therefore, by inequality
(7),
Multiplying the first by θ > 0 and the second by 1 − θ > 0, and adding, yields
θf (a) + (1 − θ)f (b) ≤ f (x∗ ), as was to be shown.
The proofs for inequality (8) are analogous.
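The display for inequality (7) is not reproduced in this text. Assuming it has the standard form f(x) ≤ f(x∗) + ∇f(x∗) · (x − x∗), which is what the limit argument in the proof produces, the sketch below checks it exactly for one concave quadratic on R^2. The function, its gradient, and the name `tangent_gap` are illustrative assumptions.

```python
from itertools import product

def f(x):
    """A concave quadratic on R^2 (its Hessian [[-2, -1], [-1, -2]] is negative definite)."""
    x1, x2 = x
    return -(x1**2 + x1 * x2 + x2**2)

def grad_f(x):
    x1, x2 = x
    return (-2 * x1 - x2, -x1 - 2 * x2)

def tangent_gap(x_star, x):
    """f(x*) + grad f(x*) . (x - x*) - f(x); nonnegative when the tangent plane lies above f."""
    g = grad_f(x_star)
    d = (x[0] - x_star[0], x[1] - x_star[1])
    return f(x_star) + g[0] * d[0] + g[1] * d[1] - f(x)

# Integer grid, so every evaluation is exact.
grid = list(product(range(-3, 4), repeat=2))
gaps = [tangent_gap(x_star, x) for x_star in grid for x in grid]
```

For this quadratic the gap equals d1² + d1·d2 + d2² with d = x − x∗, which is zero only at x = x∗.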
If w > 0, this inequality will be violated for (x∗, y) for any y < 0 of sufficiently large
magnitude. By definition of hyp f, there will be such (x∗, y) ∈ hyp f. Therefore,
w ≤ 0. Moreover, if w = 0 (and v ≠ 0 since (v, w) ≠ 0), then inequality (9) will be
violated at any (x, f(x)) with x = x∗ − γv, with γ > 0 small enough that x ∈ C.
Therefore, w < 0.
Since w < 0, I can assume w = −1: inequality (9) holds for (v, w) iff it holds
for γ(v, w), for any γ > 0; take γ = 1/|w|. With (v, −1) as the supporting vector,
inequality (9) then implies, taking (x, y) = (x, f (x)) and rearranging,
If f is differentiable at x∗ , then ∇f (x∗ ) is the unique v for which (10) holds for all
x. If f is not differentiable at x∗ then there will be a continuum of vectors, called
subgradients, for which inequality (10) holds.
It is easy to verify that the set of subgradients is closed and convex. In the case
N = 1, a subgradient is just a number and the set of subgradients is particularly easy
to characterize. Explicitly, for any x∗ ∈ C, Theorem 3 implies that the left-hand
and right-hand derivatives at x∗ ,
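\[ \overline{m} = \lim_{x \uparrow x^*} \frac{f(x) - f(x^*)}{x - x^*}, \]
\[ \underline{m} = \lim_{x \downarrow x^*} \frac{f(x) - f(x^*)}{x - x^*}, \]
are well defined even if f is not differentiable at x∗. For concave f, $\underline{m} \le \overline{m}$. The set of subgradients at x∗
is $[\underline{m}, \overline{m}]$; if f is differentiable at x∗ then $\underline{m} = \overline{m} = Df(x^*)$.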
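For a concrete N = 1 case, take f(x) = −|x| at x∗ = 0. The sketch below approximates the one-sided derivatives by difference quotients and tests candidate subgradients against the inequality f(x) ≤ f(x∗) + m(x − x∗), which is the assumed form of inequality (10) since its display is not reproduced here. The helper names are illustrative.

```python
def f(x):
    """Concave and not differentiable at 0."""
    return -abs(x)

def one_sided_slope(f, x_star, h):
    """Difference quotient; h < 0 approaches from the left, h > 0 from the right."""
    return (f(x_star + h) - f(x_star)) / h

def is_subgradient(f, x_star, m, xs):
    """Test f(x) <= f(x*) + m (x - x*) at the sample points xs."""
    return all(f(x) <= f(x_star) + m * (x - x_star) for x in xs)

left_slope = one_sided_slope(f, 0.0, -1e-6)    # left-hand derivative at 0
right_slope = one_sided_slope(f, 0.0, 1e-6)    # right-hand derivative at 0

xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
inside = [m for m in (-1.0, -0.5, 0.0, 0.5, 1.0) if is_subgradient(f, 0.0, m, xs)]
outside = [m for m in (-1.5, 1.5) if is_subgradient(f, 0.0, m, xs)]
```

Here the left-hand slope is 1 and the right-hand slope is −1, so every m in [−1, 1] passes and values outside that interval fail.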
Subgradients play an important role in some parts of economic theory, but I will
not be pursuing them here.
For differentiable functions on R (N = 1), Theorem 3 implies that a function
is concave iff its derivative is weakly decreasing. For twice differentiable functions,
the derivative is weakly decreasing iff the second derivative is weakly negative
everywhere. The following result, Theorem 7, records this fact and generalizes it to
N > 1. Recall that a symmetric N × N matrix A is negative semi-definite iff, for
any v ∈ R^N, v′Av ≤ 0. The matrix is negative definite iff, for any v ∈ R^N, v ≠ 0,
v′Av < 0. The definitions of positive semi-definite and positive definite are analogous.
As is standard practice, I write D²f(x) for the Hessian, D²f(x) = D(∇f)(x),
which is the N × N matrix of second order partial derivatives. By Young's Theorem,
if f is C² then D²f(x) is symmetric.
Theorem 7. Let C ⊆ R^N be non-empty, open and convex and let f : C → R be C².
1. (a) D²f(x) is negative semi-definite for every x ∈ C iff f is concave.
(b) If D²f(x) is negative definite for every x ∈ C then f is strictly concave.
It remains to prove the ⇐ direction of 1(a) and 2(a). Consider the ⇐ direction
of 1(a). I argue by contraposition. Suppose that D²f(x∗) > 0 for some x∗ ∈ C.
Since f is C², D²f(x) > 0 for every x in some open interval containing x∗.
Then Theorem 3 implies that f is not concave (it is, in fact, strictly convex)
for x in this interval. The proof of the ⇐ direction of 2(a) is similar.
2. N > 1. Then D²f(x) is a symmetric matrix. I will show 1(b). The other
cases are similar.
Suppose, therefore, that D²f(x) is negative definite for all x ∈ C. Consider
any a, b ∈ C, b ≠ a, and any θ ∈ (0, 1). Let x_θ = θa + (1 − θ)b. To show 1(b),
I need to show that f(x_θ) > θf(a) + (1 − θ)f(b).
Let g(θ) = b + θ(a − b), let h(θ) = f(g(θ)) = f(b + θ(a − b)), and let v = a − b.
By the N = 1 step above, a sufficient condition for the strict concavity of h is
that D²h(θ) < 0. The interpretation is that D²h(θ) is the second derivative
of f, evaluated at x_θ, in the direction v = a − b.
By the Chain Rule, for any θ ∈ (0, 1), Dh(θ) = Df(g(θ))Dg(θ) = Df(g(θ))v =
∇f(g(θ)) · v. Also by the Chain Rule (and the symmetry of D²f),
Hence,
D²h(θ) = v′ D²f(x_θ) v.
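The identity D²h(θ) = v′ D²f(x_θ) v can be checked exactly for a quadratic f, because the central second difference of a quadratic equals its second derivative. The sketch below uses rational arithmetic; the particular f, the points a and b, and the helper names are illustrative assumptions.

```python
from fractions import Fraction as F

def f(x):
    """Quadratic on R^2 with constant Hessian [[-2, -1], [-1, -2]]."""
    x1, x2 = x
    return -(x1**2 + x1 * x2 + x2**2)

HESSIAN = [[F(-2), F(-1)], [F(-1), F(-2)]]

def quadratic_form(M, v):
    """v' M v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def h(theta, a, b):
    """h(θ) = f(b + θ(a - b)), the restriction of f to the line through a and b."""
    return f([b_i + theta * (a_i - b_i) for a_i, b_i in zip(a, b)])

def second_difference(theta, a, b, step=F(1, 10)):
    """Central second difference of h; exact for a quadratic h."""
    return (h(theta + step, a, b) - 2 * h(theta, a, b) + h(theta - step, a, b)) / step**2

a = [F(1), F(2)]
b = [F(3), F(-1)]
v = [a_i - b_i for a_i, b_i in zip(a, b)]   # v = a - b = (-2, 3)

directional_curvature = quadratic_form(HESSIAN, v)
```

Since the Hessian here is negative definite, the directional curvature is strictly negative, which is the condition used for the strict concavity of h.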
Say that f is differentiably strictly concave iff D²f(x) is negative definite for
every x. If N = 1 then f is differentiably strictly concave iff D²f(x) < 0 for every
x. The definition of differentiable strict convexity is analogous.
Example 2. Consider f : R → R, f(x) = −x^4. f is strictly concave but fails
differentiable strict concavity since D²f(0) = 0.
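Example 2 can be verified with exact arithmetic: the strict concavity inequality holds at every pair of distinct grid points, while the second derivative −12x² vanishes at 0. A minimal sketch; the grid is an arbitrary choice.

```python
from fractions import Fraction as F

def f(x):
    return -x**4

def d2f(x):
    """Second derivative of -x^4."""
    return -12 * x**2

points = [F(i, 2) for i in range(-4, 5)]       # -2, -3/2, ..., 2
thetas = [F(1, 4), F(1, 2), F(3, 4)]

strictly_concave_on_grid = all(
    f(t * a + (1 - t) * b) > t * f(a) + (1 - t) * f(b)
    for a in points for b in points if a != b
    for t in thetas
)
```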
Remark 3. To reiterate a point made in the proof of Theorem 7, if f is C² and D²f
is negative definite at the point x∗ then D²f(x) is negative definite for every x in
some open ball around x∗, and hence f is strictly concave in some open ball around
x∗. In words, if f is differentiably strictly concave at a point then it is differentiably
strictly concave near that point.
If, on the other hand, D²f is merely negative semi-definite at x∗ then we cannot
infer anything about the concavity or convexity of f near x∗. For example, if
f : R → R is given by f(x) = x^4 then D²f(0) = 0, which is negative semi-definite,
but f is not concave; it is, in fact, strictly convex.
4 Facts about Concave and Convex Functions.
Recall that f : RN → R is affine iff it is of the form f (x) = Ax + b for some 1 × N
matrix A and some point b ∈ R. Geometrically, the graph of a real-valued affine
function is a plane (a line, if the domain is R). An important elementary fact is that
real-valued affine functions are both concave and convex. This is consistent with
the fact that the second derivative of any affine function is the zero matrix.
Showing that other functions are concave or convex typically requires work. For
N = 1, Theorem 7 can be used to show that many standard functions are concave,
strictly concave, and so on.
Example 3. All of the following claims can be verified with a simple calculation.
1. e^x is strictly convex on R,
2. ln(x) is strictly concave on R++,
3. 1/x is strictly convex on R++ and strictly concave on R−−,
4. x^t, where t is an integer greater than 1, is strictly convex on R+. On R−, x^t
is strictly convex for t even and strictly concave for t odd.
5. x^α, where α is a real number in (0, 1), is strictly concave on R+.
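The "simple calculation" is a sign check on the second derivative. The sketch below approximates second derivatives by central differences at a few sample points and reports a common sign; it is only a numerical spot check with illustrative helper names and sample points, and it stays away from 0, where some of these functions are undefined or have zero curvature.

```python
import math

def second_difference(f, x, h=1e-3):
    """Central-difference approximation to the second derivative of f at x."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def common_sign(f, xs):
    """+1 if the approximation is positive at every x in xs, -1 if negative at every x, else 0."""
    signs = {1 if second_difference(f, x) > 0 else -1 for x in xs}
    return signs.pop() if len(signs) == 1 else 0

positive = [0.5, 1.0, 2.0]
negative = [-2.0, -1.0, -0.5]

def reciprocal(x):
    return 1.0 / x

def cube(x):
    return x**3

def fourth_power(x):
    return x**4

def square_root(x):
    return x**0.5

results = {
    "exp on R": common_sign(math.exp, positive + negative),
    "ln on R++": common_sign(math.log, positive),
    "1/x on R++": common_sign(reciprocal, positive),
    "1/x on R--": common_sign(reciprocal, negative),
    "x^3 on R+": common_sign(cube, positive),
    "x^3 on R-": common_sign(cube, negative),
    "x^4 on R-": common_sign(fourth_power, negative),
    "x^0.5 on R+": common_sign(square_root, positive),
}
```

A result of +1 is consistent with strict convexity on the sampled region and −1 with strict concavity.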
One can often verify the concavity of other, more complicated functions by
decomposing the functions into simpler pieces. The following results help do this.
Theorem 8. Let C ⊆ RN be non-empty and convex. Let f : C → R be concave.
Let D be any interval containing f (C) and let g : D → R be concave and weakly
increasing. Then h : C → R defined by h(x) = g(f (x)) is concave. Moreover, if f
is strictly concave and g is strictly increasing then h is strictly concave. Analogous
claims hold for f convex (again with g increasing).
Proof. Consider any a, b ∈ C and any θ ∈ [0, 1]. Let xθ = θa + (1 − θ)b. Since f is
concave,
f (xθ ) ≥ θf (a) + (1 − θ)f (b).
Then
where the first inequality comes from the fact that f is concave and g is weakly
increasing and the second inequality comes from the fact that g is concave. The other
parts of the proof are essentially identical.
Example 4. Let the domain be R++. Consider h(x) = e^{1/x}. Let f(x) = 1/x and let
g(y) = e^y. Then h(x) = g(f(x)). Function f is strictly convex and g is (strictly)
convex and strictly increasing. Therefore, by Theorem 8, h is strictly convex.
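The conclusion of Example 4 can be spot-checked on strictly positive sample points, where 1/x is defined and convex. The sketch below computes the gap between the chord and the function for the composition; the name `convexity_gap` and the grid are illustrative.

```python
import math

def f(x):
    return 1.0 / x

def g(y):
    return math.exp(y)

def h(x):
    """Composition h = g o f, that is, e^(1/x)."""
    return g(f(x))

def convexity_gap(fn, a, b, theta):
    """θ fn(a) + (1-θ) fn(b) - fn(θa + (1-θ)b); positive when the chord lies strictly above."""
    return theta * fn(a) + (1 - theta) * fn(b) - fn(theta * a + (1 - theta) * b)

points = [0.25, 0.5, 1.0, 2.0, 4.0]    # strictly positive sample points
thetas = [0.25, 0.5, 0.75]

gaps = [
    convexity_gap(h, a, b, t)
    for a in points for b in points if a != b
    for t in thetas
]
```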
It is important in Theorem 8 that g be increasing.
Example 5. Let the domain be R. Consider h(x) = e^{−x²}. Up to the factor
1/√(2π) and a rescaling of x, this is the standard normal density. Let f(x) = e^{x²} and let
g(y) = 1/y. Then h(x) = g(f(x)). Now, f is convex on R++ (indeed, on R) and g
is also convex on R++. The function h is not, however, convex. While it is strictly
convex for |x| sufficiently large, for x near zero it is strictly concave. This does not
contradict Theorem 8 because g here is decreasing.
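The sign change in Example 5 is visible in the second derivative of e^{−x²}, which is (4x² − 2)e^{−x²}: negative for |x| < 1/√2 and positive beyond. The sketch below evaluates it and also shows the midpoint inequality for convexity failing near zero; the function names are illustrative.

```python
import math

def h(x):
    return math.exp(-x * x)

def d2h(x):
    """Second derivative of exp(-x^2): (4x^2 - 2) exp(-x^2)."""
    return (4 * x * x - 2) * math.exp(-x * x)

inflection = 1 / math.sqrt(2)

# Convexity would require h(0) <= (h(-0.5) + h(0.5)) / 2; the opposite holds here.
midpoint_value = h(0.0)
chord_value = 0.5 * (h(-0.5) + h(0.5))
```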
Theorem 9. Let C ⊆ R be an interval and let f : C → R be concave and strictly
positive for all x ∈ C. Then h : C → R defined by h(x) = 1/f (x) is convex.
Proof. Follows from Theorem 8, the fact that if f is concave then −f is convex,
and the fact that the function g(x) = −1/x is convex and increasing on R− .
Since y, ĥ, and θ were arbitrary, this implies that f⁻¹ is convex. The other results
are similar. Note in particular that if f is concave but strictly decreasing then the
last inequality above flips, and f⁻¹ is concave.
Example 6. Let C = (0, ∞) and let f(x) = ln(x). This function is (strictly) concave
and strictly increasing. Its inverse is f⁻¹(y) = e^y, which is (strictly) convex and
strictly increasing.
Example 7. Let C = (−∞, 0) and let f(x) = ln(−x). This function is (strictly)
concave and strictly decreasing. Its inverse is f⁻¹(y) = −e^y, which is (strictly)
concave and strictly decreasing.
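Examples 6 and 7 can be spot-checked together: confirm that each stated inverse really inverts f, then look at the sign of the concavity gap on sample points. A minimal sketch with illustrative names (`f6`, `f7`, `concavity_gap`) and arbitrary grids.

```python
import math

def concavity_gap(fn, a, b, theta):
    """fn(θa + (1-θ)b) - [θ fn(a) + (1-θ) fn(b)]; positive when the chord lies strictly below."""
    return fn(theta * a + (1 - theta) * b) - (theta * fn(a) + (1 - theta) * fn(b))

def gaps(fn, points, thetas=(0.25, 0.5, 0.75)):
    return [concavity_gap(fn, a, b, t) for a in points for b in points if a != b for t in thetas]

# Example 6: f(x) = ln(x) on (0, ∞), inverse e^y.
f6, f6_inv = math.log, math.exp

# Example 7: f(x) = ln(-x) on (-∞, 0), inverse -e^y.
def f7(x):
    return math.log(-x)

def f7_inv(y):
    return -math.exp(y)

positive = [0.5, 1.0, 2.0, 4.0]
negative = [-x for x in positive]
ys = [-1.0, 0.0, 1.0, 2.0]

round_trip_6 = max(abs(f6(f6_inv(y)) - y) for y in ys)
round_trip_7 = max(abs(f7(f7_inv(y)) - y) for y in ys)
```

Positive gaps indicate strict concavity on the sample and negative gaps strict convexity, matching the pattern stated in the two examples.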
Proof. The claims for concavity and strict concavity are almost immediate. For
differentiable strict concavity, note that
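\[
D^2 h(x^*) =
\begin{bmatrix}
D^2 f_1(x^*) & 0 & \cdots & 0 & 0 \\
0 & D^2 f_2(x^*) & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & D^2 f_{N-1}(x^*) & 0 \\
0 & 0 & \cdots & 0 & D^2 f_N(x^*)
\end{bmatrix}.
\]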
This matrix is negative definite iff the diagonal terms are all strictly negative, which
is equivalent to saying that the fn are all differentiably strictly concave.
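For a diagonal matrix the quadratic form is v′Dv = Σ d_n v_n², which makes the equivalence easy to see. The sketch below evaluates the form on an integer grid of directions for two hypothetical diagonals, one strictly negative and one with a zero entry.

```python
from itertools import product

def diagonal_quadratic_form(diag, v):
    """v' D v for the diagonal matrix D with diagonal entries diag."""
    return sum(d * v_n * v_n for d, v_n in zip(diag, v))

def negative_on_grid(diag, radius=2):
    """Check v' D v < 0 for every nonzero integer vector v with entries in [-radius, radius]."""
    n = len(diag)
    vectors = product(range(-radius, radius + 1), repeat=n)
    return all(diagonal_quadratic_form(diag, v) < 0 for v in vectors if any(v))

strictly_negative_diagonal = [-1, -2, -3]   # hypothetical second derivatives, all < 0
diagonal_with_zero = [-1, 0, -3]            # one coordinate function has zero curvature
```

With a zero on the diagonal, the unit vector for that coordinate gives v′Dv = 0, so the matrix is only negative semi-definite.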