Convexsol 1
Optimization
Chapter 1 Solutions
Dimitri P. Bertsekas
with
1.1
(a) Let x ∈ ∩i∈I Ci and let α be a positive scalar. Since x ∈ Ci for all i ∈ I and
each Ci is a cone, the vector αx belongs to Ci for all i ∈ I. Hence, αx ∈ ∩i∈I Ci ,
showing that ∩i∈I Ci is a cone.
(b) Let x ∈ C1 × C2 and let α be a positive scalar. Then x = (x1 , x2 ) for some
x1 ∈ C1 and x2 ∈ C2 , and since C1 and C2 are cones, it follows that αx1 ∈ C1
and αx2 ∈ C2 . Hence, αx = (αx1 , αx2 ) ∈ C1 × C2 , showing that C1 × C2 is a
cone.
(c) Let x ∈ C1 + C2 and let α be a positive scalar. Then, x = x1 + x2 for some
x1 ∈ C1 and x2 ∈ C2 , and since C1 and C2 are cones, αx1 ∈ C1 and αx2 ∈ C2 .
Hence, αx = αx1 + αx2 ∈ C1 + C2 , showing that C1 + C2 is a cone.
(d) Let x ∈ cl(C) and let α be a positive scalar. Then, there exists a sequence
{xk } ⊂ C such that xk → x, and since C is a cone, αxk ∈ C for all k. Further-
more, αxk → αx, implying that αx ∈ cl(C). Hence, cl(C) is a cone.
(e) First we prove that A·C is a cone, where A is a linear transformation and A·C
is the image of C under A. Let z ∈ A · C and let α be a positive scalar. Then,
Ax = z for some x ∈ C, and since C is a cone, αx ∈ C. Because A(αx) = αz,
the vector αz is in A · C, showing that A · C is a cone.
Next we prove that the inverse image A−1 · C of C under A is a cone. Let
x ∈ A−1 · C and let α be a positive scalar. Then Ax ∈ C, and since C is a cone,
αAx ∈ C. Thus, the vector A(αx) ∈ C, implying that αx ∈ A−1 · C, and showing
that A−1 · C is a cone.
1.3 (Lower Semicontinuity under Composition)
which together with the fact {xk }K → x and the lower semicontinuity of f , yields

f (x) ≤ lim inf_{k→∞, k∈K} f (xk ),

showing that {f (xk )}K → f (x). By our choice of the sequence {xk }K and the
lower semicontinuity of g, it follows that

lim_{k→∞, k∈K} g(f (xk )) = lim inf_{k→∞, k∈K} g(f (xk )) ≥ g(f (x)),

As a counterexample, consider the function f : < 7→ < given by

f (x) = 0 if x ≤ 0, and f (x) = 1 if x > 0,
1.4 (Convexity under Composition)
Let x, y ∈ <n and let α ∈ [0, 1]. By the definitions of h and f , we have

h(αx + (1 − α)y) = g(f (αx + (1 − α)y))
= g(f1 (αx + (1 − α)y), . . . , fm (αx + (1 − α)y))
≤ g(αf1 (x) + (1 − α)f1 (y), . . . , αfm (x) + (1 − α)fm (y))
= g(α(f1 (x), . . . , fm (x)) + (1 − α)(f1 (y), . . . , fm (y)))
≤ αg(f1 (x), . . . , fm (x)) + (1 − α)g(f1 (y), . . . , fm (y))
= αg(f (x)) + (1 − α)g(f (y))
= αh(x) + (1 − α)h(y),

where the first inequality uses the convexity of each fi and the monotonicity of g,
and the second inequality uses the convexity of g.
1.5

for all x = (x1 , . . . , xn ) ∈ X. From this, direct computation shows that for all
z = (z1 , . . . , zn ) ∈ <n and x = (x1 , . . . , xn ) ∈ X, we have

z ′∇2 f1 (x)z = (f1 (x)/n^2 ) [ ( Σ_{i=1}^n zi /xi )^2 − n Σ_{i=1}^n (zi /xi )^2 ].
Note that this quadratic form is nonnegative for all z ∈ <n and x ∈ X, since
f1 (x) < 0, and for any real numbers α1 , . . . , αn , we have
(α1 + · · · + αn )^2 ≤ n(α1^2 + · · · + αn^2 ),

in view of the fact that 2αj αk ≤ αj^2 + αk^2 . Hence, ∇2 f1 (x) is positive semidefinite
for all x ∈ X, and it follows from Prop. 1.2.6(a) that f1 is convex.
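The sign claim above is easy to spot-check numerically from the closed form of the quadratic form. The following sketch (the sample dimension and test points are arbitrary choices, not from the text) evaluates it at random positive x and arbitrary z:

```python
import random

# Quadratic form of the Hessian of f1(x) = -(x1 * ... * xn)^(1/n), using the
# closed form derived above:
#   z' H z = (f1(x)/n^2) [ (sum zi/xi)^2 - n * sum (zi/xi)^2 ]
def quad_form(x, z):
    n = len(x)
    prod = 1.0
    for xi in x:
        prod *= xi
    f1 = -prod ** (1.0 / n)          # f1(x) < 0 on the positive orthant
    s = sum(zi / xi for xi, zi in zip(x, z))
    s2 = sum((zi / xi) ** 2 for xi, zi in zip(x, z))
    return (f1 / n**2) * (s**2 - n * s2)

random.seed(0)
for _ in range(1000):
    x = [random.uniform(0.1, 10.0) for _ in range(5)]   # x in the positive orthant
    z = [random.uniform(-1.0, 1.0) for _ in range(5)]
    assert quad_form(x, z) >= -1e-12                    # nonnegative up to roundoff
```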
(b) We show that the Hessian of f2 is positive semidefinite at all x ∈ <n . Let
β(x) = ex1 + · · · + exn . Then a straightforward calculation yields
z ′∇2 f2 (x)z = (1/(2β(x)^2 )) Σ_{i=1}^n Σ_{j=1}^n e^{xi +xj} (zi − zj )^2 ≥ 0, ∀ z ∈ <n .
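The double-sum-of-squares form of this quadratic form (with a 1/(2β(x)²) normalization) can be verified numerically against the direct expression z′∇²f2(x)z = (1/β)Σ e^{xi} zi² − (1/β²)(Σ e^{xi} zi)²; a small sketch with arbitrary test points:

```python
import math, random

# Hessian quadratic form of f2(x) = ln(e^{x1} + ... + e^{xn}), two ways.
def direct(x, z):
    b = sum(math.exp(xi) for xi in x)
    s1 = sum(math.exp(xi) * zi**2 for xi, zi in zip(x, z))
    s2 = sum(math.exp(xi) * zi for xi, zi in zip(x, z))
    return s1 / b - (s2 / b) ** 2

def double_sum(x, z):
    # sum over all pairs (i, j) of e^{xi+xj} (zi - zj)^2, normalized by 2 beta^2
    b = sum(math.exp(xi) for xi in x)
    total = sum(math.exp(xi + xj) * (zi - zj) ** 2
                for xi, zi in zip(x, z) for xj, zj in zip(x, z))
    return total / (2 * b**2)

random.seed(1)
for _ in range(100):
    x = [random.uniform(-2, 2) for _ in range(4)]
    z = [random.uniform(-1, 1) for _ in range(4)]
    assert abs(direct(x, z) - double_sum(x, z)) < 1e-10  # same value
    assert double_sum(x, z) >= 0.0                       # manifestly nonnegative
```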
In this case, g is convex and monotonically increasing on the set {t | t < 0}, while
h is convex over <n . Using Exercise 1.4, it follows that the function f4 (x) = 1/f (x)
is convex over <n .
(e) The function f5 (x) = αf (x) + β can be viewed as a composition g(f (x)) of
the function g(t) = αt + β, where t ∈ <, and the function f (x) for x ∈ <n . In
this case, g is convex and monotonically increasing over < (since α ≥ 0), while f
is convex over <n . Using Exercise 1.4, it follows that f5 is convex over <n .
(f) The function f6 (x) = e^{βx′Ax} can be viewed as a composition g(f (x)) of the
function g(t) = e^{βt} for t ∈ < and the function f (x) = x′Ax for x ∈ <n . In this
case, g is convex and monotonically increasing over <, while f is convex over <n
(since A is positive semidefinite). Using Exercise 1.4, it follows that f6 is convex
over <n .
(g) This part is straightforward using the definition of a convex function.
1.6

(a) Let x1 , x2 , x3 be three scalars such that x1 < x2 < x3 . Then we can write x2
as a convex combination of x1 and x3 as follows:

x2 = ((x3 − x2 )/(x3 − x1 )) x1 + ((x2 − x1 )/(x3 − x1 )) x3 ,

so that by the convexity of f ,

f (x2 ) ≤ ((x3 − x2 )/(x3 − x1 )) f (x1 ) + ((x2 − x1 )/(x3 − x1 )) f (x3 ).

This relation and the identity

f (x2 ) = ((x3 − x2 )/(x3 − x1 )) f (x2 ) + ((x2 − x1 )/(x3 − x1 )) f (x2 ),
imply that

((x3 − x2 )/(x3 − x1 )) (f (x2 ) − f (x1 )) ≤ ((x2 − x1 )/(x3 − x1 )) (f (x3 ) − f (x2 )).
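The resulting slope inequality, (f(x2) − f(x1))/(x2 − x1) ≤ (f(x3) − f(x2))/(x3 − x2), can be spot-checked for a concrete convex function; f(x) = x² below is an arbitrary illustrative choice:

```python
import random

def f(x):
    return x * x  # a convex function, chosen only for illustration

def slope(a, b):
    # slope of the chord of f between a and b (a != b)
    return (f(b) - f(a)) / (b - a)

random.seed(1)
for _ in range(1000):
    x1, x2, x3 = sorted(random.uniform(-100.0, 100.0) for _ in range(3))
    if x2 - x1 > 1e-6 and x3 - x2 > 1e-6:   # avoid near-coincident points
        assert slope(x1, x2) <= slope(x2, x3) + 1e-9
```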
(b) Let {xk } be an increasing scalar sequence, i.e., x1 < x2 < x3 < · · · . Then,
according to part (a), we have for all k

(f (xk ) − f (xk−1 ))/(xk − xk−1 ) ≤ (f (xk+1 ) − f (xk ))/(xk+1 − xk ). (1.2)

Since the slopes in Eq. (1.2) are nondecreasing, they tend to a limit,

(f (xk ) − f (xk−1 ))/(xk − xk−1 ) → γ, (1.3)

and

(f (xk+1 ) − f (xk ))/(xk+1 − xk ) ≤ γ, ∀ k. (1.4)

Let {yj } be any other increasing scalar sequence. By combining {xk } and {yj }
into a single increasing sequence and using Eq. (1.2), it can be seen that

lim_{j→∞} (f (yj+1 ) − f (yj ))/(yj+1 − yj ) ≤ γ.

Similarly, by exchanging the roles of {xk } and {yj }, we can show that

lim_{j→∞} (f (yj+1 ) − f (yj ))/(yj+1 − yj ) ≥ γ.

Thus the limit in Eq. (1.3) is independent of the choice for {xk }, and Eqs. (1.2)
and (1.4) hold for any increasing scalar sequence {xk }.
We consider separately each of the three possibilities γ < 0, γ = 0, and
γ > 0. First, suppose that γ < 0, and let {xk } be any increasing sequence. By
using Eq. (1.4), we obtain
f (xk ) = Σ_{j=1}^{k−1} [(f (xj+1 ) − f (xj ))/(xj+1 − xj )] (xj+1 − xj ) + f (x1 )
≤ Σ_{j=1}^{k−1} γ(xj+1 − xj ) + f (x1 )
= γ(xk − x1 ) + f (x1 ),
and since γ < 0 and xk → ∞, it follows that f (xk ) → −∞. To show that f
decreases monotonically, pick any x and y with x < y, and consider the sequence
x1 = x, x2 = y, and xk = y + k for all k ≥ 3. By using Eq. (1.4) with k = 1, we
have
(f (y) − f (x))/(y − x) ≤ γ < 0,
so that f (y) − f (x) < 0. Hence f decreases monotonically to −∞, corresponding
to case (1).
Suppose now that γ = 0, and let {xk } be any increasing sequence. Then,
by Eq. (1.4), we have f (xk+1 ) − f (xk ) ≤ 0 for all k. If f (xk+1 ) − f (xk ) < 0 for all
k, then f decreases monotonically. To show this, pick any x and y with x < y,
and consider a new sequence given by y1 = x, y2 = y, and yk = xK+k−3 for all
k ≥ 3, where K is large enough so that y < xK . By using Eqs. (1.2) and (1.4)
with {yk }, we have

(f (y) − f (x))/(y − x) ≤ (f (xK+1 ) − f (xK ))/(xK+1 − xK ) < 0,

implying that f (y) − f (x) < 0. Hence f decreases monotonically, and it may
decrease to −∞ or to a finite value, corresponding to cases (1) or (2), respectively.
If for some K we have f (xK+1 ) − f (xK ) = 0, then by Eqs. (1.2) and (1.4)
where γ = 0, we obtain f (xk ) = f (xK ) for all k ≥ K. To show that f stays at
the value f (xK ) for all x ≥ xK , choose any x such that x > xK , and define {yk }
as y1 = xK , y2 = x, and yk = xN +k−3 for all k ≥ 3, where N is large enough so
that x < xN . By using Eqs. (1.2) and (1.4) with {yk }, we have

(f (x) − f (xK ))/(x − xK ) ≤ (f (xN ) − f (x))/(xN − x) ≤ 0,

so that f (x) ≤ f (xK ) and f (xN ) ≤ f (x). Since f (xK ) = f (xN ), we have
f (x) = f (xK ). Hence f (x) = f (xK ) for all x ≥ xK , corresponding to case (3).
Finally, suppose that γ > 0, and let {xk } be any increasing sequence. Since
(f (xk ) − f (xk−1 ))/(xk − xk−1 ) is nondecreasing and tends to γ [cf. Eqs. (1.3)
and (1.4)], there is a positive integer K and a positive scalar δ with δ < γ such
that

(f (xk ) − f (xk−1 ))/(xk − xk−1 ) ≥ δ, ∀ k ≥ K. (1.5)
Therefore, for all k > K,

f (xk ) = Σ_{j=K}^{k−1} [(f (xj+1 ) − f (xj ))/(xj+1 − xj )] (xj+1 − xj ) + f (xK ) ≥ δ(xk − xK ) + f (xK ),

and since δ > 0 and xk → ∞, it follows that f (xk ) → ∞.
1.7

Thus, dh/dt is nondecreasing on [0, 1], and for any t ∈ (0, 1), we have
(h(t) − h(0))/t = (1/t) ∫_0^t (dh(τ )/dτ ) dτ ≤ dh(t)/dt ≤ (1/(1 − t)) ∫_t^1 (dh(τ )/dτ ) dτ = (h(1) − h(t))/(1 − t).
Equivalently,
th(1) + (1 − t)h(0) ≥ h(t),
and from the definition of h, we obtain
tf (y) + (1 − t)f (x) ≥ f (ty + (1 − t)x).
Since this inequality has been proved for arbitrary t ∈ [0, 1] and x, y ∈ C, we
conclude that f is convex.
1.8 (Characterization of Twice Continuously Differentiable
Convex Functions)
Suppose that f : <n 7→ < is convex over C. We first show that for all x ∈ ri(C)
and y ∈ S, we have y ′∇2 f (x)y ≥ 0. To arrive at a contradiction, assume that
there exists some x̄ ∈ ri(C) such that for some y ∈ S, we have

y ′∇2 f (x̄)y < 0.

Without loss of generality, we may assume that kyk = 1. Using the continuity of
∇2 f , we see that there is an open ball B(x̄, ε) centered at x̄ with radius ε > 0
such that B(x̄, ε) ∩ aff(C) ⊂ C [since x̄ ∈ ri(C)], and

y ′∇2 f (x)y < 0, ∀ x ∈ B(x̄, ε). (1.7)

By the second order expansion, for every α ∈ [0, ε) we have

f (x̄ + αy) = f (x̄) + α∇f (x̄)′ y + (α^2 /2) y ′∇2 f (x̄ + ᾱy)y,

for some ᾱ ∈ [0, α]. Furthermore, k(x̄ + ᾱy) − x̄k ≤ ε [since kyk = 1 and ᾱ < ε].
Hence, from Eq. (1.7), it follows that

f (x̄ + αy) < f (x̄) + α∇f (x̄)′ y, ∀ α ∈ (0, ε).

On the other hand, by the choice of ε and the assumption that y ∈ S, the vectors
x̄ + αy are in C for all α ∈ [0, ε), so the preceding strict inequality contradicts
the convexity of f over C. Hence, we have y ′∇2 f (x)y ≥ 0 for all y ∈ S and all
x ∈ ri(C).
Next, let x be a point in C that is not in the relative interior of C. Then, by
the Line Segment Principle, there is a sequence {xk } ⊂ ri(C) such that xk → x.
As seen above, y ′∇2 f (xk )y ≥ 0 for all y ∈ S and all k, which together with the
continuity of ∇2 f implies that

y ′∇2 f (x)y ≥ 0, ∀ y ∈ S.
for some α ∈ [0, 1]. Since x, z ∈ C, we have that (z − x) ∈ S, and using the
convexity of C and our assumption, it follows that
1.9 (Strong Convexity)
(a) Fix some x, y ∈ <n such that x ≠ y, and define the function h : < 7→ < by
h(t) = f (x + t(y − x)). Consider scalars t and s such that t < s. Using the chain
rule and the equation

(∇f (x) − ∇f (y))′ (x − y) ≥ αkx − yk^2 , ∀ x, y ∈ <n , (1.8)

we obtain

dh(s)/ds − dh(t)/dt = (∇f (x + s(y − x)) − ∇f (x + t(y − x)))′ (y − x) ≥ α(s − t)ky − xk^2 > 0.

Thus, dh/dt is strictly increasing, and for any t ∈ (0, 1), we have
(h(t) − h(0))/t = (1/t) ∫_0^t (dh(τ )/dτ ) dτ < (1/(1 − t)) ∫_t^1 (dh(τ )/dτ ) dτ = (h(1) − h(t))/(1 − t).
f (x + cy) = f (x) + cy ′∇f (x) + (c^2 /2) y ′∇2 f (x + tcy)y,

and

f (x) = f (x + cy) − cy ′∇f (x + cy) + (c^2 /2) y ′∇2 f (x + scy)y,

for some t and s belonging to [0, 1]. Adding these two equations and using Eq.
(1.8), we obtain

(c^2 /2) y ′(∇2 f (x + scy) + ∇2 f (x + tcy))y = (∇f (x + cy) − ∇f (x))′ (cy) ≥ αc^2 kyk^2 .
We divide both sides by c2 and then take the limit as c → 0 to conclude that
y 0 ∇2 f (x)y ≥ αkyk2 . Since this inequality is valid for every y ∈ <n , it follows
that ∇2 f (x) − αI is positive semidefinite.
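As a numeric illustration of Eq. (1.8), consider f(x) = kxk² + ln(1 + e^{x1}) (an example chosen here, not from the text): its Hessian is 2I plus a positive semidefinite term, so the gradient monotonicity inequality should hold with α = 2.

```python
import math, random

# Gradient of f(x) = ||x||^2 + ln(1 + e^{x1}); the second term's gradient is
# the logistic sigmoid in the first coordinate.
def grad(x):
    g = [2.0 * xi for xi in x]
    g[0] += 1.0 / (1.0 + math.exp(-x[0]))   # d/dx1 of ln(1 + e^{x1})
    return g

random.seed(2)
for _ in range(200):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    gx, gy = grad(x), grad(y)
    lhs = sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))
    dist2 = sum((u - v) ** 2 for u, v in zip(x, y))
    assert lhs >= 2.0 * dist2 - 1e-9        # Eq. (1.8) with alpha = 2
```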
For the converse, assume that ∇2 f (x) − αI is positive semidefinite for all
x ∈ <n . Consider the function g : < 7→ < defined by

g(t) = ∇f (tx + (1 − t)y)′ (x − y).
for some t ∈ [0, 1]. On the other hand,

dg(t)/dt = (x − y)′ ∇2 f (tx + (1 − t)y)(x − y) ≥ αkx − yk^2 ,
where the last inequality holds because ∇2 f (tx + (1 − t)y) − αI is positive
semidefinite. Combining the last two relations, it follows that f is strongly convex
coefficient α.
1.10 (Posynomials)
where gk is a posynomial and γk > 0 for all k. Using a change of variables similar
to part (b), we see that we can represent the function f (x) = ln g(y) as
f (x) = Σ_{k=1}^r γk ln exp(Ak x + bk ),
with the matrix Ak and the vector bk being associated with the posynomial gk for
each k. Since f (x) is a linear combination of convex functions with nonnegative
coefficients [part (b)], it follows from Prop. 1.2.4(a) that f (x) is convex.
1.11 (Arithmetic-Geometric Mean Inequality)
Consider the function f (x) = − ln(x). Since ∇2 f (x) = 1/x^2 > 0 for all x > 0, the
function − ln(x) is strictly convex over (0, ∞). Therefore, for all positive scalars
x1 , . . . , xn and α1 , . . . , αn with Σ_{i=1}^n αi = 1, we have

− ln(α1 x1 + · · · + αn xn ) ≤ −α1 ln(x1 ) − · · · − αn ln(xn ),

which is equivalent to
eln(α1 x1 +···+αn xn ) ≥ eα1 ln(x1 )+···+αn ln(xn ) = eα1 ln(x1 ) · · · eαn ln(xn ) ,
or
α1 x1 + · · · + αn xn ≥ x1^{α1} · · · xn^{αn} ,
as desired. Since − ln(x) is strictly convex, the above inequality is satisfied with
equality if and only if the scalars x1 , . . . , xn are all equal.
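The weighted arithmetic–geometric mean inequality, including the equality case, can be spot-checked numerically (the sample sizes and ranges below are arbitrary):

```python
import random

random.seed(3)
for _ in range(1000):
    n = 5
    x = [random.uniform(0.01, 100.0) for _ in range(n)]
    w = [random.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    a = [wi / total for wi in w]              # alpha_1 + ... + alpha_n = 1
    arith = sum(ai * xi for ai, xi in zip(a, x))
    geom = 1.0
    for ai, xi in zip(a, x):
        geom *= xi ** ai
    assert arith >= geom - 1e-9 * max(1.0, geom)

# Equality when all xi are equal:
x = [3.0] * 5
a = [0.2] * 5
assert abs(sum(ai * xi for ai, xi in zip(a, x)) - 3.0) < 1e-12
```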
1.12

where 1/p + 1/q = 1, p > 0, and q > 0. The above relation also holds if u = 0 or
v = 0. By setting u = x^p and v = y^q , we obtain Young's inequality

xy ≤ x^p /p + y^q /q, ∀ x ≥ 0, ∀ y ≥ 0.
To show Hölder's inequality, note that it holds if x1 = · · · = xn = 0 or
y1 = · · · = yn = 0. If x1 , . . . , xn and y1 , . . . , yn are such that (x1 , . . . , xn ) ≠ 0
and (y1 , . . . , yn ) ≠ 0, then by using

x = |xi | / (Σ_{j=1}^n |xj |^p )^{1/p} and y = |yi | / (Σ_{j=1}^n |yj |^q )^{1/q}

in Young's inequality, we obtain for all i,

|xi | |yi | / ((Σ_{j=1}^n |xj |^p )^{1/p} (Σ_{j=1}^n |yj |^q )^{1/q}) ≤ |xi |^p / (p Σ_{j=1}^n |xj |^p ) + |yi |^q / (q Σ_{j=1}^n |yj |^q ).

Adding these inequalities over i = 1, . . . , n, and using 1/p + 1/q = 1, we obtain
Hölder's inequality.
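Hölder's inequality itself is then easy to validate numerically for a sample conjugate pair; p = 3 below is an arbitrary choice, with q determined by 1/p + 1/q = 1:

```python
import random

# Check: sum |xi yi| <= (sum |xi|^p)^(1/p) * (sum |yi|^q)^(1/q)
random.seed(4)
p = 3.0
q = p / (p - 1.0)   # conjugate exponent, so 1/p + 1/q = 1
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(6)]
    y = [random.uniform(-10, 10) for _ in range(6)]
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    rhs = (sum(abs(a) ** p for a in x)) ** (1 / p) * \
          (sum(abs(b) ** q for b in y)) ** (1 / q)
    assert lhs <= rhs + 1e-9 * max(1.0, rhs)
```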
1.13
Let (x, w) and (y, v) be two vectors in epi(f ). Then f (x) ≤ w and f (y) ≤ v,
implying that there exist sequences {(x, wk )} ⊂ C and {(y, v k )} ⊂ C such that
for all k,

wk ≤ w + 1/k, v k ≤ v + 1/k.

By the convexity of C, we have for all α ∈ [0, 1] and all k,

(αx + (1 − α)y, αwk + (1 − α)v k ) ∈ C,
1.14
1.15
for some positive integer m, nonnegative scalars λi , and vectors xi ∈ C. Since
y ≠ 0, we cannot have all λi equal to zero, implying that Σ_{i=1}^m λi > 0. Because
xi ∈ C for all i and C is convex, the vector

x = Σ_{i=1}^m (λi / Σ_{j=1}^m λj ) xi
belongs to C. For this vector, we have
y = (Σ_{i=1}^m λi ) x,

with Σ_{i=1}^m λi > 0, implying that y ∈ ∪_{x∈C} {γx | γ ≥ 0} and showing that
showing that λx ∈ C and that C is a cone. Let x, y ∈ C and let λ ∈ [0, 1]. Then
x = (1/2) x1 + (1/2) x2 ,
By taking the convex hull of both sides in the above inclusion and by using the
convexity of C1 + C2 , we obtain
conv(C1 ∪ C2 ) ⊂ conv(C1 + C2 ) = C1 + C2 .
αC1 ∩ (1 − α)C2 = C1 ∩ C2 .
1.17
α1 Ci1 + · · · + αm Cim ,
with
αi ≥ 0, ∀ i = 1, . . . , m, Σ_{i=1}^m αi = 1,
1.18 (Convex Hulls, Affine Hulls, and Generated Cones)
(a) We first show that X and cl(X) have the same affine hull. Since X ⊂ cl(X),
there holds
aff(X) ⊂ aff(cl(X)).
Conversely, because X ⊂ aff(X) and aff(X) is closed, we have cl(X) ⊂ aff(X),
implying that
aff(cl(X)) ⊂ aff(X).
We now show that X and conv(X) have the same affine hull. By using a
translation argument if necessary, we assume without loss of generality that X
contains the origin, so that both aff(X) and aff(conv(X)) are subspaces. Since
X ⊂ conv(X), evidently aff(X) ⊂ aff(conv(X)). To show the reverse inclusion,
let the dimension of aff(conv(X)) be m, and let x1 , . . . , xm be linearly indepen-
dent vectors in conv(X) that span aff(conv(X)). Then every x ∈ aff(conv(X)) is
a linear combination of the vectors x1 , . . . , xm , i.e., there exist scalars β1 , . . . , βm
such that
x = Σ_{i=1}^m βi xi .
As an example showing that the above inclusion can be strict, consider the
set X = {(1, 1)} in <2 . Then conv(X) = X, so that

aff(conv(X)) = X = {(1, 1)},

and the dimension of conv(X) is zero. On the other hand, cone(X) = {(α, α) |
α ≥ 0}, so that

aff(cone(X)) = {(x1 , x2 ) | x1 = x2 },
and the dimension of cone(X) is one.
(d) In view of parts (a) and (c), it suffices to show that
aff(cone(X)) ⊂ aff(conv(X)) = aff(X).
It is always true that 0 ∈ cone(X), so aff(cone(X)) is a subspace. Let the
dimension of aff(cone(X)) be m, and let x1 , . . . , xm be linearly independent
vectors in cone(X) that span aff(cone(X)). Since every vector in aff(cone(X)) is
a linear combination of x1 , . . . , xm , and since each xi is a nonnegative combination
of some vectors in X, it follows that every vector in aff(cone(X)) is a linear
combination of some vectors in X. In view of the assumption that 0 ∈ conv(X),
the affine hull of conv(X) is a subspace, which implies by part (a) that the affine
hull of X is a subspace. Hence, every vector in aff(cone(X)) belongs to aff(X),
showing that aff(cone(X)) ⊂ aff(X).
1.19
By definition, f (x) is the infimum of the values of w such that (x, w) ∈ C, where
C is the convex hull of the union of nonempty convex sets epi(fi ). We have that
(x, w) ∈ C if and only if (x, w) can be expressed as a convex combination of the
form
(x, w) = Σ_{i∈I} αi (xi , wi ) = (Σ_{i∈I} αi xi , Σ_{i∈I} αi wi ),
where I ⊂ I is a finite set and (xi , wi ) ∈ epi(fi ) for all i ∈ I. Thus, f (x) can be
expressed as
f (x) = inf { Σ_{i∈I} αi wi | (x, w) = Σ_{i∈I} αi (xi , wi ), (xi , wi ) ∈ epi(fi ), αi ≥ 0, ∀ i ∈ I, Σ_{i∈I} αi = 1 }.
Since the set {(xi , fi (xi )) | xi ∈ <n } is contained in epi(fi ), we obtain

f (x) ≤ inf { Σ_{i∈I} αi fi (xi ) | x = Σ_{i∈I} αi xi , xi ∈ <n , αi ≥ 0, ∀ i ∈ I, Σ_{i∈I} αi = 1 }.
On the other hand, by the definition of epi(fi ), for each (xi , wi ) ∈ epi(fi ) we
have wi ≥ fi (xi ), implying that
f (x) ≥ inf { Σ_{i∈I} αi fi (xi ) | x = Σ_{i∈I} αi xi , xi ∈ <n , αi ≥ 0, ∀ i ∈ I, Σ_{i∈I} αi = 1 }.
By combining the last two relations, we obtain
f (x) = inf { Σ_{i∈I} αi fi (xi ) | x = Σ_{i∈I} αi xi , xi ∈ <n , αi ≥ 0, ∀ i ∈ I, Σ_{i∈I} αi = 1 }.
On the other hand, by the definition of epi(f ), for each (xi , wi ) ∈ epi(f ) we have
wi ≥ f (xi ), implying that
F (x) ≥ inf { Σ_i αi f (xi ) | (x, w) = Σ_i αi (xi , wi ), (xi , wi ) ∈ epi(f ), αi ≥ 0, Σ_i αi = 1 }
= inf { Σ_i αi f (xi ) | x = Σ_i αi xi , xi ∈ X, αi ≥ 0, Σ_i αi = 1 },
which combined with the preceding inequality implies the desired relation.
(b) By using part (a), we have for every x ∈ X
F (x) ≤ f (x),

since f (x) corresponds to the value of the function Σ_i αi f (xi ) for the particular
representation of x as a finite convex combination of elements of X, namely
x = 1 · x. Therefore, we have

inf_{x∈conv(X)} F (x) ≤ inf_{x∈X} f (x).
Let f ∗ = inf x∈X f (x). If inf x∈conv(X) F (x) < f ∗ , then there exists z ∈
conv(X) with F (z) < f ∗ . According to part (a), there exist points xi ∈ X and
nonnegative scalars αi with Σ_i αi = 1 such that z = Σ_i αi xi and

F (z) ≤ Σ_i αi f (xi ) < f ∗ ,
implying that
Σ_i αi (f (xi ) − f ∗ ) < 0.
Since each αi is nonnegative, for this inequality to hold, we must have f (xi )−f ∗ <
0 for some i, but this cannot be true because xi ∈ X and f ∗ is the optimal value
of f over X. Therefore,

inf_{x∈conv(X)} F (x) = inf_{x∈X} f (x),

and
F (x) = inf { Σ_i αi c′ xi | Σ_i αi xi = x, xi ∈ X, Σ_i αi = 1, αi ≥ 0 }
= inf { c′ (Σ_i αi xi ) | Σ_i αi xi = x, xi ∈ X, Σ_i αi = 1, αi ≥ 0 }
= c′ x,
showing that
inf_{x∈conv(X)} c′ x = inf_{x∈X} c′ x.
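For a finite set X, the conclusion inf over conv(X) of c′x equals inf over X can be checked by sampling random convex combinations; the particular X and c below are arbitrary:

```python
import random

# Every convex combination of points of X has linear cost >= the best point cost.
random.seed(5)
X = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(8)]
c = (1.3, -0.7)
best = min(c[0] * x1 + c[1] * x2 for (x1, x2) in X)
for _ in range(1000):
    w = [random.random() for _ in X]
    s = sum(w)
    a = [wi / s for wi in w]                 # convex weights summing to 1
    z = (sum(ai * x1 for ai, (x1, _) in zip(a, X)),
         sum(ai * x2 for ai, (_, x2) in zip(a, X)))
    assert c[0] * z[0] + c[1] * z[1] >= best - 1e-9
```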
for some positive scalars α1 , . . . , αm and vectors

x = Σ_{i=1}^k αi xi + Σ_{i=k+1}^m αi yi , 1 = Σ_{i=1}^k αi .

Σ_{i=1}^k λi (xi , 1) + Σ_{i=k+1}^m λi (yi , 0) = 0,
1.23
cl(conv(X)) ⊂ cl(conv(cl(X))) = conv(cl(X)).

Conversely,

conv(cl(X)) ⊂ conv(cl(conv(X))) = cl(conv(X)),
since by Prop. 1.2.1(d), the closure of a convex set is convex. Hence, the result
follows.
1.24 (Radon’s Theorem)
Σ_{i=1}^m λi xi = 0, Σ_{i=1}^m λi = 0.
where
αi = λ∗i / Σ_{k∈I} λ∗k , i ∈ I.
In view of the equations Σ_{i=1}^m λ∗i xi = 0 and Σ_{i=1}^m λ∗i = 0, we also have

x∗ = Σ_{j∈J} αj xj ,
where
αj = −λ∗j / Σ_{k∈J} (−λ∗k ), j ∈ J.
Given four distinct points in the plane (i.e., m = 4 and n = 2), Radon's
Theorem guarantees the existence of a partition into two subsets, the convex
hulls of which intersect. Assuming that no three of the points lie on the same
line, there are two possibilities:
(1) Each set in the partition consists of two points, in which case the convex
hulls intersect and define the diagonals of a quadrilateral.
(2) One set in the partition consists of three points and the other consists of one
point, in which case the triangle formed by the three points must contain
the fourth.
In the case where exactly three of the points lie on a common line segment and
the fourth does not, the triangle formed by the two endpoints of that segment
and the point off the line contains the remaining (middle) point of the segment.
In the case where all four of the points lie on a line segment, the degenerate
triangle formed by three of the points, including the two ends of the line segment,
contains the fourth point.
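Case (2) above can be illustrated numerically: the sketch below (with points chosen for the example, not taken from the text) computes barycentric coordinates and confirms that the triangle of three points contains the fourth.

```python
# Radon partition check for 4 points in the plane: partition {a, b, c} vs {p};
# conv{a,b,c} contains p iff all barycentric coordinates of p are >= 0.
def barycentric(p, a, b, c):
    """Barycentric coordinates of p w.r.t. triangle (a, b, c), via Cramer's rule."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    l2 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    l3 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return (1.0 - l2 - l3, l2, l3)

a, b, c, p = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)
coords = barycentric(p, a, b, c)
print(all(l >= 0 for l in coords))  # True: the convex hulls intersect
```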
Consider the induction argument of the hint, let Bj be defined as in the hint,
and for each j, let xj be a vector in Bj . Since M + 1 ≥ n + 2, we can apply
Radon’s Theorem to the vectors x1 , . . . , xM +1 . Thus, there exist nonempty and
disjoint index subsets I and J such that I ∪ J = {1, . . . , M + 1}, nonnegative
scalars α1 , . . . , αM +1 , and a vector x∗ such that
x∗ = Σ_{i∈I} αi xi = Σ_{j∈J} αj xj , Σ_{i∈I} αi = Σ_{j∈J} αj = 1.
1.26
Assume the contrary, i.e., that for every index set I ⊂ {1, . . . , M } containing
no more than n + 1 indices, we have

inf_{x∈<n} max_{i∈I} fi (x) < f ∗ .
This means that for every such I, the intersection ∩_{i∈I} Xi is nonempty, where

Xi = {x | fi (x) < f ∗ }.

By Helly's Theorem, it follows that the intersection ∩_{i=1}^M Xi is also nonempty,
which contradicts the definition of f ∗ . Note: The result of this exercise relates to
the following question: what is the minimal number of functions fi that we need
to include in the cost function maxi fi (x) in order to attain the optimal value f ∗ ?
According to the result, the number is no more than n + 1. For applications of
this result in structural design and Chebyshev approximation, see Ben-Tal and
Nemirovski [BeN01].
1.27
αf (x) + (1 − α)f (x) ≥ f (αx + (1 − α)x) ≥ γ.
1.28
From Prop. 1.4.5(b), we have that for any vector a ∈ <n , ri(C + a) = ri(C) + a.
Therefore, we can assume without loss of generality that 0 ∈ C, and aff(C)
coincides with S. We need to show that
ri(C) = int(C + S ⊥ ) ∩ C.
Let x ∈ ri(C). By definition, this implies that x ∈ C and there exists some
open ball B(x, ε) centered at x with radius ε > 0 such that

B(x, ε) ∩ S ⊂ C. (1.9)

x + αyS ∈ B(x, ε) ∩ S ⊂ C,

B(x, ε) ∩ S ⊂ C,
1.29
(a) Let C be the given convex set. The convex hull of any subset of C is contained
in C. Therefore, the maximum dimension of the various simplices contained in
C is the largest m for which C contains m + 1 vectors x0 , . . . , xm such that
x1 − x0 , . . . , xm − x0 are linearly independent.
Let K = {x0 , . . . , xm } be such a set with m maximal,
and let aff(K) denote
the affine hull of set K. Then, we have dim aff(K) = m, and since K ⊂ C, it
follows that aff(K) ⊂ aff(C).
We claim that C ⊂ aff(K). To see this, assume that there exists some
x ∈ C, which does not belong to aff(K). This implies that the set {x, x0 , . . . , xm }
is a set of m + 2 vectors in C such that x − x0 , x1 − x0 , . . . , xm − x0 are linearly
independent, contradicting the maximality of m. Hence, we have C ⊂ aff(K),
and it follows that
aff(K) = aff(C),
thereby implying that dim(C) = m.
(b) We first consider the case where C is n-dimensional with n > 0 and show that
the interior of C is not empty. By part (a), an n-dimensional convex set contains
an n-dimensional simplex. We claim that such a simplex S has a nonempty
interior. Indeed, applying an affine transformation if necessary, we can assume
that the vertices of S are the vectors (0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (0, 0, . . . , 1),
i.e.,

S = { (x1 , . . . , xn ) | xi ≥ 0, ∀ i = 1, . . . , n, Σ_{i=1}^n xi ≤ 1 }.
ri(C) = int(C + S ⊥ ) ∩ C,
1.30
(a) Let C1 be the segment {(x1 , x2 ) | 0 ≤ x1 ≤ 1, x2 = 0} and let C2 be the box
{(x1 , x2 ) | 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1}. We have

ri(C1 ) = {(x1 , x2 ) | 0 < x1 < 1, x2 = 0},

ri(C2 ) = {(x1 , x2 ) | 0 < x1 < 1, 0 < x2 < 1}.

ri(C1 ) = ri(C1 ∩ C2 ).
1.31
(a) Let x ∈ ri(C). We will show that for every x̄ ∈ aff(C), there exists a γ > 1
such that x + (γ − 1)(x − x̄) ∈ C. This is true if x̄ = x, so assume that x̄ ≠ x.
Since x ∈ ri(C), there exists ε > 0 such that

{z | kz − xk < ε} ∩ aff(C) ⊂ C.

Choose a point xε ∈ C in the intersection of the ray {x + α(x̄ − x) | α ≥ 0} and
the set {z | kz − xk < ε} ∩ aff(C). Then, for some positive scalar αε ,

x − xε = αε (x − x̄).

Since x ∈ ri(C) and xε ∈ C, by Prop. 1.4.1(c) there exists γ̄ > 1 such that

x + (γ̄ − 1)(x − xε ) ∈ C,

or equivalently,

x + (γ̄ − 1)αε (x − x̄) ∈ C.

The result follows by letting γ = 1 + (γ̄ − 1)αε and noting that γ > 1, since
(γ̄ − 1)αε > 0. The converse assertion follows from the fact C ⊂ aff(C) and
Prop. 1.4.1(c).
(b) The inclusion cone(C) ⊂ aff(C) always holds if 0 ∈ C. To show the reverse
inclusion, we note that by part (a) with x = 0, for every x ∈ aff(C), there exists
γ > 1 such that x̃ = (γ − 1)(−x) ∈ C. By using part (a) again with x = 0, for
x̃ ∈ C ⊂ aff(C), we see that there is γ̃ > 1 such that z = (γ̃ − 1)(−x̃) ∈ C, which
combined with x̃ = (γ − 1)(−x) yields z = (γ̃ − 1)(γ − 1)x ∈ C. Hence
x = (1/((γ̃ − 1)(γ − 1))) z,

with z ∈ C and (γ̃ − 1)(γ − 1) > 0, implying that x ∈ cone(C), and showing that
aff(C) ⊂ cone(C).
(c) This follows by part (b), where C = conv(X), and the fact
cone conv(X) = cone(X)
[Exercise 1.18(b)].
1.32
We have inf_{m≥0} kxm k > 0, since {xk } ⊂ C and C is a compact set not containing
the origin, so that

0 ≤ αk ≤ (sup_{m≥0} kym k) / (inf_{m≥0} kxm k) < ∞, ∀ k.
Thus, the sequence {(αk , xk )} is bounded and has a limit point (α, x) such that
α ≥ 0 and x ∈ C. By taking a subsequence of {(αk , xk )} that converges to (α, x),
and by using the facts yk = αk xk for all k and {yk } → y, we see that y = αx
with α ≥ 0 and x ∈ C. Hence, y ∈ cone(C), showing that cone(C) is closed.
(b) To see that the assertion in part (a) fails when C is unbounded, let C be the
line {(x1 , x2 ) | x1 = 1, x2 ∈ <} in <2 not passing through the origin. Then,
cone(C) is the nonclosed set {(x1 , x2 ) | x1 > 0, x2 ∈ <} ∪ {(0, 0)}.
To see that the assertion in part (a) fails when C contains the origin on its
relative boundary, let C be the closed ball {(x1 , x2 ) | (x1 − 1)^2 + x2^2 ≤ 1} in <2 .
Then, cone(C) is the nonclosed set {(x1 , x2 ) | x1 > 0, x2 ∈ <} ∪ {(0, 0)} (see
Fig. 1.3.2).
(c) Since C is compact, the convex hull of C is compact (cf. Prop. 1.3.2). Because
conv(C) does not contain the origin on its relative boundary, by part (a), the cone
generated by conv(C) is closed. By Exercise 1.18(b), cone(conv(C)) coincides
with cone(C), implying that cone(C) is closed.
1.33
(a) By Prop. 1.4.1(b), the relative interior of a convex set is a convex set. We
only need to show that ri(C) is a cone. Let y ∈ ri(C). Then, y ∈ C and since C
is a cone, αy ∈ C for all α > 0. By the Line Segment Principle, all the points on
the line segment connecting y and αy, except possibly αy, belong to ri(C). Since
this is true for every α > 0, it follows that αy ∈ ri(C) for all α > 0, showing that
ri(C) is a cone.
(b) Consider the linear transformation A that maps (α1 , . . . , αm ) ∈ <m into
Σ_{i=1}^m αi xi ∈ <n . Note that C is the image of the nonempty convex set

{(α1 , . . . , αm ) | α1 ≥ 0, . . . , αm ≥ 0}
1.34
Let T be the linear transformation that maps (x, y) ∈ <n+m into x ∈ <n . Then
it can be seen that
A−1 · C = T · (D ∩ S). (1.10)
The relative interior of D is given by ri(D) = <n × ri(C), and the relative interior
of S is equal to S (since S is a subspace). Hence,
In view of the assumption that A−1 · ri(C) is nonempty, we have that the in-
tersection ri(D) ∩ S is nonempty. Therefore, it follows from Props. 1.4.3(d) and
1.4.5(a) that

ri(T · (D ∩ S)) = T · (ri(D) ∩ S). (1.12)
Combining Eqs. (1.10)-(1.12), we obtain
Since the intersection ri(D) ∩ S is nonempty, it follows from Prop. 1.4.5(a) that
cl(D) ∩ S = cl(D ∩ S). Furthermore, since T is continuous, we obtain
To show the reverse inclusion, cl(A−1 · C) ⊂ A−1 · cl(C), let x be some vector in
cl(A−1 · C). This implies that there exists some sequence {xk } converging to x
such that Axk ∈ C for all k. Since xk converges to x, we have that Axk converges
to Ax, thereby implying that Ax ∈ cl(C), or equivalently, x ∈ A−1 · cl(C).
(a) Let g : <n 7→ [−∞, ∞] be such that g(x) ≤ f (x) for all x ∈ <n . Choose
any x ∈ dom(cl f ). Since epi(cl f ) = cl(epi(f )), we can choose a sequence
{(xk , wk )} ⊂ epi(f ) such that xk → x, wk → (cl f )(x). Since g is lower semicon-
tinuous at x, we have

g(x) ≤ lim inf_{k→∞} g(xk ) ≤ lim inf_{k→∞} f (xk ) ≤ lim inf_{k→∞} wk = (cl f )(x).
Note also that since epi(f ) ⊂ epi(cl f ), we have (cl f )(x) ≤ f (x) for all x ∈ <n .
(b) For the proof of this part and the next, we will use the easily shown fact that
for any convex function f , we have

ri(epi(f )) = {(x, w) | x ∈ ri(dom(f )), f (x) < w}.
Let x ∈ ri(dom(f )), and consider the vertical line L = {(x, w) | w ∈ <}.
Then there exists ŵ such that (x, ŵ) ∈ L ∩ ri(epi(f )). Let w be such that (x, w) ∈
L ∩ cl(epi(f )). Then, by Prop. 1.4.5(a), we have L ∩ cl(epi(f )) = cl(L ∩ epi(f )),
so that (x, w) ∈ cl(L ∩ epi(f )). It follows from the Line Segment Principle that
the vector (x, ŵ + α(w − ŵ)) belongs to epi(f ) for all α ∈ [0, 1). Taking the
limit as α → 1, we see that f (x) ≤ w for all w such that (x, w) ∈ L ∩ cl(epi(f )),
implying that f (x) ≤ (cl f )(x). On the other hand, since epi(f ) ⊂ epi(cl f ), we
have (cl f )(x) ≤ f (x) for all x ∈ <n , so f (x) = (cl f )(x).
We know that a closed convex function that is improper cannot take a finite
value at any point. Since cl f is closed and convex, and takes a finite value at all
points of the nonempty set ri(dom(f )), it follows that cl f must be proper.
(c) Since the function cl f is closed and is majorized by f , we have
(cl f )(y) ≤ lim inf_{α↓0} (cl f )(y + α(x − y)) ≤ lim inf_{α↓0} f (y + α(x − y)).
To show the reverse inequality, let w be such that f (x) < w. Then, (x, w) ∈
ri(epi(f )), while (y, (cl f )(y)) ∈ cl(epi(f )). From the Line Segment Principle, it
follows that

(αx + (1 − α)y, αw + (1 − α)(cl f )(y)) ∈ ri(epi(f )), ∀ α ∈ (0, 1].

Hence,

f (αx + (1 − α)y) < αw + (1 − α)(cl f )(y), ∀ α ∈ (0, 1].
By taking the limit as α → 0, we obtain
lim inf_{α↓0} f (y + α(x − y)) ≤ (cl f )(y),
ri(dom(f )) = ∩_{i=1}^m ri(dom(fi )),

it follows that x ∈ ri(dom(f )). By using part (c), we have for every y ∈ dom(cl f ),

(cl f )(y) = lim_{α↓0} f (y + α(x − y)) = Σ_{i=1}^m lim_{α↓0} fi (y + α(x − y)) = Σ_{i=1}^m (cl fi )(y).
1.36
1.37 (Properties of Cartesian Products)
(a) We first show that the convex hull of X is equal to the Cartesian product of
the convex hulls of the sets Xi , i = 1, . . . , m. Let y be a vector that belongs to
conv(X). Then, by definition, for some k, we have
y = Σ_{i=1}^k αi yi , with αi ≥ 0, i = 1, . . . , k, Σ_{i=1}^k αi = 1,
where yi ∈ X for all i. Since yi ∈ X, we have that yi = (xi1 , . . . , xim ) for all i,
with xi1 ∈ X1 , . . . , xim ∈ Xm . It follows that
y = Σ_{i=1}^k αi (xi1 , . . . , xim ) = (Σ_{i=1}^k αi xi1 , . . . , Σ_{i=1}^k αi xim ),
(x^1_1 , x^2_{r1} , . . . , x^m_{rm−1} ), (x^1_2 , x^2_{r1} , . . . , x^m_{rm−1} ), . . . , (x^1_{k1} , x^2_{r1} , . . . , x^m_{rm−1} ),
for all possible values of r1 , . . . , rm−1 , i.e., we fix all components except the
first one, and vary the first component over all possible x1j ’s used in the convex
combination that yields y1 . Since all these vectors belong to X, their convex
combination given by
(Σ_{j=1}^{k1} α^1_j x^1_j , x^2_{r1} , . . . , x^m_{rm−1} )
belongs to the convex hull of X for all possible values of r1 , . . . , rm−1 . Now,
consider the vectors
(Σ_{j=1}^{k1} α^1_j x^1_j , x^2_1 , . . . , x^m_{rm−1} ), . . . , (Σ_{j=1}^{k1} α^1_j x^1_j , x^2_{k2} , . . . , x^m_{rm−1} ),
i.e., fix all components except the second one, and vary the second component
over all possible x2j ’s used in the convex combination that yields y2 . Since all
these vectors belong to conv(X), their convex combination given by
(Σ_{j=1}^{k1} α^1_j x^1_j , Σ_{j=1}^{k2} α^2_j x^2_j , . . . , x^m_{rm−1} )
belongs to the convex hull of X for all possible values of r2 , . . . , rm−1 . Proceeding
in this way, we see that the vector given by
(Σ_{j=1}^{k1} α^1_j x^1_j , Σ_{j=1}^{k2} α^2_j x^2_j , . . . , Σ_{j=1}^{km} α^m_j x^m_j )
x^i_j ∈ Xj . Thus,

y = Σ_{i=1}^r β^i (x^i_1 , . . . , x^i_m ) = (Σ_{i=1}^r β^i x^i_1 , . . . , Σ_{i=1}^r β^i x^i_m ),

y = (Σ_{j=1}^{r1} β^j_1 x^j_1 , . . . , Σ_{j=1}^{rm} β^j_m x^j_m ).
belong to aff(X), and so does their sum, which is the vector y. Thus, y ∈ aff(X),
concluding the proof.
(b) Assume that y ∈ cone(X). We can represent y as
y = Σ_{i=1}^r αi y^i ,

for some r, where α1 , . . . , αr are nonnegative scalars and y^i ∈ X for all i. Since
y^i ∈ X, we have that y^i = (x^i_1 , . . . , x^i_m ) with x^i_j ∈ Xj . Thus,

y = Σ_{i=1}^r αi (x^i_1 , . . . , x^i_m ) = (Σ_{i=1}^r αi x^i_1 , . . . , Σ_{i=1}^r αi x^i_m ),
where x^j_i ∈ Xi and α^j_i ≥ 0 for each i and j. Since each Xi contains the origin,
we have that the vectors

(Σ_{j=1}^{r1} α^j_1 x^j_1 , 0, . . . , 0), (0, Σ_{j=1}^{r2} α^j_2 x^j_2 , 0, . . . , 0), . . . , (0, . . . , 0, Σ_{j=1}^{rm} α^j_m x^j_m ),
belong to cone(X), and so does their sum, which is the vector y. Thus,
y ∈ cone(X), concluding the proof.
Finally, consider the example where
Let x̄ = (x̄1 , . . . , x̄m ) ∈ ri(X). Then, by Prop. 1.4.1(c), we have that for all
x = (x1 , . . . , xm ) ∈ X, there exists some γ > 1 such that

x̄ + (γ − 1)(x̄ − x) ∈ X.

Equivalently, for all i = 1, . . . , m,

x̄i + (γ − 1)(x̄i − xi ) ∈ Xi ,

which, by Prop. 1.4.1(c), implies that x̄i ∈ ri(Xi ), i.e., x̄ ∈ ri(X1 ) × · · · × ri(Xm ).
Conversely, let x = (x1 , . . . , xm ) ∈ ri(X1 ) × · · · × ri(Xm ). The above argument
can be reversed through the use of Prop. 1.4.1(c), to show that x ∈ ri(X). Hence,
the result follows.
Finally, let us show that
RX = RX1 × · · · × RXm .
Let y = (y1 , . . . , ym ) ∈ RX . By definition, this implies that for all x ∈ X and
α ≥ 0, we have x + αy ∈ X. From this, it follows that for all xi ∈ Xi and α ≥ 0,
xi + αyi ∈ Xi , so that yi ∈ RXi , implying that y ∈ RX1 × · · · × RXm . Conversely,
let y = (y1 , . . . , ym ) ∈ RX1 × · · · × RXm . By definition, for all xi ∈ Xi and α ≥ 0,
we have xi + αyi ∈ Xi . From this, we get for all x ∈ X and α ≥ 0, x + αy ∈ X,
thus showing that y ∈ RX .
1.39 (Recession Cones of Relative Interiors)
where the equalities follow from part (a) and the assumption that C = ri(C).
To see that the inclusion R_C̄ ⊂ R_C can fail when C ≠ ri(C), consider the sets

\[
C = \bigl\{ (x_1, x_2) \mid x_1 \ge 0,\ 0 < x_2 < 1 \bigr\}, \qquad
\bar{C} = \bigl\{ (x_1, x_2) \mid x_1 \ge 0,\ 0 \le x_2 < 1 \bigr\},
\]
1.40
\[
\bar{C}_{k+1} \subset \bar{C}_k, \qquad \forall\, k,
\]

showing that assumption (1) of Prop. 1.5.6 is satisfied. Similarly, since by assumption X_k ∩ C_k is nonempty for all k, we have that, for all k, the set

\[
X \cap \bar{C}_k = X \cap X_k \cap C_k = X_k \cap C_k
\]
is nonempty, showing that assumption (2) is satisfied. Finally, let R denote the set R = ∩_{k=0}^∞ R_{C̄_k}. Since by assumption C̄_k is nonempty for all k, we have, by part (e) of the Recession Cone Theorem, that R_{C̄_k} = R_{X_k} ∩ R_{C_k}, implying that

\[
R = \bigcap_{k=0}^{\infty} R_{\bar{C}_k}
  = \bigcap_{k=0}^{\infty} \bigl( R_{X_k} \cap R_{C_k} \bigr)
  = \Bigl( \bigcap_{k=0}^{\infty} R_{X_k} \Bigr) \cap \Bigl( \bigcap_{k=0}^{\infty} R_{C_k} \Bigr)
  = R_X \cap R_C.
\]

Hence,

\[
R_X \cap R = R_X \cap R_C \subset L_C,
\]

and

\[
R_X \cap R \subset L_C \cap L_X = L,
\]

showing that assumption (3) of Prop. 1.5.6 is satisfied, and thus proving that the intersection X ∩ (∩_{k=0}^∞ C̄_k) is nonempty.
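A condition like assumption (3) cannot simply be dropped: nested nonempty closed sets can have empty intersection when a common recession direction is not a lineality direction. The one-dimensional family below is my own illustration, not taken from the text.

```python
# Hypothetical illustration (mine): C_k = [k, inf) in R.
# Each C_k is closed and nonempty, and the family is nested,
# but the common recession direction +1 is not in the lineality space {0},
# and indeed the intersection over all k is empty.

def in_Ck(x, k):
    return x >= k

# every C_k is nonempty (k itself belongs to C_k) ...
assert all(in_Ck(float(k), k) for k in range(100))

# ... yet any fixed x is eventually excluded: x is not in C_k once k > x
x = 1e6
print(in_Ck(x, int(x) + 1))  # False
```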
1.41
is closed. Since A · C ⊂ A · cl(C) and y ∈ cl(A · C), it follows that y is in the closure of A · cl(C), so that the set

\[
\bar{C}_\epsilon = \mathrm{cl}(C) \cap \bigl\{ x \mid \| Ax - y \| \le \epsilon \bigr\}
\]

is nonempty for every ε > 0. Furthermore, the recession cone of the set {x | ‖Ax − y‖ ≤ ε} coincides with the null space N(A), so that R_{C̄_ε} = R_{cl(C)} ∩ N(A). By assumption we have R_{cl(C)} ∩ N(A) = {0}, and by part (c) of the Recession Cone Theorem (cf. Prop. 1.5.1), it follows that C̄_ε is bounded for every ε > 0. Now, since the sets C̄_ε are nested nonempty compact sets, their intersection ∩_{ε>0} C̄_ε is nonempty. For any x in this intersection, we have x ∈ cl(C) and Ax − y = 0, showing that y ∈ A · cl(C). Hence, cl(A · C) ⊂ A · cl(C). The converse A · cl(C) ⊂ cl(A · C) is clear, since for any x ∈ cl(C) and sequence {x_k} ⊂ C converging to x, we have Ax_k → Ax, showing that Ax ∈ cl(A · C). Therefore,

\[
\mathrm{cl}(A \cdot C) = A \cdot \mathrm{cl}(C). \tag{1.13}
\]
Conversely, let y ∈ R_{A·cl(C)}. We will show that y ∈ A · R_{cl(C)}. This is true if y = 0, so assume that y ≠ 0. By the definition of direction of recession, there is a vector z ∈ A · cl(C) such that z + αy ∈ A · cl(C) for every α ≥ 0. Let x̄ ∈ cl(C) be such that Ax̄ = z, and for every positive integer k, let x_k ∈ cl(C) be such that Ax_k = z + ky. Since y ≠ 0, the sequence {Ax_k} is unbounded, implying that {x_k} is also unbounded (if {x_k} were bounded, then {Ax_k} would be bounded, a contradiction). Because x_k ≠ x̄ for all k, we can define

\[
u_k = \frac{x_k - \bar{x}}{\| x_k - \bar{x} \|}, \qquad \forall\, k.
\]

Let u be a limit point of {u_k}, and note that u ≠ 0. It can be seen that u is a direction of recession of cl(C) [this can be done similarly to the proof of part (c) of the Recession Cone Theorem (cf. Prop. 1.5.1)]. By taking an appropriate subsequence if necessary, we may assume without loss of generality that lim_{k→∞} u_k = u. Then, by the choices of u_k and x_k, we have

\[
Au = \lim_{k \to \infty} A u_k
   = \lim_{k \to \infty} \frac{A x_k - A \bar{x}}{\| x_k - \bar{x} \|}
   = \lim_{k \to \infty} \frac{k\, y}{\| x_k - \bar{x} \|},
\]
and the linear transformation A that maps (x_1, x_2) ∈ ℜ² into x_1 ∈ ℜ. Then, C is closed and its recession cone is

\[
R_C = \bigl\{ (x_1, x_2) \mid x_1 = 0,\ x_2 \ge 0 \bigr\},
\]
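The definition of the set C in this counterexample is not shown in this excerpt. One standard closed convex set with exactly the stated recession cone is the epigraph of x², which I assume here purely for illustration: with A(x_1, x_2) = x_1 we get A·C = ℜ, so R_{A·C} = ℜ, while A·R_C = {0}, and the inclusion R_{A·C} ⊂ A·R_C fails.

```python
# Hypothetical C (my assumption, not necessarily the set intended in the text):
# C = {(x1, x2) | x2 >= x1**2}, a closed convex set with
# R_C = {(0, x2) | x2 >= 0}. With A(x1, x2) = x1, A(C) = R.

def in_C(p):
    x1, x2 = p
    return x2 >= x1 ** 2

x = (2.0, 4.0)
# (0, 1) is a recession direction along this sample point, (1, 0) is not:
print(all(in_C((x[0], x[1] + a)) for a in (0.0, 1.0, 100.0)))  # True
print(all(in_C((x[0] + a, x[1])) for a in (0.0, 1.0, 100.0)))  # False

# every real t lies in A(C), since (t, t**2) is in C:
print(all(in_C((t, t * t)) for t in (-1e6, 0.0, 1e6)))         # True
```

So A·C is all of ℜ and recedes in both directions, even though A maps R_C to {0}.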
1.42
Let S be defined by

\[
S = R_{\mathrm{cl}(C)} \cap N(A),
\]

and note that S is a subspace of L_{cl(C)} by the given assumption. Then, by Lemma 1.5.4, we have

\[
\mathrm{cl}(C) = \bigl( \mathrm{cl}(C) \cap S^{\perp} \bigr) + S,
\]

so that the images of cl(C) and cl(C) ∩ S⊥ under A coincide [since S ⊂ N(A)], i.e.,

\[
A \cdot \mathrm{cl}(C) = A \cdot \bigl( \mathrm{cl}(C) \cap S^{\perp} \bigr). \tag{1.14}
\]

Because A · C ⊂ A · cl(C), we have

\[
\mathrm{cl}(A \cdot C) \subset \mathrm{cl}\bigl( A \cdot \mathrm{cl}(C) \bigr),
\]

which in view of Eq. (1.14) gives

\[
\mathrm{cl}(A \cdot C) \subset \mathrm{cl}\Bigl( A \cdot \bigl( \mathrm{cl}(C) \cap S^{\perp} \bigr) \Bigr).
\]

Define

\[
\bar{C} = \mathrm{cl}(C) \cap S^{\perp},
\]

so that the preceding relation becomes

\[
\mathrm{cl}(A \cdot C) \subset \mathrm{cl}(A \cdot \bar{C}). \tag{1.15}
\]

The recession cone of C̄ is given by

\[
R_{\bar{C}} = R_{\mathrm{cl}(C)} \cap S^{\perp} \tag{1.16}
\]

[cf. part (e) of the Recession Cone Theorem, Prop. 1.5.1], from which, since S = R_{cl(C)} ∩ N(A), we have

\[
R_{\bar{C}} \cap N(A) = S \cap S^{\perp} = \{0\}.
\]
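The decomposition cl(C) = (cl(C) ∩ S⊥) + S used above can be checked on a concrete instance. The set below is my own choice (not from the text): C = ℜ × [0, 1] with S = ℜ × {0}, a subspace of the lineality space, so S⊥ = {0} × ℜ and C ∩ S⊥ = {0} × [0, 1].

```python
# Hypothetical instance (mine): C = R x [0, 1], S = R x {0}.
# Then C ∩ S_perp = {0} x [0, 1], and (C ∩ S_perp) + S should equal C.

def in_C(p):
    return 0 <= p[1] <= 1

def in_decomposition(p):
    # write p = q + s with s = (p[0], 0) in S and q = (0, p[1]);
    # s is always in S, so membership reduces to q in C ∩ S_perp
    q = (0.0, p[1])
    return in_C(q)

for p in [(-5.0, 0.0), (3.0, 1.0), (0.0, 0.5), (2.0, 2.0)]:
    assert in_C(p) == in_decomposition(p)
print("decomposition agrees on samples")
```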
1.43 (Recession Cones of Vector Sums)
1.44
where the Q_{ij} are appropriately defined symmetric positive semidefinite mn × mn matrices and the a_{ij} are appropriately defined vectors in ℜ^{mn}. Hence, the set C is specified by convex quadratic inequalities. Thus, we can use Prop. 1.5.8(c) to assert that the set A · C = C_1 + ⋯ + C_m is closed.
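A closedness result such as Prop. 1.5.8(c) is genuinely needed here, since a vector sum of closed convex sets can fail to be closed. The pair of sets below is my own standard illustration, not taken from the text.

```python
# Hypothetical illustration (mine): C1 = {(x1, x2) | x1 > 0, x2 >= 1/x1}
# and C2 = R x {0} are both closed and convex, but
# C1 + C2 = R x (0, inf) is not closed: (0, eps) is in the sum for
# every eps > 0, while the limit point (0, 0) is not.

def in_sum(p):
    # (s, x2) lies in C1 + C2 iff x2 > 0 (take x1 = 1/x2 in C1)
    return p[1] > 0

print(all(in_sum((0.0, eps)) for eps in (1.0, 1e-3, 1e-9)))  # True
print(in_sum((0.0, 0.0)))                                    # False
```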
Helly’s Theorem implies that the sets C̄_i defined in the hint are nonempty. These sets are also nested and satisfy the assumptions of Props. 1.5.5 and 1.5.6. Therefore, the intersection ∩_{i=1}^∞ C̄_i is nonempty. Since

\[
\bigcap_{i=1}^{\infty} \bar{C}_i \subset \bigcap_{i=1}^{\infty} C_i,
\]