Convex Analysis: Theory and Applications
Translations of Mathematical Monographs
Volume 222
AMS Subcommittee: Robert D. MacPherson, Grigorii A. Margulis, James D. Stasheff (Chair)
ASL Subcommittee: Steffen Lempp (Chair)
IMS Subcommittee: Mark I. Freidlin (Chair)
G. G. Magaril-Il'yaev, V. M. Tikhomirov
VYPUKLYI ANALIZ: TEORIYA I PRILOZHENIYA
(Convex Analysis: Theory and Applications)
URSS, Moscow, 2000
Translated by Dmitry Chibisov
The present translation was created under license for the American Mathematical Society and is published by permission.
www.ams.org/bookpages/mmono-222
Library of Congress Cataloging-in-Publication Data
Magaril-Il'yaev, G. G. (Georgii G.), 1944-
[Vypuklyi analiz: teoriia i prilozheniia. English]
Convex analysis: theory and applications / G. G. Magaril-Il'yaev, V. M. Tikhomirov; translated by Dmitry Chibisov.
p. cm. (Translations of mathematical monographs, ISSN 0065-9282; v. 222)
Includes bibliographical references and index.
ISBN 0-8218-3525-4 (acid-free paper)
1. Convex geometry. 2. Discrete geometry. 3. Functional analysis. 4. Operator theory. I. Tikhomirov, V. M. (Vladimir Mikhailovich), 1934- II. Title. III. Series.
QA639.5.M3413 2003
516.3'62 dc22
2003062858
© 2003 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.
The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Information on copying and reprinting can be found in the back of this volume. Visit the AMS home page at http://www.ams.org/
Contents
Preface
Introduction
Chapter 1. Theory
1. Basic definitions
2. Duality in convex analysis
3. Convex calculus
4. Finite-dimensional convex geometry
5. Convex extremal problems
6. Supplement: Convex analysis in vector spaces
Chapter 2. Applications
7. Convex analysis of subspaces and cones and the theory of linear equations and inequalities
8. Classical inequalities, problems of geometry and mechanics
9. Kolmogorov-type inequalities for derivatives
10. Convex analysis and extremal problems of approximation and recovery
Chapter 3. Appendix
11. Basic theorems of convex analysis
12. Supplementary topics of convex analysis
13. Convex analysis and the theory of extremum
Bibliography
Index
Preface
Convex analysis is a branch of mathematics that studies convex sets, convex functions, and convex extremal problems. Historically the beginnings of convex analysis date back to antiquity, but its name was coined only in the 1960s. Many remarkable facts of convex analysis have been obtained quite recently. So this is at the same time an ancient and a modern part of mathematics. Convex analysis is closely related to geometry (in which it has its origins, since convexity is a geometric notion), but it is also deeply connected with analysis, and these latter connections have stimulated intense interest in convex analysis in the recent period. Hence the duality in the very name: convex analysis.

At first sight it may seem surprising that convex functions and convex sets, which are very special objects of analysis and geometry, have diverse applications in mathematics, mathematical physics, technology, and economics. In fact, there are good grounds for that (which will be explained in this book), but in any case, the fact that the theory of convexity has diverse and fruitful applications is now so unquestionable that basic knowledge of convex analysis is nowadays necessary for almost every mathematician (especially for those dealing with applications). Moreover, studying convex analysis can be motivated aesthetically, because it comprises many beautiful phenomena and facts. Apparently, certain elements of the convexity theory should become part of mathematical education at any level. The present book is intended to contribute to this.

Convex analysis is one of the disciplines which require little preliminary knowledge for their study. This book was also written for a broad readership. Especially for "beginners" we have written the Introduction, where the fundamentals of convex analysis are illustrated in an elementary finite-dimensional setting.
Moreover, the subject matter of Chapter 1, dealing with theoretical aspects of the convexity theory, also relies only on the notion of n-dimensional space and therefore, we believe, is accessible to a fairly broad readership. The geometrically minded reader is advised to combine reading with drawing, since many concepts and proofs have a very simple (and beautiful) geometric interpretation (which we tried to demonstrate by our figures).

This book was originally conceived as the translation of the Russian publication by Magaril-Il'yaev and Tikhomirov (2000), but in the course of preparation of the English edition many parts were essentially revised, so that it became actually a different book. For this reason we somewhat changed its title.

We express our deep gratitude to our colleagues, J. Brinkhuis, A. D. Ioffe, S. V. Konyagin, V. L. Levin, K. Yu. Osipenko, and V. Yu. Protasov, for the material kindly provided to us, for useful discussions, and for assistance in preparation of the Russian edition of this book. We thank the student S. S. Chudova for drawing the figures. We are grateful to the translator, Professor D. M. Chibisov, for competent advice and comments which led to improvement of the book.
Introduction
1. What is the subject of convex analysis? Recall that a set in a plane (or, generally, in a vector space) is said to be convex if, along with any two of its points, it contains the entire line segment joining them; see Figure 1.
FIGURE 1. a) Convex set. b) Non-convex set.
The boundary of a planar convex set is a convex curve, and in the case of a higher dimension it is a convex surface. A real-valued function defined on the real line or on a vector space is a convex function if its epigraph, i.e., the set lying over its graph, is convex (or, in other words, if the line segment joining any two points of the graph lies on or above the graph); see Figure 2. Seeking the minimum of a convex function on a convex set is referred to as a convex extremal problem or a problem of convex programming. Convex analysis is the branch of mathematics which studies convex objects, i.e., convex sets, convex functions, and convex extremal problems.
FIGURE 2. a), b) Graphs of convex functions; the labels f(x), x₁, x₂, and dom f mark the graphs and the domain on the horizontal axis.
2. Brief historical review. The concept of convexity appeared in antiquity. In his treatise "On the Sphere and Cylinder" Archimedes wrote (Axiom 4): "I call convex in one and the same direction the surfaces for which the straight line joining two arbitrary points [...] lies on the same side of the surface." It is the same definition as we gave above (with a slight modification: our definition involves a "line segment" rather than a "straight line"; no doubt, this is what Archimedes meant). Surprisingly, "our" definition was given only in the 20th century. The first modern definition of a convex set is attributed to the German mathematician E. Steinitz, who proposed it in 1913.

As a separate branch of geometry, convex geometry appeared in the 19th century in the works of Cauchy (who proved, in particular, the theorem on rigidity of polyhedra), Steiner, and Minkowski. Convex geometry as a new field of mathematics takes its origin from the publication of the book by Minkowski (1910). This book influenced the formation of a new field in mathematics, viz., functional analysis. Convex geometry came into vogue in the 1930s. The monograph of the German mathematicians Bonnesen and Fenchel (1934) summarized the development in this direction over the preceding century.

In the 1940s the theory of convexity experienced a striking surge of interest, owing primarily to the so-called problems of linear programming, which consist in finding an extremum of a linear function on a polyhedron (specified by a finite system of linear equalities and inequalities). Many problems of this kind occur in economics.
The pioneering role in the development of linear programming as a tool of economic planning was played by the Soviet mathematician and economist L. V. Kantorovich, who wrote the first treatise on this subject, Kantorovich (1939). For certain political reasons he was not allowed at that time to continue activity in this area. For many years subsequent chapters of the new theory were being written outside the USSR.

An intensive development of linear programming, and then of convex optimization in a broader sense, began in the USA by the end of World War II and right after it. It was stimulated by the problems of economics as well as by military needs, in particular, the problems of governing the military forces. Among the many researchers who contributed to the rapid development of linear programming and especially its economic applications, we should mention the outstanding mathematicians L. V. Kantorovich and J. von Neumann as well as the great economists W. Leontief and T. Koopmans. (For the elaboration of the mathematical foundations of the theory of economics, Kantorovich and Koopmans were awarded the 1975 Nobel Prize for Economics.) The simplex method, the powerful algorithm for solving linear programming problems which was introduced by the American mathematician G. B. Dantzig, greatly enhanced the practical applicability of linear programming. The book Dantzig (1963) is the first fundamental monograph on linear programming.

In 1949, Fenchel (who had emigrated to Canada) gave a course of lectures which laid the foundation for the theory of convex functions. His monograph Fenchel (1951) stimulated the further development of convexity theory. Two decades later Rockafellar (1970) set out the recent progress in this direction. The title of this book, Convex Analysis, was the first appearance of this combination of words. In the Preface the author thanks Professor A. W. Tucker of Princeton University, saying that it was he who proposed the title of this book. Since then the term "convex analysis" has become generally accepted. We mention also the monographs by Ekeland and Temam (1976), presenting many applied aspects of convex analysis, and Ioffe and Tikhomirov (1974), containing an introduction to convex analysis and giving a detailed account of its use in extremal problems. In subsequent years convex analysis made further progress, which will be partly covered in this book.

We begin with a review of the basic ideas and principles of convex analysis; along with this we will set out the structure of the book.
3. Elementary convex geometry. We will review here the key issues of convex geometry with emphasis on their geometric essence. All the proofs will be carried through on a Euclidean plane in a purely geometric manner, with the aid only of ruler, compass, and imagination. For example, we will not use functions, nor even numbers (except in the parts in small print), although subsequently, beyond this Introduction, the proofs will be mostly algebraic. The proofs we give here are so simple and natural that, looking at the figures, one can see the validity of the theorems and the essence of the proofs without logical arguments.

For the sequel we need the notion of closedness. The easiest way to define it is to use that of an open set. A set in the plane is open if with any of its points it contains a disk centered at this point. A set is closed if its complement is open. Note also that any straight line partitions the plane into three parts: the line itself and two (open) half-planes. If we join the line to one of them, this will be a closed half-plane. Henceforth we call it simply a half-plane. As in school textbooks, we denote the points of the plane by capital letters, while the sets (sometimes called figures) are denoted by script style letters.

We begin with the theorem that is fundamental for all convex analysis, namely, the theorem on strict separation (also called in the sequel the second separation theorem).

DEFINITION 1. Let A be a set and B a point in the plane. A straight line H strictly separates the set A from the point B if A lies in one of the half-planes bounded by H, while B lies in the other one and does not belong to H (or, in other words, B lies in the other open half-plane); see Figure 3.
THEOREM (on strict separation). A convex closed set in the plane may be strictly separated from a point not belonging to it. PROOF. Let A be a convex closed set and B a point outside it. From among the points of A we can find the point C closest to B (see Figure 3). Let us imagine that A is an island amongst still waters of a lake, and we drop a stone "into point B" in the lake. At first the circle
FIGURE 3
extending from this point does not reach A, but at some time the wave front will first touch A at some point C, which will be the closest point to B. For a long time mathematicians considered this type of assertion on existence of the closest point as obvious, accepting it without proof. In the 19th century A. Cauchy and K. Weierstrass provided a proof based on the notions of continuity and compactness. We will outline it here (remaining "within the plane").

A real-valued function f defined on the plane is said to be continuous at a point A₀ if for any number ε > 0 there is a number δ > 0 such that the values of f at A₀ and at a point A differ in absolute value by less than ε whenever the distance between A and A₀ is less than δ. Weierstrass proved the important theorem that a continuous function on a compact set (in our case on a bounded closed set) attains its maximum and minimum values on this set.

Let us apply this theorem to our case. The distance from a fixed point B to a point of the plane is a continuous function on the plane. Take some point A' in A and consider the set A₁ which is the intersection of A and the disk with center at B of radius equal to the distance from B to A' (see Figure 3). By the Weierstrass theorem the distance from B to a point of the plane attains its minimum on A₁ (and hence on A) at some point C.

Now the line H perpendicular to the segment [C, B] and passing through its mid-point solves the problem. Indeed, point B obviously does not belong to H. Assume that the same (open) half-plane where B lies contains some point D from A. Let us drop a perpendicular
from B to the segment [C, D] (or the straight line containing it). Denote its base by D'. If D' lies in [C, D], then the length of [B, D'] is less than that of [C, B] (as a leg and the hypotenuse of a right triangle). In case D' lies outside [C, D], then BD is less than BC (since the angle BDC is obtuse). Thus we arrive at a contradiction with the fact that C is the closest point to B in A. □
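To make the construction above concrete, here is a small numerical sketch (ours, not the book's): for the closed unit disk and an outside point B, the nearest point C and the perpendicular bisector of [C, B] give a strictly separating line.

    import numpy as np

    # Convex closed set A: the closed unit disk centered at the origin.
    B = np.array([3.0, 1.0])                  # a point outside A

    # Nearest point C of the disk to B (radial projection).
    C = B / np.linalg.norm(B)

    # Separating line H = {x : a.x = gamma}: perpendicular to [C, B]
    # and passing through the midpoint of that segment.
    a = B - C
    gamma = a @ (B + C) / 2

    # Every point of the disk satisfies a.x <= |a| < gamma, while a.B > gamma.
    boundary = np.array([[np.cos(t), np.sin(t)] for t in np.linspace(0, 2 * np.pi, 360)])
    print(np.all(boundary @ a <= gamma))      # True: A lies in one closed half-plane
    print(a @ B > gamma)                      # True: B lies strictly in the other one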
COROLLARY (Minkowski's theorem). For a figure in the plane to be an intersection of half-planes it is necessary and sufficient that it be convex and closed.
Indeed, an intersection of half-planes is a convex closed set (because the half-planes themselves are convex and closed). Now let A be a convex closed figure. Denote by A₁ the intersection of all half-planes containing it. If there were a point B in A₁ but not in A, then separating it from A would give us a contradiction.

The theorem on strict separation and its corollary were proved by H. Minkowski (1864-1909) for a finite-dimensional space and extended to the infinite-dimensional case by the Polish mathematician S. Mazur (1905-1981). This theorem is basic for infinite-dimensional convex analysis.

One of the main theorems of convex analysis is the Minkowski-Krein-Milman theorem on extreme points.
DEFINITION 2. The points of a set that are interior for no interval contained in this set are said to be extreme points of this set.
The extreme points of a triangle are its vertices; for a disk these are the points of its boundary circle (Figure 4).
FIGURE 4. a) A, B, C are the extreme points of the triangle ABC. b) F is an extreme point of the disk 𝒟.
DEFINITION 3. The intersection of all convex (convex and closed) sets containing a given set is its convex hull (convex closure).

THEOREM (Minkowski-Krein-Milman). A convex bounded planar set is the convex closure of its extreme points.

PROOF. First of all we prove that the set of extreme points of A is nonempty. Let H₀ be an arbitrary straight line. One possibility is that H₀ has some points in common with A and the set A lies entirely on one side of this line, in which case it is a support line to A. Otherwise, if H₀ is not a support line, we will "move" it, keeping it parallel to itself, until it becomes a support line (Figure 5a and b).
FIGURE 5. a) A is an extreme point of 𝒜. b) B is an extreme point of 𝒜.
Again, we will explain how this (intuitively obvious) assertion can be proved. Consider the function on the plane which is the distance from a point of the plane to H₀. This is a continuous function; hence by the Weierstrass theorem it attains its maximum and minimum on A. The parallels to H₀ at the maximal and minimal distance from it are support lines.

The intersection of a support line H₁ with A is an interval (which may degenerate into a point), and the end-points of this interval will be extreme points of A. (Prove it yourself.)

Now let A₁ be the convex closure of the extreme points of A. It is a convex and closed subset of A. Assume that there is a point B in A but not in A₁. By the theorem on strict separation the point B
can be strictly separated from A₁ by a line H₁. Let us find the point of A most distant from H₁ and lying in the same half-plane as B. By the above, the line H₁′ parallel to H₁ and passing through this point contains an extreme point of A, which contradicts our construction (Figure 6). □
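A quick computational illustration (ours; it assumes SciPy is available): for a finite planar set, the extreme points are exactly the vertices of its convex hull, and the convex closure of those vertices already recovers the hull of the whole set.

    import numpy as np
    from scipy.spatial import ConvexHull

    rng = np.random.default_rng(0)
    points = rng.standard_normal((40, 2))        # a bounded planar point set

    hull = ConvexHull(points)
    extreme = points[hull.vertices]               # the extreme points of co(points)

    # The convex closure of the extreme points loses nothing:
    # the hull of `extreme` has the same area as the hull of all the points.
    print(len(extreme), "extreme points out of", len(points))
    print(np.isclose(ConvexHull(extreme).volume, hull.volume))   # True ('volume' is area in 2D)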
FIGURE 6

EXERCISE 1. Show that the convex hull of a finite number of points is an intersection of finitely many half-planes.

The theorem on extreme points in the infinite-dimensional case was proved by the Soviet mathematicians M. G. Krein (1907-1989) and D. P. Milman (1913-1982) in 1940.

The finite-dimensional setup has a number of features unavailable in infinite-dimensional spaces. One of them (very important in applications) is the possibility of the so-called "clean-up",¹ based on the Helly theorem, which is a remarkable result of convex finite-dimensional geometry. We will prove this theorem here using two results which are of considerable independent interest.

THEOREM (Caratheodory). If a point belongs to the convex hull of a finite system of points, then it either coincides with a point of this system or belongs to an interval joining two points of the system or to a triangle with vertices in the system.
¹This is a literal translation of the term adopted in the Russian literature, meaning that some unnecessary things may be taken away. Specifically, it refers to the possibility of deleting all but a finite number of points, leaving their convex hull unchanged. There is no generally accepted term in the English literature. In the translation of Ioffe and Tikhomirov (1974) the term "decomposition" was used. In this book we adopted the translation "clean-up".
In other words, a point that belongs to the convex hull of a finite system of points belongs to the convex hull of at most three points of this system. This theorem was proved by the famous German mathematician of Greek origin, C. Caratheodory (1873-1950).

PROOF. The convex hull of a finite system of points is a polygon whose vertices are some points of this system (think it over yourself). If the point in question is one of the points of the system or lies on the interval joining two such points, then the proof is over. Assume the contrary. Let us draw a straight line through this point and a vertex of the polygon. It will cross the boundary of the polygon at a point of a side whose end-points belong to the system. Therefore our point belongs to the triangle with the vertices mentioned above (Figure 7). □
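A small numerical sketch of the Caratheodory statement (ours): for a planar point in the convex hull of a finite system, a brute-force search over triples with barycentric coordinates exhibits a containing triangle.

    import numpy as np
    from itertools import combinations

    def barycentric(p, a, b, c):
        """Coefficients (u, v, w) with u + v + w = 1 and u*a + v*b + w*c = p."""
        v, w = np.linalg.solve(np.column_stack([b - a, c - a]), p - a)
        return 1.0 - v - w, v, w

    system = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 1]], dtype=float)
    p = np.array([1.5, 1.5])                     # a point of the convex hull of `system`

    for i, j, k in combinations(range(len(system)), 3):
        try:
            u, v, w = barycentric(p, system[i], system[j], system[k])
        except np.linalg.LinAlgError:            # the three chosen points are collinear
            continue
        if min(u, v, w) >= -1e-12:               # nonnegative weights: p is in this triangle
            print("p lies in the triangle with vertices", i, j, k)
            break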
FIGURE 7. A belongs to the convex hull of {A₀, A_k, A_{k+1}}.

THEOREM (Radon). Any system of at least four points can be partitioned into two subsystems so that their convex hulls have a nonempty intersection.

PROOF. It suffices to prove this for a system of four points. Assume that these points do not lie on one straight line. Then their convex hull is either a triangle (see A₁, A₄, A₃ in Figure 8) or a quadrangle (see A₁, A₂, A₄, A₃). In the former case the fourth point belongs to the triangle (the convex hull of its vertices), and in the latter the diagonals (the convex hulls of pairs of points) have an intersection point. □

THEOREM (Helly). Let a family of at least three planar bounded closed convex sets be given. If any three sets of this family have a
nonempty intersection, then there exists a point which belongs to all of the sets of the family.

FIGURE 8

PROOF. We will prove this statement for a finite system of, say, r sets by induction on r. If r = 3, then the assertion follows from the condition. Assume that the theorem holds for r − 1 sets, and let a system 𝒜 = {A₁, ..., A_r} of r sets be given. By the condition, for any i (1 ≤ i ≤ r) there is a point B_i which belongs to the intersection of all the sets of the system except A_i. Since r is no less than four, we can apply Radon's theorem and partition the points ℬ = {B_i}, i = 1, ..., r, into two sets ℬ₁ and ℬ₂ such that their convex hulls have a nonempty intersection. Let B be a point in this intersection. Denote by 𝒜₁ and 𝒜₂ the corresponding subsystems of 𝒜. Each point in ℬ₁ belongs to every set in 𝒜₂, hence so does the convex hull of these points. In the same way the convex hull of ℬ₂ belongs to every set in 𝒜₁. Therefore their common point B belongs to every set of the system 𝒜 (Figure 9). □
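The pattern of Helly's theorem is easiest to check numerically in dimension one, where convex sets are intervals and "any two intersect" already forces a common point; the snippet below (ours) illustrates only this one-dimensional analogue, not the planar proof above.

    import numpy as np

    rng = np.random.default_rng(1)
    # Twenty closed intervals [l_i, r_i], built so that each contains the point 0.5;
    # in particular, any two of them intersect.
    l = rng.uniform(0.0, 0.5, 20)
    r = rng.uniform(0.5, 1.0, 20)

    pairwise = all(max(l[i], l[j]) <= min(r[i], r[j]) for i in range(20) for j in range(20))
    common = l.max() <= r.min()   # then every point of [max l_i, min r_i] lies in all intervals

    print(pairwise, common)       # True True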
EXERCISE 2. Assume that every three of finitely many points on the plane can be placed into a circle of unit radius. Show that all the points can be placed into a circle of unit radius.

EXERCISE 3. Assume that the distance between every two out of a finite number of points on the plane is no greater than one. Show that all these points can be placed into a circle of radius 1/√3 (Young's theorem on the plane).

The above theorem was proved by the Austrian mathematician
E. Helly (1884-1943) in 1913. He spoke about this theorem to his colleague J. Radon (1887-1956), an expert in analysis and measure
theory. Helly did not publish the proof of his theorem before World
War I. Then he was called up, was wounded at the Russian front, captured, treated in a hospital in Siberia, and returned to Austria only in 1921. His proof of the theorem appeared in Helly (1923). Radon found another proof, based on his own theorem, published in Radon (1915).

Now we turn to convex functions. We will consider functions of one (real) variable. Let us draw on our Euclidean plane a horizontal line and a vertical line perpendicular to it. Denote their intersection point by O. One can distinguish between two kinds of straight lines on the plane: the ones intersecting the vertical line and those parallel to it. The nonvertical lines passing through the point O are the graphs of linear functions. Generally, a nonvertical line on the plane is the graph of an affine function.

A set on the plane which with any of its points contains the entire ray going from this point "upwards" parallel to the vertical line is the epigraph of the function that associates with any point of the horizontal line the lowest point of the epigraph projecting into this point. The epigraph of a function f is denoted by epi f. A function is said to be convex if its epigraph is a convex set and closed if it is a closed set (Figure 10).

EXERCISE 4. Give an example of a function whose epigraph is convex but not closed.

The following important theorem is one of the fundamental results in convex analysis of functions.
THEOREM(Fenchel-Moreau). The epigraph of a function is the intersection of epigraphs of affine functions if and only if the function itself is convex and closed.
FIGURE 10
This theorem was proved in the finite-dimensional case by the German mathematician W. Fenchel in 1949 (see Fenchel (1951)) and in the infinite-dimensional case by the French mathematician J. J. Moreau in 1960 (see Moreau (1962)).
PROOF. The "only if" part follows from the definitions: if the epigraph of a function is the intersection of epigraphs of affine functions, then the function is convex and closed. If the epigraph is empty (in terms of functions this means that the function is identically equal to infinity), then the assertion is obvious: take all constants for the affine functions (their graphs are horizontal lines). Suppose the epigraph A is nonempty and let B be a point on its boundary. Take a point B' "under" B on the same vertical line (Figure 11). The line Ho strictly separating B' and A cannot be vertical (because the vertical line through B' passes also through B, contrary to strict separation). Hence it is the graph of an affine function. Assume now that the set Ai which is the intersection of epigraphs of affine functions lying below the set A (clearly Ai contains A) does not coincide with A. Then there exists a point Bl in Ai but not in A. The line Hi separating Bl from A cannot be the graph of an affine function, because that would contradict the construction of the set Ai. Hence it is vertical. The line Ho cannot pass "over" the point Bl (which would lead us to the same contradiction). Since the line Ho is not vertical, it intersects the vertical line. Let C be the intersection point of Ho and Hi. Take a point B~ "over" Bl and draw the line
FIGURE 11
through C and B₁′. It is not vertical and lies over B₁. This again is a contradiction, which proves the theorem. □
4. The main objects of convex analysis. Besides the two main objects of the convexity theory mentioned above, convex sets and convex functions, we should also mention subspaces, convex cones, convex sets containing the origin (termed zero-convex sets), "translated subspaces" (termed affine subspaces or affine manifolds), and convex homogeneous functions. Recall that a set in a vector space is a cone (with vertex at the origin) if, along with any of its points, it contains the entire ray passing through this point from the origin; a function f is homogeneous (of first order) if f(αx) = αf(x) for any positive α. A convex cone is a cone which is a convex set, and a convex homogeneous function (also called a sublinear function) is a convex function which is at the same time a homogeneous function of first order.

5. Duality in convex analysis. One of the most important phenomena related to convexity is that of duality. It manifests itself already in terms of the spaces in which our convex objects are considered. Convexity is defined in a vector space, but the theory of convexity takes the most complete form in spaces endowed with the notion of closedness. This is possible in topological spaces where the topology is compatible with convexity, so that the "habitat" of convex analysis where it "naturally lives" is the so-called locally convex spaces (LCS) defined in the mid-30s by J. von Neumann (after A. N. Kolmogorov introduced the notion of
a linear (nowadays termed "vector") topological space). But in this book we will primarily stay on the "finite-dimensional level" (tackling the infinite-dimensional setting only in Section 12).

The set of linear functionals on an n-dimensional space (or of linear continuous functionals if the basic space is an LCS) forms the dual space. The fundamental principle of convex analysis may be formulated as follows: the convex objects (functions, sets, and extremal problems) have a twofold description; with each convex object in the basic space one can associate the dual object in the dual space; if the initial object has certain closedness properties, it can be uniquely reconstructed from the dual object. We will explain below how the reconstruction is done, and now we give some examples of the dual description of convex objects.

To make the discussion more specific, we recall the description of the finite-dimensional space ℝⁿ and its dual (a more detailed description will be given in the next section). The space ℝⁿ consists of all ordered collections of n real numbers, which we write as column-vectors
(but to save space we will also write x = (x₁, ..., xₙ)ᵀ, where ᵀ means the transpose). For such vectors the operations of (coordinate-wise) addition and (coordinate-wise) multiplication by a real number are defined in a natural way. With these operations ℝⁿ becomes a vector space.

As we said, the space dual to a vector space is the space of linear functionals defined on it. If a = (a₁, ..., aₙ), the map x → a·x = a₁x₁ + ⋯ + aₙxₙ, where x = (x₁, ..., xₙ)ᵀ ∈ ℝⁿ (with the dot denoting the matrix multiplication of the row-vector a by the column-vector x), obviously is a linear functional, and any linear functional on ℝⁿ may be written in this form. Thus the dual space to ℝⁿ may be realized as the set of all ordered collections of n real numbers written as row-vectors. We denote this space by (ℝⁿ)′.

Let a ∈ (ℝⁿ)′ and γ ∈ ℝ. The set H = H(a, γ) = {x ∈ ℝⁿ | a·x = γ} is a hyperplane (so that a hyperplane is a level set of a linear functional). Each hyperplane H(a, γ) determines two half-spaces H₊(a, γ) = {x ∈ ℝⁿ | a·x ≤ γ} and H₋(a, γ) = {x ∈ ℝⁿ | a·x ≥ γ}. Now we can illustrate the basic duality principle. We begin with convex sets.
Any convex closed set in ℝⁿ, on the one hand, is a point set which with any two points contains the line segment joining them and, on the other hand, is the intersection of all half-spaces containing it (Minkowski's theorem), so that convex sets are solutions to systems of (nonhomogeneous) inequalities. Thus we see that geometry and algebra unite in the description of convex sets.

Similarly, the other objects can also be described in a twofold way. A convex function with closed epigraph (called in this case a closed function) can be dually defined as the supremum of affine functions, i.e., functions which are the sum of a linear function and a constant. The dual description of a sublinear function is the supremum of linear functions. A convex closed cone (with vertex at the origin) is, on the one hand, the union of its constituent rays and, on the other hand, the intersection of half-spaces whose boundary hyperplanes pass through the origin (i.e., cones are solutions to systems of linear inequalities). A zero-convex set is, on the one hand, the union of intervals originating from the origin and, on the other hand, the intersection of half-spaces containing the origin (i.e., such sets are solutions to systems of inequalities with the same right-hand sides). Affine subspaces are defined geometrically as the sets containing, together with any two points, the entire straight line passing through them and, on the other hand, as the solutions of nonhomogeneous systems of equations. Subspaces are affine subspaces which are cones and, on the other hand, are solutions of homogeneous systems of equations.

This is how the unity of the geometric and analytic approaches to convexity manifests itself: convex objects are defined both geometrically, as sets of points with certain properties, and analytically, as solutions of systems of linear equations and inequalities.

As we said, the convex closed functions in the dual description are the suprema of affine functions. This leads us to the definition of the dual function in the dual space. In ℝⁿ it is defined as follows. Let f: ℝⁿ → ℝ ∪ {±∞}. The Legendre-Young-Fenchel transformation of f, or the conjugate function to f, is the function defined on the dual space (ℝⁿ)′ by the following rule:
f*(y) := sup_{x∈ℝⁿ} (y·x − f(x))

(see Figure 12). The function conjugate to f* (defined on ℝⁿ) is called the second conjugate of f and is denoted by f**.
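As a rough numerical sketch of this definition (ours, not the book's), one can approximate the supremum on a grid; for f(x) = x²/2 on ℝ the conjugate is again f*(y) = y²/2.

    import numpy as np

    x = np.linspace(-10.0, 10.0, 4001)        # grid over which the supremum is approximated
    f = 0.5 * x**2                            # f(x) = x^2 / 2

    def conjugate(y):
        """Discretized Legendre-Young-Fenchel transform: f*(y) = sup_x (y*x - f(x))."""
        return np.max(y * x - f)

    for y in (-2.0, 0.0, 1.5, 3.0):
        print(y, conjugate(y), 0.5 * y**2)    # the last two columns agree up to the grid error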
FIGURE 12
The following Fenchel-Moreau theorem holds: if f: ℝⁿ → ℝ ∪ {+∞}, then f** = f if and only if f is convex and closed. Geometrically this theorem means that a convex closed function is the envelope of a family of affine functions. This is the fundamental result in the duality theory of convex functions. In turn, this theorem (and actually the entire theory of duality in convex analysis) relies on the separation theorem, which was first proved in the finite-dimensional case by Steinitz in 1913, although it had already been known in an implicit form to Minkowski. (It was discussed above for a plane.)

Besides the conjugation operator for functions (which associates with each function its conjugate), there are other duality operators: the annihilator for subspaces (which associates with a subspace its annihilator), the conjugation operator for cones (which associates with a cone its conjugate), the polar for zero-convex sets (which associates with a set its polar), and other operators for affine subspaces and convex sets. The separation theorem states that these operators are involutory (i.e., the twofold application of the operator maps the object into itself).
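As an illustration of the Fenchel-Moreau theorem and of the involutive character of the conjugation operator (our example, not the book's), take the convex closed function f(x) = |x| on ℝ; both conjugates can be computed directly:

f*(y) = sup_{x∈ℝ} (yx − |x|) = 0 if |y| ≤ 1 and +∞ if |y| > 1,
f**(x) = sup_{|y|≤1} yx = |x| = f(x).

For a nonconvex function the second conjugate is, in general, only the convex closure of the function, not the function itself.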
6. Convex calculus. The calculus of smooth functions (differential calculus) hinges on the calculus of differentials; the main component of convex calculus is the "calculus" of subdifferentials. The subdifferential is an analog of the differential adapted for convex analysis. To explain this, let f be a function differentiable at a point x₀. Then the differential of f at x₀ is a linear function (i.e., an element of
the dual space) which, upon addition of the constant f(x₀), provides a good approximation of f in a neighborhood of the given point. In convex analysis the role of linear functions is played by sublinear functions. Such a function is the supremum of some family of linear functions. The corresponding set in the dual space is referred to as the subdifferential of the sublinear function, and each element of the subdifferential is a subgradient. Its formal definition is as follows: if p is a sublinear function, then its subdifferential is

∂p = {x′ ∈ (ℝⁿ)′ | x′·x ≤ p(x) ∀x ∈ ℝⁿ}.

Any convex function f is approximated locally (in a neighborhood of some point x₀) by the sum of a sublinear function and the constant f(x₀). The subdifferential of the sublinear function approximating f at the point x₀ is called the subdifferential of f at x₀. It is denoted by ∂f(x₀) and can be defined directly:

∂f(x₀) = {y ∈ (ℝⁿ)′ | f(x) − f(x₀) ≥ y·(x − x₀) ∀x ∈ ℝⁿ}

(in this notation the subdifferential of a sublinear function p is ∂p(0)).

Among the basic theorems of differential calculus there are theorems on the derivative of the sum of functions, on superposition of functions, and on inverse functions. In convex analysis there are analogs of these theorems, as well as results and formulas that have no analogs in differential calculus, e.g., the formula for the subdifferential of the maximum of convex functions. In particular, under certain assumptions on f₁ and f₂, we have the following formula for the sum (an analog of the corresponding theorem of differential calculus):
∂(f₁ + f₂)(x) = ∂f₁(x) + ∂f₂(x)
(the Moreau-Rockafellar formula), and the subdifferential of the maximum of two convex functions continuous at a given point and taking equal values at this point is the convex hull of the union of their subdifferentials; i.e., for the function f(x) = max(f₁(x), f₂(x)) we have the following formula (by Dubovitskii-Milyutin):
∂f(x) = {αy₁ + (1 − α)y₂ | y_i ∈ ∂f_i(x), i = 1, 2, 0 ≤ α ≤ 1}.
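For instance (our worked example), write |x| = max(f₁(x), f₂(x)) with f₁(x) = x and f₂(x) = −x; both functions are continuous and equal at x = 0, so the Dubovitskii-Milyutin formula gives

∂f₁(0) = {1},  ∂f₂(0) = {−1},  ∂|·|(0) = {α·1 + (1 − α)·(−1) | 0 ≤ α ≤ 1} = [−1, 1],

which agrees with the direct definition, since |x| − |0| ≥ y·x for all x exactly when |y| ≤ 1.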
This result has no counterpart in the classical calculus. At the same time, since convex functions are representable as suprema of affine ones, the formulas for the maxima or suprema of a family of functions play a particularly important role.

Let p be a sublinear function. It is not hard to show that the set ∂p (in the dual space) is convex and closed. Then a natural question is how to reconstruct the function from its subdifferential. It turns
out that this can be done with the aid of the support function. If A is a set in the dual space, then the function s_A(x) = sup_{y∈A} y·x is the support function of A. The necessary and sufficient condition for the equality s_{∂p} = p is that p be a closed sublinear function. Similarly, the equality ∂s_A = A holds if and only if A is a convex and closed set. These are duality relations as well, though unlike the case of involutive operators, such as the conjugation operator for convex functions, they are generated by a pair of operators: ∂, which maps sublinear functions into convex sets, and s, which maps convex sets into sublinear functions.

There are a number of operations that transform convex objects again into convex ones. Such are, e.g., the algebraic summation of sets or the usual summation of functions. According to the duality principle discussed above, each operation has its dual. For instance, consider the duality related to the Legendre-Young-Fenchel transform and the summation operation for functions. We can ask which operation on conjugate functions gives rise to the conjugate function of the sum of two functions, or, in other words, how can we obtain (f₁ + f₂)* from f₁* and f₂*? This question can be answered with the aid of the operation
(f₁ ⊕ f₂)(x) := inf_{x₁+x₂=x} (f₁(x₁) + f₂(x₂)),

which is called the convolution (or sometimes the infimal convolution). Under some assumptions on the functions the formula

(f₁ + f₂)* = f₁* ⊕ f₂*

holds, and the equality

(f₁ ⊕ f₂)* = f₁* + f₂*
is fulfilled in any case. In this sense the operations + and ⊕ are dual to each other. (Recently, the duality between linear operations and convolution caused the appearance of a new type of analysis parallel to linear analysis, the so-called idempotent analysis.)

Formulas similar to those presented above constitute convex calculus. There are more than 30 of them (see Section 3). The application of the duality operators listed above sometimes results in unusual operations on sets or functions (e.g., the so-called Kelley sum for sets and functions).

The fundamentals of convex calculus are presented in Sections 1 through 3 of Chapter 1. In particular, the theorems mentioned above are proved there.
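Returning to the convolution formulas above, here is a worked example (ours): take f₁(x) = f₂(x) = x²/2 on ℝ, so that f₁*(y) = f₂*(y) = y²/2. Then

(f₁ ⊕ f₂)(x) = inf_{x₁+x₂=x} (x₁²/2 + x₂²/2) = x²/4,
(f₁ + f₂)*(y) = sup_x (yx − x²) = y²/4 = (f₁* ⊕ f₂*)(y),

in agreement with both formulas.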
7. Finite-dimensional convex geometry. Finite-dimensional geometry is the cradle of convex analysis. The theory of convex sets began with the theory of convex polyhedra. The elements of this theory were laid down by Cauchy (though the problem of classification of regular or semi-regular polyhedra was posed and to a large extent solved in antiquity, in particular, by the Plato school, Archimedes, and others). Cauchy proved a remarkable theorem: If there are two convex polyhedra with congruent respective faces, then they are congruent themselves. But this direction developed into a systematic theory only in the works of Minkowski. He turned from polyhedra to convex bodies.

Among the fundamental results obtained in the 19th and early 20th centuries, one has to list Minkowski's theorem on existence and uniqueness of a polyhedron with given normals and areas of corresponding faces, Steiner's formula for the volume of the sum of a polyhedron and a Euclidean sphere, Steiner's symmetrization, the Brunn-Minkowski inequality, the Blaschke compactness theorem, and many others. All these results are presented in this book.

Recent progress in mathematics gave a new incentive to the development of the theory of convex polyhedra. One source of interest in this theory was provided by the studies on solvability of algebraic equations in multidimensional spaces and on partial differential equations. Moreover, numerous links between convexity and algebraic geometry were discovered.

8. Convex extremal problems. Let f₀, f₁, ..., f_m be convex functions on ℝⁿ and let A ⊂ ℝⁿ be a convex set. The problem
(P)    f₀(x) → min,    f_i(x) ≤ 0,  1 ≤ i ≤ m,    x ∈ A,
is referred to as a convex problem or a convex programming problem. If all functions in the problem (P) are affine, then (P) is called a linear programming problem.
If we consider the problem

f(x) → min,    x ∈ ℝⁿ
(which is called a problem without constraints), then the necessary and sufficient condition for f to attain the minimum at a point x̂ is the inclusion 0 ∈ ∂f(x̂). This inclusion, which can be verified trivially, is an analog of the Fermat theorem in calculus.

Finite-dimensional problems with constraints were considered by Lagrange in the 18th century. For the treatment of such problems he proposed a general principle, which is known now as the Lagrange principle. This principle says that for obtaining necessary conditions for a minimum in a problem with constraints, one has to compose a function (called a Lagrange function), which is the sum of the functional to be minimized and the constraints with indeterminate factors (called Lagrange multipliers), and to "minimize this function as if all variables were independent." For the problems considered by Lagrange this formulation has to be stated more precisely, but for the problems of convex programming it is correct literally: if x̂ solves the problem (P), then there are nonnegative Lagrange multipliers λ₀, λ₁, ..., λ_m, which are not all equal to zero, such that x̂ is a minimum point on A of the Lagrange function ℒ(x, λ₀, λ₁, ..., λ_m) (this relation is called the minimum principle) and the complementary slackness conditions λ_i f_i(x̂) = 0, i = 1, ..., m, are fulfilled. This is the Karush-Kuhn-Tucker theorem. And if λ₀ ≠ 0 (this is the nondegeneracy mentioned above), then the condition that the Lagrange multipliers are nonnegative, the complementary slackness conditions, and the minimum principle are sufficient for the element x̂ to solve the problem (P). Thus in the problems of convex programming, necessary conditions "almost" coincide with sufficient ones.

Besides these two notable features of convex problems (the most complete form of the Lagrange principle and the coincidence of necessary and sufficient conditions), there is one more feature, which is related to the duality of convex analysis objects discussed above: every convex programming problem has its dual, and it is worthwhile to investigate both problems jointly. Let us outline the construction of a dual problem. Let f: ℝⁿ → ℝ ∪ {±∞}. Consider the problem
(P₀)    f(x) → min,    x ∈ ℝⁿ.
The problem (P) can be reduced to this form by setting f(x) = f₀(x) if x satisfies the constraints in (P) and f(x) = +∞ otherwise. Let F: ℝⁿ × ℝᵐ → ℝ ∪ {+∞}. With each y ∈ ℝᵐ we associate the problem
F(x, y) → min,    x ∈ ℝⁿ.
The family of such problems is the perturbation of the problem (P₀). The dual problem to (P₀) (with respect to the given perturbation) is the problem
(P₀′)    −F*(0, y′) → max,    y′ ∈ (ℝᵐ)′,
where F*: (ℝⁿ)′ × (ℝᵐ)′ → ℝ ∪ {+∞} is the function conjugate to F. This duality construction is based on the Fenchel-Moreau theorem.

A few words about the methods of convex optimization, i.e., the algorithms for solving convex problems. We mention here the simplex method and the so-called method of sections. The simplex method, proposed by G. Dantzig, is a method of efficient descent along the vertices of the polyhedron of admissible elements in a linear programming problem. The underlying idea of the method of sections may be demonstrated by the univariate problem of smooth convex programming. Suppose we look for the absolute minimum of a convex smooth function f defined on the interval [0, 1], given that we can compute the derivative of f at any desired point. The question is, what amount of computation is needed in order to localize the minimum within an interval of length ε? It is not hard to see that the optimal method for finding the minimum point is the "method of bisection": first we compute f′(0) and f′(1). If they are of the same sign, the minimum is attained at one of the end-points (at 0 if both derivatives are positive and at 1 if both are negative), and the value of f at this point is the minimum. Otherwise we divide the interval into two halves, compute the derivative at the mid-point, and cut off the half-interval at whose end-points the derivatives have the same sign. On the remaining interval we proceed in the same way, and so on. Eventually we will come arbitrarily close to the minimum point of f. An important theorem of finite-dimensional geometry, the Grünbaum-Hammer theorem, carries over this method to the multidimensional case.
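A minimal sketch of the bisection scheme just described (ours), assuming only that the derivative f′ of the smooth convex function can be evaluated at any point of [0, 1]:

    def bisect_min(df, a=0.0, b=1.0, eps=1e-6):
        """Localize the minimizer of a smooth convex function on [a, b] using signs of f'."""
        if df(a) >= 0:               # f is nondecreasing on [a, b]: the minimum is at a
            return a
        if df(b) <= 0:               # f is nonincreasing on [a, b]: the minimum is at b
            return b
        while b - a > eps:           # now f'(a) < 0 < f'(b), so the minimizer is interior
            m = 0.5 * (a + b)
            if df(m) > 0:            # cut off the half whose end-point derivatives agree in sign
                b = m
            elif df(m) < 0:
                a = m
            else:
                return m
        return 0.5 * (a + b)

    # Example: f(x) = (x - 0.3)^2, f'(x) = 2(x - 0.3); the minimizer 0.3 is found to within eps.
    print(bisect_min(lambda x: 2.0 * (x - 0.3)))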
Besides these two methods, the book presents the method of ellipsoids and some recent work on the construction of algorithms for solving convex extremal problems. The theory of convex extremal problems is presented in Section 4 of Part 1.

9. Applications. The second part of our book deals with applications of the convexity theory to various areas of mathematics, as well as to economics, technology, and the natural sciences. The mathematical applications can be briefly listed as follows:
• Duality between the convex calculus of subspaces and the theory of linear equations (see Section 7, where the Fredholm alternative is discussed).
• Duality of the convex calculus of cones and the theory of linear inequalities (see Section 7, where the Farkas, Gale, and Ky Fan theorems are discussed).
• Duality of the convex calculus of zero-convex sets and finite-dimensional geometry (see H. Weyl's theorem in Section 7).
• Calculus of support functions, theory of recovery, and inequalities for derivatives (Sections 9 and 10).
• Calculus of Minkowski's functions and classical inequalities (Section 8).
Section 8 also contains applications of convex optimization to geometry and technology.

Many applications of convexity to problems of analysis, economics, and technology rely on the fundamental relationship between convexity and integration. This can be illustrated by the theorem on vector measures due to A. A. Lyapunov. In the simplest setting it reads as follows. Let p₁(·), ..., pₙ(·) be n integrable functions on the interval [0, 1]. With each measurable set E ⊂ [0, 1] we associate the point x(E) = (x₁(E), ..., xₙ(E)) ∈ ℝⁿ, where x_i(E) = ∫_E p_i(t) dt, 1 ≤ i ≤ n. Then the set of such points corresponding to all measurable subsets E of [0, 1] is a convex compact set in ℝⁿ. This phenomenon explains the role of convexity in the calculus of variations and optimal control (the fundamental results, such as the Legendre and Weierstrass conditions, Pontryagin's maximum principle, and others, rely on convexity resulting from integration). Connections of convex analysis with the general theory of extremum are presented in Section 13. Clearly, it is due to the optimization theory that convex analysis is
so widely applicable in geometry, approximation theory, theory of inequalities, and so on. A typical situation in economics is when many small elements are summed up. Such sums are naturally approximated by integrals, which also explains the role of convex analysis in mathematical economics. The third part of the book is the Appendix. Here we prove some theorems which are fundamental for the convexity theory and review some aspects of the modern development of convex analysis. This is briefly the contents of the book.
CHAPTER 1
Theory
1. Basic definitions

In this section we give the basic facts about the structure of the space ℝⁿ, study the properties of convex sets in ℝⁿ and convex functions on ℝⁿ, and prove the separation theorems.

1.1. The space ℝⁿ. In this chapter we study convex objects in the space ℝⁿ (with n a positive integer) which consists of all ordered n-tuples of real numbers. Each n-tuple will be represented by a column of these numbers, but to save space we will write it as x = (x₁, ..., xₙ)ᵀ, where ᵀ means the transpose. The n-tuples x will be called vectors (or column vectors). For the vectors in ℝⁿ the operations of addition and multiplication by a number are defined in a natural way: if x = (x₁, ..., xₙ)ᵀ and y = (y₁, ..., yₙ)ᵀ, then x + y = (x₁ + y₁, ..., xₙ + yₙ)ᵀ, and if x = (x₁, ..., xₙ)ᵀ ∈ ℝⁿ and α ∈ ℝ, then αx = (αx₁, ..., αxₙ)ᵀ.

The space ℝⁿ has dimension n (written dim ℝⁿ = n) in the sense that there exist n linearly independent vectors e₁, ..., eₙ (i.e., the equality α₁e₁ + ⋯ + αₙeₙ = 0 is possible only for α₁ = ⋯ = αₙ = 0) such that any vector x ∈ ℝⁿ can be represented as x = ξ₁e₁ + ⋯ + ξₙeₙ, and the (uniquely defined) numbers ξ₁, ..., ξₙ are the coordinates of x in the basis e₁, ..., eₙ. A basis of ℝⁿ is any collection of n linearly independent vectors in ℝⁿ. Obviously, the vectors e₁ = (1, 0, ..., 0)ᵀ, ..., eₙ = (0, ..., 0, 1)ᵀ form a basis in ℝⁿ. It is the standard basis, and the components of x = (x₁, ..., xₙ)ᵀ are the coordinates of the vector x in this basis.

Let l: ℝⁿ → ℝ be a linear functional, i.e., l(x + y) = l(x) + l(y) for any x, y ∈ ℝⁿ and l(αx) = αl(x) for any x ∈ ℝⁿ and α ∈ ℝ. If e₁, ..., eₙ is the standard basis in ℝⁿ and l(e_i) = x_i′, i = 1, ..., n, then for any x ∈ ℝⁿ we have l(x) = ∑_{i=1}^n x_i′ x_i. This sum may be written as a (matrix) product of the row vector x′ = (x₁′, ..., xₙ′)
by the column vector x = (x₁, ..., xₙ)ᵀ, which will be written as x′·x. Thus l(x) = x′·x for any x ∈ ℝⁿ. On the other hand, any row vector x′ defines a linear functional x → x′·x on ℝⁿ. Therefore the set of all linear functionals on ℝⁿ (called the dual space of ℝⁿ) may be identified with the space ℝⁿ itself, with elements written as row vectors. Nevertheless, in view of subsequent generalizations, it is expedient to distinguish between ℝⁿ and its dual (the set of all ordered n-tuples of real numbers written in a row, with the same operations of addition and multiplication by a number), which will be denoted by (ℝⁿ)′.

There are some important classes of subsets of ℝⁿ which will be frequently encountered. These are, first of all, subspaces, affine subspaces, hyperplanes, and half-spaces.

A subset L ⊂ ℝⁿ is a subspace if x + y ∈ L for any x, y ∈ L and αx ∈ L for any x ∈ L and α ∈ ℝ. In any subspace L ⊂ ℝⁿ one can find a set of linearly independent vectors e₁, ..., e_m, m ≤ n, such that any x ∈ L is representable as x = ξ₁e₁ + ⋯ + ξ_m e_m, and each such set has the same cardinality m, which is called the dimension of L, written dim L = m. It is proved in linear algebra that any subspace in ℝⁿ may be specified as the set of solutions to a linear system of homogeneous equations a_i·x = 0, i = 1, ..., s, where a_i ∈ (ℝⁿ)′. Subspaces in the plane are straight lines passing through the origin (one-dimensional subspaces), and in the three-dimensional space these are straight lines and planes passing through the origin (one- and two-dimensional subspaces, respectively).

A subset M ⊂ ℝⁿ is an affine subspace or affine manifold if M is a translated subspace, i.e., a set of the form x + L, where L is a subspace. The dimension of L is the dimension of M. These are all straight lines in the plane, and all straight lines and planes in the three-dimensional space.

Let a = (a₁, ..., aₙ) ∈ (ℝⁿ)′. A hyperplane H(a, γ) (determined by a vector a ∈ (ℝⁿ)′ and a number γ) is the set {x ∈ ℝⁿ | a·x = γ} (i.e., a level set of a linear functional). Clearly, a hyperplane is an affine subspace, and for γ = 0 it is a subspace. The set H(a, γ) in the plane consists of the points satisfying the equation a₁x₁ + a₂x₂ = γ, which is a straight line, and H(a, γ) in the three-dimensional space is a plane. A hyperplane divides the space ℝⁿ into two sets (with the hyperplane being their common boundary). We denote these sets by H₊(a, γ) = {x ∈ ℝⁿ | a·x ≤ γ} and H₋(a, γ) = {x ∈ ℝⁿ | a·x ≥ γ}. These are half-planes in the plane and half-spaces in the three-dimensional space.
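A small numerical illustration of these conventions (ours, not the book's): a row vector a ∈ (ℝ³)′ acts on column vectors by the matrix product a·x, and the hyperplane H(a, γ) splits ℝ³ into the half-spaces H₊ and H₋.

    import numpy as np

    a = np.array([1.0, -2.0, 3.0])       # a row vector, i.e., a linear functional on R^3
    gamma = 1.0

    def locate(x):
        """Position of x relative to the hyperplane H(a, gamma) = {x : a.x = gamma}."""
        value = a @ x                    # the functional l(x) = a.x
        if np.isclose(value, gamma):
            return "on H(a, gamma)"
        return "in H+(a, gamma)" if value < gamma else "in H-(a, gamma)"

    for x in ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [2.0, 2.0, 0.0]):
        print(x, locate(np.array(x)))    # on H, in H-, in H+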
In the sequel an important role will be played by straight lines, rays, and line segments, i.e., one-dimensional subspaces and affine subspaces, one-dimensional cones and convex sets (to be precisely defined later). Let x, y ∈ ℝⁿ and x ≠ y. A line segment [x, y] is defined as the set of points of the form z = (1 − α)x + αy, 0 ≤ α ≤ 1. A ray originating from x and passing through y is the set of points of the form z = (1 − α)x + αy, α ≥ 0, and, finally, the straight line passing through the points x and y is the set of points z = (1 − α)x + αy, α ∈ ℝ. If n = 1, 2, or 3, these are the usual line segments, rays, and straight lines.

Next, we will describe the so-called topological structure of ℝⁿ. With each x ∈ ℝⁿ we associate the number |x| = √(xᵀ·x) = √(∑_{i=1}^n x_i²), called the length or the Euclidean norm of x. This quantity has the following properties:
1) |x| ≥ 0, with |x| = 0 if and only if x = 0;
2) |αx| = |α||x| for any x ∈ ℝⁿ and α ∈ ℝ;
3) |x + y| ≤ |x| + |y| for any x, y ∈ ℝⁿ.
The first two properties are obvious, and the third is established with the aid of the Cauchy-Schwarz inequality |xᵀ·y| ≤ |x||y|, which can be easily verified. The quantity d(x, y) = |x − y| is the distance between the vectors x and y.

Now we define open and closed sets in ℝⁿ. Let x ∈ ℝⁿ and r > 0. The set B_r(x) = {ξ ∈ ℝⁿ | |ξ − x| < r} is the open ball with center x and radius r. Let A ⊂ ℝⁿ and x ∈ A. Then x is said to be an interior point of A if x is contained in A together with some open ball centered at x. The set of interior points of A is the interior of A, written int A. A set G ⊂ ℝⁿ is open if each of its points is an interior one, i.e., int G = G. A neighborhood of x ∈ ℝⁿ is any open set containing x.

In order to get accustomed to the basic topological notions, we advise the reader to think over the following exercises.
EXERCISE 1. Prove that any open ball is an open set.

EXERCISE 2. Prove that the union of any family of open sets and the intersection of finitely many open sets is an open set.

EXERCISE 3. Prove that if G is an open set, then for any y ∈ ℝⁿ and α ≠ 0 the set y + αG = {z ∈ ℝⁿ | z = y + αx, x ∈ G} is also open.

A set F ⊂ ℝⁿ is said to be closed if its complement ℝⁿ \ F is an open set.

EXERCISE 4. Prove that the ball B̄_r(x) = {ξ ∈ ℝⁿ | |ξ − x| ≤ r} of radius r with center at x (sometimes termed the closed ball) is a closed set.
EXERCISE 5. Prove that the intersection of any family of closed sets and the union of finitely many closed sets is a closed set.
Let A c ~n. A point x E ~n is an accumulation point of A if any neighborhood of x contains at least one point of A. The set of all accumulation points of a set A is called the closure of A and is denoted by cl A. It is clear that A c cl A.
EXERCISE 6. Prove that if A ⊂ B, then cl A ⊂ cl B.
EXERCISE 7. Prove that a set A is closed if and only if it coincides with its closure; i.e., A = cl A.
A sequence {x_k}_{k∈ℕ} of points in ℝⁿ is said to converge to a point x ∈ ℝⁿ if for any ε > 0 there exists a positive integer N = N(ε) such that |x_k − x| < ε for any k ≥ N.
EXERCISE 8. Prove that x is an accumulation point of A if and only if there exists a sequence of points in A converging to x.

EXERCISE 9. Prove that any convergent sequence {x_k}_{k∈ℕ} is bounded; i.e., there exists c ≥ 0 such that |x_k| ≤ c for all k ∈ ℕ.
A set A c ~n is compact if any sequence of its elements contains a subsequence converging to an element of A.
EXERCISE 10. Prove that a set in ~n is compact if and only if it is bounded (i.e., is contained in some ball) and closed.
Let A ⊂ ℝⁿ and x₀ ∈ A. A function f: A → ℝ is continuous at the point x₀ if for any ε > 0 there exists δ > 0 such that |f(x) − f(x₀)| < ε for any x ∈ A such that |x − x₀| < δ (in other words, for any x ∈ A ∩ B_δ(x₀)). This definition of continuity of a function at a point is equivalent to the following one: a function f: A → ℝ is continuous at x₀ ∈ A if for any sequence of vectors {x_k}_{k∈ℕ} ⊂ A converging to x₀ the sequence {f(x_k)}_{k∈ℕ} converges to f(x₀). If a function is continuous at each point of A, it is said to be continuous on A.
EXERCISE 11. Prove that a function f: ℝⁿ → ℝ is continuous on ℝⁿ if and only if the inverse image of any open (closed) set is open (closed). (The inverse image of a set A under a mapping F: X → Y is the set F⁻¹(A) = {x ∈ X | F(x) ∈ A}.)

EXERCISE 12. Prove that the functions x → |x| and x → x′·x + α, where x′ ∈ (ℝⁿ)′ and α ∈ ℝ, are continuous.
30
1. THEORY
DEFINITION 1. A nonempty set A c ~n is convex if, along with any two of its points x and y, it contains the line segment [x, yJ.
The empty set is convex by definition. For any a E (~n)' and 'Y E ~, the hyperplane H(a,'Y) = {x E ~n I a· x = 'Y} and the corresponding half-spaces H+(a,'Y) = {x E ~n I a· x:S; 'Y} and H-(a,'Y) = {x E ~n I a· x 2: 'Y} are convex sets. The basic operations on convex sets are intersection and algebraic sum: (1) Let {AihE.7 be an arbitrary family of convex subsets of ~n. Then niE.7 Ai is a convex set (i.e., the intersection of any family of convex sets is a convex set). In particular, the intersection of any family of half-spaces is a convex set. Moreover, this set is closed (see Exercise 5 above). Later we will show that any convex closed set is the intersection of some family of half-spaces (which is one of the basic results of convex analysis). (2) Let {Adf=l be a finite family of convex sets in R". Then Al + ... + An := {x E ~n I X = Xl + ... + Xn, Xi E Ai, i = 1, ... ,n} is a convex set (i.e., the algebraic sum of finitely many convex sets is a convex set). If A is a convex subset of ~n and A E ~, then the homothetic transformation of A with coefficient A, i.e., the set AA := {AX I x E A}, is a convex set. More generally, if A: ~n --+ ~m is a linear operator and B a convex subset of ~n, then A(A), the image of A under the mapping A (to be written also AA), and A-I (B), the inverse image of B under the mapping A (to be written also BA), are convex sets. These assertions are deduced directly from the definitions. A nonempty set K C ~n is a cone if, along with any element x E K, it contains the element ax E K for any a > O. It can be easily verified that a cone K is convex if and only if K+KcK. The set = {x = (Xl, ... ,Xn) E ~n I Xi 2: 0, i = 1, ... ,n} is a convex cone. Let Xi E ~n and ai E R, i = 1, ... ,n. The vector x = I:~=l aixi is called (a) a linear combination of the vectors Xl, ... ,Xn for any possible values ai E ~, i = 1, ... ,n. A set containing any linear combination of any two of its elements (i.e., with any two of its points containing the plane passing through them and the origin) is a subspace (this notion, as well as some other notions
~+
1. BASIC DEFINITIONS
31
given below, has appeared before, but here we define them in a unified manner); (b) an affine combination of Xl,'" ,Xn if Di E ]R, i = 1, ... ,n, and I:~=l Di = l. A set containing any affine combination of any two of its elements (i.e., with any two of its points containing the straight line passing through them) is an affine subspace or affine manifold; (c) a conic combination of Xl, ... ,Xn if Di 2: 0, i = 1, ... ,n. A set containing any conic combination of any two of its elements (i.e., with any two of its points containing the planar angle between the rays determined by these points) is a convex cone; (d) a convex combination of Xl, ... ,Xn if Di 2: 0, i = 1, ... ,n, and I:~=l Di = l. A set containing any convex combination of any two of its elements (i.e., with any two of its points containing the line segment joining them) is a convex set. If A is a nonempty subset of ]Rn, then we immediately deduce from the definitions that the set of (a) all linear combinations of elements of A is the least vector subspace in ]Rn containing A. It is called the linear span of A and is denoted by span A; (b) all affine combinations of elements of A comprise the least affine subspace in ]Rn containing A, i.e., the set of the form X + LA, where X E A and LA is a subspace in ]Rn. It is called the affine hull of A and is denoted by aff A; (c) all conic combinations of elements of A comprise the least convex cone in ]Rn containing A. It is called the conic hull of A and is denoted by cone A; (d) all convex combinations of elements of A comprise the least convex set in ]Rn containing A. It is called the convex hull of A and is denoted by co A. If A is a convex subset of ]Rn and aff A = X + LA, then the dimension of A (written dim A) is the dimension of LA. The points al ... ,am+l E ]Rn (m 2: 1) are affine independent if I::il Aiai = 0 and I::il Ai = 0 entail Al... = Am+l= O. It can be easily verified that the points al,." ,am+l are affine independent if and only if for any 1 :::;io :::;m + 1 the vectors ai - aio' 1 :::;i :::;m + 1, i =I=- io, are linearly independent. The convex hull of affine independent points al, ... ,am+l is referred to as an m-dimensional simplex, and the points al, ... ,am+l as its vertices. Any point X of the simplex is uniquely representable
32
l. THEORY
l l as X = I:::i Aiai, where Ai ~ 0, i = 1, ... ,m+ 1, and I:::i Ai = 1. The coefficients Ai, 1 SiS m + 1, are the barycentric coordinates of x. It is clear that a one-dimensional simplex is a line segment, a twodimensional simplex is a triangle, and a three-dimensional simplex is a tetrahedron. Let A be a convex subset of ~n. If its dimension is greater than zero, then it equals the maximal dimension of simplexes contained in A. Indeed, let dim A = m. It follows from the definition of the dimension of a convex set that the maximal dimension of simplexes contained in A is no greater than m. Assume that it is less than m, and let al,." ,ak+l (k < m) be the maximal affine independent set of points in A. If A C aff{ al,'" ,ak+d, then aff A C aff{ al,'" ,ak+l} in contradiction with dim A = m. Let bE A \ aff{al, ... ,ak+l}' But in this case one can easily verify that the points b, ai, ... ,ak+l are affine independent, which contradicts the maximality property of the set of a's. It is clear that such sets as a line segment in the plane or a triangle in the three-dimensional space have no interior points. However, they have interior points relative to their affine hulls, which motivates introducing the following notion. Let A be a convex subset of R". A point x E A is a relatively interior point if it has a neighborhood U such that un aff A c A. The set of all relatively interior points of A is the relative interior of A, written ri A.
PROPOSITION 1. Let A be a nonempty convex subset of ~n. Then: (a)riA-=f.0; (b) if Xl E ri A and X2 E cl A, then all the points of the interval (Xl,X2) belong to riA; (c) riA and clA are convex sets and cl(riA) = clA. PROOF. (a) Assume first that aff A = ~n, i.e., dimA = n, so that A contains an n-dimensional simplex S. Let al,'" ,an+l be its vertices. Without loss of generality we may assume that al = 0. We will show that any point of S with positive barycentric coordinates belongs to int S. Indeed, let Xo = I:~~i Akak, I:~~i Ak = 1, Ak > 0, k = 1, ... ,n + 1. If el, ... ,en is the standard basis in ~n and X = (Xl, ,Xn)T E ~n, then X = I:~=lxiei' Further, since the vectors
a2,
,an+l
= I:~~; (Xikak,
= I:~~; "(kak·
+x
= I:~~;(Ak
+ I:~=l
xi(Xik)ak
1. BASIC DEFINITIONS
33
such that "(k ~ 0,2 S k S n + 1, and "(k S 1 whenever IXil < E, 1 SiS n. Therefore Xo + XES for any x with such components and a fortiori for any vector in the open ball .80 (0), i.e., Xo E int S = ri S. If dim A = m < n, we may identify aff A with the space JRm, thus getting that ri A = int A (in JR ""). Then the preceding arguments show that in this case ri A =I=- 0 as well. (b) In view of this remark we may assume that aff A = JRn and hence ri A = int A. Let Y E (Xl, X2); then y = (l-a)xI +ax2 for some 0< a < 1. Since X2 E clA, we have X2 E A + .80(0) for any E > O. Then y+.8o(O) = (l-a)xI +ax2+.80(0) C (l-a)xI +a(A+.8o(O))+ .80(0) = (1- a)(xI + .8,(0)) + aA, where "( = E(l + a)(la)-I. By assumption, Xl E int A Xl + .8,(0) c A for E small enough; therefore y + .80(0) c (1- a)A + aA = A. (c) We again assume that aff A = JRn. Convexity of int A follows directly from (b). Let us prove convexity of cl A. Let Xi E cl A, i = 1,2, and let X = (1 - a)xI + aX2, 0 S a S 1. For any E > 0 we have (xi+.8o(O))nA =I=- 0, i = 1,2. Let Xi E (Xi+.8o(O))nA, i = 1,2. Set X = (1- a)xI + aX2. Then x E A and x E (1 - a)(xI + .80(0)) + a(x2 + .80(0)) = (1 - a)xI + aX2 + .80(0); i.e., any neighborhood of the point X = (1 - a)xI + aX2 has a nonempty intersection with A, so that X E clA. The inclusion cl(riA) C clA is obvious. Let X E clA. By (a) there exists an Xl E ri A. Then (x, Xl) C ri A by (b), which implies that X E cl(ri A). 0 Now we turn to convex Junctions. We will have to deal with functions taking not only finite values but also -00 and +00. To this end we introduce the set ~ = JR U {-oo} U {+oo}, i.e., the real line completed with symbols ±oo, with order relation extended in a natural way, -00 < a < +00 for a E JR. Moreover, we assume that a ± 00 = ±oo for any a E JR, a(±oo) = ±oo if a > 0, and
2:~~;
J:
-00.
JRn
-+ ~
{x
E JRn I
{(x,a)
E JRn x JR I X E domJ,
a ~ J(x)}
34
THEORY
is the space JRn+l regarded as the set of pairs (x, a), is the epigraph of f. f: JRn ----+ 1R is proper if at no point it takes the value equal to +00 identically.
----+
1R
is convex
if epi f is a
For proper functions there is an equivalent definition of convexity which often proves to be more convenient. Namely, the following assertion holds: a proper function f: JRn ----+ JR U { -l-oo] is convex if and only if for any points Xl, X2 E JRn the graph of the function on the segment [Xl, X2] never lies above the line segment joining the points f(xI) and f(X2)" i.e., for any Xl, X2 E JRn and any ai 2: 0, i = 1,2, such that «i + a2 = 1, the following mequality holds:
which is known as the Jensen inequality. Its proof follows easily from definitions. Here are examples of convex functions on the real line: the quadratic function X ----+ ax2 + bx + c with a 2: 0; the exponential function x ----+ e"'x, a E JR; powers of the absolute value, X ----+ IxIP, p 2: 1; the function x ----+ -log X for X > 0 and +00 for x :s; 0; and the function X ----+ - (x log2 X + (1 - x) log2(1 - x)) for 0 < x < 1, 0 for x = 0 or 1, and +00 otherwise (the last function is, up to sign, the entropy of the random variable taking the values 0 and 1 with probabilities x and 1- x). An important (though uncustomary) convex function is the indicator function of a convex set A. It is defined as JA(x)
=
0, { +00,
x E A, x ~ A.
For many functions (in particular, for most of those given above) it is convenient to establish their convexity using the following assertion: if a function f is twice differentiable on (a, b) (-00 :s; a < b:S; +00) and f"(x) 2: 0 for any x E (a,b), then f: JR ----+ 1R, where f(x) = +00 for x ~ (a, b), is a convex function. Indeed, since f" (x) 2: 0 for x E ( a, b), we see that l' is nondecreasing. Therefore, if a < Xl < X2 < b, 0 < a < 1 and x = (1 - a)xI + aX2, we have by the mean value theorem f(x) - f(XI) :s; 1'(x)(x - xI) and f(X2) - f(x) 2: 1'(X)(X2 - x), or f(x) :s; f(XI) +
1 BASIC DEFINITIONS
35
af'(x)(x2-xI)
and f(x) ~ f(x2)-(1-a)f'(x)(x2-xI). Multiplying the next-to-last inequality by 1- a and the last one by a and adding them, we obtain f((l - a)xl + aX2) ~ (1 - a)f(xI) + af(x2). 0 The main operations on convex functions are maximum and summation: (a) Let Ii: lRn --+ i, i = 1,2, be convex functions. Then !I V h(x) = max(!I(x), h(x)) is also a convex function. Actually, the following more general assertion is valid: if fJ: lRn --+ i, j E J, is a family of convex functions, then f = SUPjEJ fj is also a convex function and epi f = njEJ epi h (i.e., the supremum of any family of convex functions is again a convex function). (b) If f, g: lRn --+ i are proper convex functions, then f + 9 (i.e., (f + g)(x) = f(x) + g(x)) is also convex (i.e., the sum of convex functions is again a convex function). Note also that, for a convex convex function.
njEJ
and a
Property (a) follows easily from the obvious fact that epi f = epi h and the fact that the intersection of any family of convex sets is again a convex set. Property (b) and the statement about the product of a convex function by a positive number are easily verified with the aid of the Jensen inequality. If A: lRn --+ lRm is a linear operator and f: lRm --+ lR and g: lRn --+ i are convex functions, then it is not hard to see that the functions fA: lRn --+ 1R, f A( x) = f (Ax), called the inverse image of f under the mapping A, and Ag: lRm --+ i, Ag(y) = inf{g(x) I Ax = y}, called the image of 9 under the mapping A, are also convex. An important class of convex functions consists of sublinear functions. A function p: lRn --+ 1R is sublinear if epip is a convex cone in lRn x lR. The Jensen inequality implies that a proper function p: lRn --+ lR is sublinear if and only if
Vx
VX,yElRn, R", vo » 0
(i.e., a proper sublinear function is a convex function positively homogeneous of first order). For example, linear functions and the supremum of any family of linear functions are sublinear functions. (One of the important facts
36
1. THEORY
of convex analysis to be proved below is that a sublinear function with closed epigraph is the supremum of some family of linear functions). Now we consider three important examples of sublinear functions: the Minkowski function, the support function, and the directional derivative of a convex function. Let A be a non empty subset of ~n. The function J1.A: ~n -4 lR defined by the equality J1.A( ) = inf {A > 0 I x E AA} is the Minkowski x function of the set A. Let A be a nonempty subset of R". The function sA on (~n)' defined by the equality sA(x') = sUPxEA x' . x is the support function of A. It is obvious that this function is positively homogeneous, and it is convex because sA(·) is a supremum of convex functions. Now we define the directional derivative. Let xo, x E ~n and let the function f: ~n -4 lR be finite at Xo. The quantity
f'( Xo;X )
= im
1·
f(xo
+ tx) - f(xo)
t
ti~
(whenever it exists) is the derivative of f at the point Xo in direction x. For convex functions the limit (possibly infinite) in the righthand side of the above relation always exists. Indeed, the following assertion is valid: if f: ~n -4 lR is a proper convex function and Xo E dom I, then l'(xo; .) is a sub linear function, and for any x E ~n
f'(xo. x)
't>o
inf f(xo
+ tx)
- f(xo).
To prove this assertion, we will show first that for any x E ~n the function h: t -4 r 1 (f (xo + tx) - f (xo)), t > 0, is nondecreasing. Indeed, let 0 < s :S t. Since Xo + sx = (s/t)(xo + tx) + ((t - s)/t)xo, we have by convexity of f
f(xo
t-s
(f(xo) - f(xo)),
i.e., h(s) :S h(t). Thus the limit in the definition of the derivative can be replaced by infimum (which may be equal to -00).
1.
BASICDEFINITIONS
37
is homogeneous.
r:XO;X1 +X2 )
= im flO
li
f(xo
f'(xo;xd
+ f'(XO;X2).
The main results of convex analysis (see Sections 2, 3 below) are based on separation theorems for convex sets. Separation theorems in a finite-dimensional space. Recall that for x' E (I~n)', x' =I- 0, and v E ~, the set H = H(x', "'() = {x E ~n I x'·x = "'(}is a hyperplane in ~n. It is clear that H(x', "'() = H( cx', c"'() for any c =I- O. The hyperplane H(x', "'() determines two half-spaces H+(x',,,,() = {x E ~n I x'· x S "'(} and H_(x',"'() = {x E ~n I x'· x ~
"'(}.
Two nonempty subsets, A and B, of ~n are said to be separable if there exists a hyperplane H such that A and B lie in the opposite half-spaces determined by H. This geometric definition of separability is, obviously, equivalent to the following algebraic one: the sets A and B are separable if there exists a nonzero element x' E (~n)' such that
aEA
sup x' . a
S inf x' . b.
bEB
If the inequality is strict, then the sets A and B are strictly separable (in geometric terms it means that A and B can be separated by two different parallel hyperplanes). We will prove first the theorem on strict separability of a point from a set, and then deduce from it the theorem on separability of sets. THEOREM (second separability theorem or the theorem on strict separability of a point from a set). Let A be a nonempty convex closed
subset of separable.
~n
and let b
tt
A.
PROOF. Let x E A, r = Ix- bl, and A1 = AnBr(b) = {x E ~n I r}. Consider the function f: ~n --+~, f(x) = Ix - bl. This function is continuous; hence by the Weierstrass theorem it attains its infimum on the bounded closed set A1 at some point E A1. In
Ix - bl S
38
1.
THEORY
other words, is the closest point to b in AI, hence also in A. Set x' = (x - b) T and consider the hyperplane H = {x E JRn I x' . x = x' ·x}. Then x'· b = x'· (b-x+x) = -lx'12 +x'·x < x' ·x. It remains to show that x' . x ~ x' . x for any x E A. Suppose there exists Xo E A such that x' . Xo < x' . x. Since A is convex, (1 - t)x + txo E A for o ::::; ::::; and we have t 1,
1(1 - t)x
+ txo -
W=
for small t. But this contradicts the property that is the closest point to b in A, hence x' . b < x' . x::::;x' . x for any x E A; i.e., A and b are strictly separable. D THEOREM (first separability theorem or the theorem on separability of sets). Let A and B be nonempty convex subsets of JRn and let ri A n B = 0. Then A and B are separable. PROOF. Let C = ri A - B. Since ri A is a convex set (see Proposition 1), the set C is convex and 0 tt C. Suppose 0 tt clC. Since cl C is also convex (Proposition 1), by the previous theorem there is a nonzero vector x' E (JRn)' such that x' . x < 0 for any x E cl C and a fortiori for x = a - b E ri A - B = C, i.e., x' . a < x' . b for any a E ri A and b E B. This implies that ri A belongs to the closed subspace H+(x',ry) = {x E JRn I x' . x ::::; y}, where r ry = inf{x'· bib E B} and B C H_(X',ry). But, by Proposition 1, A c clA = cl(riA) C H+(X',ry); hence the sets A and B are separable. Now let 0 E c1 C and let x E ri C (ri C i= 0 by Proposition 1). Then for any A > 0 the vector -AX does not belong to cl C, since otherwise (again by Proposition 1) 0 = A(l+A)-lx+(l+A)-I(-AX) would belong to ri C and hence to C. Then there exists a sequence of points {Xk} outside cl C such that Xk --+ 0 as k --+ 00. According to the second separability theorem proved above, there are vectors x~ E (JRn)', x~ i= 0, kEN, such that
(i)
for each kEN and x E cl C. Dividing (i) by Ix~ I, we can assume that Ix~1 = 1. Then (since the sphere S = {x E JRn Ilxl = I} is bounded and closed) the sequence {xU contains a subsequence converging to
39
some vector x', Clearly Ix'i = 1; hence x' =f. O. The passage to a limit in inequality (i) along this subsequence yields x' .x :::; Then exactly O. the same arguments as in the preceding theorem show that the sets A and B are separable. 0 2. Duality in convex analysis • An important feature of convex objects is the possibility of their three-way description: namely, geometric, as a set of points, segments, rays, or straight lines with certain properties; analytic, as a superposition of some operators defined on families of convex sets and functions; and algebroanalytic, as sets of solutions of a family of equations or inequalities (for sets) or suprema of families of affine or linear functions (for functions).
In this section we formulate a theorem on convex duality which gives all such descriptions for the main convex objects. We begin with a description of operators acting on classes of convex sets and functions. A) Conjugation operator for functions. Let a function f: ]Rn ----+ lR be given. The function 1*: (]Rn)' ----+ lR defined by the equality j*(x')
:= sup
xElRn
(x' . x - f(x))
is called the conjugate function of I, or the Young or Young-Fenchel or Legendre transform of f. Since 1* is the supremum of convex closed functions x' ----+ x' . x f(x), x E ]Rn, its epigraph 1* is the intersection of the epigraphs of these functions; hence 1* is a convex closed function on (R")'. Thus, dealing only with convex functions, we have an operator ", which maps convex functions on ]Rn to convex closed functions on (]Rn),. Here are some examples of conjugate functions. 1) Let n = 1 and f(x) = eX. Then 1*(x') = x' log x' - x' for x' > 0,1*(0) = 0, and 1* (x') = +00 for x' < O. 2) Let f(x) = (2:7=1IxiIP), where x = (Xl, ... ,xn)T and 1 <
<
00.
Then 1* (x')
= ;,-
(2:7=1 Ix:IP'),
where x'
= 1. 3) Let A be a nonempty subset of ]Rn. Then (JA)*(x') = sA(x') (the function conjugate with an indicator function of a set is the support function of this set).
lip
+ lip'
40
1. THEORY
4) Let x' E (]Rn)/, a E ]R, and let f: X --4 x'· X - a be an affine function. Then 1*(X') = a for x' = x' and 1*(X') = 00 otherwise. Such a function, with the epigraph consisting of a single ray, will be called elementary. The epigraph of any function is the union of epigraphs of elementary functions. Note that if f is a convex proper closed function, then so is 1*. Indeed, its convexity and closedness follow from the definition. Since f is not identically +00, 1* nowhere equals -00. It will be shown in the duality theorem presented below that there exists an affine function p(x) = x' . x - a which does not exceed f. Then 1*(X') 2: p*(x') = a, so that 1* is a proper function.
bX2
EXERCISE
1. Find the functions conj ugate with f (Xl, X2) = maxtzj , ... , xn).
= aXl +
is the function
1** : ]Rn
--4
i defined
f**(x):=
sup
x'E(JRn)'
(x'· x - f*(x')).
is a convex closed function on ]Rn. functions of f(x)
]Rn
1**
sinx, f(x)
1/x2.
and
It follows directly from the definitions that for any x E x' E (]Rn)' the following inequalities hold:
+ f*(x'),
+ f**(x),
which are known as the Young inequalities; moreover, it follows that 1**(x) :::;f(x) for any x E ]Rn and that f(x) :::;g(x) for all x E ]Rn entails 1* (x') 2: g*(X') for all x' E (]Rn)'. B) Subdifferential. Let p: ]Rn --4 i be a sublinear (i.e., convex and homogeneous) function. The set
is the subdifferential of p. One can easily check that up is a convex set, and it is closed because it is the intersection of the closed halfspaces {x' E (]Rn)' I x' . x :::;p(x)}, x E ]Rn. Thereby the operator is defined, which associates with a sublinear function on ]Rn a convex closed set in (]Rn)/. In particular, with a linear function l: X --4 x' . x it associates the set at = {x'} consisting of a single element.
41
Note that if p is a sublinear proper and closed function, then 8p =I=Indeed, according to the aforesaid, p* is also a proper function; hence there exists a point x' E (I~n)' where p is finite. In view of homogeneity, p(x') = O. This means that x' . x :s; p(x) for all x, i.e.,
0.
x'
E 8p.
EXERCISE
... ,xn)
(Lx;)1/2,
i=l max lXii, l:'02:'On l:'Oi:'On
Xi·
,xn) =
xn) = max
In the next section we will give the definition of the subdifferential for any convex function at a fixed point (according to which the above definition provides the subdifferential of p at zero), which is a natural extension of the notion of derivative at a given point. A convex function f is approximated locally (e.g., in a neighborhood of a given point xo) by a sublinear function p (like a smooth function approximated by a linear one), and the subdifferential of f at Xo is equal to that of p. Hence it is natural to take the above definition as the initial one. C) Support operator. Let A be a convex subset of JRn. The function s: (JRn)' --+ ~ defined by the relation sA(x')
=
sup x' . X
xEA
(if A = 0, we set sA(x') == -00) is the support function of A. It is easily seen that s is a closed sublinear function; hence this definition determines an operator s from the class of convex sets in JRn to the set of closed sublinear functions on (JRn)/. In particular, if A consists of a single point, A = {x}, then sA: x' --+ x' . x is a linear function.
EXERCISE 4. Determine the support functions of the following sets: {x E JRn I Ixl < I}, {x = (Xl,"" Xn)T E JRn I maxl:'Oi:'On IXil < I}, {x E JRn I Ll:'Oi:'OnIXil < I}.
AD = {x'
't:/x E A}
42
1. THEORY
is the polar of the set A. Obviously, AO is a closed zero-convex set (i.e., a convex closed set containing the origin). Thereby the operator ° which associates with a convex set in JRn a closed zero-convex set in (JRn)' is defined. In particular, if A consists of a single point, A = {x}, then the polar is the half-space {x' E (JRn)' I x' . x :s; I}. One can apply the operator ° to the set A ° once again. The resulting set AOO = (AO)O c JRn is the bipolar of A.
EXERCISE
plane: {( -1,
-1),
5. Determine the polars of the following sets in the (-1,1), (1, -1), (1, I)}, {(Xl, X2) EJR2 I +x~ :s; I}.
xI
E) Conjugation
Let
e be
E
a convex cone in
c'
= {x' E
C}
is the conjugate cone of C. Clearly, C' is a convex closed cone, and thereby the operator' is defined acting from the class of convex cones in JRn to the class of convex closed cones in (JRn)'. In particular, if the cone is a ray passing through a point x, then the conjugate cone is the half-space {x' E (JRn)' I x' . x ~ O}. A repeated application of the operator' results in the set C'' = (C')' c JRn called the bi-conjugate cone of C. F) Annihilator of a subspace. Let L be a subspace of JRn. The set
LJ.. = {x'
(JRn)' I x' . x
= 0, \Ix E
L}
is the annihilator of the subspace L. It is clear that LJ.. is a subspace in (JRn)', so that J.. is an operator which associates with a subspace of JRn a subspace of (JRn),. In particular, the annihilator of the straight line passing through the point x is the hyperplane {x' E (JRn)' I x'·x = O}. A repeated application of the operator J.. yields the subspace LJ..J..= (LJ..)J.. C JRn called the bi-annihilator of the subspace L. Besides the operators listed above we will need two more operators, which have already been discussed. They map the subsets of JRn (or (JRn )') to functions on JRn (or (JRn )'). Let us recall their definitions. Indicator operator. Let A be a subset of JRn. We associate with A the function bA on JRn (called the indicator function of A) by the following rule: bA(x) == +00 if A = 0; otherwise, if A =I=- 0, then bA(x) = 0 for x E A and bA(x) = +00 for x ~ A. It is clear that bA is a convex function whenever A is convex.
43
Minkowski's operator. Let A be a nonempty subset of JRn. We associate with A the function p,A on JRn (called Minkowski's function of the set A) by the formula p,A(x) = inf{A > 0 I x E AA}.
EXERCISE 6. Determine Minkowski's function of the triangle in the plane with vertices at the points (1,0),(-1/2,/3/2),(-1/2,
-/3/2).
EXERCISE
tion.
THEOREM
(1) Let f: JRn ----+ JR U {+oo}. Then the following statements are equivalent: (1.1) f is convex and closed; (1.2) 1** = I, (1.3) f is the pointwise supremum of affine functions that do not exceed f. (2) Let p: JRn ----+ lR be a proper sublinear function. Then the following statements are equivalent: (2.1) p is closed; (2.2) sap = p; (2.3) p is the pointwise supremum of linear functions that do not exceed p. (3) Let A be a nonempty subset ofJRn. Then the following statements are equivalent: (3.1) A is convex and closed; (3.2) asA = A; (3.3) A is the intersection of all half-spaces containing A. (4) Let A be a nonempty subset ofJRn. Then the following statements are equivalent: (4.1) A is convex, closed, and contains the origin; (4.2) ADO = A; (4.3) A is the intersection of all half-spaces of the form {x E JRn I x' . x:::; 1, x' E (JRn)/} containing A. (5) Let G be a nonempty subset ofJRn. Then the following statements are equivalent: (5.1) G is a convex closed cone;
(5.2) Gil
G;
(5.3) G is the intersection of all half-spaces of the form {x E JRn I x' . x:::; 0, x' E (JRn)'} containing G.
44
1. THEORY
(6) Let L be a nonempty subset ofJRn. Then the following statements are equivalent: (6.1) L is a subspace; (6.2) LJ..J.. = L; (6.3) L is the intersection of all hyperplanes of the form {x E JRn I x' . x = 0, x' E (JRn)'} containing L. (7) Let M be a nonempty subset of JRn. Then the following statements are equivalent: (7.1) M is an affine subspace; (7.2) M is the intersection of all affine subspaces of the form {x E JRn I x'·x = x'·xo, x' E LJ.., Xo EM} containing M, where L is the subspace from which M is obtained by translation. 1
PROOF. All the assertions are proved in a unified way. If D is some convex object (the epigraph of a convex function or a convex set) and d is one of the duality operators (*, 0, I, J..), then it is seen from the definitions that Ddd is a convex closed set and D C o«. If D is not closed and D i= o«, then there is an element in o« \ D whose separation from D (by the second separability theorem) immediately leads to a contradiction; hence o« = D. The assertions sup = p and usA = A are proved similarly. Nevertheless, in order not to repeat the same arguments, we deduce some assertions from those proved previously.
Assertion (1) is usually referred to as the Fenchel-Moreau theorem.? (1.1) ===} (1.3). If f == +00, then the implication is obvious. Assuming otherwise, we first outline the subsequent proof. If f(xo) < +00 for some Xo E JRn, then for any e > 0 the point (xo, f(xo)-c) does not belong to epi f; hence it can be strictly separated from epi J. But the separating hyperplane is the graph of an affine function which does not exceed J. Since e is arbitrary, this implies that f(xo) is the supremum at Xo of affine functions not exceeding l If f (xo) = +00, then for any number Go we can construct an affine function not
1In fact, for any affine subspace M which does not contain the origin, one can define the duality operator' by the equality M· {x' E (JRn)' I x, . x 1, "Ix E M}; its extension to subspaces passing through the origin leads to duality in projective geometry, but we will not tackle this matter here. 2 A geometric proof of the Fenchel-Moreau theorem for functions of one variable was given in the Introduction.
45
exceeding f with value at Xo greater than this number, which will prove the assertion in this case. Now we proceed to precise arguments. Let Xo E JRn and ao < f(xo). Then (xo, ao) ~ epi f. By assumption, the set epi f is convex and closed; hence by the second separability theorem there exist x' E (JRn)' and "( E JR such that
x' . Xo
+ ao,,( >
sup
(x,a)Eepif
(x'· x
+ o-y).
(i)
Now it is easily seen that "( ::::: . O Let f(xo) < +00. In this case "( < O. Indeed, putting the point (xo, f(xo)) into the right-hand side of (i), we obtain by elementary calculations that "((ao - f(xo)) > O. But ao - f(xo) < 0; hence "( < O. We may assume that "( = -1 (dividing inequality (i) by -"( if necessary). Then, denoting the right-hand side of (i) by c, we can rewrite this inequality as
x',xo-c>ao,
x'·x-c:::::a,
V(x,a)Eepij.
(ii)
For the affine function p(x) = x' . x - c we have p(x) :::::(x) for f all x E JRn (if f(x) = +00, this is obvious, and if f(x) < +00, this follows from the second inequality in (ii) with a = f(x)). Moreover, (ii) implies that ao < p(xo) :::::(xo), and taking ao arbitrarily close f to f(xo) we obtain that f on dom f is the supremum of affine functions not exceeding f. Now let f(xo) = +00. We have to verify that for any ao E JR there is an affine function p such that p(x) :::::f(x) for any x and p(xo) > ao. If "( < 0 in (i) (as before, we assume then that "( = -1), we see from (ii) that p(xo) > ao. In the case "( = 0 we can rewrite (i) as
x' . Xo - c > 0,
x'· x - c ::::: , 0
V(x, a) E epi j.
(iii)
According to the aforesaid there exists an affine function Po not exceeding f. For an arbitrary f..L > 0 consider the affine function PJ.L(x) = Po(x) + f..L(x' . X - c). The second inequality in (iii) implies that PJ.L is also no greater than f on dom f for any f..L> 0, while the first inequality implies that PJ.L(xo) = p(xo) + f..L(x' . Xo - c) > ao for sufficiently large f..L. (1.3) ===} (1.2). Let x' E (JRn)' and a E JR be such that x' 'x-a ::::: f(x) for all x E JRn. This is equivalent to a 2: SUPxERn (x'· X - f(x)) =
46
1. THEORY
1* (X').
Since
f(x) =
sup
x' E(JRn)' ,aEJR a?:.f*(x')
(x' . x -
0:).
(iii)
Since f is not equal to +00 identically, 1* is nowhere equal to -00; hence we can put 1* (x') for 0: in (iii). Then
f(x)
sup
x' E (IRn)'
(x'. x - 1*(X'))
= 1**(x).
(1.2) ===} (1.1). This implication follows directly from the definition of the conjugate function. (2) In this part and part (3) of the proof we use two statements which follow directly from the definitions: (a) if p is a sublinear function on lRn, then p* = bop; (b) if A is a convex subset of R", then (bA)* = sA. (2.1) (2.2)
===} ===}
(2.2). p (~) (bop)* ~ sop. (2.3). The equality (2.2) implies that
r: ~
=
p(x)
sup x'·x
x'EBp
and x'· x::; p(x) for any x E lRn, which is equivalent to (2.3). (2.3) ===} (2.1). The epigraph of a support function is the intersection of closed half-spaces; hence p is a closed function. (3) (3.1) ===} (3.2). bA (~) (bA)** ~ (sA)* ~ WsA. (3.2) ===} (3.3). The equality (3.2) implies that
{x
E lRn
I x'
. x::; sA(x')}.
(1)
Let x' E (lRn)'. If for some 'Y E lR the half-space H + (x', 'Y) = {x E lRn I x' . x ::; 'Y} contains A, then A contains the half-space {x E lRn I x' . x ::; sA(x')} c H+(x', 'Y). If H+(x', 'Y) contains A for no 'Y, then sA(x') = +00; hence {x E lRn I x' . x ::; sA(x')} = lRn. Thus the intersection of all half-spaces containing A coincides with the intersection in the right-hand side of (ii) and hence with A. (3.3) ===} (3.1). A is convex and closed as an intersection of convex closed sets. (4) (4.1) ===} (4.2). Clearly, A c ADO. Let us prove the opposite inclusion. Suppose Xo E ADO and Xo ~ A. Then by the second separability theorem there exists a vector x' E (R") , x' i= 0, such
47
that sUPxEAx' . X < x' . Xo. Since 0 E A, we can assume that the left-hand side of the latter inequality is positive and (multiplying x' by a positive constant) set it to be one. Then x' .x :::;1 for any x E A, i.e., x' E AO. But x'· Xo > 1; hence Xo ~ AOo, which contradicts the assumption. (4.2) ===} (4.3). The equality (4.2) implies that
x'EAO
{x E JRn
I x'
.x
< 1 }.
(2)
(5) (5.1) ===} (5.2). Clearly 0 E C. Since the cone C along with any x contains the element tx for any t > 0, it is easily seen that Co = -C'. Next, it is obvious that (-A)O = -Ao; hence C" = -(-CO)O (5.2)
=
===}
{x E JRn
I x'
. x:::; O}.
(3)
x'EC'
(6) (6.1) ===} (6.2). We have already pointed out that subspaces in JRn are closed sets. Clearly each subspace is a cone. Since the subspace L with any x contains tx for any t E JR, we have L' = L1...
Thus L1..1..= L" (~) L. (6.2) ===} (6.3). The equality (6.2) implies that L=
{x E JRn
I x'
. x = O}.
(4)
x'EL.l.
The subsequent reasoning is the same as in the previous case. (7) (7.1) ===} (7.2). Since any affine subspace is a convex closed set, we can use (2.2). Let M = Xo + L, where Xo E M and L is a subspace. One can easily check that 8M = x' . Xo + bL1.., so that 88M = {x E JRn I x'· x:::; x'· XO, \lx' E L1..}. By (2.2) M = 88M; hence the inequalities x' . x :::; ' . Xo are in fact the equalities. Indeed, x the inequality x' . x < x' . Xo for some x E M would contradict the fact that x' E L1.., since x - Xo E L. Hence we can write
M=
x'EL.l.,
{ x E JRn
I x'
.x
= x'
. Xo }
(5)
D
xoEM
48
1. THEORY
Note that relations (1) through (5) mean correspondingly that convex closed sets, and only them, are solutions to systems of nonhomogeneous inequalities; closed zero-convex sets, and only them, are solutions to systems of nonhomogeneous inequalities of the form a· x :::;1; convex closed cones, and only them, are solutions to systems of homogeneous inequalities; affine manifolds, and only them, are solutions to systems of nonhomogeneous equations. 3. Convex calculus • In this section we expose convex calculus, which is a collection of formulas for the action of duality operators (conjugation for functions, subdifferential, support, and polar for sets, conjugation for cones, and annihilator for subspaces) on elements that are themselves obtained from a pair of elements by some operations on sets or functions (e.g., the sum or intersection of convex sets, or the maximum or sum of convex functions), as well as on elements that are images or inverse images of sets or functions under a linear mapping. The major part of this section is devoted to subdifferential calculus, which is the most essential part of convex calculus. Subdifferential calculus in convex analysis is an analog of differential calculus, where the role of derivative is played by the subdifferential.
DEFINITION 3. Let f: The set (possibly empty) ]Rn
---+
lR
with f(xo)
finite at Xo
E ]Rn.
8f(xo)
{x'
(]Rn)'
I f(x)
- f(xo)
2: x' . (x - xo),
Vx E ]Rn}
is the subdifferential
The elements of 8f(xo) are called subgradients of f at Xo. It is clear that x' E (]Rn)' is a subgradient of f at Xo if and only if there exists a E ]R such that the affine function x ---+ x' . x + a does not exceed f everywhere and equals f(xo) at the point Xo. It is not hard to check that if f: ]Rn ---+ lR is convex and differentiable at Xo E R", then 8f(xo) = {f'(XO)}. Indeed, it has been shown that the function rl((f(xo) + tx) - f(xo)) is monotone decreasing in t for any x; hence f(xo +x) - f(x) 2: f'(xo; x) = f'(xo) ·X, i.e., f'(xo) E 8f(xo). Conversely, if x' E 8f(xo), then, by definition,
3.
CONVEXCALCULUS
49
f(xo + tx) - f(xo) 2: t X' . X for all X E X and t > O. Dividing this inequality by t and taking the limit as t ! 0, we obtain f'(xo)·x 2: x'·x, i.e., x' = f'(xo). 0 Now we state a simple fact playing, however, a very important role in the theory of convex extremal problems. This fact is an analog of the Fermat theorem for convex functions. Let f : ~n --+ iR be a convex function. Consider the extremal problem
f(x)--+min, This is a minimization straints. xE~n. con-
THEOREM (analog of Fermat's theorem). A point E ~n affords a minimum in the problem without constraints if and only if 0 E
8f(x).
PROOF. The function f attains the minimum at the point E ~n (obviously, f(x) < +00) if and only if f(x) - f(x) 2: 0 = O· (x - x) for all x E ~n, which is equivalent to the inclusion 0 E 8 f (x). 0 Now we illustrate the calculation of subdifferentials amples. by some ex-
EXAMPLE l. f (x) = e", x E R This is a convex smooth function. By what we proved above, 8f(x) = e", Here is an example of a convex function (again on R) with empty subdifferential at a point. Let f(x) = xlogx for x > 0, f(O) = 0, and f(x) = +00 for x < O. Since there is no k E ~ to satisfy the inequality xlogx 2: kx for all x> 0, we see that 8f(0) = 0. EXAMPLE 2. f(x) = [z], x E R It is again a convex function. It is clear that k E 8f(0) if and only if kx :::; Ixl for all x E R, i.e., 8f(0) = [-1,1]. It Xo i= 0, then, obviously, k E 8f(xo) if and only if k = signxo, i.e., k E 8f(0) and kxo = Ixol. In the next example this situation is extended to arbitrary linear functions. sub-
EXAMPLE 3. Let p: ~n --+ ~ be a sublinear function. We will denote its subdifferential at zero simply by 8p, i.e., 8p = {x' E (~n)' I x'· x:::; p(x), \Ix E ~n}. If p is continuous, then, for any Xo E ~n,
8p(xo)
{x' E (~n)'
I x'
E 8p, x' . Xo
p(xo)}.
50
1.
THEORY
Indeed, let x' E up(xo). Then p(x) - p(xo) ~ x' . (x - xo) for any x E ]Rn. In this inequality we let x = 0 and then x = 2xo to obtain p(xo) = x' . Xo. Next, if A > 0, then p(AX + xo) - p(xo) ~ x' . AX, or p(x + A-1xo) - A-1p(Xo) ~ x'· x. Letting A ----+ 00 we obtain that p(x) ~ x' . x, i.e., x' E up. Conversely, let x' E up and p(xo) = x'· Xo. Then p(x) - p(xo) ~ x'· (x - xo) for any x E ]Rn; hence x' E up(xo).D EXAMPLE 4. Let JA be the indicator function of a set A E R". Then it follows directly from the definitions that for any Xo E A
uJA(xo)
= {x'
=
E E
{x'
(]Rn)' (]Rn)' .x
I x' I x'
uJL(xo)
of L.
= {x'
(]Rn)'
I x'
= 0, 't:/x E L} = L1-
Now we will state the main properties of a subdifferential. THEOREM (on existence and structure of a subdifferential).
f:
]Rn ----+
Let
= u1'(xo; -);
continuous at Xo, then uf(xo) compact set.
is a nonempty
is the intersection of closed half-spaces over all x E R", this is a convex closed set (possibly empty). 2) Let x' E uf(xo). Then for any x E ]Rn and t > 0 we have x' . x = C1 x' . tx ::; C1(f(xo + tx) - f(xo)). Taking the limit as t ----+ 0 we obtain that x' E u1'(xo; 0). Conversely, if x' E u1'(xo; 0), then we have from the definitions x' . (x - xo) ::; l' (xo; x - xo) ::; f(xo +x - xo) - f(xo) = f(x) - f(xo) for all x E ]Rn, i.e., x' E uf(xo). 3) First we note that continuity of fat Xo implies that 1'(xo;') is a closed proper function. Indeed, the function x ----+ f(x + xo) - f(xo) is bounded in some neighbor hood of the origin, and since l' (xo; x) ::; f(x+xo) - f(xo), the function 1'(xo;') is also bounded in this neighborhood. Therefore it is continuous at the origin and hence everywhere (hence also bounded everywhere). Continuity of a function implies its closedness. Now we will show that uf(xo) =I=- 0. It suffices to verify that up =I=- 0. Denote p = 1'(xo, .). It up = 0, then for any x' E (]Rn)'
{x' E (]Rn)'
I x'
3. CONVEX CALCULUS
51
p**
there is x E ]Rn such that x'· x - p(x) > 0; hence p* == +00. But then == -00, which contradicts the Fenchel-Moreau theorem. In view of continuity of J at Xo there is b > 0 such that J(xo + x) - J(x) :::;1 for all x E Bo(O). If x' E 8J(xo), then such x fulfills the inequality x' . x :::;J(xo + x) - J(x) :::;l. This implies that 8J(xo) is contained in the ball of radius .;nIb. Hence 8J(xo) is a nonempty compact set. 0 Now we will prove two main theorems of convex calculus (see Ioffe and Tikhomirov (1979)).
THEOREM (Moreau-Rockafellar). Let Ji: ]Rn ---+ iR, i = 1,2, be two convex proper Junctions. Suppose there exists a point where both Junctions are finite and at least one oj them is continuous. Then
8(h
(h
+ h)'(X;·)
PROOF. The
previous
J{(x;·)
+ J~(x;·),
xE
]Rn,
xt
which is valid for arbitrary sets Al and A2 in R". Now applying successively the second statement of the theorem on convex duality (see Section 2), (*), and the third statement of that theorem, we obtain
52
1.
THEORY
THEOREM (Dubovitskii-Milyutin). Let i.. lRn ----+ i, i = 1,2, be convex functions continuous at a point x E lRn and let I,(x) = h(x). Then 8max(h, h)(x)
=
co (8h(x)
U 8h(x)).
PROOF. Since (max(h, h))'(X;') = max(f{(x; -), f~(x; .)) (which can be verified trivially) it again suffices to prove the theorem for sublinear functions Pi = fi(x; -}, i = 1,2. Since fi, i = 1,2, are continuous at x, by the theorem on existence and structure of the subdifferential the sets 8Pi, i = 1,2, are compact; hence the set co (8Pl U 8p2) is compact as well (indeed, if (1- ak)xl +akx2 ----+ x E clco (8Pl U8p2), x~ E 8Pi, i = 1,2, 0 :::; ak :::;1, then, as in the previous theorem, one can select a subsequence {Xkl} such that (1 - ak)x~ + akx2 ----+ (1 - a) xl + ax2, where xi E 8Pi, i = 1,2, and 0 :::; a :::;1. Thus x E co (8Pl U 8p2))' We will also need the following easily verifiable equality which is valid for any Ai C lRn, i = 1,2. Now applying successively the second statement of the theorem on convex duality, (**), and the third statement of that theorem, we obtain 8max(Pl,P2)
=
8max(s8Pl,
S8p2)
= 8s
= co (8Pl
U 8p2).
EXERCISE 1. Find the subdifferentials of the following functions: max(e-X,eX) at x = 0, max(-logx,x -1) at x = 1, and max(x,x2) at x = 0 and x = 1.
It is clear that the last two theorems can be extended by induction to any finite number of functions. The following result extends the Dubovitskii- Milyutin theorem.
THEOREM (Levin (1985); subdifferential form of the cleanup theorem). 3 Let T be a compact set in lRm. Suppose a function f: T x lRn ----+ lR is such that the mapping f (t, -}: lRn ----+ lR is convex for each t E T and the mapping f (-, x): T ----+ lR is continuous for each x E R": Let x' E 8 maXtET f (t, x) for some x E lRn. Then
3The very idea of "clean-up" appeared in the works of de la Vallee-Poussin on approximation theory. It was put in a more complete form by 1. G. Shnirel'man.
3. CONVEX CALCULUS
53
E T, i = 1, ... .r ,
U Oxf(ti,
i=l
--4
PROOF. The condition x' E omaXtET f(t,x) is equivalent to 0 E og(x), where g(x) = maxtET f(t, x) - x'· x = maxtET(f(t, x) - x'· x). By Fermat's theorem is a minimum point of g(.). Then by the clean-up theorem (see Section 11) applied to this function there are a positive integer r :s; n + 1 and points t, E T, i = 1,. .. ,r, such that x is a minimum point of the function maxI<i<r f(ti, x) - x' . x. This implies, again by Fermat's theorem, that the subdifferential of this function at x contains zero, or equivalently, x' E 0 maxI<i<r f(ti, x). Reducing r if necessary, we may assume that the values at of the functions f(ti, .), i = 1, ... ,r, are equal. Now the required assertion follows directly from the Dubovitskii-Milyutin theorem. 0
Formulas of convex calculus. First we list the operations on convex sets and convex functions. We begin with sets. The following operations associate with each pair of convex subsets of ]Rn a convex set in the same space: • convex hull of a union (co U): (AI,A2) --4 Al couA2; • intersection (n): (AI, A2) --4 Al n A2; • sum (+): (AI,A2) --4 Al + A2; • Kelley's sum (181):(AI,A2) --4 Al 181 2 = A Uo::;a::;1 ((1 - a)AI n aA2). Let A: ]Rn --4 ]Rffi be a linear operator. Then one can associate with each convex set A c ]Rn (B C ]Rffi) a convex set in ]Rffi (R") by the following rule: • the image of A under the mapping A: AA = {y E ]Rffi I 3x E A: y = Ax}; • the inverse image of B under the mapping A: BA = {x E ]Rn I Ax E B}. The following operations on convex functions associate with each pair of convex functions on ]Rn a convex function on the same space (for numbers a and b we will usually write a V b instead of max{ a, b} ):
54
l. THEORY
• convex hull of minimum (co /\): (11, h) --+ 11 co /\12 ((l1co/\12)(x) = inf{(l- a)l1(xd + a12(x2) I x =
(1- a)xl
• maximum (V): (11, h) --+ 11 V 12 ((11 V 12)(x) = max(11 (x), 12(x))); • convolution (EB): (11, h) --+ 11 EB12 ((11 EB12)(x) inf{l1(xl) + 12(x2) I x = Xl + X2}); • sum (+): (11,12) --+ 11 + 12 ((11 + 12)(x) = l1(x)+
12(x));
• Kelley's sum (1Zi): (11, h) --+ 11 IZi 12 (11 IZi 12)(x) inf{h(xl) V 12(X2) I x = Xl + X2}. 2. Let h(x) = eX, 12(x) that (h EB 12)(x) = 0 and (13 EB 12)(x)
EXERCISE
= x2/2.
= e-x,
and h(x)
= x2.
Show
Moreover, we will introduce two more operations which associate a convex function to a convex function and a linear operator. Let A: lRn --+ lRm be a linear operator, f a convex function on lRm, and 9 a convex function on R": • the inverse image of f under the mapping A: (j, A) --+
fA
(j A(x)
f(Ax));
• the image of 9 under the mapping A: (A, g) --+ Ag (Ag(y) = inf{g(x) I Ax = y}). The operations on sets thus introduced (except for Kelley's sum) look quite natural. Kelley's sum, as will be seen in the sequel, arises from the reasons of duality. The operations on functions (except for maximum, sum, and inverse image under a linear mapping) look rather artificial, but in fact all these operations result in a natural way from the set operations applied to the epigraphs of the functions. Specifically, if C is a convex subset of lRn x lR, then it is not hard to check that the function f: lRn --+ ~ defined as f(x) = inf{t E lR I (x, t) E C} is convex. Now if hand 12 are convex functions, then one can easily check that the set C = co( epi 11 U epi h) corresponds to the convex hull of the minimum and, obviously, the set C = epi h nepi [z corresponds to the maximum of these functions. Further, it is not hard to see that C = epi h + epi 12 corresponds to the convolution. Using that the epigraphs lie in the product spaces, one can associate with two convex sets epi 11 and epi 12 the convex sets {(x, t) E lRnxlR I (X,ti) E epik i = 1,2} and {(x,t) E lRn x lR I (Xi,t) E epik i = 1,2}. One can easily check that the former corresponds to the sum
3.
CONVEXCALCULUS
55
of functions and the latter to Kelley's sum. The arises from the reasons of duality. In the following theorem the sign "=" means the convex objects in question holds without any tions, while the sign "~" means that the equality certain additional conditions.
image of a function that the equality of additional assumpis only valid under
THEOREM (the main formulas of convex calculus). (1) Let h, 12, g be convex functions on]Rn, f a convex function on ]Rm, A: ]Rn --+ ]Rm a linear operator, and A': (]Rm)' --+ (]Rn)' the operator conjugate with A. Then
1.1 1.3
1.2 1.4
(fIco /\ 12)*
R V f;;
ffi
(h
ffi
12)*
= R + f;;
1.5 (Ag)*
= g* A';
f;;
(2) Let AI, A2, A be convex subsets of ]Rn; B a convex subset of ]Rm; and A: ]Rn --+ ]Rm a linear operator. Then
n A2) = J(AI Q9 A2) = JAI + JA2 = JAI V JA2; 2.2 J(AI co U A2) = JAI co /\ JA2; 2.3 J(AI + A2) = JAI
2.1 J(AI
2.4
ffi JA2;
J(AA)
= AJA;
2.5 J(BA)
= JB A.
item and
(3) Let AI,A2,A,B,A be as in the previous operator conjugate with A. Then 3.1 S(AI 3.4 s(AI 3.6 s(BA)
A' the
n A2) ~ SAl
Q9
= SAl
Q9
sA2;
3.5
(4) Let A I, A2, A, B, A be as in the previous item. 4.1 M(AI 4.3 M(AI
n A2) = MAl
V MA2;
Q9
4.2
+ A2)
~ MAl
MA2;
4.4 M(AI
A2) ~ MAl
=
+ MA2;
A' the
4.5 M(AA)
= A MA;
4.6 M(BA)
MB A. item and
(5) Let AI,A2,A,B,A be as in the previous operator conjugate with A. Then 5.1 (AI 5.3 (AI
n A2t ~ A~ co U A~;
+ A2t =
A~
Q9
A~;
(AI
Q9
A2t
~ A~
= A~ n A~; + A~;
5.5 (AAt=AoA';
5.6 (BAt=A'Bo.
56
1. THEORY
(6) Let P1,P2 be sublinear functions on ~n, A: ~n ----+ ~m a linear operator, and A' the operator conjugate with A. Then
6.1 6.3 6.5
8(p1 V P2) ~ 8P1 co U 8P2; 8(P1 EB P2) = 8P1 n 8P2; 8(pA) ~ A'8(pA).
8P1 n 8P2;
(7) Let C and L be a convex cone and a subspace in ~n respectively. Let A: ~n ----+ ~m be a linear operator and A': (~m)' ----+ (~n)' the operator conjugate with A. Then
7.1 (C1 + C2)' = C~ n C~; 7.3 (L1 7.2 (C1 n C2)' ~ C~ + C~; 7.4 (L1
+ L2)j_
7.5 (AL)j_ =
Lt Lj_A';
=
n L~;
n L2)j_
+ L~;
PROOF. We will not prove all the formulas; the rest is left to the reader. The formulas 1.2, 1.3, 1.5 follow easily from the definitions. Formulas 1.1 and 1.4 are valid under the additional assumption ri (dom j'j ) n ri (domh) i= 0, and 1.6 holds provided there is x such that Ax E ri(domf) (see Rockafellar (1970)). The formulas in part (2) follow from the definitions, as well as formulas 3.2, 3.3, and 3.5. Formula 3.1 is valid if ri Al n ri A2 i= 0. Indeed, using the easily verifiable relation
(JA)*
sA
(i)
s(A1 n A2)
+ JA2)*
s(BA) ~ (J(BA))*
(~) ((JB)A)*
Formulas 4.1, 4.2, 4.6 follow easily from the definitions. For the proof of the remaining ones we formulate some auxiliary assertions,
4. FINITE-DIMENSIONAL
CONVEX
GEOMETRY
57
hEJ
is an arbitrary
(ii) if Al and A2 are arbitrary sets, then (iii) and, finally, if a and b are arbitrary nonnegative numbers, then inf{max(a/a,
]Rn ---+ ]Rn,
b/,8)
a +,8 = I} = a + b.
(iv)
]Rn
Let us prove 4.3. We apply formula 4.5 with operator A: A(Xl,X2) = Xl + X2. Then for any X E]Rn we have
=
J-L(AI + A2)(x)
J-L(A(AI
A2))(x) x A2)(Xl,X2)
(~) inf{J-L(Al
(iii) .
Ix
= Xl
+ X2}
I X = Xl + X2}
J-L(Al0A2)(X)=J-L(
U
<>,/3>0, <>+/3=1
(aAln,8A2))(X)
I}
= J-LAl(X)
+ J-LA2(X),
convex geometry
4. Finite-dimensional
• Convex geometry traces its origin to the 19th century in the works of Cauchy, Steiner, and Minkowski. It reached its fullest flower in the 20th century. Nowadays finite-dimensional convexity also attracts considerable interest due to its applications to optimization algorithms and algebraic geometry. In this section we present some classical results in this area.
58
1. THEORY
4.1. Cauchy's theorem on rigidity of convex polyhedra. We begin with the first fundamental result in the history of convex geometry. Cauchy's theorem says that a convex polyhedron in the three-dimensional space is uniquely, up to translation, determined by its faces. Now we formulate this result more precisely. We will say that two polyhedra, M and M', are equivalent if there is a correspondence f between their faces, edges, and vertices such that if f is a face or an edge of one of them and if I'1 is, respectively, an edge or a vertex of I', then f(f1) is an edge or a vertex of the face or edge f(f) of the other polyhedron.
THEOREM (Cauchy). If the respective faces of two equivalent convex polyhedra in the three-dimensional space are isometric, then the polyhedra themselves are isometric.
Isometric bodies are also called congruent, i.e., superposable by translation. Hence Cauchy'S theorem can be reformulated as follows: If the respective faces of two convex polyhedra are congruent, then the polyhedra themselves are congruent. If we drop the convexity condition, then this assertion fails; see Figure 4.1.
a)
FIGURE
b)
4.1
PROOF. We will deduce Cauchy'S theorem from the following geometric lemma. LEMMA 1. Suppose the faces of two noncongruent polyhedra can be put in a correspondence so that the planar angles of the corresponding faces are equal. Let the edges of the first polyhedron be marked with + or - if the corresponding bihedral angle at this edge is greater
4. FINITE-DIMENSIONAL
CONVEX GEOMETRY
59
or less than the corresponding bihedral angle of the other polyhedron. Next, we mark each planar angle between two marked edges with I or 0 according to whether the signs on these edges are different or the same. Then there are at least four angles of the first polyhedron marked with 1. To explain this idea, imagine a regular tetrahedral angle. Let us press the opposite edges towards each other. Then the corresponding bihedral angles increase, while the other two decrease, so that the edges will be marked with alternate signs, and all of the four planar angles will be marked with 1. We will deduce Cauchy's theorem from the lemma and then prove the lemma. Assume that not all of the bihedral angles of the two polyhedra are equal. Let us mark the edges of M with the signs + or - if the corresponding bihedral angle of M is greater or less than that of M'. Again, we mark a planar angle between two marked edges with I or 0 according to whether the marks of its sides alternate or coincide. Consider first the case where all the edges of the polyhedron are marked. Let No be the number of vertices, Nl the number of edges, and N2 the number of faces of the polyhedra, and let the total number of planar angles of the first polyhedron marked with I be N. Denote by a3 the number of triangular faces, by a4 that of quadrangular faces, and, in general, by ak the number of n-gonal faces. The number of I-angles in a k-gonal face is even and does not exceed k. Hence
(i)
Moreover, we have a3
+ a4 + a5 + ...
= N2
(ii)
and
3a3
= 2N1.
(ii)
2a3
(iii) (iv)
Thus N < 4No - 8 < 4No. On the other hand, by Lemma I we have at least four l-angles adjacent to each vertex, so that N 2: 4No, and we arrive at a contradiction.
(iii),(iv)
60
1. THEORY
In case not all of the edges are marked, we can follow the same lines. Let N{ be the number of marked edges, N~ the number of vertices at which the marked edges originate, and N~ the number of regions into which the marked edges split up the surface of the polyhedron. By the lemma, a succession of marked edges cannot "terminate" by entering a vertex from which no other marked edge goes (this is why these edges split the polyhedron into regions). The network of marked edges need not be connected: they may form several components with no marked edges joining them. But the numbers a~ of regions bounded by k marked edges and the numbers n~ of marked edges originating from the sth vertex have the following two important properties: a~ = 0 and n~ ~ 4. Therefore the above scheme of allocating signs will again lead us to a contradiction if (instead of Euler's identity) we prove the inequality N~ - N{ + N~ ~ 2. In order to prove it, we will join successively the remaining edges to the marked ones, adding at each step an edge with at least one vertex belonging either to a marked edge or to an edge added before. After each step the number of edges increases by one, while the combined number of vertices and regions either increases by one or remains unchanged. Indeed, if at some step a new vertex appears, the number of regions does not change; and if the added edge connects two "old" vertices, then this edge either splits a region into two or splits up neither of the components. Therefore at each step the quantity No - NI + N2 either remains unchanged or decreases by one, and eventually, when all the edges have been added, we get No - NI + N2 = 2 by Euler's formula, which proves the inequality. Thus all the bihedral angles of equivalent polyhedra are equal; hence when applying the two polyhedra to each other with a respective face, they will coincide as a whole. 0 For the proof of Lemma 1 we need one more lemma. LEMMA2 (Cauchy-Steiner). Suppose two convex polygons (planar or spherical) with vertices AI'" An and A~ ... A~ have equal lengths of the corresponding sides IAIA21 = IA~A~I,··· ,IAn-lAnl = IA~_lA~I, and their angles satisfy the inequalities LA2 ~ LA~, ... , LAn-l ~ LA~_l' Then IAnAII ~ IA~A~I·
PROOF. The lemma is obvious for triangles, because in triangles with two sides respectively equal the side opposite the larger angle is larger.
4. FINITE-DIMENSIONAL
CONVEX GEOMETRY
61
Turning to the general case, assume first that only one angle of the first polygon differs from the corresponding angle of the second, say LAi < LA;, whereas LAj = LAj, j i= i, i = 2, ... , n - 1. In this case the lemma is intuitively obvious: let us join the first and the last vertices with an elastic thread and put a hinge at a vertex distinct from these two. It is clear that increasing the angle at the hinged vertex (keeping the polygon otherwise rigid) will stretch the thread. Of course, this argument can be put in a precise form. Let us join Ai and A; with Al,An and A~,A~ respectively. Then the polygons AiAi-l ... Al and A;A;_l ... A~, as well as the polygons AiAi+l'" An and A;A;+l ... A~ are congruent, i.e., IAiAnl = IA:A~I and IAiAll = IA;A~I, so that we can again apply the theorem for triangles (since LAIAiAn < LA~A;A~). If there are several angles of the first polygon that are less than the corresponding angles of the second, we will increase them successively, keeping the angles of the second polygon unchanged. At this point Cauchy finished the proof. The argument looked so obvious that no stumbling block could be expected. However, there was a gap in Cauchy's argument, which was found by Steiner. The matter is that the above argument goes through provided the procedure of successively increasing the angles results each time in a convex figure. Unfortunately, this is not necessarily so. Nevertheless, the lemma is true, but Cauchy's arguments have to be somewhat refined (for details, see Lyusternik (1956)). Lemma 2 immediately implies Lemma 1. Indeed, suppose there is a vertex such that the number of l-edges adjacent to it is two. Let us intersect the surface of each polyhedron with a sphere centered at such vertex of a radius small enough to contain no other vertices. Then we obtain two spherical polygons with angles equal to the corresponding bihedral angles. If there is only one change from + to - between, say, Al and An and one change from - to + between, say, Ai and Ai+l' then we take some points B on [AI, Anl and G on [Ai, Ai+ll and the points B' and G' on the corresponding sides of the second polygon. Applying the Cauchy-Steiner lemma to the polygons BAlA2··· Ai+l G and B' A~A~ ... A;+l G', we obtain that IBGI > IB'G'I, and an application of this lemma to BAnAn-l'" AiG and B' A~A~_l ... A;G' yields IBGI < IB'G'I. This contradiction proves Lemma 1. 0
62
1.
THEORY
4.2. The Caratheodory, Radon, and Helly theorems. The first two results in this section, though of independent interest, provide an important step towards the proof of Hally's theorem, which has numerous and fruitful applications in geometry and analysis. THEOREM 1 (clean-up theorem for cones). Let A be a nonempty subset of R". Then any point x E cone A (conic hull of A) different from the origin is representable as a conic combination of at most n points of A (i. e., there are a positive integer r ~ n, points Xi E A, and numbers J.li > 0, 1 ~ i ~ r, such that x = L~=l J.liXi). PROOF. Let x E cone A and x =I=- O. Then by definition x L~l J.liXi, J.li > O. If N ~ n, we have nothing to prove. Otherwise, if N > n, the vectors {Xi }~l are linearly dependent; hence there are numbers ri, not all equal to zero, such that L~l riXi = O. Without loss of generality we may assume that some of the numbers ri are positive (otherwise we reverse their sign). Let S be the set of the i's for which ri > O. Set fJ = miniEs ~ and J.l~= J.li - fJri, 1 ~ i ~ N. Then all of the J.l; are nonnegative with at least one of them equal to zero, and we have L~l J.l;Xi = L~l J.liXi - fJ L~l riXi = x. Repeating this procedure we finally arrive at the required relation.D COROLLARY (Caratheodory's theorem). If A is a nonempty subset of ]Rn, then any point of the convex hull of A is representable as a convex combination of at most n + 1 points of A. PROOF. Consider the set B = {(I,x) E ]R x ]Rn I x E A}. It is clear that coB = {I} x coA. Let K be the cone generated by B. Then coB c K. If x E coA, then (l,x) E coB and by the cleanup theorem there are r ~ n + 1, points (1, Xi) E B (i.e., Xi E A), and numbers J.li > 0, 1 ~ i ~ r, such that (l,x) = L~=lJ.li(I,Xi). Obviously, this is equivalent to x E co {Xi}r=l' Xi E A. 0 THEOREM (Radon). Any finite set in ]Rn consisting of at least n + 2 points can be divided into two disjoint subsets such that the convex hulls of these subsets have a nonempty intersection. PROOF. Let A = {xdi=l' Xi E ]Rn, and s ~ n + 2. The vectors {Xi - xdi=2 are linearly dependent; hence there are numbers J.li,
4.
FINITE-DIMENSIONAL CONVEXGEOMETRY
63
z=
s i=l
ViXi = 0,
z=
s i=l
vi = 0,
(i)
ViXi, Then
Vi·
(ii)
i.e.,
(-V)X' 2
2,
{ilx,EA2}
n coA2.
THEOREM (Helly). Let A be a set of indices and {Aa}aEA a family of closed convex subsets of ]Rn with at least one of them compact. If any subfamily consisting of n + 1 sets has a nonempty intersection, then the whole family has a nonempty intersection. PROOF. First we prove by induction that any finite subfamily consisting, say, of s sets, s ~ n + 1, has a nonempty intersection. Denote by Ind( s) the inductive statement of the sth step. By assumption Ind(n + 1) is true. Assume that Ind(s - 1) is true for some s ~ n+ 2 and prove Ind(s). Take any s indices {adi=l' By Ind(s -1) there exists an Xj E ni=l,i;iojAa" j = 1, ,s. According to Radon's theorem we can partition Ns := {I, 2, ,s} into disjoint subsets N~ and N~ such that there exists an x E co {Xj }jEN~nco {Xj }jEN~' But if j E N~, then Xj E Aa" i E N~. Hence x E niEN~Aa,. In a similar way, if j E N~,then Xj E Aa" i E N~, so that x E niEN~, i.e., x E ni=l Aa,. If An is a compact set, then we have shown that the family of its closed subsets {Aa n An} aEA is centered (has the finite intersection property), which implies that the whole family has a nonempty intersection (see Dunford and Schwartz (1958)). 0 4.3. Minkowski's theorem on the existence of a convex polyhedron. Let M be a convex n-dimensional polyhedron, let {ni}~=l be the unit normal vectors to its (n - Lj-dimensional faces and {Fd~=l the (n - l.j-dimensional volumes of the faces. We will show that 2:7=1 Fini = 0. Indeed, let n be an arbitrary hyperplane and n the
64
1.
THEORY
unit normal to it. Then (n, ni) is the cosine of the angle between TI and the hyperplane passing through the face Fi, so that (n, ni)Fi is the signed (n - I)-dimensional volume of the projection of F; on TI. The projections on TI of the faces of the polyhedron M cover twice its projection on this hyperplane, and for each pair of points projecting into the same point the products (n, ni) of the corresponding normals by n have opposite signs. Hence 2:7=1 (n, ni)Fi = 0, i.e., unit vector n. Therefore 2:7=1 Fm; = O. For a convex n-dimensional polyhedron the vectors {nJ7=1 must span the entire ]Rn, since otherwise they would lie in a proper subspace so that the polyhedron would be unbounded in a normal direction to this subspace. Minkowski proved that the above conditions on the vectors n; and the numbers F; are not only necessary but also sufficient for the existence of the corresponding polyhedron. (n, THEOREM (Minkowski). Let {nJ7=1 be unit vectors in]Rn that span the entire ]Rn, and {FJ7=1 positive numbers such that 2:7=1 Fin, = O. Then there exists a convex n-dimensional polyhedron with volumes of its (n - I)-dimensional faces equal to Fi and normal vectors to its faces equal to ni. PROOF. The existence of the required polyhedron follows from the existence of a solution to the following extremal problem: Voln{x E]Rn I (x,ni)
k
----+
max,
LFihi
i=1
1,
hi ~ 0,
< k.
(P)
For positive hi the set in braces is, obviously, a convex polyhedron M(h), h = (hI,'" ,hk), of dimension k, since this set contains the ball {x E ]Rn Ilxl :::;minl::;iSk hd. Denote by V(h), h = (hI,'" ,hn), the functional to be maximized in the problem (P). The function h ----+ V(h) is continuous; hence (by the Weierstrass theorem) it attains maximum on the (k-l)dimensional simplex
{h
4.
FINITE-DIMENSIONAL CONVEXGEOMETRY
65
at some point h = (hl, ... ,hk), i = 1, ... ,k. Note also that in a neighborhood ofh the function V(·) is differentiable and (aV/ahi)(h) = Fi(h), where Fi(h) is the (n-l)-dimensional volume of the intersection of the polyhedron M(h) with the hyperplane {x E ]Rn I (x, ni) = hd. (Indeed, if Fi(hi) > 0, then substituting hi + t for hi we add to M(h) (for t > 0) or subtract from M(h) (for t < 0) a figure of volume tFi(hi) up to higher order terms for small t, or of volume o(t) if Fi(h) = 0.) Applying the Lagrange principle we obtain that Fi(h) = )"Fi and
k k
1 = LFi(h)hi i=l
=)..
LFihi
i=l
=)...
Fi.
4.4. The Cauchy and Steiner-Minkowski
M(h) is of volume 0
formulas.
THEOREM (Cauchy's formula). Let Me ]Rn be a convex polyhedron, let dp, be the measure on the sphere §n-l invariant with respect to translations, Jsn-l dp, = meas§n-l, and let VN(~) be the (n - 1)dimensional volume of the projection of M on the hyperplane orthogonal to the unit vector ~ E §n-l. Then
(1)
where VaM is the volume of the boundary of M; i.e., the sum of volumes of its faces, and f3n-l is the volume of the (n-l )-dimensional unit ball. PROOF. Let I', be one of the faces of the polyhedron M and Vr, (~) the volume of its projection on the hyperplane orthogonal to the unit vector ~ E §n-l. It is clear that Vr,(~) = I(Cni)!Vri, where ni is the unit normal to the face I', and Vr, is the volume of this face. Invariance of the measure du implies that Jsn-l 1(~,a)ldp, does not depend on the choice of a and hence is equal to some number Cn dependent only on n. Therefore
66
1.
THEORY
Each point of the projection of M (except a set of null measure) is covered by projections of two faces. This implies that
}§n-l
VM(~)dJL =
}§n-l2
r ~ LVr,(~)dJL
z
= Cn LVr,
2.
,
= cnV8M.
2
To evaluate the constant Cn, it is expedient to consider the unit ball B" instead of M (treating §n-1 as a limit of polyhedra inscribed in §n-1). In this case VBn(~) = 13n-1, so that cn/2 = 13n-1' D REMARK. Cauchy's formula implies that if one convex polyhedron lies inside another, the volume of the inner polyhedron does not exceed that of the outer. By approximating a convex body by convex polyhedra and taking the limit, it is not hard to establish Cauchy's formula for an arbitrary convex body M. Let M c ]Rn be a convex polyhedron and Mr, r 2: 0, the set of points in ]Rn distant at most by r from M (in Euclidean norm); i.e., M; is the union of all balls of radius r with centers in M. One can easily verify that M is a convex set.
where 130 = Vo is the volume of M, 131 is the volume of its surface, and 13n is the volume of the n-dimensional ball.
PROOF. We will conduct the proof by induction in n using Cauchy's formula for convex bodies (see the remark above). For n = 1 formula (2) is obvious. Assume that is true for (n - 1)dimensional polyhedra and let M be a convex polyhedron in ]Rn. Denote by PM(~) the projection of M on the hyperplane orthogonal to a unit vector F. It is clear that PMr(~) = (PM(~))r' Let Vr(~) be the (n - l j-dimensional volume of PMr(~)' By the inductive assumption, Vr(~) = Vo(~) + ... + 13n_1rn-1. According to Cauchy's formula
n-1 13n-1V8Mr
=
In-l VMr(OdJL
In-l L 13k(PMr(~))rdJL
k=O
4.
FINITE-DIMENSIONAL CONVEXGEOMETRY
67
where the ak are some constants. It is clear from geometrical considerations that where P; is the volume of the surface of Mr, so that Vr = Vo + io Ptdt = Vo + Vo + Por n-l
dVr / dr
Pr,
r I«
(Po + n-l ~
a~tk
dt
+ L bkrk+l.
k=l
o
4.5. The Brunn-Minkowski inequality and the Griinbaum+Hamrner theorem. The Griinbaum-Hammer theorem relies on the Brunn-Minkowski inequality and symmetrization. Both these facts are of independent interest. Let A c ]Rn. Denote by r y,a the hyperplane {x E ]Rn I (x, y) = a}. Writing Voln-1(C) for the (n - I)-dimensional volume of an (n - 1)dimensional set C, set
'Py(a)
= { Vol;!~~-l)(A
-00,
n ry,a),
(A
(A
n r y,a) n r y,a)
0, = 0.
=I-
THEOREM (The Brunn-Minkowski inequality). Let A be a convex compact set in R". Then for any y E ]Rn \ {O} and -00 < ao < al < 00, the following inequality holds:
M:=
Let A be a convex compact body in R", and let e = (1,0, ...,0), max{(e,x) I x E A}, m:= min{(e,x) I x E A}. Set a(A):= {x E]Rn
Ix
(t,~),
t E [m,M],
I~I:::;
r(t)},
where r(t) is such that Voln-1 B(O, r(t), ]Rn-l) = Voln-1 (A n r e,t). The set a(A) is the Brunn-Minkowski symmetrization of A. It follows from the Brunn-Minkowski inequality that the BrunnMinkowski symmetrization a(A) is a convex compact set (provided so is A itself) and
68
1.
THEORY
For a set C E ]Rn denote by gr C its center of gravity. If z, = (X*I, ... , x*n) = gr A, then the theorems of integral calculus imply that gru(A)
THEOREM (Griinbaum-Hammer). Let A be a compact convex body in]Rn and X* its center of gravity. Then any hyperplane r x.,(y,x.), y =I- 0, cuts off from A a volume whose fraction relative to that of A is at least (n / (n + 1))n (2: 1/ e); i. e. , Voln(A n
TIy,(y,x.»)
2: (n/(n
+ l))n
Voln(A),
Vy E ]Rn, y =I- O.
4.6. Some complements. Here we briefly review some more results from the theory of convex bodies and polyhedra, which is one of the earliest and most beautiful branches of geometry. It goes back to the theory of Platonic solids and Cauchy'S theorem on polyhedra. The basic subject of this theory is compact convex sets in ]Rn, which are also referred to as convex figures. Define a distance in the set of convex figures by setting
where B is the unit ball in ]Rn. The quantity h(AI, A2) is called the Hausdorff distance between Al and A2. Note also that since convex sets can be added and multiplied by numbers, AI+A2={x!X=XI+X2, aA xIEAI,X2EA2}, ay, YEA},
= {z ] x =
the set of all convex figures forms a cone. This cone is embedded into the "linear space of convex figures" as a semigroup is embedded into a group. We state the following important theorem. THEOREM (Blaschke's compactness theorem). The set of convex figures forms a locally compact cone (in topology induced by the Hausdorff metric). In many cases this theorem allows for proving theorems on existence of solutions to extremal problems concerning convex figures. The convex hull of a finite number of points is a convex polyhedron.
5.
69
THEOREM (on approximation). Let A be a convex figure in ~n and E > O. Then there are two convex polyhedra A_ and A+ such that A_ cAe A+ and h(A_, A+) < E; i.e., a convex compact set may be approximated by polyhedra with any accuracy. This theorem enables one to carryover many facts of the geometry of polyhedra to convex surfaces. The study of convex polyhedra began in ancient times. The 13th book of Euclid's Elements is devoted to five regular convex polyhedra in ~3, the Platonic solids: regular tetrahedron, cube, octahedron, dodecahedron, and icosahedron. In ~4 there are six regular convex polyhedra, and in ~n for n 2: 5 only three: regular simplex, cube, and octahedron. An important role in various fields of natural science is played by regular convex polyhedra which regularly partition the space, but this subject is beyond the scope of this book. An intersection of finitely many half-spaces forms a polyhedral set. THEOREM(on polyhedral sets). A set P is polyhedral if and only if it is finitely generated, i. e., if there exist ~i E ~n, i = 1, ... ,m, and a positive integer k, 0 :s; k :s; m, such that for any x E P there are numbers Ai 2: 0, i = 1, ... ,m, such that x
=
L:~=lAi~i+ L::k+l
Ai~i'
L:~=lAi
= 1.
This implies, in particular, that a bounded polyhedral set is a convex polyhedron and a finitely generated cone is closed because it is polyhedral. 5. Convex extremal problems
min,
x E C,
(P)
70
1. THEORY
The set C is the set of constraints or simply the constraint in the problem (P). If C = JRn, then (P) is a problem without constraints. The points of C are said to be admissible (in the problem (P)). An admissible point x is a solution to the problem (P) or a minimum point in the problem (P) if f(x) ;::::(x) for all x E C. The f minimal value of f in problem (P) is called the value of this problem. In a problem without constraints, necessary and sufficient conditions for to be a minimum point are given by an analog of the Fermat theorem stated in Section 3. In this section we will be interested in conditions for a minimum in a convex problem when C is not the entire JRn and is specified by certain relations such as equalities, inequalities, and inclusions. Suppose we are given a nonempty convex subset A of JRn, a convex proper function [«: JRn -4 iR finite on A, convex functions Ii: JRn -4 JR, i = 0,1, ... ,m', and affine functions Ii: JRn -4 JR, i = m' + 1, ... ,m (recall that an affine function is the sum of a linear function and a constant). The problem
fo(x) fi(x)
-4
SO,
+ 1, ...
,m,
x E A,
is obviously a convex one. It is usually referred to as a convex programming problem. The function c. JRn xJRm+l -4 iR, £(x, A) = 2:::7:0 Adi(x), where A = (Ao, AI, ... ,Am), is called the Lagrange function of the problem (PI) and the numbers Ai, i = 0,1, ... ,m, are the Lagrange multipliers; the vector A will be referred to as the set of Lagrange multipliers.
THEOREM
(Pd
(Karush-Kuhn-Tucker).
!!._roblemjPl). Then there is a set of Lagrange multipliers :\ = (:\0' AI,'" ,Am), not all of which vanish, such that (a) :\i ;::::0, i=O,I, ,m'; (b) :\d(x) = 0, i = 1, ,m'; (c) minxEA £(x,:\) = £(x,:\). If a point x admissible in the problem (P) satisfies conditions (a), (b), and ( c) with :\0 > 0, then x is a solution to the problem (P). Ifm' = m, x is a solution to the problem (P) and there is a point x admissible in this problem such that Ii(x) < 0, i = 1, ... ,m, then
:\0 =I-
(Slater's
condition).
5.
CONVEXEXTREMAL PROBLEMS
71
The equalities (b) in the theorem are called the complementary slackness conditions. PROOF. Let set
x be
...
Consider the
{b
(bo,b1,
,bmf
jRm+l
l:3x
E A:
= bi,
m'
+ 1:::;
i :::; m}.
Since b = (Jo(X) , 0, ... ,of E B for x = x, we have that B =I- 0, and it can be easily verified that B is a convex set. Then ri B =I- 0 by Proposition 1 of Section 1. We will show that b 1- ri B. Indeed, otherwise there is an E > such that b+ LB nBo(O) c B, where LB is the subspace from which aff B is obtained by translation. Since (Jo(x) + 1,0, ... ,O)T E B, we have (-1,0, ... ,O)T = (Jo(x),O, ... ,O)T(Jo(x) + 1,0, ... ,of E LB, and since (E/2)(-1,0, ,0) E Bo(O), we have b+ (E/2)(-1,0, ... ,O)T = (Jo(x) - E/2,0, ,of E B. But this implies that there is an admissible point x such that fo(x) :::; fo(x) - E/2 < fo(x). This contradicts the assumption that x is a minimum point in the problem (PI); therefore b 1- ri B. By the first separation theorem (see Section 1) there exists a nonzero vector>: = (>:0,>:1,... ,>:m) such that
z=
m i=O
>:ibi 2: >:ofo(x)
for b the vectors ...
(i)
Substituting
(Jo(X)
+ 1,0,
,of,
,of, .. ·,
(Jo(x),O, ... ,0,1,0, ...
,of,
where the unity in the last vector stands on the m'th place, we obtain that >:i 2: 0, i = 0,1, ... ,m', which proves (a). Now putting into (i) the vectors
(Jo(X), h(x),
0, ...
,of, .. · ,(Jo(X),
,of,
we obtain that >:di(X) 2: 0, i = 1, ... ,m'. But >:di(X) :::; 0, i = 1, ... ,m', because of (a) and because is an admissible point. Thus >:di(X) = 0, i = 1, ... ,m', which proves (b).
72
1. THEORY
fm(x)f
... ,
i=O
i=O =
where the second-to-last equality uses fi(X) This proves (c). Let conditions (a) - (c) be fulfilled with admissible point x we have
0, m'
+1
~ i ~ m.
>:0 >
i=l
i=l
is a minimum point in the problem (Pi). ~ Let us prove the last statement. Suppose that AO of the >:i are positive, and we have >:d(x) = i.e.,
L7:o >:d(x)
L7:o
L7:1 >:d(x)
, which is impossible.
REMARK. The Karush-Kuhn-Tucker theorem remains valid if we consider an arbitrary vector space X instead of ]Rn. Then the proof of necessity has to be modified, but the proof of sufficiency, obviously, does not depend on the structure of X (this will be used in Section 10).
5.2. Duality of convex problems. We begin with a general scheme of constructing to a given one. Consider the problem
a dual problem
f(x)-4min,
xE]Rn.
(1)
We will embed this problem into a family of "similar" problems depending on some parameter (or, as is said sometimes, we perturb the problem). More precisely, let F: ]Rm x Y -4 ~ be a function such that F(x,O) = f(x) for any x E ]Rn. For each y E ]Rm we consider the problem F(x,y) -4 min, x E ]Rn. The family of such problems is called a perturbation of the problem (1), and the function S: ]Rm -4 ~ specifying for each y the value S(y) of the problem (ly) is called the S-function of the family (ly). Clearly, S(O) is the value of the initial problem (1).
5. CONVEX EXTREMAL
PROBLEMS
73
Before defining the dual problem to (1) we present the reasons motivating this definition. If F is convex, then it is not hard to check that the 5-function is convex as well; and if the latter is closed, then by the Fenchel-Moreau theorem 5 = 5** and, in particular, 5(0) = 5**(0). Let us describe the problem having the value 5**(0), which will be taken for the dual to (1). By definitions 5(0) ~ 5**(0) = sUPY'E(lRm), (-5*(y')) and 5* (y')
y-
sup (O,x+Y"y-F(x,y)), xElR,yElRm where the last expression gives the function conjugate with F at the point (0, y'). Hence the dual problem to (1) (with respect to a given perturbation) is the problem
-F*(O,y')
--+
max,
y' E (lRm)'.
(1*)
Since the function F* (0, .) is convex, we see that (1*) is the maximization problem of a concave function. We call such problems convex as well (since instead of (1*) one can consider the minimization problem of F*(O, -); this replacement changes the sign of the value and does not affect the solution). Thus the dual problem is convex regardless of whether or not the initial problem is convex. The above relations imply that the value of the dual problem (1*) is always no greater than that of the initial problem (1) and that a sufficient condition for their values to be equal is 5(0) = 5** (0). The conditions to ensure this equality are, of course, based on the Fenchel-Moreau theorem. Let us write down the dual problem to (P) (assuming for simplicity that m' = m; i.e., there are no constraints of equality type) with respect to the perturbation
fo(x)
--+
min,
x E A,
which will be referred to as the standard perturbation. In terms of the general scheme this means that we write the initial problem in the form f(x)--+min, xElRn, where f(x)
f(x)
fo(x) if fi(X) S 0, i = 1, ... ,m, and x E A, and +00 otherwise, and then we consider the function F: lRn x
74
1. THEORY
lRm ----+
iR such that F(x, y) = fo(x) if fi(X) :::;Yi, i = 1, ... ,m = (Y1,'" ,Ymf) and x E A, and F(x,y) = +00 otherwise.
F*(O,y')=
sup
xEIRn ,yEIR=
(where
Let us find the function conjugate with F at the point (0, Y'):
(y'·y-F(x,y))
sup
yEIR=
sup
xEA
(y"y-fo(x)).
... ,m
f,(x)~y"i=l,
If y' = (y~, ... ,y~) 2: 0, i.e., all the coordinates are nonnegative, then the second supremum is easily seen to equal +00; and if y' :::;0, then it is equal to Thus the dual problem to 1 fi(X)Y~ - fo(x). (P) (with respect to the standard perturbation) has the form
2:::
- ~~~ (~fi(X)Y~
- fo(X))
----+
max,
Let us apply the formulas just obtained to the linear programming problem. This is a particular case of a convex programming problem which consists in maximization of a linear functional over a convex polyhedron. The fundamentals of the theory of such problems were laid by L. V. Kantorovich in the late thirties. His results were rediscovered by American mathematicians (Kuhn, TUcker, Dantzig, and others) during the 40s. The linear programming problem may be written in different forms. We will consider this problem in the so-called normal form:
C'
----+
min,
Ax 2: b,
x 2: 0,
(2)
where, as usual, x E lRn, C E (lRn)" A is an m x n matrix, b E lRm, and inequalities between vectors are to be understood coordinate-wise. Writing fo(x) = c- x, fi(X) = (-ai, x) + bi, i = 1, ... ,m, where the ai are the rows of the matrix A and the b, are the coordinates of the vector b, and letting A = lRf_'we see that (2) is a particular case of the problem (Pd. Due to the fairly simple structure of the linear programming problem, its dual problem (with respect to the standard perturbation) can be written in an explicit form, and it is also a linear programming problem. Indeed, the perturbation (1) is written as
C·
----+
min,
-Ax
+ b :::;y,
x 2: 0.
5. CONVEX EXTREMAL
PROBLEMS
75
=-
sup(y'. (-Ax+b)
x:2:0
- c·x)
inf (-y'.
x:2:0
b+ (y' A+ c) ·x).
The set of those y' where this function equals - 00 can be discarded because we are interested in its supremum. Thus y' A 2: -c and in this case -F*(O,y') = -t/: b. Introducing the variable ~ = -y', the dual problem to (1) is written as ~.b
--+
max,
~A S c,
~ 2: O.
5.3. Algorithms of convex optimization. The algorithms for finding the solutions to extremal problems are based on the ideas of the method of descent as well as on the penalization and central section methods. In the infinite-dimensional case there are various methods of reduction to a finite-dimensional setup. The method of central sections. Consider the minimization problem of a convex differentiable function f over a convex finite-dimensional compact body A C JRd: f(x)
--+
min,
x E A.
This is a general problem of convex finite-dimensional optimization. In the mid-60s, A. Levin in Russia and D. Newman in the USA proposed the following method of seeking a minimum in problem (P2) based on the Griinbaum-Hammer theorem, which was proved in the previous section. According to this theorem a hyperplane passing through the center of gravity of a convex body A in d-dimensional space splits the body into two sets, A' and A", with the volume of each of them no less than (1- e-l) times the volume of the entire set
A.
The method itself (referred to as the method of central sections) consists of the following procedure. Let us write Ao for A. Find its center of gravity Xl = gr Ao and compute f'(Xl). If it equals the zero vector, Xl is the minimum point and the problem is solved. Otherwise, we may discard the part of Ao lying in the half-space II~ := {x I f'(Xl)' (x - xd > O} (since, as we remember, any convex smooth function fulfills the inequality f(x) - f(O 2: 1'(0, (x - ~), so that for x E Ao n II~ we will have f (x) > f (xd 2: min 1). Denote by Ai the remaining part after deleting A n II~ (where A = Ao) and implement the procedure repeatedly.
76
1. THEORY
Now at the nth step we select the point ~n among {Xl, ... ,xn} where the value of f is no greater than any of the values {f(Xi), 1~ i ~n}. We will prove that f(~n) converges to the value of the problem (P2) at a geometric rate. Indeed, without loss of generality we may assume that 0 E A and that this is the minimum point in problem (P2). Let a > (VoId An/VoId A)l/d ;:::: /2 (where VoldC is the volume of a the d-dimensional set C). Then by definition Vold(aA) > Vol, An; hence there is an element x E aA \An. We see from the construction of the algorithm that if the element x was discarded, then f(x) > f(xs) for some s; hence f(x) > f(~n)' But x E aA, so that x = a~, ~ E A; therefore by the Griinbaum-Hammer theorem we have (denoting by var f the maximum of f(x) - f(O), X E A): f(~n)
+ (1- a)f(O)
defa
= f(O)
+ a(f(O
0
- f(O))
+ (1- e-It/dvar
f.
2:.~=1x% ~ 1 and the semi-ellipsoid is the "upper hemisphere", i.e., consists of the points of the ball with nonnegative last coordinate. Let us take the point ~ with coordinates (0, ... ,0, l/(d + 1)) for the center of the circumscribed ellipsoid and define the ellipsoid by the
The method of central sections has not received practical implementation because of computational difficulties in finding the center of gravity. But its basic idea can be used for constructing algorithms of much practical use. Here is one such algorithm. The method of circumscribed ellipsoids. This method is based on a combination of two ideas, viz., the idea of section discussed above and the following geometric fact: half of an ellipsoid can be placed inside an ellipsoid of smaller volume than that of the initial ellipsoid! It is essential that, first, the ratio of the volume of the ellipsoid circumscribing the "semi-ellipsoid" to that of the ellipsoid itself is less than one and, secondly, the computation of the center of the new ellipsoid from the position of the semi-ellipsoid requires order d2 operations. Let us show how the circumscribed ellipsoid is constructed. Due to the affine nature of the problem (i.e., because the ratio of volumes of geometric fugures does not change under their affine transformations) we may assume that the initial ellipsoid is the ball
5. CONVEX EXTREMAL
PROBLEMS
77
inequalities
c:
"""" k=l
d-l
Xk
2(d2 1) _ d2
(x
__ l_)2(d d+l d2
+ 1)2
<1
_.
The volume of this ellipsoid equals the product of its semi-axes times the volume of the unit ball. But we are interested in its ratio to the latter, which is therefore equal simply to the product of semi-axes: g(d) (
Jd2=l
)d-l d
+1
(d _ 1)(d-l)/2(d
+ 1)(d+l)/2'
It is not hard to prove (as a consequence of convexity of the function d --4 dlogd) that g(d) < 1 for any positive integer d 2: 2. Now we can describe the algorithm. Let Eo be an ellipsoid containing the set A. If its center Co does not belong to A, we can construct a hyperplane passing through Co and containing no points of A. Then we discard the half of the ellipsoid disjoint with A. Otherwise, if Co E A, we compute !'(co) and perform the central section "according to Levin-Newman" to obtain again a semi-ellipsoid, which we denote by Eb. Now we circumscribe around Eb an ellipsoid of smaller volume than that of Eo, denote this ellipsoid by El, and iterate the procedure. Again, we will approach the value of the problem at a geometric rate. The simplex method for the solution of linear programming problems. We will describe the method for finding the solutions to the problem c· x --4 max, Ax:S b, (1) where c = (Cl,'" ,cn), x = (Xl, ... ,xn)T, A = (aij) is an m x n matrix, and b = (bl, ... ,bm). The set of admissible vectors in this problem (i.e., of x E ]Rn such that Ax :S b) is the intersection of finitely many half-spaces. Such set is said to be polyhedral. A polyhedral set may be unbounded. A bounded polyhedral set is a polyhedron. It is not hard to show that a solution to problem (1) is achieved at a vertex of the polyhedral set. Thus to find the solution one has simply to examine the values of the function to be maximized at all the vertices and to select the largest of them. However, in applied problems the number of vertices is extremely large, and we need a reasonable procedure of examination. One such procedure was proposed by Dantzig. We will describe his simplex method in the so-called nondegenerate case.
78
1. THEORY
Suppose we have found some vertex x (there are effective methods for the search of vertices), and suppose this vertex is nondegenerate. This means that at this vertex exactly n out of m inequalities in the system Ax :s;; b become equalities and the matrix An (see below) is nondegenerate. Without loss of generality we may assume that the first n inequalities become equalities; i.e. (writing ai = (ail, ... , ain), i = 1, ... ,m),
ai . x
= bj,
1< i
< n,
(i)
By definition, the vertex x is nondegenerate if the vectors {al, ... , an} form a basis in ]Rn, i.e., the matrix An = (aijh:S;i,j:S;n is nondegenerate. Set b = (bl, ... bn). Then we find from the equation Anx = b , that x = A;;-lb. (ii) Let). be the solution of the equation
AT). n
(iii)
(where A; is the transpose of An). There are two possibilities: "X ~ 0, or some components of "X are negative. We will show that in the first case x solves problem (1). Indeed, let x be an admissible vector in this problem, i.e.,
Ax:S;; b.
Set "X = ("Xl,'"
(iv)
.x, 0, ...
,0). Then
(iv) -- n Xc· x (~) AT, /\ . X -, /\. A nX:S;; ,/\. -b (~.D, A nX - AT, /\ . - (~) c· -. n - /\. x,
hence x is a solution to problem (1). Consider the second possibility. Let, for example, the first coordinate of "X be negative. Denote by y a nontrivial solution of the system a2 . y = ... = an . y = O. (v) Since An is nondegenerate, that Then for small t
ai . X
i= O. We may
assume
y=
-E
< O.
(vi)
> 0 we have
i = 1, i ~n
+ ty < bi,
+ 1,
79
x + ty
+ t-) y
c· (X
(v),(vi) = C> x
+ t c· A-nl(
-E,
0, ...
O)T
=c·x+t(A~lfc·
=c . x
(-E,O, ...
,of
- tEAl> c . x.
Letting t tend to infinity, we find either that the supremum in this problem is infinite or that some jth inequality (j 2: n+ 1) becomes an equality. Then the corresponding point is a vertex and the functional to be maximized takes a larger value at this point than at x. Taking this vertex for the initial one (provided it is nondegenerate), we repeat the procedure, and so on. This is how the simplex method works in the nondegenerate case. The simplex method has played an exceptionally important role in the history of numerical methods of optimization. For many years it was unknown whether or not problem (1) is a so-called nonpolynomial (i.e., "difficult") one. In 1970 Klee and Minty constructed examples showing that in some situations the simplex method requires an exponential number of steps. Many mathematicians (including Dantzig himself) used to say that they regarded as a miracle the triumphant "service" of the simplex method in innumerable applied studies. And it was only recently that polynomial algorithms comparable with simplex method in efficiency were constructed. 6. Supplement: Convex analysis in vector spaces
• Convex analysis studies convex sets, convex functions, and convex extremal problems. An important feature of convex objects is that they may be described in different ways, but essentially in geometric and algebroanalytic terms. Moreover, convex objects admit a special calculus, viz., convex calculus. In this section we dispense with finite-dimensional vector spaces and approach the notions considered before from a somewhat different point of view. 6.1. The classes of convex sets. Here we introduce in geometric terms five classes of convex sets in an arbitrary vector space. The objects of these classes are unions of the so-called elementary objects, which possess certain geometric
80
1. THEORY
properties. We assume that the reader is familiar with standard definitions of a subspace, a cone, and a convex set. Let X be a vector space. 1) By an elementary subspace we mean a straight line passing through the origin. A subspace is a union of such lines containing any two lines along with the plane spanned by them. The class of all subspaces of X will be denoted by Lin (X). 2) By an elementary cone we mean a ray going from the origin. A convex cone is a union of such rays containing any two rays (unless they go in opposite directions) along with the entire angle between them. The class of all convex cones in X will be denoted by Cone (X). 3) By an elementary zero-convex set we mean a line segment with one end-point the origin. A zero-convex set is a union of such segments containing any two segments (unless they go in opposite directions) together with the entire triangle with two sides of these segments. The class of all zero-convex sets in X will be denoted by Coo (X). 4) By an elementary affine subspace we mean a point. An affine subspace is a union of points containing any two different points together with the straight line passing through them. The class of all affine subspaces in X will be denoted by Aff (X). 5) By an elementary convex set we also mean a point. A convex set is a union of points containing any two points together with the segment joining them. The class of all convex sets in X will be denoted by Co (X). By an elementary object we mean any of the elementary sets defined above. The classes of sets so defined admit a unified algebraic description as classes of sets which along with two points Xl and X2 contain all the points of the form 0lXl + 02X2, where the pair of real numbers (01,02) belongs to a certain subset A C lR?: A = ~2 for Lin (X), A = ~~ for Cone (X), A = {(01,02) E ~~ I 01 + 02 :::; I} for COD(X), A = {(01,02) E ~2 I 01 + 02 = I} for Aff (X), and A = {(01,02) E ~~ 101 + 02 = I} for Co (X). 6.2. Duality operators. All the objects in X defined above (subspaces, convex cones, zeroconvex sets, affine subspaces, and convex sets), which are unions of elementary objects, have dual objects in the dual space, which are intersections of dual objects to the corresponding elementary ones. We begin with definitions of dual objects to elementary ones.
SPACES
81
Let X' be the space dual to X, i.e., the set of all linear functionals on X. We write (x',x) for the action of x' E X' on x E X. The dual objects to elementary ones are defined as follows. For the straight line l it is the annihilator of l, l1.. = {x' E X' I (x', x) = 0, \Ix E l} (which is a hyperplane); for a ray r this is the conjugate cone of r, r' = {x' E X' I (x',x) 2: 0, \Ix E r}, which is a half-space bounded by a hyperplane passing through the origin; for a segment ~ this is the polar of~, ~o = {x' E X' I (x',x) :::;1, \Ix E ~}, which is a translated subspace. We do not specify the dual objects to points as elementary objects, since they will not be used in the sequel. The bidual elements in X are defined in a natural way: the biannihilator of a straight line l, l1..1..= {x E X I (x', x) = 0, \Ix' E l1..}, the biconjugate cone of a ray r, r" = {x E X I (x', x) 2: 0, \Ix' E r'}, the bipolar of a segment ~, ~oo = {x E X I (x',x) :::;1, \Ix' E ~O}. If a is a straight line, a ray, or a segment and d denotes the operation translating a to its annihilator, conjugate cone, or polar respectively, then it is not hard to see that add = a. Now we note that the annihilator of the linear hull of two straight lines (or, which is the same, of their sum) is the intersection of the annihilators ((h + h)1.. = It n It). In a similar way, the conjugate cone of the conic hull of two rays (or of their sum) is the intersection of the conjugate cones (h + r2)' = r~ n r~). The situation is slightly more complicated for segments. The polar of the convex hull of two segments is indeed the intersection of their polars, but the polar of the sum of segments gives rise to a new operation to be denoted by Q9 ((~l + ~2)O = ~l Q9 ~2 = UO<a<l((1 - a)~l n a~2)· The proofs of all these statements are based on-the following simple algebraic facts: if al and a2 are two real numbers and the expression al al + a2a2 is equal to zero for any aI, a2, then al = a2 = 0; is nonnegative for any nonnegative aI, a2, then al 2: 0, a2 2: 0; is no greater than one for any nonnegative aI, a2 such that al + a2 :::;1, then ai :::;1, i = 1,2; is equal to one for any aI, a2 such that al + a2 = 1, then al = a2 = 1. The definition of the new operation on segments is based on one more simple algebraic fact: if al and a2 are two real numbers and alaI + a2a2 :::;1 for ai :::;1, i = 1,2, when al < 0, a2 2: 1, when a2 < 0, al 2: 1, and when al 2: 0, a2 2: 1, al + a2 :::;1, then this sum is {al < I} Q9 {a2 < I}.
°:::;
82
1. THEORY
A dual object is defined as the intersection of the sets dual to the elementary ones: if L E Lin (X), i.e., L = UIEL l, then LJ.. = nzEL lJ.. = {x' E X' I (X',X) = 0, Vx E L}; if C E Cone(X)), i.e., C = UrECr, then C' = nrECr' = {x' E X'I (X',X) ~ 0, Vx E C}; if B E Coo(X), i.e., B = UflEB~' then BO = nflEB ~o = {x' E X' I (X',X) :::; 1, Vx E B}. We see that dual objects belong to the same classes of sets (subspaces, convex cones, and zero-convex sets) as the initial ones but defined in the dual space. Then taking their duals we obtain the objects in X bidual to the initial ones. 6.3. The duality theorem. The biduality of elementary objects (add = a) implies that any set is contained in its bidual. In order to describe the sets which are equal to their bidual sets we will introduce topologies in the vector spaces X and X'. These spaces are dual to each other in the sense that the bilinear form (-, .): X' x X ---+ lR, (x', x) ---+ (x', x) has the property that (x', x) = for any x E X (x' E X') implies x' = (x = 0). This duality enables us to introduce topologies in X' and X making them locally convex topological spaces (LCS). They are called weak topologies and are denoted by a(X, X') and a(X', X) respectively. The base of neighborhoods of zero for a(X,X') (a(X',X)) consists of finite intersections of the sets U(X', c) = {x E X II(x',x)1 < e}, x' E X', e > (U(x,c) = {x' E X' I l(x',x)1 < C}, x E X, E > 0). The linear functionals x ---+ (x',x), x' E X' on X (x' ---+ (x',x), x E X X') are, obviously, continuous in these topologies, and each of the spaces X and X' is topologically dual to the other one, i.e., (X, a(X, X'))' = X' and (X', a(X', X))' = X. If X is an LCS (in particular, a normed space) and X* its dual space (i.e., the set of all linear continuous functionals on X), then the bilinear form (x*,x) ---+ (x*,x) (where (x*,x) is the value of the functional x" E X* on the element x E X) renders the vector spaces X and X* mutually (topologically) dual. The topology a(X,X*) is called weak topology in X (and this is the weakest topology among those in which the functionals in X* are continuous), and in case X is a normed space, the topology a(X*, X) is called weak" topology on X*. In any LCS the separation theorems for convex sets hold. Here we state the second separation theorem.
°
°
SPACES
83
THEOREM (second separation theorem). Let X be an LGS, X* its dual, A a nonempty convex closed subset of X, and Xo E X \ A. Then the set A and the point Xo are strictly separable; i. e., there is a nonzero element x* E X* such that sUPxEA (x* , x) < (x*, xo). The hyperplanes and half-spaces in X and X' are closed in their weak topologies; therefore all dual and bidual sets are also closed. Thus for a set to coincide with its dual it is necessary that it be closed. It can be easily deduced from the second separation theorem that this condition is also sufficient. For example, we will show this for annihilators: if L is closed, then LJ..J.. = L. Indeed, as was pointed out, L c LJ..J... Assume that L 1= LJ..J.. and let Xo E LJ..J.. \ L. By the second separation theorem there is a nonzero element x' E X' such that
SUp(X',
xEL
x) < (X',XO).
(i)
Then x' E LJ.. because otherwise (x', x) 1= for some x ELand since tx E L for any t E lR, we arrive at a contradiction with (i). But if x' E LJ.., then the left-hand side of (i) is equal to zero; hence (x', xo) 1= 0, which, in its turn, contradicts the fact that Xo E LJ..J... In fact we have given a two-way description of all convex closed objects. Each of them was first defined geometrically (as a set containing two elementary objects together with their linear, conic, or zero-convex combination) and then algebraically (as the solution to a system of homogeneous linear equations, homogeneous linear inequalities, or nonhomogeneous inequalities with the same free term). The latter description can be analytically expressed as follows: a convex closed set of the given class is equal to its bidual. Thus all these descriptions are contained in the following theorem (which for the finite-dimensional case was proved in Section 2). THEOREM 1 (on duality of sets). A subspace L is closed if and only if LJ..J.. = L; a convex cone C is closed if and only if C = C; a zero-convex set B is closed if and only if BOO = B.
U
6.4. Convex calculus. The two basic operations on the class of all convex sets are convex hull of a union (co U) (in terms of which subspaces, cones, zero-convex, and simply convex sets are defined) and intersection (n) (defining the dual objects). Moreover, there is one more important operation, viz., summation (+) of two sets. The convex hull of two straight lines
84
1.
THEORY
coincides with the convex hull of their union and with their sum, so that in Lin (X) there are only two operations: + and its dual, n. The conic hull of two rays also coincides with the convex hull of their union and with their sum; hence in Cone(X) there are also only two operations, + and n. Further, we should bear in mind that the image and the inverse image of any of our objects under a linear mapping belong to the same class. Moreover, intersections of closed convex sets and inverse images of such sets are closed sets, whereas image, sum, convolution, and convex hull of a union do not in general possess this property. Thus (as will be seen from the forthcoming theorem) the action of the duality operator on the sum, the image, and the convex hull of a union of convex sets yields the desired relations without any additional assumptions, whereas the relations for a similar action on the intersection, the convolution, and the inverse image of convex sets hold under certain conditions, which we express by the sign ~. THEOREM 2 (formulas of convex calculus of sets). L, E Lin (X), i = 1,2,
===}
+ L2)j_
=Lt
n L-i,
(LA)j_
= t» A*,
n C2)'
= A* t»,
c, E Cone (X),
BiECoo(X),
i = 1,2,
===}
n C~,
(CA)' ~ A*C'.
~Ci + C~,
C'A*,
i=1,2,
===}
(ABt=BoA*,
+ B2t
Q9
B~,
B2t
+ B~.
There is one more operation, +, which does not appear in this book; hence we do not state the corresponding formulas (although they could be easily written down). Apart from sets, convex analysis deals with functions. But the theory of convex functions can be easily reduced to that of convex sets because each point {x} is associated with a linear function on X, S{x}(x') = (x',x). The inverse operator is called the subdifferential, 8s{x}(·) = {x}. The supremum of a family of linear functions is a
SPACES
85
convex homogeneous (of first order) function (or a sublinear function). The epigraph of a sublinear function is a cone in X x R If the function is closed, this cone is closed; hence it is an intersection of half-spaces containing a ray {(O,a) I a 2: O}. It is not hard to see that in this case this function is a supremum of a family of linear functions, i.e., a support function of some set. We complete this section by a list of the remaining formulas of convex calculus, which for the finite-dimensional case were given in Section 3. Let X and Y be vector spaces and X' and Y' their dual spaces. All spaces are endowed with the corresponding weak duality topologies.
(1) Let
h, 12,9 be convex functions on X, f a convex function on Y, A: X ----+ Y a linear operator, and A': Y' ----+ X' the operator conjugate with A. Then
1\
f;;
1.2 1.4
f;;
be convex subsets of X, B a convex subset of Y, and let A: X ----+ Y be a linear operator. Then
Q9
2.1
A2) = bAl
AbA;
+ bA2 = bAl
2.3
(3) Let Ai, A2, A, B, A be as in the previous item and A' the
operator conjugate with A. Then
3.1 s(Al n A2) ~ SAl EB sA2 ~ S(Al coU A2) = SAl V sA2; 3.4 S(Al Q9 A2) ~ SAl Q9 sA2; 3.6 s(BA) ~ A' sB.
3.2
Ai co 1\ sA2;
3.3
+ sA2;
(4) Let Ai, A2, A, B, A be as in the previous item. Then 4.1 tJ(Al n A2) = tJAl tJ(Al + A2) ~ tJAl 4.5 tJ(AA) = AtJA;
4.3 V
Q9
tJA2; tJA2;
tJAl EBtJA2;
+ tJA2;
CHAPTER
Applications
7. Convex analysis of subspaces and cones and the theory of linear equations and inequalities • Here we show how the theory of linear equations can be developed on the basis of duality and convex calculus of annihilators and the theory of solvability of linear inequalities on the basis of duality and convex calculus of conjugate cones. Let a system of linear equations L~=laijXi = bj, j = 1, ... ,m, or inequalities L~=laijXi :::;bj, j = 1, ... ,m, be given, where aij, 1 :::;i :::; 1 < j < m, and bj, 1 < j < m, are given numbers, and n, Xi, 1 :::;i :::; are unknown variables. The main problems here are: n, compatibility (solvability) of the systems of equations or inequalities (i.e., the existence of at least one solution), description of the set of all solutions, existence of nonnegative solutions, deriving explicit formulas for solutions, and so on. Linear equations are one of the most long-standing subjects in mathematics. In the earliest known source containing formulations of mathematical problems, the Rhind papyrus (about 1650 BC), the linear equation X + ~ = 19 is discussed. The methods for solving two equations in two unknowns and three equations in three unknowns were described in the Chinese treatise Mathematics in Nine Books attributed to the second or first century BC. The general theory of linear equations with finitely many equations and unknowns was built in the 18-19th centuries (G. Cramer, L. Kronecker, and others). In the 20th century many results of the finite-dimensional theory were extended to the infinite-dimensional case (I. Fredholm, F. Riesz). In contrast to the theory of linear equations, the development of the theory of linear inequalities began only in the 19th century. The set of solutions of a system of linear equations and inequalities is the intersection of half-spaces of the form {x E lRn I a . x :::; }, b where a = (al, ... ,an), x = (Xl, ... ,xnf, and b E lR (in the case
88
2. APPLICATIONS
of linear equations each hyperplane {x I a . x = b} may be viewed as the intersection of the half-spaces {x E ~n I a· x :::; b} and {x E ~n I -a· x :::; -b}). The intersection of finitely many halfspaces is a convex polyhedron. It is clear that a convex polyhedron specified by finitely many homogeneous inequalities, i.e., a set of the form {x E ~n I ai. x :::;0, i = 1, ... ,m}, is a convex cone. Such cones will be called polyhedral cones. A cone which is a convex hull of finitely many vectors, i.e., a set of the form cone {bl, ... ,bS}, is said to be finitely generated. We begin with the famous theorem by H. Weyl (1935) describing the structure of compact convex polyhedra and convex polyhedra that are polyhedral cones. THEOREM (Weyl). 1. A convex polyhedron in ~n is compact if and only if it is the convex hull of a finite number of points. 2. A cone in ~n is polyhedral if and only if it is finitely generated. PROOF. 1. Let M; be a compact set in ~n which is the intersection of finitely many half-spaces. Then Ml is a convex compact set, and by the Minkowski-Krein-Milman theorem (see the Introduction and Section 11) it contains an extreme point. An extreme point of an intersection of half-spaces, being a boundary point, must belong to the intersection of n hyperplanes whose normals form a basis in ~n (otherwise the intersection of hyperplanes is an affine manifold containing with each point an interval, so the point cannot be extreme). Since there are finitely many collections of n hyperplanes from among the boundaries of our half-spaces, the set Ml has finitely many extreme points, and by the Minkowski-Krein-Milman theorem it is the convex hull of these points. Conversely, let the set M2 C ~n be the convex hull of finitely many points Xl, ... ,XN. Then M2 is a convex compact set. Without loss of generality we may assume that the origin is an interior point of M2 = co {Xl, ... ,xN} (otherwise we restrict our considerations to the affine hull of this set and, if necessary, shift M2 to contain zero in its interior). Then the polar of M2 is also a convex compact set (as a closed set contained in some ball), which is the intersection of finitely many half-spaces {y E (~n)' I y. Xi :::; I}, i = 1, ... ,N. By the "only if' statement proved above M2 = co {yl, ... ,yN'}. By the theorem on bipolar M2 = M2°; hence M2 is the intersection of finitely many half-spaces {x E ~n I yi.x:::; I}, i = 1, ... ,N'.
7. LINEAR
EQUATIONS
AND
INEQUALITIES
89
2. Let C = cone{a1,,,. ,as} C ~n. Let us show that C is a polyhedral cone. The case C = ~n is trivial because, for example, C = {x E ~n I O·x:::; O}. Let C i= ~n. Consider the set K = co {O,ai, ... ,as}. By the first part of this theorem there are ci E (~n)' and ai E~, i = 1, ... ,m, such that K = {x E ~n Ici·x:::; ai, 1:::; i :::; Since E K, we have ai 2: 0, 1 :::;i :::; and some m}. m, of these numbers are zeros. Indeed, if ai > 0, 1 :::;i :::; then m, is an interior point of K and hence of C because, obviously, K c C. But then C must be the entire ~n, which contradicts the assumption. Let ai = 0, i = 1, ... ,l, l < m, Set C1 = {x E ~n I ci . X < 0, 1 :::;i :::; l}. We will show that C = C1. If x E C, then ax E K for a > small enough. Hence ci . ax:::; 0, i = 1, ... ,l, so that x E C1. Conversely, let x E C1. Then for a > small enough the vector ax belongs to K (use the representation of K as an intersection of half-spaces); hence x E C. Thus a finitely generated cone is a polyhedral one. Now let C = {x E ~n I gi. X :::; 0, i = 1, ... ,m}. Consider the cone C2 = cone {- gl ... ,- gm}. It can be easily verified that the cone conjugate with C2 coincides with C, i.e., C~ = C. As we have proved, the finitely-generated cone C2 is polyhedral; hence it is closed as an intersection of closed half-spaces. Then by the theorem on bi-conjugate cones C2 = C~ = C'. Thus C' is finitely-generated; hence it is a polyhedral cone. By what we have just proved, there is a finitely generated cone C3 such that C3 = C". But C" = C, so that C is finitely generated. D
Now we proceed to the solvability problem for the systems of linear equations and inequalities stated above. To this end it well be expedient to write them in a matrix form. Let A = (aij) be an m x n matrix. The systems of linear equations and inequalities can now be written as Ax = b and Ax :::; , where the inequality between vectors b is understood coordinate-wise. The matrix A determines a linear operator from ~n into ~m, which will be denoted by the same letter. This matrix determines also the linear operator A': (~m)' --4 (~n)' by the rule A'y = yA, and it is easily verified that for any x E ~n and y E (~m)' the equality u : Ax = yA· x holds. We denote the image and the kernel of the linear operator A by ImA and Ker A respectively. Recall that L1- denotes the annihilator of the subspace L. Convex analysis of subspaces in ~n consists of the duality theorem: if L is a subspace in ~n, then
90
2. APPLICATIONS
(a) L
Lj_j_,
(b) (L1 + L2)j_ = Lt n Lf; (c) (L1 n L2)j_ = Lt + Lf; if A: ~n --4 ~m is a linear operator, then (d) (AL)j_ = Lj_ A'; (e) (LA)j_=A'Lj_.
The following result gives the dual description of solvability of linear equations:
THEOREM 1. Let A: ~n --4 ~m be a linear operator. equation Ax = b is solvable if and only if b E (Ker A,)j_. PROOF.
Then the
= b means that b
A~n.
Then
Convex analysis of polyhedral cones (similar to convex analysis of subspaces) consists of the duality theorem: if C is a polyhedral cone in ~n, then
(a') C = C",
and formulas of convex calculus of annihilators: (b') (C1 + C2)' = q n q; (c') (C1 n C2)' = C~+ q; if A: ~n --4 ~m is a linear operator, then
(d') (e')
With the help of these results we prove three main theorems of the theory of linear inequalities.
THEOREM 2. Let A: ~n --4 ~m be a linear operator and b E ~m. Then 1) the inequality Ax ::::: is solvable if and only if y . b ::::: for all b 0 y such that y ::::: A'y = 0 (Ky Fan (1956)); 0, 2) the equality Ax = b for x ::::: is solvable if and only if y. b ::::: 0 0 for all y such that A'y ::::: (Minkowski (1896), Farkas (1901)); 0 3) the inequality Ax ::::: for x ::::: is solvable if and only if b 0 u b ::::: for all y such that y ::::: A'y::::: 0 (Gale (1960)). 0 0,
7.
LINEAREQUATIONS ANDINEQUALITIES
91
PROOF. We will also use the evident equality (~i)' = ~i. 1) Solvability of the inequality Ax :::; means that b E A~n +~~. b The sum of two polyhedral cones in a finite-dimensional space is a polyhedral cone (prove it yourself). Hence the cone A~n + ~~ is a polyhedral cone in ~m and then A~n+~~ (~) ((A~n
+ ~~)')' (~)((~n)'
A'
(i) But OA' = Ker A', so (i) is equivalent to statement 1) of the theorem. Analogously, 2) The cone A~+ is evidently a polyhedral cone in ~m. Hence A~~ (~) ((A~~)')'
(!)
(~~A')'.
(ii)
We see that (ii) is equivalent to the statement 2) of the theorem. 3) A~+ + ~~ is a polyhedral cone in ~m. Hence
3) of the theorem.
The following result is an infinite-dimensional generalization of Theorem 1. Let X be a Banach space. In this case convex analysis of subspaces consists of the duality theorem: L is a closed subspace in X iff L = LJ..J.. (a"), and formulas of convex calculus of annihilators: (L1 + L2)J.. = Lt n L:}- (b"), (L1 u L2)J.. ~ Lt + L:}- (c") (the equality holds if Lt + L:}- is a closed subspace). Let A be a linear continuous operator from X to another Banach space Y; then (AL) J.. = LJ..A * (d") , (LA)J.. ~ A* LJ.. (e") (the equality holds if A* LJ.. is a closed subspace). THEOREM 1'. a) Let X and Y be Banach spaces, A: X ----+ Y be a linear continuous operator, b EX, and 1m A be a closed subspace in Y. Then the equation Ax = b is solvable if and only if b E (Ker A')J... b) If Y = X and A = I - C, where C is a compact operator, then the following (Fredholm) alternative holds: either (i) the equation Ax = b has a unique solution for all b EX, or (ii) the kernel of A contains a nonzero element.
92
2. APPLICATIONS
REMARK. Theorem l' is due to Fredholm and F. Riesz. This was one of the greatest achievements of functional analysis in the 20th century. PROOF. The statement a) is proved similarly to the corresponding statement of Theorem 1. For the proof of b) note that ifKer A i= {O} then nonuniqueness of the solution is obtained already for b = O. Now we will show that if the equation Ax = b is solvable for any bE X, then Ker A = {O}. Assume the contrary and let Xl E Ker A be such that Xl i= O. By assumption, the equation Ax = Xl is solvable, and if X2 is its solution, then X2 E Ker A2, because A2X2 = AXl = O. Obviously, Ker AcKer A2 strictly since X2 is in Ker A2 but not in Ker A, because otherwise we would have Xl = O. We continue this process to obtain a chain of strict inclusions Ker AcKer A2 c ... c Ker An c .... Now we use the following fact of convex analysis (which is a trivial corollary of the convex calculus presented in Section 6): the distance between the unit ball of a normed space and a proper subspace of this space is equal to 1 (this statement is sometimes referred to as the Riesz theorem on almost perpendicular). This implies that for each n = 2,3, ... there is a vector en E Ker An such that Ilenll = 1 and Ilx - enll 2: 1/2 for all X E Ker An-i. Consider the sequence {Gen}. Let m > n. One can easily check that Zn = en - Aen + Aem E Ker Am-i. Then IIGem - Genii = Ilzn - emil 2: 1/2; hence it is impossible to select a convergent subsequence from the sequence {Gen}, which contradicts compactness of the operator G. Thus, if the equation Ax = b has a solution for any b E X, then Ker A = {O}, and this solution is, obviously, unique. D
8. Classical inequalities, problems of geometry and mechanics • This section comprises selected problems of analysis and geometry demonstrating the capabilities of convex analysis. Moreover, we consider an extremal problem of mechanics, where convexity does not appear explicitly but in fact plays an important role. 8.1. Some classical inequalities. There are an innumerable amount of inequalities which are simple consequences of the formulas of convex analysis. The main results of