Discrete and Continuous
László Lovász
Microsoft Research
One Microsoft Way, Redmond, WA 98052
lovasz@microsoft.com
Abstract. How deep is the dividing line between discrete and continuous mathematics? Basic structures and methods of both sides of our science are quite different. But on a deeper level, there is a more solid connection than meets the eye.
1 Introduction
There is a rather clear dividing line between discrete and continuous mathematics. Continuous mathematics is classical, well established, with a rich variety of applications. Discrete mathematics grew out of puzzles, and it is often identified with specific application areas like computer science or operations research. The basic structures and methods of the two sides of our science are quite different (continuous mathematicians use limits; discrete mathematicians use induction).

This state of mathematics at this particular time, the turn of the millennium, is a product of strong intrinsic logic, but also of historical coincidence. The main external source of mathematical problems is science, in particular physics. The traditional view is that space and time are continuous, and that the laws of nature are described by differential equations. There are, of course, exceptions: chemistry, statistical mechanics, and quantum physics are based on at least a certain degree of discreteness. But these discrete objects (molecules, atoms, elementary particles, Feynman graphs) live in the continuum of space and time. String theory tries to explain these discrete objects as singularities of a higher dimensional continuum. Accordingly, the mathematics used in most applications is analysis, the real hard core of our science.

But especially in the newly developing sciences, discrete models are becoming more
and more important. One might observe that there is a finite number of events in any finite domain of space-time. Is there a physical meaning to the rest of a four (or ten) dimensional manifold? Does the point in time halfway between two consecutive interactions of an elementary particle make sense? Whether the answer to this question is yes or no, the discrete features of the subatomic world are undeniable. How far would a combinatorial description of the world take us? Or could it happen that the descriptions of the world as a continuum and as a discrete (but terribly huge) structure are equivalent, emphasizing two sides of the same reality? But let me escape from these unanswerable questions and look around elsewhere in science.

Biology tries to understand the genetic code: a gigantic task, which is the key to understanding life and, ultimately, ourselves. The genetic code is discrete: simple basic questions like finding matching patterns, or tracing the consequences of flipping over substrings, sound more familiar to the graph theorist than to the researcher of differential equations. Questions about the information content, redundancy, or stability of the code may sound too vague to a classical mathematician, but a theoretical computer scientist will immediately see at least some tools to formalize them (even if finding the answer may be too difficult at the moment).

Economics is a heavy user of mathematics, and much of what it needs is not part of the traditional applied mathematics toolbox. Perhaps the most successful tool in economics and operations research is linear programming, which lives on the boundary of discrete and continuous. The applicability of linear programming in these areas, however, depends on conditions of convexity and unlimited divisibility; taking indivisibilities into account (for example, logical decisions, or individual agents) leads to integer programming and other models of a combinatorial nature.

Finally, the world of computers is essentially discrete, and it is not surprising that so many discrete mathematicians are working in this science. Electronic communication and computation provides a vast array of well-formulated, difficult, and important mathematical problems on algorithms, databases, formal languages, cryptography and computer security, VLSI layout, and much more.

In all these areas, the real understanding involves, I believe, a synthesis of the discrete and the continuous, and it is an intellectually most challenging goal to develop these mathematical tools. There are different levels of interaction between discrete and continuous mathematics, and I treat them (I believe) in the order of increasing significance.

1. We often use the finite to approximate the infinite. To discretize a complicated
continuous structure has always been a basic method: from the definition of the Riemann integral, through triangulating a manifold in (say) homology theory, to numerically solving a partial differential equation on a grid. It is a slightly more subtle thought that the infinite is often (or perhaps always?) an approximation of the large finite. Continuous structures are often cleaner, more symmetric, and richer than their discrete counterparts (for example, a planar grid has a much smaller degree of symmetry than the whole Euclidean plane). It is a natural and powerful method to study discrete structures by embedding them in the continuous world.

2. Sometimes, the key step in the proof of a purely discrete result is the application of a purely continuous theorem, or vice versa. We all have our favorite examples; I describe two in section 2.

3. In some areas of discrete mathematics, key progress has been achieved through the use of more and more sophisticated methods from analysis. This is illustrated in section 3 by describing two powerful methods in discrete optimization. (I could not find any area of continuous mathematics where progress has been achieved on a similar scale through the introduction of discrete methods. Perhaps algebraic topology comes closest.)

4. Connections between discrete and continuous may be the subject of mathematical study in their own right. Numerical analysis may be thought of this way, but discrepancy theory is the best example. In this article, we have to restrict ourselves to discussing two classical results in this blooming field (section 4); we refer to the book of Beck and Chen [8], the expository article of Beck and Sós [9], and the recent book of Matoušek [37].

5. The most significant level of interaction is when one and the same phenomenon appears in both the continuous and the discrete setting. In such cases, intuition and insight gained from considering one of these may be extremely useful in the other. A well-known example is the connection between sequences and analytic functions, provided by the power series expansion. In this case, there is a dictionary between combinatorial aspects of the sequence (recurrences, asymptotics) and analytic properties of its generating function (differential equations, singularities). In section 5, I discuss in detail the discrete and continuous notions of the Laplacian, connected with a variety of dispersion-type processes. This notion connects topics from the heat equation to Brownian motion to random walks on graphs to linkless embeddings of graphs.

An exciting but not well understood further parallelism is connected with the fact that the iteration of very simple steps can result in very complex structures. In continuous mathematics, this idea comes up in the theory of dynamical systems and numerical analysis. In discrete
mathematics, it is in a sense the basis of many algorithms, random number generation, etc. A synthesis of ideas from both sides could bring spectacular developments here.

One might mention many other areas with substantial discrete and continuous components: groups, probability, geometry... Indeed, my starting observation about this division becomes questionable if we think of some of the recent developments in these areas. I believe that further development will make it totally meaningless.

Acknowledgement. I am indebted to several of my colleagues, in particular to Alain Connes and Mike Freedman, for their valuable comments on this paper.
2 Two examples

2.1 The Haar measure
A striking application of graph theory to measure theory is the construction of the Haar measure on compact topological groups. This application was mentioned by Rota and Harper [43], who elaborated on the idea in the example of the construction of a translation invariant integral for almost periodic functions on an arbitrary group. The proof is also related to Weil's proof of the existence of the Haar measure. (There are other, perhaps more significant, applications of matchings to measures that could be mentioned here, for example the theorem of Ornstein [39] on the isomorphism of Bernoulli shifts; cf. also [16, 32, 33, 46].) Here we shall describe the construction of a translation invariant integral for continuous functions on a compact topological group, equivalent to the existence of the Haar measure [33].

An invariant integration for continuous functions on a compact topological group G is a linear functional L defined on the continuous real-valued functions on G, with the following properties:

(a) L(αf + βg) = αL(f) + βL(g) (linearity);

(b) if f ≥ 0 then L(f) ≥ 0 (monotonicity);

(c) L(1) = 1 (normalization);

(d) if s and t are in G and f and g are two continuous functions such that g(x) = f(sxt) for every x, then L(g) = L(f) (double translation invariance).

Theorem 1 For every compact topological group there exists an invariant integration for continuous functions.
Proof. Let f be the function we want to integrate. The idea is to approximate the integral by the average

$$\bar f(A) = \frac{1}{|A|} \sum_{a \in A} f(a),$$

where A is a "uniformly dense" finite subset of the group G. The question is how to find an appropriate set A.

The key definition is the following. Let U be an open non-empty subset of G (think of it as small). A subset A ⊆ G is called a U-net if A ∩ sUt ≠ ∅ for every s, t ∈ G. It follows from the compactness of G that there exists a finite U-net. Of course, a U-net may be very unevenly distributed over the group, and accordingly, the average over it may not approximate the integral. What we show is that by the simple trick of restricting ourselves to U-nets with minimum cardinality (minimum U-nets, for short), we get a sufficiently uniformly dense finite set. To measure this uniformity we define

$$\omega(U) = \sup\{\, |f(x) - f(y)| : x, y \in sUt \text{ for some } s, t \in G \,\}.$$

The compactness of G implies that if f is continuous then it is also uniformly continuous, in the sense that for every ε > 0 there exists a non-empty open set U such that ω(U) < ε.

Let A and B be minimum U-nets. The following inequality is the heart of the proof:

$$|\bar f(A) - \bar f(B)| \le \omega(U). \qquad (1)$$
The combinatorial core of the construction lies in the proof of this inequality. Define a bipartite graph H with bipartition (A, B) by connecting x ∈ A to y ∈ B if and only if there exist s, t ∈ G such that x, y ∈ sUt. We use the Marriage Theorem to show that this bipartite graph has a perfect matching. We have to verify two things:

(I) |A| = |B|. This is trivial, since both are minimum U-nets.

(II) Every set X ⊆ A has at least |X| neighbors in B. Indeed, let Y be the set of neighbors of X. We show that T = Y ∪ (A ∖ X) is a U-net. Let s, t ∈ G; we show that T intersects sUt. Since A is a U-net, there exists an element x ∈ A ∩ sUt. If x ∉ X then x ∈ T and we are done. Otherwise, we use that there exists an element y ∈ B ∩ sUt (since B is also a U-net). Then xy is an edge of H, so y ∈ Y ⊆ T, and we are done again. Thus T is a U-net. Since A is a U-net with minimum cardinality, we must have |T| ≥ |A|, which is equivalent to |Y| ≥ |X|.

So by the Marriage Theorem, H has a perfect matching: writing n = |A| = |B|, we can label the elements so that a_i ∈ A is matched with b_i ∈ B. Matched elements lie in a common set sUt, so |f(a_i) − f(b_i)| ≤ ω(U) for every i, and hence

$$|\bar f(A) - \bar f(B)| = \frac{1}{n} \Big| \sum_{i=1}^n \big( f(a_i) - f(b_i) \big) \Big| \le \frac{1}{n} \sum_{i=1}^n |f(a_i) - f(b_i)| \le \frac{1}{n}\,\big(n\,\omega(U)\big) = \omega(U).$$
The rest of the proof is rather routine. First, we need to show that averaging over minimum U-nets for different U's gives approximately the same result. More exactly, let A be a minimum U-net and B a minimum V-net; then

$$|\bar f(A) - \bar f(B)| \le \omega(U) + \omega(V). \qquad (2)$$

Indeed, for every b ∈ B, the translate Ab is also a U-net with minimum cardinality, so by (1), |f̄(A) − f̄(Ab)| ≤ ω(U). Hence, writing f̄(AB) for the average of f over the product set AB (with multiplicities), i.e., f̄(AB) = (1/|B|) Σ_{b∈B} f̄(Ab),

$$|\bar f(A) - \bar f(AB)| = \Big| \bar f(A) - \frac{1}{|B|} \sum_{b \in B} \bar f(Ab) \Big| \le \frac{1}{|B|} \sum_{b \in B} |\bar f(A) - \bar f(Ab)| \le \omega(U).$$

Similarly |f̄(B) − f̄(AB)| ≤ ω(V), whence (2) follows.

Now choose a sequence U_n of open sets such that ω(U_n) → 0, and let A_n be a minimum U_n-net. Then (2) implies that the sequence f̄(A_n) tends to a limit L(f), which is independent of the choice of U_n and A_n, and so it is well defined. Conditions (a)–(d) are trivial to verify.
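To see the construction at work on the simplest compact group, here is a small numerical sketch (my own illustration, not part of the original argument). On the circle group G = R/Z, every translate sUt of U = (0, δ) is an arc of length δ, so a minimum U-net is, up to boundary effects, a set of ⌈1/δ⌉ roughly equally spaced points, and the averages f̄(A) converge to the Haar integral of f.

```python
import numpy as np

# Circle group G = R/Z. Translates sUt of U = (0, delta) are arcs of
# length delta, so a U-net must meet every such arc; ceil(1/delta)
# equally spaced points form a minimum U-net (an illustrative
# assumption, ignoring boundary effects).

def unet_average(f, delta):
    n = int(np.ceil(1.0 / delta))   # minimum number of points needed
    A = np.arange(n) / n            # equally spaced minimum U-net
    return f(A).mean()              # the average \bar f(A)

f = lambda x: np.sin(2 * np.pi * x) ** 2 + np.cos(6 * np.pi * x)
for delta in [0.1, 0.01, 0.001]:
    print(delta, unet_average(f, delta))  # approximates the Haar integral 0.5
```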
2.2 Kneser's conjecture
Now we turn to examples demonstrating the converse direction: applications of continuous methods to purely combinatorial problems. Our first example is a result where algebraic topology and geometry are the essential tools in the proof of a combinatorial theorem.

Theorem 2 Let us partition the k-element subsets of an n-element set into n − 2k + 1 classes. Then one of the classes contains two disjoint k-subsets.
This result was conjectured by Kneser [26] and proved in [30]. The proof was simplified by Bárány [6], and we describe his version here.

Proof. We first invoke a geometric construction due to Gale: there exists a set of 2k + d vectors on the d-sphere S^d such that every open hemisphere contains at least k of them. (For d = 1, take the vertices of a regular (2k + 1)-gon.) Choosing d = n − 2k, we thus get a set S of n points. Suppose that all the k-subsets are partitioned into d + 1 = n − 2k + 1 classes P_0, P_1, ..., P_d. Let A_i be the set of those unit vectors h ∈ S^d for which the open hemisphere centered at h contains a k-subset of S which belongs to P_i. Clearly the sets A_0, A_1, ..., A_d are open, and by the definition of S, they cover the whole sphere. Now we turn from geometry to topology: by the Borsuk–Ulam theorem, one of the sets A_i contains two antipodal points h and −h. Thus the hemispheres about h and −h both contain a k-subset from P_i. Since these hemispheres are disjoint, these two k-subsets are disjoint.

There are numerous other examples where methods from algebraic topology have been used to prove purely combinatorial statements (see [10] for a survey).
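The smallest interesting case of Theorem 2 can be checked by brute force: for n = 5, k = 2 we get n − 2k + 1 = 2 classes, and the theorem says that every 2-coloring of the ten 2-subsets of a 5-element set makes some two disjoint pairs monochromatic (equivalently, the Petersen graph has no proper 2-coloring). A short sketch of this check (my own illustration):

```python
from itertools import combinations, product

n, k = 5, 2
subsets = list(combinations(range(n), k))    # the ten 2-subsets
idx = {s: i for i, s in enumerate(subsets)}
# pairs of disjoint k-subsets = edges of the Kneser graph (here: Petersen)
edges = [(idx[a], idx[b]) for a, b in combinations(subsets, 2)
         if not set(a) & set(b)]

# Check every partition of the subsets into n - 2k + 1 = 2 classes.
ok = all(any(c[i] == c[j] for i, j in edges)
         for c in product(range(n - 2 * k + 1), repeat=len(subsets)))
print(ok)   # True: some class always contains two disjoint k-subsets
```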
3 Discrete optimization

Our next example illustrates how a whole area of important applied mathematical problems, namely discrete optimization, relies on ideas that bring more and more sophisticated tools from traditional continuous optimization.

In a typical optimization problem, we are given a set S of feasible solutions and a function f : S → R, called the objective function. The goal is to find the element of S which maximizes (or minimizes) the objective function. In a traditional optimization problem, S is a decent continuum in a Euclidean space, and f is a smooth function. In this case the optimum can be found by considering the equation ∇f = 0, or by an iterative method using some version of steepest descent. Both of these depend on the continuous structure in a neighborhood of the optimizing point, and therefore fail for a discrete (or combinatorial) optimization problem, where S is finite. This case is trivial from a classical point of view: in principle one could evaluate f for all elements of S and choose the best. But in most cases of interest, S is very large in comparison with the amount of data needed to specify an instance of the problem. For example, S may be the set of spanning trees, or the set of perfect matchings, of a graph; or the set of all possible schedules of all trains in a country; or the set of states of a spin glass.
In such cases, we have to use the implicit structure of S, rather than brute force, to find the optimizing element. In some cases, purely combinatorial methods enable us to solve such problems. For example, let S be the set of all spanning trees of a graph G, and assume that each edge of G has a non-negative length associated with it. In this case a spanning tree with minimum length can be found by the greedy algorithm (due to Borůvka and Kruskal): we repeatedly choose the shortest edge that does not form a cycle with the edges chosen previously, until a tree is obtained (see the code sketch below).

In many (in a sense, most) cases, such a direct combinatorial algorithm is not available. A general approach is to embed the set S into a continuum S̄ of solutions, and also to extend the objective function f to a function f̄ defined on S̄. The problem of minimizing f̄ over S̄ is called a relaxation of the original problem. If we do this right (say, S̄ is a convex set and f̄ is a convex function), then the minimum of f̄ over S̄ can be found by classical tools (differentiation, steepest descent, etc.). If we are really lucky (or clever), the minimizing element of S̄ will belong to S, and then we have solved the original discrete problem. (If not, we may still use the solution of the relaxation to obtain a bound on the optimum of the original problem, or to obtain an approximate solution. See later.)
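The greedy algorithm just described takes only a few lines. A minimal sketch (the edge-list representation and function names are mine), using the standard union-find structure to test whether an edge would close a cycle:

```python
def min_spanning_tree(n, edges):
    """Kruskal's greedy algorithm: edges is a list of (length, u, v)
    on nodes 0..n-1; returns the edges of a minimum spanning tree."""
    parent = list(range(n))

    def find(x):                        # root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for length, u, v in sorted(edges):  # shortest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                    # u and v in different components:
            parent[ru] = rv             # the edge closes no cycle, take it
            tree.append((u, v))
    return tree

print(min_spanning_tree(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# [(0, 1), (1, 2), (2, 3)]
```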
3.1 Polyhedral combinatorics
The first successful realization of this scheme was worked out in the 60's and 70's, when techniques of linear programming were applied to combinatorics. Let us assume that our combinatorial optimization problem can be formulated so that S is a set of 0–1 vectors and the objective function is linear. The set S may be specified by a variety of logical and other constraints; in most cases, it is quite easy to translate these into linear inequalities:
$$S = \{\, x \in \{0,1\}^n : a_1^T x \le b_1, \ldots, a_m^T x \le b_m \,\}. \qquad (3)$$

The objective function to be maximized is

$$f(x) = \sum_{i=1}^n c_i x_i, \qquad (4)$$
where the c_i are given real numbers. Such a formulation is typically easy to find. Thus we have translated our combinatorial optimization problem into a linear program with integrality conditions. It is quite easy to solve this if we disregard the integrality conditions; the real game is to find ways to write up these linear programs in such a way that disregarding the integrality conditions is justified.
A nice example is matching in bipartite graphs. Suppose that G is a bipartite graph with node set {u_1, ..., u_n, v_1, ..., v_n}, where every edge connects a node u_i to a node v_j. We want to find a perfect matching, i.e., a set of edges covering each node exactly once. Suppose that G has a perfect matching M, and let x_{ij} = 1 if u_i v_j ∈ M, and x_{ij} = 0 otherwise.
Then

$$\sum_{i=1}^n x_{ij} = 1 \ \text{ for all } j, \qquad \sum_{j=1}^n x_{ij} = 1 \ \text{ for all } i. \qquad (5)$$
Conversely, if we find a solution of this system of linear equations with every x_{ij} = 0 or 1, then we have a perfect matching. Unfortunately, the solvability of a system of linear equations in 0–1 variables (even of a single equation) is NP-hard. What we can do is replace the condition x_{ij} = 0 or 1 by the weaker condition

$$x_{ij} \ge 0. \qquad (6)$$

To solve a linear system like (5) in non-negative real numbers is still not trivial, but it is doable efficiently (in polynomial time) using linear programming.

We are not done, of course, since if we find that (5)–(6) has a solution, this solution may not be integral, and hence it may not correspond to a perfect matching. There are various ways to conclude, extracting a perfect matching from a fractional solution of (5)–(6). The most elegant is the following. The set of all solutions forms a convex polytope, and every vertex of this convex polytope is integral. (The proof of this fact is amusing, using Cramer's Rule and basic determinant calculations; see e.g. [33].) So we run a linear programming algorithm to see if (5)–(6) has a solution in real numbers. If not, then the graph has no perfect matching. If yes, most linear programming algorithms automatically give a basic solution, i.e., a vertex. This is an integral solution of (5)–(6), and hence corresponds to a perfect matching.

Many variations of this idea have been developed, both in theory and in practice. There are ways to automatically generate new constraints and add them to (3) if these fail; there are more subtle, more efficient methods to generate new constraints for special problems; and there are ways to handle the system (3) even if it gets too large to be written up explicitly.
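To make this concrete, here is a minimal sketch of the relaxation (5)–(6) for a small bipartite graph, assuming SciPy is available (the graph, the names, and the feasibility-only objective are mine). Nonnegativity (6) is linprog's default bound, and the solver typically returns a basic (vertex) solution, which for a bipartite graph is integral:

```python
import numpy as np
from scipy.optimize import linprog

# The relaxation (5)-(6) for a bipartite graph on u0..u2, v0..v2,
# with one variable x_ij per edge (illustrative example graph).
edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]

A_eq = np.zeros((6, len(edges)))        # one row per node: u_0..u_2, v_0..v_2
for col, (i, j) in enumerate(edges):
    A_eq[i, col] = 1                    # row of u_i:  sum_j x_ij = 1
    A_eq[3 + j, col] = 1                # row of v_j:  sum_i x_ij = 1
b_eq = np.ones(6)

# Zero objective: we only test feasibility; bounds default to x >= 0, i.e. (6).
res = linprog(c=np.zeros(len(edges)), A_eq=A_eq, b_eq=b_eq)
if res.status == 0:
    print([e for e, x in zip(edges, res.x) if x > 0.5])   # a perfect matching
else:
    print("no perfect matching")
```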
3.2 Semidefinite optimization
This new technique of producing relaxations of discrete optimization problems makes the problems continuous in a more sophisticated way. We illustrate it by the Maximum Cut problem. Let G = (V, E) be a graph. We want to find a partition (S, S̄) of V for which the number M of edges connecting S to S̄ is maximum.

This problem is NP-hard, so we cannot hope to find an efficient (polynomial time) algorithm that solves it exactly. Instead, we have to settle for less: we try to find a cut that is close to being optimal. It is easy to find a cut that picks up half of the edges: just process the nodes one by one, placing each in S or S̄, whichever gives the larger number of new edges going between the two sets. Since no cut can have more than all the edges, this simple algorithm obtains an approximation of the maximum cut with at most 50% relative error.

It turned out to be quite difficult to improve upon this simple fact, until Goemans and Williamson [21] combined semidefinite optimization with a randomized rounding technique to obtain an approximation algorithm with a relative error of about 12%. On the other hand, it was proved by Håstad [23] (tightening earlier results of Arora et al. [5]) that no polynomial time algorithm can produce an approximation with a relative error of less than (about) 6%, unless P = NP.

We sketch the Goemans–Williamson algorithm. Let us describe any partition V_1 ∪ V_2 of V by a vector x ∈ {−1, 1}^V, letting x_i = 1 iff i ∈ V_1. Then the size of the cut corresponding to x is

$$\frac{1}{4} \sum_{ij \in E} (x_i - x_j)^2, \qquad (7)$$

and the Maximum Cut problem is to maximize (7) over all x ∈ {−1, 1}^V.
Reducing a discrete optimization problem to the problem of maximizing a quadratic function subject to a system of quadratic equations (here the constraints x_i² = 1) may sound great, and one might try to use Lagrange multipliers and other classical techniques. But these won't help; in fact, the new problem is still very difficult (NP-hard), and it is not clear that we have gained anything.

The next trick is to linearize, by introducing new variables y_{ij} = x_i x_j (1 ≤ i, j ≤ n). The objective function and the constraints become linear in these new variables y_{ij}:

maximize $\frac{1}{4} \sum_{ij \in E} (y_{ii} + y_{jj} - 2 y_{ij})$   (9)

subject to $y_{11} = \cdots = y_{nn} = 1$.   (10)

Of course, there is a catch: the variables y_{ij} are not independent! Introducing the symmetric matrix Y = (y_{ij})_{i,j=1}^n, we can note that

Y is positive semidefinite,   (11)

and

Y has rank 1.   (12)
The problem of solving (9)–(10) with the additional constraints (11) and (12) is equivalent to the original problem, and thus NP-hard in general. But if we drop (12), we get a tractable relaxation! Indeed, what we get is a semidefinite program: maximize a linear function of the entries of a positive semidefinite matrix, subject to linear constraints on the matrix entries. Since positive semidefiniteness of a matrix Y can be translated into (infinitely many) linear inequalities involving the entries of Y, namely

$$v^T Y v \ge 0 \quad \text{for all } v \in \mathbb{R}^n,$$

semidefinite programs can be viewed as linear programs with infinitely many constraints. However, they behave much more nicely than one would expect. Among others, there is a duality theory for them (see e.g. Wolkowicz [48]). It is also important that semidefinite programs are solvable in polynomial time (up to an arbitrarily small error; note that the optimal solution may not be rational). In fact, the ellipsoid method [22] and, more importantly from a practical point of view, interior point methods [2] extend to semidefinite programs.

Coming back to the Maximum Cut problem, let Y be an optimal solution of (9)–(10)–(11), and let M̄ be the optimum value of the objective function. Since every cut defines a solution, we have M̄ ≥ M. The second trick is to observe that since Y is positive semidefinite, we can write it as a Gram matrix: there exist vectors u_i ∈ R^n such that u_i^T u_j = Y_{ij} for all i and j. In particular, we get that |u_i|² = 1 for every i, and

$$\sum_{ij \in E} (u_i - u_j)^2 = 4 \bar M. \qquad (13)$$
Now choose a uniformly distributed random hyperplane H through the origin. This divides the u_i into two classes, and thereby defines a cut in G. The expected size of this cut is

$$\sum_{ij \in E} \mathbb{P}(H \text{ separates } u_i \text{ and } u_j).$$

But it is trivial that the probability that H separates u_i and u_j is (1/π) times the angle between u_i and u_j. In turn, the angle between the unit vectors u_i and u_j is at least 0.21964·π·(u_i − u_j)², which is easily seen by elementary calculus. Hence the expected size of the cut obtained is at least

$$\sum_{ij \in E} 0.21964\, (u_i - u_j)^2 = 0.87856\, \bar M \ge 0.87856\, M.$$

Thus we get a cut which is, in expectation, at least about 88 percent of the optimum. Except for the last step, this technique is entirely general, and it has been used in a number of proofs and algorithms in combinatorial optimization [35].
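The whole pipeline fits in a few lines, assuming the CVXPY modeling package (with an SDP-capable solver such as the bundled SCS) is available; this is my illustration of the method, not the authors' code. The Gram vectors u_i are recovered from Y by an eigendecomposition, and the random hyperplane is given by a random Gaussian normal vector:

```python
import numpy as np
import cvxpy as cp

def goemans_williamson(n, edges, trials=100):
    # Solve the relaxation (9)-(10)-(11): maximize the cut value over PSD Y.
    Y = cp.Variable((n, n), symmetric=True)
    cut = sum(Y[i, i] + Y[j, j] - 2 * Y[i, j] for i, j in edges) / 4
    cp.Problem(cp.Maximize(cut), [Y >> 0, cp.diag(Y) == 1]).solve()

    # Gram factorization: rows of U are unit vectors u_i with u_i . u_j = Y_ij.
    w, V = np.linalg.eigh(Y.value)
    U = V * np.sqrt(np.clip(w, 0, None))

    best = 0
    for _ in range(trials):             # random hyperplane rounding
        side = U @ np.random.randn(n) >= 0
        best = max(best, sum(side[i] != side[j] for i, j in edges))
    return best

# 5-cycle: the maximum cut is 4; rounding finds it in a few trials.
print(goemans_williamson(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))
```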
4 Discrepancy theory
In section 3 we started with a discrete problem and tried to find a good continuous approximation (relaxation) of it. Let us now discuss the reverse problem a bit (only to the extent of a couple of classical examples). Suppose that we have a set with a measure; how well can it be approximated by a discrete measure?

In our first example, we consider the Lebesgue measure λ on the unit square [0,1]². Suppose that we want to approximate it by the uniform measure on a finite subset T. Of course, we have to specify how the error of the approximation is measured: here we define it as the maximum error on axis-parallel rectangles,

$$\Delta(T) = \sup_R \Big|\, |T \cap R| - \lambda(R)\,|T| \,\Big|,$$

where R ranges over all sets of the form R = [a,b] × [c,d] (we scaled up by |T| for convenience). We are interested in finding the best set T, and in determining the discrepancy

$$\Delta_n = \inf_{|T| = n} \Delta(T).$$
The question can be raised for any family of test sets instead of rectangles: circles, ellipses, triangles, etc. The answer depends in an intricate way on the geometric structure of the test family; see [8] for an exposition of these beautiful results. The question can also be raised in other dimensions. The 1-dimensional case is easy, since the obvious choice of T = {1/(n+1), ..., n/(n+1)} is optimal. But for dimensions larger than 2, the order of magnitude of Δ_n is not known.

Returning to dimension 2, the obvious first choice is to try a √n × √n grid for T. This leaves out a rectangle of area about 1/√n, and so it has Δ(T) ≈ √n. There are many constructions that do better; the most elementary is the following. Let n = 2^k. Take all points (x, y) where both x and y are multiples of 2^{−k} and, expanding them to k bits in binary, we get these bits in reverse order: x = 0.b_1 b_2 ... b_k and y = 0.b_k b_{k−1} ... b_1. It is a nice exercise to verify that this set has discrepancy Δ(T) = O(k) = O(log n); a code sketch of this construction appears at the end of this section. It is hard to prove that one cannot do better; even to prove that Δ_n → ∞ was difficult [1]. The lower bound matching the above construction was finally proved by Schmidt [44]: Δ_n = Ω(log n). This fundamental result has many applications.

Our second example can be introduced through a statistical motivation. Suppose that we are given a 0–1 sequence x = x_1 x_2 ... x_n of length n. We want to test whether it comes from independent coin flips. One approach (related to von Mises's proposal for the definition of a random sequence) could be to count the 0's and 1's; their numbers should be about the same. This should also be true if we count bits in the even positions, or in the odd positions, or, more generally, in any fixed arithmetic progression of indices. So we consider the quantity

$$\Delta(x) = \max_A \Big| \sum_{i \in A} x_i - \frac{|A|}{2} \Big|,$$

where A ranges through all arithmetic progressions in {1, ..., n}. Elementary probability theory tells us that if x is generated by independent coin flips, then Δ(x) is of order about √n with high probability. So if Δ(x) is substantially larger than this (which is the case for most non-random sequences one thinks of), then we can conclude that x was not generated by independent coin flips.
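For small n the quantity Δ(x) can be computed by direct enumeration of all arithmetic progressions; a brute-force sketch (my own illustration) that also shows the two regimes, a random sequence versus the very non-random alternating sequence:

```python
import numpy as np

def ap_discrepancy(x):
    """Delta(x): max over arithmetic progressions A of |sum_{i in A} x_i - |A|/2|."""
    n, best = len(x), 0.0
    for start in range(n):
        for step in range(1, n + 1):
            s, length = 0.0, 0
            for i in range(start, n, step):  # grow the progression term by term
                s += x[i]
                length += 1
                best = max(best, abs(s - length / 2))
    return best

rng = np.random.default_rng(0)
n = 200
print(ap_discrepancy(rng.integers(0, 2, n)))  # coin flips: on the order of sqrt(n)
print(ap_discrepancy(np.arange(n) % 2))       # alternating 0101...: the all-even
                                              # progression gives about n/4 = 50
```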
But can x fail this test in the other direction, by being too smooth? In other words, what is

$$\Delta_n = \min_x \Delta(x),$$

where x ranges over all 0–1 sequences of length n?

We can think of this question as a problem of measure approximation: we are given the measure on {1, ..., n} in which each atom has measure 1/2. This is a discrete measure, but not an integral valued one; we want to approximate it by an integral valued measure. The test family defining the error of the approximation consists of the arithmetic progressions.

It follows from considering random sequences that Δ_n = O(√n). The first result in the opposite direction was proved by Roth [41], who showed that Δ_n = Ω(n^{1/4}). It was expected that the lower bound would eventually be improved to about √n, showing that random sequences are the extreme, at least in order of magnitude (in many similar instances of combinatorial extremal problems, for example in Ramsey theory, random choice is the best). But surprisingly, Sárközy showed that Δ_n = O(n^{1/3}), which was improved by Beck [7] to the almost sharp Δ_n = O(n^{1/4} log n); even the logarithmic factor was recently removed by Matoušek and Spencer [38]. Thus there are sequences that simulate random sequences too well!
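Here is the promised sketch of the bit-reversal construction from the first example (my own illustration); the discrepancy estimate samples test rectangles with corners on a coarse grid, so it is only a heuristic lower estimate of the supremum:

```python
import itertools
import numpy as np

def bit_reversal_set(k):
    """The n = 2^k points (x, y) where y's binary digits are x's reversed."""
    n = 1 << k
    rev = lambda i: int(format(i, f'0{k}b')[::-1], 2)
    return np.array([(i / n, rev(i) / n) for i in range(n)])

def rect_discrepancy(T, grid=16):
    """max over sampled rectangles [a,b) x [c,d) of | |T in R| - area(R)|T| |."""
    ticks = np.linspace(0.0, 1.0, grid + 1)
    best = 0.0
    for a, b in itertools.combinations(ticks, 2):
        for c, d in itertools.combinations(ticks, 2):
            inside = ((T[:, 0] >= a) & (T[:, 0] < b) &
                      (T[:, 1] >= c) & (T[:, 1] < d)).sum()
            best = max(best, abs(inside - (b - a) * (d - c) * len(T)))
    return best

for k in [4, 6, 8]:
    T = bit_reversal_set(k)
    print(len(T), rect_discrepancy(T))   # grows like k = log2(n), not sqrt(n)
```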
5 The Laplacian
What could be more tied to the continuity of Euclidean spaces, or at least of differentiable manifolds, than a differential operator? In this section we show that the Laplacian makes sense in graph theory, and in fact it is a basic tool. Moreover, the discrete and continuous versions interact in a variety of ways, so that in some cases the use of one or the other is almost a matter of convenience.
5.1 Random walks
One of the fundamental general algorithmic problems is sampling: generate a random element from a given distribution over a set V. In non-trivial cases, the set V is either infinite, or finite but very large, and often only implicitly described. In most cases we want to generate an element from the uniform distribution, and this special case will be enough for the purpose of our discussion. I'll use as examples two sampling tasks:

(a) Given a convex body K in R^n, generate a uniformly distributed random point of K.

(b) Given a graph G, generate a uniformly distributed random perfect matching in G.

Problem (a) comes up in many applications (volume computation, Monte-Carlo integration, optimization, etc.). Problem (b) is related to the Ising model in statistical mechanics. In simple cases one may find elegant special methods for sampling. For example, if we need a uniformly distributed random point in the unit ball in R^n, then we can generate n independent coordinates from the standard normal distribution and normalize the obtained vector appropriately.
But in general, sampling from a large finite set, or from an infinite set, is a difficult problem, and the only general approach known is the use of Markov chains. We define (using the structure of S) an ergodic Markov chain whose state space is S and whose stationary distribution is uniform. In the case of a finite set S, it may be easier to think of this as a connected regular non-bipartite graph G with node set V. Starting at an arbitrary node (state), we take a random walk on the graph: from a node i, we step to a node selected uniformly from the set of neighbors of i. After a sufficiently large number T of steps, we stop: the distribution of the last node is approximately uniform.

In the case of a convex body, a rather natural Markov chain to consider is the following: starting at a convenient point in K, we move at each step to a uniformly selected random point in a ball of radius δ about the current point (if the new point is outside K, we stay where we were, and consider the step wasted). The step-size δ will be chosen appropriately, but typically it is about 1/√n.

If we want to sample from the perfect matchings of a graph G, the random walk to consider is not so obvious. Jerrum and Sinclair [25] consider a random walk on perfect and near-perfect matchings (matchings that leave just two nodes unmatched). If we generate such a matching, it will be perfect with a small but non-negligible probability (1/n^{const} if the graph is dense), so we just iterate until we get a perfect matching. One step of the random walk is generated by picking a random edge e of G. If the current matching is perfect and contains e, then we delete e; if the current matching is near-perfect and e connects the two unmatched nodes, we add it; if e connects an unmatched node to a matched node, we add e to the matching and delete the edge it intersects.

Another way of looking at this is to do a random walk on a big graph H whose nodes are the perfect and near-perfect matchings of the small graph G. Two nodes of H are adjacent if the corresponding matchings arise from each other by changing (deleting, adding, or replacing) a single edge. Loops are added to make H regular. Note that we do not want to construct H explicitly; its size may be exponentially large compared with G. The point is that the random walk on H can be implemented using only this implicit definition.

It is generally not hard to achieve that the stationary distribution of the chain is uniform and that the chain is ergodic. The crucial question is: what is the mixing time, i.e., how long do we have to walk? This question leads to estimating the mixing time of Markov chains (the number of steps before the chain becomes essentially stationary).
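For problem (a), the ball walk just described is easy to write down, with the convex body given only by a membership oracle; the body, step size, and step count below are illustrative choices of mine:

```python
import numpy as np

def ball_walk(in_body, x0, steps, delta, seed=0):
    """Ball walk in a convex body given by a membership oracle in_body:
    from x, propose a uniform random point of the ball of radius delta
    around x; if it falls outside the body, stay put (a 'wasted' step)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(steps):
        u = rng.normal(size=n)
        u *= rng.random() ** (1 / n) / np.linalg.norm(u)  # uniform in unit ball
        y = x + delta * u
        if in_body(y):
            x = y
    return x

# Illustrative body K: the unit cube intersected with a halfspace.
in_K = lambda z: bool(np.all(np.abs(z) <= 1) and z.sum() <= 1)
n = 10
print(ball_walk(in_K, np.zeros(n), steps=20000, delta=1 / np.sqrt(n)))
```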
From the point of view of practical applications, it is natural to consider finite Markov chains: a computation in a computer is necessarily finite. But in the analysis, it depends on the particular application whether one prefers to use a finite, or a general measurable, state space. All the essential (and very interesting) connections that have been discovered hold in both models. In fact, the general mathematical issue is dispersion: we might be interested in the dispersion of heat in a material, or the dispersion of probability during a random walk, or many other related questions.

A Markov chain can be described by its transition matrix M = (p_{ij}), where p_{ij} is the probability of stepping to j, given that we are at i. In the special case of a random walk on a regular undirected graph we have M = (1/d)A, where A is the usual adjacency matrix of the graph G and d is the degree. Note that the all-1 vector 1 is an eigenvector of M belonging to the eigenvalue 1. If σ is the starting distribution (which can be viewed as a vector in R^V), then the distribution after t steps is M^t σ (here M is symmetric, so we need not worry about left versus right multiplication). From this it is easy to see that the speed of convergence to the stationary distribution (which is the eigenvector (1/n)·1) depends on the difference between the largest eigenvalue 1 and the second largest eigenvalue: the spectral gap.
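This is easy to experiment with numerically; a sketch (graph and sizes are my illustrative choices) that builds the transition matrix of the walk on a small non-bipartite regular graph, reads off the spectral gap, and watches the distribution flatten:

```python
import numpy as np

# Random walk on a 3-regular non-bipartite graph: the cycle C_20 plus
# the 'diameter' chords i -- i + 10 (an illustrative choice).
n = 20
A = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1, i + n // 2):
        A[i, j % n] = 1
M = A / 3                                # transition matrix M = (1/d) A

eigs = np.sort(np.linalg.eigvalsh(M))    # M is symmetric here
print("spectral gap:", 1 - eigs[-2])     # 1 minus the second largest eigenvalue

p = np.zeros(n); p[0] = 1                # start the walk at node 0
for t in range(100):
    p = M @ p                            # one step (M is symmetric)
print("max deviation from uniform:", np.abs(p - 1 / n).max())
```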
We could also define the spectral gap as the smallest positive eigenvalue of the matrix I − M. The matrix L = M − I is called the Laplacian of the graph. For any vector f ∈ R^V, the value of Lf at node i is the average of f over the neighbors of i, minus f_i. This property shows that the Laplacian is indeed a discrete analogue of the classical Laplace operator. Some properties of the Laplace operator depend on the fine structure of differentiable manifolds, but many basic properties generalize easily to any graph. One can define harmonic functions and the heat kernel, and prove identities like the analogue of Green's formula

$$\sum_i \big( f_i (Lg)_i - g_i (Lf)_i \big) = 0,$$

and so on. Many properties of the continuous Laplace operator can be derived by proving such easy combinatorial formulas on a grid and then taking limits. See Chung [15] for an exposition of some of these connections.

But more significantly, the discrete Laplacian is an important tool in the study of various graph-theoretic properties. As a first illustration of this fact, let us return to random walks. Often information about the spectral gap is difficult to obtain directly (this is the case in both of our introductory examples). One possible remedy is to relate the dispersion speed to isoperimetric inequalities in the state space. To be more exact, define the conductance of the chain as

$$\Phi = \min_{\emptyset \ne S \subset V} \frac{\sum_{i \in S,\, j \in V \setminus S} \pi_i\, p_{ij}}{\pi(S)\, \pi(V \setminus S)},$$

where π is the stationary distribution. (The numerator can be viewed as the probability that, choosing a random node from the stationary distribution and then making one step, the first node is in S and the second is in V ∖ S. The denominator is the same probability for two independent random nodes from the stationary distribution.) The following inequality was proved by Jerrum and Sinclair [25]:
Theorem 3

$$\frac{\Phi^2}{8} \le 1 - \lambda_2 \le 2\Phi,$$

where λ_2 is the second largest eigenvalue of M (so 1 − λ_2 is the spectral gap).

This means that the mixing time (which we have to use informally here, since the exact definition depends on how we start, how we measure convergence, etc.) is, up to constant factors, between 1/Φ and 1/Φ². In the case of random walks in a convex body, the conductance is very closely related to the following isoperimetric theorem [34, 17]:

Theorem 4 Let K be a convex body in R^n, of diameter D. Let the surface F divide K into two parts K_1 and K_2. Then

$$\mathrm{vol}_{n-1}(F) \ge \frac{2}{D} \cdot \frac{\mathrm{vol}(K_1)\, \mathrm{vol}(K_2)}{\mathrm{vol}(K)}.$$
(A long thin cylinder shows that the bound is sharp.) From Theorem 4 it follows that (after appropriate preprocessing, and modulo other technical difficulties which are now swept under the carpet) one can generate a sample point in an n-dimensional convex body in O*(n³) steps. In the case of matchings, Jerrum and Sinclair prove that for dense graphs the conductance of the chain described above is bounded from below by 1/n^{const}, and hence the mixing time is bounded by n^{const}.

How does one establish isoperimetric inequalities? The generic method is to construct, explicitly or implicitly, flows or multicommodity flows. Suppose that for each subset S ⊆ V we can construct a flow through the graph so that each node i ∈ S is a source that produces an amount π_i π(V ∖ S) of flow, and each node j ∈ V ∖ S is a sink that consumes an amount π_j π(S) of flow. Suppose further that each edge ij carries at most K π_i p_{ij} flow. Then a simple computation shows that the conductance satisfies Φ ≥ 1/K.
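For chains small enough to enumerate all subsets S, both quantities in Theorem 3 can be computed directly; a brute-force sketch (illustrative circulant graph, uniform stationary distribution):

```python
import numpy as np
from itertools import combinations

# Random walk on the 4-regular circulant graph C_12(1, 2).
n = 12
M = np.zeros((n, n))
for i in range(n):
    for j in (i - 2, i - 1, i + 1, i + 2):
        M[i, j % n] = 1 / 4
pi = np.full(n, 1 / n)                  # stationary distribution (uniform)

# Conductance: minimize the cut probability ratio over all proper subsets S.
phi = min(
    sum(pi[i] * M[i, j] for i in S for j in range(n) if j not in S)
    / ((len(S) / n) * ((n - len(S)) / n))
    for size in range(1, n)
    for S in map(set, combinations(range(n), size))
)

gap = 1 - np.sort(np.linalg.eigvalsh(M))[-2]
print(phi**2 / 8, "<=", gap, "<=", 2 * phi)   # the two bounds of Theorem 3
```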
Instead of constructing a flow for each subset S of nodes, one might prefer to construct a single multicommodity flow, i.e., a flow of value π_i π_j between each pair i and j. If the total flow through each edge ij is at most K π_i p_{ij}, then the conductance is at least 1/K. How good is the bound on the conductance obtained by the multicommodity flow method? An important result of Leighton and Rao [28] implies that for the best choice of the flows, it gets within a factor of O(log n). One of the reasons I bring this up is that this result, in turn, can be derived from the theorem of Bourgain [11] about the embeddability of finite metric spaces in ℓ₁-spaces (cf. [29]).
To construct the best flows or multicommodity flows is often the main mathematical difficulty. In the case of the proof of Theorem 4, it can be done by a method used earlier by Payne and Weinberger [40] in the theory of partial differential equations. We only give a rough sketch. Suppose that we want to construct a flow from K_1 to K_2. By the Ham Sandwich Theorem, there exists a hyperplane that cuts both K_1 and K_2 into two parts of equal volume. We can separately construct the flows on both sides of this hyperplane. Repeating this procedure, we can cut K up into "needles" so that each needle is split by the partition (K_1, K_2) in the same proportion, and hence it suffices to construct a flow between K_1 and K_2 inside each needle. This is easily done using the Brunn–Minkowski Theorem.

For the random walk on matchings, the construction of the multicommodity flow (called "canonical paths" by Jerrum and Sinclair) also goes back to one of the oldest methods in matching theory. Given (say) a perfect matching M and a near-perfect matching M′, we form their union. This union is a subgraph of the small graph G that decomposes into a path (with edges alternating between M and M′), some cycles (again alternating between M and M′), and the common edges of M and M′. Now it is easy to transform M into M′ by walking along each cycle and the path, replacing the edges one by one. What this amounts to is a path in the big graph H, connecting the nodes M and M′. If we use this path to carry the flow, then it can be shown that no edge of the graph H is overloaded, provided the graph G is dense.
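The union-and-decompose step is simple to carry out explicitly. A sketch (data layout mine) that takes two matchings, forms their symmetric difference, and walks the alternating paths; alternating cycles would be walked the same way:

```python
def alternating_paths(M1, M2):
    """Decompose the symmetric difference of two matchings (given as sets
    of frozenset edges) into alternating paths; cycles are left to the
    reader and would be walked analogously."""
    diff = M1 ^ M2                       # edges in exactly one matching
    adj = {}
    for e in diff:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    paths, seen = [], set()
    for start in adj:
        if start in seen or len(adj[start]) != 1:
            continue                     # paths start at degree-1 nodes
        path, prev, cur = [start], None, start
        while True:
            seen.add(cur)
            nxt = [w for w in adj[cur] if w != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
        paths.append(path)
    return paths

M  = {frozenset(e) for e in [(1, 2), (3, 4), (5, 6)]}   # perfect matching
Mp = {frozenset(e) for e in [(2, 3), (4, 5)]}           # near-perfect matching
print(alternating_paths(M, Mp))   # one alternating path: [1, 2, 3, 4, 5, 6]
```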
5.2 The Cage Theorem
It was proved by Steinitz that every 3-connected planar graph can be represented as the skeleton of a (3-dimensional) polytope. In fact, there is a lot of freedom in choosing the geometry of the representing polytope. Among the various extensions, the most interesting for us is the classical construction going back to Koebe [27] (first proved by Andreev [4]; cf. also [47]):

Theorem 5 (The Cage Theorem) Let H be a 3-connected planar graph. Then H can be represented as the skeleton of a 3-dimensional polytope all of whose edges touch the unit sphere.

We may add that the representing polytope is unique up to a projective transformation of the space that preserves the unit sphere. By considering the horizon from each vertex of the polytope, we obtain a representation of the nodes of the graph by openly disjoint circular disks in the plane, so that adjacent nodes correspond to touching circles.

The Cage Theorem may be considered a discrete form of the Riemann Mapping Theorem, in the sense that it implies the Riemann Mapping Theorem. Indeed, suppose that we want to construct a conformal mapping of a simply connected domain K in the complex plane onto
the unit disk D. For simplicity, assume that K is bounded. Consider a triangular grid in the plane (an infinite 6-regular graph), and let G be the graph obtained by identifying all gridpoints outside K into a single node s. Consider the Koebe representation of G by touching circles on the sphere; we may assume that the circle representing the node s is the exterior of D. So all the other circles are inside D, and we obtain a mapping of the set of gridpoints in K into D. Letting the grid become arbitrarily fine, it was shown by Rodin and Sullivan [42] that in the limit we get a conformal mapping of K onto D.

So the Cage Theorem is indeed a beautiful bridge between discrete mathematics (graph theory) and continuous mathematics (complex analysis). But where is the Laplacian? We'll see one connection in the next section. Another one occurs in the work of Spielman and Teng [45], who use the Cage Theorem to show that the eigenvalue gap of the Laplacian of a planar graph is O(1/n). This implies, among other things, that the mixing time of the random walk on a planar graph is at least linear in the number of nodes.
5.3 The Colin de Verdière number
In 1990, Colin de Verdière [14] introduced a parameter μ(G) for any undirected graph G. Research concerning this parameter involves an interesting mixture of ideas from graph theory, linear algebra, and analysis. The exact definition goes beyond the limitations of this article; roughly speaking, μ(G) is the multiplicity of the smallest positive eigenvalue of the Laplacian of G, where the edges of G are weighted so as to maximize this multiplicity.

The parameter was motivated by the study of the maximum multiplicity of the second eigenvalue of certain Laplacian-type differential operators defined on Riemann surfaces. Colin de Verdière approximated the surface by a sufficiently densely embedded graph G, and showed that the multiplicity of the second eigenvalue of the operator can be bounded by this value μ(G), depending only on the graph.

Colin de Verdière's invariant created much interest among graph theorists because of its surprisingly nice graph-theoretic properties. Among others, it is minor-monotone, so the Robertson–Seymour graph minor theory applies to it. Moreover, planarity of graphs can be characterized by this invariant: μ(G) ≤ 3 if and only if G is planar.

Colin de Verdière's original proof of the "if" part of this fact was most unusual in graph theory: basically, reversing the above procedure, he showed how to reconstruct a sphere and a positive elliptic partial differential operator P on it so that μ(G) is bounded by the dimension of the null space of P, and then invoked a theorem of Cheng [12] asserting that this dimension is at most 3. Later Van der Holst [24] found a combinatorial proof of this fact. While this may seem like a step backward (after all, it eliminated the only application of partial differential equations in graph theory I know of), it did open up the possibility of characterizing the next case. Verifying a conjecture of Robertson, Seymour, and Thomas, it was shown by Lovász and Schrijver [36] that μ(G) ≤ 4 if and only if G is linklessly embeddable in R³.

Can one go back to the original motivation of μ(G) and find a continuous version of this result? In what sense does a linklessly embedded graph approximate space? These questions appear very difficult.

It turns out that graphs with large values of μ are also quite interesting. For example, for a graph G on n nodes with μ(G) ≥ n − 4, the complement of G is planar, up to introducing "twin" points; and the converse of this assertion also holds under reasonably general conditions. The proof of the latter fact uses the Koebe–Andreev representation of graphs. So the graph invariant μ(G) is related at one end of the scale to elliptic partial differential equations (Cheng's theorem), and at the other to Riemann's theorem on conformal mappings.
References
[1] T. van Aardenne-Ehrenfest, On the impossibility of a just distribution, Nederl. Akad. Wetensch. Proc. 52 (1949), 734–739; Indagationes Math. 11 (1949), 264–269.

[2] F. Alizadeh: Interior point methods in semidefinite programming with applications to combinatorial optimization, SIAM J. on Optimization 5 (1995), 13–51.

[3] N. Alon and J. Spencer, The Probabilistic Method. With an appendix by Paul Erdős, Wiley, New York, 1992.

[4] E. Andreev, On convex polyhedra in Lobachevsky spaces, Mat. Sbornik, Nov. Ser. 81 (1970), 445–478.

[5] S. Arora, C. Lund, R. Motwani, M. Sudan, M. Szegedy: Proof verification and hardness of approximation problems, Proc. 33rd FOCS (1992), 14–23.

[6] I. Bárány, A short proof of Kneser's conjecture, J. Combin. Theory A 25 (1978), 325–326.

[7] J. Beck, Roth's estimate of the discrepancy of integer sequences is nearly sharp, Combinatorica 1 (1981), 319–325.

[8] J. Beck and W. Chen: Irregularities of Distribution, Cambridge Univ. Press (1987).

[9] J. Beck and V.T. Sós: Discrepancy Theory, Chapter 26 in: Handbook of Combinatorics (eds. R.L. Graham, M. Grötschel and L. Lovász), North-Holland, Amsterdam (1995).

[10] A. Björner, Topological methods, in: Handbook of Combinatorics (eds. R.L. Graham, M. Grötschel and L. Lovász), Elsevier, Amsterdam, 1995, 1819–1872.

[11] J. Bourgain, On Lipschitz embedding of finite metric spaces in Hilbert space, Isr. J. Math. 52 (1985), 46–52.

[12] S.Y. Cheng, Eigenfunctions and nodal sets, Commentarii Mathematici Helvetici 51 (1976), 43–55.

[13] A.J. Chorin, Vorticity and Turbulence, Springer, New York, 1994.

[14] Y. Colin de Verdière, Sur un nouvel invariant des graphes et un critère de planarité, Journal of Combinatorial Theory, Series B 50 (1990), 11–21.

[15] F.R.K. Chung, Spectral Graph Theory, CBMS Reg. Conf. Series 92, Amer. Math. Soc., 1997.

[16] R.M. Dudley, Distances of probability measures and random variables, Ann. Math. Stat. 39 (1968), 1563–1572.

[17] M. Dyer and A. Frieze: Computing the volume of convex bodies: a case where randomness provably helps, in: Probabilistic Combinatorics and Its Applications (ed. Béla Bollobás), Proceedings of Symposia in Applied Mathematics, Vol. 44 (1992), 123–170.

[18] M. Deza and M. Laurent, Geometry of Cuts and Metrics, Springer Verlag, 1997.

[19] M. Dyer, A. Frieze and R. Kannan (1991): A random polynomial time algorithm for approximating the volume of convex bodies, Journal of the ACM 38, 1–17.

[20] C. Delorme and S. Poljak: Combinatorial properties and the complexity of max-cut approximations, Europ. J. Combin. 14 (1993), 313–333.

[21] M.X. Goemans and D.P. Williamson: .878-approximation algorithms for MAX CUT and MAX 2SAT, Proc. 26th ACM Symp. on Theory of Computing (1994), 422–431.

[22] M. Grötschel, L. Lovász and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer, 1988.

[23] J. Håstad: Some optimal inapproximability results, Proc. 29th ACM Symp. on Theory of Comp. (1997), 1–10.

[24] H. van der Holst, A short proof of the planarity characterization of Colin de Verdière, Journal of Combinatorial Theory, Series B 65 (1995), 269–272.

[25] M. Jerrum and A. Sinclair, Approximating the permanent, SIAM J. Computing 18 (1989), 1149–1178.

[26] M. Kneser, Aufgabe No. 360, Jber. Deutsch. Math.-Verein. 58 (1955).

[27] P. Koebe, Kontaktprobleme der konformen Abbildung, Berichte über die Verhandlungen d. Sächs. Akad. d. Wiss., Math.-Phys. Klasse 88 (1936), 141–164.

[28] F.T. Leighton and S. Rao, An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms, Proc. 29th Annual Symp. on Found. of Computer Science, IEEE Computer Soc. (1988), 422–431.

[29] N. Linial, E. London and Y. Rabinovich, The geometry of graphs and some of its algorithmic applications, Combinatorica 15 (1995), 215–245.

[30] L. Lovász, Kneser's conjecture, chromatic number, and homotopy, J. Comb. Theory A 25 (1978), 319–324.

[31] L. Lovász: On the Shannon capacity of a graph, IEEE Trans. Inform. Theory 25 (1979), 1–7.

[32] L. Lovász and P. Major, A note on a paper of Dudley, Studia Sci. Math. Hung. 8 (1973), 151–152.

[33] L. Lovász and M.D. Plummer, Matching Theory, Akadémiai Kiadó – North-Holland, Budapest, 1986.

[34] L. Lovász and M. Simonovits, Random walks in a convex body and an improved volume algorithm, Random Structures and Alg. 4 (1993), 359–412.

[35] L. Lovász and A. Schrijver: Cones of matrices and set-functions and 0–1 optimization, SIAM J. on Optimization 1 (1990), 166–190.

[36] L. Lovász and A. Schrijver, A Borsuk theorem for antipodal links and a spectral characterization of linklessly embeddable graphs, Proceedings of the AMS 126 (1998), 1275–1285.

[37] J. Matoušek, Geometric Discrepancy, Springer Verlag, 1999.

[38] J. Matoušek and J. Spencer, Discrepancy in arithmetic progressions, J. Amer. Math. Soc. 9 (1996), 195–204.

[39] D. Ornstein, Bernoulli shifts with the same entropy are isomorphic, Advances in Math. 4 (1970), 337–352.

[40] L.E. Payne and H.F. Weinberger: An optimal Poincaré inequality for convex domains, Arch. Rat. Mech. Anal. 5 (1960), 286–292.

[41] K.F. Roth, Remark concerning integer sequences, Acta Arith. 9 (1964), 257–260.

[42] B. Rodin and D. Sullivan: The convergence of circle packings to the Riemann mapping, J. Differential Geom. 26 (1987), 349–360.

[43] G.-C. Rota and L.H. Harper, Matching theory, an introduction, in: Advances in Probability Theory and Related Topics (ed. P. Ney), Vol. I, Marcel Dekker, New York (1971), 169–215.

[44] W. Schmidt, Irregularities of distribution, Quart. J. Math. Oxford 19 (1968), 181–191.

[45] D.A. Spielman and S.-H. Teng, Spectral partitioning works: planar graphs and finite element meshes, Proc. 37th Ann. Symp. Found. of Comp. Sci., IEEE (1996), 96–105.

[46] V. Strassen, The existence of probability measures with given marginals, Ann. Math. Statist. 36 (1965), 423–439.

[47] W. Thurston: The Geometry and Topology of Three-manifolds, Princeton Lecture Notes, Chapter 13, Princeton, 1985.

[48] H. Wolkowicz: Some applications of optimization in matrix theory, Linear Algebra and its Applications 40 (1981), 101–118.