Optimal constants in concentration inequalities on the sphere and in the Gauss space
Abstract.
We show several variants of concentration inequalities on the sphere stated as subgaussian estimates with optimal constants. For a Lipschitz function, we give one-sided and two-sided bounds for deviation from the median as well as from the mean. For example, we show that if $\mu$ is the normalized surface measure on $S^{n-1}$ with $n \geq 3$, $f : S^{n-1} \to \mathbb{R}$ is $1$-Lipschitz, $M$ is the median of $f$, and $t > 0$, then $\mu(f \geq M + t) \leq \frac{1}{2} e^{-nt^2/2}$. If $M$ is the mean of $f$, we have a two-sided bound $\mu(|f - M| \geq t) \leq e^{-nt^2/2}$. Consequently, if $\gamma$ is the standard Gaussian measure on $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ (again, $1$-Lipschitz, with the mean equal to $M$), then $\gamma(|f - M| \geq t) \leq e^{-t^2/2}$. These bounds are slightly better and arguably more elegant than those available elsewhere in the literature.
1. Introduction and the main results
Lévy’s isoperimetric inequality on the sphere in [9, 15] is one of the most useful tools in the study of high-dimensional phenomena. The isoperimetric inequality itself is a very precise result: Among the subsets of the sphere of given measure, the caps have the smallest boundary. On the other hand, typical applications appeal to its corollaries, which exhibit varying degrees of tightness. Such corollaries are most often expressed as subgaussian concentration inequalities of the form
(1) |
valid for any -Lipschitz real valued function on the sphere and any , where is the normalized surface measure on the sphere, is either the median or the mean of , and are (explicit or effectively computable) constants, independent of , , and . Perhaps the most frequently cited variant of a spherical concentration inequality comes from the influential 1986 book of Milman and Schechtman [11].
Fact 1.
If is a -Lipschitz function (with respect to the geodesic distance) and is its median, then for every
(2) |
Equivalently, if is such that , then – for every – the -enlargement of defined by verifies .
The spherical concentration inequality (2) played a huge role in the development of the theory. However, its statement is not completely satisfactory for two reasons. First, while it is well known that the constant in the exponent is optimal, stating Fact 1 for hides the inconvenient truth that the dimension of the ambient space in this formulation is , while the factor that appears in the exponent is just . It would be more elegant and convenient for that factor to coincide with the dimension of the ambient space, leading directly to a concentration result of the form (1). Next, the constant in (2) is not optimal (ideally, both bounds in Fact 1 should tend to as ). Even though it is possible to replace with while adjusting the constants, doing that would only exacerbate the second drawback. Here we will prove the following version of the inequality that addresses all these concerns. We emphasize that this bound is not meant to be optimal; in fact, various bounds that are better and near optimal in various asymptotics are known and not that difficult (for example, see Proposition 6, its proof, and the comments following it). However, we believe that our results offer a reasonable compromise between sharpness, simplicity, and the ease of application.
Theorem 2.
([5]) Let and . If satisfies then
(3) |
Consequently, if is a function which is -Lipschitz with respect to the geodesic distance and if is its median, then for every ,
(4) |
Remark.
Before continuing, let us comment on the case , i.e., that of the -dimensional sphere . As easily follows from the general argument sketched in Section 2, the optimal lower bound in (3) is then very simple and reads (in the nontrivial range )
A direct check shows then that the estimate (3) from Theorem 2 fails if , where . However, it remains true outside of this interval. (This is visualized in Figure 1 further below.) Also, if we use the extrinsic chordal distance in (or in , for any ) instead of the geodesic distance, the estimate holds in the entire nontrivial range . Note that the extrinsic distance, i.e., the usual Euclidean distance in the ambient space , is in many applications more relevant than the geodesic distance. This happens for example when the function is defined – and Lipschitz – on the entire space , or at least on the unit ball. ∎
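The failure for the circle can be checked numerically. The following Python sketch assumes that the bound (3) takes the form μ(A_t) ≥ 1 − ½·exp(−nt²/2), so that for the circle (with n = 2 in the exponent) the claim would read ½ − t/π ≤ ½·exp(−t²), where ½ − t/π is the normalized length of the arc left uncovered by the t-enlargement of a half-circle. The name `circle_defect` and the grid resolution are illustrative choices, not part of the argument.

```python
import math

def circle_defect(t):
    """Uncovered arc measure minus the assumed bound 0.5*exp(-t^2).

    A positive value means that the assumed n = 2 form of (3) fails at t.
    """
    exact = 0.5 - t / math.pi        # complement of the t-enlargement of a half-circle
    bound = 0.5 * math.exp(-t * t)   # assumed right-hand side for n = 2
    return exact - bound

# Scan the nontrivial range (0, pi/2) and collect the points where the bound fails.
grid = [k * (math.pi / 2) / 2000 for k in range(1, 2000)]
failing = [t for t in grid if circle_defect(t) > 0]
print(min(failing), max(failing))
```

Under this assumption, the failing grid points all lie in a short interval located between t = 1 and t = 1.25, and the violation is small in absolute terms.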
Let us now pass to the discussion of bounds of the form (1) (or (4)) with , the expected value of . The first observation is that, in this context, the value of the constant can not be smaller than , which is shown by the following simple example. (The example is cooked up for , but clearly a simple modification with similar features can be produced for any , see also Section 3.) Identify with endowed with the normalized Lebesgue measure, which we will also denote by . Next, let and let be defined by . Then and so, for ,
Accordingly, if – for such – we have , then letting yields .
Thus, when , the best bound we may hope for in the estimate (1) is . Somewhat surprisingly, a stronger fact is true: we also have (for ) a two-sided bound of the same form.
Theorem 3.
If is a function which is -Lipschitz with respect to the geodesic distance, then, for every ,
(5) |
(6) |
Remark.
(i) All the comments presented in the remark following Theorem 2 apply mutatis mutandis to the case of the inequality from (6). (ii) Obviously, the bound (6) is stronger than (5). However, we state the latter separately since it is also valid for . Moreover, its proof provides some additional information, is a good warmup for the harder proof of (6), and in fact some cases considered in the proof of (6) reduce to instances of (5). (iii) An alert reader will recall that a powerful standard tool for obtaining subgaussian estimates for deviations from the expected value of a random variable is the log-Sobolev inequality, and will wonder whether at least the one-sided part of the assertion of Theorem 3 (i.e., the inequality from (5)) can be derived that way. This is almost true, but not quite. Indeed, a log-Sobolev inequality for a Riemannian manifold is usually deduced from a bound on , the Ricci curvature of . Now, , which by general arguments alluded to above leads to an estimate of the form (1) with , , and the coefficient of in the exponent on the right hand side equal to . The stronger two-sided estimate (6) presents further problems. ∎
A standard consequence of Theorem 3 is the following deviation result for the Gaussian space.
Corollary 4.
Let be the standard Gaussian measure on and let be an -Lipschitz function (with respect to the Euclidean distance). Next, let and . Then
(7) |
Remark.
(i) When is the median of , the bound (7) is an easy consequence of the Gaussian isoperimetric inequality [4, 16] and the inequality . (ii) The Gaussian analogue of (5) follows from the log-Sobolev inequality via the so-called “Herbst argument” (see [8] or [2]), but we couldn’t easily find a reference giving the estimate (7) for the two-sided deviation. Indeed, the most frequently cited concentration result of that kind is . While in our presentation Corollary 4 appears as an afterthought, the ubiquity of the normal distribution makes this result potentially at least as useful as Theorems 2 and 3. ∎
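As a sanity check of the two-sided Gaussian bound, one can look at the simplest 1-Lipschitz function, the linear functional f(x) = x_1, whose mean is 0 and whose deviation probability is explicit. The Python sketch below assumes that (7) with Lipschitz constant 1 reads γ(|f − M| ≥ t) ≤ exp(−t²/2), and that the frequently cited variant carries an extra factor 2; both forms are assumptions about the displayed formulas, and the function name is illustrative.

```python
import math

def gaussian_two_sided_tail(t):
    """P(|Z| >= t) for a standard normal Z, i.e. the two-sided deviation
    of the linear function f(x) = x_1 from its mean 0."""
    return math.erfc(t / math.sqrt(2.0))

for t in (0.5, 1.0, 2.0, 3.0):
    tail = gaussian_two_sided_tail(t)
    sharp = math.exp(-t * t / 2.0)   # assumed form of (7) with L = 1
    print(f"t={t}: tail={tail:.5f}  exp(-t^2/2)={sharp:.5f}  2*exp(-t^2/2)={2 * sharp:.5f}")
```

For this particular function there is equality at t = 0, which is consistent with the multiplicative constant 1 being the best one may hope for in a bound of this type.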
Our methods are based on refinements of known ideas, a lot of elementary calculus, and some numerics. It would be nice to find better reasons for the results. Hopefully, now that we know that the inequalities hold, someone will come up with a more conceptual proof. Concerning a potential self-contained argument, see the Remark at the end of subsection 3.1. Otherwise, there are many relevant techniques that we did not explore seriously, or did not know how to use (for starters, measure transportation and semigroup methods, see [8], or the Curvature-Dimension-Diameter condition [10] in the context of Section 4), and we use the log-concavity of the marginals of (the measures defined by (25)) only very weakly, so there is hope.
2. Deviation from the median: proof of Theorem 2
In this section we will sketch the proof of Theorem 2, which, except for the calculus part, follows [11]. The derivation of the second statement from the first one is standard and well-known (see, e.g., [8] or [11]). In fact the two statements are formally equivalent; here is a sketch of the argument. (Modulo minor modifications, this argument works in any metric probability space.)
First, if and is the median of , then the set verifies , so (3) applies. Next, if is -Lipschitz, then , and so any lower bound on implies an upper bound on , which is exactly what we need. The second inequality in (4) and the case of general Lipschitz constant follow easily.
In the opposite direction, if , define . Then is -Lipschitz, the median of is , and we have (for any ). Again, this means that any upper bound on translates to a lower bound on , as needed.
From this point on we will concentrate on the estimate (3). The spherical isoperimetric inequality guarantees that, given and , the value of is minimized when is a spherical cap. Accordingly, we need only consider the case of a spherical cap with (i.e., a hemisphere). In other words, we need to show
Proposition 5.
If and , then
(8) |
Proof.
We first note that is again a spherical cap (whose radius in the geodesic distance is ), and so – following [11] – the left hand side of (8) can be rewritten as
(9) |
where is the well-known Wallis integral
(10) |
[The precise value of is not important for the present argument, but for future reference we will cite some easy and well-known facts in Proposition 8 at the end of this section.] This means that the first assertion of Theorem 2 is equivalent to the inequality
(11) |
to be valid for and . While numerical considerations suggest that, for each , the sequence is nonincreasing, in view of the recurrence formula
(12) |
it will be easier to compare and . Specifically, we will aim at proving that
(13) |
which simplifies to
(14) |
Passing to the definite integrals in the formula (12) yields . Substituting this value in (14), and further applying the recurrence formula (12) to the left hand side of (14), allows us to rewrite that inequality as an upper bound on , namely as
(15) |
The cosine integral appearing above can be upper-bounded as follows:
Applying this bound to the left hand side of (15), we see that it is now sufficient to show that
(16) |
for , which is a well known inequality used often in analysis. This inequality can be validated in many ways; for example, one may consider the power series of both sides, or take logarithm of both sides and repeatedly differentiate.
To recapitulate, we have shown up to now that, for each , the sequences and are nonincreasing. Thus, to deduce (11), we only need to establish that, for , and . (The failure of on some subinterval was the reason why the case had to be excluded from Theorem 2 and analyzed separately.) From (11), and are given by the following
(17) | ||||
(18) |
The inequalities and can now be verified numerically or graphically, see Figure 1. Note that the only points where or is even close to are near (for which we have equality), but since and , we can be sure that the inequalities hold when is close to . (Note that, additionally, we know that , so that for sure holds outside the short interval identified earlier.)
Alternatively, we can analyze the inequalities by analytic means using the same techniques as those employed earlier in the proof of (16). However, the argument is more involved and at some stage numerics seem necessary. For example, taking the logarithm of both sides of leads to
We again have equality when so we look at the derivatives and are led to
(that is, if the above inequality holds in , then so does the previous one and we are done). By direct calculation, , and so it is apparent that has a unique maximum in at . It remains to verify that , as needed. ∎
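The numerical side of Proposition 5 can be reproduced with a few lines of Python. The sketch below assumes that (8) states that the cap complementary to the t-enlargement of a hemisphere in S^{n−1} has normalized measure at most ½·exp(−nt²/2), and it uses the fact that the latitude of a uniformly distributed point has density proportional to cos^{n−2}. The helper names, the Simpson rule, and the grid sizes are illustrative choices; a grid check of this kind supports the inequality but does not prove it.

```python
import math

def simpson(g, a, b, m=400):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def cap_measure(n, t):
    """Normalized measure of the cap {latitude >= t} on S^{n-1}."""
    dens = lambda theta: math.cos(theta) ** (n - 2)
    return simpson(dens, t, math.pi / 2) / (2 * simpson(dens, 0.0, math.pi / 2))

def worst_ratio(n, points=100):
    """Largest value of cap_measure / (0.5 * exp(-n t^2 / 2)) on a grid in [0, pi/2]."""
    ts = [k * (math.pi / 2) / points for k in range(points + 1)]
    return max(cap_measure(n, t) / (0.5 * math.exp(-n * t * t / 2)) for t in ts)

for n in (2, 3, 4, 5, 10, 50):
    print(n, worst_ratio(n))
```

A ratio equal to 1 corresponds to the equality at t = 0. Under the stated assumptions, the ratio exceeds 1 only for n = 2, in line with the exclusion of that case from Theorem 2.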
Remark.
It is possible to generalize the above argument to arrive at estimates involving , where . We cannot expect such estimates to hold for all : we have already seen above that, even in the case , must be greater than . However, they may be true for . Bounds of this type are relevant to improving constants in isoperimetric/concentration inequalities on for , the topic that is explored in Section 4. ∎
For future reference, we will present here other closely related bounds for the quantities appearing in Theorem 2. For simplicity, we will state them only as an upper bound for the volume of spherical caps; estimates of the form (3) and (4) follow then in the usual way.
Proposition 6.
For , the normalized volume of a spherical cap of geodesic radius in satisfies
(19) |
and, for ,
(20) |
where . Moreover, the ratio is increasing on .
Remark.
The inequality (19) is equivalent to , where , which is superior to (8) except for close to . Next, is the best exponent that we may possibly expect since, for small , the volume of an -cap scales as . Finally, since , the bound (20) is superior to (19) except for very close to ; this will be exploited in the proof of inequality (6) from Theorem 3. ∎
Proof.
First, it is apparent that we have equality in (19) when and . Accordingly, (19) will follow once we prove the last statement, i.e., being increasing on . To that end, recall that (cf. (9))
while . The conclusion follows now immediately from the following elementary fact.
Lemma 7.
Let be integrable and let be nonincreasing. Then the function is nonincreasing.
Let us note that the Lemma was stated here with the hypotheses fitting the current setting, but it remains true under any reasonable assumptions that assure the quantities in question are well defined.
Finally, let us state for future reference some easy and well-known facts concerning the Wallis integral defined in (10).
Proposition 8.
If then
(i) for ,
(ii) for .
(iii) We have , where is as in Proposition 6. The sequence is increasing to , and so and both decrease to .
For these (and other, tighter and two-sided) estimates we refer the reader to, for example, Section 5.1.2 and Appendix A in [2].
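For readers who want to experiment with these quantities, here is a minimal Python sketch. It assumes that the Wallis integral in (10) is normalized as I_n = ∫_0^{π/2} cos^n θ dθ, and it uses the classical recurrence I_n = ((n−1)/n)·I_{n−2} together with I_0 = π/2 and I_1 = 1. The function name `wallis` is an illustrative choice.

```python
import math

def wallis(n):
    """I_n = integral of cos^n over [0, pi/2], via I_n = (n-1)/n * I_{n-2}."""
    value = math.pi / 2 if n % 2 == 0 else 1.0   # I_0 = pi/2, I_1 = 1
    for k in range(2 + n % 2, n + 1, 2):
        value *= (k - 1) / k
    return value

limit = math.sqrt(math.pi / 2)
for n in (1, 2, 5, 10, 100, 1000):
    # sqrt(n) * I_n approaches the limit from below, sqrt(n+1) * I_n from above
    print(n, math.sqrt(n) * wallis(n), limit, math.sqrt(n + 1) * wallis(n))
```

With this normalization, the printed values illustrate the two-sided estimate √(π/(2(n+1))) ≤ I_n ≤ √(π/(2n)) and the identity n·I_n·I_{n−1} = π/2.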
3. Deviation from the mean: proofs of Theorem 3 and Corollary 4
In this section we will address the one-sided and the two-sided problem for the deviation from the mean: If is -Lipschitz and , what are the bounds for and for ? The bulk of the argument will be devoted to the spherical case (Theorem 3); at the very end we will make a few comments about deducing the Gaussian result (Corollary 4).
As in the previous section, the proof of Theorem 3 splits into two parts: (i) identifying extremal instances for the problem at hand and (ii) obtaining tight estimates satisfied by those extremal instances. In the case of Theorem 2, the extremal objects were the spherical cap of measure , i.e., a hemisphere (in the context of (3)) and the function (in the context of (4)). This will not suffice in the present setting since depends on the entire distribution of , i.e., on the values of for all , and not only on the values of for which that measure is . However, allowing caps of arbitrary measure leads to a sufficiently rich family of functions, which we will describe in a moment.
3.1. The one-sided problem : Proof of (5)
We will start with the following simple lemma.
Lemma 9.
Let be a probability measure on such that . Next, let be the set of functions from to that are -Lipschitz and nondecreasing. For , consider the optimization problem
(21) |
where stands for the expected value in the probability space . Then, for each , it is enough to restrict the supremum to the subfamily of consisting of functions of the form
(22) |
where is a parameter. Moreover, for each it is sufficient to consider . That is, if is a nonincreasing function satisfying for all , then it is also true that for all and all .
Proof.
Denote . For the first assertion of the Lemma, we need to show that, for ,
(23) |
the converse inequality being trivial. To that end, fix and . Since is nondecreasing and continuous, the set is either empty, or of the form for some , and necessarily . Next, since adding a constant to any function doesn’t change the set , we may just as well assume that .
Consider now the function as defined by (22). Then
for with equality for (by construction)
for (because is -Lipschitz).
(The reader is advised to draw a picture.)
Thus everywhere on and it follows in particular that .
Consequently, if we choose so that , then .
On the other hand, , and since for any and any
we obviously have , it follows that
Since was arbitrary, (23) follows. The second assertion of the Lemma is a consequence of the fact that to upper-bound we only need information about for some , with the equality being implicit in the definition of the latter set. ∎
Remark.
The second assertion of Lemma 9 can be strengthened and simplified as follows: For any , the supremum from (21) is attained for some with verifying . We stated the weaker version since it is sufficient for our purposes and easier to prove, but since the Lemma may be of independent interest, we include a sketch of the proof of the stronger fact in the Appendix. ∎
We will now sketch a very well known reduction argument that allows one to derive from Lemma 9 the form of the extremal functions for the one-sided problem (5) from Theorem 3. (The same argument will work for the two-sided problem (6) once we establish a two-sided analogue of Lemma 9.)
Let be any (say, Borel) function on and let be its rearrangement (i.e., verifying for any ) that is of the form
(24) |
where is the first coordinate of and is nondecreasing. This is a standard procedure that works for any random variable and any non-atomic probability measure on , but in our setting it has an additional feature: if is -Lipschitz, so is . Indeed, suppose is -Lipschitz and let . We need to show that if , , then .
By symmetry, we may assume that (hence ). By construction, the sets and are “opposite” spherical caps with “parallel” boundaries with . Likewise, by construction, if and , then and . Next, since is -Lipschitz, it follows that ; in other words, , where is the -enlargement of defined in Fact 1. We now appeal to the isoperimetric inequality on to conclude that . Since both and are (closed) “left” spherical caps, we deduce that or, equivalently, , and – in particular – , as needed.
Since all quantities depending on the distribution are identical for and , it follows that for estimates such as (5) it is enough to consider functions of the form (24). Further, since the geodesic distance between the two “parallels” and equals , the function defined by (24) is -Lipschitz on if and only if is -Lipschitz on . Putting all these observations together, we conclude that the one-sided bound (5) for given is an instance of (21) with being the push-forward of under the map , with the extremal functions given by (22). (Note that the functions in (22) are a priori defined on , but only their restrictions to and the values are relevant to the problem at hand. However, for other reference measures (for example, the Gaussian measure), test functions defined on and all values of may be needed.)
As was (implicitly) determined in Section 2, is then of the form
(25) |
where is defined by (10). (For definiteness, we will assume that the density of is outside of the interval .) Accordingly, verifying the one-sided estimates from Theorem 3 numerically for “not-too-large” , and analytically for “small” , is completely straightforward. First, given and such that , the set is exactly , and its measure is
which is the Haar measure of the corresponding cap
(26) |
already analyzed in Section 2 (at least for , see Proposition 5; note that, in the notation from that section, ). This quantity (as a function of ) needs to be compared with , where , which can be rewritten as
(27) |
where , are understood as random variables in the probability space , and we use the fact that . For future reference, let us note that (27) can be restated as follows
(28) |
this is because for any nonnegative random variable one has .
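The identity invoked here is presumably the tail-integral (layer-cake) formula E X = ∫_0^∞ P(X ≥ s) ds for a nonnegative random variable X; this reading is an assumption about the displayed formula. A minimal Python sketch illustrating it with exact rational arithmetic on a finitely supported X follows; the distribution `dist` and the helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def mean(dist):
    """Expected value of a finitely supported random variable {value: probability}."""
    return sum(p * x for x, p in dist.items())

def tail_integral(dist):
    """Integral over s >= 0 of P(X >= s) for a finitely supported X >= 0.

    The tail s -> P(X >= s) is constant on each interval (prev, x] between
    consecutive support points, so the integral is a finite sum.
    """
    total, prev = Fraction(0), Fraction(0)
    for x in sorted(dist):
        tail = sum(p for y, p in dist.items() if y >= x)  # P(X >= s) for s in (prev, x]
        total += (x - prev) * tail
        prev = x
    return total

dist = {Fraction(0): Fraction(1, 4), Fraction(1): Fraction(1, 2), Fraction(3): Fraction(1, 4)}
print(mean(dist), tail_integral(dist))  # both equal 5/4
```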
Note that if is close to (but strictly greater than) , then on a set of nearly full measure, while . Consequently, for we have , while (in fact also necessarily ). This is another argument showing that, in the present context, one can not hope for the multiplicative constant in the bound of type (1) to be strictly smaller than .
To summarize, we have shown that the validity of the one-sided bound from (5) for given will follow from (and in fact is equivalent to)
(29) |
where is defined via (27) or (28). Here is a sketch of the calculation showing that (29) holds for . (In fact, we will see that in that range a tighter bound, with a better constant , can be found. The argument from the proof of Lemma 9 implies then that a version of (5) with that improved constant holds for all -Lipschitz functions and all above the threshold given by (27) or (28) with . The so calculated threshold depends on and is asymptotically equivalent to .)
Since we know from Proposition 5 (at least for , which we assume) that the measure of the cap in question does not exceed , it is enough to show that
(30) |
where
(31) |
(cf. (28)). Taking logarithms of both sides of the inequality (30) shows that it is equivalent to
and finally to
On the other hand, appealing again to the estimate , we can upper-bound by , so it is enough to show that, for any and ,
Let us now change variables via and, inside the integral, to get an equivalent dimension-free form
This inequality is easy to confirm, in fact a much sharper bound follows from the well-known Komatu inequality ([6], or see Remark 4 in [17])
(32) |
As is easy to check, the above argument yields (for and ) the bound in (29) that is of the form with . Further improvement is possible if one replaces the use of the Komatu inequality (32) by the more precise bound ([13], or see Proposition 3 in [17]); a numerical check suggests that the constant works. This shows that – except for small values of – the bound (5) is not very sharp. However, this is a feature, not a bug: it provides the wiggle room needed to deliver the two-sided bound (6).
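For orientation, the Komatu inequality is usually stated as the two-sided estimate 2/(x + √(x² + 4)) ≤ exp(x²/2)·∫_x^∞ exp(−s²/2) ds ≤ 2/(x + √(x² + 2)) for x ≥ 0. Which side of it appears in (32) is not assumed here. The Python sketch below evaluates the middle quantity through the complementary error function and compares it with both bounds; the function names are illustrative.

```python
import math

def mills_ratio(x):
    """exp(x^2/2) * integral from x to infinity of exp(-s^2/2) ds, via erfc."""
    return math.sqrt(math.pi / 2) * math.exp(x * x / 2) * math.erfc(x / math.sqrt(2))

def komatu_bounds(x):
    """Lower and upper bounds in the usual two-sided form of Komatu's inequality."""
    lower = 2 / (x + math.sqrt(x * x + 4))
    upper = 2 / (x + math.sqrt(x * x + 2))
    return lower, upper

for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    lower, upper = komatu_bounds(x)
    print(f"x={x}: {lower:.5f} <= {mills_ratio(x):.5f} <= {upper:.5f}")
```

The direct evaluation through `math.erfc` is adequate for moderate x; for very large x the product of a huge exponential and a tiny erfc value loses accuracy, and an asymptotic expansion would be a better choice.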
Concerning the case of (29), the verification is – as pointed out earlier – completely straightforward. Indeed, a direct computation leads to and and there is no doubt that (29) holds in the entire non-trivial range , with equality iff .
For , we (trivially) have equality in (29) when (and , for all ), but otherwise the bound does not seem very tight. (This can be seen heuristically by expanding the quantities in question in powers of if is small, and approximating the random variable by a standard normal random variable if is not “too large.”) For a rigorous argument, observe first that, for , it is more transparent to rewrite the formula for as
(33) |
In other words, we need to show that if , then
(34) |
where we used . Note that, in the present context, the relationship between and is exactly the same as the relationship between and was in the case . Next observe that in (35) we are in a different regime than for : unless is very small, both sides in the last inequality are close to . Accordingly, it is more appropriate to restate the conclusion of (34) as a lower bound on the probability of the complementary event
(35) |
To further facilitate concentrating on the values of that are close to , we change variables to and . The statement (35) becomes then
(36) |
where
(37) |
is the normalized volume of a spherical cap of geodesic radius in , the function that was already defined in (9). We now appeal to Proposition 6 to obtain
(38) |
The next step is the following simple observation.
Lemma 10.
If and , then .
Proof.
There is equality for and , and the function has the derivative , which is positive on and negative on , where is such that . ∎
Combining Lemma 10 and (38) we see that (36) will follow if
(39) |
Since , the above can be further strengthened to
(40) |
which in turn is equivalent to
(41) |
This is evidently true for since and by Proposition 8(ii), which concludes the proof of (5).
Remark.
We point out that the two crucial inequalities appearing in the proof, namely (29) and the conclusion of (35), are de facto functional inequalities relating the function and its derivative or, in the formulation in the spirit of (36), the function and its primitive. Accordingly, it is conceivable that once one comes up with a manageable related differential inequality, these functional inequalities would follow. (This could be parallel to the proofs of Komatu-like inequalities, see, e.g., Proposition 3 in [17], or Exercise A.2 in [2].) Similar comments apply to the proofs of the two-sided bound (6) and Corollary 4 in the next two subsections. ∎
3.2. The two-sided problem : Proof of (6)
We now pass to the analysis of the estimate (6) from Theorem 3, i.e., the bound for . The initial step, a reduction to the case of and the reference measure defined by (25) is the same as for the one-sided problem. The second step, the analogue of Lemma 9, i.e., a reduction to functions from (22), is slightly more involved, and we impose some mild restrictions on the reference measure , which needs to be symmetric and unimodal (by the latter we mean that , where is nonincreasing).
Lemma 11.
Let be a symmetric unimodal probability measure on such that and let . Then
(42) |
Moreover, for each it is sufficient to consider .
The proof of the Lemma is elementary, but on the complicated side. We relegate it to the Appendix.
Similarly as was the case in the one-sided setting, Lemma 11 reduces – for a specific density – the inequality of type (6) to a comparison of two concrete functions of the parameter , which can be verified numerically. For added rigor, this should be accompanied by an asymptotic analysis of the quantities in question when (note that as , both sides of (6) typically converge to ) and – if is not compactly supported – as . In particular, it is routine to check whether (6) holds for any particular value of . For there is a failure on the same interval for which Theorem 2 failed for , and the failure happens for the same reason: in the reformulation in terms of given by (25), consider the function and note that since the density of is constant for , replacing the median by the mean does not make any difference. For , the integrals involved in the definitions of , , and the relevant probabilities can be explicitly evaluated and the resulting graphs look as in Figure 2.
Clearly, the only values of that are questionable are those close to , but it is readily verified that , while . Consequently, comparing vs. will show a clear separation.
We now focus on the values , which we will assume when needed (though most steps will work for or even ).
Consider first the case . In the notation from the one-sided setting, we have
Accordingly, our problem reduces to determining whether
(43) |
If we use (for ) the bound (a special case of (3)), the conclusion will follow if
(44) |
Since the function is concave on the interval , this clearly holds if . A direct calculation shows that the constraint is also sufficient. However, this argument can not work in full generality since the function is convex on the interval , and so the inequality converse to (44) holds if . To handle such larger values of , we need a strengthening of the inequality (8) from Proposition 5.
Heuristically, it is clear that the bound (8) can be improved if is large enough. Indeed, behaves roughly as a standard normal random variable and so – within the range of this approximation – by Komatu’s inequality (32), or the more precise inequality from [17] mentioned in the proof of (5). Consequently, the coefficient of becomes small when is large. The same phenomenon is exemplified in the spherical case by the bound (20). For our purposes, the following variant will suffice.
Lemma 12.
If , then for . For , the inequality holds for .
Lemma 12 is based on a subtle comparison between the cosine and the exponential function. Since such ideas will also be used later in the argument, we state them separately.
Lemma 13.
We have
(45) |
For , the inequality holds for and respectively. On the other hand,
(46) |
The proofs of both Lemmas involve mostly calculus, some numerics, and careful book-keeping. We relegate them to the Appendix.
Returning to the proof of (43), we consider two cases.
Case Assuming , both terms on the left-hand side of (43) can be upper-bounded using (the first statement of) Lemma 12 and so it is enough to verify
(47) |
Due to the improvement in the bound for (compared to (44)), an argument along the lines of the proof of (5) will work. First, since clearly , the inequality (47) can be further reduced to
As in the proof of (5), this is equivalent to
(48) |
where . On the other hand, from the definition (31) of and appealing to Lemma 12, we deduce that
(49) |
The last integral in (49) can be expressed in terms of the Gaussian error function and investigated numerically. Alternatively, as in the proof of (5), we may use the Komatu bound (32), which reduces the problem to showing that, for ,
(50) |
Since , these two denominators can be discarded. To complete the argument, it remains to verify the resulting inequality at . (The inequality (50) actually holds for all , but showing that is not needed.)
Case In this case we can not use Lemma 12 to estimate , but – as in the proof of (5) – the weaker bound (8) combined with Komatu’s inequality (32) will be – for different reasons – sufficient. In the notation from Case , we have
Consequently,
It is readily verified that the expression in the parentheses is less than for . In particular, for , we get , and so we are in the range of applicability of (44). (The argument is almost as clean if we use the more precise expression involving the Gaussian error function; it yields the bound for .)
Finally, let us recall that when , the inequality (43) was verified numerically, and – in the range – it was never close. Alternatively, the general argument presented above can be easily patched up when specified to the instance (and only Case requires patching).
It remains to handle the case . As in the context of the one-sided bound, it is then more transparent to rewrite the formula for as
(51) |
where (see (33)), while the inequality to be verified (cf. (35)) becomes
(52) |
As for , we will consider separately the cases when is “small” and “not-so-small.”
Case , small : If is sufficiently small (to be made precise later), will be within the range of applicability of inequality (45), and the approach from (the proof of) Lemma 12 will work. Specifically, we can deduce then that
(53) |
Substituting , the inequality (52) reduces to
(54) |
where and . At the same time, as in Eqs. (33)-(38), and in view of Lemma 10 and Proposition 5,
(55) |
For future reference, let us note that (55) and Proposition 8 imply immediately that .
To summarize, we need to show that if satisfies the constraint (55), then (54) holds for in the appropriate range. Furthermore, since (for fixed ), is decreasing, while is increasing (for ), it is enough to consider the largest possible value of , e.g., . Thus the problem is reduced to comparing two functions of , which depend rather weakly on since, by Proposition 8, the coefficients appearing in them satisfy and . This suggests verifying first the asymptotic version of the statement, namely :
(56) |
A numerical check shows that this statement “comfortably” holds in the relevant -range (say, ), see Figure 3.
This implies that the inequality (54) holds under the constraint (55) if is large enough. Moreover, since the sequences of coefficients are monotone (by Proposition 8), once we determine that (54) holds for , it will follow that it is also valid for . A direct check shows that this works for , which is enough for our purposes.
To complete this part of the argument, we need to determine the range of the parameter that assures that (45) can be applied with (that is, or, equivalently, if , and similarly for ). Using the bound (55) with , we see that if . Since the sequence decreases by Proposition 8, it follows that the same is true for . In other words, the method from Case works for . This is amply sufficient as the approach from Case will cover the range , and .
Case , not-so-small : To handle the values of , which correspond to , we need a more specialized upper bound for . To that end, we appeal to (20), which restated in the current setup becomes
(57) |
Accordingly (cf. (51))
(58) | |||||
where in the last inequality we used (again) (57) and the identity .
We now argue similarly as in the one-sided context. The left-hand side of (52) will be lower-bounded by (cf. (53)) and the right-hand side upper-bounded by , which reduces the problem to
(59) |
As in the argument that led to (56), let us consider first an asymptotic version of (60). That is, substitute and and let , which leads to
(61) |
To establish (61), it is clearly enough to assume equality in the constraint on , and it is then apparent (numerically) that the inequality on the right comfortably holds. In fact, it does hold for and, for , the ratio of the two sides is greater than . We cannot, however, deduce immediately that (60) holds for sufficiently large since there is no obvious monotonicity with respect to and we do not know if the convergence involved in obtaining (61) is appropriately uniform. Still, patching the calculation is rather routine; we sketch the main points in the Appendix. ∎
3.3. The Gaussian case : Proof of Corollary 4
The Gaussian isoperimetric inequality [4, 16] reduces the problem to . The obvious line of argument is now to invoke some version of the Poincaré Lemma (appropriately normalized marginals of converge, as , to the normal distribution) and then appeal to (6), but there are some minor technical issues that need to be addressed. First, , the random variable distributed according to , is not exactly a marginal of (the marginals are parametrized by rather than by ). Next, we have to make sure that the convergence of to the standard normal preserves probabilities and moments. An elementary way to resolve these issues is to consider the density of , which is (on its support) . Once we take into account the properties of stated in Proposition 8, it is an elementary exercise to show that (the density of ), and that this convergence is dominated in a rather strong sense: we have for all and . The dominated convergence theorem implies then the convergence of all probabilities and all moments. ∎
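The convergence of densities invoked above can be illustrated numerically. The following is a minimal sketch, not the normalization used in the paper: it assumes the classical form of the Poincaré Lemma, in which one takes the first coordinate of a uniformly distributed point on the unit sphere in n-dimensional space and rescales it by the square root of n. The function names and the test grid are hypothetical choices made for illustration only.

```python
import math


def sphere_marginal_density(t: float, n: int) -> float:
    """Density at t of sqrt(n) * x_1, for x uniform on the unit sphere in R^n (n >= 3).

    Assumed normalization (classical Poincare Lemma), not necessarily the paper's.
    """
    if t * t >= n:
        return 0.0  # outside the support [-sqrt(n), sqrt(n)]
    # log of Gamma(n/2) / (sqrt(pi) * Gamma((n-1)/2)), the constant for x_1 itself
    log_c = math.lgamma(n / 2) - math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
    # density of x_1 is c * (1 - x^2)^((n-3)/2); rescaling by sqrt(n) divides by sqrt(n)
    log_density = log_c + 0.5 * (n - 3) * math.log1p(-t * t / n) - 0.5 * math.log(n)
    return math.exp(log_density)


def normal_density(t: float) -> float:
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def max_gap(n: int, grid) -> float:
    """Largest absolute difference between the two densities over the grid."""
    return max(abs(sphere_marginal_density(t, n) - normal_density(t)) for t in grid)


if __name__ == "__main__":
    grid = [i / 10 for i in range(-40, 41)]
    for n in (20, 200, 2000):
        print(n, max_gap(n, grid))
```

The gap shrinks roughly like the reciprocal of the dimension in this normalization. A pointwise comparison of this kind does not replace the domination estimate needed for the dominated convergence theorem.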
An alternative line of argument is to appeal to Lemma 11 and then compare to , where . Since all these quantities can be expressed in terms of the Gaussian error function, there is no problem with a numerical verification, see Figure 4. For complete rigor, this should be accompanied by an asymptotic analysis as .
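To indicate what a numerical verification in terms of the error function looks like, here is a minimal sketch. It does not reproduce the specific comparison based on Lemma 11. As a stand-in, it checks the standard one-sided bound that the Gaussian tail beyond t is at most one half times exp(-t^2/2) for nonnegative t. The function names and the grid are hypothetical.

```python
import math


def gaussian_upper_tail(t: float) -> float:
    """P(g >= t) for a standard normal g, written via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def subgaussian_bound(t: float) -> float:
    """The candidate bound (1/2) * exp(-t^2 / 2); a stand-in, not the bound of Lemma 11."""
    return 0.5 * math.exp(-t * t / 2)


def bound_holds_on_grid(ts, tol: float = 1e-15) -> bool:
    """Check tail <= bound at every grid point, with a tiny floating-point tolerance."""
    return all(gaussian_upper_tail(t) <= subgaussian_bound(t) + tol for t in ts)


if __name__ == "__main__":
    ts = [i / 100 for i in range(0, 601)]
    print(bound_holds_on_grid(ts))
```

A grid check of this kind covers only a compact range of t, which is why an asymptotic analysis is still needed for complete rigor.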
Finally, one could redo in the Gaussian setting the rather rigorous calculations that were performed in the spherical case. These would be substantially easier since we do not need to worry about the dependence on the dimension. As a matter of fact, we did some such calculations to provide heuristics for the spherical case. However, an argument of that nature wouldn’t be pretty. It would be good to have a neat proof based on standard properties of the Gaussian error function, perhaps along the lines of [17] or the follow-up papers [7, 12].
4. Products of spheres
In this section we will discuss perspectives for improving isoperimetric/concentration inequalities on for . The discussion is somewhat exploratory in nature, with some results based on numerics and many estimates presumably not optimal, and is intended to encourage further research.
As is well known, Fact 1 generalizes to products with an arbitrary number of factors (see, e.g., [11], section 6.5.2; note, however, that the family discussed there should involve and not ). This is because Ricci curvature is , and consequently the same is true for the product and one may apply the following comparison result due to Gromov (see section 6.4 and Appendix I in [11]).
Fact 14.
Let be an -dimensional Riemannian manifold, whose Ricci curvature is bounded from below by . Choose so that . Denote by the normalized Riemannian measure on and by the normalized Haar (surface) measure on the sphere . Next, let , and be a cap such that . Then .
As earlier, is again a cap and so its volume can be – after rescaling by – expressed as an integral of the type (9). In particular, if , then
where is defined by (11). Specifying further to (with ), whose Ricci curvature is , we are led to
(62) |
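The choice of the comparison sphere in Fact 14 amounts to a one-line computation, sketched below. The sketch rests on two standard facts: a round sphere of dimension d and radius r has Ricci curvature (d-1)/r^2, and a product of unit spheres of dimension m has Ricci curvature m-1. The parametrization by the dimension m of each factor and the number k of factors is an assumption made for illustration and may be shifted relative to the indexing used in the paper; the function names are hypothetical.

```python
import math


def comparison_radius(dim: int, ricci_lower: float) -> float:
    """Radius r of the round sphere of dimension `dim` with (dim - 1) / r**2 == ricci_lower."""
    if dim < 2 or ricci_lower <= 0:
        raise ValueError("need dim >= 2 and a positive lower bound on the Ricci curvature")
    return math.sqrt((dim - 1) / ricci_lower)


def product_of_spheres_radius(m: int, k: int) -> float:
    """Comparison radius for the product of k unit spheres of dimension m (m >= 2).

    The product has dimension k * m and Ricci curvature m - 1 (assumed indexing).
    """
    return comparison_radius(k * m, m - 1)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, product_of_spheres_radius(100, k))
```

For a single factor the comparison sphere is the unit sphere itself, and for large m the radius approaches the square root of k, which matches the rescaling mentioned in the text.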
Proposition 15.
Let , , and let be the normalized product measure on . Next, let be such that . If , then
(63) |
Since the fraction inside is clearly greater than , it follows that , which is slightly better than the estimate from [11] mentioned at the beginning of this section: the multiplicative constant instead of (this is because we are using Theorem 2 and not Fact 1). There is also a slight improvement in the coefficient of in the exponent, but it becomes less and less significant as increases. In order to be able to deduce the same bound as in Theorem 2 for , we need a stronger version of (11) (or, equivalently, of (3) or (8)) with an appropriate “excess” in the exponent to compensate for the coefficient of in (63) being strictly smaller than . Specifically, define
(64) |
and suppose that the inequality
(65) |
is valid with and with such that . Then, repeating mutatis mutandis the argument that led to Proposition 15, we obtain – for this particular choice of , and for the respective values of – an improvement to (63) with on the right-hand side, which is a much neater expression.
As it turns out, as speculative as the bound (65) appears, it is not unreasonable. Of course, as already pointed out in the Remark following the proof of Theorem 2, we cannot expect it to hold for all and all , but it may conceivably be true for . Below is an analysis showing that validating (65) is actually quite feasible.
We note first that the equation resolves to or . The last two expressions are decreasing functions of respectively or , which means that the threshold value for the excess that is sufficient for our purposes can be chosen as a function of only. Next, it follows immediately from the definition (64) of and from (13) that for all and . Thus, for each and for all , the sequences and are nonincreasing. This means that, given , once we establish (65) for certain and , it will be valid for all , and consequently for all sufficiently large (with the last qualification depending on ).
A numerical check indicates that and for , and that such inequalities do not generally hold for and . This suggests that, in the setting of Theorem 2, the bound is valid for and the bound is valid for . It would be interesting to rigorously determine the threshold values , in addition to the numerical results indicated above and further explored below.
As a demonstration, let us focus on the instance . A numerical check using Mathematica shows that in that case:
if , hence and , then (65) holds for ; taking into account the rescaling we deduce (cf. (62)) that, in the setting of Proposition 15 the bound holds for
if , hence and , then (65) and all the subsequent bounds hold for , so (after rescaling) we deduce that for ; however, if we use the chordal distance instead of the geodesic distance, then the bounds hold in the entire range
if , hence and , then (65) and all the subsequent bounds hold in the entire respective range
if , the same holds by monotonicity; note that for all ’s are even, and so our inductive “initialization” requires verifying only one value .
The above considerations can be summarized in the following statement.
Theorem 16.
Let be the normalized product measure on and let be such that . If and , then
If and , the above bound holds for, respectively, and . Additionally, the bound holds in the entire range if and if the enlargements are defined via the chordal distance rather than the geodesic distance.
For , the bound (65) holds for and for with the chordal distance; for for and for with the chordal distance. So it appears that the threshold for allowable dimensions increases with , possibly unboundedly. Again, it would be interesting to rigorously determine the dependence of the allowable range of as a function of (assuming it indeed does not stabilize, which would be a desirable property, but not very likely in view of the above numerical results). Another useful – and perhaps not that hard – result would be a good universal lower bound on such that the bounds hold for , or for . Numerics suggest that those intervals are never really small.
The final remark is that – unlike in the case – we do not have an exact calculation, based on the knowledge of extremal subsets, but one that relies on Gromov’s comparison theorem (Fact 14). While there are many very sophisticated approaches to isoperimetric problems on product spaces (e.g. [14, 18]), we are not aware of the precise solution to the problem even in the case of the torus . So it is possible, and quite likely, that in reality the bounds hold for a larger set of parameters than what follows from the argument above. It may be feasible to test, e.g., and or by looking at some specific sets and the (relatively large) values of or suggested by the numerics leading to the results described above.
Acknowledgements. GA was supported in part by ANR (France) under the grant ESQuisses (ANR-20-CE47-0014-01). The research of JJ and SJS has been supported in part by grants from the National Science Foundation (U.S.A.).
References
- [1]
- [2] G. Aubrun and S. J. Szarek, Alice and Bob Meet Banach. The Interface of Asymptotic Geometric Analysis and Quantum Information Theory. Mathematical Surveys and Monographs, 223, Amer. Math. Soc., 2017.
- [3] A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovász, and M. Simonovits, Deterministic and randomized polynomial-time approximation of radii. Mathematika 48, Issues 1-2, pp.63-105 (2003).
- [4] C. Borell, The Brunn-Minkowski inequality in Gauss space. Invent. Math. 30 (1975), no. 2, 207-216.
- [5] J. Jenkinson, Convex Geometric Connections to Information Theory. Ph.D. thesis, Case Western Reserve University, 2013, http://rave.ohiolink.edu/etdc/view?acc_num=case1365179413.
- [6] Y. Komatu, Elementary inequalities for Mills’ ratio. Rep. Statist. Appl. Res. Un. Jap. Sci. Engrs. 4 (1955), 69-70.
- [7] O. Kouba, Inequalities related to the error function. arXiv:math/0607694, 2006.
- [8] M. Ledoux, The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs, 89, Amer. Math. Soc., Providence, 2001.
- [9] P. Lévy, Problèmes concrets d’analyse fonctionnelle. 2nd ed. Gauthier-Villars, Paris, 1951.
- [10] E. Milman, Sharp isoperimetric inequalities and model spaces for the Curvature-Dimension-Diameter condition. J. Eur. Math. Soc. 17 (2015), 1041-1078.
- [11] V. D. Milman and G. Schechtman, Asymptotic theory of finite dimensional normed spaces. With an appendix by M. Gromov, Lecture Notes Math. 1200, Springer Verlag, Berlin-New York, 1986.
- [12] M. B. Ruskai and E. Werner, A pair of optimal inequalities related to the error function. arXiv:math/9711207, 1997.
- [13] M. R. Sampford, Some inequalities on Mill’s ratio and related functions. Ann. Math. Statistics 24 (1953), 130-132.
- [14] G. Schechtman, Concentration results and applications. In Handbook of the geometry of Banach spaces. Edited by W. B. Johnson and J. Lindenstrauss (North-Holland, Amsterdam, 2003), Vol. 2, pp. 1603-1634.
- [15] E. Schmidt, Die Brunn-Minkowskische Ungleichung und ihr Spiegelbild sowie die isoperimetrische Eigenschaft der Kugel in der euklidischen und nichteuklidischen Geometrie. I. Math. Nachr. 1 (1948) 81-157.
- [16] V. N. Sudakov and B. S. Tsirelson, Extremal properties of half-spaces for spherically invariant measures. J. Soviet Math. (1978), 9-18.
- [17] S. J. Szarek and E. Werner, A Nonsymmetric Correlation Inequality for Gaussian Measure. J. Multivariate Analysis 68 (1999), 193-211.
- [18] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces. Inst. Hautes Études Sci. Publ. Math. 81 (1995), 73-205.
5. Appendix
5.1. Achievability of tail estimates in Lemma 9
Here we sketch a proof of the assertion from the Remark following the proof of Lemma 9, which said that the supremum in (21) is attained for some and, moreover, for verifying . In view of (23), this reduces to the analysis of the family without having to consider a general . We will start with a series of elementary observations.
Since , it follows that , with equality iff is supported on .
Since and if , the function is -Lipschitz and non-decreasing. Consequently, the same is true for .
Similarly, is nondecreasing; this follows from and being a nondecreasing function of (for each fixed ).
If , then the supremum in (21) is strictly smaller than . The hypothesis implies that such that if , then . Let and suppose (as we can) that , which implies . Denote ; our objective is to show that cannot be too small. We have
which can be rewritten as
Now, set and choose the corresponding . If , then , while and so
In other words, either , or , so , as asserted.
Note : The assertion will follow independently from the other observations, but we include it here since the argument works for any -Lipschitz function and not just for .
and . Both of these follow from via the dominated convergence theorem.
With this preparation, the conclusion is very easy. Denote , . By , the function is -Lipschitz and non-decreasing. By , it has an oblique asymptote as and a horizontal asymptote as . Since, by , is nonincreasing, it follows that there is a unique value such that
if
if , with equality if .
Let us decode what these inequalities mean. First, means ; in that case . On the other hand, if , then . Since, again, is non-decreasing by , the largest value of will be attained for the smallest value of for which
, i.e., for , and the condition means precisely or .
∎
5.2. Proof of Lemma 11
We fix and proceed in several steps.
Step First, there is the essentially trivial observation that – from the point of view of estimating – the function is equivalent to (for any ) and, due to the symmetry of , to .
Step Second, extremal functions (i.e., such that is maximal) do exist. Indeed, suppose that is a sequence of functions for which . By the previous remark, we may assume that for all , and it then follows that there is a subsequence of converging uniformly on bounded intervals to some function , which necessarily belongs to . (In our setting, the measures are all supported on , which makes the argument even more straightforward.) Since , it follows from the dominated convergence theorem that . This implies that and, consequently, that . So the limit function is extremal.
Step The next observation is slightly less trivial; it gives the first hint why functions of the form (22) may be extremal. Let and let be defined by (it exists by the intermediate value theorem, see Figure 5 for this and the subsequent steps). We claim that if is extremal, then (at least in all cases of interest, to be clarified later)
(66) |
If this is not the case, then, denoting and , we will have . We now set (so ) and define and as follows
Then are -Lipschitz, , and so . Consequently (again, see Figure 5), there is an intermediate function for some such that
for
for and for
As a consequence of these properties, the set is strictly contained in the set and so (again, in all cases of interest) . This means that such cannot be extremal for the two-sided problem for this particular value of and shows that an extremal function must satisfy (66). An alternative take on this argument is that we just produced an extremal function, namely , for which (66) holds (this doesn’t even require that the inequality stated earlier in this paragraph is strict).
In the above argument we tacitly assumed that and existed (i.e., the sets appearing in their definitions were nonempty) and that they belonged to the support of ; this is what we meant by “cases of interest.” However, if – for example – the set was empty, then it would follow that in fact for all and, consequently, . This means that we would be back to the one-sided problem, for which we know that the functions are extremal. Another caveat is that if the interval was not included in (the support of the measure from (25)), it might happen that is the same as . Again, this is not a problem. First, we can replace one putatively extremal function by another one by modifying it outside of the support of , which has no effect on the quantities under consideration. Next, means that we are again de facto in the setting of the one-sided problem.
Step To summarize the analysis up to this point, the extremal functions satisfy the property (66), which can be subsumed as follows: for some ,
(67) |
Since the density of decreases away from , it is apparent that will be minimized when is as large as possible and, by Step , it is enough to consider . Given such a putatively extremal function (with associated ), define by
(68) |
Then for and everywhere else. Accordingly, , and so if is defined by , then
It follows that and the inequality is strict unless on the support of (in other words, -a.e.). Consequently,
(69) |
and so if was extremal, so is . Since the function is – up to an additive constant – of the form , (42) follows.
It remains to justify the last assertion of Lemma 11, namely that “it is sufficient to consider .” In the argument above, the extremal function being of the form , this condition translates to , an equality which can be immediately deduced in many cases of interest. This happens, for example, if the function defining the density of is strictly decreasing on its support (which holds in our setting, cf. (25)). In that case, the function is strictly increasing for (excluding, if applicable, the “trivial” range, i.e., the values of for which the interval does not intersect the support of ). Now, if we had , a strict inequality in (69) would follow, contradicting the extremality of .
This special case is sufficient for our intended applications of Lemma 11. And here is a sketch of the argument addressing the case of general . Let and repeat the construction above with replaced by to obtain etc. Each is extremal and, up to an additive constant, is of the form with . Next, since , it follows that , and so the sequence must converge to some finite limit . The limit function will be also extremal and will satisfy . ∎
5.3. Proof of Lemma 13.
Since the proof of Lemma 12 uses the inequalities from Lemma 13, we will start with the latter. Let us recall the main instances of the inequalities in question :
Here is an outline of the proof. We start with (45), which is effectively a lower bound for the density of . Taking logarithms of both sides and substituting we see that, for a given , (45) is equivalent to
(70) |
It is readily verified that , while on some interval (where depends on ) and for . Now, the domain of is and , so it follows that on some interval (again, depending on ) and for . Accordingly, (70) holds, for a given , iff . Since (by elementary calculus) we have equality in the limit, this would follow if the sequence were nonincreasing. This is not exactly true, but almost: in fact it is nonincreasing starting with . For , this can be established numerically by substituting, say, (to have a compact interval to deal with) and by differentiating with respect to . The remaining instances are handled by directly evaluating , and for .
We now pass to the analysis of (46). As in the argument that led to (45), this reduces to verifying that the inequality holds for , and one way to show that is by establishing that the sequence , which converges to , is increasing. Again, this can be verified by the method suggested above. This fact is fairly delicate and is (roughly) equivalent to the inequality , which – while not standard – may be known, and is also fairly easy to show directly. ∎
5.4. Proof of Lemma 12.
We first use the bound (45) to prove the estimate from Lemma 12 for . (Again, the choice of the cutoff is predicated on the other approach taking care of .) The heuristics are as follows. First, having an estimate of the form allows us to express both sides of the inequality in terms of the variable . Next, it turns out that after this substitution, the density of is bounded from below on the interval (and, a fortiori, for ) by a strictly positive constant, independent of . This means that can be upper-bounded on that interval by a strictly decreasing linear function, with equality at , and so – for moderate values of – we will have a strict separation between that function and , leading to the asserted improved upper bound.
Here are some of the details. First, the change of variables leads to
By Proposition 8(ii), the factor in front of the integral converges to as (basically, this reflects the fact that, as we mentioned earlier, the random variable approximates the standard normal), but – unfortunately – is strictly smaller than the limit. If we ignore that discrepancy and compare the “ideal upper bound” for to , we get the picture as shown in Figure 6.
Clearly and unsurprisingly, there is some room to spare between the ideal upper bound and the bound asserted in Lemma 12, which establishes that bound (in the range ) for sufficiently large . Since the constants are explicit, we can verify that “sufficiently large” means here “.” Smaller values of can be checked directly.
Finally, we can check directly (numerically) that, for , the bound in question is valid when .
It remains to show the estimate from Lemma 12 for . In that range, we will use the bound (20) from Proposition 6, which restated in the current context asserts that
We next appeal to (46) to upper-bound by . Once this is done, it remains to check that the coefficient doesn’t exceed as long as , which is straightforward and completes the proof. (The sequence being increasing, see Proposition 8, comes in handy here.) ∎
5.5. Proof of inequality (60)
We first rewrite the inequality from the conclusion of (60) as follows
(71) |
By elementary trigonometry,
(72) |
We next use the constraint from (60) to upper-bound the last two terms on the right-hand side of (72). We have
where we used consecutively the inequality , the substitution , and the bound (46), which applies since . Similarly
We now note that the sequence decreases to (by Proposition 8) and so, for , can be upper-bounded by . Moreover, the coefficients and are decreasing functions of , so they can be upper-bounded by their values at . Putting these estimates together, we conclude that
(73) |
At the same time, again for and ,
Now, for , the sequence increases to , so it can be lower-bounded by its initial term. In our setting, the initial term corresponds to , and so . This means that the inequality (71) indeed holds and, again, it is not close. ∎