We give a pseudorandom generator that fools \(m\) -facet polytopes over \(\lbrace 0,1\rbrace ^n\) with seed length \(\mathrm{polylog}(m) \cdot \log n\) . The previous best seed length had superlinear dependence on \(m\) .
1 Introduction
Unconditional derandomization has been a major focus of research in computational complexity theory for more than 30 years. A significant line of work in this area has been on developing unconditional pseudorandom generators (PRGs) for various types of Boolean functions. Early seminal results in this vein focused on Boolean circuits [1, 43, 45] and branching programs [22, 44, 46], but over the past decade or so a new strand of research has emerged in which the goal is to construct PRGs against halfspaces and various generalizations of halfspaces. This work has included a sequence of successively more efficient PRGs against halfspaces [9, 15, 27, 29, 35, 39], low-degree polynomial threshold functions [10, 24, 25, 27, 28, 39], and, most relevant to this article, intersections of halfspaces [8, 18, 19, 56].
Since intersections of \(m\) halfspaces correspond to \(m\) -facet polytopes, and also to \(\lbrace 0,1\rbrace\) -integer programs with \(m\) constraints, these objects are of fundamental interest in high-dimensional geometry, optimization, and a range of other areas. A pseudorandom generator that \(\delta\) -fools intersections of \(m\) halfspaces can equivalently be viewed as an explicit discrepancy set for \(m\) -facet polytopes: a small subset of \(\lbrace 0,1\rbrace ^n\) that \(\delta\) -approximates the \(\lbrace 0,1\rbrace ^n\) -volume of every \(m\) -facet polytope. (Discrepancy sets are stricter versions of hitting sets, which are only required to intersect every polytope of volume at least \(\delta\) .) The problem of constructing a PRG for intersections of \(m\) halfspaces is also a stricter version of the algorithmic problem of deterministically approximating the number of solutions of a \(\lbrace 0,1\rbrace\) -integer program with \(m\) constraints. It is stricter because a PRG yields an input-oblivious algorithm: The range of a PRG is a single fixed set of points that gives approximately the right answer for every \(\lbrace 0,1\rbrace\) -integer program. Beyond pseudorandomness, intersections of halfspaces also play a significant role in other fields such as concrete complexity theory [4, 26, 40, 48, 58, 59] and computational learning theory [6, 16, 30, 32, 33, 34, 57, 64].
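To make the discrepancy-set view concrete, the following minimal sketch computes, by brute force, how well a candidate point set approximates the \(\lbrace 0,1\rbrace ^n\) -volume of one polytope. The function names and the example polytope are illustrative assumptions, not objects from this article, and the exhaustive enumeration is feasible only for very small \(n\) .

```python
from itertools import product

def in_polytope(A, b, x):
    """Return True iff A x <= b holds in every coordinate."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
        for row, b_i in zip(A, b)
    )

def volume(A, b, points):
    """Fraction of the given points that satisfy A x <= b."""
    points = list(points)
    return sum(in_polytope(A, b, x) for x in points) / len(points)

def fooling_error(A, b, candidate_set):
    """|volume over the whole cube - volume over candidate_set| for this one polytope."""
    n = len(A[0])
    cube = product((0, 1), repeat=n)
    return abs(volume(A, b, cube) - volume(A, b, candidate_set))

# Hypothetical 2-facet polytope over {0,1}^4: x1+x2+x3+x4 <= 2 and x1 <= 1/2.
A = [[1, 1, 1, 1], [1, 0, 0, 0]]
b = [2, 0.5]

# Hypothetical candidate set: the 8 points of even Hamming weight.
even_weight = [x for x in product((0, 1), repeat=4) if sum(x) % 2 == 0]
error = fooling_error(A, b, even_weight)  # 7/16 versus 4/8, i.e. 0.0625
```

A set is a \(\delta\) -discrepancy set only if this error is at most \(\delta\) for every \(m\) -facet polytope simultaneously; multiplying the estimated volume by \(2^n\) then gives an additive \(\delta 2^n\) approximation to the number of solutions of the corresponding integer program.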
The main result of this article is a new PRG for intersections of \(m\) halfspaces. Its seed length grows polylogarithmically with \(m\) , which is an exponential improvement over the previous best PRG for this class. Before giving the precise statement of our result, we briefly describe the prior state of the art for this problem.
1.1 Prior work on PRGs for Intersections of Halfspaces
A halfspace \(F(x) = \mathds {1}[w \cdot x \le \theta ]\) is said to be \(\tau\) -regular if \(|w_j| \le \tau \Vert w\Vert _2\) for all \(j \in [n]\) ; intuitively, a \(\tau\) -regular halfspace is one in which no coefficient \(w_j\) is too large relative to the overall scale of all the coefficients. Harsha, Klivans, and Meka [19] gave a PRG that \(\delta\) -fools any intersection of \(m\) many \(\tau\) -regular halfspaces with seed length \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) , where \(\tau\) has to be sufficiently small relative to \(m\) and \(\delta\) (specifically, \(\tau \le \text{some } \mathrm{poly}(\frac{\delta }{\log m})\) is required). While this seed length has the desirable property of being polylogarithmic in \(m\) , due to the regularity requirement this result cannot be used to fool intersections of even two general halfspaces. We note that there are very basic halfspaces, such as \(F(x) = \mathds {1}[x_1 \le 1/2]\) , that are highly irregular.
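As a small illustration of the definition, the sketch below computes the smallest \(\tau\) for which a given weight vector is \(\tau\) -regular. The helper name `regularity` is an assumption introduced here for illustration.

```python
import math

def regularity(w):
    """Smallest tau such that |w_j| <= tau * ||w||_2 for every coordinate j."""
    norm = math.sqrt(sum(w_j * w_j for w_j in w))
    if norm == 0:
        raise ValueError("regularity is undefined for the all-zero weight vector")
    return max(abs(w_j) for w_j in w) / norm

# The majority-like form x_1 + ... + x_16 is as regular as possible: tau = 1/sqrt(16).
tau_majority = regularity([1] * 16)

# The form x_1 (as in the halfspace 1[x_1 <= 1/2]) is as irregular as possible: tau = 1.
tau_dictator = regularity([1] + [0] * 15)
```

The two extremes, \(1/\sqrt {n}\) and \(1\) , bracket the regularity of every nonzero weight vector.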
Recently, Reference [56] built on the work of Reference [19] to give a PRG that fools a different subclass of intersections of halfspaces. They give a PRG that \(\delta\) -fools any intersection of \(m\) many weight- \(W\) halfspaces with seed length \(\mathrm{poly}(\log m, W, 1/\delta)\cdot \mathrm{polylog}\,n\) ; a halfspace has weight \(W\) if it can be expressed as \(\mathds {1}[w \cdot x \le \theta ]\) where each coefficient \(w_j\) is an integer of magnitude at most \(W\) . Unfortunately, many \(n\) -variable halfspaces require weight polynomially or even exponentially large in \(n\) ; in fact, a counting argument shows that almost all halfspaces require exponentially large weight. Therefore, the Reference [56] result also cannot be used to fool even two general halfspaces.
In Reference [18], Gopalan, O’Donnell, Wu, and Zuckerman gave a PRG that can fool intersections of \(m\) general halfspaces. However, various aspects of their approach each necessitate a seed length that is at least linear in \(m\) , and indeed their overall seed length is \(O((m \log (m/\delta) + \log n) \cdot \log (m/\delta))\) . So, while this PRG is notable for being able to handle intersections of general halfspaces, its seed length becomes trivial (greater than \(n\) ) for intersections of \(m \ge n\) many halfspaces. (Indeed, this PRG of Reference [18] fools arbitrary monotone functions of \(m\) general halfspaces, with intersections (i.e., Ands) being a special case. Due to the generality of this class, which of course includes every monotone function over \(\lbrace 0,1\rbrace ^m\) , it can be shown that any PRG for it has to have at least linear seed length dependence on \(m\) .)
1.1.1 PRGs over Gaussian Space.
There has also been work on PRGs for functions over \(\mathds R^n\) endowed with the \(n\) -dimensional Gaussian distribution. Analyses in this setting are often facilitated by the continuous nature of \(\mathds R^n\) and rotational invariance of the Gaussian distribution, useful technical properties not afforded by the standard setting of Boolean space. For halfspaces and polytopes, PRGs over Gaussian space can be viewed as a first step towards PRGs over Boolean space; as we describe below, Boolean PRGs even for restricted subclasses of halfspaces and polytopes yield Gaussian PRGs for general halfspaces and polytopes, but the converse does not hold. We also note that the correspondence between polytopes and \(\lbrace 0,1\rbrace\) -integer programs is specific to Boolean space, and in particular, Gaussian PRGs do not yield algorithms for counting solutions to these programs.
For halfspaces, Meka and Zuckerman [39] showed that any PRG for the subclass of \(O(\frac{1}{\sqrt {n}})\) -regular halfspaces over Boolean space yields a PRG for all halfspaces over Gaussian space. Note that \(O(\frac{1}{\sqrt {n}})\) -regular halfspaces are “the most regular” ones; every halfspace is \(\tau\) -regular for some \(\tau \in [\frac{1}{\sqrt {n}},1]\) . Reference [19] generalized this connection to polytopes: They showed that any PRG for intersections of \(m\) many \(O((\log m)/\sqrt {n})\) -regular halfspaces over Boolean space yields a PRG for intersections of \(m\) many arbitrary halfspaces over Gaussian space. Combining this with their Boolean PRG for intersections of regular halfspaces discussed above, Reference [19] obtained a Gaussian PRG for intersections of \(m\) halfspaces with seed length \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) . Recent work of Reference [8] gives a different Gaussian PRG with seed length \(\mathrm{poly}(\log m,1/\delta) + O(\log n)\) .
The focus of the current work is on the standard setting of PRGs over Boolean space, and the rest of the article addresses this (more challenging) setting.
1.2 This Work: A PRG for Intersections of General Halfspaces
Summarizing the prior state-of-the-art on PRGs over Boolean space, there were no PRGs that could fool intersections of \(m = n\) many general halfspaces, and relatedly, the best PRG for intersections of \(m\le n\) general halfspaces had a superlinear seed length dependence on \(m\) . The PRGs that could fool intersections of \(m\ge n\) halfspaces imposed technical restrictions on the halfspaces: either regularity (hence excluding simple halfspaces such as \(\mathds {1}[x_1 \le 1/2]\) ) or small weights (hence excluding almost all halfspaces). Please refer to Table 1.
Reference | Function class | Seed length
Reference [56] | Intersections of \(m\) weight- \(W\) halfspaces | \(\mathrm{poly}(\log m, W, 1/\delta)\cdot \mathrm{polylog}\,n\)
This work | Intersections of \(m\) halfspaces | \(\mathrm{poly}(\log m, 1/\delta)\cdot \log n\)
Table 1. PRGs for Intersections of Halfspaces Over \(\lbrace 0,1\rbrace ^n\)
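The main result of this article, Theorem 1.1, is a PRG that \(\delta\) -fools intersections of \(m\) general halfspaces over \(\lbrace 0,1\rbrace ^n\) with seed length \(\mathrm{poly}(\log m, 1/\delta)\cdot \log n\) , that is, with a polylogarithmic seed length dependence on \(m\) .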
In particular, this PRG fools intersections of \(\mathrm{quasipoly}(n)\) many halfspaces with seed length \(\mathrm{polylog}(n)\) , and its seed length remains non-trivial for intersections of exponentially many halfspaces ( \(\exp (n^c)\) where \(c \gt 0\) is an absolute constant).
An immediate consequence of Theorem 1.1 is a deterministic algorithm that runs in time \(n^{\mathrm{polylog}(m)}\) and additively approximates the number of solutions to any \(n\) -variable \(\lbrace 0,1\rbrace\) -integer program with \(m\) constraints. Prior to our result, no non-trivial deterministic algorithm (running in time \(\lt 2^n\) ) was known even for general \(\lbrace 0,1\rbrace\) -integer programs with \(m=n\) constraints. Theorem 1.1 also yields PRGs with comparable seed lengths for intersections of halfspaces over a range of other domains, such as the \(n\) -dimensional hypergrid \(\lbrace 0,1,\ldots , N\rbrace ^n\) and the solid cube \([0,1]^n\) (details are left to the interested reader).
1.3 Discussion
Since the initial conference publication of a preliminary version of this article [50], several works have appeared that are relevant to the topic of this article. One line of work has been on obtaining pseudorandom generators for functions of halfspaces (including intersections of halfspaces, i.e., polytopes) with a better dependence on the error parameter but a worse dependence on other parameters. Kabanets et al. [23] gave a PRG that \(\delta\) -fools the class of \(n\) -variable size- \(s\) de Morgan formulas with halfspace gates at the bottom and has seed length \(O(n^{1/2} s^{1/4} \log (n) \log (n/\delta))\) , and Hatami et al. [20] gave a PRG that \(\delta\) -fools the class of arbitrary functions of \(m\) halfspaces over \(n\) variables with seed length \(\tilde{O}(\sqrt {n(m + \log (1/\delta))}).\) The techniques employed in these works are very different from the methods of our article; even for the special case of intersections of \(m\) halfspaces, each of these results has a seed length that is polynomial in \(n\) and \(m\) (rather than logarithmic or polylogarithmic as in our work), but these results have a much better seed length dependence on the error parameter \(\delta .\) Achieving a polylogarithmic dependence on all three parameters \(n,m\) , and \(1/\delta\) is an interesting challenge for future work.
In a different related line of work, Arunachalam and Yao [2] have considered the problem of constructing explicit PRGs for positive spectrahedrons. A positive spectrahedron is a Boolean function \(\mathds {1}[x_1 A^1 + \cdots + x_n A^n \preceq B]\) , where the \(A^i\) ’s are \(k \times k\) positive semidefinite matrices. Building on some of the ideas and ingredients in this article and in prior work [19], they establish invariance principles and give a PRG that \(\delta\) -fools “sufficiently regular” positive spectrahedrons over \(\lbrace 0,1\rbrace ^n\) with seed length \(\mathrm{poly}(\log k, 1/\delta)\cdot \log n.\)
2 Overview of Our Proof
Our proof of Theorem 1.1 involves several novel extensions of the central technique driving this line of work, namely, Lindeberg-style proofs of probabilistic invariance principles and derandomizations thereof. We develop these extensions to overcome challenges that arise due to the generality of our setting; specifically, the fact that we are dealing with intersections of arbitrary halfspaces, with no restrictions whatsoever on their structure. One of the key new ingredients in our analysis, which we believe is of independent interest, is a sharp high-dimensional generalization of the classic Littlewood–Offord anticoncentration inequality [12, 37] that we establish. We now describe our proof and the new ideas underlying it in detail.
2.1 Background: The Reference [19] PRG for Regular Polytopes
We begin by recalling the arguments of Harsha, Klivans, and Meka [19] for fooling regular polytopes. At a high level, Reference [19] builds on the work of Meka and Zuckerman [39], which gave a versatile and powerful framework for constructing pseudorandom generators from probabilistic invariance principles; the main technical ingredient underlying the Reference [19] PRG for regular polytopes is a new invariance principle for such polytopes, which we now describe.
Reference [19]’s invariance principle and the Lindeberg method. At a high level, the Reference [19] invariance principle for regular polytopes is as follows: Given an \(m\) -tuple of regular linear forms over \(n\) input variables \(x=(x_1,\ldots ,x_n)\) (denoted by \(Ax\) , where \(A\) is an \(m\) -by- \(n\) matrix), the distribution (over \(\mathds R^m\) ) of \(A \boldsymbol {u}\) , where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) is uniform random, is very close to the distribution of \(A \boldsymbol {g}\) , where \(\boldsymbol {g}\sim {\mathcal {N}(0,1)}^n\) is distributed according to a standard \(n\) -dimensional Gaussian. Here, closeness is measured by multidimensional CDF distance; we observe that multidimensional CDF distance corresponds to test functions of the form \(\mathds {1}[Ax \le b]\) where \(b \in \mathds R^m\) , which are precisely intersections of \(m\) halfspaces \(\mathds {1}[A_1 x \le b_1] \wedge \cdots \wedge \mathds {1}[A_m x \le b_m].\) To prove this invariance principle, Reference [19] employs the well-known Lindeberg method (see, e.g., Chapter §11 of References [47] and [61]) and proceeds in two main conceptual steps. The first step establishes a version of the result for smooth test functions, proxies for the actual “hard threshold” test functions \(\mathds {1}[Ax \le b]\) , and the second step relates distance with respect to these smooth test functions to multidimensional CDF distance via Gaussian anticoncentration. We outline each of these two steps below.
The first step is to prove an invariance principle for smooth test functions. Here, instead of measuring the distance between \(A\boldsymbol {u}\) and \(A\boldsymbol {g}\) using test functions that are orthant indicators \(\mathcal {O}_b(v_1,\ldots ,v_m) = \mathds {1}[v \le b]\) (corresponding to multidimensional CDF distance), distance is measured using a sufficiently smooth mollifier \(\widetilde{\mathcal {O}}_b : \mathds R^m \rightarrow [0,1]\) of \(\mathcal {O}_b\) . Such mollifiers, with useful properties that we now discuss, were proposed and analyzed by Bentkus [5]. In more detail, Reference [19] proves that the difference between the expectations of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {u})\) and \(\widetilde{\mathcal {O}}_b(A\boldsymbol {g})\) is bounded by a certain function involving \(\widetilde{\mathcal {O}}_b\) ’s derivatives. In fact, as is standard in Lindeberg-style proofs of invariance principles, Reference [19] actually bounds this difference with respect to any smooth test function \(\Upsilon : \mathds R^m \rightarrow \mathds R\) in terms of \(\Upsilon\) ’s derivatives; the only specific property of Bentkus’s mollifier \(\widetilde{\mathcal {O}}_b\) that is used is that its derivatives are appropriately small. At a high level, the proof of this smooth invariance principle proceeds by hybridizing from \(\Upsilon (A\boldsymbol {u})\) to \(\Upsilon (A\boldsymbol {g})\) , using the multidimensional Taylor expansion of \(\Upsilon\) to bound the error incurred in each step. (The regularity of the linear forms is used in a crucial way to control the approximation error that results from truncating the Taylor expansion at a certain fixed degree.)
The second step is to establish the desired bound on multidimensional CDF distance using the aforedescribed smooth invariance principle applied to Bentkus’s mollifier. This step relies on a second key property of Bentkus’s mollifier: \(\widetilde{\mathcal {O}}_b\) agrees with the orthant indicator \(\mathcal {O}_b\) except on a small error region near the orthant boundary. With this property in hand, a fairly simple and standard argument shows that it suffices to bound the anticoncentration of the Gaussian random variable \(A \boldsymbol {g}\) ; intuitively, such anticoncentration establishes that \(A\boldsymbol {g}\) does not place too much probability weight on the error region where \(\widetilde{\mathcal {O}}_b\) disagrees with \(\mathcal {O}_b\) . In Reference [19], the required anticoncentration for \(A \boldsymbol {g}\) follows immediately from a result of Nazarov [33, 42] on the Gaussian surface area of \(m\) -facet polytopes.
The Reference [19] PRG via a derandomized invariance principle. Having proved this invariance principle for regular polytopes, Reference [19] then establishes a pseudorandom version by derandomizing its proof. That is, they argue that their proof in fact establishes multidimensional-CDF-closeness between \(A \boldsymbol {z}\) and \(A\boldsymbol {g}\) , where \(\boldsymbol {g}\sim \mathcal {N}(0,1)^n\) is distributed according to a standard Gaussian as before, but \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) is the output of a suitable pseudorandom generator \(\mathscr{G} : \lbrace -1,1\rbrace ^r \rightarrow \lbrace -1,1\rbrace ^n\) (rather than uniform random). Combining the “full-randomness” invariance principle (establishing closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {g}\) ) with this pseudorandom version (establishing closeness between \(A\boldsymbol {z}\) and \(A\boldsymbol {g}\) ), it follows from the triangle inequality that \(A\boldsymbol {z}\) and \(A\boldsymbol {u}\) are close. Recalling that multidimensional CDF distance corresponds to test functions of the form \(\mathds {1}[Ax \le b] = \mathds {1}[A_1 x \le b_1] \wedge \cdots \wedge \mathds {1}[A_m x \le b_m]\) , this is precisely equivalent to the claim that \(\mathscr{G}\) fools the intersection of \(m\) halfspaces with weight matrix \(A \in \mathds R^{m\times n}\) (and an arbitrary vector of thresholds \(b \in \mathds R^m\) ).
For later reference, we close this section with an informal description of the Reference [19] generator (for fooling intersections of \(m\) many \(\tau\) -regular halfspaces):
(1) Pseudorandomly hash the \(n\) variables into \(L\coloneqq {\mathrm{poly}(1/\tau)}\) buckets using an \((r_\mathrm{hash}\coloneqq 2\log m)\) -wise uniform hash function \(\boldsymbol {h}: [n] \rightarrow [L]\) .
(2) Independently across buckets, assign values to the variables within each bucket using an \((r_\mathrm{bucket}\coloneqq 4\log m)\) -wise uniform distribution.
We remark that this is the structure of the Meka–Zuckerman generator [39] for fooling a single regular halfspace, the only difference being that the relevant parameters \(L,r_\mathrm{hash},\) and \(r_\mathrm{bucket}\) are larger in Reference [19] than in Reference [39] (naturally so, given that the Reference [19] generator fools intersections of \(m\) regular halfspaces instead of a single one).
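The two-level structure of steps (1) and (2) can be sketched as follows. This is only a structural illustration under a loud assumption: Python's `random` module stands in for both the \(r_\mathrm{hash}\) -wise uniform hash function and the \(r_\mathrm{bucket}\) -wise uniform distributions, so the sketch neither achieves the stated seed length nor reproduces the limited-independence primitives of Reference [19]. The function name and parameters are hypothetical.

```python
import random

def two_level_generator(n, num_buckets, seed):
    """Structural sketch: hash the n variables into buckets, then fill each bucket."""
    rng = random.Random(seed)

    # Step (1): stand-in for the r_hash-wise uniform hash h: [n] -> [L].
    h = [rng.randrange(num_buckets) for _ in range(n)]
    buckets = [[j for j in range(n) if h[j] == ell] for ell in range(num_buckets)]

    # Step (2): independently across buckets, assign +/-1 values to the variables
    # in each bucket (stand-in for an r_bucket-wise uniform distribution).
    z = [0] * n
    for members in buckets:
        bucket_rng = random.Random(rng.getrandbits(64))  # fresh randomness per bucket
        for j in members:
            z[j] = bucket_rng.choice((-1, 1))
    return z

z = two_level_generator(n=20, num_buckets=4, seed=0)
```

In an actual instantiation, the seed would consist of the description of the hash function together with one short seed per bucket, which is where the savings over \(n\) truly random bits come from.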
Our analysis in this article can be used to show that the Reference [39] generator, instantiated with suitable choices of \(L, r_\mathrm{hash}\) , and \(r_\mathrm{bucket}\) , fools intersections of \(m\) general halfspaces. However, for technical reasons (that are not essential for this high-level discussion), this results in a seed length that is \(\mathrm{poly}(\log m,1/\delta , \log n)\) . To achieve our seed length of \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) , we slightly extend the Reference [39] generator in two ways. First, within each bucket the variables are assigned using an \(r_\mathrm{bucket}\) -wise uniform distribution Xor-ed with an independent draw from a generator that fools small-width CNF formulas [17]. Second, we Xor the entire resulting \(n\) -bit string with an independent draw from a \(k\) -wise independent generator. (See Section 4 for a detailed description of our PRG.)
2.2 Some Key New Ingredients in our Analysis
A fundamental challenge in extending the Reference [19] PRG result from regular to general polytopes stems from the fact that an invariance principle simply does not hold for general polytopes \(Ax \le b\) . Without the regularity requirement on \(A\) , it is not true that \(A\boldsymbol {u}\) and \(A\boldsymbol {g}\) are close in CDF distance; indeed, even a single non-regular linear form such as \(x_1\) is distributed very differently under \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) versus \(\boldsymbol {g}\sim \mathcal {N}(0,1)^n\) . This therefore necessitates a significant conceptual departure from the Meka–Zuckerman framework for constructing pseudorandom generators from invariance principles: Rather than establishing closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) (where \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) is the output of a suitable pseudorandom generator) through \(A\boldsymbol {g}\) by means of an invariance principle, one has to establish closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) “directly” without using invariance.
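The failure of invariance for non-regular forms can be checked numerically in the simplest case of a single linear form. The sketch below computes, exactly on the Boolean side, the CDF gap at one threshold \(t\) for the regular form \((x_1+\cdots +x_n)/\sqrt {n}\) and for the non-regular form \(x_1\) . The function names and the choices \(n = 400\) , \(t = 1/2\) are illustrative assumptions.

```python
import math

def gaussian_cdf(t):
    """Pr[N(0,1) <= t]."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def cdf_gap_regular(n, t):
    """|Pr[(u_1+...+u_n)/sqrt(n) <= t] - Pr[N(0,1) <= t]| for uniform u in {-1,1}^n."""
    # With k coordinates equal to +1 the sum is 2k - n, so we need 2k - n <= t*sqrt(n).
    k_max = math.floor((t * math.sqrt(n) + n) / 2)
    k_max = min(max(k_max, -1), n)
    boolean = sum(math.comb(n, k) for k in range(k_max + 1)) / 2 ** n
    return abs(boolean - gaussian_cdf(t))

def cdf_gap_dictator(t):
    """The same gap for the non-regular form x_1, which takes values -1 and +1 only."""
    boolean = 0.0 if t < -1 else (0.5 if t < 1 else 1.0)
    return abs(boolean - gaussian_cdf(t))

gap_regular = cdf_gap_regular(400, 0.5)   # small, and shrinks as n grows
gap_dictator = cdf_gap_dictator(0.5)      # about 0.19, independent of n
```

The second gap does not shrink with \(n\) , which is the one-dimensional instance of the obstacle described above.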
Somewhat surprisingly, even though an invariance principle does not hold in our setting of general polytopes, our proof nonetheless proceeds via the Lindeberg method for proving invariance principles. Following the two main conceptual steps of the method (as outlined in the previous section), we first prove that \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) are close with respect to Bentkus’s smooth mollifiers \(\widetilde{\mathcal {O}}_b\) for the orthant indicators \(\mathcal {O}_b\) , and then use this to establish closeness in multidimensional CDF distance. However, the fact that we are dealing with matrices \(A \in \mathds R^{m\times n}\) whose rows are arbitrary linear forms (corresponding to the facets of general \(m\) -facet polytopes) instead of regular linear forms poses significant challenges in both steps of the Lindeberg method. We discuss some of these challenges, and the new ideas that we employ to overcome them, next. For concreteness, we will discuss these challenges and new ingredients by contrasting our proof with that of Reference [19], but we remark here that these are in fact qualitative differences between our approach and the Lindeberg method in general.
Step 1: Fooling Bentkus’s mollifier. Recall that Reference [19] first proves a general invariance principle establishing closeness in expectation (with a quantitative bound that depends on \(\Upsilon\) ’s derivatives) between \(\Upsilon (A\boldsymbol {u})\) and \(\Upsilon (A\boldsymbol {g})\) for any smooth test function \(\Upsilon\) . They then apply this general invariance principle with Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b\) being the test function, using the bounds on \(\widetilde{\mathcal {O}}_b\) ’s derivatives established in Reference [5] but no other properties of \(\widetilde{\mathcal {O}}_b\) .
In contrast, we do not prove closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) for all smooth test functions; our argument is carefully tailored to Bentkus’s specific mollifier. In addition to bounds on \(\widetilde{\mathcal {O}}_b\) ’s derivatives, we crucially rely on the specific structure of \(\widetilde{\mathcal {O}}_b\) , in particular, the fact that it is the product of \(m\) univariate functions, one for each coordinate (i.e., \(\widetilde{\mathcal {O}}_b(v) = \prod _{i=1}^m \psi _{b_i}(v_i)\) , where each \(\psi _{b_i}\) maps \(\mathds R\) to \([0,1]\) ). A high-level intuition for why such product structure is useful is as follows: By doing some structural analysis of halfspaces (see Section 5), we can decompose each of our \(m\) halfspaces into a small “head” portion, consisting of at most \(k\) variables, and a remaining “tail” portion that is regular. From this point of view, the difference between regular and general polytopes is therefore the presence of these size-at-most- \(k\) head portions in each of the \(m\) halfspaces. Very roughly speaking, the product structure of \(\widetilde{\mathcal {O}}_b\) allows us to handle these head portions using pseudorandom generators for small-width CNF formulas [17]. (To see the relevance of CNF formulas in this context, at least at a conceptual level, observe that a product of \(\lbrace 0,1\rbrace\) -valued \(k\) -juntas is a width- \(k\) CNF formula.)
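The product structure can be illustrated with a toy mollifier. In the sketch below, the univariate function `psi` is a simple smoothstep chosen for illustration; it is an assumption and not Bentkus's construction (in particular it is only once continuously differentiable, whereas the analysis needs control of higher derivatives). What the sketch does share with \(\widetilde{\mathcal {O}}_b\) is the form \(\prod _{i=1}^m \psi _{b_i}(v_i)\) and agreement with \(\mathcal {O}_b\) away from a thin region near the orthant boundary, whose width is the hypothetical parameter `lam`.

```python
def orthant_indicator(v, b):
    """O_b(v) = 1[v <= b], coordinatewise."""
    return 1.0 if all(v_i <= b_i for v_i, b_i in zip(v, b)) else 0.0

def psi(t, b_i, lam):
    """Toy univariate factor: 1 for t <= b_i - lam, 0 for t >= b_i, smooth in between."""
    s = min(1.0, max(0.0, (b_i - t) / lam))
    return s * s * (3.0 - 2.0 * s)  # smoothstep interpolation on [0, 1]

def product_mollifier(v, b, lam):
    """Product of m univariate factors, one per coordinate."""
    out = 1.0
    for v_i, b_i in zip(v, b):
        out *= psi(v_i, b_i, lam)
    return out

b = [1.0, 1.0]
deep_inside = product_mollifier([0.0, 0.0], b, lam=0.5)    # 1.0, agrees with O_b
outside = product_mollifier([2.0, 0.0], b, lam=0.5)        # 0.0, agrees with O_b
near_boundary = product_mollifier([0.75, 0.0], b, lam=0.5) # 0.5, inside the error region
```

Because the function factorizes over coordinates, fixing the head variables of each halfspace turns each factor into a function of the tail alone, which is what lets the head and tail portions be treated by different tools.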
Our proof incorporates these PRGs for CNFs into Reference [19]’s analysis of the regular tail portions. We highlight one interesting aspect of our analysis: In all previous instantiations of the Lindeberg method that we are aware of, expressions like \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] -{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }}^{\prime })]|\) are bounded by considering two Taylor expansions of \(\Upsilon\) , both taken around the “common point” \(\boldsymbol {v}\) . Lindeberg method arguments analyze the difference of these Taylor expansions using moment-matching properties of \({\boldsymbol {\Delta }}\) and \({\boldsymbol {\Delta }}^{\prime }\) and the fact that they are “small” in a certain technical sense, which is directly related to the regularity assumptions that underlie these invariance principles. In contrast, in our setting, since we are dealing with arbitrary linear forms rather than regular ones, we end up having to bound expressions like \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] - {{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime } + {\boldsymbol {\Delta }}^{\prime })]|\) . Note that this involves considering the Taylor expansions of \(\Upsilon\) around two distinct points \(\boldsymbol {v}\) and \(\boldsymbol {v}^{\prime }\) , which may be far from each other—indeed, a priori it is not even clear that \(|{{\bf E}}[\Upsilon (\boldsymbol {v})]-{{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime })]|\) will be small. Because of these differences from the standard Lindeberg scenario, moment-matching properties of \({\boldsymbol {\Delta }}\) and \({\boldsymbol {\Delta }}^{\prime }\) and their “smallness” no longer suffice to ensure that the overall expected difference is small. 
Instead, as alluded to above, our analysis additionally exploits the product structure of Bentkus’s mollifier via PRGs for CNFs to bound \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] - {{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime } + {\boldsymbol {\Delta }}^{\prime })]|\) (see Section 8).
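For comparison, the standard "common point" computation can be written out as follows. This is a generic sketch of the textbook Lindeberg step, not a formula quoted from this article; the truncation degree \(d\) , the multi-index \(\alpha\) , and the remainder \(R_d\) are notation introduced here.

```latex
\[
\Upsilon(\boldsymbol{v}+\boldsymbol{\Delta})
  = \sum_{|\alpha| < d} \frac{\partial^{\alpha}\Upsilon(\boldsymbol{v})}{\alpha!}\,\boldsymbol{\Delta}^{\alpha}
  + R_d(\boldsymbol{v},\boldsymbol{\Delta}),
\qquad
|R_d(\boldsymbol{v},\boldsymbol{\Delta})|
  \le \sum_{|\alpha| = d} \frac{\sup_{x}|\partial^{\alpha}\Upsilon(x)|}{\alpha!}\,|\boldsymbol{\Delta}^{\alpha}| .
\]
% Assume Delta and Delta' are each independent of v, and that
% E[Delta^alpha] = E[Delta'^alpha] for every multi-index with |alpha| < d.
\[
{\bf E}[\Upsilon(\boldsymbol{v}+\boldsymbol{\Delta})] - {\bf E}[\Upsilon(\boldsymbol{v}+\boldsymbol{\Delta}')]
  = {\bf E}[R_d(\boldsymbol{v},\boldsymbol{\Delta})] - {\bf E}[R_d(\boldsymbol{v},\boldsymbol{\Delta}')] .
\]
```

All terms of degree below \(d\) cancel because they are expanded around the same point \(\boldsymbol {v}\) . With two distinct base points \(\boldsymbol {v}\) and \(\boldsymbol {v}^{\prime }\) , the low-degree terms involve \(\partial ^{\alpha }\Upsilon (\boldsymbol {v})\) and \(\partial ^{\alpha }\Upsilon (\boldsymbol {v}^{\prime })\) respectively and no longer cancel by moment matching alone.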
Step 2: Anticoncentration. The next step is to pass from closeness of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {u})\) and \(\widetilde{\mathcal {O}}_b(A\boldsymbol {z})\) in expectation, to closeness of \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) in multidimensional CDF distance. We recall that in the analogous step in Reference [19]’s proof, the starting point was closeness in expectation of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {u})\) and \(\widetilde{\mathcal {O}}_b(A\boldsymbol {g})\) , where \(\boldsymbol {g}\sim \mathcal {N}(0,1)^n\) is a standard Gaussian (instead of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {z})\) where \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) is pseudorandom). For this reason, it sufficed for Reference [19] to bound the Gaussian anticoncentration of \(A\boldsymbol {g}\) and, as mentioned, such a bound is an immediate consequence of Nazarov’s bound on the Gaussian surface area of \(m\) -facet polytopes.
In contrast, since the Gaussian distribution does not enter into our arguments at all (by necessity, as explained above), we instead have to bound the Boolean anticoncentration of \(A\boldsymbol {u}\) where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) is uniform random. This task, which is carried out in Section 7, requires significantly more work; indeed, Boolean anticoncentration formally contains Gaussian anticoncentration as a special case. At the heart of our arguments for this step is a new Littlewood–Offord-type anticoncentration inequality for \(m\) -facet polytopes, a high-dimensional generalization of the classic Littlewood–Offord theorem [12, 37]. We discuss this new theorem, which we believe is of independent interest, next.
2.2.1 A Littlewood–Offord Theorem for Polytopes.
We first recall the classic Littlewood–Offord anticoncentration inequality.
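In the notation of this section, the inequality (Theorem 2.1) can be written as follows. This is a restatement assembled from the surrounding discussion rather than a quotation; in particular, the half-open window of width 2 is our reading of the "width-2 boundary" interpretation.

```latex
\[
\Pr_{\boldsymbol{u}\sim\lbrace -1,1\rbrace^n}\big[\, w\cdot\boldsymbol{u} \in (\theta-2,\theta] \,\big]
  = O\!\left(\frac{1}{\sqrt{n}}\right)
\quad\text{for all } \theta\in\mathds{R} \text{ and all } w\in\mathds{R}^n
\text{ with } |w_j|\ge 1 \text{ for every } j\in[n].
\]
```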
Littlewood and Offord [37] first proved a bound of \(O((\log n)/\sqrt {n})\) ; Erdős [12] subsequently sharpened this to \(O(1/\sqrt {n})\) , which is optimal by considering \(w = 1^n\) and \(\theta = 0\) . (We observe that the question trivializes without the assumption on the magnitudes of \(w\) ’s coordinates; for instance, the relevant probability is \(1/2\) for \(w = (1,0,\ldots ,0)\) and \(\theta = 1\) .)
Theorem 2.1 has the following natural geometric interpretation: The maximum fraction of hypercube points that can fall within the “width-2 boundary” of a halfspace \(\mathds {1}[w\cdot x \le \theta ]\) where \(|w_j|\ge 1\) for all \(j\) is \(O(1/\sqrt {n})\) . Given this geometric interpretation, it is natural to seek a generalization from single halfspaces (i.e., 1-facet polytopes) to \(m\) -facet polytopes:
What is the maximum fraction of hypercube points \(u \in \lbrace -1,1\rbrace ^n\) that can lie within the “width-2 boundary” of an \(m\) -facet polytope \(A x \le b\) where \(|A_{ij}| \ge 1\) for all \(i\) and \(j\) ?
In more detail, we say that \(u\) lies within the “width-2 boundary” of the polytope \(Ax \le b\) provided \(Au \le b\) and \(A_i \cdot u \gt b_i - 2\) for some \(i\in [m]\) ; equivalently, \(u\) lies in the difference of the two polytopes \(Ax \le b\) and \(Ax \le b - 2\cdot \mathds {1}_m\) , where \(\mathds {1}_m\) denotes the all-1’s vector in \(\mathds R^m\) . The Littlewood–Offord theorem (Theorem 2.1), along with a naive union bound, implies a bound of \(O(m/\sqrt {n})\) ; we are not aware of any improvement of this naive bound prior to our work.
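For small \(n\) , the quantity in question can be computed by exhaustive enumeration. The sketch below follows the definition of the width-2 boundary given above; the function name and the two example polytopes are illustrative assumptions.

```python
from itertools import product
from math import comb

def boundary_fraction(A, b, width=2):
    """Fraction of u in {-1,1}^n with A u <= b and A_i . u > b_i - width for some i."""
    n = len(A[0])
    hits = 0
    for u in product((-1, 1), repeat=n):
        values = [sum(a * x for a, x in zip(row, u)) for row in A]
        inside = all(v <= b_i for v, b_i in zip(values, b))
        near = any(v > b_i - width for v, b_i in zip(values, b))
        hits += inside and near
    return hits / 2 ** n

n = 10
ones = [1] * n
alternating = [1, -1] * (n // 2)

# Single halfspace 1[x_1 + ... + x_n <= 0]: the boundary is exactly {u : sum(u) = 0}.
single = boundary_fraction([ones], [0])

# A hypothetical 2-facet polytope; the union bound says its boundary fraction is at
# most the sum of the boundary fractions of its two facets taken separately.
both = boundary_fraction([ones, alternating], [0, 0])
union_bound = single + boundary_fraction([alternating], [0])
```

Here `single` equals \(\binom{n}{n/2}/2^n\) , which is of order \(1/\sqrt {n}\) , matching the extremal example for the Littlewood–Offord theorem.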
We give an essentially complete answer to this question, with upper and lower bounds that match up to constant factors. In Section 7, we prove the following “Littlewood–Offord theorem for polytopes”:
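Read together with the matching lower bound of \(\Omega (\sqrt {\ln m}/\sqrt {n})\) described in this section, the upper bound of Theorem 2.2 has the following shape. This is a paraphrase under the hypotheses of the question posed above; the explicit constant and any further conditions are not specified here, and the restriction \(m \ge 2\) is added only so that \(\ln m\) is positive.

```latex
\[
\Pr_{\boldsymbol{u}\sim\lbrace -1,1\rbrace^n}\big[\, A\boldsymbol{u}\le b
  \ \text{ and }\ A_i\cdot\boldsymbol{u} > b_i - 2 \ \text{ for some } i\in[m] \,\big]
  = O\!\left(\frac{\sqrt{\ln m}}{\sqrt{n}}\right)
\quad\text{for all } m\ge 2,\ b\in\mathds{R}^m,\ A\in\mathds{R}^{m\times n}
\text{ with } |A_{ij}|\ge 1 \text{ for all } i,j.
\]
```

Compared with the naive union bound of \(O(m/\sqrt {n})\) , the dependence on \(m\) improves from linear to \(\sqrt {\ln m}\) .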
Our proof of Theorem 2.2 draws on and extends techniques from Kane’s bound on the Boolean average sensitivity of \(m\) -facet polytopes [26]. We complement Theorem 2.2 with a matching lower bound, which establishes the existence of an \(m\) -facet polytope with an \(\Omega (\sqrt {\ln m}/\sqrt {n})\) -fraction of hypercube points lying within its width-2 boundary. (In fact, our lower bound is slightly stronger: It establishes the existence of a polytope with an \(\Omega (\sqrt {\ln m}/{\sqrt {n}})\) -fraction of hypercube points lying on its surface, corresponding to its width-0 boundary.)
Theorem 2.2 does not suffice for the purpose of passing from closeness with respect to Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b\) to closeness in multidimensional CDF distance (i.e., Step 2 in Section 2.2): While the assumption on the magnitudes of \(A\) ’s entries is essential to Theorem 2.2 (just as the analogous assumption on \(w\) ’s coordinates is essential to the Littlewood–Offord theorem), the weight matrix of a general \(m\) -facet polytope need not have this property. In Section 7, we establish various technical extensions of Theorem 2.2 that are required to handle this issue.
We close this section with a discussion of the connection between our techniques and those of the recent work cited in Reference [56]. Recall that the main result of Reference [56] is a PRG for \(\delta\) -fooling intersections of \(m\) weight- \(W\) halfspaces using seed length \(\mathrm{poly}(\log m, W,1/\delta)\cdot \mathrm{polylog}\, n\) (whereas our main result, which is strictly stronger, is a PRG for \(\delta\) -fooling intersections of \(m\) general halfspaces using seed length \(\mathrm{poly}(\log m, 1/\delta)\cdot \log n\) , with no dependence on the weights of the halfspaces).
A key structural observation driving Reference [56] is that every intersection of \(m\) low-weight halfspaces can be expressed as \(H \wedge G\) , where \(H\) is an intersection of \(m\) regular halfspaces and \(G\) is a small-width CNF. (The width of \(G\) grows polynomially with the weights of the halfspaces, and this polynomial growth is responsible for the polynomial dependence on \(W\) in the seed length of the Reference [56] PRG.) From this starting point, it suffices for Reference [56] to bound the multidimensional CDF distance between the \((\mathds R^{m} \times \lbrace \pm 1\rbrace)\) -valued random variables \((A\boldsymbol {u}, G(\boldsymbol {u}))\) and \((A\boldsymbol {z}, G(\boldsymbol {z}))\) , where \(A \in \mathds R^{m\times n}\) is the weight matrix of \(H\) , \(\boldsymbol {u}\) is uniform random, and \(\boldsymbol {z}\) is the output of the Reference [56] PRG (which is a slight variant of Reference [19]’s pseudorandom generator). Since \(H\) is an intersection of regular halfspaces, the fact that \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) are close in multidimensional CDF distance is precisely the main result of Reference [19]; the crux of the work in Reference [56], therefore, lies in dealing with the additional distinguished \((m+1)^{\text{st}}\) coordinate corresponding to the CNF \(G\) . Very roughly speaking, Reference [56] employs a careful coupling \((\widehat{\boldsymbol {u}},\widehat{\boldsymbol {z}})\) (whose existence is a consequence of the fact that bounded independence fools CNFs [3, 52]) to ensure that \(G(\widehat{\boldsymbol {u}})\) and \(G(\widehat{\boldsymbol {z}})\) almost always agree, and hence these \((m+1)^{\text{st}}\) coordinates “have a negligible effect” throughout Reference [19]’s Lindeberg-based proof of the regular case establishing closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) .
Because of the aforementioned structural fact (that an \(m\) -tuple of low-weight halfspaces is equivalent to “an \(m\) -tuple of regular halfspaces plus a CNF”), the low-weight case analyzed in Reference [56] did not require as significant a departure from Reference [19]’s approach, and from the Lindeberg method as a whole, as the general case that is the subject of this article. In particular, the new ideas discussed in Section 2.2 that are central to our proof were not present in Reference [56]’s analysis for the low-weight case. To elaborate on this,
•
Reference [56] did not have to exploit the product structure of Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b\) to fool it. Like Reference [19], the arguments of Reference [56] establish closeness in expectation between \(\Upsilon (A \boldsymbol {u}, G(\boldsymbol {u}))\) and \(\Upsilon (A \boldsymbol {z}, G(\boldsymbol {z}))\) for all smooth test functions \(\Upsilon\) , and the only properties of Bentkus’s mollifier that are used are the bounds on its derivatives given in Reference [5] (which are used in a black box way). The simpler setting of Reference [56] also did not necessitate comparing the Taylor expansions of \(\Upsilon\) around distinct points, as discussed in Section 2.2.
•
Reference [56] did not have to reason about Boolean anticoncentration, which, as discussed above, requires significant novel conceptual and technical work, including our new Littlewood–Offord theorem for polytopes. Like Reference [19], Reference [56] was able to apply Nazarov’s Gaussian anticoncentration bounds as a black box to pass from fooling Bentkus’s mollifier to closeness in multidimensional CDF distance.
3 Preliminaries
For convenience, in the rest of the article, we view halfspaces as having the domain \(\lbrace -1,1\rbrace ^n\) rather than \(\lbrace 0,1\rbrace ^n\) . We remind the reader that a halfspace \(F: \lbrace -1,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) is a function of the form \(F(x) = \mathds {1}[w \cdot x \le \theta ]\) for some \(w \in \mathds R^n\) , \(\theta \in \mathds R\) .
For an \(n\) -dimensional vector \(y\) and subset \(B \subseteq [n]\) , we write \(y_B\) to denote the \(|B|\) -dimensional vector obtained by restricting \(y\) to the coordinates in \(B\) . For an \(m \times n\) matrix \(A\) and subset \(B \subseteq [n]\) , we write \(A^B\) to denote the \(m\times |B|\) matrix obtained by restricting \(A\) to the columns in \(B\) . For indices \(i \in [m]\) and \(j \in [n]\) , we write \(A_i\) to denote the \(n\) -dimensional vector corresponding to the \(i\) th row of \(A\) , and \(A^j\) to denote the \(m\) -dimensional vector corresponding to the \(j\) th column of \(A\) .
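The restriction notation can be made concrete with a short Python sketch. The helper names below are hypothetical, matrices are represented as lists of rows, and indices are 0-based (the text uses 1-based indices in \([m]\) and \([n]\) ).

```python
def restrict_vector(y, B):
    """y_B: the entries of y at the coordinates in B, in increasing index order."""
    return [y[j] for j in sorted(B)]


def restrict_columns(A, B):
    """A^B: the m x |B| matrix keeping only the columns of A indexed by B."""
    cols = sorted(B)
    return [[row[j] for j in cols] for row in A]


def row_of(A, i):
    """A_i: the i-th row of A, an n-dimensional vector."""
    return list(A[i])


def column_of(A, j):
    """A^j: the j-th column of A, an m-dimensional vector."""
    return [row[j] for row in A]


A_example = [[1, 2, 3],
             [4, 5, 6]]
```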
3.1 Regularity, Orthants, and Taylor’s Theorem
Translated orthants and their boundaries. For \(b \in \mathds R^m\) , we write \(\mathcal {O}_b \subset \mathds R^m\) to denote the translated orthant
\[\mathcal {O}_b = \lbrace v \in \mathds R^m :v_i \le b_i \text{ for all $i\in [m]$}\rbrace .\]
We will overload notation and also write “ \(\mathcal {O}_b\) ” to denote the indicator \(\mathds R^m \rightarrow \lbrace 0,1\rbrace\) of the orthant \(\mathcal {O}_b\) (i.e., \(\mathcal {O}_b(v) = \mathds {1}[v \le b]\) ). We write \(\Game \mathcal {O}_{b} \subset \mathcal {O}_b\) to denote \(\mathcal {O}_b\) ’s surface,
\[\Game \mathcal {O}_b = \lbrace v \in \mathcal {O}_b :v_i = b_i \text{ for some $i\in [m]$}\rbrace .\]
For \(\Lambda \gt 0\) , we write \(\Game _{-\Lambda }\mathcal {O}_b\) and \(\Game _{+\Lambda } \mathcal {O}_b\) to denote the inner and outer \(\Lambda\) -boundaries of \(\mathcal {O}_b\) ,
\begin{equation} \Game _{-\Lambda }\mathcal {O}_b = \mathcal {O}_b \setminus \mathcal {O}_{b - (\Lambda , \ldots , \Lambda)}, \qquad \Game _{+\Lambda }\mathcal {O}_b = \mathcal {O}_{b + (\Lambda , \ldots , \Lambda)} \setminus \mathcal {O}_{b}, \tag{1} \end{equation}
and \(\Game _{\pm \Lambda }\mathcal {O}_b\) to denote the disjoint union \(\Game _{\pm \Lambda }\mathcal {O}_b = \Game _{+\Lambda }\mathcal {O}_b \sqcup \Game _{-\Lambda }\mathcal {O}_b\) .
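As an informal aid, the orthant and its boundaries can be written as membership tests. The sketch below uses the set-difference characterization \(\Game _{-\Lambda }\mathcal {O}_b = \mathcal {O}_b \setminus \mathcal {O}_{b - (\Lambda , \ldots , \Lambda)}\) and \(\Game _{+\Lambda }\mathcal {O}_b = \mathcal {O}_{b + (\Lambda , \ldots , \Lambda)} \setminus \mathcal {O}_{b}\) ; all function names are hypothetical.

```python
def in_orthant(v, b):
    """Indicator of the translated orthant O_b: v_i <= b_i for every i."""
    return all(vi <= bi for vi, bi in zip(v, b))


def shifted(b, t):
    """The vector b + (t, ..., t)."""
    return [bi + t for bi in b]


def on_surface(v, b):
    """v lies in O_b and meets at least one constraint with equality."""
    return in_orthant(v, b) and any(vi == bi for vi, bi in zip(v, b))


def in_inner_boundary(v, b, lam):
    """v lies in O_b but not in O_{b - (lam, ..., lam)}."""
    return in_orthant(v, b) and not in_orthant(v, shifted(b, -lam))


def in_outer_boundary(v, b, lam):
    """v lies in O_{b + (lam, ..., lam)} but not in O_b."""
    return in_orthant(v, shifted(b, lam)) and not in_orthant(v, b)
```

Since the inner boundary is contained in \(\mathcal {O}_b\) and the outer boundary is disjoint from it, no point can satisfy both tests, which is why the union in the text is a disjoint one.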
Derivatives and multidimensional Taylor expansion. We write \(\psi ^{(d)}\) to denote the \(d\) th derivative of a \(\mathcal {C}^d\) function \(\psi : \mathds R\rightarrow \mathds R.\) For an \(m\) -dimensional multi-index \(\alpha = (\alpha _1,\ldots ,\alpha _m) \in \mathds N^m\) , we write \(|\alpha |\) to denote \(\alpha _1 + \cdots + \alpha _m\) , and \(\alpha !\) to denote \(\alpha _1! \alpha _2! \cdots \alpha _m!\) . Given a vector \(\Delta \in \mathds R^m\) , the expression \(\Delta ^\alpha\) denotes \(\prod _{i=1}^m \Delta _i^{\alpha _i}\) . Given a function \(\Upsilon : \mathds R^m \rightarrow \mathds R\) , the expression \(\partial _\alpha \Upsilon\) denotes the mixed partial derivative taken \(\alpha _i\) times in the \(i\) th coordinate.
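The multi-index conventions above translate directly into code. The following sketch (with hypothetical helper names) computes \(|\alpha |\) , \(\alpha !\) , and \(\Delta ^\alpha\) for multi-indices represented as tuples of non-negative integers.

```python
from math import factorial, prod


def order(alpha):
    """|alpha| = alpha_1 + ... + alpha_m."""
    return sum(alpha)


def multi_factorial(alpha):
    """alpha! = alpha_1! * alpha_2! * ... * alpha_m!."""
    return prod(factorial(a) for a in alpha)


def multi_power(delta, alpha):
    """Delta^alpha = product over i of Delta_i ** alpha_i."""
    return prod(d ** a for d, a in zip(delta, alpha))
```

For example, with \(\alpha = (2,0,1)\) and \(\Delta = (3,5,2)\) we get \(|\alpha | = 3\) , \(\alpha ! = 2\) , and \(\Delta ^\alpha = 3^2 \cdot 5^0 \cdot 2^1 = 18\) .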
The following is a straightforward consequence of the multidimensional Taylor theorem, upper-bounding the error term by the \(L_1\) -norm of the derivatives times the \(L_\infty\) -norm of the offset-powers:
3.2 Pseudorandomness Preliminaries
Throughout this work we use boldface for random variables and random vectors. If \(\mathcal {D}\) is a probability distribution, then we write \(\boldsymbol {x}\sim \mathcal {D}\) to denote that \(\boldsymbol {x}\) is drawn from that distribution. For example, \(\mathcal {N}(0,1)\) will denote the standard normal distribution, so \(\boldsymbol {g}\sim \mathcal {N}(0,1)\) means \(\boldsymbol {g}\) is a standard Gaussian random variable. In case \(S\) is a finite set, the notation \(\boldsymbol {x}\sim S\) will mean that \(\boldsymbol {x}\) is chosen uniformly at random from \(S\) . The most common case for this will be \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) , meaning that \(\boldsymbol {u}\) is chosen uniformly from \(\lbrace -1,1\rbrace ^n\) . We will reserve \(\boldsymbol {u}\) for this specific random vector.
We recall the definition of a pseudorandom generator:
Bounded independence and hash families. A sequence of random variables \(\boldsymbol {x}_1, \ldots , \boldsymbol {x}_n\) is said to be \(r\) -wise independent if any collection of \(r\) of them is independent. In case the \(\boldsymbol {x}_i\) ’s are uniformly distributed on their range, we say the sequence is \(r\) -wise uniform. We will also use this terminology for distributions \(\mathcal {D}\) on \(\lbrace -1,1\rbrace ^n\) . An obvious but useful fact about \(r\) -wise uniform PRGs \(\mathscr{G}\) is that they 0-fool the class of degree- \(r\) polynomials \(\lbrace -1,1\rbrace ^n \rightarrow \mathds R\) .
A distribution \(\mathcal {H}\) on functions \([n] \rightarrow [L]\) is said to be an \(r\) -wise uniform hash family if, for \(\boldsymbol {h}\sim \mathcal {H}\) , the sequence \((\boldsymbol {h}(1), \ldots , \boldsymbol {h}(n))\) is \(r\) -wise uniform. Such a distribution also has the property that for any \(\ell \in [L]\) , the sequence \((\mathds {1}_{\boldsymbol {h}(1) = \ell }, \ldots , \mathds {1}_{\boldsymbol {h}(n) = \ell })\) is \(r\) -wise independent on \(\lbrace 0,1\rbrace ^n\) , with each individual random variable being Bernoulli \((1/L)\) . Well-known constructions (see, e.g., Section 3.5.5 of Reference [63]) give that for every \(n,L\) and \(r\) , there is an \(r\) -wise uniform hash family \(\mathcal {H}\) of functions \([n] \rightarrow [L]\) such that choosing a random function from \(\mathcal {H}\) takes \(O(r \log (nL))\) random bits (and evaluating a function from \(\mathcal {H}\) takes time \(\mathrm{poly}(r, \log n, \log L)\) ), and consequently there are known efficient constructions of \(r\) -wise uniform distributions over \(\lbrace 0,1\rbrace ^n\) with seed length \(O(r \log n).\)
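To illustrate the flavor of such constructions, the sketch below checks a textbook example by exhaustive enumeration: a uniformly random polynomial of degree less than \(r\) over a prime field \(\mathbb {F}_p\) , evaluated at distinct points. This is an assumption made for illustration; it is not claimed to be the exact construction of Reference [63], and its range is \(\mathbb {F}_p\) rather than a general \([L]\) . Sampling the \(r\) coefficients costs about \(r \log p\) random bits.

```python
from collections import Counter
from itertools import product


def poly_hash(coeffs, x, p):
    """Evaluate c_0 + c_1 x + ... + c_{r-1} x^{r-1} over F_p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def joint_counts(points, r, p):
    """Count, over all p**r coefficient vectors, how often each value tuple occurs."""
    counts = Counter()
    for coeffs in product(range(p), repeat=r):
        counts[tuple(poly_hash(coeffs, x, p) for x in points)] += 1
    return counts


# For r distinct points, every tuple in F_p^r should occur exactly once,
# i.e., the r hash values are jointly uniform.
example_counts = joint_counts((0, 1, 3), r=3, p=5)
```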
Fooling CNFs. Gopalan, Meka, and Reingold [17] have given an efficient explicit PRG that fools the class of small-width CNFs:
4 Our PRG
The Meka–Zuckerman generator. As stated earlier, the PRG that we will analyze is a slight variant of a PRG first proposed by Meka and Zuckerman for fooling a single halfspace [39]. We begin by recalling the Meka–Zuckerman PRG.
In words, an \(r_\mathrm{hash}\) -wise uniform hash \(\boldsymbol {h}\) is used to partition the variables \(x_1,\ldots ,x_n\) into \(L\) “buckets,” and then independently across buckets, the variables in each bucket are assigned according to an \(r_\mathrm{bucket}\) -wise uniform distribution.
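The bucketing structure just described can be sketched as follows. This is a structural illustration only: the hash and the per-bucket assignments below are drawn with full randomness as stand-ins for the \(r_\mathrm{hash}\) -wise uniform hash and the \(r_\mathrm{bucket}\) -wise uniform distributions, so the sketch does not achieve a short seed. The function name is hypothetical.

```python
import random


def bucketed_sample(n, L, rng):
    """Hash n variables into L buckets, then fill each bucket independently."""
    # Stand-in for an r_hash-wise uniform hash h : [n] -> [L] (fully random here).
    h = [rng.randrange(L) for _ in range(n)]
    x = [0] * n
    for ell in range(L):
        bucket = [j for j in range(n) if h[j] == ell]
        # Stand-in for an r_bucket-wise uniform assignment to this bucket.
        bits = [rng.choice((-1, 1)) for _ in bucket]
        for j, bit in zip(bucket, bits):
            x[j] = bit
    return h, x


h_example, x_example = bucketed_sample(n=10, L=3, rng=random.Random(0))
```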
We note in passing that the generators of References [19, 56] also have this structure (though the choices of the parameters \(L, r_\mathrm{bucket}\) , and \(r_\mathrm{hash}\) differ from those in Reference [39]).
Our generator. Now we are ready to describe our generator and bound its seed length. Roughly speaking, our generator extends the Meka–Zuckerman generator by (i) additionally Xor-ing each bucket with an independent pseudorandom variable that fools CNF formulas; and (ii) globally Xor-ing the entire resulting \(n\) -bit string with an independent draw from a \(2k\) -wise uniform distribution.
Recalling the standard constructions of \(r\) -wise uniform hash functions and random variables described at the end of Section 3, we have the following:
4.1 Setting of Parameters
We close this section with the parameter settings for fooling intersections of \(m\) halfspaces over \(\lbrace -1,1\rbrace ^n\) . Fix \(\varepsilon \in (0,1)\) to be an arbitrarily small absolute constant; the parameters we now specify will be for fooling to accuracy \(O_\varepsilon (\delta) = O(\delta)\) . We first define a few auxiliary parameters:
\begin{align*} \lambda &= \frac{\delta }{\sqrt {\log (m/\delta) \log m}} \qquad \qquad \qquad {\rm {(Dictated by Equation~(31))}} \\ \tau &= \frac{\delta ^{1+\varepsilon }}{(\log m)^{2.5+2\varepsilon }} \qquad \qquad \qquad {\rm {(Dictated by Equation~(30))}} \\ d &= \text{constant depending only on $\varepsilon $.} \qquad \qquad \qquad {\rm {(Dictated by Equation~(30))}} \end{align*}
The precise value of \(d =d(\varepsilon)\) will be specified in the proof of Theorem 8.1. We will instantiate our generator \(\mathscr{G} = \mathscr{G}(L,r_\mathrm{hash},r_\mathrm{bucket},k,w,\delta _\mathrm{CNF})\) with parameters:
\begin{align*} L &= \frac{(\log m)^5}{\delta ^{2+\varepsilon }} \qquad \qquad \qquad {\rm {(Constrained by Equation~(30),}} \\ & \qquad \qquad \qquad {\rm chosen to optimize seed length)} \end{align*}
\begin{align*} r_\mathrm{hash}&= C_1 \log (Lm/\delta) \qquad \qquad \qquad {\rm {(Dictated by Proposition~8.11)}} \\ r_\mathrm{bucket}&= \log (m/\delta) \qquad \qquad \qquad {\rm {(Dictated by Lemma~8.10)}} \\ k &= \frac{C_2 \log (m/\delta)\log \log (m/\delta)}{\tau ^2} \qquad \qquad \qquad {\rm {(Dictated by Theorem~5.1)}} \\ w &= \frac{2k}{L} \qquad \qquad \qquad {\rm {(Dictated by Proposition~8.11)}} \\ \delta _\mathrm{CNF}&= \frac{\delta }{L}\cdot \left(\frac{\lambda }{m\sqrt {n}}\right)^{d-1}, \qquad \qquad \qquad {\rm {(Dictated by Equation~(30))}} \end{align*}
where \(C_1\) and \(C_2\) are absolute constants specified in the proofs of Proposition 8.11 and Theorem 5.1, respectively.
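To get a feel for how these quantities scale, the formulas above can be evaluated numerically. The sketch below is a rough calculator under stated assumptions: the constants \(C_1\) , \(C_2\) , and \(d\) are not specified in this section, so the defaults `C1=1.0`, `C2=1.0`, and `d=3` are placeholders, and logarithms are taken to be natural. The resulting values are real numbers; no rounding to integers is attempted.

```python
from math import log, sqrt


def prg_parameters(m, n, delta, eps, C1=1.0, C2=1.0, d=3):
    """Evaluate the parameter formulas; C1, C2, d are placeholder assumptions."""
    lam = delta / sqrt(log(m / delta) * log(m))
    tau = delta ** (1 + eps) / log(m) ** (2.5 + 2 * eps)
    L = log(m) ** 5 / delta ** (2 + eps)
    r_hash = C1 * log(L * m / delta)
    r_bucket = log(m / delta)
    k = C2 * log(m / delta) * log(log(m / delta)) / tau ** 2
    w = 2 * k / L
    delta_cnf = (delta / L) * (lam / (m * sqrt(n))) ** (d - 1)
    return {
        "lam": lam, "tau": tau, "L": L, "r_hash": r_hash,
        "r_bucket": r_bucket, "k": k, "w": w, "delta_cnf": delta_cnf,
    }


example_params = prg_parameters(m=16, n=1000, delta=0.1, eps=0.5)
```

The formulas implicitly require \(m \ge 2\) (so that \(\log m \gt 0\) ) and \(m/\delta\) large enough that \(\log \log (m/\delta)\) is positive.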
Our seed length. By Fact 4.3, our overall seed length is
\begin{equation} \mathrm{polylog}(m) \cdot \delta ^{-{(2+\varepsilon)}}\cdot \log n \tag{2} \end{equation}
for any absolute constant \(\varepsilon \in (0,1)\) .
5 Reduction to Standardized Polytopes
5.1 A Reduction from Fooling Polytopes to Fooling Standardized Polytopes
In this section, we reduce from the problem of fooling general \(m\) -facet polytopes to the problem of fooling \(m\) -facet \((k,\tau)\) -standardized polytopes (Definition 3.1). The main technical result we prove in this section is the following:
We stress that Lemma 5.1 establishes that \(\mathds {1}[Ax \le b]\) is well-approximated by \(\mathds {1}[A^{\prime }x \le b^{\prime }]\) under both the uniform distribution and the pseudorandom distribution constructed by our generator, since both of these distributions are \(2k\) -wise uniform. (Note that a draw \(\boldsymbol {z}= \breve{\boldsymbol {y}} \oplus \boldsymbol {y}^\star\) from our generator is indeed \(2k\) -wise uniform, since \(\boldsymbol {y}^\star\) is; indeed, Lemma 5.1 is the motivation for why our construction includes a bitwise-Xor with \(\boldsymbol {y}^\star\) .) This is crucial: In general, given a function \(F\) and an approximator \(F^{\prime }\) that is close to \(F\) only under the uniform distribution (i.e., \({{\bf Pr}}[F(\boldsymbol {u}) \ne F^{\prime }(\boldsymbol {u})]\) is small), fooling \(F^{\prime }\) does not suffice to fool \(F\) itself.
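The fact that Xor-ing with an independent \(r\) -wise uniform string yields an \(r\) -wise uniform string can be verified exhaustively on a toy example. The sketch below works over \(\lbrace 0,1\rbrace ^3\) (rather than \(\lbrace -1,1\rbrace ^n\) ) with exact rational arithmetic; the two example distributions and the helper names are hypothetical choices for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product


def xor_distribution(dist_a, dist_b):
    """Distribution of a XOR b for independent a ~ dist_a and b ~ dist_b."""
    out = defaultdict(Fraction)
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[tuple(ai ^ bi for ai, bi in zip(a, b))] += pa * pb
    return dict(out)


def is_r_wise_uniform(dist, r):
    """Check that every r coordinates are jointly uniform on {0,1}^r."""
    n = len(next(iter(dist)))
    for coords in combinations(range(n), r):
        marginal = defaultdict(Fraction)
        for x, p in dist.items():
            marginal[tuple(x[j] for j in coords)] += p
        for pattern in product((0, 1), repeat=r):
            if marginal[pattern] != Fraction(1, 2 ** r):
                return False
    return True


# Uniform over even-parity strings: 2-wise uniform but not 3-wise uniform.
even_parity = {x: Fraction(1, 4) for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}
# An arbitrary, far-from-uniform distribution.
skewed = {(1, 1, 1): Fraction(3, 4), (0, 1, 0): Fraction(1, 4)}
mixed = xor_distribution(skewed, even_parity)
```

Here `mixed` is 2-wise uniform even though `skewed` is not, mirroring the role of \(\boldsymbol {y}^\star\) in the text.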
Given Lemma 5.1, to prove Theorem 1.1 it is sufficient to prove the following:
The rest of the article is devoted to proving Theorem 5.3.
6 Bentkus’s Mollifier and Its Properties
In this section we introduce and analyze Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b: \mathds R^m \rightarrow (0,1)\) , which is a smoothed version of the translated orthant indicator function \(\mathcal {O}_b: \mathds R^m \rightarrow \lbrace 0,1\rbrace\) from Section 3.1.
Since \(\mathcal {O}_b(v) = \prod _{i=1}^m \mathds {1}[v_i \le b_i]\) and \(\mathcal {N}(0,1)^m\) is a product distribution, the mollifier \(\widetilde{\mathcal {O}}_{b,\lambda }\) can be equivalently defined as follows:
This product structure of Bentkus’s mollifier will be crucially important for us in the analysis that we carry out in Section 8.1. We note the following translation property of Bentkus’s mollifier:
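The product structure and the translation behavior can be checked numerically. The sketch below assumes that the mollifier is the Gaussian smoothing \(\widetilde{\mathcal {O}}_{b,\lambda }(v) = {{\bf E}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)^m}[\mathcal {O}_b(v + \lambda \boldsymbol {g})]\) , which is our reading of the discussion above; under that assumption each factor is a standard normal CDF value. The function names are hypothetical.

```python
from math import erf, prod, sqrt


def std_normal_cdf(t):
    """Phi(t) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def mollified_halfline(t, theta, lam):
    """Pr over g ~ N(0,1) that t + lam * g <= theta (assumed one-dimensional factor)."""
    return std_normal_cdf((theta - t) / lam)


def mollified_orthant(v, b, lam):
    """Product over coordinates of the one-dimensional mollified halflines."""
    return prod(mollified_halfline(vi, bi, lam) for vi, bi in zip(v, b))
```

Under this assumption, shifting the argument by \(t\) has the same effect as shifting \(b\) by \(-t\) , and every value lies strictly between 0 and 1 for moderate inputs.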
In Section 8.1 we will also use the following global bound on the magnitude of the derivatives of the Gaussian-mollified halfline:
The following result, from Bentkus [5, Theorem 3(ii)], can be viewed as a multidimensional generalization of Fact 6.4. (Strictly speaking, Reference [5] only considers \(b\) ’s of the form \((\theta , \theta , \ldots , \theta)\) , but by translation-invariance the bound holds for all \(b \in \mathds R^m.\) )
Recall from (1) that \(\Game _{-\Lambda }\mathcal {O}_b = \mathcal {O}_b \setminus \mathcal {O}_{b - (\Lambda , \ldots , \Lambda)}\) and \(\Game _{+\Lambda }\mathcal {O}_b = \mathcal {O}_{b + (\Lambda , \ldots , \Lambda)} \setminus \mathcal {O}_{b}.\) We will use the following notions of approximation for translated orthants:
The connection between Bentkus’s mollifier and these notions of approximation is established in the following claim:
6.1 The Connection between Inner/outer Approximators and CDF Distance
The following elementary properties of inner/outer approximators will be useful for us:
The next lemma is straightforward but very useful for us. Intuitively, it says that for an \(\mathds R^m\) -valued random variable \(\tilde{\boldsymbol {v}}\) to fool a translated orthant \(\mathcal {O}_b\) relative to another \(\mathds R^m\) -valued random variable \(\boldsymbol {v}\) , it suffices to (i) have \(\tilde{\boldsymbol {v}}\) fool both inner and outer approximators for \(\mathcal {O}_b\) , and (ii) establish anticoncentration of the original random variable \(\boldsymbol {v}\) at the inner and outer boundaries of \(\mathcal {O}_b\) . We explain in detail how we will use this lemma after giving its proof below.
6.1.1 Applying Lemma 6.9 in the context of Theorem 5.3, and the organization of the rest of this article.
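Applying Lemma 6.9 with \(\boldsymbol {v}\) and \(\tilde{\boldsymbol {v}}\) being \(A\boldsymbol {u}\) and \(A \boldsymbol {z}\) , respectively, the task of bounding \(|{{\bf Pr}}[A\boldsymbol {u}\in \mathcal {O}_b] - {{\bf Pr}}[A\boldsymbol {z}\in \mathcal {O}_b]|\) reduces to the following two-step program:
•
Boolean anticoncentration within orthant boundaries: bounding the probability that \(A\boldsymbol {u}\) falls within the inner and outer boundaries of \(\mathcal {O}_b\) .
•
Fooling Bentkus’s mollifier: bounding \(|{{\bf E}}[\widetilde{\mathcal {O}}(A\boldsymbol {u})] - {{\bf E}}[\widetilde{\mathcal {O}}(A\boldsymbol {z})]|\) for \(\widetilde{\mathcal {O}} \in \lbrace \widetilde{\mathcal {O}}_{b^{{\mathrm{out}}},\lambda },\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }\rbrace\) , the inner and outer approximators for \(\mathcal {O}_b\) given by Lemma 6.7.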
Section 7 is devoted to the former and Section 8 the latter. In Section 9, we put these pieces together to prove Theorem 5.3.
7 Boolean Anticoncentration Within Orthant Boundaries
The main result of this section is Theorem 7.1, which provides the first step of the two-step program described at the end of Section 6:
En route to proving Theorem 7.1, we will establish a “Littlewood–Offord theorem for polytopes,” Theorem 2.2, that was stated in Section 2.2.1. Theorem 2.2 will in fact be obtained as a special case of a more general result about intersections of \(m\) arbitrary unate functions (namely, Lemma 7.13).
Our analysis, dealing as it does with intersections of unate functions, is somewhat reminiscent of that of Reference [26], and indeed we will establish the main result of Reference [26]—an upper bound of \(O(\sqrt {n \log m})\) on the average sensitivity of any intersection of \(m\) unate functions—in the course of our analysis.
7.1 Caps and Their Boundary Edges
Let \(G\) and \(H\) be subsets of \(\lbrace -1,1\rbrace ^n\) . We typically think of \(G\) as a General/arbitrary set and \(H\) as a Halfspace, though formally \(H\) will only need to be unate. Throughout this section, we write \(\sigma \in \lbrace -1,1\rbrace ^n\) to denote the orientation of \(H\) .
We call the set \(G \setminus H\) the cap, the set \(G\cap H\) the body, and the complement of \(G\) the exterior. Please refer to Figure 1, where \(G\) is the union of the two regions with blue shading and \(H\) is the gray-shaded region (depicted as a halfspace in the figure). The upward arrows in the diagram illustrate some edges of the hypercube. We have oriented these edges according to \(\sigma\) : For an edge \(\lbrace x,y\rbrace\) in the \(j\) th direction in which \(x_j = -1\) and \(y_j = 1\) , the tail of the corresponding arrow represents \(x\) if \(\sigma _j = -1\) and \(y\) if \(\sigma _j = 1\) . Note in particular that the edges are oriented “away” from \(H\) (i.e., so that \(H\) is antimonotone with respect to the edge orientations).
Fig. 1.
We will be concerned with the boundary edges for the cap \(G \setminus H\) ; these are edges that have one endpoint inside \(G \setminus H\) and one endpoint outside it.
We distinguish the three possible types of boundary edges of the cap \(G \setminus H\) :
•
Body \(\rightarrow\) Cap (BC) edges: the red edges in the diagram. Formally, these are edges where the tail is in the body \(G\cap H\) and the head is in the cap \(G\setminus H\) .
•
Exterior \(\rightarrow\) Cap (EC) edges: the green edges in the diagram. Formally, these are edges where the tail is not in \(G\) , and the head is in the cap \(G\setminus H\) .
•
Cap \(\rightarrow\) Exterior (CE) edges: the purple edges in the diagram. Formally, these are edges where the tail is in the cap \(G\setminus H\) and the head is not in \(G\) .
Given a cap \(C = G \setminus H\) , we write \(\mathrm{BC}(G,H)\) , \(\mathrm{EC}(G,H)\) , \(\mathrm{CE}(G,H)\) for the fraction of hypercube edges of each of the three above types. Therefore, \({\mathcal {E}}(C) = \mathrm{BC}(G,H) + \mathrm{EC}(G,H) + \mathrm{CE}(G,H)\) .
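The edge classification above can be reproduced by brute force on a small hypercube. In the sketch below, each edge in direction \(j\) is visited from its tail, i.e., the endpoint whose \(j\) th coordinate equals \(\sigma _j\) . The extra counter `CB` (Cap \(\rightarrow\) Body) is a hypothetical sanity check: it should be zero whenever \(H\) is antimonotone with respect to the edge orientations, which is why only three types appear in the text. The example sets are arbitrary choices for illustration.

```python
from itertools import product


def classify_cap_edges(n, in_G, in_H, sigma):
    """Fractions of hypercube edges of each boundary type for the cap G \\ H."""
    counts = {"BC": 0, "EC": 0, "CE": 0, "CB": 0}
    for x in product((-1, 1), repeat=n):
        for j in range(n):
            if x[j] != sigma[j]:
                continue  # x is the head of this edge; it is visited from its tail
            tail = x
            head = x[:j] + (-x[j],) + x[j + 1:]
            tail_in_cap = in_G(tail) and not in_H(tail)
            head_in_cap = in_G(head) and not in_H(head)
            if tail_in_cap == head_in_cap:
                continue  # not a boundary edge of the cap
            if head_in_cap:
                counts["BC" if in_G(tail) else "EC"] += 1
            else:
                counts["CE" if not in_G(head) else "CB"] += 1
    total_edges = n * 2 ** (n - 1)
    return {name: count / total_edges for name, count in counts.items()}


# Example: H is the halfspace {x : w.x <= 0}; orienting edges with
# sigma_j = -sign(w_j) makes w.x increase from tail to head.
w_example = (1, -2, 1, 1)
sigma_example = tuple(-1 if wi > 0 else 1 for wi in w_example)
fractions_example = classify_cap_edges(
    4,
    in_G=lambda x: x[0] * x[1] == 1 or x[2] == 1,
    in_H=lambda x: sum(wi * xi for wi, xi in zip(w_example, x)) <= 0,
    sigma=sigma_example,
)
```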
We will also be interested in the directed edge boundary of caps:
It will be very useful for us to have an upper bound on \({\mathcal {E}}(G \cap H) - {\mathcal {E}}(G)\) , the change in \({\mathcal {E}}(G)\) when we intersect \(G\) with \(H\) (note that this quantity can be either positive or negative). The following fact is immediate from the definitions:
Comparing Equations (10) and (9), we plainly have:
To get a quantitative bound, we have the following lemma:
7.1.1 Reproving the Main Result of Reference [26].
We can now reprove the main result of Reference [26] (which we will use later):
7.2 A Littlewood–Offord Theorem for Polytopes (Theorem 2.2)
We note in passing that the anticoncentration bound given by Theorem 2.2 is best possible up to constant factors. Indeed, our matching lower bound applies even to the stricter event of falling on the surface of \(\mathcal {O}_b\) :
As mentioned at the beginning of this section, we will obtain Theorem 2.2 as a corollary of a more general result about intersections of unate functions. Let \(H_1,\ldots ,H_m \subseteq \lbrace -1,1\rbrace ^n\) be unate sets, \(m \ge 2\) , and further suppose that we have additional unate sets \(\overline{H}_1, \ldots , \overline{H}_m\) such that \(H_i \subseteq \overline{H}_i\) for all \(i\) . (For intuition, it may be helpful to think of \(H_i\) as the “interior” of \(\overline{H}_i\) ; see the proof of Theorem 2.2 using Lemma 7.13 just below for a typical example of sets \(H_i\) and \(\overline{H}_i\) .) We define the following subsets of \(\lbrace -1,1\rbrace ^n\) :
\begin{align*} F &= \overline{H}_1 \cap \cdots \cap \overline{H}_m \\ F^\circ &= H_1 \cap \cdots \cap H_m \qquad \qquad \qquad {\rm {interior of F}}\\ \partial F &= F \setminus F^\circ \qquad \qquad \qquad {\rm {boundary of F}}\\ F^c &= \lbrace -1,1\rbrace ^n\setminus F \qquad \qquad \qquad {\rm {exterior of F}}\\ \partial H_i &= \overline{H}_i \setminus H_i \text{ (for each $i \in [m]$).} \qquad \qquad \qquad {\rm {boundary of \overline{H}_i}} \end{align*}
The rest of this section will be devoted to the proof of Lemma 7.13. Recalling that \(F^\circ\) is called the interior of \(F\) and \(\partial F\) is called the boundary of \(F\) , we say that an edge in the hypercube is boundary-to-interior if it has one endpoint in \(\partial F\) and the other endpoint in \(F^\circ\) , and we write \(\nu _{BI}\) for the fraction of all edges that are of this type. We similarly define boundary-to-exterior edges and \(\nu _{BE}\) , with \(F^c\) in place of \(F^\circ\) . Note that every boundary-to-interior edge is a boundary edge for \(F^\circ = H_1 \cap \cdots \cap H_m\) , which is an intersection of \(m\) unate sets. By applying Theorem 7.9 to \(F^\circ\) , we get that
Similarly, every boundary-to-exterior edge is a boundary edge for \(F = \overline{H}_1 \cap \cdots \cap \overline{H}_m\) ; applying Theorem 7.9 to this intersection yields
Next, we bound the fraction of edges that have both endpoints in \(\partial F\) and go between “two different parts” of \(\partial F\) . More precisely, for \(x \in \partial F\) , define \(i^\star (x)\) to be the least \(i\) for which \(x \in \partial H_i\) (equivalently, the least \(i\) for which \(x \not\in H_i\) ). We say that an edge \({\lbrace x,y\rbrace }\) is boundary-to-boundary \(^{\prime }\) if \(x, y \in \partial F\) but \(i^\star (x) \ne i^\star (y)\) ; we write \(\nu _{BB^{\prime }}\) for the fraction of such edges.
Thus, Lemma 7.13 follows from Equations (14) and (15) and the following claim:
This completes the proof of Lemma 7.13 and hence Theorem 2.2.
7.3 A Robust Generalization of the Littlewood–Offord Theorem for Polytopes
In the previous section, we proved Theorem 2.2, which establishes anticoncentration of \(A\boldsymbol {u}\) under the assumption that all entries of \(A\) have magnitude at least 1. The goal of this section is to prove the following robust generalization of Theorem 2.2:
Recall that Theorem 2.2 followed as an easy consequence of the fact that \(\mathrm{vol}(\partial F) \le {\frac{5 \sqrt {2\log m}}{\sqrt {n}}}\) when all \(\partial H_i\) ’s are “thin” (Lemma 7.13). We slightly generalize this notion here.
Theorem 7.16 follows as a direct consequence of the following lemma (by the same reasoning that derives Theorem 2.2 as a corollary of Lemma 7.13):
In this section, we prove Theorem 7.1 using Lemma 7.19 established in the previous section. In more detail, we use a bound on the anticoncentration of \(A\boldsymbol {u}\) under the assumption that at least an \(\alpha\) fraction of entries of each row of \(A\) have magnitude at least \(\tau\) (given by Lemma 7.19) to establish a bound on the anticoncentration of \(A\boldsymbol {u}\) under the assumption that each of \(A\) ’s rows has a \(\tau\) -regular subvector of 2-norm 1 (Theorem 7.1).
The following result regarding \(\tau\) -regular linear forms is fairly standard:
Recall the following fact, which can also be easily proven using Paley–Zygmund (see, e.g., Proposition 3.7 of the full version of Reference [18]):
We combine these as follows:
We take \(B = \lfloor 1/\tau ^2 \rfloor\) in the above. This yields the following:
We can now prove Theorem 7.1, which we restate here for convenience:
8 Fooling Bentkus’s Mollifier
The main result of this section is the following theorem, which provides the second step of the two-step program described at the end of Section 6:
At a very high level, in line with the usual Lindeberg approach, Theorem 8.1 is proved by hybridizing between \(\boldsymbol {u}\) and \(\boldsymbol {z}\) via a sequence of intermediate distributions. In our setting there are \(L+1\) such distributions, the first of which is \(\boldsymbol {u}\) and the last of which is \(\boldsymbol {z}\) , and the \(\ell\) th of which may be viewed as “filling in buckets \(\ell ,\ldots ,L\) according to \(\boldsymbol {u}\) and filling in buckets \(1,\ldots ,\ell -1\) according to \(\boldsymbol {z}\) ,” where the \(L\) buckets correspond to the partition of \([n]\) induced by the choice of the random hash function in the Meka–Zuckerman generator.
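The sequence of hybrids can be sketched in a few lines. In the illustration below, `h[j]` is the bucket (in \(1,\ldots ,L\) ) of coordinate \(j\) , and `u`, `z` are fixed strings standing in for draws of \(\boldsymbol {u}\) and \(\boldsymbol {z}\) ; the function name and example data are hypothetical.

```python
def hybrid(u, z, h, ell):
    """ell-th hybrid (ell in 1..L+1): buckets ell..L come from u, buckets 1..ell-1 from z."""
    return [z[j] if h[j] < ell else u[j] for j in range(len(u))]


u_example = [1, 1, 1, 1, 1, 1]
z_example = [-1, -1, -1, -1, -1, -1]
h_example = [1, 3, 2, 1, 3, 2]  # bucket of each coordinate, with L = 3
```

The first hybrid (`ell = 1`) is `u`, the last (`ell = L + 1`) is `z`, and consecutive hybrids `ell` and `ell + 1` differ only on the coordinates in bucket `ell`, which is the single swap analyzed in each step of the argument.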
In Section 8.1, we upper bound the error incurred by taking a single step through this sequence of hybrid distributions. The upper bound given there (see Lemma 8.3) has a first component corresponding to the terms of order \(0,\ldots ,d-1\) in a \((d-1)\) -st order Taylor expansion, and a second component corresponding to the error term in Taylor’s theorem. The first component is upper bounded in Section 8.1, and the second component is upper bounded in Section 8.2. Section 8.3 formalizes the hybrid argument and uses the results of these earlier subsections to establish Theorem 8.1.
8.1 Single Swap in the Hybrid Argument
As we will see later, Equation (21) is a useful bound because we can (and will) take \(\delta _{\mathrm{CNF}}\) to be very small, and when we apply Lemma 8.3, we will be able to ensure that both expectations on the right-hand side of Equation (21) are small as well.
The main ingredient in the proof of Lemma 8.3 is the following claim:
Before proving Claim 8.4, we observe that Lemma 8.3 follows as a consequence:
In this subsection, we put together the two main results of the two previous subsections (Lemma 8.3 and Lemma 8.10) to prove Theorem 8.1.
Recalling Remark 8.2, we can write \(A\) as \(H+T\) , where every row of \(H\) is \(k\) -sparse and every row of \(T\) is \(\tau\) -regular with 2-norm 1. Let us say that a hash \(h : [n] \rightarrow [L]\) is \(H\) -good if
\begin{equation} | h^{-1}(\ell) \cap \mathrm{supp}(H_i)| \le w \coloneqq {\frac{2k}{L}} \tag{25} \end{equation}
for all buckets \(\ell \in [L]\) and rows \(i\in [m]\) . Equivalently, for all \(\ell \in [L]\) , every row of the submatrix \(H^{h^{-1}(\ell)}\) is \(w\) -sparse.
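The \(H\) -goodness condition is a simple load-balancing check, sketched below. The matrix \(H\) is given as a list of rows, the hash as a list with `h[j]` in `range(L)` (0-based buckets, unlike the 1-based \([L]\) of the text), and the function name is hypothetical.

```python
def is_H_good(h, H, L, w):
    """Check that every bucket contains at most w support coordinates of every row of H."""
    for row in H:
        load = [0] * L
        for j, entry in enumerate(row):
            if entry != 0:
                load[h[j]] += 1  # coordinate j of supp(H_i) lands in bucket h[j]
        if max(load) > w:
            return False
    return True


H_example = [[1, 0, 2, 0],
             [0, 3, 0, 4]]
```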
We are now ready to prove Theorem 8.1, which we restate here for convenience:
Having completed both steps of the two-step program described at the end of Section 6, we are finally ready to prove Theorem 5.3, which we restate here for convenience:
Acknowledgments
Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Footnotes
1
Their seed length improves to \(O(m \log (m/\delta) + \log n)\) if \(m/\delta\) is bounded by any \(\mathrm{polylog}(n)\) .
2
Equivalently, \(A\) is \((n, \tau)\) -standardized.
3
Given a halfspace \(\mathds {1}[w \cdot x \le \theta ]\) , there is a smallest value \(\theta ^{\prime } \gt \theta\) achievable as \(w \cdot x\) for \(x \in \lbrace -1,1\rbrace ^n\) ; first perturb \(\theta\) upward to \((\theta + \theta ^{\prime })/2\) . Now no input \(x\) achieves \(w\cdot x = \theta\) exactly, so we can perturb the coefficients of \(w\) by sufficiently small amounts.
A
In this section, we prove Theorem 5.1. The proof uses the “critical index” theory for Boolean halfspaces, introduced in Reference [55] and used in several subsequent works on halfspaces.
Given \(A\) as in Theorem 5.1, the rows that are already \((k,\tau)\) -regular pose no difficulty as a simple rescaling of any such row (and the corresponding entry of \(b\) ) makes it \((k,\tau)\) -standardized. The remaining rows \(A_i\) have \(\tau\) -critical index exceeding \(k\) . The critical index theory [49, 55] says that such halfspaces \(\mathds {1}[A_i x \le b_i]\) are very close to \(k\) -juntas, and in fact Reference [9] shows that this is true even under \((k+2)\) -wise uniform distributions (for a slightly larger choice of \(k\) as alluded to in Remark 5.2). We tweak the quantitative aspects of these arguments below to work for the choice of \(k\) given in Equation (3). It will be convenient to follow the treatment in Reference [18].
The first lemma below says that if the “head” variables are set uniformly, then the resulting random variable has good anticoncentration at the scale of the two-norm of the tail:
Vidmantas Bentkus. 1990. Smooth approximations of the norm and differentiable functions with bounded support in Banach space \(l^k_\infty\) . Lithuan. Math. J. 30, 3 (1990), 223–230.
Avrim Blum and Ravi Kannan. 1997. Learning an intersection of a constant number of halfspaces under a uniform distribution. J. Comput. Syst. Sci. 54, 2 (1997), 371–380.
Ilias Diakonikolas, Daniel Kane, and Jelani Nelson. 2010. Bounded independence fools degree-2 threshold functions. In Proceedings of the 51st Annual Symposium on Foundations of Computer Science (FOCS). 11–20.
Devdatt Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge.
Parikshit Gopalan, Daniel Kane, and Raghu Meka. 2015. Pseudorandomness via the discrete Fourier transform. In Proceedings of the 56th Annual Symposium on Foundations of Computer Science (FOCS). 903–922.
Parikshit Gopalan, Adam Klivans, and Raghu Meka. 2012. Learning functions of halfspaces using prefix covers. In Proceedings of the 25th Annual Conference on Learning Theory (COLT).
Parikshit Gopalan, Ryan O’Donnell, Yi Wu, and David Zuckerman. 2010. Fooling functions of halfspaces under product distributions. In Proceedings of the 25th Annual Conference on Computational Complexity (CCC). 223–234. Retrieved from https://arxiv.org/abs/1001.1593.
Prahladh Harsha, Adam R. Klivans, and Raghu Meka. 2012. An invariance principle for polytopes. J. ACM 59, 6 (2012), 29:1–29:25. DOI: https://doi.org/10.1145/2395116.2395118
Russell Impagliazzo, Cristopher Moore, and Alexander Russell. 2014. An entropic proof of Chang’s inequality. SIAM J. Discrete Math. 28, 1 (2014), 173–176.
Russell Impagliazzo, Noam Nisan, and Avi Wigderson. 1994. Pseudorandomness for network algorithms. In Proceedings of the 26th Annual Symposium on Theory of Computing (STOC). 356–364.
Valentine Kabanets, Sajin Koroth, Zhenjian Lu, Dimitrios Myrisiotis, and Igor Oliveira. 2020. Algorithms and lower bounds for de Morgan formulas of low-communication leaf gates. In Proceedings of the 35th Computational Complexity Conference (CCC’20) (LIPIcs), Vol. 169. 15:1–15:41.
Daniel Kane. 2011. \(k\) -Independent Gaussians fool polynomial threshold functions. In Proceedings of the 26th IEEE Conference on Computational Complexity (CCC). 252–261.
Daniel Kane. 2011. A small PRG for polynomial threshold functions of Gaussians. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS). 257–266.
Daniel Kane. 2014. The average sensitivity of an intersection of halfspaces. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC). 437–440.
Daniel Kane. 2014. A pseudorandom generator for polynomial threshold functions of Gaussians with subpolynomial seed length. In Proceedings of the 29th Annual Conference on Computational Complexity (CCC). 217–228.
Daniel Kane and Sankeerth Rao. 2018. A PRG for Boolean PTF of degree 2 with seed length subpolynomial in \(\varepsilon\) and logarithmic in \(n\) . In Proceedings of the 33rd Computational Complexity Conference (CCC). 2:1–2:24.
Adam Klivans, Ryan O’Donnell, and Rocco A. Servedio. 2004. Learning intersections and thresholds of halfspaces. J. Comput. Syst. Sci. 68, 4 (2004), 808–840.
Adam Klivans, Ryan O’Donnell, and Rocco A. Servedio. 2008. Learning geometric concepts via Gaussian surface area. In Proceedings of the 49th Symposium on Foundations of Computer Science (FOCS). 541–550.
Adam Klivans and Alexander Sherstov. 2006. Cryptographic hardness for learning intersections of halfspaces. In Proceedings of the 47th Symposium on Foundations of Computer Science (FOCS). 553–562.
Pravesh K. Kothari and Raghu Meka. 2015. Almost optimal pseudorandom generators for spherical caps. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC). 247–256.
John Littlewood and Albert Cyril Offord. 1943. On the number of real roots of a random algebraic equation. III. Rec. Math. [Mat. Sbornik] N.S. 12 (1943), 277–286.
Sergey Nagaev and Iosif Pinelis. 1978. Some inequalities for the distribution of sums of independent random variables. Theor. Probab. Applic. 22, 2 (1978), 248–256.
Fedor Nazarov. 2003. On the maximal perimeter of a convex set in \(\mathds R^n\) with respect to a Gaussian measure. In Geometric Aspects of Functional Analysis (2001–2002). Lecture Notes in Math., Vol. 1807, Springer, 169–187.
Noam Nisan and David Zuckerman. 1996. Randomness is linear in space. J. Comput. Syst. Sci. 52, 1 (1996), 43–52. DOI: https://doi.org/10.1006/jcss.1996.0004
Ryan O’Donnell, Rocco A. Servedio, and Li-Yang Tan. 2019. Fooling polytopes. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19). ACM, 614–625.
Haskell Rosenthal. 1970. On the subspaces of \(L^p\) ( \(p \gt 2\) ) spanned by sequences of independent random variables. Israel J. Math 8 (1970), 273–303.
Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. 1995. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8, 2 (1995), 223–250. DOI: https://doi.org/10.1137/S089548019223872X
Rocco A. Servedio and Li-Yang Tan. 2017. Fooling intersections of low-weight halfspaces. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS). 824–835.
Rocco A. Servedio and Li-Yang Tan. 2017. What circuit classes can be learned with non-trivial savings? In Proceedings of the 8th Innovations in Theoretical Computer Science (ITCS). 30:1–30:21.
Alexander A. Sherstov. 2013. Optimal bounds for sign-representing the intersection of two halfspaces by polynomials. Combinatorica 33, 1 (2013), 73–96.
Terence Tao and Van Vu. 2012. The Littlewood–Offord problem in high dimensions and a conjecture of Frankl and Füredi. Combinatorica 32, 3 (2012), 363–372.