
Fooling Polytopes

Published: 31 January 2022

Abstract

We give a pseudorandom generator that fools m-facet polytopes over {0, 1}n with seed length polylog(m) · log n. The previous best seed length had superlinear dependence on m.

1 Introduction

Unconditional derandomization has been a major focus of research in computational complexity theory for more than 30 years. A significant line of work in this area has been on developing unconditional pseudorandom generators (PRGs) for various types of Boolean functions. Early seminal results in this vein focused on Boolean circuits [1, 43, 45] and branching programs [22, 44, 46], but over the past decade or so a new strand of research has emerged in which the goal is to construct PRGs against halfspaces and various generalizations of halfspaces. This work has included a sequence of successively more efficient PRGs against halfspaces [9, 15, 27, 29, 35, 39], low-degree polynomial threshold functions [10, 24, 25, 27, 28, 39], and, most relevant to this article, intersections of halfspaces [8, 18, 19, 56].
Since intersections of \(m\) halfspaces correspond to \(m\) -facet polytopes, and also to \(\lbrace 0,1\rbrace\) -integer programs with \(m\) constraints, these objects are of fundamental interest in high-dimensional geometry, optimization, and a range of other areas. A pseudorandom generator that \(\delta\) -fools intersections of \(m\) halfspaces can equivalently be viewed as an explicit discrepancy set for \(m\) -facet polytopes: a small subset of \(\lbrace 0,1\rbrace ^n\) that \(\delta\) -approximates the \(\lbrace 0,1\rbrace ^n\) -volume of every \(m\) -facet polytope. (Discrepancy sets are stricter versions of hitting sets, which are only required to intersect every polytope of volume at least \(\delta\) .) The problem of constructing a PRG for intersections of \(m\) halfspaces is also a stricter version of the algorithmic problem of deterministically approximating the number of solutions of a \(\lbrace 0,1\rbrace\) -integer program with \(m\) constraints. It is stricter because a PRG yields an input-oblivious algorithm: The range of a PRG is a single fixed set of points that gives approximately the right answer for every \(\lbrace 0,1\rbrace\) -integer program. Beyond pseudorandomness, intersections of halfspaces also play a significant role in other fields such as concrete complexity theory [4, 26, 40, 48, 58, 59] and computational learning theory [6, 16, 30, 32, 33, 34, 57, 64].
The main result of this article is a new PRG for intersections of \(m\) halfspaces. Its seed length grows polylogarithmically with \(m\) , an exponential improvement over the previous best PRG for this class. Before giving the precise statement of our result, we briefly describe the prior state-of-the-art for this problem.

1.1 Prior work on PRGs for Intersections of Halfspaces

A halfspace \(F(x) = \mathds {1}[w \cdot x \le \theta ]\) is said to be \(\tau\) -regular if \(|w_j| \le \tau \Vert w\Vert _2\) for all \(j \in [n]\) ; intuitively, a \(\tau\) -regular halfspace is one in which no coefficient \(w_j\) is too large relative to the overall scale of all the coefficients. Harsha, Klivans, and Meka [19] gave a PRG that \(\delta\) -fools any intersection of \(m\) many \(\tau\) -regular halfspaces with seed length \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) , where \(\tau\) has to be sufficiently small relative to \(m\) and \(\delta\) (specifically, \(\tau \le \text{some } \mathrm{poly}(\frac{\delta }{\log m})\) is required). While this seed length has the desirable property of being polylogarithmic in \(m\) , due to the regularity requirement this result cannot be used to fool intersections of even two general halfspaces. We note that there are very basic halfspaces, such as \(F(x) = \mathds {1}[x_1 \le 1/2]\) , that are highly irregular.
Recently, Reference [56] built on the work of Reference [19] to give a PRG that fools a different subclass of intersections of halfspaces. They give a PRG that \(\delta\) -fools any intersection of \(m\) many weight- \(W\) halfspaces with seed length \(\mathrm{poly}(\log m, W, 1/\delta)\cdot \mathrm{polylog}\,n\) ; a halfspace has weight \(W\) if it can be expressed as \(\mathds {1}[w \cdot x \le \theta ]\) where each coefficient \(w_j\) is an integer of magnitude at most \(W\) . Unfortunately, many \(n\) -variable halfspaces require weight polynomially or even exponentially large in \(n\) ; in fact, a counting argument shows that almost all halfspaces require exponentially large weight. Therefore, the Reference [56] result also cannot be used to fool even two general halfspaces.
In Reference [18], Gopalan, O’Donnell, Wu, and Zuckerman gave a PRG that can fool intersections of \(m\) general halfspaces. However, various aspects of their approach each necessitate a seed length that is at least linear in \(m\) , and indeed their overall seed length is \(O((m \log (m/\delta) + \log n) \cdot \log (m/\delta))\) . So, while this PRG is notable for being able to handle intersections of general halfspaces, its seed length becomes trivial (greater than \(n\) ) for intersections of \(m \ge n\) many halfspaces. (Indeed, this PRG of Reference [18] fools arbitrary monotone functions of \(m\) general halfspaces, with intersections (i.e., Ands) being a special case. Due to the generality of this class—which of course includes every monotone function over \(\lbrace 0,1\rbrace ^m\) —it can be shown that any PRG for it has to have at least linear seed length dependence on \(m\) .)

1.1.1 PRGs over Gaussian Space.

There has also been work on PRGs for functions over \(\mathds R^n\) endowed with the \(n\) -dimensional Gaussian distribution. Analyses in this setting are often facilitated by the continuous nature of \(\mathds R^n\) and rotational invariance of the Gaussian distribution, useful technical properties not afforded by the standard setting of Boolean space. For halfspaces and polytopes, PRGs over Gaussian space can be viewed as a first step towards PRGs over Boolean space; as we describe below, Boolean PRGs even for restricted subclasses of halfspaces and polytopes yield Gaussian PRGs for general halfspaces and polytopes, but the converse does not hold. We also note that the correspondence between polytopes and \(\lbrace 0,1\rbrace\) -integer programs is specific to Boolean space, and in particular, Gaussian PRGs do not yield algorithms for counting solutions to these programs.
For halfspaces, Meka and Zuckerman [39] showed that any PRG for the subclass of \(O(\frac{1}{\sqrt {n}})\) -regular halfspaces over Boolean space yields a PRG for all halfspaces over Gaussian space. Note that \(O(\frac{1}{\sqrt {n}})\) -regular halfspaces are “the most regular” ones; every halfspace is \(\tau\) -regular for some \(\tau \in [\frac{1}{\sqrt {n}},1]\) . Reference [19] generalized this connection to polytopes: They showed that any PRG for intersections of \(m\) many \(O((\log m)/\sqrt {n})\) -regular halfspaces over Boolean space yields a PRG for intersections of \(m\) many arbitrary halfspaces over Gaussian space. Combining this with their Boolean PRG for intersections of regular halfspaces discussed above, Reference [19] obtained a Gaussian PRG for intersections of \(m\) halfspaces with seed length \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) . Recent work of Reference [8] gives a different Gaussian PRG with seed length \(\mathrm{poly}(\log m,1/\delta) + O(\log n)\) .
The focus of the current work is on the standard setting of PRGs over Boolean space, and the rest of the article addresses this (more challenging) setting.

1.2 This Work: A PRG for Intersections of General Halfspaces

Summarizing the prior state-of-the-art on PRGs over Boolean space, there were no PRGs that could fool intersections of \(m = n\) many general halfspaces, and relatedly, the best PRG for intersections of \(m\le n\) general halfspaces had a superlinear seed length dependence on \(m\) . The PRGs that could fool intersections of \(m\ge n\) halfspaces imposed technical restrictions on the halfspaces: either regularity (hence excluding simple halfspaces such as \(\mathds {1}[x_1 \le 1/2]\) ) or small weights (hence excluding almost all halfspaces). Please refer to Table 1.
Table 1. PRGs for Intersections of Halfspaces Over \(\lbrace 0,1\rbrace ^n\)

Reference | Function class | Seed length of PRG
[18] | Monotone functions of \(m\) halfspaces | \(O((m \log (m/\delta) + \log n) \cdot \log (m/\delta))\) ; \(O(m \log (m/\delta) + \log n)\) if \(m/\delta \le \text{any } \mathrm{polylog}(n)\)
[19] | Intersections of \(m\) \(\tau\) -regular halfspaces | \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) , if \(\tau \le \text{some }\mathrm{poly}(\frac{\delta }{\log m})\)
[56] | Intersections of \(m\) weight- \(W\) halfspaces | \(\mathrm{poly}(\log m, W, 1/\delta)\cdot \mathrm{polylog}\,n\)
This work | Intersections of \(m\) halfspaces | \(\mathrm{poly}(\log m, 1/\delta)\cdot \log n\)
The main result of this article is a PRG that fools intersections of \(m\) general halfspaces with a polylogarithmic seed length dependence on \(m\) :
Theorem 1.1 (PRG for Polytopes).
For all \(n,m\in \mathds N\) and \(\delta \in (0,1)\) , there is an explicit pseudorandom generator with seed length \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) that \(\delta\) -fools the class of intersections of \(m\) halfspaces over \(\lbrace 0,1\rbrace ^n\) .
In particular, this PRG fools intersections of \(\mathrm{quasipoly}(n)\) many halfspaces with seed length \(\mathrm{polylog}(n)\) , and its seed length remains non-trivial for intersections of exponentially many halfspaces ( \(\exp (n^c)\) where \(c \gt 0\) is an absolute constant).
An immediate consequence of Theorem 1.1 is a deterministic algorithm that runs in time \(n^{\mathrm{polylog}(m)}\) and additively approximates the number of solutions to any \(n\) -variable \(\lbrace 0,1\rbrace\) -integer program with \(m\) constraints. Prior to our result, no non-trivial deterministic algorithm (running in time \(\lt 2^n\) ) was known even for general \(\lbrace 0,1\rbrace\) -integer programs with \(m=n\) constraints. Theorem 1.1 also yields PRGs with comparable seed lengths for intersections of halfspaces over a range of other domains, such as the \(n\) -dimensional hypergrid \(\lbrace 0,1,\ldots , N\rbrace ^n\) and the solid cube \([0,1]^n\) (details are left to the interested reader).
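To make the counting application concrete, here is a minimal sketch (in Python) of how any such PRG yields a deterministic approximate counter; the generator `prg` and its seed length are hypothetical stand-ins for the PRG of Theorem 1.1.

```python
import itertools
import numpy as np

def approx_count_solutions(A, b, prg, seed_len):
    """Deterministically approximate |{x in {0,1}^n : Ax <= b}| using a PRG
    that delta-fools intersections of halfspaces. `prg` (a hypothetical
    stand-in) maps a seed in {0,1}^seed_len to a point in {0,1}^n; we
    enumerate all 2**seed_len seeds, which for the seed length of
    Theorem 1.1 takes deterministic time n^polylog(m)."""
    n = A.shape[1]
    hits = 0
    for seed in itertools.product([0, 1], repeat=seed_len):
        x = np.array(prg(seed))            # pseudorandom point in {0,1}^n
        hits += int(np.all(A @ x <= b))    # does x satisfy all m constraints?
    # The fraction of PRG outputs inside the polytope delta-approximates its
    # {0,1}^n-volume, so scaling by 2**n gives an additive approximation.
    return (hits / 2 ** seed_len) * 2 ** n
```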

1.3 Discussion

Since the initial conference publication of a preliminary version of this article [50], several works have appeared that are relevant to the topic of this article. One line of work has been on obtaining pseudorandom generators for functions of halfspaces (including intersections of halfspaces, i.e., polytopes) with a better dependence on the error parameter but a worse dependence on other parameters. Kabanets et al. [23] gave a PRG that \(\delta\) -fools the class of \(n\) -variable size- \(s\) de Morgan formulas with halfspace gates at the bottom and has seed length \(O(n^{1/2} s^{1/4} \log (n) \log (n/\delta))\) , and Hatami et al. [20] gave a PRG that \(\delta\) -fools the class of arbitrary functions of \(m\) halfspaces over \(n\) variables with seed length \(\tilde{O}(\sqrt {n(m + \log (1/\delta))}).\) The techniques employed in these works are very different from the methods of our article; even for the special case of intersections of \(m\) halfspaces, each of these results has a seed length that is polynomial in \(n\) and \(m\) (rather than logarithmic or polylogarithmic as in our work), but these results have a much better seed length dependence on the error parameter \(\delta .\) Achieving a polylogarithmic dependence on all three parameters \(n,m\) , and \(1/\delta\) is an interesting challenge for future work.
In a different related line of work, Arunachalam and Yao [2] have considered the problem of constructing explicit PRGs for positive spectrahedrons. A positive spectrahedron is a Boolean function \(\mathds {1}[x_1 A^1 + \cdots + x_n A^n \preceq B]\) , where the \(A^i\) ’s are \(k \times k\) positive semidefinite matrices. Building on some of the ideas and ingredients in this article and in prior work [19], they establish invariance principles and give a PRG that \(\delta\) -fools “sufficiently regular” positive spectrahedrons over \(\lbrace 0,1\rbrace ^n\) with seed length \(\mathrm{poly}(\log k, 1/\delta)\cdot \log n.\)

2 Overview of Our Proof

Our proof of Theorem 1.1 involves several novel extensions of the central technique driving this line of work, namely, Lindeberg-style proofs of probabilistic invariance principles and derandomizations thereof. We develop these extensions to overcome challenges that arise due to the generality of our setting; specifically, the fact that we are dealing with intersections of arbitrary halfspaces, with no restrictions whatsoever on their structure. One of the key new ingredients in our analysis, which we believe is of independent interest, is a sharp high-dimensional generalization of the classic Littlewood–Offord anticoncentration inequality [12, 37] that we establish. We now describe our proof and the new ideas underlying it in detail.

2.1 Background: The Reference [19] PRG for Regular Polytopes

We begin by recalling the arguments of Harsha, Klivans, and Meka [19] for fooling regular polytopes. At a high level, Reference [19] builds on the work of Meka and Zuckerman [39], which gave a versatile and powerful framework for constructing pseudorandom generators from probabilistic invariance principles; the main technical ingredient underlying the Reference [19] PRG for regular polytopes is a new invariance principle for such polytopes, which we now describe.
Reference [19]’s invariance principle and the Lindeberg method. At a high level, the Reference [19] invariance principle for regular polytopes is as follows: Given an \(m\) -tuple of regular linear forms over \(n\) input variables \(x=(x_1,\ldots ,x_n)\) (denoted by \(Ax\) , where \(A\) is an \(m\) -by- \(n\) matrix), the distribution (over \(\mathds R^m\) ) of \(A \boldsymbol {u}\) , where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) is uniform random, is very close to the distribution of \(A \boldsymbol {g}\) , where \(\boldsymbol {g}\sim {\mathcal {N}(0,1)}^n\) is distributed according to a standard \(n\) -dimensional Gaussian. Here, closeness is measured by multidimensional CDF distance; we observe that multidimensional CDF distance corresponds to test functions of the form \(\mathds {1}[Ax \le b]\) where \(b \in \mathds R^m\) , which corresponds precisely to an intersection of \(m\) halfspaces \(\mathds {1}[A_1 x \le b_1] \wedge \cdots \wedge \mathds {1}[A_m x \le b_m].\) To prove this invariance principle, Reference [19] employs the well-known Lindeberg method (see, e.g., Chapter 11 of Reference [47], and Reference [61]) and proceeds in two main conceptual steps. The first step establishes a version of the result for smooth test functions, proxies for the actual “hard threshold” test functions \(\mathds {1}[Ax \le b]\) , and the second step relates distance with respect to these smooth test functions to multidimensional CDF distance via Gaussian anticoncentration. We outline each of these two steps below.
The first step is to prove an invariance principle for smooth test functions. Here, instead of measuring the distance between \(A\boldsymbol {u}\) and \(A\boldsymbol {g}\) using test functions that are orthant indicators \(\mathcal {O}_b(v_1,\ldots ,v_m) = \mathds {1}[v \le b]\) (corresponding to multidimensional CDF distance), distance is measured using a sufficiently smooth mollifier \(\widetilde{\mathcal {O}}_b : \mathds R^m \rightarrow [0,1]\) of \(\mathcal {O}_b\) . Such mollifiers, with useful properties that we now discuss, were proposed and analyzed by Bentkus [5]. In more detail, Reference [19] proves that the difference between the expectations of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {u})\) and \(\widetilde{\mathcal {O}}_b(A\boldsymbol {g})\) is bounded by a certain function involving \(\widetilde{\mathcal {O}}_b\) ’s derivatives. In fact, as is standard in Lindeberg-style proofs of invariance principles, Reference [19] actually bounds this difference with respect to any smooth test function \(\Upsilon : \mathds R^m \rightarrow \mathds R\) in terms of \(\Upsilon\) ’s derivatives; the only specific property of Bentkus’s mollifier \(\widetilde{\mathcal {O}}_b\) that is used is that its derivatives are appropriately small. At a high level, the proof of this smooth invariance principle proceeds by hybridizing from \(\Upsilon (A\boldsymbol {u})\) to \(\Upsilon (A\boldsymbol {g})\) , using the multidimensional Taylor expansion of \(\Upsilon\) to bound the error incurred in each step. (The regularity of the linear forms is used in a crucial way to control the approximation error that results from truncating the Taylor expansion at a certain fixed degree.)
The second step is to establish the desired bound on multidimensional CDF distance using the aforedescribed smooth invariance principle applied to Bentkus’s mollifier. This step relies on a second key property of Bentkus’s mollifier: \(\widetilde{\mathcal {O}}_b\) agrees with the orthant indicator \(\mathcal {O}_b\) except on a small error region near the orthant boundary. With this property in hand, a fairly simple and standard argument shows that it suffices to bound the anticoncentration of the Gaussian random variable \(A \boldsymbol {g}\) ; intuitively, such anticoncentration establishes that \(A\boldsymbol {g}\) does not place too much probability weight on the error region where \(\widetilde{\mathcal {O}}_b\) disagrees with \(\mathcal {O}_b\) . In Reference [19], the required anticoncentration for \(A \boldsymbol {g}\) follows immediately from a result of Nazarov [33, 42] on the Gaussian surface area of \(m\) -facet polytopes.
The Reference [19] PRG via a derandomized invariance principle. Having proved this invariance principle for regular polytopes, Reference [19] then establishes a pseudorandom version by derandomizing its proof. That is, they argue that their proof in fact establishes multidimensional-CDF-closeness between \(A \boldsymbol {z}\) and \(A\boldsymbol {g}\) , where \(\boldsymbol {g}\sim \mathcal {N}(0,1)^n\) is distributed according to a standard Gaussian as before, but \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) is the output of a suitable pseudorandom generator \(\mathscr{G} : \lbrace -1,1\rbrace ^r \rightarrow \lbrace -1,1\rbrace ^n\) (rather than uniform random). Combining the “full-randomness” invariance principle (establishing closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {g}\) ) with this pseudorandom version (establishing closeness between \(A\boldsymbol {z}\) and \(A\boldsymbol {g}\) ), it follows from the triangle inequality that \(A\boldsymbol {z}\) and \(A\boldsymbol {u}\) are close. Recalling that multidimensional CDF distance corresponds to test functions of the form \(\mathds {1}[Ax \le b] = \mathds {1}[A_1 x \le b_1] \wedge \cdots \wedge \mathds {1}[A_m x \le b_m]\) , this is precisely equivalent to the claim that \(\mathscr{G}\) fools the intersection of \(m\) halfspaces with weight matrix \(A \in \mathds R^{m\times n}\) (and an arbitrary vector of thresholds \(b \in \mathds R^m\) ).
For later reference, we close this section with an informal description of the Reference [19] generator (for fooling intersections of \(m\) many \(\tau\) -regular halfspaces):
(1) Pseudorandomly hash the \(n\) variables into \(L\coloneqq {\mathrm{poly}(1/\tau)}\) buckets using an \((r_\mathrm{hash}\coloneqq 2\log m)\) -wise uniform hash function \(\boldsymbol {h}: [n] \rightarrow [L]\) .
(2) Independently across buckets, assign values to the variables within each bucket using an \((r_\mathrm{bucket}\coloneqq 4\log m)\) -wise uniform distribution.
We remark that this is the structure of the Meka–Zuckerman generator [39] for fooling a single regular halfspace, the only difference being that the relevant parameters \(L,r_\mathrm{hash},\) and \(r_\mathrm{bucket}\) are larger in Reference [19] than in Reference [39] (naturally so, given that the Reference [19] generator fools intersections of \(m\) regular halfspaces instead of a single one).
Our analysis in this article can be used to show that the Reference [39] generator, instantiated with suitable choices of \(L, r_\mathrm{hash}\) , and \(r_\mathrm{bucket}\) , fools intersections of \(m\) general halfspaces. However, for technical reasons (that are not essential for this high-level discussion), this results in a seed length that is \(\mathrm{poly}(\log m,1/\delta , \log n)\) . To achieve our seed length of \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) , we slightly extend the Reference [39] generator in two ways. First, within each bucket the variables are assigned using an \(r_\mathrm{bucket}\) -wise uniform distribution Xor-ed with an independent draw from a generator that fools small-width CNF formulas [17]. Second, we Xor the entire resulting \(n\) -bit string with an independent draw from a \(k\) -wise independent generator. (See Section 4 for a detailed description of our PRG.)

2.2 Some Key New Ingredients in our Analysis

A fundamental challenge in extending the Reference [19] PRG result from regular to general polytopes stems from the fact that an invariance principle simply does not hold for general polytopes \(Ax \le b\) . Without the regularity requirement on \(A\) , it is not true that \(A\boldsymbol {u}\) and \(A\boldsymbol {g}\) are close in CDF distance; indeed, even a single non-regular linear form such as \(x_1\) is distributed very differently under \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) versus \(\boldsymbol {g}\sim \mathcal {N}(0,1)^n\) . This therefore necessitates a significant conceptual departure from the Meka–Zuckerman framework for constructing pseudorandom generators from invariance principles: Rather than establishing closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) (where \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) is the output of a suitable pseudorandom generator) through \(A\boldsymbol {g}\) by means of an invariance principle, one has to establish closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) “directly” without using invariance.
Somewhat surprisingly, even though an invariance principle does not hold in our setting of general polytopes, our proof nonetheless proceeds via the Lindeberg method for proving invariance principles. Following the two main conceptual steps of the method (as outlined in the previous section), we first prove that \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) are close with respect to Bentkus’s smooth mollifiers \(\widetilde{\mathcal {O}}_b\) for the orthant indicators \(\mathcal {O}_b\) , and then use this to establish closeness in multidimensional CDF distance. However, the fact that we are dealing with matrices \(A \in \mathds R^{m\times n}\) whose rows are arbitrary linear forms (corresponding to the facets of general \(m\) -facet polytopes) instead of regular linear forms poses significant challenges in both steps of the Lindeberg method. We discuss some of these challenges, and the new ideas that we employ to overcome them, next. For concreteness, we will discuss these challenges and new ingredients by contrasting our proof with that of Reference [19], but we remark here that these are in fact qualitative differences between our approach and the Lindeberg method in general.
Step 1: Fooling Bentkus’s mollifier. Recall that Reference [19] first proves a general invariance principle establishing closeness in expectation (with a quantitative bound that depends on \(\Upsilon\) ’s derivatives) between \(\Upsilon (A\boldsymbol {u})\) and \(\Upsilon (A\boldsymbol {g})\) for any smooth test function \(\Upsilon\) . They then apply this general invariance principle with Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b\) being the test function, using the bounds on \(\widetilde{\mathcal {O}}_b\) ’s derivatives established in Reference [5] but no other properties of \(\widetilde{\mathcal {O}}_b\) .
In contrast, we do not prove closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) for all smooth test functions; our argument is carefully tailored to Bentkus’s specific mollifier. In addition to bounds on \(\widetilde{\mathcal {O}}_b\) ’s derivatives, we crucially rely on the specific structure of \(\widetilde{\mathcal {O}}_b\) , in particular, the fact that it is the product of \(m\) univariate functions, one for each coordinate (i.e., \(\widetilde{\mathcal {O}}_b(v) = \prod _{i=1}^m \psi _{b_i}(v_i)\) , where each \(\psi _{b_i}\) maps \(\mathds R\) to \([0,1]\) ). A high-level intuition for why such product structure is useful is as follows: By doing some structural analysis of halfspaces (see Section 5), we can decompose each of our \(m\) halfspaces into a small “head” portion, consisting of at most \(k\) variables, and a remaining “tail” portion that is regular. From this point of view, the difference between regular and general polytopes is therefore the presence of these size-at-most- \(k\) head portions in each of the \(m\) halfspaces. Very roughly speaking, the product structure of \(\widetilde{\mathcal {O}}_b\) allows us to handle these head portions using pseudorandom generators for small-width CNF formulas [17]. (To see the relevance of CNF formulas in this context, at least at a conceptual level, observe that a product of \(\lbrace 0,1\rbrace\) -valued \(k\) -juntas is a width- \(k\) CNF formula.)
Our proof incorporates these PRGs for CNFs into Reference [19]’s analysis of the regular tail portions. We highlight one interesting aspect of our analysis: In all previous instantiations of the Lindeberg method that we are aware of, expressions like \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] -{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }}^{\prime })]|\) are bounded by considering two Taylor expansions of \(\Upsilon\) , both taken around the “common point” \(\boldsymbol {v}\) . Lindeberg method arguments analyze the difference of these Taylor expansions using moment-matching properties of \({\boldsymbol {\Delta }}\) and \({\boldsymbol {\Delta }}^{\prime }\) and the fact that they are “small” in a certain technical sense, which is directly related to the regularity assumptions that underlie these invariance principles. In contrast, in our setting, since we are dealing with arbitrary linear forms rather than regular ones, we end up having to bound expressions like \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] - {{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime } + {\boldsymbol {\Delta }}^{\prime })]|\) . Note that this involves considering the Taylor expansions of \(\Upsilon\) around two distinct points \(\boldsymbol {v}\) and \(\boldsymbol {v}^{\prime }\) , which may be far from each other—indeed, a priori it is not even clear that \(|{{\bf E}}[\Upsilon (\boldsymbol {v})]-{{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime })]|\) will be small. Because of these differences from the standard Lindeberg scenario, moment-matching properties of \({\boldsymbol {\Delta }}\) and \({\boldsymbol {\Delta }}^{\prime }\) and their “smallness” no longer suffice to ensure that the overall expected difference is small. Instead, as alluded to above, our analysis additionally exploits the product structure of Bentkus’s mollifier via PRGs for CNFs to bound \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] - {{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime } + {\boldsymbol {\Delta }}^{\prime })]|\) (see Section 8).
Step 2: Anticoncentration. The next step is to pass from closeness of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {u})\) and \(\widetilde{\mathcal {O}}_b(A\boldsymbol {z})\) in expectation, to closeness of \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) in multidimensional CDF distance. We recall that in the analogous step in Reference [19]’s proof, the starting point was closeness in expectation of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {u})\) and \(\widetilde{\mathcal {O}}_b(A\boldsymbol {g})\) , where \(\boldsymbol {g}\sim \mathcal {N}(0,1)^n\) is a standard Gaussian (instead of \(\widetilde{\mathcal {O}}_b(A\boldsymbol {z})\) where \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) is pseudorandom). For this reason, it sufficed for Reference [19] to bound the Gaussian anticoncentration of \(A\boldsymbol {g}\) and, as mentioned, such a bound is an immediate consequence of Nazarov’s bound on the Gaussian surface area of \(m\) -facet polytopes.
In contrast, since the Gaussian distribution does not enter into our arguments at all (by necessity, as explained above), we instead have to bound the Boolean anticoncentration of \(A\boldsymbol {u}\) where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) is uniform random. This task, which is carried out in Section 7, requires significantly more work; indeed, Boolean anticoncentration formally contains Gaussian anticoncentration as a special case. At the heart of our arguments for this step is a new Littlewood–Offord-type anticoncentration inequality for \(m\) -facet polytopes, a high-dimensional generalization of the classic Littlewood–Offord theorem [12, 37]. We discuss this new theorem, which we believe is of independent interest, next.

2.2.1 A Littlewood–Offord Theorem for Polytopes.

We first recall the classic Littlewood–Offord anticoncentration inequality.
Theorem 2.1 (Littlewood–Offord).
For all \(\theta \in \mathds R\) and \(w \in \mathds R^n\) such that \(|w_j| \ge 1\) for all \(j\in [n]\) ,
\[{\bf Pr}\big [w\cdot \boldsymbol {u}\in (\theta -2,\theta \,]\big ] = O\Big (\frac{1}{\sqrt {n}}\Big),\]
where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) is uniformly random.
Littlewood and Offord [37] first proved a bound of \(O((\log n)/\sqrt {n})\) ; Erdős [12] subsequently sharpened this to \(O(1/\sqrt {n})\) , which is optimal by considering \(w = 1^n\) and \(\theta = 0\) . (We observe that the question trivializes without the assumption on the magnitudes of \(w\) ’s coordinates; for instance, the relevant probability is \(1/2\) for \(w = (1,0,\ldots ,0)\) and \(\theta = 1\) .)
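As a quick empirical illustration of Theorem 2.1 (a Monte Carlo sketch, not part of any proof), one can estimate the boundary probability for the extremal example \(w = 1^n\) , \(\theta = 0\) and watch it decay like \(1/\sqrt {n}\) :

```python
import numpy as np

rng = np.random.default_rng(0)

def lo_boundary_mass(n, theta=0.0, trials=50_000):
    """Estimate Pr[w . u in (theta - 2, theta]] for w = 1^n and uniform
    u ~ {-1,1}^n; Littlewood-Offord says this is O(1/sqrt(n))."""
    u = rng.integers(0, 2, size=(trials, n), dtype=np.int8) * 2 - 1
    s = u.sum(axis=1, dtype=np.int64)       # w . u with w = 1^n
    return np.mean((s > theta - 2) & (s <= theta))

for n in [100, 400, 1600]:
    # quadrupling n should roughly halve the estimate
    print(n, lo_boundary_mass(n), 1 / np.sqrt(n))
```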
Theorem 2.1 has the following natural geometric interpretation: The maximum fraction of hypercube points that can fall within the “width-2 boundary” of a halfspace \(\mathds {1}[w\cdot x \le \theta ]\) where \(|w_j|\ge 1\) for all \(j\) is \(O(1/\sqrt {n})\) . Given this geometric interpretation, it is natural to seek a generalization from single halfspaces (i.e., 1-facet polytopes) to \(m\) -facet polytopes:
What is the maximum fraction of hypercube points \(u \in \lbrace -1,1\rbrace ^n\) that can lie within the “width-2 boundary” of an \(m\) -facet polytope \(A x \le b\) where \(|A_{ij}| \ge 1\) for all \(i\) and \(j\) ?
In more detail, we say that \(u\) lies within the “width-2 boundary” of the polytope \(Ax \le b\) provided \(Au \le b\) and \(A_i \cdot u \gt b_i - 2\) for some \(i\in [m]\) ; equivalently, \(u\) lies in the difference of the two polytopes \(Ax \le b\) and \(Ax \le b - 2\cdot \mathds {1}_m\) , where \(\mathds {1}_m\) denotes the all-1’s vector in \(\mathds R^m\) . The Littlewood–Offord theorem (Theorem 2.1), along with a naive union bound, implies a bound of \(O(m/\sqrt {n})\) ; we are not aware of any improvement of this naive bound prior to our work.
We give an essentially complete answer to this question, with upper and lower bounds that match up to constant factors. In Section 7, we prove the following “Littlewood–Offord theorem for polytopes”:
Theorem 2.2 (Littlewood–Offord Theorem for Polytopes).
For all \(m\ge 2\) , \(b \in \mathds R^m\) , and \(A \in \mathds R^{m\times n}\) with \(|A_{ij}| \ge 1\) for all \(i \in [m]\) and \(j \in [n]\) ,
\[{\bf Pr}\big [A\boldsymbol {u}\le b\ \&\ A_i \cdot \boldsymbol {u}\gt b_i-2 \text{ for some $i\in [m]$}\big ] \le \frac{5\sqrt {2\ln m}}{\sqrt {n}},\]
where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) is uniformly random.
Our proof of Theorem 2.2 draws on and extends techniques from Kane’s bound on the Boolean average sensitivity of \(m\) -facet polytopes [26]. We complement Theorem 2.2 with a matching lower bound, which establishes the existence of an \(m\) -facet polytope with an \(\Omega (\sqrt {\ln m}/\sqrt {n})\) -fraction of hypercube points lying within its width-2 boundary. (In fact, our lower bound is slightly stronger: It establishes the existence of a polytope with an \(\Omega (\sqrt {\ln m}/{\sqrt {n}})\) -fraction of hypercube points lying on its surface, corresponding to its width-0 boundary.)
Theorem 2.2 does not suffice for the purpose of passing from closeness with respect to Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b\) to closeness in multidimensional CDF distance (i.e., Step 2 in Section 2.2): While the assumption on the magnitudes of \(A\) ’s entries is essential to Theorem 2.2 (just as the analogous assumption on \(w\) ’s coordinates is essential to the Littlewood–Offord theorem), the weight matrix of a general \(m\) -facet polytope need not have this property. In Section 7, we establish various technical extensions of Theorem 2.2 that are required to handle this issue.
Remark 2.3.
Our generalization of the Littlewood–Offord theorem (Theorem 2.2) is, to our knowledge, incomparable to other high-dimensional generalizations that have been studied in the literature. In particular, References [14, 31, 62] (see also the references therein) study the probability that \(A\boldsymbol {u}\) falls within a ball of fixed radius in \(\mathds R^m\) , where \(A \in \mathds R^{m\times n}\) is a matrix whose columns have 2-norm at least 1 (i.e., \(A\boldsymbol {u}\) is the random \(\pm 1\) sum of \(n\) many \(m\) -dimensional vectors of length at least 1).

2.3 Relation to Reference [56]

We close this section with a discussion of the connection between our techniques and those of the recent work of Reference [56]. Recall that the main result of Reference [56] is a PRG for \(\delta\) -fooling intersections of \(m\) weight- \(W\) halfspaces using seed length \(\mathrm{poly}(\log m, W,1/\delta)\cdot \mathrm{polylog}\, n\) (whereas our main result, which is strictly stronger, is a PRG for \(\delta\) -fooling intersections of \(m\) general halfspaces using seed length \(\mathrm{poly}(\log m, 1/\delta)\cdot \log n\) , with no dependence on the weights of the halfspaces).
A key structural observation driving Reference [56] is that every intersection of \(m\) low-weight halfspaces can be expressed as \(H \wedge G\) , where \(H\) is an intersection of \(m\) regular halfspaces and \(G\) is a small-width CNF. (The width of \(G\) grows polynomially with the weights of the halfspaces, and this polynomial growth is responsible for the polynomial dependence on \(W\) in the seed length of the Reference [56] PRG.) From this starting point, it suffices for Reference [56] to bound the multidimensional CDF distance between the \((\mathds R^{m} \times \lbrace \pm 1\rbrace)\) -valued random variables \((A\boldsymbol {u}, G(\boldsymbol {u}))\) and \((A\boldsymbol {z}, G(\boldsymbol {z}))\) , where \(A \in \mathds R^{m\times n}\) is the weight matrix of \(H\) , \(\boldsymbol {u}\) is uniform random, and \(\boldsymbol {z}\) is the output of the Reference [56] PRG (which is a slight variant of Reference [19]’s pseudorandom generator). Since \(H\) is an intersection of regular halfspaces, the fact that \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) are close in multidimensional CDF distance is precisely the main result of Reference [19]; the crux of the work in Reference [56], therefore, lies in dealing with the additional distinguished \((m+1)^{\text{st}}\) coordinate corresponding to the CNF \(G\) . Very roughly speaking, Reference [56] employs a careful coupling \((\widehat{\boldsymbol {u}},\widehat{\boldsymbol {z}})\) (whose existence is a consequence of the fact that bounded independence fools CNFs [3, 52]) to ensure that \(G(\widehat{\boldsymbol {u}})\) and \(G(\widehat{\boldsymbol {z}})\) almost always agree, and hence these \((m+1)^{\text{st}}\) coordinates “have a negligible effect” throughout Reference [19]’s Lindeberg-based proof of the regular case establishing closeness between \(A\boldsymbol {u}\) and \(A\boldsymbol {z}\) .
Because of the aforementioned structural fact (that an \(m\) -tuple of low-weight halfspaces is equivalent to “an \(m\) -tuple of regular halfspaces plus a CNF”), the low-weight case analyzed in Reference [56] did not require as significant a departure from Reference [19]’s approach, and from the Lindeberg method as a whole, as the general case that is the subject of this article. In particular, the new ideas discussed in Section 2.2 that are central to our proof were not present in Reference [56]’s analysis for the low-weight case. To elaborate on this:
Reference [56] did not have to exploit the product structure of Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b\) to fool it. Like Reference [19], the arguments of Reference [56] establish closeness in expectation between \(\Upsilon (A \boldsymbol {u}, G(\boldsymbol {u}))\) and \(\Upsilon (A \boldsymbol {z}, G(\boldsymbol {z}))\) for all smooth test functions \(\Upsilon\) , and the only properties of Bentkus’s mollifier that are used are the bounds on its derivatives given in Reference [5] (which are used in a black box way). The simpler setting of Reference [56] also did not necessitate comparing the Taylor expansions of \(\Upsilon\) around distinct points, as discussed in Section 2.2.
Reference [56] did not have to reason about Boolean anticoncentration, which, as discussed above, requires significant novel conceptual and technical work, including our new Littlewood–Offord theorem for polytopes. Like Reference [19], Reference [56] was able to apply Nazarov’s Gaussian anticoncentration bounds as a black box to pass from fooling Bentkus’s mollifier to closeness in multidimensional CDF distance.

3 Preliminaries

For convenience, in the rest of the article, we view halfspaces as having the domain \(\lbrace -1,1\rbrace ^n\) rather than \(\lbrace 0,1\rbrace ^n\) . We remind the reader that a halfspace \(F: \lbrace -1,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) is a function of the form \(F(x) = \mathds {1}[w \cdot x \le \theta ]\) for some \(w \in \mathds R^n\) , \(\theta \in \mathds R\) .
For an \(n\) -dimensional vector \(y\) and subset \(B \subseteq [n]\) , we write \(y_B\) to denote the \(|B|\) -dimensional vector obtained by restricting \(y\) to the coordinates in \(B\) . For an \(m \times n\) matrix \(A\) and subset \(B \subseteq [n]\) , we write \(A^B\) to denote the \(m\times |B|\) matrix obtained by restricting \(A\) to the columns in \(B\) . For indices \(i \in [m]\) and \(j \in [n]\) , we write \(A_i\) to denote the \(n\) -dimensional vector corresponding to the \(i\) th row of \(A\) , and \(A^j\) to denote the \(m\) -dimensional vector corresponding to the \(j\) th column of \(A\) .

3.1 Regularity, Orthants, and Taylor’s Theorem

Definition 3.1 ( \((k,\tau)\) -regular Vectors and Matrices).
We say that a vector \(w \in \mathds R^n\) is \(\tau\) -regular if \(|w_j| \le \tau \Vert w\Vert _2\) for all \(j \in [n]\) . More generally, we say that \(w\) is \((k,\tau)\) -regular if there is a partition \([n] = \mathrm{Head} \sqcup \mathrm{Tail}\) where \(|\mathrm{Head}|\le k\) and the subvector \(w_{\mathrm{Tail}}\) is \(\tau\) -regular. We say that \(w\) is \((k,\tau)\) -standardized if \(w\) is \((k,\tau)\) -regular and \(\sum _{j \in \mathrm{Tail}} w_j^2 = 1\) . We say that a matrix \(A \in \mathds R^{m\times n}\) is \(\tau\) -regular (respectively: \((k,\tau)\) -regular, \((k,\tau)\) -standardized) if all its rows are \(\tau\) -regular (respectively: \((k,\tau)\) -regular, \((k,\tau)\) -standardized). We also use this terminology to refer to polytopes \(Ax \le b\) .
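As a concrete illustration of Definition 3.1, the following sketch tests \((k,\tau)\) -regularity by taking the Head to be the \(k\) largest-magnitude coordinates, the natural greedy choice (the helper names are ours, not the paper's):

```python
import numpy as np

def head_tail_split(w, k):
    """Partition [n] into a Head of the k largest |w_j| and the Tail of the
    remaining coordinates -- the natural greedy choice of Head."""
    order = np.argsort(-np.abs(w))
    return order[:k], order[k:]

def is_k_tau_regular(w, k, tau):
    """Check whether w is (k, tau)-regular with the greedy Head: every Tail
    coordinate j must satisfy |w_j| <= tau * ||w_Tail||_2."""
    w = np.asarray(w, dtype=float)
    _, tail = head_tail_split(w, k)
    w_tail = w[tail]
    norm = np.linalg.norm(w_tail)
    return norm == 0 or np.max(np.abs(w_tail)) <= tau * norm

# Example: w = e_1 (the halfspace 1[x_1 <= theta]) is highly irregular, but
# becomes trivially regular once its single head coordinate is removed (k = 1).
```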
Translated orthants and their boundaries. For \(b \in \mathds R^m\) , we write \(\mathcal {O}_b \subset \mathds R^m\) to denote the translated orthant
\[\mathcal {O}_b = \lbrace v \in \mathds R^m :v_i \le b_i \text{ for all $i\in [m]$}\rbrace .\]
We will overload notation and also write “ \(\mathcal {O}_b\) ” to denote the indicator \(\mathds R^m \rightarrow \lbrace 0,1\rbrace\) of the orthant \(\mathcal {O}_b\) (i.e., \(\mathcal {O}_b(v) = \mathds {1}[v \le b]\) ). We write \(\Game \mathcal {O}_{b} \subset \mathcal {O}_b\) to denote \(\mathcal {O}_b\) ’s surface,
\[\Game \mathcal {O}_b = \lbrace v \in \mathcal {O}_b :v_i = b_i \text{ for some $i\in [m]$}\rbrace .\]
For \(\Lambda \gt 0\) , we write \(\Game _{-\Lambda }\mathcal {O}_b\) and \(\Game _{+\Lambda } \mathcal {O}_b\) to denote the inner and outer \(\Lambda\) -boundaries of \(\mathcal {O}_b\) ,
\begin{equation} \Game _{-\Lambda }\mathcal {O}_b = \mathcal {O}_b \setminus \mathcal {O}_{b - (\Lambda , \ldots , \Lambda)}, \quad \quad \quad \Game _{+\Lambda }\mathcal {O}_b = \mathcal {O}_{b + (\Lambda , \ldots , \Lambda)} \setminus \mathcal {O}_{b}, \end{equation}
(1)
and \(\Game _{\pm \Lambda }\mathcal {O}_b\) to denote the disjoint union \(\Game _{\pm \Lambda }\mathcal {O}_b = \Game _{+\Lambda }\mathcal {O}_b \sqcup \Game _{-\Lambda }\mathcal {O}_b\) .
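For concreteness, here is a small sketch of these indicators (Equation (1)); the function names are ours:

```python
import numpy as np

def in_orthant(v, b):
    """Indicator of the translated orthant O_b: v_i <= b_i for all i."""
    return bool(np.all(np.asarray(v) <= np.asarray(b)))

def in_inner_boundary(v, b, lam):
    """Inner Lambda-boundary of O_b: inside O_b but outside O_{b-(lam,...,lam)},
    i.e., v_i <= b_i for all i while v_i > b_i - lam for at least one i."""
    return in_orthant(v, b) and not in_orthant(v, np.asarray(b) - lam)

def in_outer_boundary(v, b, lam):
    """Outer Lambda-boundary of O_b: inside O_{b+(lam,...,lam)} but outside O_b."""
    return in_orthant(v, np.asarray(b) + lam) and not in_orthant(v, b)
```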
Derivatives and multidimensional Taylor expansion. We write \(\psi ^{(d)}\) to denote the \(d\) th derivative of a \(\mathcal {C}^d\) function \(\psi : \mathds R\rightarrow \mathds R.\) For an \(m\) -dimensional multi-index \(\alpha = (\alpha _1,\ldots ,\alpha _m) \in \mathds N^m\) , we write \(|\alpha |\) to denote \(\alpha _1 + \cdots + \alpha _m\) , and \(\alpha !\) to denote \(\alpha _1! \alpha _2! \cdots \alpha _m!\) . Given a vector \(\Delta \in \mathds R^m\) , the expression \(\Delta ^\alpha\) denotes \(\prod _{i=1}^m \Delta _i^{\alpha _i}\) . Given a function \(\Upsilon : \mathds R^m \rightarrow \mathds R\) , the expression \(\partial _\alpha \Upsilon\) denotes the mixed partial derivative taken \(\alpha _i\) times in the \(i\) th coordinate.
The following is a straightforward consequence of the multidimensional Taylor theorem, upper-bounding the error term by the \(L_1\) -norm of the derivatives times the \(L_\infty\) -norm of the offset-powers:
Fact 3.2 (Multidimensional Taylor Approximation).
Let \(d\in \mathds N\) and let \(\Upsilon : \mathds R^m \rightarrow \mathds R\) be a \(\mathcal {C}^d\) function. Then for all \(v ,\Delta \in \mathds R^m\) ,
\[\Upsilon (v + \Delta) = \sum _{0 \le |\alpha | \le d-1} \frac{\partial _\alpha \Upsilon (v)}{\alpha !} \Delta ^{\alpha } + \mathrm{err}(v,\Delta),\]
where
\[\big |\mathrm{err}(v,\Delta)\big | \le \sup _{v^* \in \mathds R^m}\Bigg \lbrace \sum _{|\alpha |=d} |\partial _\alpha \Upsilon (v^*)|\Bigg \rbrace \cdot \Vert \Delta \Vert _\infty ^d.\]
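For instance, the one-dimensional case \(m = 1\) with \(d = 2\) is the familiar first-order Taylor approximation with a second-derivative error bound:
\[\Upsilon (v + \Delta) = \Upsilon (v) + \Upsilon ^{\prime }(v)\,\Delta + \mathrm{err}(v,\Delta), \qquad \big |\mathrm{err}(v,\Delta)\big | \le \sup _{v^* \in \mathds R}|\Upsilon ^{\prime \prime }(v^*)| \cdot \Delta ^2.\]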

3.2 Pseudorandomness Preliminaries

Throughout this work we use boldface for random variables and random vectors. If \(\mathcal {D}\) is a probability distribution, then we write \(\boldsymbol {x}\sim \mathcal {D}\) to denote that \(\boldsymbol {x}\) is drawn from that distribution. For example, \(\mathcal {N}(0,1)\) will denote the standard normal distribution, so \(\boldsymbol {g}\sim \mathcal {N}(0,1)\) means \(\boldsymbol {g}\) is a standard Gaussian random variable. In case \(S\) is a finite set, the notation \(\boldsymbol {x}\sim S\) will mean that \(\boldsymbol {x}\) is chosen uniformly at random from \(S\) . The most common case for this will be \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) , meaning that \(\boldsymbol {u}\) is chosen uniformly from \(\lbrace -1,1\rbrace ^n\) . We will reserve \(\boldsymbol {u}\) for this specific random vector.
We recall the definition of a pseudorandom generator:
Definition 3.3 (Pseudorandom Generator).
A function \(\mathscr{G} : \lbrace -1,1\rbrace ^r \rightarrow \lbrace -1,1\rbrace ^n\) is said to \(\delta\) -fool a function \(F: \lbrace -1,1\rbrace ^n \rightarrow \mathds R\) with seed length \(r\) if
\[\Big | \mathop {{\bf E}}_{\boldsymbol {s}\sim \lbrace -1,1\rbrace ^r}\big [F(\mathscr{G}(\boldsymbol {s}))\big ] - \mathop {{\bf E}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [F(\boldsymbol {u})\big ] \Big | \le \delta .\]
Such a function \(\mathscr{G}\) is said to be an explicit pseudorandom generator (PRG) that \(\delta\) -fools a class \(\mathcal {F}\) of \(n\) -variable functions if \(\mathscr{G}\) is computable by a deterministic uniform \(\mathrm{poly}(n)\) -time algorithm and \(\mathscr{G}\) \(\delta\) -fools every function \(F \in \mathcal {F}\) . We will also use the notation \(\boldsymbol {z}\sim \mathscr{G}\) to mean that \(\boldsymbol {z}= \mathscr{G}(\boldsymbol {s})\) for \(\boldsymbol {s}\sim \lbrace -1,1\rbrace ^r\) .
Bounded independence and hash families. A sequence of random variables \(\boldsymbol {x}_1, \ldots , \boldsymbol {x}_n\) is said to be \(r\) -wise independent if any collection of \(r\) of them is independent. In case the \(\boldsymbol {x}_i\) ’s are uniformly distributed on their range, we say the sequence is \(r\) -wise uniform. We will also use this terminology for distributions \(\mathcal {D}\) on \(\lbrace -1,1\rbrace ^n\) . An obvious but useful fact about \(r\) -wise uniform PRGs \(\mathscr{G}\) is that they 0-fool the class of degree- \(r\) polynomials \(\lbrace -1,1\rbrace ^n \rightarrow \mathds R\) .
A distribution \(\mathcal {H}\) on functions \([n] \rightarrow [L]\) is said to be an \(r\) -wise uniform hash family if, for \(\boldsymbol {h}\sim \mathcal {H}\) , the sequence \((\boldsymbol {h}(1), \ldots , \boldsymbol {h}(n))\) is \(r\) -wise uniform. Such a distribution also has the property that for any \(\ell \in [L]\) , the sequence \((\mathds {1}_{\boldsymbol {h}(1) = \ell }, \ldots , \mathds {1}_{\boldsymbol {h}(n) = \ell })\) is \(r\) -wise independent on \(\lbrace 0,1\rbrace ^n\) , with each individual random variable being Bernoulli \((1/L)\) . Well-known constructions (see, e.g., Section 3.5.5 of Reference [63]) give that for every \(n,L\) and \(r\) , there is an \(r\) -wise uniform hash family \(\mathcal {H}\) of functions \([n] \rightarrow [L]\) such that choosing a random function from \(\mathcal {H}\) takes \(O(r \log (nL))\) random bits (and evaluating a function from \(\mathcal {H}\) takes time \(\mathrm{poly}(r, \log n, \log L)\) ), and consequently there are known efficient constructions of \(r\) -wise uniform distributions over \(\lbrace 0,1\rbrace ^n\) with seed length \(O(r \log n).\)
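As an illustration (a sketch of the standard polynomial-based construction, not necessarily the exact one in Reference [63]), one can sample an \(r\) -wise uniform hash by drawing a uniformly random polynomial of degree less than \(r\) over a prime field \(\mathrm{GF}(p)\) with \(p \ge n\) ; note that the final reduction mod \(L\) is exactly uniform only when \(L\) divides \(p\) , a rounding issue that the cited constructions handle.

```python
import random

def sample_rwise_hash(r, p, L):
    """Sample h : [n] -> [L] (for any n <= p, p prime) from an r-wise uniform
    family: a uniformly random polynomial of degree < r over GF(p) takes
    r-wise uniform values at any r distinct points, since r point evaluations
    determine a degree-(r-1) polynomial. Costs O(r log p) random bits."""
    coeffs = [random.randrange(p) for _ in range(r)]
    def h(x):
        val = 0
        for c in coeffs:            # Horner evaluation of the polynomial mod p
            val = (val * x + c) % p
        return val % L              # only approximately uniform unless L | p
    return h
```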
Fooling CNFs. Gopalan, Meka, and Reingold [17] have given an efficient explicit PRG that fools the class of small-width CNFs:
Theorem 3.4 (PRG for Small-Width CNFs).
There is an explicit PRG \(\mathscr{G}_{\mathrm{GMR}} = \mathscr{G}_{\mathrm{GMR}}(w,\delta _\mathrm{CNF})\) that \(\delta _\mathrm{CNF}\) -fools the class of all width- \(w\) CNF formulas over \(\lbrace -1,1\rbrace ^n\) and has seed length
\[O(w^2 \log ^2(w \log (1/\delta _\mathrm{CNF})) + w \log (w)\log (1/\delta _\mathrm{CNF}) + \log \log n).\]

4 Our PRG

The Meka–Zuckerman generator. As stated earlier, the PRG that we will analyze is a slight variant of a PRG first proposed by Meka and Zuckerman for fooling a single halfspace [39]. We begin by recalling the Meka–Zuckerman PRG.
Definition 4.1 (Meka–Zuckerman Generator).
The Meka–Zuckerman generator with parameters \(L, r_\mathrm{hash},r_\mathrm{bucket}\in [n]\) , denoted \(\mathscr{G}_{\mathrm{MZ}}\) , is defined as follows: Let \(\boldsymbol {h}: [n] \rightarrow [L]\) be an \(r_\mathrm{hash}\) -wise uniform hash function. Let \(\boldsymbol {y}^{1},\ldots ,\boldsymbol {y}^{L} \sim \lbrace -1,1\rbrace ^n\) be independent random variables, each \(r_\mathrm{bucket}\) -wise uniform. A draw from \(\mathscr{G}_{\mathrm{MZ}} = \mathscr{G}_{\mathrm{MZ}}(L,r_\mathrm{hash},r_\mathrm{bucket})\) is \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) where
\begin{equation*} \boldsymbol {z}_{\boldsymbol {h}^{-1}(\ell)} = \boldsymbol {y}^{\ell }_{\boldsymbol {h}^{-1}(\ell)} \quad \text{for all $\ell \in [L]$.} \end{equation*}
In words, an \(r_\mathrm{hash}\) -wise uniform hash \(\boldsymbol {h}\) is used to partition the variables \(x_1,\ldots ,x_n\) into \(L\) “buckets,” and then independently across buckets, the variables in each bucket are assigned according to an \(r_\mathrm{bucket}\) -wise uniform distribution.
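In code, a draw from \(\mathscr{G}_{\mathrm{MZ}}\) looks as follows (a sketch in which `sample_rwise_hash` and `rwise_uniform_signs` are assumed black-box samplers, returning a hash function and a length- \(n\) array of \(\pm 1\) values respectively, for the bounded-uniformity primitives of Section 3.2):

```python
import numpy as np

def draw_mz(n, L, r_hash, r_bucket, sample_rwise_hash, rwise_uniform_signs):
    """One draw z ~ G_MZ(L, r_hash, r_bucket) over {-1,1}^n."""
    h = sample_rwise_hash(r_hash, n, L)      # r_hash-wise uniform h : [n] -> [L]
    bucket_of = np.array([h(j) for j in range(n)])
    z = np.empty(n, dtype=int)
    for ell in range(L):
        # independent r_bucket-wise uniform string for bucket ell
        y_ell = rwise_uniform_signs(r_bucket, n)
        idx = np.flatnonzero(bucket_of == ell)   # the bucket h^{-1}(ell)
        z[idx] = y_ell[idx]                      # z agrees with y^ell there
    return z
```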
We note in passing that the generators of References [19, 56] also have this structure (though the choices of the parameters \(L, r_\mathrm{bucket}\) , and \(r_\mathrm{hash}\) differ from those in Reference [39]).
Our generator. Now we are ready to describe our generator and bound its seed length. Roughly speaking, our generator extends the Meka–Zuckerman generator by (i) additionally Xor-ing each bucket with an independent pseudorandom variable that fools CNF formulas; and (ii) globally Xor-ing the entire resulting \(n\) -bit string with an independent draw from a \(2k\) -wise uniform distribution.
Definition 4.2 (Our Generator).
Our generator, denoted \(\mathscr{G}\) , is parameterized by values \(L, r_\mathrm{hash}, r_\mathrm{bucket},\) \(k,w \in [n]\) , \(\delta _\mathrm{CNF}\in (0,1)\) and is defined as follows: Let:
\(\boldsymbol {h}, \boldsymbol {y}^1,\ldots ,\boldsymbol {y}^L\) be defined as in the Meka–Zuckerman generator with parameters \(L\) , \(r_\mathrm{hash}\) , and \(r_\mathrm{bucket}\) .
\(\tilde{\boldsymbol {y}}^1,\ldots ,\tilde{\boldsymbol {y}}^L\sim \lbrace -1,1\rbrace ^n\) be independent draws from \(\mathscr{G}_{\mathrm{GMR}}(w,\delta _\mathrm{CNF})\) .
\(\boldsymbol {y}^\star \sim \lbrace -1,1\rbrace ^n\) be \(2k\) -wise uniform.
Define the random variable \(\breve{\boldsymbol {y}} \sim \lbrace -1,1\rbrace ^n\) by
\[\breve{\boldsymbol {y}}_{\boldsymbol {h}^{-1}(\ell)} = \left(\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell \right)_{\boldsymbol {h}^{-1}(\ell)} \qquad \text{for all $\ell \in [L]$},\]
where \(\oplus\) denotes bitwise Xor. A draw from our generator \(\mathscr{G} = \mathscr{G}(L,r_\mathrm{hash},r_\mathrm{bucket},k,w,\delta _\mathrm{CNF})\) is \(\boldsymbol {z}\sim \lbrace -1,1\rbrace ^n\) where \(\boldsymbol {z}= \breve{\boldsymbol {y}} \oplus \boldsymbol {y}^\star\) .
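Continuing the sketch from Definition 4.1, a draw from our generator composes the Meka–Zuckerman draw with the two Xors (here `gmr_draws[ell]` stands in for an independent draw from \(\mathscr{G}_{\mathrm{GMR}}\) , treated as a black box, and `y_star` is a \(2k\) -wise uniform string; on \(\lbrace -1,1\rbrace\) , bitwise Xor is coordinatewise multiplication):

```python
import numpy as np

def draw_ours(n, L, h, y_buckets, gmr_draws, y_star):
    """One draw z ~ G: within each bucket h^{-1}(ell), Xor the r_bucket-wise
    uniform string y_buckets[ell] with the CNF-fooling string gmr_draws[ell],
    then Xor the whole n-bit result with the 2k-wise uniform string y_star.
    All inputs are length-n arrays over {-1,1}; h maps [n] -> [L]."""
    z = np.empty(n, dtype=int)
    for j in range(n):
        ell = h(j)
        z[j] = y_buckets[ell][j] * gmr_draws[ell][j]  # bucketwise Xor on {-1,1}
    return z * y_star                                 # global Xor with y_star
```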
Recalling the standard constructions of \(r\) -wise uniform hash functions and random variables described at the end of Section 3, we have the following:
Fact 4.3 (Seed Length).
The seed length of our PRG \(\mathscr{G}\) with parameters \(L,r_\mathrm{hash},r_\mathrm{bucket},k,w,\delta _\mathrm{CNF}\) is
\begin{align*} &\le O\big (r_\mathrm{hash}\cdot \log (nL) + L\cdot r_\mathrm{bucket}\cdot \log n && \text{(seed length for $\mathscr{G}_\mathrm{MZ}$)}\\ &\ \ \ + L \cdot (w^2 \log ^2(w \log (1/\delta _\mathrm{CNF})) + w \log (w)\log (1/\delta _\mathrm{CNF}) + \log \log n) && \text{($L$ copies of $\mathscr{G}_{\mathrm{GMR}}$)}\\ &\ \ \ + k\log n\big). && \text{($2k$-wise uniform string)} \end{align*}

4.1 Setting of Parameters

We close this section with the parameter settings for fooling intersections of \(m\) halfspaces over \(\lbrace -1,1\rbrace ^n\) . Fix \(\varepsilon \in (0,1)\) to be an arbitrarily small absolute constant; the parameters we now specify will be for fooling to accuracy \(O_\varepsilon (\delta) = O(\delta)\) . We first define a few auxiliary parameters:
\begin{align*} \lambda &= \frac{\delta }{\sqrt {\log (m/\delta) \log m}} && \text{(dictated by Equation (31))} \\ \tau &= \frac{\delta ^{1+\varepsilon }}{(\log m)^{2.5+2\varepsilon }} && \text{(dictated by Equation (30))} \\ d &= \text{constant depending only on } \varepsilon . && \text{(dictated by Equation (30))} \end{align*}
The precise value of \(d =d(\varepsilon)\) will be specified in the proof of Theorem 8.1. We will instantiate our generator \(\mathscr{G} = \mathscr{G}(L,r_\mathrm{hash},r_\mathrm{bucket},k,w,\delta _\mathrm{CNF})\) with parameters:
\begin{align*} L &= \frac{(\log m)^5}{\delta ^{2+\varepsilon }} && \text{(constrained by Equation (30), chosen to optimize seed length)} \\ r_\mathrm{hash}&= C_1 \log (Lm/\delta) && \text{(dictated by Proposition 8.11)} \\ r_\mathrm{bucket}&= \log (m/\delta) && \text{(dictated by Lemma 8.10)} \\ k &= \frac{C_2 \log (m/\delta)\log \log (m/\delta)}{\tau ^2} && \text{(dictated by Lemma 5.1)} \\ w &= \frac{2k}{L} && \text{(dictated by Proposition 8.11)} \\ \delta _\mathrm{CNF}&= \frac{\delta }{L}\cdot \Big (\frac{\lambda }{m\sqrt {n}}\Big)^{d-1}, && \text{(dictated by Equation (30))} \end{align*}
where \(C_1\) and \(C_2\) are absolute constants specified in the proofs of Proposition 8.11 and Lemma 5.1, respectively.
Our seed length. By Fact 4.3, our overall seed length is
\begin{equation} \mathrm{polylog}(m) \cdot \delta ^{-{(2+\varepsilon)}}\cdot \log n \end{equation}
(2)
for any absolute constant \(\varepsilon \in (0,1)\) .
Remark 4.4.
As alluded to in the introduction, our techniques can also be used to show that the Meka–Zuckerman generator itself fools the class of intersections of \(m\) halfspaces over \(\lbrace -1,1\rbrace ^n\) . However, this would require setting the parameters \(L, r_\mathrm{hash}\) , and \(r_\mathrm{bucket}\) to be somewhat larger than the values used above and would result in a slightly worse seed length of \(\mathrm{poly}(\log m,1/\delta ,\log n)\) than our \(\mathrm{poly}(\log m,1/\delta)\cdot \log n\) . Briefly, such an analysis would use the fact that bounded-uniformity distributions fool CNF formulas [3, 52]; our analysis instead uses the (more efficient) Reference [17] generator for this purpose.

5 Reduction to Standardized Polytopes

5.1 A Reduction from Fooling Polytopes to Fooling Standardized Polytopes

In this section, we reduce from the problem of fooling general \(m\) -facet polytopes to the problem of fooling \(m\) -facet \((k,\tau)\) -standardized polytopes (Definition 3.1). The main technical result we prove in this section is the following:
Lemma 5.1 (Approximating Arbitrary Polytopes by \((k,\tau)\) -standardized Polytopes Under Bounded-uniformity Distributions)
There is a universal constant \(C_2 \gt 0\) such that the following holds: Fix \(m \ge 1\) and \(0 \lt \delta , \tau \lt 1/2\) such that
\begin{equation} k \coloneqq \frac{C_2 \log (m/\delta)\log \log (m/\delta)}{\tau ^2} \le \frac{n}{2}. \end{equation}
(3)
For every \(m\) -facet polytope \(Ax \le b\) in \(\mathds R^n\) , there is an \(m\) -facet \((k,\tau)\) -standardized polytope \(A^{\prime }x \le b^{\prime }\) in \(\mathds R^n\) such that if \(\boldsymbol {y}\sim \lbrace -1,1\rbrace ^n\) is \(2k\) -wise uniform, then
\begin{equation} {\bf Pr}\big [\mathds {1}[A\boldsymbol {y}\le b] \ne \mathds {1}[A^{\prime } \boldsymbol {y}\le b^{\prime }]\big ] \le \delta . \end{equation}
(4)
Remark 5.2.
Had we been content in this lemma with the worse value of \(k = O\big (\log ^2(m/\delta)/\tau ^2\big)\) , then the result would essentially be implicit in Reference [9, Theorem 5.4] (and Reference [18, Theorem 7.4]), using only \((k+2)\) -wise uniformity. To save essentially a \(\log (m/\delta)\) factor, we give a modified proof in Appendix A.
We stress that Lemma 5.1 establishes that \(\mathds {1}[Ax \le b]\) is well-approximated by \(\mathds {1}[A^{\prime }x \le b^{\prime }]\) under both the uniform distribution and the pseudorandom distribution constructed by our generator, since both of these distributions are \(2k\) -wise uniform. (Note that a draw \(\boldsymbol {z}= \breve{\boldsymbol {y}} \oplus \boldsymbol {y}^\star\) from our generator is indeed \(2k\) -wise uniform, since \(\boldsymbol {y}^\star\) is; indeed, Lemma 5.1 is the motivation for why our construction includes a bitwise-Xor with \(\boldsymbol {y}^\star\) .) This is crucial: In general, given a function \(F\) and an approximator \(F^{\prime }\) that is close to \(F\) only under the uniform distribution (i.e., \({{\bf Pr}}[F(\boldsymbol {u}) \ne F^{\prime }(\boldsymbol {u})]\) is small), fooling \(F^{\prime }\) does not suffice to fool \(F\) itself.
Given Lemma 5.1, to prove Theorem 1.1 it is sufficient to prove the following:
Theorem 5.3 (Fooling \((k,\tau)\) -standardized Polytopes)
Let \(\mathscr{G}\) be our generator with parameters as set in Section 4.1. For all \(m\) -facet \((k,\tau)\) -standardized polytopes \(A^{\prime }x \le b^{\prime }\) ,
\[\bigg | \mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [A^{\prime }\boldsymbol {u}\in \mathcal {O}_{b^{\prime }} \big ] - \mathop {{\bf Pr}}_{\boldsymbol {z}\sim \mathscr{G}}\big [ A^{\prime }\boldsymbol {z}\in \mathcal {O}_{b^{\prime }}\big ] \bigg | = O(\delta).\]
Proof of Theorem 1.1 Assuming Theorem 5.3 and Lemma 5.1
Let \(Ax \le b\) be any \(m\) -facet polytope in \(\mathds R^n\) . Given \(\delta \gt 0\) , we recall that \(\tau = \delta ^{1+\varepsilon }/(\log m)^{2.5+\varepsilon }.\) If the quantity (3) is greater than \(n/2\) , then the claimed seed length from Fact 4.3 is greater than \(n\) and the conclusion of Theorem 1.1 trivially holds, so we suppose that (3) is at most \(n/2.\) Let \(A^{\prime }x \le b^{\prime }\) be the \(m\) -facet \((k,\tau)\) -standardized polytope given by Lemma 5.1. We have
\begin{align*} \mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}[A\boldsymbol {u}\in \mathcal {O}_b] &= \mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}[A^{\prime }\boldsymbol {u}\in \mathcal {O}_{b^{\prime }}] \pm \delta &&\text{(Lemma 5.1 applied to $\boldsymbol {u}$)}\\ &= \mathop {{\bf Pr}}_{\boldsymbol {z}\sim \mathscr{G}}[A^{\prime }\boldsymbol {z}\in \mathcal {O}_{b^{\prime }}] \pm \delta \pm \delta &&\text{(Theorem 5.3)}\\ &= \mathop {{\bf Pr}}_{\boldsymbol {z}\sim \mathscr{G}}[A\boldsymbol {z}\in \mathcal {O}_b] \pm \delta \pm \delta \pm \delta &&\text{(Lemma 5.1 applied to $\boldsymbol {z}$)} \end{align*}
and Theorem 1.1 follows by rescaling \(\delta\) appropriately.□
The rest of the article is devoted to proving Theorem 5.3.

6 Bentkus’s Mollifier and Its Properties

In this section we introduce and analyze Bentkus’s orthant mollifier \(\widetilde{\mathcal {O}}_b: \mathds R^m \rightarrow (0,1)\) , which is a smoothed version of the translated orthant indicator function \(\mathcal {O}_b: \mathds R^m \rightarrow \lbrace 0,1\rbrace\) from Section 3.1.
Definition 6.1 (Gaussian-mollified Halfline).
For \(\theta \in \mathds R\) and \(\lambda \gt 0\) , we define the \(C^\infty\) function \({\widetilde{\mathds {1}}_{\theta ,\lambda }}: \mathds R\rightarrow (0,1)\) ,
\[{\widetilde{\mathds {1}}_{\theta ,\lambda }}(t) = \mathop {{\bf E}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)}\big [\mathds {1}[t + \lambda \boldsymbol {g}\le \theta ]\big ].\]
Definition 6.2 (Bentkus’s Orthant Mollifier).
For \(b \in \mathds R^m\) and \(\lambda \gt 0\) , the Bentkus \(\lambda\) -mollifier for \(\mathcal {O}_b\) is defined to be the \(C^\infty\) function \(\widetilde{\mathcal {O}}_{b,\lambda }: \mathds R^m \rightarrow (0,1)\) ,
\[\widetilde{\mathcal {O}}_{b,\lambda }(v) = \mathop {{\bf E}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)^m} \big [ \mathcal {O}_b(v + \lambda \boldsymbol {g})\big ].\]
Since \(\mathcal {O}_b(v) = \prod _{i=1}^m \mathds {1}[v_i \le b_i]\) and \(\mathcal {N}(0,1)^m\) is a product distribution, the mollifier \(\widetilde{\mathcal {O}}_{b,\lambda }\) can be equivalently defined as follows:
\begin{equation} \widetilde{\mathcal {O}}_{b,\lambda }(v) = \prod _{i=1}^m {\widetilde{\mathds {1}}_{b_i,\lambda }}(v_i). \end{equation}
(5)
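In particular, Equation (5) makes \(\widetilde{\mathcal {O}}_{b,\lambda }\) easy to evaluate numerically, since \({\widetilde{\mathds {1}}_{\theta ,\lambda }}(t) = \Phi ((\theta - t)/\lambda)\) , where \(\Phi\) is the standard Gaussian CDF. The following Python sketch (an illustration on our part, not part of the formal development; all names are ours) evaluates the mollifier this way:

    import math

    def mollified_halfline(theta, lam, t):
        # widetilde{1}_{theta,lam}(t) = Pr_{g ~ N(0,1)}[t + lam*g <= theta]
        #                             = Phi((theta - t) / lam).
        return 0.5 * math.erfc((t - theta) / (lam * math.sqrt(2)))

    def bentkus_mollifier(b, lam, v):
        # Equation (5): coordinate-wise product of Gaussian-mollified halflines.
        return math.prod(mollified_halfline(bi, lam, vi) for bi, vi in zip(b, v))
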
This product structure of Bentkus’s mollifier will be crucially important for us in the analysis that we carry out in Section 8.1. We note the following translation property of Bentkus’s mollifier:
Fact 6.3.
For all \(b,v,\Delta \in \mathds R^m\) and \(\lambda \gt 0\) , we have \(\widetilde{\mathcal {O}}_{b,\lambda }(v+\Delta) = \widetilde{\mathcal {O}}_{b-v,\lambda }(\Delta)\) .
In Section 8.1 we will also use the following global bound on the magnitude of the derivatives of the Gaussian-mollified halfline:
Fact 6.4 (Standard; see Exercise 11.41 in Reference [47]).
For all \(\theta \in \mathds R\) , \(\lambda \gt 0\) , and integer \(d\ge 1\) ,
\[\big \Vert {\widetilde{\mathds {1}}_{\theta ,\lambda }}^{(d)}\big \Vert _\infty = O_d\bigg (\frac{1}{\lambda }\bigg)^d.\]
The following result, from Bentkus [5, Theorem 3(ii)], can be viewed as a multidimensional generalization of Fact 6.4. (Strictly speaking, Reference [5] only considers \(b\) ’s of the form \((\theta , \theta , \ldots , \theta)\) , but by translation-invariance the bound holds for all \(b \in \mathds R^m.\) )
Theorem 6.5 (Bounded Sum of Derivatives).
For all \(m \ge 2\) , \(b \in \mathds R^m\) , \(\lambda \gt 0\) , and integer \(d\ge 1\) ,
\[\sup _{v \in \mathds R^m}\Bigg \lbrace \sum _{|\alpha |=d} |\partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(v)|\Bigg \rbrace = O_d\bigg (\frac{\sqrt {\log m}}{\lambda }\bigg)^d.\]
Recall from (1) that \(\Game _{-\Lambda }\mathcal {O}_b = \mathcal {O}_b \setminus \mathcal {O}_{b - (\Lambda , \ldots , \Lambda)}\) and \(\Game _{+\Lambda }\mathcal {O}_b = \mathcal {O}_{b + (\Lambda , \ldots , \Lambda)} \setminus \mathcal {O}_{b}.\) We will use the following notions of approximation for translated orthants:
Definition 6.6 (Inner and Outer Approximators for Orthants).
We say that \(\Upsilon : \mathds R^m \rightarrow [0,1]\) is a \((\Lambda ,\delta)\) -inner approximator for \(\mathcal {O}_b\) if
\[|\Upsilon (v) - \mathcal {O}_b(v)| \le \delta \quad \text{for all $v \notin \Game _{-\Lambda }\mathcal {O}_b$}.\]
Similarly, we say that \(\Upsilon\) is a \((\Lambda ,\delta)\) -outer approximator for \(\mathcal {O}_b\) if
\[|\Upsilon (v) - \mathcal {O}_b(v)| \le \delta \quad \text{for all $v \notin \Game _{+\Lambda }\mathcal {O}_b$}.\]
The connection between Bentkus’s mollifier and these notions of approximation is established in the following claim:
Lemma 6.7 (Bentkus’s Mollifier, Appropriately Translated, Yields Inner and Outer Approximators for Translated Orthants).
For all \(b \in \mathds R^m\) and \(\lambda ,\delta \in (0,1)\) , there are \(b^{{\mathrm{in}}},b^{{\mathrm{out}}} \in \mathds R^m\) such that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda },\widetilde{\mathcal {O}}_{b^{{\mathrm{out}}},\lambda }\) are \((\Lambda ,\delta)\) -inner and -outer approximators for \(\mathcal {O}_b\) , respectively, where \(\Lambda = \Theta (\lambda \sqrt {\log (m/\delta)}).\)
Proof.
Let \(b^{{\mathrm{in}}} = b - \beta \mathds {1}_m\) where \(\beta = \Theta (\lambda \sqrt {\log (m/\delta)}) \lt \Lambda\) will be specified in more detail later. We show below that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }\) is a \((\Lambda ,\delta)\) -inner approximator for \(\mathcal {O}_b\) ; an analogous argument in which the \(v \in \mathcal {O}_b\) and \(v \notin \mathcal {O}_b\) cases switch roles shows that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{out}}},\lambda }\) is a \((\Lambda ,\delta)\) -outer approximator for \(\mathcal {O}_b\) , where \(b^{{\mathrm{out}}} = b + \beta \mathds {1}_m.\)
Fix \(v \notin \Game _{-\Lambda }\mathcal {O}_b\) . There are two possibilities: either \(v \in \mathcal {O}_b\) or \(v \notin \mathcal {O}_b.\) We first consider the case in which \(v\) lies in \(\mathcal {O}_b\) . Since \(v \notin \Game _{-\Lambda }\mathcal {O}_b\) , we have \(v_i \le b_i - \Lambda\) for all \(i \in [m].\) Since \(\mathcal {O}_b(v)=1\) , we must show that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }(v) \ge 1-\delta\) . Recalling Equation (5) and the fact that the function \({\widetilde{\mathds {1}}_{\theta ,\lambda }}: \mathds R\rightarrow (0,1)\) is monotone decreasing for all \(\theta \in \mathds R\) and \(\lambda \gt 0\) , it suffices to show that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }(b - \Lambda \mathds {1}_m) \ge 1-\delta .\) Again by Equation (5) this holds if and only if
\[\prod _{i=1}^m {\widetilde{\mathds {1}}_{b_i - \beta ,\lambda }}(b_i - \Lambda) \ge 1-\delta ,\]
which is equivalent to
\[\bigg (\mathop {{\bf Pr}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)}\big [\boldsymbol {g}\le (\Lambda - \beta)/\lambda \big ]\bigg)^m \ge 1-\delta ,\]
which holds if
\begin{equation} \mathop {{\bf Pr}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)}[\boldsymbol {g}\le (\Lambda - \beta)/\lambda ] \ge 1-\delta /m. \end{equation}
(6)
By the well-known Gaussian tail bound \({{\bf Pr}}[ \boldsymbol {g}\ge t] \le \frac{1}{t\sqrt {2\pi }} e^{-t^2/2}\) for \(t \gt 0\) (see, e.g., Reference [13], Section 7.1), we see that to achieve Equation (6) it suffices to have \(\Lambda - \beta \ge C \lambda \sqrt {\ln (m/\delta)}\) for an absolute constant \(C\gt 0\) , and hence \(\Lambda = \Theta (\lambda \sqrt {\log ({m}/\delta)})\) suffices.
Now, we turn to the case in which \(v \notin \mathcal {O}_b\) , and hence for some \(i \in [m],\) we have \(v_i \gt b_i\) ; without loss of generality, we suppose that \(v_1 \gt b_1.\) Since \(\mathcal {O}_b(v)=0\) in this case, we must show that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }(v) \le \delta\) . By Equation (5) this holds if and only if
\[\prod _{i=1}^m {\widetilde{\mathds {1}}_{b_i - \beta ,\lambda }}(v_i) \le \delta ,\]
which holds if
\[{\widetilde{\mathds {1}}_{b_1 - \beta ,\lambda }}(v_1) \le \delta ,\]
which is equivalent to
\[\mathop {{\bf Pr}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)}[v_1 + \lambda \boldsymbol {g}\le b_1 - \beta ] \le \delta .\]
Recalling that \(v_1 \gt b_1\) , it suffices to have
\[\mathop {{\bf Pr}}_{\boldsymbol {g}\sim \mathcal {N}(0,1)}[\boldsymbol {g}\le - \beta /\lambda ] \le \delta ,\]
which holds (with room to spare) for our choice of \(\beta\) by the standard Gaussian tail bound.□
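As a quick numeric sanity check of this choice of \(\beta\) (with illustrative parameter values of our own choosing), one can verify Equation (6) directly:

    import math

    def gaussian_cdf(x):
        return 0.5 * math.erfc(-x / math.sqrt(2))

    # Illustrative parameters, chosen only for this check.
    m, delta, lam, C = 1000, 0.01, 0.1, 2.0
    beta = C * lam * math.sqrt(math.log(m / delta))  # beta = Theta(lam*sqrt(log(m/delta)))
    Lam = 2 * beta                                   # so that Lam - beta = beta
    # Equation (6): Pr[g <= (Lam - beta)/lam] >= 1 - delta/m.
    assert gaussian_cdf((Lam - beta) / lam) >= 1 - delta / m
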

6.1 The Connection between Inner/Outer Approximators and CDF Distance

The following elementary properties of inner/outer approximators will be useful for us:
Fact 6.8.
Fix \(b \in \mathds R^m\) and let \(\Upsilon ^{{\mathrm{in}}}, \Upsilon ^{{\mathrm{out}}}\) be \((\Lambda ,\delta)\) -inner and -outer approximators for \(\mathcal {O}_b\) . Then
(1)
\(\Upsilon ^{{\mathrm{in}}}(v) - \delta \le \mathcal {O}_b(v) \le \Upsilon ^{{\mathrm{out}}}(v) + \delta\) for all \(v \in \mathds R^m\) .
(2)
\(\Upsilon ^{{\mathrm{in}}}\) is a \((\Lambda ,\delta)\) -outer approximator for \(\mathcal {O}_{b-\Lambda \mathds {1}_m}\) , and similarly \(\Upsilon ^{{\mathrm{out}}}\) is a \((\Lambda ,\delta)\) -inner approximator for \(\mathcal {O}_{b + \Lambda \mathds {1}_m}\) .
The next lemma is straightforward but very useful for us. Intuitively, it says that for an \(\mathds R^m\) -valued random variable \(\tilde{\boldsymbol {v}}\) to fool a translated orthant \(\mathcal {O}_b\) relative to another \(\mathds R^m\) -valued random variable \(\boldsymbol {v}\) , it suffices to (i) have \(\tilde{\boldsymbol {v}}\) fool both inner and outer approximators for \(\mathcal {O}_b\) , and (ii) establish anticoncentration of the original random variable \(\boldsymbol {v}\) at the inner and outer boundaries of \(\mathcal {O}_b\) . We explain in detail how we will use this lemma after giving its proof below.
Lemma 6.9.
Let \(\Upsilon ^{\mathrm{in}}, \Upsilon ^{\mathrm{out}}: \mathds R^m \rightarrow [0,1]\) be \((\Lambda ,\delta)\) -inner and -outer approximators for \(\mathcal {O}_b\) . Let \(\boldsymbol {v}\) and \(\tilde{\boldsymbol {v}}\) be \(\mathds R^m\) -valued random variables satisfying:
\begin{equation} \big | {{\bf E}}\big [\Upsilon (\boldsymbol {v})\big ] - {{\bf E}}\big [\Upsilon (\tilde{\boldsymbol {v}})\big ]\big | \le \gamma \end{equation}
(7)
for both \(\Upsilon \in \lbrace \Upsilon ^{{\mathrm{out}}},\Upsilon ^{{\mathrm{in}}}\rbrace\) . Then
\[\big | {{\bf Pr}}\big [ \boldsymbol {v}\in \mathcal {O}_b\big ] - {{\bf Pr}}\big [\tilde{\boldsymbol {v}}\in \mathcal {O}_b\big ] \big | \le \gamma + 2\delta + {{\bf Pr}}\big [ \boldsymbol {v}\in \Game _{\pm \Lambda }\mathcal {O}_b\big ].\]
Proof.
The proof follows similar lines to the arguments used to prove Lemma 3.3 in Reference [19]. We first note that
\begin{align*} {{\bf Pr}}\big [\tilde{\boldsymbol {v}} \in \mathcal {O}_{b}\big ] & \le {{\bf E}}\big [\Upsilon ^{{\mathrm{out}}}(\tilde{\boldsymbol {v}})\big ] + \delta &&\text{(Item 1 of Fact 6.8)} \\ &\le \left({{\bf E}}\big [\Upsilon ^{{\mathrm{out}}}(\boldsymbol {v})\big ] + \gamma \right) + \delta &&\text{(Equation (7) with $\Upsilon = \Upsilon ^{{\mathrm{out}}}$)} \\ &\le {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_{b+\Lambda \mathds {1}_m}\big ] + \gamma + 2\delta . &&\text{(Item 2 of Fact 6.8)} \end{align*}
Combining this with a symmetric argument for the lower bound, we have:
\begin{equation} {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_{b - \Lambda \mathds {1}_m}\big ] - \gamma -2\delta \le {{\bf Pr}}\big [\tilde{\boldsymbol {v}} \in \mathcal {O}_b \big ] \le {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_{b+ \Lambda \mathds {1}_m}\big ] + \gamma + 2\delta . \end{equation}
(8)
To convert this type of closeness into CDF closeness, we observe that
\begin{align*} {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_{b+ \Lambda \mathds {1}_m}\big ] &= {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_b\big ] + {{\bf Pr}}\big [\boldsymbol {v}\in \Game _{+\Lambda }\mathcal {O}_b\big ] \\ {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_{b- \Lambda \mathds {1}_m}\big ] &= {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_{b}\big ] -{{\bf Pr}}\big [\boldsymbol {v}\in \Game _{-\Lambda }\mathcal {O}_b \big ]. \end{align*}
Plugging these identities into Equation (8), we conclude that
\begin{align*} {{\bf Pr}}\big [\tilde{\boldsymbol {v}} \in \mathcal {O}_b \big ] &= {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_b \big ] \pm \big (\gamma + 2\delta + {{\bf Pr}}\big [\boldsymbol {v}\in \Game _{+\Lambda }\mathcal {O}_b\big ] +{{\bf Pr}}\big [\boldsymbol {v}\in \Game _{-\Lambda }\mathcal {O}_b \big ] \big) \\ &= {{\bf Pr}}\big [\boldsymbol {v}\in \mathcal {O}_b \big ] \pm \big (\gamma + 2\delta + {{\bf Pr}}\big [\boldsymbol {v}\in \Game _{\pm \Lambda }\mathcal {O}_b\big ] \big), \end{align*}
thus completing the proof of Lemma 6.9.□

6.1.1 Applying Lemma 6.9 in the context of Theorem 5.3, and the organization of the rest of this article.

Applying Lemma 6.9 with \(\boldsymbol {v}\) and \(\tilde{\boldsymbol {v}}\) being \(A\boldsymbol {u}\) and \(A \boldsymbol {z}\) , respectively, the task of bounding
\[\Big | \mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [A\boldsymbol {u}\in \mathcal {O}_b \big ] - \mathop {{\bf Pr}}_{\boldsymbol {z}\sim \mathscr{G}}\big [ A\boldsymbol {z}\in \mathcal {O}_b\big ] \Big |\]
reduces to the following two-step program:
(1)
Establishing anticoncentration within orthant boundaries: bounding \({{\bf Pr}}[A\boldsymbol {u}\in \Game _{\pm \Lambda }\mathcal {O}_b]\) ; and,
(2)
Fooling Bentkus’s mollifier: bounding \(|{{\bf E}}[\widetilde{\mathcal {O}}(A\boldsymbol {u})] - {{\bf E}}[\widetilde{\mathcal {O}}(A\boldsymbol {z})]|\) for \(\widetilde{\mathcal {O}} \in \lbrace \widetilde{\mathcal {O}}_{b^{{\mathrm{out}}},\lambda },\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }\rbrace\) , the outer and inner approximators for \(\mathcal {O}_b\) given by Lemma 6.7.
Section 7 is devoted to the former and Section 8 to the latter. In Section 9, we put these pieces together to prove Theorem 5.3.

7 Boolean Anticoncentration Within Orthant Boundaries

The main result of this section is Theorem 7.1, which provides the first step of the two-step program described at the end of Section 6:
Theorem 7.1 (Boolean Anticoncentration within Orthant Boundaries).
Assume \(A \in \mathds R^{m\times n}\) satisfies the following property: Each of its row vectors has a \(\tau\) -regular subvector of 2-norm 1, where \(\tau\) is as set in Section 4.1. Then for all \(b \in \mathds R^m\) and \(\Lambda \ge \tau\) , we have
\[\mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n} \big [A\boldsymbol {u}\in \Game _{\pm \Lambda } \mathcal {O}_{b}\big ]= O\big (\Lambda \sqrt {\log m}\big).\]
En route to proving Theorem 7.1, we will establish a “Littlewood–Offord theorem for polytopes,” Theorem 2.2, that was stated in Section 2.2.1. Theorem 2.2 will in fact be obtained as a special case of a more general result about intersections of \(m\) arbitrary unate functions (namely, Lemma 7.13).
Definition 7.2 (Unateness).
A function \(F: \lbrace -1,1\rbrace ^n\rightarrow \lbrace 0,1\rbrace\) is unate in direction \(\sigma \in \lbrace -1,1\rbrace ^n\) if the function \(G(x_1,\ldots ,x_n) = F(\sigma _1 x_1,\ldots , \sigma _n x_n)\) is a monotone Boolean function, meaning that \(G(x) \le G(y)\) whenever \(x_j \le y_j\) for all \(j \in [n].\) We refer to \(\sigma\) as the orientation of \(F\) .
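On small examples, unateness can be checked by brute force; the following Python sketch (illustrative only, names are ours) tests Definition 7.2 by verifying monotonicity of \(G\) along every hypercube edge:

    from itertools import product

    def is_unate(F, n, sigma):
        # Definition 7.2: F is unate in direction sigma iff
        # G(x) = F(sigma_1 x_1, ..., sigma_n x_n) is monotone.
        def G(x):
            return F(tuple(s * xi for s, xi in zip(sigma, x)))
        for x in product((-1, 1), repeat=n):
            for j in range(n):
                if x[j] == -1:
                    y = x[:j] + (1,) + x[j + 1:]
                    if G(x) > G(y):  # monotonicity violated along edge {x, y}
                        return False
        return True

    # Example: the halfspace 1[w.x <= theta] is unate with sigma_j = -sign(w_j).
    w, theta = (3, -2, 1), 0
    F = lambda x: int(sum(wj * xj for wj, xj in zip(w, x)) <= theta)
    assert is_unate(F, 3, tuple(-1 if wj > 0 else 1 for wj in w))
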
Our analysis, dealing as it does with intersections of unate functions, is somewhat reminiscent of that of Reference [26], and indeed we will establish the main result of Reference [26]—an upper bound of \(O(\sqrt {n \log m})\) on the average sensitivity of any intersection of \(m\) unate functions—in the course of our analysis.

7.1 Caps and Their Boundary Edges

Let \(G\) and \(H\) be subsets of \(\lbrace -1,1\rbrace ^n\) . We typically think of \(G\) as a \(G\) eneral/arbitrary set and \(H\) as a \(H\) alfspace, though formally \(H\) will only need to be unate. Throughout this section, we write \(\sigma \in \lbrace -1,1\rbrace ^n\) to denote the orientation of \(H\) .
We call the set \(G \setminus H\) the cap, the set \(G\cap H\) the body, and the complement of \(G\) the exterior. Please refer to Figure 1, where \(G\) is the union of the two regions with blue shading and \(H\) is the gray-shaded region (depicted as a halfspace in the figure). The upward arrows in the diagram illustrate some edges of the hypercube. We have oriented these edges according to \(\sigma\) : For an edge \(\lbrace x,y\rbrace\) in the \(j\) th direction in which \(x_j = -1\) and \(y_j = 1\) , the tail of the corresponding arrow represents \(x\) if \(\sigma _j = -1\) and \(y\) if \(\sigma _j = 1\) . Note in particular that the edges are oriented “away” from \(H\) (i.e., so that \(H\) is antimonotone with respect to the edge orientations).
Fig. 1. Illustration of a cap and body.
We will be concerned with the boundary edges for the cap \(G \setminus H\) ; these are edges that have one endpoint inside \(G \setminus H\) and one endpoint outside it.
Definition 7.3 (Edge Boundary).
For a general set \(F \subseteq \lbrace -1,1\rbrace ^n\) , let \({\mathcal {E}}(F)\) denote the fraction of all \(n2^{n-1}\) hypercube edges that are boundary edges for \(F\) .
We distinguish the three possible types of boundary edges of the cap \(G \setminus H\) :
Body \(\rightarrow\) Cap (BC) edges: the red edges in the diagram. Formally, these are edges where the tail is in the body \(G\cap H\) and the head is in the cap \(G\setminus H\) .
Exterior \(\rightarrow\) Cap (EC) edges: the green edges in the diagram. Formally, these are edges where the tail is not in \(G\) , and the head is in the cap \(G\setminus H\) .
Cap \(\rightarrow\) Exterior (CE) edges: the purple edges in the diagram. Formally, these are edges where the tail is in the cap \(G\setminus H\) and the head is not in \(G\) .
Note that there are no Cap \(\rightarrow\) Body (CB) edges. Formally, these would be the last possibility for \(G \setminus H\) boundary edges, namely, ones with tail in the cap \(G \setminus H\) and head in the body \(G \cap H\) . But these cannot exist due to the antimonotonicity of \(H\) vis-à-vis the edges; if the tail is already not in \(H\) , then the head cannot be in \(H\) .
Given a cap \(C = G \setminus H\) , we write \(\mathrm{BC}(G,H)\) , \(\mathrm{EC}(G,H)\) , \(\mathrm{CE}(G,H)\) for the fraction of hypercube edges of each of the three above types. Therefore, \({\mathcal {E}}(C) = \mathrm{BC}(G,H) + \mathrm{EC}(G,H) + \mathrm{CE}(G,H)\) .
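To make these definitions concrete, the following brute-force sketch (illustrative only; \(G\) and \(H\) are given as sets of \(\pm 1\) tuples, and all names are ours) computes the fractions \(\mathrm{BC}(G,H)\) , \(\mathrm{EC}(G,H)\) , and \(\mathrm{CE}(G,H)\) :

    from itertools import product

    def edge_type_fractions(G, H, sigma, n):
        # Fractions of BC, EC, and CE edges for the cap G \ H. Following the
        # orientation convention of Figure 1, the tail of the edge in
        # direction j is the endpoint whose j-th coordinate equals sigma_j.
        bc = ec = ce = 0
        for x in product((-1, 1), repeat=n):
            for j in range(n):
                if x[j] != sigma[j]:
                    continue                       # x is the tail of this edge
                y = x[:j] + (-x[j],) + x[j + 1:]   # y is the head
                head_in_cap = y in G and y not in H
                if x in G and x in H and head_in_cap:
                    bc += 1                        # Body -> Cap
                elif x not in G and head_in_cap:
                    ec += 1                        # Exterior -> Cap
                elif (x in G and x not in H) and y not in G:
                    ce += 1                        # Cap -> Exterior
        total = n * 2 ** (n - 1)
        return bc / total, ec / total, ce / total
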
We will also be interested in the directed edge boundary of caps:
Definition 7.5 (Directed Edge Boundary).
For a cap \(G \setminus H\) , define
\begin{equation} {\vec{\mathcal {E}}}(G,H) = \mathrm{BC}(G,H) + \mathrm{EC}(G,H) - \mathrm{CE}(G,H), \end{equation}
(9)
the fraction of inward boundary edges minus the fraction of outward boundary edges.
It will be very useful for us to have an upper bound on \({\mathcal {E}}(G \cap H) - {\mathcal {E}}(G)\) , the change in \({\mathcal {E}}(G)\) when we intersect \(G\) with \(H\) (note that this quantity can be either positive or negative). The following fact is immediate from the definitions:
Fact 7.6 (Change in Boundary Size).
If \(G \setminus H\) is a cap, then
\begin{equation} {\mathcal {E}}(G \cap H) - {\mathcal {E}}(G) = \mathrm{BC}(G,H) -\mathrm{EC}(G,H) -\mathrm{CE}(G,H). \end{equation}
(10)
Comparing Equations (10) and (9), we plainly have:
Fact 7.7.
\({\mathcal {E}}(G \cap H) - {\mathcal {E}}(G) \le {\vec{\mathcal {E}}}(G,H)\) .
To get a quantitative bound, we have the following lemma:
Lemma 7.8.
For any cap \(C = G \setminus H\) ,
\[{\vec{\mathcal {E}}}(G,H) \le \frac{U(\mathrm{vol}(C))}{\sqrt {n}},\]
where \(\mathrm{vol}(C) = |C|/2^n\) and \(U\) denotes the function \(U(p) = 2p\sqrt {2\ln (1/p)}\) .
Proof.
This is a basic fact in analysis of Boolean functions. Identifying \(C\) with its indicator function \(C : \lbrace -1,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) , we have \(\mathrm{vol}(C) = {{\bf E}}[{C(\boldsymbol {u})}]\) and
\[{\vec{\mathcal {E}}}(G,H) = 2\mathop {{\bf E}}_{\substack{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n \\ \boldsymbol {j}\sim [n]}}[C(\boldsymbol {u})\sigma _{\boldsymbol {j}}\boldsymbol {u}_{\boldsymbol {j}}] = \frac{2}{n} \sum _{j=1}^n \sigma _j \widehat{C}(\lbrace j\rbrace),\]
where \(\widehat{C}(\lbrace j\rbrace)\) denotes the degree-1 Fourier coefficient of \(C\) corresponding to coordinate \(j\) . It is well known and elementary that for \(F : \lbrace -1,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) with \({{\bf E}}[F] = p\) , one has \(\sum _{j=1}^n |\widehat{F}(\lbrace j\rbrace)| \le O(p \sqrt {\ln (1/p)})\sqrt {n}\) ; see, e.g., Kane’s paper [26, Lemma 6] for the short proof. For the sake of an asymptotically tight constant, we can use the Cauchy–Schwarz inequality and the Fourier “Level-1 Inequality” [7, 21, 60] to get
\[\sum _{j=1}^n \sigma _j \widehat{C}(\lbrace j\rbrace) \le \sqrt {n}\cdot \sqrt {\sum _{j=1}^n \widehat{C}(\lbrace j\rbrace)^2} \le \sqrt {n}\cdot p \sqrt {2\ln (1/p)}. \qedhere\]
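The quantities in this proof can be checked exhaustively on small examples. The following sketch (illustrative only, names are ours) evaluates the expectation \(2\,{\bf E}_{\boldsymbol {u},\boldsymbol {j}}[C(\boldsymbol {u})\sigma _{\boldsymbol {j}}\boldsymbol {u}_{\boldsymbol {j}}]\) from the proof as well as the bound \(U(\mathrm{vol}(C))/\sqrt {n}\) ; on any example, the magnitude of the former is at most the latter, matching the Level-1 Inequality step:

    import math
    from itertools import product

    def fourier_expression(C, sigma, n):
        # 2 E_{u, j}[ C(u) * sigma_j * u_j ] for a cap C (a set of +-1 tuples).
        total = sum(sum(s * uj for s, uj in zip(sigma, u))
                    for u in product((-1, 1), repeat=n) if u in C)
        return 2 * total / (n * 2 ** n)

    def lemma_7_8_bound(C, n):
        # U(vol(C)) / sqrt(n), where U(p) = 2p * sqrt(2 ln(1/p)).
        p = len(C) / 2 ** n
        return 2 * p * math.sqrt(2 * math.log(1 / p) / n) if p > 0 else 0.0
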

7.1.1 Reproving the Main Result of Reference [26].

We can now reprove the main result of Reference [26] (which we will use later):
Theorem 7.9 ([26]).
Let \(F\) be the intersection of \(m \ge 2\) unate functions over \(\lbrace -1,1\rbrace ^n\) . Then
\begin{equation} {\mathcal {E}}(F) \le \frac{2\sqrt {2\ln m}}{\sqrt {n}}. \end{equation}
(11)
(Equivalently, an intersection of \(m \ge 2\) unate functions has average sensitivity at most \(2\sqrt {2\ln m}\sqrt {n}\) .)
Proof.
Let \(H_1, \ldots , H_m\) be unate functions and define associated caps
\begin{equation} C_i = (H_1 \cap \cdots \cap H_{i-1}) \setminus H_i, \end{equation}
(12)
with \(C_1 = \lbrace -1,1\rbrace ^n \setminus H_1\) (i.e., \(H_0=\lbrace -1,1\rbrace ^n\) ). Letting \(F = H_1 \cap \cdots \cap H_m\) , we have that the complement \(F^c = \lbrace -1,1\rbrace ^n\setminus F\) of \(F\) can be expressed as a disjoint union of caps:
\begin{equation} F^c = C_1 \sqcup \cdots \sqcup C_m. \end{equation}
(13)
For intuition, we may think of the intersection \(F\) of \(m\) unate sets as being formed in \(m\) stages, starting with \(\lbrace -1,1\rbrace ^n\) and successively intersecting with each \(H_i\) ; given this interpretation, \(C_i\) is the portion of \(\lbrace -1,1\rbrace ^n\) that is removed in the \(i\) th stage. With this notation in hand, we have that
\begin{align*} {\mathcal {E}}(F) &= \sum _{i=1}^m {\mathcal {E}}((H_1 \cap \cdots \cap H_{i-1}) \cap H_i) - {\mathcal {E}}(H_1 \cap \cdots \cap H_{i-1}) &&\text{(telescoping; ${\mathcal {E}}(\lbrace -1,1\rbrace ^n) = 0$)} \\ &\le \sum _{i=1}^m {\vec{\mathcal {E}}}(H_1 \cap \cdots \cap H_{i-1}, H_i) &&\text{(Fact 7.7 with $G = H_1\cap \cdots \cap H_{i-1}$ and $H = H_i$)} \\ &\le \frac{1}{\sqrt {n}} \cdot \sum _{i=1}^m U(\mathrm{vol}(C_i)). &&\text{(Lemma 7.8)} \end{align*}
Finally,
\[\sum _{i=1}^m U(\mathrm{vol}(C_i)) \le m \cdot U\bigg (\frac{\sum _{i=1}^m \mathrm{vol}(C_i)}{m}\bigg) = m \cdot U\bigg (\frac{\mathrm{vol}(F^c)}{m}\bigg) \le m \cdot U(\tfrac{1}{m}) = 2\sqrt {2\ln m},\]
where we used concavity of \(U\) , then Equation (13), then the fact that \(U\) is increasing on \([0,1/2]\) . This completes the proof of Theorem 7.9.□

7.2 A Littlewood–Offord Theorem for Polytopes (Theorem 2.2)

In this section we prove Theorem 2.2:
Theorem 7.10.
For all \(m \ge 2\) , \(b \in \mathds R^m\) and \(A \in \mathds R^{m\times n}\) with \(|A_{ij}| \ge 1\) for all \(i \in [m]\) and \(j \in [n]\) ,
\[\mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n} \big [A\boldsymbol {u}\in \Game _{-2}\mathcal {O}_b\big ] \le \frac{5\sqrt {2\ln m}}{\sqrt {n}}.\]
We note in passing that the anticoncentration bound given by Theorem 2.2 is best possible up to constant factors. Indeed, our matching lower bound applies even to the stricter event of falling on the surface of \(\mathcal {O}_b\) :
Claim 7.11 (Optimality of Theorem 2.2).
For \(2 \le m \le 2^n\) , there is a matrix \(A \in {\lbrace -1,1\rbrace ^{m \times n}}\) and a vector \(b \in \mathds R^m\) such that
\[\mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n} \big [A\boldsymbol {u}\in \Game \mathcal {O}_b\big ] = \Omega \bigg (\frac{\sqrt {\ln m}}{\sqrt {n}}\bigg).\]
We prove Claim 7.11 in Appendix B.
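As an illustrative Monte Carlo check of Theorem 2.2 (again not part of the formal development; names and parameters below are ours), one can estimate the left-hand side for a concrete \(A\) and \(b\) and compare it against \(5\sqrt {2\ln m}/\sqrt {n}\) :

    import math, random

    def shell_probability(A, b, trials=200_000, width=2.0):
        # Estimates Pr_u[ Au in Game_{-width} O_b ]: all constraints hold,
        # but some row is within `width` of its threshold b_i.
        hits = 0
        n = len(A[0])
        for _ in range(trials):
            u = [random.choice((-1, 1)) for _ in range(n)]
            s = [sum(a * x for a, x in zip(row, u)) for row in A]
            if (all(si <= bi for si, bi in zip(s, b))
                    and any(si > bi - width for si, bi in zip(s, b))):
                hits += 1
        return hits / trials

    m, n = 8, 100
    A = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    b = [0.0] * m
    print(shell_probability(A, b), 5 * math.sqrt(2 * math.log(m) / n))
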

7.2.1 Proof of Theorem 2.2.

As mentioned at the beginning of this section, we will obtain Theorem 2.2 as a corollary of a more general result about intersections of unate functions. Let \(H_1,\ldots ,H_m \subseteq \lbrace -1,1\rbrace ^n\) be unate sets, \(m \ge 2\) , and further suppose that we have additional unate sets \(\overline{H}_1, \ldots , \overline{H}_m\) such that \(H_i \subseteq \overline{H}_i\) for all \(i\) . (For intuition, it may be helpful to think of \(H_i\) as the “interior” of \(\overline{H}_i\) ; see the proof of Theorem 2.2 using Lemma 7.13 just below for a typical example of sets \(H_i\) and \(\overline{H}_i\) .) We define the following subsets of \(\lbrace -1,1\rbrace ^n\) :
\begin{align*} F &= \overline{H}_1 \cap \cdots \cap \overline{H}_m \\ F^\circ &= H_1 \cap \cdots \cap H_m &&\text{(interior of $F$)}\\ \partial F &= F \setminus F^\circ &&\text{(boundary of $F$)}\\ F^c &= \lbrace -1,1\rbrace ^n\setminus F &&\text{(exterior of $F$)}\\ \partial H_i &= \overline{H}_i \setminus H_i \ \text{(for each $i \in [m]$)} &&\text{(boundary of $\overline{H}_i$)} \end{align*}
Definition 7.12 (Thin Sets).
We say that \(\partial H_i\) is thin if it does not contain any induced edges of the hypercube.
Lemma 7.13.
If \(\partial H_i\) is thin for each \(i \in [m]\) , then \(\mathrm{vol}(\partial F) \le \frac{5\sqrt {2\ln m}}{\sqrt {n}}\) .
Proof of Theorem 2.2 Assuming Lemma 7.13
Fix any \(b \in \mathds R^m\) and \(A \in \mathds R^{m\times n}\) such that \(|A_{ij}| \ge 1\) for all \(i \in [m]\) and \(j \in [n]\) , and let
\[\overline{H}_i = \big \lbrace x \in {\lbrace -1,1\rbrace ^n}:A^i \cdot x \le b_i\big \rbrace , \quad \quad H_i = \big \lbrace x \in {\lbrace -1,1\rbrace ^n}:A^i \cdot x \le b_i - 2\big \rbrace ,\]
so
\begin{align*} \partial H_i &= \big \lbrace x \in \lbrace -1,1\rbrace ^n:b_i - 2 \lt A^i \cdot x \le b_i \big \rbrace \quad \text{and}\\ \partial F &= \big \lbrace x \in \lbrace -1,1\rbrace ^n:Ax \le b\ \& \ A^i \cdot x \gt b_i-2 \text{ for some $i\in [m]$}\big \rbrace \\ &= {\big \lbrace x \in \lbrace -1,1\rbrace ^n:Ax \in \Game _{-2}\mathcal {O}_b \big \rbrace }. \end{align*}
Since \(|A_{ij}| \ge 1\) for all \(i,j\) , it follows that each \(\partial H_i\) is thin, and hence Lemma 7.13 directly gives Theorem 2.2.□
The rest of this section will be devoted to the proof of Lemma 7.13. Recalling that \(F^\circ\) is called the interior of \(F\) and \(\partial F\) is called the boundary of \(F\) , we say that an edge in the hypercube is boundary-to-interior if it has one endpoint in \(\partial F\) and the other endpoint in \(F^\circ\) , and we write \(\nu _{BI}\) for the fraction of all edges that are of this type. We similarly define boundary-to-exterior edges and \(\nu _{BE}\) , with \(F^c\) in place of \(F^\circ\) . Note that every boundary-to-interior edge is a boundary edge for \(F^\circ = H_1 \cap \cdots \cap H_m\) , which is an intersection of \(m\) unate sets. By applying Theorem 7.9 to \(F^\circ\) , we get that
\begin{equation} \nu _{BI} \le \frac{2\sqrt {2\ln m}}{\sqrt {n}}. \end{equation}
(14)
Similarly, every boundary-to-exterior edge is a boundary edge for \(F = \overline{H}_1 \cap \cdots \cap \overline{H}_m\) ; applying Theorem 7.9 to this intersection yields
\begin{equation} \nu _{BE} \le \frac{2\sqrt {2\ln m}}{\sqrt {n}}. \end{equation}
(15)
Next, we bound the fraction of edges that have both endpoints in \(\partial F\) and go between “two different parts” of \(\partial F\) . More precisely, for \(x \in \partial F\) , define \(i^\star (x)\) to be the least \(i\) for which \(x \in \partial H_i\) (equivalently, the least \(i\) for which \(x \not\in H_i\) ). We say that an edge \(\lbrace x,y\rbrace\) is boundary-to-boundary \(^{\prime }\) if \(x, y \in \partial F\) but \(i^\star (x) \ne i^\star (y)\) ; we write \(\nu _{BB^{\prime }}\) for the fraction of such edges.
Observation 7.14.
If every \(\partial H_i\) is thin, then every edge with both endpoints in \(\partial F\) is boundary-to-boundary \(^{\prime }\) . In this case, \(\nu _{BI} + \nu _{BE} + \nu _{BB^{\prime }}\) is exactly the fraction of edges in the cube that touch \(\partial F\) , which in turn is an upper bound on \(\mathrm{vol}(\partial F)\) .
Thus, Lemma 7.13 follows from Equations (14) and (15) and the following claim:
Claim 7.15 (Boundary-to-boundary \(^{\prime }\) Edges)
\(\displaystyle \nu _{BB^{\prime }} \le \frac{\sqrt {2\ln m}}{\sqrt {n}}\) .
Proof.
We define the caps \(C_1, \ldots , C_m\) with respect to the \(H_i\) ’s as in Equation (12) in the proof of Theorem 7.9. Subtracting Equation (10) from Equation (9) for each \(C_i\) and summing over \(i\in [m]\) ,
\begin{align*} 2\sum _{i=1}^m \mathrm{EC}(H_1 \cap \cdots \cap H_{i-1}, H_i) &= \sum _{i=1}^m {\vec{\mathcal {E}}}(H_1 \cap \cdots \cap H_{i-1}, H_i) \\ & \ \ \ - \bigg (\sum _{i=1}^m {\mathcal {E}}((H_1\cap \cdots \cap H_{i-1})\cap H_i) - {\mathcal {E}}(H_1 \cap \cdots \cap H_{i-1}) \bigg) \\ &= \sum _{i=1}^m {\vec{\mathcal {E}}}(H_1 \cap \cdots \cap H_{i-1}, H_i) - {\mathcal {E}}(H_1\cap \cdots \cap H_m) \\ &= \sum _{i=1}^m {\vec{\mathcal {E}}}(H_1 \cap \cdots \cap H_{i-1}, H_i) - {\mathcal {E}}({F^\circ }). \end{align*}
Since \({\mathcal {E}}(F^\circ) \ge 0\) , it follows that
\begin{equation} \sum _{i=1}^m \mathrm{EC}(H_1 \cap \cdots \cap H_{i-1}, H_i) \le \frac{1}{2} \sum _{i=1}^m {\vec{\mathcal {E}}}(H_1 \cap \cdots \cap H_{i-1}, H_i) \le \frac{\sqrt {2\ln m}}{\sqrt {n}}, \end{equation}
(16)
where the derivation of the second inequality is exactly as in the proof of Theorem 7.9. By Equation (16), it suffices to show
\begin{equation} \nu _{BB^{\prime }} \le \sum _{i=1}^m \mathrm{EC}(H_1 \cap \cdots \cap H_{i-1}, H_i). \end{equation}
(17)
Let \(\lbrace x,y\rbrace\) be a boundary-to-boundary \(^{\prime }\) edge and assume without loss of generality that \(i^\star (x) \lt i^\star (y)\) . We now show that edge \(\lbrace x,y\rbrace\) contributes to \(\mathrm{EC}(H_1 \cap \cdots \cap H_{i^\star (y)-1}, H_{i^\star (y)})\) . For brevity, write \(G = H_1 \cap \cdots \cap H_{i^\star (y)-1}\) , \(H = H_{i^\star (y)}\) , and \(C = G \setminus H = C_{i^\star (y)}\) . Since \(x \in \partial H_{i^\star (x)} = \overline{H}_{i^\star (x)} \setminus H_{i^\star (x)}\) (in particular, \(x \not\in H_{i^\star (x)}\) ) and \(i^\star (x) \lt i^\star (y),\) we have that \(x \not\in G\) . However, \(y \in G \setminus H = C\) by definition of \(i^\star (y)\) . Since \(x \notin G\) and \(y \in G \setminus H\) , we conclude that the edge \(\lbrace x,y\rbrace\) indeed contributes to \(\mathrm{EC}(G,H),\) as claimed.□
This completes the proof of Lemma 7.13 and hence Theorem 2.2.

7.3 A Robust Generalization of the Littlewood–Offord Theorem for Polytopes

In the previous section, we proved Theorem 2.2, which establishes anticoncentration of \(A\boldsymbol {u}\) under the assumption that all its entries have magnitude at least 1. The goal of this section is to prove the following robust generalization of Theorem 2.2:
Theorem 7.16.
Let \(A \in \mathds R^{m \times n}\) have the property that in every row, at least an \(\alpha\) fraction of the entries have magnitude at least \(\lambda\) . Then for any \(b \in \mathds R^m\) ,
\[{{\bf Pr}}\big [A\boldsymbol {u}\in \Game _{-2\lambda }\mathcal {O}_b\big ] \le \frac{5 \sqrt {2\ln m}}{\alpha \sqrt {n}}.\]
Recall that Theorem 2.2 followed as an easy consequence of the fact that \(\mathrm{vol}(\partial F) \le \frac{5 \sqrt {2\ln m}}{\sqrt {n}}\) when all the \(\partial H_i\) ’s are “thin” (Lemma 7.13). We slightly generalize this notion here.
Definition 7.17 (Semi-thin).
For \(\alpha \in [0,1]\) , say that \(\partial H_i\) is \(\alpha\) -semi-thin if the following holds: For each \(x \in \partial H_i\) , at least an \(\alpha\) fraction of its hypercube neighbors are outside \(\partial H_i\) . (Note that “1-semi-thin” is equivalent to “thin.”)
Example 7.18.
Suppose \(H = \lbrace x \in \lbrace -1,1\rbrace ^n: a \cdot x \le b_1\rbrace\) and \(\overline{H} = \lbrace x \in \lbrace -1,1\rbrace ^n: a \cdot x \le b_2\rbrace\) where \(b_1 \le b_2\) , so \(\partial H = \lbrace x \in \lbrace -1,1\rbrace ^n: b_1 \lt a \cdot x \le b_2\rbrace\) . If \(|a_j| \ge (b_2 - b_1)/2\) for at least an \(\alpha\) fraction of the coordinates \(j \in [n]\) , then \(\partial H\) is \(\alpha\) -semi-thin.
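The following sketch (illustrative only, names are ours) computes the largest \(\alpha\) for which a given \(\partial H\) is \(\alpha\) -semi-thin, by measuring, for each point of \(\partial H\) , the fraction of its neighbors that lie outside \(\partial H\) :

    def max_semi_thin_alpha(dH, n):
        # dH: a set of +-1 tuples. Returns the min over x in dH of the fraction
        # of x's hypercube neighbors outside dH; per Definition 7.17, dH is
        # alpha-semi-thin exactly for alpha up to this value.
        def flip(x, j):
            return x[:j] + (-x[j],) + x[j + 1:]
        if not dH:
            return 1.0
        return min(sum(flip(x, j) not in dH for j in range(n)) / n for x in dH)
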
Theorem 7.16 follows as a direct consequence of the following lemma (by the same reasoning that derives Theorem 2.2 as a corollary of Lemma 7.13):
Lemma 7.19 (Robust Version of Lemma 7.13).
In the setup of Section 7.2.1, suppose each \(\partial H_i\) is \(\alpha\) -semi-thin. Then
\[\mathrm{vol}(\partial F) \le \frac{5 \sqrt {2\ln m}}{\alpha \sqrt {n}}.\]
Proof.
Our proof of Lemma 7.13 (a combination of Equation (14), Equation (15), and Claim 7.15) shows that
\begin{equation} \nu _{BI} + \nu _{BE} + \nu _{BB^{\prime }} \le \frac{5\sqrt {2\ln m}}{\sqrt {n}}. \end{equation}
(18)
However, in our current setting the left-hand side of the above is not a bound on \(\mathrm{vol}(\partial F)\) ; Observation 7.14 no longer holds, and we may now have edges \(\lbrace x,y\rbrace\) where \(i^{\star }(x) = i^{\star }(y)\) . Given \(x \in \partial F\) and a Hamming neighbor \(y\) of \(x\) , we say that \(y\) is \(x\) -bad if \(y \in \partial F\) and \(i^\star (y) = i^\star (x)\) ; otherwise, we say that \(y\) is \(x\) -good. With this terminology, we can rewrite Equation (18) as
\begin{equation} {{\bf Pr}}\big [\boldsymbol {u}\in \partial F \ \&\ \boldsymbol {u}^{\oplus \boldsymbol {j}} \text{ is $\boldsymbol {u}$-good}\big ] \le \frac{5 \sqrt {2\ln m}}{\sqrt {n}}, \end{equation}
(19)
where \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) and \(\boldsymbol {j}\sim [n]\) are uniformly random, and \(\boldsymbol {u}^{\oplus \boldsymbol {j}}\) denotes \(\boldsymbol {u}\) with its \(\boldsymbol {j}\) th coordinate flipped. By the \(\alpha\) -semi-thin property, for any \(x \in \partial F\) , the fraction of \(j \in [n]\) such that \(x^{\oplus j}\) is \(x\) -good is at least \(\alpha\) . Therefore,
\begin{equation} {{\bf Pr}}\big [\boldsymbol {u}\in \partial F \ \&\ \boldsymbol {u}^{\oplus \boldsymbol {j}} \text{ is $\boldsymbol {u}$-good}\big ] \ge {{\bf Pr}}[\boldsymbol {u}\in \partial F] \cdot \alpha , \end{equation}
(20)
and the lemma follows by combining Equations (19) and (20).□

7.4 Proof of Theorem 7.1

In this section, we prove Theorem 7.1 using the results established in the previous section. In more detail, we use a bound on the anticoncentration of \(A\boldsymbol {u}\) under the assumption that at least an \(\alpha\) fraction of the entries of each row of \(A\) have magnitude at least \(\lambda\) (given by Theorem 7.16, which rests on Lemma 7.19) to establish a bound on the anticoncentration of \(A\boldsymbol {u}\) under the assumption that each of \(A\) ’s rows has a \(\tau\) -regular subvector of 2-norm 1 (Theorem 7.1).
The following result regarding \(\tau\) -regular linear forms is fairly standard:
Proposition 7.20.
Let \(w \in \mathds R^n\) be a \(\tau\) -regular vector with \(\Vert w\Vert _2 = 1\) . Let \(\boldsymbol {\pi }: [n] \rightarrow [B]\) be a random hash function that independently assigns each coordinate in \([n]\) to a uniformly random bucket in \([B]\) . For \(b \in [B]\) , write \(\boldsymbol {\sigma }_b^2 = \sum _{j \in \boldsymbol {\pi }^{-1}(b)} w_j^2\) , and say that bucket \(b\) is good if \(\boldsymbol {\sigma }_b^2 \gt \frac{1}{2B}\) . Assume \(B \le 1/\tau ^2\) . Then
\[{{\bf Pr}}\Big [\text{at most $\frac{B}{16}$ buckets $b \in [B]$ are good}\Big ] \le \exp \Big (-\frac{B}{64}\Big).\]
Proof.
Let \(\boldsymbol {X}_b = \mathds {1}[\boldsymbol {\sigma }_b^2 \gt \frac{1}{2B}]\) be the indicator that the \(b\) th bucket is good. Since \({{\bf E}}[\boldsymbol {\sigma }_b^2] = \frac{1}{B}\) and
\[{{\bf E}}[\boldsymbol {\sigma }_b^4] = {{\bf E}}\bigg [\bigg (\sum _{j=1}^n w_j^2 \mathds {1}[\boldsymbol {\pi }(j) = b]\bigg)^2\bigg ] = \frac{1}{B} \sum _{j=1}^n w_j^4 + \frac{1}{B^2} \sum _{j \ne j^{\prime }} w_j^2 w_{j^{\prime }}^2 \le \frac{\tau ^2}{B} + \frac{1}{B^2} \le \frac{2}{B^2},\]
the Paley–Zygmund inequality implies that \({{\bf E}}[\boldsymbol {X}_b] = {{\bf Pr}}[\boldsymbol {\sigma }_b^2 \gt \frac{1}{2} {{\bf E}}[\boldsymbol {\sigma }_b^2]] \ge \frac{1}{8}\) .
The joint random variables \(\boldsymbol {\sigma }_1^2, \ldots , \boldsymbol {\sigma }_B^2\) are of “balls in bins” type (where the \(j\) th “ball” has “mass” \(w_j^2\) ) and are therefore negatively associated (see, e.g., Reference [11, Example 3.1]; the fact that the balls have different “masses” does not change the argument). Since \(\mathds {1}_{(\frac{1}{2B}, \infty)}\) is a nondecreasing function, it follows that the random variables \(\boldsymbol {X}_1, \ldots , \boldsymbol {X}_B\) are also negatively associated. Thus, we may apply the Chernoff bound to \(\sum _{k=1}^B \boldsymbol {X}_k\) , which has mean at least \(\frac{B}{8}\) . The result follows.□
Recall the following fact, which can also be easily proven using Paley–Zygmund (see, e.g., Proposition 3.7 of the full version of Reference [18]):
Fact 7.21.
For all \(w \in \mathds R^n\) and \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) , we have \({{\bf Pr}}[|w \cdot \boldsymbol {u}| \ge \frac{1}{2}\Vert w\Vert _2] \ge \frac{1}{16}\) .
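Fact 7.21 can also be verified exhaustively for any particular small vector \(w\) ; a minimal sketch (names are ours):

    import math
    from itertools import product

    def check_fact_7_21(w):
        # Exhaustively verify Pr_u[ |w . u| >= ||w||_2 / 2 ] >= 1/16.
        n = len(w)
        norm = math.sqrt(sum(wj * wj for wj in w))
        good = sum(abs(sum(wj * uj for wj, uj in zip(w, u))) >= norm / 2
                   for u in product((-1, 1), repeat=n))
        return good / 2 ** n >= 1 / 16

    assert check_fact_7_21((0.5, 0.3, 0.2, 0.1))
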
We combine these as follows:
Proposition 7.22.
Let \(w \in \mathds R^n\) and assume that some subvector \(w^{\prime }\) of \(w\) is \(\tau\) -regular with \(\Vert w^{\prime }\Vert _2 = 1\) . Let \(\boldsymbol {\pi }: [n] \rightarrow [B]\) be as in Proposition 7.20, where \(B \le 1/\tau ^2\) . Let \(\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n\) , and define \(\overline{\boldsymbol {w}} \in \mathds R^B\) by \(\overline{\boldsymbol {w}}_b = \sum _{j \in \boldsymbol {\pi }^{-1}(b)} w_j \boldsymbol {u}_j\) . Call a bucket \(b \in [B]\) big if \(|\overline{\boldsymbol {w}}_b| \gt \frac{1}{2\sqrt {2B}}\) . Then
\[{{\bf Pr}}\Big [\text{fewer than $\frac{B}{512}$ buckets are big}\Big ] \le \exp \Big (-\frac{B}{2048}\Big).\]
Proof.
First apply Proposition 7.20 to \(w^{\prime }\) and observe that the presence of additional coordinates from \(w\) cannot harm “goodness.” Then apply Fact 7.21 to the good buckets. Each becomes “big” independently with probability at least \(\frac{1}{16}\) , and the proof follows from another Chernoff bound.□
We take \(B = \lfloor 1/\tau ^2 \rfloor\) in the above. This yields the following:
Corollary 7.23.
Assume \(A \in \mathds R^{m\times n}\) satisfies the following property: Each of its row vectors has a \(\tau\) -regular subvector of 2-norm 1. Fix \(B = \lfloor 1/\tau ^2 \rfloor\) and let \(\overline{\boldsymbol {A}} \in \mathds R^{m \times B}\) be the matrix obtained from \(A\) by randomly partitioning its columns into \(B\) buckets and adding them up with uniformly random \(\pm 1\) signs within each bucket. Say that a row of \(\overline{\boldsymbol {A}}\) is spread if at least a \(\frac{1}{512}\) -fraction of its entries exceed \(\frac{\tau }{2\sqrt {2}}\) . Then except with probability at most \(m \cdot \exp (-\Omega (1/\tau ^2))\) , all of \(\overline{\boldsymbol {A}}\) ’s rows are spread.
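The random bucketing behind Corollary 7.23 is straightforward to realize; a minimal sketch (function names are ours):

    import math
    import random

    def bucketed_matrix(A, B):
        # Build \bar{A} from A: hash each column into one of B buckets uniformly
        # and sum the columns with fresh +-1 signs within each bucket.
        m, n = len(A), len(A[0])
        pi = [random.randrange(B) for _ in range(n)]
        eps = [random.choice((-1, 1)) for _ in range(n)]
        Abar = [[0.0] * B for _ in range(m)]
        for j in range(n):
            for i in range(m):
                Abar[i][pi[j]] += eps[j] * A[i][j]
        return Abar

    def row_is_spread(row, tau):
        # At least a 1/512 fraction of entries exceed tau / (2*sqrt(2)).
        threshold = tau / (2 * math.sqrt(2))
        return sum(abs(v) > threshold for v in row) >= len(row) / 512
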

7.4.1 Proof of Theorem 7.1.

We can now prove Theorem 7.1, which we restate here for convenience:
Theorem 7.24.
Assume \(A \in \mathds R^{m\times n}\) satisfies the following property: Each of its row vectors has a \(\tau\) -regular subvector of 2-norm 1, where \(\tau\) is as set in Section 4.1. Then for all \(b \in \mathds R^m\) and \(\Lambda \ge \tau\) , we have
\[\mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n} \big [A\boldsymbol {u}\in \Game _{\pm \Lambda } \mathcal {O}_{b}\big ]= O\big (\Lambda \sqrt {\log m}\big).\]
Proof.
By union-bounding over \(2\lceil \Lambda /\tau \rceil\) choices of \(b\) , it suffices to prove the following: Whenever \(A \in \mathds R^{m \times n}\) has a \(\tau\) -regular subvector of 2-norm 1 in each row, it holds that \({{\bf Pr}}[A\boldsymbol {u}\in \Game _{-\tau }\mathcal {O}_b] \le O(\tau \sqrt {\log m})\) . Note that the distribution of \(A\boldsymbol {u}\) is the same as that of \(\overline{\boldsymbol {A}}\boldsymbol {u}^{\prime }\) , where \(\overline{\boldsymbol {A}}\) is as in Corollary 7.23, and \(\boldsymbol {u}^{\prime } \sim \lbrace -1,1\rbrace ^B\) is uniform. Thus, applying Corollary 7.23 and then Theorem 7.16 (with \(\alpha = \frac{1}{512}\) and \(\lambda = \frac{\tau }{2} \ge \frac{\tau }{2\sqrt {2}}\) ), we conclude that
\[{{\bf Pr}}\big [A\boldsymbol {u}\in \Game _{-\tau }\mathcal {O}_b\big ] = O\big (\tau \sqrt {\log m}\big) + m \cdot \exp \big (-\Omega (1/\tau ^2)\big).\]
By our choice of \(\tau\) as set in Section 4.1, we get the desired overall bound of \(O(\tau \sqrt {\log m})\) and the proof is complete.□

8 Fooling Bentkus’s Mollifier

The main result of this section is the following theorem, which provides the second step of the two-step program described at the end of Section 6:
Theorem 8.1 ( \(\mathscr{G}\) Fools Bentkus’s Mollifier)
Let \(\mathscr{G}\) be our generator with parameters as given in Section 4.1, and likewise let \(\lambda \gt 0\) be as set in Section 4.1. For all \((k,\tau)\) -standardized matrices \(A \in \mathds R^{m\times n}\) and all \(b \in \mathds R^m\) ,
\[\Big | \mathop {{\bf E}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {u}) \big ] - \mathop {{\bf E}}_{\boldsymbol {z}\sim \mathscr{G}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {z})\big ] \Big | = O(\delta).\]
At a very high level, in line with the usual Lindeberg approach, Theorem 8.1 is proved by hybridizing between \(\boldsymbol {u}\) and \(\boldsymbol {z}\) via a sequence of intermediate distributions. In our setting there are \(L+1\) such distributions, the first of which is \(\boldsymbol {u}\) and the last of which is \(\boldsymbol {z}\) , and the \(\ell\) th of which may be viewed as “filling in buckets \(\ell ,\ldots ,L\) according to \(\boldsymbol {u}\) and filling in buckets \(1,\ldots ,\ell -1\) according to \(\boldsymbol {z}\) ,” where the \(L\) buckets correspond to the partition of \([n]\) induced by the choice of the random hash function in the Meka–Zuckerman generator.
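In code, the \(\ell\) th hybrid is just a coordinate-wise selection; a minimal sketch (with buckets indexed \(1,\ldots ,L\) and \(h\) denoting the hash; names are ours):

    def hybrid_sample(ell, h, u, z):
        # The ell-th hybrid, ell = 1, ..., L+1: coordinates hashed to buckets
        # 1..ell-1 are filled in from z, and buckets ell..L from u.
        # Hybrid 1 is u itself, and hybrid L+1 is z itself.
        return [z[j] if h[j] < ell else u[j] for j in range(len(u))]
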
In Section 8.1, we upper bound the error incurred by taking a single step through this sequence of hybrid distributions. The upper bound given there (see Lemma 8.3) has a first component corresponding to the terms of order \(0,\ldots ,d-1\) in a \((d-1)\) -st order Taylor expansion, and a second component corresponding to the error term in Taylor’s theorem. The first component is upper bounded in Section 8.1, and the second component is upper bounded in Section 8.2. Section 8.3 formalizes the hybrid argument and uses the results of these earlier subsections to establish Theorem 8.1.
Remark 8.2 (Head and Tail Matrices).
Recalling the definition of a \((k,\tau)\) -standardized matrix \(A\) (Definition 3.1), for every \(i \in [m]\) there is a partition \([n] = \textsc{Head}_i \sqcup \textsc{Tail}_i\) such that \(|\textsc{Head}_i| \le k\) and \((A_i)_{\textsc{Tail}_i}\) is \(\tau\) -regular with 2-norm \(\Vert (A_i)_{\textsc{Tail}_i}\Vert _2\) equal to 1. Therefore, we may write \(A\) as \(H + T\) where
\[H_{ij} = A_{ij} \cdot \mathds {1}[\,j \in \textsc{Head}_i\,] \quad \text{and}\quad T_{ij} = A_{ij} \cdot \mathds {1}[\,j \in \textsc{Tail}_i\,]\]
for all \(j\in [n]\) and \(i\in [m]\) . Note that every row of \(H\) is \(k\) -sparse, and every row of \(T\) is \(\tau\) -regular with 2-norm 1.
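Given the partitions \(\lbrace \textsc{Head}_i\rbrace\) , the decomposition \(A = H + T\) is immediate; a minimal sketch (representing each \(\textsc{Head}_i\) as a set of column indices; names are ours):

    def split_head_tail(A, heads):
        # heads[i] = Head_i; returns the matrices H and T with A = H + T.
        m, n = len(A), len(A[0])
        H = [[A[i][j] if j in heads[i] else 0.0 for j in range(n)] for i in range(m)]
        T = [[0.0 if j in heads[i] else A[i][j] for j in range(n)] for i in range(m)]
        return H, T
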

8.1 Single Swap in the Hybrid Argument

Lemma 8.3 (Error Incurred by a Single Swap).
Fix \(B \subseteq [n]\) . Let \(H^B, T^B \in \mathds R^{m\times B}\) , where every row of \(H^B\) is \(w\) -sparse and every row of \(T^B\) has 2-norm at most 1. Let \(\boldsymbol {u}, \boldsymbol {y}\) be random variables over \(\lbrace -1,1\rbrace ^B\) , where \(\boldsymbol {u}\) is uniform and \(\boldsymbol {y}\) \(\delta _{\mathrm{CNF}}\) -fools the class of width- \(w\) CNFs. Then for all \(b \in \mathds R^m\) , \(\lambda \gt 0\) , and all integers \(d\ge 2\) ,
\begin{align} &\big | {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {u}+ T^B\boldsymbol {u})\big ] - {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {y}+ T^B\boldsymbol {y})\big ]\big | \\ &= \delta _{\mathrm{CNF}} \cdot m^{d-1} \cdot O_d\bigg (\frac{\sqrt {n}}{\lambda }\bigg)^{d-1} + O_d\bigg (\frac{\sqrt {\log m}}{\lambda }\bigg)^d \left({{\bf E}}\big [\Vert T^B\boldsymbol {u}\Vert _\infty ^d\big ] + {{\bf E}}\big [\Vert T^B\boldsymbol {y}\Vert _\infty ^d\big ]\right).\nonumber \end{align}
(21)
As we will see later, Equation (21) is a useful bound because we can (and will) take \(\delta _{\mathrm{CNF}}\) to be very small, and when we apply Lemma 8.3, we will be able to ensure that both expectations on the right-hand side of Equation (21) are small as well.
The main ingredient in the proof of Lemma 8.3 is the following claim:
Claim 8.4.
For all integers \(c \ge 1\) and \(\alpha \in \mathds N^m\) such that \(|\alpha | = c\) ,
\begin{equation} \Big | \mathop {{\bf E}}\big [ \partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {u})\cdot (T^B\boldsymbol {u})^\alpha \big ] - \mathop {{\bf E}}\big [ \partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {y})\cdot (T^B\boldsymbol {y})^\alpha \big ]\Big | = \delta _{\mathrm{CNF}}\cdot O_c\bigg (\frac{\sqrt {n}}{\lambda }\bigg)^c. \end{equation}
(22)
Remark 8.5.
Recalling the discussion of Step 1 in Section 2.2, we remark that Claim 8.4 provides the key ingredient of the arguments sketched there. This claim plays an essential role in enabling us to get a strong bound on the magnitude of the difference of two expectations (which was denoted “ \(|{{\bf E}}[\Upsilon (\boldsymbol {v}+ {\boldsymbol {\Delta }})] - {{\bf E}}[\Upsilon (\boldsymbol {v}^{\prime } + {\boldsymbol {\Delta }}^{\prime })]|\) ” in Section 2.2 and corresponds precisely to the left-hand side of Lemma 8.3 above) through an application of Taylor’s theorem around two different points. As will be seen in Section 8.1.1, the proof of Claim 8.4 exploits the product structure of \(\widetilde{\mathcal {O}}_b\) by using pseudorandom generators for small-width CNF formulas.
Before proving Claim 8.4, we observe that Lemma 8.3 follows as a consequence:
Proof of Lemma 8.3 Assuming Claim 8.4
By the multidimensional Taylor expansion (Fact 3.2) applied twice to \(\widetilde{\mathcal {O}}_{b,\lambda }\) , we have
\begin{align} \text{(21)} &\le \Bigg | \sum _{0\le |\alpha | \le d-1} \frac{1}{\alpha !}\mathop {{\bf E}}\big [ \partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {u})\cdot (T^B\boldsymbol {u})^\alpha \big ] - \frac{1}{\alpha !}\mathop {{\bf E}}\big [ \partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {y})\cdot (T^B\boldsymbol {y})^\alpha \big ] \Bigg | \nonumber\\ &\ \ \ + {{\bf E}}\Big [\big |\mathrm{err}(H^B\boldsymbol {u}, T^B\boldsymbol {u})\big |\Big ] + {{\bf E}}\Big [\big |\mathrm{err}(H^B\boldsymbol {y}, T^B\boldsymbol {y})\big |\Big ] \nonumber\\ &\le \sum _{0\le |\alpha |\le d-1} \big | \mathop {{\bf E}}\big [ \partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {u})\cdot (T^B\boldsymbol {u})^\alpha \big ] - \mathop {{\bf E}}\big [ \partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(H^B\boldsymbol {y})\cdot (T^B\boldsymbol {y})^\alpha \big ]\big | \\ &\ \ \ +\sup _{v \in \mathds R^m}\Bigg \lbrace \sum _{|\alpha |=d} |\partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(v)|\Bigg \rbrace \cdot \left({{\bf E}}\big [\Vert T^B\boldsymbol {u}\Vert _\infty ^d\big ] + {{\bf E}}\big [\Vert T^B\boldsymbol {y}\Vert _\infty ^d\big ]\right). \nonumber \end{align}
(23)
By Claim 8.4, each of the \(O(m^{d-1})\) summands of Equation (23) is at most \(\delta _\mathrm{CNF}\cdot O_d(\sqrt {n}/\lambda)^{d-1}\) . This along with the bound on \(\widetilde{\mathcal {O}}_{b,\lambda }\) ’s derivatives given by Theorem 6.5,
\[\sup _{v \in \mathds R^m}\Bigg \lbrace \sum _{|\alpha |=d} |\partial _\alpha \widetilde{\mathcal {O}}_{b,\lambda }(v)|\Bigg \rbrace = O_d\bigg (\frac{\sqrt {\log m}}{\lambda }\bigg)^d,\]
yields Lemma 8.3.□

8.1.1 Proof of Claim 8.4.

Definition 8.6.
We say that a function \(\xi : \lbrace -1,1\rbrace ^B \rightarrow \mathds R\) is Boolean if its range is contained in \(\lbrace 0,1\rbrace\) . For \(\xi _1,\ldots ,\xi _m : \lbrace -1,1\rbrace ^B \rightarrow \mathds R\) , we say that the associated product function \(\Xi = \prod _{i \in [m]} \xi _i\) is a Boolean product function in case all the \(\xi _i\) ’s are Boolean.
Definition 8.7.
We say that \(\xi\) is a weight- \(W\) combination of Boolean functions if it is expressible as a linear combination \(\xi = \sum _\ell c_\ell \xi _\ell\) where each \(\xi _\ell\) is a Boolean function and where \(\sum _\ell |c_\ell | \le W\) . Likewise, \(\Xi\) is a weight- \(W\) combination of Boolean product functions if it is expressible as a linear combination \(\Xi = \sum _\ell c_\ell \Xi _\ell\) where each \(\Xi _\ell\) is a Boolean product function and where \(\sum _\ell |c_\ell | \le W\) .
The following facts are easy to establish:
Fact 8.8.
(1)
A function \(\xi : \lbrace -1,1\rbrace ^B \rightarrow [0,1]\) is a weight-1 combination of Boolean functions (via the layer-cake decomposition; see the sketch following this list).
(2)
A function \(\xi : \lbrace -1,1\rbrace ^B \rightarrow [-W,W\,]\) is a weight- \((2W)\) combination of Boolean functions.
(3)
A weight- \(W_1\) combination of weight- \(W_2\) combinations of Boolean functions is a weight- \((W_1W_2)\) combination of Boolean functions.
(4)
If \(\xi _1\) and \(\xi _2\) are weight- \(W_1\) and weight- \(W_2\) combinations of Boolean product functions, respectively, then \(\xi _1\cdot \xi _2\) is a weight- \((W_1W_2)\) combination of Boolean product functions.
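To illustrate Item 1: any \(\xi\) with values in \([0,1]\) admits the layer-cake decomposition \(\xi = \sum _\ell c_\ell \,\mathds {1}[\xi \ge t_\ell ]\) with \(\sum _\ell c_\ell \le 1\) . A discretized sketch (with \(\xi\) given as a finite table; names are ours):

    def layer_cake(xi):
        # xi: dict mapping points of {-1,1}^B to values in [0,1]. Returns a
        # weight-1 combination, as (weight, threshold) pairs, of the Boolean
        # functions 1[xi(x) >= threshold]  (Item 1 of Fact 8.8).
        levels = sorted(set(xi.values()) | {0.0})
        return [(hi - lo, hi) for lo, hi in zip(levels, levels[1:])]

    def evaluate(combo, xi, x):
        # Reconstructs xi(x) as the sum of weights of the indicators firing at x.
        return sum(c for c, t in combo if xi[x] >= t)
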
We are now ready to prove Claim 8.4.
Proof of Claim 8.4
We define the function \(G_\alpha : \lbrace -1,1\rbrace ^B \rightarrow \mathds R\) ,
\begin{align} G_\alpha (x) \coloneqq &\ \partial _{\alpha }\widetilde{\mathcal {O}}_{b,\lambda }(H^Bx) \cdot (T^Bx)^\alpha \nonumber\\ =&\ \Bigg (\prod _{i \notin S}{\widetilde{\mathds {1}}}_{b_i,\lambda }(H^B_i x) \prod _{i \in S} {\widetilde{\mathds {1}}}^{(\alpha _i)}_{b_i,\lambda }(H^B_i x)\Bigg) \cdot \prod _{i \in S} (T^B_i x)^{\alpha _i}, \end{align}
(24)
where \(S\) denotes \(\mathrm{supp}(\alpha) = \lbrace i \in [m] :\alpha _i \gt 0\rbrace\) . (Equation (24) crucially relies on the product structure of \(\widetilde{\mathcal {O}}_{b,\lambda } : \mathds R^m \rightarrow (0,1)\) ; recall Equation (5).)
Note that Claim 8.4 is equivalent to the claim that \(\boldsymbol {y}\) \(\delta\) -fools \(G_\alpha\) for \(\delta = \delta _\mathrm{CNF}\cdot O_c(\sqrt {n}/\lambda)^c\) . We analyze the three types of functions in Equation (24) in turn:
Recalling the assumptions of Lemma 8.3, by Item 1 of Fact 8.8, the function \(x \mapsto \widetilde{\mathds {1}}_{b_i,\lambda }(H^B_i x)\) is a weight-1 combination of Boolean functions. Furthermore, since \(|\mathrm{supp}(H^B_i)| \le w\) , it is in fact a weight-1 combination of Boolean \(w\) -juntas.
Similarly, by Item 2 of Fact 8.8, the function \(x \mapsto {\widetilde{\mathds {1}}}^{(\alpha _i)}_{b_i,\lambda }(H^B_i x)\) is a weight- \((2\Vert {\widetilde{\mathds {1}}}_{b_i,\lambda }^{(\alpha _i)}\Vert _\infty)\) combination of Boolean \(w\) -juntas.
Since \(\Vert T^B_i \Vert _1 \le \sqrt {B} \cdot \Vert T^B_i \Vert _2 \le \sqrt {B} \le \sqrt {n}\) and \(x_j \in \lbrace -1,1\rbrace\) for all \(j\in B\) , by Items 2 and 3 of Fact 8.8 the function \(x \mapsto T^B_ix\) is a weight- \((2\sqrt {n})\) combination of Boolean functions. Furthermore, it is a weight- \((2\sqrt {n})\) combination of Boolean 1-juntas.
Combining the above with Item 4 of Fact 8.8, it follows that \(G_\alpha : \lbrace -1,1\rbrace ^B \rightarrow \mathds R\) is a weight- \(W\) combination of Boolean product functions \(\Xi : \lbrace -1,1\rbrace ^B \rightarrow \lbrace 0,1\rbrace\) , where
\begin{align*} W &= \left(\prod _{i \in S} 2\,\Vert {\widetilde{\mathds {1}}}_{b_i,\lambda }^{(\alpha _i)}\Vert _\infty \right) \cdot \left(\prod _{i \in S} (2\sqrt {n})^{\alpha _i} \right) \\ &= \left(\prod _{i \in S} O_{\alpha _i}\bigg (\frac{1}{\lambda ^{\alpha _i}}\bigg)\right) \cdot \left(\prod _{i \in S} (2\sqrt {n})^{\alpha _i}\right) &&\text{(Fact 6.4)} \\ &= O_c\bigg (\frac{\sqrt {n}}{\lambda }\bigg)^c. &&\text{($|\alpha |= \alpha _1 + \cdots + \alpha _m = c$)} \end{align*}
Furthermore, every \(\Xi\) in this combination is the product of \(m\) Boolean \(w\) -juntas and \(|\alpha |\) Boolean 1-junta(s). Since each such \(\Xi\) is computable by a width- \(w\) CNF, and \(\boldsymbol {y}\) \(\delta _\mathrm{CNF}\) -fools the class of width- \(w\) CNFs, we conclude that \(\boldsymbol {y}\) \(\delta\) -fools \(G_\alpha\) where \(\delta = \delta _\mathrm{CNF}\cdot W\) . This completes the proof of Claim 8.4.□

8.2 Bounding the Error Terms

We will use the following technical result:
Claim 8.9 (Rosenthal’s Inequality).
Let \(\beta \in [0,1]\) and let \(\boldsymbol {x}_1, \ldots , \boldsymbol {x}_n\) be independent \(\lbrace 0,\pm 1\rbrace\) -valued random variables, each being 0 with probability \(1-\beta\) and \(\pm 1\) with probability \(\beta /2\) each. Let \(w \in \mathds R^n\) be a \(\tau\) -regular vector of 2-norm 1. Then for any \(q \ge 2\) ,
\[{{\bf E}}[|w \cdot \boldsymbol {x}|^q] = O\left(q \tau \cdot (\beta /\tau ^2)^{1/q} + \sqrt {q} \sqrt {\beta }\right)^q.\]
Of course, if \(q\) is an even integer, then the above continues to hold even if \(\boldsymbol {x}_1, \ldots , \boldsymbol {x}_n\) are merely \(q\) -wise independent.
Proof.
This is an almost immediate consequence of a refinement of an inequality due to Rosenthal [53]. The exact version we use is due to Nagaev and Pinelis [41] (see also Reference [51, (4)]); in our context, it states that
\begin{align*} {{\bf E}}[|w \cdot \boldsymbol {x}|^q] &\le 2^{O(q)} \cdot \left(q^q \sum _{j=1}^n {{\bf E}}[|w_j \boldsymbol {x}_j|^q] + q^{q/2}\left(\sum _{j=1}^n {{\bf E}}[(w_j \boldsymbol {x}_j)^2] \right)^{q/2}\right)\\ &\le 2^{O(q)} \cdot \left(q^q \beta \sum _{j=1}^n |w_j|^q + (q\beta)^{q/2} \right). \end{align*}
Since \(\beta \sum _j |w_j|^q \le \beta \left(\sum _j w_j^2 \right) \cdot \tau ^{q-2} = \beta \tau ^{q-2}\) , using \(x^q+y^q \le (x+y)^q\) for positive \(x,y\) we get the claimed bound.□
The following lemma will be used to bound the expectations on the right-hand side of Equation (21):
Lemma 8.10.
Let \(L, r_\mathrm{hash},r_\mathrm{bucket},\) and \(\tau\) be as set in Section 4.1. Let \(\boldsymbol {h}: [n]\rightarrow [L]\) be an \(r_\mathrm{hash}\) -wise uniform hash function, and fix a bucket \(\ell \in [L]\) . Let \(\boldsymbol {y}\sim \lbrace -1,1\rbrace ^n\) be an \(r_\mathrm{bucket}\) -wise uniform random variable. Let \(T \in \mathds R^{m\times n}\) be a \(\tau\) -regular matrix in which each row has 2-norm 1. Then for all integers \(d \ge 2\) ,
\[\mathop {{\bf E}}_{\boldsymbol {h},\boldsymbol {y}}\Big [ \Vert T^{\boldsymbol {h}^{-1}(\ell)} \boldsymbol {y}_{\boldsymbol {h}^{-1}(\ell)}\Vert ^d_\infty \Big ] = O_d\bigg (\tau \log (m/\delta) + \sqrt {\frac{\log (m/\delta)}{L}}\bigg)^d.\]
Proof.
Let \(q\) be the largest even integer smaller than both \(r_\mathrm{hash}\) and \(r_\mathrm{bucket}\) ; note that \(q = \Theta (\log (m/\delta))\) . For notational brevity, we let \(\boldsymbol {X}\) denote the \(\mathds R^m\) -valued random variable \(\boldsymbol {X}\coloneqq T^{\boldsymbol {h}^{-1}(\ell)} \boldsymbol {y}_{\boldsymbol {h}^{-1}(\ell)}\) . Since \(r_\mathrm{bucket},r_\mathrm{hash}\ge q\) , we can express \(\boldsymbol {X}\) as \(\sum _{j=1}^n \boldsymbol {x}_j T^j\) where \(\boldsymbol {x}_1,\ldots ,\boldsymbol {x}_n \sim \lbrace -1,0,1\rbrace\) are \(q\) -wise independent random variables distributed as in Claim 8.9, with \(\beta = 1/L\) .
Since \(q \gt d\) for sufficiently large \(m\) , we have that
\[{{\bf E}}\big [\Vert \boldsymbol {X}\Vert _\infty ^d\big ] \le {{\bf E}}\big [\Vert \boldsymbol {X}\Vert _q^d\big ] \le {{\bf E}}\big [\Vert \boldsymbol {X}\Vert _q^{q}\big ]^{d/q} = \left(\sum _{i=1}^m {{\bf E}}[\boldsymbol {X}_i^q]\right)^{\!d/q}.\]
Applying Claim 8.9 to bound each \({{\bf E}}[\boldsymbol {X}^q_i]\) , we conclude that
\begin{align*} {{\bf E}}\big [\Vert \boldsymbol {X}\Vert _\infty ^d\big ] &= \left(m\cdot O\!\left(q\tau \cdot (1/(L\tau ^2))^{1/q} + \sqrt {q/L}\right)^{\!q}\right)^{d/q} \\ &= m^{d/q} \cdot O\!\left(q\tau + \sqrt {q/L}\right)^{\!d} \\ &= O_d\!\left(\tau \log (m/\delta) + \sqrt {\frac{\log (m/\delta)}{L}}\right)^{\!d}, \end{align*}
where the second line uses the fact that \((\frac{1}{L\tau ^2})^{1/q} = (\frac{\delta }{\log m})^{O(1/q)} = O(1)\), and the third line uses \(m^{d/q} = 2^{O(d)}\) together with \(q = \Theta (\log (m/\delta))\). This completes the proof of Lemma 8.10.□
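To get a feel for Lemma 8.10, here is a small Monte Carlo experiment (ours; a truly uniform hash and truly uniform signs stand in for the \(r_\mathrm{hash}\)-wise and \(r_\mathrm{bucket}\)-wise uniform objects, all parameter values are hypothetical, and the \(\log (m/\delta)\) factor is approximated by \(\log m\) since \(\delta\) plays no role in this toy setup) estimating the left-hand side for a maximally regular matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, L, d = 4_000, 50, 100, 2
tau = 1 / np.sqrt(n)
# tau-regular matrix with unit-norm rows: entries +/- 1/sqrt(n).
T = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(n)

ell, trials, acc = 0, 300, 0.0
for _ in range(trials):
    h = rng.integers(0, L, size=n)        # uniform hash h : [n] -> [L]
    y = rng.choice([-1.0, 1.0], size=n)   # uniform signs
    B = (h == ell)                        # coordinates hashed to bucket ell
    acc += np.max(np.abs(T[:, B] @ y[B])) ** d
print("empirical E[||T^B y_B||_inf^d]:", acc / trials)
# The lemma's scale, with log(m/delta) replaced by log(m) in this toy setup.
print("lemma scale:", (tau * np.log(m) + np.sqrt(np.log(m) / L)) ** d)
```

For these values the two printed quantities land on the same order, illustrating that the two terms of the lemma capture the right scale.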

8.3 Proof of Theorem 8.1: The Hybrid Argument

In this subsection, we put together the two main results of the two previous subsections (Lemma 8.3 and Lemma 8.10) to prove Theorem 8.1.
Recalling Remark 8.2, we can write \(A\) as \(H+T\) , where every row of \(H\) is \(k\) -sparse and every row of \(T\) is \(\tau\) -regular with 2-norm 1. Let us say that a hash \(h : [n] \rightarrow [L]\) is \(H\) -good if
\begin{equation} | h^{-1}(\ell) \cap \mathrm{supp}(H_i)| \le w \coloneqq {\frac{2k}{L}} \end{equation}
(25)
for all buckets \(\ell \in [L]\) and rows \(i\in [m]\) . Equivalently, for all \(\ell \in [L]\) , every row of the submatrix \(H^{h^{-1}(\ell)}\) is \(w\) -sparse.
Proposition 8.11 (Even Distribution of Head Variables).
There is a universal constant \(C_1 \gt 0\) such that the following holds: If \(\boldsymbol {h}: [n] \rightarrow [L]\) is \(r_\mathrm{hash}\) -wise uniform where \(r_\mathrm{hash}\ge C_1 \log (Lm/\delta)\) , then
\[{{\bf Pr}}\big [\,\text{$\boldsymbol {h}$ is not $H$-good}\,\big ] \le \delta .\]
Proof.
Fix any \(\ell \in [L]\) and \(i \in [m].\) The quantity \(|\boldsymbol {h}^{-1}(\ell) \cap \mathrm{supp}(H_i)|\) is a sum of \(|\mathrm{supp}(H_i)| \le k\) many \(r_{\mathrm{hash}}\) -wise independent \(\lbrace 0,1\rbrace\) -valued random variables, each of which takes the value 1 with probability \(1/L.\) To bound the probability that \(|\boldsymbol {h}^{-1}(\ell) \cap \mathrm{supp}(H_i)|\) is larger than \(w\) , we apply the well-known tail bounds for sums of limited-independence random variables due to Schmidt, Siegel, and Srinivasan [54], specifically their Theorem 5(I)(a). Taking the “ \(\delta\) ” of their paper to be 1 and observing that their “ \(\mu\) ” is our \(k/L\) and their “ \(k\) ” is our \(r_{\mathrm{hash}}= \Theta (\log (Lm/\delta))\) , we get that \({{\bf Pr}}[|\boldsymbol {h}^{-1}(\ell) \cap \mathrm{supp}(H_i)| \gt w] \le \delta /(Lm).\) The proposition follows by a union bound over all \(\ell \in [L]\) and \(i \in [m].\) □
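The following small experiment (ours, with hypothetical values of \(k\) and \(L\), and a truly uniform hash in place of the \(r_\mathrm{hash}\)-wise uniform one) illustrates Proposition 8.11: overloading some bucket beyond \(w = 2k/L\) is a rare event.

```python
import numpy as np

rng = np.random.default_rng(2)
k, L, trials = 2_500, 100, 20_000
w = 2 * k // L   # bucket-size threshold from Equation (25): w = 2k/L

bad = 0
for _ in range(trials):
    buckets = rng.integers(0, L, size=k)   # hash each of the k support coords
    if np.bincount(buckets, minlength=L).max() > w:
        bad += 1
print(f"Pr[some bucket load > {w}] ~ {bad / trials:.4f}")
```

Here the expected load per bucket is \(k/L = 25\), and exceeding twice that in any of the \(L\) buckets happens in well under one percent of the trials.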
We are now ready to prove Theorem 8.1, which we restate here for convenience:
Theorem 8.12.
Let \(\mathscr{G}\) be our generator with parameters as given in Section 4.1, and likewise let \(\lambda \gt 0\) be as set in Section 4.1. For all \((k,\tau)\) -standardized matrices \(A \in \mathds R^{m\times n}\) and all \(b \in \mathds R^m\) ,
\[\Big | \mathop {{\bf E}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {u}) \big ] - \mathop {{\bf E}}_{\boldsymbol {z}\sim \mathscr{G}_{\mathrm{MZ}}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {z})\big ] \Big | = {O(\delta)}.\]
Proof.
Let \(\boldsymbol {h}, \boldsymbol {y}^1,\ldots ,\boldsymbol {y}^L, \tilde{\boldsymbol {y}}^1,\ldots ,\tilde{\boldsymbol {y}}^L\) , \(\breve{\boldsymbol {y}}\) , and \(\boldsymbol {y}^\star\) be the random hash function and random variables associated with our generator \(\mathscr{G}\) , as defined in Definition 4.2. Recall that a draw from \(\boldsymbol {z}\sim \mathscr{G}\) is \(\boldsymbol {z}\coloneqq \breve{\boldsymbol {y}}\oplus \boldsymbol {y}^\star\) . We will show that in fact \(\breve{\boldsymbol {y}}\) alone satisfies:
\begin{equation} \Big | \mathop {{\bf E}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {u}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\breve{\boldsymbol {y}})\big ] \Big | = {O(\delta)}. \end{equation}
(26)
Since \(\boldsymbol {y}^{\star }\) and \(\breve{\boldsymbol {y}}\) are independent, Theorem 8.1 follows as a consequence of Equation (26).
We recall the definition of \(\breve{\boldsymbol {y}}\) :
\[\breve{\boldsymbol {y}}_{\boldsymbol {h}^{-1}(\ell)} = (\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{\boldsymbol {h}^{-1}(\ell)} \qquad \text{for all $\ell \in [L]$}.\]
We observe first that for each \(\ell \in [L]\) , the random variable \(\boldsymbol {y}^\ell \oplus \tilde{\boldsymbol {y}}^{\ell } \sim \lbrace -1,1\rbrace ^n\) (i) is \(r_\mathrm{bucket}\) -wise uniform (since \(\boldsymbol {y}^{\ell }\) is); and (ii) \(\delta _\mathrm{CNF}\) -fools the class of width- \(w\) CNF formulas (since \(\tilde{\boldsymbol {y}}^{\ell }\) does).
We will use both properties in this proof. For each hash \(h : [n] \rightarrow [L]\) and index \(\ell \in \lbrace 0,1,\ldots ,L\rbrace\) , we define the hybrid random variable \(\boldsymbol {x}^{h,\ell } \sim \lbrace -1,1\rbrace ^n\) ,
\[\boldsymbol {x}^{h,\ell }_{h^{-1}(c)} = {\left\lbrace \begin{array}{ll} \boldsymbol {u}_{h^{-1}(c)} & \text{if $c \gt \ell $} \\ (\boldsymbol {y}^{c} \oplus \tilde{\boldsymbol {y}}^c)_{h^{-1}(c)} & \text{if $c \le \ell $}. \end{array}\right.}\]
Averaging over \(\boldsymbol {h}\) , we get that \(\boldsymbol {x}^{\boldsymbol {h},0} \equiv \boldsymbol {u}\) and \(\boldsymbol {x}^{\boldsymbol {h},L} \equiv \breve{\boldsymbol {y}}\) , and so we may write
\begin{align*} \text{LHS of Equation (26)} &= \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {u}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\breve{\boldsymbol {y}})\big ] \big | \\ &= \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},0}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},L})\big ] \big | \\ &\le \mathop {{\bf E}}_{\boldsymbol {h}} \Big [ \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},0}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},L})\big ] \big | \Big ] \\ &\le \mathop {{\bf E}}_{\boldsymbol {h}} \Big [ \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},0}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},L})\big ] \big | \cdot \mathds {1}\big [ \, \text{$\boldsymbol {h}$ is $H$-good}\,\big ] \Big ]\\ &\ \ \ \ + {{\bf Pr}}\big [\,\text{$\boldsymbol {h}$ is not $H$-good}\,\big ] \\ &\le \underbrace{\mathop {{\bf E}}_{\boldsymbol {h}} \Big [ \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},0}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{\boldsymbol {h},L})\big ] \big | \cdot \mathds {1}\big [ \, \text{$\boldsymbol {h}$ is $H$-good}\,\big ] \Big ]}_{\heartsuit } + \delta . \end{align*}
The penultimate inequality uses the fact that \(\widetilde{\mathcal {O}}_{b,\lambda }\) is \((0,1)\) -valued (and hence the difference in its expectations under any two distributions is at most 1), and the final inequality is by Proposition 8.11 (note that we indeed have \(r_\mathrm{hash}\ge {C_1 \log (Lm/\delta)}\) ).
It remains to bound \(\heartsuit\) by \(O(\delta)\) . Fix an \(H\) -good hash \(h\) . By the triangle inequality,
\begin{equation} \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{h,0}) \big ] - \mathop {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{h,L})\big ] \big | \le \sum _{\ell =1}^L\big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{h,\ell -1}) \big ] - {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{h,\ell })\big ] \big |. \end{equation}
(27)
Fix \(\ell \in [L]\) and consider the corresponding summand
\begin{equation} \big | \mathop {{\bf E}}\big [\widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{h,\ell -1}) \big ] - {{\bf E}}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A\boldsymbol {x}^{h,\ell })\big ] \big |. \end{equation}
(28)
For notational clarity, let us write \(B\) for \(h^{-1}(\ell)\) and \(\overline{B}\) to denote \([n]\setminus B.\) Furthermore, since these “adjacent” hybrid random variables \(\boldsymbol {x}^{h,\ell -1}\) and \(\boldsymbol {x}^{h,\ell }\) agree on all coordinates outside \(B\) , we introduce the random variable \(\boldsymbol {s}\sim \lbrace -1,1\rbrace ^{\overline{B}}\) where \(\boldsymbol {s}_{h^{-1}(c)} \equiv \boldsymbol {x}^{h,\ell -1}_{h^{-1}(c)} \equiv \boldsymbol {x}^{h,\ell }_{h^{-1}(c)}\) for all \(c \ne \ell\) . Note that \(\boldsymbol {s}, \boldsymbol {u}_B\) , and \((\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_B\) are mutually independent. We have that
\begin{align*} (28) &= \Big | \mathop {{\bf E}}_{\boldsymbol {s}} \Big [ \mathop {{\bf E}}_{\boldsymbol {u}} \big [ \widetilde{\mathcal {O}}_{b,\lambda }(A^{\overline{B}}\boldsymbol {s}+ A^B \boldsymbol {u}_{B}) \big ] - \mathop {{\bf E}}_{\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A^{\overline{B}}\boldsymbol {s}+ A^B (\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{B}) \big ] \Big ] \Big | \\ &\le \mathop {{\bf E}}_{\boldsymbol {s}} \Big [ \big |\mathop {{\bf E}}_{\boldsymbol {u}} \big [ \widetilde{\mathcal {O}}_{b,\lambda }(A^{\overline{B}}\boldsymbol {s}+ A^B \boldsymbol {u}_{B}) \big ] - \mathop {{\bf E}}_{\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\big [ \widetilde{\mathcal {O}}_{b,\lambda }(A^{\overline{B}}\boldsymbol {s}+ A^B (\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{B}) \big ] \big | \Big ] \\ &= \mathop {{\bf E}}_{\boldsymbol {s}} \Big [ \big |\mathop {{\bf E}}_{\boldsymbol {u}} \big [ \widetilde{\mathcal {O}}_{b-A^{\overline{B}}\boldsymbol {s},\lambda }(A^B \boldsymbol {u}_{B}) \big ] - \mathop {{\bf E}}_{\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\big [ \widetilde{\mathcal {O}}_{b-A^{\overline{B}}\boldsymbol {s},\lambda }(A^B (\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{B}) \big ] \big | \Big ] \qquad \qquad \qquad {\rm (Fact~6.3)} \\ &= \mathop {{\bf E}}_{\boldsymbol {s}} \Big [ \big |\mathop {{\bf E}}_{\boldsymbol {u}} \big [ \widetilde{\mathcal {O}}_{b-A^{\overline{B}}\boldsymbol {s},\lambda }(H^B \boldsymbol {u}_{B} + T^B\boldsymbol {u}_{B}) \big ] - \mathop {{\bf E}}_{\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\big [ \widetilde{\mathcal {O}}_{b-A^{\overline{B}}\boldsymbol {s},\lambda }(H^B (\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{B}+ T^B (\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{B}) \big ] \big | \Big ]. \end{align*}
Since \(h\) is \(H\) -good, every row of \(H^B\) is indeed \(w\) -sparse, and since every row of \(T\) has 2-norm 1, every row of \(T^B\) has 2-norm at most 1. Recalling (ii) from above, we may apply Lemma 8.3 to each outcome \(s\) of \(\boldsymbol {s}\) , and we get that this quantity is at most
\[\delta _{\mathrm{CNF}} \cdot m^{d-1} \cdot O\!\left(\frac{\sqrt {n}}{\lambda }\right)^{\!d-1} + O\!\left(\frac{\sqrt {\log m}}{\lambda }\right)^{\!d}\left(\mathop {{\bf E}}_{\boldsymbol {u}}\big [\Vert T^B\boldsymbol {u}_B\Vert _\infty ^d\big ] + \mathop {{\bf E}}_{\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\big [\Vert T^B(\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_B\Vert _\infty ^d\big ]\right),\]
and therefore
\begin{align} \text{RHS of Equation (27)} &\le L\cdot \delta _{\mathrm{CNF}} \cdot m^{d-1} \cdot O\!\left(\frac{\sqrt {n}}{\lambda }\right)^{\!d-1} \nonumber\\ & \ \ \ \ + O\!\left(\frac{\sqrt {\log m}}{\lambda }\right)^{\!d}\cdot \sum _{\ell =1}^L\left(\mathop {{\bf E}}_{\boldsymbol {u}}\Big [\Vert T^{h^{-1}(\ell)}\boldsymbol {u}_{h^{-1}(\ell)}\Vert _\infty ^d\Big ]\right. \nonumber\\ & \ \ \ \ \left.+\mathop {{\bf E}}_{\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\Big [\Vert T^{h^{-1}(\ell)}(\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{h^{-1}(\ell)}\Vert _\infty ^d\Big ]\right). \end{align}
(29)
Since Equation (29) holds for every \(H\) -good hash \(h\) , we have shown that
\begin{align*} \heartsuit &\le \mathop {{\bf E}}_{\boldsymbol {h}}\big [ \text{(RHS of Equation (27))} \cdot \mathds {1}[ \,\text{$\boldsymbol {h}$ is $H$-good}\,] \big ] \\ &\le \mathop {{\bf E}}_{\boldsymbol {h}}\big [ \text{(RHS of Equation (29))} \cdot \mathds {1}[ \,\text{$\boldsymbol {h}$ is $H$-good}\,] \big ] \\ &\le \mathop {{\bf E}}_{\boldsymbol {h}}\big [ \text{(RHS of Equation (29))} \big ] \\ &= L\cdot \delta _{\mathrm{CNF}} \cdot m^{d-1} \cdot O\!\left(\frac{\sqrt {n}}{\lambda }\right)^{\!d-1} \\ & \ \ \ \ + O\!\left(\frac{\sqrt {\log m}}{\lambda }\right)^{\!d}\cdot \underbrace{\sum _{\ell =1}^L\left(\mathop {{\bf E}}_{\boldsymbol {h},\boldsymbol {u}}\Big [\Vert T^{\boldsymbol {h}^{-1}(\ell)}\boldsymbol {u}_{\boldsymbol {h}^{-1}(\ell)}\Vert _\infty ^d\Big ] + \mathop {{\bf E}}_{\boldsymbol {h},\boldsymbol {y}^\ell ,\tilde{\boldsymbol {y}}^{\ell }}\Big [\Vert T^{\boldsymbol {h}^{-1}(\ell)}(\boldsymbol {y}^{\ell } \oplus \tilde{\boldsymbol {y}}^\ell)_{\boldsymbol {h}^{-1}(\ell)}\Vert _\infty ^d\Big ]\right)}_{\diamondsuit }. \end{align*}
Applying Lemma 8.10 to bound each of the \(2L\) many summands of \(\diamondsuit\) , we have that
\begin{align} \heartsuit &\le L\cdot \delta _{\mathrm{CNF}} \cdot m^{d-1} \cdot O\!\left(\frac{\sqrt {n}}{\lambda }\right)^{\!d-1} + O\!\left(\frac{\sqrt {\log m}}{\lambda }\right)^{\!d}\cdot 2L\cdot O\!\left(\tau \log (m/\delta) + \sqrt {\frac{\log (m/\delta)}{L}}\right)^{\!d} \nonumber\\ &= L\cdot \delta _{\mathrm{CNF}} \cdot m^{d-1} \cdot O\!\left(\frac{\sqrt {n}}{\lambda }\right)^{\!d-1} + L \cdot O\!\left(\frac{\tau \sqrt {\log m}\, \log (m/\delta)}{\lambda } + \frac{\sqrt {\log (m)\log (m/\delta)}}{\lambda \sqrt {L}}\right)^{\!d}. \end{align}
(30)
By our choice of parameters as set in Section 4.1,
\begin{align*} (30) &= O(\delta) + \frac{(\log m)^5}{\delta ^{2+\varepsilon }} \cdot O\!\left(\delta ^\varepsilon \cdot \frac{\log (m) (\log (m/\delta))^{1.5 + \varepsilon }}{(\log m)^{2.5+\varepsilon }} + \delta ^{\varepsilon /2} \cdot \frac{\log (m)\log (m/\delta)}{(\log m)^{2.5}}\right)^{\!d}. \end{align*}
Taking \(d\) to be sufficiently large relative to \(\varepsilon\) , the above expression can be bounded by \(O(\delta)\) . This establishes Equation (26), and the proof of Theorem 8.1 is complete.□
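As an aside, the hybrid random variables \(\boldsymbol {x}^{h,\ell }\) can be rendered schematically in code. The sketch below (ours; a full-entropy RNG stands in for the limited-independence distributions of Definition 4.2, and each XOR \(\boldsymbol {y}^c \oplus \tilde{\boldsymbol {y}}^c\) is collapsed into a single vector per bucket) shows how bucket \(c\) switches from uniform to pseudorandom bits as \(\ell\) sweeps from \(0\) to \(L\):

```python
import numpy as np

def hybrid(h, ell, uniform_bits, prg_bits):
    """x^{h,ell}: bucket c draws from prg_bits[c] when c <= ell, else uniform."""
    x = uniform_bits.copy()
    for c in range(1, ell + 1):
        mask = (h == c)
        x[mask] = prg_bits[c][mask]
    return x

rng = np.random.default_rng(3)
n, L = 16, 4
h = rng.integers(1, L + 1, size=n)                 # hash into buckets 1..L
u = rng.choice([-1, 1], size=n)                    # truly uniform bits
prg = {c: rng.choice([-1, 1], size=n) for c in range(1, L + 1)}

assert np.array_equal(hybrid(h, 0, u, prg), u)     # x^{h,0} == u
print(hybrid(h, L, u, prg))                        # x^{h,L}: the breve-y analogue
```

Each intermediate \(\ell\) changes only a single bucket, which is exactly what makes the telescoping sum in Equation (27) go through.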

9 Proof of Theorem 5.3

Having completed both steps of the two-step program described at the end of Section 6, we are finally ready to prove Theorem 5.3, which we restate here for convenience:
Theorem 9.1.
Let \(\mathscr{G}\) be our generator with parameters as set in Section 4.1. For all \(m\) -facet \((k,\tau)\) -standardized polytopes \(Ax \le b\) ,
\[\Big | \mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [A\boldsymbol {u}\in \mathcal {O}_{b} \big ] - \mathop {{\bf Pr}}_{\boldsymbol {z}\sim \mathscr{G}}\big [ A\boldsymbol {z}\in \mathcal {O}_{b}\big ] \Big | = O(\delta).\]
Proof.
Let \(\lambda \in (0,1)\) be as set in Section 4.1. By Lemma 6.7, there are \(b^{{\mathrm{in}}},b^{{\mathrm{out}}} \in \mathds R^m\) such that \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda },\widetilde{\mathcal {O}}_{b^{{\mathrm{out}}},\lambda }\) are \((\Lambda ,\delta)\) -inner and -outer approximators for \(\mathcal {O}_b\) , respectively, where \(\Lambda = \Theta (\lambda \sqrt {\log (m/\delta)}).\) Next, we apply Lemma 6.9 with \(\boldsymbol {v}\) and \(\tilde{\boldsymbol {v}}\) being \(A\boldsymbol {u}\) and \(A \boldsymbol {z}\) , respectively, using Theorem 8.1 to show that Equation (7) is satisfied for both \(\widetilde{\mathcal {O}}_{b^{{\mathrm{in}}},\lambda }\) and \(\widetilde{\mathcal {O}}_{b^{{\mathrm{out}}},\lambda }\) with \(\gamma = O(\delta)\) . We conclude that:
\begin{align} &\left| \mathop {{\bf Pr}}_{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^n}\big [A\boldsymbol {u}\in \mathcal {O}_{b}\big ] - \mathop {{\bf Pr}}_{\boldsymbol {z}\sim \mathscr{G}}\big [A\boldsymbol {z}\in \mathcal {O}_{b}\big ] \right|\nonumber\\ &= O(\delta) + {{\bf Pr}}\big [A\boldsymbol {u}\in \Game _{\pm \Lambda }\mathcal {O}_b\big ] \qquad \qquad \qquad {\rm (Lemma~6.9 and Theorem~8.1)} \end{align}
(31)
\begin{align} &= O(\delta) + O\big (\Lambda \sqrt {\log m}\big) \qquad \qquad \qquad (\text{Theorem~7.1; note that } \Lambda \ge \tau \text{ is indeed satisfied}) \end{align}
(32)
\begin{align} &= O(\delta) + O\left(\lambda \sqrt {\log (m/\delta)\log m}\right) \nonumber\\ &= O(\delta). \end{align}
(33)
This completes the proof of Theorem 5.3.□

Acknowledgments

Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Footnotes

1
Their seed length improves to \(O(m \log (m/\delta) + \log n)\) if \(m/\delta\) is bounded by any \(\mathrm{polylog}(n)\) .
2
Equivalently, \(A\) is \((n, \tau)\) -standardized.
3
Given a halfspace \(\mathds {1}[w \cdot x \le \theta ]\) , there is a smallest value \(\theta ^{\prime } \gt \theta\) achievable as \(w \cdot x\) for \(x \in \lbrace -1,1\rbrace ^n\) ; first perturb \(\theta\) upward to \((\theta + \theta ^{\prime })/2\) . Now no input \(x\) achieves \(w\cdot x\) equal to the new threshold exactly, so we can perturb the coefficients of \(w\) by sufficiently small amounts.

A Proof of Theorem 5.1

In this section, we prove Theorem 5.1. The proof uses the “critical index” theory for Boolean halfspaces, introduced in Reference [55] and used in several subsequent works on halfspaces.
Definition A.1 (Critical Index).
Let \(w \in \mathds R^n\) and assume for notational simplicity that \(|w_1| \ge |w_2| \ge \cdots \ge |w_n|.\) The \(\tau\) -critical index of \(w\) is the least \(j\) such that the “tail” \((w_j, w_{j+1}, \ldots , w_n)\) is \(\tau\) -regular, or \(\infty\) if no such \(j\) exists.
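Definition A.1 translates directly into code. The helper below (a transcription of ours, not from the paper) computes the \(\tau\)-critical index of a weight vector, returning None in the infinite case:

```python
import numpy as np

def critical_index(w, tau):
    """Least j (1-based) with tail (w_j,...,w_n) tau-regular; None if no such j."""
    w = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]  # |w_1| >= ... >= |w_n|
    tail_sq = np.cumsum((w ** 2)[::-1])[::-1]              # tail_sq[j-1] = w_j^2 + ... + w_n^2
    for j in range(1, len(w) + 1):
        # The tail is tau-regular iff its largest entry |w_j| <= tau * ||tail||_2.
        if w[j - 1] <= tau * np.sqrt(tail_sq[j - 1]):
            return j
    return None

print(critical_index([8, 4, 2, 1, 1, 1, 1, 1], tau=0.5))   # prints 4
```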
Given \(A\) as in Theorem 5.1, the rows that are already \((k,\tau)\) -regular pose no difficulty as a simple rescaling of any such row (and the corresponding entry of \(b\) ) makes it \((k,\tau)\) -standardized. The remaining rows \(A_i\) have \(\tau\) -critical index exceeding \(k\) . The critical index theory [49, 55] says that such halfspaces \(\mathds {1}[A_i x \le b_i]\) are very close to \(k\) -juntas, and in fact Reference [9] shows that this is true even under \((k+2)\) -wise uniform distributions (for a slightly larger choice of \(k\) as alluded to in Remark 5.2). We tweak the quantitative aspects of these arguments below to work for the choice of \(k\) given in Equation (3). It will be convenient to follow the treatment in Reference [18].
The first lemma below says that if the “head” variables are set uniformly, then the resulting random variable has good anticoncentration at the scale of the two-norm of the tail:
Lemma A.2.
Let \(\tau \in (0,1)\) , \(\varepsilon \in (0,1/2)\) , \(s \gt 1\) . Then for a certain \(\ell = O(\log (s) \log (1/\varepsilon)/\tau ^2)\) the following holds: If \(w \in \mathds R^n\) as in Definition A.1 has \(\tau\) -critical index at least \(\ell\) , then for all \(\theta \in \mathds R\) ,
\[\mathop {{\bf Pr}}_{\substack{\boldsymbol {u}\sim \lbrace -1,1\rbrace ^\ell \\ \text {uniform}}} \big [\,|w_1\boldsymbol {u}_1 + \cdots + w_\ell \boldsymbol {u}_\ell - \theta | \le s \cdot \sigma \,\big ] \le \varepsilon + O(\log (1/\varepsilon)\exp (-s^2/2)),\]
where \(\sigma \coloneqq \sqrt {w_{\ell +1}^2 + \cdots + w_n^2}\) .
Proof.
We refer directly to the proof of the almost identical Reference [18, Theorem 5.3] in the full version of that paper. In that proof we may take “ \(\delta\) ” to be \(\tau ^2\) , and “ \(\eta\) ” to be \(1/\sqrt {3}\) , since we work with uniform \(\pm 1\) bits (see Fact 3.3.5 therein). The only change needed in the proof occurs before “inequality (10).” That inequality uses the fact that a certain random variable \(\boldsymbol {z}\) satisfies the tail bound \({{\bf Pr}}[|\boldsymbol {z}| \ge s \rho ] \le O(1/s^4)\) when \(\rho\) is at most the standard deviation of \(\boldsymbol {z}\) . But in our current setting, the random variable \(\boldsymbol {z}\) equals \(w_1\boldsymbol {u}_1 + \cdots + w_\ell \boldsymbol {u}_\ell\) , i.e., it is a weighted sum of independent uniform \(\pm 1\) bits, and so we have the improved tail bound \(2\exp (-s^2/2)\) using Hoeffding. Carrying through the remainder of the proof with this change yields the conclusion of Lemma A.2.□
Lemma A.3.
Let \(\tau \in (0,1)\) and let \(\varepsilon \in (0,1/2)\) . Then for a certain \(k = O(\log (1/\varepsilon)\log \log (1/\varepsilon)/\tau ^2)\) and \(r = O(\log (1/\varepsilon))\) , the following holds for every \(w \in \mathds R^n\) that is not \((k,\tau)\) -regular:
Let \(H \subseteq [n]\) be the set of \(k\) coordinates \(i\) for which \(|w_i|\) is largest and let \(T = [n] \setminus H\) . Assume \(w^{\prime } \in {\mathds R^n}\) has \(w^{\prime }_H = w_H\) and \(\Vert w^{\prime }_T\Vert _2 \le \Vert w_T\Vert _2\) . Then for any \({\theta } \in \mathds R\) ,
\[\mathop {{\bf Pr}}_{\boldsymbol {y}}\big [\,\mathds {1}[w \cdot \boldsymbol {y}\le \theta ] \ne \mathds {1}[w^{\prime } \cdot \boldsymbol {y}\le \theta ]\,\big ] = O(\varepsilon)\]
provided \(\boldsymbol {y}\sim \lbrace -1,1\rbrace ^n\) is \((k + r)\) -wise uniform.
Proof.
Suppose \(w\) is not \((k,\tau)\) -regular. By reordering coordinates we may assume that \(H = [k]\) ; then the non- \((k,\tau)\) -regularity of \(w\) means the \(\tau\) -critical index of \(w\) exceeds \(k\) . We may therefore apply Lemma A.2 with \(s = O(\sqrt {\log (1/\varepsilon)})\) . Using the fact that \(\boldsymbol {y}_H\) is fully uniform, we get
\begin{equation} {{\bf Pr}}\big [\,|w_H \cdot \boldsymbol {y}_H - \theta | \le s \cdot \Vert w_{T}\Vert _2\,\big ] = O(\varepsilon) \qquad \text{(and note that } w^{\prime }_H \cdot \boldsymbol {y}_H = w_H \cdot \boldsymbol {y}_H\text{).} \end{equation}
(34)
Conditioned on any outcome of \(\boldsymbol {y}_H\) , the distribution of \(\boldsymbol {y}_T\) remains \(r\) -wise uniform. We claim that it remains to show the following:
\begin{equation} {{\bf Pr}}[|w^{\prime }_T \cdot \boldsymbol {y}_T| \ge s \cdot \Vert w_{T}\Vert _2] = O(\varepsilon). \end{equation}
(35)
To see that this suffices, observe that by Equation (34) we have \(|w_H \cdot \boldsymbol {y}_H - \theta | = |w^{\prime }_H \cdot \boldsymbol {y}_H - \theta | \gt s \cdot \Vert w_{T}\Vert _2\) except with probability \(O(\varepsilon)\) . Also, by applying Equation (35) both to \(w^{\prime }\) itself and with \(w^{\prime }\) taken to be \(w\) , we get that \(|w_T \cdot \boldsymbol {y}_T|, |w^{\prime }_T \cdot \boldsymbol {y}_T| \le s \cdot \Vert w_{T}\Vert _2\) except with another probability of at most \(O(\varepsilon)\) . When all of these events occur, \(\mathds {1}[w \cdot \boldsymbol {y}\le \theta ]\) and \(\mathds {1}[w^{\prime } \cdot \boldsymbol {y}\le \theta ]\) agree.
Finally, we can establish Equation (35) by appealing to, e.g., Reference [47, Theorem 9.23]. That theorem (with \(k = 1\) ) shows that for \(t \ge \sqrt {2e}\) , any linear form \(f(\boldsymbol {x})\) in uniform \(\pm 1\) random variables \(\boldsymbol {x}\) has \({{\bf Pr}}[|f(\boldsymbol {x})| \ge t \Vert f\Vert _2] \le \exp (-O(t^2))\) . If we could directly apply this to the linear form \(w^{\prime }_T \cdot \boldsymbol {y}_T\) , then we would be done by taking \(t = s\) and using \(\Vert w^{\prime }_{{{T}}}\Vert _2 \le \Vert w_{{{T}}}\Vert _2\) . We cannot directly apply this theorem, because the bits \(\boldsymbol {y}_T\) are not uniformly random. However, inspecting the proof of Reference [47, Theorem 9.23] shows that it suffices for those bits to be \(O(t^2)\) -wise uniform, which they are provided that \(r = O(\log (1/\varepsilon)) = O(s^2) = O(t^2)\) . The reason that this suffices is because the proof only uses \((2,q,1/\sqrt {q-1})\) -hypercontractivity of \(f(\boldsymbol {x})\) for \(q = O(t^2)\) , and (for even integer \(q\) ) this condition only involves the first \(q\) moments of \(f(\boldsymbol {x})\) , which do not change if \(\boldsymbol {x}\) is assumed to be merely \(q\) -wise uniform rather than truly uniform.□
We can now prove Theorem 5.1:
Proof.
We will use Lemma A.3 with \(\varepsilon = c \delta /m\) for small constant \(c \gt 0\) . This leads to the choice of \(k\) in the statement of Theorem 5.1; also, \(r \ll k\) and so \(2k \ge r + k.\)
Given \(A \in \mathds R^{m \times n}\) , as noted earlier the rows that are \((k,\tau)\) -regular are not a problem, so we consider all rows \(A_i\) that are not \((k,\tau)\) -regular. For these rows, we apply Lemma A.3, taking \(A^{\prime }_i\) to agree with \(A_i\) on the appropriate “head” coordinates \(H_i\) , and taking \(A^{\prime }_i\) to simply be 0 on the remaining “tail” coordinates. Note that \(A^{\prime }_i\) is now trivially \((k,\tau)\) -regular. By Lemma A.3, we have that
\[\mathop {{\bf Pr}}_{\boldsymbol {y}}\big [\,\mathds {1}[A_i \cdot \boldsymbol {y}\le b_i] \ne \mathds {1}[A^{\prime }_i \cdot \boldsymbol {y}\le b_i]\,\big ] \le \delta /m.\]
Taking \(b^{\prime }_i = b_i\) for these \(i\) ’s, and union-bounding over the at most \(m\) of them, we are almost at the point of establishing Equation (4) from the theorem statement. We now have that all \(A^{\prime }_i\) are \((k,\tau)\) -regular; the only deficiency is that the “tail” of each row need not have 2-norm 1, as required.
Whenever the “tail” of \(A^{\prime }_i\) has nonzero 2-norm, we can simply scale \(A^{\prime }_i\) and \(b^{\prime }\) by the same positive factor to make the tail of \(A^{\prime }_i\) have 2-norm 1; this scaling does not change the Boolean function \(\mathds {1}[A^{\prime }_i \cdot x \le b^{\prime }_i]\) at all. The only (very minor) difficulty now remaining is that some of the rows \(A^{\prime }_i\) may have tail with 2-norm zero. It is well known, however, that one can always slightly perturb the coefficients and threshold in a halfspace without changing it as a Boolean function.3 We can perturb in such a way that the tail coefficients all become equal to some sufficiently small \(\eta \gt 0\) . After this perturbation, the row \(A^{\prime }_i\) is \((k,\tau)\) -regular (this holds, recalling that \(k \le n/2\) , since \(n-k \ge k \ge 1/\tau ^2\) ) and its tail has positive 2-norm. Now we can scale up \((A^{\prime }_i,b^{\prime }_i)\) as before to make the tail have 2-norm 1.□
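The row-by-row standardization just described can be summarized in the following rough sketch (ours, under the simplifications of this proof; the perturbation step for an all-zero tail, handled via footnote 3, is omitted):

```python
import numpy as np

def standardize_row(a, b, k, tau):
    """One row of the reduction: returns (a', b') with a' (k,tau)-regular."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(-np.abs(a))               # coordinates by decreasing |a_j|
    head, tail = order[:k], order[k:]
    tail_norm = np.linalg.norm(a[tail])
    tail_max = np.max(np.abs(a[tail]), initial=0.0)
    if tail_norm > 0 and tail_max <= tau * tail_norm:
        # Already (k,tau)-regular: rescale so the tail has 2-norm exactly 1.
        return a / tail_norm, b / tail_norm
    # tau-critical index exceeds k: keep the head, zero out the tail (Lemma A.3).
    # The perturb-and-rescale step for the now-zero tail is omitted here.
    a2 = np.zeros_like(a)
    a2[head] = a[head]
    return a2, float(b)
```

Rescaling a row and its threshold by the same positive factor leaves the Boolean function \(\mathds {1}[A^{\prime }_i \cdot x \le b^{\prime }_i]\) unchanged, which is why the sketch is free to normalize the tail.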

B Proof of Claim 7.11

We recall Claim 7.11:
Claim B.1 (Claim 7.11 Restated).
For \(2 \le m \le 2^n\) , there is a matrix \(A \in \lbrace -1,1\rbrace ^{m \times n}\) and a vector \(b \in \mathds Z^m\) such that
\[{{\bf Pr}}\big [ A\boldsymbol {u}\in \Game \mathcal {O}_b\big ] = \Omega \!\left(\frac{\sqrt {\ln m}}{\sqrt {n}}\right).\]
Proof.
The proof is a simple probabilistic existence argument that follows the approach used to prove Theorem 2 in Reference [26]. For a polytope \(\mathcal {K} = \mathds {1}[ Ax \le b]\) we define
\begin{align*} \mathrm{Inside}(\mathcal {K}) &= \lbrace x \in \lbrace -1,1\rbrace ^n: Ax \in \mathcal {O}_b \setminus \Game \mathcal {O}_b, &\text{i.e.,~}A_i x \lt b_i \text{~for all~}i \in [m]\rbrace ,\\ \mathrm{Surface}(\mathcal {K}) &= \lbrace x \in \lbrace -1,1\rbrace ^n: Ax \in \Game \mathcal {O}_b, &\text{i.e.,~}Ax \le b \text{~and~}A_i x = b_i \text{~for some~}i \in [m]\rbrace . \end{align*}
Given \(2 \le m \le 2^n\) , if \(m \lt 10,\) then the one-facet polytope \(\mathds {1}[x_1 + \cdots + x_n \le 0]\) does the job (more formally, we take \(A\) to be the \(m \times n\) all-1’s matrix and \(b\) to be the zero vector in \(\mathds R^m\) ). It is also clear that proving our result for \(m \le 2^{n/10}\) also proves it for \(2^{n/10} \le m \le 2^n\) . So, we henceforth assume that \(10 \le m \le 2^{n/10}\) . Let \(k \ge n/2\) be an integer to be chosen later, and let \(F: \lbrace -1,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) denote the halfspace \(F(x)= \mathds {1}[x_1 + \cdots + x_n \le k]\) . Now define the following quantities:
\begin{align*} p_{\mathrm{I}} &\coloneqq |\mathrm{Inside}(F)|/2^n = {n \choose \lt \!\!k}/2^n,\qquad p_{\mathrm{S}} \coloneqq |\mathrm{Surface}(F)|/2^n = {n \choose k}/2^n. \end{align*}
Let \(\overline{\boldsymbol {\sigma }}=(\boldsymbol {\sigma }^1,\ldots ,\boldsymbol {\sigma }^{m})\) where each \(\boldsymbol {\sigma }^i\) is an independent uniform string in \(\lbrace -1,1\rbrace ^n\) . Let \(\boldsymbol {A}\in \lbrace -1,1\rbrace ^{m \times n}\) be the matrix whose \(i\) th row is \(\boldsymbol {\sigma }^i\) , and let \(b\) be the vector \((k,\ldots ,k) \in \mathds Z^{m}.\) It is easy to see that to prove our result it suffices to show that there is a fixed outcome \(A\) of \(\boldsymbol {A}\) such that
\begin{equation} {{\bf Pr}}\big [ A\boldsymbol {u}\in \Game \mathcal {O}_{b}\big ] = \Omega \!\left(\frac{\sqrt {\log m}}{\sqrt {n}}\right), \end{equation}
(36)
and this is what we show below. Towards this end, for each \(i \in [m],\) let us define the matrix \(\boldsymbol {A}^{\setminus i} \in \lbrace -1,1\rbrace ^{(m-1) \times n}\) obtained by removing the \(i\) th row of \(\boldsymbol {A}\) , and further define \(b^{\prime } = (k,\ldots ,k) \in \mathds Z^{m-1}.\)
For each fixed \(z \in \lbrace -1,1\rbrace ^n\) and each \(i \in [m]\) we have
\[\mathop {{\bf Pr}}_{\overline{\boldsymbol {\sigma }}} \left[ z \in \mathrm{Inside}(\mathds {1}[\boldsymbol {A}^{\setminus i} x \le b^{\prime }]) \right]= p_{\mathrm{I}}^{m-1}\]
and
\[\mathop {{\bf Pr}}_{\overline{\boldsymbol {\sigma }}} \left[ z \in \mathrm{Surface}(\mathds {1}[\boldsymbol {\sigma }^i \cdot x \le k])\right]= p_{\mathrm{S}}.\]
Since \(\boldsymbol {\sigma }^i\) and \((\boldsymbol {\sigma }^{i^{\prime }})_{i^{\prime } \in [m] \setminus \lbrace i\rbrace }\) are independent for each \(i \in [m]\) , it follows that
\[\mathop {{\bf Pr}}_{\overline{\boldsymbol {\sigma }}} \left[ z \in \mathrm{Inside}(\mathds {1}[\boldsymbol {A}^{\setminus i} x \le b^{\prime }]) \, \, \& \, \, z \in \mathrm{Surface}(\mathds {1}[\boldsymbol {\sigma }^i \cdot x \le k]) \right]= p_{\mathrm{S}} \cdot p_\mathrm{I}^{m-1},\]
and since the events
\[z \in \mathrm{Inside}(\mathds {1}[\boldsymbol {A}^{\setminus i} x \le b^{\prime }]) \, \, \& \, \, z \in \mathrm{Surface}(\mathds {1}[\boldsymbol {\sigma }^i \cdot x \le k])\]
and
\[z \in \mathrm{Inside}(\mathds {1}[\boldsymbol {A}^{\setminus i^{\prime }} x \le b^{\prime }]) \, \, \& \, \, z \in \mathrm{Surface}(\mathds {1}[\boldsymbol {\sigma }^{i^{\prime }} \cdot x \le k])\]
are mutually exclusive for \(i \ne i^{\prime } \in [m]\) , by a union bound we have that
\[\mathop {{\bf Pr}}_{\overline{\boldsymbol {\sigma }}} \big [ \, z \in \mathrm{Surface}(\mathds {1}[\boldsymbol {A}x \le b]) \big ] = \mathop {{\bf Pr}}_{\overline{\boldsymbol {\sigma }}}\big [ \boldsymbol {A}z \in \Game \mathcal {O}_{b}\big ] \ge m \cdot p_{\mathrm{S}} \cdot p_{\mathrm{I}}^{m-1}.\]
It follows that there is an outcome of \(\overline{\boldsymbol {\sigma }}\) such that the resulting matrix \(A \in \mathds R^{m \times n}\) has at least an \(m \cdot p_{\mathrm{S}} \cdot p_{\mathrm{I}}^{m-1}\) fraction of all points in \(\lbrace -1,1\rbrace ^n\) satisfying \(A z \in \Game \mathcal {O}_{{{b}}}\) ; i.e.,
\begin{equation} {{\bf Pr}}\big [ A\boldsymbol {u}\in \Game \mathcal {O}_b\big ] \ge m \cdot p_{\mathrm{S}} \cdot p_{\mathrm{I}}^{m-1}. \end{equation}
(37)
It remains only to argue that for any \(10 \le m \le 2^{n/10}\) , there is a value \(k\) such that, for \(p_{\mathrm{I}}={n \choose \lt \!\!k}/2^n\) and \(p_{\mathrm{S}} = {n \choose k}/2^n\) , we have
\[m \cdot p_{\mathrm{S}} \cdot p_{\mathrm{I}}^{m-1} = \Omega \left(\sqrt {\log m}/\sqrt {n}\right).\]
Towards this end we choose \(k\) to be the largest integer such that \({n \choose \lt k} / 2^n \le 1-{\frac{1}{m}}\) . Recalling that \(10 \le m \le 2^{n/10}\) , we have that \(n/2 \le k \le 0.99n\) , and hence \({n \choose k}\) and \({n \choose k-1}\) are within an absolute constant multiplicative factor of each other. It follows that
\[p_{\mathrm{I}} = {n \choose \lt \!\!k}/2^n = 1-\Theta (1/m),\]
which implies that
\[p_{\mathrm{I}}^{m-1}=\Omega (1).\]
Writing \(k = n/2 + (\sqrt {n}/2) t\) , we have the elementary binomial tail lower bound \(1-p_{\mathrm{I}} \ge \exp (-O(t^2))\) (see, e.g., Reference [36, inequality (4.2)]); hence \(t \ge \Omega (\sqrt {\ln m})\) . The desired bound
\[p_{\mathrm{S}} = {n \choose k}/2^n = \Omega (t/(m \sqrt {n}))\]
now follows from asymptotically tight estimates (up to universal constants for all \(0 \le t \le \sqrt {n}\) ) for the Mills ratio of the binomial distribution; see Reference [38].□
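As a sanity check on the final calculation, the following snippet (ours, with hypothetical \(n\) and \(m\), recast in terms of the number of \(+1\) coordinates so that \(p_{\mathrm{I}}\) and \(p_{\mathrm{S}}\) become binomial probabilities) chooses \(k\) as in the proof and compares \(m \cdot p_{\mathrm{S}} \cdot p_{\mathrm{I}}^{m-1}\) with \(\sqrt {\ln m}/\sqrt {n}\):

```python
from math import comb, log, sqrt

n, m = 200, 50
probs = [comb(n, j) / 2**n for j in range(n + 1)]     # Pr[Bin(n,1/2) = j]
# Largest k with Pr[Bin < k] <= 1 - 1/m, as in the proof.
k = max(kk for kk in range(n + 1) if sum(probs[:kk]) <= 1 - 1 / m)
p_I = sum(probs[:k])                                  # "inside" probability
p_S = probs[k]                                        # "surface" probability
lhs = m * p_S * p_I ** (m - 1)
print(f"k = {k}; m*p_S*p_I^(m-1) = {lhs:.3f}; "
      f"sqrt(ln m)/sqrt(n) = {sqrt(log(m)) / sqrt(n):.3f}")
```

With these values the two printed quantities agree up to a small constant factor, matching the claimed \(\Omega (\sqrt {\ln m}/\sqrt {n})\) scaling.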

References

[1]
Miklós Ajtai and Avi Wigderson. 1989. Deterministic simulation of probabilistic constant depth circuits. Adv. Comput. Res. 5 (1989), 199–222.
[2]
Srinivasan Arunachalam and Penghui Yao. 2021. Positive spectrahedrons: Geometric properties, invariance principles and pseudorandom generators. CoRR abs/2101.08141 (2021).
[3]
Louay Bazzi. 2009. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput. 38, 6 (2009), 2220–2272.
[4]
Richard Beigel, Nick Reingold, and Daniel Spielman. 1995. PP is closed under intersection. J. Comput. Syst. Sci. 50, 2 (1995), 191–202.
[5]
Vidmantas Bentkus. 1990. Smooth approximations of the norm and differentiable functions with bounded support in Banach space \(l^k_\infty\) . Lithuan. Math. J. 30, 3 (1990), 223–230.
[6]
Avrim Blum and Ravi Kannan. 1997. Learning an intersection of a constant number of halfspaces under a uniform distribution. J. Comput. Syst. Sci. 54, 2 (1997), 371–380.
[7]
Mei-Chu Chang. 2002. A polynomial bound in Freiman’s theorem. Duke Math. J. 113, 3 (2002), 399–419.
[8]
Eshan Chattopadhyay, Anindya De, and Rocco Servedio. 2018. Simple and efficient pseudorandom generators from Gaussian processes.
[9]
Ilias Diakonikolas, Parikshit Gopalan, Rajesh Jaiswal, Rocco A. Servedio, and Emanuele Viola. 2010. Bounded independence fools halfspaces. SIAM J. Comput. 39, 8 (2010), 3441–3462.
[10]
Ilias Diakonikolas, Daniel Kane, and Jelani Nelson. 2010. Bounded independence fools degree-2 threshold functions. In Proceedings of the 51st Annual Symposium on Foundations of Computer Science (FOCS). 11–20.
[11]
Devdatt Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge.
[12]
Paul Erdős. 1945. On a lemma of Littlewood and Offord. Bull. Amer. Math. Soc. 51 (1945), 898–902.
[13]
William Feller. 1968. An Introduction to Probability Theory and Its Applications. John Wiley & Sons.
[14]
Péter Frankl and Zoltán Füredi. 1988. Solution of the Littlewood–Offord problem in high dimensions. Ann. Math. 128, 2 (1988), 259–270.
[15]
Parikshit Gopalan, Daniel Kane, and Raghu Meka. 2015. Pseudorandomness via the discrete Fourier transform. In Proceedings of the 56th Annual Symposium on Foundations of Computer Science (FOCS). 903–922.
[16]
Parikshit Gopalan, Adam Klivans, and Raghu Meka. 2012. Learning functions of halfspaces using prefix covers. In Proceedings of the 25th Annual Conference on Learning Theory (COLT).
[17]
Parikshit Gopalan, Raghu Meka, and Omer Reingold. 2013. DNF sparsification and a faster deterministic counting algorithm. Comput. Complex. 22, 2 (2013), 275–310.
[18]
Parikshit Gopalan, Ryan O’Donnell, Yi Wu, and David Zuckerman. 2010. Fooling functions of halfspaces under product distributions. In Proceedings of the 25th Annual Conference on Computational Complexity (CCC). 223–234. Retrieved from https://arxiv.org/abs/1001.1593.
[19]
Prahladh Harsha, Adam R. Klivans, and Raghu Meka. 2012. An invariance principle for polytopes. J. ACM 59, 6 (2012), 29:1–29:25. DOI:https://doi.org/10.1145/2395116.2395118
[20]
Pooya Hatami, William Hoza, Avishay Tal, and Roei Tell. 2021. Fooling constant-depth threshold circuits. Electron. Colloquium Comput. Complex. 28 (2021), 2.
[21]
Russell Impagliazzo, Cristopher Moore, and Alexander Russell. 2014. An entropic proof of Chang’s inequality. SIAM J. Discrete Math. 28, 1 (2014), 173–176.
[22]
Russell Impagliazzo, Noam Nisan, and Avi Wigderson. 1994. Pseudorandomness for network algorithms. In Proceedings of the 26th Annual Symposium on Theory of Computing (STOC). 356–364.
[23]
Valentine Kabanets, Sajin Koroth, Zhenjian Lu, Dimitrios Myrisiotis, and Igor Oliveira. 2020. Algorithms and lower bounds for de Morgan formulas of low-communication leaf gates. In Proceedings of the 35th Computational Complexity Conference (CCC’20) (LIPIcs), Vol. 169. 15:1–15:41.
[24]
Daniel Kane. 2011. \(k\) -Independent Gaussians fool polynomial threshold functions. In Proceedings of the 26th IEEE Conference on Computational Complexity (CCC). 252–261.
[25]
Daniel Kane. 2011. A small PRG for polynomial threshold functions of Gaussians. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS). 257–266.
[26]
Daniel Kane. 2014. The average sensitivity of an intersection of halfspaces. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC). 437–440.
[27]
Daniel Kane. 2014. A pseudorandom generator for polynomial threshold functions of Gaussians with subpolynomial seed length. In Proceedings of the 29th Annual Conference on Computational Complexity (CCC). 217–228.
[28]
Daniel Kane and Sankeerth Rao. 2018. A PRG for Boolean PTF of degree 2 with seed length subpolynomial in \(\varepsilon\) and logarithmic in \(n\) . In Proceedings of the 33rd Computational Complexity Conference (CCC). 2:1–2:24.
[29]
Zohar Shay Karnin, Yuval Rabani, and Amir Shpilka. 2012. Explicit dimension reduction and its applications. SIAM J. Comput. 41, 1 (2012), 219–249.
[30]
Subhash Khot and Rishi Saket. 2011. On the hardness of learning intersections of two halfspaces. J. Comput. Syst. Sci. 77, 1 (2011), 129–141.
[31]
Daniel Kleitman. 1970. On a lemma of Littlewood and Offord on the distributions of linear combinations of vectors. Adv. Math. 5, 1 (1970), 155–157.
[32]
Adam Klivans, Ryan O’Donnell, and Rocco A. Servedio. 2004. Learning intersections and thresholds of halfspaces. J. Comput. Syst. Sci. 68, 4 (2004), 808–840.
[33]
Adam Klivans, Ryan O’Donnell, and Rocco A. Servedio. 2008. Learning geometric concepts via Gaussian surface area. In Proceedings of the 49th Symposium on Foundations of Computer Science (FOCS). 541–550.
[34]
Adam Klivans and Alexander Sherstov. 2006. Cryptographic hardness for learning intersections of halfspaces. In Proceedings of the 47th Symposium on Foundations of Computer Science (FOCS). 553–562.
[35]
Pravesh K. Kothari and Raghu Meka. 2015. Almost optimal pseudorandom generators for spherical caps. In Proceedings of the 47th Annual ACM on Symposium on Theory of Computing (STOC). 247–256.
[36]
Michel Ledoux and Michel Talagrand. 1991. Probability in Banach Spaces. Springer.
[37]
John Littlewood and Albert Cyril Offord. 1943. On the number of real roots of a random algebraic equation. III. Rec. Math. [Mat. Sbornik] N.S. 12 (1943), 277–286.
[38]
Brendan D. McKay. 1989. On Littlewood’s estimate for the binomial distribution. Adv. Appl. Probab. 21, 2 (1989), 475–478.
[39]
Raghu Meka and David Zuckerman. 2013. Pseudorandom generators for polynomial threshold functions. SIAM J. Comput. 42, 3 (2013), 1275–1301.
[40]
Marvin Minsky and Seymour Papert. 1968. Perceptrons: An Introduction to Computational Geometry. The MIT Press, Cambridge, MA.
[41]
Sergey Nagaev and Iosif Pinelis. 1978. Some inequalities for the distribution of sums of independent random variables. Theor. Probab. Applic. 22, 2 (1978), 248–256.
[42]
Fedor Nazarov. 2003. On the maximal perimeter of a convex set in \(\mathds R^n\) with respect to a Gaussian measure. In Geometric Aspects of Functional Analysis (2001–2002). Lecture Notes in Math., Vol. 1807, Springer, 169–187.
[43]
Noam Nisan. 1991. Pseudorandom bits for constant depth circuits. Combinatorica 11, 1 (1991), 63–70.
[44]
Noam Nisan. 1992. Pseudorandom generators for space-bounded computations. Combinatorica 12, 4 (1992), 449–461.
[45]
Noam Nisan and Avi Wigderson. 1994. Hardness vs. randomness. J. Comput. Syst. Sci. 49, 2 (1994), 149–167. DOI:https://doi.org/10.1016/S0022-0000(05)80043-1
[46]
Noam Nisan and David Zuckerman. 1996. Randomness is linear in space. J. Comput. Syst. Sci. 52, 1 (1996), 43–52. DOI:https://doi.org/10.1006/jcss.1996.0004
[47]
Ryan O’Donnell. 2014. Analysis of Boolean Functions. Cambridge University Press. Retrieved from http://analysisofbooleanfunctions.net/.
[48]
Ryan O’Donnell and Rocco Servedio. 2010. New degree bounds for polynomial threshold functions. Combinatorica 30, 3 (2010), 327–358.
[49]
Ryan O’Donnell and Rocco A. Servedio. 2011. The chow parameters problem. SIAM J. Comput. 40, 1 (2011), 165–199.
[50]
Ryan O’Donnell, Rocco A. Servedio, and Li-Yang Tan. 2019. Fooling polytopes. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19). ACM, 614–625.
[51]
Iosif Pinelis and Sergei Utev. 1985. Estimates of the moments of sums of independent random variables. Theor. Probab. Applic. 29, 3 (1985), 574–577.
[52]
Alexander Razborov. 2009. A simple proof of Bazzi’s theorem. ACM Trans. Comput. Theor. 1, 1 (2009), 3.
[53]
Haskell Rosenthal. 1970. On the subspaces of \(L^p\) ( \(p \gt 2\) ) spanned by sequences of independent random variables. Israel J. Math 8 (1970), 273–303.
[54]
Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. 1995. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8, 2 (1995), 223–250. DOI:https://doi.org/10.1137/S089548019223872X
[55]
Rocco A. Servedio. 2007. Every linear threshold function has a low-weight approximator. Comput. Complex. 16, 2 (2007), 180–209.
[56]
Rocco A. Servedio and Li-Yang Tan. 2017. Fooling intersections of low-weight halfspaces. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS). 824–835.
[57]
Rocco A. Servedio and Li-Yang Tan. 2017. What circuit classes can be learned with non-trivial savings? In Proceedings of the 8th Innovations in Theoretical Computer Science (ITCS). 30:1–30:21.
[58]
Alexander A. Sherstov. 2013. The intersection of two halfspaces has high threshold degree. SIAM J. Comput. 42, 6 (2013), 2329–2374.
[59]
Alexander A. Sherstov. 2013. Optimal bounds for sign-representing the intersection of two halfspaces by polynomials. Combinatorica 33, 1 (2013), 73–96.
[60]
Michel Talagrand. 1996. How much are increasing sets positively correlated? Combinatorica 16, 2 (1996), 243–258.
[61]
Terence Tao. 2010. 254A Notes: Topics in random matrix theory.
[62]
Terence Tao and Van Vu. 2012. The Littlewood–Offord problem in high dimensions and a conjecture of Frankl and Füredi. Combinatorica 32, 3 (2012), 363–372.
[63]
Salil P. Vadhan. 2012. Pseudorandomness. Found. Trends Theoret. Comput. Sci. 7, 1-3 (2012), 1–336.
[64]
Santosh Vempala. 2010. A random-sampling-based algorithm for learning intersections of halfspaces. J. ACM 57, 6 (2010), Article 32.
