
Approximate Degree, Weight, and Indistinguishability

Published: 04 March 2022

Abstract

We prove that the \(\mathsf {OR}\) function on \(\lbrace -1,1\rbrace ^n\) can be pointwise approximated with error \(\epsilon\) by a polynomial of degree \(O(k)\) and weight \(2^{O(n \log (1/\epsilon) /k)}\), for any \(k \ge \sqrt {n \log (1/\epsilon)}\). This result is tight for any \(k \le (1-\Omega (1))n\). Previous results were either not tight or had \(\epsilon = \Omega (1)\). In general, we obtain a tight approximate degree-weight result for any symmetric function. Building on this, we also obtain an approximate degree-weight result for bounded-width \(\mathsf {CNF}\). For these two classes no such result was known.
One motivation for such results comes from the study of indistinguishability. Two distributions \(P\), \(Q\) over \(n\)-bit strings are \((k,\delta)\)-indistinguishable if their projections on any \(k\) bits have statistical distance at most \(\delta\). The above approximations give values of \((k,\delta)\) that suffice to fool \(\mathsf {OR}\), symmetric functions, and bounded-width \(\mathsf {CNF}\), and the first result is tight for all \(k\) while the second result is tight for \(k \le (1-\Omega (1))n\). We also show that any two \((k, \delta)\)-indistinguishable distributions are \(O(n^{k/2}\delta)\)-close to two distributions that are \((k,0)\)-indistinguishable, improving the previous bound of \(O(n)^k \delta\). Finally, we present proofs of some known approximate degree lower bounds in the language of indistinguishability, which we find more intuitive.

1 Introduction

The idea of approximating Boolean functions pointwise by real-valued polynomials of “low complexity” has been a powerful tool in theoretical computer science. A natural notion of complexity of the polynomial is its degree, extensively studied since the seminal work of Nisan and Szegedy [25]. We can also consider the weight of the approximating polynomial, that is, the sum of the absolute values of its coefficients, which was studied under the name “spectral norm” in References [1, 2] for polynomials over \(\lbrace -1,1\rbrace\). Bounds on the weight of approximations have several applications, ranging from differential privacy [17] to attribute-efficient learning [23, 29] and to indistinguishability [8].
In this article, we study weight in conjunction with degree, and we show that one can typically trade degree for weight when approximating several large classes of functions over a suitable basis. Previously, Bogdanov and Williamson [8] showed tight degree-weight tradeoffs for approximating \(\mathsf {OR}\) on \(\lbrace -1,1\rbrace ^n\) within constant error. To set context, recall that the approximate degree of \(\mathsf {OR}\) is \(\Theta (\sqrt n)\) [25]. Bogdanov and Williamson showed that \(\mathsf {OR}\) can be approximated with degree \(O(k)\) and weight \(2^{O(n/k)}\) for \(k \ge \sqrt {n}\), and this is tight. Prior to their results, Servedio et al. [29] showed \(w = 2^{\widetilde{O}(n/k)}\), Chandrasekaran et al. [17] showed \(w = 2^{\widetilde{O}\left(n\log ^2(1/\epsilon)/k\right)}\) for error \(\epsilon\) when \(k \ge \sqrt {n}\log (1/\epsilon)\), and Bun and Thaler [15] showed \(w = 2^{\Omega (n/k)}\). Few degree-weight tradeoff results appear to have been known for arbitrary symmetric functions or for non-symmetric functions.

1.1 Our Results: Approximate Degree-weight

We prove a tight result for approximating the \(\mathsf {OR}\) function. This refines the above-mentioned result in Reference [8] by including the dependency on the error \(\epsilon\), and the upper bound improves on the above-mentioned result in Reference [17] by getting a better dependence on \(\log (1/\epsilon)\) and removing the other log terms. Jumping ahead, this kind of improvement is critical to obtain our result for \(t\)-\(\mathsf {CNF}\), because we will need to set \(\epsilon\) exponentially small. However, our improvements do not seem to affect their applications.
We state the upper and lower bounds separately. We state several of our results as corollaries because they follow from theorems stated later, but we find it helpful to emphasize these special cases.
Corollary 1.1.
For every \(\epsilon\), \(n\), \(k\) satisfying \(\sqrt {n\log (1/\epsilon)} \le k \le n\), \(\mathsf {OR}_n\) can be \(\epsilon\)-approximated on \(\lbrace -1,1\rbrace ^n\) with degree \(O(k)\) and weight \(2^{O\left(n\log (1/\epsilon)/k\right)}\).
Corollary 1.2.
There exists a constant \(c\lt 1\) such that for every \(\epsilon\), \(n\), \(k \le n\), if a polynomial \(p\) \(\epsilon\)-approximates \(\mathsf {OR}_n\) on \(\lbrace -1,1\rbrace ^n\) with degree \(k\), then its weight is at least \(2^{(cn/k - 1)\log (1/\epsilon)}\).
Note that in the lower bound for \(k \le \frac{c}{2}n\), we have \(2^{(cn/k - 1)\log (1/\epsilon)} \ge 2^{cn\log (1/\epsilon)/2k}\), so the upper bound is tight for \(k \le (1-\Omega (1))n\).
These corollaries are special cases of the following results for symmetric functions. A function is symmetric if its value only depends on the Hamming weight of the input, i.e., the number of \(-1\)’s in the input on \(\lbrace -1,1\rbrace ^n\). For a symmetric function \(f_n\) with input length \(n\), let \(\tau (f_n)\) denote the smallest number \(t \in [0, \frac{n}{2}]\) such that \(f_n\) is constant on inputs of Hamming weight in \((t, n-t)\). For \(0 \le t \le \frac{n}{2}\), let \(\mathsf {SYM}_{n,t}\) denote the class of symmetric functions \(f_n\) with \(\tau (f_n) = t\). To set context, the \(\epsilon\)-approximate degree of any \(f \in \mathsf {SYM}_{n,t}\) is \(\Theta (\sqrt {n\left(\log (1/\epsilon)+ t\right)})\) [10, 28].
Again, we state the upper and lower bounds separately.
Theorem 1.3.
For every \(\epsilon\), \(n\), \(t\), \(k\) satisfying \(\sqrt {n\left(\log (1/\epsilon)+ t\right)} \le k \le n\), every function \(f \in \mathsf {SYM}_{n,t}\) can be \(\epsilon\)-approximated on \(\lbrace -1,1\rbrace ^n\) with degree \(O(k)\) and weight \(2^{O\left(n\left(\log (1/\epsilon)+ t\right) / k\right)}\).
Corollary 1.4.
There exists a constant \(c \lt 1\) such that for every \(\epsilon\), \(n\), \(t\), and \(k\) with \(k \le \frac{c}{2} n\) and \(0 \le t \le \frac{n}{2}\), if a polynomial \(p\) \(\epsilon\)-approximates \(f \in \mathsf {SYM}_{n,t}\) on \(\lbrace -1,1\rbrace ^n\) with degree \(k\), then its weight is at least \(2^{cn\left(\log (1/\epsilon)+ t\right)/2k}\).
Independently and concurrently, Bogdanov et al. [7, Theorem 5] obtained a similar result with a weaker upper bound: For \(k \ge \sqrt {n\left(\log _n(1/\epsilon) + t\right) \log n}\), they got degree \(O(k)\) and weight \(2^{O\left(n\left(\log _n(1/\epsilon) + t\right)\log ^2 n / k\right)}.\)
From our proof of Theorem 1.3 we also get the following corollary about \(\mathsf {AND}\) that works on both \(\lbrace 0,1\rbrace ^n\) and \(\lbrace -1,1\rbrace ^n\). Jumping ahead, we will use the result on \(\lbrace 0,1\rbrace ^n\) to obtain our result for \(t\)-\(\mathsf {CNF}\). In contrast, \(\mathsf {OR}\) has approximate weight \(2^{\Omega (\sqrt {n})}\) on \(\lbrace 0,1\rbrace ^n\) (see Claim 3.3); hence, it is impossible to get approximate degree-weight tradeoffs for \(\mathsf {OR}_n\) over \(\lbrace 0,1\rbrace\). For more discussion of the two bases \(\lbrace 0,1\rbrace\) and \(\lbrace -1,1\rbrace\), see Section 3.
Corollary 1.5.
For every \(\epsilon\), \(n\), \(k\) satisfying \(\sqrt {n\log (1/\epsilon)} \le k \le n\), \(\mathsf {AND}_n\) can be \(\epsilon\)-approximated on \(\lbrace 0,1\rbrace ^n\) and on \(\lbrace -1,1\rbrace ^n\) with degree \(O(k)\) and weight \(2^{O\left(n\log (1/\epsilon)/k\right)}\).
Note that \(\mathsf {AND}_n\) has constant weight on \(\lbrace 0,1\rbrace ^n\) with degree \(n\), matched by this result. Also note that by Corollary 1.2 and using De Morgan’s rule, this upper bound is tight over \(\lbrace -1,1\rbrace\) for \(k \le (1-\Omega (1))n\), thus also tight over \(\lbrace 0,1\rbrace\) for the same range of \(k\) (by Lemma 3.1).
We then move to non-symmetric functions. A \(t\)-\(\mathsf {CNF}\) is a \(\mathsf {CNF}\) with clauses of size \(t\). Sherstov [34, Theorem 5.1] proved that the \(\epsilon\)-approximate degree of \(t\)-\(\mathsf {CNF}\) is \(O_t(n^\frac{t}{t+1}(\log (1/\epsilon))^{\frac{1}{t+1}})\). For \(t=2\) and constant \(\epsilon\) this is \(O(n^{2/3})\). We prove the following degree-weight approximation for \(t\)-\(\mathsf {CNF}\), which recovers Sherstov’s result when \(k = n^\frac{t}{t+1}(\log (1/\epsilon))^{\frac{1}{t+1}}\) and shows that the larger the degree \(k\), the smaller the weight \(w\) we can have, down to about \(w = 2^{O(n^{1-1/t})}\) for constant \(\epsilon\). For \(t=2\), the latter is \(2^{O(\sqrt n)}\).
Theorem 1.6.
For every \(\epsilon\), \(n\), \(t\), \(k\) satisfying \(n^\frac{t}{t+1}\left(\log (1/\epsilon)\right)^\frac{1}{t+1} \le k \le n\), there exists a constant \(c_t\) depending on \(t\) such that any \(t\)-\(\mathsf {CNF}\) can be \(\epsilon\)-approximated on \(\lbrace -1,1\rbrace ^n\) with degree \(c_t \cdot k\) and weight \(2^{c_t \cdot n \left(\log (1/\epsilon)\right)^{1/t}/k^{1/t} }\).

1.2 Our Results: \((k,\delta)\)-indistinguishability

One of our motivations for these approximation results comes from our interest in indistinguishability. Two distributions on \(n\) bits are called \(k\)-wise indistinguishable if the marginals on any \(k\) bits are identical. It seems natural to ask which functions are fooled by \(k\)-wise indistinguishability, or in other words cannot distinguish any two \(k\)-wise indistinguishable distributions. Linear programming duality shows that \(k\)-wise indistinguishability fools a function \(f\) if and only if the approximate degree of \(f\) is at most \(k\) [6, Theorem 1.1]. We discuss this further in Section 1.3.
We study a natural relaxation of indistinguishability [6], defined next.
Definition 1.7.
Two distributions on \(n\) bits are \((k,\delta)\)-indistinguishable if the marginals on any \(k\) bits are \(\delta\)-close in statistical distance.
A function \(f :\lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) is \(\epsilon\)-fooled by \((k,\delta)\)-indistinguishability if for any two \((k,\delta)\)-indistinguishable distributions \(P\) and \(Q\) we have \(|\mathbb {E}[f(P)]-\mathbb {E}[f(Q)]|\le \epsilon\).
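To make Definition 1.7 concrete, the following Python sketch (an illustration of ours, with distributions represented as length-\(2^n\) probability vectors; the function names are not from any library) checks the definition by brute force over all \(k\)-bit marginals, so it is only feasible for small \(n\).

from itertools import combinations

def marginal(p, n, S):
    # Project a distribution p over {0,1}^n onto the bit positions in S.
    m = [0.0] * (1 << len(S))
    for x in range(1 << n):
        y = 0
        for j, i in enumerate(S):
            y |= ((x >> i) & 1) << j
        m[y] += p[x]
    return m

def is_k_delta_indistinguishable(p, q, n, k, delta):
    # Definition 1.7: every k-bit marginal pair is delta-close in
    # statistical (total variation) distance.
    for S in combinations(range(n), k):
        mp, mq = marginal(p, n, S), marginal(q, n, S)
        if sum(abs(a - b) for a, b in zip(mp, mq)) / 2 > delta + 1e-12:
            return False
    return True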
Actually, in the aforementioned paper, Bogdanov and Williamson [8] proved tradeoff results in terms of \((k,\delta)\)-indistinguishability. They showed that if \(f\) can be \(\epsilon\)-approximated on \(\lbrace -1,1\rbrace ^n\) with degree \(k\) and weight \(w\), then \(f\) is \(\epsilon\)-fooled by \((k, \delta)\)-indistinguishability for \(\delta = \epsilon /w\). Using this they showed that for \(k \ge \sqrt {n}\), \((k,\delta)\)-indistinguishability fools \(\mathsf {OR}\) for any \(\delta = 2^{-O(n/k)}\). They also showed that this is tight. However, as mentioned before, they only consider fooling by constant error.
Using this connection in Reference [8] (see also Theorem 2.4), tradeoffs between degree and weight imply tradeoffs between \(k\) and \(\delta\). Therefore the following “fools” theorems for \(\mathsf {OR}\), symmetric functions, and \(t\)-\(\mathsf {CNF}\)/\(\mathsf {DNF}\) follow from our degree-weight tradeoff upper bounds for approximating these functions.
Corollary 1.8.
For every \(\epsilon\), \(n\), and \(k\) satisfying \(\Omega (\sqrt {n \log (1/\epsilon)}) \le k \le n\), \((k, 2^{-O\left(n \log (1/\epsilon)/k\right)})\)-indistinguishability \(\epsilon\)-fools \(\mathsf {OR}_n\).
Corollary 1.9.
For every \(\epsilon\), \(n\), \(t\), \(k\) satisfying \(\Omega (\sqrt {n(\log (1/\epsilon)+ t)}) \le k \le n\), \((k, 2^{-O(n(\log (1/\epsilon)+ t)/k)})\)-indistinguishability \(\epsilon\)-fools any function \(f \in \mathsf {SYM}_{n,t}\).
Corollary 1.10.
For every \(\epsilon\), \(n\), \(t\), \(k\) satisfying \(n^\frac{t}{t+1}(\log (1/\epsilon))^{\frac{1}{t+1}} \le k \le n\), there exists a constant \(c_t\) depending on \(t\) such that \((c_t \cdot k, 2^{-c_t \cdot n(\log (1/\epsilon))^{1/t}/k^{1/t}})\)-indistinguishability \(\epsilon\)-fools \(t\)-\(\mathsf {CNF}\)s and \(t\)-\(\mathsf {DNF}\)s on \(n\) variables.
We also prove the following “does not fool” theorems, matching the first two “fools” results. Corollary 1.11 shows that Corollary 1.8 is tight for all \(k \le n\), while Theorem 1.12 shows that Corollary 1.9 is tight for \(k\le (1-\Omega (1))n\). Using Theorem 2.4, they imply the degree-weight tradeoff lower bounds in Corollaries 1.2 and 1.4. This is how the latter are proved in this article.
Corollary 1.11.
For every \(\epsilon\), \(n\), and \(k\), \((k, 2^{-\Omega \left(n \log (1/\epsilon)/ k\right)})\)-indistinguishability does not \(\epsilon\)-fool \(\mathsf {OR}_n\).
Theorem 1.12.
There exists a constant \(c^{\prime } \lt 1\) such that for every \(\epsilon\), \(n\), \(t\), and \(k\) with \(k \le c^{\prime } n\) and \(0 \le t \le \frac{n}{2}\), there exists a function \(f \in \mathsf {SYM}_{n,t}\) such that \((k, 2^{-\Omega \left(n\left(\log (1/\epsilon)+ t\right)/k \right)})\)-indistinguishability does not \(\epsilon\)-fool \(f\).
The independent work by Bogdanov et al. [7, Corollary 3], mentioned earlier, also obtained a similar “does not fool” result for symmetric functions with constant \(\epsilon\) (moreover, unlike ours, their distributions do not depend on \(k\)).
We also improve the result in Reference [6, Theorem D.1] about \(k\)-wise indistinguishability vs. \((k,\delta)\)-indistinguishability, analogous to the \(k\)-wise independence vs. almost \(k\)-wise independence results in Reference [3, Theorem 2.1] and Reference [27, Theorem 1.1]. This result is tight by the distributions given in Reference [27, Theorem 1.2] when \(k\) is constant.
Theorem 1.13.
If \(P\) and \(Q\) are \((k, \delta)\)-indistinguishable, then they are \(O(n^{k/2}\delta)\)-close to \(P^{\prime }\) and \(Q^{\prime }\) that are \(k\)-wise indistinguishable.
The proof of this theorem incorporates an improvement suggested by an anonymous reviewer. Our original proof, which can be found in the ECCC version of this article [22], has a similar structure, but it is based on Reference [27] and thus has an extra \(e^k\) factor in the distance bound. It is natural to ask if the same improvement can be used to shave off the extra \(e^k\) factor in Reference [27] for the \(k\)-wise independence vs. almost \(k\)-wise independence problem. We provide a negative answer via a simple counterexample in Remark 7.1.

1.3 Our Results: Reproving Approximate Degree Lower Bounds in the Language of Indistinguishability

We suggest that indistinguishability could be a more convenient framework to prove approximate degree lower bounds. To illustrate, we reprove the following known approximate-degree lower bounds in the language of indistinguishability (for the first item, this presentation already appeared in lecture notes [35, Lectures 8–9]; independently, it was proved in a similar way by Bogdanov [5, Theorem 6]). We find the proofs more intuitive than the originals.
Claim 1.14.
Let \(\widetilde{\deg _\epsilon }(f)\) denote the \(\epsilon\)-approximate degree of \(f\) over \(\lbrace 0,1\rbrace\). Then
(1)
\(\widetilde{\deg _{1/3}}(\mathsf {AND}_m \circ \mathsf {OR}_n) = \Omega (\sqrt {mn})\) [14, 32];
(2)
\(\widetilde{\deg _\epsilon }(\mathsf {AND}_m \circ \mathsf {OR}_n) = \Omega (\sqrt {n})\) for \(\epsilon = 1/2 - 2^{-\Theta (m)}\) [15];
(3)
\(\widetilde{\deg _\epsilon }(\mathsf {GapMAJ}_m \circ f_n) = \Omega (\widetilde{\deg _{1/3}}(f_n))\) for \(\epsilon = 1/2 - 2^{-\Theta (m)}\) [9, Theorem 3.2], where \(\mathsf {GapMAJ}_m\) is any function that outputs 1 on inputs of Hamming weight at least \(\frac{2}{3}m\) and 0 on inputs of Hamming weight at most \(\frac{1}{3}m\);
(4)
\(\widetilde{\deg _{1/3}}(g_m \circ f_n) = \Omega \left(\widetilde{\deg _{1/3}}(g_m) \cdot \widetilde{\deg _\epsilon }(f_n)\right)\) for \(\epsilon = 1/2 - 1/m^\alpha\) with \(\alpha \gt 1\) [33, Theorem 3.3];
(5)
\(\widetilde{\deg _{\epsilon ^m}}(\mathsf {XOR}_m \circ f) = \Omega (m \cdot \widetilde{\deg _\epsilon }(f))\) and \(\widetilde{\deg _{\epsilon ^m}}(\mathsf {AND}_m \circ f) = \Omega (m \cdot \widetilde{\deg _\epsilon }(f))\) [31, Lemma 3.9].
It will be interesting to present the proofs of more advanced results, such as the \(\mathsf {AC}^0\) lower bound [16] and the surjectivity lower bound [13], in this language.

1.4 Techniques

Existence of low degree-weight polynomials. As observed in References [8, 29], the Chebyshev polynomial \(T_d\) has degree \(d\) and weight \(2^{O(d)}\); by composing it with the monomial \(x^{k/d}\), which has high degree but weight one, we get a polynomial \(T_d(x^{k/d})\) of larger degree \(O(k)\) whose weight is still only \(2^{O(d)}\), and which retains some of the properties we need from \(T_k(x)\). For example, it is bounded on \([0,1]\) and has derivative \(\ge d^2 \cdot \frac{k}{d} = kd\) for \(x \ge 1\).
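As a concrete (and entirely illustrative) numeric check of this observation, the following Python snippet computes the weight of \(T_d\) in the monomial basis; substituting \(x^{k/d}\) only relabels monomials, so \(T_d(x^{k/d})\) has exactly this weight while its degree rises to \(k\).

import numpy as np

def cheb_weight(d):
    # Sum of absolute monomial coefficients of the Chebyshev polynomial T_d.
    mono = np.polynomial.chebyshev.cheb2poly([0.0] * d + [1.0])
    return np.abs(mono).sum()

k = 32
for d in [4, 8, 16, 32]:
    # T_d(x^(k/d)) has degree d * (k/d) = k but only the weight of T_d.
    print(f"degree {k}: weight of T_{d}(x^({k // d})) = {cheb_weight(d):.4g}")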
At a high level, Theorem 4.1 follows by applying such an idea to the construction by Sherstov [34, Theorem 4.6] for \(\mathsf {EXACT}_{n,n-t}\), where
\begin{align*} \mathsf {EXACT}_{n,r}(x) = \left\lbrace \begin{array}{ll} 1, & \mbox{ if the Hamming weight of $x$ is $r$},\\ 0, & \mbox{ otherwise. }\end{array}\right. \end{align*}
The crux of his construction is to first \(\epsilon\)-approximate the function on inputs of Hamming weight in \(\lbrace 0, 1, \dots , n-\ell \rbrace \cup \lbrace n-t\rbrace\), where \(\ell\) is a number that will be quite large for small \(\epsilon\), by a simple application of Chebyshev polynomials. Then he multiplies this approximant with a set of auxiliary polynomials, each zeroing out the value of the approximant on inputs of one specific Hamming weight in \(\lbrace n-\ell +1, \dots , n-t-1\rbrace \cup \lbrace n-t+1, \dots , n\rbrace\), also using Chebyshev polynomials but with carefully designed “shifts,” i.e., \(T_k(a x + b)\) with suitable \(a\) and \(b\). For the first part, we basically replace the usage of \(T_k(x)\) by \(T_d(x^{k/d})\) to get a degree-weight tradeoff for a suitable \(d\). For the second part, we use \(T_d(a x^{k/d} + b)\) instead, and we need to prove different bounds on \(a\) and \(b\). It is also more involved to combine these two parts in our proof, as we have to carefully choose \(k\) for each auxiliary polynomial to satisfy the degree and weight constraints. Finally, Theorem 1.3 follows from Theorem 4.1 on \(\lbrace -1,1\rbrace ^n\) by writing symmetric functions as linear combinations of \(\mathsf {EXACT}\)s, and in particular we get Corollary 1.1 as \(\mathsf {OR}_n \in \mathsf {SYM}_{n,0}\).
For Theorem 1.6, our proof is simpler than Sherstov’s for \(t\)-\(\mathsf {CNF}\) [30, Theorem 5.1], as we only consider approximating them on \(\lbrace -1,1\rbrace ^n\) instead of \(\lbrace -1,1\rbrace ^m_{\le n}\). The latter means that the Hamming weight is restricted to be at most \(n\) but the input length \(m\) could be much larger. Essentially his construction decomposes a \(t\)-\(\mathsf {CNF}\) into an \(\mathsf {AND}\) composed with \((t-1)\)-\(\mathsf {CNF}\)s inductively for each \(t\). Therefore we use polynomials with degree-weight tradeoff as an outer approximation for the \(\mathsf {AND}\) function and inner approximations for the \((t-1)\)-\(\mathsf {CNF}\)s, and by tweaking the parameters we get a polynomial with degree-weight tradeoff. As mentioned earlier, we exploit the good dependence on \(\epsilon\) in Corollary 1.5 for \(\mathsf {AND}_n\), as we need to set \(\epsilon\) exponentially small in the inner approximations. We also exploit the fact that Corollary 1.5 works over \(\lbrace 0,1\rbrace\) to compose our inner and outer approximations.
“Does not fool” theorems. The notion of fooling by \((k,\delta)\)-indistinguishability does not seem to have a dual characterization, because there does not seem to be a way to express statistical tests in the dual LP. Indeed, in Theorem 2.4, while we consider the weight of the approximations in the dual, we essentially restrict the statistical tests to parity tests in the primal; these are not equivalent and can be separated easily by small-bias distributions [24]. Therefore degree-weight tradeoff lower bounds do not imply “does not fool” results. Instead, we use a different method.
For Corollary 1.11 and Theorem 1.12 we reduce them to the case of \(k\)-wise indistinguishability (that is, \(\delta =0\)) by Lemma 5.1, generalizing the proof in Reference [8, Lemma 10], which only works for \(\mathsf {OR}\). By inserting 0’s at some random indices, we generate \((k, \delta)\)-indistinguishable distributions from \(k^{\prime }\)-wise indistinguishable distributions while preserving their Hamming weight, for suitable settings of \(k\), \(\delta\), and \(k^{\prime }\). Then the result follows from the approximate degree lower bound for symmetric functions.

1.5 Organization

In Section 2, we provide some useful facts about Chebyshev polynomials and weights, and their connection to \((k,\delta)\)-indistinguishability. We discuss the two bases \(\lbrace 0,1\rbrace\) and \(\lbrace -1,1\rbrace\) in Section 3. In Section 4, we prove Theorem 1.3 for symmetric functions and in particular Corollary 1.1 for \(\mathsf {OR}\), thus also proving Corollaries 1.8 and 1.9; we also get Corollary 1.5. We prove matching lower bounds (Corollaries 1.2, 1.4, and 1.11, and Theorem 1.12) in Section 5. In Section 6, we prove Theorem 1.6 and Corollary 1.10 for \(t\)-\(\mathsf {CNF}\). In Section 7, we prove Theorem 1.13. We prove Claim 1.14 in Section 8. Finally, we list some open problems in Section 9.

2 Preliminaries

Let \([n]\) denote the set \(\lbrace 1, 2, \dots , n\rbrace\).
Weight of polynomials. We define the weight of a polynomial \(p\) as the sum of the absolute values of the coefficients of \(p\), denoted by \(\vert \!\vert \!\vert p \vert \!\vert \!\vert\). Specifically, on \(\lbrace -1,1\rbrace ^n\), \(\vert \!\vert \!\vert p \vert \!\vert \!\vert\) is the \(\ell _1\) Fourier weight of \(p\) (cf. [26]). In general it has the following properties.
Claim 2.1 ([34, Fact 2.7]).
For any polynomials \(p\) and \(q\),
\(\vert \!\vert \!\vert ap \vert \!\vert \!\vert = \vert a \vert \cdot \vert \!\vert \!\vert p \vert \!\vert \!\vert\) for any \(a \in \mathbb {R}\);
\(\vert \!\vert \!\vert p + q \vert \!\vert \!\vert \le \vert \!\vert \!\vert p \vert \!\vert \!\vert + \vert \!\vert \!\vert q \vert \!\vert \!\vert\);
\(\vert \!\vert \!\vert p \cdot q \vert \!\vert \!\vert \le \vert \!\vert \!\vert p \vert \!\vert \!\vert \cdot \vert \!\vert \!\vert q \vert \!\vert \!\vert\).
Claim 2.2 ([8, Facts 8 and 9]).
For any univariate polynomial \(p\) of degree \(k\),
(1)
if \(q(x) = p(ax^t+b)\) where \(|a| + |b| \ge 1\) and \(t \ge 1\), then \(\vert \!\vert \!\vert q \vert \!\vert \!\vert \le (|a| + |b|)^k \vert \!\vert \!\vert p \vert \!\vert \!\vert\);
(2)
if \(q(x_1, \dots , x_n) = p(\sum _{i = 1}^n x_i / n)\) then \(\vert \!\vert \!\vert q \vert \!\vert \!\vert \le \vert \!\vert \!\vert p \vert \!\vert \!\vert\).
Chebyshev polynomials. Chebyshev polynomials (cf. Reference [19]), denoted \(T_d\) for each degree \(d\), form a sequence of orthogonal univariate polynomials uniquely defined by \(T_d(\cos x) = \cos dx\) for each \(d\). Explicitly,
\begin{equation*} T_d(x) = \frac{1}{2}\left(\left(x + \sqrt {x^2 - 1}\right)^d + \left(x - \sqrt {x^2 - 1}\right)^d\right). \end{equation*}
Claim 2.3 (cf. References [8, 34]).
For the degree-\(d\) Chebyshev polynomial \(T_d\), we have the following properties:
(1)
\(T_d(1) = 1\);
(2)
\(T_d\left(\cos \left(\frac{2i-1}{2d} \pi \right)\right) = 0\), for \(i \in [d]\);
(3)
\(\vert T_d(z) \vert \le 1\) for \(z \in [-1,1]\);
(4)
\(T^{\prime }_d(t) \ge d^2\) for \(t \in [1, \infty)\), so \(T_d\) is monotonically increasing on \([1, \infty)\);
(5)
\(T_d(1 + \delta) \ge 2^{d\sqrt {\delta } - 1}\) for \(\delta \in [0,1]\);
(6)
\(\vert \!\vert \!\vert T_d \vert \!\vert \!\vert \le 2^{2d}\).
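These properties are easy to spot-check numerically. The following Python sketch (ours, for illustration only) verifies properties (1), (3), (5), and (6) for a few degrees.

import numpy as np
from numpy.polynomial import chebyshev as C

for d in [5, 10, 20]:
    Td = C.Chebyshev([0.0] * d + [1.0])
    assert abs(Td(1.0) - 1.0) < 1e-9                           # property (1)
    zs = np.linspace(-1, 1, 1001)
    assert np.all(np.abs(Td(zs)) <= 1 + 1e-9)                  # property (3)
    for delta in [0.01, 0.1, 0.5, 1.0]:
        assert Td(1 + delta) >= 2 ** (d * np.sqrt(delta) - 1)  # property (5)
    assert np.abs(C.cheb2poly(Td.coef)).sum() <= 2 ** (2 * d)  # property (6)
print("Claim 2.3 spot-checks passed")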
Fooling by \((k,\delta)\)-Indistinguishability. The following theorem shows that low-degree low-weight approximation implies fooling by \((k,\delta)\)-indistinguishability.
Theorem 2.4 (Implicit in Reference [8, Lemma 7]).
Given any function \(f :\lbrace -1,1\rbrace ^n \rightarrow \mathbb {R}\), for any \(k\) and \(\delta\) we have
\begin{equation*} \max _{P, Q \text{ } (k,\delta)\text{-indistinguishable}} \big \vert \mathbb {E}[f(P)]-\mathbb {E}[f(Q)] \big \vert \le 2 \min _{p \,:\, \deg (p) \le k} \left(\max _{x \in \lbrace -1,1\rbrace ^n} \vert f(x)-p(x) \vert + \delta \cdot \vert \!\vert \!\vert p \vert \!\vert \!\vert \right)\!. \end{equation*}

3 Fourier vs. Boolean Basis

Usually we approximate functions over the Fourier basis \(\lbrace -1,1\rbrace\), where 1 represents \(\mathsf {False}\) and \(-1\) represents \(\mathsf {True}\). Alternatively we can use 0 for \(\mathsf {False}\) and 1 for \(\mathsf {True}\), called the Boolean basis. In some cases the Fourier basis is more convenient, as negation of variables becomes negation of values; in other cases the Boolean basis is more convenient, as multiplication is equivalent to \(\mathsf {AND}\).
The degree of a polynomial is basis invariant, but the weight is not. The following lemma shows that we can always convert a polynomial on the Boolean basis into one on the Fourier basis without increasing the weight.
Lemma 3.1.
For any polynomial \(f:\lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) on the Boolean input basis, we have a polynomial representing the same function on the Fourier input basis with at most the same weight.
Proof.
Define \(g :\lbrace -1,1\rbrace ^n \rightarrow \mathbb {R}\) by \(g(x) = f(\frac{1-x_1}{2}, \frac{1-x_2}{2},\dots , \frac{1-x_n}{2})\), and the result follows from a multivariate version of Claim 2.2 (1) with \(|a| + |b| = 1\).□
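The substitution in this proof can be carried out explicitly. The following Python sketch (ours; polynomials are represented as dicts from frozensets of variable indices to coefficients) expands each monomial \(\prod _{i \in S} \frac{1-y_i}{2}\) and confirms on a small example that the weight does not increase.

from itertools import combinations

def boolean_to_fourier(f):
    # Substitute x_i -> (1 - y_i)/2 in every monomial and expand.
    g = {}
    for S, c in f.items():
        scale = c / (2 ** len(S))
        for r in range(len(S) + 1):
            for T in combinations(sorted(S), r):
                key = frozenset(T)
                g[key] = g.get(key, 0.0) + scale * (-1) ** r
    return g

def weight(poly):
    return sum(abs(c) for c in poly.values())

f = {frozenset(): 1.0, frozenset({0, 1}): -3.0, frozenset({2}): 2.0}
g = boolean_to_fourier(f)
assert weight(g) <= weight(f) + 1e-12   # Lemma 3.1 on this example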
The other direction is not always true. For example, as shown in the following claims, \(\mathsf {PARITY}\) and \(\mathsf {OR}\) have large weights on \(\lbrace 0,1\rbrace ^n\), even though \(\mathsf {PARITY}\) has constant weight on \(\lbrace -1,1\rbrace ^n\) with degree \(n\), and \(\mathsf {OR}\) has much smaller weight for large degrees on \(\lbrace -1,1\rbrace ^n\) as shown in Corollary 1.1. Claim 3.3 shows that it is impossible to get approximate degree-weight tradeoffs for \(\mathsf {OR}_n\) over \(\lbrace 0,1\rbrace\). In this sense, approximate degree-weight tradeoff upper bounds on \(\lbrace 0,1\rbrace ^n\) are stronger than those on \(\lbrace -1,1\rbrace ^n\).
Claim 3.2.
For any \(\epsilon \in (0,1)\), the weight of any polynomial \(f\) that \(\epsilon\)-approximates \(\mathsf {PARITY} :\) \(\lbrace 0,1\rbrace ^n \rightarrow \lbrace -1,1\rbrace\) is at least \((1-\epsilon) 2^n\).
Claim 3.3.
For any fixed \(\epsilon \le \frac{1}{3}\), the weight of any polynomial \(f\) that \(\epsilon\)-approximates \(\mathsf {OR}:\lbrace 0,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) is \(2^{\Omega (\sqrt {n})}\).
Proof of Claim 3.2
Suppose \(f:\lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) \(\epsilon\)-approximates \(\mathsf {PARITY}\) and minimizes the weight. On one hand, we have \(\frac{1}{2^n} \sum _{x \in \lbrace 0,1\rbrace ^n}f(x)\mathsf {PARITY}(x) \in [1-\epsilon , 1+\epsilon ]\). On the other hand, write \(f\) as \(f = \sum _{S\subseteq [n]} c_S \prod _{i\in S} x_i\), then
\begin{align*} \sum _{x \in \lbrace 0,1\rbrace ^n}f(x)\mathsf {PARITY}(x) = \sum _{T \subseteq [n]} (-1)^{|T|} \sum _{S\subseteq T} c_S = \sum _{S \subseteq [n]} c_S \sum _{T \supseteq S} (-1)^{|T|} = (-1)^n c_{[n]}, \end{align*}
where the last step comes from the fact that whenever \(S \subsetneqq [n]\) we have \(\sum _{T \supseteq S} (-1)^{|T|} = 0\), as we can arrange such \(T\)’s into matching pairs that have exactly opposite values of \((-1)^{|T|}\).
Therefore, we have \(\vert \!\vert \!\vert f \vert \!\vert \!\vert \ge |c_{[n]}| \ge (1-\epsilon)2^n\).□
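The identity used in this proof can be verified numerically. The following sketch (ours) draws a random multilinear \(f\) on \(\lbrace 0,1\rbrace ^4\) and checks that \(\sum _x f(x)\mathsf {PARITY}(x) = (-1)^n c_{[n]}\).

import random
from itertools import combinations

n = 4
subsets = [frozenset(T) for r in range(n + 1) for T in combinations(range(n), r)]
c = {S: random.uniform(-1, 1) for S in subsets}   # random coefficients c_S

total = 0.0
for x in range(2 ** n):
    ones = {i for i in range(n) if (x >> i) & 1}
    fx = sum(cs for S, cs in c.items() if S <= ones)  # f(x) = sum over S inside x
    total += fx * (-1) ** len(ones)                   # PARITY(x) = (-1)^{#ones}
assert abs(total - (-1) ** n * c[frozenset(range(n))]) < 1e-9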
For the proof of Claim 3.3, we need the following result in Reference [11] about the set disjointness function \(\mathsf {DISJ}(x_1\cdots x_n, y_1\cdots y_n)\), where \(x_1\cdots x_n\) and \(y_1\cdots y_n\) are interpreted as indicators of two sets \(X, Y \subseteq [n]\), respectively, and \(\mathsf {DISJ}(X, Y) = 1\) iff \(X \cap Y = \varnothing\).
Theorem 3.4 (Immediate Corollary of [11, Theorem 8]).
Any polynomial that \(\frac{1}{3}\)-approximates \(\mathsf {DISJ}(x_1\cdots x_n, y_1\cdots y_n)\) has at least \(2^{\sqrt {n/12}}\) monomials.
Proof of Claim 3.3
Let \(f\) be a polynomial that \(\epsilon\)-approximates \(\mathsf {OR}\) and minimizes the weight. Let \(w = \vert \!\vert \!\vert f \vert \!\vert \!\vert\). By a sampling argument [18, Lemma 2.8] we can get a polynomial \(g :\lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) such that \(g\) \(\frac{1}{3}\)-approximates \(\mathsf {OR}\) and \(g\) has \(O(w^2n)\) monomials. Now define \(h :\lbrace 0,1\rbrace ^n \times \lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) by \(h(x_1\cdots x_n, y_1\cdots y_n) = 1 - g(x_1y_1, \dots , x_ny_n)\); then \(h\) \(\frac{1}{3}\)-approximates \(\mathsf {DISJ}\) and it has \(O(w^2n)\) monomials. By Theorem 3.4, we get \(w = 2^{\Omega (\sqrt {n})}\).□

4 Proofs of Theorem 1.3 and Corollaries 1.1, 1.5, 1.8, and 1.9

To prove Theorem 1.3 for symmetric functions on \(\lbrace -1,1\rbrace ^n\), we first prove the following result on \(\lbrace 0,1\rbrace ^n\) for \(\mathsf {EXACT}_{n,n-t}\).
Theorem 4.1.
For every \(\epsilon\), \(k\), \(n\), \(t\) such that \(\sqrt {n \left(\log (1/\epsilon)+ t\right)} \le k \le n\) and \(0 \le t \le \frac{n}{2}\), there is a polynomial \(p :\lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) that \(\epsilon\)-approximates \(\mathsf {EXACT}_{n,n-t}\) with degree \(O(k)\) and weight \(2^{O\left(n\left(\log (1/\epsilon)+t\right)/k\right)}\).
Note that both \(\mathsf {EXACT}_{n,t}\) and \(\mathsf {EXACT}_{n,n-t}\) belong to the class \(\mathsf {SYM}_{n,t}\) for \(0 \le t \le \frac{n}{2}\), but on \(\lbrace 0,1\rbrace ^n\) we can only prove the theorem for \(\mathsf {EXACT}_{n,n-t}\) due to the fact that \(\mathsf {EXACT}_{n,t} = 1 - \mathsf {OR}_n\) for \(t = 0\) and the aforementioned fact that \(\mathsf {OR}\) requires large approximate weight on \(\lbrace 0,1\rbrace ^n\). A straightforward corollary (using Lemma 3.1) is that the same parameters also work for \(\lbrace -1,1\rbrace ^n\). In particular, as \(\mathsf {AND}_n = \mathsf {EXACT}_{n,n}\), we get Corollary 1.5.
Warm-up for proving Theorem 4.1. Our goal is to construct a univariate polynomial \(p^*\) to “\(\epsilon\)-approximate” the symmetrized version of \(\mathsf {EXACT}_{n,n-t}\), meaning that \(\vert p^*(\frac{i}{n}) - 1 \vert \le \epsilon\) for \(i = n-t\) and \(\vert p^* (\frac{i}{n}) \vert \le \epsilon\) for all \(i \in [n] \setminus \lbrace n-t\rbrace\). Besides, \(p^*\) must have the desired degree and weight.
We can already get a good approximation using Chebyshev polynomials. Let
\begin{equation*} \ell = \log \frac{2}{\epsilon } + t, \quad d = \sqrt {n \ell }. \end{equation*}
If \(n \le \ell\), then \(d \ge n\) and the result is trivial, so we can always assume that \(n - \ell \gt 0\). Without loss of generality assume \(d \in \mathbb {N}\), define univariate polynomials \(q_0\) and \(p_0\) by
\begin{equation*} q_0(z) = T_d\left(\frac{n}{n-\ell } \cdot z\right), \quad p_0(z) = q_0(z) / q_0\left(\frac{n-t}{n}\right). \end{equation*}
For \(z \in [0, \frac{n - \ell }{n}]\), we have \(\frac{n}{n-\ell } \cdot z \le 1\), and thus by Claim 2.3 (3) we have \(\left|q_0\left(\frac{i}{n}\right) \right|\le 1 \mbox{ for all } i = 0, 1, \dots , n-\ell\). The value of \(q_0\left(\frac{n-t}{n}\right)\) is also large enough, as by Claim 2.3 (5) we have
\begin{equation*} q_0\left(\frac{n-t}{n}\right) = T_d\left(1 + \frac{\ell - t}{n - \ell }\right) \ge 2^{\sqrt {n\ell }\sqrt {\frac{\ell - t}{n - \ell }} - 1} \ge 2^{(\ell - t) - 1} = \frac{1}{\epsilon }, \end{equation*}
where the third step uses \(\frac{n}{n-\ell } \ge 1\) and \(\ell \ge \ell -t\). Therefore
\begin{align*} \left|p_0\left(\frac{i}{n}\right)\right|&\le \epsilon , &\mbox{ for all } i = 0,1, \dots , n-\ell ,\\ p_0\left(\frac{i}{n}\right) &= 1, &\mbox{ for } i = n-t, \end{align*}
thus \(p_0\) is a good approximation for these \(i\)’s. We have \(\deg (p_0) = d\) and \(\vert \!\vert \!\vert p_0 \vert \!\vert \!\vert = 2^{O(d)}\), which are fixed by \(n\), \(t\), and \(\epsilon\).
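For concreteness, the warm-up construction can be run numerically. The following Python sketch (with illustrative parameters of our choosing) builds \(p_0\) and verifies the two displayed properties on the grid points \(i/n\).

import numpy as np
from numpy.polynomial import chebyshev as C

eps, t, n = 0.01, 2, 400
ell = int(np.ceil(np.log2(2 / eps))) + t        # ell = log(2/eps) + t
d = int(np.ceil(np.sqrt(n * ell)))              # d = sqrt(n * ell)
Td = C.Chebyshev([0.0] * d + [1.0])

q0 = lambda z: Td(n * z / (n - ell))
p0 = lambda z: q0(z) / q0((n - t) / n)

assert q0((n - t) / n) >= 1 / eps               # the denominator is large
for i in range(n - ell + 1):
    assert abs(p0(i / n)) <= eps + 1e-12        # small on weights 0..n-ell
assert abs(p0((n - t) / n) - 1.0) < 1e-9        # equals 1 at weight n-t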
To get approximations that have degree-weight tradeoff, we would decrease \(d\) and increase the power of \(\frac{n}{n-\ell } \cdot z\) inside \(T_d\) accordingly. We use the same \(\ell\), and for any \(k \ge \sqrt {n\ell }\), let
\begin{equation*} d = \frac{n\ell }{k}, \end{equation*}
thus \(d\) decreases when \(k\) increases. Without loss of generality assume \(d, \frac{k}{d} \in \mathbb {N}\), define
\begin{align*} q_1(z) = T_d\left(\left(\frac{n}{n-\ell }\cdot z\right)^\frac{k}{d}\right), \quad p_1(z) = q_1(z) / q_1\left(\frac{n-t}{n}\right). \end{align*}
Similarly, we have \(\left|q_1\left(\frac{i}{n}\right) \right|\le 1 \mbox{ for all } i = 0, 1, \dots , n-\ell\). We also get
\begin{align*} q_1\left(\frac{n-t}{n}\right) &= T_d\left(\left(1 + \frac{\ell - t}{n - \ell }\right)^\frac{k}{d}\right) \ge T_d\left(1 + \frac{\ell - t}{n - \ell }\cdot \frac{k}{d}\right) \ge 2^{d \sqrt {\frac{\ell - t}{n - \ell } \frac{k}{d} } - 1} \ge 2^{(\ell - t) -1} = \frac{1}{\epsilon }, \end{align*}
where the second step uses Bernoulli’s inequality and Claim 2.3 (4), the third step uses Claim 2.3 (5), and the fourth step uses \(\sqrt {kd} = \sqrt {n\ell } \ge \sqrt {n(\ell -t)}\) and \(\frac{n}{n-\ell } \ge 1\). Therefore, \(p_1\) is a good approximation on the same inputs as before, namely
\begin{align} \left|p_1\left(\frac{i}{n}\right)\right|&\le \epsilon , &\mbox{ for all } i = 0,1, \dots , n-\ell , \end{align}
(1)
\begin{align} p_1\left(\frac{i}{n}\right) &= 1, &\mbox{ for } i = n-t. \end{align}
(2)
The degree of \(q_1\) and \(p_1\) is at most \(d\cdot \frac{k}{d} = k\). Now we are going to bound their weights. We can write \(q_1(z) = T_d(a \cdot z^{k/d})\) with \(a = \left(\frac{n}{n-\ell }\right)^{k/d}\), so by Claim 2.3 (6) and Claim 2.2 (1),
\begin{align*} \vert \!\vert \!\vert q_1 \vert \!\vert \!\vert \le \vert \!\vert \!\vert T_d \vert \!\vert \!\vert \cdot a^d = 2^{O(d)} \cdot \left(1 + \frac{\ell }{n-\ell }\right)^k \le 2^{O(d)} \cdot e^{\frac{k\ell }{n-\ell }}, \end{align*}
where the last step uses \(1 + x \le e^x\) for all \(x\in \mathbb {R}\). We can assume \(\ell \le \frac{3}{4}n\); otherwise, \(\sqrt {n\ell } = \Omega (n)\), thus \(d = \Omega (n)\), so we can simply use \(p_0\). Thus we have \(n(n-\ell) \ge \frac{1}{4}n^2 \ge \frac{1}{4} k^2\), so \(\frac{k\ell }{n-\ell } = O\left(\frac{n\ell }{k}\right) = O(d)\), and therefore \(\vert \!\vert \!\vert q_1 \vert \!\vert \!\vert \le 2^{O(d)}\), which is the same as \(\vert \!\vert \!\vert T_d \vert \!\vert \!\vert\) up to the \(O(\cdot)\) in the exponent. In other words, we can ignore the effect of the scaling term \(\frac{n}{n-\ell }\) on the weight.
Consequently,
\begin{align} \vert \!\vert \!\vert p_1 \vert \!\vert \!\vert \le \vert \!\vert \!\vert q_1 \vert \!\vert \!\vert \cdot \epsilon \le 2^{O(d)} = 2^{O\left(n\left(\log (1/\epsilon)+ t\right) / k\right)}. \end{align}
(3)
Therefore \(p_1\) has the degree-weight tradeoff we need in Theorem 4.1.
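The tradeoff version can be checked the same way. The following sketch continues the parameters of the previous sketch with \(k = 200\), so that \(d = n\ell /k = 20\) and \(k/d = 10\) are integers; the final degree is \(k\) while the weight is governed by the much smaller \(d\).

import numpy as np
from numpy.polynomial import chebyshev as C

eps, t, n = 0.01, 2, 400
ell = 10                                  # log(2/eps) + t as before
k = 200                                   # chosen so d and k/d are integers
d = n * ell // k                          # d = 20, so k/d = 10
Td = C.Chebyshev([0.0] * d + [1.0])

q1 = lambda z: Td((n * z / (n - ell)) ** (k // d))
p1 = lambda z: q1(z) / q1((n - t) / n)

assert q1((n - t) / n) >= 1 / eps
for i in range(n - ell + 1):
    assert abs(p1(i / n)) <= eps + 1e-12          # Equation (1)
assert abs(p1((n - t) / n) - 1.0) < 1e-9          # Equation (2)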
The problem with \(p_1\) is that for \(i = n - \ell + 1, \dots , n-t-1\) and \(i = n-t+1, \dots , n\), we have no bound on its value. We need the following construction of auxiliary polynomials \(T_{n,m}^{(k)}\) with degree-weight tradeoff that can be made zero on some specific points. Multiplying \(p_1\) by such \(T_{n,m}^{(k)}\)’s, we can zero out the value on those \(i\)’s and get the desired approximations.
Lemma 4.2.
For every \(n\), \(m\), \(k\) such that \(0 \le m \lt n\) and \(\sqrt {\frac{n}{n-m}}\le k \le \frac{n}{n-m}\), there is a univariate polynomial \(T^{(k)}_{n,m}\) of degree \(O(k)\) and weight \(2^{O\left(\frac{n}{k(n-m)}\right)}\) such that
\begin{align} T^{(k)}_{n,m} (1) &= 1, \end{align}
(4)
\begin{align} T^{(k)}_{n,m}\left(\frac{m}{n}\right) &= 0, \end{align}
(5)
\begin{align} \left|T^{(k)}_{n,m}(z) \right|&\le 1, \mbox{ for any } z \in [0,1]. \end{align}
(6)
The case when \(k = \sqrt {\frac{n}{n-m}}\) was proved in Reference [34, Lemma 4.4]. We generalize it to provide a degree-weight tradeoff via \(k\). We postpone the proof to the end of the section. Building on the warm-up and assuming the lemma, we now present the proof of Theorem 4.1.
Proof of Theorem 4.1
Now we still let \(\ell = \log \frac{2}{\epsilon } + t\) and \(d = \frac{n\ell }{k}\) for any \(k \ge \sqrt {n\ell }\). Define the following univariate polynomial \(p^*\) by
\begin{align*} p^*(z) = p_1(z) \prod _{m = n - \ell }^{n-t-1} T_{n-t,m}^{\left(k^{\prime }_m\right)}\left(\frac{n}{n-t} \cdot z\right) \prod _{m = n-t+1}^n \left(1-\left(T_{m,n-t}^{(k^{\prime \prime }_m)}\left(\frac{n}{m} \cdot z \right)\right)^2\right), \end{align*}
using the auxiliary polynomials from Lemma 4.2, where
\begin{align*} k^{\prime }_m &= k \sqrt {\frac{n-t}{n\ell (n-t-m)}}, &\mbox{ for } m = n-\ell , \dots , n-t-1,\\ k^{\prime \prime }_m &= k\sqrt {\frac{m}{nt(m-n+t)}}, &\mbox{ for } m = n-t+1, \dots , n. \end{align*}
First, we need to show that our applications of Lemma 4.2 are legitimate.
\(\sqrt {\frac{n-t}{n-t-m}} \le k^{\prime }_m \le \frac{n-t}{n-t-m}\) for \(n - \ell \le m \le n-t-1\): On one hand, we have \(k \ge \sqrt {n\ell }\) so \(k^{\prime }_m \ge \sqrt {\frac{n-t}{n-t-m}}\). On the other hand, we have \(n-m \le \ell\) and \(t \lt \ell \le n\), so \(\frac{n}{\ell } \le \frac{n-t}{\ell - t } \le \frac{n-t}{n-t-m}\), and thus by \(k \le n\) we have \(k^{\prime }_m \le \sqrt {\frac{n(n-t)}{\ell (n-t-m)}} \le \frac{n-t}{n-t-m}\).
\(\sqrt {\frac{m}{m-n+t}} \le k^{\prime \prime }_m \le \frac{m}{m-n+t}\) for \(n - t + 1 \le m \le n\): On one hand, we have \(k \ge \sqrt {n\ell }\) and \(\ell \ge t\) so \(k^{\prime \prime }_m \ge \sqrt {\frac{\ell m}{t(m-n+t)}} \ge \sqrt {\frac{m}{m-n+t}}\). On the other hand, we have \(n-m\lt t \le n\), so \(\frac{n}{t} \le \frac{n-(n-m)}{t-(n-m)} = \frac{m}{m-n+t}\), and thus by \(k \le n\) we have \(k^{\prime \prime }_m \le \sqrt {\frac{nm}{t(m-n+t)}} \le \frac{m}{m-n+t}\).
Second, we are going to show that \(p^*\) is a good approximation.
\(|p^*(\frac{i}{n})| \le \epsilon\) for all \(i = 0, 1, \dots , n-\ell -1\): We have \(\frac{n}{n-t} \cdot \frac{i}{n} \lt \frac{n}{n-t} \cdot \frac{n - \ell }{n} \le \frac{n}{n-t} \cdot \frac{n - t}{n} = 1\), thus by Lemma 4.2 (6) we have
\begin{align*} \left|T_{n-t,m}^{\left(k^{\prime }_m\right)}\Big (\frac{n}{n-t} \cdot \frac{i}{n}\Big)\right|&\le 1, &\mbox{ for all } n - \ell \le m \le n - t - 1. \end{align*}
For \(n- t + 1 \le m \le n\), we have \(\frac{n}{m} \cdot \frac{i}{n} \le \frac{n}{n-t+1} \cdot \frac{n-\ell -1}{n} \le \frac{n}{n-t+1} \cdot \frac{n-t-1}{n} \lt 1\), and thus similarly we have \(\left|T_{m,n-t}^{(k^{\prime \prime }_m)}\left(\frac{n}{m} \cdot \frac{i}{n}\right) \right|\le 1\) so
\begin{align*} \left|1-\left(T_{m,n-t}^{(k^{\prime \prime }_m)}\left(\frac{n}{m} \cdot \frac{i}{n} \right)\right)^2 \right|&\le 1 - 0^2 = 1, &\mbox{ for all } n- t + 1 \le m \le n. \end{align*}
Combining with Equation (1), we get \(|p^*(\frac{i}{n})| \le \epsilon\).
\(p^*({\frac{i}{n}}) = 0\) for all \(i = n-\ell , \dots , n-t-1\): By Lemma 4.2 (5), for \(m = i\) we have \(T_{n-t,m}^{\left(k^{\prime }_m\right)}({\frac{n}{n-t} \cdot \frac{i}{n}}) = T_{n-t,m}^{\left(k^{\prime }_m\right)}({\frac{m}{n-t}}) = 0\).
\(p^*({\frac{i}{n}}) = 1\) for \(i = n - t\): By Lemma 4.2 (4), for all \(n-\ell \le m \le n-t-1\) we have \(T_{n-t,m}^{\left(k^{\prime }_m\right)}({\frac{n}{n-t} \cdot \frac{i}{n}}) = T_{n-t,m}^{\left(k^{\prime }_m\right)}(1) = 1\). By Lemma 4.2 (5), for all \(n-t+1 \le m \le n\) we have \(T_{m,n-t}^{(k^{\prime \prime }_m)}({\frac{n}{m} \cdot \frac{i}{n}}) = T_{m,n-t}^{(k^{\prime \prime }_m)} ({\frac{n-t}{m}}) = 0\) thus \(1-({T_{m,n-t}^{(k^{\prime \prime }_m)}({\frac{n}{m} \cdot \frac{i}{n}})})^2 = 1\). Combining with Equation (2) we get \(p^*({\frac{i}{n}}) = 1\).
\(p^*({\frac{i}{n}}) = 0\) for all \(i = n - t +1, \dots , n\): By Lemma 4.2 (4), for \(m = i\) we have \(T_{m,n-t}^{(k^{\prime \prime }_m)}({\frac{n}{m} \cdot \frac{i}{n}}) = T_{m,n-t}^{(k^{\prime \prime }_m)}(1) = 1\) thus \(1-({T_{m,n-t}^{(k^{\prime \prime }_m)}({\frac{n}{m} \cdot \frac{i}{n} })})^2 = 0\).
Therefore,
\begin{align} p^*\left(\frac{i}{n}\right) &= 1, &\mbox{ for } i = n-t, \end{align}
(7)
\begin{align} \left|p^*\left(\frac{i}{n}\right) \right|&\le \epsilon , &\mbox{ for } i = 0, \dots , n-t-1, n-t+1, \dots , n. \end{align}
(8)
Now we are going to bound the degree and weight of \(p^*\). We have \(k^{\prime }_m = k \sqrt {\frac{n-t}{n\ell (n-t-m)}} \le k\frac{1}{\sqrt {\ell (n-t-m)}}\) and \(k^{\prime \prime }_m = k\sqrt {\frac{m}{nt(m-n+t)}} \le k \frac{1}{\sqrt {t(m-n+t)}}\). By Lemma 4.2,
\begin{align} \deg (p^*) &\le k + \sum _{m = n-\ell }^{n-t-1} O\left(k^{\prime }_m\right) + \sum _{m = n-t+1}^n O\left(k^{\prime \prime }_m\right) \nonumber \\ &\le O\left(k + \frac{k}{\sqrt {\ell }}\sum _{m = n-\ell }^{n - t - 1} \frac{1}{\sqrt {n-t-m}} + \frac{k}{\sqrt {t}}\sum _{m = n-t+1}^{n} \frac{1}{\sqrt {m-n+t}}\right) \nonumber \\ &= O\left(k + \frac{k}{\sqrt {\ell }}\sum _{i = 1}^{\ell - t} \frac{1}{\sqrt {i}} + \frac{k}{\sqrt {t}}\sum _{i = 1}^{t} \frac{1}{\sqrt {i}}\right) \nonumber \\ &= O(k), \end{align}
(9)
where in the last step we use \(\sum _{i = 1}^n \frac{1}{\sqrt {i}} = O(\sqrt {n})\) for any \(n \in \mathbb {N}\). Similarly to the argument in the calculation of \(\vert \!\vert \!\vert q_1 \vert \!\vert \!\vert\), we can safely ignore the effects of the scaling terms \(\frac{n}{n-t}\) and \(\frac{n}{m}\) on the weight of \(p^*\). By Equation (3) and Lemma 4.2, we have
\begin{align} \log \vert \!\vert \!\vert p^* \vert \!\vert \!\vert &\le O\left(d\right) + \sum _{m = n - \ell }^{n - t - 1} O\left(\frac{n-t}{k^{\prime }_m(n-t-m)}\right) + \sum _{m = n - t + 1}^n O\left(\frac{m}{k^{\prime \prime }_m(m-n+t)}\right) \nonumber \\ &= O\left(\frac{n\ell }{k} + \sum _{m = n-\ell }^{n-t-1} \frac{\sqrt {n\ell (n-t)}}{k} \frac{1}{\sqrt {n-t-m}} + \sum _{m=n-t+1}^n \frac{\sqrt {nt m}}{k} \frac{1}{\sqrt {m-n+t}}\right) \nonumber \\ &\le O\left(\frac{n\ell }{k} + \frac{n\sqrt {\ell }}{k} \sum _{i = 1}^{\ell - t}\frac{1}{\sqrt {i}} + \frac{n\sqrt {t}}{k} \sum _{i = 1}^{t}\frac{1}{\sqrt {i}}\right) \nonumber \\ &= O\left(\frac{n}{k}\left(\log (1/\epsilon)+ t\right)\right). \end{align}
(10)
Finally, define \(p :\lbrace 0,1\rbrace ^n \rightarrow \mathbb {R}\) by \(p(x) = p^*(\frac{\sum _{i=1}^n x_i}{n})\). The theorem follows from Equations (7)–(10) and Claim 2.2 (2).□
Now we prove Theorem 1.3 and Corollary 1.9. In the proof below we are going to negate our input variables to convert \(\mathsf {EXACT}_{n,i}(x)\) into \(\mathsf {EXACT}_{n,n-i}(\overline{x})\) for \(0 \le i \le t\) to apply Theorem 4.1. Therefore, as discussed in Section 3, we will be working on the Fourier basis. Since \(\mathsf {OR}_n \in \mathsf {SYM}_{n, 0}\), we get Corollary 1.1 from Theorem 1.3, and Corollary 1.8 from Corollary 1.9.
Proof of Theorem 1.3
By Lemma 3.1 we can get the same results for \(\mathsf {EXACT}\) on the Fourier basis as in Theorem 4.1. Then we can write \(f\) as
\begin{align} f(x) &= c + \sum _{i=0}^t c^{\prime }_i \cdot \mathsf {EXACT}_{n,i}(x) + \sum _{i=0}^t c^{\prime \prime }_i \cdot \mathsf {EXACT}_{n,n-i}(x) \nonumber \\ &= c + \sum _{i=0}^t c^{\prime }_i \cdot \mathsf {EXACT}_{n,n-i}(\overline{x}) + \sum _{i=0}^t c^{\prime \prime }_i \cdot \mathsf {EXACT}_{n,n-i}(x), \end{align}
(11)
where \(c\), \(c^{\prime }_i\)’s, and \(c^{\prime \prime }_i\)’s are fixed reals, and \(\overline{x} = (-x_1, \dots , -x_n)\). Now let \(\epsilon ^{\prime } = \frac{\epsilon }{2t + 2}\), then for \(0 \le i \le t\), \(\sqrt {n\left(\log (1/\epsilon ^{\prime }) + i\right)} = O\left(\sqrt {n\left(\log (1/\epsilon)+ t\right)}\right)\) so we can ignore the constant factor difference and apply Theorem 4.1 with \(\epsilon = \epsilon ^{\prime }\) and the same \(k\) for each \(\mathsf {EXACT}_{n,n-i}\). The degrees of the approximations for \(\mathsf {EXACT}\) are \(O(k)\), so the total degree is also \(O(k)\). The weights of the approximations for \(\mathsf {EXACT}\) are \(2^{O\left(n\left(\log (1/\epsilon ^{\prime }) + i\right)/k\right)} = 2^{O\left(n\left(\log (1/\epsilon)+ t\right)/k\right)}\). By Claim 2.1 the total weight is \(O(t)2^{O\left(n\left(\log (1/\epsilon)+ t\right)/k\right)} = 2^{O\left(n\left(\log (1/\epsilon)+ t\right)/k\right)}\).□
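As a small illustration of decomposition (11), the following Python sketch (ours; it assumes \(2t + 2 \le n\) so that the flat middle range is nonempty) reads off the coefficients \(c\), \(c^{\prime }_i\), \(c^{\prime \prime }_i\) from the value table of a symmetric function.

def exact_decomposition(fvals, t):
    # fvals[w] is the value of f on inputs of Hamming weight w;
    # f is constant on weights in the open interval (t, n - t).
    n = len(fvals) - 1
    assert 2 * t + 2 <= n, "sketch assumes a nonempty middle range"
    c = fvals[t + 1]                                 # the constant middle value
    cp = [fvals[i] - c for i in range(t + 1)]        # c'_i for EXACT_{n,i}
    cpp = [fvals[n - i] - c for i in range(t + 1)]   # c''_i for EXACT_{n,n-i}
    return c, cp, cpp

# Example: OR_n (as a 0/1 function of Hamming weight) is in SYM_{n,0}.
n = 6
or_vals = [0] + [1] * n
print(exact_decomposition(or_vals, 0))   # c = 1, c'_0 = -1, c''_0 = 0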
Proof of Corollary 1.9
Apply Theorem 1.3 and set \(\delta = \frac{\epsilon }{2w} = 2^{-O\left(\frac{n}{k} \left(\log (1/\epsilon)+ t\right)\right)}\), where \(w\) is the weight of the approximating polynomial. The corollary then follows from Theorem 2.4.□

4.1 Proof of Lemma 4.2

Let \(d = \frac{2n}{k(n-m)}\), so \(\frac{k}{d} = \frac{k^2(n-m)}{2n}\). As \(m \lt n\) we have \(d \gt 0\) and \(\frac{k}{d} \gt 0\). Without loss of generality assume \(d, \frac{k}{d}\in \mathbb {N}\), and define
\begin{align*} T^{(k)}_{n,m}(z) = T_d\left(a \cdot z^{k/d} + b\right)\!, \end{align*}
where \(a,b \in \mathbb {R}\) are parameters to be set and \(T_d\) is the degree-\(d\) Chebyshev polynomial. We have \(\deg (T^{(k)}_{n,m}) \le d \cdot \frac{k}{d} = k\).
We set \(a\), \(b\) such that
\begin{align} a + b &= 1, \end{align}
(12)
\begin{align} a\left(\frac{m}{n}\right)^{k/d} + b &= \cos \left(\frac{\pi }{2d}\right), \end{align}
(13)
and then Property (4) follows from Equation (12) and Claim 2.3 (1), and Property (5) follows from Equation (13) and Claim 2.3 (2) with \(i = 1\).
Our goal is to prove \(0 \le a \le 1\). Assume this is true. From Equation (12) we have \(b \in [0,1]\). Hence \(a \cdot z^{k/d} + b\) is increasing in \(z \in [0,1]\), mapping \([0,1]\) to \([b, 1] \subseteq [0, 1]\). Therefore Property (6) follows from Claim 2.3 (3). Besides, we have \(|a| + |b| = a + b = 1\), so by Claim 2.3 (6) and Claim 2.2 (1) we have \(\vert \!\vert \!\vert T^{(k)}_{n,m} \vert \!\vert \!\vert = 1 \cdot 2^{O(d)} = 2^{O\left(\frac{n}{k(n-m)}\right)}\). Therefore, \(T^{(k)}_{n,m}(z)\) is the desired polynomial.
To see that \(a\in [0,1]\), we solve the linear Equations (12) and (13) to get
\begin{align} a &= \frac{1-\cos \left(\frac{\pi }{2d}\right)}{1-\left(\frac{m}{n}\right)^{k/d}}. \end{align}
(14)
Because \(m \lt n\), we have \(\frac{m}{n} \lt 1\), and thus \(1-\left(\frac{m}{n}\right)^{k/d} \gt 0\). We always have \(1 - \cos \left(\frac{\pi }{2d}\right) \ge 0\). Therefore, from Equation (14) we have \(a \ge 0\). For the upper bound, let \(u = \frac{n}{n-m}\); from Equation (14) we can get
\begin{align} a \le \frac{\frac{1}{2}\left(\frac{\pi }{2d}\right)^2}{1-\left(1-\frac{n-m}{n}\right)^{k/d}} \le \frac{\frac{\pi ^2}{16}\frac{k^2}{2u^2}}{1-e^{-\frac{k^2}{2u^2}}}, \end{align}
(15)
where the first step uses \(\cos x \ge 1 - \frac{1}{2}x^2\) for \(x \in \mathbb {R}\) in the numerator, and the second step uses the value of \(d\) in the numerator and \(1+x \le e^x\) for \(x \in \mathbb {R}\) in the denominator. Since \(\sqrt {\frac{n}{n-m}}\le k \le \frac{n}{n-m}\) and \(m \lt n\), we have \(0 \lt \frac{k^2}{2u^2} \le \frac{1}{2}\). Consider the function \(f(z) = \frac{z}{1-e^{-z}}\) on \(z \in (0, \frac{1}{2}]\). Its derivative \(f^{\prime }(z) = \frac{e^z(e^z - z - 1)}{(e^z - 1)^2} \gt 0\) for \(z \in (0,1]\), and hence \(f(z)\) is increasing in \((0,\frac{1}{2}]\), and thus \(f(z) \le f\left(\frac{1}{2}\right)\). From Equation (15), we have \(a \le \frac{\pi ^2}{16} f\left(\frac{1}{2}\right) = \frac{\pi ^2 \sqrt {e}}{32(\sqrt {e} - 1)}\lt 1\), finishing our proof. \(\Box\)
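The proof is constructive and can be checked numerically. The following sketch (ours, with the illustrative choice \(n = 100\), \(m = 96\), \(k = 10\), so that \(d = 5\) and \(k/d = 2\) are integers) solves Equations (12) and (13) for \(a\) and \(b\) and verifies Properties (4)-(6).

import numpy as np
from numpy.polynomial import chebyshev as C

n, m, k = 100, 96, 10          # sqrt(n/(n-m)) = 5 <= k <= n/(n-m) = 25
d = 2 * n // (k * (n - m))     # d = 2n/(k(n-m)) = 5
e = k // d                     # exponent k/d = 2
Td = C.Chebyshev([0.0] * d + [1.0])

a = (1 - np.cos(np.pi / (2 * d))) / (1 - (m / n) ** e)   # Equation (14)
b = 1.0 - a                                              # Equation (12)
assert 0 <= a <= 1

T_km = lambda z: Td(a * z ** e + b)
assert abs(T_km(1.0) - 1.0) < 1e-9            # Property (4)
assert abs(T_km(m / n)) < 1e-9                # Property (5)
zs = np.linspace(0, 1, 1001)
assert np.all(np.abs(T_km(zs)) <= 1 + 1e-9)   # Property (6)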

5 Proofs of Theorem 1.12 and Corollaries 1.2, 1.4, and 1.11

First, we prove the following lemma reducing the problem to fooling by \(k\)-wise indistinguishability. It generalizes Lemma 10 in Reference [8], which only works for \(\ell = O(1)\). Jumping ahead, we will use non-constant \(\ell\) to deal with non-constant \(\epsilon\) and symmetric functions with non-constant \(t\) in the proof of Theorem 1.12.
Lemma 5.1.
Let \(c^{\prime \prime }\) be any constant. For every \(n\), \(k\), and \(\ell\) satisfying \(\frac{c^{\prime \prime }}{16} \sqrt {n\ell } \le k \le \frac{c^{\prime \prime }}{16} n\), there exists \(n^{\prime }\) with \(\ell \le n^{\prime } \le n\) such that if there exist \(k^{\prime }\)-wise indistinguishable distributions \(P^{\prime },Q^{\prime }\) on \(\lbrace 0,1\rbrace ^{n^{\prime }}\) for \(k^{\prime } = c^{\prime \prime } \sqrt {n^{\prime }\ell }\), then there exist distributions \(P\), \(Q\) on \(\lbrace 0,1\rbrace ^n\) such that the Hamming weight distributions satisfy \(|P| \equiv |P^{\prime }|\) and \(|Q| \equiv |Q^{\prime }|\), and \(P\) and \(Q\) are \(({k, 2^{-\Omega \left(n \ell /k\right)}})\)-indistinguishable.
Proof.
To sample from \(P\) (respectively \(Q\)), we select \(n^{\prime }\) indices from \([n]\) uniformly at random as “active” indices and then fill in these \(n^{\prime }\) indices using a random sample from \(P^{\prime }\) (respectively \(Q^{\prime }\)); the other indices we simply set to 0. Obviously this process preserves the Hamming weight of the samples.
Suppose \(P^{\prime }\) and \(Q^{\prime }\) are \(k^{\prime }\)-wise indistinguishable with \(k^{\prime } = c^{\prime \prime }\sqrt {n^{\prime }\ell }\) for some constant \(0 \lt c^{\prime \prime } \lt 1\). For any \(k\) indices \(S\) of \(P\) and \(Q\), if there are at most \(k^{\prime }\) active indices in \(S\), then their projections on \(S\) are identical by \(k^{\prime }\)-wise indistinguishability. Therefore the advantage of any statistical test on \(k\) bits in distinguishing \(P\) from \(Q\) is bounded by the probability that this event does not happen. By tail bounds for the hypergeometric distribution [21], we have
\begin{align} \Pr [\text{more than } k^{\prime } \text{ active indices in } S] &\le e^{-k D\left(\frac{k^{\prime }+1}{k}||\frac{n^{\prime }}{n}\right)} = 2^{-\Omega \left(k D\left(\frac{k^{\prime }}{k}||\frac{n^{\prime }}{n}\right)\right)}, \end{align}
(16)
where \(D(a || b) = a \log \frac{a}{b} + (1-a) \log \frac{1-a}{1-b}\) is the Kullback–Leibler divergence. By a lower bound from Hellinger distance \(H\) (cf. Reference [20]), for any \(p\) and any \(a \ge 16\), we have
\begin{align*} D(ap || p) \ge 2H^2(ap || p) \ge \big (\sqrt {ap} -\sqrt {p}\big)^2 = \left(\sqrt {a} - 1\right)^2 p \ge \frac{1}{2}ap, \end{align*}
where the last step comes from the fact that \(2\sqrt {a} \le \frac{a}{2}\) for \(a \ge 16\). Now we set \(n^{\prime } = \frac{c^{\prime \prime 2}}{16^2} \frac{n^2}{k^2} \ell\), and then we have \(k^{\prime } = \frac{c^{\prime \prime 2}}{16}\frac{n}{k}\ell\), thus \(\frac{k^{\prime }}{k} / \frac{n^{\prime }}{n} = 16\); therefore, we have
\begin{align*} 2^{-\Omega \left(k D\left(\frac{k^{\prime }}{k}||\frac{n^{\prime }}{n}\right)\right)} \le 2^{-\Omega \left(k\frac{k^{\prime }}{k}\right)} = 2^{-\Omega (k^{\prime })} = 2^{-\Omega \left(n\ell /k\right)}. \end{align*}
For \(\ell \le n^{\prime }\), we need \(\frac{c^{\prime \prime 2}}{16^2}\frac{n^2}{k^2} \ge 1\), and thus \(k \le \frac{c^{\prime \prime }}{16} n\). For \(n^{\prime } \le n\), we need \(\frac{c^{\prime \prime 2}}{16^2}\frac{n}{k^2}\ell \le 1\) and thus \(k \ge \frac{c^{\prime \prime }}{16}\sqrt {n\ell }\).□
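The embedding step of this proof is simple enough to state as code. The following minimal Python sketch (ours) implements the sampling procedure; it visibly preserves the Hamming weight.

import random

def embed(sample, n):
    # Plant an n'-bit sample into n' random "active" positions out of n;
    # all other positions are 0, so the Hamming weight is preserved.
    active = sorted(random.sample(range(n), len(sample)))
    x = [0] * n
    for pos, bit in zip(active, sample):
        x[pos] = bit
    return x

# Drawing from P: embed a draw from P'; drawing from Q: embed a draw from Q'.
print(embed([1, 0, 1, 1], n=10))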
Combining the following equivalence between approximate degree and bounded indistinguishability with the \(\epsilon\)-approximate degree lower bound for symmetric functions due to Buhrman et al. [10] (improving Reference [28]), we can obtain \(k\)-wise indistinguishable distributions that do not fool symmetric functions.
Theorem 5.2 ([6, Theorem 1.1]).
For every \(\epsilon\), \(n\), \(k\), and \(f:\lbrace 0,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) the following are equivalent:
(1)
\(f\) is not \(\epsilon\)-fooled by \(k\)-wise indistinguishability;
(2)
The \(\epsilon /2\)-approximate degree of \(f\) is bigger than \(k\).
Theorem 5.3 ([10]).
For any \(f \in \mathsf {SYM}_{n,t}\), \(\widetilde{\deg _\epsilon }(f) = \Omega \left(\sqrt {n\left(\log (1/\epsilon)+ t\right)}\right)\).
Now we can prove Theorem 1.12 with \(f = \mathsf {EXACT}_{n,t} \in \mathsf {SYM}_{n, t}\) for \(0 \le t \le \frac{n}{2}\), and similarly for \(f = \mathsf {THR}_{n, t+1}\), where \(\mathsf {THR}_{n, r}\) is \(\mathsf {True}\) iff the Hamming weight of its input is at least \(r\). In particular, for \(\mathsf {OR}_n = \mathsf {THR}_{n, 1}\) we get Corollary 1.11 by setting \(t = 0\). Note that for \(t = O(1)\) (in particular for \(\mathsf {OR}\), as in Corollary 1.11), this result works for all \(k\), as \((n, \epsilon ^{0.999})\)-indistinguishability does not \(\epsilon\)-fool any non-constant function.
Proof of Theorem 1.12
We consider the function \(\mathsf {EXACT}_{n,t}\). Let \(\ell = \log (2/\epsilon) + t\), and let \(c^{\prime \prime }\) be the constant in the \(\Omega (\cdot)\) in Theorem 5.3. Set \(c^{\prime } = \frac{c^{\prime \prime }}{16}\).
For \(k \le c^{\prime \prime } \sqrt {n \ell }\), by Theorems 5.3 and 5.2, \(k\)-wise indistinguishability does not \(\epsilon\)-fool \(\mathsf {EXACT}_{n,t}\), and hence the theorem for \(k \le c^{\prime \prime } \sqrt {n \ell }\) follows from \(2^{-c^{\prime \prime 2}n\ell /k} \le 2^{-k}\).
For \(c^{\prime \prime }\sqrt {n \ell } \le k \le \frac{c^{\prime \prime }}{16}n\), apply Lemma 5.1 to get \(n^{\prime }\), and then Theorem 5.3 and Theorem 5.2 give us \(k^{\prime }\)-wise indistinguishable distributions \(P^{\prime }\) and \(Q^{\prime }\) on \(\lbrace 0,1\rbrace ^{n^{\prime }}\) that do not \(\epsilon\)-fool \(\mathsf {EXACT}_{n^{\prime }, t}\). The theorem follows by applying Lemma 5.1 again to get distributions \(P,Q\).□
Theorem 2.4 shows that a polynomial \(p\) that \((\epsilon /2)\)-approximates \(f\) with degree \(k\) must have weight at least \(\epsilon / \delta\) if \((k,\delta)\)-indistinguishability does not \(\epsilon\)-fool \(f\). Let \(c \lt 1\) be the constant in the \(\Omega (\cdot)\) in Theorem 1.12. Then we get Corollary 1.2 from Corollary 1.11 and Corollary 1.4 from Theorem 1.12.

6 Proofs of Theorem 1.6 and Corollary 1.10

We first prove a decomposition lemma and a composition lemma. The decomposition lemma is implicit in Lemma 5.5 in Reference [34], and we provide its proof here for completeness. The composition lemma generalizes Lemma 5.2 in Reference [34] to work for any inner and outer approximations.
Lemma 6.1 (Decomposition Lemma).
Let \(F\) be a \(t\)-\(\mathsf {CNF}\) on \(n\) variables. Then there exists a function \(F^{\prime }:\lbrace -1,1\rbrace ^n \times \lbrace -1,1\rbrace ^{2n} \rightarrow \lbrace 0,1\rbrace\) such that:
\(F^{\prime }(x,y) = \bigwedge _{i=1}^{2n} y_i \vee f_i(x)\), where all the \(f_i\)’s are \((t-1)\)-\(\mathsf {CNF}\)s.
There exists an assignment to each \(y_i\) by some \(x_j\) or \(-x_j\) such that \(F(x) = F^{\prime }(x,y)\) for all \(x \in \lbrace -1,1\rbrace ^n\).
Proof.
Given a \(t\)-\(\mathsf {CNF}\) \(F\), we can transform \(F\) by the following procedure. For each \(i \in [n]\), we pick out from \(F\) all the clauses that contain \(x_i\) unnegated. Let \(m_i\) be the number of such clauses and \(C^{(i)}_{1}\), \(C^{(i)}_2\), \(\dots\), \(C^{(i)}_{m_i}\) be these clauses. We remove \(x_i\) from them to get \(C^{\prime (i)}_{1}\), \(C^{\prime (i)}_2\), \(\dots\), \(C^{\prime (i)}_{m_i}\). If \(m_i = 0\), then define \(f^{\prime }_i(x) = \lnot x_i\); otherwise, define it as \(\bigwedge _{j =1}^{m_i} C^{\prime (i)}_{j}\). Remove all the picked clauses from \(F\), and continue with the next \(i\) until \(i=n\). Then we run this procedure on the remaining clauses of \(F\) for each \(i \in [n]\) again, but this time we collect clauses that contain \(\lnot x_i\) and define \(f^{\prime \prime }_i(x)\) similarly. At last we define
\begin{align} F_1(x) &= \bigwedge _{i = 1}^n x_i \vee f^{\prime }_i(x), \end{align}
(17)
\begin{align} F_2(x) &= \bigwedge _{i = 1}^n \lnot x_i \vee f^{\prime \prime }_i(x), \end{align}
(18)
and by distributive law we have \(F = F_1 \wedge F_2\). Note that all the \(f^{\prime }_i\)’s and \(f^{\prime \prime }_i\)’s are \((t-1)\)-\(\mathsf {CNF}\)s. We define \(F^{\prime }(x,y) = \bigwedge _{i=1}^{2n} y_i \vee f_i(x)\), where
\begin{equation*} f_i = \left\lbrace \begin{array}{ll} f^{\prime }_i & \mbox{ for }i = 1, \dots , n,\\ f^{\prime \prime }_{i-n} & \mbox{ for }i = n+1, \dots , 2n. \end{array}\right. \end{equation*}
Therefore, all the \(f_i\)’s are \((t-1)\)-\(\mathsf {CNF}\)s. If we set
\begin{align*} y_i = \left\lbrace \begin{array}{ll} x_i & \mbox{ for }i= 1, \dots , n, \\ -x_{i-n} & \mbox{ for }i = n+1, \dots , 2n,\end{array}\right. \end{align*}
then we will have \(F(x) = F^{\prime }(x,y)\) for all \(x \in \lbrace -1,1\rbrace ^n\) by Equations (17) and (18).□
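To make the procedure concrete, here is a minimal Python sketch (ours, not the authors' code) of the decomposition, together with a brute-force check of the conclusion of Lemma 6.1 on a random 2-\(\mathsf {CNF}\); the clause encoding by signed integers is our own convention.

```python
# Sketch of the Lemma 6.1 decomposition (our illustration, not the authors' code).
# A CNF is a list of clauses; a clause is a frozenset of nonzero ints, where
# literal +i means x_i and -i means NOT x_i (variables are numbered 1..n).
from itertools import product
import random

def eval_cnf(cnf, assign):
    # assign: dict mapping variable index to bool
    return all(any(assign[abs(l)] if l > 0 else not assign[abs(l)] for l in c)
               for c in cnf)

def decompose(cnf, n):
    # Returns the 2n inner CNFs f_1, ..., f_2n; pairing f_i with y_i = x_i
    # (i <= n) and y_i = NOT x_{i-n} (i > n) recovers the original CNF.
    remaining, inner = list(cnf), []
    for sign in (+1, -1):                  # positive pass, then negative pass
        for i in range(1, n + 1):
            lit = sign * i
            hit = [c for c in remaining if lit in c]
            remaining = [c for c in remaining if lit not in c]
            # m_i = 0: take f_i = NOT(literal), so y_i OR f_i is a tautology
            inner.append([c - {lit} for c in hit] if hit else [frozenset({-lit})])
    assert not remaining                   # every clause was consumed
    return inner

# Brute-force check on a random 2-CNF with n = 4 variables.
random.seed(0)
n = 4
cnf = [frozenset({random.choice([-1, 1]) * random.randint(1, n),
                  random.choice([-1, 1]) * random.randint(1, n)})
       for _ in range(6)]
inner = decompose(cnf, n)
for bits in product([False, True], repeat=n):
    a = dict(zip(range(1, n + 1), bits))
    y = list(bits) + [not b for b in bits]
    assert eval_cnf(cnf, a) == all(y[i] or eval_cnf(inner[i], a)
                                   for i in range(2 * n))
print("decomposition agrees with the original CNF on all inputs")
```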
Lemma 6.2 (Composition Lemma).
Let \(F^{\prime }:\lbrace -1,1\rbrace ^n \times \lbrace -1,1\rbrace ^{2n} \rightarrow \lbrace 0,1\rbrace\) be a function such that \(F^{\prime }(x,y) = \bigwedge _{i=1}^{2n} y_i \vee f_i(x)\) for some functions \(f_i\). Let \(B \in [2n]\) be a number, and assume that the following statements are true.
Outer approximation: The \(\mathsf {AND}\) function on \(\frac{2n}{B}\) bits \(\bigwedge _{\frac{2n}{B}} :\lbrace 0,1\rbrace ^{\frac{2n}{B}} \rightarrow \lbrace 0,1\rbrace\) can be approximated within error \(\epsilon _{out}\), degree \(d_{out}\), and weight \(w_{out}\).
Inner approximations: For any subset \(S \subseteq [2n]\), \(\bigwedge _{j \in S} f_j(x) :\lbrace -1,1\rbrace ^n \rightarrow \lbrace 0,1\rbrace\) can be approximated within error \(\epsilon _{in}\), degree \(d_{in}\), and weight \(w_{in}\).
Then we can approximate \(F^{\prime }\) within error \(\epsilon\), degree \(d\), and weight \(w\), where
\begin{align} \epsilon &\ge w_{out}\epsilon _{in} + \epsilon _{out}, \end{align}
(19)
\begin{align} d &\le d_{out}B + d_{in}, \end{align}
(20)
\begin{align} \log w &\le \log w_{out} + d_{out}B + \log w_{in}. \end{align}
(21)
Proof.
Let \(N = 2n\). Let \(S_1\), \(\dots\), \(S_{\frac{N}{B}}\) be an even partition of \([N]\) into subsets of size \(B\). For each \(i \in [\frac{N}{B}]\) we can define \(h_i:\lbrace -1,1\rbrace ^n \times \lbrace -1,1\rbrace ^{S_i} \rightarrow \lbrace 0,1\rbrace\) by
\begin{equation*} h_i(x,y) = \bigwedge _{j \in S_i} y_j \vee f_j(x), \end{equation*}
and we have \(F^{\prime }(x, y) = \bigwedge _{i = 1}^{\frac{N}{B}} h_i(x,y) = \bigwedge _{\frac{N}{B}}(h_1(x,y), \dots , h_\frac{N}{B}(x,y))\). Our goal is to approximate the outer \(\mathsf {AND}\) function and the \(h_i\)’s carefully so that the total degree and weight can be bounded as we want.
For any subset \(T \subseteq S \subseteq [N]\), define the indicator function \(\mathbb {I}(\cdot ; T,S) :\lbrace -1,1\rbrace ^{N} \rightarrow \lbrace 0,1\rbrace\) by
\begin{align} \mathbb {I}(y; T,S) = \prod _{j \in T}\frac{y_j + 1}{2} \prod _{j \in S \setminus T}\frac{1 - y_j}{2}, \end{align}
(22)
so it is 1 if and only if \(y\) represents \(\mathsf {False}\) on \(T\) and \(\mathsf {True}\) on \(S \setminus T\). An assignment to \(y\) on \(S_i\) corresponds one-to-one to the subset \(T \subseteq S_i\) such that \(y_j\) is false for \(j \in T\) and true for \(j \in S_i \setminus T\). Under such a \(T\) (and thus the corresponding assignment to \(y\)), to make \(h_i(x, y)\) true we need \(f_j(x)\) to be true for all \(j \in T\). Therefore, each \(h_i\) can be written as
\begin{align} h_i(x,y) &= \sum _{T \subseteq S_i} \left(\bigwedge _{j \in T} f_j(x) \wedge \bigwedge _{j \in T} \lnot y_j \wedge \bigwedge _{j \in S_i \setminus T} y_j\right) \nonumber \nonumber\\ &= \sum _{T \subseteq S_i} \left(\bigwedge _{j \in T}f_j(x) \cdot \mathbb {I}(y; T, S_i)\right). \end{align}
(23)
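As a sanity check, the following small Python script (our illustration, not from the paper) verifies Equation (23) by brute force on a random instance, using the convention above that \(-1\) encodes \(\mathsf {True}\) in the Fourier basis.

```python
# Brute-force check of Equation (23) on a tiny random instance (our sketch).
# Convention: y in {-1,1}^S with -1 encoding True; the f_j map into {0,1}.
from itertools import product, combinations
import random

random.seed(1)
n, S = 2, [0, 1, 2]                        # one block S_i of three y-positions
xs = list(product([-1, 1], repeat=n))
f = [{x: random.randint(0, 1) for x in xs} for _ in S]

def indicator(y, T, S):
    # Equation (22): 1 iff y_j = 1 (False) on T and y_j = -1 (True) on S \ T
    p = 1.0
    for j in S:
        p *= (y[j] + 1) / 2 if j in T else (1 - y[j]) / 2
    return p

for x in xs:
    for y in product([-1, 1], repeat=len(S)):
        lhs = all(y[j] == -1 or f[j][x] == 1 for j in S)  # h_i = AND(y_j OR f_j)
        rhs = sum(all(f[j][x] == 1 for j in T) * indicator(y, T, S)
                  for r in range(len(S) + 1) for T in combinations(S, r))
        assert lhs == rhs
print("Equation (23) holds on all inputs")
```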
By assumption, we have a polynomial \(p :\lbrace 0,1\rbrace ^{\frac{N}{B}} \rightarrow \mathbb {R}\) that \(\epsilon _{out}\)-approximates the outer \(\mathsf {AND}_\frac{N}{B}\) function with degree \(d_{out}\) and weight \(w_{out}\). We can write it as
\begin{align} p(z) = \sum _{U \subseteq \left[\frac{N}{B}\right], |U| \le d_{out}} a_U \prod _{i \in U} z_i, \end{align}
(24)
where \(a_U \in \mathbb {R}\) and \(\sum _{U} |a_U| = w_{out}\) by definition. Define \(F^{\prime \prime } :\lbrace -1,1\rbrace ^n \times \lbrace -1,1\rbrace ^{2n} \rightarrow \mathbb {R}\) by substituting the outer \(\mathsf {AND}_\frac{N}{B}\) function in \(F^{\prime }\) by \(p\): \(F^{\prime \prime }(x, y) = p(h_1(x,y), \dots , h_{\frac{N}{B}}(x,y))\). Since \(p\) is a pointwise approximation, we have
\begin{align} \Vert F^{\prime \prime } - F^{\prime }\Vert _\infty \le \Vert p - \mathsf {AND}_{\frac{N}{B}}\Vert _\infty = \epsilon _{out}. \end{align}
(25)
On the other hand, we can expand \(F^{\prime \prime }\) as
\begin{align*} F^{\prime \prime }(x,y) &= \sum _{U \subseteq \left[\frac{N}{B}\right], |U| \le d_{out}} a_U \prod _{i \in U} h_i(x,y) \\ &= \sum _{U} a_U \prod _{i \in U} \sum _{T_i \subseteq S_i} \left(\bigwedge _{j \in T_i} f_j(x) \cdot \mathbb {I}(y; T_i, S_i)\right) \\ &= \sum _{U} a_U \sum _{\substack{T :U \rightarrow \mathcal {P}([N]) \\ T(i) \subseteq S_i, \forall i \in U}} \prod _{i \in U} \left(\bigwedge _{j \in T(i)} f_j(x) \cdot \mathbb {I}(y; T(i), S_i)\right) \\ &= \sum _{U} a_U \sum _{T} \left(\prod _{i \in U} \bigwedge _{j \in T(i)} f_j(x)\right) \cdot \left(\prod _{i \in U} \mathbb {I}(y; T(i), S_i)\right) \\ &= \sum _{U} a_U \sum _{T} \bigwedge _{j \in \mathsf {img}(T)} f_j(x) \cdot \mathbb {I}\left(y; \mathsf {img}(T), \cup _{i \in U} S_i\right), \end{align*}
where \(\mathsf {img}(T)\) denotes the image of function \(T :U \rightarrow \mathcal {P}([N])\) and \(\mathcal {P}([N])\) is the powerset of \([N]\), the first step uses Equation (24), the second step uses Equation (23), the third step exchanges the product with the sum, the fourth step uses properties of multiplication, and the last step uses the fact that multiplication on the Boolean basis is equivalent to \(\mathsf {AND}\). It is important that we set the input basis of the outer approximation \(p\) (thus the output basis of \(\bigwedge _{j \in \mathsf {img}(T)}f_j(x) \cdot \mathbb {I}(y; \mathsf {img}(T), \cup _{i \in U} S_i)\)) to be the Boolean basis even though the input basis of the whole function is the Fourier basis; otherwise, the last step does not hold.
By assumption, we can approximate each \(\bigwedge _{j \in \mathsf {img}(T)}f_j(x)\) by \(\widetilde{\bigwedge _{j \in \mathsf {img}(T)}f_j(x)}\) within error \(\epsilon _{in}\), degree \(d_{in}\), and weight \(w_{in}\). Then we can define \(\widetilde{F^{\prime \prime }} :\lbrace -1,1\rbrace ^n \times \lbrace -1,1\rbrace ^{2n} \rightarrow \mathbb {R}\) by
\begin{align} \widetilde{F^{\prime \prime }}(x,y) = \sum _{U} a_U \sum _{T} \widetilde{\bigwedge _{j \in \mathsf {img}(T)}f_j(x)} \cdot \mathbb {I}\left(y; \mathsf {img}(T), \cup _{i \in U} S_i\right). \end{align}
(26)
Observe that for any \(y \in \lbrace -1,1\rbrace ^N\), for any \(U \subseteq \left[\frac{N}{B}\right]\) with \(|U| \le d_{out}\), there is only one \(T :U \rightarrow \mathcal {P}([N])\) with \(T(i) \subseteq S_i, \forall i \in U\) such that \(\mathbb {I}(y; \mathsf {img}(T), \cup _{i \in U} S_i) = 1\): It is uniquely determined by the value of \(y\) on \(\cup _{i \in U} S_i\); all other summands will vanish. Therefore, we have
\begin{align} \Vert \widetilde{F^{\prime \prime }} - F^{\prime \prime } \Vert _\infty \le \sum _{U} |a_U| \cdot \epsilon _{in} \le w_{out}\epsilon _{in}. \end{align}
(27)
Hence, if we have
\begin{equation*} \epsilon \ge w_{out}\epsilon _{in} + \epsilon _{out}, \end{equation*}
then \(\widetilde{F^{\prime \prime }}\) \(\epsilon\)-approximates \(F^{\prime }\) since \(\Vert \widetilde{F^{\prime \prime }} - F^{\prime } \Vert _\infty \le \Vert \widetilde{F^{\prime \prime }} - F^{\prime \prime } \Vert _\infty + \Vert F^{\prime \prime } - F^{\prime } \Vert _\infty \le w_{out}\epsilon _{in} + \epsilon _{out} \le \epsilon\), where the first step uses the triangle inequality, the second step uses Equations (25) and (27), and the third step uses the assumption on \(\epsilon\) above. What remains is to bound the degree and weight of \(\widetilde{F^{\prime \prime }}\).
Denote the degree of \(\widetilde{F^{\prime \prime }}\) as \(d\), and the weight of \(\widetilde{F^{\prime \prime }}\) as \(w\). Note that by Equation (22) and Claim 2.1 we have
\begin{align} \deg \left(\mathbb {I}(\cdot ; \mathsf {img}(T), \cup _{i \in U} S_i)\right) &= \vert \cup _{i \in U} S_i\vert = |U|B \le d_{out}B, \end{align}
(28)
\begin{align} \vert \!\vert \!\vert \mathbb {I}(\cdot ; \mathsf {img}(T), \cup _{i \in U} S_i) \vert \!\vert \!\vert &\le 1. \end{align}
(29)
For each \(U\), there are at most \(2^{|\cup _{i \in U} S_i|} \le 2^{d_{out}B}\) \(T\)’s, since \(T\) satisfies that \(T(i) \subseteq S_i, \forall i \in U\). Therefore, we have
\begin{align*} d &\le d_{out}B + d_{in}, \\ \log w &\le \log w_{out} + d_{out}B + \log w_{in}, \end{align*}
where the first inequality comes from Equations (26) and (28), and the second inequality follows from Equations (26) and (29), Claim 2.1, and the observation above.□
Now we prove Theorem 1.6 by induction using the decomposition and composition lemmas. For simplicity, we first show the proof for \(t = 2\) and then generalize it for larger \(t\).
Proof of Theorem 1.6
By induction on \(t\): \(t = 1\) is Corollary 1.5 for \(\mathsf {AND}\). Now assume the theorem holds for \((t-1)\)-\(\mathsf {CNF}\), and we want to prove it for \(t\)-\(\mathsf {CNF}\).
We first demonstrate the proof for \(t = 2\). By Lemma 6.1, it suffices to prove the theorem for \(F^{\prime }:\lbrace -1,1\rbrace ^n \times \lbrace -1,1\rbrace ^{2n} \rightarrow \lbrace 0,1\rbrace\) with \(F^{\prime }(x,y) = \bigwedge _{i=1}^{2n} y_i \vee f_i(x)\), where all the \(f_i\)’s are 1-\(\mathsf {CNF}\)s, which are just \(\mathsf {AND}\)s. Hence, all the functions \(\bigwedge _{j \in S} f_j(x)\) are just \(\mathsf {AND}\)s with \(n\) input variables. We are going to use Corollary 1.5 to get outer and inner approximations with carefully chosen \(\epsilon _{in}\), \(d_{in}\), \(w_{in}\), \(\epsilon _{out}\), \(d_{out}\), and \(w_{out}\), then use Lemma 6.2 to get the desired bounds on \(\epsilon\), \(d\), and \(w\).
By assumption \(k\) satisfies \(n^\frac{2}{3}\left(\log (1/\epsilon)\right)^\frac{1}{3} \le k \le n\). Let \(N = 2n\). For convenience we will ignore the difference between \(N\) and \(n\), and use them interchangeably, as they are the same up to a multiplicative factor of 2. Set
\begin{align} B &= \frac{N}{k}, \end{align}
(30)
\begin{align} k_{in} &= k, \end{align}
(31)
\begin{align} k_{out} &= \sqrt {k\log (1/\epsilon)}, \end{align}
(32)
\begin{align} \epsilon _{in} &= \frac{\epsilon }{2 w_{out}}, \end{align}
(33)
\begin{align} \epsilon _{out} &= \frac{\epsilon }{2}, \end{align}
(34)
where \(k_{in}\) and \(k_{out}\) are numbers to be used later as the \(k\)’s for the inner approximations and the outer approximation, respectively.
For the outer approximation, we have
\begin{align*} \sqrt {\frac{N}{B} \log \frac{1}{\epsilon _{out}}} = \sqrt {k\log (1/\epsilon)} = k_{out} \le k = \frac{N}{B}, \end{align*}
where the first equality follows from Equations (34) and (30) (ignoring the constant factor), and the inequality \(k_{out} \le k\) comes from \(k \ge N^{2/3}\left(\log (1/\epsilon)\right)^{1/3}\) and the fact that \(\log (1/\epsilon)\le N\) (otherwise we can only take \(k = N\) and we will get all the bounds trivially). This means we can apply Corollary 1.5 over \(\lbrace 0,1\rbrace\) with \(\epsilon = \epsilon _{out}\), \(k = k_{out}\), and \(n = \frac{N}{B}\) to get the outer approximating polynomial \(p\) with the following parameters:
\begin{align} d_{out} &= O\left(k_{out}\right) = O\left(\sqrt {k\log (1/\epsilon)}\right), \end{align}
(35)
\begin{align} \log w_{out} &= O\left(\frac{N}{Bk_{out}}\log \frac{1}{\epsilon _{out}}\right) = O\Bigg (\sqrt {k\log (1/\epsilon)}\Bigg), \end{align}
(36)
using Equations (34), (30), and (32).
For the inner approximations, we have \(k_{in} = k \le N\), and we also have
\begin{align} \sqrt {N\log \frac{1}{\epsilon _{in}}} = O\left(\sqrt {N \log \frac{w_{out}}{\epsilon }}\right) = O\left(\sqrt {N\sqrt {k\log (1/\epsilon)}}\right) \le O(k) = O(k_{in}), \end{align}
(37)
where the first step uses Equation (33), the second step uses Equation (36), and the third step uses \(k \ge N^{2/3}\left(\log (1/\epsilon)\right)^{1/3}\). Therefore, we can invoke Corollary 1.5 over \(\lbrace -1,1\rbrace\) with \(\epsilon = \epsilon _{in}\), \(k = k_{in}\), and \(n = N\) to get the inner approximations with the following parameters:
\begin{align} d_{in} &= O(k_{in}) = O(k), \end{align}
(38)
\begin{align} \log w_{in} &= O\left(\frac{N}{k_{in}}\log \frac{1}{\epsilon _{in}}\right) = O\left(\frac{N}{k} \log \frac{w_{out}}{\epsilon } \right) = O\left(\frac{N}{\sqrt {k}} \sqrt {\log (1/\epsilon)} \right), \end{align}
(39)
using Equations (33), (31), and (36).
Finally, we apply Lemma 6.2 to get an approximation for \(F^{\prime }\). Obviously Lemma 6.2 (19) is satisfied by Equations (33) and (34). Combining Equations (30), (35), and (38) and Lemma 6.2 (20) we get
\begin{align*} d = O\left(\frac{N}{k} \sqrt {k\log (1/\epsilon)} + k\right) =O\left(\frac{N}{\sqrt {k}} \sqrt {\log (1/\epsilon)} + k \right) = O(k), \end{align*}
where the last step follows from \(k \ge N^{2/3}\left(\log (1/\epsilon)\right)^{1/3}\). Combining Equation (30), (35), (36), and (39) and Lemma 6.2 (21) we get
\begin{align*} \log w &= O\left(\sqrt {k\log (1/\epsilon)} + \frac{N}{k} \sqrt {k\log (1/\epsilon)} + \frac{N}{\sqrt {k}} \sqrt {\log (1/\epsilon)} \right) = O\left(\frac{N}{\sqrt {k}} \sqrt {\log (1/\epsilon)} \right), \end{align*}
since \(k \le N\) implies \(\frac{N}{\sqrt {k}} \sqrt {\log (1/\epsilon)} \ge \sqrt {k\log (1/\epsilon)}\). This finishes our proof for \(t=2\).
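For concreteness (this worked instance is ours, not from the paper), at the smallest admissible value \(k = n^{2/3}\left(\log (1/\epsilon)\right)^{1/3}\) the degree and log-weight bounds coincide:
\begin{align*} \log w = O\left(\frac{N}{\sqrt {k}}\sqrt {\log (1/\epsilon)}\right) = O\left(\frac{n}{n^{1/3}\left(\log (1/\epsilon)\right)^{1/6}}\sqrt {\log (1/\epsilon)}\right) = O\left(n^{2/3}\left(\log (1/\epsilon)\right)^{1/3}\right) = O(k), \end{align*}
so at this endpoint both the degree and the logarithm of the weight are \(O(k)\); as \(k\) grows, the weight bound improves while the degree bound degrades.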
For \(t \ge 2\), similarly as before we first invoke Lemma 6.1, getting \(F^{\prime }(x,y) = \bigwedge _{i=1}^{2n} y_i \vee f_i(x)\), where all the \(f_i\)’s are \((t-1)\)-\(\mathsf {CNF}\)s. Now the inner functions \(\bigwedge _{j \in S} f_j(x)\) become \((t-1)\)-\(\mathsf {CNF}\)s. We are going to use Corollary 1.5 to get an outer approximation with carefully chosen \(\epsilon _{out}\), \(d_{out}\), and \(w_{out}\), and use inductive hypothesis to get inner approximations with carefully chosen \(\epsilon _{in}\), \(d_{in}\), and \(w_{in}\), then use Lemma 6.2 to get the desired bounds on \(\epsilon\), \(d\), and \(w\).
By assumption \(k\) satisfies \(n^\frac{t}{t+1}\left(\log (1/\epsilon)\right)^\frac{1}{t+1} \le k \le n\). We still set \(\epsilon _{in}\), \(\epsilon _{out}\) to the same values as in Equations (33) and (34), respectively. Set
\begin{align*} B &= \frac{N}{k^{2/t}}, \\ k_{in} &= k, \\ k_{out} &= k^\frac{1}{t} \left(\log (1/\epsilon)\right)^\frac{1}{t}. \end{align*}
For the outer approximation, similarly as before we have \(\sqrt {\frac{N}{B} \log \frac{1}{\epsilon _{out}}} = k_{out} \le \frac{N}{B}\), so we can apply Corollary 1.5 to get
\begin{align*} d_{out} &= O(k_{out}) = O\left(k^\frac{1}{t}\left(\log (1/\epsilon)\right)^\frac{1}{t}\right), \\ \log w_{out} &= O\left(\frac{N}{Bk_{out}}\log \frac{1}{\epsilon _{out}}\right) = O\left(k^\frac{1}{t} \left(\log (1/\epsilon)\right)^\frac{t-1}{t} \right). \end{align*}
For the inner approximation, similarly as before we have \(N^{\frac{t-1}{t}}(\log \frac{1}{\epsilon _{in}})^{\frac{1}{t}} \le k_{in} \le N\), so we can use the induction hypothesis for \((t-1)\)-\(\mathsf {CNF}\) to get
\begin{align*} d_{in} &\le c_{t-1} \cdot k_{in} = c_{t-1}\cdot k, \\ \log w_{in} &\le c_{t-1}\cdot \frac{N}{k_{in}^{1/(t-1)}}\left(\log \frac{1}{\epsilon _{in}}\right)^\frac{1}{t-1} \le c^{\prime }_{t-1}\cdot \frac{N}{k^{1/t}} \left(\log (1/\epsilon)\right)^\frac{1}{t}, \end{align*}
where \(c^{\prime }_{t-1}\) is some constant depending on \(c_{t-1}\).
Now we invoke Lemma 6.2 to finish our proof. The same bound for \(\epsilon\) still works. Combining all these bounds, for some constant \(c_t\) (depending only on \(t\)) we get
\begin{align*} d &= O\left(\frac{N}{k^{1/t}} \left(\log (1/\epsilon)\right)^\frac{1}{t}\right) + c_{t-1}\cdot k \le c_t\cdot k,\\ \log w &= O\left(k^\frac{1}{t} \left(\log (1/\epsilon)\right)^\frac{t-1}{t}\right) + c_{t-1}\cdot \frac{N}{k^{1/t}} \left(\log (1/\epsilon)\right)^\frac{1}{t} \le c_t\cdot \frac{N}{k^{1/t}} \left(\log (1/\epsilon)\right)^\frac{1}{t}. \end{align*}
□
Proof of Corollary 1.10
Use Theorem 1.6 and Theorem 2.4.□

7 Proof of Theorem 1.13

We are going to prove that for any \(k \le n\), any two distributions \(P\) and \(Q\) on \(\lbrace -1,1\rbrace ^n\) are \(w\)-close to some \(k\)-wise indistinguishable distributions \(P^{\prime }\) and \(Q^{\prime }\), where
\begin{equation*} w = \sqrt {\sum _{|S| \le k}\left(\mathbb {E}[\chi _S(P)] - \mathbb {E}[\chi _S(Q)]\right)^2}, \end{equation*}
and \(\chi _S(x) = \prod _{i \in S} x_i\) for \(x \in \lbrace -1,1\rbrace ^n\) is the characteristic function of set \(S\). For any function \(f :\lbrace -1,1\rbrace ^n \rightarrow \mathbb {R}\) and any set \(S \subseteq [n]\), we use \(\widehat{f}(S)= \mathbb {E}_x [f(x) \chi _S(x)]\) to denote the Fourier coefficient of \(f\) on \(S\).
Proof of Theorem 1.13
Given distributions \(P\) and \(Q\), we view \(\frac{P - Q}{2}\) as a polynomial on \(\lbrace -1,1\rbrace ^n\) and decompose it into two parts \(L + H\), where \(L\) consists of the monomials with degree at most \(k\), and \(H\) consists of those with degree larger than \(k\).
For any polynomial \(H^{\prime }:\lbrace -1,1\rbrace ^n \rightarrow \mathbb {R}\) that only contains monomials of degree \(\gt k\), let \(w = \sum _x |L(x) + H^{\prime }(x)|\) and \(\phi = \frac{L + H^{\prime }}{w}\). First, we show that we can always find distributions \(P^{\prime \prime }\) and \(Q^{\prime \prime }\) such that
\begin{align} \frac{P^{\prime \prime } - Q^{\prime \prime }}{2} = -\phi . \end{align}
(40)
We can achieve this by defining \(P^{\prime \prime }(x) = -\min \lbrace 0, 2\phi (x)\rbrace\) and \(Q^{\prime \prime }(x) = \max \lbrace 0, 2\phi (x)\rbrace\) for all \(x \in \lbrace -1,1\rbrace ^n\). \(P^{\prime \prime }\) is supported on \(\lbrace x: \phi (x) \lt 0\rbrace\) and \(Q^{\prime \prime }\) is supported on \(\lbrace x: \phi (x) \gt 0\rbrace\) so they have disjoint supports. Therefore, it is easy to see that Equation (40) is satisfied, and we are going to show that such \(P^{\prime \prime }\) and \(Q^{\prime \prime }\) are distributions. Obviously, we have \(P^{\prime \prime }(x), Q^{\prime \prime }(x) \ge 0\) for all \(x\). Besides, we get
\begin{equation*} \sum _x \phi (x) = 2^n \mathbb {E}_x \phi (x) = 2^n \widehat{\phi }(\varnothing) = \frac{2^n}{w} \widehat{L}(\varnothing) = \frac{2^{n-1}}{w} \mathbb {E}_x[P(x) - Q(x)] = \frac{1}{2w}\left(\sum _x P(x) - \sum _x Q(x)\right) = 0, \end{equation*}
and thus \(\sum _{x: \phi (x) \lt 0} |\phi (x)| = \sum _{x: \phi (x) \gt 0} |\phi (x)| = \frac{1}{2} \sum _x |\phi (x)| = \frac{1}{2}\). Therefore, we have \(\sum _x P^{\prime \prime }(x) = 2\sum _{x: \phi (x) \lt 0} |\phi (x)| = 1\), and similarly for \(Q^{\prime \prime }\), so they are distributions.
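The construction of \(P^{\prime \prime }\) and \(Q^{\prime \prime }\) is easy to check numerically; the following Python sketch (ours, not from the paper) builds a random \(\phi\) with \(\sum _x \phi (x) = 0\) and \(\sum _x |\phi (x)| = 1\) and verifies Equation (40), disjointness of supports, and that both are distributions.

```python
# Sketch (ours) of the construction in the proof of Theorem 1.13: from any phi
# with sum 0 and L1-mass 1, the pair P''(x) = -min(0, 2 phi(x)) and
# Q''(x) = max(0, 2 phi(x)) gives two distributions with disjoint supports
# satisfying (P'' - Q'')/2 = -phi.
import random

random.seed(2)
n = 3
xs = list(range(2 ** n))
phi = {x: random.uniform(-1, 1) for x in xs}
shift = sum(phi.values()) / len(xs)
phi = {x: v - shift for x, v in phi.items()}         # enforce sum_x phi(x) = 0
mass = sum(abs(v) for v in phi.values())
phi = {x: v / mass for x, v in phi.items()}          # enforce sum_x |phi(x)| = 1

P2 = {x: -min(0.0, 2 * phi[x]) for x in xs}
Q2 = {x: max(0.0, 2 * phi[x]) for x in xs}

assert abs(sum(P2.values()) - 1) < 1e-9              # P'' is a distribution
assert abs(sum(Q2.values()) - 1) < 1e-9              # Q'' is a distribution
assert all(P2[x] == 0 or Q2[x] == 0 for x in xs)     # disjoint supports
assert all(abs((P2[x] - Q2[x]) / 2 + phi[x]) < 1e-9 for x in xs)  # Eq. (40)
print("P'' and Q'' verified")
```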
Now we can set distributions \(P^{\prime } = \frac{P + wP^{\prime \prime }}{1 + w}\) and \(Q^{\prime } = \frac{Q + wQ^{\prime \prime }}{1 + w}\). Then the statistical distance \(\Delta (P, P^{\prime }) \le \frac{w}{1+w}\le w\) and similarly for \(Q\) and \(Q^{\prime }\). We also have
\begin{equation*} \frac{P^{\prime } - Q^{\prime }}{2} = \frac{H - H^{\prime }}{1 + w}. \end{equation*}
Since both \(H\) and \(H^{\prime }\) only contain monomials of degree \(\gt k\), for all \(|S| \le k\) we have
\begin{align*} \mathbb {E}[\chi _S(P^{\prime })] - \mathbb {E}[\chi _S(Q^{\prime })] &= \sum _x (P^{\prime }(x) - Q^{\prime }(x)) \chi _S(x)\\ &= 2^{n+1} \mathbb {E}_x \left[\frac{P^{\prime }(x) - Q^{\prime }(x)}{2}\chi _S(x)\right]\\ &= 2^{n+1} \widehat{\frac{P^{\prime }-Q^{\prime }}{2}}(S)\\ &= 0. \end{align*}
Therefore \(P^{\prime }\) and \(Q^{\prime }\) are \(k\)-wise indistinguishable.
In summary, \(P\) and \(Q\) are \(w\)-close to some \(k\)-wise indistinguishable distributions for \(w = \sum _x |L(x) + H^{\prime }(x)|\), where \(H^{\prime }\) only contains monomials of degree \(\gt k\). Therefore, our goal is to minimize \(w\) under the constraint on \(H^{\prime }\). By LP duality, this equals the maximum of \(\sum _x L(x)p(x)\) under the constraints that \(|p(x)| \le 1\) for all \(x\) and that \(p\) is a polynomial of degree at most \(k\). Using the Cauchy–Schwarz inequality and Parseval’s theorem (cf. Reference [26, Section 1.4]), we have
\begin{align} \sum _x L(x)p(x) \le 2^n \sqrt {\mathbb {E}_x L^2(x)} \sqrt {\mathbb {E}_x p^2(x)} \le 2^n \sqrt { \sum _{|S| \le k} \widehat{L}^2(S)}. \end{align}
(41)
For \(|S| \le k\), we have
\begin{align*} \widehat{L}(S) &= \mathbb {E}_x \left[\frac{P(x) - Q(x)}{2}\chi _S(x)\right] \\ &= \frac{1}{2^{n+1}}\left(\sum _x P(x)\chi _S(x) - \sum _x Q(x)\chi _S(x) \right)\\ &= \frac{1}{2^{n+1}}\left(\mathbb {E}[\chi _S(P)] - \mathbb {E}[\chi _S(Q)]\right), \end{align*}
and hence Equation (41) can be further bounded by \(\frac{1}{2}\sqrt {\sum _{|S| \le k} \left(\mathbb {E}[\chi _S(P)] - \mathbb {E}[\chi _S(Q)]\right)^2 }\). Therefore, for \(P\) and \(Q\) being \((k, \delta)\)-indistinguishable we can bound \(w\) by \(\sqrt {\sum _{i \le k}\binom{n}{i}}\cdot \frac{\delta }{2} = O(n^{k/2}\delta)\).□
Remark 7.1.
It is natural to ask if we can adapt the above proof to shave off the extra \(e^k\) factor in Reference [27] for the \(k\)-wise independence vs. almost \(k\)-wise independence problem. We note that it does not work, for the following reason. We replace both \(Q\) and \(Q^{\prime \prime }\) in the above proof by the constant function \(2^{-n}\), the probability density function of the uniform distribution. Now we need extra constraints to ensure that \(P^{\prime \prime }(x) \ge 0\) for all \(x\), and the dual LP becomes \(\max _{p, p^{\prime }} \sum _x L(x)(p(x) + p^{\prime }(x))\) under the constraints that \(p+p^{\prime }\) has degree at most \(k\), \(p^{\prime }(x) \ge 0\) for all \(x\), and \(\Vert p \Vert _\infty + \mathbb {E}_x p^{\prime }(x) \le 1\). Similarly to Reference [27], we can use hypercontractivity to show that
\begin{align} \sqrt {\mathbb {E}_x (p(x) + p^{\prime }(x))^2 } \le e^k, \end{align}
(42)
thus getting the bound \(O(e^k n^{k/2} \delta)\) using Cauchy–Schwarz, recovering Reference [27, Theorem 1.1]. One would hope to improve the bound in Equation (42) to \(O(1)\) as in the above proof, shaving off the extra \(e^k\) factor in the final bound. However, Equation (42) is tight up to a constant in the exponent, as shown by the following counterexample. Let \(p\) be 0, and let \(p^{\prime }\) be \(2^k\) on an arbitrary \((n-k)\)-dimensional subcube of \(\lbrace -1,1\rbrace ^n\) (e.g., \(\lbrace x :x_i = -1 , \forall i \in [k]\rbrace\)) and 0 otherwise. Obviously the degree of \(p^{\prime }\) is \(k\), and \(\mathbb {E}_x p^{\prime }(x) = \frac{2^{n-k}}{2^n}2^k = 1\), so all the constraints are satisfied. Now we have \(\sqrt {\mathbb {E}_x (p(x) + p^{\prime }(x))^2 } = 2^{k/2}\).
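The counterexample is easy to verify numerically; the following Python snippet (our sketch) checks that \(p^{\prime }(x) = \prod _{i \le k}(1 - x_i)\) has mean 1 and second moment \(2^k\) on \(\lbrace -1,1\rbrace ^n\).

```python
# Numeric check (our sketch) of the counterexample in Remark 7.1: with p = 0
# and p'(x) = prod_{i<=k} (1 - x_i), which is 2^k on the subcube
# x_1 = ... = x_k = -1 and 0 elsewhere (degree k), we get E[p'] = 1 but
# sqrt(E[(p + p')^2]) = 2^{k/2}.
from itertools import product
from math import prod, sqrt

n, k = 6, 3
pts = list(product([-1, 1], repeat=n))
p_prime = {x: prod(1 - x[i] for i in range(k)) for x in pts}

mean = sum(p_prime.values()) / len(pts)
rms = sqrt(sum(v * v for v in p_prime.values()) / len(pts))
assert abs(mean - 1) < 1e-9
assert abs(rms - 2 ** (k / 2)) < 1e-9
print(f"E[p'] = {mean:.3f}, sqrt(E[p'^2]) = {rms:.3f} = 2^(k/2)")
```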

8 Proof of Claim 1.14

First, we have the following lemma for compositions of indistinguishable distributions. Essentially, it is equivalent to the dual block composition method in Reference [14] for proving \(\mathsf {AND}\circ \mathsf {OR}\) approximate degree lower bounds, but we find it more intuitive and easier to prove.
Lemma 8.1 ([35, Lectures 6–7]).
Suppose that distributions \(A^0, A^1\) over \(\lbrace 0,1\rbrace ^{n_A}\) are \(k_A\)-wise indistinguishable; and distributions \(B^0, B^1\) over \(\lbrace 0,1\rbrace ^{n_B}\) are \(k_B\)-wise indistinguishable. For \(b \in \lbrace 0,1\rbrace\), define \(C^b\) over \(\lbrace 0,1\rbrace ^{n_A \cdot n_B}\) by first drawing a sample \(x \in \lbrace 0,1\rbrace ^{n_A}\) from \(A^b\) and then replacing each bit \(x_i\) by a sample of \(B^{x_i}\) independently. Then \(C^0\) and \(C^1\) are \((k_A \cdot k_B)\)-wise indistinguishable.
Proof.
Consider any set \(S \subseteq \lbrace 1,\dots , n_A\cdot n_B \rbrace\) of \(k_A \cdot k_B\) bit positions. We will show that they have the same distribution in \(C^0\) and \(C^1\).
View the \(n_A \cdot n_B\) bits as \(n_A\) blocks of \(n_B\) bits each. Call a block \(K\) of \(n_B\) bits heavy if \(|S\cap K| \gt k_B\); call the other blocks light. There are at most \(k_A\) heavy blocks by assumption, so the distribution of the (entire) heavy blocks is the same in \(C^0\) and \(C^1\) by \(k_A\)-wise indistinguishability of \(A^0\) and \(A^1\). Furthermore, conditioned on any outcome for the \(A^b\) samples in \(C^b\), the bit positions in the light blocks have the same distribution in both \(C^0\) and \(C^1\) by \(k_B\)-wise indistinguishability of \(B^0\) and \(B^1\) and independence between blocks.
Therefore, \(C^0\) and \(C^1\) are \(k_A \cdot k_B\)-wise indistinguishable.□
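The following Python sketch (ours, not from the paper) checks Lemma 8.1 exactly on a toy instance: taking both \(A^0, A^1\) and \(B^0, B^1\) to be the uniform distributions over even- and odd-parity strings in \(\lbrace 0,1\rbrace ^3\) (which are 2-wise indistinguishable), it enumerates the composed distributions \(C^0, C^1\) over 9 bits and confirms that all 4-bit projections agree.

```python
# Exact-enumeration check (our sketch) of Lemma 8.1 with k_A = k_B = 2.
from itertools import product, combinations
from collections import Counter

m = 3
even = [x for x in product((0, 1), repeat=m) if sum(x) % 2 == 0]
odd  = [x for x in product((0, 1), repeat=m) if sum(x) % 2 == 1]
A = (even, odd)            # A^0, A^1: 2-wise indistinguishable
B = (even, odd)            # reuse the same pair as B^0, B^1

def composed(b):
    # Exact distribution of C^b: draw x from A^b, replace bit x_i by a B^{x_i} sample.
    dist = Counter()
    for x in A[b]:
        for blocks in product(*(B[bit] for bit in x)):
            dist[sum(blocks, ())] += 1.0 / (len(A[b]) * len(B[0]) ** m)
    return dist

def project(dist, S):
    out = Counter()
    for s, p in dist.items():
        out[tuple(s[i] for i in S)] += p
    return out

C0, C1 = composed(0), composed(1)
for S in combinations(range(m * m), 2 * 2):          # k_A * k_B = 4 positions
    p0, p1 = project(C0, S), project(C1, S)
    assert all(abs(p0[t] - p1[t]) < 1e-9 for t in set(p0) | set(p1))
print("every 4-bit projection of C^0 and C^1 agrees")
```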
Also observe that when showing that a function distinguishes indistinguishable distributions, it suffices to consider distributions with disjoint supports.
Claim 8.2 ([35, Lectures 8–9]).
For any function \(f\), and for any \(k\)-wise indistinguishable distributions \(A^0\) and \(A^1\), if \(f\) can distinguish them with probability \(\epsilon\), then there are distributions \(B^0\) and \(B^1\) with the same property and disjoint supports. (By disjoint support we mean for any \(x\) either \(\Pr [B^0 = x] = 0\) or \(\Pr [B^1 = x] = 0\).)
Proof.
Let distribution \(C\) be the “common part” of \(A^0\) and \(A^1\). That is, we define \(C\) by \(\Pr [C = x] := \min \lbrace \Pr [A^0 = x], \Pr [A^1 = x]\rbrace\), normalized by a constant so that \(C\) is a distribution. We can write \(A^0\) and \(A^1\) as
\begin{align*} A^0 &= pC + (1-p) B^0 \,,\\ A^1 &= pC + (1-p) B^1 \,, \end{align*}
where \(p \in (0,1)\) and \(B^0\) and \(B^1\) are two distributions. Clearly, \(B^0\) and \(B^1\) have disjoint supports.
Then, we have
\begin{align*} \mathbb {E}[f(A^0)] - \mathbb {E}[f(A^1)] =&~p \mathbb {E}[f(C)] + (1-p) \mathbb {E}[f(B^0)] \\ &- p \mathbb {E}[f(C)] - (1-p) \mathbb {E}[f(B^1)] \\ =&~(1-p) \big (\mathbb {E}[f(B^0)] - \mathbb {E}[f(B^1)] \big) \\ \le &~\mathbb {E}[f(B^0)] - \mathbb {E}[f(B^1)] \,. \end{align*}
Similarly, for all \(S \ne \varnothing\) such that \(|S| \le k\), we have \(\mathbb {E}[\chi _S(A^0)] - \mathbb {E}[\chi _S(A^1)] = (1-p) \big (\mathbb {E}[\chi _S(B^0)] - \mathbb {E}[\chi _S(B^1)] \big)\), so \(\mathbb {E}[\chi _S(B^0)] - \mathbb {E}[\chi _S(B^1)] = 0\).
Therefore, if \(f\) can distinguish \(A^0\) and \(A^1\) with probability \(\epsilon\), then it can also distinguish \(B^0\) and \(B^1\) with at least that probability. Besides, \(B^0\) and \(B^1\) are \(k\)-wise indistinguishable.□
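Here is a small Python sketch (ours, not from the paper) of the decomposition in Claim 8.2: it splits off the common part \(C\), checks that \(B^0\) and \(B^1\) have disjoint supports, and confirms that the distinguishing advantage of an arbitrary test scales by exactly \(1-p\).

```python
# Sketch (ours) of the common-part decomposition in Claim 8.2.
import random

random.seed(3)
xs = list(range(8))
def rand_dist():
    w = [random.random() for _ in xs]
    t = sum(w)
    return {x: v / t for x, v in zip(xs, w)}

A0, A1 = rand_dist(), rand_dist()
p = sum(min(A0[x], A1[x]) for x in xs)               # mass of the common part
B0 = {x: (A0[x] - min(A0[x], A1[x])) / (1 - p) for x in xs}
B1 = {x: (A1[x] - min(A0[x], A1[x])) / (1 - p) for x in xs}

assert all(B0[x] * B1[x] < 1e-12 for x in xs)        # disjoint supports
f = {x: random.randint(0, 1) for x in xs}            # an arbitrary test
adv_A = sum(f[x] * (A0[x] - A1[x]) for x in xs)
adv_B = sum(f[x] * (B0[x] - B1[x]) for x in xs)
assert abs(adv_A - (1 - p) * adv_B) < 1e-9           # advantage scales by 1 - p
print(f"p = {p:.3f}, adv_A = {adv_A:.4f}, (1-p)*adv_B = {(1 - p) * adv_B:.4f}")
```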
Now we can prove approximate degree lower bounds using indistinguishability.
Proof of Claim 1.14
(1)
We know that \(\widetilde{\deg _{1/3}}(\mathsf {AND}_m) = \Omega (\sqrt {m})\) and \(\widetilde{\deg _{1/3}}(\mathsf {OR}_n) = \Omega (\sqrt {n})\) [25]. By standard error reduction techniques (cf. Reference [12]), \(\widetilde{\deg _\epsilon }(f) = \Theta (\widetilde{\deg _{1/3}}(f))\) for all constant \(\epsilon \in (0, \frac{1}{2})\). By Theorem 5.2 we get \(\Omega (\sqrt {m})\)-wise indistinguishable distributions \(A^0, A^1\) s.t.
\begin{equation*} \Pr [\mathsf {AND}_m(A^1) = 1] \ge \Pr [\mathsf {AND}_m(A^0) = 1] + 0.99, \end{equation*}
and similarly we have \(B^0, B^1\) for \(\mathsf {OR}_n\). By Claim 8.2, \(A^0, A^1\) have disjoint supports, and same for \(B^0, B^1\). Compose them by Lemma 8.1 to get \(\Omega (\sqrt {mn})\)-wise indistinguishable distributions \(C^0,C^1\). It remains to show that \(\mathsf {AND}_m\circ \mathsf {OR}_n\) can distinguish them:
\(C^0\): First, \(A^0\) is sampled. Since \(x = 1^m\) is the unique input with \(\mathsf {AND}_m(x)= 1\), we have \(\Pr [A^1 = 1^m] \gt 0\), and thus by disjointness of supports \(\Pr [A^0 = 1^m] = 0\). Therefore, we get a string with at least one “0.” But then this “0” is replaced with a sample from \(B^0\). We have \(\Pr [B^0 = 0^n] \ge 0.99\), and when this happens, \(\mathsf {AND}_m\circ \mathsf {OR}_n\) returns 0.
\(C^1\): First, \(A^1\) is sampled, and we know that \(A^1 = 1^m\) with probability at least 0.99. Each bit “1” is replaced by a sample from \(B^1\), and we know that \(\Pr [B^1 = 0^n] = 0\) by disjointness of support since \(\Pr [B^0 = 0^n] \gt 0\), and thus in this case \(\mathsf {AND}_m\circ \mathsf {OR}_n\) will return 1.
Therefore, \(\mathsf {AND}_m\circ \mathsf {OR}_n\) is not 0.98-fooled by \(C^0, C^1\). By Theorem 5.2 and standard error reduction techniques, we have \(\widetilde{\deg _{1/3}}(\mathsf {AND}_m \circ \mathsf {OR}_n) = \Omega (\sqrt {mn})\).
(2)
We have \(\Omega (\sqrt {n})\)-wise indistinguishable distributions \(B^0, B^1\) s.t.
\begin{equation*} \Pr [\mathsf {OR}_n(B^1) = 1] \ge \Pr [\mathsf {OR}_n(B^0) = 1] + 0.99, \end{equation*}
and thus \(\Pr [\mathsf {OR}_n(B^0) = 1] \le 0.01\). We define \(C^b\) as \(m\) independent copies of \(B^b\) for \(b \in \lbrace 0,1\rbrace\). Obviously, \(C^0\), \(C^1\) are \(\Omega (\sqrt {n})\)-wise indistinguishable. For \(C^1\), every copy of \(B^1\) satisfies \(\Pr [B^1 = 0^n] = 0\) by disjointness of support, and thus \(\Pr [\mathsf {AND}_m \circ \mathsf {OR}_n(C^1) = 1] = 1\). For \(C^0\), we have \(\Pr [\mathsf {AND}_m \circ \mathsf {OR}_n(C^0) = 1] \le 0.01^m = 2^{-\Theta (m)}\). Therefore, \(\mathsf {AND}_m \circ \mathsf {OR}_n\) is not \((1-2^{-\Theta (m)})\)-fooled by \(C^0, C^1\), and thus by Theorem 5.2 we are done.
(3)
Define \(\mathsf {GapMAJ}^{\prime }_m\) as the partial function version of \(\mathsf {GapMAJ}_m\) with an extra requirement that it is undefined on inputs of Hamming weight in \((\frac{1}{3}m, \frac{2}{3}m)\). For a partial function \(g\) with domain \(D \subset \lbrace 0,1\rbrace ^m\), define the bounded approximate degree \(\widetilde{\mathrm{bdeg}_\epsilon }(g)\) as the minimum degree of polynomial \(p\) such that \(|p(x) - g(x) | \le \epsilon\) for \(x \in D\) and \(|p(x)| \le 1 + \epsilon\) for \(x \notin D\). It is easy to see that \(\widetilde{\deg _\epsilon }(\mathsf {GapMAJ}_m \circ f_n) \ge \widetilde{\mathrm{bdeg}_\epsilon }(\mathsf {GapMAJ}^{\prime }_m \circ f_n)\), so it remains to prove the lower bound for the latter.
Analogous to Theorem 5.2, it is necessary and sufficient to give two \(\Omega (\widetilde{\deg _{1/3}}(f_n))\)-wise indistinguishable distributions \(C^0\), \(C^1\) such that
\begin{align} \Pr \left[\mathsf {GapMAJ}^{\prime }_m \circ f_n(C^1) = 1 \wedge C^1 \in D\right] - \Pr \left[\mathsf {GapMAJ}^{\prime }_m \circ f_n(C^0) = 1 \wedge C^0 \in D\right] - \Pr [C^0 \notin D] - \Pr [C^1 \notin D] \ge 2\epsilon , \end{align}
(43)
where \(D\) is the domain of \(\mathsf {GapMAJ}^{\prime }_m \circ f_n\), i.e., the distinguishing advantage of \(\mathsf {GapMAJ}^{\prime }_m \circ f_n\) on \(D\) minus the probability mass of \(C^0\) and \(C^1\) outside of \(D\) must be at least \(2\epsilon\).
Let \(k = \widetilde{\deg _{1/3}}(f_n)\). Similarly as before, we can get \(\Omega (k)\)-wise indistinguishable distributions \(B^0, B^1\) s.t. \(\Pr [f_n(B^1) = 1] \ge \Pr [f_n(B^0) = 1] + 0.99\). Now for \(b \in \lbrace 0,1\rbrace\), we still define \(C^b\) as \(m\) independent copies of \(B^b\), thus \(C^0\), \(C^1\) are \(\Omega (k)\)-wise indistinguishable. We have \(\Pr [f_n(B^1) = 1] \ge 0.99\) and \(\Pr [f_n(B^0) = 1] \le 0.01\). Hence, in expectation more than a 0.99 fraction of the \(m\) independent copies of \(B^1\) will make \(f_n\) return 1. Therefore, by a Chernoff bound, on \(C^1\) the probability that \(\mathsf {GapMAJ}^{\prime }_m\) gets an input of Hamming weight less than \(\frac{2}{3}m\) is at most \(2^{-\Theta (m)}\). Similarly, on \(C^0\) the probability that \(\mathsf {GapMAJ}^{\prime }_m\) gets an input of Hamming weight larger than \(\frac{1}{3}m\) is at most \(2^{-\Theta (m)}\). Therefore, Inequality (43) holds for \(2\epsilon = 1 - 2^{-\Theta (m)}\), and we finish the proof.
(4)
Similarly as before, we get \(A^0, A^1\) for \(g_m\) and \(B^0, B^1\) for \(f_n\). Composing them by Lemma 8.1 gives \(\Omega \left(\widetilde{\deg _{1/3}}(g_m) \cdot \widetilde{\deg _\epsilon }(f_n)\right)\)-wise indistinguishable distributions \(C^0, C^1\). Note that now we have \(\Pr [f_n(B^1) = 1] \ge \Pr [f_n(B^0) = 1] + (1-\frac{2}{m^{\alpha }})\), and thus we have \(\Pr [f_n(B^b) \ne b] \le \frac{2}{m^{\alpha }}\) for both \(b \in \lbrace 0,1\rbrace\), i.e., \(B^b\) errs with probability at most \(\frac{2}{m^{\alpha }}\). Then by union bound, \(\Pr [g_m \circ f_n(C^1) = 1] \ge 1 - \frac{1}{3} - m \cdot \frac{2}{m^\alpha } = \frac{2}{3} - o(1)\), and similarly \(\Pr [g_m \circ f_n(C^0) = 0] = \frac{2}{3} - o(1)\); thus \(g_m \circ f_n\) is not \(\frac{1}{6}\)-fooled by \(C^0, C^1\), and we finish the proof similarly.
(5)
The \((m-1)\)-wise indistinguishable distributions \(A^0, A^1\) for \(\mathsf {XOR}_m\) can be explicitly obtained by defining \(A^0\) to be the uniform distribution over all strings of \(\lbrace 0,1\rbrace ^m\) with parity 0, and \(A^1\) for parity 1. Similarly as before, we have \(\Omega (\widetilde{\deg _\epsilon }(f_n))\)-wise indistinguishable distributions \(B^0\), \(B^1\) s.t. \(\mathbb {E}[f_n(B^1)] - \mathbb {E}[f_n(B^0)] \ge 2\epsilon\). Composing them by Lemma 8.1 gives \(\Omega (m \cdot \widetilde{\deg _\epsilon }(f_n))\)-wise indistinguishable distributions \(C^0, C^1\). Alternatively, we can define \(C^0_m, C^1_m\) inductively by the following:
\(C^0_1 = B^0, C^1_1 = B^1\);
for each \(k \gt 1\): for \(C^0_k\), first draw \(z \in \lbrace 0,1\rbrace\) uniformly at random, then output an independent sample of \(B^z\) followed by an independent sample of \(C^z_{k-1}\); for \(C^1_k\), first draw \(z \in \lbrace 0,1\rbrace\) uniformly at random, then output an independent sample of \(B^z\) followed by an independent sample of \(C^{1-z}_{k-1}\) (see the sampler sketch after this proof).
It is easy to see that \(C^0 = C^0_m\) and \(C^1 = C^1_m\). For simplicity, we convert the output basis from \(\lbrace 0,1\rbrace\) to \(\lbrace -1,1\rbrace\), so \(\mathsf {XOR}_m \circ f_n\) becomes products of \(f_n\)’s. Under this basis, we have \(\mathbb {E}[f_n(B^0)] - \mathbb {E}[f_n(B^1)] \ge 2\epsilon\). Then
\begin{align*} & \mathbb {E}[\mathsf {XOR}_m \circ f_n(C^0_m)] - \mathbb {E}[\mathsf {XOR}_m \circ f_n(C^1_m)] \\ =&~ \frac{1}{4} \sum _{z, z^{\prime } \in \lbrace 0,1\rbrace } \mathbb {E}[f_n(B^z)\cdot \mathsf {XOR}_{m-1} \circ f_n(C^z_{m-1})] - \mathbb {E}[f_n(B^{z^{\prime }})\cdot \mathsf {XOR}_{m-1} \circ f_n(C^{1-z^{\prime }}_{m-1}))] \\ =&~ \frac{1}{4} \sum _{z, z^{\prime } \in \lbrace 0,1\rbrace } \mathbb {E}[f_n(B^z)]\mathbb {E}[\mathsf {XOR}_{m-1} \circ f_n(C^z_{m-1})] - \mathbb {E}[f_n(B^{z^{\prime }})]\mathbb {E}[\mathsf {XOR}_{m-1} \circ f_n(C^{1-z^{\prime }}_{m-1}))] \\ =&~ \frac{1}{2} (\mathbb {E}[f_n(B^0)]-\mathbb {E}[f_n(B^1)])(\mathbb {E}[\mathsf {XOR}_{m-1} \circ f_n(C^0_{m-1})] - \mathbb {E}[\mathsf {XOR}_{m-1} \circ f_n(C^1_{m-1})]) \\ \ge &~ \frac{1}{2} \cdot 2\epsilon \cdot (\mathbb {E}[\mathsf {XOR}_{m-1} \circ f_n(C^0_{m-1})] - \mathbb {E}[\mathsf {XOR}_{m-1} \circ f_n(C^1_{m-1})]). \end{align*}
Therefore by induction we have \(\mathbb {E}[\mathsf {XOR}_m \circ f_n(C^0_m)] - \mathbb {E}[\mathsf {XOR}_m \circ f_n(C^1_m)] \ge 2 \epsilon ^m\), thus \(\widetilde{\deg _{\epsilon ^m}}(\mathsf {XOR}_m \circ f_n) = \Omega (m \cdot \widetilde{\deg _\epsilon }(f_n))\) by Theorem 5.2.
For \(\mathsf {AND}_m\), use the same \(A^0, A^1\) if \(m\) is odd and switch their roles if \(m\) is even. The remaining proof follows similarly, except that we keep the output basis as \(\lbrace 0,1\rbrace\).□
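To make the inductive definition of \(C^b_k\) concrete, here is a sampler sketch in Python (ours, not from the paper); the helper parity_sampler plays the role of the explicit \(B^z\) distributions for \(\mathsf {XOR}\), and the final check confirms that the parity of every \(C^b_m\) sample equals \(b\) on this toy instance.

```python
# Sampler sketch (ours) of the inductive definition of C^b_m for XOR_m o f_n.
import random

def sample_C(b, m, sample_B):
    # C^b_1 = B^b; for m > 1, draw a uniform z and output a B^z sample
    # followed by a C^{z XOR b}_{m-1} sample (z XOR b equals z when b = 0
    # and 1 - z when b = 1, matching the definition in the proof).
    if m == 1:
        return sample_B(b)
    z = random.randint(0, 1)
    return sample_B(z) + sample_C(z ^ b, m - 1, sample_B)

def parity_sampler(z, nbits=2):
    # B^z: uniform over nbits-bit strings of parity z (the explicit XOR pair).
    x = [random.randint(0, 1) for _ in range(nbits - 1)]
    return tuple(x) + ((sum(x) + z) % 2,)

random.seed(4)
for b in (0, 1):
    assert all(sum(sample_C(b, 4, parity_sampler)) % 2 == b for _ in range(500))
print("XOR of every C^b_m sample equals b on this toy instance")
```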

9 Discussion and Open Problems

An obvious open question is to prove approximate degree-weight tradeoffs for more functions. A central function in the area is Surjectivity [4]. For this, it would suffice to have a polynomial approximating \(\mathsf {OR}\) on the domain \(\lbrace -1,1\rbrace ^m_{\le n}\), in which the Hamming weight of the input is restricted to be at most \(n\). Reference [34] showed that the degree of such a polynomial depends on \(n\) instead of \(m\), as if we were working on \(\lbrace -1,1\rbrace ^n\). It is natural to ask if the same holds for weight. The answer is negative. To show this, note that the proof of Theorem 1.12 actually gives us \((k,\delta)\)-indistinguishable distributions with bounded Hamming weight that do not fool \(\mathsf {OR}\). In particular, for any \(\epsilon , n, m, k\) satisfying \(\frac{c^{\prime }}{16}\sqrt {m \log (1/\epsilon)} \le k \le \frac{c^{\prime }}{16}m\) and \(n = \frac{c^{\prime 2}}{16^2}\frac{m^2}{k^2}\log (1/\epsilon)\), we have distributions that are \((k, 2^{-\Omega \left(m\log (1/\epsilon)/k\right)})\)-indistinguishable on \(\lbrace -1,1\rbrace ^m_{\le n}\) but cannot \(\epsilon\)-fool \(\mathsf {OR}\). For \(m \gt n\), we have \(2^{-\Omega \left(m\log (1/\epsilon)/k\right)} \lt 2^{-\Omega \left(n\log (1/\epsilon)/k\right)}\) for fixed \(k\) and \(\epsilon\). This also means that we need other methods for Surjectivity.
Another open problem is to show tight degree-weight tradeoffs for \(\mathsf {OR}\) on \(\lbrace -1,1\rbrace ^m_{\le n}\). Chandrasekaran et al. [17, Corollary 5.2] proved that it requires weight roughly at least \((\frac{m}{k\sqrt {n}})^{\sqrt {n}}\) for constant \(\epsilon\), so when \(k\sqrt {n} \le (1-\Omega (1))m\) it requires weight \(2^{\Omega (\sqrt {n})}\).
Another open problem is to understand how the approximate weight of a symmetric function \(f\) changes when \(k = \Theta (n)\). References [1, 2] showed that when \(k=n\), for constant error the approximate weight is very close to \(2^{O(\tau ^{\prime }(f))}\), where \(\tau ^{\prime }(f)\) is the smallest number \(t^{\prime } \in [0, \frac{n}{2}]\) such that \(f\) or \(f \cdot \mathsf {PARITY}\) is constant on inputs of Hamming weight in \((t^{\prime }, n-t^{\prime })\). Our results show tight bounds of \(2^{O(\tau (f))}\) for \(k \le \Theta (n)\), but \(\tau (f)\) can be much larger than \(\tau ^{\prime }(f)\), as in the case of \(\mathsf {PARITY}\). What happens in between? Can we get a better upper or lower bound?
We also lack a matching “does not fool” result for \(t\)-\(\mathsf {CNF}\), as tight approximate degree and weight bounds are not known even for 2-\(\mathsf {CNF}\) (without a promise on the input). The open problem here is to prove lower bounds matching our results for \(t\)-\(\mathsf {CNF}\).

Acknowledgments

The authors thank Lijie Chen for showing the idea for proving Claim 3.3, Chin Ho Lee for pointing out Reference [27], and Justin Thaler for pointing out the line of work in References [15, 17, 29]. The authors also thank Mark Bun and Avishay Tal for useful discussions, and the anonymous reviewers for valuable feedback, especially on the proof of Theorem 1.13.

Footnotes

1
It is easy to see that changing the output basis between Boolean and Fourier does not change the degree and the weight (up to a constant factor). For convenience, we use \(\lbrace -1,1\rbrace\) as the output basis here.
2
Note that \(\mathsf {THR}_{n, r} \in \mathsf {SYM}_{n, r-1}\) for \(r \le \frac{n}{2} + 1\), equivalently \(\mathsf {THR}_{n, t+1} \in \mathsf {SYM}_{n,t}\) for \(t \le \frac{n}{2}\).
3
Indeed \((n,\delta)\)-indistinguishability for any \(\delta \gt \epsilon\).

References

[1]
Anil Ada, Omar Fawzi, and Hamed Hatami. 2012. Spectral norm of symmetric functions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Anupam Gupta, Klaus Jansen, José Rolim, and Rocco Servedio (Eds.). Springer, Berlin, 338–349.
[2]
Anil Ada, Omar Fawzi, and Raghav Kulkarni. 2017. On the spectral properties of symmetric functions. arXiv:1704.03176 [cs.CC]
[3]
Noga Alon, Oded Goldreich, and Yishay Mansour. 2003. Almost k-wise independence versus k-wise independence. Inform. Process. Lett. 88, 3 (2003), 107–110.
[4]
Paul Beame and Widad Machmouchi. 2012. The quantum query complexity of AC\(^0\). Quant. Inf. Comput. 12, 7-8 (Jul. 2012), 670–676.
[5]
Andrej Bogdanov. 2018. Approximate Degree of AND via Fourier Analysis. Electronic Colloquium on Computational Complexity, TR18-197.
[6]
Andrej Bogdanov, Yuval Ishai, Emanuele Viola, and Christopher Williamson. 2016. Bounded indistinguishability and the complexity of recovering secrets. In Proceedings of the International Cryptology Conference (CRYPTO’16). Springer-Verlag, Berlin, 593–618.
[7]
Andrej Bogdanov, Nikhil S. Mande, Justin Thaler, and Christopher Williamson. 2019. Approximate degree, secret sharing, and concentration phenomena. In Proceedings of the International Conference on Approximation Algorithms for Combinatorial Optimization Problems/International Conference on Randomization and Computation (APPROX/RANDOM’19), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 145, Dimitris Achlioptas and László A. Végh (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 71:1–71:21.
[8]
Andrej Bogdanov and Christopher Williamson. 2017. Approximate bounded indistinguishability. In Proceedings of the 44th International Colloquium on Automata, Languages, and Programming (ICALP’17), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 80, Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 53:1–53:11.
[9]
Adam Bouland, Lijie Chen, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan. 2017. On the power of statistical zero knowledge. In Proceedings of the IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS’17). IEEE Computer Society, Los Alamitos, CA, 708–719.
[10]
Harry Buhrman, Richard Cleve, Ronald de Wolf, and Christof Zalka. 1999. Bounds for small-error and zero-error quantum algorithms. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS’99). IEEE Computer Society, Los Alamitos, CA, 358–368.
[11]
Harry Buhrman and Ronald de Wolf. 2001. Communication complexity lower bounds by polynomials. In Proceedings of the 16th Annual Conference on Computational Complexity (CCC’01). IEEE Computer Society, Washington, DC, 120–130.
[12]
Harry Buhrman, Ilan Newman, Hein Röhrig, and Ronald de Wolf. 2007. Robust polynomials and quantum algorithms. Theory Comput. Syst. 40, 4 (2007), 379–395.
[13]
Mark Bun, Robin Kothari, and Justin Thaler. 2018. The polynomial method strikes back: Tight quantum query bounds via dual polynomials. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC’18). ACM, New York, NY, 297–310.
[14]
Mark Bun and Justin Thaler. 2013. Dual lower bounds for approximate degree and Markov-Bernstein inequalities. In Proceedings of the 40th International Conference on Automata, Languages, and Programming, Volume Part I. Springer-Verlag, Berlin, 303–314.
[15]
Mark Bun and Justin Thaler. 2015. Hardness amplification and the approximate degree of constant-depth circuits. In Automata, Languages, and Programming, Magnús M. Halldórsson, Kazuo Iwama, Naoki Kobayashi, and Bettina Speckmann (Eds.). Springer, Berlin, 268–280.
[16]
Mark Bun and Justin Thaler. 2017. A nearly optimal lower bound on the approximate degree of \(\mathsf {AC}^0\). In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS’17). IEEE Computer Society, Los Alamitos, CA, 1–12.
[17]
Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan. 2014. Faster private release of marginals on small databases. In Proceedings of the Annual Conference on Innovations in Theoretical Computer Science (ITCS’14). Association for Computing Machinery, New York, NY, 387–402.
[18]
Arkadev Chattopadhyay, Nikhil S. Mande, and Suhail Sherif. 2019. The log-approximate-rank conjecture is false. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19), Moses Charikar and Edith Cohen (Eds.). Association for Computing Machinery, New York, NY, 42–53.
[19]
E. W. Cheney. 1982. Introduction to Approximation Theory (2nd ed.). Chelsea Pub. Co., New York, NY.
[20]
Alison L. Gibbs and Francis Edward Su. 2002. On choosing and bounding probability metrics. Int. Stat. Rev. 70, 3 (2002), 419–435.
[21]
Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. J. Am. Statist. Assoc. 58, 301 (1963), 13–30.
[22]
Xuangui Huang and Emanuele Viola. 2019. Approximate Degree-Weight and Indistinguishability. Electronic Colloquium on Computational Complexity, TR19-085.
[23]
Adam R. Klivans and Rocco A. Servedio. 2006. Toward attribute efficient learning of decision lists and parities. J. Mach. Learn. Res. 7 (2006), 587–602.
[24]
Joseph Naor and Moni Naor. 1993. Small-bias probability spaces: Efficient constructions and applications. SIAM J. Comput. 22, 4 (1993), 838–856.
[25]
Noam Nisan and Mario Szegedy. 1994. On the degree of Boolean functions as real polynomials. Comput. Complex. 4 (1994), 301–313.
[26]
Ryan O’Donnell. 2014. Analysis of Boolean Functions. Cambridge University Press, New York, NY.
[27]
Ryan O’Donnell and Yu Zhao. 2018. On closeness to k-wise uniformity. In Proceedings of the International Conference on Approximation Algorithms for Combinatorial Optimization Problems/International Conference on Randomization and Computation (APPROX/RANDOM’18), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 116, Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 54:1–54:19.
[28]
Ramamohan Paturi. 1992. On the degree of polynomials that approximate symmetric boolean functions (preliminary version). In Proceedings of the 24th Annual ACM Symposium on Theory of Computing (STOC’92). ACM, New York, NY, 468–474.
[29]
Rocco A. Servedio, Li-Yang Tan, and Justin Thaler. 2012. Attribute-efficient learning and weight-degree tradeoffs for polynomial threshold functions. In Proceedings of the 25th Annual Conference on Learning Theory, Proceedings of Machine Learning Research, Vol. 23, Shie Mannor, Nathan Srebro, and Robert C. Williamson (Eds.). PMLR, Edinburgh, Scotland, 14.1–14.19.
[30]
Alexander A. Sherstov. 2008. Approximate inclusion-exclusion for arbitrary symmetric functions. In Proceedings of the 23rd Annual IEEE Conference on Computational Complexity. IEEE Computer Society, Los Alamitos, CA, 112–123.
[31]
Alexander A. Sherstov. 2012. Strong direct product theorems for quantum communication and query complexity. SIAM J. Comput. 41, 5 (2012), 1122–1165.
[32]
Alexander A. Sherstov. 2013. Approximating the AND-OR tree. Theory Comput. 9, 20 (2013), 653–663.
[33]
Alexander A. Sherstov. 2013. The intersection of two halfspaces has high threshold degree. SIAM J. Comput. 42, 6 (2013), 2329–2374.
[34]
Alexander A. Sherstov. 2018. Algorithmic polynomials. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC’18). ACM, New York, NY, 311–324.
[35]
Emanuele Viola. 2017. Special Topics in Complexity Theory. ECCC Lecture Notes. Retrieved December 28, 2017 from http://www.ccs.neu.edu/home/viola/classes/spepf17.html.
