
A New Minimax Theorem for Randomized Algorithms

Published: 30 November 2023
Abstract

    The celebrated minimax principle of Yao says that for any Boolean-valued function f with finite domain, there is a distribution μ over the domain of f such that computing f to error ε against inputs from μ is just as hard as computing f to error ε on worst-case inputs. Notably, however, the distribution μ depends on the target error level ε: the hard distribution which is tight for bounded error might be trivial to solve to small bias, and the hard distribution which is tight for a small bias level might be far from tight for bounded error levels.
    In this work, we introduce a new type of minimax theorem which can provide a hard distribution μ that works for all bias levels at once. We show that this works for randomized query complexity, randomized communication complexity, some randomized circuit models, quantum query and communication complexities, approximate polynomial degree, and approximate logrank. We also prove an improved version of Impagliazzo’s hardcore lemma.
    Our proofs rely on two innovations over the classical approach of using Von Neumann’s minimax theorem or linear programming duality. First, we use Sion’s minimax theorem to prove a minimax theorem for ratios of bilinear functions representing the cost and score of algorithms.
    Second, we introduce a new way to analyze low-bias randomized algorithms by viewing them as “forecasting algorithms” evaluated by a certain proper scoring rule. The expected score of the forecasting version of a randomized algorithm appears to be a more fine-grained way of analyzing the bias of the algorithm. We show that such expected scores have many elegant mathematical properties—for example, they can be amplified linearly instead of quadratically. We anticipate forecasting algorithms will find use in future work in which a fine-grained analysis of small-bias algorithms is required.
    Appendices

    A Proofs Related to the Minimax Theorem

    Lemma 2.8 (An Upper Semicontinuous Function on a Compact Set Attains Its Max).
    Let \(X\) be a nonempty compact topological space, and let \(\phi : X\rightarrow \overline{{\mathbb {R}}}\) be a function. Then if \(\phi\) is upper semicontinuous, it attains its maximum, meaning there is some \(x\in X\) such that for all \(x^{\prime }\in X\) , \(\phi (x^{\prime })\le \phi (x)\) . Similarly, if \(\phi\) is lower semicontinuous, it attains its minimum.
    Proof.
    The lower semicontinuous case follows from the upper semicontinuous case simply by negating \(\phi\) , so we focus on the upper semicontinuous case. Let \(z=\sup _{x\in X}\phi (x)\) , where \(z\in \overline{{\mathbb {R}}}\) . Let \(x_0\) be any element of \(X\) . If \(\phi (x_0)=z\) , we are done, so assume \(\phi (x_0)\lt z\) ; in particular, \(z\gt -\infty\) . We define a sequence \(x_1,x_2,\dots\) as follows. If \(z\lt \infty\) , define \(x_{i}\) to be any element of \(X\) such that \(\phi (x_i)\gt z-1/i\) . If \(z=\infty\) , define \(x_i\) to be any element of \(X\) such that \(\phi (x_i)\gt i\) . Moreover, for each \(i\in \mathbb {N}\) , let \(U_i=\lbrace \,x\in X:\phi (x)\lt \phi (x_i)\rbrace\) . Note that any \(x\in X\) for which \(\phi (x)\lt z\) must be in \(U_i\) for some \(i\in \mathbb {N}\) ; hence, if the supremum \(z\) is not attained, the sets \(U_i\) form a cover for \(X\) (meaning \(\bigcup _{i\in \mathbb {N}}U_i=X\) ).
    The key claim is that the \(U_i\) sets are all open if \(\phi\) is upper semicontinuous. This is because if \(x\in U_i\) , then \(\phi (x)\lt \phi (x_i)\) , and by the definition of upper semicontinuity, there is a neighborhood \(U\) of \(x\) on which \(\phi (\cdot)\) is still less than \(\phi (x_i)\) ; this neighborhood is contained in \(U_i\) , so \(U_i\) is open. In this case, if the supremum \(z\) is not attained, the collection \(\lbrace U_i\rbrace _{i\in \mathbb {N}}\) is an open cover of \(X\) , and by the definition of compactness, it has a finite subcover. Let \(i\) be an index in this subcover for which \(\phi (x_i)\) is largest. Since every \(x\in X\) lies in some \(U_j\) of the subcover, we get \(\phi (x)\lt \phi (x_j)\le \phi (x_i)\) for all \(x\in X\) ; taking \(x=x_i\) gives a contradiction. Hence, the supremum \(z\) must be attained as a maximum, as desired. □
    Lemma 2.9 (A Pointwise Infimum of Upper Semicontinuous Functions Is Upper Semicontinuous).
    Let \(X\) be a topological space, let \(I\) be a set, and let \(\lbrace \phi _i\rbrace _{i\in I}\) be a collection of functions \(\phi _i : X\rightarrow \overline{{\mathbb {R}}}\) . Then if each \(\phi _i\) is upper semicontinuous, the function \(\phi (x)=\inf _{i\in I}\phi _i(x)\) is also upper semicontinuous. Similarly, if each \(\phi _i\) is lower semicontinuous, the pointwise supremum is lower semicontinuous.
    Proof.
    Note that the case where \(\phi _i\) are all lower semicontinuous follows from the case where they are all upper semicontinuous simply by negating the functions, since negation flips upper and lower semicontinuity and flips infimums and supremums. We focus on the case where \(\phi _i\) are all upper semicontinuous.
    Fix \(x\in X\) . If \(\phi (x)=\infty\) , \(\phi\) is upper semicontinuous at \(x\) by definition. If \(\phi (x)\lt \infty\) , fix any \(y\gt \phi (x)\) . By the definition of \(\phi (x)\) as an infimum, there is some \(i\in I\) such that \(\phi _i(x)\lt y\) . By the upper semicontinuity of \(\phi _i(\cdot)\) , there is a neighborhood \(U\) of \(x\) such that for all \(x^{\prime }\in U\) , we have \(\phi _i(x^{\prime })\lt y\) . Then for all \(x^{\prime }\in U\) , we clearly have \(\phi (x^{\prime })=\inf _{i\in I}\phi _i(x^{\prime })\lt y\) . Thus, \(\phi\) is upper semicontinuous at \(x\) , as desired. □
    Lemma A.1.
    Let \(V\) be a real vector space, and let \(X\subseteq V\) . The convex hull of \(X\) is the set of all \(v\in V\) which can be written as a convex combination of vectors in \(X\) —that is, \(v\) for which there exist \(k\in \mathbb {N}\) , \(x_1,x_2,\dots ,x_k\in X\) , and \(\lambda _1,\lambda _2,\dots ,\lambda _k\in [0,1]\) with \(\lambda _1+\lambda _2+\dots +\lambda _k=1\) such that \(v=\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k\) .
    Proof.
    This is a well-known characterization of the convex hull, which can be shown as follows: let \(Y\) be the set of all finite convex combinations of points in \(X\) —that is, \(Y\) contains all points in \(V\) of the form \(\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k\) , where \(k\in \mathbb {N}\) , \(x_1,x_2,\dots ,x_k\in X\) , and \(\lambda _1,\lambda _2,\dots , \lambda _k\in [0,1]\) with \(\lambda _1+\lambda _2+\dots +\lambda _k=1\) . Then \(Y\) is clearly convex, since for all \(y_1,y_2\in Y\) and \(\lambda \in (0,1)\) , we know that \(y_1\) and \(y_2\) are finite convex combinations of points in \(X\) , meaning that \(\lambda y_1+(1-\lambda)y_2\) is also a finite convex combination of points in \(X\) . Furthermore, if \(Z\) is any other convex set containing \(X\) , then it is easy to show by induction that \(Z\) contains all convex combinations of \(k\) points in \(X\) for each \(k\in \mathbb {N}\) ; hence, \(Z\) must be a superset of \(Y\) . It follows that \(\operatorname{Conv}(X)\) , the intersection of all convex sets containing \(X\) , must exactly equal \(Y\) . □
    Lemma 2.10 (Quasiconvex Functions on Convex Hulls).
    Let \(V\) be a real vector space, let \(X\subseteq V\) , and let \(\phi :\operatorname{Conv}(X)\rightarrow \overline{{\mathbb {R}}}\) be a function. If \(\phi\) is quasiconvex, then
    \(\begin{equation*} \sup _{x\in \operatorname{Conv}(X)}\phi (x)=\sup _{x\in X}\phi (x). \end{equation*}\)
    Similarly, if \(\phi\) is quasiconcave, then
    \(\begin{equation*} \inf _{x\in \operatorname{Conv}(X)}\phi (x)=\inf _{x\in X}\phi (x). \end{equation*}\)
    Proof.
    The quasiconcave case follows from the quasiconvex case by negating \(\phi\) ; hence, it suffices to prove the quasiconvex case. It is clear that \(\sup _{x\in \operatorname{Conv}(X)}\phi (x)\) is at least \(\sup _{x\in X}\phi (x)\) , so we only need to show the latter is at least the former. To this end, it suffices to show that for every \(\hat{x}\in \operatorname{Conv}(X)\) , there is some \(x\in X\) with \(\phi (x)\ge \phi (\hat{x})\) ; taking the supremum over \(\hat{x}\) then gives the desired inequality. Fix such an \(\hat{x}\) .
    Using Lemma A.1, we can write \(\hat{x}\in \operatorname{Conv}(X)\) as \(\hat{x}=\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k\) , with \(k\in \mathbb {N}\) , \(x_1,x_2,\dots , x_k\in X\) , and \(\lambda _1,\lambda _2,\dots ,\lambda _k\in [0,1]\) with \(\lambda _1+\lambda _2+\dots +\lambda _k=1\) . Furthermore, assume that \(\lambda _i\gt 0\) for each \(i\in [k]\) (terms with \(\lambda _i=0\) can simply be dropped from the combination). Now, note that by quasiconvexity, we have \(\phi (\lambda x_1+(1-\lambda)x_2)\le \max \lbrace \phi (x_1),\phi (x_2)\rbrace\) , and an easy induction gives \(\phi (\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k) \le \max \lbrace \phi (x_1),\phi (x_2),\dots ,\phi (x_k)\rbrace\) . Hence, there is some \(x\in X\) such that \(\phi (x)\ge \phi (\hat{x})\) , as desired. □
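    As an illustrative numerical sanity check of Lemma 2.10 (not part of the proof, and using an arbitrary example: the Euclidean norm on \(\mathbb {R}^2\), which is convex and hence quasiconvex), random convex combinations of a finite point set never exceed the maximum of the function over the points themselves:

```python
# Hypothetical sanity check of Lemma 2.10: for a quasiconvex phi (here the
# Euclidean norm), the value at any convex combination of points of X never
# exceeds the maximum over X.
import math
import random

random.seed(0)

def phi(v):
    return math.hypot(v[0], v[1])  # Euclidean norm on R^2

X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
max_over_X = max(phi(x) for x in X)

for _ in range(10000):
    weights = [random.random() for _ in X]
    total = sum(weights)
    lam = [w / total for w in weights]          # a random convex combination
    v = (sum(l * x[0] for l, x in zip(lam, X)),
         sum(l * x[1] for l, x in zip(lam, X)))
    assert phi(v) <= max_over_X + 1e-12

print("sup over sampled points of Conv(X) did not exceed sup over X")
```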
    Lemma 2.15.
    Let \(V\) be a real topological vector space, and let \(X\subseteq V\) be convex. For a function \(\psi : X\rightarrow \overline{{\mathbb {R}}}\) , let \(\psi ^{+}\) denote the function \(\psi ^{+}(x)=\max \lbrace \psi (x),0\rbrace\) . Then this operation on \(\psi\) preserves convexity, quasiconvexity, quasiconcavity, upper semicontinuity, and lower semicontinuity, but not concavity.
    We actually prove a stronger statement, where the maximum is taken with an arbitrary constant.
    Lemma A.2.
    Let \(V\) be a real topological vector space, and let \(X\subseteq V\) be convex. Let \(\psi : X\rightarrow \overline{{\mathbb {R}}}\) be a function, let \(c\in {\mathbb {R}}\) be a constant, and let \(\psi ^{\prime }: X\rightarrow \overline{{\mathbb {R}}}\) be the function \(\psi ^{\prime }(x)=\max \lbrace \psi (x),c\rbrace\) . Then if \(\psi\) is convex, \(\psi ^{\prime }\) is convex; if \(\psi\) is quasiconvex, \(\psi ^{\prime }\) is quasiconvex; if \(\psi\) is quasiconcave, \(\psi ^{\prime }\) is quasiconcave; if \(\psi\) is upper semicontinuous, \(\psi ^{\prime }\) is upper semicontinuous; and if \(\psi\) is lower semicontinuous, \(\psi ^{\prime }\) is lower semicontinuous.
    Proof.
    Let \(x,y\in X\) , and let \(\lambda \in (0,1)\) . Then
    \(\begin{equation*} \psi ^{\prime }(\lambda x+(1-\lambda)y)=\max \lbrace \psi (\lambda x+(1-\lambda)y),c\rbrace . \end{equation*}\)
    If this maximum equals \(c\) , it is certainly at most \(\lambda \max \lbrace \psi (x),c\rbrace +(1-\lambda)\max \lbrace \psi (y),c\rbrace\) , since these two latter maximums are each at least \(c\) . Hence, the inequalities for convexity and quasiconvexity always hold when the original maximum equals \(c\) . Alternatively, if \(\max \lbrace \psi (\lambda x+(1-\lambda)y),c\rbrace =\psi (\lambda x+(1-\lambda)y)\) , then using \(\psi (x)\le \psi ^{\prime }(x)\) and \(\psi (y)\le \psi ^{\prime }(y)\) , we see that convexity of \(\psi\) gives the inequality for convexity of \(\psi ^{\prime }\) , and quasiconvexity of \(\psi\) gives the inequality for quasiconvexity of \(\psi ^{\prime }\) .
    Next, suppose \(\psi\) is quasiconcave. Without loss of generality, say that \(\psi (x)\le \psi (y)\) . Then by quasiconcavity of \(\psi\) , we have \(\psi (\lambda x+(1-\lambda)y)\ge \min \lbrace \psi (x),\psi (y)\rbrace =\psi (x)\) , so \(\psi ^{\prime }(\lambda x+(1-\lambda)y)=\max \lbrace \psi (\lambda x+(1-\lambda)y),c\rbrace \ge \max \lbrace \psi (x),c\rbrace =\psi ^{\prime }(x)\ge \min \lbrace \psi ^{\prime }(x),\psi ^{\prime }(y)\rbrace\) , and \(\psi ^{\prime }\) is quasiconcave.
    Preservation of lower semicontinuity follows from Lemma 2.9, where we note that \(c\) is continuous as a function from \(X\) to \(\overline{{\mathbb {R}}}\) . It remains to show upper semicontinuity is preserved. Suppose \(\psi\) is upper semicontinuous, and let \(x\in X\) . If \(\psi ^{\prime }(x)=\infty\) , upper semicontinuity at \(x\) vacuously holds. Otherwise, fix some \(y\gt \psi ^{\prime }(x)\) . Then \(\psi (x) \le \psi ^{\prime }(x) \lt y\) and upper semicontinuity gives us a neighborhood \(U\) of \(x\) on which \(\psi (\cdot)\) is less than \(y\) . And since \(\psi ^{\prime }(x)\ge c\) , we have \(y\gt c\) so \(\psi ^{\prime }(\cdot)=\max \lbrace c,\psi (\cdot)\rbrace \lt y\) on \(U\) . Hence, \(\psi ^{\prime }\) is upper semicontinuous. □
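    The following is a small numerical illustration of Lemma A.2 (a sketch with arbitrarily chosen \(\psi\) and \(c\), not part of the proof): taking a pointwise maximum with a constant preserves convexity and quasiconcavity along randomly sampled segments.

```python
# Illustrative check of Lemma A.2 in one dimension with arbitrary psi and c:
# psi'(t) = max(psi(t), c) stays convex (resp. quasiconcave) on sampled segments.
import random

random.seed(1)
c = 0.25
psi_convex = lambda t: (t - 0.3) ** 2      # convex on [0, 1]
psi_qconc = lambda t: min(t, 1 - t)        # concave, hence quasiconcave
p_prime = lambda t: max(psi_convex(t), c)  # should remain convex
g_prime = lambda t: max(psi_qconc(t), c)   # should remain quasiconcave

for _ in range(100000):
    x, y, lam = random.random(), random.random(), random.random()
    z = lam * x + (1 - lam) * y
    assert p_prime(z) <= lam * p_prime(x) + (1 - lam) * p_prime(y) + 1e-12
    assert g_prime(z) >= min(g_prime(x), g_prime(y)) - 1e-12

print("max{psi, c} remained convex / quasiconcave on all sampled segments")
```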
    Theorem A.3 (Sion’s Minimax [Sion 1958]).
    Let \(V_1\) and \(V_2\) be real topological vector spaces, and let \(X\subseteq V_1\) and \(Y\subseteq V_2\) be convex. Let \(\alpha : X\times Y\rightarrow {\mathbb {R}}\) be semicontinuous and quasisaddle. If either \(X\) or \(Y\) is compact, then
    \(\begin{equation*} \inf _{x\in X}\sup _{y\in Y}\alpha (x,y) =\sup _{y\in Y}\inf _{x\in X}\alpha (x,y). \end{equation*}\)
    Theorem 2.11 (Sion’s Minimax for Extended Reals).
    Let \(V_1\) and \(V_2\) be real topological vector spaces, and let \(X\subseteq V_1\) and \(Y\subseteq V_2\) be convex. Let \(\alpha : X\times Y\rightarrow \overline{{\mathbb {R}}}\) be semicontinuous and quasisaddle. If either \(X\) or \(Y\) is compact, then
    \(\begin{equation*} \inf _{x\in X}\sup _{y\in Y}\alpha (x,y) =\sup _{y\in Y}\inf _{x\in X}\alpha (x,y). \end{equation*}\)
    Proof.
    First, note that the inf-sup is always at least the sup-inf. This is because these expressions can be thought of as a game between two players, one choosing \(x\) and trying to minimize \(\alpha (x,y)\) , and the other choosing \(y\) and trying to maximize \(\alpha (x,y)\) ; in the inf-sup, the sup player chooses \(y\) after already knowing \(x\) , and therefore has more information and is better positioned to maximize \(\alpha (x,y)\) than in the sup-inf, where the sup player must commit to \(y\) first.
    Now, let
    \(\begin{equation*} a:= \sup _{y\in Y}\inf _{x\in X}\alpha (x,y),\qquad b:= \inf _{x\in X}\sup _{y\in Y}\alpha (x,y). \end{equation*}\)
    We have \(a,b\in \overline{{\mathbb {R}}}\) , and \(a\le b\) . We wish to show \(a=b\) . Suppose toward a contradiction that \(a\lt b\) . Then we can pick \(a^{\prime },b^{\prime }\in {\mathbb {R}}\) such that \(a\lt a^{\prime }\lt b^{\prime }\lt b\) . We then define \(\alpha ^{\prime } : X\times Y\rightarrow {\mathbb {R}}\) by \(\alpha ^{\prime }(x,y):= a^{\prime }\) if \(\alpha (x,y)\le a^{\prime }\) , \(\alpha ^{\prime }(x,y):= b^{\prime }\) if \(\alpha (x,y)\ge b^{\prime }\) , and \(\alpha ^{\prime }(x,y):=\alpha (x,y)\) if \(\alpha (x,y)\in [a^{\prime },b^{\prime }]\) .
    Note that \(\alpha ^{\prime }(x,y)=\max \lbrace a^{\prime },\min \lbrace b^{\prime },\alpha (x,y)\rbrace \rbrace\) . By Lemma A.2, we know that taking a maximum with a constant preserves quasiconvexity, quasiconcavity, and upper and lower semicontinuities. By negating the function, it also follows that taking a minimum with a constant preserves these properties. From this, it follows that \(\alpha ^{\prime }\) is quasisaddle and semicontinuous, since \(\alpha\) has these properties.
    Now, since \(a=\sup _{y\in Y}\inf _{x\in X}\alpha (x,y)\) and since \(a^{\prime }\gt a\) , we know that for all \(y\in Y\) , there exists some \(x\in X\) for which \(\alpha (x,y)\lt a^{\prime }\) . This means that for all \(y\in Y\) , there exists \(x\in X\) for which \(\alpha ^{\prime }(x,y)=a^{\prime }\) . Hence, \(\sup _{y\in Y}\inf _{x\in X}\alpha ^{\prime }(x,y)=a^{\prime }\) . Similarly, since \(b=\inf _{x\in X}\sup _{y\in Y}\alpha (x,y)\) and since \(b^{\prime }\lt b\) , we know that for all \(x\in X\) , there exists some \(y\in Y\) for which \(\alpha (x,y)\gt b^{\prime }\) . This means that for all \(x\in X\) , there exists \(y\in Y\) for which \(\alpha ^{\prime }(x,y)=b^{\prime }\) . Hence, \(\inf _{x\in X}\sup _{y\in Y}\alpha ^{\prime }(x,y)=b^{\prime }\) . By Theorem A.3, we then have
    \(\begin{equation*} b^{\prime }=\inf _{x\in X}\sup _{y\in Y}\alpha ^{\prime }(x,y) =\sup _{y\in Y}\inf _{x\in X}\alpha ^{\prime }(x,y)=a^{\prime }. \end{equation*}\)
    But this is a contradiction, since we picked \(a^{\prime }\lt b^{\prime }\) . We conclude that we must have had \(a=b\) to begin with, as desired. □
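    For intuition, the simplest special case of this equality is the finite bilinear one (von Neumann's minimax theorem), where \(\alpha (x,y)=x^{\mathsf {T}}Ay\) over probability simplices. The sketch below, with an arbitrarily chosen payoff matrix, computes both sides by linear programming and checks that they agree; it illustrates only the statement of the minimax equality, not the ratio-of-bilinear-forms setting used in the body of the article.

```python
# Illustrative sketch: for a finite bilinear game alpha(x, y) = x^T A y over
# probability simplices (von Neumann's special case), inf-sup equals sup-inf.
# The payoff matrix A is an arbitrary example.
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, -1.0, 0.5],
              [0.0, 2.0, -2.0]])
m, n = A.shape

# inf_x sup_y x^T A y: minimize v subject to (A^T x)_j <= v, x in the simplex.
c1 = np.append(np.zeros(m), 1.0)
res_min = linprog(c1,
                  A_ub=np.hstack([A.T, -np.ones((n, 1))]), b_ub=np.zeros(n),
                  A_eq=[np.append(np.ones(m), 0.0)], b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])

# sup_y inf_x x^T A y: maximize w subject to (A y)_i >= w, y in the simplex.
c2 = np.append(np.zeros(n), -1.0)
res_max = linprog(c2,
                  A_ub=np.hstack([-A, np.ones((m, 1))]), b_ub=np.zeros(m),
                  A_eq=[np.append(np.ones(n), 0.0)], b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])

inf_sup, sup_inf = res_min.fun, -res_max.fun
print(inf_sup, sup_inf)      # the two values agree up to solver tolerance
assert abs(inf_sup - sup_inf) < 1e-6
```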

    B Distance Measures

    Lemma 3.3.
    \(\operatorname{hs}\) , \(\operatorname{Brier}\) , and \(\operatorname{ls}\) are proper scoring rules. \(\operatorname{bias}\) is a scoring rule which is not proper.
    Proof.
    It is clear that all of the functions from Definition 3.2 are smooth on \((0,1)\) and increasing on \([0,1]\) , where we interpret \(\operatorname{hs}(0)=\operatorname{ls}(0)=-\infty\) . It is also clear that all these functions evaluate to 1 at 1 and to 0 at \(1/2\) . It remains to show that \(\operatorname{Brier}\) , \(\operatorname{ls}\) , and \(\operatorname{hs}\) are proper. To do so, we need to show that \(ps(q)+(1-p)s(1-q)\) is uniquely optimized at \(q=p\) when \(s\) is one of these functions and \(p\in (0,1)\) . Fix such \(p\in (0,1)\) , and observe that the critical points of the expression we wish to maximize are the points \(q\) such that \(ps^{\prime }(q)=(1-p)s^{\prime }(1-q)\) .
    For \(\operatorname{ls}(q)=1-\log (1/q)=1+(\log e)\ln q\) , the critical points \(q\) satisfy \((\log e)p/q=(\log e)(1-p)/(1-q)\) , or \(p/(1-p)=q/(1-q)\) . Noting that the function \(x/(1-x)\) is increasing on \((0,1)\) , and hence injective on \((0,1)\) , we conclude that the only critical point is \(q=p\) . Moreover, at the boundaries \(q=0\) and \(q=1\) , we clearly have \(p\operatorname{ls}(q)+(1-p)\operatorname{ls}(1-q)=-\infty\) , whereas in the interior the expression is finite. Hence, the unique maximum must occur at \(q=p\) .
    For \(\operatorname{hs}(q)=1-\sqrt {(1-q)/q}=1-\sqrt {1/q-1}\) , we have \(\operatorname{hs}^{\prime }(q)=1/\big (2\sqrt {q^3(1-q)}\big)\) , so the critical points \(q\) satisfy \(p/\big (2\sqrt {q^3(1-q)}\big)=(1-p)/\big (2\sqrt {(1-q)^3q}\big)\) , or \(p/q=(1-p)/(1-q)\) , which once again only occurs at \(q=p\) . At the boundaries, we once again have \(p\operatorname{hs}(q)+(1-p)\operatorname{hs}(1-q)=-\infty\) for \(q=0\) or \(q=1\) , so the unique maximum occurs at \(q=p\) .
    Finally, for \(\operatorname{Brier}(q)=1-4(1-q)^2=-4q^2+8q-3\) , we have \(\operatorname{Brier}^{\prime }(q)=8(1-q)\) , so the critical points \(q\) satisfy \(8p(1-q)=8(1-p)q\) , which again implies \(q=p\) . This time, the boundary points are finite, but we can use the second-order condition: the second derivative of \(p\operatorname{Brier}(q)+(1-p)\operatorname{Brier}(1-q)\) is \(p\operatorname{Brier}^{\prime \prime }(q)+(1-p)\operatorname{Brier}^{\prime \prime }(1-q)\) . Noting that \(\operatorname{Brier}^{\prime \prime }(q)=-8\) , this is \(-8p-8(1-p)=-8\lt 0\) . Hence, the critical point is a maximum, and since it is unique (with the boundaries 0 and 1 not being critical even if we extend the domain of the function), we conclude it is the unique maximum. □
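    A simple grid search illustrates properness numerically (this is a sketch, not part of the proof). The formulas for \(\operatorname{hs}\), \(\operatorname{Brier}\), and \(\operatorname{ls}\) are taken from the computations above; the formula \(\operatorname{bias}(q)=2q-1\) is an assumption made here only for contrast, since the definition of \(\operatorname{bias}\) is not reproduced in this appendix.

```python
# Grid check of Lemma 3.3: p*s(q) + (1-p)*s(1-q) is maximized at q = p for the
# proper rules hs, Brier, ls. The bias rule below uses an ASSUMED formula
# (2q - 1, not stated in this appendix) purely to illustrate a non-proper rule.
import math

hs = lambda q: 1 - math.sqrt((1 - q) / q)
brier = lambda q: 1 - 4 * (1 - q) ** 2
ls = lambda q: 1 - math.log2(1 / q)
bias = lambda q: 2 * q - 1               # assumed formula, for contrast only

def best_report(s, p, grid=2000):
    # The report q maximizing the forecaster's expected score when Pr[1] = p.
    qs = [(i + 0.5) / grid for i in range(grid)]
    return max(qs, key=lambda q: p * s(q) + (1 - p) * s(1 - q))

for p in [0.1, 0.3, 0.7, 0.9]:
    for name, s in [("hs", hs), ("Brier", brier), ("ls", ls), ("bias", bias)]:
        print(f"p={p:.1f}  {name:5s}  best report q = {best_report(s, p):.4f}")
# hs, Brier, ls report q close to p; the assumed bias rule reports 0 or 1.
```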
    Lemma B.1.
    For any \(x\in [0,1]\) , we have
    \(\begin{equation*} \frac{x^2}{2}\le 1-\sqrt {1-x^2}\le 1-H\left(\frac{1+x}{2}\right) \le x^2\le x. \end{equation*}\)
    Additionally, \(x^2\) and \(1-\sqrt {1-x}\) are convex functions on \([0,1]\) .
    Proof.
    The inequality \(x^2\le x\) is clearly true for \(x\in [0,1]\) . Set \(f(x) = 1 - \sqrt {1-x^2}\) and \(g(x) = 1 - H(\frac{1+x}{2})\) . We want to show that \(x^2/2 \le f(x) \le g(x) \le x^2\) for all \(x \in [0,1]\) . We prove each of these inequalities in order.
    The function \(f\) satisfies \(f(0) = 0 = (0)^2/2\) and has derivative \(f^{\prime }(x) = x/\sqrt {1-x^2}\) which is greater than \(x = (x^2/2)^{\prime }\) when \(0 \lt x \lt 1\) , so \(f\) grows faster than \(x^2/2\) over that interval. Therefore, \(x^2/2 \le f(x)\) for all \(x \in [0,1]\) .
    The functions \(f\) and \(g\) satisfy \(f(0) = g(0) = 0\) and \(f^{\prime }(0) = g^{\prime }(0) = 0,\) and their second derivatives are \(f^{\prime \prime }(x) = (1-x^2)^{-3/2}\) and \(g^{\prime \prime }(x) = \big (\ln 2 \cdot (1-x^2)\big)^{-1}\) . So \(f^{\prime \prime }(x) \gt g^{\prime \prime }(x)\) if and only if \(\sqrt {1-x^2} \lt \ln 2\) , which holds if and only if \(|x| \gt \sqrt {1 - \ln ^2 2} \approx 0.72\) . Therefore, \(f\) and \(g\) have only one intersection point in \(\mathbb {R}_{\gt 0}\) and \(f(x) \lt g(x)\) for all \(x\) between 0 and that intersection point. Since \(f(1) = 1 = g(1)\) , this means that \(f(x) \le g(x)\) for all \(x \in [0,1]\) .
    The function \(x^2\) also has value and first derivative equal to 0 at \(x=0\) . In addition, \(g^{\prime \prime }(x) \gt (x^2)^{\prime \prime }\) if and only if \(\big (\ln 2 \cdot (1-x^2)\big)^{-1} \gt 2\) , which holds if and only if \(|x| \gt \sqrt { 1 - 1/\ln 4} \approx 0.53\) . So \(g\) and \(x^2\) have only one intersection point in \(\mathbb {R}_{\gt 0}\) and \(g(x) \lt x^2\) for all points \(x\) between 0 and this intersection point. Since \(g(1) = 1 = (1)^2\) , we then have \(g(x) \le x^2\) for all \(x \in [0,1]\) .
    Finally, the convexity of \(x^2\) and \(1-\sqrt {1-x}\) on \([0,1]\) follows immediately from the fact that their second derivatives are both positive on \((0,1)\) . □
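    The chain of inequalities can also be checked numerically on a grid (an illustrative sketch, not a substitute for the proof):

```python
# Grid check of Lemma B.1: x^2/2 <= 1 - sqrt(1-x^2) <= 1 - H((1+x)/2) <= x^2 <= x
# on [0, 1], where H is the binary entropy (base 2).
import math

def H(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for i in range(1001):
    x = i / 1000
    a = x * x / 2
    b = 1 - math.sqrt(1 - x * x)
    c = 1 - H((1 + x) / 2)
    d = x * x
    assert a <= b + 1e-12 and b <= c + 1e-12 and c <= d + 1e-12 and d <= x + 1e-12

print("the chain of Lemma B.1 holds at all grid points")
```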
    Lemma 3.6 (Relations between Distance Measures).
    When applied to fixed \(\nu _0\) , \(\nu _1\) , and \(w\) , the distance measures satisfy
    \(\begin{equation*} \frac{\operatorname{S}^2}{2}\le 1-\sqrt {1-\operatorname{S}^2} \le \text{h}^2\le \operatorname{JS}\le \operatorname{S}^2 \end{equation*}\)
    as well as
    \(\begin{equation*} \Delta ^2\le \operatorname{S}^2\le \Delta . \end{equation*}\)
    We also have \(\operatorname{JS}\le \text{h}^2/\ln 2\) and \(\operatorname{S}^2\le (\ln 4)\operatorname{JS}\) .
    Proof.
    We use Lemma B.1. The inequality \(\operatorname{S}^2/2 \le 1 - \sqrt {1 - \operatorname{S}^2}\) and the chain \(\text{h}^2\le \operatorname{JS}\le \operatorname{S}^2\le \Delta\) follow from the inequalities there, whereas the inequalities \(\Delta ^2\le \operatorname{S}^2\) and \(1-\sqrt {1-\operatorname{S}^2}\le \text{h}^2\) follow from Jensen’s inequality combined with the convexity of \(x^2\) and \(1-\sqrt {1-x}\) .
    Finally, to show the inequality \(\operatorname{JS}\le \text{h}^2/\ln 2,\) it suffices to bound the ratio \((1-H((1+x)/2))/(1-\sqrt {1-x^2})\) by \(1/\ln 2\) ; since this ratio is decreasing in \(x\) , we only need to compute its limit as \(x\rightarrow 0\) . Using \(1-H((1+x)/2)=x^2/\ln 4+O(x^4)\) and \(1-\sqrt {1-x^2}=x^2/2+O(x^4)\) , the limit is \(1/\ln 2\) , so the ratio is always at most \(1/\ln 2\) . Similarly, to show the inequality \(\operatorname{S}^2\le (\ln 4)\operatorname{JS}\) , it suffices to bound the ratio \((1-H((1+x)/2))/x^2\) from below by \(1/\ln 4\) ; this ratio is increasing in \(x\) , so we only need its limit as \(x\rightarrow 0\) , which is \(1/\ln 4\) by the expansion above. □
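    The two scalar inequalities used in the last paragraph, in the same parametrization by \(x\), can be checked numerically as follows (an illustrative sketch, not part of the proof):

```python
# Grid check of the two scalar inequalities behind JS <= h^2/ln 2 and
# S^2 <= (ln 4) JS in the parametrization by x used above:
#   1 - H((1+x)/2) <= (1 - sqrt(1-x^2)) / ln 2   and   x^2 <= ln(4) * (1 - H((1+x)/2)).
import math

def H(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for i in range(1001):
    x = i / 1000
    js_like = 1 - H((1 + x) / 2)
    h2_like = 1 - math.sqrt(1 - x * x)
    assert js_like <= h2_like / math.log(2) + 1e-12
    assert x * x <= math.log(4) * js_like + 1e-12

print("both scalar inequalities hold at all grid points")
```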
    Lemma 3.11.
    If \(x\in [0,1]\) and \(k\in [1,\infty)\) , we have
    \(\begin{equation*} \frac{1}{2}\min \lbrace kx,1\rbrace \le 1-(1-x)^k\le \min \lbrace kx,1\rbrace . \end{equation*}\)
    Proof.
    Set \(f(x):= 1-(1-x)^k\) . Clearly, when \(x\in [0,1]\) , we have \(f(x)\in [0,1]\) , so \(f:[0,1]\rightarrow [0,1]\) . Note \(f(0)=0\) , \(f(1)=1\) , and that \(f(x)\) is increasing on \([0,1]\) . If \(k=1\) , we have \(f(x)=x\) , and the inequalities trivially hold; therefore, assume \(k\gt 1\) . Then \(f^{\prime }(x)=k(1-x)^{k-1}\) and \(f^{\prime \prime }(x)=-k(k-1)(1-x)^{k-2}\) , meaning that \(f(x)\) is concave on \([0,1]\) ; we also have \(f^{\prime }(0)=k\) and \(f^{\prime \prime }(0)=-k(k-1)\) . From this, we conclude that \(f(x)\le kx\) , proving the upper bound (as \(f(x)\le 1\) is clear).
    For the lower bound, first suppose \(k\le 2\) . Then \((1-x)^k\le 1-x\) , so \(f(x)\ge x\ge kx/2\ge \frac{1}{2}\min \lbrace kx,1\rbrace\) . Now assume \(k\ge 2\) , and note that \(f^{\prime \prime \prime }(x)=k(k-1)(k-2)(1-x)^{k-3}\) is non-negative on \([0,1]\) . This means that \(f^{\prime \prime }(x)\ge -k(k-1)\) on \([0,1]\) , that \(f^{\prime }(x)\ge k-k(k-1)x\) on \([0,1]\) , and that \(f(x)\ge kx-(k(k-1)/2)x^2=kx(1-(k-1)x/2)\) on \([0,1]\) . If \((k-1)x\le 1\) , we get \(f(x)\ge kx/2\) . If \((k-1)x\ge 1\) , then \(kx\ge 1\) , and using \((1-x)^k\le e^{-kx}\) we get \(f(x)\ge 1-e^{-kx}\ge 1-1/e\ge 1/2\) . This completes the proof. □
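    A quick grid check of the two bounds of Lemma 3.11 for several values of \(k\) (illustrative only):

```python
# Grid check of Lemma 3.11 for several exponents k:
#   (1/2) * min(k*x, 1) <= 1 - (1-x)^k <= min(k*x, 1) on [0, 1].
for k in [1, 1.5, 2, 5, 20, 100]:
    for i in range(1001):
        x = i / 1000
        f = 1 - (1 - x) ** k
        lo = 0.5 * min(k * x, 1.0)
        hi = min(k * x, 1.0)
        assert lo <= f + 1e-12 and f <= hi + 1e-12, (k, x)

print("the bounds of Lemma 3.11 hold at all grid points")
```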
    Lemma 4.4 (Hellinger Distance of Disjoint Mixtures).
    Let \(\mu\) be a distribution over a finite support \(A\) , and for each \(a\in A\) , let \(\nu _0^a\) and \(\nu _1^a\) be two distributions over a finite support \(S_a\) . Let \(\nu _0^\mu\) and \(\nu _1^\mu\) denote the mixture distributions where \(a\leftarrow \mu\) is sampled, and then a sample is produced from \(\nu _0^a\) or \(\nu _1^a\) , respectively. Assume the sets \(S_a\) are disjoint for all \(a\in A\) . Then
    \(\begin{equation*} \text{h}^2(\nu _0^\mu ,\nu _1^\mu) =\mathbb {E}_{a\leftarrow \mu }[\text{h}^2(\nu _0^a,\nu _1^a)]. \end{equation*}\)
    Proof.
    Note that the squared-Hellinger distance is 1 minus the fidelity—that is, \(\text{h}^2(\mu _1,\mu _2)=1-F(\mu _1,\mu _2)\) , where \(F(\mu _1,\mu _2)=\sum _x\sqrt {\mu _1[x]\mu _2[x]}\) . (This is easy to check from the definition of \(\text{h}^2\) .) Now write
    \(\begin{align*} \text{h}^2(\nu _0^\mu ,\nu _1^\mu)&=1-\sum _{x\in \bigcup _{a} S_a} \sqrt {\nu _0^\mu [x]\nu _1^\mu [x]}\\ &=1-\sum _{a\in A}\sum _{x\in S_a} \sqrt {\mu [a]\nu _0^a[x]\mu [a]\nu _1^a[x]}\\ &=1-\mathbb {E}_{a\leftarrow \mu }\left[\sum _{x\in S_a} \sqrt {\nu _0^a[x]\nu _1^a[x]}\right]\\ &=\mathbb {E}_{a\leftarrow \mu }\left[1-\sum _{x\in S_a} \sqrt {\nu _0^a[x]\nu _1^a[x]}\right]\\ &=\mathbb {E}_{a\leftarrow \mu }\left[\text{h}^2(\nu _0^a,\nu _1^a)\right]. \end{align*}\)
     □
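    The identity is easy to verify on a small concrete example (the distributions below are arbitrary illustrative choices):

```python
# Concrete check of Lemma 4.4 with arbitrary example distributions: the supports
# {s1, s2} and {t1, t2} are disjoint, so h^2 of the mixtures equals the average
# of the per-a h^2 values.
import math

def h2(p, q):
    # Squared Hellinger distance = 1 - fidelity, for distributions given as dicts.
    return 1 - sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in set(p) | set(q))

mu = {"a1": 0.3, "a2": 0.7}
nu0 = {"a1": {"s1": 0.5, "s2": 0.5}, "a2": {"t1": 0.9, "t2": 0.1}}
nu1 = {"a1": {"s1": 0.2, "s2": 0.8}, "a2": {"t1": 0.4, "t2": 0.6}}

# Mixtures: sample a from mu, then sample from nu_b^a.
mix0 = {x: mu[a] * p for a in mu for x, p in nu0[a].items()}
mix1 = {x: mu[a] * p for a in mu for x, p in nu1[a].items()}

lhs = h2(mix0, mix1)
rhs = sum(mu[a] * h2(nu0[a], nu1[a]) for a in mu)
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-12
```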

    C Quantum Amplitude Estimation

    We show the following strengthening of Theorem 5.1, which follows from Brassard et al. [2002].
    Theorem C.1 (Amplitude Estimation).
    Suppose we have access to a unitary \(U\) (representing a quantum algorithm) which maps \(|0\rangle\) to \(|\psi \rangle\) , as well as access to a projective measurement \(\Pi\) , and we wish to estimate \(p:=\Vert \Pi |\psi \rangle \Vert _2^2\) (representing the probability the quantum algorithm accepts). Fix \(\epsilon ,\delta \in (0,1/2)\) . Then using at most \((100/\epsilon)\cdot \ln (1/\delta)\) controlled applications of \(U\) or \(U^\dagger\) and at most that many applications of \(I-2\Pi\) , we can output \(\tilde{p}\in [0,1]\) such that \(|\tilde{p}-p|\le \epsilon\) with probability at least \(1-\delta\) .
    Further, this can be tightened to a bound that depends on \(p\) , as follows. For any positive real number \(T\) , there is an algorithm which depends on \(\epsilon\) , \(\delta\) , and \(T\) (but not on \(p\) ) which uses at most \(T\) applications of the unitaries (as above) and outputs \(\tilde{p}\in [0,1]\) with the following guarantee: if \(T\) is at least \(\lfloor (100/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\cdot \ln (1/\delta) \rfloor\) , then \(|\tilde{p}-p|\le \epsilon\) with probability at least \(1-\delta\) .
    Proof.
    Brassard et al. [2002] showed that an algorithm which makes \(M\) controlled calls to the unitary \(U(I-2\mathinner {|{0}\rangle }\mathinner {\langle {0}|})U^{-1}(I-2\Pi)\) and one additional call to \(U\) can output \(\tilde{p}\) such that
    \(\begin{equation*} |\tilde{p}-p| \le \frac{2\pi \sqrt {p(1-p)}}{M}+\frac{\pi ^2}{M^2} \end{equation*}\)
    with probability at least \(8/\pi ^2\ge 4/5\) . If we pick \(M\) such that \(M\ge 8/\sqrt {\epsilon }\) and \(M\ge 8\sqrt {p}/\epsilon\) , then this is at most \((\pi /4+\pi ^2/64)\epsilon \le \epsilon\) . Note that \(M\) must be an integer, and that the number of applications of \(U\) or \(U^{-1}\) is \(2M+1\) . Hence, to get this success probability, it suffices to have \(T\ge 3+(16/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\) , or \(T\ge (19/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\) .
    To generalize to other success probabilities, we amplify this algorithm by repeating it \(2k+1\) times and returning the median estimate. The probability that the median is still wrong is the probability that at least \(k+1\) out of the \(2k+1\) estimates were wrong, which (writing \(q\le 1-8/\pi ^2\) for the probability that a single estimate is wrong) is at most
    \(\begin{align*} \sum _{i=1}^{k+1}\binom{2k+1}{k+1-i}q^{k+i}(1-q)^{k+1-i} &\le q^{k+1}(1-q)^{k}\sum _{i=1}^{k+1}\binom{2k+1}{k+1-i}\\ &=q^{k+1}(1-q)^{k}2^{2k} =q(1-(1-2q)^2)^k \le qe^{-k(1-2q)^2}. \end{align*}\)
    Hence, to get this below \(\delta\) , we just need \(k\ge (1/(1-2q)^2)\ln (q/\delta)\) , which (using \(q\le 1-8/\pi ^2\)) is implied by \(k\ge 2.6\ln (1/\delta)-4\) . Since \(k\) must be an integer, we can always choose it so that \(2k+1\) is at most \(5.2\ln (1/\delta)\) . Multiplying the per-run cost bound from before by this number of repetitions, we conclude that the total number of applications is at most \((100/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\cdot \ln (1/\delta)\) ; hence any \(T\) at least this large suffices, as desired. □
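    The median-amplification step can be illustrated by a small Monte Carlo simulation of the classical repetition alone (a sketch, not the quantum algorithm): each base run fails with probability \(q\), and the bound \(qe^{-k(1-2q)^2}\) dominates the empirical failure rate of the median.

```python
# Monte Carlo illustration of the median trick (classical repetition only):
# each base run fails with probability q, the median of 2k+1 runs fails only if
# at least k+1 runs fail, and this is bounded by q * exp(-k * (1 - 2q)^2).
import math
import random

random.seed(2)
q = 0.19                  # failure probability of one run, at most 1 - 8/pi^2
trials = 100000
for k in [2, 5, 10]:
    failures = 0
    for _ in range(trials):
        wrong = sum(random.random() < q for _ in range(2 * k + 1))
        failures += wrong >= k + 1         # then the median estimate is wrong
    empirical = failures / trials
    bound = q * math.exp(-k * (1 - 2 * q) ** 2)
    print(f"k={k:2d}  empirical failure={empirical:.5f}  bound={bound:.5f}")
    assert empirical <= bound
```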
