
A New Minimax Theorem for Randomized Algorithms

Published: 30 November 2023
Abstract

    The celebrated minimax principle of Yao says that for any Boolean-valued function f with finite domain, there is a distribution μ over the domain of f such that computing f to error ε against inputs from μ is just as hard as computing f to error ε on worst-case inputs. Notably, however, the distribution μ depends on the target error level ε: the hard distribution which is tight for bounded error might be trivial to solve to small bias, and the hard distribution which is tight for a small bias level might be far from tight for bounded error levels.
    In this work, we introduce a new type of minimax theorem which can provide a hard distribution μ that works for all bias levels at once. We show that this works for randomized query complexity, randomized communication complexity, some randomized circuit models, quantum query and communication complexities, approximate polynomial degree, and approximate logrank. We also prove an improved version of Impagliazzo’s hardcore lemma.
    Our proofs rely on two innovations over the classical approach of using Von Neumann’s minimax theorem or linear programming duality. First, we use Sion’s minimax theorem to prove a minimax theorem for ratios of bilinear functions representing the cost and score of algorithms.
    Second, we introduce a new way to analyze low-bias randomized algorithms by viewing them as “forecasting algorithms” evaluated by a certain proper scoring rule. The expected score of the forecasting version of a randomized algorithm appears to be a more fine-grained way of analyzing the bias of the algorithm. We show that such expected scores have many elegant mathematical properties—for example, they can be amplified linearly instead of quadratically. We anticipate forecasting algorithms will find use in future work in which a fine-grained analysis of small-bias algorithms is required.
    Appendices

    A Proofs Related to the Minimax Theorem

    Lemma 2.8 (An Upper Semicontinuous Function on a Compact Set Attains Its Max).
    Let \(X\) be a nonempty compact topological space, and let \(\phi : X\rightarrow \overline{{\mathbb {R}}}\) be a function. Then if \(\phi\) is upper semicontinuous, it attains its maximum, meaning there is some \(x\in X\) such that for all \(x^{\prime }\in X\) , \(\phi (x^{\prime })\le \phi (x)\) . Similarly, if \(\phi\) is lower semicontinuous, it attains its minimum.
    Proof.
    The lower semicontinuous case follows from the upper semicontinuous case simply by negating \(\phi\) , so we focus on the upper semicontinuous case. Let \(z=\sup _{x\in X}\phi (x)\) , where \(z\in \overline{{\mathbb {R}}}\) . Let \(x_0\) be any element of \(X\) . If \(\phi (x_0)=z\) , we are done, so assume \(\phi (x_0)\lt z\) ; in particular, \(z\gt -\infty\) . We define a sequence \(x_1,x_2,\dots\) as follows. If \(z\lt \infty\) , define \(x_{i}\) to be any element of \(X\) such that \(\phi (x_i)\gt z-1/i\) . If \(z=\infty\) , define \(x_i\) to be any element of \(X\) such that \(\phi (x_i)\gt i\) . Moreover, for each \(i\in \mathbb {N}\) , let \(U_i=\lbrace \,x\in X:\phi (x)\lt \phi (x_i)\rbrace\) . Note that any \(x\in X\) for which \(\phi (x)\lt z\) must be in \(U_i\) for some \(i\in \mathbb {N}\) ; hence, if the supremum \(z\) is not attained, the sets \(U_i\) form a cover for \(X\) (meaning \(\bigcup _{i\in \mathbb {N}}U_i=X\) ).
    The key claim is that the \(U_i\) sets are all open if \(\phi\) is upper semicontinuous. This is because if \(x\in U_i\) , then \(\phi (x)\lt \phi (x_i)\) , and by the definition of upper semicontinuity, there is a neighborhood \(U\) of \(x\) on which \(\phi (\cdot)\) is still less than \(\phi (x_i)\) ; this neighborhood is contained in \(U_i\) , so \(U_i\) is open. In this case, if the supremum \(z\) is not attained, the collection \(\lbrace U_i\rbrace _{i\in \mathbb {N}}\) is an open cover of \(X\) , and by the definition of compactness, it has a finite subcover. Let \(i\) be an index in this subcover for which \(\phi (x_i)\) is largest. Since every \(x\in X\) lies in some \(U_j\) of the subcover, we get \(\phi (x)\lt \phi (x_j)\le \phi (x_i)\) for all \(x\in X\) ; taking \(x=x_i\) gives a contradiction. Hence, the supremum \(z\) must be attained as a maximum, as desired. □
    Lemma 2.9 (A Pointwise Infimum of Upper Semicontinuous Functions Is Upper Semicontinuous).
    Let \(X\) be a topological space, let \(I\) be a set, and let \(\lbrace \phi _i\rbrace _{i\in I}\) be a collection of functions \(\phi _i : X\rightarrow \overline{{\mathbb {R}}}\) . Then if each \(\phi _i\) is upper semicontinuous, the function \(\phi (x)=\inf _{i\in I}\phi _i(x)\) is also upper semicontinuous. Similarly, if each \(\phi _i\) is lower semicontinuous, the pointwise supremum is lower semicontinuous.
    Proof.
    Note that the case where \(\phi _i\) are all lower semicontinuous follows from the case where they are all upper semicontinuous simply by negating the functions, since negation flips upper and lower semicontinuity and flips infimums and supremums. We focus on the case where \(\phi _i\) are all upper semicontinuous.
    Fix \(x\in X\) . If \(\phi (x)=\infty\) , \(\phi\) is upper semicontinuous at \(x\) by definition. If \(\phi (x)\lt \infty\) , fix any \(y\gt \phi (x)\) . By the definition of \(\phi (x)\) as an infimum, there is some \(i\in I\) such that \(\phi _i(x)\lt y\) . By the upper semicontinuity of \(\phi _i(\cdot)\) , there is a neighborhood \(U\) of \(x\) such that for all \(x^{\prime }\in U\) , we have \(\phi _i(x^{\prime })\lt y\) . Then for all \(x^{\prime }\in U\) , we clearly have \(\phi (x^{\prime })=\inf _{i\in I}\phi _i(x^{\prime })\lt y\) . Thus, \(\phi\) is upper semicontinuous at \(x\) , as desired. □
    Lemma A.1.
    Let \(V\) be a real vector space, and let \(X\subseteq V\) . The convex hull of \(X\) is the set of all \(v\in V\) which can be written as a convex combination of vectors in \(X\) —that is, \(v\) for which there exist \(k\in \mathbb {N}\) , \(x_1,x_2,\dots ,x_k\in X\) , and \(\lambda _1,\lambda _2,\dots ,\lambda _k\in [0,1]\) with \(\lambda _1+\lambda _2+\dots +\lambda _k=1\) such that \(v=\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k\) .
    Proof.
    This is a well-known characterization of the convex hull, which can be shown as follows: let \(Y\) be the set of all finite convex combinations of points in \(X\) —that is, \(Y\) contains all points in \(V\) of the form \(\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k\) , where \(k\in \mathbb {N}\) , \(x_1,x_2,\dots ,x_k\in X\) , and \(\lambda _1,\lambda _2,\dots , \lambda _k\in [0,1]\) with \(\lambda _1+\lambda _2+\dots +\lambda _k=1\) . Then \(Y\) is clearly convex, since for all \(y_1,y_2\in Y\) and \(\lambda \in (0,1)\) , we know that \(y_1\) and \(y_2\) are finite convex combinations of points in \(X\) , meaning that \(\lambda y_1+(1-\lambda)y_2\) is also a finite convex combination of points in \(X\) . Furthermore, if \(Z\) is any other convex set containing \(X\) , then it is easy to show by induction that \(Z\) contains all convex combinations of \(k\) points in \(X\) for each \(k\in \mathbb {N}\) ; hence, \(Z\) must be a superset of \(Y\) . It follows that \(\operatorname{Conv}(X)\) , the intersection of all convex sets containing \(X\) , must exactly equal \(Y\) . □
    Lemma 2.10 (Quasiconvex Functions on Convex Hulls).
    Let \(V\) be a real vector space, let \(X\subseteq V\) , and let \(\phi :\operatorname{Conv}(X)\rightarrow \overline{{\mathbb {R}}}\) be a function. If \(\phi\) is quasiconvex, then
    \(\begin{equation*} \sup _{x\in \operatorname{Conv}(X)}\phi (x)=\sup _{x\in X}\phi (x). \end{equation*}\)
    Similarly, if \(\phi\) is quasiconcave, then
    \(\begin{equation*} \inf _{x\in \operatorname{Conv}(X)}\phi (x)=\inf _{x\in X}\phi (x). \end{equation*}\)
    Proof.
    The quasiconcave case follows from the quasiconvex case by negating \(\phi\) ; hence, it suffices to prove the quasiconvex case. It is clear that \(\sup _{x\in \operatorname{Conv}(X)}\phi (x)\) is at least \(\sup _{x\in X}\phi (x)\) , so we only need to show the latter is at least the former. To this end, it suffices to show that for every \(\hat{x}\in \operatorname{Conv}(X)\) , there is some \(x\in X\) with \(\phi (x)\ge \phi (\hat{x})\) ; taking the supremum over \(\hat{x}\) then gives the desired inequality. Fix such an \(\hat{x}\) .
    Using Lemma A.1, we can write \(\hat{x}\in \operatorname{Conv}(X)\) as \(\hat{x}=\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k\) , with \(k\in \mathbb {N}\) , \(x_1,x_2,\dots , x_k\in X\) , and \(\lambda _1,\lambda _2,\dots ,\lambda _k\in [0,1]\) with \(\lambda _1+\lambda _2+\dots +\lambda _k=1\) . Furthermore, assume that \(\lambda _i\gt 0\) for each \(i\in [k]\) (terms with \(\lambda _i=0\) can simply be dropped from the combination). Now, note that by quasiconvexity, we have \(\phi (\lambda x_1+(1-\lambda)x_2)\le \max \lbrace \phi (x_1),\phi (x_2)\rbrace\) , and an easy induction gives \(\phi (\lambda _1 x_1+\lambda _2 x_2+\dots +\lambda _k x_k) \le \max \lbrace \phi (x_1),\phi (x_2),\dots ,\phi (x_k)\rbrace\) . Hence, there is some \(x\in X\) such that \(\phi (x)\ge \phi (\hat{x})\) , as desired. □
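    As an illustrative numerical sanity check of Lemma 2.10 (not part of the proof, and using an arbitrary example: the Euclidean norm on \(\mathbb {R}^2\), which is convex and hence quasiconvex), random convex combinations of a finite point set never exceed the maximum of the function over the points themselves:

```python
# Hypothetical sanity check of Lemma 2.10: for a quasiconvex phi (here the
# Euclidean norm), the value at any convex combination of points of X never
# exceeds the maximum over X.
import math
import random

random.seed(0)

def phi(v):
    return math.hypot(v[0], v[1])  # Euclidean norm on R^2

X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
max_over_X = max(phi(x) for x in X)

for _ in range(10000):
    weights = [random.random() for _ in X]
    total = sum(weights)
    lam = [w / total for w in weights]          # a random convex combination
    v = (sum(l * x[0] for l, x in zip(lam, X)),
         sum(l * x[1] for l, x in zip(lam, X)))
    assert phi(v) <= max_over_X + 1e-12

print("sup over sampled points of Conv(X) did not exceed sup over X")
```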
    Lemma 2.15.
    Let \(V\) be a real topological vector space, and let \(X\subseteq V\) be convex. For a function \(\psi : X\rightarrow \overline{{\mathbb {R}}}\) , let \(\psi ^{+}\) denote the function \(\psi ^{+}(x)=\max \lbrace \psi (x),0\rbrace\) . Then this operation on \(\psi\) preserves convexity, quasiconvexity, quasiconcavity, upper semicontinuity, and lower semicontinuity, but not concavity.
    We actually prove a stronger statement, where the maximum is taken with an arbitrary constant.
    Lemma A.2.
    Let \(V\) be a real topological vector space, and let \(X\subseteq V\) be convex. Let \(\psi : X\rightarrow \overline{{\mathbb {R}}}\) be a function, let \(c\in {\mathbb {R}}\) be a constant, and let \(\psi ^{\prime }: X\rightarrow \overline{{\mathbb {R}}}\) be the function \(\psi ^{\prime }(x)=\max \lbrace \psi (x),c\rbrace\) . Then if \(\psi\) is convex, \(\psi ^{\prime }\) is convex; if \(\psi\) is quasiconvex, \(\psi ^{\prime }\) is quasiconvex; if \(\psi\) is quasiconcave, \(\psi ^{\prime }\) is quasiconcave; if \(\psi\) is upper semicontinuous, \(\psi ^{\prime }\) is upper semicontinuous; and if \(\psi\) is lower semicontinuous, \(\psi ^{\prime }\) is lower semicontinuous.
    Proof.
    Let \(x,y\in X\) , and let \(\lambda \in (0,1)\) . Then
    \(\begin{equation*} \psi ^{\prime }(\lambda x+(1-\lambda)y)=\max \lbrace \psi (\lambda x+(1-\lambda)y),c\rbrace . \end{equation*}\)
    If this maximum equals \(c\) , it is certainly at most \(\lambda \max \lbrace \psi (x),c\rbrace +(1-\lambda)\max \lbrace \psi (y),c\rbrace\) , since these two latter maximums are each at least \(c\) . Hence, the inequalities for convexity and quasiconvexity always hold when the original maximum equals \(c\) . Alternatively, if \(\max \lbrace \psi (\lambda x+(1-\lambda)y),c\rbrace =\psi (\lambda x+(1-\lambda)y)\) , then using \(\psi (x)\le \psi ^{\prime }(x)\) and \(\psi (y)\le \psi ^{\prime }(y)\) , we see that convexity of \(\psi\) gives the inequality for convexity of \(\psi ^{\prime }\) , and quasiconvexity of \(\psi\) gives the inequality for quasiconvexity of \(\psi ^{\prime }\) .
    Next, suppose \(\psi\) is quasiconcave. Without loss of generality, say that \(\psi (x)\le \psi (y)\) . Then by quasiconcavity of \(\psi\) , we have \(\psi (\lambda x+(1-\lambda)y)\ge \min \lbrace \psi (x),\psi (y)\rbrace =\psi (x)\) , so \(\psi ^{\prime }(\lambda x+(1-\lambda)y)=\max \lbrace \psi (\lambda x+(1-\lambda)y),c\rbrace \ge \max \lbrace \psi (x),c\rbrace =\psi ^{\prime }(x)\ge \min \lbrace \psi ^{\prime }(x),\psi ^{\prime }(y)\rbrace\) , and \(\psi ^{\prime }\) is quasiconcave.
    Preservation of lower semicontinuity follows from Lemma 2.9, where we note that \(c\) is continuous as a function from \(X\) to \(\overline{{\mathbb {R}}}\) . It remains to show upper semicontinuity is preserved. Suppose \(\psi\) is upper semicontinuous, and let \(x\in X\) . If \(\psi ^{\prime }(x)=\infty\) , upper semicontinuity at \(x\) vacuously holds. Otherwise, fix some \(y\gt \psi ^{\prime }(x)\) . Then \(\psi (x) \le \psi ^{\prime }(x) \lt y\) and upper semicontinuity gives us a neighborhood \(U\) of \(x\) on which \(\psi (\cdot)\) is less than \(y\) . And since \(\psi ^{\prime }(x)\ge c\) , we have \(y\gt c\) so \(\psi ^{\prime }(\cdot)=\max \lbrace c,\psi (\cdot)\rbrace \lt y\) on \(U\) . Hence, \(\psi ^{\prime }\) is upper semicontinuous. □
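    The following is a small numerical illustration of Lemma A.2 (a sketch with arbitrarily chosen \(\psi\) and \(c\), not part of the proof): taking a pointwise maximum with a constant preserves convexity and quasiconcavity along randomly sampled segments.

```python
# Illustrative check of Lemma A.2 in one dimension with arbitrary psi and c:
# psi'(t) = max(psi(t), c) stays convex (resp. quasiconcave) on sampled segments.
import random

random.seed(1)
c = 0.25
psi_convex = lambda t: (t - 0.3) ** 2      # convex on [0, 1]
psi_qconc = lambda t: min(t, 1 - t)        # concave, hence quasiconcave
p_prime = lambda t: max(psi_convex(t), c)  # should remain convex
g_prime = lambda t: max(psi_qconc(t), c)   # should remain quasiconcave

for _ in range(100000):
    x, y, lam = random.random(), random.random(), random.random()
    z = lam * x + (1 - lam) * y
    assert p_prime(z) <= lam * p_prime(x) + (1 - lam) * p_prime(y) + 1e-12
    assert g_prime(z) >= min(g_prime(x), g_prime(y)) - 1e-12

print("max{psi, c} remained convex / quasiconcave on all sampled segments")
```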
    Theorem A.3 (Sion’s Minimax [Sion 1958]).
    Let \(V_1\) and \(V_2\) be real topological vector spaces, and let \(X\subseteq V_1\) and \(Y\subseteq V_2\) be convex. Let \(\alpha : X\times Y\rightarrow {\mathbb {R}}\) be semicontinuous and quasisaddle. If either \(X\) or \(Y\) is compact, then
    \(\begin{equation*} \inf _{x\in X}\sup _{y\in Y}\alpha (x,y) =\sup _{y\in Y}\inf _{x\in X}\alpha (x,y). \end{equation*}\)
    Theorem 2.11 (Sion’s Minimax for Extended Reals).
    Let \(V_1\) and \(V_2\) be real topological vector spaces, and let \(X\subseteq V_1\) and \(Y\subseteq V_2\) be convex. Let \(\alpha : X\times Y\rightarrow \overline{{\mathbb {R}}}\) be semicontinuous and quasisaddle. If either \(X\) or \(Y\) is compact, then
    \(\begin{equation*} \inf _{x\in X}\sup _{y\in Y}\alpha (x,y) =\sup _{y\in Y}\inf _{x\in X}\alpha (x,y). \end{equation*}\)
    Proof.
    First, note that the inf-sup is always at least the sup-inf. This is because these expressions can be thought of as a game between two players, one choosing \(x\) and trying to minimize \(\alpha (x,y)\) , and the other choosing \(y\) and trying to maximize \(\alpha (x,y)\) ; in the inf-sup, the sup player chooses \(y\) after already knowing \(x\) , and therefore has more information and is better positioned to maximize \(\alpha (x,y)\) than in the sup-inf, where the sup player must commit to \(y\) first.
    Now, let
    \(\begin{equation*} a:= \sup _{y\in Y}\inf _{x\in X}\alpha (x,y),\qquad b:= \inf _{x\in X}\sup _{y\in Y}\alpha (x,y). \end{equation*}\)
    We have \(a,b\in \overline{{\mathbb {R}}}\) , and \(a\le b\) . We wish to show \(a=b\) . Suppose toward a contradiction that \(a\lt b\) . Then we can pick \(a^{\prime },b^{\prime }\in {\mathbb {R}}\) such that \(a\lt a^{\prime }\lt b^{\prime }\lt b\) . We then define \(\alpha ^{\prime } : X\times Y\rightarrow {\mathbb {R}}\) by \(\alpha ^{\prime }(x,y):= a^{\prime }\) if \(\alpha (x,y)\le a^{\prime }\) , \(\alpha ^{\prime }(x,y):= b^{\prime }\) if \(\alpha (x,y)\ge b^{\prime }\) , and \(\alpha ^{\prime }(x,y):=\alpha (x,y)\) if \(\alpha (x,y)\in [a^{\prime },b^{\prime }]\) .
    Note that \(\alpha ^{\prime }(x,y)=\max \lbrace a^{\prime },\min \lbrace b^{\prime },\alpha (x,y)\rbrace \rbrace\) . By Lemma A.2, we know that taking a maximum with a constant preserves quasiconvexity, quasiconcavity, and upper and lower semicontinuities. By negating the function, it also follows that taking a minimum with a constant preserves these properties. From this, it follows that \(\alpha ^{\prime }\) is quasisaddle and semicontinuous, since \(\alpha\) has these properties.
    Now, since \(a=\sup _{y\in Y}\inf _{x\in X}\alpha (x,y)\) and since \(a^{\prime }\gt a\) , we know that for all \(y\in Y\) , there exists some \(x\in X\) for which \(\alpha (x,y)\lt a^{\prime }\) . This means that for all \(y\in Y\) , there exists \(x\in X\) for which \(\alpha ^{\prime }(x,y)=a^{\prime }\) . Hence, \(\sup _{y\in Y}\inf _{x\in X}\alpha ^{\prime }(x,y)=a^{\prime }\) . Similarly, since \(b=\inf _{x\in X}\sup _{y\in Y}\alpha (x,y)\) and since \(b^{\prime }\lt b\) , we know that for all \(x\in X\) , there exists some \(y\in Y\) for which \(\alpha (x,y)\gt b^{\prime }\) . This means that for all \(x\in X\) , there exists \(y\in Y\) for which \(\alpha ^{\prime }(x,y)=b^{\prime }\) . Hence, \(\inf _{x\in X}\sup _{y\in Y}\alpha ^{\prime }(x,y)=b^{\prime }\) . By Theorem A.3, we then have
    \(\begin{equation*} b^{\prime }=\inf _{x\in X}\sup _{y\in Y}\alpha ^{\prime }(x,y) =\sup _{y\in Y}\inf _{x\in X}\alpha ^{\prime }(x,y)=a^{\prime }. \end{equation*}\)
    But this is a contradiction, since we picked \(a^{\prime }\lt b^{\prime }\) . We conclude that we must have had \(a=b\) to begin with, as desired. □
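    For intuition, the simplest special case of this equality is the finite bilinear one (von Neumann's minimax theorem), where \(\alpha (x,y)=x^{\mathsf {T}}Ay\) over probability simplices. The sketch below, with an arbitrarily chosen payoff matrix, computes both sides by linear programming and checks that they agree; it illustrates only the statement of the minimax equality, not the ratio-of-bilinear-forms setting used in the body of the article.

```python
# Illustrative sketch: for a finite bilinear game alpha(x, y) = x^T A y over
# probability simplices (von Neumann's special case), inf-sup equals sup-inf.
# The payoff matrix A is an arbitrary example.
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, -1.0, 0.5],
              [0.0, 2.0, -2.0]])
m, n = A.shape

# inf_x sup_y x^T A y: minimize v subject to (A^T x)_j <= v, x in the simplex.
c1 = np.append(np.zeros(m), 1.0)
res_min = linprog(c1,
                  A_ub=np.hstack([A.T, -np.ones((n, 1))]), b_ub=np.zeros(n),
                  A_eq=[np.append(np.ones(m), 0.0)], b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])

# sup_y inf_x x^T A y: maximize w subject to (A y)_i >= w, y in the simplex.
c2 = np.append(np.zeros(n), -1.0)
res_max = linprog(c2,
                  A_ub=np.hstack([-A, np.ones((m, 1))]), b_ub=np.zeros(m),
                  A_eq=[np.append(np.ones(n), 0.0)], b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])

inf_sup, sup_inf = res_min.fun, -res_max.fun
print(inf_sup, sup_inf)      # the two values agree up to solver tolerance
assert abs(inf_sup - sup_inf) < 1e-6
```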

    B Distance Measures

    Lemma 3.3.
    \(\operatorname{hs}\) , \(\operatorname{Brier}\) , and \(\operatorname{ls}\) are proper scoring rules. \(\operatorname{bias}\) is a scoring rule which is not proper.
    Proof.
    It is clear that all of the functions from Definition 3.2 are smooth on \((0,1)\) and increasing on \([0,1]\) , where we interpret \(\operatorname{hs}(0)=\operatorname{ls}(0)=-\infty\) . It is also clear that all these functions evaluate to 1 at 1 and to 0 at \(1/2\) . It remains to show that \(\operatorname{Brier}\) , \(\operatorname{ls}\) , and \(\operatorname{hs}\) are proper. To do so, we need to show that \(ps(q)+(1-p)s(1-q)\) is uniquely optimized at \(q=p\) when \(s\) is one of these functions and \(p\in (0,1)\) . Fix such \(p\in (0,1)\) , and observe that the critical points of the expression we wish to maximize are the points \(q\) such that \(ps^{\prime }(q)=(1-p)s^{\prime }(1-q)\) .
    For \(\operatorname{ls}(q)=1-\log (1/q)=1+(\log e)\ln q\) , the critical points \(q\) satisfy \((\log e)p/q=(\log e)(1-p)/(1-q)\) , or \(p/(1-p)=q/(1-q)\) . Noting that the function \(x/(1-x)\) is increasing on \((0,1)\) , and hence injective on \((0,1)\) , we conclude that the only critical point is \(q=p\) . Moreover, at the boundaries \(q=0\) and \(q=1\) , we clearly have \(p\operatorname{ls}(q)+(1-p)\operatorname{ls}(1-q)=-\infty\) , whereas in the interior the expression is finite. Hence, the unique maximum must occur at \(q=p\) .
    For \(\operatorname{hs}(q)=1-\sqrt {(1-q)/q}=1-\sqrt {1/q-1}\) , we have \(\operatorname{hs}^{\prime }(q)=1/\big (2\sqrt {q^3(1-q)}\big)\) , so the critical points \(q\) satisfy \(p/\big (2\sqrt {q^3(1-q)}\big)=(1-p)/\big (2\sqrt {(1-q)^3q}\big)\) , or \(p/q=(1-p)/(1-q)\) , which once again only occurs at \(q=p\) . At the boundaries, we once again have \(p\operatorname{hs}(q)+(1-p)\operatorname{hs}(1-q)=-\infty\) for \(q=0\) or \(q=1\) , so the unique maximum occurs at \(q=p\) .
    Finally, for \(\operatorname{Brier}(q)=1-4(1-q)^2=-4q^2+8q-3\) , we have \(\operatorname{Brier}^{\prime }(q)=8(1-q)\) , so the critical points \(q\) satisfy \(8p(1-q)=8(1-p)q\) , which again implies \(q=p\) . This time, the boundary points are finite, but we can use the second-order condition: the second derivative of \(p\operatorname{Brier}(q)+(1-p)\operatorname{Brier}(1-q)\) is \(p\operatorname{Brier}^{\prime \prime }(q)+(1-p)\operatorname{Brier}^{\prime \prime }(1-q)\) . Noting that \(\operatorname{Brier}^{\prime \prime }(q)=-8\) , this is \(-8p-8(1-p)=-8\lt 0\) . Hence, the critical point is a maximum, and since it is unique (with the boundaries 0 and 1 not being critical even if we extend the domain of the function), we conclude it is the unique maximum. □
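    A simple grid search illustrates properness numerically (this is a sketch, not part of the proof). The formulas for \(\operatorname{hs}\), \(\operatorname{Brier}\), and \(\operatorname{ls}\) are taken from the computations above; the formula \(\operatorname{bias}(q)=2q-1\) is an assumption made here only for contrast, since the definition of \(\operatorname{bias}\) is not reproduced in this appendix.

```python
# Grid check of Lemma 3.3: p*s(q) + (1-p)*s(1-q) is maximized at q = p for the
# proper rules hs, Brier, ls. The bias rule below uses an ASSUMED formula
# (2q - 1, not stated in this appendix) purely to illustrate a non-proper rule.
import math

hs = lambda q: 1 - math.sqrt((1 - q) / q)
brier = lambda q: 1 - 4 * (1 - q) ** 2
ls = lambda q: 1 - math.log2(1 / q)
bias = lambda q: 2 * q - 1               # assumed formula, for contrast only

def best_report(s, p, grid=2000):
    # The report q maximizing the forecaster's expected score when Pr[1] = p.
    qs = [(i + 0.5) / grid for i in range(grid)]
    return max(qs, key=lambda q: p * s(q) + (1 - p) * s(1 - q))

for p in [0.1, 0.3, 0.7, 0.9]:
    for name, s in [("hs", hs), ("Brier", brier), ("ls", ls), ("bias", bias)]:
        print(f"p={p:.1f}  {name:5s}  best report q = {best_report(s, p):.4f}")
# hs, Brier, ls report q close to p; the assumed bias rule reports 0 or 1.
```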
    Lemma B.1.
    For any \(x\in [0,1]\) , we have
    \(\begin{equation*} \frac{x^2}{2}\le 1-\sqrt {1-x^2}\le 1-H\left(\frac{1+x}{2}\right) \le x^2\le x. \end{equation*}\)
    Additionally, \(x^2\) and \(1-\sqrt {1-x}\) are convex functions on \([0,1]\) .
    Proof.
    The inequality \(x^2\le x\) is clearly true for \(x\in [0,1]\) . Set \(f(x) = 1 - \sqrt {1-x^2}\) and \(g(x) = 1 - H(\frac{1+x}{2})\) . We want to show that \(x^2/2 \le f(x) \le g(x) \le x^2\) for all \(x \in [0,1]\) . We prove each of these inequalities in order.
    The function \(f\) satisfies \(f(0) = 0 = (0)^2/2\) and has derivative \(f^{\prime }(x) = x/\sqrt {1-x^2}\) which is greater than \(x = (x^2/2)^{\prime }\) when \(0 \lt x \lt 1\) , so \(f\) grows faster than \(x^2/2\) over that interval. Therefore, \(x^2/2 \le f(x)\) for all \(x \in [0,1]\) .
    The functions \(f\) and \(g\) satisfy \(f(0) = g(0) = 0\) and \(f^{\prime }(0) = g^{\prime }(0) = 0,\) and their second derivatives are \(f^{\prime \prime }(x) = (1-x^2)^{-3/2}\) and \(g^{\prime \prime }(x) = \big (\ln 2 \cdot (1-x^2)\big)^{-1}\) . So \(f^{\prime \prime }(x) \gt g^{\prime \prime }(x)\) if and only if \(\sqrt {1-x^2} \lt \ln 2\) , which holds if and only if \(|x| \gt \sqrt {1 - \ln ^2 2} \approx 0.72\) . Therefore, \(f\) and \(g\) have only one intersection point in \(\mathbb {R}_{\gt 0}\) and \(f(x) \lt g(x)\) for all \(x\) between 0 and that intersection point. Since \(f(1) = 1 = g(1)\) , this means that \(f(x) \le g(x)\) for all \(x \in [0,1]\) .
    The function \(x^2\) also has value and first derivative equal to 0 at \(x=0\) . In addition, \(g^{\prime \prime }(x) \gt (x^2)^{\prime \prime }\) if and only if \(\big (\ln 2 \cdot (1-x^2)\big)^{-1} \gt 2\) , which holds if and only if \(|x| \gt \sqrt { 1 - 1/\ln 4} \approx 0.53\) . So \(g\) and \(x^2\) have only one intersection point in \(\mathbb {R}_{\gt 0}\) and \(g(x) \lt x^2\) for all points \(x\) between 0 and this intersection point. Since \(g(1) = 1 = (1)^2\) , we then have \(g(x) \le x^2\) for all \(x \in [0,1]\) .
    Finally, the convexity of \(x^2\) and \(1-\sqrt {1-x}\) on \([0,1]\) follows immediately from the fact that their second derivatives are both positive on \((0,1)\) . □
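    The chain of inequalities can also be checked numerically on a grid (an illustrative sketch, not a substitute for the proof):

```python
# Grid check of Lemma B.1: x^2/2 <= 1 - sqrt(1-x^2) <= 1 - H((1+x)/2) <= x^2 <= x
# on [0, 1], where H is the binary entropy (base 2).
import math

def H(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for i in range(1001):
    x = i / 1000
    a = x * x / 2
    b = 1 - math.sqrt(1 - x * x)
    c = 1 - H((1 + x) / 2)
    d = x * x
    assert a <= b + 1e-12 and b <= c + 1e-12 and c <= d + 1e-12 and d <= x + 1e-12

print("the chain of Lemma B.1 holds at all grid points")
```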
    Lemma 3.6 (Relations between Distance Measures).
    When applied to fixed \(\nu _0\) , \(\nu _1\) , and \(w\) , the distance measures satisfy
    \(\begin{equation*} \frac{\operatorname{S}^2}{2}\le 1-\sqrt {1-\operatorname{S}^2} \le \text{h}^2\le \operatorname{JS}\le \operatorname{S}^2 \end{equation*}\)
    as well as
    \(\begin{equation*} \Delta ^2\le \operatorname{S}^2\le \Delta . \end{equation*}\)
    We also have \(\operatorname{JS}\le \text{h}^2/\ln 2\) and \(\operatorname{S}^2\le (\ln 4)\operatorname{JS}\) .
    Proof.
    We use Lemma B.1. The inequality \(\operatorname{S}^2/2 \le 1 - \sqrt {1 - \operatorname{S}^2}\) and the chain \(\text{h}^2\le \operatorname{JS}\le \operatorname{S}^2\le \Delta\) follow from the inequalities there, whereas the inequalities \(\Delta ^2\le \operatorname{S}^2\) and \(1-\sqrt {1-\operatorname{S}^2}\le \text{h}^2\) follow from Jensen’s inequality combined with the convexity of \(x^2\) and \(1-\sqrt {1-x}\) .
    Finally, to show the inequality \(\operatorname{JS}\le \text{h}^2/\ln 2,\) it suffices to bound the ratio \((1-H((1+x)/2))/(1-\sqrt {1-x^2})\) by \(1/\ln 2\) ; since this ratio is decreasing in \(x\) , we only need to compute its limit as \(x\rightarrow 0\) . Using \(1-H((1+x)/2)=x^2/\ln 4+O(x^4)\) and \(1-\sqrt {1-x^2}=x^2/2+O(x^4)\) , the limit is \(1/\ln 2\) , so the ratio is always at most \(1/\ln 2\) . Similarly, to show the inequality \(\operatorname{S}^2\le (\ln 4)\operatorname{JS}\) , it suffices to bound the ratio \((1-H((1+x)/2))/x^2\) from below by \(1/\ln 4\) ; this ratio is increasing in \(x\) , so we only need its limit as \(x\rightarrow 0\) , which is \(1/\ln 4\) by the expansion above. □
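    The two scalar inequalities used in the last paragraph, in the same parametrization by \(x\), can be checked numerically as follows (an illustrative sketch, not part of the proof):

```python
# Grid check of the two scalar inequalities behind JS <= h^2/ln 2 and
# S^2 <= (ln 4) JS in the parametrization by x used above:
#   1 - H((1+x)/2) <= (1 - sqrt(1-x^2)) / ln 2   and   x^2 <= ln(4) * (1 - H((1+x)/2)).
import math

def H(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for i in range(1001):
    x = i / 1000
    js_like = 1 - H((1 + x) / 2)
    h2_like = 1 - math.sqrt(1 - x * x)
    assert js_like <= h2_like / math.log(2) + 1e-12
    assert x * x <= math.log(4) * js_like + 1e-12

print("both scalar inequalities hold at all grid points")
```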
    Lemma 3.11.
    If \(x\in [0,1]\) and \(k\in [1,\infty)\) , we have
    \(\begin{equation*} \frac{1}{2}\min \lbrace kx,1\rbrace \le 1-(1-x)^k\le \min \lbrace kx,1\rbrace . \end{equation*}\)
    Proof.
    Set \(f(x):= 1-(1-x)^k\) . Clearly, when \(x\in [0,1]\) , we have \(f(x)\in [0,1]\) , so \(f:[0,1]\rightarrow [0,1]\) . Note \(f(0)=0\) , \(f(1)=1\) , and that \(f(x)\) is increasing on \([0,1]\) . If \(k=1\) , we have \(f(x)=x\) , and the inequalities trivially hold; therefore, assume \(k\gt 1\) . Then \(f^{\prime }(x)=k(1-x)^{k-1}\) and \(f^{\prime \prime }(x)=-k(k-1)(1-x)^{k-2}\) , meaning that \(f(x)\) is concave on \([0,1]\) ; we also have \(f^{\prime }(0)=k\) and \(f^{\prime \prime }(0)=-k(k-1)\) . From this, we conclude that \(f(x)\le kx\) , proving the upper bound (as \(f(x)\le 1\) is clear).
    For the lower bound, first suppose \(k\le 2\) . Then \((1-x)^k\le 1-x\) , so \(f(x)\ge x\ge kx/2\ge \frac{1}{2}\min \lbrace kx,1\rbrace\) . Now assume \(k\ge 2\) , and note that \(f^{\prime \prime \prime }(x)=k(k-1)(k-2)(1-x)^{k-3}\) is non-negative on \([0,1]\) . This means that \(f^{\prime \prime }(x)\ge -k(k-1)\) on \([0,1]\) , that \(f^{\prime }(x)\ge k-k(k-1)x\) on \([0,1]\) , and that \(f(x)\ge kx-(k(k-1)/2)x^2=kx(1-(k-1)x/2)\) on \([0,1]\) . If \((k-1)x\le 1\) , we get \(f(x)\ge kx/2\) . If \((k-1)x\ge 1\) , then \(kx\ge 1\) , and using \((1-x)^k\le e^{-kx}\) we get \(f(x)\ge 1-e^{-kx}\ge 1-1/e\ge 1/2\) . This completes the proof. □
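    A quick grid check of the two bounds of Lemma 3.11 for several values of \(k\) (illustrative only):

```python
# Grid check of Lemma 3.11 for several exponents k:
#   (1/2) * min(k*x, 1) <= 1 - (1-x)^k <= min(k*x, 1) on [0, 1].
for k in [1, 1.5, 2, 5, 20, 100]:
    for i in range(1001):
        x = i / 1000
        f = 1 - (1 - x) ** k
        lo = 0.5 * min(k * x, 1.0)
        hi = min(k * x, 1.0)
        assert lo <= f + 1e-12 and f <= hi + 1e-12, (k, x)

print("the bounds of Lemma 3.11 hold at all grid points")
```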
    Lemma 4.4 (Hellinger Distance of Disjoint Mixtures).
    Let \(\mu\) be a distribution over a finite support \(A\) , and for each \(a\in A\) , let \(\nu _0^a\) and \(\nu _1^a\) be two distributions over a finite support \(S_a\) . Let \(\nu _0^\mu\) and \(\nu _1^\mu\) denote the mixture distributions where \(a\leftarrow \mu\) is sampled, and then a sample is produced from \(\nu _0^a\) or \(\nu _1^a\) , respectively. Assume the sets \(S_a\) are disjoint for all \(a\in A\) . Then
    \(\begin{equation*} \text{h}^2(\nu _0^\mu ,\nu _1^\mu) =\mathbb {E}_{a\leftarrow \mu }[\text{h}^2(\nu _0^a,\nu _1^a)]. \end{equation*}\)
    Proof.
    Note that the squared-Hellinger distance is 1 minus the fidelity—that is, \(\text{h}^2(\mu _1,\mu _2)=1-F(\mu _1,\mu _2)\) , where \(F(\mu _1,\mu _2)=\sum _x\sqrt {\mu _1[x]\mu _2[x]}\) . (This is easy to check from the definition of \(\text{h}^2\) .) Now write
    \(\begin{align*} \text{h}^2(\nu _0^\mu ,\nu _1^\mu)&=1-\sum _{x\in \bigcup _{a} S_a} \sqrt {\nu _0^\mu [x]\nu _1^\mu [x]}\\ &=1-\sum _{a\in A}\sum _{x\in S_a} \sqrt {\mu [a]\nu _0^a[x]\mu [a]\nu _1^a[x]}\\ &=1-\mathbb {E}_{a\leftarrow \mu }\left[\sum _{x\in S_a} \sqrt {\nu _0^a[x]\nu _1^a[x]}\right]\\ &=\mathbb {E}_{a\leftarrow \mu }\left[1-\sum _{x\in S_a} \sqrt {\nu _0^a[x]\nu _1^a[x]}\right]\\ &=\mathbb {E}_{a\leftarrow \mu }\left[\text{h}^2(\nu _0^a,\nu _1^a)\right]. \end{align*}\)
     □
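    The identity is easy to verify on a small concrete example (the distributions below are arbitrary illustrative choices):

```python
# Concrete check of Lemma 4.4 with arbitrary example distributions: the supports
# {s1, s2} and {t1, t2} are disjoint, so h^2 of the mixtures equals the average
# of the per-a h^2 values.
import math

def h2(p, q):
    # Squared Hellinger distance = 1 - fidelity, for distributions given as dicts.
    return 1 - sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in set(p) | set(q))

mu = {"a1": 0.3, "a2": 0.7}
nu0 = {"a1": {"s1": 0.5, "s2": 0.5}, "a2": {"t1": 0.9, "t2": 0.1}}
nu1 = {"a1": {"s1": 0.2, "s2": 0.8}, "a2": {"t1": 0.4, "t2": 0.6}}

# Mixtures: sample a from mu, then sample from nu_b^a.
mix0 = {x: mu[a] * p for a in mu for x, p in nu0[a].items()}
mix1 = {x: mu[a] * p for a in mu for x, p in nu1[a].items()}

lhs = h2(mix0, mix1)
rhs = sum(mu[a] * h2(nu0[a], nu1[a]) for a in mu)
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-12
```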

    C Quantum Amplitude Estimation

    We show the following strengthening of Theorem 5.1, which follows from Brassard et al. [2002].
    Theorem C.1 (Amplitude Estimation).
    Suppose we have access to a unitary \(U\) (representing a quantum algorithm) which maps \(|0\rangle\) to \(|\psi \rangle\) , as well as access to a projective measurement \(\Pi\) , and we wish to estimate \(p:=\Vert \Pi |\psi \rangle \Vert _2^2\) (representing the probability the quantum algorithm accepts). Fix \(\epsilon ,\delta \in (0,1/2)\) . Then using at most \((100/\epsilon)\cdot \ln (1/\delta)\) controlled applications of \(U\) or \(U^\dagger\) and at most that many applications of \(I-2\Pi\) , we can output \(\tilde{p}\in [0,1]\) such that \(|\tilde{p}-p|\le \epsilon\) with probability at least \(1-\delta\) .
    Further, this can be tightened to a bound that depends on \(p\) , as follows. For any positive real number \(T\) , there is an algorithm which depends on \(\epsilon\) , \(\delta\) , and \(T\) (but not on \(p\) ) which uses at most \(T\) applications of the unitaries (as above) and outputs \(\tilde{p}\in [0,1]\) with the following guarantee: if \(T\) is at least \(\lfloor (100/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\cdot \ln (1/\delta) \rfloor\) , then \(|\tilde{p}-p|\le \epsilon\) with probability at least \(1-\delta\) .
    Proof.
    Brassard et al. [2002] showed that an algorithm which makes \(M\) controlled calls to the unitary \(U(I-2\mathinner {|{0}\rangle }\mathinner {\langle {0}|})U^{-1}(I-2\Pi)\) and one additional call to \(U\) can output \(\tilde{p}\) such that
    \(\begin{equation*} |\tilde{p}-p| \le \frac{2\pi \sqrt {p(1-p)}}{M}+\frac{\pi ^2}{M^2} \end{equation*}\)
    with probability at least \(8/\pi ^2\ge 4/5\) . If we pick \(M\) such that \(M\ge 8/\sqrt {\epsilon }\) and \(M\ge 8\sqrt {p}/\epsilon\) , then this is at most \((\pi /4+\pi ^2/64)\epsilon \le \epsilon\) . Note that \(M\) must be an integer, and that the number of applications of \(U\) or \(U^{-1}\) is \(2M+1\) . Hence, to get this success probability, it suffices to have \(T\ge 3+(16/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\) , or \(T\ge (19/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\) .
    To generalize to other success probabilities, we amplify this algorithm by repeating it \(2k+1\) times and returning the median estimate. The probability that the median is still wrong is the probability that at least \(k+1\) out of the \(2k+1\) estimates were wrong, which (writing \(q\le 1-8/\pi ^2\) for the probability that a single estimate is wrong) is at most
    \(\begin{align*} \sum _{i=1}^{k+1}\binom{2k+1}{k+1-i}q^{k+i}(1-q)^{k+1-i} &\le q^{k+1}(1-q)^{k}\sum _{i=1}^{k+1}\binom{2k+1}{k+1-i}\\ &=q^{k+1}(1-q)^{k}2^{2k} =q(1-(1-2q)^2)^k \le qe^{-k(1-2q)^2}. \end{align*}\)
    Hence, to get this below \(\delta\) , we just need \(k\ge (1/(1-2q)^2)\ln (q/\delta)\) , which (using \(q\le 1-8/\pi ^2\)) is implied by \(k\ge 2.6\ln (1/\delta)-4\) . Since \(k\) must be an integer, we can always choose it so that \(2k+1\) is at most \(5.2\ln (1/\delta)\) . Multiplying the per-run cost bound from before by this number of repetitions, we conclude that the total number of applications is at most \((100/\epsilon)\sqrt {\max \lbrace p,\epsilon \rbrace }\cdot \ln (1/\delta)\) ; hence any \(T\) at least this large suffices, as desired. □
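    The median-amplification step can be illustrated by a small Monte Carlo simulation of the classical repetition alone (a sketch, not the quantum algorithm): each base run fails with probability \(q\), and the bound \(qe^{-k(1-2q)^2}\) dominates the empirical failure rate of the median.

```python
# Monte Carlo illustration of the median trick (classical repetition only):
# each base run fails with probability q, the median of 2k+1 runs fails only if
# at least k+1 runs fail, and this is bounded by q * exp(-k * (1 - 2q)^2).
import math
import random

random.seed(2)
q = 0.19                  # failure probability of one run, at most 1 - 8/pi^2
trials = 100000
for k in [2, 5, 10]:
    failures = 0
    for _ in range(trials):
        wrong = sum(random.random() < q for _ in range(2 * k + 1))
        failures += wrong >= k + 1         # then the median estimate is wrong
    empirical = failures / trials
    bound = q * math.exp(-k * (1 - 2 * q) ** 2)
    print(f"k={k:2d}  empirical failure={empirical:.5f}  bound={bound:.5f}")
    assert empirical <= bound
```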
