Abstract
This paper studies the statistical query (SQ) complexity of estimating d-dimensional submanifolds in \({\mathbb {R}}^n\). We propose a purely geometric algorithm called manifold propagation, which reduces the problem to three natural geometric routines: projection, tangent space estimation, and point detection. We then provide constructions of these geometric routines in the SQ framework. Given an adversarial \(\mathrm {STAT}(\tau )\) oracle and a target Hausdorff distance precision \(\varepsilon = \Omega (\tau ^{2 / (d + 1)})\), the resulting SQ manifold reconstruction algorithm has query complexity \({\tilde{O}}(n \varepsilon ^{-d / 2})\), which is proved to be nearly optimal. In the process, we establish low-rank matrix completion results for SQ’s and lower bounds for randomized SQ estimators in general metric spaces.
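To fix ideas about the query model, the following Python sketch simulates an adversarial \(\mathrm {STAT}(\tau )\) oracle. It is a toy, not the paper's Definition 2.1 (which is not reproduced in this excerpt): we assume a finite-support distribution D, queries taking values in \([-1,1]\), and an adversary callback that chooses the error, clipped to \([-\tau ,\tau ]\). The class and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Query = Callable[[Point], float]


class SimulatedStatOracle:
    """Toy stand-in for an adversarial STAT(tau) oracle (hypothetical helper).

    Assumptions for illustration: D has finite support (atoms with weights),
    queries take values in [-1, 1], and an `adversary` callback picks the error,
    which is clipped to [-tau, tau] so every answer a satisfies |a - E_D[r]| <= tau.
    """

    def __init__(self, atoms: Sequence[Point], weights: Sequence[float], tau: float,
                 adversary: Optional[Callable[[float, float], float]] = None):
        total = float(sum(weights))
        self.atoms: List[Point] = list(atoms)
        self.weights: List[float] = [w / total for w in weights]
        self.tau = tau
        # Default adversary: always answer at the top of the tolerance band.
        self.adversary = adversary if adversary is not None else (lambda mean, tau: tau)
        self.num_queries = 0  # counts the queries issued by the calling algorithm

    def query(self, r: Query) -> float:
        self.num_queries += 1
        mean = 0.0
        for x, w in zip(self.atoms, self.weights):
            value = r(x)
            if not -1.0 <= value <= 1.0:
                raise ValueError("statistical queries must take values in [-1, 1]")
            mean += w * value
        # The adversary may return anything; only errors within tau are allowed.
        error = max(-self.tau, min(self.tau, self.adversary(mean, self.tau)))
        return mean + error
```

For example, with four equally weighted atoms \(0, 1, 2, 3\) and \(\tau = 0.05\), the indicator query of \(\{x \ge 2\}\) has true mean 0.5, and the default adversary answers 0.5 + 0.05.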
Notes
As the present paper uses \(\varepsilon \) for precision, we use \(\delta \) as the privacy parameter, contrary to the standard notation.
Abbreviations
- n: Ambient dimension
- d: Intrinsic dimension
- D: Unknown distribution
- \(\left\| \cdot \right\| \): Euclidean norm
- \({\mathbb {G}}^{n,d}\): Grassmannian of d-dimensional subspaces of \({\mathbb {R}}^n\)
- \({\mathcal {H}}^d\): d-dimensional Hausdorff measure
- \(\mathrm {Supp}(D)\): Support of D
- \(\mathrm {B}(x,r)\): Closed Euclidean r-ball around \(x \in {\mathbb {R}}^n\)
- \(\omega _d\): Volume of unit d-ball
- \(\sigma _{d-1}\): Surface area of unit \((d-1)\)-dimensional sphere
- M: Submanifold
- \(\mathrm {d}_M\): Geodesic distance on M
- \(\mathrm {B}_M(p,r)\): Closed geodesic r-ball around \(p \in M\)
- \(T_p M\): Tangent space of M at \(p \in M\)
- \(\angle \): Principal angle between linear subspaces
- \({\mathrm {Med}}\): Medial axis (Sect. 2.2.2)
- \(\pi _M\): Projection map onto M (Sect. 2.2.2)
- \({\mathrm {rch}}_M\): Reach of M (Definition 2.3)
- \(\mathrm {d}(\cdot ,M)\): Distance-to-M function
- \(\mathrm {d_H}\): Hausdorff distance (Definition 2.7)
- \(M^r\): r-Offset of M (Eq. (2.1))
- \(\mathrm {cv}_K(r)\): r-covering number of K (Definition B.1)
- \(\mathrm {pk}_K(r)\): r-packing number of K (Definition B.1)
- \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\): Geometric model (Definition 2.4)
- \({\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\): Statistical model (Definition 2.5)
- \(\left\{ {0}\right\} \sqcup {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\): Fixed point geometric model (Definition 2.6)
- \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\): Fixed point stat. model (Definition 2.6)
- \(\mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\): Bounding ball geometric model (Definition 2.6)
- \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\): Bounding ball stat. model (Definition 2.6)
- \({\mathrm {rch}}_{\min }\): Lower bound on the reach
- f: Density with respect to the volume measure
- \(f_{\min }\): Lower bound on f
- \(f_{\max }\): Upper bound on f
- L: Upper bound on the Lipschitz constant of f
- R: Radius of enclosing ball
- \(\tau \): Tolerance
- \({\mathrm {STAT}}(\tau )\): Class of oracles with tolerance \(\tau \) (Definition 2.1)
- q: Number of queries
- \(\varepsilon \): Precision
- \(\alpha \): Probability of failure (Definition 2.2)
- r: Query to \({\mathrm {STAT}}(\tau )\)
- \(\mathrm {a}\): Answer to a query
- \({\mathsf {O}}\): SQ oracle
- \(\mathtt {A}\): SQ algorithm
- \(\mathcal {A}\): Randomized SQ algorithm
- \(\mu _i(\cdot )\): ith singular value in decreasing order
- \(\left\| \cdot \right\| _{\mathrm {F}} \): Frobenius norm
- \(\left\langle {\cdot }, {\cdot } \right\rangle \): Inner product
- \(\left\| \cdot \right\| _{\mathrm {op}} \): Operator norm
- \(\left\| \cdot \right\| _{*} \): Nuclear norm
- \(\mathop {\mathrm {TV}}\): Total variation distance (Definition G.1)
- \({\mathcal {D}}\): Generic model
- \((\Theta , \rho )\): Generic metric space
- \(\theta : {\mathcal {D}}\rightarrow \Theta \): Generic parameter of interest
References
Ery Arias-Castro, Gilad Lerman, and Teng Zhang. Spectral clustering based on local PCA. J. Mach. Learn. Res., 18:Paper No. 9, 57, 2017.
Ery Arias-Castro and Bruno Pelletier. On the convergence of maximum variance unfolding. J. Mach. Learn. Res., 14:1747–1770, 2013.
Srinivasan Arunachalam, Alex B. Grilo, and Henry Yuen. Quantum statistical query learning. CoRR, arXiv:2002.08240, 2020.
Eddie Aamari and Alexander Knop. Statistical query complexity of manifold estimation. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 116–122, New York, NY, USA, 2021. Association for Computing Machinery.
Eddie Aamari, Jisu Kim, Frédéric Chazal, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Estimating the reach of a manifold. Electron. J. Stat., 13(1):1359–1399, 2019.
Eddie Aamari and Clément Levrard. Stability and minimax optimality of tangential Delaunay complexes for manifold reconstruction. Discrete Comput. Geom., 59(4):923–971, 2018.
Eddie Aamari and Clément Levrard. Nonasymptotic rates for manifold, tangent space and curvature estimation. Ann. Statist., 47(1):177–204, 2019.
F. Almgren. Optimal isoperimetric inequalities. Indiana Univ. Math. J., 35(3):451–547, 1986.
Yariv Aizenbud and Barak Sober. Non-Parametric Estimation of Manifolds from Noisy Data. arXiv:2105.04754, May 2021.
Dmitri Burago, Yuri Burago, and Sergei Ivanov. A course in metric geometry, volume 33 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2001.
Shai Ben-David and Eli Dichterman. Learning with restricted focus of attention. J. Comput. Syst. Sci., 56(3):277–298, 1998.
Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’94, pages 253–262, New York, NY, USA, 1994. Association for Computing Machinery.
Jean-Daniel Boissonnat and Arijit Ghosh. Manifold reconstruction using tangential Delaunay complexes. Discrete Comput. Geom., 51(1):221–267, 2014.
Clément Berenfeld, John Harvey, Marc Hoffmann, and Krishnan Shankar. Estimating the reach of a manifold via its convexity defect function. Discrete Comput. Geom., 67(2):403–438, 2022.
Jean-Daniel Boissonnat, Siargey Kachanovich, and Mathijs Wintraecken. Sampling and Meshing Submanifolds in High Dimension. working paper or preprint, November 2019.
Jean-Daniel Boissonnat, André Lieutier, and Mathijs Wintraecken. The reach, metric distortion, geodesic convexity and the variation of tangent spaces. J. Appl. Comput. Topol., 3(1-2):29–58, 2019.
Tom Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, COLT 1994, New Brunswick, NJ, USA, July 12-15, 1994, pages 340–347, 1994.
Antonio Cuevas and Ricardo Fraiman. A plug-in approach to support estimation. Ann. Statist., 25(6):2300–2312, 1997.
Manfredo Perdigão do Carmo. Riemannian geometry. Mathematics: Theory & Applications. Birkhäuser Boston, Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.
Tamal K. Dey. Curve and surface reconstruction: algorithms with mathematical analysis, volume 23 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2007.
Dana Dachman-Soled, Vitaly Feldman, Li-Yang Tan, Andrew Wan, and Karl Wimmer. Approximate resilience, monotonicity, and the complexity of agnostic learning. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 498–511. SIAM, 2015.
Vincent Divol. Minimax adaptive estimation in manifold inference. Electron. J. Stat., 15(2):5888–5932, 2021.
Vincent Divol. Reconstructing measures on manifolds: an optimal transport approach. arXiv:2102.07595, February 2021.
Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 73–84. IEEE Computer Society, 2017.
Giuseppe De Marco, Gianluca Gorni, and Gaetano Zampieri. Global inversion of functions: an introduction. NoDEA Nonlinear Differential Equations Appl., 1(3):229–248, 1994.
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography, volume 3876 of Lecture Notes in Comput. Sci., pages 265–284. Springer, Berlin, 2006.
John Dunagan and Santosh S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 114(1):101–114, 2008.
David B Dunson and Nan Wu. Inferring Manifolds From Noisy Data Using Gaussian Processes. arXiv:2110.07478, October 2021.
Alexandre V. Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, pages 211–222. ACM, 2003.
Maryam Fazel, Emmanuel Candès, Benjamin Recht, and Pablo Parrilo. Compressed sensing and robust recovery of low rank matrices. In Proc. 40th Asilomar Conf. Signals, Systems and Computers, 2008.
Herbert Federer. Curvature measures. Trans. Amer. Math. Soc., 93:418–491, 1959.
Herbert Federer. Geometric measure theory. Die Grundlehren der mathematischen Wissenschaften, Band 153. Springer-Verlag New York Inc., New York, 1969.
Vitaly Feldman. A general characterization of the statistical query complexity. In Proceedings of Machine Learning Research, volume 65, pages 785–830, Amsterdam, Netherlands, 2017.
Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S. Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. J. ACM, 64(2):8:1–8:37, 2017.
Vitaly Feldman, Cristóbal Guzmán, and Santosh Vempala. Statistical query algorithms for mean vector estimation and stochastic convex optimization. Math. Oper. Res., 46(3):912–945, 2021.
Charles Fefferman, Sergei Ivanov, Matti Lassas, and Hariharan Narayanan. Fitting a manifold of large reach to noisy data. arXiv:1910.05084, October 2019.
Vitaly Feldman, Will Perkins, and Santosh S. Vempala. On the complexity of random satisfiability problems with planted solutions. SIAM J. Comput., 47(4):1294–1338, 2018.
Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Manifold estimation and singular deconvolution under Hausdorff loss. Ann. Statist., 40(2):941–963, 2012.
Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Minimax manifold estimation. J. Mach. Learn. Res., 13:1263–1291, 2012.
Allen Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer Series in Statistics. Springer, New York, second edition, 2009. Data mining, inference, and prediction.
Ron Jarmin. Census bureau adopts cutting edge privacy protections for 2020 census, 2019. Available at https://www.census.gov/newsroom/blogs/random-samplings/2019/02/census_bureau_adopts.html.
Noah M. Johnson, Joseph P. Near, and Dawn Song. Towards practical differential privacy for SQL queries. Proc. VLDB Endow., 11(5):526–539, 2018.
Michael J. Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983–1006, 1998.
Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. What can we learn privately? SIAM J. Comput., 40(3):793–826, 2011.
Arlene K. H. Kim and Harrison H. Zhou. Tight minimax rates for manifold estimation under Hausdorff loss. Electron. J. Stat., 9(1):1562–1582, 2015.
William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’87, pages 163–169, New York, NY, USA, 1987. Association for Computing Machinery.
Yi-Kai Liu. Universal low-rank matrix recovery from Pauli measurements. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., Red Hook, 2011.
John A. Lee and Michel Verleysen. Nonlinear dimensionality reduction. Information Science and Statistics. Springer, New York, 2007.
Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom., 39(1-3):419–441, 2008.
Nikita Puchkin and Vladimir Spokoiny. Structure-adaptive manifold estimation. J. Mach. Learn. Res., 23(40):1–62, 2022.
Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, and Parvez Ahammad. LinkedIn’s audience engagements API: A privacy preserving data analytics system at scale. CoRR, abs/2002.05839, 2020.
Alexander A. Sherstov. Halfspace matrices. Comput. Complex., 17(2):149–178, 2008.
Jacob Steinhardt, Gregory Valiant, and Stefan Wager. Memory, communication, and statistical queries. In Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir, editors, Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, volume 49 of JMLR Workshop and Conference Proceedings, pages 1490–1516. JMLR.org, 2016.
Joshua B. Tenenbaum. Mapping a manifold of perceptual observations. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems 10, [NIPS Conference, Denver, Colorado, USA, 1997], pages 682–688. The MIT Press, Cambridge, 1997.
Leslie G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.
Karsten A. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time. In Mark A. Fulk and John Case, editors, Proceedings of the Third Annual Workshop on Computational Learning Theory, COLT 1990, University of Rochester, Rochester, NY, USA, August 6-8, 1990, pages 314–326. Morgan Kaufmann, Burlington, 1990.
Martin J. Wainwright. Constrained forms of statistical minimax: computation, communication, and privacy. In Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. IV, pages 273–290. Kyung Moon Sa, Seoul, 2014.
Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
Larry Wasserman. Topological data analysis. Annu. Rev. Stat. Appl., 5:501–535, 2018.
Bin Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.
Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the Davis-Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2015.
Acknowledgements
This work was partially funded by CNRS PEPS JCJC. The authors warmly thank the Department of Mathematics of UC San Diego, the Steklov Institute, and all the members of the Laboratoire de Probabilités, Statistiques et Modélisation for their support and insightful comments.
Additional information
Communicated by Peter Bubenik.
Appendices
Proofs of the Properties of Manifold Propagation
When running Manifold Propagation, the manifold is approximated linearly through its (approximate) tangent spaces. A key point in the proof of correctness is the quantitative validity of this approximation, which is ensured by the reach assumption \({\mathrm {rch}}_M \ge {\mathrm {rch}}_{\min }\), since the reach bounds the curvature of M. Recall from (2.1) that \(M^r = \{z \in {\mathbb {R}}^n, \mathrm {d}(z,M) \le r\}\) stands for the r-offset of M.
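The proofs in this appendix refer to lines of the Manifold Propagation pseudocode, which is stated in the main text and not reproduced in this excerpt. As a rough illustration only, the Python sketch below re-creates the loop structure that these proofs describe: a queue \({\mathcal {Q}}\) seeded with \({\hat{x}}_0\), tangent steps of length \(\Delta \) along a maximal packing of unit tangent directions, a \(\delta \)-sparsity test against \({\mathcal {Q}} \cup {\mathcal {O}}\), projection of accepted candidates, and the final move of \(x_0\) from \({\mathcal {Q}}\) to \({\mathcal {O}}\). It assumes that M is the unit circle in \({\mathbb {R}}^2\) (so d = 1) and uses exact projection and tangent maps instead of SQ routines; the function and parameter names are hypothetical.

```python
import math


def manifold_propagation_circle(x0_hat, step, sep, max_iter=10_000):
    """Hypothetical sketch of the propagation loop for M = unit circle (d = 1).

    `step` plays the role of Delta and `sep` the role of delta. The exact maps
    project() and tangent() stand in for the SQ routines pi_hat and T_hat.
    """

    def project(y):
        # Exact metric projection onto the unit circle (assumes y != 0).
        norm = math.hypot(y[0], y[1])
        return (y[0] / norm, y[1] / norm)

    def tangent(x):
        # Unit tangent at pi_M(x). For d = 1 the unit sphere of the tangent
        # line is {t, -t}, which is a maximal packing of itself.
        p = project(x)
        return (-p[1], p[0])

    queue = [x0_hat]  # plays the role of Q
    output = []       # plays the role of O
    for _ in range(max_iter):
        if not queue:
            break
        x = queue[0]  # picked, but kept in Q while its neighbours are tested
        t = tangent(x)
        for s in (1.0, -1.0):
            y = (x[0] + s * step * t[0], x[1] + s * step * t[1])
            # Keep the candidate only if it is sep-far from every point of Q u O.
            if min(math.dist(y, p) for p in queue + output) >= sep:
                queue.append(project(y))
        output.append(queue.pop(0))  # move x from Q to O
    return output
```

With `manifold_propagation_circle((1.0, 0.0), step=0.2, sep=0.14)` (so that \(\delta \le 7\Delta /10\), as in the proof of Lemma 3.3), the returned points lie on the circle and every point of the circle is within geodesic distance \(\Delta \) of one of them, which is what Lemma 3.3 predicts in this toy setting.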
Lemma A.1
Let \(M \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\), and \(x \in M^\eta \) with \(\eta < {\mathrm {rch}}_{\min }\). Take \(T \in {\mathbb {G}}^{n,d}\) such that \(\left\| \pi _{T_{\pi _M(x)}M}-\pi _{T} \right\| _{\mathrm {op}} \le \sin \theta \). Then for all \(\Delta \le {\mathrm {rch}}_{\min }/4\) and all unit vectors \(v \in T\),
$$\begin{aligned} \mathrm {d}(x+\Delta v,M) \le \frac{5}{8} \frac{\Delta ^2}{{\mathrm {rch}}_{\min }} + \eta + \Delta \sin \theta . \end{aligned}$$
Proof of Lemma A.1
By assumption on T, there exists a unit vector \(v' \in T_{\pi _M(x)}M\) such that \(\left\| v-v' \right\| \le \sin \theta \). Hence, since \(\mathrm {d}(\cdot ,M)\) is 1-Lipschitz, we have
where the last inequality follows from [7, Lemma 1]. \(\square \)
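A quick numerical sanity check of Lemma A.1 (not a proof) can be run on the simplest curved example: the unit circle in \({\mathbb {R}}^2\), whose reach is 1. The Python sketch below assumes that setting, draws admissible \(x\), \(\Delta \) and a tilted line T at random, and reports the largest value of the left-hand side minus the right-hand side. For lines in the plane, the operator norm of the difference of the two orthogonal projections equals \(|\sin \varphi |\), where \(\varphi \) is the angle between them, so we use that value as \(\sin \theta \). The function names are hypothetical.

```python
import math
import random


def lemma_a1_sides(eta, e, delta, phi, angle, sign):
    """Left- and right-hand sides of Lemma A.1 on the unit circle M (rch = 1, d = 1).

    x = (1 + e) u with |e| <= eta, where u is the unit radial direction at `angle`;
    T is the tangent line at pi_M(x) tilted by `phi`, so sin(theta) = |sin(phi)|.
    """
    u = (math.cos(angle), math.sin(angle))  # normal direction at pi_M(x)
    t = (-u[1], u[0])                       # true unit tangent at pi_M(x)
    v = (sign * (math.cos(phi) * t[0] + math.sin(phi) * u[0]),
         sign * (math.cos(phi) * t[1] + math.sin(phi) * u[1]))  # unit vector of T
    y = ((1 + e) * u[0] + delta * v[0], (1 + e) * u[1] + delta * v[1])
    lhs = abs(math.hypot(*y) - 1.0)         # d(x + delta v, M) for the unit circle
    rhs = 5 / 8 * delta ** 2 + eta + delta * abs(math.sin(phi))
    return lhs, rhs


def worst_violation(n_trials=100_000, seed=0):
    """Largest lhs - rhs over random admissible parameters (expected to be <= 0)."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(n_trials):
        eta = rng.uniform(0.0, 0.5)                # eta < rch = 1
        e = rng.uniform(-eta, eta)                 # x lies in the eta-offset of M
        delta = rng.uniform(0.0, 0.25)             # Delta <= rch / 4
        phi = rng.uniform(-math.pi / 2, math.pi / 2)
        lhs, rhs = lemma_a1_sides(eta, e, delta, phi,
                                  rng.uniform(0.0, 2 * math.pi), rng.choice((1, -1)))
        worst = max(worst, lhs - rhs)
    return worst
```

On the circle, `worst_violation()` should be non-positive up to rounding; in this example the quadratic term is in fact at most \(\Delta ^2/2\), so the constant 5/8 leaves some slack.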
We are now in a position to prove Lemma 3.1, which guarantees that Manifold Propagation builds point clouds that do not deviate from M.
Proof of Lemma 3.1
The points added to \({\mathcal {O}}\) are all first added to \({\mathcal {Q}}\): therefore, it is sufficient to check that all the points x added to \({\mathcal {Q}}\) satisfy \(\mathrm {d}(x,M) \le \eta \). To see this, proceed by induction:
- As \({\mathcal {Q}}\) is initialized to \(\left\{ {{\hat{x}}_0}\right\} \) with \(\mathrm {d}({\hat{x}}_0,M) \le \eta \), the inequality holds at Line 1, before the first loop.
- If \({\bar{x}} \ne {\hat{x}}_0\) was added to \({\mathcal {Q}}\), it can be written as \({\bar{x}} = {\hat{\pi }}(x_0+\Delta v_i)\) for some point \(x_0 \in {\mathcal {Q}}\) and some unit vector \(v_i \in {\hat{T}}(x_0)\). By induction, \(\mathrm {d}(x_0,M)\le \eta \). Since \({\hat{T}}(\cdot )\) is assumed to have precision \(\sin \theta \) over \(M^\eta \), we obtain \(\left\| \pi _{T_{\pi _M(x_0)}M}-\pi _{{\hat{T}}(x_0)} \right\| _{\mathrm {op}} \le \sin \theta \). As a result, from Lemma A.1,
$$\begin{aligned} \mathrm {d}(x_0+\Delta v_i,M) \le \frac{5}{8} \frac{\Delta ^2}{{\mathrm {rch}}_{\min }} + \eta + \Delta \sin \theta \le \Lambda , \end{aligned}$$
and therefore
$$\begin{aligned} \mathrm {d}({\bar{x}},M) \le \left\| {\bar{x}} - \pi _M(x_0 + \Delta v_i) \right\| = \left\| {\hat{\pi }}(x_0 + \Delta v_i) - \pi _M(x_0 + \Delta v_i) \right\| \le \eta , \end{aligned}$$
since \({\hat{\pi }}(\cdot )\) is assumed to have precision \(\eta \) over \(M^\Lambda \).
This concludes the induction and hence the proof. \(\square \)
Next we show Lemma 3.2, asserting that the radius of sparsity of the point clouds built by Manifold Propagation is maintained at all times.
Proof of Lemma 3.2
At initialization of Manifold Propagation, \({\mathcal {Q}} \cup {\mathcal {O}} = \left\{ {{\hat{x}}_0}\right\} \), so that the inequality trivially holds at Line 1. Then, if a point \({\bar{x}}\) is added to \({\mathcal {Q}}\) at Line 8, it means that it can be written as \({\bar{x}} = {\hat{\pi }}(x_0+\Delta v_{i_0})\), with \(\mathrm {d}(x_0 + \Delta v_{i_0}, {\mathcal {Q}} \cup {\mathcal {O}}) \ge \delta \). Consequently, by induction, we have
In addition, Lemma A.1 and Lemma 3.1 combined yield
As a result, after the update \({\mathcal {Q}} \leftarrow {\mathcal {Q}}\cup \left\{ {{\bar{x}}}\right\} \), the announced inequality still holds. Finally, we notice that Line 11, which swaps a point from \({\mathcal {Q}}\) to \({\mathcal {O}}\), leaves \({\mathcal {Q}} \cup {\mathcal {O}}\) unchanged. By induction, this concludes the proof. \(\square \)
Finally, we prove Lemma 3.3, which states that if Manifold Propagation terminates, it outputs a point cloud that is dense enough near M.
Proof of Lemma 3.3
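Assume for contradiction that there exists \(p_0 \in M\) such that for all \(x \in {\mathcal {O}}\), \(\mathrm {d}_M\bigl (p_0, \pi _M(x)\bigr ) > \Delta \). Let \(x_0 \in {\mathcal {O}}\) (which is nonempty since \({\hat{x}}_0 \in {\mathcal {O}}\)) be such that
$$\begin{aligned} r_0 := \mathrm {d}_M\bigl (p_0, \pi _M(x_0)\bigr ) = \min _{x \in {\mathcal {O}}} \mathrm {d}_M\bigl (p_0, \pi _M(x)\bigr ) > \Delta , \end{aligned}$$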
and write \(y_0 := \pi _M(x_0)\). Let \(\gamma := \gamma _{y_0 \rightarrow p_0} : [0, r_0] \rightarrow M\) denote an arc-length parametrized geodesic joining \(y_0\) and \(p_0\). Finally, set \(q_0 := \gamma (\Delta ) \in M\) and \(v_0 := \gamma '(0) \in T_{y_0}M\).
Consider the sets \({\mathcal {Q}}\) and \({\mathcal {O}}\) of Manifold Propagation right after \(x_0\) was removed from \({\mathcal {Q}}\) and added to \({\mathcal {O}}\) (Line 11). By construction, all the elements \(v_1, \dots , v_k\) of a maximal \((\sin \alpha )\)-packing of \({\mathcal {S}}^{d-1}_{{\hat{T}}(x_0)}\) were tested to enter \({\mathcal {Q}}\) (Loop from Line 6 to Line 10). Because the packing is maximal, it is also a \((2 \sin \alpha )\)-covering of \({\mathcal {S}}^{d-1}_{{\hat{T}}(x_0)}\) (see the proof of Proposition B.2). As a result, by assumption on the precision of \({\hat{T}}(x_0)\), there exists \(v_{i_0}\) in this packing such that \(\left\| v_0 - v_{i_0} \right\| \le 2 \sin \alpha + \sin \theta \).
Since \(\gamma \) is a distance-minimizing path on M from \(y_0\) to \(p_0\), so are its two sub-paths with endpoint \(q_0\); otherwise, one could build a strictly shorter path between \(y_0\) and \(p_0\). In particular, since \(\Delta < r_0 = \mathrm {d}_M(y_0,p_0)\), we have \(\mathrm {d}_M(y_0,q_0) = \mathrm {d}_M(y_0, \gamma (\Delta )) = \Delta \) and \(\mathrm {d}_M(p_0,q_0) = \mathrm {d}_M(p_0, \gamma (\Delta )) = r_0 - \Delta \). As a result,
But from Lemma 2.2, we get
and furthermore,
We now bound the right-hand side of Eq. (A.3) term by term. The first term is bounded by
where the inequality follows from a Taylor expansion and Lemma 2.2. For the second term, write
For the third term, we combine Lemma A.1 and Lemma 3.1 to get
and for the fourth term, applying again Lemma A.1 and Lemma 3.1 yields
Plugging these four bounds into Eq. (A.3), we have shown that
Combining Eqs. (A.4), (A.2), and the assumptions on the parameters \(\Delta , \eta , \theta , \alpha \) hence yields
so that Eq. (A.1) gives
In particular, \({\hat{\pi }}\left( x_0+\Delta v_{i_0}\right) \) was not added to \({\mathcal {Q}}\) in the loop of Lines 6 to 10 investigating the neighbors of \(x_0\) (i.e., when \(x_0\) was picked at Line 3). Since \({\mathcal {Q}} \cup {\mathcal {O}}\) only grows as Manifold Propagation runs, and since \({\mathcal {Q}} = \emptyset \) when it terminates, there exists \(x_1\) in the final output \({\mathcal {O}}\) such that \(\left\| x_0 + \Delta v_{i_0} - x_1 \right\| \le \delta \).
The existence of this particular point \(x_1\) in \({\mathcal {O}}\), which is \(\delta \)-close to \(x_0 + \Delta v_{i_0}\), will lead us to a contradiction: we will show that \(\pi _M(x_1)\) is closer to \(p_0\) in geodesic distance than \(\pi _M(x_0)\) is. To get there, we first notice that \(x_1 \in {\mathcal {O}}\) satisfies \(\mathrm {d}(x_1,M) \le \eta \) from Lemma 3.1, so that
where the last-but-one line follows from Lemma A.1, and the last one from the assumptions on the parameters \(\Delta , \eta , \theta \) and \(\delta \). As a result, from Lemma 2.2,
Furthermore, using a similar decomposition as for Eq. (A.4), we have
from which we finally get
This leads to the desired contradiction, since:
- on the one hand, \(x_1 \in {\mathcal {O}}\) forces
$$\begin{aligned} \mathrm {d}_M(p_0, \pi _M(x_1)) \ge r_0 = \min _{x \in {\mathcal {O}}} \mathrm {d}_M\bigl (p_0, \pi _M(x)\bigr ) = \mathrm {d}_M(p_0, \pi _M(x_0)) ; \end{aligned}$$
- on the other hand, Eqs. (A.5) and (A.6) combined yield
$$\begin{aligned} \mathrm {d}_M(p_0, \pi _M(x_1))&\le \mathrm {d}_M(p_0,q_0) + \mathrm {d}_M\left( q_0, \pi _M\left( x_0+\Delta v_{i_0}\right) \right) \\&\quad + \mathrm {d}_M(\pi _M\left( x_0+\Delta v_{i_0}\right) , \pi _M(x_1)) \\&\le r_0 - \Delta + \frac{3}{16} \Delta + \left( 1 + \frac{3}{10000} \right) \left( \delta + \frac{17}{192} \Delta \right) \\&< r_0, \end{aligned}$$
where we used that \(\delta \le 7 \Delta / 10\).
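As a result, we have proved that
$$\begin{aligned} \max _{p \in M} \min _{x \in {\mathcal {O}}} \mathrm {d}_M\bigl (p, \pi _M(x)\bigr ) \le \Delta , \end{aligned}$$
which is the announced result. \(\square \)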
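The final strict inequality above reduces to checking the sign of a rational coefficient. The Python sketch below (the helper name is hypothetical) evaluates, with exact fractions, the coefficient c such that the last bound reads \(r_0 + c\Delta \) when \(\delta = s\Delta \). Since the bound increases with \(\delta \), it suffices to take \(s = 7/10\); there c is negative, as the contradiction requires, while for \(s = 3/4\) it would already be positive.

```python
from fractions import Fraction


def excess_over_r0(s):
    """Coefficient c with: r_0 - Delta + (3/16) Delta
    + (1 + 3/10000) (s Delta + (17/192) Delta) = r_0 + c Delta, where s = delta / Delta."""
    return -1 + Fraction(3, 16) + (1 + Fraction(3, 10000)) * (s + Fraction(17, 192))


c = excess_over_r0(Fraction(7, 10))
assert c < 0  # the bound is strictly below r_0 whenever delta <= 7 Delta / 10
```

Exact rational arithmetic avoids any doubt about rounding in a margin that is only about 0.024.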
Preliminary Geometric Results
B.1 Local Mass of Balls Estimates
To prove the properties of the statistical query routines, we will need the following two geometric results about manifolds with reach bounded below. In what follows, \(t_+ := \max \{0,t \}\) stands for the positive part of \(t \in {\mathbb {R}}\).
Proposition B.1
[6, Proposition 8.2] Let \(M \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\), \(x \in {\mathbb {R}}^n\) such that \(\mathrm {d}(x,M) \le {\mathrm {rch}}_{\min }/8\), and \(h \le {\mathrm {rch}}_{\min }/8\). Then,
where \(r_h = (h^2- \mathrm {d}(x,M)^2)_+^{1/2}\), \((r_h^-)^2 = \left( 1-\frac{\mathrm {d}(x,M)}{{\mathrm {rch}}_{\min }}\right) r_h^2\), and \((r_h^+)^2 = \left( 1+\frac{ 2 \mathrm {d}(x,M)}{{\mathrm {rch}}_{\min }}\right) r_h^2\).
As a result, one may show that any ball centered close enough to M has a large mass with respect to a measure \(D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\).
Lemma B.1
Let \(D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) have support \(M = \mathrm {Supp}(D)\).
- For all \(p \in M\) and \(h \le {\mathrm {rch}}_{\min }/4\),
$$\begin{aligned} a_d f_{\min } h^d \le D\bigl (\mathrm {B}(p,h) \bigr ) \le A_d f_{\max } h^d, \end{aligned}$$
where \(a_d = 2^{-d} \omega _d\) and \(A_d = 2^d \omega _d\).
- For all \(x_0 \in {\mathbb {R}}^n\) and \(h \le {\mathrm {rch}}_{\min }/8\),
$$\begin{aligned} a'_d f_{\min } (h^2-\mathrm {d}(x_0,M)^2)_+^{d/2} \le D\bigl (\mathrm {B}(x_0,h) \bigr ) \le A'_d f_{\max } (h^2-\mathrm {d}(x_0,M)^2)_+^{d/2}, \end{aligned}$$
where \(a'_d = (7/8)^{d/2}a_d\) and \(A'_d = (5/4)^{d/2}A_d\).
Proof of Lemma B.1
The first statement is a direct consequence of [6, Propositions 8.6 & 8.7]. The second one follows by combining the previous point with Proposition B.1. \(\square \)
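The two-sided bounds of Lemma B.1 can be checked in a case where ball masses have a closed form: D uniform on a circle of radius \(\rho \) in \({\mathbb {R}}^2\) (so d = 1, \({\mathrm {rch}}_{\min } = \rho \), \(f_{\min } = f_{\max } = 1/(2\pi \rho )\) and \(\omega _1 = 2\)). This is an illustrative assumption of ours; the sketch below only tests \(h \le {\mathrm {rch}}_{\min }/8\) for both statements, and its function names are hypothetical.

```python
import math


def circle_ball_mass(rho, norm_x0, h):
    """D(B(x0, h)) for D uniform on the circle of radius rho centred at 0 (d = 1, n = 2).

    norm_x0 = ||x0|| > 0. A circle point at angle t from the direction of x0 lies
    in the ball iff norm_x0**2 + rho**2 - 2 * norm_x0 * rho * cos(t) <= h**2.
    """
    c = (norm_x0 ** 2 + rho ** 2 - h ** 2) / (2 * norm_x0 * rho)
    if c >= 1.0:
        return 0.0
    half_angle = math.acos(max(c, -1.0))
    return half_angle / math.pi  # arc length 2 * rho * t divided by 2 * pi * rho


def lemma_b1_holds_on_circle(rho=1.0, n=40):
    f = 1.0 / (2 * math.pi * rho)           # f_min = f_max for the uniform law
    omega_1 = 2.0                           # length of the unit 1-ball [-1, 1]
    a1, A1 = omega_1 / 2, 2 * omega_1       # a_d = 2^-d omega_d, A_d = 2^d omega_d
    a1p, A1p = math.sqrt(7 / 8) * a1, math.sqrt(5 / 4) * A1
    ok = True
    for i in range(1, n + 1):
        h = (rho / 8) * i / n               # h <= rch / 8
        m = circle_ball_mass(rho, rho, h)   # first statement: centre on M
        ok = ok and (a1 * f * h <= m <= A1 * f * h)
        for j in range(n):                  # d(x0, M) ranges over [0, h)
            dist = h * j / n
            r_h = math.sqrt(h * h - dist * dist)
            for norm_x0 in (rho + dist, rho - dist):  # outside and inside the circle
                m = circle_ball_mass(rho, norm_x0, h)
                ok = ok and (a1p * f * r_h <= m <= A1p * f * r_h)
    return ok
```

In this example the ratio between the actual mass and \(f r_h\) stays close to 2, well inside the interval allowed by the constants \(a'_1\) and \(A'_1\).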
B.2 Euclidean Packing and Covering Estimates
For the sake of completeness, we include in this section some standard packing and covering bounds that are used in our analysis. We recall the following definitions.
An r-covering of \(K \subseteq {\mathbb {R}}^n\) is a subset \({\mathcal {X}}= \left\{ { x_1,\ldots ,x_k }\right\} \subseteq K\) such that for all \(x \in K\), \(\mathrm {d}(x,{\mathcal {X}}) \le r\). An r-packing of K is a subset \({\mathcal {Y}} = \left\{ y_1,\ldots ,y_k \right\} \subseteq K\) such that for all distinct \(y,y' \in {\mathcal {Y}}\), \(\mathrm {B}(y,r) \cap \mathrm {B}(y',r) = \emptyset \) (or equivalently \(\left\| y'-y \right\| >2r\)).
Definition B.1
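(Covering and Packing numbers) For \(K \subseteq {\mathbb {R}}^n\) and \(r>0\), the covering number \(\mathrm {cv}_K(r)\) of K is the minimum number of balls of radius r that are necessary to cover K:
$$\begin{aligned} \mathrm {cv}_K(r) = \min \left\{ k > 0 \,\middle |\, \text {there exists an } r\text {-covering of } K \text { of cardinality } k \right\} . \end{aligned}$$
The packing number \(\mathrm {pk}_K(r)\) of K is the maximum number of disjoint balls of radius r that can be packed in K:
$$\begin{aligned} \mathrm {pk}_K(r) = \max \left\{ k > 0 \,\middle |\, \text {there exists an } r\text {-packing of } K \text { of cardinality } k \right\} . \end{aligned}$$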
Packing and covering numbers are tightly related, as shown by the following well-known statement.
Proposition B.2
For all subsets \(K \subseteq {\mathbb {R}}^n\) and \(r>0\),
Proof of Proposition B.2
For the left-hand side inequality, notice that if K is covered by a family of balls of radius 2r, each of these balls contains at most one point of a maximal 2r-packing. Conversely, the right-hand side inequality follows from the fact that a maximal r-packing is always a 2r-covering. Indeed, if it were not the case, one could add to the packing a point \(x_0 \in K\) at distance more than 2r from all of its elements, which would contradict the maximality of this packing. \(\square \)
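The maximality argument above (a maximal r-packing is a 2r-covering) is exactly what a greedy construction exploits, and it is also how the proof of Lemma 3.3 uses a maximal \((\sin \alpha )\)-packing of the unit tangent sphere. The Python sketch below, with hypothetical helper names, greedily builds an r-packing from a finite candidate set (a fine grid of the unit circle, an illustrative choice of ours) and checks that every candidate lies within 2r of the packing.

```python
import math


def greedy_packing(candidates, r):
    """Keep each candidate whose distance to every kept point exceeds 2r.

    The kept points form an r-packing (balls B(y, r) pairwise disjoint) that is
    maximal among the candidates: no further candidate can be added.
    """
    packing = []
    for c in candidates:
        if all(math.dist(c, p) > 2 * r for p in packing):
            packing.append(c)
    return packing


def covering_radius(points, centers):
    """Largest distance from a point to its nearest center."""
    return max(min(math.dist(x, c) for c in centers) for x in points)


# Candidates: a fine grid of the unit circle S^1 in R^2 (illustrative choice).
cands = [(math.cos(2 * math.pi * k / 2000), math.sin(2 * math.pi * k / 2000))
         for k in range(2000)]
r = 0.1
pk = greedy_packing(cands, r)
print(len(pk), covering_radius(cands, pk))  # the radius is at most 2 * r
```

Every rejected candidate was rejected because some kept point lies within 2r of it, so the covering property holds by construction, mirroring the proof.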
We then bound the packing and covering numbers of submanifolds with reach bounded below. Note that these bounds depend only on the intrinsic dimension and the volume of M, not on the ambient dimension.
Proposition B.3
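For all \(M \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) and \(r \le {\mathrm {rch}}_{\min } / 8\),
$$\begin{aligned} \mathrm {pk}_{M}(r) \ge \frac{{\mathcal {H}}^d(M)}{\omega _d (4r)^d} \end{aligned}$$
and
$$\begin{aligned} \mathrm {cv}_{M}(r) \le \frac{{\mathcal {H}}^d(M)}{\omega _d (r/4)^d} . \end{aligned}$$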
Proof of Proposition B.3
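First, we have \(\mathrm {pk}_{M}(r) \ge \mathrm {cv}_{M}(2r)\) from Proposition B.2. In addition, if \(\left\{ {p_i}\right\} _{1 \le i \le N} \subseteq M\) is a minimal (2r)-covering of M, then by considering the uniform distribution \(D_M = \mathbb {1}_{M} {\mathcal {H}}^d /{\mathcal {H}}^d(M)\) over M, using a union bound and applying Lemma B.1 (with \(2r \le {\mathrm {rch}}_{\min }/4\)), we get
$$\begin{aligned} 1 = D_M(M) \le \sum _{i=1}^N D_M\bigl (\mathrm {B}(p_i,2r)\bigr ) \le N A_d \frac{(2r)^d}{{\mathcal {H}}^d(M)} = N \frac{\omega _d (4r)^d}{{\mathcal {H}}^d(M)} . \end{aligned}$$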
As a result, \( \mathrm {pk}_{M}(r) \ge \mathrm {cv}_{M}(2r) = N \ge \frac{{\mathcal {H}}^d(M)}{\omega _d (4r)^d} . \)
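For the second bound, use again Proposition B.2 to get \(\mathrm {cv}_{M}(r) \le \mathrm {pk}_{M}(r/2)\). Now, by definition, a maximal (r/2)-packing \(\left\{ {q_j}\right\} _{1 \le j \le N'}\subseteq M\) of M provides us with a family of disjoint balls of radii r/2. Hence, from Lemma B.1, we get
$$\begin{aligned} 1 \ge D_M\Bigl (\bigcup _{j=1}^{N'} \mathrm {B}(q_j,r/2)\Bigr ) = \sum _{j=1}^{N'} D_M\bigl (\mathrm {B}(q_j,r/2)\bigr ) \ge N' a_d \frac{(r/2)^d}{{\mathcal {H}}^d(M)} = N' \frac{\omega _d (r/4)^d}{{\mathcal {H}}^d(M)} , \end{aligned}$$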
so that \( \mathrm {cv}_{M}(r) \le \mathrm {pk}_{M}(r/2) = N' \le \frac{{\mathcal {H}}^d(M)}{\omega _d (r/4)^d} . \) \(\square \)
Bounds on the same discretization-related quantities computed on the Euclidean n-balls and k-spheres will also be useful.
Proposition B.4
- For all \(R > 0\) and \(r > 0\),
$$\begin{aligned} \mathrm {pk}_{\mathrm {B}(0,R)}(r) \ge \left( \frac{R}{2r}\right) ^n \text { and } \mathrm {cv}_{\mathrm {B}(0,R)}(r) \le \left( 1+\frac{2R}{r}\right) ^n . \end{aligned}$$
- For all integers \(1 \le k < n\) and \(r \le 1/8\),
$$\begin{aligned} \mathrm {pk}_{{\mathcal {S}}^{k}(0,1)}(r) \ge 2 \left( \frac{1}{4r}\right) ^k . \end{aligned}$$
Proof of Proposition B.4
- From Proposition B.2, we have \(\mathrm {pk}_{\mathrm {B}(0,R)}(r) \ge \mathrm {cv}_{\mathrm {B}(0,R)}(2r)\). Furthermore, if \(\cup _{i = 1}^N \mathrm {B}(x_i,2r) \supseteq \mathrm {B}(0,R)\) is a minimal 2r-covering of \(\mathrm {B}(0,R)\), then by a union bound, \( \omega _n R^n = {\mathcal {H}}^n(\mathrm {B}(0,R)) \le N \omega _n (2r)^n , \) so that \(\mathrm {pk}_{\mathrm {B}(0,R)}(r) \ge \mathrm {cv}_{\mathrm {B}(0,R)}(2r) = N \ge (R/(2r))^n\).
  For the second bound, we use again Proposition B.2 to get \(\mathrm {cv}_{\mathrm {B}(0,R)}(r) \le \mathrm {pk}_{\mathrm {B}(0,R)}(r/2)\), and we notice that any maximal (r/2)-packing of \(\mathrm {B}(0,R)\) with cardinality \(N'\) provides us with a family of disjoint balls of radii r/2, all contained in \(\mathrm {B}(0,R)^{r/2} = \mathrm {B}(0,R+r/2)\). Since these balls are disjoint, \( \omega _n (R+r/2)^n = {\mathcal {H}}^n(\mathrm {B}(0,R+r/2)) \ge N' {\mathcal {H}}^n(\mathrm {B}(0,r/2)) = N' \omega _n (r/2)^n \), yielding \(\mathrm {cv}_{\mathrm {B}(0,R)}(r) \le \mathrm {pk}_{\mathrm {B}(0,R)}(r/2) = N' \le (1+2R/r)^n\).
- Notice that \({\mathcal {S}}^{k}(0,1) \subseteq {\mathbb {R}}^n\) is a compact k-dimensional submanifold without boundary, with reach \({\mathrm {rch}}_{{\mathcal {S}}^{k}(0,1)} = 1\) and volume \({\mathcal {H}}^k({\mathcal {S}}^k(0,1)) = \sigma _k\). Applying Proposition B.3 together with elementary calculations hence yields
$$\begin{aligned} \mathrm {pk}_{{\mathcal {S}}^k(0,1)}(r)&\ge \frac{\sigma _k}{\omega _k} \left( \frac{1}{4r}\right) ^k \\&= \left( \dfrac{ 2\pi ^{(k+1)/2} }{ \Gamma \left( \frac{k+1}{2} \right) } \right) \left( \dfrac{ \pi ^{k/2} }{ \Gamma \left( \frac{k}{2} +1 \right) } \right) ^{-1} \left( \frac{1}{4r}\right) ^k \\&= 2\sqrt{\pi } \frac{\Gamma \left( \frac{k}{2} +1 \right) }{\Gamma \left( \frac{k+1}{2} \right) } \left( \frac{1}{4r}\right) ^k \\&\ge 2 \left( \frac{1}{4r}\right) ^k . \end{aligned}$$
\(\square \)
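The last chain of (in)equalities can be checked numerically. The Python sketch below (hypothetical helper names) computes \(\sigma _k/\omega _k\) from the Gamma-function formulas used above, compares it with the simplified expression \(2\sqrt{\pi }\,\Gamma (k/2+1)/\Gamma ((k+1)/2)\), and checks that it stays above 2; for k = 1 and k = 2 it equals \(\pi \) and 4, respectively.

```python
import math


def sphere_to_ball_ratio(k):
    """sigma_k / omega_k: area of the unit k-sphere over volume of the unit k-ball."""
    sigma_k = 2 * math.pi ** ((k + 1) / 2) / math.gamma((k + 1) / 2)
    omega_k = math.pi ** (k / 2) / math.gamma(k / 2 + 1)
    return sigma_k / omega_k


def closed_form(k):
    """Simplified expression 2 sqrt(pi) Gamma(k/2 + 1) / Gamma((k + 1)/2)."""
    return 2 * math.sqrt(math.pi) * math.gamma(k / 2 + 1) / math.gamma((k + 1) / 2)
```

Both functions agree up to floating-point rounding for every \(k \ge 1\), and the ratio grows with k, so the final lower bound by 2 is attained only in the limit of small k.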
B.3 Global Volume Estimates
The following bounds on the volume and diameter of low-dimensional submanifolds of \({\mathbb {R}}^n\) with positive reach are at the core of Sect. 2.2.3. They reveal implicit constraints that the parameters must satisfy for the statistical models to be nondegenerate.
Proposition B.5
For all \(M \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\),
with equality if and only if M is a d-dimensional sphere of radius \({\mathrm {rch}}_{\min }\). Furthermore, if \(M \subseteq \mathrm {B}(0,R)\) then \({\mathrm {rch}}_{\min } \le \sqrt{2} R\) and
Proof of Proposition B.5
For the first bound, note that the operator norm of the second fundamental form of M is everywhere bounded above by \(1/{\mathrm {rch}}_{\min }\) [50, Proposition 6.1], so that [8, (3)] applies and yields the result.
For the next statement, note that [40, Theorem 3.26] ensures that M is not homotopy equivalent to a point. As a result, [5, Lemma A.3] applies and yields
For the last bound, consider a \(({\mathrm {rch}}_{\min }/8)\)-covering \(\left\{ {z_i}\right\} _{1 \le i \le N}\) of \(\mathrm {B}(0,R)\), which, by Proposition B.4, can be chosen so that \(N \le \left( 1 + \frac{2R}{{\mathrm {rch}}_{\min }/8} \right) ^n \le \left( \frac{18R}{{\mathrm {rch}}_{\min }} \right) ^n\), where the last inequality uses \({\mathrm {rch}}_{\min } \le \sqrt{2} R \le 2R\). Applying Lemma B.1 with \(h={\mathrm {rch}}_{\min }/8\), we obtain
for all \(i \in \left\{ {1,\ldots ,N}\right\} \). A union bound then yields
which concludes the proof. \(\square \)
Projection Routine
We now build the SQ projection routine \({\hat{\pi }}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) (Theorem 4.1), which is used repeatedly in the SQ emulation of Manifold Propagation (Theorems 5.1 and 5.4). Recall that, given a point \(x_0 \in {\mathbb {R}}^n\) near \(M = \mathrm {Supp}(D)\), we aim to estimate its metric projection \(\pi _M(x_0)\) onto M with statistical queries to \({\mathrm {STAT}}(\tau )\). We follow the proof strategy described in Sect. 4.1.
C.1 Bias of the Local Conditional Mean for Projection
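In what follows, we will write
$$\begin{aligned} m_D(x_0,h) = \mathop {{\mathbb {E}}}_{x \sim D}\left[ x \mid x \in \mathrm {B}(x_0,h) \right] = \frac{\mathop {{\mathbb {E}}}_{x \sim D}\left[ x \mathbb {1}_{\mathrm {B}(x_0,h)}(x) \right] }{D(\mathrm {B}(x_0,h))} \end{aligned}$$(C.1)
for the local conditional mean of D given \(\mathrm {B}(x_0,h)\). In order to study the bias of \(m_D(x_0,h)\) with respect to \(\pi _M(x_0)\), it will be convenient to express it (up to approximation) with intrinsic geodesic balls \(\mathrm {B}_M(\cdot ,\cdot )\) instead of the extrinsic Euclidean balls \(\mathrm {B}(\cdot ,\cdot )\) that appear in its definition (Eq. (C.1)). This change of metric is stated in the following result.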
Lemma C.1
Let \(D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) have support \(M = \mathrm {Supp}(D)\), and \(p \in M\). Recall that \(\omega _d = {\mathcal {H}}^d\left( \mathrm {B}_d(0,1) \right) \) denotes the volume of the d-dimensional unit Euclidean ball. Then for all \(r \le {\mathrm {rch}}_{\min }/4\),
and for \(r \le {\bar{r}} \le {\mathrm {rch}}_{\min }/4\),
where \(C,C'>0\) are absolute constants.
Proof of Lemma C.1
First apply the area formula [32, Section 3.2.5] to write the mean of any measurable function G defined on M as
where J(t, v) is the Jacobian of the volume form of M expressed in polar coordinates around p for \(0 \le t \le r \le {\mathrm {rch}}_{\min }/4\) and unit \(v \in T_p M\). That is, \(J(t, v) = t^{d-1} \sqrt{\det \left( {A}^\top _{t, v} A_{t, v} \right) }\) where \(A_{t, v} = \mathrm {d}_{tv} \exp _p^M\). But from [5, Proposition A.1 (iv)], for all \(w \in T_p M\), we have
As a consequence,
and in particular,
where \(C > 0\) is an absolute constant. Also, by assumption on the model, f is L-Lipschitz, so
Finally, from [7, Lemma 1], we have
Putting everything together, we can now prove the first bound by writing
where the last inequality used the fact that \( \int _0^r \int _{{\mathcal {S}}^{d-1}}t^d f(p)v \mathrm{d}v \mathrm{d}t = 0\), since the uniform measure on \({\mathcal {S}}^{d-1}\) is invariant under \(v \mapsto -v\). Similarly, to derive the second bound, we write
which concludes the proof. \(\square \)
We are now in position to bound the bias of \(m_D(x_0,h)\).
Lemma C.2
Let \(D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) have support \(M = \mathrm {Supp}(D)\), and \(x_0 \in {\mathbb {R}}^n\) be such that \(\mathrm {d}(x_0,M) < h \le {\mathrm {rch}}_{\min }/8\). Then,
where \(r_h = (h^2- \mathrm {d}(x_0,M)^2)^{1/2}\) and \(C>0\) is an absolute constant.
Proof of Lemma C.2
For short, let us write \(p_0 = \pi _M(x_0)\). All the expected values \(\mathop {{\mathbb {E}}}\) are taken with respect to \(x \sim D\). Before any calculation, we combine Proposition B.1 and Lemma 2.2 to assert that
where we wrote \((r_h^-)^2 = \left( 1-\mathrm {d}(x_0,M)/{\mathrm {rch}}_{\min }\right) r_h^2\) and \(R_h^+ = r_h^+\left( 1 + (r_h^+/{\mathrm {rch}}_{\min })^2 \right) \), with \((r_h^+)^2 = \left( 1+{2 \mathrm {d}(x_0,M)}/{{\mathrm {rch}}_{\min }}\right) r_h^2\). Note already that, by definition, \(0 < r_h^- \le R_h^+ \le {\mathrm {rch}}_{\min }/4\) since \(\mathrm {d}(x_0,M) < h \le {\mathrm {rch}}_{\min }/8\), and that
for some absolute constant \(C' > 0\).
We can now proceed and derive the asserted bound. By the triangle inequality,
Combining Eq. (C.2), Lemma C.1, Proposition B.1 and Lemma B.1, the first term of the right-hand side can be further upper bounded by
where the last bound uses (C.3). For the second term, we use Lemma C.1 and Lemma B.1 to derive
Since \(r_h \le h\), this concludes the proof by setting \(C = {\tilde{C}} + {\tilde{C}}''\). \(\square \)
C.2 Metric Projection with Statistical Queries
We finally prove the main announced statement of Appendix C.
Proof of Theorem 4.1
First note that under the assumptions of the theorem, \(\mathrm {d}(x_0, M) \le \Lambda \le {\mathrm {rch}}_{\min } / 8\). We hence let \(h >0\) be a bandwidth to be specified later, but taken such that \(\mathrm {d}(x_0,M) < \sqrt{2} \Lambda \le h \le {\mathrm {rch}}_{\min }/8\).
Consider the map \(F(x) = \frac{(x-x_0)}{h}\mathbb {1}_{\left\| x-x_0 \right\| \le h}\) for \(x \in {\mathbb {R}}^n\). As \(\left\| F(x) \right\| \le 1\) for all \(x\in {\mathbb {R}}^n\), Lemma 2.1 asserts that there exists a deterministic statistical query algorithm making 2n queries to \({\mathrm {STAT}}(\tau )\) and that outputs a vector \({\hat{W}} = {\hat{V}}/h \in {\mathbb {R}}^n\) such that \(\left\| \mathop {{\mathbb {E}}}_{x \sim D}\left[ F(x)\right] - {\hat{V}}/h \right\| \le C \tau \). Furthermore, with the single query \(r = \mathbb {1}_{\mathrm {B}(x_0,h)}\) to \({\mathrm {STAT}}(\tau )\), we obtain \({\hat{a}} \in {\mathbb {R}}\) such that \(\left| D(\mathrm {B}(x_0,h)) - {\hat{a}} \right| \le \tau \). Let us set \({\hat{\pi }}(x_0) := x_0 + {\hat{V}}/{\hat{a}}\) and prove that it satisfies the claimed bound. For this, write \(V = \mathop {{\mathbb {E}}}_{x \sim D}\left[ (x-x_0)\mathbb {1}_{\left\| x-x_0 \right\| \le h}\right] = h \mathop {{\mathbb {E}}}_{x \sim D}\left[ F(x)\right] \) and \(a = D(\mathrm {B}(x_0,h))\), so that \(m_D(x_0,h) = x_0 + V/a\), and use \(\left\| V/a - {\hat{V}}/{\hat{a}} \right\| \le |a-{\hat{a}}| \left\| V \right\| /(a{\hat{a}})+\left\| V-{\hat{V}} \right\| /{\hat{a}}\) to write
where the last inequality comes from Lemma B.1, and \(r_h = (h^2-\mathrm {d}(x_0,M)^2)^{1/2} \ge h/\sqrt{2}\) since \(h \ge \sqrt{2} \Lambda \ge \sqrt{2}\, \mathrm {d}(x_0,M)\). If, in addition, one assumes that \({\tilde{c}}^d \omega _d f_{\min } (h/\sqrt{2})^d \ge 2\tau \), we obtain the lower bound \({\tilde{c}}^d \omega _d f_{\min } (h/\sqrt{2})^d - \tau \ge {\tilde{c}}^d \omega _d f_{\min } (h/\sqrt{2})^d /2\), so that the previous bound further simplifies to
On the other hand, Lemma C.2 yields that the bias term is at most
with \(r_h \le h\). As a result,
Taking bandwidth
we have, by assumption on the parameters of the model, that \( {\mathrm {rch}}_{\min }/8 \ge h \ge 2 \Lambda \ge \sqrt{2} \Lambda \), and that \({\tilde{c}}^d \omega _d f_{\min } (h/\sqrt{2})^d \ge 2\tau \) as soon as \(c>0\) is small enough. Finally, plugging the value of h into the above bound and recalling that \( \Gamma = \frac{f_{\min }}{f_{\max } + L {\mathrm {rch}}_{\min }} \) yields
which concludes the proof. \(\square \)
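To illustrate the estimator \({\hat{\pi }}(x_0) = x_0 + {\hat{V}}/{\hat{a}}\), here is a minimal Python sketch under simplifying assumptions of our own: D is the uniform distribution on the unit circle of \({\mathbb {R}}^2\) (discretized on a fine grid), the oracle \({\mathrm {STAT}}(\tau )\) is simulated by adding a random error of magnitude at most \(\tau \) to each exact expectation, and \({\hat{V}}\) is obtained with one query per coordinate rather than with the routine of Lemma 2.1. The names `stat_oracle` and `sq_projection` are hypothetical.

```python
import numpy as np

# Assumed toy distribution D: uniform on the unit circle of R^2, on a fine grid.
THETA = np.linspace(0.0, 2 * np.pi, 200_000, endpoint=False)
SUPPORT = np.stack([np.cos(THETA), np.sin(THETA)], axis=1)


def stat_oracle(query, tau, rng):
    """Simulated STAT(tau): E_D[query(x)] plus an error drawn in [-tau, tau].

    query maps an (m, 2) array of points to m values in [-1, 1].
    """
    return query(SUPPORT).mean() + rng.uniform(-tau, tau)


def sq_projection(x0, h, tau, seed=0):
    """Sketch of pi_hat(x0) = x0 + V_hat / a_hat.

    V_hat estimates V = E[(x - x0) 1{|x - x0| <= h}] and a_hat estimates
    a = D(B(x0, h)). Simplification: one query per coordinate of V (each
    with values in [-1, 1] after division by h), not the routine of Lemma 2.1.
    """
    rng = np.random.default_rng(seed)

    def in_ball(x):
        return np.linalg.norm(x - x0, axis=1) <= h

    v_hat = np.array([
        h * stat_oracle(lambda x, j=j: (x[:, j] - x0[j]) / h * in_ball(x), tau, rng)
        for j in range(len(x0))
    ])
    a_hat = stat_oracle(lambda x: in_ball(x).astype(float), tau, rng)
    return x0 + v_hat / a_hat


x0 = np.array([1.05, 0.0])          # d(x0, M) = 0.05 and pi_M(x0) = (1, 0)
target = np.array([1.0, 0.0])
err_large_h = np.linalg.norm(sq_projection(x0, h=0.3, tau=1e-4) - target)
err_small_h = np.linalg.norm(sq_projection(x0, h=0.15, tau=1e-4) - target)
```

In this toy example the bias is explicit: the local conditional mean equals \((\sin \theta _h/\theta _h, 0)\), where \(\theta _h\) is the half-angle of the arc \(M \cap \mathrm {B}(x_0,h)\), so it decreases roughly like \(\theta _h^2/6\) as h shrinks. The oracle error term, of order \(h\tau /D(\mathrm {B}(x_0,h)) \approx \tau h^{1-d}\) up to density factors, does not vanish as h decreases (and grows when \(d \ge 2\)); balancing the two is what the choice of bandwidth in the proof achieves.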
Tangent Space Estimation Routine
We now build the SQ tangent space routine \({\hat{T}} : {\mathbb {R}}^n \rightarrow {\mathbb {G}}^{n,d}\) (Theorem 4.2), which is used repeatedly in the SQ emulation of Manifold Propagation (Theorems 5.1 and 5.4). Recall that, given a point \(x_0 \in {\mathbb {R}}^n\) near \(M = \mathrm {Supp}(D)\), we aim to estimate the tangent space \(T_{\pi _M(x_0)} M\) with statistical queries to \({\mathrm {STAT}}(\tau )\). We follow the proof strategy described in Sect. 4.2.
To fix notation from now on, we let \(\left\langle {A}, {B} \right\rangle = \mathrm {tr}(A^*B)\) stand for the Euclidean inner product between \(A,B \in {\mathbb {R}}^{k \times k}\). We also write \(\left\| \Sigma \right\| _{\mathrm {F}} = \sqrt{\left\langle {\Sigma }, {\Sigma } \right\rangle }\) for the Frobenius norm, \(\left\| \Sigma \right\| _{\mathrm {op}} = \max _{\left\| v \right\| \le 1} \left\| \Sigma v \right\| \) for the operator norm, and \(\left\| \Sigma \right\| _{*} = \max _{\left\| X \right\| _{\mathrm {op}} \le 1} \left\langle {\Sigma }, {X} \right\rangle \) for the nuclear norm. In what follows, for a symmetric matrix \(A \in {\mathbb {R}}^{n \times n}\), we let \(\mu _i(A)\) denote its i-th largest singular value.
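The variational definition of the nuclear norm above coincides with the sum of singular values, the maximum being attained at \(X = UV^\top \) for a singular value decomposition \(\Sigma = U \mathrm {diag}(\mu ) V^\top \). A short NumPy check of this fact in the real case (an illustration only; the helper names are ours):

```python
import numpy as np


def nuclear_norm(S):
    """Sum of the singular values of S."""
    return np.linalg.svd(S, compute_uv=False).sum()


def dual_maximizer(S):
    """Return X = U V^T, which has operator norm 1 and satisfies <S, X> = ||S||_*."""
    U, _, Vt = np.linalg.svd(S)
    return U @ Vt


rng = np.random.default_rng(0)
S = rng.standard_normal((5, 5))
X = dual_maximizer(S)
attained = np.trace(S.T @ X)          # <S, X> = tr(S^T X)

# Random feasible competitors, rescaled to operator norm 1, never do better.
best_random = max(
    np.trace(S.T @ (Y / np.linalg.norm(Y, 2)))
    for Y in rng.standard_normal((1000, 5, 5))
)
```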
D.1 Bias of Local Principal Component Analysis
In what follows, we will write
$$\begin{aligned} \Sigma _D(x_0,h) = \mathop {{\mathbb {E}}}_{x \sim D}\left[ \frac{(x-x_0){(x-x_0)}^\top }{h^2} \mathbb {1}_{\left\| x-x_0 \right\| \le h} \right] \end{aligned}$$(D.1)
for the re-scaled local covariance-like matrix of D at \(x_0 \in {\mathbb {R}}^n\) with bandwidth \(h>0\). Notice that, for simplicity, this local covariance-like matrix is centered at the current point \(x_0\), and not at the local conditional mean \(\mathop {{\mathbb {E}}}_{x \sim D}\left[ x | \left\| x-x_0 \right\| \le h \right] \). This choice simplifies our analysis and does not impact the subsequent estimation rates. Let us first decompose this matrix and exhibit its link with the target tangent space \(T_{\pi _M(x_0)} M \in {\mathbb {G}}^{n,d}\).
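As a concrete illustration of \(\Sigma _D(x_0,h)\) and of the tangent space estimate it leads to (the span of its top d eigenvectors, as in the proof of Theorem 4.2), the Python sketch below computes it for an assumed toy distribution: the uniform distribution on the unit circle of \({\mathbb {R}}^2\) (so \(d = 1\)), discretized on a fine grid. Function names are hypothetical.

```python
import numpy as np


def local_covariance(points, weights, x0, h):
    """Sigma_D(x0, h) = E[(x - x0)(x - x0)^T / h^2 * 1{|x - x0| <= h}] for a discrete D."""
    diff = points - x0
    w = weights * (np.linalg.norm(diff, axis=1) <= h)
    return (diff * w[:, None]).T @ diff / h**2


def top_eigenspace(S, d):
    """Orthonormal basis (as columns) of the span of the d leading eigenvectors of symmetric S."""
    eigval, eigvec = np.linalg.eigh(S)             # eigenvalues in ascending order
    return eigvec[:, np.argsort(eigval)[::-1][:d]]


# Assumed toy distribution: uniform on the unit circle (d = 1), on a fine grid.
theta = np.linspace(0.0, 2 * np.pi, 100_000, endpoint=False)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)
weights = np.full(len(points), 1.0 / len(points))

phi = 0.3
x0 = 1.05 * np.array([np.cos(phi), np.sin(phi)])  # d(x0, M) = 0.05, pi_M(x0) at angle phi
S = local_covariance(points, weights, x0, h=0.3)
T_hat = top_eigenspace(S, d=1)
tangent = np.array([-np.sin(phi), np.cos(phi)])   # true tangent direction at pi_M(x0)
sin_angle = np.sqrt(max(0.0, 1.0 - float(tangent @ T_hat[:, 0]) ** 2))
```

Here the arc \(M \cap \mathrm {B}(x_0,h)\) is symmetric about the normal line through \(x_0\), so the centering at \(x_0\) rather than at the local mean does not tilt the leading eigenvector.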
Lemma D.1
Let \(D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) have support \(M = \mathrm {Supp}(D)\), \(x_0 \in {\mathbb {R}}^n\) and \(h > 0\). If \(\mathrm {d}(x_0,M) \le \eta \le h/\sqrt{2}\) and \(h \le {\mathrm {rch}}_{\min }/(8\sqrt{d})\), then there exists a symmetric matrix \(\Sigma _0 \in {\mathbb {R}}^{n \times n}\) with \({\text {Im}}(\Sigma _0) = T_{\pi _M(x_0)} M\) such that
with \(\mu _d(\Sigma _0) \ge \omega _d f_{\min }(ch)^d\) and \(\left\| R \right\| _{*} \le \omega _d f_{\max } (C h)^d \left( \frac{\eta }{h} + \frac{h}{{\mathrm {rch}}_{\min }}\right) \), where \(c,C>0\) are absolute constants.
Proof of Lemma D.1
This proof roughly follows the ideas of [6, Section E.1], with a different center point in the covariance matrix (\(x_0\) itself instead of the local mean around \(x_0\)) and finer (nuclear norm) estimates on residual terms. For brevity, we let \(p_0 = \pi _M(x_0)\). We first note that the integrand defining \(h^2\Sigma _D(x_0,h)\) decomposes as
for all \(x \in \mathrm {B}(x_0,h) \cap M\). After integrating with respect to \(x \sim D\), we bound the last two terms by writing
where the last inequality uses Lemma B.1. Similarly, for the second term of Eq. (D.2), we have
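Given \(v\in {\mathbb {R}}^n\), we decompose it into its orthogonal projections onto \(T_{p_0} M\) and onto \((T_{p_0} M)^\perp \), the latter being denoted by \(v_\perp \). We now focus on the first term of Eq. (D.2), which we further decompose as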
for all \(x \in \mathrm {B}(x_0,h) \cap M\). Note that for those points \(x \in \mathrm {B}(x_0,h) \cap M\), we have \(\left\| x-p_0 \right\| \le \left\| x-x_0 \right\| + \left\| x_0-p_0 \right\| \le h + \eta \le 2h\), and from [31, Theorem 4.18], \(\left\| (x-p_0)_\perp \right\| \le \left\| x-p_0 \right\| ^2/(2{\mathrm {rch}}_{\min }) \le 4 h^2/(2{\mathrm {rch}}_{\min })\). Hence, for the last two terms of Eq. (D.3),
where we used Lemma B.1 again. Dealing now with the second term of Eq. (D.3),
Finally, let us write
The matrix \(\Sigma _0\) is symmetric and clearly has image \({\text {Im}}(\Sigma _0) \subseteq T_{p_0} M\). Furthermore, since \(\mathrm {d}(x_0,M) \le \eta \le h/\sqrt{2}\) and \(h \le {\mathrm {rch}}_{\min }/8\), Proposition B.1 and Lemma 2.2 yield that \(M \cap \mathrm {B}(x_0,h) \supseteq M \cap \mathrm {B}\bigl (p_0, \sqrt{7}h/4\bigr ) \supseteq \mathrm {B}_M(p_0,h/2)\). Hence, for all \(u \in T_{p_0} M\),
where \({\mathcal {H}}^d\) is the d-dimensional Hausdorff measure on \({\mathbb {R}}^n\), and \(\exp _{p_0}^M : T_{p_0} M \rightarrow M\) is the exponential map of M at \(p_0\). But [6, Proposition 8.7] states that there exists \(c > 0\) such that for all \(v \in \mathrm {B}_d(0, {\mathrm {rch}}_{\min }/4)\), \(\left| \det \left( \mathrm {d}_v \exp _{p_0}^M \right) \right| \ge c^d\), and [7, Lemma 1] yields the bound \(\left\| \exp _{p_0}^M(v)- (p_0 + v) \right\| \le 5\left\| v \right\| ^2/(8 {\mathrm {rch}}_{\min })\). As a result, using the fact that \((a-b)^2 \ge a^2/2 - 3b^2\) for all \(a,b \in {\mathbb {R}}\), we have
as soon as \(h \le {\mathrm {rch}}_{\min }/\sqrt{d}\). In particular, the last bound shows that the image of \(\Sigma _0\) is exactly \(T_{p_0} M\), and that \(\mu _d(\Sigma _0) \ge \omega _d f_{\min }(c'h)^d\). Summing up the above, we have shown that
where \(\Sigma _0\) is symmetric, \({\text {Im}}(\Sigma _0) = T_{\pi _M(x_0)} M\), \(\mu _d(\Sigma _0) \ge \omega _d f_{\min }(c'h)^d\), and
which is the announced result. \(\square \)
D.2 Matrix Decomposition and Principal Angles
The following lemma ensures that the principal components of a matrix A are stable under perturbations, provided that A has a large enough spectral gap. For a symmetric matrix \(A \in {\mathbb {R}}^{n \times n}\), recall that \(\mu _i(A)\) denotes its i-th largest singular value.
Lemma D.2
(Davis-Kahan) Let \({\hat{A}}, A \in {\mathbb {R}}^{n\times n}\) be symmetric matrices such that \(\mathrm {rank}(A) = d\). If \({\hat{T}} \in {\mathbb {G}}^{n,d}\) denotes the linear space spanned by the first d eigenvectors of \({\hat{A}}\), and \(T = {\text {Im}}(A) \in {\mathbb {G}}^{n,d}\), then
Proof of Lemma D.2
It is a direct application of [63, Theorem 2] with \(r = 1\) and \(s = d\).
\(\square \)
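Since the displayed bound of Lemma D.2 is not reproduced here, the sketch below assumes the common Frobenius-norm form of the Davis-Kahan \(\sin \theta \) theorem for a positive semidefinite matrix A of rank d, namely \(\sin \angle ({\hat{T}}, T) \le 2\left\| {\hat{A}}-A \right\| _{\mathrm {F}}/\mu _d(A)\), and checks it on one random instance in NumPy. It uses the fact that, for two subspaces of equal dimension, the operator norm of the difference of their orthogonal projectors is the sine of their largest principal angle. Helper names are ours.

```python
import numpy as np


def projector(basis):
    """Orthogonal projector onto the span of the orthonormal columns of basis."""
    return basis @ basis.T


def top_eigenvectors(S, d):
    """Orthonormal basis of the span of the d leading eigenvectors of symmetric S."""
    eigval, eigvec = np.linalg.eigh(S)              # eigenvalues in ascending order
    return eigvec[:, np.argsort(eigval)[::-1][:d]]


rng = np.random.default_rng(1)
n, d = 20, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))    # orthonormal basis of T = Im(A)
A = Q @ np.diag([3.0, 2.0, 1.0]) @ Q.T               # PSD of rank d with mu_d(A) = 1
E = 0.005 * rng.standard_normal((n, n))
E = (E + E.T) / 2                                    # small symmetric perturbation
A_hat = A + E

Q_hat = top_eigenvectors(A_hat, d)                   # basis of T_hat
# For subspaces of equal dimension, ||P_hat - P||_op is the sine of the largest principal angle.
sin_max_angle = np.linalg.norm(projector(Q_hat) - projector(Q), 2)
mu_d = np.sort(np.linalg.eigvalsh(A))[::-1][d - 1]
bound = 2 * np.linalg.norm(E, "fro") / mu_d
```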
D.3 Low-rank Matrix Recovery
Proceeding further in the strategy described in Sect. 4.2, we now explain how to estimate the local covariance matrix \(\Sigma _D(x_0,h) \in {\mathbb {R}}^{n \times n}\) (Eq. (D.1)) in \({\mathrm {STAT}}(\tau )\).
Because \(\Sigma _D(x_0,h) \in {\mathbb {R}}^{n \times n} \simeq {\mathbb {R}}^{n^2}\) can be seen as a mean vector with respect to the unknown distribution D, Lemma 2.1 shows that \(2n^2\) queries to \({\mathrm {STAT}}(\tau )\) would yield error \(O(\tau )\). However, this would not exploit the low-rank structure of \(\Sigma _D(x_0,h)\), that is, the redundancy among its entries. To reduce the query complexity of this estimation problem, we will use compressed sensing techniques [30]. Mimicking the vector case (Lemma 2.1), we put our problem in the broader context of the estimation of \(\Sigma = \mathop {{\mathbb {E}}}_{x \sim D}[F(x)] \in {\mathbb {R}}^{k \times k}\) in \({\mathrm {STAT}}(\tau )\), where \(F : {\mathbb {R}}^n \rightarrow {\mathbb {R}}^{k \times k}\) and \(\Sigma \) are approximately low rank (see Lemma D.4).
D.3.1 Restricted Isometry Property and Low-Rank Matrix Recovery
Let us first present some fundamental results from low-rank matrix recovery. Following [30, Section II], assume that we observe \(y \in {\mathbb {R}}^q\) such that
where \(\Sigma \in {\mathbb {R}}^{k \times k}\) is the matrix of interest, \({\mathcal {L}}: {\mathbb {R}}^{k\times k} \rightarrow {\mathbb {R}}^q\) is a linear map seen as a sampling operator, and \(z \in {\mathbb {R}}^q\) encodes noise and has small Euclidean norm \(\left\| z \right\| \le \xi \).
In general, when \(q < k^2\), \({\mathcal {L}}\) has a nontrivial kernel, and hence there is no hope of recovering \(\Sigma \) from y alone, even without noise. However, if \(\Sigma \) is (close to being) low-rank and \({\mathcal {L}}\) does not shrink low-rank matrices too much, \({\mathcal {L}}(\Sigma )\) may retain essentially all the information on \(\Sigma \), while compressing the dimension from \(k^2\) to q. One way to formalize this idea is as follows.
Definition D.1
(Restricted Isometry Property) Let \({\mathcal {L}}: {\mathbb {R}}^{k\times k} \rightarrow {\mathbb {R}}^q\) be a linear map, and \(d \le k\). We say that \({\mathcal {L}}\) satisfies the d-restricted isometry property with constant \(\delta > 0\) if for all matrices \(X \in {\mathbb {R}}^{k \times k}\) of rank at most d,
We let \(\delta _d({\mathcal {L}})\) denote the smallest such \(\delta \).
To recover \(\Sigma \) only from the knowledge of y, consider the convex optimization problem (see [30]) over \(X \in {\mathbb {R}}^{k \times k}\):
Let \(\Sigma _\text {opt}\) denote the solution of Eq. (D.5). Intuitively, the nuclear norm serves here as a convex relaxation of the rank function [30], so that Eq. (D.5) is expected to capture a low-rank matrix close to \(\Sigma \). If \({\mathcal {L}}\) satisfies the restricted isometry property, the next result states that (D.5) does indeed capture such a low-rank matrix. In what follows, we let \(\Sigma ^{(d)} \in {\mathbb {R}}^{k \times k}\) denote the matrix closest to \(\Sigma \) among all the matrices of rank d, where closeness may be measured in nuclear, Frobenius, or operator norm alike: by the Eckart-Young-Mirsky theorem, \(\Sigma ^{(d)}\) can be taken as the truncated singular value decomposition of \(\Sigma \) in all three cases.
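A quick NumPy check of this characterization of \(\Sigma ^{(d)}\) (our illustration): the truncated SVD has rank d, its approximation errors in operator, Frobenius and nuclear norm are given by the discarded singular values, and an arbitrary rank-d competitor is never closer in any of the three norms.

```python
import numpy as np


def truncated_svd(S, d):
    """Best rank-d approximation Sigma^(d): keep the d largest singular triplets."""
    U, s, Vt = np.linalg.svd(S)
    return (U[:, :d] * s[:d]) @ Vt[:d]


rng = np.random.default_rng(2)
S = rng.standard_normal((8, 8))
d = 3
S_d = truncated_svd(S, d)
s = np.linalg.svd(S, compute_uv=False)    # singular values, in decreasing order
err = S - S_d

# An arbitrary rank-d competitor, for comparison.
B = rng.standard_normal((8, d)) @ rng.standard_normal((d, 8))
```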
Theorem D.1
[30, Theorem 4] Assume that \(\delta _{5d} < 1/10\). Then the solution \(\Sigma _\text {opt}\) of Eq. (D.5) satisfies
where \(C_0, C_1 > 0\) are universal constants.
D.3.2 Building a Good Matrix Sensing Operator
We now detail a standard way to build a sampling operator \({\mathcal {L}}\) that satisfies the restricted isometry property (Definition D.1), thereby allowing one to recover low-rank matrices from a few measurements (Theorem D.1). For purely technical reasons, we present a construction over the complex linear space \({\mathbb {C}}^{k \times k}\). This will eventually enable us to recover results over \({\mathbb {R}}^{k \times k}\) via the isometric embedding \({\mathbb {R}}^{k \times k} \hookrightarrow {\mathbb {C}}^{k \times k}\).
First, we note that given an orthonormal \({\mathbb {C}}\)-basis \({\mathbb {W}} = (W_1, \dots , W_{k^2})\) of \({\mathbb {C}}^{k \times k}\) for the Hermitian inner product \(\left\langle {A}, {B} \right\rangle = \mathrm {tr}(A^* B)\), we can build a sampling operator \({\mathcal {L}}_{{\mathbb {S}}}: {\mathbb {C}}^{k\times k} \rightarrow {\mathbb {C}}^q\) by projecting orthogonally onto the span of a subset \({\mathbb {S}} \subseteq {\mathbb {W}}\) of q (randomly) pre-selected basis elements.
When \(k = 2^\ell \), an orthonormal basis of \({\mathbb {C}}^{k \times k}\) of particular interest is the so-called Pauli basis [48]. Its construction goes as follows:
-
For \(k = 2\) (\(\ell = 1\)), it is defined by \(W^{(1)}_i = \sigma _i / \sqrt{2}\), where
$$\begin{aligned} \sigma _1 = \begin{pmatrix} 0 &{} 1 \\ 1 &{} 0 \end{pmatrix} , \quad \sigma _2 = \begin{pmatrix} 0 &{} -i \\ i &{} 0 \end{pmatrix} , \quad \sigma _3 = \begin{pmatrix} 1 &{} 0 \\ 0 &{} -1 \end{pmatrix} , \quad \sigma _4 = \begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix}. \end{aligned}$$
Each \(\sigma _i\) is Hermitian with eigenvalues in \(\left\{ {-1,1}\right\} \), and is therefore also unitary. In particular, \(\left\| W^{(1)}_i \right\| _{\mathrm {op}} = 1 / \sqrt{2}\) and \(\left\| W^{(1)}_i \right\| _{\mathrm {F}} = 1\) for all \(i \in \left\{ {1, \dots , 4}\right\} \). One easily checks that \(\bigl (W^{(1)}_i\bigr )_{1 \le i \le 4}\) is an orthonormal basis of \({\mathbb {C}}^{2 \times 2}\).
-
For \(k = 2^\ell \) (\(\ell \ge 2\)), the Pauli basis \(\bigl (W^{(\ell )}_i\bigr )_{1 \le i \le 4^\ell }\) is composed of matrices acting on the tensor space \(\left( {\mathbb {C}}^2\right) ^{\otimes \ell } \simeq {\mathbb {C}}^{2^\ell }\), and defined as the family of all the \(4^\ell = k^2\) possible \(\ell \)-fold tensor products of elements of \(\bigl (W^{(1)}_i\bigr )_{1 \le i \le 4}\). As \(\left\langle {W_1 \otimes W_2}, {W'_1 \otimes W'_2} \right\rangle = \left\langle {W_1}, {W'_1} \right\rangle \left\langle {W_2}, {W'_2} \right\rangle \), tensor products preserve orthonormality, so that \(\bigl (W^{(\ell )}_i\bigr )_{1 \le i \le 4^\ell }\) is an orthonormal family of \(k^2\) matrices, hence an orthonormal basis of \({\mathbb {C}}^{2^\ell \times 2^\ell }\). Furthermore, as \(\left\| W \otimes W' \right\| _{\mathrm {op}} = \left\| W \right\| _{\mathrm {op}} \left\| W' \right\| _{\mathrm {op}} \), we get that for all \(i \in \left\{ {1, \dots , k^2}\right\} \),
$$\begin{aligned} \left\| W^{(\ell )}_i \right\| _{\mathrm {op}} = \left( \frac{1}{\sqrt{2}}\right) ^\ell = \frac{1}{\sqrt{k}} . \end{aligned}$$(D.6)
Since \(1 = \left\| W \right\| _{\mathrm {F}} \le \sqrt{k} \left\| W \right\| _{\mathrm {op}} \) for every element W of an orthonormal basis of \({\mathbb {C}}^{k \times k}\), the value \(1/\sqrt{k}\) is actually the smallest possible common operator norm of such a basis. As will be clear in the proof of Lemma D.3, this last property (called incoherence in the matrix completion literature [48]) is key to designing a good sampling operator.
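The construction of the Pauli basis and the two properties used below (orthonormality and common operator norm \(1/\sqrt{k}\)) can be checked numerically. The following NumPy sketch builds the basis for \(\ell = 3\), i.e., \(k = 8\), via Kronecker products; `pauli_basis` and `gram_matrix` are our own helper names.

```python
import itertools

import numpy as np

# The four 2 x 2 matrices sigma_1, ..., sigma_4 of the text.
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[1, 0], [0, 1]], dtype=complex),
]


def pauli_basis(ell):
    """All 4^ell Kronecker products of ell matrices among sigma_i / sqrt(2)."""
    basis = []
    for factors in itertools.product(SIGMA, repeat=ell):
        W = np.ones((1, 1), dtype=complex)
        for f in factors:
            W = np.kron(W, f / np.sqrt(2))
        basis.append(W)
    return basis


def gram_matrix(basis):
    """Matrix of Hermitian inner products <W_i, W_j> = tr(W_i^* W_j)."""
    flat = np.array([W.reshape(-1) for W in basis])   # row i is vec(W_i)
    return flat.conj() @ flat.T


ell = 3
k = 2**ell
basis = pauli_basis(ell)
gram = gram_matrix(basis)
op_norms = [np.linalg.norm(W, 2) for W in basis]
```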
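Still considering the case \(k = 2^\ell \), we let \({\mathcal {L}}_{\mathrm {Pauli}}: {\mathbb {C}}^{k \times k} \rightarrow {\mathbb {C}}^q\) denote the random sampling operator defined by
$$\begin{aligned} {\mathcal {L}}_{\mathrm {Pauli}}(X) = \frac{k}{\sqrt{q}} \left( \left\langle {W^{(\ell )}_{I_1}}, {X} \right\rangle , \dots , \left\langle {W^{(\ell )}_{I_q}}, {X} \right\rangle \right) , \end{aligned}$$(D.7)
where \((I_i)_{1 \le i \le q}\) is an i.i.d. sequence with uniform distribution over \(\left\{ {1, \dots , k^2}\right\} \). Up to the factor \(k/\sqrt{q}\), \({\mathcal {L}}_{\mathrm {Pauli}}\) is the orthogonal projector onto the space spanned by \((W^{(\ell )}_{I_1}, \dots ,W^{(\ell )}_{I_q})\). This normalization \(k/\sqrt{q}\) is chosen so that for all \(X \in {\mathbb {C}}^{k \times k}\),
$$\begin{aligned} \mathop {{\mathbb {E}}}\left[ \left\| {\mathcal {L}}_{\mathrm {Pauli}}(X) \right\| ^2 \right] = \frac{k^2}{q} \sum _{i=1}^q \frac{1}{k^2} \sum _{j=1}^{k^2} \left| \left\langle {W^{(\ell )}_j}, {X} \right\rangle \right| ^2 = \left\| X \right\| _{\mathrm {F}}^2 . \end{aligned}$$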
That is, roughly speaking, \({\mathcal {L}}_{\mathrm {Pauli}}\) satisfies the restricted isometry property (RIP, Definition D.1) on average. In fact, as soon as q is large enough compared to kd, the result below states that \({\mathcal {L}}_{\mathrm {Pauli}}\) does fulfill a restricted isometry property with high probability.
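The normalization of \({\mathcal {L}}_{\mathrm {Pauli}}\) can be illustrated as follows, assuming the definition \({\mathcal {L}}_{\mathrm {Pauli}}(X) = \frac{k}{\sqrt{q}} \bigl ( \langle W^{(\ell )}_{I_1}, X\rangle , \dots , \langle W^{(\ell )}_{I_q}, X\rangle \bigr )\). Averaging \(\left\| {\mathcal {L}}_{\mathrm {Pauli}}(X) \right\| ^2\) over all \(k^2\) equally likely single indices recovers \(\left\| X \right\| _{\mathrm {F}}^2\) exactly (Parseval), and for a rank-one X and moderately large q, a single random draw is already close to it. This is a toy NumPy sketch with hypothetical helper names, not a proof of Lemma D.3.

```python
import itertools

import numpy as np

PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[1, 0], [0, 1]], dtype=complex),
]


def pauli_basis(ell):
    """The 4^ell Kronecker products of the normalized matrices sigma_i / sqrt(2)."""
    basis = []
    for factors in itertools.product(PAULI, repeat=ell):
        W = np.ones((1, 1), dtype=complex)
        for f in factors:
            W = np.kron(W, f / np.sqrt(2))
        basis.append(W)
    return basis


def pauli_sampling(X, basis, indices):
    """(k / sqrt(q)) * (<W_{I_1}, X>, ..., <W_{I_q}, X>) with <A, B> = tr(A^* B)."""
    k, q = X.shape[0], len(indices)
    # np.vdot conjugates its first argument and flattens both: vdot(A, B) = tr(A^* B).
    return (k / np.sqrt(q)) * np.array([np.vdot(basis[i], X) for i in indices])


ell = 3
k = 2**ell
basis = pauli_basis(ell)
rng = np.random.default_rng(4)
X = np.outer(rng.standard_normal(k), rng.standard_normal(k))   # real, rank one
fro2 = np.linalg.norm(X, "fro") ** 2

# Exact E||L(X)||^2 (it does not depend on q): average over all k^2 indices with q = 1.
exact_mean = np.mean([np.linalg.norm(pauli_sampling(X, basis, [j])) ** 2 for j in range(k**2)])

# One random draw with q i.i.d. uniform indices.
q = 5000
ratio = np.linalg.norm(pauli_sampling(X, basis, rng.integers(0, k**2, size=q))) ** 2 / fro2
```

Incoherence is what keeps each summand bounded here: for rank-one X, \(k^2 |\langle W, X\rangle |^2 \le k \left\| X \right\| _{\mathrm {F}}^2\), so the random draw concentrates around its mean.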
Lemma D.3
Assume that \(k = 2^\ell \), and fix \(0 < \alpha \le 1\). There exist universal constants \(c_0,c_1 > 0\) such that if \(q \ge c_0 k d \log ^6(k) \log (c_1/\alpha )\), then with probability at least \(1 - \alpha \), the following holds.
For all \(X \in {\mathbb {R}}^{k \times k}\) such that \(\left\| X \right\| _{*} \le \sqrt{5d} \left\| X \right\| _{\mathrm {F}} \),
In particular, on the same event of probability at least \(1 - \alpha \), \(\delta _{5d}\left( {\mathcal {L}}_{\mathrm {Pauli}}\right) < 1/10\).
Proof of Lemma D.3
The Pauli basis is an orthonormal basis of \({\mathbb {C}}^{k \times k}\), and from Eq. (D.6), its elements all have operator norm equal to \(1/\sqrt{k}\). Hence, applying [48, Theorem 2.1] with \(K=\sqrt{k} \max _{1 \le i \le k^2} \left\| W^{(\ell )}_i \right\| _{\mathrm {op}} = 1\), \(r = 5d\), \(C = c_0 \log (c_1/\alpha )\), and \(\delta = 1/20\) yields the first bound. The second one follows by recalling that any matrix \(X \in {\mathbb {R}}^{k \times k}\) of rank at most r satisfies \(\left\| X \right\| _{*} \le \sqrt{r} \left\| X \right\| _{\mathrm {F}} \). \(\square \)
D.3.3 Mean Matrix Completion with Statistical Queries
The low-rank matrix recovery results of Appendices D.3.1 and D.3.2, combined with mean vector estimation in \({\mathrm {STAT}}(\tau )\) for the Euclidean norm (see Lemma 2.1), lead to the following result.
Lemma D.4
For all \(\alpha \in (0,1]\), there exists a family of statistical query algorithms indexed by maps \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^{k \times k}\) such that the following holds on an event of probability at least \(1-\alpha \) (uniformly over F).
Let D be a Borel probability distribution over \({\mathbb {R}}^n\), and \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^{k \times k}\) be a map such that for all \(x \in {\mathbb {R}}^n\), \(\left\| F(x) \right\| _{\mathrm {F}} \le 1\) and \(\left\| F(x) \right\| _{*} \le \sqrt{5d} \left\| F(x) \right\| _{\mathrm {F}} \). Write \(\Sigma = \mathop {{\mathbb {E}}}_{x \sim D}\left[ F(x) \right] \), and \(\Sigma ^{(d)}\) for the matrix closest to \(\Sigma \) among all the matrices of rank \(d\le k\). Assume that \(\Sigma \in \Xi \), where \(\Xi \subseteq {\mathbb {R}}^{k \times k}\) is a known linear subspace of \({\mathbb {R}}^{k \times k}\).
Then, there exists a statistical query algorithm making at most \(c_0 dk\log ^6(k) \log (c_1/\alpha )\) queries to \({\mathrm {STAT}}(\tau )\), and that outputs a matrix \({\hat{\Sigma }} \in \Xi \) that satisfies
on the event of probability at least \(1-\alpha \) described above, where \(C_0,C_1 > 0\) are universal constants.
Proof of Lemma D.4
Without loss of generality, we can assume that \(k = 2^\ell \). Indeed, one can always embed \({\mathbb {R}}^{k\times k}\) isometrically into \({\mathbb {R}}^{2^\ell \times 2^\ell }\), with \(2^\ell = 2^{\left\lceil \log _2(k)\right\rceil } \le 2k \), via the linear map
which preserves the rank as well as the Frobenius and nuclear norms.
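The explicit embedding map is not reproduced above. One natural choice with the stated properties, assumed here purely for illustration, is zero-padding: place X in the top-left block of a \(2^\ell \times 2^\ell \) zero matrix. A NumPy check (the helper name is hypothetical):

```python
import numpy as np


def pad_to_power_of_two(X):
    """Embed a k x k matrix in the top-left corner of a 2^ell x 2^ell zero matrix,
    where 2^ell is the smallest power of two >= k (so that 2^ell <= 2k)."""
    k = X.shape[0]
    size = 1 << (k - 1).bit_length()
    Y = np.zeros((size, size), dtype=X.dtype)
    Y[:k, :k] = X
    return Y


rng = np.random.default_rng(5)
k = 6
X = rng.standard_normal((k, 2)) @ rng.standard_normal((2, k))   # rank 2
Y = pad_to_power_of_two(X)
```

Zero-padding leaves the nonzero singular values unchanged, which is why the rank and both norms are preserved.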
Let \(q \ge 1\) be a fixed integer to be specified later, let \((I_i)_{1 \le i \le q}\) be an i.i.d. sequence with uniform distribution over \(\left\{ {1, \dots , k^2}\right\} \), and for \(X \in {\mathbb {R}}^{k \times k}\), write
as in Eq. (D.7). For \(x \in {\mathbb {R}}^n\), write \(G(x) = {\mathcal {L}}_{\mathrm {Pauli}}(F(x))/2 \in {\mathbb {R}}^{2q}\). From Lemma D.3, with probability at least \(1- \alpha \) (over the randomness of \((I_i)_{1 \le i \le q}\)),
holds simultaneously for all the maps \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^{k\times k}\) described above. Hence, on this event of probability at least \(1-\alpha \), Lemma 2.1 applies to G and provides a deterministic statistical query algorithm making 4q queries to \({\mathrm {STAT}}(\tau )\), and that outputs a vector \(y \in {\mathbb {R}}^{2q}\) such that
where \(C>0\) is a universal constant. But on the other hand, by linearity,
where all the expected values are taken with respect to D, conditionally on \((I_i)_{1 \le i \le q}\). Hence, as soon as \(q \ge c_0 dk\log ^6(k) \log (c_1/\alpha )\), Theorem D.1 and Lemma D.3 together yield the following: on the same event of probability at least \(1 - \alpha \) as before, the solution \(\Sigma _\text {opt}\) to the convex optimization problem over \(X \in {\mathbb {R}}^{k \times k}\) given by
satisfies
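Hence, the projected solution \({\hat{\Sigma }} = \pi _\Xi (\Sigma _\text {opt})\) onto \(\Xi \subseteq {\mathbb {R}}^{k \times k}\) belongs to \(\Xi \), and since \(\Sigma = \pi _\Xi (\Sigma )\) and the orthogonal projection \(\pi _\Xi \) is 1-Lipschitz for the Frobenius norm, it satisfies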
which concludes the proof. \(\square \)
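In the application of Lemma D.4 to Theorem 4.2, \(\Xi \) is the space of symmetric matrices, for which the orthogonal projection (for the Frobenius inner product) is simply \(X \mapsto (X + X^\top )/2\). The NumPy sketch below, where a random perturbation is a hypothetical stand-in for the solution \(\Sigma _\text {opt}\) of the convex program, checks that this projection never increases the Frobenius distance to a symmetric \(\Sigma \) and that its residual is orthogonal to \(\Xi \).

```python
import numpy as np


def project_symmetric(X):
    """Orthogonal projection, for the Frobenius inner product, onto symmetric matrices."""
    return (X + X.T) / 2


rng = np.random.default_rng(3)
Sigma = rng.standard_normal((6, 6))
Sigma = Sigma + Sigma.T                                  # a symmetric target in Xi
Sigma_opt = Sigma + 0.1 * rng.standard_normal((6, 6))    # stand-in for the convex-program output
Sigma_hat = project_symmetric(Sigma_opt)
residual = Sigma_opt - Sigma_hat                         # antisymmetric part of Sigma_opt
```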
D.4 Tangent Space Estimation with Statistical Queries
We finally prove the main announced statement of Appendix D.
Proof of Theorem 4.2
Let \(h >0\) be a bandwidth to be specified later, such that \(\eta \le h/\sqrt{2}\) and \(h \le {\mathrm {rch}}_{\min }/(8\sqrt{d})\). First note that \(\Sigma _D(x_0,h) = \mathop {{\mathbb {E}}}_{x \sim D} \left[ F(x) \right] \), where the function \(F(x) = (x-x_0){(x-x_0)}^\top /h^2 \mathbb {1}_{\left\| x-x_0 \right\| \le h}\) is defined for all \(x \in {\mathbb {R}}^n\), and is such that \(\left\| F(x) \right\| _{\mathrm {F}} \le 1\) and \(\mathrm {rank}(F(x)) \le 1\). In particular, \(\left\| F(x) \right\| _{*} = \left\| F(x) \right\| _{\mathrm {F}} \le \sqrt{5d} \left\| F(x) \right\| _{\mathrm {F}} \) for all \(x \in {\mathbb {R}}^{n}\). Furthermore, \(\Sigma _D(x_0,h)\) belongs to the linear space \(\Xi \) of symmetric matrices. Working on the event where Lemma D.4 holds (with \(\alpha = 1/2\), say) yields the existence of a deterministic SQ algorithm making at most \(c_0 dn\log ^6(n) \log (2c_1)\) queries to \({\mathrm {STAT}}(\tau )\), and that outputs a symmetric matrix \({\hat{\Sigma }}\) that satisfies
with probability at least \(1 - \alpha \). On the other hand, from Lemma D.1, provided that \(\sqrt{2} \eta \le h \le {\mathrm {rch}}_{\min }/(8\sqrt{d})\), one can write
where the symmetric matrix \(\Sigma _0\) satisfies \({\text {Im}}(\Sigma _0) = T_{\pi _M(x_0)} M\), \(\mu _d(\Sigma _0) \ge \omega _d f_{\min }(ch)^d\) and \(\left\| R \right\| _{\mathrm {F}} \le \left\| R \right\| _{*} \le \omega _d f_{\max } (C h)^d \left( \frac{\eta }{h} + \frac{h}{{\mathrm {rch}}_{\min }}\right) \). As \(\mathrm {rank}(\Sigma _0) = d\), we have in particular that
Therefore, taking \({\hat{T}}(x_0)\) as the linear space spanned by the first d eigenvectors of \({\hat{\Sigma }}\), Lemma D.2 yields
We conclude by setting \( h = {\mathrm {rch}}_{\min } \left\{ \sqrt{\frac{\eta }{{\mathrm {rch}}_{\min }}} \vee \left( \frac{\tau }{\omega _d f_{\max } {\mathrm {rch}}_{\min }^d} \right) ^{1/(d+1)} \right\} \) in this last bound. This value for h does satisfy \(\sqrt{2} \eta \le h \le {\mathrm {rch}}_{\min }/(8\sqrt{d})\) since \(\eta \le {\mathrm {rch}}_{\min }/(64d)\) and \(\frac{\tau }{\omega _d f_{\max } {\mathrm {rch}}_{\min }^d} \le \left( \frac{1}{8\sqrt{d}}\right) ^{d+1}\), so that the whole analysis applies, and yields the announced result. \(\square \)
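The two constraints on this choice of h can also be checked numerically on a grid of parameters satisfying \(\eta \le {\mathrm {rch}}_{\min }/(64d)\) and \(\frac{\tau }{\omega _d f_{\max } {\mathrm {rch}}_{\min }^d} \le \left( \frac{1}{8\sqrt{d}}\right) ^{d+1}\), including the boundary cases. The Python sketch below is only a sanity check of the arithmetic; the helper names are hypothetical.

```python
import itertools
import math


def unit_ball_volume(d):
    """omega_d, the volume of the unit d-ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def bandwidth(eta, tau, rch_min, d, f_max):
    """h = rch_min * max(sqrt(eta / rch_min), (tau / (omega_d f_max rch_min^d))^(1/(d+1)))."""
    ratio = tau / (unit_ball_volume(d) * f_max * rch_min**d)
    return rch_min * max(math.sqrt(eta / rch_min), ratio ** (1 / (d + 1)))


def constraints_hold(eta, tau, rch_min, d, f_max, rel_tol=1e-12):
    """Check sqrt(2) * eta <= h <= rch_min / (8 sqrt(d)), up to rounding."""
    h = bandwidth(eta, tau, rch_min, d, f_max)
    upper = rch_min / (8 * math.sqrt(d))
    return math.sqrt(2) * eta <= h * (1 + rel_tol) and h <= upper * (1 + rel_tol)


# s and t scale eta and tau relative to their maximal admissible values.
cases = []
for rch_min, d, f_max, s, t in itertools.product(
    (0.5, 1.0, 2.0), (1, 2, 3, 5), (1.0, 10.0), (1e-6, 0.5, 1.0), (1e-9, 0.5, 1.0)
):
    eta = s * rch_min / (64 * d)
    tau = t * unit_ball_volume(d) * f_max * rch_min**d * (1 / (8 * math.sqrt(d))) ** (d + 1)
    cases.append((eta, tau, rch_min, d, f_max))
```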
Seed Point Detection
We now build the SQ point detection algorithm \({\hat{x}}_0 \in {\mathbb {R}}^n\) (Theorem 4.3), which is used to initialize the SQ emulation of Manifold Propagation, yielding the SQ reconstruction algorithm in the model \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) where no seed point is available (Definition 2.6).
Recall that given a ball of radius \(R>0\) guaranteed to encompass \(M = \mathrm {Supp}(D) \subseteq \mathrm {B}(0,R)\), and a target precision \(\eta > 0\), we aim at finding a point that is \(\eta \)-close to M with statistical queries to \({\mathrm {STAT}}(\tau )\). We follow the strategy of proof described in Sect. 4.3.
E.1 Detecting a Raw Initial Point
Starting from the whole ball \(\mathrm {B}(0,R)\), the following result allows us to find a point near M using a binary search, with best precision of order \(\Omega (\tau ^{1/d})\). Let us note that it does not explicitly rely on any differential property of M, but only on the behavior of the mass of balls for D (Lemma B.1).
Theorem E.1
Let \(D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) have support \(M = \mathrm {Supp}(D) \subseteq \mathrm {B}(0,R)\). Let \(\Lambda _0 \le {\mathrm {rch}}_{\min }/8\) be fixed, and assume that \( \frac{\Lambda _0}{\sqrt{\log (6R/\Lambda _0)}} \ge 21 {\mathrm {rch}}_{\min } \sqrt{n} \left( \frac{\tau }{\omega _d f_{\min }{\mathrm {rch}}_{\min }^d}\right) ^{1/d} \).
Then there exists a deterministic statistical query algorithm making at most \(3 n \log (6R/\Lambda _0)\) queries to \({\mathrm {STAT}}(\tau )\), and that outputs a point \({\hat{x}}_0^{raw} \in \mathrm {B}(0,R)\) such that
Remark E.1
Recall from Sect. 2.2.3 that we always assume that \(R \ge {\mathrm {rch}}_{\min }/\sqrt{2}\) to ensure that the model is nonempty. As a result \(\log (6R/\Lambda _0) \ge 0\) for all \(\Lambda _0 \le {\mathrm {rch}}_{\min }/8\).
Proof of Theorem E.1
The idea is to use a divide-and-conquer strategy over a covering \(\left\{ {x_i}\right\} _{1 \le i \le N}\) of \(\mathrm {B}(0,R)\). The algorithm recurses over a subset of indices \({\mathcal {I}} \subseteq \left\{ {1,\ldots ,N}\right\} \) that is maintained to fulfill \(\cup _{i \in {\mathcal {I}}} \mathrm {B}(x_i,h) \cap M \ne \emptyset \) for some known \(h>0\). This property can be checked with the single query \(r = \mathbb {1}_{\cup _{i \in {\mathcal {I}}} \mathrm {B}(x_i,h)}\) to \({\mathrm {STAT}}(\tau )\), provided that \(D(\cup _{i \in {\mathcal {I}}} \mathrm {B}(x_i,h)) > \tau \). To ensure the latter, the radius \(h>0\) is dynamically increased at each iteration. The algorithm stops when \({\mathcal {I}}\) is reduced to a singleton. More formally, we consider SQ Ambient Binary Search.
Because \(|{\mathcal {I}}|\) is a strictly decreasing sequence of integers, SQ Ambient Binary Search clearly terminates, with \(|{\mathcal {I}}_{final}|=1\), so that the output \({\hat{x}}_0^{raw}\) is well defined. As each iteration of the while loop makes only one query to \({\mathrm {STAT}}(\tau )\), and \(N = \mathrm {cv}_{\mathrm {B}(0,R)}(\Lambda _0/2) \le (6R/\Lambda _0)^n\) from Proposition B.4 and \(\Lambda _0 \le R\), the algorithm makes at most \(\left\lfloor \log _2(N) +1 \right\rfloor \le \left\lfloor n \log (6R/\Lambda _0)/\log (2) +1\right\rfloor \le 3 n \log (6R/\Lambda _0)\) queries in total.
Let us now prove that the output \({\hat{x}}_0^{raw}\) satisfies \(\mathrm {d}({\hat{x}}_0^{raw},M) \le \Lambda _0\). For this, we show that when running SQ Ambient Binary Search, the inequality \(\min _{i \in {\mathcal {I}}} \mathrm {d}(x_i,M) \le h\) is maintained (recall that both \({\mathcal {I}}\) and h are dynamic), or equivalently that \(\cup _{i \in {\mathcal {I}}} \mathrm {B}\left( x_i,h\right) \cap M \ne \emptyset \). At initialization, this is clear because \({\mathcal {I}} = \left\{ {1,\ldots ,N}\right\} \), \(h= \Lambda _0/2\), and \(\left\{ {x_i}\right\} _{1 \le i \le N}\) is a \((\Lambda _0/2)\)-covering of \( \mathrm {B}(0,R) \supseteq M\). Then, proceeding by induction, assume that \(\cup _{i \in {\mathcal {I}}} \mathrm {B}\left( x_i,h\right) \cap M \ne \emptyset \) when entering an iteration of the while loop. Let \(i_0 \in {\mathcal {I}}\) be such that \(\mathrm {d}(x_{i_0},M) \le h\). From Lemma B.1, provided that \(\sqrt{h^2+\Delta ^2} \le {\mathrm {rch}}_{\min }/8\), we have
Hence, if we let \(\mathrm {a}\) denote the answer of the oracle to the query \(r= \mathbb {1}_{\cup _{i \in {\mathcal {L}}} \mathrm {B}\left( x_i,\sqrt{h^2+\Delta ^2}\right) }\), we have:
-
If \(\mathrm {a}> \tau \), then
$$\begin{aligned} D\left( \cup _{i \in {\mathcal {L}}} \mathrm {B}\left( x_i,\sqrt{h^2+\Delta ^2}\right) \right) \ge \mathrm {a}- \tau > 0 , \end{aligned}$$so that after the updates \({\mathcal {I}} \leftarrow {\mathcal {L}}\) and \(h \leftarrow \sqrt{h^2 + \Delta ^2}\), we still have \( \cup _{i \in {\mathcal {I}}} \mathrm {B}\left( x_i,h\right) \cap M \ne \emptyset . \)
-
Otherwise \(\mathrm {a}\le \tau \), so that from Eq. (E.1),
$$\begin{aligned} D\left( \cup _{i \in {\mathcal {R}}} \mathrm {B}\left( x_i,\sqrt{h^2+\Delta ^2}\right) \right)&\ge D\left( \cup _{i \in {\mathcal {I}}} \mathrm {B}\left( x_i,\sqrt{h^2+\Delta ^2}\right) \right) \\&\quad - D\left( \cup _{i \in {\mathcal {L}}} \mathrm {B}\left( x_i,\sqrt{h^2+\Delta ^2}\right) \right) \\&> 2\tau - (\mathrm {a}+\tau ) \\&\ge 0 . \end{aligned}$$As above, after the updates \({\mathcal {I}} \leftarrow {\mathcal {R}}\) and \(h \leftarrow \sqrt{h^2 + \Delta ^2}\), we still have \( \cup _{i \in {\mathcal {I}}} \mathrm {B}\left( x_i,h\right) \cap M \ne \emptyset . \)
Consequently, when the algorithm terminates, we have
since \( \frac{\Lambda _0}{\sqrt{\log (6R/\Lambda _0)}} \ge 21 {\mathrm {rch}}_{\min } \sqrt{n} \left( \frac{\tau }{\omega _d f_{\min }{\mathrm {rch}}_{\min }^d}\right) ^{1/d} \). The above also shows that when running the algorithm we have \(\sqrt{h^2+\Delta ^2} \le h_{final} \le \Lambda _0 \le {\mathrm {rch}}_{\min }/8\), which ensures that Eq. (E.1) is valid throughout and concludes the proof. \(\square \)
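The invariant argument above can be exercised on a toy instance. The following Python sketch is only an illustration under simplifying assumptions: D is uniform on m points of the unit circle (so the support M is this finite set rather than a submanifold), the covering is a square grid, the index set is split into a hypothetical "left" and "right" half (standing in for the sets \({\mathcal {L}}\) and \({\mathcal {R}}\) of SQ Ambient Binary Search), and the oracle adds arbitrary noise in \([-\tau ,\tau ]\). Choosing \(\tau < 1/(2m)\) plays the role of Eq. (E.1): any union of balls meeting M then has mass at least \(1/m > 2\tau \). The helper names `stat_oracle` and `ambient_binary_search` are hypothetical.

```python
import math
import random

def stat_oracle(support, query, tau, rng):
    """Toy adversarial STAT(tau) oracle for the uniform distribution on a finite
    `support`: returns the exact mean of `query`, shifted by noise in [-tau, tau]."""
    mean = sum(query(p) for p in support) / len(support)
    return mean + rng.uniform(-tau, tau)

def ambient_binary_search(centers, h0, delta, tau, oracle):
    """Halve the index set I at each step while inflating the radius h, so that
    the union of the balls B(x_i, h), i in I, keeps meeting the support."""
    idx = list(range(len(centers)))
    h, queries = h0, 0
    while len(idx) > 1:
        h_new = math.sqrt(h ** 2 + delta ** 2)
        left, right = idx[: len(idx) // 2], idx[len(idx) // 2:]

        def indicator(p):
            # Indicator of the union of inflated balls around the "left" centers.
            return 1.0 if any(math.dist(p, centers[i]) <= h_new for i in left) else 0.0

        a = oracle(indicator)  # one statistical query per iteration
        queries += 1
        idx = left if a > tau else right
        h = h_new
    return centers[idx[0]], h, queries

# Toy instance: D uniform on m points of the unit circle (M is this finite set).
m = 50
support = [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m)) for k in range(m)]
tau = 0.4 / m  # tau < 1/(2m): any ball union meeting M has mass >= 1/m > 2 * tau
spacing, R = 0.2, 1.5
ticks = [-R + spacing * j for j in range(16)]
centers = [(x, y) for x in ticks for y in ticks]  # grid covering of [-R, R]^2
h0 = spacing / math.sqrt(2)  # covering radius of the square grid

rng = random.Random(0)
x_hat, h_final, n_queries = ambient_binary_search(
    centers, h0, delta=0.05, tau=tau, oracle=lambda r: stat_oracle(support, r, tau, rng)
)
dist_to_M = min(math.dist(x_hat, p) for p in support)
print(x_hat, dist_to_M, h_final, n_queries)
```

With \(N = 256\) grid centers, the loop halves the index set eight times, hence makes eight queries, and the returned center lies within the final radius of M, mirroring the induction of the proof.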
E.2 Refined Point Detection
We finally prove the main announced statement of Appendix E.
Proof of Theorem 4.3
The idea is to first detect a possibly coarse base point \({\hat{x}}_0^{raw}\) using a divide and conquer strategy in the ambient space (Theorem E.1), and then refine it by considering iterated projections of \({\hat{x}}_0^{raw}\) given by the local conditional mean (Theorem 4.1). More precisely, let \({\hat{x}}_0^{raw}\) be the output of the point detection SQ algorithm of Theorem E.1 applied with parameter
where \(C^d,\Gamma >0\) are the constants of Theorem 4.1. From the assumptions on the parameters (recall also that we necessarily have \(R \ge {\mathrm {rch}}_{\min }/\sqrt{2}\), see Sect. 2.2.3), we have \(\Lambda _0 \le {\mathrm {rch}}_{\min }/8\) and
so that Theorem E.1 applies and guarantees that \({\hat{x}}_0^{raw}\) can be obtained with at most \(3 n \log (6R/\Lambda _0)\) queries to \({\mathrm {STAT}}(\tau )\) and satisfies \(\mathrm {d}({\hat{x}}_0^{raw},M) \le \Lambda _0\).
If \(\Lambda _0 = \eta \) (a condition that the learner can check, since the parameters \(\eta , \Gamma , d\) and \({\mathrm {rch}}_{\min }\) are assumed to be known), then \({\hat{x}}_0 := {\hat{x}}_0^{raw}\) clearly satisfies \(\mathrm {d}({\hat{x}}_0,M) = \mathrm {d}({\hat{x}}_0^{raw},M) \le \eta \), and has required at most \(3 n \log (6R/\Lambda _0) = 3 n \log (6R/\eta )\) queries to \({\mathrm {STAT}}(\tau )\). Otherwise, \(\eta < \Lambda _0\), and we iterate the SQ approximate projections \({\hat{\pi }}(\cdot )\) given by Theorem 4.1. Namely, we let \({\hat{y}}_0 = {\hat{x}}_0^{raw}\) and for all integers \(k\ge 1\), \({\hat{y}}_k = {\hat{\pi }}({\hat{y}}_{k-1})\). In total, note that the computation of \({\hat{y}}_k\) requires at most \(3 n \log (6R/\eta ) + k(2n+1) \le 3n\bigl (\log (6R/\eta )+k\bigr )\) queries to \({\mathrm {STAT}}(\tau )\). As above, from the assumptions on the parameters, one easily shows by induction that since \(\mathrm {d}({\hat{y}}_0,M) \le \Lambda _0 \le \frac{{\mathrm {rch}}_{\min }}{16}\), Theorem 4.1 applies to each \({\hat{y}}_k\) and guarantees that
To conclude, fix \(k_0 := \left\lceil \log _2\left( \Lambda _0/\eta \right) \right\rceil \le \log \left( 6 \Lambda _0/\eta \right) \), and set \({\hat{x}}_0 := {\hat{y}}_{k_0}\). From the previous bound, we obtain that
with \({\hat{x}}_0\) requiring at most \(3n\bigl (\log (6R/\eta )+\log \left( 6 \Lambda _0/\eta \right) \bigr ) \le 6 n \log (6R/\eta )\) queries to \({\mathrm {STAT}}(\tau )\) to be computed, which concludes the proof. \(\square \)
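The refinement step can be illustrated on a hypothetical toy model. The Python sketch below assumes, for illustration only, that each approximate projection satisfies \(\mathrm {d}({\hat{\pi }}(y),M) \le \max \{\mathrm {d}(y,M)/2, \eta \}\) (the exact guarantee of Theorem 4.1 is not reproduced in this sketch), takes M to be the unit sphere of \({\mathbb {R}}^3\), and realizes the worst case of this assumed bound. Under that assumption, \(k_0 = \lceil \log _2(\Lambda _0/\eta )\rceil \) iterations bring the distance down to \(\eta \), and the query count follows the bound \(3 n \log (6R/\eta ) + k(2n+1)\) used in the proof. The names `approx_proj` and `dist_to_M` and all numerical values are illustrative.

```python
import math

def approx_proj(y, eta):
    """Hypothetical approximate projection onto the unit sphere M of R^n: the
    exact radial projection, pushed back out by max(d(y, M) / 2, eta), i.e. the
    worst case allowed by the assumed halving guarantee."""
    norm = math.hypot(*y)
    err = max(abs(norm - 1.0) / 2, eta)
    return tuple((1.0 + err) * c / norm for c in y)

def dist_to_M(y):
    """Distance from y to the unit sphere."""
    return abs(math.hypot(*y) - 1.0)

n, R = 3, 10.0
Lambda0, eta = 0.1, 1e-3
k0 = math.ceil(math.log2(Lambda0 / eta))  # number of refinement steps

y = (1.0 + Lambda0, 0.0, 0.0)  # coarse point at distance Lambda0 from M
for _ in range(k0):
    y = approx_proj(y, eta)

budget = 3 * n * math.log(6 * R / eta) + k0 * (2 * n + 1)  # queries used for y_{k0}
print(k0, dist_to_M(y), budget)
```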
Proof for the Main Statistical Query Manifold Estimators
This section is devoted to the proof of the two SQ manifold estimation upper bounds: the first one in the fixed point model \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) (Theorem 5.1), and the second one for the bounding ball model \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) (Theorem 5.4).
Proof of Theorem 5.1
Let us write
for some large enough \({\bar{C}}_d>0\) depending on d and \({{\textbf {C}}}\) to be chosen later, and \(\delta = \Delta /2\). We will run Manifold Propagation with scale parameters \(\Delta \), \(\delta \), angle \(\sin \alpha = 1/64\), and initialization point \({\hat{x}}_0 = 0 \in M\), using the SQ projection routine \({\hat{\pi }}(\cdot )\) of Theorem 4.1 and the SQ tangent space routine \({\hat{T}}(\cdot )\) of Theorem 4.2. If we prove that these routines are precise enough, then Theorem 3.1 will assert that the output point cloud \({\mathcal {O}}\) and associated tangent space estimates \({\mathbb {T}}_{\mathcal {O}}\) of Manifold Propagation fulfill the assumptions of Theorem 2.1. This will hence allow us to reconstruct M with a good triangulation, as claimed.
Note that at each iteration of Manifold Propagation, exactly one call to each of the SQ routines \({\hat{\pi }}(\cdot )\) and \({\hat{T}}(\cdot )\) is made, yielding at most \((2n+1) + Cdn \log ^6(n) \le C'd n \log ^6(n)\) statistical queries. But if Theorem 3.1 applies, we get that the number of iterations \(N_{\mathrm {loop}}\) of Manifold Propagation satisfies
where the second inequality comes from the fact that \(1 = \int _{M} f \mathrm {d}{\mathcal {H}}^d \ge f_{\min } {\mathcal {H}}^d(M)\). In total, the resulting SQ algorithm hence makes at most
queries to \({\mathrm {STAT}}(\tau )\), which is the announced complexity. It only remains to verify that the SQ routines \({\hat{\pi }}(\cdot )\) and \({\hat{T}}(\cdot )\) are indeed precise enough so that Theorem 3.1 applies, and to bound the final precision given by the triangulation of Theorem 2.1.
To this end, we notice that the assumption made on \(\tau \) puts it in the regime of validity of Theorems 4.1 and 4.2. Let us write
where \(C>0\) is the constant of Theorem 4.1 and \(\tilde{C}>1\) that of Theorem 4.2. Note that since \(f_{\max } \ge f_{\min }\), we have \({{\textbf {C}}} \ge 1\). For short, we also let \(\tilde{\tau }:= \tau /(\omega _d f_{\min } {\mathrm {rch}}_{\min }^d)\).
At initialization, and since \(D \in \left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\), the seed point \({\hat{x}}_0 = 0\) belongs to M, meaning that
Note that from the assumptions on the parameters, \(\eta \le {\mathrm {rch}}_{\min }/(64d)\). Hence, on the \(\eta \)-offset \(M^\eta \) of M, Theorem 4.2 asserts that \({\hat{T}}(\cdot )\) has precision
for estimating tangent spaces. As a result, we have
Using again the assumptions on the parameters, we have \(\Lambda \le {\mathrm {rch}}_{\min }/8\). Hence, applying Theorem 4.1 and elementary simplifications given by the assumptions on the parameters yield that, over the \(\Lambda \)-offset \(M^\Lambda \) of M, the projection \({\hat{\pi }}(\cdot )\) has precision at most
Additionally, one easily checks that \(\Delta \le {\mathrm {rch}}_{\min }/24\), \(\eta \le \Delta /24\) and \(\max \left\{ {\sin \alpha , \sin \theta }\right\} \le 1/64\), so that Theorem 3.1 applies: Manifold Propagation terminates and outputs a finite point cloud \({\mathcal {O}}\) such that \(\max _{x \in {\mathcal {O}}} \mathrm {d}(x,M) \le \eta \) and \(\max _{p \in M} \mathrm {d}(p, {\mathcal {O}}) \le \Delta + \eta \le 2\Delta \), together with tangent space estimates \({\mathbb {T}}_{\mathcal {O}}\) with error at most \(\sin \theta \). Hence, applying Theorem 2.1 with parameters \(\Delta ' = 2\Delta \), \(\eta \) and \(\sin \theta \) (for which one easily checks that they fulfill its assumptions), we get that the triangulation \({\hat{M}}\) of Theorem 2.1 computed over \({\mathcal {O}}\) and \({\mathbb {T}}_{\mathcal {O}}\) achieves precision
which yields the announced result since \({{\textbf {C}}} \le (C \vee {\tilde{C}})^d/\Gamma \). \(\square \)
Proof of Theorem 5.4
The proof follows the same lines as that of Theorem 5.1, except for the seed point \({\hat{x}}_0\), which is not trivially available and requires extra statistical queries. More precisely, we let \({\hat{x}}_0\) be the output point given by the SQ detection algorithm of Theorem 4.3 applied with precision parameter \(\varepsilon /2\). This point requires no more than \( 6 n \log (12R/\varepsilon )\) statistical queries to \({\mathrm {STAT}}(\tau )\). Furthermore, adopting the same notation as in the proof of Theorem 5.1, we have
so that the rest of the proof runs exactly as that of Theorem 5.1, and yields the result. \(\square \)
Statistical Query Lower Bounds in Metric Spaces
In spirit, the lower bound techniques developed below are similar to the statistical dimension of [33], developed for general search problems. However, when working with manifold models, this tool appears difficult to handle, due to the singular nature of low-dimensional distributions, which yields non-dominated models. Indeed, if \(D_0\) and \(D_1\) are distributions whose supports are d-dimensional submanifolds \(M_0,M_1 \subseteq {\mathbb {R}}^n\) with \(M_0 \ne M_1\), then \(D_0\) and \(D_1\) cannot be mutually absolutely continuous. As a result, any lower bound technique involving Kullback-Leibler or chi-squared divergences becomes non-informative (see for instance [24, 33]).
Instead, we present techniques that are well-suited for non-dominated models. They apply to SQ estimation in metric spaces \((\Theta ,\rho )\) (see Sect. 2.1), as opposed to the (more general) setting of search problems of [33]. We decompose these results into two different types of lower bounds:
-
(Appendix G.1) The information-theoretic ones, yielding a maximal estimation precision \(\varepsilon = \varepsilon (\tau )\) given a tolerance \(\tau \);
-
(Appendix G.2) The computational ones, yielding a minimal number of queries \(q = q(\varepsilon )\) to achieve a given precision \(\varepsilon \).
G.1 Information-Theoretic Lower Bound for Randomized SQ Algorithms
The proofs of the information-theoretic lower bounds of Theorems 5.2 and 5.5 are based on the following Theorem G.1, which is similar to the so-called Le Cam’s Lemma [62]. To introduce this result, we define the total variation distance between probability distributions.
Definition G.1
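(Total Variation Distance) Given two probability distributions \(D_0\) and \(D_1\) over \(({\mathbb {R}}^n, {\mathcal {B}}({\mathbb {R}}^n))\), the total variation distance between them is defined by
$$\begin{aligned} \mathop {\mathrm {TV}}(D_0,D_1) = \sup _{A \in {\mathcal {B}}({\mathbb {R}}^n)} \left| D_0(A) - D_1(A)\right| = \frac{1}{2} \sup _{r : {\mathbb {R}}^n \rightarrow [-1,1]} \left| \mathop {{\mathbb {E}}}_{D_0}[r] - \mathop {{\mathbb {E}}}_{D_1}[r]\right| , \end{aligned}$$where the second supremum ranges over measurable functions r.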
The second formula above for the total variation suggests how it can measure an impossibility of estimation with \({\mathrm {STAT}}(\tau )\) oracles: two distributions that are close in total variation distance allow a malicious oracle to make them, and hence their parameters of interest, indistinguishable using SQ’s. This lower bound insight is what underlies Le Cam’s Lemma [62] in the sample model, and it adapts easily to (randomized) SQ’s in the following way.
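On a finite set, the identity behind this insight can be checked numerically, since there the total variation is half the \(\ell ^1\) distance between probability vectors. In the Python sketch below (toy distributions chosen for illustration), answering \(\mathop {{\mathbb {E}}}_{D_0}[r]\) when the truth is \(D_1\) stays within \(\tau = 2 \mathop {\mathrm {TV}}(D_0,D_1)\) for random queries r with values in \([-1,1]\), and the query \(r = \mathrm {sign}(D_0 - D_1)\) attains this bound.

```python
import random

def total_variation(p, q):
    """TV distance between two distributions on a finite set (probability vectors)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def expectation(p, r):
    """E_p[r] for a query r given by its values on the finite set."""
    return sum(a * v for a, v in zip(p, r))

p0 = [0.50, 0.30, 0.20, 0.00]
p1 = [0.45, 0.30, 0.20, 0.05]
tau = 2 * total_variation(p0, p1)  # smallest tolerance that hides D_1 behind D_0

rng = random.Random(1)
worst_gap = 0.0
for _ in range(1000):
    r = [rng.uniform(-1.0, 1.0) for _ in p0]  # arbitrary query with values in [-1, 1]
    worst_gap = max(worst_gap, abs(expectation(p0, r) - expectation(p1, r)))

# The query r = sign(p0 - p1) attains the bound 2 TV(p0, p1).
r_star = [1.0 if a > b else -1.0 for a, b in zip(p0, p1)]
best_gap = abs(expectation(p0, r_star) - expectation(p1, r_star))
print(worst_gap, best_gap, tau)
```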
Theorem G.1
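(Le Cam’s Lemma for Statistical Queries) Consider a model \({\mathcal {D}}\) and a parameter of interest \(\theta : {\mathcal {D}}\rightarrow \Theta \) in the metric space \((\Theta ,\rho )\). Assume that there exist hypotheses \(D_0, D_1 \in {\mathcal {D}}\) such that
$$\begin{aligned} \rho \bigl (\theta (D_0), \theta (D_1)\bigr ) > 2\delta \quad \text {and} \quad 2 \mathop {\mathrm {TV}}(D_0,D_1) \le \tau . \end{aligned}$$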
If \(\alpha < 1 / 2\), then no \({\mathrm {STAT}}(\tau )\) randomized SQ algorithm can estimate \(\theta \) with precision \(\varepsilon \le \delta \) and probability of success \(1 - \alpha \) over \({\mathcal {D}}\) (no matter how many queries it makes).
Proof of Theorem G.1
We prove the contrapositive. For this purpose, assume that a randomized SQ algorithm \(\mathtt {A} \sim \mathcal {A}\) estimates \(\theta \) with precision \(\varepsilon \le \delta \) and probability at least \(1-\alpha \) over \({\mathcal {D}}\). We will show that \(\alpha \ge 1 / 2\).
Consider the oracle which, given a query \(r:{\mathbb {R}}^n \rightarrow [-1,1]\) to the distribution \(D \in {\mathcal {D}}\), returns the answer:
-
\(\mathrm {a}= \mathop {{\mathbb {E}}}_{D_0}[r]\) if \(D = D_1\);
-
\(\mathrm {a}= \mathop {{\mathbb {E}}}_D[r]\) if \(D \in {\mathcal {D}}\setminus \left\{ {D_1}\right\} \).
Since, for every query \(r : {\mathbb {R}}^n \rightarrow [-1, 1]\), \(|\mathop {{\mathbb {E}}}_{D_0}[r] - \mathop {{\mathbb {E}}}_{D_1}[r]| \le 2 \mathop {\mathrm {TV}}(D_0,D_1) \le \tau \), it is a valid \({\mathrm {STAT}}(\tau )\) oracle. Furthermore, notice that the answers of this oracle are the same for \(D = D_0\) and \(D = D_1\). Writing \(\mathtt {A} = (r_1, \dots , r_q, {\hat{\theta }}) \sim \mathcal {A}\), we denote these answers by \(\mathrm {a}_1, \dots , \mathrm {a}_q\). The \(\mathrm {a}_i\)’s are random variables, with randomness driven by the randomness of \(\mathtt {A} \sim \mathcal {A}\). For \(i \in \left\{ {0, 1}\right\} \), let us consider the event
The fact that \(\mathcal {A}\) estimates \(\theta \) with precision \(\varepsilon \) and probability at least \(1-\alpha \) over \({\mathcal {D}}\) translates into \(\Pr _{\mathtt {A} \sim \mathcal {A}}(B_i) \ge 1 - \alpha \), for \(i\in \left\{ {0, 1}\right\} \). But since \(\varepsilon \le \delta < \rho (\theta (D_0), \theta (D_1)) / 2\), the events \(B_0\) and \(B_1\) are disjoint (i.e., \(B_0 \cap B_1 = \emptyset \)). As a result,
which yields \(\alpha \ge 1/2\) and concludes the proof. \(\square \)
G.2 Computational Lower Bound
This section is dedicated to proving the following Theorem G.2, which provides a computational lower bound for support estimation in Hausdorff distance. It involves the generalized notion of metric packing, which is defined right below.
Theorem G.2
Given a model \({\mathcal {D}}\) over \({\mathbb {R}}^n\), any randomized SQ algorithm estimating \(M = \mathrm {Supp}(D) \subseteq {\mathbb {R}}^n\) with precision \(\varepsilon \) for the Hausdorff distance, and with probability of success at least \(1 - \alpha \), must make at least
queries to \({\mathrm {STAT}}(\tau )\), where \({\mathcal {M}}= \left\{ {\mathrm {Supp}(D), D \in {\mathcal {D}}}\right\} \).
Similarly to Appendix G.1, we put Theorem G.2 in the broader context of SQ estimation in metric spaces (see Sect. 2.1), and state the more general Theorem G.3. To this end, and similarly to the Euclidean case (Definition B.1), let us recall the definitions of metric packings and coverings. We let \((\Theta ,\rho )\) be a metric space, \({\mathcal {M}} \subseteq \Theta \) a subset of \(\Theta \), and \(\varepsilon >0\) a radius.
-
An \(\varepsilon \)-covering of \({\mathcal {M}}\) is a subset \(\left\{ {\theta _1, \dots , \theta _k }\right\} \subseteq {\mathcal {M}}\) such that for all \(\theta \in {\mathcal {M}}\), we have \(\min _{1 \le i \le k}\rho (\theta ,\theta _i) \le \varepsilon \). The covering number \(\mathrm {cv}_{({\mathcal {M}},\rho )}(\varepsilon )\) of \({\mathcal {M}}\) at scale \(\varepsilon \) is the smallest cardinality k of such an \(\varepsilon \)-covering.
-
An \(\varepsilon \)-packing of \({\mathcal {M}}\) is a subset \(\left\{ \theta _1, \dots ,\theta _k \right\} \subseteq {\mathcal {M}}\) such that for all \(1 \le i < j \le k\), \(\mathrm {B}_{(\Theta ,\rho )}(\theta _i,\varepsilon ) \cap \mathrm {B}_{(\Theta ,\rho )}(\theta _j,\varepsilon ) = \emptyset \) (or equivalently \(\rho (\theta _i,\theta _j) > 2\varepsilon \)), where \(\mathrm {B}_{(\Theta ,\rho )}(\theta ,\varepsilon ) = \left\{ {\theta ' \in \Theta , \rho (\theta ,\theta ')\le \varepsilon }\right\} \) is the closed ball in \((\Theta ,\rho )\). The packing number \(\mathrm {pk}_{({\mathcal {M}},\rho )}(\varepsilon )\) of \({\mathcal {M}}\) at scale \(\varepsilon \) is the largest cardinality k of such an \(\varepsilon \)-packing.
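The following Python sketch illustrates these two definitions on a finite point cloud in the plane, with the Euclidean distance as \(\rho \). A greedy pass keeps the points lying more than \(2\varepsilon \) away from all previously kept ones, which produces a maximal \(\varepsilon \)-packing; by maximality, it is also a \((2\varepsilon )\)-covering, illustrating the standard relation \(\mathrm {cv}(2\varepsilon ) \le \mathrm {pk}(\varepsilon )\) (together with \(\mathrm {pk}(\varepsilon ) \le \mathrm {cv}(\varepsilon )\), since a closed \(\varepsilon \)-ball contains at most one point of an \(\varepsilon \)-packing). The random cloud and the name `greedy_packing` are illustrative assumptions.

```python
import math
import random

def greedy_packing(points, eps):
    """Greedily select points pairwise more than 2 * eps apart (an eps-packing
    in the sense above). The result is maximal: no other point can be added."""
    packing = []
    for p in points:
        if all(math.dist(p, q) > 2 * eps for q in packing):
            packing.append(p)
    return packing

rng = random.Random(2)
cloud = [(rng.random(), rng.random()) for _ in range(500)]
eps = 0.05
pack = greedy_packing(cloud, eps)

# Maximality: every point lies within 2 * eps of the packing, so the packing
# is also a (2 * eps)-covering of the cloud.
covering_radius = max(min(math.dist(p, q) for q in pack) for p in cloud)
print(len(pack), covering_radius)
```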
Theorem G.3
Given a model \({\mathcal {D}}\) and a parameter of interest \(\theta : {\mathcal {D}}\rightarrow \Theta \) in the metric space \((\Theta ,\rho )\), any randomized SQ algorithm estimating \(\theta (D)\) over \({\mathcal {D}}\) with precision \(\varepsilon \) and probability of success at least \(1 - \alpha \), must make at least
queries to \({\mathrm {STAT}}(\tau )\), where \(\theta ({\mathcal {D}}) = \left\{ {\theta (D), D \in {\mathcal {D}}}\right\} \).
Proof of Theorem G.2
Apply Theorem G.3 with parameter of interest \(\theta (D) = \mathrm {Supp}(D)\) and distance \(\rho = \mathrm {d_H}\). \(\square \)
G.2.1 Probabilistic Covering and Packing Number
To prove Theorem G.3, we will use the following notion of probabilistic covering. Given a set S and an integer \(k \ge 0\), we denote by \(\left( {\begin{array}{c}S\\ \le k\end{array}}\right) \) the set of all subsets of S of cardinality at most k.
Definition G.2
Let \((\Theta , \rho )\) be a metric space. We say that a probability measure \(\mu \) over \(\left( {\begin{array}{c}\Theta \\ \le d\end{array}}\right) \) is a probabilistic \((\varepsilon , \alpha )\)-covering of \((\Theta , \rho )\) by d points if for all \(\theta \in \Theta \),
We denote by \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon , \alpha )\) the minimal d such that there is a probabilistic \((\varepsilon , \alpha )\)-covering of \((\Theta , \rho )\) with d points.
This clearly generalizes (deterministic) coverings, since \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon , \alpha = 0)\) coincides with the standard covering number \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon )\). However, this quantity may be difficult to compute, since it involves randomness. Before proving Theorem G.3, let us show how to lower bound \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon , \alpha )\) in practice.
Theorem G.4
Let \((\Theta , \rho )\) be a metric space. Assume that there is a probability measure \(\nu \) on \(\Theta \) such that for all \(q_1, \dots , q_\ell \in \Theta \),
Then \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon , \alpha ) > \ell \).
Proof of Theorem G.4
Take any probability measure \(\mu \) over \(\left( {\begin{array}{c}\Theta \\ \le \ell \end{array}}\right) \), and consider the map \(f({{\textbf {p}}}, \theta ) = \mathbb {1}_{\cup _{q \in {{\textbf {p}}}}\mathrm {B}_{(\Theta ,\rho )}(q,\varepsilon )}(\theta )\) for all \({{\textbf {p}}} \in \left( {\begin{array}{c}\Theta \\ \le \ell \end{array}}\right) \) and \(\theta \in \Theta \). By assumption, for all fixed \({{\textbf {p}}} \in \left( {\begin{array}{c}\Theta \\ \le \ell \end{array}}\right) \),
hence, by integration with respect to \(\mu (d {{\textbf {p}}})\) and Fubini–Tonelli,
As \(\nu \) is a probability distribution, this yields the existence of a fixed \(\theta = \theta _\mu \in \Theta \) such that
In other words, we have shown that no probability distribution \(\mu \) over \(\left( {\begin{array}{c}\Theta \\ \le \ell \end{array}}\right) \) can be an \((\varepsilon , \alpha )\)-covering of \((\Theta , \rho )\) (Definition G.2). Hence, \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon , \alpha ) > \ell \). \(\square \)
As a byproduct of Theorem G.4, we can now show that probabilistic coverings are closely related to the usual notions of metric covering and packing numbers.
Theorem G.5
Let \((\Theta , \rho )\) be a metric space, and \(\alpha < 1\). Then,
Proof of Theorem G.5
If any of the three terms is infinite, then all the terms involved clearly are infinite, so that the announced bounds hold. Otherwise, any given \(\varepsilon \)-covering of \((\Theta ,\rho )\) is also an \((\varepsilon , \alpha )\)-covering (where we identify a finite set with the Dirac mass at this set), which gives the left-hand bound. For the right-hand bound, write \(k = \mathrm {pk}_{(\Theta , \rho )}(\varepsilon ) < \infty \), and let \(\{\theta _1, \dots , \theta _k\}\) be an \(\varepsilon \)-packing of \((\Theta , \rho )\). That is, for all \(i \ne j\), \(\rho (\theta _i, \theta _j) > 2\varepsilon \).
Take \(\nu \) to be the uniform probability distribution over this packing, that is set \(\nu (S) = |\{\theta _1, \dots , \theta _k\} \cap S|/k\) for all \(S \subseteq \Theta \). Note that since \(\{\theta _1, \dots , \theta _k\}\) is an \(\varepsilon \)-packing, we have \(\nu \bigl (\mathrm {B}_{(\Theta ,\rho )}(\theta ,\varepsilon )\bigr ) \le 1 / k\) for all \(\theta \in \Theta \), and as a result,
for all \(\theta _1, \dots , \theta _\ell \in \Theta \).
Taking \(\ell = \left\lceil (1 - \alpha ) k\right\rceil - 1\), Theorem G.4 implies that \(\mathrm {cv}_{(\Theta , \rho )}(\varepsilon , \alpha ) > \left\lceil (1 - \alpha ) k\right\rceil - 1\), and hence
\(\square \)
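The measure-theoretic step of the proof of Theorem G.5 can be checked on a toy example. In the Python sketch below (a hypothetical random packing of the unit square, with \(\alpha = 1/4\)), the uniform measure \(\nu \) on a k-point \(\varepsilon \)-packing gives mass at most \(1/k\) to every closed \(\varepsilon \)-ball, so that any \(\ell = \lceil (1-\alpha )k\rceil - 1\) balls capture mass at most \(\ell /k < 1-\alpha \), whether their centers are random or placed on packing points. The name `nu_of_union` is illustrative.

```python
import math
import random

rng = random.Random(3)
eps = 0.05

# A hypothetical eps-packing of the unit square: points pairwise more than 2 * eps apart.
packing = []
while len(packing) < 30:
    p = (rng.random(), rng.random())
    if all(math.dist(p, q) > 2 * eps for q in packing):
        packing.append(p)
k = len(packing)

def nu_of_union(centers):
    """Mass, under the uniform measure nu on the packing, of the union of the
    closed eps-balls around `centers`."""
    hit = [p for p in packing if any(math.dist(p, c) <= eps for c in centers)]
    return len(hit) / k

alpha = 0.25
ell = math.ceil((1 - alpha) * k) - 1  # the choice made in the proof (ell = 22 here)
worst = 0.0
for _ in range(200):
    centers = [(rng.random(), rng.random()) for _ in range(ell)]
    worst = max(worst, nu_of_union(centers))
best_attempt = nu_of_union(packing[:ell])  # balls centered on packing points
print(k, ell, worst, best_attempt, 1 - alpha)
```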
G.2.2 Proof of the Computational Lower Bounds for Randomized SQ Algorithms
We are now in position to prove the lower bounds on (randomized) SQ algorithms in general metric spaces.
Proof of Theorem G.3
For all \(i \in \left\{ {0, \dots , \left\lceil 1 / \tau \right\rceil }\right\} \), write \(L_i = \min \left\{ {-1 + (2i + 1) \tau , 1}\right\} \). The \(L_i\)’s form a \(\tau \)-cover of \([-1,1]\), meaning that for all \(t \in [-1,1]\), there is at least one \(0 \le i \le \left\lfloor 1 / \tau \right\rfloor \) with \(|L_i - t| \le \tau \). Hence we can define \(f : [-1, 1] \rightarrow [-1, 1]\) by \(f(t) = L_{i_0}\), where \(L_{i_0}\) is the smallest \(L_i\) such that \(|L_{i}-t| \le \tau \). Note that f takes at most \(\left\lfloor 1 / \tau \right\rfloor + 1\) different values, and that \(|f(t) - t| \le \tau \) for all \(t \in [-1,1]\).
Let us now consider the oracle \({\mathsf {O}}\) which, given a query \(r : {\mathbb {R}}^n \rightarrow [-1, 1]\) to the distribution D, returns the answer \(\mathrm {a}_D(r) = f(\mathop {{\mathbb {E}}}_D [r])\). Roughly speaking, the oracle discretizes the segment \([-1, 1]\) into at most \(\left\lfloor 1 /\tau \right\rfloor +1\) points and rounds the correct mean value \(\mathop {{\mathbb {E}}}_D [r]\) to one of these points lying within distance \(\tau \) of it. Clearly, \({\mathsf {O}}\) is a valid \({\mathrm {STAT}}(\tau )\) oracle since \(|f(t) - t| \le \tau \) for all \(t \in [-1,1]\).
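The rounding map f and the oracle \({\mathsf {O}}\) can be sketched in a few lines of Python. This is an illustration only: the distribution is a hypothetical three-point one, and \(\tau = 1/8\) is chosen so that the grid points \(L_i\) are exactly representable in floating point. The sketch checks that \(|f(t)-t| \le \tau \) on a fine grid of \([-1,1]\) and that f takes at most \(\left\lfloor 1/\tau \right\rfloor + 1\) values, and it computes the resulting bound \((\left\lfloor 1/\tau \right\rfloor + 1)^q\) on the number of possible answer sequences to q queries. The names `make_rounding` and `rounding_oracle` are illustrative.

```python
import math

def make_rounding(tau):
    """Return the grid {L_i} and the map f of the proof: f(t) is the smallest
    L_i = min(-1 + (2i + 1) tau, 1), 0 <= i <= ceil(1/tau), with |L_i - t| <= tau."""
    grid = sorted({min(-1.0 + (2 * i + 1) * tau, 1.0) for i in range(math.ceil(1 / tau) + 1)})

    def f(t):
        return next(L for L in grid if abs(L - t) <= tau)

    return grid, f

def rounding_oracle(f, probs, r_values):
    """Answer of the oracle O for a query on a finite distribution: round E_D[r]."""
    return f(sum(p * v for p, v in zip(probs, r_values)))

# tau = 1/8 is exactly representable, so floating-point rounding cannot break
# the covering property of the grid in this toy check.
tau = 0.125
grid, f = make_rounding(tau)
ts = [-1 + 2 * j / 1000 for j in range(1001)]
values = {f(t) for t in ts}
max_error = max(abs(f(t) - t) for t in ts)
answer = rounding_oracle(f, [0.2, 0.3, 0.5], [1.0, -0.5, 0.25])  # true mean 0.175
q = 5
num_transcripts = (math.floor(1 / tau) + 1) ** q  # bound on answer sequences to q queries
print(len(values), max_error, answer, num_transcripts)
```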
Let \(\mathcal {A}\) be a randomized SQ algorithm estimating \(\theta \) over \({\mathcal {D}}\), and \(\mathtt {A} = (r_1, \dots ,r_q,{\hat{\theta }}) \sim \mathcal {A}\). Let us write \(d = (\left\lfloor 1 /\tau \right\rfloor +1)^q\), and consider the random subset of \(\Theta \) given by
Note that by construction of the oracle \({\mathsf {O}}\), \(C(\mathtt {A}) \in \left( {\begin{array}{c}{\mathcal {D}}\\ \le d\end{array}}\right) \). Let us consider the probability distribution \(\mu \) over \(\left( {\begin{array}{c}{\mathcal {D}}\\ \le d\end{array}}\right) \) such that the measure of a set S is equal to \(\Pr _{\mathtt {A} \sim \mathcal {A}}[C(\mathtt {A}) \in S].\)
It is clear that if a deterministic algorithm \(\mathtt {A}_0\) estimates \(\theta (D)\) with precision \(\varepsilon \) using the oracle \({\mathsf {O}}\), then \(\theta (D) \in \cup _{t \in C(\mathtt {A}_0)} \mathrm {B}_{(\Theta ,\rho )}(t,\varepsilon )\). As \(\mathcal {A}\) estimates \(\theta \) with precision \(\varepsilon \) and probability at least \(1 - \alpha \) over \({\mathcal {D}}\), this means that \(\mu \) is a probabilistic \((\varepsilon , \alpha )\)-covering of \(\theta ({\mathcal {D}})\) with \((\left\lfloor 1 / \tau \right\rfloor + 1)^q\) points (Definition G.2). As a result, by definition of \(\mathrm {cv}_{(\theta ({\mathcal {D}}), \rho )}(\varepsilon , \alpha )\), we have \((\left\lfloor 1 / \tau \right\rfloor +1)^q \ge \mathrm {cv}_{(\theta ({\mathcal {D}}), \rho )}(\varepsilon , \alpha )\). Finally, from Theorem G.5 we have \(\mathrm {cv}_{(\theta ({\mathcal {D}}), \rho )}(\varepsilon , \alpha ) \ge (1-\alpha )\mathrm {pk}_{(\theta ({\mathcal {D}}), \rho )}(\varepsilon )\), which gives the announced result. \(\square \)
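Taking logarithms in the inequality \((\left\lfloor 1 / \tau \right\rfloor +1)^q \ge (1-\alpha )\mathrm {pk}_{(\theta ({\mathcal {D}}), \rho )}(\varepsilon )\) obtained at the end of this proof makes the lower bound on q explicit. Assuming \(\tau \le 1\), so that \(\log (\left\lfloor 1 / \tau \right\rfloor +1) \ge \log (2) > 0\), it reads
$$\begin{aligned} q \ge \frac{\log \bigl ((1-\alpha )\,\mathrm {pk}_{(\theta ({\mathcal {D}}), \rho )}(\varepsilon )\bigr )}{\log \bigl (\left\lfloor 1 / \tau \right\rfloor + 1\bigr )} . \end{aligned}$$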
Lower Bounds for Manifold Models
H.1 Diffeomorphisms and Geometric Model Stability
The following result will allow us to build different elements of \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) in a simple way, by considering diffeomorphic smooth perturbations of a base manifold \(M_0\). Here and below, \(I_n\) is the identity map of \({\mathbb {R}}^n\). Given a regular map \(\Phi : {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\), \(d_x\Phi \) and \(d_x^2 \Phi \) stand for its first and second order differentials at \(x \in {\mathbb {R}}^n\).
Proposition H.1
Let \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2 {\mathrm {rch}}_{\min }}\) and \(\Phi : {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) be a proper \({\mathcal {C}}^2\) map, i.e., \(\lim _{\left\| x \right\| \rightarrow \infty } \left\| \Phi (x) \right\| = \infty \). If \(\sup _{x \in {\mathbb {R}}^n} \left\| I_n - d_x \Phi \right\| _{\mathrm {op}} \le 1/(10d)\) and \(\sup _{x \in {\mathbb {R}}^n} \left\| d^2_x \Phi \right\| _{\mathrm {op}} \le 1/\left( 4{\mathrm {rch}}_{\min }\right) \), then \(\Phi \) is a global diffeomorphism, and \( \Phi (M_0) \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }} \). Furthermore, \(1/2 \le {\mathcal {H}}^d(\Phi (M_0))/{\mathcal {H}}^d(M_0) \le 2\).
Proof of Proposition H.1
As \(\sup _x \left\| d_x \Phi - I_n \right\| _{\mathrm {op}} < 1\), \(d_x \Phi \) is invertible for all \(x \in {\mathbb {R}}^n\). Hence, the inverse function theorem yields that \(\Phi \) is everywhere a local diffeomorphism. As \(\lim _{\left\| x \right\| \rightarrow \infty }\left\| \Phi (x) \right\| = \infty \), this diffeomorphism is global by the Hadamard-Caccioppoli theorem [25]. In particular, \(\Phi (M_0)\) is a compact connected d-dimensional submanifold of \({\mathbb {R}}^n\) without boundary. In addition, by Taylor’s theorem, \(\Phi \) is Lipschitz with constant \(\sup _x \left\| d_x \Phi \right\| _{\mathrm {op}} \le (1 + \sup _x \left\| I_n - d_x \Phi \right\| _{\mathrm {op}} ) \le 11/10\), \(\Phi ^{-1}\) is Lipschitz with constant \(\sup _x\left\| d_x \Phi ^{-1} \right\| _{\mathrm {op}} \le ({1-\sup _x \left\| I_n - d_x \Phi \right\| _{\mathrm {op}} })^{-1} \le 10/9\), and \(d \Phi \) is Lipschitz with constant \(\sup _x \left\| d^2_x \Phi \right\| _{\mathrm {op}} \le 1/(4{\mathrm {rch}}_{\min }) \le 1 / (2{\mathrm {rch}}_{M_0})\). Hence, [31, Theorem 4.19] yields
As a result, we have \(\Phi (M_0) \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\). For the last claim, we use the properties of the Hausdorff measure \({\mathcal {H}}^d\) under Lipschitz maps [1, Lemma 6] to get
and symmetrically,
which concludes the proof. \(\square \)
Among the smooth perturbations \(\Phi : {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) nearly preserving \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\), the following localized bump-like functions will be of particular interest for deriving lower bounds.
Lemma H.1
Let \(\delta , \eta >0\), and let \(p_1, \dots ,p_N \in {\mathbb {R}}^n\) be such that \(\left\| p_i - p_j \right\| > 2\delta \) for all \(i \ne j \in \left\{ {1, \dots ,N}\right\} \). Given a family of unit vectors \({{\textbf {w}}} = (w_i)_{1 \le i \le N} \in \left( {\mathbb {R}}^n \right) ^N\), we let \(\Phi _{{{\textbf {w}}}}\) be the function that maps any \(x\in {\mathbb {R}}^n\) to
$$\begin{aligned} \Phi _{{{\textbf {w}}}}(x) = x + \eta \sum _{i=1}^N \phi \left( \frac{x-p_i}{\delta }\right) w_i , \end{aligned}$$
where \(\phi : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is the real-valued bump function defined by
Then \(\Phi _{{\textbf {w}}}\) is \({\mathcal {C}}^\infty \) smooth, \(\lim _{\left\| x \right\| \rightarrow \infty } \left\| \Phi _{{\textbf {w}}}(x) \right\| = \infty \), and \(\Phi _{{\textbf {w}}}\) satisfies \(\sup _{x \in {\mathbb {R}}^n} \left\| x - \Phi _{{\textbf {w}}}(x) \right\| \le \eta \),
Proof of Lemma H.1
Straightforward calculations show that the real-valued map \(\phi : {\mathbb {R}}^n \longrightarrow {\mathbb {R}}\) is \({\mathcal {C}}^\infty \) smooth over \({\mathbb {R}}^n\), equals 0 outside \(\mathrm {B}(0,1)\), and satisfies \(0 \le \phi \le 1\), \(\phi (0) = 1\),
By composition and linear combination of \({\mathcal {C}}^\infty \) smooth functions, \(\Phi _{{\textbf {w}}}\) is therefore \({\mathcal {C}}^\infty \) smooth. Also, \(\Phi _{{\textbf {w}}}\) coincides with the identity map outside the compact set \(\cup _{i=1}^N \mathrm {B}(p_i,\delta )\). Furthermore, for \(i \ne j \in \left\{ {1, \dots ,N}\right\} \), \(\mathrm {B}(p_i, \delta ) \cap \mathrm {B}(p_j,\delta ) = \emptyset \), since \(\left\| p_i-p_j \right\| > 2\delta \). Therefore, if \(x \in \mathrm {B}(p_i,\delta )\), we have \(\Phi _{{{\textbf {w}}}}(x) = x + \eta \phi \left( \frac{x-p_i}{\delta }\right) w_i\). This directly gives \(\sup _{x \in {\mathbb {R}}^n} \left\| x - \Phi _{{\textbf {w}}}(x) \right\| \le \eta \), and by the chain rule,
and
which concludes the proof. \(\square \)
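The perturbation \(\Phi _{{\textbf {w}}}\) is easy to evaluate numerically. The Python sketch below uses, as a hypothetical choice (the paper's own \(\phi \) may differ), the standard bump \(\phi (u) = \exp \bigl (1 - 1/(1-\left\| u \right\| ^2)\bigr )\) for \(\left\| u \right\| < 1\) and \(\phi (u) = 0\) otherwise, which is \({\mathcal {C}}^\infty \), takes values in [0, 1], and satisfies \(\phi (0) = 1\). On an illustrative configuration of five centers spaced \(3\delta \) apart with random unit directions, it checks that \(\Phi _{{\textbf {w}}}\) moves points by at most \(\eta \), sends \(p_1\) to \(p_1 + \eta w_1\), and is the identity far from the centers. The names `bump` and `perturb` are illustrative.

```python
import math
import random

def bump(u):
    """Hypothetical choice of phi: a C-infinity bump with phi(0) = 1, values in
    [0, 1], vanishing outside the open unit ball."""
    s = sum(c * c for c in u)
    return math.exp(1.0 - 1.0 / (1.0 - s)) if s < 1.0 else 0.0

def perturb(x, centers, directions, delta, eta):
    """Phi_w(x) = x + eta * sum_i phi((x - p_i) / delta) * w_i."""
    out = list(x)
    for p, w in zip(centers, directions):
        weight = eta * bump([(xi - pi) / delta for xi, pi in zip(x, p)])
        out = [oi + weight * wi for oi, wi in zip(out, w)]
    return out

rng = random.Random(4)
n, delta, eta = 3, 0.1, 0.01
centers = [(3 * delta * i, 0.0, 0.0) for i in range(5)]  # pairwise gaps 3*delta > 2*delta
directions = []
for _ in centers:
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    directions.append([c / norm for c in v])  # unit vectors w_i

samples = [[rng.uniform(-0.2, 1.4), rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)]
           for _ in range(5000)]
max_shift = max(math.dist(x, perturb(x, centers, directions, delta, eta)) for x in samples)
print(max_shift)
```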
H.2 Building a Large-Volume Submanifold with Small Euclidean Diameter
The proofs of Theorems 5.5 and 5.6 will involve the construction of submanifolds \(M \subseteq {\mathbb {R}}^n\) with prescribed and possibly large volume \({\mathcal {H}}^d(M)\). Informally, this will enable us to build hypotheses and packings with large cardinality by local variations of it (see Propositions H.3 and H.5) under nearly minimal assumptions on \(f_{\min }\) (which can be seen as an inverse volume, for uniform distributions). For the reasons mentioned in Sect. 2.2.3, one easily checks that the volume of \(M \in \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) can be neither too small nor too large when \({\mathrm {rch}}_{\min }\) and R are fixed (Proposition B.5). Conversely, this section is devoted to proving the existence of submanifolds \(M \in \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) that nearly achieve the minimum and maximum possible such volumes given by Proposition B.5.
H.2.1 The Statement
Namely, the goal of Appendix H.2 is to prove the following result.
Proposition H.2
Assume that \({\mathrm {rch}}_{\min } \le R/36\). Writing \(C_d' = 9(2^{2d+1} \sigma _{d - 1})\), let \({\mathcal {V}} > 0\) be such that
Then there exists \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) such that \(M_0 \subseteq \mathrm {B}(0,R)\) and
Informally, in codimension one (i.e., when \(n = d+1\)), the manifold \(M_0\) of Proposition H.2 can be thought of as the boundary of the offset of a Hilbert curve in \(\mathrm {B}(0,R)\) of prescribed length. This intuition, however, is limited to codimension one, and general \(d < n\) requires extra technical developments.
Proof of Proposition H.2
Consider the discrete grid \(G_0\) in \({\mathbb {R}}^n\) centered at \(0 \in {\mathbb {R}}^n\), with vertices \(\left( 24 {\mathrm {rch}}_{\min } {\mathbb {Z}}^n \right) \cap \mathrm {B}(0, R / 2)\), and composed of hypercubes of side-length \(24 {\mathrm {rch}}_{\min }\). By considering a \(k_0\)-dimensional sub-grid parallel to the axes, we see that the grid \(G_0\) contains a square grid G with side cardinality \(\kappa = \left\lceil \frac{R/2}{24{\mathrm {rch}}_{\min } \sqrt{k_0}}\right\rceil \), where \(k_0\) belongs to \(\mathop {\hbox {argmax}}\limits _{1 \le k \le n} \left( \frac{R}{48 {\mathrm {rch}}_{\min } \sqrt{k}}\right) ^k\). Let us write \(\ell = \left\lfloor {\mathcal {V}}/(C_d'{\mathrm {rch}}_{\min }^d)\right\rfloor \). By assumption on \({\mathcal {V}}\), \({\mathrm {rch}}_{\min }\) and R, we have
Hence, Lemma H.4 asserts that there exists a connected open simple path \(L_n(\ell )\) in \(G \subseteq G_0\) with length \(| L_n(\ell )| = \ell \). Furthermore, Lemma H.3 applied with reach parameter \(2{\mathrm {rch}}_{\min }\) provides us with a closed d-dimensional submanifold \(M_0' = M(L_n(\ell ))\) of class \(C^{1,1}\), with reach \({\mathrm {rch}}_{M_0'} \ge 2 {\mathrm {rch}}_{\min }\), such that \(M_0' \subseteq G^{12{\mathrm {rch}}_{\min }} \subseteq \mathrm {B}(0,R/2)^{12 {\mathrm {rch}}_{\min }} \subseteq \mathrm {B}(0,5R/6)\), where the last inclusion holds since \(12{\mathrm {rch}}_{\min } \le R/3\). Moreover, writing \(C_d = 9(2^d \sigma _{d-1})\) for the constant of Lemma H.3, we also have
and
where we used that \(\left\lfloor t\right\rfloor \ge t/2\) for all \(t \ge 1\). To conclude the proof, we use the density of \({\mathcal {C}}^{2}\) submanifolds in the space of \({\mathcal {C}}^{1,1}\) submanifolds to obtain a closed d-dimensional submanifold \(M_0\) of class \({\mathcal {C}}^2\) such that \({\mathrm {rch}}_{M_0} \ge {\mathrm {rch}}_{M_0'}/2 \ge {\mathrm {rch}}_{\min }\), \(\mathrm {d_H}(M_0,M_0') \le {\mathrm {rch}}_{\min }\) (and hence \(M_0 \subset \mathrm {B}(0,5R/6+{\mathrm {rch}}_{\min }) \subset \mathrm {B}(0,R)\)), and \(1/2 \le {\mathcal {H}}^d(M_0)/{\mathcal {H}}^d(M_0') \le 2\) (and hence \({\mathcal {V}}/24 \le {\mathcal {H}}^d(M_0) \le {\mathcal {V}}\)). \(\square \)
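The quantities \(k_0\), \(\kappa \) and \(\ell \) chosen in this proof are explicit, so they can be evaluated numerically. The sketch below does so under stated assumptions: the function names are hypothetical, \(\sigma _{d-1}\) is computed from the standard formula \(\sigma _m = 2\pi ^{(m+1)/2}/\Gamma ((m+1)/2)\) for the unit m-sphere, and the final check \(1 \le \ell \le \kappa ^{k_0}\) is our reading of what Lemma H.4 needs on the \(k_0\)-dimensional sub-grid, not a statement taken from the proof.

```python
import math


def sphere_area(m: int) -> float:
    """Surface area sigma_m of the unit m-sphere in R^{m+1}."""
    return 2 * math.pi ** ((m + 1) / 2) / math.gamma((m + 1) / 2)


def grid_parameters(R: float, rch_min: float, V: float, n: int, d: int):
    """Hypothetical helper returning (k_0, kappa, ell, fits) as defined in the proof."""
    # k_0 maximises (R / (48 rch_min sqrt(k)))^k over 1 <= k <= n (first maximiser on ties).
    k0 = max(range(1, n + 1), key=lambda k: (R / (48 * rch_min * math.sqrt(k))) ** k)
    # Side cardinality of the axis-parallel k_0-dimensional square sub-grid G.
    kappa = math.ceil((R / 2) / (24 * rch_min * math.sqrt(k0)))
    # Target path length ell = floor(V / (C_d' rch_min^d)), with C_d' = 9 (2^{2d+1} sigma_{d-1}).
    c_d_prime = 9 * (2 ** (2 * d + 1) * sphere_area(d - 1))
    ell = math.floor(V / (c_d_prime * rch_min ** d))
    # Assumed requirement for applying Lemma H.4 on the kappa^{k_0} vertices of G.
    fits = 1 <= ell <= kappa ** k0
    return k0, kappa, ell, fits


print(grid_parameters(R=480, rch_min=1, V=1450, n=2, d=1))  # (2, 8, 10, True)
```

Here \(R = 480\) and \({\mathrm {rch}}_{\min } = 1\) satisfy \({\mathrm {rch}}_{\min } \le R/36\); the other values are arbitrary.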
1.2.2 Widget Gluing: From Paths on the Discrete Grid to Manifolds
Lemma H.2
Given \({\mathrm {rch}}_{\min } > 0\) and \(d\ge 1\), there exist four d-dimensional \({\mathcal {C}}^{1,1}\)-submanifolds with boundary:
called respectively end, straight, tangent bend and normal bend widgets (see Fig. 3), that:
-
are smooth: \({\mathrm {rch}}_{M_E},{\mathrm {rch}}_{M_S},{\mathrm {rch}}_{M_{TB}}, {\mathrm {rch}}_{M_{NB}} \ge {\mathrm {rch}}_{\min }\);
-
have the following topologies:
-
\(M_E\) is isotopic to a d-ball \(\mathrm {B}_d(0,1)\),
-
\(M_S\), \(M_{TB}\) and \(M_{NB}\) are isotopic to a d-cylinder \({\mathcal {S}}^{d-1} \times [0,1]\);
-
are linkable: writing \(s = 6 {\mathrm {rch}}_{\min }\), we have
-
For the end widget \(M_E\):
-
\( M_E \cap \left( [-s/2 ,s/2]^{d + 1} \right) ^c = M_E \cap \left( [s/2 ,s] \times {\mathbb {R}}^d \right) = [s/2 , s] \times {\mathcal {S}}^{d - 1}(0,s/3) .\)
-
For the straight widget \(M_S\):
-
\(M_S \cap \left( [-s/2 ,s/2]^{d + 1} \right) ^c = M_S \cap \left( \left( [-s , - s/2] \times {\mathbb {R}}^d\right) \cup \left( [s/2 ,s] \times {\mathbb {R}}^d \right) \right) , \)
-
\(M_S \cap \left( [-s , - s/2] \times {\mathbb {R}}^d\right) =[-s , -s/2] \times {\mathcal {S}}^{d - 1}(0,s/3)\),
-
\(M_S \cap \left( [s/2 ,s] \times {\mathbb {R}}^d \right) = [s/2 , s] \times {\mathcal {S}}^{d - 1}(0,s/3) . \)
-
For the tangent bend widget \(M_{TB}\):
-
\(M_{TB} \cap \left( [-s/2 ,s/2]^{d + 1} \right) ^c = M_{TB} \cap \left( \left( [-s , - s/2] \times {\mathbb {R}}^d \right) \cup \left( {\mathbb {R}}^d \times [-s , - s/2] \right) \right) , \)
-
\(M_{TB} \cap \left( [-s , - s/2] \times {\mathbb {R}}^d \right) = [-s , -s/2] \times {\mathcal {S}}^{d - 1}(0,s/3)\),
-
\(M_{TB} \cap \left( {\mathbb {R}}^d \times [-s , - s/2] \right) = {\mathcal {S}}^{d - 1}(0,s/3) \times [-s , - s/2] . \)
-
For the normal bend widget \(M_{NB}\):
-
\(M_{NB} \cap \left( [-s/2 ,s/2]^{d+2}\right) ^c = M_{NB} \cap \left( \left( [-s , - s/2] \times {\mathbb {R}}^{d} \times \left\{ {0}\right\} \right) \cup \left( \left\{ {0}\right\} \times {\mathbb {R}}^{d} \times [-s , - s/2] \right) \right) , \)
-
\(M_{NB} \cap \left( [-s , - s/2] \times {\mathbb {R}}^{d} \times \left\{ {0}\right\} \right) = [-s , -s/2] \times {\mathcal {S}}^{d - 1}(0,s/3) \times \left\{ {0}\right\} \),
-
\( M_{NB} \cap \left( \left\{ {0}\right\} \times {\mathbb {R}}^{d} \times [-s , - s/2] \right) = \left\{ {0}\right\} \times {\mathcal {S}}^{d - 1}(0,s/3) \times [-s , - s/2] . \)
Furthermore,
$$\begin{aligned} (C_d/3) {\mathrm {rch}}_{\min }^d \le {\mathcal {H}}^d(M_E),{\mathcal {H}}^d(M_S),{\mathcal {H}}^d(M_{TB}),{\mathcal {H}}^d(M_{NB}) \le C_d {\mathrm {rch}}_{\min }^d , \end{aligned}$$where \(C_d = 9 (2^d \sigma _{d - 1})\) depends only on d.
Proof of Lemma H.2
First notice that, by homogeneity, we can carry out the construction in the hypercubes \([-1,1]^{d + 1}\) (respectively \([-1,1]^{d+2}\)) and conclude by applying a homothety. Indeed, for every closed set \(K \subseteq {\mathbb {R}}^{n}\) and \(\lambda > 0\), \({\mathrm {rch}}_{\lambda K} = \lambda {\mathrm {rch}}_K\) and \({\mathcal {H}}^d(\lambda K) = \lambda ^d {\mathcal {H}}^d(K)\).
-
End widget: the idea is to glue, in a \({\mathcal {C}}^{1,1}\) way, a half d-sphere and a d-cylinder of the same radius along their common boundary. Namely, let us consider
$$\begin{aligned} M_E^{(0)}&= \left( {\mathcal {S}}^{d}(0,1/3) \cap \left( [-1,0]\times [-1,1]^d \right) \right) \cup \left( [0,1] \times {\mathcal {S}}^{d - 1}(0,1/3) \right) . \end{aligned}$$Elementary calculations yield the intersections
$$\begin{aligned} M_E^{(0)} \cap \left( [-1/2 ,1/2]^{d + 1}\right) ^c = M_E^{(0)} \cap \left( [1/2 ,1] \times {\mathbb {R}}^d \right) = [1/2 , 1] \times {\mathcal {S}}^{d - 1}(0,1/3) . \end{aligned}$$In addition, its medial axis is \({\mathrm {Med}}(M_E^{(0)}) = [0,1]\times \left\{ {0}\right\} ^d\), so that
$$\begin{aligned} {\mathrm {rch}}_{M_E^{(0)}} = \inf _{z \in {\mathrm {Med}}(M_E^{(0)})} \mathrm {d}(z,M_E^{(0)}) = 1/3. \end{aligned}$$Finally, \(M_E^{(0)}\) is isotopic to the half d-sphere \( {\mathcal {S}}^{d}(0,1/3) \cap \left( [-1,0]\times [-1,1]^d \right) \), or equivalently to a d-ball.
-
Straight widget: a simple d-cylinder satisfies our requirements. As above, the set
$$\begin{aligned} M_S^{(0)}&= [-1,1] \times {\mathcal {S}}^{d - 1}(0,1/3) \end{aligned}$$is clearly (isotopic to) a d-cylinder, has reach \({\mathrm {rch}}_{M_S^{(0)}} = 1/3\), and satisfies all the announced intersection properties with \(s=1\).
-
Tangent Bend widget: we glue two orthogonal straight d-cylinders via a smoothly rotating \((d - 1)\)-sphere. More precisely, consider the d-cylinders \(C_1 = {\mathcal {S}}^{d - 1}(0,1/3) \times [-1,-1/2]\) and \(C_2 = [-1,-1/2] \times {\mathcal {S}}^{d - 1}(0,1/3)\). We connect their tips smoothly, namely the \((d - 1)\)-spheres \(S_1 = {\mathcal {S}}^{d - 1}(0,1/3) \times \left\{ {-1/2}\right\} \subseteq C_1\) and \(S_2 = \left\{ {-1/2}\right\} \times {\mathcal {S}}^{d - 1}(0,1/3) \subseteq C_2\), which have the same radius. To this aim, take the trajectory of \(S_1\) via the affine rotations of center \(x_c = (-1/2,0_{{\mathbb {R}}^{d - 1}},-1/2)\) and linear parts
$$\begin{aligned} R_\theta = \begin{pmatrix} \cos \theta &{} \quad 0 &{} \quad \cdots &{} \quad 0 &{} \quad -\sin \theta \\ 0 &{} \quad 1 &{} \quad \cdots &{} \quad 0 &{} \quad 0 \\ \vdots &{}\quad &{}\quad \ddots &{}\quad &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad \cdots &{} \quad 1 &{} \quad 0 \\ \sin \theta &{} \quad 0 &{} \quad \cdots &{} \quad 0 &{} \quad \cos \theta \end{pmatrix} \in {\mathbb {R}}^{(d + 1)\times (d + 1)} , \end{aligned}$$when \(\theta \) varies in \([0,\pi /2]\). Hence, letting \(f_\theta (x) = x_c + R_\theta (x-x_c)\), we have \(f_0(S_1) = S_1\), \(f_{\pi /2}(S_1) = S_2\). In addition, for all \(\theta \in [0,\pi /2]\) and \(x \in [-1/2,1/2]^{d} \times \left\{ {-1/2}\right\} \), we have \(f_\theta (x) \in [-1/2,1/2]^{d + 1}\). Hence, letting
$$\begin{aligned} M_{TB}^{(0)} = C_1 \cup \biggl ( \bigcup _{0 \le \theta \le \pi /2} f_\theta (S_1) \biggr ) \cup C_2 , \end{aligned}$$we directly get that \(M_{TB}^{(0)}\) is isotopic to a d-cylinder, and that it satisfies all the announced intersection properties with \(s=1\). To conclude, by symmetry, the medial axis of this widget is given by
$$\begin{aligned}&{\mathrm {Med}}(M_{TB}^{(0)}) = \left\{ {0}\right\} ^{d}\times [-1,-1/2] \\ {}&\quad \cup \biggl ( x_c + \bigcup _{t \ge 0} (-t,0_{{\mathbb {R}}^{d-1}},-t) \biggr ) \cup [-1,-1/2]\times \left\{ {0}\right\} ^{d} , \end{aligned}$$so that straightforward calculations yield \({\mathrm {rch}}_{M_{TB}^{(0)}} = \min \left\{ {1/3, \mathrm {d}(x_c,M_{TB}^{(0)}) }\right\} = 1/6\).
-
Normal Bend widget: as for the tangent bend widget, we glue the two orthogonal straight d-cylinders \(C_1 = \left\{ {0}\right\} \times {\mathcal {S}}^{d-1}(0,1/3) \times [-1,-1/2]\) and \(C_2 = [-1,-1/2] \times {\mathcal {S}}^{d-1}(0,1/3) \times \left\{ {0}\right\} \) via their respective tips, \(S_1 = \left\{ {0}\right\} \times {\mathcal {S}}^{d-1}(0,1/3) \times \left\{ {-1/2}\right\} \subseteq C_1\) and \(S_2 = \left\{ {-1/2}\right\} \times {\mathcal {S}}^{d-1}(0,1/3) \times \left\{ {0}\right\} \subseteq C_2\). To this aim, take the trajectory of \(S_1\) via the affine rotations of center \(x_c = (-1/2,0_{{\mathbb {R}}^{d}},-1/2)\) and linear parts \(R_\theta \in {\mathbb {R}}^{(d+2)\times (d+2)}\) (of the same form as above, now rotating the \((e_1,e_{d+2})\) plane) for \(\theta \in [0,\pi /2]\). As before, letting \(f_\theta (x) = x_c + R_\theta (x-x_c)\), we have \(f_0(S_1) = S_1\), \(f_{\pi /2}(S_1) = S_2\). Also, for all \(\theta \in [0,\pi /2]\) and \(x \in \left\{ {0}\right\} \times [-1/2,1/2]^{d} \times \left\{ {-1/2}\right\} \), we have \(f_\theta (x) \in [-1/2,1/2]^{d + 2}\). Hence, letting
$$\begin{aligned} M_{NB}^{(0)} = C_1 \cup \biggl ( \bigcup _{0 \le \theta \le \pi /2} f_\theta (S_1) \biggr ) \cup C_2 , \end{aligned}$$we get the announced results with \(s=1\), and in a similar way as above, \({\mathrm {rch}}_{M_{NB}^{(0)}} = \min \left\{ {1/3,1/2}\right\} = 1/3\).
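The claims \(f_{\pi /2}(S_1) = S_2\) and the containment of the swept spheres in the central cube can be sanity-checked numerically. The sketch below is restricted to \(d = 2\) (tangent bend in \({\mathbb {R}}^3\), normal bend in \({\mathbb {R}}^4\)) and to finitely many sample points; `rotate_about` and `check_bend` are hypothetical names. It also reports the distance of the swept sphere to the fixed set of the rotations, measured in the rotation plane, which is \(1/6\) for the tangent bend and \(1/2\) for the normal bend, the two values appearing in the reach computations above.

```python
import math


def rotate_about(x, center, theta, i, j):
    """f_theta(x) = x_c + R_theta (x - x_c), with R_theta rotating the (e_i, e_j) plane."""
    y = [a - c for a, c in zip(x, center)]
    yi, yj = y[i], y[j]
    y[i] = math.cos(theta) * yi - math.sin(theta) * yj
    y[j] = math.sin(theta) * yi + math.cos(theta) * yj
    return [a + c for a, c in zip(y, center)]


def check_bend(kind, k=64, steps=32, tol=1e-12):
    """Sanity checks for the d = 2 tangent bend (in R^3) or normal bend (in R^4)."""
    r = 1 / 3
    circle = [(r * math.cos(2 * math.pi * t / k), r * math.sin(2 * math.pi * t / k)) for t in range(k)]
    if kind == "tangent":
        s1 = [[a, b, -0.5] for a, b in circle]          # S_1 = S^1(0,1/3) x {-1/2}
        xc = [-0.5, 0.0, -0.5]
    else:
        s1 = [[0.0, a, b, -0.5] for a, b in circle]     # S_1 = {0} x S^1(0,1/3) x {-1/2}
        xc = [-0.5, 0.0, 0.0, -0.5]
    last = len(xc) - 1

    def on_s2(p):
        # S_2 = {-1/2} x S^1(0,1/3) (tangent) or {-1/2} x S^1(0,1/3) x {0} (normal)
        ok = abs(p[0] + 0.5) < tol and abs(p[1] ** 2 + p[2] ** 2 - r ** 2) < tol
        return ok and (kind == "tangent" or abs(p[3]) < tol)

    maps_onto_s2 = all(on_s2(rotate_about(p, xc, math.pi / 2, 0, last)) for p in s1)
    swept = [rotate_about(p, xc, (math.pi / 2) * m / steps, 0, last)
             for p in s1 for m in range(steps + 1)]
    in_cube = all(abs(c) <= 0.5 + tol for q in swept for c in q)
    # Distance of the swept sphere to the fixed set of the rotations, in the (e_1, e_last) plane.
    axis_radius = min(math.hypot(q[0] - xc[0], q[last] - xc[last]) for q in swept)
    return maps_onto_s2, in_cube, axis_radius


print(check_bend("tangent"))  # (True, True, ~1/6)
print(check_bend("normal"))   # (True, True, ~1/2)
```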
Also one easily checks in all the four above cases that
Finally, letting
and considering the dilations \(M_E = (C {\mathrm {rch}}_{\min }) M_E^{(0)}\), \(M_S = (C {\mathrm {rch}}_{\min }) M_S^{(0)}\), \(M_{TB} = (C {\mathrm {rch}}_{\min }) M_{TB}^{(0)}\) and \(M_{NB} = (C {\mathrm {rch}}_{\min }) M_{NB}^{(0)}\) yields the result by homogeneity, with \(C_d = C^d \sigma _{d-1}/3^{d-2} = 9 (2^d \sigma _{d-1})\). \(\square \)
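The value of \(C_d\) can be recovered by a one-line computation. Assuming the dilation factor is \(C = 6\) (our assumption: it is the factor that sends the smallest widget reach, \(1/6\) for the tangent bend, to \({\mathrm {rch}}_{\min }\), and it is consistent with \(s = 6{\mathrm {rch}}_{\min }\) in Lemma H.2), we get

```latex
\begin{aligned}
C_d &= \frac{C^d \sigma_{d-1}}{3^{d-2}} = \frac{6^d \sigma_{d-1}}{3^{d-2}} = 9 \Bigl(\frac{6}{3}\Bigr)^{d} \sigma_{d-1} = 9\,(2^d \sigma_{d-1}), \\
{\mathrm{rch}}_{M_{TB}} &= 6\,{\mathrm{rch}}_{\min} \cdot \tfrac{1}{6} = {\mathrm{rch}}_{\min}, \qquad
{\mathrm{rch}}_{M_E} = {\mathrm{rch}}_{M_S} = {\mathrm{rch}}_{M_{NB}} = 6\,{\mathrm{rch}}_{\min} \cdot \tfrac{1}{3} = 2\,{\mathrm{rch}}_{\min}.
\end{aligned}
```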
Lemma H.3
Let G be a discrete grid in \({\mathbb {R}}^n\) composed of hypercubes of side-length \(12 {\mathrm {rch}}_{\min }\). Then any connected open simple path L in G (see Lemma H.4) defines a \({\mathcal {C}}^{1,1}\) d-dimensional closed submanifold, denoted by M(L), such that:
-
\(M(L) \subseteq G^{6{\mathrm {rch}}_{\min }}\);
-
\(M(L) \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\);
-
\(C_d/3 \le \dfrac{{\mathcal {H}}^d(M(L))}{|L| {\mathrm {rch}}_{\min }^d} \le C_d\), where \(C_d\) is the constant of Lemma H.2;
-
If L and \(L'\) are two different such paths in G,
$$\begin{aligned} \mathrm {d_H}(M(L),M(L')) > 2 {\mathrm {rch}}_{\min } . \end{aligned}$$
Remark E.1
The construction of Lemma H.3 shows that, given one discrete path L, one could actually define several different manifolds M(L) with the same properties. We will not exploit this fact, as a single construction suffices for our purpose.
Proof of Lemma H.3
For brevity, we let \(s = 6 {\mathrm {rch}}_{\min }\). Let L be a fixed connected open simple path on G. If \(|L| = 1\), take M(L) to be a d-sphere of radius \(2{\mathrm {rch}}_{\min }\) centered at the only vertex of L. Assuming now that \(|L|\ge 2\), we will build M(L) iteratively by adding appropriate widgets of Lemma H.2 along the consecutive vertices that L goes through. We pick one of the two degree 1 vertices (endpoints) of L arbitrarily, and denote the consecutive vertices of L, starting from it, as \(x_0, \dots ,x_{|L|-1}\).
-
(I)
The path L has exactly one edge at \(x_0\), called \(v^+_0\), which is parallel to the axes of \({\mathbb {R}}^n\) since G is a square grid. In the cube \(x_0 + [- 6{\mathrm {rch}}_{\min },6{\mathrm {rch}}_{\min }]^n\), we define M(L) to coincide with the End widget \(M_E \times \left\{ {0}\right\} ^{n-(d + 1)}\), centered at \(x_0\) and rotated in the \((e_1,v^+_0)\) plane so that \(e_1\) is sent on \(v^+_0\). In this first cube, M(L) hence presents a d-cylinder, obtained by a rotation of \([s/2,s]\times {\mathcal {S}}^{d-1}(0,s/3) \times \left\{ {0}\right\} ^{n-(d + 1)}\) around \(x_0\), and pointing toward \(v^+_0\). Let us call this cylinder \(C^+_0\).
-
(II)
Assume now that we have visited the consecutive vertices \(x_0, \dots ,x_{k-1}\) of L, for some \(k\ge 1\), and that in the cube around \(x_{k-1}\), M(L) presents a cylinder \(C^+_{k-1}\) in the direction \(v^+_{k-1}\). If \(x_k\) is not the other endpoint of L, there are exactly two edges at \(x_k\), represented by the axis-parallel vectors \(v^-_k = (x_{k-1}-x_k) = -v^+_{k-1}\) and \(v^+_k = (x_{k + 1}-x_k)\). There are three possible cases depending on the turn that L takes at \(x_k\):
-
(a)
If \(v^-_k\) and \(v^+_k\) are aligned, take \(M(L) \cap \left( x_k + [-s,s]^n \right) \) to coincide with the Straight widget \(M_S \times \left\{ {0}\right\} ^{n-(d + 1)}\), rotated in the \(\{e_1,v^+_k\}\)-plane so that \(e_1\) is sent on \(v^+_k\).
-
(b)
If \(v^+_k\) belongs to the \((d + 1)\)-plane spanned by \(C^+_{k-1}\) but \(v^-_k\) and \(v^+_k\) are not aligned, proceed similarly by rotating the Tangent Bend widget \(M_{TB} \times \left\{ {0}\right\} ^{n-(d + 1)}\) so that \((e_1,e_{d + 1})\) is sent on \((-v^-_k,-v^+_k)\).
-
(c)
Otherwise, if \(v^+_k\) does not belong to the \((d + 1)\)-plane spanned by \(C^+_{k-1}\), then \(v^+_k\) and \(C^+_{k-1}\) span a \((d+2)\)-plane. Hence, we proceed similarly by rotating the Normal Bend widget \(M_{NB} \times \left\{ {0}\right\} ^{n-(d+2)}\) so that \((e_1,e_{d+2})\) is sent on \((-v^-_k,-v^+_k)\). Note that this case can only occur if \(n \ge d+2\).
-
(III)
If we have reached the other endpoint of L (\(k=|L|-1\)), add a rotated End widget at \(x_k\), oriented so that its cylinder points toward \(x_{k-1}\) and connects with \(C^+_{k-1}\).
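The case analysis (I) to (III) only depends on the axes of consecutive edges and on which coordinate axes span the current cylinder. The sketch below, a hypothetical helper that is not part of the original construction, records which widget is placed at each vertex of a lattice path. The rule for updating the spanned axes after a normal bend (the incoming axis is replaced by the outgoing one) is read off from the cylinders \(C_1, C_2\) of the normal bend widget in the proof of Lemma H.2, and is an assumption of this sketch.

```python
def classify_widgets(path, d):
    """Hypothetical helper: widget of Lemma H.2 placed at each vertex of a lattice path.

    `path` lists the vertices x_0, ..., x_{|L|-1} as integer tuples. The (d+1)-plane
    spanned by the current cylinder is tracked as a set of coordinate axes, which is
    enough here because every rotation used maps coordinate axes to coordinate axes.
    """
    if len(path) == 1:
        return ["sphere"]

    def axis(u, v):
        # Index of the unique coordinate in which two grid neighbours differ.
        diffs = [i for i, (a, b) in enumerate(zip(u, v)) if a != b]
        if len(diffs) != 1 or abs(u[diffs[0]] - v[diffs[0]]) != 1:
            raise ValueError(f"{u} and {v} are not grid neighbours")
        return diffs[0]

    # (I) End widget spanning e_1, ..., e_{d+1} (indices 0..d), rotated in the (e_1, v_0^+) plane.
    plane = set(range(d + 1))
    a0 = axis(path[0], path[1])
    if a0 not in plane:
        plane = (plane - {0}) | {a0}
    widgets = ["end"]
    # (II) Interior vertices.
    for k in range(1, len(path) - 1):
        a_in, a_out = axis(path[k - 1], path[k]), axis(path[k], path[k + 1])
        if a_in == a_out:
            widgets.append("straight")        # case (a)
        elif a_out in plane:
            widgets.append("tangent_bend")    # case (b): plane unchanged
        else:
            widgets.append("normal_bend")     # case (c): needs n >= d + 2
            plane = (plane - {a_in}) | {a_out}
    widgets.append("end")                     # (III)
    return widgets


print(classify_widgets([(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)], d=1))
# ['end', 'tangent_bend', 'normal_bend', 'end']
```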
Now that the construction of M(L) has been carried out, let us move to its claimed properties.
-
By construction of the widgets and the fact that all of them are centered at points of the grid G, M(L) is included in the offset of G of radius \(6{\mathrm {rch}}_{\min }\).
-
By induction on the length of the path, it is clear that the union of the straight and bend widgets (without the ends) is isotopic to a cylinder \({\mathcal {S}}^{d-1}(0,1) \times [0,1]\). As a result, adding the two end widgets at the endpoints of the path yields that M(L) is isotopic to a d-dimensional sphere \({\mathcal {S}}^{d}(0,1)\). It is also clear that M(L) is connected, by connectedness of L. In particular, M(L) is a compact connected d-dimensional submanifold of \({\mathbb {R}}^n\) without boundary.
What remains to be proved is that \({\mathrm {rch}}_{M(L)} \ge {\mathrm {rch}}_{\min }\). To see this, notice that by construction, the widgets connect smoothly through sections of facing straight cylinders \(C^\pm = {\mathcal {S}}^{d-1}(0,s/3) \times [0,\pm s/2] \times \left\{ {0}\right\} ^{n-(d + 1)}\) (rotated), which are included in the boxes \([-s/2,s/2]^n\) centered at the midpoints of the edges of the grid. Apart from these connecting ingoing and outgoing cylinders, the widgets are included in boxes \([-s/2,s/2]^n\) centered at the vertices of the grid, which are separated by a distance s. Hence, if two points \(x,y \in M(L)\) are such that \(\left\| y-x \right\| \le s/2\), then they must belong to either the same widget or the same connecting cylinder \(C^- \cup C^+\). As a result, from [31, Theorem 4.18] and the fact that \(\mathrm {d}(y-x,T_x M(L)) \le \left\| y-x \right\| \) for all \(x \in M(L)\), we get
$$\begin{aligned} {\mathrm {rch}}_{M(L)}&= \inf _{x \ne y \in M(L)} \frac{\left\| y-x \right\| ^2}{2 \mathrm {d}(y-x,T_x M(L))} \\&= \min \left\{ { \inf _{\begin{array}{c} x, y \in M(L) \\ \left\| y-x \right\| \ge s/2 \end{array}} \frac{\left\| y-x \right\| ^2}{2 \mathrm {d}(y-x,T_x M(L))} , \inf _{\begin{array}{c} x \ne y \in M(L) \\ \left\| y-x \right\| \le s/2 \end{array}} \frac{\left\| y-x \right\| ^2}{2 \mathrm {d}(y-x,T_x M(L))} }\right\} \\&\ge \min \left\{ { s/4 , \min \left\{ { {\mathrm {rch}}_{M_E},{\mathrm {rch}}_{M_S},{\mathrm {rch}}_{M_{TB}},{\mathrm {rch}}_{M_{NB}} }\right\} }\right\} \\&\ge \min \left\{ {6{\mathrm {rch}}_{\min }/4,{\mathrm {rch}}_{\min }}\right\} \\&= {\mathrm {rch}}_{\min } , \end{aligned}$$which proves that \(M(L) \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\).
-
As M(L) is the union of |L| of the widgets defined in Lemma H.2, it follows that
$$\begin{aligned} {\mathcal {H}}^d(M(L))&\le |L| \max \left\{ { {\mathcal {H}}^d(M_E),{\mathcal {H}}^d(M_S),{\mathcal {H}}^d(M_{TB}),{\mathcal {H}}^d(M_{NB}) }\right\} \\&\le |L| C_d {\mathrm {rch}}_{\min }^d , \end{aligned}$$and similarly, as the intersection of the consecutive widgets (i.e. \((d-1)\)-spheres) is \({\mathcal {H}}^d\)-negligible, we have
$$\begin{aligned} {\mathcal {H}}^d(M(L))&\ge |L| \min \left\{ { {\mathcal {H}}^d(M_E),{\mathcal {H}}^d(M_S),{\mathcal {H}}^d(M_{TB}),{\mathcal {H}}^d(M_{NB}) }\right\} \\&\ge |L| (C_d/3) {\mathrm {rch}}_{\min }^d . \end{aligned}$$ -
Let us now fix two different connected open simple paths L and \(L'\) in G. Since \(L \ne L'\), L passes through a vertex, say \(x_0 \in {\mathbb {R}}^n\), where \(L'\) does not. Regardless of the widget used at \(x_0\) to build M(L), this widget contains, up to rotation centered at \(x_0\), the set \(x_0 + \left\{ {-s/2}\right\} \times {\mathcal {S}}^{d - 1}(0,s/3) \times \left\{ {0}\right\} ^{n-(d + 1)}\). As a result, \(\mathrm {d}(x_0, M(L)) \le \sqrt{(s/2)^2+(s/3)^2}\). On the other hand, \(M(L')\) does not intersect the interior of the cube \(x_0 + [-s, s]^n\), so \(\mathrm {d}(x_0, M(L')) \ge s\). Finally, we get
$$\begin{aligned} \mathrm {d_H}(M(L),M(L'))&= \sup _{x \in {\mathbb {R}}^n} \left| \mathrm {d}(x,M(L')) - \mathrm {d}(x,M(L)) \right| \\&\ge \left| \mathrm {d}(x_0,M(L')) - \mathrm {d}(x_0,M(L)) \right| \\&\ge s - \sqrt{(s/2)^2+(s/3)^2} \\&= 6 (1-\sqrt{13}/6) {\mathrm {rch}}_{\min } \\&> 2 {\mathrm {rch}}_{\min } , \end{aligned}$$which concludes the proof.
\(\square \)
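Two numerical ingredients of this proof can be illustrated: the reach formula of [31, Theorem 4.18], and the final constant \(6 - \sqrt{13} > 2\). The sketch below evaluates a discretised version of the reach formula on a sampled circle of radius r, where every ratio equals r in exact arithmetic, and computes the separation margin. Sampling a planar curve and the function names are illustrative assumptions; this is not how the reach of M(L) is bounded in the proof. Since only pairs of sample points are used, the discretised value cannot be smaller than the true reach.

```python
import math


def reach_from_samples(points, tangents):
    """Discretised inf_{x != y} ||y - x||^2 / (2 d(y - x, T_x M)) for a sampled planar curve."""
    best = math.inf
    for x, t in zip(points, tangents):
        for y in points:
            if y == x:
                continue
            u = (y[0] - x[0], y[1] - x[1])
            along = u[0] * t[0] + u[1] * t[1]           # component along the unit tangent t
            normal = math.hypot(u[0] - along * t[0], u[1] - along * t[1])
            if normal > 0:
                best = min(best, (u[0] ** 2 + u[1] ** 2) / (2 * normal))
    return best


def separation_margin(rch_min):
    """s - sqrt((s/2)^2 + (s/3)^2) with s = 6 rch_min, i.e. (6 - sqrt(13)) rch_min."""
    s = 6 * rch_min
    return s - math.hypot(s / 2, s / 3)


r, k = 2.0, 64
angles = [2 * math.pi * i / k for i in range(k)]
circle = [(r * math.cos(a), r * math.sin(a)) for a in angles]
unit_tangents = [(-math.sin(a), math.cos(a)) for a in angles]
print(reach_from_samples(circle, unit_tangents))  # close to r = 2.0
print(separation_margin(1.0))                     # about 2.394, larger than 2
```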
1.2.3 Existence of Long Paths on the Grid
In order to complete the construction of Proposition H.2, we need the existence of paths of prescribed length over the n-dimensional discrete grid. Although standard, we include this construction for the sake of completeness.
Lemma H.4
Let \(\kappa \ge 1\) be an integer and consider the square grid graph \(G_n\) on \(\left\{ {1, \dots , \kappa }\right\} ^n\). Then for all \(\ell \in \left\{ {1, \dots , \kappa ^n}\right\} \), there exists a connected open simple path \(L_n(\ell )\) of length \(\ell \) in \(G_n\). That is, \(L_n(\ell )\) is a subgraph of \(G_n\) such that:
-
\(L_n(\ell )\) is connected;
-
\(L_n(\ell )\) has vertex cardinality \(\ell \);
-
if \(\ell \ge 2\), \(L_n(\ell )\) has maximum degree 2, and exactly two vertices with degree 1.
Proof of Lemma H.4
For \(\kappa =1\), \(G_n\) consists of a single point, so that the result is trivial. We hence assume that \(\kappa \ge 2\). Let us first build the paths \(L_n = L_n(\kappa ^n)\) by induction on n. For \(n = 1\), simply take \(L_1\) to be the full graph \(G_1\). We orient \(L_1\) by enumerating its vertices in increasing order: \(L^\rightarrow _1 [i] = i\) for all \(1 \le i \le \kappa \). Given an orientation \(L^\rightarrow \) of some path L, we also let \(L^\leftarrow [i] = L^\rightarrow [|L| + 1 - i]\) (\(1 \le i \le |L|\)) denote its backwards orientation. Now, assume that we have built \(L_n\) for some \(n \ge 1\), together with an orientation \(L^\rightarrow _n\). To describe \(L_{n + 1}\), we list an orientation \(L_{n + 1}^\rightarrow \) of it: an edge of \(G_{n+1}\) belongs to \(L_{n+1}\) if and only if it joins two consecutive vertices in \(L_{n + 1}^\rightarrow \). Namely, for \(1 \le i \le \kappa ^n\), we let
where for the last line, \(\leftrightarrow \) stands for \(\rightarrow \) if \(\kappa \) is odd, and \(\leftarrow \) otherwise. \(L_{n + 1}\) clearly is connected and visits all the vertices of \(\left\{ {1, \dots , \kappa }\right\} ^{n+1}\). All its vertices have degree two, except \(\left( L_{n}^\rightarrow [1], 1\right) \) and \(\left( L_{n}^\leftrightarrow [\kappa ^n], \kappa \right) \), which have degree 1. This concludes the construction of \(L_n = L_n(\kappa ^n)\). To conclude the proof, take \(L_n(\ell )\) (\(1 \le \ell \le \kappa ^n\)) to be the subpath formed by the first \(\ell \) consecutive vertices of \(L_n^\rightarrow \). \(\square \)
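The inductive construction is a boustrophedon (snake) ordering of the grid. A minimal sketch follows, with hypothetical function names and the new coordinate appended last, as in \((L_n^\rightarrow [i], j)\). The alternation between forward and backward layers is our reading of the construction; it is consistent with the stated endpoint \((L_n^\leftrightarrow [\kappa ^n], \kappa )\).

```python
def snake_path(kappa, n):
    """Oriented path L_n^-> visiting every vertex of {1, ..., kappa}^n (1-based coordinates)."""
    path = [(i,) for i in range(1, kappa + 1)]               # L_1^->[i] = i
    for _ in range(n - 1):
        new_path = []
        for j in range(1, kappa + 1):
            # Layer j runs through L_n forwards if j is odd and backwards if j is even,
            # so consecutive layers are joined by a single edge in the last coordinate.
            layer = path if j % 2 == 1 else path[::-1]
            new_path.extend(v + (j,) for v in layer)
        path = new_path
    return path


def prefix_path(kappa, n, ell):
    """L_n(ell): the first ell consecutive vertices of the snake path."""
    if not 1 <= ell <= kappa ** n:
        raise ValueError("need 1 <= ell <= kappa**n")
    return snake_path(kappa, n)[:ell]


print(snake_path(2, 2))  # [(1, 1), (2, 1), (2, 2), (1, 2)]
```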
1.3 Informational Lower Bounds: Hypotheses for Le Cam’s Lemma
This section is devoted to proving the two informational lower bounds, Theorems 5.2 and 5.5. We will use the general informational lower bound from Theorem G.1 in the models \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) and \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) respectively, with parameter of interest \(\theta (D) = \mathrm {Supp}(D)\), which lies in the metric space formed by the non-empty compact sets of \({\mathbb {R}}^n\) equipped with the metric \(\rho = \mathrm {d_H}\).
1.3.1 Construction of the Hypotheses
First, we show how to build hypotheses, i.e., probability distributions for Le Cam’s Lemma (Theorem G.1). We present a generic construction in the manifold setting by perturbing a base submanifold \(M_0\). Note that the larger the volume \({\mathcal {H}}^d(M_0)\), the stronger the result. See also Proposition H.5 for a result similar in spirit, used to derive computational lower bounds instead of informational ones.
Proposition H.3
For all \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2{\mathrm {rch}}_{\min }}\), \(x_0 \in M_0\) and \(\tau \le 1\), there exists a manifold \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) such that \(x_0 \in M_1\), \({\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_1) \le 2{\mathcal {H}}^d(M_0)\),
and so that the uniform distributions \(D_0,D_1\) over \(M_0,M_1\) satisfy \( \mathop {\mathrm {TV}}(D_0, D_1) \le \tau / 2 . \)
Proof of Proposition H.3
Let \(p_0 \in M_0\) be an arbitrary point such that \(\left\| p_0-x_0 \right\| \ge {\mathrm {rch}}_{\min }\). For instance, if \(p_0 = \gamma _{x_0,v_0}(2{\mathrm {rch}}_{\min })\) is taken along the geodesic \(\gamma _{x_0,v_0}\) of \(M_0\) starting at \(x_0\) with unit initial velocity \(v_0 \in T_{x_0} M_0\), then a Taylor expansion of \(\gamma _{x_0,v_0}\) and Lemma 2.2 yield
since \({\mathrm {rch}}_{M_0}\ge 2 {\mathrm {rch}}_{\min }\). Let us denote by \(w_0 \in \left( T_{p_0} M_0\right) ^\perp \) a unit normal vector of \(M_0\) at \(p_0\). For \(\delta ,\eta >0\) to be chosen later, let \(\Phi _{w_0}\) be the function that maps any \(x\in {\mathbb {R}}^n\) to
where \(\phi : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is the real bump function \(\phi (y) = \exp \left( -{\left\| y \right\| ^2}/{(1-\left\| y \right\| ^2)} \right) \mathbb {1}_{\mathrm {B}(0,1)}(y)\) of Lemma H.1. We let \(M_1 = \Phi _{w_0}(M_0)\) be the image of \(M_0\) by \(\Phi _{w_0}\). Roughly speaking, \(M_0\) and \(M_1\) only differ by a bump of width \(\delta \) and height \(\eta \) in the neighborhood of \(p_0\). Note already that \(\Phi _{w_0}\) coincides with the identity map outside \(\mathrm {B}(p_0,\delta )\). In particular, since \(\left\| p_0-x_0 \right\| \ge {\mathrm {rch}}_{\min }\), we have \(x_0 = \Phi _{w_0}(x_0) \in M_1\) as soon as \(\delta \le {\mathrm {rch}}_{\min }\).
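To visualise the perturbation, here is a minimal sketch. It assumes \(\Phi _{w_0}(x) = x + \eta \, \phi ((x-p_0)/\delta )\, w_0\); this form is our assumption, chosen because it has the properties used in the text: it is the identity outside \(\mathrm {B}(p_0,\delta )\), it sends \(p_0\) to \(p_0 + \eta w_0\), and it moves points by at most \(\eta \). The names `bump` and `perturbation` are hypothetical.

```python
import math


def bump(y):
    """phi(y) = exp(-|y|^2 / (1 - |y|^2)) on the open unit ball, and 0 elsewhere."""
    r2 = sum(c * c for c in y)
    return math.exp(-r2 / (1 - r2)) if r2 < 1 else 0.0


def perturbation(x, p0, w0, delta, eta):
    """ASSUMED form of Phi_{w_0}: x + eta * phi((x - p0) / delta) * w0 (hypothetical)."""
    scale = eta * bump([(a - b) / delta for a, b in zip(x, p0)])
    return [a + scale * w for a, w in zip(x, w0)]


p0, w0 = [0.0, 0.0], [0.0, 1.0]                        # bump centre and unit normal
print(perturbation(p0, p0, w0, delta=0.1, eta=0.01))  # [0.0, 0.01], i.e. p0 + eta * w0
print(perturbation([0.2, 0.0], p0, w0, 0.1, 0.01))    # [0.2, 0.0]: identity outside B(p0, delta)
```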
Combining Proposition H.1 and Lemma H.1, we get that \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) and \({\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_1) \le 2 {\mathcal {H}}^d(M_0)\) as soon as
Under these assumptions, we have in particular that \( \mathrm {d_H}(M_0, M_1) \le \left\| \Phi _{w_0} - I_n \right\| _\infty \le \eta \le {\mathrm {rch}}_{\min }/10 . \) Also, by construction, \(\Phi _{w_0}(p_0) = p_0 + \eta w_0\) belongs to \(M_1\), so that
since \(w_0 \in \left( T_{p_0} M_0 \right) ^\perp \) [31, Theorem 4.8 (12)]. Let us now consider the uniform probability distributions \(D_0\) and \(D_1\) over \(M_0\) and \(M_1\) respectively. These distributions have respective densities \(f_i = {\mathcal {H}}^d(M_i)^{-1} \mathbb {1}_{M_i}\) (\(i \in \left\{ {0,1}\right\} \)) with respect to the d-dimensional Hausdorff measure \({\mathcal {H}}^d\) on \({\mathbb {R}}^n\). Furthermore, \(\Phi _{w_0}\) is a global diffeomorphism that coincides with the identity map on \(\mathrm {B}(p_0,\delta )^c\). As a result, since \(\frac{5\eta }{2\delta } \le \frac{1}{10d} \le (2^{1/d} - 1)\), [5, Lemma D.2] yields that for \(\delta \le {\mathrm {rch}}_{\min }/2\),
where we applied the upper bound of Lemma B.1 to get the last inequality, using that \({\mathrm {rch}}_{M_0} \ge 2{\mathrm {rch}}_{\min }\).
Finally, setting \(\eta = \delta ^2/(92{\mathrm {rch}}_{\min })\) yields a valid choice of parameters for all \(\delta \le {\mathrm {rch}}_{\min }/(2300d)\). Hence, we have shown that for all \(\delta \le {\mathrm {rch}}_{\min }/(2^{12}d) \le {\mathrm {rch}}_{\min }/(2300d)\),
Equivalently, setting \(\tau /2 = 12(2^d \omega _d \delta ^d)/{\mathcal {H}}^d(M_0)\) and \(\tau _{(0)} := 24 \omega _d ({\mathrm {rch}}_{\min }/(2^{11}d))^d/{\mathcal {H}}^d(M_0)\), we have shown that for all \(\tau \le \tau _{(0)}\), there exists \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) such that
We conclude the proof for \(\tau \le \tau _{(0)}\) by further bounding the term
Otherwise, if \(\tau > \tau _{(0)}\), then the above construction applied with \(\tau _{(0)}\) yields the existence of some \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) with the same properties, and
Summing up the two cases above, for all \(\tau \le 1\) we have exhibited some \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) with properties as above, \(\mathop {\mathrm {TV}}(D_0,D_1) \le \tau /2\) and
which concludes the proof. \(\square \)
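The relations between \(\tau \), \(\delta \) and \(\eta \) in this proof can be inverted explicitly. The sketch below uses hypothetical function names, and the volume \(50\) of \(M_0\) and unit reach in the usage loop are arbitrary test values. It checks that \(\tau = \tau _{(0)}\) corresponds to \(\delta = {\mathrm {rch}}_{\min }/(2^{12}d)\), and that the side condition \(\frac{5\eta }{2\delta } \le \frac{1}{10d} \le 2^{1/d}-1\) holds for \(\eta = \delta ^2/(92{\mathrm {rch}}_{\min })\).

```python
import math


def unit_ball_volume(d):
    """omega_d, volume of the unit d-ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def bump_parameters(tau, d, rch_min, vol_m0):
    """delta solving tau/2 = 12 (2^d omega_d delta^d) / H^d(M_0), and eta = delta^2 / (92 rch_min)."""
    delta = 0.5 * (tau * vol_m0 / (24 * unit_ball_volume(d))) ** (1 / d)
    eta = delta ** 2 / (92 * rch_min)
    return delta, eta


def tau_threshold(d, rch_min, vol_m0):
    """tau_(0) = 24 omega_d (rch_min / (2^11 d))^d / H^d(M_0)."""
    return 24 * unit_ball_volume(d) * (rch_min / (2 ** 11 * d)) ** d / vol_m0


for d in range(1, 6):
    delta, eta = bump_parameters(tau_threshold(d, 1.0, 50.0), d, 1.0, 50.0)
    # At tau = tau_(0), delta equals rch_min / (2^12 d), and the side condition holds.
    assert math.isclose(delta, 1.0 / (2 ** 12 * d), rel_tol=1e-9)
    assert 5 * eta / (2 * delta) <= 1 / (10 * d) <= 2 ** (1 / d) - 1
```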
Applying the technique of Proposition H.3 with manifolds \(M_0\) having the largest possible volume (typically of order \(1/f_{\min }\)) in the models \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) and \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) yields the following result. The proof follows the ideas of [7, Lemma 5]. To our knowledge, the first result of this type dates back to [38, Theorem 6].
Lemma H.5
-
Assume that \(f_{\min } \le f_{\max }/4\) and that
$$\begin{aligned} 2^{d + 1}\sigma _d f_{\min } {\mathrm {rch}}_{\min }^d \le 1 . \end{aligned}$$Then for all \(\tau \le 1\), there exist \(D_0,D_1 \in \left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) with respective supports \(M_0\) and \(M_1\) such that
$$\begin{aligned} \mathrm {d_H}(M_0,M_1)&\ge \frac{{\mathrm {rch}}_{\min }}{2^{20}} \min \left\{ { \frac{1}{2^{20}d^2} , \left( \frac{\tau }{\omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \right) ^{2/d} }\right\} \text { and } \mathop {\mathrm {TV}}(D_0,D_1) \\ {}&\le \tau /2 . \end{aligned}$$ -
Assume that \({\mathrm {rch}}_{\min } \le R/144\) and \(f_{\min } \le f_{\max }/96\). Writing \(C_d' = 9(2^{2d+1} \sigma _{d - 1})\), assume that
$$\begin{aligned} \min _{1 \le k \le n} \left( \frac{192{\mathrm {rch}}_{\min } \sqrt{k}}{R}\right) ^k \le 2^{d + 1} C_d' f_{\min } {\mathrm {rch}}_{\min }^d \le 1 . \end{aligned}$$Then for all \(\tau \le 1\), there exist \(D_0,D_1 \in \mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) with respective supports \(M_0\) and \(M_1\) such that
$$\begin{aligned} \mathrm {d_H}(M_0,M_1)&\ge \frac{{\mathrm {rch}}_{\min }}{2^{30}} \min \left\{ { \frac{1}{2^{10}d^2} , \left( \frac{\tau }{\omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \right) ^{2/d} }\right\} \text { and } \mathop {\mathrm {TV}}(D_0,D_1) \\ {}&\le \tau /2 . \end{aligned}$$
Proof of Lemma H.5
For both models, the idea is to first build a manifold \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2{\mathrm {rch}}_{\min }}\) with prescribed volume close to \(1/f_{\min }\), and then consider the variations of it given by Proposition H.3.
-
Let \(M_0\) be a d-dimensional sphere of radius \(r_0 = \left( \frac{1}{2\sigma _d f_{\min }} \right) ^{1/d}\) in \({\mathbb {R}}^{d + 1}\times \left\{ {0}\right\} ^{n-(d + 1)} \subseteq {\mathbb {R}}^n\) containing \(x_0 = 0 \in {\mathbb {R}}^n\). By construction, \({\mathrm {rch}}_{M_0} = r_0 \ge 2 {\mathrm {rch}}_{\min }\), so that \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2{\mathrm {rch}}_{\min }}\), and one easily checks that \({\mathcal {H}}^d(M_0) = 1/(2f_{\min })\). For all \(\tau \le 1\), Proposition H.3 asserts that there exists a manifold \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) such that \(x_0 \in M_1\), with volume
$$\begin{aligned} 1 / f_{\max } \le 1/(4f_{\min }) \le {\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_1) \le 2 {\mathcal {H}}^d(M_0) \le 1/f_{\min } , \end{aligned}$$and such that
$$\begin{aligned} \mathrm {d_H}(M_0,M_1)&\ge \frac{{\mathrm {rch}}_{\min }}{2^{18}} \min \left\{ { \frac{1}{2^{22}d^2} , \left( \frac{\tau }{2 \omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \right) ^{2/d} }\right\} \\&\ge \frac{{\mathrm {rch}}_{\min }}{2^{20}} \min \left\{ { \frac{1}{2^{20}d^2} , \left( \frac{\tau }{\omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \right) ^{2/d} }\right\} , \end{aligned}$$and with respective uniform distributions \(D_0\) and \(D_1\) over \(M_0\) and \(M_1\) that satisfy \(\mathop {\mathrm {TV}}(D_0,D_1)\le \tau /2\). Since the densities of \(D_0\) and \(D_1\) are constant and equal to \({\mathcal {H}}^d(M_0)^{-1}\) and \({\mathcal {H}}^d(M_1)^{-1}\) respectively, the bounds on the volumes of \(M_0\) and \(M_1\) show that \(D_0,D_1 \in \left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L=0}) \subseteq \left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\), which concludes the proof.
-
Let \(M_0 \subseteq {\mathbb {R}}^n\) be a submanifold given by Proposition H.2 applied with parameters \({\mathrm {rch}}_{\min }' = 2{\mathrm {rch}}_{\min }\), \({\mathcal {V}} = 1/(2f_{\min })\) and \(R' = R/2\). That is, \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2 {\mathrm {rch}}_{\min }}\) is such that \(1/(48 f_{\min }) \le {\mathcal {H}}^d(M_0) \le 1/(2 f_{\min })\) and \(M_0 \subseteq \mathrm {B}(0,R/2)\). For all \(\tau \le 1\), Proposition H.3 asserts that there exists a manifold \(M_1 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) such that \(\mathrm {d_H}(M_0,M_1) \le {\mathrm {rch}}_{\min }/10\), with volume
$$\begin{aligned} 1/f_{\max } \le 1/(96f_{\min }) \le {\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_1) \le 2 {\mathcal {H}}^d(M_0) \le 1/f_{\min } , \end{aligned}$$and
$$\begin{aligned} \mathrm {d_H}(M_0,M_1)&\ge \frac{{\mathrm {rch}}_{\min }}{2^{18}} \min \left\{ { \frac{1}{2^{22}d^2} , \left( \frac{\tau }{48 \omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \right) ^{2/d} }\right\} \\&\ge \frac{{\mathrm {rch}}_{\min }}{2^{30}} \min \left\{ { \frac{1}{2^{10}d^2} , \left( \frac{\tau }{\omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \right) ^{2/d} }\right\} , \end{aligned}$$and such that the respective uniform distributions \(D_0\) and \(D_1\) over \(M_0\) and \(M_1\) satisfy \(\mathop {\mathrm {TV}}(D_0,D_1)\le \tau /2\). Because \(M_0 \subseteq \mathrm {B}(0,R/2)\) and \(\mathrm {d_H}(M_0,M_1) \le {\mathrm {rch}}_{\min }/10 \le R/2\), we immediately get that \(M_1 \subseteq \mathrm {B}(0,R/2 + R/2) = \mathrm {B}(0,R)\). As a result, \(M_0\) and \(M_1\) both belong to \(\mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\). As above, the bounds on the volumes of \(M_0\) and \(M_1\) show that \(D_0,D_1 \in \mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L=0}) \subseteq \mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\), which concludes the proof.
\(\square \)
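Both items rest on a few elementary computations: the volume \(\sigma _d r_0^d = 1/(2f_{\min })\) of the base sphere, the equivalence between \(r_0 \ge 2{\mathrm {rch}}_{\min }\) and the assumption \(2^{d+1}\sigma _d f_{\min }{\mathrm {rch}}_{\min }^d \le 1\), and the absorption of the constants 2 and 48 into the powers of 2 in the lower bounds on \(\mathrm {d_H}(M_0,M_1)\) (using \(2^{-2/d} \ge 2^{-2}\) and \(48^{-2/d} \ge 48^{-2} \ge 2^{-12}\)). The sketch below checks them numerically for a few arbitrary parameter values; the function names are hypothetical.

```python
import math


def sphere_area(m):
    """sigma_m, surface area of the unit m-sphere in R^{m+1}."""
    return 2 * math.pi ** ((m + 1) / 2) / math.gamma((m + 1) / 2)


def base_sphere(d, f_min, rch_min):
    """r_0 = (1 / (2 sigma_d f_min))^{1/d}, its volume sigma_d r_0^d, and the assumption of Lemma H.5."""
    sigma = sphere_area(d)
    r0 = (1 / (2 * sigma * f_min)) ** (1 / d)
    volume = sigma * r0 ** d
    assumption = 2 ** (d + 1) * sigma * f_min * rch_min ** d <= 1
    return r0, volume, assumption


def constants_ok(d):
    """Scalar inequalities used to simplify the two lower bounds on d_H(M_0, M_1)."""
    exponents = 2.0 ** -18 * 2.0 ** -22 == 2.0 ** -20 * 2.0 ** -20 == 2.0 ** -30 * 2.0 ** -10
    factor_2 = 2.0 ** -18 * 2 ** (-2 / d) >= 2.0 ** -20     # absorbs the 2 in (tau / (2 ...))^{2/d}
    factor_48 = 2.0 ** -18 * 48 ** (-2 / d) >= 2.0 ** -30   # absorbs the 48 in (tau / (48 ...))^{2/d}
    return exponents and factor_2 and factor_48


for f_min in (0.005, 0.01):
    r0, vol, ok = base_sphere(d=2, f_min=f_min, rch_min=1.0)
    print(round(r0, 4), round(vol, 6), ok, r0 >= 2.0)
# prints: 2.8209 100.0 True True, then 1.9947 50.0 False False
print(all(constants_ok(d) for d in range(1, 100)))  # True
```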
1.3.2 Proof of the Informational Lower Bounds for Manifold Estimation
With all the intermediate results above, the proofs of Theorems 5.2 and 5.5 follow straightforwardly.
Proof of Theorems 5.2 and 5.5
These are direct applications of Theorem G.1 for parameter of interest \(\theta (D) = \mathrm {Supp}(D)\) and distance \(\rho = \mathrm {d_H}\), with the hypotheses \(D_0,D_1\) of the models \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) and \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) given by Lemma H.5. \(\square \)
1.4 Computational Lower Bounds: Packing Number of Manifold Classes
We now prove the computational lower bounds, Theorems 5.3 and 5.6. For this, and in order to apply Theorem G.2, we build explicit packings of the manifold classes. To study the two models and the different regimes of parameters, we exhibit two types of such packings. The first type (Proposition H.4) uses translations of a fixed manifold \(M_0\) in the ambient space, and yields what we call ambient packings (see Appendix H.4.1). The second type (Proposition H.5) uses a local smooth bumping strategy based on a fixed manifold \(M_0\), and yields what we call intrinsic packings (see Appendix H.4.2). Finally, the proofs of the computational lower bounds are presented in Appendix H.4.3.
H.4.1 Global Ambient Packings
To derive the first manifold packing lower bound, we will use translations in \({\mathbb {R}}^n\) and the following lemma.
Lemma H.6
Let K be a compact subset of \({\mathbb {R}}^n\). Given \(v\in {\mathbb {R}}^n\), let \(K_v = \left\{ {p+v,p \in K}\right\} \) be the translation of K by the vector v. Then \(\mathrm {d_H}(K,K_v)= \left\| v \right\| \).
Proof of Lemma H.6
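If \(v=0\), the result is straightforward, so let us assume that \(v \ne 0\). Since K is compact, the map g defined for \(p \in K\) by \(g(p) = \left\langle {v/\left\| v \right\| }, {p} \right\rangle \) attains its maximum at some \(p_0 \in K\). But by definition of \(K_v\), \(p_0+v \in K_v\), so that, by the Cauchy-Schwarz inequality,
$$\begin{aligned} \mathrm {d_H}(K,K_v) \ge \mathrm {d}(p_0+v,K) = \min _{p \in K} \left\| p_0+v-p \right\| \ge \min _{p \in K} \left\langle {v/\left\| v \right\| }, {p_0+v-p} \right\rangle = \left\| v \right\| + g(p_0) - \max _{p \in K} g(p) = \left\| v \right\| . \end{aligned}$$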
On the other hand, for all \(p \in K\) we have \(p+v \in K_v\), yielding \(\mathrm {d}(p,K_v) \le \left\| v \right\| \), and symmetrically \(\mathrm {d}(p+v,K) \le \left\| v \right\| \). Therefore \(\mathrm {d_H}(K,K_v) \le \left\| v \right\| \), which concludes the proof. \(\square \)
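As a quick numerical sanity check of Lemma H.6 (not part of the proof), the sketch below computes the Hausdorff distance between a finite point cloud K, which is compact, and its translate \(K_v\). The helper names `hausdorff` and `translate` are hypothetical, and the brute-force computation over all pairs of points is chosen for clarity rather than speed.

```python
import math
import random


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (lists of tuples)."""
    def directed(X, Y):
        # sup over x in X of the distance from x to the set Y
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


def translate(K, v):
    """Return K_v = {p + v : p in K}."""
    return [tuple(pi + vi for pi, vi in zip(p, v)) for p in K]


random.seed(0)
K = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(200)]
v = (0.3, -0.1, 0.25)
d_H = hausdorff(K, translate(K, v))
print(d_H, math.hypot(*v))  # the two values agree up to rounding error
```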
As a result, packings of sets in \({\mathbb {R}}^n\) naturally yield packings in the manifold space, by translating a fixed manifold \(M_0 \subset {\mathbb {R}}^n\). With this remark in mind, we get the following ambient packing lower bound.
Proposition H.4
Assume that \({\mathrm {rch}}_{\min } \le R/24\). Writing \(C_d = 9(2^d \sigma _{d - 1})\), let \({\mathcal {V}}>0\) be such that
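Then for all \(\varepsilon \le R/2\),
$$\begin{aligned} \log \mathrm {pk}_{\bigl ( \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }} , \mathrm {d_H}\bigr )}(\varepsilon ) \ge n \log \left( \frac{R}{4\varepsilon } \right) , \end{aligned}$$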
and such a packing can be chosen so that all its elements M have volume \({\mathcal {V}}/6 \le {\mathcal {H}}^d(M) \le {\mathcal {V}}\).
Proof of Proposition H.4
Let \(z_1, \dots ,z_N \in \mathrm {B}(0,R/2)\) be an r-packing of \(\mathrm {B}(0,R/2)\). From Proposition B.4, such a packing can be taken so that \(N \ge (R/(4r))^n\). Applying Proposition H.2 with parameters \({\mathrm {rch}}_{\min }\), \({\mathcal {V}}\) and \(R' = R/2\), we get the existence of some \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) such that \({\mathcal {V}}/6 \le {\mathcal {H}}^d(M_0) \le {\mathcal {V}}\) and \(M_0 \subseteq \mathrm {B}(0,R/2)\). Note that for all \(z \in \mathrm {B}(0,R/2)\), the translation \(M_z = \left\{ {p + z,p \in M_0}\right\} \) belongs to \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\), has the same volume as \(M_0\), and satisfies \(M_z \subseteq \mathrm {B}(0,R/2+\left\| z \right\| ) \subseteq \mathrm {B}(0,R)\). In addition, Lemma H.6 asserts that for all \(z,z' \in \mathrm {B}(0,R/2)\), \(\mathrm {d_H}(M_z,M_{z'}) = \left\| z-z' \right\| \). In particular, for all \(i \ne j \in \left\{ {1, \dots ,N}\right\} \), \(\mathrm {d_H}(M_{z_i},M_{z_j}) = \left\| z_i-z_j \right\| > 2r\). As a result, the family \(\left\{ {M_{z_i}}\right\} _{1 \le i \le N}\) provides us with an r-packing of \(\bigl ( \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }} , \mathrm {d_H}\bigr )\) with cardinality \(N \ge (R/(4r))^n\), composed of submanifolds M with volume \({\mathcal {V}}/6 \le {\mathcal {H}}^d(M) \le {\mathcal {V}}\), which concludes the proof. \(\square \)
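The construction in the proof of Proposition H.4 can be mimicked numerically in the plane (\(n = 2\), \(d = 1\)). The sketch below is illustrative only: it replaces the packing of Proposition B.4 by a hypothetical square grid of centres \(z_i\) with spacing slightly above 2r, uses a sampled circle as a stand-in for \(M_0\), and checks that the translates stay in \(\mathrm {B}(0,R)\) and satisfy \(\mathrm {d_H}(M_{z_i},M_{z_j}) = \left\| z_i-z_j \right\| > 2r\), as predicted by Lemma H.6. All parameter values are assumptions chosen for the example.

```python
import itertools
import math


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (lists of tuples)."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


R, r = 1.0, 0.05  # illustrative radius of the bounding ball and packing radius
rho, m = 0.1, 12  # hypothetical base circle M_0: radius rho, m sample points

# Stand-in for M_0: a sampled circle centred at 0, hence contained in B(0, R/2).
M0 = [(rho * math.cos(2 * math.pi * k / m), rho * math.sin(2 * math.pi * k / m))
      for k in range(m)]

# Hypothetical centres z_i: grid points of spacing 2.02 r inside B(0, R/2),
# so that ||z_i - z_j|| >= 2.02 r > 2r whenever i != j.
step = 2.02 * r
K = int((R / 2) // step)
centres = [(i * step, j * step)
           for i in range(-K, K + 1) for j in range(-K, K + 1)
           if math.hypot(i * step, j * step) <= R / 2]

# Translates M_z = {p + z : p in M_0}.
family = [[(p[0] + z[0], p[1] + z[1]) for p in M0] for z in centres]

inside = all(math.hypot(*q) <= R for M in family for q in M)
separated = True
for a, b in itertools.combinations(range(len(centres)), 2):
    dh = hausdorff(family[a], family[b])
    gap = math.dist(centres[a], centres[b])
    separated = separated and abs(dh - gap) < 1e-9 and dh > 2 * r

print(len(centres), (R / (4 * r)) ** 2, inside, separated)
```

Here the number of centres can be compared with the bound \((R/(4r))^n\) of Proposition B.4; the grid is just one convenient packing, not the one used in the paper.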
H.4.2 Local Intrinsic Packings
In the same spirit as Proposition H.3 for informational lower bounds, the following result allows one to build packings of manifold classes by small perturbations of a base submanifold \(M_0\). Note, again, that the larger the volume \({\mathcal {H}}^d(M_0)\), the stronger the result.
Proposition H.5
For all \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2{\mathrm {rch}}_{\min }}\) and \(r \le {\mathrm {rch}}_{\min }/(2^{34} d^2)\), there exists a family of submanifolds \(\left\{ {M_s}\right\} _{1 \le s \le {\mathcal {N}}} \subseteq {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) with cardinality \({\mathcal {N}}\) such that
and that satisfies:
-
\(M_0\) and \(\left\{ {M_s}\right\} _{1 \le s \le {\mathcal {N}}}\) have a point in common: \(M_0 \cap \bigl ( \cap _{1 \le s \le {\mathcal {N}}} M_s \bigr ) \ne \emptyset \).
-
For all \(s\in \left\{ {1, \dots , {\mathcal {N}}}\right\} \),
$$\begin{aligned} \mathrm {d_H}(M_0,M_s) \le 23r \text { and } {\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_s) \le 2 {\mathcal {H}}^d(M_0) . \end{aligned}$$
-
For all \(s \ne s' \in \left\{ {1, \dots , {\mathcal {N}}}\right\} \) \(\mathrm {d_H}(M_s,M_{s'}) > 2r\).
Proof of Proposition H.5
For \(\delta \le {\mathrm {rch}}_{\min }/8\) to be chosen later, let \(\left\{ {p_i}\right\} _{1\le i \le N}\) be a maximal \(\delta \)-packing of \(M_0\). From Proposition B.3, this maximal packing has cardinality \(N \ge \frac{{\mathcal {H}}^d(M_0)}{\omega _d (4\delta )^d}\).
Let \(\eta >0\) be a parameter to be chosen later. Given a family of unit vectors \({{\textbf {w}}} = (w_i)_{1 \le i \le N} \in \left( {\mathbb {R}}^n \right) ^N\) normal at the \(p_i\)’s, i.e., \(w_i \in \left( T_{p_i} M_0 \right) ^\perp \) and \(\left\| w_i \right\| =1\), we let \(\Phi _{{{\textbf {w}}}}\) be the function defined in Lemma H.1, which maps any \(x\in {\mathbb {R}}^n\) to
$$\begin{aligned} \Phi _{{\textbf {w}}}(x) = x + \eta \sum _{i=1}^N \phi \left( \frac{x-p_i}{\delta }\right) w_i , \end{aligned}$$
where \(\phi : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is the real bump function \(\phi (y) = \exp \left( -{\left\| y \right\| ^2}/{(1-\left\| y \right\| ^2)} \right) \mathbb {1}_{\mathrm {B}(0,1)}(y)\) of Lemma H.1. We let \(M_{{\textbf {w}}} = \Phi _{{\textbf {w}}}(M_0)\) be the image of \(M_0\) by \(\Phi _{{\textbf {w}}}\). The set \(M_{{\textbf {w}}} \subseteq {\mathbb {R}}^n\) hence coincides with \(M_0\), except in the \(\delta \)-neighborhoods of the \(p_i\)’s, where it has a bump of size \(\eta \) in the direction \(w_i\). Note for now that, up to rotations of its coordinates, the vector \({{\textbf {w}}} = (w_i)_{1 \le i \le N}\) belongs to \({\mathcal {S}}^{n-d}(0,1)^N\). Combining Proposition H.1 and Lemma H.1, we see that \(M_{{\textbf {w}}} \in {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) and \({\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_{{{\textbf {w}}}}) \le 2 {\mathcal {H}}^d(M_0)\) as soon as
In the rest of the proof, we assume that these two inequalities hold. In particular, because \(\left\| \Phi _{{{\textbf {w}}}} - I_n \right\| _\infty \le \eta \), we immediately get that \(\mathrm {d_H}(M_0,M_{{{\textbf {w}}}}) \le \eta \). We note also that all the \(\Phi _{{\textbf {w}}}\)’s coincide with the identity map on (say) \(M_0 \cap \partial \mathrm {B}(p_1,\delta )\), so that \(M_0 \cap \bigl ( \cap _{{{\textbf {w}}}} M_{{\textbf {w}}} \bigr )\) contains \(M_0 \cap \partial \mathrm {B}(p_1,\delta )\) and is hence non-empty.
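To make the perturbation concrete, the following sketch implements the bump \(\phi \) and the map \(\Phi _{{\textbf {w}}}\) in a toy planar setting (\(n = 2\), \(d = 1\)) where \(M_0\) is a circle of radius \(2{\mathrm {rch}}_{\min }\). The explicit sum formula for \(\Phi _{{\textbf {w}}}\) is our reading of the construction (it agrees with the local expression for \(x \in \mathrm {B}(p_i,\delta )\) used in the argument that follows), the sample sizes and the sign pattern of the normals are hypothetical, and \(\phi \) is taken to vanish on the unit sphere, where the formula itself is singular. The code checks that \(\Phi _{{\textbf {w}}}\) moves \(p_i\) to \(p_i + \eta w_i\), leaves points far from the \(p_i\)’s fixed, and displaces no point by more than \(\eta \).

```python
import math


def phi(y):
    """Bump phi(y) = exp(-|y|^2 / (1 - |y|^2)) for |y| < 1, and 0 otherwise."""
    s = sum(t * t for t in y)
    return math.exp(-s / (1.0 - s)) if s < 1.0 else 0.0


def make_Phi(points, normals, eta, delta):
    """Phi_w(x) = x + eta * sum_i phi((x - p_i) / delta) * w_i."""
    def Phi(x):
        out = list(x)
        for p, w in zip(points, normals):
            c = eta * phi([(xi - pi) / delta for xi, pi in zip(x, p)])
            for k in range(len(out)):
                out[k] += c * w[k]
        return tuple(out)
    return Phi


# Toy setting (n = 2, d = 1): M_0 is a circle of radius 2 * rch_min (its reach).
rch_min, delta = 1.0, 0.2
rho = 2 * rch_min
eta = delta ** 2 / (92 * rch_min)  # the value of eta chosen later in the proof
N = 8  # consecutive points are 2*rho*sin(pi/N) > 2*delta apart
points = [(rho * math.cos(2 * math.pi * i / N), rho * math.sin(2 * math.pi * i / N))
          for i in range(N)]
# Unit normals, outward or inward: the normal "sphere" is {-1, +1} when n - d = 1.
signs = [1 if i % 2 == 0 else -1 for i in range(N)]
normals = [(s * p[0] / rho, s * p[1] / rho) for s, p in zip(signs, points)]
Phi = make_Phi(points, normals, eta, delta)

M0 = [(rho * math.cos(2 * math.pi * k / 2000), rho * math.sin(2 * math.pi * k / 2000))
      for k in range(2000)]
M_w = [Phi(x) for x in M0]
max_disp = max(math.dist(x, y) for x, y in zip(M0, M_w))
print(max_disp, eta)
```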
We now take two different families of unit normal vectors \({{\textbf {w}}}\) and \({{\textbf {w}}}'\) (i.e., \(w_i,w_i' \in \left( T_{p_i} M_0 \right) ^\perp \) and \(\left\| w_i \right\| = \left\| w_i' \right\| = 1\) for \(1 \le i \le N\)), and we will show that their associated submanifolds \(M_{{\textbf {w}}}\) and \(M_{{{\textbf {w}}}'}\) are far away in Hausdorff distance as soon as \(\max _{1\le i \le N} \left\| w_i - w_i' \right\| \) is large enough. To this aim, we first see that by construction, \(\Phi _{{\textbf {w}}}(p_i) = p_i + \eta w_i \in \Phi _{{\textbf {w}}}(M_0) = M_{{\textbf {w}}}\) for all \(i \in \left\{ {1, \dots ,N}\right\} \). In particular,
Let us fix a free parameter \(\lambda _i \in [0,1]\) to be chosen later. As \(\left\| \Phi _{\textbf {w'}}-I_n \right\| _\infty \le \eta \), we can write for all \(i \in \left\{ {1, \dots ,N}\right\} \) that
Further investigating the term \( \mathrm {d}\left( p_i + \eta w_i , \Phi _{{{\textbf {w}}}'}(M_{0} \cap \mathrm {B}(p_i,\lambda _i \delta )) \right) \), we see that for all \(x \in M_0 \cap \mathrm {B}(p_i,\lambda _i \delta ) \subseteq \mathrm {B}(p_i,\delta )\), \(\Phi _{{{\textbf {w}}}'} (x) = x + \eta \phi \left( \frac{x-p_i}{\delta }\right) w_i'\). But from [31, Theorem 4.18], \({\mathrm {rch}}_{M_0} \ge 2 {\mathrm {rch}}_{\min }\) ensures that any \(x \in M_0 \cap \mathrm {B}(p_i,\lambda _i \delta )\) can be written as \(x = p_i + v + u\), where \(v \in T_{p_i} M_0\) with \(\left\| v \right\| \le \lambda _i\delta \), and \(u \in \bigl ( T_{p_i} M_0 \bigr )^\perp \) with \(\left\| u \right\| \le (\lambda _i\delta )^2/(4{\mathrm {rch}}_{\min })\). As a result, we have
But in the above minimum, v is orthogonal to \(u,w_i\) and \(w_i'\), so
Additionally, \(\phi \left( \frac{v+u}{\delta } \right) \) ranges in (a subset of) [0, 1] since \(0 \le \phi \le 1\). In particular,
where the second line follows from the triangle inequality, and the last two from elementary calculations. Putting everything together, we have shown that for all \(\lambda _1, \dots , \lambda _N \in [0,1]\),
One easily checks that under the above assumptions on the parameters,
provides valid choices of \(\lambda _i \in [0,1]\). Plugging these values in the previous bound yields
so that if we further assume that \(\left\| w'_i -w_i \right\| \ge 4\sqrt{2}\eta /{\mathrm {rch}}_{\min }\), we obtain
where the last line follows from \(\left\| w_i-w'_i \right\| \le \left\| w_i \right\| + \left\| w'_i \right\| \le 2\).
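The local decomposition \(x = p_i + v + u\) used above relies on the reach bound of [31, Theorem 4.18]. As a numerical illustration only (a single example, not a proof), the sketch below checks it on a circle of radius \(2{\mathrm {rch}}_{\min }\), whose reach is \(2{\mathrm {rch}}_{\min }\): for points x near p, the tangent part satisfies \(\left\| v \right\| \le \left\| x-p \right\| \) and the normal part satisfies \(\left\| u \right\| \le \left\| x-p \right\| ^2/(4{\mathrm {rch}}_{\min })\). On a circle the second bound is attained with equality, which is why a small floating-point tolerance is used. The function name `tangent_normal_split` is hypothetical.

```python
import math


def tangent_normal_split(x, p, rho):
    """Tangent and normal coordinates of x - p at p, on the circle of radius rho centred at 0."""
    n_vec = (p[0] / rho, p[1] / rho)  # unit normal at p
    t_vec = (-n_vec[1], n_vec[0])  # unit tangent at p
    dx = (x[0] - p[0], x[1] - p[1])
    v = dx[0] * t_vec[0] + dx[1] * t_vec[1]  # tangent coordinate
    u = dx[0] * n_vec[0] + dx[1] * n_vec[1]  # normal coordinate
    return v, u


rch_min = 1.0
rho = 2 * rch_min  # circle with reach 2 * rch_min, as required of M_0
p = (rho, 0.0)
ok = True
for k in range(1, 200):
    theta = 0.5 * k / 200  # points within angular distance 0.5 of p
    x = (rho * math.cos(theta), rho * math.sin(theta))
    v, u = tangent_normal_split(x, p, rho)
    dist2 = (x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2
    ok = ok and abs(v) <= math.sqrt(dist2) + 1e-15 and abs(u) <= dist2 / (4 * rch_min) + 1e-12
print(ok)
```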
Setting \(\eta = \delta ^2/(92{\mathrm {rch}}_{\min })\), which is a value that satisfies all the requirements above as soon as \(\delta \le {\mathrm {rch}}_{\min }/(2300 d)\), we have built a family of submanifolds \(\left\{ {M_{{\textbf {w}}} }\right\} _{{{\textbf {w}}}}\) of \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) indexed by \({{\textbf {w}}} \in {\mathcal {S}}^{n-d}(0,1)^N\), such that \({\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_{{\textbf {w}}}) \le 2 {\mathcal {H}}^d(M_0)\), and which are guaranteed to satisfy
provided that \(\max _{1 \le i \le N} \left\| w'_i - w_i \right\| > 1/4 = 2/8\). As a result, if we consider (1/8)-packings of the unit spheres \({\mathcal {S}}_{(T_{p_i} M_0)^\perp }(0,1) = {\mathcal {S}}^{n-d}(0,1)\) for \(i \in \left\{ {1, \dots ,N}\right\} \), then, for all \(\delta \le {\mathrm {rch}}_{\min }/(2300 d)\), the corresponding families \({{\textbf {w}}}\) naturally define a \(\left( \frac{\delta ^2}{2082 {\mathrm {rch}}_{\min }}\right) \)-packing of \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) with cardinality \({\mathcal {N}}\) at least
and which consists of elements \(M_{{\textbf {w}}}\) such that \({\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_{{\textbf {w}}}) \le 2{\mathcal {H}}^d(M_0)\) and \(\mathrm {d_H}(M_0,M_{{\textbf {w}}}) \le \eta = \delta ^2/(92{\mathrm {rch}}_{\min })\). In particular, setting \(r = \frac{\delta ^2}{2082 {\mathrm {rch}}_{\min }}\), the condition \(0 < r \le {\mathrm {rch}}_{\min }/(2^{34}d^2)\) ensures that \(\delta \le {\mathrm {rch}}_{\min }/(2300 d)\) (since \(2082 \cdot 2300^2 \le 2^{34}\)), so that for all such r we have exhibited an r-packing of \({\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) of cardinality \({\mathcal {N}}\) with
composed of submanifolds having volume as above, and \(\mathrm {d_H}(M_0,M_{{\textbf {w}}}) \le 2082 r/92 \le 23r\). From Proposition B.4, \(\log \mathrm {pk}_{{\mathcal {S}}^{n-d}(0,1)}(1/8) \ge (n-d) \log 2\). Finally, \((n-d) \ge n/(2d)\): if \(d \le n/2\), then \(n-d \ge n/2 \ge n/(2d)\), while if \(d \ge n/2\), then \(n-d \ge 1 \ge n/(2d)\) since \(d \le n-1\). In all, we obtain the announced bound
which yields the announced result. \(\square \)
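The numerical constants in this proof can be double-checked mechanically. The following sketch (hypothetical helper names, exact rational arithmetic via Python's `fractions` module) verifies three facts used above: \(2082/92 \le 23\), which gives \(\mathrm {d_H}(M_0,M_{{\textbf {w}}}) \le 23r\); \(2082 \cdot 2300^2 \le 2^{34}\), which shows that \(r \le {\mathrm {rch}}_{\min }/(2^{34}d^2)\) forces \(\delta \le {\mathrm {rch}}_{\min }/(2300 d)\); and \((n-d) \ge n/(2d)\) for \(1 \le d < n\), checked exhaustively up to a chosen `n_max`.

```python
from fractions import Fraction


def hausdorff_constant_ok():
    """eta = delta^2/(92 rch) and r = delta^2/(2082 rch) give eta = (2082/92) r <= 23 r."""
    return Fraction(2082, 92) <= 23


def delta_range_ok():
    """r <= rch/(2^34 d^2) implies delta <= rch/(2300 d) iff 2082 * 2300^2 <= 2^34."""
    return 2082 * 2300 ** 2 <= 2 ** 34


def codim_bound_ok(n_max):
    """Check n - d >= n/(2d), i.e. 2d(n - d) >= n, for all 1 <= d < n <= n_max."""
    return all(2 * d * (n - d) >= n for n in range(2, n_max + 1) for d in range(1, n))


print(hausdorff_constant_ok(), delta_range_ok(), codim_bound_ok(300))
```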
Applying the technique of Proposition H.5 to base manifolds \(M_0\) of large prescribed volume, for the models \(\left\{ {0}\right\} \sqcup {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) and \(\mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) respectively, yields the following result.
Proposition H.6
Let \({\mathcal {V}} > 0\) and \(\varepsilon \le {\mathrm {rch}}_{\min }/(2^{34} d^2)\).
-
Assume that
$$\begin{aligned} 1 \le \frac{{\mathcal {V}}}{2^{d + 1}\sigma _d {\mathrm {rch}}_{\min }^d} . \end{aligned}$$Then,
$$\begin{aligned} \log \mathrm {pk}_{\bigl ( \left\{ {0}\right\} \sqcup {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}, \mathrm {d_H}\bigr )}(\varepsilon ) \ge n \frac{{\mathcal {V}}}{\omega _d {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{21} \varepsilon } \right) ^{d/2} . \end{aligned}$$Furthermore, this packing can be chosen so that all its elements M satisfy
$$\begin{aligned} {\mathcal {V}}/4 \le {\mathcal {H}}^d(M) \le {\mathcal {V}} . \end{aligned}$$
-
Assume that \({\mathrm {rch}}_{\min } \le R/144\). Writing \(C_d' = 9(2^{2d+1} \sigma _{d - 1})\), assume that
$$\begin{aligned} 1 \le \frac{{\mathcal {V}}}{2^{d + 1} C_d' {\mathrm {rch}}_{\min }^d} \le \max _{1 \le k \le n} \left( \frac{R}{192{\mathrm {rch}}_{\min } \sqrt{k}}\right) ^k . \end{aligned}$$Then,
$$\begin{aligned} \log \mathrm {pk}_{\bigl ( \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }} , \mathrm {d_H}\bigr )}(\varepsilon ) \ge n \frac{{\mathcal {V}}}{\omega _d {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{31} \varepsilon } \right) ^{d/2} . \end{aligned}$$Furthermore, this packing can be chosen so that all its elements M satisfy
$$\begin{aligned} {\mathcal {V}}/96 \le {\mathcal {H}}^d(M) \le {\mathcal {V}} . \end{aligned}$$
Proof of Proposition H.6
For both models, the idea is to first build a manifold \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2{\mathrm {rch}}_{\min }}\) with prescribed volume close to \({\mathcal {V}}\), and then consider the variations of it given by Proposition H.5.
-
Let \(M_0\) be the centered d-dimensional sphere of radius \(r_0 = \left( \frac{{\mathcal {V}}/2}{\sigma _d} \right) ^{1/d}\) in \({\mathbb {R}}^{d + 1}\times \left\{ {0}\right\} ^{n-(d + 1)} \subseteq {\mathbb {R}}^n\). By construction, \({\mathrm {rch}}_{M_0} = r_0\), and the assumption on \({\mathcal {V}}\) gives \(r_0 \ge 2 {\mathrm {rch}}_{\min }\), so that \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2{\mathrm {rch}}_{\min }}\). Furthermore, \({\mathcal {H}}^d(M_0) = \sigma _d r_0^d = {\mathcal {V}}/2\). From Proposition H.5, there exists a family of submanifolds \(\left\{ {M_s}\right\} _{1 \le s \le {\mathcal {N}}} \subseteq {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) with cardinality \({\mathcal {N}}\) such that
$$\begin{aligned} \log {\mathcal {N}}&\ge n\frac{{\mathcal {V}}/2}{\omega _d {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{19}\varepsilon } \right) ^{d/2} \\&\ge n\frac{{\mathcal {V}}}{\omega _d {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{21}\varepsilon } \right) ^{d/2} , \end{aligned}$$that all share a point \(x_0 \in \cap _{1 \le s \le {\mathcal {N}}} M_s\), and such that \(\mathrm {d_H}(M_s,M_{s'}) > 2\varepsilon \) for all \(s \ne s' \in \left\{ {1, \dots , {\mathcal {N}}}\right\} \), with volumes \({\mathcal {V}}/4 = {\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_s) \le 2 {\mathcal {H}}^d(M_0) = {\mathcal {V}}\). As a result, the family given by the translations \(M'_s = M_s - x_0\) clearly provides the existence of the announced \(\varepsilon \)-packing of \(\bigl ( \left\{ {0}\right\} \sqcup {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}, \mathrm {d_H}\bigr )\).
-
Let \(M_0 \subseteq {\mathbb {R}}^n\) be a submanifold given by Proposition H.2 applied with parameters \({\mathrm {rch}}_{\min }' = 2{\mathrm {rch}}_{\min }\), \({\mathcal {V}}' = {\mathcal {V}}/2\) and \(R' = R/2\). That is, \(M_0 \in {\mathcal {M}}^{{n}, {d}}_{2 {\mathrm {rch}}_{\min }}\) is such that \({\mathcal {V}}/48 \le {\mathcal {H}}^d(M_0) \le {\mathcal {V}}/2\) and \(M_0 \subseteq \mathrm {B}(0,R/2)\). From Proposition H.5, there exists a family of submanifolds \(\left\{ {M_s}\right\} _{1 \le s \le {\mathcal {N}}} \subseteq {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) with cardinality \({\mathcal {N}}\) such that
$$\begin{aligned} \log {\mathcal {N}}&\ge n\frac{{\mathcal {V}}/48}{\omega _d {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{19}\varepsilon } \right) ^{d/2} \\&\ge n\frac{{\mathcal {V}}}{\omega _d {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{31}\varepsilon } \right) ^{d/2} , \end{aligned}$$with \(\mathrm {d_H}(M_0,M_s) \le 23\varepsilon \) and \(\mathrm {d_H}(M_s,M_{s'}) > 2\varepsilon \) for all \(s \ne s' \in \left\{ {1, \dots , {\mathcal {N}}}\right\} \), and volumes \({\mathcal {V}}/96 \le {\mathcal {H}}^d(M_0)/2 \le {\mathcal {H}}^d(M_s) \le 2 {\mathcal {H}}^d(M_0) \le {\mathcal {V}}\). Because \(M_0 \subseteq \mathrm {B}(0,R/2)\) and \(\mathrm {d_H}(M_0,M_s) \le 23\varepsilon \) for all \(s \in \left\{ {1, \dots , {\mathcal {N}}}\right\} \), we immediately get that \(M_s \subseteq \mathrm {B}(0,R/2 + 23 \varepsilon ) \subseteq \mathrm {B}(0,R)\), where the last inclusion uses \(23\varepsilon \le 23{\mathrm {rch}}_{\min }/2^{34} \le R/2\). As a result, this family clearly provides the existence of the announced \(\varepsilon \)-packing of \(\bigl ( \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}, \mathrm {d_H}\bigr )\).
\(\square \)
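The elementary facts behind the two cases above can be checked numerically. The sketch below (hypothetical helper names; \(\sigma _d\) computed from the standard formula \(\sigma _d = 2\pi ^{(d+1)/2}/\Gamma ((d+1)/2)\)) confirms that the base sphere of radius \(r_0\) has volume \(\sigma _d r_0^d = {\mathcal {V}}/2\), that \(r_0 = 2{\mathrm {rch}}_{\min }\) exactly at the boundary of the first assumption \(1 \le {\mathcal {V}}/(2^{d + 1}\sigma _d {\mathrm {rch}}_{\min }^d)\), and that the constant \(2^{19}\) is correctly absorbed into \(2^{21}\) (first case, factor 1/2) and into \(2^{31}\) (second case, factor 1/48) for every \(d \ge 1\).

```python
import math


def sphere_area(d):
    """sigma_d: d-dimensional volume of the unit sphere S^d in R^{d+1}."""
    return 2 * math.pi ** ((d + 1) / 2) / math.gamma((d + 1) / 2)


def base_sphere_radius(V, d):
    """Radius r_0 = ((V/2)/sigma_d)^{1/d} of the base sphere M_0, so that H^d(M_0) = V/2."""
    return ((V / 2) / sphere_area(d)) ** (1 / d)


def absorbed_constants_ok(d):
    """(1/2) 2^{-19d/2} >= 2^{-21d/2} and (1/48) 2^{-19d/2} >= 2^{-31d/2}."""
    return 2 ** d >= 2 and 2 ** (6 * d) >= 48


rch_min = 1.0
for d in range(1, 7):
    V = 2 ** (d + 1) * sphere_area(d) * rch_min ** d  # smallest V allowed by the first assumption
    r0 = base_sphere_radius(V, d)
    print(d, r0, sphere_area(d) * r0 ** d, V / 2, absorbed_constants_ok(d))
```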
H.4.3 Proof of the Computational Lower Bounds for Manifold Estimation
We are now in a position to prove the computational lower bounds presented in this work. First, we turn to the infeasibility result for manifold estimation using statistical queries in the unbounded model \({\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) (Proposition 2.1).
Proof of Proposition 2.1
Since \(\sigma _d f_{\min } {\mathrm {rch}}_{\min }^d \le 1\), the uniform probability distribution \(D_0\) over the centered d-sphere \(M_0 \subseteq {\mathbb {R}}^{d + 1}\times \left\{ {0}\right\} ^{n-(d + 1)}\) of radius \({\mathrm {rch}}_{\min }\) belongs to \({\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\). Given a unit vector \(v \in {\mathbb {R}}^n\), the invariance of the model under translations yields that the uniform distributions \(D_k\) over \(M_k = \left\{ {p+(3k\varepsilon )v, p \in M_0}\right\} \), for \(k \in {\mathbb {Z}}\), also belong to \({\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\). But from Lemma H.6, for all \(k \ne k' \in {\mathbb {Z}}\), \(\mathrm {d_H}(M_k,M_{k'}) = 3|k-k'|\varepsilon > 2\varepsilon \). Hence, writing
$$\begin{aligned} {\mathcal {M}} := \left\{ {\mathrm {Supp}(D), D \in {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})}\right\} , \end{aligned}$$
we see that the family \(\left\{ {M_k}\right\} _{k \in {\mathbb {Z}}}\) forms an infinite \(\varepsilon \)-packing of \(({\mathcal {M}},\mathrm {d_H})\). From Theorem G.2, we get that the statistical query complexity of manifold estimation over the model \({\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) with precision \(\varepsilon \) is infinite, which concludes the proof. \(\square \)
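The translation argument can be illustrated on a finite window of the family \(\left\{ {M_k}\right\} _{k \in {\mathbb {Z}}}\). In this sketch (planar toy case, with a sampled circle of radius \({\mathrm {rch}}_{\min }\) standing in for \(M_0\), and hypothetical values of \(\varepsilon \) and v), the pairwise Hausdorff distances equal \(3|k-k'|\varepsilon \), as predicted by Lemma H.6, and hence exceed \(2\varepsilon \). Since k ranges over all of \({\mathbb {Z}}\) in the proof, enlarging the window never exhausts the packing.

```python
import math


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (lists of tuples)."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


eps, rch_min, m = 0.01, 1.0, 16  # hypothetical precision, reach and sample size
v = (0.6, 0.8)  # a unit vector
M0 = [(rch_min * math.cos(2 * math.pi * k / m), rch_min * math.sin(2 * math.pi * k / m))
      for k in range(m)]


def M(k):
    """Translate M_k = {p + 3 k eps v : p in M_0}."""
    return [(p[0] + 3 * k * eps * v[0], p[1] + 3 * k * eps * v[1]) for p in M0]


ks = range(-5, 6)
pairs_ok = all(
    abs(hausdorff(M(k), M(kp)) - 3 * abs(k - kp) * eps) < 1e-9
    and hausdorff(M(k), M(kp)) > 2 * eps
    for k in ks for kp in ks if k != kp
)
print(pairs_ok)
```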
We finally come to the proofs of the computational lower bounds over the fixed point model \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) (Theorem 5.3) and the bounding ball model \(\mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) (Theorem 5.6).
Proof of Theorems 5.3 and 5.6
For both results, the idea is to exhibit large enough \(\varepsilon \)-packings of \({\mathcal {M}} = \left\{ {\mathrm {Supp}(D),D \in {\mathcal {D}}}\right\} \), and to apply Theorem G.2. In each case, the assumptions on the parameters \(f_{\min },f_{\max }\), \({\mathrm {rch}}_{\min }\) and d ensure that the uniform distributions over the manifolds given by the packings of Proposition H.6 (and Proposition H.4 for Theorem 5.6) applied with \({\mathcal {V}} = 1/f_{\min }\) belong to the model, and hence that \({\mathcal {M}}\) contains these packings.
-
To prove Theorem 5.3, let us write
$$\begin{aligned} {\mathcal {M}}_0 := \left\{ {\mathrm {Supp}(D), D \in \left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})}\right\} \subseteq \left\{ {0}\right\} \sqcup {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }} . \end{aligned}$$From Theorem G.2, any randomized SQ algorithm estimating \(M = \mathrm {Supp}(D)\) over the model \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\) with precision \(\varepsilon \) and with probability of success at least \(1 - \alpha \) must make at least
$$\begin{aligned} q \ge \frac{\log \bigl ( (1-\alpha ) \mathrm {pk}_{({\mathcal {M}}_0, \mathrm {d_H})}(\varepsilon ) \bigr )}{\log (1+\left\lfloor 1/\tau \right\rfloor )} \end{aligned}$$queries to \({\mathrm {STAT}}(\tau )\). Furthermore, let \(\left\{ {M_i}\right\} _{1 \le i \le {\mathcal {N}}}\) be an \(\varepsilon \)-packing of \(\left\{ {0}\right\} \sqcup {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}\) given by Proposition H.6, that we apply with volume \({\mathcal {V}} = 1/f_{\min }\). Recall that these manifolds are guaranteed to have volumes \(1/(4f_{\min }) \le {\mathcal {H}}^d(M_i) \le 1/f_{\min }\). From the assumptions on the parameters of the model, we get that the uniform distributions \(\left\{ {D_i := \mathbb {1}_{M_i} {\mathcal {H}}^d /{\mathcal {H}}^d(M_i)}\right\} _{1 \le i \le {\mathcal {N}}}\) over the \(M_i\)’s all belong to \(\left\{ {0}\right\} \sqcup {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})\). In particular, the family \(\left\{ {M_i}\right\} _{1 \le i \le {\mathcal {N}}}\) is also an \(\varepsilon \)-packing of \({\mathcal {M}}_0\), and therefore
$$\begin{aligned} \log \bigl (\mathrm {pk}_{({\mathcal {M}}_0, \mathrm {d_H})}(\varepsilon ) \bigr )&\ge \log {\mathcal {N}} \ge n \frac{1}{\omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{21} \varepsilon } \right) ^{d/2} , \end{aligned}$$which yields the announced result.
-
Similarly, to prove Theorem 5.6, write
$$\begin{aligned} {\mathcal {M}}_R := \left\{ {\mathrm {Supp}(D), D \in \mathrm {B}(0,R) \sqcap {\mathcal {D}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }}({f_{\min }}, {f_{\max }}, {L})}\right\} \subseteq \mathrm {B}(0,R) \sqcap {\mathcal {M}}^{{n}, {d}}_{{\mathrm {rch}}_{\min }} , \end{aligned}$$and apply Theorem G.2 to get
$$\begin{aligned} q \ge \frac{\log \bigl ( (1-\alpha ) \mathrm {pk}_{({\mathcal {M}}_R, \mathrm {d_H})}(\varepsilon ) \bigr )}{\log (1+\left\lfloor 1/\tau \right\rfloor )} . \end{aligned}$$The assumptions on the parameters ensure that the packings exhibited in Propositions H.4 and H.6 applied with volume \({\mathcal {V}} = 1/f_{\min }\) are included in \({\mathcal {M}}_R\), so that
$$\begin{aligned} \log \bigl (\mathrm {pk}_{({\mathcal {M}}_R, \mathrm {d_H})}(\varepsilon ) \bigr )&\ge n \max \left\{ { \log \left( \frac{R}{4\varepsilon } \right) , \frac{1}{\omega _d f_{\min } {\mathrm {rch}}_{\min }^d} \left( \frac{{\mathrm {rch}}_{\min }}{2^{31} \varepsilon } \right) ^{d/2} }\right\} , \end{aligned}$$which concludes the proof.
\(\square \)
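The final bounds combine Theorem G.2 with the packing estimates. As an illustration, the sketch below evaluates \(q \ge \log ((1-\alpha )\mathrm {pk}(\varepsilon ))/\log (1+\left\lfloor 1/\tau \right\rfloor )\) in log space, using the lower bounds on \(\log \mathrm {pk}\) obtained in the two proofs above and \(\omega _d = \pi ^{d/2}/\Gamma (d/2+1)\). Function names and parameter values are hypothetical; the resulting numbers are valid lower bounds only when the hypotheses of Theorems 5.3 and 5.6 hold, which the code does not check.

```python
import math


def unit_ball_volume(d):
    """omega_d = pi^{d/2} / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def log_pk_fixed_point(n, d, eps, f_min, rch_min):
    """Lower bound on log pk(eps) in the fixed point model (proof of Theorem 5.3)."""
    return n / (unit_ball_volume(d) * f_min * rch_min ** d) * (rch_min / (2 ** 21 * eps)) ** (d / 2)


def log_pk_bounding_ball(n, d, eps, f_min, rch_min, R):
    """Lower bound on log pk(eps) in the bounding ball model (proof of Theorem 5.6)."""
    intrinsic = n / (unit_ball_volume(d) * f_min * rch_min ** d) * (rch_min / (2 ** 31 * eps)) ** (d / 2)
    return max(n * math.log(R / (4 * eps)), intrinsic)


def sq_query_lower_bound(log_pk, tau, alpha):
    """q >= log((1 - alpha) * pk) / log(1 + floor(1/tau)), computed in log space."""
    return (math.log(1 - alpha) + log_pk) / math.log(1 + math.floor(1 / tau))


# Hypothetical parameter values, chosen only to exercise the formulas.
n, d, f_min, rch_min, R, tau, alpha = 100, 2, 1e-5, 1.0, 1e4, 1e-3, 0.1
eps = 1e-12
print(sq_query_lower_bound(log_pk_fixed_point(n, d, eps, f_min, rch_min), tau, alpha))
print(sq_query_lower_bound(log_pk_bounding_ball(n, d, eps, f_min, rch_min, R), tau, alpha))
```

Working with \(\log \mathrm {pk}\) rather than \(\mathrm {pk}\) avoids overflow, since the packing numbers involved are exponentially large in n.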
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aamari, E., Knop, A. Adversarial Manifold Estimation. Found Comput Math 24, 1–97 (2024). https://doi.org/10.1007/s10208-022-09588-2