
Query lower bounds for log-concave sampling

Online AM: 21 June 2024

    Abstract

    Log-concave sampling has witnessed remarkable algorithmic advances in recent years, but the corresponding problem of proving lower bounds for this task has remained elusive, with lower bounds previously known only in dimension one. In this work, we establish the following query lower bounds: (1) sampling from strongly log-concave and log-smooth distributions in dimension d ≥ 2 requires Ω(log κ) queries, which is sharp in any constant dimension, and (2) sampling from Gaussians in dimension d (hence also from general log-concave and log-smooth distributions in dimension d) requires \(\widetilde{\Omega}(\min(\sqrt{\kappa}\log d, d))\) queries, which is nearly sharp for the class of Gaussians. Here κ denotes the condition number of the target distribution. Our proofs rely upon (1) a multiscale construction inspired by work on the Kakeya conjecture in geometric measure theory, and (2) a novel reduction that demonstrates that block Krylov algorithms are optimal for this problem, as well as connections to lower bound techniques based on Wishart matrices developed in the matrix-vector query literature.
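
    For intuition on why the Gaussian bound is stated in terms of \(\sqrt{\kappa}\), the sketch below illustrates the generic (block size one) Krylov approach to Gaussian sampling under matrix-vector query access: a Lanczos recurrence queries the matrix A once per iteration, and the resulting tridiagonal factorization is used to apply an approximation of A^{1/2} to a standard Gaussian vector, so that roughly on the order of \(\sqrt{\kappa}\) iterations (up to logarithmic factors) suffice for a well-conditioned A. This is a minimal sketch of the standard Lanczos technique under these assumptions, not the authors' construction; the function names and parameter choices are hypothetical.

    ```python
    import numpy as np

    def lanczos(matvec, g, m):
        # Run m steps of Lanczos with starting vector g, using only
        # matrix-vector queries v -> A v. Returns an orthonormal basis
        # Q (d x m) and a tridiagonal T (m x m) with T ~ Q^T A Q.
        d = g.shape[0]
        Q = np.zeros((d, m))
        alphas, betas = np.zeros(m), np.zeros(max(m - 1, 0))
        q, q_prev, beta_prev = g / np.linalg.norm(g), np.zeros(d), 0.0
        for j in range(m):
            Q[:, j] = q
            w = matvec(q)                      # one matrix-vector query
            alphas[j] = q @ w
            w = w - alphas[j] * q - beta_prev * q_prev
            if j < m - 1:
                beta_prev = np.linalg.norm(w)
                betas[j] = beta_prev
                q_prev, q = q, w / beta_prev
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        return Q, T

    def krylov_gaussian_sample(matvec, d, m, rng):
        # Approximate a draw from N(0, A): apply an approximation of A^{1/2}
        # to g ~ N(0, I) via the identity f(A) g ~ ||g|| * Q f(T) e_1, f = sqrt.
        g = rng.standard_normal(d)
        Q, T = lanczos(matvec, g, m)
        evals, V = np.linalg.eigh(T)
        fT_e1 = V @ (np.sqrt(np.maximum(evals, 0.0)) * V[0, :])
        return np.linalg.norm(g) * (Q @ fT_e1)

    # Toy usage: A has condition number kappa = 10; m grows like sqrt(kappa) * log(d).
    rng = np.random.default_rng(0)
    d = 100
    A = np.diag(np.linspace(1.0, 10.0, d))      # hypothetical well-conditioned target
    x = krylov_gaussian_sample(lambda v: A @ v, d, m=20, rng=rng)
    ```

    In this query model each call to matvec counts as one query, and the \(\widetilde{\Omega}(\min(\sqrt{\kappa}\log d, d))\) bound above says that, up to logarithmic factors, no algorithm making such queries can do substantially better for Gaussians, consistent with the abstract's statement that block Krylov algorithms are optimal for this problem.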


    Published In

    Journal of the ACM Just Accepted
    ISSN:0004-5411
    EISSN:1557-735X
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 21 June 2024
    Accepted: 11 June 2024
    Revised: 25 May 2024
    Received: 23 December 2023

    Author Tags

    1. block Krylov
    2. log-concave sampling
    3. matrix-vector queries

    Qualifiers

    • Research-article
