
Faster High-accuracy Log-concave Sampling via Algorithmic Warm Starts

Published: 11 June 2024

Abstract

It is a fundamental problem to understand the complexity of high-accuracy sampling from a strongly log-concave density π on \(\mathbb{R}^d\). Indeed, in practice, high-accuracy samplers such as the Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard; and in theory, via the proximal sampler reduction, it is understood that such samplers are key for sampling even beyond log-concavity (in particular, for sampling under isoperimetric assumptions).
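To make the terminology concrete, the following is a minimal Python sketch of one MALA iteration; it is only an illustration of the algorithm named above, not the specific implementation, step size, or initialization analyzed in this article. The potential V (with π ∝ exp(−V)), its gradient grad_V, and the step size h are assumptions supplied by the caller.

```python
import numpy as np

def mala_step(x, V, grad_V, h, rng):
    """One iteration of the Metropolis-adjusted Langevin algorithm (MALA)
    targeting pi(x) proportional to exp(-V(x))."""
    # Langevin proposal: one Euler-Maruyama step of the overdamped Langevin diffusion.
    y = x - h * grad_V(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

    # Log-density (up to a constant) of the Gaussian proposal kernel q(to | frm).
    def log_q(to, frm):
        return -np.sum((to - frm + h * grad_V(frm)) ** 2) / (4.0 * h)

    # Metropolis-Hastings correction: keeps pi exactly invariant, which is what
    # makes MALA a high-accuracy sampler (no discretization bias at stationarity).
    log_accept = (V(x) - V(y)) + (log_q(x, y) - log_q(y, x))
    return y if np.log(rng.uniform()) < log_accept else x

# Example: a few steps targeting a standard Gaussian, V(x) = ||x||^2 / 2.
rng = np.random.default_rng(0)
x = np.zeros(3)
for _ in range(100):
    x = mala_step(x, V=lambda z: 0.5 * z @ z, grad_V=lambda z: z, h=0.1, rng=rng)
```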
This article improves the dimension dependence of this sampling problem to \(\widetilde{O}(d^{1/2})\). The previous best result for MALA was \(\widetilde{O}(d)\). This closes the long line of work on the complexity of MALA and, moreover, leads to state-of-the-art guarantees for high-accuracy sampling under strong log-concavity and beyond (thanks to the aforementioned reduction).
Our starting point is that the complexity of MALA improves to \(\widetilde{O}(d^{1/2})\), but only under a warm start (an initialization with constant Rényi divergence w.r.t. π). Previous algorithms for finding a warm start took O(d) time and thus dominated the computational effort of sampling. Our main technical contribution resolves this gap by establishing the first \(\widetilde{O}(d^{1/2})\) Rényi mixing rates for the discretized underdamped Langevin diffusion. For this, we develop new differential-privacy-inspired techniques based on Rényi divergences with Orlicz–Wasserstein shifts, which allow us to sidestep longstanding challenges for proving fast convergence of hypocoercive differential equations.
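For context on the underdamped (kinetic) Langevin diffusion whose discretization is discussed above, here is a minimal Euler–Maruyama sketch of one step of those dynamics; it is meant purely as an illustration of the process, not the particular discretization, friction and step-size choices, or Rényi analysis of this article. The friction parameter gamma, step size h, and gradient oracle grad_f are assumptions of the sketch. In the terminology of the abstract, a warm start would mean that the law μ of the position iterate satisfies a constant Rényi-divergence bound \(\mathcal{R}_q(\mu \parallel \pi) = O(1)\).

```python
import numpy as np

def underdamped_langevin_step(x, v, grad_f, h, gamma, rng):
    """One Euler-Maruyama step of the underdamped (kinetic) Langevin diffusion
        dX_t = V_t dt,
        dV_t = -(gamma * V_t + grad f(X_t)) dt + sqrt(2 * gamma) dB_t,
    whose position marginal (for the continuous dynamics) targets pi proportional to exp(-f)."""
    noise = rng.standard_normal(v.shape)
    x_next = x + h * v
    v_next = v - h * (gamma * v + grad_f(x)) + np.sqrt(2.0 * gamma * h) * noise
    return x_next, v_next

# Example: run the chain on a standard Gaussian target, f(x) = ||x||^2 / 2.
rng = np.random.default_rng(1)
x, v = np.zeros(3), np.zeros(3)
for _ in range(100):
    x, v = underdamped_langevin_step(x, v, grad_f=lambda z: z, h=0.05, gamma=2.0, rng=rng)
```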

Published In

Journal of the ACM, Volume 71, Issue 3
June 2024
323 pages
EISSN: 1557-735X
DOI: 10.1145/3613558

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2024
Online AM: 20 March 2024
Accepted: 11 March 2024
Revised: 18 November 2023
Received: 18 November 2023
Published in JACM Volume 71, Issue 3

Author Tags

  1. Metropolis-adjusted Langevin algorithm
  2. log-concave sampling
  3. proximal sampler
  4. shifted divergence
  5. underdamped Langevin
  6. warm start

Qualifiers

  • Research-article
