DOI: 10.1145/3618260.3649744

Parallel Sampling via Counting

Published: 11 June 2024

Abstract

We show how to use parallelization to speed up sampling from an arbitrary distribution µ on a product space [q]^n, given oracle access to counting queries: ℙ_{X∼µ}[X_S = σ_S] for any S ⊆ [n] and σ_S ∈ [q]^S. Our algorithm takes O(n^{2/3} · polylog(n, q)) parallel time, to the best of our knowledge the first runtime sublinear in n for arbitrary distributions. Our results have implications for sampling in autoregressive models. Our algorithm directly works with an equivalent oracle that answers conditional marginal queries ℙ_{X∼µ}[X_i = σ_i | X_S = σ_S], whose role is played by a trained neural network in autoregressive models. This suggests a roughly n^{1/3}-factor speedup is possible for sampling in any-order autoregressive models. We complement our positive result by showing a lower bound of Ω(n^{1/3}) on the runtime of any parallel sampling algorithm making at most poly(n) queries to the counting oracle, even for q = 2.
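To make the oracle model concrete, the following Python sketch shows the classical sequential counting-to-sampling reduction (in the spirit of Jerrum, Valiant, and Vazirani) that serves as the baseline the paper's parallel algorithm improves on, together with the Bayes-rule equivalence between counting queries and conditional marginal queries. This is a minimal illustration under our own assumptions: the names (counting_oracle, marginal_from_counting) and the uniform toy distribution are illustrative, not the paper's code.

import random

# Hypothetical sketch: the abstract's two oracle types and the classical
# sequential sampling baseline. A counting oracle returns P[X_S = sigma_S]
# for a partial assignment sigma_S; by Bayes' rule the conditional marginal
# is a ratio of two counting queries:
#   P[X_i = s | X_S = sigma_S] = P[X_{S ∪ {i}} = sigma'] / P[X_S = sigma_S].

def marginal_from_counting(counting_oracle, i, s, assigned):
    """Conditional marginal P[X_i = s | X_S = assigned] via two counts."""
    extended = dict(assigned)
    extended[i] = s
    return counting_oracle(extended) / counting_oracle(assigned)

def sequential_sample(n, q, counting_oracle, rng=random.Random(0)):
    """Draw one exact sample from mu on [q]^n using n sequential rounds."""
    assigned = {}  # the partial assignment sigma_S built so far
    for i in range(n):
        # Query the conditional marginal of coordinate i given sigma_S.
        probs = [marginal_from_counting(counting_oracle, i, s, assigned)
                 for s in range(q)]
        # Sample X_i from that marginal and extend the assignment.
        r, acc = rng.random(), 0.0
        for s, p in enumerate(probs):
            acc += p
            if r <= acc:
                assigned[i] = s
                break
        else:
            assigned[i] = q - 1  # guard against floating-point round-off
    return [assigned[i] for i in range(n)]

# Toy counting oracle for the uniform distribution on [q]^n:
# P[X_S = sigma_S] = q^(-|S|) for any partial assignment sigma_S.
if __name__ == "__main__":
    q = 3
    uniform_count = lambda assigned: q ** -len(assigned)
    print(sequential_sample(n=5, q=q, counting_oracle=uniform_count))

This baseline spends n sequential oracle rounds, one per coordinate; the paper's result is that parallel queries can cut the sequential depth to O(n^{2/3} · polylog(n, q)), while the Ω(n^{1/3}) lower bound shows some polynomial depth is unavoidable for any algorithm making poly(n) counting queries.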


Information

Published In

STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
June 2024
2049 pages
ISBN:9798400703836
DOI:10.1145/3618260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2024


Author Tags

  1. autoregressive models
  2. conditional marginals
  3. counting
  4. parallel sampling

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

STOC '24: 56th Annual ACM Symposium on Theory of Computing
June 24-28, 2024
Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Article Metrics

  • Total citations: 0
  • Total downloads: 236
  • Downloads (last 12 months): 236
  • Downloads (last 6 weeks): 48
Reflects downloads up to 12 Sep 2024

