DOI: 10.1145/3691620.3695031
research-article
Open access

Language-Agnostic Static Analysis of Probabilistic Programs

Published: 27 October 2024

Abstract

Probabilistic programming allows developers to focus on the modeling aspect of the Bayesian workflow by abstracting away the posterior inference machinery. In practice, however, programming errors specific to the probabilistic setting are hard to fix without deep knowledge of the underlying systems. As in classical software engineering, static program analysis could be employed to catch many of these errors. In this work, we present LASAPP, the first framework for formulating static analyses of probabilistic programs in a language-agnostic manner. While prior work focused on specific languages, any analysis written with our framework can be applied to a new language by adding easy-to-implement API bindings. Our prototype supports five popular probabilistic programming languages out of the box. We demonstrate the effectiveness and expressiveness of LASAPP by presenting four provably correct, language-agnostic probabilistic program analyses that address problems discussed in the literature, and we evaluate them on over 200 real-world programs.
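The central design claim here is that an analysis written once against a shared API runs on any language that supplies a binding. The Python sketch that follows illustrates only that separation, under loudly stated assumptions: `ProgramBinding`, `RandomVariable`, the two toy bindings, the `DISCRETE` set, and the `discrete_latents` check are all hypothetical names invented for illustration. They are not LASAPP's actual API, and the check is not claimed to be one of the paper's four analyses. It simply flags latent discrete random variables, a well-known obstacle for gradient-based samplers such as HMC/NUTS.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import re

# Assumption for this sketch: distribution names treated as discrete.
DISCRETE = {"Bernoulli", "Binomial", "Categorical", "DiscreteUniform", "Poisson"}


@dataclass(frozen=True)
class RandomVariable:
    """Language-neutral view of one sampling statement (hypothetical)."""
    name: str
    distribution: str
    observed: bool


class ProgramBinding(ABC):
    """Hypothetical common API: each supported language implements it once."""

    @abstractmethod
    def random_variables(self) -> list[RandomVariable]:
        ...


class TupleBinding(ProgramBinding):
    """Toy binding for a 'language' whose programs are (name, dist, observed) tuples."""

    def __init__(self, stmts: list[tuple[str, str, bool]]):
        self.stmts = stmts

    def random_variables(self) -> list[RandomVariable]:
        return [RandomVariable(n, d, obs) for n, d, obs in self.stmts]


class TextBinding(ProgramBinding):
    """Toy binding for a Stan-like syntax such as `x ~ Normal(0, 1);`."""

    PATTERN = re.compile(r"^\s*(\w+)\s*~\s*(\w+)\s*\(")

    def __init__(self, source: str, data_names: list[str]):
        self.source = source
        self.data_names = set(data_names)

    def random_variables(self) -> list[RandomVariable]:
        rvs = []
        for line in self.source.splitlines():
            match = self.PATTERN.match(line)
            if match:
                name, dist = match.groups()
                rvs.append(RandomVariable(name, dist, name in self.data_names))
        return rvs


def discrete_latents(program: ProgramBinding) -> list[str]:
    """Written only against the common API, so it runs on every binding.

    Flags latent (unobserved) discrete variables, which gradient-based
    samplers such as HMC/NUTS cannot handle without marginalization.
    """
    return [
        rv.name
        for rv in program.random_variables()
        if not rv.observed and rv.distribution in DISCRETE
    ]


a = TupleBinding([("z", "Bernoulli", False), ("y", "Normal", True)])
b = TextBinding("mu ~ Normal(0, 1);\nk ~ Poisson(3);\ny ~ Normal(mu, 1);", ["y"])
print(discrete_latents(a))  # ['z']
print(discrete_latents(b))  # ['k']
```

In this sketch, supporting another language means writing one more `ProgramBinding` subclass while `discrete_latents` stays unchanged. A real binding would extract this information from the language's actual syntax tree rather than from a regular expression.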

Information

Published In

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering
October 2024
2587 pages
ISBN:9798400712487
DOI:10.1145/3691620
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. probabilistic programming
  2. language-agnostic
  3. program analysis

Funding Sources

  • Österreichische Forschungsförderungsgesellschaft (FFG, Austrian Research Promotion Agency)

Conference

ASE '24

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics

  • Total citations: 0
  • Total downloads: 96
  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 44

Reflects downloads up to 13 Jan 2025.
