research-article

Open access

Language-Agnostic Static Analysis of Probabilistic Programs

Authors:

Michael Schröder,

Jürgen Cito

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

Pages 78 - 90

https://doi.org/10.1145/3691620.3695031

Published: 27 October 2024

Abstract

Probabilistic programming allows developers to focus on the modeling aspect of the Bayesian workflow by abstracting away the posterior inference machinery. In practice, however, programming errors specific to the probabilistic environment are hard to fix without deep knowledge of the underlying systems. As in classical software engineering, static program analysis methods could be employed to catch many of these errors. In this work, we present LASAPP, the first framework for formulating static analyses of probabilistic programs in a language-agnostic manner. While prior work focused on specific languages, all analyses written with our framework can be readily applied to new languages by adding easy-to-implement API bindings. Our prototype supports five popular probabilistic programming languages out of the box. We demonstrate the effectiveness and expressiveness of the LASAPP framework by presenting four provably correct language-agnostic probabilistic program analyses that address problems discussed in the literature, and we evaluate them on over 200 real-world programs.
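To make the idea of language-agnostic analyses behind per-language API bindings concrete, the following is a minimal sketch and not the LASAPP API. Every name in it (RandomVariable, LanguageBinding, TildeBinding, SampleCallBinding, find_discrete_latents) is hypothetical, the two "languages" are toy syntaxes parsed with regular expressions, and the check itself (flagging unobserved discrete variables) is chosen purely for illustration. The point it illustrates is that the analysis is written once against an abstract interface, and supporting a further syntax only requires another binding.

```python
import re
from dataclasses import dataclass
from typing import List, Protocol, Set


@dataclass(frozen=True)
class RandomVariable:
    """Language-independent view of one random variable (hypothetical)."""
    name: str
    distribution: str
    is_observed: bool


class LanguageBinding(Protocol):
    """What an analysis needs from a language; one implementation per language."""
    def random_variables(self, source: str) -> List[RandomVariable]: ...


class TildeBinding:
    """Toy binding for tilde-style syntax such as `x ~ Normal(0, 1)`."""
    _pattern = re.compile(r"^\s*(\w+)\s*~\s*(\w+)\(", re.MULTILINE)

    def __init__(self, observed: Set[str]) -> None:
        # In this toy syntax, observed names are supplied externally.
        self.observed = set(observed)

    def random_variables(self, source: str) -> List[RandomVariable]:
        return [RandomVariable(name, dist, name in self.observed)
                for name, dist in self._pattern.findall(source)]


class SampleCallBinding:
    """Toy binding for call-style syntax such as `sample("x", Normal(0, 1))`."""
    _pattern = re.compile(r'sample\("(\w+)",\s*(\w+)\((.*)')

    def random_variables(self, source: str) -> List[RandomVariable]:
        # An `obs=` keyword on the same line marks the variable as observed.
        return [RandomVariable(name, dist, "obs=" in rest)
                for name, dist, rest in self._pattern.findall(source)]


# Assumed set of discrete distribution names, for illustration only.
DISCRETE = {"Bernoulli", "Poisson", "Categorical", "DiscreteUniform"}


def find_discrete_latents(binding: LanguageBinding, source: str) -> List[str]:
    """Language-agnostic check: names of unobserved discrete variables."""
    return [rv.name for rv in binding.random_variables(source)
            if not rv.is_observed and rv.distribution in DISCRETE]


tilde_src = "n ~ Poisson(3)\nmu ~ Normal(0, 1)\ny ~ Normal(mu, 1)\n"
call_src = 'k = sample("k", Bernoulli(0.5))\nsample("y", Normal(k, 1), obs=data)\n'

print(find_discrete_latents(TildeBinding({"y"}), tilde_src))
print(find_discrete_latents(SampleCallBinding(), call_src))
```

The function find_discrete_latents never inspects source syntax itself; it only consumes the RandomVariable records a binding returns, which is the separation the abstract describes at a much larger scale.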



Published In

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

October 2024

2587 pages

ISBN:9798400712487

DOI:10.1145/3691620

General Chair: Vladimir Filkov

Program Co-chairs: Baishakhi Ray (Columbia University, USA; AWS AI Lab), Minghui Zhou (Peking University, China)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Österreichische Forschungsförderungsgesellschaft

Conference

ASE '24: 39th IEEE/ACM International Conference on Automated Software Engineering

October 27 - November 1, 2024

Sacramento, CA, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
