Fabular: regression formulas as probabilistic programming

Published: 11 January 2016

Abstract

Regression formulas are a domain-specific language adopted by several R packages for describing an important and useful class of statistical models: hierarchical linear regressions. Formulas are succinct, expressive, and clearly popular, so are they a useful addition to probabilistic programming languages? And what do they mean? We propose a core calculus of hierarchical linear regression, in which regression coefficients are themselves defined by nested regressions (unlike in R). We explain how our calculus captures the essence of the formula DSL found in R. We describe the design and implementation of Fabular, a version of the Tabular schema-driven probabilistic programming language, enriched with formulas based on our regression calculus. To the best of our knowledge, this is the first formal description of the core ideas of R's formula notation, the first development of a calculus of regression formulas, and the first demonstration of the benefits of composing regression formulas and latent variables in a probabilistic programming language.
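The central idea of a hierarchical linear regression whose coefficients are themselves defined by nested regressions can be illustrated with a small simulation. The sketch below is not Fabular or Tabular syntax and is not taken from the paper. It is a hypothetical Python rendering of the generative model behind a two-level regression, assuming Gaussian noise at both levels and fixed, known parameter values. Each group g has an intercept alpha_g that is the outcome of a regression on a group-level predictor z_g. Each observation y_i is then regressed on an individual-level predictor x_i, plus the intercept of its group. For comparison, the lme4 formula `y ~ x + (1 | g)` gives each level of `g` its own intercept drawn from a common normal distribution. In the calculus described above, that intercept may instead be written as a nested regression. All names (`HierarchicalParams`, `group_intercepts`, `simulate`, `gamma0`, `tau`, and so on) are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class HierarchicalParams:
    """Hypothetical parameters for a two-level linear regression."""
    gamma0: float  # intercept of the group-level (nested) regression
    gamma1: float  # slope on the group-level predictor z
    tau: float     # std. dev. of group-level noise
    beta: float    # slope on the individual-level predictor x
    sigma: float   # std. dev. of observation noise


def group_intercepts(z, p, rng):
    """Nested regression: alpha_g = gamma0 + gamma1 * z_g + eta_g, eta_g ~ N(0, tau^2)."""
    return [p.gamma0 + p.gamma1 * zg + rng.gauss(0.0, p.tau) for zg in z]


def simulate(x, g, z, p, seed=0):
    """Sample y_i = alpha_{g[i]} + beta * x_i + eps_i, eps_i ~ N(0, sigma^2).

    x: individual-level predictor, one entry per observation
    g: 0-based group index of each observation
    z: group-level predictor, one entry per group
    """
    if len(x) != len(g):
        raise ValueError("x and g must have one entry per observation")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    alpha = group_intercepts(z, p, rng)
    y = [alpha[gi] + p.beta * xi + rng.gauss(0.0, p.sigma)
         for xi, gi in zip(x, g)]
    return alpha, y


if __name__ == "__main__":
    params = HierarchicalParams(gamma0=1.0, gamma1=2.0, tau=0.5, beta=0.5, sigma=0.1)
    alpha, y = simulate(x=[2.0, 4.0, 1.0], g=[0, 1, 1], z=[0.0, 1.0], p=params, seed=42)
    print("group intercepts:", alpha)
    print("observations:", y)
```

Setting `tau` and `sigma` to zero removes both noise terms, leaving only the linear predictors. For example, with `z = [0.0, 1.0]`, `gamma0 = 1.0`, `gamma1 = 2.0` and `beta = 0.5`, the group intercepts are `[1.0, 3.0]`. Inputs `x = [2.0, 4.0]` and `g = [0, 1]` then give `y = [2.0, 5.0]`. Inference, which is the actual task of a probabilistic programming language, runs in the opposite direction: it recovers the parameters from observed `y`.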

Published In

ACM SIGPLAN Notices, Volume 51, Issue 1
POPL '16
January 2016
815 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2914770
Editor: Andy Gill

POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2016
815 pages
ISBN: 9781450335492
DOI: 10.1145/2837614
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bayesian inference
  2. hierarchical models
  3. linear regression
  4. probabilistic programming
  5. relational data
