A probabilistic programming approach to probabilistic data analysis

Published: 05 December 2016

Abstract

Probabilistic techniques are central to data analysis, but different approaches can be challenging to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include discriminative machine learning, hierarchical Bayesian models, multivariate kernel methods, clustering algorithms, and arbitrary probabilistic programs. We demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling definition language and structured query language. The practical value is illustrated in two ways. First, the paper describes an analysis on a database of Earth satellites, which identifies records that probably violate Kepler's Third Law by composing causal probabilistic programs with non-parametric Bayes in 50 lines of probabilistic code. Second, it reports the lines of code and accuracy of CGPMs compared with baseline solutions from standard machine learning libraries.
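As a rough illustration of what a record that violates Kepler's Third Law looks like, the sketch below computes the period implied by T = 2π·sqrt(a³/GM) and flags records whose reported period disagrees with it. This is not the paper's method: the paper composes probabilistic models in BayesDB, whereas this is a plain deterministic threshold check. The field names (`perigee_km`, `apogee_km`, `period_minutes`), the sample records, and the 5% tolerance are all assumptions made for the example.

```python
import math

# Physical constants for Earth (standard published values).
MU_EARTH_KM3_S2 = 398600.4418   # gravitational parameter GM, km^3 / s^2
EARTH_RADIUS_KM = 6378.137      # equatorial radius, km


def keplerian_period_minutes(perigee_km, apogee_km):
    """Period implied by Kepler's Third Law, given altitudes above the surface."""
    # Semi-major axis is measured from Earth's center, so add the radius.
    semi_major_axis_km = EARTH_RADIUS_KM + (perigee_km + apogee_km) / 2.0
    period_s = 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)
    return period_s / 60.0


def flag_violations(records, rel_tol=0.05):
    """Names of records whose reported period deviates by more than rel_tol.

    rel_tol is an arbitrary illustrative threshold, not a value from the paper.
    """
    flagged = []
    for rec in records:
        expected = keplerian_period_minutes(rec["perigee_km"], rec["apogee_km"])
        if abs(rec["period_minutes"] - expected) > rel_tol * expected:
            flagged.append(rec["name"])
    return flagged


# Hypothetical records; these are not taken from the satellite database.
records = [
    {"name": "geo-like", "perigee_km": 35786.0, "apogee_km": 35786.0,
     "period_minutes": 1436.0},
    {"name": "suspect", "perigee_km": 35786.0, "apogee_km": 35786.0,
     "period_minutes": 142.0},
]
flagged = flag_violations(records)
print(flagged)
```

A fixed tolerance like this cannot separate data-entry errors from genuine measurement noise. The abstract describes handling that uncertainty probabilistically instead, by composing a causal probabilistic program with a non-parametric Bayesian model.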

Published In

NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
December 2016
5100 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
