Scene Grammars, Factor Graphs, and Belief Propagation

Authors:

Pedro F. Felzenszwalb

Journal of the ACM (JACM), Volume 67, Issue 4

Article No.: 19, Pages 1 - 41

https://doi.org/10.1145/3396886

Published: 30 May 2020

Abstract

We describe a general framework for probabilistic modeling of complex scenes and for inference from ambiguous observations. The approach is motivated by applications in image analysis and is based on the use of priors defined by stochastic grammars. We define a class of grammars that capture relationships between the objects in a scene and provide important contextual cues for statistical inference. The distribution over scenes defined by a probabilistic scene grammar can be represented by a graphical model, and this construction can be used for efficient inference with loopy belief propagation.

We show experimental results with two applications. One application involves the reconstruction of binary contour maps. Another application involves detecting and localizing faces in images. In both applications, the same framework leads to robust inference algorithms that can effectively combine local information to reason about a scene.
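As a rough illustration of the kind of inference the abstract refers to, the following sketch runs sum-product loopy belief propagation on a tiny pairwise model over three binary variables arranged in a cycle, and compares the resulting beliefs with exact marginals computed by enumeration. This is not the construction from the article: the model, the potentials, and the names `loopy_bp`, `exact_marginals`, `unary`, and `pairwise` are all assumptions made for this example, and a pairwise model is only a special case of a factor graph.

```python
import itertools

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def loopy_bp(unary, pairwise, iters=50):
    """Sum-product loopy BP for binary variables with pairwise factors.

    unary[i] is [phi_i(0), phi_i(1)]; pairwise[(i, j)][a][b] is psi_ij(x_i=a, x_j=b).
    All names and the model itself are hypothetical, chosen for illustration.
    """
    nbrs = {i: [] for i in unary}
    msgs = {}
    for i, j in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)
        msgs[(i, j)] = [0.5, 0.5]  # message from i to j, initialized uniform
        msgs[(j, i)] = [0.5, 0.5]

    def psi(i, j, a, b):
        # Look up the factor regardless of the order the edge was stored in.
        if (i, j) in pairwise:
            return pairwise[(i, j)][a][b]
        return pairwise[(j, i)][b][a]

    for _ in range(iters):
        new = {}
        for i, j in msgs:
            # Combine local evidence at i with messages from all neighbors except j.
            h = list(unary[i])
            for k in nbrs[i]:
                if k != j:
                    h = [h[a] * msgs[(k, i)][a] for a in (0, 1)]
            # Sum out x_i to obtain a function of x_j.
            m = [sum(psi(i, j, a, b) * h[a] for a in (0, 1)) for b in (0, 1)]
            new[(i, j)] = normalize(m)
        msgs = new  # synchronous update of all messages

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in nbrs[i]:
            b = [b[a] * msgs[(k, i)][a] for a in (0, 1)]
        beliefs[i] = normalize(b)
    return beliefs

def exact_marginals(unary, pairwise):
    """Brute-force marginals by enumerating every joint assignment."""
    nodes = sorted(unary)
    marg = {n: [0.0, 0.0] for n in nodes}
    for xs in itertools.product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, xs))
        p = 1.0
        for n in nodes:
            p *= unary[n][x[n]]
        for (i, j), table in pairwise.items():
            p *= table[x[i]][x[j]]
        for n in nodes:
            marg[n][x[n]] += p
    return {n: normalize(m) for n, m in marg.items()}

# Hypothetical model: three binary variables on a cycle, with factors
# that mildly favor neighboring variables taking the same value.
unary = {0: [0.7, 0.3], 1: [0.5, 0.5], 2: [0.4, 0.6]}
attract = [[2.0, 1.0], [1.0, 2.0]]
pairwise = {(0, 1): attract, (1, 2): attract, (0, 2): attract}

approx = loopy_bp(unary, pairwise)
exact = exact_marginals(unary, pairwise)
for n in sorted(unary):
    print(n, [round(v, 3) for v in approx[n]], [round(v, 3) for v in exact[n]])
```

On a graph without cycles the same message updates recover the exact marginals; on the cycle above the beliefs are only approximations, which is the regime in which loopy belief propagation is normally used.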

Index Terms

Scene Grammars, Factor Graphs, and Belief Propagation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Knowledge representation and reasoning
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic reasoning algorithms
      1. Loopy belief propagation
    2. Probabilistic representations

Published In

Journal of the ACM Volume 67, Issue 4

August 2020

265 pages

ISSN:0004-5411

EISSN:1557-735X

DOI:10.1145/3403612

Editor:
Éva Tardos
Cornell University

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2020

Online AM: 07 May 2020

Accepted: 01 April 2020

Revised: 01 August 2019

Received: 01 July 2018

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation
