Scene Grammars, Factor Graphs, and Belief Propagation

Published: 30 May 2020

Abstract

We describe a general framework for probabilistic modeling of complex scenes and for inference from ambiguous observations. The approach is motivated by applications in image analysis and is based on the use of priors defined by stochastic grammars. We define a class of grammars that capture relationships between the objects in a scene and provide important contextual cues for statistical inference. The distribution over scenes defined by a probabilistic scene grammar can be represented by a graphical model, and this construction can be used for efficient inference with loopy belief propagation.
We show experimental results with two applications. One application involves the reconstruction of binary contour maps. Another application involves detecting and localizing faces in images. In both applications, the same framework leads to robust inference algorithms that can effectively combine local information to reason about a scene.
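The abstract names loopy belief propagation on a factor graph as the inference method, but this page does not reproduce the construction. As a rough illustration of the general technique only, the sketch below runs sum-product message passing on a toy factor graph over binary variables and compares the resulting beliefs with brute-force marginals. The function names, the `ATTRACT` potential, and the toy graphs are all assumptions made for this example; none of them come from the paper, and the code does not build a factor graph from a scene grammar.

```python
import itertools

# Hypothetical attractive pairwise potential: neighbours prefer equal labels.
ATTRACT = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}


def _normalize(m):
    """Scale a two-entry message so that it sums to one."""
    z = m[0] + m[1]
    return [m[0] / z, m[1] / z]


def loopy_bp(num_vars, factors, iters=50):
    """Sum-product belief propagation with a flooding schedule.

    factors is a list of (scope, table) pairs: scope is a tuple of distinct
    variable indices, and table maps every 0/1 assignment of the scope to a
    positive potential. Returns one [p(x=0), p(x=1)] belief per variable.
    """
    edges = [(f, v) for f, (scope, _) in enumerate(factors) for v in scope]
    f2v = {e: [0.5, 0.5] for e in edges}  # factor -> variable messages
    v2f = {e: [0.5, 0.5] for e in edges}  # variable -> factor messages
    for _ in range(iters):
        # Variable to factor: product of messages from the other factors.
        for f, v in edges:
            m = [1.0, 1.0]
            for g, u in edges:
                if u == v and g != f:
                    m = [m[0] * f2v[(g, u)][0], m[1] * f2v[(g, u)][1]]
            v2f[(f, v)] = _normalize(m)
        # Factor to variable: sum out the factor's other variables.
        new_f2v = {}
        for f, (scope, table) in enumerate(factors):
            for i, v in enumerate(scope):
                m = [0.0, 0.0]
                for assign, pot in table.items():
                    w = pot
                    for j, u in enumerate(scope):
                        if j != i:
                            w *= v2f[(f, u)][assign[j]]
                    m[assign[i]] += w
                new_f2v[(f, v)] = _normalize(m)
        f2v = new_f2v
    # Belief at a variable: normalized product of all incoming messages.
    beliefs = []
    for v in range(num_vars):
        b = [1.0, 1.0]
        for f, u in edges:
            if u == v:
                b = [b[0] * f2v[(f, u)][0], b[1] * f2v[(f, u)][1]]
        beliefs.append(_normalize(b))
    return beliefs


def exact_marginals(num_vars, factors):
    """Brute-force marginals by enumerating all joint assignments."""
    marg = [[0.0, 0.0] for _ in range(num_vars)]
    for x in itertools.product((0, 1), repeat=num_vars):
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(x[v] for v in scope)]
        for v in range(num_vars):
            marg[v][x[v]] += w
    return [_normalize(m) for m in marg]


if __name__ == "__main__":
    evidence = ((0,), {(0,): 0.2, (1,): 0.8})  # noisy observation of variable 0
    loop = [evidence, ((0, 1), ATTRACT), ((1, 2), ATTRACT), ((2, 0), ATTRACT)]
    print("loopy BP:", loopy_bp(3, loop))
    print("exact:   ", exact_marginals(3, loop))
```

On a graph without cycles the beliefs match the exact marginals; on the three-variable cycle used in the demo they are only approximations, which is the regime the term "loopy" refers to.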

Published In

Journal of the ACM  Volume 67, Issue 4
August 2020
265 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/3403612
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2020
Online AM: 07 May 2020
Accepted: 01 April 2020
Revised: 01 August 2019
Received: 01 July 2018
Published in JACM Volume 67, Issue 4

Author Tags

  1. pattern theory
  2. graphical model
  3. image restoration
  4. object detection
  5. stochastic grammar

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Science Foundation
