Describing Visual Scenes Using Transformed Objects and Parts

Authors: Erik B. Sudderth, Antonio Torralba, William T. Freeman, Alan S. Willsky

International Journal of Computer Vision, Volume 77, Issue 1-3

Pages 291 - 330

https://doi.org/10.1007/s11263-007-0069-5

Published: 01 May 2008

Abstract

We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.
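The abstract's claim that Dirichlet processes "automatically learn the number of parts" can be illustrated with a small sketch of the Chinese restaurant process, a standard sequential view of a Dirichlet process prior. This is not the authors' transformed Dirichlet process and includes no spatial transformations or image features; the function name `crp_assignments` and the parameters `n_items`, `alpha`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from typing import List


def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (an illustrative choice here):
    larger values make new clusters more likely.
    """
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of items already in cluster k
    labels: List[int] = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_items=200, alpha=1.0, seed=1)
    print("clusters used:", len(set(labels)))
```

The number of clusters is never fixed in advance: it grows with the data at a rate governed by `alpha`. That is the sense in which a nonparametric prior leaves the number of parts or objects open.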

Published In

International Journal of Computer Vision, Volume 77, Issue 1-3, May 2008

321 pages

ISSN: 0920-5691

Copyright © 2008 Springer Science+Business Media, LLC.

Publisher: Kluwer Academic Publishers, United States
