Describing Visual Scenes Using Transformed Objects and Parts

Published: 01 May 2008

Abstract

We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple-object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category and the number of objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.
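To illustrate how a Dirichlet process prior lets the number of clusters (here standing in for parts or objects) be inferred rather than fixed in advance, the following is a minimal sketch of sampling from a Chinese restaurant process, a standard representation of the Dirichlet process. It is not the transformed Dirichlet process itself and omits spatial transformations and image features entirely. The function name `sample_crp` and the parameters `num_items`, `alpha`, and `seed` are hypothetical, chosen only for this illustration.

```python
import random


def sample_crp(num_items, alpha, seed=0):
    """Draw one partition of num_items from a Chinese restaurant process.

    alpha is the concentration parameter: larger values tend to create
    more clusters. This is an illustrative sketch, not the paper's model.
    """
    rng = random.Random(seed)
    assignments = []  # cluster index chosen for each item, in order
    counts = []       # number of items currently in each cluster

    for _ in range(num_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)

    return assignments, counts


if __name__ == "__main__":
    assignments, counts = sample_crp(num_items=200, alpha=1.0)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is never passed in: it emerges from the sampling process and tends to grow slowly as more items are added, which is the property the abstract relies on when it says the number of parts and objects is learned automatically.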

Published In

International Journal of Computer Vision, Volume 77, Issue 1-3
May 2008
321 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Context
  2. Dirichlet process
  3. Graphical models
  4. Hierarchical Dirichlet process
  5. Object recognition
  6. Scene analysis
  7. Transformation
