PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding

Zhang, Yinda; Song, Shuran; Tan, Ping; Xiao, Jianxiong

doi:10.1007/978-3-319-10599-4_43

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8694)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The field of view of standard cameras is very small, which is one of the main reasons that contextual information is not as useful as it should be for object detection. To overcome this limitation, we advocate the use of 360° full-view panoramas in scene understanding and propose a whole-room context model in 3D. For an input panorama, our method outputs 3D bounding boxes of the room and of all major objects inside it, together with their semantic categories. Our method generates 3D hypotheses based on contextual constraints and ranks them holistically, combining both bottom-up and top-down context information. To train our model, we construct an annotated panorama dataset and reconstruct the 3D model from a single view using manual annotation. Experiments show that, based solely on 3D context and without any image-region category classifier, we achieve performance comparable to the state-of-the-art object detector. This demonstrates that when the field of view (FOV) is large, context is as powerful as object appearance. All data and source code are available online.


Author information

Authors and Affiliations

Princeton University, USA
Yinda Zhang, Shuran Song & Jianxiong Xiao
Simon Fraser University, Canada
Ping Tan

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Zhang, Y., Song, S., Tan, P., Xiao, J. (2014). PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_43

DOI: https://doi.org/10.1007/978-3-319-10599-4_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science, Computer Science (R0)