research-article

Activity-centric scene synthesis for functional 3D scene modeling

Authors:

Matthew Fisher,

Matthias Nießner

ACM Transactions on Graphics (TOG), Volume 34, Issue 6

Article No.: 179, Pages 1 - 13

https://doi.org/10.1145/2816795.2818057

Published: 02 November 2015

Abstract

We present a novel method to generate 3D scenes that allow the same activities as real environments captured through noisy and incomplete 3D scans. As robust object detection and instance retrieval from low-quality depth data is challenging, our algorithm aims to model semantically-correct rather than geometrically-accurate object arrangements. Our core contribution is a new scene synthesis technique which, conditioned on a coarse geometric scene representation, models functionally similar scenes using prior knowledge learned from a scene database. The key insight underlying our scene synthesis approach is that many real-world environments are structured to facilitate specific human activities, such as sleeping or eating. We represent scene functionalities through virtual agents that associate object arrangements with the activities for which they are typically used. When modeling a scene, we first identify the activities supported by a scanned environment. We then determine semantically-plausible arrangements of virtual objects -- retrieved from a shape database -- constrained by the observed scene geometry. For a given 3D scan, our algorithm produces a variety of synthesized scenes which support the activities of the captured real environments. In a perceptual evaluation study, we demonstrate that our results are judged to be visually appealing and functionally comparable to manually designed scenes.
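The pipeline outlined in the abstract (identify activities, then arrange objects subject to the observed geometry) can be illustrated with a toy sketch. This is not the authors' algorithm: the grid representation, the `ACTIVITY_PRIOR` table, the `Agent` type, and the `synthesize` function are all hypothetical names and simplifications introduced here for illustration. In the sketch, each virtual agent proposes the objects its activity typically involves, and a proposal is kept only if the coarse scan shows geometry at that cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """A virtual agent: an activity label and a position on a 2D grid (hypothetical)."""
    activity: str
    x: int
    y: int


# Hypothetical prior, hand-written here; the paper learns its priors from a scene database.
# Each entry maps an activity to (object category, grid offset from the agent).
ACTIVITY_PRIOR = {
    "eating": [("chair", (0, 0)), ("table", (0, 1))],
    "sleeping": [("bed", (0, 0)), ("nightstand", (1, 0))],
}


def synthesize(agents, occupied, width, height):
    """Propose objects for each agent's activity and keep only proposals
    that fall on cells where the coarse scan observed geometry."""
    placements = []
    for agent in agents:
        for category, (dx, dy) in ACTIVITY_PRIOR[agent.activity]:
            cell = (agent.x + dx, agent.y + dy)
            in_bounds = 0 <= cell[0] < width and 0 <= cell[1] < height
            if in_bounds and cell in occupied:
                placements.append((category, cell))
    return placements


if __name__ == "__main__":
    scan = {(2, 2), (2, 3)}  # cells with observed geometry (assumed input)
    print(synthesize([Agent("eating", 2, 2)], scan, width=5, height=5))
```

A real system would replace the fixed offsets with a learned, probabilistic arrangement model and would retrieve concrete shapes from a database; the sketch only shows how activities and observed geometry can jointly constrain placement.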

Supplementary Material

ZIP File (a179-fisher.zip)

Supplemental files.

Index Terms

Activity-centric scene synthesis for functional 3D scene modeling
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        3D imaging

Information & Contributors

Information

Published In

ACM Transactions on Graphics Volume 34, Issue 6

November 2015

944 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/2816795

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSFC
Max Planck Center for Visual Computing and Communications
