research-article

Activity-centric scene synthesis for functional 3D scene modeling

Published: 02 November 2015

Abstract

We present a novel method to generate 3D scenes that allow the same activities as real environments captured through noisy and incomplete 3D scans. As robust object detection and instance retrieval from low-quality depth data are challenging, our algorithm aims to model semantically correct rather than geometrically accurate object arrangements. Our core contribution is a new scene synthesis technique that, conditioned on a coarse geometric scene representation, models functionally similar scenes using prior knowledge learned from a scene database. The key insight underlying our scene synthesis approach is that many real-world environments are structured to facilitate specific human activities, such as sleeping or eating. We represent scene functionalities through virtual agents that associate object arrangements with the activities for which they are typically used. When modeling a scene, we first identify the activities supported by a scanned environment. We then determine semantically plausible arrangements of virtual objects (retrieved from a shape database) constrained by the observed scene geometry. For a given 3D scan, our algorithm produces a variety of synthesized scenes that support the activities of the captured real environment. In a perceptual evaluation study, we demonstrate that our results are judged to be visually appealing and functionally comparable to manually designed scenes.
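The two-stage idea in the abstract (first identify supported activities, then pick database objects that serve them) can be illustrated with a toy sketch. This is not the authors' implementation: the cue names, the `ACTIVITY_PRIOR` and `ACTIVITY_CUES` tables, the threshold, and the `detect_activities` and `synthesize` helpers are all hypothetical, and the sketch ignores spatial arrangement and the agent-based representation entirely.

```python
import random
from dataclasses import dataclass

# Hypothetical prior: object categories (and counts) each activity typically uses.
# Illustrative values only, not learned from any scene database.
ACTIVITY_PRIOR = {
    "sleeping": {"bed": 1, "nightstand": 1, "lamp": 1},
    "eating": {"table": 1, "chair": 2, "plate": 2},
}

# Hypothetical coarse-geometry cues that hint at each activity.
ACTIVITY_CUES = {
    "sleeping": {"large_low_surface"},
    "eating": {"mid_height_surface", "seat_height_surface"},
}


@dataclass
class Placement:
    category: str   # object category required by the activity
    model_id: str   # identifier of the model chosen from the shape database
    activity: str   # activity this object supports


def detect_activities(scan_cues, threshold=0.5):
    """Return activities whose cues are sufficiently present in the coarse scan."""
    found = []
    for activity, cues in ACTIVITY_CUES.items():
        score = len(cues & scan_cues) / len(cues)
        if score >= threshold:
            found.append(activity)
    return sorted(found)


def synthesize(scan_cues, shape_db, rng):
    """For each detected activity, pick one database model per required object."""
    scene = []
    for activity in detect_activities(scan_cues):
        for category, count in ACTIVITY_PRIOR[activity].items():
            for _ in range(count):
                model_id = rng.choice(shape_db[category])
                scene.append(Placement(category, model_id, activity))
    return scene


# Usage: a toy shape database with three models per category.
categories = ["bed", "nightstand", "lamp", "table", "chair", "plate"]
shape_db = {c: [f"{c}_{i}" for i in range(3)] for c in categories}
scene = synthesize({"large_low_surface"}, shape_db, random.Random(0))
for placement in scene:
    print(placement)
```

Calling `synthesize` with different random seeds yields different model choices for the same detected activities, which loosely mirrors the abstract's point that one scan can produce a variety of functionally similar scenes.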

Supplementary Material

ZIP File (a179-fisher.zip)
Supplemental files.


    Published In

    ACM Transactions on Graphics, Volume 34, Issue 6
    November 2015
    944 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/2816795
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. activities
    2. scene synthesis
    3. scene understanding

    Qualifiers

    • Research-article

    Funding Sources

    • NSFC
    • Max Planck Center for Visual Computing and Communications
