PiGraphs: learning interaction snapshots from observations

Authors:

Angel X. Chang,

Matthew Fisher,

Matthias Nießner

ACM Transactions on Graphics (TOG), Volume 35, Issue 4

Article No.: 139, Pages 1 - 12

https://doi.org/10.1145/2897824.2925867

Published: 11 July 2016

Abstract

We learn a probabilistic model connecting human poses and arrangements of object geometry from real-world observations of interactions collected with commodity RGB-D sensors. This model is encoded as a set of prototypical interaction graphs (PiGraphs), a human-centric representation capturing physical contact and visual attention linkages between 3D geometry and human body parts. We use this encoding of the joint probability distribution over pose and geometry during everyday interactions to generate interaction snapshots, which are static depictions of human poses and relevant objects during human-object interactions. We demonstrate that our model enables a novel human-centric understanding of 3D content and allows for jointly generating 3D scenes and interaction poses given terse high-level specifications, natural language, or reconstructed real-world scene constraints.

Supplementary Material

ZIP File (a139-savva-supp.zip): Supplemental files. 95.68 MB

MP4 File (a139.mp4): 330.26 MB

Index Terms

PiGraphs: learning interaction snapshots from observations
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Spatial and physical reasoning
  2. Computer graphics

Published In

ACM Transactions on Graphics Volume 35, Issue 4

July 2016

1396 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/2897824

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2016

Published in TOG Volume 35, Issue 4

Qualifiers

Research-article

Funding Sources

National Science Foundation

Bibliometrics

Total Citations: 115
Total Downloads: 1,243
Downloads (last 12 months): 224
Downloads (last 6 weeks): 33

Reflects downloads up to 25 Feb 2025

Cited By

Tao D, Ruizhen H, Libin L, Li Y, Hao Z (2024). Research progress in human-like indoor scene interaction. Journal of Image and Graphics 29:6 (1575-1606). Online publication date: 2024.
https://doi.org/10.11834/jig.240004

Jiang H, Song L, Weng D, Sun Z, Li H, Dongye X, Zhang Z, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces. Proceedings of the 32nd ACM International Conference on Multimedia (3666-3675). Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681616

Liu J, Zheng Y, Zhou K, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Virtual Agent Positioning Driven by Personal Characteristics. Proceedings of the 32nd ACM International Conference on Multimedia (7658-7666). Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680634

Li J, Huang T, Zhu Q, Wong T (2024). Physics-based Scene Layout Generation from Human Motion. ACM SIGGRAPH 2024 Conference Papers (1-10). Online publication date: 13-Jul-2024.
https://dl.acm.org/doi/10.1145/3641519.3657517

Zhu W, Ma X, Ro D, Ci H, Zhang J, Shi J, Gao F, Tian Q, Wang Y (2024). Human Motion Generation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:4 (2430-2449). Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1109/TPAMI.2023.3330935

Yu T, Lin X, Wang S, Sheng W, Huang Q, Yu J (2024). A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes. IEEE Transactions on Circuits and Systems for Video Technology 34:3 (1322-1338). Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1109/TCSVT.2023.3296889

Guo Y, Li Y, Ren D, Zhang X, Li J, Pu L, Ma C, Zhan X, Guo J, Wei M, Zhang Y, Yu P, Yang S, Ji D, Ye H, Sun H, Liu Y, Chen Y, Zhu J, Liu H (2024). LiDAR-Net: A Real-Scanned 3D Point Cloud Dataset for Indoor Scenes. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (21989-21999). Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.02076

Li L, Dai A (2024). GenZI: Zero-Shot 3D Human-Scene Interaction Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (20465-20474). Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.01934

Kulkarni N, Rempe D, Genova K, Kundu A, Johnson J, Fouhey D, Guibas L (2024). NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (947-957). Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.00096

Zhao C, Zhang J, Du J, Shan Z, Wang J, Yu J, Wang J, Xu L (2024). I'M HOI: Inertia-Aware Monocular Capture of 3D Human-Object Interactions. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (729-741). Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.00076
