Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

Authors:

Feng Xu

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

Article No.: 52, Pages 1 - 10

https://doi.org/10.1145/3641519.3657505

Published: 13 July 2024

Abstract

Manipulating objects by hand is an important interaction in our daily activities. We faithfully reconstruct this motion from a single RGBD camera using a novel deep reinforcement learning method that leverages physics. First, we propose object compensation control, which establishes direct control over the object to make network training more stable. Second, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving reconstruction accuracy and physical correctness. Experiments indicate that, without relying on any heuristic physical rules, this work successfully incorporates physics into the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC.
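To make the role of a compensation force and torque more concrete, the sketch below sums point-contact forces into a net force and torque about an object's center of mass, then adds an extra compensation term. It is plain rigid-body mechanics and not the paper's implementation: the names `net_wrench`, `comp_force`, and `comp_torque`, the pinch geometry, and all numeric values are hypothetical, and the abstract does not say how the actual compensation terms are computed.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]
ZERO: Vec3 = (0.0, 0.0, 0.0)


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def net_wrench(com: Vec3,
               contacts: Iterable[Tuple[Vec3, Vec3]],
               comp_force: Vec3 = ZERO,
               comp_torque: Vec3 = ZERO) -> Tuple[Vec3, Vec3]:
    """Return (force, torque about com) acting on a rigid object.

    contacts: (contact point, contact force) pairs, i.e. point contacts.
    comp_force / comp_torque: hypothetical extra terms applied directly
    to the object (an assumption made for illustration only).
    """
    force, torque = comp_force, comp_torque
    for point, f in contacts:
        force = add(force, f)
        # A point contact contributes torque only through its lever arm.
        torque = add(torque, cross(sub(point, com), f))
    return force, torque


# Two fingertips pinching an object along the x axis (hypothetical values).
pinch = [((-0.05, 0.0, 0.0), (1.0, 0.0, 0.0)),
         ((0.05, 0.0, 0.0), (-1.0, 0.0, 0.0))]

force, torque = net_wrench(ZERO, pinch)
force_c, torque_c = net_wrench(ZERO, pinch, comp_torque=(0.02, 0.0, 0.0))
print("point contacts only:", force, torque)
print("with compensation:  ", force_c, torque_c)
```

In this toy setup the two point contacts lie on one line through the center of mass, so they cannot generate any torque about that line. A contact patch with finite area could, which is one intuition for why an added torque term lets a point contact model behave more like a surface contact model.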

Supplemental Material

MP4 File: Supplementary video and appendix (200.77 MB). This upload replaces an earlier version that had copyright issues.

MP4 File: Presentation (344.71 MB).

PDF File: Appendix and supplementary video (331.04 KB).

Index Terms

Computing methodologies → Computer graphics → Animation → Motion capture

Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

July 2024

1106 pages

ISBN: 9798400705250

DOI: 10.1145/3641519

Copyright © 2024 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SIGGRAPH '24

Sponsor:

SIGGRAPH

SIGGRAPH '24: Special Interest Group on Computer Graphics and Interactive Techniques Conference

July 27 - August 1, 2024

Denver, CO, USA

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%
