Understanding Human-Object Interaction in RGB-D videos for Human Robot Interaction

Published: 11 June 2018

Abstract

Detecting small hand-held objects plays a critical role in human-robot interaction: hand-held objects often reveal a person's intention, e.g., using a cell phone to make a call or a cup to drink, and thus help a robot understand human behavior and respond accordingly. Existing solutions that rely on wearable sensors to detect hand-held objects often compromise the user experience and thus may not be preferred. With the development of commodity RGB-D sensors such as the Microsoft Kinect II, RGB and depth information have been used to understand human actions and recognize objects. Motivated by this success, we propose to detect hand-held objects using an RGB-D sensor. However, instead of performing object detection alone, we leverage human body pose as context to achieve robust hand-held object detection in RGB-D videos. Our system demonstrates that a person can interact with a humanoid social robot using a hand-held object such as a cell phone or a cup. Experimental evaluations validate the effectiveness of the proposed method.
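
The abstract's key idea is that the skeleton pose from the RGB-D sensor localizes the hands, so object detection can be restricted to the region around them rather than run over the whole frame. The paper itself gives no code, so the sketch below is only a minimal illustration of that idea under stated assumptions: the hand joint coordinates, the depth value, the hand_roi sizing heuristic, and the detect_objects stub are hypothetical stand-ins, not the authors' pipeline or the Kinect SDK API.

import numpy as np

def hand_roi(hand_xy, depth_m, frame_shape, base_px=240):
    # Box centered on the tracked hand joint; its size shrinks with
    # distance so the crop covers roughly the same physical area.
    half = int(base_px / max(depth_m, 0.5)) // 2
    x, y = hand_xy
    h, w = frame_shape[:2]
    return max(x - half, 0), max(y - half, 0), min(x + half, w), min(y + half, h)

def detect_objects(crop):
    # Hypothetical stand-in for a real detector (e.g., a Faster R-CNN or
    # SSD model) run on the hand region only, instead of the full frame.
    return [("cell phone", 0.9)]

# Synthetic stand-ins for one RGB-D color frame and a skeleton hand joint.
rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)
hand_joint_px = (960, 540)  # hand joint projected into color-image pixels
hand_depth_m = 1.8          # depth at the hand joint, in meters

x0, y0, x1, y1 = hand_roi(hand_joint_px, hand_depth_m, rgb.shape)
crop = rgb[y0:y1, x0:x1]
print(detect_objects(crop))  # -> [('cell phone', 0.9)]

Scaling the crop inversely with depth keeps the search window roughly constant in physical size, which is one plausible way pose context can make small-object detection more robust than full-frame detection.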


Published In

CGI 2018: Proceedings of Computer Graphics International 2018
June 2018, 284 pages
ISBN: 9781450364010
DOI: 10.1145/3208159
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Handheld object detection
      2. Human-robot interaction
      3. Microsoft Kinect II
      4. Object tracking

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

Conference

CGI 2018: Computer Graphics International 2018
June 11-14, 2018
Bintan Island, Indonesia

Acceptance Rates

CGI 2018 paper acceptance rate: 35 of 159 submissions, 22%
Overall acceptance rate: 35 of 159 submissions, 22%

Article Metrics

• Downloads (last 12 months): 16
• Downloads (last 6 weeks): 2
Reflects downloads up to 31 Dec 2024

      Cited By

• (2024) Vision Beyond Boundaries: An Initial Design Space of Domain-specific Large Vision Models in Human-robot Interaction. Adjunct Proceedings of the 26th International Conference on Mobile Human-Computer Interaction, 1-8. DOI: 10.1145/3640471.3680244. Online publication date: 21-Sep-2024.
• (2022) Deep learning and RGB-D based human action, human–human and human–object interaction recognition: A survey. Journal of Visual Communication and Image Representation 86, 103531. DOI: 10.1016/j.jvcir.2022.103531. Online publication date: Jul-2022.
• (2021) Human-Object Interaction Detection: 1D Convolutional Neural Network Approach Using Skeleton Data. 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA), 1-5. DOI: 10.1109/NCA53618.2021.9685549. Online publication date: 23-Nov-2021.
• (2021) Survey of Speechless Interaction Techniques in Social Robotics. Intelligent Scene Modeling and Human-Computer Interaction, 241-257. DOI: 10.1007/978-3-030-71002-6_14. Online publication date: 9-Jun-2021.
• (2020) Implementation of grey wolf optimization controller for multiple humanoid navigation. Computer Animation and Virtual Worlds 31(3). DOI: 10.1002/cav.1919. Online publication date: 5-Mar-2020.
