Understanding Human-Object Interaction in RGB-D videos for Human Robot Interaction

Published: 11 June 2018

Abstract

Detecting small hand-held objects plays a critical role in human-robot interaction: hand-held objects often reveal a person's intention, e.g., using a cell phone to make a call or a cup to drink, and thus help a robot understand human behavior and respond accordingly. Existing solutions that rely on wearable sensors to detect hand-held objects often compromise the user experience and thus may not be preferred. With the development of commodity RGB-D sensors such as the Microsoft Kinect II, RGB and depth information have been used to understand human actions and recognize objects. Motivated by this success, we propose to detect hand-held objects using an RGB-D sensor. However, instead of performing object detection alone, we leverage human body pose as context to achieve robust hand-held object detection in RGB-D videos. Our system demonstrates that a person can interact with a humanoid social robot using a hand-held object such as a cell phone or a cup. Experimental evaluations validate the effectiveness of the proposed method.
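
The abstract's key idea is that the skeleton pose from the RGB-D sensor localizes the hands, so object detection can be restricted to the region around them rather than run over the whole frame. The paper itself gives no code, so the sketch below is only a minimal illustration of that idea under stated assumptions: the hand joint coordinates, the depth value, the hand_roi sizing heuristic, and the detect_objects stub are hypothetical stand-ins, not the authors' pipeline or the Kinect SDK API.

import numpy as np

def hand_roi(hand_xy, depth_m, frame_shape, base_px=240):
    # Box centered on the tracked hand joint; its size shrinks with
    # distance so the crop covers roughly the same physical area.
    half = int(base_px / max(depth_m, 0.5)) // 2
    x, y = hand_xy
    h, w = frame_shape[:2]
    return max(x - half, 0), max(y - half, 0), min(x + half, w), min(y + half, h)

def detect_objects(crop):
    # Hypothetical stand-in for a real detector (e.g., a Faster R-CNN or
    # SSD model) run on the hand region only, instead of the full frame.
    return [("cell phone", 0.9)]

# Synthetic stand-ins for one RGB-D color frame and a skeleton hand joint.
rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)
hand_joint_px = (960, 540)  # hand joint projected into color-image pixels
hand_depth_m = 1.8          # depth at the hand joint, in meters

x0, y0, x1, y1 = hand_roi(hand_joint_px, hand_depth_m, rgb.shape)
crop = rgb[y0:y1, x0:x1]
print(detect_objects(crop))  # -> [('cell phone', 0.9)]

Scaling the crop inversely with depth keeps the search window roughly constant in physical size, which is one plausible way pose context can make small-object detection more robust than full-frame detection.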


Published In

CGI 2018: Proceedings of Computer Graphics International 2018
June 2018, 284 pages
ISBN: 9781450364010
DOI: 10.1145/3208159
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Handheld object detection
      2. Human-robot interaction
      3. Microsoft Kinect II
      4. Object tracking

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

Conference

CGI 2018: Computer Graphics International 2018
June 11-14, 2018
Bintan Island, Indonesia

Acceptance Rates

CGI 2018 paper acceptance rate: 35 of 159 submissions, 22%
Overall acceptance rate: 35 of 159 submissions, 22%

Article Metrics

• Downloads (last 12 months): 16
• Downloads (last 6 weeks): 2
Reflects downloads up to 31 Dec 2024

      Cited By

• (2024) Vision Beyond Boundaries: An Initial Design Space of Domain-specific Large Vision Models in Human-robot Interaction. Adjunct Proceedings of the 26th International Conference on Mobile Human-Computer Interaction, 1-8. DOI: 10.1145/3640471.3680244. Online publication date: 21-Sep-2024.
• (2022) Deep learning and RGB-D based human action, human–human and human–object interaction recognition: A survey. Journal of Visual Communication and Image Representation 86, 103531. DOI: 10.1016/j.jvcir.2022.103531. Online publication date: Jul-2022.
• (2021) Human-Object Interaction Detection: 1D Convolutional Neural Network Approach Using Skeleton Data. 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA), 1-5. DOI: 10.1109/NCA53618.2021.9685549. Online publication date: 23-Nov-2021.
• (2021) Survey of Speechless Interaction Techniques in Social Robotics. Intelligent Scene Modeling and Human-Computer Interaction, 241-257. DOI: 10.1007/978-3-030-71002-6_14. Online publication date: 9-Jun-2021.
• (2020) Implementation of grey wolf optimization controller for multiple humanoid navigation. Computer Animation and Virtual Worlds 31(3). DOI: 10.1002/cav.1919. Online publication date: 5-Mar-2020.
