Abstract
In this paper, we consider the detection of a decrease in engagement by users spontaneously interacting with a socially assistive robot in a public space. We first describe the UE-HRI dataset, which collects spontaneous human–robot interactions following the guidelines provided by the affective computing research community for collecting data “in the wild”. We then analyze the users’ behaviors, focusing on proxemics, gaze, head motion, facial expressions and speech during interactions with the robot. Finally, we investigate the use of deep learning techniques (recurrent and deep neural networks) to detect a decrease in user engagement in real time. The results of this work highlight, in particular, the relevance of taking into account the temporal dynamics of a user’s behavior: allowing a buffer delay of 1–2 s before deciding on the user’s engagement state improves detection performance.
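To make the detection setup concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the kind of recurrent model the abstract refers to: a GRU applied to short windows of frame-level multimodal features, producing a probability that engagement is decreasing. The framework (TensorFlow/Keras), the window length, the feature dimension and all hyperparameters are assumptions chosen for illustration only.

# Minimal illustrative sketch (not the authors' implementation): a GRU-based
# binary classifier over short windows of frame-level multimodal features
# (e.g. proxemics, gaze, head pose, facial action units, speech descriptors).
# Framework, window length, feature dimension and hyperparameters are
# assumptions for illustration only.
import numpy as np
from tensorflow.keras import layers, models

N_FRAMES = 20    # hypothetical window length, e.g. 2 s of features at 10 Hz
N_FEATURES = 32  # hypothetical multimodal feature dimension per frame

model = models.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(N_FRAMES, N_FEATURES)),  # skip padded frames
    layers.GRU(64),                          # summarizes the temporal dynamics of the window
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability that engagement is decreasing
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])

# Dummy training data: windows labeled "engagement decrease" (1) vs "engaged" (0).
X = np.random.rand(8, N_FRAMES, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(8, 1))
model.fit(X, y, epochs=1, batch_size=4, verbose=0)

# At run time, allowing a 1-2 s buffer delay simply means accumulating a few
# more frames into the window before thresholding model.predict(...).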
Notes
Beamforming would be a better alternative that will be considered in future work.
Acknowledgements
This work was supported by the European H2020 project ANIMATAS (ITN 7659552) and by a grant overseen by the French National Research Agency (ANR-17-MAOI). The authors would like to thank Nicolas Rollet and Christian Licoppe for useful discussions on pre-closing, and Rodolphe Gelin, Angelica Lim, Marine Chanoux and Myriam Bilac from SoftBank Robotics for their help in recording the UE-HRI dataset.
Funding
This work was supported by SoftBank Robotics.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Survey of satisfaction presented as the final phase of the scenario. The participant was asked to indicate:
1. his satisfaction with the interaction,
2. his involvement in the interaction,
3. his desire to leave the interaction,
4. his desire to continue the interaction during the welcome phase,
5. his desire to continue the interaction during the dialog phase,
6. his desire to continue the interaction during the cucumber phase,
7. his desire to continue the interaction during the survey phase,
8. his desire to stay during the interaction,
9. whether he believes that the robot wanted to stay during the interaction,
10. his desire to continue the conversation,
11. whether he believes that the robot wanted to continue the conversation,
12. his feeling about his involvement in the interaction,
13. whether he found the interaction boring or fun,
14. whether he found the information interesting,
15. whether he liked the interaction.
Appendix B
See Fig. 9.
About this article
Cite this article
Ben-Youssef, A., Varni, G., Essid, S. et al. On-the-Fly Detection of User Engagement Decrease in Spontaneous Human–Robot Interaction Using Recurrent and Deep Neural Networks. Int J of Soc Robotics 11, 815–828 (2019). https://doi.org/10.1007/s12369-019-00591-2