Research Article | Open Access

Modeling of Human Visual Attention in Multiparty Open-World Dialogues

Published: 03 June 2019

    Abstract

    This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data, and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
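    As a rough illustration of the modeling setup described in the abstract, the sketch below maps a short window of interlocutor signals (speech activity, eye-gaze direction, head orientation) to a distribution over candidate gaze targets with an LSTM. It is a minimal, hypothetical example assuming a Keras/TensorFlow stack; the window length, feature dimensionality, network size, and set of candidate targets are illustrative assumptions, not the authors' configuration.

# Minimal, hypothetical sketch: an LSTM scoring candidate gaze targets from
# a window of interlocutor signals. All dimensions below are assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQ_LEN = 30      # assumed window of frames carrying interlocutor signals
N_FEATURES = 14   # assumed per-frame features (speech, gaze, head pose)
N_TARGETS = 5     # assumed candidate gaze targets (interlocutors, objects)

model = Sequential([
    LSTM(64, input_shape=(SEQ_LEN, N_FEATURES)),  # temporal encoding of signals
    Dense(N_TARGETS, activation="softmax"),       # one score per candidate target
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data, only to show the expected tensor shapes.
X = np.random.rand(8, SEQ_LEN, N_FEATURES).astype("float32")
y = np.eye(N_TARGETS)[np.random.randint(0, N_TARGETS, size=8)]
model.fit(X, y, epochs=1, verbose=0)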




    Published In

    ACM Transactions on Human-Robot Interaction, Volume 8, Issue 2
    June 2019
    136 pages
    EISSN: 2573-9522
    DOI: 10.1145/3339062
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 June 2019
    Accepted: 01 March 2019
    Revised: 01 January 2019
    Received: 01 April 2018
    Published in THRI Volume 8, Issue 2


    Author Tags

    1. Human-human interaction
    2. eye-gaze direction
    3. head orientation
    4. multiparty
    5. open-world dialogue

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • CHIST-ERA project IGLU and KTH SRA ICT The Next Generation

    Article Metrics

    • Downloads (Last 12 months): 96
    • Downloads (Last 6 weeks): 20
    Reflects downloads up to 11 Aug 2024

    Cited By

    • (2023) Data-Driven Generation of Eyes and Head Movements of a Social Robot in Multiparty Conversation. Social Robotics, 10.1007/978-981-99-8715-3_17, 191-203. Online publication date: 3-Dec-2023.
    • (2022) Knowing Where to Look: A Planning-based Architecture to Automate the Gaze Behavior of Social Robots. 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 10.1109/RO-MAN53752.2022.9900740, 1201-1208. Online publication date: 29-Aug-2022.
    • (2022) AV-GAZE: A Study on the Effectiveness of Audio Guided Visual Attention Estimation for Non-profilic Faces. 2022 IEEE International Conference on Image Processing (ICIP), 10.1109/ICIP46576.2022.9897360, 2921-2925. Online publication date: 16-Oct-2022.
    • (2021) Client-Server Approach for Managing Visual Attention, Integrated in a Cognitive Architecture for a Social Robot. Frontiers in Neurorobotics, 15, 10.3389/fnbot.2021.630386. Online publication date: 9-Sep-2021.
    • (2021) Speech Driven Gaze in a Face-to-Face Interaction. Frontiers in Neurorobotics, 15, 10.3389/fnbot.2021.598895. Online publication date: 4-Mar-2021.
    • (2021) Group-Level Focus of Visual Attention for Improved Next Speaker Prediction. Proceedings of the 29th ACM International Conference on Multimedia, 10.1145/3474085.3479213, 4838-4842. Online publication date: 17-Oct-2021.
    • (2021) Group-Level Focus of Visual Attention for Improved Active Speaker Detection. Companion Publication of the 2021 International Conference on Multimodal Interaction, 10.1145/3461615.3485430, 37-42. Online publication date: 18-Oct-2021.
    • (2020) Generation of Head Movements of a Robot Using Multimodal Features of Peer Participants in Group Discussion Conversation. Multimodal Technologies and Interaction, 4(2), 15, 10.3390/mti4020015. Online publication date: 29-Apr-2020.
    • (2020) Now, Over Here. Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 10.1145/3371382.3378363, 468-470. Online publication date: 23-Mar-2020.
    • (2020) References. Computational Models for Cognitive Vision, 10.1002/9781119527886.refs, 187-213. Online publication date: 6-Jul-2020.
