Research Article | Open Access

Modeling of Human Visual Attention in Multiparty Open-World Dialogues

Published: 03 June 2019

    Abstract

    This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data, and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
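    As a rough illustration of the modeling setup described in the abstract, the sketch below maps a short window of interlocutor signals (speech activity, eye-gaze direction, head orientation) to a distribution over candidate gaze targets with an LSTM. It is a minimal, hypothetical example assuming a Keras/TensorFlow stack; the window length, feature dimensionality, network size, and set of candidate targets are illustrative assumptions, not the authors' configuration.

# Minimal, hypothetical sketch: an LSTM scoring candidate gaze targets from
# a window of interlocutor signals. All dimensions below are assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQ_LEN = 30      # assumed window of frames carrying interlocutor signals
N_FEATURES = 14   # assumed per-frame features (speech, gaze, head pose)
N_TARGETS = 5     # assumed candidate gaze targets (interlocutors, objects)

model = Sequential([
    LSTM(64, input_shape=(SEQ_LEN, N_FEATURES)),  # temporal encoding of signals
    Dense(N_TARGETS, activation="softmax"),       # one score per candidate target
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data, only to show the expected tensor shapes.
X = np.random.rand(8, SEQ_LEN, N_FEATURES).astype("float32")
y = np.eye(N_TARGETS)[np.random.randint(0, N_TARGETS, size=8)]
model.fit(X, y, epochs=1, verbose=0)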




    Published In

    ACM Transactions on Human-Robot Interaction, Volume 8, Issue 2
    June 2019
    136 pages
    EISSN: 2573-9522
    DOI: 10.1145/3339062
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 June 2019
    Accepted: 01 March 2019
    Revised: 01 January 2019
    Received: 01 April 2018
    Published in THRI Volume 8, Issue 2


    Author Tags

    1. Human-human interaction
    2. eye-gaze direction
    3. head orientation
    4. multiparty
    5. open-world dialogue

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • CHIST-ERA project IGLU and KTH SRA ICT The Next Generation

    Article Metrics

    • Downloads (Last 12 months): 96
    • Downloads (Last 6 weeks): 20
    Reflects downloads up to 11 Aug 2024

    Cited By

    • (2023) Data-Driven Generation of Eyes and Head Movements of a Social Robot in Multiparty Conversation. Social Robotics, 10.1007/978-981-99-8715-3_17, 191-203. Online publication date: 3-Dec-2023.
    • (2022) Knowing Where to Look: A Planning-based Architecture to Automate the Gaze Behavior of Social Robots. 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 10.1109/RO-MAN53752.2022.9900740, 1201-1208. Online publication date: 29-Aug-2022.
    • (2022) AV-GAZE: A Study on the Effectiveness of Audio Guided Visual Attention Estimation for Non-profilic Faces. 2022 IEEE International Conference on Image Processing (ICIP), 10.1109/ICIP46576.2022.9897360, 2921-2925. Online publication date: 16-Oct-2022.
    • (2021) Client-Server Approach for Managing Visual Attention, Integrated in a Cognitive Architecture for a Social Robot. Frontiers in Neurorobotics, 15, 10.3389/fnbot.2021.630386. Online publication date: 9-Sep-2021.
    • (2021) Speech Driven Gaze in a Face-to-Face Interaction. Frontiers in Neurorobotics, 15, 10.3389/fnbot.2021.598895. Online publication date: 4-Mar-2021.
    • (2021) Group-Level Focus of Visual Attention for Improved Next Speaker Prediction. Proceedings of the 29th ACM International Conference on Multimedia, 10.1145/3474085.3479213, 4838-4842. Online publication date: 17-Oct-2021.
    • (2021) Group-Level Focus of Visual Attention for Improved Active Speaker Detection. Companion Publication of the 2021 International Conference on Multimodal Interaction, 10.1145/3461615.3485430, 37-42. Online publication date: 18-Oct-2021.
    • (2020) Generation of Head Movements of a Robot Using Multimodal Features of Peer Participants in Group Discussion Conversation. Multimodal Technologies and Interaction, 4(2), 15, 10.3390/mti4020015. Online publication date: 29-Apr-2020.
    • (2020) Now, Over Here. Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 10.1145/3371382.3378363, 468-470. Online publication date: 23-Mar-2020.
    • (2020) References. Computational Models for Cognitive Vision, 10.1002/9781119527886.refs, 187-213. Online publication date: 6-Jul-2020.
