DOI: 10.1145/3351529.3360661

Fusing Dialogue and Gaze From Discussions of 2D and 3D Scenes

Published: 14 October 2019

Abstract

Conversation partners rely on inference from each other's gaze and utterances to negotiate shared meaning. In contrast, dialogue systems still operate mostly through unimodal question-and-answer or command-and-response interactions. To realize systems that can intuitively discuss and collaborate with humans, we should consider additional sensory information. We begin to address this limitation with an innovative study that acquires, analyzes, and fuses interlocutors' discussion and gaze. Introducing a discussion-based elicitation task, we collect gaze with remote and wearable eye trackers alongside dialogue as interlocutors come to consensus on questions about an on-screen 2D image and a real-world 3D scene. We analyze the resulting visual-linguistic patterns and map both modalities onto the visual environment by extending a multimodal image-region annotation framework that uses statistical machine translation for multimodal fusion, applying three ways of fusing speakers' gaze and discussion.
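
To make the fusion step concrete, below is a minimal sketch of the kind of statistical machine translation alignment the abstract describes: each consensus discussion is treated as a parallel "sentence pair" of spoken words and fixated gaze-region labels, and an IBM Model 1-style EM loop estimates how strongly each word translates to each region. The region labels, the toy data, and the choice of IBM Model 1 as the aligner are illustrative assumptions, not the authors' implementation.

    from collections import defaultdict

    def ibm_model1(pairs, iterations=10):
        """EM for IBM Model 1. pairs: list of (words, regions) token lists.
        Returns translation table t[(word, region)] = P(word | region)."""
        regions = {r for _, rs in pairs for r in rs}
        uniform = 1.0 / len(regions)
        t = defaultdict(lambda: uniform)  # start from a uniform table
        for _ in range(iterations):
            count = defaultdict(float)  # expected word-region co-occurrences
            total = defaultdict(float)  # expected region occurrences
            # E-step: distribute each word's probability mass over the
            # regions fixated during the same discussion.
            for words, regs in pairs:
                for w in words:
                    norm = sum(t[(w, r)] for r in regs)
                    for r in regs:
                        frac = t[(w, r)] / norm
                        count[(w, r)] += frac
                        total[r] += frac
            # M-step: re-estimate the translation probabilities.
            for (w, r), c in count.items():
                t[(w, r)] = c / total[r]
        return t

    # Toy usage with hypothetical region labels: "dog" should align more
    # strongly with the animal region than with the background.
    pairs = [
        (["the", "dog", "is", "brown"], ["animal_region", "background"]),
        (["look", "at", "the", "dog"], ["animal_region"]),
    ]
    t = ibm_model1(pairs)
    print(t[("dog", "animal_region")], t[("dog", "background")])

On this toy data, P("dog" | animal_region) rises above P("dog" | background) after a few EM iterations; this pulling of words toward the regions fixated while they were spoken is the behavior a gaze-dialogue fusion step relies on.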

Cited By

  • Computational framework for fusing eye movements and spoken narratives for image annotation. Journal of Vision 20(7):13 (2020). DOI: 10.1167/jov.20.7.13. Online publication date: 17 July 2020.

Published In

ICMI '19: Adjunct of the 2019 International Conference on Multimodal Interaction
October 2019
86 pages
ISBN:9781450369374
DOI:10.1145/3351529
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 October 2019

Author Tags

  1. 2D and 3D scenes
  2. dialogue
  3. eye movements
  4. gaze
  5. multimodal fusion
  6. spoken discussion

Qualifiers

  • Abstract
  • Research
  • Refereed limited

Conference

ICMI '19

Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions (42%)
