DOI: https://doi.org/10.1145/3686215.3688382

Detecting when Users Disagree with Generated Captions

Published: 04 November 2024

Abstract

The pervasive integration of artificial intelligence (AI) into daily life has led to a growing interest in AI agents that can learn continuously. Interactive Machine Learning (IML) has emerged as a promising approach to meet this need, essentially involving human experts in the model training process, often through iterative user feedback. However, repeated feedback requests can lead to frustration and reduced trust in the system. Hence, there is increasing interest in refining how these systems interact with users to ensure efficiency without compromising user experience. Our research investigates the potential of eye tracking data as an implicit feedback mechanism to detect user disagreement with AI-generated captions in image captioning systems. We conducted a study with 30 participants using a simulated captioning interface and gathered their eye movement data as they assessed caption accuracy. The goal of the study was to determine whether eye tracking data can predict user agreement or disagreement effectively, thereby strengthening IML frameworks. Our findings reveal that, while eye tracking shows promise as a valuable feedback source, ensuring consistent and reliable model performance across diverse users remains a challenge.
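The abstract describes classifying agreement versus disagreement from eye movements recorded while participants judged captions, but does not spell out the feature set or model here. The following is a minimal, hypothetical sketch of such a pipeline, assuming simple per-trial fixation features (count, duration, dispersion), a scikit-learn random forest, and a leave-one-participant-out evaluation, which is one way to expose the cross-user variability the abstract reports. All names, features, and data below are illustrative placeholders, not the authors' implementation.

```python
# Sketch (not the authors' method): predict caption agreement/disagreement
# from per-trial gaze features, evaluated with leave-one-participant-out
# cross-validation to probe generalisation across users.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score


def trial_features(fixations):
    """Aggregate a list of (duration_ms, x, y) fixations into one feature vector."""
    durations = np.array([f[0] for f in fixations], dtype=float)
    xs = np.array([f[1] for f in fixations], dtype=float)
    ys = np.array([f[2] for f in fixations], dtype=float)
    return np.array([
        len(durations),        # fixation count
        durations.mean(),      # mean fixation duration
        durations.sum(),       # total dwell time
        xs.std() + ys.std(),   # rough spatial dispersion of fixations
    ])


# Toy data standing in for the 30-participant study: one feature vector per
# trial, a binary agree/disagree label, and a participant id for grouping.
rng = np.random.default_rng(0)
X = np.vstack([
    trial_features([(rng.uniform(100, 400), rng.uniform(0, 1920), rng.uniform(0, 1080))
                    for _ in range(rng.integers(5, 20))])
    for _ in range(300)
])
y = rng.integers(0, 2, size=300)        # 0 = agree, 1 = disagree
groups = rng.integers(0, 30, size=300)  # participant id

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
print(f"Per-participant accuracy: mean={scores.mean():.2f}, std={scores.std():.2f}")
```

A high standard deviation across the held-out participants in such a setup would mirror the paper's observation that consistent performance across diverse users is difficult to achieve.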

Information

Published In

ICMI Companion '24: Companion Proceedings of the 26th International Conference on Multimodal Interaction
November 2024
252 pages
ISBN: 9798400704635
DOI: 10.1145/3686215

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. disagreement detection
  2. emotion detection
  3. eye tracking
  4. gaze
  5. interactive machine learning
  6. user disagreement

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '24: International Conference on Multimodal Interaction
November 4 - 8, 2024
San Jose, Costa Rica

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
