
CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild

Published: 25 September 2023

Abstract

Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans’ visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequently employed, the difficulty of customizing these devices places unnecessary constraints on the exploration of more efficient, end-to-end models of eye dynamics. In this work, we propose CLERA, a unified model for Cognitive Load and Eye Region Analysis, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework. Our method demonstrates significant efficiency and outperforms prior work on tasks including cognitive load estimation, eye landmark detection, and blink estimation. We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation, which aims at supporting future HCI research on human factors and eye-related analysis.
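The paper itself is not reproduced here, but the joint-learning idea the abstract describes can be illustrated with a minimal sketch: a shared convolutional backbone whose features feed both an eye-landmark heatmap head and a cognitive-load classification head, trained with a single combined loss. Everything below (layer sizes, six keypoints, three load classes, the 1.0 : 0.5 loss weighting) is an illustrative assumption, not the CLERA implementation.

    # Minimal joint-learning sketch (illustrative only; not the CLERA architecture).
    import torch
    import torch.nn as nn

    class JointEyeModel(nn.Module):
        def __init__(self, num_keypoints: int = 6, num_load_classes: int = 3):
            super().__init__()
            # Shared feature extractor over a face/eye-region crop.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            )
            # Keypoint head: one heatmap channel per landmark (e.g., pupil, eyelid points).
            self.keypoint_head = nn.Conv2d(128, num_keypoints, 1)
            # Cognitive-load head: global pooling followed by a linear classifier.
            self.load_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_load_classes)
            )

        def forward(self, x):
            feats = self.backbone(x)          # shared computation for both tasks
            return self.keypoint_head(feats), self.load_head(feats)

    model = JointEyeModel()
    images = torch.randn(4, 3, 128, 128)      # a batch of face crops
    heatmaps, load_logits = model(images)
    # Joint objective: heatmap regression plus load classification,
    # combined with an assumed 1.0 : 0.5 weighting.
    target_maps = torch.rand_like(heatmaps)
    target_load = torch.randint(0, 3, (4,))
    loss = nn.functional.mse_loss(heatmaps, target_maps) \
         + 0.5 * nn.functional.cross_entropy(load_logits, target_load)
    loss.backward()

In a setup like this, a single forward pass yields both outputs, so adding cognitive-load estimation on top of landmark detection costs one small head rather than a second network; that shared computation is the efficiency argument for a unified model.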


Cited By

  • (2024) A training and assessment system for human-computer interaction combining fNIRS and eye-tracking data. Advanced Engineering Informatics 62, 102765. https://doi.org/10.1016/j.aei.2024.102765. Online publication date: October 2024.


Information & Contributors

Information

Published In

ACM Transactions on Computer-Human Interaction, Volume 30, Issue 6 (December 2023), 424 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3623488

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2023
Online AM: 07 June 2023
Accepted: 02 May 2023
Revised: 24 April 2023
Received: 07 March 2022
Published in TOCHI Volume 30, Issue 6


Author Tags

  1. Human-centered computing
  2. cognitive load estimation
  3. pupil detection
  4. driver monitoring systems
  5. computer vision
  6. machine learning

Qualifiers

  • Research-article

Funding Sources

  • Veoneer



