Article
DOI: 10.1145/1027933.1027958

Bimodal HCI-related affect recognition

Published: 13 October 2004
Abstract

    Perhaps the most fundamental application of affective computing is Human-Computer Interaction (HCI), in which the computer should be able to detect and track the user's affective states and respond with appropriate feedback. The human multi-sensory affect system sets the expectation for a multimodal affect analyzer. In this paper, we present our efforts toward audio-visual HCI-related affect recognition. With HCI applications in mind, we consider several special affective states that indicate users' cognitive/motivational states. Because a facial expression is influenced by both the affective state and the speech content, we apply a smoothing method to extract affective-state information from facial features. In the fusion stage, a voting method combines the audio and visual modalities, greatly improving the final recognition accuracy. We test our bimodal affect recognition approach on 38 subjects with 11 HCI-related affect states. Extensive experimental results show that the average person-dependent recognition accuracy of our bimodal fusion is nearly 90%.
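    The abstract gives no implementation details for the smoothing and voting steps it names. As a rough, hypothetical sketch only (the function names, the moving-average smoother, and the majority-vote rule below are all assumptions, not the authors' actual method), facial-feature smoothing followed by decision-level voting fusion might look like this in Python:

```python
# Hypothetical sketch of the two ideas named in the abstract; the specific
# smoother and voting rule are assumptions, not the paper's actual method.
from collections import Counter

def smooth(features, window=5):
    """Moving-average smoothing of a 1-D facial-feature trajectory,
    meant to suppress fast, speech-driven facial motion while keeping
    the slower, affect-related component."""
    half = window // 2
    out = []
    for i in range(len(features)):
        lo, hi = max(0, i - half), min(len(features), i + half + 1)
        out.append(sum(features[lo:hi]) / (hi - lo))
    return out

def vote_fusion(audio_labels, visual_labels):
    """Fuse per-frame affect labels from the audio and visual classifiers
    by a simple majority vote over the segment (ties broken arbitrarily)."""
    counts = Counter(audio_labels) + Counter(visual_labels)
    return max(counts, key=counts.get)

# Example: per-frame labels from the two unimodal classifiers for one segment.
audio = ["frustration", "frustration", "interest"]
visual = ["frustration", "interest", "frustration"]
print(smooth([0.1, 0.4, 0.2, 0.5, 0.3], window=3))  # smoothed trajectory
print(vote_fusion(audio, visual))                    # -> "frustration"
```

    A vote of this kind lets the modality that is more consistent across a segment dominate the fused decision, which is one way fusion can beat either single modality; the paper's actual scheme may weight the two streams differently.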




    Published In

    ICMI '04: Proceedings of the 6th International Conference on Multimodal Interfaces
    October 2004
    368 pages
    ISBN: 1581139950
    DOI: 10.1145/1027933


    Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. affect recognition
    2. affective computing
    3. emotion recognition
    4. multimodal human-computer interaction


    Acceptance Rates

    Overall acceptance rate: 453 of 1,080 submissions (42%)


    Cited By

    • Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey. IEEE Access, vol. 12, 2024, pp. 59399-59430. DOI: 10.1109/ACCESS.2024.3391817
    • Applying Segment-Level Attention on Bi-Modal Transformer Encoder for Audio-Visual Emotion Recognition. IEEE Transactions on Affective Computing, vol. 14, no. 4, Oct. 2023, pp. 3231-3243. DOI: 10.1109/TAFFC.2023.3258900
    • Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning. Applied Sciences, vol. 10, no. 11, Jun. 2020, article 4002. DOI: 10.3390/app10114002
    • Customer Satisfaction Recognition Based on Facial Expression and Machine Learning Techniques. Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 4, Aug. 2020, p. 594. DOI: 10.25046/aj050470
    • Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion. IEEE Transactions on Cybernetics, vol. 49, no. 9, Sep. 2019, pp. 3293-3306. DOI: 10.1109/TCYB.2018.2840090
    • Discriminating real and posed smiles. Proceedings of the 29th Australian Conference on Computer-Human Interaction, Nov. 2017, pp. 581-586. DOI: 10.1145/3152771.3156179
    • Are you really angry? Proceedings of the 29th Australian Conference on Computer-Human Interaction, Nov. 2017, pp. 412-416. DOI: 10.1145/3152771.3156147
    • Automatic Facial Feature Extraction for Predicting Designers' Comfort With Engineering Equipment During Prototype Creation. Journal of Mechanical Design, vol. 139, no. 2, Jan. 2017, article 021102. DOI: 10.1115/1.4035428
    • Facial Expression Recognition in the Presence of Speech Using Blind Lexical Compensation. IEEE Transactions on Affective Computing, vol. 7, no. 4, Oct. 2016, pp. 346-359. DOI: 10.1109/TAFFC.2015.2490070
    • A study about the automatic recognition of the anxiety emotional state using Emo-DB. 2015 E-Health and Bioengineering Conference (EHB), Nov. 2015, pp. 1-4. DOI: 10.1109/EHB.2015.7391506
