Abstract
Affective computing is one of the most important research directions in human-computer interaction and is gaining increasing popularity. However, traditional affective computing methods make decisions based on unimodal signals, which yields low accuracy and poor feasibility. In this article, the final classification decision is made from multimodal fusion results, combining the decision outputs of a text emotion network and a visual emotion network through a weighted linear fusion algorithm. A speaker's intention can be better understood by observing the speaker's expression, listening to the tone of voice, and analyzing the words; combining auditory, visual, semantic, and other modalities clearly provides more information than any single mode. Video data often carries several modal characteristics, and one mode alone is rarely enough to describe all aspects of the overall video stream.
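The weighted linear fusion named in the abstract can be sketched as follows: each unimodal network emits a per-class probability vector, and the fused score is a weighted sum of those vectors. This is a minimal illustration, not the authors' implementation; the class layout, weight values, and function name are assumptions for the example.

```python
import numpy as np

def weighted_linear_fusion(p_text, p_visual, w_text=0.5, w_visual=0.5):
    """Fuse per-class probabilities from a text network and a visual
    network by a weighted linear sum, then pick the arg-max class.
    Weights are assumed to sum to 1 so the fused vector stays a
    probability distribution."""
    p_text = np.asarray(p_text, dtype=float)
    p_visual = np.asarray(p_visual, dtype=float)
    fused = w_text * p_text + w_visual * p_visual
    return fused, int(np.argmax(fused))

# Hypothetical 3-class example (e.g. negative, neutral, positive):
p_text = [0.2, 0.5, 0.3]    # softmax output of the text emotion network
p_visual = [0.1, 0.3, 0.6]  # softmax output of the visual emotion network
fused, label = weighted_linear_fusion(p_text, p_visual, 0.5, 0.5)
# fused = [0.15, 0.40, 0.45], so the fused decision is class 2,
# even though the text network alone would have chosen class 1.
```

The example shows the key property of decision-level fusion: the final label can differ from either unimodal decision when the modalities disagree, with the weights controlling how much each modality is trusted.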
© 2021 Springer Nature Switzerland AG
Cite this paper
Jin, K., Wang, Y., Wu, C. (2021). Multimodal Affective Computing Based on Weighted Linear Fusion. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1252. Springer, Cham. https://doi.org/10.1007/978-3-030-55190-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55189-6
Online ISBN: 978-3-030-55190-2
eBook Packages: Intelligent Technologies and Robotics