DOI: 10.1145/3274783.3275184 (ACM SenSys conference proceedings; short paper)

Speech Emotion Recognition via Attention-based DNN from Multi-Task Learning

Published: 04 November 2018

Abstract

Speech unlocks huge potential for emotion recognition. Highly accurate, real-time understanding of human emotion from speech can greatly assist human-computer interaction. Previous work is often limited either to coarse-grained emotion learning tasks or by low precision in emotion recognition. To address these problems, we construct a real-world, large-scale corpus covering four common emotions (anger, happiness, neutral, and sadness). We also propose a multi-task attention-based DNN model (MT-A-DNN) for emotion learning. MT-A-DNN efficiently learns the high-order dependencies and non-linear correlations underlying the audio data. Extensive experiments show that MT-A-DNN outperforms conventional methods on emotion recognition, taking a step toward real-time acoustic emotion recognition on many smart audio devices.
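The abstract names two ingredients, attention over the audio representation and multi-task learning over a shared model, but this excerpt does not give MT-A-DNN's actual architecture. As a rough illustration only, the forward pass of such a model can be sketched as attention pooling over frame-level acoustic features followed by task-specific output heads on a shared encoder. All shapes, the single dense-layer encoder, and the auxiliary task below are hypothetical stand-ins, not the authors' design.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: T frames of D-dim acoustic features (e.g. MFCCs),
# H-dim hidden representation.
T, D, H = 120, 40, 64
frames = rng.standard_normal((T, D))

# Shared encoder: one dense layer standing in for the DNN trunk.
W_enc = rng.standard_normal((D, H)) * 0.1
hidden = np.tanh(frames @ W_enc)          # (T, H) frame embeddings

# Attention pooling: score each frame, normalize, weighted sum.
w_att = rng.standard_normal(H) * 0.1
alpha = softmax(hidden @ w_att)           # (T,) attention weights, sum to 1
utterance = alpha @ hidden                # (H,) utterance-level embedding

# Multi-task heads sharing the encoder: the 4-way emotion task from the
# abstract, plus a hypothetical auxiliary task (e.g. speaker gender).
W_emo = rng.standard_normal((H, 4)) * 0.1
W_aux = rng.standard_normal((H, 2)) * 0.1
p_emotion = softmax(utterance @ W_emo)    # (4,) probabilities over emotions
p_aux = softmax(utterance @ W_aux)        # (2,) auxiliary-task probabilities

EMOTIONS = ["anger", "happiness", "neutral", "sadness"]
print(EMOTIONS[int(p_emotion.argmax())])
```

In multi-task training, both heads would be optimized jointly so the shared encoder learns representations useful for all tasks; here the weights are random, so only the data flow is shown.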



      Published In

      SenSys '18: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems
      November 2018
      449 pages
      ISBN:9781450359528
      DOI:10.1145/3274783
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 November 2018


      Author Tags

      1. Multi-Task Learning
      2. Speech Emotion Recognition

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate 174 of 867 submissions, 20%

      Article Metrics

      • Downloads (Last 12 months)27
      • Downloads (Last 6 weeks)6
      Reflects downloads up to 10 Nov 2024


      Cited By

• (2024) Detecting Moral Emotions with Facial and Vocal Expressions: A Multimodal Emotion Recognition Approach. 2024 IEEE 4th International Conference on Human-Machine Systems (ICHMS), 10.1109/ICHMS59971.2024.10555674, pp. 1-5. Online publication date: 15-May-2024.
• (2023) Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition. IEEE Transactions on Affective Computing, 10.1109/TAFFC.2022.3221749, 14(4), pp. 3164-3176. Online publication date: 1-Oct-2023.
• (2023) An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization. IEEE Transactions on Affective Computing, 10.1109/TAFFC.2022.3169091, 14(3), pp. 2361-2374. Online publication date: 1-Jul-2023.
• (2023) Focusing on Needs: A Chatbot-Based Emotion Regulation Tool for Adolescents. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 10.1109/SMC53992.2023.10394600, pp. 2295-2300. Online publication date: 1-Oct-2023.
• (2022) Data Augmentation for Audio-Visual Emotion Recognition with an Efficient Multimodal Conditional GAN. Applied Sciences, 10.3390/app12010527, 12(1), 527. Online publication date: 5-Jan-2022.
• (2022) Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition. IEEE Transactions on Affective Computing, 10.1109/TAFFC.2020.2983669, 13(2), pp. 992-1004. Online publication date: 1-Apr-2022.
• (2022) Refined Feature Vectors for Human Emotion Classifier by Combining Multiple Learning Strategies with Recurrent Neural Networks. 2022 International Conference on Breakthrough in Heuristics And Reciprocation of Advanced Technologies (BHARAT), 10.1109/BHARAT53139.2022.00042, pp. 160-165. Online publication date: Apr-2022.
• (2021) A Deep Learning-Based Approach to Constructing a Domain Sentiment Lexicon: A Case Study in Financial Distress Prediction. Information Processing & Management, 10.1016/j.ipm.2021.102673, 58(5), 102673. Online publication date: Sep-2021.
• (2020) Learning Better Representations for Audio-Visual Emotion Recognition with Common Information. Applied Sciences, 10.3390/app10207239, 10(20), 7239. Online publication date: 16-Oct-2020.
• (2020) Emotional Speaker Recognition based on Machine and Deep Learning. 2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), 10.1109/IMITEC50163.2020.9334138, pp. 1-8. Online publication date: 25-Nov-2020.
