research-article

A Deep Learning-based Stress Detection Algorithm with Speech Signal

Authors:

Kyunggeun Byun,

Hong-Goo Kang

AVSU'18: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia

Pages 11 - 15

https://doi.org/10.1145/3264869.3264875

Published: 26 October 2018

Abstract

In this paper, we propose a deep learning-based algorithm that detects psychological stress from speech signals. With increasing demand for communication between humans and intelligent systems, automatic stress detection is becoming an interesting research topic. Stress can be reliably detected by measuring the level of specific hormones (e.g., cortisol), but this is not a convenient method for detecting stress in human-machine interactions. The proposed algorithm first extracts mel-filterbank coefficients from pre-processed speech data and then makes a binary decision (i.e., stressed or unstressed) using long short-term memory (LSTM) and feed-forward networks. To evaluate the performance of the proposed algorithm, speech, video, and bio-signal data were collected in a well-controlled environment. In the decision process, we used only the speech signals of subjects whose salivary cortisol level varied by more than 10%. Using the proposed algorithm, we achieved 66.4% accuracy in detecting the stress state of 25 subjects, thereby demonstrating the possibility of utilizing speech signals for automatic stress detection.
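The pipeline described in the abstract (mel-filterbank features, then an LSTM, then a feed-forward output that yields a stressed/unstressed decision) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the frame length, hop size, number of filters, hidden size, the single logistic output unit, and the 0.5 decision threshold are all assumptions, the weights are random and untrained, and the input is a synthetic tone rather than speech, so the resulting label carries no meaning.

```python
import cmath
import math
import random


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]  # fractional FFT-bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a direct DFT (slow, but no dependencies)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
           for t, x in enumerate(frame)]
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * t / n)
                    for t, w in enumerate(win))) ** 2 / n
            for k in range(n // 2 + 1)]


def log_mel_features(signal, sample_rate, frame_len=64, hop=32, n_filters=8):
    """One vector of log mel-filterbank energies per frame (sizes are assumptions)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        feats.append([math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                      for row in bank])
    return feats


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM with one logistic output unit (untrained)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate (input, forget, output, candidate) over [x; h].
        self.W = {g: mat(n_hidden, n_in + n_hidden) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def _gate(self, g, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def predict_proba(self, frames):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Feed-forward output on the final hidden state.
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out)


# Synthetic 440 Hz tone standing in for pre-processed speech (assumption).
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = log_mel_features(signal, sample_rate=8000)
prob = TinyLSTMClassifier(n_in=8, n_hidden=4).predict_proba(features)
label = "stressed" if prob >= 0.5 else "unstressed"  # threshold is an assumption
```

In practice the recurrent and output weights would be learned from labeled recordings, and an FFT routine from a numerical library would replace the direct DFT used here for self-containment.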

Index Terms

1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Speech / audio search

Published In

AVSU'18: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia

October 2018

46 pages

ISBN: 9781450359771

DOI: 10.1145/3264869

General Chairs:
Adrian Hilton (University of Surrey, UK),
Hong-Goo Kang (Yonsei University, South Korea),
Hansung Kim (University of Surrey, UK),
Kwanghoon Sohn (Yonsei University, South Korea)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Research Foundation of Korea

Conference

MM '18: ACM Multimedia Conference

October 26, 2018

Seoul, Republic of Korea

Sponsor: SIGMM

Cited By

Zainal N, Asnawi A, Jusoh A, Ibrahim S, Mohd. Ramli H (2024). Integration of MFCCs and CNN for Multi-Class Stress Speech Classification on Unscripted Dataset. IIUM Engineering Journal, 25:2 (381-395). Online publication date: 14-Jul-2024.
https://doi.org/10.31436/iiumej.v25i2.3207
Manjulatha B, Pabboju S (2024). Multimodal depression detection using deep learning in the workplace. 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT) (1-8). Online publication date: 11-Jan-2024.
https://doi.org/10.1109/ICAECT60202.2024.10468966
Rostami A, Motaman K, Tarvirdizadeh B, Alipour K, Ghamari M (2024). LSTM-based real-time stress detection using PPG signals on raspberry Pi. IET Wireless Sensor Systems. Online publication date: 30-Oct-2024.
https://doi.org/10.1049/wss2.12083
Duvvuri K, Kanisettypalli H, Masabattula T, Vekkot S, Gupta D, Zakariah M (2024). Unravelling stress levels in continuous speech through optimal feature selection and deep learning. Procedia Computer Science, 235 (1722-1731). Online publication date: 2024.
https://doi.org/10.1016/j.procs.2024.04.163
Mukherjee P, Halder Roy A (2024). EEG sensor driven assistive device for elbow and finger rehabilitation using deep learning. Expert Systems with Applications, 244 (122954). Online publication date: Jun-2024.
https://doi.org/10.1016/j.eswa.2023.122954
Kumar A, Singh S, Bhardwaj I, Singh P, Khanna A, Brahma B (2024). Audio spectrogram analysis in IoT paradigm for the classification of psychological-emotional characteristics. International Journal of Information Technology. Online publication date: 5-Sep-2024.
https://doi.org/10.1007/s41870-024-02166-5
Rostami A, Tarvirdizadeh B, Alipour K, Ghamari M (2024). Real-Time Stress Detection from Raw Noisy PPG Signals Using LSTM Model Leveraging TinyML. Arabian Journal for Science and Engineering. Online publication date: 7-May-2024.
https://doi.org/10.1007/s13369-024-09095-2
Sağbaş E, Korukoglu S, Ballı S (2024). Real-time stress detection from smartphone sensor data using genetic algorithm-based feature subset optimization and k-nearest neighbor algorithm. Multimedia Tools and Applications, 83:1 (1-32). Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1007/s11042-023-15706-1
Saraswat M, Kumar R, Harbola J, Kalkhundiya D, Kaur M, Kumar Goyal M (2023). Stress and Anxiety Detection via Facial Expression Through Deep Learning. 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS) (1565-1568). Online publication date: 1-Nov-2023.
https://doi.org/10.1109/ICTACS59847.2023.10389882
Staš J, Hládek D, Sokolová Z, Čech M, Škotková K, Poremba P (2023). Analysis and Detection of Speech under Emotional Stress. 2023 21st International Conference on Emerging eLearning Technologies and Applications (ICETA) (493-498). Online publication date: 26-Oct-2023.
https://doi.org/10.1109/ICETA61311.2023.10343755
