Deep Learning Technique to Diagnose Depression in Audio

Published: 28 September 2023

Abstract

Depression is a prevalent psychiatric condition that must be identified and treated promptly; in severe cases it can lead to suicidal ideation. The need for an efficient audio-based automated depression detection system has recently attracted considerable research interest. Most studies to date rely on a broad range of expertly hand-crafted audio features for depression diagnosis. This enlarges the feature space and creates a high-dimensionality problem, which complicates pattern recognition and increases the risk of data imbalance. This paper proposes a deep learning autoencoder-based method to extract relevant, compact features from speech signals in order to diagnose the illness more precisely. The performance and efficacy of the proposed approach are evaluated on the DAIC-WoZ dataset, and the results are compared with other noteworthy machine learning algorithms. The findings show that, when paired with an SVM classifier, the technique outperforms existing audio-based depression detection models, achieving an accuracy of 97% in diagnosing depression.
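The pipeline the abstract describes (autoencoder compression of high-dimensional acoustic features, followed by an SVM classifier) can be sketched as follows. This is an illustrative toy, not the authors' exact implementation: the data is synthetic, the bottleneck size (16) and feature dimension (88) are placeholder assumptions, and a single-hidden-layer `MLPRegressor` trained to reconstruct its input stands in for whatever autoencoder architecture the paper uses.

```python
# Sketch: compress hand-crafted audio features with a single-bottleneck
# autoencoder, then classify the compressed representation with an SVM.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for a high-dimensional acoustic feature matrix
# (e.g. 100 interview segments x 88 hand-crafted features).
X = rng.normal(size=(100, 88))
# Synthetic binary depressed/non-depressed labels.
y = (X[:, :4].sum(axis=1) > 0).astype(int)

# Autoencoder: train the network to reconstruct its own input through a
# narrow hidden layer; the hidden activations are the compressed features.
ae = MLPRegressor(hidden_layer_sizes=(16,), activation="relu",
                  max_iter=2000, random_state=0)
ae.fit(X, X)

# Encode: forward pass through the first (encoder) layer only.
Z = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Classify the 16-dimensional compressed features with an SVM.
clf = SVC(kernel="rbf").fit(Z, y)
print(Z.shape)  # 88 raw features reduced to 16
```

The point of the design is the one the abstract makes: the SVM never sees the full 88-dimensional feature space, only the learned low-dimensional encoding, which mitigates the high-dimensionality problem.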


Published In

IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing
August 2023
783 pages
ISBN:9798400700224
DOI:10.1145/3607947

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Audio Feature Extraction
  2. Auto-encoder
  3. Deep Learning
  4. Depression Detection
  5. Mental Illness

Qualifiers

  • Research-article
  • Research
  • Refereed limited

