DOI: 10.1145/3133944.3133948

Multimodal Measurement of Depression Using Deep Learning Models

Published: 23 October 2017

Abstract

This paper addresses multi-modal depression analysis. We propose a multi-modal fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video and text streams. For each modality, handcrafted feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, and the learned features are then fed to a DNN to predict the PHQ-8 score. For multi-modal fusion, the estimated PHQ-8 scores from the three modalities are integrated by a further DNN to obtain the final PHQ-8 score. Moreover, we propose new feature descriptors for text and video. For the text descriptors, we select the participant's answers to questions associated with psychoanalytic aspects of depression, such as sleep disorder, and use the Paragraph Vector (PV) model to learn distributed representations of these sentences. For the video descriptors, we propose a new global descriptor, the Histogram of Displacement Range (HDR), computed directly from the facial landmarks to measure their displacement and speed. Experiments have been carried out on the AVEC2017 depression sub-challenge dataset. The results show that the proposed depression recognition framework achieves very promising accuracy, with a root mean square error (RMSE) of 4.653 and a mean absolute error (MAE) of 3.980 on the development set, and an RMSE of 5.974 and an MAE of 5.163 on the test set.
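
As a reading aid, two of the ingredients named in the abstract can be sketched in a few lines of Python (NumPy + PyTorch): a Histogram of Displacement Range style video descriptor computed from facial landmark tracks, and the late-fusion DNN that maps the three per-modality PHQ-8 estimates to a final score. This is an illustrative sketch only: the landmark count, histogram bins, displacement range, and fusion-network sizes are assumptions, not the authors' published configuration, and the HDR computation here is one plausible reading of "displacement range".

    import numpy as np
    import torch
    import torch.nn as nn

    def hdr_descriptor(landmarks, num_bins=10, max_range=20.0):
        """Sketch of a Histogram of Displacement Range (HDR) descriptor.

        landmarks: array of shape (T, L, 2) -- T video frames, L facial
        landmarks, (x, y) pixel coordinates.  Frame-to-frame displacement
        magnitudes act as a speed proxy; the histogram summarizes how much
        each landmark's speed varies over the clip.
        """
        # Frame-to-frame displacement magnitude of each landmark: (T-1, L).
        disp = np.linalg.norm(np.diff(landmarks, axis=0), axis=2)
        # Range (max - min) of each landmark's displacement over the clip: (L,).
        ranges = disp.max(axis=0) - disp.min(axis=0)
        # Pool the per-landmark ranges into a fixed-length, normalized histogram.
        hist, _ = np.histogram(ranges, bins=num_bins, range=(0.0, max_range))
        return hist / max(hist.sum(), 1)

    class FusionDNN(nn.Module):
        """Late-fusion DNN mapping the three per-modality PHQ-8 estimates
        (audio, video, text) to a single final PHQ-8 score."""
        def __init__(self, hidden=8):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

        def forward(self, scores):            # scores: (batch, 3)
            return self.net(scores).squeeze(-1)

    # Toy usage: random landmark tracks and made-up per-modality scores
    # stand in for the outputs of the real modality pipelines.
    video = np.random.rand(300, 68, 2) * 100  # 300 frames, 68 landmarks
    print("HDR descriptor:", hdr_descriptor(video))
    fused = FusionDNN()(torch.tensor([[12.0, 9.5, 11.0]]))
    print("fused PHQ-8 estimate:", fused.item())

In the actual framework, the three scores fed to the fusion network would come from the modality-specific DCNN-DNN pipelines, and the fusion model would be trained against the ground-truth PHQ-8 labels rather than applied untrained as in this toy example.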



Published In

AVEC '17: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge
October 2017
78 pages
ISBN: 9781450355025
DOI: 10.1145/3133944


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. dcnn-dnn
2. depression recognition
3. multi-modal

Qualifiers

• Research-article

Funding Sources

• Shaanxi Provincial International Science and Technology Collaboration Project
• VUB Interdisciplinary Research Program
• National Natural Science Foundation of China

Conference

MM '17: ACM Multimedia Conference
October 23, 2017
Mountain View, California, USA

Acceptance Rates

AVEC '17 paper acceptance rate: 8 of 17 submissions, 47%
Overall acceptance rate: 52 of 98 submissions, 53%



Cited By

• (2024) Depression Prediction using Machine Learning Algorithms. International Journal of Advanced Research in Science, Communication and Technology. 10.48175/IJARSCT-18279, 526-532. Online publication date: 16-May-2024.
• (2024) Computational Approaches for Anxiety and Depression: A Meta-Analytical Perspective. ICST Transactions on Scalable Information Systems 11. 10.4108/eetsis.6232. Online publication date: 14-Aug-2024.
• (2024) Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data. Sensors 24:12, 3714. 10.3390/s24123714. Online publication date: 7-Jun-2024.
• (2024) Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors 24:2, 348. 10.3390/s24020348. Online publication date: 6-Jan-2024.
• (2024) A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering 11:3, 219. 10.3390/bioengineering11030219. Online publication date: 25-Feb-2024.
• (2024) Development of multimodal sentiment recognition and understanding. Journal of Image and Graphics 29:6, 1607-1627. 10.11834/jig.240017. Online publication date: 2024.
• (2024) Detecting Depression With Heterogeneous Graph Neural Network in Clinical Interview Transcript. IEEE Transactions on Computational Social Systems 11:1, 1315-1324. 10.1109/TCSS.2023.3263056. Online publication date: Feb-2024.
• (2024) Integrating Deep Facial Priors Into Landmarks for Privacy Preserving Multimodal Depression Recognition. IEEE Transactions on Affective Computing 15:3, 828-836. 10.1109/TAFFC.2023.3296318. Online publication date: Jul-2024.
• (2024) A Comprehensive Analysis of Speech Depression Recognition Systems. SoutheastCon 2024, 1509-1518. 10.1109/SoutheastCon52093.2024.10500078. Online publication date: 15-Mar-2024.
• (2024) Review of the Open Data Sets for Contactless Sensing. IEEE Internet of Things Journal 11:11, 19000-19022. 10.1109/JIOT.2024.3351838. Online publication date: 1-Jun-2024.
