DOI: 10.1145/3133944.3133948

Multimodal Measurement of Depression Using Deep Learning Models

Published: 23 October 2017

Abstract

This paper addresses multi-modal depression analysis. We propose a multi-modal fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video and text streams. For each modality, handcrafted feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, and the learned features are then fed to a DNN to predict the PHQ-8 score. For multi-modal fusion, the estimated PHQ-8 scores from the three modalities are integrated by a further DNN to obtain the final PHQ-8 score. Moreover, we propose new feature descriptors for text and video. For the text descriptors, we select the participant's answers to questions associated with psychoanalytic aspects of depression, such as sleep disorder, and use the Paragraph Vector (PV) model to learn distributed representations of these sentences. For the video descriptors, we propose a new global descriptor, the Histogram of Displacement Range (HDR), computed directly from the facial landmarks to measure their displacement and speed. Experiments have been carried out on the AVEC2017 depression sub-challenge dataset. The results show that the proposed depression recognition framework achieves very promising accuracy, with a root mean square error (RMSE) of 4.653 and a mean absolute error (MAE) of 3.980 on the development set, and an RMSE of 5.974 and an MAE of 5.163 on the test set.
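
As a reading aid, two of the ingredients named in the abstract can be sketched in a few lines of Python (NumPy + PyTorch): a Histogram of Displacement Range style video descriptor computed from facial landmark tracks, and the late-fusion DNN that maps the three per-modality PHQ-8 estimates to a final score. This is an illustrative sketch only: the landmark count, histogram bins, displacement range, and fusion-network sizes are assumptions, not the authors' published configuration, and the HDR computation here is one plausible reading of "displacement range".

    import numpy as np
    import torch
    import torch.nn as nn

    def hdr_descriptor(landmarks, num_bins=10, max_range=20.0):
        """Sketch of a Histogram of Displacement Range (HDR) descriptor.

        landmarks: array of shape (T, L, 2) -- T video frames, L facial
        landmarks, (x, y) pixel coordinates.  Frame-to-frame displacement
        magnitudes act as a speed proxy; the histogram summarizes how much
        each landmark's speed varies over the clip.
        """
        # Frame-to-frame displacement magnitude of each landmark: (T-1, L).
        disp = np.linalg.norm(np.diff(landmarks, axis=0), axis=2)
        # Range (max - min) of each landmark's displacement over the clip: (L,).
        ranges = disp.max(axis=0) - disp.min(axis=0)
        # Pool the per-landmark ranges into a fixed-length, normalized histogram.
        hist, _ = np.histogram(ranges, bins=num_bins, range=(0.0, max_range))
        return hist / max(hist.sum(), 1)

    class FusionDNN(nn.Module):
        """Late-fusion DNN mapping the three per-modality PHQ-8 estimates
        (audio, video, text) to a single final PHQ-8 score."""
        def __init__(self, hidden=8):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

        def forward(self, scores):            # scores: (batch, 3)
            return self.net(scores).squeeze(-1)

    # Toy usage: random landmark tracks and made-up per-modality scores
    # stand in for the outputs of the real modality pipelines.
    video = np.random.rand(300, 68, 2) * 100  # 300 frames, 68 landmarks
    print("HDR descriptor:", hdr_descriptor(video))
    fused = FusionDNN()(torch.tensor([[12.0, 9.5, 11.0]]))
    print("fused PHQ-8 estimate:", fused.item())

In the actual framework, the three scores fed to the fusion network would come from the modality-specific DCNN-DNN pipelines, and the fusion model would be trained against the ground-truth PHQ-8 labels rather than applied untrained as in this toy example.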



Published In

AVEC '17: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge
October 2017
78 pages
ISBN: 9781450355025
DOI: 10.1145/3133944


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. dcnn-dnn
2. depression recognition
3. multi-modal

Qualifiers

• Research-article

Funding Sources

• Shaanxi Provincial International Science and Technology Collaboration Project
• VUB Interdisciplinary Research Program
• National Natural Science Foundation of China

Conference

MM '17: ACM Multimedia Conference
October 23, 2017
Mountain View, California, USA

Acceptance Rates

AVEC '17 paper acceptance rate: 8 of 17 submissions, 47%
Overall acceptance rate: 52 of 98 submissions, 53%



Cited By

• (2024) Depression Prediction using Machine Learning Algorithms. International Journal of Advanced Research in Science, Communication and Technology. 10.48175/IJARSCT-18279, 526-532. Online publication date: 16-May-2024.
• (2024) Computational Approaches for Anxiety and Depression: A Meta-Analytical Perspective. ICST Transactions on Scalable Information Systems 11. 10.4108/eetsis.6232. Online publication date: 14-Aug-2024.
• (2024) Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data. Sensors 24:12, 3714. 10.3390/s24123714. Online publication date: 7-Jun-2024.
• (2024) Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors 24:2, 348. 10.3390/s24020348. Online publication date: 6-Jan-2024.
• (2024) A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering 11:3, 219. 10.3390/bioengineering11030219. Online publication date: 25-Feb-2024.
• (2024) Development of multimodal sentiment recognition and understanding. Journal of Image and Graphics 29:6, 1607-1627. 10.11834/jig.240017. Online publication date: 2024.
• (2024) Detecting Depression With Heterogeneous Graph Neural Network in Clinical Interview Transcript. IEEE Transactions on Computational Social Systems 11:1, 1315-1324. 10.1109/TCSS.2023.3263056. Online publication date: Feb-2024.
• (2024) Integrating Deep Facial Priors Into Landmarks for Privacy Preserving Multimodal Depression Recognition. IEEE Transactions on Affective Computing 15:3, 828-836. 10.1109/TAFFC.2023.3296318. Online publication date: Jul-2024.
• (2024) A Comprehensive Analysis of Speech Depression Recognition Systems. SoutheastCon 2024, 1509-1518. 10.1109/SoutheastCon52093.2024.10500078. Online publication date: 15-Mar-2024.
• (2024) Review of the Open Data Sets for Contactless Sensing. IEEE Internet of Things Journal 11:11, 19000-19022. 10.1109/JIOT.2024.3351838. Online publication date: 1-Jun-2024.
