
Multi Fine-Grained Fusion Network for Depression Detection

Published: 29 June 2024

Abstract

Depression is an illness that affects both emotional and mental health, and clinical interviews are currently the most common means of detecting it. Advances in natural language processing and sentiment analysis have made automated, interview-based depression detection increasingly practical. However, existing multimodal depression detection models fail to adequately capture the fine-grained features of depressive behavior, making it difficult for them to characterize subtle changes in depressive symptoms. To address this problem, we propose the Multi Fine-Grained Fusion Network (MFFNet). The core idea of the model is to extract and fuse feature pairs at different scales with a Multi-Scale Fastformer (MSfastformer), and then to integrate features at different resolutions with a Recurrent Pyramid Model (RPM), promoting the interaction of multi-level information. Through this interaction of multi-scale and multi-resolution features, MFFNet learns richer feature representations. To validate the effectiveness of MFFNet, we conduct experiments on two depression-interview datasets. The results show that MFFNet outperforms benchmark multimodal models at depression detection.
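To make the fusion mechanism concrete, below is a minimal PyTorch sketch of Fastformer-style additive attention (Wu et al., 2021) run at several temporal scales and fused by concatenation. The scale set, the average-pooling front end, the linear fusion layer, and every name in the sketch are illustrative assumptions, not the authors' MSfastformer implementation; the Recurrent Pyramid Model is omitted.

```python
# Minimal sketch (PyTorch assumed) of Fastformer-style additive attention
# applied at several temporal scales. The scale set, pooling, and fusion
# below are illustrative assumptions, NOT the paper's MSfastformer spec.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Single-head Fastformer block (Wu et al., 2021): attention in O(T)
    by collapsing queries and keys into global summary vectors."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.w_q = nn.Linear(dim, 1)  # position scores for the global query
        self.w_k = nn.Linear(dim, 1)  # position scores for the global key
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Global query: attention-weighted sum over all time steps.
        alpha = F.softmax(self.w_q(q).squeeze(-1), dim=-1)      # (B, T)
        q_global = torch.einsum("bt,btd->bd", alpha, q)         # (B, D)
        # Mix the global query into each key, then summarize the keys.
        p = q_global.unsqueeze(1) * k                           # (B, T, D)
        beta = F.softmax(self.w_k(p).squeeze(-1), dim=-1)       # (B, T)
        k_global = torch.einsum("bt,btd->bd", beta, p)          # (B, D)
        # Modulate values with the global key; residual on the queries.
        return self.out(k_global.unsqueeze(1) * v) + q


class MultiScaleFastformer(nn.Module):
    """Runs the block at several temporal resolutions and fuses them."""

    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.blocks = nn.ModuleList(AdditiveAttention(dim) for _ in scales)
        self.fuse = nn.Linear(dim * len(scales), dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        outs = []
        for s, block in zip(self.scales, self.blocks):
            if s > 1:  # average-pool the time axis to a coarser scale
                xs = F.avg_pool1d(x.transpose(1, 2), s, stride=s).transpose(1, 2)
            else:
                xs = x
            h = block(xs)                                       # (B, T//s, D)
            # Upsample back to T so the scales can be concatenated.
            h = F.interpolate(h.transpose(1, 2), size=x.size(1)).transpose(1, 2)
            outs.append(h)
        return self.fuse(torch.cat(outs, dim=-1))               # (B, T, D)


if __name__ == "__main__":
    frames = torch.randn(2, 100, 64)  # e.g., audio/text frame embeddings
    print(MultiScaleFastformer(64)(frames).shape)  # torch.Size([2, 100, 64])
```

Additive attention is linear in sequence length, which is what makes attending over long interview recordings at several scales affordable; a faithful reproduction would swap the assumed pooling and concatenation for the paper's own multi-scale feature pairing and add the pyramid-style multi-resolution integration.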


    Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 8
August 2024
726 pages
EISSN: 1551-6865
DOI: 10.1145/3618074
• Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 June 2024
    Online AM: 01 June 2024
    Accepted: 12 May 2024
    Revised: 02 May 2024
    Received: 14 September 2023
    Published in TOMM Volume 20, Issue 8


    Author Tags

    1. Depression detection
    2. interview
    3. Multi Fine-Grained Fusion Network (MFFNet)
    4. Multi-Scale Fastformer (MSfastformer)
    5. Recurrent Pyramid Model (RPM)

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Central Universities
