Dense Coordinate Channel Attention Network for Depression Level Estimation from Speech

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15313)


Abstract

Automatic depression level estimation from speech is an active research topic in computational emotion recognition. One symptom commonly exhibited by patients with depression is erratic speech volume; patients’ voices can therefore serve as a bio-signature for identifying their level of depression. However, speech signals have time-frequency structure: different frequencies and different timestamps contribute to depression detection in different ways. Accordingly, we design a Coordinate Channel Attention (CCA) block that weights tensor information according to its contribution. We combine this block with dense blocks, which extract deep speech features, to form our proposed Dense Coordinate Channel Attention Network (DCCANet); a vectorization block then fuses the resulting high-dimensional information. We split each original long recording into short audio segments of equal length and, after feature extraction, feed these segments into the network to estimate Beck Depression Inventory-II (BDI-II) scores. The mean of the segment scores is taken as the individual’s depression level. Experiments on the AVEC2013 and AVEC2014 datasets demonstrate the effectiveness of DCCANet, which outperforms several existing methods.
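
The segment-and-average prediction pipeline described above can be sketched as follows. This is a minimal illustration only, not the authors’ code: DCCANet itself is treated as an opaque regression model, and the function names, the non-overlapping segmentation convention, and the raw-waveform input shape are assumptions made for the example.

```python
# Hedged sketch of the segment-level scoring pipeline outlined in the abstract.
# The network is any regression model mapping a batch of segments to one
# BDI-II score per segment; `segment_len` and all names are hypothetical.
import numpy as np
import torch
import torch.nn as nn


def split_into_segments(signal: np.ndarray, segment_len: int) -> np.ndarray:
    """Cut a long recording into equal-length, non-overlapping segments,
    dropping any trailing remainder (one plausible convention)."""
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


def predict_bdi_ii(model: nn.Module, signal: np.ndarray, segment_len: int) -> float:
    """Score each segment with the network and average the per-segment
    predictions into one subject-level BDI-II estimate."""
    segments = split_into_segments(signal, segment_len)   # (N, segment_len)
    x = torch.from_numpy(segments).float().unsqueeze(1)   # (N, 1, segment_len)
    with torch.no_grad():
        per_segment_scores = model(x).squeeze(-1)          # (N,)
    return per_segment_scores.mean().item()
```

Averaging segment-level predictions is a simple late-fusion strategy: it keeps inference memory bounded regardless of recording length and yields one score per speaker.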

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62071330) and the Open Project Program of the State Key Laboratory of Multimodal Artificial Intelligence System (No. 202200012).

Author information

Corresponding author

Correspondence to Ziping Zhao.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, Z., Liu, S., Niu, M., Wang, H., Schuller, B.W. (2025). Dense Coordinate Channel Attention Network for Depression Level Estimation from Speech. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15313. Springer, Cham. https://doi.org/10.1007/978-3-031-78201-5_26

  • DOI: https://doi.org/10.1007/978-3-031-78201-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78200-8

  • Online ISBN: 978-3-031-78201-5

  • eBook Packages: Computer Science, Computer Science (R0)
