Dense Coordinate Channel Attention Network for Depression Level Estimation from Speech

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15313)


Abstract

Automatic depression level estimation from speech is an active research topic in computational emotion recognition. One symptom commonly exhibited by patients with depression is erratic speech volume; patients’ voices can therefore serve as a bio-signature for identifying their level of depression. However, speech signals have time-frequency structure: different frequencies and different timestamps contribute to depression detection in different ways. Accordingly, we design a Coordinate Channel Attention (CCA) block that weights tensor information according to its contribution. We combine this block with dense blocks, which extract deep speech features, to form our proposed Dense Coordinate Channel Attention Network (DCCANet); a vectorization block then fuses the resulting high-dimensional information. We split each original long recording into short audio segments of equal length and, after feature extraction, feed these segments into the network to estimate Beck Depression Inventory-II (BDI-II) scores. The mean of the segment scores is taken as the individual’s depression level. Experiments on the AVEC2013 and AVEC2014 datasets demonstrate the effectiveness of DCCANet, which outperforms several existing methods.
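
The segment-and-average prediction pipeline described above can be sketched as follows. This is a minimal illustration only, not the authors’ code: DCCANet itself is treated as an opaque regression model, and the function names, the non-overlapping segmentation convention, and the raw-waveform input shape are assumptions made for the example.

```python
# Hedged sketch of the segment-level scoring pipeline outlined in the abstract.
# The network is any regression model mapping a batch of segments to one
# BDI-II score per segment; `segment_len` and all names are hypothetical.
import numpy as np
import torch
import torch.nn as nn


def split_into_segments(signal: np.ndarray, segment_len: int) -> np.ndarray:
    """Cut a long recording into equal-length, non-overlapping segments,
    dropping any trailing remainder (one plausible convention)."""
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


def predict_bdi_ii(model: nn.Module, signal: np.ndarray, segment_len: int) -> float:
    """Score each segment with the network and average the per-segment
    predictions into one subject-level BDI-II estimate."""
    segments = split_into_segments(signal, segment_len)   # (N, segment_len)
    x = torch.from_numpy(segments).float().unsqueeze(1)   # (N, 1, segment_len)
    with torch.no_grad():
        per_segment_scores = model(x).squeeze(-1)          # (N,)
    return per_segment_scores.mean().item()
```

Averaging segment-level predictions is a simple late-fusion strategy: it keeps inference memory bounded regardless of recording length and yields one score per speaker.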

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62071330) and the Open Project Program of the State Key Laboratory of Multimodal Artificial Intelligence System (No. 202200012).

Author information

Corresponding author

Correspondence to Ziping Zhao.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, Z., Liu, S., Niu, M., Wang, H., Schuller, B.W. (2025). Dense Coordinate Channel Attention Network for Depression Level Estimation from Speech. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15313. Springer, Cham. https://doi.org/10.1007/978-3-031-78201-5_26

  • DOI: https://doi.org/10.1007/978-3-031-78201-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78200-8

  • Online ISBN: 978-3-031-78201-5

  • eBook Packages: Computer Science, Computer Science (R0)
