Leveraging web scraping and stacking ensemble machine learning techniques to enhance detection of major depressive disorder from social media posts

  • Original Article
  • Published in: Social Network Analysis and Mining

Abstract

Social media has become a platform where people express emotions, including happiness and sadness, to their followers. Major Depressive Disorder (MDD), a common mental health disorder, is characterized by sadness and loss of interest in activities, and can lead to physical, emotional, cognitive, and social problems as well as suicidal thoughts. Early detection of MDD and timely intervention are crucial for effective management and treatment. This study investigates the potential of detecting MDD on social media platforms such as Facebook, Twitter, and Reddit by analyzing text with advanced machine learning and deep learning algorithms. To build the dataset, we combined web scraping with publicly available datasets (Twitter, Reddit) hosted on the Kaggle website. Natural language processing (NLP) techniques are applied to preprocess the textual data and extract meaningful features from it. Several machine learning algorithms are then used to build predictive models for MDD detection based on linguistic patterns, sentiment analysis, and linguistic markers associated with depressive symptoms. We evaluate our models on three datasets. On the two public datasets, the LSTM algorithm performs best, with 93.72% accuracy on Reddit and 99.85% accuracy on Twitter, while on our own dataset, extracted from Reddit through web scraping, the stacking ensemble achieves 96.47% accuracy. Model performance is assessed using several criteria: accuracy, precision, recall, and F1-score. Additionally, we identify a more effective ML framework for enhancing MDD detection.
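The four evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, the label encoding (1 for a post flagged as depressive, 0 otherwise), and the sample labels are all assumptions made for illustration, and the zero-division fallback of 0.0 is one common convention rather than something the article specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical labels: 1 = post flagged as depressive, 0 = not.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Accuracy alone can look strong on an imbalanced dataset, which is presumably why the abstract reports precision, recall, and F1-score alongside it.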

Acknowledgements

This project was supported by a UGC Research Grant, Bangladesh, for the project “Design and development of a robust cyberbullying detection framework using Machine Learning”, fiscal year 2020-21 (Ref: 37.01.0000.073.06.048.22.856).

Author information

Corresponding author

Correspondence to Md Manowarul Islam.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hridoy, M.T.A., Saha, S.R., Islam, M.M. et al. Leveraging web scraping and stacking ensemble machine learning techniques to enhance detection of major depressive disorder from social media posts. Soc. Netw. Anal. Min. 14, 239 (2024). https://doi.org/10.1007/s13278-024-01392-w

  • DOI: https://doi.org/10.1007/s13278-024-01392-w
