Leveraging web scraping and stacking ensemble machine learning techniques to enhance detection of major depressive disorder from social media posts

Hridoy, Md. Tanvir Ahammed; Saha, Susmita Rani; Islam, Md Manowarul; Uddin, Md Ashraf; Mahmud, Md. Zulfiker

doi:10.1007/s13278-024-01392-w

Original Article
Published: 26 December 2024

Volume 14, article number 239, (2024)
Social Network Analysis and Mining

Abstract

Social media has become a platform where people express emotions, including happiness and sadness, to their followers. Major Depressive Disorder (MDD), a common mental health disorder, is characterized by sadness and loss of interest in activities; it can lead to physical, emotional, cognitive, and social difficulties as well as suicidal thoughts. Early detection and intervention are crucial for effective management and treatment of MDD. This study investigates the potential of detecting MDD on social media platforms such as Facebook, Twitter, and Reddit by analyzing text with machine learning and deep learning algorithms. To assemble the data, we used both web scraping and publicly available datasets (Twitter, Reddit) from the Kaggle website. Natural language processing (NLP) techniques are applied to preprocess the text and extract meaningful features from it. Several machine learning algorithms are then used to build predictive models for MDD detection based on linguistic patterns, sentiment analysis, and linguistic markers associated with depressive symptoms. We evaluate our models on three datasets. On the two public datasets, the LSTM algorithm performs best, reaching 93.72% accuracy on Reddit and 99.85% accuracy on Twitter; on our own dataset, scraped from Reddit, a stacking ensemble achieves 96.47% accuracy. Model performance is assessed using accuracy, precision, recall, and F1-score. In addition, we identify an approach that yields a more effective ML framework for MDD detection.
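To make the stacking idea in the abstract concrete, the following is a minimal sketch in Python with scikit-learn. It is not the authors' pipeline: the abstract does not name the text features, the base learners, or the meta-learner, so TF-IDF features, a naive Bayes plus random forest base layer, and a logistic regression meta-learner are all assumptions chosen for illustration. The posts are hand-written toy examples, not data from the study, and the helper names build_stacking_model and evaluate are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, hand-written posts for illustration only (NOT the study's data).
# Label 1 = depressive language, label 0 = neutral language.
posts = [
    "I feel empty and tired all the time",
    "Nothing I do seems to matter anymore",
    "I cannot sleep and I have lost interest in everything",
    "Every day feels heavy and hopeless",
    "I feel worthless and alone most days",
    "I have no energy and I keep crying",
    "Had a great hike with friends this weekend",
    "Looking forward to the concert tonight",
    "Just finished a good book and loved it",
    "The new cafe downtown has excellent coffee",
    "Our team won the match, what a game",
    "Cooking dinner with my family was fun",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def build_stacking_model(random_state=0):
    """Return a TF-IDF + stacking pipeline (base learners are assumptions)."""
    base_learners = [
        ("nb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        # The meta-learner is trained on out-of-fold base-learner predictions.
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    return make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"), stack)


def evaluate(model, texts, y_true):
    """Compute the four metrics named in the abstract for the positive class."""
    y_pred = model.predict(texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


# Hold out a stratified test split so both classes appear in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.25, stratify=labels, random_state=0
)
model = build_stacking_model()
model.fit(X_train, y_train)
metrics = evaluate(model, X_test, y_test)
print(metrics)
```

With a dozen toy posts the printed scores say nothing about real performance; the sketch only shows how base-learner predictions feed a meta-learner and how the four reported metrics can be computed on a held-out split.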

Acknowledgements

This project was supported by a UGC Research Grant, Bangladesh, for the project “Design and development of a robust cyberbullying detection framework using Machine Learning”, fiscal year 2020-21 (Ref: 37.01.0000.073.06.048.22.856).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh
Md. Tanvir Ahammed Hridoy, Susmita Rani Saha, Md Manowarul Islam, Md Ashraf Uddin & Md. Zulfiker Mahmud

Corresponding author

Correspondence to Md Manowarul Islam.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hridoy, M.T.A., Saha, S.R., Islam, M.M. et al. Leveraging web scraping and stacking ensemble machine learning techniques to enhance detection of major depressive disorder from social media posts. Soc. Netw. Anal. Min. 14, 239 (2024). https://doi.org/10.1007/s13278-024-01392-w

Received: 29 July 2024
Revised: 29 October 2024
Accepted: 22 November 2024
Published: 26 December 2024
DOI: https://doi.org/10.1007/s13278-024-01392-w
