Abstract
Social media has become a platform where people express emotions, including happiness and sadness, to their followers. Major Depressive Disorder (MDD), a common mental health disorder, is characterized by sadness and loss of interest in activities, and it can lead to physical, emotional, cognitive, and social difficulties as well as suicidal thoughts. Early detection of MDD and early intervention are crucial for effective management and treatment. This study investigates the potential of detecting MDD on social media platforms such as Facebook, Twitter, and Reddit by analyzing text with machine learning and deep learning algorithms. To build our datasets, we used both web scraping and publicly available datasets (Twitter, Reddit) from the Kaggle website. Natural language processing (NLP) techniques are applied to preprocess the text and extract meaningful features from it. Several machine learning algorithms are then used to build predictive models for MDD detection based on linguistic patterns, sentiment analysis, and linguistic markers associated with depressive symptoms. We evaluate our models on three datasets. On the two public datasets, the LSTM algorithm performs best, with 93.72% accuracy on Reddit and 99.85% accuracy on Twitter; on our own dataset, scraped from Reddit, a stacking ensemble achieves 96.47% accuracy. Model performance is assessed using accuracy, precision, recall, and F1-score. We also identify a more effective ML framework for enhancing MDD detection.
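The four evaluation criteria named in the abstract can be illustrated with a short calculation. The following is a minimal sketch, not the authors' evaluation code: the function name `binary_metrics`, the label convention (1 = depressive post, 0 = non-depressive post), and the eight example labels are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumed convention (not from the paper): 1 = depressive, 0 = non-depressive.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions/labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground-truth and predicted labels for eight posts.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall next to accuracy matters for this kind of task because a classifier can score high accuracy on an imbalanced dataset while still missing many depressive posts; recall exposes those misses, and F1 balances the two.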
Acknowledgements
This project was supported by a UGC Research Grant, Bangladesh, for the project “Design and development of a robust cyberbullying detection framework using Machine Learning”, fiscal year 2020-21 (Ref: 37.01.0000.073.06.048.22.856).
About this article
Cite this article
Hridoy, M.T.A., Saha, S.R., Islam, M.M. et al. Leveraging web scraping and stacking ensemble machine learning techniques to enhance detection of major depressive disorder from social media posts. Soc. Netw. Anal. Min. 14, 239 (2024). https://doi.org/10.1007/s13278-024-01392-w
DOI: https://doi.org/10.1007/s13278-024-01392-w