"Let Me Tell You About Your Mental Health!": Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention

Authors:

Amanuel Alambo,

Raminta Daniulaityte,

Krishnaprasad Thirunarayan, and

Jyotishman Pathak

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management

October 2018

Pages 753 - 762

https://doi.org/10.1145/3269206.3271732

Published: 17 October 2018

Abstract

Social media platforms are increasingly being used to share and seek advice on mental health issues. In particular, Reddit users freely discuss such issues on various subreddits, whose structure and content can be leveraged to formally interpret and relate subreddits and their posts in terms of mental health diagnostic categories. There is prior research on the extraction of mental health-related information, including symptoms, diagnoses, and treatments, from social media; however, our approach can additionally provide actionable information to clinicians about the mental health of a patient in diagnostic terms for web-based intervention. Specifically, we provide a detailed analysis of the nature of subreddit content from a domain expert's perspective and introduce a novel approach to map each subreddit to the best-matching DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th Edition) category using a multi-class classifier. Our classification algorithm analyzes all the posts of a subreddit by adapting topic modeling and word-embedding techniques, and by utilizing curated medical knowledge bases to quantify their relationship to DSM-5 categories. Our semantic encoding-decoding optimization approach reduces the false-alarm rate from 30% to 2.5% over a comparable heuristic baseline, and our mapping results have been verified by domain experts, achieving a kappa score of 0.84.
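The two evaluation figures quoted in the abstract, the false-alarm rate and the kappa score, can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes the false-alarm rate is the usual one-vs-rest ratio FP / (FP + TN) and that the kappa score is Cohen's kappa between two raters; the abstract specifies neither. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def false_alarm_rate(y_true, y_pred, positive):
    """One-vs-rest false-alarm rate FP / (FP + TN) for the class `positive`.

    Assumption: the signal-detection definition, not confirmed by the abstract.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement from the product of each rater's label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels; these are illustrative, not data from the paper.
expert = ["depressive", "anxiety", "anxiety", "bipolar", "depressive", "anxiety"]
system = ["depressive", "depressive", "anxiety", "bipolar", "depressive", "anxiety"]

far_depressive = false_alarm_rate(expert, system, positive="depressive")
kappa = cohen_kappa(expert, system)
print(f"false-alarm rate (depressive): {far_depressive:.3f}")
print(f"Cohen's kappa: {kappa:.3f}")
```

On this toy data one of the four non-depressive items is wrongly labeled depressive, and the two label sequences agree on five of six items, which is what the two functions measure.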


Published In

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management

October 2018

2362 pages

ISBN: 9781450360142

DOI: 10.1145/3269206

General Chair:
Alfredo Cuzzocrea, University of Trieste, Italy

Program Chairs:
James Allan, University of Massachusetts, USA
Norman Paton, University of Manchester, United Kingdom
Divesh Srivastava, AT&T Labs Research, USA
Rakesh Agrawal, Data Insights Lab, USA
Andrei Broder, Google Research, USA
Mohammed Zaki, Rensselaer Polytechnic Institute, USA
Selcuk Candan, Arizona State University, USA
Alexandros Labrinidis, University of Pittsburgh, USA
Assaf Schuster, Technion, Israel
Haixun Wang, Google Research, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Institute of Drug Abuse
National Institutes of Health

Conference

CIKM '18: The 27th ACM International Conference on Information and Knowledge Management

October 22 - 26, 2018

Torino, Italy

Acceptance Rates

CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
