DOI: 10.1145/3357729.3357745
short-paper

Unsupervised Classification of Health Content on Reddit

Published: 20 November 2019

Abstract

Online forums are easily accessible to the public and useful for acquiring and disseminating health information; however, advanced methods must be applied to interpret their content correctly. For this reason, we propose an unsupervised embedding-based approach to health content classification. Specifically, we utilise word embeddings and a clustering method to create content-sensitive word clusters; we then align the health content with the clusters, classifying it into illnesses, medication, or disease agents. The results suggest that a cosine similarity of 0.70 is preferred for creating informative clusters as well as for automatically generating synonyms, acronyms, abbreviations and common misspellings. Our approach demonstrates the potential of discussion forums, in particular Reddit, not only for unsupervised content classification but also for dictionary building from informal health content.
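The pipeline summarised in the abstract (embed words, group them by cosine similarity, then align posts with the resulting clusters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering method, so a simple seed-based grouping stands in for it, and the three-dimensional vectors, the seed words, and the function names `build_clusters` and `classify` are all hypothetical. Only the 0.70 cut-off and the three categories come from the abstract.

```python
import numpy as np

SIM_THRESHOLD = 0.70  # cosine similarity value reported in the abstract

# Hypothetical toy embeddings; a real run would load vectors trained on forum text.
VECTORS = {
    "influenza": np.array([1.0, 0.1, 0.0]),
    "flu":       np.array([0.9, 0.2, 0.0]),
    "ibuprofen": np.array([0.0, 1.0, 0.1]),
    "ibuprofin": np.array([0.1, 0.9, 0.1]),  # common misspelling
    "virus":     np.array([0.1, 0.0, 1.0]),
    "weather":   np.array([0.5, 0.5, 0.5]),  # unrelated word
}

# Hypothetical seed words, one per category named in the abstract.
SEEDS = {
    "illness": ["influenza"],
    "medication": ["ibuprofen"],
    "disease_agent": ["virus"],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def build_clusters(seeds, vectors, threshold=SIM_THRESHOLD):
    """Group each vocabulary word with every category whose seed it resembles."""
    clusters = {}
    for label, seed_words in seeds.items():
        members = set(seed_words)
        for word, vec in vectors.items():
            if any(cosine(vec, vectors[s]) >= threshold for s in seed_words):
                members.add(word)
        clusters[label] = members
    return clusters


def classify(text, clusters):
    """Return the category with the most matching tokens, or None if none match."""
    tokens = text.lower().split()
    counts = {label: sum(t in members for t in tokens)
              for label, members in clusters.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


clusters = build_clusters(SEEDS, VECTORS)
```

With these toy vectors, `flu` lands in the same cluster as `influenza` and the misspelling `ibuprofin` lands with `ibuprofen`, which mirrors the dictionary-building use mentioned in the abstract; `weather` stays below the threshold for every seed and is left out.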

Cited By

  • (2021) Benchmark of public intent recognition services. Language Resources and Evaluation. DOI: 10.1007/s10579-021-09563-3. Online publication date: 4-Oct-2021
  • (2020) Deep Sentiment Classification and Topic Discovery on Novel Coronavirus or COVID-19 Online Discussions: NLP Using LSTM Recurrent Neural Network Approach. IEEE Journal of Biomedical and Health Informatics 24:10 (2733-2742). DOI: 10.1109/JBHI.2020.3001216. Online publication date: Oct-2020
  • (2020) Focused Query Expansion with Entity Cores for Patient-Centric Health Search. The Semantic Web – ISWC 2020 (547-564). DOI: 10.1007/978-3-030-62419-4_31. Online publication date: 1-Nov-2020

Published In

DPH2019: Proceedings of the 9th International Conference on Digital Public Health
November 2019
147 pages
ISBN: 9781450372084
DOI: 10.1145/3357729
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. discussion forum
  3. health informatics
  4. unsupervised learning
  5. vocabulary building
  6. word embeddings

Qualifiers

  • Short-paper

Conference

DPH2019
Sponsor:
  • SIGKDD
  • University College London
