DOI: 10.1145/2567948.2579272
short-paper

On the ground validation of online diagnosis with Twitter and medical records

Published: 07 April 2014

Abstract

Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether specific individual users are actually sick. Taking a different approach, we develop a novel system for social media-based disease detection at the individual level using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data. We find that about half (17/35 = 48.57%) of the users in our sample who were sick explicitly discuss their disease on Twitter. By developing a meta-classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health.
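The abstract names three signals (text analysis, anomaly detection, and social network analysis) combined by a meta classifier, but does not say how each signal is computed or how they are combined. The sketch below is only an illustration of that general idea, not the authors' system: the keyword list, the z-score anomaly measure, the "fraction of sick contacts" network feature, the logistic combiner, and every weight are assumptions introduced here.

```python
import math
from statistics import mean, pstdev

# Hypothetical keyword list; the paper's actual text features are not given.
FLU_KEYWORDS = {"flu", "influenza", "fever", "cough", "sick"}


def text_score(tweets):
    """Fraction of tweets containing at least one keyword.

    Tokenization is a plain whitespace split, so "flu." would not match.
    """
    if not tweets:
        return 0.0
    hits = sum(1 for t in tweets if FLU_KEYWORDS & set(t.lower().split()))
    return hits / len(tweets)


def anomaly_score(baseline_daily_counts, observed_count):
    """Absolute z-score of an observed daily tweet count against a baseline.

    Assumed stand-in for "anomaly detection": a large value means the user
    tweeted unusually much or unusually little on the day of interest.
    """
    sd = pstdev(baseline_daily_counts)
    if sd == 0:
        return 0.0
    return abs(observed_count - mean(baseline_daily_counts)) / sd


def network_score(contact_is_sick):
    """Fraction of a user's contacts flagged as sick (assumed feature)."""
    if not contact_is_sick:
        return 0.0
    return sum(contact_is_sick) / len(contact_is_sick)


def meta_classify(features, weights=(4.0, 0.5, 2.0), bias=-3.0):
    """Combine base scores with a logistic function.

    The weights and bias are made up for illustration; in practice they
    would be learned from labeled data.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    features = [
        text_score(["I have the flu", "nice day"]),
        anomaly_score([10, 10, 12, 8], observed_count=2),
        network_score([True, False, False, False]),
    ]
    print("estimated probability of illness:", meta_classify(features))
```

Because the combiner also sees the activity and network scores, it can return a high probability for a user whose `text_score` is zero, which is the situation the abstract highlights ("even if she does not discuss her health").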

    Published In

    WWW '14 Companion: Proceedings of the 23rd International Conference on World Wide Web
    April 2014
    1396 pages
    ISBN:9781450327459
    DOI:10.1145/2567948

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. digital epidemiology
    2. remote diagnosis
    3. twitter
    4. validation

    Qualifiers

    • Short-paper

    Conference

    WWW '14
    Sponsor:
    • IW3C2

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2022) Detecting Personal Health Mentions from Social Media Using Supervised Machine Learning. Personal Health Informatics, pages 247-266. DOI: 10.1007/978-3-031-07696-1_12. Online publication date: 23-Nov-2022
    • (2022) Deep Learning for Medical Informatics and Public Health. Blockchain and Deep Learning, pages 285-308. DOI: 10.1007/978-3-030-95419-2_13. Online publication date: 26-Mar-2022
    • (2022) Deep Learning for Medical Informatics and Public Health. Bioinformatics and Medical Applications, pages 117-146. DOI: 10.1002/9781119792673.ch7. Online publication date: 23-Mar-2022
    • (2020) Detection of Behavioral Anomalies in Medication Adherence Patterns from Patients with Serious Mental Illness Engaged with a Digital Medicine System (Preprint). JMIR Mental Health. DOI: 10.2196/21378. Online publication date: 12-Jun-2020
    • (2019) Internet-Based Sources of Health Information: A Systematic Literature Review (Preprint). Journal of Medical Internet Research. DOI: 10.2196/13680. Online publication date: 24-Feb-2019
    • (2018) Effective surveillance and predictive mapping of mosquito-borne diseases using social media. Journal of Computational Science, 25, pages 406-415. DOI: 10.1016/j.jocs.2017.07.003. Online publication date: Mar-2018
    • (2017) Forecasting Seasonal Influenza Fusing Digital Indicators and a Mechanistic Disease Model. Proceedings of the 26th International Conference on World Wide Web, pages 311-319. DOI: 10.1145/3038912.3052678. Online publication date: 3-Apr-2017
    • (2017) Using Large-Scale Social Media Networks as a Scalable Sensing System for Modeling Real-Time Energy Utilization Patterns. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47:10, pages 2627-2640. DOI: 10.1109/TSMC.2016.2618860. Online publication date: Oct-2017
    • (2017) An ontology-based framework for extracting spatio-temporal influenza data using Twitter. International Journal of Digital Earth, pages 1-23. DOI: 10.1080/17538947.2017.1411535. Online publication date: 8-Dec-2017
    • (2016) Features for Ranking Tweets Based on Credibility and Newsworthiness. 2016 International Conference on Collaboration Technologies and Systems (CTS), pages 18-25. DOI: 10.1109/CTS.2016.0023. Online publication date: Oct-2016
