survey

Survey of Text-based Epidemic Intelligence: A Computational Linguistics Perspective

Published: 16 October 2019

Abstract

Epidemic intelligence deals with the detection of outbreaks using formal (such as hospital records) and informal sources (such as user-generated text on the web) of information. In this survey, we discuss approaches for epidemic intelligence that use textual datasets, referring to it as “text-based epidemic intelligence.” We view past work in terms of two broad categories: health mention classification (selecting relevant text from a large volume) and health event detection (predicting epidemic events from a collection of relevant text). The focus of our discussion is the underlying computational linguistic techniques in the two categories. The survey also provides details of the state of the art in annotation techniques, resources, and evaluation strategies for epidemic intelligence.
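As a rough illustration of the two categories named above, the following sketch chains a health mention filter into a simple event detector. It is not taken from the survey: the lexicon `SYMPTOM_TERMS`, the function names, and the mean-plus-k-standard-deviations alert rule are all assumptions chosen for brevity, and the keyword match stands in for the trained classifiers that the literature typically uses.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical symptom lexicon (an assumption, not from the survey);
# a real system would use a trained classifier instead of keyword lookup.
SYMPTOM_TERMS = {"flu", "fever", "cough"}


def is_health_mention(text: str) -> bool:
    """Stage 1, health mention classification: keep posts that name a symptom."""
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    return bool(tokens & SYMPTOM_TERMS)


def daily_mention_counts(posts: list[tuple[int, str]], num_days: int) -> list[int]:
    """Count the posts per day that pass the stage 1 filter."""
    counts = Counter(day for day, text in posts if is_health_mention(text))
    return [counts[day] for day in range(num_days)]


def detect_events(daily_counts: list[int], window: int = 3, k: float = 2.0) -> list[int]:
    """Stage 2, health event detection: flag days whose count exceeds
    mean + k * std of the preceding `window` days (an assumed, simple rule)."""
    alerts = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        threshold = mean(baseline) + k * pstdev(baseline)
        if daily_counts[day] > threshold:
            alerts.append(day)
    return alerts


# Usage: posts are (day index, text) pairs; the counts feed the detector.
posts = [(0, "bad cough"), (0, "hello"), (1, "Flu again")]
counts = daily_mention_counts(posts, num_days=2)
alerts = detect_events([1, 2, 1, 2, 5])
```

A keyword filter like this would also accept figurative uses of symptom words (for example "cabin fever"), which is one reason the first stage is usually treated as a classification problem rather than a lookup.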


Published In

ACM Computing Surveys, Volume 52, Issue 6
November 2020
806 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3368196
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2019
Accepted: 01 August 2019
Revised: 01 June 2019
Received: 01 August 2018
Published in CSUR Volume 52, Issue 6


Author Tags

  1. Epidemic intelligence
  2. natural language processing

Qualifiers

  • Survey
  • Research
  • Refereed

