research-article
Open access

SymptomID: A Framework for Rapid Symptom Identification in Pandemics Using News Reports

Published: 08 September 2021

Abstract

The ability to quickly learn the fundamentals of a new infectious disease, such as how it is transmitted, its incubation period, and its symptoms, is crucial in any novel pandemic. For instance, rapid identification of symptoms can enable interventions that dampen the spread of the disease. Traditionally, symptoms are learned from research publications associated with clinical studies. However, clinical studies are often slow and time-intensive, and the resulting delays can have dire consequences in a rapidly spreading pandemic, as we have seen with COVID-19. In this article, we introduce SymptomID, a modular artificial intelligence–based framework for rapid identification of symptoms associated with novel pandemics using publicly available news reports. SymptomID uses a state-of-the-art natural language processing model, BERT (Bidirectional Encoder Representations from Transformers), to extract symptoms from news reports, and then clusters related symptoms together to remove redundancy. The framework requires minimal training data because it builds on a pre-trained language model. We present a case study of SymptomID using news articles about the current COVID-19 pandemic. Our COVID-19 symptom extraction module, trained on 225 articles, achieves an F1 score of over 0.8. SymptomID correctly identifies both well-established symptoms (e.g., “fever” and “cough”) and less-prevalent symptoms (e.g., “rashes,” “hair loss,” and “brain fog”) associated with the novel coronavirus. We believe this framework can be extended and easily adapted in future pandemics to quickly learn insights that are fundamental to understanding and combating a new infectious disease.
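
To make the reported F1 score concrete, the following is a minimal sketch of how an extraction module of this kind could be scored. It is not the authors' evaluation code. It assumes that symptoms are tagged token by token with a BIO scheme (the labels B-SYM, I-SYM, and O are hypothetical), and that a predicted symptom counts as correct only when its span matches a gold span exactly. The abstract does not state either choice, and the example sentence and tags are invented for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect symptom spans from a BIO tag sequence.

    Assumed (hypothetical) labels: B-SYM opens a span, I-SYM continues it,
    anything else closes it. An I-SYM with no open span is ignored.
    """
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != "I-SYM":
            spans.add((start, i))
            start = None
        if tag == "B-SYM":
            start = i
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1: harmonic mean of span-level precision and recall."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example; not taken from the SymptomID data set.
    tokens = "Patients reported fever , dry cough and hair loss".split()
    gold_tags = ["O", "O", "B-SYM", "O", "B-SYM", "I-SYM", "O", "B-SYM", "I-SYM"]
    pred_tags = ["O", "O", "B-SYM", "O", "O", "B-SYM", "O", "B-SYM", "I-SYM"]

    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    for start, end in sorted(pred):
        print(" ".join(tokens[start:end]))
    print(f"F1 = {span_f1(gold, pred):.3f}")
```

In this invented example the prediction captures "cough" instead of "dry cough", so two of its three spans match the gold spans exactly; precision and recall are both 2/3, and so is F1. Exact matching is the stricter of the common scoring choices, and a partial-overlap criterion would give a higher score for the same prediction.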

Cited By

  • (2024) CafeLLM: Context-Aware Fine-Grained Semantic Clustering Using Large Language Models. Generalizing from Limited Resources in the Open World, 66-81. DOI: 10.1007/978-981-97-6125-8_6. Online publication date: 28-Jul-2024
  • (2023) A Human-in-the-Loop Segmented Mixed-Effects Modeling Method for Analyzing Wearables Data. ACM Transactions on Management Information Systems 14:2, 1-17. DOI: 10.1145/3564276. Online publication date: 25-Jan-2023
  • (2022) Learning the Morphological and Syntactic Grammars for Named Entity Recognition. Information 13:2, 49. DOI: 10.3390/info13020049. Online publication date: 20-Jan-2022
  • (2021) Introduction to the Special Section on Using AI and Data Science to Handle Pandemics and Related Disruptions. ACM Transactions on Management Information Systems 12:4, 1-2. DOI: 10.1145/3486969. Online publication date: 22-Oct-2021

Information

Published In

ACM Transactions on Management Information Systems, Volume 12, Issue 4
December 2021
225 pages
ISSN:2158-656X
EISSN:2158-6578
DOI:10.1145/3483349
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 September 2021
Accepted: 01 April 2021
Revised: 01 March 2021
Received: 01 September 2020
Published in TMIS Volume 12, Issue 4

Author Tags

  1. Symptom identification
  2. named entity extraction
  3. BERT
  4. novel pandemics
  5. COVID-19
  6. news articles

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Science Foundation (NSF)
