Abstract
Scientists are harnessing their multi-disciplinary expertise and resources to fight the COVID-19 pandemic. In this spirit, the Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19 related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the “COVID-19 Open Research Dataset” (CORD-19), which gathers 50,000+ full-text scientific articles related to coronaviruses. We report on the RDF dataset and software resources produced in this project by leveraging skills in knowledge representation, text, data and argument mining, as well as data visualization and exploration. The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and vocabularies from BioPortal, and (2) arguments extracted using ACTA, a tool that automates the extraction and visualization of argumentative graphs and is meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools based on the Corese Semantic Web platform, the MGExplorer visualization library, and Jupyter Notebook technology. Throughout this initiative, we have been engaged in discussions with healthcare and medical research institutes to align our approach with the actual needs of the biomedical community, and we have paid particular attention to complying with the goals of open and reproducible science and with the FAIR principles.
Notes
- 15.
PICO is a framework to answer health-care questions in evidence-based practice that comprises patients/population (P), intervention (I), control/comparison (C) and outcome (O).
- 16.
BERT is a self-attentive transformer model that uses language model (LM) pre-training to learn a task-independent understanding from vast amounts of text in an unsupervised fashion.
- 17.
The intervention and comparison labels are treated as one joint class.
- 24.
Inputs were tokenized with the BERT tokenizer, where one sub-word token has a length of one to three characters.
- 27.
Covid-on-the-Web dataset URI: http://ns.inria.fr/covid19/covidontheweb-1-1.
- 29.
ODC-By license: http://opendatacommons.org/licenses/by/1.0/.
- 30.
Covid Linked Data Visualizer can be tested at: http://covid19.i3s.unice.fr:8080.
- 32.
Dataframes are tabular data structures widely used in Python and R for data analysis.
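Note 24 states that inputs were tokenized with the BERT tokenizer. As a rough illustration of how such sub-word (WordPiece) tokenization works, the following Python sketch applies greedy longest-match-first splitting to a single word. The vocabulary TOY_VOCAB and the function name wordpiece_tokenize are hypothetical and chosen for illustration only. The real BERT tokenizer also normalizes text and splits on punctuation first and uses a vocabulary of roughly 30,000 entries, and nothing here reproduces the authors' actual pipeline.

```python
from typing import List, Set

# Hypothetical toy vocabulary (assumption, for illustration only).
# Pieces prefixed with "##" may only appear inside a word, never at its start.
TOY_VOCAB: Set[str] = {
    "[UNK]", "co", "##vid", "##rona", "##virus", "vir", "##us", "trial", "##s",
}


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Split one word into sub-word tokens by greedy longest-match-first."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No vocabulary piece fits: the whole word maps to the unknown token.
            return [unk]
        tokens.append(match)
        start = end
    return tokens


if __name__ == "__main__":
    for w in ["covid", "coronavirus", "trials", "xyz"]:
        print(w, wordpiece_tokenize(w, TOY_VOCAB))
```

With this toy vocabulary, wordpiece_tokenize("covid", TOY_VOCAB) yields ["co", "##vid"], and a word with no matching pieces, such as "xyz", maps to ["[UNK]"].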
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Michel, F. et al. (2020). Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_19
DOI: https://doi.org/10.1007/978-3-030-62466-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62465-1
Online ISBN: 978-3-030-62466-8
eBook Packages: Computer Science (R0)