Abstract
The biomedical entity linking (BM-EL) task, which aims to match biomedical mentions in articles to entities in a knowledge base (e.g., the Unified Medical Language System), has drawn considerable attention in recent years. BM-EL helps disambiguate medical terms and links them to rich semantic information in the biomedical knowledge base, making it an essential component of many downstream applications. Although entity linking has been investigated in the general domain and has achieved great success, many challenges remain in the biomedical field, for instance, highly complex terminology, limited training data, and entity ambiguity. In this survey, we categorize BM-EL methods into rule-based, machine learning, and deep learning models according to the development of the model paradigm and provide a comprehensive review of each approach. Based on an in-depth study of current BM-EL efforts, we group the model architectures into four categories: joint entity recognition and linking, graph-based global entity disambiguation, cross-lingual architectures, and model-efficiency improvement. We further introduce six well-established datasets that are commonly used for BM-EL tasks. We then compare the different methods and discuss their advantages and disadvantages. Finally, we discuss the limitations of existing BM-EL methods and outline promising future research directions.
Availability of supporting data
Not applicable.
Funding
No funding.
Author information
Contributions
Jiyun Shi: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing (original draft). Zhimeng Yuan: Methodology, Validation, Formal analysis, Visualization. Wenxuan Guo: Methodology, Validation, Formal analysis. Meihui Zhang: Review and editing, Project administration. Chen Ma: Review and editing. Jiehao Chen: Table preparation, Investigation, Data curation.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for Publication
Not applicable.
Competing interests
The authors declare that they have no known competing interests or personal relationships that might be perceived to influence the discussion reported in this paper.
Additional information
This article belongs to the Topical Collection: Special Issue on Knowledge-Graph-Enabled Methods and Applications for the Future Web. Guest Editors: Xin Wang, Jeff Pan, Qingpeng Zhang, Yuan-Fang Li.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, J., Yuan, Z., Guo, W. et al. Knowledge-graph-enabled biomedical entity linking: a survey. World Wide Web 26, 2593–2622 (2023). https://doi.org/10.1007/s11280-023-01144-4
DOI: https://doi.org/10.1007/s11280-023-01144-4