Abstract
This study proposes Statistical Character-Based Syntax Similarity (SCSS), a similarity measure for detecting biomedical syntax variations, intended for use by dictionary-based Named Entity Recognition (NER) approaches. NER for biomedical literature is the extraction and recognition of biomedical names. Of the several types of NER approaches, the dictionary-based approach is the most common. Given an unknown pattern, a dictionary-based approach searches a biomedical dictionary for the most similar patterns and assigns their biomedical types to the unknown pattern. Biomedical literature contains syntax variations, in which two different patterns refer to the same biomedical named entity, so a similarity function should support all possible syntax variations. There are three kinds of syntax variation: (i) character-level, (ii) word-level, and (iii) word order. SCSS is able to detect all three. The approach is evaluated using two measures, recall and precision, which are combined into a balanced F-score. The results are satisfactory: recall is 92.47% and precision is 96.7%, giving an F-score of 94.53%.
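The dictionary lookup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SCSS formula, so this is not the authors' method: it substitutes a plain Dice coefficient over per-word character bigrams, which tolerates small character-level and word-level differences and ignores word order. The function names, the 0.6 threshold, and the toy dictionary entries are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical toy dictionary mapping a known name to its biomedical type.
TOY_DICTIONARY: Dict[str, str] = {
    "protein kinase C": "protein",
    "interleukin 2": "protein",
    "IL-2 gene": "DNA",
}


def char_bigrams(name: str) -> Counter:
    """Count character bigrams per word, so word order does not matter."""
    grams: Counter = Counter()
    # Treat whitespace, hyphens, underscores and slashes as word separators.
    for word in re.split(r"[\s\-_/]+", name.lower()):
        if not word:
            continue
        padded = f"^{word}$"  # mark word boundaries
        for i in range(len(padded) - 1):
            grams[padded[i:i + 2]] += 1
    return grams


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over bigram multisets (a stand-in, not SCSS)."""
    grams_a, grams_b = char_bigrams(a), char_bigrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * overlap / total


def lookup(pattern: str, dictionary: Dict[str, str],
           threshold: float = 0.6) -> Optional[Tuple[str, str, float]]:
    """Return (entry, type, score) for the most similar entry, or None."""
    best_entry, best_score = None, 0.0
    for entry in dictionary:
        score = dice_similarity(pattern, entry)
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is None or best_score < threshold:
        return None
    return best_entry, dictionary[best_entry], best_score


if __name__ == "__main__":
    for query in ["interleukin-2", "kinase C protein", "xyz"]:
        print(query, "->", lookup(query, TOY_DICTIONARY))
```

In this sketch a hyphenated or reordered variant of a dictionary entry still matches that entry, while an unrelated string falls below the threshold and is left unlabelled. A real system would need a similarity function and threshold tuned on annotated data.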
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tohidi, H., Ibrahim, H., Azmi, M.A. (2011). Statistical Character-Based Syntax Similarity Measurement for Detecting Biomedical Syntax Variations through Named Entity Recognition. In: Fong, S. (eds) Networked Digital Technologies. NDT 2011. Communications in Computer and Information Science, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22185-9_15
DOI: https://doi.org/10.1007/978-3-642-22185-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22184-2
Online ISBN: 978-3-642-22185-9
eBook Packages: Computer Science, Computer Science (R0)