Statistical Character-Based Syntax Similarity Measurement for Detecting Biomedical Syntax Variations through Named Entity Recognition

  • Conference paper
Networked Digital Technologies (NDT 2011)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 136))


Abstract

This study proposes Statistical Character-Based Syntax Similarity (SCSS), an approach for detecting biomedical syntax variations in Named Entity Recognition (NER), intended for use by dictionary-based NER approaches. NER for biomedical literature is the extraction and recognition of biomedical names. Of the several types of NER approach, the dictionary-based one is the most common. Given an unknown pattern, a dictionary-based approach searches a biomedical dictionary for the most similar patterns and assigns their biomedical types to the unknown pattern. Biomedical literature contains syntax variations, in which two different patterns refer to the same biomedical named entity. A similarity function should therefore support all possible syntax variations. There are three kinds of syntax variation: (i) character-level, (ii) word-level, and (iii) word order. SCSS is able to detect all three. The approach is evaluated using two measures, recall and precision, which are combined into a balanced F-score. The results are satisfactory: recall is 92.47% and precision is 96.7%, giving an F-score of 94.53%.
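The balanced F-score mentioned in the abstract is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition (the abstract does not spell out the formula) and applies it to the reported figures. The function name `balanced_f_score` is illustrative and not taken from the paper.

```python
def balanced_f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard balanced F-score).

    Assumption: the paper uses this conventional definition; the abstract
    itself does not give the formula.
    """
    if precision + recall == 0:
        # Avoid division by zero when both measures are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, expressed as fractions.
reported_precision = 0.967
reported_recall = 0.9247

f_score = balanced_f_score(reported_precision, reported_recall)
print(f"Balanced F-score: {f_score:.2%}")
```

With the rounded inputs from the abstract this gives roughly 94.5%, which agrees with the reported 94.53% up to rounding of the published precision and recall.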

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tohidi, H., Ibrahim, H., Azmi, M.A. (2011). Statistical Character-Based Syntax Similarity Measurement for Detecting Biomedical Syntax Variations through Named Entity Recognition. In: Fong, S. (eds) Networked Digital Technologies. NDT 2011. Communications in Computer and Information Science, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22185-9_15

  • DOI: https://doi.org/10.1007/978-3-642-22185-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22184-2

  • Online ISBN: 978-3-642-22185-9

  • eBook Packages: Computer Science, Computer Science (R0)
