DOI: 10.1145/2110363.2110405
research-article

Semantic relatedness study using second order co-occurrence vectors computed from biomedical corpora, UMLS and WordNet

Published: 28 January 2012

Abstract

Automated measures of semantic relatedness are important for processing medical data effectively in tasks such as information retrieval and natural language processing. In this paper, we present a context vector approach that can compute the semantic relatedness between any pair of concepts in the Unified Medical Language System (UMLS). Our approach was developed on a corpus of inpatient clinical reports. We use 430 pairs of clinical concepts, manually rated for semantic relatedness, as the reference standard. The experiments demonstrate that incorporating a combination of UMLS and WordNet definitions can improve the measurement of semantic relatedness. The paper also shows that the second-order co-occurrence vector measure is more effective than path-based methods for semantic relatedness.
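To make the idea of a second-order co-occurrence vector concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes whitespace tokenization, raw co-occurrence counts within a fixed window, and cosine similarity; the function names, the window size, and the toy corpus and definitions are all hypothetical. Each concept is represented by summing the first-order co-occurrence vectors of the words in its definition, so two concepts can be related even when their definitions share no words. Combining UMLS and WordNet definitions is modeled here simply as concatenating two definition strings, which is an assumption.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """First-order vectors: for each word, count words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def second_order_vector(definition, first_order):
    """Second-order vector: sum of the first-order vectors of the definition's words."""
    total = Counter()
    for word in definition.lower().split():
        # Words never seen in the corpus contribute nothing.
        total.update(first_order.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def relatedness(def_a, def_b, first_order):
    """Relatedness of two concepts, each given as its (possibly combined) definition text."""
    return cosine(
        second_order_vector(def_a, first_order),
        second_order_vector(def_b, first_order),
    )


if __name__ == "__main__":
    # Toy stand-in for a clinical corpus (hypothetical sentences).
    corpus = [
        "patient reports chest pain and shortness of breath",
        "chest pain relieved by aspirin",
        "fracture of the left femur after a fall",
    ]
    first_order = cooccurrence_vectors(corpus, window=2)

    # Hypothetical definitions; the second string of each pair stands in for a WordNet gloss.
    angina = "chest pain" + " " + "pain relieved by aspirin"
    dyspnea = "shortness of breath" + " " + "breath"
    print(relatedness(angina, dyspnea, first_order))
```

In practice the first-order vectors would be built from a large corpus, typically with stop-word removal and frequency cutoffs, and the resulting scores would be compared against human ratings such as the reference standard described above.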

Cited By

  • (2023) Identifying the driving factors of word co-occurrence: a perspective of semantic relations. Scientometrics 128:12 (6471-6494). DOI: 10.1007/s11192-023-04851-x. Online publication date: 30-Oct-2023
  • (2022) Rel Topic: A graph-based semantic relatedness measure in topic ontologies and its applicability for topic labeling of old press articles. Semantic Web 14:2 (293-321). DOI: 10.3233/SW-222919. Online publication date: 15-Dec-2022
  • (2022) A vector-based semantic relatedness measure using multiple relations within SNOMED CT and UMLS. Journal of Biomedical Informatics 131:C. DOI: 10.1016/j.jbi.2022.104118. Online publication date: 1-Jul-2022

Published In

IHI '12: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
January 2012
914 pages
ISBN: 9781450307819
DOI: 10.1145/2110363
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computational linguistics
  2. semantic relatedness
  3. umls
  4. wordnet

Qualifiers

  • Research-article

Conference

IHI '12
IHI '12: ACM International Health Informatics Symposium
January 28 - 30, 2012
Miami, Florida, USA
