DOI: 10.1145/2506583.2506651
tutorial

Measuring Relatedness Between Scientific Entities in Annotation Datasets

Published: 22 September 2013

Abstract

Linked Open Data has made available a diverse range of scientific collections in which scientists have annotated entities with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge, which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities is a building block for graph pattern mining; e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., (i) string-similarity metrics, (ii) path-similarity metrics, and (iii) topological-similarity metrics, all of which measure relatedness within a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric, AnnSim, that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximum weighted bipartite matching, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can help explain potential novel patterns.
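The abstract models AnnSim as a 1-to-1 maximum weighted bipartite matching between the annotation sets of two entities. The Python sketch below illustrates that idea under several assumptions that are not taken from the paper: the term-to-term similarity `toy_sim` and its scores for the hypothetical terms `t1` to `t4` stand in for a real ontology-based measure; the normalization 2 · (matched weight) / (|A1| + |A2|) is an assumed Dice-style choice; and the exhaustive matcher is only suitable for tiny inputs, whereas the authors rely on existing matching solvers.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# A term-to-term similarity function returning a score in [0, 1].
Sim = Callable[[str, str], float]

# Hypothetical similarities between hypothetical ontology terms t1..t4;
# a real setting would use an ontology-based term similarity instead.
TOY_SCORES = {
    frozenset({"t1", "t3"}): 0.5,
    frozenset({"t2", "t3"}): 0.8,
    frozenset({"t2", "t4"}): 0.3,
}


def toy_sim(x: str, y: str) -> float:
    """Symmetric toy similarity: 1.0 for identical terms, else a table lookup."""
    if x == y:
        return 1.0
    return TOY_SCORES.get(frozenset({x, y}), 0.0)


def max_weight_matching(
    left: Sequence[str], right: Sequence[str], sim: Sim
) -> Tuple[float, List[Tuple[str, str]]]:
    """Exhaustive 1-to-1 maximum weighted bipartite matching (small inputs only)."""
    if len(left) > len(right):
        # Put the smaller side on the left, then flip the matched pairs back.
        weight, pairs = max_weight_matching(right, left, lambda x, y: sim(y, x))
        return weight, [(b, a) for a, b in pairs]
    best_weight: float = 0.0
    best_pairs: List[Tuple[str, str]] = []
    # With non-negative weights, some maximum matching covers every term on the
    # smaller side, so trying each injective assignment of left to right suffices.
    for chosen in permutations(right, len(left)):
        pairs = list(zip(left, chosen))
        weight = sum(sim(a, b) for a, b in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs


def annsim(ann1: Sequence[str], ann2: Sequence[str], sim: Sim) -> float:
    """AnnSim-style score: matched weight normalized by the total number of annotations."""
    if not ann1 and not ann2:
        return 0.0  # assumption: entities without annotations are treated as unrelated
    weight, _ = max_weight_matching(ann1, ann2, sim)
    # Assumed Dice-style normalization: 2 * weight / (|A1| + |A2|).
    return 2.0 * weight / (len(ann1) + len(ann2))


if __name__ == "__main__":
    gene1 = ["t1", "t2"]        # hypothetical annotations of entity 1
    gene2 = ["t1", "t3", "t4"]  # hypothetical annotations of entity 2
    print(round(annsim(gene1, gene2, toy_sim), 2))  # prints 0.72
```

Here the best 1-to-1 match pairs `t1` with `t1` (1.0) and `t2` with `t3` (0.8), so the matched weight is 1.8 and the normalized score is 2 · 1.8 / 5 = 0.72. The exhaustive search grows factorially with the number of annotations; a polynomial-time solver such as the Hungarian algorithm would replace it in practice.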




    Published In

    ACM Conferences
    BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
    September 2013
    987 pages
    ISBN:9781450324342
    DOI:10.1145/2506583
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Annotation datasets
    2. Annotation similarity
    3. topological distance
    4. weighted bipartite match

    Qualifiers

    • Tutorial
    • Research
    • Refereed limited

    Conference

    BCB'13: ACM-BCB2013
    September 22 - 25, 2013
    Washington DC, USA

    Acceptance Rates

    BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
    Overall Acceptance Rate 254 of 885 submissions, 29%


    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 09 Nov 2024.


    Cited By

    • (2019) Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs. Companion Proceedings of The 2019 World Wide Web Conference, pp. 690-699. DOI: 10.1145/3308560.3317703. Online publication date: 13 May 2019.
    • (2018) A Hybrid Approach for Measuring Similarity between Government Documents of China. Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, pp. 431-435. DOI: 10.1145/3297156.3297248. Online publication date: 8 Dec 2018.
    • (2016) Efficient Graph-Based Document Similarity. Proceedings of the 13th International Conference on The Semantic Web. Latest Advances and New Domains - Volume 9678, pp. 334-349. DOI: 10.1007/978-3-319-34129-3_21. Online publication date: 29 May 2016.
    • (2015) Proactive Annotation Management in Relational Databases. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 2017-2030. DOI: 10.1145/2723372.2749435. Online publication date: 27 May 2015.
    • (2015) Even Metadata is Getting Big. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1409-1414. DOI: 10.1145/2723372.2735355. Online publication date: 27 May 2015.
    • (2015) Determining similarity of scientific entities in annotation datasets. Database, 2015. DOI: 10.1093/database/bau123. Online publication date: 27 Feb 2015.
    • (2015) OnSim: A Similarity Measure for Determining Relatedness Between Ontology Terms. Data Integration in the Life Sciences, pp. 70-86. DOI: 10.1007/978-3-319-21843-4_6. Online publication date: 8 Jul 2015.
    • (2015) Exploiting Semantics from Ontologies to Enhance Accuracy of Similarity Measures. Proceedings of the 12th European Semantic Web Conference on The Semantic Web. Latest Advances and New Domains - Volume 9088, pp. 795-805. DOI: 10.1007/978-3-319-18818-8_52. Online publication date: 31 May 2015.
    • (2014) InsightNotes. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 661-672. DOI: 10.1145/2588555.2610501. Online publication date: 18 Jun 2014.
    • (2014) HeteSim: A General Framework for Relevance Measure in Heterogeneous Networks. IEEE Transactions on Knowledge and Data Engineering, 26(10):2479-2492. DOI: 10.1109/TKDE.2013.2297920. Online publication date: Oct 2014.
