research-article

Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty

Published: 01 May 2012 Publication History

Abstract

Motivation: Several measures have been recently proposed for quantifying the functional similarity between gene products according to well-structured controlled vocabularies where biological terms are organized in a tree or in a directed acyclic graph (DAG) structure. However, existing semantic similarity measures ignore two important facts. First, when calculating the similarity between two terms, they disregard the descendants of these terms. While this makes no difference when the ontology is a tree, we shall show that it has important consequences when the ontology is a DAG—this is the case, for example, with the Gene Ontology (GO). Second, existing similarity measures do not model the inherent uncertainty which comes from the fact that our current knowledge of the gene annotation and of the ontology structure is incomplete. Here, we propose a novel approach based on downward random walks that can be used to improve any of the existing similarity measures to exhibit these two properties. The approach is computationally efficient—random walks do not need to be simulated as we provide formulas to calculate their stationary distributions.
Results: To show that our approach can potentially improve any semantic similarity measure, we test it on six different semantic similarity measures: three commonly used measures by Resnik (1999), Lin (1998), and Jiang and Conrath (1997); and three recently proposed measures: simUI, simGIC by Pesquita et al. (2008); GraSM by Couto et al. (2007); and Couto and Silva (2011). We applied these improved measures to the GO annotations of the yeast Saccharomyces cerevisiae, and tested how they correlate with sequence similarity, mRNA co-expression and protein–protein interaction data. Our results consistently show that the use of downward random walks leads to more reliable similarity measures.
Availability: We have developed a suite of tools that implement existing semantic similarity measures and our improved measures based on random walks. The tools are implemented in Matlab and are freely available from: http://www.paccanarolab.org/papers/GOsim/
Supplementary information: Supplementary data are available at Bioinformatics online.
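
Illustration: the abstract names three commonly used measures (Resnik, Lin, and Jiang and Conrath) that the proposed approach builds on. The sketch below shows their standard textbook definitions, based on information content, on a small toy DAG. The ontology, the annotations and all function names are hypothetical and chosen only for illustration. It does not implement the downward random walk approach, whose formulas are not given in this abstract. Note that each measure looks only at the ancestors of the two terms, which is the limitation the abstract points out.

```python
import math

# Hypothetical toy ontology (child -> parents). It is a DAG, not a tree,
# because term "d" has two parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
}

# Hypothetical direct annotations: gene product -> set of terms.
ANNOTATIONS = {
    "g1": {"c"},
    "g2": {"d"},
    "g3": {"d"},
    "g4": {"b"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the fraction of gene products
    annotated to t or to any of its descendants."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            counts[t] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common)


def lin(t1, t2):
    """Resnik similarity normalised by the IC of the two terms."""
    denom = IC[t1] + IC[t2]
    # Convention for this sketch only: return 0.0 when both terms have IC 0.
    return 2 * resnik(t1, t2) / denom if denom > 0 else 0.0


def jiang_conrath_distance(t1, t2):
    """Jiang and Conrath distance; smaller means more similar."""
    return IC[t1] + IC[t2] - 2 * resnik(t1, t2)


if __name__ == "__main__":
    for pair in [("c", "d"), ("d", "d"), ("c", "b")]:
        print(pair, resnik(*pair), lin(*pair), jiang_conrath_distance(*pair))
```

In this toy example, terms "c" and "d" share the ancestor "a", so their Resnik similarity equals the information content of "a". Nothing in these three functions inspects what lies beneath "c" or "d" in the graph.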

Published In

Bioinformatics, Volume 28, Issue 10
May 2012
115 pages

Publisher

Oxford University Press, Inc.
United States
