DOI: 10.1145/1851476.1851548
research-article

Modeling sequence and function similarity between proteins for protein functional annotation

Published: 21 June 2010

Abstract

A common task in biological research is to predict the function of proteins by comparing the sequences of proteins of known and unknown function. This is often done using pairwise sequence alignment algorithms (e.g., BLAST). A problem with this approach is that it assumes a simple equivalence between a minimum sequence similarity threshold and the function similarity between proteins. This assumption rests on the binary concept of homology: proteins either are or are not homologous. The relationship between sequence and function, however, is more complex, and it is pertinent to predicting protein function, e.g., when evaluating BLAST alignments or developing training sets for profile models based on functional rather than homologous groupings. Our motivation for this study was to model sequence and function similarity between proteins to gain insight into the sequence-function similarity relationship for predicting function. Using our model, we found that function similarity generally increases with sequence similarity, but with a high degree of variability. This result has implications for pairwise approaches, in that sequence similarity apparently must be very high to ensure high function similarity. Profile models, which enable higher sensitivity, are a potential solution. However, multiple sequence alignments (a necessary prerequisite) are a problem: current algorithms have difficulty aligning sequences with very low sequence similarity, which is common in our data set, or are intractable for large numbers of sequences. Given the importance of predicting protein function and the need for multiple sequence alignments, algorithms for accomplishing this task should be further refined and developed.
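
The contrast between a single similarity threshold and a graded, variable relationship can be illustrated with a small sketch. The Python below is not the authors' model; it is a minimal illustration that assumes each protein pair has already been given a sequence similarity score and a function similarity score, both scaled to [0, 1]. The function name `summarize_by_bin`, the bin width, and the toy pairs are hypothetical and chosen only for demonstration. It groups pairs into sequence-similarity bins and reports the mean and spread of function similarity in each bin, which is one simple way to look at both the trend and the variability the abstract describes.

```python
from statistics import mean, pstdev


def summarize_by_bin(pairs, bin_width=0.25):
    """Summarize function similarity within sequence-similarity bins.

    `pairs` is an iterable of (sequence_similarity, function_similarity)
    tuples; both scores are assumed to lie in [0, 1]. Returns a dict that
    maps (bin_low, bin_high) to (mean, population std dev, pair count).
    """
    n_bins = round(1 / bin_width)
    bins = {}
    for seq_sim, func_sim in pairs:
        # Clamp the index so that seq_sim == 1.0 lands in the top bin.
        index = min(int(seq_sim / bin_width), n_bins - 1)
        bins.setdefault(index, []).append(func_sim)

    summary = {}
    for index in sorted(bins):
        values = bins[index]
        low = index * bin_width
        summary[(low, low + bin_width)] = (mean(values), pstdev(values), len(values))
    return summary


# Made-up toy pairs, for illustration only (not data from the paper).
toy_pairs = [
    (0.10, 0.20), (0.20, 0.60),
    (0.40, 0.30), (0.45, 0.70),
    (0.80, 0.80), (0.95, 1.00),
]

for (low, high), (avg, spread, count) in summarize_by_bin(toy_pairs).items():
    print(f"{low:.2f}-{high:.2f}: mean={avg:.2f} sd={spread:.2f} n={count}")
```

A per-bin spread that stays large until the highest bins would correspond to the situation described above, where only very high sequence similarity reliably implies high function similarity.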




      Published In

      HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
      June 2010
      911 pages
      ISBN: 9781605589428
      DOI: 10.1145/1851476

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. bioinformatics
      2. biostatistics
      3. multiple sequence alignment


      Conference

      HPDC '10

      Acceptance Rates

      Overall Acceptance Rate 166 of 966 submissions, 17%
