Article · RECOMB Conference Proceedings · DOI: 10.1145/640075.640087

An integrated probabilistic model for functional prediction of proteins

Published: 10 April 2003

Abstract

We develop an integrated probabilistic model that combines protein physical interactions, genetic interactions, networks of highly correlated gene expression, protein complex data, and the domain structures of individual proteins to predict protein functions. The model extends our previous model for protein function prediction, which is based on Markov random field theory. It is flexible: other pairwise relationships between proteins and other features of individual proteins can easily be incorporated. Two features distinguish the integrated approach from other available methods for protein function prediction. First, it uses all available sources of information, giving different weights to different sources of data, and it is a global approach that takes the whole network into consideration. Second, it assigns the posterior probability that a protein has the function of interest; this probability indicates how confident we are in assigning the function to the protein. We apply the integrated approach to predict functions of yeast proteins, based on the MIPS protein function classifications and on interaction networks built from MIPS physical and genetic interactions, gene expression profiles, Tandem Affinity Purification (TAP) protein complex data, and protein domain information. We study the sensitivity and specificity of the integrated approach with different sources of information using the leave-one-out approach. Compared with using MIPS physical interactions alone, the integrated approach that combines all of the information increases the sensitivity from 57% to 87% at a specificity of 57%, an increase of 30 percentage points. Enlarging the interaction network also greatly increases the number of proteins whose functions can be predicted.
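The core computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Following a Markov random field formulation, it assumes that the conditional log-odds that protein i has the function of interest, given the labels of all other proteins, is linear in the numbers of i's neighbours with and without the function, with one pair of weights per data source (physical interactions, co-expression, and so on) plus an additive per-protein score standing in for domain information. Annotated proteins keep their labels, and a Gibbs sampler (one of the author tags) repeatedly resamples the unannotated ones; the fraction of post-burn-in sweeps in which a protein is labelled 1 estimates its posterior probability. The function names (`build_network`, `conditional_prob`, `gibbs_posterior`), the toy proteins, and all parameter values are hypothetical; in practice the parameters would be estimated from the annotated proteins rather than fixed by hand.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple, Set

# A network maps each protein to the set of proteins it is linked to.
Network = Dict[str, Set[str]]


def build_network(edges: Iterable[Tuple[str, str]]) -> Network:
    """Build an undirected adjacency map from (protein, protein) pairs."""
    net: Network = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net


def conditional_prob(
    protein: str,
    labels: Dict[str, int],
    networks: List[Network],
    alpha: float,
    weights: List[Tuple[float, float]],
    feature_score: Dict[str, float],
) -> float:
    """P(X_i = 1 | labels of all other proteins), assuming a logistic form.

    logit = alpha + feature_score[i] + sum_k (w0_k * M0_k + w1_k * M1_k),
    where, in network k, M1_k counts neighbours labelled 1 and M0_k counts
    neighbours labelled 0. Every neighbour must have an entry in `labels`.
    The weights are hypothetical inputs, not values from the paper.
    """
    logit = alpha + feature_score.get(protein, 0.0)
    for net, (w0, w1) in zip(networks, weights):
        neighbours = net.get(protein, set())
        m1 = sum(labels[n] for n in neighbours)
        m0 = len(neighbours) - m1
        logit += w0 * m0 + w1 * m1
    return 1.0 / (1.0 + math.exp(-logit))


def gibbs_posterior(
    known: Dict[str, int],
    unknown: List[str],
    networks: List[Network],
    alpha: float,
    weights: List[Tuple[float, float]],
    feature_score: Dict[str, float],
    n_burn: int = 200,
    n_iter: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate P(X_i = 1 | known labels) for each unannotated protein."""
    rng = random.Random(seed)
    labels = dict(known)  # annotated proteins stay fixed throughout
    for p in unknown:
        labels[p] = rng.randint(0, 1)  # random initial state
    ones = {p: 0 for p in unknown}
    for sweep in range(n_burn + n_iter):
        # One sweep: resample each unannotated protein from its conditional.
        for p in unknown:
            prob = conditional_prob(p, labels, networks, alpha, weights, feature_score)
            labels[p] = 1 if rng.random() < prob else 0
        if sweep >= n_burn:  # count samples only after burn-in
            for p in unknown:
                ones[p] += labels[p]
    return {p: ones[p] / n_iter for p in unknown}


if __name__ == "__main__":
    # Toy data: two hypothetical networks over proteins A-F.
    physical = build_network([("A", "E"), ("B", "E"), ("C", "F")])
    coexpression = build_network([("D", "F"), ("E", "F")])
    posterior = gibbs_posterior(
        known={"A": 1, "B": 1, "C": 0, "D": 0},
        unknown=["E", "F"],
        networks=[physical, coexpression],
        alpha=-1.0,
        weights=[(-1.0, 2.0), (-0.5, 1.0)],  # (w0, w1) per network
        feature_score={"E": 0.5},  # e.g. a hypothetical domain-based log-odds
    )
    for protein, prob in sorted(posterior.items()):
        print(f"{protein}: {prob:.2f}")
```

In the toy data, E interacts physically with two annotated proteins that have the function and receives a high posterior, while F, whose annotated neighbours lack it, receives a low one. Because E and F are linked through co-expression, each one's sampled label feeds into the other's conditional, so information propagates through the whole network rather than only from directly annotated neighbours; this is one way to read the "global" property described above.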

        Published In

        RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular biology
        April 2003
        352 pages
        ISBN:1581136358
        DOI:10.1145/640075
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Gibbs sampler
        2. Markov random field
        3. function prediction
        4. Pfam domain
        5. protein-protein interaction

        Conference

        RECOMB03

        Acceptance Rates

        RECOMB '03 paper acceptance rate: 35 of 175 submissions (20%)
        Overall acceptance rate: 148 of 538 submissions (28%)

        Bibliometrics

        Article Metrics

        • Downloads (last 12 months): 9
        • Downloads (last 6 weeks): 0
        Reflects downloads up to 10 Oct 2024

        Cited By

        • (2013) Probabilistic Graphical Modeling in Systems Biology: A Framework for Integrative Approaches. Systems Biology, pp. 241-272. DOI: 10.1007/978-94-007-6803-1_8. Online publication date: 2013.
        • (2012) Exploiting label dependency for hierarchical multi-label classification. Proceedings of the 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Volume Part I, pp. 294-305. DOI: 10.1007/978-3-642-30217-6_25. Online publication date: 29 May 2012.
        • (2011) Multi-view prediction of protein function. Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 135-142. DOI: 10.1145/2147805.2147820. Online publication date: 1 Aug 2011.
        • (2011) Prediction of Protein Functions with Gene Ontology and Interspecies Protein Homology Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8(3): 775-784. DOI: 10.1109/TCBB.2010.15. Online publication date: May 2011.
        • (2010) Geo-clustering of Images with Missing GeoTags. Proceedings of the 2010 IEEE International Conference on Granular Computing, pp. 420-425. DOI: 10.1109/GrC.2010.76. Online publication date: 14 Aug 2010.
        • (2009) The use of gene ontology evidence codes in preventing classifier assessment bias. Bioinformatics 25(9): 1173-1177. DOI: 10.1093/bioinformatics/btp122. Online publication date: 1 May 2009.
        • (2009) Protein functional class prediction using global encoding of amino acid sequence. Journal of Theoretical Biology 261(2): 290-293. DOI: 10.1016/j.jtbi.2009.07.017. Online publication date: Nov 2009.
        • (2009) Protein functional class prediction with a combined graph. Expert Systems with Applications: An International Journal 36(2): 3284-3292. DOI: 10.1016/j.eswa.2008.01.006. Online publication date: 1 Mar 2009.
        • (2008) Systems Biology via Redescription and Ontologies (III). Proceedings of the 2008 IEEE International Conference on Bioinformatics and Biomedicine, pp. 278-283. DOI: 10.1109/BIBM.2008.82. Online publication date: 3 Nov 2008.
        • (2008) Integrative Protein Function Transfer Using Factor Graphs and Heterogeneous Data Sources. Proceedings of the 2008 IEEE International Conference on Bioinformatics and Biomedicine, pp. 314-318. DOI: 10.1109/BIBM.2008.65. Online publication date: 3 Nov 2008.
