DOI: 10.1145/369133.369168

On the predictive power of sequence similarity in yeast

Published: 22 April 2001

Abstract

Perhaps the most direct way to infer functional linkage of proteins is through structural similarity. However, structure determination lags behind DNA sequencing. Here we show that similarity between yeast ORFs, based on nucleotide sequences alone, indicates that the corresponding genes belong to the same functional group, have a similar gene expression pattern, or are involved in a protein-protein interaction. In particular, we compare the nucleotide sequences corresponding to the 6280 yeast ORFs using BLAST, and then cluster them using a simple neighbor-joining algorithm. This, in effect, gives us a hierarchical clustering with 53 levels, where higher levels have bigger clusters. We compare the clustering to large databases that are not based a priori on sequence information to assess how well our clustering correlates with these data. For functional annotation we use the SGD database, which gives one of 540 annotations for about half the yeast genes. For all pairs that appear within a cluster, we test the hypothesis that almost all genes within the same cluster have the same function. We get very high rates of correct annotation at the lower levels of the hierarchy, which decrease gradually at higher ones. From the results of the large-scale gene expression experiments we generate a list of pairs of genes whose expression is highly correlated (≥0.9). We then go over this list of pairs and count how many of them are contained within a cluster, as opposed to the number expected under a random model. We estimate the significance of our results using simulation, and get p-value ≪ 0.001 for all levels. The third type of data is obtained from the protein-protein interaction database, which is given as a list of pairs of proteins involved in an interaction. As before, we count how many pairs are contained within a cluster, and get much better results than expected under a random model, with p-values ≤ 0.001 for almost all levels. In summary, we show that successful functional predictions and functional annotations can be applied at a genomic scale. This can be achieved by combining a naive hierarchical clustering method, which creates sets of clusters at different levels of granularity, with statistical validation tools.
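
The pair-counting test described in the abstract can be illustrated with a small sketch. The abstract does not specify the random model or the simulation procedure, so the version below makes its own assumption: gene-to-cluster labels are shuffled, which preserves the cluster sizes at one level of the hierarchy. The function names, gene names, and toy data are hypothetical and are not taken from the paper.

```python
import random


def count_pairs_in_clusters(pairs, cluster_of):
    """Count pairs whose two genes carry the same cluster label."""
    return sum(
        1
        for a, b in pairs
        if a in cluster_of and b in cluster_of and cluster_of[a] == cluster_of[b]
    )


def empirical_p_value(pairs, cluster_of, n_sim=1000, seed=0):
    """Estimate how often a random labelling co-clusters at least as many pairs.

    Assumed random model (not from the paper): shuffle the cluster labels
    among the genes, keeping every cluster size fixed.
    """
    rng = random.Random(seed)
    genes = list(cluster_of)
    labels = [cluster_of[g] for g in genes]
    observed = count_pairs_in_clusters(pairs, cluster_of)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        shuffled = dict(zip(genes, labels))
        if count_pairs_in_clusters(pairs, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Toy data: two clusters of four hypothetical genes each.
toy_clusters = {f"g{i}": (0 if i <= 4 else 1) for i in range(1, 9)}
toy_pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g1", "g8")]
observed, p_value = empirical_p_value(toy_pairs, toy_clusters)
print(observed, p_value)
```

For a hierarchy with several levels, the same function would be called once per level with that level's cluster assignment. With only n_sim simulations, the smallest p-value this estimate can report is 1 / (n_sim + 1).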

Published In

RECOMB '01: Proceedings of the fifth annual international conference on Computational biology
April 2001
316 pages
ISBN: 1581133537
DOI: 10.1145/369133
Chairman: Thomas Lengauer
Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

RECOMB '01 Paper Acceptance Rate: 35 of 128 submissions, 27%
Overall Acceptance Rate: 148 of 538 submissions, 28%
