Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/974614.974658acmconferencesArticle/Chapter ViewAbstractPublication PagesrecombConference Proceedingsconference-collections
Article

A structural perspective on genome evolution

Published: 27 March 2004 Publication History

Abstract

At UCL we have developed several automated protocols for generating protein family resources (CATH; Gene3D). These resources can be used to perform comparative genome analyses in order to understand the evolution of protein families. Also to identify biologically and/or medically interesting families for which no structural data currently exists and which may therefore be important targets for structure genomics initiatives.The CATH domain structure database, established by Orengo and Thornton in 1993, now contains a significant proportion of protein structures from the PDB clustered into 1400 evolutionary families. Relationships have been identified using robust structure comparison methods (SSAP, CATHEDRAL). We have also benchmarked and optimised various 1D-profiles and HMM based protocols for assigning genome sequences to families within the resource (e.g. SAM-T99, SAMOSA, CATH-ISL).In this way we can assign structural data to a large proportion (up to 60%) of whole or partial sequences in completed genomes and >80% of genes coding for enzymes and other proteins in biochemical pathways. However, in order to include all families regardless of whether their structure is known or not, a new protein family resource has been developed (Gene3D). In Gene3D, complete genes have been clustered according to sequence similarity alone, using a robust clustering method (Pfscape). 120 completed genomes from all kingdoms have been clustered into 220,000 gene families, 70,000 of which contain 2 or more sequences. Subsequently, we have labelled those gene families for which CATH structural or Pfam functional domain annotations can be provided for all or part of the gene.Preliminary analysis of the genome annotations reveals that a significant proportion (up to 70%) of CATH annotated genes or gene regions in genomes are assigned to domain families that are common to all three kingdoms of life. However, only 20% of the genome sequences are assigned to gene families common to all kingdoms. Since a large proportion of these genes are multidomain proteins this supports the view that a great deal of functional diversity within the genomes has been achieved by combining domain modules in different ways.In collaboration with Professor Janet Thornton, we have analysed a subset of 56 bacterial genomes to determine the recurrence of specific domain structure families within the genomes. This revealed a small but essential group of universal, and in some cases, highly recurring domain families. For some size-dependent families, domain recurrence is highly correlated with increase in genome size, whilst in other size-independent families no correlation is observed. Statistical analysis allowed us to distinguish three groups. Within the size-dependent families we differentiated two groups: linearly-distributed and non-linearly-distributed. Functional annotation using the COGs revealed that these domains were predominantly involved in metabolism and regulation, respectively. Whilst a third group of Evenly-distributed size independent domains are primarily involved in protein translation and biosynthesis.By mapping CATH and Pfam domains families onto all the genome sequences in Gene3D we observe that a few hundred highly recurrent families are dominating at least 50% of whole or partial genome sequences. Many of these families are common to both prokaryotes and eukaryotes and are performing essential generic functions. In many of the largest families, significant divergence in sequence has been accompanied by modifications in structure and function. Targetting representatives in these families for structure determination will allow the structure genomics initiatives to map both fold and function space and reveal the mechanisms by which divergence in protein families promotes evolution of new functions.
  1. A structural perspective on genome evolution

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
    March 2004
    370 pages
    ISBN:1581137559
    DOI:10.1145/974614
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 March 2004

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Article

    Conference

    RECOMB04
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 148 of 538 submissions, 28%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 98
      Total Downloads
    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 10 Nov 2024

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media