
CATH Database



The CATH Protein Structure Classification database is a free, publicly available online resource that
provides information on the evolutionary relationships of protein domains. It was created in the
mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones,[2]
and continues to be developed by the Orengo group at University College London. CATH shares many
broad features with the SCOP resource; however, there are also many areas in which the detailed
classification differs greatly.[3][4][5][6]

CATH
Content
Description: Protein Structure Classification
Contact
Research center: University College London
Laboratory: Institute of Structural and Molecular Biology
Primary citation: Dawson et al. (2016) [1]
Release date: 1997
Access
Website: cathdb.info (http://cathdb.info)
Download URL: cathdb.info/download (http://cathdb.info/download)
Miscellaneous
Data release frequency: CATH-B is released daily. Official releases are approximately annual.
Version: 4.3

Hierarchical organization

Experimentally determined protein three-dimensional structures are obtained from the Protein Data Bank
and split into their consecutive polypeptide chains, where applicable. Protein domains are identified
within these chains using a mixture of automatic methods and manual curation.

The domains are then classified within the CATH structural hierarchy: at the Class (C) level, domains
are assigned according to their secondary structure content, i.e. all alpha, all beta, a mixture of
alpha and beta, or little secondary structure; at the Architecture (A) level, information on the
secondary structure arrangement in three-dimensional space is used for assignment; at the
Topology/fold (T) level, information on how the secondary structure elements are connected and
arranged is used; assignments are made to the Homologous superfamily (H) level if there is good
evidence that the domains are related by evolution,[2] i.e. they are homologous.

The four main levels of the CATH hierarchy:

#   Level                    Description
1   Class                    the overall secondary-structure content of the domain. (Equivalent to the SCOP Class)
2   Architecture             high structural similarity but no evidence of homology.
3   Topology/fold            a large-scale grouping of topologies which share particular structural features (Equivalent to the 'fold' level in SCOP)
4   Homologous superfamily   indicative of a demonstrable evolutionary relationship. (Equivalent to SCOP superfamily)
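
The four levels above are commonly written as a dotted numeric code, one number per level (for example 3.40.50.300). The following is a minimal Python sketch of how such a code might be split into its C, A, T and H parts. The names CathCode, parse_cath_code and CLASS_NAMES are hypothetical, and the class-number labels are an assumption based on CATH's usual numbering that should be checked against the current release.

```python
from dataclasses import dataclass

# Assumed class-number labels; verify against the current CATH release.
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

LEVELS = ("C", "A", "T", "H")


@dataclass(frozen=True)
class CathCode:
    """One number per hierarchy level: Class, Architecture, Topology, Homologous superfamily."""
    class_: int
    architecture: int
    topology: int
    superfamily: int

    @property
    def class_name(self) -> str:
        # Falls back to "Unknown" for class numbers not listed above.
        return CLASS_NAMES.get(self.class_, "Unknown")

    def at_level(self, level: str) -> str:
        """Return the code truncated at level 'C', 'A', 'T' or 'H'."""
        depth = LEVELS.index(level) + 1  # raises ValueError for any other level
        parts = (self.class_, self.architecture, self.topology, self.superfamily)
        return ".".join(str(p) for p in parts[:depth])


def parse_cath_code(code: str) -> CathCode:
    """Parse a four-part dotted code such as '3.40.50.300'."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))
```

Truncating a code with at_level("T"), for instance, gives the shared topology/fold prefix, which is one simple way to check whether two domains fall in the same fold but different homologous superfamilies.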
Additional sequence data for domains with no experimentally determined structures are provided by
CATH's sister resource, Gene3D, and are used to populate the homologous superfamilies. Protein
sequences from UniProtKB and Ensembl are scanned against CATH HMMs to predict domain sequence
boundaries and make homologous superfamily assignments.
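
To illustrate the assignment step only, the sketch below shows one simple way that overlapping HMM matches on a single sequence could be reduced to a set of non-overlapping domain assignments. It is a greedy, score-ordered heuristic chosen for brevity and is not presented as the method Gene3D actually uses. The Hit type, the resolve_hits function, the score field and the superfamily labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A hypothetical HMM match on one protein sequence."""
    superfamily: str  # label of the matched superfamily model
    start: int        # first residue of the match (1-based, inclusive)
    end: int          # last residue of the match (inclusive)
    score: float      # match score; higher is assumed to be better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two matches share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_hits(hits: list[Hit]) -> list[Hit]:
    """Greedily keep the best-scoring hits that do not overlap a kept hit."""
    chosen: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    # Report the predicted domains in sequence order.
    return sorted(chosen, key=lambda h: h.start)
```

Given hits covering residues 10-120 (score 85), 100-180 (score 40) and 130-220 (score 60), this sketch keeps the first and third and drops the second, because the second overlaps a higher-scoring match. A greedy pass like this does not guarantee the best total score, so it should be read as an illustration rather than a recommended procedure.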

Releases
The CATH team aim to provide official releases of the CATH classification every 12 months. This release
process is important because it allows for the provision of internal validation, extra annotations and
analysis. However, it can mean that there is a time delay between new structures appearing in the PDB and
the latest official CATH release.

To address this issue, CATH-B provides a limited amount of information on the very latest domain
annotations (e.g. domain boundaries and superfamily classifications).

The latest version of CATH-Gene3D (v4.3) was released in December 2020 and consists of:

500,238 structural protein domain entries [1]
151 million non-structural protein domain entries [1]
5,481 homologous superfamily entries [1]
212,872 functional family entries [1]

Open source software


CATH is an open source software project whose developers build and maintain a number of open
source tools.[7] CATH maintains a to-do list on GitHub so that external users can create and track
issues relating to the CATH protein structure classification.

References
1. Dawson, NL; Lewis, TE; Das, S; Lees, JG; Lee, D; Ashford, P; Orengo, CA; Sillitoe, I (28
November 2016). "CATH: an expanded resource to predict protein function through structure
and sequence" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210570). Nucleic Acids
Research. 45 (D1): D289–D295. doi:10.1093/nar/gkw1098
(https://doi.org/10.1093%2Fnar%2Fgkw1098). PMC 5210570
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210570). PMID 27899584
(https://pubmed.ncbi.nlm.nih.gov/27899584).
2. Orengo, CA; Michie, AD; Jones, S; Jones, DT; Swindells, MB; Thornton, JM (1997). "CATH –
a hierarchic classification of protein domain structures"
(https://doi.org/10.1016%2FS0969-2126%2897%2900260-8). Structure. 5 (8): 1093–1109.
doi:10.1016/S0969-2126(97)00260-8 (https://doi.org/10.1016%2FS0969-2126%2897%2900260-8).
ISSN 0969-2126 (https://www.worldcat.org/issn/0969-2126). PMID 9309224
(https://pubmed.ncbi.nlm.nih.gov/9309224).
3. "CATH: Protein Structure Classification Database at UCL" (http://www.cathdb.info).
Cathdb.info. Retrieved 9 March 2017.
4. "CATH" (http://www.cathdb.info/wiki/doku/?id=tutorials:index). Cathdb.info. Retrieved
9 March 2017.
5. "CATH Database (@CATHDatabase)" (https://twitter.com/CATHDatabase). Twitter.
Retrieved 9 March 2017.
6. Pearl, F. M. G. (2003). "The CATH database: an extended protein family resource for
structural and functional genomics" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC165509).
Nucleic Acids Research. 31 (1): 452–455. doi:10.1093/nar/gkg062
(https://doi.org/10.1093%2Fnar%2Fgkg062). ISSN 1362-4962
(https://www.worldcat.org/issn/1362-4962). PMC 165509
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC165509). PMID 12520050
(https://pubmed.ncbi.nlm.nih.gov/12520050).
7. "Tools" (http://www.cathdb.info/wiki/doku/?id=cath_tools). cathdb.info. Retrieved
18 December 2016.

Retrieved from "https://en.wikipedia.org/w/index.php?title=CATH_database&oldid=1166578676"
