PROTEIN BIOINFORMATICS
Course no: Biotech-628
Contents
CLASSIFICATION OF PROTEIN STRUCTURE
i. SCOP
ii. CATH
iii. FSSP
PREDICTION OF THREE DIMENSIONAL STRUCTURE OF PROTEIN
i. Homology Modelling
a) Template Selection
b) Sequence Alignment
c) Backbone Model Building
d) Loop Modelling
e) Side Chain Refinement
f) Model Refinement
g) Model Evaluation
ii. Threading/Fold recognition
a) Pairwise Energy Method
b) Profile Method
iii. Ab initio Prediction
CLASSIFICATION OF PROTEIN STRUCTURE
Proteins are biological molecules made of linear chains of amino acid residues that fold into
distinctive three-dimensional (3D) structures containing secondary structure elements; their
functions are governed by these 3D structures. Efforts to determine and visualize protein
structure have produced many experimental and theoretical approaches (Konno et al., 2019).

The Structural Classification of Proteins (SCOP) database provides a detailed description of the
relationships between known protein structures. Its classification is organized into hierarchical
levels: the first two, family and superfamily, describe close and distant evolutionary
relationships, respectively; the third, fold, describes geometrical relationships (Skariyachan &
Garka, 2018).

The most prominent protein structure classification schemes based on whole protein structures
are:

i. SCOP (Structural Classification Of Proteins)
ii. CATH (Class, Architecture, Topology, Homologous superfamily)
iii. FSSP (Fold classification based on Structure-Structure alignment of Proteins)

i. SCOP
• The SCOP database provides a comprehensive description of the structural and evolutionary
relationships between proteins of known structure. Its classification is hierarchical, with the
levels family, superfamily, fold and class (Gromiha, 2010).
• At the top level of the hierarchy, ‘class’ divides domains into four major classes (all-α,
all-β, α + β and α/β) according to the secondary structure content of the domain (Konno et
al., 2019).
• Class (secondary structure composition and arrangement): members of different folds are
placed in the same class based on their secondary structure content and the order in which
secondary structure elements (SSEs) occur. Domains of globular proteins are typically
classified into the following classes (Kumar et al., 2019):
o All α class: members are composed mainly of α-helices.
o All β class: members are composed mainly of β-sheets.
o α/β class: members contain interspersed α-helices and β-strands.
o α+β class: members contain largely segregated α-helices and β-strands.
o Multidomain class: members contain domains of different folds.
o Small proteins: members such as many disulfide-rich and metal-binding proteins, with few
or almost no regular secondary structures.
o Membrane proteins: this class contains membrane proteins.
• Fold (major structural similarity): superfamilies are grouped into the same fold if their
major SSEs have the same arrangement and the same topological connections. Structural
similarity among members of the same fold may arise simply from physicochemical properties
that favor certain packing arrangements and chain topologies, rather than from common ancestry.
• Superfamily (probable evolutionary relationship): families whose proteins show overall
structural similarity and, in many cases, functional similarity, suggesting a probable common
evolutionary origin, are grouped into one superfamily. Although functions are not always
conserved among the families of a superfamily, the nature of the functions together with the
topological and chemical equivalence of functional sites implies a possible evolutionary
relationship between the families.
• Family (clear evolutionary relationship): a family is a set of related protein regions that
share high sequence identity and usually clear functional and structural similarity. Most
members of a family share more than 30% sequence identity with one another. However, a
few SCOP families contain members with low sequence similarity, such as the globin family,
where sequence identity between members can be as low as 15%. Even so, all members show
a conserved overall structure and important functional residues in topologically equivalent
positions, implying divergence from a common ancestor (Kumar et al., 2019).
• SCOP provides a broad survey of all known protein folds, detailed information about the
close relatives of any particular protein, and a framework for future research and
classification.
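The four SCOP levels can be read as a lineage, and the deepest level two domains share indicates how closely they are related. The following is a minimal Python sketch of that idea, assuming each domain's class, fold, superfamily and family are already known; the `ScopLineage` record, the helper names and the labels are hypothetical and are not part of any SCOP file format or API.

```python
from dataclasses import dataclass
from typing import Optional

# SCOP levels ordered from most general to most specific.
LEVELS = ("class", "fold", "superfamily", "family")

@dataclass(frozen=True)
class ScopLineage:
    """Hypothetical record of where one domain sits in the SCOP hierarchy."""
    scop_class: str
    fold: str
    superfamily: str
    family: str

    def path(self):
        return (self.scop_class, self.fold, self.superfamily, self.family)

def deepest_shared_level(a: ScopLineage, b: ScopLineage) -> Optional[str]:
    """Most specific level at which two domains are classified together, or None."""
    shared = None
    for level, x, y in zip(LEVELS, a.path(), b.path()):
        if x != y:
            break  # below this point the lineages diverge
        shared = level
    return shared

# How each shared level is interpreted, following the level descriptions in the text.
INTERPRETATION = {
    None: "different classes",
    "class": "similar secondary structure content only",
    "fold": "same major SSE arrangement and topology; common origin not implied",
    "superfamily": "probable common evolutionary origin",
    "family": "clear evolutionary relationship",
}

# Illustrative labels only; not taken from an actual SCOP release.
d1 = ScopLineage("all-alpha", "fold-A", "superfamily-A1", "family-A1a")
d2 = ScopLineage("all-alpha", "fold-A", "superfamily-A1", "family-A1b")
print(INTERPRETATION[deepest_shared_level(d1, d2)])  # probable common evolutionary origin
```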

ii. CATH
• CATH is a hierarchical classification of protein domain structures derived by a
semi-automatic procedure. The database provides classification information for structures in
the PDB.
• The four main levels of CATH classification are protein class (C), architecture (A),
topology (T), and homologous superfamily (H).
• Class is the simplest level; it describes the secondary structure composition of each
domain.
• Architecture summarizes the overall shape formed by the orientations of the secondary
structure units, such as barrels and sandwiches.
• At the topology level, the sequential connectivity of the secondary structure elements is
taken into account, so members of the same architecture may have quite different topologies.
• Homologous superfamilies contain proteins with highly similar structures and functions
(Gromiha, 2010).
• CATH combines manual curation with automatic procedures, which makes the process less
subjective. For example, to define domains, CATH first relies on the consensus of three
different domain-recognition algorithms; when the programs disagree, human curators
intervene. In addition, the extra architecture level makes the CATH classification more
continuous. A disadvantage of the system is that fixed thresholds in structural comparison
can make some assignments less accurate (Xiong, 2006).
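CATH labels each domain with a four-part numeric code, one number per level (C.A.T.H), so two domains can be compared level by level. A minimal Python sketch, assuming codes are given as dot-separated strings; the helper functions are hypothetical, and the codes in the example are used only to illustrate the format.

```python
# CATH class numbers (the first field of a CATH code).
CATH_CLASSES = {1: "mainly alpha", 2: "mainly beta", 3: "alpha beta", 4: "few secondary structures"}
CATH_LEVELS = ("class", "architecture", "topology", "homologous superfamily")

def parse_cath(code):
    """Split a 'C.A.T.H' code string into its four integer fields."""
    fields = tuple(int(part) for part in code.split("."))
    if len(fields) != 4:
        raise ValueError(f"expected a four-field C.A.T.H code, got {code!r}")
    return fields

def shared_levels(code_a, code_b):
    """List the CATH levels, from the top down, at which two domains agree."""
    shared = []
    for level, x, y in zip(CATH_LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break  # deeper levels are not comparable once a higher level differs
        shared.append(level)
    return shared

print(CATH_CLASSES[parse_cath("3.40.50.300")[0]])    # alpha beta
print(shared_levels("3.40.50.300", "3.40.50.720"))  # ['class', 'architecture', 'topology']
```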

iii. FSSP
• In contrast to the semi-automatic CATH, the FSSP database is a fully automatic platform
that uses statistical Z-scores to measure structural similarity.
• Unlike the SCOP and CATH databases, FSSP uses the Dali program to compare a query
structure directly with other structures in the PDB.
• Based on the Z-score, FSSP ranks structures by three-dimensional similarity, with a higher
Z-score indicating greater structural similarity.
• Matches with a Z-score below 2 are considered to lack significant structural similarity
(Korasick & Jez, 2016).
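The ranking and cut-off described above can be sketched in Python. This is a simplification under stated assumptions: `z_scores` standardizes raw scores against their own mean and standard deviation, whereas Dali computes its Z-score against its own background model of expected scores, and the structure identifiers and Z values are invented for illustration.

```python
from statistics import mean, stdev

def z_scores(raw_scores):
    """Standardize raw similarity scores: z = (s - mean) / standard deviation."""
    mu, sd = mean(raw_scores), stdev(raw_scores)
    return [(s - mu) / sd for s in raw_scores]

def rank_matches(matches, threshold=2.0):
    """Sort (structure_id, z) pairs by decreasing Z and drop those below the threshold,
    which are treated as lacking structural similarity."""
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    return [m for m in ranked if m[1] >= threshold]

# Hypothetical hits for one query structure.
hits = [("struct_A", 1.5), ("struct_B", 8.2), ("struct_C", 3.1)]
print(rank_matches(hits))  # [('struct_B', 8.2), ('struct_C', 3.1)]
```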

PREDICTION OF THREE DIMENSIONAL STRUCTURE OF PROTEIN


Knowledge of a protein's three-dimensional (3D), or tertiary, structure is a basic requirement for
understanding its function. Currently, the major techniques for determining protein 3D structures
are X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.

In X-ray crystallography, the protein is crystallized and its structure is determined from X-ray
diffraction data. Determining a 3D structure by X-ray crystallography is not always
straightforward and can take as long as 3 to 5 years. NMR spectroscopy is another useful
technique for determining protein structure. Its advantage over X-ray crystallography is that the
protein is studied in an aqueous solution environment, which more closely resembles its
physiological condition. The major limitation of NMR is that it is suitable only for small
proteins of up to about 150 amino acids (Bhasin & Raghava, 2006).
The gap between the number of known protein sequences and the number of known protein
structures is increasing exponentially. There is therefore a need for computational techniques
to predict protein structures.

There are three computational approaches to protein three-dimensional structure modeling and
prediction: homology modeling, threading, and ab initio prediction.

The first two are knowledge-based methods; they predict protein structures based on existing
structural information in databases.

The ab initio approach is simulation-based and predicts structures from the physicochemical
principles governing protein folding, without using structural templates (Bhasin & Raghava,
2006).

i. Homology Modelling
Homology modeling consists of six major steps and one additional step:

a) Template Selection
Template selection involves searching the Protein Data Bank (PDB) for homologous proteins
of known structure, which can be done with a heuristic pairwise alignment search program
such as BLAST or FASTA.
Depending on the sequence identity between target and template, a match falls into one of
three zones: the midnight zone (less than 20% identity), the twilight zone (20% to 40%
identity) and the safe zone (40% or more identity).
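Template identity can be computed from an alignment and mapped onto the three zones. A minimal Python sketch, assuming identity is counted over columns where neither sequence has a gap (other definitions divide by the shorter sequence length or the full alignment length, so reported values can differ between tools).

```python
def percent_identity(aln_a, aln_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def identity_zone(pid):
    """Map percent identity onto the midnight, twilight and safe zones."""
    if pid < 20.0:
        return "midnight zone"
    if pid < 40.0:
        return "twilight zone"
    return "safe zone"

pid = percent_identity("ACDE-G", "ACDFHG")
print(pid, identity_zone(pid))  # 80.0 safe zone
```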

b) Sequence Alignment
Once the structure with the highest sequence similarity has been identified as a template, the
full-length sequences of the template and target proteins are realigned using refined
alignment algorithms to obtain the optimal alignment. Praline and T-Coffee are commonly
used for this sequence alignment step.
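Praline and T-Coffee are multiple alignment programs with their own scoring schemes. To show the underlying idea of finding an optimal alignment by dynamic programming, here is a minimal Python sketch of the classic Needleman-Wunsch global pairwise algorithm; the simple match/mismatch/gap scores are assumptions standing in for a substitution matrix and affine gap penalties, and this is not how Praline or T-Coffee are implemented.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b by dynamic programming.
    Returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the (n+1) x (m+1) score matrix; row/column 0 hold leading-gap penalties.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                          S[i - 1][j] + gap,            # gap in b
                          S[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), S[n][m]

print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```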

c) Backbone Model Building
After the optimal alignment is achieved, the coordinates of the corresponding fragments of
the template protein can simply be copied onto the target protein.
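In its simplest form, backbone building walks along the alignment and copies template coordinates to target residues that are aligned with a template residue. A minimal Python sketch, assuming one coordinate entry (here a single Cα position) per template residue; the function name and data layout are hypothetical. Target residues aligned with a gap get `None`, which corresponds to the holes that loop modeling must fill.

```python
def copy_backbone(target_aln, template_aln, template_coords):
    """Copy template coordinates onto aligned target residues.

    target_aln, template_aln -- gapped alignment strings of equal length ('-' = gap)
    template_coords          -- one coordinate entry per (ungapped) template residue
    Returns one entry per target residue: the copied coordinates, or None where the
    target residue is aligned to a gap and must be built later.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment strings must have equal length")
    model, t = [], 0  # t indexes template residues, skipping gap columns
    for q_res, t_res in zip(target_aln, template_aln):
        if q_res != "-":
            model.append(template_coords[t] if t_res != "-" else None)
        if t_res != "-":
            t += 1
    return model

# Hypothetical C-alpha coordinates for a four-residue template "AGKD".
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(copy_backbone("AC-DE", "AGKD-", template_ca))
# [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (11.4, 0.0, 0.0), None]
```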

d) Loop Modelling
In the sequence alignment used for modeling, insertions and deletions often produce gaps.
These gaps cannot be modeled directly, leaving “holes” in the model. Closing them requires
loop modeling, which is a very difficult problem in homology modeling and a major source of
error. Currently, two main methods are used to approach the problem: the database searching
method and the ab initio method. The database method involves finding “spare parts” from
known protein structures in a database that fit onto the two stem regions (the residues
flanking the gap) of the target protein. The ab initio method instead generates candidate loop
conformations and selects one with low energy and no steric clashes.
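A heavily simplified version of the database searching method filters stored fragments by length and picks the one whose end-to-end distance best matches the distance between the two stems. This Python sketch assumes a hypothetical fragment database of `(fragment_id, length, end_to_end_distance)` entries; real loop searches also superimpose the stem residues and check for steric clashes with the rest of the model.

```python
import math

def pick_loop(stem_n, stem_c, loop_len, fragment_db):
    """Choose a database loop to bridge a gap.

    stem_n, stem_c -- C-alpha coordinates of the residues flanking the gap (the stems)
    loop_len       -- number of missing residues
    fragment_db    -- hypothetical entries (fragment_id, length, end_to_end_distance),
                      the distance being measured between the fragment's own stem C-alphas
    """
    gap_span = math.dist(stem_n, stem_c)
    candidates = [f for f in fragment_db if f[1] == loop_len]
    if not candidates:
        return None  # nothing of the right length in the database
    # Prefer the fragment whose ends are spaced most like the target stems.
    return min(candidates, key=lambda f: abs(f[2] - gap_span))[0]

db = [("frag1", 5, 12.0), ("frag2", 5, 9.5), ("frag3", 4, 10.0)]
print(pick_loop((0.0, 0.0, 0.0), (6.0, 8.0, 0.0), 5, db))  # frag2
```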

e) Side Chain Refinement
After the main chain atoms are built, the positions of the side chains that are not yet
modeled must be determined. A side chain can be modeled by searching every possible
conformation at every torsion angle of the side chain and selecting the one with the lowest
interaction energy with neighboring atoms. Most current side chain prediction programs use
rotamers, which are favored side chain torsion angles extracted from known protein crystal
structures. A collection of preferred side chain conformations is called a rotamer library, in
which the rotamers are ranked by their frequency of occurrence. SCWRL is one specialized
side chain modeling program; it runs on UNIX.
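Rotamer-based side chain placement for a single residue reduces to trying each library rotamer and keeping the lowest-energy one. The Python sketch below assumes a hypothetical energy function and an illustrative three-rotamer library (the angles and frequencies are invented); programs such as SCWRL optimize all side chains together using a backbone-dependent rotamer library, which this sketch does not attempt.

```python
def choose_rotamer(library, energy):
    """Return the (chi_angles, frequency) entry with the lowest interaction energy;
    ties go to the rotamer observed more often."""
    return min(library, key=lambda rot: (energy(rot[0]), -rot[1]))

def toy_energy(chis, blocked_chi1=60.0, window=30.0):
    """Hypothetical energy: a neighbouring atom clashes with chi1 near `blocked_chi1`."""
    diff = abs((chis[0] - blocked_chi1 + 180.0) % 360.0 - 180.0)  # angular distance
    return 10.0 if diff < window else 0.0

# Illustrative three-rotamer library for one residue: (chi angles, frequency).
library = [((60.0,), 0.15), ((180.0,), 0.35), ((-60.0,), 0.50)]
print(choose_rotamer(library, toy_energy))  # ((-60.0,), 0.5)
```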

f) Model Refinement
In the loop modeling and side chain modeling steps, potential energy calculations are applied
to improve the model. Even so, modeling typically produces unfavorable bond lengths, bond
angles, torsion angles and contacts. It is therefore important to minimize the energy to
regularize local bond and angle geometry and to relax close contacts and geometric strain.
GROMOS is a UNIX program for molecular dynamics simulation.
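Energy minimization can be illustrated with steepest descent: every atom is repeatedly moved a small step against the gradient of the energy. The Python sketch below assumes a toy force field containing only harmonic bond stretching toward 1.53 Å (roughly a C-C single bond) with an arbitrary force constant; force fields such as GROMOS also include angle, torsion and non-bonded terms, and their minimizers are more elaborate.

```python
import math

def bond_energy(coords, b0=1.53, k=1.0):
    """Toy energy: harmonic bond stretching between consecutive atoms only."""
    return sum(k * (math.dist(coords[i], coords[i + 1]) - b0) ** 2
               for i in range(len(coords) - 1))

def steepest_descent(coords, b0=1.53, k=1.0, step=0.1, n_steps=200):
    """Move every atom a small step against the energy gradient, n_steps times."""
    x = [list(p) for p in coords]
    for _ in range(n_steps):
        grad = [[0.0, 0.0, 0.0] for _ in x]
        for i in range(len(x) - 1):
            d = [x[i + 1][a] - x[i][a] for a in range(3)]
            length = math.sqrt(sum(c * c for c in d))
            if length == 0.0:
                continue  # bond direction undefined; skip this bond
            # dE/dr_{i+1} = 2k(length - b0) * d / length; dE/dr_i is its negative.
            f = 2.0 * k * (length - b0) / length
            for a in range(3):
                grad[i + 1][a] += f * d[a]
                grad[i][a] -= f * d[a]
        for p, g in zip(x, grad):
            for a in range(3):
                p[a] -= step * g[a]
    return [tuple(p) for p in x]

start = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # one bond too long, one too short
relaxed = steepest_descent(start)
print(round(bond_energy(start), 4), round(bond_energy(relaxed), 6))  # 0.5018 0.0
```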

g) Model Evaluation
The final homology model should be evaluated to make sure that its structural features are
consistent with physicochemical rules. This involves checking for anomalies in φ–ψ angles,
bond lengths, close contacts, and so on. If structural irregularities are found, the region is
considered to contain errors and should be further refined. PROCHECK is a UNIX program
that checks general physicochemical parameters.
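Two of these checks can be sketched on a Cα trace: consecutive Cα atoms in trans peptides lie about 3.8 Å apart, and non-adjacent atoms should not be too close. The Python sketch below assumes a tolerance of 0.3 Å and a clash cutoff of 3.0 Å (both illustrative, not PROCHECK's criteria; cis peptides, with their shorter Cα spacing, would also be flagged). The `dihedral` helper computes φ or ψ from four backbone atoms, the first step before comparing angles against allowed Ramachandran regions.

```python
import math

def ca_trace_problems(ca, ideal=3.8, tol=0.3, clash=3.0):
    """Flag consecutive C-alpha pairs whose distance deviates from `ideal` by more than
    `tol` angstroms, and non-consecutive pairs closer than `clash` angstroms."""
    bad_links = [(i, i + 1) for i in range(len(ca) - 1)
                 if abs(math.dist(ca[i], ca[i + 1]) - ideal) > tol]
    clashes = [(i, j) for i in range(len(ca)) for j in range(i + 2, len(ca))
               if math.dist(ca[i], ca[j]) < clash]
    return bad_links, clashes

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points (C-N-CA-C gives phi, N-CA-C-N gives psi)."""
    def sub(u, v):
        return [a - b for a, b in zip(u, v)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(u, v):
        return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, [dot(b0, b1) * c for c in b1])
    w = sub(b2, [dot(b2, b1) * c for c in b1])
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

print(ca_trace_problems([(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (1.0, 1.2, 0)]))  # ([], [(0, 3)])
```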

ii. Threading/Fold recognition
By definition, threading or structural fold recognition predicts the structural fold of an
unknown protein sequence by fitting the sequence into a structural database and selecting the
best-fitting fold. The comparison emphasizes matching of secondary structures, which are the
most evolutionarily conserved. The algorithms can be classified into two categories: pairwise
energy based and profile based.

a) Pairwise Energy Method
In the pairwise energy based method, a protein sequence is searched against a structural fold
database to find the best matching structural fold using energy-based criteria. The detailed
procedure involves aligning the query sequence with each structural fold in a fold library.
The alignment is performed essentially at the sequence profile level, using dynamic
programming or heuristic approaches.
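The pairwise energy approach can be illustrated with a toy example: thread the query onto each fold, sum a pair potential over the residue pairs that the fold brings into contact, and keep the fold with the lowest energy. This Python sketch makes strong simplifying assumptions: a hypothetical fold library described only by contact lists, a crude hydrophobic contact potential instead of a statistically derived one, and gapless threading, whereas real methods place gaps with dynamic programming or heuristics.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic/polar split

def contact_energy(a, b):
    """Toy pairwise potential: -1 for a hydrophobic-hydrophobic contact, else 0."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

def threading_energy(seq, contacts):
    """Sum the pair potential over residue pairs that the fold brings into contact."""
    return sum(contact_energy(seq[i], seq[j]) for i, j in contacts)

def best_fold(seq, fold_library):
    """fold_library: name -> (fold length, list of (i, j) contacting positions).
    Only folds of the same length as seq are scored (gapless threading)."""
    scores = {name: threading_energy(seq, contacts)
              for name, (length, contacts) in fold_library.items()
              if length == len(seq)}
    if not scores:
        return None, scores
    return min(scores, key=scores.get), scores

# Hypothetical three-fold library.
library = {
    "fold_x": (6, [(0, 3), (2, 5)]),
    "fold_y": (6, [(1, 4), (0, 5)]),
    "fold_z": (8, [(0, 7)]),
}
print(best_fold("AKLVEI", library))  # ('fold_x', {'fold_x': -2.0, 'fold_y': -1.0})
```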
b) Profile Method
In the profile-based method, a profile is constructed for a group of related protein structures.
The structural profile is generated by superimposing the structures to reveal corresponding
residues; statistical information from these aligned residues is then used to build the profile.
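A structural profile can be sketched as position-specific log-odds scores computed from the residues that occupy each structurally equivalent position after superimposition. This Python sketch assumes the superimposition has already been done and is given as gap-free aligned sequences, uses a uniform background frequency and a pseudocount of 1 (both assumptions), and scores the query without gaps; real profile-based threading also encodes structural environment information.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_family, pseudocount=1.0):
    """Position-specific log-odds scores from gap-free, structurally aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    profile = []
    for column in zip(*aligned_family):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log(((column.count(aa) + pseudocount) / total) / background)
                        for aa in AMINO_ACIDS})
    return profile

def best_offset(query, profile):
    """Slide the profile along the query without gaps; return (best_score, offset)."""
    best = None
    for off in range(len(query) - len(profile) + 1):
        score = sum(col[query[off + k]] for k, col in enumerate(profile))
        if best is None or score > best[0]:
            best = (score, off)
    return best

family = ["ACDW", "ACEW", "GCDW"]  # hypothetical residues at superimposed positions
profile = build_profile(family)
print(best_offset("KKACDWKK", profile)[1])  # 2
```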

iii. Ab initio Prediction
When no suitable structural templates can be found, the ab initio strategy is used to predict
protein structure from sequence data alone. As the name suggests, ab initio prediction
attempts to produce all-atom protein models based on sequence information alone, without
the help of known protein structures.
One of the leading ab initio prediction methods is Rosetta. The basic idea of Rosetta is to
narrow the conformational search space using local structure predictions for short segments,
and to model the protein structure by assembling the local structures of these segments.
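The fragment assembly idea can be illustrated with a toy Metropolis Monte Carlo search: repeatedly replace the backbone torsion angles of a window with those of a fragment from a library, accepting changes that lower the energy and occasionally ones that raise it. This Python sketch is not Rosetta; the fragment library, the starting conformation and the scoring function (which simply rewards α-helical angles) are hypothetical stand-ins for Rosetta's fragment libraries and knowledge-based score.

```python
import math
import random

def fragment_assembly(n_res, fragments, energy, n_steps=2000, kT=1.0, seed=0):
    """Toy Monte Carlo fragment assembly.

    n_res     -- number of residues; a conformation is a list of (phi, psi) pairs in degrees
    fragments -- hypothetical library: dict start_index -> list of candidate fragments,
                 each fragment a list of (phi, psi) pairs
    energy    -- scoring function: conformation -> float (lower is better)
    """
    for start, frags in fragments.items():
        if any(start + len(f) > n_res for f in frags):
            raise ValueError(f"fragment at position {start} runs past the chain end")
    rng = random.Random(seed)
    conf = [(-120.0, 120.0)] * n_res  # arbitrary extended starting conformation
    e = energy(conf)
    best_conf, best_e = list(conf), e
    starts = sorted(fragments)
    for _ in range(n_steps):
        start = rng.choice(starts)
        frag = rng.choice(fragments[start])
        # Replace the torsion angles of one window with those of the chosen fragment.
        trial = conf[:start] + list(frag) + conf[start + len(frag):]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            conf, e = trial, e_trial
            if e < best_e:
                best_conf, best_e = list(conf), e
    return best_conf, best_e

def helix_energy(conf):
    """Hypothetical score: penalize deviation from roughly (-60, -45) degrees per residue."""
    return sum((1 - math.cos(math.radians(phi + 60))) + (1 - math.cos(math.radians(psi + 45)))
               for phi, psi in conf)

helix, extended = [(-60.0, -45.0)] * 3, [(-120.0, 120.0)] * 3
library = {start: [helix, extended] for start in range(4)}
best_conf, best_e = fragment_assembly(6, library, helix_energy)
```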

REFERENCES
Bhasin, M., & Raghava, G. P. S. (2006). 8 - Computational Methods in Genome Research. In D. K. Arora, R. M. Berka, & G. B. Singh (Eds.), Applied Mycology and Biotechnology (Vol. 6, pp. 179–207). Elsevier. https://doi.org/10.1016/S1874-5334(06)80011-0

Gromiha, M. M. (2010). Chapter 1 - Proteins. In M. M. Gromiha (Ed.), Protein Bioinformatics (pp. 1–27). Academic Press. https://doi.org/10.1016/B978-8-1312-2297-3.50001-1

Konno, S., Namiki, T., & Ishimori, K. (2019). Quantitative description and classification of protein structures by a novel robust amino acid network: Interaction selective network (ISN). Scientific Reports, 9(1), 16654. https://doi.org/10.1038/s41598-019-52766-6

Korasick, D. A., & Jez, J. M. (2016). Protein Domains: Structure, Function, and Methods. In R. A. Bradshaw & P. D. Stahl (Eds.), Encyclopedia of Cell Biology (pp. 91–97). Academic Press. https://doi.org/10.1016/B978-0-12-394447-4.10011-2

Kumar, P., Halder, S., & Bansal, M. (2019). Biomolecular Structures: Prediction, Identification and Analyses. In S. Ranganathan, M. Gribskov, K. Nakai, & C. Schönbach (Eds.), Encyclopedia of Bioinformatics and Computational Biology (pp. 504–534). Academic Press. https://doi.org/10.1016/B978-0-12-809633-8.20141-6

Skariyachan, S., & Garka, S. (2018). Chapter 1 - Exploring the binding potential of carbon nanotubes and fullerene towards major drug targets of multidrug resistant bacterial pathogens and their utility as novel therapeutic agents. In A. M. Grumezescu (Ed.), Fullerens, Graphenes and Nanotubes (pp. 1–29). William Andrew Publishing. https://doi.org/10.1016/B978-0-12-813691-1.00001-4

Xiong, J. (2006). Essential Bioinformatics. Cambridge University Press. https://doi.org/10.1017/CBO9780511806087
