Secondary Structure Motif

Motifs of Protein Structure

Their Diverse Functions Require Proteins to Have Irregular Structures

• Kendrew's model of the low-resolution structure of
myoglobin shown in three different views. The sausage-
shaped regions represent α helices, which are arranged in a
seemingly irregular manner to form a compact globular
molecule. (Courtesy of J.C. Kendrew.)
Types of Secondary Structure
• There are three common secondary structures in proteins,
namely alpha helices, beta sheets, and turns.
• That which cannot be classified as one of the standard
three classes is usually grouped into a category called
"other" or "random coil". This designation is unfortunate
as no portion of a protein’s three dimensional structure is
truly random and it is usually not a coil.
• A common element of most secondary structures is the
presence of characteristic hydrogen bonds e.g., C=O of
residue i to HN of residue i+4 (i, i+4). They are formed
when a number of consecutive residues have the same phi
and psi angles.
Helix
• In a helical conformation, the relationship of one peptide unit
to the next is the same for all alpha-carbons. This means that
the dihedral angle pairs phi and psi (phi_i, psi_i) are the same
for each residue in the helical conformation.
• Helices are classified as repetitive secondary structure since
their backbone phi and psi angles repeat.
• Two parameters describe the helix about its axis:
n - the number of residues per helical turn
r - the rise per helical residue
• By convention, a positive value of n denotes a right-handed
helix. (If you curl the fingers of your right hand along the helical
path, your thumb points in the direction in which the helix
advances if the helix is right-handed.)
Three Regular Polypeptide Helices: α-helix, 3₁₀ helix, π-helix

              phi      psi
α-helix      -57.8    -47.0
3₁₀-helix    -74.0     -4.0
π-helix      -57.1    -69.7

Idealized models of the conformations of polyalanine
are displayed.
The Alpha () Helix

• Main-chain N and O atoms are hydrogen-bonded to each other 


helices. There are 3.6 residues per turn in an  helix, which
corresponds to 5.4 Å (1.5 Å per residue).
The Alpha () Helix

The Side chains project


out from the alpha helix .
The Alpha Helix has a Dipole Moment
• Negatively charged groups such as phosphate ions frequently
bind to the amino ends of helices. The dipole moment of an
α helix as well as the possibility of hydrogen-bonding to free NH
groups at the end of the helix favors such binding.
• (a) The dipole of a peptide unit. Values in boxes give the
approximate fractional charges of the atoms of the peptide unit.
(b) The dipoles of peptide units are aligned along the α-helical
axis, which creates an overall dipole moment in the α helix,
positive at the amino end and negative at the carboxyl end.
The Helical
Wheels
• The helical wheel or
spiral. Amino acid
residues are plotted
every (360/3.6) 100°
around the spiral.
• Green is an amino acid
with a hydrophobic side
chain, blue is a polar
side chain, and red is a
charged side chain.
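The 100° spacing used by the helical wheel can be sketched in a few lines. This is only an illustration: the function name `helical_wheel_angles` and the example sequence are hypothetical, and the angle is simply the residue index times 360/3.6, taken modulo 360.

```python
def helical_wheel_angles(sequence, degrees_per_residue=100):
    """Return (residue, angle) pairs for a helical wheel.

    Each residue is placed 360/3.6 = 100 degrees after the previous one,
    and angles are wrapped into the range 0-359.
    """
    return [(aa, (i * degrees_per_residue) % 360)
            for i, aa in enumerate(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the layout.
    for aa, angle in helical_wheel_angles("AELMRSTG"):
        print(aa, angle)
```

With 3.6 residues per turn, the wheel returns to its starting angle after 18 residues (five full turns), which is why residues i, i+3/i+4 and i+7 fall on roughly the same face of the helix.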
Helix Wheels
Some Amino Acids are Preferred in a Helix
• Eight Most Common Residues as Helix Formers:
– Glu, Met, Ala, Leu, Lys, Phe, Gln, Trp
• Eight Least Common Residues as Helix Formers:
– Gly, Pro, Asn, Tyr, Cys, Ser, Thr, Arg
β-sheet
• Beta sheets are another major structural element in globular
proteins containing 20 – 28 % of all residues (Kabsch & Sander,
1983; Creighton, 1993).
• The basic unit of a beta sheet is a beta strand with approximate
backbone dihedral angles phi = -120 and psi = +120 producing a
translation of 3.2 to 3.4 Å/residue for residues in parallel
and anti-parallel strands, respectively.
• Due to the extended nature of the chain, there are no significant
intra-segment hydrogen bonds and van der Waals interactions
between atoms of neighboring residues. This extended
conformation is only stable as part of a beta sheet where
contributions from hydrogen bonds and van der Waals
interactions between aligned strands exert a stabilizing influence.
• The beta sheet is sometimes called the beta "pleated" sheet since
sequentially neighboring Cα atoms are alternately above and
below the plane of the sheet.
• Single β strands are rarely found in proteins because the
structure is not that much more stable than a random coil.
However, when two adjacent β strands line up they can form
bridges of hydrogen bonds. This creates a very stable structure
known as a β sheet.
Anti-parallel β sheet
• Main-chain NH and O atoms within a β sheet are hydrogen
bonded to each other. The amino acids in successive strands
have alternating directions (anti-parallel).
• A residue in an antiparallel beta strand has values of -139 and +135
degrees for the backbone dihedral angles phi and psi, respectively.
Antiparallel beta sheets are thought to be intrinsically more stable than
parallel sheets due to the more optimal orientation of the interstrand
hydrogen bonds and because peptide bond dipoles of nearest neighbors
within a strand cancel, whereas in the parallel sheet, components of
the dipoles parallel to the strands align and may interact unfavorably.
Parallel β sheet
• The amino acids in the aligned strands run in the same direction.
Twisted β Sheets in Thioredoxin
Twist of β Sheet
• The classical beta sheets originally proposed
are planar but most sheets observed in globular
proteins are twisted (0 to 30° per residue).
• Antiparallel beta sheets are more often twisted
than parallel sheets. This twist is always of the
same handedness, but unfortunately, it has
been described using two conflicting
conventions in the literature. If defined in
terms of the progressive twist of the hydrogen-
bonding direction, the twist is right-handed.
• Two-stranded beta sheets show the largest
twists.
Preferred Residues for β Sheet and Turns
• Eight most common residues for beta-sheet:
– Val, Ile, Tyr, Trp, Phe, Leu, Cys, Thr
• Eight most common residues for turns:
– Gly, Asn, Pro, Asp, Ser, Cys, Tyr, Lys
• Eight least common residues for beta-sheet:
– Glu, Asp, Pro, Ser, Lys, Gly, Ala, Asn
• Eight least common residues for turns:
– Ile, Val, Met, Leu, Phe, Ala, Glu, Trp
Loops
• In Leszczynski & Rose (1986), out of 67
proteins surveyed, they tabulated 26 % helix,
19% sheet, 26 % turns and 21 % in loops.
• These loop structures contain between 6 and 16
residues and are compact and globular in
structure. Like turns, they generally contain
polar residues and hence are predominantly at
the protein surface.
• When homologous amino acid sequences from different
species are compared, insertions and deletions of a
few residues occur almost exclusively in loop
regions.
• During evolution, cores are much more stable
than loops.
• Loop regions connecting two antiparallel beta
strands are called hairpin loops; short hairpin
loops are called reverse turns.
β-hairpin Loop
• Adjacent antiparallel β strands are joined by hairpin loops. Such loops are
frequently short and do not have regular secondary structure. Nevertheless,
many loop regions in different proteins have similar structures.
Schematic Structural Diagrams of Myoglobin
Richardson Diagrams

Myoglobin    Triosephosphate isomerase

• Cylinders for α helices; arrows for β strands, which give the
direction of the strand from N to C; and ribbons for the
remaining part.
Beta Sheet Topology Diagrams

transcarbamoylase    flavodoxin    plastocyanin

• Beta sheets are usually represented simply by arrows in topology
diagrams that show both the direction of each β strand and the way
the strands are connected to each other along the polypeptide chain.
Super Secondary Structures (Motifs)
• Simple combinations of a few secondary structure
elements with a specific geometric arrangement are
called super secondary structures or motifs.
• They may have functional and structural significance.
• Common motifs:
– Helix-turn-helix
– β-hairpin, β-meander
– β-barrel, Greek key

Helix-Turn-Helix Motif
• Ca binding: parvalbumin, calmodulin, troponin-C. This
Ca-binding motif was first found by Robert Kretsinger in
parvalbumin in 1973 (structure at 1.8 Å).
• Two α helices that are connected by a short loop region in a
specific geometric arrangement constitute a helix-turn-helix
motif. (a) the DNA-binding motif and (b) the calcium-
binding motif, which is present in many proteins whose
function is regulated by calcium.
EF-hand Calcium-binding Motif
• The calcium atom is bound to one of the motifs in the muscle
protein troponin-C through six oxygen atoms: one each from
the side chains of Asp (D) 9, Asn (N) 11, and Asp (D) 13; one
from the main chain of residue 15; and two from the side chain
of Glu (E) 20. In addition, a water molecule (W) is bound to
the calcium atom.
Amino Acid Sequences of EF-hand Motifs
[Figure: aligned EF-hand sequences; labeled positions 1, 3, 5, 7, 9, 12]
• The side chains of hydrophobic residues on the flanking helices
form a hydrophobic core between the α helices.
The β Hairpin Motif

Bovine Trypsin Inhibitor    Snake Venom Erabutoxin

• The hairpin motif is very frequent in β sheets and is built up
from two adjacent β strands that are joined by a loop region.
Greek Key Motif
• The Greek key motif is found in antiparallel sheets when
four adjacent β strands are arranged in the pattern shown as a
topology diagram in (a). The three dimensional structure of the
enzyme Staphylococcus Nuclease shown in (b) in blue and red
is also a Greek key motif.
Forming Greek Key Motif
• Suggested folding pathway from
a hairpin-like structure to the
Greek key motif.
• Beta strands 2 and 3 fold over
such that strand 2 is aligned
adjacent and antiparallel to
strand 1.
β-α-β Motif
• Two adjacent parallel β strands are usually connected by
an α helix from the C-terminus of strand 1 to the
N-terminus of strand 2.
• Most protein structures that contain parallel β sheets are
built up from combinations of such β-α-β motifs.
β-α-β Handedness
• The β-α-β motif can in principle have two "hands."
• (a) This connection with the helix above the sheet is found
in almost all proteins and is called right-handed because it
has the same hand as a right-handed α helix.
• (b) The left-handed connection with the helix below the
sheet.
Domain Organization
• Small protein molecules like the epidermal growth factor, EGF,
are comprised of only one domain. Others, like the serine
proteinase chymotrypsin, are arranged in two domains that are
required to form a functional unit. Many of the proteins that are
involved in blood coagulation and fibrinolysis have long
polypeptide chains that comprise different combinations of
domains.
Domains
• "Within a single subunit [polypeptide chain], contiguous portions
of the polypeptide chain frequently fold into compact, local semi-
independent units called domains." - Richardson, 1981
• Domains may be considered to be connected units, which are to
varying extents independent in terms of their structure, function
and folding behavior.
• Each domain can be described by its fold. While some proteins
consist of a single domain, others consist of several or many. A
number of globular protein chains consist of two or three domains
appearing as 'lobes'.
• In other cases, the domains may be of a very different nature. For
example, some proteins located in cell membranes have a
globular intracellular or extracellular domain distinct from that
which spans the membrane.
Adjacent Motifs
• Motifs that are adjacent in
the amino acid sequence
are also usually adjacent
in the three-dimensional
structure.
• Triose-phosphate isomerase
is built up from four
β-α-β-α motifs that are
consecutive both in the
amino acid sequence (a)
and in the three
dimensional structure (b).
Mosaic Proteins
• Mosaic proteins are those
which consist of many
repeated copies of one or a
few domains, all within one
polypeptide chain.
• Many extracellular proteins
are of this nature. The
domains in question are
termed modules and are
sometimes relatively small.
Note that this term is often
applied to sequences whose
structures may not be
known for certain.
PDB: An Information Portal to 106293 Biological
Macromolecular Structures
SCOP
• The unit of classification is usually the
protein domain.
• Classification:
• Family: the criterion is common evolutionary
origin, based on 1) proteins are clustered
together into families on the basis of 30% or
greater sequence residue identity; 2) proteins with
lower sequence identities but whose functions and
structures are very similar.
Superfamily
• Proteins with low sequence identities but whose
structural and, in many cases, functional features
suggest that a common evolutionary origin is
probable are placed together in
superfamilies.
• Common fold: superfamilies and families are
defined as having a common fold if the proteins
have the same major secondary structures in the same
arrangement and with the same topological
connections.
• Class: the different folds have been grouped
into classes.
• Five different classes:
• All Alpha
• All Beta
• Alpha/Beta
• Alpha and Beta
• Multidomain: those with domains of different
fold and for which no homologues are known at
present
CATH
CATH v4.0 based on PDB dated March 26, 2013
235,858 CATH Domains
2,738 CATH Superfamilies
69,058 Annotated PDBs
Side Chains and Tertiary
Structure
• At the level of tertiary structure the side
chain plays a much more active role in
creating the final structure.
• The final 3 D tertiary structure of a protein
is commonly referred to as its fold.
In papers that are among the most widely cited, yet least read, in the
field (partly owing to the difficulty of getting hold of them), Cyrus
Levinthal used a simple model to show that a typical polypeptide chain
cannot fold through an unbiased search of all conformational space on a
reasonable timescale. This is commonly referred to as "Levinthal's
paradox", and it led to the concept that proteins fold along discrete
pathways. The first paper presents this idea and is usually cited, but the
model is actually presented in the second one. Although the model was
later shown to be overly simplistic, the work had a crucial role in directing
the search for, and characterization of, intermediate states.
• Levinthal, C. Are there pathways for protein folding?
J. Chim. Phys. Physico-Chim. Biol. 65, 44-45 (1968).
• Levinthal, C. in Mossbauer Spectroscopy in Biological Systems, Proceedings
of a meeting held at Allerton House, Monticello, Illinois (eds P. Debrunner,
J. Tsibris & E. Munck) 22-24 (University of Illinois Press, Urbana, Illinois, 1969).
Levinthal Paradox
• Cyrus Levinthal, Columbia University, 1968
• Observed that there is insufficient time to
randomly search the entire conformational
space of a protein
• Resolution: Proteins have to fold through
some directed process
• Goal is to understand the dynamics of this
process
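The scale of the problem can be illustrated with a back-of-the-envelope calculation. The numbers below are assumptions chosen only for illustration and are not taken from the slides: 3 conformations per residue, a 100-residue chain, and 10^13 conformations sampled per second. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time_years(n_residues=100,
                                states_per_residue=3,
                                samples_per_second=1e13):
    """Years needed to visit every conformation once in a random search.

    All three default values are illustrative assumptions.
    """
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = levinthal_search_time_years()
    print(f"Exhaustive search would take about {years:.1e} years")
```

Under these assumptions the search time is astronomically longer than the age of the universe, whereas real proteins fold in milliseconds to seconds, so the search cannot be random.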
Even the once challenging Levinthal puzzle
now seems to have an answer—a protein
can avoid searching irrelevant
conformations and fold quickly by making
local independent decisions first, followed
by non-local global decisions later.
Each Protein has a unique structure
Folding funnel
• The ~30000 proteins encoded in the human
genome fold into >1000 unique structural
architectures
• Hydrophobic, electrostatic, and van der Waals
forces and hydrogen bonds
• A protein's native state corresponds to its
free energy minimum under the solution
conditions usually encountered in cells.

• Many human diseases are associated with the chronic expression of misfolded & damaged proteins
The current leading model of protein folding, proposed by Peter
Wolynes and colleagues in 1991, describes folding in terms of an
"energy landscape" that is funnel-shaped.
The edges of the funnel correspond to high-potential-energy
unfolded states.
An open question is how the protein chain avoids getting stuck
in "dips" in the funnel that are low in energy but are not
the native (lowest-energy) fold.
• Small naturally evolved proteins have been found
to fold in a "cooperative manner". That is, the
whole protein chain participates in the folding
process, and once the protein begins to fold the
transition from the unfolded to the folded state is
extremely rapid, and discrete intermediate stages
cannot be isolated. This specific cooperative
folding quality is seen in natural sequences and not
necessarily in laboratory-designed proteins
(Watters, 2007).
• Larger proteins (>100 residues) also appear to fold
cooperatively, but their complexity leads to slower
folding, often with clear transition states and a more
modular folding pattern. This introduces a new risk:
the partly folded protein could start to interact
improperly with other proteins, leading to the
formation of aggregates (Dobson, 2003).
• To counter this, cells employ molecular chaperones, which
help proteins attain the correct fold and avoid
aggregation (Hartl, 2002).
This seminal paper established that the primary amino
acid sequence of a protein is sufficient to determine its
secondary and tertiary structures. Christian Anfinsen was
awarded the Nobel Prize in 1972 for this work.
• Anfinsen, C.B., Haber, E., Sela, M. & White, F.H. The kinetics
of formation of native ribonuclease during oxidation of the
reduced polypeptide chain. Proc. Natl. Acad. Sci. USA 47,
1309-1314 (1961).
Proteins are capable of self-assembly
• The process by which a linear polypeptide chain
achieves its distinctive fold is known as
protein folding.
• The primary structure of a protein contains all the
information required for it to acquire the
correct fold.
• A dominant molecular interaction in the tertiary
structure of proteins is the hydrophobic effect
(Tanford, 1978).
• Secondary structural elements are critical to
the formation of the hydrophobic core.
• This is because, although residues with hydrophobic R groups
fit naturally into the core of the protein, their polar polypeptide
backbone does not. For the backbone to participate in the
hydrophobic core, its H-bonding groups must be satisfied such that
their polarity is, in effect, neutralized. Ordered secondary
structure elements provide this neutrality through their regular
H-bonding patterns.
Polar residues can be inside the core
• Polar residues may exist in the core. They are
subject to the same restrictions as the polypeptide
backbone: they must be involved in an
interaction that neutralizes their polarity.
Buried polar residues can form H bonds with
other polar residues, or with sections of
backbone not participating in regular secondary
structure formation.
Can a water molecule be part of a protein?
• In some proteins, small pockets exist in
which buried residues satisfy their H-bonds
with water molecules. These are
completely isolated from solvent and are
an integral part of the protein structure.
Fold Space
• Biochemical classification of folds:
globular, membrane and fibrous.
• Structural classification of folds (proteins grouped
by predominant secondary structure element).

Class                               Folds  Superfamilies  Families
All alpha proteins                    284            507       871
All beta proteins                     174            354       742
Alpha and beta proteins (a/b)         147            244       803
Alpha and beta proteins (a+b)         376            552      1055
Multi-domain proteins                  66             66        89
Membrane and cell surface proteins     58            110       123
Small proteins                         90            129       219
Total                                1195           1962      3902
Secondary Structure Prediction
• Use PSIPRED: the output is H, E and C,
with a confidence line consisting of digits (9
to 0) indicating the reliability (9 indicates
high reliability and 0 poor).
• The PredictProtein server is among the most
comprehensive sites for protein structure
analysis.
• The x-ray-determined structures of 15
proteins containing 2473 amino acid
residues were carefully examined, and the
number of occurrences of a given amino
acid in the α helix, β sheet, and coil was
tabulated.
• From this, the conformational parameters
for each amino acid were calculated by
considering the relative frequency of a
given amino acid within a protein, its
occurrence in a given type of secondary
structure, and the fraction of residues
occurring in that type of structure (Chou
and Fasman, 1974).
Chou and Fasman later extended the analysis of α helix, β sheet, and
coil to include 29 proteins of known x-ray structure. This increased
the total number of residues classified to 4741, or approximately
double the initial number (Chou and Fasman, 1978). The most
pronounced change occurred for Met. This change resulted from an
underrepresentation of Met in the initial 15 proteins examined. Less
pronounced changes were also seen in Asn, Asp, Ala, His, Gly, Ile,
Lys, and Tyr.
• An abbreviated set of rules follows (Fasman, 1985).

• 1. A cluster of four helical residues (Hα or hα) out of six along the protein sequence will
initiate a helix. The helical segment is extended in both directions until sets of tetrapeptide
breakers (<Pα> < 1.00) are reached.
• Proline cannot occur in the inner helix or at the C-terminal helical end but can occur
within the last three residues at the N-terminal end.
• The inner helix is defined as one omitting the three helical end residues at both the amino
and carboxyl ends.
• Any segment that is at least six residues long with <Pα> > 1.03 and <Pα> > <Pβ> is predicted
as helical.
• 2. A cluster of three β formers, or a cluster of three β formers out of five residues
along the sequence, will initiate a β sheet. The β sheet is propagated in both
directions until terminated by a set of tetrapeptide breakers (<Pβ> < 1.00).
• Any segment with <Pβ> > 1.05 as well as <Pβ> > <Pα> is predicted as β sheet.
Classification

• Eight states from DSSP
– H: α-helix
– G: 3₁₀ helix
– I: π-helix
– E: strand
– B: bridge
– T: turn
– S: bend
– C: coil

[Figure: excerpt of DSSP output]
24 26 E H < S+ 0 0 132
25 27 R H < S+ 0 0 125
26 28 N < 0 0 41
27 29 K 0 0 197
28 ! 0 0 0
29 34 C 0 0 73
30 35 I E -cd 58 89B 9
31 36 L E -cd 59 90B 2
32 37 V E -cd 60 91B 0
33 38 G E -cd 61 92B 0

• CASP Standard
– H = (H, G, I), E = (E, B), C = (C, T, S)
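The eight-to-three state reduction listed above can be sketched as a simple lookup. The names `DSSP_TO_3STATE` and `reduce_dssp` are hypothetical, and treating any unlisted symbol (for example a blank) as coil is an assumption of this sketch.

```python
# Mapping from the eight DSSP states to three states:
# H = (H, G, I), E = (E, B), C = (C, T, S).
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "C": "C", "T": "C", "S": "C",
}


def reduce_dssp(states):
    """Collapse a string of DSSP state letters to H/E/C.

    Assumption: any symbol not in the table (e.g. a blank) counts as coil.
    """
    return "".join(DSSP_TO_3STATE.get(s, "C") for s in states)


if __name__ == "__main__":
    print(reduce_dssp("HGIEBTSC"))
```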
What is secondary structure prediction?
• Given a protein sequence (primary structure)

GHWIATRGQLIREAYEDYRHFSSECPFIP

• Predict its secondary structure content
(C = coil, H = alpha helix, E = beta strand)

CEEEEECHHHHHHHHHHHCCCHHCCCCCC
Why secondary structure prediction?

o An easier problem than 3D structure prediction (more
than 40 years of history).
o Accurate secondary structure prediction can provide
important information for tertiary structure
prediction
o Protein function prediction
o Protein classification
o Predicting structural change
Prediction methods

o Statistical method
o Chou-Fasman method, GOR I-IV
o Nearest neighbors
o NNSSP, SSPAL
o Neural network
o PHD, Psi-Pred, J-Pred
o Support vector machine (SVM)
o HMM
Accuracy measure

• Three-state prediction accuracy: Q3

Q3 = (number of correctly predicted residues / total number of residues) × 100%

• A prediction of all loop: Q3 ~ 40%
• Correlation coefficients
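As a minimal sketch of the Q3 measure, the function below compares a predicted and an observed three-state string position by position. The function name `q3` and the example strings are hypothetical.

```python
def q3(predicted, observed):
    """Three-state accuracy: percentage of residues predicted correctly.

    Both arguments are equal-length strings over the alphabet H, E, C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical prediction vs. observed assignment (5 of 6 agree).
    print(round(q3("HHHEEC", "HHHCEC"), 1))
```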
Improvement of accuracy

1974  Chou & Fasman      ~50-53%
1978  Garnier            63%
1987  Zvelebil           66%
1988  Qian & Sejnowski   64.3%
1993  Rost & Sander      70.8-72.0%
1997  Frishman & Argos   <75%
1999  Cuff & Barton      72.9%
1999  Jones              76.5%
2000  Petersen et al.    77.9%
Name           P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        142    83     66     0.060  0.076   0.035   0.058
Arginine        98    93     95     0.070  0.106   0.099   0.085
Aspartic acid  101    54    146     0.147  0.110   0.179   0.081
Asparagine      67    89    156     0.161  0.083   0.191   0.091
Cysteine        70   119    119     0.149  0.050   0.117   0.128
Glutamic acid  151    37     74     0.056  0.060   0.077   0.064
Glutamine      111   110     98     0.074  0.098   0.037   0.098
Glycine         57    75    156     0.102  0.085   0.190   0.152
Histidine      100    87     95     0.140  0.047   0.093   0.054
Isoleucine     108   160     47     0.043  0.034   0.013   0.056
Leucine        121   130     59     0.061  0.025   0.036   0.070
Lysine         114    74    101     0.055  0.115   0.072   0.095
Methionine     145   105     60     0.068  0.082   0.014   0.055
Phenylalanine  113   138     60     0.059  0.041   0.065   0.065
Proline         57    55    152     0.102  0.301   0.034   0.068
Serine          77    75    143     0.120  0.139   0.125   0.106
Threonine       83   119     96     0.086  0.108   0.065   0.079
Tryptophan     108   137     96     0.077  0.013   0.064   0.167
Tyrosine        69   147    114     0.082  0.065   0.114   0.125
Valine         106   170     50     0.062  0.048   0.028   0.053
An intermediate but useful step is to predict the protein secondary
structure, that is, each residue of a protein sequence is assigned
a conformational state, either helix (H), strand (E) or coil (C). The
information provided by this assignment is valuable both in
ab initio tertiary structure prediction and as additional restraints
for fold recognition algorithms (Cuff and Barton, 2000). In addition,
it can also be used in protein function prediction (Paquet et al., 2000).
The Chou-Fasman method was among the first secondary
structure prediction algorithms developed and relies
predominantly on probability parameters determined from
relative frequencies of each amino acid's appearance in each
type of secondary structure (Chou and Fasman, 1974). The
original Chou-Fasman parameters, determined from the small
sample of structures solved in the mid-1970s, produce poor
results compared to modern methods, though the
parameterization has been updated since it was first published.
The Chou-Fasman method is roughly 56-60% accurate in
predicting secondary structures (Mount, 2004).
Chou-Fasman algorithm

• Helix, Strand
1. Scan for a window of 6 residues where the average score is > 1 (4 residues
for helix and 3 residues for strand)
2. Propagate in both directions until a 4 (or 3) residue window with mean
propensity < 1 is reached
3. Move forward and repeat
• Conflict resolution
Any region containing overlapping alpha-helical and beta-strand
assignments is taken to be helical if the average P(helix) > P(strand). It
is a beta strand if the average P(strand) > P(helix).
• Accuracy: ~50% to ~60%

GHWIATRGQLIREAYEDYRHFSSECPFIP
Initiation

Identify regions where 4/6 have a P(H) > 1.00: the "alpha-helix nucleus"
(P(H) values below are listed × 100)

      T   S   P   T   A    E    L    M    R   S   T   G
P(H)  69  77  57  69  142  151  121  145  98  77  69  57
Propagation

Extend helix in both directions until a set of
four residues have an average P(H) < 1.00.

      T   S   P   T   A    E    L    M    R   S   T   G
P(H)  69  77  57  69  142  151  121  145  98  77  69  57
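The initiation and propagation steps can be sketched on the example row above. This is one possible reading of the rules, not a reference implementation: the function names are hypothetical, the P(H) values are copied from the example (scaled × 100, so the threshold is 100), and propagation is interpreted as sliding a four-residue window outward one residue at a time until its mean drops below the threshold.

```python
def find_helix_nuclei(p, window=6, min_formers=4, threshold=100):
    """Return (start, end) index pairs of windows with enough helix formers.

    A window qualifies when at least `min_formers` of its `window`
    residues have P(H) above `threshold`. `end` is exclusive.
    """
    nuclei = []
    for start in range(len(p) - window + 1):
        formers = sum(1 for v in p[start:start + window] if v > threshold)
        if formers >= min_formers:
            nuclei.append((start, start + window))
    return nuclei


def extend_helix(p, start, end, probe=4, threshold=100):
    """Grow a nucleus in both directions (one interpretation of the rule).

    Growth stops on a side when the four-residue window that would include
    the next residue has a mean P(H) below `threshold`.
    """
    while start > 0:
        win = p[start - 1:start - 1 + probe]
        if sum(win) / len(win) < threshold:
            break
        start -= 1
    while end < len(p):
        win = p[max(0, end - probe + 1):end + 1]
        if sum(win) / len(win) < threshold:
            break
        end += 1
    return start, end


if __name__ == "__main__":
    seq = "TSPTAELMRSTG"
    p_h = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]
    nuclei = find_helix_nuclei(p_h)
    s, e = extend_helix(p_h, *nuclei[0])
    print(nuclei, seq[s:e])
```

A full predictor would also merge overlapping nuclei, apply the proline rules, and resolve helix/strand conflicts, all of which are omitted here.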
GOR Method
Named after the authors who originally developed it
(Garnier, Osguthorpe, and Robson, 1978).

• The method is based on a function called the information:

I(S, R) = ln( f_{S,R} N / (f_R f_S) )

where f_{S,R} is the frequency of amino acid R in secondary structure S
in the database, N is the number of amino acids in the database,
and f_R and f_S are the frequencies of R and S in the database,
respectively.
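The information function can be evaluated directly from database counts. The sketch below treats the "frequencies" as raw counts, which is an assumption consistent with the factor N in the formula; the function name and the example numbers are hypothetical.

```python
import math


def gor_information(f_sr, f_r, f_s, n):
    """I(S, R) = ln(f_SR * N / (f_R * f_S)), with all f given as counts.

    f_sr: count of residue type R observed in state S
    f_r:  count of residue type R in the database
    f_s:  count of residues in state S in the database
    n:    total number of residues in the database
    """
    return math.log(f_sr * n / (f_r * f_s))


if __name__ == "__main__":
    # Hypothetical counts: R occurs 100 times, 50 of them in state S,
    # while state S covers 250 of 1000 residues overall.
    print(gor_information(50, 100, 250, 1000))
```

A positive value means residue R is found in state S more often than expected by chance; zero means no preference.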
GOR predicts the secondary structure based on a window of 17 residues. For each
residue in the sequence, 8 N-terminal and 8 C-terminal positions are considered
along with the central residue.
A collection of sample proteins of known secondary structure was
analyzed, and the frequencies with which each amino acid occupied each
of the 17 window positions in helices, sheets and coils were tabulated.
Protein Structure Prediction
• Stage 1: Backbone
Prediction
– Ab initio folding
– Homology modeling
– Protein threading

• Stage 2: Loop
Modeling
• Stage 3: Side-Chain
Packing
• Stage 4: Structure
Refinement

p://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html
How accurate or reliable is homology modeling?
Homology models are classified into three zones in terms of their accuracy and
reliability.

• Midnight Zone: less than 20% sequence identity. The structure cannot
reliably be used as a template.

• Twilight Zone: 20% - 40% sequence identity. Sequence identity may
imply structural identity.

• Safe Zone: 40% or more sequence identity. It is very likely that sequence
identity implies structural identity.
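The three zones amount to a simple threshold rule, sketched below. The function name is hypothetical, and assigning exactly 20% to the Twilight Zone and exactly 40% to the Safe Zone is an assumption made to resolve the overlapping boundaries.

```python
def homology_zone(percent_identity):
    """Classify a target-template pair by percent sequence identity."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity < 20:
        return "Midnight Zone"
    if percent_identity < 40:
        return "Twilight Zone"
    return "Safe Zone"


if __name__ == "__main__":
    for identity in (15, 32, 55):
        print(identity, homology_zone(identity))
```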
Automatic Homology Modeling Servers
Server name   URL
3D-Jigsaw     http://www.bmm.icnet.uk/servers/3djigsaw/
CPHModels     http://www.cbs.dtu.dk/services/CPHmodels/
EsyPred3D     http://www.fundp.ac.be/urbm/bioinfo/esypred/
Robetta       http://robetta.bakerlab.org/
SwissModel    http://swissmodel.expasy.org/
TASSER-lite   http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html

Semi-Automatic Homology Modeling Servers (provide your own alignment)
HOMER         http://protein.cribi.unipd.it/homer/help.html
WHAT IF       http://swift.cmbi.kun.nl/WIWWWI/
Homology Modeling
• Identify a set of protein structures related to the
target protein
• Align the sequence of the target with the
sequences of the template protein.
• Model Building
• Loop Modeling
• Side Chain Modeling
• Evaluation of the Model
Assignment on YP_001856241.1
• Model the protein by homology modelling
(SWISS-MODEL): http://swissmodel.expasy.org/
Tutorial:
http://spdbv.vital-it.ch/TheMolecularLevel/Matics/index.html
• Model the protein by threading (Phyre):
http://www.sbg.bio.ic.ac.uk/phyre/
• [Flowchart: choosing a modelling route. Query protein sequence →
structure database search → clear homology? YES: homology modelling
(model generation with Modeller/WHAT IF), then evaluate and refine
models. NO: iterative search (BLAST/profiles), build MSA, secondary
structure prediction (Chou-Fasman, GOR, DSC, PHD) → clear homology?
NO: fold recognition (SEQ FOLD/SCOP), rank folds. Remaining box labels:
general activity, similar hits, alternative alignments.]
• Threading is used when:
• 1) the sequence has little or no primary sequence
similarity to any sequence with a known
structure
• 2) some model from the structure library
represents the true fold of the sequence.
• Threading is chosen because:
• Very remote relationships can be better
detected through 2D or 3D structural
homology instead of sequence homology.
Protein Threading

• A target sequence is threaded through the backbone


structure of a collection of template proteins (fold
library)
• Quantitative measure of how well the sequence fits
the fold
• Based on assumptions
– 3-D structures of proteins have characteristics that are
semi-quantitatively predictable
– reflect the physical-chemical properties of amino acids
– Limited types of interactions allowed within folding
Protein Threading
• The word threading implies that one drags the
sequence (ACDEFG...) step by step through each
location on each template
Protein Threading
Generalized Threading Score
• Want to correctly recognize arrangements of residues
• Building a score function
– potentials of mean force
– from an optimization calculation.
• G(r_AB) = -kT ln(ρ_AB / ρ_AB°)
– G, free energy
– k and T, Boltzmann's constant and temperature, respectively
– ρ_AB is the observed frequency of AB pairs at distance r.
– ρ_AB° is the frequency of AB pairs at distance r you would expect to see by
chance.
• Z-score = (E_nat - <E_alt>) / σ(E_alt)
– the energy of the native structure minus the mean energy of all the wrong
(alternative) structures, divided by the standard deviation of those energies
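Both quantities can be sketched numerically. The sketch uses the conventional inverse-Boltzmann sign, G = -kT ln(ρ/ρ°), so that pairs seen more often than chance receive a negative (favorable) energy. The function names are hypothetical, energies are in units of kT by default (an assumption), and the population standard deviation is used for σ.

```python
import math
import statistics


def pair_potential(rho_observed, rho_expected, kT=1.0):
    """Potential of mean force for an AB pair at a given distance.

    Returns -kT * ln(rho_observed / rho_expected); kT defaults to 1.0,
    i.e. the result is expressed in units of kT.
    """
    return -kT * math.log(rho_observed / rho_expected)


def z_score(e_native, e_alternatives):
    """(E_nat - mean(E_alt)) / std(E_alt) over the decoy energies."""
    mean_alt = statistics.mean(e_alternatives)
    sigma_alt = statistics.pstdev(e_alternatives)
    return (e_native - mean_alt) / sigma_alt


if __name__ == "__main__":
    # Hypothetical numbers: a pair seen twice as often as expected,
    # and a native energy well below a small set of decoy energies.
    print(pair_potential(2.0, 1.0))
    print(z_score(-10.0, [0.0, 2.0, -2.0, 4.0, -4.0]))
```

A strongly negative Z-score indicates that the native arrangement scores far better than the alternative folds.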
Scoring Different Folds

• Goodness of fit score


– Based on empirical energy
function
– Modify to take into account
pairwise interactions and
solvation terms
– High score means good fit
– Low score means nothing
learned
Some Threading Programs

• 3D-pssm (ICNET). Based on sequence profiles, solvation potentials and
secondary structure.
• TOPITS (PredictProtein server) (EMBL). Based on coincidence of secondary
structure and accessibility.
• UCLA-DOE Structure Prediction Server (UCLA). Executes various threading
programs and reports a consensus.
• 123D+. Combines substitution matrix, secondary structure prediction, and contact
capacity potentials.
• SAM/HMM (UCSC). Based on Markov models of alignments of crystallized proteins.
• FAS (Burnham Institute). Based on profile-profile matching algorithms of the query
sequence with sequences from a clustered PDB database.
• PSIPRED-GenThreader (Brunel)
• THREADER2 (Warwick). Based on solvation potentials and contacts obtained
from crystallized proteins.
• ProFIT CAME (Salzburg)
