Secondary Structure Motif
Helix type    phi (°)    psi (°)
α-helix       -57.8      -47.0
3₁₀-helix     -74.0      -4.0
π-helix       -57.1      -69.7
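As a rough illustration of how these ideal angles can be used, the sketch below assigns a (phi, psi) pair to the closest ideal helix type. The helper names `angle_diff` and `nearest_helix` and the simple angular distance are assumptions made for this example; this is not how a standard assignment program works.

```python
import math

# Ideal backbone dihedral angles (degrees) from the table above.
IDEAL_HELICES = {
    "alpha-helix": (-57.8, -47.0),
    "3-10-helix": (-74.0, -4.0),
    "pi-helix": (-57.1, -69.7),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_helix(phi, psi):
    """Return the helix type whose ideal (phi, psi) is closest (hypothetical helper)."""
    def distance(item):
        ideal_phi, ideal_psi = item[1]
        return math.hypot(angle_diff(phi, ideal_phi), angle_diff(psi, ideal_psi))
    return min(IDEAL_HELICES.items(), key=distance)[0]

print(nearest_helix(-60.0, -45.0))
print(nearest_helix(-70.0, -10.0))
```

A residue far from all three ideal points is still assigned to one of them, so a real classifier would also need a distance cutoff.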
In a parallel beta sheet, the amino acids in the aligned strands run in the same direction.
Twisted Sheets in Thioredoxin
Twist of Sheet
• The classical beta sheets originally proposed
are planar, but most sheets observed in globular
proteins are twisted (0 to 30° per residue).
• Antiparallel beta sheets are more often twisted
than parallel sheets. This twist is always of the
same handedness, but unfortunately, it has
been described using two conflicting
conventions in the literature. If defined in
terms of the progressive twist of the hydrogen-
bonding direction, the twist is right-handed.
• Two-stranded beta sheets show the largest
twists.
Preferred Residues for Sheet and Turns
• Eight most common residues for beta-sheet: Val, Ile, Tyr, Trp, Phe, Leu, Cys, Thr
• Eight most common residues for turns: Gly, Asn, Pro, Asp, Ser, Cys, Tyr, Lys
• Eight least common residues for beta-sheet: Glu, Asp, Pro, Ser, Lys, Gly, Ala, Asn
• Eight least common residues for turns: Ile, Val, Met, Leu, Phe, Ala, Glu, Trp
Loops
• Leszczynski & Rose (1986) surveyed 67
proteins and tabulated 26% helix,
19% sheet, 26% turns and 21% loops.
• These loop structures contain between 6 and 16
residues and are compact and globular in
structure. Like turns, they generally contain
polar residues and hence are predominantly at
the protein surface.
• When homologous amino acid sequences from
different species are compared, insertions and
deletions of a few residues occur almost
exclusively in loop regions.
• During evolution, cores are much more stable
than loops.
• Loop regions connecting two antiparallel beta
strands are called hairpin loops; short hairpin
loops are called reverse turns.
β-hairpin Loop
• Adjacent antiparallel strands are joined by hairpin loops. Such loops are
frequently short and do not have regular secondary structure. Nevertheless,
many loop regions in different proteins have similar structures.
Schematic Structural Diagrams of Myoglobin
Richardson Diagrams
Levinthal, C. In Mössbauer Spectroscopy in Biological Systems: Proceedings of a Meeting Held at Allerton House, Monticello, Illinois (eds P. Debrunner, J. Tsibris & E. Münck), 22-24 (University of Illinois Press, Urbana, Illinois, 1969).
Levinthal Paradox
• Cyrus Levinthal, Columbia University, 1968
• Observed that there is insufficient time to
randomly search the entire conformational
space of a protein
• Resolution: Proteins have to fold through
some directed process
• Goal is to understand the dynamics of this
process
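The size of the problem can be made concrete with a back-of-envelope calculation. The chain length, the number of conformations per residue and the sampling rate below are illustrative assumptions chosen for this sketch, not figures taken from the slide.

```python
# Back-of-envelope estimate of an exhaustive conformational search.
# All three numbers below are illustrative assumptions.
RESIDUES = 100               # length of a small protein
STATES_PER_RESIDUE = 3       # assumed backbone conformations per residue
SAMPLES_PER_SECOND = 1e13    # assumed rate of trying conformations

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(residues, states, rate):
    """Years needed to visit every conformation once at the given rate."""
    conformations = states ** residues
    return conformations / rate / SECONDS_PER_YEAR

years = exhaustive_search_years(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
print(f"{years:.1e} years")
```

Under these assumptions the search would take on the order of 10^27 years, which is why a purely random search cannot explain folding on biological timescales.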
Even the once-challenging Levinthal puzzle
now seems to have an answer: a protein
can avoid searching irrelevant
conformations and fold quickly by making
local, independent decisions first, followed
by non-local, global decisions later.
Each Protein has a unique structure
Folding funnel
The ~30,000 proteins encoded in the human
genome fold into >1000 unique structural
architectures
• Many human diseases are associated with the chronic expression of misfolded & damaged proteins
The current leading model of protein folding, from Peter
Wolynes and colleagues in 1991, describes folding on a
funnel-shaped “energy landscape”.
naturally into the core of the protein, their polar poly pep
• 1. A cluster of four helical residues (Hα or hα) out of six along the protein sequence will
initiate a helix. The helical segment is extended in both directions until sets of tetrapeptide
breakers (⟨Pα⟩ < 1.00) are reached.
• Proline cannot occur in the inner helix or at the C-terminal helical end but can occur
within the last three residues at the N-terminal end.
• The inner helix is defined as one omitting the three helical end residues at both the amino
and carboxyl ends.
• Any segment that is at least six residues long with ⟨Pα⟩ > 1.03 and ⟨Pα⟩ > ⟨Pβ⟩ is predicted
as helical.
• 2. A cluster of three beta formers, or a cluster of three beta formers out of five residues
along the sequence, will initiate a beta sheet. The beta sheet is propagated in both
directions until terminated by a set of tetrapeptide breakers (⟨Pβ⟩ < 1.00).
• Any segment with ⟨Pβ⟩ > 1.05 as well as ⟨Pβ⟩ > ⟨Pα⟩ is predicted as beta sheet.
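The two segment-level rules (average helix propensity above 1.03 over at least six residues, average sheet propensity above 1.05, and the larger average winning) can be sketched as a small function. The name `classify_segment`, the coil fallback and the example propensities are assumptions for illustration only.

```python
from statistics import mean

def classify_segment(p_alpha, p_beta):
    """Apply the segment-level thresholds to per-residue propensities.

    p_alpha, p_beta: helix and sheet propensities for one segment.
    Returns "H", "E", or "C" (coil, used here when neither rule fires).
    """
    avg_a, avg_b = mean(p_alpha), mean(p_beta)
    if len(p_alpha) >= 6 and avg_a > 1.03 and avg_a > avg_b:
        return "H"
    if avg_b > 1.05 and avg_b > avg_a:
        return "E"
    return "C"

# Made-up propensities for a six-residue segment (not a published table).
helix_like = [1.42, 1.51, 1.21, 1.45, 0.98, 1.16]
sheet_like = [0.83, 0.37, 1.30, 1.05, 0.93, 0.74]
print(classify_segment(helix_like, sheet_like))
```

A full predictor would first locate candidate segments by nucleation and extension; this function only covers the final averaging step.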
Classification
• CASP Standard
H = (H, G, I), E = (E, B), C = (C, T, S)
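The reduction from eight states to three is a plain lookup. The sketch below encodes the grouping given above; the function name `reduce_states` and the treatment of unknown symbols as coil are assumptions.

```python
# Grouping of 8-state codes into 3 states, as listed above.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "C": "C", "T": "C", "S": "C",
}

def reduce_states(ss8):
    """Map an 8-state string to a 3-state string.

    Symbols not in the table (for example a blank or '-') are mapped to
    coil; that fallback is an assumption, not part of the slide.
    """
    return "".join(EIGHT_TO_THREE.get(code, "C") for code in ss8)

print(reduce_states("CTTHHHHGGGSEEEEB"))
```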
What is secondary structure prediction?
• Given a protein sequence (primary structure)
  GHWIATRGQLIREAYEDYRHFSSECPFIP
• Predict its secondary structure (H = helix, E = strand, C = coil)
  CEEEEECHHHHHHHHHHHCCCHHCCCCCC
Why secondary structure prediction?
o Statistical method
   o Chou-Fasman method, GOR I-IV
o Nearest neighbors
   o NNSSP, SSPAL
o Neural network
   o PHD, Psi-Pred, J-Pred
o Support vector machine (SVM)
o HMM
Accuracy measure
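The slide does not name a specific measure. One widely used choice is the per-residue three-state accuracy, often called Q3: the percentage of residues whose predicted state (H, E or C) matches the observed state. A minimal sketch, with a hypothetical function name and made-up example strings:

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)

# Made-up 10-residue example: 8 of 10 positions agree.
print(q3("CEEEEECHHH", "CCEEEECHHC"))
```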
• Helix, Strand
1. Scan for window of 6 residues where average score > 1 (4 residues
for helix and 3 residues for strand)
2. Propagate in both directions until 4 (or 3) residue window with mean
propensity < 1
3. Move forward and repeat
• Conflict solution
Any region with overlapping alpha-helical and beta-strand
assignments is taken to be helical if the average P(helix) > P(strand), and
a beta strand if the average P(strand) > P(helix).
• Accuracy: ~50% to ~60%
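The scanning step can be sketched as a sliding window that counts formers. The version below reads the rule as "at least 4 formers in a window of 6" for helices and "at least 3 in a window of 5" for strands, and treats a former as any residue with propensity above 1; the function name and the example values are assumptions.

```python
def find_nuclei(propensities, window, min_formers, threshold=1.0):
    """Return start indices of windows that contain at least `min_formers`
    residues whose propensity exceeds `threshold`."""
    starts = []
    for i in range(len(propensities) - window + 1):
        formers = sum(p > threshold for p in propensities[i:i + window])
        if formers >= min_formers:
            starts.append(i)
    return starts

# Made-up propensities for a 10-residue stretch.
p_example = [0.6, 0.7, 1.4, 1.5, 1.2, 1.4, 0.9, 0.7, 0.6, 0.5]
print(find_nuclei(p_example, window=6, min_formers=4))  # helix-style rule
print(find_nuclei(p_example, window=5, min_formers=3))  # strand-style rule
```

Each reported start index marks a candidate nucleus that would then be extended in both directions.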
GHWIATRGQLIREAYEDYRHFSSECPFIP
Initiation
Residue  T   S   P   T   A    E    L    M    R   S   T   G
P(H)     69  77  57  69  142  151  121  145  98  77  69  57
Propagation
Residue  T   S   P   T   A    E    L    M    R   S   T   G
P(H)     69  77  57  69  142  151  121  145  98  77  69  57
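The propagation step can be traced on these numbers. The sketch below grows a nucleus one residue at a time and stops when the four-residue window at the growing end averages below the cutoff. The choice of starting nucleus, the exact window placement and the reading of the values as propensities multiplied by 100 are assumptions.

```python
from statistics import mean

# Residues and P(H) values copied from the table above. The values look
# like propensities multiplied by 100, so the cutoff 1.00 becomes 100
# here (an assumption about the scaling).
RESIDUES = "TSPTAELMRSTG"
P_HELIX = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]

def extend(props, start, end, window=4, threshold=100.0):
    """Grow the inclusive segment [start, end] outward one residue at a time.

    At each step the `window` residues ending (right side) or starting
    (left side) at the candidate position are averaged; growth stops when
    the mean drops below `threshold`. The starting segment must be at
    least window - 1 residues long.
    """
    while end + 1 < len(props) and mean(props[end + 2 - window:end + 2]) >= threshold:
        end += 1
    while start > 0 and mean(props[start - 1:start - 1 + window]) >= threshold:
        start -= 1
    return start, end

# Hypothetical nucleus: the six residues TAELMR (indices 3 to 8), four of
# which (A, E, L, M) have P(H) above 100.
start, end = extend(P_HELIX, 3, 8)
print(RESIDUES[start:end + 1])
```

With this window placement, any of the three six-residue windows that contain A, E, L and M grows to the same segment.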
GOR Method:
Named after the authors who originally developed it
(Garnier, Osguthorpe, and Robson, 1978)
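$I(S, R) = \ln\left( \frac{f_{S,R}\, N}{f_R\, f_S} \right)$
where $f_{S,R}$ is the number of times residue R is observed in conformation S, N is the total number of residues, and $f_R$ and $f_S$ are the numbers of occurrences of residue R and of conformation S, respectively.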
GOR predicts the secondary structure based on a window of 17 residues. For each
residue in the sequence, 8 N-terminal and 8 C-terminal positions are considered
along with the central residue.
A collection of sample proteins of known secondary structure was
analyzed, and the frequencies with which each amino acid occupied each
of the 17 window positions in helices, sheets and coils were determined.
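The information value can be computed directly from counts. The sketch below tabulates I(S, R) = ln(f_SR · N / (f_R · f_S)) for one window position at a time; calling it for offsets -8 to +8 gives the 17 tables described above. The function name, the `offset` parameter and the toy training data are assumptions, and the symbol meanings follow the usual reading of the formula.

```python
import math
from collections import Counter

def information_table(sequences, structures, offset=0):
    """Compute I(S, R) for the residue found `offset` positions away from
    the residue whose state S is being considered.

    sequences:  list of amino acid strings
    structures: list of equal-length state strings (for example H/E/C)
    Pairs that never occur together are omitted from the result.
    """
    pair_counts, res_counts, state_counts = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for i, state in enumerate(ss):
            j = i + offset
            if 0 <= j < len(seq):
                pair_counts[(state, seq[j])] += 1
                res_counts[seq[j]] += 1
                state_counts[state] += 1
    n = sum(pair_counts.values())
    return {
        (s, r): math.log(f_sr * n / (res_counts[r] * state_counts[s]))
        for (s, r), f_sr in pair_counts.items()
    }

# Toy training set (made up): alanine favours H, glycine favours C.
table = information_table(["AAAG"], ["HHCC"])
print(table[("H", "A")], table[("C", "A")])
```

Positive values mean the residue is seen in that state more often than chance would predict, negative values less often.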
Protein Structure Prediction
• Stage 1: Backbone Prediction
  – Ab initio folding
  – Homology modeling
  – Protein threading
• Stage 2: Loop Modeling
• Stage 3: Side-Chain Packing
• Stage 4: Structure Refinement
p://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html
How accurate or reliable is homology modeling?
Homology models are classified into 3 zones in terms of their accuracy and
reliability.
• Midnight Zone: less than 20% sequence identity. The structure cannot
reliably be used as a template.
• Safe Zone: 40% or more sequence identity. It is very likely that sequence
identity implies structural identity.
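The thresholds can be applied to a pairwise alignment as in the sketch below. Percent identity is computed here over columns without gaps, which is only one of several conventions in use. The function names and the example alignment are hypothetical, and the label for the 20% to 40% range is a placeholder because the slide text does not describe that zone.

```python
def sequence_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

def homology_zone(identity_percent):
    """Classify a template by sequence identity using the slide's thresholds."""
    if identity_percent < 20:
        return "midnight zone"
    if identity_percent >= 40:
        return "safe zone"
    return "intermediate zone"  # 20-40%: placeholder label, not from the slide

# Made-up alignment with one gap column.
identity = sequence_identity("GHWIATRG-QL", "GHWLATKGAQL")
print(identity, homology_zone(identity))
```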
Automatic Homology Modeling Servers
Server name    URL
3D-Jigsaw      http://www.bmm.icnet.uk/servers/3djigsaw/
CPHModels      http://www.cbs.dtu.dk/services/CPHmodels/
EsyPred3D      http://www.fundp.ac.be/urbm/bioinfo/esypred/
Robetta        http://robetta.bakerlab.org/
SwissModel     http://swissmodel.expasy.org/
TASSER-lite    http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html
Semi-Automatic Homology Modeling Servers (provide your own alignment)
Server name    URL
HOMER          http://protein.cribi.unipd.it/homer/help.html
WHAT IF        http://swift.cmbi.kun.nl/WIWWWI/
Homology Modeling
• Identify a set of protein structures related to the
target protein
• Align the sequence of the target with the
sequences of the template protein.
• Model Building
• Loop Modeling
• Side Chain Modeling
• Evaluation of the Model
Assignment on YP_001856241.1
• Build a model of the protein by homology modelling
(SWISS-MODEL): http://swissmodel.expasy.org/
• Tutorial: http://spdbv.vital-it.ch/TheMolecularLevel/Matics/index.html
[Flowchart: structure prediction workflow. Recoverable box labels: Iterative Search (BLAST/Profiles); Build MSA; Secondary Structure Prediction (Chou-Fasman, GOR, DSC, PHD); Clear Homology? (YES / NO); Model Generation (Modeller/WHAT IF); Fold Recognition (SEQ FOLD/SCOP); Rank Folds; Evaluate Models; Refine Models. Remaining labels: General, Alternative, Activity, Similar Hits, Alignments.]
• Threading is used when:
• 1) the sequence has little or no primary sequence
similarity to any sequence with a known
structure
• 2) some model from the structure library
represents the true fold of the sequence.
• Threading is chosen because :
• Very remote relationships can be better
detected through 2D or 3D structural
homology instead of sequence homology.
Protein Threading