Protein Structure Prediction

Chapter 3: Methods of Structure Prediction
Introduction to Proteins, Kessel and Ben-Tal, 2018

The 3D structure of proteins can be determined using lab methods:
• Diffraction methods:
  - X-ray diffraction/scattering
  - Neutron scattering
  - Electron microscopy/crystallography
• Spectroscopic methods:
  - Nuclear magnetic resonance (NMR) spectroscopy
  - Electron paramagnetic resonance (EPR) spectroscopy
  - Others: FTIR, Raman spectroscopy, circular dichroism, mass spectrometry

Why predict protein structure?
• Lab methods provide accurate 3D structures
• However:
  - They are slow
  - Some are expensive
  - Membrane proteins and large proteins are problematic
  - Thus, only a small fraction of known proteins have an experimentally solved structure
• Computational methods are less accurate, but faster and cheaper

Overview
• The physical (ab initio) approach
1. Force-field-based calculation of the potential energy
2. Configurational sampling
3. The mean-field approach
• The template-based (comparative) approach
1. Homology modelling
2. Fold recognition
• Integrative methods
• Experimentally guided computational prediction
• Evolutionary methods (correlated mutations)

The physical approach

• Assumptions:
1. Protein folding involves a decrease in free energy: $\Delta G = G_2 - G_1 < 0$, where $G_1$ and $G_2$ are the free energies of the unfolded and folded states
2. The native structure has the lowest free energy (global minimum)
3. The folding process can be followed, given:
  • An explicit description of the unfolded structure and its surroundings (water, ions, lipids)
  • A mathematical description of the total energy of the system in a given configuration (‘force field’)
  • An algorithm for sampling different configurations of the system, to find the lowest-energy (native) one

• Explicit system description: the protein, surrounded by explicit water molecules and ions (e.g., Na+ and Cl-)

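• Force-field-based calculation of the potential energy
  - Folding involves a drop in the free energy of the system: $\Delta G = G_2 - G_1 < 0$
  - The potential energy can approximate the free energy. At constant temperature:
    $\Delta G = \Delta H - T\Delta S_{sys}$, where $\Delta H = \Delta E + \Delta(PV)$ and $\Delta E = \Delta U + \Delta K$
    so that $\Delta G = \Delta U + \Delta K + \Delta(PV) - T\Delta S_{sys}$
  - In biological systems, the kinetic energy K is proportional to $k_BT$, so $\Delta K \approx 0$ at constant temperature, and $\Delta(PV)$ is negligible; hence $\Delta G \approx \Delta U - T\Delta S_{sys}$

  - U is calculated using a force field (molecular mechanics), which sums terms for:
    covalent bond lengths
    covalent bond angles
    covalent bond dihedral angles
    improper dihedral angles
    van der Waals (vdW) interactions
    electrostatic interactions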
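The force-field terms listed above can be illustrated with a minimal Python sketch. It assumes the functional forms commonly used in molecular-mechanics force fields (harmonic bonds, angles and improper dihedrals; a periodic cosine for dihedrals; a Lennard-Jones 12-6 term for vdW; Coulomb's law for electrostatics). All parameter values are hypothetical placeholders, not taken from any real force field such as CHARMM or AMBER, and a real U sums these terms over every bonded and non-bonded atom group in the system.

```python
import math

# Illustrative functional forms with hypothetical parameters.
# Units: kcal/mol, Å, radians, elementary charges.

def bond_energy(r, k_b, r0):
    """Covalent bond length: harmonic spring, k_b * (r - r0)^2."""
    return k_b * (r - r0) ** 2

def angle_energy(theta, k_theta, theta0):
    """Covalent bond angle: harmonic, k_theta * (theta - theta0)^2."""
    return k_theta * (theta - theta0) ** 2

def dihedral_energy(phi, k_phi, n, delta):
    """Dihedral (torsion) angle: periodic, k_phi * (1 + cos(n*phi - delta))."""
    return k_phi * (1.0 + math.cos(n * phi - delta))

def improper_energy(omega, k_omega, omega0):
    """Improper dihedral (keeps planar groups planar): harmonic, k_omega * (omega - omega0)^2."""
    return k_omega * (omega - omega0) ** 2

def vdw_energy(r, epsilon, sigma):
    """van der Waals: Lennard-Jones 12-6, 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb_energy(r, q_i, q_j, dielectric=1.0):
    """Electrostatics: Coulomb's law; 332.06 converts e^2/Å to kcal/mol."""
    return 332.06 * q_i * q_j / (dielectric * r)

# U for one hypothetical instance of each term
U = (bond_energy(1.55, 300.0, 1.53)
     + angle_energy(math.radians(112.0), 60.0, math.radians(109.5))
     + dihedral_energy(math.radians(60.0), 1.4, 3, 0.0)
     + improper_energy(math.radians(2.0), 10.0, 0.0)
     + vdw_energy(4.0, 0.1, 3.5)
     + coulomb_energy(4.0, 0.5, -0.5, dielectric=4.0))
print(f"U = {U:.3f} kcal/mol")
```

The bonded terms treat covalent geometry as springs around reference values, which is exactly the approximation criticized further on; the non-bonded vdW and electrostatic terms dominate the cost because they act between all atom pairs.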

• Configurational sampling
  - Option 1: sampling all possible configurations randomly and finding the lowest-energy one (impractical)
  - Option 2: energy minimization
    1. Calculate the potential energy of the unfolded protein
    2. Introduce small changes in the atomic positions and recalculate the energy
    3. If the energy is lower, accept the change and repeat steps 1-2; if it is higher, make another change until the energy is lower
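The three minimization steps can be sketched in Python on a toy two-variable energy surface that stands in for the protein's potential energy U. The function `toy_energy`, the step size and the number of trials are hypothetical choices; only downhill moves are accepted, as in step 3.

```python
import random

def toy_energy(x, y):
    """Hypothetical stand-in for U: a smooth bowl with its minimum at (1, -2)."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def random_minimize(energy, start, step=0.1, n_trials=20000, seed=0):
    """Greedy minimization by small random displacements (downhill moves only)."""
    rng = random.Random(seed)
    x, y = start
    e = energy(x, y)                          # step 1: energy of the starting configuration
    for _ in range(n_trials):
        x_new = x + rng.uniform(-step, step)  # step 2: small random change...
        y_new = y + rng.uniform(-step, step)
        e_new = energy(x_new, y_new)          # ...and recalculation
        if e_new < e:                         # step 3: keep only lower-energy changes
            x, y, e = x_new, y_new, e_new
    return (x, y), e

(x_min, y_min), e_min = random_minimize(toy_energy, start=(5.0, 5.0))
print(f"minimum near ({x_min:.2f}, {y_min:.2f}), energy {e_min:.4f}")
```

On this single-minimum surface the greedy search converges; on a real protein energy landscape it stops at the first local minimum it reaches, which is the limitation the following points address.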

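  - Problem: random changes in atomic positions may drive the system towards higher-energy configurations
  - Solution: the changes in atomic positions are predicted in response to the forces acting on the atoms (molecular dynamics, MD), via Newton's second law of motion:
    $\mathbf{F}_i = -\frac{\partial U}{\partial \mathbf{r}_i} = m_i\mathbf{a}_i = m_i\frac{d\mathbf{v}_i}{dt} = m_i\frac{d^2\mathbf{r}_i}{dt^2}$
    (m: mass; r: position; v: velocity; a: acceleration; t: time)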
  - Problem: protein folding involves local energy minima → the simulation gets stuck!
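The MD update described above (computing forces from U and advancing atomic positions with Newton's second law) can be sketched with the velocity Verlet integrator, one common MD integration scheme. This is a minimal sketch assuming a single particle in 1D on a harmonic "bond" potential U = ½k(x - x0)²; real MD codes apply the same update to all 3N atomic coordinates, with forces from the full force field.

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """F = -dU/dx for the toy potential U = 0.5 * k * (x - x0)^2."""
    return -k * (x - x0)

def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Integrate Newton's second law (a = F/m) with the velocity Verlet scheme."""
    trajectory = [x]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt ** 2   # new position from current velocity and acceleration
        a_new = force(x) / mass              # force (and acceleration) at the new position
        v = v + 0.5 * (a + a_new) * dt       # new velocity from the averaged acceleration
        a = a_new
        trajectory.append(x)
    return x, v, trajectory

# Unit mass and force constant, starting at rest from x = 1: the exact solution is x(t) = cos(t)
x_end, v_end, traj = velocity_verlet(1.0, 0.0, 1.0, harmonic_force, dt=0.01, n_steps=1000)
print(f"x(t=10) = {x_end:.4f}, total energy = {0.5 * v_end**2 + 0.5 * x_end**2:.6f}")
```

The time step must be small compared with the fastest motion in the system (in proteins, bond vibrations limit it to about 1-2 fs), which is one reason MD trajectories are short. Note that this integrator only follows the forces; by itself it does nothing to escape local minima.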

  - Solution: the kinetic barriers are overcome by virtual heating, which translates into increased atomic motion (simulated annealing)
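Simulated annealing can be sketched with the Metropolis criterion and a gradually decreasing virtual temperature. Note that this is a Monte Carlo variant rather than annealing inside an MD run (where heating acts through atomic velocities); the double-well function, the geometric cooling schedule and all parameters below are hypothetical choices for illustration.

```python
import math
import random

def double_well(x):
    """Hypothetical 1D energy landscape: a local minimum near x = +1 and
    the global minimum near x = -1, separated by a barrier near x = 0."""
    return (x ** 2 - 1.0) ** 2 + 0.3 * x

def simulated_annealing(energy, x0, t_start=2.0, t_end=0.01,
                        n_steps=20000, step=0.2, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    for i in range(n_steps):
        # geometric cooling: the virtual temperature drops from t_start to t_end
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        x_new = x + rng.uniform(-step, step)     # small random move
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / T), which lets the search cross barriers
        if e_new < e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
    return x, e

x_final, e_final = simulated_annealing(double_well, x0=1.0)
print(f"final x = {x_final:.2f}, energy = {e_final:.3f}")
```

Starting in the local minimum near x = +1, a purely downhill search would stay there. At high T the walker can cross the barrier, and as T drops it usually settles in the deeper well near x = -1. Because the outcome is stochastic, it is worth comparing several seeds.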
• Extensive sampling allows one to estimate ΔSsys, using $S = k_B\ln\Omega$, where Ω is the number of possible system configurations (so that $\Delta S_{sys} = k_B\ln(\Omega_2/\Omega_1)$)
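A worked example of $S = k_B\ln\Omega$, assuming a deliberately simplified, hypothetical model in which each of 100 residues has 3 accessible conformations when unfolded and a single one when folded. For molar quantities, k_B is replaced by the gas constant R.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol*K), the molar counterpart of k_B

def delta_s(omega_initial, omega_final):
    """Molar ΔS = R * ln(Ω_final / Ω_initial), following S = k_B ln Ω."""
    return R * (math.log(omega_final) - math.log(omega_initial))

# Hypothetical toy model: 100 residues, 3 conformations each when unfolded, 1 when folded
n_res, states_per_res = 100, 3
dS = delta_s(states_per_res ** n_res, 1)
T = 298.15
print(f"dS_sys = {dS:.3f} kcal/(mol*K), -T*dS_sys = {-T * dS:.1f} kcal/mol")
```

With these toy numbers, -TΔS_sys at about 298 K comes to roughly +65 kcal/mol: an unfavorable contribution that the drop in ΔU must outweigh for folding to occur. Taking logarithms separately keeps the calculation safe even when Ω is astronomically large.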
• MD usually cannot describe protein folding. Why?
  - Explicit models are computationally demanding → the simulations cover ns-μs, whereas folding typically takes ms
  - The potential energy is only an approximation of the free energy
  - Describing bonds as springs is a crude approximation
  - Solvation effects are not fully accounted for
  - The force constants are obtained by fitting calculations to empirical data

• So why use MD?
  - Refining models
  - Producing near-native structures
  - Exploring dynamic changes in proteins
  - Learning about the separate contributions to the total free energy
  - Deriving structure-function relationships
  - Engineering proteins

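• Implicit (mean-field) calculations
  - MD's biggest problem: describing all electrostatic interactions between fixed protein charges and mobile solvent charges (the solvation/polarization energy, $G_{elec}$)
  - Solution:
    Fixed protein charges are described by a charge distribution (ρ)
    Mobile solvent charges are described by the dielectric constant (ε)
    $G_{elec}$ is described as a function of ρ and the electrostatic potential (φ): $G_{elec} = \frac{1}{2}\int \rho(\mathbf{r})\,\phi(\mathbf{r})\,d\mathbf{r}$

  - The electrostatic potential φ is calculated by the Poisson equation:
    $\nabla\cdot[\varepsilon(\mathbf{r})\nabla\phi(\mathbf{r})] = -4\pi\rho(\mathbf{r})$
  - In the presence of mobile ions, the Poisson-Boltzmann equation is used (linearized form):
    $\nabla\cdot[\varepsilon(\mathbf{r})\nabla\phi(\mathbf{r})] - \bar{\kappa}^2(\mathbf{r})\,\phi(\mathbf{r}) = -4\pi\rho(\mathbf{r})$
    where $\bar{\kappa}^2$ is proportional to I, the ionic strength of the system
  - The total nonpolar energy (hydrophobic + vdW) is derived using an empirical surface-area dependency:
    $\Delta G_{np} = -\gamma\,\Delta A$, where ΔA is the change in buried surface area upon folding and γ > 0 is an empirical coefficient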
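A minimal numerical sketch of the linearized Poisson-Boltzmann equation, reduced to one dimension with a uniform dielectric constant and fixed potentials at both ends. It uses the same Gaussian-unit form as the equations above, a finite-difference grid, and the Thomas algorithm for the resulting tridiagonal system; setting `kappa_bar_sq = 0` recovers the Poisson equation. Real PB solvers (e.g., APBS or DelPhi) work on 3D grids with a position-dependent ε; the function name, grid and units here are hypothetical.

```python
import math

def solve_linear_pb_1d(length, n, epsilon, kappa_bar_sq, rho, phi_left, phi_right):
    """Solve eps * phi'' - kappa_bar^2 * phi = -4*pi*rho(x) on [0, length].

    Uniform dielectric eps, fixed potentials at both ends, n interior grid points.
    kappa_bar_sq = 0 gives the Poisson equation; kappa_bar_sq > 0 adds ionic screening.
    """
    h = length / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # Central differences give a tridiagonal system a*phi[i-1] + b*phi[i] + c*phi[i+1] = d[i]
    a = c = epsilon / h ** 2
    b = -2.0 * epsilon / h ** 2 - kappa_bar_sq
    d = [-4.0 * math.pi * rho(x) for x in xs]
    d[0] -= a * phi_left          # move known boundary values to the right-hand side
    d[-1] -= c * phi_right
    # Thomas algorithm: forward elimination...
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c / b, d[0] / b
    for i in range(1, n):
        m = b - a * cp[i - 1]
        cp[i] = c / m
        dp[i] = (d[i] - a * dp[i - 1]) / m
    # ...then back substitution
    phi = [0.0] * n
    phi[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = dp[i] - cp[i] * phi[i + 1]
    return xs, phi

# No fixed charges, potential 1 at x = 0 and 0 at x = 10 (hypothetical units)
xs, phi_poisson = solve_linear_pb_1d(10.0, 199, 1.0, 0.0, lambda x: 0.0, 1.0, 0.0)
xs, phi_pb = solve_linear_pb_1d(10.0, 199, 1.0, 1.0, lambda x: 0.0, 1.0, 0.0)
print(f"phi(5): Poisson {phi_poisson[99]:.3f}, linearized PB {phi_pb[99]:.4f}")
```

Without ions the potential falls linearly between the boundaries; with ions ($\bar{\kappa}^2 > 0$) it decays roughly exponentially, which is the screening effect introduced by the ionic strength I.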

Template-based methods
• The 3D structure of the query protein is deduced from proteins of known structure (templates) that:
1. Have a similar sequence to the query protein (homologs)
2. Share physicochemical properties or statistical tendencies with the query protein
• Main methods:
1. Homology modeling
2. Fold recognition (threading)

Measuring structural similarity
• RMSD (root-mean-square deviation):
  $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^2}$
  - N: the number of equivalent atoms compared between the two structures
  - δi: the distance (in Å) between the atoms of pair i (one from each protein):
    $\delta_i^2 = (x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2$
• Smaller RMSD = more similar structures
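The RMSD formula translates directly into code. This sketch assumes the two structures have already been optimally superimposed (e.g., with the Kabsch algorithm, not shown) and that element i of each coordinate list is the same pair of equivalent atoms.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD over N equivalent atom pairs of two pre-superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of equivalent atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2  # delta_i^2
    return math.sqrt(total / len(coords_a))

# Hypothetical example: every atom of B is shifted by 1 Å along z relative to A
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(a, b))
```

In this example a uniform 1 Å shift gives an RMSD of 1 Å; an optimal superposition would remove such a rigid translation first, which is why superposition precedes the RMSD calculation in practice.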
• Homology modeling: basic logic
  - Proteins with similar sequences have similar structures
  - Thus, the structure of a protein (the query) can be predicted based on the structures of its sequence homologs (the templates)
• Homology modeling: steps
1. Finding templates with ≥30% sequence identity in the shared regions (PSI-BLAST)
2. Aligning the query and template sequences
3. Transferring the coordinates of identical amino acids from the template to the query protein
4. Performing energy optimization to remove clashes and distortions
5. Model evaluation (WHAT IF, Verify3D), which is also used for ranking models
• Software/servers: Modeller, SWISS-MODEL, NEST
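Steps 1 and 3 can be sketched on a toy pairwise alignment: sequence identity over gap-free alignment columns, and transfer of template coordinates to identical query residues. The sequences, coordinates and function names are hypothetical; real tools such as Modeller build complete models (e.g., by satisfying spatial restraints) rather than copying coordinates this literally.

```python
def percent_identity(aln_query, aln_template):
    """Sequence identity (%) over alignment columns where neither sequence has a gap."""
    pairs = [(q, t) for q, t in zip(aln_query, aln_template) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)

def transfer_identical_coords(aln_query, aln_template, template_coords):
    """Copy template coordinates onto query residues that are identical in the alignment.

    template_coords[j] is the (x, y, z) of template residue j (e.g. its C-alpha atom).
    Returns {query_residue_index: (x, y, z)}; residues not copied are left for later
    modelling (loops, side chains, energy optimization).
    """
    model = {}
    qi = ti = 0                  # residue counters; gaps do not advance them
    for q, t in zip(aln_query, aln_template):
        if q != "-" and t != "-" and q == t:
            model[qi] = template_coords[ti]
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
    return model

# Hypothetical toy alignment and template coordinates
aln_query = "MKV-LSA"
aln_template = "MRVGLTA"
template_coords = [(float(j), 0.0, 0.0) for j in range(7)]

print(percent_identity(aln_query, aln_template))
print(transfer_identical_coords(aln_query, aln_template, template_coords))
```

Here 4 of the 6 gap-free columns are identical (about 67%), well above the ≥30% threshold; the gap in the query shifts the template index, so query residue 3 (L) receives the coordinates of template residue 4.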

• Homology modeling: multiple sequence alignment
  - MSA quality usually determines model accuracy
• Software: Clustal Omega, MAFFT, MUSCLE, T-Coffee

• Homology modeling: problems
1. The number of available templates is limited
2. Loops have low conservation → difficult to predict
   Partial solution: using other methods for loop prediction
• Still, when suitable templates exist, HM is currently the most accurate method for structure prediction

• Fold recognition: basic logic
  - Proteins with similar structures share certain sequence-encoded properties or statistical tendencies
  - Thus, a protein of known structure can serve as a template if its properties/tendencies are similar to those of the query protein

• Fold recognition: types of properties/tendencies
1. Amino acid-related: angles, surface area, polarity, secondary-structure preference, etc.
2. Purely statistical: detected from a multiple sequence alignment

• Fold recognition: steps
1. Each position in the query protein is encoded in a way that describes its specific tendencies, yielding a sequence profile for the query protein
2. The profile is systematically compared to a library containing the profiles of all proteins of known structure
3. A match indicates a protein with a similar fold
4. If no match is found, the query protein is assumed to have a novel fold
• Software/servers: GenTHREADER, SPARKS-X, HHpred, TASSER
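The profile-building and library-comparison steps can be sketched in a highly simplified form: per-column amino-acid frequencies (with pseudocounts) serve as the profile, and profiles of equal length are compared column by column without gaps. Real fold-recognition tools align profiles or HMMs with dynamic programming and calibrated scores; the MSAs, library names and the `min_score` threshold below are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(msa, pseudocount=1.0):
    """Per-column amino-acid frequencies from an MSA (equal-length strings; gaps ignored)."""
    profile = []
    for column in zip(*msa):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}  # pseudocounts avoid zero frequencies
        for aa in column:
            if aa in counts:
                counts[aa] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile

def profile_similarity(p, q):
    """Ungapped profile-profile score: sum over columns of the frequency-vector dot product."""
    return sum(sum(cp[aa] * cq[aa] for aa in AMINO_ACIDS) for cp, cq in zip(p, q))

def best_template(query_profile, library, min_score=None):
    """Name of the best-scoring library profile, or None if it does not reach min_score."""
    scores = {name: profile_similarity(query_profile, prof) for name, prof in library.items()}
    best = max(scores, key=scores.get)
    if min_score is not None and scores[best] < min_score:
        return None   # no convincing match: the query may have a novel fold
    return best

# Hypothetical query MSA and a two-entry library of known-structure profiles
query = build_profile(["ACDE", "ACDE", "SCDE"])
library = {
    "fold_A": build_profile(["ACDE", "ACNE"]),
    "fold_B": build_profile(["WWWW", "WYWW"]),
}
print(best_template(query, library))
```

Returning None when the best score is too low mirrors step 4: without a convincing match, the query is treated as a possible novel fold.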
• Integrative methods
  - Integrate different approaches to get the best results
  - E.g., HM for initial models, energy-based methods for refinement
  - Software/servers: I-TASSER, Rosetta
Nature Protocols (2010) 5: 725-738

• Experimentally guided methods
  - Structural data derived from low-resolution lab methods (EM, SAXS) are used as constraints in computational structure prediction
J Mol Biol. (2009) 392: 181-90
A Rosetta implementation with cryo-EM constraints
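One common way to use such experimental data is to add a penalty to the model's energy whenever a distance falls outside the range allowed by the data. The flat-bottom harmonic form, the force constant and all values below are illustrative assumptions, not the specific scoring used in the cited Rosetta work.

```python
def flat_bottom_restraint(distance, lower, upper, k=10.0):
    """Zero inside [lower, upper]; harmonic penalty k * (violation)^2 outside."""
    if distance < lower:
        return k * (lower - distance) ** 2
    if distance > upper:
        return k * (distance - upper) ** 2
    return 0.0

def restrained_score(model_energy, distances, bounds, k=10.0):
    """Model energy plus the penalties for all experimentally derived distance bounds."""
    penalty = sum(flat_bottom_restraint(d, lo, hi, k) for d, (lo, hi) in zip(distances, bounds))
    return model_energy + penalty

# Hypothetical model: energy -100, two model distances vs. bounds from low-resolution data
print(restrained_score(-100.0, [5.0, 8.0], [(4.0, 6.0), (4.0, 6.0)]))
```

The flat bottom reflects the limited resolution of EM or SAXS data: any distance within the allowed range is equally acceptable, and only violations push the search towards other models.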

Databases of predicted structures
• The protein model portal (PMP) at the NIH:
http://www.proteinmodelportal.org/
• The protein model database (PMDB):
https://bioinformatics.cineca.it/PMDB/

Evolutionary methods
• Prediction by correlated mutations
  - Positions that are close in space tend to co-evolve
  - Therefore, data on co-evolving positions can be used as distance constraints in structure prediction
PLoS One (2011) 6: e28766

  - The sequence information is used to build a contact map, which is integrated into a structure prediction algorithm
bioRxiv 021022; doi: https://doi.org/10.1101/021022

  - Web servers: EVcouplings, RaptorX-Contacts, GREMLIN, DESTINI
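The simplest way to score co-evolution between two positions is the mutual information (MI) between the corresponding MSA columns, sketched below on a toy alignment. Servers such as EVcouplings and GREMLIN instead fit global statistical models (e.g., direct coupling analysis or pseudo-likelihood methods) that separate direct from indirect correlations, so this is only an illustration of the idea; the alignment and the `min_separation` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def column_mi(col_i, col_j):
    """Mutual information (in nats) between two MSA columns."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi

def top_coupled_pairs(msa, k=3, min_separation=1):
    """Rank column pairs by MI; high-scoring pairs are candidate contacts for a contact map."""
    columns = list(zip(*msa))
    scored = [
        (column_mi(columns[i], columns[j]), i, j)
        for i, j in combinations(range(len(columns)), 2)
        if j - i >= min_separation
    ]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]

# Hypothetical toy MSA: columns 0 and 3 co-vary (A-K, E-D, R-E); column 1 is conserved
msa = ["ALGK", "ELGD", "ALSK", "ELSD", "RLGE", "RLSE"]
print(top_coupled_pairs(msa, k=1))
```

A conserved column carries no co-variation signal (its MI with any column is zero), which is why co-evolution methods need deep, diverse alignments.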

Summary of Chapter 3
• The 3D structure of proteins can be determined by lab methods, but most of these methods are slow and expensive. As a result, the number of known structures is small compared with the explosion of protein sequences that are determined every day.
• A partial solution to this problem is given by fast computational methods that predict protein structures.

• Physics-based methods attempt to predict the structures of proteins by sampling many folded conformations and calculating their energy. Since the calculations are computationally heavy and approximate, such methods are usually unable to predict protein structures from scratch, but they can refine near-native conformations and elucidate their dynamics.
• Comparative (template-based) methods predict the structure of a query protein using proteins of known structure and their sequence similarity to the query.

• The best structure predictors combine comparative and physics-based methods, and use data from NMR, SAXS, and EM experiments to guide the prediction process.
• Recently developed methods that rely on the tendency of protein positions that are close in 3D space to co-evolve have produced encouraging results.
• Experimentally determined structures are deposited in the Protein Data Bank (PDB), whereas predicted structures can be found in several databases.
