Protein Structure Prediction


Chapter 3: Methods of Structure Prediction

Introduction to Proteins, Kessel and Ben-Tal, 2018


The 3D structure of proteins can be
determined using lab methods
• Diffraction methods:
 X-ray diffraction/scattering
 Neutron scattering
 Electron microscopy/crystallography

• Spectroscopic methods:
 Nuclear magnetic resonance (NMR) spectroscopy
 Electron paramagnetic resonance (EPR) spectroscopy
 Others: FTIR, Raman spectroscopy, circular dichroism,
mass spectrometry

Why predict protein structure?
• Lab methods provide accurate 3D structures
• However:
 They are slow
 Some are expensive
 Membrane proteins and large proteins are difficult to study
 Thus, only a small fraction of known proteins have a solved
structure
• Computational methods are less accurate, but
faster and cheaper



Overview
• The physical (ab initio) approach
1. Force-field-based calculation of the potential energy
2. Configurational sampling
3. The mean-field approach
• The template-based (comparative) approach
1. Homology modelling
2. Fold recognition
• Integrative methods
• Experimentally guided computational prediction
• Evolutionary methods (correlated mutations)

The physical approach



The physical approach
• Assumptions
1. Protein folding involves a free energy decrease
2. Native structure has the lowest free energy (global minimum)

$\Delta G = G_2 - G_1 < 0$
($G_1$: free energy of the initial, unfolded state; $G_2$: free energy of the final, folded state)

The physical approach
• Assumptions:
3. The folding process can be followed, given:
• An explicit description of the unfolded structure and its
surrounding (water, ions, lipids)
• A mathematical description of the total energy of the system
in a given configuration (‘force field’)
• An algorithm for sampling different configurations of the
system, to find the lowest-energy one (native)



The physical approach
• Explicit system description

[Figure: the protein in an explicit environment of water molecules and Na+ and Cl- ions]

The physical approach
• Force-field-based calculation of the potential energy
 Folding involves a drop in the free energy of the system:

$\Delta G = G_2 - G_1 < 0$
($G_1$: free energy of the initial, unfolded state; $G_2$: free energy of the final, folded state)

The physical approach
• Force-field-based calculation of the potential energy
 The potential energy can approximate the free energy:

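$\Delta G = \Delta H - T\Delta S_{sys}$ (constant temperature)

$\Delta H = \Delta E + \Delta(PV)$, $\Delta E = \Delta U + \Delta K$

In biological systems: $\Delta G = \Delta U + \Delta K + \Delta(PV) - T\Delta S_{sys}$
(kinetic energy term: proportional to $k_B T$)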



The physical approach
• Force-field-based calculation of the potential energy
 U is calculated using a force-field (molecular mechanics):

covalent bond length

covalent bond angle

covalent bond dihedral angle

improper dihedral angle

vdW interactions

electrostatic interactions
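
The energy terms listed above could be sketched as follows. This is a minimal illustration that assumes common textbook functional forms (harmonic bonds, angles and improper dihedrals, a cosine dihedral term, a Lennard-Jones 12-6 term for vdW, Coulomb's law for electrostatics). It does not reproduce the slide's own equation, and all function and parameter names are hypothetical.

```python
import math

# Assumed textbook functional forms; names and parameters are illustrative only.

def bond_energy(r, r0, k):
    """Harmonic spring around the equilibrium bond length r0."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic spring around the equilibrium bond angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2

def dihedral_energy(phi, k, n, delta):
    """Periodic torsion term with multiplicity n and phase delta (radians)."""
    return k * (1.0 + math.cos(n * phi - delta))

def improper_energy(omega, omega0, k):
    """Harmonic term that keeps an improper dihedral near omega0 (radians)."""
    return 0.5 * k * (omega - omega0) ** 2

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones form for van der Waals interactions."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, qi, qj, dielectric=1.0, ke=332.06):
    """Coulomb's law; ke is in kcal*Angstrom/(mol*e^2), charges in units of e."""
    return ke * qi * qj / (dielectric * r)
```

In this sketch, U for a configuration would be the sum of these terms over all bonds, angles, dihedrals and non-bonded atom pairs.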



The physical approach
• Configurational sampling
 Option 1: sampling all possible configurations randomly
and finding the lowest-energy one - impractical
 Option 2: Energy minimization
1. Calculate the potential energy of unfolded protein
2. Introduce small changes in atom locations and recalculate the energy
3. If energy is lower – accept and repeat steps 1-2. If energy is
higher, make another change until energy is lower
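
The accept-if-lower loop of Option 2 can be sketched on a toy problem. The energy function below is a hypothetical stand-in (a simple quadratic bowl), not a real force field, and the step size, step count and names are assumptions chosen for illustration.

```python
import random

def energy(coords):
    # Hypothetical toy energy: squared distance from the origin.
    return sum(x * x for x in coords)

def minimize(coords, n_steps=5000, step=0.1, seed=0):
    """Random-perturbation minimization: keep a change only if energy drops."""
    rng = random.Random(seed)
    current = list(coords)
    e_current = energy(current)
    for _ in range(n_steps):
        # Introduce a small random change in every coordinate.
        trial = [x + rng.uniform(-step, step) for x in current]
        e_trial = energy(trial)
        # Accept only downhill moves; otherwise try another change.
        if e_trial < e_current:
            current, e_current = trial, e_trial
    return current, e_current

start = [2.0, -1.5, 0.5]
final, e_final = minimize(start)
```

Because uphill moves are never accepted, a scheme like this ends in the nearest minimum it can reach, which need not be the global one.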



The physical approach
• Configurational sampling
 Problem: random changes in atomic locations may drive
the system towards higher-energy configurations
 Solution: the changes in atom location are predicted in
response to the force applied on the atoms (molecular
dynamics)
$F = -\frac{dU}{dr} = m a = m\frac{dv}{dt} = m\frac{d^2 r}{dt^2}$
($r$: position, $v$: velocity, $a$: acceleration, $m$: mass, $t$: time)
(Newton’s 2nd law of motion)
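
As a rough sketch of how positions can be propagated from forces, the block below integrates Newton's second law for a single particle in a one-dimensional harmonic potential. The velocity Verlet integrator is an assumption (a commonly used scheme), and the potential, time step and names are hypothetical.

```python
def force(x, k=1.0):
    # F = -dU/dx for the toy potential U = 0.5 * k * x^2
    return -k * x

def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Propagate position and velocity with the velocity Verlet scheme."""
    a = force(x) / mass          # acceleration from Newton's 2nd law
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity
        a = a_new
        trajectory.append(x)
    return trajectory, v

def total_energy(x, v, mass=1.0, k=1.0):
    # Kinetic plus potential energy of the toy system
    return 0.5 * mass * v * v + 0.5 * k * x * x

trajectory, v_end = velocity_verlet(1.0, 0.0)
```

The total energy should stay close to its starting value over the run, which is a quick sanity check on the time step.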
The physical approach
• Configurational sampling
 Problem: protein folding involves local energy minima →
the simulation gets stuck!



The physical approach
• Configurational sampling
 Solution: the kinetic barriers are overcome by virtual
heating, which translates into atomic motions (simulated
annealing)

• Extensive sampling allows ΔSsys to be estimated:

$S = k_B \ln \Omega$, where $\Omega$ is the
number of possible system
configurations
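
The idea of escaping local minima by heating and then cooling can be sketched as below. Note the swap: this sketch uses a Metropolis Monte Carlo acceptance rule rather than MD, and the one-dimensional energy landscape, cooling schedule and names are all hypothetical.

```python
import math
import random

def energy(x):
    # Hypothetical landscape: local minimum near x = +1, global minimum near x = -1.
    return (x * x - 1.0) ** 2 + 0.3 * x

def simulated_annealing(x0, t_start=2.0, t_end=0.01, n_steps=20000,
                        step=0.5, seed=1):
    """Metropolis sampling with an exponentially decreasing temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        # Temperature falls smoothly from t_start to t_end.
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = simulated_annealing(1.0)  # start in the local minimum
```

At high temperature the barrier between the two wells is crossed easily; as the temperature drops, the system settles into the deeper well.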
The physical approach
• MD cannot describe protein folding. Why?
 Explicit models → the simulations cover ns-μs (folding takes
ms)
 The potential energy is just an approximation of the free
energy
 The description of bonds as springs is a crude approximation
 Solvation effects are not fully accounted for
 The force constants are obtained by fitting calculations to
empirical data



The physical approach
• So why use MD?
 Refining models
 Producing near-native structures
 Exploring dynamic changes in proteins
 Learning about the separate contributions to the total free energy
 Deriving structure-function relationships
 Engineering proteins



The physical approach
• Implicit (mean-field) calculations
 MD’s biggest problem: describing all electrostatic
interactions between fixed protein charges and mobile
solvent charges (solvation/polarization energy)

Gelec =

The physical approach
• Implicit (mean-field) calculations
 Solution:
Fixed protein charges – described by charge distribution (ρ)
Mobile solvent charges - described by the dielectric constant (ε)
Gelec – described as a function of ρ and the potential (φ)



The physical approach
• Implicit (mean-field) calculations
 The electrostatic potential:



The physical approach
• Implicit (mean-field) calculations
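 φ is calculated by the Poisson equation (written here in Gaussian units):

$\nabla \cdot [\varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] = -4\pi\rho(\mathbf{r})$

 In the presence of mobile ions, the
Poisson-Boltzmann equation is
used (written here in its linearized form):

$\nabla \cdot [\varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] - \varepsilon(\mathbf{r})\kappa^2\phi(\mathbf{r}) = -4\pi\rho(\mathbf{r})$, with $\kappa^2 = 8\pi e^2 I / (\varepsilon k_B T)$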
(I - the ionic strength in the system)
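
To give a feel for how the ionic strength affects electrostatic screening, the sketch below computes the Debye screening length, the distance over which mobile ions damp the potential in the linearized Poisson-Boltzmann picture. It assumes SI units, water at 298.15 K with a relative dielectric constant of 78.5, and a hypothetical function name.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m

def debye_length_nm(ionic_strength_molar, eps_r=78.5, temperature=298.15):
    """Debye length in nm for an ionic strength given in mol/L (assumed water-like solvent)."""
    i_si = ionic_strength_molar * 1000.0  # convert mol/L to mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE ** 2 * i_si
                / (EPS_0 * eps_r * K_B * temperature))
    return 1e9 / math.sqrt(kappa_sq)

length_physiological = debye_length_nm(0.1)
length_dilute = debye_length_nm(0.001)
```

Under these assumptions, a higher ionic strength gives a shorter screening length, so charge-charge interactions are damped over shorter distances.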
The physical approach
• Implicit (mean-field) calculations
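 The total nonpolar energy (hydrophobic + vdW) is derived
using an empirical surface area dependency (shown here in its simplest linear form):

$\Delta G_{np} = \gamma \Delta A$

($\Delta A$: change in buried surface area upon folding; $\gamma$: empirical coefficient)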



Template-based methods
• The 3D structure of the query protein is deduced
from proteins of known structure (templates) that:
1. Have a very similar sequence to the query protein
(homologs)
2. Share physicochemical properties or statistical tendencies
with the query protein

• Main methods:
1. Homology modeling
2. Fold recognition (threading)



Measuring structural similarity
• R.M.S.D - root-mean-square deviation

$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^2}$

 N - the number of equivalent atoms
compared between the two structures
 δi - the distance (in Å) between the atom
pair i (one from each protein):
$\delta_i^2 = (x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2$
• Smaller R.M.S.D = more similar structures
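
The R.M.S.D definition can be written as a short function. This sketch assumes the two structures are already superimposed and that equivalent atoms are listed in the same order; the function name and the input format (lists of x, y, z tuples in Å) are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the atoms of pair i
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(structure_a, structure_b)
```

In practice the structures are first superimposed to minimize this value, a step this sketch leaves out.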
Template-based methods
• Homology modeling: basic logic
 Proteins with similar sequences have similar structures
 Thus, the structure of a protein can be predicted based on
the structures of its sequence homologs

[Figure: templates and query]

Template-based methods
• Homology modeling: steps
1. Finding templates with ≥ 30% sequence identity in
shared regions (PSI-BLAST)
2. Aligning the two sequences
3. Transferring the coordinates of identical amino acids
from the template to the query protein
4. Performing energy optimization to get rid of clashes and
distortions
5. Model evaluation (WHATIF, Verify3D) – also for ranking
models

• Software/servers: Modeller, SwissModel, NEST
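
The identity threshold in step 1 can be checked with a small helper once two sequences are aligned. This is a sketch with a hypothetical function name; it assumes the alignment already exists, that gaps are written as "-", and that identity is counted over the shared (non-gap) positions only.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over positions where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    shared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip positions outside the shared regions
        shared += 1
        if q == t:
            identical += 1
    return 100.0 * identical / shared if shared else 0.0

identity = percent_identity("ACD-EFG", "ACDKEYG")
usable_template = identity >= 30.0  # threshold taken from step 1
```

Other conventions for the denominator exist (for example, the full alignment length), so reported identities may differ between tools.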


Template-based methods
• Homology modeling: multiple sequence alignment
 MSA quality usually determines model accuracy

• Software: Clustal-ω, MAFFT, MUSCLE, T-Coffee


Template-based methods
• Homology modeling: problems
1. The number of available templates is limited
2. Loops have low conservation → difficult to predict
Partial solution: using other methods for loop prediction
• Still, HM is currently the best method for structure
prediction



Template-based methods
• Fold recognition: basic logic
 Proteins of similar structures share certain sequence-encoded
properties or statistical tendencies
 Thus, a protein of known structure can serve as a template if it
has similar properties/tendencies to those of the query protein



Template-based methods
• Fold recognition: types of properties/tendencies
1. Amino acid-related: angles, surface area, polarity, secondary
structure preference, etc.
2. Purely statistical – detected from multiple sequence
alignment



Template-based methods
• Fold recognition: steps
1. Each position in the query protein is coded in a way
describing the specific tendencies of this position. This
yields a sequence profile for the query protein
2. The profile is systematically compared to a library
containing the profiles of all proteins of known structure
3. A match represents a protein with a similar fold
4. If a match is not found, the query protein is assumed to
have a novel fold
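
The steps above can be illustrated with a deliberately simplified toy. It codes each position by a single property (the Kyte-Doolittle hydropathy scale, chosen here as an assumption), compares equal-length profiles by Pearson correlation without any alignment, and reports no match below a hypothetical threshold. Real fold-recognition tools use much richer profiles and alignment; the library entries and names are invented for illustration.

```python
import math

# Kyte-Doolittle hydropathy values (one property per amino acid).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def profile(sequence):
    """Code each position by its hydropathy value."""
    return [KD[aa] for aa in sequence]

def pearson(a, b):
    """Pearson correlation of two equally long profiles (0.0 if degenerate)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a < 1e-12 or var_b < 1e-12:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_match(query, library, threshold=0.7):
    """Return the best-scoring library entry, or None (possible novel fold)."""
    query_profile = profile(query)
    best_name, best_score = None, threshold
    for name, sequence in library.items():
        if len(sequence) != len(query):
            continue  # toy restriction: no alignment, equal lengths only
        score = pearson(query_profile, profile(sequence))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

library = {"fold_A": "LIVKDELIVK", "fold_B": "KDEKDELIVL"}  # hypothetical entries
match = best_match("IVLRENVLIR", library)
no_match = best_match("RENIVLRENR", library)
```

The first query shares no identical residue pattern requirement with fold_A, only the same hydropathy pattern, which is the point of comparing profiles rather than raw sequences.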

• Software/servers: GenTHREADER, SPARKS-X, HHPred, TASSER

Template-based methods
• Integrative methods
 Integrate different approaches to get the best results
 E.g.: HM for initial models, energy methods for refinement
 Software/servers: I-TASSER, Rosetta

Nature Protocols (2010) 5: 725–738



Template-based methods
• Experimentally guided methods
 Structural data are derived from low-resolution lab methods
(EM, SAXS) and used as constraints in computational
structure prediction

J Mol Biol. (2009) 392: 181-90

A Rosetta implementation with cryo-EM constraints


Databases of predicted structures

• The protein model portal (PMP) at the NIH:

http://www.proteinmodelportal.org/

• The protein model database (PMDB):

https://bioinformatics.cineca.it/PMDB/



Evolutionary methods
• Prediction by correlated mutations
 Positions that are close in space tend to co-evolve
 Therefore, data on co-evolving positions can be used as
distance constraints on structure prediction

PLoS One (2011) 6: e28766



Evolutionary methods
• Prediction by correlated mutations
 The sequence information is used to build a contact map,
which is integrated into a structure prediction algorithm.

bioRxiv 021022; doi: https://doi.org/10.1101/021022
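
One simple way to score co-evolution between two alignment columns is mutual information, sketched below. This is an assumption chosen for illustration (the dedicated servers may use more elaborate statistical models), and the tiny alignment and function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

def ranked_pairs(msa):
    """All column pairs, sorted from the strongest to the weakest coupling."""
    length = len(msa[0])
    scores = [((i, j), mutual_information(msa, i, j))
              for i in range(length) for j in range(i + 1, length)]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical alignment: columns 0 and 1 always change together.
msa = ["ARLD", "ARLE", "GKLD", "GKLE"]
top_pair, top_score = ranked_pairs(msa)[0]
```

The highest-scoring pairs would then be treated as candidate contacts, that is, as distance constraints for the structure prediction step.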


Evolutionary methods
• Prediction by correlated mutations
 Web servers: EVcouplings, RaptorX-Contacts, GREMLIN,
DESTINI



Summary of Chapter 3
• The 3D structure of proteins can be determined by lab
methods but most of these methods are slow and
expensive, resulting in a small number of known
structures compared to the explosion of protein
sequences that are determined every day.
• A partial solution to this problem is given by fast
computational methods that predict protein structures.



Summary of Chapter 3
• Physics-based methods attempt to predict the structures
of proteins by sampling many folded conformations and
calculating their energy. Since the calculations are
computationally heavy and approximate, such methods are
usually unable to predict protein structures from scratch, but
can refine near-native conformations and elucidate their dynamics.
• Comparative (template-based) methods rely on proteins
with known structure, and on their sequence similarity to
the query protein, to predict its structure.



Summary of Chapter 3
• The best structure predictors combine comparative and
physics-based methods, and use data from NMR, SAXS,
and EM experiments to guide the prediction process.
• Recently developed methods that rely on the tendency of
protein positions close in 3D space to co-evolve have
been producing encouraging results.
• Experimentally determined structures are deposited in the Protein
Data Bank (PDB), whereas predicted structures can be
found in several databases.
