
RNA Structure Prediction Software and Analysis


RNA Structure Prediction

RNA structure prediction strategies


Secondary structure prediction

1) Energy minimization
(thermodynamics)

2) Comparative sequence analysis
(co-variation)

3) Combined experimental & computational


Secondary structure prediction strategies

1) Energy minimization (thermodynamics)


• Algorithm:
Dynamic programming to find
high probability pairs
(also, some Genetic algorithms)

• Software:
Mfold - Zuker
Vienna RNA Package - Hofacker
RNAstructure - Mathews
Sfold - Ding & Lawrence

R Knight 2005
Secondary structure prediction strategies

2) Comparative sequence analysis (co-variation)


• Algorithm:
Mutual information
Context-free grammars

• Software:
ConStruct
Alifold
Pfold
FOLDALIGN
Dynalign
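
The mutual-information idea listed above can be sketched as follows. This is a minimal illustration, assuming an ungapped toy alignment (hypothetical data) and a hypothetical helper named `mutual_information`; it is not how any of the listed packages implement co-variation analysis.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)   # base counts in column i
    col_j = Counter(seq[j] for seq in alignment)   # base counts in column j
    pairs = Counter((seq[i], seq[j]) for seq in alignment)  # joint counts
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment (hypothetical): columns 0 and 4 always form a Watson-Crick pair,
# while column 1 never varies.
alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
covarying = mutual_information(alignment, 0, 4)
invariant = mutual_information(alignment, 0, 1)
```

Columns that change together score high; a column that never varies carries no information about any other column, so its score is zero.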

R Knight 2005
Secondary structure prediction strategies

3) Combined experimental & computational

• Experiment:
Map single-stranded vs double-stranded regions in
folded RNA

• How?
Enzymes: S1 nuclease, T1 RNase
Chemicals: kethoxal, DMS

R Knight 2005
Experimental RNA structure determination?

• X-ray crystallography

• NMR spectroscopy

• Enzymatic/chemical mapping
1) Energy minimization method

What are the assumptions?


Native tertiary structure or "fold" of an RNA
molecule is (one of) its "lowest" free energy
configuration(s)
Gibbs free energy = G in kcal/mol at 37C
= equilibrium stability of structure
lower values (negative) are more favorable
Is this assumption valid?
in vivo? - this may not hold, but we don't really know
Free energy minimization

What are the rules?

Two stacked base pairs:

A=U
A=U     ΔG = -1.2 kcal/mol

A=U
U=A     ΔG = -1.6 kcal/mol

What gives here?

C Staben 2005
Energy minimization calculations:
Base-stacking is critical

Stacked pairs (top strand / bottom strand)     ΔG (kcal/mol)
AA / UU                                        -1.2
AU / UA or UA / AU                             -1.6
AG, AC, CA, GA / UC, UG, GU, CU                -2.1
CC / GG                                        -4.8
CG / GC                                        -3.0
GC / CG                                        -4.3
GU / UG                                        -0.3
XG, GX / YU, UY                                 0

- Tinoco et al.
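
To show how stacking increments add up along a helix, here is a minimal sketch using the Watson-Crick values from the stacking table above. It assumes each entry is keyed by the 5'->3' dinucleotide on one strand, with the other strand being its complement; the slide does not state strand orientation, so that reading is an assumption, as are the names `STACK` and `helix_stacking_energy`.

```python
# Stacking increments in kcal/mol, keyed by the 5'->3' dinucleotide on one
# strand of a fully Watson-Crick helix (orientation is an assumption).
STACK = {}
for dinucleotides, dg in [
    (("AA", "UU"), -1.2),
    (("AU", "UA"), -1.6),
    (("AG", "AC", "CA", "GA", "CU", "GU", "UG", "UC"), -2.1),
    (("CG",), -3.0),
    (("GC",), -4.3),
    (("CC", "GG"), -4.8),
]:
    for d in dinucleotides:
        STACK[d] = dg

def helix_stacking_energy(strand):
    """Sum the stacking increments over adjacent base pairs of a helix."""
    return sum(STACK[strand[k:k + 2]] for k in range(len(strand) - 1))

energy = helix_stacking_energy("GGCA")  # stacks: GG, GC, CA
```

For the strand GGCA the three stacks contribute -4.8, -4.3 and -2.1, for a total of -11.2 kcal/mol. Loop and unpaired-stacking terms are left out of this sketch.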

C Staben 2005
Nearest-neighbor parameters

Most methods for free energy minimization use nearest-neighbor
parameters (derived from experiment) for predicting stability of an
RNA secondary structure (in terms of ΔG at 37 °C)

& most available software packages use the same set of parameters:
Mathews, Sabina, Zuker & Turner, 1999
Energy minimization - calculations:
Total free energy of a specific
conformation for a specific RNA molecule
= sum of incremental energy terms for:
• helical stacking
(sequence dependent)
• loop initiation
• unpaired stacking

(favorable "increments" are < 0)

Fig 6.3
Baxevanis &
Ouellette 2005
But how many possible conformations for a single RNA molecule?

Huge number:
Zuker estimates (1.8)^N possible secondary structures for a
sequence of N nucleotides
for 100 nts (small RNA…) =
3 × 10^25 structures!
Solution? Not exhaustive enumeration…
Dynamic programming
O(N^3) in time
O(N^2) in space/storage
iff pseudoknots excluded, otherwise:
O(N^6) in time
O(N^4) in space
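
The dynamic-programming idea can be illustrated with a much simpler stand-in for energy minimization: Nussinov-style base-pair maximization. This sketch is not the algorithm used by Mfold or the other packages named in these slides; it only shows why filling a table over all subsequences costs O(N^3) time and O(N^2) space when pseudoknots are excluded. The function name `max_base_pairs` and the minimum loop size of 3 are assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = best pair count for the subsequence seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # subsequence length minus 1
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]               # case 1: base j is unpaired
            for k in range(i, j - min_loop):     # case 2: base j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0

hairpin_pairs = max_base_pairs("GGGAAACCC")
```

The table has N^2 cells and each cell scans up to N split points, which gives the cubic running time. Replacing the "+1 per pair" score with nearest-neighbor energy increments is what the thermodynamic methods do, with considerably more bookkeeping for loops.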
2) Comparative sequence analysis
(co-variation)

Two basic approaches:


• Algorithms constrained by initial alignment
Much faster, but not as robust as unconstrained
Base-pairing probabilities determined by a partition
function

• Algorithms not constrained by initial alignment


Genetic algorithms often used for finding an alignment & set
of structures
RNA Secondary structure prediction: Performance?

How to evaluate?
• Not many experimentally determined structures
currently, ~50% are rRNA structures
so "Gold Standard" (in absence of tertiary structure):
compare predicted RNA secondary structure with that
determined by comparative sequence analysis (!!??) using benchmark
datasets
NOTE: Base-pairs predicted by comparative sequence analysis for large &
small subunit rRNAs are 97% accurate when compared with high resolution
crystal structures! - Gutell, Pace
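
Comparing a predicted structure with a reference structure can be sketched as below. The slides do not define how accuracy is scored, so the two measures used here (sensitivity and positive predictive value over base pairs) are an assumption, as are the function name `pair_accuracy` and the toy base-pair sets.

```python
def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs."""
    # Normalize so that (i, j) and (j, i) count as the same pair.
    predicted = {tuple(sorted(p)) for p in predicted}
    reference = {tuple(sorted(p)) for p in reference}
    correct = len(predicted & reference)
    # Sensitivity: fraction of reference pairs that were predicted.
    sensitivity = correct / len(reference) if reference else 0.0
    # PPV: fraction of predicted pairs that are in the reference.
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv

# Hypothetical example: the reference has three pairs, two of which are predicted.
reference = [(0, 8), (1, 7), (2, 6)]
predicted = [(0, 8), (1, 7), (3, 5)]
sensitivity, ppv = pair_accuracy(predicted, reference)
```

Reporting both numbers helps separate a method that misses true pairs from one that predicts spurious ones.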
RNA Secondary structure prediction: Performance?

1) Energy minimization (via dynamic programming)


73% avg. prediction accuracy - single sequence
2) Comparative sequence analysis
97% avg. prediction accuracy - multiple sequences (e.g., highly
conserved rRNAs)
much lower if sequence conservation is lower &/or fewer sequences
are available for alignment
3) Combined - recent developments:
combine thermodynamics & co-variation
& experimental constraints? IMPROVED RESULTS
RNA structure prediction strategies
Tertiary structure prediction
Requires "craft" & significant user input & insight
1) Extensive comparative sequence analysis to predict tertiary
contacts (co-variation)
e.g., MANIP - Westhof
2) Use experimental data to constrain model building
e.g., MC-SYM - Major
3) Homology modeling using sequence alignment & reference tertiary
structure (not many of these!)
4) Low resolution molecular mechanics
e.g., yammp - Harvey
