Zubradt 2016
Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq’s capacity to dramatically expand in vivo analysis of RNA structure.

RNA is a functionally diverse molecule that both carries genetic information and directly conducts biological processes through its ability to fold into complex secondary and tertiary structures1. The discovery of functional RNA structures depends critically on accurate, targeted, and accessible RNA structure determination methods, particularly in vivo. Sequence information alone is generally not sufficient for prediction of RNA structure, but by combining sequence information with experimental structure data at single-nucleotide resolution, one can often obtain an accurate assessment of RNA folding status and discover novel RNA structures2–4.

Existing high-resolution techniques to measure RNA secondary structure are based on structure-specific chemical modification. DMS has emerged as one of the pre-eminent choices for this application. DMS rapidly and specifically modifies unpaired adenines and cytosines in vivo at their Watson–Crick base-pairing positions5. Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) chemicals are another powerful option for chemical RNA structure probing. Because of their distinct mechanisms of modification, DMS and SHAPE report on different and complementary aspects of RNA structure6,7. In early efforts, chemical lesions from either SHAPE or DMS were detected when the reverse transcriptase (RT) enzyme terminated cDNA synthesis upon reaching a modified nucleotide. We and others have coupled the chemical probing of RNA structure to next-generation sequencing (Fig. 1), allowing for experimental analysis of RNA structure on a global scale in vitro or in vivo4,8–11 (see refs. 12 and 13 for reviews of sequencing-coupled RNA structure techniques). Globally, these experiments have revealed substantial differences in RNA structure in vivo versus in vitro, underscoring the importance of examining RNA structure in its native cellular environment4,10.

Despite important contributions to RNA structure discovery, truncation-based approaches using either DMS or SHAPE have intrinsic limitations that render them unsuitable to address certain biological questions, such as the heterogeneity of RNA structures in vivo. We sought to develop an in vivo and genome-wide approach that would overcome existing limitations in truncation strategies by encoding DMS lesions as mutations instead of as cDNA truncations, as has been recently described for individual or highly abundant RNA targets7,14–16. Such mutational profiling (MaP) approaches confer several advantages. These advantages include the resolution of enzymatic biases proximal to the information-encoding nucleotide and (most importantly) the analysis of multiple chemical modification sites per molecule, which opens up the possibility of distinguishing heterogeneous RNA structure subpopulations from one another in vivo. In truncation approaches, only a single site of chemical modification can be observed per RNA molecule; thus, the structure signal corresponds to a population average. Additionally, low-abundance RNAs are not conducive to truncation-based RNA structure probing. Specifically, they are poorly sequenced on a genome-wide scale, and input requirements for available low-throughput methods often necessitate in vitro transcription before structure profiling6,14,15,17. We reasoned that an in vivo MaP approach would make it possible to perform targeted amplification of low-abundance RNA species while retaining a record of the modification sites.

© 2016 Nature America, Inc., part of Springer Nature. All rights reserved.
1Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biology, Center for RNA Systems Biology, Howard Hughes Medical Institute,
University of California, San Francisco, San Francisco, California, USA. 2Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA. 3Institute for
Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA. Correspondence should be addressed to
J.S.W. (jonathan.weissman@ucsf.edu) or S.R. (srouskin@wi.mit.edu).
Received 24 June; accepted 29 September; published online 7 November 2016; doi:10.1038/nmeth.4057
RESULTS
Development of genome-wide in vivo RNA structure probing with mutational profiling
For DMS-MaPseq, we treated cells with a high concentration of DMS to increase the number of modifications detected per fragment, modifying approximately 1 in 50 nucleotides. We compared data produced at this DMS concentration (5% v/v) to previously validated concentrations4, and we observed excellent correlation of the RNA structure signal both globally and for each nucleotide in the yeast 18S rRNA (Supplementary Fig. 1; R = 0.94 and R = 0.98, respectively). For applications that aim to use even higher DMS levels, it will be important to do a similar analysis to evaluate whether RNA structures are perturbed with increasing DMS concentrations. After DMS treatment and total RNA extraction, random fragmentation with Zn2+, and the removal of ribosomal RNA, we did a broad size selection, ligated a 3′ adaptor, and reverse transcribed under conditions in which chemically modified bases were encoded as mutations in the cDNA (Fig. 1). Consequently, multiple modifications can be observed on a single cDNA fragment, providing an essential framework for future applications of single-molecule RNA structure determination.

The accuracy of DMS-MaPseq depends critically on reverse transcription conditions that optimize the detection of DMS modifications while retaining high fidelity and processivity during cDNA synthesis. The TGIRT enzyme was recently adapted with these latter priorities in mind and notably produces mismatches at endogenous m1A and m3C tRNA residues, the exact methylation profiles of a DMS modification18,19. Additionally, Superscript II with Mn2+ buffer (SSII–Mn2+) has been used for the

[...] yielded average detection frequencies of only 53% and 1.4% at the same modified residues (Fig. 2b). This tendency of SSII–Mn2+ to under-report the DMS modification signal in a context-dependent manner could severely undermine data quality.

A valuable measure for the signal-to-noise ratio in DMS data is the enrichment of signal on adenines and cytosines4 (Supplementary Fig. 2a). When the same source of DMS-modified RNA was reverse transcribed using either TGIRT or SSII–Mn2+, we observed a far greater fraction of mismatches on A–Cs using TGIRT (TGIRT, 93.5%; SSII–Mn2+, 84%) (Fig. 2c). This high A–C signal in TGIRT data also exceeds that of our previously published DMS-seq strategy based on cDNA truncation, and there are notable differences in the relative contributions of A–Cs4. Analysis of the mismatch nucleotide bias in DMS-seq reveals that 54% of mismatches occur on cytosines in a DMS-dependent manner, suggesting that truncation at cytosines is not robust14 (Supplementary Fig. 2b,c). Notably, the signal on adenines is lower with SSII–Mn2+ than with the other techniques, which suggests an underlying failure of SSII–Mn2+ to robustly encode m1A modifications consistent with the low signal detection on the endogenous rRNA residues.

Both TGIRT and SSII–Mn2+ produce excellent signal at unpaired A–C residues in the yeast RPS28B positive-control structure, but the SSII–Mn2+ data reveal high background signal on certain G–U residues, suggesting a propensity for nonrandom errors in cDNA synthesis. This higher background error for SSII–Mn2+ is also reflected in the genome-wide frequency of mutations and indels on matched untreated and DMS-treated RNA (Supplementary Fig. 2d), which is consistent with the historical use of Mn2+ buffer in deliberate mutagenesis during oligonucleotide
[Figure 1 graphic. Labeled steps: identify mutations; ratiometric calculation (mismatch/total).]
Figure 1 | Sequencing library generation for RNA structure probing techniques. Schematic of library preparation strategies for cDNA truncation
approaches (top) and for DMS-MaPseq (bottom).
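
The ratiometric calculation labeled in Figure 1 (mismatches divided by total reads at each position) can be illustrated with a minimal Python sketch. The input format is an assumption made for illustration: reads are taken to be already aligned to the reference without indels and supplied as hypothetical `(start, sequence)` pairs. The function name `ratiometric_signal` is likewise hypothetical and is not part of any published pipeline.

```python
from collections import Counter

def ratiometric_signal(reference, reads):
    """Return mismatch/total for every reference position.

    reads: iterable of (start, sequence) pairs, 0-based start, assumed
    to be gap-free alignments to `reference` (illustrative format only).
    Positions with no coverage are reported as None.
    """
    mismatches = Counter()
    coverage = Counter()
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference):
                break  # ignore bases that run past the reference
            coverage[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1
    return [mismatches[p] / coverage[p] if coverage[p] else None
            for p in range(len(reference))]

# Toy data: four reads over a five-nucleotide reference.
reference = "GACUA"
reads = [(0, "GACUA"), (0, "GGCUA"), (1, "ACUA"), (1, "GCUU")]
signal = ratiometric_signal(reference, reads)
print(signal)
```

Because each position is normalized by its own read count, the value does not depend on a separate untreated control, which matches the ratiometric framing of the method.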
synthesis20. Other RNA structure methods have subtracted background signal on a nucleotide-by-nucleotide basis15; however, we see an increase in noise after applying a background correction to the RPS28B positive-control structure21 (Fig. 2d and Supplementary Fig. 3a–d). Global investigation reveals a poor correlation of background signal for both TGIRT and SSII–Mn2+ untreated replicates, suggesting that background signal is variable and stochastic (Supplementary Fig. 3e,f). Thus, a key advantage of DMS-MaPseq is the ratiometric nature of the data (i.e., in a population-level analysis, the rate of modification at each position is equal to the ratio of mutated reads to total reads; Fig. 1). Untreated or denatured DMS-MaPseq controls may still be useful in the discovery of endogenous mRNA modifications encoded during reverse transcription19, uncharacterized single-nucleotide polymorphisms, or as a negative control, but it is not a necessary component for single-nucleotide RNA structure calculations.

We used replicates to assess the reproducibility of the RNA structure signal across yeast transcriptome regions as measured by r value and the Gini index difference, an established RNA structure metric to assess the evenness of the data distribution4 (Fig. 2e). This analysis revealed a stronger reproducibility between data generated by TGIRT than by SSII–Mn2+, consistent with our observations of high background noise in the latter approach. Because of the high DMS signal and low background error observed across many quality control metrics, we chose the TGIRT enzyme for all further DMS-MaPseq experimentation and method development.

Global analysis of DMS-MaPseq data
When DMS lesions are detected by truncation, only the most 3′ DMS modification on an RNA fragment will be detected. For this reason, DMS treatment conditions must be carefully titrated to avoid improper hit kinetics and 5′ signal decay22. This effect is illustrated by the lack of DMS-seq signal immediately 5′ of an endogenous m1A residue in denatured yeast 25S rRNA (Fig. 3a). This drop off does not occur with DMS-MaPseq data, confirming that the TGIRT enzyme can encode multiple DMS lesions in a short sequence space. Additionally, negative-control bases in the yeast rRNA fall overwhelmingly into the lowest bin of reactivity in DMS-MaPseq data, confirming low background noise relative to previous DMS-seq data4 (Fig. 3b).

We also collected a genome-wide in vivo DMS-MaPseq data set from human embryonic kidney (HEK) 293T cells, and we

Figure 2 | TGIRT enzyme delivers higher signal and lower background for DMS-MaPseq. (a) Distribution of mutation type generated by SSII–Mn2+ or TGIRT reverse transcription from in vivo DMS-treated yeast mRNA. (b) Endogenous m1A modifications in yeast 25S rRNA transcript reveal superior modification detection with TGIRT. Average percent modification (bar) detected at the position across two biological DMS-treated replicates (circles) with error bars representing s.d. from the average. (c) Nucleotide composition of mismatches from TGIRT or SSII–Mn2+ approaches. (d) Yeast RPS28B mRNA positive-control structure with nucleotides colored by DMS reactivity in vivo. Numbers represent the nucleotide position within the displayed region. Black boxes outline G–U bases with high background signal. DMS reactivity was calculated as the average ratiometric DMS signal per position across two biological replicates normalized to the highest number of reads in displayed region, which is set to 1.0. (e) Genome-wide DMS-MaPseq replicates (Rep1 and Rep2) compared by Pearson’s R value and Gini index for yeast mRNA regions (requiring 15× coverage, resulting in 733 and 272 regions displayed for TGIRT and SSII–Mn2+, respectively).

Figure 3 | Global analysis of in vivo DMS-MaPseq data. (a) Signal decay observed after endogenous m1A modification at position 642 in the yeast 25S rRNA in DMS-seq but not in DMS-MaPseq. (b) Histogram of ratiometric reactivity for negative-control bases in the yeast 18S rRNA. The total number of negative-control bases is 338, characterized as bases known to be base paired. (c) Scatter plot of GC content versus Gini index in 50-nucleotide (nt) windows of deeply sequenced genes. Noncoding RNA regions include UTRs and all classes of mammalian noncoding RNAs. CDS regions are coding regions. A gray box is placed around small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). The total number of evaluated windows is 182. Pearson’s correlation = 0.32; P value = 7.3 × 10^-6.
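
The two reproducibility metrics used for Figure 2e, Pearson's R between replicates and the Gini index difference, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the inputs are per-position ratiometric values for one region, the Gini index uses the standard rank-based formula for non-negative values, and the function names `gini_index` and `pearson_r` are hypothetical. The exact windowing and base filtering used in the study are not reproduced here.

```python
import math

def gini_index(values):
    """Gini index of non-negative values: 0 is perfectly even signal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def pearson_r(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Toy replicate signals for one region (illustrative values only).
rep1 = [0.00, 0.01, 0.08, 0.00, 0.05]
rep2 = [0.00, 0.02, 0.07, 0.01, 0.05]
r_value = pearson_r(rep1, rep2)
gini_difference = gini_index(rep1) - gini_index(rep2)
```

A Gini index near 0 means the signal is spread evenly across bases, which is how an unstructured region would look, while a value near 1 means the signal is concentrated on a few bases.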
[Figure 4 graphic. Panel a: fraction of mRNA regions versus R value for average mismatch coverage of 10×, 20×, 30×, 40×, and 50×. Panel b: fraction of genes versus average mismatch coverage from genome-wide DMS-MaPseq data at 50 million, 100 million, 200 million, and 1 billion reads, labeled 20× coverage. Panel c: mRNA-specific RT (or oligo dT priming), mRNA-specific PCR, NexteraXT fragmentation, NexteraXT PCR to add adaptors and indices, then read alignment and ratiometric calculation (mismatch/total) for structure signal. Panel d: curves labeled Tagmentation and DMS-MaPseq; x axis, false-positive rate (%). Panels e and f: HAC1 and RPS28B structures colored on a 0.0 to 1.0 scale.]
Figure 4 | DMS-MaPseq enables in vivo RNA structure probing for specific RNA targets. (a) Cumulative histogram of Pearson’s R values between yeast
mRNA regions in DMS-MaPseq replicates at varied depths of average mismatch coverage. (b) Fraction of genes exceeding the minimum average mismatch
coverage of 20× in genome-wide human HEK 293T DMS-MaPseq data with varied sequencing depths. 0.006, 0.009, and 0.03 are the fraction of genes
passing this threshold at 50, 100, and 200 million uniquely mapped reads, respectively. (c) Schematic for targeted RNA structure probing via target-
specific RT–PCR and NexteraXT tagmentation. (d) ROC curve for DMS signal on yeast 18S rRNA using ratiometric data from target-specific tagmentation
approach and from genome-wide DMS-MaPseq. (e,f) Yeast HAC1 (e) and RPS28B (f) 3′ UTR mRNA positive-control structures from target-specific priming
with nucleotides colored by DMS reactivity in vivo. DMS reactivity calculated as the ratiometric DMS signal per position normalized to the highest number
of reads in displayed region, which is set to 1.0.
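
A ROC analysis of the kind shown in Figure 4d can be sketched as follows. The sketch assumes that each scored base carries a ratiometric value and a label saying whether a reference structure model marks it as unpaired. Both inputs, and the function names `roc_points` and `auc`, are hypothetical choices for illustration and not the authors' code.

```python
def roc_points(scores, is_unpaired):
    """Return (false-positive rate, true-positive rate) pairs.

    A base is called "reactive" when its score is >= the threshold;
    thresholds sweep from the highest observed score to the lowest.
    """
    positives = sum(1 for u in is_unpaired if u)
    negatives = len(is_unpaired) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both unpaired and paired bases")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, u in zip(scores, is_unpaired) if s >= threshold and u)
        fp = sum(1 for s, u in zip(scores, is_unpaired) if s >= threshold and not u)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example in which unpaired bases all score above paired bases.
scores = [0.09, 0.07, 0.05, 0.01, 0.00, 0.02]
labels = [True, True, True, False, False, False]
curve = roc_points(scores, labels)
area = auc(curve)
```

An area near 1.0 indicates that reactivity separates unpaired from paired bases well, and an area near 0.5 indicates no discrimination.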
confirmed the agreement of our data with the XBP1 positive-control structure23 (Supplementary Fig. 4). Often, a region of high GC content is considered a candidate region for RNA folding owing to the high stability of G–C pairings, so we investigated this relationship across human transcriptome regions, plotting GC content against the Gini index from DMS-MaPseq (Fig. 3c). A small correlation (R = 0.32) exists, but overall, coding regions have lower GC content, and their RNA appears less structured, as we demonstrated previously4. However, the lack of structure is more pronounced than expected by GC content alone, and noncoding RNA regions are more structured than coding DNA sequence (CDS) regions of comparable GC content. Interestingly, the biggest outliers are small nucleolar RNAs (snoRNAs) and small nuclear RNAs (snRNAs), which have a low GC content but are highly structured.

DMS-MaPseq for specific or low-abundance RNA targets
Low-abundance mRNAs do not receive sufficient sequencing coverage in genome-wide experiments to enable robust conclusions about their structure. Plotting the cumulative r value distribution for mRNA regions between in vivo DMS-MaPseq replicates in yeast reveals that an average mismatch coverage depth of greater than 20× greatly improves data reproducibility (Fig. 4a). However, for genome-wide HEK 293T DMS-MaPseq data, only a limited
[Figure 5 graphic. Panel a: oskar 3′ UTR structure, positions 1 to 114, colored on a 0.0 to 1.0 scale. Panel b: signal versus position (nt). Panel c: mismatch/total ratio versus distance from TSS (nt) for the FXR2 5′ UTR + exon1, with the GUG codon marked and the Stem1, Loop, and Stem2 regions shown in expanded views.]
Figure 5 | Novel experimental applications for in vivo RNA structure probing. (a) oskar 3′ UTR mRNA positive-control structure from target-specific
priming with nucleotides colored by in vivo DMS reactivity in D. melanogaster ovaries. DMS reactivity calculated as the ratiometric DMS signal per
position normalized to the highest number of reads in displayed region, which is set to 1.0. (b) oskar positive-control region from a shown with average
normalized DMS-MaPseq values from two biological replicates, one at 5 min DMS treatment and one at 10 min. Error bars represent 1 s.d. (c) Ratiometric
DMS-MaPseq from targeted amplification of the human FXR2 5′ UTR and exon1 sequence. Nucleotides accessible to DMS are noted with a value >0.03,
which is the threshold representing the best agreement with our model. Position 1 corresponds to chromosome 17:7614897.
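
The thresholding step described in the Figure 5c caption (values above 0.03 are treated as DMS accessible) can be turned into a folding constraint string with a short Python sketch. The sketch rests on assumptions: only A and C are treated as informative, and accessible bases are encoded as `x` and all others as `.`, following the hard-constraint notation documented for ViennaRNA's `RNAfold -C`. Whether this matches the exact settings used in the study is not stated in the text, and the function name `dms_constraints` is hypothetical.

```python
def dms_constraints(sequence, ratios, threshold=0.03):
    """Build a constraint string from per-position mismatch/total ratios.

    'x' marks an A or C whose ratio exceeds `threshold` (treated as
    unpaired); '.' leaves the position unconstrained. Positions with no
    data (None) and all G/U positions are left unconstrained.
    """
    if len(sequence) != len(ratios):
        raise ValueError("sequence and ratios must have the same length")
    marks = []
    for base, ratio in zip(sequence.upper(), ratios):
        informative = base in "AC" and ratio is not None
        marks.append("x" if informative and ratio > threshold else ".")
    return "".join(marks)

# Toy example (illustrative values only).
sequence = "GACUAC"
ratios = [0.10, 0.05, 0.01, 0.08, 0.031, None]
constraint = dms_constraints(sequence, ratios)
print(constraint)
```

Raising or lowering `threshold` changes how many bases are forced to be unpaired, which is one way the choice of threshold can change the predicted structure.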
fraction of genes pass this 20× coverage threshold (Fig. 4b). Even when extrapolated to an exorbitant sequencing depth of 1 billion uniquely mapped reads, many human genes (78%) have insufficient coverage. To probe the in vivo structure of low-abundance mRNAs, we developed and validated a simple targeted RT–PCR implementation of DMS-MaPseq (Fig. 4c). Targeted DMS-MaPseq begins with the in vivo modification of RNA, followed by total RNA extraction, DNase treatment, and rRNA depletion. Then, we reverse-transcribe using the TGIRT enzyme and target-specific primers (primers can be used in combination to amplify multiple RNA species in a single reaction). Directly after cDNA synthesis, target-specific PCR primers amplify the RNA region of interest, and this is followed by NexteraXT tagmentation and sequencing.

To assess data quality from this targeted approach, we examined the structure signal for known RNA structures. We plotted a receiver operating characteristic (ROC) curve to assess the concordance of 18S rRNA DMS-MaPseq data with the published yeast crystal structure model24 and observed an excellent agreement with data from both our genome-wide and targeted approach (Fig. 4d and Supplementary Fig. 5a). We also assessed whether the targeted DMS-MaPseq data supported positive-control mRNA structure models, and we observed excellent agreement with the yeast HAC1 and RPS28B structures21,25 (Fig. 4e,f) as well as with the human XBP1 and MSRB1 structures23,26 (Supplementary Fig. 5b–e). Finally, we observed no signal drop-off in our amplified regions until the primer-binding region, and we observed a low level of background signal (Supplementary Fig. 6).

To reduce PCR amplification biases for quantitative applications or low-input material, we also developed a variation of targeted DMS-MaPseq that tags each RNA molecule with a unique molecular index (UMI) on the RT primer (Supplementary Fig. 7a and Supplementary Table 1). Unique reads can then be isolated easily based on their specific UMI and DMS mutation profiles.
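
Isolating unique reads by UMI and mutation profile can be sketched as a simple de-duplication. The representation is an assumption for illustration: each read is a hypothetical `(umi, mutated_positions)` pair, and two reads count as PCR duplicates only when both the UMI and the set of mutated positions are identical. Sequencing errors in the UMI or in the read are not handled in this sketch.

```python
def collapse_by_umi(reads):
    """Keep one read per distinct (UMI, mutation profile) combination.

    reads: iterable of (umi, mutated_positions) pairs, where
    mutated_positions is any iterable of integer positions.
    Returns the first occurrence of each combination, in input order.
    """
    seen = set()
    unique = []
    for umi, positions in reads:
        key = (umi, frozenset(positions))
        if key not in seen:
            seen.add(key)
            unique.append((umi, tuple(sorted(key[1]))))
    return unique

# Toy example: the second read duplicates the first.
reads = [
    ("ACGT", [5, 12]),
    ("ACGT", [12, 5]),   # same UMI and same mutations: PCR duplicate
    ("ACGT", [7]),       # same UMI, different mutation profile
    ("TTGA", [5, 12]),   # different UMI, same mutation profile
]
unique_reads = collapse_by_umi(reads)
print(unique_reads)
```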
[Figure 6 graphic. Panel a: normalized DMS signal versus position (nt) for the combined alleles, the A allele, and the C allele. Panel b: mismatch/total ratio versus distance from AUG (nt) for the RPL14A pre-mRNA and spliced mRNA, with exon1 and exon2 marked.]
Figure 6 | Investigating RNA structure heterogeneity with DMS-MaPseq. (a) Regions of heterogeneous structure exhibit indistinguishable structure
signals when combined but can be distinguished by DMS-MaPseq, which is illustrated by normalized DMS-MaPseq data derived from the human MRPS21
ribosnitch A–C alleles. Allele-specific data represented as the mean of three technical replicates. Error bars represent 1 s.d.; stars mark A–C nucleotides
with different pairing states between alleles. (b) Targeted DMS-MaPseq data specific for the yeast RPL14A pre-mRNA and spliced mRNA isoforms reveal
minimal structure difference in the common exon1 sequence (R = 0.88). Ratiometric in vivo DMS-MaPseq data is plotted with isoform-specific RT primer
locations noted with arrows.
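
The allele-specific analysis in Figure 6a amounts to sorting reads by the base they carry at the polymorphic site and then computing the ratiometric signal separately for each group. A minimal Python sketch follows, assuming gap-free alignments given as hypothetical `(start, sequence)` pairs with non-negative starts and a known SNP position. The function name `allele_signals` is hypothetical. One caveat of this simple scheme is that a DMS-induced mismatch at the SNP itself could assign a read to the wrong allele.

```python
def allele_signals(reference, reads, snp_pos, alleles=("A", "C")):
    """Return {allele: [mismatch/total per position]} for reads split by SNP.

    Reads that do not cover `snp_pos`, or that show a base outside
    `alleles` there, are skipped. The SNP position itself is excluded
    from the signal and reported as None, as are uncovered positions.
    """
    mism = {a: [0] * len(reference) for a in alleles}
    cov = {a: [0] * len(reference) for a in alleles}
    for start, seq in reads:
        idx = snp_pos - start
        if not 0 <= idx < len(seq) or seq[idx] not in alleles:
            continue
        allele = seq[idx]
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos == snp_pos or pos >= len(reference):
                continue
            cov[allele][pos] += 1
            if base != reference[pos]:
                mism[allele][pos] += 1
    return {a: [m / c if c else None for m, c in zip(mism[a], cov[a])]
            for a in alleles}

# Toy example: SNP at position 2 (A or C), two reads per allele.
reference = "UCAGA"
reads = [(0, "UCAGA"), (0, "UCAGG"), (0, "UCCGA"), (0, "UGCGA")]
by_allele = allele_signals(reference, reads, snp_pos=2)
```

Pooling the same four reads would average the two profiles together, which is the effect the combined-allele panel illustrates.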
The SFT2 and ASH1 yeast mRNAs are weakly expressed and host functional RNA structures in their 5′ and 3′ UTRs, respectively, serving as positive controls for DMS signal detection using a UMI. Indeed, both controls show DMS modification profiles consistent with the known secondary structure models4,27 (Supplementary Fig. 7b,c). Irrespective of their uniqueness, these data are in excellent agreement when processed, which suggests that a UMI may not be necessary for amplification of transcripts of comparable abundance. Given the limitations regarding the size of RNA region assayed with this UMI approach and the expense of longer sequencing reads, choosing between the targeted versions of DMS-MaPseq depends on the region size, target abundance, and quantitative demands of an experiment.

DMS-MaPseq for Drosophila melanogaster ovaries
With their dramatic developmental changes independent of transcription and mRNA degradation, D. melanogaster oocytes provide a premier system for studying mRNA localization and translational control. Many mRNAs are localized during oogenesis28, and while these localization mechanisms are poorly understood globally, RNA structure has been shown to be involved29–31. Here, we apply targeted DMS-MaPseq to D. melanogaster ovaries, which yields excellent structure data at two DMS treatment levels consistent with the oskar and gurken mRNA structures responsible for localization31,32 (Fig. 5a,b and Supplementary Fig. 8). This is the first example of RNA structure probing in an animal tissue and marks a key step forward in investigating the role of RNA structure in mRNA localization in this model system.

A highly structured region influences noncanonical translation initiation
We recently discovered that translation of the mammalian FXR2 (Fragile X Mental Retardation, Autosomal Homolog 2) gene initiates predominantly at a GUG codon33. On account of the extreme GC content (80%) of the first exon of FXR2, we hypothesized a stable RNA structure may contribute to the non-canonical initiation. We used in vitro DMS-MaPseq data to develop a secondary structure model with RNAfold34. This revealed two highly stable putative structures flanking the GUG initiation codon (Fig. 5c, Supplementary Fig. 9; free energy < –31 kcal/mol), with some ambiguity across certain regions depending on the thresholds used to impose folding constraints (see alternative structure model, Supplementary Fig. 10a). We mutated these putative FXR2 structures to perturb the majority of base-pairing interactions in both models and tested their effects within a reporter construct, revealing a drop in protein levels upon mutating either structure (Supplementary Fig. 10b–d and Supplementary Table 2). Compensatory mutations, designed to optimize the restoration of our predicted RNA structures, restored eGFP levels and thus implicated the structure itself as a functional modulator of translation initiation for FXR2. In addition to the compensatory mutations, the in vivo structure signal supports this model (Supplementary Fig. 9c–e).

Structure probing of RNAs in multiple conformations
In the complex environment of the cell, the structure of an RNA molecule may vary based on its current state, such as maturation, translation, protein binding, and degradation. In the case of structural heterogeneity from a ribosnitch (i.e., a single-nucleotide polymorphism that yields a local RNA structure rearrangement), the interpretation of in vitro RNA folding status differs greatly when DMS-MaPseq data from the two human MRPS21 ribosnitch alleles35 are analyzed together or separately. Allele-specific analysis of the data reveals two distinct and mutually exclusive structures, which are not detectable from the combined allele analysis (Fig. 6a). This example illustrates the complexity of analyzing structurally heterogeneous regions and a simple resolution using DMS-MaPseq to separate specific RNA subpopulations by allele.

Of particular interest regarding structural heterogeneity are isoform-specific RNA structures. Structure differences have been proposed between pre-mRNAs and their mature translated counterparts, such as RNA structures that influence splice-site selection36 or affect translation37,38. We used intron- or exon-specific RT primers to separately amplify each isoform of two yeast ribosomal protein genes using targeted DMS-MaPseq. The RNA structure signal in the common exon1 sequence between the RPL14A and RPL31B pre-mRNAs and their respective mature counterparts reveals surprisingly little structure difference
Additionally, DMS-MaPseq allows the selective amplification sequencing data. Genome Res. 23, 377–387 (2013).
4. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J.S.
of RNA targets, including pre-mRNAs or differentially spliced Genome-wide probing of RNA structure reveals active unfolding of mRNA
isoforms. Together, these advances drastically expand the range structures in vivo. Nature 505, 701–705 (2014).
of experimentally accessible RNA species for structural analysis, 5. Wells, S.E., Hughes, J.M., Igel, A.H. & Ares, M. Jr. Use of dimethyl sulfate
enabling a wide range of future studies. In theory, our in vivo to probe RNA structure in vivo. Methods Enzymol. 318, 479–493 (2000).
6. Mortimer, S.A. & Weeks, K.M. A fast-acting reagent for accurate analysis
MaP approach with TGIRT could also be used for SHAPE, which of RNA secondary and tertiary structure by SHAPE chemistry. J. Am. Chem.
would be a valuable and complementary approach. However, Soc. 129, 4144–4145 (2007).
the bulky nature of the best characterized and validated in vivo 7. Smola, M.J., Rice, G.M., Busan, S., Siegfried, N.A. & Weeks, K.M. Selective
2′-hydroxyl acylation analyzed by primer extension and mutational
SHAPE chemical, NAI-N3 (ref. 10), may prove challenging. profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure
Finally, DMS-MaPseq could be combined with the analysis of analysis. Nat. Protoc. 10, 1643–1669 (2015).
endogenous mRNA modifications, including the sequencing- 8. Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure
based mapping of pseudouridines or m6A methylation39–42. These reveals novel regulatory features. Nature 505, 696–700 (2014).
9. Lucks, J.B. et al. Multiplexed RNA structure characterization with
endogenous modifications occur on only a subset of their RNA selective 2′-hydroxyl acylation analyzed by primer extension sequencing
targets. Combined with the single-molecule aspects of DMS- (SHAPE-Seq). Proc. Natl. Acad. Sci. USA 108, 11063–11068 (2011).
MaPseq, it would be possible to evaluate how such endogenous 10. Spitale, R.C. et al. Structural imprints in vivo decode RNA regulatory
mechanisms. Nature 519, 486–490 (2015).
RNA modification affects structure within a single experiment.
It is the versatility of DMS-MaPseq that makes it a transformative tool for in vivo RNA structure probing, allowing for more comprehensive investigations into the biological relevance of RNA structures than ever before.

Methods
Methods and any associated references are available in the online version of the paper.

Accession codes. Raw and processed data are available at NCBI Gene Expression Omnibus, accession number GSE84537.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We thank A. Fields from UCSF for FXR2 reporter plasmids; T. Norman, A. Fields, and J. Quinn for insightful discussions and comments on the manuscript; A. Jaeger (Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA) for providing HEK 293T cells; and the Orr-Weaver lab at the Whitehead Institute for providing flies. We also thank Y. Chen, D. Bogdanoff, E. Chow, and J. Lund at the UCSF Center for Advanced Technology for sequencing assistance; J. Love and S. Levine in the Whitehead Core and MIT BioMicro Center for library preparation; and C. Reiger, M. DeVera, J. Kanter, and G. McCauley for administrative support. This research was supported by the CRSB (Center for RNA Systems Biology; grant P50 GM102706 to J.S.W.), the Howard Hughes Medical Institute (J.S.W.), the National Science Foundation grant 1144247 (M.Z.), and the Genentech Foundation (M.Z.). Research on TGIRTs and their modes of use was supported by NIH R01 grants GM37949 and GM37951 (A.M.L.).

11. Poulsen, L.D., Kielpinski, L.J., Salama, S.R., Krogh, A. & Vinther, J. SHAPE Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data. RNA 21, 1042–1052 (2015).
12. Kwok, C.K., Tang, Y., Assmann, S.M. & Bevilacqua, P.C. The RNA structurome: transcriptome-wide structure probing with next-generation sequencing. Trends Biochem. Sci. 40, 221–232 (2015).
13. Strobel, E.J., Watters, K.E., Loughrey, D. & Lucks, J.B. RNA systems biology: uniting functional discoveries and structural tools to understand global roles of RNAs. Curr. Opin. Biotechnol. 39, 182–191 (2016).
14. Homan, P.J. et al. Single-molecule correlated chemical probing of RNA. Proc. Natl. Acad. Sci. USA 111, 13858–13863 (2014).
15. Siegfried, N.A., Busan, S., Rice, G.M., Nelson, J.A.E. & Weeks, K.M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
16. Smola, M.J., Calabrese, J.M. & Weeks, K.M. Detection of RNA–protein interactions in living cells with SHAPE. Biochemistry 54, 6867–6875 (2015).
17. Inoue, T. & Cech, T.R. Secondary structure of the circular form of the Tetrahymena rRNA intervening sequence: a technique for RNA structure analysis using chemical probes and reverse transcriptase. Proc. Natl. Acad. Sci. USA 82, 648–652 (1985).
18. Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958–970 (2013).
19. Katibah, G.E. et al. Broad and adaptable RNA structure recognition by the human interferon-induced tetratricopeptide repeat protein IFIT5. Proc. Natl. Acad. Sci. USA 111, 12025–12030 (2014).
20. Beckman, R.A., Mildvan, A.S. & Loeb, L.A. On the fidelity of DNA replication: manganese mutagenesis in vitro. Biochemistry 24, 5810–5817 (1985).
21. Badis, G., Saveanu, C., Fromont-Racine, M. & Jacquier, A. Targeted mRNA degradation by deadenylation-independent decapping. Mol. Cell 15, 5–15 (2004).
22. Aviran, S. & Pachter, L. Rational experiment design for sequencing-based RNA structure mapping. RNA 20, 1864–1877 (2014).
31. Jambor, H., Brunel, C. & Ephrussi, A. Dimerization of oskar 3′ UTRs promotes hitchhiking for RNA localization in the Drosophila oocyte. RNA 17, 2049–2057 (2011).
32. Van De Bor, V., Hartswood, E., Jones, C., Finnegan, D. & Davis, I. gurken and the I factor retrotransposon RNAs share common localization signals and machinery. Dev. Cell 9, 51–62 (2005).
148–162 (2014).
41. Meyer, K.D. et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149, 1635–1646 (2012).
42. Dominissini, D. et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485, 201–206 (2012).
quenched by adding a 30 ml stop solution comprised of 30% beta-mercaptoethanol (from a 14.2 M stock) and 50% isoamyl alcohol, after which cells were quickly put on ice, collected by centrifugation at 3,500 × g at 4 °C for 4 min, and washed with 10 ml 30% BME solution. Cells were then resuspended in 0.6 ml total RNA lysis buffer (6 mM EDTA, 45 mM NaOAc pH 5.5), and total RNA was purified with hot acid phenol (Ambion) and EtOH precipitation. Ribosomal RNA was depleted using RiboZero (Epicentre), either directly after RNA extraction or postligation in the genome-wide library preparation. Denatured RNA structure samples were treated as in DMS-seq4. For HEK 293T cells, 15 cm45 plates with 15 ml of media were treated with the addition of 300 µl DMS and incubation at 37 °C for 4–5 min. Media with DMS was decanted, and plates were washed twice in 30% BME (v/v). Cells were resuspended in Trizol, and RNA was isolated according to the manufacturer's protocol. For D. melanogaster oocytes, we dissected ovaries from ~100 flies (OreR strain) in 250 µl 1× PBS. We added 250 µl DMS for 5 min at 26 °C with shaking at 500 r.p.m. To stop the reaction, we added 1 ml of 30% BME (v/v) and transferred the oocytes to a sieve, where they were washed three times in 30% BME and two times with sterile water. Finally, the ovaries were collected and resuspended in 1 ml of Trizol and 10 µl BME, and total RNA was extracted.

Library generation and genome-wide DMS-MaPseq. Sequencing libraries were prepared with a modified version of the protocol used for DMS-seq4. Specifically, 10 µg of DMS-treated total RNA was denatured for 2 min at 95 °C then fragmented at 95 °C for 2 min in 1× RNA Fragmentation Reagent (Zn2+ based, Ambion). Note that this is an increase in starting material over the 1–3 µg used in our previous DMS-seq approach (REF). The reaction was stopped with 1× Stop Solution (Ambion) and quickly placed on ice. The fragmented RNA was run on a 6% Tris Borate Urea (TBU) polyacrylamide gel for 45 min at 150 V. A blue light (Invitrogen) was used for gel imaging, and RNA fragments of 100–170 nt in size were excised, depleting small ncRNA contaminants of <100 nt (tRNAs, snoRNAs). Gel extraction was performed by crushing the purified gel piece and incubating in 300 µl 300 mM NaCl at 70 °C for 10 min with vigorous shaking. The RNA was then precipitated by adding 2 µl GlycoBlue (Invitrogen) and 3× volume (900 µl) 100% EtOH, incubating on dry ice for 20 min and spinning at 20,000 × g for 45 min at 4 °C. The samples were then resuspended in 7 µl 1× CutSmart buffer (NEB), and the 3′ phosphate

was incubated at 80 °C for 2 min to denature the template, then it was returned to ice for the addition of SUPERase Inhibitor (Ambion), DTT, dNTPs, and RT enzyme to generate the final reaction conditions. For reverse transcription using SuperScript II with Mn2+ buffer, we followed the exact published reaction conditions for mutational profiling14 (0.5 mM dNTPs, 50 mM Tris–HCl pH 8.0, 75 mM KCl, 6 mM MnCl2, and 10 mM DTT) and allowed the reaction to proceed for 2–3 h at 42 °C with 100 U of SuperScript II (Invitrogen). Because TGIRT may pause at modification sites, this long incubation time facilitates readthrough of multiple modifications per RNA fragment. For the TGIRT reverse transcription, a 5 min incubation at room temperature followed the initial denaturation, and the RT reaction proceeded for 1.5 h at 57 °C with 100 U TGIRT-III enzyme (InGex) and the following reaction conditions: 1 mM dNTPs, 5 mM freshly prepared DTT (Sigma-Aldrich), 10 U SUPERase Inhibitor, 50 mM Tris–HCl pH 8.3, 75 mM KCl, and 3 mM MgCl2. After reverse transcription, 1 µl of 5 M NaOH was added and the reaction incubated for 3 min at 95 °C to degrade the RNA, followed by EtOH precipitation and gel purification to remove excess RT primer. Finally, cDNAs were circularized using CircLigase (Epicentre), and Illumina sequencing adapters and indexes were introduced by 9–13 cycles of PCR using Phusion HF Polymerase (NEB), oNTI231, and indexing primers with TruSeq 6 bp indices. Libraries were sequenced with oNTI202 in 50 nt single-end reads on the HiSeq4000 (Illumina). See primer sequences in Supplementary Table 1.

Library generation and targeted DMS-MaPseq. After in vivo DMS treatment and total RNA extraction, 5 µg of total RNA was DNase-treated for 30 min at 37 °C in 1× TURBO DNase buffer with 1 µl TURBO DNase enzyme (Thermo Fisher Scientific). Reactions were desalted using RNA Clean & Concentrator-5 columns (Zymo Research), and rRNA was depleted using RiboZero (Epicentre) or RNase H for D. melanogaster and HEK 293T samples; RNase H treatment was implemented with slight modifications to the published protocol46. For the RNase H protocol, briefly, 5 µg of total RNA was depleted of small RNA species with a Zymo RNA Clean & Concentrator-5 column, retaining RNA >200 nt per manufacturer instructions. RNase H subtraction was performed by adding 5 µg of published subtraction oligos46 in a total volume of 30 µl in 1× hybridization buffer (200 mM NaCl, 100 mM Tris pH 7.5). The mixture was incubated at 68 °C for 1 min, and the temperature was ramped down at a rate of 1 °C/min to 45 °C. MgCl2 was
tion to ice, 1 µl RNase H (Enzymatics, 5 U/µl) was added, and RNA–DNA hybrids were degraded at 37 °C for 20 min to release the cDNA. We used RNase H at this step for convenience; NaOH hydrolysis as used in the genome-wide protocol also works well at this step. cDNA was purified using the ssDNA protocol for DNA Clean & Concentrator-5 columns (Zymo Research).

We used the Advantage HF 2 PCR kit (Clontech) with high-fidelity conditions for two-step PCR amplification, using 1/12 of the purified RT reaction and gene-specific primers targeting a single template with a target amplicon size of 300–600 nt for low-abundance RNA targets. When possible, we designed our gene-specific RT primers close to the PCR amplicon of interest, and in many cases we used the RT primer as the reverse primer in our PCR reactions. High-abundance RNAs, such as the yeast 18S rRNA, can be amplified in a single 1.8 kb amplicon. Because of the high GC content of the FXR2 template, we used 200 mM NaCl instead of 75 mM KCl in the RT reaction buffer and the Advantage GC 2 PCR Kit (Clontech) for its amplification. The PCR program begins with 10 cycles at a 65 °C annealing temperature to promote specificity, followed by 20–25 cycles at a 57 °C annealing temperature. PCR bands were gel purified on a nondenaturing 8% TBE polyacrylamide gel (Invitrogen) and crushed, extracted, and EtOH precipitated as described above. NexteraXT (Illumina) was used to fragment and prepare amplicons (1 ng) for sequencing. Tagmented amplicons were barcoded and amplified using 12 cycles of PCR, and barcoded libraries were cleaned using 1.5× (v/v) PCRClean beads (Aline Biosciences). Libraries were quantified using the Fragment Analyzer (Advanced Analytical) and subjected to a final quantification by qPCR before sequencing by 50 bp single-end reads on the HiSeq4000 (Illumina).

For the UMI-based RT–PCR, reverse transcriptase primers were designed with a random 10 nt barcode, labeling each cDNA with a unique molecular index. Gene-specific variations of oMZ282 were used in the reverse transcription reaction described above, followed by Advantage HF 2 PCR with gene-specific variants of primers oMZ282 and oMZ283. Amplicons were purified by polyacrylamide gel and extracted as described above, and a second round of PCR was done with 20–25 cycles to add Illumina adaptors and indices for sequencing (oMZ284 and indexing primers). Libraries were constructed so the UMI was sequenced first using custom Read1 sequencing primer oNTI202. We used the standard Illumina Read2 primer, and sequencing was done via MiSeq v2 2 × 150 (Illumina). See primer sequences in Supplementary Table 1.

stripped of linker sequences and filtered for quality using the FASTX-Toolkit Clipper and Quality Filter functions, respectively, requiring that 80% of sequenced bases have a quality score >25 (http://hannonlab.cshl.edu/fastx_toolkit/). Reads were aligned using TopHat v2.1.0 with Bowtie2 with the following settings for a 50 nt sequencing run: --no-novel-juncs -N 5 --read-gap-length 7 --read-edit-dist 7 --max-insertion-length 5 --max-deletion-length 5 -g 3. All nonuniquely aligned reads were then removed. Sequencing data was aligned against the Saccharomyces cerevisiae assembly R64 (UCSC, sacCer3) downloaded from the Saccharomyces Genome Database on February 8, 2011 (SGD, http://www.yeastgenome.org) or against the longest human RefSeq isoforms (hg19). Despite template-switching capabilities of the TGIRT enzyme, we did not detect a substantial number of chimeric reads in our data and did not include a processing step beyond alignment to remove these. On account of empirically determined mutation enrichment from nontemplate addition and Nextera XT transposase insertion, we trimmed 2, 5, and 7 nt from the 5′ end of each read for TGIRT-, SSII–Mn2+-, and NexteraXT-generated libraries, respectively. Mismatches located within 3 nt of an indel were also excluded from subsequent analysis. The ratiometric DMS signal was calculated for each nucleotide as number of mismatches/sequencing depth.

Target-specific sequencing data prepared with NexteraXT were combined across both strand alignments, because of lack of strandedness after tagmentation. Transposase insertion is subject to primary sequence biases; thus it is possible (although rare) to have amplicon regions that are poorly sampled and result in false-positive bases with high ratiometric reactivity due to poor sequencing depth. After linker stripping with a length requirement for reads >100 nt from a 2 × 150 nt MiSeq run, target-specific sequencing data prepared with the UMI was collapsed to unique reads using FASTX-Collapser. Unique reads are, therefore, the combination of a unique molecular index and internal DMS-induced modifications, which add sequence diversity beyond the 10 bp UMI.

Genome-wide yeast DMS-MaPseq data was collected and sequenced with two biological replicates for each SSII–Mn2+ and TGIRT, untreated and in vivo DMS-treated libraries. For each library variation, we collected a combined total of 90 to 200 million uniquely mapped reads between yeast replicates and 200 million for HEK 293T cells. Note that we sequenced to a similar depth for a genome-wide DMS-MaPseq experiment as we did for our previously published genome-wide DMS-seq method43.
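
As an illustration of the ratiometric DMS signal described in the alignment and analysis text (number of mismatches divided by sequencing depth, after trimming the 5′ end of each read and discarding mismatches within 3 nt of an indel), a minimal Python sketch follows. It is not the authors' pipeline. The function name `ratiometric_signal`, the input format (a reference start position plus a pair of gapped alignment strings per read), the choice to measure the 3 nt window in alignment columns, and the choice to count masked or matching positions toward depth are all assumptions made for this example.

```python
def ratiometric_signal(alignments, ref_len, trim5=2, indel_window=3):
    """Per-nucleotide mismatch rate (mismatches / depth).

    alignments: iterable of (ref_start, ref_aln, read_aln), where ref_aln and
        read_aln are equal-length gapped strings ('-' marks a gap) written
        5'->3' along the read. This input format is hypothetical.
    trim5: number of read bases ignored at the 5' end of each read
        (the text uses 2, 5, or 7 depending on library type).
    indel_window: mismatches within this many alignment columns of an indel
        are not counted (assumed interpretation of "within 3 nt").
    """
    mismatches = [0] * ref_len
    depth = [0] * ref_len

    for ref_start, ref_aln, read_aln in alignments:
        # Alignment columns that contain an insertion or a deletion.
        indel_cols = [
            i for i, (r, q) in enumerate(zip(ref_aln, read_aln))
            if r == "-" or q == "-"
        ]
        ref_pos = ref_start   # current reference coordinate
        read_idx = 0          # read bases consumed so far

        for col, (r, q) in enumerate(zip(ref_aln, read_aln)):
            if q == "-":      # deletion: consumes reference only
                ref_pos += 1
                continue
            if r == "-":      # insertion: consumes read only
                read_idx += 1
                continue
            if read_idx >= trim5:
                depth[ref_pos] += 1
                near_indel = any(abs(col - c) <= indel_window for c in indel_cols)
                if r != q and not near_indel:
                    mismatches[ref_pos] += 1
            read_idx += 1
            ref_pos += 1

    # Positions with no coverage are reported as None rather than 0.
    return [m / d if d else None for m, d in zip(mismatches, depth)]


# Toy example on a 10 nt reference (made-up sequences, for illustration only).
example_reads = [
    (0, "ACGTACGTAC", "ACGTTCGTAC"),  # one mismatch at reference position 4
    (0, "ACGTACGTAC", "ACGTACGTAC"),  # perfect match
    (2, "GTACGTAC", "GTAC-TTC"),      # deletion with a nearby mismatch (masked)
]
signal = ratiometric_signal(example_reads, ref_len=10)
```

Because the signal is a ratio, positions with low depth give unstable values, which is the poorly sampled amplicon caveat raised for NexteraXT libraries; a minimum depth cutoff could be applied before interpreting the output.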