
Zubradt 2016


Articles

DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo
Meghan Zubradt1, Paromita Gupta2, Sitara Persad2, Alan M Lambowitz3, Jonathan S Weissman1 & Silvi Rouskin2

Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq’s capacity to dramatically expand in vivo analysis of RNA structure.

RNA is a functionally diverse molecule that both carries genetic information and directly conducts biological processes through its ability to fold into complex secondary and tertiary structures1. The discovery of functional RNA structures depends critically on accurate, targeted, and accessible RNA structure determination methods, particularly in vivo. Sequence information alone is generally not sufficient for prediction of RNA structure, but by combining sequence information with experimental structure data at single-nucleotide resolution, one can often obtain an accurate assessment of RNA folding status and discover novel RNA structures2–4.

Existing high-resolution techniques to measure RNA secondary structure are based on structure-specific chemical modification. DMS has emerged as one of the pre-eminent choices for this application. DMS rapidly and specifically modifies unpaired adenines and cytosines in vivo at their Watson–Crick base-pairing positions5. Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) chemicals are another powerful option for chemical RNA structure probing. Because of their distinct mechanisms of modification, DMS and SHAPE report on different and complementary aspects of RNA structure6,7. In early efforts, chemical lesions from either SHAPE or DMS were detected when the reverse transcriptase (RT) enzyme terminated cDNA synthesis upon reaching a modified nucleotide. We and others have coupled the chemical probing of RNA structure to next-generation sequencing (Fig. 1), allowing for experimental analysis of RNA structure on a global scale in vitro or in vivo4,8–11 (see refs. 12 and 13 for reviews of sequencing-coupled RNA structure techniques). Globally, these experiments have revealed substantial differences in RNA structure in vivo versus in vitro, underscoring the importance of examining RNA structure in its native cellular environment4,10.

Despite important contributions to RNA structure discovery, truncation-based approaches using either DMS or SHAPE have intrinsic limitations that render them unsuitable to address certain biological questions, such as the heterogeneity of RNA structures in vivo. We sought to develop an in vivo and genome-wide approach that would overcome existing limitations in truncation strategies by encoding DMS lesions as mutations instead of as cDNA truncations, as has been recently described for individual or highly abundant RNA targets7,14–16. Such mutational profiling (MaP) approaches confer several advantages. These advantages include the resolution of enzymatic biases proximal to the information-encoding nucleotide and (most importantly) the analysis of multiple chemical modification sites per molecule, which opens up the possibility of distinguishing heterogeneous RNA structure subpopulations from one another in vivo. In truncation approaches, only a single site of chemical modification can be observed per RNA molecule; thus, the structure signal corresponds to a population average. Additionally, low-abundance RNAs are not conducive to truncation-based RNA structure probing. Specifically, they are poorly sequenced on a genome-wide scale, and input requirements for available low-throughput methods often necessitate in vitro transcription before structure profiling6,14,15,17. We reasoned that an in vivo MaP approach would make it possible to perform targeted amplification of low-abundance RNA species while retaining a record of the modification sites.

© 2016 Nature America, Inc., part of Springer Nature. All rights reserved.

1Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biology, Center for RNA Systems Biology, Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, California, USA. 2Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA. 3Institute for Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA. Correspondence should be addressed to J.S.W. (jonathan.weissman@ucsf.edu) or S.R. (srouskin@wi.mit.edu).
Received 24 June; accepted 29 September; published online 7 November 2016; doi:10.1038/nmeth.4057

nature methods | ADVANCE ONLINE PUBLICATION | 


Here we describe DMS-MaPseq, an RNA structure probing strategy that takes advantage of a high-fidelity and processive thermostable group II reverse transcriptase (TGIRT) enzyme. We apply this technique globally in vivo and for selected RNA species, including low-abundance RNA targets in yeast and human cells, producing the high signal and low background necessary for high data quality. We also highlight a simple RT–PCR approach for targeted amplification and demonstrate RNA experiments inaccessible by previous techniques such as the investigation of isoform-specific RNA structure and the discovery of a functional structure in the low-abundance human FXR2 mRNA. DMS-MaPseq enables a far broader exploration of in vivo RNA structure and offers an accessible technical solution to address structure–function hypotheses for virtually any RNA, regardless of its abundance.

RESULTS
Development of genome-wide in vivo RNA structure probing with mutational profiling
For DMS-MaPseq, we treated cells with a high concentration of DMS to increase the number of modifications detected per fragment, modifying approximately 1 in 50 nucleotides. We compared data produced at this DMS concentration (5% v/v) to previously validated concentrations4, and we observed excellent correlation of the RNA structure signal both globally and for each nucleotide in the yeast 18S rRNA (Supplementary Fig. 1; R = 0.94 and R = 0.98, respectively). For applications that aim to use even higher DMS levels, it will be important to do a similar analysis to evaluate whether RNA structures are perturbed with increasing DMS concentrations. After DMS treatment and total RNA extraction, random fragmentation with Zn2+, and the removal of ribosomal RNA, we did a broad size selection, ligated a 3′ adaptor, and reverse transcribed under conditions in which chemically modified bases were encoded as mutations in the cDNA (Fig. 1). Consequently, multiple modifications can be observed on a single cDNA fragment, providing an essential framework for future applications of single-molecule RNA structure determination.

The accuracy of DMS-MaPseq depends critically on reverse transcription conditions that optimize the detection of DMS modifications while retaining high fidelity and processivity during cDNA synthesis. The TGIRT enzyme was recently adapted with these latter priorities in mind and notably produces mismatches at endogenous m1A and m3C tRNA residues, the exact methylation profiles of a DMS modification18,19. Additionally, Superscript II with Mn2+ buffer (SSII–Mn2+) has been used for the mutational read through of DMS and SHAPE modifications for abundant individual RNA species14–16. To compare the suitability of these two enzymes for our in vivo DMS-MaPseq approach, we prepared genome-wide yeast libraries with each. Encoding DMS modifications as mismatches inherently retains the single-nucleotide resolution of DMS while insertions or deletions (indels) suffer from positional ambiguity when aligned across a homopolymeric stretch. TGIRT does not produce a high number of indels (6%, Fig. 2a). However, we found that nearly a third of DMS-induced mutations from SSII–Mn2+ reverse transcription were insertions or deletions. Next, we used two endogenous m1A modifications on the yeast 25S rRNA as internal controls for DMS lesion detection. The frequency of mismatches at these residues across TGIRT replicate experiments revealed m1A detection at 85% and 48% average frequency, placing a lower bound on the fraction of these endogenous modifications. SSII–Mn2+ yielded average detection frequencies of only 53% and 1.4% at the same modified residues (Fig. 2b). This tendency of SSII–Mn2+ to under-report the DMS modification signal in a context-dependent manner could severely undermine data quality.

A valuable measure for the signal-to-noise ratio in DMS data is the enrichment of signal on adenines and cytosines4 (Supplementary Fig. 2a). When the same source of DMS-modified RNA was reverse transcribed using either TGIRT or SSII–Mn2+, we observed a far greater fraction of mismatches on A–Cs using TGIRT (TGIRT, 93.5%; SSII–Mn2+, 84%) (Fig. 2c). This high A–C signal in TGIRT data also exceeds that of our previously published DMS-seq strategy based on cDNA truncation, and there are notable differences in the relative contributions of A–Cs4. Analysis of the mismatch nucleotide bias in DMS-seq reveals that 54% of mismatches occur on cytosines in a DMS-dependent manner, suggesting that truncation at cytosines is not robust14 (Supplementary Fig. 2b,c). Notably, the signal on adenines is lower with SSII–Mn2+ than with the other techniques, which suggests an underlying failure of SSII–Mn2+ to robustly encode m1A modifications consistent with the low signal detection on the endogenous rRNA residues.

Both TGIRT and SSII–Mn2+ produce excellent signal at unpaired A–C residues in the yeast RPS28B positive-control structure, but the SSII–Mn2+ data reveal high background signal on certain G–U residues, suggesting a propensity for nonrandom errors in cDNA synthesis. This higher background error for SSII–Mn2+ is also reflected in the genome-wide frequency of mutations and indels on matched untreated and DMS-treated RNA (Supplementary Fig. 2d), which is consistent with the historical use of Mn2+ buffer in deliberate mutagenesis during oligonucleotide

[Figure 1 schematic not reproduced. cDNA truncation strategies: random fragmentation of modified RNA; 3′ ligation of adaptor and reverse transcription; selection of modified RNA or cDNA; library generation and sequencing; identify ends; structure calculations relative to control. DMS-MaPseq: random fragmentation of modified RNA (+DMS); 3′ ligation of adaptor and reverse transcription; circularization, PCR of cDNA fragments and sequencing; identify mutations; ratiometric calculation (mismatch/total).]

Figure 1 | Sequencing library generation for RNA structure probing techniques. Schematic of library preparation strategies for cDNA truncation approaches (top) and for DMS-MaPseq (bottom).
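
The ratiometric calculation named in the schematic (mismatch/total) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `ratiometric_signal`, the per-position count lists, and the `min_coverage` cutoff are all assumptions introduced here for illustration.

```python
def ratiometric_signal(mismatch_counts, total_counts, min_coverage=1):
    """Per-position DMS signal as mismatched reads / total reads.

    mismatch_counts and total_counts are hypothetical per-position tallies
    taken from aligned reads. Positions with fewer than min_coverage reads
    get None instead of a ratio (an assumption, not a rule from the paper).
    """
    if len(mismatch_counts) != len(total_counts):
        raise ValueError("count lists must have the same length")
    signal = []
    for mismatches, total in zip(mismatch_counts, total_counts):
        if total < min_coverage:
            signal.append(None)  # not enough reads to estimate a rate
        else:
            signal.append(mismatches / total)
    return signal


if __name__ == "__main__":
    # Three positions: 2 of 100 reads mutated, 0 of 50, and no coverage.
    print(ratiometric_signal([2, 0, 5], [100, 50, 0]))
```

Because each position is scored against its own read depth, no untreated control is needed to compute the ratio, which matches the population-level definition given in the text.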


[Figure 2 graphics not reproduced. Recoverable panel content follows.]

Panel a, percentage of total mutations:
Reverse transcriptase    Mismatches    Deletions    Insertions
TGIRT                    94%           5%           1%
SSII–Mn2+                71%           20%          9%

Panel b: mismatch/total (%) at the two endogenous 25S m1A sites (labelled 2,142 nt and 645 nt) for TGIRT and SSII–Mn2+.

Panel c, nucleotide composition of mismatches:
Method       A      C        G       T
TGIRT        51%    42.5%    4%      2.5%
SSII–Mn2+    35%    49%      10%     6%
DMS-seq      59%    32%      4.5%    4.5%

Panel d: RPS28B structure for SSII–Mn2+ and TGIRT, colored on a 0.0–1.0 scale. Panel e: R value and Gini index difference (Rep1–Rep2) for TGIRT and SSII–Mn2+.

[Figure 3 graphics not reproduced. Panel a: normalized DMS signal versus 25S rRNA position (nt), 580–680, for DMS-MaPseq and DMS-seq on denatured rRNA, with the endogenous m1A marked. Panel b: fraction of bases versus ratiometric DMS signal for DMS-seq and DMS-MaPseq. Panel c: GC content versus Gini index (increasing structure) for CDS, noncoding, and snRNA or snoRNA windows.]

Figure 2 | TGIRT enzyme delivers higher signal and lower background for DMS-MaPseq. (a) Distribution of mutation type generated by SSII–Mn2+ or TGIRT reverse transcription from in vivo DMS-treated yeast mRNA. (b) Endogenous m1A modifications in yeast 25S rRNA transcript reveal superior modification detection with TGIRT. Average percent modification (bar) detected at the position across two biological DMS-treated replicates (circles) with error bars representing s.d. from the average. (c) Nucleotide composition of mismatches from TGIRT or SSII–Mn2+ approaches. (d) Yeast RPS28B mRNA positive-control structure with nucleotides colored by DMS reactivity in vivo. Numbers represent the nucleotide position within the displayed region. Black boxes outline G–U bases with high background signal. DMS reactivity was calculated as the average ratiometric DMS signal per position across two biological replicates normalized to the highest number of reads in displayed region, which is set to 1.0. (e) Genome-wide DMS-MaPseq replicates (Rep1 and Rep2) compared by Pearson’s R value and Gini index for yeast mRNA regions (requiring 15× coverage, resulting in 733 and 272 regions displayed for TGIRT and SSII–Mn2+, respectively).

Figure 3 | Global analysis of in vivo DMS-MaPseq data. (a) Signal decay observed after endogenous m1A modification at position 642 in the yeast 25S rRNA in DMS-seq but not in DMS-MaPseq. (b) Histogram of ratiometric reactivity for negative-control bases in the yeast 18S rRNA. The total number of negative-control bases is 338, characterized as bases known to be base paired. (c) Scatter plot of GC content versus Gini index in 50-nucleotide (nt) windows of deeply sequenced genes. Noncoding RNA regions include UTRs and all classes of mammalian noncoding RNAs. CDS regions are coding regions. A gray box is placed around small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). The total number of evaluated windows is 182. Pearson’s correlation = 0.32; P value = 7.3 × 10^−6.

synthesis20. Other RNA structure methods have subtracted background signal on a nucleotide-by-nucleotide basis15; however, we see an increase in noise after applying a background correction to the RPS28B positive-control structure21 (Fig. 2d and Supplementary Fig. 3a–d). Global investigation reveals a poor correlation of background signal for both TGIRT and SSII–Mn2+ untreated replicates, suggesting that background signal is variable and stochastic (Supplementary Fig. 3e,f). Thus, a key advantage of DMS-MaPseq is the ratiometric nature of the data (i.e., in a population-level analysis, the rate of modification at each position is equal to the ratio of mutated reads to total reads; Fig. 1). Untreated or denatured DMS-MaPseq controls may still be useful in the discovery of endogenous mRNA modifications encoded during reverse transcription19, uncharacterized single-nucleotide polymorphisms, or as a negative control, but they are not a necessary component for single-nucleotide RNA structure calculations.

We used replicates to assess the reproducibility of the RNA structure signal across yeast transcriptome regions as measured by r value and the Gini index difference, an established RNA structure metric to assess the evenness of the data distribution4 (Fig. 2e). This analysis revealed a stronger reproducibility between data generated by TGIRT than by SSII–Mn2+, consistent with our observations of high background noise in the latter approach. Because of the high DMS signal and low background error observed across many quality control metrics, we chose the TGIRT enzyme for all further DMS-MaPseq experimentation and method development.

Global analysis of DMS-MaPseq data
When DMS lesions are detected by truncation, only the most 3′ DMS modification on an RNA fragment will be detected. For this reason, DMS treatment conditions must be carefully titrated to avoid improper hit kinetics and 5′ signal decay22. This effect is illustrated by the lack of DMS-seq signal immediately 5′ of an endogenous m1A residue in denatured yeast 25S rRNA (Fig. 3a). This drop off does not occur with DMS-MaPseq data, confirming that the TGIRT enzyme can encode multiple DMS lesions in a short sequence space. Additionally, negative-control bases in the yeast rRNA fall overwhelmingly into the lowest bin of reactivity in DMS-MaPseq data, confirming low background noise relative to previous DMS-seq data4 (Fig. 3b).

We also collected a genome-wide in vivo DMS-MaPseq data set from human embryonic kidney (HEK) 293T cells, and we


[Figure 4 graphics not reproduced. Panel a: fraction of mRNA regions versus R value, for average mismatch coverage of 10×, 20×, 30×, 40×, and 50×. Panel b: fraction of genes versus average mismatch coverage from genome-wide DMS-MaPseq data, at 50 million, 100 million, 200 million, and 1 billion reads, with the 20× coverage threshold marked (0.22 at 1 billion). Panel c: mRNA-specific RT (or oligo dT priming); mRNA-specific PCR; NexteraXT fragmentation; NexteraXT PCR, add adaptors and indices; align reads, ratiometric calculation (mismatch/total) for structure signal. Panel d: true-positive rate (%) versus false-positive rate (%) for tagmentation and genome-wide DMS-MaPseq. Panels e and f: HAC1 and RPS28B structures colored on a 0.0–1.0 scale.]

Figure 4 | DMS-MaPseq enables in vivo RNA structure probing for specific RNA targets. (a) Cumulative histogram of Pearson’s R values between yeast
mRNA regions in DMS-MaPseq replicates at varied depths of average mismatch coverage. (b) Fraction of genes exceeding the minimum average mismatch
coverage of 20× in genome-wide human HEK 293T DMS-MaPseq data with varied sequencing depths. 0.006, 0.009, and 0.03 are the fraction of genes
passing this threshold at 50, 100, and 200 million uniquely mapped reads, respectively. (c) Schematic for targeted RNA structure probing via target-
specific RT–PCR and NexteraXT tagmentation. (d) ROC curve for DMS signal on yeast 18S rRNA using ratiometric data from target-specific tagmentation
approach and from genome-wide DMS-MaPseq. (e,f) Yeast HAC1 (e) and RPS28B (f) 3′ UTR mRNA positive-control structures from target-specific priming
with nucleotides colored by DMS reactivity in vivo. DMS reactivity calculated as the ratiometric DMS signal per position normalized to the highest number
of reads in displayed region, which is set to 1.0.
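
The ROC comparison in panel d can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' analysis: it assumes a list of per-base ratiometric signals and a matching list of booleans marking bases that a reference structure calls unpaired. The names `roc_points` and `area_under_curve` are hypothetical.

```python
def roc_points(signal, is_unpaired):
    """Return (false-positive rate, true-positive rate) points.

    A base is called "unpaired" when its signal is at or above a threshold;
    the threshold is swept from the highest signal value to the lowest.
    """
    pairs = sorted(zip(signal, is_unpaired), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, unpaired in pairs if unpaired)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one paired and one unpaired base")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Bases with identical signal cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.1, 0.05], [True, True, False, False])
    print(pts, area_under_curve(pts))
```

An area near 1.0 would indicate that high signal separates unpaired from paired reference bases well; an area near 0.5 would indicate no separation.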

confirmed the agreement of our data with the XBP1 positive-control structure23 (Supplementary Fig. 4). Often, a region of high GC content is considered a candidate region for RNA folding owing to the high stability of G–C pairings, so we investigated this relationship across human transcriptome regions, plotting GC content against the Gini index from DMS-MaPseq (Fig. 3c). A small correlation (R = 0.32) exists, but overall, coding regions have lower GC content, and their RNA appears less structured, as we demonstrated previously4. However, the lack of structure is more pronounced than expected by GC content alone, and noncoding RNA regions are more structured than coding DNA sequence (CDS) regions of comparable GC content. Interestingly, the biggest outliers are small nucleolar RNAs (snoRNAs) and small nuclear RNAs (snRNAs), which have a low GC content but are highly structured.

DMS-MaPseq for specific or low-abundance RNA targets
Low-abundance mRNAs do not receive sufficient sequencing coverage in genome-wide experiments to enable robust conclusions about their structure. Plotting the cumulative r value distribution for mRNA regions between in vivo DMS-MaPseq replicates in yeast reveals that an average mismatch coverage depth of greater than 20× greatly improves data reproducibility (Fig. 4a). However, for genome-wide HEK 293T DMS-MaPseq data, only a limited


[Figure 5 graphics not reproduced. Panel a: oskar 3′ UTR structure, nucleotides 1–114, colored on a 0.0–1.0 scale. Panel b: normalized DMS signal versus position (nt), 1–114. Panel c: FXR2 5′ UTR + exon1, mismatch/total ratio versus distance from TSS (nt), with the GUG codon marked, enlarged views of the Loop, Stem1, and Stem2 regions, and a structure schematic (82 nt, GUG, 263 nt).]


Figure 5 | Novel experimental applications for in vivo RNA structure probing. (a) oskar 3′ UTR mRNA positive-control structure from target-specific
priming with nucleotides colored by in vivo DMS reactivity in D. melanogaster ovaries. DMS reactivity calculated as the ratiometric DMS signal per
position normalized to the highest number of reads in displayed region, which is set to 1.0. (b) oskar positive-control region from a shown with average
normalized DMS-MaPseq values from two biological replicates, one at 5 min DMS treatment and one at 10 min. Error bars represent 1 s.d. (c) Ratiometric
DMS-MaPseq from targeted amplification of the human FXR2 5′ UTR and exon1 sequence. Nucleotides accessible to DMS are noted with a value >0.03,
which is the threshold representing the best agreement with our model. Position 1 corresponds to chromosome XVII:7614897.
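
The two steps described in the legend, scaling the signal so the highest value in the displayed region equals 1.0 and flagging nucleotides above 0.03 as DMS-accessible, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and restricting the accessible calls to A and C bases is an assumption based on the DMS chemistry described in the introduction, not a rule stated in the legend.

```python
def normalize_to_max(signal):
    """Scale a region's ratiometric signal so its highest value is 1.0."""
    peak = max(signal)
    if peak <= 0:
        return [0.0 for _ in signal]  # no signal anywhere in the region
    return [value / peak for value in signal]


def accessible_positions(sequence, signal, threshold=0.03):
    """Return 1-based positions of A or C bases whose signal exceeds threshold.

    The 0.03 default mirrors the value quoted in the legend; limiting the
    calls to A and C is an assumption made for this sketch.
    """
    return [
        index + 1
        for index, (base, value) in enumerate(zip(sequence.upper(), signal))
        if base in "AC" and value > threshold
    ]


if __name__ == "__main__":
    print(normalize_to_max([0.02, 0.04, 0.0]))
    print(accessible_positions("GACU", [0.10, 0.05, 0.01, 0.08]))
```

The resulting position list is the kind of input that could be passed to a folding program as unpaired constraints, although the exact constraint format depends on the program used.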

fraction of genes pass this 20× coverage threshold (Fig. 4b). Even when extrapolated to an exorbitant sequencing depth of 1 billion uniquely mapped reads, many human genes (78%) have insufficient coverage. To probe the in vivo structure of low-abundance mRNAs, we developed and validated a simple targeted RT–PCR implementation of DMS-MaPseq (Fig. 4c). Targeted DMS-MaPseq begins with the in vivo modification of RNA, followed by total RNA extraction, DNase treatment, and rRNA depletion. Then, we reverse-transcribe using the TGIRT enzyme and target-specific primers (primers can be used in combination to amplify multiple RNA species in a single reaction). Directly after cDNA synthesis, target-specific PCR primers amplify the RNA region of interest, and this is followed by NexteraXT tagmentation and sequencing.

To assess data quality from this targeted approach, we examined the structure signal for known RNA structures. We plotted a receiver operating characteristic (ROC) curve to assess the concordance of 18S rRNA DMS-MaPseq data with the published yeast crystal structure model24 and observed an excellent agreement with data from both our genome-wide and targeted approach (Fig. 4d and Supplementary Fig. 5a). We also assessed whether the targeted DMS-MaPseq data supported positive-control mRNA structure models, and we observed excellent agreement with the yeast HAC1 and RPS28B structures21,25 (Fig. 4e,f) as well as with the human XBP1 and MSRB1 structures23,26 (Supplementary Fig. 5b–e). Finally, we observed no signal drop-off in our amplified regions until the primer-binding region, and we observed a low level of background signal (Supplementary Fig. 6).

To reduce PCR amplification biases for quantitative applications or low-input material, we also developed a variation of targeted DMS-MaPseq that tags each RNA molecule with a unique molecular index (UMI) on the RT primer (Supplementary Fig. 7a and Supplementary Table 1). Unique reads can then be isolated easily based on their specific UMI and DMS mutation profiles.
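
Isolating unique reads by UMI and mutation profile can be illustrated with a short Python sketch. It is an assumption-laden example, not the published workflow: each read is represented as a hypothetical `(umi, mutated_positions)` pair, and two reads are treated as PCR duplicates only when both the UMI and the set of mutated positions are identical.

```python
def collapse_by_umi(reads):
    """Keep one read per (UMI, mutation profile) combination.

    reads is an iterable of (umi, mutated_positions) pairs, where
    mutated_positions is any iterable of integer positions. The first
    occurrence of each combination is kept, in input order.
    """
    seen = set()
    unique = []
    for umi, mutated_positions in reads:
        key = (umi, frozenset(mutated_positions))
        if key not in seen:
            seen.add(key)
            unique.append((umi, tuple(sorted(mutated_positions))))
    return unique


if __name__ == "__main__":
    reads = [
        ("AACG", (5, 12)),
        ("AACG", (12, 5)),   # same UMI and same profile: duplicate
        ("AACG", (7,)),      # same UMI, different profile: kept
        ("TTGC", (5, 12)),   # different UMI: kept
    ]
    print(collapse_by_umi(reads))
```

Real data would also need some tolerance for sequencing errors in the UMI itself, which this sketch deliberately leaves out.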


[Figure 6 graphics not reproduced. Panel a: normalized DMS signal versus position (nt), 0–30, for combined alleles (UUCUUCUCUMUGCGAGGAUUUGGACUGGCAGUG), the A allele (UUCUUCUCUAUGCGAGGAUUUGGACUGGCAGUG), and the C allele (UUCUUCUCUCUGCGAGGAUUUGGACUGGCAGUG), with allele-specific structure diagrams. Panel b: mismatch/total ratio versus distance from AUG (nt), 100–700, for RPL14A pre-mRNA and RPL14A spliced mRNA, with Exon1 and Exon2 marked.]


Figure 6 | Investigating RNA structure heterogeneity with DMS-MaPseq. (a) Regions of heterogeneous structure exhibit indistinguishable structure
signals when combined but can be distinguished by DMS-MaPseq, which is illustrated by normalized DMS-MaPseq data derived from the human MRPS21
ribosnitch A–C alleles. Allele-specific data represented as the mean of three technical replicates. Error bars represent 1 s.d.; stars mark A–C nucleotides
with different pairing states between alleles. (b) Targeted DMS-MaPseq data specific for the yeast RPL14A pre-mRNA and spliced mRNA isoforms reveal
minimal structure difference in the common exon1 sequence (R = 0.88). Ratiometric in vivo DMS-MaPseq data is plotted with isoform-specific RT primer
locations noted with arrows.
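
The allele-specific analysis in panel a relies on each read carrying both the allele and its own mutation record. A minimal Python sketch of that split is shown below. It assumes, for illustration only, that every read spans the whole region, that reads are supplied as hypothetical `(allele_base, mutated_positions)` pairs with 0-based positions, and that the polymorphic site itself has already been excluded from the mutation calls.

```python
from collections import defaultdict


def allele_specific_signal(reads, region_length):
    """Compute mismatch/total per position separately for each allele.

    reads: iterable of (allele_base, mutated_positions) pairs. Each read is
    assumed to cover the full region, so the read count per allele is the
    denominator at every position.
    """
    mismatches = defaultdict(lambda: [0] * region_length)
    totals = defaultdict(int)
    for allele, mutated_positions in reads:
        totals[allele] += 1
        for position in mutated_positions:
            mismatches[allele][position] += 1
    return {
        allele: [count / totals[allele] for count in mismatches[allele]]
        for allele in totals
    }


if __name__ == "__main__":
    reads = [("A", {0}), ("A", {0, 2}), ("C", {1}), ("C", set())]
    print(allele_specific_signal(reads, region_length=3))
```

Pooling the same reads without the allele key would average the two profiles together, which is the situation the combined-allele trace in the figure depicts.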

The SFT2 and ASH1 yeast mRNAs are weakly expressed and host functional RNA structures in their 5′ and 3′ UTRs, respectively, serving as positive controls for DMS signal detection using a UMI. Indeed, both controls show DMS modification profiles consistent with the known secondary structure models4,27 (Supplementary Fig. 7b,c). Irrespective of their uniqueness, these data are in excellent agreement when processed, which suggests that a UMI may not be necessary for amplification of transcripts of comparable abundance. Given the limitations regarding the size of RNA region assayed with this UMI approach and the expense of longer sequencing reads, choosing between the targeted versions of DMS-MaPseq depends on the region size, target abundance, and quantitative demands of an experiment.

DMS-MaPseq for Drosophila melanogaster ovaries
With their dramatic developmental changes independent of transcription and mRNA degradation, D. melanogaster oocytes provide a premier system for studying mRNA localization and translational control. Many mRNAs are localized during oogenesis28, and while these localization mechanisms are poorly understood globally, RNA structure has been shown to be involved29–31. Here, we apply targeted DMS-MaPseq to D. melanogaster ovaries, which yields excellent structure data at two DMS treatment levels consistent with the oskar and gurken mRNA structures responsible for localization31,32 (Fig. 5a,b and Supplementary Fig. 8). This is the first example of RNA structure probing in an animal tissue and marks a key step forward in investigating the role of RNA structure in mRNA localization in this model system.

A highly structured region influences noncanonical translation initiation
We recently discovered that translation of the mammalian FXR2 (Fragile X Mental Retardation, Autosomal Homolog 2) gene initiates predominantly at a GUG codon33. On account of the extreme GC content (80%) of the first exon of FXR2, we hypothesized a stable RNA structure may contribute to the non-canonical initiation. We used in vitro DMS-MaPseq data to develop a secondary structure model with RNAfold34. This revealed two highly stable putative structures flanking the GUG initiation codon (Fig. 5c, Supplementary Fig. 9; free energy < −31 kcal/mol), with some ambiguity across certain regions depending on the thresholds used to impose folding constraints (see alternative structure model, Supplementary Fig. 10a). We mutated these putative FXR2 structures to perturb the majority of base-pairing interactions in both models and tested their effects within a reporter construct, revealing a drop in protein levels upon mutating either structure (Supplementary Fig. 10b–d and Supplementary Table 2). Compensatory mutations, designed to optimize the restoration of our predicted RNA structures, restored eGFP levels and thus implicated the structure itself as a functional modulator of translation initiation for FXR2. In addition to the compensatory mutations, the in vivo structure signal supports this model (Supplementary Fig. 9c–e).

Structure probing of RNAs in multiple conformations
In the complex environment of the cell, the structure of an RNA molecule may vary based on its current state, such as maturation, translation, protein binding, and degradation. In the case of structural heterogeneity from a ribosnitch (i.e., a single-nucleotide polymorphism that yields a local RNA structure rearrangement), the interpretation of in vitro RNA folding status differs greatly when DMS-MaPseq data from the two human MRPS21 ribosnitch alleles35 are analyzed together or separately. Allele-specific analysis of the data reveals two distinct and mutually exclusive structures, which are not detectable from the combined allele analysis (Fig. 6a). This example illustrates the complexity of analyzing structurally heterogeneous regions and a simple resolution using DMS-MaPseq to separate specific RNA subpopulations by allele.

Of particular interest regarding structural heterogeneity are isoform-specific RNA structures. Structure differences have been proposed between pre-mRNAs and their mature translated counterparts, such as RNA structures that influence splice-site selection36 or affect translation37,38. We used intron- or exon-specific RT primers to separately amplify each isoform of two yeast ribosomal protein genes using targeted DMS-MaPseq. The RNA structure signal in the common exon1 sequence between the RPL14A and RPL31B pre-mRNAs and their respective mature counterparts reveals surprisingly little structure difference

between isoforms (Fig. 6b, Supplementary Figs. 11 and 12). These mRNAs are highly translated, but their exon1 structure is similar to that of the untranslated pre-mRNA, suggesting that local RNA structure rapidly refolds after translation. While we focus here on a limited number of messages, this approach broadly enables the analysis of different RNA isoforms.

DISCUSSION
Here we establish DMS-MaPseq as a robust and simple tool suitable for the quantitative analysis of RNA secondary structure in vivo by improving the inherent quality of the structure data, enabling qualitatively new types of structure to be gathered, and greatly expanding the repertoire of RNAs that can be analyzed. Future applications include in vivo single-molecule analyses of the co-occurrence of DMS modifications to identify heterogeneous RNA structure subpopulations (e.g., ribosnitches35) empirically. Additionally, DMS-MaPseq allows the selective amplification of RNA targets, including pre-mRNAs or differentially spliced isoforms. Together, these advances drastically expand the range of experimentally accessible RNA species for structural analysis, enabling a wide range of future studies. In theory, our in vivo MaP approach with TGIRT could also be used for SHAPE, which would be a valuable and complementary approach. However, the bulky nature of the best characterized and validated in vivo SHAPE chemical, NAI-N3 (ref. 10), may prove challenging. Finally, DMS-MaPseq could be combined with the analysis of endogenous mRNA modifications, including the sequencing-based mapping of pseudouridines or m6A methylation39–42. These endogenous modifications occur on only a subset of their RNA targets. Combined with the single-molecule aspects of DMS-MaPseq, it would be possible to evaluate how such endogenous RNA modification affects structure within a single experiment. It is the versatility of DMS-MaPseq that makes it a transformative tool for in vivo RNA structure probing, allowing for more comprehensive investigations into the biological relevance of RNA structures than ever before.

Methods
Methods and any associated references are available in the online version of the paper.

Accession codes. Raw and processed data are available at NCBI Gene Expression Omnibus, accession number GSE84537.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We thank A. Fields from UCSF for FXR2 reporter plasmids; T. Norman, A. Fields, and J. Quinn for insightful discussions and comments on the manuscript; A. Jaeger (Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA) for providing HEK 293T cells; and the Orr-Weaver lab at the Whitehead Institute for providing flies. We also thank Y. Chen, D. Bogdanoff, E. Chow, and J. Lund at the UCSF Center for Advanced Technology for sequencing assistance; J. Love and S. Levine in the Whitehead Core and MIT BioMicro Center for library preparation; and C. Reiger, M. DeVera, J. Kanter, and G. McCauley for administrative support. This research was supported by the CRSB (Center for RNA Systems Biology; grant P50 GM102706 to J.S.W.), the Howard Hughes Medical Institute (J.S.W.), the National Science Foundation grant 1144247 (M.Z.), and the Genentech Foundation (M.Z.). Research on TGIRTs and their modes of use was supported by NIH R01 grants GM37949 and GM37951 (A.M.L.).

AUTHOR CONTRIBUTIONS
M.Z., J.S.W., and S.R. designed the experiments. A.M.L. provided early samples of the TGIRT enzymes and advice on troubleshooting and methods. M.Z., P.G., and S.R. performed the experiments and analyzed the data with help from S.P. M.Z., J.S.W., and S.R. drafted and revised the manuscript, and all authors reviewed the manuscript and provided comments.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Mortimer, S.A., Kidwell, M.A. & Doudna, J.A. Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet. 15, 469–479 (2014).
2. Deigan, K.E., Li, T.W., Mathews, D.H. & Weeks, K.M. Accurate SHAPE-directed RNA structure determination. Proc. Natl. Acad. Sci. USA 106, 97–102 (2009).
3. Ouyang, Z., Snyder, M.P. & Chang, H.Y. SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Res. 23, 377–387 (2013).
4. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J.S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).
5. Wells, S.E., Hughes, J.M., Igel, A.H. & Ares, M. Jr. Use of dimethyl sulfate to probe RNA structure in vivo. Methods Enzymol. 318, 479–493 (2000).
6. Mortimer, S.A. & Weeks, K.M. A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J. Am. Chem. Soc. 129, 4144–4145 (2007).
7. Smola, M.J., Rice, G.M., Busan, S., Siegfried, N.A. & Weeks, K.M. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nat. Protoc. 10, 1643–1669 (2015).
8. Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2014).
9. Lucks, J.B. et al. Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. USA 108, 11063–11068 (2011).
10. Spitale, R.C. et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490 (2015).
11. Poulsen, L.D., Kielpinski, L.J., Salama, S.R., Krogh, A. & Vinther, J. SHAPE Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data. RNA 21, 1042–1052 (2015).
12. Kwok, C.K., Tang, Y., Assmann, S.M. & Bevilacqua, P.C. The RNA structurome: transcriptome-wide structure probing with next-generation sequencing. Trends Biochem. Sci. 40, 221–232 (2015).
13. Strobel, E.J., Watters, K.E., Loughrey, D. & Lucks, J.B. RNA systems biology: uniting functional discoveries and structural tools to understand global roles of RNAs. Curr. Opin. Biotechnol. 39, 182–191 (2016).
14. Homan, P.J. et al. Single-molecule correlated chemical probing of RNA. Proc. Natl. Acad. Sci. USA 111, 13858–13863 (2014).
15. Siegfried, N.A., Busan, S., Rice, G.M., Nelson, J.A.E. & Weeks, K.M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
16. Smola, M.J., Calabrese, J.M. & Weeks, K.M. Detection of RNA–protein interactions in living cells with SHAPE. Biochemistry 54, 6867–6875 (2015).
17. Inoue, T. & Cech, T.R. Secondary structure of the circular form of the Tetrahymena rRNA intervening sequence: a technique for RNA structure analysis using chemical probes and reverse transcriptase. Proc. Natl. Acad. Sci. USA 82, 648–652 (1985).
18. Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958–970 (2013).
19. Katibah, G.E. et al. Broad and adaptable RNA structure recognition by the human interferon-induced tetratricopeptide repeat protein IFIT5. Proc. Natl. Acad. Sci. USA 111, 12025–12030 (2014).
20. Beckman, R.A., Mildvan, A.S. & Loeb, L.A. On the fidelity of DNA replication: manganese mutagenesis in vitro. Biochemistry 24, 5810–5817 (1985).
21. Badis, G., Saveanu, C., Fromont-Racine, M. & Jacquier, A. Targeted mRNA degradation by deadenylation-independent decapping. Mol. Cell 15, 5–15 (2004).
22. Aviran, S. & Pachter, L. Rational experiment design for sequencing-based RNA structure mapping. RNA 20, 1864–1877 (2014).

23. Hooks, K.B. & Griffiths-Jones, S. Conserved RNA structures in the non-canonical Hac1/Xbp1 intron. RNA Biol. 8, 552–556 (2011).
24. Ben-Shem, A. et al. The structure of the eukaryotic ribosome at 3.0 Å resolution. Science 334, 1524–1529 (2011).
25. Aragón, T. et al. Messenger RNA targeting to endoplasmic reticulum stress signaling sites. Nature 457, 736–740 (2009).
26. Latrèche, L., Jean-Jean, O., Driscoll, D.M. & Chavatte, L. Novel structural determinants in human SECIS elements modulate the translational recoding of UGA as selenocysteine. Nucleic Acids Res. 37, 5868–5880 (2009).
27. Chartrand, P., Meng, X.H., Singer, R.H. & Long, R.M. Structural elements required for the localization of ASH1 mRNA and of a green fluorescent protein reporter particle in vivo. Curr. Biol. 9, 333–338 (1999).
28. Jambor, H. et al. Systematic imaging reveals features and changing localization of mRNAs in Drosophila development. eLife 4, e05003 (2015).
29. MacDonald, P.M. bicoid mRNA localization signal: phylogenetic conservation of function and RNA secondary structure. Development 110, 161–171 (1990).
30. Bullock, S.L., Ringel, I., Ish-Horowicz, D. & Lukavsky, P.J. A′-form RNA helices are required for cytoplasmic mRNA transport in Drosophila. Nat. Struct. Mol. Biol. 17, 703–709 (2010).
31. Jambor, H., Brunel, C. & Ephrussi, A. Dimerization of oskar 3′ UTRs promotes hitchhiking for RNA localization in the Drosophila oocyte. RNA 17, 2049–2057 (2011).
32. Van De Bor, V., Hartswood, E., Jones, C., Finnegan, D. & Davis, I. gurken and the I factor retrotransposon RNAs share common localization signals and machinery. Dev. Cell 9, 51–62 (2005).
33. Fields, A.P. et al. A regression-based analysis of ribosome-profiling data reveals a conserved complexity to mammalian translation. Mol. Cell 60, 816–827 (2015).
34. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
35. Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).
36. Meyer, M., Plass, M., Pérez-Valle, J., Eyras, E. & Vilardell, J. Deciphering 3′ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Mol. Cell 43, 1033–1039 (2011).
37. Babendure, J.R., Babendure, J.L., Ding, J.-H. & Tsien, R.Y. Control of mammalian translation by mRNA structure near caps. RNA 12, 851–861 (2006).
38. Kudla, G., Murray, A.W., Tollervey, D. & Plotkin, J.B. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009).
39. Carlile, T.M. et al. Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature 515, 143–146 (2014).
40. Schwartz, S. et al. Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA. Cell 159, 148–162 (2014).
41. Meyer, K.D. et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149, 1635–1646 (2012).
42. Dominissini, D. et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485, 201–206 (2012).



ONLINE METHODS
Step-by-step protocols for target-specific and genome-wide DMS-MaPseq are available as Supplementary Protocols 1 and 2 (and refs. 43 and 44).

Media and growth conditions. Yeast strain BY4741 was grown in YPD medium at 30 °C. Saturated cultures were diluted to OD600 of ~0.09 and grown to a final OD600 of 0.5–0.7 at the time of DMS treatment. HEK 293T cells were grown in DMEM medium with high glucose, supplemented with glutamine, pyruvate, nonessential amino acids, and 10% fetal bovine serum (FBS); and cells were treated with DMS at ~80% confluence.

Dimethyl sulfate (DMS) modification. For in vivo DMS modification in yeast, 15 ml of exponentially growing yeast was incubated with 750 µl DMS (Sigma) for 4 min at 30 °C. DMS was quenched by adding 30 ml of stop solution composed of 30% beta-mercaptoethanol (BME; from a 14.2 M stock) and 50% isoamyl alcohol, after which cells were quickly put on ice, collected by centrifugation at 3,500 × g at 4 °C for 4 min, and washed with 10 ml 30% BME solution. Cells were then resuspended in 0.6 ml total RNA lysis buffer (6 mM EDTA, 45 mM NaOAc pH 5.5), and total RNA was purified with hot acid phenol (Ambion) and EtOH precipitation. Ribosomal RNA was depleted using RiboZero (Epicentre), either directly after RNA extraction or postligation in the genome-wide library preparation. Denatured RNA structure samples were treated as in DMS-seq4. For HEK 293T cells, 15 cm45 plates with 15 ml of media were treated with the addition of 300 µl DMS and incubation at 37 °C for 4–5 min. Media with DMS was decanted, and plates were washed twice in 30% BME (v/v). Cells were resuspended in Trizol, and RNA was isolated according to the manufacturer protocol. For D. melanogaster oocytes, we dissected ovaries from ~100 flies (OreR strain) in 250 µl 1× PBS. We added 250 µl DMS for 5 min at 26 °C with shaking at 500 r.p.m. To stop the reaction, we added 1 ml of 30% BME (v/v) and transferred the oocytes to a sieve, where they were washed three times in 30% BME and two times with sterile water. Finally, the ovaries were collected and resuspended in 1 ml of Trizol and 10 µl BME, and total RNA was extracted.

Library generation and genome-wide DMS-MaPseq. Sequencing libraries were prepared with a modified version of the protocol used for DMS-seq4. Specifically, 10 µg of DMS-treated total RNA was denatured for 2 min at 95 °C then fragmented at 95 °C for 2 min in 1× RNA Fragmentation Reagent (Zn2+ based, Ambion). Note that this is an increase in starting material over the 1–3 µg used in our previous DMS-seq approach (REF). The reaction was stopped with 1× Stop Solution (Ambion) and quickly placed on ice. The fragmented RNA was run on a 6% Tris Borate Urea (TBU) polyacrylamide gel for 45 min at 150 V. A blue light (Invitrogen) was used for gel imaging, and RNA fragments of 100–170 nt in size were excised, depleting small ncRNA contaminants of <100 nt (tRNAs, snoRNAs). Gel extraction was performed by crushing the purified gel piece and incubating in 300 µl 300 mM NaCl at 70 °C for 10 min with vigorous shaking. The RNA was then precipitated by adding 2 µl GlycoBlue (Invitrogen) and 3× volume (900 µl) 100% EtOH, incubating on dry ice for 20 min and spinning at 20,000 × g for 45 min at 4 °C. The samples were then resuspended in 7 µl 1× CutSmart buffer (NEB), and the 3′ phosphate groups left after random fragmentation were resolved by adding 1.5 µl rSAP (NEB), 1 µl of SUPERase Inhibitor (Ambion), and incubating at 37 °C for 1 h. After heat inactivation of the phosphatase at 65 °C for 5 min, the samples were then directly ligated to 25 pmol of miRNA cloning linker-2 (IDT) by adding 2 µl T4 RNA ligase 2, truncated K227Q (NEB), 1 µl 0.1 M DTT, 6.5 µl 50% PEG, 1 µl 10× T4 RNL2 buffer, and incubating for 2 h at 25 °C. Reactions were purified by EtOH precipitation (as above), and excess linker was degraded for 1 h at 30 °C in a 20 µl reaction of 1× RecJ buffer, 1 µl SUPERase Inhibitor, 1 µl 5′ Deadenylase (Epicentre), and 1 µl RecJ exonuclease (Epicentre). Ribosomal RNA was depleted using RiboZero (Epicentre) with a final incubation of 5 min at 40 °C, instead of 50 °C as recommended in the commercial protocol, and purified by EtOH precipitation.
Reverse transcription was performed in a 10 µl volume with 1 pmol oCJ200-link2. To begin, a mixture of RNA–primer–buffer was incubated at 80 °C for 2 min to denature the template, then it was returned to ice for the addition of SUPERase Inhibitor (Ambion), DTT, dNTPs, and RT enzyme to generate the final reaction conditions. For reverse transcription using SuperScript II with Mn2+ buffer, we followed the exact published reaction conditions for mutational profiling14 (0.5 mM dNTPs, 50 mM Tris–HCl pH 8.0, 75 mM KCl, 6 mM MnCl2, and 10 mM DTT) and allowed the reaction to proceed for 2–3 h at 42 °C with 100 U of SuperScript II (Invitrogen). Because TGIRT may pause at modification sites, this long incubation time facilitates readthrough of multiple modifications per RNA fragment. For the TGIRT reverse transcription, a 5 min incubation at room temperature followed the initial denaturation, and the RT reaction proceeded for 1.5 h at 57 °C with 100 U TGIRT-III enzyme (InGex) and the following reaction conditions: 1 mM dNTPs, 5 mM freshly prepared DTT (Sigma-Aldrich), 10 U SUPERase Inhibitor, 50 mM Tris–HCl pH 8.3, 75 mM KCl, and 3 mM MgCl2. After reverse transcription, 1 µl of 5 M NaOH was added and the reaction incubated for 3 min at 95 °C to degrade the RNA, followed by EtOH precipitation and gel purification to remove excess RT primer. Finally, cDNAs were circularized using CircLigase (Epicentre), and Illumina sequencing adapters and indexes were introduced by 9–13 cycles of PCR using Phusion HF Polymerase (NEB), oNTI231, and indexing primers with TruSeq 6 bp indices. Libraries were sequenced with oNTI202 in 50 nt single-end reads on the HiSeq4000 (Illumina). See primer sequences in Supplementary Table 1.

Library generation and targeted DMS-MaPseq. After in vivo DMS treatment and total RNA extraction, 5 µg of total RNA was DNase-treated for 30 min at 37 °C in 1× TURBO DNase buffer with 1 µl TURBO DNase enzyme (Thermo Fisher Scientific). Reactions were desalted using RNA Clean & Concentrator-5 columns (Zymo Research), and rRNA was depleted using RiboZero (Epicentre) or RNase H for D. melanogaster and HEK 293T samples; RNase H treatment was implemented with slight modifications to the published protocol46. For the RNase H protocol, briefly, 5 µg of total RNA was depleted of small RNA species with a Zymo RNA Clean & Concentrator-5 column, retaining RNA >200 nt per manufacturer instructions. RNase H subtraction was performed by adding 5 µg of published subtraction oligos46 in a total volume of 30 µl in 1× hybridization buffer (200 mM NaCl, 100 mM Tris pH 7.5). The mixture was incubated at 68 °C for 1 min, and the temperature was ramped down at a rate of 1 °C/min to 45 °C. MgCl2 was

doi:10.1038/nmeth.4057 nature methods


added to a 10 mM final concentration, and 3 µl of Hybridase Thermostable RNase H (Epicentre) was added, followed by a 30 min incubation at 45 °C. The reaction was again purified by Zymo RNA Clean & Concentrator-5 column to deplete small RNA species, followed by treatment with DNaseI (Ambion) per manufacturer instructions and a final column clean up to remove excess RNase H subtraction oligos.
20–100 ng of RNA was used for reverse transcription with 100 U TGIRT-III (InGex) for 2 h at 57 °C in the same TGIRT reaction conditions as those described above. We used 5–10 pmol of each gene-specific RT primer and successfully pooled up to six different RT primers in one reaction, using no more than 35 pmol total. DTT was prepared from powder directly before reverse transcription, and we omitted the denaturation step before reverse transcription on account of low-level fragmentation of DMS-treated RNA at high temperatures. After moving the reaction to ice, 1 µl RNase H (Enzymatics, 5 U/µl) was added, and RNA–DNA hybrids were degraded at 37 °C for 20 min to release the cDNA. We used RNase H at this step for convenience; NaOH hydrolysis as used in the genome-wide protocol also works well at this step. cDNA was purified using the ssDNA protocol for DNA Clean & Concentrator-5 columns (Zymo Research).
We used the Advantage HF 2 PCR kit (Clontech) with high-fidelity conditions for two-step PCR amplification, using 1/12 of the purified RT reaction and gene-specific primers targeting a single template with a target amplicon size of 300–600 nt for low-abundance RNA targets. When possible, we designed our gene-specific RT primers close to the PCR amplicon of interest, and in many cases we used the RT primer as the reverse primer in our PCR reactions. High-abundance RNAs, such as the yeast 18S rRNA, can be amplified in a single 1.8 kb amplicon. Because of the high GC content of the FXR2 template, we used 200 mM NaCl instead of 75 mM KCl in the RT reaction buffer and the Advantage GC 2 PCR Kit (Clontech) for its amplification. The PCR program begins with 10 cycles at a 65 °C annealing temperature to promote specificity, followed by 20–25 cycles at a 57 °C annealing temperature. PCR bands were gel purified on a nondenaturing 8% TBE polyacrylamide gel (Invitrogen) and crushed, extracted, and EtOH precipitated as described above. NexteraXT (Illumina) was used to fragment and prepare amplicons (1 ng) for sequencing. Tagmented amplicons were barcoded and amplified using 12 cycles of PCR, and barcoded libraries were cleaned using 1.5× (v/v) PCRClean beads (Aline Biosciences). Libraries were quantified using the Fragment Analyzer (Advanced Analytical) and subjected to a final quantification by qPCR before sequencing by 50 bp single-end reads on the HiSeq4000 (Illumina).
For the UMI-based RT–PCR, reverse transcriptase primers were designed with a random 10 nt barcode, labeling each cDNA with a unique molecular index. Gene-specific variations of oMZ282 were used in the reverse transcription reaction described above, followed by Advantage HF 2 PCR with gene-specific variants of primers oMZ282 and oMZ283. Amplicons were purified by polyacrylamide gel and extracted as described above, and a second round of PCR was done with 20–25 cycles to add Illumina adaptors and indices for sequencing (oMZ284 and indexing primers). Libraries were constructed so the UMI was sequenced first using custom Read1 sequencing primer oNTI202. We used the standard Illumina Read2 primer, and sequencing was done via MiSeq v2 2 × 150 (Illumina). See primer sequences in Supplementary Table 1.

Ribosnitch RNA preparation. dsDNAs corresponding to the human MRPS21 sequences shown below were in vitro transcribed, mixed, and folded by denaturing at 95 °C followed by a brief incubation at 37 °C in 350 mM sodium cacodylate buffer and 6 mM MgCl2. 10% DMS (v/v) was added, and the sample was incubated for 10 min at 37 °C. The reaction was stopped by placing on ice and adding BME to 30% final volume. The RNA was then purified by RNA Clean & Concentrator-5 column (Zymo), and the small RNA fraction was collected and prepared for sequencing as described in the genome-wide strategy above.
MRPS21 A allele: 5′-TGCTGCCATCTCTTTTCTTCTCTATGCGAGGATTTGGACTGGCAGTG-3′; MRPS21 C allele: 5′-ATCTCTTTTCTTCTCTCTGCGAGGATTTGGACTGGCAGTGAGAATAAGAGACAA-3′.

Sequencing alignment and analysis. Raw fastq files were stripped of linker sequences and filtered for quality using the FASTX-Toolkit Clipper and Quality Filter functions, respectively, requiring that 80% of sequenced bases have a quality score >25 (http://hannonlab.cshl.edu/fastx_toolkit/). Reads were aligned using Tophat v2.1.0 with bowtie2 with the following settings for a 50 nt sequencing run: --no-novel-juncs -N 5 --read-gap-length 7 --read-edit-dist 7 --max-insertion-length 5 --max-deletion-length 5 -g 3. All nonuniquely aligned reads were then removed. Sequencing data was aligned against the Saccharomyces cerevisiae assembly R64 (UCSC, sacCer3) downloaded from the Saccharomyces Genome Database on February 8, 2011 (SGD, http://www.yeastgenome.org) or against the longest human RefSeq isoforms (hg19). Despite template-switching capabilities of the TGIRT enzyme, we did not detect a substantial number of chimeric reads in our data and did not include a processing step beyond alignment to remove these. On account of empirically determined mutation enrichment from nontemplate addition and Nextera XT transposase insertion, we trimmed 2, 5, and 7 nt from the 5′ end of each read for TGIRT-, SSII–Mn2+-, and NexteraXT-generated libraries, respectively. Mismatches located within 3 nt of an indel were also excluded from subsequent analysis. The ratiometric DMS signal was calculated for each nucleotide as number of mismatches/sequencing depth.
Target-specific sequencing data prepared with NexteraXT were combined across both strand alignments, because of lack of strandedness after tagmentation. Transposase insertion is subject to primary sequence biases; thus it is possible (although rare) to have amplicon regions that are poorly sampled and result in false-positive bases with high ratiometric reactivity due to poor sequencing depth. After linker stripping with a length requirement for reads >100 nt from a 2 × 150 nt MiSeq run, target-specific sequencing data prepared with the UMI was collapsed to unique reads using FASTX-Collapser. Unique reads are, therefore, the combination of a unique molecular index and internal DMS-induced modifications, which add sequence diversity beyond the 10 bp UMI.
Genome-wide yeast DMS-MaPseq data was collected and sequenced with two biological replicates for each SSII–Mn2+ and TGIRT, untreated and in vivo DMS-treated libraries. For each library variation, we collected a combined total of 90 to 200 million uniquely mapped reads between yeast replicates and 200 million for HEK 293T cells. Note that we sequenced to a similar depth for a genome-wide DMS-MaPseq experiment as we did for our previously published genome-wide DMS-seq method43.
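The per-nucleotide calculation described under "Sequencing alignment and analysis" (5′ trimming, exclusion of mismatches within 3 nt of an indel, then mismatches divided by depth) can be sketched as follows. This is a minimal illustration and not the authors' code: the read representation (a start position plus a string of `=`, `X`, `I`, `D` alignment symbols), the function name `ratiometric_signal`, and the choice to keep counting a base toward depth when its mismatch is excluded are all assumptions made for the example. Reads are assumed to be oriented with their 5′ end first.

```python
from collections import Counter

# 5' trim lengths quoted in the text, keyed by library type.
FIVE_PRIME_TRIM = {"TGIRT": 2, "SSII-Mn": 5, "NexteraXT": 7}
INDEL_WINDOW = 3  # mismatches within 3 nt of an indel are not counted


def ratiometric_signal(reads, library="TGIRT"):
    """Return {reference position: mismatches / depth}.

    Each read is (start, ops): a hypothetical format where `start` is the
    0-based reference position of the first aligned base and `ops` has one
    symbol per alignment column: "=" match, "X" mismatch, "I" insertion,
    "D" deletion.
    """
    trim = FIVE_PRIME_TRIM[library]
    mismatches, depth = Counter(), Counter()
    for start, ops in reads:
        ref_pos, read_idx = start, 0
        aligned, indels = [], []
        for op in ops:
            if op in ("=", "X"):      # consumes a read base and a reference base
                aligned.append((ref_pos, op, read_idx))
                ref_pos += 1
                read_idx += 1
            elif op == "I":           # consumes a read base only
                indels.append(ref_pos)
                read_idx += 1
            elif op == "D":           # consumes a reference base only
                indels.append(ref_pos)
                ref_pos += 1
            else:
                raise ValueError(f"unknown alignment symbol: {op!r}")
        for pos, op, idx in aligned:
            if idx < trim:            # skip the trimmed 5' bases entirely
                continue
            depth[pos] += 1           # assumption: base still counts as covered
            near_indel = any(abs(pos - p) <= INDEL_WINDOW for p in indels)
            if op == "X" and not near_indel:
                mismatches[pos] += 1
    return {pos: mismatches[pos] / depth[pos] for pos in sorted(depth)}


example_reads = [
    (0, "====X====="),   # mismatch at reference position 4
    (0, "=========="),   # no mismatches
    (0, "===XD====="),   # mismatch at 3 sits next to a deletion, so it is ignored
]
signal = ratiometric_signal(example_reads, library="TGIRT")
print(signal[4])  # 0.5
```

In practice the alignment columns would come from a BAM parser rather than hand-written strings; the point here is only the order of the filters and the final ratio.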

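The UMI-based collapsing step described above relies on exact-duplicate removal, so two reads that share a UMI but carry different mismatch patterns remain distinct. The sketch below illustrates that behaviour in plain Python; it stands in for FASTX-Collapser rather than reproducing it, and the helper names `collapse_identical` and `split_umi` are hypothetical. It assumes the 10 nt UMI is the first part of the read, as stated for Read1.

```python
from collections import Counter

UMI_LENGTH = 10  # random barcode length stated in the text


def collapse_identical(reads):
    """Collapse exact duplicate read sequences and count copies of each."""
    return Counter(reads)


def split_umi(read, umi_length=UMI_LENGTH):
    """Split a read into (UMI, insert), assuming the UMI is sequenced first."""
    return read[:umi_length], read[umi_length:]


reads = [
    "AAAAAAAAAA" + "GATTACA",
    "AAAAAAAAAA" + "GATTACA",  # exact duplicate, e.g. from PCR
    "AAAAAAAAAA" + "GATTGCA",  # same UMI, different mismatch pattern: kept
    "CCCCCCCCCC" + "GATTACA",  # different UMI: kept
]
unique = collapse_identical(reads)
print(len(unique))  # 3
```

Collapsing on the full sequence, rather than on the UMI alone, is what lets DMS-induced mismatches add diversity beyond the 10 nt barcode.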


HEK 293T Gini index calculations. UTR and coding regions were defined by RefSeq coordinates, and we analyzed 50 nt windows beginning at the annotated transcription start site. After requiring a minimum number of 100 total reads at As or Cs and >20× mismatch coverage for each window, we also discarded any windows with evidence for endogenous modifications (>15% mismatch rate). The Gini index was calculated only for A and C bases, as done previously4.

Minimum average coverage calculation. Using 100 nt transcriptome windows, we chose the window with the highest total sequence coverage as representative coverage for the gene. We counted the fraction of genes from the hg19 RefSeq annotation that had an average mismatch coverage >120 mismatches at sequencing depths of 50, 100, and 200 million uniquely mapped reads. We extrapolated the data for 1 billion reads.

Computing the ROC curve for ribosomal RNA. This analysis was completed as previously described, using the yeast ribosome crystal structure24 and the same considerations for solvent accessibility and removal of outliers by 90% Winsorization4.

Secondary structure models. Novel secondary structure models were generated using constraints derived from DMS-MaPseq data using RNAfold34. For FXR2, the sequence corresponding to nucleotides 1–450, which comprise the 5′ UTR and first exon, was folded in RNAfold. Adenine and cytosine bases with an in vitro ratiometric signal greater than the selected threshold (which varied between 0.03 and 0.06, as specified in the relevant figure legends) were required to be unpaired. Depending on the threshold used, small differences exist in the predicted structure; however, the 0.04 constraint threshold appears to produce the best fitting model for our experimental data. Because of the high GC content of the FXR2 region (80% GC) and the necessity of using a low-fidelity GC polymerase for these experiments, an untreated control was used to mask ten positions with reactivity above background. DMS-MaPseq reactivities were overlaid on structure models using VARNA (http://varna.lri.fr/)47.

Cloning and transfection experiments. The plasmid construct in Supplementary Figure 5 was derived from the ∆ATG FXR2 exon1–eGFP–IRES–mCherry plasmid described in Fields et al.33. A gBlock (IDT) was ordered containing a 43 bp FXR2–3×FLAG–T2A–AgeI-eGFP (40-bp fragment) for HiFi assembly (NEB) into the linearized plasmid backbone. This wild-type plasmid was used as the PCR template for FXR2 mutations, which were designed as overhangs on primers against the relevant portion of the FXR2 exon1 sequence, resulting in 5′ and 3′ fragments with overlapping mutated regions for HiFi assembly into the linearized wild-type backbone. Successful amplification of fragments was confirmed by running a fraction on an agarose gel, and the remainder of the fragments was purified using DNA Clean & Concentrator-5 columns (Zymo) or, in the case of contaminating PCR bands, purified via agarose gel and MinElute gel extraction (Qiagen). Common cloning primers for FXR2 amplification from the plasmid are 5′-CTCACTCGGCGCGCCAGTC-3′ (5′ FXR2 fragment, forward) and 5′-TATAGTCCCCGTCGTGATCCTTGTA-3′ (3′ FXR2 fragment, reverse). Inserts in all analyzed constructs were confirmed by Sanger sequencing (Molecular Cloning Laboratories). Plasmids are listed in Supplementary Table 2.
For fluorescence measurements, HEK 293T cells were grown as described and transfected with plasmids using TransIT-LT1 (Mirus) 2 d before data collection. eGFP and mCherry fluorescence were quantified using an LSR-II flow cytometer (BD Biosciences). Two plasmids for each type of mutation were assayed for fluorescence, serving as biological duplicates.

Code availability. Our code is publicly available at https://github.com/spersad94/DMS-MaP-Seq-Code.

Cell lines. HEK 293T cells were obtained from ATCC.

43. Zubradt, M. et al. Genome-wide DMS-MaPseq for in vivo RNA structure determination. Protocol Exchange http://dx.doi.org/10.1038/protex.2016.068 (2016).
44. Zubradt, M. et al. Target-specific DMS-MaPseq for in vivo RNA structure determination. Protocol Exchange http://dx.doi.org/10.1038/protex.2016.069 (2016).
45. Ingolia, N.T., Ghaemmaghami, S., Newman, J.R.S. & Weissman, J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
46. Adiconis, X. et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat. Methods 10, 623–629 (2013).
47. Darty, K., Denise, A. & Ponty, Y. VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25, 1974–1975 (2009).
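The window-level Gini index described under "HEK 293T Gini index calculations" can be illustrated with a short sketch. The Gini formula below is the standard rank-based form; the function names, and the reading of the ">15% mismatch rate" filter as "discard the window if any A or C exceeds 0.15", are assumptions made for this example and are not taken from the authors' code. The read-count and coverage filters mentioned in the text are assumed to have been applied upstream.

```python
def gini(values):
    """Gini index of non-negative values (0 = perfectly even signal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def window_gini(sequence, signal, max_rate=0.15):
    """Gini index over the A and C positions of one window.

    Returns None when any A/C position exceeds `max_rate`, which this sketch
    treats as evidence of an endogenous modification (an assumed reading).
    """
    ac_signal = [s for base, s in zip(sequence.upper(), signal) if base in "AC"]
    if any(s > max_rate for s in ac_signal):
        return None
    return gini(ac_signal)


print(gini([1, 1, 1, 1]))  # 0.0  (evenly distributed signal)
print(gini([0, 0, 0, 1]))  # 0.75 (all signal on one base)
```

A higher Gini index indicates DMS signal concentrated on a few bases, which is the pattern expected for a structured window, whereas an even signal gives a value near zero.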

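The constraint step described under "Secondary structure models" (A and C bases above a signal threshold are forced to be unpaired, with some positions masked) can be sketched as a function that builds a ViennaRNA-style constraint string, where `x` marks a base that must not pair and `.` leaves a base unconstrained. The text does not state how the constraints were supplied to RNAfold, so the use of a constraint string (for example with `RNAfold -C`), the function name `unpaired_constraints`, and the 0-based `masked` positions are assumptions for illustration only.

```python
def unpaired_constraints(sequence, signal, threshold=0.04, masked=()):
    """Build a constraint string: "x" = force unpaired, "." = no constraint.

    Only A and C positions are considered, since DMS reports on those bases.
    `masked` holds 0-based positions to leave unconstrained, e.g. positions
    with background reactivity in an untreated control.
    """
    if len(sequence) != len(signal):
        raise ValueError("sequence and signal must have the same length")
    masked = set(masked)
    chars = []
    for i, (base, value) in enumerate(zip(sequence.upper(), signal)):
        forced = base in "AC" and i not in masked and value > threshold
        chars.append("x" if forced else ".")
    return "".join(chars)


seq = "GACUAC"
dms = [0.50, 0.05, 0.01, 0.90, 0.10, 0.06]
constraint = unpaired_constraints(seq, dms, threshold=0.04, masked={4})
# Sequence on the first line, constraint on the second.
print(seq)
print(constraint)  # .x...x
```

Re-running the function with thresholds between 0.03 and 0.06 gives a quick view of how many positions each choice constrains before any folding is done.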
