
    Julia Salzman

    Viruses may play an important role in the evolution of human microbial communities. Clustered regularly interspaced short palindromic repeats (CRISPRs) provide bacteria and archaea with adaptive immunity to previously encountered viruses. Little is known about CRISPR composition in members of human microbial communities, the relative rate of CRISPR locus change, or how CRISPR loci differ between the microbiota of different individuals. We collected saliva from four periodontally healthy human subjects over an 11- to 17-month period and analyzed CRISPR sequences with corresponding streptococcal repeats in order to improve our understanding of the predominant features of oral streptococcal adaptive immune repertoires. We analyzed a total of 6859 CRISPR-bearing reads and 427,917 bacterial 16S rRNA gene sequences. We found a core (ranging from 7% to 22%) of shared CRISPR spacers that remained stable over time within each subject, but nearly a third of CRISPR spacers varied between tim...
    More than 95% of human genes are alternatively spliced. Yet the extent to which splicing is regulated at single-cell resolution has remained controversial because of limitations in both the available data and the methods used to interpret it. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in > 110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are cell-type specifically spliced. These results are validated with RNA FISH, single cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as actin light chain subunit MYL6 and ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing...
    Precise splice junction calls are currently unavailable in scRNA-seq pipelines such as the 10x Chromium platform but are critical for understanding single-cell biology. Here, we introduce SICILIAN, a new method that assigns statistical confidence to splice junctions from a spliced aligner to improve precision. SICILIAN's precise splice detection achieves high accuracy on simulated data, improves concordance between matched single-cell and bulk datasets, increases agreement between biological replicates, and reliably detects un-annotated splicing in single cells, enabling the discovery of novel splicing regulation.
    Next-generation sequencing enables measurement of chemical and biological signals at high throughput and falling cost. Conventional sequencing requires increasing sampling depth to improve signal to noise discrimination, a costly procedure that is also impossible when biological material is limiting. We introduce a new general sampling theory, Molecular Entropy encodinG (MEG), which uses biophysical principles to functionally encode molecular abundance before sampling. SeQUential DepletIon and enriCHment (SQUICH) is a specific example of MEG that, in theory and simulation, enables sampling at a logarithmic or better rate to achieve the same precision as attained with conventional sequencing. In proof-of-principle experiments, SQUICH reduces sequencing depth by a factor of 10. MEG is a general solution to a fundamental problem in molecular sampling and enables a new generation of efficient, precise molecular measurement at logarithmic or better sampling depth.
    The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce a new algorithm, DEEPEST, that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling tenfold fewer false-positive fusions in non-transformed human tissues. We leverage the increased precision of DEEPEST to discover new cancer biology. For example, 888 new candidate oncogenes are identified based on over-representation in DEEPEST calls, and 1,078 previously unreported fusions involving long intergenic noncoding RNA partners, demonstrating a previously unappreciated prevalence and potential for fun...
    The systematic identification of regulatory elements that control gene expression remains a challenge. Genetic screens that use untargeted mutagenesis have the potential to identify protein-coding genes, non-coding RNAs and regulatory elements, but their analysis has mainly focused on identifying the former two. To identify regulatory elements, we conducted a new bioinformatics analysis of insertional mutagenesis screens interrogating WNT signaling in haploid human cells. We searched for specific patterns of retroviral gene trap integrations (used as mutagens in haploid screens) in short genomic intervals overlapping with introns and regions upstream of genes. We uncovered atypical patterns of gene trap insertions that were not predicted to disrupt coding sequences, but caused changes in the expression of two key regulators of WNT signaling, suggesting the presence of cis-regulatory elements. Our methodology extends the scope of haploid genetic screens by enabling the identification...
    Identification of splice sites is critical to gene annotation and to determining which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete annotations of introns and exons in genomes without external ontologies, i.e., ab initio. However, whether it is possible to reconstruct genomic positions where splicing occurs from full-length transcripts, even if sampled in the absence of noise, depends on the genome sequence composition. If it is not possible, there exist provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome. We provide a formal definition of splice site ambiguity due to the genomic sequence by introducing a definition of equivalent junction, which is the set of local genomic positions resulting in the same RNA sequence when joined through RNA splicing. We show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear...
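    The notion of an equivalent junction defined in the abstract above can be illustrated on a toy sequence. The sketch below is not the authors' implementation: the function names, the 0-based half-open coordinates, the `max_shift` window, and the example genome are all assumptions made for illustration. It enumerates donor/acceptor pairs near a given junction by brute force and keeps those that yield the same spliced sequence.

```python
def spliced(genome: str, donor_end: int, acceptor_start: int) -> str:
    """Join genome[:donor_end] to genome[acceptor_start:] (0-based, half-open).

    Hypothetical helper: models removal of the interval
    genome[donor_end:acceptor_start] as an intron.
    """
    return genome[:donor_end] + genome[acceptor_start:]


def equivalent_junctions(genome: str, donor_end: int, acceptor_start: int,
                         max_shift: int = 10) -> list[tuple[int, int]]:
    """Return all (donor_end, acceptor_start) pairs, shifted together by up
    to max_shift bases, that give the same spliced sequence as the input
    junction. Brute force over a small window; illustrative only.
    """
    target = spliced(genome, donor_end, acceptor_start)
    result = []
    for k in range(-max_shift, max_shift + 1):
        d, a = donor_end + k, acceptor_start + k
        # Keep only shifts that stay inside the sequence.
        if 0 <= d <= a <= len(genome) and spliced(genome, d, a) == target:
            result.append((d, a))
    return result


# Toy genome (assumed): exon "CCAT", intron "GTAAAACAG", exon "GCC".
toy_genome = "CCATGTAAAACAGGCC"
print(equivalent_junctions(toy_genome, 4, 13))
```

    In this toy case the intron and the downstream exon both begin with G, so moving the donor and acceptor one base to the right gives the same spliced sequence, and the two junctions cannot be told apart from the RNA alone.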
    Despite thorough analysis, the human transcriptome is incompletely annotated: some genes lack accurate transcriptional start sites and in some genes, splicing events have been missed. In this paper, we report a significant example of this incompleteness in both the promoter and splicing of ciRS-7, a highly expressed circRNA thought to be exceptional because it is transcribed from a locus lacking any mature linear RNA transcripts of the same sense. Using unbiased computational approaches, we have discovered that the human ciRS-7 exonic sequence is spliced into linear transcripts. Further, we use statistical approaches to discover that its promoter coincides with that of the long non-coding RNA, LINC00632. We validate this prediction using multiple experimental assays and show that the splicing of ciRS-7 into linear transcripts is conserved in mouse. Together, experimental and computational evidence argue that expression of ciRS-7 is primarily determined by the epigenetic state of LINC006...
    The pervasive expression of circular RNAs (circRNAs) is a recently discovered feature of gene expression in highly diverged eukaryotes. Numerous algorithms that are used to detect genome-wide circRNA expression from RNA sequencing (RNA-seq) data have been developed in the past few years, but there is little overlap in their predictions and no clear gold-standard method to assess the accuracy of these algorithms. We review sources of experimental and bioinformatic biases that complicate the accurate discovery of circRNAs and discuss statistical approaches to address these biases. We conclude with a discussion of the current experimental progress on the topic.
    The extent to which gene fusions function as drivers of cancer remains a critical open question. In principle, transcriptome sequencing provided by The Cancer Genome Atlas (TCGA) enables unbiased discovery of gene fusions and post-analysis that informs the answer to this question. To date, such an analysis has been impossible because of performance limitations in fusion detection algorithms. By engineering a new, more precise statistical approach to analyzing fusions in TCGA data, we report new recurrent gene fusions, including those that could be druggable; new candidate pan-cancer oncogenes based on their profiles in fusions; and prevalent, previously overlooked, candidate oncogenic gene fusions in ovarian cancer, a disease with minimal treatment advances in recent decades. The novel and reproducible statistical algorithms and, more importantly, the biological conclusions open the door for increased attention to gene fusions as drivers of cancer and for future research into using ...
    Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in ...
    Just a few years ago, it had been assumed that the dominant RNA isoforms produced from eukaryotic genes were variants of messenger RNA, functioning as intermediates in gene expression. In early 2012, however, a surprising discovery was made: circular RNA (circRNA) was shown to be a transcriptional product in thousands of human and mouse genes and in hundreds of cases constituted the dominant RNA isoform. Subsequent studies revealed that the expression of circRNAs is developmentally regulated, tissue and cell-type specific, and shared across the eukaryotic tree of life. These features suggest important functions for these molecules. Here, we describe major advances in the field of circRNA biology, focusing on the regulation of and functional roles played by these molecules.
    Pervasive expression of circular RNA is a recently discovered feature of eukaryotic gene expression programs, yet its function remains largely unknown. The presumed biogenesis of these RNAs involves a non-canonical ‘backsplicing’ event. Recent studies in mammalian cell culture posit that backsplicing is facilitated by inverted repeats flanking the circularized exon(s). Although such sequence elements are common in mammals, they are rare in lower eukaryotes, making current models insufficient to describe circularization. Through systematic splice site mutagenesis and the identification of splicing intermediates, we show that circular RNA in Schizosaccharomyces pombe is generated through an exon-containing lariat precursor. Furthermore, we have performed high-throughput and comprehensive mutagenesis of a circle-forming exon, which enabled us to discover a systematic effect of exon length on RNA circularization. Our results uncover a mechanism for circular RNA biogenesis that may accou...
    An unexpectedly large fraction of genes in metazoans (human, mouse, zebrafish, worm, fruit fly) express high levels of circularized RNAs containing canonical exons. Here we report that circular RNA isoforms are found in diverse species whose most recent common ancestor existed more than one billion years ago: fungi (Schizosaccharomyces pombe and Saccharomyces cerevisiae), a plant (Arabidopsis thaliana), and protists (Plasmodium falciparum and Dictyostelium discoideum). For all species studied to date, including those in this report, only a small fraction of the theoretically possible circular RNA isoforms from a given gene are actually observed. Unlike metazoans, Arabidopsis, D. discoideum, P. falciparum, S. cerevisiae, and S. pombe have very short introns (∼ 100 nucleotides or shorter), yet they still produce circular RNAs. A minority of genes in S. pombe and P. falciparum have documented examples of canonical alternative splicing, making it unlikely that all circular RNAs are by-p...
    Explorations of human microbiota have provided substantial insight into microbial community composition; however, little is known about interactions between various microbial components in human ecosystems. In response to the powerful impact of viral predation, bacteria have acquired potent defences, including an adaptive immune response based on the clustered regularly interspaced short palindromic repeats (CRISPRs)/Cas system. To improve our understanding of the interactions between bacteria and their viruses in humans, we analysed 13 977 streptococcal CRISPR sequences and compared them with 2 588 172 virome reads in the saliva of four human subjects over 17 months. We found a diverse array of viruses and CRISPR spacers, many of which were specific to each subject and time point. There were numerous viral sequences matching CRISPR spacers; these matches were highly specific for salivary viruses. We determined that spacers and viruses coexist at the same time, which suggests that streptococcal CRISPR/Cas systems are under constant pressure from salivary viruses. CRISPRs in some subjects were just as likely to match viral sequences from other subjects as they were to match viruses from the same subject. Because interactions between bacteria and viruses help to determine the structure of bacterial communities, CRISPR-virus analyses are likely to provide insight into the forces shaping the human microbiome.