CRISPR-based genome editing technologies have revolutionised the field of molecular biology, offering unprecedented opportunities for precise genetic manipulation. However, off-target effects remain a significant challenge: they can lead to unintended consequences and limit the applicability of CRISPR-based genome editing in clinical settings. Current literature predominantly focuses on point predictions of off-target activity, which may not fully capture the range of possible outcomes and their associated risks. Here, we present CRISPAI, a hybrid multitask neural network that predicts uncertainty estimates for off-target cleavage activity, providing a more comprehensive risk assessment and facilitating better decision-making in single guide RNA (sgRNA) design and experimental optimization. Our approach uses the zero-inflated negative binomial (ZINB) count noise model to capture the uncertainty in the off-target cleavage activity data. ...
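To make the ZINB noise model concrete, the sketch below writes out its log-probability, mean and variance using only the standard library. It assumes the common (mu, theta, pi) parameterisation: mu is the negative binomial mean, theta the inverse dispersion, and pi the probability of a structural zero. The abstract does not say how CRISPAI parameterises or predicts these quantities, so the function names and the parameterisation here are illustrative assumptions, not the paper's implementation.

```python
import math

# Assumed parameterisation (not taken from the CRISPAI paper):
#   mu    > 0       : mean of the negative binomial component
#   theta > 0       : inverse dispersion, NB variance = mu + mu**2 / theta
#   0 <= pi < 1     : probability of an extra ("structural") zero

def nb_log_pmf(y: int, mu: float, theta: float) -> float:
    """Log P(Y = y) under a negative binomial with mean mu and inverse dispersion theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_log_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Log P(Y = y) under a zero-inflated negative binomial."""
    if y == 0:
        # A zero can come from the structural-zero component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(y, mu, theta)

def zinb_nll(counts, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of observed counts, usable as a training loss."""
    return -sum(zinb_log_pmf(y, mu, theta, pi) for y in counts)

def zinb_mean_var(mu: float, theta: float, pi: float):
    """Closed-form mean and variance: the variance is one possible uncertainty estimate."""
    mean = (1.0 - pi) * mu
    var = (1.0 - pi) * mu * (1.0 + mu / theta + pi * mu)
    return mean, var
```

For example, `zinb_mean_var(3.0, 2.0, 0.3)` gives a mean of 2.1 and a variance of 7.14, so a predicted cleavage count carries an explicit spread rather than a single point value.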
The 5’ untranslated region (5’ UTR) of messenger RNA plays a crucial role in the translatability and stability of the molecule. It is therefore an important component in the design of synthetic biological circuits for high and stable expression of intermediate proteins. Several UTR sequences are patented and frequently used in laboratories. We present UTRGAN, a novel Generative Adversarial Network (GAN)-based model designed to generate 5’ UTR sequences, coupled with an optimization procedure that targets a desired feature, such as high expression for a target gene sequence or high ribosome load and translation efficiency. We rigorously analyze the model and show that it generates sequences that mimic various properties of natural UTR sequences. We then show that the optimization procedure yields sequences that are expected to achieve (i) 61% higher average expression (up to 5-fold) on a set of target genes, (ii) 53% higher mean ribosome load on average (up to 2-fold for the best 5’...
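The abstract does not describe UTRGAN's optimization procedure in detail. One common pattern for steering a trained generator toward a target property is to search its latent space while scoring each generated sequence with a property predictor. The sketch below illustrates that general idea with random-search hill climbing; `toy_generator` and `toy_score` are hypothetical stand-ins for a trained GAN generator and an expression or ribosome-load predictor, and the search strategy is an assumption, not the paper's method.

```python
import random

NUCLEOTIDES = "ACGT"

def toy_generator(z):
    """Hypothetical stand-in for a trained generator: maps a latent vector of
    length 4*L to a length-L sequence by taking the argmax of each 4-value chunk."""
    seq = []
    for i in range(0, len(z), 4):
        chunk = z[i:i + 4]
        seq.append(NUCLEOTIDES[max(range(4), key=lambda k: chunk[k])])
    return "".join(seq)

def toy_score(seq):
    """Hypothetical stand-in for a property predictor (here: fraction of G)."""
    return seq.count("G") / len(seq)

def optimize_latent(z0, generator, score, steps=500, sigma=0.3, seed=0):
    """Random-search hill climbing in latent space: perturb the current best
    latent vector and keep the perturbation only if the predicted score improves."""
    rng = random.Random(seed)
    best_z, best_s = list(z0), score(generator(z0))
    for _ in range(steps):
        cand = [x + rng.gauss(0.0, sigma) for x in best_z]
        s = score(generator(cand))
        if s > best_s:  # accept strict improvements only
            best_z, best_s = cand, s
    return best_z, best_s
```

Because only improvements are accepted, the returned score is never lower than the starting one; a gradient-based search through a differentiable generator and predictor would be a natural alternative.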
Copy number variants (CNVs) have been shown to contribute to the etiology of several genetic disorders. Accurate detection of CNVs from whole exome sequencing (WES) data has long been a sought-after goal for clinical use. Despite recent improvements in performance, this goal has not been reached, because algorithms mostly suffer from low precision and even lower recall on expert-curated gold standard call sets. Here, we present ECOLE, a deep learning-based somatic and germline CNV caller for WES data. Based on a variant of the transformer architecture, the model learns to call CNVs per exon, learning from high-confidence calls made on matched whole genome sequencing (WGS) samples. We further train and fine-tune the model with a small set of expert calls via transfer learning. We show that ECOLE achieves, for the first time, high performance on human expert-labeled data, with 68.7% precision and 49.6% recall. This corresponds to precision and recall improvements of 18.7% and 30.8% over the next best-performing methods, respectiv...
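Precision and recall on per-exon CNV calls can be computed as sketched below. The abstract does not state the exact matching rule used for evaluation, so this assumes a simple convention: each call maps an exon identifier to "DEL" or "DUP", an exon absent from a dictionary carries no CNV, and a prediction counts as a true positive only if the gold set has the same type at the same exon. The function name and input format are hypothetical.

```python
def exon_precision_recall(predicted: dict, gold: dict):
    """Per-exon precision and recall under an assumed matching convention.

    predicted, gold: dicts mapping exon id -> "DEL" or "DUP"; exons not
    present are treated as having no CNV. A call with the wrong type
    counts as both a false positive and a false negative.
    """
    true_pos = sum(1 for exon, kind in predicted.items() if gold.get(exon) == kind)
    precision = true_pos / len(predicted) if predicted else 0.0  # TP / (TP + FP)
    recall = true_pos / len(gold) if gold else 0.0               # TP / (TP + FN)
    return precision, recall
```

Under this convention, a precision of 68.7% means roughly 687 of every 1,000 exon-level calls agree with the expert labels, while a recall of 49.6% means about half of the expert-labeled CNV exons are recovered.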
Accurate and efficient detection of copy number variants (CNVs) is of critical importance due to their significant association with complex genetic diseases. Although algorithms working on whole genome sequencing (WGS) data provide stable results under mostly valid statistical assumptions, copy number detection on whole exome sequencing (WES) data has largely been a losing game, with extremely high false discovery rates. This is unfortunate, as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the non-contiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present DECoNT, a novel deep learning model that uses matched WES and WGS data to learn to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that (i) we can efficientl...
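Learning to correct WES-based calls from matched WGS data requires a target label for each WES call. The abstract does not describe how DECoNT derives these labels, so the sketch below shows one hypothetical scheme: each WES call is labeled with the type of the WGS call it overlaps most, provided that overlap covers at least a chosen fraction of the WES call (the 0.5 default is an arbitrary assumption); otherwise it is labeled "NO-CALL". The `Call` type and `label_wes_calls` function are illustrative names only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    """Hypothetical CNV call record with half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL", "DUP", or "NO-CALL"

def overlap(a: Call, b: Call) -> int:
    """Number of bases shared by two calls (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))

def label_wes_calls(wes_calls, wgs_calls, min_fraction=0.5):
    """Assign each WES call the type of its best-overlapping WGS call, or
    "NO-CALL" when no WGS call covers at least min_fraction of it."""
    labels = []
    for w in wes_calls:
        best_kind, best_ov = "NO-CALL", 0
        for g in wgs_calls:
            ov = overlap(w, g)
            if ov > best_ov:
                best_kind, best_ov = g.kind, ov
        length = w.end - w.start
        labels.append(best_kind if best_ov >= min_fraction * length else "NO-CALL")
    return labels
```

A WES call that receives "NO-CALL" here would be a candidate false positive, which is the kind of error a correction model trained on such labels could learn to filter out.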
Papers by furkan özden