Published March 30, 2022 | Version v1
Software Open

A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules

  • 1. University of Oxford
  • 2. University of Toronto

Description

Summary:

Code, Python notebooks, and data accompanying the manuscript 'A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules' (published in PLoS Computational Biology; doi: 10.1371/journal.pcbi.1010028).

Stage 1 peaBrain:

In Stage 1, we constructed a single model to predict the mean abundance of all genes in any given tissue from the reference genome, optionally annotated with epigenetic and genomic annotations. We applied this framework to all tissues from the GTEx dataset, constructing three classes of models: (a) using DNA sequence alone (class-A); (b) using DNA plus epigenomic annotations not specific to any tissue or cell type (i.e. non-specific annotations) (class-B); and (c) using DNA combined with both non-specific and tissue-specific annotations (class-C). We have provided all code and data necessary to generate the results for class-A and class-B models. Due to storage constraints, we provide training/test data only for skeletal muscle. Expression data for other tissues is available from GTEx. The original data sources used to train class-C models are detailed in the manuscript.
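The difference between the three model classes can be pictured as a difference in input channels. The following is a minimal sketch, assuming a one-hot DNA encoding with annotation tracks appended as extra per-position columns; the actual peaBrain encoding is described in the manuscript and may differ. The function names (`one_hot`, `build_input`) and the example tracks (`conservation`, `muscle_signal`) are hypothetical and are not taken from the released code.

```python
from typing import List, Optional, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Return a len(seq) x 4 one-hot matrix; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def build_input(
    seq: str,
    nonspecific: Optional[Sequence[Sequence[float]]] = None,
    tissue_specific: Optional[Sequence[Sequence[float]]] = None,
) -> List[List[float]]:
    """Append annotation tracks (one value per position) as extra columns."""
    rows = one_hot(seq)
    for tracks in (nonspecific or [], tissue_specific or []):
        for track in tracks:
            if len(track) != len(seq):
                raise ValueError("annotation track length must match sequence length")
            for row, value in zip(rows, track):
                row.append(float(value))
    return rows


seq = "ACGTN"
conservation = [0.1, 0.9, 0.8, 0.2, 0.0]  # hypothetical non-specific track
muscle_signal = [0, 1, 1, 0, 0]           # hypothetical tissue-specific track

class_a = build_input(seq)                                  # DNA only
class_b = build_input(seq, nonspecific=[conservation])      # DNA + non-specific
class_c = build_input(seq, nonspecific=[conservation],
                      tissue_specific=[muscle_signal])      # DNA + both
print(len(class_a[0]), len(class_b[0]), len(class_c[0]))    # 4 5 6
```

In this toy layout, moving from class-A to class-C only widens each position's feature vector; the sequence length and the prediction target stay the same.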

Using the Stage 1 class-B models, we generated a non-coding impact metric that captured the impact of each position in the core promoter sequence on the expression of each gene. The peaBrain impact scores for all GTEx tissues have been made available. In the manuscript, we show that this impact score correlates with nucleotide evolutionary constraint and is also predictive of disease-associated variation and allele-specific transcription factor binding. We also highlight how tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals.
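The manuscript defines how the peaBrain impact score is actually computed. The sketch below instead uses in silico saturation mutagenesis, a common generic way to score positions, purely to illustrate what a per-position impact on predicted expression means. The scoring function `toy_predict` is a hypothetical stand-in for a trained model, and `position_impact` is an illustrative helper, not part of the released code.

```python
from typing import Callable, List

BASES = "ACGT"


def position_impact(seq: str, predict: Callable[[str], float]) -> List[float]:
    """For each position, return the largest absolute change in the prediction
    over all single-base substitutions at that position."""
    reference = predict(seq)
    impacts = []
    for i, ref_base in enumerate(seq):
        deltas = [
            abs(predict(seq[:i] + alt + seq[i + 1:]) - reference)
            for alt in BASES
            if alt != ref_base
        ]
        impacts.append(max(deltas))
    return impacts


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts 'TATA' occurrences."""
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


promoter = "GGTATAGG"
print(position_impact(promoter, toy_predict))
# [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

Only the four positions inside the motif change the toy prediction when mutated, so they are the only ones with a non-zero score. With a real model, `predict` would return the predicted expression of the gene in a given tissue.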

Stage 2 peaBrain:

In Stage 2, we extended the peaBrain model to incorporate the transcriptomic consequences of individual genotype variation. In the manuscript, we describe the ability of this extended peaBrain model to predict the tissue-specific expression profile of each individual and to identify putatively functional variants within the sequence. Sample code has been provided. Individual-level data is available from GTEx.

Files

Files (988.4 MB)

md5:e494f1813859e878c5b3c46260a7e29d 987.7 MB
md5:e1ad434edd5f59e0c016b6e45045d69d 328.1 kB
md5:fe5d7bdd0b3c906dbdcea0f9e87747cb 3.9 kB
md5:3691f57761a0e7156c451a11752df354 355.4 kB
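A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of` and `matches_record` are hypothetical, and the path passed to them should be wherever the file was saved locally.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str) -> bool:
    """Compare a file's digest with a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected_hex
```

For example, `matches_record(local_path, "md5:e494f1813859e878c5b3c46260a7e29d")` should return `True` for an intact copy of the largest file, where `local_path` is the location of that download.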

Additional details

Related works

Is supplement to
Journal article: 10.1371/journal.pcbi.1010028 (DOI)