Download the source code from GitHub.
git clone git@github.com:Lav-i/GNNImpute.git
cd GNNImpute
Create a Python virtual environment and install the required packages. If your device supports CUDA, you can optionally use the GPU build of PyTorch.
conda create -n gnnimpute python=3.6
conda activate gnnimpute
pip install -r requirements.txt
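Before training, it can be useful to confirm whether PyTorch actually sees a CUDA device in the new environment. The following is a minimal sketch: `pick_device` is a hypothetical helper name that is not part of GNNImpute, and it assumes that PyTorch is installed by requirements.txt (it falls back to the CPU if the import fails).

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumed to be installed via requirements.txt
    except ImportError:
        # PyTorch is missing, so only the CPU can be reported.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu`, setting `no_cuda=True` in the GNNImpute call is presumably the safer choice.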
Alternatively, build the image from the Dockerfile or pull the prebuilt image from Docker Hub.
docker pull razzil/gnnimpute:v0.1.2
docker run --gpus all --rm -it razzil/gnnimpute:v0.1.2
The benchmark data sets are already provided in the Docker image.
- Mouse Embryonic Stem Cells (Klein)
  - File: GSE65525_RAW.tar
  - Matrix size: (2717, 24175)
  - Matrix size (after preprocessing): (2713, 24021)
  - Clusters: 4
wget https://www.ncbi.nlm.nih.gov/geo/download/\?acc\=GSE65525\&format\=file -O ./data/Klein/GSE65525_RAW.tar
tar xvf ./data/Klein/GSE65525_RAW.tar -C ./data/Klein
- Human Frozen PBMCs (Donor A), 10X
  - File: Gene / cell matrix (filtered)
  - Matrix size: (2900, 32738)
  - Matrix size (after preprocessing): (2843, 13003)
  - Clusters: ?
wget https://cf.10xgenomics.com/samples/cell-exp/1.1.0/frozen_pbmc_donor_a/frozen_pbmc_donor_a_filtered_gene_bc_matrices.tar.gz -O ./data/PBMC/frozen_pbmc_donor_a_filtered_gene_bc_matrices.tar.gz
tar xvf ./data/PBMC/frozen_pbmc_donor_a_filtered_gene_bc_matrices.tar.gz -C ./data/PBMC
mv ./data/PBMC/filtered_matrices_mex/hg19/* ./data/PBMC
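The matrix sizes listed for the two data sets show how much preprocessing filters out. As a small worked example, and assuming the shapes are given as (cells, genes), the difference can be computed as follows. The helper name `removed_by_preprocessing` is illustrative only.

```python
# Shapes as listed for each data set, assumed to be (cells, genes).
DATASETS = {
    "Klein": {"raw": (2717, 24175), "processed": (2713, 24021)},
    "PBMC": {"raw": (2900, 32738), "processed": (2843, 13003)},
}


def removed_by_preprocessing(raw, processed):
    """Return (cells_removed, genes_removed) between two matrix shapes."""
    return raw[0] - processed[0], raw[1] - processed[1]


for name, shapes in DATASETS.items():
    cells, genes = removed_by_preprocessing(shapes["raw"], shapes["processed"])
    print(f"{name}: {cells} cells and {genes} genes removed")
```

Under that assumption, preprocessing removes 4 cells and 154 genes from Klein, and 57 cells and 19735 genes from PBMC.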
Process the Klein data set into the standard format.
python ./data/Klein/preprocess.py
Process the PBMC data set into the standard format.
python ./data/PBMC/preprocess.py
The output file (./data/{name}/processed/{name}.h5ad) contains the filtered expression matrix in h5ad format.
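The `{name}` placeholder in the output path stands for the data set name (Klein or PBMC). A minimal sketch of how the path expands, and of a quick check that preprocessing has produced its output, might look like this. `processed_path` and its `data_root` parameter are hypothetical names, not part of the repository.

```python
from pathlib import Path


def processed_path(name, data_root="./data"):
    """Build the expected location of the processed h5ad file for a data set."""
    return Path(data_root) / name / "processed" / f"{name}.h5ad"


if __name__ == "__main__":
    for name in ("Klein", "PBMC"):
        path = processed_path(name)
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```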
Mask the Klein data set.
python ./data/mask.py --masked_prob=0.1 --dataset=Klein
Mask the PBMC data set.
python ./data/mask.py --masked_prob=0.1 --dataset=PBMC
The output folder (./data/{name}/masked/) contains the main output file, the masked expression matrix, in h5ad and csv formats. An accompanying npz file records the locations of the dropout events.
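To illustrate what masking with `--masked_prob=0.1` means conceptually, here is a minimal sketch in plain Python. It is not the code in mask.py: it assumes that each nonzero entry is independently set to zero with the given probability and that the masked positions are recorded, and `mask_nonzero` is a hypothetical name. The actual script may sample differently.

```python
import random


def mask_nonzero(matrix, masked_prob=0.1, seed=0):
    """Zero out each nonzero entry with probability `masked_prob`.

    Returns a masked copy of the matrix and the list of (row, col)
    positions that were set to zero (the simulated dropout events).
    """
    rng = random.Random(seed)
    masked = [row[:] for row in matrix]  # copy so the input stays intact
    positions = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Only observed (nonzero) counts can be dropped (assumption).
            if value != 0 and rng.random() < masked_prob:
                masked[i][j] = 0
                positions.append((i, j))
    return masked, positions


if __name__ == "__main__":
    counts = [[3, 0, 1], [0, 5, 2], [4, 0, 0]]
    masked, positions = mask_nonzero(counts, masked_prob=0.5, seed=1)
    print(masked)
    print(positions)
```

Keeping the positions alongside the masked matrix is what makes it possible to compare imputed values against the original ones later.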
import scanpy as sc
from GNNImpute.api import GNNImpute
adata = sc.read_h5ad('./data/Klein/masked/Klein_01.h5ad')
adata = GNNImpute(
    adata=adata,
    layer='GATConv',
    no_cuda=False,
    epochs=3000,
    lr=0.001,
    weight_decay=0.0005,
    hidden=50,
    patience=200,
    fastmode=False,
    heads=3,
    use_raw=True,
    verbose=True,
)
The output variable (adata) holds the main output, the imputed expression matrix, as an AnnData object.
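Once the imputed matrix is available, one common way to judge the result is to compare imputed and original values at the masked positions only. The sketch below is an assumption about how such an evaluation could be done, not the evaluation used by the authors. `masked_errors` is a hypothetical helper, the matrices are plain nested lists, and extracting them from `adata.X` and from the npz file is left out because the file's layout is not described here.

```python
import math


def masked_errors(original, imputed, positions):
    """Return (MAE, RMSE) computed over the masked (row, col) positions."""
    if not positions:
        raise ValueError("no masked positions to evaluate")
    diffs = [imputed[i][j] - original[i][j] for i, j in positions]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    original = [[3.0, 0.0], [0.0, 5.0]]
    imputed = [[2.0, 0.0], [0.0, 7.0]]
    mae, rmse = masked_errors(original, imputed, [(0, 0), (1, 1)])
    print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Restricting the comparison to masked entries avoids rewarding a method for values it never had to recover.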
For more details, please see the Example File.