
Daedalus

Nextflow pipeline for analysis of libraries prepared using the ImmunoPETE assay.

Install and Configure

Note: The Nextflow config file must be configured for your job queue before the pipeline is run.

Software Requirements

  • Python 3.6
  • Java 8
  • Nextflow 19.07.0, to run the pipeline
  • UGE, for cluster job submission
  • bats 0.4.0, for testing
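
A quick pre-flight check can confirm that the tools listed above are reachable before you install anything. The sketch below is not part of Daedalus. The executable names (`java`, `nextflow`, `bats`, and `qsub` for UGE) are assumptions about how these tools appear on your PATH, and the script does not verify their versions.

```python
import shutil
import sys

# Assumed executable names for the requirements listed above.
REQUIRED_TOOLS = ["java", "nextflow", "bats", "qsub"]


def find_missing(tools):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # The requirements list Python 3.6; warn if the interpreter differs.
    if sys.version_info[:2] != (3, 6):
        print("Warning: Python 3.6 is listed, found", sys.version.split()[0])
    missing = find_missing(REQUIRED_TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
```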

Download git repo

git clone git@github.com:bioinform/Daedalus.git
cd Daedalus
git checkout tags/${release_version}  # set release_version to the release tag you want

Build Conda Environment (Optional)

It's recommended to create a conda environment:

conda create -n Daedalus python=3.6
conda activate Daedalus

Install

Within the Daedalus directory, run the following command:

pip install .

Build Docker images

Due to license restrictions, you must build the bcl2fastq image yourself from the provided Dockerfile. Refer to the Docker Hub documentation for creating a repository and pushing images.

docker build -t {dockerhub_username}/bcl2fastq:{version} -f Dockerfile_bcl2fastq .
docker push {dockerhub_username}/bcl2fastq:{version}

Configure images

After building your own images, set the following parameter in nextflow/defaults-ipete.config to point to them:

params.bcl2fastq_docker = "{dockerhub_username}/bcl2fastq:{version}"
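
To keep the image reference identical across the build, push, and config steps, it can help to derive all three from one place. This is a minimal sketch and not something shipped with Daedalus. The helper names are hypothetical, and the commands are only assembled as argument lists, never executed.

```python
def image_tag(username, version, name="bcl2fastq"):
    """Build a reference in the {dockerhub_username}/bcl2fastq:{version} form."""
    return "{}/{}:{}".format(username, name, version)


def config_line(username, version):
    """Render the params.bcl2fastq_docker line for nextflow/defaults-ipete.config."""
    return 'params.bcl2fastq_docker = "{}"'.format(image_tag(username, version))


def docker_commands(username, version):
    """Return the build and push commands as argument lists (not executed here)."""
    tag = image_tag(username, version)
    return [
        ["docker", "build", "-t", tag, "-f", "Dockerfile_bcl2fastq", "."],
        ["docker", "push", tag],
    ]
```

For example, with the placeholder values `"myuser"` and `"1.0"`, `config_line("myuser", "1.0")` yields the line to paste into the config file.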

Configure the pipeline

By default, the pipeline runs on the Roche SC1 computing cluster (UGE). If you install it on a different machine, modify the cluster settings in nextflow/nextflow.config accordingly. Refer to Nextflow's documentation on SGE/UGE and Docker for more details.

ipete_docker {
    process.clusterOptions = { "-l h_vmem=${task.ext.vmem} -S /bin/bash -l docker_version=new -V" }
}
docker.runOptions = "-u=\$UID --rm -v /path/to/input_and_output:/path/to/input_and_output  -v /path/to/daedalus_repo:/path/to/daedalus_repo"
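
The docker.runOptions value above mounts each host directory at the same path inside the container. When several directories are involved, the string can be generated rather than typed by hand. The helper below is a hypothetical sketch; it assumes every directory should be mounted at an identical path in the container, as in the example above.

```python
def docker_run_options(paths):
    """Build a docker.runOptions value that bind-mounts each path onto itself."""
    mounts = " ".join("-v {0}:{0}".format(path) for path in paths)
    # The backslash before $UID is kept so it stays escaped inside the
    # double-quoted string of the Nextflow config file.
    return "-u=\\$UID --rm " + mounts


if __name__ == "__main__":
    print(docker_run_options(["/path/to/input_and_output", "/path/to/daedalus_repo"]))
```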

Test Pipeline on a single sample

Once all the software has been installed and Nextflow has been configured, the pipeline's bats test can be run. The test runs the pipeline on a single sample, using the paired FASTQ files provided:

  • data/PBMC_1000ng_25ul_2_S6_R1_001.fastq.gz
  • data/PBMC_1000ng_25ul_2_S6_R2_001.fastq.gz

Run the test using the following commands:

cd test
bats single-sample-ipete.bats
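
If the test fails immediately, one thing worth ruling out is a missing input file. The sketch below derives the R2 file name from the R1 file name and reports which of the pair is absent. It assumes the `_R1_`/`_R2_` naming shown in the file list above and is not part of the Daedalus test suite; the function names are hypothetical.

```python
import os


def mate_path(r1_path):
    """Derive the R2 file path from an R1 file path."""
    directory, name = os.path.split(r1_path)
    if "_R1_" not in name:
        raise ValueError("not an R1 file name: " + name)
    # Replace only the last "_R1_" so sample names containing it survive.
    head, _, tail = name.rpartition("_R1_")
    return os.path.join(directory, head + "_R2_" + tail)


def missing_inputs(r1_path):
    """Return whichever of the R1/R2 files does not exist on disk."""
    return [p for p in (r1_path, mate_path(r1_path)) if not os.path.isfile(p)]
```

Paths are resolved relative to the current working directory, so a relative path such as data/PBMC_1000ng_25ul_2_S6_R1_001.fastq.gz must be checked from the directory that contains data.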

Running Pipeline

Running the pipeline requires a complete flowcell's worth of ImmunoPETE libraries.

Generate Manifest from Sample Sheet

manifestGenerator=/path/to/Daedalus/pipeline_runner/manifest_generator.py
illuminaDir=/path/to/illumina/run_folder
sampleSheet=/path/to/sampleSheet.csv

python ${manifestGenerator} \
       --pipeline_run_id Daedalus_example_run \
       --sequencing_run_folder ${illuminaDir} \
       --output Daedalus_example_manifest.csv \
       --subsample 1 \
       --umi_mode True \
       --umi2 'NNNNNNNNN' \
       --umi_type R2 \
       ${sampleSheet}
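
The same call can be assembled from Python, which is convenient when manifests are generated for many runs. This is a sketch that uses only the options shown in the command above. The wrapper function names are hypothetical, and the option values are copied from the example rather than being recommended defaults.

```python
import subprocess
import sys


def manifest_command(generator, run_folder, sample_sheet, run_id, output):
    """Assemble the manifest_generator.py call with the options shown above."""
    return [
        sys.executable, generator,
        "--pipeline_run_id", run_id,
        "--sequencing_run_folder", run_folder,
        "--output", output,
        "--subsample", "1",
        "--umi_mode", "True",
        "--umi2", "NNNNNNNNN",
        "--umi_type", "R2",
        sample_sheet,
    ]


def generate_manifest(generator, run_folder, sample_sheet, run_id, output):
    """Run the generator; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(
        manifest_command(generator, run_folder, sample_sheet, run_id, output)
    )
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.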

The manifest file contains all parameters needed for the pipeline to run. Sample-specific tuning, or any other parameter update, can be made by editing the generated manifest file. After edits are complete, the pipeline can be submitted using the manifest file alone.
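
Manifest edits can also be scripted instead of made by hand. The sketch below changes selected columns for the rows matching one sample, using Python's csv module. The manifest's actual column names are not documented here, so the names used in the usage note (`sample_id`, `subsample`) are hypothetical placeholders; check the header of your generated manifest first.

```python
import csv
import io


def update_manifest(text, key_column, key_value, updates):
    """Return CSV text with `updates` applied where key_column == key_value."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    # Refuse to add columns the manifest does not already have.
    unknown = (set(updates) | {key_column}) - set(fieldnames)
    if unknown:
        raise KeyError("columns not in manifest: " + ", ".join(sorted(unknown)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[key_column] == key_value:
            row.update(updates)
        writer.writerow(row)
    return out.getvalue()
```

A call such as `update_manifest(text, "sample_id", "B", {"subsample": "0.5"})` would rewrite only the rows for sample B, leaving every other row and the column order untouched.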

Submit Pipeline Run

Using the manifest produced by the manifest generator (Daedalus_example_manifest.csv), pipeline runs can be submitted with the script pipeline_runner.py.

pipelineRunner=/path/to/Daedalus/pipeline_runner/pipeline_runner.py
outDir=/path/to/analysis/output

python ${pipelineRunner} --no_fairshare --wait --resume -o ${outDir} Daedalus_example_manifest.csv

Output

The analysis folder is written inside the specified output directory ${outDir} and is named after the pipeline_run_id, here "Daedalus_example_run":

${outDir}/Daedalus_example_run
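
For scripted submissions, the runner call and the expected result location can be derived together. This is a minimal sketch using only the flags from the command above. The function names are hypothetical, and the sketch only builds the command and the path; it does not submit anything.

```python
import os
import sys


def runner_command(runner, out_dir, manifest):
    """Assemble the pipeline_runner.py call with the flags shown above."""
    return [sys.executable, runner,
            "--no_fairshare", "--wait", "--resume",
            "-o", out_dir, manifest]


def analysis_dir(out_dir, pipeline_run_id):
    """Return the folder the run is expected to write: out_dir/pipeline_run_id."""
    return os.path.join(out_dir, pipeline_run_id)
```

After a run finishes, `os.path.isdir(analysis_dir(out_dir, run_id))` gives a simple check that the analysis folder was created.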

Workflow

[Figure: workflow diagram]

Methods

Overview of the Pipeline Methods for key processing steps.