A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE
Figure 1
(a) Error detection capabilities of Score, Reference-genome, and DRISEE methods.
(1) Simplified procedural diagram of a typical sequencing protocol. Sample collection: First, the biological sample is collected, Extraction/Initial purification: Then the RNA/DNA undergoes extraction and initial purification procedures, Pre-sequencing amplification(s): Next, the extracted genetic material may undergo amplification (e.g. whole genome amplification – see main text) followed by additional purifications and/or other processing procedures, “Sequencing”: Genetic material is placed in the sequencer itself, and is sequenced. Note that sequencing itself frequently involves additional rounds of amplification, Analyses of sequencing output: Sequencer outputs are analyzed. (2) Given a procedure such as A, the portion of the procedure over which score/Phred-based methods can detect error is indicated in red. (3) Given a procedure such as A, the portion of the procedure over which reference-genome-based methods can detect error is indicated in green. Note that reference-genome-based methods are only applicable to single genome data; they cannot consider metagenomic data. (4) Given a procedure such as A, the portion of the procedure over which DRISEE-based methods can detect error is indicated in blue. Note that DRISEE methods can be applied to metagenomic or genomic data, provided that certain requirements are met. See methods. 1: BMC Bioinformatics. 2008 Sep 19;9:386. 2: Nat Methods. 2010 May;7(5):335–6. Epub 2010 Apr 11. (b) DRISEE workflow The steps in a typical DRISEE workflow are depicted and briefly described (in figure captions). Please see Text S1 (Supplemental Methods, Typical DRISEE workflow) for a much more detailed description of each depicted step.