Next Generation Sequencing Data Analysis
Introduction
The era of next generation sequencing (NGS) began following the first reports of this innovative technique in 2005 (Margulies
et al., 2005; Shendure and Ji, 2008). High-throughput NGS technology, which relies on parallel amplification and sequencing,
yields shorter read lengths with average raw error rates of 1%–1.5% (Shendure and Ji, 2008), compared with conventional
Sanger sequencing protocols, which can generate reads of up to 1000 bp at 99.999% per-base accuracy. Although automation of
traditional dideoxy (Sanger) DNA sequencing improved its efficiency, NGS technology was regarded as superior in terms of cost and
time. An early method called massively parallel signature sequencing (MPSS), introduced by Lynx Therapeutics, set the stage for
high-throughput sequencing (Brenner et al., 2000). The first NGS machine, the GS20, made available to researchers in 2005 by
454 Life Sciences, is based on large-scale parallel pyrosequencing on microbeads in micro-droplets of a water-in-oil emulsion
(Henson et al., 2012).
The major NGS technology platforms on the market for whole-genome sequencing come primarily from Illumina, Roche 454,
SOLiD and Ion Torrent. Each platform has its advantages, disadvantages and cost implications in terms of reliability, time and
money (Table 1), and each offers features that may be attractive for specific purposes. For instance, the Ion Torrent is often
positioned as a general-purpose sequencer as well as for diagnostic protocols because of its quicker turnaround time (Tarabeux
et al., 2014). The longer reads offered by the Illumina and Roche technologies are desirable, but the cost of the Roche 454 FLX is
very steep, rendering it impractical for large-scale genome projects. Reads from the Pacific Biosciences machine, PacBio, are
generally not used for direct sequencing in large genome projects but can be useful for resolving repetitive and ambiguous
regions because of the platform's capability to generate very long read lengths (Table 1).
The processes common to most NGS platforms are library preparation, library amplification, and sequencing (Fig. 1). The
starting material for library preparation can be either RNA or DNA (genomic or PCR-amplified). RNA has to be reverse-transcribed
into cDNA because, at present, NGS machines sequence only DNA directly. Since the target library molecules sequenced on each
NGS platform are required to be of specific lengths, genomic DNA requires fragmentation and size selection, which is performed
by sonication, nebulization, or enzymatic techniques followed by gel electrophoresis and excision. For instance, the Illumina NGS
platform's standard fragment size is in the range of 300–550 bp including adapters. Generally, libraries are built by adding NGS
platform-specific DNA adapters to the DNA molecules. These adapters facilitate the binding of the library fragments to a surface
such as a microbead (454, Ion PGM, SOLiD) or a glass slide (Illumina, SOLiD). However, depending on the specific NGS platform,
the library construction step has to be customised to fit the sequencing protocol. Generally, DNA library construction depends
directly on the intended application and can be divided mainly into fragment libraries and mate-paired libraries. In fragment
libraries, target genomic sequences are fragmented to smaller sizes, typically up to five times the NGS platform's read length
capability. Subsequently, sequencing adapters are attached to these fragments, allowing the NGS platform to sequence from the
adaptor tags. Fragment libraries are typically either single-end or paired-end; with adaptor tags at both the forward and reverse
sites, NGS platforms are able to sequence from both ends in paired-end libraries. Fragment libraries are mainly applied to variant
calling, copy number detection and genome reconstruction, while mate-paired libraries, which link reads separated by larger
genomic distances, are mainly used for scaffolding and structural variant detection.
NGS analysis utilizes bioinformatics approaches to convert signals from the machine into meaningful information: raw signals
are converted to data, the data to annotations or catalogued information, and these in turn to actionable knowledge. NGS
bioinformatics analysis is primarily divided into three distinct phases: primary, secondary and tertiary analyses (Fig. 2). In the
primary analyses, raw data from the sequencers are converted into nucleotide bases and short-read data. Secondary analyses apply
detailed bioinformatics methodology specific to the NGS technique that was employed, which may involve read alignment or read
assembly. Secondary analyses usually have the most complex workflows and are typically run in sequence as a pipeline. Moreover,
depending on the type of NGS technique employed, the analysis pipeline may differ greatly. For example, RNA-seq data, which
characterize transcriptomes, differ in their secondary analysis approach from ChIP-seq data, which investigate genome-wide
epigenetic mechanisms. Lastly, in the tertiary analyses, the previously obtained results can be associated and understood in a
biological context. Tertiary bioinformatics analyses can be an iterative process that involves rigorous statistical and
computational biology methods.
Sequence Generation
The initial primary analysis is usually transient, taking place as the sequencing machine's detectors receive the signals from the
high-throughput reactions. The base-calling and recording process is therefore tightly integrated with the sequencing instrument,
and quality scores corresponding to the short-read nucleotide sequences are output in parallel. The primary analysis software is
installed by machine vendors on the workstation supporting the sequencing instrument, and it can also be run on high-
performance cluster systems for faster results. Besides converting raw signals to base calls, some software tools include demulti-
plexing of multiple samples that were pooled and indexed in a single run (Dodt et al., 2012).
Fig. 1 Schematic diagram of the process involved in common NGS platforms. Adapted from Knief, C., 2014. Analysis of plant microbe interactions
in the era of next generation sequencing technologies. Front. Plant Sci. 5, 216. Available at: https://doi.org/10.3389/fpls.2014.00216.
As described earlier, NGS platforms suffer from higher error rates than Sanger sequencing (Nakamura et al., 2011;
Shendure and Ji, 2008). As part of primary analysis, however, different approaches and algorithms have been developed to
detect and compensate for these errors (Margulies et al., 2005). Moreover, bases with an error probability of less than 0.1% can be
selected algorithmically. As a simple approach, error rates can be decreased by performing the DNA sequencing at high coverage,
of at least 20–60-fold, depending on the sequencing project's goal (Luo et al., 2012; Margulies et al., 2005; Voelkerding et al.,
2009). Notably, a variant observed in a sequencing read may represent a distinct genotype but could in fact be the result of a
sequencing error. It is therefore very important to use established methods to differentiate these two causes of variation, as failing
to do so may lead to inaccurate results.
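As a back-of-the-envelope illustration of the coverage figures quoted above, average fold-coverage can be estimated from the read count, read length and genome size. The sketch below is only illustrative; the function name and the example values (150 bp reads, a 3 Gb genome) are assumptions, not figures taken from the text.

```python
# Rough estimate of sequencing depth (fold-coverage): a minimal sketch.
# Coverage = (number of reads x read length) / genome size.
def estimated_coverage(n_reads, read_length, genome_size):
    """Return the average fold-coverage for a sequencing run."""
    return (n_reads * read_length) / genome_size

# Example: 600 million 150 bp reads over a 3 Gb genome give ~30-fold coverage.
print(estimated_coverage(600_000_000, 150, 3_000_000_000))  # 30.0
```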
Fig. 2 Workflow of NGS data analysis in three phases: primary, secondary and tertiary. The tertiary phase of Comparison and Discovery is
indicated as an iterative process.
To improve the quality of the data after base-calling, Phred-based filtering algorithms can be used to remove low-quality
sequencing reads (Margulies et al., 2005). These filters discard reads with low-quality, uncalled, or ambiguous bases, besides
clipping the lower-quality 3′-ends of reads. All such filters use the quality information contained in the FASTQ file, which is
computed by the NGS platform for each base during the base-calling procedure. Previous studies (Minoche et al., 2011) have
examined the effect of different filtering approaches on Illumina data and suggest that filtering can reduce error rates to less than
0.2% by eliminating around 15%–20% of the low-quality bases, mostly via trimming of the error-prone 3′-ends. Another study
supported these findings, observing a five-fold decrease in error rate after applying a filter (Phred score of Q30, corresponding to
a 0.1% likelihood of a false base call) that eliminated reads with low-quality bases (Nguyen et al., 2011). It may be useful to note
that low-quality bases are sometimes localised in specific regions of a genome, and that removal of whole reads may introduce
bias into quantitative studies (Minoche et al., 2011; Nakamura et al., 2011). A read-clipping strategy can therefore be used to
remove erroneous bases from the left or right edges of the reads, rather than filtering out whole reads, in order to address errors
that are usually present at the read edges alone.
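To make the Phred-based filtering and 3′-end clipping described above concrete, here is a minimal sketch assuming Illumina-style Phred+33 quality encoding; the thresholds (Q20 for trimming, a mean of Q30 and a minimum length of 30 bp for filtering) are illustrative choices rather than values prescribed by the studies cited.

```python
# Minimal sketch of Phred-based read filtering and 3'-end clipping.
# Assumes FASTQ qualities in Phred+33 encoding; thresholds are illustrative.

def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality_string]

def clip_3prime(seq, quals, min_q=20):
    """Clip low-quality bases from the 3' end of a read."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

def passes_filter(quals, min_mean_q=30, min_length=30):
    """Keep reads whose length and mean quality exceed the thresholds."""
    return len(quals) >= min_length and sum(quals) / len(quals) >= min_mean_q

# Example record: 35 high-quality bases (Q40) followed by 5 poor bases (Q2)
seq = "ACGT" * 10
qual = "I" * 35 + "#" * 5
clipped_seq, clipped_quals = clip_3prime(seq, phred_scores(qual))
print(len(clipped_seq), passes_filter(clipped_quals))  # 35 True
```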
Apart from read clipping and filtering methods, several error correction tools (e.g., Coral, HiTEC, Musket, Quake, RACER,
Reptile, or SHREC) can be used as a complementary strategy to reduce sequencing error rates in reads (Knief, 2014). Generally,
these error correction methods make use of high sequencing coverage to identify and correct errors probabilistically. Moreover,
these algorithms often consider the quality scores of the examined bases as well as the quality values of neighbouring bases. For
instance, some of these tools are able to correct substitution errors in Illumina sequencing data (Ilie and Molnar, 2013; Liu et al.,
2013; Yang et al., 2010), while others (Coral, HSHREC, KEC, and ET) include indel correction algorithms and are available for the
analysis of Roche 454 and Ion Torrent data (Salmela, 2010; Salmela and Schröder, 2011; Skums et al., 2012). Error correction is
regarded as a very useful strategy in de novo genome sequencing, resequencing and amplicon sequencing projects, with benefits
ranging from finding more optimal assemblies in the de Bruijn graph (DBG) to reducing the overall memory footprint of the
assembly stage (Skums et al., 2012; Yang et al., 2010).
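As a toy illustration of the k-mer spectrum idea that underlies several of the correctors listed above (not a re-implementation of any particular tool), the sketch below counts every k-mer in a high-coverage read set and flags rare k-mers, which are likely to contain sequencing errors; the k value and trusted-count cutoff are arbitrary example settings.

```python
# Toy k-mer spectrum error detection: rare k-mers in high-coverage data
# are likely to contain sequencing errors.
from collections import Counter

def kmer_spectrum(reads, k=15):
    """Count every k-mer occurring in the read set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def flag_error_kmers(counts, min_count=3):
    """Flag k-mers below the trusted-count cutoff as likely errors."""
    return {kmer for kmer, c in counts.items() if c < min_count}

# 30 identical reads plus one read carrying a single substitution
reads = ["ACGTACGTACGTACGTACG"] * 30 + ["ACGTACGTACGAACGTACG"]
suspect = flag_error_kmers(kmer_spectrum(reads, k=15))
print(len(suspect), "suspect k-mers")  # 5 suspect k-mers
```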
Genome Assembly
After pre-processing, the sequence reads can be assembled into contigs. Assembly of reads generated by NGS technology involves
reducing redundant data by placing overlapping reads contiguously, adjacent to each other, in an optimal way (Miller et al., 2010).
When contigs (assembled sets of reads) rather than reads undergo the same process, with the aid of long-range linking
information, it is known as scaffolding. In other words, assembly is a process of reconstructing the target by grouping reads into
contigs and contigs into scaffolds.
Generally, the size and accuracy of the contigs and scaffolds are important statistics in genome assemblies (Miller et al., 2010). The
quality of genome assemblies is usually described by the maximum length, average length, combined total length, and N50. The contig
N50 is the length of the smallest contig in the smallest set of the largest contigs whose combined length represents at least 50% of
the assembly (Miller et al., 2010). Generally, a larger N50 value implies a higher-quality, less fragmented genome assembly.
Typical high-coverage genome projects have N50 values in the megabase range; however, N50 depends on genome size and is
therefore not a good measure for comparing unrelated assemblies, only assemblies of the same genome. Assembly accuracy is
difficult to quantify. Nevertheless, where reference genomes exist, mapping the assembled contigs/scaffolds to them is a useful way
to examine assembly quality.
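The N50 definition above translates directly into code. The following is a short sketch: contig lengths are sorted from largest to smallest, and the length at which the running total first reaches half of the assembly size is reported; the contig lengths in the example are invented for illustration.

```python
# Minimal sketch of the N50 statistic described above.
def n50(contig_lengths):
    """Compute the N50 of a list of contig/scaffold lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0

# Example: total assembly size 225 kb; the running sum first passes
# 112.5 kb within the 50 kb contig, so N50 = 50,000.
print(n50([100_000, 50_000, 30_000, 20_000, 15_000, 10_000]))
```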
As outlined earlier, an assembly is an ordered data construction that maps the sequencing data onto a supposed reconstruction of
the target (He et al., 2013). Contigs are sequences reconstructed from the alignment of reads, giving rise to a consensus sequence.
A scaffold is a higher-order organisation of sequences that defines the contig order and orientation and the sizes of the gaps
between contigs. Scaffolds therefore represent more contiguous sequences, mimicking the physical composition of the genome.
Scaffold sequences may contain N's in the gaps between contigs, and the number of consecutive N's may indicate the gap length
estimated during the assembly process from bridging mate-pair reads (Miller et al., 2010).
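As a small illustration of how such gap estimates can be read back out of a scaffold sequence, the sketch below reports the position and length of every run of N's; the example scaffold string is fabricated.

```python
# Report gap estimates encoded as runs of N's in a scaffold sequence.
import re

def gap_lengths(scaffold_sequence):
    """Return (start, length) for every run of N's in a scaffold."""
    return [(m.start(), len(m.group()))
            for m in re.finditer(r"N+", scaffold_sequence.upper())]

scaffold = "ACGTACGT" + "N" * 120 + "GGCCGGCC" + "N" * 45 + "TTAA"
print(gap_lengths(scaffold))  # [(8, 120), (136, 45)]
```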
There are many well-established software packages for assembling sequencing reads into contigs/scaffolds. In general, these genome
assemblers can be grouped into three categories based on their approach (Miller et al., 2010): (1) the Overlap/Layout/Consensus
(OLC) approaches depend on an overlap graph; (2) the de Bruijn Graph (DBG) approaches use some form of k-mer graph; and
(3) the greedy graph algorithms can use either OLC or DBG (Table 2).
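To make the k-mer graph idea behind DBG assemblers concrete, here is a toy sketch that builds only the core graph structure: each read is decomposed into k-mers, and an edge links each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix. Real assemblers add error handling, graph simplification and scaffolding on top of this; the reads and k value below are illustrative.

```python
# Toy de Bruijn graph (DBG) construction from short reads.
from collections import defaultdict

def de_bruijn_graph(reads, k=4):
    """Build a (k-1)-mer graph: node -> list of successor nodes."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTAC", "CGTACG", "GTACGT"]
for node, successors in de_bruijn_graph(reads).items():
    print(node, "->", successors)
```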
Many factors need to be considered when choosing the most appropriate genome assembler, especially for large whole-genome
sequencing projects. Among these factors are the choice of algorithm, compatibility with the NGS platform, support for the
assembly of large genomes, and parallel-computing support for speeding up the assembly (Zerbino and Birney, 2008). Generally,
the choice of algorithm and software will directly determine the memory requirements and speed of assembly. In general, DBG
assemblers are faster but require larger amounts of memory than OLC assemblers.
Read Mapping
Whenever a reference genome is available, reads are mapped or aligned to the reference genome prior to subsequent analysis
steps, instead of following a de novo assembly strategy. The goal of mapping is to place the vast number of reads back at the
respective regions they most likely originated from. Mapping reads to the reference genome typically involves the alignment of
millions of short reads to the genome using fast algorithms. These algorithms are able to operate in parallel while taking into
account mutations such as polymorphisms, insertions and deletions in order to produce the alignment. In well-known aligners
such as BLAST, an individual query sequence is searched against a reference database using hash tables and seed-and-extend
approaches. With NGS data, similar methods are often adapted to scale to the alignment of millions of short query sequences
against a single large reference genome. Advances in mapping algorithms using various other techniques have improved alignment
speed while reducing memory and space requirements. Examples of well-known mapping software used for NGS data include
SOAP2 (Short Oligonucleotide Alignment Program), BWA (Burrows-Wheeler Aligner), NovoAlign and Bowtie 2.
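As an illustration of the hash-table, seed-and-extend strategy mentioned above (a deliberately simplified stand-in for what production aligners such as BWA or Bowtie 2 do with far more sophisticated index structures), the sketch below indexes reference k-mers, seeds with the first k-mer of a read, and verifies each candidate position by counting mismatches; the sequences and parameters are invented.

```python
# Simplified hash-table seed-and-extend read mapping.
from collections import defaultdict

def index_reference(reference, k=11):
    """Hash every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k=11, max_mismatches=2):
    """Seed with the first k-mer of the read, then count mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        candidate = reference[pos:pos + len(read)]
        if len(candidate) == len(read):
            mismatches = sum(a != b for a, b in zip(read, candidate))
            if mismatches <= max_mismatches:
                hits.append((pos, mismatches))
    return hits

reference = "TTACGGATACCGTTAGCAATCGGTACCATTGA"
index = index_reference(reference)
print(map_read("ACCGTTAGCAATCGGA", reference, index))  # [(8, 1)]
```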
The widely used format for storing read-to-genome mapping information is the SAM (Sequence Alignment/Map) format, or its
compressed binary form, BAM. While the BAM file is smaller and optimized for machine reading, the SAM file is human readable,
albeit slower for computer operations. There are 11 mandatory fields in the SAM format specification. Commonly, the SAMtools
software is used to read and manipulate both BAM and SAM formats.
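The 11 mandatory, tab-separated fields of a SAM alignment record can be pulled apart with a few lines of code; in the sketch below the field names follow the SAM specification, while the example record itself is fabricated.

```python
# The 11 mandatory, tab-separated fields of a SAM alignment record.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def parse_sam_record(line):
    """Split a SAM alignment line into its 11 mandatory fields."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(SAM_FIELDS, values[:11]))

# Fabricated example record for a properly paired 100 bp read
record = ("read_001\t99\tchr1\t10468\t60\t100M\t=\t10600\t232\t"
          + "A" * 100 + "\t" + "I" * 100)
print(parse_sam_record(record)["CIGAR"])  # 100M
```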
Typically, mapping of reads to the reference genome is followed by collection of mapping statistics. The summary statistic of main
interest is the percentage of aligned reads, or mapping rate. The mapping rate of reads to the reference genome is often only
60%–75%. Besides limitations due to the intrinsic properties of NGS data and of the technique used to generate them, the inability
to map reads to the reference genome can be ascribed to challenging regions of the genome, such as repeat-rich regions, that
aligners cannot place reads in unambiguously. Moreover, the short read lengths of most high-throughput NGS technologies limit
each alignment to a small region, restricting coverage to the more tractable areas of the genome. Limitations such as NGS
sequencing error, algorithmic robustness, mutational load and natural variation also contribute to low mapping rates.
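A mapping rate of the kind discussed above can be computed from a BAM file, for example with the pysam library (assumed here to be installed); the file name sample.bam is a placeholder, and secondary and supplementary alignments are skipped so that each read is counted once.

```python
# Minimal sketch: mapping rate from a BAM file using pysam.
import pysam

def mapping_rate(bam_path):
    """Fraction of primary reads that aligned to the reference."""
    total = mapped = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_secondary or read.is_supplementary:
                continue  # count each read only once
            total += 1
            if not read.is_unmapped:
                mapped += 1
    return mapped / total if total else 0.0

print(f"Mapping rate: {mapping_rate('sample.bam'):.1%}")  # placeholder file name
```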
The mapping file generated can be further inspected in depth, region by region, using visualization tools such as genome viewers,
which plot pileups (the stacked alignments of the reads). Visualization of mapped reads can be important, for instance, in
diagnosing read alignment problems in certain regions, detecting duplicates, and visualizing variation. Commonly used genome
browsers that can read SAM/BAM files include the Integrative Genomics Viewer (IGV) and Tablet, while web-based browsers that
achieve similar visualizations include JBrowse, NGB and the UCSC genome browser.
Comparison and Discovery
The tertiary process of analysing NGS data can be quite diverse, depending on the scenario and context of a study. Generally, the
reads representing the underlying annotations characterize the functional aspects of the study, and the corresponding statistics are
usually descriptive. In other cases, where a comparison is made against a reference or a control, rigorous statistical tests are
employed that take into account the read counts in the target regions of each treatment group. Applying statistical models to
identify bias, accounting for covariates and testing for significant differences are common steps in comparative analysis. The
outcome of such an analysis is often a collection of target annotations or genes. Such a gene list can subsequently be analysed for
enrichment of gene ontology (GO) terms to infer the collective molecular function, biological process and cellular
compartmentalization. With such a gene list, the affected pathways can also be mapped to gain a better biological understanding
of the process. In other derivative NGS techniques such as ChIP-seq or Hi-C, tertiary analysis may additionally involve deriving
profiles that occur commonly in the interactions being observed; thus, new motifs, structural interactions and regulatory signals
generated under a particular condition can be characterized. NGS techniques applied to a population can reveal meaningful
interactions of evolutionary forces, besides characterizing differences in genetic composition, either in terms of polymorphisms
when studying a single species or in terms of taxonomy when studying metagenomics.
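As an example of the kind of statistical test used for GO-term enrichment of such a gene list, the sketch below applies a hypergeometric over-representation test using scipy (assumed to be available); all of the counts in the example are made-up numbers.

```python
# Hypergeometric over-representation test for a GO term in a gene list.
from scipy.stats import hypergeom

def go_enrichment_pvalue(n_genes_total, n_genes_in_term,
                         n_genes_selected, n_selected_in_term):
    """P(observing at least this many term members in the gene list)."""
    return hypergeom.sf(n_selected_in_term - 1, n_genes_total,
                        n_genes_in_term, n_genes_selected)

# Example: 20,000 annotated genes, 300 of them in the GO term of interest,
# 500 genes in our differentially expressed list, 25 of which carry the term.
print(go_enrichment_pvalue(20_000, 300, 500, 25))
```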
Conclusion
Approaches to NGS data analysis are diverse and dependent on the technology and methods being employed. Nevertheless,
among the multiple stages involved in NGS data analysis, the steps of primary and secondary analysis are generally a prerequisite
in all NGS projects. Primary and secondary analyses must therefore be performed carefully in order to prevent errors being carried
over into tertiary analysis. In tertiary analysis, insights can be generated by including annotation, network, and interaction
information from external databases to expand on the gene lists and profiles found in earlier steps. Because these steps are
required so frequently, independent bioinformatics labs create analysis pipelines for their in-house routine analyses. However, as
NGS technology grows and improves, the methods for analysis may require further evaluation and integration with the latest
technologies.
References
Brenner, S., Johnson, M., Bridgham, J., et al., 2000. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 18,
630–634. Available at: https://doi.org/10.1038/76469.
Dodt, M., Roehr, J.T., Ahmed, R., Dieterich, C., 2012. FLEXBAR – Flexible barcode and adapter processing for next-generation sequencing platforms. Biology 1, 895–905.
Available at: https://doi.org/10.3390/biology1030895.
Henson, J., Tischler, G., Ning, Z., 2012. Next-generation sequencing and large genome assemblies. Pharmacogenomics 13, 901–915. Available at: https://doi.org/10.2217/
pgs.12.72.
He, Y., Zhang, Z., Peng, X., Wu, F., Wang, J., 2013. De novo assembly methods for next generation sequencing data. Tsinghua Sci. Technol. 18, 500–514. Available at:
https://doi.org/10.1109/TST.2013.6616523.
Ilie, L., Molnar, M., 2013. RACER: Rapid and accurate correction of errors in reads. Bioinformatics 29, 2490–2493. Available at: https://doi.org/10.1093/bioinformatics/btt407.
Knief, C., 2014. Analysis of plant microbe interactions in the era of next generation sequencing technologies. Front. Plant Sci. 5, 216. Available at: https://doi.org/10.3389/
fpls.2014.00216.
Liu, Y., Schröder, J., Schmidt, B., 2013. Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 29. Available at: https://doi.org/10.1093/
bioinformatics/bts690.
Luo, C., Tsementzi, D., Kyrpides, N., Read, T., Konstantinidis, K.T., 2012. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial
community DNA sample. PLOS ONE 7, e30087. Available at: https://doi.org/10.1371/journal.pone.0030087.
Margulies, E.H., Maduro, V.V.B., Thomas, P.J., et al., 2005. Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme
genomes. Proc. Natl. Acad. Sci. USA 102, 3354–3359. Available at: https://doi.org/10.1073/pnas.0408539102.
Merriman, B., Rothberg, J.M., Ion Torrent R&D Team, 2012. Progress in ion torrent semiconductor chip based sequencing. Electrophoresis 33, 3397–3417. Available at:
https://doi.org/10.1002/elps.201200424.
Miller, J.R., Koren, S., Sutton, G., 2010. Assembly algorithms for next-generation sequencing data. Genomics 95, 315–327. Available at: https://doi.org/10.1016/j.
ygeno.2010.03.001.
Minoche, A.E., Dohm, J.C., Himmelbauer, H., 2011. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems.
Genome Biol. 12, R112. Available at: https://doi.org/10.1186/gb-2011-12-11-r112.
Nakamura, K., Oshima, T., Morimoto, T., et al., 2011. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 39, e90. Available at: https://doi.org/10.1093/
nar/gkr344.
Nguyen, P., Ma, J., Pei, D., et al., 2011. Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire. BMC Genom. 12, 106. Available
at: https://doi.org/10.1186/1471-2164-12-106.
Ronaghi, M., Uhlén, M., Nyrén, P., 1998. A sequencing method based on real-time pyrophosphate. Science 281, 363–365. Available at: https://doi.org/10.1126/
science.281.5375.363.
Salmela, L., 2010. Correction of sequencing errors in a mixed set of reads. Bioinformatics 26, 1284–1290.
Salmela, L., Schröder, J., 2011. Correcting errors in short reads by multiple alignments. Bioinformatics 27, 1455–1461.
Shendure, J., Ji, H., 2008. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145. Available at: https://doi.org/10.1038/nbt1486.
Skums, P., Dimitrova, Z., Campo, D.S., et al., 2012. Efficient error correction for next-generation sequencing of viral amplicons. BMC Bioinformatics 13, S6.
Tarabeux, J., Zeitouni, B., Moncoutier, V., et al., 2014. Streamlined ion torrent PGM-based diagnostics: BRCA1 and BRCA2 genes as a model. Eur. J. Hum. Genet. 22,
535–541. Available at: https://doi.org/10.1038/ejhg.2013.181.
Voelkerding, K.V., Dames, S.A., Durtschi, J.D., 2009. Next-generation sequencing: From basic research to diagnostics. Clin. Chem. 55, 641–658. Available at: https://doi.org/
10.1373/clinchem.2008.112789.
Yang, X., Dorman, K.S., Aluru, S., 2010. Reptile: representative tiling for short read error correction. Bioinformatics 26, 2526–2533.
Zerbino, D.R., Birney, E., 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829. Available at: https://doi.org/10.1101/
gr.074492.107.