NGS Data Analysis

Uploaded by

lucylit0666

Next-generation sequencing (NGS) generates massive amounts of raw data,
which must be analyzed systematically to ensure accuracy and reliability.
The initial steps are handling FASTQ files, performing a quality check,
and pre-processing the data for downstream analysis.

1. FASTQ Files

What are FASTQ Files?

• FASTQ is a standard file format for storing raw sequence data
  generated from NGS platforms (e.g., Illumina, Oxford Nanopore).
• It combines both nucleotide sequence data and quality scores in a
  single file.

Structure of a FASTQ File:

Each sequence entry in a FASTQ file consists of 4 lines:

1. Sequence Identifier: Starts with @ followed by a unique sequence
   identifier.
2. Sequence: The actual nucleotide sequence (A, T, G, C, N).
3. Plus (+) Line: A + symbol, optionally followed by the same
   sequence identifier.
4. Quality Scores: ASCII-encoded quality scores, one character per
   base, so this line has the same length as the sequence line.

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGT
+
!''*((((***+))%%%++)(%%%%).1***-+*
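The four-line structure can be read with a few lines of code. The following is a minimal sketch, not a replacement for an established parser: it assumes exactly four lines per record (no wrapped sequences) and the Phred+33 ASCII offset used by current Illumina output. The names `FastqRecord`, `read_fastq`, and `phred_scores` are hypothetical, and the record is a shortened version of the example above.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # header text without the leading '@'
    sequence: str    # nucleotide string
    quality: str     # ASCII-encoded quality string, same length as sequence


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


# Shortened illustrative record: 9 bases and 9 quality characters.
example = "@SEQ_ID\nGATTTGGGG\n+\n!''*((((*\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0].identifier, phred_scores(records[0].quality))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the leading `!` (Q0) marks a base call with no confidence.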

Key Tools for Handling FASTQ Files:

• FastQC: Quality control checks.
• seqtk: Lightweight toolkit for FASTQ file manipulation.
• fastp: FASTQ pre-processing tool.

2. Quality Check (QC)

Why is Quality Check Important?

• Verifies that the raw sequencing data are accurate enough for
  analysis.
• Identifies poor-quality reads, adapter contamination, and other
  sequencing artifacts.
• Prevents downstream errors in alignment, variant calling, or
  assembly.

Key Metrics in Quality Control:

1. Per-base Sequence Quality: Quality scores at each nucleotide
   position across all reads.
2. Per-sequence Quality Scores: Distribution of the mean quality
   score per read.
3. Adapter Content: Detects adapter sequences that may still be
   present in reads.
4. GC Content: Compares the GC distribution of the reads with the
   distribution expected for the organism; deviations can indicate
   contamination or library bias.
5. Read Length Distribution: Checks that read lengths are consistent
   across samples.
6. Duplicated Reads: Measures sequence duplication, which can
   indicate PCR duplicates.
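Several of these metrics reduce to simple counts over the reads. The sketch below is a simplified illustration and does not reproduce how FastQC computes its modules; it assumes Phred+33 encoding and reads held in memory as hypothetical `(sequence, quality)` tuples.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Read = Tuple[str, str]  # (sequence, quality string); hypothetical in-memory form


def per_base_mean_quality(reads: Sequence[Read], offset: int = 33) -> List[float]:
    """Mean Phred score at each position, over the reads that cover it."""
    totals: List[int] = []
    counts: List[int] = []
    for _, quality in reads:
        for i, ch in enumerate(quality):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C among the called (non-N) bases."""
    called = [b for b in sequence.upper() if b in "ACGT"]
    return sum(b in "GC" for b in called) / len(called) if called else 0.0


def length_distribution(reads: Sequence[Read]) -> Dict[int, int]:
    """Number of reads observed at each read length."""
    return dict(Counter(len(seq) for seq, _ in reads))


def duplicate_fraction(reads: Sequence[Read]) -> float:
    """Fraction of reads whose exact sequence duplicates an earlier read."""
    if not reads:
        return 0.0
    return 1 - len({seq for seq, _ in reads}) / len(reads)


reads = [("ACGT", "IIII"), ("ACGT", "II5!"), ("GGCCN", "55555")]
print(per_base_mean_quality(reads))
print([gc_fraction(seq) for seq, _ in reads])
print(length_distribution(reads), duplicate_fraction(reads))
```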

Quality Control Tools:

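• FastQC: Comprehensive quality assessment.
• MultiQC: Aggregates FastQC reports (and reports from other tools)
  across samples into a single report.
• Trim Galore!: A wrapper around Cutadapt and FastQC that combines
  adapter and quality trimming with QC.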

Example Output from FastQC:

FastQC flags each analysis module with one of three statuses:

• Green: Pass (good quality).
• Orange: Warning.
• Red: Fail (poor quality; requires intervention).
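The three statuses can also be read programmatically. The sketch below assumes the `summary.txt` file that FastQC writes alongside its report, with three tab-separated columns (status, module name, file name) and statuses spelled PASS, WARN, and FAIL; the sample text and the function names are made up for illustration.

```python
from typing import Dict, List


def parse_fastqc_summary(text: str) -> Dict[str, str]:
    """Map each module name to its status, given summary.txt contents."""
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: status <TAB> module name <TAB> file name
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses


def modules_needing_attention(statuses: Dict[str, str]) -> List[str]:
    """Modules flagged orange (WARN) or red (FAIL), in file order."""
    return [m for m, s in statuses.items() if s in ("WARN", "FAIL")]


# Made-up summary for a hypothetical sample.fastq.
summary = (
    "PASS\tBasic Statistics\tsample.fastq\n"
    "WARN\tPer base sequence quality\tsample.fastq\n"
    "FAIL\tAdapter Content\tsample.fastq\n"
)
print(modules_needing_attention(parse_fastqc_summary(summary)))
```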

3. Pre-processing

What is Pre-processing?

Pre-processing involves cleaning and preparing raw sequencing data for
downstream analysis. It includes:

1. Adapter Trimming
2. Quality Filtering
3. Read Trimming and Cropping
4. Removal of Contaminants
5. De-duplication

Key Steps in Pre-processing:

1. Adapter Trimming:

• Adapters are short sequences added during library preparation.
• Residual adapter sequences can interfere with alignment and
  analysis.
• Tools:
  o Cutadapt
  o Trimmomatic
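The idea behind 3' adapter trimming can be sketched with exact string matching. This is a deliberately simplified illustration: Cutadapt and Trimmomatic tolerate mismatches and use their own matching strategies, which are not reproduced here. The function name `trim_adapter`, the `min_overlap` parameter, and the adapter string (a commonly cited Illumina adapter prefix) are illustrative assumptions.

```python
from typing import Tuple


def trim_adapter(sequence: str, quality: str, adapter: str,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Remove a 3' adapter by exact match, including a partial adapter at the read end."""
    # Case 1: the complete adapter occurs somewhere in the read.
    pos = sequence.find(adapter)
    if pos == -1:
        # Case 2: the read ends with a prefix of the adapter (longest first).
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                pos = len(sequence) - overlap
                break
    if pos == -1:
        return sequence, quality  # no adapter found
    # Cut sequence and quality at the same position so they stay aligned.
    return sequence[:pos], quality[:pos]


ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter sequence
read = "ACGTACGT" + ADAPTER + "TTT"
print(trim_adapter(read, "I" * len(read), ADAPTER))
```

Everything from the start of the adapter onward is discarded, because bases after the adapter do not come from the sequenced fragment.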

2. Quality Filtering:

• Removes reads with poor-quality scores.
• Filters based on:
  o Minimum Phred Score (e.g., Q30)
  o Minimum read length (e.g., >50 bp)
• Tools:
  o fastp
  o PRINSEQ
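A read-level filter on these two criteria might look like the sketch below. It assumes Phred+33 encoding and uses the mean score of the read as the quality criterion; real tools apply their own criteria, and the thresholds here simply mirror the examples in the list (read length is compared with "at least", which is an assumption).

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (0.0 for an empty read)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(sequence: str, quality: str,
                  min_mean_q: float = 30.0, min_length: int = 50) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    return len(sequence) >= min_length and mean_phred(quality) >= min_mean_q


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(passes_filter("A" * 60, "I" * 60))  # long, high quality
print(passes_filter("A" * 60, "5" * 60))  # long, low quality
print(passes_filter("A" * 40, "I" * 40))  # too short
```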

3. Read Trimming and Cropping:

• Trims poor-quality bases from the ends of reads.
• Crops reads to a specific length if required.
• Tools:
  o Sickle
  o Trim Galore!
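Both operations amount to slicing the sequence and quality strings together. The sketch below removes bases below a threshold one at a time from each end, which is a simpler rule than the windowed approaches that dedicated trimmers use; Phred+33 encoding, the Q20 threshold, and the function names are assumptions.

```python
from typing import Tuple


def trim_low_quality_ends(sequence: str, quality: str,
                          min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Strip bases with Phred score below min_q from both ends of a read."""
    start, end = 0, len(sequence)
    while start < end and ord(quality[start]) - offset < min_q:
        start += 1  # drop low-quality bases at the 5' end
    while end > start and ord(quality[end - 1]) - offset < min_q:
        end -= 1    # drop low-quality bases at the 3' end
    return sequence[start:end], quality[start:end]


def crop(sequence: str, quality: str, max_length: int) -> Tuple[str, str]:
    """Keep at most max_length bases from the start of the read."""
    return sequence[:max_length], quality[:max_length]


# '!' is Q0, '5' is Q20, 'I' is Q40 under Phred+33.
print(trim_low_quality_ends("ACGTACGT", "!!IIII5!"))
print(crop("ACGTACGT", "IIIIIIII", 5))
```

A read can shrink to nothing when every base is below the threshold, so trimming is normally followed by a minimum-length filter.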

4. Removal of Contaminants:

• Identifies and removes reads originating from non-target sources
  (e.g., host genomes, bacterial contamination).
• Tools:
  o Bowtie2
  o Kraken2
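As a toy illustration of contaminant screening, a read can be flagged when most of its k-mers also occur in a contaminant reference. This is not how either tool is implemented (Bowtie2 aligns reads to a reference, and Kraken2 classifies reads with its own k-mer database); the sketch ignores reverse complements and sequencing errors, and the reference sequence, `k`, and the 0.5 threshold are arbitrary assumptions.

```python
from typing import Set


def kmers(sequence: str, k: int) -> Set[str]:
    """All distinct substrings of length k (empty if the sequence is shorter than k)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def contaminant_fraction(read: str, reference_kmers: Set[str], k: int) -> float:
    """Fraction of the read's distinct k-mers that occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


def is_contaminant(read: str, reference_kmers: Set[str], k: int,
                   min_fraction: float = 0.5) -> bool:
    """Flag a read when at least min_fraction of its k-mers match the reference."""
    return contaminant_fraction(read, reference_kmers, k) >= min_fraction


K = 4  # tiny k for the toy example; real k-mer indexes use much longer k-mers
reference = kmers("ACGTACGTTAGC", K)  # made-up contaminant sequence
for read in ("ACGTACGT", "GGGGGGGG"):
    print(read, is_contaminant(read, reference, K))
```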

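5. De-duplication:

• PCR duplicates arise from library amplification and are usually
  removed or marked to prevent bias.
• The tools below work on aligned reads (BAM files), so this step is
  typically run after alignment.
• Tools:
  o Picard (MarkDuplicates)
  o Samtools rmdup (obsolete; replaced by samtools markdup)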
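Picard and Samtools identify duplicates from the mapping positions of aligned reads. For unaligned FASTQ data, a much cruder substitute is to drop reads whose sequence is identical to one already seen. The sketch below shows that sequence-identity approach only; the tuple layout and function name are hypothetical, and it keeps the first copy of each sequence regardless of quality.

```python
from typing import Iterable, Iterator, Set, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality); hypothetical layout


def deduplicate(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield each read whose exact sequence has not been seen before."""
    seen: Set[str] = set()
    for read in reads:
        sequence = read[1]
        if sequence in seen:
            continue  # exact duplicate of an earlier read
        seen.add(sequence)
        yield read


reads = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACGT", "5555"),  # same sequence as r1
    ("r3", "TTTT", "IIII"),
]
print([identifier for identifier, _, _ in deduplicate(reads)])
```

Exact matching misses duplicates that differ by a single sequencing error and can discard genuinely independent reads of the same fragment, which is one reason position-based tools are preferred once reads are aligned.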

4. Workflow Summary:

Step                     Purpose                     Tools
1. Quality Check (QC)    Assess raw data quality     FastQC, MultiQC
2. Adapter Trimming      Remove adapter sequences    Cutadapt, Trimmomatic
3. Quality Filtering     Remove low-quality reads    fastp, PRINSEQ
4. Read Trimming         Remove low-quality bases    Sickle, Trim Galore!
5. Contaminant Removal   Filter unwanted reads       Bowtie2, Kraken2
6. De-duplication        Remove PCR duplicates       Picard, Samtools

Final Output After Pre-processing:

• Cleaned FASTQ Files: High-quality reads, free from adapters and
  contaminants.
• Quality Metrics Report: Documents whether the data meet the
  requirements of the downstream analysis.

Key Takeaways:

1. FASTQ Files: Store raw sequencing reads and quality scores.
2. Quality Check: Detects sequencing errors and biases using tools like
   FastQC.
3. Pre-processing: Improves data quality by trimming adapters,
   filtering low-quality reads, and removing contaminants.
4. Tools: Essential tools include FastQC, Cutadapt, Trimmomatic,
   Bowtie2, and Picard.
5. Next Steps After Pre-processing: Alignment, variant calling,
   transcriptome assembly, or metagenomic analysis.
