Data Retrieval

There are three main data retrieval systems for molecular biology databases: Sequence Retrieval System (SRS), Entrez, and DBGET. SRS, developed at EBI, provides access to over 80 biological databases; Entrez integrates databases from NCBI; and DBGET is part of the Japanese GenomeNet service. These systems allow text searches across multiple databases and provide links to relevant information matching the search criteria. There are also data-mining tools that retrieve data from genomic databases and visualization tools for proteomic databases, including tools for homology, protein function, sequence analysis, and structural analysis.

Uploaded by

Ayesha Khan

Data retrieval means obtaining data from a database management system, such as an object-oriented database management system (ODBMS).

The retrieved data may be stored in a file, printed, or viewed on the screen. A query language,
such as Structured Query Language (SQL), is used to write the queries.
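As a small illustration of preparing and running a query, the sketch below uses Python's built-in sqlite3 module as a stand-in DBMS. The table name, its columns and the rows are hypothetical examples, not data from any real biological database.

```python
import sqlite3

# A tiny in-memory database; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (accession TEXT, organism TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [("SEQ001", "Homo sapiens", 1200),
     ("SEQ002", "Mus musculus", 950),
     ("SEQ003", "Homo sapiens", 480)],
)

# The SQL query states *what* to retrieve; the DBMS decides how to fetch it.
# The '?' placeholder passes the value safely instead of pasting it into the string.
rows = conn.execute(
    "SELECT accession, length FROM sequences WHERE organism = ? ORDER BY length DESC",
    ("Homo sapiens",),
).fetchall()

# The retrieved rows can then be printed, viewed or written to a file.
for accession, length in rows:
    print(accession, length)
```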

Database searching and retrieval tools:

Because the amount of biologically relevant data is increasing so rapidly, knowing how to
access and search this information is essential.

There are three data retrieval systems of relevance to molecular biologists:

1. Sequence Retrieval System (SRS)
2. Entrez
3. DBGET

These systems allow text searching of multiple molecular biology databases and provide links to
relevant information for entries that match the search criteria. The three systems differ in the
databases they search and the links they have to other information.

Sequence Retrieval System (SRS) :

 SRS, developed at the European Bioinformatics Institute (EBI) at Hinxton, UK, is a homogeneous
interface to over 80 biological databases.
 It includes databases of sequences, metabolic pathways, transcription factors, application
results (such as BLAST, SSEARCH and FASTA), protein 3-D structures, genomes, mappings,
mutations and locus-specific mutations.
 The web page listing all the databases links to a description page for each database,
including the date on which it was last updated. Before entering a query, select one or more of
the databases to search.
 After getting results, choose an analysis program (such as CLUSTALW for multiple alignment or
PHYLIP for phylogenetic analysis), enter its parameters, and run it.
 SRS is highly recommended for use.

Entrez:

 Entrez is a molecular biology database and retrieval system.

 It was developed by the National Center for Biotechnology Information (NCBI).
 It is an entry point for exploring distinct but integrated databases. Of the three text-based
database systems, Entrez is the easiest to use, but it also offers more limited information to
search.
 Entrez is both an indexing and a retrieval system, holding data from various sources for
biomedical research.
 Entrez includes nucleotide sequences from GenBank and PDB, protein sequences from
SWISS-PROT, translated GenBank, PIR, PRF and PDB, and associated abstracts and
citations from PubMed.
 The Entrez system can provide views of gene and protein sequences and chromosome maps.
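Entrez can also be searched programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL (the `db`, `term`, `retmax` and `retmode` parameters belong to the E-utilities interface); the query term is a made-up illustration, and actually fetching the URL needs network access and should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint: returns Entrez UIDs matching a query.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking for at most `retmax` UIDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"  # spaces -> '+', brackets percent-encoded

# Example query using Entrez field tags (illustrative only).
url = esearch_url("protein", "insulin[Protein Name] AND human[Organism]", retmax=5)
print(url)
```

The returned UIDs could then be passed to other E-utilities (for example EFetch) to retrieve the records themselves.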

DBGET:

 The integrated database retrieval system DBGET/LinkDB is the backbone of the Japanese
GenomeNet service.
 DBGET is used to search and extract entries from a wide range of molecular biology
databases, while LinkDB is used to search and compute links between entries in different
databases.

 The WWW version of DBGET/LinkDB at GenomeNet is integrated with other search tools,
such as BLAST, FASTA and MOTIF, and with local helper applications, such as RasMol.
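DBGET refers to each entry by a "database:entry" identifier. The sketch below splits such identifiers and builds a web link to the entry; the function names are hypothetical, and the `www_bget` URL pattern is an assumption based on the GenomeNet site that may change over time.

```python
from urllib.parse import quote

# Assumed GenomeNet DBGET web entry point (check genome.jp for the current form).
DBGET_BASE = "https://www.genome.jp/dbget-bin/www_bget?"

def parse_dbget_id(identifier: str) -> tuple[str, str]:
    """Split a 'database:entry' identifier into (database, entry)."""
    db, sep, entry = identifier.partition(":")
    if not sep or not db or not entry:
        raise ValueError(f"expected 'database:entry', got {identifier!r}")
    return db, entry

def dbget_url(identifier: str) -> str:
    """Hypothetical helper: link to the DBGET page of one entry."""
    db, entry = parse_dbget_id(identifier)
    return DBGET_BASE + quote(f"{db}:{entry}", safe=":")

print(dbget_url("hsa:7157"))  # a KEGG human gene entry
```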

Data retrieval tools:

There are data-mining programs that retrieve data from genomic sequence databases, and also
visualization tools to analyze and retrieve information from proteomic databases. These include:

 Homology and similarity tools,
 Protein functional analysis tools,
 Structural analysis tools,
 Sequence analysis tools.
Homology and Similarity Tools:
Homologous sequences are sequences that are related by divergence from a common ancestor.
Thus, the degree of similarity between two sequences can be measured, whereas homology is
simply either true or false. This set of tools can be used to identify similarities between
novel query sequences of unknown structure and function and database sequences whose
structure and function have been elucidated.
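Because similarity is a measurable quantity, tools usually report it as a score; one common measure is percent identity over an alignment. This minimal sketch assumes the two sequences are already aligned (gaps written as "-") and excludes gap columns from the denominator, which is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # 7 of 8 ungapped columns match
```

A high percent identity is evidence for homology, but the identity value itself is a degree, while the homology conclusion drawn from it is yes or no.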
Protein Function Analysis tools:
This group of programs allows you to compare your protein sequence with the secondary (or
derived) protein databases that contain information on motifs, signatures and protein domains.
Highly significant hits against these pattern databases allow you to infer the approximate
biochemical function of your query protein.
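Secondary databases such as PROSITE express many signatures as patterns: elements separated by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the N- or C-terminus. The minimal sketch below translates such a pattern into a regular expression and scans a sequence. It handles only these common elements; the example pattern is modelled on the C2H2 zinc-finger signature and the test sequence is made up.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Supports x, [..], {..}, repeats (n) / (n,m) and the < and > anchors;
    rarer forms (such as '>' inside brackets) are not handled.
    """
    body = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if body.startswith("<"):          # pattern anchored at the N-terminus
        prefix, body = "^", body[1:]
    if body.endswith(">"):            # pattern anchored at the C-terminus
        suffix, body = "$", body[:-1]
    parts = []
    for element in body.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                                   # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"               # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix

regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
hit = re.search(regex, "MKCAACAAALAAAAAAAAHAAAHGG")  # made-up sequence
print(regex, hit.span() if hit else None)
```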
Structural Analysis tools:
This set of tools allows you to compare structures with databases of known structures. The
function of a protein is more directly a consequence of its structure than of its sequence, and
structural homologs tend to share functions. Determining a protein's 2-D/3-D structure
is therefore crucial to the study of its function.
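A common number reported when two structures are compared is the root-mean-square deviation (RMSD) between corresponding atoms. This minimal sketch assumes the two coordinate lists are already paired atom-by-atom and optimally superimposed; the superposition step itself (for example the Kabsch algorithm) and the scoring schemes of dedicated tools are not shown.

```python
import math

Coord = tuple[float, float, float]

def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD between two equally long lists of already-superimposed coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between corresponding atoms.
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(total / len(a))

print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # every atom shifted by 1 unit
```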
Sequence Analysis tools:
This set of tools allows you to carry out further, more detailed analysis of your query sequence,
including evolutionary analysis and identification of mutations, hydropathy regions, CpG islands
and compositional biases. These and other biological properties are all clues
that aid the search for the specific function of your sequence.
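As one example of such an analysis, CpG islands are often screened with the observed/expected CpG ratio. The sketch below applies the commonly cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6) to a single window; scanning a whole genome with sliding windows and merging hits is left out, and the function names are illustrative.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio = (CG dinucleotides * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)

def looks_like_cpg_island(window: str) -> bool:
    """Apply the classic thresholds to one window of sequence."""
    window = window.upper()
    if len(window) < 200:
        return False
    gc = (window.count("G") + window.count("C")) / len(window)
    return gc > 0.5 and cpg_obs_exp(window) > 0.6

print(cpg_obs_exp("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```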

Some examples of Bioinformatics Tools:

BLAST:
BLAST (Basic Local Alignment Search Tool) belongs to the category of homology and
similarity tools.

It is a set of search programs, available through the NCBI web site and as stand-alone programs
for several operating systems, used to perform fast similarity searches whether the query is a
protein or a DNA sequence. Nucleotide sequences can be compared against a nucleotide database,
and a protein database can be searched for matches to a query protein sequence. NCBI has also
introduced a queuing system for BLAST (QBLAST) that allows users to retrieve results at their
convenience and to format them multiple times with different formatting options.
Depending on the type of sequences to compare, there are different programs:

 blastp compares an amino acid query sequence against a protein sequence database

 blastn compares a nucleotide query sequence against a nucleotide sequence database

 blastx compares a nucleotide query sequence translated in all reading frames against a
protein sequence database
 tblastn compares a protein query sequence against a nucleotide sequence database
dynamically translated in all reading frames
 tblastx compares the six-frame translations of a nucleotide query sequence against the
six-frame translations of a nucleotide sequence database.
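blastx, tblastn and tblastx all rely on translating nucleotide sequences in all six reading frames (three on each strand). A minimal sketch of that translation step, assuming the standard genetic code (NCBI table 1); it is not BLAST's own code, and BLAST itself also supports alternative genetic codes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> dict[str, str]:
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames

print(six_frames("ATGGCC"))
```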

FASTA:
FASTA is an alignment program for protein (and nucleotide) sequences created by Pearson and Lipman in 1988.
The program is one of the many heuristic algorithms proposed to speed up sequence comparison.
The basic idea is to add a fast prescreen step to locate the highly matching segments between two
sequences, and then extend these matching segments to local alignments using more rigorous
algorithms such as Smith-Waterman.
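The prescreen idea can be sketched as follows: index the short words (k-tuples, "ktup") of one sequence, look up every word of the other, and count hits on each diagonal (query position minus subject position); diagonals with many hits mark the highly matching segments. This is a simplified illustration with a hypothetical function name; FASTA itself goes on to rescore, join regions and run a rigorous alignment.

```python
from collections import Counter, defaultdict

def best_diagonal(query: str, subject: str, ktup: int = 2) -> tuple[int, int]:
    """Return (diagonal, hits) for the diagonal sharing the most k-tuples.

    The diagonal of a hit is i - j, where i and j are the start positions
    of the shared k-tuple in the query and the subject.
    """
    index = defaultdict(list)                 # k-tuple -> positions in subject
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)
    hits = Counter()
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], []):
            hits[i - j] += 1
    if not hits:
        return (0, 0)
    return max(hits.items(), key=lambda item: item[1])

print(best_diagonal("GATTACA", "TTGATTACA"))
```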
EMBOSS:
EMBOSS (European Molecular Biology Open Software Suite) is a sequence-analysis software package. It
can work with data in a range of formats and can also retrieve sequence data transparently from the
Web. Extensive libraries are provided with the package, allowing other scientists to release
their own software as open source. It provides a set of sequence-analysis programs and supports
all UNIX platforms.
ClustalW:
It is a fully automated multiple sequence alignment tool for DNA and protein sequences. It returns the
best match over the total length of the input sequences (a global alignment), whether they are protein or nucleic acid.
Bioinformatics tools for analysis of DNA:
RasMol:
It is a powerful research tool for displaying the structure of DNA, proteins and smaller molecules.
Protein Explorer, a derivative of RasMol, is an easier-to-use program.

WebAct- This is the web version of ACT (Artemis Comparison Tool) a DNA sequence
comparison viewer based on Artemis. (http://www.webact.org).

• BASys- BASys (Bacterial Annotation System) is a comprehensive tool that supports
automated, in-depth annotation of bacterial genomes. (http://basys.ca/basys/cgi/submit.pl).

Electronic PCR:

Identifies sequence tagged sites (STSs) within DNA sequences.
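Electronic PCR essentially looks for the two primers of an STS in the right order, orientation and spacing. A minimal sketch, assuming exact primer matches on the given strand only; `find_sts` and its parameters are hypothetical names, and real e-PCR implementations can also tolerate mismatches and search both strands.

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]

def find_all(text: str, word: str):
    """Yield every start position of word in text (overlaps allowed)."""
    pos = text.find(word)
    while pos != -1:
        yield pos
        pos = text.find(word, pos + 1)

def find_sts(genome: str, fwd: str, rev: str,
             min_len: int, max_len: int) -> list[tuple[int, int]]:
    """Return (start, end) of predicted PCR products whose length is in [min_len, max_len].

    The reverse primer is written 5'->3' on the opposite strand, so its
    reverse complement is what appears on the given strand.
    """
    target = revcomp(rev)
    products = []
    for start in find_all(genome, fwd):
        for t in find_all(genome, target):
            end = t + len(target)
            if t >= start + len(fwd) and min_len <= end - start <= max_len:
                products.append((start, end))
    return products

genome = "CCACGTACTTTTTTTTGGGAAACC"   # made-up sequence
print(find_sts(genome, "ACGTAC", "TTTCCC", 15, 30))
```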

Open Reading Frame Finder (ORF Finder):

Suggests potential open reading frames in a DNA sequence.
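The core of an ORF search can be sketched as scanning each reading frame for an ATG start codon followed, in frame, by a stop codon. This minimal version assumes the standard start and stop codons and scans only the given strand; NCBI's ORF Finder additionally scans the reverse strand and offers alternative genetic codes and start codons.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs in the three forward frames.

    start is 0-based, end is exclusive and includes the stop codon;
    the codon count used for min_codons also includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                              # first ATG opens an ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                           # look for the next ATG
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```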

Splign:

Computes alignments of cDNAs to genomic nucleotide sequences.

OSIRIS:

Facilitates the assessment of multiplex short tandem repeat (STR) DNA profiles based on
laboratory-specific protocols.

LALIGN- It finds multiple matching subsegments in two sequences and reports the percent
identity for each subsegment. (http://www.lalign.org).

• GraphAlin- It presents the percent identity between two protein, RNA or DNA sequences in
graphical and numerical form. (http://www.graphalin.org).
• GeneOrder- It is an ideal tool for the alignment of small GenBank genome sequences (up to
0.25 Mb). A newer version, GeneOrder 3.0, is available. (http://www.genesorder.org).

• CoreGenes- It is designed to analyze two to five genomes simultaneously and generates a
table of related genes, i.e. orthologs and putative orthologs. It has a limit of 0.35 Mb.
(http://www.coregenes.org).

Phenotype-Genotype Integrator (PheGenI):

Finds human phenotype/genotype relationships with queries by phenotype,
chromosome location, gene and SNP identifiers.

BLAST RefseqGene:

Finds regions of local similarity between query sequences and genomic sequences in the
RefSeqGene/LRG set.

VecScreen:

Identifies segments of a nucleotide sequence that may be of vector origin.

Clustal Omega (EBI):

Multiple sequence alignment programs for DNA or proteins.

Clustal W- PBIL:

Multiple sequence alignment programs for DNA.

GENIO/Logo:

Graphic representation of an amino acid or DNA/RNA multiple sequence alignment.
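In a sequence logo, the height of each column reflects its information content, R = log2(N) - H, where N is the alphabet size (4 for DNA/RNA, 20 for proteins) and H is the Shannon entropy of the residue frequencies in that column; each letter's height is its frequency times R. A minimal sketch with illustrative function names; it ignores gaps and omits the small-sample correction that logo programs usually apply, and it is not GENIO/Logo's code.

```python
import math
from collections import Counter

def column_information(column: str, alphabet_size: int = 4) -> float:
    """Information content in bits of one alignment column (gaps ignored)."""
    counts = Counter(c for c in column.upper() if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

def letter_heights(column: str, alphabet_size: int = 4) -> dict[str, float]:
    """Height of each letter in the logo column: frequency * information content."""
    counts = Counter(c for c in column.upper() if c != "-")
    total = sum(counts.values())
    info = column_information(column, alphabet_size)
    return {letter: n / total * info for letter, n in counts.items()}

# Columns of a made-up DNA alignment: fully conserved, half/half, fully variable.
for col in ("AAAA", "AACC", "ACGT"):
    print(col, column_information(col), letter_heights(col))
```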

Bioinformatics tools for analysis of protein:

A. Protein structure databases

Protein Data Bank (PDB) :

PDB is a very large, universal repository for the processing and distribution of 3-dimensional structure
data of macromolecules. The information in PDB is derived from a variety of methods and experiments, such as
NMR, X-ray crystallography, cryo-electron microscopy and theoretical modeling. For users, the database
provides access to structural data, methods for visualizing structures, and downloads of structural
information.[7]

NCBI Structure Database (MMDB): It includes a database of experimentally determined 3-D structures of
biomolecules, most of them derived from X-ray crystallography and NMR spectroscopy. The database provides
biologists with broad information on the biological functions of proteins, on the mechanisms related to
those functions, and on the relationships between biomolecules and their evolutionary history. Additionally,
it supports comparative analysis of the 3-D structures of proteins. The NCBI structure database is also
called MMDB (Molecular Modeling Database); it includes 3-D structures of macromolecules and visualization
tools for comparative analysis of proteins.[8]

Database and tools for protein structure visualization:

Cn3D ("see in 3-D") is a viewer of structures and structure-based sequence alignments for the MMDB
database. It facilitates viewing of 3-D structures and of sequence-structure or structure-structure
alignments. It serves as a helper application for the web browser: files can be downloaded to the PC and
the application launched.
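The structure files that PDB distributes in its classic format, and that viewers such as Cn3D and RasMol read, are fixed-width text records. A minimal sketch that extracts C-alpha atoms from ATOM records using the format's column positions; `parse_ca_atoms` is an illustrative name, and HETATM records, alternate locations and the newer PDBx/mmCIF format are not handled.

```python
def parse_ca_atoms(pdb_text: str) -> list[tuple[str, int, str, float, float, float]]:
    """Extract (chain, residue number, residue name, x, y, z) for C-alpha atoms."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":         # atom name, columns 13-16
            continue
        atoms.append((
            line[21],                           # chain identifier, column 22
            int(line[22:26]),                   # residue sequence number, columns 23-26
            line[17:20].strip(),                # residue name, columns 18-20
            float(line[30:38]),                 # x, columns 31-38
            float(line[38:46]),                 # y, columns 39-46
            float(line[46:54]),                 # z, columns 47-54
        ))
    return atoms
```

For example, `parse_ca_atoms(open("structure.pdb").read())` (a hypothetical file name) would return one tuple per residue's C-alpha atom, which is enough to draw a backbone trace or to compare two structures.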

SWISS PDB Viewer:

It allows several proteins to be analyzed simultaneously. The proteins can be superimposed to analyze
their structural alignment and to compare their active sites, amino acid mutations, angles, distances and
hydrogen bonds between atoms. This viewer is linked to the SWISS-MODEL server.[10]

Chemscape Chime, RasMol and Protein Explorer: These are common tools for visualizing protein structure,
and they can read molecular structure files from PDB. Chemscape Chime serves as a plug-in that permits
structure visualization in a web browser, and Protein Explorer likewise allows protein structures to be
viewed in the browser. Both Chemscape Chime and Protein Explorer are derived from RasMol.[11]

Mage and Kinemages: This is another tool for protein structure visualization. It allows rotation of the
entire image in real time, display of parts by turning them on and off, selection of points to identify
them, and animation of changes between different forms.[6]

PDBsum: It is a database that provides a largely pictorial summary of the key information on each
biomolecular structure in the Protein Data Bank. It includes images of the structure, detailed structural
analyses derived from the PROMOTIF program, schematic diagrams of interactions, and summaries of PROCHECK
results.[12]

Protein structure alignment tools:

VAST (Vector Alignment Search Tool): It is a tool produced by NCBI that identifies proteins with similar
3-D structures, i.e. a structure similarity search service.[13]

DALI: It is a computational tool for aligning and comparing protein structures in 3-D.[14]

B. Domain architecture databases:

Conserved Domain Database (CDD): It is a database containing sequence alignments and profiles of protein
domains that have been conserved in the course of molecular evolution.[15]

CDART (Conserved Domain Architecture Retrieval Tool): It is used to search for proteins with similar
domain architectures.[16]

C. Bioinformatics tools for plotting protein-ligand interactions:

Ligplot: It is used to show the interactions between a protein and a ligand; hydrogen bonds and
hydrophobic contacts can both be represented.[17]

D. Approaches for classification of proteins:

Several databases classify proteins, usually on the basis of their structural similarities; both structural
and evolutionary relationships are factors in the classification. The protein hierarchy has several levels,
but the main ones are family, superfamily and fold.

Family: Proteins are grouped into a family when they have a clear, known evolutionary relationship, so this
is called the clear evolutionary relationship level.

Superfamily: Proteins at this level have low sequence identity, but their structural and functional features
suggest a common evolutionary origin, so this is called the probable common evolutionary origin level.

Fold: Proteins at this level need not share an evolutionary origin; their structural similarities arise from
the physics and chemistry of proteins, which favor certain chain topologies and packing arrangements. This is
therefore called the major structural similarity level.

SCOP: It is a database for the Structural Classification Of Proteins. It provides a comprehensive
classification of the structural and evolutionary relationships between proteins of known structure.[18]

CATH (Class, Architecture, Topology and Homologous superfamily): This database provides a hierarchical
classification of protein domain structures, clustering them at four levels: C, A, T and H, i.e. Class,
Architecture, Topology and Homologous superfamily, respectively.
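CATH identifies each of its four levels by a number, so a domain's place in the hierarchy is written as a dotted code such as 3.40.50.300. The sketch below parses such codes and reports the deepest level two domains share; the names are illustrative and the codes in the usage lines are only examples of the format.

```python
from typing import NamedTuple, Optional

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

class CathCode(NamedTuple):
    class_: int                  # C: secondary-structure content
    architecture: int            # A: overall arrangement of secondary structures
    topology: int                # T: fold, i.e. connectivity of the chain
    homologous_superfamily: int  # H: evidence of a common ancestor

def parse_cath(code: str) -> CathCode:
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a code like '3.40.50.300', got {code!r}")
    return CathCode(*(int(p) for p in parts))

def shared_level(a: str, b: str) -> Optional[str]:
    """Deepest CATH level at which two domain codes still agree (None if none)."""
    deepest = None
    for name, x, y in zip(LEVELS, parse_cath(a), parse_cath(b)):
        if x != y:
            break
        deepest = name
    return deepest

print(shared_level("3.40.50.300", "3.40.50.720"))  # same fold, different superfamily
```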

PROSPECT:
PROSPECT (PROtein Structure Prediction and Evaluation Computer ToolKit) is a protein-
structure prediction system that employs a computational technique called protein threading to
construct a protein's 3-D model.

STRING: STRING stands for Search Tool for the Retrieval of Interacting Genes/Proteins. It collects
associations from high-throughput experimental data, from mining of databases and literature, and from
predictions based on genomic context analysis. It assembles them into a common reference set, and
presents the evidence in a consistent and intuitive web interface. (http://string.embl.org).

YASPIN: It is built on three individual web servers: cons-PPISP, PINUP and Promate. It is
known as a meta web server and is used for predicting protein-protein interaction sites.
(http://www.yaspin.org).

SPLIT: This transmembrane protein topology prediction server provides a modified hydrophobic
moment index and clear, colorful output including beta reference (http://www.split).

OCTOPUS: This tool uses a novel combination of hidden Markov models and artificial neural
networks. It predicts the correct topology for 94% of the dataset of 124 sequences with known
structures. (http://octopus.org).

Swiss-Prot:

It contains annotated or commented sequences, that is, each sequence has been
reviewed, documented and linked to other databases.

TrEMBL:

TrEMBL (Translation of EMBL Nucleotide Sequence Database) contains translations of all
coding sequences in the EMBL nucleotide database (EMBL-Bank) that have not yet been annotated in
Swiss-Prot.

PDB:
Protein Data Bank is the 3-D tertiary structure database of proteins that have been
crystallized. External link: PDB (http://www.rcsb.org/pdb/ )

COPIA :
COPIA (COnsensus Pattern Identification and Analysis) is a protein structure analysis tool for
discovering motifs (conserved regions) in a family of protein sequences. Such motifs can then be
used to determine whether new protein sequences belong to the family, to predict the secondary and
tertiary structure and function of proteins, and to study the evolutionary history of the sequences.

Amino acid Explorer:

Explores amino acid properties, substitutions and functions.

BLAST:

Finds regions of local similarity between biological sequences.

BLAST Link (Blink):

Displays the results of a precomputed BLAST search of a protein against all other protein
sequences at NCBI.

CD Tree:

Classifies protein sequences and investigates their evolutionary relationships.

Cn 3D:

Displays and manipulates 3 dimensional structures and alignments from the structure databases.

COBALT:

Performs protein multiple sequence alignment.

Concise Microbial Protein BLAST:

Finds regions of local similarity between query proteins and proteins from complete microbial
(prokaryotic) genomes.

CDART:

CDART stands for Conserved Domain Architecture Retrieval Tool. It displays the functional
domains that make up a given protein sequence.

CD Search:

Identifies the conserved domains present in a protein sequence.

VAST:

VAST stands for Vector Alignment Search Tool. It identifies proteins with similar 3-dimensional structures.

PIR:

Protein Information Resource is divided into four sub-databases with decreasing levels of
annotation. External link: PIR (http://pir.georgetown.edu/ )

INTERPRO:

It integrates information from various secondary (signature) databases such as
PROSITE, providing links to other databases and more extensive information. External link:
INTERPRO ( http://www.ebi.ac.uk/interpro/index.html )

Tools for RNA analysis:

General tools:
These tools perform normalization and calculate the abundance of each gene expressed in a
sample.[48] RPKM, FPKM and TPM[49] are some of the units used to quantify expression.
Some software is also designed to study the variability of gene expression between samples
(differential expression). Quantitative and differential studies depend largely on the quality of
the read alignment and the accuracy of isoform reconstruction. Several studies comparing
differential expression methods are available.[50][51][52]
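The difference between RPKM (or FPKM, the same formula with fragments instead of reads) and TPM is the order of normalization: RPKM divides by library size and then by gene length, while TPM divides by length first and then rescales so that the values in a sample sum to one million. A minimal sketch, assuming raw read counts per gene, gene lengths in base pairs, and that the listed counts make up the whole library; real pipelines typically use effective transcript lengths instead.

```python
def rpkm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Reads per kilobase of gene per million mapped reads."""
    per_million = sum(counts) / 1e6                 # library size in millions of reads
    return [c / (l / 1e3) / per_million for c, l in zip(counts, lengths_bp)]

def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts per million: normalize for length first, then for depth."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]  # reads per kb
    per_million = sum(rates) / 1e6
    return [r / per_million for r in rates]

counts = [100, 200, 700]        # hypothetical read counts for three genes
lengths = [1000, 2000, 1000]    # hypothetical gene lengths in bp
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because TPM values always sum to the same total, they are easier to compare between samples than RPKM/FPKM values.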

 ABSSeq is a new RNA-Seq analysis method based on modelling absolute expression
differences.
 ALDEx2 is a tool for comparative analysis of high-throughput sequencing data. ALDEx2
uses compositional data analysis and can be applied to RNAseq, 16S rRNA gene sequencing,
metagenomic sequencing, and selective growth experiments.
 Alexa-Seq is a pipeline that makes it possible to perform gene expression analysis, transcript-
specific expression analysis, exon junction expression analysis and quantitative alternative analysis.
It allows wide alternative expression visualization, statistics and graphs.
 ARH-seq – identification of differential splicing in RNA-seq data.
 ASC[53]
 Ballgown
 BaySeq is a Bioconductor package to identify differential expression using next-generation
sequencing data, via empirical Bayesian methods. There is an option of using the "snow"
package for parallelisation of computer data processing, recommended when dealing with large
data sets.
 GMNB[54] is a Bayesian method for temporal gene differential expression analysis across
different phenotypes or treatment conditions that naturally handles the heterogeneity of
sequencing depth in different samples, removing the need for ad-hoc normalization.
 BBSeq
 BitSeq (Bayesian Inference of Transcripts from Sequencing Data) is an application for
inferring expression levels of individual transcripts from sequencing (RNA-Seq) data and
estimating differential expression (DE) between conditions.
 CEDER Accurate detection of differentially expressed genes by combining significance of
exons using RNA-Seq.
 CPTRA The CPTRA package is for analyzing transcriptome sequencing data from different
sequencing platforms. It combines advantages of 454, Illumina GAII, or other platforms and can
perform sequence tag alignment and annotation, expression quantification tasks.
 casper is a Bioconductor package to quantify expression at the isoform level. It combines
informative data summaries, flexible estimation of experimental biases and statistical
precision considerations, which (reportedly) provide substantial reductions in estimation error.
 Cufflinks/Cuffdiff is appropriate to measure global de novo transcript isoform expression. It
performs assembly of transcripts, estimation of abundances and determines differential
expression (Cuffdiff) and regulation in RNA-Seq samples. [55]
 DESeq is a Bioconductor package to perform differential gene expression analysis based on
negative binomial distribution.
 DEGSeq
 Derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair
resolution via the DER Finder approach.
 DEvis is a powerful, integrated solution for the analysis of differential expression data. Using
DESeq2 as a framework, DEvis provides a wide variety of tools for data manipulation,
visualization, and project management.
 DEXSeq is a Bioconductor package that finds differential exon usage between samples based on
RNA-Seq exon counts. DEXSeq employs the negative binomial distribution and provides options for
visualization and exploration of the results.
 DEXUS is a Bioconductor package that identifies differentially expressed genes in RNA-Seq
data under all possible study designs such as studies without replicates, without sample groups,
and with unknown conditions.[56] In contrast to other methods, DEXUS does not need replicates
to detect differentially expressed transcripts, since the replicates (or conditions) are estimated by
the EM method for each transcript.
 DGEclust is a Python package for clustering expression data from RNA-seq, CAGE and
other NGS assays using a Hierarchical Dirichlet Process Mixture Model. The estimated cluster
configurations can be post-processed in order to identify differentially expressed genes and for
generating gene- and sample-wise dendrograms and heatmaps. [57]
 DiffSplice is a method for differential expression detection and visualization, not dependent
on gene annotations. This method is supported on identification of alternative splicing modules
(ASMs) that diverge in the different isoforms. A non-parametric test is applied to each ASM to
identify significant differential transcription with a measured false discovery rate.
 EBSeq is a Bioconductor package for identifying genes and isoforms differentially expressed
(DE) across two or more biological conditions in an RNA-seq experiment. It also can be used to
identify DE contigs after performing de novo transcriptome assembly. While performing DE
analysis on isoforms or contigs, different isoform/contig groups have varying estimation
uncertainties. EBSeq models the varying uncertainties using an empirical Bayes model with
different priors.
 edgeR is an R package for the analysis of differential expression in data from DNA sequencing
methods, such as RNA-Seq, SAGE or ChIP-Seq. edgeR employs statistical methods based
on the negative binomial distribution as a model for count variability.
 EdgeRun is an R package for sensitive, functionally relevant differential expression discovery
using an unconditional exact test.
 EQP The exon quantification pipeline (EQP): a comprehensive approach to the quantification
of gene, exon and junction expression from RNA-seq data.
 ESAT The End Sequence Analysis Toolkit (ESAT) is specially designed to be applied for
quantification of annotation of specialized RNA-Seq gene libraries that target the 5' or 3' ends of
transcripts.
 eXpress performs transcript-level RNA-Seq quantification and allele-specific and
haplotype analysis, and can estimate the abundances of the multiple isoforms present in a
gene. Although it can be coupled directly with aligners (like Bowtie), eXpress can also be used
with de novo assemblers and thus does not need a reference genome to perform alignment. It
runs on Linux, Mac and Windows.
 ERANGE performs alignment, normalization and quantification of expressed genes.
 featureCounts an efficient general-purpose read quantifier.
 FDM
 FineSplice Enhanced splice junction detection and estimation from RNA-Seq data.
 GFOLD[58] Generalized fold change for ranking differentially expressed genes from RNA-seq
data.
 globalSeq[59] Global test for counts: testing for association between RNA-Seq and high-
dimensional data.
 GPSeq This is a software tool to analyze RNA-seq data to estimate gene and exon
expression, identify differentially expressed genes, and differentially spliced exons.
 IsoDOT – Differential RNA-isoform Expression.
 Limma Limma powers differential expression analyses for RNA-sequencing and microarray
studies.
 LPEseq accurately tests differential expression with a limited number of replicates.
 Kallisto "Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data,
or more generally of target sequences using high-throughput sequencing reads. It is based on
the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets,
without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can
quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only
the read sequences and a transcriptome index that itself takes less than 10 minutes to build."
 MATS Multivariate Analysis of Transcript Splicing (MATS).
 MAPTest provides a general testing framework for differential expression analysis of RNA-Seq
time course experiments. The package's method is based on a latent negative-binomial Gaussian
mixture model. The proposed test is optimal in terms of maximum average power. The test allows not
only identification of traditional DE genes but also testing of a variety of composite hypotheses of
biological interest.[60]
 MetaDiff Differential isoform expression analysis using random-effects meta-regression.
 metaseqR is a Bioconductor package that detects differentially expressed genes from RNA-
Seq data by combining six statistical algorithms using weights estimated from their performance
with simulated data estimated from real data, either public or user-based. In this way, metaseqR
optimizes the tradeoff between precision and sensitivity.[61] In addition, metaseqR creates a
detailed and interactive report with a variety of diagnostic and exploration plots and auto-
generated text.
 MMSEQ is a pipeline for estimating isoform expression and allelic imbalance in diploid
organisms based on RNA-Seq. The pipeline employs tools like Bowtie, TopHat,
ArrayExpressHTS and SAMtools. Also, edgeR or DESeq to perform differential expression.
 MultiDE
 Myrna is a pipeline tool that runs in a cloud environment (Elastic MapReduce) or on a single
computer for estimating differential gene expression in RNA-Seq datasets. Bowtie is employed
for short read alignment and R algorithms for interval calculations, normalization, and statistical
processing.
 NEUMA is a tool to estimate RNA abundances using length normalization, based on
uniquely aligned reads and mRNA isoform models. NEUMA uses known transcriptome data
available in databases like RefSeq.
 NOISeq NOISeq is a non-parametric approach for the identification of differentially
expressed genes from count data or previously normalized count data. NOISeq empirically
models the noise distribution of count changes by contrasting fold-change differences (M) and
absolute expression differences (D) for all the features in samples within the same condition.
 NPEBseq is a nonparametric empirical Bayesian-based method for differential expression
analysis.
 NSMAP allows inference of isoforms as well as estimation of expression levels, without
annotated information. The exons are aligned and splice junctions are identified using TopHat.
All the possible isoforms are computed by a combination of the detected exons.
 NURD an implementation of a new method to estimate isoform expression from non-uniform
RNA-seq data.
 PANDORA An R package for the analysis and result reporting of RNA-Seq data by
combining multiple statistical algorithms.
 PennSeq PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by
modeling non-uniform read distribution.
 Quark Quark enables semi-reference-based compression of RNA-seq data.
 QuasR Quantify and Annotate Short Reads in R.
 RapMap A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to
Transcriptomes.
 RNAeXpress Can be run with Java GUI or command line on Mac, Windows, and Linux. It
can be configured to perform read counting, feature detection or GTF comparison on mapped
rnaseq data.
 Rcount Rcount: simple and flexible RNA-Seq read counting.
 rDiff is a tool that can detect differential RNA processing (e.g. alternative splicing,
polyadenylation or ribosome occupancy).
 RNASeqPower calculates sample size estimates for RNA-Seq studies (R package).
 RNA-Skim RNA-Skim: a rapid method for RNA-Seq quantification at transcript-level.
 rSeq rSeq is a set of tools for RNA-Seq data analysis. It consists of programs that deal with
many aspects of RNA-Seq data analysis, such as read quality assessment, reference sequence
generation, sequence mapping, gene and isoform expressions (RPKMs) estimation, etc.
 RSEM
 rQuant is a web service (a Galaxy installation) that determines
abundances of transcripts per gene locus, based on quadratic programming. rQuant is able to
evaluate biases introduced by experimental conditions. A combination of tools is employed:
PALMapper (reads alignment), mTiM and mGene (inference of new transcripts).
 Salmon is a software tool for computing transcript abundance from RNA-seq data using
either an alignment-free (based directly on the raw reads) or an alignment-based (based on pre-
computed alignments) approach. It uses an online stochastic optimization approach to maximize
the likelihood of the transcript abundances under the observed data. The software itself is
capable of making use of many threads to produce accurate quantification estimates quickly. It
is part of the Sailfish suite of software, and is the successor to the Sailfish tool.
 SAJR is a Java-based read counter and R package for differential splicing analysis. It uses
junction reads to estimate exon exclusion and reads mapped within an exon to estimate its
inclusion. SAJR models these counts with a generalized linear model (GLM) with a quasibinomial
distribution and uses a log-likelihood test to assess significance.
 Scotty Performs power analysis to estimate the number of replicates and depth of
sequencing required to call differential expression.
 Seal is an alignment-free algorithm to quantify sequence expression by matching k-mers between
raw reads and a reference transcriptome. Handles paired reads and alternate isoforms, and
uses little memory. Accepts all common read formats, and outputs read counts, coverage, and
FPKM values per reference sequence. Open-source, written in pure Java; supports all platforms
with no recompilation and no other dependencies. Distributed with BBMap. (Seal - Sequence
Expression AnaLyzer - is unrelated to the SEAL distributed short-read aligner.)
 semisup[62] Semi-supervised mixture model: detecting SNPs with interactive effects on a
quantitative trait.
 Sleuth is a program for analysis of RNA-Seq experiments for which transcript abundances
have been quantified with kallisto.
 SplicingCompass differential splicing detection using RNA-Seq data.
 sSeq The purpose of this R package is to discover the genes that are differentially
expressed between two conditions in RNA-seq experiments.
 StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential
transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step
to assemble and quantitate full-length transcripts representing multiple splice variants for each
gene locus. It was designed as a successor to Cufflinks (its developers include some of the
Cufflinks developers) and has many of the same features, but runs far faster and in far less
memory.
 TIGAR Transcript isoform abundance estimation method with gapped alignment of RNA-Seq
data by variational Bayesian inference.
 TimeSeq Detecting Differentially Expressed Genes in Time Course RNA-Seq Data.
 WemIQ is a software tool to quantify isoform expression and exon splicing ratios from RNA-
seq data accurately and robustly.
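The RNASeqPower and Scotty entries estimate how many replicates and how much sequencing depth an experiment needs. A minimal sketch of that kind of calculation, assuming a normal approximation on the log scale for two equal-sized groups, Poisson counting noise (variance of a log count of about 1/depth) and a biological coefficient of variation `cv`. The function name and defaults are illustrative, and this is not claimed to be the exact model used by either tool:

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, fold_change, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect a fold change.

    Illustrative sketch (assumed model, not RNASeqPower's or Scotty's code).
    depth: expected read count for the gene in one sample (mu).
    cv: biological coefficient of variation between replicates (sigma).
    fold_change: the change to detect, e.g. 2 for a doubling.
    n = 2 * (z_{1-alpha/2} + z_power)**2 * (1/mu + sigma**2) / ln(FC)**2
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)
    # Variance of a log count: Poisson part (1/depth) plus biological part (cv^2)
    variance = 1 / depth + cv ** 2
    n = 2 * (z_alpha + z_power) ** 2 * variance / math.log(fold_change) ** 2
    return math.ceil(n)


print(samples_per_group(depth=20, cv=0.4, fold_change=2))  # 7
```

Raising the depth only shrinks the 1/depth term, so once cv² dominates the variance, adding replicates helps more than adding reads.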
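rSeq reports RPKM values and Seal reports FPKM per reference sequence. Both scale a count by feature length (in kilobases) and library size (in millions); FPKM counts fragments (read pairs) instead of single reads. A minimal sketch of the RPKM arithmetic, assuming per-gene counts and lengths are already available; the gene names and numbers are hypothetical:

```python
def rpkm(counts, lengths_bp, total_mapped=None):
    """Reads Per Kilobase of feature per Million mapped reads.

    counts: dict gene -> number of reads assigned to the gene.
    lengths_bp: dict gene -> feature length in base pairs.
    total_mapped: library size; defaults to the sum of counts, which assumes
        every mapped read was assigned to one of the listed genes.
    """
    if total_mapped is None:
        total_mapped = sum(counts.values())
    # count / (length in kb) / (library size in millions)
    #   = count * 10^9 / (length_bp * total_mapped)
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene in counts
    }


# Hypothetical genes: geneB is three times longer and has three times the reads.
values = rpkm({"geneA": 500, "geneB": 1500}, {"geneA": 1000, "geneB": 3000})
print(values)  # {'geneA': 250000.0, 'geneB': 250000.0}
```

The two genes end up with the same RPKM, showing how length normalization removes the advantage a longer transcript has in raw read counts.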
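Salmon maximizes the likelihood of transcript abundances given reads that may be compatible with several transcripts. Salmon itself uses an online stochastic optimization; this sketch swaps in the classic batch expectation-maximization (EM) algorithm, which targets the same kind of likelihood and is easier to follow. It assumes reads have already been grouped into sets of compatible transcripts with counts, and it ignores the bias and fragment-length modeling a real quantifier applies:

```python
def em_abundance(eq_classes, eff_lengths, max_iter=1000, tol=1e-10):
    """Estimate transcript read fractions by batch expectation-maximization.

    Swapped-in technique for illustration; not Salmon's online optimizer.
    eq_classes: dict mapping a tuple of transcript names to the number of reads
        compatible with exactly that set of transcripts (at least one read).
    eff_lengths: dict transcript name -> positive effective length.
    Returns dict transcript -> estimated fraction of reads it generated.
    """
    transcripts = list(eff_lengths)
    total_reads = sum(eq_classes.values())
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(max_iter):
        expected = {t: 0.0 for t in transcripts}
        for txs, count in eq_classes.items():
            # E-step: a read from t starts at one of eff_lengths[t] positions,
            # so P(t | read) is proportional to alpha[t] / eff_lengths[t].
            weights = {t: alpha[t] / eff_lengths[t] for t in txs}
            norm = sum(weights.values())
            for t, w in weights.items():
                expected[t] += count * w / norm
        # M-step: new fractions are expected read counts over total reads.
        new_alpha = {t: expected[t] / total_reads for t in transcripts}
        change = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if change < tol:
            break
    return alpha


# Hypothetical data: 30 reads unique to A, 10 unique to B, 40 shared.
eq = {("A",): 30, ("B",): 10, ("A", "B"): 40}
est = em_abundance(eq, {"A": 1000.0, "B": 1000.0})
print(round(est["A"], 4), round(est["B"], 4))  # 0.75 0.25
```

The 40 shared reads end up split 3:1, the same ratio as the uniquely mapped reads, which is the fixed point the EM iterations converge to.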
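SAJR uses exon-skipping junction reads as evidence of exclusion and reads inside the exon as evidence of inclusion. A minimal sketch of the resulting inclusion ratio and a pooled between-group difference, assuming raw counts with no length correction. The helper names are hypothetical, and the sketch does not reproduce SAJR's quasibinomial GLM or its significance test:

```python
def inclusion_ratio(inclusion_reads, exclusion_reads):
    """Fraction of informative reads that support inclusion of an exon.

    inclusion_reads: reads mapped within the exon.
    exclusion_reads: junction reads that skip the exon.
    Returns None when no informative reads are available.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def delta_inclusion(group_a, group_b):
    """Pooled inclusion ratio of group_b minus that of group_a.

    Each group is a list of (inclusion_reads, exclusion_reads) per sample.
    """
    def pooled(group):
        return inclusion_ratio(sum(i for i, _ in group), sum(e for _, e in group))

    a, b = pooled(group_a), pooled(group_b)
    if a is None or b is None:
        return None
    return b - a


# Hypothetical counts for one exon in two replicates per condition.
control = [(30, 10), (45, 15)]
treated = [(10, 30), (12, 28)]
print(round(delta_inclusion(control, treated), 3))  # -0.475
```

Pooling replicates ignores between-replicate variability, which is what a quasibinomial GLM is there to account for.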
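Seal quantifies expression by matching k-mers from raw reads against a reference transcriptome. A minimal sketch of that idea, assuming exact k-mer matches, a toy k of 5, single-end reads, and assignment of each read to the reference sharing the most k-mers (ties reported as ambiguous). It is not Seal's actual algorithm and does not handle paired reads:

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map every k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def assign_read(read, index, k):
    """Return the reference sharing the most k-mers with the read.

    Returns None if no k-mer matches and "ambiguous" on a tie for first place.
    """
    hits = Counter()
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    if not hits:
        return None
    ranked = hits.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "ambiguous"
    return ranked[0][0]


def count_reads(reads, references, k=5):
    """Count reads assigned to each reference (plus None and "ambiguous")."""
    index = build_kmer_index(references, k)
    return Counter(assign_read(read, index, k) for read in reads)


# Hypothetical toy transcripts and reads.
refs = {"tx1": "ACGTACGTTGCA", "tx2": "TTTTGGGGCCCC"}
reads = ["ACGTACGT", "TGGGGCCC", "AAAAAAAA"]
counts = count_reads(reads, refs, k=5)
print(counts["tx1"], counts["tx2"], counts[None])  # 1 1 1
```

Because no alignment is computed, the cost per read is a handful of dictionary lookups, which is the main appeal of the alignment-free approach.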
Evaluation of quantification and differential expression
 CompcodeR RNAseq data simulation, differential expression analysis and performance
comparison of differential expression methods.
 DEAR-O Differential Expression Analysis based on RNA-seq data – Online.
 PROPER comprehensive power evaluation for differential expression using RNA-seq.
 RNAontheBENCH computational and empirical resources for benchmarking RNAseq
quantification and differential expression methods.
 rnaseqcomp Several quantitative and visualized benchmarks for RNA-seq quantification
pipelines. Two-condition quantifications for genes, transcripts, junctions or exons from each
pipeline, with the necessary meta information, should be organized into numeric matrices before
the evaluation can proceed.
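rnaseqcomp expects each pipeline's two-condition quantifications as numeric matrices. One simple way to benchmark two pipelines from such matrices is to correlate their per-gene log2 fold changes. A minimal sketch, assuming rows are the same genes in the same order, columns are the two conditions, and a pseudocount of 1; this metric is illustrative and not claimed to be one of rnaseqcomp's own benchmarks:

```python
import math


def log2_fold_changes(matrix, pseudocount=1.0):
    """Per-row log2((condition 2 + pc) / (condition 1 + pc)) for a two-column matrix."""
    return [math.log2((c2 + pseudocount) / (c1 + pseudocount)) for c1, c2 in matrix]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def compare_pipelines(matrix_a, matrix_b, pseudocount=1.0):
    """Agreement between two pipelines' fold changes over the same genes."""
    if len(matrix_a) != len(matrix_b):
        raise ValueError("both matrices must list the same genes in the same order")
    return pearson(log2_fold_changes(matrix_a, pseudocount),
                   log2_fold_changes(matrix_b, pseudocount))


# Hypothetical gene x condition count matrices from two pipelines.
pipeline_a = [[99, 199], [49, 49], [199, 49]]
pipeline_b = [[9, 19], [99, 99], [399, 99]]
print(round(compare_pipelines(pipeline_a, pipeline_b), 6))  # 1.0
```

The two pipelines report very different absolute counts but identical fold changes, so they agree perfectly on this metric; a full benchmark would also look at absolute accuracy.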
Multi-tool solutions
 DEB is a web interface/pipeline for comparing the significantly expressed genes reported by
different tools. Three algorithms are currently available: edgeR, DESeq and baySeq.
 SARTools A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis
of RNA-Seq Data.
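DEB compares the significantly expressed genes reported by edgeR, DESeq and baySeq. A minimal sketch of such a comparison, assuming each tool's output has already been reduced to a set of gene IDs passing its own threshold. The gene IDs are hypothetical, and the consensus vote and Jaccard overlap are illustrative choices, not DEB's implementation:

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_de_results(results, min_tools=None):
    """Summarize agreement among differential-expression tools.

    results: dict tool name -> set of significant gene IDs.
    min_tools: a gene enters the consensus if at least this many tools call
        it significant (default: all tools).
    Returns (consensus gene set, dict of pairwise Jaccard indices).
    """
    if min_tools is None:
        min_tools = len(results)
    votes = Counter(gene for genes in results.values() for gene in genes)
    consensus = {gene for gene, n in votes.items() if n >= min_tools}
    pairwise = {
        (t1, t2): jaccard(results[t1], results[t2])
        for t1, t2 in combinations(sorted(results), 2)
    }
    return consensus, pairwise


results = {
    "edgeR": {"g1", "g2", "g3"},
    "DESeq": {"g1", "g2", "g4"},
    "baySeq": {"g1", "g3"},
}
consensus, pairwise = compare_de_results(results)
print(sorted(consensus))  # ['g1']
print(pairwise[("DESeq", "edgeR")])  # 0.5
```

Lowering `min_tools` to 2 trades stringency for sensitivity: the consensus grows to g1, g2 and g3.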
Transposable Element expression
 TeXP is a Transposable Element quantification pipeline that deconvolves pervasive
transcription from autonomous transcription of LINE-1 elements. [63]
