Meisetz and the birth of the KRAB motif
The largest family of transcription factors in mammals is of Cys2His2 zinc-finger proteins, each with an NH2-terminal KRAB motif. Extensive expansions of this family have occurred in separate mammalian lineages, with ~400 such genes known in ...
Predicted function of the vaccinia virus G5R protein
Motivation: Of the ~200 proteins that have been identified for the vaccinia virus (VACV) genome, many are currently listed as having an unknown function, and seven of these are also found in all other poxvirus genomes that have been sequenced. The ...
Modelling interaction sites in protein domains with interaction profile hidden Markov models
Motivation: Due to the growing number of completely sequenced genomes, functional annotation of proteins is becoming an increasingly important issue. Here, we describe a method for the prediction of sites within protein domains, which are part of protein-...
Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA
Motivation: Predicting cis-regulatory modules (CRMs) in higher eukaryotes is a challenging computational task. Commonly used methods to predict CRMs based on the signal of transcription factor binding sites (TFBS) are limited by prior information ...
Application of a simple likelihood ratio approximant to protein sequence classification
Motivation: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken in order to explore their utility as a scoring (ranking) function in the classification of protein sequences.
...
Adding sequence context to a Markov background model improves the identification of regulatory elements
Motivation: Many computational methods for identifying regulatory elements use a likelihood ratio between motif and background models. Often, the methods use a background model of independent bases. At least two different Markov background models ...
ProtBuD: a database of biological unit structures of protein families and superfamilies
Motivation: Modeling of protein interactions is often possible from known structures of related complexes. It is often time-consuming to find the most appropriate template. Hypothesized biological units (BUs) often differ from the asymmetric units ...
Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules
- Dong Wang,
- Yingli Lv,
- Zheng Guo,
- Xia Li,
- Yanhui Li,
- Jing Zhu,
- Da Yang,
- Jianzhen Xu,
- Chenguang Wang,
- Shaoqi Rao,
- Baofeng Yang
Motivation: Microarray datasets frequently contain a large number of missing values (MVs), which need to be estimated and replaced for subsequent data mining. The focus of the paper is to study the effects of different MV treatments for cDNA ...
A scalable method for integration and functional analysis of multiple microarray datasets
Motivation: The diverse microarray datasets that have become available over the past several years represent a rich opportunity and challenge for biological data mining. Many supervised and unsupervised methods have been developed for the analysis of ...
Large scale data mining approach for gene-specific standardization of microarray gene expression data
Motivation: The identification of changes in gene expression in multifactorial diseases, such as breast cancer, is a major goal of DNA microarray experiments. Here we present a new data mining strategy to better analyze the marginal difference in ...
Robust method for detecting differential gene expression in twin studies
Motivation: A steadily increasing number of microarray experiments is stimulating the further development of statistical methods for the analysis of gene expression data. One of the central problems in this area is detecting differential gene ...
Evaluating the performance of microarray segmentation algorithms
Motivation: Although numerous algorithms have been developed for microarray segmentation, extensive comparisons between the algorithms have received far less attention. In this study, we evaluate the performance of nine microarray segmentation ...
Cell++: simulating biochemical pathways
Motivation: With the generation of a wealth of information, detailing cellular components, their functions and interactions, there is a growing need for the development of new computational tools capable of interpreting these data within spatial and ...
GATHER: a systems approach to interpreting genomic signatures
Motivation: Understanding the full meaning of the biology captured in molecular profiles, within the context of the entire biological system, cannot be achieved with a simple examination of the individual genes in the signature. To facilitate such an ...
Babel's tower revisited: a universal resource for cross-referencing across annotation databases
Motivation: Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological ...
CREMOFAC: a database of chromatin remodeling factors
Motivation: Chromatin remodeling is an important event in the eukaryotic nucleus, rendering nucleosomal DNA accessible for various transaction processes. Remodeling factors facilitate the dynamic nature of chromatin through participation of the ...
ChromoScan: a scan statistic application for identifying chromosomal regions in genomic studies
Summary: ChromoScan is an implementation of a genome-based scan statistic that detects genomic regions that are statistically significant for targeted measurements, such as genetic associations with disease, gene expression profiles, DNA copy ...
FoldUnfold: web server for the prediction of disordered regions in protein chain
Summary: Identification of disordered regions in polypeptide chains is very important because such regions are essential for protein function. A new parameter, namely the mean packing density of residues, has been introduced to detect disordered regions ...
COPA: cancer outlier profile analysis
Summary: Chromosomal translocations are common in cancer, and in some cases may be causal in the progression of the disease. Using microarrays, in which the expression of thousands of genes is simultaneously measured, could potentially allow one to ...
BNArray: an R package for constructing gene regulatory networks from microarray data by using Bayesian network
Summary: BNArray is a systematized tool developed in R. It facilitates the construction of gene regulatory networks from DNA microarray data by using Bayesian networks. Significant sub-modules of regulatory networks with high confidence are ...
NMPP: a user-customized NimbleGen microarray data processing pipeline
Summary: The NMPP package is a bundle of user-customized tools based on established algorithms and methods to process self-designed NimbleGen microarray data. It features a command-line-based integrative processing procedure that comprises five major ...
Extending MapMan: application to legume genome arrays
Motivation: Based on a gene classification into hierarchical categories ('BINs'), MapMan was originally developed to display Arabidopsis thaliana gene expression in a functional context. We have created a bioinformatics system to extend MapMan to ...
ssSNPer: identifying statistically similar SNPs to aid interpretation of genetic association studies
Summary: ssSNPer is a novel user-friendly web interface that provides easy determination of the number and location of untested HapMap SNPs, in the region surrounding a tested HapMap SNP, which are statistically similar and would thus produce ...
MetaQuant: a tool for the automatic quantification of GC/MS-based metabolome data
Summary: MetaQuant is a Java-based program for the automatic and accurate quantification of GC/MS-based metabolome data. In contrast to other programs, MetaQuant is able to quantify hundreds of substances simultaneously with minimal manual ...
Supporting the SBML layout extension
Motivation: Researchers studying large or complex biochemical networks would benefit from the ability to automatically create lucid visualizations and store them in a portable and widely accepted format.
Summary: Two modules, SBMLSupportLayout and ...
The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks
Summary: Biological processes involve complex networks of interactions between molecules. Various large-scale experiments and curation efforts have led to preliminary versions of complete cellular networks for a number of organisms. To grapple with ...
TimeTree: a public knowledge-base of divergence times among organisms
Summary: Biologists and other scientists routinely need to know times of divergence between species and to construct phylogenies calibrated to time (timetrees). Published studies reporting time estimates from molecular data have been increasing ...