A Generalized Multivariate Approach to Pattern Discovery from Replicated and Incomplete Genome-Wide Measurements
Estimation of pairwise correlation from incomplete and replicated molecular profiling data is a ubiquitous problem in pattern discovery analysis, such as clustering and networking. However, existing methods solve this problem by ad hoc data imputation, ...
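For reference, a common simple baseline for this problem (not the generalized multivariate approach of this paper) averages the replicates of each condition and computes a Pearson correlation over only the conditions observed in both profiles. The sketch below assumes missing values are encoded as `None` and each gene is given as a list of replicate rows; the function names are illustrative.

```python
import math
from typing import List, Optional, Sequence

# Assumption: None marks a missing measurement; rows are replicates, columns are conditions.
Value = Optional[float]


def average_replicates(replicates: Sequence[Sequence[Value]]) -> List[Value]:
    """Average each condition (column) over its observed replicates."""
    means: List[Value] = []
    for column in zip(*replicates):
        observed = [v for v in column if v is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


def pairwise_complete_pearson(x: Sequence[Value], y: Sequence[Value]) -> Optional[float]:
    """Pearson correlation over conditions observed in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return None  # too few shared observations
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


gene_a = average_replicates([[1.0, 2.0, None, 4.0], [1.0, 2.0, 3.0, None]])
gene_b = average_replicates([[2.0, 4.0, 6.0, None]])
print(gene_a, gene_b, pairwise_complete_pearson(gene_a, gene_b))
```

This baseline discards the spread between replicates and silently drops conditions with missing values.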
A Novel Knowledge-Driven Systems Biology Approach for Phenotype Prediction upon Genetic Intervention
Deciphering the biological networks underlying complex phenotypic traits, e.g., human disease, is undoubtedly crucial for understanding the underlying molecular mechanisms and for developing effective therapeutics. Due to the network complexity and the ...
A Preprocessing Procedure for Haplotype Inference by Pure Parsimony
Haplotype data are especially important in the study of complex diseases since they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective ...
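To make the underlying optimization concrete: haplotype inference by pure parsimony asks for the smallest set of haplotypes that explains every input genotype. The brute-force sketch below is only usable on tiny inputs and is not the preprocessing procedure of this paper; it assumes the common 0/1/2 genotype encoding (0 and 1 homozygous, 2 heterozygous), and the function names are illustrative.

```python
import itertools
from typing import Iterator, List, Sequence, Set, Tuple

Haplotype = Tuple[int, ...]


def resolutions(genotype: Sequence[int]) -> Iterator[Tuple[Haplotype, Haplotype]]:
    """Yield every unordered haplotype pair that explains one genotype.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    """
    het = [i for i, v in enumerate(genotype) if v == 2]
    base = [0 if v == 2 else v for v in genotype]
    if not het:
        yield tuple(base), tuple(base)
        return
    # Fix the first heterozygous site so each unordered pair is produced once.
    for bits in itertools.product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        h1[het[0]], h2[het[0]] = 0, 1
        for site, bit in zip(het[1:], bits):
            h1[site], h2[site] = bit, 1 - bit
        yield tuple(h1), tuple(h2)


def pure_parsimony(genotypes: List[Sequence[int]]) -> Set[Haplotype]:
    """Exhaustively find a minimum-size haplotype set explaining all genotypes."""
    options = [list(resolutions(g)) for g in genotypes]
    candidates = (
        {h for pair in choice for h in pair} for choice in itertools.product(*options)
    )
    return min(candidates, key=len)


print(sorted(pure_parsimony([[2, 2], [0, 2], [2, 0]])))  # [(0, 0), (0, 1), (1, 0)]
```

The search is exponential in the number of heterozygous sites, which is why practical solvers reduce the instance before optimizing.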
An Efficient Algorithm for Approximating Geodesic Distances in Tree Space
The increasing use of phylogeny in biological studies is limited by the need for more efficient tools for computing distances between trees. The geodesic tree distance, introduced by Billera, Holmes, and Vogtmann, combines both the tree ...
Continuous Cotemporal Probabilistic Modeling of Systems Biology Networks from Sparse Data
Modeling of biological networks is a difficult endeavor, but exploration of this problem is essential for understanding the systems behavior of biological processes. In this contribution, developed for sparse data, we present a new continuous Bayesian ...
Designing Logical Rules to Model the Response of Biomolecular Networks with Complex Interactions: An Application to Cancer Modeling
We discuss the propagation of constraints in eukaryotic interaction networks in relation to model prediction and the identification of critical pathways. In order to cope with posttranslational interactions, we consider two types of nodes in the network,...
Efficient Localization of Hot Spots in Proteins Using a Novel S-Transform Based Filtering Approach
Protein-protein interactions govern almost all biological processes and the underlying functions of proteins. The interaction sites of a protein depend on its 3D structure, which in turn depends on the amino acid sequence. Hence, prediction of protein ...
Fast Flexible Modeling of RNA Structure Using Internal Coordinates
Modeling the structure and dynamics of large macromolecules remains a critical challenge. Molecular dynamics (MD) simulations are expensive because they model every atom independently, and are difficult to combine with experimentally derived knowledge. ...
Improved Algorithms for Finding Gene Teams and Constructing Gene Team Trees
A gene team is a set of genes that appear in two or more species, possibly in a different order, yet with the distance between adjacent genes of the team on each chromosome never exceeding a certain threshold δ. A gene team tree is a succinct ...
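The gap condition in this definition is easy to check for a candidate gene set. The sketch below assumes one chromosome per species, each given as a mapping from gene name to an integer position, with distance taken as the absolute difference of positions. It only tests the stated condition and does not search for teams or build the team tree; the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def satisfies_gap_condition(
    genes: Iterable[str], chromosomes: List[Dict[str, int]], delta: int
) -> bool:
    """Check that the genes occur on every chromosome with adjacent gaps <= delta.

    Assumption: each dict maps gene name -> integer position on one chromosome.
    """
    gene_set = set(genes)
    for positions in chromosomes:
        if not gene_set.issubset(positions):
            return False  # some gene is absent from this chromosome
        coords = sorted(positions[g] for g in gene_set)
        # Consecutive members of the set, in chromosome order, must be within delta.
        if any(right - left > delta for left, right in zip(coords, coords[1:])):
            return False
    return True


chroms = [{"a": 1, "b": 3, "c": 10}, {"a": 5, "b": 2, "c": 4}]
print(satisfies_gap_condition({"a", "b"}, chroms, 3))  # True
print(satisfies_gap_condition({"a", "b"}, chroms, 2))  # False
```

Other genes lying between team members are allowed; only the gaps between consecutive team genes are constrained.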
Metasample-Based Sparse Representation for Tumor Classification
Reliable and accurate identification of tumor types is crucial to the proper treatment of cancers. In recent years, it has been shown that sparse representation (SR) by ℓ1-norm minimization is robust to noise, outliers and even incomplete ...
Multiple Sequence Assembly from Reads Alignable to a Common Reference Genome
We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are ...
Parameterized Algorithmics for Finding Connected Motifs in Biological Networks
We study the NP-hard List-Colored Graph Motif problem which, given an undirected list-colored graph G = (V, E) and a multiset M of colors, asks for maximum-cardinality sets S ⊆ V and M′ ⊆ M such that G[S] is connected and contains ...
Probabilistic Models for Semisupervised Discriminative Motif Discovery in DNA Sequences
Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods ...
SCJ: A Breakpoint-Like Distance that Simplifies Several Rearrangement Problems
The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about ...
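For concreteness, the single-cut-or-join (SCJ) distance named in the title can be computed by representing each genome as a set of adjacencies (unordered pairs of gene extremities) and counting the adjacencies present in one genome but not the other, i.e. the size of the symmetric difference of the two adjacency sets. The sketch assumes chromosomes are lists of signed integer genes paired with a circularity flag; the helper names are illustrative.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumed representation: (gene, "t") is the tail extremity, (gene, "h") the head.
Extremity = Tuple[int, str]
Adjacency = FrozenSet[Extremity]
Chromosome = Tuple[List[int], bool]  # (signed genes, is_circular)


def extremities(gene: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene in reading order."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome: List[Chromosome]) -> Set[Adjacency]:
    """Collect adjacencies; chromosome ends (telomeres) contribute none."""
    result: Set[Adjacency] = set()
    for genes, circular in genome:
        neighbours = list(zip(genes, genes[1:]))
        if circular and genes:
            neighbours.append((genes[-1], genes[0]))  # close the circle
        for left, right in neighbours:
            result.add(frozenset((extremities(left)[1], extremities(right)[0])))
    return result


def scj_distance(a: List[Chromosome], b: List[Chromosome]) -> int:
    """Cuts remove adjacencies of A missing from B; joins add those of B missing from A."""
    return len(adjacencies(a) ^ adjacencies(b))


print(scj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))  # 4
```

Reversing gene 2 inside a linear chromosome replaces two adjacencies, so the distance is 4 (two cuts and two joins); circularizing the same chromosome adds one adjacency, giving distance 1.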
SEGA: Semiglobal Graph Alignment for Structure-Based Protein Comparison
Comparative analysis is a topic of utmost importance in structural bioinformatics. Recently, a structural counterpart to sequence alignment, called multiple graph alignment, was introduced as a tool for the comparison of protein structures in general ...
SLIDER: A Generic Metaheuristic for the Discovery of Correlated Motifs in Protein-Protein Interaction Networks
Correlated motif mining (cmm) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for cmm thereby provide a computational method for predicting binding sites for protein ...
Some Mathematical Refinements Concerning Error Minimization in the Genetic Code
The genetic code is known to be highly robust to errors: it has been shown to be far more error robust than randomly selected codes, but significantly less error robust than a certain code found by a heuristic algorithm. We formulate ...
Using Kernel Alignment to Select Features of Molecular Descriptors in a QSAR Study
Quantitative structure-activity relationships (QSARs) correlate biological activities of chemical compounds with their physicochemical descriptors. By modeling the relationship observed between molecular descriptors and their corresponding ...
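The standard empirical alignment between two kernel matrices is A(K1, K2) = ⟨K1, K2⟩_F / sqrt(⟨K1, K1⟩_F ⟨K2, K2⟩_F), where ⟨·,·⟩_F is the Frobenius inner product. The sketch below scores each descriptor by aligning its single-descriptor linear kernel with the target kernel y yᵀ built from the activities. The per-descriptor linear kernel and the y yᵀ target are assumptions for illustration, not necessarily the selection scheme used in this paper; the data are toy values.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product: sum over i, j of A[i][j] * B[i][j]."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(k1: Matrix, k2: Matrix) -> float:
    """Empirical kernel alignment; 0.0 if either kernel is all zeros."""
    denom = math.sqrt(frobenius(k1, k1) * frobenius(k2, k2))
    return frobenius(k1, k2) / denom if denom > 0 else 0.0


def outer(v: Sequence[float]) -> Matrix:
    """v vᵀ: linear kernel of one descriptor column, or the target kernel y yᵀ."""
    return [[a * b for b in v] for a in v]


def score_descriptors(x: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Assumed scheme: align each descriptor's linear kernel with y yᵀ."""
    target = outer(y)
    return [alignment(outer(col), target) for col in zip(*x)]


# Rows = compounds, columns = descriptors; y = activities (toy values).
X = [[2.0, 1.0], [4.0, -1.0], [6.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(score_descriptors(X, y))
```

The first descriptor is proportional to y and scores 1.0; the second scores 4/42 ≈ 0.095, so a threshold or top-k rule on these scores would keep only the first.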
A Mathematical Model for the Validation of Gene Selection Methods
Gene selection methods aim at determining biologically relevant subsets of genes in DNA microarray experiments. However, their assessment and validation represent a major difficulty since the subset of biologically relevant genes is usually unknown. To ...
A SAT-Based Algorithm for Finding Attractors in Synchronous Boolean Networks
This paper addresses the problem of finding attractors in synchronous Boolean networks. The existing binary decision diagram (BDD)-based algorithms have limited capacity due to the excessive memory requirements of decision diagrams. The simulation-based ...
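To show what an attractor is, the sketch below finds all attractors of a very small synchronous Boolean network by simulating from every one of the 2^n states. It is a simulation-based baseline, not the SAT-based algorithm of this paper, and is feasible only for a handful of nodes; representing update rules as Python callables on a state tuple is an assumption.

```python
import itertools
from typing import Callable, Dict, FrozenSet, Sequence, Set, Tuple

State = Tuple[int, ...]
# Assumption: each node's update rule is a function of the full current state.
UpdateRule = Callable[[State], int]


def step(state: State, rules: Sequence[UpdateRule]) -> State:
    """Synchronous update: every node reads the same current state."""
    return tuple(rule(state) for rule in rules)


def attractors(rules: Sequence[UpdateRule]) -> Set[FrozenSet[State]]:
    """Return every attractor (fixed point or cycle) as a frozenset of states."""
    found: Set[FrozenSet[State]] = set()
    for start in itertools.product((0, 1), repeat=len(rules)):
        order: Dict[State, int] = {}  # state -> index of first visit on this trajectory
        state = start
        while state not in order:
            order[state] = len(order)
            state = step(state, rules)
        # The first repeated state is where the trajectory enters its cycle.
        entry = order[state]
        found.add(frozenset(s for s, i in order.items() if i >= entry))
    return found


# Two nodes that copy each other: x0' = x1, x1' = x0.
swap = [lambda s: s[1], lambda s: s[0]]
print(attractors(swap))
```

The swap network has two fixed points, (0, 0) and (1, 1), and one cycle of length 2 between (0, 1) and (1, 0); enumerating all starting states is exactly what stops scaling as the network grows.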
Fast Exact Algorithms for the Closest String and Substring Problems with Application to the Planted (L,d)-Motif Model
We present two parameterized algorithms for the closest string problem. The first runs in O(nL + nd·17.97^d) time for DNA strings and in O(nL + nd·61.86^d) time for protein strings, where n is the number of input strings, L is the length of ...
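The closest string problem asks for a string within Hamming distance d of every input string. As an illustration of the parameterized branching idea, the sketch below implements the classic bounded-search-tree method (start from the first input, repeatedly repair toward an input that is still too far), whose search tree has at most (d+1)^d leaves. It is a simpler and slower algorithm than the ones in this paper; equal-length inputs are assumed and the function names are illustrative.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: Sequence[str], d: int) -> Optional[str]:
    """Return a string within Hamming distance d of all inputs, or None."""

    def search(candidate: str, budget: int) -> Optional[str]:
        if budget < 0:
            return None
        far = [s for s in strings if hamming(candidate, s) > d]
        if not far:
            return candidate  # every input is within distance d
        target = far[0]
        if hamming(candidate, target) > d + budget:
            return None  # remaining budget cannot bring target within d
        diff = [i for i, (a, b) in enumerate(zip(candidate, target)) if a != b]
        # Any solution agrees with target on at least one of these d+1 positions.
        for i in diff[: d + 1]:
            repaired = candidate[:i] + target[i] + candidate[i + 1 :]
            result = search(repaired, budget - 1)
            if result is not None:
                return result
        return None

    return search(strings[0], d) if strings else None


print(closest_string(["AAAA", "TTTT"], 2))  # TTAA
```

The budget starts at d because any solution is itself within distance d of the first input, so at most d repairs are ever needed.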
On the Distribution of the Number of Cycles in the Breakpoint Graph of a Random Signed Permutation
We use the finite Markov chain embedding technique to obtain the distribution of the number of cycles in the breakpoint graph of a random uniform signed permutation. This further gives a very good approximation of the distribution of the reversal ...
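For small n, the exact distribution can be checked by brute force. The sketch below builds the breakpoint graph of each signed permutation in the standard way (gene +i becomes nodes 2i−1, 2i and gene −i becomes 2i, 2i−1, framed by 0 and 2n+1; black edges join consecutive positions, gray edges join 2k and 2k+1) and counts its cycles over all 2^n·n! signed permutations. This exhaustive enumeration is not the finite Markov chain embedding used in the paper; the function names are illustrative.

```python
import itertools
from collections import Counter
from typing import List, Sequence


def cycle_count(perm: Sequence[int]) -> int:
    """Number of alternating cycles in the breakpoint graph of a signed permutation."""
    n = len(perm)
    # Unsigned image: +i -> (2i-1, 2i), -i -> (2i, 2i-1), framed by 0 and 2n+1.
    nodes: List[int] = [0]
    for x in perm:
        nodes += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    nodes.append(2 * n + 1)

    parent = list(range(2 * n + 2))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for k in range(n + 1):
        parent[find(nodes[2 * k])] = find(nodes[2 * k + 1])  # black edge
        parent[find(2 * k)] = find(2 * k + 1)                # gray edge
    # Every node has one black and one gray edge, so each component is a cycle.
    return len({find(v) for v in range(2 * n + 2)})


def cycle_distribution(n: int) -> Counter:
    """Exact number of signed permutations of size n with each cycle count."""
    counts: Counter = Counter()
    for order in itertools.permutations(range(1, n + 1)):
        for signs in itertools.product((1, -1), repeat=n):
            counts[cycle_count([s * x for s, x in zip(signs, order)])] += 1
    return counts


print(sorted(cycle_distribution(1).items()))  # [(1, 1), (2, 1)]
```

Dividing each count by 2^n·n! gives the exact distribution under the uniform model, which is a useful check on any approximation for small n.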
Probabilistic Mixture Regression Models for Alignment of LC-MS Data
A novel framework of a probabilistic mixture regression model (PMRM) is presented for alignment of liquid chromatography-mass spectrometry (LC-MS) data with respect to retention time (RT) points. The expectation maximization algorithm is used to ...
Summarizing Probe Intensities of Affymetrix GeneChip 3' Expression Arrays Taking into Account Day-to-Day Variability
Microarray experiments are affected by several sources of variability. The paper demonstrates the major role of day-to-day variability and underlines the importance of a randomized block design when processing replicates over several days to avoid ...
The Quality Preserving Database: A Computational Framework for Encouraging Collaboration, Enhancing Power and Controlling False Discovery
The common scenario in computational biology in which a community of researchers conducts multiple statistical tests on one shared database gives rise to the multiple hypothesis testing problem. Conventional procedures for solving this problem control ...