David Fernández-Baca

Papers by David Fernández-Baca

A Polynomial-Time Algorithm for the Phylogeny Problem when the Number of Character States is Fixed

On matroids and hierarchical graphs

Information Processing Letters, 1991

We describe a hierarchical algorithm for computing optimum bases of certain matroids defined on ...

Parametric Analysis, Duality, and Lattice Polytopes

We study the parameter space decomposition induced by parametric optimization problems where the score of each feasible solution is a linear function with integer coefficients. We show that for a large class of problems the number of regions in the decomposition is polynomial in the length of the input. The proof uses geometric duality and a classical result on lattice polytopes. We apply the result to re-derive a known bound for parametric stable marriage and to obtain new ones for parametric phylogeny construction and sequence comparison.

A Sequence-Pair-Classification-Based Method for Detecting and Correcting Under-Clustered Gene Families

Gene families are groups of genes that have descended from a common ancestral gene present in the species under study. Current, widely used gene family building algorithms can produce family clusters that may be fragmented or missing true family sequences (under-clustering). Here we present a classification method based on sequence pairs that, first, inspects given families for under-clustering and then predicts the missing sequences for the families using family-specific alignment score cutoffs. We have tested this method on a set of curated, gold-standard (“true”) families from the Yeast Gene Order Browser (YGOB) database, including 20 yeast species, as well as a test set of intentionally under-clustered (“deficient”) families derived from the YGOB families. For 83% of the modified yeast families, our pair-classification method was able to reliably detect under-clustering in “deficient” families that were missing 20% of sequences relative to the full/“true” families. We also atte...

Deep Robust Framework for Protein Function Prediction using Variable-Length Protein Sequences

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019

Enumerating All Maximal Frequent Subtrees

Given a collection of leaf-labeled trees on a common leafset and a fraction f ∈ (1/2, 1], a frequent subtree (FST) is a subtree isomorphically included in at least fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem identifies an FST with f = 1 that has the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has practical applications as a metric for tree similarity, for computing tree congruence, in detecting horizontal gene transfer events, and as a consensus approach. Enumerating FSTs extends the MAST problem by definition and reveals additional subtrees not displayed by MAST. This can happen in two ways: such a subtree is included in a majority but not all of the input trees, or such a subtree, though included in all the input trees, does not have the maximum number of leaves. Further, FSTs can be enumerated on collections of trees having partially overlapping leafsets. MAST may not be useful here, especially if the common overlap among leafsets is very low. Though very useful, FSTs suffer from a combinatorial explosion in number: just a single MAST can exhibit exponentially many FSTs. This limits both the size of the trees that can be enumerated and the ability to comprehend the enumerated FSTs. To overcome this, we propose enumeration of maximal frequent subtrees (MFSTs). An MFST is an FST that is not a subtree of any other FST. The set of MFSTs is a compact, non-redundant summary of all FSTs and is much smaller in size. Here we tackle the novel problem of enumerating all MFSTs in collections of phylogenetic trees. We demonstrate its utility in returning larger consensus trees in comparison to MAST. The current implementation is available on the web.

Constructing Large Conservative Supertrees

Lecture Notes in Computer Science, 2011

Multi-parameter Minimum Spanning Trees

Lecture Notes in Computer Science, 2000

A framework for solving certain multidimensional parametric search problems in randomized linear ...

LATIN 2012: Theoretical Informatics

Lecture Notes in Computer Science, 2012

We consider the following geometric alignment problem: Given a set of line segments in the plane, find a convex region of smallest area that contains a translate of each input segment. This can be seen as a generalization of Kakeya’s problem of finding a convex region of smallest area such that a needle can be turned through 360 degrees within this region. Our main result is an optimal Θ(n log n)-time algorithm for our geometric alignment problem, when the input is a set of n line segments. We also show that, if the goal is to minimize the perimeter of the region instead of its area, then the optimum placement is when the midpoints of the segments coincide. Finally, we show that for any compact convex figure G, the smallest enclosing disk of G is a smallest-perimeter region containing a translate of any rotated copy of G.
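
The perimeter statement can be checked numerically on a tiny instance. The sketch below is not the Θ(n log n) area algorithm from the abstract; it only compares the convex-hull perimeter of two placements of the same two segments, using a standard monotone-chain hull. The function names and the sample segments are illustrative assumptions.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_perimeter(segments):
    """Perimeter of the convex hull of all segment endpoints."""
    hull = convex_hull([p for seg in segments for p in seg])
    return sum(math.dist(hull[i], hull[(i + 1) % len(hull)])
               for i in range(len(hull)))


def center_at_origin(segment):
    """Translate a segment so that its midpoint lies at the origin."""
    (x1, y1), (x2, y2) = segment
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    return ((x1 - mx, y1 - my), (x2 - mx, y2 - my))


# Hypothetical input: a horizontal and a vertical segment, both of length 2,
# first placed so that they share an endpoint (an "L" shape).
corner = [((0.0, 0.0), (2.0, 0.0)), ((0.0, 0.0), (0.0, 2.0))]
centered = [center_at_origin(s) for s in corner]

perimeter_corner = hull_perimeter(corner)      # triangle: 4 + 2*sqrt(2)
perimeter_centered = hull_perimeter(centered)  # diamond: 4*sqrt(2)
```

On this instance the placement with coinciding midpoints gives the smaller hull perimeter, in line with the abstract's claim; a single example is of course only a sanity check, not evidence for the general result.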

Properties of Majority-Rule Supertrees

Systematic Biology, 2009

A simple characterization of the minimal obstruction sets for three-state perfect phylogenies

Applied Mathematics Letters, 2012

A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed

We present an algorithm for determining whether a set of species, described by the characters they exhibit, has a perfect phylogeny, assuming the maximum number of characters is fixed. This algorithm is simpler and faster than the known algorithms when the number of characters is at least 4. 1 Introduction. A fundamental problem in biology is that of inferring the evolutionary history of a set of species, each of which is specified by the set of traits or characters that it exhibits [6, 7, 10, 11]. Information about evolutionary history can ...

An efficient algorithm for testing the compatibility of phylogenies with nested taxa

Algorithms for Molecular Biology, 2017

Fast Local Search for Unrooted Robinson-Foulds Supertrees

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012

Graph triangulations and the compatibility of unrooted phylogenetic trees

Applied Mathematics Letters, 2011

Checking Phylogenetic Decisiveness in Theory and in Practice

Suppose we have a set $X$ consisting of $n$ taxa and we are given information from $k$ loci from which to construct a phylogeny for $X$. Each locus offers information for only a fraction of the taxa. The question is whether this data suffices to construct a reliable phylogeny. The decisiveness problem expresses this question combinatorially. Although a precise characterization of decisiveness is known, the complexity of the problem is open. Here we relate decisiveness to a hypergraph coloring problem. We use this idea to (1) obtain lower bounds on the amount of coverage needed to achieve decisiveness, (2) devise an exact algorithm for decisiveness, (3) develop problem reduction rules, and use them to obtain efficient algorithms for inputs with few loci, and (4) devise an integer linear programming formulation of the decisiveness problem, which allows us to analyze data sets that arise in practice.
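
The coloring view lends itself to a brute-force check on very small inputs. The sketch below assumes the formulation in which a collection of taxon sets fails to be decisive exactly when the taxa can be colored with four colors, all four in use, so that no locus contains taxa of all four colors (a "no-rainbow" coloring). That formulation is an assumption drawn from the decisiveness literature for unrooted trees, not from the abstract itself, and the name `is_decisive` is illustrative. The search is exponential in the number of taxa, so this is not the exact algorithm or the integer linear program the abstract refers to.

```python
from itertools import product


def is_decisive(taxa, loci):
    """Brute-force decisiveness test under the no-rainbow 4-coloring assumption.

    taxa: iterable of taxon names.
    loci: iterable of taxon subsets, one per locus.
    Returns False as soon as a surjective 4-coloring of the taxa is found
    in which no locus sees all four colors.
    """
    taxa = list(taxa)
    loci = [set(locus) for locus in loci]
    all_colors = {0, 1, 2, 3}
    for coloring in product(range(4), repeat=len(taxa)):
        if set(coloring) != all_colors:
            continue  # every one of the four colors must be used
        color_of = dict(zip(taxa, coloring))
        has_rainbow_locus = any(
            {color_of[t] for t in locus} == all_colors for locus in loci
        )
        if not has_rainbow_locus:
            return False  # found a no-rainbow coloring
    return True


# Hypothetical toy inputs on five taxa a..e.
full_coverage = is_decisive("abcde", ["abcde"])
two_overlapping_loci = is_decisive("abcde", ["abcd", "bcde"])
```

In the second toy input no locus contains both `a` and `e`, so coloring `a`, `b`, `e` with three distinct colors and `c`, `d` with the fourth leaves every locus with at most three colors, and the check reports that the coverage is not decisive.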

Testing the agreement of trees with internal labels

Algorithms for Molecular Biology

Background: A semi-labeled tree is a tree where all leaves as well as, possibly, some internal nod...

Testing the Agreement of Trees with Internal Labels

Bioinformatics Research and Applications

Improved Lower Bounds on the Compatibility of Multi-State Characters

ArXiv, 2012

We study a long-standing conjecture on the necessary and sufficient conditions for the compatibility of multi-state characters: There exists a function f(r) such that, for any set C of r-state characters, C is compatible if and only if every subset of f(r) characters of C is compatible. We show that for every r ≥ 2, there exists an incompatible set C of ⌊r/2⌋·⌈r/2⌉ + 1 r-state characters such that every proper subset of C is compatible. Thus, f(r) ≥ ⌊r/2⌋·⌈r/2⌉ + 1 for every r ≥ 2. This improves the previous lower bound of f(r) ≥ r given by Meacham (1983), and generalizes the construction showing that f(4) ≥ 5 given by Habib and To (2011). We prove our result via a result on quartet compatibility that may be of independent interest: For every integer n ≥ 4, there exists an incompatible set Q of ⌊(n−2)/2⌋·⌈(n−2)/2⌉ + 1 quartets over n labels such that every proper subset of Q is compatible. We contrast this with a result on the compatibility of triplets: For every n ≥ 3, if R is an incompatible...
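
A short calculation makes the size of the improvement concrete. The sketch below assumes the bounds read ⌊r/2⌋·⌈r/2⌉ + 1 for r-state characters and ⌊(n−2)/2⌋·⌈(n−2)/2⌉ + 1 for quartets over n labels; the first reading agrees with the f(4) ≥ 5 case mentioned in the abstract. The function names are illustrative.

```python
def character_lower_bound(r):
    """Assumed lower bound on f(r): floor(r/2) * ceil(r/2) + 1, for r >= 2."""
    if r < 2:
        raise ValueError("the bound is stated for r >= 2")
    return (r // 2) * ((r + 1) // 2) + 1


def quartet_lower_bound(n):
    """Assumed size of the incompatible quartet set: floor((n-2)/2) * ceil((n-2)/2) + 1, for n >= 4."""
    if n < 4:
        raise ValueError("the bound is stated for n >= 4")
    m = n - 2
    return (m // 2) * ((m + 1) // 2) + 1


# Compare the quadratic bound with the earlier linear bound f(r) >= r.
comparison = [(r, r, character_lower_bound(r)) for r in range(2, 9)]
for r, linear_bound, quadratic_bound in comparison:
    print(f"r={r}: previous bound {linear_bound}, improved bound {quadratic_bound}")
```

The two bounds coincide for r = 2 and r = 3 and separate from r = 4 onward, where the improved bound grows roughly as r²/4.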

EvoZip: Efficient Compression of Large Collections of Evolutionary Trees

ArXiv, 2019

Phylogenetic trees represent evolutionary relationships among sets of organisms. Popular phylogenetic reconstruction approaches typically yield hundreds to thousands of trees on a common leafset. Storing and sharing such large collections of trees requires a considerable amount of space and bandwidth. Furthermore, the huge size of phylogenetic tree databases can make search and retrieval operations time-consuming. Phylogenetic compression techniques are specialized compression techniques that exploit redundant topological information to achieve better compression of phylogenetic trees. Here, we present EvoZip, a new approach for phylogenetic tree compression. On average, EvoZip achieves 71.6% better compression and takes 80.71% less compression time and 60.47% less decompression time than TreeZip, the current state-of-the-art algorithm for phylogenetic tree compression. While EvoZip is based on TreeZip, it betters TreeZip due to (a) an improved bipartition and support list encoding sch...
