Finding a sequence of transpositions that transforms a given permutation into the identity permutation and is of the shortest possible length is an important problem in bioinformatics. Here, a transposition consists in exchanging two contiguous intervals of the permutation. Bafna and Pevzner introduced the cycle graph as a tool for working on this problem. In particular, they took advantage of the decomposition of the cycle graph into so-called alternating cycles. Later, Hultman raised the question of determining the number of permutations with a cycle graph containing a given quantity of alternating cycles. The resulting number is therefore similar to the Stirling number of the first kind. We provide an explicit formula for computing what we call the Hultman numbers, and give a few numerical values. We also derive formulae for related cases, as well as for a much more general problem. Finally, we indicate a counting result related to another operation on permu...
The prefix exchange distance of a permutation is the length of its shortest factorisation into transpositions that all contain 1. Using a probabilistic approach, we obtain expressions for the mean and the variance, and prove the asymptotic normality of the distribution of this distance for a random permutation verifying the Ewens sampling formula. Analogous results in the uniform setting follow as simple corollaries.
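For small n, the distance defined above can be checked by exhaustive search. The sketch below is a brute-force illustration only, not the probabilistic method of the paper. The function name `prefix_exchange_distances` is hypothetical, and a transposition containing 1 is modelled here as a swap of the first entry with the entry at another position.

```python
from collections import deque


def prefix_exchange_distances(n):
    """Return a dict mapping every permutation of 1..n (as a tuple)
    to its prefix exchange distance, computed by breadth-first search.

    Assumption: a move swaps the entry at position 0 with the entry at
    some other position. Each move is its own inverse, so distances
    measured from the identity equal sorting distances.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(1, n):
            nxt = list(perm)
            nxt[0], nxt[i] = nxt[i], nxt[0]  # prefix exchange with position i
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    d = prefix_exchange_distances(4)
    mean = sum(d.values()) / len(d)
    print(f"n=4: mean distance {mean:.3f}, maximum {max(d.values())}")
```

The search visits all n! permutations, so it is only practical for very small n; it is meant as a sanity check against closed-form expressions such as the mean and variance mentioned above.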
In this paper, we study the problem of sorting unichromosomal linear genomes by prefix double-cut-and-joins (or DCJs) in both the signed and the unsigned settings. Prefix DCJs cut the leftmost segment of a genome and any other segment, and recombine the severed endpoints in one of two possible ways: one of these options corresponds to a prefix reversal, which reverses the order of elements between the two cuts (as well as their signs in the signed case). Depending on whether we consider both options or reversals only, our main results are: (1) new structural lower bounds based on the breakpoint graph for sorting by unsigned prefix reversals, unsigned prefix DCJs, or signed prefix DCJs; (2) a polynomial-time algorithm for sorting by signed prefix DCJs, thus answering an open question in [8]; (3) a 3/2-approximation for sorting by unsigned prefix DCJs, which is, to the best of our knowledge, the first sorting by prefix rearrangements problem that admits an approximation ratio strictly smaller than 2 (with the obvious exception of the polynomial-time solvable problems); and finally, (4) an FPT algorithm for sorting by unsigned prefix DCJs parameterised by the number of breakpoints in the genome.
We study two problems motivated by computational biology: genome rearrangements, which under some assumptions can be recast as the problem of sorting a permutation (therefore viewed as a linear ordering) using as few allowed moves as possible, and the construction of haplotype networks, which generalise haplotype trees in that they allow multiple paths between species. Our main contributions are:
• new upper bounds and formulae for computing the exact transposition distance of many permutations (a problem of unknown ...
This paper reports on the use of the FO (·) language and the IDP framework for modeling and solving some machine learning and data mining tasks. The core component of a model in the IDP framework is an FO (·) theory consisting of formulas in first order logic and definitions; the latter are basically logic programs where clause bodies can have arbitrary first order formulas. Hence, it is a small step for a well-versed computer scientist to start modeling. We describe some models resulting from the collaboration between IDP ...
We initiate the study of sorting permutations using prefix block-interchanges, which exchange any prefix of a permutation with another non-intersecting interval. The goal is to transform a given permutation into the identity permutation using as few such operations as possible. We give a 2-approximation algorithm for this problem, show how to obtain improved lower and upper bounds on the corresponding distance, and determine the largest possible value for that distance.

2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms

The problem of transforming two sequences into one another using a specified set of operations has received a lot of attention in the last decades, with applications in computational biology as (genome) rearrangement problems [13] as well as interconnection network design [21]. In the context of permutations, it can be equivalently formulated as follows: given a permutation π of [n] = {1, 2, ..., n} and a generating set S (also consisting of permutations of [n]), find a minimum-length sequence of elements from S that sorts π. The problem is known to be NP-hard in general [15] and W[1]-hard when parameterised by the length of a solution [6], but some families of operations that are important in applications lead to problems that can be solved in polynomial time (e.g. exchanges [17], block-interchanges [10] and signed reversals [14]), while other families yield hard problems that admit good approximations (e.g. 11/8 for reversals [3] and for block-transpositions [12]). Several restrictions of these families have also been studied, one of which stands out in the field of interconnection network design: the so-called prefix constraint, which forces operations to act on a prefix of the permutation rather than on an arbitrary interval. Those restrictions were introduced as a way of reducing the size of the generated network while maintaining a low value for its diameter, thereby guaranteeing a low maximum communication delay [21]. The most famous example is perhaps the restriction of reversals (which reverse the order of elements along an interval) to prefix reversals, and the corresponding problem known as pancake flipping, introduced in [16] and whose complexity was only settled thirty years later [5]. As Table 1 shows (see [13] for undefined terms), although sorting problems using interval transformations are now fairly well understood, progress on the corresponding prefix sorting problems has been lacking, with only two families whose status has been settled and no approximation ratio smaller than 2 for those problems not known to be in P. As a result, while the topology of the Cayley graph generated by those operations might present attractive properties, efficient routing algorithms (which achieve exactly the same task as the sorting algorithms in genome rearrangements) are still needed for the network to be of practical interest.
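To make the operation concrete, here is a minimal sketch of a prefix block-interchange and of an exhaustive search for the corresponding distance on very small permutations. It is a brute-force illustration under stated assumptions, not the 2-approximation algorithm of the paper: the names `prefix_block_interchange` and `pbi_distances` and the index convention (0-based slices with 0 < i <= j < k <= n) are hypothetical choices made for this example.

```python
from collections import deque


def prefix_block_interchange(perm, i, j, k):
    """Exchange the prefix perm[:i] with the interval perm[j:k].

    Assumes 0 < i <= j < k <= len(perm), so both blocks are non-empty
    and do not intersect; the elements perm[i:j] between them stay put.
    """
    return perm[j:k] + perm[i:j] + perm[:i] + perm[k:]


def pbi_distances(n):
    """Breadth-first search from the identity over all permutations of 1..n.

    Undoing a prefix block-interchange is again a prefix block-interchange,
    so distances from the identity equal sorting distances.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(1, n):
            for j in range(i, n):
                for k in range(j + 1, n + 1):
                    nxt = prefix_block_interchange(perm, i, j, k)
                    if nxt not in dist:
                        dist[nxt] = dist[perm] + 1
                        queue.append(nxt)
    return dist


if __name__ == "__main__":
    d = pbi_distances(5)
    print("largest distance for n=5:", max(d.values()))
```

Because the search enumerates all n! permutations, it is only useful for checking bounds on small cases.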

We study the problem of computing the minimal number of adjacent, non-intersecting block interchanges required to transform a permutation into the identity permutation. In particular, we use the graph of a permutation to compute that number for a particular class of permutations in linear time and space, and derive a new tight upper bound on the so-called transposition distance.
Sorting by transpositions consists in finding a shortest sequence of interval displacements that sorts a given permutation; the length of such a sequence is referred to as the permutation's distance (to the identity permutation). The computational complexity of the sorting problem, as well as that of computing the distance, is unknown, as is the largest value that this distance can reach. We present a novel approach that allows us to give a simple characterisation of a class of permutations whose distance can be computed in linear time and space, and give a new general upper bound on the distance of any permutation. Modifying the structure of the aforementioned permutations allows us to derive other tractable classes, which we also describe.
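As an illustration of the definition above, the following sketch applies a transposition (an exchange of two adjacent intervals) and computes the transposition distance of every permutation of a small size by exhaustive search. This is a brute-force check, not the linear-time approach described in the abstract; the function names and the 0-based index convention 0 <= i < j < k <= n are assumptions made for the example.

```python
from collections import deque


def transposition(perm, i, j, k):
    """Exchange the adjacent intervals perm[i:j] and perm[j:k].

    Assumes 0 <= i < j < k <= len(perm), so both intervals are non-empty.
    """
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def transposition_distances(n):
    """Breadth-first search from the identity over all permutations of 1..n.

    The inverse of a transposition is a transposition, so distances from
    the identity equal sorting distances.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = transposition(perm, i, j, k)
                    if nxt not in dist:
                        dist[nxt] = dist[perm] + 1
                        queue.append(nxt)
    return dist


if __name__ == "__main__":
    d = transposition_distances(4)
    print("distance of (4, 3, 2, 1):", d[(4, 3, 2, 1)])
```

The search space grows as n!, which is why tractable classes and general upper bounds of the kind described above matter in practice.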
Genome rearrangement is the field in bioinformatics that studies and models how a collection of genomes evolves. Its central problem is generally expressed in the following way: given two (or more) genomes, find a shortest sequence of mutations that transforms one into the other. Several models have been proposed, which differ either in the kind(s) of mutations taken into account or in the way genomes are represented, depending on the biological assumptions that can be made. We review some of these models and known results, explain how computers have already helped in this area and suggest some further possible uses for them.
Genome rearrangement problems are concerned, from a combinatorial point of view, with sorting ordered sets of elements in as few moves as possible, using a prescribed set of operations. The nature of those ordered sets may vary, but a large part of the literature is concerned with permutations or signed permutations, possibly of multisets. A traditional tool that has been the basis of most algorithmic and other theoretical results in the field is known as the ``breakpoint graph'', a bicoloured graph that models both our goal and our present situation and whose decomposition into alternating cycles yields extremely good bounds for those kinds of problems. In this talk, we present results that were obtained using tools that are best known to mathematicians, namely the disjoint cycle decomposition of permutations. More precisely, we will show that the decomposition can be used both as a means to obtain bounds on one of the aforementioned sorting problems, whose complexity is still unknown, and as a way to solve a counting problem related to the structure of the breakpoint graph.
In 2005 Cassens, Mardulyn and Milinkovitch proposed a new method for constructing phylogenetic networks in the context of intraspecific genealogies, also known as ``haplotype networks''. The proposed method, called ``Union of Maximum Parsimonious Trees'', is based on the global maximum parsimony approach, which aims at combining all most parsimonious trees into a single graph (in that context, trees are unrooted and undirected). However, their algorithm makes a number of arbitrary choices, produces solutions whose quality depends on the order in which the merging process is performed, and is a heuristic with an unclear objective function. We propose a combinatorial optimisation problem that can be used as a formal model for building such a graph, which consists in finding the minimum common supergraph of a given set of partially labelled trees. We propose a polynomial-time algorithm for solving the problem on two trees of a certain class, and a branch-and-bound algorithm in the case of two arbitrary trees. We will also discuss possible approaches when dealing with more than two trees.
Computing distances between permutations constitutes a topic with a number of applications, including interconnection network design and the study of genome rearrangements. Those distances are defined as the minimum number of moves needed to transform one permutation into the other (allowed transformations being constrained beforehand). In this talk, I will present a new formulation of a structure ubiquitous in the study of genome rearrangements, known as the ``cycle graph'' or ``breakpoint graph'' of a permutation, which recasts that structure as an even permutation. This new point of view allows one to restate every edit distance computation problem, as long as it deals with permutations and ``revertible'' rearrangements, in terms of particular factorisations of an(other) even permutation. I will show how this method makes it possible, on the one hand, to recover known results about genome rearrangements and to derive new ones, and on the other hand, to solve counting problems related to the breakpoint graph.
A number of fields, including genome rearrangements and interconnection network design, are concerned with sorting permutations in ``as few moves as possible'', using a given set of allowed operations. These often act on just one or two segments of the permutation, e.g. by reversing one segment or exchanging two segments. The \emph{cycle graph} of the permutation to sort is a fundamental tool in the theory of genome rearrangements. In this paper, we present an algebraic reinterpretation of the cycle graph as an even permutation, and show how to reformulate our sorting problems in terms of particular factorisations of the latter permutation. Using our framework, we recover known results in a simple and unified way, and obtain a new lower bound on the \emph{prefix transposition distance} (where a \emph{prefix transposition} displaces the initial segment of a permutation), which is shown to outperform previous results. Moreover, we use our approach to improve the best known lower bound on the \emph{prefix transposition diameter} from $2n/3$ to $\left\lfloor\frac{3n+1}{4}\right \rfloor$.