arXiv:1812.08569v1 [q-bio.PE] 20 Dec 2018
Counting and Enumerating Galled Networks
Andreas DM Gunawan∗, Jeyaram Rathin†, Louxin Zhang‡
December 21, 2018
Abstract
Galled trees are widely studied as a recombination model in population genetics.
This class of phylogenetic networks is generalized into galled networks by relaxing a
structural condition. In this work, a linear recurrence formula is given for counting 1-galled networks, which are galled networks satisfying the condition that each reticulate
node has only one leaf descendant. Since every galled network consists of a set of
1-galled networks stacked one on top of the other, a method is also presented to count
and enumerate galled networks.
1 Introduction
Phylogenetic networks have been used more and more frequently in evolutionary genomics
and population genetics in the past two decades [12, 18]. A rooted phylogenetic network
(RPN) is a rooted acyclic digraph in which all the sink nodes are of indegree 1 and there is
a unique source node called the root, where the former represent a set of taxa (e.g., species,
genes, or individuals in a population) and the latter represents the least common ancestor
of the taxa. Moreover, RPNs also satisfy the property that non-leaf and non-root nodes are
of either indegree 1 or outdegree 1; these nodes are called tree nodes and reticulate nodes,
respectively.
Imposing topological conditions on the network allows us to define different classes of
RPNs such as galled trees [13, 22], galled networks [15], tree-child networks [4], reticulation-visible networks [17] and tree-based networks [7, 24] (see also [21, 25]). A galled tree is a
binary RPN such that (i) for each reticulate node u, with the parents being denoted as p′ (u)
and p′′ (u), there are two edge-disjoint paths from the least common ancestor of p′ (u) and
p′′ (u) to u that contain only tree nodes except for u, and (ii) for any two reticulate nodes, the
paths in (i) do not overlap [22]. Later, galled networks were defined by Huson and Klöpper
to be RPNs that satisfy only Property (i) [15]. Reconstruction of galled networks has also
been studied in [16]. These network classes are of particular interest because they have
nice combinatorial properties. Moreover, some important NP-complete problems related to
phylogenetic trees and clusters can be solved in polynomial time when restricted to these
classes [1, 9, 10, 23].
∗ Department of Mathematics, National University of Singapore, Singapore 119076
† Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore - 641004, India. This work was done when he visited the National University of Singapore as an exchange student.
‡ Department of Mathematics, National University of Singapore, Singapore 119076. Email: matzlx@nus.edu.sg
In this paper, we investigate how many galled networks exist over a set of taxa. Phylogenetic trees are RPNs without any reticulate node. It is well known that (2n − 3)!! binary
phylogenetic trees exist over n taxa. However, counting becomes much harder for general
RPNs. For example, even counting RPNs with a couple of reticulate nodes is challenging [8].
Recent advances in counting have been made for tree-child networks [8, 19] and galled trees
[2, 20]. Here, we provide a linear recurrence formula for finding the number of 1-galled networks, which are galled networks such that each reticulate node has only one leaf descendant.
The formula is obtained through a connection between galled networks and leaf-multi-labeled
(LML) trees. Since a galled network is essentially a set of 1-galled networks stacked one on
top of the other in a tree-like structure, we also present a general method for counting and
enumerating galled networks. Although counting LML trees was investigated by Czabarka
et al. [5], our results are not derived from their study.
The rest of this paper is divided into five sections. Section 2 introduces some basic
notation that is necessary for our study. Section 3 establishes the fact that 1-galled networks
have a one-to-one correspondence with the so-called dup-trees. Section 4 presents a linear
recurrence formula for counting 1-galled networks. Section 5 examines how to count and
enumerate general galled networks. Section 6 concludes the study with a few remarks.
2 Basic notation
2.1 Phylogenetic networks
A binary RPN over a finite set of taxa X is an acyclic digraph such that:
• there is a unique node of indegree 0 and outdegree 2, called its root;
• there are exactly |X| nodes of indegree 1 and outdegree 0, called the leaves of the RPN,
each labeled with a unique taxon in X; and
• every node that is neither a leaf nor the root is either a reticulate node of indegree 2
and outdegree 1, or a tree node of indegree 1 and outdegree 2.
Three RPNs are illustrated in Figure 1, where each edge is directed away from the root and
edge orientation is not omitted. For a RPN N , we use V(N ) and A(N ) to denote its node
set and directed edge set, respectively.
Let u and v be two nodes of N . The node u is said to be a parent (resp. a child) of v
if (u, v) ∈ A(N ) (resp. (v, u) ∈ A(N )). Each reticulate node r has a unique child and we
denote this as c(r). Each tree node t has a unique parent and we denote this as p(t). In
general, u is an ancestor of v (or equivalently, v is below u) if there is a directed path from the
root of N to v that contains u.
A binary phylogenetic tree over a set of taxa X is simply a binary RPN containing no
reticulate nodes.
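The degree conditions above can be checked mechanically. The following is a minimal sketch, assuming a network is given as a list of (parent, child) pairs; the function name `classify_nodes` and the example network are hypothetical and are not taken from the paper. The sketch does not test acyclicity.

```python
from collections import defaultdict

def classify_nodes(edges):
    """Label every node of a digraph as root, leaf, reticulate or tree node.

    `edges` is a list of (parent, child) pairs. Raises ValueError if a node
    violates the degree conditions of a binary RPN. Acyclicity is NOT checked.
    """
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))

    kinds_by_degree = {(0, 2): "root", (1, 0): "leaf",
                       (2, 1): "reticulate", (1, 2): "tree"}
    kinds = {}
    for v in nodes:
        degree = (indeg[v], outdeg[v])
        if degree not in kinds_by_degree:
            raise ValueError(f"node {v!r} has illegal (indegree, outdegree) {degree}")
        kinds[v] = kinds_by_degree[degree]
    if sum(kind == "root" for kind in kinds.values()) != 1:
        raise ValueError("a binary RPN has exactly one root")
    return kinds

# Hypothetical network over {1, 2, 3}: reticulate node "r" has parents "a" and "b".
example = [("rho", "a"), ("rho", "b"), ("a", 1), ("a", "r"),
           ("b", "r"), ("b", 3), ("r", 2)]
kinds = classify_nodes(example)
```

A binary phylogenetic tree is then simply an input for which no node is classified as "reticulate".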
Figure 1: RPNs and trees over {1, 2, 3}, where reticulate and tree nodes are drawn as filled
triangles and open circles, respectively. (a) A binary RPN. (b) A binary galled network. (c)
A binary phylogenetic tree. (d) A rooted binary dup-tree, where the labels ‘1’ and ‘3’ are
duplicated labels.
A RPN is said to be galled if every reticulate node r has an ancestor ar such that there
are edge-disjoint paths from ar to r that do not contain any reticulate nodes other than r.
The RPN in Figure 1a is not galled but the one in Figure 1b is. By definition, every RPN
with only one reticulate node or without reticulate nodes is galled.
2.2 Dup-trees
In this work, we will count galled networks through the connection between galled networks
and the so-called LML trees.
A rooted (resp. unrooted) LML tree is a binary rooted (resp. unrooted) tree whose leaves
are labeled in such a way that several leaves may share a label. It is a dup-tree if each label is carried by at most two leaves. A rooted dup-tree is given in Figure 1d.
Here, a phylogenetic tree is considered to be a trivial rooted dup-tree. The child–parent and
ancestor–descendant relationships can be defined for nodes in a rooted dup-tree in the same
way as in a RPN.
Let M be a dup-tree over X. A taxon x ∈ X is said to be a duplicated label for M if two
distinct leaves labeled with x exist and a 1-label otherwise. L1 (M ) and L2 (M ) are used to
denote the subsets of the 1-labels and duplicated labels in M , respectively.
A cherry in a dup-tree is a pair of leaves that are adjacent to a common non-leaf node.
A cherry is said to be a twin-cherry if two leaves belonging to it are labeled with a common
taxon. A dup-tree is said to be twin-cherry-free if it does not contain any twin-cherries.
Let M be an unrooted LML tree over X, x ∈ X and e = (u, v) ∈ A(M ). Grafting a
new leaf x to e involves replacing e by a path consisting of two edges (u, p) and (p, v), and
attaching the leaf as the child of p, where p is a new node not in M . Conversely, for a leaf ℓ in M ,
its parent p(ℓ) is adjacent to two nodes x and y other than ℓ. Pruning ℓ from M means
removing ℓ and p(ℓ) together with their incident edges and then adding (x, y) as an edge to M . In this
work, we use M ⊕ (e, x) to denote the tree obtained from grafting x to e in M , or M ⊕ x
if omitting e causes no confusion. Similarly, M ⊖ ℓ is used to denote the tree obtained
from M by pruning ℓ for a leaf ℓ in M .
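The two operations are inverse to each other, which the following sketch illustrates. It assumes an unrooted tree is stored as a dictionary mapping each node identifier to the set of its neighbours; this representation and the names `graft` and `prune` are our own illustrative choices, and leaf labels (which may repeat in an LML tree) would be kept in a separate map from node identifiers to taxa.

```python
def graft(adj, edge, new_leaf, new_internal):
    """Return a copy of the tree with `new_leaf` grafted onto `edge` = (u, v).

    The edge is subdivided by the fresh node `new_internal`, to which the
    new leaf is attached.
    """
    u, v = edge
    if v not in adj[u]:
        raise ValueError("not an edge of the tree")
    out = {x: set(nbrs) for x, nbrs in adj.items()}
    out[u].remove(v)
    out[v].remove(u)
    out[new_internal] = {u, v, new_leaf}
    out[u].add(new_internal)
    out[v].add(new_internal)
    out[new_leaf] = {new_internal}
    return out

def prune(adj, leaf):
    """Return a copy of the tree with `leaf` and its neighbour p removed,
    and the two other neighbours of p joined by an edge."""
    (p,) = adj[leaf]
    x, y = adj[p] - {leaf}
    out = {n: set(nbrs) for n, nbrs in adj.items() if n not in (leaf, p)}
    out[x].discard(p)
    out[y].discard(p)
    out[x].add(y)
    out[y].add(x)
    return out

# The unrooted tree with three leaves, then a fourth leaf grafted onto edge (c, l3).
star = {"c": {"l1", "l2", "l3"}, "l1": {"c"}, "l2": {"c"}, "l3": {"c"}}
bigger = graft(star, ("c", "l3"), "l4", "p")
edge_count = sum(len(nbrs) for nbrs in bigger.values()) // 2
```

Here `bigger` has four leaves and 2 · 4 − 3 = 5 edges, and pruning the new leaf recovers the original tree.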
2.3 Decomposition of galled networks into tree-components
Consider a RPN N . Let R(N ) and L(N ) denote the sets of reticulate nodes and leaves in
N , respectively. The subnetwork N − (R(N ) ∪ L(N )) is a forest for which each connected
component consists of tree nodes. Each connected component is called a tree-component of
N [11, 25]. Note that each tree-component does not contain any leaves. This is different
from the definition of tree-components given in [10].
A reticulate node is inner if both its parents are in a common tree-component. It is a cross
reticulate node otherwise. Galled networks have the following recursive characterization.
Theorem 1 Let G be a galled network.
(1) Each reticulate node is inner in G.
(2) For any r ∈ R(G), G − {r} consists of two connected components, and the component
that contains all the descendants of r forms a galled subnetwork rooted at the child c(r) of r.
Proof. (1) It has been proven [10, Theorem 2] that a binary RPN is galled if and only if
every reticulate node is inner.
(2) Clearly, the statement follows from Part 1.
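The characterization used in the proof (a binary RPN is galled if and only if every reticulate node is inner) suggests a direct test. The sketch below assumes the network is given as a list of (parent, child) pairs of a valid binary RPN; the function names and both example networks are hypothetical illustrations of ours.

```python
from collections import defaultdict

def tree_components(edges):
    """Map each tree node (root included) to a representative of its
    tree-component, i.e. its connected component in N - (R(N) ∪ L(N))."""
    parents, children = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
        nodes.update((u, v))
    reticulate = {v for v in nodes if len(parents[v]) == 2}
    leaves = {v for v in nodes if len(children[v]) == 0}
    kept = nodes - reticulate - leaves
    component = {}
    for start in kept:
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:  # depth-first search restricted to the kept nodes
            x = stack.pop()
            for y in parents[x] + children[x]:
                if y in kept and y not in component:
                    component[y] = start
                    stack.append(y)
    return component, reticulate, parents

def is_galled(edges):
    """True if every reticulate node is inner (both parents in one tree-component)."""
    component, reticulate, parents = tree_components(edges)
    for r in reticulate:
        p1, p2 = parents[r]
        if p1 not in component or component[p1] != component.get(p2):
            return False
    return True

# One tree-component {rho, a, b}; the single reticulate node r is inner.
galled_example = [("rho", "a"), ("rho", "b"), ("a", 1), ("a", "r"),
                  ("b", "r"), ("b", 3), ("r", 2)]
# Two tree-components {rho, a, b, d} and {c}; r2 has parents c and d, so it is a cross reticulate node.
not_galled_example = [("rho", "a"), ("rho", "b"), ("a", "r1"), ("a", "d"),
                      ("b", "r1"), ("b", 3), ("r1", "c"), ("c", 1),
                      ("c", "r2"), ("d", "r2"), ("d", 4), ("r2", 2)]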
A RPN is a 1-galled network if it is a galled network with only one tree-component. The
RPN in Figure 1b is 1-galled. It is easy to derive the following facts from Theorem 1.
Corollary 1 Let N be a RPN.
(1) If there is only one tree-component in N , then N is galled.
(2) If every reticulate node has only one leaf descendant in N , then N is 1-galled.
3 Dup-trees and 1-galled networks
Let M be a dup-tree over X. Recall that L2 (M ) denotes the subset of duplicated labels. For
each x ∈ L2 (M ), we use ℓ′ (x) and ℓ′′ (x) to denote the two leaves that are labeled with x.
Let us assume that M is twin-cherry-free. We derive a RPN N (M ) by performing the following steps for each duplicated label x: (i) removing
ℓ′ (x) and ℓ′′ (x), (ii) introducing a reticulate node rx , (iii) connecting the parents p(ℓ′ (x)) and
p(ℓ′′ (x)) of the two removed leaves to rx , and (iv) attaching a leaf ℓx with the label x below rx .
Formally, N (M ) = (V̄ , Ā), where:
\[ \bar{V} = \left[ V(M) \setminus \{\ell'(x), \ell''(x) \mid x \in L_2(M)\} \right] \cup \{ r_x, \ell_x \mid x \in L_2(M) \}, \tag{1} \]
\[ \bar{A} = \left[ A(M) \setminus \{ (p(\ell'(x)), \ell'(x)),\ (p(\ell''(x)), \ell''(x)) \mid x \in L_2(M) \} \right] \cup \{ (p(\ell'(x)), r_x),\ (p(\ell''(x)), r_x),\ (r_x, \ell_x) \mid x \in L_2(M) \}. \tag{2} \]
If M is a phylogenetic tree, N (M ) is just M . If M is a dup-tree containing at least one
duplicated label, N (M ) is then a 1-galled network containing as many reticulate nodes as
the duplicated labels in M . This transformation from a LML tree to a network is called the
“folding” operation in [14].
Conversely, it is not hard to see that splitting each reticulate node in a 1-galled network
N results in a dup-tree M such that N (M ) = N . This proves the following statement:
Theorem 2 Let X be a finite set and r ≥ 1. There is a one-to-one correspondence between
• The binary twin-cherry-free dup-trees with r duplicated labels over X, and
• The binary 1-galled networks with r reticulate nodes.
Note that the 1-galled network in Figure 1b corresponds with the dup-tree in Figure 1d.
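The construction of N (M ) in Eqns. (1) and (2) can be sketched in a few lines. We assume here that a rooted dup-tree is given as a list of (parent, child) pairs together with a map from leaf identifiers to taxa; this input format, the function name `fold`, and the tuples used to name the new nodes rx and ℓx are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

def fold(edges, label):
    """Build the edge list of N(M) from a rooted twin-cherry-free dup-tree M.

    `edges`: (parent, child) pairs of M; `label`: dict leaf id -> taxon.
    For a duplicated taxon x the new nodes are named ("r", x) and ("leaf", x).
    """
    leaves_of = defaultdict(list)
    for leaf, taxon in label.items():
        leaves_of[taxon].append(leaf)
    parent = {child: par for par, child in edges}

    removed = set()
    new_edges = []
    for taxon, leaves in leaves_of.items():
        if len(leaves) > 2:
            raise ValueError("not a dup-tree: a label is used more than twice")
        if len(leaves) == 2:
            first, second = leaves
            if parent[first] == parent[second]:
                raise ValueError("the dup-tree contains a twin-cherry")
            removed.update(leaves)
            r, new_leaf = ("r", taxon), ("leaf", taxon)
            new_edges += [(parent[first], r), (parent[second], r), (r, new_leaf)]
    kept = [(par, child) for par, child in edges if child not in removed]
    return kept + new_edges

# Hypothetical dup-tree over {1, 2, 3} in which taxon 1 is duplicated.
M_edges = [("rho", "u"), ("rho", "v"), ("u", "a"), ("u", "b"), ("v", "c"), ("v", "d")]
M_label = {"a": 1, "b": 2, "c": 1, "d": 3}
network = fold(M_edges, M_label)
```

In this example the two leaves labeled 1 are replaced by a single reticulate node with parents u and v, giving a network with one reticulate node and three leaves.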
4 Counting 1-galled networks
Without loss of generality, we set [k] = {1, 2, ..., k}. We adopt the following notation:
• Tk is the set of phylogenetic trees over [k].
• UT k is the set of binary unrooted trees over [k].
• Di,k is the set of rooted dup-trees M over [k] such that M is twin-cherry-free and
L2 (M ) = [i], where 1 ≤ i ≤ k.
• UDi,k is the set of unrooted dup-trees M over [k] such that M is twin-cherry-free and
L2 (M ) = [i], where 1 ≤ i < k.
• Gi,k is the set of 1-galled networks over [k] that have exactly i reticulate nodes, whose
children are the leaves labeled with the elements of [i].
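Lemma 1 For any k ≥ 1,
\[ |\mathcal{T}_k| = |\mathcal{UT}_{k+1}| = (2k-3)!! = (2k-3) \times (2k-5) \times \cdots \times 3 \times 1, \tag{3} \]
\[ |\mathcal{G}_{i,k}| = |\mathcal{D}_{i,k}| = |\mathcal{UD}_{i,k+1}|, \quad i \le k. \tag{4} \]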
Proof. The first equation is well known (see [21, page 16]). Similarly, by Theorem 2, the
second equation is also true.
4.1 A recursive formula for |UDi,k | and |Gi,k |
It is well known that every binary unrooted tree over [k + 1] can be obtained from a unique
binary unrooted tree over [k] by inserting the leaf labeled with (k + 1) on an edge of the
latter. In this section, we generalize this fact to give a recurrence formula for |UDi,k | and
|Gi,k |.
As a warmup, we first count the dup-trees in UD1,k for k ≥ 2. For simplicity, we set
UD0,k = UT k . Let M ∈ UD1,k . Since M is twin-cherry-free, the leaves labeled with 1 are
not siblings and thus M can be partitioned into three parts (M1 , M2 , M3 ) as illustrated in
Figure 2. Pruning different leaves labeled with 1 from M results in different trees in UD0,k if
the middle subtree M3 is not empty, and the same tree otherwise. This suggests that grafting
an extra leaf labeled with 1 into every edge in each tree in UD0,k can generate every tree in
UD1,k twice. Note that if we graft the extra leaf into the edge incident to the original leaf
labeled with 1, we get a dup-tree in which two leaves of label 1 form a twin-cherry, which is
not in UD1,k .
Figure 2: A dup-tree M in UD1,k can be partitioned into the three subtrees M1 , M2 , M3 ,
where only M2 can be empty. Pruning different leaves labeled 1 from M results in two
distinct trees in UD0,k if M1 is non-empty, and an identical tree otherwise.
Since there are 2k − 4 edges that are not incident to the leaf labeled 1 in every unrooted
binary tree in UD0,k , we have:
\[ 2\,|\mathcal{UD}_{1,k}| = (2k-4) \cdot |\mathcal{UD}_{0,k}|. \]
Therefore, by Lemma 1, we obtain:
\[ |\mathcal{UD}_{1,k}| = (k-2) \cdot (2k-5)!!. \tag{5} \]
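As a quick check of Eqn. (5), take k = 5. By Lemma 1, the number of binary unrooted trees over [5] is 5!! = 15, and each of them has 2 · 5 − 4 = 6 edges that are not incident to the leaf labeled 1, so the double counting above reads:

```latex
\[
2\,|\mathcal{UD}_{1,5}| = (2\cdot 5-4)\cdot|\mathcal{UD}_{0,5}| = 6\cdot 15 = 90
\quad\Longrightarrow\quad
|\mathcal{UD}_{1,5}| = 45 = (5-2)\cdot(2\cdot 5-5)!! .
\]
```

This agrees with the entry for i = 1 and k = 5 in Table 1.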
In the rest of this section, we will focus on the case where i > 1. The analysis for this
case is more subtle than what we have done so far. Let M ∈ UDi,k , where i > 1.
First, we have to graft a leaf labeled with i into a twin-cherry in a dup-tree T over [k] such
that L2 (T ) = [i − 1] to get some dup-tree in UDi,k as illustrated in Figure 3. In the dup-tree
on the top in Figure 3, a leaf labeled with 3 is in the twin-cherry consisting of leaves labeled
1, whereas another is in the twin-cherry consisting of leaves labeled with 2. In this case, we
have the following fact.
Lemma 2 Let T be a dup-tree over [k] such that L2 (T ) = [i − 1], i ≤ k. If T contains a
unique twin-cherry, then grafting a leaf labeled with i into either edge in the twin-cherry will
produce the same tree in UDi,k .
Conversely, consider an unrooted dup-tree M ∈ UDi,k . For a non-leaf node u and a node
v that is adjacent to u, we use Mu (v) to denote the connected component containing v in
M − u and call it a subtree adjacent to u. The node u ∈ V(M ) is said to be a duplication
node if it is adjacent to two nodes v ′ and v ′′ such that Mu (v ′ ) and Mu (v ′′ ) are identical as
rooted trees; in other words, there is a mapping
f : V(Mu (v ′ )) → V(Mu (v ′′ ))
such that (i) f (v ′ ) = v ′′ , (ii) (x, y) ∈ A(Mu (v ′ )) if and only if (f (x), f (y)) ∈ A(Mu (v ′′ )) and
(iii) x is a leaf if and only if f (x) is a leaf labeled with the same taxon, where x, y ∈ V(Mu (v ′ )).
Mu (v ′ ) and Mu (v ′′ ) are called the conjugate subtrees of u if u is a duplication node. Two
edges that correspond to each other under f are also said to be conjugate.
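Duplication nodes can be detected by comparing canonical forms of the subtrees adjacent to a node. The sketch below is one possible way to do this, not a procedure from the paper: it assumes the unrooted dup-tree is stored as a dictionary from node identifiers to neighbour sets plus a map from leaf identifiers to taxa, and the names `canonical_form` and `duplication_nodes` are ours. Two adjacent subtrees receive the same canonical form exactly when a mapping f with properties (i) to (iii) exists.

```python
def canonical_form(adj, label, v, removed):
    """Canonical form of M_u(v): the subtree containing v after deleting u = `removed`."""
    below = [w for w in adj[v] if w != removed]
    if not below:
        return ("leaf", label[v])
    # Sorting by repr makes the form independent of the order of the two subtrees.
    return tuple(sorted((canonical_form(adj, label, w, v) for w in below), key=repr))

def duplication_nodes(adj, label):
    """Non-leaf nodes that are adjacent to two identical subtrees."""
    found = []
    for u, nbrs in adj.items():
        if len(nbrs) < 3:
            continue  # leaves cannot be duplication nodes
        forms = [canonical_form(adj, label, v, u) for v in nbrs]
        if len(set(forms)) < len(forms):
            found.append(u)
    return found

# Hypothetical dup-tree: leaves a, b (both labeled 1) form a twin-cherry at u.
adj = {"u": {"a", "b", "w"}, "w": {"u", "c", "d"},
       "a": {"u"}, "b": {"u"}, "c": {"w"}, "d": {"w"}}
label = {"a": 1, "b": 1, "c": 2, "d": 3}
dups = duplication_nodes(adj, label)
```

In this example the only duplication node is u, the non-leaf node of the twin-cherry.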
Figure 3: Grafting the second copy of Leaf 3 into either edge in the unique twin-cherry
(circled) in a dup-tree T (bottom) produces the same dup-tree (top), where L2 (T ) = {1, 2}.
Lemma 3 Let M be a dup-tree that may contain twin-cherries and L2 (M ) = [i].
(1) The non-leaf node in a twin-cherry is a duplication node.
(2) For different duplication nodes u and v in M , their conjugate subtrees are disjoint.
Proof. (1) This derives from the definition of a duplication node. (2) For different duplication nodes u and v, the conjugate subtrees associated with u contain leaves with labels that
are different from the labels appearing in the conjugate subtrees associated with v, as each
duplicated element labels exactly two leaves.
Lemma 2 can now be generalized as follows.
Lemma 4 Let M ∈ UDi,k , where i ≤ k, and let u be a duplication node of M with the
conjugate subtrees M ′ and M ′′ . Grafting the second leaf labeled with i + 1 into an edge e in
M ′ will produce the same tree as grafting the leaf in the edge conjugate to e in M ′′ .
Second, some unrooted dup-trees in UDi+1,k are generated three or four times by grafting
a new leaf labeled with i + 1 in a dup-tree. Specifically, we have the following fact:
Lemma 5 Let M ∈ UDi+1,k , i < k.
(1) If ℓ′ (i + 1) and ℓ′′ (i + 1) are in the conjugate subtrees of a duplication node in M , then
M ⊖ ℓ′ (i + 1) = M ⊖ ℓ′′ (i + 1), from which M can only be generated by grafting a leaf
labeled with i + 1 in a unique edge.
(2) If neither ℓ′ (i + 1) nor ℓ′′ (i + 1) is in the conjugate subtrees of any duplication node,
M ⊖ ℓ′ (i + 1) and M ⊖ ℓ′′ (i + 1) are different dup-trees in UDi,k if and only if p(ℓ′ (i + 1))
and p(ℓ′′ (i + 1)) are not adjacent.
(3) If ℓ′ (i + 1) is not in a conjugate subtree of any duplication node and if pruning ℓ′ (i + 1)
from M does not produce a new duplication node, then M can be obtained from M ⊖
ℓ′ (i + 1) by grafting a leaf labeled with i + 1 in a unique edge.
(4) If ℓ′ (i + 1) is not in a conjugate subtree of any duplication node but M ⊖ ℓ′ (i + 1) contains one duplication
node that is not a duplication node in M , then M can be obtained by grafting a leaf
labeled i + 1 in two different edges in M ⊖ ℓ′ (i + 1).
Figure 4: An unrooted dup-tree in UDi+1,k can be generated by grafting a leaf labeled with
i + 1 once (a), three times (b) and four times (c) from a dup-tree M such that L2 (M ) = [i]
and M contains at most one twin-cherry. Here, the circled subtrees consist of a duplication
node and the associated conjugate subtrees. (a) The right-hand unrooted dup-tree is in
UD3,5 and has two Leaves 3 in the conjugate subtrees of a duplication node. It can only
be generated by grafting a leaf labeled with 3 in a unique edge in a unique dup-tree on the left.
(b) Neither of the leaves labeled with 3 is in a conjugate subtree in the right-hand dup-tree. Pruning the left-hand leaf labeled with 3 gives a dup-tree (left) that contains a new
duplication node, but pruning the right-hand leaf labeled with 3 does not. Conversely, the right-hand
dup-tree can be generated from the left-hand dup-trees (in UD2,5 ) by grafting a Leaf 3
in three ways. (c) Neither of the leaves labeled with 4 is in the conjugate subtrees of a
duplication node in the right-hand dup-tree. But pruning each of the leaves gives a dup-tree (left) that contains a new duplication node. Conversely, the right-hand dup-tree can
be generated from two left-hand dup-trees (in UD3,5 ) by grafting a Leaf 4 four times.
By Lemma 5, an unrooted dup-tree in UDi+1,k can be generated by grafting a leaf labeled
with i + 1 in a unique dup-tree twice, in two different dup-trees twice, or in two dup-trees three
or four times, as illustrated in Figure 4.
We are now ready to establish a formula for |UDi+1,k |. We will use the following parameters:
• Ci,k : the set of unrooted dup-trees T over [k] such that L2 (T ) = [i] and T contains
exactly one twin-cherry.
• O1 : the number of unrooted dup-trees T in UDi+1,k such that two leaves with i + 1 are
in the conjugate subtrees of a duplication node.
• O3 : the number of unrooted dup-trees T in UDi+1,k such that the removal of one
leaf labeled with i + 1 gives a dup-tree with one more duplication node than T , but the
removal of the other does not change the duplication nodes.
• O4 : the number of unrooted dup-trees T in UDi+1,k such that the removal of either of
the leaves labeled with i + 1 gives a dup-tree with one more duplication node than T .
To generate all the unrooted dup-trees in UDi+1,k , we graft another leaf labeled with i + 1
in all but the edge incident to Leaf i + 1 in each dup-tree in UDi,k and in each edge of the
unique twin-cherry in each dup-tree in Ci,k . Since each dup-tree in UDi,k has k + i leaves
and 2(k + i) − 3 edges, we have the following identity:
\[ 2(k+i-2)\,|\mathcal{UD}_{i,k}| + 2\,|\mathcal{C}_{i,k}| = 2\,|\mathcal{UD}_{i+1,k}| - O_1 + O_3 + 2O_4. \tag{6} \]
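Lemma 6 Let i < k. We then have:
\[ |\mathcal{C}_{i,k}| = i \cdot |\mathcal{UD}_{i-1,k}|, \tag{7} \]
\[ O_1 = \sum_{1 \le d \le i} \binom{i}{d} \cdot (2d-1)!! \cdot |\mathcal{UD}_{i-d,k-d}|, \tag{8} \]
\[ O_3 + 2O_4 = \sum_{1 \le d \le i} \binom{i}{d} \cdot (2d-1)!! \cdot |\mathcal{UD}_{i-d,k-d+1}|. \tag{9} \]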
Proof. For any S ⊆ [k] of j elements, the dup-trees T over [k] such that L2 (T ) = S have a one-to-one correspondence with the dup-trees T ′ over [k] such that L2 (T ′ ) = [j]. The dup-trees
in Ci,k whose only twin-cherry consists of leaves labeled with i′ (i′ ≤ i) can be
generated by grafting Leaf i′ in the unique edge incident to Leaf i′ in every twin-cherry-free
dup-tree over [k] such that L2 = [i] − {i′ }. Taken together, these two facts imply Eqn. (7).
Note that O1 is equal to the number of dup-trees over [k] in which two Leaves i + 1
appear in the conjugate subtrees of a duplication node. We assume that T is such a dup-tree
over [k] and u is the duplication node whose conjugate subtrees contain leaves labeled with
i + 1. Let T ′ = T ⊖ {ℓ′ (i + 1), ℓ′′ (i + 1)}. T ′ is then a dup-tree over [k] − {i + 1} such that
L2 (T ′ ) = [i] and u remains as a duplication node in T ′ .
Conversely, let T ′′ be a dup-tree over [k] − {i + 1} such that L2 (T ′′ ) = [i]. If T ′′ contains
a duplication node u whose conjugate subtrees are of d leaves, then simultaneously grafting
two leaves labeled with i + 1 in each of the (2d − 2) pairs of conjugate edges in the conjugate
subtrees of u as well as in the two edges incident to u, we obtain (2d − 1) dup-trees over
[k] such that u is still a duplication node. Note that there are (2d − 3)!! rooted binary trees
with d leaves, and removing the conjugate subtrees from T ′′ and treating u as a leaf with a
new label generates a dup-tree D with (k − 1) − d + 1 labels, i − d of which are duplicated
labels. Summing over all possible d values from 1 to i, we obtain:
\[
\begin{aligned}
O_1 &= \sum_{1 \le d \le i} \#(d\text{-element subsets of } [i]) \cdot \bigl(1 + \#(\text{edges in a tree in } \mathcal{T}_d)\bigr) \cdot |\mathcal{T}_d| \cdot |\mathcal{UD}_{i-d,k-d}| \\
    &= \sum_{1 \le d \le i} \binom{i}{d} (2d-1) \cdot (2d-3)!! \cdot |\mathcal{UD}_{i-d,k-d}|.
\end{aligned}
\]
Therefore, Eqn. (8) holds.
To prove Eqn. (9), we let Pi,k (S) be the set of the unrooted dup-trees T over [k] in which
L2 (T ) = [i] and where there is a duplication node u whose conjugate subtrees have leaves
with labels in S for any S ⊆ [i] of d labels. Grafting a new leaf labeled with i + 1, ℓ′ (i + 1),
in each edge in a fixed conjugate subtree of u as well as an edge incident to u gives (2d − 1)
dup-trees T ′ such that T ′ ⊖ ℓ′ (i + 1) = T . Clearly, T contains one more duplication node
than such a T ′ . Note that the removal of the conjugate subtrees of u transforms each tree
in Pi,k (S) into a tree T ′′ over {u} ∪ [k] − S such that L2 (T ′′ ) = [i] − S if u is considered to
be a labeled leaf. Thus,
|Pi,k (S)| = (2d − 1) · (2d − 3)!! · |UDi−d,k−d+1 |.
Conversely, let T ∈ UDi+1,k . If T ⊖ ℓ′ (i + 1) and T ⊖ ℓ′′ (i + 1) both contain a new
duplication node compared with T , T can be generated by grafting from two different dup-trees in ∪S⊆[i] Pi,k (S). Therefore,
\[ \sum_{S \subseteq [i]} |P_{i,k}(S)| = \sum_{d=1}^{i} \binom{i}{d} \cdot (2d-1)!! \cdot |\mathcal{UD}_{i-d,k-d+1}| = O_3 + 2O_4. \]
This proves Eqn. (9).
By plugging Eqn. (7)–(9) into Eqn. (6), we obtain the following recursive formula for
|UDi+1,k |.
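Theorem 3 Let i < k and N_k^{(i)} = |UDi,k | = |Gi,k−1 |. We then have:
\[ N_k^{(i+1)} = (k+i-2)\,N_k^{(i)} + i\,N_k^{(i-1)} + \frac{1}{2} \sum_{1 \le d \le i} \binom{i}{d} (2d-1)!! \left[ N_{k-d}^{(i-d)} - N_{k-d+1}^{(i-d)} \right]. \tag{10} \]
Example 4.1 For k = 4, we have:
\[
\begin{aligned}
N_4^{(0)} &= (2 \times 4 - 5)!! = 3, \\
N_4^{(1)} &= (4-2)\,N_4^{(0)} = 6, \\
N_4^{(2)} &= 3N_4^{(1)} + N_4^{(0)} + \frac{1}{2}\left[ N_3^{(0)} - N_4^{(0)} \right] = 20, \\
N_4^{(3)} &= 4N_4^{(2)} + 2N_4^{(1)} + \frac{1}{2}\left\{ \binom{2}{1}\left[ N_3^{(1)} - N_4^{(1)} \right] + \binom{2}{2}\, 3!! \left[ N_2^{(0)} - N_3^{(0)} \right] \right\} = 87.
\end{aligned}
\]
Table 1 lists the values of N_k^{(i)} for i and k such that 0 ≤ i < k and 2 ≤ k ≤ 11.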
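The recurrence in Eqn. (10) is straightforward to evaluate by memoized recursion. The following is a minimal sketch under two assumptions of ours: the base case is N_k^{(0)} = (2k − 5)!! as in Example 4.1, and (−1)!! is taken to be 1 so that N_2^{(0)} = 1 as in Table 1. The function names are illustrative.

```python
from functools import lru_cache
from math import comb

def double_factorial(m):
    """m!! for odd m >= -1, using the convention (-1)!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

@lru_cache(maxsize=None)
def N(i, k):
    """N_k^{(i)} for 0 <= i < k and k >= 2, evaluated with Eqn. (10)."""
    if not (0 <= i < k and k >= 2):
        raise ValueError("N(i, k) is only evaluated for 0 <= i < k and k >= 2")
    if i == 0:
        return double_factorial(2 * k - 5)
    j = i - 1  # Eqn. (10) gives N^{(j+1)} in terms of smaller superscripts
    doubled = 2 * (k + j - 2) * N(j, k)
    if j >= 1:
        doubled += 2 * j * N(j - 1, k)
    for d in range(1, j + 1):
        doubled += comb(j, d) * double_factorial(2 * d - 1) * (
            N(j - d, k - d) - N(j - d, k - d + 1))
    assert doubled % 2 == 0  # the factor 1/2 must leave an integer
    return doubled // 2

row_k4 = [N(i, 4) for i in range(4)]
```

For k = 4 this reproduces the four values 3, 6, 20 and 87 of Example 4.1.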
Table 1: The values of N_k^{(i)} for 0 ≤ i < k and 2 ≤ k ≤ 11. Rows are indexed by i and columns by k.
i \ k  2  3   4      5        6            7               8                   9                     10                         11
0      1  1   3     15      105          945          10,395             135,135              2,027,025                 34,459,425
1      0  1   6     45      420        4,725          62,370             945,945             16,216,200                310,134,825
2      -  3  20    189    2,160       28,875         442,260           7,640,325            147,026,880              3,119,591,475
3      -  -  87    993   13,407      207,135       3,603,915          69,757,065          1,487,243,835             34,639,019,415
4      -  -   -  6,249   97,182    1,701,855      33,121,890         709,428,825         16,587,636,030            420,498,508,815
5      -  -   -      -  804,585   15,738,765     338,588,685       7,946,584,695        202,099,078,125          5,537,451,658,725
6      -  -   -      -        -  161,685,045   3,808,469,970      97,162,333,695      2,669,506,204,050         78,595,220,899,125
7      -  -   -      -        -            -  46,726,507,485   1,287,228,175,065     37,987,475,258,565      1,195,779,444,849,670
8      -  -   -      -        -            -               -  18,363,976,595,055    579,247,192,040,580     19,410,597,807,225,300
9      -  -   -      -        -            -               -                   -  9,420,991,174,195,960    334,803,875,697,765,000
10     -  -   -      -        -            -               -                   -                      -  6,114,381,201,716,870,000
4.2 A formula for counting 1-galled networks
A 1-galled network over [k] may contain 0 to k reticulate nodes. Since the 1-galled networks
with i reticulate nodes have a one-to-one correspondence with the rooted dup-trees over [k]
that have i duplicated labels, they have a one-to-one correspondence with the unrooted dup-trees over [k + 1] that have i duplicated labels. Therefore, we have the following theorem.
Theorem 4 Let G1 (k) denote the number of 1-galled networks over k taxa. We then have
\[ G_1(k) = \sum_{i=0}^{k} \binom{k}{i} N_{k+1}^{(i)}, \tag{11} \]
where N_{k+1}^{(i)} is defined in Eqn. (10).
Example 4.1 (con’t) By Theorem 4, the number of 1-galled networks on three taxa is:
\[ N_4^{(0)} + \binom{3}{1} N_4^{(1)} + \binom{3}{2} N_4^{(2)} + \binom{3}{3} N_4^{(3)} = 3 + 18 + 60 + 87 = 168. \]
All 34 topological structures of these 168 1-galled networks are drawn in Figure S1.
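Eqn. (11) is a single binomial-weighted sum, as the short sketch below shows. It assumes that the values N_{k+1}^{(0)}, ..., N_{k+1}^{(k)} are supplied by the caller, for instance copied from the column k + 1 of Table 1; the function name `count_1_galled` is ours.

```python
from math import comb

def count_1_galled(k, n_row):
    """G_1(k) from Eqn. (11); n_row[i] must hold N_{k+1}^{(i)} for i = 0, ..., k."""
    if len(n_row) != k + 1:
        raise ValueError("expected exactly k + 1 values")
    return sum(comb(k, i) * n_row[i] for i in range(k + 1))

# Column k + 1 = 4 of Table 1: N_4^{(0)}, N_4^{(1)}, N_4^{(2)}, N_4^{(3)}.
three_taxa = count_1_galled(3, [3, 6, 20, 87])
```

With the column for k + 1 = 4 this returns 168, the value obtained in Example 4.1.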
5 Counting general galled networks
5.1 Compression of galled networks
The technique of network decomposition was first introduced to study two algorithmic problems for RPNs in [10]. Recently, component-wise compression was formally investigated to
reveal the connection between several classes of RPNs [11]. Intuitively, compressing a RPN
N involves replacing every component in N with a node of degree 2 or more, thereby creating
a smaller network Ñ that summarizes the relationships among tree-components in N . The
node and edge sets of the compression network Ñ of N are rigorously defined as follows:
V(Ñ ) = L(N ) ∪ {vτ : τ is a tree- or reticulation component in N }, and
E(Ñ ) = {(vτ , ℓ) : ℓ ∈ L(N ) and p(ℓ) ∈ τ }
∪ {(vτ , vτ′ ) : there is (x, y) ∈ E(N ) such that x ∈ τ, y ∈ τ ′ },
where p(ℓ) denotes the parent of the leaf ℓ. The operation of network compression is illustrated in Figure 5.
In a galled network, each reticulate node is inner and thus both its parents are in a
common tree-component. Therefore, a tree-component becomes a node with at least two
children and each reticulate node becomes a node of indegree 1 and outdegree 1 after the tree-components are compressed. Thus, the compression of a galled network is a tree ([11], Theorem
3.1) (see Figure 5), implying that a galled network consists of a set of 1-galled networks
stacked one on the top of the other in a tree shape.
5.2 A counting method
We are now ready to count general galled networks over [k]. Let Ak be the set of rooted, not necessarily binary,
phylogenetic trees over [k] in which every non-leaf node has two or more children. Assume
that T ∈ Ak . For a non-leaf node v ∈ V(T ), we use clf (v) and cnlf (v) to denote the numbers
of leaf and non-leaf children of v in T , respectively, and define c(v) = clf (v) + cnlf (v). Clearly
c(v) is the number of the children of v in T .
Consider a binary galled network N over [k]. By Theorem 3.1 in [11], the compression
C(N ) of N is a tree over [k]. A node of indegree and outdegree 1 in C(N ) corresponds
one-to-one to a reticulate node in N , whereas a tree node with two or more children in
C(N ) corresponds one-to-one to a tree-component in N . For convenience, we suppress all
the nodes of indegree and outdegree 1 in C(N ) to get a rooted tree C ′ (N ) ∈ Ak . Clearly, the
tree-components of N are still in one-to-one correspondence with the tree nodes in C ′ (N ). By
reverse-engineering this process, we can enumerate and count general galled networks over
[k], as all possible general rooted trees over [k] can be enumerated and counted recursively
[6].
Theorem 5 Let G(n) be the number of galled networks over n taxa. We then have:
\[ G(n) = \sum_{T \in \mathcal{A}_n} \ \prod_{v \in I(T)} \ \sum_{j = c_{nlf}(v)}^{c(v)} \binom{c_{lf}(v)}{c(v)-j} N_{c(v)+1}^{(j)}, \tag{12} \]
where I(T ) denotes the set of non-leaf nodes in T and N_{c(v)+1}^{(j)} is defined in Eqn. (10).
Figure 5: Illustration of the network compression operation. (a) A galled network over [8].
It has four tree-components. (b) The compression of the network in (a). It is a rooted tree
in which each reticulate node becomes a node of indegree 1 and outdegree 1.
Proof. Let T ∈ An such that C ′ (N ) = T for some galled network N . Consider a non-leaf node v in T . We first consider how to reconstruct the tree-component σ of N that
corresponds to v. Let R denote the set of reticulate nodes in N whose parents are both in
σ. For each child u of v that is a non-leaf, the root of the tree-component corresponding
to u must be a child of a reticulate node in R. However, for each child ℓ of v that is a
leaf, the parent of ℓ in N may be a tree node in σ or a reticulate node in R. Therefore,
cnlf (v) ≤ |R| ≤ c(v) and R has \(\binom{c_{lf}(v)}{|R| - c_{nlf}(v)}\) possibilities. For each possible selection of R,
σ corresponds to a 1-galled network with c(v) leaves and |R| reticulate nodes. Thus, the number of choices for the
component σ corresponding to v is
\[ \sum_{j = c_{nlf}(v)}^{c(v)} \binom{c_{lf}(v)}{c(v)-j} N_{c(v)+1}^{(j)}. \]
Since the reconstructions of two distinct tree-components in N are independent of
each other, the number of galled networks whose compression corresponds to T is
\[ \prod_{v \in I(T)} \ \sum_{j = c_{nlf}(v)}^{c(v)} \binom{c_{lf}(v)}{c(v)-j} N_{c(v)+1}^{(j)}. \]
Hence, the theorem follows.
The numbers G(n) of galled networks over n taxa were calculated according to Theorem 5
and are listed in Table 2. For example, G(3) = 240. This implies that there are 240−168 = 72
galled networks with two tree-components over three taxa, the topological structures of which
are listed in Figure S2.
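The sum in Eqn. (12) can be evaluated without listing the trees one by one, because the weight of the root depends only on how its children split the leaf set. The sketch below is one way to organize this and is not the authors' implementation: the children of the root correspond to the blocks of a set partition of the n leaves into at least two blocks, singleton blocks being leaf children and larger blocks being non-leaf children whose subtrees are counted recursively. The names `TABLE1`, `node_weight`, `G` and `_sum_over_blocks` are ours, and only the entries of Table 1 needed for n ≤ 4 are copied in.

```python
from functools import lru_cache
from math import comb

# Values N_k^{(j)} copied from Table 1, keyed by (j, k); sufficient for n <= 4.
TABLE1 = {
    (0, 3): 1, (1, 3): 1, (2, 3): 3,
    (0, 4): 3, (1, 4): 6, (2, 4): 20, (3, 4): 87,
    (0, 5): 15, (1, 5): 45, (2, 5): 189, (3, 5): 993, (4, 5): 6249,
}

def node_weight(c_lf, c_nlf):
    """Inner sum of Eqn. (12) for a node with c_lf leaf and c_nlf non-leaf children."""
    c = c_lf + c_nlf
    return sum(comb(c_lf, c - j) * TABLE1[(j, c + 1)] for j in range(c_nlf, c + 1))

@lru_cache(maxsize=None)
def G(n):
    """Number of galled networks over n taxa, following Eqn. (12)."""
    if n == 1:
        return 1  # value listed in Table 2
    return _sum_over_blocks(n, 0, 0, n)

@lru_cache(maxsize=None)
def _sum_over_blocks(m, c_lf, c_nlf, n):
    """Distribute the m remaining leaves below the root into further child blocks."""
    if m == 0:
        return node_weight(c_lf, c_nlf)
    total = 0
    for size in range(1, m + 1):      # size of the block holding the smallest remaining leaf
        if size == n:
            continue                  # the root must have at least two children
        ways = comb(m - 1, size - 1)  # choices for the other members of that block
        if size == 1:
            total += ways * _sum_over_blocks(m - 1, c_lf + 1, c_nlf, n)
        else:
            total += ways * G(size) * _sum_over_blocks(m - size, c_lf, c_nlf + 1, n)
    return total

small_values = [G(n) for n in range(1, 5)]
```

For n = 3 the star tree contributes 168 and each of the three binary trees contributes 4 · 6 = 24, giving 240 as stated above. Larger n would require more columns of Table 1 or an implementation of Eqn. (10).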
Table 2: The values of G(n) for 1 ≤ n ≤ 10.
n    G(n)
1    1
2    6
3    240
4    20,502
5    2,868,990
6    589,130,280
7    167,357,180,970
8    63,356,654,623,500
9    31,092,212,800,634,500
10   19,327,089,427,089,400,000
6 Conclusion
We have presented a linear recurrence formula for counting all possible 1-galled networks
and a method for counting and enumerating general galled networks. We conclude the study
with a couple of remarks.
First, using the same counting technique as in Section 4.1, we can derive the following
recurrence formula for the number of all unrooted dup-trees T such that L2 (T ) = [i], denoted
by B_k^{(i)}:
\[ B_k^{(0)} = (2k-5)!!, \qquad B_k^{(1)} = (k-1) \cdot (2k-5)!!, \]
\[ B_k^{(i+1)} = (i+k-1)\,B_k^{(i)} + \frac{1}{2} \sum_{1 \le d \le i} \binom{i}{d} \cdot (2d-1)!! \cdot \left[ B_{k-d}^{(i-d)} - B_{k-d+1}^{(i-d)} \right]. \tag{13} \]
Second, galled networks form a subclass of reticulation-visible networks. We therefore
pose counting reticulation-visible networks as an open question.
Acknowledgements
The authors thank HW Yan and Jonathan M. Woenardi for participating in discussions on
this work. This work was supported by Singapore Ministry of Education Academic Research
Fund Tier-1 [grant R-146-000-238-114] and National Research Fund [grant NRF2016NRFNSFC001-026].
References
[1] Bordewich, M., Semple, C.: Reticulation-visible networks. Adv. Applied Math. 78,
114–141 (2016)
[2] Bouvel, M., Gambette, P., Mansouri, M.: Counting level-k phylogenetic networks. In
preparation (2018)
[3] Cardona, G., Llabrés, M., Rosselló, F., Valiente, G.: Metrics for phylogenetic networks
I: Generalizations of the Robinson-Foulds metric. IEEE/ACM Trans. Comput. Biol.
Bioinform. 6(1), 46–61 (2009)
[4] Cardona, G., Rosselló, F., Valiente, G.: Comparison of tree-child phylogenetic networks.
IEEE/ACM Trans. Comput. Biol. Bioinform. 6(4), 552–569 (2009)
[5] Czabarka É, Erdős PL, Johnson V., Moulton V.: Generating functions for multi-labeled
trees. Discrete Applied Math. 161, 107-117 (2013)
[6] Felsenstein, J.: Inferring Phylogenies. Sinauer Associates, Sunderland, MA, USA (2004)
[7] Francis, A.R., Steel, M.: Which phylogenetic networks are merely trees with additional
arcs? Syst. Biol. 64(5), 768–777 (2015)
[8] Fuchs, M., Gittenberger, B. and Mansouri, M.: Counting phylogenetic networks
with few reticulation vertices: Tree-child and normal networks. arXiv preprint
arXiv:1803.11325 (2018)
[9] Gambette, P., Gunawan, A.D., Labarre, A., Vialette, S., Zhang, L.: Locating a tree in
a phylogenetic network in quadratic time. In: Proc. Int’l Confer. on Res. in Comput.
Mol. Biol. (RECOMB), pp. 96–107. Springer, New York (2015)
[10] Gunawan, A.D., DasGupta, B., Zhang, L.: A decomposition theorem and two algorithms for reticulation-visible networks. Inform. Comput. 252, 161–175 (2017)
[11] Gunawan, A.D., Yan, H., Zhang, L.: Compression of phylogenetic networks and algorithm for the tree containment problem. J. Comput. Biol. (in press). ArXiv preprint
arXiv:1806.07625 (2018)
[12] Gusfield, D.: ReCombinatorics: the Algorithmics of Ancestral Recombination Graphs
and Explicit Phylogenetic Networks. MIT Press, Boston, USA (2014)
[13] Gusfield, D., Eddhu, S., Langley, C.: The fine structure of galls in phylogenetic networks.
INFORMS J. Comput. 16(4), 459–469 (2004)
[14] Huber KT, Moulton V.: Phylogenetic networks from multi-labelled trees. J. Math. Biol.
52, 613–632 (2006)
[15] Huson, D.H., Klöpper, T.H.: Beyond galled trees–decomposition and computation of
galled networks. In: Proc. Int’l Confer. on Res. in Comput. Mol. Biol. (RECOMB), pp.
211–225. Springer, New York, USA (2007)
[16] Huson, D.H., Rupp, R., Berry, V., Gambette, P., Paul, C.: Computing galled networks
from real data. Bioinformatics 25(12), i85–i93 (2009)
[17] Huson, D.H., Rupp, R., Scornavacca, C.: Phylogenetic networks: Concepts, Algorithms
and Applications. Cambridge University Press, Cambridge, UK (2010)
[18] Jain R, Rivera MC, Lake JA.: Horizontal gene transfer among genomes: the complexity
hypothesis. Proc. Nat’l Acad. Sc. U.S.A. 96, 3801–3806 (1999)
[19] McDiarmid, C., Semple, C. and Welsh, D.: Counting phylogenetic networks. Annals
Combin. 19, 205–224 (2015)
[20] Semple, C. and Steel, M.: Unicyclic networks: compatibility and enumeration.
IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 84–91 (2006)
[21] Steel, M.: Phylogeny: Discrete and Random Processes in Evolution. SIAM, Philadelphia, USA (2016)
[22] Wang, L., Zhang, K., Zhang, L.: Perfect phylogenetic networks with recombination. J.
Comput. Biol. 8(1), 69–78 (2001)
[23] Yan, H., Gunawan, A.D., Zhang, L.: S-cluster++: a fast program for solving the cluster
containment problem for phylogenetic networks. Bioinformatics 34(17), i680–i686
[24] Zhang, L.: On tree-based phylogenetic networks. J. Comput. Biol. 23(7), 553–565
(2016)
[25] Zhang, L.: Clusters, trees and phylogenetic network classes. In T. Warnow (ed.): Bioinformatics and Phylogenetics: Seminal Contributions of Bernard Moret, Springer, New
York (2019)
Figure S1: The 34 topological structures of the 168 1-galled networks over three taxa.
Figure S2: The 16 topological structures of the 72 galled networks over three taxa that have
two tree-components.