
Gene


Saitou, N. and Nei, M. 1987. The neighbour-joining method: A new method for reconstructing
phylogenetic trees. Molecular Biology and Evolution 4: 406--425.

1. Molecular Ecology
Joanna R. Freeland
The Open University, Milton Keynes
Fleischer, McIntosh and Tarr (1998) superimposed these geological ages onto
phylogenetic trees to calibrate the rates of sequence divergence in several endemic
taxa. This provided them with molecular clocks of 1.9 per cent per million years
for the yolk protein gene in Drosophila, 1.6 per cent per million years for the
cytochrome b gene in Hawaiian honeycreeper birds (Drepanidinae), and a variable
rate of 2.4--10.2 per cent per million years for parts of the mitochondrial 12S and
16S rRNA and tRNA valine in Laupala crickets. The authors stressed that these
estimates were based on a number of assumptions, including the establishment of
populations very near to the time at which individual islands were formed, and
there having been very little subsequent movement between populations. The
surprisingly high rates for a ribosomal-RNA encoding gene that were calculated for
Laupala crickets suggested that, in these crickets at least, one or more of the
assumptions were not met.
There are two final points worth noting about molecular clocks. First, the rate at
which a sequence evolves is not necessarily constant through time; in some cases,
mutation rates are relatively rapid in newly diverged taxa but then slow down over
time (Mindell and Honeycutt, 1990). Second, although many of the estimates
presented in this section may appear very similar, a difference in mutation rates of
only 0.5 per cent per million years can have a significant impact on the estimated
timing of evolutionary events. If the sequences of two species diverged by 5 per
cent then this would translate into a 5-million-year separation according to a clock
of 1 per cent per million years, but a 10-million-year separation according to a clock
of 0.5 per cent per million years. Molecular clocks remain widespread in the
literature but are also highly contentious. In fact, some researchers have argued
that we may never achieve molecular clocks that are sufficiently reliable to allow us
to date past events (Graur and Martin, 2004). Molecular clocks should therefore be
interpreted with caution and ideally should be based on accurately dated geological
events or fossils, and be calibrated specifically for the gene region and taxonomic
group that is being studied.
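
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `divergence_time_my` is hypothetical, and it assumes, as the worked example in the text does, that the clock rate describes the percentage sequence divergence that accumulates between two lineages per million years.

```python
def divergence_time_my(percent_divergence, clock_percent_per_my):
    """Estimated time since two sequences diverged, in millions of years.

    Assumes a constant clock expressed as per cent sequence divergence
    between the two lineages per million years (a simplifying assumption).
    """
    if clock_percent_per_my <= 0:
        raise ValueError("clock rate must be positive")
    return percent_divergence / clock_percent_per_my


# The example from the text: 5 per cent divergence under two different clocks.
observed_divergence = 5.0
for clock in (1.0, 0.5):
    estimate = divergence_time_my(observed_divergence, clock)
    print(f"clock {clock}% per My -> {estimate} million years")
```

Halving the assumed rate doubles the estimated age, which is why a clock calibrated for one gene region or taxonomic group cannot safely be borrowed for another.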

Bifurcating Trees
One appeal of molecular clocks is that they are relatively easy to use once the
correct calibration has been done, but with a bit more work a great deal more
information on the evolutionary relationships of genetic lineages can be obtained
from DNA sequences through the reconstruction of phylogenies. Traditionally,
most phylogenetic inferences have been depicted in the form of hierarchical
bifurcating trees, in other words trees that reflect a series of branching processes
in which one lineage splits into two descendant lineages. These trees can be based
on morphological characters, although in this book we will limit our discussion to
phylogenetic trees that are inferred from genetic characters. The positioning of
organisms on a tree is generally based on their genetic similarity to one another.

Figure 5.1 A phylogeny of 13 dragonfly species based on the mitochondrial 12S ribosomal DNA
gene. First species names, and then family names, are shown to the right of the tree. Note that
congeneric species are closest together on the tree because they are genetically most similar to one
another. Adapted from Saux, Simon and Spicer (2003).

This is illustrated in Figure 5.1, which shows a tree that portrays the evolutionary
relationships of some dragonfly species, genera and families. Congeneric species
that diverged from a common ancestor relatively recently, such as Libellula
saturata and L. luctuosa, will be close to each other on the tree. Confamilial
genera, such as Libellula and Erythemis (Figure 5.2), are further apart on the tree
because their common ancestor was more remote, and members of different
families are even more widely spaced.
There are many different ways in which phylogenies can be reconstructed from
genetic data, but most of them fall into one of four categories: distance,
parsimony, likelihood and Bayesian methods. Note that the following discussion
will focus on the phylogenies of closely related populations and species, and the
limitations outlined below are not necessarily relevant to the phylogenies of more
distantly related taxa.
Distance methods are based on measures of evolutionary distinctiveness
between all pairs of taxa (Figure 5.3). These metrics may be calculated from the
number of nucleotide differences if based on DNA sequence data or from estimates
such as Nei's D (Chapter 4) if based on allele frequency data, such as that provided
by allozymes or microsatellites. There are many different algorithms that can be
used to reconstruct trees from genetic distances, the most common being the
neighbour-joining method (Saitou and Nei, 1987). Details of these various
methods are beyond the scope of this book; suffice it to say that the goal is to
build a tree that accurately reflects how much genetic change has occurred -- and
therefore roughly how much time has passed -- since lineages split from one another.
Because branch lengths reflect the evolutionary distance between two points on a
tree, this approach should ensure that neighbouring branches on a tree are occupied by
those lineages that have descended most recently from a common ancestor. When
applied to closely related lineages, distance-based trees may be poorly resolved
because a number of different lineages may be separated by the same distance, in
which case decisions as to which lineages should be closest to each other on the tree
are arbitrary.
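
The first step of any distance method, and the tie problem just described, can be sketched as follows. This is a minimal illustration rather than the neighbour-joining algorithm itself: the four short sequences are invented for the example, the function names are hypothetical, and distance is taken simply as the percentage of sites that differ.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Percentage of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return 100.0 * diffs / len(seq_a)


def distance_matrix(seqs):
    """Pairwise distances keyed by the unordered pair of taxon names."""
    return {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }


def closest_pairs(dist):
    """All pairs that share the smallest distance (more than one means a tie)."""
    smallest = min(dist.values())
    return sorted(tuple(sorted(pair)) for pair, d in dist.items() if d == smallest)


# Hypothetical aligned sequences for four closely related lineages.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTACGAAC",
    "D": "TCGAACGTTC",
}
dist = distance_matrix(seqs)
ties = closest_pairs(dist)
print(ties)
```

In this invented data set A is equally distant from B and from C, so the matrix alone gives no reason to group A with one rather than the other, which is the situation in which a distance-based tree is poorly resolved.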

Figure 5.3 A general distance method for reconstructing phylogenies. (a) The pairwise genetic
distances between species A--D are provided in a matrix format, with each number referring to the
percentage difference between a pair of species, e.g. the sequence from species A differs from that
of species B by 2%. (b) The genetic distances are then used to reconstruct a tree in which
species that are separated by the smallest genetic distances are grouped together. Note that the
branch lengths are proportional to the amount of genetic change that has occurred, and these add
up to the total genetic distances that are given in (a).

Figure 5.4 A maximum parsimony (MP) phylogenetic analysis based on the DNA sequences shown in
(a) of species a, b, c and d. Three possible trees are shown in (b). Vertical bars on branches represent
the mutations that must have occurred at particular sequence sites. The tree that requires six
mutations is more parsimonious than the trees that require seven mutations and therefore under MP
analysis would be considered the correct tree.

A maximum parsimony tree is the tree that contains the minimum number of
steps possible, in other words the smallest number of mutations that can explain
the distribution of lineages on the tree (Fitch, 1971; Figure 5.4). Parsimony is based
on Ockham's Razor, the principle proposed by William of Ockham in the 14th
century, which states that the best hypothesis for explaining a process is the one
that requires the fewest assumptions. A maximum parsimony tree will maximize
the agreement between characters on a tree. However, although intuitively
appealing, parsimony trees may remain unresolved if data are insufficiently
polymorphic, which is often the case in the recently diverged lineages that are
typically found within and among populations. The small number of mutational
changes that differentiate many conspecific haplotypes may mean that multiple,
equally parsimonious trees exist, once again leading to a situation in which it may
be impossible to determine which haplotypes should be adjacent to one another
on the tree.
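
Counting the minimum number of mutations that a given tree requires can be sketched with Fitch's (1971) procedure, applied site by site. The sequences below are invented for illustration and are not those of Figure 5.4; the function names and the nested-tuple representation of a tree are likewise assumptions made for this sketch.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a taxon name or a pair of subtrees (nested tuples);
    `states` maps each taxon name to its nucleotide at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed.
        return shared, left_cost + right_cost
    # No state in common: at least one additional mutation is required.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total minimum number of changes over all aligned sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical aligned sequences for four species.
seqs = {"a": "AAGCTC", "b": "AAGTTG", "c": "GTGCAC", "d": "GTACAG"}

# The three possible groupings of four taxa.
trees = {
    "(a,b),(c,d)": (("a", "b"), ("c", "d")),
    "(a,c),(b,d)": (("a", "c"), ("b", "d")),
    "(a,d),(b,c)": (("a", "d"), ("b", "c")),
}
scores = {label: parsimony_score(tree, seqs) for label, tree in trees.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Here one grouping needs fewer changes than the other two and would be preferred. With less variable data, two or more trees can return the same score, which is the unresolved situation described above.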
The third and fourth categories of phylogenetic analysis are maximum likelihood
(ML; Chapter 3) and Bayesian approaches, both of which are based on
specific models that describe the evolution of individual characters. Each model
will make a particular set of assumptions, for example that all nucleotide
substitutions are equally likely or, alternatively, that each nucleotide is replaced
by each alternative nucleotide at a particular rate. Models are typically complex;
for example, they can accommodate different rates of transitions and transversions,
and heterogeneous substitution rates, along a particular stretch of DNA. Once
the assumptions have been established, ML determines the probability that a
data set is best represented by a particular tree by calculating the likelihood of
each possible phylogenetic tree occurring within a specified evolutionary model.
Although similar in some respects, an important difference in
the more recently developed and increasingly popular Bayesian approach is that
it maximizes the probability that a particular tree is the correct one, given the
evolutionary model and the data that are being analysed (Huelsenbeck et al.,
2001). In both of these approaches all variable sites are informative, and these
methods can be powerful if the parameters of the model can be set with
confidence.
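
To make the idea of a model-based likelihood concrete, the sketch below uses the simplest assumption mentioned above, that all nucleotide substitutions are equally likely (the Jukes-Cantor model), and applies it to the smallest possible case of two sequences. It is an illustration only: the function names are hypothetical, the site and difference counts are invented, and real ML or Bayesian software searches over whole trees and richer models.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diffs):
    """Log-likelihood of a pairwise alignment under the Jukes-Cantor model.

    d is the evolutionary distance in expected substitutions per site.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    return (
        n_sites * math.log(0.25)
        + (n_sites - n_diffs) * math.log(p_same)
        + n_diffs * math.log(p_diff)
    )


def jc69_distance(n_sites, n_diffs):
    """Closed-form maximum-likelihood distance under Jukes-Cantor."""
    p = n_diffs / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def grid_search_distance(n_sites, n_diffs, step=0.0001, upper=2.0):
    """Find the distance with the highest likelihood by brute force."""
    best_d, best_ll = None, -math.inf
    for i in range(1, int(upper / step) + 1):
        d = i * step
        ll = jc69_log_likelihood(d, n_sites, n_diffs)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d


# Hypothetical alignment: 100 sites, 10 of which differ.
analytic = jc69_distance(100, 10)
searched = grid_search_distance(100, 10)
print(round(analytic, 4), round(searched, 4))
```

The brute-force search and the closed-form estimate agree, which shows what "maximizing the likelihood" means in this simple case. A Bayesian analysis would combine the same likelihood with prior probabilities to give the posterior probability of each candidate value or tree.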
Traditional phylogenetic analyses have been invaluable in evolutionary biology.
However, although bifurcating trees are appropriate for taxonomic groups at the
species level and beyond, which have experienced a period of reproductive
isolation long enough to allow for the fixation of different alleles, a hierarchical
bifurcating tree will not always be appropriate for population studies. This is partly
because, as outlined above, there may be insufficient polymorphism in comparisons
of conspecific sequences. In addition, bifurcating trees allow for neither the
co-existence of ancestors and descendants nor the rejoining of lineages through
hybridization or recombination (reticulated evolution), two processes that occur
commonly at the population level. As a result, traditional phylogenetic trees are
not always the most appropriate method for analysing the genealogies within and
among conspecific populations, and in these cases can result in poorly resolved
and sometimes misleading phylogenetic trees (Posada and Crandall, 2001). In
recent years, this limitation has provided the impetus for researchers to develop a
number of methods for phylogenetic analysis that are specifically tailored to
accommodate the similar sequences that often emerge from comparisons of
populations and closely related species.
