  • Review Article

Molecular phylogenetics: principles and practice

Key Points

  • The rapid accumulation of genome sequence data has made phylogenetics an indispensable tool to various branches of biology. However, it has also posed considerable statistical and computational challenges to data analysis.

  • Distance, parsimony, likelihood and Bayesian methods of phylogenetic analysis have different strengths and weaknesses. Although distance methods are good for large data sets of highly similar sequences, likelihood and Bayesian methods often have more power and are more robust, especially for inferring deep phylogenies.

  • Assessing phylogenetic uncertainty remains a difficult statistical problem.

  • Data partitioning may have an important influence on the phylogenetic analysis of genome-scale data sets.

  • Systematic biases, such as long-branch attraction, may be more important than random sampling errors in the analysis of genomic-scale data sets.
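As a minimal illustration of the distance methods mentioned above, the sketch below computes a pairwise distance under the Jukes and Cantor model, which corrects the observed proportion of differing sites for multiple substitutions at the same site. The function names and the example sequences are hypothetical and chosen only for illustration; real analyses would normally use richer substitution models.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of comparable aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where both sequences have an unambiguous nucleotide.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor distance (expected substitutions per site) from proportion p."""
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")  # hypothetical toy alignment
    print(p, jc69_distance(p))
```

The corrected distance always exceeds the raw proportion of differences, and the gap widens as sequences become more divergent.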

Abstract

Phylogenies are important for addressing various biological questions such as relationships among species or genes, the origin and spread of viral infection and the demographic changes and migration patterns of species. The advancement of sequencing technologies has taken phylogenetic analysis to a new height. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages that are now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use.

Figure 1: Markov models of nucleotide substitution.
Figure 2: The neighbour joining algorithm.
Figure 3: Long-branch attraction in theory and in practice.

Acknowledgements

We thank the three referees for their constructive comments and M. Hasegawa and B. Zhong for providing the seed-plant phylogenies of Fig. 3. Z.Y. is supported by a UK Biotechnology and Biological Sciences Research Council grant and a Royal Society Wolfson Research Merit Award. B.R. is supported by a US National Institutes of Health grant.

Author information

Corresponding author

Correspondence to Ziheng Yang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Ziheng Yang's homepage

Bruce Rannala's homepage

A comprehensive list of phylogenetic programs maintained by Joe Felsenstein

Nature Reviews Genetics article series on Study designs

Glossary

Systematics

The inference of phylogenetic relationships among species and the use of such information to classify species.

Taxonomy

The description, classification and naming of species.

Coalescent

The process of joining ancestral lineages when the genealogical relationships of a random sample of sequences from a modern population are traced back.

Gene trees

The phylogenetic or genealogical tree of sequences at a gene locus or genomic region.

Statistical phylogeography

The statistical analysis of population data from closely related species to infer population parameters and processes such as population sizes, demography, migration patterns and rates.

Species tree

A phylogenetic tree for a set of species that underlies the gene trees at individual loci.

Systematic errors

Errors that are due to an incorrect model assumption. They are exacerbated when the data size increases.

Random sampling errors

Errors or uncertainties in parameter estimates owing to limited data.

Cluster algorithm

An algorithm for assigning a set of individuals to groups (or clusters) so that objects in the same cluster are more similar to each other than to those in different clusters. Hierarchical cluster analysis can be agglomerative (starting with single elements and successively joining them into clusters) or divisive (starting with all objects and successively dividing them into partitions).

Markov chain

A stochastic sequence (or chain) of states with the property that, given the current state, the probabilities for the next state do not depend on the past states.
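A small sketch may make the Markov property concrete for nucleotide substitution. It assumes the Jukes and Cantor model, in which all substitutions occur at the same rate, and writes the transition probabilities as a function of the distance d (expected substitutions per site). The helper names `jc69_matrix` and `matmul` are hypothetical. Because the next state depends only on the current one, evolving over d1 and then d2 is equivalent to evolving over d1 + d2.

```python
import math


def jc69_matrix(d):
    """4x4 transition probability matrix under Jukes-Cantor for distance d."""
    decay = math.exp(-4.0 * d / 3.0)
    same = 0.25 + 0.75 * decay   # probability the nucleotide is unchanged
    diff = 0.25 - 0.25 * decay   # probability of each specific different nucleotide
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


if __name__ == "__main__":
    two_steps = matmul(jc69_matrix(0.1), jc69_matrix(0.2))
    one_step = jc69_matrix(0.3)
    # The two matrices agree up to floating-point rounding.
    print(two_steps[0][0], one_step[0][0])
```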

Transitions

Substitutions between the two pyrimidines (T↔C) or between the two purines (A↔G).

Transversions

Substitutions between a pyrimidine and a purine (T or C↔A or G).
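The two definitions above can be sketched as a short classifier. The function names and the toy sequences are hypothetical; sites containing gaps or ambiguity codes are simply skipped in the count, which is an assumption of this sketch rather than a rule from the article.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_substitution(x, y):
    """Return 'identical', 'transition' or 'transversion' for two nucleotides."""
    x, y = x.upper(), y.upper()
    if x not in NUCLEOTIDES or y not in NUCLEOTIDES:
        raise ValueError("expected one of A, C, G, T")
    if x == y:
        return "identical"
    # Same chemical class (both purines or both pyrimidines) means a transition.
    if (x in PURINES) == (y in PURINES):
        return "transition"
    return "transversion"


def count_ts_tv(seq_a, seq_b):
    """Count transitional and transversional differences between aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # skip gaps and ambiguity codes
        kind = classify_substitution(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


if __name__ == "__main__":
    print(count_ts_tv("ACGT", "GCTT"))
```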

Unrooted trees

Phylogenetic trees for which the location of the root is unspecified.

Long-branch attraction

The phenomenon of inferring an incorrect tree with long branches grouped together by parsimony or by model-based methods under simplistic models.

Likelihood ratio test

A general hypothesis-testing method that uses the likelihood to compare two nested hypotheses, often using the χ² distribution to assess significance.
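A likelihood ratio test can be sketched in a few lines. The test statistic is twice the difference in maximized log-likelihood between the more general and the simpler (nested) model, and it is compared with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. The log-likelihood values below are made up for illustration, and the helper `chi2_sf` covers only one and two degrees of freedom, for which closed-form tail probabilities exist.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between alternative and null models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or 2")


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for two nested models
    # that differ by one free parameter.
    stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-997.0)
    print(stat, chi2_sf(stat, df=1))
```

For more degrees of freedom, a statistics library would normally supply the χ² tail probability.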

Molecular clock

The hypothesis or observation that the evolutionary rate is constant over time or across lineages.

Prior distribution

The distribution assigned to parameters before the analysis of the data.

Posterior distribution

The distribution of the parameters (or models) conditional on the data. It combines the information in the prior and in the data (likelihood).
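The relationship between the prior, the likelihood and the posterior is given by Bayes' theorem. In the notation assumed here (the symbols are not taken from the article), θ denotes the parameters, X the data, f(θ) the prior and f(X | θ) the likelihood:

```latex
f(\theta \mid X) = \frac{f(\theta)\, f(X \mid \theta)}{\int f(\theta)\, f(X \mid \theta)\, \mathrm{d}\theta}
```

The denominator is a normalizing constant that does not depend on θ, so the posterior is proportional to the prior multiplied by the likelihood.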

Markov chain Monte Carlo algorithms

(MCMC algorithms). A Monte Carlo simulation is a computer simulation of a biological process using random numbers. An MCMC algorithm is a Monte Carlo simulation algorithm that generates a sample from a target distribution (often a Bayesian posterior distribution).
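As a minimal sketch of the idea, the Metropolis algorithm below draws a sample from a target distribution that is known only up to a normalizing constant. The target used here is a standard normal density, chosen purely so that the result is easy to check; in a phylogenetic analysis the target would be the posterior of trees and parameters. The function names, step size and seed are all illustrative assumptions.

```python
import math
import random


def metropolis(log_target, start, n_samples, step=1.0, seed=0):
    """Sample from exp(log_target) using symmetric uniform proposals."""
    rng = random.Random(seed)
    x = start
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        # Propose a new state from a sliding window centred on the current state.
        proposal = x + rng.uniform(-step, step)
        log_p_new = log_target(proposal)
        delta = log_p_new - log_p
        # Accept uphill moves always, downhill moves with probability exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_std_normal(x):
    """Log density of the standard normal, up to an additive constant."""
    return -0.5 * x * x


if __name__ == "__main__":
    draws = metropolis(log_std_normal, start=0.0, n_samples=20000, step=2.0, seed=1)
    mean = sum(draws) / len(draws)
    print(mean)
```

Only the ratio of target densities enters the acceptance step, which is why the normalizing constant of the posterior never needs to be computed.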

Clades

Groups of species that have descended from a common ancestor.

Graphics processing units

(GPUs). Specialized units that are traditionally used to manipulate output on a video display and have recently been explored for use in parallel computation.

About this article

Cite this article

Yang, Z., Rannala, B. Molecular phylogenetics: principles and practice. Nat Rev Genet 13, 303–314 (2012). https://doi.org/10.1038/nrg3186

  • DOI: https://doi.org/10.1038/nrg3186
