Hybrid tree reconstruction methods

Published: 31 December 1999

    Abstract

    A major computational problem in biology is the reconstruction of evolutionary trees for sets of species; accuracy is measured by comparing the topology of the reconstructed tree with that of the model tree. One of the major debates in the field is whether large evolutionary trees can be reconstructed with even approximate accuracy from biomolecular sequences of realistically bounded lengths (up to about 2000 nucleotides) using standard techniques (polynomial-time distance methods and heuristics for NP-hard optimization problems). Using both analytical and experimental techniques, we show that on large trees the two most popular methods in systematic biology, Neighbor-Joining and Maximum Parsimony heuristics, as well as two promising methods introduced by theoretical computer scientists, are all likely to make significant errors in reconstructing the topology of the model tree. We also present a new general technique for combining the outputs of different methods (thus producing hybrid methods), and show experimentally that one such hybrid method performs better than its constituent methods.
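
The abstract says accuracy is judged by comparing tree topologies but does not name a specific measure. One common way to do this is to compare the non-trivial bipartitions (splits) of the taxon set that each tree induces. The sketch below is only an illustration under that assumption, not the article's own code: trees are written as nested Python tuples of taxon names, and the helper names (`leaves`, `bipartitions`, `topology_error`) are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A tree is either a taxon name or a tuple of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names at the leaves of the tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same split always gets the same representation.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_taxa - below
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_taxa) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def topology_error(model: Tree, estimate: Tree) -> Dict[str, int]:
    """Count model splits missing from the estimate, and vice versa."""
    if leaves(model) != leaves(estimate):
        raise ValueError("trees must be on the same taxon set")
    m, e = bipartitions(model), bipartitions(estimate)
    fn, fp = len(m - e), len(e - m)
    return {"false_negatives": fn, "false_positives": fp, "rf_distance": fn + fp}


model_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
estimated_tree = ((("A", "C"), "B"), ("D", ("E", "F")))
unresolved_tree = ("A", "B", "C", "D", "E", "F")

result = topology_error(model_tree, estimated_tree)
star_result = topology_error(model_tree, unresolved_tree)
```

Under this measure a fully unresolved (star) estimate has no false positives but misses every internal edge of the model tree, so false negatives and false positives are worth reporting separately rather than only as their sum.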

    Supplementary Material

    TAR File (p5-huson.tar)
    The software suite accompanying the article; this is a small Unix tar file, which includes the source code, a Makefile, and the test files used in the article.
    PS File (vol4nbr5.ps)
    TAR File (vol4nbr5.tex.tar)

    Published In

    ACM Journal of Experimental Algorithmics  Volume 4, Issue
    1999
    165 pages
    ISSN:1084-6654
    EISSN:1084-6654
    DOI:10.1145/347792
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Cited By

    • (2003) Performance study of phylogenetic methods. Journal of Algorithms 48:1 (173-193). DOI: 10.1016/S0196-6774(03)00049-X. Online publication date: 1-Aug-2003
    • (2003) Phylogenetic Reconstruction from Gene-Rearrangement Data with Unequal Gene Content. Algorithms and Data Structures (37-46). DOI: 10.1007/978-3-540-45078-8_4. Online publication date: 2003
    • (2002) Reconstructing Optimal Phylogenetic Trees: A Challenge in Experimental Algorithmics. Experimental Algorithmics (163-180). DOI: 10.1007/3-540-36383-1_8. Online publication date: 16-Dec-2002
    • (2001) Performance study of phylogenetic methods. Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms (196-205). DOI: 10.5555/365411.365908. Online publication date: 9-Jan-2001
    • (2001) Absolute convergence. Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms (186-195). DOI: 10.5555/365411.365443. Online publication date: 9-Jan-2001
    • (2001) Estimating true evolutionary distances between genomes. Proceedings of the thirty-third annual ACM symposium on Theory of computing (637-646). DOI: 10.1145/380752.380861. Online publication date: 6-Jul-2001
    • (2001) Zinc finger gene clusters and tandem gene duplication. Proceedings of the fifth annual international conference on Computational biology (297-304). DOI: 10.1145/369133.369241. Online publication date: 22-Apr-2001
    • (2000) An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae. Comparative Genomics (99-121). DOI: 10.1007/978-94-011-4309-7_11. Online publication date: 2000
