
Using INC Within Divide-and-Conquer Phylogeny Estimation

  • Conference paper
Algorithms for Computational Biology (AlCoB 2019)

Abstract

In a recent paper (Zhang, Rao, and Warnow, Algorithms for Molecular Biology 2019), the INC (incremental tree building) algorithm was presented and proven to be absolute fast converging under standard sequence evolution models. That paper also presented Constrained INC, a variant that accepts a set of disjoint constraint trees and uses INC to merge them. We report on a study evaluating INC on a range of simulated datasets and show that it has very poor accuracy in comparison to standard methods. We also explore the design space for divide-and-conquer strategies for phylogeny estimation that use Constrained INC, and show modifications that improve accuracy. In particular, we present INC-ML, a divide-and-conquer approach to maximum likelihood (ML) estimation that comes close to the leading ML heuristics in accuracy and is more accurate than the current best distance-based methods.

Supported by the University of Illinois at Urbana-Champaign and NSF grants DGE-1144245, CCF-1535977, and CCF-1535989. Computational experiments were performed on Blue Waters, supported by NSF grants OCI-0725070 and ACI-1238993 and by the State of Illinois.


References

  1. Bayzid, M.S., Hunt, T., Warnow, T.: Disk covering methods improve phylogenomic analyses. BMC Genomics 15(Suppl 6), S7 (2014)

  2. Boc, A., Diallo, A., Makarenkov, V.: T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res. 40, W573–W579 (2012)

  3. Buneman, P.: A note on the metric properties of trees. J. Comb. Theory (B) 17, 48–50 (1974)

  4. Erdös, P., Steel, M., Székely, L., Warnow, T.: Local quartet splits of a binary tree infer all quartet splits via one dyadic inference rule. Comput. Artif. Intell. 16(2), 217–227 (1997)

  5. Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (I). Random Struct. Algorithms 14, 153–184 (1999)

  6. Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (II). Theor. Comput. Sci. 221, 77–118 (1999)

  7. Fletcher, W., Yang, Z.: INDELible: a flexible simulator of biological sequence evolution. Mol. Biol. Evol. 26(8), 1879–1888 (2009)

  8. Jukes, T.H., Cantor, C.R.: Evolution of protein molecules. In: Munro, H. (ed.) Mammalian Protein Metabolism, vol. 3, pp. 21–132. Academic Press, New York (1969)

  9. Lacey, M.R., Chang, J.T.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199(2), 188–215 (2006)

  10. Le, T.: GitHub site for the INC and Constrained-INC software (2019). https://github.com/steven-le-thien/INC

  11. Le, T., Sy, A., Molloy, E., Zhang, Q., Rao, S., Warnow, T.: Using INC within divide-and-conquer phylogeny estimation - datasets (2019). https://databank.illinois.edu/datasets/IDB-8518809

  12. Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015). https://doi.org/10.1093/molbev/msv150

  13. Liu, K., et al.: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst. Biol. 61(1), 90–106 (2012). https://doi.org/10.1093/sysbio/syr095

  14. Maddison, W.: Gene trees in species trees. Syst. Biol. 46(3), 523–536 (1997)

  15. Mallo, D., De Oliveira Martins, L., Posada, D.: SimPhy: phylogenomic simulation of gene, locus, and species trees. Syst. Biol. 65(2), 334–344 (2016). https://doi.org/10.1093/sysbio/syv082

  16. Mirarab, S., Nguyen, N., Wang, L.S., Guo, S., Kim, J., Warnow, T.: PASTA: ultra-large multiple sequence alignment of nucleotide and amino acid sequences. J. Comput. Biol. 22, 377–386 (2015)

  17. Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M.S., Warnow, T.: ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17), i541–i548 (2014)

  18. Molloy, E.K., Warnow, T.: NJMerge: a generic technique for scaling phylogeny estimation methods and its application to species trees. In: Blanchette, M., Ouangraoua, A. (eds.) RECOMB-CG 2018. LNCS, vol. 11183, pp. 260–276. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00834-5_15

  19. Molloy, E.K., Warnow, T.: Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge. bioRxiv (2018). https://doi.org/10.1101/469130

  20. Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28(12), i274–i282 (2012)

  21. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3), 1–10 (2010)

  22. Roch, S., Sly, A.: Phase transition in the sample complexity of likelihood-based phylogeny inference. Probab. Theory Relat. Fields 169(1), 3–62 (2017)

  23. Sayyari, E., Whitfield, J.B., Mirarab, S.: Fragmentary gene sequences negatively impact gene tree and species tree reconstruction. Mol. Biol. Evol. 34(12), 3279–3291 (2017)

  24. Stamatakis, A.: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)

  25. Swofford, D.L.: PAUP* (*Phylogenetic Analysis Using PAUP), Version 4a161 (2018). http://phylosolutions.com/paup-test/

  26. Tavaré, S.: Some probabilistic and statistical problems in the analysis of DNA sequences. In: Lectures on Mathematics in the Life Sciences, vol. 17, pp. 57–86. American Mathematical Society (1986)

  27. Warnow, T.: Divide-and-conquer tree estimation: opportunities and challenges. In: Warnow, T. (ed.) Bioinformatics and Phylogenetics. Springer (2019)

  28. Warnow, T., Moret, B.M., St. John, K.: Absolute convergence: true trees from short sequences. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 186–195. Society for Industrial and Applied Mathematics (2001)

  29. Zhang, Q., Rao, S., Warnow, T.: Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy. Algorithms Mol. Biol. 14(2), 2 (2019). https://rdcu.be/blBXm

Corresponding author

Correspondence to Tandy Warnow.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Le, T., Sy, A., Molloy, E.K., Zhang, Q., Rao, S., Warnow, T. (2019). Using INC Within Divide-and-Conquer Phylogeny Estimation. In: Holmes, I., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2019. Lecture Notes in Computer Science, vol 11488. Springer, Cham. https://doi.org/10.1007/978-3-030-18174-1_12

  • DOI: https://doi.org/10.1007/978-3-030-18174-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18173-4

  • Online ISBN: 978-3-030-18174-1

  • eBook Packages: Computer Science, Computer Science (R0)
