Abstract
In a recent paper (Zhang, Rao, and Warnow, Algorithms for Molecular Biology 2019), the INC (incremental tree building) algorithm was presented and proven to be absolute fast converging under standard sequence evolution models. That paper also presented Constrained INC, a variant that takes a set of disjoint constraint trees as input and uses INC to merge them. We report on a study evaluating INC on a range of simulated datasets and show that it has very poor accuracy in comparison to standard methods. We also explore the design space for divide-and-conquer strategies for phylogeny estimation that use Constrained INC, and show modifications that provide improved accuracy. In particular, we present INC-ML, a divide-and-conquer approach to maximum likelihood (ML) estimation that comes close to the leading ML heuristics in terms of accuracy and is more accurate than the current best distance-based methods.
Supported by the University of Illinois at Urbana-Champaign and NSF grants DGE-1144245, CCF-1535977, and CCF-1535989. Computational experiments were performed on Blue Waters, supported by NSF grants OCI-0725070 and ACI-1238993 and by the State of Illinois.
References
Bayzid, M.S., Hunt, T., Warnow, T.: Disk covering methods improve phylogenomic analyses. BMC Genomics 15(Suppl 6), S7 (2014)
Boc, A., Diallo, A., Makarenkov, V.: T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res. 40, W573–W579 (2012)
Buneman, P.: A note on the metric properties of trees. J. Comb. Theory (B) 17, 48–50 (1974)
Erdös, P., Steel, M., Székely, L., Warnow, T.: Local quartet splits of a binary tree infer all quartet splits via one dyadic inference rule. Comput. Artif. Intell. 16(2), 217–227 (1997)
Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (I). Random Struct. Algorithms 14, 153–184 (1999)
Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (II). Theor. Comput. Sci. 221, 77–118 (1999)
Fletcher, W., Yang, Z.: INDELible: a flexible simulator of biological sequence evolution. Mol. Biol. Evol. 26(8), 1879–1888 (2009)
Jukes, T.H., Cantor, C.R.: Evolution of protein molecules. In: Munro, H. (ed.) Mammalian Protein Metabolism, vol. 3, pp. 21–132. Academic Press, New York (1969)
Lacey, M.R., Chang, J.T.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199(2), 188–215 (2006)
Le, T.: GitHub site for the INC and constrained-INC software (2019). https://github.com/steven-le-thien/INC
Le, T., Sy, A., Molloy, E., Zhang, Q., Rao, S., Warnow, T.: Using INC within divide-and-conquer phylogeny estimation - datasets (2019). https://databank.illinois.edu/datasets/IDB-8518809
Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015). https://doi.org/10.1093/molbev/msv150
Liu, K., et al.: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst. Biol. 61(1), 90–106 (2012). https://doi.org/10.1093/sysbio/syr095
Maddison, W.: Gene trees in species trees. Syst. Biol. 46(3), 523–536 (1997)
Mallo, D., De Oliveira Martins, L., Posada, D.: SimPhy: phylogenomic simulation of gene, locus, and species trees. Syst. Biol. 65(2), 334–344 (2016). https://doi.org/10.1093/sysbio/syv082
Mirarab, S., Nguyen, N., Wang, L.S., Guo, S., Kim, J., Warnow, T.: PASTA: ultra-large multiple sequence alignment of nucleotide and amino acid sequences. J. Comput. Biol. 22, 377–386 (2015)
Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M.S., Warnow, T.: ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17), i541–i548 (2014)
Molloy, E.K., Warnow, T.: NJMerge: a generic technique for scaling phylogeny estimation methods and its application to species trees. In: Blanchette, M., Ouangraoua, A. (eds.) RECOMB-CG 2018. LNCS, vol. 11183, pp. 260–276. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00834-5_15
Molloy, E.K., Warnow, T.: Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge. bioRxiv (2018). https://doi.org/10.1101/469130
Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28(12), i274–i282 (2012)
Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3), 1–10 (2010)
Roch, S., Sly, A.: Phase transition in the sample complexity of likelihood-based phylogeny inference. Probab. Theory Relat. Fields 169(1), 3–62 (2017)
Sayyari, E., Whitfield, J.B., Mirarab, S.: Fragmentary gene sequences negatively impact gene tree and species tree reconstruction. Mol. Biol. Evol. 34(12), 3279–3291 (2017)
Stamatakis, A.: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)
Swofford, D.L.: PAUP* (*Phylogenetic Analysis Using PAUP), Version 4a161 (2018). http://phylosolutions.com/paup-test/
Tavaré, S.: Some probabilistic and statistical problems in the analysis of DNA sequences. In: Lectures on Mathematics in the Life Sciences, vol. 17, pp. 57–86. American Mathematical Society (1986)
Warnow, T.: Divide-and-conquer tree estimation: opportunities and challenges. In: Warnow, T. (ed.) Bioinformatics and Phylogenetics. Springer (2019)
Warnow, T., Moret, B.M., St. John, K.: Absolute convergence: true trees from short sequences. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 186–195. Society for Industrial and Applied Mathematics (2001)
Zhang, Q., Rao, S., Warnow, T.: Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy. Algorithms Mol. Biol. 14(2), 2 (2019). https://rdcu.be/blBXm
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Le, T., Sy, A., Molloy, E.K., Zhang, Q., Rao, S., Warnow, T. (2019). Using INC Within Divide-and-Conquer Phylogeny Estimation. In: Holmes, I., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2019. Lecture Notes in Computer Science, vol 11488. Springer, Cham. https://doi.org/10.1007/978-3-030-18174-1_12
DOI: https://doi.org/10.1007/978-3-030-18174-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18173-4
Online ISBN: 978-3-030-18174-1
eBook Packages: Computer Science, Computer Science (R0)