Mixed Integer Linear Programming for Maximum-Parsimony Phylogeny Inference

Published: 01 July 2008

Abstract

Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to continue making effective use of the rapidly growing stores of variation data now being gathered. In this paper, we present two integer linear programming (ILP) formulations to find the most parsimonious phylogenetic tree from a set of binary variation data. One method uses a flow-based formulation that can produce exponential numbers of variables and constraints in the worst case. The method has, however, proven extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods, solving several large mtDNA and Y-chromosome instances within a few seconds and giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality. An alternative formulation establishes that the problem can be solved with a polynomial-sized ILP. We further present a web server, built on the exponential-sized ILP, that performs fast maximum parsimony inferences and serves as a front end to a database of precomputed phylogenies spanning the human genome.
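To make the objective concrete: for binary data, the most parsimonious tree is a minimum Steiner tree on the hypercube, where taxa are terminal vertices, edge lengths are Hamming distances, and unobserved ancestral sequences may be added as Steiner nodes. The sketch below is not either of the ILP formulations described in the abstract. It is a brute-force reference, exponential in the number of characters, that simply tries every small set of extra hypercube vertices and keeps the cheapest minimum spanning tree. The function names (`hamming`, `mst_cost`, `parsimony_score`) are illustrative assumptions, and the approach is only practical for a handful of characters.

```python
from itertools import combinations, product


def hamming(u, v):
    """Number of characters at which two binary sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def mst_cost(nodes):
    """Prim's algorithm on the complete graph with Hamming-distance weights."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0
    # best[v] = cheapest known edge from the growing tree to v
    best = {v: hamming(nodes[0], v) for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for w in best:
            best[w] = min(best[w], hamming(v, w))
    return total


def parsimony_score(taxa):
    """Minimum number of mutations in any tree connecting the taxa.

    Brute force (hypothetical helper, not the paper's method): try every
    set of up to n-2 unobserved hypercube vertices as Steiner nodes.
    Expects a non-empty collection of equal-length 0/1 tuples.
    """
    taxa = set(taxa)
    m = len(next(iter(taxa)))
    candidates = [v for v in product((0, 1), repeat=m) if v not in taxa]
    best = mst_cost(taxa)
    # A minimum Steiner tree needs at most n-2 Steiner nodes.
    for k in range(1, max(len(taxa) - 1, 1)):
        for extra in combinations(candidates, k):
            best = min(best, mst_cost(taxa | set(extra)))
    return best


if __name__ == "__main__":
    # Three taxa at pairwise distance 2; the ancestor (1, 1, 1) joins them
    # with one mutation each.
    print(parsimony_score([(1, 1, 0), (1, 0, 1), (0, 1, 1)]))  # 3
```

In the example, connecting the three taxa directly costs 4 mutations, while adding the unobserved sequence `(1, 1, 1)` reduces the cost to 3. An exact solver such as the ILPs described above searches this same space without enumerating it.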

Cited By

  • (2012) Identifying rogue taxa through reduced consensus. Proceedings of the 8th international conference on Bioinformatics Research and Applications, pp. 87-98. 10.1007/978-3-642-30191-9_9. Online publication date: 21-May-2012
  • (2009) Constructing majority-rule supertrees. Proceedings of the 9th international conference on Algorithms in bioinformatics, pp. 73-84. 10.5555/1812906.1812913. Online publication date: 12-Sep-2009

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 3
July 2008
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Algorithms
  2. Computational Biology
  3. Integer Linear Programming
  4. Maximum parsimony
  5. Phylogenetic tree reconstruction
  6. Steiner tree problem
