Abstract
Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach to this problem is to estimate the minimum number of recombination events in any history of the sample. Myers and Griffiths [1] proposed two measures, R_h and R_s, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing recombination lower bounds. Our principal results include:
- We show that computing the lower bound R_h is NP-hard and adapt the greedy algorithm for the set cover problem [2] to obtain a polynomial time algorithm for computing a diversity-based bound R_g. This algorithm is several orders of magnitude faster than the Recmin program [1], and the bound R_g matches the bound R_h almost always.
- We show that computing the lower bound R_s is also NP-hard, using a reduction from MAX-2SAT. We give an O(m 2^n) time algorithm for computing R_s for a dataset with n haplotypes and m SNPs. We propose a new bound R_I that extends the history-based bound R_s using the notion of intermediate haplotypes. This bound detects more recombination events than both the R_h and R_s bounds on many real datasets.
- We extend our algorithms for computing R_g and R_s to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [3] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.
- We apply our lower bounds to a real dataset [4] and demonstrate that these can provide a good indication of the presence and the location of recombination hotspots.
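For orientation, the classic greedy heuristic for set cover [2], which the first result adapts, can be sketched as follows. This is only the textbook heuristic under illustrative assumptions, not the authors' adaptation for computing R_g: the function name `greedy_set_cover` and the input representation (a universe of elements plus a list of candidate subsets) are hypothetical and do not come from the paper.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy set cover (hypothetical helper, not the paper's R_g code).

    Repeatedly picks the subset that covers the most still-uncovered
    elements and returns the indices of the chosen subsets in pick order.
    """
    subsets = [set(s) for s in subsets]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest overlap with what is left;
        # ties are broken in favour of the lowest index.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    picks = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
    print(picks)  # indices of the selected subsets
```

The heuristic runs in polynomial time and yields a cover that is not necessarily minimum, which is the usual trade-off when the exact problem is NP-hard.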
References
Myers, S., Griffiths, R.: Bounds on the Minimum Number of Recombination Events in a Sample History. Genetics 163, 375–394 (2003)
Johnson, D.: Approximation algorithms for combinatorial problems. Journal of Comput. System Sci. 9, 256–278 (1974)
Nickerson, D., et al.: DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nature Genetics 19, 233–240 (1998)
Jeffreys, A.J., Kauppi, L., Neumann, R.: Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nature Genetics 29, 217–222 (2001)
Gabriel, S.B., et al.: The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002)
Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., Lander, E.S.: High-resolution haplotype structure in the human genome. Nature Genetics 29, 229–232 (2001)
Jeffreys, A., Ritchie, A., Neumann, R.: High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot. Hum. Mol. Genet. 9, 725–733 (2000)
Griffiths, R.C., Marjoram, P.: Ancestral inference from samples of DNA sequences with recombination. Journal of Computational Biology 3, 479–502 (1996)
Fearnhead, P., Donnelly, P.: Estimating recombination rates from population genetic data. Genetics 159, 1299–1318 (2001)
Hudson, R.R.: Two-locus sampling distributions and their applications. Genetics 159, 1805–1817 (2001)
Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003)
The International HapMap Consortium: The International HapMap Project. Nature 426, 789–796 (2003)
McVean, G., et al.: The fine-scale structure of recombination rate variation in the human genome. Science 304, 581–584 (2004)
Crawford, D., et al.: Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics 36, 700–706 (2004)
Hein, J.: Reconstructing Evolution of sequences subject to recombination using parsimony. Math. Biosci. 98, 185–200 (1990)
Hein, J.: A Heuristic Method to Reconstruct the History of Sequences Subject to Recombination. J. Mol. Evol. 20, 402–411 (1993)
Song, Y., Hein, J.: Parsimonious Reconstruction of Sequence Evolution and Haplotype Blocks: Finding the Minimum Number of Recombination Events. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 287–302. Springer, Heidelberg (2003)
Wang, L., Zhang, K., Zhang, L.: Perfect phylogenetic networks with recombination. Journal of Computational Biology 8, 69–78 (2001)
Gusfield, D., Eddhu, S., Langley, C.: Efficient reconstruction of phylogenetic networks with constrained recombination. In: Proc. of IEEE CSB Conference, pp. 363–374 (2003)
Templeton, A., et al.: Recombinational and mutational hotspots within the human lipoprotein lipase gene. American Journal of Human Genetics 66, 69–83 (2000)
Fearnhead, P., et al.: Application of coalescent methods to reveal fine-scale rate variation and recombination hotspots. Genetics 167, 2067–2081 (2004)
Kreitman, M.: Nucleotide Polymorphism at the Alcohol Dehydrogenase Locus of Drosophila melanogaster. Nature 304, 412–417 (1983)
SeattleSNPs. NHLBI Program for Genomic Applications, UW-FHCRC, Seattle, WA (2004), http://pga.gs.washington.edu
Hudson, R.R., Kaplan, N.L.: Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164 (1985)
Song, Y., Hein, J.: On the minimum number of recombination events in the evolutionary history of DNA sequences. Journal of Mathematical Biology 48, 160–186 (2004)
Bafna, V., Bansal, V.: The number of recombination events in a sample history: Conflict graph and lower bounds. IEEE Trans. on Comp. Biology and Bioinformatics 1, 78–90 (2004)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-completeness. W.H. Freeman and Company, New York (1979)
Eskin, E., Halperin, E.: Haplotype reconstruction from genotype data using imperfect phylogeny. Bioinformatics 20, 1842–1849 (2003)
Kimmel, G., Shamir, R.: The incomplete perfect phylogeny haplotype problem. In: Second RECOMB Satellite Workshop on Computational Methods for SNPs and Haplotypes (2004)
Clark, A., et al.: Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. American Journal of Human Genetics 63, 595–612 (1998)
Goldstein, D.B.: Islands of linkage disequilibrium. Nature Genetics 29, 109–111 (2001)
Stephens, M., Smith, N.J., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989 (2001)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bafna, V., Bansal, V. (2005). Improved Recombination Lower Bounds for Haplotype Data. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_43
DOI: https://doi.org/10.1007/11415770_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science (R0)