  • Review Article

Haplotype phasing: existing methods and new developments

Key Points

  • Haplotype phase may be generated through either computational or experimental methods.

  • Computational phasing is simple and inexpensive and results in good accuracy for common variants over small regions.

  • Computational phasing of closely related individuals (such as parent–offspring trios) results in high accuracy at a high proportion of sites because of the additional information provided by Mendelian constraints.

  • Although specialized software for analysing complex relationships is somewhat limited, good results can be obtained by treating the related individuals as if they were unrelated when performing computational phasing.

  • A new development in computational phasing of unrelated individuals is the detection and use of segments of identity-by-descent that arise from distant relationships. In their current form, these methods are only suitable for small, isolated populations, but improvements in algorithms may lead to applicability to large samples from outbred populations.

  • Experimental phasing has a very high accuracy at a high proportion of sites and can phase de novo or very rare variants without the need to obtain data from closely related individuals.

  • Experimental phasing currently adds substantially to the cost of generating the genotype or sequence data (at least doubling the cost) and requires technical expertise, additional preparation time and, in some cases, specialized equipment.
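
The Mendelian constraint in the key points can be made concrete with a small sketch. It assumes biallelic sites coded as the number of alternative alleles (0, 1 or 2), error-free genotypes and no de novo mutations. The function names are illustrative and are not taken from any phasing package. A heterozygous site in the child is resolved whenever at least one parent is homozygous. Sites where the child and both parents are heterozygous stay unresolved and would still need statistical phasing.

```python
from typing import List, Optional, Tuple

# Genotypes: number of copies of the alternative allele (0, 1 or 2).
# A phased site is (maternal allele, paternal allele); None means unresolved.
Phase = Optional[Tuple[int, int]]


def transmitted_allele(parent_gt: int) -> Optional[int]:
    """Allele a parent must have transmitted, known only if the parent is homozygous."""
    if parent_gt == 0:
        return 0
    if parent_gt == 2:
        return 1
    return None  # heterozygous parent: either allele could have been transmitted


def phase_trio_site(child: int, mother: int, father: int) -> Phase:
    """Phase one biallelic site in a child using Mendelian constraints only."""
    m, f = transmitted_allele(mother), transmitted_allele(father)
    if child != 1:
        allele = child // 2
        # A homozygous child received the same allele from both parents.
        if (m is not None and m != allele) or (f is not None and f != allele):
            raise ValueError("Mendelian inconsistency")
        return (allele, allele)
    if m is not None and f is not None and m == f:
        raise ValueError("Mendelian inconsistency")  # both parents give the same allele
    if m is not None:
        return (m, 1 - m)
    if f is not None:
        return (1 - f, f)
    return None  # child and both parents heterozygous: phase not determined


def phase_trio(child: List[int], mother: List[int], father: List[int]) -> List[Phase]:
    """Phase each site of a child's genotype vector independently."""
    return [phase_trio_site(c, m, f) for c, m, f in zip(child, mother, father)]
```

For example, `phase_trio([1, 1, 1, 2], [0, 1, 1, 1], [2, 2, 1, 2])` resolves the first, second and fourth sites and leaves the third, where all three individuals are heterozygous, as `None`.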

Abstract

Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.

Figure 1: Statistical phasing of unrelated individuals using haplotype frequencies.
Figure 2: Comparison of recent statistical haplotype phasing methods.
Figure 3: Use of identity-by-descent to determine haplotype phase.
Figure 4: Accuracy of statistical phasing of cryptic relatives when relationship is not explicitly accounted for.

Acknowledgements

This study was supported by the US National Institutes of Health (NIH) awards R01HG005701 and R01HG004960. This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113 and 085475. The content of this study is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the Wellcome Trust.

Author information

Correspondence to Sharon R. Browning or Brian L. Browning.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Sharon R. Browning's homepage

Brian L. Browning's homepage

Arlequin

BEAGLE

fastPHASE

GENEHUNTER

The Genome Analysis Toolkit

IMPUTE2

MACH

MERLIN

Nature Reviews Genetics series on Study Designs

PHASE

PL-EM

'Read-backed phasing' algorithm

SHAPE-IT

Glossary

Imputation

In the context of this article, this is the estimation of missing genotype values by using the genotypes at nearby SNPs and the haplotype frequencies seen in other individuals.

Calling genotypes

Estimating genotype values from raw data. Genotyping technology provides information about the underlying genotype, typically in the form of signal intensities or read counts of the two alleles. Statistical techniques are used to resolve this information into genotype calls. Typically, information across individuals is used, and correlation across SNPs (that is, haplotype phase) is also helpful.

Identical-by-descent

Two haplotypes are identical-by-descent if they are identical copies of a haplotype inherited from a common ancestor.
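
As a hypothetical illustration of how an identical-by-descent segment can determine phase, the sketch below assumes that the segment has already been detected and that the relative's haplotype on it is known. Detecting the segment is the difficult step and is not attempted here. At each heterozygous site, the allele that matches the relative's haplotype goes on the shared haplotype and the other allele goes on the second haplotype. A homozygous site that disagrees with the relative points to a genotype error or to a segment that is not actually IBD. The function name is illustrative.

```python
from typing import List, Tuple


def phase_with_ibd_segment(genotypes: List[int],
                           ibd_haplotype: List[int]) -> Tuple[List[int], List[int]]:
    """Phase genotypes over a segment assumed IBD with a relative's haplotype.

    genotypes: alternative-allele counts (0, 1 or 2) for the individual.
    ibd_haplotype: the relative's alleles (0 or 1) on the haplotype assumed shared IBD.
    Returns (shared haplotype, other haplotype).
    """
    shared, other = [], []
    for g, a in zip(genotypes, ibd_haplotype):
        if g == 1:
            # Heterozygous: the shared copy carries the relative's allele.
            shared.append(a)
            other.append(1 - a)
        else:
            allele = g // 2
            if allele != a:
                raise ValueError("allele mismatch: segment may not be IBD, or genotype error")
            shared.append(allele)
            other.append(allele)
    return shared, other
```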

Cryptic relatedness

The undocumented existence of relatives within a sample.

Posterior distribution

Probabilities that account for the prior information and the information in the data. For haplotype phase estimation, the posterior distribution accounts for all available information, including the genotypes and the estimated haplotype frequencies in the population.
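
A worked case of a posterior distribution for phase: an individual who is heterozygous at two biallelic SNPs has two possible phase configurations, 00/11 and 01/10. Assuming Hardy–Weinberg equilibrium and known population haplotype frequencies p, the posterior probability of 00/11 is p₀₀p₁₁ / (p₀₀p₁₁ + p₀₁p₁₀). The minimal sketch below, with a hypothetical function name, computes both probabilities.

```python
from typing import Dict, Tuple

Haplotype = Tuple[int, int]  # alleles at two biallelic SNPs


def phase_posterior(freqs: Dict[Haplotype, float]) -> Dict[Tuple[Haplotype, Haplotype], float]:
    """Posterior probabilities of the two phase configurations of a double heterozygote.

    Assumes Hardy-Weinberg equilibrium: the prior probability of an unordered pair of
    distinct haplotypes (h1, h2) is 2 * freqs[h1] * freqs[h2]; the factor 2 cancels.
    """
    configs = [((0, 0), (1, 1)), ((0, 1), (1, 0))]
    weights = [freqs[h1] * freqs[h2] for h1, h2 in configs]
    total = sum(weights)
    return {c: w / total for c, w in zip(configs, weights)}
```

With p₀₀ = 0.5, p₁₁ = 0.3 and p₀₁ = p₁₀ = 0.1, the posterior probabilities of 00/11 and 01/10 are 0.9375 and 0.0625.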

Expectation maximization algorithm

(EM algorithm). An iterative approach for finding the values of unknown parameters (such as haplotype frequencies) that maximize the statistical likelihood of observed incomplete data (such as genotypes for which haplotype phase is unobserved). Although the likelihood never decreases from one iteration to the next, the approach is not guaranteed to find the global maximum.
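
The EM approach can be sketched for a handful of biallelic SNPs in the spirit of the early haplotype-frequency EM methods. This minimal illustration assumes Hardy–Weinberg equilibrium, a uniform starting point and a fixed number of iterations instead of a convergence test. The function names are hypothetical. The E-step spreads each individual over the haplotype pairs consistent with its genotype, and the M-step turns the expected counts into new frequencies. Because an individual with n heterozygous sites has 2ⁿ⁻¹ compatible pairs, direct enumeration like this only suits short regions.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]


def compatible_pairs(genotype: List[int]) -> List[Tuple[Haplotype, Haplotype]]:
    """All unordered haplotype pairs consistent with an unphased genotype (0/1/2 coding)."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = []
    # Fix the allele of the first heterozygous site on h1 so each pair is listed once.
    for bits in product((0, 1), repeat=max(len(het_sites) - 1, 0)):
        h1 = [g // 2 for g in genotype]  # homozygous sites carry allele g // 2
        h2 = list(h1)
        for site, allele in zip(het_sites, (0,) + bits):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes: List[List[int]], n_iter: int = 100) -> Dict[Haplotype, float]:
    """Estimate haplotype frequencies from unphased genotypes by EM (assumes HWE)."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting point
    n_haps = 2 * len(genotypes)
    for _ in range(n_iter):
        counts: Dict[Haplotype, float] = defaultdict(float)
        for pairs in candidates:
            # E-step: weight each compatible pair by its probability under the current
            # frequencies (the HWE factor of 2 is the same for every pair of one
            # individual, so it cancels in the normalization).
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequencies are expected haplotype counts divided by 2N.
        freqs = {h: counts[h] / n_haps for h in haplotypes}
    return freqs
```

Consider three individuals homozygous 00, three homozygous 11 and one double heterozygote. The estimates for haplotypes 00 and 11 approach 0.5, and the double heterozygote is effectively assigned the 00/11 configuration.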

Partition–ligation

A divide-and-conquer strategy that is designed to reduce the computational burden for phasing methods that do not scale well with increasing region size. A large region is divided up into smaller regions, and haplotype phase estimates from the smaller regions are used to limit the possibilities when phasing the large region.
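
A minimal sketch of the two ingredients of this strategy follows, under stated assumptions. Markers are split into consecutive blocks of a hypothetical `block_size`. Each block is assumed to have been phased already, giving estimated haplotype frequencies per block. When two adjacent blocks are merged, only concatenations of the `k` most frequent haplotypes from each side are kept as candidates for the larger region. The parameter names and the top-k rule are illustrative choices, not those of a specific program.

```python
from itertools import product
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]


def partition(n_markers: int, block_size: int) -> List[range]:
    """Split marker indices 0..n_markers-1 into consecutive blocks."""
    return [range(start, min(start + block_size, n_markers))
            for start in range(0, n_markers, block_size)]


def top_haplotypes(freqs: Dict[Haplotype, float], k: int) -> List[Haplotype]:
    """The k most frequent haplotypes estimated for one block."""
    return sorted(freqs, key=freqs.get, reverse=True)[:k]


def ligate(left: Dict[Haplotype, float], right: Dict[Haplotype, float], k: int) -> List[Haplotype]:
    """Candidate haplotypes for two merged adjacent blocks.

    Only concatenations of the top-k haplotypes from each block are kept, which
    limits the possibilities considered when the merged region is phased.
    """
    return [l + r for l, r in product(top_haplotypes(left, k), top_haplotypes(right, k))]
```

The merged region is then phased using only these candidates, so at most k² haplotypes are considered at each ligation step instead of a number that grows exponentially with the number of markers.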

Hidden Markov model

(HMM). A mathematically elegant and computationally tractable class of models in which the observed data are generated by an unobserved Markov process. A Markov process is a probabilistic process in which the distribution of future states (for example, states that are further along the chromosome) depends only on the current state and not on previous states.
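
To illustrate the Markov property in a phasing context, the sketch below uses the forward algorithm to compute the probability of a haplotype under a simple copying model. In this model, the hidden state at each marker is the reference haplotype being copied. It is a hypothetical, simplified model in the spirit of approximate coalescent methods. The switch probability `rho` and the miscopying probability `eps` are assumed constant across markers. No rescaling is done, so very long sequences would underflow.

```python
from typing import List, Sequence


def forward_likelihood(target: Sequence[int], panel: List[Sequence[int]],
                       rho: float, eps: float) -> float:
    """Probability of `target` under a simple haplotype-copying HMM.

    Hidden state at marker j: index of the panel haplotype being copied.
    Transition: with probability rho the copied haplotype is redrawn uniformly
    from the panel (possibly the same one); otherwise it is kept.
    Emission: the copied allele is observed with probability 1 - eps.
    """
    k = len(panel)

    def emit(state: int, j: int) -> float:
        return 1 - eps if panel[state][j] == target[j] else eps

    # Forward variables alpha[s] = P(alleles at markers 0..j, state s at marker j).
    alpha = [emit(s, 0) / k for s in range(k)]  # uniform initial state
    for j in range(1, len(target)):
        total = sum(alpha)
        # Markov property: the state at j depends only on the state at j - 1.
        alpha = [((1 - rho) * alpha[s] + rho * total / k) * emit(s, j) for s in range(k)]
    return sum(alpha)
```

Because the transition and emission probabilities are normalized, the probabilities of all 2ᴸ possible target haplotypes of length L sum to 1, which is a useful sanity check.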

Haplotype block

A short genomic region within which inter-marker linkage disequilibrium is strong.

Approximate coalescent

The coalescent is a model for the process by which the ancestry of alleles converges when looking back in time. An approximate coalescent is a model that generates patterns of genetic variation that are similar to patterns generated by the coalescent but that is computationally simpler.

Linkage disequilibrium

(LD). Non-independence (correlation) between genetic variants at the population level. In general, LD decreases with genomic distance and is not present between variants on different chromosomes.

Effective population size

The size of a population of randomly mating individuals that would show the same amount of genetic drift as is found in the actual population. The effective population size is usually smaller than the actual population size.

Compound heterozygosity

The presence of two deleterious variants located in the same gene but on different chromosome copies of an individual. It is possible to distinguish between compound heterozygosity and the occurrence of two variants on the same chromosome copy by determining the haplotype phase.
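
Once phase is known, distinguishing the two cases is a simple lookup, whereas the unphased genotype (heterozygous at both sites) is identical in both. The sketch below assumes the two phased haplotypes are given as 0/1 vectors in which 1 marks the deleterious allele. The function name is hypothetical.

```python
from typing import Sequence


def variant_configuration(hap1: Sequence[int], hap2: Sequence[int], i: int, j: int) -> str:
    """Return 'cis' or 'trans' for heterozygous variants at sites i and j.

    hap1 and hap2 are the individual's two phased haplotypes over the gene,
    with 1 marking the deleterious allele and 0 the other allele.
    """
    if hap1[i] == hap2[i] or hap1[j] == hap2[j]:
        raise ValueError("both sites must be heterozygous")
    # Both deleterious alleles on the same haplotype -> cis; on different
    # haplotypes -> trans (compound heterozygosity).
    return "cis" if hap1[i] == hap1[j] else "trans"
```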

D′

A measure of linkage disequilibrium (LD) between two markers. D′ takes values between 0 and 1. Absence of LD is indicated by 0, and 1 indicates the maximum possible LD given the allele frequencies at the two markers.
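
For reference, the standard definition is D = p_AB − p_A p_B. Here p_AB is the frequency of the haplotype carrying alleles A and B, and p_A and p_B are the allele frequencies. Then D′ = |D| / D_max, with D_max = min(p_A(1 − p_B), (1 − p_A)p_B) when D > 0 and D_max = min(p_A p_B, (1 − p_A)(1 − p_B)) when D < 0. A minimal sketch, with a hypothetical function name:

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """|D'| for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and B at marker 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # d_max is 0 only if a marker is monomorphic, in which case D is also 0.
    return abs(d) / d_max if d_max > 0 else 0.0
```

For example, with p_AB = 0.3, p_A = 0.5 and p_B = 0.4, D = 0.1 and D_max = 0.2, so D′ = 0.5.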

Reference panel

A collection of samples that are not of direct interest but that are included in an analysis for the purposes of increasing statistical power or accuracy for the samples of interest. Reference panels are commonly used for genotype imputation and can also be used for haplotype phasing.

Genotype likelihoods

Statistical likelihoods that encapsulate the relative evidence for each possible genotype call.
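
A minimal sketch of genotype likelihoods for sequence data, assuming a simple binomial model. Each read independently shows the alternative allele with probability equal to a fixed error rate for genotype 0, 0.5 for genotype 1, and one minus the error rate for genotype 2. Real callers typically use per-base quality scores rather than a single error rate. The function name and default error rate are hypothetical.

```python
from math import comb
from typing import List


def genotype_likelihoods(n_ref: int, n_alt: int, error_rate: float = 0.01) -> List[float]:
    """Likelihoods of genotypes 0, 1 and 2 (alternative-allele counts) given read counts.

    Assumed model: reads are independent, and each shows the alternative allele with
    probability error_rate, 0.5 or 1 - error_rate for genotypes 0, 1 and 2.
    """
    n = n_ref + n_alt
    p_alt = [error_rate, 0.5, 1 - error_rate]
    return [comb(n, n_alt) * p ** n_alt * (1 - p) ** n_ref for p in p_alt]
```

With no reads, all three likelihoods are equal, so the data carry no information about the genotype. The binomial coefficient is the same for every genotype and does not change the relative evidence.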

Fluorescence-activated cell sorting

(FACS). A type of flow cytometry in which the fluorescence intensities of individual particles (such as chromosomes, after staining) are measured and used to separate the particles into fractions.

Barcode labelling

Tagging of each sample with a unique short sequence (barcode) before pooling samples. After sequencing, the sample corresponding to each read can be determined from the barcode.
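
The assignment step can be sketched as follows. The sketch assumes, hypothetically, fixed-length barcodes at the start of each read and exact matching. Many pipelines also tolerate a small number of barcode mismatches. The function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str],
                barcode_len: int) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign reads to samples by an exact-match barcode at the start of each read.

    barcodes maps barcode sequence -> sample name. Returns the reads (barcode
    removed) for each sample and a list of reads whose barcode matched no sample.
    """
    by_sample: Dict[str, List[str]] = defaultdict(list)
    unassigned: List[str] = []
    for read in reads:
        sample = barcodes.get(read[:barcode_len])
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample), unassigned
```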

Admixed ancestry

An individual has admixed ancestry if he or she has recent ancestors deriving from different continental populations.

Large-insert clones

Large haplotype fragments that are inserted into, for example, bacterial artificial chromosomes (BACs).

Shotgun sequencing

A sequencing method in which DNA is randomly sheared into small fragments before being sequenced.

Fosmid

A type of hybrid DNA molecule comprising bacterial DNA and a section of genomic DNA of ~40 kb in length.

Microfluidics

The manipulation of fluids on a very small scale. This approach can be used to separate individual chromosomes before sequencing for experimental phasing.

Metaphase

A stage of mitosis at which chromosomes are highly condensed, facilitating their separation for some experimental phasing methods.

Paired-end sequencing

Sequencing of haplotype fragments from each end. The two sequenced ends are typically separated by a gap.
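
When the two ends of a read pair, or a single read, cover two heterozygous sites, the observed alleles come from the same chromosome copy. Each such read is therefore evidence for one phase configuration. The sketch below tallies that evidence for one pair of sites by majority vote. It simplifies haplotype assembly methods, which combine such evidence over many sites and allow for sequencing errors. The names and the minimum-support threshold are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def phase_from_reads(read_alleles: Iterable[Tuple[int, int]], min_support: int = 2) -> Optional[str]:
    """Phase two heterozygous sites from reads (or read pairs) that cover both.

    Each item gives the alleles (0 = reference, 1 = alternative) a read shows at
    site 1 and site 2. Returns 'cis' (alternative alleles on the same haplotype),
    'trans', or None if the evidence is tied or below min_support.
    """
    cis = trans = 0
    for a1, a2 in read_alleles:
        if a1 == a2:
            cis += 1    # ref-ref or alt-alt on one molecule: alternatives together
        else:
            trans += 1  # ref-alt or alt-ref on one molecule: alternatives apart
    if max(cis, trans) < min_support or cis == trans:
        return None
    return "cis" if cis > trans else "trans"
```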

About this article

Cite this article

Browning, S., Browning, B. Haplotype phasing: existing methods and new developments. Nat Rev Genet 12, 703–714 (2011). https://doi.org/10.1038/nrg3054

  • DOI: https://doi.org/10.1038/nrg3054
