Haplotype phasing: existing methods and new developments

Browning, Sharon R.; Browning, Brian L.

doi:10.1038/nrg3054

Review Article
Published: 16 September 2011

Nature Reviews Genetics volume 12, pages 703–714 (2011)

Key Points

Haplotype phase may be generated through either computational or experimental methods.
Computational phasing is simple and inexpensive and results in good accuracy for common variants over small regions.
Computational phasing of closely related individuals (such as parent–offspring trios) results in high accuracy at a high proportion of sites because of the additional information provided by Mendelian constraints.
Although specialized software for analysing complex relationships is somewhat limited, good results can be obtained by treating the related individuals as if they were unrelated when performing computational phasing.
A new development in computational phasing of unrelated individuals is the detection and use of segments of identity-by-descent that arise from distant relationships. In their current form, these methods are suitable only for small, isolated populations, but algorithmic improvements may make them applicable to large samples from outbred populations.
Experimental phasing has very high accuracy at a high proportion of sites and can phase de novo or very rare variants without the need to obtain data from closely related individuals.
Experimental phasing currently adds substantially to the cost of generating the genotype or sequence data (at least doubling the cost) and requires technical expertise, additional preparation time and, in some cases, specialized equipment.
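
The way Mendelian constraints phase a parent–offspring trio can be illustrated with a minimal sketch. It assumes biallelic SNPs coded as the number of alternate alleles (0, 1 or 2), no genotyping errors and no missing data; the function names are hypothetical and not taken from any program discussed in this review. At each site it lists the (maternal, paternal) allele pairs that the parents can transmit and that add up to the child's genotype. A unique pair fixes the child's phase; a site where the child and both parents are heterozygous stays ambiguous, which is why trio phasing resolves a high proportion of sites but not all of them.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical helpers. Genotypes are counts of the alternate allele at a biallelic SNP: 0, 1 or 2.


def transmissible(parent_genotype: int) -> Set[int]:
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[parent_genotype]


def phase_trio_site(child: int, mother: int, father: int) -> Optional[Tuple[int, int]]:
    """Return the child's (maternal, paternal) alleles, or None if phase is ambiguous."""
    options = {
        (m, p)
        for m in transmissible(mother)
        for p in transmissible(father)
        if m + p == child
    }
    if not options:
        raise ValueError(f"Mendelian inconsistency: child={child}, mother={mother}, father={father}")
    if len(options) == 1:
        return options.pop()
    return None  # only possible when the child and both parents are heterozygous


def phase_trio(child: List[int], mother: List[int], father: List[int]):
    """Phase the child's genotypes site by site; unresolved sites are reported as None."""
    maternal: List[Optional[int]] = []
    paternal: List[Optional[int]] = []
    for c, m, f in zip(child, mother, father):
        alleles = phase_trio_site(c, m, f)
        maternal.append(alleles[0] if alleles else None)
        paternal.append(alleles[1] if alleles else None)
    return maternal, paternal
```

For example, `phase_trio([1, 1, 2, 1], [0, 1, 1, 1], [1, 2, 2, 1])` returns maternal alleles `[0, 0, 1, None]` and paternal alleles `[1, 1, 1, None]`; the last site is a triple heterozygote and would still need statistical phasing.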

Abstract

Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.

**Figure 1: Statistical phasing of unrelated individuals using haplotype frequencies.**

**Figure 2: Comparison of recent statistical haplotype phasing methods.**

**Figure 3: Use of identity-by-descent to determine haplotype phase.**

**Figure 4: Accuracy of statistical phasing of cryptic relatives when relationship is not explicitly accounted for.**
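
The approach named in the caption of Figure 1, phasing unrelated individuals by using estimated haplotype frequencies, can be sketched with the expectation maximization (EM) algorithm defined in the Glossary. This is a minimal illustration, not the implementation of any method reviewed here: it assumes biallelic SNPs coded 0/1/2, Hardy–Weinberg equilibrium and no missing data, and the function names are hypothetical. It enumerates every phase resolution of each genotype, and the number of resolutions grows as 2^(k-1) for k heterozygous sites, so it is practical only for short regions; this is the scaling problem that partition–ligation and hidden Markov models address.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype (0/1/2 per SNP)."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # homozygous sites are fixed on both copies
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix the first heterozygous site to allele 0 on h1 so each unordered pair is listed once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, (0,) + bits):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes by EM (hypothetical helper)."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = {h for pairs in pair_lists for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_haps = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior weight of each phase resolution under Hardy-Weinberg.
            # The factor of 2 for heterozygous pairs is shared by all pairs of one genotype,
            # so it cancels in the normalization and is omitted.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of haplotypes.
        freq = {h: counts[h] / n_haps for h in haps}
    return freq
```

For example, with genotypes `[[1, 1], [0, 0], [2, 2], [0, 0], [2, 2]]`, the estimated frequencies of haplotypes (0, 1) and (1, 0) shrink towards zero within a few iterations, so the double heterozygote is assigned the phase (0, 0)/(1, 1) carried by the homozygous individuals. The E-step weights from the final iteration are each individual's posterior distribution over phase resolutions.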

Acknowledgements

This study was supported by the US National Institutes of Health (NIH) awards R01HG005701 and R01HG004960. This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113 and 085475. The content of this study is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the Wellcome Trust.

Author information

Authors and Affiliations

Department of Biostatistics, University of Washington, Seattle, 98195, Washington, USA
Sharon R. Browning
Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, 98195, Washington, USA
Brian L. Browning

Corresponding authors

Correspondence to Sharon R. Browning or Brian L. Browning.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Imputation: In the context of this article, this is the estimation of missing genotype values by using the genotypes at nearby SNPs and the haplotype frequencies seen in other individuals.
Calling genotypes: Estimating genotype values from raw data. Genotyping technology provides information about the underlying genotype, typically in the form of signal intensities or read counts of the two alleles. Statistical techniques are used to resolve this information into genotype calls. Typically, information across individuals is used, and correlation across SNPs (that is, haplotype phase) is also helpful.
Identical-by-descent: Two haplotypes are identical-by-descent if they are identical copies of a haplotype inherited from a common ancestor.
Cryptic relatedness: The undocumented existence of relatives within a sample.
Posterior distribution: Probabilities that account for the prior information and the information in the data. For haplotype phase estimation, the posterior distribution accounts for all available information, including the genotypes and the estimated haplotype frequencies in the population.
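Expectation maximization algorithm: (EM algorithm). An iterative approach for finding the parameter values (such as haplotype frequencies) that maximize the statistical likelihood of observed, incomplete data (such as genotypes whose haplotype phase is unobserved). Each iteration computes the expected values of the unobserved data given the current parameter estimates and then re-estimates the parameters. Although the likelihood never decreases from one iteration to the next, the approach is not guaranteed to find the global maximum.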
Partition–ligation: A divide-and-conquer strategy that is designed to reduce the computational burden for phasing methods that do not scale well with increasing region size. A large region is divided up into smaller regions, and haplotype phase estimates from the smaller regions are used to limit the possibilities when phasing the large region.
Hidden Markov model: (HMM). A mathematically elegant and computationally tractable class of models in which the observed data are generated by an unobserved Markov process. A Markov process is a probabilistic process in which the distribution of future states (for example, states that are further along the chromosome) depends only on the current state and not on previous states.
Haplotype block: A short genomic region within which inter-marker linkage disequilibrium is strong.
Approximate coalescent: The coalescent is a model for the process by which the ancestry of alleles converges when looking back in time. An approximate coalescent is a model that generates patterns of genetic variation that are similar to patterns generated by the coalescent but that is computationally simpler.
Linkage disequilibrium: (LD). Non-independence (correlation) between genetic variants at the population level. In general, LD decreases with genomic distance and is not present between variants on different chromosomes.
Effective population size: The size of a population of randomly mating individuals that would show the same amount of genetic drift as is found in the actual population. The effective population size is usually smaller than the actual population size.
Compound heterozygosity: The presence of two deleterious variants located in the same gene but on different chromosome copies of an individual. It is possible to distinguish between compound heterozygosity and the occurrence of two variants on the same chromosome copy by determining the haplotype phase.
D′: A measure of linkage disequilibrium (LD) between two markers. D′ takes values between 0 and 1. Absence of LD is indicated by 0, and 1 indicates the maximum possible LD given the allele frequencies of the markers.
Reference panel: A collection of samples that are not of direct interest but that are included in an analysis for the purposes of increasing statistical power or accuracy for the samples of interest. Reference panels are commonly used for genotype imputation and can also be used for haplotype phasing.
Genotype likelihoods: Statistical likelihoods that encapsulate the relative evidence for each possible genotype call.
Fluorescence-activated cell sorting: (FACS). A type of flow cytometry in which individual particles (such as chromosomes) are separated and fluorescence intensities (from earlier staining) are measured.
Barcode labelling: Tagging of each sample with a unique short sequence (barcode) before pooling samples. After sequencing, the sample corresponding to each read can be determined from the barcode.
Admixed ancestry: An individual has admixed ancestry if he or she has recent ancestors deriving from different continental populations.
Large-insert clones: Large haplotype fragments that are inserted into, for example, bacterial artificial chromosomes (BACs).
Shotgun sequencing: A sequencing method in which DNA is randomly sheared into small fragments before being sequenced.
Fosmid: A type of hybrid DNA molecule comprising bacterial DNA and a section of genomic DNA of ~40 kb in length.
Microfluidics: The manipulation of fluids on a very small scale. This approach can be used to separate individual chromosomes before sequencing for experimental phasing.
Metaphase: A stage of mitosis at which chromosomes are highly condensed, facilitating their separation for some experimental phasing methods.
Paired-end sequencing: Sequencing of haplotype fragments from each end. The two sequenced ends are typically separated by a gap.
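
The D′ entry above can be made concrete with a minimal sketch of the standard |D′| calculation (Lewontin's normalization of D by its maximum possible value given the allele frequencies). It assumes two biallelic markers with alleles coded 0/1 and haplotype frequencies supplied as a dictionary keyed by (allele at marker 1, allele at marker 2); the frequencies could come from phased haplotypes or from a statistical estimate. The function name is hypothetical.

```python
def d_prime(freq):
    """|D'| between two biallelic markers from haplotype frequencies keyed by (allele1, allele2).

    Missing keys are treated as haplotypes with frequency 0.
    """
    p11 = freq.get((1, 1), 0.0)
    p_a = p11 + freq.get((1, 0), 0.0)  # frequency of allele 1 at marker 1
    p_b = p11 + freq.get((0, 1), 0.0)  # frequency of allele 1 at marker 2
    d = p11 - p_a * p_b  # D: observed minus expected-under-independence haplotype frequency
    # D_max is the largest |D| attainable with these allele frequencies, for the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a marker is monomorphic")
    return abs(d) / d_max
```

For example, `d_prime({(1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.5})` returns 1.0: one of the four possible haplotypes is absent, which is the maximum LD possible given the allele frequencies, even though the two markers are far from perfectly correlated.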

About this article

Cite this article

Browning, S., Browning, B. Haplotype phasing: existing methods and new developments. Nat Rev Genet 12, 703–714 (2011). https://doi.org/10.1038/nrg3054

Issue Date: October 2011