Abstract
In this chapter, we review current statistical and computational approaches for genotype calling and haplotype phasing from next-generation sequencing data. We focus on statistical ideas and set aside many practical bioinformatics issues, such as image processing for base calling, read mapping, and sequencing error rate recalibration, each of which is a topic in its own right. We give derivations of commonly used approaches, emphasize their assumptions, and aim to unify them in an all-encompassing Bayesian framework. We also point out limitations of the single-site genotype likelihood methods that dominate current practice and discuss strategies for using haplotype-informative reads.
Acknowledgements
This work is partly supported by National Institutes of Health (NIH) grant R00 RR024163. Computational portions of this research were supported by NIH grant S10RR026723.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Zhi, D., Zhang, K. (2014). Genotype Calling and Haplotype Phasing from Next Generation Sequencing Data. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_16
DOI: https://doi.org/10.1007/978-3-319-07212-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07211-1
Online ISBN: 978-3-319-07212-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)