Genotype Calling and Haplotype Phasing from Next Generation Sequencing Data

Statistical Analysis of Next Generation Sequencing Data

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

In this chapter, we will review current statistical and computational approaches for genotype calling and haplotype phasing from next generation sequencing data. We will focus on statistical ideas and set aside many practical bioinformatics issues, such as image processing for base calling, read mapping, and sequencing error rate recalibration, each of which is a topic in its own right. We will give derivations of commonly used approaches, emphasize their assumptions, and aim to unify them in an all-encompassing Bayesian framework. We will point out limitations of the single-site genotype likelihood methods that dominate current practice and discuss strategies for using haplotype informative reads.
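
To make the phrase "single-site genotype likelihood" concrete, the following is a minimal sketch of one widely used textbook formulation, not necessarily the exact model derived in the chapter. It assumes a biallelic site in a diploid individual, independent reads, Phred-scaled base qualities greater than zero, errors spread evenly over the three other bases, and a Hardy-Weinberg prior with a known alternate allele frequency. The function names `genotype_likelihoods` and `genotype_posteriors` are hypothetical and chosen only for illustration.

```python
import math


def genotype_likelihoods(bases, quals, ref, alt):
    """Return [L(g=0), L(g=1), L(g=2)], where g counts copies of `alt`.

    Assumption: each read samples one of the two chromosomes with equal
    probability, and a base call with Phred quality q is wrong with
    probability 10**(-q/10), uniformly over the three other bases.
    """
    likelihoods = []
    for g in (0, 1, 2):
        per_read = []
        for base, q in zip(bases, quals):
            err = 10 ** (-q / 10)
            p_given_ref = 1 - err if base == ref else err / 3
            p_given_alt = 1 - err if base == alt else err / 3
            # Mixture over which chromosome the read was drawn from.
            per_read.append((g / 2) * p_given_alt + (1 - g / 2) * p_given_ref)
        # A plain product is fine for a sketch; at deep coverage one
        # would accumulate log-likelihoods to avoid underflow.
        likelihoods.append(math.prod(per_read))
    return likelihoods


def genotype_posteriors(likelihoods, alt_freq):
    """Combine likelihoods with a Hardy-Weinberg prior (an assumption)."""
    priors = [(1 - alt_freq) ** 2,
              2 * alt_freq * (1 - alt_freq),
              alt_freq ** 2]
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # Four reads covering one site: three show the reference base, one the alternate.
    liks = genotype_likelihoods("AAAG", [30, 30, 30, 30], ref="A", alt="G")
    print(genotype_posteriors(liks, alt_freq=0.2))
```

Because each site is treated on its own, this calculation cannot use the fact that a single read may cover several nearby variant sites, which is the kind of limitation that haplotype informative reads are meant to address.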

Acknowledgements

This work is supported in part by National Institutes of Health (NIH) grant R00 RR024163. Computational portions of this research were supported by NIH grant S10RR026723.

Author information

Corresponding author

Correspondence to Degui Zhi.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Zhi, D., Zhang, K. (2014). Genotype Calling and Haplotype Phasing from Next Generation Sequencing Data. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_16
