Abstract
Recent advances in high-throughput sequencing technologies promise a better characterization of genetic variation in humans and other organisms. On many occasions, either by design or by necessity, sequencing is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario occurs naturally in metagenomic analysis, where a pool of bacteria is sequenced, and in population studies that pool DNA by design. In particular, various pooling designs have recently been suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects.
A fundamental problem with such approaches for population studies is that uncertainty in the proportions of DNA from different individuals in the pools might lead to spurious associations. Fortunately, the genotype data of at least some of the individuals in the pool are often known. Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data to accurately estimate the proportions of the samples in the pool, even in cases where not all individuals in the pool were genotyped (eALPS-LD). Using real data from a pooled sequencing study of Non-Hodgkin’s Lymphoma, we demonstrate that estimating the proportions is crucial, since otherwise there is a risk of false discoveries. Additionally, we demonstrate that our approach is also applicable to the quantification of species in metagenomic samples (eALPS-BCR), and is particularly suitable for metagenomic quantification of closely related species.
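To make the estimation problem concrete, the following is a minimal sketch and not the eALPS algorithm itself, whose details are not given in this abstract. It assumes that every individual in the pool is genotyped, that the alternate-allele frequency observed in the pool at each SNP is the proportion-weighted average of the individuals' allele dosages, and that the proportions can be fitted by least squares constrained to the probability simplex. The function names, the toy genotypes, and the choice of projected gradient descent are all illustrative assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the largest valid k
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(genotypes, pool_freqs, iterations=2000):
    """Fit pool proportions by projected gradient descent.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of
    individual i at SNP j; pool_freqs[j] is the alternate-allele
    frequency observed in the pooled reads at SNP j.
    Assumes at least one non-zero genotype.
    """
    n, m = len(genotypes), len(pool_freqs)
    # Per-individual allele dosage in [0, 1].
    x = [[g / 2.0 for g in row] for row in genotypes]
    # Conservative step size: 1 / (2 * squared Frobenius norm of x).
    step = 1.0 / (2.0 * sum(v * v for row in x for v in row))
    p = [1.0 / n] * n  # start from equal proportions
    for _ in range(iterations):
        residual = [
            sum(p[i] * x[i][j] for i in range(n)) - pool_freqs[j]
            for j in range(m)
        ]
        grad = [
            2.0 * sum(residual[j] * x[i][j] for j in range(m))
            for i in range(n)
        ]
        p = project_to_simplex([p[i] - step * grad[i] for i in range(n)])
    return p


# Toy example (hypothetical data): 3 individuals, 6 SNPs, noiseless pool.
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 0],
    [1, 2, 0, 2, 2, 1],
]
true_proportions = [0.5, 0.3, 0.2]
pool_freqs = [
    sum(true_proportions[i] * genotypes[i][j] / 2.0 for i in range(3))
    for j in range(6)
]
estimated = estimate_proportions(genotypes, pool_freqs)
print([round(value, 3) for value in estimated])
```

With real reads the pool frequencies would be noisy estimates from finite coverage, so a likelihood-based fit over read counts would likely be more appropriate than plain least squares. The sketch is only meant to show why known genotypes make the proportions identifiable.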
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eskin, I. et al. (2013). eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data. In: Deng, M., Jiang, R., Sun, F., Zhang, X. (eds) Research in Computational Molecular Biology. RECOMB 2013. Lecture Notes in Computer Science, vol 7821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37195-0_4
DOI: https://doi.org/10.1007/978-3-642-37195-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37194-3
Online ISBN: 978-3-642-37195-0
eBook Packages: Computer Science, Computer Science (R0)