Abstract
Viral haplotype estimation in a population is an important problem in virology. Viruses undergo a high number of mutations and recombinations during replication for their survival in host cells and exist as a population of closely related genetic variants. Due to this, estimating the number of haplotypes and their relative frequencies in the population becomes a challenging task. The usage of a sequenced reference genome has its limitations due to the high mutational rates in viruses. We propose a method for estimating viral haplotypes based only on the counts of k-mers present in the viral population without using the reference genome. We compute k-mer pairs that are related to each other by one mutation, and compute a minimal set of viral haplotypes that explain the whole population based on these k-mer pairs. We compare our method to the software ShoRAH (which uses a reference genome) on simulated dataset and obtained comparable results, even without using a reference genome.
Chapter PDF
Similar content being viewed by others
Keywords
References
Astrovskaya, I., Tork, B., Mangul, S., Westbrooks, K., Măndoiu, I., Balfe, P., Zelikovsky, A.: Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 12(6) (2011)
Beerenwinkel, N., Gunthard, H.F., Roth, V., Metzner, K.J.: Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Frontiers in Microbiology 329(3) (2012)
Benjamini, Y., Speed, T.P.: Summarizing and correcting the gc content bias in high-throughput sequencing. Nucleic Acids Research 40(10), e72 (2012)
Boerlijst, M.C., Bonhoeffer, S., Nowak, M.A.: Viral quasi-species and recombination. Proceedings of the Royal Society of London. Series B: Biological Sciences 263(1376), 1577–1584 (1996)
Boeva, V., Zinovyev, A., Bleakley, K., Vert, J.-P., Janoueix-Lerosey, I., Delattre, O., Barillot, E.: Control-free calling of copy number alterations in deep-sequencing data using gc-content normalization. Bioinformatics 27(2), 268–269 (2011)
Collins, M.J., Kempe, D., Saia, J., Young, M.: Nonnegative integral subset representations of integer sets. Inf. Process. Lett. 101, 129–133 (2007)
Eigen, M., McCaskill, J., Schuster, P.: The molecular quasi-species. Adv. Chem. Phys. 75, 149–263 (1989)
Eriksson, N., Pachter, L., Mitsuya, Y., Rhee, S.-Y., Wang, C., Gharizadeh, B., Ronaghi, M., Shafer, R.W., Beerenwinkel, N.: Viral population estimation using pyrosequencing. PLoS Comput. Biol. 4(5), e1000074 (2008)
Hoffmann, C., Minkah, N., Leipzig, J., Wang, G., Arens, M.Q., Tebas, P., Bushman, F.D.: DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Research 35, 91 (2007)
Jojic, V., Hertz, T., Jojic, N.: Population sequencing using short reads: HIV as a case study. In: Proc. Pac. Symp. Biocomput., pp. 114–125 (2008)
Macalalad, A.R., Zody, M.C., Charlebois, P., Lennon, N.J., Newman, R.M., Malboeuf, C.M., Ryan, E.M., Boutwell, C.L., Power, K.A., Brackney, D.E., Pesko, K.N., Levin, J.Z., Ebel, G.D., Allen, T.M., Birren, B.W., Henn, M.R.: Highly sensitive and specific detection of rare variants in mixed viral populations from massively parallel sequence data. PLoS Comput. Biol. 8(3), e1002417 (2012)
Port, E., Sun, F., Martin, D., Waterman, M.S.: Genomic mapping by end characterized random clones: A mathematical analysis. Genomics 26, 84–100 (1995)
Prabhakara, S., Malhotra, R., Poss, M., Acharya, R.: Mutant Bin: Unsupervised Haplotype Estimation of Viral Population Diversity Without Reference Genome. Journal of Computational Biology (in press)
Prosperi, M., Prosperi, L., Bruselles, A., Abbate, I., Rozera, G., Vincenti, D., Solmone, M., Capobianchi, M., Ulivi, G.: Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing. BMC Bioinformatics 12, 5 (2011)
Richter, D.C., Ott, F., Auch, A.F., Schmid, R., Huson, D.H.: Metasim: A sequencing simulator for genomics and metagenomics. PLoS One 3, 3373 (2008)
Westbrooks, K., Astrovskaya, I., Campo, D., Khudyakov, Y., Berman, P., Zelikovsky, A.: HCV quasispecies assembly using network flows. In: Măndoiu, I., Wang, S.-L., Zelikovsky, A. (eds.) ISBRA 2008. LNCS (LNBI), vol. 4983, pp. 159–170. Springer, Heidelberg (2008)
Zagordi, O., Bhattacharya, A., Eriksson, N., Beerenwinkel, N.: ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12(1), 119 (2011)
Zagordi, O., Geyrhofer, L., Roth, V., Beerenwinkel, N.: Deep sequencing of a genetically heterogeneous sample: local haplotype reconstruction and read error correction. Journal of Computational Biology 17(3), 417–428 (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malhotra, R., Prabhakara, S., Poss, M., Acharya, R. (2013). Estimating Viral Haplotypes in a Population Using k-mer Counting. In: Ngom, A., Formenti, E., Hao, JK., Zhao, XM., van Laarhoven, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2013. Lecture Notes in Computer Science(), vol 7986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39159-0_24
Download citation
DOI: https://doi.org/10.1007/978-3-642-39159-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39158-3
Online ISBN: 978-3-642-39159-0
eBook Packages: Computer ScienceComputer Science (R0)