Abstract
String barcoding is a recently introduced technique for genome-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on the whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection yields a number of distinguishers that nearly matches the information-theoretic lower bound for the problem.
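To make the terms "distinguisher selection" and "information-theoretic lower bound" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a distinguisher is a substring that is present in some genomes and absent from others, so that each genome's barcode is its presence/absence vector, and it uses a plain set-cover-style greedy rule on toy sequences. The names `greedy_barcode`, `substrings`, and `lower_bound` are hypothetical, and the sketch ignores both the robustness requirements and the scalability engineering that the paper addresses. Since k binary distinguishers can separate at most 2^k genomes, at least ⌈log2 n⌉ distinguishers are needed for n genomes.

```python
from itertools import combinations


def substrings(seq, max_len):
    """Return all substrings of seq with length 1..max_len (candidate distinguishers)."""
    return {seq[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(seq) - k + 1)}


def greedy_barcode(genomes, max_len=3):
    """Greedily pick substrings until every pair of genomes is separated.

    genomes: dict mapping a genome name to its sequence (toy-sized input).
    Assumption: a substring separates a pair if it occurs in exactly one of the two.
    """
    names = sorted(genomes)
    unresolved = set(combinations(names, 2))
    candidates = sorted(set().union(*(substrings(g, max_len) for g in genomes.values())))
    chosen = []
    while unresolved:
        def gain(c):
            # Number of still-unresolved pairs that candidate c would separate.
            return sum((c in genomes[a]) != (c in genomes[b]) for a, b in unresolved)

        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("remaining genomes cannot be distinguished")
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if (best in genomes[a]) == (best in genomes[b])}
    return chosen


def lower_bound(n):
    """ceil(log2(n)): k binary distinguishers separate at most 2**k genomes."""
    return (n - 1).bit_length() if n > 1 else 0


toy = {"g1": "ACGT", "g2": "ACGA", "g3": "TTGA", "g4": "CCCC"}
picked = greedy_barcode(toy)
barcodes = {name: tuple(d in seq for d in picked) for name, seq in toy.items()}
```

Here `barcodes` maps each toy genome to a distinct presence/absence vector, and `len(picked)` can be compared against `lower_bound(len(toy))` to see how close the greedy choice comes to the bound on this small input.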
The work of BD was supported in part by NSF grants CCR-0206795, CCR-0208749 and NSF CAREER grant IIS-0346973. The work of KMK and AAS was supported in part by NSF ITR grant 0121277. The work of IIM was supported in part by a “Large Grant” from the University of Connecticut’s Research Foundation.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
DasGupta, B., Konwar, K.M., Măndoiu, I.I., Shvartsman, A.A. (2005). Highly Scalable Algorithms for Robust String Barcoding. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds) Computational Science – ICCS 2005. ICCS 2005. Lecture Notes in Computer Science, vol 3515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428848_129
DOI: https://doi.org/10.1007/11428848_129
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26043-1
Online ISBN: 978-3-540-32114-9
eBook Packages: Computer Science, Computer Science (R0)