Abstract
The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms that do not originate in nature. We explore the problem of designing the provably shortest genomic sequence that encodes a given set of genes by exploiting alternate reading frames. We present an algorithm for designing the shortest DNA sequence that simultaneously encodes two given amino acid sequences. We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an application of overlapping gene design: interleaving an antibiotic resistance gene into a target gene inserted into a virus or plasmid for amplification.
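To make the idea of encoding two proteins in alternate reading frames concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard genetic code, considers only same-strand overlaps in which the second protein starts at or after the first, and ignores stop codons and any biological constraints. The names `design` and `shortest_overlap` are hypothetical and introduced only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, starting at position 0."""
    return "".join(CODON[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def design(a, b, k):
    """Return one DNA string encoding protein `a` from position 0 and
    protein `b` from position `k` (k >= 0), or None if none exists."""
    n = max(3 * len(a), k + 3 * len(b))
    # required[i] lists the amino acids that the codon ending at i must encode.
    required = [[] for _ in range(n)]
    for j, aa in enumerate(a):
        required[3 * j + 2].append(aa)
    for j, aa in enumerate(b):
        required[k + 3 * j + 2].append(aa)
    # Dynamic program over prefixes. Later constraints depend only on the last
    # two nucleotides, so one surviving prefix per two-letter suffix is enough.
    states = {"": ""}
    for i in range(n):
        new_states = {}
        for prefix in states.values():
            for base in BASES:
                s = prefix + base
                if all(CODON[s[-3:]] == aa for aa in required[i]):
                    new_states.setdefault(s[-2:], s)
        states = new_states
    return next(iter(states.values()), None)


def shortest_overlap(a, b):
    """Try offsets in increasing order; the first feasible offset gives the
    shortest sequence under the assumptions stated above."""
    for k in range(3 * len(a) + 1):
        dna = design(a, b, k)
        if dna is not None:
            return k, dna
```

For example, `shortest_overlap("M", "W")` returns `(1, "ATGG")`: methionine is read from `ATG` at position 0 and tryptophan from `TGG` at position 1, so four nucleotides suffice where plain concatenation would need six. The offset `k = 3 * len(a)` is always feasible because it corresponds to concatenation, so the search always terminates with an answer.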
This research was partially supported by NSF grants EIA-0325123 and DBI-0444815.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, B., Papamichail, D., Mueller, S., Skiena, S. (2006). Two Proteins for the Price of One: The Design of Maximally Compressed Coding Sequences. In: Carbone, A., Pierce, N.A. (eds) DNA Computing. DNA 2005. Lecture Notes in Computer Science, vol 3892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11753681_31
DOI: https://doi.org/10.1007/11753681_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34161-1
Online ISBN: 978-3-540-34165-9
eBook Packages: Computer Science (R0)