Abstract
In this paper we propose a chaining method that can align a draft genomic sequence against a finished genome. We introduce the use of an overlap tree to enhance the state information available to the chaining procedure in the context of sparse dynamic programming, and demonstrate that the resulting procedure more accurately penalizes the various biological rearrangements. The algorithm is tested on a whole-genome alignment of seven yeast species. We also demonstrate a variation of the algorithm that can be used for co-assembly of two genomes, and show how it can improve the current assembly of the Ciona savignyi (sea squirt) genome.
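For readers unfamiliar with chaining, the following is a minimal sketch of plain colinear anchor chaining by dynamic programming over anchors only (the "sparse" part: the recurrence visits match points, not every cell of a full alignment matrix). It assumes anchors are ungapped matches with a precomputed score and uses a hypothetical linear gap penalty `gap_cost * |gx - gy|`. The `Anchor` class and `chain_anchors` function are illustrative names, not the paper's code, and the sketch does not model the overlap tree or the rearrangement-aware penalties the paper introduces. The quadratic loop is chosen for clarity; faster sparse dynamic programming techniques exist.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Anchor:
    """Ungapped match: draft[x:x+length] aligns to finished[y:y+length]."""
    x: int        # start in the draft sequence
    y: int        # start in the finished genome
    length: int   # assumed >= 1
    score: float

    @property
    def x_end(self) -> int:  # exclusive end in the draft
        return self.x + self.length

    @property
    def y_end(self) -> int:  # exclusive end in the finished genome
        return self.y + self.length


def chain_anchors(anchors: List[Anchor], gap_cost: float = 0.1) -> Tuple[float, List[Anchor]]:
    """Return (best_score, best_chain) over colinear, non-overlapping anchors.

    Hypothetical scoring (not the paper's): sum of anchor scores minus
    gap_cost * |gx - gy| per link, where gx and gy are the gaps between
    consecutive anchors in the draft and the finished genome.
    """
    if not anchors:
        return 0.0, []
    # Any valid predecessor starts strictly earlier in x, so sorting by x
    # guarantees predecessors are scored before their successors.
    order = sorted(anchors, key=lambda a: (a.x, a.y))
    best = [a.score for a in order]  # best chain score ending at each anchor
    prev = [-1] * len(order)         # back-pointers for traceback
    for j, b in enumerate(order):
        for i in range(j):
            a = order[i]
            # a may precede b only if it ends before b starts in both sequences
            if a.x_end <= b.x and a.y_end <= b.y:
                gx, gy = b.x - a.x_end, b.y - a.y_end
                cand = best[i] + b.score - gap_cost * abs(gx - gy)
                if cand > best[j]:
                    best[j], prev[j] = cand, i
    # Trace back from the highest-scoring chain end.
    j = max(range(len(order)), key=best.__getitem__)
    top = best[j]
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    chain.reverse()
    return top, chain
```

For example, with anchors at (0, 0), (20, 22) and an off-diagonal hit at (15, 100), `chain_anchors` links the first two and leaves the off-diagonal anchor out, because it overlaps the (20, 22) anchor in the draft and its large gap difference makes it a poor extension of the chain.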
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sundararajan, M., Brudno, M., Small, K., Sidow, A., Batzoglou, S. (2004). Chaining Algorithms for Alignment of Draft Sequence. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_28
DOI: https://doi.org/10.1007/978-3-540-30219-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive