Abstract
Comparison of large, unfinished genomic sequences requires fast methods that are robust to misordering, misorientation, and duplications. A number of fast methods exist that can compute local similarities between such sequences, from which an optimal one-to-one correspondence might be desired. However, existing methods for computing such a correspondence are either too costly to run or are inappropriate for unfinished sequence. We propose an efficient method for refining a set of segment matches such that the resulting segments are of maximal size without non-identity overlaps. This resolved set of segments can be used in various ways to compute a similarity measure between any two large sequences, and hence can be used in alignment, matching, or tree construction algorithms for two or more sequences.
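To make the refinement idea concrete, the following is a minimal sketch and not the efficient method the paper proposes. It assumes each segment match is an ungapped, forward-oriented triple `(a_start, b_start, length)` with half-open coordinates between two sequences A and B. The function name `refine_matches` and this representation are illustrative assumptions. The sketch propagates cut points through the matches until no new cut appears, then splits every match at the cuts, so that any two resulting segments on the same sequence are either identical or disjoint.

```python
from collections import deque


def refine_matches(matches):
    """Naively refine ungapped forward matches (a_start, b_start, length).

    Hypothetical helper for illustration only: it runs a simple fixed-point
    propagation of cut points, not the paper's efficient algorithm.
    """
    cuts = {"A": set(), "B": set()}
    queue = deque()

    def add_cut(seq, pos):
        # Record a new cut point and schedule it for propagation.
        if pos not in cuts[seq]:
            cuts[seq].add(pos)
            queue.append((seq, pos))

    # Every match endpoint is a cut point on its own sequence.
    for a, b, n in matches:
        add_cut("A", a)
        add_cut("A", a + n)
        add_cut("B", b)
        add_cut("B", b + n)

    # A cut strictly inside one side of a match induces the corresponding
    # cut on the other side; repeat until no new cuts appear.
    while queue:
        seq, pos = queue.popleft()
        for a, b, n in matches:
            if seq == "A" and a < pos < a + n:
                add_cut("B", b + (pos - a))
            elif seq == "B" and b < pos < b + n:
                add_cut("A", a + (pos - b))

    # Split each match at the cuts that fall strictly inside it.
    refined = set()
    for a, b, n in matches:
        inner = sorted(p for p in cuts["A"] if a < p < a + n)
        bounds = [a] + inner + [a + n]
        for lo, hi in zip(bounds, bounds[1:]):
            refined.add((lo, b + (lo - a), hi - lo))
    return sorted(refined)


if __name__ == "__main__":
    # Two matches whose A-side segments [0, 6) and [4, 10) overlap.
    print(refine_matches([(0, 10, 6), (4, 20, 6)]))
```

Because only cuts forced by an existing overlap are introduced, the pieces are as large as this simple scheme allows. The loop rescans all matches for every cut, which is far too slow for genome-scale input and is exactly the cost an efficient method has to avoid.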
New address: WSI-AB, Tübingen University, Sand 14, 72076 Tübingen, Germany.
© 2002 Springer-Verlag Berlin Heidelberg
Halpern, A.L., Huson, D.H., Reinert, K. (2002). Segment Match Refinement and Applications. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_10
DOI: https://doi.org/10.1007/3-540-45784-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8