Abstract
We describe a supervised learning approach to resolve difficulties in finding biologically significant local alignments. The O(n²) Smith-Waterman algorithm, the prevalent tool for computing local sequence alignments, has been observed to often output long, meaningless alignments while ignoring shorter, biologically significant ones. Arslan et al. proposed an O(n² log n) algorithm that outputs a normalized local alignment, one that maximizes the degree of similarity rather than the total similarity score. Given a properly selected normalization parameter, their algorithm can discover significant alignments that the Smith-Waterman algorithm would miss. Unfortunately, determining a proper normalization parameter requires repeated executions with different parameter values, plus expert feedback to judge the usefulness of the resulting alignments. We propose a learning approach that uses existing biologically significant alignments to learn parameters for intelligently processing sub-optimal Smith-Waterman alignments. Our algorithm runs in O(n²) time and can discover biologically significant alignments without requiring expert feedback to produce meaningful results.
Supported by a grant from Rensselaer Polytechnic Institute.
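To make the contrast between total score and degree of similarity concrete, the following is a minimal sketch of the standard O(n²) Smith-Waterman recurrence, together with a length-normalized score. It is not the authors' learning method, nor the O(n² log n) algorithm of Arslan et al. The scoring values (match = 1, mismatch = -1, linear gap = -1), the function names, and the normalized form S / (|a'| + |b'| + L), with L standing in for the normalization parameter, are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty.

    Returns (score, (a_start, a_end), (b_start, b_end)), where the aligned
    substrings are a[a_start:a_end] and b[b_start:b_end].
    """
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # origin[i][j]: cell at which the alignment ending at (i, j) starts.
    origin = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (H[i - 1][j - 1] + s, origin[i - 1][j - 1]),  # substitution
                (H[i - 1][j] + gap, origin[i - 1][j]),        # gap in b
                (H[i][j - 1] + gap, origin[i][j - 1]),        # gap in a
            ]
            score, start = max(candidates, key=lambda c: c[0])
            if score <= 0:
                # A local alignment may restart anywhere with score 0.
                score, start = 0, (i, j)
            H[i][j] = score
            origin[i][j] = start
            if score > best:
                best, best_cell = score, (i, j)
    (i0, j0), (i1, j1) = origin[best_cell[0]][best_cell[1]], best_cell
    return best, (i0, i1), (j0, j1)


def normalized_score(score, len_a, len_b, L):
    """Assumed normalized form: score divided by aligned lengths plus L."""
    return score / (len_a + len_b + L)


if __name__ == "__main__":
    score, (a0, a1), (b0, b1) = smith_waterman("XXACGTXX", "YYACGTYY")
    print(score, (a0, a1), (b0, b1))
    print(normalized_score(score, a1 - a0, b1 - b0, L=10))
```

Under this assumed form, a long alignment with a high total score but many mismatches and gaps can receive a lower normalized score than a short, nearly exact one, which is the effect the abstract describes. How strongly length is penalized depends on the choice of L.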
References
Alexandrov, N., Solovyev, V.: Statistical significance of ungapped alignments. Pacific Symp. on Biocomputing (1998) 463–472
Altschul, S., Erickson, B.: Significance levels for biological sequence comparison using nonlinear similarity functions. Bulletin of Mathematical Biology 50 (1988) 77–92
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped Blast and Psi-Blast: a new generation of protein database search programs. Nucleic Acids Research 25 (1997) 3389–3402
Arslan, A., Egecioglu, Ö., Pevzner, P.: A new approach to sequence comparison: normalized sequence alignment. Proceedings of the Fifth Annual International Conference on Molecular Biology (2001) 2–11
Arslan, A., Egecioglu, Ö.: An efficient uniform-cost normalized edit distance algorithm. 6th Symp. on String Processing and Info. Retrieval (1999) 8–15
Bafna, V., Huson, D.: The conserved exon method of gene finding. Proc. of the 8th Int. Conf. on Intelligent Systems for Molecular Bio. (2000) 3–12
Barton, G.: An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps. Computer Applications in the Biosciences 9 (1993) 729–734
Batzoglou, S., Pachter, L., Mesirov, J., Berger, B., Lander, E.: Comparative analysis of mouse and human DNA and application to exon prediction. Proc. of the 4th Annual Int. Conf. on Computational Molecular Biology (2000) 46–53
Dinkelbach, W.: On nonlinear fractional programming. Management Science 13 (1967) 492–498
Gelfand, M., Mironov, A., Pevzner, P.: Gene recognition via spliced sequence alignment. Proc. Natl. Acad. Sci. USA 93 (1996) 9061–9066
Goad, W., Kanehisa, M.: Pattern recognition in nucleic acid sequences: a general method for finding local homologies and symmetries. Nucleic Acids Research 10 (1982) 247–263
Huang, X., Pevzner, P., Miller, W.: Parametric recomputing in alignment graph. Proc. of the 5th Annual Symp. on Comb. Pat. Matching (1994) 87–101
Oommen, B., Zhang, K.: The normalized string editing problem revisited. IEEE Trans. on PAMI 18 (1996) 669–672
Sellers, P.: Pattern recognition in genetic sequences by mismatch density. Bull. of Math. Bio. 46 (1984) 501–504
Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147 (1981) 195–197
Vidal, E., Marzal, A., Aibar, P.: Fast computation of normalized edit distances. IEEE Trans. on PAMI 17 (1995) 899–902
Zhang, Z., Berman, P., Miller, W.: Alignments without low-scoring regions. J. Comput. Biol. 5 (1998) 197–200
Zhang, Z., Berman, P., Wiehe, T., Miller, W.: Post-processing long pairwise alignments. Bioinformatics 15 (1999) 1012–1019
Zuker, M.: Suboptimal sequence alignment in molecular biology: alignment with error analysis. Journal of Molecular Biology 221 (1991) 403–420
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Breimer, E., Goldberg, M. (2002). Learning Significant Alignments: An Alternative to Normalized Local Alignment. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_6
DOI: https://doi.org/10.1007/3-540-48050-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43785-7
Online ISBN: 978-3-540-48050-1
eBook Packages: Springer Book Archive