Abstract
The best known algorithm computes the sensitivity of a given spaced seed on a random region in O((M + L)|B|) time, where M is the length of the seed, L is the length of the random region, and |B| is the size of the seed-compatible suffix set, which is exponential in the number of 0's in the seed. We developed two algorithms that improve on this running time: the first runs in O(|B′|^2 ML) time, where B′ is a subset of B; the second runs in O((M|B|)^2.236 log(L/M)) time, which is much smaller than the original running time when L is large. We also developed a Monte Carlo algorithm that is guaranteed to quickly find a near-optimal seed with high probability.
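To make the quantity being computed concrete, the following is a minimal sketch of a naive sensitivity computation. It is not one of the algorithms summarized above. It assumes the seed is written as a string of 1's (positions that must match) and 0's (don't-care positions), and that each position of the random region is an independent match with probability p; the abstract states neither assumption. The function names are hypothetical. The dynamic program tracks all 2^(M-1) possible suffixes rather than the compatible-suffix set B, so it only illustrates why the state set is the bottleneck.

```python
from itertools import product


def seed_hits(seed, region):
    """Return True if the seed hits the 0/1 region at some offset."""
    m = len(seed)
    ones = [i for i, c in enumerate(seed) if c == "1"]
    return any(
        all(region[s + i] for i in ones)
        for s in range(len(region) - m + 1)
    )


def sensitivity_dp(seed, length, p):
    """Hit probability of the seed on a random region of the given length.

    Naive DP: the state is the last M-1 positions of every prefix that
    has not been hit yet (assumed i.i.d. match probability p).
    """
    m = len(seed)
    if length < m:
        return 0.0
    ones = [i for i, c in enumerate(seed) if c == "1"]
    states = {(): 1.0}  # suffix -> probability mass with no hit so far
    for _ in range(length):
        nxt = {}
        for suffix, prob in states.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                if len(window) == m and all(window[i] for i in ones):
                    continue  # a hit: this mass leaves the no-hit set
                key = window[-(m - 1):] if m > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + prob * weight
        states = nxt
    return 1.0 - sum(states.values())


def sensitivity_brute(seed, length, p):
    """Reference value obtained by enumerating all 2^length regions."""
    total = 0.0
    for region in product((0, 1), repeat=length):
        if seed_hits(seed, region):
            k = sum(region)
            total += p ** k * (1.0 - p) ** (length - k)
    return total
```

For short regions, `sensitivity_dp("1101", 8, 0.7)` can be checked against `sensitivity_brute("1101", 8, 0.7)`; the brute-force version is only practical for small lengths.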
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, X., Li, S.C., Lu, Y. (2007). New Algorithms for the Spaced Seeds. In: Preparata, F.P., Fang, Q. (eds) Frontiers in Algorithmics. FAW 2007. Lecture Notes in Computer Science, vol 4613. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73814-5_5
DOI: https://doi.org/10.1007/978-3-540-73814-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73813-8
Online ISBN: 978-3-540-73814-5
eBook Packages: Computer Science, Computer Science (R0)