Abstract
This paper focuses on the 2-interval pattern matching problem for {<, ⊂}-structured patterns and applies it to scanning for ncRNAs without pseudoknots. Vialette [6] gave an O(mn^3 log n) time solution to the problem, where m and n are the numbers of intervals in the pattern and in the given 2-interval set, respectively. This solution, however, is not practical for scanning for secondary structures on a genome-wide or chromosome-wide scale. In this paper, we propose an efficient algorithm that solves the problem in O(mn log n) time. To capture more characteristics of the secondary structures of ncRNA families, we define a new problem that takes into account distance constraints between the intervals, and we can still solve it without increasing the time complexity. Experiments show that the method for the newly defined problem produces far fewer false positives. Moreover, if we assume that the only possible base pairs are {(A,U), (C,G), (U,G)}, as is the case for RNA molecules, we can further improve the time complexity to O(mq), where q is the length of the input RNA sequence. In our experiments, the new method requires a reasonable time (2.5 min) to scan a whole chromosome for an ncRNA family.
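To make the terminology concrete, the following is a minimal sketch of how two 2-intervals can relate to each other and of the base-pair restriction mentioned above. It is not the algorithm proposed in the paper. It assumes the usual formulation in which a 2-interval is a pair of disjoint intervals with the first strictly to the left of the second, and the names `Interval`, `TwoInterval`, `relation` and `can_pair` are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    start: int  # inclusive
    end: int    # inclusive


class TwoInterval(NamedTuple):
    left: Interval   # e.g. the 5' arm of a stem
    right: Interval  # e.g. the 3' arm of a stem


def before(a: Interval, b: Interval) -> bool:
    """True if interval a lies strictly to the left of interval b."""
    return a.end < b.start


def relation(d1: TwoInterval, d2: TwoInterval) -> Optional[str]:
    """Classify how d1 relates to d2 (assumed standard definitions)."""
    if before(d1.right, d2.left):
        return "precedes"   # d1 < d2
    if before(d2.left, d1.left) and before(d1.right, d2.right):
        return "nested"     # d1 ⊂ d2
    if (before(d1.left, d2.left) and before(d2.left, d1.right)
            and before(d1.right, d2.right)):
        return "crossing"   # pseudoknot-like, outside {<, ⊂}
    return None             # none of the above in this direction


# Base pairs allowed under the assumption stated in the abstract,
# listed in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"),
                 ("G", "C"), ("U", "G"), ("G", "U")}


def can_pair(x: str, y: str) -> bool:
    """True if bases x and y may form a base pair."""
    return (x.upper(), y.upper()) in ALLOWED_PAIRS


outer = TwoInterval(Interval(0, 2), Interval(8, 10))
inner = TwoInterval(Interval(3, 4), Interval(6, 7))
later = TwoInterval(Interval(12, 13), Interval(16, 17))
print(relation(inner, outer), relation(outer, later))
```

A pattern is {<, ⊂}-structured when every pair of its 2-intervals is related by precedence or nesting, which is why crossing pairs (pseudoknots) are excluded from the scan.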
References
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R.: Rfam: An RNA family database. Nucleic Acids Research 31(1), 439–441 (2003)
Noncoding RNA database, http://biobases.ibch.poznan.pl/ncRNA
Klein, R., Eddy, S.: RSEARCH: Finding homologs of single structured RNA sequences. BMC Bioinformatics 4(1), 44 (2003)
Wong, T., Chiu, Y.S., Lam, T.-W., Yiu, S.M.: A memory efficient algorithm for structural alignment of RNAs with embedded simple pseudoknots. In: Proceedings of the 6th Asia-Pacific Bioinformatics Conference, pp. 89–99 (2008)
Zhang, S., Hass, B., Eskin, E., Bafna, V.: Searching genomes for noncoding RNA using FastR. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2(4) (2005)
Vialette, S.: On the computational complexity of 2-interval pattern matching problems. Theor. Comput. Sci. 312(2-3), 223–249 (2004)
Blin, G., Fertin, G., Vialette, S.: New results for the 2-interval pattern problem. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 311–322. Springer, Heidelberg (2004)
Chen, E., Yang, L., Yuan, H.: Improved algorithms for largest cardinality problem. J. Comb. Optim. 13(3), 263–275 (2007)
Crochemore, M., Hermelin, D., Landau, G.M., Vialette, S.: Approximating the 2-interval pattern problem. In: Brodal, G.S., Leonardi, S. (eds.) ESA 2005. LNCS, vol. 3669, pp. 426–437. Springer, Heidelberg (2005)
Nawrocki, E.P., Eddy, S.R.: Query-Dependent Banding (QDB) for faster RNA similarity searches. PLoS Computational Biology 3(3), 540–554 (2007)
Weinberg, Z., Ruzzo, W.L.: Faster genome annotation of non-coding RNA families without loss of accuracy. In: Proceedings of the 8th Annual International Conference on Computational Molecular Biology (RECOMB) (2004)
Weinberg, Z., Ruzzo, W.L.: Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics 22(1), 35–39 (2006)
Yoon, B.-J., Vaidyanathan, P.P.: Fast structural similarity search of noncoding RNAs based on matched filtering of stem patterns. In: IEEE conference on Signals, Systems and Computers (ACSSC 2007), pp. 44–48 (2007)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wong, T.K.F., Yiu, S.M., Lam, T.W., Sung, W.-K. (2009). The 2-Interval Pattern Matching Problems and Its Application to ncRNA Scanning. In: Rajasekaran, S. (ed.) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_10
DOI: https://doi.org/10.1007/978-3-642-00727-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00726-2
Online ISBN: 978-3-642-00727-9
eBook Packages: Computer Science (R0)