Product grammars for alignment and folding

Published: 01 May 2015

Abstract

We develop a theory of algebraic operations over linear and context-free grammars that makes it possible to combine simple "atomic" grammars operating on single sequences into complex, multi-dimensional grammars. We demonstrate the utility of this framework by constructing the search spaces of complex alignment problems on multiple input sequences explicitly as algebraic expressions of very simple one-dimensional grammars. In particular, we provide a fully worked frameshift-aware, semiglobal DNA-protein alignment algorithm whose grammar is composed of products of small, atomic grammars. The compiler accompanying our theory makes it easy to experiment with the combination of multiple grammars and different operations. Composite grammars can be written out in LaTeX for documentation and as a guide to the implementation of dynamic programming algorithms. An embedding in Haskell as a domain-specific language makes it possible to write and use grammar products directly, without the detour of an external compiler. Software and supplemental files are available at http://www.bioinf.uni-leipzig.de/Software/gramprod/
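The idea of building a multi-dimensional search space from one-dimensional pieces can be illustrated with a minimal sketch. It is not the formalism, compiler, or Haskell embedding described in the abstract. It assumes a hypothetical atomic move set in which each sequence either consumes one character or emits a gap, takes the Cartesian product of these move sets across sequences, and runs a toy sum-of-pairs dynamic program over the result. The names `ATOMIC_MOVES`, `product_moves`, `column_score`, and `align_score`, as well as the scoring values, are assumptions made for illustration only.

```python
from functools import lru_cache
from itertools import product

# Hypothetical atomic "grammar" for one sequence: a move either
# consumes one character (1) or emits a gap (0).
ATOMIC_MOVES = (1, 0)


def product_moves(dims):
    """Combine `dims` atomic move sets and drop the all-gap column."""
    return [m for m in product(ATOMIC_MOVES, repeat=dims) if any(m)]


def column_score(chars):
    """Toy sum-of-pairs score: +1 match, -1 mismatch or gap, 0 for gap/gap."""
    score = 0
    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            a, b = chars[i], chars[j]
            if a is None and b is None:
                continue  # two gaps facing each other cost nothing
            score += 1 if a == b else -1
    return score


def align_score(*seqs):
    """Best global alignment score over the product move set."""
    moves = product_moves(len(seqs))

    @lru_cache(maxsize=None)
    def best(pos):
        if not any(pos):
            return 0  # all sequences fully consumed
        candidates = []
        for move in moves:
            prev = tuple(p - m for p, m in zip(pos, move))
            if min(prev) < 0:
                continue  # move would read past the start of a sequence
            column = tuple(
                s[p - 1] if m else None for s, p, m in zip(seqs, pos, move)
            )
            candidates.append(best(prev) + column_score(column))
        return max(candidates)

    return best(tuple(len(s) for s in seqs))


if __name__ == "__main__":
    print(product_moves(2))
    print(align_score("ACGT", "ACGT"))
```

For two sequences the product yields the three familiar cases of pairwise global alignment (match or mismatch, gap in the second sequence, gap in the first). For three sequences it yields seven column types without any change to the recursion, which is the practical appeal of composing the search space rather than writing it by hand.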

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 12, Issue 3
May/June 2015
227 pages
ISSN:1545-5963
  • Editor: Ying Xu

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 May 2015
Accepted: 10 May 2014
Revised: 29 April 2014
Received: 12 December 2013
Published in TCBB Volume 12, Issue 3

Author Tags

  1. Haskell
  2. context free grammar
  3. linear grammar
  4. multiple alignment
  5. product structure

Qualifiers

  • Article
