
Staged parser combinators for efficient data processing

Published: 15 October 2014

Abstract

Parsers are ubiquitous in computing, and many applications depend on their performance for decoding data efficiently. Parser combinators are an intuitive tool for writing parsers: tight integration with the host language enables grammar specifications to be interleaved with processing of parse results. Unfortunately, parser combinators are typically slow due to the high overhead of the host-language abstraction mechanisms that enable composition.
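For readers unfamiliar with the style, here is a minimal sketch of conventional, unstaged parser combinators in Scala, loosely following the operator names (`~`, `|`, `rep`) of Scala's parser-combinator library. All names (`SimpleCombinators`, `Parser`, `Result`, `parseAll`, and so on) are illustrative assumptions, not the paper's API. The sketch shows where the overhead comes from: each combinator is an object wrapping closures, every step allocates intermediate `Success` values and tuples, and `rep` constructs new parser objects while parsing is in progress.

```scala
// Minimal, unstaged parser-combinator sketch (hypothetical names, not the paper's library).
object SimpleCombinators {
  // Outcome of running a parser at some position.
  sealed trait Result[+T]
  final case class Success[+T](value: T, next: Int) extends Result[T]
  case object Failure extends Result[Nothing]

  // A parser reads `in` starting at `pos`. Every combinator below wraps closures
  // and allocates intermediate results at parse time.
  abstract class Parser[+T] { self =>
    def apply(in: String, pos: Int): Result[T]

    // Sequence: run this parser, then `that` from where this one stopped.
    def ~[U](that: => Parser[U]): Parser[(T, U)] = new Parser[(T, U)] {
      def apply(in: String, pos: Int): Result[(T, U)] = self(in, pos) match {
        case Success(a, p1) =>
          that(in, p1) match {
            case Success(b, p2) => Success((a, b), p2) // allocates a tuple and a Success
            case Failure        => Failure
          }
        case Failure => Failure
      }
    }

    // Ordered choice: try this parser, otherwise `that` at the same position.
    def |[U >: T](that: => Parser[U]): Parser[U] = new Parser[U] {
      def apply(in: String, pos: Int): Result[U] = self(in, pos) match {
        case s @ Success(_, _) => s
        case Failure           => that(in, pos)
      }
    }

    // Transform the parsed value.
    def map[U](f: T => U): Parser[U] = new Parser[U] {
      def apply(in: String, pos: Int): Result[U] = self(in, pos) match {
        case Success(a, p) => Success(f(a), p)
        case Failure       => Failure
      }
    }

    // Zero or more repetitions; the by-name arguments mean new parser
    // objects are built on every iteration, during parsing.
    def rep: Parser[List[T]] =
      (this ~ rep).map { case (h, t) => h :: t } | succeed(List.empty[T])
  }

  // Always succeeds without consuming input.
  def succeed[A](a: A): Parser[A] = new Parser[A] {
    def apply(in: String, pos: Int): Result[A] = Success(a, pos)
  }

  // Accept one character satisfying a predicate.
  def acceptIf(p: Char => Boolean): Parser[Char] = new Parser[Char] {
    def apply(in: String, pos: Int): Result[Char] =
      if (pos < in.length && p(in.charAt(pos))) Success(in.charAt(pos), pos + 1)
      else Failure
  }

  def char(c: Char): Parser[Char] = acceptIf(_ == c)
  val digit: Parser[Char] = acceptIf(_.isDigit)

  // One or more digits, read as an Int.
  lazy val number: Parser[Int] =
    (digit ~ digit.rep).map { case (d, ds) => (d :: ds).mkString.toInt }

  // Comma-separated numbers, e.g. "1,22,333".
  lazy val numbers: Parser[List[Int]] =
    (number ~ (char(',') ~ number).map(_._2).rep).map { case (n, ns) => n :: ns }

  // Run a parser and require it to consume the whole input.
  def parseAll[T](p: Parser[T], in: String): Option[T] = p(in, 0) match {
    case Success(v, next) if next == in.length => Some(v)
    case _                                     => None
  }
}
```

With this sketch, `parseAll(numbers, "1,22,333")` yields `Some(List(1, 22, 333))`, while `"1,,2"` yields `None` because the second comma is not followed by a digit.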
We present a technique for eliminating such overhead. We use staging, a form of runtime code generation, to dissociate input parsing from parser composition, and eliminate intermediate data structures and computations associated with parser composition at staging time. A key challenge is to maintain support for input-dependent grammars, which have no clear stage distinction.
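Input-dependent grammars are the case in which composition and parsing cannot be cleanly separated: the parser to run next is computed from a value just read from the input. A length-prefixed field is the classic instance (an HTTP message body sized by its `Content-Length` header behaves the same way). The sketch below is again unstaged and uses hypothetical names (`InputDependent`, `take`, `field`); it encodes a netstring-style field `<len>:<payload>,` with `flatMap`. Because `take(n)` cannot be constructed until `n` has been parsed, the full parser structure is not known before the input is seen. The sketch illustrates the problem only, not the paper's staged solution.

```scala
// Input-dependent grammar sketch (hypothetical names; unstaged for clarity).
object InputDependent {
  // A parser maps (input, position) to an optional (value, next position).
  final case class Parser[+T](run: (String, Int) => Option[(T, Int)]) {
    def map[U](f: T => U): Parser[U] =
      Parser[U]((in, pos) => run(in, pos).map { case (v, next) => (f(v), next) })

    // Monadic bind: the continuation builds the next parser from a parsed value.
    def flatMap[U](f: T => Parser[U]): Parser[U] =
      Parser[U]((in, pos) => run(in, pos).flatMap { case (v, next) => f(v).run(in, next) })
  }

  // Accept one character satisfying a predicate.
  def acceptIf(p: Char => Boolean): Parser[Char] =
    Parser[Char]((in, pos) =>
      if (pos < in.length && p(in.charAt(pos))) Some((in.charAt(pos), pos + 1)) else None)

  def char(c: Char): Parser[Char] = acceptIf(_ == c)

  // One or more decimal digits, read as an Int.
  val number: Parser[Int] = Parser[Int] { (in, pos) =>
    var stop = pos
    while (stop < in.length && in.charAt(stop).isDigit) stop += 1
    if (stop > pos) Some((in.substring(pos, stop).toInt, stop)) else None
  }

  // Consume exactly n characters; n is only known at parse time.
  def take(n: Int): Parser[String] = Parser[String]((in, pos) =>
    if (n >= 0 && pos + n <= in.length) Some((in.substring(pos, pos + n), pos + n)) else None)

  // "<len>:<payload>," : the payload parser depends on the parsed length.
  val field: Parser[String] = for {
    n       <- number
    _       <- char(':')
    payload <- take(n)
    _       <- char(',')
  } yield payload

  // Run a parser and require it to consume the whole input.
  def parseAll[T](p: Parser[T], in: String): Option[T] =
    p.run(in, 0).collect { case (v, next) if next == in.length => v }
}
```

Because the length is read first, `"3:a,b,"` parses to `Some("a,b")` even though the payload contains the terminating comma, and `"5:hi,"` fails because fewer than five characters remain after the colon.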
Our approach applies to top-down recursive-descent parsers as well as bottom-up non-deterministic parsers, with key applications in dynamic programming on sequences, for which we auto-generate code for parallel hardware. We achieve performance comparable to specialized, hand-written parsers.
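Dynamic programming on sequences can be phrased as bottom-up parsing: a grammar enumerates every way to decompose a subword (hence the non-determinism), and a choice function such as `max` keeps the best result, as in algebraic dynamic programming. A standard example is Nussinov-style RNA folding, which maximizes the number of non-crossing base pairs. The following is a hand-written Scala sketch of that recurrence, tabulated over subwords of increasing length with the simplest scoring (one point per pair, no minimum hairpin length). The pairing rule and all names are assumptions, and this is not code produced by the paper's system.

```scala
// Nussinov-style base-pair maximization, tabulated bottom-up over subwords.
// Hypothetical names and scoring; illustrative only.
object NussinovSketch {
  // Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
  private val canPair: Set[(Char, Char)] =
    Set(('A', 'U'), ('U', 'A'), ('G', 'C'), ('C', 'G'), ('G', 'U'), ('U', 'G'))

  // Informal grammar over a subword:  S -> eps | x S | x S y S   (x and y must pair).
  // Each production yields candidate scores; the choice function keeps the best.
  def maxPairs(seq: String): Int = {
    val n = seq.length
    // best(i)(j) = optimal score for the half-open subword seq[i, j);
    // best(i)(i) = 0 corresponds to S -> eps.
    val best = Array.fill(n + 1, n + 1)(0)
    for (len <- 1 to n; i <- 0 to n - len) {
      val j = i + len
      // S -> x S : seq(i) stays unpaired.
      val unpaired = best(i + 1)(j)
      // S -> x S y S : seq(i) pairs with some seq(k), splitting the subword at k.
      val paired = for {
        k <- (i + 1) until j
        if canPair((seq(i), seq(k)))
      } yield 1 + best(i + 1)(k) + best(k + 1)(j)
      // Choice function: maximize over all alternative parses of this subword.
      best(i)(j) = (unpaired +: paired).max
    }
    best(0)(n)
  }
}
```

Every cell for subwords of a given length depends only on strictly shorter subwords, so the cells within one iteration of the outer `len` loop are independent of each other and could be computed in parallel; wavefront parallelism of this kind is what GPU implementations of such recurrences commonly exploit. For instance, `maxPairs("GGGCCC")` returns 3.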

Cited By

  • (2024) Daedalus: Safer Document Parsing. Proceedings of the ACM on Programming Languages 8(PLDI), pages 816-840. DOI: 10.1145/3656410. Online publication date: 20-Jun-2024.
  • (2022) Oregano: staging regular expressions with Moore Cayley fusion. Proceedings of the 15th ACM SIGPLAN International Haskell Symposium, pages 66-80. DOI: 10.1145/3546189.3549916. Online publication date: 6-Sep-2022.
  • (2020) Multi-stage programming in the large with staged classes. Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pages 35-49. DOI: 10.1145/3425898.3426961. Online publication date: 16-Nov-2020.

      Information

      Published In

      ACM SIGPLAN Notices, Volume 49, Issue 10
      OOPSLA '14
      October 2014
      907 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2714064
      Editor: Andy Gill

      OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications
      October 2014
      946 pages
      ISBN: 9781450325851
      DOI: 10.1145/2660193
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 15 October 2014
      Published in ACM SIGPLAN Notices, Volume 49, Issue 10

      Author Tags

      1. algebraic dynamic programming
      2. multi-stage programming
      3. parser combinators

      Qualifiers

      • Research-article

      Article Metrics

      • Downloads (last 12 months): 23
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 17 Jan 2025

      Citations

      • (2020) Staged selective parser combinators. Proceedings of the ACM on Programming Languages 4(ICFP), pages 1-30. DOI: 10.1145/3409002. Online publication date: 3-Aug-2020.
      • (2016) Reflections on LMS: exploring front-end alternatives. Proceedings of the 2016 7th ACM SIGPLAN Symposium on Scala, pages 41-50. DOI: 10.1145/2998392.2998399. Online publication date: 30-Oct-2016.
      • (2016) Optimizing Parser Combinators. Proceedings of the 11th edition of the International Workshop on Smalltalk Technologies, pages 1-13. DOI: 10.1145/2991041.2991042. Online publication date: 23-Aug-2016.
      • (2023) flap: A Deterministic Parser with Fused Lexing. Proceedings of the ACM on Programming Languages 7(PLDI), pages 1194-1217. DOI: 10.1145/3591269. Online publication date: 6-Jun-2023.
      • (2022) Staging with class: a specification for typed template Haskell. Proceedings of the ACM on Programming Languages 6(POPL), pages 1-30. DOI: 10.1145/3498723. Online publication date: 12-Jan-2022.
      • (2020) Fluid quotes: metaprogramming across abstraction boundaries with dependent types. Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pages 98-110. DOI: 10.1145/3425898.3426953. Online publication date: 16-Nov-2020.
      • (2019) Staged abstract interpreters: fast and modular whole-program analysis via meta-programming. Proceedings of the ACM on Programming Languages 3(OOPSLA), pages 1-32. DOI: 10.1145/3360552. Online publication date: 10-Oct-2019.
