Recursive Descent Parsing
Recursive Descent Parsing
http://www.techhui.com/profiles/blogs/1702911:BlogPost:1683
Hawaii's Technology and New Media Community Main My Page Members Events Forum Groups Photos Videos Blogs Listings Directory FAQ Coders More All Blog Posts My Blog Add
Introduction
If you're implementing any sort of miniature language as part of some software (query language, scripting capability) or even just attempting to parse structured input of some kind, using the full-boogie compilerbuilder tools like YACC, bison, and friends can seem like overkill. If you just try to write a completely ad-hoc parser off-the-cuff, however, you'll find it laborious, painful, and likely find the result to be full of bugs and nasty corner cases. However, there's a happy medium, which is recursive descent parsing. I'm going to give you a quick hand-wavy explanation of how to use it.
Note: In reality, for practicality's sake we'd probably match RealNum as a token with a pattern-matching rule rather than in EBNF.
BNF left-factorized simplification of EBNF grammar: To convert the grammar to a parser, begin by converting it to the simpler BNF form, which eliminates any use of optional elements ([ ] or ?), any use of repetition operators (*) and any use of parenthesis operators for grouping. This can always be done at the expense of a slightly longer simple BNF form. We also must transform the grammar (which again may introduce some new non-terminal symbols) so that there is never a self-recursive or recursive derivation of a given symbol on the left side of a rule, and left-factor it, i.e. modify it so that whenever a given symbol appears as the first element in two or more possible derivations of a given symbol, it gets "factored out" for common processing, by splitting it into two sub-rules. This gives us:
Expr := Term ExprTail ExprTail := nil | '+' Term | '-' Term Term := Factor TermTail TermTail := nil | '*' Factor | '/' Factor Factor := RealNum | '-' RealNum | '(' Expr ')' RealNum := Digit RestOfNum RestOfNum := nil | Digit RestOfNum
Simplified parser code corresponding to BNF grammar: Translating this quite mechanically into very simplified code, assuming we have a TokenSource type with "peekToken" and "nextToken" methods, gives us code something like the following:
void ParseExpr( TokenSource t ) { ParseTerm( t ); ParseExprTail( t );
1 of 3
1/12/2013 12:06 PM
http://www.techhui.com/profiles/blogs/1702911:BlogPost:1683
} void ParseExprTail( TokenSource t ) { string s = t.peekToken(); if ( s == '+' ) { s = t.nextToken(); ParseTerm( t ); } else if ( s == '-' ) { s = t.nextToken(); ParseTerm( t ); } else { // nil } } void ParseTerm( TokenSource t ) { ParseFactor( t ); ParseTermTail( t ); } void ParseTermTail( TokenSource t ) { string s = t.peekToken(); if ( s == '*' ) { s = t.nextToken(); ParseFactor( t ); } else if ( s == '/' ) { s = t.nextToken(); ParseFactor( t ); } else { // nil }
etc. As you can see, it's a very mechanical process. It almost takes less time to write the code than it does to format this blog post. I've slightly glossed over the need for the parser to do something and return its results; in a real parser, the routines would be returning a pointer to the expression tree it had built thus far, or perhaps evaluating it on the fly and returning a real.
Web links:
General information on BNF and EBNF
BNF and EBNF: What are they and how do they work? Wikipedia on Backus-Naur form
Comment
Comment by Cameron Souza on December 27, 2008 at 6:37pm Nice post! I hope to see more posts on this subject. Are you doing this at work or just for fun? Have you used JavaCC?
Comment by Ken Berkun on January 2, 2008 at 8:49am Hey, nice to see a good technical article. Thanks for posting. Ken
Comment by Garth Henderson on December 30, 2007 at 5:42pm The ParserNotes.pdf with C# is of great interest to me. The best business application development system that I've worked with (Unix/C++) used YACC to create P-Code from visual painters. I'm currently trying to recreate an environment within VS (w/C#) that is as good as this system was. Your post has got me thinking. Thanks.
Comment by Daniel Leuck on December 20, 2007 at 2:12pm Thank you for the great post! I've used Javacc and its tree generating preprocessor JJTree for a number of projects over the past few years. I really love these tools. Its easy to get started by downloading one of the grammars. I got started by adding some Java 5 language constructs to the BeanShell grammar.
2 of 3
1/12/2013 12:06 PM
http://www.techhui.com/profiles/blogs/1702911:BlogPost:1683
Sergey Dmitriev from JetBrains has done a lot of interesting work on tools for generating DSLs. He is currently working on a language workbench for IntelliJ IDEA. With all the buzz around LOP (language oriented programming) and DSLs (Domain Specific Languages), we are sure to see a lot of interesting tools coming out to assist in the creation of new languages and dialects. RSS Welcome to TechHui
Sign Up
or Sign In Or sign in with:
Sponsors
3 of 3
1/12/2013 12:06 PM