An Introduction To LEX and YACC
SYSC-3101
Programming Languages
CONTENTS
1 General Structure
2 Lex - A lexical analyzer
3 Yacc - Yet another compiler compiler
4 Main Program
1 GENERAL STRUCTURE
The program will consist of three parts:
1. Lexical analyzer: scan.l
2. Parser: gram.y
3. Everything else: main.c
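2 LEX - A LEXICAL ANALYZER
A lex source file (scan.l) has the following layout:
    %{
    /* C declarations and includes */
    %}
    /* Lex definitions */
    %%
    /* Rules: regular expressions and actions */
    %%
    /* User subroutines */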
Lex Definitions
A table with two columns:
1. regular expressions
2. actions
e.g.:
    integer    printf("found keyword INT");
Regular Expressions
Text characters: a - z, 0 - 9, space...
    \n : newline.
    \t : tab.
Operators: " \ [ ] - ? . * + | ( ) $ / { } % < >
    "..." : treat ... as text characters (useful for spaces).
    \ : treat the next character as a text character.
    . : match any single character except newline.
Operators (cont):
    [...] : match any single character listed within [].
    ? : match zero or one time, e.g. ab?c matches ac, abc.
    * : match zero or more times, e.g. ab*c matches ac, abc, abbc...
    + : match one or more times, e.g. ab+c matches abc, abbc...
    (...) : group ..., e.g. (ab)+ matches ab, abab...
    | : alternation, e.g. ab|cd matches ab, cd.
    {n,m} : repetition, e.g. a{1,3} matches a, aa, aaa.
    {defn} : substitute defn (from the first section).
Actions
    ; : null action.
    ECHO; : equivalent to printf("%s", yytext);
    {...} : multi-statement action.
    return token; : return a token code to the parser; the contents of yytext can be sent along through yylval.
    yytext : C string of matched characters (make a copy if necessary!)
    yyleng : length of the matched characters.
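3 YACC - YET ANOTHER COMPILER COMPILER
A yacc source file (gram.y) has the following layout:
    %{
    /* C declarations and includes */
    %}
    /* Yacc declarations */
    %%
    /* Grammar rules and actions */
    %%
    /* User subroutines */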
YACC Rules
A grammar rule has the following form:
    A : BODY ;
A is a non-terminal name (LHS).
BODY consists of names, literals, and actions (RHS).
Literals are enclosed in single quotes, e.g. '+'. As in C, '\n' is a newline and '\'' is a single quote.
Names representing tokens must be declared; this is most simply done by writing
    %token name1 name2 ...
in the declarations section. Every name not defined in the declarations section is assumed to represent a nonterminal symbol. Every nonterminal symbol must appear on the left side of at least one rule.
Actions
The user may associate actions to be performed each time the rule is recognized in the input process, e.g.:
    XXX : YYY ZZZ { printf("a message\n"); } ;
$ is special!
    $n : pseudo-variables which refer to the values returned by the components of the right-hand side of the rule.
    $$ : the value returned by the left-hand side of a rule.
e.g.:
    expr : '(' expr ')' { $$ = $2 ; }
Declarations
    %token : declares ALL terminals which are not literals.
    %type : declares the return value type for non-terminals.
    %union : declares other return types.
The type
    typedef union { body of union ... } YYSTYPE;
is generated and must be included in the lex source so that types can be associated with tokens.
4 MAIN PROGRAM
Figure 5: Main template
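    #include <stdio.h>
    #include <stdlib.h>

    extern int yyerror(), yylex();

    #define YYDEBUG 1          /* compile in the parser's trace code */
    #include "gram.tab.c"      /* parser generated from gram.y */
    #include "lex.yy.c"        /* scanner generated from scan.l */

    int main(void)
    {
        /* yydebug = 1; */     /* uncomment to trace the parse */
        return yyparse();      /* 0 if the input was parsed successfully */
    }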
    E:\YaccLex>cl main.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.88
    Copyright (C) Microsoft Corp 1984-1998. All rights reserved.

    E:\YaccLex>main.exe
    2 + 1
    Result is 3