LEX and YACC

Aim: Study the LEX and YACC tools.
Description: LEX - A lexical analyzer generator:
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). Lex is commonly
used with the yacc parser generator.
Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the
lexer in the C programming language.
1. A lexer or scanner performs lexical analysis, that is, it breaks an input stream into
meaningful units, or tokens.
2. For example, consider breaking a text file up into individual words.
3. Lex: a tool for automatically generating a lexer or scanner given a lex specification (.l file).
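To make the word-splitting example concrete, here is a minimal hand-written sketch in C of the kind of work a Lex-generated scanner automates. The names next_word and count_words are hypothetical helpers introduced only for illustration, and the definition of a "word" as a run of letters and digits is an assumption, not something Lex prescribes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of Lex): finds the next word in `input`,
 * starting at offset *pos. A "word" is assumed to be a maximal run of
 * letters and digits. On success it stores the word's start pointer and
 * length, advances *pos past the word, and returns 1. It returns 0 when
 * the end of the input is reached without finding another word. */
int next_word(const char *input, size_t *pos, const char **start, size_t *len)
{
    size_t i = *pos;

    /* Skip separators (anything that is not a letter or digit). */
    while (input[i] != '\0' && !isalnum((unsigned char)input[i]))
        i++;

    if (input[i] == '\0') {
        *pos = i;
        return 0;
    }

    *start = input + i;

    /* Consume the word itself. */
    while (input[i] != '\0' && isalnum((unsigned char)input[i]))
        i++;

    *len = i - (size_t)(*start - input);
    *pos = i;
    return 1;
}

/* Counts the words in `input` by calling next_word() until it fails. */
size_t count_words(const char *input)
{
    size_t pos = 0, len = 0, count = 0;
    const char *start = NULL;

    while (next_word(input, &pos, &start, &len))
        count++;
    return count;
}
```

With Lex, the two loops above would typically be replaced by one pattern for words and one for separators, each paired with a C action.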
Structure of a Lex file
The structure of a Lex file is intentionally similar to that of a yacc file; files are divided up into three
sections, separated by lines that contain only two percent signs, as follows:
Definition section:
%%
Rules section:
%%
C code section:
<statements>
➢ The definition section is the place to define macros and to import header files written in C. It is also
possible to write any C code here, which will be copied verbatim into the generated source file.
➢ The rules section is the most important section; it associates patterns with C statements. Patterns are
simply regular expressions. When the lexer sees some text in the input matching a given pattern, it
executes the associated C code. This is the basis of how Lex operates.
➢ The C code section contains C statements and functions that are copied verbatim to the generated
source file. These statements presumably contain code called by the rules in the rules section. In large
programs it is more convenient to place this code in a separate file and link it in at compile time.
Description of the lex command:
The lex command reads File or standard input, generates a C language program, and writes it to a file
named lex.yy.c. This file, lex.yy.c, is a compilable C language program. A C++ compiler also can compile
the output of the lex command. The -C flag renames the output file to lex.yy.C for the C++ compiler. The
C++ program generated by the lex command can use either STDIO or IOSTREAMS. If the cpp define
_CPP_IOSTREAMS is true during a C++ compilation, the program uses IOSTREAMS for all I/O. Otherwise,
STDIO is used.
The lex command uses rules and actions contained in File to generate a program, lex.yy.c, which can be
compiled with the cc command. The compiled lex.yy.c can then receive input, break the input into the
logical pieces defined by the rules in File, and run program fragments contained in the actions in File.
The generated program is a C language function called yylex. The lex command stores the yylex function
in a file named lex.yy.c. You can use the yylex function alone to recognize simple one-word input, or you
can use it with other C language programs to perform more difficult input analysis functions. For
example, you can use the lex command to generate a program that simplifies an input stream before
sending it to a parser program generated by the yacc command. The yylex function analyzes the input
stream using a program structure called a finite state machine. This structure allows the program to
exist in only one state (or condition) at a time. There is a finite number of states allowed. The rules in
File determine how the program moves from one state to another. If you do not specify a File, the lex
command reads standard input. It treats multiple files as a single file.
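The finite state machine idea can be illustrated with a small hand-written sketch in C. This is not the code that lex generates (a generated scanner normally encodes its states in tables); the state names and the functions step and classify are assumptions made for illustration. The machine is in exactly one state at a time, and each input character moves it to the next state.

```c
#include <ctype.h>

/* States of a small hand-written finite state machine. The names are
 * illustrative only. */
enum state { S_START, S_IDENT, S_NUMBER, S_ERROR };

/* Transition function: given the current state and the next input
 * character, returns the next state. */
static enum state step(enum state s, char c)
{
    unsigned char uc = (unsigned char)c;

    switch (s) {
    case S_START:
        if (isalpha(uc) || c == '_') return S_IDENT;
        if (isdigit(uc))             return S_NUMBER;
        return S_ERROR;
    case S_IDENT:
        return (isalnum(uc) || c == '_') ? S_IDENT : S_ERROR;
    case S_NUMBER:
        return isdigit(uc) ? S_NUMBER : S_ERROR;
    default:
        return S_ERROR;
    }
}

/* Runs the machine over the whole string and returns the final state.
 * S_IDENT and S_NUMBER are accepting states; S_START (empty input)
 * and S_ERROR are not. */
enum state classify(const char *text)
{
    enum state s = S_START;

    for (; *text != '\0' && s != S_ERROR; text++)
        s = step(s, *text);
    return s;
}
```

For example, classify("count1") ends in S_IDENT, classify("2024") ends in S_NUMBER, and classify("12ab") ends in S_ERROR because a letter is not allowed after the machine has entered the number state.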
Note: Since the lex command uses fixed names for intermediate and output files, you can have only one
program generated by lex in a given directory.
Regular Expression Basics
. : matches any single character except \n
* : matches 0 or more instances of the preceding regular expression
+ : matches 1 or more instances of the preceding regular expression
? : matches 0 or 1 of the preceding regular expression
| : matches the preceding or following regular expression
[ ] : defines a character class
() : groups enclosed regular expression into a new regular expression
"..." : matches everything within the double quotes literally
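Most of these operators can be tried outside Lex with the POSIX regex library, whose extended syntax shares . * + ? | [ ] and ( ) with Lex patterns. The sketch below assumes a POSIX system that provides <regex.h>; re_match is a hypothetical helper name. Lex-only features such as quoted literals are not covered, and the ^ and $ anchors are added here only to force a match of the whole string.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` contains a match for the POSIX
 * extended regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile. Anchor the pattern with ^ and $ to require
 * that the whole string matches. */
int re_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, re_match("^[0-9]+$", "2024") succeeds because + needs at least one digit, while the same pattern fails on an empty string; replacing + with * makes the empty string match as well.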
Special Variables and Functions
• yytext – the text matched most recently
• yyleng – number of characters in the text matched most recently
• yylval – value associated with the current token (read by the yacc parser)
• yymore() – append the next matched string to the current contents of yytext
• yyless(n) – keep only the first n characters of yytext and return the rest to the input stream
• unput(c) – return character c to the input stream
• yywrap() – called by the lexical analyzer when it reaches the end of its input; it returns 1 if
scanning should stop, or 0 if more input has been made available. It may be replaced by the user.
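The relationship between yytext, yyleng, and yyless(n) can be modeled with a toy scanner in C. This is only a sketch of the behavior described above, not Lex's implementation: the struct toy_scanner and the functions toy_match and toy_less are hypothetical names, and the 64-byte token buffer is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a scanner's matching state (all names are hypothetical). */
struct toy_scanner {
    const char *input;     /* full input buffer */
    size_t      next;      /* offset of the next unread character */
    char        text[64];  /* plays the role of yytext */
    size_t      leng;      /* plays the role of yyleng */
};

/* Pretends that a pattern matched the next n input characters: copies
 * them into text, records the length, and advances the read position. */
void toy_match(struct toy_scanner *s, size_t n)
{
    size_t avail = strlen(s->input + s->next);

    if (n > avail)
        n = avail;
    if (n > sizeof s->text - 1)
        n = sizeof s->text - 1;

    memcpy(s->text, s->input + s->next, n);
    s->text[n] = '\0';
    s->leng = n;
    s->next += n;
}

/* Models yyless(n): keeps the first n characters of the current match
 * and gives the remaining characters back to the input, so that they
 * are scanned again. */
void toy_less(struct toy_scanner *s, size_t n)
{
    if (n >= s->leng)
        return;

    s->next -= s->leng - n;
    s->leng = n;
    s->text[n] = '\0';
}
```

If the toy scanner matches the six characters of "foobar" and then calls toy_less with 3, the current token shrinks to "foo" and the next match starts again at "bar".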
Files used by yacc
y.output -- Contains a readable description of the parsing tables and a report on conflicts generated by
grammar ambiguities.
y.tab.c -- Contains the output file, that is, the C source code of the generated parser.
y.tab.h -- Contains definitions for token names.
yacc.tmp -- Temporary file.
yacc.debug -- Temporary file.
yacc.acts -- Temporary file.
/usr/ccs/lib/yaccpar -- Contains the parser prototype for C programs.
/usr/ccs/lib/liby.a -- Contains a run-time library.
