Automata Theory and Compiler Design
Goudru
Professor
Department of ISE
Sambhram Institute of Technology
Bangalore
MODULE – 1
The user calls the target program to process the input and
produce the output.
Interpreter
(iv) The linker links the machine code together with other object files
and library files.
(v) The loader then places the linked executable code into
memory for execution.
Phases of Compiler OR Structure of Compiler
A compiler is a software program that converts high-level source code into low-
level machine code that can be executed by the computer. The process of
conversion has the following phases.
(v) Optimizer
The optimizer applies various optimization techniques to the intermediate code to
improve the performance of the generated machine code.
Syntax tree
3) Semantic analysis
The semantic analyzer uses the syntax tree and the information in
the symbol table to check the program for semantic errors. It also
gathers type information and saves it in the syntax tree or the
symbol table.
In addition, the semantic analyzer
(i) performs type checking.
(ii) checks that each operator has matching operands.
(iii) checks consistency with array type declarations, etc.
End of Module-1
MODULE – 2 : Lexical analysis phases of CD
The role of lexical analyzer
The main task of the lexical analyzer is to identify lexemes.
It reads the input characters of the source program, groups
them into lexemes, and produces as output a sequence of tokens, one
for each lexeme in the source program.
The stream of tokens is sent to the parser for syntax analysis.
When the lexical analyzer discovers a lexeme constituting an
identifier, it enters that lexeme into the symbol table.
Another important task of the lexical analyzer is the removal of
comments and white space (blanks, newlines, tabs, etc.).
1) Keyword class: Keywords like if, then, else etc., belong to the
keyword class.
2) Identifier class: Variables declared in the program like var,
var1, sum, count etc., belong to the identifier class.
3) Constant class: Constants like 2, 5, -4, 5.4 etc., belong to the
constant class.
4) Operator class: Symbols like (), [], <=, >=, = etc., belong to the
operator class.
5) Delimiter class: Punctuation marks like ;, :, " ", / etc., belong
to the delimiter class.
6) White space class: Blank space, \n, \t etc., belong to the white
space class.
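The classes above can be illustrated with a small tokenizer. This is only a minimal sketch: the table TOKEN_SPEC, the function tokenize, and the exact patterns are assumptions made for illustration, and signed constants such as -4 are left out to keep the patterns short.

```python
import re

# Hypothetical token classes following the list above; the patterns are
# illustrative assumptions, not a complete language definition.
TOKEN_SPEC = [
    ("whitespace", r"[ \t\n]+"),
    ("keyword",    r"\b(?:if|then|else)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("constant",   r"\d+(?:\.\d+)?"),
    ("operator",   r"<=|>=|=|[()\[\]]"),
    ("delimiter",  r'[;:"/]'),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (class, lexeme) pairs, discarding white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        # White space is recognized but not passed on to the parser.
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, tokenize("if count <= 5 then sum = 2;") classifies if and then as keywords, count and sum as identifiers, and drops the blanks between them.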
Example of Tokens
The lexical analyzer returns to the parser the token name and an
attribute value describing the lexeme represented by the
token.
For operators, punctuation, and keywords, there is no need for an attribute value.
For example, the token number is given an integer-valued attribute.
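The pairing of a token name with an attribute value can be sketched as follows. The function make_token, the use of a Python list as the symbol table, and the choice of the table index as the attribute of id are all assumptions made for this illustration.

```python
def make_token(lexeme, symbol_table):
    """Return a (token_name, attribute) pair for one lexeme."""
    if lexeme.isdecimal():
        # The token number carries the integer value as its attribute.
        return ("number", int(lexeme))
    if lexeme in {"if", "then", "else"}:
        # Keywords need no attribute value.
        return (lexeme, None)
    if lexeme.isidentifier():
        # An identifier is entered into the symbol table on first sight;
        # its attribute is its position in the table (an assumption here).
        if lexeme not in symbol_table:
            symbol_table.append(lexeme)
        return ("id", symbol_table.index(lexeme))
    # Operators and punctuation need no attribute value.
    return (lexeme, None)
```

Calling make_token twice with the same identifier returns the same symbol-table index, so the parser sees one entry per distinct name.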
Lexical Error
Since fi is a valid lexeme for the token id, the lexical analyzer
returns the token id to the parser, and in this case the parser handles the
error.
Input Buffering
The source program is stored on the hard disk.
To read a token, the lexical analyzer (LA) uses two pointers.
The first pointer is the lexemeBegin pointer and the second pointer is
the forward pointer.
Example:
int main()
{
}
This program is stored in memory as follows:
lexemeBegin
i n t m a i n ( ) { }
forward
The buffering takes place as follows:
The lexemeBegin pointer points to the first character of the
current lexeme.
int is a token.
The forward pointer starts at the character i and moves
to the next character n. After reading t, the pointer encounters a blank
space, which is taken as the end of the token.
After the first token is read, both the lexemeBegin and forward pointers
move to the first character of the second token.
lexemeBegin
i n t m a i n ( ) { }
forward
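The movement of the two pointers can be sketched in code. This is a simplified illustration, not the buffering scheme itself: the function scan_lexemes is hypothetical, the pointers are modelled as string indices, and every non-alphanumeric character is assumed to be a one-character lexeme.

```python
def scan_lexemes(buffer):
    """Yield lexemes from buffer using lexemeBegin and forward indices."""
    lexeme_begin = 0
    n = len(buffer)
    while lexeme_begin < n:
        # Skip white space so lexeme_begin rests on the first character
        # of the next lexeme.
        if buffer[lexeme_begin].isspace():
            lexeme_begin += 1
            continue
        forward = lexeme_begin
        if buffer[forward].isalnum() or buffer[forward] == "_":
            # Advance forward until it passes the last character of the word.
            while forward < n and (buffer[forward].isalnum() or buffer[forward] == "_"):
                forward += 1
        else:
            # Assumed single-character lexeme such as ( or {.
            forward += 1
        yield buffer[lexeme_begin:forward]
        # Both pointers now move on to the start of the next lexeme.
        lexeme_begin = forward
```

Applied to the example program, list(scan_lexemes("int main()\n{\n}")) gives the lexemes int, main, (, ), { and } in that order.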
Problem with this method of buffering
To read each character from the hard disk, the processor uses
one system call.
It uses only one buffer block, of size 4096 bytes, to read the input string.
This method fails when the input string is very large and the
single small buffer block cannot hold the whole string.
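The cost of reading one character at a time can be made concrete with a small experiment. This is a rough sketch under stated assumptions: io.BytesIO stands in for the file on disk, each read call stands in for one system call, and the helper count_reads is hypothetical.

```python
import io

BLOCK_SIZE = 4096  # buffer block size mentioned above

def count_reads(data, chunk_size):
    """Count the non-empty read calls needed to consume data."""
    stream = io.BytesIO(data)  # stands in for a file on disk
    reads = 0
    # read() returns an empty bytes object at end of input, which ends the loop.
    while stream.read(chunk_size):
        reads += 1
    return reads
```

For a 10000-byte input, count_reads(data, 1) needs 10000 reads, while count_reads(data, BLOCK_SIZE) needs only 3, which is the motivation for reading the input a block at a time.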
Find
No.  RE            Language
1    a|b           L = {a, b}
2    (a|b)(a|b)    L = {aa, ab, ba, bb}
3    a*            L = {ε, a, aa, aaa, …}
4    (a|b)*        L = {ε, a, b, aa, ab, ba, bb, aaa, …}
5    a|a*b         L = {a, b, ab, aab, aaab, …}
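The languages in the table can be checked by enumerating short strings over {a, b} and keeping those that a regular expression matches in full. This is a minimal sketch using Python's re module; the helper name language and the length bound max_len are assumptions for illustration, and the empty string plays the role of ε.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """List strings over alphabet, up to max_len, that pattern fully matches."""
    matched = []
    for length in range(max_len + 1):
        # product with repeat=0 yields one empty tuple, i.e. the empty string.
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                matched.append(s)
    return matched
```

For example, language("a|a*b") returns the strings a, b, ab and aab, which agrees with row 5 of the table up to length 3.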
Unsigned numbers as
END OF MODULE-2