Compiler Construction: Tahir Iqbal
Lecture 2
Tahir Iqbal
Source Code
Lexical Analyzer
Syntax Analyzer
Code Optimizer
Code Generator
Object Code
Lexical Analyzer
(Part One)
Tokens:
A token is a syntactic category in a sentence of a language. Consider the
sentence:
He wrote the program
The words in the sentence are: “He”, “wrote”, “the” and “program”. The blanks
between words have been ignored. These words are classified as subject, verb,
object, etc. These categories are the roles the words play in the sentence.
Example in C
if(b == 0) a = b
The words are:
“if”, “(”, “b”, “==”, “0”, “)”, “a”, “=” and “b”.
Lexical Analysis
INPUT: sequence of characters
OUTPUT: sequence of tokens
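[Figure: Input -> Scanner -> Parser, with a Symbol Table. Next_char() passes a character from the input to the scanner; Next_token() passes a token from the scanner to the parser.]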
Tokens
• Identifiers: x y11 maxsize
• Keywords: if else while for
• Integers: 2 1000 -44 5L
• Floats: 2.0 0.0034 1e5
• Symbols: ( ) + * / { } < > ==
• Strings: “enter x” “error”
Role of Lexical Analyzer
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
6. Correlates error messages with the source program
1. Removal of white space
• By white space we mean
– Blanks
– Tabs
– New lines
• Why?
– White space is generally used only for formatting source code.
A = B + C is equivalent to A=B+C
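The equivalence above can be checked with a minimal sketch. The helper `words` and its pattern are assumptions made for illustration (names, digit runs and single-character symbols only); they are not part of the lecture.

```python
import re

# Hypothetical pattern: a name, a run of digits, or any single
# non-white-space character. White space never matches, so it is skipped.
WORD = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def words(source):
    """Return the words of `source`, ignoring blanks, tabs and new lines."""
    return WORD.findall(source)

spaced = words("A = B + C")
packed = words("A=B+C")
print(spaced)            # ['A', '=', 'B', '+', 'C']
print(spaced == packed)  # True
```

Both spellings produce the same list of words, which is why the scanner can discard white space before the later phases see the program.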
1. Removal of white space
Learn by Example
// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is
end of
my code
*/
1. Removal of white space
Learn by Doing
// This is beginning of my code
int A ;
A = A
*
A
;
/* This is
end of
my code
*/
2. Removal of comments
Why?
– Comments are user-added text that does not affect the meaning of the
program
Example in Java
// This is beginning of my code      <- means nothing to the program
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is                           <- means nothing to the program
end of
my code
*/
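A minimal sketch of comment removal for the example above. The function name `strip_comments` is hypothetical, and the sketch assumes that `//` and `/*` never appear inside string literals, which a real scanner would have to handle.

```python
def strip_comments(source):
    """Remove // line comments and /* ... */ block comments from `source`."""
    out = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("//", i):
            # Line comment: skip up to, but not including, the new line.
            while i < n and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            # Block comment: skip past the closing */ (or to end of input).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(source[i])
            i += 1
    return "".join(out)

code = """// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is
end of
my code
*/
"""
print(strip_comments(code))
```

Only the four statements and the new line characters around them remain in the output.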
3. Recognizes constants/numbers
• How is recognition done?
– A run of consecutive digits in the source code is recognized as a
constant.
Example in Java
// This is beginning of my code
int A;
int B = 2 ;
int C = 33 ;
A = B + C;
/* This is
end of
my code
*/
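The digit-run rule can be sketched as follows. `find_constants` is a hypothetical helper; it handles only unsigned integer constants and assumes that digits inside a name (such as B2) belong to the name, not to a constant.

```python
def find_constants(source):
    """Return every run of consecutive digits in `source`, in order."""
    constants = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isdigit():
            # Start of a constant: consume the whole run of digits.
            start = i
            while i < n and source[i].isdigit():
                i += 1
            constants.append(source[start:i])
        elif ch.isalpha() or ch == "_":
            # Skip a whole name so digits inside it are not counted.
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
        else:
            i += 1
    return constants

print(find_constants("int B = 2 ;\nint C = 33 ;"))  # ['2', '33']
```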
4. Recognizes keywords
• Keywords in C and Java
– if, else, for, while, do, return, etc.
5. Recognizes identifiers
• What are identifiers?
– Names of variables, functions, arrays, etc.
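One common way to separate keywords from identifiers is to scan every name with the same rule and then look it up in a keyword table. The sketch below assumes that approach; the keyword set holds only the words named in this lecture, and allowing an underscore in names is an added assumption.

```python
import re

# Keywords named in the lecture; a real C or Java scanner lists many more.
KEYWORDS = {"if", "else", "for", "while", "do", "return"}

# Assumed rule for a name: a letter or underscore, then letters, digits, underscores.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify_names(source):
    """Return a (category, lexeme) pair for every name found in `source`."""
    return [("keyword" if m.group() in KEYWORDS else "identifier", m.group())
            for m in NAME.finditer(source)]

for pair in classify_names("if (b == 0) a = b; else return maxsize;"):
    print(pair)
```

Here `if`, `else` and `return` come out as keywords, while `b`, `a` and `maxsize` come out as identifiers.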
6. Correlates error messages with the source program
• How?
– Keeps track of the number of new line characters seen in the source code
– Reports the line number when an error message is to be generated.
• Example in Java
1. This is beginning of my code      <- Error message at line 1
2. int A;
3. int B2 = 2 ;
4. int C4R = 33 ;
5. A = B + C;
6. /* This is
7. end of
8. my code
9. */
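Line tracking can be sketched as a counter that is incremented on every new line character. In the sketch below, `report_errors` and its set of illegal characters are assumptions chosen only for illustration; which characters are illegal depends on the language being scanned.

```python
def report_errors(source, illegal="@`"):
    """Return one message per illegal character, tagged with its line number."""
    messages = []
    line = 1                          # lines are numbered from 1
    for ch in source:
        if ch == "\n":
            line += 1                 # one more new line character seen
        elif ch in illegal:
            messages.append(f"line {line}: illegal character {ch!r}")
    return messages

code = "int A;\nint B2 = 2 ;\nint C4R = @33 ;\nA = B + C;\n"
print(report_errors(code))  # ["line 3: illegal character '@'"]
```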
Errors generated by Lexical Analyzer
1. Illegal symbols
• =>
2. Illegal identifiers
• 2ab
3. Unterminated comments
• /* This is beginning of my code
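A minimal sketch of how errors 2 and 3 might be detected. The function `lexical_errors` and the rule "digits immediately followed by a letter" are assumptions for illustration; the rule would wrongly reject literals such as 5L or 1e5, and illegal symbols are not checked because they depend on the language.

```python
import re

# Assumed rule: a run of digits immediately followed by a letter, e.g. 2ab.
BAD_IDENTIFIER = re.compile(r"\b\d+[A-Za-z_]\w*")

def lexical_errors(source):
    """Return a list of lexical error messages found in `source`."""
    errors = []
    text = source
    start = text.find("/*")
    while start != -1:
        end = text.find("*/", start + 2)
        if end == -1:
            # The comment opener has no matching closer.
            errors.append("unterminated comment")
            text = text[:start]
            break
        # Replace a complete comment by a blank and keep looking.
        text = text[:start] + " " + text[end + 2:]
        start = text.find("/*")
    for match in BAD_IDENTIFIER.finditer(text):
        errors.append(f"illegal identifier {match.group()}")
    return errors

print(lexical_errors("int 2ab = 1;"))                      # ['illegal identifier 2ab']
print(lexical_errors("/* This is beginning of my code"))   # ['unterminated comment']
```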
• Learn by example
– // Beginning of Code
– int a char } switch b[2] =;
– // end of code
• No lexical error is generated
• Why?
• Every word is a valid token. Detecting that the tokens appear in an
invalid order is the job of the syntax analyzer
Terminologies
• Token
– A classification for a common set of strings
– Examples:
Identifier, Integer, Float, LeftParen
• Lexeme
– Actual sequence of characters that matches a pattern and has
a given Token class.
– Examples:
Identifier: Name, Data, x
Integer: 345, 2, 0, 629
• Pattern
– The rules that characterize the set of strings for a token
– Example:
Integer: a digit followed by zero or more digits
Identifier: a letter followed by zero or more letters or
digits
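The two patterns above can be written as regular expressions. This is a sketch: the names `INTEGER`, `IDENTIFIER` and `token_class` are hypothetical, and "character" is read as "letter", which is an assumption.

```python
import re

INTEGER = re.compile(r"[0-9][0-9]*")               # a digit, then zero or more digits
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # a letter, then letters or digits

def token_class(lexeme):
    """Return the token class whose pattern matches the whole lexeme, or None."""
    if INTEGER.fullmatch(lexeme):
        return "Integer"
    if IDENTIFIER.fullmatch(lexeme):
        return "Identifier"
    return None

for lexeme in ["345", "0", "Name", "x", "y11", "2ab"]:
    print(lexeme, "->", token_class(lexeme))
```

The lexeme 2ab matches neither pattern, so it is not assigned a token class.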
Learn by Example:
Input string: size := r * 32 + c
Identify the <token, lexeme> pairs
1. <id, size>
2. <assign, :=>
3. <id, r>
4. <arith_symbol, *>
5. <integer, 32>
6. <arith_symbol, +>
7. <id, c>
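The pairs above can be reproduced with a small table-driven scanner. The sketch below is one possible approach, not the lecture's own method; the table `TOKEN_SPEC` covers only the token classes used in this example, and all names in it are assumptions.

```python
import re

# (token class, pattern) pairs; "skip" marks white space to be discarded.
TOKEN_SPEC = [
    ("integer", r"\d+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("arith_symbol", r"[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return the list of (token, lexeme) pairs for `source`."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"illegal character {source[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs

for pair in tokenize("size := r * 32 + c"):
    print(pair)
```

Any character outside the table, such as a plain `=`, raises an error, so the table would need extending for other inputs.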
Learn by Doing
Input string:
position = initial + rate * 60
Let's Revise!
Lexical Analysis
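[Figure: Input -> Scanner -> Parser, with a Symbol Table. Next_char() passes a character from the input to the scanner; Next_token() passes a token from the scanner to the parser.]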
Role of Lexical Analyzer
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
6. Correlates error messages with the source program
Terminologies
• Token
– Identifier, Integer, Float, LeftParen
• Lexeme
– Identifier: Name, Data, x
Integer: 345, 2, 0, 629
• Pattern
– Example:
Integer: a digit followed by zero or more digits
Identifier: a letter followed by zero or more letters or digits
Homework
Identify the <token, lexeme> pairs
1. for ( int x= 0; x<=5; x++)
2. B= (( c + a) * d ) / f
3. while ( a < 5 )
a= a+1
4. char MyCourse[5];
5. if ( a< b)
a=a*a;
else
b=b*b;
Assignment-LAB01
Write a program in C++ or Java that reads a source file
and performs the following operations:
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
You’ve already done the above
Bring your code to the next lab, LAB-02