
Lexical and Syntax Analysis

In Text: Chapter 4
N. Meng, F. Poursardar
Lexical and Syntactic Analysis
• Two steps to discover the syntactic structure of a
program
– Lexical analysis (Scanner): to read the input characters and
output a sequence of tokens
– Syntactic analysis (Parser): to read the tokens and output a
parse tree and report syntax errors if any

Compilation Process

Interaction between lexical analysis and syntactic analysis

Reasons to Separate Lexical and Syntax Analysis
• Simplicity - less complex approaches can be used for
lexical analysis; separating them simplifies the parser
• Efficiency - separation allows optimization of the
lexical analyzer
• Portability - parts of the lexical analyzer may not be
portable, but the parser is always portable

Scanner
• Pattern matcher for character strings
– If a character sequence matches a pattern, it is identified as
a token
• Responsibilities
– Tokenize source, report lexical errors if any, remove
comments and whitespace, save text of interesting tokens,
save source locations, (optional) expand macros and
implement preprocessor functions

Tokenizing Source
• Given a program, identify all lexemes and their
categories (tokens)

• Lexeme
– A sequence of characters in the source program with the
lowest level of syntactic meaning
o E.g., sum, +, -
• Token
– A category of lexemes
– A lexeme is an instance of a token
– Tokens are the basic building blocks of programs

Token Examples

Token        Informal Description                          Sample Lexemes
keyword      All keywords defined in the language          if, else
comparison   <, >, <=, >=, ==, !=                          <=, !=
id           Letter followed by letters and digits         pi, score, D2
number       Any numeric constant                          3.14159, 0, 6
literal      Anything surrounded by double quotes,         "core dumped"
             excluding the double-quote character itself

Another Token Example
Consider the following example of an assignment
statement:
result = oldsum – value / 100;
• Following are the tokens and lexemes of this
statement:
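Lexeme       Token
result       identifier
=            equal_sign
oldsum       identifier
–            minus_operator
value        identifier
/            division_operator
100          int_literal
;            semicolon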

Lexeme, Token, & Pattern
• Pattern
– A description of the form that the lexemes of a token may
take
– Specified with regular expressions
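o E.g., the pattern letter(letter|digit)* for id corresponds to the
regular expression [A-Za-z][A-Za-z0-9]*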

Motivating Example
• Token set:
– assign -> :=
– plus -> +
– minus -> -
– times -> *
– div -> /
– lparen -> (
– rparen -> )
– id -> letter(letter|digit)*
– number -> digit digit*|digit*(.digit|digit.)digit*

Motivating Example
• What are the lexemes in the string “a_var:=b*3” ?
• What are the corresponding tokens ?
• How do you identify the tokens?
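• One worked answer (assuming the letter class includes the underscore):
– Lexemes: a_var, :=, b, *, 3
– Corresponding tokens: id, assign, id, times, number
– Each token is identified by matching the longest possible prefix of
the remaining input against the token patterns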

Lexical Analysis
• Three approaches to build a lexical analyzer:
– Write a formal description of the tokens and use a software
tool that constructs a table-driven lexical analyzer from such a
description
– Design a state diagram that describes the tokens and write a
program that implements the state diagram
– Design a state diagram that describes the tokens and hand-
construct a table-driven implementation of the state diagram
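Below is a minimal sketch of the third approach: a hand-constructed,
table-driven implementation of a state diagram that recognizes only
identifiers and integer literals. The state names, character classes,
and scan function are hypothetical illustrations, not part of the slides.

#include <ctype.h>

enum CClass { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };
enum State  { S_START, S_ID, S_NUM, S_DONE, N_STATES };

/* next_state[state][class]: the state diagram encoded as a table */
static const enum State next_state[N_STATES][N_CLASSES] = {
    /* S_START */ { S_ID,   S_NUM,  S_DONE },
    /* S_ID    */ { S_ID,   S_ID,   S_DONE },
    /* S_NUM   */ { S_DONE, S_NUM,  S_DONE },
    /* S_DONE  */ { S_DONE, S_DONE, S_DONE }
};

static enum CClass classify(int c) {
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Scan one lexeme starting at *s, advancing *s past it; the state
   reached tells the caller which kind of token was found. */
enum State scan(const char **s) {
    enum State st = S_START;
    while (**s) {
        enum State nx = next_state[st][classify((unsigned char)**s)];
        if (nx == S_DONE)
            break;          /* current character begins the next lexeme */
        st = nx;
        (*s)++;
    }
    return st;              /* S_ID: identifier, S_NUM: integer literal */
}

Each row of next_state encodes one node of the state diagram, so adding
a token category means adding rows and columns to the table rather than
rewriting the scanning loop.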

State Diagram
• A state transition diagram, or just state diagram,
is a directed graph.
• The nodes of a state diagram are labeled with state
names.
• The arcs are labeled with the input characters that
cause the transitions among the states.
• An arc may also include actions the lexical analyzer
must perform when the transition is taken.

State Diagram
• State diagrams of the form used for lexical analyzers
are representations of a class of mathematical
machines called finite automata.
• Finite automata can be designed to recognize
members of a class of languages called regular
languages.
• Regular grammars are generative devices for regular
languages.
• The tokens of a programming language are a regular
language, and a lexical analyzer is a finite automaton.

State Diagram Design
• A naïve state diagram would have a transition from every state on
every character in the source language - such a diagram would be
very large!
• Reason? Every node in the state diagram would need a transition
for every character in the character set of the language being
analyzed.
• Solution: Consider ways to simplify

State Diagram Design - Example
• Design a lexical analyzer that recognizes only arithmetic
expressions, including variable names and integer literals
as operands.
• Assume that the variable names consist of strings of
uppercase letters, lowercase letters, and digits but must
begin with a letter.
• Names have no length limitation.

• How many transitions are needed for the initial state?
• How can we simplify it?
Example (continued)
• There are 52 different characters (any uppercase or
lowercase letter) that can begin a name, which would
require 52 transitions from the transition diagram’s
initial state.
• However, a lexical analyzer is interested only in
determining that it is a name and is not concerned
with which specific name it happens to be.
• Therefore, we define a character class named
LETTER for all 52 letters and use a single transition
on the first letter of any name.

Example (continued)
• Another opportunity for simplifying the transition diagram is with
the integer literal tokens.
• There are 10 different characters that could begin an integer
literal lexeme. This would require 10 transitions from the start
state of the state diagram.
• Define a character class named DIGIT for digits and use a single
transition on any character in this character class to a state that
collects integer literals.
Lexical Analysis (continued)
• In many cases, transitions can be combined to
simplify the state diagram
– When recognizing an identifier, all uppercase and
lowercase letters are equivalent
o Use a character class that includes all letters
– When recognizing an integer literal, all digits are equivalent
- use a digit class

Lexical Analysis (continued)
• Reserved words and identifiers can be recognized
together (rather than having a part of the diagram
for each reserved word)
– Use a table lookup to determine whether a possible
identifier is in fact a reserved word
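A minimal sketch of such a lookup, assuming a small hard-coded table
(the names below are hypothetical, not taken from front.c):

#include <string.h>

static const char *const reserved[] = { "else", "for", "if", "while" };

/* Return 1 if the scanned lexeme is a reserved word, 0 otherwise. */
int is_reserved(const char *lexeme) {
    size_t n = sizeof reserved / sizeof reserved[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(lexeme, reserved[i]) == 0)
            return 1;       /* reserved word */
    return 0;               /* ordinary identifier */
}

A real scanner would use a larger table, often sorted so that a binary
search (or a hash table) can replace the linear scan.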

State Diagram

Lexical Analysis (continued)
• Convenient utility subprograms:
– getChar - gets the next character of input, puts it in
nextChar, determines its class and puts the class in
charClass
– addChar - puts the character from nextChar into the array where
the lexeme is being accumulated
– lookup - determines whether the string in lexeme is a
reserved word (returns a code)

/* Function declarations */
void addChar();
void getChar();
void getNonBlank();
int lex();

/* Character classes */
#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99

/* Token codes */
#define INT_LIT 10
#define IDENT 11
#define ASSIGN_OP 20
#define ADD_OP 21
#define SUB_OP 22
#define MULT_OP 23
#define DIV_OP 24
#define LEFT_PAREN 25
#define RIGHT_PAREN 26

Implementation Pseudo-code
static TOKEN nextToken;
static CHAR_CLASS charClass;

int lex() {
    switch (charClass) {
        /* identifiers: a letter followed by letters or digits */
        case LETTER:
            addChar();   /* add nextChar to lexeme */
            getChar();   /* get the next character and determine its class */
            while (charClass == LETTER || charClass == DIGIT) {
                addChar();
                getChar();
            }
            nextToken = IDENT;
            break;

        /* integer literals: a string of digits */
        case DIGIT:
            addChar();
            getChar();
            while (charClass == DIGIT) {
                addChar();
                getChar();
            }
            nextToken = INT_LIT;
            break;

        /* end of input */
        case EOF:
            nextToken = EOF;
            lexeme[0] = 'E';
            lexeme[1] = 'O';
            lexeme[2] = 'F';
            lexeme[3] = 0;
            break;
    }
    printf("Next token is: %d, Next lexeme is %s\n",
           nextToken, lexeme);
    return nextToken;
} /* End of function lex */

Lexical Analyzer Implementation → front.c (pp. 166-170)

- Following is the output of the lexical analyzer of front.c when it
is used on the expression (sum + 47) / total:
Next token is: 25 Next lexeme is (
Next token is: 11 Next lexeme is sum
Next token is: 21 Next lexeme is +
Next token is: 10 Next lexeme is 47
Next token is: 26 Next lexeme is )
Next token is: 24 Next lexeme is /
Next token is: 11 Next lexeme is total
Next token is: -1 Next lexeme is EOF
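(Reading the codes against the definitions above: 25 = LEFT_PAREN,
11 = IDENT, 21 = ADD_OP, 10 = INT_LIT, 26 = RIGHT_PAREN, 24 = DIV_OP;
-1 is the value of EOF, returned at end of input.)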
The Parsing Problem
• Given an input program, the goals of the parser:
– Find all syntax errors; for each, produce an appropriate
diagnostic message and recover quickly
– Produce the parse tree, or at least a trace of the parse
tree, for the program

The Parsing Problem (continued)
• The Complexity of Parsing
– Parsers that work for any unambiguous grammar are
complex and inefficient ( O(n³), where n is the length of
the input )
– Compilers use parsers that only work for a subset of all
unambiguous grammars, but do it in linear time ( O(n),
where n is the length of the input )

Two Classes of Grammars
• Left-to-right, Leftmost derivation (LL)
• Left-to-right, Rightmost derivation (LR)
• We can build parsers for these grammars that run in
linear time

Grammar Comparison
LL Grammar                 LR Grammar
E  -> T E'                 E -> E + T | T
E' -> + T E' | ε           T -> T * F | F
T  -> F T'                 F -> id
T' -> * F T' | ε
F  -> id

The LL grammar is the LR grammar with its left recursion removed
(E -> E + T becomes E -> T E' with E' -> + T E' | ε), because a
top-down parser would recurse forever on a left-recursive rule.
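As a quick check that both grammars generate the same strings, here is
a leftmost derivation of id + id with the LL grammar:

E => T E' => F T' E' => id T' E' => id E'
  => id + T E' => id + F T' E' => id + id T' E'
  => id + id E' => id + id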

Two Categories of Parsers
• LL(1) Parsers
– L: scanning the input from left to right
– L: producing a leftmost derivation
– 1: using one input symbol of lookahead at each step to make
parsing action decisions
• LR(1) Parsers
– L: scanning the input from left to right
– R: producing a rightmost derivation in reverse
– 1: the same as above

Two Categories of Parsers
• LL(1) parsers (predictive parsers)
– Top down
o Build the parse tree from the root
o Find a leftmost derivation for an input string
• LR(1) parsers (shift-reduce parsers)
– Bottom up
o Build the parse tree from leaves
o Reducing a string to the start symbol of a grammar

Top-down Parsers
• Given a sentential form, xAα, the parser must choose
the correct A-rule to get the next sentential form in
the leftmost derivation, using only the first token
produced by A
• The most common top-down parsing algorithms:
– Recursive descent - a coded implementation
– LL parsers - table driven implementation

Bottom-up parsers
• Given a right sentential form, α, determine what
substring of α is the right-hand side of the rule in the
grammar that must be reduced to produce the
previous sentential form in the rightmost derivation
• The most common bottom-up parsing algorithms are
in the LR family

Recursive Descent Parsing
• Parsing is the process of tracing or constructing a parse tree for
a given input string
• Parsers usually do not analyze lexemes; that is done by a lexical
analyzer, which is called by the parser
• A recursive descent parser traces out a parse tree in top-down
order; it is a top-down parser
• Each nonterminal has an associated subprogram; the
subprogram parses all sentential forms that the nonterminal can
generate
• The recursive descent parsing subprograms are built directly
from the grammar rules
• Recursive descent parsers, like other top-down parsers, cannot be
built from left-recursive grammars

Recursive Descent Example
• Example: For the grammar:
<term> -> <factor> {(* | /) <factor>}
• Simple recursive descent parsing subprogram:
void term() {
    factor();                /* parse the first factor */
    while (next_token == ast_code ||
           next_token == slash_code) {
        lexical();           /* get next token */
        factor();            /* parse the next factor */
    }
}
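The same pattern extends to every nonterminal in the grammar. For a
companion rule such as <expr> -> <term> {(+ | -) <term>} (hypothetical
here, with made-up token codes plus_code and minus_code), the
subprogram would be:

void expr() {
    term();                  /* parse the first term */
    while (next_token == plus_code ||
           next_token == minus_code) {
        lexical();           /* get next token */
        term();              /* parse the next term */
    }
}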
