Material For CAT 1
Analysis and synthesis are the two parts of compilation.
· The analysis part breaks up the source program into constituent pieces and creates an
intermediate representation of the source program.
· The synthesis part constructs the desired target program from the intermediate
representation.
Token- A sequence of characters that has a collective meaning.
Pattern- A rule that describes the set of input strings for which the same token is
produced as output.
Lexeme- A sequence of characters in the source program that is matched by the pattern
for a token.
· T is a set of terminals
· S is a start symbol
E-->E+T/T
T-->T*F/F
F-->(E)/id
1. Union
Union is the most common set operation. Consider the two languages L and M.
Then the union of these two languages is denoted by:
L ∪ M = { s | s is in L or s is in M}
That means the string s from the union of two languages can either be from
language L or from language M.
2. Concatenation
Concatenation joins each string of one language to each string of another language,
in all possible ways. The concatenation of two languages L and M is denoted by:
LM = { st | s is in L and t is in M }
3. Kleene Closure
The Kleene closure of a language L provides a set of strings. This set of
strings is obtained by concatenating L zero or more times. The Kleene closure of
the language L is denoted by:
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
where L^0 = { ε } and L^i is L concatenated with itself i times.
4. Positive Closure
The positive closure of a language L provides a set of strings. This set of strings
is obtained by concatenating L one or more times. It is denoted by:
L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...
It is the same as the Kleene closure except that the term L^0 = { ε } is left out, so
L+ contains ε only if ε is in L itself.
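The four operations above can be tried out on small finite languages. The following is a minimal Python sketch, not a definitive implementation. The function names and the max_power bound are assumptions made for illustration: the true closures are infinite sets, so the sketch only collects powers up to a chosen limit.

```python
def union(L, M):
    # L ∪ M: strings that are in L or in M
    return L | M

def concat(L, M):
    # LM: every string of L followed by every string of M
    return {s + t for s in L for t in M}

def power(L, i):
    # L^i: L concatenated with itself i times; L^0 is {""} (the empty string)
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene(L, max_power):
    # Finite approximation of L*: union of L^0 .. L^max_power
    out = set()
    for i in range(0, max_power + 1):
        out |= power(L, i)
    return out

def positive(L, max_power):
    # Finite approximation of L+: union of L^1 .. L^max_power
    out = set()
    for i in range(1, max_power + 1):
        out |= power(L, i)
    return out

L = {"a", "b"}
M = {"0", "1"}
print(sorted(union(L, M)))     # ['0', '1', 'a', 'b']
print(sorted(concat(L, M)))    # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene(L, 2)))    # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
print(sorted(positive(L, 2)))  # ['a', 'aa', 'ab', 'b', 'ba', 'bb']
```

For this L, the only difference between the two closures is the empty string, which is the point made in item 4.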
S → (L) | a
L → SL’
L’ → ,SL’ | ε
First:
FIRST(S) = { ( , a }
S → Sab | T
After eliminating left recursion:
S → T S’
S’ → abS’ | ε
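The transformation used here is the usual rule for immediate left recursion: A → Aα | β becomes A → βA’ and A’ → αA’ | ε. A minimal Python sketch of that rule follows. The function name and the list-of-symbols representation are assumptions for illustration, and the sketch handles only immediate left recursion, not the indirect case.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """bodies is a list of productions, each a list of grammar symbols."""
    # Bodies of the form head α: keep only the α part
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # Bodies that do not start with head: the β parts
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}          # nothing to do
    new = head + "'"                   # fresh non-terminal, e.g. S'
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [["ε"]],
    }

result = eliminate_immediate_left_recursion("S", [["S", "a", "b"], ["T"]])
for h, bs in result.items():
    print(h, "->", " | ".join(" ".join(b) for b in bs))
# S -> T S'
# S' -> a b S' | ε
```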
PART-C
The lexical analyzer reads the source program character by character and produces a
stream of tokens.
where the terminals if, then, else, relop, id, and num generate sets of strings given
by the following regular definitions:
For this language fragment the lexical analyzer will recognize the keywords if,
then, else, as well as the lexemes denoted by relop, id, and num. To simplify
matters, we assume keywords are reserved; that is, they cannot be used as
identifiers. The num represents the unsigned integer and real numbers of Pascal.
In addition, we assume lexemes are separated by white space, consisting of nonnull
sequences of blanks, tabs, and newlines. The lexical analyzer will strip out white
space. It will do so by comparing a string against the regular definition ws, below.
If a match for ws is found, the lexical analyzer does not return a token to the
parser.
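The behaviour described above can be sketched with Python's re module. The regular definitions themselves are not reproduced in these notes, so the patterns below are assumptions based on the common textbook forms: relop as <, <=, =, <>, >, >=; id as a letter followed by letters or digits; num as an unsigned integer or real with an optional exponent; ws as one or more blanks, tabs, or newlines. The names TOKEN_SPEC, KEYWORDS and tokenize are hypothetical.

```python
import re

# Assumed regular definitions (not taken from the notes)
TOKEN_SPEC = [
    ("ws",    r"[ \t\n]+"),
    ("num",   r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("id",    r"[A-Za-z][A-Za-z0-9]*"),
    ("relop", r"<=|<>|>=|<|=|>"),
]
KEYWORDS = {"if", "then", "else"}      # reserved: never returned as id
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                   # white space: no token goes to the parser
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme              # keyword lexemes get their own token
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("if x1 <= 12.5E+3 then y else z"))
```

Keywords are matched by the id pattern first and then looked up in KEYWORDS, which is one simple way to keep them reserved.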
Transition Diagram
There are two notations for representing finite automata:
· Transition diagram
· Transition table
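A transition table can be stored directly as a lookup from (state, input class) to next state. The sketch below is an illustrative assumption, not a diagram from these notes: it encodes a two-state automaton for identifiers of the form letter (letter | digit)*, with hypothetical state names start and in_id.

```python
def char_class(ch):
    # Map a character to the input class used as a column of the table
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: (current state, input class) -> next state
TABLE = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"):  "in_id",
}
ACCEPTING = {"in_id"}

def accepts(text):
    state = "start"
    for ch in text:
        state = TABLE.get((state, char_class(ch)))
        if state is None:              # no entry in the table: reject
            return False
    return state in ACCEPTING

print(accepts("newval"), accepts("x1"), accepts("1x"))  # True True False
```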
Example:
The type of the identifier newval must match with the type of expression
(oldval+12).
Example:
Semantic analysis
• Syntactically correct, but semantically incorrect
example:
int a;
double sum;
char b;
sum = a + b;   (data type mismatch)
Example:
The above intermediate code will be optimized as:
Temp1 = Id3 * 1
Id1 = Id2 + Temp1
Phase-6: Code Generation
• The last phase of translation is code generation.
• Takes as input an intermediate representation of the source program and maps it
into the target language
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine
instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers to
hold variables.
Example:
First( ) :
FIRST(E) = { ( , id}
FIRST(E’) ={+ , ε }
FIRST(T) = { ( , id}
FIRST(T’) = {*, ε }
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }
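These sets can be reproduced mechanically with the usual fixed-point computation. The grammar is not shown next to the sets, so the sketch below assumes the standard expression grammar after left-recursion removal (E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id). The function names are hypothetical.

```python
EPS = "ε"
# Assumed grammar (not reproduced in the notes)
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_string(symbols, first):
    # FIRST of a sequence of grammar symbols
    out = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                       # every symbol could vanish
    return out

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                     # repeat until nothing new is added
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue       # FOLLOW is defined for non-terminals only
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:    # the rest can vanish: inherit FOLLOW(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```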
Leftmost Derivation
The process of deriving a string by expanding the leftmost non-terminal at each step is
called leftmost derivation.
The geometrical representation of a leftmost derivation is called a leftmost derivation
tree.
S → aB
→ aaBB (Using B → aBB)
→ aaaBBB (Using B → aBB)
→ aaabBB (Using B → b)
→ aaabbB (Using B → b)
→ aaabbaBB (Using B → aBB)
→ aaabbabB (Using B → b)
→ aaabbabbS (Using B → bS)
→ aaabbabbbA (Using S → bA)
→ aaabbabbba (Using A → a)
Derivation Tree
Rightmost Derivation
The process of deriving a string by expanding the rightmost non-terminal at each step
is called rightmost derivation.
The geometrical representation of a rightmost derivation is called a rightmost
derivation tree.
S → aB
→ aaBB (Using B → aBB)
→ aaBaBB (Using B → aBB)
→ aaBaBbS (Using B → bS)
→ aaBaBbbA (Using S → bA)
→ aaBaBbba (Using A → a)
→ aaBabbba (Using B → b)
→ aaaBBabbba (Using B → aBB)
→ aaaBbabbba (Using B → b)
→ aaabbabbba (Using B → b)
Derivation Tree
6. Check whether the following grammar can be implemented using a predictive parser.
Check whether the string “abfg” is accepted or not using predictive parsing.
S -> A
A -> aB | Ad
B -> bBC | f
C -> g
Step 1: (Eliminate Left Recursion)
S -> A
A -> aBA’
A’ -> dA’ | ε
B -> bBC | f
C -> g
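With left recursion removed, the acceptance check can be sketched as a table-driven predictive parser. The parsing table below is not given in these notes: it is worked out by hand from the grammar of Step 1 (using FIRST(A’) = { d, ε } and FOLLOW(A’) = { $ }) and should be read as an assumption of this sketch. Each cell holds at most one production, which is what makes predictive parsing possible here.

```python
# Parsing table M[non-terminal, lookahead] -> production body (hand-derived)
TABLE = {
    ("S", "a"):  ["A"],
    ("A", "a"):  ["a", "B", "A'"],
    ("A'", "d"): ["d", "A'"],
    ("A'", "$"): [],                   # A' -> ε
    ("B", "b"):  ["b", "B", "C"],
    ("B", "f"):  ["f"],
    ("C", "g"):  ["g"],
}
NONTERMINALS = {"S", "A", "A'", "B", "C"}

def parse(text):
    tokens = list(text) + ["$"]        # one character per terminal
    stack = ["$", "S"]                 # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:           # empty table cell: syntax error
                return False
            stack.extend(reversed(body))
        elif top == look:              # terminal (or $) matches the input
            pos += 1
        else:
            return False
    return pos == len(tokens)

print(parse("abfg"))   # True
print(parse("abf"))    # False
```

Under this table the string “abfg” is accepted: S ⇒ A ⇒ aBA’ ⇒ abBCA’ ⇒ abfCA’ ⇒ abfgA’ ⇒ abfg.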
Augmented grammar:
S’ --> S
S --> AA
A --> aA | b
Canonical collection of the given grammar:
I0:
S’ --> .S
S --> .AA
A --> .aA | .b
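The remaining item sets follow from repeated closure and goto steps. The following is a minimal Python sketch of the LR(0) construction for the augmented grammar S’ → S, S → AA, A → aA | b. An item is represented as a pair (production index, dot position); this representation and the function names are assumptions for illustration.

```python
GRAMMAR = [
    ("S'", ("S",)),        # augmented start production
    ("S",  ("A", "A")),
    ("A",  ("a", "A")),
    ("A",  ("b",)),
]
NONTERMINALS = {"S'", "S", "A"}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        # Dot in front of a non-terminal: add its productions with the dot at 0
        if dot < len(body) and body[dot] in NONTERMINALS:
            for idx, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (idx, 0) not in items:
                    items.add((idx, 0))
                    work.append((idx, 0))
    return frozenset(items)

def goto(items, symbol):
    # Move the dot over `symbol` wherever possible, then take the closure
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0)})]       # I0
    for state in states:               # the list grows while it is scanned
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            nxt = goto(state, sym)
            if nxt not in states:
                states.append(nxt)
    return states

def show(item):
    prod, dot = item
    head, body = GRAMMAR[prod]
    return f"{head} --> {' '.join(body[:dot])}.{' '.join(body[dot:])}"

for n, state in enumerate(canonical_collection()):
    print(f"I{n}:", ", ".join(show(it) for it in sorted(state)))
```

For this grammar the sketch produces seven item sets. The numbering depends on the order in which symbols are explored, so it may differ from a hand-drawn diagram.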
Constructing Data flow diagram
Construction of parsing Table: