Compiler Lab Manual
Mecheri, Salem
TOTAL: 45 PERIODS
OUTCOMES:
At the end of the course, the student should be able to
Implement the different Phases of compiler using tools
Analyze the control flow and data flow of a typical program
Optimize a given program
Generate an assembly language program equivalent to a source language program
S. No. | Date | Name of the Experiment | Page No. | Marks Awarded | Remarks
9 | | Construction of DAG | | |
Aim:
To implement Symbol Table using C program.
Description:
Symbol table management:
Symbol table is a data structure containing the record of each identifier, with fields
for the attributes of the identifier. The data structure allows us to find the record for each
identifier quickly and to store or retrieve data from that record quickly. When
the lexical analyzer detects an identifier in the source program, the identifier is entered into the
symbol table. The remaining phases enter information about identifiers into the symbol table.
Coding:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int main()
{
int i=0,j=0,n,symcount=0;
void *p,*add[15];
char c,b[20];
printf("Expression terminated by $: ");
while((c=getchar())!='$')
{
b[i]=c;
i++;
}
n=i-1;
printf("Given expression: ");
i=0;
while(i<=n)
{
printf("%c",b[i]);
i++;
}
printf("\nSymbol table");
printf("\nsymbol \taddr \t\ttype\n");
j=0;
while(j<=n)
{
c=b[j];
if(isalpha((unsigned char)c))
{
/* allocate a record for the identifier and show its address */
p=malloc(sizeof(char));
add[symcount++]=p;
printf("%c \t%p \tidentifier\n",c,p);
}
j++;
}
for(i=0;i<symcount;i++)
free(add[i]);
return 0;
}
OUTPUT:
Result:
Thus the symbol table using C program was implemented successfully.
Aim:
To implement Lexical Analysis using C.
Description:
In compiler, lexical analysis is also called linear analysis or scanning. In lexical analysis the
stream of characters making up the source program is read from left to right and grouped into tokens
that are sequences of characters having a collective meaning.
LEX:
LEX helps write programs whose control flow is directed by instances of regular expressions
in the input stream. It is well suited for editor-script type transformations and for segmenting input in
preparation for a parsing routine.
LEX is a program generator designed for Lexical processing of character input streams. It
accepts a high-level, problem oriented specification for character string matching, and produces a
program in a general purpose language which recognizes regular expressions. The regular expressions
are specified by the user in the source specifications given to LEX. The LEX written code recognizes
these expressions in an input stream and partitions the input stream into strings matching the
expressions. At the boundaries between strings program sections provided by the user are executed.
The LEX source file associates the regular expressions and the program fragments. As each
expression appears in the input to the program written by LEX, the corresponding fragment is
executed.
The user supplies the additional code beyond expression matching needed to complete his
tasks, possibly including code written by other generators. The program that recognizes the
expressions is generated in the general-purpose programming language employed for the user's
program fragments. Thus, a high-level expression language is provided to write the string expressions
to be matched, while the user's freedom to write actions is unimpaired. This avoids forcing the user
who wishes to use a string manipulation language for input analysis to write processing programs in
the same and often inappropriate string-handling language.
LEX is not a complete language, but rather a generator representing a new language feature
which can be added to different programming languages, called ``host languages.'' Just as general-purpose
languages can produce code to run on different computer hardware, LEX can write code in
different host languages.
LEX turns the user's expressions and actions (called source in this memo) into the host
general-purpose language; the generated routine is named yylex. The yylex program will
recognize expressions in a stream (called input in this memo) and perform the specified actions for
each expression as it is detected. See Figure 1.
LEX Source:
The general format of LEX source is:
{definitions}
%%
{rules}
%%
{user subroutines}
where the definitions and the user subroutines are often omitted. The second %% is optional, but the
first is required to mark the beginning of the rules. The absolute minimum LEX program is thus %% (no
definitions, no rules), which translates into a program that copies the input to the output unchanged.
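As a concrete illustration of this three-section layout, the fragment below is a minimal Lex specification (a sketch, not one of the lab programs) that counts the lines in its input:

```lex
%{
/* definitions section: C declarations copied into the generated program */
#include <stdio.h>
int lines = 0;
%}
%%
\n      { lines++; }   /* rules section: a pattern and its action */
.       ;              /* ignore every other character */
%%
/* user subroutines section */
int yywrap() { return 1; }
int main() { yylex(); printf("%d lines\n", lines); return 0; }
```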
CODING:
Program Name:
#include<string.h>
#include<ctype.h>
#include<stdio.h>
void keyword(char str[10])
{
if(strcmp("for",str)==0||strcmp("while",str)==0||strcmp("do",str)==0||
strcmp("int",str)==0||strcmp("float",str)==0||strcmp("char",str)==0||
strcmp("double",str)==0||strcmp("static",str)==0||
strcmp("switch",str)==0||strcmp("case",str)==0)
printf("\n%s is a keyword",str);
else
printf("\n%s is an identifier",str);
}
int main()
{
char str[10];
printf("Enter a word: ");
scanf("%9s",str);
keyword(str);
return 0;
}
Result:
Aim:
To implement Lexical Analysis using Lex Tool.
Description:
In compiler, lexical analysis is also called linear analysis or scanning. In lexical analysis the
stream of characters making up the source program is read from left to right and grouped into tokens
that are sequences of characters having a collective meaning.
LEX Source:
The general format of LEX source is:
{definitions}
%%
{rules}
%%
{user subroutines}
where the definitions and the user subroutines are often omitted. The second %% is optional, but the
first is required to mark the beginning of the rules. The absolute minimum LEX program is thus %% (no
definitions, no rules), which translates into a program that copies the input to the output unchanged.
CODING:
ALGORITHM:
Step 1: Declare all the variables, constants and regular definitions in the declaration
section.
Step 2: Define the patterns and their respective actions in the rule section.
Step 3: Patterns are regular expressions and the actions are program fragments.
Step 4: In the main function, the yylex() function is used to place a call to the rule section to
produce the necessary tokens.
Step 5: It uses yytext, a character array, to print the tokens matching the patterns.
Program Name:
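The steps above can be sketched as the following Lex specification. The token names and patterns here are illustrative assumptions, not the manual's exact program:

```lex
%{
#include <stdio.h>
%}
letter  [a-zA-Z]
digit   [0-9]
%%
{digit}+                      { printf("NUMBER: %s\n", yytext); }
{letter}({letter}|{digit})*   { printf("IDENTIFIER: %s\n", yytext); }
[-+*/=]                       { printf("OPERATOR: %s\n", yytext); }
[ \t\n]                       ;   /* skip whitespace */
.                             { printf("UNKNOWN: %s\n", yytext); }
%%
int yywrap() { return 1; }
int main() { yylex(); return 0; }
```

Each rule's action prints the matched lexeme held in yytext, as described in Step 5.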
Result:
Aim:
Description:
YACC
Theory:
Yacc stands for "yet another compiler-compiler," reflecting the popularity of parser
generators in the early 1970's when the first version of Yacc was created by S. C. Johnson. Yacc is
available as a command on the UNIX system, and has been used to help implement hundreds of
compilers.
Structure of a Yacc Grammar
A yacc grammar consists of three sections: the definition section, the rules section, and the user
subroutines section.
... definition section ...
%%
... rules section ...
%%
... user subroutines section ...
The sections are separated by lines consisting of two percent signs. The first two sections are required,
although a section may be empty. The third section and the preceding "%%" line may be omitted.
CODING:
Program Name:
1. Write a YACC Program to check whether given string a^nb^n is accepted by
the grammar.
%{
#include<stdio.h>
int yylex();
int yyerror(const char *s);
%}
%token A B
%%
S : A S B
| A B
|
;
%%
int yylex()
{
char ch;
ch=getchar();
if (ch=='a')
return A;
else if (ch=='b')
return B;
else if (ch=='\n')
return 0;
else return ch;
}
int yyerror(const char *s)
{
printf("\nstring rejected by grammar\n");
return 0;
}
int main()
{
printf("enter the string\n");
yyparse();
printf("\nstring accepted by grammar\n");
return 0;
}
%{
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
int yylex();
int yyerror(const char *s);
%}
%token ID
%left '+' '-'
%left '*' '/'
%%
E : E '+' E { printf("valid expression\n"); }
| E '-' E { printf("valid expression\n"); }
| E '/' E { printf("valid expression\n"); }
| E '*' E { printf("valid expression\n"); }
| ID
;
%%
int yylex()
{
char ch;
ch=getchar();
if(isalpha(ch))
return ID;
if(ch=='\n')
return 0;
return ch;
}
int yyerror(const char *s)
{
printf("invalid expression\n");
exit(0);
}
int main()
{
printf("Enter the expression: ");
yyparse();
return 0;
}
Output:
Result:
Aim:
Description:
Syntax analysis
It is also called hierarchical analysis or parsing. It involves grouping the tokens of the
source program into grammatical phrases that are used by the compiler to synthesize output. Usually,
a parse tree represents the grammatical phrases of the source program.
CODING:
ALGORITHM 1:
Step 3: When the regular definition "morning" is matched, return the MOR token.
Step 5: In the YACC program, when yyparse() is called it will in turn execute the yylex()
function to get the necessary tokens, i.e. MOR and NEWLINE.
Step 6: When "morning" is keyed in, the program prints "successful execution of
program".
Program Name:
RETURN NEWLINE AND MOR TOKENS
new.y:
%{
#include<stdio.h>
int yylex();
%}
%token MOR NEWLINE
%%
prog : MOR NEWLINE { printf("successful execution of program\n"); return 0; }
;
%%
int yyerror(const char *s)
{
printf("\nerror...........");
printf("\nPlease enter valid input(morning)");
return 0;
}
int yywrap()
{
return 1;
}
int main()
{
printf("Type morning: ");
yyparse();
return 0;
}
jj.l:
%{
#include<stdio.h>
#include"y.tab.h"
%}
%%
"morning" {return MOR;}
[\n] {return NEWLINE;}
. { }
%%
y.tab.h:
#ifndef YYERRCODE
#define YYERRCODE 256
#endif
#define MOR 257
#define NEWLINE 258
OUTPUT:
Type morning:
morning
Result:
Thus the program to recognize a valid variable, which starts with a letter followed by any
number of letters or digits, was implemented successfully.
Aim:
ALGORITHM 2:
Step 3: In the procedure section, using yylex() the NUMBER token is returned to the
parser.
Step 4: When yyparse() is called it performs the necessary calculation, getting tokens
as needed.
CODING:
Program Name:
j.y
%{
#include<stdio.h>
#include<ctype.h>
#include<stdlib.h>
#define YYSTYPE double
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
list : list expr '\n' { printf("%g\n",$2); }
| /* empty */
;
expr : expr '+' expr { $$=$1+$3; }
| expr '-' expr { $$=$1-$3; }
| expr '*' expr { $$=$1*$3; }
| expr '/' expr { if($3==0){ printf("divide by 0 exception\n"); exit(0); } $$=$1/$3; }
| NUMBER
;
%%
OUTPUT:
34/0
divide by 0 exception
RESULT:
Thus a YACC program to return the NEWLINE & MORNING token and a program
to perform calculator is executed and verified successfully.
Aim:
Description:
LR Parser:
This is a bottom-up syntax analysis technique that can be used to parse a large class of CFGs.
LR parsing is actually called LR(k) parsing, where
L – left-to-right scanning of the input
R – rightmost derivation (in reverse)
k – number of input symbols of lookahead used in making parsing decisions
(when k is omitted, it is assumed to be 1)
• LR parsing is attractive because:
– LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still
efficient.
– The class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers: every LL(1)
grammar is also an LR(1) grammar.
– An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right
scan of the input.
Shift-Reduce Parsing:
Shift-reduce parsing is a method of syntax analysis that constructs the parse
tree for an input string beginning at the leaves and working towards the
root.
At each step it attempts to reduce a substring of the input that matches the
right side of a grammar production, replacing it with the nonterminal on the
production's left side, thus attempting to reach the start symbol of the grammar.
At the end of the operation of the shift-reduce parser, the sequence of
reductions traced in reverse gives the rightmost derivation of the input
string according to the grammar.
CODING:
Program Name:
<int.l>
%{
#include"y.tab.h"
#include<stdio.h>
#include<string.h>
int LineNo=1;
%}
Result:
Thus the BNF rules were converted into Yacc form and the abstract syntax tree was generated.
Aim:
Description:
Languages differ greatly in how strict their static semantics is: none of the things above is
checked by all programming languages!
In general, the more there is static checking in the compiler, the less need there is for manual
debugging.
CODING:
Result:
Aim:
(Figure omitted: building states and transitions of partial NFAs — (a) for unions, (c) for closures.)
Description:
Input : A regular expression r
Output : A DFA D that recognizes L(r).
Method:
1) Construct a syntax tree for the augmented regular expression (r)#, where # is a
unique end marker appended to (r).
2) Construct the functions nullable, firstpos, lastpos and followpos by making
depth-first traversals of the syntax tree T.
3) Construct Dstates, the set of states of D, and Dtran, the transition table of D, by the
following procedure.
4) The start state of D is firstpos(root) and the accepting states are all those
containing the position associated with the end marker #.
ALGORITHM:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main()
{
char reg[20];
int q[20][3],i,j,len,a,b;
for(a=0;a<20;a++)
{
for(b=0;b<3;b++)
{
q[a][b]=0;
}}
printf("Regular expression :\n");
scanf("%s",reg);
len=strlen(reg);
i=0;
j=1;
while(i<len)
{
if(reg[i]=='a' && reg[i+1]!='/' && reg[i+1]!='*')
{
q[j][0]=j+1;
j++;
}
if(reg[i]=='b' && reg[i+1]!='/' && reg[i+1]!='*')
{
q[j][1]=j+1;
j++;
}
if(reg[i]=='e' && reg[i+1]!='/' && reg[i+1]!='*')
{
q[j][2]=j+1;
j++;
}
if(reg[i]=='a' && reg[i+1]=='/' && reg[i+2]=='b')
{
q[j][2]=((j+1)*10)+(j+3);
j++;
q[j][0]=j+1;
j++;
q[j][2]=j+3;
j++;
q[j][2]=j+1;
j++;
i=i+2;
}
if(reg[i]=='a' && reg[i+1]=='*')
{
OUTPUT:
(i)Regular expression:
aba*a
Transition function
q[1,a]->2
q[2,b]->3
q[3,e]->4&6
q[4,a]->5
q[6,e]->7&5
q[7,a]->8
(ii)Regular expression:
b/a
Transition function
q[1,e]->2&4
q[2,b]->3
q[3,e]->6
q[4,a]->5
q[5,e]->6
(iii)Regular expression:
abab
Transition function
q[1,a]->2
q[2,b]->3
q[3,a]->4
q[4,b]->5
RESULT:
Thus the C program to construct NFA from a regular expression is executed and
verified successfully.
AIM:
To write a C program to construct a DFA from a regular expression.
ALGORITHM:
PROGRAM:
#include<stdio.h>
#include<string.h>
int main()
{
int t,i=0,j=0;
char ch[10],l[10],r[10];
printf("\nRegular expression to DFA \n");
printf("\nEnter the regular expression :");
scanf("%s",ch);
t=strlen(ch);
while(i<t)
{
if(ch[i]=='a'||ch[i]=='b')
{
l[j]=ch[i];
j++;
}
else if(ch[i]=='*')
{
r[j-1]=ch[i-1];
l[j-1]='\0';
j--;
}
else if(ch[i]=='+')
{
r[j]=ch[i-1];
}
i++;
}
printf("\nTransition table for DFA\n");
printf("\nStates\ta\tb\n");
for(i=0;i<=j;i++)
{
if(r[i]=='a')
{
if(l[i]=='a')
printf("\n %d\t%d\t%d",i,i,i+1);
else if(l[i]=='b')
printf("\n %d\t%d\t%d",i,i,i+1);
else
printf("\n %d\t %d",i,i);
}
else if(r[i]=='b')
{
OUTPUT:
(i)Regular expression to DFA
Enter the regular expression: a+b
Transition table for DFA
States a b
0 1
1 1 2
2 - -
RESULT:
Thus the C program to construct DFA from a regular expression is executed and
verified successfully.
Aim:
Description:
Heap allocation:
CODING:
Program Name:
INSERTION
PUSH(item)
1. If (top = MAXSIZE - 1)
Print "overflow"
Return
2. top = top + 1
3. stack[top] = item
4. Return
DELETION
POP(item)
1. If (top = -1)
Print "underflow"
Return
2. item = stack[top]
3. top = top - 1
4. Return
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
#define MAXSIZE 10
void push();
int pop();
void traverse();
int stack[MAXSIZE];
int Top=-1;
void main()
{
int choice;
char ch;
do
{
clrscr();
printf("\n1. PUSH ");
printf("\n2. POP ");
printf("\n3. TRAVERSE ");
printf("\nEnter your choice");
scanf("%d",&choice);
switch(choice)
{
case 1: push();
break;
case 2: printf("\nThe deleted element is %d",pop());
break;
case 3: traverse();
break;
default: printf("\nYou Entered Wrong Choice");
}
printf("\nDo You Wish To Continue (Y/N)");
fflush(stdin);
scanf("%c",&ch);
}
while(ch=='Y' || ch=='y');
}
void push()
{
int item;
if(Top == MAXSIZE - 1)
{
printf("\nThe Stack Is Full");
getch();
exit(0);
}
else
{
printf("\nEnter The Element To Be Inserted: ");
scanf("%d",&item);
Top=Top+1;
stack[Top]=item;
}
}
int pop()
{
int item;
if(Top == -1)
{
printf("\nThe Stack Is Empty");
getch();
exit(0);
}
item=stack[Top];
Top=Top-1;
return item;
}
void traverse()
{
int i;
if(Top == -1)
{
printf("\nThe Stack Is Empty");
return;
}
printf("\nThe Stack Elements Are:");
for(i=Top;i>=0;i--)
printf("\n%d",stack[i]);
}
Output:
1. PUSH
2. POP
3. TRAVERSE
Enter your choice: 1
Enter the element to be inserted:
Result:
Aim:
Description:
A directed acyclic graph (DAG) is a directed graph that contains no cycles. A rooted tree is
a special kind of DAG, and a DAG is a special kind of directed graph. For example, a DAG
may be used to represent common subexpressions in an optimising compiler.
(Figure omitted: the expression a*b+f(a*b) drawn as a tree and as a DAG; in the DAG the
common subexpression a*b appears once and is shared.)
Coding:
Aim:
Description:
Intermediate codes are machine-independent codes, but they are close to machine instructions.
The given program in a source language is converted to an equivalent program in an
intermediate language by the intermediate code generator.
The intermediate language can be one of many different languages, and the designer of the
compiler decides this intermediate language.
o syntax trees can be used as an intermediate language.
o postfix notation can be used as an intermediate language.
o three-address code (quadruples) can be used as an intermediate language.
We will use quadruples to discuss intermediate code generation.
Quadruples are close to machine instructions, but they are not actual machine
instructions. For example, the assignment a = b + c * d becomes the three-address
sequence t1 = c * d, t2 = b + t1, a = t2.
ALGORITHM:
Step2: Initialize the code, input and the code generator functions.
Step4: The intermediate form of expression generates the code in assembler language
Step5: The ADD, SUB, MUL, DIV are the results obtained for +,-,*,/. The MOV
operation is also used to process with registers.
Step6: Print the code for the expression using the code generator function.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
char optr[4]={'+','-','*','/'};
char code[4][4]={"ADD","SUB","MUL","DIV"};
char input[20][6];
void codegen();
void getip();
void assemble(char pos[], int j);
int main()
{
getip();
codegen();
return 0;
}
void getip()
{
int i;
for(i=0;;i++)
{
scanf("%s",input[i]);
if(strcmp("#",input[i])==0)
break;
}
}
void codegen()
{
int i,j,flag;
for(i=0;strcmp("#",input[i])!=0;i++)
{
flag=0;
for(j=0;j<4;j++)
{
if(input[i][3]=='\0')
{
/* simple copy a=b : MOV b,a */
printf("MOV %c,%c\n",input[i][2],input[i][0]);
flag=1;
break;
}
else if(input[i][3]==optr[j])
{
assemble(input[i],j);
flag=1;
break;
}
}
if(flag==0)
{
printf("ERROR!!!!!!!!\n");
exit(0);
}
}
}
void assemble(char pos[], int j)
{
int index=0;
/* a=b+c : MOV c,R0 ; ADD b,R0 ; MOV R0,a */
printf("MOV %c,R%d\n",pos[4],index);
printf("%s %c,R%d\n",code[j],pos[2],index);
printf("MOV R%d,%c\n",index,pos[0]);
}
OUTPUT:
a=b+7
#
MOV 7,R0
ADD b,R0
MOV R0,a
a=5+c
#
MOV c,R0
ADD 5,R0
MOV R0,a
Result:
Aim:
Code Optimization is an important phase of the compiler. This phase transforms the three-address
code into a sequence of optimized three-address statements.
Code optimization takes intermediate code statements (three-address code) as input and
produces optimized three-address code as output.
A simple but effective technique for locally improving the target code is peephole
optimization: a method for trying to improve the performance of the target program
by examining a short sequence of target instructions and replacing these instructions by a
shorter or faster sequence whenever possible.
Characteristics of peephole optimization
1. Redundant instruction elimination
2. Flow of control information
3. Algebraic Simplification
4. Use of machine Idiom.
ALGORITHM:
Step 4: Generate the left side and right side of each production.
Step 5: Eliminate the dead code that is never used, and then eliminate the common
subexpressions.
PROGRAM:
#include<stdio.h>
#include<conio.h>
#include<string.h>
struct op
{
char l;
char r[20];
}op[10],pr[10];
void main()
{
int a,i,k,j,n,z=0,m,q;
char *p,*l;
char temp,t;
char *tem;
clrscr();
printf("enter no of values");
scanf("%d",&n);
for(i=0;i<n;i++)
{
printf("left\t");
op[i].l=getche();
printf("right:\t");
scanf("%s",op[i].r);
}
printf("intermediate Code\n") ;
for(i=0;i<n;i++)
{
printf("%c=",op[i].l);
OUTPUT:
ENTER NO OF VALUES: 5
Left a right 9
Left b right c+d
Left e right c+d
Left f right b+e
Left r right f
INTERMEDIATE CODE
a=9
b=c+d
e=c+d
f=b+e
r=f
AFTER DEAD CODE ELIMINATION
b=c+d
e=c+d
f=b+e
r=f
ELIMINATE COMMON EXPRESSION
b=c+b
b=c+d
f=b+b
r=f
OPTIMIZED CODE
b=c+d
f=b+b
r=f
Result: