2021UCS1618 Compiler
LABORATORY FILE
Principle of Compiler Construction
(COCSC14)
Submitted By:
Name: Shobhit
Roll Number: 2021UCS1618
Branch: CSE-3
Index
S.No.  Practical                                                          Date        Remarks
1.     To set up Lex (Flex) and Yacc (Bison), then print "Hello world!"   29.08.2023
       using them.
Theory:
Lex (lexical analyzer) and Yacc (yet another compiler compiler) are powerful tools used in
compiler construction. They aid in the process of translating source code into structured data
that can be processed by compilers or interpreters.
- Lex generates lexical analyzers, which break down input text into smaller units called
tokens using regular expressions.
- Yacc generates parsers that analyze tokens and determine the structure of input using
context-free grammars.
To Install in Ubuntu/Debian:
- sudo apt update && sudo apt install bison flex -y
To Compile and Run:
- lex hello.l
- yacc -d hello.y
- gcc lex.yy.c y.tab.c -o hello -ll
- ./hello
Code:
Hello.l (Lex):
%{
#include "y.tab.h"
%}
%%
"hello" { return HELLO; }
[ \t\n] ; /* Skip whitespace and newlines */
. ; /* Ignore any other characters */
%%
int yywrap() {
return 1;
}
Hello.y (Yacc):
%{
#include <stdio.h>
int yylex();
void yyerror(const char* msg);
%}
%token HELLO
%%
start: HELLO { printf("Hello, World!\n"); }
%%
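int main() {
yyparse();
return 0;
}
/* yyerror is declared above and is called by the parser on a syntax error,
   so it needs a definition for the program to link */
void yyerror(const char* msg) {
fprintf(stderr, "%s\n", msg);
}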
Output:
Experiment-1
Aim: Write a program to separate tokens in Lex
Theory:
Tokenization is the process of breaking a sequence of characters into meaningful
chunks, called tokens. Tokens are fundamental building blocks in programming
languages, representing keywords, identifiers, constants, operators, and more. In this
experiment, you will use Lex to tokenize a sample C code file and categorize different
parts of the code into various types of tokens.
Code:
Tokens.l
%{
%}
%%
; printf("%s is a delimiter\n",yytext);
, printf("%s is a separator\n",yytext);
%%
int yywrap(void)
{
return 1;
}
int main()
{
// reads input from a file named test.c rather than terminal
freopen("test.c", "r", stdin);
yylex();
return 0;
}
Test.c
int main()
{
int a = 10, b = 20;
int c = 0;
// find the greater integer
if (a < b)
c = b;
else
c = a;
// c is now the greater integer
return 0;
}
Output:
Experiment-2
Aim: Write a program to implement the lexical analysis phase of compiler.
Theory:
Lexical analysis, performed by the lexer (or scanner), is the first phase of a compiler. It
transforms the source code into a stream of tokens, the fundamental units such as keywords,
identifiers, numbers, and symbols, and discards extraneous elements such as whitespace and
comments. The token stream is the input to the subsequent compiler phases, starting with
syntax analysis.
Code:
Tokens.l
%{
%}
%%
%%
int yywrap(void)
{
return 1;
}
int main()
{
// reads input from a file named test.c rather than terminal
freopen("test.c", "r", stdin);
yylex();
return 0;
}
Input.c
int main()
{
int a = 10, b = 20;
int c = 0;
// find the greater integer
if (a < b)
c = b;
else
c = a;
// c is now the greater integer
return 0;
}
Output:
Experiment-5
Aim: Develop a lexical and syntax analyser for the same using the LEX and YACC tools.
Also, implement the bookkeeper module.
Theory:
In the process of compiler construction, the Lexical Analysis phase, implemented using tools
like LEX, transforms the source code into a sequence of tokens based on defined regular
expressions. The Syntax Analysis phase, facilitated by tools like YACC or Bison, parses these
tokens according to a specified context-free grammar, creating a parse tree. A bookkeeper
module manages symbol tables, type checking, and scope handling. These phases ensure the
conversion of human-readable source code into a structured representation for further
compilation stages.
Code:
Lexer.l
%{
#include "y.tab.h"
int countn=0;
%}
%option yylineno
alpha [a-zA-Z]
digit [0-9]
unary "++"|"--"
%%
%%
int yywrap() {
return 1;
}
Parser.y
%{
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<ctype.h>
#include"lex.yy.c"
int count=0;
int q;
char type[10];
extern int countn;
%}
%token VOID CHARACTER PRINTFF SCANFF INT FLOAT CHAR FOR IF ELSE TRUE FALSE
NUMBER FLOAT_NUM ID LE GE EQ NE GT LT AND OR STR ADD MULTIPLY DIVIDE
SUBTRACT UNARY INCLUDE RETURN
%%
body: FOR { add('K'); } '(' statement ';' condition ';' statement ')' '{' body '}'
| IF { add('K'); } '(' condition ')' '{' body '}' else
| statement ';'
| body body
| PRINTFF { add('K'); } '(' STR ')' ';'
| SCANFF { add('K'); } '(' STR ',' '&' ID ')' ';'
;
else: ELSE { add('K'); } '{' body '}'
|
;
arithmetic: ADD
| SUBTRACT
| MULTIPLY
| DIVIDE
;
relop: LT
| GT
| LE
| GE
| EQ
| NE
;
%%
int main() {
yyparse();
printf("\n\n");
printf("\t\t\t\t\t\t\t\t PHASE 1: LEXICAL ANALYSIS \n\n");
printf("\nSYMBOL DATATYPE TYPE LINE NUMBER \n");
printf("_______________________________________\n\n");
int i=0;
for(i=0; i<count; i++) {
printf("%s\t%s\t%s\t%d\t\n", symbol_table[i].id_name, symbol_table[i].data_type,
symbol_table[i].type, symbol_table[i].line_no);
}
for(i=0;i<count;i++) {
free(symbol_table[i].id_name);
free(symbol_table[i].data_type); /* also allocated with strdup in add() */
free(symbol_table[i].type);
}
printf("\n\n");
return 0;
}
void add(char c) {
q=search(yytext);
if(!q) {
if(c == 'H') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup(type);
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Header");
count++;
}
else if(c == 'K') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup("N/A");
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Keyword\t");
count++;
}
else if(c == 'V') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup(type);
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Variable");
count++;
}
else if(c == 'C') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup("CONST");
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Constant");
count++;
}
else if(c == 'F') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup(type);
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Function");
count++;
}
}
}
void insert_type() {
strcpy(type, yytext);
}
Input1.c
#include<stdio.h>
#include<string.h>
int main() {
int a;
int x=1;
int y=2;
int z=3;
x=3;
y=10;
z=5;
if(x>5) {
for(int k=0; k<10; k++) {
y = x+3;
printf("Hello!");
}
} else {
int idx = 1;
}
for(int i=0; i<10; i++) {
printf("Hello World!");
scanf("%d", &x);
if (x>5) {
printf("Hi");
}
for(int j=0; j<z; j++) {
a=1;
}
}
return 1;
}
Commands:
Output:
Experiment-7
Aim: To build a simple calculator in Lex and Yacc
Theory:
In this calculator program, we are using Lex and Yacc to tokenize and parse arithmetic
expressions. Lex generates a lexical analyzer, and Yacc generates a parser to evaluate
the expressions.
Code:
Calc.l
%{
#include<stdio.h>
#include<stdlib.h> /* for atoi */
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {
yylval=atoi(yytext);
return NUMBER;
}
[ \t] ; /* skip spaces and tabs */
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
return 1;
}
Calc.y
%{
#include<stdio.h>
int flag=0;
int yylex(void);
void yyerror(const char *msg);
%}
%token NUMBER
%left '+' '-'
%left '*' '/' '%'
%left '(' ')'
%%
ArithmeticExpression: E{
printf("\nResult=%d\n",$1);
return 0;
};
E:E'+'E {$$=$1+$3;}
|E'-'E {$$=$1-$3;}
|E'*'E {$$=$1*$3;}
|E'/'E {$$=$1/$3;}
|E'%'E {$$=$1%$3;}
|'('E')' {$$=$2;}
| NUMBER {$$=$1;}
;
%%
int main(void)
{
printf("\nEnter Any Arithmetic Expression which can have operations Addition, "
"Subtraction, Multiplication, Division, Modulus and Round brackets:\n");
yyparse();
if(flag==0)
printf("\nEntered arithmetic expression is Valid\n\n");
return 0;
}
void yyerror(const char *msg)
{
(void)msg; /* the parser's own message is not shown; a fixed message is printed */
printf("\nEntered arithmetic expression is Invalid\n\n");
flag=1;
}
Output:
Experiment-11
Aim: Represent ‘C’ language using Context Free Grammar.
Theory:
The provided Context-Free Grammar (CFG) defines a simplified version of the C programming
language. It uses symbols like `<program>` to represent language structures, terminals like `ID`
for specific elements, and production rules to outline how various parts of the language fit
together. While it doesn't cover the entire C language, this CFG serves as a basis for creating a
parser capable of understanding and analyzing the structure of C code.
var -> ID
addop -> + | -
mulop -> * | /
empty ->
ID -> [a-zA-Z_][a-zA-Z0-9_]*
INT_LITERAL -> [0-9]+
FLOAT_LITERAL -> [0-9]+\.[0-9]+
CHAR_LITERAL -> '[a-zA-Z0-9]'
Result:
The given context-free grammar defines the rules for a simplified version of the ‘C’ language.
Experiment-9
Aim: Implement a two-pass assembler for the 8085/8086.
Theory:
A two-pass assembler for 8085/8086 in C involves a two-step process to convert assembly
language code into machine code. In the first pass, the assembler scans the source code,
creates a symbol table with addresses for labels and variables, and generates intermediate
code. Syntax errors are checked at this stage. In the second pass, the assembler uses the
symbol table to replace symbolic addresses with actual values, resolves forward references, and
generates the final machine code. The two-pass approach ensures that all symbols are correctly
addressed and facilitates the creation of error-free machine code for execution by the target
processor. Implementation in C requires data structures for the symbol table and intermediate
code, along with algorithms for address resolution and code generation.
Code:
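#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Table capacity. The original value is not shown in this listing,
   so 100 is an assumed placeholder. */
#define MAX_SIZE 100
typedef struct {
char label[10];
int address;
} SymbolTableEntry;
typedef struct {
char opcode[10];
char operand[10];
char label[10];
int location;
} IntermediateCode;
/* Prototypes for the two passes, which are defined after main.
   The passTwo signature is inferred from its call in main. */
void passOne(FILE *inputFile, SymbolTableEntry symbolTable[], int *symbolTableSize);
void passTwo(FILE *inputFile, SymbolTableEntry symbolTable[], int symbolTableSize,
IntermediateCode intermediateCode[], int *intermediateCodeSize);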
int main() {
FILE *inputFile;
SymbolTableEntry symbolTable[MAX_SIZE];
int symbolTableSize = 0;
IntermediateCode intermediateCode[MAX_SIZE];
int intermediateCodeSize = 0;
inputFile = fopen("input.asm", "r");
if (inputFile == NULL) {
perror("Error opening file");
return 1;
}
passOne(inputFile, symbolTable, &symbolTableSize);
fseek(inputFile, 0, SEEK_SET);
passTwo(inputFile, symbolTable, symbolTableSize, intermediateCode,
&intermediateCodeSize);
fclose(inputFile);
printf("\nIntermediate Code:\n");
printf("Location\tOperand\tLabel\tOpcode\n");
for (int i = 0; i < intermediateCodeSize; i++) {
printf("%d\t\t%s\t%s\t%s\n", intermediateCode[i].location, intermediateCode[i].operand,
intermediateCode[i].label, intermediateCode[i].opcode);
}
return 0;
}
void passOne(FILE *inputFile, SymbolTableEntry symbolTable[], int *symbolTableSize) {
char line[100];
int locationCounter = 0;
Input.asm file :
Output :
Conclusion:
A two-pass assembler for the 8085/8086 was successfully implemented in C.
Experiment-12
Aim: Add assignment statements, if-then-else statements, and while loops to the
calculator and generate three-address code for them.
Theory:
Adding assignment statements, if-then-else statements, and while loops extends the calculator
from evaluating single expressions to running small programs. An assignment statement stores
a value in a variable. The if-then-else statement introduces conditional logic, letting the
calculator choose between two branches based on a condition. A while loop repeats a
calculation for as long as its condition holds. To represent these constructs in a form suitable
for machine processing, three-address code is generated. Each instruction uses at most three
addresses, typically two source operands and one destination operand, as in t1 = a + b.
Control flow is expressed with labels and conditional or unconditional jumps. This gives a
structured representation of complex computations and control flow that can be easily
translated into machine code or processed further as intermediate code.
Code:
#include <stdio.h>
#include <string.h>
int i, choice, j, l, address = 100;
char userInput[10], expr[10], expr1[10], expr2[10], id1[5], op[5], id2[5];
/* strrev() is not part of standard C (glibc does not provide it),
   so a portable in-place reversal is defined here. */
static char *reverse(char *s)
{
    size_t a = 0, b = strlen(s);
    while (b > a + 1)
    {
        char t = s[a];
        s[a++] = s[--b];
        s[b] = t;
    }
    return s;
}
int main()
{
    printf("Enter the Expression : ");
    scanf("%9s", userInput); /* width limit prevents overflowing userInput */
    strcpy(expr, userInput);
    l = strlen(expr);
    expr1[0] = '\0';
    for (i = 0; i < 2; i++)
    {
        if (expr[i] == '+' || expr[i] == '-')
        {
            if (expr[i + 2] == '/' || expr[i + 2] == '*')
            {
                /* higher-precedence operator follows: evaluate it first */
                reverse(expr);
                j = l - i - 1;
                strncat(expr1, expr, j);
                reverse(expr1);
                printf("Three Address Code\nT = %s\nT1 = %c%cT\n", expr1,
                       expr[j + 1], expr[j]);
            }
            else
            {
                strncat(expr1, expr, i + 2);
                printf("Three Address Code\nT = %s\nT1 = T%c%c\n", expr1, expr[i + 2], expr[i + 3]);
            }
        }
        else if (expr[i] == '/' || expr[i] == '*')
        {
            strncat(expr1, expr, i + 2);
            printf("Three Address Code\nT = %s\nT1 = T%c%c\n", expr1, expr[i + 2],
                   expr[i + 3]);
        }
    }
    return 0;
}
Output:
Conclusion:
Assignment statements, if-then-else statements, and while loops were added to the
calculator, and a three-address code generator was successfully implemented.