Process of Execution of A Program :: Compiler Design
UNIT-1
INTRODUCTION
Process of execution of a program:
The hardware understands only machine language, which is difficult for humans to read. So we write programs
in a high-level language, which is easier for us to understand and remember. These programs are
then fed into a series of tools and OS components to obtain the desired code that can be used by the
machine. This is known as the Language Processing System.
The high-level language is converted into binary language in various phases. A compiler is a
program that converts high-level language to assembly language. Similarly, an assembler is a
program that converts the assembly language to machine-level language.
Let us first understand how a program, using C compiler, is executed on a host machine.
Before diving straight into the concepts of compilers, we should understand a few other tools that
work closely with compilers.
PREPROCESSOR:
A preprocessor produces input to the compiler. It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros that are short hands for
longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
COMPILER:
A compiler is a translator program that takes a program written in a high-level language (HLL), the source program,
and translates it into an equivalent program in machine-level language (MLL), the target program. An important part of
a compiler is reporting errors to the programmer.
Executing a program written in an HLL programming language basically consists of two parts: the source
program must first be compiled (translated) into an object program; then the resulting object program
is loaded into memory and executed.
[Figure: source program → COMPILER → assembly program → ASSEMBLER → machine code]
INTERPRETER:
Languages such as BASIC, SNOBOL, and LISP can be translated using interpreters. JAVA also uses an
interpreter. The process of interpretation can be carried out in the following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
Advantages: Execution can begin immediately without a separate compilation step, and errors are reported in terms of the source program, which simplifies debugging.
Disadvantages: Interpreted execution is slower than running compiled machine code, and the source program must be re-analysed on every run.
LOADER:
Once the assembler produces an object program, that program must be placed into memory and
executed. The assembler could place the object program directly in memory and transfer control
to it, thereby causing the machine-language program to be executed. However, this would waste core by
leaving the assembler in memory while the user's program was being executed. Also, the
programmer would have to retranslate the program with each execution, thus wasting translation
time. To overcome this waste of translation time and memory, system programmers
developed another component called the loader.
"A loader is a program that places programs into memory and prepares them for execution." It
would be more efficient if subroutines could be translated into object form that the loader could
"relocate" directly behind the user's program. The task of adjusting programs so that they may be
placed in arbitrary core locations is called relocation. Relocating loaders perform four functions:
allocation, linking, relocation, and loading.
STRUCTURE OF A COMPILER: A compiler can broadly be divided into two phases based
on the way they compile.
Analysis Phase: Known as the front-end of the compiler, the analysis phase of the compiler
reads the source program, divides it into core parts, and then checks for lexical, grammar, and
syntax errors. The analysis phase generates an intermediate representation of the source program
and symbol table, which should be fed to the Synthesis phase as input.
Synthesis Phase: Known as the back-end of the compiler, the synthesis phase generates the
target program with the help of intermediate source code representation and symbol table. A
compiler can have many phases and passes.
Pass: A pass refers to the traversal of a compiler through the entire program.
Phase: A phase of a compiler is a distinguishable stage, which takes input from the previous
stage, processes and yields output that can be used as input for the next stage. A pass can have
more than one phase.
PHASES OF A COMPILER:
A compiler operates in phases. A phase is a logically interrelated operation that takes the source
program in one representation and produces output in another representation. The phases of a
compiler are described below; as noted above, they are grouped into the two parts of compilation, analysis and synthesis.
Lexical Analysis: The first phase of the compiler works as a text scanner. This phase scans the source
code as a stream of characters and converts it into meaningful lexemes. The lexical analyzer
represents these lexemes in the form of tokens as:
<token-name, attribute-value>
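As an illustration of the <token-name, attribute-value> pair, a lexical analyzer written in C might represent a token with a structure like the following (a minimal sketch; the type and field names are illustrative, not from these notes):

#include <stdio.h>

/* Illustrative token categories (assumed names, not from these notes). */
typedef enum { TOK_ID, TOK_NUM, TOK_KEYWORD, TOK_RELOP } TokenType;

/* A token is a <token-name, attribute-value> pair; the attribute is, for
   example, an index into the symbol table or a literal value. */
typedef struct {
    TokenType type;     /* token name (category)            */
    int attribute;      /* e.g. symbol-table index or value */
} Token;

int main(void) {
    Token t = { TOK_ID, 100 };  /* identifier whose symbol-table entry is at location 100 */
    printf("<%d, %d>\n", t.type, t.attribute);
    return 0;
}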
Syntax Analysis: The next phase is called the syntax analysis or parsing. It takes the token
produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the source code grammar, i.e., the parser checks if the
expression made by the tokens is syntactically correct.
Semantic Analysis: Semantic analysis checks whether the constructed parse tree follows the
rules of the language. For example, it checks that values are assigned between compatible data types and
reports errors such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and
expressions, and whether identifiers are declared before use. The semantic analyzer
produces an annotated syntax tree as its output.
Intermediate Code Generation: After semantic analysis, the compiler generates an intermediate representation of the source program for some abstract machine. This intermediate code should be generated in such a way that it is easy to translate into the target machine code.
Code Optimization: The next phase does code optimization of the intermediate code.
Optimization can be assumed as something that removes unnecessary code lines, and arranges
the sequence of statements in order to speed up the program execution without wasting resources
(CPU, memory).
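As a small illustration (a hypothetical C fragment, not taken from these notes), constant folding and dead-code elimination are two such optimizations:

/* Before optimization (hypothetical input): */
int scaled(int r) {
    int factor = 4 * 3;      /* constant expression                */
    int unused = r * r * r;  /* dead code: result is never used    */
    return factor * r;
}

/* What an optimizing compiler may effectively produce: */
int scaled_optimized(int r) {
    return 12 * r;           /* 4 * 3 folded to 12, dead code removed */
}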
Code Generation: In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code generator translates the
intermediate code into a sequence of (generally) relocatable machine code. This sequence of
machine instructions performs the same task as the intermediate code would.
Symbol Table: It is a data structure maintained throughout all the phases of a compiler. All the
identifiers’ names along with their types are stored here. The symbol table makes it easier for the
compiler to quickly search the identifier record and retrieve it. The symbol table is also used for
scope management.
LEXICAL ANALYSIS:
It is the first phase of the compiler. It reads the input source program from left to right, one character at a
time, and generates the sequence of tokens.
Each token is a single logically cohesive unit such as an identifier, keyword, operator, or
punctuation mark. The parser can then use these tokens to determine the syntax of the source program.
Because the lexical analyzer scans the program to recognize the tokens, it is also called a scanner.
Apart from token identification, the lexical analyzer also performs functions such as removing comments and white space and keeping track of line numbers for error reporting.
The lexical analyzer works in two phases: in the first phase it performs scanning, and in the
second phase it does lexical analysis, i.e., it generates the series of tokens.
TOKEN, PATTERNS & LEXEME: Let us learn some terms that are frequently
used when we talk about the activity of lexical analysis.
Tokens: A token describes the class or category of the input string. For example, identifiers, keywords,
and constants are tokens.
Patterns: A pattern is a rule describing the set of lexemes that can represent a particular token; for example, the pattern for an identifier is "a letter followed by letters or digits".
Lexemes: A lexeme is a sequence of characters in the source program that matches the pattern of a
token, for example int, i, num, ans.
In the statement if (a < b), the strings "if", "(", "a", "<", "b" and ")" are all lexemes.
When we want to compile a given source program we submit it to the compiler. The
compiler scans the program and produces the sequence of tokens; because of this scanning activity the lexical analyzer is also
called a scanner.
LEXEME    TOKEN
int       keyword
max       identifier
(         operator
int       keyword
a         identifier
,         operator
b         identifier
{         operator
Lexical errors:
These types of errors can be detected during the lexical analysis phase. Typical lexical-phase errors are:
1. Appearance of an illegal character, for example a stray $ at the end of a statement.
2. Exceeding the allowed length of an identifier.
INPUT BUFFERING:
The lexical analyzer scans the input string from left to right, one character at a time. It uses two
pointers, begin_ptr (bp) and forward_ptr (fp), to keep track of the portion of the input scanned. Initially
both pointers point to the first character of the input string, as shown below.

bp, fp
 |
 i n t   i , j ; i = i + 1 ; j = j + 1 ;
(Initial configuration: bp and fp both point to the 'i' of "int")

The forward_ptr moves ahead to search for the end of the lexeme. As soon as a blank space is
encountered, it indicates the end of the lexeme. In the above example, as soon as forward_ptr (fp) encounters
a blank space, the lexeme "int" is identified.
The fp is then moved ahead past the white space: when fp encounters white space it ignores it
and moves ahead. Then both begin_ptr (bp) and forward_ptr (fp) are set at the next token, i.
The input characters are read from secondary storage, but reading one character at a time from secondary
storage is costly. Hence a buffering technique is used.
A block of data is first read into a buffer and then scanned by the lexical analyzer. There are two
methods used in this context:
One-buffer scheme: In this scheme only one buffer is used to store the input string. The problem with
this scheme is that if a lexeme is very long, it crosses the buffer boundary; to scan the rest of the
lexeme the buffer has to be refilled, which overwrites the first part of the lexeme.

bp                      fp
 |                       |
 i n t   i = i + 1          (single buffer)
Two-buffer scheme: To overcome the problems of the one-buffer scheme, in this method two buffers
are used to store the input string.
The first buffer and the second buffer are scanned alternately; when the end of the current buffer is
reached, the other buffer is filled.
The only problem with this method is that if the length of the lexeme is larger than the length of a
buffer, then the input cannot be scanned completely.
Initially both fp and bp point to the first character of the first buffer; then fp moves
towards the right in search of the end of the lexeme.
As soon as a blank character is recognized, the string between bp and fp is identified as the
corresponding token. To identify the boundary of the first buffer, an "end of buffer" (eof) character
is placed at the end of the first buffer.
Similarly, the end of the second buffer is recognized by the end-of-buffer mark present at the end of
the second buffer. When fp encounters the first eof, the end of the first buffer is recognized, and
filling of the second buffer is started.

bp
 |
Buffer 1:  i n t   i = i + 1   eof
Buffer 2:  ; j = j + 1 ;   eof
                           |
                           fp

In the same way, when the second eof is reached, it indicates the end of the second buffer.
The two buffers are filled alternately until the end of the input program is reached and the stream of tokens is
identified. The eof character introduced at the end of each buffer is called a sentinel.
if (fp == eof(buff1))          /* sentinel at the end of buffer 1 */
{
    fp++;                      /* refill buffer 2 */
}
else if (fp == eof(buff2))     /* sentinel at the end of buffer 2 */
{
    fp++;                      /* refill buffer 1 */
}
else if (fp == eof(input))     /* real end of the input */
    return;
else
    fp++;                      /* ordinary character: advance */
Regular expressions: Regular expressions are mathematical symbolisms that describe the
set of strings of a specific language. They provide a convenient and useful notation for representing
tokens.
Here are the rules that define regular expressions over the input alphabet ∑:
1. ε is a regular expression denoting the language {ε}.
2. For each symbol a in ∑, a is a regular expression denoting the language {a}.
3. If r and s are regular expressions, then (r)/(s), (r).(s) and (r)* are also regular expressions (these operations are described further below).
A language denoted by regular expressions is said to be a regular set (or) a regular language
Problems:
1. Write a regular expression for the language containing all strings of length two over ∑ = {0, 1}.
Sol: R.E. = (0+1)(0+1)
2. Write a regular expression for the language containing all strings that end with "abb" over ∑ = {a, b}.
Sol: R.E. = (a+b)*abb
Recognizing of tokens: For a programming language there are various types of tokens such as
identifiers, keywords, constants, operators and so on. A token is usually represented by a pair:
token type and token value.
The token type tells us the category of the token, and the token value gives information regarding the
token. The token value is also called the token attribute. During the lexical analysis process the
symbol table is maintained.
The token value can be a pointer into the symbol table in the case of identifiers and constants. The lexical
analyzer reads the input program and generates a symbol table for tokens.
Our lexical analyzer will generate the following token stream. Consider the code:
if (a < 10)
    i = i + 2;
else
    i = i - 2;
Token stream (pairs of token type and token value):
1, (8,1), (5,100), (7,1), (6,105), (8,2), (5,107), (9,1), (6,10), 2, (5,107), 10, (5,107), (9,2), (6,110)
Symbol table (fragment):
LOCATION   TYPE         VALUE
105        constant     10
107        identifier   i
110        constant     2
Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
1. (r)/(s) is a regular expression denoting L(r) ∪ L(s), which represents the union
operation.
2. (r).(s) is a regular expression denoting L(r).L(s), which represents the concatenation operation.
3. (r)* is a regular expression denoting (L(r))*, which represents the Kleene closure.
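For example (a worked illustration, not part of the original notes): if r = a and s = b, then r/s denotes the language {a, b}, (r).(s) denotes {ab}, and (a)* denotes {ε, a, aa, aaa, ...}.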
AXIOM                    DESCRIPTION
r/s = s/r                / is commutative
r/(s/t) = (r/s)/t        / is associative
εr = r,  rε = r          ε is the identity for concatenation
r** = r*                 * is idempotent
RECOGNIZING TOKENS:
For a programming language there are various types of tokens such as identifiers, keywords,
constants, operators and so on. A token is usually represented by a pair: token type and token
value.
The token type tells us the category of the token, and the token value gives us the information regarding the
token.
Consider the following grammar fragment for conditional statements and relational expressions:
S → iEtS | iEtSeS | ε
E → T relop T | T
T → id | num
Here the terminals are i, t, e, relop, id and num. They stand for the sets of strings given by the following
regular definitions:
i      → if
t      → then
e      → else
relop  → = | < | > | <= | >= | <>
id     → letter (letter | digit)*
num    → digit+ (. digit+)?
letter → [A-Z a-z]
digit  → [0-9]
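A hand-written recognizer following these regular definitions could be sketched in C as below (a minimal illustrative sketch; the function next_token and the token codes are assumptions, not from these notes):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative token codes (assumed, not from these notes). */
enum { TOK_IF, TOK_THEN, TOK_ELSE, TOK_RELOP, TOK_ID, TOK_NUM, TOK_ERROR };

/* Recognize one token starting at *p, following the regular definitions:
   relop = = < > <= >= <>,  id = letter(letter|digit)*,  num = digit+(.digit+)? */
int next_token(const char **p) {
    const char *s = *p;
    while (*s == ' ' || *s == '\t' || *s == '\n') s++;   /* skip white space */
    if (*s == '\0') { *p = s; return TOK_ERROR; }        /* end of input */

    if (*s == '<' || *s == '>' || *s == '=') {           /* relop */
        s++;
        if ((s[-1] == '<' && (*s == '=' || *s == '>')) ||
            (s[-1] == '>' && *s == '='))
            s++;
        *p = s;
        return TOK_RELOP;
    }
    if (isalpha((unsigned char)*s)) {                    /* id = letter (letter|digit)* */
        const char *start = s;
        while (isalnum((unsigned char)*s)) s++;
        *p = s;
        if (s - start == 2 && strncmp(start, "if", 2) == 0)   return TOK_IF;
        if (s - start == 4 && strncmp(start, "then", 4) == 0) return TOK_THEN;
        if (s - start == 4 && strncmp(start, "else", 4) == 0) return TOK_ELSE;
        return TOK_ID;
    }
    if (isdigit((unsigned char)*s)) {                    /* num = digit+ (. digit+)? */
        while (isdigit((unsigned char)*s)) s++;
        if (*s == '.' && isdigit((unsigned char)s[1])) {
            s++;
            while (isdigit((unsigned char)*s)) s++;
        }
        *p = s;
        return TOK_NUM;
    }
    *p = s + 1;                                          /* skip an unknown character */
    return TOK_ERROR;
}

int main(void) {
    const char *src = "if x1 <= 10 then y else 2.5";
    while (*src)
        printf("token code %d\n", next_token(&src));
    return 0;
}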
These are the patterns for the tokens of the expression grammar given above.
More generally, a grammar G is defined by the four-tuple (V, T, P, S), where
V = set of variables (non-terminals)
T = set of terminals
P = set of productions
S = start symbol
Ex: Let A → aB and B → b | ε.
Then the grammar G is defined as
V = {A, B}
T = {a, b}
P = {A → aB, B → b | ε}
S = {A}
The language generated by such a grammar is called a regular language, and it can be represented by a DFA;
every regular grammar can be represented using a finite automaton.
Subsequence of a string: Any string formed by removing zero or more symbols, not necessarily
contiguous, is called a subsequence of the string; for example, "cmplr" is a subsequence of "compiler".
In the token-stream example above, the scanner scans the input string, recognizes "if" as a keyword and returns
token type 1, since in the given encoding code 1 indicates the keyword "if"; hence 1 appears at the
beginning of the token stream.
Next is the pair (8, 1), where 8 indicates a parenthesis and 1 indicates the opening parenthesis
"(". Then the scanner reads the input 'a', recognizes it as an identifier, and searches the symbol table to check
whether an entry for it is already present. If not, it inserts the information about this identifier into the symbol
table and returns the location of the new entry.
If the same identifier or variable is already present in the symbol table, the lexical analyzer does not
insert it into the table again; instead it returns the location where it is present.
1) Strings: A string is a finite sequence of alphabet symbols (letters). Strings are
synonymously called words.
The length of a string s is denoted by |s|.
The empty string is denoted by ε.
The empty set of strings is denoted by φ.
Prefix of a string: A string obtained by removing zero or more trailing symbols. For example, for the string
"Hindustan" a prefix could be "Hindu".
Suffix of a string: A string obtained by removing zero or more leading symbols. For example, for the
string "Hindustan" a suffix could be "stan".
Substring: A string obtained by removing a prefix and a suffix of a given string is called a
substring. For example, for the string "Hindustan", the string "indu" is a substring.
3) Comments: The regular expression for a comment statement can be written as
r.e. = // (letter + digit + whitespace)*                     (single-line comment)
r.e. = /* (letter + digit + whitespace + newline)* */        (multi-line comment)
Lexical analysis is the process of recognizing tokens from the input source program. To recognize
tokens, the lexical analyzer performs the following steps.
Step 1: The tokens of the source language are identified and specified.
Step 2: A token is read from the input buffer and a regular expression is built for the corresponding
token.
Step 3: From these regular expressions a finite automaton is built. The finite automaton is usually in
non-deterministic form; that means a non-deterministic finite automaton (NFA) is built.
Step 4: For each state of the NFA a function is designed, and each input along the transition edges
corresponds to the input parameters of these functions.
Step 5: The set of such functions ultimately makes up the lexical analyzer program.
Finite automata are typically represented using transition diagrams. A transition diagram can
be defined as a collection of states and labeled edges (transitions) between them.
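As a minimal sketch of Steps 4 and 5 (the state and function names are illustrative, not from these notes), the transition diagram for id = letter (letter | digit)* can be implemented as one function per state:

#include <ctype.h>
#include <stdio.h>

/* Transition diagram for id = letter (letter | digit)* as per-state functions.
   state0: start state; expects a letter.
   state1: inside the identifier; loops on letters/digits, accepts at end of input. */

static int state1(const char *s) {
    while (isalnum((unsigned char)*s))   /* loop on letter | digit */
        s++;
    return (*s == '\0');                 /* accept if the whole input was consumed */
}

static int state0(const char *s) {
    if (isalpha((unsigned char)*s))      /* the first symbol must be a letter */
        return state1(s + 1);
    return 0;                            /* reject */
}

int main(void) {
    printf("%d\n", state0("count1"));    /* 1: valid identifier */
    printf("%d\n", state0("1count"));    /* 0: starts with a digit */
    return 0;
}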
Reserved words are the special words used in the programming language that are associated
with some meaning; for example, if, else, while, for, break and so on are reserved words
used in the C language. The lexical analyzer should identify the reserved words correctly.
An identifier is a name that refers to some stored value. It is a collection of letters or
alphanumeric characters; the first character of an identifier must always be a letter.
Lexical analysis is the process of recognizing tokens from the source program, and the compiler does
this job by constructing a recognizer that looks for the lexemes stored in the input buffer. This
recognizer works on the rule: "if more than one pattern matches, the recognizer has to choose the
longest lexeme matched." For example, <= is returned as a single relational-operator token rather than as < followed by =.
LEX: For efficient design of a compiler, various tools have been built for constructing lexical
analyzers using a special-purpose notation called regular expressions.
The regular expressions are used in recognizing the tokens. We will now discuss a special
language that specifies the tokens using regular expressions. A tool called LEX accepts this
specification.
LEX scans the source program in order to get the stream of tokens, and these are related together
so that various programming constructs such as expressions, blocks of statements, procedures, and
control structures can be realized.
During the parsing of the program, rules are defined to establish the relationship between the
tokens. These rules are called a grammar.
YACC (Yet Another Compiler-Compiler) is another automated tool, which is used to specify the grammar
for realizing the source-program constructs.
YACC takes the description of a grammar in a specification file and produces a C routine
called a parser. Thus LEX and YACC are two important utilities that generate the lexical analyzer
and the syntax analyzer.
Basically, LEX is a Unix utility which generates the lexical analyzer.
A LEX-generated lexer is very fast in finding the tokens as compared to a hand-written lexical
analyzer in C.
The LEX specification file is created using the extension .l; for example, the specification file
can be x.l. From this specification LEX produces a file lex.yy.c, which is a C program that is the actual lexical analyzer.
The LEX specification file stores the regular expressions for the tokens, and the lex.yy.c file
consists of a tabular representation of the transition diagrams constructed from these regular
expressions.
[Figure: x.l → LEX → lex.yy.c;  lex.yy.c → C compiler → a.out;  input stream → a.out → sequence of tokens]
The lexemes can be recognized with the help of this tabular representation of the
transition diagrams.
The actions associated with the regular expressions in lex.l are pieces of C code and are carried over
directly into lex.yy.c.
Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the
lexical analyzer that transforms an input stream into a sequence of tokens.
A LEX program consists of three sections:
1. Declaration section
2. Rule section
3. Procedure section (auxiliary procedure section)
The general layout of a LEX specification is:
%{
Declaration section
%}
%%
Rule section
%%
Auxiliary procedure section
Declaration section:
Declaration of variables is done in the declaration section; regular definitions can also be written
here.
In general, the definition section is used to define macros and to include important
header files written in C.
Rule section: The rule section consists of regular expressions with associated actions. The
translation rules take the format:
R1 {action1}
R2 {action2}
...
Rn {actionn}
Here Ri indicates a regular expression and actioni describes the action the lexical analyzer needs to
take when the corresponding regular expression is matched.
The rule section is the most important section: here the patterns are associated with C statements.
The patterns are nothing but regular expressions.
Procedure section: In this section the required procedures are defined; these procedures may also be required
by the actions in the rule section.
It is also called the C-code section: it contains C statements and functions that are called
by the rules in the rule section.
Ex:
%{
/* declaration section (empty here) */
%}
%%
"RAMA"   |
"SITA"   |
"geeta"  { printf("\n noun"); }
"sings"  |
"dances" |
"eat"    { printf("\n verb"); }
%%
main()
{
    yylex();
}
int yywrap()
{
    return 1;
}
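To build and run such a LEX specification on a typical Unix system (assuming lex or flex is installed; the file name words.l is illustrative), the usual steps are:
lex words.l          (generates lex.yy.c)
cc lex.yy.c -ll      (compiles it and links the lex library; with flex, use -lfl)
./a.out              (runs the lexical analyzer on standard input)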
During the process of compilation it is always efficient to have a symbol table, so
that while the lexer is running we can add new words without modifying or recompiling the
LEX program. Two important activities are associated with the symbol
table: insert_word() and search_word().
The insert_word() routine inserts a newly encountered word into the symbol table.
The search_word() routine performs the look-up activity.
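A minimal sketch of such a symbol table in C is shown below (a simple linear-search table; the array sizes and details are assumptions for illustration, not the notes' actual implementation):

#include <stdio.h>
#include <string.h>

#define MAX_WORDS 100
#define WORD_LEN  32

static char table[MAX_WORDS][WORD_LEN];  /* stored words             */
static int  word_count = 0;              /* number of entries so far */

/* search_word: return the index of the word in the table, or -1 if absent. */
int search_word(const char *word) {
    for (int i = 0; i < word_count; i++)
        if (strcmp(table[i], word) == 0)
            return i;
    return -1;
}

/* insert_word: insert the word if it is new; return its index either way. */
int insert_word(const char *word) {
    int idx = search_word(word);
    if (idx >= 0)
        return idx;                       /* already present: just look it up */
    if (word_count >= MAX_WORDS)
        return -1;                        /* table full */
    strncpy(table[word_count], word, WORD_LEN - 1);
    return word_count++;
}

int main(void) {
    printf("%d\n", insert_word("count"));   /* 0: new entry     */
    printf("%d\n", insert_word("count"));   /* 0: found again   */
    printf("%d\n", search_word("total"));   /* -1: not in table */
    return 0;
}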
Command-line parameters are the parameters that appear on the command prompt. The
command-line interface allows the user to interact with the computer by typing
commands. In C we can receive these parameters in the main function in the form of an array of
character strings.
For example, for the command  cp abc.txt pqr.txt
argv[0] = "cp"
argv[1] = "abc.txt"
argv[2] = "pqr.txt"
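For illustration, a small C program (a sketch, not from these notes) that prints the command-line parameters it receives:

#include <stdio.h>

int main(int argc, char *argv[]) {
    /* argv[0] is the program name; argv[1]..argv[argc-1] are the parameters. */
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return 0;
}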
Impacts on Compilers
Since the design of programming languages and compilers are intimately related, the
advances in programming languages placed new demands on compiler writers. They had to
devise algorithms and representations to translate and support the new language features. Since
the 1940's, computer architecture has evolved as well. Not only did the compiler writers have to
track new language features, they also had to devise translation algorithms that would take
maximal advantage of the new hardware capabilities.
Parallelism
All modern microprocessors exploit instruction-level parallelism. However, this parallelism can
be hidden from the programmer. Programs are written as if all instructions were executed in
sequence; the hardware dynamically checks for dependencies in the sequential instruction stream
and issues them in parallel when possible.
Memory Hierarchies
A memory hierarchy consists of several levels of storage with different speeds and sizes, with
the level closest to the processor being the fastest but smallest. The average memory-access
time of a program is reduced if most of its accesses are satisfied by the faster levels of the
hierarchy. Both parallelism and the existence of a memory hierarchy improve the potential
performance of a machine, but they must be harnessed effectively by the compiler to deliver real
performance on an application.