CD - 1
Unit – I Syllabus
Introduction: Language processors, The Structure of a Compiler, the science of building a
compiler.
Lexical Analysis: The Role of the lexical analyzer, Input buffering, Specification of tokens,
Recognition of tokens, the lexical analyzer generator Lex, Design of a Lexical Analyzer
generator.
INTRODUCTION
LANGUAGE PROCESSOR:
A language processor is a special type of computer software that translates source code into
machine code. The different types of language processors are:
Compiler
Assembler
Interpreter
Compiler
A compiler is a program that can read a program in one language (the source language) and
translate it into an equivalent program in another language (the target language). An important
role of the compiler is to report any errors in the source program that it detects during the
translation process.
If the target program is an executable machine-language program, it can then be called by the
user to process inputs and produce outputs.
Simply put, the compiler translates a high-level language into assembly language.
Interpreter:
An interpreter is another common kind of language processor. Instead of producing a
target program as a translation, an interpreter appears to directly execute the operations specified
in the source program on inputs supplied by the user, as shown in Fig.
An interpreter, however, can usually give better error diagnostics than a compiler, because
it executes the source program statement by statement.
The modified source program is then fed to a compiler. The compiler may produce an
assembly-language program as its output, because assembly language is easier to produce as
output and is easier to debug. The assembly language is then processed by a program called an
assembler that produces relocatable machine code as its output.
A compiler maps a source program into a target program, and this mapping has two parts:
analysis and synthesis. If the analysis part detects that the source program is either syntactically
ill formed or semantically unsound, then it must provide informative messages, so the user can
take corrective action. The analysis part also collects information about the source program and
stores it in a data structure called a symbol table, which is passed along with the intermediate
representation to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation and
the information in the symbol table. The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
Lexical Analysis:
The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the
stream of characters making up the source program and groups the characters into meaningful
sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the
form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes and mapped into
the following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token (id, 1), where id is an abstract
symbol standing for identifier and 1 points to the symbol-table entry for position. The
symbol-table entry for an identifier holds information about the identifier, such as its name and
type.
2. The assignment symbol = is a lexeme that is mapped into the token (=). Since this token needs
no attribute-value, we have omitted the second component. We could have used any abstract
symbol such as assign for the token-name, but for notational convenience we have chosen to use
the lexeme itself as the name of the abstract symbol.
3. initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table
entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table entry
for rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
Blanks separating the lexemes would be discarded by the lexical analyzer. Figure 1.7 shows the
representation of the assignment statement (1.1) after lexical analysis as the sequence of tokens
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60) ………………..(1.2)
In this representation, the token names =, +, and * are abstract symbols for the assignment,
addition, and multiplication operators, respectively.
Syntax Analysis:
The second phase of the compiler is syntax analysis or parsing. The parser uses the first
components of the tokens produced by the lexical analyzer to create a tree-like intermediate
representation that depicts the grammatical structure of the token stream. A typical representation
is a syntax tree in which each interior node represents an operation and the children of the node
represent the arguments of the operation.
Semantic Analysis:
The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition. An important
part of semantic analysis is type checking, where the compiler checks that each operator has
matching operands. For example, many programming language definitions require an array index
to be an integer; the compiler must report an error if a floating-point number is used to index an
array.
Code Optimization:
The machine-independent code-optimization phase attempts to improve the intermediate
code so that better target code will result. Usually better means faster, but other objectives may be
desired, such as shorter code, or target code that consumes less power. For the running
assignment, the optimized intermediate code is
t1 = id3 * 60.0
id1 = id2 + t1
Code Generation:
The code generator takes as input an intermediate representation of the source program
and maps it into the target language. If the target language is machine code, registers or memory
locations are selected for each of the variables used by the program. Then, the intermediate
instructions are translated into sequences of machine instructions that perform the same task.
For example, using registers R1 and R2, the intermediate code in (1.4) might get translated into
the machine code
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
The first operand of each instruction specifies a destination. The F in each instruction
tells us that it deals with floating-point numbers. The code loads the contents of address id3 into
register R2, and then multiplies it with the floating-point constant 60.0. The # signifies that 60.0
is to be treated as an immediate constant. The third instruction moves id2 into register R1 and
the fourth adds to it the value previously computed in register R2. Finally, the value in register
R1 is stored into the address of id1, so the code correctly implements the assignment statement
(1.1).
Symbol-Table Management:
The symbol table is a data structure containing a record for each variable name, with
fields for the attributes of the name. The data structure should be designed to allow the compiler
to find the record for each name quickly and to store or retrieve data from that record quickly.
Compiler-Construction Tools:
Some commonly used compiler-construction tools include
1. Parser generators that automatically produce syntax analyzers from a grammatical description
of a programming language.
2. Scanner generators that produce lexical analyzers from a regular-expression description of the
tokens of a language.
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree
and generating intermediate code.
4. Code-generation generators that produce a code generator from a collection of rules for
translating each operation of the intermediate language into the machine language for a target
machine.
5. Data-flow analysis engines that facilitate the gathering of information about how values are
transmitted from one part of a program to each other part. Data-flow analysis is a key part of code
optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing
various phases of a compiler.
A compiler must accept all source programs that conform to the specification of the
language; the set of source programs is infinite and any program can be very large, consisting of
possibly millions of lines of code. Any transformation performed by the compiler while
translating a source program must preserve the meaning of the program being compiled.
Compiler writers thus have influence over not just the compilers they create, but all the programs
that their compilers compile.
1. Modeling in Compiler Design and Implementation
The study of compilers is mainly a study of how we design the right mathematical models and
choose the right algorithms, while balancing the need for generality and power against simplicity
and efficiency.
• The compilation time must be kept reasonable.
• The engineering effort required must be manageable.
Applications of Compiler Technology:
1. Implementation of High-Level Programming Languages.
2. Optimizations for Computer Architectures
3. Design of New Computer Architectures
4. Program Translations
5. Software Productivity Tools
LEXICAL ANALYSIS
As the first phase of a compiler, the main task of the lexical analyzer is to read the input
characters of the source program, group them into lexemes, and produce as output a sequence of
tokens for each lexeme in the source program. The stream of tokens is sent to the parser for
syntax analysis. When the lexical analyzer discovers a lexeme constituting an identifier, it needs
to enter that lexeme into the symbol table.
Since the lexical analyzer is the part of the compiler that reads the source text, it may
perform certain other tasks besides identification of lexemes. One such task is stripping out
comments and whitespace (blank, newline, tab, and perhaps other characters that are used to
separate tokens in the input). Another task is correlating error messages generated by the
compiler with the source program.
Lexical analysis is divided into two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such as
deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis proper is the more complex portion, where the scanner produces the sequence
of tokens as output.
Example:
Lexeme    Token    Pattern
float     float    float
key       id       a letter followed by any number of letters or digits
=         relop    < | <= | = | <> | > | >=
1.2       num      any numeric constant
;         ;        ;
Fig: Examples of tokens
Lexical Errors:
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a
source-code error. For instance, suppose the string fi is encountered for the first time in a C
program in the context:
fi ( a == f ( x ) ) ...
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared
function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the
token id to the parser and let some other phase of the compiler — probably the parser in this case
— handle an error due to transposition of the letters.
Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
Example: The token names and associated attribute values for the Fortran statement
E = M * C ** 2
are written below as a sequence of pairs:
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
INPUT BUFFERING
Before recognizing lexemes, the lexical analyzer must read the source program efficiently. This
task is made difficult by the fact that we often have to look one or more characters beyond the
next lexeme before we can be sure we have the right lexeme. For instance, we cannot be sure
we've seen the end of an identifier until we see a character that is not a letter or digit, and
therefore is not part of the lexeme for id. In C, single-character operators like -, =, or < could also
be the beginning of a two-character operator like ->, ==, or <=.
There are two methods:
1. A two-buffer scheme that handles large lookaheads safely.
2. Sentinels that save time checking for the ends of buffers.
Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using
one system read command we can read N characters into a buffer, rather than using one system
call per character. If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file and is different from any possible character of
the source program.
Once the next lexeme is determined, forward is set to the character at its right end. Then, after
the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set
to the character immediately after the lexeme just found. In the figure above, we see forward has
passed the end of the next lexeme, ** (the Fortran exponentiation operator), and must be retracted
one position to its left.
Algorithm (advancing forward in the two-buffer scheme):
if forward is at the end of the first buffer then begin
    reload the second buffer;
    forward = beginning of the second buffer;
end
else if forward is at the end of the second buffer then begin
    reload the first buffer;
    forward = beginning of the first buffer;
end
else forward = forward + 1;
Sentinels:
If we use the previous scheme, we must check, each time we advance forward, that we have
not moved off one of the buffers; if we do, then we must also reload the other buffer. Thus, for
each character read, we make two tests: one for the end of the buffer, and one to determine what
character is read (the latter may be a multiway branch). We can combine the buffer-end test with
the test for the current character if we extend each buffer to hold a sentinel character at the end.
The sentinel is a special character that cannot be part of the source program, and a natural choice
is the character eof.
Algorithm (lookahead code with sentinels):
switch ( *forward++ ) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}
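As a concrete illustration, here is a minimal C sketch of the two-buffer scheme with sentinels. It is a sketch under simplifying assumptions, not the book's code: input is simulated from a string instead of disk reads, '\0' stands in for the eof sentinel (so the source text itself must not contain '\0'), and N is shrunk to 4 so the reloads actually happen:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define N 4                 /* buffer half size (4096 in practice) */
#define EOF_CH '\0'         /* stand-in for the eof sentinel */

static char buf[2][N + 1];  /* two buffer halves, each with a sentinel slot */
static const char *src;     /* stand-in for the input file */
static int cur;             /* which half the forward pointer is in */
static int pos;             /* forward's offset within that half */

/* Read up to N characters into one half and plant the sentinel after them. */
static void reload(int half) {
    size_t n = strlen(src);
    if (n > N) n = N;
    memcpy(buf[half], src, n);
    buf[half][n] = EOF_CH;  /* sentinel: end of buffer or end of input */
    src += n;
}

void init(const char *input) {
    src = input;
    cur = 0;
    pos = 0;
    reload(0);
}

/* Return the next input character. The sentinel folds the two tests
 * ("end of buffer?" and "which character?") into one comparison. */
char advance(void) {
    char c = buf[cur][pos++];
    if (c == EOF_CH) {
        if (pos - 1 == N) {         /* sentinel at end of a half: reload */
            cur = 1 - cur;
            pos = 0;
            reload(cur);
            return advance();
        }
        return EOF_CH;              /* eof within a half: real end of input */
    }
    return c;
}
```

Reading characters with advance() until it returns EOF_CH reproduces the input stream while issuing one reload per N characters, exactly as the case-eof code above describes.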
SPECIFICATION OF TOKENS:
Regular expressions are an important notation for specifying lexeme patterns. While they
cannot express all possible patterns, they are very effective in specifying those types of patterns
that we actually need for tokens. In this section we shall study the formal notation for regular
expressions.
Strings and Languages
An alphabet is any finite set of symbols. Typical examples of symbols are letters, digits, and
punctuation. The set {0, 1} is the binary alphabet. ASCII is an important example of an alphabet;
it is used in many software systems.
Operations on Languages:
Let L be the set of letters and D the set of digits. Examples of languages formed from L and D:
1. L ∪ D is the set of letters and digits: all strings of length one that are either a letter or a digit.
2. LD is the set of all strings of length two, each consisting of a letter followed by a digit.
3. L4 is the set of all 4-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.
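Since L and D are finite, the sizes of several of these languages can be computed directly. A small C sketch (the function names are illustrative, not from the text) of how union and concatenation compose cardinalities for finite languages over disjoint symbol sets:

```c
#include <assert.h>

/* |L| = 52 (upper- and lower-case letters), |D| = 10 (digits).
 * For finite languages over disjoint symbol sets:
 *   disjoint union:  |A u B| = |A| + |B|
 *   concatenation:   |AB|    = |A| * |B| (each string factors uniquely) */
long card_union(long a, long b)  { return a + b; }
long card_concat(long a, long b) { return a * b; }
```

With these, |L ∪ D| = 62, |LD| = 520, and |L4| = 52 * 52 * 52 * 52 = 7,311,616 strings. L* and D+ are infinite, so no such count exists for them.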
RECOGNITION OF TOKENS:
It means how to take the patterns for all the needed tokens and build a piece of code that
examines the input string and finds a prefix that is a lexeme matching one of the patterns. Our
discussion will make use of the following running example.
For relop, we use the comparison operators of languages like Pascal or SQL, where = is "equals"
and <> is "not equals," because it presents an interesting structure of lexemes.
The terminals of the grammar, which are if, then, else, relop, id, and number, are the names of
tokens as far as the lexical analyzer is concerned. The patterns for these tokens are described
using regular definitions, as shown below.
digit  -> [0-9]
digits -> digit+
number -> digits (. digits)? ( E [+-]? digits )?
letter -> [A-Za-z]
id     -> letter ( letter | digit )*
if     -> if
then   -> then
else   -> else
relop  -> < | > | <= | >= | = | <>
Figure: Tokens, their patterns, and attribute values
Transition Diagrams:
Transition diagrams have a collection of nodes or circles, called states. Each state
represents a condition that could occur during the process of scanning the input looking for a
lexeme that matches one of several patterns. We may think of a state as summarizing all we need
to know about what characters we have seen between the lexemeBegin pointer and the forward
pointer.
Edges are directed from one state of the transition diagram to another. Each edge is
labeled by a symbol or set of symbols. If we are in some state s, and the next input symbol is a,
we look for an edge out of state s labeled by a (and perhaps by other symbols, as well). If we find
such an edge, we advance the forward pointer and enter the state of the transition diagram to
which that edge leads. We shall assume that all our transition diagrams are deterministic,
meaning that there is never more than one edge out of a given state with a given symbol among
its labels.
Example: The figure below is a transition diagram that recognizes the lexemes matching the token
relop. We begin in state 0, the start state. If we see < as the first input symbol, then among the
lexemes that match the pattern for relop we can only be looking at <, <>, or <=. We therefore go
to state 1, and look at the next character. If it is =, then we recognize lexeme <=, enter state 2, and
return the token relop with attribute LE, the symbolic constant representing this particular
comparison operator. If in state 1 the next character is >, then instead we have lexeme <>, and
enter state 3 to return an indication that the not-equals operator has been found. On any other
character, the lexeme is <, and we enter state 4 to return that information. Note, however, that
state 4 has a * to indicate that we must retract the input one position.
Figure: Transition diagram for relop
The transition diagram for token number is shown in Fig. 3.16, and is so far the most complex
diagram we have seen. Beginning in state 12, if we see a digit, we go to state 13. In that state, we
can read any number of additional digits. However, if we see anything but a digit or a dot, we
have seen a number in the form of an integer; 123 is an example. That case is handled by entering
state 20, where we return token number and a pointer to a table of constants where the
found lexeme is entered. These mechanics are not shown on the diagram but are analogous to the
way we handled identifiers.
The final transition diagram, shown in Fig. 3.17, is for whitespace. In that diagram, we look for
one or more "whitespace" characters, represented by delim in that diagram — typically these
characters would be blank, tab, newline, and perhaps other characters that are not considered by
the language design to be part of any token.
Use of Lex:
An input file, which we call lex.l, is written in the Lex language and describes the lexical
analyzer to be generated. The Lex compiler transforms lex.l to a C program, in a file that is
always named lex.yy.c. The latter file is compiled by the C compiler into a file called a.out, as
always. The C-compiler output is a working lexical analyzer that can take a stream of input
characters and produce a stream of tokens.
Structure of Lex Programs:
A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions
Pattern1   { Action1 }
Pattern2   { Action2 }
……….
Pattern-n  { Action-n }
Each pattern is a regular expression, which may use the regular definitions of the
declarations section. The actions are fragments of code, typically written in C, although many
variants of Lex using other languages have been created. The third section holds whatever
additional functions are used in the actions. Alternatively, these functions can be compiled
separately and loaded with the lexical analyzer.
In the declarations section we see a pair of special brackets, %{ and %}. Anything within
these brackets is copied directly to the file lex.yy.c, and is not treated as a regular definition. It
is common to place there the definitions of manifest constants, using C #define statements to
associate unique integer codes with each of the manifest constants.
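Putting the three sections together, a Lex source file covering a fragment of the running example might look like the sketch below. This follows the shape of the standard Dragon Book Lex example; the integer codes are illustrative, and installID and installNum are assumed auxiliary functions that enter the current lexeme (pointed to by yytext, of length yyleng) into the appropriate table:

```lex
%{
    /* manifest constants: copied verbatim into lex.yy.c */
    #define IF     256
    #define ID     257
    #define NUMBER 258
    #define RELOP  259
    #define LT     260
    #define LE     261
%}
/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}      { /* no action and no return: strip whitespace */ }
if        { return IF; }
{id}      { yylval = installID();  return ID; }
{number}  { yylval = installNum(); return NUMBER; }
"<"       { yylval = LT; return RELOP; }
"<="      { yylval = LE; return RELOP; }
%%
int installID()  { /* enter yytext into the symbol table; return its index */ }
int installNum() { /* enter the lexeme into the table of constants */ }
```

Running the Lex compiler on this file produces lex.yy.c, whose yylex() function returns one token code per call, with the attribute value left in yylval.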
Example: Suppose the DFA of Fig. 3.54 is given input abba. The sequence of states entered is
0137, 247, 58, 68, and at the final a there is no transition out of state 68. Thus, we consider the
sequence from the end, and in this case, 68 itself is an accepting state that reports pattern p2 =
abb.
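Once a DFA such as that of Fig. 3.54 has been built, simulating it is just one table lookup per input character. As an illustration (not the exact DFA of Fig. 3.54), here is a C sketch of a table-driven simulation of the classic textbook DFA accepting (a|b)*abb:

```c
#include <assert.h>

/* Transition table for the DFA accepting (a|b)*abb; state 3 is accepting.
 * move[s][0] is the transition on 'a', move[s][1] the transition on 'b'. */
static const int move[4][2] = {
    {1, 0},   /* state 0 */
    {1, 2},   /* state 1 */
    {1, 3},   /* state 2 */
    {1, 0},   /* state 3 (accepting) */
};

/* Run the DFA over the whole string s; return 1 iff it ends in state 3. */
int dfa_accepts(const char *s) {
    int state = 0;
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b')
            return 0;                    /* symbol not in the alphabet */
        state = move[state][*s == 'b'];
    }
    return state == 3;
}
```

A lexical analyzer would not run the DFA to the end of the input as this sketch does; it would remember the last accepting state passed and, on getting stuck (as on the final a of abba), retract to it and report the corresponding pattern.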