
PART1 - Compiler Lecture Notes


PHASES OF A COMPILER

1. Lexical Analysis: This is the first phase of the compiler, also known as the scanner,
because it scans the source code as a stream of characters and groups the characters into
meaningful sequences called lexemes. For each lexeme it produces a token of the form
<token_name, attribute_value>, where the token name is an abstract symbol used during
syntax analysis and the attribute value points to the symbol table entry for that token. In
short, it separates the characters of the source language into groups, called tokens, that
logically belong together.
• Lexical analysis is the first phase of the compiler; it is also termed scanning.
• The source program is scanned as a stream of characters, and the characters are grouped
into sequences called lexemes, from which tokens are produced as output.
• Token: a sequence of characters that represents a lexical unit matching a pattern, such as
a keyword, operator, or identifier.
• Lexeme: an instance of a token, i.e., the group of characters forming the token.
• Pattern: the rule describing the form that the lexemes of a token can take; it is the
structure that strings must match.
• Once a token is generated, the corresponding entry is made in the symbol table.
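As an illustration of scanning, here is a minimal lexer sketch. The token names, the patterns, and the sample input are illustrative assumptions for this sketch, not part of any particular compiler:

```python
import re

# Toy token specification: (token_name, pattern). Names and patterns here are
# illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),  # integer or floating-point literal
    ("ID",     r"[A-Za-z_]\w*"),   # identifier or keyword
    ("ASSIGN", r"="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),            # whitespace only separates lexemes
]

def tokenize(source):
    """Group the character stream into lexemes and emit <token_name, lexeme> pairs."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for match in re.finditer(pattern, source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
```

In a real compiler the second element of each pair would typically be a pointer into the symbol table rather than the raw lexeme.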

2. Syntax Analysis: This phase is also known as parsing, as it builds a parse tree from the
tokens produced by the lexical analyzer. It checks whether the expression formed by the
tokens follows the syntax of the language. Interior nodes of the tree represent operations,
and the children of a node represent the arguments of that operation.
• Syntax analysis is the second phase of the compiler, also called parsing.
• The parser converts the tokens produced by the lexical analyzer into a tree-like
representation called a parse tree.

• A parse tree describes the syntactic structure of the input.


• A syntax tree is a compressed representation of the parse tree in which the operators
appear as interior nodes and the operands of an operator are the children of the node for
that operator.
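A tiny recursive-descent parser can make the tree-building step concrete. The two-level expression grammar and the nested-tuple representation of tree nodes are illustrative assumptions:

```python
# Minimal recursive-descent parser for expressions such as "id2 + id3 * 60".
# It returns a syntax tree as nested tuples: operators become interior nodes
# and operands become leaves, so "*" binds tighter than "+".

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():                      # leaf: identifier or number
        return take()

    def term():                        # term -> factor { ('*'|'/') factor }
        node = factor()
        while peek() in ("*", "/"):
            node = (take(), node, factor())
        return node

    def expr():                        # expr -> term { ('+'|'-') term }
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node

    return expr()

print(parse(["id2", "+", "id3", "*", "60"]))
```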

3. Semantic Analysis: This phase uses the syntax tree and the information in the symbol
table to check the source program for semantic consistency with the language definition.
It gathers type information and saves it in either the syntax tree or the symbol table for
subsequent use during intermediate code generation. It also checks that identifiers are
declared before they are used and that the types of operands and expressions are consistent.
• Semantic analysis is the third phase of compiler.
• It checks for the semantic consistency.
• Type information is gathered and stored in symbol table or in syntax tree.
• Performs type checking.
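Type checking over such a syntax tree can be sketched as follows. The symbol-table contents and the rule that mixed int/float arithmetic inserts an inttofloat coercion are illustrative assumptions:

```python
# Toy semantic analysis: walk the syntax tree, look up identifier types in a
# symbol table, and insert an inttofloat coercion where an integer operand
# meets a floating-point one. The table below is an illustrative assumption.

SYMBOLS = {"id1": "float", "id2": "float", "id3": "float"}

def check(node):
    """Return (typed_tree, type) for a tuple-shaped syntax tree."""
    if isinstance(node, str):                       # leaf: identifier or literal
        if node in SYMBOLS:
            return node, SYMBOLS[node]
        return node, ("float" if "." in node else "int")
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if ltype != rtype:                              # mixed types: coerce the int side
        if ltype == "int":
            left, ltype = ("inttofloat", left), "float"
        else:
            right, rtype = ("inttofloat", right), "float"
    return (op, left, right), ltype

print(check(("+", "id2", ("*", "id3", "60"))))
```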

4. Intermediate Code Generation: After semantic analysis, the compiler generates
intermediate code from the source code. This code is written for an abstract machine that
lies between the high-level language and the machine language, and it is designed so that
it is easy to translate into the target machine code. A compiler may construct one or more
intermediate representations, which can take a variety of forms. This representation has
two important properties, listed below.

• Intermediate code generation produces intermediate representations for the source
program, which are of the following forms:
- Postfix notation
- Three-address code
- Syntax tree
The most commonly used form is three-address code; for example:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Properties of intermediate code
• It should be easy to produce.
• It should be easy to translate into target program.
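The step from syntax tree to three-address code can be sketched as follows; the tuple-shaped input tree and the t1, t2, ... temporary-naming scheme are illustrative assumptions:

```python
# Flatten a syntax tree into three-address code: each instruction has at most
# one operator on its right-hand side, using compiler-generated temporaries.

def gen_tac(tree):
    code, counter = [], [0]

    def new_temp():
        counter[0] += 1
        return f"t{counter[0]}"

    def walk(node):
        if isinstance(node, str):
            return node                               # a leaf is already an address
        if node[0] == "inttofloat":                   # unary coercion node
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        l, r = walk(left), walk(right)
        temp = new_temp()
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    return code, walk(tree)

code, result = gen_tac(("+", "id2", ("*", "id3", ("inttofloat", "60"))))
code.append(f"id1 = {result}")
print("\n".join(code))
```

This reproduces the four-instruction sequence shown above.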

5. Code Optimization: This phase attempts to improve the intermediate code so that better
(faster) target code will result. Other objectives may also be desired, such as shorter code
or code that consumes less power. A simple code generation algorithm followed by this phase
is a reasonable way to generate good target code. For example, the optimizer can fold the
conversion of the integer constant 60 into the floating-point constant 60.0, eliminating an
instruction. Optimization must follow three rules:

i) The output code must not change the meaning of the program.

ii) Optimization should increase the speed of the program while using fewer resources.

iii) The optimizer itself should be fast and should not delay the overall compilation process.

After optimization, the intermediate code becomes:

t1 = id3 * 60.0

id1 = id2 + t1

• Code optimization phase gets the intermediate code as input and produces optimized
intermediate code as output.
• It results in faster running machine code.
• It can be done by reducing the number of lines of code for a program.
• This phase reduces the redundant code and attempts to improve the intermediate code so
that faster-running machine code will result.
• During the code optimization, the result of the program is not affected.
To improve the generated code, optimization typically involves:
- Detection and removal of dead (unreachable) code.
- Calculation of constant expressions and terms at compile time.
- Collapsing of repeated (common) subexpressions into a temporary variable.
- Loop unrolling.
- Moving loop-invariant code outside the loop.
- Removal of unwanted temporary variables.
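Two of the optimizations listed above, constant folding and removal of unwanted temporaries, can be sketched on three-address code held as (dest, op, args) tuples; this instruction encoding is an illustrative assumption:

```python
# Sketch of constant folding plus copy elimination on three-address code.
# Each instruction is (dest, op, args); "=" denotes a plain copy.

def optimize(code):
    values, out = {}, []
    for dest, op, args in code:
        args = [values.get(a, a) for a in args]        # propagate known values
        if op == "inttofloat" and isinstance(args[0], int):
            values[dest] = float(args[0])              # fold the conversion
        elif op == "=" and out and out[-1][0] == args[0]:
            out[-1] = (dest, out[-1][1], out[-1][2])   # drop the extra temporary
        else:
            out.append((dest, op, args))
    return out

tac = [
    ("t1", "inttofloat", [60]),
    ("t2", "*", ["id3", "t1"]),
    ("t3", "+", ["id2", "t2"]),
    ("id1", "=", ["t3"]),
]
for dest, op, (a, b) in optimize(tac):
    print(f"{dest} = {a} {op} {b}")
```

Up to the naming of temporaries, this yields the two-instruction optimized form shown above.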
6. Code Generation: This is the final phase of the compiler. It takes the intermediate
representation of the source program as input and maps it into the target language,
generating object code or code in some lower-level language such as assembly. The generated
code must carry the exact meaning of the source code. If the target language is machine
code, registers or memory locations are selected for the variables, and the intermediate
instructions are translated into a sequence of machine instructions that perform the
same task.

• Code generation is the final phase of a compiler.
• It gets input from code optimization phase and produces the target code or object code as
result.
• Intermediate instructions are translated into a sequence of machine instructions that
perform the same task.
Code generation involves:
- Allocation of registers and memory.
- Generation of correct references.
- Generation of correct data types.
- Generation of missing code.
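The mapping from optimized three-address code to target instructions can be sketched as follows. The LD/ST/ADD/MUL opcode names and the naive one-register-per-operand allocation are illustrative assumptions in the style of textbook examples:

```python
# Translate three-address instructions into a toy load/operate/store sequence,
# allocating a fresh register per loaded operand (no reuse; purely a sketch).

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def gen_target(tac):
    asm, counter, temps = [], [0], {}   # temps: temporary name -> register

    def load(operand):
        counter[0] += 1
        reg = f"R{counter[0]}"
        asm.append(f"LD {reg}, {operand}")
        return reg

    for dest, op, (a, b) in tac:
        ra = temps.get(a) or load(a)    # reuse a register holding a temporary
        rb = temps.get(b) or load(b)
        asm.append(f"{OPCODES[op]} {ra}, {ra}, {rb}")
        if dest.startswith("t"):
            temps[dest] = ra            # temporaries stay in registers
        else:
            asm.append(f"ST {dest}, {ra}")  # named variables go back to memory
    return asm

print("\n".join(gen_target([("t2", "*", ("id3", 60.0)), ("id1", "+", ("id2", "t2"))])))
```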

DISADVANTAGES & ADVANTAGES OF COMPILERS

Advantages:

 Self-Contained and Efficient


One major advantage of programs that are compiled is that they are self-contained units
that are ready to be executed. Because they are already compiled into machine language
binaries, there is no second application or package that the user has to keep up-to-date.
If a program is compiled for Windows on an x86 architecture, the end user needs only a
Windows operating system running on an x86 architecture. Additionally, a precompiled
package can run faster than an interpreter compiling source code in real-time.

 Hardware Optimization
While being locked into a specific hardware package has its downsides, compiling a
program can also increase its performance. Users can send specific options to compilers
regarding the details of the hardware the program will be running on. This allows the
compiler to create machine language code that makes the most efficient use of the
specified hardware, as opposed to more generic code. This also allows advanced users to
optimize a program's performance on their computers.

Disadvantages:

 Hardware Specific
Because a compiler translates source code into a specific machine language, programs
have to be specifically compiled for OS X, Windows or Linux, as well as specifically for 32-
bit or 64-bit architectures. For a programmer or software company trying to get a product
out to the widest possible audience, this means maintaining multiple versions of the
source code for the same application. This results in more time spent on source code
maintenance and extra trouble when updates are released.

 Compile Times
One of the drawbacks of having a compiler is that it must actually compile source code.
While the small programs that many novice programmers code take trivial amounts of
time to compile, larger application suites can take significant amounts of time to compile.
When programmers have nothing to do but wait for the compiler to finish, this time can
add up—especially during the development stage, when the code has to be compiled in
order to test functionality and troubleshoot glitches.
Bootstrapping in Compiler Design
Bootstrapping is a process in which a simple language is used to translate a more
complicated program, which in turn may handle a still more complicated program, and so on.
Writing a compiler for any high-level language is a complicated process, and writing one
from scratch takes a lot of time. Hence, a simple existing language is used to generate the
target code in stages. To understand the bootstrapping technique clearly, consider the
following scenario.
Suppose we want to write a cross compiler for a new language X. The implementation language
of this compiler is, say, Y, and the target code it generates is in language Z; we denote
this compiler XYZ. Now, if an existing compiler for Y runs on machine M and generates code
for M, it is denoted YMM. If we run XYZ through YMM, we get a compiler XMZ, that is, a
compiler for source language X that generates target code in language Z and runs on
machine M.
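The XYZ and YMM notation above can be modeled with a tiny helper; representing a compiler as a (source, implementation, target) triple is an illustrative assumption:

```python
# Model a compiler as (source, implementation, target). Running it through an
# existing compiler translates its implementation language, leaving its source
# and target languages unchanged.

def compile_compiler(program, existing):
    src, impl, tgt = program            # e.g. X written in Y, targeting Z
    esrc, eimpl, etgt = existing        # e.g. a Y compiler on machine M
    assert esrc == impl, "the existing compiler must accept language " + impl
    return (src, etgt, tgt)

# Running XYZ through YMM yields XMZ: an X-to-Z compiler that runs on M.
print(compile_compiler(("X", "Y", "Z"), ("Y", "M", "M")))
```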
T-diagrams are commonly used to illustrate this scenario, and compilers of many different
forms can be created by composing such steps.

Advantages
Bootstrapping a compiler has the following advantages:
 it is a non-trivial test of the language being compiled, and as such is a form
of dogfooding.
 compiler developers and bug reporting part of the community only need to know the
language being compiled.
 compiler development can be performed in the higher-level language being compiled.
 improvements to the compiler's back-end improve not only general-purpose programs
but also the compiler itself.
 it is a comprehensive consistency check as it should be able to reproduce its own object
code.
Note that some of these points assume that the language runtime is also written in the
same language.

Bootstrapping is a technique that is widely used in compiler development. It has four main
uses:

1. It enables new programming languages and compilers to be developed starting from
existing ones.
2. It enables new features to be added to a programming language and its compiler.
3. It also allows new optimisations to be added to compilers.
4. It allows languages and compilers to be transferred between processors with
different instruction sets.
