10. Question Bank With Answers
Q. 2
Name of variable is X
[Label] DS <constant>
[Label] DC <value>
DS stands for Declare Storage; DC stands for Declare Constant.
The DS statement reserves an area of memory and associates a name with it.
A DS 10
The above statement reserves 10 words of memory for the variable A.
The DC statement constructs memory words containing constants.
ONE DC 1
The above statement associates the name ONE with a memory word containing the value 1.
START
This directive indicates that the first word of the machine code (target program) should be placed in the memory
word with address <constant>.
START <Constant>
Ex: START 500
First word of the target program is stored from memory location 500 onwards.
II.
END
This directive indicates the end of the source program.
The optional operand indicates the address of the instruction where execution of the program
should begin.
By default, it is the first instruction of the program.
END <operand 2>
Execution control is transferred to the label given in the operand field.
III.
ORIGIN
This directive is similar to START: it sets the address to be given to the next
instruction or data item.
The format of this statement is as follows:
ORIGIN <operand2>
The operand may be a constant, a symbol, or a symbolic expression.
The ORIGIN directive is useful when the machine code is not stored in consecutive
memory locations.
Assembly program             LC
       START  100
LOOP   MOVER  BREG, =2       100
       MOVER  AREG, N        101
       ADD    AREG, =1       102
       ORIGIN LOOP
NEXT   BC     ANY, LOOP      100
IV.
EQU
This directive simply associates the name <symbol> with <operand>, where
<operand> may be a constant or a symbol.
EQU B
LTORG
This directive allocates memory to all literals of the current pool and updates the literal
table and the pool table.
The format of this instruction is as follows:
LTORG
If LTORG statement is not present, literals are placed after the END statement.
Q. 3
OR
Explain analysis and synthesis phases of an assembler by clearly stating their tasks. OR
Design specification of an assembler.
Analysis Phase
The primary function performed by the analysis phase is the building of the symbol table.
For this purpose it must determine the addresses of the symbolic names.
It is possible to determine some addresses directly; however, others must be inferred.
This function is called memory allocation.
To implement memory allocation, a data structure called the location counter (LC) is used; it is
initialized to the constant specified in the START statement.
We refer to the processing involved in maintaining the location counter as LC processing.
Tasks of Analysis phase
1. Isolate the label, mnemonic opcode, and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC content>) in a new entry of the
symbol table.
3. Check the validity of the mnemonic opcode.
4. Perform LC processing.
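The four tasks above can be sketched as a short routine. This is only an illustration under stated assumptions: the statement layout (an optional label, then the mnemonic, then operands separated by spaces), the small `MNEMONICS` table, and the one-word instruction length are invented for the example and are not part of the notes.

```python
# Hypothetical mnemonics table: mnemonic -> (opcode, length in words).
MNEMONICS = {"MOVER": ("04", 1), "ADD": ("01", 1), "STOP": ("00", 1)}
DIRECTIVES = {"START", "END", "DS", "DC"}


def analyse(source_lines):
    """Build the symbol table by isolating fields and doing LC processing."""
    symtab = {}
    lc = 0  # location counter
    for line in source_lines:
        fields = line.split()
        if not fields:
            continue
        # Task 1: isolate the label, mnemonic and operand fields.
        if fields[0] in MNEMONICS or fields[0] in DIRECTIVES:
            label, mnemonic, operands = None, fields[0], fields[1:]
        else:
            label, mnemonic, operands = fields[0], fields[1], fields[2:]
        # Task 3: check validity of the mnemonic opcode.
        if mnemonic not in MNEMONICS and mnemonic not in DIRECTIVES:
            raise ValueError(f"invalid mnemonic: {mnemonic}")
        if mnemonic == "START":
            lc = int(operands[0])  # LC is initialised from START
            continue
        if mnemonic == "END":
            break
        # Task 2: enter (symbol, LC content) in the symbol table.
        if label is not None:
            symtab[label] = lc
        # Task 4: LC processing.
        if mnemonic == "DS":
            lc += int(operands[0])
        elif mnemonic == "DC":
            lc += 1
        else:
            lc += MNEMONICS[mnemonic][1]
    return symtab


program = [
    "START 100",
    "LOOP MOVER AREG, A",
    "ADD AREG, ONE",
    "STOP",
    "A DS 2",
    "ONE DC 1",
    "END",
]
print(analyse(program))  # {'LOOP': 100, 'A': 103, 'ONE': 105}
```

Note that A and ONE are forward references: their addresses become known only after the LC has passed over every statement that precedes them.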
[Figure: Source program -> Analysis phase -> Synthesis phase -> Target program. Data structures shown: the mnemonics table (mnemonic, opcode, length; e.g. ADD 01, SUB 02) and the symbol table (symbol, address; e.g. AGAIN 104).]
Synthesis Phase
Consider the assembly statement
MOVER BREG, ONE
To synthesize the corresponding machine instruction, two items of information are needed:
1. The address of the name ONE.
2. The machine operation code corresponding to the mnemonic MOVER.
The first item of information depends on the source program; hence it must be made available
by the analysis phase.
The second item of information does not depend on the source program; it depends on the
assembly language.
Based on the above discussion, we consider the use of two data structures during the synthesis
phase:
1. Symbol table:
Each entry in the symbol table has two primary fields: name and address. This table is
built by the analysis phase.
2. Mnemonics table:
An entry in the mnemonics table has two primary fields: mnemonic and opcode.
Tasks of Synthesis phase
1. Obtain machine opcode through look up in the mnemonics table.
2. Obtain address of memory operand from the symbol table.
3. Synthesize a machine instruction.
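A minimal sketch of these three tasks follows. The opcode and register codes are the ones that appear in the tables elsewhere in these notes; the instruction layout (2-digit opcode, 1-digit register, 3-digit memory address), the function name `synthesise`, and the address 115 for ONE are assumptions made for the illustration.

```python
MNEMONICS = {"ADD": "01", "MOVER": "04", "MOVEM": "05"}  # mnemonic -> opcode
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesise(mnemonic, register, symbol, symtab):
    opcode = MNEMONICS[mnemonic]   # task 1: look up the mnemonics table
    address = symtab[symbol]       # task 2: look up the symbol table
    # Task 3: put the machine instruction together.
    return f"{opcode} {REGISTERS[register]} {address:03d}"


symtab = {"ONE": 115}              # assumed to be built by the analysis phase
print(synthesise("MOVER", "BREG", "ONE", symtab))  # 04 2 115
```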
Q. 4
OR
OR
Q. 5
OR
Explain the role of mnemonic opcode table, symbol table, literal table, and pool table in
assembling process of assembly language program.
OR
Mnemonics   Class   info
MOVER       IS      (04,1)
DS          DL      R#7
START       AD      R#11
.
.
SYMTAB
A SYMTAB entry contains the fields symbol name, address, and length.
Some addresses can be determined directly, e.g. the address of the first instruction in the
program; others must be inferred.
To find the address of a program element, we must fix the addresses of all program elements preceding it.
This function is called memory allocation.
Symbol   Address   Length
LOOP     202
NEXT     214
LAST     216
         217
BACK     202
         218
LITTAB
A table of literals used in the program.
A LITTAB entry contains the field literal and address.
The first pass uses LITTAB to collect all literals used in a program.
Awareness of different literal pools is maintained using the auxiliary table POOLTAB.
This table contains the literal number of the starting literal of each literal pool.
At any stage, the current literal pool is the last pool in the LITTAB.
On encountering an LTORG statement (or the END statement), literals in the current pool
are allocated addresses starting with the current value in LC and LC is appropriately
incremented.
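The LTORG/END processing described above can be sketched as follows. This is an illustration, not the notes' own code: the function name `allocate_pool`, the dictionary layout of a LITTAB entry, the one-word size of a literal, and the LC values 211 and 219 are all assumptions.

```python
def allocate_pool(littab, pooltab, lc):
    """Give addresses to the literals of the current (last) pool."""
    start = pooltab[-1]                  # literal number (1-based) that opens the pool
    for entry in littab[start - 1:]:
        entry["address"] = lc
        lc += 1                          # assumption: every literal occupies one word
    pooltab.append(len(littab) + 1)      # the next pool starts after the last literal
    return lc


littab = [{"literal": "=5", "address": None},
          {"literal": "=1", "address": None}]
pooltab = [1]

lc = allocate_pool(littab, pooltab, 211)           # LTORG met with LC = 211
littab.append({"literal": "=1", "address": None})  # literal used after the LTORG
lc = allocate_pool(littab, pooltab, 219)           # END statement met with LC = 219

print(pooltab)                                     # [1, 3, 4]
print([e["address"] for e in littab])              # [211, 212, 219]
```

The first two POOLTAB entries (#1 and #3) are the starting literals of the two pools; the last entry only marks where a further pool would begin.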
[Tables: LITTAB and POOLTAB. POOLTAB (column: literal no) holds the entries #1 and #3.]
Q. 6
Pass I
Algorithm for Pass I
1) loc_cntr=0(default value)
pooltab_ptr=1; POOLTAB[1]=1;
littab_ptr=1;
2) While next statement is not END statement
a) If a label is present then
this_label=symbol in label field
Enter (this_label, loc_cntr) in SYMTAB
b) If an LTORG statement then
(i) Process literals in LITTAB to allocate memory and put the address in the address field. Update loc_cntr accordingly.
(ii) pooltab_ptr = pooltab_ptr + 1;
(iii)
c)
(ii)
e) If a declaration statement then
(i)
(ii)
(iii) loc_cntr = loc_cntr + size;
(iv) Generate IC (DL, code) ..
f)
(ii)
(iii)
Go to Pass II
Declaration statement   Code
DC                      01
DS                      02
Assembler directive     Code
START                   01
END                     02
ORIGIN                  03
EQU                     04
LTORG                   05
The information in the mnemonics field is assumed to have the same representation in all
the variants.
Register    Code
AREG        01
BREG        02
CREG        03
DREG        04
Condition   Code
LT          01
LE          02
EQ          03
GT          04
GE          05
ANY         06
The second operand, which is a memory operand, is represented by a pair of the form
(operand class, code)
where operand class is one of C, S, and L, standing for constant, symbol, and literal.
For a constant, the code field contains the internal representation of the constant itself.
Ex: the operand descriptor for the statement START 200 is (C,200).
For a symbol or literal, the code field contains the ordinal number of the operand's entry in
SYMTAB or LITTAB.
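As an illustration of the operand descriptor, the sketch below builds variant I intermediate code for one imperative statement. The instruction and register codes are taken from the tables in these notes; the function names, the use of plain lists for SYMTAB and LITTAB, and the rule that a repeated literal reuses its entry are simplifying assumptions.

```python
IS = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4,
      "MOVEM": 5, "COMP": 6, "BC": 7, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def operand_descriptor(operand, symtab, littab):
    """Return the (operand class, code) pair for a memory operand."""
    if operand.startswith("="):                 # literal
        if operand not in littab:
            littab.append(operand)
        return ("L", littab.index(operand) + 1)
    if operand.isdigit():                       # constant
        return ("C", int(operand))
    if operand not in symtab:                   # symbol
        symtab.append(operand)
    return ("S", symtab.index(operand) + 1)


def variant1(mnemonic, register, operand, symtab, littab):
    """IC unit: (statement class, code), register code, operand descriptor."""
    return (("IS", IS[mnemonic]), REGISTERS[register],
            operand_descriptor(operand, symtab, littab))


symtab, littab = [], []
print(variant1("MOVER", "AREG", "A", symtab, littab))   # (('IS', 4), 1, ('S', 1))
print(variant1("SUB", "AREG", "=1", symtab, littab))    # (('IS', 2), 1, ('L', 1))
print(operand_descriptor("200", symtab, littab))        # ('C', 200)
```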
Variant II
This variant differs from variant I of the intermediate code in that symbols,
condition codes, and CPU registers are not processed in variant II.
So no IC units are generated for them during Pass I; these operands are kept in their source form.
Source program           Variant I                Variant II
       START 200         (AD,01) (C, 200)         (AD,01) (C, 200)
       READ A            (IS, 09) (S, 01)         (IS, 09) A
LOOP   MOVER AREG, A     (IS, 04) (1)(S, 01)      (IS, 04) AREG, A
       ..
       SUB AREG, =1      (IS, 02) (1)(L, 01)      (IS, 02) AREG,(L, 01)
       BC GT, LOOP       (IS, 07) (4)(S, 02)      (IS, 07) GT, LOOP
       STOP              (IS, 00)                 (IS, 00)
A      DS 1              (DL, 02) (C,1)           (DL, 02) (C,1)
       LTORG             (AD, 05)                 (AD, 05)
       ..
Pass II(Algorithm)
It has been assumed that the target code is to be assembled in the area named code_area.
1. code_area_address = address of code_area;
pooltab_ptr = 1;
loc_cntr = 0;
2. While next statement is not an END statement
a) Clear machine_code_buffer;
b) If an LTORG statement
i) Process literals in LITTAB and assemble the literals in machine_code_buffer.
ii)
iii) pooltab_ptr = pooltab_ptr + 1;
c)
ii) size = 0;
d) If a declaration statement
i)
ii)
e) If an imperative statement
i)
ii)
iii) size = size of instruction;
f) If size ≠ 0 then
i) Move contents of machine_code_buffer to the address code_area_address + loc_cntr;
ii) loc_cntr = loc_cntr + size;
Sr. no.   Statements         Address
          START 200
          MOVER AREG,A       200
          MVER BREG, A       207
          ADD BREG, B        208
14        A DS 1             209
21        A DC 5             227
          **ERROR** duplicate definition of symbol A
          .
          .
35        END
          **ERROR** undefined symbol in statement 10
Error indication at statement 10 is also easy: the symbol table is searched for an entry
for B, and if no match is found, an error is reported.
Q. 8
Q.9
Label    Statement              Address   Opcode      Register operand   Memory operand
                                          (2 digit)   (1 digit)          (3 digit)
         START  101
         READ   N               101)      09          0                  113
         MOVER  BREG, ONE       102)      04          2                  115
         MOVEM  BREG, TERM      103)      05          2                  116
AGAIN    MULT   BREG, TERM      104)      03          2                  116
         MOVER  CREG, TERM      105)      04          3                  116
         ADD    CREG, ONE       106)      01          3                  115
         MOVEM  CREG, TERM      107)      05          3                  116
         COMP   CREG, N         108)      06          3                  113
         BC     LE, AGAIN       109)      07          2                  104
         MOVEM  BREG, RESULT    110)      05          2                  114
         PRINT  RESULT          111)      10          0                  114
         STOP                   112)      00          0                  000
N        DS     1               113)
RESULT   DS     1               114)
ONE      DC     1               115)      00                             001
TERM     DS     1               116)
         END
         START  100
         READ   A
         READ   B
         READ   C
         MOVER  AREG,A
         ADD    AREG,B
         ADD    AREG,C
         MULT   AREG,C
         MOVEM  AREG,RESULT
         PRINT  RESULT
         STOP
A        DS     1
B        DS     1
C        DS     1
RESULT   DS     1
         END
Program-1 IC in variant-I
(AD,01) (C,100)
(IS,09) (S,01)
(IS,09) (S,02)
(IS,09) (S,03)
(IS,04) (01)(S,01)
(IS,01) (01)(S,02)
(IS,01) (01)(S,03)
(IS,03) (01)(S,03)
(IS,05) (01)(S,04)
(IS,10) (S,04)
(IS,00)
(DL,02) (C,01)
(DL,02) (C,01)
(DL,02) (C,01)
(DL,02) (C,01)
(AD,02)
Symbol   Address
A        111
B        112
C        113
RESULT   114
Program-2
      START  101
      READ   A
      READ   B
      MOVER  BREG,A
      MULT   BREG,B
      MOVEM  BREG,D
      STOP
A     DS     1
B     DS     1
D     DS     1
      END
Symbol   Address
A        108
B        109
D        110
Program-2 Variant-I      Program-2 Variant-II
(AD,01) (C,101)          (AD,01) (C,101)
(IS,09) (S,01)           (IS,09) A
(IS,09) (S,02)           (IS,09) B
(IS,04) (2)(S,01)        (IS,04) BREG,A
(IS,03) (2)(S,02)        (IS,03) BREG,B
(IS,05) (2)(S,03)        (IS,05) BREG,D
(IS,00)                  (IS,00)
(DL,02) (C,01)           (DL,02) (C,01)
(DL,02) (C,01)           (DL,02) (C,01)
(DL,02) (C,01)           (DL,02) (C,01)
(AD,02)                  (AD,02)
Q.1
Compiler
a) Generate code to implement meaning of a source program in the execution domain (target
code generation)
b) Provide diagnostics for violations of PL semantics in a program (Error reporting)
There are four issues involved in implementing these aspects. (Q. What are the issues in
code generation in relation to compilation of expressions? Explain each issue in
brief. (June-13 GTU))
1. Data types: The semantics of a data type require a compiler to ensure that variables of
a type are assigned or manipulated only through legal operations.
The compiler must generate type-specific code to implement an operation.
2.
3. Scope rules: The compiler performs operations called scope analysis and name resolution
to determine the data item designated by the use of a name in the source program.
4.
TEJAS PATEL
3. Memory management
Mapping names in the source program to addresses of data objects in run time memory is
done cooperatively by the front end and the code generator. We assume that a name in a
three-address statement refers to a symbol table entry for the name.
4. Instruction selection
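If we do not care about the efficiency of the target program, instruction selection is
straightforward. However, statement-by-statement translation often produces poor code that
requires special handling. For example, the sequence of statements
a := b + c
d := a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth instruction is redundant, because it reloads the value that was just stored.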
Memory Binding:
1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of data
items.
3. Determine appropriate memory mappings to access the values in a non-scalar data item,
e.g. values in an array.
1. Static binding
2. Dynamic binding
By linking the blocks in a list allocation and deallocation can be done quickly with little or
no storage overhead.
Initialization of the area is done by using a portion of each block for a link to the next
block. Pointer available points to the first block.
Allocation consists of taking a block off the list and deallocation consists of putting the
block back on the list. We can treat each block as a variant record.
There is no space overhead because the user program can use the entire block for its own
purposes.
When the block is de-allocated then the compiler routine uses some of the space from the
block itself to link it into the list of available blocks.
Explicit Allocation of Variable-Sized Blocks
When blocks are allocated and de-allocated, storage can become fragmented; that is, the
heap may consist of alternate blocks that are free and in use.
Fragmentation will not occur if blocks are of fixed size, but it does occur if they are of
variable size.
One method for allocating variable-sized blocks is the first-fit method. When a block of size s is
allocated, we search for the first free block that is of size f ≥ s (where f is the size of the free block).
This block is then subdivided into a used block of size s and a free block of size (f - s).
This incurs a time overhead, as we have to search for a free block that is
large enough.
When a block is de-allocated, we check to see if it is next to a free block. If possible, the
de-allocated block is combined with a free block next to it to create a larger free block.
Combining adjacent free blocks into a larger free block prevents further fragmentation
from occurring.
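The first-fit method with combining of adjacent free blocks can be sketched as below. The class name `FirstFitHeap`, the representation of the free list as sorted (start, length) pairs, and the heap size of 100 are assumptions made for the illustration; a real allocator would keep this bookkeeping inside the heap itself.

```python
class FirstFitHeap:
    def __init__(self, size):
        self.free = [(0, size)]          # sorted list of (start, length) free blocks

    def allocate(self, s):
        """Return the start of the first free block with f >= s, or None."""
        for i, (start, f) in enumerate(self.free):
            if f >= s:
                if f == s:
                    del self.free[i]                    # block used up completely
                else:
                    self.free[i] = (start + s, f - s)   # leftover of size f - s
                return start
        return None

    def release(self, start, s):
        """Free a block and combine it with adjacent free blocks."""
        self.free.append((start, s))
        self.free.sort()
        merged = []
        for blk_start, blk_len in self.free:
            if merged and merged[-1][0] + merged[-1][1] == blk_start:
                prev_start, prev_len = merged[-1]
                merged[-1] = (prev_start, prev_len + blk_len)
            else:
                merged.append((blk_start, blk_len))
        self.free = merged


heap = FirstFitHeap(100)
a = heap.allocate(30)     # block at 0
b = heap.allocate(20)     # block at 30
c = heap.allocate(10)     # block at 50
heap.release(a, 30)
heap.release(b, 20)       # adjacent to the freed block a, so the two are combined
print(heap.free)          # [(0, 50), (60, 40)]
```

Without the combining step, the free list would hold blocks of 30 and 20 words and a request for 45 words could only be met from the block at 60.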
2. Implicit De-allocation
Implicit de-allocation requires cooperation between the user program and the run-time
package, because run time package needs to know when a storage block is no longer in
use.
The first problem is that of recognizing block boundaries. If the size of blocks is fixed,
then position information can be used.
For example, if each block occupies 20 words, then a new block begins every 20 words.
Otherwise, we keep the size of a block in the inaccessible storage attached to the block, so
we can determine where the next block begins.
The second problem is that of recognizing whether a block is in use. We assume that a block is in
use if it is possible for the user program to refer to the information in the block.
The reference may occur through a pointer or after following a sequence of pointers, so the
compiler needs to know the position in storage of all pointers.
Reference counts
We keep track of the number of blocks that point directly to the present block. If this count
ever drops to 0 then the block can be de-allocated because it cannot be referred to i.e. the
block has become garbage that can be collected. Maintaining the reference counts can be
costly. Reference counts are best used when pointers between blocks never appear in cycles.
Marking Techniques
An alternative approach is to suspend temporarily execution of the user program and use
the frozen pointers to determine which blocks are in use. This approach requires all the
pointers into the heap to be known.
Conceptually we pour paint into the heap through these pointers. Any block that is reached by the paint is in use, and the rest can be de-allocated.
In more detail, we go through the heap and mark all blocks unused. Then we follow
pointers marking as used any block that is reached in the process. A final sequential scan
of the heap allows all blocks still marked unused to be collected.
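The marking technique can be sketched in a few lines. The heap model used here (a dictionary from block name to the names of the blocks it points to) and the function name `mark_and_sweep` are assumptions for the illustration only.

```python
def mark_and_sweep(heap, roots):
    """Return the blocks that are still marked unused after marking."""
    marked = {name: False for name in heap}   # mark all blocks unused
    stack = list(roots)                       # pointers held by the user program
    while stack:                              # follow pointers from the roots
        name = stack.pop()
        if not marked[name]:
            marked[name] = True
            stack.extend(heap[name])
    # Final sequential scan: whatever is still unmarked can be collected.
    return sorted(name for name, used in marked.items() if not used)


heap = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"], "E": []}
print(mark_and_sweep(heap, roots=["A", "E"]))   # ['C', 'D']
```

Blocks C and D point to each other, so their reference counts never drop to 0; marking still finds them because neither is reachable from the program's pointers.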
Q.3
1. Variable X is accessed within block B1 if it can be accessed by any statement situated
in block B1.
2. Variable X is accessed by any statement in block B2, and block B2 is situated in block B1.
There are two types of variables in a block-structured language:
1. Local variable
2. Non local variable
To understand local and non local variable consider the following example
Procedure A
{
int x, y, z
Procedure B
{
int a, b
}
Procedure C
{
int m, n
}
}
Procedure   Local variables   Non-local variables
A           x, y, z
B           a, b              x, y, z
C           m, n              x, y, z
Variables x, y and z are local variables of procedure A, but they are non-local to blocks B and
C because they are not defined locally within blocks B and C but are accessible
within these blocks.
Q.4
Control link
Access link
Saved M/c status
Local variables
Temporaries
1. Temporary values: Temporary variables are needed during the evaluation of
expressions. Such variables are stored in the temporary field of the activation record.
2. Local variables: Data that is local to the executing procedure is stored
in this field of the activation record.
3. Saved machine registers: This field holds information about the status of the machine
just before the procedure is called. It contains the registers and the program counter.
4. Control link: This field is optional. It points to the activation record of the calling
procedure. This link is also called the dynamic link.
5. Access link: This field is also optional. It refers to non-local data in other activation
records. This field is also called the static link field.
6. Actual parameters: This field holds information about the actual parameters. These
actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.
Q.5
1. Call by value:
The actual parameters are evaluated and their values are passed to the called procedure (as its formal
parameters).
2. Call by value-result:
This extends the capability of the call by value mechanism by copying the value of each formal
parameter back to the corresponding actual parameter at return.
3. Call by reference :
4. Call by name:
Procedure is treated like macro. The procedure body is substituted for call in caller with
actual parameters substituted for formals.
Q.6
The local names of called procedure and names of calling procedure are distinct
Operand descriptors
#    Attribute   Addressability
1    (int, 1)    Address(a)
2    (int, 1)    Address(b)
3    (int, 1)    Address(AREG)
Register descriptors
A register descriptor has two fields:
1. Status: Contains the code free or occupied to indicate the register status.
2. Operand descriptor #: If status = occupied, this field contains the descriptor for the operand
contained in the register.
In the above example, the register descriptor for AREG after generating code for a*b would be
(Occupied, #3)
This indicates that register AREG contains the operand described by descriptor #3.
Q.7
It is a list of the nodes of the tree in which a node appears immediately after its children.
In three-address code, at most three addresses are used to represent a
statement. The general form of the three-address code representation is a := b op c.
For an expression like a = b+c+d, the three-address code will be
t1=b+c
t2=t1+d
Here t1 and t2 are temporary names generated by the compiler. At most three
addresses are allowed per statement; hence this representation is called three-address code.
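The generation of temporaries can be sketched for the simple case of a chain of operands joined by one left-associative operator. The function name `three_address` and the final copy into the target name are assumptions of this illustration; a real compiler would walk the syntax tree instead.

```python
def three_address(target, operands, op="+"):
    """Translate target = o1 op o2 op ... into three-address statements."""
    code = []
    left = operands[0]
    for n, right in enumerate(operands[1:], start=1):
        temp = f"t{n}"                      # compiler-generated temporary name
        code.append(f"{temp} = {left} {op} {right}")
        left = temp
    code.append(f"{target} = {left}")       # copy the last temporary into the target
    return code


for statement in three_address("a", ["b", "c", "d"]):
    print(statement)
# t1 = b + c
# t2 = t1 + d
# a = t2
```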
Q.8
There are three representations used for three-address code: quadruples, triples and
indirect triples.
Quadruple representation
A quadruple is a structure with at most four fields: op, arg1, arg2 and result.
The op field holds the internal code for the operator, the arg1 and arg2 fields
hold the two operands, and the result field stores the result of the
expression.
Statement               Op       Arg1   Arg2   result
(0) t1 := uminus a      uminus   a             t1
(1) t2 := t1 * b        *        t1     b      t2
(2) t3 := - a           uminus   a             t3
(3) t4 := t3 * b        *        t3     b      t4
(4) t5 := t2 + t4       +        t2     t4     t5
(5) x := t5             :=       t5            x
Triples
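In the triple representation, the use of temporary variables is avoided: an operand refers either to a
pointer into the symbol table or to the number of the triple that computes the value.
For the expression x := - a * b + - a * b, the triples are:
Number   Op       Arg1   Arg2
(0)      uminus   a
(1)      *        (0)    b
(2)      uminus   a
(3)      *        (2)    b
(4)      +        (1)    (3)
(5)      :=       x      (4)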
Indirect Triples
In the indirect triple representation, a separate listing of pointers to triples is kept, and
statements are referred to through these pointers instead of directly.
Number   Op       Arg1   Arg2
(0)      uminus   a
(1)      *        (11)   b
(2)      uminus   a
(3)      *        (13)   b
(4)      +        (12)   (14)
(5)      :=       x      (15)
Number   Statement
(0)      (11)
(1)      (12)
(2)      (13)
(3)      (14)
(4)      (15)
(5)      (16)
Q.9
Compile time evaluation means shifting of computations from run time to compilation.
There are two methods used to obtain the compile time evaluation.
1. Folding
In the folding technique, the computation of constant expressions is done at compile time instead of
run time.
2. Constant propagation
Here, at compilation time, the value of pi is replaced by 3.14 and r by 5; the
computation of 3.14 * 5 * 5 is then done during compilation.
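Both methods can be sketched on a tiny expression tree. The tree encoding (a number, a variable name, or a tuple `(op, left, right)`), the function name `fold`, and the restriction to `+` and `*` are assumptions made for the illustration.

```python
def fold(expr, constants):
    """Propagate known constants into expr and fold constant sub-expressions."""
    if isinstance(expr, tuple):
        op, left, right = expr
        left = fold(left, constants)
        right = fold(right, constants)
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            # Both operands are known at compile time: compute the value now.
            return left * right if op == "*" else left + right
        return (op, left, right)             # something is still unknown
    if isinstance(expr, str):
        return constants.get(expr, expr)     # constant propagation
    return expr


area = ("*", ("*", "pi", "r"), "r")          # pi * r * r
print(fold(area, {"pi": 3.14, "r": 5}))      # a single number, about 78.5
print(fold(("*", "pi", "x"), {"pi": 3.14}))  # ('*', 3.14, 'x')
```

In the second call x is not known at compile time, so only the propagation of pi takes place and the multiplication is left for run time.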
II.
If the operands of a sub expression do not change at all, then the result of the
sub expression is used instead of recomputing it each time.
Example:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t4 := 4 * i
t5 := n
t6 := b[t4]+t5
The above code can be optimized using common sub expression elimination:
t1=4*i
t2=a[t1]
t3=4*j
t5=n
t6=b[t1]+t5
The common sub expression t4 := 4 * i is eliminated, as its value is already available in t1 and
the value of i has not changed between definition and use.
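The elimination shown above can be sketched for a single basic block. This is a simplified illustration: the function name `eliminate_cse` is hypothetical, expressions are compared as plain strings, every temporary is assumed to be assigned only once, and no operand (such as i) is assumed to change inside the block.

```python
def eliminate_cse(block):
    """block: list of (target, expression) pairs of one basic block."""
    available = {}   # expression -> temporary that already holds its value
    renamed = {}     # eliminated temporary -> temporary used in its place
    result = []
    for target, expr in block:
        for old, new in renamed.items():
            expr = expr.replace(old, new)     # use the surviving temporary
        if expr in available:
            renamed[target] = available[expr] # recomputation is dropped
        else:
            available[expr] = target
            result.append((target, expr))
    return result


block = [("t1", "4*i"), ("t2", "a[t1]"), ("t3", "4*j"),
         ("t4", "4*i"), ("t5", "n"), ("t6", "b[t4]+t5")]
for target, expr in eliminate_cse(block):
    print(f"{target}={expr}")
# t1=4*i
# t2=a[t1]
# t3=4*j
# t5=n
# t6=b[t1]+t5
```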
III.
Example:
while(i<=max-1)
{
sum=sum+a[i];
}
can be optimized as
N=max-1;
while(i<=N)
{
sum=sum+a[i];
}
IV.
Strength Reduction
In this technique the higher strength operators can be replaced by lower strength
operators.
Example:
for(i=1;i<=50;i++)
{
count = i * 7;
}
Here we get the count values 7, 14, 21, and so on, up to 350 (for i = 50).
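The multiplication in this loop can be replaced by a repeated addition of 7, which is the lower strength operator. The sketch below, with hypothetical function names, collects the values of count produced by both forms of the loop so that they can be compared.

```python
def with_multiplication():
    return [i * 7 for i in range(1, 51)]     # count = i * 7 for i = 1..50


def with_addition():
    counts, temp = [], 7
    for _ in range(1, 51):
        counts.append(temp)                  # count = temp
        temp += 7                            # addition replaces multiplication
    return counts


print(with_multiplication() == with_addition())   # True
print(with_addition()[:3], with_addition()[-1])   # [7, 14, 21] 350
```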
V.
On the other hand, a variable is said to be dead at a point in a program if the value
contained in it is never used afterwards. Code that only computes such a variable is
dead code, and an optimization can be performed by eliminating it.
Example :
i=0;
if(i==1)
{
a=x+5;
}
The if statement is dead code, as its condition will never be satisfied; hence the statement can
be eliminated as an optimization.
VI.
Code Motion
The aim is to improve the execution time of the program by reducing the evaluation
frequency of expressions.
Evaluation of an expression is moved from one part of the program to another in such a way
that it is evaluated less frequently.
Example:
a = 200;
while (a > 0)
{
b = x + y;
if (a % b == 0)
printf("%d", a);
}
The statement b = x + y is executed on every iteration of the loop. But because it is loop
invariant, it can be moved outside the loop:
a = 200;
b = x + y;
while (a > 0)
{
if (a % b == 0)
printf("%d", a);
}
Q.10
5) Global optimization: The optimizing transformations are applied over a program unit.
6) Basic block: A basic block is a sequence of consecutive statements in which the flow of
control enters at the beginning and leaves at the end without halting or branching.
Q.11
Q.12
Before discussing the data flow properties, consider some basic terminology that will be
used while describing them.
A program point containing a definition is called a definition point.
A program point at which a reference to a data item is made is called a reference point.
A program point at which some expression is evaluated is called an evaluation point.
For example:
Definition point    W1: x=3
Reference point     W2: y=x
Evaluation point    W3: z=a*b
I.
Available expression
An expression x+y is available at a program point w if and only if, along all paths
reaching w:
1.
2.
B2: t2 = c + d[t1]
B3: t2 = 4*i
B4: t4 = a[t2]
The expression 4 * i is an available expression for B2, B3 and B4 because it is not
changed by any of the blocks before appearing in B4.
II.
Reaching definition
A definition D reaches a point P if there is a path from
D to P along which D is not killed.
The definition D1 is a reaching definition for block B2, but D1 is not a reaching
definition for block B3, because it is killed by definition D2 in block B2.
III.
Q.13
Live variable
used before it is
variable is said to be
Interpreter.
Data store
Symbol table
Data manipulation routine
Types of interpreter
1) Pure interpreter
[Figure: Source program and Data -> Interpreter -> Result]
2) Impure interpreter
[Figure: Source program -> Interpreter -> IR -> Interpreter -> Result, with Data as input]
A high-level programming language translator that translates and runs the program at the
same time.
It converts one program statement into machine language, executes it, and then proceeds
to the next statement. This differs from regular executable programs that are presented to
the computer as binary-coded instructions.
Interpreted programs remain in the source language the programmer wrote in, which is
human readable text.
Interpreters are not much different than compilers. They also convert the high level
language into machine readable binary equivalents.
Each time when an interpreter gets a high level language code to be executed, it converts
the code into an intermediate code before converting it into the machine code.
Each part of the code is interpreted and then executed separately, in sequence. If an error
is found in a part of the code, the interpreter stops without translating
the next part of the code.
The advantage of an interpreter, however, is that it does not need to go through the
compilation stage during which machine instructions are generated.
This process can be time-consuming if the program is long. The interpreter, on the other
hand, can immediately execute high-level programs.
For this reason, interpreters are sometimes used during the development of a program,
when a programmer wants to add small sections at a time and test them quickly.
Interpreter characteristics:
Q.14
Compiler
[Figure: Source code -> Compiler -> Machine code; the compiler also reports Errors]
Two pass compiler
A two pass assembler does two passes over the source file (the second pass can be over a file
generated in the first pass).
In the first pass, all it does is look for label definitions and enter them in the symbol table.
In the second pass, after the symbol table is complete, it does the actual assembly by translating
[Figure: Compiler structure: Source code -> Front end -> IR -> Back end -> Machine code; both ends report Errors]
Processor
Q.1
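[Figure: Application domain -> PL domain -> Execution domain; the specification gap lies between the application domain and the PL domain, the execution gap between the PL domain and the execution domain]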
can only be used for specific application; hence they are called problem oriented languages.
Procedure oriented language: Procedure oriented language provides general purpose facilities
required in most application domains. Such a language is independent of specific application
domains and results in a large specification gap which has to be bridged by an application
designer.
Q.2
[Figure: Program specification -> Program generator -> Target program; the generator also reports Errors]
Program Execution
Two popular models for program execution are translation and interpretation.
Translation
The program translation model bridges the execution gap by translating a program written in PL,
called source program, into an equivalent program in machine or assembly language of the
computer system, called target program.
[Figure: Source program -> Translator -> Target program (M/c language program), which operates on Data; the translator also reports Errors]
Interpretation
The interpreter reads the source program and stores it in its memory.
The CPU uses the program counter (PC) to note the address of the next instruction to be
executed.
The statement would be subjected to the interpretation cycle, which could consist of the following
steps:
1. Fetch the instruction.
2. Analyse the statement and determine its meaning.
3. Execute the meaning of the statement.
[Figure: Interpreter with its PC and a Memory holding the source program and data; the interpreter reports Errors]
Q.3
Language Processor
[Figure: Source program -> Analysis phase -> Synthesis phase -> Target program; each phase reports Errors]
Language processing can be performed on a statement-by-statement basis, that is, analysis of a
source statement can be immediately followed by synthesis of the equivalent target statement. This
may not be feasible due to:
Forward reference: a forward reference of a program entity is a reference to the entity which
precedes its definition in the program.
This problem can be solved by postponing the generation of target code until more information
concerning the entity becomes available.
It leads to multipass model of language processing.
Language processor pass: A language processor pass is the processing of every statement in a
source program to perform a language processing function.
In Pass I: Perform analysis of the source program and note relevant information.
In Pass II: Once again analyse the source program to generate target code using type
An intermediate representation is a representation of a source program which reflects the effect
of some, but not all, analysis and synthesis task performed during language processing.
[Figure: Source program -> Front end -> Intermediate representation (IR) -> Back end -> Target program]
(For example, while an integer constant is a string of digits with an optional sign,
a reserved id is an id whose name matches one of the reserved names mentioned in the
language specification.)
Lexical analysis builds a descriptor, called a token. We represent a token as
code#no
Consider the following code:
i: integer;
a,b: real;
a=b+i;
The statement a=b+i is represented as a string of tokens:
Id#1 Op#1 Id#2 Op#2 Id#3
The IC is passed to
Semantic Analysis
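Semantic analysis of declaration statements differs from the semantic analysis of
imperative statements. The former results in the addition of information to the symbol table,
e.g. the type, length, and dimensionality of variables.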
The latter identifies the sequence of actions necessary to implement the meaning of a source
statement.
In both cases the structure of a source statement guides the application of the semantic rules.
When semantic analysis determines the meaning of a subtree in the IC, it adds information to a
table or adds an action to the sequence of actions.
It then modifies the IC to enable further semantic analysis. The analysis ends when the tree has
been completely processed. The updated tables and the sequence of actions constitute the IR
produced by the analysis phase.
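[Figure: Semantic analysis of a = b + i in three steps: (a) the tree = with operands (a, real) and + over (b, real) and (i, int); (b) i is converted, giving (i*, real); (c) the sum is held in (temp, real), which is assigned to (a, real)]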
Intermediate representation
IR contains intermediate code and tables.
Symbol table
     symbol   Type   length   address
1    i        int
2    a        real
3    b        real
4    i*       real
5    temp     real
Intermediate code
1. Convert (Id#1) to real, giving (Id#4)
2. Add (Id#4) to (Id#3), giving (Id#5)
3. Store (Id#5) in (Id#2)
Memory allocation
Memory allocation is a simple task given the presence of the symbol table. The memory
requirement of an identifier is computed from its type, length and dimensionality, and memory is
allocated to it.
The address of the memory area is entered in the symbol table. After memory allocation,
the symbol table looks as follows:
     symbol   Type   length   address
1    i        int             2000
2    a        real            2001
3    b        real            2002
Code generation
the synthesis phase may decide to hold the value of i* and temp in machine registers and may
generate the assembly code
CONV_R AREG, I
ADD_R AREG, B
MOVEM AREG, A
Q.4
Q.5
Reduction
P1: A:=
P1: A:=
Consider the grammar G
<sentence> ::= <noun phrase><verb phrase>
<noun phrase> ::= <article><noun>
<verb phrase> ::= <verb><noun phrase>
<article> ::= a | an | the
<noun> ::= boy | apple
<verb> ::= ate
following reduction.
of LG
<article><noun><verb phrase>
<article><noun><verb> an apple
<article><noun><verb><article> apple
<article><noun><verb><article><noun>
<noun phrase><verb><article><noun>
<noun phrase><verb><noun phrase>
<noun phrase><verb phrase>
<sentence>
Parse tree
A sequence of derivation or reduction reveals the syntactic structure of a string with respect to G.
We depict the syntactic structure in the form of a parse tree.
Derivation according to the production A:= gives rise to the following elemental parse tree
..
NTi
Ex:
<sentence>
    <noun phrase>
        <article>  the
        <noun>     boy
    <verb phrase>
        <verb>     ate
        <noun phrase>
            <article>  an
            <noun>     apple
Q.6
Q.1
Q.2
Program relocation is the process of modifying the addresses used in the address sensitive
instruction of a program such that the program can execute correctly from the designated
area of memory.
Let AA be the set of absolute addresses (instruction or data addresses) used in the instructions
of a program P.
AA ≠ φ implies that program P assumes its instructions and data to occupy memory words
with specific addresses.
Such a program called an address sensitive program contains one or more of the
following:
An address sensitive program P can execute correctly only if the start address of the memory
area allocated to it is the same as its translated origin.
To execute correctly from any other memory area, the address used in each address
sensitive instruction of P must be corrected.
Performing relocation
Let the translated and linked origins of program P be t_origin_P and l_origin_P, respectively.
Consider a symbol symb in P. Let its translation time address be t_symb and its link time address be l_symb.
The relocation factor of P is defined as
$relocation\_factor_P = l\_origin_P - t\_origin_P$ .....(1)
Consider a statement which uses symb as an operand. The translator puts the address t_symb in the instruction generated for it. Now,
$t_{symb} = t\_origin_P + d_{symb}$
where d_symb is the offset of symb in P. Hence
$l_{symb} = l\_origin_P + d_{symb}$
Using (1),
$l_{symb} = t\_origin_P + relocation\_factor_P + d_{symb} = t_{symb} + relocation\_factor_P$ .....(2)
Let IRP_P designate the set of instructions requiring relocation in program P. Following (2), relocation of program P can be performed by computing the relocation factor for P and adding it to the translation time address(es) in every instruction i ∈ IRP_P.
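The relocation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a program as a dictionary of (opcode, operand address) pairs, the function name relocate and the sample addresses are hypothetical and not part of the original answer.

```python
def relocate(code, irp, t_origin, l_origin):
    """Return a relocated copy of `code`.

    code: dict mapping translated address -> (opcode, operand_address)
    irp:  set of translated addresses of instructions requiring relocation
    """
    relocation_factor = l_origin - t_origin
    relocated = {}
    for t_addr, (opcode, operand) in code.items():
        if t_addr in irp:
            # link time address = translation time address + relocation factor
            operand += relocation_factor
        # every word of the program also moves by the relocation factor
        relocated[t_addr + relocation_factor] = (opcode, operand)
    return relocated


# Hypothetical program translated with origin 500 and linked with origin 900.
program = {500: ("READ", 540), 501: ("STOP", 0)}
result = relocate(program, irp={500}, t_origin=500, l_origin=900)
print(result)
```

Only the instruction listed in the relocation set has its operand changed; the STOP instruction carries no address and is merely moved.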
Linking
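A program unit Pi interacts with another program unit Pj by using addresses of Pj's instructions and data in its own instructions.
To realize such interactions, Pj and Pi must contain public definitions and external references, as defined in the following:
Public definition: a symbol defined in a program unit which may be referenced in other program units.
External reference: a reference to a symbol which is not defined in the program unit containing the reference.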
Q.3
A self relocating program is a program which can perform the relocation of its own address
sensitive instructions.
Code to perform the relocation of address sensitive instructions also exists as a part
of the program. This is called the relocating logic.
The start address of the relocating logic is specified as the execution start address of the
program.
Thus the relocating logic gains control when the program is loaded in memory for the
execution.
It uses the load address and the information concerning address sensitive instructions to perform its own relocation.
This is very important in time sharing operating systems where the load address of a
program is likely to be different for different executions.
Q.4
Sr. No.   Statement                                   Offset
0001      DATA_HERE  SEGMENT
0002      ABC        DW      25                       0000
0003      B          DW      ?                        0002
 .
0012      SAMPLE     SEGMENT
0013                 ASSUME  CS:SAMPLE, DS:DATA_HERE
0014                 MOV     AX, DATA_HERE            0000
0015                 MOV     DS, AX                   0003
0016                 JMP     A                        0005
0017                 MOV     AL, B                    0008
 .
0027      A          MOV     AX, BX                   0196
 .
0043      SAMPLE     ENDS
0044                 END
In the above program, the ASSUME statement declares the segment registers CS and DS to be available for memory addressing.
Hence all memory addressing is performed using suitable displacements from their contents.
This avoids the use of an absolute address; hence the instructions are not address sensitive. No relocation is needed if segment SAMPLE is to be loaded at address 2000, because the CS register would be loaded with the address 2000 by a calling program.
A similar situation exists with the reference to B in statement 17, which is addressed as a displacement from the contents of the DS register.
Since the DS register would be loaded with the execution time address of DATA_HERE, the reference to B would be automatically relocated to the correct address.
2. Linking requirement
In FORTRAN all program units are translated separately, hence all sub program calls
and common variable references require linking.
Pascal procedures are typically nested inside the main program; hence procedure
references do not require linking.
In C, program files are translated separately, so only function calls that cross file boundaries and references to global data require linking.
A name table (NTAB) is defined for use in program linking. Each entry of the table
contains the following fields:
Symbol: symbolic name of an external reference or an object module
Linked_address: for a public definition, this field contains linked address of the
symbol. For an object module, it contains the linked origin of the object module.
Q.5
We discuss the design of a linker for the Intel 8088/80x86 processors which resembles LINK
of MS DOS in many respects.
It may be noted that the object modules of MS DOS differ from the Intel specifications in
some respects.
An Intel 8088 object module is a sequence of object records, each object record describing
specific aspects of the programs in the object module.
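There are 14 types of object records containing the following five basic categories of information:
Binary image (i.e. code generated by a translator)
External references
Public definitions
Debugging information (e.g. line numbers in the source program)
Miscellaneous information (e.g. comments in the source program)
We only consider the object records corresponding to the first three categories, a total of eight object record types.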
Each object record contains variable length information and may refer to the contents of
previous object records.
THEADR record
80H | length | T-module name | check-sum
The module name in the THEADR record is typically derived by the translator from the source
file name.
An assembly programmer can specify the module name in the NAME directive.
LNAMES record
96H | length | name-list | check-sum
The LNAMES record lists the names for use by SEGDEF records.
SEGDEF record
98H | length | attributes (1-4) | segment length (2) | name index (1) | check-sum
A SEGDEF record designates a segment name using an index into the name list of the LNAMES record.
The attributes field of a SEGDEF record indicates whether the segment is relocatable or
absolute, whether (and in what manner) it can be combined with other segments, as also the
alignment requirement of its base address (e.g. byte, word or paragraph, i.e. 16 byte,
alignment).
Stack segments with the same name are concatenated with each other, while common
segments with the same name are overlapped with one another.
The attribute field also contains the origin specification for an absolute segment.
EXTDEF record
8CH | length | external reference list | check-sum
PUBDEF record
90H | length | base (2-4) | name | offset | ... | check-sum
The EXTDEF record contains a list of external references used by the programs of this
module.
A FIXUPP record designates an external symbol name by using an index into this list.
A PUBDEF record contains a list of public names declared in a segment of the object module.
Each (name, offset) pair in the record defines one public name, specifying the name of the
symbol and its offset within the segment designated by the base specification.
LEDATA record
A0H | length | segment index (1-2) | data offset (2) | data | check-sum
An LEDATA record contains the binary image of the code generated by the language translator.
Segment index identifies the segment to which the code belongs, and offset specifies the location of the code within the segment.
FIXUPP record
9CH | length | locat (1) | fix dat (1) | frame datum (1) | target datum (1) | target offset (2) | ... | check-sum
A FIXUPP record contains information for one or more relocation and linking fixups to be
performed.
The locat field contains a numeric code called loc code to indicate the type of a fixup.
Meaning
Offset is to be fixed.
Segment is to be fixed.
locat also contains the offset of the fixup location in the previous LEDATA record.
The frame datum field, which refers to a SEGDEF record, identifies the segment to which the
fixup location belongs.
The target datum and target offset fields specify the relocation or linking information.
Target datum contains a segment index or an external index, while target offset contains an
offset from the name indicated in target datum.
The fix dat field indicates the manner in which the target datum and target offset fields are to
be interpreted.
The numeric codes used for this purpose are given in below table.
code
MODEND record
8AH | length | type (1) | start addr (5) | check-sum
The MODEND record signifies the end of the module, with the type field indicating whether it is the main program.
The start addr field gives the execution start address. This has two components: (a) the segment, designated as an index into the list of segment names defined in SEGDEF record(s), and (b) an offset within the segment.
An overlay is part of a program (or software package) which has the same load origin as
some other part of the program.
An overlay structured program consists of a permanently resident portion, called the root, and a set of overlays.
To start with, the root is loaded in memory and given control for the purpose of execution.
Note that the loading of an overlay overwrites a previously loaded overlay with the same load
origin.
It also makes it possible to execute programs whose size exceeds the amount of memory
which can be allocated to them.
For linking and execution of an overlay structured program in MS DOS, the linker produces a single executable file at the output, which contains two provisions to support overlays.
First, an overlay manager module is included in the executable file to load the overlays when needed.
Second, all calls that cross overlay boundaries are replaced by an interrupt producing instruction.
To start with, the overlay manager receives control and loads the root.
This interrupt is processed by the overlay manager and the appropriate overlay is loaded into
memory.
When each overlay is structured into a separate binary program, as in IBM mainframe
systems, a call which crosses overlay boundaries leads to an interrupt which is attended by
the OS kernel.
Control is now transferred to the OS loader to load the appropriate binary program.
Q.7
1) Compile and go loader
In this scheme the assembler is loaded in one part of memory and places the assembled program directly into its assigned memory locations.
After the loading process is complete, the assembler transfers the control to the starting
instruction of the loaded program.
Advantages
The user need not be concerned with the separate steps of compilation, assembling,
linking, loading, and executing.
[Figure: compile and go loader scheme: source program, compile and go assembler, assembler and program loaded in memory.]
Disadvantages
There is wastage in memory space due to the presence of the assembler.
2) Absolute loader
It is a simple type of loader scheme which fits object code into main memory without
relocation.
This loader accepts the machine text and places it into main memory at the location prescribed by the translator.
Advantage
Very simple
Disadvantage
A program unit Pi interacts with another program unit Pj by using the addresses of Pj's instructions and data in its own instructions.
To realize such interactions, Pj and Pi must contain public definitions and external references.
Public definition: a symbol defined in a program unit which may be referenced in other program units.
External reference: a reference to a symbol which is not defined in the program unit containing the reference.
ENTRY statement: lists the public definitions of the program unit.
EXTRN statement: lists the symbols to which external references are made in the program unit.
4) Relocating loader (BSS loader)
To avoid reassembling all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
In the program shown in the figure, the address of variable X in the instruction ADD AREG, X will be 30.
If this program is loaded from memory location 500 for execution, then the address of X in the instruction ADD AREG, X must become 530.
[Figure: before loading, ADD AREG, X is at offset 10 and X DS 1 is at offset 30; after loading at address 500, X DS 1 is at address 530.]
It is a general re-locatable loader and is perhaps the most popular loading scheme presently
used.
Advantages
Accessing ability
Relocation facility
Disadvantage
6) Dynamic loader
In order for the overlay structure to work, it is necessary for the module loader to load their
various procedures as they are needed.
The portion of the loader that actually interprets the calls and loads the necessary procedure
is called overlay supervisor or flipper.
Q.8
Q.9
The object module of a program contains all information necessary to relocate and link the program with other programs.
The object module of a program P consists of four components:
1. Header: The header contains the translated origin, size and execution start address of P.
2. Program: This component contains the machine language program corresponding to P.
3. Relocation table (RELOCTAB): This table describes IRP_P. Each RELOCTAB entry contains a single field:
Translated address: translated address of an address sensitive instruction.
4. Linking table (LINKTAB): This table contains information concerning the public definitions and external references in P.
Each LINKTAB entry contains three fields:
Symbol: symbolic name.
Type: PD or EXT, indicating whether the entry is a public definition or an external reference.
Translated address: for a public definition, this is the address of the first memory word allocated to the symbol. For an external reference, it is the address of the memory word which is required to contain the address of the symbol.
Example:
        Statement                 Address     Code
        START   500
        ENTRY   TOTAL
        EXTRN   MAX, ALPHA
        READ    A                 500)        + 09 0 540
LOOP                              501)
        .
        .
        MOVER   AREG, ALPHA       518)        + 04 1 000
        BC      ANY, MAX          519)        + 06 6 000
        .
        .
        BC      LT, LOOP          538)        + 06 1 501
        STOP                      539)        + 00 0 000
A       DS      1                 540)
TOTAL   DS      1                 541)
        END

LINKTAB of the example:
Symbol    Type    Translated address
ALPHA     EXT     518
MAX       EXT     519
          PD      540
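The way a linker might use RELOCTAB and LINKTAB can be sketched as follows. This is a minimal illustration, not the design of any particular linker. The dictionary layout of the object module, the function name link, the linked origin 900 and the linked addresses of ALPHA and MAX are all hypothetical. The RELOCTAB contents (500 and 538) and the public definition of TOTAL at 541 are assumptions read off the statements and address column of the example.

```python
def link(module, l_origin, ntab):
    """Relocate one object module and resolve its external references.

    module: dict with keys 't_origin', 'code', 'reloctab', 'linktab'
    ntab:   dict mapping symbol -> linked address; public definitions of
            this module are added to it
    """
    factor = l_origin - module["t_origin"]
    code = dict(module["code"])      # translated address -> (opcode, reg, address)
    # Relocation: add the relocation factor to every address named in RELOCTAB.
    for t_addr in module["reloctab"]:
        op, reg, addr = code[t_addr]
        code[t_addr] = (op, reg, addr + factor)
    # Linking, part 1: enter the public definitions into the name table.
    for symbol, kind, t_addr in module["linktab"]:
        if kind == "PD":
            ntab[symbol] = t_addr + factor
    # Linking, part 2: patch the words that must hold external addresses.
    for symbol, kind, t_addr in module["linktab"]:
        if kind == "EXT":
            op, reg, _ = code[t_addr]
            code[t_addr] = (op, reg, ntab[symbol])
    # Finally, move every word to its linked address.
    return {t_addr + factor: word for t_addr, word in code.items()}


module = {
    "t_origin": 500,
    "code": {500: (9, 0, 540), 518: (4, 1, 0), 519: (6, 6, 0),
             538: (6, 1, 501), 539: (0, 0, 0)},
    "reloctab": [500, 538],                      # assumed from the example
    "linktab": [("ALPHA", "EXT", 518), ("MAX", "EXT", 519),
                ("TOTAL", "PD", 541)],           # PD entry is an assumption
}
ntab = {"ALPHA": 1200, "MAX": 1350}              # hypothetical linked addresses
linked = link(module, l_origin=900, ntab=ntab)
print(linked[918], linked[938], ntab["TOTAL"])
```

Relocation and linking are kept as separate passes here so that each table is consulted for exactly one purpose.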
Q.1
MacroProcessors
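A macro definition consists of:
1. Macro prototype statement: declares the macro name and the formal parameter list.
2. One or more model statements: statements from which assembly statements can be generated during expansion.
3. Macro preprocessor statements: used to perform auxiliary functions during macro expansion.
Macro call: a macro is called by writing the macro name in the mnemonic field, followed by a set of actual parameters.
<macro name> [<actual parameter list>]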
Q.2
This determines the order in which model statements are visited during macro expansion.
A preprocessor statement can alter flow of control during expansion such that model
statements are never visited during expansion (conditional expansion) or repeatedly visited
during expansion (expansion time loop).
The flow control during macro expansion is implemented using a macro expansion
counter(MEC)
Algorithm:
1. MEC:= statement number of first statement following the prototype statement;
2. While statement pointed by MEC is not a MEND statement
(a) If a model statement then
(i) Expand the statement.
(ii) MEC:= MEC+1;
(b) Else (i.e. a preprocessor statement)
(i) MEC:= new value specified in the statement;
3. Exit from macro expansion.
MEC is set to point at the statement following the prototype statement. It is incremented by 1 after expanding a model statement.
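The MEC algorithm above can be sketched in Python as follows. The encoding of the macro body as a list of tuples, the statement kinds MODEL, SET, AIF and AGO, and the use of Python format fields such as {X} in place of &X are assumptions made only for this illustration.

```python
def expand(mdt, env):
    """Expand the macro body in `mdt`; `env` holds parameter and EV values."""
    output = []
    mec = 0                                   # step 1: first statement of the body
    while mdt[mec][0] != "MEND":              # step 2
        kind = mdt[mec][0]
        if kind == "MODEL":                   # (a) expand, then MEC := MEC + 1
            output.append(mdt[mec][1].format(**env))
            mec += 1
        elif kind == "SET":                   # preprocessor statement updating an EV
            _, name, compute = mdt[mec]
            env[name] = compute(env)
            mec += 1
        elif kind == "AIF":                   # (b) conditional change of MEC
            _, condition, target = mdt[mec]
            mec = target if condition(env) else mec + 1
        elif kind == "AGO":                   # (b) unconditional change of MEC
            mec = mdt[mec][1]
    return output                             # step 3: exit from macro expansion


# A hypothetical body in the style of an expansion time loop.
body = [
    ("SET", "M", lambda e: 0),
    ("MODEL", "MOVER AREG, ='0'"),
    ("MODEL", "MOVEM AREG, {X}+{M}"),
    ("SET", "M", lambda e: e["M"] + 1),
    ("AIF", lambda e: e["M"] != e["N"], 2),   # loop back to entry 2
    ("MEND",),
]
statements = expand(body, {"X": "B", "N": 3})
print(statements)
```

The AIF entry shows conditional expansion and an expansion time loop at once: the model statement at entry 2 is visited three times before control reaches MEND.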
2. Lexical substitution
A model statement consists of 3 types of strings:
1. An ordinary string, which stands for itself.
2. The name of a formal parameter, which is preceded by the character '&'.
3. The name of a preprocessor variable, which is also preceded by the character '&'.
During lexical expansion, strings of type 1 are retained without substitution. Strings of types 2 and 3 are replaced by the values of the formal parameters or preprocessor variables.
2.1 Positional parameters
A positional formal parameter starts with the '&' sign and is defined in the operand field of the macro prototype statement.
The actual parameters in a call on a macro using positional parameters are simply ordinary strings.
The value of the first actual parameter of the macro call is assigned to the first positional formal parameter, the value of the second actual parameter to the second positional formal parameter and, in general, the value of the nth actual parameter to the nth positional formal parameter.
2.2 Keyword parameters
A keyword formal parameter starts with a string such as &KW, &OP, &REG or &CC, depending on the macro processor. It is defined in the operand field of the macro prototype statement.
A keyword formal parameter ends with the '=' sign, depending on the macro processor.
A keyword formal parameter may or may not have a default value. Again, this depends on the macro processor.
The actual parameters in a call on a macro using keyword parameters are simply ordinary strings if they are used as positional parameters.
A keyword parameter is always used in place of the mnemonic or in place of operand 1.
The value of a keyword parameter is always a keyword, such as ADD, SUB, AREG, BREG, LT or LE.
2.3 Label parameters
A label formal parameter starts with the string &LAB, depending on the macro processor. It is defined in the operand field of the macro prototype statement.
A label formal parameter ends with the '=' sign, depending on the macro processor.
A label formal parameter does not have a default value. Again, this depends on the macro processor.
The actual parameter in a call on a macro using a label parameter is simply an ordinary string if it is used as a positional parameter.
A macro may be defined to use all kinds of parameters, i.e. positional parameters, keyword parameters and label parameters.
Q.3
Positional parameter
The value of a positional formal parameter XYZ is determined by the rule of positional association:
1. Find the ordinal position of XYZ in the list of formal parameters in the macro prototype statement.
2. Find the actual parameter specification occupying the same ordinal position in the list of actual parameters in the macro call statement.
Keyword parameter
Keyword parameters are used for the following purpose:
1. A default value can be assigned to the parameter.
During a macro call, a keyword parameter is specified by its name. It takes the following form:
<parameter name>=<parameter value>
MACRO
INCR    &VARIABLE=X, &INCR=Y, &REG=AREG
MEND
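The two association rules can be sketched together in Python. The function name build_apt, the representation of the prototype as (name, default) pairs and the sample prototype mixing positional and keyword parameters are hypothetical. The sketch also assumes that an actual parameter containing '=' is always a keyword specification, which would not hold for literals such as ='0'.

```python
def build_apt(formals, actuals):
    """Match actual parameters of a call with the formal parameters.

    formals: list of (name, default) in prototype order; default is None
             for a positional parameter
    actuals: list of strings as written in the macro call
    """
    known = {name for name, _ in formals}
    positional = [name for name, default in formals if default is None]
    # Keyword parameters start out with their default values.
    apt = {name: default for name, default in formals if default is not None}
    position = 0
    for spec in actuals:
        if "=" in spec:                        # <parameter name>=<parameter value>
            name, value = spec.split("=", 1)
            if name not in known:
                raise ValueError(f"unknown keyword parameter {name}")
            apt[name] = value
        else:                                  # same ordinal position as the formal
            apt[positional[position]] = spec
            position += 1
    return apt


# Hypothetical prototype: INCR &MEM_VAL, &INCR_VAL, &REG=AREG
formals = [("MEM_VAL", None), ("INCR_VAL", None), ("REG", "AREG")]
print(build_apt(formals, ["A", "B"]))
print(build_apt(formals, ["A", "B", "REG=BREG"]))
```

In the first call REG keeps its default value AREG; in the second the keyword specification overrides it.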
Q.4
Compare the features of subroutine and macros with respect to following: (i) Execution
Speed (ii) Processing requirement by assembler (iii) Flexibility and generality
Macros are invoked by string replacement, whereas subroutines are invoked by calls.
Because of this replacement, the body of a macro can exist in multiple copies in a program, whereas a subroutine exists in only one copy.
Because multiple copies are possible, you cannot obtain a macro's address, whereas you can obtain a subroutine's address.
Macros can be faster since they do not incur the time penalty of a call and a return.
Macros can be harder to debug since the replacement may obscure the resulting code.
[Table: comparison of macros and subroutines with respect to execution, processing requirements and flexibility; only fragments of the table survive.]
Syntax of a subroutine call: [Label] CALL <Subroutine name>
Example of a macro call: FACTORIAL A, FACT
Example of a subroutine call: CALL FACTORIAL, where FACTORIAL is the name of the subroutine.
Q.5
A model statement in a macro may constitute a call on another macro. Such calls are known
as nested macro calls.
We refer to the macro containing the nested call as the outer macro and the called macro as
the inner macro.
Expansion of nested macro calls follows the last-in-first-out (LIFO) rule. Thus, in a structure of
nested macro calls, expansion of the latest macro call (i.e. the innermost macro call in the
structure) is completed first.
Example
The definition of the INCR_D macro is given below.
        MACRO
        INCR_D   &MEM_VAL=, &INCR_VAL=, &REG=AREG
        MOVER    &REG, &MEM_VAL
        ADD      &REG, &INCR_VAL
        MOVEM    &REG, &MEM_VAL
        MEND
Macro COMPUTE defined below contains a nested call on macro INCR_D defined above.
        MACRO
        COMPUTE  &FIRST, &SECOND
        MOVEM    BREG, TMP
        INCR_D   &FIRST, &SECOND, REG=BREG
        MOVER    BREG, TMP
        MEND
The expanded code for the call COMPUTE X, Y is described as follows.
COMPUTE X, Y
    + MOVEM  BREG, TMP        [1]
      INCR_D X, Y
          + MOVER  BREG, X    [2]
          + ADD    BREG, Y    [3]
          + MOVEM  BREG, X    [4]
    + MOVER  BREG, TMP        [5]
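The LIFO expansion of nested macro calls can be sketched with a recursive function. The table MACROS, the function expand, the parameter names FIRST and SECOND of COMPUTE and the decision to bind the first two parameters of INCR_D by position are assumptions made for this illustration only.

```python
import re

# name: (formal parameters as (name, default or None), model statements)
MACROS = {
    "INCR_D": ([("MEM_VAL", None), ("INCR_VAL", None), ("REG", "AREG")],
               ["MOVER &REG, &MEM_VAL",
                "ADD &REG, &INCR_VAL",
                "MOVEM &REG, &MEM_VAL"]),
    "COMPUTE": ([("FIRST", None), ("SECOND", None)],
                ["MOVEM BREG, TMP",
                 "INCR_D &FIRST, &SECOND, REG=BREG",
                 "MOVER BREG, TMP"]),
}


def expand(line):
    """Return the list of assembly statements that `line` expands to."""
    parts = line.split(None, 1)
    name = parts[0]
    if name not in MACROS:
        return [line]                          # ordinary statement
    formals, body = MACROS[name]
    actuals = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    values = {f: d for f, d in formals if d is not None}
    positional = [f for f, d in formals if d is None]
    for i, actual in enumerate(a for a in actuals if "=" not in a):
        values[positional[i]] = actual
    for actual in (a for a in actuals if "=" in a):
        key, value = actual.split("=", 1)
        values[key] = value
    result = []
    for model in body:
        # Lexical substitution of &NAME by the value of the parameter.
        statement = re.sub(r"&(\w+)", lambda m: values[m.group(1)], model)
        # A nested call is expanded completely before the next model statement.
        result.extend(expand(statement))
    return result


expansion = expand("COMPUTE X, Y")
print("\n".join(expansion))
```

The recursion plays the role of the stack: the expansion of INCR_D, the innermost call, is completed before the remaining model statement of COMPUTE is processed.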
Q.6
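AIF
An AIF statement has the syntax AIF (<expression>) <sequencing symbol>, where <expression> is a relational expression.
If the relational expression evaluates to true, expansion time control is transferred to the statement containing <sequencing symbol> in its label field.
AGO
An AGO statement has the syntax AGO <sequencing symbol>. It unconditionally transfers expansion time control to the statement containing <sequencing symbol> in its label field.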
It is often necessary to generate many similar statements during the expansion of a macro.
Expansion time loops can be written using expansion time variables (EVs) and expansion time
control transfer statements AIF and AGO.
Example
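        MACRO
        CLEAR   &X, &N
        LCL     &M
&M      SET     0
        MOVER   AREG, ='0'
.MORE   MOVEM   AREG, &X+&M
&M      SET     &M+1
        AIF     (&M NE &N) .MORE
        MEND
In a call such as CLEAR B, 3, &M is initialized to 0. The expansion of the model statement MOVEM AREG, &X+&M thus leads to generation of the statement MOVEM AREG, B. The value of &M is then incremented and the loop is repeated until &M equals the value of &N.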
Expansion time variables (EV's) are variables which can only be used during the expansion of
macro calls.
A global EV exists across all macro calls situated in a program and can be used in any macro
which has a declaration for it.
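Local and global EV's are created through declaration statements with the following syntax:
LCL <EV specification> [, <EV specification> ...]
GBL <EV specification> [, <EV specification> ...]
<EV specification> has the syntax &<EV name>, where <EV name> is an ordinary string.
Values of EV's can be manipulated through the preprocessor statement SET, which has the syntax
<EV specification> SET <SET-expression>
where <EV specification> appears in the label field and SET in the mnemonic field.
        MACRO
        CONSTANTS
        LCL     &A
&A      SET     1
        DB      &A
&A      SET     &A+1
        DB      &A
        MEND
The first SET statement assigns the value '1' to A and the first DB statement declares a constant '1'. The second SET statement assigns the value '2' to A and the second DB statement declares a constant '2'.
3. Attributes of formal parameters
An attribute is written using the syntax <attribute name>'<formal parameter spec>.
It represents information about the value of the formal parameter, i.e. about the corresponding actual parameter.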
The type, length and size attributes have the names T, L and S.
Example
        MACRO
        DCL_CONST  &A
        AIF        (L'&A EQ 1) .NEXT
        --
.NEXT   --
        --
        MEND
Here expansion time control is transferred to the statement having .NEXT in its label field only if the actual parameter corresponding to the formal parameter A has a length of '1'.
Q.7
Lexical expansion implies replacement of a character string by another character string during
program generation.
Semantic expansion:
Semantic expansion is characterized by the fact that different uses of a macro can lead to
codes which differ in the number, sequence and opcodes of instructions.
Eg: Generation of type specific instructions for manipulation of byte and word operands.
It can be achieved by a combination of advanced macro facilities like AIF, AGO statements and
expansion time variables.
Here, the number of MOVEM AREG, ... statements generated by a call on CLEAR is determined by the value of the second parameter of CLEAR.
Macro EVAL of example is another instance of conditional expansion wherein one of two
alternative code sequences is generated depending on the peculiarities of actual parameters of
a macro call.
        MACRO
        CREATE_CONST  &X, &Y
        AIF     (T'&X EQ B) .BYTE
&Y      DW      25
        AGO     .OVER
.BYTE   ANOP
&Y      DB      25
.OVER   MEND
This macro creates a constant '25' with the name given by the 2nd parameter. The type of the constant matches the type of the first parameter.
Q.8
Describe task and data structures considered for the design of a macro preprocessor
Macro preprocessor
The macro preprocessor accepts an assembly program containing macro definitions and calls
and translates it into an assembly program which does not contain any macro definition or
calls.
The program form output by the macro preprocessor can now be handed over to an assembler to obtain the target language form of the program.
[Figure: a program with macro definitions and calls is input to the macro preprocessor; its output, a program without macros, is input to the assembler, which produces the target program.]
The listing of tasks identifies the key data structures of the macro preprocessor. To obtain a detailed design of the data structures, it is necessary to apply the practical criteria of processing efficiency and memory requirements.
The tables APT, PDT and EVT contain pairs which are searched using the first component of
the pair as a key-for example, the formal parameter name is used as the key to obtain its
value from APT. This search can be eliminated if the position of an entity within a table is
known when its value is to be accessed. We will see this in the context of APT.
The value of a formal parameter ABC is needed while expanding a model statement using it, viz.
MOVER AREG, &ABC
Let the pair (ABC, ALPHA) occupy entry #5 in APT. The search in APT can be avoided if the model statement appears as
MOVER AREG, (P, 5)
in the MDT, where (P, 5) stands for the words 'parameter #5'.
Thus, macro expansion can be made more efficient by storing an intermediate code for a statement, rather than its source form, in the MDT.
All parameter names could be replaced by pairs of the form (P, n) in the model statement and
preprocessor statement stored in MDT.
An interesting offshoot of this decision is that the first component of the pairs stored in APT is
no longer used during macro expansion, e.g. the information (P, 5) appearing in a model
statement is sufficient to access the value of formal parameter ABC. Hence APT containing
(<formal parameter name>, <value>) pairs is replaced by another table called APTAB which
only contains <value>'s.
To implement this simplification, ordinal numbers are assigned to all parameter of a macro. A
table named parameter name table (PNTAB) is used for this purpose.
Parameter names are entered in PNTAB in the same order in which they appear in the
prototype statement.
The entry # of a parameter's entry in PNTAB is now its ordinal number. This entry is used to replace the parameter name in the model and preprocessor statements of the macro while storing them in the MDT.
In effect, the information (<formal parameter name>, <value>) in APT has been split into two tables:
PNTAB, which contains formal parameter names.
APTAB, which contains formal parameter values (i.e. the actual parameters).
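The split into PNTAB and APTAB can be illustrated with a short Python sketch. The function names encode_body and expand_body and the textual form (P,n) without a space are assumptions of this sketch; they only mirror the idea described above.

```python
import re


def encode_body(prototype_params, model_statements):
    """Build PNTAB and rewrite each &name in the body as (P,n)."""
    pntab = list(prototype_params)             # ordinal number = index + 1

    def to_pair(match):
        return f"(P,{pntab.index(match.group(1)) + 1})"

    mdt = [re.sub(r"&(\w+)", to_pair, s) for s in model_statements]
    return pntab, mdt


def expand_body(mdt, aptab):
    """Replace every (P,n) by the n-th APTAB value; no name search is needed."""
    def to_value(match):
        return aptab[int(match.group(1)) - 1]

    return [re.sub(r"\(P,(\d+)\)", to_value, s) for s in mdt]


# Hypothetical macro with parameters &ABC and &REG.
pntab, mdt = encode_body(["ABC", "REG"], ["MOVER &REG, &ABC"])
print(mdt)                                     # intermediate code kept in MDT
print(expand_body(mdt, ["ALPHA", "AREG"]))     # APTAB holds values only
```

The parameter names are searched once, when the definition is processed; every later expansion indexes APTAB directly.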
[Table: tables of the macro preprocessor and their fields. Surviving entries: parameter name, EV name, SS name, value; SS table (SSTAB): MDT entry #.]
Q.9
Explain design specification task for macro preprocessor with suitable example
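Design Overview
The macro preprocessor performs the following tasks:
1. Identify macro calls in the program.
2. Determine the values of formal parameters.
3. Maintain the values of expansion time variables declared in a macro.
4. Organize expansion time control flow.
5. Determine the values of sequencing symbols.
6. Perform expansion of a model statement.
The following 4 step procedure is followed to arrive at a design specification for each task:
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain the information.
4. Determine the processing necessary to perform the task.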
The macro definition table (MDT) stores the set of preprocessor statements and model statements of a macro. The flow of control during macro expansion determines when a model statement is to be visited for expansion. The macro expansion counter (MEC) is updated after expanding a model statement or on processing a macro preprocessor statement.
Determine values of sequencing symbols
A sequencing symbols table(SST) is maintained to hold this information. The table contains pairs of
the form
(<sequencing symbols name >, <MDT entry #>)
where<MDT entry #> is the number of the MDT entry which contains the model statement defining
the sequencing symbol.
Perform expansion of a model statement
This is a trivial task given the following:
1.
2.
Values of formal parameters and EV's are available in APT and EVT, respectively.
3.
The model statement defining a sequencing symbol can be identified from SST.
4.
Q.10
Write a macro that moves n number from the first operand to the second operand, where n
is specified as third operand of the macro.
        MACRO
        MOVEALL ...
        LCL     &M
&M      SET     ...
.NEXT   MOVER   ...
&M      SET     &M + 1
        AIF     ...
        MEND
Q.11
        MOVER   AREG, &X
        MUL     AREG, &Y
        MOVEM   AREG, &X
        MOVER   AREG, &Y
        MUL     AREG, &Z
        ADD     AREG, &X
        MEND
Q.12
Draw a flow chart and explain simple one pass macro processor.
[Flowchart: simple one pass macro processor. Boxes that survive: Start; MDTC = 1, MNTC = 1; read line from source; is it the MACRO pseudo-op?; update MNT; update PNTAB; read line from source; replace formal parameters; MDTC++; is it MEND?; search in MNT; found?; write into output source file; is it END?; go for assembly.]
In this type of preprocessor, a single pass is used both to construct the data structures and to use them.
It is called a preprocessor because it processes the program before the translator does. It is shown in the figure.
[Figure: source code with macros is input to the one pass macro processor, which uses the tables MNT, MDT, PNTAB, APTAB, SSTAB and KPDTAB and outputs source code without macros.]
Step 2: Read the LC-th line from the source code.
4.2: Read the LC-th line from the source code.
4.3: Find the total number of parameters, keyword parameters and expansion time variables, and store them in MNT.
4.4: Store the values of all pointers in MNT.
4.5: Update PNTAB, KPDTAB, EVNTAB, SSNTAB, SSTAB.
4.6: Increments all the pointers of updated tables.
4.7: MNTP=MNTP+1.
4.8: LC=LC+1.
4.9: Read the LC-th line from the source code, i.e. the input program.
4.10: Isolate label instruction and operand from line and store it into MDT at MDTP location.
4.11: MDTP=MDTP+ 1.
4.12 : If instruction="MEND"
If yes
Go to step 2.
If no
Go to step 4.6.
If no
Go to step 4.
Step 5: Search instruction in MNT.
Step 6: If instruction found in MNT?
If yes
6.1: Find out Actual parameter &store it in APTAB.
6.2: Find out MDTP from MNT.
6.3: Search macro definition from MDT at MDTP position.
6.4: Adjust all model statements as follows.
6.4.1: Replace formal parameters with actual parameters using PNTAB, KPDTAB and APTAB.
6.4.2: Replace each expansion time variable name with its value using EVNTAB, EVTAB.
6.4.3: Find out labels from SSNTAB and its address from SSTAB, sequence label with
sequence number and replace it in old place.
6.5: Write all these adjusted model statements in output source file.
6.6: LC=LC+1.
6.7: Go to step 2.
If no
6.8: If instruction ="END"
If yes
Go to Assembler.
If no
Write the line to the output source file.
LC=LC+1.
Go to step 2.
Q.1
Write unambiguous production rules (grammar) for arithmetic expression
[Figure: parse trees for an arithmetic expression over id and *.]
Q.3
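Left factoring:
If a non-terminal A has alternatives with a common prefix α,
A -> αβ1 | ... | αβn
the productions are rewritten as
A -> αA'
A' -> β1 | ... | βn
EX:
A -> xByA | xByAzA | a
B -> b
Left factored, the grammar becomes
A -> xByAA' | a
A' -> zA | ε
B -> b
Left Recursion:
A grammar is left-recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the leftmost symbol.
Immediate left recursion occurs in rules of the form
A -> Aα | β
where α and β are sequences of non-terminals and terminals, and β doesn't start with A.
The general algorithm to remove immediate left recursion follows. Consider the productions
A -> Aα1 | ... | Aαn | β1 | ... | βm
where A is a left-recursive nonterminal, each α is a nonnull sequence of nonterminals and terminals, and each β is a sequence of nonterminals and terminals that does not start with A.
Replace the A-productions by
A -> β1A' | ... | βmA'
and create a new nonterminal
A' -> ε | α1A' | ... | αnA'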
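Removal of immediate left recursion can be sketched in Python as follows. The function name and the representation of a right-hand side as a list of symbols, with the empty list standing for the empty string, are assumptions of this sketch.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite the productions of `nonterminal` without immediate left recursion.

    alternatives: list of right-hand sides, each a list of symbols.
    Returns a dict of productions; [] stands for the empty string.
    """
    # Alternatives that start with the nonterminal itself, without that symbol.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives that do not start with the nonterminal.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [alt + [new_nt] for alt in others],
        new_nt: [alt + [new_nt] for alt in recursive] + [[]],
    }


# E -> E + T | T   becomes   E -> T E'   and   E' -> + T E' | (empty)
grammar = remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(grammar)
```

The new nonterminal generates the repeated tails on the right, so a top-down parser no longer loops on the leftmost symbol.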
Q.4
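Let CSF be of the form βAπ, such that β is a string of Ts and A is the leftmost NT in CSF.
Exit with success if CSF = α, where α is the source string.
Make a derivation for A according to one of its productions in G, replacing A in CSF by the right-hand side of the production.
Go to step 2.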
Ex:
Consider the grammar
S -> aAb
A -> cd | c
and derive the string acb.
[Figure: parse tree rooted at S, illustrating backtracking from the alternative A -> cd to A -> c.]
Sr. No.   CSF                     Symbol   Prediction
1         E                       <id>     E -> TE'
2         TE'                     <id>     T -> VT'
3         VT'E'                   <id>     V -> <id>
4         <id>T'E'                +        T' -> ε
5         <id>E'                  +        E' -> +E
6         <id>+E                  <id>     E -> TE'
7         <id>+TE'                <id>     T -> VT'
8         <id>+VT'E'              <id>     V -> <id>
9         <id>+<id>T'E'           *        T' -> *T
10        <id>+<id>*TE'           <id>     T -> VT'
11        <id>+<id>*VT'E'         <id>     V -> <id>
12        <id>+<id>*<id>T'E'      -|       T' -> ε
13        <id>+<id>*<id>E'        -|       E' -> ε
14        <id>+<id>*<id>          -        -
The '1' in LL(1) indicates that the grammar uses a look-ahead of one source symbol, that is, the prediction to be made is determined by the next source symbol.
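Non-terminal   FIRST        FOLLOW
E              {(, id}      {$, )}
E'             {+, ε}       {$, )}
T              {(, id}      {+, $, )}
T'             {*, ε}       {+, $, )}
F              {(, id}      {+, *, $, )}

Parsing table:
Non-terminal   <id>         +             *             (            )          -|
E              E => TE'                                 E => TE'
E'                          E' => +TE'                               E' => ε    E' => ε
T              T => FT'                                 T => FT'
T'                          T' => ε       T' => *FT'                 T' => ε    T' => ε
F              F => <id>                                F => (E)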
A parsing table entry PT(nt_i, t_j) indicates what prediction should be made if nt_i is the leftmost NT in a sentential form and t_j is the next source symbol.
A blank entry in PT indicates an error situation.
A source string is assumed to be enclosed between the symbols ' |-' and ' -|'.
Hence the parser starts with the sentential form |- E -|.
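A table driven LL(1) parser for this expression grammar can be sketched in Python. The dictionary TABLE encodes the usual LL(1) parsing table of the grammar E => TE', E' => +TE' | ε, T => FT', T' => *FT' | ε, F => <id> | (E). The token spelling id, the use of an explicit stack and the function name parse are assumptions of this sketch.

```python
# (leftmost NT, next source symbol) -> right-hand side of the prediction
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "-|"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "-|"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens):
    """Return the list of predictions made, or raise SyntaxError."""
    tokens = tokens + ["-|"]                   # end marker of the source string
    stack = ["-|", "E"]                        # sentential form still to be matched
    predictions = []
    pos = 0
    while stack:
        top = stack.pop()
        symbol = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, symbol))
            if rhs is None:                    # blank entry: error situation
                raise SyntaxError(f"unexpected {symbol!r} while expanding {top}")
            predictions.append((top, rhs))
            stack.extend(reversed(rhs))        # leftmost symbol ends up on top
        elif top == symbol:                    # terminal matches the source symbol
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {symbol!r}")
    return predictions


trace = parse(["id", "*", "id", "*", "id", "+", "id"])
for nt, rhs in trace:
    print(nt, "=>", " ".join(rhs) or "ε")
```

Each prediction is selected by a single table lookup on the leftmost NT and the next source symbol, so no backtracking is needed.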
The sequence of predictions made by the parser for the source string |- <id>*<id>*<id>+<id> -| is as follows:
Current sentential form              Symbol   Prediction
|- E -|                              <id>     E => TE'
|- TE' -|                            <id>     T => FT'
|- FT'E' -|                          <id>     F => <id>
|- <id>T'E' -|                       *        T' => *FT'
|- <id>*FT'E' -|                     <id>     F => <id>
|- <id>*<id>T'E' -|                  *        T' => *FT'
|- <id>*<id>*FT'E' -|                <id>     F => <id>
|- <id>*<id>*<id>T'E' -|             +        T' => ε
|- <id>*<id>*<id>E' -|               +        E' => +TE'
|- <id>*<id>*<id>+TE' -|             <id>     T => FT'
|- <id>*<id>*<id>+FT'E' -|           <id>     F => <id>
|- <id>*<id>*<id>+<id>T'E' -|        -|       T' => ε
|- <id>*<id>*<id>+<id>E' -|          -|       E' => ε
|- <id>*<id>*<id>+<id> -|            -        -
Q.5
Q.6
SSM := SSM + 1;
n := n + 1;
goto step 2;
2) Operator Precedence Parsing
What is operator precedence parsing? Show operator precedence matrix for following
operators :+,-,*,(,). Parse following string: |-<id> + <id> * <id>-|(GTU Dec_11,Jan_13)
Operator precedence parsing is based on bottom-up parsing techniques and uses a precedence
table to determine the next action.
Disadvantages
1. It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
Advantages
1. Simple
[Table: operator precedence matrix. Rows are LHS operators and columns are RHS operators, covering the operators of the question together with id, |- and -|; each entry is <, > or =.]
E + E * <id> -|
E + E * E
Retaining only the operators gives + *
Insert |- and -|:
|- + * -|
|- < + > -|
|- -|
Parsing done
Figures (a)-(c) show the stack and the AST when current operator is '+', '*' and '-|'
respectively.
This leads to reduction of '*'. Figure (d) shows the situation after the reduction.
[Figure: stack contents (between SB and TOS) and AST in situations (a)-(e) while parsing |- a + b * c -|.]
Q.7
Stack        Input buffer    Action
$            id-id*id$       Shift
$id          -id*id$         Reduce E->id
$E           -id*id$         Shift
$E-          id*id$          Shift
$E-id        *id$            Reduce E->id
$E-E         *id$            Shift
$E-E*        id$             Shift
$E-E*id      $               Reduce E->id
$E-E*E       $               Reduce E->E*E
$E-E         $               Reduce E->E-E
$E           $               Accept
Recursive-Descent Parsing
Backtracking is needed (If a choice of a production rule does not work, we backtrack to
try other alternatives.)
Not efficient
Predictive Parsing
no backtracking
efficient
Bottom up parser
Bottom-up parsers build parse trees from the leaves and work up to the root.
Bottom-up syntax analysis is known as shift-reduce parsing.
Shift-reduce parsing
Shift input symbols until a handle is found. Then, reduce the substring to the nonterminal on the lhs of the corresponding production.
Operator-precedence parsing
At each reduction step a particular substring matching the right side of a production is replaced
by the symbol on the left of that production, and if the substring is chosen correctly at each
step, a rightmost derivation is traced out in reverse.
Q.9
Integer: [+|-](d)+
Real number: [+|-](d)+.(d)+
Real number with optional fraction: [+|-](d)+.(d)*
Identifier: l(l|d)*
State     Next symbol
          l       d       .
Start     Id      Int
Id        Id      Id
Int               Int     S2
S2                Real
Real              Real
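The state transitions above can be turned into a small table driven scanner. The following Python sketch is an illustration only: the names TRANSITIONS, FINAL and classify are hypothetical, l stands for any letter and d for any digit, and the optional sign is left out because the states listed do not cover it.

```python
# (state, symbol class) -> next state; l = letter, d = digit
TRANSITIONS = {
    ("Start", "l"): "Id", ("Start", "d"): "Int",
    ("Id", "l"): "Id", ("Id", "d"): "Id",
    ("Int", "d"): "Int", ("Int", "."): "S2",
    ("S2", "d"): "Real", ("Real", "d"): "Real",
}
FINAL = {"Id": "identifier", "Int": "integer", "Real": "real"}


def classify(lexeme):
    """Return the class of `lexeme`, or None if the DFA rejects it."""
    state = "Start"
    for ch in lexeme:
        kind = "l" if ch.isalpha() else "d" if ch.isdigit() else ch
        state = TRANSITIONS.get((state, kind))
        if state is None:                      # no transition: not a valid lexeme
            return None
    return FINAL.get(state)                    # S2 and Start are not final states


for text in ["x25", "125", "12.5", "12.", ".5"]:
    print(text, classify(text))
```

A string such as 12. ends in state S2, which is not final, so it is rejected under the rule that requires at least one digit after the point.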
Q.10
Function: newnode(operator, l_operand_pointer, r_operand_pointer) creates a node with appropriate pointer fields and returns a pointer to the node.
1. TOS := SB-1; SSM := 0;
2. Push '|-' on the stack.
3. SSM := SSM+1;
4. If the current source symbol is an operand, then
   x := newnode(source symbol, null, null);
   TOS.operand_pointer := x;
   Go to step 3;
5. While TOS operator .> current operator,
   x := newnode(TOS operator, TOSM.operand_pointer, TOS.operand_pointer);
   pop an entry off the stack;
   TOS.operand_pointer := x;
6. If TOS operator <. current operator, then
   push the current operator on the stack;
   Go to step 3;
7. If TOS operator .= current operator, then
   if TOS operator = '|-' then exit successfully;
   if TOS operator = '(' then
      temp := TOS.operand_pointer;
      pop an entry off the stack;
      TOS.operand_pointer := temp;
      Go to step 3;
8. If no precedence is defined between TOS operator and current operator, then report an error and exit unsuccessfully.
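The algorithm above can be sketched in Python. The precedence relations are supplied by a hypothetical function relation that assumes the usual priorities (* and / above + and -, left associativity), AST nodes are represented as tuples (operator, left, right), and stack entries are [operator, operand_pointer] pairs; none of these choices comes from the original answer.

```python
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}


def relation(tos, cur):
    """Return '<', '>', '=' or None for (TOS operator, current operator)."""
    if tos == "|-":
        return "=" if cur == "-|" else None if cur == ")" else "<"
    if tos == "(":
        return "=" if cur == ")" else None if cur == "-|" else "<"
    if cur == "(":
        return "<"
    if cur in (")", "-|"):
        return ">"
    return ">" if PRIORITY[tos] >= PRIORITY[cur] else "<"


def parse(tokens):
    """Build an AST for `tokens`; anything that is not an operator is an operand."""
    operators = set(PRIORITY) | {"(", ")", "-|"}
    stack = [["|-", None]]                     # step 2
    for symbol in tokens + ["-|"]:             # step 3: next source symbol
        if symbol not in operators:            # step 4: operand becomes a leaf
            stack[-1][1] = (symbol, None, None)
            continue
        while True:
            rel = relation(stack[-1][0], symbol)
            if rel == ">":                     # step 5: reduce the TOS operator
                op, right = stack.pop()
                stack[-1][1] = (op, stack[-1][1], right)
            elif rel == "<":                   # step 6: push the current operator
                stack.append([symbol, None])
                break
            elif rel == "=":                   # step 7
                if stack[-1][0] == "|-":
                    return stack[-1][1]
                _, temp = stack.pop()          # TOS is '(': drop it, keep its operand
                stack[-1][1] = temp
                break
            else:                              # step 8
                raise SyntaxError(
                    f"no precedence between {stack[-1][0]} and {symbol}")


tree = parse(["a", "+", "b", "*", "c"])
print(tree)
```

For a + b * c the operator * is reduced first, when -| arrives, and its node becomes the right operand of +.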
Q.11