System Programming
Class: TECSE-I
Unit 1: Language Processor
➢ Introduction: -
• The designer expresses the ideas in terms related to the application
domain of the software.
• To implement these ideas, their description has to be interpreted in
terms related to the execution domain of the computer system.
• We use the term semantics to represent the rules of meaning of a domain,
and the term semantic gap to represent the difference between the
semantics of the two domains.
• The semantic gap has many consequences. Some of the important
ones are large development effort and poor quality of
software.
• These issues are tackled by software engineering through the use of
methodologies and programming languages (PLs).
Semantic gap
b) PL implementation steps.
➢ Interpreters: -
• An interpreter is a language processor which bridges an execution
gap without generating a machine language program.
• Interpreter is a language translator.
• The absence of target program implies the absence of an output
interface of the interpreter.
• Thus the language processing activities.
• The interpreter domain encompasses the PL domain as well as the
execution domain.
• Thus the specification language of the PL domain is identical to the
specification language of the interpreter domain; the interpreter also
incorporates the execution domain.
➢ Program Generation: -
• The program generator is a software system which accepts the
specification of a program to be generated, and generates a program
in the target PL.
• The program generator introduces a new domain between the
application domain and the PL domain, i.e. the program generator domain.
• The specification gap is now the gap between the application domain and
the program generator domain. This gap is smaller than the gap between
the application domain and the target PL domain.
• Reduction in the specification gap increases the reliability of the
generated program, since the program generator domain is close to
the application domain.
• The execution gap between the target PL domain and the execution
domain is bridged by the compiler or interpreter for the PL.
d) Program interpretations:
➢ Analysis Phase: -
• The analysis phase uses each component of the source language
specification to determine relevant information concerning a
statement in the source program.
• Thus, analysis of a source statement consists of lexical, syntax, and
semantic analysis.
➢ Synthesis Phase: -
• The synthesis phase is concerned with the construction of target
language statements which have the same meaning as a source
statement.
• Typically, this consists of two main activities:
▪ Creation of data structures in the target program.
▪ Generation of code.
Example:
1. i int
2. a real
3. b real
4. i* real
5. temp real
Intermediate code:
1) Convert (Id, #1) to real, giving (Id, #4).
2) Add (Id, #4) to (Id, #3), giving (Id, #5).
3) Store (Id, #5) in (Id, #2).
• Lexical analysis(Scanning):
▪ Lexical analysis identifies the lexical units in the source
statement.
▪ It then classifies the units into different lexical classes, e.g.
ids, constants, reserved ids, etc.
▪ And enters them into different tables.
▪ Lexical analysis builds a descriptor, called a token, for each
lexical unit. A token contains two fields: the class code, which
identifies the class to which the lexical unit belongs, and the number
in class, which is the entry number of the lexical unit in the relevant
table.
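The token-building step can be sketched in Python as follows. This is only an illustration: the class codes (Id, Const, Res, Op), the set of reserved identifiers and the function name tokenize are assumptions, not part of the notes. Entry numbers follow the order in which units are first seen.

```python
import re

RESERVED = {"int", "real", "if", "then", "else"}  # assumed reserved ids

def tokenize(statement):
    """Return (tokens, tables); each token is a (class code, number in class) pair."""
    tables = {"Id": [], "Const": [], "Res": [], "Op": []}
    tokens = []
    pattern = r"[A-Za-z_]\w*|\d+(?:\.\d+)?|:=|[-+*/^=();:,]"
    for unit in re.findall(pattern, statement):
        if unit in RESERVED:
            cls = "Res"
        elif unit[0].isalpha() or unit[0] == "_":
            cls = "Id"
        elif unit[0].isdigit():
            cls = "Const"
        else:
            cls = "Op"
        table = tables[cls]
        if unit not in table:          # enter the unit in its table once
            table.append(unit)
        tokens.append((cls, table.index(unit) + 1))
    return tokens, tables
```

For the source text `i : int; a, b : real; a := b + i;` the Id table becomes i, a, b, so the identifier tokens of the assignment are (Id, 2), (Id, 3) and (Id, 1).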
• Syntax analysis:
▪ Syntax analysis processes the string of tokens built by lexical
analysis to determine the statement class,
▪ e.g. assignment statement, if statement, etc.
▪ It then builds an IC which represents the structure of the
statement. The IC is passed to semantic analysis to determine
the meaning of the statement.
• Semantic analysis:
▪ Semantic analysis of declaration statements differs from the
semantic analysis of imperative statements.
▪ The latter identifies the sequence of actions necessary to
implement the meaning of the source statement.
▪ In both cases the structure of a source statement guides the
application of the semantic rules.
▪ When semantic analysis determines the meaning of a subtree
in the IC, it adds information to a table or adds an action to the
sequence of actions. It then modifies the IC to enable further
semantic analysis.
▪ The analysis ends when the tree has been completely
processed.
▪ The updated tables and the sequence of actions constitute the
IR produced by the analysis phase.
o Memory allocation:
▪ Memory allocation is a simple task given the presence of the
symbol table.
▪ The memory requirement of an identifier is computed from its
type, length and dimensionality, and memory is allocated to
it.
▪ The address of the memory area is entered in the symbol table.
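A minimal sketch of this step, assuming a symbol table held as a list of dictionaries. The sizes in WORD_SIZE, the field names and the origin are illustrative assumptions only.

```python
WORD_SIZE = {"int": 1, "real": 2}   # assumed sizes, in memory words

def allocate(symbols, origin=0):
    """Assign an address to every symbol table entry; return the next free address."""
    address = origin
    for entry in symbols:
        count = 1
        for d in entry.get("dims", ()):   # dimensionality: product of array bounds
            count *= d
        entry["size"] = WORD_SIZE[entry["type"]] * count
        entry["address"] = address        # address entered in the symbol table
        address += entry["size"]
    return address
```

With entries i (int), a (real) and b (real, one dimension of 10) and origin 100, the addresses come out as 100, 101 and 103, and the next free word is 123.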
o Code Generation:
▪ Code generation uses knowledge of the target
architecture, viz. knowledge of instructions and addressing
modes in the target computer, to select the appropriate
instructions.
▪ The important issues in code generation are:
1) Determine the places where the intermediate results should
be kept, i.e. whether they should be kept in memory
locations or held in machine registers. This is a preparatory
step for code generation.
2) Determine which instructions should be used for type
conversion operations.
3) Determine which addressing modes should be used for
accessing variables.
Fig. Back End of a Toy Compiler
• Classification of grammars:
o Grammars are classified on the basis of the nature of productions used
in them (Chomsky, 1963).
o Each grammar class has its own characteristics and limitations.
o Type 0 grammars: These grammars, also known as phrase
structure grammars, contain productions of the form
α ::= β
where both α and β can be strings of Ts and NTs. Such productions
permit arbitrary substitution of strings during derivation or
reduction, hence they are not relevant to the specification of
programming languages.
α A β ::= α π β
Note that this grammar also satisfies the requirements of a type 2 grammar. Type 3
grammars are also known as linear grammars or regular grammars.
• YACC:
YACC (Yet Another Compiler Compiler) is the standard
parser generator for the UNIX Operating System. Each string
specification in the input to YACC resembles a grammar production.
The parser generated by YACC performs reductions
according to this grammar. The actions associated with a string
specification are executed when a reduction is made according to the
specification.
An attribute is associated with every nonterminal
symbol. The value of this attribute can be manipulated during
parsing. The attribute can be given any user designed structure. The
symbol ‘$n’ refers to the attribute of the nth symbol in the RHS of the
string specification, and ‘$$’ represents the
attribute of the LHS symbol of the string specification.
Unit 2: ASSEMBLERS
Statement format:-
An assembly language statement has the following
format:
[Label] <Opcode> <operand spec>[,<operand spec>…]
where,
- the 1st specification refers to the memory word with which the name AREA is
associated.
Fig 2.1 lists the mnemonic opcodes for machine instructions. The MOVE
instructions move a value between a memory word and a register. In the
MOVER instruction the second operand is the source operand and the first operand
is the target operand. A comparison instruction sets a condition code analogous
to a subtract instruction without affecting the values of its operands.
1. Imperative statements:-
An imperative statement indicates an action to be performed during the
execution of the program. Each imperative statement translates into one
machine instruction.
2. Declaration statements:-
The syntax of declaration statement is as follows:
[Label] DS <constant>
[Label] DC <Value>
The DS is declare storage. The DS statement reserves areas of memory
and associates names with them.
For eg:- The first statement reserves a memory area of 1 word and
associates the name A with it.
The second statement reserves a block of 200 memory words. The name
G is associated with first memory word. Other words can be accessed
through offsets from G.
e.g. :- G+5 for 6th word of memory block etc.
DC is short for declare constant; a DC statement constructs
memory words containing constants.
The statement:-
ONE DC ‘1’
associates the name ONE with a memory word containing the value ‘1’.
Constants can be declared by the programmer in different forms-decimal,
binary, hexadecimal.
Use of Constants:-
The DC statement does not really implement constants; it just
initializes memory words to the given values. These
values may be changed by moving a new value into the memory word. An
assembly program can use constants in two ways – as immediate
operands and as literals.
The immediate operands can be used in an assembly
statement only if architecture of target machine includes the necessary
features.
Consider the assembly statement
ADD AREG,5
This statement is translated into an instruction with two operands: AREG
and the value ‘5’ as an immediate operand.
• A literal is an operand with the syntax =’[value]’
It differs from a constant because its location cannot be specified in the
assembly program. Due to this fact, its value is not changed during the
execution of the program. It differs from an immediate operand because
no specific architecture is needed for its use. An assembler handles a
literal by mapping its use into the features of assembly language.
3. Assembler Directives:-
Assembler directives instruct the assembler to
perform certain activities during the assembly of a program. Some
assembler directives are:-
a. START [constant]:-
This directive indicates that the first word of the
target program should be placed in the memory word with
address [constant].
b. END [operand spec]:-
This directive indicates the end of the source
program. The [operand spec] indicates the address of the instruction
where execution of the program begins.
1. Synthesis phase :-
The first item of information depends on the source program, so it must be
made available by the analysis phase.
The second item does not depend on the source program; it depends on the
assembly language. Hence the synthesis phase can determine this
information itself.
e.g. (101, ONE)
By the time the END statement is processed, the symbol table contains
the addresses of all symbols defined in the source program and the TII
(table of incomplete instructions) contains information describing forward
references. The assembler can now process each entry in the TII to complete
the concerned instruction.
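The handling of forward references through the TII can be sketched as below. This is a simplified illustration, not the assembler of the notes: statements are assumed to be (label, opcode, operand) triples, register operands are omitted, and every statement is assumed to occupy one word.

```python
def assemble(program, origin=100):
    """Single pass over the program, then backpatch the forward references."""
    symtab = {}   # symbol -> address
    tii = []      # table of incomplete instructions: (instruction address, symbol)
    code = {}     # address -> [opcode, operand address or constant]
    lc = origin
    for label, opcode, operand in program:
        if label:
            symtab[label] = lc
        if opcode == "END":
            break
        if opcode == "DC":
            code[lc] = ["DC", int(operand)]
        elif operand in symtab:
            code[lc] = [opcode, symtab[operand]]
        else:
            code[lc] = [opcode, None]          # address not yet known
            tii.append((lc, operand))
        lc += 1
    for address, symbol in tii:                # complete the incomplete instructions
        code[address][1] = symtab[symbol]
    return code, symtab, tii

PROGRAM = [
    ("A", "DC", "5"),
    (None, "ADD", "ONE"),      # forward reference to ONE
    (None, "MOVEM", "A"),
    ("ONE", "DC", "1"),
    (None, "END", None),
]
```

Calling assemble(PROGRAM) records the entry (101, ONE) in the TII while the ADD statement is processed; once END is reached, ONE is known to be at address 103 and the instruction at 101 is completed.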
The tasks performed by the passes of a two pass assembler are as follows:-
PASS 1:-
(1) Separate the symbol, mnemonic opcode and operand fields.
(2) Build the symbol table.
(3) Perform LC processing.
(4) Construct intermediate representation
PASS 2:-
The Pass 1 performs analysis of the source program and synthesis of the
intermediate representation while Pass 2 processes the intermediate
representation (IR) to synthesize the target program. Before the details of design
of assembler passes, we should know about advanced assembler directives.
Advanced Assembler Directives:-
(1) ORIGIN:-
The syntax of this directive is
ORIGIN <address spec>
where <address spec> is an <operand spec> or <constant>. This directive
indicates that LC should be set to the address given by <address spec>. The
‘ORIGIN’ statement is useful when the target program does not consist of
consecutive memory words. The ability to use an <operand spec> in the ORIGIN statement
provides the ability to perform LC processing in a relative manner rather
than an absolute manner.
(2) EQU:-
<symbol> EQU <address spec>
where <address spec> is an <operand spec> or <constant>. The EQU
statement defines the symbol to represent <address spec>. This differs from
a DC/DS statement in that it simply associates the name <symbol> with <address spec>.
Fig 2.7 An assembly program illustrating ORIGIN
(3) LTORG:-
The LTORG statement permits a programmer to specify where
literals should be placed. By default, the assembler places the literals after the END
statement. At every LTORG statement, the assembler allocates memory to the
literals of a literal pool. This pool contains all the literals used in the program.
The LTORG directive has less relevance (applicability) for simple
assembly languages because allocation of literals at intermediate points in the
program is efficient rather than at the end.
Pass 1 Algorithm: -
(1) Loc ctr = ‘0’ (default value)
Littab ptr = ‘1’
Pool tab ptr = ‘1’
Mnemonic field: -
The mnemonic field contains a pair of the form,
(statement class, code)
where statement class can be IS (Imperative statement), DL (Declaration
statement) or AD (Assembler directive).
For imperative statement, code is the instruction opcode in the machine
language.
For declaration and assembler directives, code is an ordinal number within
the class.
Figure 2.11 shows the codes for various declaration statements and
assembler directives.
Fig 2.12 Codes for declaration statements and directives
Variant 1: -
We consider two variants of intermediate code which differ in the
information contained in their operand fields.
The address field is assumed to contain identical information in
both variants.
The first operand is represented by a single digit
• 1-4 for AREG-DREG
• 1-6 for LT-ANY
The second operand which is a memory operand is represented by
(operand class, code)
where operand class is one of C (constants), S (symbol), L (literals).
• For constant, the code field contains the internal representation of
the constant itself.
• For a symbol or literal, the code field contains the ordinal number of the
operand’s entry in SYMTAB or LITTAB.
Fig 2.13 Intermediate code – Variant 1
Variant 2: -
In variant II,
(f) If size != 0
Store the memory buffer code in code area address.
Loc ctr = loc ctr + size
where <macro name> appears in the mnemonic field of assembly Statement and
<formal parameter specification> is of the form
2. Model statements:-
A model statement is a statement from which assembly language statements
may be generated during macro expansion.
Macro call
A macro is called by writing the macro name in the mnemonic field of an
assembly statement. The macro call has syntax
MEND
1. Lexical expansion.
2. Semantic expansion
1. Lexical expansion:-
Lexical expansion implies replacement of a character string by another
character string during program generation. It replaces the occurrences of
formal parameter by corresponding actual parameters.
MEND
INCR A, B, AREG
MEM_VAL A
INCR_VAL B
REG AREG
+ MOVER AREG, A
+ ADD AREG, B
+ MOVEM AREG, A
2. Semantic expansion:-
Semantic expansion implies generation of instructions tailored to the
requirements of a specific usage e.g. generation of type specific instruction for
manipulation of byte and word operands. It is characterized by the fact that
different use of macro can lead to codes which differ in the number, sequence
and opcodes of instructions.
LCL &M
&M SET 0
MEND
+ MOVEM AREG, B
Expansion time control flow: This determines the order in which model
statements are visited during macro expansion.
Preprocessor statements can alter the flow of control during expansion. So some
model statements are either never visited during expansion or are repeatedly
visited during expansion.
(ii)MEC: = MEC+1;
Lexical Substitution
A model statement consists of 3 types of strings:
Types of parameters
1. Positional parameters.
2. Keyword parameters.
3. Default parameters.
1. Positional parameters
A positional formal parameter is written as &<parameter name>, e.g.
&SAMPLE where SAMPLE is the name of the parameter. An <actual parameter
specification> is simply an <ordinary string>. The value of positional parameter
XYZ is determined by the rule of positional association as follows:
(i) Find the ordinal position of XYZ in the list of formal parameters in the
macro prototype statement.
(ii) Find the actual parameter specification occupying the same ordinal position
in the list of actual parameters in the macro call statement. Let this be the
ordinary string ABC. Then the value of formal parameter XYZ is ABC.
MEND
INCR A, B, AREG
MEM_VAL A
INCR_VAL B
REG AREG
+ MOVER AREG, A
+ ADD AREG, B
+ MOVEM AREG, A
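The rule of positional association can be sketched as follows. The macro body used here is an assumption reconstructed from the expansion shown above, and the function name expand is hypothetical.

```python
import re

# Assumed prototype and body of INCR (reconstructed from the expansion above)
FORMALS = ["MEM_VAL", "INCR_VAL", "REG"]
BODY = [
    "MOVER &REG, &MEM_VAL",
    "ADD &REG, &INCR_VAL",
    "MOVEM &REG, &MEM_VAL",
]

def expand(formals, body, call):
    """Lexical expansion of one macro call with positional parameters."""
    _name, _, rest = call.strip().partition(" ")
    actuals = [a.strip() for a in rest.split(",")]
    values = dict(zip(formals, actuals))     # same ordinal position -> value

    def substitute(match):
        return values[match.group(1)]

    return ["+ " + re.sub(r"&(\w+)", substitute, line) for line in body]
```

expand(FORMALS, BODY, "INCR A, B, AREG") returns the three expanded statements listed above.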
2. Keyword parameters
A keyword formal parameter is written as ‘&<parameter name>=’. The <actual
parameter specification> is written as <formal parameter name>=<ordinary
string>.
(i) Find the actual parameter specification which has the form XYZ=<ordinary
string>.
(ii) Let <ordinary string> in the specification be the string ABC. Then the value
of formal parameter XYZ is ABC.
MEND
or
MEM_VAL A
INCR_VAL B
+ MOVER AREG,A
+ ADD AREG,B
+ MOVEM AREG,A
MEND
The first call overrides to us a default specification for the parameter BREG.
MEM_VAL A
INCR_VAL B
REG AREG
+ MOVER AREG,A
+ ADD AREG,B
+ MOVEM AREG,A
4. Macro with mixed parameter lists
A macro may be defined to use both positional and keyword parameters. In such
case, all positional parameters must precede all keyword parameters.
MEND
INCR A, B, REG=BREG
+ ADD AREG,B
+ MOVEM AREG,A
MEND
The formal parameters and their values
X A
Y B
OP MULT
LAB LOOP
+ ADD AREG,B
+ MOVEM AREG,A
3.3 Nested Macro Calls:-
A model statement in a macro may constitute a call on another macro. Such
calls are known as nested macro calls. Expansion of nested macro calls follows
the last-in-first-out (LIFO) rule. Thus, in a structure of nested macro calls,
expansion of the latest macro call (i.e. the innermost macro call in the structure)
is completed first.
MEND
MACRO
MEND
3.4 Advanced Macro Facilities:-
Advanced macro facilities are aimed at supporting semantic expansion. These
facilities can be grouped into
3. Attributes of parameters
.<ordinary string>
An SS (sequencing symbol) is defined by putting it in the label field of a statement in the macro body.
It is used as an operand in an AIF or AGO statement to designate the destination
of an expansion time control transfer. It never appears in the expanded form of a
model statement.
AGO <sequencing symbol>
<sequencing symbol> ANOP
Example 10:
MACRO
AGO .OVER
(i) Local EV: A local EV is created for use only during a particular macro call
and has following syntax.
(ii) Global EV: A global EV exists across all macro calls situated in a program
and can be used in any macro which has a declaration for it. It has following
syntax
<EV specification> has the syntax &<EV name>, where <EV name> is an
ordinary string. Values of EVs can be manipulated through the preprocessor
statement SET. A SET statement is written as
where <EV specification> appears in the label field and SET in mnemonic field.
A SET statement assigns the value of <SET expression> to the EV specified in
<EV specification>. The value of EV can be used in any field of a model
statement and in the expression of an AIF statement.
Example 11:
LOCAL
MACRO
CONSTANTS
LCL &A
&A SET 1
DB &A
&A SET &A+1
DB &A
MEND
It represents information about the value of the formal parameter i.e. about
corresponding actual parameter. The type, length and size attributes have the
names T, L and S.
Example 12:
MACRO
DCL_CONST &A
.NEXT -----
-----
-----
MEND
Here expansion time control is transferred to the statement having .NEXT in its
label field only if the parameter corresponding to the formal parameter A has
the length of ‘1’.
generates efficient code to evaluate A-B+C in AREG. When the first two
parameter of a call are identical then EVAL should generate a single MOVER
instruction to load the 3rd parameter into AREG. This is achieved as follows:
Example 13:
MACRO
AGO .OVER
.OVER MEND
CLEAR &A
MEND
When called as CLEAR B, the statement puts the value 0 in AREG, while the
three MOVEM statements store this value in 3 consecutive bytes with the
address B, B+1, B+2. The same can be achieved by writing an expansion time
loop which visits the model statement, or set of model statements, repeatedly
during macro expansion. Expansion time loops can be written using EV’s and
expansion time control transfer statements AIF and AGO.
Example 15:
MACRO
LCL &M
&M SET 0
MEND
Consider expansion of macro call
CLEAR B, 3
+ MOVEM AREG, B
1. REPT statement
2. IRP statement
REPT statement
Syntax: REPT <expression>
Example 16: Following example illustrates the use of this facility to declare
10 constants with the value 1, 2, …, 10
MACRO
CONST10
LCL &M
&M SET 1
REPT 10
DC ‘&M’
MEND
IRP statement
Syntax: IRP <formal parameter>, <argument list>
The formal parameter mentioned in the statements takes successive values from
the argument list. For each value, the statements between the IRP and ENDM
statements are expanded once.
Example 17:
MACRO
DC ‘&Z’
ENDM
MEND
Example 18:
MACRO
&Y DW 25
AGO .OVER
.BYTE ANOP
&Y DB 25
.OVER MEND
This macro creates a constant ‘25’ with the name given by the 2nd parameter.
The type of the constant matches the type of the first parameter.
Example 19:
MACRO
LCL &M
&M SET 0
MEND
Example 20:
MACRO
AGO .OVER
.OVER MEND
3.5 Design of Macro Preprocessor:-
The macro preprocessor accepts an assembly program containing macro definitions
and calls and translates it into an assembly program which does not contain any
macro definitions or calls. The program form output by the macro preprocessor
can now be handed over to an assembler to obtain the target language form of
the program. Thus the macro preprocessor segregates macro expansion from the
process of program assembly.
Macro Name Table (MNT) is designed to hold the names of all macro definitions
in a program. A macro name is entered in this table when a macro definition is
processed. While processing a statement in the source program, the
preprocessor compares the string found in its mnemonic field with the macro
names in MNT. A match indicates that the current statement is a macro call.
If a macro call statement does not specify a value for some parameter then its
default value would be copied from PDT to APT.
Expansion Time Variable’s Table (EVT) is maintained for this purpose. Each
entry in the table is a pair
Macro definition Table (MDT) is used to store the body of macro. The flow of
control during macro expansion determines when a model statement is to be
visited for expansion.
where < MDT entry #> is the number of the MDT entry which contains the
model statement defining the sequencing symbol. This entry is made on
encountering a statement which contains the sequencing symbol in its label field
or on encountering a reference prior to its definition.
2. Values of formal parameter and EV’s are available in APT and EVT
respectively.
Similar analysis leads to splitting of EVT into EVNTAB and EVTAB and
SST into SSNTAB and SSTAB. PDT is replaced by a keyword parameter
default table (KPDTAB).
LCL &M
&M SET 0
MEND
1. SSNTAB_PTR := 1;
PNTAB_PTR := 1;
KPDTAB_PTR := 1;
SSTAB_PTR := 1;
MDT_PTR := 1;
2. Process the macro prototype statement and form the MNT entry
a) name := macro_name;
c) KPDTP := KPDTAB_PTR;
v) #KP := #KP+1
e) MDTP := MDT_PTR
f) #EV := 0
g) SSTP := SSTAB_PTR
else
q := SSNTAB_PTR
SSNTAB_PTR := SSNTAB_PTR+1
SSTAB[SSTP+q-1] := MDT_PTR
v. MDT_PTR := MDT_PTR+1
4. MEND statement
If SSNTAB_PTR = 1 then
SSTP=0
Else
KPDTAB[KPDTP]……KPDTAB[KPDTP+#KP-1] into
APTAB[#PP+1]…..APTAB[#PP+#KP]
b) If a SET statement with the specification (E, #m) in the label field then
value in EVTAB[m].
MEC := SSTAB[SSTP+s-1];
MEC := SSTAB[SSTP+s-1]
Pass I
2. SYMTAB construction.
Pass II
1. Macro expansion.
3. Processing of literals.
Pass III
The pass structure can be simplified if attributes of actual parameters are not to
be supported. The macro preprocessor would then be a single pass program.
Integrating pass I of the assembler with preprocessor would give us the
following two pass structure
Pass I
2. Macro expansion.
4. Processing of literals.
Pass II
Two aspects:
– Generate code.
– Provide Diagnostics.
To understand the implementation issue, we should know PL
features contributing to semantic gap between PL and Execution domain.
PL Features:
1. Data Types
2. Data Structure
3. Scope Rules
4. Control Structures
1. Data type
Definition: A data type is the specification of
(i) Values that entities of the type may have
(ii) Operations that may be performed on entities of the type
The following tasks are involved:
1. Check legality of operation for types of operand
2. Use type conversion operation
3. Use appropriate instruction sequence of the target machine
var
x, y : float;
i,j : integer;
Begin
y := 10;
x := y + i;
Type conversion of i is needed.
i: integer;
CONV_R AREG, I
ADD_R AREG, Y
MOVEM AREG, X
2. Data Structure
• A PL permits the declaration and use of data structures (DS).
• To compile a reference to an element of a DS, the compiler must develop a memory
mapping to access the allocated area.
• A record, being a heterogeneous DS, leads to complex memory mapping.
• A user defined DS requires a mapping of a different kind.
• A proper combination of mappings is required to manage such complexity of structure.
• Two kinds of mapping are involved
– Mapping array reference
– Access field of record
Example:
Program example (input, output);
type
employee = record
name : array [1..100] of character;
sex : character;
id: integer
end;
var
info : array [1..500] of employee;
i,j : integer;
begin {main program}
info[i].id := j;
end
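The two mappings needed for info[i].id (array reference, then field access) can be sketched as follows. The element sizes, the absence of padding and the base address are assumptions made only for this illustration.

```python
CHAR_SIZE, INT_SIZE = 1, 4   # assumed sizes in bytes, no padding

# Fields of the employee record, in declaration order
FIELDS = [("name", 100 * CHAR_SIZE), ("sex", CHAR_SIZE), ("id", INT_SIZE)]

def field_offsets(fields):
    """Return ({field: offset within record}, record size)."""
    offsets, offset = {}, 0
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets, offset

def address_of(base, index, field, lower_bound=1):
    """Address of info[index].field for an array starting at address base."""
    offsets, record_size = field_offsets(FIELDS)
    return base + (index - lower_bound) * record_size + offsets[field]
```

Under these assumptions a record occupies 105 bytes and id lies at offset 101, so with info at address 1000 the field info[3].id is at 1000 + 2*105 + 101 = 1311.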
3. Scope Rules
• Determine the accessibility of variable declared in different blocks of a program.
• E.g.: x, y : real;
y, z : integer;
x := y; B A
• The statement x := y uses the value of y declared in block ‘B’.
• To determine accessibility of variable compiler performs operation:
– Scope Analysis
– Name Resolution
4. Control Structure
• Def: It is collection of language features for altering flow of control during
execution.
• This includes:
– Conditional transfer of control
– Conditional execution
– Iterative control
– Procedure call
• Compiler must ensure non-violation of program semantics.
Eg: for i := 1 to 100 do
begin
lab1 : if i = 10 then…
end;
• Forbidden: control is transferred to lab1 from outside the loop.
• Assignment statements are also not allowed.
4.2 Memory Allocation:-
6. E.g.: Fortran
7. No. of flavors or types
- 1. Automatic allocation
- 2. Program controlled allocation
1. Scope Rules
5. Recursion
1. Scope Rules:
• If variables var (i) is created with name name(i) in block B.
• Rule 2: Rule 1 + ‘B’ is enclosed in ‘B’ unless ‘B’ contains declaration using
same name name(i)
B’ = non-local
A{
x,y,z : integer;
B{ g : real;
C{ h,z : real;
}C
}B
D{ i,j : integer;
}D
}A
Fig. Block structured program.
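Name resolution for the block structured program above can be sketched as a search through the enclosing blocks, innermost first. The table layout and the function name resolve are assumptions for this illustration.

```python
# Blocks of the program above: enclosing block and local declarations
BLOCKS = {
    "A": {"parent": None, "decls": {"x": "integer", "y": "integer", "z": "integer"}},
    "B": {"parent": "A", "decls": {"g": "real"}},
    "C": {"parent": "B", "decls": {"h": "real", "z": "real"}},
    "D": {"parent": "A", "decls": {"i": "integer", "j": "integer"}},
}

def resolve(name, block):
    """Return (declaring block, type) for a name used in block, or None if inaccessible."""
    while block is not None:
        decls = BLOCKS[block]["decls"]
        if name in decls:
            return block, decls[name]
        block = BLOCKS[block]["parent"]   # look in the enclosing block
    return None
```

A use of z in block C resolves to the real z declared in C, a use of z in block B resolves to the integer z of A, and g is not accessible from D.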
1. TOS := TOS + 1;
2. TOS* := ARB;
3. ARB := TOS;
4. TOS := TOS + 1;
6. TOS := TOS + n;
De-allocation:
1. TOS := ARB – 1;
2. ARB := ARB* ;
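The allocation and de-allocation steps above can be traced with a small simulation. The memory array, the initial values of TOS and ARB and the class name are assumptions; step 5 is not given in the list above, so the slot reserved in step 4 is simply left unset here.

```python
class ActivationStack:
    def __init__(self, size=32):
        self.mem = [None] * size
        self.tos = -1   # assumed: -1 means the stack is empty
        self.arb = -1   # assumed: -1 means there is no activation record yet

    def allocate(self, n):
        """Create an activation record with n words of local data."""
        self.tos += 1                   # 1. TOS := TOS + 1
        self.mem[self.tos] = self.arb   # 2. TOS* := ARB (save the old ARB)
        self.arb = self.tos             # 3. ARB := TOS
        self.tos += 1                   # 4. TOS := TOS + 1
        # step 5 is missing above; the reserved slot is left unset
        self.tos += n                   # 6. TOS := TOS + n

    def deallocate(self):
        """Remove the topmost activation record."""
        self.tos = self.arb - 1         # 1. TOS := ARB - 1
        self.arb = self.mem[self.arb]   # 2. ARB := ARB*
```

Allocating a record with 3 local words and then one with 2 leaves ARB pointing at the second record; two de-allocations restore the empty stack.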
3. Accessing non-local variables
– n1_var : is a non local variable
• Then a textual ancestor of block b_use is a block which encloses block b_use,
that is, b_def.
– Static Pointers
– Display
• At the time of creation of AR for Block B its static pointer is set to point AR of
static ancestor of b.
3. r := 1(r)
– TOS := TOS + 1
• r:= ARB;
• r:= 1(r);
• r:= 1(r);
(ii) Displays:
• For large value of level difference, it is expensive to access non- local variables
using static pointers.
– Generate code.
– Symbol
– Displacement.
5. Recursion:
• Extended stack model best for recursion.
– Accommodated in symbol table
•Attributes:
•Addressability:
•Specifies:
–Where operand is located
–How it can be accessed.
•Addressability Code:
–M : operand is in memory
–R : operand is in register
–AR : address is in register
–AM : address is in memory
•Address:
–Address of CPU register or memory
•Operand descriptor is built for every operand, that is ids, constants, partial results.
–PRi:-Partial Results
–Opj:-Some operator
•Operand descriptor is an array in which operand descriptions are stored.
•Descriptor # is a descriptor in operand descriptor array.
b) Register Descriptor
•It has 2 fields:
–Status: free / occupied
–Operand Descriptor #
•Stored in register descriptor array.
c) Generating an Instruction
•Rule: one of the operands needs to be in a register to perform any operation over it.
•Function codegen is called with OPi and descriptors of its operands as parameters.
d) Saving Partial Results
•If all the registers are occupied, a register is freed by transferring its contents to a
temporary location in memory.
•r is available to evaluate operator OPi.
•Thus, ‘temp’ array is declared in target program to hold partial results.
•Descriptor of partial result must change/modify when partial result are moved to
temporary location.
•After partial result a*b is moved to temp location.
a) Postfix Strings
•Here each operator appears immediately after its last operand.
•Thus, operators can be evaluated in the order in which they appear in the string.
•Eg: a+b*c+d*e^f → abc*+def^*+
•We perform code generation from a postfix string using a stack of operand
descriptors.
•Operand descriptors are pushed on the stack as operands appear, i.e. the stack
would contain descriptors for a, b & c when the first * is encountered.
•Little modification in extended stack model can manage postfix string efficiently.
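The conversion to postfix and the stack based processing of a postfix string can be sketched as follows. The precedence table is an assumption (with ^ taken as right associative), single-letter operands are assumed, and evaluate pushes plain values where a code generator would push operand descriptors.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}   # assumed priorities
RIGHT_ASSOC = {"^"}

def to_postfix(infix):
    """Convert an infix string with single-character operands to postfix."""
    output, operators = [], []
    for symbol in infix.replace(" ", ""):
        if symbol.isalnum():
            output.append(symbol)
        elif symbol == "(":
            operators.append(symbol)
        elif symbol == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()
        else:
            while (operators and operators[-1] != "("
                   and (PRECEDENCE[operators[-1]] > PRECEDENCE[symbol]
                        or (PRECEDENCE[operators[-1]] == PRECEDENCE[symbol]
                            and symbol not in RIGHT_ASSOC))):
                output.append(operators.pop())
            operators.append(symbol)
    while operators:
        output.append(operators.pop())
    return "".join(output)

def evaluate(postfix, values):
    """Evaluate a postfix string; operators are applied in order of appearance."""
    apply = {"+": lambda l, r: l + r, "-": lambda l, r: l - r,
             "*": lambda l, r: l * r, "/": lambda l, r: l / r,
             "^": lambda l, r: l ** r}
    stack = []
    for symbol in postfix:
        if symbol in apply:
            right = stack.pop()
            left = stack.pop()
            stack.append(apply[symbol](left, right))
        else:
            stack.append(values[symbol])
    return stack.pop()
```

to_postfix("a+b*c+d*e^f") gives abc*+def^*+, the string shown in the example above.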
1) Triples:
•Triple is a representation of elementary operations in the form of a pseudo
machine instruction
•Slight change in algorithm of operator precedence help us convert infix string to
triples.
Example: Triples for a+b*c+d*e^f
Operator Operand 1 Operand 2
* B C
+ 1 A
^ E F
* D 3
+ 2 4
2) Indirect Triples:
•Are useful in optimizing compiler.
•For efficiency Hash Organization can be used for the table of triples.
3) Quadruple:
•Result name: designates result of evaluation that can be used as operand for other
quadruple.
•More convenient than using triples & indirect triples.
•For example : a+b*c+d*e^f
•Remember, they are not temporary locations (t) but result name.
•For elimination, result name can become temporary location.
c) Expression Tree
•Operator are evaluated in order determined by bottom up parsing which is not
most efficient.
•Hence, the compiler’s back end analyzes the expression to find the best evaluation order.
•What will help us here?
•Expression Tree: is an AST (Abstract Syntax Tree) which depicts the structure of
an expression.
•Thus, simplifying analysis of an expression to determine best evaluation order.
•How to determine best evaluation order for the expression?
–Step 1: Register Requirement Label (RR Label) indicates no. of register required
by CPU to evaluate sub-tree.
–Step 2: Top Down parsing and RR Label information is used and order of
evaluation is determined.
1. Call by Value
•Actual parameters are passed to called function.
•These values are assigned to corresponding formal parameters.
•Values are passed in ‘one direction’, i.e. from calling program to called program.
•If function changes value of formal parameter, changes are not reflected on actual
parameter.
•Thus, can’t produce any side effect on parameters.
•Generally used in built in function.
•Advantage:
–Simplicity
–Efficient if parameters are scalar variables.
•Advantage:
–Simplicity
•Disadvantage:
–Incurs higher overhead.
3. Call by Reference
•Address of actual parameter is passed to called function.
•Parameter list is actually the list of addresses.
•At every access, corresponding actual parameter is obtained from parameter list.
•Code:
1. r <- <ARB> + (dDp)AR or r <- <r_par_list> + (dDp)par_list
2. Access value using address contained in register.
•Code analysis:
Step 1: incurs overhead
Step 2: produces instantaneous side effects.
•Mechanism is popular because has clear semantics.
•Plays important role at the time of nesting of structures.
•How? It provides updated value.
•See eg. 6.29 on pg. no. 197.
•Here, z, i are non local variables of alpha.
•Alpha be called as (d[i],x).
•Value of ‘x’ changes as ‘b’ also changes.
4. Call by Name
•Same effect as call by reference.
•Every occurrence of formal parameters in the called function is replaced by the
name of the corresponding actual parameter.
•Eg:
a = d[i];
z = d[i];
i= i+ 1;
-b = d[i] + 5;
x = d[i] + 5;
•Achieves instantaneous side effects.
•Has implication of changes in parameters during execution.
•Code:
1. r <-<ARB> + (dDp)AR
r <-<r_par_list> + (dDp)par_list
2. Call the function whose address is contained in r.
3. Use the address returned by the function to access p.
•Advantage:
Changes are made dynamically which makes call by name mechanism
immensely powerful.
•Dis-advantage:
High overhead at step 2.
Rarely used in practice.
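The effect of call by name can be imitated in Python by passing small functions that re-evaluate the actual parameter at every use. This is only a sketch: the body assumed for alpha (z = a; i = i + 1; b = a + 5) is reconstructed from the example above, and the names g, get_a and set_b are hypothetical.

```python
def alpha_by_name(get_a, set_b, g):
    """Body of alpha with formals a and b accessed through functions."""
    g["z"] = get_a()        # z = d[i], with the current i
    g["i"] = g["i"] + 1     # i = i + 1
    set_b(get_a() + 5)      # x = d[i] + 5, d[i] re-evaluated with the new i

def demo():
    # Non-local variables of alpha, with arbitrary sample values
    g = {"d": [10, 20, 30], "i": 0, "x": 0, "z": 0}
    # Call alpha(d[i], x): each actual parameter becomes a function
    alpha_by_name(lambda: g["d"][g["i"]],
                  lambda value: g.__setitem__("x", value), g)
    return g
```

demo() leaves z = 10 and x = 25, because d[i] is evaluated again after i has changed; had d[i] been evaluated once at the time of the call, the sum would have been 15.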
4.5 INTERPRETERS:-
Both Compilers and Interpreters analyze a source statement to determine its
meaning.
During compilation, analysis of a statement is followed by code generation,
and during interpretation, it is followed by actions which implement its
meaning.
• Notation:
Example:
• Let,
– Size = 200
Components:
3. Data Manipulation Routines: A set containing a routine for every legal data
manipulation action in the source language.
• Advantages:
A Toy Interpreter:
The last few locations in the rvar and ivar arrays are used as stacks for
expression evaluation, with the help of the pointers r_tos and i_tos
respectively.
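The stack arrangement can be sketched as follows for the rvar array only. This is an illustration under stated assumptions, not the toy interpreter itself: the array size, the choice that the stack grows from the last location toward lower indices, the postfix form of the expression and the address_of table are all hypothetical.

```python
import operator

SIZE = 8                      # hypothetical size of the rvar array
rvar = [0.0] * SIZE           # low indices: variables, last locations: stack
r_tos = SIZE                  # empty stack; assumed to grow toward lower indices

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def push(value):
    global r_tos
    r_tos -= 1
    rvar[r_tos] = value


def pop():
    global r_tos
    value = rvar[r_tos]
    r_tos += 1
    return value


def evaluate_postfix(tokens, address_of):
    """Evaluate a postfix expression over real variables stored in rvar."""
    for token in tokens:
        if token in OPS:
            right = pop()
            left = pop()
            push(OPS[token](left, right))
        else:
            push(rvar[address_of[token]])   # operand: fetch the variable's value
    return pop()


address_of = {"a": 0, "b": 1, "c": 2}       # hypothetical symbol table
rvar[0], rvar[1], rvar[2] = 2.0, 3.0, 4.0

result = evaluate_postfix(["a", "b", "c", "*", "+"], address_of)  # a + b * c
print(result, r_tos)
```

Sharing one array between variables and the evaluation stack keeps the interpreter's data in a single place, at the cost of having to ensure that the stack never grows into the locations used by variables.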
Pure & Impure Interpreters:
Pure Interpreter:
-In a pure interpreter, the source program is retained in the source form all
through its interpretation.
-This arrangement incurs substantial analysis overheads while interpreting
a statement.
[Figure: Source program and Data -> Interpreter -> Result]
Impure Interpreter:
-E.g. Intermediate code(IC) can be analyzed more efficiently than the source
form of the program.
[Figure: Source program -> Preprocessor -> Interpreter -> Result, with Data as input to the Interpreter]
Unit 5: Linkers
Figure 5.1 contains a schematic showing steps 1-4 in the execution of
a program. The translator outputs a program form called an object module
for the program. The linker processes a set of object modules to produce a
ready-to-execute program form, which we will call a binary program. The
loader loads this program into memory for the purpose of execution.
As shown in the schematic, the object module(s) and the ready-to-execute
program form can be stored in files for repeated use.
[Figure: Source program -> object module -> binary program -> result; arrows indicate control flow]
Fig.5.1 A schematic representation of program execution.
The origin of a program may have to be changed by the linker or loader
for one of two reasons. First, the same set of translated addresses may
have been used in different object modules constituting a program, e.g.
object modules of library routines often have the same translated origin.
Memory allocation to such programs would conflict unless their origins
are changed. Second, an operating system may require that a program
should execute from a specific area of memory. This may require a
change in its origin. The change of origin leads to changes in the
execution start address and in the addresses assigned to symbols. The
following terminology is used to refer to the address of a program entity at
different times:
1. Translation time (or translated) address: Address assigned by the
translator
2. Linked address: Address assigned by the linker
3. Load time (or load) address: Address assigned by the loader
The same prefixes translation time (or translated), linked and load time
(or load) are used with the origin and execution start address of a
program. Thus,
1. Translated origin: Address of the origin assumed by the translator. This
is the address specified by the programmer in an ORIGIN statement.
2. Linked origin: Address of the origin assigned by the linker while
producing a binary program.
3. Load origin: Address of the origin assigned by the loader while loading
the program for execution.
The linked and load origins may differ from the translated origin of a
program due to one of the reasons mentioned earlier.
Example 5.1 Consider the assembly program and its generated code
shown in Fig. 5.2. The translated origin of the program is 500. The
translation time address of LOOP is therefore 501. If the program is
loaded for execution in the memory area starting at address 900, the
load time origin is 900. The load time address of LOOP would be 901.
Example 5.2 The translated origin of the program in Fig.5.2 is 500. The
translation time address of symbol A is 540. The instruction
corresponding to the statement READ A (existing in translated memory
word 500) uses the address 540, hence it is an address sensitive
instruction. If the linked origin is 900, A would have the link time address
940. Hence the address in the READ instruction should be corrected to
940. Similarly, the instruction in translated memory word 538 contains
501, the address of LOOP. This should be corrected to 901. (Note that the
operand addresses in the instructions with the addresses 518 and 519 also
need to be corrected. This is explained in Section 5.1.2.)
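The address arithmetic used in Examples 5.1 and 5.2 can be checked with a short Python sketch. The function name relocate and the constant names are hypothetical; the numbers are those of the two examples.

```python
def relocate(translated_address, translated_origin, new_origin):
    """Return the linked (or load time) address of a translated address."""
    relocation_factor = new_origin - translated_origin
    return translated_address + relocation_factor


TRANSLATED_ORIGIN = 500
NEW_ORIGIN = 900      # load origin in Example 5.1, linked origin in Example 5.2

loop_address = relocate(501, TRANSLATED_ORIGIN, NEW_ORIGIN)   # LOOP
a_address = relocate(540, TRANSLATED_ORIGIN, NEW_ORIGIN)      # A
print(loop_address, a_address)
```

Every address in the program moves by the same amount, the difference between the new origin and the translated origin, so only that difference has to be known to correct an address sensitive instruction.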
Performing relocation:
Using 5.1
5.1.2 Linking:-
Example 5.4 In the assembly program of Fig. 5.2, the ENTRY statement
indicates that a public definition of TOTAL exists in the program. Note
that LOOP and A are not public definitions even though they are defined
in the program. The EXTRN statement indicates that the program contains
external references to MAX and ALPHA. The assembler does not know
the address of an external symbol. Hence it puts zeros in the address
fields of the instructions corresponding to the statements MOVER
AREG, ALPHA and BC ANY, MAX. If the EXTRN statement did not
exist, the assembler would have flagged references to MAX and ALPHA
as errors.
Resolving external references:
Example 5.5 Let the program unit of Fig. 5.2 (referred to as program unit
P) be linked with the program unit Q described in Fig. 5.3.
Program unit P contains an external reference to symbol ALPHA, which
is a public definition in Q with the translation time address 231. Let the
link origin of P be 900 and its size be 42 words. The link origin of Q is
therefore 942, and the link time address of ALPHA
is 973. Linking is performed by putting the link time address of ALPHA
in the instruction of P using ALPHA, i.e. by putting the address 973 in
the instruction with the translation time address 518 in P.
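The linking step of Example 5.5 can be sketched in Python as follows. The dictionary layout and the function names are hypothetical. Q's translated origin of 200 is not stated in the example; it is inferred from the given figures (973 - 942 = 231 - 200). Q's size is a placeholder, and P's public definition TOTAL and its reference to MAX are left out because their addresses are not given here.

```python
def assign_link_origins(units, first_origin):
    """Place program units one after another, starting at first_origin."""
    origin = first_origin
    for unit in units:
        unit["l_origin"] = origin
        origin += unit["size"]


def collect_public_definitions(units):
    """Map each public symbol to its link time address."""
    table = {}
    for unit in units:
        relocation_factor = unit["l_origin"] - unit["t_origin"]
        for symbol, t_address in unit["public"].items():
            table[symbol] = t_address + relocation_factor
    return table


def resolve_external_references(units, table):
    """Return {(unit name, translated address of instruction): address to put in it}."""
    patches = {}
    for unit in units:
        for symbol, t_address in unit["external"].items():
            patches[(unit["name"], t_address)] = table[symbol]
    return patches


P = {"name": "P", "t_origin": 500, "size": 42,
     "public": {}, "external": {"ALPHA": 518}}
# Q's translated origin is inferred; its size is a placeholder (assumption).
Q = {"name": "Q", "t_origin": 200, "size": 50,
     "public": {"ALPHA": 231}, "external": {}}

units = [P, Q]
assign_link_origins(units, 900)
table = collect_public_definitions(units)
patches = resolve_external_references(units, table)
print(Q["l_origin"], table["ALPHA"], patches)
```

The sketch separates the two passes a linker needs: link time addresses of all public definitions must be known before any external reference can be filled in.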
Binary Programs:
1. Pi has been relocated to the memory area starting at its link origin, and
2. Linking has been performed for each external reference in Pi.
1. Header: The header contains translated origin, size and execution start
address of P.
2. Program: This component contains the machine language program
corresponding to P.
3. Relocation table (RELOCTAB): This table describes IRR_P. Each
RELOCTAB entry contains a single field:
Translated address: translated address of an address sensitive
instruction.
4. Linking table (LINKTAB): This table contains information concerning
the public definitions and external references in P.
Each LINKTAB entry contains three fields:
4.Linking table
A PD 540
Here the displacement and the segment base of FAR_LAB are to be put
in the JMP instruction itself. The assembler puts the displacement of
FAR_LAB in the first two operand bytes of the instruction, and makes a
RELOCTAB entry for the third and fourth operand bytes which are to
hold the segment base address. A statement like
ADDR_A DW OFFSET A
(which is an ‘address constant') does not need any relocation since the
assembler can itself put the required offset in the bytes.
Example 5.8 Let the address of work_area be 300. While relocating the
object module of Ex. 5.6, relocation factor = 400. For the first
RELOCTAB entry, address_in_work_area = 300 + 500 - 500 = 300. This
word contains the instruction for READ A. It is relocated by adding 400
to the operand address in it. For the second RELOCTAB entry,
address_in_work_area = 300 + 538 - 500 = 338. The instruction in this word is similarly
relocated by adding 400 to the operand address in it.
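The relocation steps of Example 5.8 can be sketched in Python as follows. Memory is modelled as a dictionary from word address to an (opcode, operand address) pair, which is an assumption made for illustration. The opcode name used for word 538 is hypothetical; the notes only state that this word contains 501, the address of LOOP.

```python
def relocate_module(memory, work_area, t_origin, reloctab, relocation_factor):
    """Add relocation_factor to the operand of every word listed in RELOCTAB."""
    for translated_address in reloctab:
        address_in_work_area = work_area + translated_address - t_origin
        opcode, operand = memory[address_in_work_area]
        memory[address_in_work_area] = (opcode, operand + relocation_factor)


WORK_AREA = 300
T_ORIGIN = 500
RELOCTAB = [500, 538]            # translated addresses of address sensitive words
RELOCATION_FACTOR = 400

# The object module's program component, copied into the work area.
memory = {
    WORK_AREA + 500 - T_ORIGIN: ("READ", 540),   # READ A
    WORK_AREA + 538 - T_ORIGIN: ("BC", 501),     # branch to LOOP (opcode assumed)
}

relocate_module(memory, WORK_AREA, T_ORIGIN, RELOCTAB, RELOCATION_FACTOR)
print(memory)
```

Only the words named in RELOCTAB are touched, which is why the table needs just one field per entry.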
Symbol    Linked address
P         900
A         940
Q         942
ALPHA     973
Exercise 5.2
1. It is required to merge a set of object modules {omi} to construct a
single object module om’. This reduces the linking and relocation time in
situations where the object modules in {omi} are interdependent, that is,
they call or use each other.
(a) What are the public definitions of om’?
(b) What are the external references in om’?
(c) Explain how the RELOCTABs can be merged.
Exercise 5.3
1. Comment on following statements:
(a) Self-relocating programs are less efficient than relocatable programs.
(b) There would be no need for linkers if all programs are coded as self-
relocating programs.
2.A self-relocating program needs to find its load address before it can
execute its relocating logic. Comment on how this information can be
determined by the program.
[Figure: memory layout diagrams for the overlay structure, with sections trans_a, trans_b, trans_c and write placed between 15K and 65K]
Example 5.17 The IBM mainframe linker commands for the overlay
structure of Fig. 5.9 are as follows:
Phase main: PHASE MAIN, +10000
INCLUDE INIT
INCLUDE READ
INCLUDE WRITE
Phase trans_a: PHASE A_TRANS, *
INCLUDE TRANS_A
Phase trans_b: PHASE B_TRANS, A_TRANS
INCLUDE TRANS_B
Phase trans_c: PHASE C_TRANS, A_TRANS
INCLUDE TRANS_C
Exercise 5.5
5.5 LOADERS:-
As described in Section 5.1.1, an absolute loader can only load programs
with load origin = linked origin. This can be inconvenient if the load
address of a program is likely to be different for different executions of a
program. A relocating loader performs relocation while loading a
program for execution. This permits a program to be executed in different
parts of the memory.
Exercise 5.6
An important step in testing and debugging is the selection of test data for the
program.
Software tools assist the programmer in these steps:
1. Test data generators help the user in selecting test data for his
program. Their use helps in ensuring that a program is thoroughly
tested.
2. Automated test drivers help in regression testing.
3. Debug monitors help in obtaining information for localization of errors.
4. Source code control systems help to keep track of modifications in the
source code.
Test data selection uses the notion of an execution path, which is a
sequence of program statements visited during an execution. For testing, a
program can be viewed as a set of execution paths. For control to flow along a
given execution path, the program's inputs must satisfy certain conditions. A test
data is a set of input values which satisfy these conditions.
PROGRAMMING ENVIRONMENTS
A programming environment is a software system that provides integrated
facilities for program creation, editing, testing and debugging. It consists of
the following components:
1. A syntax directed editor (which is a structure editor)
2. A language processor: a compiler, an interpreter, or both
3. A debug monitor
4. A dialog monitor
All components are accessed through the dialog monitor.
The syntax directed editor incorporates a front end for the programming
language.
The editor performs syntax analysis and converts the program into an intermediate
representation (IR).
The compiler and the debug monitor share the IR.
If a compiler is used, it is activated after the editor has converted a
statement to IR.
At any time during execution, the programmer can interrupt program
execution and enter the debug mode or return to the editor.
The system may also provide program development and testing functions.
3. Direct Manipulation:-
A direct manipulation system provides the user with a visual display of the
universe of the application.
The display shows the important objects in the universe.
Actions or operations over objects are indicated using some kind of pointing
device, e.g. a cursor or a mouse.
Hypertext:-
Hypertext visualizes a document as a hierarchical arrangement of
information units and provides a variety of means to locate the required
information. These take the form of:
1. Tables and indexes
2. String searching function
3. Means to navigate within the document
4. Backtracking facilities
6.4.4 Structure of a User Interface :-
MENULAY
Menulay is an early UIMS which uses the screen layout as the basis for the dialog
model. The UI designer starts by designing the user screen to consist of a set of
icons.
An action is associated with each icon. This action is performed when the icon is selected.
The interface consists of a set of screens. The system generates a set of icon tables
giving the name and description of each icon and a list used when an event is
selected.
HYPERCARD
This UIMS from Apple incorporates object orientation in the event oriented
approach.
A card has an associated screen layout containing buttons and fields.
A button can be selected by clicking the mouse on it.
A field contains editable text.
Many cards can share the same background.
A HyperCard program is thus a hierarchy of cards called a stack.
The action for an event is determined by using the hierarchy of cards as an
inheritance hierarchy.
HyperCard uses an interpretive schematic to implement the UI.