Ssnotes

Unit 1:Introduction- System Software and Machine Architecture- Simplified Instructional Computer (SIC)-
SIC Machine Architecture- SIC/XE Machine Architecture- Traditional (CISC) Machines- VAX Architecture-
Pentium Pro Architecture – RISC Machines – UltraSPARC Architecture- PowerPC Architecture- Cray T3E
Architecture.
1.0 Introduction
Software is a set of instructions or programs written to carry out a certain task on a digital computer.
It is classified into system software and application software. System software consists of a variety of
programs that support the operation of a computer. Application software focuses on an application or
problem to be solved. Examples of system software are the operating system, compiler, assembler, macro
processor, loader or linker, debugger, text editor, database management systems (some of them) and
software engineering tools. These programs make it possible for the user to focus on an application or
other problem to be solved, without needing to know the details of how the machine works internally.
1.1 System Software and Machine Architecture

One characteristic in which most system software differs from application software is machine
dependency.
System software supports the operation and use of the computer, whereas application software provides
the solution to a problem. An assembler translates mnemonic instructions into machine code, so the
instruction formats, addressing modes, etc. are of direct concern in assembler design. Similarly,
compilers must generate machine language code, taking into account such hardware characteristics as the
number and type of registers and the machine instructions available. Operating systems are directly
concerned with the management of nearly all of the resources of a computing system.
There are also aspects of system software that do not directly depend upon the type of computing
system: the general design and logic of an assembler, the general design and logic of a compiler, and
code optimization techniques that are independent of the target machine. Likewise, the process of linking
together independently assembled subprograms does not usually depend on the computer being used.
1.2 The Simplified Instructional Computer (SIC)
The Simplified Instructional Computer (SIC) is a hypothetical computer that includes the hardware
features most often found on real machines. There are two versions of SIC: the standard model (SIC)
and the extended version SIC/XE (XE stands for "extra equipment" or "extra expensive").
1.2.1 SIC Machine Architecture

We discuss here the SIC machine architecture with respect to its memory and registers, data
formats, instruction formats, addressing modes, instruction set, and input and output.
Memory
There are 2^15 bytes in the computer memory, that is, 32,768 bytes. It uses Little Endian format to
store numbers. Each location in memory contains an 8-bit byte, and 3 consecutive bytes form a word.
Registers
There are five registers, each 24 bits in length. Their mnemonic, number and use are given in the
following table.
Mnemonic   Number   Use
A          0        Accumulator; used for arithmetic operations
X          1        Index register; used for addressing
L          2        Linkage register; JSUB
PC         8        Program counter
SW         9        Status word, including CC
Data Formats
Integers are stored as 24-bit binary numbers, and 2's complement representation is used for negative
values. Characters are stored using their 8-bit ASCII codes. There is no floating-point hardware on the
standard version of SIC.
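The 24-bit two's complement representation can be illustrated with a small Python sketch. The helper names `to_sic_word` and `from_sic_word` are hypothetical and chosen only for this illustration; the sketch treats a word as a plain 24-bit value and does not model byte order in memory.

```python
WORD_BITS = 24
MASK = (1 << WORD_BITS) - 1          # 0xFFFFFF

def to_sic_word(value):
    """Encode a Python int as a 24-bit two's complement word."""
    if not -(1 << 23) <= value < (1 << 23):
        raise ValueError("value does not fit in 24 bits")
    return value & MASK

def from_sic_word(word):
    """Decode a 24-bit two's complement word back to a signed int."""
    return word - (1 << WORD_BITS) if word & (1 << 23) else word

print(f"{to_sic_word(5):06X}")       # 000005
print(f"{to_sic_word(-5):06X}")      # FFFFFB
print(from_sic_word(0xFFFFFB))       # -5
```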
Instruction Formats
   8      1      15
 opcode   x    address
All machine instructions on the standard version of SIC have the 24-bit format shown above.
Addressing Modes
Mode      Indication   Target address calculation
Direct    x=0          TA = address
Indexed   x=1          TA = address + (X)
The two addressing modes available are shown in the above table. Parentheses are used to indicate
the contents of a register or a memory location; for example, (X) is the contents of register X.
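As a rough illustration of the format and the two addressing modes, the following Python sketch splits a 24-bit SIC instruction into its fields and computes the target address. The function names are hypothetical; the sample words 141033 (STL RETADR, taken from section 2.2 of these notes) and 509039 (an assumed indexed instruction with opcode 50) are used only as test data.

```python
def decode_sic(word):
    """Split a 24-bit SIC instruction into (opcode, x, address)."""
    opcode = (word >> 16) & 0xFF     # first 8 bits
    x = (word >> 15) & 1             # index flag
    address = word & 0x7FFF          # last 15 bits
    return opcode, x, address

def target_address(word, reg_x=0):
    """TA = address for direct mode, address + (X) for indexed mode."""
    _, x, address = decode_sic(word)
    return address + reg_x if x else address

print([hex(v) for v in decode_sic(0x141033)])     # ['0x14', '0x0', '0x1033']
print(hex(target_address(0x509039, reg_x=5)))     # 0x103e
```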
Instruction Set
SIC provides load and store instructions (LDA, LDX, STA, STX, etc.) and integer arithmetic
operations (ADD, SUB, MUL, DIV, etc.). All arithmetic operations involve register A and a word in
memory, with the result being left in the register. COMP compares the value in register A with a word
in memory and sets a condition code CC to indicate the result. The conditional jump instructions
(JLT, JEQ, JGT) test the setting of CC and jump accordingly. Two instructions are provided for
subroutine linkage: JSUB jumps to the subroutine, placing the return address in register L, and RSUB
returns by jumping to the address contained in register L.
Input and Output
Input and Output are performed by transferring 1 byte at a time to or from the rightmost 8 bits of
register A (accumulator). The Test Device (TD) instruction tests whether the addressed device is ready
to send or receive a byte of data. Read Data (RD), Write Data (WD) are used for reading or writing the
data.
Data movement and Storage Definition
LDA, STA, LDL, STL, LDX and STX (A: accumulator, L: linkage register, X: index register) all
use a 3-byte word. LDCH and STCH, which are associated with characters, use 1 byte. There are no
memory-to-memory move instructions.
Storage definitions are:
• WORD - ONE-WORD CONSTANT
• RESW - ONE-WORD VARIABLE
• BYTE - ONE-BYTE CONSTANT
• RESB - ONE-BYTE VARIABLE
Example Programs (SIC)
Example 1 (Simple data and character movement operation)
         LDA    FIVE
         STA    ALPHA
         LDCH   CHARZ
         STCH   C1
         .
ALPHA    RESW   1
FIVE     WORD   5
CHARZ    BYTE   C'Z'
C1       RESB   1
Example 2 (Arithmetic operations)
         LDA    ALPHA
         ADD    INCR
         SUB    ONE
         STA    BEETA
         .
         .
ONE      WORD   1
ALPHA    RESW   1
BEETA    RESW   1
INCR     RESW   1
Example 3 (Looping and Indexing operation)
         LDX    ZERO        : X=0
MOVECH   LDCH   STR1,X      : LOAD A FROM STR1
         STCH   STR2,X      : STORE A TO STR2
         TIX    ELEVEN      : ADD 1 TO X, TEST
         JLT    MOVECH
         .
         .
         .
STR1     BYTE   C'HELLO WORLD'
STR2     RESB   11
ZERO     WORD   0
ELEVEN   WORD   11
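The behaviour of the TIX/JLT pair in Example 3 can be mimicked in Python. This is only a sketch of the loop's semantics, not a SIC simulator; the helper `tix` and the variable names are hypothetical.

```python
def tix(x, limit):
    """Model of TIX: add 1 to X, then compare X with the operand and set CC."""
    x += 1
    cc = "<" if x < limit else ("=" if x == limit else ">")
    return x, cc

str1 = b"HELLO WORLD"          # STR1 BYTE C'HELLO WORLD'
str2 = bytearray(11)           # STR2 RESB 11
x = 0                          # LDX ZERO

while True:
    a = str1[x]                # LDCH STR1,X
    str2[x] = a                # STCH STR2,X
    x, cc = tix(x, 11)         # TIX ELEVEN
    if cc != "<":              # JLT MOVECH jumps back only while X < 11
        break

print(str2.decode("ascii"))    # HELLO WORLD
```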
Example 4 (Input and Output operation)
INLOOP   TD     INDEV       : TEST INPUT DEVICE
         JEQ    INLOOP      : LOOP UNTIL DEVICE IS READY
         RD     INDEV       : READ ONE BYTE INTO A
         STCH   DATA        : STORE A TO DATA
         .
         .
OUTLP    TD     OUTDEV      : TEST OUTPUT DEVICE
         JEQ    OUTLP       : LOOP UNTIL DEVICE IS READY
         LDCH   DATA        : LOAD DATA INTO A
         WD     OUTDEV      : WRITE A TO OUTPUT DEVICE
         .
         .
INDEV    BYTE   X'F5'       : INPUT DEVICE NUMBER
OUTDEV   BYTE   X'08'       : OUTPUT DEVICE NUMBER
DATA     RESB   1           : ONE-BYTE VARIABLE
Example 5 (To transfer two hundred bytes of data from input device to memory)
         LDX    ZERO
CLOOP    TD     INDEV
         JEQ    CLOOP
         RD     INDEV
         STCH   RECORD,X
         TIX    B200
         JLT    CLOOP
         .
         .
INDEV    BYTE   X'F5'
RECORD   RESB   200
ZERO     WORD   0
B200     WORD   200
Example 6 (Subroutine to transfer two hundred bytes of data from input device to memory)
         JSUB   READ
         .
         .
READ     LDX    ZERO
CLOOP    TD     INDEV
         JEQ    CLOOP
         RD     INDEV
         STCH   RECORD,X
         TIX    B200        : add 1 to index, compare with 200 (B200)
         JLT    CLOOP
         RSUB
         .
         .
RECORD   RESB   200
ZERO     WORD   0
B200     WORD   200
1.2.2 SIC/XE Machine Architecture
Memory
The maximum memory available on a SIC/XE system is 1 megabyte (2^20 bytes).
Registers
SIC/XE provides the additional registers B, S, T, and F, in addition to the registers of SIC.
Mnemonic   Number   Special use
B          3        Base register
S          4        General working register
T          5        General working register
F          6        Floating-point accumulator (48 bits)
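Floating-point data type
There is a 48-bit floating-point data type. With fraction f and exponent e, the value represented is
f * 2^(e-1024); the sign bit s is 0 for positive and 1 for negative values.
 1      11         36
 s   exponent   fraction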
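A short Python sketch can make the layout concrete. It assumes, as in the usual description of SIC/XE, that the 36-bit fraction is read as a value between 0 and 1 (binary point before its high-order bit) and that the exponent is an unsigned 11-bit number; the function name is hypothetical.

```python
def decode_sicxe_float(bits):
    """Decode a 48-bit SIC/XE floating-point value given as an integer."""
    sign = (bits >> 47) & 1                  # 1 bit
    exponent = (bits >> 36) & 0x7FF          # 11 bits, unsigned
    fraction = bits & ((1 << 36) - 1)        # 36 bits
    f = fraction / (1 << 36)                 # assumed: 0 <= f < 1
    value = f * 2.0 ** (exponent - 1024)
    return -value if sign else value

one = (1025 << 36) | (1 << 35)               # f = 0.5, e = 1025
print(decode_sicxe_float(one))               # 1.0
print(decode_sicxe_float(one | (1 << 47)))   # -1.0
```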
Instruction Formats
The instruction formats of the SIC/XE machine architecture are as follows. Format 1 (1 byte)
contains only the operation code (straight from the table). Format 2 (2 bytes): the first eight bits hold
the operation code, the next four register 1 and the following four register 2. The registers are
identified by the numbers given in the registers section (i.e., register T is written as hex 5 and F as
hex 6). Format 3 (3 bytes): the first 6 bits contain the operation code, the next 6 bits contain flags, and
the last 12 bits contain the displacement for the address of the operand. Because the operation code uses
only 6 bits, the second hex digit of the instruction is affected by the values of the first two flags (n
and i). The flags, in order, are n, i, x, b, p, and e; their function is explained in the next section.
The last flag, e, indicates the instruction format (0 for format 3 and 1 for format 4). Format 4 (4 bytes)
is the same as format 3 with an extra 2 hex digits (8 bits), for addresses that require more than 12 bits
to be represented.
Format 1 (1 byte)
   8
   op
Format 2 (2 bytes)
   8    4    4
   op   r1   r2
Formats 1 and 2 are used by instructions that do not reference memory at all.
Format 3 (3 bytes)
   6    1   1   1   1   1   1   12
   op   n   i   x   b   p   e   disp
Format 4 (4 bytes)
   6    1   1   1   1   1   1   20
   op   n   i   x   b   p   e   address
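As an illustration of Format 2, the sketch below packs an opcode and two register numbers into 16 bits. The register numbers come from the tables in these notes; the opcode values B8 (TIXR) and 90 (ADDR) are assumed from the standard SIC/XE opcode table, which is not reproduced here, and the function name is hypothetical.

```python
# Register numbers as listed in the SIC and SIC/XE register tables.
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4,
             "T": 5, "F": 6, "PC": 8, "SW": 9}

def encode_format2(opcode, r1, r2=None):
    """Pack op (8 bits), r1 (4 bits) and r2 (4 bits) into a 2-byte instruction."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 is not None else 0   # unused r2 field is left as 0
    return (opcode << 8) | (n1 << 4) | n2

print(f"{encode_format2(0xB8, 'T'):04X}")         # TIXR T   -> B850
print(f"{encode_format2(0x90, 'S', 'A'):04X}")    # ADDR S,A -> 9040
```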
Addressing modes & Flag Bits
Five possible addressing modes, plus their combinations, are as follows.
Direct (x, b, and p all set to 0): the operand address is used as it is. n and i are both set to the
same value, either 0 or 1. In general that value is 1; if both are 0 in a format 3 instruction, the
flags b, p, and e are treated as part of the address of the operand, to make the format compatible with
the SIC format.
Relative (either b or p equal to 1 and the other to 0): the displacement in the instruction is
added to the current value stored in the B register (if b = 1) or to the value stored in the PC register
(if p = 1).
Immediate (i = 1, n = 0): the operand value is contained in the instruction itself (i.e., it lies in
the last 12 or 20 bits of the instruction).
Indirect (i = 0, n = 1): the target address points to a location that holds the address of the
operand value.
Indexed (x = 1): the value stored in register X is added to obtain the actual address of the
operand. Indexing cannot be combined with immediate or indirect addressing.
The various flag bits used in the above formats have the following meanings.
e: e = 0 means format 3, e = 1 means format 4.
Bits x, b, p: used to calculate the target address using relative, direct, and indexed addressing modes.
Bits i and n: specify how the target address is used.
b and p: if both are set to 0, the disp field of a format 3 instruction is taken to be the target
address. For a format 4 instruction, bits b and p are normally set to 0 and the 20-bit address is the
target address.
x: if x is set to 1, the value of register X is added in the target address calculation.
i=1, n=0: immediate addressing, TA. The TA itself is used as the operand value; no memory reference
is made.
i=0, n=1: indirect addressing, ((TA)). The word at the TA is fetched, and its value is taken as the
address of the operand value.
i=0, n=0 or i=1, n=1: simple addressing, (TA). The TA is taken as the address of the operand value.
Two new relative addressing modes are available for use with instructions assembled using
format 3.
Mode                       Indication   Target address calculation
Base relative              b=1, p=0     TA = (B) + disp    (0 <= disp <= 4095)
Program-counter relative   b=0, p=1     TA = (PC) + disp   (-2048 <= disp <= 2047)
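The flag rules above can be summarized in a small Python sketch for format 3 instructions. It is a simplified model: the register contents (B) = 006000, (PC) = 003000 and (X) = 000090 and the sample instruction words are assumed values chosen for illustration, the function names are hypothetical, and the n = i = 0 SIC-compatible case and format 4 are not handled.

```python
def decode_format3(instr):
    """Split a 24-bit format 3 instruction into opcode, flags and disp."""
    opcode = (instr >> 16) & 0xFC
    n, i = (instr >> 17) & 1, (instr >> 16) & 1
    x, b = (instr >> 15) & 1, (instr >> 14) & 1
    p, e = (instr >> 13) & 1, (instr >> 12) & 1
    return opcode, n, i, x, b, p, e, instr & 0xFFF

def target_address(instr, reg_b=0, reg_pc=0, reg_x=0):
    """Return (TA, mode) for a format 3 instruction."""
    _, n, i, x, b, p, _, disp = decode_format3(instr)
    if b == 1 and p == 0:
        ta = reg_b + disp                      # base relative, disp unsigned
    elif b == 0 and p == 1:
        signed = disp - 0x1000 if disp & 0x800 else disp
        ta = reg_pc + signed                   # PC relative, disp signed
    else:
        ta = disp                              # direct
    if x:
        ta += reg_x                            # indexed
    mode = {(0, 1): "immediate", (1, 0): "indirect"}.get((n, i), "simple")
    return ta, mode

regs = dict(reg_b=0x6000, reg_pc=0x3000, reg_x=0x90)
for word in (0x032600, 0x03C300, 0x022030, 0x010030):
    ta, mode = target_address(word, **regs)
    print(f"{word:06X}: TA={ta:04X} {mode}")
```

With the assumed register values, the four words give TA = 3600 (simple), 6390 (simple), 3030 (indirect) and 0030 (immediate).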
Instruction Set
SIC/XE provides all of the instructions that are available on the standard version. In addition,
there are instructions to load and store the new registers (LDB, STB, etc.), floating-point arithmetic
operations (ADDF, SUBF, MULF, DIVF), a register move instruction (RMO), register-to-register
arithmetic operations (ADDR, SUBR, MULR, DIVR) and a supervisor call instruction (SVC).
Input and Output
There are I/O channels that can be used to perform input and output while the CPU is
executing other instructions. This allows overlap of computing and I/O, resulting in more efficient
system operation. The instructions SIO, TIO, and HIO are used to start, test and halt the operation of
I/O channels.
Example Programs (SIC/XE)
Example 1 (Simple data and character movement operation)
         LDA    #5
         STA    ALPHA
         LDA    #90
         STCH   C1
         .
         .
ALPHA    RESW   1
C1       RESB   1
Example 2 (Arithmetic operations)
         LDS    INCR
         LDA    ALPHA
         ADDR   S,A
         SUB    #1
         STA    BEETA
         .
         .
ALPHA    RESW   1
BEETA    RESW   1
INCR     RESW   1
Example 3 (Looping and Indexing operation)
         LDT    #11
         LDX    #0          : X=0
MOVECH   LDCH   STR1,X      : LOAD A FROM STR1
         STCH   STR2,X      : STORE A TO STR2
         TIXR   T           : ADD 1 TO X, TEST (T)
         JLT    MOVECH
         .
         .
         .
STR1     BYTE   C'HELLO WORLD'
STR2     RESB   11
Example 4 (To transfer two hundred bytes of data from input device to memory)
         LDT    #200
         LDX    #0
CLOOP    TD     INDEV
         JEQ    CLOOP
         RD     INDEV
         STCH   RECORD,X
         TIXR   T
         JLT    CLOOP
         .
         .
RECORD   RESB   200
Example 5 (Subroutine to transfer two hundred bytes of data from input device to memory)
         JSUB   READ
         .
         .
READ     LDT    #200
         LDX    #0
CLOOP    TD     INDEV
         JEQ    CLOOP
         RD     INDEV
         STCH   RECORD,X
         TIXR   T           : add 1 to index, compare with T
         JLT    CLOOP
         RSUB
         .
         .
RECORD   RESB   200
1.3 Different Architectures

The following section introduces the architectures of CISC and RISC machines. CISC machines
are called traditional machines; RISC machines are more recent. Machines belonging to both of these
architectures are compared with respect to their memory, registers, data formats, instruction formats,
addressing modes, instruction set, and input and output.
1.3.1 CISC machines
Traditional machines are Complex Instruction Set Computers (CISC). They have a relatively large
and complex instruction set, several different instruction formats and lengths, and many different
addressing modes, so the hardware implementation of these computers is complex. VAX and Intel x86
processors are examples of this type of architecture.
1.3.1.1 VAX Architecture

Memory - The VAX memory consists of 8-bit bytes. All addresses used are byte addresses. Two
consecutive bytes form a word, four bytes form a longword, eight bytes form a quadword and sixteen
bytes form an octaword. All VAX programs operate in a virtual address space of 2^32 bytes. One half is
called system space, the other half process space.
Registers - There are 16 general purpose registers (GPRs) of 32 bits each, named R0 to R15. Four have
special uses: the program counter PC (R15), the stack pointer SP (R14), the frame pointer FP (R13) and
the argument pointer AP (R12). The others are available for general use. There is also a processor
status longword (PSL), which holds flags.
Data Formats - Integers are stored as binary numbers in byte, word, longword, quadword, octaword. 2’s
complement notation is used for storing negative numbers. Characters are stored as 8-bit ASCII codes.
Four different floating-point data formats are also available.
Instruction Formats - VAX architecture uses variable-length instruction formats – op code 1 or 2 bytes,
maximum of 6 operand specifiers depending on type of instruction. Tabak – Advanced Microprocessors
(2nd edition) McGraw-Hill, 1995, gives more information.
Addressing Modes - VAX provides a large number of addressing modes. They are Register mode,
register deferred mode, autoincrement, autodecrement, base relative, program-counter relative, indexed,
indirect, and immediate.
Instruction Set - Instructions are symmetric with respect to data type. Each mnemonic is built from a
prefix giving the type of operation, a suffix giving the type of the operands, and a modifier giving the
number of operands. For example, ADDW2 is add, word length, 2 operands; MULL3 is multiply, longwords,
3 operands; CVTWL is conversion from word to longword. VAX also provides instructions to load and
store multiple registers.
Input and Output - Uses I/O device controllers. Device control registers are mapped to separate I/O
space. Software routines and memory management routines are used for input/output operations.
1.3.1.2 Pentium Pro Architecture
Introduced by Intel in 1995.
Memory - Memory consists of 8-bit bytes, and all addresses used are byte addresses. Two consecutive
bytes form a word, and four bytes form a double word (dword). Memory is viewed as a collection of
segments, and an address consists of a segment number and an offset. There are code, data, stack and
extra segments.
Registers - There are eight 32-bit GPRs, namely EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP. EAX,
EBX, ECX and EDX are used for data manipulation; the other four are used to hold addresses. EIP is a
32-bit register that contains a pointer to the next instruction to be executed. FLAGS is a 32-bit flag
register. CS, SS, DS, ES, FS and GS are the six 16-bit segment registers.
Data Formats - Integers are stored as 8, 16, or 32 bit binary numbers, 2’s complement for negative
numbers, BCD is also used in the form of unpacked BCD, packed BCD. There are three floating point
data formats, they are single, double, and extended- precision. Characters are stored as one per byte –
ASCII codes.
Instruction Formats - An instruction begins with optional prefixes, which specify a repetition count or
a segment register. Following the prefixes (if present) come an opcode (1 or 2 bytes) and then a number
of bytes that specify the operands and addressing modes. Instruction formats vary in length from 1 byte
to 10 bytes or more. The opcode is always present in every instruction.
Addressing Modes - A large number of addressing modes are available. They are immediate mode,
register mode, direct mode, and relative mode. Use of base register, index register with displacement is
also possible.
Instruction Set - This architecture has a large and complex instruction set of approximately 400
different machine instructions. Each instruction may have one, two or three operands. There are, for
example, register-to-register, register-to-memory, memory-to-memory and string manipulation
instructions.
Input and Output - Input is from an I/O port into register EAX. Output is from EAX to an I/O port.
1.3.2 RISC Machines

RISC means Reduced Instruction Set Computer. These machines are intended to simplify the
design of processors, giving greater reliability, faster execution and less expensive processors. They
have a standard, fixed instruction length, and the number of machine instructions, instruction formats
and addressing modes is relatively small. UltraSPARC Architecture and Cray T3E Architecture are
examples of RISC machines.
1.3.2.1 UltraSPARC Architecture

Introduced by Sun Microsystems. SPARC – Scalable Processor ARChitecture. SPARC,
SuperSPARC, UltraSPARC are upward compatible machines and share the same basic structure.
Memory - Memory consists of 8-bit bytes, and all addresses used are byte addresses. Two consecutive
bytes form a halfword, four bytes form a word and eight bytes form a doubleword. Programs use a virtual
address space of 2^64 bytes, divided into pages.
Registers - There are more than 100 GPRs of 64 bits each, organized as a register file. There are 64
double precision floating-point registers in a special floating-point unit (FPU). In addition to these,
the processor contains a PC, condition code registers and control registers.
Data Formats - Integers are stored as 8, 16, 32 or 64 bit binary numbers. Both signed and unsigned
integers are supported, with 2's complement used for negative numbers. Both big-endian and little-endian
byte orderings are supported. Single, double and quad-precision floating-point data formats are
available. Characters are stored as 8-bit ASCII values.
Instruction Formats - 32-bits long, three basic instruction formats, first two bits identify the format.
Format 1 used for call instruction. Format 2 used for branch instructions. Format 3 used for load, store
and for arithmetic operations.
Addressing Modes - This architecture supports immediate mode, register-direct mode, PC-relative,
register indirect with displacement, and register indirect indexed.
Instruction Set - It has fewer than 100 machine instructions. The only instructions that access memory
are loads and stores. All other instructions are register-to-register operations. Instruction execution
is pipelined, which results in faster execution.
Input and Output - Communication with I/O devices is accomplished through memory. A range of
memory locations is logically replaced by device registers. When a load or store instruction refers to
this device register area of memory, the corresponding device is activated. There are no special I/O
instructions.
1.3.2.2 Cray T3E Architecture
Announced by Cray Research Inc. at the end of 1995, the T3E is a massively parallel processing
(MPP) system. It contains a large number of processing elements (PEs), arranged in a three-dimensional
network. Each PE consists of a DEC Alpha EV5 RISC processor and local memory.
Memory - Each PE in the T3E has its own local memory, with a capacity from 64 megabytes to 2
gigabytes. Memory consists of 8-bit bytes, and all addresses used are byte addresses. Two consecutive
bytes form a word, four bytes form a longword and eight bytes form a quadword.
Registers - There are 32 general purpose registers (GPRs) of 64 bits each, called R0 through R31; R31
always contains the value zero. In addition to these, there are 32 floating-point registers, each 64 bits
long, and a 64-bit PC, status and control registers.
Data Formats - Integers are stored as longword and quadword binary numbers, with 2's complement
notation for negative numbers. Only little-endian byte ordering is supported. There are two different
floating-point data formats, VAX and IEEE standard. Characters are stored as 8-bit ASCII values.
Instruction Formats - 32-bits long, five basic instruction formats. First six bits always identify the
opcode.
Addressing Modes - This architecture supports, immediate mode, register-direct mode, PC-relative, and
Register indirect with displacement.
Instruction Set - There are approximately 130 machine instructions. There are no byte or word load and
store instructions. Smith and Weiss, "PowerPC 601 and Alpha 21064: A Tale of Two RISCs", gives more
information.
Input and Output - Communication with I/O devices is accomplished through multiple ports and
I/O channels. Channels are integrated into the network that interconnects the processing elements. All
channels are accessible and controllable from all PEs.
Unit II:
Assemblers- Basic Assembler Functions- A simple SIC Assembler- Assembler Algorithm and Data
Structures. Machine-Dependent Assembler features-Instruction formats and addressing modes. Machine-
Independent Assembler features-Literals- Expressions-Program blocks. Assembler Design options--One pass
Assemblers- Multi-pass Assemblers.
2.1 Basic Functions of Assembler

• An assembler is system software that converts an assembly language program to its
equivalent object code (machine code).
• The input to the assembler is source code written in assembly language (using
mnemonics) and the output is the object code.
    Assembly language program -> Assembler -> Object code
• Functions of an assembler include:
o Translating mnemonic operation codes to their machine language
equivalents.
o Assigning machine addresses to symbolic labels used by the programmer.
• The design of an assembler depends upon the machine architecture, because the
mnemonic language it translates is specific to the machine.
2.2 A simple SIC Assembler

The translation of a source program to object code requires the following functions:
1. Convert mnemonic operation codes to their machine language equivalents. E.g., in
the program of Figure 2.1, translate STL to 14 (line 10).
2. Convert symbolic operands to their equivalent machine addresses. E.g., translate
RETADR to 1033 (line 10).
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal
machine representations. E.g., translate EOF to 454F46 (line 80).
5. Write the object program and the assembly listing.
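Function 4, the conversion of data constants, can be sketched in Python as follows. The helper names `byte_constant` and `word_constant` are hypothetical; the sketch assumes BYTE operands of the form C'...' or X'...' and a decimal WORD operand.

```python
def byte_constant(operand):
    """Convert a BYTE operand such as C'EOF' or X'F1' to object code (hex text)."""
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return body.encode("ascii").hex().upper()   # one byte per character
    if kind == "X":
        if len(body) % 2:
            raise ValueError("hex constant needs an even number of digits")
        int(body, 16)                               # raises if not valid hex
        return body.upper()
    raise ValueError(f"unknown constant type: {operand}")

def word_constant(operand):
    """Convert a decimal WORD operand to a 3-byte (6 hex digit) value."""
    return f"{int(operand) & 0xFFFFFF:06X}"

print(byte_constant("C'EOF'"))   # 454F46
print(byte_constant("X'F1'"))    # F1
print(word_constant("5"))        # 000005
```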
Consider the following assembly language program for SIC. The main routine calls two
subroutines:
• RDREC - reads a record from an input device (code F1) into a buffer.
• WRREC - writes the record from the buffer to an output device (code 05).
The end of each record is marked with a null character (hexadecimal 00). At the end of the
file the program writes EOF on the output device.
The line numbers are for reference only. Indexed addressing is indicated by adding
the modifier ",X" following the operand. Lines beginning with "." contain comments only.
Figure 2.1 –Example of a SIC assembler language program
Explanation of above program(No need to study, just to understand the program):
Data transfer (RD, WD)
A buffer (BUFFER) is used to store a record. The end of each record is marked with a
null character (hexadecimal 00). The buffer length is 4096 bytes. The end of the file is
indicated by a zero-length record. When the end of file is detected, the program writes EOF
on the output device and terminates by executing RSUB.
Subroutines (JSUB, RSUB)

RDREC is the subroutine for reading records, and WRREC is the subroutine for writing
the records to the output device. The contents of the linkage register (L) are saved in the
variable RETADR before jumping to a subroutine.
Figure 2.1 shows a sample SIC assembler language program along with the
generated object code for each statement. Assembler directives such as START, END, RESW
and RESB do not generate object code; they direct the assembler to perform certain
operations. (BYTE and WORD direct the assembler to generate data constants.) Assume the
program starts at address 1000. The first column shows the line number for each instruction,
and the second column shows the address allocated to it. The third column indicates the label
given to the statement, and is followed by the instruction consisting of opcode and operand.
The last column gives the equivalent object code. The object code will later be loaded into
memory for execution. The object program contains three types of records, as explained below.
2.3 Assembler output format - Header, Text and End Records

The object program contains three types of records:
• Header record
Col. 1       H
Col. 2-7     Program name
Col. 8-13    Starting address of object program (hex)
Col. 14-19   Length of object program in bytes (hex)
• Text record
Col. 1       T
Col. 2-7     Starting address for object code in this record (hex)
Col. 8-9     Length of object code in this record in bytes (hex)
Col. 10-69   Object code, represented in hex (2 columns per byte), so a maximum of
             30 bytes can be stored in each text record
• End record
Col. 1       E
Col. 2-7     Address of first executable instruction in object program (hex)
(In the object program listing, the symbol "^" is used only to separate fields.)
Fig 2.2 - Object code for the above example program
We have two columns per byte of object code. Each machine instruction is 3 bytes,
so it occupies 6 columns. In the first text record we are saving 10 machine instructions
of 3 bytes each. So we are storing a total of 30 bytes (60 columns), which is 1E in
hexadecimal (1E is circled in the example).
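The record layouts can be illustrated with a small Python sketch. The function names are hypothetical, and the program name COPY, length 107A and the repeated object code 141033 are sample values assumed for the illustration; fields are emitted in fixed columns without the "^" separator.

```python
def header_record(name, start, length):
    """Col. 1 'H', col. 2-7 name, col. 8-13 start address, col. 14-19 length."""
    return f"H{name:<6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    """Col. 1 'T', col. 2-7 start address, col. 8-9 byte count, col. 10-69 code."""
    body = "".join(object_codes)
    nbytes = len(body) // 2                  # two hex columns per byte
    if nbytes > 30:
        raise ValueError("a text record holds at most 30 bytes")
    return f"T{start:06X}{nbytes:02X}{body}"

def end_record(first_instruction):
    """Col. 1 'E', col. 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"

print(header_record("COPY", 0x1000, 0x107A))     # HCOPY  00100000107A
record = text_record(0x1000, ["141033"] * 10)    # ten 3-byte instructions
print(record[7:9])                               # 1E  (30 bytes)
print(end_record(0x1000))                        # E001000
```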
2.4 Design of a two pass assembler
2.4.1 Necessity of two passes and Forward reference:
Forward reference: a reference to a label that is defined later in the program.
In the above example, the instruction STL at address 1000 stores the contents of the
linkage register in RETADR. But while this instruction is being processed, the value of the
symbol RETADR is not known, because it is defined later, at address 1033.
To generate the object code for the instruction at 1000 we need the opcode for STL
and the value of the symbol RETADR, but the address of RETADR is not available until the
statement at 1033 has been processed. This reference to RETADR before it is defined is
called a forward reference.
So generating the object code by scanning the entire program only once becomes
difficult. For this reason the design is usually done in two passes. A two pass assembler
resolves forward references with the help of a symbol table (SYMTAB) and then converts the
program into object code.
Functions of the two passes of assembler:

Pass 1 (Define symbols)
1. Assign addresses to all statements in the program.
2. Save the addresses assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.
Pass 2 (Assemble instructions and generate object programs)

1. Assemble instructions (translating operation codes and looking up addresses).
2. Generate data values defined by BYTE,WORD etc.
3. Perform processing of assembler directives not done in Pass 1.
4. Write the object program and the assembly listing.
2.4.2 Data Structures Used

The data structures used in the design of 2 pass algorithm are:
• Operation Code Table (OPTAB)
• Symbol Table (SYMTAB)
• Location Counter(LOCCTR)
Operation Code Table (OPTAB)

OPTAB is used to look up mnemonic operation codes and translate them to their machine
language equivalents. In more complex assemblers the table also contains information about
instruction format and length.
In Pass 1, OPTAB is used to look up and validate the operation code in the source
program and to find the instruction length for incrementing LOCCTR. In Pass 2, it is used to
translate the operation codes to machine language.
(OPTAB is usually organized as a hash table, with the mnemonic operation code as the
key. The hash table organization is particularly appropriate, since it provides fast retrieval
with a minimum of searching. In most cases OPTAB is a static table, that is, entries
are not normally added to or deleted from it. In such cases it is possible to design a special
hashing function or other data structure to give optimum performance for the particular set
of keys being stored.)
Symbol Table (SYMTAB)

This table includes the name and value for each label in the source program, together
with flags to indicate the error conditions (e.g., if a symbol is defined in two different
places).
During Pass 1, labels are entered into the symbol table along with their assigned
address values as they are encountered. All symbol address values should be resolved by the
end of Pass 1. During Pass 2, symbols used as operands are looked up in the symbol table to
obtain the address values to be inserted in the assembled instructions. SYMTAB is usually
organized as a hash table for efficiency of insertion and retrieval. A sample SYMTAB is
shown below.
Location Counter (LOCCTR)

Apart from SYMTAB and OPTAB, LOCCTR is another important variable, which
helps in the assignment of addresses. LOCCTR is initialized to the beginning address
mentioned in the START statement of the program. After each statement is processed, the
length of the assembled instruction is added to LOCCTR to make it point to the next
instruction. Whenever a label is encountered in a statement, the current value of LOCCTR
gives the address to be associated with that label.
(Both Pass 1 and Pass 2 require reading the source program. For this reason, Pass 1
creates an intermediate file that contains each source statement together with its assigned
address, error indicators, etc. This file is used as the input to Pass 2. This working copy of
the source program can also be used to retain the results of certain operations performed
during Pass 1 (such as scanning the operand field for symbols and addressing flags), so that
these need not be performed again during Pass 2.)
2.4.3 The Algorithm for Pass 1
Explanation of Pass 1 Algorithm:
The algorithm scans the first statement, START, and saves the operand field (the
address) as the starting address of the program. It initializes LOCCTR to this address and
writes the line to the intermediate file. If no starting address is given, LOCCTR is
initialized to zero.
If a label is encountered, the symbol has to be entered in the symbol table along with
its associated address value. If the symbol is already present in the table, it has been
defined twice, so an error flag is set indicating a duplicate symbol.
Next the algorithm searches for the mnemonic code in OPTAB. If it is found, the
length of the instruction is added to LOCCTR to make it point to the next instruction.
If the opcode is the assembler directive WORD, 3 is added to LOCCTR. If it is RESW,
the number of bytes reserved is added (each word is 3 bytes, so 3 * number of words). If it
is BYTE, the length of the constant in bytes is added; if RESB, the number of bytes
reserved. If it is the END directive, the end of the program has been reached, and the length
of the program is found by subtracting the starting address (saved from the START
statement) from the current LOCCTR. Each processed line is written to the intermediate file.
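The explanation above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm figure from the notes: source lines are given as (label, opcode, operand) tuples, OPTAB holds only a few standard SIC opcodes, every instruction is 3 bytes long, and the sample program and all names are hypothetical.

```python
# A small subset of SIC opcodes (assumed from the standard SIC opcode table).
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "ADD": 0x18,
         "JSUB": 0x48, "RSUB": 0x4C}

def byte_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else len(body) // 2

def pass1(lines):
    """Return (symtab, program_length, intermediate, errors)."""
    symtab, errors, intermediate = {}, [], []
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)   # starting address is hex
            intermediate.append((locctr, opcode, operand))
            continue
        if opcode == "END":
            intermediate.append((locctr, opcode, operand))
            break
        here = locctr                           # address of this statement
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = here
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
        intermediate.append((here, opcode, operand))
    return symtab, locctr - start, intermediate, errors

program = [("SUM", "START", "1000"), ("FIRST", "LDA", "ALPHA"),
           (None, "ADD", "INCR"), (None, "STA", "BETA"),
           (None, "RSUB", None), ("ALPHA", "WORD", "5"),
           ("INCR", "WORD", "1"), ("BETA", "RESW", "1"),
           (None, "END", "FIRST")]
symtab, length, intermediate, errors = pass1(program)
print({k: hex(v) for k, v in symtab.items()}, hex(length))
```

Here the forward references to ALPHA, INCR and BETA cause no difficulty, because Pass 1 only assigns addresses and leaves operand lookup to Pass 2.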
2.4.4 The Algorithm for Pass 2
Explanation of Pass 2 Algorithm:
The first input line is read from the intermediate file. If the opcode is START, this line
is written directly to the listing file (output file). A Header record is then written to
the object program, giving the starting address and the length of the program (which is
calculated during Pass 1).
Then the first Text record is initialized. Comment lines are ignored. For every other line,
OPTAB is searched to find the machine code of the opcode. If there is a symbol in the
operand field, SYMTAB is searched for its address, which is combined with the opcode to form
the object code. If the symbol is not found in SYMTAB, zero is stored as the operand address
and an error flag is set to mark the symbol as undefined. If there is no symbol in the
operand field, zero is also stored as the operand address. The object code of the
instruction is then assembled.
If the opcode is BYTE or WORD, the constant value is converted to its equivalent object
code (for example, for the characters EOF the equivalent hexadecimal value '454F46' is
stored). If the object code does not fit into the current Text record, that record is
written to the object program and a new Text record is started. Once the whole program has
been assembled and the END directive is encountered, the last Text record and the End
record are written.
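The Pass 2 steps above can be illustrated with a short Python sketch. It assumes the simple SIC format (8-bit opcode followed by a 16-bit address field with no indexing), a Text record limit of 30 bytes, and a symbol table like the one Pass 1 would produce; the names assemble, constant_code and text_records are hypothetical helpers for this example only.

```python
# Minimal Pass 2 sketch (illustrative assumptions: simple SIC format, no indexing).
OPTAB = {"LDA": 0x00, "STA": 0x0C, "RSUB": 0x4C}


def constant_code(opcode, operand):
    """Object code for WORD and BYTE constants, as a hex string."""
    if opcode == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"
    if operand.startswith("C'"):
        return operand[2:-1].encode("ascii").hex().upper()
    return operand[2:-1].upper()                   # X'..' constant


def assemble(opcode, operand, symtab, errors):
    """Object code for one statement; '' when nothing is generated."""
    if opcode in OPTAB:
        address = 0
        if operand:
            if operand in symtab:
                address = symtab[operand]
            else:
                errors.append(f"undefined symbol {operand}")   # address stays 0
        return f"{OPTAB[opcode]:02X}{address:04X}"
    if opcode in ("BYTE", "WORD"):
        return constant_code(opcode, operand)
    return ""                                      # RESW / RESB reserve storage only


def text_records(items, max_bytes=30):
    """items: (address, object code) pairs in address order."""
    records, start, body = [], None, ""

    def flush():
        nonlocal start, body
        if body:
            records.append(f"T{start:06X}{len(body) // 2:02X}{body}")
        start, body = None, ""

    for address, code in items:
        if not code:                               # reserved storage ends the record
            flush()
            continue
        if start is not None and (len(body) + len(code)) // 2 > max_bytes:
            flush()
        if start is None:
            start = address
        body += code
    flush()
    return records


symtab = {"THREE": 0x1009, "BUFFER": 0x100F}
errors = []
source = [
    (0x1000, "LDA", "THREE"), (0x1003, "STA", "BUFFER"), (0x1006, "RSUB", ""),
    (0x1009, "WORD", "3"), (0x100C, "BYTE", "C'EOF'"), (0x100F, "RESB", "16"),
]
RECORDS = text_records((loc, assemble(op, arg, symtab, errors)) for loc, op, arg in source)
```

Here the five generated items fit into a single Text record of 15 (hex 0F) bytes; the RESB statement produces no object code and closes the record.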
Machine Dependent Assembler Features

The features which are closely related to (dependent on) the machine architecture are called
machine dependent assembler features. The machine dependent assembler features include:
1. Instruction Formats and Addressing Modes

2. Program Relocation
Instruction Formats and Addressing Modes
Study the instruction formats and addressing modes of SIC/XE from first module.
Program Relocation
Sometimes it is required to load and run several programs at the same time. The
system must be able to load these programs wherever there is place in the memory.
Therefore the exact starting address is not known until the load time.
In an absolute program the starting address at which the program has to be loaded is
given in the program itself using the START directive. So the addresses of all
instructions and labels are known at assembly time. This is called absolute addressing.
Consider an example
This statement says that register A is loaded with the value stored at location
102D (which is the address of THREE). Suppose we need to load and execute the program at
location 3000 instead of location 1000. Once the program is loaded at location 3000,
address 102D no longer contains the value that is to be loaded into register A, because
the addresses of all symbols shift by the same displacement as the program. Hence the
address portion of the instruction must be changed so that the program can be loaded and
executed at location 3000.
Since the assembler does not know the actual location where the program will be loaded, it
cannot make the necessary changes to the addresses used in the program. However, the
assembler can identify those parts of the program which need modification and inform the
loader about them. An object program that contains the information necessary to perform
this kind of modification is called a relocatable program.
The above diagram shows the concept of relocation. Initially the program is loaded at
location 0000. The instruction JSUB is at location 0006, and its address field contains
01036, which is the address of the instruction labeled RDREC. The second figure shows the
program loaded at the new location 5000: the address field of the JSUB instruction must be
changed to 06036 (5000 + 1036). Likewise the third figure shows that if the program is
relocated to location 7420, the JSUB instruction must be changed to 4B108456, which
corresponds to the new address of RDREC (7420 + 1036 = 8456).
The only parts of the program that require modification at load time are those that
specify direct addresses. The rest of the instructions need not be modified: these are
the instructions whose operand is not a memory address (immediate addressing) and those
that use PC-relative or base-relative addressing.
It is not possible for the loader to distinguish addresses from constants in the
object program. So the assembler must tell the loader which parts of the object program
need to be modified. For this purpose the Modification record is used.
Modification record is a type of record which is added to the object program. One
modification record is created for each address to be modified. The assembler produces a
modification record to store the starting location and the length of the address field to be
modified.
The Modification record has the following format:
 The length is stored in half-bytes (4 bits)

 The starting location is the location of the byte containing the leftmost bits of the
address field to be modified.
 If the field contains an odd number of half-bytes, the starting location begins in
the middle of the first byte.
Example for a relocatable object program:
 The object code lines at the end starting with M are the modification records for
those instructions which need to be changed if relocation occurs.
 M00000705 describes the address field that starts at byte 000007 (it belongs to the
instruction at location 0006) and is 5 half-bytes long.
 The remaining modification records are interpreted similarly.
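How a loader might use such a record can be sketched in Python. The sketch assumes the record layout M, a 6-digit starting location and a 2-digit length in half-bytes (as in M00000705), and a program image assembled for address 0; the function names are hypothetical. It reuses the JSUB example above, whose object code is 4B101036 at location 0006.

```python
# Sketch of relocation with Modification records (assumed layout: M + 6 hex
# digits starting location + 2 hex digits length in half-bytes).
def modification_record(start, half_bytes):
    """Record the assembler would emit for one address field."""
    return f"M{start:06X}{half_bytes:02X}"


def relocate(image, records, load_address):
    """Return a copy of the program image with every listed field adjusted."""
    memory = bytearray(image)
    for record in records:
        start = int(record[1:7], 16)
        half_bytes = int(record[7:9], 16)
        nbytes = (half_bytes + 1) // 2             # an odd length shares its first byte
        field = int.from_bytes(memory[start:start + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1         # low-order half-bytes = address field
        new = (field & ~mask) | ((field + load_address) & mask)
        memory[start:start + nbytes] = new.to_bytes(nbytes, "big")
    return memory


# Six filler bytes, then +JSUB RDREC (4B101036) at location 0006.
image = bytearray(6) + bytearray.fromhex("4B101036")
record = modification_record(7, 5)                 # "M00000705"
at_5000 = relocate(image, [record], 0x5000)
at_7420 = relocate(image, [record], 0x7420)
```

With a load address of 5000 the instruction becomes 4B106036, and with 7420 it becomes 4B108456, matching the relocation example. Because the field is 5 half-bytes long, the high-order half-byte of byte 7 (the flag bits) is left untouched.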
MACHINE INDEPENDENT ASSEMBLER FEATURES
The features which are NOT closely dependent on the machine architecture are called
machine independent assembler features. The machine independent assembler features
include:
1. Literals
2. Symbol Defining Statements
3. Expressions
4. Program Blocks
5. Control Sections and Program Linking
LITERALS
 It is convenient for the programmer to be able to write the value of a constant operand as
part of the instruction that uses it.
 This avoids having to define the constant elsewhere in the program and make a label for it.
 Such an operand is called a Literal because the value is literally in the instruction.
 A literal is defined with a prefix '=' followed by a specification of the literal value.
 Consider the following example:
.
.
LDA FIVE
.
.
FIVE WORD 5
Using the concept of literal we can rewrite the above code as:
.
.
LDA =X’05’
Difference between literal operands and immediate operands
 For literals prefix is =, and for immediate addressing prefix is #.
 In immediate addressing, the operand value is assembled as part of the machine instruction,
ie there is no memory reference.
Line no   Location Counter   Source statement   Object code
55        0020               LDA #03            010003
In the above example the last 12 bits of the machine code correspond to 003, which is
equal to the immediate value.
 With a literal, the assembler generates the specified value as a constant at some other
memory location. The address of this generated constant is used as the target address (TA)
for the machine instruction ( using PC-relative or base-relative addressing with memory
reference.)
Literal Pool
 All the literal operands used in a program are gathered together into one or more literal
pools. This is usually placed at the end of the program.
 In some cases, it is desirable to place literals into a pool at some other location in the
object program. To allow this an assembler directive LTORG is used.
 When the assembler encounters a LTORG statement, it generates a literal pool containing
all literal operands used since previous LTORG or the beginning of the program
 Literals placed in a pool by LTORG will not be repeated in a pool at the end of the
program.
 The reason for using LTORG is to keep the literal operands close to the instructions that
use them (otherwise the displacement may be too large for PC-relative addressing).
Literal Table (LITTAB)
 A literal table (LITTAB) is created to store the literals which are used in the program.
 For each literal, the literal table contains the literal name, the operand value and
length, and the address assigned to the operand when it is placed in a literal pool.
 The literal table is usually organized as a hash table on the literal name.
Duplicate literals
 If the same literal is used more than once in the program, it is a duplicate literal.
 In such cases, only one copy of the specified value needs to be stored.
 There are two methods to recognize duplicate literals:
1. Compare the character strings defining them
Easy to implement, e.g. two uses of =X’05’ are recognized as the same literal. But this
method cannot recognize that =C’EOF’ and =X’454F46’ are duplicates, although both have
the same data value.
2. Compare the generated data values
This recognizes literals like =C’EOF’ and =X’454F46’ as duplicates, because both generate
the same data value. It is more difficult to implement than the first method.
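The difference between the two methods can be shown with a few lines of Python. The helper literal_value is a hypothetical name; it handles only C'...' and X'...' literals written with plain apostrophes and assumes ASCII character codes.

```python
# Sketch: recognising duplicate literals by name versus by generated value.
def literal_value(literal):
    """Bytes generated for a literal such as =C'EOF' or =X'454F46'."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal {literal}")


first, second = "=C'EOF'", "=X'454F46'"
same_by_name = first == second                                   # method 1
same_by_value = literal_value(first) == literal_value(second)    # method 2
```

Method 1 treats the two literals as different because the strings differ, while method 2 finds that both generate the same three bytes.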
Implementation of Literals
During Pass-1:
Each literal encountered is searched for in the literal table. If the literal already
exists, no action is taken; if it is not present, the literal is added to LITTAB and its
address is left unassigned until an LTORG or END statement is encountered.
When Pass 1 encounters an LTORG statement or the end of the program, the
assembler makes a scan of the literal table. At this time each literal currently in the table
that does not yet have an address is assigned one. As addresses are assigned, the location
counter is updated to reflect the number of bytes occupied by each literal.
During Pass-2:
For each literal operand encountered in an instruction, the assembler searches LITTAB
and uses the address of the literal as the operand address. The data values of the literals
in each pool are inserted into the object program as if they had been generated by BYTE or
WORD statements.
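A minimal Python sketch of this LITTAB handling is given below. The table layout, the function names note_literal and place_pool, and the two pool addresses (002D for an LTORG and 1076 for the end of the program) are assumptions chosen only for illustration.

```python
# Sketch of LITTAB handling in Pass 1 (hypothetical names and addresses).
def literal_bytes(literal):
    """Bytes generated for =C'...' or =X'...' literals."""
    kind, body = literal[1], literal[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)


LITTAB = {}   # literal name -> {"value": bytes, "length": int, "address": int or None}


def note_literal(name):
    """Called when a literal operand is met; duplicates need no action."""
    if name not in LITTAB:
        value = literal_bytes(name)
        LITTAB[name] = {"value": value, "length": len(value), "address": None}


def place_pool(locctr):
    """Called at LTORG or END: assign addresses to literals not yet placed."""
    for entry in LITTAB.values():
        if entry["address"] is None:
            entry["address"] = locctr
            locctr += entry["length"]
    return locctr


note_literal("=C'EOF'")
note_literal("=X'05'")
note_literal("=C'EOF'")                # duplicate, ignored
after_ltorg = place_pool(0x002D)       # LTORG met while LOCCTR = 002D (assumed)
note_literal("=X'F1'")
after_end = place_pool(0x1076)         # END met while LOCCTR = 1076 (assumed)
```

Literals already placed by the first pool keep their addresses, so only =X'F1' goes into the pool generated at the end of the program.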
The following figure shows the difference between the SYMTAB and LITTAB
3.1.2 SYMBOL DEFINING STATEMENTS AND EXPRESSIONS

EQU Statement:
 Most assemblers provide an assembler directive that allows the programmer to define
symbols and specify their values.
 The directive used for this is EQU (Equate).
 The general form of the statement is
Symbol EQU value
 This statement defines the given symbol (i.e., entering in the SYMTAB) and assigns the
value specified to that symbol.
 The value can be a constant or an expression involving constants and any other symbol
which is already defined.
 One common usage is to define symbolic names that can be used to improve readability
in place of numeric values. For example
LDA #100
This loads register A with the immediate value 100, but it does not make clear what
this value represents. If the statement
MAXLEN EQU 100
is included and the instruction is written as LDA #MAXLEN, it is clear that the value
being loaded into register A is some maximum length.
 When the assembler encounters the EQU statement, it enters the symbol MAXLEN along
with its value in the symbol table. When it assembles the LDA instruction, the assembler
searches SYMTAB for MAXLEN and uses its value as the operand of the instruction.
 The object code generated is the same for both options, but the version using MAXLEN
is easier to understand.
 If the maximum length is changed from 100 to 500 and the value is written as an
immediate value wherever it is required, we have to scan the whole program and change
every instruction in which 100 is used.
 If the value is referred to through the symbol defined by EQU, we need not search the
whole program; only the value of MAXLEN in the EQU statement has to be changed.
ORG Statement:
 This directive can be used to indirectly assign values to the symbols. The directive is
usually called ORG (means origin).
 Its general format is:
ORG value
where value is a constant or an expression involving constants and previously defined
symbols.
 When this statement is encountered during assembly of a program, the assembler resets its
location counter (LOCCTR) to the specified value.
 Since the values of symbols used as labels are taken from LOCCTR, the ORG statement
will affect the values of all labels defined until the next ORG is encountered.
 Eg: ORG ALPHA
When this statement is encountered during assembly of a program, the assembler resets its
location counter (LOCCTR) to the value of ALPHA, so the labels that follow are assigned
addresses starting from ALPHA.
EXPRESSIONS
 The assemblers allow the use of expressions as operand
 The assembler evaluates the expressions and produces a single operand address or value.
 Assemblers generally allow arithmetic expressions as operands, formed according to the
normal rules using the arithmetic operators +, -, *, /. (Division is usually defined to
produce an integer result.)
 Individual terms may be constants, user-defined symbols, or special terms.
 The only special term used is * (the current value of the location counter), which
indicates the value of the next unassigned memory location.
Thus the statement
BUFEND EQU *
assigns the value of LOCCTR to BUFEND, which is the address of the next byte
following the buffer area.
 Some values in the object program are relative to the beginning of the program and some
are absolute (independent of the program location, like constants). Hence, expressions are
classified as either absolute expression or relative expressions depending on the type of
value they produce.
 Absolute Expressions:
 An expression that uses only absolute terms is an absolute expression. An absolute
expression may also contain relative terms, provided the relative terms occur in pairs
and the terms in each pair have opposite signs.
 Example:
MAXLEN EQU BUFEND-BUFFER
In the above statement the difference BUFEND-BUFFER gives a value that does not
depend on the location of the program and hence is an absolute value.
 Relative Expressions:
 An expression whose value is relative to the beginning of the program is called a
relative expression.
 In a relative expression all relative terms except one can be paired with opposite
signs; the remaining unpaired relative term must have a positive sign.
 Example:
MAXLEN EQU ALPHA + BUFEND-BUFFER
In the above statement the difference BUFEND-BUFFER gives a value that does not
depend on the location of the program, but it is added to the value of ALPHA, which
is program relative. Hence this expression is relative.
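The pairing rule can be checked mechanically by counting relative terms with their signs. The following Python sketch is a simplification that handles only + and - between terms; the function name classify and the set of relative symbols are assumptions for the example.

```python
# Sketch: classify an expression as absolute or relative by counting
# relative terms with their signs (handles + and - only).
import re


def classify(expression, relative_symbols):
    if re.search(r"[*/]", expression):
        raise ValueError("this sketch handles + and - only")
    net = 0
    for sign, term in re.findall(r"([+-]?)\s*([A-Za-z][A-Za-z0-9]*|\d+)", expression):
        if term in relative_symbols:
            net += -1 if sign == "-" else 1
    if net == 0:
        return "absolute"      # every relative term is cancelled by a paired one
    if net == 1:
        return "relative"      # exactly one unpaired, positive relative term
    return "illegal"


RELATIVE = {"ALPHA", "BUFFER", "BUFEND"}
kinds = {
    "BUFEND-BUFFER": classify("BUFEND-BUFFER", RELATIVE),
    "ALPHA + BUFEND-BUFFER": classify("ALPHA + BUFEND-BUFFER", RELATIVE),
    "BUFEND+BUFFER": classify("BUFEND+BUFFER", RELATIVE),
    "100-BUFFER": classify("100-BUFFER", RELATIVE),
}
```

A net count of 0 means the relative terms cancel in pairs, a net count of 1 means one positive relative term is left over, and anything else is neither absolute nor relative and would be flagged as an error.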
3.1.3 PROGRAM BLOCKS

 Program blocks allow the generated machine instructions and data to appear in the
object program in a different order from the source program, for example by grouping
code, small data areas and larger data areas into separate blocks.
 To implement the program block the Assembler Directive used is USE
 Syntax is
USE [block name]
 At the beginning, statements are assumed to be part of the unnamed (or default) block.
 Whenever a USE CDATA statement is encountered, the statements up to the next USE
belong to the program block named CDATA.
 If no USE statements are included, the entire program belongs to this single block.
 Each program block may actually contain several separate segments of the source
program. Assemblers rearrange these segments to gather together the pieces of each
block and assign address.
 Consider the following example:
COPY    START   0
        LDA     LENGTH
        ………
        ………
        USE     CDATA
MAX     RESW    1
LENGTH  RESW    1
        USE     CBLKS
BUFFER  RESB    00
        ………
//Subroutine to read record into buffer
        USE
RDREC   CLEAR   X
        LDA     INPUT
        ………..
        …………
        USE     CDATA
INPUT   BYTE    X’F1’
        …………
//Subroutine to write record from buffer
        USE
WRREC   STA     MAX
        ………
        USE     CDATA
MIN     RESW    1
BUFEND  RESW    1
In the example given above, three program blocks are used:
DEFAULT: executable instructions.
CDATA: all data areas that are short in length.
CBLKS: all data areas that consist of larger blocks of memory.
Arranging code into program blocks:

During Pass 1 assembler performs the following operations:
 A separate location counter for each program block is maintained.
 At the beginning of a block, LOCCTR is set to 0.
 Save and restore LOCCTR when switching between blocks.
 Assign each label an address relative to the start of the block.
 Store the block name or number in the SYMTAB along with the assigned relative
address of the label
 Indicate the block length as the latest value of LOCCTR for each block at the end of
Pass1
 Assign to each block a starting address in the object program by concatenating the
program blocks in a particular order
 At the end of pass 1 a block table is generated.
Block Table
Block Name   Block Number   Starting Address   Ending Address   Length of Block
Default      0              0000               0065             0066
CDATA        1              0066               0070             000B
CBLKS        2              0071               1070             1000
During Pass 2 assembler performs the following operations:
 Calculate the address for each symbol relative to the start of the object program by
adding
o The location of the symbol relative to the start of its block
o The starting address of this block
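The two steps (building the block table at the end of Pass 1 and converting block-relative addresses in Pass 2) can be sketched in Python. The block lengths are taken from the Block Table above; the function names and the sample SYMTAB entry (a symbol at relative location 0003 in CDATA) are assumptions for the example.

```python
# Sketch: block table construction and Pass 2 address calculation.
def build_block_table(block_lengths):
    """block_lengths: block name -> length, in the order blocks are concatenated."""
    table, start = {}, 0
    for number, (name, length) in enumerate(block_lengths.items()):
        table[name] = {"number": number, "start": start, "length": length}
        start += length
    return table


def absolute_address(entry, table):
    """entry: (block name, address relative to the start of that block)."""
    block, relative = entry
    return table[block]["start"] + relative


BLOCKS = build_block_table({"DEFAULT": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000})
LENGTH_ADDR = absolute_address(("CDATA", 0x0003), BLOCKS)   # assumed SYMTAB entry
```

The starting addresses come out as 0000, 0066 and 0071, as in the Block Table, and the symbol at relative location 0003 in CDATA gets the address 0066 + 0003 = 0069.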
Program Blocks Loaded in Memory
Separating the program into blocks moves the large buffer area (CBLKS) to the end of
the object program. As a result extended format instructions, base register
addressing, etc. are no longer needed, and Modification records are also not needed.
Program readability is improved as well.
3.1.4 CONTROL SECTIONS
 A control section is a part of the program that maintains its identity after assembly;
each control section can be loaded and relocated independently of the others.
Different control sections are most often used for subroutines or other logical
subdivisions.
 The programmer can assemble, load, and manipulate each of these control sections
separately. Because of this, there should be some means for linking control sections
together.
 For example, instructions in one control section may refer to the data or instructions
of other control sections. Since control sections are independently loaded and
relocated, the assembler is unable to process these references in the usual way. Such
references between different control sections are called external references.
 The assembler generates information about each external reference that allows the
loader to perform the required linking. When a program is written using multiple
control sections, the beginning of each control section is indicated by the assembler
directive CSECT.
 The syntax
secname CSECT
 The assembler maintains a separate LOCCTR, beginning at 0, for each control section.
 Control sections differ from program blocks in that they are handled separately by the
assembler.
Handling of External References
Instructions in one control section may need to refer to instructions or data located in
another section. Such references are called external references. The symbols involved are
declared using two assembler directives: EXTDEF and EXTREF.
EXTDEF (External Definition)

 It names the symbols that are defined in this control section and may be used by
other sections.
 Syntax - EXTDEF name [, name]
 Ex: EXTDEF BUFFER, BUFEND, LENGTH means that the symbols BUFFER, BUFEND and
LENGTH are defined in this control section and may be used by other control sections.
EXTREF (External Reference)

 It names symbols that are used in this section but are defined in some other control
section.
 Syntax - EXTREF name [,name]
 Ex: EXTREF A,B which means the symbols A and B are used in this control section
but are defined in some other control section.
The assembler must include information in the object program that will cause the loader to
handle external references properly. For this three types of records are used in object
program: Define, Refer and Modification Record.
The format of modification record which we studied in Module 2 is revised to support
the handling of external references.
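As a rough illustration of the three record types, the Python sketch below formats Define, Refer and revised Modification records. It assumes 6-column names, 6-digit hexadecimal addresses, and a Modification record extended with a sign flag and an external symbol name; the function names are hypothetical and the field separators (^) used in printed object programs are omitted.

```python
# Sketch: formatting Define, Refer and revised Modification records
# (assumed layout: 6-column names, 6-digit hexadecimal addresses).
def define_record(symbols):
    """symbols: external symbol name -> address relative to this control section."""
    return "D" + "".join(f"{name:<6}{address:06X}" for name, address in symbols.items())


def refer_record(names):
    """names: external symbols referred to in this control section."""
    return "R" + "".join(f"{name:<6}" for name in names)


def modification_record(start, half_bytes, sign, symbol):
    """Field at 'start' gets the value of 'symbol' added (+) or subtracted (-)."""
    return f"M{start:06X}{half_bytes:02X}{sign}{symbol:<6}"


D = define_record({"BUFFER": 0x000033, "BUFEND": 0x001033, "LENGTH": 0x00002D})
R = refer_record(["A", "B"])
M = modification_record(0x000004, 5, "+", "RDREC")
```

The loader uses the D records of all control sections to find the addresses of external symbols, and each M record then tells it which field to adjust and by which symbol's value.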
Consider the following code segments:
COPY    START   0
        EXTDEF  BUFFER,BUFEND,LENGTH
        EXTREF  A,B
        LDA     ALPHA
        ………………
        ………………
        ………………
BUFFER  WORD    3
BUFEND  EQU     *
LENGTH  EQU     BUFEND-BUFFER
RDREC   CSECT
        EXTREF  BUFFER,BUFEND,LENGTH
        ………………………..
        ………………………..
        LDA     BUFFER
        …………………………
        …………………………..
        END
The object program generated for the above code segment is:
H^COPY  ^000000^001033
D^BUFFER^000033^BUFEND^001033^LENGTH^00002D
R^A ^B
T^…………………………
T^……………………………
……………………………….
M^000004^05^+RDREC
……………………………
E^000000
3.2 ASSEMBLER DESIGN OPTIONS

In this section, two alternatives to the standard two-pass assembler logic are discussed.
They are:
Single Pass Assembler
Multipass Assembler
3.2.1 SINGLE PASS ASSEMBLER

These assemblers are used when it is necessary or desirable to avoid a second pass
over the source program. The main problem in designing a single-pass assembler is
resolving forward references.
One-pass assemblers can produce object code either in memory or on external
storage. They usually need to modify object code that has already been generated, so
where the object code is stored imposes different considerations on the assembler
design. Based on this, one-pass assemblers can be classified into two types:
1. One that produces object code directly in memory for immediate execution (Load-
and-go assemblers).
2. One pass assembler generating object code for later execution.
1. Load-and-Go Assembler
A load-and-go assembler generates its object code in memory for immediate
execution. Since no object program is written out, no loader is needed. It is useful in a
system with frequent program development and testing. Since the object program is produced
in memory, the handling of forward references becomes less difficult.
Working of One pass assembler (Load and Go Assembler)
In a load-and-go assembler, when a forward reference is encountered, the assembler:
 Omits the operand address if the symbol has not yet been defined (places zeros in the
operand address position)
 Enters this undefined symbol into SYMTAB and indicates that it is undefined
 Adds the location at which the operand is referenced to a list of forward references
associated with the SYMTAB entry
 When the definition of the symbol is encountered, scans the reference list and inserts
the address at each listed location
 At the end of the program, reports an error if any SYMTAB entries are still marked
as undefined (* indicates undefined)
 When the END statement is encountered, searches SYMTAB for the symbol named in
the END statement and jumps to this location to begin execution if there is no error
In short, whenever an undefined symbol is encountered, it is inserted into SYMTAB as a
new entry marked as undefined, and the location at which the operand is referenced is
added to a linked list associated with that SYMTAB entry. When the definition of the
symbol is encountered, the reference list is scanned and the address is inserted at each
listed location.
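The bookkeeping described above can be sketched in Python as follows. Memory is modelled as a bytearray, the operand address is assumed to occupy the two bytes that follow the opcode (no indexing), and the names use_symbol, define_symbol and undefined_symbols are hypothetical. The reference locations 2013, 201C and 201F are the ones used in the example of this section; the definition address 203D is an assumed value.

```python
# Sketch of forward-reference handling in a load-and-go assembler.
MEMORY = bytearray(0x3000)
SYMTAB = {}   # name -> {"value": int or None, "refs": [addresses of operand fields]}


def use_symbol(name, field_address):
    """Operand reference: return the address to assemble (0 if still undefined)."""
    entry = SYMTAB.setdefault(name, {"value": None, "refs": []})
    if entry["value"] is None:
        entry["refs"].append(field_address)        # remember where to patch later
        return 0
    return entry["value"]


def define_symbol(name, value):
    """Label definition: record the value and patch all pending references."""
    entry = SYMTAB.setdefault(name, {"value": None, "refs": []})
    if entry["value"] is not None:
        raise ValueError(f"duplicate symbol {name}")
    entry["value"] = value
    for field_address in entry["refs"]:
        MEMORY[field_address:field_address + 2] = value.to_bytes(2, "big")
    entry["refs"].clear()


def undefined_symbols():
    """Symbols still undefined at END (these would be reported as errors)."""
    return [name for name, entry in SYMTAB.items() if entry["value"] is None]


first = use_symbol("RDREC", 0x2013)
use_symbol("ENDFIL", 0x201C)
use_symbol("WRREC", 0x201F)
define_symbol("RDREC", 0x203D)          # 203D is an assumed address
```

After RDREC is defined, the two bytes at 2013 hold 203D, while ENDFIL and WRREC remain on the undefined list until their definitions are met.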
Algorithm for Single Pass Assembler (Load and Go Assembler)

begin
  read first input line
  if OPCODE = ‘START’ then
  {
    save #[OPERAND] as starting address
    initialize LOCCTR as starting address
    read next input line
  } //end of if OPCODE = ‘START’
  else
    initialize LOCCTR to 0
  write Header record to object program
  while OPCODE ≠ ‘END’
  {
    if this is not a comment line
    {
      if there is a symbol in the LABEL field
      {
        search SYMTAB for LABEL
        if found
        {
          if symbol value is null
          {
            set symbol value as LOCCTR
            search the attached forward reference list (if it exists) and insert
              the address of the symbol into any instructions previously generated
            delete the forward reference list attached to that symbol
          }
          else
            set error flag (duplicate symbol)
        }
        else
          insert (LABEL, LOCCTR) into SYMTAB
      } //end of if there is a symbol in the LABEL field
      search OPTAB for OPCODE
      if found
      {
        search SYMTAB for OPERAND address
        if found
        {
          if symbol value not equal to null
            store symbol value as operand address
          else
            insert a node with address LOCCTR at the end of the forward
              reference list of that symbol
        }
        else
        {
          insert (symbol name, null) into SYMTAB
          insert a node with address LOCCTR at the end of the forward
            reference list of that symbol
        }
        add 3 to LOCCTR
      }
      else if OPCODE = ‘WORD’
      {
        add 3 to LOCCTR
        convert constant to object code
      }
      else if OPCODE = ‘RESW’
        add 3 * #[OPERAND] to LOCCTR
      else if OPCODE = ‘RESB’
        add #[OPERAND] to LOCCTR
      else if OPCODE = ‘BYTE’
      {
        find length of constant in bytes
        add length to LOCCTR
        convert constant to object code
      }
      if object code will not fit into current text record
      {
        write Text record to object program
        initialize new text record
      }
      add object code to Text record
    } //end of if this is not a comment line
    read next input line
  } //end of while OPCODE ≠ ‘END’
  write last Text record to object program
  write End record to object program
end
Example:
The following figure shows the status up to this point. The symbol RDREC is referred to
once, at location 2013, ENDFIL at 201C and WRREC at location 201F. None of these symbols
has been defined yet. The figure shows how the pending references, along with their
addresses, are kept in the symbol table.
When the definitions of the symbols RDREC and ENDFIL are encountered, the
reference list associated with each symbol is scanned and the address is inserted at the
proper locations. This is shown in the following figure:
2. One pass assembler generating object code for later execution.

In this type of one-pass assembler, the generated object program is stored on external
storage (e.g., files on disk). So random updates to operand target addresses (as
load-and-go assemblers do) are not possible.
For any symbol involved in a forward reference, once the target address of the symbol
is known, additional Text records must be generated to overwrite the previously omitted
target addresses. Records must be loaded in the same order as they appear in the object
program. In effect, the handling of forward references is done jointly by the assembler
and the linking loader.
A one-pass assembler that generates an object program, unlike a load-and-go assembler,
operates in the following fashion:
 If the operand contains an undefined symbol, use 0 as the address and write the Text
record to the object program.
 Forward references are entered into lists as in the load-and-go assembler.
 When the definition of a symbol is encountered, the assembler generates another Text
record with the correct operand address for each entry in the reference list.
 When loaded, the incorrect address 0 is overwritten by the later Text record
containing the address of the symbol.
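The cooperation between assembler and loader can be illustrated in Python. The sketch assumes Text records of the form T, 6-digit address, 2-digit length, object code, and a 2-byte operand address field; the instruction at 201B and the definition address 2024 for ENDFIL are assumed values used only to make the example concrete.

```python
# Sketch: patching forward references with additional Text records.
def patch_records(symbol_value, reference_list):
    """One extra Text record per pending reference to the newly defined symbol."""
    return [f"T{location:06X}02{symbol_value:04X}" for location in reference_list]


def load(records, memory):
    """Apply Text records strictly in the order they appear in the object program."""
    for record in records:
        address = int(record[1:7], 16)
        length = int(record[7:9], 16)
        memory[address:address + length] = bytes.fromhex(record[9:9 + 2 * length])


# An instruction at 201B (opcode 30) refers to ENDFIL before it is defined,
# so its address field, which starts at 201C, is first assembled as 0000.
object_program = ["T00201B03300000"]
object_program += patch_records(0x2024, [0x201C])   # ENDFIL later defined at 2024

memory = bytearray(0x3000)
load(object_program, memory)
```

Because the patch record is loaded after the original Text record, the zero address is overwritten and the instruction in memory ends up as 302024.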
Example:
3.2.2 MULTI PASS ASSEMBLER

 For a two pass assembler, forward references in symbol definition are not allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
 Here the problem is that the symbol BETA cannot be assigned a value when it is
encountered during Pass 1, because DELTA has not yet been defined. Hence ALPHA cannot be
evaluated during Pass 2. A two-pass assembler therefore requires that every symbol
definition be completed in Pass 1.
 The general solution for this type of forward references is to use a multi-pass assembler
that can make as many passes as are needed to process the definitions of symbols.
 It is not necessary for such an assembler to make more than 2 passes over the entire
program.
 The portions of the program that involve forward references in symbol definition are
saved during Pass 1.Additional passes through these stored definitions are made as the
assembly progresses. This process is followed by a normal Pass 2.
Implementation of Multipass Assembler
 For a forward reference in symbol definition, we store in the SYMTAB:
o The symbol name
o The defining expression
o The number of undefined symbols in the defining expression
 Each undefined symbol (marked as *) has an associated list of the symbols whose values
depend on it.
 When a symbol is defined, we can recursively evaluate the symbol expressions
depending on the newly defined symbol.
Example:
 Consider the symbol table entries from Pass 1 processing of the statement
HALFSZ EQU MAXLEN/2
 Since MAXLEN has not yet been defined, no value for HALFSZ can be computed.
The defining expression for HALFSZ is stored in the symbol table in place of its
value.
 The entry &1 indicates that one symbol in the defining expression is undefined.
 In practice SYMTAB would simply contain a pointer to the defining expression.
 The symbol MAXLEN is also entered in the symbol table, with the flag * identifying
it as undefined. Associated with this entry is a list of the symbols whose values
depend on MAXLEN.
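The dependency lists can be sketched in Python as shown below. The table layout and the names entry, define and resolve are assumptions; expressions are limited to symbols, decimal constants and the operators + - * /, and they are evaluated with Python's eval after the symbol names have been replaced by numbers, which is acceptable only in a toy example. The statements PREVBT EQU BUFFER-1 and the addresses 1034 and 2034 for BUFFER and BUFEND are assumed values.

```python
# Sketch: resolving forward references in symbol definitions with dependency lists.
import re

SYMTAB = {}   # name -> {"value": int or None, "expr": str or None, "dependents": [names]}


def entry(name):
    return SYMTAB.setdefault(name, {"value": None, "expr": None, "dependents": []})


def undefined_in(expr):
    """Symbols used in expr that have no value yet (creates '*' entries for them)."""
    return [n for n in re.findall(r"[A-Za-z]\w*", expr) if entry(n)["value"] is None]


def evaluate(expr):
    numeric = re.sub(r"[A-Za-z]\w*", lambda m: str(SYMTAB[m.group()]["value"]), expr)
    return eval(numeric.replace("/", "//"), {"__builtins__": {}})   # integer division


def resolve(name):
    """Compute the value of name, then retry every symbol that was waiting on it."""
    item = SYMTAB[name]
    item["value"] = evaluate(item["expr"])
    waiting, item["dependents"] = item["dependents"], []
    for dependent in waiting:
        if not undefined_in(SYMTAB[dependent]["expr"]):
            resolve(dependent)


def define(name, expr):
    """Process 'name EQU expr'; a label is defined with its address as expr."""
    entry(name)["expr"] = expr
    missing = set(undefined_in(expr))
    if missing:
        for symbol in missing:
            entry(symbol)["dependents"].append(name)
    else:
        resolve(name)


define("HALFSZ", "MAXLEN/2")
define("MAXLEN", "BUFEND-BUFFER")
define("PREVBT", "BUFFER-1")
define("BUFFER", str(0x1034))      # label at an assumed address
define("BUFEND", str(0x2034))      # label at an assumed address
```

Defining BUFFER resolves PREVBT at once; defining BUFEND then resolves MAXLEN, which in turn resolves HALFSZ, so no further pass over the source is needed.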
If possible study the portion given below
Unit III: Loaders & Linkers: Basic Loader Functions- Design of Absolute Loader- Simple Bootstrap
Loader-Machine Dependent Loader features-Relocation-Program linking-Algorithm and Data structures for
a Linking loader. Loader Design options.
LINKER AND LOADER
4.1 NEED FOR LINKING AND LOADING
 To execute an object program, we need:
 Relocation - which modifies the object program so that it can be loaded at an address
different from the location originally specified
 Linking - which combines two or more separate object programs and supplies the
information needed to allow references between them
 Loading and Allocation - which allocates memory location and brings the object
program into memory for execution
 The system software which performs linking operation is called linker. The system software
which loads the object program into memory and starts its execution is called loader.
Linkers and loaders perform several related but conceptually separate actions.
4.2 BASIC LOADER FUNCTIONS

 Fundamental functions of a loader are Bringing an object program into memory and
starting its execution.
 In this section, two basic loader designs are discussed
1. Absolute Loader
2. Bootstrap Loader
4.2.1 Design of an Absolute Loader

An absolute loader is a loader that places absolute code into main memory beginning
with the initial address(absolute address) assigned by the assembler. No address manipulation is
performed. That is there is no need for relocation and linking because the program will be
loaded into the location specified in the program.
For a simple absolute loader, all functions are accomplished in a single pass as follows:
1) The Header record of object programs is checked to verify that the correct program has
been presented for loading.
2) As each Text record is read, the object code it contains is moved to the indicated address in
memory.
3) When the End record is encountered, the loader jumps to the specified address to begin
execution of the loaded program.
Algorithm for absolute loader

begin
  read Header record
  verify program name and length
  read first Text record
  while record type ≠ E
  begin
    //if object code is in character form, convert it into internal representation
    move object code to specified location in memory
    read next object program record
  end
  jump to address specified in End record
end
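The same algorithm can be written as a short Python sketch. It assumes object program records in character form with the layouts H + 6-column name + 6-digit start + 6-digit length, T + 6-digit address + 2-digit length + object code, and E + 6-digit execution address; memory is modelled as a bytearray, and instead of jumping, the function returns the address at which execution would begin.

```python
# Sketch of an absolute loader (memory is a bytearray; no relocation or linking).
def absolute_load(records, memory, expected_name):
    header = records[0]
    name = header[1:7].strip()
    start, length = int(header[7:13], 16), int(header[13:19], 16)
    if header[0] != "H" or name != expected_name:
        raise ValueError("wrong program presented for loading")
    if start + length > len(memory):
        raise ValueError("program does not fit in memory")
    for record in records[1:]:
        if record[0] == "T":
            address = int(record[1:7], 16)
            count = int(record[7:9], 16)
            # convert character-form object code to bytes and move it into place
            memory[address:address + count] = bytes.fromhex(record[9:9 + 2 * count])
        elif record[0] == "E":
            return int(record[1:7], 16)            # address to jump to
    raise ValueError("missing End record")


object_program = [
    "HCOPY  00100000001F",
    "T0010000F0010090C100F4C0000000003454F46",
    "E001000",
]
memory = bytearray(0x2000)
entry_point = absolute_load(object_program, memory, "COPY")
```

The object program here is a small made-up example; after loading, the bytes at 100C to 100E contain the characters EOF and execution would start at address 1000.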
Advantages and disadvantages of absolute loader

The advantage of absolute loader is that it is simple and efficient, but the need for
programmer to specify the actual address restricts the flexibility. As a result we cannot run
several independent programs together, sharing memory between them. Another disadvantage
is that it is difficult to use subroutine libraries while using an absolute loader.
4.2.2 A Simple Bootstrap Loader

 Given an idle computer with no program in memory, how do we get things started? Two
solutions are there.
1. On some computers, an absolute loader program is permanently resident in a read-
only memory (ROM). When some hardware signal occurs, the machine begins to
execute this ROM program. This is referred to as a bootstrap loader.
2. On some computers, built-in hardware reads a fixed-length record
from some device into memory at a fixed location. After the read operation, control
is automatically transferred to that address in memory.
When a computer is first turned on or restarted, a special type of absolute loader, called
a bootstrap loader, is executed. This bootstrap loader loads the first program to be run by the
computer – usually an operating system.
Working of a SIC Bootstrap loader

 SIC uses the above mentioned second method.
 The bootstrap begins at address 0 in the memory of the machine.
 It loads the operating system at address 80.
 Each byte of object code to be loaded is represented on device F1 as two hexadecimal
digits just as it is in a Text record of a SIC object program.
 The object code from device F1 is always loaded into consecutive bytes of memory,
starting at address 80.
 The main loop of the bootstrap keeps the address of the next memory location to be
loaded in register X.
 After all of the object code from device F1 has been loaded, the bootstrap jumps to
address 80, which begins the execution of the program that was loaded.
 Much of the work of the bootstrap loader is performed by the subroutine GETC.
 GETC is used to read and convert a pair of characters from device F1 representing
1 byte of object code to be loaded. For example, the two characters C'D8' (stored as the
two bytes X'4438') are converted to the single byte X'D8'.
 The resulting byte is stored at the address currently in register X, using STCH
instruction that refers to location 0 using indexed addressing.
 The TIXR instruction is then used to add 1 to the value in X.
Bootstrap Loader for SIC/XE
This bootstrap reads object code from device F1 and enters it into memory starting at address 80
(hexadecimal). After all of the code from device F1 has been entered into memory, the bootstrap executes a
jump to address 80 to begin execution of the program just loaded. Register X contains the next address to be
loaded.
BOOT    START   0
        CLEAR   A           CLEAR REGISTER A TO ZERO
        LDX     #128        INITIALIZE REGISTER X TO HEX 80
LOOP    JSUB    GETC        READ HEX DIGIT FROM PROGRAM BEING LOADED
        RMO     A,S         SAVE IN REGISTER S
        SHIFTL  S,4         MOVE TO HIGH-ORDER 4 BITS OF BYTE
        JSUB    GETC        GET NEXT HEX DIGIT
        ADDR    S,A         COMBINE DIGITS TO FORM ONE BYTE
        STCH    0,X         STORE AT ADDRESS IN REGISTER X
        TIXR    X           ADD 1 TO MEMORY ADDRESS BEING LOADED
        J       LOOP        LOOP UNTIL END OF INPUT IS REACHED
The GETC subroutine reads one character from the input device and converts it from ASCII code to a hexadecimal
digit value. The converted digit value is returned in register A. When an end of file is read, control is
transferred to the starting address (hex 80).
GETC    TD      INPUT       TEST INPUT DEVICE
        JEQ     GETC        LOOP UNTIL READY
        RD      INPUT       READ CHARACTER
        COMP    #4          IF CHARACTER IS HEX 04 (END OF FILE),
        JEQ     80          JUMP TO START OF PROGRAM JUST LOADED
        COMP    #48         COMPARE TO HEX 30 (CHARACTER '0')
        JLT     GETC        SKIP CHARACTERS LESS THAN '0'
        SUB     #48         SUBTRACT HEX 30 FROM ASCII CODE
        COMP    #10         IF RESULT IS LESS THAN 10, CONVERSION IS
        JLT     RETURN      COMPLETE. OTHERWISE, SUBTRACT 7 MORE
        SUB     #7          (FOR HEX DIGITS 'A' THROUGH 'F')
RETURN  RSUB                RETURN TO CALLER
INPUT   BYTE    X'F1'       CODE FOR INPUT DEVICE
        END     LOOP
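The behaviour of the bootstrap and of GETC can be imitated in a few lines of Python. This is a rough simulation rather than SIC code: the device is modelled as a string, memory as a dictionary, and the names hex_digit_value and bootstrap are assumptions made for this sketch.

```python
def hex_digit_value(ch):
    """Mirror GETC's conversion: return 0..15, or None for a skipped character."""
    code = ord(ch)
    if code < 48:          # characters below '0' are skipped (JLT GETC)
        return None
    code -= 48             # SUB #48
    if code < 10:          # digits '0' through '9' need no further adjustment
        return code
    return code - 7        # letters 'A' through 'F' (SUB #7)


def bootstrap(device_text, start=0x80):
    """Pack pairs of hex characters into bytes, starting at address `start`."""
    memory = {}
    x = start              # plays the role of register X
    digits = []
    for ch in device_text:
        if ord(ch) == 4:   # hex 04 marks end of file
            break
        value = hex_digit_value(ch)
        if value is None:
            continue
        digits.append(value)
        if len(digits) == 2:
            # high-order digit shifted left 4 bits, then combined with the low one
            memory[x] = (digits[0] << 4) + digits[1]
            x += 1
            digits = []
    return memory, start   # second value is the address jumped to after loading


memory, entry = bootstrap("D8\n4C\x04")
```

Like the assembler version, the sketch does no error checking: any character with a code of 48 or more is treated as a hexadecimal digit.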
4.3 MACHINE DEPENDENT LOADER FEATURES

The features of a loader that depend on the machine architecture are called machine
dependent loader features. They include:
1. Program Relocation
2. Program Linking
4.3.1 Program Relocation (Relocating Loader)

 The absolute loader has several disadvantages. One of the most obvious is the need for the
programmer to specify the actual address at which it will be loaded into memory.
 On a simple computer with a small memory the actual address at which the program will be
loaded can be specified easily.
 On a larger and more advanced machine, we often like to run several independent
programs together, sharing memory between them. We do not know in advance where a
program will be loaded. Hence we write relocatable programs instead of absolute ones.
 Writing absolute programs also makes it difficult to use subroutine libraries efficiently.
This could not be done effectively if all of the subroutines had preassigned absolute
addresses.
 The need for program relocation is an indirect consequence of the change to larger and
more powerful computers. The way relocation is implemented in a loader is also dependent
upon machine characteristics.
 Program relocation is explained in Module 2
 Loaders that have the capability to perform relocation are called relocating loaders or
relative loaders.
 There are two methods for specifying relocation in an object program:

1. Modification Record
2. Relocation Bit
Modification Record
 A Modification record is used to describe each part of the object code that must be
changed when the program is relocated.
 The Modification record has the following format (it is explained in detail in Module 2):
 Each Modification record specifies the starting address and length of the field whose
value is to be altered. It then describes the modification to be performed.
 Consider the following object program, in which the records starting with M are the
Modification records. In this example, the record M 000007 05 + COPY describes the
field that starts at location 000007 and is 5 half-bytes long; the modification to be
performed is to add the value of the symbol COPY, which represents the starting address
of the program. In other words, the starting address of the program is added to the
address field at 000007. The other Modification records are interpreted in the same way.
The Modification record is not well suited for certain cases. In some programs the
addresses in the majority of instructions need to be modified when the program is relocated. This
would require a large number of Modification records, which can result in an object program more
than twice as large as the original. In such cases, the second method, the relocation bit, is used.
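The effect of a single Modification record such as M 000007 05 + COPY can be sketched in Python. This is an illustration under stated assumptions: memory is a bytearray, the record fields are separated by blanks as written in the text above, and the load address 5000 (hex) and the instruction bytes 4B101036 are hypothetical sample values. The function name apply_modification is also an assumption.

```python
def apply_modification(memory, record, symbols, load_address):
    """Apply one Modification record to a program loaded at load_address."""
    _, addr_hex, len_hex, sign, symbol = record.split()
    addr = load_address + int(addr_hex, 16)      # where the field starts in memory
    half_bytes = int(len_hex, 16)                # field length in half-bytes
    nbytes = (half_bytes + 1) // 2               # bytes that contain the field
    mask = (1 << (4 * half_bytes)) - 1           # selects the low-order half-bytes
    field = int.from_bytes(memory[addr:addr + nbytes], "big")
    delta = symbols[symbol] if sign == "+" else -symbols[symbol]
    # change only the address field; any leading half-byte is left untouched
    field = (field & ~mask) | (((field & mask) + delta) & mask)
    memory[addr:addr + nbytes] = field.to_bytes(nbytes, "big")


memory = bytearray(0x6000)
memory[0x5006:0x500A] = bytes.fromhex("4B101036")   # sample instruction at relative 0006
apply_modification(memory, "M 000007 05 + COPY", {"COPY": 0x5000}, 0x5000)
```

With these sample values the 20-bit address field 01036 becomes 06036, so the instruction is stored as 4B106036 after relocation.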
Relocation Bit
 To overcome the disadvantage of modification record, relocation bit is used.
 The Text records are the same as before except that there is a relocation bit associated
with each word of object code.
 Since all SIC instructions occupy one word, this means that there is one relocation bit
for each possible instruction.
 The relocation bits are gathered together into a bit mask following the length indicator
in each Text record.
 Text record format
 If the relocation bit corresponding to a word of object code is set to 1, the program's
starting address is to be added to this word when the program is relocated.
 A bit value of 0 indicates that no modification is necessary.
 If a Text record contains fewer than 12 words of object code, the bits corresponding to
unused words are set to 0.
 In the following object code, the bit mask FFC (representing the bit string
111111111100) in the first Text record specifies that all 10 words of object code are to
be modified during relocation.
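Relocation driven by a bit mask can be sketched as follows. This is a minimal sketch that assumes a Text record laid out as starting address (6 hex digits), length in bytes (2 hex digits), a 12-bit mask (3 hex digits) and then whole 3-byte words of object code. The sample records and the name relocate_text_record are illustrative assumptions.

```python
def relocate_text_record(record, load_address):
    """Return (actual start address, list of relocated words) for one Text record."""
    start = int(record[1:7], 16)
    length = int(record[7:9], 16)             # length in bytes
    mask = int(record[9:12], 16)              # 12 relocation bits
    code = record[12:12 + 2 * length]
    words = []
    for i in range(length // 3):              # one 3-byte word per relocation bit
        word = int(code[6 * i:6 * i + 6], 16)
        if mask & (1 << (11 - i)):            # leftmost bit belongs to the first word
            word = (word + load_address) & 0xFFFFFF
        words.append(word)
    return start + load_address, words


# Mask FFC = 111111111100: all 10 words are relocated.
full = ("T0000001EFFC"
        "140033" "481039" "000036" "280030" "300015"
        "481061" "3C0003" "00002A" "0C0039" "00002D")
addr, words = relocate_text_record(full, 0x1000)

# Mask 800 = 100000000000: only the first of the two words is relocated.
addr2, words2 = relocate_text_record("T00001E06800" "0C0036" "000003", 0x1000)
```

Compared with Modification records, the loader here does no symbol lookup at all: every marked word simply has the load address added to it.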
4.3.2 Program Linking (Linking Loader)
 Many programming languages allow us to write different pieces of code called modules,
separately. This simplifies the programming task because we can break a large program
into small, more manageable pieces. Eventually, though, we need to put all the modules
together. Apart from this, a user code often makes references to code and data
defined in some "libraries".
 Linking is the process in which references to "externally" defined symbols are processed
so as to make them operational.
 A linker or link editor is a program that combines object modules to form an executable
program.
 A Linking Loader is a program that has the capability to perform relocation, linking and
loading. Linking and relocation are performed at load time.
Algorithm and Data Structures for a Linking Loader

 The algorithm for a linking loader is considerably more complicated than the absolute
loader algorithm.
 A linking loader usually makes two passes over its input, just as an assembler does.
 In terms of general function, the two passes of a linking loader are quite similar to the
two passes of an assembler:
o Pass 1 assigns addresses to all external symbols.
o Pass 2 performs the actual loading, relocation, and linking.
 The main data structure needed for our linking loader is an external symbol table
ESTAB. This table, which is analogous to SYMTAB in our assembler algorithm, is
used to store the name and address of each external symbol in the set of control sections
being loaded.
 Two other important variables are PROGADDR (program load address) and
CSADDR (control section address).
(1) PROGADDR is the beginning address in memory where the linked program is to be
loaded. Its value is supplied to the loader by the OS.
(2) CSADDR contains the starting address assigned to the control section currently
being scanned by the loader. This value is added to all relative addresses within the
control section to convert them to actual addresses.
Linking loader PASS 1

 During Pass 1, the loader is concerned only with Header and Define records.
 Variables and Data structures used in PASS1
o PROGADDR (Program Load Address) from OS
o CSADDR (Control Section Address)
o CSLTH (Control Section Length)
o ESTAB (External Symbol Table)
Algorithm for Pass 1
Explanation of Pass 1 algorithm
 The beginning load address for the linked program (PROGADDR) is obtained from the
OS. This becomes the starting address (CSADDR) for the first control section in the
input sequence.
 The control section name from Header record is entered into ESTAB, with value given
by CSADDR.
 All external symbols appearing in the Define record for the control section are also
entered into ESTAB. Their addresses are obtained by adding the value specified in the
Define record to CSADDR.
 When the End record is read, the control section length CSLTH (which was saved from
the Header record) is added to CSADDR. This calculation gives the starting address for the
next control section in sequence.
 At the end of Pass 1, ESTAB contains all external symbols defined in the set of control
sections together with the address assigned to each.
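The Pass 1 steps described above can be sketched in Python. This is a minimal sketch and not the algorithm figure itself: it assumes Header records with the name in columns 2-7 and the length in columns 14-19, and Define records made of repeated pairs of a 6-character symbol and a 6-digit hexadecimal relative address. The three sample control sections and the function name pass1 are illustrative.

```python
def pass1(control_sections, progaddr):
    """Build ESTAB (symbol -> actual address) for a list of control sections."""
    estab = {}
    csaddr = progaddr                      # first section starts at PROGADDR
    for records in control_sections:
        cslth = 0
        for rec in records:
            if rec[0] == "H":
                name = rec[1:7].strip()
                cslth = int(rec[13:19], 16)        # save control section length
                if name in estab:
                    raise ValueError("duplicate external symbol " + name)
                estab[name] = csaddr
            elif rec[0] == "D":
                body = rec[1:]
                for i in range(0, len(body), 12):  # 6-char name + 6-digit address
                    symbol = body[i:i + 6].strip()
                    if symbol in estab:
                        raise ValueError("duplicate external symbol " + symbol)
                    estab[symbol] = csaddr + int(body[i + 6:i + 12], 16)
            elif rec[0] == "E":
                break
        csaddr += cslth                    # starting address of the next section
    return estab


sections = [
    ["HPROGA 000000000063", "DLISTA 000040ENDA  000054", "E"],
    ["HPROGB 00000000007F", "DLISTB 000060ENDB  000070", "E"],
    ["HPROGC 000000000051", "DLISTC 000030ENDC  000042", "E"],
]
estab = pass1(sections, 0x4000)
```

Text, Refer and Modification records are simply ignored in this pass, which matches the statement that Pass 1 is concerned only with Header and Define records.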
Linking loader PASS 2

 Pass 2 of linking loader performs the actual loading, relocation, and linking of the
program.
Algorithm for Pass 2
Explanation of Pass 2 Algorithm

 As each Text record is read, the object code is moved to the specified address (plus the
current value of CSADDR).
 When a Modification record is encountered, the symbol whose value is to be used for
modification is looked up in ESTAB.
 This value is then added to or subtracted from the indicated location in memory.
 The last step performed by the loader is usually the transfer of control to the loaded
program to begin execution. The End record for each control section may contain the
address of the first instruction in that control section to be executed. The loader takes this as
the transfer point to begin execution.
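The Pass 2 steps can be sketched in the same style. This is a minimal sketch, assuming that ESTAB has already been built by Pass 1, that memory is a bytearray, and that a Modification record holds the field address in columns 2-7, the length in half-bytes in columns 8-9, a sign in column 10 and the symbol name in columns 11-16. The two small control sections MAIN and SUB1 are hypothetical.

```python
def pass2(control_sections, progaddr, estab, memory):
    """Load Text records, apply Modification records, return the transfer address."""
    csaddr = progaddr
    execaddr = progaddr
    for records in control_sections:
        cslth = 0
        for rec in records:
            kind = rec[0]
            if kind == "H":
                cslth = int(rec[13:19], 16)
            elif kind == "T":
                start = csaddr + int(rec[1:7], 16)      # specified address plus CSADDR
                length = int(rec[7:9], 16)
                memory[start:start + length] = bytes.fromhex(rec[9:9 + 2 * length])
            elif kind == "M":
                addr = csaddr + int(rec[1:7], 16)
                half_bytes = int(rec[7:9], 16)
                symbol = rec[10:16].strip()
                if symbol not in estab:
                    raise KeyError("undefined external symbol " + symbol)
                nbytes = (half_bytes + 1) // 2
                mask = (1 << (4 * half_bytes)) - 1
                field = int.from_bytes(memory[addr:addr + nbytes], "big")
                delta = estab[symbol] if rec[9] == "+" else -estab[symbol]
                field = (field & ~mask) | (((field & mask) + delta) & mask)
                memory[addr:addr + nbytes] = field.to_bytes(nbytes, "big")
            elif kind == "E":
                if len(rec) > 1:                        # End record names a start address
                    execaddr = csaddr + int(rec[1:7], 16)
                break
        csaddr += cslth
    return execaddr


sections = [
    ["HMAIN  000000000006", "T000000064B1000000000", "M00000105+SUB1  ", "E000000"],
    ["HSUB1  000000000003", "T000000034F0000", "E"],
]
estab = {"MAIN": 0x4000, "SUB1": 0x4006}     # as Pass 1 would produce for PROGADDR 4000
memory = bytearray(0x5000)
execaddr = pass2(sections, 0x4000, estab, memory)
```

After the call, the address field of the instruction at 4000 holds the address assigned to SUB1, and execaddr is the point at which the loader would transfer control.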
4.3 MACHINE INDEPENDENT LOADER FEATURES

The features of a loader that do not depend on the architecture of the machine are called
machine independent loader features. They include:
o Automatic Library search
o Loader Options that can be selected at the time of loading and linking
4.3.1 Automatic Library Search

One of the important machine independent features of a loader is the use of an automatic
library search process for handling external references. Many linking loaders can
automatically incorporate routines from a subprogram library into the program being loaded.
The subroutines called by the program being loaded are automatically fetched from the library,
linked with the main program, and loaded. Linking loaders that support automatic library
search must keep track of external symbols that are referred to, but not defined, in the primary
input to the loader.
At the end of Pass 1, the symbols in ESTAB that remain undefined represent unresolved
external references. The loader searches the library or libraries specified for routines that
contain the definitions of these symbols, and processes the subroutines found by this search
exactly as if they had been part of the primary input stream. Note that the subroutines fetched
from a library in this way may themselves contain external references. It is therefore necessary
to repeat the library search process until all references are resolved. If unresolved external
references remain after the library search is completed, these must be treated as errors.
Automatic Library search process is described below:

1. Enter the symbols from each Refer record into ESTAB
2. When the definition is encountered (Define record), the address is assigned
3. At the end of Pass 1, the symbols in ESTAB that remain undefined represent unresolved
external references
4. The loader searches the libraries specified (or standard) for undefined symbols or
subroutines
The library search process may be repeated, since the subroutines fetched from a library
may themselves contain external references. Programmer-defined subroutines have higher
priority, so the programmer can override the standard subroutines in the library by supplying
their own routines. Searching the libraries could be done by scanning through the Define records of
all the object programs in the library, but this method is quite inefficient. Instead, a directory
structure is used. The assembled or compiled versions of the subroutines in a library are organized
using a directory that gives the name of each routine and a pointer to its address within the library.
Thus the library search involves only a search of the directory, followed by reading the object
programs indicated by this search.
Because the library contains this internal directory, in which each routine is stored along with
its address, linking library functions becomes easier: whenever a library function is needed, its
address can be obtained directly from the internal directory.
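The repeated, directory-based search can be sketched in Python. This is only an illustration: the directory is modelled as a dictionary from symbol name to library member, each member lists the symbols it defines and refers to, and all names (library_search, GRAPH, MATHLIB1, DRAWLN and so on) are hypothetical.

```python
def library_search(unresolved, defined, directory, members):
    """Fetch library members until no remaining reference can be satisfied."""
    unresolved = set(unresolved)
    defined = set(defined)
    fetched = []
    progress = True
    while unresolved and progress:
        progress = False
        for symbol in sorted(unresolved):
            member = directory.get(symbol)     # directory lookup, no scan of Define records
            if member is None:
                continue
            info = members[member]
            fetched.append(member)
            defined |= info["defines"]
            unresolved |= info["refers"]       # fetched routines may refer to others
            unresolved -= defined
            progress = True
            break                              # restart with the updated symbol sets
    return fetched, unresolved                 # leftovers must be reported as errors


directory = {"SQRT": "MATHLIB1", "ABS": "MATHLIB1", "PLOT": "GRAPH"}
members = {
    "MATHLIB1": {"defines": {"SQRT", "ABS"}, "refers": set()},
    "GRAPH": {"defines": {"PLOT"}, "refers": {"SQRT", "DRAWLN"}},
}
fetched, errors = library_search({"PLOT"}, {"MAIN"}, directory, members)
```

In this example GRAPH is fetched for PLOT, which in turn pulls in MATHLIB1 for SQRT, while DRAWLN is found in no directory and remains as an unresolved reference.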
4.3.2 Loader Options

Many loaders allow the user to specify options that modify the standard processing.
Option 1:
 allows the selection of alternative sources of input.
 Ex. INCLUDE program-name (library-name)
This directs the loader to read the designated object program from a library and treat it
as if it were part of the primary loader input.
Option 2:
 allows the user to delete external symbols or entire control sections.
 Ex. DELETE csect-name
This instructs the loader to delete the named control section(s) from the set of programs
being loaded.
Option 3:
 allows the user to change the name of external symbol
 Ex: CHANGE name1, name2
This causes the external symbol name1 to be changed to name2 wherever it appears in
the object programs.
Option 4:
 This involves the automatic inclusion of library routines to satisfy external references.
 Ex: LIBRARY MYLIB
Such user-specified libraries are normally searched before the standard system libraries.
This allows the user to use special versions of the standard routines.
Option 5:
 allows the user to specify that some external references are not to be resolved.
 Ex: NOCALL STDDEV, PLOT, CORREL
This instructs the loader that these external references are to remain unresolved. This
avoids the overhead of loading and linking the unneeded routines, and saves the
memory space that would otherwise be required.
Example:
If, as a temporary measure, we would like to use the utility routines READ and WRITE
instead of RDREC and WRREC in our programs, we use the following loader
commands:
INCLUDE READ(UTLIB)
INCLUDE WRITE(UTLIB)
DELETE RDREC, WRREC
CHANGE RDREC, READ
CHANGE WRREC, WRITE
These commands ask the loader to include the control sections READ and
WRITE from the library UTLIB and to delete the control sections RDREC and
WRREC. The first CHANGE command causes all external references to the
symbol RDREC to be changed to refer to READ, and the second CHANGE causes
references to WRREC to be changed to WRITE.
4.4 LOADER DESIGN OPTIONS (or Variants of basic loader model)

In this section some alternatives for basic loader models are discussed.
1. Linkage Editors – which perform linking prior to load time
2. Dynamic Linking – which perform the linking function at execution time.
3. Bootstrap Loaders – used to load operating system or the loader into the
memory.
4.4.1 Linkage Editors

Linking loaders perform all linking and relocation at load time. There are two
alternatives: Linkage editors, which perform linking prior to load time, and dynamic linking, in
which the linking function is performed at execution time. Difference between linkage editor
and linking loader is explained below:
• A linking loader performs all linking and relocation operations, including automatic
library search if specified, and loads the linked program directly into memory for execution.
• A linkage editor produces a linked version of the program (load module or executable
image), which is written to a file or library for later execution.
 When the user is ready to run the linked program, a simple relocating loader can be used
to load the program into memory. The only object code modification necessary is the
addition of an actual load address to relative values within the program.
Figure: Processing of an object program using a) linking loader and b)linkage editor
 The Linkage Editor(LE) performs relocation of all control sections relative to the start of
the linked program. Thus, all items that need to be modified at load time have values that are
relative to the start of the linked program. This means that the loading can be accomplished
in one pass with no external symbol table required.
 If a program is to be executed many times without being reassembled, the use of a linkage
editor substantially reduces the overhead required, because resolution of external
references and library searching are performed only once. Linkage editors can also perform
many useful functions besides simply preparing an object program for execution.
 If a program is under development or is used infrequently, a linking loader is usually the
better choice, because it avoids a separate linking step and the storage of a linked version.
 Consider a program PLANNER with a number of subroutines. You want to improve a
subroutine (PROJECT) of the program (PLANNER) without going back to the original
versions of all of the other subroutines. For that you can use linkage editor commands as
follows:
INCLUDE PLANNER (PROGLIB)
DELETE PROJECT // delete from existing PLANNER
INCLUDE PROJECT (NEWLIB) // include new version
REPLACE PLANNER (PROGLIB)
4.4.2 Dynamic Linking/Dynamic Loading/Load-on-call
 Linkage editors perform linking operations before the program is loaded for
execution.
 Linking loaders perform these same operations at load time.
 Dynamic linking, dynamic loading, or load on call postpones the linking
function until execution time. That is a subroutine is loaded and linked to the
rest of the program when it is first called.
 Dynamic linking is often used to allow several executing programs to share one copy of
a subroutine or library (eg. run-time support routines for a high-level language like C.)
 Consider a program that allows its user to interactively call any of the subroutines of a large
mathematical and statistical library. All of the library subroutines could potentially be
needed, but only a few will actually be used in any one execution. Dynamic
linking avoids the necessity of loading the entire library for each execution; only the
subroutines that are actually called are loaded.
 As another example, suppose that a program contains subroutines that correct or clearly
diagnose errors in the input data during execution. If such errors are rare, the correction and
diagnostic routines may not be used at all during most executions of the program. However, if
the program were completely linked before execution, these subroutines would need to be
loaded and linked every time.
 Fig 3.14 illustrates a method in which routines that are to be dynamically loaded must
be called via an OS service request.
Figure: Loading and calling of a subroutine using dynamic linking
 Fig (a): Whenever the user program needs a subroutine for its execution, the program
makes a load-and-call service request to the OS (instead of executing a JSUB instruction
referring to an external symbol). The parameter of this request is the symbolic
name (ERRHANDL) of the routine to be called.
 Fig (b): OS examines its internal tables to determine whether or not the routine is
already loaded. If necessary, the routine is loaded from the specified user or system
libraries.
 Fig (c): Control is then passed from OS to the routine being called.
 Fig (d): When the called subroutine completes its processing, it returns to its caller (i.e.,
the OS). The OS then returns control to the program that issued the request.
 Fig (e): If a subroutine is still in memory, a second call to it may not require another
load operation. Control may simply be passed from the dynamic loader to the called
routine.
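The load-and-call sequence in steps (a) to (e) can be imitated in Python. This is a toy model, not an operating system interface: the class DynamicLoader, its internal table and the stand-in ERRHANDL routine are assumptions made for illustration, with Python callables standing in for object code held in a library.

```python
class DynamicLoader:
    """Toy model of an OS service that loads a routine on its first call."""

    def __init__(self, library):
        self.library = library      # routine name -> callable (stands in for object code)
        self.loaded = {}            # internal table of routines already in memory
        self.load_count = 0

    def load_and_call(self, name, *args):
        if name not in self.loaded:                 # step (b): is it already loaded?
            if name not in self.library:
                raise LookupError("routine not found in any library: " + name)
            self.loaded[name] = self.library[name]  # load from the library
            self.load_count += 1
        routine = self.loaded[name]
        result = routine(*args)                     # step (c): pass control to the routine
        return result                               # step (d): return to the requester


def errhandl(code):
    # stand-in for an error-handling subroutine
    return "handled error %d" % code


loader = DynamicLoader({"ERRHANDL": errhandl})
first = loader.load_and_call("ERRHANDL", 3)    # loads the routine, then calls it
second = loader.load_and_call("ERRHANDL", 7)   # step (e): no second load is needed
```

The binding of the name ERRHANDL to actual code happens only when the request is made, which is what distinguishes this scheme from linking at load time.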
4.4.3 Bootstrap Loaders

 Given an idle computer with no program in memory, how do we get things
started? There are two solutions.
1. On some computers, an absolute loader program is permanently resident in a
read-only memory (ROM). When some hardware signal occurs, the machine begins to
execute this ROM program. This is referred to as a bootstrap loader.
2. On some computers, built-in hardware reads a fixed-length record
from some device into memory at a fixed location. After the read operation, control
is automatically transferred to the address in memory where the record was stored. If the
loading process requires more instructions than can be read in a single record, this first
record causes the reading of others, and these in turn can cause the reading of more records.
When a computer is first turned on or restarted, a special type of absolute loader,
called a bootstrap loader, is executed. This bootstrap loader loads the first program to be run
by the computer, usually an operating system.
Unit IV:Compilers - Basic compiler Functions – Grammars - Lexical Analysis – Syntactic Analysis-
Code Generation-Compiler Design options.
Compilers
5.1 Basic Compiler Functions

 Fig 5.1 shows an example Pascal program for the following explanations.
 For the purposes of compiler construction, a high-level programming language is usually
described in terms of grammar.
This grammar specifies the form, or syntax, of legal statements in the language.
The problem of compilation then becomes one of matching statements written by the
programmer to structures defined by the grammar, and generating the appropriate object
code for each statement.
 A source program statement can be regarded as a sequence of tokens rather than simply as a
string of characters.
Tokens may be thought of as the fundamental building blocks of the language. For example, a
token might be a keyword, a variable name, an integer, an arithmetic operator, etc.
 The task of scanning the source statement, recognizing and classifying the various tokens, is
known as lexical analysis. The part of the compiler that performs this analytic function is
commonly called the scanner.
 After the token scan, each statement in the program must be recognized as some language
construct, such as a declaration or an assignment statement, described by the grammar.
This process, called syntactic analysis or parsing, is performed by a part of the compiler that is
usually called the parser.
 The last step in the basic translation process is the generation of object code. Most compilers
create machine-language programs directly instead of producing a symbolic program for later
translation by an assembler.
 Although we have mentioned three steps in the compilation process (scanning, parsing, and
code generation), it is important to realize that a compiler does not necessarily make three
passes over the program being translated.
For some languages, it is quite possible to compile a program in a single pass.
5.1.1 Grammars
 A grammar for a programming language is a formal description of the syntax, or form, of
programs and individual statements written in the language.
 The grammar does not describe the semantics, or meaning, of the various statements; such
knowledge must be supplied in the code-generation routines.
Example: for the difference between syntax and semantics, consider the two statements (I := J
+ K) and (X := Y + I), where X and Y are REAL variables and I, J, K are INTEGER variables.
These two statements have identical syntax. However, the semantics of the two statements are
quite different. The first statement specifies that the variables in the expression are to be added
using integer arithmetic operations. The second statement specifies a floating-point
addition, with the integer operand I being converted to floating point before adding.
 Obviously, these two statements would be compiled into very different sequences of machine
instructions. However, they would be described in the same way by the grammar.
The differences between the statements would be recognized during code generation.
 A number of different notations can be used for writing grammars. The one we describe is
called BNF (for Backus-Naur Form). Fig 5.2 gives one possible BNF grammar for a highly
restricted subset of Pascal.
 A BNF grammar consists of a set of rules, each of which defines the syntax of some construct
in the programming language.
For example, Rule 13 in Fig 5.2 is
<read> ::= READ ( <id-list> )
This is a definition of the syntax of a Pascal READ statement that is denoted in the grammar as
<read>.
The symbol ::= can be read “is defined to be”. On the left of this symbol is the language
construct being defined, <read>, and on the right is a description of the syntax being defined
for it.
 Character strings enclosed between the angle brackets < and > are called nonterminal symbols
(such as ‘<read>’ and ‘<id-list>’). These are the names of constructs defined in the grammar.
Entries not enclosed in angle brackets are terminal symbols of the grammar (i.e., tokens, such
as ‘READ’, ‘(’, and ‘)’).
The blank spaces in the grammar rules are not significant.
They have been included only to improve readability.
 To recognize a <read> (to resolve all nonterminal symbols), we also need the definition of
<id-list>. This is provided by Rule 6 in Fig 5.2:
<id-list> ::= id | <id-list> , id
This rule offers two possibilities, separated by the | symbol, for the syntax of an <id-list>.
The first alternative specifies that an <id-list> may consist simply of a token id (the notation id
denotes an identifier that is recognized by the scanner).
The second alternative is an <id-list>, followed by the token “,” (comma), followed by a token
id.
Example: ALPHA is an <id-list> that consists of a single id ALPHA; ALPHA , BETA is an
<id-list> that consists of another <id-list> ALPHA, followed by a comma, followed by an id
BETA, and so forth.
 It is often convenient to display the analysis of a source statement in terms of a grammar as a
tree. This tree is usually called the parse tree, or syntax tree, for the statement. Fig 5.3(a)
shows the parse tree for the statement READ ( VALUE ).
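Rules 13 and 6 can be turned into a small recognizer that builds a parse tree as nested tuples. This is a sketch under stated assumptions: the input is already split into tokens, an id is taken to be any alphanumeric word starting with a letter other than READ, and the left-recursive rule for <id-list> is handled with a loop. The function names are illustrative.

```python
def is_id(token):
    """Assumed shape of an identifier token for this sketch."""
    return token.isalnum() and token[0].isalpha() and token != "READ"


def parse_id_list(tokens, pos):
    """<id-list> ::= id | <id-list> , id  (left recursion replaced by a loop)."""
    if pos >= len(tokens) or not is_id(tokens[pos]):
        raise SyntaxError("identifier expected")
    tree = ("id-list", ("id", tokens[pos]))
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] == ",":
        if not is_id(tokens[pos + 1]):
            raise SyntaxError("identifier expected after comma")
        # the list built so far becomes the left subtree, as in the second alternative
        tree = ("id-list", tree, ",", ("id", tokens[pos + 1]))
        pos += 2
    return tree, pos


def parse_read(tokens):
    """<read> ::= READ ( <id-list> )"""
    if tokens[:2] != ["READ", "("]:
        raise SyntaxError("READ ( expected")
    id_list, pos = parse_id_list(tokens, 2)
    if tokens[pos:] != [")"]:
        raise SyntaxError(") expected")
    return ("read", "READ", "(", id_list, ")")


tree = parse_read(["READ", "(", "VALUE", ")"])
tree2 = parse_read(["READ", "(", "ALPHA", ",", "BETA", ")"])
```

For READ ( ALPHA , BETA ) the inner <id-list> for ALPHA appears as the left child of the outer <id-list>, mirroring the way the second alternative of Rule 6 is applied.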
 Rule 9 of the grammar in Fig 5.2 provides a definition of the syntax of an assignment
statement:
<assign> ::= id := <exp>
That is, an <assign> consists of an id, followed by the token :=, followed by an expression
<exp>.
 Rule 10 gives a definition of an <exp>:
<exp> ::= <term> | <exp> + <term> | <exp> - <term>
 Similarly, Rule 11 defines a <term> to be any sequence of <factor>s connected by * and
DIV.
 Again, Rule 12 specifies that a <factor> may consist of an identifier id or an integer int (which
is also recognized by the scanner) or an <exp> enclosed in parentheses.
 Fig 5.3(b) shows the parse tree for statement 14 from Fig 5.1 in terms of the rules just
described.
Note that the parse tree in Fig 5.3(b) implies that multiplication and division are done before
addition and subtraction (that is, multiplication and division have higher precedence than
addition and subtraction). The terms SUMSQ DIV 100 and MEAN * MEAN must be
calculated first since these intermediate results are the operands (left and right subtrees) for the
- operation.
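The precedence built into Rules 10 to 12 can be seen in a short Python sketch. It is an illustration only: the left-recursive rules are implemented with loops, the tree is a simplified one that keeps just operators and operands, and the function names are assumptions.

```python
def parse_exp(tokens, pos=0):
    """<exp> ::= <term> | <exp> + <term> | <exp> - <term>"""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        left = (op, left, right)           # earlier operands become the left subtree
    return left, pos


def parse_term(tokens, pos):
    """<term>: a sequence of <factor>s connected by * and DIV."""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "DIV"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def parse_factor(tokens, pos):
    """<factor>: an id, an int, or an <exp> enclosed in parentheses."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    token = tokens[pos]
    if token == "(":
        tree, pos = parse_exp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError(") expected")
        return tree, pos + 1
    if token.isalnum() and token != "DIV":
        return token, pos + 1              # identifier or integer leaf
    raise SyntaxError("unexpected token " + token)


tree, end = parse_exp("SUMSQ DIV 100 - MEAN * MEAN".split())
```

Because <term> is parsed completely before <exp> looks for + or -, both SUMSQ DIV 100 and MEAN * MEAN end up as subtrees of the subtraction, which is the precedence described above.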
 The parse trees shown in Fig 5.3 represent the only possible ways to analyze these two
statements in terms of the grammar of Fig 5.2. If there is more than one possible parse tree for
a given statement, the grammar is said to be ambiguous.
 Fig 5.4 shows the parse tree for the entire program in Fig 5.1.
5.1.2 Lexical Analysis
 Lexical analysis involves scanning the program to be compiled and recognizing the tokens that
make up the source statements. Scanners are usually designed to recognize keywords,
operators, and identifiers, as well as integers, floating-point numbers, character strings, and
other similar items.
 Items such as identifiers and integers are usually recognized directly as single tokens and might
be defined as a part of the grammar. For example,
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | … | Z
<digit> ::= 0 | 1 | …| 9
 The output of the scanner consists of a sequence of tokens. For efficiency of later use, each
token is usually represented by some fixed-length code, such as an integer, rather than as a
variable-length character string.
In such a token coding scheme for the grammar of Fig 5.2 (shown in Fig 5.5), the token
PROGRAM would be represented by the integer value 1, an identifier id would be represented
by the value 22, and so on.
 When the token being scanned is a keyword or an operator, such a coding scheme gives
sufficient information. However, in the case of identifier, it is also necessary to specify the
particular identifier name that was scanned.
The same is true for integers, floating-point values, character-string constants, etc.
This can be accomplished by associating a token specifier with the type code for such tokens.
The specifier gives the identifier name, integer value, etc., that was found by the scanner.
 Fig 5.6 shows the output from a scanner for the program in Fig 5.1, using the token coding
scheme in Fig 5.5.
For token type 22 (identifier), the token specifier is a pointer to a symbol-table entry
(denoted by ^SUM, ^SUMSQ, etc.).
For token type 23 (integer), the specifier is the value of the integer (denoted by #0, #100, etc.).
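A scanner that emits token codes and specifiers can be sketched in Python. Only the codes 1 (PROGRAM), 22 (identifier) and 23 (integer) come from the text above; the codes used here for :=, + and ; are assumptions standing in for the rest of the coding scheme, and the specifier for an identifier is shown as a string such as ^SUM instead of a real symbol-table pointer.

```python
import re

# PROGRAM = 1 is taken from the text; the other fixed codes are assumed values.
TOKEN_CODES = {"PROGRAM": 1, ";": 12, ":=": 15, "+": 16}
ID, INT = 22, 23

# := is tried first, then identifiers, then integers, then any single character
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z][A-Za-z0-9]*|[0-9]+|\S)")


def scan(line):
    """Return a list of (token code, token specifier) pairs for one source line."""
    tokens = []
    line = line.rstrip()
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        text = match.group(1)
        pos = match.end()
        if text in TOKEN_CODES:
            tokens.append((TOKEN_CODES[text], None))   # keywords and operators need no specifier
        elif text[0].isalpha():
            tokens.append((ID, "^" + text))            # stands for a symbol-table pointer
        elif text.isdigit():
            tokens.append((INT, int(text)))            # specifier is the integer value
        else:
            raise ValueError("unrecognized token " + text)
    return tokens


coded = scan("SUM := SUM + 100;")
```

The parser then works only with the fixed-length codes and consults the specifier when it needs the actual name or value.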
 The scanner usually is responsible for reading the lines of the source program as needed, and
possibly for printing the source listing. Comments are ignored by the scanner, except for
printing on the output listing.
 The process of lexical scanning is quite simple. However, many languages have special
characteristics that must be considered when programming a scanner.
For example, in FORTRAN, a number in columns 1-5 of a source statement should be
interpreted as a statement number, not as an integer.
 Languages that do not have reserved words create even more difficulties for the scanner.
For example, in FORTRAN, any keyword may also be used as an identifier (See the case in
the lower part of page 237).
In such a case, the scanner might interact with the parser so that it could tell the proper
interpretation of each word, or it might simply place identifiers and keywords in the same
class, leaving the task of distinguishing between them to the parser.
Modeling Scanners as Finite Automata
 The tokens of most programming languages can be recognized by a finite automaton. Finite
automata are often represented graphically, as illustrated in Fig 5.7(a).
States are represented by circles, and transitions by arrows from one state to another. Each
arrow is labeled with a character or a set of characters that cause the specified transition to
occur.
 Consider, for example, the finite automaton shown in Fig 5.7(a) and the first input string in Fig
5.7(b).
The automaton starts in State 1 and examines the first character of the input string. The
first character causes a transition from State 1 to State 2.
The b causes a transition from State 2 to State 3, etc.

 The first two input strings in Fig 5.7(b) can be recognized by the finite automaton in Fig 5.7(a).
Consider the third input string in Fig 5.7(b). The finite automaton begins in State 1, as before,
and the first character causes a transition from State 1 to State 2.
Now the next character to be scanned is c. However, there is no transition from State 2 that is
labeled with c. Therefore, the automaton must stop in State 2.
 Fig 5.8 shows several finite automata that are designed to recognize typical programming
language tokens.
Fig 5.8(a) recognizes identifiers and keywords that begin with a letter and may continue with
any sequence of letters and digits.
Some languages allow identifiers such as NEXT_LINE, which contains the underscore
character (_). Fig 5.8(b) shows a finite automaton that recognizes identifiers of this type.
The finite automaton in Fig 5.8(c) recognizes integers that consist of a string of digits, including
those that contain leading zeroes, such as 000025.
Fig 5.8(d) shows an automaton that does not allow
leading zeroes, except in the case of the integer 0.
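The difference between the two integer automata can be sketched with regular expressions, which describe the same class of token patterns as finite automata. This is a minimal Python illustration, not a reproduction of Fig 5.8; the names `INTEGER_ANY`, `INTEGER_NO_LEADING_ZERO`, and `is_integer` are hypothetical.

```python
import re

# Like Fig 5.8(c): one or more digits, leading zeroes allowed.
INTEGER_ANY = re.compile(r"[0-9]+")
# Like Fig 5.8(d): the single digit 0, or a nonzero digit followed by digits.
INTEGER_NO_LEADING_ZERO = re.compile(r"0|[1-9][0-9]*")


def is_integer(text, allow_leading_zeroes=True):
    """Return True if the whole string is an integer token of the chosen kind."""
    pattern = INTEGER_ANY if allow_leading_zeroes else INTEGER_NO_LEADING_ZERO
    return pattern.fullmatch(text) is not None
```

With these definitions, `is_integer("000025")` is accepted, while `is_integer("000025", allow_leading_zeroes=False)` is rejected and `is_integer("0", allow_leading_zeroes=False)` is still accepted.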
 Each of the finite automata we have seen so far was designed to recognize one particular
type of token. Fig
5.9 shows a finite automaton that can recognize all of the tokens listed in Fig 5.5.
 In Fig 5.9, a special case occurs in State 3. Suppose that the scanner encounters an erroneous
token such as “VAR.”.
When the automaton stops in State 3, the scanner should perform a check to see whether the
string being recognized is “END.”.
If it is not, the scanner could back up to State 2 (recognizing the “VAR”). The period
would then be
rescanned as part of the following token the next time the scanner is called.
 Finite automata provide an easy way to visualize the operation of a scanner. Fig 5.10(a) shows
a typical algorithm to recognize a token of the kind accepted by Fig 5.8(b), that is, an
identifier that may contain underscores.
Fig 5.10(b) shows the finite automaton from Fig 5.8(b) represented in a tabular form.
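A tabular representation translates almost directly into code. The following Python sketch is an illustration under stated assumptions: since the figures are not reproduced here, the three-state layout (State 1 as the start, State 2 as the only final state, State 3 reached after an underscore), the rule that an underscore must be followed by a letter or digit, and the names `TABLE`, `char_class`, and `recognize_identifier` are all assumptions.

```python
# Transition table: TABLE[state][character class] -> next state.
# A missing entry means "no transition", so the automaton stops.
TABLE = {
    1: {"letter": 2},
    2: {"letter": 2, "digit": 2, "underscore": 3},
    3: {"letter": 2, "digit": 2},
}
FINAL_STATES = {2}


def char_class(ch):
    """Map a character to the class used as a column of the table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"


def recognize_identifier(text):
    """Return True if the whole string is accepted by the automaton."""
    state = 1
    for ch in text:
        state = TABLE[state].get(char_class(ch))
        if state is None:  # no transition labeled with this character
            return False
    return state in FINAL_STATES
```

Under these assumptions `recognize_identifier("NEXT_LINE")` succeeds, while a string that starts with a digit or ends with an underscore is rejected. Changing the language accepted only requires editing the table, not the loop.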
5.1.3 Syntactic Analysis
 During syntactic analysis, the source statements written by the programmer are recognized as
language constructs described by the grammar being used.
 We may think of this process as building the parse tree for the statements. Parsing techniques
are divided into two general classes – bottom-up and top-down – according to the way in which
the parse tree is constructed.
Top-down methods (e.g., recursive-descent parsing) begin with the rule of the grammar that
specifies the goal of the analysis (i.e., the root of the tree), and attempt to construct the tree so
that the terminal nodes match the statements being analyzed.
Bottom-up methods (e.g., operator-precedence parsing) begin with the terminal nodes of the
tree (the statements being analyzed), and attempt to combine these into successively
higher-level nodes until the root is reached.
 A large number of different parsing techniques have been devised, most of which are
applicable only to grammars that satisfy certain conditions.
Operator-Precedence Parsing
 The bottom-up parsing technique we consider is called the operator precedence method. This
method is based on examining pairs of consecutive operators in the source program, and
making decisions about which operation should be performed first.
For example, consider the arithmetic expression “A + B * C – D”. According to the usual rules of
arithmetic, * and / have higher precedence than + and –. If we examine the first two operators,
+ and *, we find that + has lower precedence than *. This is often written as “+ < *”.
Similarly, for the next pair of operators, * and –, we would find that * has higher
precedence than –. We may write this as “* > –”.
 A + B * C – D
    <   >
This implies that the subexpression B*C is to be computed before either of the other operations
in the expression is performed.
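The comparison of consecutive operators can be sketched in a few lines of Python. This is only an illustration for the four arithmetic operators: the numeric levels in `PRECEDENCE` and the function name `relations` are assumptions, the ASCII hyphen stands in for the minus sign, and operators of equal level are treated as left-associative (so the left one is performed first).

```python
# Higher number = higher precedence (assumed levels for illustration).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def relations(tokens):
    """Return the precedence relation for each pair of consecutive operators."""
    operators = [tok for tok in tokens if tok in PRECEDENCE]
    result = []
    for left, right in zip(operators, operators[1:]):
        # "<" means the right operator is performed first, ">" the left one.
        relation = "<" if PRECEDENCE[left] < PRECEDENCE[right] else ">"
        result.append((left, relation, right))
    return result
```

For the token list `["A", "+", "B", "*", "C", "-", "D"]` this yields `[("+", "<", "*"), ("*", ">", "-")]`, matching the two relations written under the expression.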
 The first step in constructing an operator-precedence parser is to determine the precedence
relations between the operators of the grammar. In this context, operator is taken to mean any
terminal symbol (i.e., any token), so we also have precedence relations involving tokens such
as BEGIN, READ, id, etc.
The matrix in Fig 5.11 shows these precedence relations for the grammar in Fig 5.2.
 The relation ≐ indicates that the two tokens involved have equal precedence and should be
recognized by the parser as part of the same language construct.
 Note that the precedence relations do not follow the ordinary rules for comparisons.
For example, we have “; > END” but “END > ;”.
That is, when ; is followed by END, the ; has higher precedence.
But when END is followed by ;, the END has higher
precedence.
 Also note that in many cases, there is no precedence relation between a pair of tokens. This
means that these two tokens cannot appear together in any legal statement. If such a
combination occurs during parsing, it should be recognized as a syntax error.
 There are algorithmic methods for constructing a precedence matrix like Fig 5.11 from a
grammar [see, for example, Aho et al. (1998)]. For the operator-precedence parsing method to
be applied, it is necessary that all the precedence relations be unique.
 Fig 5.12 shows the application of the operator-precedence parsing method to
the READ statement from line 9 of the program in Fig 5.1.
The statement is scanned from left to right, one token at a time. For each pair of operators, the
precedence relation between them is determined.
Part (ii) of Fig 5.12 shows the statement being analyzed
with id replaced by <N1>.
Part (ii) of Fig 5.12 also shows the precedence relations that hold in the new version of the
statement. An operator-precedence parser generally uses a stack to save tokens that have been
scanned but not yet parsed, so it can reexamine them in this way.
Precedence relations hold only between terminal symbols, so <N1> is not involved in this
process, and a relationship is determined between ( and ).
 Fig 5.13 shows a similar step-by-step parsing of the assignment statement from line 14 of the
program in Fig 5.1.
Note that the left-to-right scan is continued in each step only far enough to determine the next
portion of the statement to be recognized, which is the first portion delimited by < and >.
Once this portion has been determined, it is interpreted as a nonterminal according to some rule
of the grammar.
 This process continues until the complete statement is recognized. Note that (see Fig 5.13)
each portion of the parse tree is constructed from the terminal nodes up toward the root, hence
the term bottom-up parsing.
Although we have illustrated operator-precedence
parsing only on single statements, the same techniques can be applied to an entire program.
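The bottom-up construction can be sketched for arithmetic expressions alone. The following Python code is a simplified two-stack variant, not the matrix-driven parser of Fig 5.11: it handles only the operators in the assumed `PRECEDENCE` table, treats every other token as an operand, and represents each recognized subexpression as a tuple `(operator, left, right)`. All names are hypothetical.

```python
# Assumed precedence levels; higher number = performed first.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens):
    """Build a parse tree bottom-up from a flat list of tokens."""
    operands = []   # subtrees recognized so far
    operators = []  # operators scanned but not yet reduced

    def reduce():
        # Combine the top operator with its two operands into one node.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            # While the stacked operator has precedence ">" over the new one,
            # the portion ending here is complete and can be reduced.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                reduce()
            operators.append(tok)  # relation "<": keep scanning
        else:
            operands.append(tok)
    while operators:
        reduce()
    return operands[0]
```

For `["A", "+", "B", "*", "C", "-", "D"]` the result is `("-", ("+", "A", ("*", "B", "C")), "D")`: the node for B * C is built first, then the addition, then the subtraction at the root.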
 Building on the operator-precedence technique, a more general method known as shift-reduce
parsing was developed.
Shift-reduce parsers make use of a stack to store tokens that have not yet been recognized in
terms of the grammar.
The actions of the parser are controlled by entries in a table, somewhat similar to the
precedence matrix discussed before.
The two main actions are shift (push the current token onto the stack) and reduce (recognize
symbols on top of the stack according to a rule of the grammar).
 Fig 5.14 illustrates this shift-reduce process, using the same READ statement considered in Fig
5.12. The token currently being examined by the parser is indicated by ↑.
In Fig 5.14(a), the parser shifts (pushes the current token onto the stack) when it encounters
the token BEGIN.
In Fig 5.14(b-d), the parser shifts in the same way as in Fig 5.14(a).
In Fig 5.14(e), when the parser examines the token ), the reduce action is invoked. A set of tokens
from the top of the stack (in this case, the single token id) is reduced to a nonterminal symbol
from the grammar (in this case, <id-list>).
In Fig 5.14(f), the token ) is considered again. This time, it will be pushed onto the stack, to be
reduced later as part of the READ statement.
 For this simple type of grammar, shift roughly corresponds to the action taken by
an operator-precedence parser when it encounters the
relations < and ≐. Reduce roughly corresponds to the action taken when an operator-
precedence parser encounters the relation >.
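The shift and reduce actions can be traced with a small Python sketch. It is not the table-driven parser of Fig 5.14: instead of consulting a table, it greedily reduces whenever the top of the stack matches the right-hand side of a rule, which is enough for the three rules assumed in `RULES` (longest right-hand side listed first). The rule set and all names are assumptions for illustration.

```python
# (left-hand side, right-hand side) pairs, longest right-hand side first.
RULES = [
    ("<read>", ["READ", "(", "<id-list>", ")"]),
    ("<id-list>", ["<id-list>", ",", "id"]),
    ("<id-list>", ["id"]),
]


def shift_reduce(tokens):
    """Return the final stack and the list of (action, symbol) steps."""
    stack, trace = [], []
    for tok in tokens:
        stack.append(tok)  # shift: push the current token
        trace.append(("shift", tok))
        reduced = True
        while reduced:  # reduce as long as some rule matches
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(("reduce", lhs))
                    reduced = True
                    break
    return stack, trace
```

For the tokens `READ ( id )` the trace is shift READ, shift (, shift id, reduce to `<id-list>`, shift ), reduce to `<read>`, which is the same order of actions described for the figure (the BEGIN token that precedes the statement is left out here).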
Recursive-Descent Parsing
 The other parsing technique is a top-down method known as recursive descent. A recursive
descent parser is made up of a procedure for each nonterminal symbol.
 As an example for illustrating the parsing process of a recursive descent parser, consider Rule
13 of the grammar in Fig 5.2.
The procedure for <read> in a recursive-descent parser first examines the next two input tokens,
looking for READ and (.
If these are found, the procedure for <read> then calls the procedure for <id-list>.
If that procedure (for <id-list>) succeeds, the <read> procedure examines the next input
token, looking for ).
If all these tests are successful, the <read> procedure
returns an indication of success to its caller and advances to the next token following ).
Otherwise, the <read> procedure returns an indication of
failure.
 When there are several alternatives defined by the grammar for a nonterminal, the procedure is
only slightly more complicated. For the recursive-descent technique, it must be possible to
decide which alternative to use by examining the next input token.
For example, the procedure for <stmt> looks at the next token to decide which of its four
alternatives to try.
If the token is READ, it calls the procedure for <read>;
if the token is id, it calls the procedure for <assign> because this is the only alternative that can
begin with the token id, and so on.
 This approach runs into a problem with some grammars. For example, the procedure for
<id-list>, corresponding to Rule 6, would be unable to decide between its two alternatives,
since both id and <id-list> can begin with id.
If the procedure decided to try the second alternative (<id-list> , id), it would immediately
call itself recursively to find an <id-list>. This could result in another immediate recursive
call, which leads to an unending chain.
The reason for this is that one of the alternatives for
<id-list> begins with <id-list>.
Therefore, top-down parsers cannot be directly used with a grammar that contains this kind of
immediate left recursion.
 Fig 5.15 shows the grammar from Fig 5.2 with left recursion eliminated.
 Top-down parsing using new grammar: Consider Rule 6a in Fig 5.15.
This notation specifies that the terms between { and } may be omitted, or repeated one or more
times.
Thus, Rule 6a defines <id-list> as being composed of an
id followed by zero or more occurrences of “, id”. This is clearly equivalent to Rule 6
of Fig 5.2.
 Fig 5.16 illustrates a recursive-descent parse of the READ statement on line 9 of Fig 5.1, using
the grammar in Fig 5.15.
Fig 5.16(a) shows the procedures for the nonterminals
<read> and <id-list>.
Assume that the variable TOKEN contains the type of the next input token, using the coding
scheme shown in Fig 5.5.
 Fig 5.16(b) (corresponding to the algorithms in Fig 5.16(a)) gives a graphic representation of
the recursive-descent parsing process for the statement being analyzed.
In part (i), the READ procedure has been invoked and has examined the tokens READ and (
from the input stream (indicated by the dashed lines).
In part (ii), READ has called IDLIST (indicated by the solid line), which has examined the
token id.
In part (iii), IDLIST has returned to READ, indicating success; READ has then examined the
input token ).
This completes the analysis of the source statement. The procedure READ will now return to
its caller, indicating that a <read> was successfully found.
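The two procedures can be sketched in Python as follows. This is not a transcription of Fig 5.16(a): the class name `Parser`, the method names, and the use of plain strings such as `"READ"` and `"id"` in place of the numeric token codes of Fig 5.5 are assumptions made for illustration.

```python
class Parser:
    """Recursive-descent procedures for <read> and <id-list>."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def token(self):
        """The next input token, or None at the end of the input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def read(self):
        """<read> ::= READ ( <id-list> )"""
        if self.token != "READ":
            return False
        self.advance()
        if self.token != "(":
            return False
        self.advance()
        if not self.idlist():
            return False
        if self.token != ")":
            return False
        self.advance()  # move past ")" before reporting success
        return True

    def idlist(self):
        """<id-list> ::= id { , id }"""
        if self.token != "id":
            return False
        self.advance()
        while self.token == ",":  # zero or more occurrences of ", id"
            self.advance()
            if self.token != "id":
                return False
            self.advance()
        return True
```

For example, `Parser(["READ", "(", "id", ")"]).read()` reports success. Because `idlist` uses a loop rather than calling itself, it cannot fall into the unending chain of calls caused by left recursion.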
 Fig 5.17 illustrates a recursive-descent parse of the
assignment statement on line 14 of Fig 5.1.
Fig 5.17(a) shows the procedures (ASSIGN, EXP, TERM, FACTOR) for the
nonterminal symbols that are involved in parsing this statement. You should
carefully compare these procedures to the corresponding rules of the grammar.
Fig 5.17(b) is a step-by-step representation of the procedure calls and token
examinations similar to that shown in Fig 5.16(b).
Note that the same technique can be applied to an entire program.
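A comparable sketch for the assignment statement is given below. Because Fig 5.15 and Fig 5.17(a) are not reproduced here, the grammar rules written in the comments are assumptions, as are the function names and the use of the strings `"id"` and `"int"` as token types. Each nested function corresponds to one nonterminal.

```python
def parse_assign(tokens):
    """Return True if the tokens form: id := <exp> (assumed grammar)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def accept(expected):
        # Consume the next token only if it is the expected one.
        nonlocal pos
        if peek() == expected:
            pos += 1
            return True
        return False

    def factor():
        # <factor> ::= id | int | ( <exp> )
        if accept("id") or accept("int"):
            return True
        if accept("("):
            return exp() and accept(")")
        return False

    def term():
        # <term> ::= <factor> { * <factor> | DIV <factor> }
        if not factor():
            return False
        while peek() in ("*", "DIV"):
            accept(peek())
            if not factor():
                return False
        return True

    def exp():
        # <exp> ::= <term> { + <term> | - <term> }
        if not term():
            return False
        while peek() in ("+", "-"):
            accept(peek())
            if not term():
                return False
        return True

    # <assign> ::= id := <exp>
    ok = accept("id") and accept(":=") and exp()
    return ok and pos == len(tokens)
```

A token sequence such as `["id", ":=", "id", "DIV", "int", "-", "id", "*", "id"]` is accepted, while `["id", ":=", "+"]` is rejected. As in the procedures for the READ statement, each alternative is chosen by looking only at the next input token.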
Unit V:Other System Software: Text Editors- Interactive Debugging Systems.
Editors and Debugging Systems
4.0 Introduction
An interactive text editor has become an important part of almost any computing
environment. The text editor acts as a primary interface to the computer for all types of
“knowledge workers” as they compose, organize, study, and manipulate computer-based
information.
An interactive debugging system provides programmers with facilities that aid in the
testing and debugging of programs. Many such systems are available today.
Our discussion is broad in scope, giving an overview of interactive debugging systems
that is not specific to any particular existing system.
4.1 Text Editors
A text editor allows the user to edit a text file (create it, modify it, and so on). Examples
of interactive text editors on Windows are Notepad, WordPad, and Microsoft Word; examples
on UNIX are vi, emacs, jed, and pico.
The common editing features associated with text editors are moving the cursor,
deleting, replacing, pasting, searching, searching and replacing, saving and loading, and
miscellaneous operations (e.g., quitting).
4.1.1 Overview of the editing process
An interactive editor is a computer program that allows a user to create and revise
a target document. The document may include objects such as computer programs, text,
equations, tables, diagrams, line art, and photographs. Here we restrict the discussion to
text editors, where character strings are the primary elements of the target text.
The document-editing process is an interactive user-computer dialogue with four tasks:
- Select the part of the target document to be viewed and manipulated
- Determine how to format this view on-line and how to display it
- Specify and execute operations that modify the target document
- Update the view appropriately
Selecting the part to be viewed and manipulated involves traveling, filtering, and
formatting. The editing phase involves operations such as insert, delete, replace, move,
copy, cut, and paste.
- Traveling – locating the area of interest
- Filtering – extracting the relevant subset
- Formatting – determining the visible representation on a display screen
There are two types of editors: manuscript-oriented editors and program-oriented
editors. A manuscript-oriented editor operates on characters, words, lines, sentences,
and paragraphs. A program-oriented editor operates on identifiers, keywords, and
statements. In both cases, the document is formatted according to the user's wishes.
4.1.2 User Interface
The conceptual model of the editing system provides an easily understood abstraction
of the target document and its elements. For example, line editors simulated the
world of the key punch: 80 characters per line, with editing on a single line or an integral
number of lines. In screen editors, the document is represented as a quarter-plane of text
lines, unbounded both down and to the right.
The user interface is concerned with the input devices, the output devices, and
the interaction language. The input devices are used to enter elements of the text being
edited and to enter commands. The output devices let the user view the elements being
edited and the results of the editing operations. The interaction language provides
communication with the editor.
Input devices are divided into three categories: text devices, button devices, and
locator devices. Text devices are keyboards. Button devices are special function keys and
symbols on the screen. Locator devices include the mouse and the data tablet. There are
also voice-input devices, which translate spoken words to their textual equivalents.
Output devices include teletypewriters (the first output devices), glass teletypes
(based on cathode ray tube (CRT) technology), advanced CRT terminals, TFT monitors
(WYSIWYG), and printers (hard copy).
The interaction language can be typing-oriented (text-command-oriented) or
menu-oriented. The typing-oriented or text-command-oriented interface is found in the
oldest editors, in the form of typed commands, function keys, control keys, etc.
A menu-oriented user interface presents a menu with a multiple-choice set of text strings
or icons. The display area for text is limited. Menus can be turned on or off.
4.1.3 Editor Structure

Most text editors have a structure similar to that shown in the following figure,
regardless of their particular features and the computers on which they are implemented.
The command language processor accepts commands from the user and invokes semantic
routines, which perform functions such as editing and viewing. The semantic routines
involve traveling, editing, viewing, and display functions.
[Figure: Typical Editor Structure. Components shown: input, command language processor,
editing component, traveling component, viewing component, display component, editing
buffer, editing filter, viewing buffer, viewing filter, main memory, paging routines, file
system, and output devices, connected by control and data paths.]
Editing operations are specified explicitly by the user, and display operations are
specified implicitly by the editor. Traveling and viewing operations may be invoked
either explicitly by the user or implicitly by the editing operations.
In editing a document, the start of the area to be edited is determined by the
current editing pointer maintained by the editing component. The editing component is a
collection of modules dealing with editing tasks. The current editing pointer can be set or
reset by commands such as next paragraph, next screen, cut paragraph, and paste paragraph.
When an editing command is issued, the editing component invokes the editing filter,
which generates a new editing buffer containing the part of the document to be edited,
starting from the current editing pointer. Filtering and editing may also be interleaved,
with no explicit editing buffer being created.
In viewing a document, the start of the area to be viewed is determined by the
current viewing pointer maintained by the viewing component. The viewing component is a
collection of modules responsible for determining the next view. The current viewing
pointer can be set or reset as a result of a previous editing operation.
When the display needs to be updated, the viewing component invokes the viewing filter,
which generates a new viewing buffer containing the part of the document to be viewed,
starting from the current viewing pointer. In line editors, the viewing buffer may contain
the current line; in screen editors, the viewing buffer contains a rectangular cutout of
the quarter-plane of text. The viewing buffer is then passed to the display component of
the editor, which produces a display by mapping the buffer to a rectangular subset of the
screen, called a window.
The editing and viewing buffers may be identical or completely disjoint. They are
identical when the user edits the text directly on the screen. They are disjoint in an
operation such as find and replace: for example, in a 150-line text, a user positioned at
line 100 decides to change all occurrences of ‘text editor’ to ‘editor’. The editing and
viewing buffers can also partially overlap, or one may be completely contained in the
other. Windows typically cover the entire screen or a rectangular portion of it. They may
show different portions of the same file or portions of different files, which makes
inter-file editing operations possible.
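The idea of a viewing buffer as a rectangular cutout of the quarter-plane can be sketched in Python. This is a minimal illustration, assuming the document is held as a list of text lines; the function name `viewing_buffer` and its parameters (`top`, `left`, `height`, `width`) are hypothetical.

```python
def viewing_buffer(lines, top, left, height, width):
    """Cut a height-by-width rectangle out of the text, starting at (top, left).

    Rows beyond the end of the document and columns beyond the end of a line
    are filled with blanks, since the quarter-plane is unbounded down and to
    the right.
    """
    window = []
    for row in range(top, top + height):
        text = lines[row] if row < len(lines) else ""
        window.append(text[left:left + width].ljust(width))
    return window
```

For a line editor, the same function with `height=1` and `left=0` would return just the current line, padded or truncated to the display width.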
The components of the editor deal with a user document on two levels: in main
memory and in the disk file system. Loading an entire document into main memory may
be infeasible, so only part of it is loaded, and demand paging is performed by the editor
paging routines. Documents may not be stored sequentially as a string of characters.
Instead, the editor uses a separate data structure that allows addition, deletion, and
modification with a minimum of I/O and character movement.
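The notes do not say which data structure an editor uses. One well-known choice is a gap buffer, which keeps a block of unused space at the cursor so that insertions and deletions there move no characters. The Python sketch below is an illustration of that technique only; the class name `GapBuffer`, its methods, and the growth policy are assumptions.

```python
class GapBuffer:
    """Text stored in one list with a movable gap at the cursor position."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first free slot (the cursor)
        self.gap_end = len(self.buf)    # first slot after the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, pos):
        """Move the gap so that it starts at text position pos."""
        pos = max(0, min(pos, len(self)))
        while self.gap_start > pos:     # move characters from left to right side
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:     # move characters from right to left side
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch):
        """Insert one character at the cursor."""
        if self.gap_start == self.gap_end:          # gap is full: enlarge it
            grow = max(16, len(self.buf))
            self.buf[self.gap_start:self.gap_start] = [None] * grow
            self.gap_end += grow
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def delete(self):
        """Delete the character before the cursor, if there is one."""
        if self.gap_start > 0:
            self.gap_start -= 1

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Characters are moved only when the cursor travels, and then only across the distance traveled; a run of insertions or deletions at one place costs no movement until the gap fills up.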
4.1.4 Types of editors based on computing environment
Editors function in three basic types of computing environments: time-sharing,
stand-alone, and distributed. Each type of environment imposes some constraints on the
design of an editor.
In a time-sharing environment, the editor must function swiftly within the context of
the load on the computer’s processor, memory, and I/O devices. In a stand-alone
environment, the editor is built with all the functions needed to carry out editing and
viewing operations, although the help of the OS may be taken for some tasks such as
demand paging. In a distributed environment, the editor must have the functions of a
stand-alone editor, so that it can run independently on each user’s machine, and must
also, like a time-sharing editor, contend for shared resources such as files.
4.2 Interactive Debugging Systems
This section covers the following topics:
- The important functions and capabilities of an interactive debugging system (IDS)
- The relationship of an IDS to other parts of the system
- The nature of the user interface for an IDS
4.2.1 Debugging Functions and Capabilities
One important requirement of any IDS is the observation and control of the flow
of program execution. The user can set breakpoints: when a breakpoint is reached,
execution is suspended, debugging commands are used to analyze the progress of the
program, and execution is then resumed. The user can also set conditional expressions
that are evaluated during the debugging session: program execution is suspended when
the conditions are met, analysis is made, and execution is later resumed.
A debugging system should also provide functions such as tracing and traceback.
Tracing can be used to track the flow of execution logic and data modifications. The
control flow can be traced at different levels of detail: procedure, branch, individual
instruction, and so on. Traceback can show the path by which the current statement in
the program was reached. It can also show which statements have modified a given
variable or parameter. This information is displayed in terms of source statements
rather than as hexadecimal displacements.
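Tracing at the procedure level can be illustrated with Python's standard `sys.settrace` hook, which calls a user-supplied function whenever a new Python function is entered. The sketch below is only an example of the idea, not a description of any particular debugging system; `trace_calls`, `square`, and `sum_of_squares` are hypothetical names.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and return (result, names of the functions entered)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)  # symbolic name, not an address
        return None  # no line-by-line tracing inside the called function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, calls


def square(x):
    return x * x


def sum_of_squares(a, b):
    return square(a) + square(b)
```

Calling `trace_calls(sum_of_squares, 3, 4)` returns the result 25 together with the list of function names entered, in order. Returning a local trace function from `tracer` instead of `None` would refine the trace to individual lines.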
4.2.2 Program-Display capabilities
A debugger should have good program-display capabilities. The program being
debugged should be displayed completely, with statement numbers. The program may be
displayed as originally written or with macros expanded. The debugger should keep track
of any changes made to the program during the debugging session, and should support
symbolically displaying or modifying the contents of any of the variables and constants
in the program. Execution can be resumed after these changes.
To provide these functions, a debugger should consider the language in which the
program being debugged is written. A single debugger that handles many programming
languages is language independent; a debugger built for one specific programming
language is language dependent. In either case, the debugger must be sensitive to the
specific language being debugged.
The language context has many different effects on the debugging interaction. For
example, the form of a statement differs from language to language.
Examples of assignment statements:
COBOL   - MOVE 6.5 TO X
FORTRAN - X = 6.5
C       - X = 6.5;

Similarly, the condition that X be unequal to Z may be expressed as:
COBOL   - IF X NOT EQUAL TO Z
FORTRAN - IF (X .NE. Z)
C       - if (X != Z)
Similar differences exist with respect to the form of statement labels, keywords, and so
on.
The notation used to specify certain debugging functions varies according to the
language of the program being debugged. Sometimes the language translator itself has
debugger interface modules that can respond to the request for debugging by the user.
The source code may be displayed by the debugger in the standard form or as specified
by the user or translator.
It is also important that a debugging system be able to deal with optimized code.
Many optimizations, such as the following, lead to rearrangement of segments of code in
the program:
- Invariant expressions can be removed from loops
- Separate loops can be combined into a single loop
- Redundant expressions may be eliminated
- Unnecessary branch instructions may be eliminated
All these optimizations create problems for the debugger and should be handled carefully.
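The first optimization in the list can be illustrated with a pair of Python functions. This is a hand-written before-and-after example, not the output of any compiler; the function names are hypothetical.

```python
def scaled_original(values, a, b):
    """The program as the programmer wrote it."""
    out = []
    for v in values:
        k = a * b            # invariant: same value on every iteration
        out.append(v * k)
    return out


def scaled_optimized(values, a, b):
    """The same program after the invariant expression is moved out of the loop."""
    k = a * b                # now executed once, before the loop starts
    out = []
    for v in values:
        out.append(v * k)
    return out
```

Both functions return the same result, but a breakpoint placed on the assignment to `k` would be reached on every iteration in the first version and only once in the second. A debugger working on optimized code has to relate such moved statements back to their positions in the source program.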
4.2.3 Relationship with Other Parts of the System
An important requirement for an interactive debugger is that it always be
available. It must appear as part of the run-time environment and as an integral part of
the system. When an error is discovered, immediate debugging must be possible. The
debugger must communicate and cooperate with other operating system components such
as interactive subsystems.
Debugging is more important at production time than it is at application-development
time. When an application fails during a production run, work dependent on
that application stops. The debugger must also exist in a way that is consistent with the
security and integrity components of the system. The debugger must coordinate its
activities with those of existing and future language compilers and interpreters.
4.2.4 User-Interface Criteria
A debugging system should be simple in its organization and familiar in its language, and should closely
reflect common user tasks. A simple organization contributes greatly to ease of training and ease of use.
The user interaction should make use of full-screen displays and windowing systems as much as possible. With
menus and full-screen editors, the user has far less information to enter and remember. There should be
complete functional equivalence between commands and menus, so that a user who is unable to use a
full-screen IDS can use commands instead. The command language should have a clear, logical, and simple
syntax; command formats should be as flexible as possible. Any good IDS should have an on-line HELP
facility, and HELP should be accessible from any state of the debugging session.