Unit II
ASSEMBLERS
An assembler is a software program that translates an assembly language program (ALP) into a machine language program.
Assembly language program (ALP) (source program) -> Assembler -> Machine or low level language program (object program)
In this chapter a basic assembler is designed for the standard SIC machine.
2.1 BASIC ASSEMBLER FUNCTIONS :
To illustrate the basic assembler functions, an example program is written as a main routine with two subroutines.
The modules are
COPY
RDREC
WRREC
Here is an example outline program to show the basic assembler function. The
complete program is given in Appendix-B.
Label    Mnemonic  Operand   Explanation
COPY     START     1000      Copy file from input to output
FIRST    STL       RETADR    Save return address
CLOOP    JSUB      RDREC     Jump to subroutine RDREC
         LDA       LENGTH    Load A with length value
         COMP      ZERO      Compare accumulator with zero
         JEQ       ENDFIL    If equal, jump to ENDFIL
.
ENDFIL   LDA       EOF       Load A with EOF value
.
EOF      BYTE      C'EOF'    Define constant value EOF
ZERO     WORD      0         Define constant value 0
LENGTH   RESW      1         Reserve memory area for length
BUFFER   RESB      4096      4096 byte buffer area
Fig 2.1 Sample Assembly Language Program
This is an incomplete outline of the main program COPY, which calls the subroutines RDREC and WRREC.
In addition to mnemonic machine instructions (executable statements), the example uses the following assembler directives :
START
END
BYTE
WORD
RESB
RESW
In the above program the I/O instructions TD, RD and WD are used to test the I/O devices, read from the input device and write to the output device.
The EOF (End Of File) marker is written to the output device to indicate that all the records have been written.
The buffer is used to hold the input record, because I/O rates may differ from the CPU rate.
2.1.1 A Simple SIC Assembler :
Five basic assembler functions must be performed to translate a source program into object code. The first of these is :
o Convert mnemonic opcodes (executable codes) into their machine language equivalents.
Example : Translate STL to 14 (refer to Appendix A)
Line  Location  Label    Mnemonic  Operand   Object code
1     1000      COPY     START     1000
2     1000      FIRST    STL       RETADR    141033
3     1003      CLOOP    JSUB      RDREC     482039
4     1006               LDA       LENGTH    001036
5     1009               COMP      ZERO      281030
.
10    1030      ZERO     WORD      0         000000
11    1033      RETADR   RESW      1
12    1036      LENGTH   RESW      1
.
25    2039      RDREC    LDX       ZERO      041030
.
35    2061      WRREC    LDX       ZERO      041030
Fig 2.2 ALP with Object code
So, for example,
STL    RETADR   141033   (STL = 14, RETADR = 1033)
JSUB   RDREC    482039   (JSUB = 48 from Appendix A, RDREC = 2039)
Likewise, all instructions are converted to object code. The object code for the rest of the program is left as an exercise for the reader.
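The translation shown above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name assemble_sic and the reduced OPTAB below are assumptions made for the example, and only the opcode values themselves come from this chapter.

```python
# Hypothetical subset of OPTAB; the opcode values are those listed in this chapter.
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "COMP": 0x28, "LDX": 0x04}


def assemble_sic(mnemonic: str, address: int, indexed: bool = False) -> str:
    """Return the 6 hex digit object code of a standard SIC instruction.

    Layout assumed: 8 bit opcode, 1 bit index flag (x), 15 bit address.
    """
    opcode = OPTAB[mnemonic]
    x_bit = 0x8000 if indexed else 0
    word = (opcode << 16) | x_bit | (address & 0x7FFF)
    return f"{word:06X}"


print(assemble_sic("STL", 0x1033))   # STL  RETADR
print(assemble_sic("JSUB", 0x2039))  # JSUB RDREC
```

With the addresses of Fig 2.2 the two calls reproduce the object codes 141033 and 482039.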
Forward Reference :
A forward reference is a reference to a label that is used before it is defined in the program. For example, ALPHA is a forward reference symbol in the following program, because it is referenced by the LDA instruction but defined later in the program.
Example :
         START
         LDA     ALPHA
BETA     RESW
ALPHA    WORD
Similarly, in Fig 2.2 the label RETADR is used in line no. 2 but defined later, in line no. 11, which makes it a forward reference.
So in the first pass (scan) of the ALP the values of forward reference symbols are unknown; only in the second pass are all the symbols replaced by their values.
Avoiding forward reference :
One method of avoiding forward references is to define all the symbols first and use them later in the program.
Example : The above forward reference example can be rewritten as follows :
BETA     RESW
ALPHA    WORD
         START
         LDA     ALPHA
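1. Header Record Format :
Column 1        H
Column 2-7      Program name
Column 8-13     Starting address of the object program (hexadecimal)
Column 14-19    Length of the object program in bytes (hexadecimal)
2. Text Record Format :
Column 1        T
Column 2-7      Starting address of the object code in this record (hexadecimal)
Column 8-9      Length of the object code in this record in bytes (hexadecimal)
Column 10-69    Object code (hexadecimal)
Example
The text record for the program in fig 2.2 begins
T^001000^1E^141033^482039^001036^281030
where 001000 is the starting address, 1E is the length of the record in bytes and 141033 is the first object code.
3. End Record : It marks the end of the object program and specifies the starting address where the execution of the program is to begin.
End Record Format :
Column 1        E
Column 2-7      Address of the first executable instruction (hexadecimal)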
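The record layouts can be illustrated with a small Python sketch. The function names are hypothetical, the column layout assumed is the usual SIC one (H, T and E records with hexadecimal fields), and the program length 107A used in the usage lines is an assumed value, not one taken from this chapter. The ^ separators seen in printed records are for readability only and are not generated here.

```python
def header_record(name: str, start: int, length: int) -> str:
    """H, program name (6 columns), start address (6 hex), length (6 hex)."""
    return f"H{name:<6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list) -> str:
    """T, start address (6 hex), byte count (2 hex), then up to 60 hex digits."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 30 bytes of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction: int) -> str:
    """E, address of the first executable instruction (6 hex)."""
    return f"E{first_instruction:06X}"


print(header_record("COPY", 0x1000, 0x107A))  # length is an assumed value
print(text_record(0x1000, ["141033", "482039", "001036"]))
print(end_record(0x1000))
```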
Perform processing of assembler directives that are not processed during Pass 1
Write the object program and assembly listing.
2.1.2 Assembler algorithm and Data Structures :
Data Structures : Three major data structures are used in the assembler design.
1) Symbol Table : Stores each symbol, its value and a flag (to indicate whether the symbol is absolute or relocatable).
Example :
Symbol    Value   Flag
COPY      1000    R
FIRST     1000    R
CLOOP     1003    R
RETADR    1033    R
ZERO      1030    R
2) Literal Table : Stores all the literals used in the program. Literals are
preceded by = symbol in the program.
3) Operation Code Table (OPTAB) : OPTAB is used to look-up mnemonic
opcodes and translate them to their machine equivalents. The complete
OPTAB is given in Appendix-A.
Example :
Mnemonic   Opcode
STL        14
JSUB       48
LDA        00
COMP       28
JEQ        30
J          3C
LDX        04
LDL        08
JLT        38
The symbol table and the literal table are generated during pass 1 of a two pass assembler.
OPTAB is specified by the manufacturer (Appendix-A) and is referred to by pass 2 to construct the machine code.
OPTAB is organized as a hash table, which provides fast retrieval with a minimum of searching.
SYMTAB (Symbol table) is also organized as a hash table for efficiency of insertion and retrieval.
The output of pass 1 (symbol table, literal table, source program) is written to an intermediate file.
This intermediate file is the input to pass 2, which produces the object code.
Algorithm :
The following two algorithms show the logic flow of the two passes of SIC
assembler.
Algorithm for Pass 1 of assembler :
Pass 1 :
begin
    read first input line
    if OPCODE = START then
        begin
            initialize LOCCTR to starting address
            read next input line
        end (if START)
The assembler then reads each OPCODE and compares it with OPTAB; if a match occurs, the OPCODE is converted to the corresponding machine instruction.
If the OPCODE is not in OPTAB, the assembler checks for the assembler directives (RESW, RESB, BYTE, WORD) and performs the necessary translation.
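Since the Pass 1 algorithm is only partly reproduced here, the following Python sketch shows one plausible shape for it. It is an assumption, not the chapter's algorithm: it takes each source line as a (label, opcode, operand) tuple, assumes every machine instruction is 3 bytes long, and uses hypothetical names (pass1, byte_length, MNEMONICS).

```python
# Mnemonics from the OPTAB example; every SIC instruction is assumed to be 3 bytes.
MNEMONICS = {"STL", "JSUB", "LDA", "COMP", "JEQ", "J", "LDX", "LDL", "JLT"}


def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else (len(body) + 1) // 2


def pass1(lines):
    """Assign addresses to all labels; return (symtab, start address, length)."""
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)
            if label:
                symtab[label] = locctr
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        if opcode in MNEMONICS or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, start, locctr - start


program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    (None, "LDA", "LENGTH"),
    ("EOF", "BYTE", "C'EOF'"),
    ("ZERO", "WORD", "0"),
    ("RETADR", "RESW", "1"),
    ("LENGTH", "RESW", "1"),
    (None, "END", "FIRST"),
]
symtab, start, length = pass1(program)
```

Note that the forward references to RETADR, RDREC and LENGTH cause no difficulty here, because Pass 1 only records where labels are defined and never looks operands up.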
Algorithm for Pass 2 of assembler :
Pass 2 :
begin
    read first input line
    if OPCODE = START then
        begin
            read next input line
        end (if START)
    while OPCODE ≠ END do
        begin
            search OPTAB for OPCODE
            if found then
                begin
                    if symbol is in OPERAND field then
                        begin
                            search SYMTAB for OPERAND
                            if found then
                                store symbol value as operand address
                        end (if symbol)
                    assemble the object code from OPTAB and SYMTAB
                    [STL RETADR 141033]
                end (if found)
            else if OPCODE = BYTE or WORD then
                convert constant to object code
        end (while not END)
end (pass 2)
Fig 2.4 Pass-2 Algorithm for two pass Assembler
This algorithm generates machine instructions from OPTAB and SYMTAB and generates constant values for WORD and BYTE.
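The Pass 2 logic can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: source lines are (label, opcode, operand) tuples, the symbol table is an ordinary dictionary produced by an earlier pass, indexed addressing is ignored, and the function name pass2 and the reduced OPTAB are hypothetical.

```python
# Hypothetical subset of OPTAB; the opcode values are those listed in this chapter.
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "COMP": 0x28, "LDX": 0x04}


def pass2(lines, symtab):
    """Return the list of object codes generated for the given source lines."""
    object_codes = []
    for label, opcode, operand in lines:
        if opcode == "START":
            continue
        if opcode == "END":
            break
        if opcode in OPTAB:
            address = 0
            if operand:
                if operand not in symtab:
                    raise KeyError(f"undefined symbol {operand}")
                address = symtab[operand]
            object_codes.append(f"{OPTAB[opcode]:02X}{address:04X}")
        elif opcode == "WORD":
            object_codes.append(f"{int(operand) & 0xFFFFFF:06X}")
        elif opcode == "BYTE":
            body = operand[2:-1]
            if operand[0] == "C":
                object_codes.append(body.encode("ascii").hex().upper())
            else:
                object_codes.append(body.upper())
        # RESW and RESB reserve storage and generate no object code.
    return object_codes


symtab = {"RETADR": 0x1033, "RDREC": 0x2039, "LENGTH": 0x1036, "ZERO": 0x1030}
source = [
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    (None, "LDA", "LENGTH"),
    (None, "COMP", "ZERO"),
    ("EOF", "BYTE", "C'EOF'"),
    ("ZERO", "WORD", "0"),
    ("RETADR", "RESW", "1"),
]
codes = pass2(source, symtab)
```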
Label    Mnemonic  Operand    Effect
COPY     START     0          Copy file from input to output
FIRST    STL       RETADR     Save return address
         LDB       #LENGTH    Establish base register
         BASE      LENGTH
CLOOP    +JSUB     RDREC      Read input record
         LDA       LENGTH     Load register A with length
         COMP      #0
----
         J         @RETADR    Return to caller
LENGTH   RESW      1
RDREC    CLEAR     X          Clear X register
----
WRREC    CLEAR     X          Clear X register
----
         END       FIRST
Fig 2.5 Example of SIC/XE program
Figure 2.6 shows the object code generated for each statement in the program
of Figure 2.5
Line No.  Location  Label    Mnemonic  Operand    Object Code
1         0000      COPY     START     0
2         0000      FIRST    STL       RETADR     17202D
3         0003               LDB       #LENGTH    69202D
4                            BASE      LENGTH
5         0006      CLOOP    +JSUB     RDREC      4B101036
6         000A               LDA       LENGTH     032026
7         000D               COMP      #0         290000
----
12        002A               J         @RETADR    3E2003
13        0033      LENGTH   RESW      1
25        1036      RDREC    CLEAR     X          B410
----
35        105D      WRREC    CLEAR     X          B410
----
41                           END       FIRST
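For example, the format 4 instruction
+JSUB    RDREC    4B101036
is assembled as follows :
opcode (6 bits)   n  i   x  b  p  e   20 bit address
010010            1  1   0  0  0  1   01036 (5 X 4 = 20)
The first byte is 0100 1011 = 4B (opcode 48 with n = 1 and i = 1), the flag bits x b p e are 0001 = 1, and the address field is 01036.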
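The bit packing of an extended (format 4) instruction can be checked with a short Python sketch. The function name format4 and its parameter names are assumptions made for the illustration; the layout used is 6 opcode bits, the flags n i x b p e with e = 1, and a 20 bit address.

```python
def format4(opcode: int, address: int, n: int = 1, i: int = 1,
            x: int = 0, b: int = 0, p: int = 0) -> str:
    """Return the 8 hex digit object code of a SIC/XE format 4 instruction."""
    first_byte = (opcode & 0xFC) | (n << 1) | i    # opcode bits plus n and i
    xbpe = (x << 3) | (b << 2) | (p << 1) | 1      # e = 1 marks format 4
    return f"{first_byte:02X}{xbpe:X}{address & 0xFFFFF:05X}"


print(format4(0x48, 0x01036))  # +JSUB RDREC
```

With opcode 48 and address 01036 the function returns 4B101036, matching the object code of +JSUB RDREC.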
For STL,
FIRST    STL     RETADR       17202D
the opcode 14 with n = 1 and i = 1 gives 17. Further examples :
         J       CLOOP        3F2FEC
         STCH    BUFFER, X    57C003
Here the content of the base register is established by the LDB #LENGTH instruction, so the base register contains 0033 (the value of the symbol LENGTH in fig 2.2.1).
The value (address) of the symbol BUFFER is 0036.
So displacement = value of symbol - content of base register
= 0036 - 0033
= 3 = 003 (as a 12 bit displacement)
Since the x and b bits are set to 1 for indexed and base relative addressing, the flag bits x b p e are 1100 = C.
The STCH opcode 54 with n = 1 and i = 1 gives 57, and the flag bits followed by the displacement give C003, which yields the machine instruction 57C003.
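The displacement calculation can be sketched in Python. The sketch assumes the common rule of trying program counter relative addressing first and falling back to base relative addressing; the function name format3 and its parameters are hypothetical. The program counter values passed in the examples are the addresses of the instructions that follow STL, J and STCH in the SIC/XE listing.

```python
def format3(opcode: int, target: int, pc: int, base=None,
            n: int = 1, i: int = 1, x: int = 0) -> str:
    """Return the 6 hex digit object code of a SIC/XE format 3 instruction."""
    disp = target - pc
    if -2048 <= disp <= 2047:                 # program counter relative
        b, p = 0, 1
    elif base is not None and 0 <= target - base <= 4095:
        disp, b, p = target - base, 1, 0      # base relative
    else:
        raise ValueError("operand out of range for format 3")
    first_byte = (opcode & 0xFC) | (n << 1) | i
    xbpe = (x << 3) | (b << 2) | (p << 1)     # e = 0 for format 3
    return f"{first_byte:02X}{xbpe:X}{disp & 0xFFF:03X}"


print(format3(0x14, 0x0030, pc=0x0003))                    # STL  RETADR
print(format3(0x3C, 0x0006, pc=0x001A))                    # J    CLOOP
print(format3(0x54, 0x0036, pc=0x1051, base=0x0033, x=1))  # STCH BUFFER, X
```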
4. Immediate Addressing Mode :
Here the immediate operand (#3) is converted into its internal representation and inserted into the object code.
Example :
0020    LDA    #3    010003
Here i = 1 and n = 0 are set in the format 3 instruction (so the first byte for LDA is 01 instead of 00).
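A minimal Python sketch of immediate operand encoding follows. The function name format3_immediate is hypothetical, and the sketch covers only small constants that fit directly into the 12 bit displacement field.

```python
def format3_immediate(opcode: int, value: int) -> str:
    """Encode a format 3 instruction with an immediate operand (n = 0, i = 1)."""
    if not 0 <= value <= 4095:
        raise ValueError("immediate value does not fit in 12 bits")
    first_byte = (opcode & 0xFC) | 0b01   # n = 0, i = 1
    return f"{first_byte:02X}0{value:03X}"  # x b p e are all 0


print(format3_immediate(0x00, 3))  # LDA  #3
print(format3_immediate(0x28, 0))  # COMP #0
```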
2.2.2 Program Relocation :
Generally, more than one program shares the memory and other resources of the system, so at assembly time the user cannot predict the memory location at which the program will be loaded. The program is loaded wherever there is room for it (that is, in a free area of memory).
For example, the program in figure 2.6 must be loaded at address 0000 in order to execute properly. Consider the instruction
0006    CLOOP    +JSUB    RDREC    4B101036
Here the address of the subroutine RDREC is 1036 when the program is loaded at address 0000.
If the starting address of the program (0000) is not free, the program is loaded at some other location where memory is free (this is called program relocation). In that case the RDREC address 1036 is no longer correct.
The following example shows program relocation with different load addresses.
(a) Loaded at 0000           (b) Loaded at 5000           (c) Loaded at 7420
0000  17202D                 5000  17202D                 7420  17202D
0003  69202D                 5003  69202D                 7423  69202D
0006  4B101036  +JSUB        5006  4B106036  +JSUB        7426  4B108456  +JSUB
.                            .                            .
1036  B410                   6036  B410                   8456  B410
.                            .                            .
1076                         6076                         8496
(b) Shows the memory contents when the program is loaded at address 5000.
o That is, 5000 + 01036 = 06036 is the correct address for the relocated statement RDREC.
(c) Shows the memory contents when the program is loaded at address 7420, where memory is free.
Here the RDREC address is changed to 7420 + 01036 = 08456.
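The address modification can be sketched in Python. The function name relocate_format4 is hypothetical, and the sketch assumes the simplest case: a format 4 instruction whose 20 bit address field was assembled relative to a starting address of 0000.

```python
def relocate_format4(object_code: str, load_address: int) -> str:
    """Add the actual load address to the 20 bit address field of a format 4 instruction."""
    word = int(object_code, 16)
    address = word & 0xFFFFF                       # low 20 bits hold the address
    relocated = (address + load_address) & 0xFFFFF
    return f"{(word & ~0xFFFFF) | relocated:08X}"  # keep opcode and flag bits


print(relocate_format4("4B101036", 0x5000))
print(relocate_format4("4B101036", 0x7420))
```

The two calls give 4B106036 and 4B108456, the values shown for load addresses 5000 and 7420.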
Relocation or relocatable program
If the starting address specified in the program is not free, then the object code
will be loaded wherever the memory is free. An object program that contains the
Record
Col. 1
Col. 2-7
Col. 8-9
17202D
2.3.1 Literals :
It is often convenient to write the value of a constant operand as part of the instruction that uses it. This avoids having to define the constant elsewhere in the program and make up a label for it. Such an operand is called a literal.
A literal is a constant value that is specified as part of the instruction operand. It is identified by the = symbol that precedes the literal operand.
Example : LDA =C'EOF'
In general, all the literals are stored at the end of the program. Such a collection of literals gathered together is called a literal pool.
But the problem in storing at the end is whenever a larger memory area is reserved
in middle of the program (Example : BUFFER RESW
4096),
storing
Line  Location  Label    Mnemonic  Operand    Object Code
1     0000      COPY     START     0
2     0000      FIRST    STL       RETADR     17202D
3     0003               LDB       #LENGTH    69202D
----
10    001A      ENDFIL   LDA       =C'EOF'    032010
----
17    002A               J         @RETADR    3E2003
18    002D               LTORG
      002D               =C'EOF'              454F46
19    0030      RETADR   RESW      1000
Fig 2.8 Example program with LTORG
In the above program the literal =C'EOF' is placed immediately after the LTORG directive.
All the literals are stored in the literal table (LITTAB), each with the memory address assigned to it as its value. LITTAB is also organized as a hash table, for faster access.
symbol   EQU    value
Example :
MAXLEN   EQU    4096

ALPHA    RESW
BETA     EQU    ALPHA

BETA     EQU    ALPHA
ALPHA    RESW

         ORG    value
STAB     RESB   1100
         ORG    STAB
SYMBOL   RESB
VALUE    RESB
FLAGS    RESB
         ORG    STAB + 1100
o The first ORG resets the location counter to the value of STAB.
o Each entry in the symbol table consists of a 6 byte SYMBOL, followed by a one word VALUE, followed by a 2 byte FLAGS field.
SYMBOL (6)    VALUE (1)    FLAGS (2)    (1100 bytes in total)
o The last ORG sets LOCCTR back to its previous value, the address of the next unassigned byte of memory after the table STAB.
o In ORG, all symbols used to specify the new location counter value must have been previously defined.
2.3.3 Expressions :
An assembler can also accept an expression as an operand; the expression is evaluated to produce a single operand address or value. Expressions are formed using the arithmetic operators +, -, *, /.
Expressions are classified as either absolute expressions or relative expressions.
1) Absolute expressions :
o An expression that contains only absolute terms, or in which all relative terms occur in pairs with opposite signs, yields an absolute value.
Example : MAXLEN EQU BUFFEND - BUFFER
o Here both symbols are relative, with values 1036 and 0036 respectively, so their difference 1000 is absolute.
EQU
BUFFER + 100
starting address.
o From this discussion the symbol table entries might be
Symbol     Type   Value
RETADR     R      0030
BUFFER     R      0036
BUFFEND    R      1036
MAXLEN     A      1000
R stands for relocatable and A for absolute.
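The classification of an expression as absolute or relative can be sketched in Python. This is an illustrative sketch restricted to + and - terms; the function name expression_type and the (value, type) layout of the symbol table are assumptions. A relative term counts +1 or -1 according to its sign: a net count of 0 gives an absolute result and a net count of 1 gives a relative one.

```python
def expression_type(terms, symtab):
    """Evaluate a sum of signed terms and return (value, 'A' or 'R').

    terms  : list of (sign, term) with sign +1 or -1 and term a symbol name or an int
    symtab : dict mapping symbol name to (value, 'A' or 'R')
    """
    value = 0
    relative_count = 0
    for sign, term in terms:
        if isinstance(term, int):
            value += sign * term
            continue
        term_value, term_type = symtab[term]
        value += sign * term_value
        if term_type == "R":
            relative_count += sign
    if relative_count == 0:
        return value, "A"
    if relative_count == 1:
        return value, "R"
    raise ValueError("expression is neither absolute nor relative")


symtab = {"BUFFER": (0x0036, "R"), "BUFFEND": (0x1036, "R")}
maxlen = expression_type([(+1, "BUFFEND"), (-1, "BUFFER")], symtab)
offset = expression_type([(+1, "BUFFER"), (+1, 100)], symtab)
```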
Label      Mnemonic   Operand
BUFFEND    EQU        *
So program blocks can be used to refer to segments of code that are rearranged within a single object program unit and are handled separately by the loader.
The assembler directive USE indicates which portions of the source program belong to the various blocks.
Three blocks are used in our example program :
1) The first (unnamed) program block contains the executable statements and is known as the default block.
2) The second (named CDATA) contains all data areas that are a few words or less in length.
3) The third (named CBLKS) contains all data areas that consist of larger blocks of memory.
The USE statement may also indicate a continuation of a previously begun block.
Line No.  Location / Block  Label    Mnemonic  Operand    Object code
1         0000  0           COPY     START     0
2         0000  0           FIRST    STL       RETADR     172063
---
19        0024  0                    J         @RETADR    3E203F
20        0000  1                    USE       CDATA
21        0000  1           RETADR   RESW      1
22        0003  1           LENGTH   RESW      1
23        0000  2                    USE       CBLKS
24        0000  2           BUFFER   RESW      4096
Fig 2.9 Program with USE directive for blocks
Control Section :
o A control section is a part of the program that is translated into an independent object program unit and maintains its identity after assembly.
2.4 Assembler Design Options :
In this section we discuss two alternatives to the standard two pass assembler.
Section 2.4.1 describes the structure and logic of a one pass assembler. These assemblers are used when it is necessary or desirable to avoid a second pass over the source program.
Section 2.4.2 introduces the notion of a multi pass assembler, an extension of the two pass logic that allows an assembler to handle forward references during symbol definition.
2.4.1 One Pass Assembler :
The main problem in trying to assemble a program in one pass involves forward references, that is, instruction operands that have not yet been defined at the point in the source program where they are used.
Such forward references can be eliminated by defining all the symbols before they are referenced. That is, the storage reservation statements can be placed at the start of the program rather than at the end.
But some forward references (such as a forward jump based on some condition) cannot be eliminated.
Therefore the assembler must make some special provision for handling forward references. A forward reference list, kept in the symbol table entries, is used to handle them.
There are two main types of one pass assembler
1) Load and go assembler :
o This assembler generates the object code directly in memory, ready for execution, so no loader is needed.
o This is useful in a system that is oriented toward program development and testing.
o Example : A university computing system for students.
2) The second type produces object code for later execution, so this object code may be stored in secondary memory.
In a single pass assembler, a symbol that is used before it is defined is entered in the symbol table and flagged as undefined.
The address of the operand field that refers to it is added to a list of forward references associated with the symbol table entry.
Whenever the definition of a symbol is encountered, the forward reference list for that symbol is scanned and the proper address is inserted into the instructions that were generated previously.
Example program :
Line No.  Location  Label    Mnemonic  Operand   Object Code
0         1000      COPY     START     1000
1         1000      RETADR   RESW      1
2         1003      LENGTH   RESW      1
3         1016      FIRST    STL       RETADR    14 1000
4         1019      CLOOP    JSUB      RDREC     48 203D
5         101C               LDA       LENGTH    00 1003
----
9         1020               JEQ       ENDFIL    30 1029
----
12        1029      ENDFIL   LDA       EOF
----
19        203D      RDREC    LDX       ZERO
Fig 2.10 Example program for One-Pass Assembler
Symbol table after scanning line no. 9 (RDREC and ENDFIL are still undefined and are flagged with *) :
Symbol    Value   Forward reference list
RETADR    1000
LENGTH    1003
FIRST     1016
CLOOP     1019
RDREC     *       101A
ENDFIL    *       1021
Symbol table after scanning line no. 19 :
Symbol    Value
COPY      1000
RETADR    1000
LENGTH    1003
FIRST     1016
CLOOP     1019
ENDFIL    1029
RDREC     203D
After scanning line no. 19, the ENDFIL symbol value 1029 has been filled in at address 1021 and the RDREC address 203D has been filled in at location 101A (the addresses held in the forward reference lists).
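The forward reference list mechanism can be sketched in Python. This is an illustrative sketch: the class OnePassSymtab, its method names, and the memory model (a dictionary from operand address to address field) are assumptions made for the example, not part of the chapter's algorithm.

```python
class OnePassSymtab:
    """Symbol table that keeps a forward reference list per undefined symbol."""

    def __init__(self):
        self.values = {}    # symbol -> address, once defined
        self.pending = {}   # undefined symbol -> operand addresses to patch

    def reference(self, symbol, operand_address, memory):
        """Record a use of symbol whose address field lies at operand_address."""
        if symbol in self.values:
            memory[operand_address] = self.values[symbol]
        else:
            memory[operand_address] = 0  # placeholder until the symbol is defined
            self.pending.setdefault(symbol, []).append(operand_address)

    def define(self, symbol, value, memory):
        """Define symbol and patch every instruction generated before its definition."""
        self.values[symbol] = value
        for operand_address in self.pending.pop(symbol, []):
            memory[operand_address] = value


memory = {}  # hypothetical memory model: operand address -> address field
symtab = OnePassSymtab()
symtab.define("RETADR", 0x1000, memory)
symtab.reference("RETADR", 0x1017, memory)   # STL  RETADR (already defined)
symtab.reference("RDREC", 0x101A, memory)    # JSUB RDREC  (forward reference)
symtab.reference("ENDFIL", 0x1021, memory)   # JEQ  ENDFIL (forward reference)
symtab.define("ENDFIL", 0x1029, memory)
symtab.define("RDREC", 0x203D, memory)
```

After the two definitions are processed, both forward reference lists are empty and the operand fields at 101A and 1021 hold 203D and 1029.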
The header record, text records and end record are produced in the same way as in the two pass assembler, as follows.
T^001016^141000^48203D^001003
-------------
E^001016
2.4.2 Multi Pass Assembler :
A multi pass assembler can make as many passes (more than two) as are needed to process the definitions of all symbols.
Example :
ALPHA    EQU     BETA
BETA     EQU     DELTA
DELTA    RESW
During the first pass the values of ALPHA and BETA cannot be assigned; only the value of DELTA is known.
During the second pass BETA is assigned the value of DELTA, but ALPHA still cannot be assigned a value.
Only during the third pass can ALPHA be assigned the value of BETA. So three passes are needed to define all the symbol values.
Sometimes it is not necessary to make more than two passes over the entire program. Instead, the portions of the program that involve forward references are stored in the symbol table and are processed in subsequent passes.
Example of multi-pass assembler operation
Line No.  Location  Label    Mnemonic  Operand
1         1000      HALFSZ   EQU       MAXLEN / 2
2         1003      MAXLEN   EQU       BUFEND - BUFFER
3         1006      PREVBT   EQU       BUFFER - 1
----
16        1034      BUFFER   RESB      4096
17        2034      BUFEND   EQU       *
During pass 1, after scanning line no. 1, we create the following symbol table entries
Symbol    Value   Defining expression   Dependent symbols
HALFSZ    &1      MAXLEN / 2
MAXLEN    *                             HALFSZ
HALFSZ is entered with the flag &1, indicating that one symbol in its defining expression is undefined.
The symbol MAXLEN is also entered, with the flag * identifying it as undefined. Associated with this entry is a list of the symbols whose values depend on MAXLEN (here HALFSZ).
The following symbol table shows the entries after scanning line no. 2, that is MAXLEN EQU BUFEND - BUFFER
Symbol    Value   Defining expression   Dependent symbols
HALFSZ    &1      MAXLEN / 2
MAXLEN    &2      BUFEND - BUFFER       HALFSZ
BUFEND    *                             MAXLEN
BUFFER    *                             MAXLEN
BUFEND and BUFFER are entered as undefined, each with MAXLEN on its associated list.
The symbol table entries after scanning line no. 3 are as follows
Symbol    Value   Defining expression   Dependent symbols
HALFSZ    &1      MAXLEN / 2
MAXLEN    &2      BUFEND - BUFFER       HALFSZ
BUFEND    *                             MAXLEN
BUFFER    *                             MAXLEN, PREVBT
PREVBT    &1      BUFFER - 1
After scanning line no. 16, that is BUFFER RESB 4096, the value of BUFFER is known, so PREVBT can be evaluated and MAXLEN depends on only one undefined symbol
Symbol    Value   Defining expression   Dependent symbols
BUFEND    *                             MAXLEN
HALFSZ    &1      MAXLEN / 2
PREVBT    1033
MAXLEN    &1      BUFEND - BUFFER       HALFSZ
BUFFER    1034
Line no. 17 is BUFEND EQU *.
The BUFEND value is *, that is, the current value of the location counter, which is 2034.
Now all the symbol values can be computed from the BUFEND symbol value as follows :
Symbol    Value
BUFEND    2034
HALFSZ    800
PREVBT    1033
MAXLEN    1000
BUFFER    1034
If any symbols remained undefined at the end of the program, the assembler would flag them as errors.
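The effect of resolving the deferred definitions can be sketched in Python. Note the swapped-in technique: instead of the dependency lists described above, this sketch simply re-sweeps the pending definitions until no further progress is made, which gives the same final values but is less efficient. The function name resolve and the (left, operator, right) tuple format are assumptions made for the example.

```python
from operator import add, floordiv, sub

OPERATORS = {"+": add, "-": sub, "/": floordiv}


def resolve(known, pending):
    """Evaluate deferred EQU definitions until all are defined or none can progress.

    known   : dict of symbol -> value for symbols that are already defined
    pending : dict of symbol -> (left, operator, right); operands are names or ints
    """
    known = dict(known)
    pending = dict(pending)
    progress = True
    while pending and progress:
        progress = False
        for symbol, (left, op, right) in list(pending.items()):
            operands = [known.get(t) if isinstance(t, str) else t
                        for t in (left, right)]
            if None not in operands:
                known[symbol] = OPERATORS[op](*operands)
                del pending[symbol]
                progress = True
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return known


values = resolve(
    {"BUFFER": 0x1034, "BUFEND": 0x2034},
    {
        "HALFSZ": ("MAXLEN", "/", 2),
        "MAXLEN": ("BUFEND", "-", "BUFFER"),
        "PREVBT": ("BUFFER", "-", 1),
    },
)
```

The final raise corresponds to flagging symbols that remain undefined at the end of the program.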
2.5 Implementation Examples :
In this section we discuss an example of an assembler for a real machine, the MASM assembler (Microsoft Macro Assembler).
2.5.1 MASM assembler :
A MASM assembly language program is written as a collection of segments. Each segment is defined as belonging to a particular class corresponding to its contents. The commonly used classes are CODE, DATA, CONST and STACK.
CODE segments are addressed using register CS, which is set to indicate the segment that contains the starting label specified in the END statement of the program.
STACK segments are addressed using register SS, which is set to indicate the stack segment processed by the loader.
DATA segments are normally addressed using DS, ES, FS or GS.
By default the assembler assumes that all references to data segments use register DS. The directive
ASSUME    ES:DATASEG2
tells the assembler to assume that register ES indicates the segment DATASEG2. Thus, any references to labels that are defined in DATASEG2 will be assembled using register ES.
Registers DS, ES, FS and GS must be loaded by the program before they can be used to address data segments.
For example, the instructions
MOV    AX, DATASEG2
MOV    ES, AX
set ES to indicate the segment DATASEG2.
Example :
JMP    SHORT TARGET
6. Write down the pass number (PASS 1 / PASS 2) of the following activities that occur in a two pass assembler.
a. Object code generation. : Pass II
b. Literals added to literal table. : Pass I
c. Listing printed. : Pass II
d. Address resolution of local symbols. : Pass I
7. What is a program block?
Program block : A program block refers to a segment of code that is rearranged within a single object program unit, whereas a control section refers to a segment that is translated into an independent object program unit.
8. Define an assembler.
An assembler is a translator that translates source instructions (in symbolic language) into target instructions (in machine language), on a one to one basis.
9. What are the advantages of having a dual assembler loader system?
It makes it possible to write programs in separate parts that may also be in different
languages. It keeps the assembler small. This is an important advantage. The size of the
assembler depends on the size of its internal tables (especially the symbol table and the
macro definition table). An assembler designed to assemble large programs is large
because of its large tables. Separate assembly makes it possible to assemble very large
programs with a small assembler. When a change is made in the source code, only the
modified program needs to be reassembled. This property is a benefit if one assumes that
assembly is slow and loading is fast. Many times, however, loading is slower than
assembling, and this property is just a feature, not an advantage, of a dual assembler
loader system. The loader automatically loads routines from a library. This is considered
by some an advantage of a dual assembler loader system but, actually, it is not. It could
easily be done in a single assembler loader program. In such a program, the library
would have to contain the source code of the routines, but this is typically not larger than
the object code.
10. What if a certain symbol is needed in pass 2, to assemble an instruction, and is not
found in the symbol table?
The symbol is simply undefined, so an error situation occurs.
11. What is the advantage of allowing characters other than letters and digits in a
label?
Certain characters, such as -, allow for a natural division of the name into easy to read components. Other characters, such as $ and =, make labels more descriptive. Examples : NO-OF-FONTS is more readable than NoOfFonts. REG=DATA is more descriptive than RegEqData.
12. How does the assembler handle an expression such as A - B + K - L in which all the symbols are relative but K, L are external?
In the above example, the assembler calculates as much of the expression as it can (A - B) and generates two modify loader directives, one to add the value of K and the other to subtract L, both executed at load time.
13. Can we start a program with a USE ABC? In other words, can the first section be
other than the main section?
Yes. The only problem is that the loader needs to be told where to start the execution
of the entire program. This, however, is specified in the END directive and is a standard
feature, used even where no multiple LCs are supported.
14. Why is it a good idea to require a label to start with a letter? What is wrong with 1A as a label?
It makes it easier for the assembler to distinguish a symbol from a constant in the
operand field. In an instruction such as ADD R1, 1A, the assembler has to scan the
entire name of the symbol 1A to verify that it is a symbol. The restriction to a letter
allows the assembler to scan an instruction such as ADD R1, A1 and, immediately after
scanning the first character of the symbol name, decide that the instruction uses a symbol.
This simplifies the lexical analysis phase of the assembler.
15. What if symbol names can start with a character other than a letter? Can this data
structure still be used? If yes, how?
It depends on the characters allowed. The ASCII codes of the characters <, =, >, ?, @ immediately precede the code of A. Similarly, the codes of the characters [, \, ], ^, _ immediately follow Z in the ASCII sequence. If those characters are used, then it is still easy to use buckets. Given the first character of a symbol name, we only need to subtract from its ASCII code the ASCII code of the first of the allowed characters, say <, to get the bucket number. If other characters are allowed, then buckets may not be a good data structure for the symbol table.
16. Some assemblers require each directive to start with a period (.), for easy identification. Why isn't such a convention adopted by every assembler?
It adds more work for the programmer and doesn't speed up the identification by much.
Even if the assembler identifies a source line as a directive by the period, it still needs to
search some table to find the start address of the routine that executes it.
17. What is a common use of ORG?
The most common use for ORG is to specify a start address for the program in a
computer without an operating system. On such a machine, the user may select a start
address and may want to load different programs starting at different addresses. In such a
case, the first source line is an ORG and is the only ORG in the program.
18. A 2-pass assembler can handle future symbols, and an instruction can therefore use a future symbol as an operand. This is not always true for directives. The EQU directive, for example, cannot use a future symbol. The directive A EQU B + 1 is easy to execute if B is previously defined, but impossible if B is a future symbol. What's the reason for this?
The reason is that instructions are assembled in Pass 2, where all the symbols are already in the symbol table; certain directives, however, are executed in Pass 1, where future symbols have not been found yet.
19. Suggest a way for the assembler to eliminate this limitation such that any source
line could use future symbols.
The simplest way is to add another pass. The directive A EQU B + 1 can be handled in three passes. In the first pass, label A cannot be defined, since label B is not yet in the symbol table. However, later in the same pass, B is found and is stored in the symbol table. In the second pass label A can be defined and, in the third pass, the program can be assembled. This, of course, is not a general solution, since it is possible to nest future symbols very deep. Imagine something like :
A EQU B
B EQU C
C EQU D
D
Such a program requires four passes just to collect all the symbol definitions, followed by another pass to assemble instructions. Generally one could design a percolative assembler that would perform as many passes as necessary, until no more future symbols remain. This may be a nice theoretical concept but its practical value is nil. Cases such as A EQU B, where B is a future symbol, are not important and can be considered invalid.
20. What exactly does the assembler discover in pass 2?
It finds the new definition of begin, and discovers that it is different from the one
already in the symbol table.
21. What are three types of assembly language statements?
1. Imperative statements : An imperative statement indicates an action to be performed during the execution of the assembled program.
2. Declaration statements : They declare constants in decimal, binary and hexadecimal forms.
3. Assembler directives : They instruct the assembler to perform certain actions during the assembly of a program.
4. When the definition for a symbol is encountered, the proper address for the symbol is then inserted into any instructions previously generated, according to the forward reference list.
27. What is the use of location counter in assembler?
The LC is a variable, maintained by the assembler, that contains the address into which the current instruction will eventually be loaded. When the assembler starts, it clears the LC, assuming that the first instruction will go into location 0. After each instruction is assembled, the assembler increments the LC by the size of the instruction. Thus the LC always contains the current address. The assembler does not, however, load the instructions into memory. It writes them on the object file, to be eventually loaded into memory by the loader. The LC, therefore, does not point to the current instruction. It just shows where the instruction will eventually be loaded. When the source line has a label (a newly defined symbol), the label is assigned the current value of the LC as its value. The label and its value (plus some other information) are then placed in the symbol table.
28. What is the use of symbol table in assembler?
The symbol table is an internal, dynamic table that is generated, maintained, and used by the assembler. Each entry in the table contains the definition of a symbol and has fields for the name, value, and type of the symbol. Some symbol tables contain other information about the symbols. The symbol table starts empty, labels are entered into it as their definitions are found in the source, and the table is also searched frequently to find the values and types of symbols whose names are known.
29. Intermediate file contains which type of record?
A record in a typical intermediate file contains
1. The record type. It can be an instruction, a directive, a comment, or an invalid
line.
2. The LC value for the line.
3. A pointer to a specific entry in the OpCode table or the directive table. The second
pass uses this pointer to locate the information necessary to assemble or execute
the line.
4. A copy of the source line. Notice that a label, if any, although not used by pass 2, must be included in the intermediate file since it is needed in the final listing.
Line  Label    Mnemonic  Operand     Explanation
5     COPY     START     1000
10    FIRST    STL       RETADR
15    CLOOP    JSUB      RDREC
20             LDA       LENGTH
25             COMP      ZERO
30             JEQ       ENDFIL
35             JSUB      WRREC
40             J         CLOOP
45    ENDFIL   LDA       EOF
50             STA       BUFFER
55             LDA       THREE
60             STA       LENGTH
65             JSUB      WRREC       Write EOF
70             LDL       RETADR      Get return address
75             RSUB                  Return to caller
80    EOF      BYTE      C'EOF'
85    THREE    WORD      3
90    ZERO     WORD      0
95    RETADR   RESW      1
100   LENGTH   RESW      1           Length of record
105   BUFFER   RESB      4096        4096 - byte buffer area
110   .
115   .
120   .
125   RDREC    LDX       ZERO
130            LDA       ZERO
135   RLOOP    TD        INPUT
140            JEQ       RLOOP
145            RD        INPUT
150            COMP      ZERO
155            JEQ       EXIT
160            STCH      BUFFER, X
165            TIX       MAXLEN
170            JLT       RLOOP
175   EXIT     STX       LENGTH
180            RSUB
185   INPUT    BYTE      X'F1'
190   MAXLEN   WORD      4096
195   .
200   .        Subroutine to write record from buffer
205   .
210   WRREC    LDX       ZERO
215   WLOOP    TD        OUTPUT
220            JEQ       WLOOP
225            LDCH      BUFFER, X
230            WD        OUTPUT
235            TIX       LENGTH
240            JLT       WLOOP
245            RSUB
250   OUTPUT   BYTE      X'05'
255            END       FIRST
Figure 2 :
Line  Loc   Label    Mnemonic  Operand     Object Code
5     1000  COPY     START     1000
10    1000  FIRST    STL       RETADR      141033
15    1003  CLOOP    JSUB      RDREC       482039
20    1006           LDA       LENGTH      001036
25    1009           COMP      ZERO        281030
30    100C           JEQ       ENDFIL      301015
35    100F           JSUB      WRREC       482061
40    1012           J         CLOOP       3C1003
45    1015  ENDFIL   LDA       EOF         00102A
50    1018           STA       BUFFER      0C1039
55    101B           LDA       THREE       00102D
60    101E           STA       LENGTH      0C1036
65    1021           JSUB      WRREC       482061
70    1024           LDL       RETADR      081033
75    1027           RSUB                  4C0000
80    102A  EOF      BYTE      C'EOF'      454F46
85    102D  THREE    WORD      3           000003
90    1030  ZERO     WORD      0           000000
95    1033  RETADR   RESW      1
100   1036  LENGTH   RESW      1
105   1039  BUFFER   RESB      4096
110         .
115         .
120         .
125   2039  RDREC    LDX       ZERO        041030
130   203C           LDA       ZERO        001030
135   203F  RLOOP    TD        INPUT       E0205D
140   2042           JEQ       RLOOP       30203F
145   2045           RD        INPUT       D8205D
150   2048           COMP      ZERO        281030
155   204B           JEQ       EXIT        302057
160   204E           STCH      BUFFER, X   549039
165   2051           TIX       MAXLEN      2C205E
170   2054           JLT       RLOOP       38203F
175   2057  EXIT     STX       LENGTH      101036
180   205A           RSUB                  4C0000
185   205D  INPUT    BYTE      X'F1'       F1
190   205E  MAXLEN   WORD      4096        001000
195         .
200         .
205         .
210   2061  WRREC    LDX       ZERO        041030
215   2064  WLOOP    TD        OUTPUT      E02079
220   2067           JEQ       WLOOP       302064
225   206A           LDCH      BUFFER, X   509039
230   206D           WD        OUTPUT      DC2079
235   2070           TIX       LENGTH      2C1036
240   2073           JLT       WLOOP       382064
245   2076           RSUB                  4C0000
250   2079  OUTPUT   BYTE      X'05'       05
255                  END       FIRST
Figure 3 :
Line  Loc   Label    Mnemonic  Operand     Object Code
5     0000  COPY     START     0
10    0000  FIRST    STL       RETADR      17202D
12    0003           LDB       #LENGTH     69202D
13                   BASE      LENGTH
15    0006  CLOOP    +JSUB     RDREC       4B101036
20    000A           LDA       LENGTH      032026
25    000D           COMP      #0          290000
30    0010           JEQ       ENDFIL      332007
35    0013           +JSUB     WRREC       4B10105D
40    0017           J         CLOOP       3F2FEC
45    001A  ENDFIL   LDA       EOF         032010
50    001D           STA       BUFFER      0F2016
55    0020           LDA       #3          010003
60    0023           STA       LENGTH      0F200D
65    0026           +JSUB     WRREC       4B10105D
70    002A           J         @RETADR     3E2003
80    002D  EOF      BYTE      C'EOF'      454F46
95    0030  RETADR   RESW      1
100   0033  LENGTH   RESW      1
105   0036  BUFFER   RESB      4096
110         .
115         .
120         .
125   1036  RDREC    CLEAR     X           B410
130   1038           CLEAR     A           B400
132   103A           CLEAR     S           B440
133   103C           +LDT      #4096       75101000
135   1040  RLOOP    TD        INPUT       E32019
140   1043           JEQ       RLOOP       332FFA
145   1046           RD        INPUT       DB2013
150   1049           COMPR     A, S        A004
155   104B           JEQ       EXIT        332008
160   104E           STCH      BUFFER, X   57C003
165   1051           TIXR      T           B850
170   1053           JLT       RLOOP       3B2FEA
175   1056  EXIT     STX       LENGTH      134000
180   1059           RSUB                  4F0000
185   105C  INPUT    BYTE      X'F1'       F1
195         .
200         .
205         .
210   105D  WRREC    CLEAR     X           B410
212   105F           LDT       LENGTH      774000
215   1062  WLOOP    TD        OUTPUT      E32011
220   1065           JEQ       WLOOP       332FFA
225   1068           LDCH      BUFFER, X   53C003
230   106B           WD        OUTPUT      DF2008
235   106E           TIXR      T           B850
240   1070           JLT       WLOOP       3B2FEF
245   1073           RSUB                  4F0000
250   1076  OUTPUT   BYTE      X'05'       05
255                  END       FIRST