NAME: ASAD AHMED SHEIKH
SEAT # EB19103017
SERIAL # 09
SUBJECT: ASSEMBLY LANGUAGE – 402
ASSIGNMENT # 01
QUESTION # 1
Draw the block diagram of three bus architecture and label each block of it.
QUESTION # 2
What are the advantages of having wider address and data bus? Give examples of each bus.
QUESTION # 3
Is data bus bidirectional? If so justify your answer (hint: fetch, decode and extension)
Yes. The microprocessor has to fetch (read) instructions and data from the memory or an input
device for processing, and after processing it has to store (write) the results in the memory or
an output device. Data therefore flows in both directions, from the microprocessor to memory or
input/output devices and from memory or input/output devices to the microprocessor, so the data
bus is bidirectional.
QUESTION # 4
What is the function of EU and BIU in a microprocessor? Also list the registers of 8086
The 8086 microprocessor is divided into two functional units: the EU (Execution Unit) and the BIU
(Bus Interface Unit).
EU (Execution Unit)
The execution unit tells the BIU where to fetch instructions and data from, and then decodes
and executes those instructions. Its function is to control operations on data using the
instruction decoder and the ALU. The EU has no direct connection with the system buses; it
accesses data only through the BIU.
The functional parts of the EU are described below.
ALU
It handles all arithmetic and logical operations, such as +, −, ×, /, OR, AND and NOT.
Flag Register
It is a 16-bit register whose bits behave like flip-flops, i.e. each flag changes its state
according to the result of the most recent operation. It has 9 flags, divided into 2 groups:
Conditional Flags and Control Flags.
Conditional Flags
They represent the result of the last arithmetic or logical instruction executed. Following is the
list of conditional flags −
Carry flag − This flag is set when an arithmetic operation produces a carry out of, or a
borrow into, the most significant bit, i.e. on unsigned overflow.
Auxiliary flag − This flag is set when an ALU operation produces a carry/borrow from the
lower nibble (i.e. D0 – D3) to the upper nibble (i.e. D4 – D7), i.e. the carry given by
bit D3 to bit D4. The processor uses this flag for BCD (decimal adjust) arithmetic.
Parity flag − This flag indicates the parity of the result, i.e. when the lower-order
8 bits of the result contain an even number of 1’s, the Parity Flag is set. For an
odd number of 1’s, the Parity Flag is reset.
Zero flag − This flag is set to 1 when the result of an arithmetic or logical operation is zero;
otherwise it is set to 0.
Sign flag − This flag holds the sign of the result, i.e. when the result of the operation is
negative, the sign flag is set to 1; otherwise it is set to 0.
Overflow flag − This flag is set when the signed result of an operation is too large or too
small to fit in the destination, i.e. on signed overflow.
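As a rough illustration of how the six conditional flags follow from a result, the sketch below models an 8-bit addition in Python. The function name add8_flags and the dictionary layout are assumptions made for this example; they are not part of the 8086 itself.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive the 8086 conditional flags for the result."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                               # carry out of bit 7
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),             # carry from bit 3 into bit 4
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of 1s in the low byte
        "ZF": int(result == 0),                                # result is zero
        "SF": result >> 7,                                     # copy of the most significant bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags
```

For example, add8_flags(0x7F, 0x01) gives the result 0x80 with SF, OF and AF set and CF clear, because the sum fits in 8 unsigned bits but not in 8 signed bits.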
Control Flags
Control flags control the operations of the execution unit. Following is the list of control flags
−
Trap flag − It is used for single-step control and allows the user to execute one
instruction at a time for debugging. If it is set, the program runs in single-step
mode.
Interrupt flag − It is an interrupt enable/disable flag, i.e. it is used to allow/prohibit the
interruption of a program. It is set to 1 for the interrupt-enabled condition and set to 0 for
the interrupt-disabled condition.
Direction flag − It is used in string operations. When it is set, string bytes are accessed
from the higher memory address to the lower memory address; when it is cleared, they are
accessed from the lower address to the higher address.
General purpose register
There are 8 general purpose registers, i.e., AH, AL, BH, BL, CH, CL, DH, and DL. These registers
can be used individually to store 8-bit data and can be used in pairs to store 16-bit data. The
valid register pairs are AH and AL, BH and BL, CH and CL, and DH and DL. The pairs are referred
to as AX, BX, CX, and DX respectively.
AX register − It is also known as accumulator register. It is used to store operands for
arithmetic operations.
BX register − It is used as a base register. It is used to store the starting base address of
the memory area within the data segment.
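The way two 8-bit halves form one 16-bit register (for example AH and AL forming AX) can be sketched in Python as below. The helper names combine and split are hypothetical and exist only for this illustration.

```python
def combine(high, low):
    """Form a 16-bit register value (e.g. AX) from its 8-bit halves (AH, AL)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split(value):
    """Split a 16-bit register value into its (high, low) 8-bit halves."""
    return (value >> 8) & 0xFF, value & 0xFF
```

With AH = 12h and AL = 34h, combine(0x12, 0x34) yields 1234h, and split(0x1234) returns the two halves again.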
BIU (Bus Interface Unit)
The BIU takes care of all data and address transfers on the buses for the EU, such as sending
addresses, fetching instructions from the memory, reading data from the ports and the
memory, and writing data to the ports and the memory. The EU has no direct connection
with the system buses, so all of this is done through the BIU. The EU and BIU are connected by
the internal bus.
It has the following functional parts −
Instruction queue − The BIU contains the instruction queue. The BIU fetches up to 6 bytes of the
next instructions and stores them in the instruction queue. When the EU has executed an
instruction and is ready for the next one, it simply reads it from this queue, which
increases execution speed.
Fetching the next instruction while the current instruction executes is called pipelining.
Segment registers − The BIU has 4 segment registers, i.e. CS, DS, SS & ES. They hold the
starting addresses of the segments of instructions and data in memory, which the processor
uses to access memory locations. The BIU also contains 1 pointer register, IP, which holds
the address of the next instruction to be executed by the EU.
o CS − It stands for Code Segment. It is used for addressing a memory location in
the code segment of the memory, where the executable program is stored.
o DS − It stands for Data Segment. It consists of data used by the program and is
accessed in the data segment by an offset address or the content of another
register that holds the offset address.
o SS − It stands for Stack Segment. It addresses the stack, the memory area used to store data
and return addresses during execution.
o ES − It stands for Extra Segment. ES is an additional data segment, which is used by
string instructions to hold the destination data.
Instruction pointer − It is a 16-bit register used to hold the address of the next
instruction to be executed.
QUESTION # 5
Differentiate between real and protected mode of a microprocessor
The main difference is the operating mode of the CPU. In protected mode the OS can use features
like paging and virtual memory. Real mode code is 16-bit by default, whereas protected mode
code can be 16-bit or 32-bit. Every x86 CPU starts in real mode, and the OS must switch it to
protected mode.
Real Mode:
1. Following a system reset the 386 is initialized in Real Mode. In this mode the chip looks
virtually identical to an 8086. That is, it has the following features:
o The address space is limited to 1 MB using address lines A0-A19 (the high
address lines A20-A31 are inactive).
o The segment memory addressing mechanism of the 8086 is retained with each
segment limited to 64 KB.
2. Two new features are available to the programmer in Real Mode: access to the 32-bit
register set of the 386 and the addition of two new segment registers, FS and GS.
Protected Mode:
1. The primary difference between Real Mode and Protected Mode is the latter's new
addressing mechanism and protection levels.
2. Although memory segments are still retained, each segment may range from a single
byte to 4 GB (the full physical address space of the 386).
3. The values stored in the segment registers are now interpreted as selectors, i.e. pointers
into a descriptor table.
4. Each segment's entry in this table is eight bytes long and identifies the 32-bit base
address of the segment, the segment size, and the access rights. Memory addresses are
computed by adding the offset specified by the instruction to the segment base address.
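The protected-mode address calculation described in points 3 and 4 can be sketched as follows. This is a simplified model, not the processor's exact behaviour: the Descriptor tuple, the table contents and the function name linear_address are all assumptions for illustration, and a real descriptor packs its fields into 8 bytes. The only architectural detail relied on is that the upper 13 bits of a selector give the table index.

```python
from collections import namedtuple

# Hypothetical, simplified descriptor; real entries pack these fields into 8 bytes.
Descriptor = namedtuple("Descriptor", "base limit access")


def linear_address(table, selector, offset):
    """Translate selector:offset into a linear address via a descriptor table."""
    index = selector >> 3             # bits 15..3 of a selector pick the table entry
    desc = table[index]
    if offset > desc.limit:           # the limit check is what enforces protection
        raise ValueError("offset exceeds segment limit")
    return desc.base + offset         # segment base address + instruction offset


# Example table with made-up values.
gdt = [
    Descriptor(0, 0, "null"),                 # entry 0 is conventionally unused
    Descriptor(0x00100000, 0xFFFF, "code"),
    Descriptor(0x00200000, 0x0FFF, "data"),
]
```

With this table, linear_address(gdt, 0x08, 0x1234) returns 0x101234, while an offset of 0x1000 into the data segment is rejected because it lies beyond the segment limit.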
QUESTION # 6
How is multitasking achieved using the protected mode of a microprocessor?
QUESTION # 7
What is the concept of VM (Virtual Memory), Cache memory and FPU (Floating point unit) of a
microprocessor?
Virtual Memory
Cache Memory
Cache memory is a chip-based computer component that makes retrieving data from the
computer's memory more efficient. It acts as a temporary storage area that the computer's
processor can retrieve data from easily. This temporary storage area, known as a cache, is more
readily available to the processor than the computer's main memory source, typically some
form of DRAM.
Cache memory is either integrated into the CPU chip or placed on a separate chip that has its
own bus interconnect with the CPU. Therefore, it is more accessible to the processor and
able to increase efficiency, because it is physically close to the processor.
In order to be close to the processor, cache memory needs to be much smaller than main
memory. Consequently, it has less storage space. It is also more expensive than main memory,
as it is a more complex chip that yields higher performance.
What it sacrifices in size and price, it makes up for in speed. Cache memory operates between
10 to 100 times faster than RAM, requiring only a few nanoseconds to respond to a CPU
request.
The name of the actual hardware that is used for cache memory is high-speed static random
access memory (SRAM). The name of the hardware that is used in a computer's main memory
is dynamic random access memory (DRAM).
Cache memory is not to be confused with the broader term cache. Caches are temporary stores
of data that can exist in both hardware and software. Cache memory refers to the specific
hardware component that allows computers to create caches at various levels of the network.
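To make the idea of hits and misses concrete, here is a toy model of a cache in Python. The direct-mapped organisation, the sizes and the class name DirectMappedCache are assumptions chosen for simplicity; the text above does not say how any particular cache is organised.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks which block each line holds."""

    def __init__(self, num_lines=4, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines    # block number currently stored in each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size
        line = block % self.num_lines
        if self.tags[line] == block:
            self.hits += 1
            return True
        self.tags[line] = block           # block is fetched from slower main memory
        self.misses += 1
        return False
```

Accessing addresses 0, 4 and 8 in turn gives one miss followed by two hits, because all three addresses fall in the same 16-byte block; that reuse of nearby data is what makes a small, fast cache worthwhile.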
Floating Point Unit (FPU)
A floating point unit is an integrated circuit that handles all mathematical operations
involving floating point numbers or fractions. It is a dedicated logic unit designed
specifically to work on floating point numbers, hence the name. It can be defined as a
specialized coprocessor that manipulates such numbers more quickly than the basic
microprocessor circuitry can.
The FPU performs simple mathematical tasks which include addition, subtraction, division,
multiplication and square root. Older FPUs process transcendental functions like exponential
and trigonometric calculations but these can be expensive and complicated to implement, so in
modern FPUs, these are done via software library routines.
Not all computer systems have a hardware FPU. Those that do not can emulate its functions in
software, for example with a floating point library.
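Any software treatment of floating point numbers starts by separating a number into its fields. The sketch below does this in Python, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the function name decode_float32 is hypothetical.

```python
import struct


def decode_float32(x):
    """Return the (sign, exponent, fraction) fields of x as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an integer
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # stored with a bias of 127
    fraction = bits & 0x7FFFFF           # 23 stored fraction bits
    return sign, exponent, fraction
```

For 1.0 this returns (0, 127, 0): a positive sign, a true exponent of 0 after removing the bias, and an all-zero fraction.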
QUESTION # 8
Define super scalar architecture of a Pentium processor.
'Pentium processor has a superscalar architecture'. Explain the meaning of the statement.
The Pentium microprocessor is organized around three execution units. One executes
floating-point instructions, and the other two, the U-pipe and the V-pipe, execute integer
instructions. This means that it is possible to execute three instructions concurrently.
A more aggressive approach is to equip the processor with multiple processing units to
handle several instructions in parallel in each processing stage. With this arrangement,
several instructions start execution in the same clock cycle and the process is said to use
Multiple issue. Such processors are capable of achieving an instruction execution throughput
of more than one instruction per cycle. They are known as ‘Superscalar Processors’.
Consider a processor with two execution units, one for integer and one for floating point
operations. The instruction fetch unit is capable of reading two instructions at a time and
storing them in the instruction queue. In each cycle, the dispatch unit retrieves and decodes
up to two instructions from the front of the queue. If there is one integer instruction, one
floating point instruction and no hazards, both instructions are dispatched in the same
clock cycle.
Superscalar Architectures:
• If one pipeline is good, then two pipelines are better.
Example: Consider the following architecture (as in Pentium):
– A single instruction fetch unit fetches pairs of instructions together and puts each one into
its own pipeline, complete with its own ALU for parallel operation.
– The main pipeline (U-Pipeline) could execute an arbitrary Pentium instruction.
– The V-Pipeline could execute only simple integer instructions (and also one simple
floating-point instruction).
– If the instructions in a pair were not simple enough or were incompatible, only the first one
was executed (in the U-pipeline). The second one was then held and paired with the instruction
following it.
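The pairing rule for the U and V pipelines can be sketched as a small cycle counter in Python. This is a deliberately simplified model: it treats "simple" as a yes/no property of each instruction and ignores data dependencies and other real pairing restrictions. The function name issue_cycles and the example program are assumptions.

```python
def issue_cycles(instructions):
    """Count issue cycles for a list of (name, is_simple) instructions.

    Simplified rule: two adjacent instructions issue together (U-pipe and
    V-pipe) only when both are simple; otherwise the first issues alone and
    the second waits to be paired with the instruction following it.
    """
    cycles = 0
    i = 0
    while i < len(instructions):
        has_next = i + 1 < len(instructions)
        if has_next and instructions[i][1] and instructions[i + 1][1]:
            i += 2      # both pipelines are used in this cycle
        else:
            i += 1      # only the U-pipe issues in this cycle
        cycles += 1
    return cycles


# Hypothetical instruction stream: True marks a simple instruction.
program = [("mov", True), ("add", True), ("mul", False), ("inc", True), ("dec", True)]
```

Under this model the five-instruction program issues in 3 cycles instead of 5: mov and add pair, mul goes alone, and inc and dec pair.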
Example: A superscalar processor (Pentium II) with 5 functional units:
• It is possible to have multiple ALUs in stage (S4).
• Most of the functional units in stage (S4) take longer than one clock cycle to execute.
• Stage (S3) can issue instructions faster than the S4 stage.
QUESTION # 9
How can you distinguish core i3, i5 from 80486 series of processors?
The 80486 differs from its predecessor, the 80386, in having a large on-chip cache and an
integrated floating-point unit. It represents the fourth generation of binary compatible CPUs
since the original 8086 of 1978.
A 50 MHz 80486 executes around 40 million instructions per second on average and is able to
reach 50 MIPS peak performance, approximately twice as fast as the 80386 or 80286 per clock
cycle, thanks to its five-stage pipeline with all stages bound to a single cycle. The on-chip
enhanced FPU was also significantly faster than the 80387 per cycle.
Intel Core i3 processors are where the Core lineup starts for each generation. In general, Core i3
processors have lower core counts than higher-grade CPUs. This used to mean that Core i3’s
started with dual-core processors, but for recent generations, that core count has gone up to
four on the desktop.
Those earlier dual-core Core i3’s also tended to have four threads, also known as Hyper-
Threading. Intel has elected not to double the thread count in recent Core i3 generations;
instead, it’s building CPUs with four cores and four threads.
Core i3 processors also have lower cache sizes (onboard memory). They handle less RAM than
other Core processors and have varying clock speeds. At this writing, the ninth-generation, Core
i3 desktop processors have a top clock speed of 4.6 GHz; however, that’s only the higher-
end Core i3-9350K.
A step up from Core i3 is the Core i5. This is often where bargain-hunting PC gamers look for
solid deals on processors. An i5 typically lacks Hyper-Threading, but it has more cores
(currently, six, rather than four) than Core i3. The i5 parts also generally have higher clock
speeds, a larger cache, and can handle more memory. The integrated graphics are also a bit
better.
You see new Core i5 processors with Hyper-Threading on laptops, but not desktops.
Specifications
Comparison of the Intel 80486 OverDrive 50 MHz and the Intel Core i3-530:
Feature             80486 OverDrive 50 MHz    Core i3-530
Basic details
Manufacturer        Intel                     Intel
CPU features
Clock Multiplier    2
TDP (Watt)                                    73
Cores               1                         2
Threads             1                         4
Multiprocessing     1                         1
MMX                 -                         +
SSE                 -                         +
SSE2                -                         +
SSE3                -                         +
SSE4.1              -                         +
SSE4.2              -                         +
QUESTION # 10
BASIS FOR COMPARISON    REGISTER                            MEMORY
Capacity                Holds a small amount of data,       Can range from some GB to TB.
                        around 32 bits to 64 bits.
Speed                   Registers are faster than memory.   RAM is much slower than registers.
QUESTION # 11
1.0 ns
Clock Time
Clock time (CT) is the period of the clock that synchronizes the circuits in a processor. It is the
reciprocal of the clock frequency.
For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor has a cycle
time of 0.25 ns.
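The reciprocal relationship can be checked with a one-line Python function. Because 1 GHz is 10^9 Hz and 1 ns is 10^-9 s, the period in nanoseconds is simply 1 divided by the frequency in GHz. The function name cycle_time_ns is an assumption for this example.

```python
def cycle_time_ns(frequency_ghz):
    """Clock period in nanoseconds for a clock frequency given in GHz."""
    return 1.0 / frequency_ghz


print(cycle_time_ns(1.0))   # 1.0
print(cycle_time_ns(4.0))   # 0.25
```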
QUESTION # 12
Address Contents
0 01101010
1 11011101
2 00010001
3 11111111
4 01010101
Answer:
a) 00010001 11111111
b) 11111111 01010101
c) 00010001 11011101
QUESTION # 13
HEX TO BINARY
(FAE2CH)16 = (11111010111000101100)2
HEX TO DECIMAL
(FAE2CH)16 = (1027628)10
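Both conversions can be verified with Python's built-in number parsing and formatting; the variable names are arbitrary.

```python
value = int("FAE2C", 16)       # parse the hexadecimal digits
binary = format(value, "b")    # binary digits without the "0b" prefix

print(binary)   # 11111010111000101100
print(value)    # 1027628
```

Each hexadecimal digit maps to exactly four binary digits (F = 1111, A = 1010, E = 1110, 2 = 0010, C = 1100), which is why the binary form has 20 bits.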
QUESTION # 14
FEFFEh + FBCADh
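Assuming the task is to add the two hexadecimal numbers, the sum can be worked out digit by digit from the right (E + D = 1Bh, write B and carry 1, and so on) and checked in Python as below. The variable names are arbitrary.

```python
a = 0xFEFFE
b = 0xFBCAD
total = a + b

print(format(total, "X") + "h")   # 1FACABh
```

The result, 1FACABh, needs 21 bits, so it no longer fits in the 20 bits occupied by each operand.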
QUESTION # 15
A 16-bit register can store 2^16 different values. The signed range of integer values that can be
stored in 16 bits is −32,768 (−1 × 2^15) through 32,767 (2^15 − 1); the unsigned range is 0 through
65,535 (2^16 − 1). Since 2^16 is 65,536, a processor with 16-bit memory addresses can directly
access 64 KB (65,536 bytes) of byte-addressable memory.
65535
16 bit unsigned numbers
There are 65,536 different unsigned 16-bit numbers. The smallest unsigned 16-bit number is 0
and the largest is 65535.
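These ranges follow directly from the number of bits and can be computed for any width. The sketch below assumes two's-complement representation for signed values; the function names are hypothetical.

```python
def unsigned_range(bits):
    """Smallest and largest unsigned integers that fit in the given number of bits."""
    return 0, 2**bits - 1


def signed_range(bits):
    """Smallest and largest two's-complement integers for the given number of bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(unsigned_range(16))   # (0, 65535)
print(signed_range(16))     # (-32768, 32767)
```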
QUESTION # 16
To form a multiplexed data bus and address bus, four bits of the 8086 address bus are grounded.
QUESTION # 17
Answer: 7500h
3. Determine the physical address of a memory location given by 0A51:CD90h.
Answer: physical address = 0A51h × 10h + CD90h = 0A510h + CD90h = 172A0h
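The segment:offset calculation can be checked with a short Python function: the segment value is shifted left by 4 bits (multiplied by 16) and the offset is added, with the result limited to the 20 address lines of the 8086. The function name physical_address is an assumption.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(format(physical_address(0x0A51, 0xCD90), "05X") + "h")   # 172A0h
```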
QUESTION # 18
QUESTION # 19
You are using gcc to link and that will by default add the C libraries which expect entry
point main and already contain a _start that invokes main. That's why you have the multiple
definition.
If you do not need the C library (and this code doesn't), but would still like to use gcc for
linking, try gcc -nostdlib -m32 -o hello hello.o.
The wrong format error is due to trying to produce a 64 bit executable from a 32 bit object file.
Adding -m32 fixes that, so you get a 32 bit executable (since your code is 32 bit). If you
intend to create a 64 bit program, use -f elf64 for nasm and, of course, write 64 bit compatible
code.
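CODE EXAMPLE
section .data
    msg db 'Hello, world!', 0xa   ; message text is assumed; the original listing omits it
    len equ $ - msg               ; length of the message in bytes
section .text
    global _start                 ; entry point used when linking without the C library
_start:
    mov eax,4                     ; system call number 4 = sys_write
    mov ebx,1                     ; file descriptor 1 = stdout
    mov ecx,msg                   ; address of the message
    mov edx,len                   ; number of bytes to write
    int 0x80                      ; call the kernel
    mov eax,1                     ; system call number 1 = sys_exit
    xor ebx,ebx                   ; exit status 0
    int 0x80                      ; call the kernel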
QUESTION # 21
Addressing modes are the ways in which an architecture specifies the address of an object it
wants to access. In GPR machines, an addressing mode can specify a constant, a register or a
location in memory.
The most common names for addressing modes (names may differ among architectures):
Addressing mode      Example instruction    Meaning                          When used
Register             Add R4,R3              R4 <- R4 + R3                    When a value is in a register
Immediate            Add R4,#3              R4 <- R4 + 3                     For constants
Displacement         Add R4,100(R1)         R4 <- R4 + M[100+R1]             Accessing local variables
Register deferred    Add R4,(R1)            R4 <- R4 + M[R1]                 Accessing using a pointer or a computed address
Indexed              Add R3,(R1+R2)         R3 <- R3 + M[R1+R2]              Useful in array addressing:
                                                                             R1 - base of array
                                                                             R2 - index amount
Direct               Add R1,(1001)          R1 <- R1 + M[1001]               Useful in accessing static data
Memory deferred      Add R1,@(R3)           R1 <- R1 + M[M[R3]]              If R3 is the address of a pointer p, then
                                                                             mode yields *p
Auto-increment       Add R1,(R2)+           R1 <- R1 + M[R2]                 Useful for stepping through arrays in a loop.
                                            R2 <- R2 + d                     R2 - start of array
                                                                             d - size of an element
Auto-decrement       Add R1,-(R2)           R2 <- R2 - d                     Same as auto-increment. Both can also be used
                                            R1 <- R1 + M[R2]                 to implement a stack as push and pop
Scaled               Add R1,100(R2)[R3]     R1 <- R1 + M[100+R2+R3*d]        Used to index arrays. May be applied to any
                                                                             base addressing mode in some machines.
Notation:
<- - assignment
M - the name for memory:
M[R1] refers to contents of memory location whose address is given by the contents of R1
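To see how the notation turns into actual values, the sketch below evaluates several of the modes in Python. The register and memory contents are made up for illustration, and every mode is applied to the same destination R4 so the results are easy to compare; M is written as a function to mirror the M[...] notation.

```python
# Hypothetical register and memory contents, chosen only for illustration.
regs = {"R1": 100, "R2": 200, "R3": 2, "R4": 10}
mem = {7: 42, 100: 7, 200: 5, 300: 11}


def M(address):
    """Contents of the memory location at the given address."""
    return mem[address]


# Each entry is the value R4 would receive from "Add R4,<operand>".
results = {
    "register":          regs["R4"] + regs["R3"],                  # Add R4,R3
    "immediate":         regs["R4"] + 3,                           # Add R4,#3
    "displacement":      regs["R4"] + M(100 + regs["R2"]),         # Add R4,100(R2)
    "register deferred": regs["R4"] + M(regs["R1"]),               # Add R4,(R1)
    "indexed":           regs["R4"] + M(regs["R1"] + regs["R2"]),  # Add R4,(R1+R2)
    "direct":            regs["R4"] + M(200),                      # Add R4,(200)
    "memory deferred":   regs["R4"] + M(M(regs["R1"])),            # Add R4,@(R1)
}
```

Memory deferred is the only mode here that reads memory twice: M[R1] yields 7, and M[7] yields the operand 42.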
Another important instruction set measurement is the range of values for immediates. Small
immediate values are used most heavily. However, large immediates are sometimes used, most
likely in address calculations.
QUESTION # 22