Computer Organization and Architecture

Module 2
Computer Organization and Architecture, Carl Hamacher, Zvonko Vranesic and Safwat Zaky. Fifth Edition, McGraw-Hill, 2002.
Contents
• Basic processing unit
• Fundamental concepts
• Instruction cycle
• Execution of a complete instruction
• Multiple-bus organization
• Sequencing of control signals
Arithmetic algorithms:
– Algorithms for multiplication and division of binary and BCD numbers
– Array multiplier
– Booth’s multiplication algorithm
– Restoring and non-restoring division
– Algorithms for floating point, multiplication and division
Basic Processing Unit
Overview
• Processor is also called Instruction Set Processor (ISP)
• Central Processing Unit (CPU)
• A typical computing task consists of a series of steps specified
by a sequence of machine instructions that constitute a
program.
• An instruction is executed by carrying out a sequence of more
elementary operations.
SOME FUNDAMENTAL CONCEPTS
Fundamental Concepts
• Processor fetches one instruction at a time and performs
the operation specified.
• Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
• Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
• Instruction Register (IR) holds instruction to be executed
Contd...
• The operation specified by an instruction can be carried out by
performing one or more of the following actions:
1) Read the contents of a given memory-location and load them
into a register.
2) Read data from one or more registers.

3) Perform an arithmetic or logic operation and place the result
into a register.
4) Store data from a register into a given memory-location.
Executing an Instruction
• Fetch the contents of the memory location pointed to by
the PC. The contents of this location are loaded into the
IR (fetch phase).
IR ← [[PC]]
• Assuming that the memory is byte addressable and each instruction
occupies 4 bytes, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
• Carry out the actions specified by the instruction in the IR
(execution phase).
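The fetch, increment and execute steps above can be sketched as a small loop. This is only an illustration: the accumulator register and the opcodes LOADI, ADDI, JUMP and HALT are hypothetical and are not part of the processor described in these slides.

```python
def run(memory, pc=0, max_steps=100):
    """Toy fetch/execute loop over a byte-addressed memory of 4-byte instructions."""
    acc = 0                      # single accumulator register (an assumption)
    for _ in range(max_steps):
        ir = memory[pc]          # fetch phase: IR <- [[PC]]
        pc += 4                  # fetch phase: PC <- [PC] + 4
        opcode, operand = ir     # execution phase: decode and act
        if opcode == "LOADI":
            acc = operand
        elif opcode == "ADDI":
            acc += operand
        elif opcode == "JUMP":
            pc = operand         # a jump overwrites the incremented PC
        elif opcode == "HALT":
            return acc, pc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("step limit reached")


program = {
    0: ("LOADI", 5),
    4: ("ADDI", 7),
    8: ("JUMP", 16),
    12: ("ADDI", 100),   # skipped by the jump
    16: ("HALT", 0),
}
print(run(program))      # (12, 20)
```

Instructions are fetched from successive addresses (0, 4, 8) until the jump replaces the PC.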
Model of Control Unit
Functions of Control Unit using Control Signals
• Sequencing
– CU causes the CPU to step through a series of micro-operations in proper sequence
based on the program being executed.
– E.g. In order to carry out a task such as ADD, the control unit must generate a set of
control signals in a predefined sequence governed by the HW structure of the
processing section.
• Execution
– CU causes each micro-operation to be performed
• Control Signals
– External: inputs indicating the state of the system
– Internal: logic required to perform the sequencing and execution functions

Single bus organization
• ALU, control unit and all the registers are connected via a single common bus
(Called Internal Bus)
• Bus is internal to processor and should not be confused with external bus
that connects processor to memory and I/O devices.
• Data lines of the external memory bus are connected to internal processor
bus via MDR.
– Register MDR has two inputs and two outputs.
– Data may be loaded to (from) MDR from (to) internal processor bus or external
memory bus.
• Address lines of the external memory bus are connected to internal
processor bus via MAR.
– MAR receives input from the internal processor bus.
– MAR provides output to external memory bus.
Processor Organization
Figure: Single-bus organization of the datapath inside a processor.
Contd…
• Instruction decoder and control logic block, or control unit issues signals
to control operation of all units inside processor and for interacting with
memory bus.
– Control signals depend on instruction loaded in the Instruction Register (IR)
• Outputs from the control logic block are connected to:

– Control lines of the memory bus.
– ALU, to determine which operation is to be performed.
– Select input of multiplexer MUX to select between Register Y and constant 4.
– Control lines of the registers, to select the registers.

Contd…
• Registers Y, Z, and TEMP:
– Used by processor for temporary storage during execution of some
instructions.
– Note that Registers R0 to R(n-1) are used to store data generated by
one instruction for later use by another instruction.
– The programmer cannot access these 3 registers.
• Multiplexer MUX selects either the output of register Y or a constant 4,
depending upon the control input Select.
– Constant 4 is used to increment the value of the PC.
• B input of ALU is obtained directly from processor-bus.
– As instruction execution progresses, data are transferred from one
register to another, often passing through ALU
Executing an Instruction
• Transfer a word of data from one processor register to another or
to the ALU.
• Perform an arithmetic or a logic operation and store the result in
a processor register.
• Fetch the contents of a given memory location and load them
into a processor register.
• Store a word of data from a processor register into a given
memory location.
Contd...
• Disadvantage: Only one data-word can be

transferred over the bus in a clock cycle.
• Solution:
– Provide multiple internal-paths.
– Multiple paths allow several data-transfers to take
place in parallel
Types of Operations
1.Register Transfer
2.Fetch from Memory
3.Store to Memory
4.Arithmetic/Logic Ops.
5.Execution of Complete Instruction
6.Branching Ops.
REGISTER TRANSFERS
• Instruction execution involves a sequence of steps in which data
are transferred from one register to another.
• For each register Ri, two control-signals are used: Riin & Riout. These
are called Gating Signals.
• Riin = 1 → data on the bus is loaded into Ri.
• Riout = 1 → the content of Ri is placed on the bus.
• Riout = 0 → the bus can be used for transferring data from other
registers.
Contd…
EX:
Transfer the contents of R1 to R4(MOV R1,R4)
1. Enable output of register R1 by setting R1out=1. This
places the contents of R1 on the processor bus.
2. Enable input of register R4 by setting R4in=1. This
loads the data from the processor bus into register R4.
Contd...
• All operations and data transfers within the processor take
place within time-periods defined by the processor-clock.
• Control-signals that govern a particular transfer are
asserted at the start of the clock cycle.
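Register Transfers
Figure: Input and output gating for the registers (internal processor bus; register Ri with gating signals Riin and Riout; register Y with Yin; MUX choosing between Y and the constant 4 under control of Select; ALU inputs A and B; register Z with Zin and Zout).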
Simple register transfer example
CONTROL-SIGNALS OF MDR
• The MDR register has 4 control-signals :
– MDRin & MDRout control the connection to the internal processor
data bus
– MDRinE & MDRoutE control the connection to the memory Data bus.
• MAR register has 2 control-signals.

– MARin controls the connection to the internal processor address bus
– MARout controls the connection to the memory address bus
Fetch from Memory
Fetching a Word from Memory
• To fetch instruction/data from memory, processor transfers required address to MAR.
• At the same time, processor issues Read signal on control-lines of memory-bus.
• When requested-data are received from memory, they are stored in MDR. From MDR,
they are transferred to other registers.
• The response time of each memory access varies (for example, because of a cache miss
or an access to a memory-mapped I/O device).
• To accommodate this, the MFC (Memory Function Completed) signal is used.
• MFC is a signal sent from addressed-device to the processor.
• MFC informs the processor that the requested operation has been completed by
addressed-device.
Contd...
• Consider the instruction Move (R1),R2. The sequence of steps is:
– R1out, MARin, Read ;desired address is loaded into MAR & Read
command is issued.
– MDRinE, WMFC ;load MDR from memory-bus & wait for MFC
response from memory.
– MDRout, R2in ;load R2 from MDR.
• WMFC is the control-signal that causes the processor's control
circuitry to wait for the arrival of the MFC signal.
Storing a Word in Memory
• Consider the instruction Move R2,(R1). This

requires the following sequence:
– R1out, MARin ;desired address is loaded into MAR.
– R2out, MDRin, Write ;data to be written are loaded into
MDR & Write command is issued.
– MDRoutE, WMFC ;load data into memory-location
pointed by R1 from MDR.
Store into Memory
Performing an Arithmetic or Logic Operation
• ALU performs arithmetic operations on 2 operands

– One of the operands is output of MUX;
– And, the other operand is obtained directly from processor-bus.
• Result (produced by ALU) is stored temporarily in register Z.

• The sequence of operations for [R3] ← [R1] + [R2] is as
follows:
– R1out, Yin
– R2out, SelectY, Add, Zin
– Zout, R3in
Contd...
• Instruction execution proceeds as follows:
– Step 1 --> The contents of register R1 are loaded into register Y.
– Step 2 --> The contents of Y and of register R2 are applied to the A
and B inputs of the ALU; the addition is performed & the result is stored
in register Z.
– Step 3 --> The contents of register Z are transferred to register R3.
• The signals named in a step are activated for the duration of the clock
cycle corresponding to that step. All other signals are inactive.
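The three control steps can be traced with a short sketch in which a single variable stands for the shared bus, so only one value is transferred per step. The 32-bit word size and the dictionary-based register model are assumptions made for illustration.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit registers


def add_r1_r2_to_r3(regs):
    """Trace the three control steps of R3 <- [R1] + [R2] on a single bus."""
    regs = dict(regs)
    # Step 1: R1out, Yin
    bus = regs["R1"]
    regs["Y"] = bus
    # Step 2: R2out, SelectY, Add, Zin (MUX feeds Y to input A, bus feeds input B)
    bus = regs["R2"]
    regs["Z"] = (regs["Y"] + bus) & WORD_MASK
    # Step 3: Zout, R3in
    bus = regs["Z"]
    regs["R3"] = bus
    return regs


result = add_r1_r2_to_r3({"R1": 10, "R2": 32, "R3": 0})
print(result["R3"])  # 42
```

Register Y is needed because the bus can carry only one operand at a time; the other operand must already be latched at the ALU input.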
Instruction Cycle
• The instruction execution may involve several operations and depends on
the nature of the instruction.
• Processing required for a single instruction is called an instruction cycle.
• At the beginning of each instruction cycle, processor fetches an instruction
from memory.
• Program counter (PC) holds address of the instruction to be fetched next.
• The processor increments the PC after each instruction fetch so that it will
fetch the next instruction in sequence
• The fetched instruction is loaded into instruction register (IR).
• Instruction contains bits that specify the action the processor is to take.
• The processor interprets the instruction and performs the required action.
Example: ADD B,A
• The instruction ADD B,A stores the sum of the contents of memory locations B and A
into memory location A.
• A single instruction cycle with the following steps occurs:
– Fetch the ADD instruction.
– Read the contents of memory location A into the processor.
– Read the contents of memory location B into the processor. In order that the contents
of A are not lost, the processor must have at least two registers for storing memory
values.
– Add the two values.
– Write the result from the processor to memory location A.

Instruction Cycle - with Interrupts
• Virtually all computers provide a mechanism by

which other modules (I/O, memory) may interrupt
the normal processing of the processor.
• Interrupts are provided primarily as a way to improve
processing efficiency.
• For example, most external devices are much slower
than the processor.
Execution of a Complete Instruction
• Consider the instruction Add (R3),R1 which adds the

contents of a memory-location pointed by R3 to register R1.
Executing this instruction requires the following actions:
– Fetch the instruction
– Fetch the first operand (the contents of the memory location

pointed to by R3)
– Perform the addition
– Load the result into R1

Execution of a Complete Instruction
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
Figure 7.1. Single-bus organization of the datapath inside a processor.

Contd...
Step1--> The instruction-fetch operation is initiated by
→ loading contents of PC into MAR &
→ sending a Read request to memory.
• The Select signal is set to Select4, which causes the MUX to select
constant 4.
• This value is added to the operand at input B (the PC's content), and the
result is stored in Z.
Step2--> Updated value in Z is moved to PC. This completes the PC
increment operation and PC will now point to next instruction.

Contd...
Step3--> The fetched instruction, which memory has delivered into MDR, is moved to IR.
• Steps 1 through 3 constitute the Fetch Phase.
• At the beginning of step 4, the instruction decoder
interprets the contents of the IR.
• This enables the control circuitry to activate the
control-signals for steps 4 through 7.
• Steps 4 through 7 constitute the Execution Phase.
Contd...
Step4--> Contents of R3 are loaded into MAR & a memory read signal is
issued.
Step5--> Contents of R1 are transferred to Y to prepare for addition.
Step6--> When Read operation is completed, memory-operand is
available in MDR, and the addition is performed.
Step7--> Sum is stored in Z, then transferred to R1. The End signal
causes a new instruction fetch cycle to begin by returning to step1.
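The seven steps can be walked through in a small simulation. This is a sketch under simplifying assumptions: memory is a Python dictionary, the instruction word is stored as an opaque string, and the memory read is assumed to have completed by the step that asserts WMFC.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def execute_add_r3_r1(regs, memory):
    """Trace the control sequence for Add (R3),R1 on a toy single-bus datapath."""
    r = dict(regs)
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    bus = r["PC"]
    r["MAR"] = bus
    r["Z"] = (4 + bus) & MASK
    # Step 2: Zout, PCin, Yin, WMFC (instruction word arrives in MDR)
    bus = r["Z"]
    r["PC"] = bus
    r["Y"] = bus
    r["MDR"] = memory[r["MAR"]]
    # Step 3: MDRout, IRin
    bus = r["MDR"]
    r["IR"] = bus
    # Step 4: R3out, MARin, Read
    bus = r["R3"]
    r["MAR"] = bus
    # Step 5: R1out, Yin, WMFC (memory operand arrives in MDR)
    bus = r["R1"]
    r["Y"] = bus
    r["MDR"] = memory[r["MAR"]]
    # Step 6: MDRout, SelectY, Add, Zin
    bus = r["MDR"]
    r["Z"] = (r["Y"] + bus) & MASK
    # Step 7: Zout, R1in, End
    bus = r["Z"]
    r["R1"] = bus
    return r


memory = {100: "Add (R3),R1", 200: 35}
final = execute_add_r3_r1({"PC": 100, "R1": 7, "R3": 200}, memory)
print(final["R1"], final["PC"])  # 42 104
```

Steps 1 to 3 are the same for every instruction; only steps 4 to 7 depend on the contents of IR.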

Execution of Branch Instructions
• A branch instruction replaces the contents of PC with the
branch target address, which is usually obtained by adding an
offset X given in the branch instruction.
• The offset X is usually the difference between the branch

target address and the address immediately following the
branch instruction.
• Conditional branch
Execution of Branch Instructions
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End
Figure 7.7. Control sequence for an unconditional branch instruction.

Contd...
• Step 1-3--> The processing starts & the fetch phase ends in step3.
• Step 4--> The offset-value is extracted from IR by instruction-decoding
circuit. Since the updated value of PC is already available in register Y, the
offset X is gated onto the bus, and an addition operation is performed.
• Step 5--> the result, which is the branch-address, is loaded into the PC.
• The branch instruction loads the branch target address in PC so that PC will
fetch the next instruction from the branch target address.
• Branch target address is usually obtained by adding offset in contents of PC.
• The offset X is usually the difference between the branch target-address and
the address immediately following the branch instruction.

Contd...
• In case of conditional branch,

– we have to check the status of the condition-codes before
loading a new value into the PC.
– e.g.: Offset-field-of-IRout, Add, Zin, If N=0 then End
– If N=0, processor returns to step 1 immediately after step 4.
– If N=1, step 5 is performed to load a new value into PC.
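The branch target computation can be written out as a short sketch. It assumes 4-byte instructions and models the conditional case as a branch that is taken only when the N flag is 1, as in the example above; the function and parameter names are hypothetical.

```python
def next_pc(branch_addr, offset, n_flag=None, instr_size=4):
    """Return the PC value after a branch located at branch_addr.

    n_flag=None models an unconditional branch; otherwise the branch is
    taken only when the N condition code is 1.
    """
    updated_pc = branch_addr + instr_size      # PC already incremented during fetch
    if n_flag is not None and n_flag == 0:
        return updated_pc                      # condition false: End after step 4
    return updated_pc + offset                 # step 5: load branch target into PC


print(next_pc(1000, 24))              # 1028
print(next_pc(1000, -16, n_flag=1))   # 988
print(next_pc(1000, -16, n_flag=0))   # 1004
```

The offset is relative to the address immediately following the branch instruction, which is why the incremented PC is the base of the addition.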

Multiple-Bus Organization
• Disadvantage of Single-bus organization:
– Only one data-word can be transferred over the bus in a clock
cycle.
– This increases the steps required to complete the execution of the
instruction
• Solution:
– To reduce the number of steps, most processors provide multiple
internal-paths.
– Multiple paths enable several transfers to take place in parallel.
Multiple-bus organization
• Simple single-bus structure

– Results in long control sequences, because only one data
item can be transferred over the bus in a clock cycle.
• Multiple-bus organization.
– Most commercial processors provide multiple internal
paths to enable several transfers to take place in parallel.
Contd….
• General purpose registers are combined into a single block
called the register file.
• The register file has 3 ports, 2 of which are output ports:
– The two output ports allow the contents of two different registers to be
accessed simultaneously and placed on buses A and B.
– The third port allows the data on bus C to be loaded into a third
register during the same clock cycle.
– Buses A & B are used to transfer source operands to the A & B inputs of
the ALU.
– ALU operation is performed.
– The result is transferred to the destination over bus C.

Contd…
• The ALU may simply pass one of its 2 input operands
unmodified to bus C.
– The ALU control signals for such an operation are R=A or R=B.
• Incrementer unit is used to increment the PC by 4.
– Using the incrementer eliminates the need to add the constant
value 4 to the PC using the main ALU.
– The source for the constant 4 at the ALU input multiplexer can
still be used to increment other addresses, such as the memory
addresses in LoadMultiple & StoreMultiple instructions.
Three bus organization
Multiple-Bus Organization
• Add R4, R5, R6
Step  Action
1     PCout, R=B, MARin, Read, IncPC
2     WMFC
3     MDRoutB, R=B, IRin
4     R4outA, R5outB, SelectA, Add, R6in, End
Figure 7.9. Control sequence for the instruction Add R4,R5,R6 for the three-bus organization in Figure 7.8.
Contd…
• Step 1: The contents of PC are passed through the ALU using the
R=B control signal & loaded into MAR to start a memory read
operation. At the same time, PC is incremented by 4.
• Step 2: The processor waits for MFC.
• Step 3: The processor loads the data received into MDR, then transfers
them to IR.
• Step 4: The execution phase of the instruction requires only
one control step to complete.
Exercise
• What is the control sequence for execution of the instruction Add R1, R2,
including the instruction fetch phase? (Assume single bus architecture)
Figure 7.1. Single-bus organization of the datapath inside a processor.

Number Representation- Unsigned Integer
• 3 major number representations:
– Sign and magnitude representation
– One’s complement representation
– Two’s complement representation
• Assumptions:
– 4-bit machine word
– 16 different values can be represented
– Roughly half are positive, half are negative
Sign and Magnitude Representation
Figure: Number wheel for 4-bit sign and magnitude. Patterns 0000 to 0111 represent +0 to +7; patterns 1000 to 1111 represent -0 to -7.
Example: 0 100 = +4, 1 100 = -4
• High order bit is sign: 0 = positive (or zero), 1 = negative
• Number range for n bits = ±(2^(n-1) - 1)
• Two representations for 0

One’s Complement Representation
Figure: Number wheel for 4-bit one's complement. Patterns 0000 to 0111 represent +0 to +7; patterns 1000 to 1111 represent -7 to -0.
Example: 0 100 = +4, 1 011 = -4
• Subtraction implemented by addition & 1's complement
• Still two representations of 0! This causes some problems
• Some complexities in addition
Two’s Complement Representation
Figure: Number wheel for 4-bit two's complement (like 1's complement except shifted one position clockwise). Patterns 0000 to 0111 represent +0 to +7; patterns 1000 to 1111 represent -8 to -1.
Example: 0 100 = +4, 1 100 = -4
• Only one representation for 0
• One more negative number than positive number
Binary, Signed-Integer Representations
     B                         Values represented
b3 b2 b1 b0    Sign and magnitude    1's complement    2's complement
 0  1  1  1           +7                   +7                +7
 0  1  1  0           +6                   +6                +6
 0  1  0  1           +5                   +5                +5
 0  1  0  0           +4                   +4                +4
 0  0  1  1           +3                   +3                +3
 0  0  1  0           +2                   +2                +2
 0  0  0  1           +1                   +1                +1
 0  0  0  0           +0                   +0                +0
 1  0  0  0           -0                   -7                -8
 1  0  0  1           -1                   -6                -7
 1  0  1  0           -2                   -5                -6
 1  0  1  1           -3                   -4                -5
 1  1  0  0           -4                   -3                -4
 1  1  0  1           -5                   -2                -3
 1  1  1  0           -6                   -1                -2
 1  1  1  1           -7                   -0                -1
Binary, signed-integer representations.
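The table can be reproduced with a short decoding sketch. The function name and the choice to return plain Python integers (so that -0 appears as 0) are assumptions made for illustration.

```python
def decode(bits, n=4):
    """Interpret an n-bit pattern as (sign-magnitude, 1's complement, 2's complement)."""
    sign = bits >> (n - 1)                       # high order bit
    low = bits & ((1 << (n - 1)) - 1)            # remaining n-1 bits
    sign_magnitude = -low if sign else low
    ones = bits - ((1 << n) - 1) if sign else bits
    twos = bits - (1 << n) if sign else bits
    return sign_magnitude, ones, twos


for pattern in (0b0111, 0b1000, 0b1011, 0b1111):
    print(format(pattern, "04b"), decode(pattern))
```

Pattern 1000 decodes to -0, -7 and -8 respectively, which shows why only two's complement has a single zero and one extra negative value.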

Addition of Positive Numbers
  0     1     0     1
+ 0   + 0   + 1   + 1
---   ---   ---   ---
  0     1     1    10   (the left bit of 10 is the carry-out)
Addition of 1-bit numbers.

Addition and Subtraction – Sign Magnitude
Result sign bit is the same as the operands' sign:
    4    0100          -4     1100
  + 3    0011       + (-3)    1011
  ---    ----       ------    ----
    7    0111          -7     1111
When signs differ, the operation is subtract; the sign of the result depends
on the sign of the number with the larger magnitude:
    4    0100          -4     1100
  - 3    1011         + 3     0011
  ---    ----       ------    ----
    1    0001          -1     1001
Addition and Subtraction – 1’s Complement
    4      0100          -4       1011
  + 3      0011       + (-3)      1100
  ---    ------       ------    ------
    7      0111          -7     1 0111
                      End around carry  + 1
                                  1000

    4      0100          -4       1011
  - 3      1100         + 3       0011
  ---    ------       ------    ------
    1    1 0000          -1       1110
  End around carry + 1
           0001
Addition and Subtraction – 2’s Complement
    4      0100          -4       1100
  + 3      0011       + (-3)      1101
  ---    ------       ------    ------
    7      0111          -7     1 1001

    4      0100          -4       1100
  - 3      1101         + 3       0011
  ---    ------       ------    ------
    1    1 0001          -1       1111
If carry-in to the high order bit = carry-out then ignore carry;
if carry-in differs from carry-out then overflow.
Simpler addition scheme makes two's complement the most common
choice for integer number systems within digital systems.
Overflow - Add two positive numbers to get a negative number or two
negative numbers to get a positive number
Figure: Two 4-bit two's complement number wheels illustrating overflow: 5 + 3 = -8 and -7 - 2 = +7.
Overflow
• Overflow can occur only when adding 2 numbers that have the
same sign.
• The carry out signal from the sign bit position is not a sufficient
indicator of overflow when adding signed numbers.
• To detect overflow, examine the signs of the 2 summands X & Y
and the sign of the result S. When both operands X & Y have the
same sign, an overflow occurs when the sign of S is not the same
as the signs of X & Y.
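The sign-comparison rule can be checked with a small sketch of n-bit two's complement addition. The function name and the default width of 4 bits are assumptions chosen to match the number wheels above.

```python
def add_twos_complement(x, y, n=4):
    """Add two n-bit two's complement values; return (result, overflow)."""
    mask = (1 << n) - 1
    xb, yb = x & mask, y & mask              # n-bit patterns of the operands
    s = (xb + yb) & mask                     # carry out of bit n-1 is discarded

    def sign(v):
        return (v >> (n - 1)) & 1

    # Overflow: operands share a sign and the sum's sign differs from it.
    overflow = sign(xb) == sign(yb) and sign(s) != sign(xb)
    value = s - (1 << n) if sign(s) else s   # reinterpret the pattern as signed
    return value, overflow


print(add_twos_complement(5, 3))     # (-8, True)
print(add_twos_complement(-7, -2))   # (7, True)
print(add_twos_complement(4, -3))    # (1, False)
```

The last call produces a carry out of the sign position without overflow, which illustrates why the carry-out alone is not a sufficient indicator.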
Addition of Unsigned Numbers – Half Adder
   x       0      0      1      1
 + y     + 0    + 1    + 0    + 1
 ---     ---    ---    ---    ---
 c s     0 0    0 1    0 1    1 0     (c = carry, s = sum)
(a) The four possible cases

 x  y    Carry c   Sum s
 0  0       0        0
 0  1       0        1
 1  0       0        1
 1  1       1        0
(b) Truth table

Figure: (c) Circuit and (d) graphical symbol of the half adder (HA), with inputs x, y and outputs s, c.

Logic specification for a stage of binary addition- Full adder.
xi  yi  Carry-in ci   Sum si   Carry-out ci+1
0   0       0           0           0
0   0       1           1           0
0   1       0           1           0
0   1       1           0           1
1   0       0           1           0
1   0       1           0           1
1   1       0           0           1
1   1       1           1           1
$s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = y_i c_i + x_i c_i + x_i y_i$
Example:
   X        7            0 1 1 1
 + Y      + 6          + 0 1 1 0
 carries             0 1 1 0 0     (c4 c3 c2 c1 c0)
   Z       13            1 1 0 1
Legend for stage i: xi and yi are the operand bits, ci is the carry-in, ci+1 is the carry-out, and si is the sum bit.

• A full adder (FA)
Figure: (a) Logic for a single stage: the gate-level circuit producing si and ci+1 from xi, yi and ci, and the block symbol of a full adder (FA).

• n-bit ripple-carry adder – a cascaded connection of n full adder blocks can
be used to add two n-bit numbers.
Figure: An n-bit ripple-carry adder. FA stage i takes xi, yi and ci and produces si and ci+1; c0 enters at the least significant bit (LSB) position and cn leaves the most significant bit (MSB) position.

• kn-bit ripple-carry adder: carry signals are also useful for
interconnecting k adders to form an adder capable of handling
input numbers that are kn bits long
Figure: (c) Cascade of k n-bit adders; the carry-out of each n-bit adder (cn, c2n, ...) feeds the carry-in of the next, from c0 up to ckn.
Logic for addition of binary vectors.
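The full adder equations and the ripple-carry connection can be sketched directly in code. Bits are given least significant first; this ordering and the function names are assumptions made for illustration.

```python
def full_adder(x, y, c):
    """One stage of binary addition: returns (sum bit, carry-out)."""
    s = x ^ y ^ c
    c_out = (y & c) | (x & c) | (x & y)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first) by chaining full adders."""
    carry = c0
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)   # the carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


# 7 + 6 = 13: 0111 + 0110 = 1101 (lists are LSB first)
print(ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0]))  # ([1, 0, 1, 1], 0)
```

Passing the returned carry as `c0` of another call cascades adders in the way the kn-bit arrangement describes.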

Addition/subtraction logic unit
• The circuit network can be used to perform either addition or subtraction based on
the value applied to the Add/Sub input control line.
• Add/Sub = 0 for addition.
• When Add/Sub is set to 1, Y is 1’s complemented by the XOR gates and
c0 is set to 1 to complete the 2’s complementation of Y.
• An XOR gate can be added to detect overflow.
• Overflow can only occur when the signs of the 2 operands are
the same.
• Overflow occurs if the sign of the result is different from the sign of the operands.
• Therefore, a circuit to detect overflow can be added to the n-bit
adder by implementing the logic expression:
– $\text{Overflow} = x_{n-1} y_{n-1} \bar{s}_{n-1} + \bar{x}_{n-1} \bar{y}_{n-1} s_{n-1}$
• Equivalently, overflow occurs when the carry bits $c_n$ and $c_{n-1}$ are different.
• Therefore a simpler circuit for detecting overflow can be
obtained by implementing the logic expression $c_n \oplus c_{n-1}$
with an XOR gate
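A bit-level sketch of the addition/subtraction unit follows, with overflow detected as the XOR of the carry into and the carry out of the sign position. The function name and the 4-bit default width are assumptions made for illustration.

```python
def add_sub(x, y, sub, n=4):
    """n-bit adder/subtractor: sub=0 adds, sub=1 subtracts. Returns (result bits, overflow)."""
    mask = (1 << n) - 1
    x &= mask
    y = (y & mask) ^ (mask if sub else 0)   # XOR gates complement Y when Add/Sub = 1
    carry = sub                             # c0 = Add/Sub completes the 2's complement
    result = 0
    carry_into_msb = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        if i == n - 1:
            carry_into_msb = carry          # c(n-1)
        result |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (xi & carry) | (yi & carry)
    overflow = carry_into_msb ^ carry       # c(n-1) XOR c(n)
    return result, overflow


print(add_sub(0b0100, 0b0011, 1))   # 4 - 3  -> (1, 0)
print(add_sub(0b0101, 0b0011, 0))   # 5 + 3  -> (8, 1), pattern 1000 with overflow
print(add_sub(0b1001, 0b0010, 1))   # -7 - 2 -> (7, 1), pattern 0111 with overflow
```

The result is returned as an unsigned bit pattern; interpreting it as a signed value is left to the caller.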
Multiplication of positive numbers
•The product of two n-digit numbers can be
accommodated in 2n digits, so the product of the two
4-bit numbers fits into 8 bits.
•In the binary system,

– if the multiplier bit is 1, the multiplicand is entered
in the appropriate position to be added to the
partial product.
– If the multiplier bit is 0, then 0’s are entered

Manual Multiplication Algorithm
        1 1 0 1   (13) Multiplicand M
      × 1 0 1 1   (11) Multiplier Q
        -------
        1 1 0 1
      1 1 0 1
    0 0 0 0
  1 1 0 1
---------------
1 0 0 0 1 1 1 1   (143) Product P
(a) Manual multiplication algorithm

Array Multiplication
• Binary multiplication of positive operands can be implemented
in a combinational 2D logic array.
• The main component in each cell is a full adder FA.
• The AND gate in each cell determines whether a multiplicand
bit, mj, is added to the incoming partial product bit, based on the
value of the multiplier bit, qi.
• Each row i adds multiplicand to the incoming partial product,
PPi, to generate the outgoing partial product, PP(i+1), if qi=1.
• If qi=0, PPi is passed vertically downward unchanged.
• PP0 is all 0s, and PP4 is the desired product.
• The multiplicand is shifted left one position per row by the
diagonal signal path.
Array Multiplier
Sequential Circuit Binary Multiplier
• To perform multiplication, the adder circuitry in the ALU can be used for
a number of sequential steps.
• The circuit performs multiplication by using a single n-bit adder
n times to implement the spatial addition performed by the n
rows of ripple carry adders in the array multiplier.
• Registers A & Q combined hold PPi, while multiplier bit qi
generates the signal Add/Noadd.
• This signal controls the addition of the multiplicand M to PPi to
generate PP(i+1).
• The product is computed in n cycles.

•The partial product grows in length by one bit per cycle
from the initial vector PP0, of n 0s in A.
•The carry out from the adders is stored in flip flop C.

•At the start, the multiplier is loaded into Q, multiplicand
into M and C & A are cleared to 0.
•At the end of each cycle, C, A & Q are shifted right one bit
position to allow for growth of the partial product as
multiplier is shifted out of Q.
• Because of this shifting, multiplier bit qi appears at the LSB of Q
to generate Add/Noadd signal at the correct time, starting with
q0 during first cycle, q1 during the second cycle, and so on.
• After they are used, the multiplier bits are discarded by the right
shift operation.
• The carry out from the adder is the leftmost bit of PP(i+1), and it
must be held in the C to be shifted right with the contents of A &
Q.
• After n cycles, the high order half of the product is in A and the
low order half is in Q
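The register-level behaviour described above can be sketched as follows. The function name is hypothetical, and operands are assumed to be unsigned n-bit values.

```python
def sequential_multiply(multiplicand, multiplier, n=4):
    """Multiply two unsigned n-bit numbers with registers C, A, Q and M."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(n):
        if q & 1:                              # Add/Noadd is driven by the LSB of Q
            total = a + m
            c = total >> n                     # carry out of the adder goes to C
            a = total & mask
        # Shift C, A and Q right one position as a single register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                        # high half in A, low half in Q


print(sequential_multiply(13, 11))  # 143
```

With multiplicand 1101 and multiplier 1011 this gives A = 1000 and Q = 1111 after four cycles, the same product as the manual method.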
First version (multiplicand 0010 = 2, multiplier 0011 = 3)
Iteration  Step                            Multiplier  Multiplicand  Product
0          Initial values                  0011        0000 0010     0000 0000
1          1a: 1 => Prod = Prod + Mcand    0011        0000 0010     0000 0010
           2: Shift left Multiplicand      0011        0000 0100     0000 0010
           3: Shift right Multiplier       0001        0000 0100     0000 0010
2          1a: 1 => Prod = Prod + Mcand    0001        0000 0100     0000 0110
3          1: 0 => no operation            0000        0000 1000     0000 0110
4          1: 0 => no operation            0000        0001 0000     0000 0110

Second Version
Iteration  Step                            Multiplicand  Product
0          Initial values                  0010          0000 0011
1          1a: 1 => Prod = Prod + Mcand    0010          0010 0011
           2: Shift right Product          0010          0001 0001
2          1a: 1 => Prod = Prod + Mcand    0010          0011 0001
3          1: 0 => no operation            0010          0001 1000
4          1: 0 => no operation            0010          0000 1100
After the final shift right, the product register holds 0000 0110 (6).
Sequential Circuit Binary Multiplier
Signed Operand Multiplication
• Consider the case of a positive multiplier and a negative

multiplicand.
• When we add a negative multiplicand to a partial product, we

must extend the sign bit value of the multiplicand to the left as
far as the product will extend.
• The hardware used for the multiplication of positive numbers can

be used for negative multiplication if it provides for sign
extension of the partial products.
• For example, -13 (5-bit signed operand) × +11 gives the 10-bit product
-143.
Signed Multiplication
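          1 0 0 1 1   (-13)
        × 0 1 0 1 1   (+11)
-------------------
1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 1 1
0 0 0 0 0 0 0 0
1 1 1 0 0 1 1
0 0 0 0 0 0
-------------------
1 1 0 1 1 1 0 0 0 1   (-143)
Each summand is extended to the left with copies of its sign bit (shown in blue in the original figure).
Sign extension of negative multiplicand.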

Signed Multiplication
• For a negative multiplier, form the 2’s-complement of both the

multiplier and the multiplicand and proceed as in the case of a
positive multiplier.
• This is possible because complementation of both operands does

not change the value or the sign of the product.
•A technique that works equally well for both negative and

positive multipliers – Booth algorithm.
Booth’s Algorithm
• Powerful direct algorithm to perform signed number multiplication.
• The algorithm is based on the fact that any binary number can be
represented by the sum and difference of other binary numbers.
• It operates on the fact that strings of 0’s in the multiplier require no
addition but just shifting, and a string of 1’s in the multiplier from bit
weight $2^{k}$ to weight $2^{m}$ can be treated as $2^{k+1} - 2^{m}$.
• Features
• It handles both positive & negative numbers uniformly.
• It achieves some efficiency in the number of additions required
when the multiplier has a few large blocks of 1’s.
Multiply 7*3
Iteration  Step                                Multiplicand  Product
0          Initial values                      0111          0000 0011 0
1          1a: 10 => Prod = Prod - Mcand       0111          1001 0011 0
           2: Arithmetic shift right Product   0111          1100 1001 1
2          1: 11 => no operation               0111          1100 1001 1
           2: Arithmetic shift right Product   0111          1110 0100 1
3          1: 01 => Prod = Prod + Mcand        0111          0101 0100 1
4          1: 00 => no operation               0111          0010 1010 0
After the final arithmetic shift right, the product is 0001 0101.
Result = (0001 0101)2 = 21

Multiply 7*-3
Iteration  Step                                Multiplicand  Product
0          Initial values                      0111          0000 1101 0
1          1a: 10 => Prod = Prod - Mcand       0111          1001 1101 0
           2: Arithmetic shift right Product   0111          1100 1110 1
2          1a: 01 => Prod = Prod + Mcand       0111          0011 1110 1
3          1a: 10 => Prod = Prod - Mcand       0111          1010 1111 0
4          1: 11 => no operation               0111          1101 0111 1
After the final arithmetic shift right, the product is 1110 1011.
21 = 0001 0101
-21 = 1s complement of 21 + 1 = 1110 1010 + 1 = 1110 1011
Result = (1110 1011)2 = -21
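The worked examples can be reproduced with a register-level sketch of Booth's algorithm. It keeps an n-bit upper product register A, the multiplier in Q and an extra bit Q-1, and it assumes the multiplicand is not the most negative n-bit value (for example -8 when n = 4), since negating that value does not fit in n bits. The function and variable names are hypothetical.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Multiply two n-bit two's complement numbers with Booth's algorithm."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    a = 0                       # upper half of the product register
    q = multiplier & mask       # lower half, initially the multiplier
    q_minus1 = 0                # the extra bit to the right of Q
    for _ in range(n):
        pair = ((q & 1) << 1) | q_minus1
        if pair == 0b10:
            a = (a - m) & mask  # Prod = Prod - Mcand
        elif pair == 0b01:
            a = (a + m) & mask  # Prod = Prod + Mcand
        # Arithmetic shift right of A, Q, Q-1 (the sign bit of A is replicated).
        q_minus1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product >> (2 * n - 1):  # reinterpret the 2n-bit pattern as signed
        product -= 1 << (2 * n)
    return product


print(booth_multiply(7, 3))    # 21
print(booth_multiply(7, -3))   # -21
```

The intermediate values of A and Q for 7 * 3 follow the rows of the first table: 1100 1001, 1110 0100, 0010 1010 and finally 0001 0101.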
Booth Algorithm
Multiplier            Version of multiplicand
Bit i    Bit i-1      selected by bit i
0        0             0 × M
0        1            +1 × M
1        0            -1 × M
1        1             0 × M
Booth multiplier recoding table.

Booth Algorithm
• In the Booth scheme, -1 times the shifted multiplicand is selected
when moving from 0 to 1, and +1 times the shifted multiplicand is
selected when moving from 1 to 0, as the multiplier is scanned
from right to left.
 0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
 0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0
Booth recoding of a multiplier.
• Booth algorithm can be extended to any number of blocks
of 1s in a multiplier, including the situation in which a
single 1 is considered a block.
• The case when the LSB of the multiplier is 1 is handled by
assuming that an implied 0 lies to its right.
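The recoding rule can be expressed in a few lines: each recoded digit is the bit to the right minus the bit itself, with an implied 0 to the right of the LSB. The function name and the string input format (MSB first) are assumptions made for illustration.

```python
def booth_recode(bits):
    """Booth-recode a binary multiplier given as a string, MSB first.

    Returns a list of digits in {-1, 0, +1}, also MSB first.
    """
    padded = bits + "0"   # implied 0 to the right of the LSB
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


print(booth_recode("0011110"))  # [0, 1, 0, 0, 0, -1, 0]
print(booth_recode("11010"))    # [0, -1, 1, -1, 0]
```

A digit pair (bit i, bit i-1) of 10 gives -1, 01 gives +1, and 00 or 11 give 0, matching the recoding table.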
Booth Algorithm
• Consider a multiplication in which the multiplier is the positive number 0011110.
How many appropriately shifted versions of the multiplicand
are added in the standard procedure?
                       0  1  0  1  1  0  1
                       0  0 +1 +1 +1 +1  0
                       0  0  0  0  0  0  0
                    0  1  0  1  1  0  1
                 0  1  0  1  1  0  1
              0  1  0  1  1  0  1
           0  1  0  1  1  0  1
        0  0  0  0  0  0  0
     0  0  0  0  0  0  0
  0  0  0  1  0  1  0  1  0  0  0  1  1  0
Four shifted versions of the multiplicand are added, one for each 1 in the multiplier.
Booth Algorithm
• Since 0011110 = 0100000 – 0000010, if we use the
expression to the right, what will happen?
                       0  1  0  1  1  0  1
                       0 +1  0  0  0 -1  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  1  1  1  1  1  1  1  0  1  0  0  1  1        (2's complement of the multiplicand)
  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0
  0  0  0  1  0  1  1  0  1
  0  0  0  0  0  0  0  0
  0  0  0  1  0  1  0  1  0  0  0  1  1  0
Only two non-zero summands are needed, and each summand is sign-extended to the left.
Booth Algorithm
          0 1 1 0 1   (+13)
        × 1 1 0 1 0   (-6)      Booth-recoded multiplier: 0 -1 +1 -1 0
-------------------
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1 1
0 0 0 0 1 1 0 1
1 1 1 0 0 1 1
0 0 0 0 0 0
-------------------
1 1 1 0 1 1 0 0 1 0   (-78)
Booth multiplication with a negative multiplier.

Booth Algorithm
• Best case – a long string of 1’s (skipping over 1s)
• Worst case – 0’s and 1’s are alternating
Worst-case multiplier:
  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
Ordinary multiplier:
  1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0
  0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0
Good multiplier:
  0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1
  0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1
•The transformation 011….110 to 100….0-10 is called
skipping over 1s.
•Only a few versions of the shifted multiplicand must

be added to generate the product, thus speeding up
the multiplication operation.
•However in worst case, that is of alternate 1s and 0s in

the multiplier, each bit of the multiplier selects a
summand.
Hardware for Booth’s Algorithm
Binary Division by Shift and Subtract
Consider the pencil-and-paper method for dividing the byte 10010011
by the nibble 1011 (147 ÷ 11 in decimal; the quotient is 1101 and the remainder is 100).

Binary Division by Shift and Subtract
• Basically the reverse of multiply by shift and add.

Set quotient to 0
Align leftmost digits in dividend and divisor
Repeat
    If that portion of the dividend above the divisor is greater than or equal to the divisor
        Then subtract divisor from that portion of the dividend and
             concatenate 1 to the right hand end of the quotient
        Else concatenate 0 to the right hand end of the quotient
    Shift the divisor one place right
Until the divisor has been shifted past its original (unaligned) position
Quotient is correct, dividend is remainder
STOP
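A minimal sketch of the shift-and-subtract procedure is given below. It runs one pass for each position of the divisor, from the aligned position down to the unshifted one; the function name and the use of Python integers in place of fixed-width registers are assumptions.

```python
def shift_subtract_divide(dividend, divisor):
    """Unsigned division by shift and subtract; returns (quotient, remainder)."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    # Align the leftmost digits of dividend and divisor.
    shift = max(dividend.bit_length() - divisor.bit_length(), 0)
    aligned = divisor << shift
    quotient = 0
    for _ in range(shift + 1):
        quotient <<= 1
        if dividend >= aligned:
            dividend -= aligned      # subtract and append 1 to the quotient
            quotient |= 1
        aligned >>= 1                # shift the divisor one place right
    return quotient, dividend        # what is left of the dividend is the remainder


print(shift_subtract_divide(0b10010011, 0b1011))  # (13, 4)
```

For the byte 10010011 and the nibble 1011 the loop runs five times and produces the quotient bits 0, 1, 1, 0, 1.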
Restoring Division - Implementation of the Longhand Method
1. The required subtractions are facilitated by
using 2’s complement arithmetic.
2. The extra bit position at the left end of both A and M
accommodates the sign bit during the subtractions.
Circuit arrangement for binary division
Status of the registers
At the beginning of the operation:
• n-bit positive divisor is loaded into register M
• n-bit positive dividend is loaded into register Q
• Register A is set to 0
After the division is complete:
• n-bit quotient is in register Q
• Remainder is in register A
Algorithm for restoring division
Do the following n times:
1. Shift A and Q left one binary position.
2. Subtract M from A and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A
(i.e., restore A); otherwise set q0 to 1.

Division algorithm for unsigned numbers using the restoring method
Restoring division Example
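The three restoring steps can be traced with a small software model. This is a sketch under stated assumptions: registers A, Q and M are modelled as Python integers, a negative A stands in for "sign of A is 1", and the function name `restoring_divide` and the parameter `n` (register width in bits) are hypothetical.

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division of two unsigned n-bit values.
    Returns (quotient, remainder) as left in registers Q and A."""
    if not 0 <= dividend < (1 << n) or not 0 < divisor < (1 << n):
        raise ValueError("operands must fit in n bits and divisor must be nonzero")
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # 1. Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        # 2. Subtract M from A.
        a -= m
        # 3. Negative result: q0 = 0 and restore A; otherwise q0 = 1.
        if a < 0:
            a += m
        else:
            q |= 1
    return q, a


print(restoring_divide(0b1000, 0b0011, 4))  # 8 / 3
```

Dividing 1000 by 0011 with n = 4 leaves quotient 0010 in Q and remainder 0010 in A.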
Nonrestoring Division

The restoring algorithm can be improved by avoiding the need to
restore A after an unsuccessful subtraction.

In the restoring method:
• If A is positive, we shift left and subtract M, i.e. we perform 2A - M.
• If A is negative, we restore it by performing A + M,
then we shift it left and subtract M.

The second case is equivalent to performing 2(A + M) - M = 2A + M, so the
restore step can be skipped by adding M after the next shift instead of subtracting it.
Non-Restoring division Example
Nonrestoring Division algorithm
Step 1: Do the following n times.
1. If the sign of A is 0, shift A and Q left one position and
subtract M from A; otherwise shift A and Q left and add M
to A.
2. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
Note: Step 2 is needed to leave the proper remainder in A
at the end of the n cycles of Step 1.
The logic circuit is the same as for restoring division.
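Steps 1 and 2 can be modelled the same way as the restoring method. Again this is only a sketch: A is a signed Python integer (negative means sign bit 1), and `nonrestoring_divide` and `n` are hypothetical names introduced for illustration.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Nonrestoring division of two unsigned n-bit values.
    Returns (quotient, remainder)."""
    if not 0 <= dividend < (1 << n) or not 0 < divisor < (1 << n):
        raise ValueError("operands must fit in n bits and divisor must be nonzero")
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        msb = q >> (n - 1)           # bit shifted from Q into A
        q = (q << 1) & mask
        if a >= 0:
            a = 2 * a + msb - m      # sign 0: shift left, then subtract M
        else:
            a = 2 * a + msb + m      # sign 1: shift left, then add M
        if a >= 0:
            q |= 1                   # q0 = 1 only when A is non-negative
    if a < 0:
        a += m                       # Step 2: final correction of the remainder
    return q, a


print(nonrestoring_divide(0b1000, 0b0011, 4))  # 8 / 3
```

The result matches the restoring method (quotient 0010, remainder 0010 for 1000 / 0011), but each cycle performs a single add or subtract.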
BCD
Every four bits represent one decimal digit

BCD Addition
Two kinds of error can occur when BCD digits are added in a standard binary adder:

- The result is not a valid BCD digit.
(Note: 4-bit values above binary 9 (1001) are not used in BCD.)
- The result is a valid BCD digit, but not the correct one
(the sum exceeded 15 and produced a carry out of the 4-bit position).
Solution: In either case, add 6 (0110) to the result generated by the binary adder.
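The add-6 correction can be sketched as follows. The helper names `bcd_add_digit` and `bcd_add` are hypothetical, and multi-digit numbers are assumed to be equal-length lists of decimal digits, most significant digit first.

```python
def bcd_add_digit(x, y, carry_in=0):
    """Add two BCD digits (0-9); returns (sum_digit, carry_out)."""
    total = x + y + carry_in   # what a plain 4-bit binary adder produces
    if total > 9:              # invalid code (10-15) or carry out (16-19)
        total += 6             # skip the six unused 4-bit codes
    return total & 0xF, total >> 4


def bcd_add(a, b):
    """Add two equal-length digit lists, most significant digit first."""
    result, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        digit, carry = bcd_add_digit(x, y, carry)
        result.append(digit)
    if carry:
        result.append(carry)
    return result[::-1]


print(bcd_add_digit(7, 5))       # 7 + 5 = 12: invalid code, corrected by +6
print(bcd_add_digit(9, 9))       # 9 + 9 = 18: valid-looking code with carry
print(bcd_add([4, 7], [3, 8]))   # 47 + 38
```

Both error cases are covered by the single test `total > 9`, since a carry out of the 4-bit position only happens for sums of 16 or more.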
BCD Subtraction
The nine’s complement of a BCD value is generated by subtracting it
from a value that has all 9s as its digits.
Adding one to the nine’s complement produces the ten’s complement, which
represents the negative of the original value.
e.g., the nine’s complement of 631 is 999 – 631 = 368;
then the ten’s complement of 631 is 368 + 1 = 369.
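A short sketch of subtraction by ten's complement, assuming numbers are equal-length lists of decimal digits (most significant first) and that the minuend is not smaller than the subtrahend. The function names are hypothetical.

```python
def nines_complement(digits):
    """Subtract every digit from 9."""
    return [9 - d for d in digits]


def tens_complement(digits):
    """Nine's complement plus one, with decimal carry propagation."""
    result = nines_complement(digits)
    i = len(result) - 1
    while i >= 0:
        if result[i] == 9:
            result[i] = 0      # 9 + 1 rolls over, carry into the next digit
            i -= 1
        else:
            result[i] += 1
            break
    return result


def bcd_subtract(a, b):
    """Compute a - b (a >= b) by adding the ten's complement of b
    and discarding the final carry."""
    comp = tens_complement(b)
    result, carry = [], 0
    for x, y in zip(reversed(a), reversed(comp)):
        carry, digit = divmod(x + y + carry, 10)
        result.append(digit)
    return result[::-1]


print(tens_complement([6, 3, 1]))
print(bcd_subtract([8, 4, 5], [6, 3, 1]))
```

For 631 the complements are 368 and 369, and 845 - 631 is computed as 845 + 369 = 1214 with the leading carry dropped, giving 214.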
BCD Multiplication (Shift- Add)
• Decimal Multiplication of BCD
• B is the multiplicand
• Q is the multiplier; QL is the last (least significant) digit of the multiplier
• Bs and Qs are the sign bits of the multiplicand and the multiplier
• Ae is the carry bit
• K is the number of digits in the multiplier
• The result will be in AeAQ
BCD Multiplication
Scientific Notation
Floating point numbers
• Used to represent real numbers
• Very similar to scientific notation
3.5×10^6, 0.82×10^–5, 75×10^6, …
• Both decimal numbers in scientific notation and floating point
numbers can be normalized:
3.5×10^6, 8.2×10^–6, 7.5×10^7, …
Normalizing binary numbers
• 0.1 becomes 1.0×2^-1
• 0.01 becomes 1.0×2^-2
• 0.11 becomes 1.1×2^-1
• 1.1 is already normalized and equal to 1.1×2^0
• 10.01 becomes 1.001×2^1
• 11.11 becomes 1______×2^_____
Representation
• Sign + exponent + coefficient
| S | Exp | Coefficient |
IEEE Standard 754
– 1 + 8 + 23 = 32 bits (single precision)
– 1 + 11 + 52 = 64 bits (double precision)
The sign bit
• 0 indicates a positive number

• 1 a negative number
The exponent (I)
– 8 bits for single precision
– 11 bits for double precision
• With 8 bits, we can represent exponents
between -126 and +127
The exponent (II)
• Exponents are represented using a biased notation
– Stored value = actual exponent + bias
• For 8 bit exponents, bias is 127
– Stored value of 1 corresponds to –126
– Stored value of 254 corresponds to +127
The exponent (III)
• Biased notation simplifies comparisons:
• If two normalized floating point numbers have
different exponents, the one with the bigger
exponent has the larger magnitude (for positive numbers,
it is the bigger of the two)
The coefficient
• Also known as fraction or significand
• For normalized numbers the most significant bit is always one
• It is therefore implicit and not represented
Example
• Represent 7:
– Convert to binary: 111
– Normalize: 1.11×2^2
– Sign bit is 0
– Biased exponent is 127 + 2 = 129 (base 10) = 10000001 (base 2)
– Coefficient is 1100…0
Example
• Represent –2:
– Convert to binary: 10
– Normalize: 1.0×2^1
– Sign bit is 1
– Biased exponent is 127 + 1 = 128 (base 10) = 10000000 (base 2)
– Coefficient is 00…0
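The two encodings can be checked by unpacking the bits of a single-precision value with Python's standard `struct` module. The helper name `float32_fields` is hypothetical; the field widths (1, 8, 23) are those given for the 32-bit format.

```python
import struct


def float32_fields(value):
    """Return (sign, biased_exponent, fraction) of an IEEE 754
    single-precision number."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = word & 0x7FFFFF        # 23 bits, leading 1 not stored
    return sign, exponent, fraction


for x in (7.0, -2.0):
    s, e, f = float32_fields(x)
    print(x, s, format(e, "08b"), format(f, "023b"))
```

For 7 this prints sign 0, exponent 10000001 and a fraction starting 11 followed by zeros; for -2 it prints sign 1, exponent 10000000 and an all-zero fraction.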
Floating Point Addition
• 5.25×10^3 + 1.22×10^2 = ?
• Denormalize number with smaller
exponent:
• 5.25×10^3 + 0.122×10^3
• Add the numbers:
• 5.25×10^3 + 0.122×10^3 = 5.372×10^3
• Result is normalized
Decimal floating point addition
• 9.25×10^3 + 8.22×10^2 = ?
• Denormalize number with smaller
exponent:
• 9.25×10^3 + 0.822×10^3
• Add the numbers:
• 9.25×10^3 + 0.822×10^3 = 10.072×10^3
• Normalize the result:
• 10.072×10^3 = 1.0072×10^4
Binary floating point addition
• Say 1.01×2^2 + 1.1×2^1
• Denormalize number with smaller exponent:
• 1.01×2^2 + 0.11×2^2
• Add the numbers:
• 1.01×2^2 + 0.11×2^2 = 10.00×2^2
• Normalize the result:
• 10.00×2^2 = 1.000×2^3
Binary floating point subtraction
• 1.01×2^2 – 1.1×2^1
• Denormalize number with smaller exponent:
• 1.01×2^2 – 0.11×2^2
• Perform the subtraction:
• 1.01×2^2 – 0.11×2^2 = 0.10×2^2
• Normalize the result:
• 0.10×2^2 = 1.0×2^1
Add/Subtract Rule/Algorithm
1. Choose the number with the smaller exponent and shift its mantissa right a
number of steps equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the
result.
4. Normalize the resulting value if necessary.
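The four-step rule can be sketched on a toy format. Everything about the representation here is an assumption made for illustration: each number is a hypothetical tuple (sign, exponent, significand), the significand is an unsigned integer scaled by 2^frac_bits (so 1.010 is stored as 0b1010 when frac_bits = 3), exponents are unbiased, and bits shifted out are simply truncated.

```python
def fp_add(a, b, frac_bits=3):
    """Add two toy floating point numbers given as
    (sign, exponent, significand) tuples."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Step 1: shift the significand with the smaller exponent right.
    if ea < eb:
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb
    # Step 2: the result takes the larger exponent.
    exp = ea
    # Step 3: add or subtract the significands and determine the sign.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    sign = 1 if total < 0 else 0
    mag = abs(total)
    if mag == 0:
        return (0, 0, 0)
    # Step 4: normalize so that 1.0 <= significand < 2.0.
    one = 1 << frac_bits
    while mag >= 2 * one:
        mag >>= 1
        exp += 1
    while mag < one:
        mag <<= 1
        exp -= 1
    return (sign, exp, mag)


print(fp_add((0, 2, 0b1010), (0, 1, 0b1100)))  # 1.01x2^2 + 1.1x2^1
print(fp_add((0, 2, 0b1010), (1, 1, 0b1100)))  # 1.01x2^2 - 1.1x2^1
```

The two calls reproduce the binary addition and subtraction examples: the results are 1.000×2^3 and 1.000×2^1.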

Registers used with Floating Point Numbers
Floating Point Multiplication
FP Multiplication Example
Example of Floating Point Multiplication
Multiply 0.5 and –0.4375 (both base 10) to give
-0.21875 (base 10) = -0.00111 (base 2)
From before: (1.000 × 2^(-1+127)) × (-1.110 × 2^(-2+127)) using biased exponents.
Adding exponents (and dropping the extra bias): 126 + 125 - 127 = 124
Multiply mantissas using a previously described multiply algorithm:
1.110 × 1.000 = 1.110000
Yielding: 1.110000 × 2^124 = 1.110 × 2^124 (biased exponent), keeping to 4 bits
Product is already normalized and there is no overflow since 1 ≤ 124 ≤ 254
Rounding makes no change
Signs of operands differ, hence the answer is negative; removing the bias
(124 - 127 = -3) gives -1.110 × 2^-3
Converting to decimal: -1.110 × 2^-3 = -0.001110 (base 2) = -0.21875 (base 10)
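The same steps can be sketched on a toy 4-bit significand. The tuple layout (sign, biased exponent, significand), the name `fp_multiply` and the `frac_bits` parameter are assumptions for illustration, and low-order product bits are truncated rather than rounded (which makes no difference for this example).

```python
BIAS = 127


def fp_multiply(a, b, frac_bits=3):
    """Multiply two toy floating point numbers given as
    (sign, biased_exponent, significand) tuples, where the significand
    is an unsigned integer scaled by 2**frac_bits."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    sign = sa ^ sb                    # signs differ -> negative result
    exp = ea + eb - BIAS              # add exponents, drop the extra bias
    product = (ma * mb) >> frac_bits  # keep frac_bits fractional bits
    if product >= (2 << frac_bits):   # significand reached 2.0: renormalize
        product >>= 1
        exp += 1
    if not 1 <= exp <= 254:
        raise OverflowError("exponent out of range for normalized numbers")
    return sign, exp, product


half = (0, 126, 0b1000)        # 0.5     =  1.000 x 2^-1
operand = (1, 125, 0b1110)     # -0.4375 = -1.110 x 2^-2
sign, exp, sig = fp_multiply(half, operand)
value = (-1) ** sign * (sig / 8) * 2.0 ** (exp - BIAS)
print(sign, exp, bin(sig), value)
```

The result has sign 1, biased exponent 124 and significand 1.110, which converts back to -0.21875.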
Floating Point Division
Floating Point Division
AC ←BR * QR
truncate low order bits

