Chapter 7 - Basic Processing Unit
Overview
Instruction Set Processor (ISP)
Central Processing Unit (CPU)
Figure 7.1. Single-bus organization of the datapath inside a processor (textbook page 413). Note: the MDR has two inputs and two outputs.
Figure 7.2. Input and output gating for the registers in Figure 7.1.
Register Transfers
All operations and data transfers are controlled by the processor clock.
Figure 7.3. Input and output gating for one register bit.
Performing an Arithmetic or Logic Operation
The ALU is a combinational circuit that has no internal storage.
The ALU gets its two operands from the MUX and the bus.
The result is temporarily stored in register Z.
What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
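The three control steps above can be sketched in code. This is a minimal toy model, not the textbook's notation: the bus, Y, and Z are plain variables, and each function performs the register transfers gated in one clock cycle.

```python
# Toy model of the single-bus datapath (illustrative names and values).
regs = {"R1": 5, "R2": 7, "R3": 0}
bus = y = z = 0

def step1():          # R1out, Yin: gate R1 onto the bus, latch it into Y
    global bus, y
    bus = regs["R1"]
    y = bus

def step2():          # R2out, SelectY, Add, Zin: ALU adds Y and the bus
    global bus, z
    bus = regs["R2"]
    z = y + bus       # MUX selects Y as one ALU input; result latched in Z

def step3():          # Zout, R3in: gate Z onto the bus, latch it into R3
    global bus
    bus = z
    regs["R3"] = bus

for step in (step1, step2, step3):
    step()
print(regs["R3"])     # 12
```

Note how Z is needed because the ALU has no internal storage: the result must be latched somewhere before it can be gated back onto the same bus in the next cycle.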
Fetching a Word from Memory
Address into MAR; issue Read operation; data into MDR.
Figure 7.4. Timing of a memory Read operation.
Add (R3), R1
Control sequence for execution of the instruction Add (R3), R1:
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
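The seven control steps can be traced as code. This is a hedged sketch against a toy machine model, not the textbook's implementation: dictionaries stand in for registers and memory, and the word size, addresses, and operand values are illustrative assumptions.

```python
# Toy single-bus machine executing the control sequence for Add (R3), R1,
# which adds the memory operand pointed to by R3 into register R1.
WORD = 4                            # assumed word size in bytes
memory = {0x1000: "Add (R3), R1",   # instruction word at the current PC
          0x2000: 10}               # operand pointed to by R3
regs = {"PC": 0x1000, "IR": None, "MAR": 0, "MDR": 0,
        "Y": 0, "Z": 0, "R1": 32, "R3": 0x2000}

# Steps 1-3: fetch the instruction and increment the PC.
regs["MAR"] = regs["PC"]             # 1. PCout, MARin, Read,
regs["Z"] = regs["PC"] + WORD        #    Select4, Add, Zin
regs["MDR"] = memory[regs["MAR"]]    # 2. Zout, PCin, Yin, WMFC
regs["PC"] = regs["Z"]
regs["IR"] = regs["MDR"]             # 3. MDRout, IRin

# Steps 4-7: fetch the operand at (R3), add R1, write the result to R1.
regs["MAR"] = regs["R3"]             # 4. R3out, MARin, Read
regs["MDR"] = memory[regs["MAR"]]    # 5. R1out, Yin, WMFC
regs["Y"] = regs["R1"]
regs["Z"] = regs["MDR"] + regs["Y"]  # 6. MDRout, SelectY, Add, Zin
regs["R1"] = regs["Z"]               # 7. Zout, R1in, End

print(regs["R1"], regs["PC"] - 0x1000)   # 42 4
```

Steps 1-3 are the same for every instruction (fetch phase); only steps 4-7 depend on the instruction being executed.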
Pipelining improves performance in terms of throughput.
Pipelined organization requires sophisticated compilation techniques.
Basic Concepts
Making the Execution of Programs Faster
Use faster circuit technology to build the
processor and the main memory.
Arrange the hardware so that more than one operation can be performed at the same time.
Laundry Example
Ann, Brian, Cathy, and Dave each have one load of laundry to wash, dry, and fold (taking 30, 40, and 20 minutes, respectively).
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?
Pipelined laundry takes 3.5 hours for 4 loads.
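The laundry arithmetic can be checked directly. A short sketch, assuming the stage times from the slide (30, 40, and 20 minutes) and four loads:

```python
# Sequential vs. pipelined laundry time, in minutes.
stages = [30, 40, 20]   # wash, dry, fold times from the slide
loads = 4

sequential = loads * sum(stages)                     # 4 * 90 = 360 min

# Pipelined: the first load passes through all stages; each later load is
# separated from the previous one by the slowest stage (the 40-min dryer).
pipelined = sum(stages) + (loads - 1) * max(stages)  # 90 + 3 * 40 = 210 min

print(sequential / 60, pipelined / 60)   # 6.0 3.5
```

The slowest stage sets the rate: even though washing takes only 30 minutes, a new load can start drying only every 40 minutes.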
Traditional Pipeline Concept
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
Pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to "fill" the pipeline and time to "drain" it reduce speedup.
Stall for dependences.
Use the Idea of Pipelining in a Computer
Fetch + Execution
Figure 8.1. Basic idea of instruction pipelining: (a) sequential execution; (b) hardware organization, with interstage buffer B1 between the instruction fetch unit and the execution unit; (c) pipelined execution.
Figure 8.2. A 4-stage pipeline: instruction execution divided into four steps, overlapped across clock cycles, with interstage buffers B1, B2, and B3.
F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
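The four-stage overlap can be generated in code. A minimal sketch (names and layout are illustrative): it prints which stage each instruction occupies in each clock cycle, one row per instruction.

```python
# Ideal 4-stage pipeline schedule: instruction i enters stage s at cycle i + s.
STAGES = ["F", "D", "E", "W"]

def schedule(n_instructions):
    total_cycles = n_instructions + len(STAGES) - 1   # k + n - 1 cycles
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = f"{name}{i + 1}"             # stage s in cycle i + s
        rows.append(row)
    return rows

for row in schedule(4):
    print(" ".join(row))
# first row: F1 D1 E1 W1 -- -- --
```

With 4 stages and 4 instructions the schedule spans 7 cycles; once the pipeline is full, one instruction completes per cycle.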
Role of Cache Memory
Each pipeline stage is expected to complete in one clock cycle.
The clock period must be long enough for the slowest pipeline stage to complete.
Faster stages can only wait for the slowest one to complete.
Since main memory is very slow compared to execution, the pipeline is almost useless if each instruction must be fetched from main memory.
Fortunately, we have cache.
Pipeline Performance
The potential increase in performance resulting
from pipelining is proportional to the number of
pipeline stages.
However, this increase would be achieved only if all pipeline stages require the same time to complete, and pipelined operation is not interrupted throughout program execution.
Figure 8.3. Effect of an execution operation taking more than one clock cycle.
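The cycle counts behind this effect can be sketched with simple arithmetic. This is an illustrative model, assuming a k-stage pipeline with no other hazards: ideally n instructions finish in k + n - 1 cycles, and every extra cycle spent in an execute stage stalls the instructions behind it by the same amount.

```python
# Cycle count for a k-stage pipeline running n instructions.
def total_cycles(n, k, extra_execute_cycles=0):
    # Ideal: k cycles to fill the pipeline, then one completion per cycle.
    # Each extra execute cycle adds one stall cycle for everything behind it.
    return k + n - 1 + extra_execute_cycles

ideal = total_cycles(n=5, k=4)                             # 8 cycles
stalled = total_cycles(n=5, k=4, extra_execute_cycles=2)   # e.g. one Execute
print(ideal, stalled)                                      # takes 3 cycles
```

So a single 3-cycle Execute step turns an 8-cycle schedule into a 10-cycle one: the stall penalty is paid once but delays every later instruction.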