Computer Organization

CS1403
Pipelined Data-Path

Mayank Pandey, MNNIT, Allahabad, India


Pipelining
• Start work ASAP!! Do not waste time!
[Figure: laundry analogy – loads A–D go through wash, dry, fold, store from 6 PM onward; done one load at a time (not pipelined) they run until 2 AM, while overlapping the tasks (pipelined) finishes far earlier; time on the horizontal axis, task order on the vertical axis]
Assume 30 min. each task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped
Pipelined vs. Single-Cycle
[Figure, single-cycle: three lw instructions each occupy one long 8 ns clock cycle – instruction fetch, register read, ALU, data access, register write – and run strictly one after another:
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)]
Assume 2 ns for memory access and ALU operation, and 1 ns for register access:
therefore, the single-cycle clock is 8 ns and the pipelined clock cycle is 2 ns
(worked out in the sketch after the next figure).
[Figure, pipelined: the same three lw instructions overlap – instruction fetch, register read, ALU, data access, register write each take one 2 ns stage, and a new instruction starts every 2 ns]
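A quick check of the figures' numbers – a sketch in Python, using only the timing assumptions stated above (8 ns single-cycle clock, 2 ns pipelined clock, 5 stages, three lw instructions):

# Sketch: total time for n instructions under the slide's timing assumptions.
STAGES = 5              # IF, ID, EX, MEM, WB
SINGLE_CYCLE_NS = 8     # the longest instruction path sets the single-cycle clock
PIPELINED_CLOCK_NS = 2  # the longest stage sets the pipelined clock

def single_cycle_time(n):
    return n * SINGLE_CYCLE_NS

def pipelined_time(n):
    # the first instruction takes STAGES cycles; each later one adds one cycle
    return (STAGES + n - 1) * PIPELINED_CLOCK_NS

print(single_cycle_time(3))   # 24 ns for the three lw's
print(pipelined_time(3))      # 14 ns, matching the pipelined diagram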
Pipelining: Keep in Mind
• Pipelining does not reduce the latency of a single task;
it increases the throughput of the entire workload
• Pipeline rate is limited by the longest stage
– potential speedup = number of pipe stages (see the sketch below)
– unbalanced pipe-stage lengths reduce the speedup
• Time to fill the pipeline and time to drain it – when
there is slack in the pipeline – also reduce the speedup
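The "longest stage" and "unbalanced stages" points, made concrete in a small sketch (the stage times are made-up illustrative values, not taken from the slides):

# Sketch: the pipeline clock is set by the slowest stage, so unbalanced stages
# leave the faster ones idle and cut the achievable speedup.
def speedup(stage_times_ns, n_instructions):
    single_cycle = sum(stage_times_ns) * n_instructions
    clock = max(stage_times_ns)                    # slowest stage sets the clock
    pipelined = (len(stage_times_ns) + n_instructions - 1) * clock
    return single_cycle / pipelined

balanced   = [2, 2, 2, 2, 2]   # ideal: speedup approaches 5, the number of stages
unbalanced = [2, 1, 4, 2, 1]   # same 10 ns of total work, but one slow stage
print(round(speedup(balanced,   1_000_000), 2))   # ~5.0
print(round(speedup(unbalanced, 1_000_000), 2))   # ~2.5, well below 5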
Pipelining MIPS
• What makes it easy with MIPS?
– all instructions are same length
• so fetch and decode stages are similar for all instructions
– just a few instruction formats
• simplifies instruction decode and makes it possible in one stage
– memory operands appear only in load/stores
• so memory access can be deferred to exactly one later stage
– operands are aligned in memory
• one data transfer instruction requires one memory access stage
Pipelining MIPS
• What makes it hard?
– structural hazards: different instructions, at different stages, in the
pipeline want to use the same hardware resource
– control hazards: succeeding instruction, to put into pipeline, depends
on the outcome of a previous branch instruction, already in pipeline
– data hazards: an instruction in the pipeline requires data to be
computed by a previous instruction still in the pipeline

• Before actually building the pipelined datapath and control,
we first briefly examine these potential hazards individually…
Structural Hazards
• Structural hazard: inadequate hardware to simultaneously support all
instructions in the pipeline in the same clock cycle
• E.g., suppose a single – not separate – instruction and data memory in the
pipeline below, with one read port
– then there is a structural hazard between the first and fourth lw
instructions (see the sketch after the figure)
[Figure: four back-to-back lw instructions in the pipeline, 2 ns per stage – the data access of lw $1 and the instruction fetch of lw $4 fall in the same clock cycle: a hazard if there is a single memory]

• MIPS was designed to be pipelined: structural hazards are easy to


avoid!
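Why exactly the first and fourth lw collide – a minimal sketch, assuming one instruction enters the pipeline per cycle and a single shared instruction/data memory:

# Sketch: with one shared memory, an instruction's IF and an earlier
# load/store's MEM can land in the same cycle -> structural hazard.
IF_STAGE, MEM_STAGE = 0, 3   # stage positions within the 5-stage pipeline

def memory_cycles(i, is_load_or_store):
    """Clock cycles in which instruction i touches the single memory."""
    cycles = {i + IF_STAGE}
    if is_load_or_store:
        cycles.add(i + MEM_STAGE)
    return cycles

# four back-to-back lw's, as in the figure above
for cycle in range(8):
    users = [i for i in range(4) if cycle in memory_cycles(i, True)]
    if len(users) > 1:
        print(f"cycle {cycle}: lw #{users[0] + 1} (MEM) and lw #{users[-1] + 1} (IF) both need the memory")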
Control Hazards
• Control hazard: need to make a decision based on the result of a previous
instruction still executing in the pipeline
• Solution 1: Stall the pipeline (cost estimated in the sketch below)

[Figure: add, then beq, then lw in the pipeline – the lw following the beq is held back by one bubble, so it starts 4 ns (rather than 2 ns) after the beq: a pipeline stall. Note that the branch outcome is computed in the ID stage with added hardware (later…)]
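A rough cost estimate for Solution 1 – a sketch assuming the 1-cycle stall that follows from resolving the branch in ID; the branch frequency is an assumed example value, not from the slides:

# Sketch: effect on average CPI of always stalling one cycle per branch.
BASE_CPI = 1.0          # ideal pipelined CPI
STALL_CYCLES = 1        # one bubble when the branch is resolved in ID
branch_fraction = 0.17  # assumed example value

cpi_always_stall = BASE_CPI + branch_fraction * STALL_CYCLES
print(cpi_always_stall)  # 1.17 -> every branch costs a cycle, taken or not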
Control Hazards
• Solution 2: Predict the branch outcome
– e.g., predict branch-not-taken (cost estimated in the sketch below):
[Figure, prediction success: add, beq, lw proceed back to back, a new instruction every 2 ns – the branch was not taken, so the lw fetched after the beq was the right instruction]
[Figure, prediction failure: the branch was taken, so the lw fetched after the beq must be undone (= flushed) as a bubble, and or $7, $8, $9 is fetched from the branch target 4 ns after the beq]
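Compared with always stalling, predict-not-taken pays the flush only when the branch really is taken; a sketch with assumed example frequencies:

# Sketch: predict branch-not-taken -> flush one instruction only on a taken branch.
BASE_CPI = 1.0
FLUSH_CYCLES = 1        # one instruction flushed when the prediction fails
branch_fraction = 0.17  # assumed example value
taken_fraction = 0.6    # assumed example value

cpi_predict_not_taken = BASE_CPI + branch_fraction * taken_fraction * FLUSH_CYCLES
print(round(cpi_predict_not_taken, 3))  # 1.102 -> better than stalling on every branch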
Control Hazards
• Solution 3: Delayed branch – always execute the sequentially next
instruction, with the branch taking effect after a one-instruction delay;
it is the compiler's job to find an instruction for the delay slot that is
independent of the branch outcome
– MIPS does this



Data Hazards
• Data hazard: an instruction needs data from the result of a previous
instruction still executing in the pipeline
• Solution: Forward the data if possible… (see the sketch after the figure)

[Figure: instruction pipeline diagram for add $s0, $t0, $t1 – the stages IF, ID, EX, MEM, WB; shading indicates use of the register file, left half = write, right half = read]
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 – without forwarding (blue line) the data would have to go back in time; with forwarding (red line) the ALU result is passed straight to the sub, so the data is available in time]
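In hardware, forwarding boils down to comparing register numbers across pipeline registers; a minimal sketch of that check (the field names are illustrative, not the exact signal names used later):

# Sketch: if the instruction in EX needs a register that the instruction in
# MEM (or WB) is about to write, take the value from the pipeline register.
def forward_source(ex_src_reg, exmem_rd, exmem_regwrite, memwb_rd, memwb_regwrite):
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == ex_src_reg:
        return "EX/MEM"     # result computed last cycle, not yet written back
    if memwb_regwrite and memwb_rd != 0 and memwb_rd == ex_src_reg:
        return "MEM/WB"     # result from two instructions earlier
    return "REGISTER FILE"  # no hazard: the value is already in the register file

# add $s0, $t0, $t1 followed immediately by sub $t2, $s0, $t3 ($s0 = register 16)
print(forward_source(ex_src_reg=16, exmem_rd=16, exmem_regwrite=True,
                     memwb_rd=0, memwb_regwrite=False))   # -> "EX/MEM"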



Data Hazards
• Forwarding may not be enough
– e.g., if an R-type instruction following a load uses the result of the load –
called load-use data hazard
[Figure: lw $s0, 20($t1) followed immediately by sub $t2, $s0, $t3 – without a stall it is impossible to provide the input to the sub instruction in time]
[Figure: the same pair with a one-stage stall (bubble) inserted – forwarding can then get the data to the sub instruction in time]
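The one-cycle stall is inserted by a hazard-detection check in the ID stage; a minimal sketch of that condition (signal names are illustrative):

# Sketch of load-use hazard detection: if the instruction in EX is a load and
# its destination matches a source of the instruction in ID, stall one cycle.
def must_stall(idex_memread, idex_rt, ifid_rs, ifid_rt):
    return idex_memread and idex_rt in (ifid_rs, ifid_rt)

# lw $s0, 20($t1) in EX, sub $t2, $s0, $t3 in ID  ($s0 = register 16, $t3 = 11)
print(must_stall(idex_memread=True, idex_rt=16, ifid_rs=16, ifid_rt=11))  # True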
Reordering Code to Avoid Pipeline Stall
• Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)   # data hazard
sw $t0, 4($t1)

• Reordered code (cycle counts in the sketch below):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)   # the two sw's interchanged
sw $t2, 0($t1)
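A quick cycle count for the two orderings – a sketch assuming the 5-stage pipeline with forwarding, so the only penalty is the one-cycle load-use stall:

# Sketch: cycles to run the 4 instructions above, with and without the
# load-use stall caused by sw $t2 immediately following lw $t2.
STAGES = 5

def total_cycles(n_instructions, stall_cycles):
    return STAGES + (n_instructions - 1) + stall_cycles

print(total_cycles(4, stall_cycles=1))  # original order : 9 cycles
print(total_cycles(4, stall_cycles=0))  # reordered code : 8 cycles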
Pipelined Datapath
• We now move to actually building a pipelined datapath
• First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
• Review: single-cycle processor
– all 5 steps done in a single clock cycle
– dedicated hardware required for each step

• What happens if we break the execution into multiple cycles, but keep
the extra hardware?
Review - Single-Cycle Data-path “Steps”

[Figure: the single-cycle datapath – PC, instruction memory, register file, sign-extend, ALU, data memory and multiplexers – annotated with the five steps:
IF (instruction fetch), ID (instruction decode), EX (execute / address calc.), MEM (memory access), WB (write back)]
Pipelined Datapath – Key Idea
• What happens if we break the execution into multiple cycles,
but keep the extra hardware?
– Answer: We may be able to start executing a new instruction at each
clock cycle - pipelining
• …but we shall need extra registers to hold data between
cycles – pipeline registers
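One way to picture a pipeline register: a latch that carries everything the later stages will still need. The sketch below models IF/ID and ID/EX as plain dictionaries handed from stage to stage – a conceptual illustration only, not the real signal list:

# Sketch: pipeline registers modelled as dictionaries passed between stages.
def fetch(pc, instruction_memory):
    # IF/ID holds the fetched instruction and the incremented PC
    return {"instruction": instruction_memory[pc], "pc_plus_4": pc + 4}

def decode(if_id, register_file):
    instr = if_id["instruction"]
    # ID/EX holds everything that EX, MEM and WB might still need
    return {"pc_plus_4": if_id["pc_plus_4"],
            "read_data_1": register_file[instr["rs"]],
            "read_data_2": register_file[instr["rt"]],
            "sign_ext_imm": instr["imm"]}

imem = {0: {"op": "lw", "rs": 9, "rt": 8, "imm": 10}}   # lw $t0, 10($t1)
regs = [0] * 32
print(decode(fetch(0, imem), regs))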
Pipelined Datapath
Pipeline registers wide enough to hold the data coming in
[Figure: the pipelined datapath – the single-cycle datapath with four pipeline registers inserted between the stages: IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits), MEM/WB (64 bits)]
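The widths in the figure come from adding up the fields each register must latch; the breakdown below is a sketch consistent with the slide's 64/128/97/64 numbers:

# Sketch: field-by-field widths of the four pipeline registers (in bits).
widths = {
    "IF/ID":  32 + 32,            # instruction + incremented PC           = 64
    "ID/EX":  32 + 32 + 32 + 32,  # PC+4, read data 1, read data 2, imm    = 128
    "EX/MEM": 32 + 1 + 32 + 32,   # branch target, Zero, ALU result, RD2   = 97
    "MEM/WB": 32 + 32,            # memory read data + ALU result          = 64
}
for name, bits in widths.items():
    print(f"{name}: {bits} bits")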


Bug in the Datapath

[Figure: the pipelined datapath as above – the write-register number fed to the register file in the WB stage is taken directly from the instruction currently in IF/ID]
Write register number comes from another, later instruction!


Corrected Datapath
[Figure: the corrected pipelined datapath – the 5-bit write-register number is carried along with its instruction, and the pipeline registers grow to IF/ID (64 bits), ID/EX (133 bits), EX/MEM (102 bits), MEM/WB (69 bits)]
The destination register number is also passed through the ID/EX, EX/MEM
and MEM/WB registers, which are now wider by 5 bits
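The fix in miniature: the 5-bit write-register number travels with its own instruction through ID/EX, EX/MEM and MEM/WB (128 + 5 = 133, 97 + 5 = 102, 64 + 5 = 69 bits, matching the new widths). A conceptual sketch:

# Sketch: the destination register number rides through the pipeline registers
# with its instruction, so WB writes the register that instruction named.
def clock_tick(id_ex_dest, ex_mem_dest, new_id_ex_dest):
    # on each clock edge every pipeline register takes the value from its predecessor
    return new_id_ex_dest, id_ex_dest, ex_mem_dest   # new ID/EX, EX/MEM, MEM/WB

# follow lw $t0 (register 8): ID latches 8 into ID/EX, then two more clock edges
id_ex, ex_mem, mem_wb = 8, None, None
for _ in range(2):
    id_ex, ex_mem, mem_wb = clock_tick(id_ex, ex_mem, new_id_ex_dest=None)
print(mem_wb)   # 8 -> the WB stage writes $t0, as the lw intended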
Pipelined Example
• Consider the following instruction sequence:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10
Single-Clock-Cycle Diagrams: Clock Cycles 1–8
[Figures: one datapath snapshot per clock cycle, showing the instructions in flight, newest (IF) to oldest (WB)]
CC 1: LW
CC 2: SW, LW
CC 3: ADD, SW, LW
CC 4: SUB, ADD, SW, LW
CC 5: SUB, ADD, SW, LW
CC 6: SUB, ADD, SW
CC 7: SUB, ADD
CC 8: SUB
Alternative View – Multiple-Clock-Cycle Diagram

[Figure: multiple-clock-cycle diagram, CC 1 – CC 8 along the time axis; each instruction flows through IM, REG, ALU, DM, REG in successive cycles, one cycle behind the instruction before it:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10]
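The multiple-clock-cycle view is mechanical enough to generate; this sketch prints which stage each of the four instructions occupies in every clock cycle, assuming no stalls (as in the figure):

# Sketch: print a multiple-clock-cycle diagram for an ideal (no-stall) pipeline.
STAGES = ["IM", "REG", "ALU", "DM", "REG"]
instrs = ["lw  $t0, 10($t1)", "sw  $t3, 20($t4)",
          "add $t5, $t6, $t7", "sub $t8, $t9, $t10"]

n_cycles = len(instrs) + len(STAGES) - 1          # 8 clock cycles for 4 instructions
header = "".join(f"CC{c + 1:>2}  " for c in range(n_cycles))
print(f"{'':20s}{header}")
for i, text in enumerate(instrs):
    row = ["      "] * n_cycles                   # 6 blanks per empty cell
    for s, stage in enumerate(STAGES):
        row[i + s] = f"{stage:<6}"                # instruction i is one cycle behind i-1
    print(f"{text:20s}{''.join(row)}")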


Notes
• One significant difference in the execution of an R-type instruction
between the multi-cycle and pipelined implementations:
– register write-back for the R-type instruction happens in the 5th (and last)
pipeline stage, WB, vs. the 4th step of the multi-cycle
implementation. Why?
– think of structural hazards when writing to the register file…
• Worth repeating: the essential difference between the pipelined and
multi-cycle implementations is the insertion of pipeline registers to
decouple the 5 stages
• The CPI of an ideal pipeline (no stalls) is 1. Why? (see the sketch below)
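Why the CPI tends to 1: once the pipeline is full, one instruction completes every clock cycle, so the fill time is amortized over the whole program. A sketch under the 5-stage, no-stall assumption:

# Sketch: cycles per instruction for an ideal (no-stall) 5-stage pipeline.
STAGES = 5

def ideal_cpi(n_instructions):
    cycles = STAGES + (n_instructions - 1)   # fill once, then one per cycle
    return cycles / n_instructions

for n in (10, 1_000, 1_000_000):
    print(n, round(ideal_cpi(n), 4))   # 1.4, 1.004, 1.0 -> approaches 1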
