Advanced Linux Programming
CS1403
Pipelined Data-Path
Laundry analogy: assume 30 min. per task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped
[Figure: task order vs. time (6 PM – 2 AM) – laundry loads A, B, …, D run pipelined, their wash/dry/fold/store tasks overlapped on the separate machines.]
Pipelined vs. Single-Cycle
[Figure: program execution order vs. time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0).
Single-cycle: each instruction runs instruction fetch, Reg, ALU, data access, Reg and takes 8 ns before the next one starts.
Pipelined: a new instruction starts every 2 ns, with each of the five stages taking 2 ns.]
Pipelining: Keep in Mind
• Pipelining does not reduce the latency of a single task;
it increases the throughput of the entire workload
• Pipeline rate is limited by the longest stage
– potential speedup = number of pipe stages (see the sketch below)
– unbalanced pipe-stage lengths reduce speedup
• Time to fill the pipeline and time to drain it – when
there is slack in the pipeline – also reduce speedup
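A minimal sketch of these effects in C (mine, not from the slides), assuming a hypothetical workload of 1000 instructions and made-up stage latencies; change stage_ns to see how an unbalanced stage or a short run pulls the speedup below the ideal bound:

    #include <stdio.h>

    /* Hypothetical stage latencies in ns (illustrative values only). */
    static const double stage_ns[5] = {2.0, 2.0, 2.0, 2.0, 2.0};

    int main(void) {
        const int  n_stages = 5;
        const long n_instr  = 1000;        /* hypothetical workload size */

        double path_ns = 0.0;              /* unpipelined: sum of all stages  */
        double longest = 0.0;              /* pipelined clock = slowest stage */
        for (int i = 0; i < n_stages; i++) {
            path_ns += stage_ns[i];
            if (stage_ns[i] > longest) longest = stage_ns[i];
        }

        /* Unpipelined: every instruction walks the full path by itself. */
        double t_single = n_instr * path_ns;

        /* Pipelined: (n_stages - 1) cycles to fill the pipe, then one
         * instruction completes per cycle; the clock is the longest stage. */
        double t_piped = (n_stages - 1 + n_instr) * longest;

        printf("unpipelined %.0f ns, pipelined %.0f ns, speedup %.2fx\n",
               t_single, t_piped, t_single / t_piped);
        return 0;
    }

With five balanced 2 ns stages and a long run the speedup approaches 5x; lengthening any one stage, or shortening the run, drops it below that bound.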
Pipelining MIPS
• What makes it easy with MIPS?
– all instructions are the same length
• so the fetch and decode stages are similar for all instructions
– just a few instruction formats
• simplifies instruction decode and makes it possible in one stage (see the decode sketch after this list)
– memory operands appear only in loads and stores
• so memory access can be deferred to exactly one later stage
– operands are aligned in memory
• so one data transfer instruction requires exactly one memory-access stage
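A small sketch (mine, not from the slides) of why fixed-length, few-format instructions make decode cheap: every field sits at a fixed bit position in the 32-bit word, so one stage of shifts and masks recovers them all.

    #include <stdint.h>
    #include <stdio.h>

    /* Field layout shared by the MIPS R- and I-formats. */
    typedef struct {
        uint32_t op, rs, rt, rd, shamt, funct;
        int32_t  imm;                  /* sign-extended 16-bit immediate */
    } mips_fields;

    static mips_fields decode(uint32_t instr) {
        mips_fields f;
        f.op    = (instr >> 26) & 0x3f;
        f.rs    = (instr >> 21) & 0x1f;
        f.rt    = (instr >> 16) & 0x1f;
        f.rd    = (instr >> 11) & 0x1f;
        f.shamt = (instr >>  6) & 0x1f;
        f.funct =  instr        & 0x3f;
        f.imm   = (int32_t)(int16_t)(instr & 0xffff);   /* sign extend */
        return f;
    }

    int main(void) {
        uint32_t lw = 0x8d280020u;     /* lw $t0, 32($t1) */
        mips_fields f = decode(lw);
        printf("op=%u rs=%u rt=%u imm=%d\n", f.op, f.rs, f.rt, f.imm);
        return 0;
    }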
Pipelining MIPS
• What makes it hard?
– structural hazards: different instructions, at different stages of the
pipeline, want to use the same hardware resource (see the figure and sketch below)
– control hazards: the next instruction to put into the pipeline depends
on the outcome of a branch instruction that is still in the pipeline
– data hazards: an instruction in the pipeline needs data that is still
being computed by an earlier instruction in the pipeline
[Figure: pipelined lw $3, 300($0) and lw $4, 400($0), 2 ns per stage, annotated "Hazard if single memory" – with one unified memory, a later instruction's instruction fetch falls in the same cycle as an earlier instruction's data access.]
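A toy sketch (mine) of the single-memory structural hazard: assuming one instruction issued per cycle and no stalls, a load/store reaches its memory-access stage (stage 4 of 5) in the same cycle that the instruction three slots behind it is being fetched, so with one unified memory the two accesses collide.

    #include <stdbool.h>
    #include <stdio.h>

    int main(void) {
        /* A short all-load instruction stream, like the lw sequence above. */
        bool is_mem_op[6] = { true, true, true, true, true, true };
        int  n = 6;

        for (int i = 0; i < n; i++) {
            if (!is_mem_op[i]) continue;
            int j = i + 3;              /* instruction fetched during i's MEM */
            if (j < n)
                printf("cycle %d: instr %d data access vs instr %d fetch\n",
                       i + 3, i, j);
        }
        return 0;
    }

Splitting instruction and data memories (or caches) removes this conflict, which is the usual fix in the 5-stage MIPS pipeline.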
[Figure: program execution order vs. time – add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). The lw behind the branch is held back by a bubble (pipeline stall), starting 4 ns rather than 2 ns after the beq, until the branch outcome is known. Note that the branch outcome is computed in the ID stage with added hardware (later…).]
Control Hazards
• Solution 2: Predict branch outcome
– e.g., predict branch-not-taken:
[Figure: prediction success – add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0) each start 2 ns apart; the not-taken prediction is correct, so the lw flows through with no stall.]
[Figure: prediction failure – add $4, $5, $6; beq $1, $2, 40; the branch is taken, so the wrongly fetched lw is undone (flushed) into bubbles and the target instruction or $7, $8, $9 starts 4 ns after the beq.]
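A rough cost model in C for predict-not-taken (the branch frequency and taken rate below are made-up numbers, not from the slides): every taken branch flushes the wrongly fetched instruction, adding one bubble when the outcome is known in ID.

    #include <stdio.h>

    int main(void) {
        /* Hypothetical workload parameters (assumptions, not course data). */
        double branch_frac   = 0.17;   /* fraction of instructions that branch */
        double taken_frac    = 0.40;   /* fraction of branches actually taken  */
        int    flush_penalty = 1;      /* bubbles per misprediction (ID branch)*/

        double cpi_predict = 1.0 + branch_frac * taken_frac * flush_penalty;
        double cpi_stall   = 1.0 + branch_frac * flush_penalty;

        printf("CPI, predict-not-taken: %.3f\n", cpi_predict);
        printf("CPI, always stall:      %.3f\n", cpi_stall);
        return 0;
    }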
Control Hazards
• Solution 3: Delayed branch – always execute the sequentially next
instruction, with the branch taking effect after a one-instruction delay.
It is the compiler's job to fill the delay slot with an instruction that is
independent of the branch outcome, e.g. one from before the branch that
does not feed the branch condition.
• MIPS does this
[Figure: instruction pipeline diagram for add $s0, $t0, $t1 – the five stages IF, ID, EX, MEM, WB laid out along the time axis; shading indicates when a unit is used, left half = write, right half = read.]
[Figure: program execution order vs. time – add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each drawn as IF ID EX MEM WB. Without forwarding (blue line) the sub would need $s0 before the add has written it – the data would have to go back in time; with forwarding (red line) the ALU result is passed directly to the sub, so the data is available in time.]
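A sketch of the EX-stage forwarding test for this add/sub pair (field names such as reg_write and rd are my own; the course's datapath labels may differ):

    #include <stdbool.h>
    #include <stdio.h>

    /* State carried by an older instruction further down the pipeline. */
    typedef struct {
        bool reg_write;    /* will it write a register?   */
        int  rd;           /* destination register number */
    } older_instr;

    /* Forward the older result straight into the ALU input if the older
     * instruction writes the register the current instruction reads.   */
    static bool forward_needed(older_instr older, int src_reg) {
        return older.reg_write && older.rd != 0 && older.rd == src_reg;
    }

    int main(void) {
        /* add $s0, $t0, $t1 ($s0 = reg 16) in EX/MEM; sub reads $s0. */
        older_instr add_in_ex_mem = { .reg_write = true, .rd = 16 };
        int sub_rs = 16;

        printf("forward EX/MEM result to sub? %s\n",
               forward_needed(add_in_ex_mem, sub_rs) ? "yes" : "no");
        return 0;
    }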
• Reordered code (load/store order interchanged relative to the original):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
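For context, a C-level sketch of what this lw/sw sequence computes – a swap of the two words at 0($t1) and 4($t1) (variable names are mine):

    #include <stdio.h>

    /* Load both words first, then store them back interchanged. */
    static void swap_words(int *base) {
        int t0 = base[0];    /* lw $t0, 0($t1) */
        int t2 = base[1];    /* lw $t2, 4($t1) */
        base[1] = t0;        /* sw $t0, 4($t1) */
        base[0] = t2;        /* sw $t2, 0($t1) */
    }

    int main(void) {
        int a[2] = {10, 20};
        swap_words(a);
        printf("%d %d\n", a[0], a[1]);   /* prints 20 10 */
        return 0;
    }

Issuing both loads before either store separates each lw from the sw that needs its value, so neither store has to wait on a just-loaded result.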
Pipelined Datapath
• We now move to actually building a pipelined datapath
• First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
• Review: single-cycle processor
– all 5 steps done in a single clock cycle
– dedicated hardware required for each step
• What happens if we break the execution into multiple cycles, but keep
the extra hardware?
Review - Single-Cycle Data-path “Steps”
[Figure: the single-cycle datapath – PC, adders and shift-left-2 for PC+4 and the branch target, instruction memory (IF); register file read and sign extend (ID); ALU with Zero output and operand mux (EX – Execute/Address Calc.); data memory (MEM); write-back mux into the register file (WB).]
Pipelined Datapath – Key Idea
• What happens if we break the execution into multiple cycles,
but keep the extra hardware?
– Answer: We may be able to start executing a new instruction at each
clock cycle - pipelining
• …but we shall need extra registers to hold data between
cycles – pipeline registers
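A rough sketch, in C, of what the four pipeline registers might carry between stages (field names and types are illustrative; the exact widths appear on the next slide):

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {               /* IF/ID: fetched instruction and PC+4 */
        uint32_t instr, pc_plus4;
    } if_id_reg;

    typedef struct {               /* ID/EX: register values, immediate, PC+4 */
        uint32_t pc_plus4, read_data1, read_data2;
        int32_t  sign_ext_imm;
        uint8_t  rt, rd;           /* candidate destination registers */
    } id_ex_reg;

    typedef struct {               /* EX/MEM: branch target, ALU result, store data */
        uint32_t branch_target, alu_result, store_data;
        uint8_t  zero_flag, write_reg;
    } ex_mem_reg;

    typedef struct {               /* MEM/WB: loaded data, ALU result, destination */
        uint32_t mem_data, alu_result;
        uint8_t  write_reg;
    } mem_wb_reg;

    int main(void) {
        if_id_reg f = { .instr = 0x8d280020u,    /* lw $t0, 32($t1) */
                        .pc_plus4 = 0x00400004u };
        printf("IF/ID holds instr=0x%08x pc+4=0x%08x\n", f.instr, f.pc_plus4);
        return 0;
    }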
Pipelined Datapath
Pipeline registers must be wide enough to hold the data coming into them (64, 128, 97, and 64 bits for IF/ID, ID/EX, EX/MEM, and MEM/WB respectively).
[Figure: the single-cycle datapath with the four pipeline registers inserted between the IF, ID, EX, MEM, and WB stages – PC and instruction memory, register file and sign extend, ALU and branch adder, data memory, and write-back mux as before.]
[Figure: multiple-clock-cycle pipeline diagram, clock cycles CC 1 – CC 8 along the time axis, with lw $t0, 10($t1) flowing through IM, REG, ALU, DM, REG.]
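To make the multiple-clock-cycle picture concrete, a toy C sketch (mine, not the course's) that prints which stage a few example instructions from earlier slides occupy in each cycle, assuming one instruction issued per cycle with no stalls:

    #include <stdio.h>

    int main(void) {
        const char *stage[5] = {"IM ", "REG", "ALU", "DM ", "REG"};
        const char *instr[3] = {"lw  $t0, 10($t1)",
                                "add $4, $5, $6",
                                "or  $7, $8, $9"};
        const int n_instr = 3, n_stages = 5;
        const int n_cycles = n_instr - 1 + n_stages;

        printf("%-20s", "");                       /* header: clock cycles */
        for (int c = 1; c <= n_cycles; c++) printf("CC%-3d", c);
        printf("\n");

        /* Instruction i occupies stage (c - i) during cycle c (0-based). */
        for (int i = 0; i < n_instr; i++) {
            printf("%-20s", instr[i]);
            for (int c = 0; c < n_cycles; c++) {
                int s = c - i;
                printf("%-5s", (s >= 0 && s < n_stages) ? stage[s] : "");
            }
            printf("\n");
        }
        return 0;
    }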