Slide 4
Objectives
• Design the processor such that
• the clock period (T) is shorter than that of a single-cycle processor [similar to a multi-cycle one]
• IPC (1/CPI) is 1
Problems of Multi-cycle Processor
• The fundamental problem
• The slowest instruction, lw, is split into 5 steps
• The processor's clock cycle time does not improve 5 times (185 ps)
• The steps take unequal lengths of time
• Only one stage is busy at a time; the remaining stages are idle
• 5 non-architectural registers and an additional multiplexer are needed
Comparison of Single & Multi-Cycle MIPS Processor
Instruction     Multi-cycle (clock cycles)    Single-cycle (clock cycles)
LW              5                             1
SW              4                             1
R-type          4                             1
BEQ             3                             1
ADDI            4                             1
J               3                             1
CPI             > 1                           1
CLK period T    Shorter than single-cycle     Longer than multi-cycle
• Cycles Per Instruction (CPI); Instructions Per Cycle (IPC) = 1/CPI
• Program execution time: #instr. x CPI x Clk (T)
• Single-cycle: non-shared FUs, CPI = 1 or IPC = 1, clock period (Tsingle) = slowest instr. in ISA
• Multi-cycle: shared FUs, CPI > 1 or IPC < 1, clock period Tmulti < Tsingle
• Can we have a microprocessor with IPC = 1 and a clock period T < Tmulti < Tsingle?
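The relation between CPI, IPC and execution time can be checked numerically. The following is a minimal Python sketch that uses the multi-cycle cycle counts listed in the comparison; the instruction mix and the names `average_cpi` and `execution_time` are assumptions made up for illustration, not values from the slides.

```python
# Cycles per instruction class on the multi-cycle design (from the comparison).
MULTI_CYCLE_CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "addi": 4, "j": 3}


def average_cpi(mix, cycles=MULTI_CYCLE_CYCLES):
    """Weighted-average CPI for an instruction mix given as fractions summing to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix must sum to 1")
    return sum(fraction * cycles[name] for name, fraction in mix.items())


def execution_time(n_instr, cpi, clock_period):
    """Program execution time = #instr. x CPI x clock period (T)."""
    return n_instr * cpi * clock_period


# Hypothetical instruction mix, chosen only for illustration.
mix = {"lw": 0.25, "sw": 0.10, "r-type": 0.45, "beq": 0.15, "j": 0.05}
cpi = average_cpi(mix)
ipc = 1.0 / cpi
print(f"multi-cycle CPI = {cpi:.2f}, IPC = {ipc:.3f}")
```

With any such mix the multi-cycle CPI stays above 1, so the shorter clock period is partly offset by the extra cycles; a pipeline aims to keep the short period while bringing CPI back to 1.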
Problems of Multi-cycle Processor
Only one stage is busy and the remaining stages are idle at any time
Book- P&H-COD
Pipeline in a Chemical Plant
[Figure: chemical plant pipeline; labels: Water, Additive, Steam, Filter, Mixer, Boiler]
Pipeline in the Instruction Execution
Time (cycle)   1  2  3  4  5
Stage-1        1  2  3  4  5
Stage-2           1  2  3  4
Stage-3              1  2  3
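The stage-versus-time view above can be generated for any number of instructions and stages. The following Python sketch is only an illustration of an ideal pipeline with no stalls; the function names `pipeline_diagram` and `render` are hypothetical.

```python
def pipeline_diagram(n_instr, n_stages):
    """Return rows[s][c]: the instruction number in stage s during cycle c, or None if idle.

    Assumes an ideal pipeline: one instruction enters per cycle and nothing stalls.
    """
    total_cycles = n_stages + n_instr - 1
    rows = []
    for stage in range(n_stages):
        row = []
        for cycle in range(total_cycles):
            instr = cycle - stage + 1  # instruction i reaches stage s in cycle i + s
            row.append(instr if 1 <= instr <= n_instr else None)
        rows.append(row)
    return rows


def render(rows):
    """Format the diagram as text, printing '.' for an idle stage."""
    lines = []
    for number, row in enumerate(rows, start=1):
        cells = " ".join(f"{i:>2}" if i is not None else " ." for i in row)
        lines.append(f"Stage-{number} {cells}")
    return "\n".join(lines)


rows = pipeline_diagram(n_instr=5, n_stages=3)
print(render(rows))
```

In this ideal model, n instructions on a k-stage pipeline finish in k + n - 1 cycles, so IPC approaches 1 as n grows.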
Multiplexer delay: Tmux = 25
Delay elements and stage registers
[Figure: datapath blocks PC, IM, RF, ALU and DM]
Datapath for R-type: ADD R1, R2, R3
Field:   op       rs       rt       rd       shamt    funct
Width:   6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
Bits:    31-26    25-21    20-16    15-11    10-6     5-0
Stage registers: IF_ID, ID_EXE, EXE_MEM
[Figure: R-type datapath through PC, IM, RF and ALU]
Datapath for J-type: J Offset
Field:   op       address
Width:   6 bits   26 bits
Bits:    31-26    25-0
Stage registers: IF_ID
[Figure: J-type datapath through PC and IM]
Datapath for I-type: LW R1, #5(R3)
Field:   op       rs       rt       Offset
Width:   6 bits   5 bits   5 bits   16 bits
Bits:    31-26    25-21    20-16    15-0
Stage registers: IF_ID, ID_EXE, EXE_MEM, MEM_WB
[Figure: LW datapath through PC, IM, RF, ALU and DM]
Datapath for I-type: SW R1, #5(R3)
Field:   op       rs       rt       Offset
Width:   6 bits   5 bits   5 bits   16 bits
Bits:    31-26    25-21    20-16    15-0
Stage registers: IF_ID, ID_EXE, EXE_MEM, MEM_WB
[Figure: SW datapath through PC, IM, RF, ALU and DM]
Datapath for LW R1, #5(R3) & R-type
Field:   op       rs       rt       rd       shamt    funct
Width:   6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
Bits:    31-26    25-21    20-16    15-11    10-6     5-0
Stage registers: IF_ID, ID_EXE, EXE_MEM, MEM_WB
[Figure: shared LW and R-type datapath through PC, IM, RF, ALU and DM]
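The field layouts of the R-, I- and J-type formats can be illustrated by slicing a 32-bit instruction word. The following is a minimal Python sketch; the function name `decode` is hypothetical, and the two example words assume the standard MIPS32 encodings (opcode 0 with funct 0x20 for ADD, opcode 0x23 for LW).

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats.

    Which fields are meaningful depends on the format selected by `op`.
    """
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,      # bits 31-26, all formats
        "rs": (word >> 21) & 0x1F,      # bits 25-21, R- and I-type
        "rt": (word >> 16) & 0x1F,      # bits 20-16, R- and I-type
        "rd": (word >> 11) & 0x1F,      # bits 15-11, R-type
        "shamt": (word >> 6) & 0x1F,    # bits 10-6,  R-type
        "funct": word & 0x3F,           # bits 5-0,   R-type
        "offset": word & 0xFFFF,        # bits 15-0,  I-type
        "address": word & 0x3FFFFFF,    # bits 25-0,  J-type
    }


# ADD R1, R2, R3 -> rd = 1, rs = 2, rt = 3 (assumed standard encoding)
add_word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0x20
# LW R1, 5(R3)  -> rt = 1, base rs = 3, offset = 5 (assumed standard encoding)
lw_word = (0x23 << 26) | (3 << 21) | (1 << 16) | 5

print(hex(add_word), decode(add_word))
print(hex(lw_word), decode(lw_word))
```

Extracting every field unconditionally mirrors what the hardware does: the wires are always sliced, and the control unit decides which slices are used.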
Combined Datapath
•Stages
•Insert multiplexer
•Stage registers
•Union of registers added for each instruction
Combined Datapath
[Figure: combined datapath, including the Jump path]
Instr.   Jump  RegDst  RegWrite  ALUSrc  Branch  ALUOp1  ALUOp0  MemRead  MemWrite  MemtoReg
R-type   0     1       1         0       0       1       0       0        0         0
lw       0     0       1         1       0       0       0       1        0         1
sw       0     x       0         1       0       0       0       0        1         x
addi     0     0       1         1       0       0       0       0        0         0
B-type   0     x       0         0       1       0       1       0        0         x
J-type   1     x       0         x       x       x       x       0        0         x
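The control table above maps directly onto a lookup structure. The following Python sketch is one possible software model of it, not the slides' own implementation; the names `SIGNALS`, `CONTROL_TABLE` and `control_signals` are hypothetical, and `None` is used to stand for a don't-care (x).

```python
SIGNALS = ("Jump", "RegDst", "RegWrite", "ALUSrc", "Branch",
           "ALUOp1", "ALUOp0", "MemRead", "MemWrite", "MemtoReg")

X = None  # don't-care entry (x in the table)

# One row per instruction class, copied from the control table.
CONTROL_TABLE = {
    "R-type": (0, 1, 1, 0, 0, 1, 0, 0, 0, 0),
    "lw":     (0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
    "sw":     (0, X, 0, 1, 0, 0, 0, 0, 1, X),
    "addi":   (0, 0, 1, 1, 0, 0, 0, 0, 0, 0),
    "B-type": (0, X, 0, 0, 1, 0, 1, 0, 0, X),
    "J-type": (1, X, 0, X, X, X, X, 0, 0, X),
}


def control_signals(instr_class):
    """Return a {signal name: value} mapping for one instruction class."""
    return dict(zip(SIGNALS, CONTROL_TABLE[instr_class]))


for name in CONTROL_TABLE:
    asserted = [s for s, v in control_signals(name).items() if v == 1]
    print(f"{name:7s} asserts {', '.join(asserted) or 'nothing'}")
```

Note that the signals that change architectural state (RegWrite, MemWrite) are never don't-cares, since a wrong value there would corrupt registers or memory.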
Control generation: Multi-cycle
Stage        Control signals
Fetch        (starting state)
Decode       Jump
Execute      ALUSrc, ALUOp, RegDst
Memory       Branch, MemRead, MemWrite
Write Back   MemtoReg, RegWrite
[Figure: state diagram starting at IF (T0) and ID (T1), then branching per instruction (J, ADD, LW, SW, ADDI, BNE) into EXE states (T2, T6, T8, T10) and a WB state (T4)]
Control generation: Strategy - 1
[Figure: control generation on the datapath; PC and IM shown]
Instr      Jump  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format   0     1       1       0       0       0       0        0         1         0
lw         0     0       0       0       1       0       1        0         1         1
sw         0     x       0       0       1       0       0        1         0         x
beq        0     x       0       1       0       1       0        0         0         x
• How do we keep the control signals generated for the i-th instruction separate from those being generated for the (i+1)-th instruction?
• Otherwise, erroneous control signals can be generated
• Solution: extend the pipeline registers to store the control signals' values, so that each instruction's signals travel with it from stage to stage
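The idea of carrying control values in the pipeline registers can be sketched in software. The Python model below is an assumption-laden illustration, not the slides' design: it tracks control signals only (no data), uses the register names ID_EXE, EXE_MEM and MEM_WB, groups signals by the stage that consumes them, and treats don't-cares as 0. All function names are hypothetical.

```python
EX_SIGNALS = ("ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")
ALL_SIGNALS = EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS


def control_word(op, **asserted):
    """Build a full control word; signals not listed default to 0."""
    word = {name: 0 for name in ALL_SIGNALS}
    word.update(asserted)
    word["op"] = op  # kept only so the trace is readable
    return word


DECODE = {
    "nop": control_word("nop"),
    "add": control_word("add", RegDst=1, ALUOp=0b10, RegWrite=1),
    "lw": control_word("lw", ALUSrc=1, MemRead=1, RegWrite=1, MemtoReg=1),
    "sw": control_word("sw", ALUSrc=1, MemWrite=1),
}


def keep(word, names):
    """Copy only the signals that later stages still need."""
    return {key: word[key] for key in names + ("op",)}


def run(program):
    """Clock the control pipeline once per instruction, then drain it with nops."""
    id_exe = keep(DECODE["nop"], ALL_SIGNALS)
    exe_mem = keep(DECODE["nop"], MEM_SIGNALS + WB_SIGNALS)
    mem_wb = keep(DECODE["nop"], WB_SIGNALS)
    trace = []
    for op in list(program) + ["nop"] * 3:
        # Update back to front so each register latches last cycle's upstream value.
        mem_wb = keep(exe_mem, WB_SIGNALS)
        exe_mem = keep(id_exe, MEM_SIGNALS + WB_SIGNALS)
        id_exe = keep(DECODE[op], ALL_SIGNALS)
        trace.append({"ID_EXE": id_exe, "EXE_MEM": exe_mem, "MEM_WB": mem_wb})
    return trace


trace = run(["lw", "add", "sw"])
for cycle, regs in enumerate(trace, start=1):
    print(cycle, {name: reg["op"] for name, reg in regs.items()})
```

Because each register keeps only what downstream stages need, the stored control word shrinks from stage to stage, and the signals for instruction i can never be overwritten by those decoded for instruction i+1.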
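Pipelined Datapath & Control
Pipelined Control Signals
• Decode (D), generated by the Control Unit from the opcode: RegWriteD, MemtoRegD, MemWriteD, MemReadD, BranchD, ALUOpD[1:0], ALUSrcD, RegDstD, JumpD
• Execute (E), control pipeline Regs of 9 bits: RegWriteE, MemtoRegE, MemWriteE, MemReadE, BranchE, ALUOpE[1:0], ALUSrcE, RegDstE
• Memory (M), control pipeline Regs of 5 bits: RegWriteM, MemtoRegM, MemWriteM, MemReadM, BranchM
• Write Back (W), control pipeline Regs of 2 bits: RegWriteW, MemtoRegW
[Figure: Control Unit and ALUDec with the control pipeline above the data pipeline (PC, RF, ALU, DM)]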
[Figure: five CL blocks driven by CLK]
Interconnects (Bus) vs. Functional Units (FUs)
               Fewer FUs                    More FUs
Fewer buses    Single-bus & Single-FU       Single-bus & Many-FUs
               (Multi-Cycle, IPC < 1)       (Multi-Cycle, IPC < 1)
More buses     Many-bus & Single-FU         Many-bus & Many-FUs
               (Multi-Cycle, IPC < 1)       (Single-Cycle or Pipeline, IPC = 1)
Methods/Algorithms: 1) Multi-Cycle 2) Single-Cycle 3) Pipelined
• Pipeline: IPC = 1 (borrowed from single-cycle) and a shorter clock period T (borrowed from multi-cycle); the buses and FUs are shared by more than one instruction.
• Program execution time: #instr. x (1/IPC) x Clk (T)
[Figure: Instruction-1, Instruction-2 and Instruction-3 shown as spatial parallelism]
Static Multiple Issue MIPS Processor
• Two-issue instruction
• Integer ALU operations: ADD, BNE, etc.
• Data transfer operations: LW & SW
• What did we do when we merged two instructions?
• Integer ALU operations: ADD, BNE, etc.
• Data transfer operations: LW & SW
Book-COD-P&H, CH-4
Static Multiple Issue Processor
In some designs, the compiler takes full responsibility for removing all
hazards, scheduling the code and inserting no-ops so that the code
executes without any need for hazard detection or hardware-generated
stalls.
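Compiler-side scheduling for a two-issue machine can be sketched as packing instructions into packets with one ALU/branch slot and one load/store slot. The Python code below is a deliberately simplified, in-order illustration under stated assumptions: it only checks dependences inside a packet, never reorders code, and never pairs anything after a branch. The names `Instr`, `schedule` and the example program are hypothetical, and a real scheduler must also handle latencies between packets (for example load-use).

```python
from collections import namedtuple

Instr = namedtuple("Instr", "op dest srcs")
NOP = Instr("nop", None, ())
MEM_OPS = {"lw", "sw"}
BRANCH_OPS = {"beq", "bne", "j"}


def slot_of(instr):
    """Data transfers use the 'mem' slot; ALU operations and branches use 'alu'."""
    return "mem" if instr.op in MEM_OPS else "alu"


def independent(first, second):
    """True if `second` neither reads nor rewrites the register `first` writes."""
    if first.dest is None:
        return True
    return first.dest not in second.srcs and first.dest != second.dest


def schedule(program):
    """Greedily pair each instruction with its successor, padding with nops."""
    packets, i = [], 0
    while i < len(program):
        packet = {"alu": NOP, "mem": NOP}
        first = program[i]
        packet[slot_of(first)] = first
        i += 1
        if i < len(program) and first.op not in BRANCH_OPS:
            second = program[i]
            if slot_of(second) != slot_of(first) and independent(first, second):
                packet[slot_of(second)] = second
                i += 1
        packets.append(packet)
    return packets


program = [
    Instr("lw", "t0", ("s1",)),
    Instr("add", "t2", ("t3", "t4")),
    Instr("add", "t5", ("t0", "t2")),
    Instr("sw", None, ("t5", "s1")),
    Instr("bne", None, ("s1", "zero")),
]
packets = schedule(program)
for number, packet in enumerate(packets, start=1):
    print(number, packet["alu"].op, "|", packet["mem"].op)
```

Here the five instructions fit into three packets; the second packet carries a nop in its load/store slot because the store depends on the add issued alongside it.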
ISA design steps
• Step-1:
• Find the instructions needed for the algorithm(s)
• Step-2: [Microarchitecture design]
• Choose the strategy (shared bus/single-cycle/multi-cycle/pipeline [in order]/etc.) for the datapath
• Design the datapath and its components for each instruction
• Step-3:
• Design the combined datapath for all instructions
• Step-4:
• Decide the clock period based on the critical path [timing analysis]
• Add the setup time, clock-to-Q delay, etc. to the chosen clock period, and check that the hold-time constraints are satisfied
• Step-5:
• Identify the control signals on the combined datapath
• Step-6:
• Design the Control Unit (H/W or S/W) that generates these control signals, based on the strategy (shared bus/single-cycle/multi-cycle/pipeline [in order]/etc.) chosen for the datapath
• Step-7:
• Test and verify the designed processor
• How about a single-purpose microprocessor like the MinMax microprocessor?
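Step-4 can be illustrated with a small calculation. In the Python sketch below, every delay value is a made-up number in picoseconds, chosen only for illustration, and the function names are hypothetical; the model charges one clock-to-Q and one setup time per cycle and does not model hold-time checks.

```python
def single_cycle_period(stage_delays, t_clk_to_q, t_setup):
    """All logic sits between two registers, so the delays add up."""
    return t_clk_to_q + sum(stage_delays.values()) + t_setup


def pipelined_period(stage_delays, t_clk_to_q, t_setup):
    """Each stage sits between stage registers; the slowest stage sets the clock."""
    return t_clk_to_q + max(stage_delays.values()) + t_setup


# Hypothetical logic delays per stage, in ps.
stage_delays = {"IF": 200, "ID": 100, "EXE": 150, "MEM": 200, "WB": 100}
T_CLK_TO_Q = 10  # hypothetical, ps
T_SETUP = 10     # hypothetical, ps

t_single = single_cycle_period(stage_delays, T_CLK_TO_Q, T_SETUP)
t_pipe = pipelined_period(stage_delays, T_CLK_TO_Q, T_SETUP)
print(f"single-cycle T = {t_single} ps, pipelined T = {t_pipe} ps, "
      f"ratio = {t_single / t_pipe:.2f}")
```

With these numbers the clock improves by 3.5 times rather than 5 times on a 5-stage split, because the stages are unequal and every stage pays the register overhead.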
Applying Pipeline Technique in Other Processors
•MinMax Processor
•Recording is available in the Google-classroom
•Simple CPU
•Recording is available in the Google-classroom
Homework
• Implement the pipelined MIPS ISA using Verilog HDL and C++
• Convert the MinMax microprocessor into a pipelined MinMax microprocessor
• Design the pipelined MinMax microprocessor using Verilog HDL and C++
• How does Intel manage to run CISC-type code on a RISC-based pipeline?
Summary
• Limitation of Multi-cycle approach
• CPI vs. IPC
• Comparison between single-cycle and pipelined approaches
• Views of pipeline in operation
• Comparison of datapaths
• Design tradeoffs of microprocessors
• Datapath and CU for pipelined processor