Computer Architecture

Design, Analysis, Execution and Optimization of Instructions


Datapath & CU for Pipelined Microprocessor: MIPS

Objectives
•Design the processor such that
• Clock period (T) is shorter than that of a single-cycle processor
[similar to a multi-cycle one]
• IPC (1/CPI) is 1
Comparison of Single & Multi-Cycle MIPS Processor

Instructions    Multi-cycle              Single-cycle
                (Clock-cycle)            (Clock-cycle)
LW              5                        1
SW              4                        1
R-type          4                        1
BEQ             3                        1
ADDI            4                        1
J               3                        1
CPI             >1                       1
CLK Period: T   Less than Single-cycle   More than Multi-Cycle

Single Cycle: Non-shared FUs, CPI = 1 or IPC = 1, clock period (Tsingle) = slowest instr. in ISA
Multi-cycle: Shared FUs, CPI > 1 or IPC < 1, clock period: Tmulti < Tsingle
Can we have a microprocessor like: IPC = 1 & clock period [< Tmulti < Tsingle]?
Cycles Per Instruction (CPI)
Instructions Per Cycle/Seconds (IPC) = 1/CPI
Program Execution time: #instr. x CPI x Clk (T)

Problems of Multi-cycle Processor
• The fundamental problem
• The slowest instruction, lw, is split into 5 steps
• The processor’s clock cycle time does not improve 5 times, 185 ps
• The steps take unequal lengths of time
• Only one stage is busy and the remaining stages are idle
• 5 non-architectural registers and an additional multiplexer
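The execution-time relation above (#instr. x CPI x T) can be checked numerically. This is a minimal sketch: the per-instruction cycle counts come from the comparison table, while the instruction mix and the 250 ps multi-cycle clock are assumptions chosen only for illustration. The 950 ps single-cycle period is the latency quoted later in these slides.

```python
# Cycles per instruction class on the multi-cycle processor (from the table).
MULTI_CYCLE_CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "addi": 4, "j": 3}

# ASSUMPTION: an illustrative instruction mix (fractions sum to 1); not from the slides.
ASSUMED_MIX = {"lw": 0.25, "sw": 0.10, "r-type": 0.52, "beq": 0.11, "j": 0.02}


def average_cpi(cycles, mix):
    """Weighted average CPI for a given instruction mix."""
    return sum(mix[name] * cycles[name] for name in mix)


def execution_time_ps(instr_count, cpi, clock_period_ps):
    """Program execution time = #instr. x CPI x T."""
    return instr_count * cpi * clock_period_ps


multi_cpi = average_cpi(MULTI_CYCLE_CYCLES, ASSUMED_MIX)
single_time = execution_time_ps(1000, 1, 950)         # single-cycle: CPI = 1
multi_time = execution_time_ps(1000, multi_cpi, 250)  # ASSUMPTION: Tmulti = 250 ps
print(f"multi-cycle CPI = {multi_cpi:.2f}, IPC = {1 / multi_cpi:.3f}")
print(f"single-cycle: {single_time} ps, multi-cycle: {multi_time:.0f} ps")
```

Under these assumed numbers the multi-cycle design is not faster despite its shorter clock, because its CPI is about 4. That is the motivation for asking whether IPC = 1 and a short clock period can be had together.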
Problems of Multi-cycle Processor
Only one stage is busy and remaining stages are idle at anytime

Book- P&H-COD
Pipeline in a Chemical Plant

[Figure: chemical plant pipeline with Filter, Mixer, and Boiler stages; inputs: Water, Additive, Steam]
Pipeline in the Instruction Execution

Memory Words -> Instruction Fetch [Stage-1] -> Instruction Decode [Stage-2] -> Instruction Execution [Stage-3] -> Results

Stage-1  1 2 3 4 5
Stage-2    1 2 3 4
Stage-3      1 2 3
         Time ->
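The stage/time chart above can be generated for any number of stages and instructions. The sketch below assumes an ideal pipeline with no stalls; the function names are made up for this illustration.

```python
def pipeline_diagram(num_instr, num_stages):
    """Return one row per stage; each cell is the instruction in that stage that cycle."""
    cycles = num_stages + num_instr - 1
    rows = []
    for stage in range(num_stages):
        cells = []
        for cycle in range(cycles):
            instr = cycle - stage  # index of the instruction occupying this stage now
            cells.append(str(instr + 1) if 0 <= instr < num_instr else ".")
        rows.append(cells)
    return rows


def total_cycles(num_instr, num_stages):
    """Cycles needed to finish num_instr instructions in a stall-free pipeline."""
    return num_stages + num_instr - 1


for index, row in enumerate(pipeline_diagram(5, 3), start=1):
    print(f"Stage-{index}", " ".join(row))
print("cycles:", total_cycles(5, 3))
```

The first five columns reproduce the chart; the remaining columns show the pipeline draining. For a large instruction count the fill and drain cycles become negligible, so the throughput approaches one instruction per cycle.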

What is the difference in this Analogical or Parallel reasoning?


Pipelined MIPS-based processor
• Partitioning the instruction execution cycle (function)
• Subfunctions
• Input of one subfunction comes TOTALLY from the output of
previous subfunctions
• Other than inputs & outputs, there are no interrelationships between
subfunctions
• Hardware may be developed (stage) to execute each subfunction
• Each hardware unit’s evaluation time is usually approximately equal
Pipelined MIPS-based processor
•Powerful way to improve the throughput
•Divide the single-cycle implementation
• Fetch
• Decode
• Execute
• Memory
• Writeback
• A commercial MIPS processor: R2000/R3000

The latency of each instruction is unchanged, but throughput is ideally 5 times better

Pipelined MIPS-based processor


•Stage elements
• Reading & writing the memory
• Register file
• ALU operation
•Each stage takes almost the same amount of time
• Consists of one element

Comparison of timing diagram


• Delay of the elements

Element               Parameter   Delay (ps)
Register clk-to-Q     Tpcq        30
Register setup        Tsetup      20
Multiplexer           Tmux        25
ALU                   TALU        200
Memory read           Tmem        250
Register file read    tRFread     150
Register file write   tRFwrite    100
Register file setup   tRFsetup    20
Comparison of timing diagram

• Delay of MUX & register is not included

Timing diagram of (a) single-cycle processor (b) pipelined processor


Book- P&H-COD
Comparison of timings

•Single-cycle processor
• Instruction latency is 950 ps
• Throughput: 1 instruction per 950 ps
• 1.05 billion instructions per second

•Pipelined processor
• Length of pipeline stage is 250 ps (mem. access)
• Instruction latency is 5*250 = 1250 ps
• Throughput: 1 instruction per 250 ps
• 4 billion instructions per second
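These figures follow from the element delays. The sketch below assumes, as the timing-diagram slide states, that MUX and register delays are not included, and that the 950 ps latency is the sum of the five element delays (which matches the delay table). The stage labels are illustrative.

```python
# Element delays in ps from the delay table (MUX and register delays not included).
STAGE_DELAY_PS = {
    "IF (memory read)": 250,
    "ID (register file read)": 150,
    "EX (ALU)": 200,
    "MEM (memory read)": 250,
    "WB (register file write)": 100,
}


def instr_per_second(ps_per_instr):
    """Throughput in instructions per second for one instruction every ps_per_instr."""
    return 1e12 / ps_per_instr


single_cycle_latency = sum(STAGE_DELAY_PS.values())     # one long cycle
stage_length = max(STAGE_DELAY_PS.values())             # slowest stage sets the clock
pipelined_latency = len(STAGE_DELAY_PS) * stage_length  # five equal-length stages

print("single-cycle latency:", single_cycle_latency, "ps")
print("pipeline stage length:", stage_length, "ps")
print("pipelined latency:", pipelined_latency, "ps")
print(f"single-cycle throughput: {instr_per_second(single_cycle_latency) / 1e9:.2f} G instr/s")
print(f"pipelined throughput: {instr_per_second(stage_length) / 1e9:.2f} G instr/s")
```

Note that the latency of one instruction grows from 950 ps to 1250 ps because every stage is stretched to the length of the slowest one, while the throughput improves by 950/250 = 3.8 times rather than the ideal 5 times.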
A view of pipeline in operation
• Resource utilization

Book- P&H-COD
Delay elements and stage registers

PC -> IM (250 ps) -> [IF_ID] -> RF (150 ps) -> [ID_EXE] -> ALU (200 ps) -> [EXE_MEM] -> DM (250 ps) -> [MEM_WB]

Delay values are from the previous table.


Datapath for R-type: ADD R1, R2, R3

Field   op       rs       rt       rd       shamt    funct
Width   6-bits   5-bits   5-bits   5-bits   5-bits   6-bits
Bits    31-26    25-21    20-16    15-11    10-6     5-0

[Figure: PC -> IM -> IF_ID -> RF -> ID_EXE -> ALU -> EXE_MEM]

Reg. File’s write operation @posedge
&
Stage Reg.’s write operation @negedge
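The R-type field layout above can be exercised with a few shifts and masks. This is a sketch under two assumptions: that ADD R1, R2, R3 means R1 = R2 + R3 (destination first), and that ADD uses op = 0 and funct = 0x20 as in the standard MIPS encoding. The helper names are made up for this example.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields into a 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_type(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26
        "rs": (word >> 21) & 0x1F,     # bits 25-21
        "rt": (word >> 16) & 0x1F,     # bits 20-16
        "rd": (word >> 11) & 0x1F,     # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# ADD R1, R2, R3 -> rd = 1, rs = 2, rt = 3 (ASSUMPTION: destination-first syntax)
add_word = encode_r_type(op=0, rs=2, rt=3, rd=1, shamt=0, funct=0x20)
print(hex(add_word), decode_r_type(add_word))
```

In the pipeline, the decode stage reads rs and rt from the register file, while rd has to travel through the stage registers so that the write-back uses the destination of the correct instruction.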
Datapath for B-type: BEQ R1, R3, offset

Field   op       rs       rt       offset
Width   6-bits   5-bits   5-bits   16-bits
Bits    31-26    25-21    20-16    15-0

[Figure: PC -> IM -> IF_ID -> RF -> ID_EXE -> ALU -> EXE_MEM]

Control Hazards: when BEQ is in the 3rd stage, which instructions will be in
stage-1 and stage-2?
Datapath for J-type: J Offset

Field   op       address
Width   6-bits   26-bits
Bits    31-26    25-0

[Figure: PC -> IM -> IF_ID]
Datapath for I-type: LW R1, #5(R3)

Field   op       rs       rt       offset
Width   6-bits   5-bits   5-bits   16-bits
Bits    31-26    25-21    20-16    15-0

[Figure: PC -> IM -> IF_ID -> RF -> ID_EXE -> ALU -> EXE_MEM -> DM -> MEM_WB]
Datapath for I-type: SW R1, #5(R3)

Field   op       rs       rt       offset
Width   6-bits   5-bits   5-bits   16-bits
Bits    31-26    25-21    20-16    15-0

[Figure: PC -> IM -> IF_ID -> RF -> ID_EXE -> ALU -> EXE_MEM -> DM -> MEM_WB]
Datapath for LW R1, #5(R3) & R-type

Field   op       rs       rt       rd       shamt    funct
Width   6-bits   5-bits   5-bits   5-bits   5-bits   6-bits
Bits    31-26    25-21    20-16    15-11    10-6     5-0

[Figure: PC -> IM -> IF_ID -> RF -> ID_EXE -> ALU -> EXE_MEM -> DM -> MEM_WB]

Combined Datapath
•Stages
•Insert multiplexer
•Stage registers
•Union of registers added for each instruction
Combined Datapath

[Figure: combined pipelined datapath, including the Jump path]

Book- P&H-COD

Control unit for Pipelined MIPS processor


• Identify the control signals
• Jump
• RegDst
• RegWrite
• ALUSrc
• Branch
• ALUOp
• MemRead
• MemWrite
• MemtoReg

Stage          Control signals
Fetch/Decode   Jump
Execute        ALUSrc, ALUOp, RegDst
Memory         Branch, MemRead, MemWrite
Write Back     MemtoReg, RegWrite

• How does one generate the control signals?
• We want to add a mineral liquid to the water flowing through a pipeline after
every 1 km
• How?
Control generation: Single-cycle

Instr.   Jump  RegDst  RegWrite  ALUSrc  Branch  ALUOp1  ALUOp0  MemRead  MemWrite  MemtoReg
R-type   0     1       1         0       0       1       0       0        0         0
lw       0     0       1         1       0       0       0       1        0         1
sw       0     x       0         1       0       0       0       0        1         x
addi     0     0       1         1       0       0       0       0        0         0
B-type   0     x       0         0       1       0       1       0        0         x
J-type   1     x       0         x       x       x       x       0        0         x
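In a single-cycle design this table is pure combinational logic: the instruction class selects one row. A minimal sketch as a lookup, with the rows copied from the table and "x" standing for don't-care; the dictionary and function names are made up for this example.

```python
SIGNALS = ("Jump", "RegDst", "RegWrite", "ALUSrc", "Branch",
           "ALUOp1", "ALUOp0", "MemRead", "MemWrite", "MemtoReg")

# One string per row of the table, in SIGNALS order; "x" is a don't-care.
CONTROL_TABLE = {
    "R-type": "0110010000",
    "lw":     "0011000101",
    "sw":     "0x0100001x",
    "addi":   "0011000000",
    "B-type": "0x0010100x",
    "J-type": "1x0xxxx00x",
}


def control_signals(instr_class):
    """Return {signal name: '0' | '1' | 'x'} for one instruction class."""
    return dict(zip(SIGNALS, CONTROL_TABLE[instr_class]))


for name in CONTROL_TABLE:
    asserted = [s for s, v in control_signals(name).items() if v == "1"]
    print(f"{name:7s} asserts: {', '.join(asserted)}")
```

A hardware version would replace the dictionary with a decoder on the opcode bits; the don't-care entries give the logic synthesizer freedom to simplify it.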
Control generation: Multi-cycle

[Figure: multi-cycle control state diagram. Starting state IF (T0), then ID (T1), branching per instruction (ADD, LW, SW, ADDI, BNE, J) into EXE, MEM, and WB states (T2 to T10).]
Control generation: Strategy - 1

[Figure: PC and IM supply the instruction; a copy of the instruction (Instr) accompanies each stage register: IF_ID, ID_EXE, EXE_MEM, MEM_WB]

Control generation: Strategy - 2

Book- P&H-COD

Control unit for Pipelined MIPS processor


Pipelined MIPS processor
Column groups: Jump | Execution/Address Calc stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc | Memory access stage control lines: Branch, MemRead, MemWrite | Write-back control lines: RegWrite, MemtoReg

Instr      Jump  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format   0     1       1       0       0       0       0        0         1         0
lw         0     0       0       0       1       0       1        0         1         1
sw         0     x       0       0       1       0       0        1         0         x
beq        0     x       0       1       0       1       0        0         0         x

Single cycle MIPS processor

Instr      Jump  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format   0     1       1       0       0       0       0        0         1         0
lw         0     0       0       0       1       0       1        0         1         1
sw         0     x       0       0       1       0       0        1         0         x
beq        0     x       0       1       0       1       0        0         0         x

Control unit for Pipelined MIPS processor


• How to generate such control signals?
• Setting the 10 control lines in each stage for each instruction
• Simplest way is the same as in single cycle
• Most of the controls can be generated at the same time, in the decoding stage

• How do we manage the control signals generated for the i-th instruction while
control signals are being generated for the (i+1)-th instruction?
• Erroneous control signals can be generated
• Extension of the pipeline registers for storing the control signals’ values
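The idea of extending the pipeline registers can be sketched as a small simulation: control values are produced once at decode and then travel with their instruction, each stage register keeping only the signals still needed downstream. The grouping of signals by stage follows these slides, and the 9-, 5-, and 2-bit widths match those shown for the control registers. Jump is left out because it is used at decode; the bubble value and function names are assumptions for this example.

```python
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")
ALL_SIGNALS = EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS

# Control values for two instruction classes, taken from the control table.
DECODE_TABLE = {
    "R-format": dict(zip(ALL_SIGNALS, (1, 1, 0, 0, 0, 0, 0, 1, 0))),
    "lw":       dict(zip(ALL_SIGNALS, (0, 0, 0, 1, 0, 1, 0, 1, 1))),
}
BUBBLE = dict.fromkeys(ALL_SIGNALS, 0)  # ASSUMPTION: empty slot = all signals 0


def keep(bundle, names):
    """Copy only the named signals into the next pipeline register."""
    return {name: bundle[name] for name in names}


def run(program):
    """Clock the control pipeline; return (ID_EXE, EXE_MEM, MEM_WB) after each cycle."""
    id_exe = dict(BUBBLE)
    exe_mem = keep(BUBBLE, MEM_SIGNALS + WB_SIGNALS)
    mem_wb = keep(BUBBLE, WB_SIGNALS)
    trace = []
    for instr in list(program) + [None, None, None]:  # extra cycles drain the pipe
        # Update back to front so each register latches its predecessor's old value.
        mem_wb = keep(exe_mem, WB_SIGNALS)                # 2 bits
        exe_mem = keep(id_exe, MEM_SIGNALS + WB_SIGNALS)  # 5 bits
        id_exe = dict(DECODE_TABLE[instr]) if instr else dict(BUBBLE)  # 9 bits
        trace.append((id_exe, exe_mem, mem_wb))
    return trace


trace = run(["lw", "R-format"])
for cycle, (id_exe, exe_mem, mem_wb) in enumerate(trace, start=1):
    print(cycle, "MemRead@MEM =", exe_mem["MemRead"], " RegWrite@WB =", mem_wb["RegWrite"])
```

Because each register holds its own copy, the signals decoded for the (i+1)-th instruction never overwrite those of the i-th instruction that is still in a later stage.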
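Pipelined Datapath & Control
Pipelined Control Signals

[Figure: the Control Unit decodes the opcode; the control pipeline registers run alongside the data pipeline (PC, RF, ALU with ALUDec, DM)]

Signal       Decode      Execute     Memory      Write Back
RegWrite     RegWriteD   RegWriteE   RegWriteM   RegWriteW
MemtoReg     MemtoRegD   MemtoRegE   MemtoRegM   MemtoRegW
MemWrite     MemWriteD   MemWriteE   MemWriteM
MemRead      MemReadD    MemReadE    MemReadM
Branch       BranchD     BranchE     BranchM
ALUOp[1:0]   ALUOpD      ALUOpE
ALUSrc       ALUSrcD     ALUSrcE
RegDst       RegDstD     RegDstE
Jump         JumpD

Control register widths: 9 bits (Decode to Execute), 5 bits (Execute to Memory), 2 bits (Memory to Write Back)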

Designing Instruction Sets for Pipelining


• MIPS instructions are all the same length
• x86 instructions vary from 1 byte to 15 bytes; is pipelining challenging?
• MIPS has only a few addressing modes
• Memory operands only appear in loads or stores in MIPS
• Operands are aligned in memory
Comparison of datapaths

CL: Combinational Logic

[Figure: three clocked datapaths.
(1) One CL block: only one instr. in the datapath at an instant of time.
(2) Several CL blocks: only one instr. What if one instruction is in each block?
(3) Five CL blocks holding instr. #5, instr. #4, instr. #3, instr. #2, instr. #1.]
Microprocessor Design Trade-offs: Interconnects Vs Functional Units Vs IPC

• (clock) Cycles Per Instruction (CPI)
• Instructions Per (clock) Cycle/Seconds (IPC) = 1/CPI

Interconnects (Bus)   Functional Units (FUs): Less   Functional Units (FUs): More
Less                  Single-bus & Single-FU         Single-bus & Many-FUs
                      (Multi-Cycle, IPC < 1)         (Multi-Cycle, IPC < 1)
More                  Many-bus & Single-FU           Many-bus & Many-FUs
                      (Multi-Cycle, IPC < 1)         (Single-Cycle or Pipeline, IPC = 1)

Methods/Algorithms:
1) Multi-Cycle
2) Single-Cycle
3) Pipelined

• Pipeline: IPC = 1 (borrowed from Single-Cycle) and a shorter clock period (T)
(borrowed from Multi-Cycle); the buses & FUs are shared by more than one
instruction.
• Program Execution time: #instr. x (1/IPC) x Clk (T)

• Can we have the IPC > 1?



Can we have IPC > 1?

Multiple issue processor


Superscalar and VLIW
IPC = 2, Superscalar or multiple issue processor
IPC = 3, Superscalar or 3 issue processor

• 3 instructions can be fetched or issued
• 3 instructions can be executed in parallel (in-order superscalar execution)

[Figure: Instruction-1, Instruction-2, and Instruction-3 issued side by side (spatial parallelism) along the pipeline depth]
Static Multiple Issue MIPS Processor
• Two issue Instruction
• Integer ALU operations
• ADD, BNE, etc
• Data transfer operations
• LW & SW
• What did we do when we merged two instr.?

Book-COD-P&H, CH-4
Static Multiple Issue Processor

Book-COD-P&H, CH-4

Very Long Instruction Word (VLIW)

If one instruction of the pair cannot be used, we require that it
be replaced with a nop. Thus, the instructions always issue in
pairs, possibly with a nop in one slot.

In some designs, the compiler takes full responsibility for removing all
hazards, scheduling the code and inserting no-ops so that the code
executes without any need for hazard detection or hardware-generated
stalls.
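The pairing rule can be illustrated with a toy packer that fills an (ALU/branch, data transfer) issue packet and pads an unused slot with a nop. This is only a sketch: it keeps program order and deliberately ignores data and control hazards, which a real compiler scheduler must handle. The opcode sets and function names are assumptions for this example.

```python
ALU_OR_BRANCH = {"add", "sub", "and", "or", "slt", "addi", "beq", "bne"}
DATA_TRANSFER = {"lw", "sw"}
NOP = "nop"


def slot_of(opcode):
    """Slot 0 takes ALU/branch instructions, slot 1 takes loads and stores."""
    if opcode in ALU_OR_BRANCH:
        return 0
    if opcode in DATA_TRANSFER:
        return 1
    raise ValueError(f"unknown opcode: {opcode}")


def pack_two_issue(opcodes):
    """Greedy in-order packing into (ALU/branch, data transfer) pairs.

    ASSUMPTION: no hazard checking; adjacent instructions are treated as independent.
    """
    packets = []
    i = 0
    while i < len(opcodes):
        packet = [NOP, NOP]
        packet[slot_of(opcodes[i])] = opcodes[i]
        i += 1
        # Take the next instruction too if it fits the slot that is still empty.
        if i < len(opcodes) and packet[slot_of(opcodes[i])] == NOP:
            packet[slot_of(opcodes[i])] = opcodes[i]
            i += 1
        packets.append(tuple(packet))
    return packets


for packet in pack_two_issue(["lw", "addi", "add", "sw", "bne"]):
    print(packet)
```

Two consecutive ALU instructions cannot share a packet, so each is issued with a nop in the data-transfer slot; this is where the achieved IPC falls below the ideal value of 2.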

Book-COD-P&H, CH-4
Static Multiple Issue Processor

Figure- A static two-issue datapath.

The additions needed for double issue are highlighted: another
32 bits from instruction memory, two more read ports and one
more write port on the register file, and another ALU. Assume
the bottom ADDER handles address calculations for data
transfers and the top ALU handles everything else.

Book-COD-P&H, CH-4

Difference between Superscalar and VLIW


•General & special Instr.
•Compiler Vs Dynamic scheduling
•Hazard detection
•Instruction format
•A different VLIW processor needs recompilation of the application

Book-COD-P&H, CH-4
ISA design steps

• Step-1:
• Find out the instructions for the Algorithm(s)
• Step-2: [Microarchitecture design]
• Find out the strategy (Sharedbus/Singlecycle/Multicycle/Pipeline[in order]/etc) for the datapath
• Design the datapath and its components for each instruction
• Step-3:
• Design the combined datapath for all instructions
• Step-4:
• Decide the clock period based on the critical path [timing analysis]
• Add setup time, clock-to-Q, etc. to the decided clock period [satisfy hold time
constraints]
• Step-5:
• Identify the control signals on the combined datapath
• Step-6:
• Design the Control Unit (H/W or S/W) for generating such control signals based on the
strategy (Sharedbus/Singlecycle/Multicycle/Pipeline[in order]/etc) decided for the datapath
• Step-7:
• Test & verification of the designed processor

How about a single-purpose microprocessor like the MinMax microprocessor?
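Step-4 can be sketched for the pipelined MIPS datapath using the element delays from the delay table in these slides. The model is a simplification and an assumption of this example: the critical path is taken to be the slowest stage, and only the stage register's clk-to-Q and setup times are added (multiplexer delays, clock skew, and hold-time checks are ignored).

```python
def pipelined_clock_period_ps(stage_delays_ps, t_pcq_ps, t_setup_ps):
    """Clock period = register clk-to-Q + slowest stage + register setup."""
    return t_pcq_ps + max(stage_delays_ps) + t_setup_ps


# IM, RF read, ALU, DM, RF write delays (ps); Tpcq = 30 ps and Tsetup = 20 ps.
period = pipelined_clock_period_ps([250, 150, 200, 250, 100], t_pcq_ps=30, t_setup_ps=20)
print("clock period:", period, "ps")
```

With the register overhead included, the period under this model is somewhat longer than the 250 ps stage length used when MUX and register delays are left out.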
Applying Pipeline Technique in Other
Processors
•MinMax Processor
•Recording is available in the Google-classroom

•Simple CPU
•Recording is available in the Google-classroom

Homework

• Design the Pipelined MIPS ISA using Verilog HDL and C++
• Convert the MinMax microprocessor into a Pipelined MinMax
• Design the Pipelined MinMax microprocessor using Verilog HDL and
C++
• How does Intel manage to run CISC-type code on a RISC-based
pipeline?

Summary
• Limitation of Multi-cycle approach
• CPI Vs IPC
• Comparison between single-cycle and pipelined approaches
• Views of pipeline in operation
• Comparison of datapaths
• Design tradeoffs of microprocessors
• Datapath and CU for pipelined processor
