Computer Organization and Architecture
Implementing MIPS
We're ready to look at an implementation of the MIPS instruction set
Simplified to contain only
arithmetic-logic instructions: add, sub, and, or, slt
memory-reference instructions: lw, sw
control-flow instructions: beq, j
I-Format: op (6 bits), rs (5 bits), rt (5 bits), offset (16 bits)
J-Format: op (6 bits), address (26 bits)
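The field layouts above can be illustrated by slicing a 32-bit instruction word. This is a minimal sketch: the helper name `decode` and its dictionary keys are illustrative choices, not taken from the slides, and the rd and funct positions assume the standard MIPS R-format.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields (hypothetical helper)."""
    return {
        "op":      (word >> 26) & 0x3F,   # bits 31-26, all formats
        "rs":      (word >> 21) & 0x1F,   # bits 25-21
        "rt":      (word >> 16) & 0x1F,   # bits 20-16
        "rd":      (word >> 11) & 0x1F,   # bits 15-11, meaningful for R-format
        "funct":   word & 0x3F,           # bits 5-0, meaningful for R-format
        "offset":  word & 0xFFFF,         # bits 15-0, meaningful for I-format
        "address": word & 0x03FFFFFF,     # bits 25-0, meaningful for J-format
    }

# lw $t2, 0($t3) encodes as 0x8D6A0000 ($t2 = register 10, $t3 = register 11)
fields = decode(0x8D6A0000)
print(fields["op"], fields["rs"], fields["rt"], fields["offset"])  # 35 11 10 0
```

Every field is extracted for every word; which ones are meaningful depends on the format selected by op.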
Implementing MIPS: the Fetch/Execute Cycle
High-level abstract view of fetch/execute implementation
use the program counter (PC) to supply the instruction address
fetch the instruction from memory and increment PC
use fields of the instruction to select registers to read
execute depending on the instruction
repeat…
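The cycle above can be sketched as a small interpreter loop for the simplified instruction subset. This is an illustration under assumptions, not the hardware design discussed in the slides: the function names, the word-addressed `memory` dictionary, and the `max_steps` guard are all hypothetical, and the opcode and funct numbers are the standard MIPS encodings.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def to_signed32(value):
    """Interpret a 32-bit register value as signed (needed for slt)."""
    return value - 0x100000000 if value & 0x80000000 else value

def run(program, regs, memory, max_steps=1000):
    """Run a list of 32-bit instruction words; returns the final PC."""
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc // 4 < len(program):
            break
        word = program[pc // 4]            # fetch the instruction addressed by PC
        pc += 4                            # increment PC
        op = (word >> 26) & 0x3F
        rs, rt, rd = (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F
        imm = sign_extend16(word & 0xFFFF)
        a, b = regs[rs], regs[rt]          # instruction fields select registers to read
        if op == 0x00:                     # R-type: add, sub, and, or, slt
            funct = word & 0x3F
            if funct == 0x20:
                result = a + b
            elif funct == 0x22:
                result = a - b
            elif funct == 0x24:
                result = a & b
            elif funct == 0x25:
                result = a | b
            elif funct == 0x2A:
                result = int(to_signed32(a) < to_signed32(b))
            else:
                raise ValueError(f"unsupported funct {funct:#x}")
            regs[rd] = result & 0xFFFFFFFF
        elif op == 0x23:                   # lw rt, offset(rs)
            regs[rt] = memory.get((a + imm) & 0xFFFFFFFF, 0)
        elif op == 0x2B:                   # sw rt, offset(rs)
            memory[(a + imm) & 0xFFFFFFFF] = b
        elif op == 0x04:                   # beq rs, rt, offset
            if a == b:
                pc += imm << 2
        elif op == 0x02:                   # j address
            pc = (pc & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
        regs[0] = 0                        # register 0 always reads as zero
    return pc

# lw $t2, 0($t3); add $t1, $t2, $t2; sw $t1, 4($t3)
program = [0x8D6A0000, 0x014A4820, 0xAD690004]
regs = [0] * 32
regs[11] = 0x100                           # $t3 holds the base address
memory = {0x100: 21}
final_pc = run(program, regs, memory)
print(regs[9], memory[0x104], final_pc)    # 42 42 12
```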
Overview: Processor Implementation Styles
Single Cycle
perform each instruction in 1 clock cycle
the clock cycle must be long enough for the slowest instruction
disadvantage: every instruction runs only as fast as the slowest one
Multi-Cycle
break fetch/execute cycle into multiple steps
perform 1 step in each clock cycle
advantage: each instruction uses only as many cycles as it needs
Pipelined
execute each instruction in multiple steps
perform 1 step / instruction in each clock cycle
process multiple instructions in parallel – assembly line
State Elements on the Datapath:
Register File
Registers are implemented with arrays of D-flipflops
[Figure: register file with three 5-bit register-number inputs, a 32-bit write-data input, two 32-bit read-data outputs, a clock input, and a write control signal]
Datapath
Animating the Datapath
Combining the datapaths for R-type instructions and load/stores using two multiplexors
Data written to the register file comes either from the ALU (R-type) or from memory (load)
Animating the Datapath:
R-type Instruction
add rd,rs,rt
Animating the Datapath:
Load Instruction
lw rt,offset(rs)
Animating the Datapath:
Store Instruction
sw rt,offset(rs)
MIPS Datapath II: Single-Cycle
A separate adder is needed because ALU operations and the PC increment occur in the same clock cycle
lw rt,offset(rs)
Datapath Executing sw
sw rt,offset(rs)
Datapath Executing beq
beq r1,r2,offset
Control
Control unit takes input from the instruction opcode bits
Load/store or branch format: opcode (bits 31-26), rs (bits 25-21), rt (bits 20-16), address (bits 15-0)
Adding control to the MIPS Datapath III (and a new multiplexor to select field to
specify destination register): what are the functions of the 9 control signals?
Control Signals
Signal name, effect when deasserted, effect when asserted:
RegDst
  deasserted: The register destination number for the Write register comes from the rt field (bits 20-16)
  asserted: The register destination number for the Write register comes from the rd field (bits 15-11)
RegWrite
  deasserted: None
  asserted: The register on the Write register input is written with the value on the Write data input
ALUSrc
  deasserted: The second ALU operand comes from the second register file output (Read data 2)
  asserted: The second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc
  deasserted: The PC is replaced by the output of the adder that computes the value of PC + 4
  asserted: The PC is replaced by the output of the adder that computes the branch target
MemRead
  deasserted: None
  asserted: Data memory contents designated by the address input are put on the Read data output
MemWrite
  deasserted: None
  asserted: Data memory contents designated by the address input are replaced by the value of the Write data input
MemtoReg
  deasserted: The value fed to the register Write data input comes from the ALU
  asserted: The value fed to the register Write data input comes from the data memory
MIPS datapath with the control unit: input to control is the 6-bit instruction
opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal
PCSrc cannot be set directly from the opcode: zero test outcome is required
Datapath with Control II (cont.)
Determining control signals for the MIPS datapath based on instruction opcode
Control Signals: R-Type Instruction
PCSrc = 0, ALU operation = value depends on funct, RegDst = 1, RegWrite = 1, ALUSrc = 0, MemtoReg = 0, MemRead = 0, MemWrite = 0
Control Signals: lw Instruction
PCSrc = 0, ALU operation = 010, RegDst = 0, RegWrite = 1, ALUSrc = 1, MemtoReg = 1, MemRead = 1, MemWrite = 0
Control Signals: sw Instruction
PCSrc = 0, ALU operation = 010, RegDst = X, RegWrite = 0, ALUSrc = 1, MemtoReg = X, MemRead = 0, MemWrite = 1
Control Signals: beq Instruction
PCSrc = 1 if Zero = 1, ALU operation = 110, RegDst = X, RegWrite = 0, ALUSrc = 0, MemtoReg = X, MemRead = 0, MemWrite = 0
Datapath with Control III
Jump format: opcode (bits 31-26), address (bits 25-0)
[Figure labels: composing the jump target address; new multiplexor with additional control bit Jump]
MIPS datapath extended to jumps: control unit generates new Jump control bit
Datapath Executing j
R-type Instruction: Step 1
add $t1, $t2, $t3 (active = bold)
Truth table for main control signals
Signal      R-format  lw  sw  beq
Inputs
Op3         0         0   1   0
Op2         0         0   0   1
Op1         0         1   1   0
Op0         0         1   1   0
Outputs
RegDst      1         0   x   x
ALUSrc      0         1   1   0
MemtoReg    0         1   x   x
RegWrite    1         1   0   0
MemRead     0         1   0   0
MemWrite    0         0   1   0
Branch      0         0   0   1
ALUOp1      1         0   0   0
ALUOp0      0         0   0   1
Main control PLA (programmable logic array): the principle underlying PLAs is that any logical expression can be written as a sum-of-products
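The main control truth table can be read as a lookup from opcode to output signals. The sketch below copies the output columns for R-format, lw, sw and beq; the names `CONTROL`, `main_control` and `pc_src` are illustrative, `None` stands for a don't-care, `ALUOp0` names the low ALUOp bit, and the full 6-bit opcodes are the standard MIPS values. Deriving PCSrc as Branch AND Zero is the usual way to combine the opcode with the zero test.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# One row per opcode; None marks a don't-care ("x") entry.
CONTROL = {
    0b000000: (1,    0, 0,    1, 0, 0, 0, 1, 0),   # R-format
    0b100011: (0,    1, 1,    1, 1, 0, 0, 0, 0),   # lw
    0b101011: (None, 1, None, 0, 0, 1, 0, 0, 0),   # sw
    0b000100: (None, 0, None, 0, 0, 0, 1, 0, 1),   # beq
}

def main_control(opcode):
    """Return the control signals for an opcode as a dictionary."""
    return dict(zip(SIGNALS, CONTROL[opcode]))

def pc_src(opcode, zero):
    """PCSrc needs the ALU zero output as well as the opcode."""
    return main_control(opcode)["Branch"] & zero

print(main_control(0b100011)["MemRead"])   # 1
print(pc_src(0b000100, zero=1))            # 1
print(pc_src(0b000100, zero=0))            # 0
```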
Single-Cycle Design Problems
Consider a machine with an additional floating point unit. Assume functional unit delays
as follows
memory: 2 ns, ALU and adders: 2 ns, FPU add: 8 ns, FPU multiply: 16 ns, register file access (read or write): 1 ns
multiplexors, control unit, PC accesses, sign extension, wires: no delay
Assume instruction mix as follows
all loads take same time and comprise 31%
all stores take same time and comprise 21%
R-format instructions comprise 27%
branches comprise 5%
jumps comprise 2%
FP adds and subtracts take the same time and together comprise 7%
FP multiplies and divides take the same time and together comprise 7%
Compare the performance of (a) a single-cycle implementation using a fixed-period clock with
(b) one using a variable-period clock where each instruction executes in one clock cycle that is
only as long as it needs to be (not really practical but pretend it’s possible!)
Solution
Instruction  Instr.  Register  ALU    Data  Register  FPU      FPU      Total
class        mem.    read      oper.  mem.  write     add/sub  mul/div  time (ns)
Load word    2       1         2      2     1         -        -        8
Store word   2       1         2      2     -         -        -        7
R-format     2       1         2      0     1         -        -        6
Branch       2       1         2      -     -         -        -        5
Jump         2       -         -      -     -         -        -        2
FP mul/div   2       1         -      -     1         -        16       20
FP add/sub   2       1         -      -     1         8        -        12
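The comparison asked for can be worked out from the per-class totals and the instruction mix. In this short sketch the variable names are illustrative; the fixed-period clock is taken to be the longest instruction time, and the variable-period clock is the mix-weighted average.

```python
# Total time per instruction class in ns, from the solution table.
times = {"load": 8, "store": 7, "R-format": 6, "branch": 5,
         "jump": 2, "FP mul/div": 20, "FP add/sub": 12}
# Instruction mix as fractions, from the problem statement.
mix = {"load": 0.31, "store": 0.21, "R-format": 0.27, "branch": 0.05,
       "jump": 0.02, "FP mul/div": 0.07, "FP add/sub": 0.07}

fixed_clock = max(times.values())                        # slowest instruction sets the period
variable_clock = sum(mix[c] * times[c] for c in times)   # average period over the mix
speedup = fixed_clock / variable_clock

print(fixed_clock)                # 20
print(round(variable_clock, 2))   # 8.1
print(round(speedup, 2))          # 2.47
```

With these numbers the variable-period clock would be about 2.47 times faster than the fixed 20 ns clock.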
Note particularities of multicycle vs. single-cycle diagrams
single memory for data and instructions
single ALU, no extra adders
extra registers to hold data between clock cycles
Single-cycle datapath
RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Step 3: Execution, Address Computation or Branch Completion (EX)
1: IF
2: ID
3: EX
4: MEM
5: WB
Multicycle Execution Step (1):
Instruction Fetch
IR = Memory[PC];
PC = PC + 4;
Multicycle Execution Step (2):
Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the branch target address, and PC + 4]
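The branch target computation in this step can be checked with a few lines of arithmetic. This is a minimal sketch; `branch_target` and `sign_extend16` are illustrative names, and the PC value passed in is assumed to be the already incremented PC + 4 from the fetch step.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc_plus_4, ir):
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits."""
    offset = sign_extend16(ir & 0xFFFF)
    return (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF

# beq $t2, $t1, +3 words, with PC + 4 = 0x00400010
print(hex(branch_target(0x00400010, 0x11490003)))   # 0x40001c
# same branch with offset -1: targets the branch instruction itself
print(hex(branch_target(0x00400010, 0x1149FFFF)))   # 0x40000c
```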
Multicycle Execution Step (3):
Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the memory address, and PC + 4]
Multicycle Execution Step (3):
ALU Instruction (R-Type)
ALUOut = A op B
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the R-type result, and PC + 4]
Multicycle Execution Step (3):
Branch Instructions
if (A == B) PC = ALUOut;
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], and the branch target address (shown in two places)]
Multicycle Execution Step (3):
Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2)
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the branch target address, and the jump address]
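The jump address composition can be sketched in the same way. The function name `jump_target` is illustrative; it keeps the upper four bits of the PC and appends the 26-bit address field shifted left by two.

```python
def jump_target(pc, ir):
    """PC = PC[31-28] concat (IR[25-0] << 2)."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

# j with address field 0x0100000, executed with PC = 0x00400004
print(hex(jump_target(0x00400004, 0x08100000)))   # 0x400000
```

Because only the low 28 bits come from the instruction, a jump can reach any word in the current 256 MB region selected by PC[31-28].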
Multicycle Execution Step (4):
Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the memory address, the memory data, and PC + 4]
Multicycle Execution Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], and PC + 4]
Multicycle Execution Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut;
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the R-type result, and PC + 4]
Multicycle Execution Step (5):
Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: datapath after this step, labeled with Reg[rs], Reg[rt], the memory address, the memory data, and PC + 4]
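The execution steps above can be collected per instruction class, which makes the cycle count of each class explicit. This is a sketch: the dictionary layout and names are illustrative, and the RTL strings are copied from the step slides with rs, rt, rd and imm as shorthand for the instruction fields.

```python
FETCH = "IR = Memory[PC]; PC = PC + 4"
DECODE = "A = Reg[rs]; B = Reg[rt]; ALUOut = PC + (sign-extend(imm) << 2)"

# One RTL string per clock cycle, in execution order.
STEPS = {
    "lw":     [FETCH, DECODE, "ALUOut = A + sign-extend(imm)",
               "MDR = Memory[ALUOut]", "Reg[rt] = MDR"],
    "sw":     [FETCH, DECODE, "ALUOut = A + sign-extend(imm)",
               "Memory[ALUOut] = B"],
    "R-type": [FETCH, DECODE, "ALUOut = A op B", "Reg[rd] = ALUOut"],
    "beq":    [FETCH, DECODE, "if (A == B) PC = ALUOut"],
    "j":      [FETCH, DECODE, "PC = PC[31-28] concat (IR[25-0] << 2)"],
}

cycles = {name: len(steps) for name, steps in STEPS.items()}
print(cycles)   # {'lw': 5, 'sw': 4, 'R-type': 4, 'beq': 3, 'j': 3}
```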
Multicycle Datapath with Control I
… with control lines and the ALU control block added – not all control lines are shown
Multicycle Datapath with Control II
[Figure labels: new gates; new multiplexor for the jump address]
Multicycle Control Step (2):
Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: multicycle datapath annotated with the control signal values for this step]
Multicycle Control Step (3):
Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: multicycle datapath annotated with the control signal values for this step]
Multicycle Control Step (3):
ALU Instruction (R-Type)
ALUOut = A op B;
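[Figure: multicycle datapath annotated with the control signal values for this step; the ALU operation depends on the funct field]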
Multicycle Control Step (3):
Branch Instructions
if (A == B) PC = ALUOut;
[Figure: multicycle datapath annotated with the control signal values for this step; the PC write signal is 1 if Zero = 1]
Multicycle Control Step (3):
Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: multicycle datapath annotated with the control signal values for this step]
Multicycle Control Step (4):
Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: multicycle datapath annotated with the control signal values for this step]
Multicycle Control Step (4):
Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: multicycle datapath annotated with the control signal values for this step]
Multicycle Control Step (4):
ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut; (Reg[rd] = ALUOut)
[Figure: multicycle datapath annotated with the control signal values for this step; labeled signals: IRWrite, PCWr*, IorD, MemRead, MemWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, PCSource]
Multicycle Control Step (5):
Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: multicycle datapath annotated with the control signal values for this step; labeled signals: IRWrite, PCWr*, IorD, MemRead, MemWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, PCSource]
Simple Questions
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not equal
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...
In what cycle does the actual addition of $t2 and $t3 take place?
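Both questions can be answered by adding up per-instruction cycle counts. The sketch below assumes the multicycle counts used elsewhere in these slides (lw 5, sw 4, R-type 4, beq 3), that the branch is not taken, and that the addition happens in the third step (EX) of the add instruction; the variable names are illustrative.

```python
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}
EX_STEP = 3                                  # the ALU computes A op B in step 3

program = ["lw", "lw", "beq", "add", "sw"]   # beq not taken, so add and sw execute

total_cycles = sum(CYCLES[op] for op in program)
cycles_before_add = sum(CYCLES[op] for op in program[:program.index("add")])
add_cycle = cycles_before_add + EX_STEP

print(total_cycles)   # 21
print(add_cycle)      # 16
```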
Clock time-line
Implementing Control
Value of control signals is dependent upon:
what instruction is being executed
which step is being performed
[Figure: finite state machine diagrams; asserted signals are shown inside the state circles]
The complete FSM control for the multicycle MIPS datapath:
refer to Multicycle Datapath with Control II
Example: CPI in a multicycle CPU
Assume
the control design of the previous slide
An instruction mix of 22% loads, 11% stores, 49% R-type operations, 16%
branches, and 2% jumps
What is the CPI assuming each step requires 1 clock cycle?
Solution:
Number of clock cycles from previous slide for each instruction class:
loads 5, stores 4, R-type instructions 4, branches 3, jumps 3
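\begin{align*}
\text{CPI} &= \frac{\text{CPU clock cycles}}{\text{instruction count}} \\
&= \frac{\sum_i \left(\text{instruction count}_{\text{class } i} \times \text{CPI}_{\text{class } i}\right)}{\text{instruction count}} \\
&= \sum_i \left(\frac{\text{instruction count}_{\text{class } i}}{\text{instruction count}} \times \text{CPI}_{\text{class } i}\right) \\
&= 0.22 \times 5 + 0.11 \times 4 + 0.49 \times 4 + 0.16 \times 3 + 0.02 \times 3 \\
&= 4.04
\end{align*}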
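The weighted sum can be checked numerically. In this short sketch the dictionary names are illustrative; the mix and cycle counts are the ones stated in the example.

```python
mix = {"loads": 0.22, "stores": 0.11, "R-type": 0.49,
       "branches": 0.16, "jumps": 0.02}
cycles = {"loads": 5, "stores": 4, "R-type": 4, "branches": 3, "jumps": 3}

# CPI is the mix-weighted average of the per-class cycle counts.
cpi = sum(mix[c] * cycles[c] for c in mix)
print(round(cpi, 2))   # 4.04
```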
FSM Control: Implementation
High-level view of FSM implementation: inputs to the combinational logic block are
the current state number and instruction opcode bits; outputs are the next state
number and control signals to be asserted for the current state
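The next-state half of that combinational logic block can be sketched as a function of the current state number and the opcode. The state numbering here is an assumption, following a common textbook layout (0 fetch, 1 decode, 2 memory address computation, 3 memory read, 4 write-back, 5 memory write, 6 R-type execution, 7 R-type completion, 8 branch completion, 9 jump completion); the slides' own state diagram is not reproduced in the text, and the function names are illustrative.

```python
RTYPE, J, BEQ, LW, SW = 0x00, 0x02, 0x04, 0x23, 0x2B   # standard MIPS opcodes

def next_state(state, opcode):
    """Next state from the current state number and opcode (assumed numbering)."""
    if state == 0:                       # fetch always goes to decode
        return 1
    if state == 1:                       # decode dispatches on the opcode
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:                       # address computed: read or write memory
        return 3 if opcode == LW else 5
    if state == 3:                       # memory read goes to write-back
        return 4
    if state == 6:                       # R-type execution goes to completion
        return 7
    return 0                             # states 4, 5, 7, 8, 9 return to fetch

def trace(opcode):
    """List the states visited by one instruction, starting at fetch."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)

print(trace(LW))          # [0, 1, 2, 3, 4]
print(len(trace(BEQ)))    # 3
```

The length of each trace matches the per-class cycle counts: 5 for loads, 4 for stores and R-type, 3 for branches and jumps.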
FSM Control: PLA Implementation
The upper half is the AND plane that computes all the products. The products are carried
to the lower OR plane by the vertical lines. The sum term for each output is given by
the corresponding horizontal line
E.g., IorD = S0.S1.S2'.S3' + S0.S1'.S2.S3' (where ' denotes the complement; the two products select states 3 and 5)
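A sum-of-products output of this kind can be evaluated directly from the state bits. The sketch below assumes that IorD is asserted in states 3 (0011) and 5 (0101), as in a common textbook state numbering; that assignment and the function name `iord` are assumptions, since the state diagram itself is not reproduced in the text.

```python
def iord(s3, s2, s1, s0):
    """IorD as the OR of two AND terms over the state bits (assumed states 3 and 5)."""
    term_state3 = (1 - s3) & (1 - s2) & s1 & s0        # product term for state 0011
    term_state5 = (1 - s3) & s2 & (1 - s1) & s0        # product term for state 0101
    return term_state3 | term_state5                   # OR plane sums the products

asserted_in = [state for state in range(16)
               if iord((state >> 3) & 1, (state >> 2) & 1, (state >> 1) & 1, state & 1)]
print(asserted_in)   # [3, 5]
```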
FSM Control: ROM Implementation