
Lecture-11 (Dynamic Scheduling)

CS422-Spring 2018

Biswa@CSE-IITK
How to Make CPI closer to One
• Let’s assume full pipelining:
– If we have a 4-cycle latency, then we need 3 instructions between a producing
instruction and its use:
multf $F0,$F2,$F4
delay-1
delay-2
delay-3
addf $F6,$F10,$F0

[Figure: pipeline stages Fetch, Decode, Ex1, Ex2, Ex3, Ex4, WB holding addf, delay3, delay2, delay1, multf; arrows mark the earliest forwarding points for 4-cycle and for 1-cycle instructions]


CS422: Spring 2018 Biswabandan Panda, CSE@IITK 2
Where Are Stalls?
Loop: LD F0,0(R1) ;F0=vector element
ADDD F4,F0,F2 ;add scalar from F2
SD 0(R1),F4 ;store result
SUBI R1,R1,8 ;decrement pointer 8B (DW)
BNEZ R1,Loop ;branch R1!=zero
NOP ;delayed branch slot

Instruction          Instruction          Execution latency   Use latency
producing result     using result         (clock cycles)      (clock cycles)
FP ALU op            Another FP ALU op    4                   3
FP ALU op            Store double         4                   2
Load double          FP ALU op            2                   1
Load double          Store double         2                   0
Integer op           Integer op           1                   0

• Where are the stalls?



Rewrite The Code
1 Loop: LD F0,0(R1) ;F0=vector element
2 stall
3 ADDD F4,F0,F2 ;add scalar in F2
4 stall
5 stall
6 SD 0(R1),F4 ;store result
7 SUBI R1,R1,8 ;decrement pointer 8B (DW)
8 BNEZ R1,Loop ;branch R1!=zero
9 stall ;delayed branch slot

Instruction          Instruction          Use latency
producing result     using result         (clock cycles)
FP ALU op            Another FP ALU op    3
FP ALU op            Store double         2
Load double          FP ALU op            1

• 9 clocks: Rewrite code to minimize stalls?
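
The stall positions in the 9-clock listing can be checked mechanically. The following is a minimal sketch, not part of the lecture: it issues instructions in order and delays each one until the use latency of its producers has elapsed. The names `USE_LATENCY`, `schedule`, and the instruction kinds are illustrative assumptions, and producer/consumer pairs that the table does not list are assumed to need no stall.

```python
# Use latency in clock cycles between a producer and a dependent consumer.
# Values come from the table above; unlisted pairs are ASSUMED to be 0.
USE_LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
}

def schedule(instrs):
    """instrs: list of (kind, dest, sources). Returns the issue cycle of each."""
    last_writer = {}  # register -> (issue cycle, kind) of its latest producer
    cycles = []
    cycle = 0
    for kind, dest, sources in instrs:
        cycle += 1  # in order: at best one cycle after the previous instruction
        for reg in sources:
            if reg in last_writer:
                prod_cycle, prod_kind = last_writer[reg]
                gap = USE_LATENCY.get((prod_kind, kind), 0)
                cycle = max(cycle, prod_cycle + gap + 1)  # insert stalls
        if dest is not None:
            last_writer[dest] = (cycle, kind)
        cycles.append(cycle)
    return cycles

# The loop as written on the slide (NOP fills the delayed branch slot).
original = [
    ("load", "F0", ["R1"]),          # LD   F0,0(R1)
    ("fp_alu", "F4", ["F0", "F2"]),  # ADDD F4,F0,F2
    ("store", None, ["R1", "F4"]),   # SD   0(R1),F4
    ("int", "R1", ["R1"]),           # SUBI R1,R1,8
    ("branch", None, ["R1"]),        # BNEZ R1,Loop
    ("nop", None, []),               # delayed branch slot
]

# One possible reordering: SD moved into the delayed branch slot.
reordered = [
    ("load", "F0", ["R1"]),          # LD   F0,0(R1)
    ("fp_alu", "F4", ["F0", "F2"]),  # ADDD F4,F0,F2
    ("int", "R1", ["R1"]),           # SUBI R1,R1,8
    ("branch", None, ["R1"]),        # BNEZ R1,Loop
    ("store", None, ["R1", "F4"]),   # SD   8(R1),F4
]

original_cycles = schedule(original)    # [1, 3, 6, 7, 8, 9]
reordered_cycles = schedule(reordered)  # [1, 3, 4, 5, 6]
print(original_cycles[-1], reordered_cycles[-1])
```

Under these assumptions the last instruction of the original loop issues in clock 9, and moving SD into the branch delay slot brings the loop down to 6 clocks.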



Revised Loop
1 Loop: LD F0,0(R1)
2 stall
3 ADDD F4,F0,F2
4 SUBI R1,R1,8
5 BNEZ R1,Loop ;delayed branch
6 SD 8(R1),F4 ;altered when move past SUBI

Swap BNEZ and SD by changing address of SD


Instruction          Instruction          Use latency
producing result     using result         (clock cycles)
FP ALU op            Another FP ALU op    3
FP ALU op            Store double         2
Load double          FP ALU op            1

6 clocks: Unroll the loop 4 times to make the code faster?


Unroll It 4 times
1 Loop:LD F0,0(R1) ;1 cycle stall after each LD
2 ADDD F4,F0,F2 ;2 cycles stall after each ADDD
3 SD 0(R1),F4 ;drop SUBI & BNEZ
4 LD F6,-8(R1)
5 ADDD F8,F6,F2
6 SD -8(R1),F8 ;drop SUBI & BNEZ
7 LD F10,-16(R1)
8 ADDD F12,F10,F2
9 SD -16(R1),F12 ;drop SUBI & BNEZ
10 LD F14,-24(R1)
11 ADDD F16,F14,F2
12 SD -24(R1),F16
13 SUBI R1,R1,#32 ;alter to 4*8
14 BNEZ R1,Loop
15 NOP

15 + 4 x (1+2) = 27 clock cycles, or 6.75 per iteration

Rewrite loop to minimize stalls?

Assumes the number of loop iterations is a multiple of 4 (R1 is initially a multiple of 32)



Even Better?
Unrolled Loop That Minimizes Stalls
1 Loop:LD F0,0(R1)
2 LD F6,-8(R1)
3 LD F10,-16(R1)
4 LD F14,-24(R1)
5 ADDD F4,F0,F2
6 ADDD F8,F6,F2
7 ADDD F12,F10,F2
8 ADDD F16,F14,F2
9 SD 0(R1),F4
10 SD -8(R1),F8
11 SD -16(R1),F12
12 SUBI R1,R1,#32
13 BNEZ R1,Loop
14 SD 8(R1),F16 ; 8-32 = -24

14 clock cycles, or 3.5 per iteration
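
As a worked summary of the cycle counts quoted so far (a sketch that only restates the slide numbers; the speedup figure is derived here and is not on the slides):

```latex
\begin{align*}
\text{original loop:}          &\quad 9 \text{ cycles per iteration} \\
\text{scheduled loop:}         &\quad 6 \text{ cycles per iteration} \\
\text{unrolled 4 times:}       &\quad \frac{15 + 4\,(1+2)}{4} = \frac{27}{4} = 6.75 \approx 6.8 \\
\text{unrolled and scheduled:} &\quad \frac{14}{4} = 3.5 \\
\text{speedup over the original:} &\quad \frac{9}{3.5} \approx 2.57
\end{align*}
```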


When Safe to Unroll?
• Example: Where are data dependencies?
(A,B,C distinct & nonoverlapping)
for (i=0; i<100; i=i+1) {
A[i+1] = A[i] + C[i]; /* S1 */
B[i+1] = B[i] + A[i+1]; /* S2 */
}

1. S2 uses the value A[i+1] computed by S1 in the same iteration.
2. S1 uses a value computed by S1 in an earlier iteration, since iteration i computes A[i+1], which is read in iteration i+1. The same is true of S2 for B[i] and B[i+1].
This is a “loop-carried dependence”: a dependence between iterations.
• In our prior example, each iteration was independent of the others
– In this case, iterations can’t be executed in parallel, right?
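
One way to see the loop-carried dependence is to run the iterations in a different order and compare results. The sketch below is illustrative only: the array sizes, initial values, and the helper names `run_dependent` and `run_independent` are assumptions made for the demonstration.

```python
def run_dependent(order, n=8):
    """Loop from the slide; `order` is the sequence in which iterations run."""
    A = [1] + [0] * n
    B = [1] + [0] * n
    C = list(range(n))
    for i in order:
        A[i + 1] = A[i] + C[i]      # S1: reads A[i] written by iteration i-1
        B[i + 1] = B[i] + A[i + 1]  # S2: reads B[i] written by iteration i-1
    return A, B

def run_independent(order, n=8):
    """Vector-plus-scalar loop like the earlier example: no iteration reads
    a value produced by another iteration."""
    x = list(range(n))
    for i in order:
        x[i] = x[i] + 10
    return x

forward = list(range(8))
backward = forward[::-1]

# Reordering changes the result only when a loop-carried dependence exists.
same_dependent = run_dependent(forward) == run_dependent(backward)
same_independent = run_independent(forward) == run_independent(backward)
print(same_dependent, same_independent)
```

The dependent loop gives different arrays when the order changes, so its iterations cannot simply be run in parallel; the independent loop is insensitive to order.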



Out-of-order + Dynamic Scheduling ?
• Pipelining: Tries to achieve CPI =1

• Compiler scheduling minimizes the impacts of dependences.

• Hardware scheduling so far: in-order execution
– Instructions after a stall must wait even if they are independent.

• Dynamic scheduling: out-of-order (O3) execution
– Hardware looks ahead past blocked instructions

• In-order vs. out-of-order (O3)
• In-order issue, O3 execute, in-order completion
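
To make the cost of in-order execution concrete, here is a small sketch under simplifying assumptions (one instruction fetched per cycle, unlimited functional units, no WAR/WAW hazards). The three-instruction program, its latencies, and the function name `start_cycles` are hypothetical and chosen only for illustration.

```python
def start_cycles(program, in_order):
    """program: list of (dest, sources, latency). Returns execution start cycles."""
    ready = {}   # register -> first cycle in which its new value can be used
    starts = []
    prev_start = 0
    for pos, (dest, sources, latency) in enumerate(program, start=1):
        operands_ready = max(ready.get(r, 1) for r in sources)
        earliest = pos  # cannot start before it has been fetched
        if in_order:
            # in order: may not start before the previous instruction starts
            earliest = max(earliest, prev_start + 1)
        start = max(earliest, operands_ready)
        ready[dest] = start + latency
        starts.append(start)
        prev_start = start
    return starts

program = [
    ("F0", ["F2", "F4"], 40),   # DIVD F0,F2,F4   (long latency)
    ("F10", ["F0", "F8"], 2),   # ADDD F10,F0,F8  (depends on DIVD)
    ("F12", ["F8", "F14"], 2),  # SUBD F12,F8,F14 (independent)
]

in_order_starts = start_cycles(program, in_order=True)       # [1, 41, 42]
out_of_order_starts = start_cycles(program, in_order=False)  # [1, 41, 3]
print(in_order_starts, out_of_order_starts)
```

In this toy model the independent SUBD waits behind the stalled ADDD until cycle 42 when execution is in order, but starts in cycle 3 when the hardware may look past the blocked instruction.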
Scoreboard
• Out-of-order execution divides ID stage:
1. Issue - decode instructions, check for structural hazards
2. Read operands - wait until no data hazards, then read operands (RAW)
3. Execute - Execute instruction and notify scoreboard when done
4. Write - Wait until earlier instructions read operands before writing to register file
(WAR)

• Scoreboards date to the CDC 6600 in 1963

• Instructions execute whenever not dependent on previous instructions and no hazards.


• CDC 6600: In order issue, out-of-order execution, out-of-order commit (or completion)
– No forwarding!
– Imprecise interrupt/exception model for now
Four Stages of Scoreboard Control - Details
• Issue—decode instructions & check for structural hazards (ID1)
– Instructions issued in program order (for hazard checking)
– Don’t issue if structural hazard
– Don’t issue if instruction is output dependent on any previously issued but
uncompleted instruction (no WAW hazards)

• Read operands—wait until no data hazards, then read operands (ID2)


– All real dependencies (RAW hazards) resolved in this stage, since we wait for
instructions to write back data.
– No forwarding of data in this model!



Four Stages of Scoreboard Control
• Execution—operate on operands (EX)
– The functional unit begins execution upon receiving operands. When the result is
ready, it notifies the scoreboard that it has completed execution.

• Write result—finish execution (WB)


– Stall until no WAR hazards with previous instructions:

Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14

The CDC 6600 scoreboard would stall the write-back of SUBD until ADDD reads its operands



Three Parts of the Scoreboard
• Instruction status:
Which of 4 steps the instruction is in

• Functional unit status: Indicates the state of the functional unit (FU). 9 fields for each functional unit:
Busy: Indicates whether the unit is busy or not
Op: Operation to perform in the unit (e.g., + or –)
Fi: Destination register
Fj,Fk: Source-register numbers
Qj,Qk: Functional units producing source registers Fj, Fk
Rj,Rk: Flags indicating when Fj, Fk are ready

• Register result status—Indicates which functional unit will write each register, if one
exists. Blank when no pending instructions will write that register
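
The three parts above map naturally onto simple records. The following is a minimal sketch, assuming Python dataclasses as the representation; the class name `FUStatus` and the sample values (taken from the Mult1 row at cycle 6 of the worked example) are illustrative, not part of the lecture.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class FUStatus:
    """The 9 functional unit status fields listed above."""
    busy: bool = False        # Busy: is the unit in use?
    op: Optional[str] = None  # Op: operation to perform
    fi: Optional[str] = None  # Fi: destination register
    fj: Optional[str] = None  # Fj: first source register
    fk: Optional[str] = None  # Fk: second source register
    qj: Optional[str] = None  # Qj: unit producing Fj (None if no producer pending)
    qk: Optional[str] = None  # Qk: unit producing Fk (None if no producer pending)
    rj: bool = False          # Rj: is Fj ready?
    rk: bool = False          # Rk: is Fk ready?

# Instruction status: which of the 4 steps each instruction has completed.
instruction_status = {"MULTD F0,F2,F4": {"issue": 6}}

# Functional unit status: MULTD waits for F2, which the Integer unit (a load) produces.
mult1 = FUStatus(busy=True, op="Mult", fi="F0", fj="F2", fk="F4",
                 qj="Integer", qk=None, rj=False, rk=True)

# Register result status: which unit will write each register (absent = blank).
register_result = {"F0": "Mult1", "F2": "Integer"}

print(len(fields(FUStatus)), register_result[mult1.fj])
```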



Possible Architecture
[Figure: block diagram of a possible architecture: Registers, Functional Units (FP Mult, FP Mult, FP Divide, FP Add, Integer), Memory, and the SCOREBOARD]



Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards ?
• Solutions for WAR:
– Stall write-back until registers have been read
– Read registers only during Read Operands stage
• Solution for WAW:
– Detect hazard and stall issue of new instruction until other instruction completes

• No register renaming
• Need to have multiple instructions in execution phase => multiple execution units or
pipelined execution units
• Scoreboard keeps track of dependencies between instructions that have already issued
• Scoreboard replaces ID, EX, WB with 4 stages
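
The WAR and WAW rules above can be written as predicates over the scoreboard state. This is a minimal sketch using plain dictionaries; the function names are assumptions, and the state is a snapshot that mirrors cycle 17 of the worked example that follows (ADDD has finished executing but DIVD has not yet read F6).

```python
def can_issue(fu, dest, fu_status, reg_result):
    # Structural hazard: the unit is busy. WAW hazard: a pending write to dest.
    return not fu_status[fu]["busy"] and reg_result.get(dest) is None

def can_read_operands(fu, fu_status):
    # RAW hazards: both source operands must be ready (no forwarding).
    return fu_status[fu]["rj"] and fu_status[fu]["rk"]

def can_write_result(fu, fu_status):
    # WAR hazard: another unit still has to read the old value of our
    # destination register (its source is ready but not yet read).
    fi = fu_status[fu]["fi"]
    for name, s in fu_status.items():
        if name == fu or not s["busy"]:
            continue
        if (s["fj"] == fi and s["rj"]) or (s["fk"] == fi and s["rk"]):
            return False
    return True

fu_status = {
    "Integer": dict(busy=False, fi=None, fj=None, fk=None, rj=False, rk=False),
    "Mult1": dict(busy=True, fi="F0", fj="F2", fk="F4", rj=False, rk=False),
    "Add": dict(busy=True, fi="F6", fj="F8", fk="F2", rj=False, rk=False),
    "Divide": dict(busy=True, fi="F10", fj="F0", fk="F6", rj=False, rk=True),
}
reg_result = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

add_may_write = can_write_result("Add", fu_status)      # False: DIVD must read F6
mult_may_write = can_write_result("Mult1", fu_status)   # True
div_may_read = can_read_operands("Divide", fu_status)   # False: F0 not ready
waw_blocked = not can_issue("Integer", "F6", fu_status, reg_result)  # True
print(add_may_write, mult_may_write, div_may_read, waw_blocked)
```

The Add unit is held at write-back only because the Divide unit lists F6 as a ready-but-unread source, which is the stall the WAR rule describes.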



Scoreboard Example
Latencies: Integer: 1 cycle; FP add: 2 cycles; FP multiply: 10 cycles; FP divide: 40 cycles

Instruction status:                  Read  Exec  Write
Instruction         j     k   Issue  Oper  Comp  Result
LD         F6       34+   R2
LD         F2       45+   R3
MULTD      F0       F2    F4
SUBD       F8       F6    F2
DIVD       F10      F0    F6
ADDD       F6       F8    F2

Functional unit status:      dest  S1  S2  FU  FU  Fj?  Fk?
Time  Name     Busy  Op      Fi    Fj  Fk  Qj  Qk  Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0  F2  F4  F6  F8  F10  F12  ...  F30
      FU



Cycle 1
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Integer



Cycle 2
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Integer

• Issue 2nd LD? Can’t since integer unit is busy.


Cycle 3
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 No
Mult1 No
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Integer

• Issue MULT? • F2?


Cycle 4
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU



Cycle 5
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 No
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Integer



Cycle 6
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6
MULTD F0 F2 F4 6
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 Integer



Cycle 7
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Sub F8 F6 F2 Integer Yes No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 Integer Add

• Read multiply operands? • LOAD is not done yet


Cycle 8 (1st half)
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6 8
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Sub F8 F6 F2 Integer Yes No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Integer Add Divide

DIVD issues. MULTD and SUBD are both waiting for F2. LD #2 writes F2.
Cycle 8 (2nd Half)
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6 8
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Add Divide



Cycle 9
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9
DIVD F10 F0 F6 8
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
10 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
2 Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes

Note: the Time column shows the execution cycles remaining

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 Add Divide

• Read operands for MULT & SUB? Issue ADDD?


Cycle 10
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9
DIVD F10 F0 F6 8
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
9 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
1 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 Add Divide



Cycle 11
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11
DIVD F10 F0 F6 8
ADDD F6 F8 F2

ADDD can’t start because add unit is busy

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
8 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
0 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 Add Divide



Cycle 12
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
7 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 Divide

• Read operands for DIVD?


Cycle 13
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
6 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 Add Divide



Cycle 14
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
5 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
2 Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 Add Divide



Cycle 15
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
4 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
1 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 Add Divide



Cycle 16
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
3 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
0 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU Mult1 Add Divide



Cycle 17
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 (WAR hazard on F6 with ADDD!)
ADDD F6 F8 F2 13 14 16

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
2 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
17 FU Mult1 Add Divide

• Why not write result of ADDD?


Cycle 18
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
1 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
18 FU Mult1 Add Divide



Cycle 19
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
0 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
19 FU Mult1 Add Divide



Cycle 20
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Yes Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
20 FU Add Divide



Cycle 21
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Yes Yes

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
21 FU Add Divide

• WAR Hazard is now gone...


Cycle 22
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16 22

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
39 Divide Yes Div F10 F0 F6 No No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
22 FU Divide



Cycle 61
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61
ADDD F6 F8 F2 13 14 16 22

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
0 Divide Yes Div F10 F0 F6 No No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
61 FU Divide



Cycle 62
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61 62
ADDD F6 F8 F2 13 14 16 22

Functional unit status: dest S1 S2 FU FU Fj? Fk?


Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No

Register result status:


Clock F0 F2 F4 F6 F8 F10 F12 ... F30
62 FU

• In-order issue; out-of-order execute & commit
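
The instruction status table built up over cycles 1 to 62 can be reproduced with a short model. The sketch below is not a cycle-by-cycle simulator: it is a one-pass calculation that works because every constraint on an instruction depends only on earlier instructions. It assumes, as the example's numbers imply, that a result written in cycle t becomes visible to readers and to issue in cycle t+1. The names `Instr`, `LATENCY`, `UNITS`, and `scoreboard` are illustrative.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name unit dest srcs")

LATENCY = {"integer": 1, "add": 2, "mult": 10, "divide": 40}  # from the example
UNITS = {"integer": 1, "add": 1, "mult": 2, "divide": 1}      # units per kind

def scoreboard(program):
    """Return (name, issue, read operands, exec complete, write result) rows."""
    unit_free = {kind: [1] * n for kind, n in UNITS.items()}  # first free cycle
    last_write = {}  # register -> write-result cycle of its latest writer
    last_read = {}   # register -> latest read-operands cycle of any reader
    prev_issue = 0
    table = []
    for ins in program:
        units = unit_free[ins.unit]
        u = min(range(len(units)), key=units.__getitem__)  # earliest free unit
        # Issue: in program order, unit free (structural), no pending write
        # to the destination register (WAW).
        issue = max(prev_issue + 1, units[u], last_write.get(ins.dest, 0) + 1)
        # Read operands: wait for producers to write back (RAW, no forwarding).
        read = max([issue + 1] + [last_write.get(r, 0) + 1 for r in ins.srcs])
        done = read + LATENCY[ins.unit]
        # Write result: wait until earlier readers of dest have read it (WAR).
        write = max(done + 1, last_read.get(ins.dest, 0) + 1)
        for r in ins.srcs:
            last_read[r] = max(last_read.get(r, 0), read)
        last_write[ins.dest] = write
        units[u] = write + 1  # the unit is released after write result
        prev_issue = issue
        table.append((ins.name, issue, read, done, write))
    return table

PROGRAM = [
    Instr("LD F6", "integer", "F6", ["R2"]),
    Instr("LD F2", "integer", "F2", ["R3"]),
    Instr("MULTD F0", "mult", "F0", ["F2", "F4"]),
    Instr("SUBD F8", "add", "F8", ["F6", "F2"]),
    Instr("DIVD F10", "divide", "F10", ["F0", "F6"]),
    Instr("ADDD F6", "add", "F6", ["F8", "F2"]),
]

result = scoreboard(PROGRAM)
for row in result:
    print(row)
```

Under these assumptions the model gives the same four timestamps per instruction as the final table, including ADDD's write being held until cycle 22 by the WAR hazard with DIVD.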

