ILP ScoreBoard
Static Scheduling
Compiler techniques schedule instructions so that dependent instructions are separated, minimizing the number of hazards and stalls (e.g., static branch prediction).
Dynamic Scheduling
1. Uses hardware to rearrange instructions to reduce stalls
2. Works when real dependences are not known at compile time
3. Keeps the compiler simpler
4. Code compiled for one pipeline runs well on another pipeline
CSCE430/830
ILP: Scoreboard
Key Idea: Allow instructions behind a stall to proceed, so that instructions execute in parallel. There are multiple execution units, so use them.
DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F12, F8, F14
Even though ADDD stalls waiting for F0, SUBD has no dependences and can run.
This enables out-of-order execution, and therefore out-of-order completion.
Dynamic pipeline scheduling overcomes the limitations of in-order pipelined execution by allowing out-of-order instruction execution.
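To make the key idea concrete, here is a toy timing model in Python (not the scoreboard itself): each instruction may start no earlier than its position in the stream and only after its sources have been produced, and the in-order variant additionally forbids starting before the previous instruction. The 40-cycle DIVD and 2-cycle ADDD/SUBD latencies are assumptions borrowed from the example later in the deck; `start_cycles` is a hypothetical helper.

```python
# Toy model (assumption): one instruction per cycle may begin, instruction i no
# earlier than cycle i, and only once its source registers are available.
CODE = [
    # (name, destination, sources, assumed latency in cycles)
    ("DIVD", "F0", ["F2", "F4"], 40),
    ("ADDD", "F10", ["F0", "F8"], 2),
    ("SUBD", "F12", ["F8", "F14"], 2),
]

def start_cycles(code, in_order):
    """Return {name: cycle in which execution starts}."""
    ready_at = {}      # register -> cycle its new value becomes available
    starts = {}
    prev_start = -1
    for i, (name, dest, srcs, latency) in enumerate(code):
        start = max([i] + [ready_at.get(r, 0) for r in srcs])
        if in_order:
            # A stalled instruction blocks every instruction behind it.
            start = max(start, prev_start + 1)
        starts[name] = start
        prev_start = start
        ready_at[dest] = start + latency
    return starts

print(start_cycles(CODE, in_order=True))   # {'DIVD': 0, 'ADDD': 40, 'SUBD': 41}
print(start_cycles(CODE, in_order=False))  # {'DIVD': 0, 'ADDD': 40, 'SUBD': 2}
```

Letting SUBD start behind the stalled ADDD moves it from cycle 41 to cycle 2, which is exactly the out-of-order execution (and hence out-of-order completion) described above.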
The hardware dynamically constructs the dependence graph for a window of instructions as they are issued in program order. A scoreboard is a data structure that provides the information necessary for all pieces of the processor to work together.
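A minimal Python sketch of the functional unit status table, assuming the field names used in the example later in the deck (Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk); `FUStatus`, `new_scoreboard`, and `dependence_edges` are hypothetical helpers. The Qj/Qk fields are the edges of the dependence graph the hardware builds, and the helper lists them for the state at clock 12 of the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class FUStatus:
    """One row of the functional unit status table (field names follow the slides)."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # FU that will produce Fj (None: already available)
    qk: Optional[str] = None  # FU that will produce Fk
    rj: bool = False          # Fj ready and not yet read
    rk: bool = False          # Fk ready and not yet read

def new_scoreboard() -> Dict[str, FUStatus]:
    return {n: FUStatus(n) for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}

def dependence_edges(units: Dict[str, FUStatus]) -> List[Tuple[str, str, str]]:
    """(producer FU, consumer FU, register) edges currently recorded in Qj/Qk."""
    edges = []
    for fu in units.values():
        if fu.busy:
            for q, reg in ((fu.qj, fu.fj), (fu.qk, fu.fk)):
                if q is not None:
                    edges.append((q, fu.name, reg))
    return edges

# Clock 12 of the example: DIVD waits for the F0 that MULTD is computing.
units = new_scoreboard()
units["Mult1"] = FUStatus("Mult1", True, "Mult", "F0", "F2", "F4")
units["Divide"] = FUStatus("Divide", True, "Div", "F10", "F0", "F6",
                           qj="Mult1", rj=False, rk=True)
print(dependence_edges(units))  # [('Mult1', 'Divide', 'F0')]
```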
CDC 6600 (1963)
Scoreboards allow an instruction to execute whenever conditions 1 and 2 hold, without waiting for prior instructions. We will use in-order issue, out-of-order execution, and out-of-order commit (also called completion).
Scoreboarding was first used in the CDC 6600 in 1963. Our example has been modified to fit MIPS. The CDC 6600 had 4 FP units, 5 memory reference units, and 7 integer units; our MIPS version has 2 FP multipliers, 1 FP adder, 1 FP divider, and 1 integer unit.
Read operands: A source operand is available if no earlier-issued, still-active instruction is going to write it (i.e., there is no RAW hazard). When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
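The read-operands step can be sketched in Python as follows, assuming the Rj/Rk and Qj/Qk fields from the example; `Operands`, `can_read_operands`, `read_operands`, and `on_write_result` are hypothetical names. Readiness is tracked by the flags recorded at issue time rather than by re-checking the register result status, because a later instruction (ADDD in the example) can become the pending writer of a register that an earlier instruction still needs to read.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Operands:
    """Operand fields of one functional-unit entry (names follow the slides)."""
    fj: str
    fk: str
    qj: Optional[str]  # FU producing Fj, recorded at issue (None: already in a register)
    qk: Optional[str]  # FU producing Fk, recorded at issue
    rj: bool           # Fj? : Fj is ready and not yet read
    rk: bool           # Fk? : Fk is ready and not yet read

def can_read_operands(e: Operands) -> bool:
    """Proceed only when both sources are available, i.e. no RAW hazard remains."""
    return e.rj and e.rk

def read_operands(e: Operands) -> Tuple[str, str]:
    """Read both registers, then clear Rj/Rk so pending writers are no longer blocked."""
    e.rj = e.rk = False
    return e.fj, e.fk

def on_write_result(e: Operands, fu: str) -> None:
    """A producing FU finished: the operand it was computing is now available."""
    if e.qj == fu:
        e.rj = True
    if e.qk == fu:
        e.rk = True

# DIVD F10, F0, F6 as issued at clock 8: F0 comes from Mult1, F6 is already available.
divd = Operands("F0", "F6", qj="Mult1", qk=None, rj=False, rk=True)
print(can_read_operands(divd))  # False (RAW on F0)
on_write_result(divd, "Mult1")  # MULTD writes F0 at clock 20
print(can_read_operands(divd))  # True
print(read_operands(divd))      # ('F0', 'F6'), read at clock 21
```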
A Scoreboard Example
The following code is run on the MIPS with the scoreboard described earlier, using these functional units:
Functional unit (FU)       # of FUs   EX cycles
Integer                        1           1
Floating-point multiply        2          10
Floating-point add             1           2
Floating-point divide          1          40

L.D    F6, 34(R2)
L.D    F2, 45(R3)
MUL.D  F0, F2, F4
SUB.D  F8, F6, F2
DIV.D  F10, F0, F6
ADD.D  F6, F8, F2
Example Code
1  L.D    F6, 34(R2)
2  L.D    F2, 45(R3)
3  MUL.D  F0, F2, F4
4  SUB.D  F8, F6, F2
5  DIV.D  F10, F0, F6
6  ADD.D  F6, F8, F2
Data dependences (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
Output dependence (WAW): (1, 6)
Anti-dependence (WAR): (5, 6)
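The dependence pairs can be checked mechanically. This Python sketch assumes each instruction is described by its destination and source registers (the integer base registers R2 and R3 are included as sources but nothing writes them), and it reports a pair (i, j) only when no instruction between i and j writes the register involved; `dependences` is a hypothetical helper. Besides the pairs on the slide, it also finds the anti-dependence (4, 6) on F6. That one can never cause a stall: ADD.D cannot even read its operands until SUB.D has written F8, which happens after SUB.D has read F6.

```python
from itertools import combinations

# The example code as (opcode, destination, sources).
PROGRAM = [
    ("L.D", "F6", ["R2"]),
    ("L.D", "F2", ["R3"]),
    ("MUL.D", "F0", ["F2", "F4"]),
    ("SUB.D", "F8", ["F6", "F2"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6", ["F8", "F2"]),
]

def dependences(program):
    """Return (RAW, WAR, WAW) sets of 1-based instruction pairs (i, j), i < j."""
    raw, war, waw = set(), set(), set()
    for i, j in combinations(range(len(program)), 2):
        _, dest_i, srcs_i = program[i]
        _, dest_j, srcs_j = program[j]
        written_between = {program[k][1] for k in range(i + 1, j)}
        if dest_i in srcs_j and dest_i not in written_between:
            raw.add((i + 1, j + 1))   # j reads what i writes
        if dest_j in srcs_i and dest_j not in written_between:
            war.add((i + 1, j + 1))   # j overwrites what i reads
        if dest_i == dest_j and dest_i not in written_between:
            waw.add((i + 1, j + 1))   # both write the same register
    return raw, war, waw

raw, war, waw = dependences(PROGRAM)
print(sorted(raw))  # [(1, 4), (1, 5), (2, 3), (2, 4), (2, 6), (3, 5), (4, 6)]
print(sorted(war))  # [(4, 6), (5, 6)]
print(sorted(waw))  # [(1, 6)]
```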
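Scoreboard Example
Instruction status: one row per instruction (LD F6, 34+R2; LD F2, 45+R3; MULTD F0, F2, F4; SUBD F8, F6, F2; DIVD F10, F0, F6; ADDD F6, F8, F2) with the columns Issue, Read operands, Execution complete, and Write result.
Functional unit status: one row per unit (Integer, Mult1, Mult2, Add, Divide) with the columns Time (execution cycles remaining), Busy, Op, dest (Fi), S1 (Fj), S2 (Fk), FU for j (Qj), FU for k (Qk), Fj? (Rj), and Fk? (Rk).
Register result status: for each register F0, F2, F4, ..., F30, the functional unit (FU) that will write it.
At the start, every functional unit has Busy = No and all tables are empty.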
Scoreboard Example: Clock 1
Issue LD #1. The instruction status table shows in which cycle each operation occurred; LD #1 issued in cycle 1.
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes.
Register result status: F6 ← Integer.
Scoreboard Example: Clock 2
LD #2 can't issue because the integer unit is busy. MULT can't issue either, because we require in-order issue.
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes.
Register result status: F6 ← Integer.
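The issue decision at this clock can be sketched as a check on the oldest waiting instruction only. `UNITS_FOR` and `issue_stage` are hypothetical names; the WAW test (stall if an active instruction will write the same destination) is the usual scoreboard issue rule and is assumed here, since this clock only shows the structural hazard.

```python
from typing import Dict, List, Optional, Tuple

# Units that can execute each opcode in the example (hypothetical mapping).
UNITS_FOR = {"LD": ["Integer"], "MULTD": ["Mult1", "Mult2"],
             "ADDD": ["Add"], "SUBD": ["Add"], "DIVD": ["Divide"]}

def issue_stage(queue: List[Tuple[str, str]], busy: Dict[str, bool],
                result: Dict[str, str]) -> Optional[str]:
    """Try to issue the oldest waiting instruction; return its FU, or None to stall.

    queue holds (opcode, destination) pairs in program order. Only queue[0]
    is examined, so a stalled instruction also blocks everything behind it."""
    if not queue:
        return None
    op, dest = queue[0]
    if dest in result:                 # WAW: an active instruction will write dest (assumed rule)
        return None
    free = [u for u in UNITS_FOR[op] if not busy[u]]
    if not free:                       # structural hazard: no unit of this type is free
        return None
    queue.pop(0)
    busy[free[0]] = True               # Busy(FU) <- Yes
    result[dest] = free[0]             # Result(D) <- FU
    return free[0]

# Clock 2: LD #1 still holds the integer unit.
queue = [("LD", "F2"), ("MULTD", "F0"), ("SUBD", "F8")]
busy = {"Integer": True, "Mult1": False, "Mult2": False, "Add": False, "Divide": False}
result = {"F6": "Integer"}
print(issue_stage(queue, busy, result))  # None: LD #2 stalls, and MULTD waits behind it
busy["Integer"] = False                  # LD #1 writes F6 at clock 4
del result["F6"]
print(issue_stage(queue, busy, result))  # Integer (LD #2 issues at clock 5)
print(issue_stage(queue, busy, result))  # Mult1 (MULTD issues at clock 6)
```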
Scoreboard Example: Clock 3
LD #1 completes execution (Issue 1, Read operands 2, Execution complete 3).
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = No.
Scoreboard Example: Clock 4
LD #1 writes its result (Write result 4), so the integer unit is no longer busy.
Integer: Busy = No.
Scoreboard Example: Clock 5
LD #2 issues (Issue 5).
Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = Yes.
Scoreboard Example: Clock 6
Issue MULT.
Integer: Op = Load, Fi = F2, Fk = R3.
Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4.
Register result status: F0 ← Mult1, F2 ← Integer.
Scoreboard Example: Clock 7
SUBD issues to the Add unit.
Integer: Fi = F2, Fk = R3. Mult1: Fi = F0, Fj = F2, Fk = F4. Add: Fi = F8, Fj = F6, Fk = F2.
Register result status: F0 ← Mult1, F2 ← Integer, F8 ← Add.
Scoreboard Example: Clock 8
DIVD issues to the Divide unit.
Register result status: F0 ← Mult1, F2 ← Integer, F8 ← Add, F10 ← Divide.
Scoreboard Example: Clock 8 (continued)
LD #2 writes F2.
Register result status: F0 ← Mult1, F8 ← Add, F10 ← Divide.
Scoreboard Example: Clock 9
Now MULT and SUBD can both read F2. How can both instructions do this at the same time?
Time remaining: Mult1 10 cycles, Add 2 cycles.
Register result status: F0 ← Mult1, F8 ← Add, F10 ← Divide.
Scoreboard Example: Clock 11
SUBD completes execution (Execution complete 11).
Time remaining: Mult1 8 cycles, Add 0 cycles.
Register result status: F0 ← Mult1, F8 ← Add, F10 ← Divide.
Scoreboard Example: Clock 12
SUBD writes F8 (Write result 12), freeing the Add unit. DIVD is still waiting for F0 from MULT.
Time remaining: Mult1 7 cycles.
Mult1: Op = Mult. Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes.
Register result status: F0 ← Mult1, F10 ← Divide.
Scoreboard Example: Clock 13
ADDD issues.
Instruction status so far: LD F6 (1, 2, 3, 4); LD F2 (5, 6, 7, 8); MULTD (issued 6, operands read 9); SUBD (7, 9, 11, 12); DIVD (issued 8); ADDD (issued 13).
Functional units: Integer Busy = No; Mult1 Busy = Yes, Mult F0, F2, F4 (6 cycles remaining); Mult2 Busy = No; Add Busy = Yes, Add F6, F8, F2; Divide Busy = Yes, Div F10, F0, F6.
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 14
ADDD reads its operands (Read operands 14).
Time remaining: Mult1 5 cycles, Add 2 cycles.
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 15
Time remaining: Mult1 4 cycles, Add 1 cycle.
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 16
ADDD completes execution (Execution complete 16).
Time remaining: Mult1 3 cycles, Add 0 cycles.
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 17
Time remaining: Mult1 2 cycles.
ADDD has finished executing but cannot write F6 yet: DIVD has not read F6 (WAR hazard).
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 18
Time remaining: Mult1 1 cycle.
Nothing happens!
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 19
MULT completes execution (Execution complete 19).
Time remaining: Mult1 0 cycles.
Register result status: F0 ← Mult1, F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 20
MULT writes F0 (Write result 20), freeing Mult1. Both of DIVD's operands are now available.
Register result status: F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 21
DIVD reads its operands (Read operands 21).
Register result status: F6 ← Add, F10 ← Divide.
Scoreboard Example: Clock 22
ADDD writes F6 (Write result 22) now that DIVD has read it, freeing the Add unit.
Time remaining: Divide 40 cycles.
Register result status: F10 ← Divide.
Scoreboard Example: Clock 61
DIVD completes execution (Execution complete 61).
Time remaining: Divide 0 cycles.
Register result status: F10 ← Divide.
Scoreboard Example: Clock 62
DIVD writes F10. DONE!
Final instruction status:
Instruction          Issue   Read operands   Execution complete   Write result
LD    F6, 34+R2        1           2                 3                  4
LD    F2, 45+R3        5           6                 7                  8
MULTD F0, F2, F4       6           9                19                 20
SUBD  F8, F6, F2       7           9                11                 12
DIVD  F10, F0, F6      8          21                61                 62
ADDD  F6, F8, F2      13          14                16                 22
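The whole walkthrough can be reproduced with a small cycle-level simulator. This is a Python sketch under stated assumptions, not the CDC 6600 logic: every stage decision in cycle t sees only the state left by cycle t - 1, execution completes EX cycles after the operands are read, and an instruction occupies its functional unit from issue until it writes its result. `Instr`, `example_program`, and `simulate` are hypothetical names. Under these assumptions it reproduces the cycle numbers of the example, including ADDD writing at 22 and DIVD at 62.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

# Unit counts and EX cycles from the example's functional-unit table.
UNITS = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}
LATENCY = {"Integer": 1, "Mult": 10, "Add": 2, "Divide": 40}
UNIT_OF = {"LD": "Integer", "MULTD": "Mult", "SUBD": "Add",
           "ADDD": "Add", "DIVD": "Divide"}

@dataclass
class Instr:
    op: str
    dest: str
    srcs: List[str]
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None   # execution complete
    write: Optional[int] = None

def example_program() -> List[Instr]:
    return [Instr("LD", "F6", ["R2"]), Instr("LD", "F2", ["R3"]),
            Instr("MULTD", "F0", ["F2", "F4"]), Instr("SUBD", "F8", ["F6", "F2"]),
            Instr("DIVD", "F10", ["F0", "F6"]), Instr("ADDD", "F6", ["F8", "F2"])]

def simulate(prog: List[Instr], max_cycles: int = 1000) -> None:
    """Fill in the issue/read/done/write cycle of every instruction."""

    def ready(i: int, reg: str) -> bool:
        # A source of instruction i is available once its most recent earlier
        # producer (in program order) has written it.
        for j in range(i - 1, -1, -1):
            if prog[j].dest == reg:
                return prog[j].write is not None
        return True

    nxt = 0  # index of the next instruction to issue (in-order issue)
    for t in count(1):
        if all(x.write is not None for x in prog):
            return
        if t > max_cycles:
            raise RuntimeError("simulation did not finish")
        # Every decision below looks only at the state left by cycle t - 1.
        active = [x for x in prog if x.issue is not None and x.write is None]
        to_issue = None
        if nxt < len(prog):
            cand = prog[nxt]
            unit = UNIT_OF[cand.op]
            in_use = sum(UNIT_OF[x.op] == unit for x in active)  # structural hazard
            waw = any(x.dest == cand.dest for x in active)       # WAW hazard
            if in_use < UNITS[unit] and not waw:
                to_issue = cand
        # Read operands once no RAW hazard remains.
        to_read = [x for i, x in enumerate(prog)
                   if x.issue is not None and x.read is None
                   and all(ready(i, s) for s in x.srcs)]
        # Write result unless an issued instruction still has to read the old value (WAR).
        to_write = []
        for x in prog:
            if x.done is None or x.done >= t or x.write is not None:
                continue
            war = any(f.issue is not None and f.read is None
                      and any(s == x.dest and ready(k, s) for s in f.srcs)
                      for k, f in enumerate(prog))
            if not war:
                to_write.append(x)
        # Apply this cycle's decisions.
        if to_issue is not None:
            to_issue.issue = t
            nxt += 1
        for x in to_read:
            x.read = t
            x.done = t + LATENCY[UNIT_OF[x.op]]
        for x in to_write:
            x.write = t

prog = example_program()
simulate(prog)
for x in prog:
    print(f"{x.op:5} {x.dest:3} {x.issue:3} {x.read:3} {x.done:3} {x.write:3}")
```

Taking decisions from the previous cycle's state is what makes LD #2 issue in cycle 5 rather than 4, and ADDD write in cycle 22 rather than 21.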
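Scoreboard algorithm: Write result
Wait until: $\forall f\,\big((\mathrm{Fj}(f) \neq \mathrm{Fi}(FU) \lor \mathrm{Rj}(f) = \mathrm{No}) \land (\mathrm{Fk}(f) \neq \mathrm{Fi}(FU) \lor \mathrm{Rk}(f) = \mathrm{No})\big)$
Bookkeeping: $\forall f\,(\text{if } \mathrm{Qj}(f) = FU \text{ then } \mathrm{Rj}(f) \leftarrow \mathrm{Yes})$; $\forall f\,(\text{if } \mathrm{Qk}(f) = FU \text{ then } \mathrm{Rk}(f) \leftarrow \mathrm{Yes})$; $\mathrm{Result}(\mathrm{Fi}(FU)) \leftarrow 0$; $\mathrm{Busy}(FU) \leftarrow \mathrm{No}$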
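The write-result rule translates almost line for line into Python. A sketch, assuming the functional-unit entries are kept in a dictionary keyed by unit name and that a missing register result entry plays the role of 0; `FU`, `can_write_result`, `write_result`, and `clock17` are hypothetical names. The usage starts from clock 17 of the example, where ADDD has finished but must wait because DIVD has not yet read F6.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class FU:
    """Functional-unit entry; only the fields the write-result rule uses."""
    busy: bool = False
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False

def can_write_result(units: Dict[str, FU], name: str) -> bool:
    """Wait until no entry f still has to read the old value of Fi(FU) (no WAR)."""
    dest = units[name].fi
    return all((f.fj != dest or not f.rj) and (f.fk != dest or not f.rk)
               for f in units.values())

def write_result(units: Dict[str, FU], result: Dict[str, str], name: str) -> None:
    """Bookkeeping: wake up consumers, clear Result(Fi(FU)), and free the unit."""
    for f in units.values():
        if f.qj == name:
            f.rj = True
        if f.qk == name:
            f.rk = True
    result.pop(units[name].fi, None)   # Result(Fi(FU)) <- 0
    units[name].busy = False           # Busy(FU) <- No

def clock17() -> Tuple[Dict[str, FU], Dict[str, str]]:
    """State at clock 17 of the example (fields not needed here are omitted)."""
    units = {
        "Integer": FU(),
        "Mult1": FU(True, "F0", "F2", "F4"),   # MULTD executing, operands already read
        "Mult2": FU(),
        "Add": FU(True, "F6", "F8", "F2"),     # ADDD finished, wants to write F6
        "Divide": FU(True, "F10", "F0", "F6", qj="Mult1", rj=False, rk=True),
    }
    return units, {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

units, result = clock17()
print(can_write_result(units, "Add"))            # False: DIVD still needs the old F6
write_result(units, result, "Mult1")             # MULTD writes F0 (clock 20)
units["Divide"].rj = units["Divide"].rk = False  # DIVD reads its operands (clock 21)
print(can_write_result(units, "Add"))            # True
```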
Summary
Techniques to deal with data hazards in instruction pipelines:
- Result forwarding to reduce or eliminate RAW hazards
- Hazard detection hardware to stall the pipeline during hazards
- Compiler-based static scheduling to separate dependent instructions, minimizing actual hazard-prevention stalls in scheduled code (discussed in detail next week)
- A hardware-based mechanism that rearranges instruction execution order dynamically at runtime to reduce stalls (dynamic scheduling), for better dynamic exploitation of instruction-level parallelism (ILP)
We learned the scoreboard technique today; next week we will learn another technique, Tomasulo's algorithm.
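Scoreboard algorithm: Bookkeeping
Issue: $\mathrm{Busy}(FU) \leftarrow \mathrm{Yes}$; $\mathrm{Op}(FU) \leftarrow op$; $\mathrm{Fi}(FU) \leftarrow D$; $\mathrm{Fj}(FU) \leftarrow S1$; $\mathrm{Fk}(FU) \leftarrow S2$; $\mathrm{Qj} \leftarrow \mathrm{Result}(S1)$; $\mathrm{Qk} \leftarrow \mathrm{Result}(S2)$; $\mathrm{Rj} \leftarrow \text{not } \mathrm{Qj}$; $\mathrm{Rk} \leftarrow \text{not } \mathrm{Qk}$; $\mathrm{Result}(D) \leftarrow FU$
Read operands: $\mathrm{Rj} \leftarrow \mathrm{No}$; $\mathrm{Rk} \leftarrow \mathrm{No}$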
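A Python sketch of the issue and read-operands bookkeeping, assuming the register result status is a dictionary from register to FU name, with a missing entry playing the role of 0; `FU`, `issue`, and `read_operands` are hypothetical names, and the stall checks that must pass before issue are not shown. The usage replays DIVD's issue at clock 8, which produces the Qj = Mult1, Rj = No, Rk = Yes entry seen at clock 12.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FU:
    """One row of the functional unit status table."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False

def issue(fu: FU, name: str, result: Dict[str, str],
          op: str, d: str, s1: str, s2: str) -> None:
    """Issue bookkeeping; assumes the issue stall checks have already passed."""
    fu.busy, fu.op = True, op                      # Busy(FU) <- Yes; Op(FU) <- op
    fu.fi, fu.fj, fu.fk = d, s1, s2                # Fi <- D; Fj <- S1; Fk <- S2
    fu.qj, fu.qk = result.get(s1), result.get(s2)  # Qj <- Result(S1); Qk <- Result(S2)
    fu.rj, fu.rk = fu.qj is None, fu.qk is None    # Rj <- not Qj; Rk <- not Qk
    result[d] = name                               # Result(D) <- FU

def read_operands(fu: FU) -> None:
    """Read-operands bookkeeping: Rj <- No; Rk <- No."""
    fu.rj = fu.rk = False

# Clock 8: DIVD F10, F0, F6 issues to the Divide unit.
result = {"F0": "Mult1", "F2": "Integer", "F8": "Add"}
divide = FU()
issue(divide, "Divide", result, "Div", "F10", "F0", "F6")
print(divide.qj, divide.qk, divide.rj, divide.rk)  # Mult1 None False True
print(result["F10"])                                # Divide
```

Reading Qj and Qk before setting Result(D) keeps the bookkeeping correct even when an instruction's destination is also one of its sources.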
Limitations of Scoreboard
- The amount of parallelism available among the instructions (chosen from the same basic block)
- The number of scoreboard entries (the size of the scoreboard determines the size of the window)
- The number and types of functional units (structural hazards increase when dynamic scheduling is used)
- The presence of antidependences and output dependences, which lead to WAR and WAW stalls