Out-Of-Order Completion
Several implementations
• out-of-order completion
    • CDC 6600 with scoreboarding
    • IBM 360/91 with Tomasulo’s algorithm & reservation stations
    • out-of-order completion leads to:
        • imprecise interrupts
        • WAR hazards
        • WAW hazards
• in-order completion
    • MIPS R10000/R12000 & Alpha 21264/21364 with large physical register file & register renaming
    • Intel Pentium Pro/Pentium III with the reorder buffer
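The hazard names used above can be made concrete with a small classifier. This is only an illustrative sketch: the `(dest, sources)` tuple encoding and the function name `hazards` are assumptions made for the example, not part of the slides.

```python
def hazards(first, second):
    """Classify register hazards between two instructions in program order.

    Each instruction is a hypothetical (dest, sources) pair, e.g.
    ("F0", ("F2", "F4")) for an operation that writes F0 and reads F2, F4.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")  # second reads what first writes
    if dest2 in srcs1:
        found.add("WAR")  # second overwrites what first still has to read
    if dest1 == dest2:
        found.add("WAW")  # both write the same register
    return found


divf = ("F0", ("F2", "F4"))   # F0 <- F2 / F4
addf = ("F6", ("F0", "F8"))   # F6 <- F0 + F8
subf = ("F8", ("F10", "F14")) # F8 <- F10 - F14
mulf = ("F6", ("F10", "F8"))  # F6 <- F10 * F8

print(hazards(divf, addf))  # RAW on F0
print(hazards(addf, subf))  # WAR on F8
print(hazards(addf, mulf))  # WAW on F6
```

RAW hazards are true data dependences. WAR and WAW hazards only matter once instructions can complete out of order, which is why they appear in the list above.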
Out-of-order Hardware
Tomasulo’s Algorithm
Hardware for Tomasulo’s Algorithm
Reservation stations
• buffers for functional units that hold instructions stalled for RAW hazards & their operands
• source operands can be values or names of other reservation station entries or load buffer entries that will produce the value
• both operands don’t have to be available at the same time
• when both operand values have been computed, an instruction can be dispatched to its functional unit
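One way to picture a reservation station entry is as a record in which each source operand is either a value or the name (tag) of its producer. The sketch below is a minimal illustration. The field names `vj`, `vk`, `qj`, `qk` follow a common textbook convention and, like the class and method names, are assumptions rather than anything the slides specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str                   # tag other entries use to refer to this one
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # value of operand 1, if already available
    vk: Optional[float] = None  # value of operand 2, if already available
    qj: Optional[str] = None    # tag of the producer of operand 1, if pending
    qk: Optional[str] = None    # tag of the producer of operand 2, if pending

    def ready(self) -> bool:
        """An instruction can be dispatched once neither operand is pending."""
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag: str, value: float) -> None:
        """Fill in any operand that was waiting on the given producer tag."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


# A subtract whose first operand is still being loaded by Load2.
add1 = ReservationStation("Add1", busy=True, op="sub", qj="Load2", vk=2.5)
print(add1.ready())          # operand 1 is still pending
add1.capture("Load2", 10.0)  # the load result arrives
print(add1.ready())          # both operand values are now present
```

The two operands are filled in independently, which reflects the point that they do not have to become available at the same time.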
Reservation Stations
Tomasulo’s Algorithm: More Key Features
Tomasulo’s Algorithm: Execution Steps
Tomasulo functions
(assume the instruction has been fetched)
• issue & read
    • structural hazard detection for reservation stations & load/store buffers
        • issue if no hazard
        • stall if hazard
    • read registers for source operands
        • put the values into the reservation station if the registers hold them
        • otherwise put in the tag of the producing reservation station or load buffer (renaming the registers to eliminate WAR & WAW hazards)
• execute
    • RAW hazard detection
        • snoop on the common data bus for missing operands
        • dispatch the instruction to a functional unit once both operand values have been obtained
    • execute the operation
    • calculate the effective address & start the memory operation
• write
    • broadcast the result & reservation station id (tag) on the common data bus
    • reservation stations, registers & store buffer entries obtain the value through snooping
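The three steps above can be sketched as a toy model. It is a deliberately simplified illustration, not the IBM 360/91 design: it has two adder stations and no load/store buffers, and it merges execute and write into one call. All names (`issue`, `execute_and_write`, `reg_status`) and the register values are assumptions made for the example.

```python
regs = {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0}
reg_status = {}                          # register -> tag of its pending producer
stations = {"Add1": None, "Add2": None}  # None means the station is free
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def issue(op, dest, src1, src2):
    """Issue & read. Returns the station tag, or None on a structural hazard."""
    free = [tag for tag, entry in stations.items() if entry is None]
    if not free:
        return None                      # stall: no reservation station available
    tag = free[0]
    entry = {"op": op, "v": [None, None], "q": [None, None]}
    for i, src in enumerate((src1, src2)):
        if src in reg_status:
            entry["q"][i] = reg_status[src]  # value pending: record producer tag
        else:
            entry["v"][i] = regs[src]        # value available: copy it now
    stations[tag] = entry
    reg_status[dest] = tag               # rename dest to this station's tag
    return tag


def execute_and_write(tag):
    """Execute if both operands are present, then broadcast (tag, result)."""
    entry = stations[tag]
    if entry is None or any(q is not None for q in entry["q"]):
        return False                     # RAW hazard: still waiting on an operand
    result = OPS[entry["op"]](*entry["v"])
    stations[tag] = None
    for other in stations.values():      # waiting stations snoop the broadcast
        if other is not None:
            for i in range(2):
                if other["q"][i] == tag:
                    other["v"][i], other["q"][i] = result, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:              # register file accepts matching tag only
            regs[reg] = result
            del reg_status[reg]
    return True


t1 = issue("add", "F0", "F2", "F4")      # F0 <- F2 + F4
t2 = issue("sub", "F6", "F0", "F2")      # F6 <- F0 - F2, waits on t1
t3 = issue("add", "F2", "F4", "F4")      # no free station: structural hazard
blocked = execute_and_write(t2)          # cannot run yet, F0 is pending
execute_and_write(t1)
execute_and_write(t2)
print(t1, t2, t3, blocked, regs)
```

Because the subtract copies F2 at issue time, a later instruction that overwrites F2 could not disturb it, which is how copying operands at issue removes WAR hazards in this model.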
Tomasulo’s Algorithm: State
[Figure not recovered: state after the first load has executed]
Example in the Book: 2
Instruction Status Table
[Table not recovered: the second load has executed. Tag shown in the remaining entries: Load2]
Autumn 2006 CSE P548 - Tomasulo 17
[Table not recovered: the subtract has executed. Tags shown in the remaining entries: Load2, Add1]
Example in the Book: 4
Instruction Status Table
[Table not recovered: the add has executed. Tag shown in the remaining entries: Load2]
[Table not recovered: the multiply has executed]
Tomasulo’s Algorithm
• the addf and st in each iteration have a different tag for the F0 value
• only the last iteration writes to F0
• effectively completely unrolling the loop
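The renaming effect described above can be sketched by tracking only the register status entry for F0 across iterations. The loop body is not reproduced in this text, so the sketch assumes a hypothetical body `ld F0 ; addf F0, F0, F2 ; st F0`. It also assumes one fresh load buffer and adder tag per iteration, so issue never stalls. The function names are made up for the example.

```python
def issue_loop(iterations):
    """Issue the assumed loop body several times, recording the tags each reader sees."""
    reg_status = {}
    issued = []
    for i in range(1, iterations + 1):
        load_tag, add_tag = f"Load{i}", f"Add{i}"
        reg_status["F0"] = load_tag       # ld F0: F0 renamed to this load's tag
        addf_src = reg_status["F0"]       # addf reads F0 via the load's tag
        reg_status["F0"] = add_tag        # addf writes F0: renamed again
        st_src = reg_status["F0"]         # st reads F0 via the addf's tag
        issued.append({"iter": i, "addf_reads": addf_src, "st_reads": st_src})
    return issued, reg_status


def register_writes(iterations, reg_status):
    """Broadcast every result; the register file accepts only a matching tag."""
    writes = []
    for i in range(1, iterations + 1):
        for tag in (f"Load{i}", f"Add{i}"):
            if reg_status.get("F0") == tag:
                writes.append(tag)
    return writes


issued, status = issue_loop(3)
for entry in issued:
    print(entry)
print(status)                      # F0 is still mapped to the last addf's tag
print(register_writes(3, status))  # only that last result reaches register F0
```

Every iteration's addf and st wait on their own tags, so the iterations do not interfere through F0 even though they all name the same architectural register.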
Dynamic Scheduling
Use both!