Memory dependence prediction

Memory dependence prediction is a technique, employed by high-performance out-of-order execution microprocessors that execute memory access operations (loads and stores) out of program order, to predict true dependencies between loads and stores at instruction execution time. With the predicted dependence information, the processor can then decide to speculatively execute certain loads and stores out of order, while preventing other loads and stores from executing out-of-order (keeping them in-order). Later in the pipeline, memory disambiguation techniques are used to determine if the loads and stores were correctly executed and, if not, to recover.

By using the memory dependence predictor to keep most dependent loads and stores in order, the processor gains the benefits of aggressive out-of-order load/store execution but avoids many of the memory dependence violations that occur when loads and stores were incorrectly executed. This increases performance because it reduces the number of pipeline flushes that are required to recover from these memory dependence violations. See the memory disambiguation article for more information on memory dependencies, memory dependence violations, and recovery.

In general, memory dependence prediction predicts whether two memory operations are dependent, that is, if they interact by accessing the same memory location. Besides using store to load (RAW or true) memory dependence prediction for the out-of-order scheduling of loads and stores, other applications of memory dependence prediction have been proposed. See for example.^[1]

Memory dependence prediction is an optimization on top of memory dependency speculation. Sequential execution semantics imply that stores and loads appear to execute in the order specified by the program. However, as with out-of-order execution of other instructions, it may be possible to execute two memory operations in a different order from that implied by the program. This is possible when the two operations are independent. In memory dependence speculation a load may be allowed to execute before a store that precedes it. Speculation succeeds when the load is independent of the store, that is, when the two instructions access different memory locations. Speculation fails when the load is dependent upon the store, that is, when the two accesses overlap in memory. In the first, modern out-of-order designs, memory speculation was not used as its benefits were limited. Αs the scope of the out-of-order execution increased over few tens of instructions, naive memory dependence speculation was used. In naive memory dependence speculation,^[2] a load is allowed to bypass any preceding store. As with any form of speculation, it is important to weight the benefits of correct speculation against the penalty paid on incorrect speculation. As the scope of out-of-order execution increases further into several tens of instructions, the performance benefits of naive speculation decrease. To retain the benefits of aggressive memory dependence speculation while avoiding the costs of mispeculation several predictors have been proposed.

Selective memory dependence prediction^[2]^[3] stalls specific loads until it is certain that no violation may occur. It does not explicitly predict dependencies. This predictor may delay loads longer than necessary and hence result in suboptimal performance. In fact, in some cases it performs worse than naively speculating all loads as early as possible. This is because often its faster to mispeculate and recover than to wait for all preceding stores to execute. Exact memory dependence prediction was developed at the University of Wisconsin–Madison. Specifically, Dynamic Speculation and Synchronization^[2]^[3] delays loads only as long as it is necessary by predicting the exact store a load should wait for. This predictor predicts exact dependences (store and load pair). The synonym predictor^[1] groups together all dependences that share a common load or store instruction. The store sets^[4] predictor represents multiple potential dependences efficiently by grouping together all possible stores a load may depend upon. The store barrier^[5] predictor treats certain store instructions as barriers. That is, all subsequent load or store operations are not allowed to bypass the specific store. The store barrier predictor does not explicitly predict dependencies. This predictor may unnecessarily delay subsequent, yet independent loads. Memory dependence prediction has other applications beyond the scheduling of loads and stores. For example, speculative memory cloaking^[1] and speculative memory bypassing^[1] use memory dependence prediction to streamline the communication of values through memory.

Analogy to branch prediction

Memory dependence prediction for loads and stores is analogous to branch prediction for conditional branch instructions. In branch prediction, the branch predictor predicts which way the branch will resolve before it is known. The processor can then speculatively fetch and execute instructions down one of the paths of the branch. Later, when the branch instruction executes, it can be determined if the branch instruction was correctly predicted. If not, this is a branch misprediction, and a pipeline flush is necessary to throw away instructions that were speculatively fetched and executed.

Branch prediction can be thought of as a two step process. First, the predictor determines the direction of the branch (taken or not). This is a binary decision. Then, the predictor determines the actual target address. Similarly, memory dependence prediction can be thought of as a two step process. First, the predictor determines whether there is a dependence. Then it determines which this dependence is.

References

^ ^a ^b ^c ^d Moshovos, A.; Sohi, G. S. (1997). "Streamlining Inter-Operation Memory Communication via Data Dependence Prediction". Proceedings of 30th Annual International Symposium on Microarchitecture. MICRO'97. pp. 235–245. doi:10.1109/MICRO.1997.645814.
^ ^a ^b ^c Moshovos, Andreas; Breach, Scott E.; Vijaykumar, T. N.; Sohi, Gurindar S. (1997). "Dynamic Speculation and Synchronization of Data Dependences". Proceedings of the 24th annual international symposium on Computer architecture. ISCA'97. pp. 181–193. doi:10.1145/264107.264189. Also as technical report, Computer Sciences Department, University of Wisconsin–Madison, March 1996.
^ ^a ^b Memory Dependence Prediction, Moshovos, Ph.D. Thesis, Computer Sciences Department, University of Wisconsin–Madison, Dec. 1998.
^ Chrysos, G. Z.; Emer, J. S. (1998). "Memory Dependence Prediction Using Store Sets". Proceedings 25th Annual International Symposium on Computer Architecture. ISCA'98. pp. 142–153. doi:10.1109/ISCA.1998.694770.
^ Apparatus to dynamically control the out-of-order execution of load-store instructions in a processor capable of dispatching, issuing and executing multiple instructions in a single processor cycle, Hesson, LeBlanc and Ciavaglia, IBM, US Patent 5,615,350, March 1997.

[SIOMCvMDP-1] Moshovos, A.; Sohi, G. S. (1997). "Streamlining Inter-Operation Memory Communication via Data Dependence Prediction". Proceedings of 30th Annual International Symposium on Microarchitecture. MICRO'97. pp. 235–245. doi:10.1109/MICRO.1997.645814.

[SaSoMD-2] Moshovos, Andreas; Breach, Scott E.; Vijaykumar, T. N.; Sohi, Gurindar S. (1997). "Dynamic Speculation and Synchronization of Data Dependences". Proceedings of the 24th annual international symposium on Computer architecture. ISCA'97. pp. 181–193. doi:10.1145/264107.264189. Also as technical report, Computer Sciences Department, University of Wisconsin–Madison, March 1996.

[MDP-3] Memory Dependence Prediction, Moshovos, Ph.D. Thesis, Computer Sciences Department, University of Wisconsin–Madison, Dec. 1998.

[4] Chrysos, G. Z.; Emer, J. S. (1998). "Memory Dependence Prediction Using Store Sets". Proceedings 25th Annual International Symposium on Computer Architecture. ISCA'98. pp. 142–153. doi:10.1109/ISCA.1998.694770.

[5] Apparatus to dynamically control the out-of-order execution of load-store instructions in a processor capable of dispatching, issuing and executing multiple instructions in a single processor cycle, Hesson, LeBlanc and Ciavaglia, IBM, US Patent 5,615,350, March 1997.

[1]

[2]

[3]

[4]

[5]

Analogy to branch prediction

See also

References