Correlation-based prefetching is a technique to observe and record spatial links between temporal events and address references in a processor with the goal of predicting what data will be needed by the processor. The main contribution of this dissertation is that correlation-based prefetching is shown to be an effective technique for prefetching data into a data cache, by observing the cache-miss traffic.
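As a rough illustration of the idea, the sketch below records, for each cache-miss address, the miss that followed it, and uses that record as a prefetch candidate the next time the same miss recurs. It is a minimal toy model and not the dissertation's design: the class name `PairCorrelationPrefetcher`, the LRU replacement policy, and the helper `count_correct` are assumptions made for the example.

```python
from collections import OrderedDict


class PairCorrelationPrefetcher:
    """Toy miss-correlation table (hypothetical): trigger miss -> following miss."""

    def __init__(self, capacity=1024):
        self.capacity = capacity      # maximum number of stored pairs (assumed)
        self.table = OrderedDict()    # pair storage kept in LRU order
        self.last_miss = None         # most recent miss address seen

    def on_miss(self, addr):
        """Record (previous miss -> addr) and return a prefetch candidate or None."""
        if self.last_miss is not None:
            self.table[self.last_miss] = addr
            self.table.move_to_end(self.last_miss)
            if len(self.table) > self.capacity:
                self.table.popitem(last=False)  # evict least recently updated pair
        self.last_miss = addr
        return self.table.get(addr)


def count_correct(miss_stream, capacity=1024):
    """Count misses whose address matched the candidate predicted one miss earlier."""
    prefetcher = PairCorrelationPrefetcher(capacity)
    predicted = None
    correct = 0
    for addr in miss_stream:
        if predicted == addr:
            correct += 1
        predicted = prefetcher.on_miss(addr)
    return correct
```

On a miss stream that repeats the same three unrelated addresses, the first pass only trains the table; every later miss except the first of the second pass is then predicted, even though the addresses follow no stride.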
This thesis examines the issues that are critical to the performance of correlation-based prefetching. First, it studies intrinsic qualities of test programs that indicate the potential for using correlations for prefetching; metrics of linked and strided spatial locality serve as these indicators.
Hardware prefetching works because there often exist predictable patterns in the references of programs. Hardware correlation-based prefetching seeks to allow prefetching in the situations not easily handled by compilers or characterized by simple stride patterns. It is shown to be very efficient in such cases. Correlation-based prefetching is a complementary technique to compiler-controlled prefetching or stride-based prefetching done in hardware.
Specifically, this thesis examines mechanisms for construction of temporal events that are used as triggers for prefetching. These temporal events should be good indicators of the context of a program. A flexible mechanism for pairing temporal events with addresses, called the horizon and skip model, is shown to be useful for increasing the accuracy of prefetching for some applications.
Confirmation mechanisms are studied. They allow for a tradeoff between the coverage of memory accesses predicted and accuracy of prefetching.
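The abstract does not describe the confirmation mechanisms themselves, so the following is only a sketch of one common form such a mechanism could take: each stored pair carries a small saturating counter, and a prefetch is issued only once the counter reaches a threshold. The function name `simulate`, the 0..3 counter range, and the definitions of coverage (correct prefetches over all misses) and accuracy (correct prefetches over issued prefetches) are assumptions for illustration.

```python
def simulate(miss_stream, threshold):
    """Return (coverage, accuracy) for a pair predictor gated by a confidence threshold."""
    table = {}        # trigger miss -> [successor, confidence counter in 0..3]
    last = None       # previous miss address
    pending = None    # address predicted for the upcoming miss, if any
    issued = correct = 0
    for addr in miss_stream:
        if pending is not None:
            issued += 1
            correct += (pending == addr)
        if last is not None:
            entry = table.get(last)
            if entry is None:
                table[last] = [addr, 0]            # new, unconfirmed pair
            elif entry[0] == addr:
                entry[1] = min(entry[1] + 1, 3)    # pair confirmed again
            elif entry[1] > 0:
                entry[1] -= 1                      # pair contradicted, lose confidence
            else:
                entry[0] = addr                    # replace successor at zero confidence
        entry = table.get(addr)
        pending = entry[0] if entry and entry[1] >= threshold else None
        last = addr
    coverage = correct / len(miss_stream) if miss_stream else 0.0
    accuracy = correct / issued if issued else 0.0
    return coverage, accuracy
```

For the miss stream `[1, 2, 3, 10, 1, 2, 3, 11, 1, 2, 3, 12, 1, 2, 3, 13]`, a threshold of 0 gives coverage 0.375 with accuracy 2/3, while a threshold of 1 gives coverage 0.25 with accuracy 1.0: the unconfirmed pair after address 3 is no longer prefetched, at the cost of a slower warm-up on the stable pairs.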
An important issue in prefetching is where to store the accumulated prefetching meta-data. We examine the effects of varying the capacity of a table used to store this meta-data as well as a novel mechanism for storing the meta-data in a large, pre-existing secondary cache. We examine an alternative approach involving a hybrid prefetcher that combines a stride prefetcher with a pure correlation-based prefetcher.
The hybrid mechanism is shown to dramatically reduce the pair storage requirements of correlation-based prefetching.
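One way to see why a hybrid could reduce pair storage is sketched below, under the assumption that misses continuing a constant stride are handled by the stride component and therefore need no stored pair. The function `pairs_stored` and its single global stride detector are simplifications invented for this example, not the mechanism evaluated in the dissertation.

```python
def pairs_stored(miss_stream, use_stride_filter):
    """Count the trigger entries a pair table would hold for this miss stream."""
    table = {}
    prev = None          # previous miss address
    prev_delta = None    # previous miss-to-miss address delta
    for addr in miss_stream:
        if prev is not None:
            delta = addr - prev
            # A transition is "strided" if it repeats the previous delta;
            # the stride component is assumed to predict it without a pair.
            strided = use_stride_filter and delta == prev_delta
            if not strided:
                table[prev] = addr
            prev_delta = delta
        prev = addr
    return len(table)
```

For ten misses at a 64-byte stride followed by three irregular addresses, the unfiltered table holds 12 pairs and the filtered one holds 4, since only the first strided transition and the irregular transitions are recorded.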
Cited By
- Naderan-Tahan M and Sarbazi-Azad H (2014). Adaptive prefetching using global history buffer in multicore processors, The Journal of Supercomputing, 68:3, (1302-1320), Online publication date: 1-Jun-2014.
- Tang J, Liu C, Liu S and Gaudiot J (2013). Practical models for energy-efficient prefetching in mobile embedded systems, Microprocessors & Microsystems, 37:8, (1173-1182), Online publication date: 1-Nov-2013.
- Zhuang X and Lee H (2007). Reducing Cache Pollution via Dynamic Data Prefetch Filtering, IEEE Transactions on Computers, 56:1, (18-31), Online publication date: 1-Jan-2007.
- Yang Z, Shi X, Su F and Peir J Overlapping dependent loads with addressless preload Proceedings of the 15th international conference on Parallel architectures and compilation techniques, (275-284)
- Shi X, Yang Z, Peir J, Peng L, Chen Y, Lee V and Liang B Coterminous locality and coterminous group data prefetching on chip-multiprocessors Proceedings of the 20th international conference on Parallel and distributed processing, (89-89)
- Mutlu O, Kim H and Patt Y (2006). Address-Value Delta (AVD) Prediction, IEEE Transactions on Computers, 55:12, (1491-1508), Online publication date: 1-Dec-2006.
- Rogers B, Solihin Y and Prvulovic M (2005). Memory predecryption, ACM SIGARCH Computer Architecture News, 33:1, (27-33), Online publication date: 1-Mar-2005.
- Puzak T, Hartstein A, Emma P and Srinivasan V When prefetching improves/degrades performance Proceedings of the 2nd conference on Computing frontiers, (342-352)
- Srinivasan V, Davidson E and Tyson G (2004). A Prefetch Taxonomy, IEEE Transactions on Computers, 53:2, (126-140), Online publication date: 1-Feb-2004.
- Sair S, Sherwood T and Calder B (2003). A Decoupled Predictor-Directed Stream Prefetching Architecture, IEEE Transactions on Computers, 52:3, (260-276), Online publication date: 1-Mar-2003.
- Solihin Y, Lee J and Torrellas J (2003). Correlation Prefetching with a User-Level Memory Thread, IEEE Transactions on Parallel and Distributed Systems, 14:6, (563-580), Online publication date: 1-Jun-2003.
- Luk C, Muth R, Patil H, Weiss R, Lowney P and Cohn R Profile-guided post-link stride prefetching Proceedings of the 16th international conference on Supercomputing, (167-178)
- Solihin Y, Lee J and Torrellas J (2002). Using a user-level memory thread for correlation prefetching, ACM SIGARCH Computer Architecture News, 30:2, (171-182), Online publication date: 1-May-2002.
- Cooksey R, Jourdan S and Grunwald D A stateless, content-directed data prefetching mechanism Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, (279-290)
- Cooksey R, Jourdan S and Grunwald D (2002). A stateless, content-directed data prefetching mechanism, ACM SIGPLAN Notices, 37:10, (279-290), Online publication date: 1-Oct-2002.
- Cooksey R, Jourdan S and Grunwald D (2002). A stateless, content-directed data prefetching mechanism, ACM SIGARCH Computer Architecture News, 30:5, (279-290), Online publication date: 1-Dec-2002.
- Cooksey R, Jourdan S and Grunwald D (2002). A stateless, content-directed data prefetching mechanism, ACM SIGOPS Operating Systems Review, 36:5, (279-290), Online publication date: 1-Dec-2002.
- Solihin Y, Lee J and Torrellas J Using a user-level memory thread for correlation prefetching Proceedings of the 29th annual international symposium on Computer architecture, (171-182)
- Sherwood T, Sair S and Calder B Predictor-directed stream buffers Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, (42-53)
- Joseph D and Grunwald D (1999). Prefetching Using Markov Predictors, IEEE Transactions on Computers, 48:2, (121-133), Online publication date: 1-Feb-1999.
Index Terms
- Correlation-based hardware prefetching