research-article

High performance cache replacement using re-reference interval prediction (RRIP)

Published: 19 June 2010

Abstract

Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform badly under LRU. Such applications usually have a working set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP), which is scan-resistant, and Dynamic RRIP (DRRIP), which is both scan-resistant and thrash-resistant. Both RRIP policies require only 2 bits per cache block and easily integrate into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 4% and 10%, respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 7% and 9%, respectively. We also show that RRIP outperforms LFU, the state-of-the-art scan-resistant replacement algorithm to date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.
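The abstract describes the policies only at a high level. As a rough illustration, the Python sketch below models one common reading of the mechanism: each block carries a 2-bit re-reference prediction value (RRPV); a hit resets it to 0 (near-immediate); SRRIP inserts missing blocks at 2 ("long") instead of 0; the victim is the first block at 3 ("distant"), aging the whole set until one exists; and DRRIP chooses between SRRIP and a bimodal variant (BRRIP) through set dueling. The insertion values, the 1/32 BRRIP probability, the 10-bit PSEL counter, the leader-set spacing, and the names `RRIPSet` and `DRRIPCache` are all assumptions for illustration, not the paper's hardware design.

```python
import random
from typing import List, Optional

# Assumed encoding: only the 2-bit width comes from the abstract.
RRPV_BITS = 2
RRPV_DISTANT = (1 << RRPV_BITS) - 1   # 3: predicted distant re-reference
RRPV_LONG = RRPV_DISTANT - 1          # 2: predicted long re-reference


class RRIPSet:
    """One cache set; each way holds a tag and a 2-bit RRPV (hypothetical model)."""

    def __init__(self, ways: int) -> None:
        self.tags: List[Optional[int]] = [None] * ways
        self.rrpv: List[int] = [RRPV_DISTANT] * ways

    def lookup(self, tag: int) -> bool:
        """On a hit, predict a near-immediate re-reference (RRPV 0)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0
            return True
        return False

    def fill(self, tag: int, insert_rrpv: int) -> None:
        """Place a missing block, evicting a block predicted 'distant' if needed."""
        way = self._victim()
        self.tags[way] = tag
        self.rrpv[way] = insert_rrpv

    def _victim(self) -> int:
        if None in self.tags:                 # use an empty way first
            return self.tags.index(None)
        while True:
            if RRPV_DISTANT in self.rrpv:     # first block predicted distant
                return self.rrpv.index(RRPV_DISTANT)
            # No candidate yet: age every block by one step and retry.
            self.rrpv = [v + 1 for v in self.rrpv]


class DRRIPCache:
    """Hypothetical DRRIP model: SRRIP and BRRIP leader sets duel via a PSEL counter."""

    def __init__(self, num_sets: int, ways: int, leader_stride: int = 32,
                 psel_bits: int = 10, epsilon: float = 1 / 32, seed: int = 0) -> None:
        self.sets = [RRIPSet(ways) for _ in range(num_sets)]
        self.leader_stride = leader_stride
        self.psel_max = (1 << psel_bits) - 1
        self.psel = self.psel_max // 2
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def _policy(self, set_index: int) -> str:
        slot = set_index % self.leader_stride
        if slot == 0:
            return "srrip"                    # SRRIP leader set
        if slot == 1:
            return "brrip"                    # BRRIP leader set
        # Followers copy whichever leader policy is currently missing less.
        return "brrip" if self.psel > self.psel_max // 2 else "srrip"

    def _insert_rrpv(self, policy: str) -> int:
        if policy == "srrip":
            return RRPV_LONG                  # scan resistance: never insert as near-immediate
        # BRRIP: mostly 'distant', occasionally 'long' (thrash resistance).
        return RRPV_LONG if self.rng.random() < self.epsilon else RRPV_DISTANT

    def access(self, block_addr: int) -> bool:
        """Access one block address; return True on a hit."""
        set_index = block_addr % len(self.sets)
        tag = block_addr // len(self.sets)
        cache_set = self.sets[set_index]
        if cache_set.lookup(tag):
            return True
        cache_set.fill(tag, self._insert_rrpv(self._policy(set_index)))
        slot = set_index % self.leader_stride
        if slot == 0:                         # miss in an SRRIP leader favours BRRIP
            self.psel = min(self.psel + 1, self.psel_max)
        elif slot == 1:                       # miss in a BRRIP leader favours SRRIP
            self.psel = max(self.psel - 1, 0)
        return False


if __name__ == "__main__":
    cache = DRRIPCache(num_sets=1, ways=4)    # one set, so it is the SRRIP leader
    pattern = [0, 1, 0, 1, 10, 11, 12, 13, 14, 15, 0, 1]
    print([cache.access(a) for a in pattern])
```

Run as a script, it prints `[False, False, True, True, False, False, False, False, False, False, True, True]`: blocks 0 and 1, reused once before a six-block scan, are still resident afterwards, whereas true LRU with 4 ways would have evicted both by the fourth scan block. Inserting at "long" rather than "near-immediate" is what gives the scan its shorter lifetime in this model.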

      Published In

      ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
      ISCA '10
      June 2010
      508 pages
      ISSN: 0163-5964
      DOI: 10.1145/1816038
      • ISCA '10: Proceedings of the 37th Annual International Symposium on Computer Architecture
        June 2010
        520 pages
        ISBN: 9781450300537
        DOI: 10.1145/1815961
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 June 2010
      Published in SIGARCH Volume 38, Issue 3

      Author Tags

      1. replacement
      2. scan resistance
      3. shared cache
      4. thrashing

      Article Metrics

      • Downloads (last 12 months): 1,280
      • Downloads (last 6 weeks): 89
      Reflects downloads up to 06 Feb 2025

      Cited By

      • (2024) Reuse distance-based shared LLC management mechanism for heterogeneous CPU-GPU systems. IEICE Electronics Express 21:4, 20230520. DOI: 10.1587/elex.21.20230520. Online publication date: 25-Feb-2024
      • (2024) Optimizing Video Caching and Transcoding in Multi-Access Edge Computing Using Deep Reinforcement Learning. Proceedings of the 2024 4th International Conference on Artificial Intelligence, Automation and High Performance Computing, pp. 345-351. DOI: 10.1145/3690931.3690989. Online publication date: 19-Jul-2024
      • (2024) Chimera: Leveraging Hybrid Offsets for Efficient Data Prefetching. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 144-155. DOI: 10.1145/3656019.3689613. Online publication date: 14-Oct-2024
      • (2024) GMT: GPU Orchestrated Memory Tiering for the Big Data Era. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 464-478. DOI: 10.1145/3620666.3651353. Online publication date: 27-Apr-2024
      • (2024) Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs. IEEE Transactions on Parallel and Distributed Systems 35:1, pp. 89-104. DOI: 10.1109/TPDS.2023.3332711. Online publication date: 1-Jan-2024
      • (2024) Randomizing Set-Associative Caches Against Conflict-Based Cache Side-Channel Attacks. IEEE Transactions on Computers 73:4, pp. 1019-1033. DOI: 10.1109/TC.2024.3349659. Online publication date: Apr-2024
      • (2024) Enabling HW-Based Task Scheduling in Large Multicore Architectures. IEEE Transactions on Computers 73:1, pp. 138-151. DOI: 10.1109/TC.2023.3323781. Online publication date: 1-Jan-2024
      • (2024) Collaborative Content Caching in IIoT: A Multi-Agent Reinforcement Learning-Based Approach. 2024 IEEE International Conference on Smart Internet of Things (SmartIoT), pp. 508-515. DOI: 10.1109/SmartIoT62235.2024.00084. Online publication date: 14-Nov-2024
      • (2024) Duration-based Instruction Cache Locking. 2024 IEEE 30th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 85-90. DOI: 10.1109/RTCSA62462.2024.00021. Online publication date: 21-Aug-2024
      • (2024) Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 88-102. DOI: 10.1109/ISCA59077.2024.00017. Online publication date: 29-Jun-2024
