High performance cache replacement using re-reference interval prediction (RRIP)

Authors:

Kevin B. Theobald,

Simon C. Steely, Jr.,

Joel Emer

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3

Pages 60 - 71

https://doi.org/10.1145/1816038.1815971

Published: 19 June 2010

Abstract

Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform badly under LRU. Such applications usually have a working set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP), which is scan-resistant, and Dynamic RRIP (DRRIP), which is both scan-resistant and thrash-resistant. Both RRIP policies require only 2 bits per cache block and easily integrate into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 4% and 10%, respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 7% and 9%, respectively. We also show that RRIP outperforms LFU, the state-of-the-art scan-resistant replacement algorithm to date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.
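The abstract describes RRIP only at a high level. The Python sketch below shows one way a 2-bit scheme of this kind can work. The details are assumptions, not taken from the abstract. Each block holds a 2-bit re-reference prediction value (RRPV), where 0 means near-immediate and 3 means distant. A hit resets the block's RRPV to 0. SRRIP inserts new blocks at 2 ("long"), so a short scan cannot immediately displace blocks that have been re-referenced. The victim is any block at 3. If no such block exists, every RRPV in the set is incremented (aged) and the search repeats.

For DRRIP, the sketch assumes a bimodal variant (BRRIP) that inserts most blocks at 3 and only occasionally at 2. It also assumes set dueling: a few fixed leader sets always run SRRIP or BRRIP, and a saturating counter (PSEL) chooses the policy for all remaining sets. The class name `RRIPCache`, the 1/32 bimodal probability, the leader-set selection, and the 10-bit PSEL width are all hypothetical choices made for illustration.

```python
import random

M_BITS = 2                          # RRPV width: 2 bits per cache block
RRPV_DISTANT = (1 << M_BITS) - 1    # 3: distant re-reference, eviction candidate
RRPV_LONG = RRPV_DISTANT - 1        # 2: long re-reference, SRRIP insertion value
PSEL_MAX = (1 << 10) - 1            # assumed 10-bit saturating policy selector
LEADER_PERIOD = 32                  # assumed: set % 32 == 0 is an SRRIP leader, == 1 a BRRIP leader


class RRIPCache:
    """Set-associative cache with RRIP replacement (illustrative sketch)."""

    def __init__(self, num_sets, ways, policy="drrip", bimodal_prob=1 / 32, seed=0):
        if policy not in ("srrip", "brrip", "drrip"):
            raise ValueError(f"unknown policy: {policy}")
        self.ways = ways
        self.policy = policy
        self.bimodal_prob = bimodal_prob
        self.rng = random.Random(seed)
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.rrpv = [[RRPV_DISTANT] * ways for _ in range(num_sets)]
        self.psel = PSEL_MAX // 2

    def _insertion_policy(self, s):
        """Policy used to insert into set s (DRRIP chooses via set dueling)."""
        if self.policy != "drrip":
            return self.policy
        if s % LEADER_PERIOD == 0:
            return "srrip"
        if s % LEADER_PERIOD == 1:
            return "brrip"
        # Followers: a high PSEL means the SRRIP leaders miss more, so use BRRIP.
        return "brrip" if self.psel > PSEL_MAX // 2 else "srrip"

    def _find_victim(self, s):
        tags, rrpv = self.tags[s], self.rrpv[s]
        if None in tags:
            return tags.index(None)          # fill an invalid way first
        while True:
            for w in range(self.ways):
                if rrpv[w] == RRPV_DISTANT:
                    return w
            for w in range(self.ways):       # no distant block: age the whole set
                rrpv[w] += 1

    def access(self, s, tag):
        """Look up tag in set s; return True on a hit, False on a miss."""
        tags, rrpv = self.tags[s], self.rrpv[s]
        if tag in tags:
            rrpv[tags.index(tag)] = 0        # hit: predict near-immediate re-reference
            return True
        policy = self._insertion_policy(s)
        if self.policy == "drrip":           # misses in leader sets train PSEL
            if s % LEADER_PERIOD == 0:
                self.psel = min(self.psel + 1, PSEL_MAX)
            elif s % LEADER_PERIOD == 1:
                self.psel = max(self.psel - 1, 0)
        w = self._find_victim(s)
        tags[w] = tag
        if policy == "srrip" or self.rng.random() < self.bimodal_prob:
            rrpv[w] = RRPV_LONG              # SRRIP always; BRRIP only occasionally
        else:
            rrpv[w] = RRPV_DISTANT           # BRRIP usually predicts distant
        return False


if __name__ == "__main__":
    cache = RRIPCache(num_sets=1, ways=4, policy="srrip")
    for tag in ["A", "B", "A", "B"]:         # A and B are each re-referenced once
        cache.access(0, tag)
    for i in range(4):                       # a short scan of one-time blocks
        cache.access(0, f"scan{i}")
    print(cache.access(0, "A"), cache.access(0, "B"))  # True True
```

The demo uses a single 4-way set. A and B are each re-referenced once, and then a scan of four one-time blocks passes through. Under this SRRIP sketch, both A and B remain resident. True LRU would evict both, because A and B are the least recently used blocks when the third and fourth scan blocks arrive. A sufficiently long scan eventually ages A and B out as well, so with 2-bit RRPVs the protection is bounded.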

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ACM SIGARCH Computer Architecture News Volume 38, Issue 3

ISCA '10

June 2010

508 pages

ISSN:0163-5964

DOI:10.1145/1816038

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010
520 pages
ISBN:9781450300537
DOI:10.1145/1815961
General Chair: André Seznec (INRIA Rennes)
Program Chairs: Uri Weiser (Technion), Ronny Ronen (Intel)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 June 2010

Published in SIGARCH Volume 38, Issue 3

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Total Citations: 653
Total Downloads: 8,979
Downloads (last 12 months): 1,280
Downloads (last 6 weeks): 89

Reflects downloads up to 06 Feb 2025

Citations

Liu J, Egawa R, Takahashi K, Shimomura Y, Takizawa H (2024). Reuse distance-based shared LLC management mechanism for heterogeneous CPU-GPU systems. IEICE Electronics Express 21:4, 20230520. Online publication date: 25-Feb-2024.
https://doi.org/10.1587/elex.21.20230520
Lan S, Lai W, Wang Z (2024). Optimizing Video Caching and Transcoding in Multi-Access Edge Computing Using Deep Reinforcement Learning. Proceedings of the 2024 4th International Conference on Artificial Intelligence, Automation and High Performance Computing, pp. 345-351. Online publication date: 19-Jul-2024.
https://dl.acm.org/doi/10.1145/3690931.3690989
He S, Wang Z, Tang X, Sun Q, Dong D (2024). Chimera: Leveraging Hybrid Offsets for Efficient Data Prefetching. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 144-155. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3656019.3689613
Chang C, Han J, Sivasubramaniam A, Sharma Mailthody V, Qureshi Z, Hwu W, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). GMT: GPU Orchestrated Memory Tiering for the Big Data Era. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 464-478. Online publication date: 27-Apr-2024.
https://dl.acm.org/doi/10.1145/3620666.3651353
Jiang Z, Yang K, Fisher N, Guan N, Audsley N, Dong Z (2024). Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs. IEEE Transactions on Parallel and Distributed Systems 35:1, pp. 89-104. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TPDS.2023.3332711
Song W, Xue Z, Han J, Li Z, Liu P (2024). Randomizing Set-Associative Caches Against Conflict-Based Cache Side-Channel Attacks. IEEE Transactions on Computers 73:4, pp. 1019-1033. Online publication date: Apr-2024.
https://doi.org/10.1109/TC.2024.3349659
Morais L, Álvarez C, Jiménez-González D, de Haro J, Araujo G, Frank M, Goldman A, Martorell X (2024). Enabling HW-Based Task Scheduling in Large Multicore Architectures. IEEE Transactions on Computers 73:1, pp. 138-151. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TC.2023.3323781
Kang B, Peng K, Chen Y (2024). Collaborative Content Caching in IIoT: A Multi-Agent Reinforcement Learning-Based Approach. 2024 IEEE International Conference on Smart Internet of Things (SmartIoT), pp. 508-515. Online publication date: 14-Nov-2024.
https://doi.org/10.1109/SmartIoT62235.2024.00084
Lawand W, Pellizzoni R (2024). Duration-based Instruction Cache Locking. 2024 IEEE 30th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 85-90. Online publication date: 21-Aug-2024.
https://doi.org/10.1109/RTCSA62462.2024.00021
Bera R, Ranganathan A, Rakshit J, Mahto S, Nori A, Gaur J, Olgun A, Kanellopoulos K, Sadrosadati M, Subramoney S, Mutlu O (2024). Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 88-102. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00017
