research-article

Where replacement algorithms fail: a thorough analysis

Authors:

Georgios Keramidas,

Pavlos Petoumenos,

Stefanos Kaxiras

CF '10: Proceedings of the 7th ACM international conference on Computing frontiers

Pages 141 - 150

https://doi.org/10.1145/1787275.1787316

Published: 17 May 2010

Abstract

Cache placement and eviction, especially at the last level of the memory hierarchy, have recently attracted a flurry of research activity. The common perception that LRU is a well-performing algorithm has been discredited, and many researchers have turned their attention to more sophisticated algorithms that can substantially improve cache performance. In this paper, we thoroughly examine four recently proposed replacement policies: the Dynamic Insertion Policy (DIP), the Shepherd Cache (SC), MLP-aware replacement, and Instruction-based Reuse Distance Prediction (IbRDP). Our experimental studies show a great inconsistency between the number of misses saved by each mechanism and the resulting improvement in IPC. This is particularly true for DIP and SC, and it attests to the fact that these algorithms do not take into account the relative cost of each miss (i.e., whether it is an isolated or a parallel miss); their aim is to blindly lower the total number of misses. MLP-aware replacement, on the other hand, although miss-cost-aware, cannot efficiently handle workloads that display LRU-hostile behavior, and thus fails to reduce execution time even when there are ample opportunities to reduce cache misses. The IbRDP replacement policy both handles non-LRU access patterns and is MLP-friendly, which leads to greater consistency between the reduction in misses and the corresponding increase in performance, and thus to the largest IPC improvement among the studied mechanisms. So, what are the appropriate characteristics of a replacement algorithm targeting the lower levels of the memory hierarchy? This paper sheds some light on this question.
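
The gap between misses saved and performance gained can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a fixed miss latency, assumes the processor stalls whenever at least one miss is outstanding, and lets overlapping (parallel) misses share their stall time. The function name `stall_cycles`, the 200-cycle latency, and the miss timestamps are all hypothetical values chosen for illustration.

```python
def stall_cycles(miss_times, latency=200):
    """Cycles during which at least one miss is outstanding.

    Toy assumption: each miss occupies [t, t + latency), and overlapping
    misses share their stall time, so the result is the length of the
    union of these intervals.
    """
    total = 0
    covered_until = None  # end of the stall window counted so far
    for t in sorted(miss_times):
        end = t + latency
        if covered_until is None or t >= covered_until:
            # Isolated miss: pays the full latency.
            total += latency
            covered_until = end
        elif end > covered_until:
            # Parallel miss: only the non-overlapped tail adds stall time.
            total += end - covered_until
            covered_until = end
    return total


# Hypothetical miss timestamps (in cycles) under a baseline policy:
# three clustered (parallel) misses and one isolated miss.
baseline = [0, 10, 20, 1000]

# Policy A removes two of the parallel misses.
policy_a = [0, 1000]

# Policy B removes only the isolated miss.
policy_b = [0, 10, 20]

saved_a = stall_cycles(baseline) - stall_cycles(policy_a)
saved_b = stall_cycles(baseline) - stall_cycles(policy_b)
print(f"A: misses saved = 2, stall cycles saved = {saved_a}")
print(f"B: misses saved = 1, stall cycles saved = {saved_b}")
```

Under these assumptions, the policy that removes fewer misses recovers far more stall time, because the miss it removes is isolated. A real out-of-order core overlaps misses in more complicated ways, so this is only a first-order picture of why miss counts alone can mislead.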

Cited By

N. Beckmann and D. Sanchez. Modeling cache performance beyond LRU. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 225-236, March 2016.
https://doi.org/10.1109/HPCA.2016.7446067

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

CF '10: Proceedings of the 7th ACM international conference on Computing frontiers

May 2010

370 pages

ISBN:9781450300445

DOI:10.1145/1787275

General Chair: Nancy M. Amato (Texas A&M University, USA)

Program Chairs: Hubertus Franke (IBM Research, USA), Paul H.J. Kelly (Imperial College London, UK)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CF'10

Sponsor:

SIGMICRO

CF'10: Computing Frontiers Conference

May 17 - 19, 2010

Bertinoro, Italy

Acceptance Rates

CF '10 Paper Acceptance Rate 30 of 113 submissions, 27%;

Overall Acceptance Rate 273 of 785 submissions, 35%
