DOI: 10.1145/1787275.1787316
research-article

Where replacement algorithms fail: a thorough analysis

Published: 17 May 2010

Abstract

Cache placement and eviction, especially at the last level of the memory hierarchy, have recently attracted a flurry of research activity. The common perception that LRU performs well has been discredited, and many researchers have turned their attention to more sophisticated algorithms that can substantially improve cache performance. In this paper, we thoroughly examine four recently proposed replacement policies: the Dynamic Insertion Policy (DIP), the Shepherd Cache (SC), MLP-aware replacement, and Instruction-based Reuse Distance Prediction (IbRDP). Our experimental studies show a great inconsistency between the number of misses saved by each mechanism and the resulting improvement in IPC. This is particularly true for DIP and SC, and it attests to the fact that these algorithms do not take into account the relative cost of each miss (i.e., whether it is an isolated or a parallel miss); their aim is simply to lower the total number of misses. MLP-aware replacement, on the other hand, although miss-cost-aware, cannot efficiently handle workloads that display LRU-hostile behavior, and thus fails to reduce execution time even when there are ample opportunities to reduce cache misses. The IbRDP replacement policy both copes with non-LRU access patterns and is MLP-friendly, which leads to greater consistency between the reduction in misses and the corresponding increase in performance, and thus to the largest IPC improvement among the studied mechanisms. So, what are the appropriate characteristics of a replacement algorithm targeting the lower levels of the memory hierarchy? In this paper, we shed some light on this question.
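
The gap between misses saved and IPC gained can be illustrated with a toy stall model. The sketch below is not taken from the paper: it assumes that the processor stalls whenever at least one miss is outstanding, that every miss has the same fixed latency, and that misses issued close together overlap fully. The function name `stall_cycles`, the 200-cycle latency, and both miss traces are hypothetical values chosen only for illustration.

```python
def stall_cycles(miss_times, latency):
    """Cycles during which at least one miss is outstanding.

    Each miss occupies the interval [t, t + latency); overlapping
    intervals (parallel misses) are counted only once.
    """
    total = 0
    covered_until = None  # end of the stall interval counted so far
    for t in sorted(miss_times):
        end = t + latency
        if covered_until is None or t >= covered_until:
            # Isolated miss: pays the full latency.
            total += latency
            covered_until = end
        elif end > covered_until:
            # Parallel miss: only the non-overlapping tail adds stall time.
            total += end - covered_until
            covered_until = end
    return total


LATENCY = 200  # hypothetical miss latency in cycles

parallel_trace = [0, 5, 10, 15]    # four misses issued back to back
isolated_trace = [0, 1000, 2000]   # three misses, far apart

for name, trace in (("parallel", parallel_trace), ("isolated", isolated_trace)):
    print(name, "misses:", len(trace), "stall cycles:", stall_cycles(trace, LATENCY))
```

Under these assumptions the trace with four parallel misses stalls for 215 cycles, while the trace with only three isolated misses stalls for 600. A policy that counts misses alone would prefer the second trace, which is the kind of mismatch between miss reduction and performance that the abstract describes.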

Cited By

  • (2016) Modeling cache performance beyond LRU. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 225-236. DOI: 10.1109/HPCA.2016.7446067. Online publication date: Mar-2016

    Published In

    CF '10: Proceedings of the 7th ACM international conference on Computing frontiers
    May 2010
    370 pages
    ISBN:9781450300445
    DOI:10.1145/1787275

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. last-level caches
    2. memory system
    3. profiling
    4. replacement/placement policies/algorithms

    Qualifiers

    • Research-article

    Conference

    CF'10
    CF'10: Computing Frontiers Conference
    May 17 - 19, 2010
    Bertinoro, Italy

    Acceptance Rates

    CF '10 Paper Acceptance Rate 30 of 113 submissions, 27%;
    Overall Acceptance Rate 273 of 785 submissions, 35%
