DOI: 10.1145/3591195.3595267
research-article
Open access

Blast from the Past: Least Expected Use (LEU) Cache Replacement with Statistical History

Published: 06 June 2023

Abstract

Cache replacement policies typically use some form of statistics on past access behavior. A common limitation, however, is that the recorded history extends only to the data currently in cache or, more recently, to a larger but still finite-length window of accesses, because the cost of keeping a long history can easily outweigh its benefit.
This paper presents a statistical method to keep track of instruction pointer-based access reuse intervals of arbitrary length and uses this information to identify the Least Expected Use (LEU) blocks for replacement. LEU uses dynamic sampling supported by novel hardware that maintains a state to record arbitrarily long reuse intervals. LEU is evaluated using the Cache Replacement Championship simulator, tested on PolyBench and SPEC, and compared with five policies including a recent technique that approximates optimal caching using a fixed-length history. By maintaining statistics for an arbitrary history, LEU outperforms previous techniques for a broad range of scientific kernels, whose data reuses are longer than those in traces traditionally used in computer architecture studies.
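To make the idea of evicting the block with the least expected use concrete, the following is a minimal software sketch and not the paper's hardware design. It assumes a fully associative cache, uses a plain running mean of reuse intervals per instruction pointer as the "expected" interval, and keeps an unbounded table of last accesses where the paper relies on dynamic sampling. The names `ReuseStats`, `ToyLEUCache`, and `pick_victim` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ReuseStats:
    """Running mean of observed reuse intervals, keyed by instruction pointer (IP)."""

    def __init__(self):
        self.total = defaultdict(int)  # sum of intervals seen per IP
        self.count = defaultdict(int)  # number of intervals seen per IP

    def record(self, ip, interval):
        self.total[ip] += interval
        self.count[ip] += 1

    def expected_interval(self, ip):
        # Assumption: an IP with no history is treated as "never reused".
        if self.count[ip] == 0:
            return float("inf")
        return self.total[ip] / self.count[ip]


class ToyLEUCache:
    """Fully associative toy cache that evicts the block expected to be used last."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stats = ReuseStats()
        self.clock = 0        # logical time: one tick per access
        self.resident = {}    # block -> (ip, time) of its most recent access
        self.last_seen = {}   # same, but for every block ever accessed (unbounded here)

    def access(self, ip, block):
        """Access `block` from instruction `ip`; return True on a hit."""
        self.clock += 1
        if block in self.last_seen:
            # Credit the interval to the IP that made the earlier access.
            prev_ip, prev_time = self.last_seen[block]
            self.stats.record(prev_ip, self.clock - prev_time)
        self.last_seen[block] = (ip, self.clock)

        hit = block in self.resident
        if not hit and len(self.resident) >= self.capacity:
            del self.resident[self.pick_victim()]
        self.resident[block] = (ip, self.clock)
        return hit

    def pick_victim(self):
        """Return the resident block whose expected next use lies furthest ahead."""
        def expected_next_use(block):
            ip, last_time = self.resident[block]
            return last_time + self.stats.expected_interval(ip)
        return max(self.resident, key=expected_next_use)


if __name__ == "__main__":
    cache = ToyLEUCache(capacity=2)
    trace = [("s", "A"), ("l", "X"), ("s", "A"), ("l", "Y"),
             ("s", "A"), ("l", "X"), ("s", "A")]
    for ip, block in trace:
        print(ip, block, "hit" if cache.access(ip, block) else "miss")
```

Because the statistics are keyed by instruction pointer rather than by block, a reuse interval can be learned even after the block itself has left the cache, which is the property the abstract emphasizes. The mean is only one possible summary of the interval distribution; a real design would also have to bound the state that `last_seen` keeps here without limit.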


Published In

ISMM 2023: Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management
June 2023
175 pages
ISBN: 9798400701795
DOI: 10.1145/3591195
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Caching
  2. Reuse Interval
  3. Statistical history

Qualifiers

  • Research-article

Conference

ISMM '23
Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 588
  • Downloads (Last 12 months): 326
  • Downloads (Last 6 weeks): 43

Reflects downloads up to 27 Jan 2025
