DOI: 10.1145/2465529.2465756
research-article

Reuse-based online models for caches

Published: 17 June 2013

Abstract

We develop a reuse-distance/stack-distance-based analytical modeling framework for efficient, online prediction of cache performance across a range of cache configurations and replacement policies (LRU, PLRU, RANDOM, and NMRU). Our framework unifies existing cache miss-rate prediction techniques such as Smith's associativity model, Poisson variants, and hardware way-counter-based schemes. We also show how to adapt LRU way-counters to work when the number of sets in the cache changes. As an example application, we demonstrate how results from our models can be used to select, based on workload access characteristics, last-level cache configurations that aim to minimize the energy-delay product.
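
To make the stack-distance idea in the abstract concrete, here is a minimal sketch of the classic stack algorithm of Mattson et al. for a fully associative LRU cache. It is not the paper's framework: it ignores set associativity and the PLRU, RANDOM, and NMRU policies, and the function names `stack_distances` and `lru_miss_ratio` are hypothetical, chosen only for illustration. An access hits in an LRU cache of C blocks exactly when its stack distance is at most C, so one pass over the trace gives the miss ratio for every capacity.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch).

    Illustrative helper (hypothetical name). The distance is the 1-based depth
    of the block in the LRU stack, i.e. the number of distinct blocks
    referenced since its previous access, counting the block itself.
    """
    stack = []       # least recently used first, most recently used last
    distances = []
    for block in trace:
        if block in stack:
            depth = len(stack) - stack.index(block)
            stack.remove(block)
            distances.append(depth)
        else:
            distances.append(None)  # cold (compulsory) miss
        stack.append(block)         # block becomes most recently used
    return distances


def lru_miss_ratio(trace, capacity):
    """Miss ratio of a fully associative LRU cache holding `capacity` blocks."""
    distances = stack_distances(trace)
    misses = sum(1 for d in distances if d is None or d > capacity)
    return misses / len(distances)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "d", "a"]  # made-up block addresses
    for capacity in (2, 3, 4):
        print(capacity, lru_miss_ratio(trace, capacity))
```

This list-based version costs time proportional to the stack depth per access, which is fine for illustration; it is a sketch under the stated assumptions rather than an efficient or online implementation.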

Published In

SIGMETRICS '13: Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
June 2013
406 pages
ISBN: 9781450319003
DOI: 10.1145/2465529

ACM SIGMETRICS Performance Evaluation Review, Volume 41, Issue 1
June 2013
385 pages
ISSN: 0163-5999
DOI: 10.1145/2494232
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LRU
  2. NMRU
  3. PLRU
  4. cache
  5. random
  6. replacement policies
  7. reuse distance
  8. stack distance

Qualifiers

  • Research-article

Conference

SIGMETRICS '13
Acceptance Rates

SIGMETRICS '13 Paper Acceptance Rate 54 of 196 submissions, 28%;
Overall Acceptance Rate 459 of 2,691 submissions, 17%

Cited By

  • (2023) LLVM Static Analysis for Program Characterization and Memory Reuse Profile Estimation. Proceedings of the International Symposium on Memory Systems, pages 1-6. DOI: 10.1145/3631882.3631885. Online publication date: 2-Oct-2023
  • (2022) Understanding I/O Direct Cache Access Performance for End Host Networking. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6:1, pages 1-37. DOI: 10.1145/3508042. Online publication date: 28-Feb-2022
  • (2021) Kangaroo. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pages 243-262. DOI: 10.1145/3477132.3483568. Online publication date: 26-Oct-2021
  • (2021) PPT-Multicore: performance prediction of OpenMP applications using reuse profiles and analytical modeling. The Journal of Supercomputing. DOI: 10.1007/s11227-021-03949-4. Online publication date: 28-Jun-2021
  • (2020) PPT-SASMM: Scalable Analytical Shared Memory Model. Proceedings of the International Symposium on Memory Systems, pages 341-351. DOI: 10.1145/3422575.3422806. Online publication date: 28-Sep-2020
  • (2019) Directed Statistical Warming through Time Traveling. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture - MICRO '52, pages 1037-1049. DOI: 10.1145/3352460.3358264. Online publication date: 2019
  • (2019) A Relational Theory of Locality. ACM Transactions on Architecture and Code Optimization, 16:3, pages 1-26. DOI: 10.1145/3341109. Online publication date: 20-Aug-2019
  • (2019) Morphable DRAM Cache Design for Hybrid Memory Systems. ACM Transactions on Architecture and Code Optimization, 16:3, pages 1-24. DOI: 10.1145/3338505. Online publication date: 18-Jul-2019
  • (2019) Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops. ACM Transactions on Architecture and Code Optimization, 16:3, pages 1-26. DOI: 10.1145/3333060. Online publication date: 18-Jul-2019
  • (2019) Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies. ACM Transactions on Architecture and Code Optimization, 16:2, pages 1-26. DOI: 10.1145/3319805. Online publication date: 18-Apr-2019
