research-article

Fast modeling of shared caches in multicore systems

Authors:

David Black-Schaffer,

Erik Hagersten

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

Pages 147 - 157

https://doi.org/10.1145/1944862.1944885

Published: 24 January 2011

Abstract

This work presents StatCC, a simple and efficient model for estimating the shared cache miss ratios of co-scheduled applications on architectures with a hierarchy of private and shared caches. StatCC leverages the StatStack cache model to estimate the co-scheduled applications' cache miss ratios from their individual memory reuse distance distributions, and a simple performance model that estimates their CPIs based on the shared cache miss ratios. These methods are combined into a system of equations that explicitly models the CPIs in terms of the shared miss ratios and can be solved to determine both. The result is a fast algorithm with a 2% error across the SPEC CPU2006 benchmark suite compared to a simulated in-order processor and a hierarchy of private and shared caches.
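The coupling described in the abstract, where each application's CPI depends on its shared cache miss ratio and each miss ratio depends on the co-runners' relative access rates (and hence their CPIs), can be illustrated with a small fixed-point iteration. The sketch below is not the authors' implementation. It makes several simplifying assumptions: the names `App`, `miss_ratio`, and `solve_shared_cache` are hypothetical, the CPI model is a simple linear one (base CPI plus misses per instruction times a fixed miss penalty), and a scaled reuse distance is compared directly against the cache size, whereas StatStack first converts reuse distances to stack distances.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical per-application profile (all fields are assumptions)."""
    base_cpi: float            # CPI if every shared-cache access hit
    accesses_per_instr: float  # shared-cache accesses per instruction
    reuse_hist: dict           # {reuse distance in own accesses: fraction}


def miss_ratio(app, scale, cache_lines):
    """Fraction of accesses whose scaled reuse distance exceeds the cache.

    `scale` stretches the app's own reuse distances to account for the
    accesses that co-runners interleave between two of its reuses.
    """
    return sum(frac for dist, frac in app.reuse_hist.items()
               if dist * scale > cache_lines)


def solve_shared_cache(apps, cache_lines, miss_penalty, iters=100, tol=1e-9):
    """Iterate CPIs and miss ratios until they stop changing."""
    cpis = [a.base_cpi for a in apps]   # initial guess: no shared misses
    ratios = [0.0] * len(apps)
    for _ in range(iters):
        # Accesses per cycle: slower apps put less pressure on the cache.
        rates = [a.accesses_per_instr / c for a, c in zip(apps, cpis)]
        total = sum(rates)
        ratios = [miss_ratio(a, total / r, cache_lines)
                  for a, r in zip(apps, rates)]
        # Assumed linear CPI model: base CPI plus stall cycles from misses.
        new_cpis = [a.base_cpi + a.accesses_per_instr * m * miss_penalty
                    for a, m in zip(apps, ratios)]
        converged = max(abs(n - c) for n, c in zip(new_cpis, cpis)) < tol
        cpis = new_cpis
        if converged:
            break
    return cpis, ratios


if __name__ == "__main__":
    hist = {10: 0.5, 60: 0.3, 1000: 0.2}
    a = App(base_cpi=1.0, accesses_per_instr=0.2, reuse_hist=hist)
    b = App(base_cpi=1.0, accesses_per_instr=0.2, reuse_hist=hist)
    print("alone:    ", solve_shared_cache([a], 100, 100.0))
    print("co-run:   ", solve_shared_cache([a, b], 100, 100.0))
```

Plain fixed-point iteration is used here only because it is short. It is not guaranteed to converge for arbitrary histograms, and the abstract does not say which solver the paper uses for its system of equations.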

Index Terms

Fast modeling of shared caches in multicore systems
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Multithreading

Published In

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

January 2011

226 pages

ISBN:9781450302418

DOI:10.1145/1944862

General Chairs: Manolis Katevenis (FORTH-ICS and U.Crete, Greece), Margaret Martonosi (Princeton University)

Program Chairs: Christos Kozyrakis (Stanford University), Olivier Temam (INRIA, France)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HIPEAC '11

Sponsor:

HiPEAC

HIPEAC '11: International Conference on High-Performance and Embedded Architectures and Compilers

January 24 - 26, 2011

Heraklion, Greece
