DOI: 10.1145/1944862.1944885

Fast modeling of shared caches in multicore systems

Published: 24 January 2011

Abstract

This work presents StatCC, a simple and efficient model for estimating the shared cache miss ratios of co-scheduled applications on architectures with a hierarchy of private and shared caches. StatCC leverages the StatStack cache model to estimate the co-scheduled applications' cache miss ratios from their individual memory reuse distance distributions, and a simple performance model that estimates their CPIs based on the shared cache miss ratios. These methods are combined into a system of equations that explicitly models the CPIs in terms of the shared miss ratios and can be solved to determine both. The result is a fast algorithm with a 2% error across the SPEC CPU2006 benchmark suite compared to a simulated in-order processor and a hierarchy of private and shared caches.
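The coupling described in the abstract (miss ratios depend on the co-runners' relative access rates, which depend on their CPIs, which in turn depend on the miss ratios) can be illustrated with a small fixed-point iteration. This is only a sketch under stated assumptions, not the authors' implementation: the `App` fields, the linear CPI formula, and the helper names are hypothetical, and raw reuse distance is used as a stand-in for stack distance, whereas StatStack estimates stack distances from reuse distances. The iteration is capped because this simplified version is not guaranteed to converge for arbitrary inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class App:
    # Hypothetical per-application profile; all field names are assumptions.
    reuse_hist: Dict[int, float]  # reuse distance (in own accesses) -> fraction of accesses
    base_cpi: float               # CPI if every shared-cache access hit
    accesses_per_instr: float     # shared-cache accesses per instruction


def miss_ratio(app: App, stretch: float, cache_lines: int) -> float:
    """Fraction of accesses whose stretched reuse distance reaches the cache size.

    Simplification: reuse distance is treated as if it were stack distance.
    """
    return sum(f for d, f in app.reuse_hist.items() if d * stretch >= cache_lines)


def solve(apps: List[App], cache_lines: int, miss_penalty: float,
          max_iters: int = 100, tol: float = 1e-9) -> Tuple[List[float], List[float]]:
    """Iterate CPI -> access rate -> miss ratio -> CPI until the CPIs settle."""
    cpis = [a.base_cpi for a in apps]
    ratios = [0.0] * len(apps)
    for _ in range(max_iters):
        # Accesses per cycle issued by each application at its current CPI.
        rates = [a.accesses_per_instr / c for a, c in zip(apps, cpis)]
        total = sum(rates)
        # Between two accesses of one application, the co-runners interleave
        # their own accesses, stretching its reuse distances by total / rate.
        ratios = [miss_ratio(a, total / r, cache_lines) for a, r in zip(apps, rates)]
        # Assumed linear performance model: each miss adds a fixed penalty.
        new_cpis = [a.base_cpi + a.accesses_per_instr * m * miss_penalty
                    for a, m in zip(apps, ratios)]
        converged = max(abs(n - o) for n, o in zip(new_cpis, cpis)) < tol
        cpis = new_cpis
        if converged:
            break
    return cpis, ratios


if __name__ == "__main__":
    app = App(reuse_hist={10: 0.5, 60: 0.3, 1000: 0.2},
              base_cpi=1.0, accesses_per_instr=0.3)
    print(solve([app], cache_lines=100, miss_penalty=100.0))       # running alone
    print(solve([app, app], cache_lines=100, miss_penalty=100.0))  # two co-runners
```

In this toy input, running two copies together doubles every reuse distance, so the accesses at distance 60 no longer fit in a 100-line cache and the modeled miss ratio rises from 0.2 to 0.5.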



Published In

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
January 2011
226 pages
ISBN: 9781450302418
DOI: 10.1145/1944862

Sponsors

  • HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
