DOI: 10.1145/2247684.2247687
research-article

Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis

Published: 16 June 2012

    Abstract

    Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache hierarchies employed in modern CPUs. In today's hierarchies, performance is determined by complicated thread interactions, such as interference in shared caches and replication and communication in private caches. Researchers normally perform extensive simulations to study these interactions, but this can be costly and not very insightful. An alternative is multicore reuse distance (RD) analysis, which can provide extremely rich information about multicore memory behavior. In this paper, we apply multicore RD analysis to better understand cache system design. We focus on loop-based parallel programs, an important class of programs for which RD analysis provides high accuracy. We propose a novel framework to identify optimal multicore cache hierarchies, and extract several new insights. We also characterize how the optimal cache hierarchies vary with core count and problem size.
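As a point of reference for the reuse distance (RD) analysis mentioned above, the following is a minimal sketch of the standard single-trace definition: the reuse distance of an access is the number of distinct blocks touched since the previous access to the same block. It is not the framework proposed in the paper, and it does not model thread interleaving, sharing, or replication. The names `reuse_distances` and `lru_misses`, the toy trace, and the assumption of a fully associative LRU cache are all illustrative choices.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return one reuse distance per access; None marks a first (cold) access."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Count the distinct blocks touched since the previous access to addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_misses(distances, capacity):
    """Misses of a fully associative LRU cache that holds `capacity` blocks."""
    return sum(1 for d in distances if d is None or d >= capacity)


# Hypothetical block-address trace, used only for illustration.
trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
rd = reuse_distances(trace)
for capacity in (1, 2, 3, 4):
    print(capacity, lru_misses(rd, capacity))
```

Because the distances are computed once and then compared against any capacity, a single profiling pass yields miss counts for every cache size under these assumptions, which is what makes RD profiles attractive for exploring cache design choices without rerunning a simulation per configuration. The list scan makes this sketch quadratic in the worst case; practical tools use tree-based structures or sampling instead.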

    Information & Contributors

    Information

    Published In

    MSPC '12: Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
    June 2012
    82 pages
    ISBN: 9781450312196
    DOI: 10.1145/2247684
    • General Chair: Lixin Zhang
    • Program Chair: Onur Mutlu
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache performance
    2. chip multiprocessors
    3. reuse distance

    Qualifiers

    • Research-article

    Conference

    PLDI '12
    Acceptance Rates

    Overall Acceptance Rate 6 of 20 submissions, 30%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 1
    Download counts reflect activity up to 10 Aug 2024.

    Cited By

    • (2023) A Profiling-Based Approach to Cache Partitioning of Program Data. Parallel and Distributed Computing, Applications and Technologies, pp. 453-463. DOI: 10.1007/978-3-031-29927-8_35. Online publication date: 8-Apr-2023
    • (2021) PPT-Multicore: performance prediction of OpenMP applications using reuse profiles and analytical modeling. The Journal of Supercomputing. DOI: 10.1007/s11227-021-03949-4. Online publication date: 28-Jun-2021
    • (2019) Featherlight Reuse-Distance Measurement. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 440-453. DOI: 10.1109/HPCA.2019.00056. Online publication date: Feb-2019
    • (2018) Efficient Cache Performance Modeling in GPUs Using Reuse Distance Analysis. ACM Transactions on Architecture and Code Optimization 15:4, pp. 1-24. DOI: 10.1145/3291051. Online publication date: 19-Dec-2018
    • (2017) Optimizing locality in graph computations using reuse distance profiles. 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC), pp. 1-8. DOI: 10.1109/PCCC.2017.8280444. Online publication date: Dec-2017
    • (2017) Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis. IEEE Computer Architecture Letters 16:2, pp. 119-122. DOI: 10.1109/LCA.2017.2695178. Online publication date: 1-Jul-2017
    • (2017) Optimizing thin client caches for mobile cloud computing: Concurrency and Computation: Practice and Experience 29:11. DOI: 10.1002/cpe.4048. Online publication date: 3-Mar-2017
    • (2016) Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis. ACM Transactions on Computer Systems 34:1, pp. 1-30. DOI: 10.1145/2851503. Online publication date: 6-Apr-2016
    • (2015) Why Does Data Prefetching Not Work for Modern Workloads? The Computer Journal 59:2, pp. 244-259. DOI: 10.1093/comjnl/bxv112. Online publication date: 23-Dec-2015
    • (2014) "CERE". Proceedings of the 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pp. 566-573. DOI: 10.1109/HPCC.2014.97. Online publication date: 20-Aug-2014
