DOI: 10.1145/3370748.3406564
research-article
Public Access

A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs

Published: 10 August 2020

Abstract

Modern systems-on-chip (SoCs) include not only general-purpose CPUs but also specialized hardware accelerators. There are typically three coherence models for integrating an accelerator with the memory hierarchy: no coherence, coherence with the last-level cache (LLC), and full coherence based on a private cache. However, very little research has addressed which coherence models are optimal for the accelerators of a complex many-accelerator SoC. This paper focuses on determining a cost-aware coherence interface for an SoC and its target application, that is, finding the coherence models for the accelerators that best optimize their power and performance while accounting for both workload characteristics and system-level contention. A novel, comprehensive methodology is proposed that uses Bayesian optimization to efficiently find cost-aware coherence interfaces for SoCs modeled in the gem5-Aladdin architectural simulator. To enable a complete analysis, gem5-Aladdin is extended to support LLC coherence in addition to the already-supported no-coherence and full-coherence models. For a heterogeneous SoC targeting applications with varying amounts of accelerator-level parallelism, the proposed framework rapidly finds cost-aware coherence interfaces that show significant performance and power benefits over other commonly used coherence interfaces.
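The abstract combines two ideas that can be made concrete: a design space in which each accelerator independently takes one of three coherence models (so N accelerators give 3^N configurations), and a Bayesian-optimization loop that simulates only a few of them. The Python sketch below illustrates that combination under loud assumptions. `toy_cost` is a hypothetical synthetic stand-in for a gem5-Aladdin simulation. Power and performance are collapsed into a single scalar to minimize. The Gaussian-process surrogate (squared-exponential kernel on an integer encoding of the models) with expected-improvement acquisition is a generic textbook choice, not necessarily the surrogate, kernel, or (possibly multi-objective) acquisition function used in the paper.

```python
import itertools
import math
import random

# Coherence models named in the abstract; the integer encoding 0/1/2 is an assumption.
MODELS = ("non_coherent", "llc_coherent", "fully_coherent")

def design_space(num_accels):
    """Every assignment of one coherence model per accelerator: 3**num_accels tuples."""
    return list(itertools.product(range(len(MODELS)), repeat=num_accels))

def toy_cost(cfg):
    """HYPOTHETICAL stand-in for a gem5-Aladdin run (lower is better).
    Each accelerator has an assumed preferred model, and every fully coherent
    accelerator adds shared-resource contention that grows quadratically."""
    preferred = (2, 1, 0, 1)
    mismatch = sum(abs(c - p) for c, p in zip(cfg, preferred))
    n_full = sum(1 for c in cfg if c == 2)
    return mismatch + 0.5 * n_full ** 2

def kernel(a, b, length=1.0):
    # Squared-exponential kernel on the integer encoding (assumed choice).
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(m):
    # Lower-triangular L with L * L^T == m (m must be symmetric positive definite).
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def forward_solve(low, b):
    # Solve low * x == b by forward substitution.
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def gp_posterior(X, y, x, jitter=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x given (X, y)."""
    K = [[kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    low = cholesky(K)
    z = forward_solve(low, y)                          # L z = y
    v = forward_solve(low, [kernel(a, x) for a in X])  # L v = k(X, x)
    mean = sum(vi * zi for vi, zi in zip(v, z))        # k^T K^-1 y
    var = kernel(x, x) - sum(vi * vi for vi in v)      # k(x,x) - k^T K^-1 k
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimization: E[max(best - f(x), 0)] with f(x) ~ N(mu, sigma^2).
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(cost, num_accels, init=4, budget=12, seed=0):
    rng = random.Random(seed)
    space = design_space(num_accels)
    X = rng.sample(space, init)               # random initial configurations
    y = [cost(c) for c in X]
    while len(X) < min(budget, len(space)):
        y_mean = sum(y) / len(y)
        y_centered = [v - y_mean for v in y]  # match the zero-mean GP prior
        best = min(y)

        def acquisition(c):
            mu, sd = gp_posterior(X, y_centered, c)
            return expected_improvement(mu + y_mean, sd, best)

        untried = [c for c in space if c not in X]
        nxt = max(untried, key=acquisition)   # most promising untried configuration
        X.append(nxt)
        y.append(cost(nxt))                   # one "simulation" per iteration
    i = min(range(len(y)), key=y.__getitem__)
    return X[i], y[i], len(X)

best_cfg, best_cost, n_evals = bayes_opt(toy_cost, num_accels=4)
print([MODELS[m] for m in best_cfg], best_cost, n_evals)
```

With four accelerators the space has 81 configurations and this loop evaluates 12 of them. Because each real evaluation would be a full simulation run rather than a cheap function call, the appeal of a surrogate-guided search is that the evaluation budget, not the size of the 3^N space, bounds the number of simulations.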

Supplementary Material

MP4 File (3370748.3406564.mp4)
A new methodology to find optimal coherence models for different accelerators of an SoC

    Published In

    ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design
    August 2020
    263 pages
    ISBN:9781450370530
    DOI:10.1145/3370748
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • IEEE CAS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ISLPED '20

    Acceptance Rates

    Overall Acceptance Rate 398 of 1,159 submissions, 34%

    Article Metrics

    • Downloads (Last 12 months)81
    • Downloads (Last 6 weeks)9
    Reflects downloads up to 13 Sep 2024

    Cited By

    • (2024) Reuse distance-based shared LLC management mechanism for heterogeneous CPU-GPU systems. IEICE Electronics Express 21:4 (20230520). DOI: 10.1587/elex.21.20230520. Online publication date: 25-Feb-2024.
    • (2024) Which Coupled is Best Coupled? An Exploration of AIMC Tile Interfaces and Load Balancing for CNNs. IEEE Transactions on Parallel and Distributed Systems 35:10 (1780-1795). DOI: 10.1109/TPDS.2024.3437657. Online publication date: Oct-2024.
    • (2023) Exploring the Architecture of Multiple GEMM Accelerators in Heterogeneous Systems. 2023 9th International Conference on Control Science and Systems Engineering (ICCSSE), 450-455. DOI: 10.1109/ICCSSE59359.2023.10245050. Online publication date: 16-Jun-2023.
    • (2022) A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization 19:3 (1-26). DOI: 10.1145/3530819. Online publication date: 22-Aug-2022.
    • (2022) CASPHAr: Cache-Managed Accelerator Staging and Pipelining in Heterogeneous System Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:11 (4325-4336). DOI: 10.1109/TCAD.2022.3197535. Online publication date: Nov-2022.
    • (2021) Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 350-365. DOI: 10.1145/3466752.3480065. Online publication date: 18-Oct-2021.