A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs

Authors:

Kshitij Bhardwaj,

David M. Brooks,

José Miguel Hernández-Lobato,

Gu-Yeon Wei

ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

Pages 145 - 150

https://doi.org/10.1145/3370748.3406564

Published: 10 August 2020

Abstract

Modern systems-on-chip (SoCs) include not only general-purpose CPUs but also specialized hardware accelerators. Typically, there are three coherence model choices for integrating an accelerator with the memory hierarchy: no coherence, coherence with the last-level cache (LLC), and private-cache-based full coherence. However, there has been very limited research on which coherence models are optimal for the accelerators of a complex many-accelerator SoC. This paper focuses on determining a cost-aware coherence interface for an SoC and its target application: finding the coherence models for the accelerators that optimize their power and performance, considering both workload characteristics and system-level contention. A novel comprehensive methodology is proposed that uses Bayesian optimization to efficiently find cost-aware coherence interfaces for SoCs modeled using the gem5-Aladdin architectural simulator. For a complete analysis, gem5-Aladdin is extended to support LLC coherence in addition to the already-supported no coherence and full coherence. For a heterogeneous SoC targeting applications with varying amounts of accelerator-level parallelism, the proposed framework rapidly finds cost-aware coherence interfaces that show significant performance and power benefits over other commonly used coherence interfaces.
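
To make the size of the search problem concrete, the following is a minimal sketch of the design space the abstract describes: one of three coherence models per accelerator, scored on power and performance. It is not the paper's method. The cost numbers in `toy_cost` are invented stand-ins for a simulator run, the function names are hypothetical, treating "cost-aware" as a Pareto front over (latency, power) is an assumption, and exhaustive enumeration is used here only because the toy space is tiny, whereas the paper uses Bayesian optimization to avoid evaluating every point.

```python
from itertools import product

# The three coherence models named in the abstract.
MODELS = ("none", "llc", "full")


def design_space(num_accelerators):
    """Yield every assignment of one coherence model per accelerator."""
    return product(MODELS, repeat=num_accelerators)


def toy_cost(config):
    """Hypothetical stand-in for a simulator run: returns (latency, power).

    The per-model numbers are invented for illustration only and ignore
    workload characteristics and contention.
    """
    latency = {"none": 3.0, "llc": 2.0, "full": 1.0}
    power = {"none": 1.0, "llc": 2.0, "full": 4.0}
    return (sum(latency[m] for m in config), sum(power[m] for m in config))


def pareto_front(points):
    """Keep the (config, cost) pairs whose cost no other point dominates."""
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and not identical.
        return all(x <= y for x, y in zip(a, b)) and a != b

    return [(cfg, cost) for cfg, cost in points
            if not any(dominates(other, cost) for _, other in points)]


# Exhaustively evaluate a two-accelerator SoC (3**2 = 9 configurations).
evaluated = [(cfg, toy_cost(cfg)) for cfg in design_space(2)]
front = pareto_front(evaluated)
```

The space grows as 3^N with the number of accelerators N, so a simulator-backed evaluation of every configuration quickly becomes impractical; that growth is the motivation for a sample-efficient search such as the Bayesian optimization the abstract mentions.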

Supplementary Material

MP4 File (3370748.3406564.mp4)

A new methodology to find optimal coherence models for different accelerators of an SoC




Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems

Published In

ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

August 2020

263 pages

ISBN: 9781450370530

DOI: 10.1145/3370748

Conference Chairs: David Atienza Alonso (EPFL), Qinru Qiu (Syracuse Univ.)

Program Chairs: Sherief Reda (Brown Univ.), Yiran Chen (Duke Univ.)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE CAS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISLPED '20

Sponsor:

SIGDA

ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design

August 10 - 12, 2020

Boston, Massachusetts

Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%
