Runtime reconfigurable memory hierarchy in embedded scalable platforms

Authors:

Paolo Mantovani,

Luca P. Carloni

ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference

Pages 719 - 726

https://doi.org/10.1145/3287624.3288755

Published: 21 January 2019

Abstract

In heterogeneous systems-on-chip, the optimal choice of the cache-coherence model for a loosely-coupled accelerator may vary at each invocation, depending on workload and system status. We propose a runtime adaptive algorithm to manage the coherence of accelerators. The algorithm's choices are based on the combination of static and dynamic features of the active accelerators and their workloads. We evaluate the algorithm by leveraging our FPGA-based platform for rapid SoC prototyping. Experimental results, obtained through the deployment of a multi-core and multi-accelerator system that runs Linux SMP, show the benefits of our approach in terms of execution time and memory accesses.

Cited By

A. Bernardi, G. Brilli, A. Capotondi, A. Marongiu, and P. Burgio. 2022. An FPGA Overlay for Efficient Real-Time Localization in 1/10th Scale Autonomous Vehicles. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). Pages 915 - 920. Online publication date: 14-Mar-2022.
https://doi.org/10.23919/DATE54114.2022.9774517

J. Zuckerman, D. Giri, J. Kwon, P. Mantovani, and L. Carloni. 2021. Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. Pages 350 - 365. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480065

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Hardware-software codesign
  2. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators

Published In

ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference

January 2019

794 pages

ISBN:9781450360074

DOI:10.1145/3287624

General Chair:
Toshiyuki Shibuya
Fujitsu Laboratories

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEICE ESS: Institute of Electronics, Information and Communication Engineers, Engineering Sciences Society
IEEE CAS
IEEE CEDA
IPSJ SIG-SLDM: Information Processing Society of Japan, SIG System LSI Design Methodology

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPDAC '19

Sponsor:

SIGDA

ASPDAC '19: 24th Asia and South Pacific Design Automation Conference

January 21 - 24, 2019

Tokyo, Japan

Acceptance Rates

Overall Acceptance Rate 466 of 1,454 submissions, 32%
