research-article

Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms

Authors:

Carlos Flores Fajardo,

German Fabila Garcia,

Li Zhao

DAC '11: Proceedings of the 48th Design Automation Conference

Pages 966 - 971

https://doi.org/10.1145/2024724.2024938

Published: 05 June 2011

Abstract

In an SoC, building local storage into each accelerator is area-inefficient because that storage has low average utilization. In this paper, we present the design and implementation of Buffer-integrated-Caching (BiC), which allows many buffers to be instantiated simultaneously in caches. BiC enables cores to view portions of the SRAM as cache while accelerators access other portions of the SRAM as private buffers.
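The abstract does not say how BiC divides the SRAM between cores and accelerators. The sketch below assumes one plausible scheme, way-level partitioning of a set-associative L2, purely to illustrate the idea: ways lent to an accelerator are skipped by the cores' LRU lookup and are addressed directly, without tag checks, as a flat private buffer. The class `BufferIntegratedCache`, its methods, and the buffer address mapping are hypothetical and are not the paper's interface.

```python
from collections import OrderedDict


class BufferIntegratedCache:
    """Toy model of a set-associative SRAM whose ways can be lent out as buffers.

    Hypothetical sketch: way-granularity partitioning, the method names, and the
    buffer address mapping are illustrative assumptions, not BiC's actual design.
    """

    def __init__(self, num_sets: int, num_ways: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_size = line_size
        # Per set: tag -> way index, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.owner_of = {}  # way index -> accelerator currently using it as a buffer

    def cache_ways(self) -> list:
        """Ways still visible to the cores as ordinary cache."""
        return [w for w in range(self.num_ways) if w not in self.owner_of]

    def reserve_buffer(self, owner: str, ways: int) -> int:
        """Lend `ways` ways of every set to `owner`; return the buffer size in bytes."""
        free = self.cache_ways()
        if not 1 <= ways < len(free):  # keep at least one way for the cores
            raise ValueError("invalid number of ways for a buffer")
        taken = free[-ways:]
        for w in taken:
            self.owner_of[w] = owner
        # Drop cached lines held in the lent ways (a real design would first
        # write back dirty lines).
        for s in self.sets:
            for tag, way in list(s.items()):
                if way in taken:
                    del s[tag]
        return ways * self.num_sets * self.line_size

    def release_buffer(self, owner: str) -> None:
        """Return all ways held by `owner` to the cache."""
        self.owner_of = {w: o for w, o in self.owner_of.items() if o != owner}

    def buffer_slot(self, owner: str, offset: int) -> tuple:
        """Map a byte offset in `owner`'s buffer directly to (set, way), with no tag check."""
        ways = sorted(w for w, o in self.owner_of.items() if o == owner)
        index = offset // self.line_size
        if not ways or not 0 <= index < len(ways) * self.num_sets:
            raise IndexError("offset outside the buffer")
        return index % self.num_sets, ways[index // self.num_sets]

    def access(self, addr: int) -> bool:
        """Core access: LRU lookup restricted to ways not lent out. Returns True on a hit."""
        line = addr // self.line_size
        s, tag = self.sets[line % self.num_sets], line // self.num_sets
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            return True
        used = set(s.values())
        free = [w for w in self.cache_ways() if w not in used]
        if free:
            way = free[0]
        else:
            _, way = s.popitem(last=False)  # evict the least recently used line
        s[tag] = way
        return False


if __name__ == "__main__":
    bic = BufferIntegratedCache(num_sets=256, num_ways=8, line_size=64)
    print(bic.reserve_buffer("speech_accel", ways=2))  # 32768 (2 ways x 256 sets x 64 B)
    print(bic.buffer_slot("speech_accel", 0))          # (0, 6)
    print(bic.access(0), bic.access(0))                # False True
```

Lending whole ways keeps the cores' lookup path unchanged apart from a per-way mask. Whether BiC uses ways, sets, or some other granularity is not stated in the abstract.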

We demonstrate the cost-effectiveness of BiC on a recognition MPSoC that includes two Pentium™ cores, an augmented reality (AR) accelerator, and a speech recognition accelerator. With 3% extra area added to the baseline L2 cache, BiC eliminates the need to build 215 KB of dedicated SRAM for the accelerators, while increasing total cache misses by no more than 0.3%.
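The 0.3% figure bounds how much sharing the SRAM with accelerators increases total cache misses. One simple way to estimate that kind of trade-off for a given access trace, assuming LRU replacement and way-level lending (both assumptions, not details from the abstract), is to count misses at full associativity and again with the lent ways removed. The functions `count_misses` and `miss_increase` and the synthetic trace are hypothetical; the result says nothing about the paper's workloads.

```python
from collections import OrderedDict
import random


def count_misses(trace, num_sets, ways, line_size=64):
    """Count misses of an LRU set-associative cache over a trace of byte addresses."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_size
        s = sets[line % num_sets]
        tag = line // num_sets
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            continue
        misses += 1
        if len(s) == ways:
            s.popitem(last=False)  # evict the least recently used line
        s[tag] = None
    return misses


def miss_increase(trace, num_sets, total_ways, lent_ways, line_size=64):
    """Relative miss increase when `lent_ways` ways per set serve as accelerator buffers."""
    if not 0 <= lent_ways < total_ways:
        raise ValueError("the cores must keep at least one way")
    base = count_misses(trace, num_sets, total_ways, line_size)
    shared = count_misses(trace, num_sets, total_ways - lent_ways, line_size)
    return (shared - base) / base if base else 0.0


if __name__ == "__main__":
    # Synthetic trace (hypothetical): a hot working set plus occasional streaming accesses.
    rng = random.Random(0)
    hot = [rng.randrange(0, 256 * 1024) for _ in range(2000)]
    trace = [rng.choice(hot) if rng.random() < 0.9 else rng.randrange(1 << 26)
             for _ in range(200_000)]
    inc = miss_increase(trace, num_sets=512, total_ways=16, lent_ways=2)
    print(f"relative miss increase: {inc:.2%}")
```

Because LRU with a fixed set mapping is a stack algorithm, removing ways from a set can never reduce its misses, so `miss_increase` is always non-negative; how large it gets depends entirely on the working set of the trace.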

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

DAC '11: Proceedings of the 48th Design Automation Conference

June 2011

1055 pages

ISBN:9781450306362

DOI:10.1145/2024724

General Chair: Leon Stok, IBM Corp., Hopewell Jct., NY

Program Chairs: Nikil Dutt, Univ. of California, Irvine, CA; Soha Hassoun, Tufts Univ., Medford, MA

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE Council on Electronic Design Automation (CEDA)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '11

Sponsor:

EDAC
SIGDA

DAC '11: The 48th Annual Design Automation Conference 2011

June 5 - 10, 2011

San Diego, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Total Citations: 28
Total Downloads: 215
Downloads (last 12 months): 12
Downloads (last 6 weeks): 1

Reflects downloads up to 10 Aug 2024
