DOI: 10.1145/2024724.2024938
research-article

Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms

Published: 05 June 2011

    Abstract

    In an SoC, building local storage in each accelerator is area-inefficient because the average utilization of that storage is low. In this paper, we present the design and implementation of Buffer-integrated-Caching (BiC), which allows many buffers to be instantiated simultaneously in caches. BiC enables cores to view portions of the SRAM as cache while accelerators access other portions of the SRAM as private buffers.
    We demonstrate the cost-effectiveness of BiC on a recognition MPSoC that includes two Pentium™ cores, an Augmented Reality accelerator, and a speech recognition accelerator. With 3% extra area added to the baseline L2 cache, BiC eliminates the need to build 215 KB of dedicated SRAM for the accelerators, while increasing total cache misses by no more than 0.3%.
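The idea of one SRAM array serving as cache for the cores and as private buffers for accelerators can be pictured with a small software model. The sketch below is illustrative only and is not the paper's hardware design: the abstract does not state the granularity at which buffers are carved out, so the model assumes whole cache ways are reserved, and the class name `SharedSram`, its methods, and the owner label `ar_accel` are all hypothetical.

```python
from collections import OrderedDict


class SharedSram:
    """Toy model of one SRAM array split between cache ways and buffers.

    Assumption (not from the paper): buffers are reserved at way granularity.
    """

    def __init__(self, num_sets=4, num_ways=4, line_bytes=64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        self.buffer_ways = 0  # ways currently lent to accelerators
        # One LRU-ordered tag store per set (oldest entry first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.buffers = {}  # owner name -> private bytearray
        self.hits = 0
        self.misses = 0

    def cache_ways(self):
        """Ways still visible to the cores as ordinary cache."""
        return self.num_ways - self.buffer_ways

    def alloc_buffer(self, owner, ways):
        """Reserve `ways` ways in every set as a private buffer for `owner`."""
        if owner in self.buffers:
            raise ValueError("owner already holds a buffer")
        if ways < 1 or ways >= self.cache_ways():
            raise ValueError("must leave at least one way for the cache")
        self.buffer_ways += ways
        # Evict the oldest lines that no longer fit in the shrunken cache.
        for tags in self.sets:
            while len(tags) > self.cache_ways():
                tags.popitem(last=False)
        size = ways * self.num_sets * self.line_bytes
        self.buffers[owner] = bytearray(size)
        return self.buffers[owner]

    def free_buffer(self, owner):
        """Return the owner's ways to the cache."""
        buf = self.buffers.pop(owner)
        self.buffer_ways -= len(buf) // (self.num_sets * self.line_bytes)

    def access(self, addr):
        """Core-side cache access; returns True on a hit, False on a miss."""
        line = addr // self.line_bytes
        tags = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in tags:
            tags.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(tags) >= self.cache_ways():
            tags.popitem(last=False)  # evict the least recently used line
        tags[tag] = True
        return False


if __name__ == "__main__":
    sram = SharedSram()
    buf = sram.alloc_buffer("ar_accel", ways=1)  # hypothetical accelerator
    print(len(buf), "buffer bytes;", sram.cache_ways(), "ways left as cache")
    sram.free_buffer("ar_accel")
```

In this model the cost of lending space to an accelerator shows up as fewer ways per set for the cores, which is the kind of trade-off the abstract quantifies as a bounded increase in cache misses.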

      Published In

      DAC '11: Proceedings of the 48th Design Automation Conference
      June 2011
      1055 pages
      ISBN:9781450306362
      DOI:10.1145/2024724

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. accelerator
      2. cache
      3. memory
      4. system-on-chip

      Qualifiers

      • Research-article

      Conference

      DAC '11

      Acceptance Rates

      Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

      Cited By

      • (2020) Data Orchestration in Deep Learning Accelerators. Synthesis Lectures on Computer Architecture 15:3 (1-164). DOI: 10.2200/S01015ED1V01Y202005CAC052. Online publication date: 17-Aug-2020
      • (2019) Queue Based Memory Management Unit for Heterogeneous MPSoCs. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1297-1300). DOI: 10.23919/DATE.2019.8715129. Online publication date: Mar-2019
      • (2019) Buffets. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (137-151). DOI: 10.1145/3297858.3304025. Online publication date: 4-Apr-2019
      • (2019) Customizable Computing—From Single Chip to Datacenters. Proceedings of the IEEE 107:1 (185-203). DOI: 10.1109/JPROC.2018.2876372. Online publication date: Jan-2019
      • (2018) A case for richer cross-layer abstractions. Proceedings of the 45th Annual International Symposium on Computer Architecture (207-220). DOI: 10.1109/ISCA.2018.00027. Online publication date: 2-Jun-2018
      • (2017) Optimizing General-Purpose CPUs for Energy-Efficient Mobile Web Computing. ACM Transactions on Computer Systems 35:1 (1-31). DOI: 10.1145/3041024. Online publication date: 20-Mar-2017
      • (2016) Co-designing accelerators and SoC interfaces using gem5-aladdin. The 49th Annual IEEE/ACM International Symposium on Microarchitecture (1-12). DOI: 10.5555/3195638.3195697. Online publication date: 15-Oct-2016
      • (2016) Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip. Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (1-10). DOI: 10.1145/2968455.2968509. Online publication date: 1-Oct-2016
      • (2016) Exploiting Private Local Memories to Reduce the Opportunity Cost of Accelerator Integration. Proceedings of the 2016 International Conference on Supercomputing (1-12). DOI: 10.1145/2925426.2926258. Online publication date: 1-Jun-2016
      • (2016) Co-designing accelerators and SoC interfaces using gem5-Aladdin. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (1-12). DOI: 10.1109/MICRO.2016.7783751. Online publication date: Oct-2016
