DOI: 10.1145/1391469.1391585
research-article

A practical approach of memory access parallelization to exploit multiple off-chip DDR memories

Published: 08 June 2008

Abstract

3D stacked memory makes it possible to attach more off-chip DDR memories to a chip. Redesigning existing IPs to exploit the increased memory parallelism would be prohibitively costly. In this work, we propose a practical approach that exploits the increased bandwidth and reduced latency of multiple off-chip DDR memories while reusing existing IPs without modification. The approach is based on two new concepts: transaction ID renaming and distributed soft arbitration. We present two on-chip network components, the request parallelizer and the read data serializer, that realize these concepts. Experiments with synthetic test cases and an industrial-strength DTV SoC design show that, in the DTV case, the proposed approach significantly improves total execution cycles (by 21.6%) and average memory access latency (by 31.6%) with a small area overhead (30.1% in the on-chip network, and less than 1.4% in the entire chip).
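The abstract names transaction ID renaming and a read data serializer but does not describe how they work. The following Python sketch is only one plausible reading, not the authors' design. It assumes that read requests sharing a master-side ID are given unique internal tags so they can be sent to different memories at once, and that responses are then released in per-ID issue order so an unmodified IP still sees in-order data. The class names `IdRenamer` and `ReadDataSerializer` and all of their methods are hypothetical.

```python
from collections import defaultdict, deque
from itertools import count


class IdRenamer:
    """Hypothetical model: gives every read request a unique internal tag."""

    def __init__(self):
        self._next_tag = count()
        self.pending = defaultdict(deque)  # original ID -> tags in issue order
        self.origin = {}                   # tag -> original ID

    def issue(self, orig_id):
        tag = next(self._next_tag)
        self.pending[orig_id].append(tag)
        self.origin[tag] = orig_id
        return tag


class ReadDataSerializer:
    """Hypothetical model: releases read data in per-ID issue order."""

    def __init__(self, renamer):
        self.renamer = renamer
        self.arrived = {}  # tag -> data held until it may be released

    def receive(self, tag, data):
        self.arrived[tag] = data
        orig_id = self.renamer.origin[tag]
        queue = self.renamer.pending[orig_id]
        released = []
        # Release from the head of the queue while the oldest request has data.
        while queue and queue[0] in self.arrived:
            head = queue.popleft()
            released.append((orig_id, self.arrived.pop(head)))
            del self.renamer.origin[head]
        return released


# Two reads with the same master-side ID, assumed to target different memories.
renamer = IdRenamer()
serializer = ReadDataSerializer(renamer)
t0 = renamer.issue(orig_id=5)
t1 = renamer.issue(orig_id=5)
out_first = serializer.receive(t1, "B")   # younger data arrives early and is held
out_second = serializer.receive(t0, "A")  # older data arrives, both are released
```

In this sketch the data for the younger request is buffered until the older request completes, so the master observes `A` before `B` even though the memories answered in the opposite order.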

    Published In

    DAC '08: Proceedings of the 45th annual Design Automation Conference
    June 2008
    993 pages
    ISBN:9781605581156
    DOI:10.1145/1391469
    General Chair: Limor Fix
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. arbitration
    2. memory
    3. parallelization

    Conference

    DAC '08
    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Cited By

    • (2020) Efficient Support of AXI4 Transaction Ordering Requirements in Many-Core Architecture. IEEE Access 8, 182663-182678. DOI: 10.1109/ACCESS.2020.3029014. Online publication date: 2020
    • (2015) System-Level Performance and Power Optimization for MPSoC. ACM Transactions on Embedded Computing Systems 14:1, 1-26. DOI: 10.1145/2656339. Online publication date: 21-Jan-2015
    • (2014) An analytical model for worst-case reorder buffer size of multi-path minimal routing NoCs. 2014 Eighth IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 49-56. DOI: 10.1109/NOCS.2014.7008761. Online publication date: Sep-2014
    • (2013) CARS. Proceedings of the Conference on Design, Automation and Test in Europe, 1048-1051. DOI: 10.5555/2485288.2485539. Online publication date: 18-Mar-2013
    • (2013) A network congestion-aware memory subsystem for manycore. ACM Transactions on Embedded Computing Systems 12:4, 1-18. DOI: 10.1145/2485984.2485998. Online publication date: 3-Jul-2013
    • (2013) A systematic reordering mechanism for on-chip networks using efficient congestion-aware method. Journal of Systems Architecture 59:4-5, 213-222. DOI: 10.1016/j.sysarc.2012.01.002. Online publication date: Apr-2013
    • (2012) Memory-Efficient On-Chip Network With Adaptive Interfaces. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31:1, 146-159. DOI: 10.1109/TCAD.2011.2160348. Online publication date: 1-Jan-2012
    • (2012) Memory-efficient logic layer communication platform for 3D-stacked memory-on-processor architectures. 2011 IEEE International 3D Systems Integration Conference (3DIC), 1-8. DOI: 10.1109/3DIC.2012.6263024. Online publication date: Jan-2012
    • (2011) Application-Aware NoC Design for Efficient SDRAM Access. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30:10, 1521-1533. DOI: 10.1109/TCAD.2011.2160176. Online publication date: 1-Oct-2011
    • (2011) A quantitative analysis of performance benefits of 3D die stacking on mobile and embedded SoC. 2011 Design, Automation & Test in Europe, 1-6. DOI: 10.1109/DATE.2011.5763214. Online publication date: Mar-2011
