research-article

A Single-Tier Virtual Queuing Memory Controller Architecture for Heterogeneous MPSoCs

Published: 27 April 2017

Abstract

Heterogeneous MPSoCs typically integrate diverse cores, including application CPUs, GPUs, and HD coders. These cores commonly share an off-chip memory to save cost and energy, but their memory accesses often interfere with one another, leading to undesirable consequences such as application slowdown or a failure to sustain real-time performance. The memory controller plays a central role in meeting the QoS needs of real-time cores while maximizing CPU performance. Previous QoS-aware memory controllers are based on a classic two-tier queuing architecture: the first tier buffers memory transactions, and the second tier buffers the translated DRAM commands. In these designs, QoS-aware policies schedule competing transactions at the first tier, but the translated DRAM commands are served in FIFO order at the second tier. Unfortunately, once scheduled transactions have been forwarded to the command tier, newly arriving transactions that may be more critical cannot be served ahead of the translated commands already queued there. To address this, we propose a scalable memory controller architecture based on single-tier virtual queuing (STVQ) that maintains a single tier of request queues and employs an efficacious scheduler that considers both QoS requirements and DRAM bank states. Compared with previous QoS-aware memory controllers, the proposed STVQ memory controller reduces CPU slowdown by up to 13.9% while satisfying all frame rate requirements. We also propose further optimizations that increase row-buffer hits by up to 66.2% and reduce memory latency by up to 19.8%.
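The head-of-line problem described above can be illustrated with a toy Python sketch that contrasts the two organizations on a small request mix. This is not the STVQ scheduler from the paper: the `Req` fields, the `urgent` flag standing in for a QoS requirement, the open-page assumption, and the priority order (urgent requests first, then row-buffer hits in the bank's open row, then oldest first, an FR-FCFS-style tie-break) are all illustrative assumptions. For simplicity, the hypothetical `two_tier_order` reuses the same priority for its first tier, but it must drain already-forwarded commands in FIFO order first, whereas `single_tier_order` re-evaluates every queued request against the current bank state on each pick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Req:
    """A toy memory request (hypothetical fields, for illustration only)."""
    name: str
    arrival: int  # arrival order; lower means older
    urgent: bool  # stand-in for a QoS requirement, e.g. a real-time core near its deadline
    bank: int
    row: int


def pick_next(queue, open_rows):
    """Pick one request from a single virtual queue using an assumed policy:
    urgent first, then row-buffer hits in the bank's open row, then oldest."""
    def key(r):
        row_hit = open_rows.get(r.bank) == r.row
        # False sorts before True, so "not urgent" / "not row_hit" push those later.
        return (not r.urgent, not row_hit, r.arrival)
    return min(queue, key=key)


def single_tier_order(requests, open_rows):
    """Service order when every request stays schedulable until it is issued."""
    queue = list(requests)
    rows = dict(open_rows)  # copy so the caller's bank state is not modified
    order = []
    while queue:
        r = pick_next(queue, rows)
        queue.remove(r)
        rows[r.bank] = r.row  # open-page assumption: serving r leaves its row open
        order.append(r.name)
    return order


def two_tier_order(forwarded, pending, open_rows):
    """Service order when `forwarded` requests already sit in a FIFO command
    queue: they drain first, and only then are `pending` transactions scheduled."""
    rows = dict(open_rows)
    order = []
    for r in forwarded:  # translated commands cannot be overtaken
        rows[r.bank] = r.row
        order.append(r.name)
    return order + single_tier_order(pending, rows)


# Example: bank 0 has row 5 open, bank 1 has row 7 open.
OPEN_ROWS = {0: 5, 1: 7}
cpu_a = Req("cpu_a", arrival=0, urgent=False, bank=0, row=3)
gpu_b = Req("gpu_b", arrival=1, urgent=False, bank=1, row=9)
video_c = Req("video_c", arrival=2, urgent=True, bank=0, row=5)
cpu_d = Req("cpu_d", arrival=3, urgent=False, bank=1, row=7)

print(two_tier_order([cpu_a, gpu_b], [video_c, cpu_d], OPEN_ROWS))
# ['cpu_a', 'gpu_b', 'video_c', 'cpu_d']
print(single_tier_order([cpu_a, gpu_b, video_c, cpu_d], OPEN_ROWS))
# ['video_c', 'cpu_d', 'cpu_a', 'gpu_b']
```

With `cpu_a` and `gpu_b` already in the command FIFO, the urgent `video_c` request waits behind them in the two-tier model. In the single-tier model it is served first, and the row-buffer hit `cpu_d` then moves ahead of older row misses because the scheduler sees the bank state of every queued request.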

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 3
    July 2017
    440 pages
    ISSN:1084-4309
    EISSN:1557-7309
    DOI:10.1145/3062395
    Editor: Naehyuck Chang
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 April 2017
    Accepted: 01 December 2016
    Revised: 01 November 2016
    Received: 01 August 2016
    Published in TODAES Volume 22, Issue 3

    Author Tags

    1. Memory controller
    2. memory scheduling
    3. quality of service

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Qualcomm Fellow-Mentor-Advisor (FMA) award

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 119
    • Downloads (Last 12 months): 5
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 14 Jan 2025