A Single-Tier Virtual Queuing Memory Controller Architecture for Heterogeneous MPSoCs

Authors:

Bill Lin

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 22, Issue 3

Article No.: 56, Pages 1 - 23

https://doi.org/10.1145/3035481

Published: 27 April 2017

Abstract

Heterogeneous MPSoCs typically integrate diverse cores, including application CPUs, GPUs, and HD coders. These cores commonly share an off-chip memory to save cost and energy, but their memory accesses often interfere with one another, which can slow application performance or prevent real-time cores from sustaining their required rates. The memory controller plays a central role in meeting the QoS needs of real-time cores while maximizing CPU performance. Previous QoS-aware memory controllers are based on a classic two-tier queuing architecture: the first tier buffers memory transactions, and the second tier buffers the translated DRAM commands. In these designs, QoS-aware policies schedule competing transactions at the first tier, but the translated DRAM commands are served in FIFO order at the second tier. Consequently, once scheduled transactions have been forwarded to the command tier, newly arriving transactions that may be more critical cannot be served ahead of the commands already queued there. To address this, we propose a scalable memory controller architecture based on single-tier virtual queuing (STVQ) that maintains a single tier of request queues and employs a scheduler that considers both QoS requirements and DRAM bank states. Compared with previous QoS-aware memory controllers, the proposed STVQ memory controller reduces CPU slowdown by up to 13.9% while satisfying all frame rate requirements. We also propose further optimizations that increase row-buffer hits by up to 66.2% and reduce memory latency by up to 19.8%.
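The ordering problem described in the abstract can be illustrated with a toy model. The sketch below is not the paper's scheduler: the `Request` fields, the `rank` priority order (urgent first, then row-buffer hits, then oldest), the one-request-per-slot timing, and the FIFO depth are all assumptions chosen for illustration. It only shows why a late-arriving critical request is stuck behind already-forwarded commands in a two-tier design, while a single tier that re-ranks every slot can serve it immediately.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    arrival: int   # slot at which the request reaches the controller
    bank: int
    row: int
    urgent: bool   # hypothetical flag: a real-time core near its deadline


def rank(req, open_rows):
    """Assumed priority: urgent first, then row-buffer hits, then oldest."""
    row_hit = open_rows.get(req.bank) == req.row
    return (not req.urgent, not row_hit, req.arrival)


def single_tier(requests):
    """Serve one request per slot, re-ranking all pending requests each slot."""
    todo = sorted(requests, key=lambda r: r.arrival)
    pending, open_rows, order, slot = [], {}, [], 0
    while todo or pending:
        while todo and todo[0].arrival <= slot:
            pending.append(todo.pop(0))
        if pending:
            nxt = min(pending, key=lambda r: rank(r, open_rows))
            pending.remove(nxt)
            open_rows[nxt.bank] = nxt.row   # the served row stays open
            order.append(nxt.name)
        slot += 1
    return order


def two_tier(requests, fifo_depth=4):
    """Rank at tier one, then drain a FIFO command queue at tier two."""
    todo = sorted(requests, key=lambda r: r.arrival)
    pending, fifo, open_rows, order, slot = [], deque(), {}, [], 0
    while todo or pending or fifo:
        while todo and todo[0].arrival <= slot:
            pending.append(todo.pop(0))
        # Tier one forwards ranked transactions until the FIFO is full.
        while pending and len(fifo) < fifo_depth:
            nxt = min(pending, key=lambda r: rank(r, open_rows))
            pending.remove(nxt)
            fifo.append(nxt)
        # Tier two serves strictly in arrival order at the FIFO.
        if fifo:
            head = fifo.popleft()
            open_rows[head.bank] = head.row
            order.append(head.name)
        slot += 1
    return order


reqs = [
    Request("cpu0", 0, bank=0, row=1, urgent=False),
    Request("cpu1", 0, bank=0, row=1, urgent=False),
    Request("cpu2", 0, bank=1, row=5, urgent=False),
    Request("gpu_rt", 1, bank=1, row=7, urgent=True),
]
print(single_tier(reqs))  # ['cpu0', 'gpu_rt', 'cpu1', 'cpu2']
print(two_tier(reqs))     # ['cpu0', 'cpu1', 'cpu2', 'gpu_rt']
```

In this toy trace the urgent request arrives one slot late. The single-tier model serves it second, whereas the two-tier model serves it last because the three CPU requests were already sitting in the command FIFO.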

Index Terms

Computer systems organization > Architectures > Other architectures > Heterogeneous (hybrid) systems

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 22, Issue 3

July 2017

440 pages

ISSN: 1084-4309

EISSN: 1557-7309

DOI: 10.1145/3062395

Editor:
Naehyuck Chang
Korea Advanced Institute of Science and Technology, Korea

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 27 April 2017

Accepted: 01 December 2016

Revised: 01 November 2016

Received: 01 August 2016

Qualifiers

Research-article
Research
Refereed

Funding Sources

Qualcomm Fellow-Mentor-Advisor (FMA) award
