research-article

The virtual write queue: coordinating DRAM and last-level cache policies

Authors:

Jeffrey Stuecheli,

Dimitris Kaseridis,

Hillery C. Hunter,

Lizy K. John

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture

Pages 72 - 82

https://doi.org/10.1145/1815961.1815972

Published: 19 June 2010

Abstract

In computer architecture, caches have primarily been viewed as a means to hide memory latency from the CPU. Cache policies have focused on anticipating the CPU's data needs, and are mostly oblivious to the main memory. In this paper, we demonstrate that the era of many-core architectures has created new main memory bottlenecks and mandates a new approach: coordination of cache policy with main memory characteristics. Using the cache for memory optimization purposes, we propose a Virtual Write Queue that dramatically expands the memory controller's visibility of processor behavior, at low implementation overhead. Through memory-centric modification of existing policies, such as scheduled writebacks, this paper demonstrates that the performance-limiting effects of highly threaded architectures can be overcome. We show that through awareness of the physical main memory layout and by focusing on writes, both read and write average latency can be shortened, memory power reduced, and overall system performance improved. Through full-system cycle-accurate simulations of SPEC CPU2006, we demonstrate that the proposed Virtual Write Queue achieves an average 10.9% system-level throughput improvement on memory-intensive workloads, along with an overall reduction of 8.7% in memory power across the whole suite.
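To make the idea of layout-aware scheduled writebacks concrete, the following is a minimal sketch and not the paper's mechanism. It assumes a hypothetical address layout (64-byte lines, 128 lines per DRAM row, 8 banks) and hypothetical helper names (`dram_coords`, `group_writebacks`, `pick_burst`). It shows how dirty cache lines could be grouped by the DRAM row they map to, so that a controller with visibility into the cache might drain several same-row writes together while a bank is idle.

```python
from collections import defaultdict

# Hypothetical address layout (an assumption for illustration only):
# | row | bank (3 bits) | column (7 bits) | line offset (6 bits) |
LINE_BITS = 6      # 64-byte cache lines
COLUMN_BITS = 7    # 128 cache lines per DRAM row
BANK_BITS = 3      # 8 banks


def dram_coords(addr):
    """Split a physical address into (bank, row) under the assumed layout."""
    line = addr >> LINE_BITS
    bank = (line >> COLUMN_BITS) & ((1 << BANK_BITS) - 1)
    row = line >> (COLUMN_BITS + BANK_BITS)
    return bank, row


def group_writebacks(dirty_addrs):
    """Group dirty line addresses by the (bank, row) they map to."""
    groups = defaultdict(list)
    for addr in dirty_addrs:
        groups[dram_coords(addr)].append(addr)
    return dict(groups)


def pick_burst(dirty_addrs, idle_banks):
    """Pick the largest same-row group of dirty lines that targets an idle bank.

    Returns ((bank, row), sorted addresses), or (None, []) if nothing qualifies.
    """
    candidates = {
        key: addrs
        for key, addrs in group_writebacks(dirty_addrs).items()
        if key[0] in idle_banks
    }
    if not candidates:
        return None, []
    key = max(candidates, key=lambda k: len(candidates[k]))
    return key, sorted(candidates[key])
```

Choosing the largest same-row group is only one plausible heuristic; a real controller would also weigh pending reads, timing constraints, and cache replacement state.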

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture

June 2010

520 pages

ISBN:9781450300537

DOI:10.1145/1815961

General Chair: André Seznec (INRIA Rennes)

Program Chairs: Uri Weiser (Technion), Ronny Ronen (Intel)

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
June 2010
508 pages
ISSN:0163-5964
DOI:10.1145/1816038

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

Sponsor: SIGARCH

ISCA '10: The 37th Annual International Symposium on Computer Architecture

June 19 - 23, 2010

Saint-Malo, France

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
