research-article

Open access

Dirty-Block Tracking in a Direct-Mapped DRAM Cache with Self-Balancing Dispatch

Authors:

Kiyoung Choi

ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 2

Article No.: 11, Pages 1 - 25

https://doi.org/10.1145/3068460

Published: 10 May 2017

Abstract

Recently, processors have begun integrating 3D stacked DRAMs with the cores on the same package, and there have been several approaches to effectively utilizing the on-package DRAMs as caches. This article presents an approach that combines the previous approaches in a synergistic way by devising a module called the dirty-block tracker to maintain the dirtiness of each block in a dirty region. The approach avoids unnecessary tag checking for a write operation if the corresponding block in the cache is not dirty. Our simulation results show that the proposed technique achieves a 10.3% performance improvement on average over the state-of-the-art DRAM cache technique.
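
The bookkeeping described in the abstract can be pictured with a small software model. The following is only an illustrative sketch, not the authors' hardware design: the block size, the region size, and the names `DirtyBlockTracker`, `mark_dirty`, `mark_clean`, and `needs_tag_check` are all assumptions made for this example. It keeps one dirty bit per block for each tracked region and reports whether a write to a given address would still need a tag check under the rule stated in the abstract.

```python
from dataclasses import dataclass, field

# Assumed parameters for illustration only; the abstract does not give sizes.
BLOCK_SIZE = 64          # bytes per cache block
BLOCKS_PER_REGION = 64   # blocks per tracked region (4 KB regions)


@dataclass
class DirtyBlockTracker:
    """Hypothetical model: one dirty bit per block, grouped by region."""
    # region number -> integer used as a bit vector (bit i = block i is dirty)
    regions: dict = field(default_factory=dict)

    @staticmethod
    def _split(addr):
        """Return (region number, block offset within the region)."""
        block = addr // BLOCK_SIZE
        return block // BLOCKS_PER_REGION, block % BLOCKS_PER_REGION

    def mark_dirty(self, addr):
        region, offset = self._split(addr)
        self.regions[region] = self.regions.get(region, 0) | (1 << offset)

    def mark_clean(self, addr):
        region, offset = self._split(addr)
        bits = self.regions.get(region, 0) & ~(1 << offset)
        if bits:
            self.regions[region] = bits
        else:
            # No dirty blocks left, so stop tracking this region.
            self.regions.pop(region, None)

    def is_dirty(self, addr):
        region, offset = self._split(addr)
        return bool((self.regions.get(region, 0) >> offset) & 1)


def needs_tag_check(tracker, addr):
    """Per the abstract: a write can skip the tag check if the block is not dirty."""
    return tracker.is_dirty(addr)


tracker = DirtyBlockTracker()
tracker.mark_dirty(0x1000)
print(needs_tag_check(tracker, 0x1000))  # block marked dirty above
print(needs_tag_check(tracker, 0x1040))  # neighbouring block, never marked
```

A dictionary of bit vectors is used here only because it is compact to write; how the tracker is organized, sized, and replaced in the actual design is described in the article itself, not in this sketch.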

Cited By

Zhang Q, Wu Y, Cui C (2018). A New Method of Live Tracking of Process Memory. Proceedings of the 2nd International Conference on Cryptography, Security and Privacy, 154-158. DOI: 10.1145/3199478.3199497. Online publication date: 16-Mar-2018.
https://dl.acm.org/doi/10.1145/3199478.3199497
Patil A, Govindarajan R (2017). HAShCache. ACM Transactions on Architecture and Code Optimization 14:4, 1-26. DOI: 10.1145/3158641. Online publication date: 18-Dec-2017.
https://dl.acm.org/doi/10.1145/3158641

Index Terms

Dirty-Block Tracking in a Direct-Mapped DRAM Cache with Self-Balancing Dispatch
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ACM Transactions on Architecture and Code Optimization Volume 14, Issue 2

June 2017

259 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3086564

Editor:
Koen De Bosschere
Ghent University

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 May 2017

Accepted: 01 February 2017

Revised: 01 January 2017

Received: 01 May 2016

Published in TACO Volume 14, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd
