research-article
Open access

Dirty-Block Tracking in a Direct-Mapped DRAM Cache with Self-Balancing Dispatch

Published: 10 May 2017

Abstract

Recently, processors have begun integrating 3D stacked DRAMs with the cores on the same package, and there have been several approaches to effectively utilizing the on-package DRAMs as caches. This article presents an approach that combines the previous approaches in a synergistic way by devising a module called the dirty-block tracker to maintain the dirtiness of each block in a dirty region. The approach avoids unnecessary tag checking for a write operation if the corresponding block in the cache is not dirty. Our simulation results show that the proposed technique achieves a 10.3% performance improvement on average over the state-of-the-art DRAM cache technique.
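The write path described in the abstract can be illustrated with a minimal sketch. This is not the authors' design: the class names, the region size, the per-region bitmask layout, and the counters below are all hypothetical, and the sketch assumes that a write landing on a cache slot whose resident block is clean may simply overwrite that slot, because a clean block never needs a write-back. Only slots marked dirty require reading the tag to decide whether a victim must be written back first.

```python
class DirtyBlockTracker:
    """Hypothetical tracker: one dirty bit per cache slot, grouped by region."""

    def __init__(self, blocks_per_region=64):
        self.blocks_per_region = blocks_per_region
        self.regions = {}  # region id -> bitmask of dirty slots (assumed layout)

    def _locate(self, index):
        # Split a slot index into (region id, bit offset inside the region).
        return divmod(index, self.blocks_per_region)

    def is_dirty(self, index):
        region, offset = self._locate(index)
        return bool((self.regions.get(region, 0) >> offset) & 1)

    def mark_dirty(self, index):
        region, offset = self._locate(index)
        self.regions[region] = self.regions.get(region, 0) | (1 << offset)


class DirectMappedCache:
    """Toy direct-mapped cache that counts tag checks and write-backs."""

    def __init__(self, num_sets, tracker):
        self.num_sets = num_sets
        self.tags = [None] * num_sets
        self.tracker = tracker
        self.tag_checks = 0
        self.writebacks = 0

    def write(self, block_addr):
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        if self.tracker.is_dirty(index):
            # The resident block may hold modified data: read the tag, and
            # write the victim back if it belongs to a different address.
            self.tag_checks += 1
            if self.tags[index] != tag:
                self.writebacks += 1
        # Assumption: a clean resident block can be overwritten without
        # reading its tag, since nothing would be lost.
        self.tags[index] = tag
        self.tracker.mark_dirty(index)
```

In this toy model, the first write to any slot performs no tag check, while a later write that maps to the same slot does, because the tracker then reports the slot as dirty.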

Supplementary Material

TACO1402-11 (taco1402-11.pdf)
Slide deck associated with this paper

Cited By

  • (2018) A New Method of Live Tracking of Process Memory. Proceedings of the 2nd International Conference on Cryptography, Security and Privacy, 154-158. DOI: 10.1145/3199478.3199497. Online publication date: 16-Mar-2018
  • (2017) HAShCache. ACM Transactions on Architecture and Code Optimization 14:4, 1-26. DOI: 10.1145/3158641. Online publication date: 18-Dec-2017

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 14, Issue 2
    June 2017
    259 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3086564
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 May 2017
    Accepted: 01 February 2017
    Revised: 01 January 2017
    Received: 01 May 2016
    Published in TACO Volume 14, Issue 2

    Author Tags

    1. 3D-stacked memory
    2. DRAM Cache
    3. memory bandwidth

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd
