DOI: 10.1145/2751205.2751227

History-Assisted Adaptive-Granularity Caches (HAAG$) for High Performance 3D DRAM Architectures

Published: 08 June 2015

Abstract

3D-stacked DRAM has the potential to provide high-performance, large-capacity memory for future high-performance computing systems and datacenters, and the integration of a dedicated logic die opens up opportunities for architectural enhancements such as DRAM row-buffer caches. However, designing row-buffer caches that are both high-performance and cost-effective remains challenging for 3D memory systems. In this paper, we propose the History-Assisted Adaptive-Granularity Cache (HAAG$), which employs an adaptive caching scheme to support full associativity at various granularities and an intelligent history-assisted predictor to support a large number of banks in 3D memory systems. By increasing the row-buffer cache hit rate and avoiding unnecessary data caching, HAAG$ significantly reduces memory access latency and dynamic power. Our design works particularly well for manycore CPUs running (irregular) memory-intensive applications in which memory locality is hard to exploit. Evaluation results show that, with memory-intensive CPU workloads, HAAG$ can outperform the state-of-the-art row-buffer cache by 33.5%.
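
The abstract names two ideas: caching at several granularities with full associativity, and using access history to decide what is worth caching. The toy model below is only a sketch of how those two ideas can fit together. It is not the HAAG$ design from the paper. The class name, the per-row access counter, the threshold, the size-doubling rule, and the LRU eviction are all assumptions made for illustration.

```python
from collections import OrderedDict, defaultdict


class ToyAdaptiveRowCache:
    """Illustrative toy only; every policy here is an assumption, not the paper's design."""

    def __init__(self, capacity_bytes=4096, row_bytes=2048, line_bytes=64, hot_threshold=2):
        self.capacity = capacity_bytes
        self.row_bytes = row_bytes          # assumed DRAM row size (power of two)
        self.line_bytes = line_bytes        # smallest caching granularity
        self.hot_threshold = hot_threshold  # accesses needed before a row is cached
        self.history = defaultdict(int)     # row id -> access count (the "history")
        self.entries = OrderedDict()        # (base, size) -> True, kept in LRU order
        self.used = 0

    def _lookup(self, addr):
        # Fully associative: any entry may hold any address range.
        for base, size in self.entries:
            if base <= addr < base + size:
                return (base, size)
        return None

    def access(self, addr):
        row = addr // self.row_bytes
        self.history[row] += 1
        key = self._lookup(addr)
        if key is not None:
            self.entries.move_to_end(key)   # refresh LRU position
            return "hit"
        count = self.history[row]
        if count < self.hot_threshold:
            return "miss-bypass"            # predicted low reuse: do not cache
        # Granularity doubles with each extra access, capped at a full row.
        size = min(self.row_bytes, self.line_bytes << (count - self.hot_threshold))
        base = addr - addr % size
        # Drop smaller entries that the new, larger block covers.
        for k in [k for k in self.entries if base <= k[0] < base + size]:
            self.used -= k[1]
            del self.entries[k]
        # Evict least recently used entries until the new block fits.
        while self.used + size > self.capacity and self.entries:
            k, _ = self.entries.popitem(last=False)
            self.used -= k[1]
        self.entries[(base, size)] = True
        self.used += size
        return "miss-fill"
```

In this sketch a first touch to a row bypasses the cache, a second touch caches a single 64-byte line, and later misses to the same row fetch progressively larger aligned blocks, so rows with little reuse consume little or no cache space.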


Cited By

  • (2017) SELF: A High Performance and Bandwidth Efficient Approach to Exploiting Die-Stacked DRAM as Part of Memory. 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 187-197. DOI: 10.1109/MASCOTS.2017.23. Online publication date: Sep-2017
  • (2016) Integrated Thermal Analysis for Processing In Die-Stacking Memory. Proceedings of the Second International Symposium on Memory Systems, pp. 402-414. DOI: 10.1145/2989081.2989093. Online publication date: 3-Oct-2016

    Published In

    ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
    June 2015
    446 pages
    ISBN: 9781450335591
    DOI: 10.1145/2751205

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3d dram
    2. adaptive granularity
    3. haag$
    4. row buffer cache

    Qualifiers

    • Research-article

    Funding Sources

    • Center for Circuit & System Solutions (C2S2)
    • MOTIE/KSRC, the Future Semiconductor Device Technology Development Program

    Conference

    ICS'15
    ICS'15: 2015 International Conference on Supercomputing
    June 8 - 11, 2015
    Newport Beach, California, USA

    Acceptance Rates

    ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Article Metrics

    • Downloads (Last 12 months): 8
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 16 Oct 2024
