research-article
Open access

Morphable DRAM Cache Design for Hybrid Memory Systems

Published: 18 July 2019

Abstract

DRAM caches have emerged as an efficient new layer in the memory hierarchy to address the increasing diversity of memory components. When a small amount of fast memory is combined with slow but large memory, organizing the fast memory as a cache provides a software-transparent solution for hybrid memory systems. The effectiveness of such DRAM cache designs depends on the bandwidth and latency of both the fast and the slow memory. To quantitatively assess the effect of memory configurations and application patterns on DRAM cache designs, this article first investigates how three prior approaches perform under six hybrid memory scenarios. From this investigation, we observe that no single DRAM cache organization always outperforms the others across the diverse hybrid memory configurations and memory access patterns. Based on this observation, this article proposes a reconfigurable DRAM cache design that can adapt to different hybrid memory combinations and workload patterns. Unlike conventional on-chip SRAM caches, which have fixed tag and data arrays, DRAM caches can store tags and data in DRAM in an arbitrary layout, and this study advocates exploiting that flexibility. Using a sample-based mechanism, the proposed DRAM cache controller dynamically identifies the best organization among three candidates and applies it by reconfiguring the tag and data layout in the DRAM cache. Our evaluation shows that the proposed morphable DRAM cache can outperform the fixed DRAM cache configurations across six hybrid memory configurations.
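The abstract does not spell out how the sample-based mechanism works, so the following is only an illustrative sketch of the general idea of picking one of three candidate organizations from sampled measurements. It is not the article's controller. The candidate names (`org_a`, `org_b`, `org_c`), the per-access cost metric, the epoch length, and the class `SampleBasedSelector` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder names: the abstract does not name the three candidates.
CANDIDATES = ("org_a", "org_b", "org_c")


def _zeroed() -> dict:
    """Return a fresh per-candidate counter table."""
    return {name: 0 for name in CANDIDATES}


@dataclass
class SampleBasedSelector:
    """Assumed selection policy: lowest average sampled cost wins each epoch."""

    epoch_length: int = 1000            # assumed number of samples per decision
    active: str = CANDIDATES[0]         # organization currently applied
    cost: dict = field(default_factory=_zeroed)   # summed cost per candidate
    count: dict = field(default_factory=_zeroed)  # number of samples per candidate
    seen: int = 0                       # samples observed in the current epoch

    def record_sample(self, candidate: str, cost_cycles: int) -> str:
        """Record one sampled access cost and return the active organization."""
        self.cost[candidate] += cost_cycles
        self.count[candidate] += 1
        self.seen += 1
        if self.seen >= self.epoch_length:
            self._end_epoch()
        return self.active

    def _end_epoch(self) -> None:
        """Pick the candidate with the lowest average cost, then reset counters."""
        averages = {
            name: self.cost[name] / self.count[name]
            for name in CANDIDATES
            if self.count[name] > 0
        }
        self.active = min(averages, key=averages.get)
        self.cost = _zeroed()
        self.count = _zeroed()
        self.seen = 0
```

In this sketch, a small sampled region would be dedicated to each candidate and its observed access cost fed to `record_sample`; at the end of each epoch the selector switches `active` to the cheapest candidate. A real controller would also have to account for the cost of reorganizing tags and data when the choice changes, which this sketch ignores.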

Cited By

  • (2022) An Energy-Efficient DRAM Cache Architecture for Mobile Platforms With PCM-Based Main Memory. ACM Transactions on Embedded Computing Systems 21:1 (1-22). DOI: 10.1145/3451995. Online publication date: 14-Jan-2022
  • (2022) Challenges in Design, Data Placement, Migration and Power-Performance Trade-offs in DRAM-NVM-based Hybrid Memory Systems. IETE Technical Review 40:4 (498-520). DOI: 10.1080/02564602.2022.2127945. Online publication date: 13-Oct-2022


      Published In

      ACM Transactions on Architecture and Code Optimization  Volume 16, Issue 3
      September 2019
      347 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/3341169
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 July 2019
      Accepted: 01 May 2019
      Revised: 01 May 2019
      Received: 01 October 2018
      Published in TACO Volume 16, Issue 3


      Author Tags

      1. DRAM cache
      2. high bandwidth memory
      3. hybrid memory systems
      4. nonvolatile memory
      5. reconfigurable cache design

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Samsung Electronics (DRAM-NAND Hybrid Memory Architecture Research)
      • Ministry of Science and ICT, Korea
      • National Research Foundation of Korea
      • Institute for Information and Communications Technology Promotion
