CACTI 7: New tools for interconnect exploration in innovative off-chip memories

R Balasubramonian, AB Kahng… - ACM Transactions on …, 2017 - dl.acm.org
R Balasubramonian, AB Kahng, N Muralimanohar, A Shafiee, V Srinivas
ACM Transactions on Architecture and Code Optimization (TACO), 2017dl.acm.org
Historically, server designers have opted for simple memory systems by picking one of a few
commoditized DDR memory products. We are already witnessing a major upheaval in the off-
chip memory hierarchy, with the introduction of many new memory products—buffer-on-
board, LRDIMM, HMC, HBM, and NVMs, to name a few. Given the plethora of choices, it is
expected that different vendors will adopt different strategies for their high-capacity memory
systems, often deviating from DDR standards and/or integrating new functionality within …
Historically, server designers have opted for simple memory systems by picking one of a few commoditized DDR memory products. We are already witnessing a major upheaval in the off-chip memory hierarchy, with the introduction of many new memory products—buffer-on-board, LRDIMM, HMC, HBM, and NVMs, to name a few. Given the plethora of choices, it is expected that different vendors will adopt different strategies for their high-capacity memory systems, often deviating from DDR standards and/or integrating new functionality within memory systems. These strategies will likely differ in their choice of interconnect and topology, with a significant fraction of memory energy being dissipated in I/O and data movement. To make the case for memory interconnect specialization, this paper makes three contributions.
First, we design a tool that carefully models I/O power in the memory system, explores the design space, and gives the user the ability to define new types of memory interconnects/topologies. The tool is validated against SPICE models, and is integrated into version 7 of the popular CACTI package. Our analysis with the tool shows that several design parameters have a significant impact on I/O power.
We then use the tool to help craft novel specialized memory system channels. We introduce a new relay-on-board chip that partitions a DDR channel into multiple cascaded channels. We show that this simple change to the channel topology can improve performance by 22% for DDR DRAM and lower cost by up to 65% for DDR DRAM. This new architecture does not require any changes to DIMMs, and it efficiently supports hybrid DRAM/NVM systems.
Finally, as an example of a more disruptive architecture, we design a custom DIMM and parallel bus that moves away from the DDR3/DDR4 standards. To reduce energy and improve performance, the baseline data channel is split into three narrow parallel channels and the on-DIMM interconnects are operated at a lower frequency. In addition, this allows us to design a two-tier error protection strategy that reduces data transfers on the interconnect. This architecture yields a performance improvement of 18% and a memory power reduction of 23%.
The cascaded channel and narrow channel architectures serve as case studies for the new tool and show the potential for benefit from re-organizing basic memory interconnects.
ACM Digital Library