ROP: Alleviating refresh overheads via reviving the memory system in frozen cycles

P Huang, W Liu, K Tang, X He… - 2016 45th International …, 2016 - ieeexplore.ieee.org
2016 45th International Conference on Parallel Processing (ICPP), 2016
DRAM performs periodic refreshes to prevent data loss due to charge leakage, but these refreshes degrade performance and consume energy, costs referred to as refresh overheads. In this paper, we propose Refresh-Oriented Prefetching (ROP) to alleviate memory refresh overheads. Before a refresh starts, ROP prefetches cache lines from the to-be-refreshed rank into an added SRAM buffer. As a result, while a rank is undergoing refresh, memory requests to it can still be serviced rather than being blocked. At the core of ROP is a probabilistic prefetch model that determines which cache lines are prefetched for a refresh, based on the access patterns appearing in an observational window ahead of the refresh. A Pattern Profiler collects statistics about memory traffic occurring before and after the start of each refresh operation during a training period, and it outputs two conditional probabilities that control subsequent prefetch decisions. A Prefetcher maintains a prediction table that helps to ascertain access patterns appearing around refresh operations. The prediction table is updated every time an access occurs to the to-be-next-refreshed rank during the observational window, and it is consulted to decide which cache lines are prefetched. Extensive evaluation with benchmarks from SPEC CPU2006 on a DDR4 memory demonstrates that ROP improves memory performance by up to 9.2% (3.3% on average) for single-core simulations, while reducing overall memory energy by up to 6.7% (3.6% on average), relative to an auto-refresh baseline memory. Moreover, it increases the Weighted Speedup by up to 2.22X (1.32X on average) for 4-core multiprogram simulations, while reducing energy by up to 48.8% (24.4% on average).
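The profiler-plus-prediction-table flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which two conditional probabilities the Pattern Profiler outputs, how the prediction table is organized, or how a prefetch decision is thresholded, so everything below is an assumption made for illustration: the probabilities are taken to be P(access after refresh start | access in window) and P(access after refresh start | no access in window), the table is a simple per-line hit counter, and the names PatternProfiler, PredictionTable, should_prefetch, and threshold are hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict


class PatternProfiler:
    """Hypothetical profiler: counts refreshes by whether the rank was
    accessed before and after the refresh start (training period)."""

    def __init__(self):
        # counts[(accessed_before, accessed_after)] -> number of refreshes
        self.counts = defaultdict(int)

    def record(self, accessed_before, accessed_after):
        self.counts[(accessed_before, accessed_after)] += 1

    def probabilities(self):
        """Return (P(after | before), P(after | not before)).
        A condition that was never observed yields 0.0 (an assumption)."""
        def cond(before):
            total = self.counts[(before, True)] + self.counts[(before, False)]
            return self.counts[(before, True)] / total if total else 0.0
        return cond(True), cond(False)


class PredictionTable:
    """Hypothetical table: tracks cache lines touched in the to-be-refreshed
    rank during the observational window."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed number of lines the SRAM buffer holds
        self.hits = Counter()

    def update(self, line_addr):
        # Called on every access to the rank inside the observational window.
        self.hits[line_addr] += 1

    def candidates(self):
        # Most frequently touched lines first, capped at buffer capacity.
        return [addr for addr, _ in self.hits.most_common(self.capacity)]


def should_prefetch(accessed_in_window, p_after_given_before,
                    p_after_given_idle, threshold=0.5):
    """Prefetch only if the estimated chance of an access arriving during
    the refresh meets the (assumed) threshold."""
    p = p_after_given_before if accessed_in_window else p_after_given_idle
    return p >= threshold
```

In this sketch, a controller would call record() once per refresh during training, then before each later refresh call should_prefetch() with the learned probabilities and, if it returns True, copy the lines returned by candidates() into the buffer. Ranking lines by hit count is only one plausible selection policy; the paper's actual policy may differ.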