DOI: 10.1145/2989081.2989100

MicroRefresh: Minimizing Refresh Overhead in DRAM Caches

Published: 03 October 2016

    Abstract

    DRAM memory systems require periodic recharging to avoid loss of data from leaky capacitors. These refresh operations consume energy and reduce the time for which the DRAM banks are available to service memory requests. Higher DRAM density and 3D-stacking aggravate the refresh overheads, incurring even higher energy and performance costs. 3D-stacked DRAM and other emerging on-chip High Bandwidth Memory (HBM) technologies, which are widely considered to be changing the landscape of the memory hierarchy in future heterogeneous and many-core architectures, could suffer significantly from refresh overheads.
    However, such large on-chip memory, when used as a very large last-level cache, provides opportunities for addressing the refresh overheads. In this work, we propose MicroRefresh, a scheme for almost eliminating the refresh overhead in DRAM caches. MicroRefresh eliminates unwanted refresh of recently accessed DRAM pages; it takes advantage of the relative latency difference between on-chip and off-chip DRAM and balances the use of system resources by aggressively and opportunistically eliminating refresh of older DRAM pages. It tolerates any resulting increase in cache misses by leveraging the under-utilized main memory bandwidth. The resulting organization eliminates the energy and performance overhead of refresh operations in the DRAM cache to achieve overall performance and energy improvement.
    Across both 4-core and 8-core workloads, MicroRefresh eliminates 92% of the refresh energy consumed by the baseline periodic refresh mechanism. Further, this is accompanied by performance improvements of up to 10%, with average improvements of 3.9% and 3.4% for 4-core and 8-core workloads, respectively.
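    The abstract names two ideas without giving their mechanics: a recently accessed page needs no explicit refresh, and an older page can be dropped from the cache rather than refreshed, at the cost of a later miss served by main memory. The toy model below is only a sketch of those two ideas under stated assumptions. It is not the paper's algorithm, and the names `Page`, `access`, and `sweep`, as well as the 64 ms retention window and 16 ms sweep period in the demo, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One cached DRAM page in a toy model (hypothetical structure)."""
    last_touch_ms: float  # an access rewrites the row, restoring its charge
    valid: bool = True


def access(pages, addr, now_ms):
    """Access a page; return True on a hit, False on a miss.

    A miss stands in for a fetch from main memory followed by a fill.
    Either way the page ends up freshly charged at now_ms.
    """
    page = pages.get(addr)
    hit = page is not None and page.valid
    pages[addr] = Page(last_touch_ms=now_ms)
    return hit


def sweep(pages, now_ms, period_ms, retention_ms):
    """Periodic check that never issues an explicit refresh.

    Pages whose charge outlasts the next sweep are left alone (skipped).
    Pages that would expire before the next sweep are invalidated (dropped)
    instead of refreshed. Returns (skipped, dropped).
    """
    skipped = dropped = 0
    for page in pages.values():
        if not page.valid:
            continue
        expires_at_ms = page.last_touch_ms + retention_ms
        if expires_at_ms > now_ms + period_ms:
            skipped += 1  # recently accessed: the access acted as a refresh
        else:
            page.valid = False  # older page: drop it, tolerate a later miss
            dropped += 1
    return skipped, dropped


if __name__ == "__main__":
    pages = {}
    access(pages, 0xA, 0.0)   # cold miss, fills page A
    access(pages, 0xB, 0.0)   # cold miss, fills page B
    access(pages, 0xA, 40.0)  # hit, recharges page A
    # Assumed parameters: 64 ms retention, sweep every 16 ms.
    print(sweep(pages, now_ms=48.0, period_ms=16.0, retention_ms=64.0))
    print(access(pages, 0xB, 50.0))  # page B was dropped, so this misses
```

    In this sketch the baseline would have refreshed both pages at the sweep; the toy policy refreshes neither and pays one extra miss on page B. Whether that trade is worthwhile depends on how much main memory bandwidth is spare, which is the balance the abstract describes.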


      Published In

      MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
      October 2016
      463 pages
      ISBN: 9781450343053
      DOI: 10.1145/2989081
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MEMSYS '16

      Article Metrics

      • Downloads (Last 12 months): 4
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 09 Aug 2024

      Cited By

      • (2023) HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy. ACM Transactions on Architecture and Code Optimization 20:2 (1-20). DOI: 10.1145/3572839. Online publication date: 1-Mar-2023
      • (2020) DSM. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4:2 (1-26). DOI: 10.1145/3392151. Online publication date: 12-Jun-2020
      • (2018) VRL-DRAM. Proceedings of the 55th Annual Design Automation Conference (1-6). DOI: 10.1145/3195970.3196136. Online publication date: 24-Jun-2018
      • (2018) VRL-DRAM: Improving DRAM Performance via Variable Refresh Latency. 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC.2018.8465769. Online publication date: Jun-2018
      • (2017) HAShCache. ACM Transactions on Architecture and Code Optimization 14:4 (1-26). DOI: 10.1145/3158641. Online publication date: 18-Dec-2017
      • (2017) Near-Optimal Access Partitioning for Memory Hierarchies with Multiple Heterogeneous Bandwidth Sources. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (13-24). DOI: 10.1109/HPCA.2017.46. Online publication date: Feb-2017
