Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh

Published: 20 December 2013

Abstract

Spin-Torque Transfer RAM (STT-RAM) is a promising candidate for SRAM replacement because of its fast read access, high density, low leakage power, and compatibility with CMOS technology. However, wide adoption of STT-RAM as cache memory is impeded by its long write latency and high write power. Recent work proposed improving write performance by relaxing the retention time of STT-RAM cells. The resulting volatile STT-RAM must be refreshed periodically to prevent data loss. When volatile STT-RAM is used as the last-level cache (LLC) in chip multiprocessor (CMP) systems, frequent refresh operations can dissipate significant extra energy. In addition, refresh operations can conflict severely with normal read/write operations and degrade overall system performance. Therefore, minimizing the performance impact of refresh operations is crucial for the adoption of volatile STT-RAM.
In this article, we propose Cache-Coherence-Enabled Adaptive Refresh (CCear) to minimize the number of refresh operations for volatile STT-RAM used as the LLC in CMP systems. Specifically, CCear interacts with the cache coherence protocol and the cache management policy to reduce the number of refresh operations on volatile STT-RAM caches. Full-system simulation results show that CCear performs close to an ideal refresh policy with low overhead. Compared with state-of-the-art refresh policies, CCear simultaneously improves system performance and reduces energy consumption. Moreover, the performance of CCear can be further enhanced by using small filter caches to hold the private STT-RAM blocks that are not refreshed.
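
The abstract does not spell out how coherence information drives the refresh decision, so the following is only a toy sketch of the general idea, not the authors' mechanism. It assumes a hypothetical three-state directory (`DirState`) and a hypothetical rule that an LLC block owned by a private cache need not be refreshed. All names, states, and parameters here are illustrative assumptions.

```python
from enum import Enum


class DirState(Enum):
    """Hypothetical directory states kept alongside each LLC block."""
    INVALID = "I"          # block holds no valid data
    SHARED = "S"           # LLC copy is valid and may be read by any core
    OWNED_PRIVATE = "P"    # assumed: a private cache owns the block


def needs_refresh(state: DirState, coherence_aware: bool) -> bool:
    """Decide whether an expiring LLC block must be rewritten."""
    if state is DirState.INVALID:
        return False       # nothing to preserve under either policy
    if coherence_aware and state is DirState.OWNED_PRIVATE:
        return False       # assumed rule: skip blocks owned by a private cache
    return True


def count_refreshes(states, retention_ticks, total_ticks, coherence_aware):
    """Step through time and count the refresh operations issued.

    Every block expires each `retention_ticks` ticks. For simplicity the
    retention timer restarts on expiry even when the refresh is skipped.
    """
    ages = [0] * len(states)
    refreshes = 0
    for _ in range(total_ticks):
        for i, state in enumerate(states):
            ages[i] += 1
            if ages[i] >= retention_ticks:
                if needs_refresh(state, coherence_aware):
                    refreshes += 1
                ages[i] = 0
    return refreshes


if __name__ == "__main__":
    # Illustrative mix of block states, not measured data.
    llc = [DirState.SHARED] * 4 + [DirState.OWNED_PRIVATE] * 3 + [DirState.INVALID]
    baseline = count_refreshes(llc, 10, 100, coherence_aware=False)
    adaptive = count_refreshes(llc, 10, 100, coherence_aware=True)
    print(baseline, adaptive)  # prints: 70 40
```

In this toy model the saving is proportional to the fraction of valid blocks that the assumed rule lets the policy skip. A real design would also have to handle state changes between refresh periods, which the sketch ignores.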

      Information

      Published In

      ACM Transactions on Design Automation of Electronic Systems, Volume 19, Issue 1
      December 2013
      210 pages
      ISSN:1084-4309
      EISSN:1557-7309
      DOI:10.1145/2558148
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 December 2013
      Accepted: 01 September 2013
      Revised: 01 June 2013
      Received: 01 December 2012
      Published in TODAES Volume 19, Issue 1

      Author Tags

      1. Spin-torque transfer RAM
      2. cache coherence
      3. embedded DRAM
      4. energy efficiency
      5. nonvolatile memory
      6. refresh

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2023) CAPMIG: Coherence-Aware Block Placement and Migration in Multiretention STT-RAM Caches. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:2 (411-422). DOI: 10.1109/TCAD.2022.3175242. Online publication date: Feb-2023
      • (2019) Research and Analysis of Design and Optimization of Magnetic Memory Material Cache Based on STT-MRAM. Key Engineering Materials 815 (28-34). DOI: 10.4028/www.scientific.net/KEM.815.28. Online publication date: Aug-2019
      • (2017) Fluid wireless protocols. Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia (22-31). DOI: 10.1145/3139315.3139321. Online publication date: 15-Oct-2017
      • (2017) Exploiting Multiple Write Modes of Nonvolatile Main Memory in Embedded Systems. ACM Transactions on Embedded Computing Systems 16:4 (1-26). DOI: 10.1145/3063130. Online publication date: 11-May-2017
      • (2017) On-Chip Non-volatile STT-MRAM for Zero-Standby Power. Enabling the Internet of Things (213-246). DOI: 10.1007/978-3-319-51482-6_7. Online publication date: 26-Jan-2017
      • (2016) Reducing System Power Consumption Using Check-Pointing on Nonvolatile Embedded Magnetic Random Access Memories. ACM Journal on Emerging Technologies in Computing Systems 12:4 (1-24). DOI: 10.1145/2876507. Online publication date: 12-May-2016
      • (2016) Spin-Transfer Torque Memories: Devices, Circuits, and Systems. Proceedings of the IEEE 104:7 (1449-1488). DOI: 10.1109/JPROC.2016.2521712. Online publication date: Jul-2016
      • (2016) A Novel L1 Cache Based on Volatile STT-RAM. Computer Engineering and Technology (32-39). DOI: 10.1007/978-981-10-3159-5_4. Online publication date: 9-Dec-2016
      • (2016) A Novel Hybrid Last Level Cache Based on Multi-retention STT-RAM Cells. Advanced Computer Architecture (28-39). DOI: 10.1007/978-981-10-2209-8_3. Online publication date: 9-Aug-2016
      • (2015) Nonvolatile main memory aware garbage collection in high-level language virtual machine. Proceedings of the 12th International Conference on Embedded Software (197-206). DOI: 10.5555/2830865.2830887. Online publication date: 4-Oct-2015
