Building and Optimizing MRAM-Based Commodity Memories

Published: 08 December 2014

Abstract

Emerging non-volatile memory technologies such as MRAM are promising design solutions for energy-efficient memory architectures, especially for mobile systems. However, building commodity MRAM by reusing DRAM designs is not straightforward. Existing memory interfaces are incompatible with MRAM's small page size, and they fail to leverage MRAM's unique properties, causing unnecessary performance and energy overhead. In this article, we propose four techniques to enable and optimize an LPDDRx-compatible MRAM solution: ComboAS to solve the pin incompatibility; DynLat to avoid unnecessary access latencies; and EarlyPA and BufW to further improve performance by exploiting MRAM's unique features of non-destructive reads and an independent write path. Combining these techniques, we boost MRAM performance by 17% and provide a DRAM-compatible MRAM solution that consumes 21% less energy.

Published In

ACM Transactions on Architecture and Code Optimization, Volume 11, Issue 4
January 2015
797 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2695583
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 08 December 2014
Accepted: 01 August 2014
Revised: 01 August 2014
Received: 01 June 2014
Published in TACO Volume 11, Issue 4

Author Tags

1. LPDDR2
2. LPDDR3
3. MRAM
4. Nonvolatile memory
5. energy
6. main memory
7. performance
8. spin-transfer torque

Qualifiers

• Research-article
• Research
• Refereed
