Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory

Published: 08 October 2013

Abstract

Reducing vulnerability to soft errors is an important design goal for future Chip-MultiProcessor (CMP) architectures. In this study, we explore the soft error characteristics of CMPs with 3D stacked NonVolatile Memory (NVM), in particular Spin-Transfer Torque Random Access Memory (STT-RAM), whose cells are immune to radiation-induced soft errors and do not have endurance problems. We use 3D stacking as an enabler for modular integration of STT-RAM memories with minimum disruption to the baseline processor design flow, while providing further interconnection and capacity advantages. We take an in-depth look at alternative replacement schemes to explore the soft error resilience benefits and design trade-offs of 3D stacked STT-RAM, and to capture the multivariable optimization challenges that microprocessor architectures face. We propose a vulnerability metric that covers the instructions and data in the core pipeline and throughout the cache hierarchy, and use it to present a comprehensive system evaluation of reliability, performance, and power consumption for our CMP architectures. Our experimental results show that, for the average workload, replacing memories with an STT-RAM alternative significantly mitigates on-chip soft errors, improves performance by 14.15%, and reduces power consumption by 13.44%.

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 9, Issue 3
September 2013
196 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/2533711
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 October 2013
Accepted: 01 April 2012
Revised: 01 February 2012
Received: 01 July 2011
Published in JETC Volume 9, Issue 3

Author Tags

  1. 3D stacking
  2. Nonvolatile memory
  3. soft errors

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)1
Reflects downloads up to 10 Nov 2024

Citations

Cited By

  • (2020) A Comprehensive Performance Evaluation to GPGPU Applications under STT-RAM based Hybrid Cache Architectures. 2020 X Brazilian Symposium on Computing Systems Engineering (SBESC), pp. 1-8. DOI: 10.1109/SBESC51047.2020.9277841. Online publication date: 24-Nov-2020
  • (2017) Overview of 3-D Architecture Design Opportunities and Techniques. IEEE Design & Test 34:4, pp. 60-68. DOI: 10.1109/MDAT.2015.2463282. Online publication date: Aug-2017
  • (2017) Implicit Programming: A Fast Programming Strategy for NAND Flash Memory Storage Systems Adopting Redundancy Methods. IEEE Embedded Systems Letters 9:2, pp. 37-40. DOI: 10.1109/LES.2017.2670140. Online publication date: 25-May-2017
  • (2017) Emerging and Non-volatile Memory. Handbook of Hardware/Software Codesign, pp. 1-17. DOI: 10.1007/978-94-017-7358-4_15-1. Online publication date: 10-Apr-2017
  • (2017) Emerging and Nonvolatile Memory. Handbook of Hardware/Software Codesign, pp. 443-459. DOI: 10.1007/978-94-017-7267-9_15. Online publication date: 27-Sep-2017
  • (2016) Micromagnetic Simulation of Strain-Assisted Current-Induced Magnetization Switching. Advances in Condensed Matter Physics 2016, pp. 1-6. DOI: 10.1155/2016/9271407. Online publication date: 2016
  • (2016) An Endurance-Aware Metadata Allocation Strategy for MLC NAND Flash Memory Storage Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35:4, pp. 691-694. DOI: 10.1109/TCAD.2015.2474394. Online publication date: 17-Mar-2016
  • (2015) Impact of Cell Failure on Reliable Cross-Point Resistive Memory Design. ACM Transactions on Design Automation of Electronic Systems 20:4, pp. 1-21. DOI: 10.1145/2753759. Online publication date: 28-Sep-2015
  • (2014) Asymmetric Programming: A Highly Reliable Metadata Allocation Strategy for MLC NAND Flash Memory-Based Sensor Systems. Sensors 14:10, pp. 18851-18877. DOI: 10.3390/s141018851. Online publication date: 10-Oct-2014
  • (2012) Parametric flows. Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-10. DOI: 10.1109/SC.2012.94. Online publication date: 10-Nov-2012