research-article

Algorithmic Fault Detection for RRAM-based Matrix Operations

Published: 13 May 2020

Abstract

An RRAM-based computing system (RCS) provides an energy-efficient implementation of vector-matrix multiplication in machine-learning hardware. However, an RCS is vulnerable to faults due to the immature RRAM fabrication process. We propose an efficient fault-tolerance method for RCS; the proposed method, referred to as extended-ABFT (X-ABFT), is inspired by algorithm-based fault tolerance (ABFT). We use row checksums and test-input vectors to extract signatures for fault detection and error correction. We also present a solution that alleviates the overflow problem caused by the limited number of voltage levels available for the test-input signals. Simulation results show that for a Hopfield classifier with faults in 5% of its RRAM cells, X-ABFT achieves nearly the same classification accuracy as in the fault-free case.
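The abstract only summarizes X-ABFT, so the following is a minimal sketch of the classical ABFT row-checksum idea it builds on, not the authors' method. It assumes the crossbar computes y = x^T W for an m x n weight matrix W, that one extra crossbar column stores the sum of each row of W, and that outputs are exact integers (real analog outputs would need a tolerance in the comparison). Because sum_j y_j = sum_i x_i (sum_j W_ij), the extra output must equal the sum of the data outputs when no fault is present. Unit test-input vectors e_i then read out one row at a time to flag the row holding a faulty cell. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def add_row_checksums(w: Matrix) -> Matrix:
    """Append an extra column holding the sum of each row of W (the row checksum)."""
    return [row + [sum(row)] for row in w]


def vec_mat(x: List[int], w: Matrix) -> List[int]:
    """Compute y = x^T W, the operation a crossbar performs in the analog domain."""
    n_cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n_cols)]


def checksum_ok(y_ext: List[int]) -> bool:
    """Fault-free case: the checksum output equals the sum of the data outputs."""
    return sum(y_ext[:-1]) == y_ext[-1]


def locate_faulty_rows(w_ext: Matrix) -> List[int]:
    """Apply unit test inputs e_i; row i is flagged if its checksum mismatches."""
    m = len(w_ext)
    faulty = []
    for i in range(m):
        e_i = [1 if k == i else 0 for k in range(m)]
        if not checksum_ok(vec_mat(e_i, w_ext)):
            faulty.append(i)
    return faulty


if __name__ == "__main__":
    w = [[1, 2, 0], [3, 1, 2]]
    w_ext = add_row_checksums(w)           # [[1, 2, 0, 3], [3, 1, 2, 6]]
    x = [2, 1]
    print(checksum_ok(vec_mat(x, w_ext)))  # True
    w_ext[1][0] = 0                        # hypothetical stuck-at fault in cell (1, 0)
    print(checksum_ok(vec_mat(x, w_ext)))  # False
    print(locate_faulty_rows(w_ext))       # [1]
```

In this example a single corrupted cell breaks the checksum for the ordinary input, and the unit test inputs narrow the fault down to row 1. A fault in the checksum column itself is flagged the same way, and the sketch performs detection only; it does not attempt error correction or address the overflow issue mentioned above.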

      Published In

      ACM Transactions on Design Automation of Electronic Systems, Volume 25, Issue 3
      May 2020
      179 pages
      ISSN:1084-4309
      EISSN:1557-7309
      DOI:10.1145/3386183
      Editor: Naehyuck Chang
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 May 2020
      Online AM: 07 May 2020
      Accepted: 01 February 2020
      Revised: 01 January 2020
      Received: 01 September 2019
      Published in TODAES Volume 25, Issue 3

      Author Tags

      1. RRAM
      2. fault detection
      3. neural network

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • IEEE International Test Conference 2018

      Article Metrics

      • Downloads (last 12 months): 67
      • Downloads (last 6 weeks): 12
      Reflects downloads up to 10 Nov 2024

      Cited By
      • (2024) Double Adjacent Error Correction in RRAM Matrix Multiplication using Weighted Checksums. 2024 IEEE 30th International Symposium on On-Line Testing and Robust System Design (IOLTS), 1-5. DOI: 10.1109/IOLTS60994.2024.10616083. Online publication date: 3-Jul-2024.
      • (2023) Compact Functional Testing for Neuromorphic Computing Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 7, 2391-2403. DOI: 10.1109/TCAD.2022.3223843. Online publication date: 1-Jul-2023.
      • (2023) Training-Free Stuck-At Fault Mitigation for ReRAM-Based Deep Learning Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 7, 2174-2186. DOI: 10.1109/TCAD.2022.3222288. Online publication date: 1-Jul-2023.
      • (2023) Testability and Dependability of AI Hardware: Survey, Trends, Challenges, and Perspectives. IEEE Design & Test 40, 2, 8-58. DOI: 10.1109/MDAT.2023.3241116. Online publication date: Apr-2023.
      • (2022) Online Fault Detection in ReRAM-Based Computing Systems for Inferencing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30, 4, 392-405. DOI: 10.1109/TVLSI.2021.3139530. Online publication date: Apr-2022.
      • (2022) MAC-ECC: In-Situ Error Correction and Its Design Methodology for Reliable NVM-Based Compute-in-Memory Inference Engine. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 12, 4, 835-845. DOI: 10.1109/JETCAS.2022.3223031. Online publication date: Dec-2022.
      • (2021) Perspectives on Emerging Computation-in-Memory Paradigms. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1925-1934. DOI: 10.23919/DATE51398.2021.9473976. Online publication date: 1-Feb-2021.
      • (2021) Adaptive Methods for Machine Learning-Based Testing of Integrated Circuits and Boards. 2021 IEEE International Test Conference (ITC), 153-162. DOI: 10.1109/ITC50571.2021.00023. Online publication date: Oct-2021.
      • (2021) Robust Fault-Tolerant Design Based on Checksum and On-Line Testing for Memristor Neural Network. 2021 IEEE 30th Asian Test Symposium (ATS), 25-30. DOI: 10.1109/ATS52891.2021.00017. Online publication date: Nov-2021.
      • (2021) Multiply accumulate operations in memristor crossbar arrays for analog computing. Journal of Semiconductors 42, 1, 013104. DOI: 10.1088/1674-4926/42/1/013104. Online publication date: 1-Jan-2021.
