
FTT-NAS: Discovering Fault-tolerant Convolutional Neural Architecture

Published: 12 August 2021

Abstract

    With the rapid evolution of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When neural networks (NNs) are deployed on devices operating in complex environments, they are exposed to various types of faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, malicious attackers, and so on. Consequently, the safety risk of deploying NNs is drawing much attention. In this article, after analyzing the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are resilient to the various faults found in today's devices. We then incorporate fault-tolerant training (FTT) into the search process to achieve better results, an approach we refer to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures significantly outperform manually designed baseline architectures, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, under the same fault settings, F-FTT-Net, discovered under the feature fault model, achieves an accuracy of 86.2% (vs. 68.1% for MobileNet-V2), and W-FTT-Net, discovered under the weight fault model, achieves an accuracy of 69.6% (vs. 60.8% for ResNet-18). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the model capacity, and the connection pattern all influence the fault resilience of NN models.
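    The core mechanism the abstract describes, fault-tolerant training, amounts to injecting the formalized faults during training so that the network is optimized under the same perturbations it will face at deployment. The following is a minimal PyTorch sketch of one such setup under an assumed weight fault model; the iid per-bit flip model, the 8-bit affine quantization, the flip rate `p_flip`, and the names `inject_weight_bitflips` and `FaultyConv2d` are all illustrative assumptions, not the paper's exact formalization.

```python
# A minimal sketch of fault-tolerant training (FTT) under an assumed weight
# fault model: weights are quantized to 8 bits and each stored bit is flipped
# independently with probability p_flip on every forward pass. Names and
# hyperparameters here are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def inject_weight_bitflips(w: torch.Tensor, p_flip: float = 1e-4) -> torch.Tensor:
    """Return a copy of `w` with simulated memory bit-flips in its 8-bit
    quantized representation; gradients reach the clean weights through a
    straight-through estimator."""
    scale = w.detach().abs().max() / 127.0 + 1e-12
    # Affine-quantize to uint8 (zero point 128) so bitwise ops are well defined.
    q = torch.clamp((w.detach() / scale).round() + 128, 0, 255).to(torch.uint8)
    for bit in range(8):
        flip = torch.rand_like(w) < p_flip        # iid flip decision per element
        q = torch.where(flip, q ^ (1 << bit), q)  # flip this bit where selected
    faulty = (q.to(w.dtype) - 128.0) * scale
    return w + (faulty - w).detach()              # straight-through estimator


class FaultyConv2d(nn.Conv2d):
    """Convolution whose weights see fresh simulated faults on every forward."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = inject_weight_bitflips(self.weight)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


if __name__ == "__main__":
    # Tiny CIFAR-10-shaped example: one training step where the convolution
    # runs with freshly injected weight faults, so the loss (and thus the
    # gradient) reflects the faulty forward pass.
    model = nn.Sequential(
        FaultyConv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    print(f"FTT loss: {loss.item():.4f}")
```

    In an FTT-NAS-style search, a candidate architecture's reward would be evaluated with such fault injection enabled, so that architectures scoring well are those that remain accurate under the fault model.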





        Published In

        ACM Transactions on Design Automation of Electronic Systems, Volume 26, Issue 6
        November 2021, 218 pages
        ISSN: 1084-4309
        EISSN: 1557-7309
        DOI: 10.1145/3472284

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 12 August 2021
        Accepted: 01 April 2021
        Revised: 01 March 2021
        Received: 01 December 2020
        Published in TODAES Volume 26, Issue 6

        Author Tags

        1. Neural architecture search
        2. fault tolerance
        3. neural networks

        Qualifiers

        • Research-article
        • Refereed

        Funding Sources

        • National Natural Science Foundation of China
        • National Key R&D Program of China
        • Beijing National Research Center for Information Science and Technology (BNRist)
        • Beijing Innovation Center for Future Chips
        • Tsinghua University and Toyota Joint Research Center for AI Technology of Automated Vehicle
        • Beijing Academy of Artificial Intelligence


        Cited By

        • (2024) An Overlay Accelerator of DeepLab CNN for Spacecraft Image Segmentation on FPGA. Remote Sensing 16:5 (894). DOI: 10.3390/rs16050894. Online publication date: 2-Mar-2024.
        • (2024) MRFI: An Open-Source Multiresolution Fault Injection Framework for Neural Network Processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:7 (1325-1335). DOI: 10.1109/TVLSI.2024.3384404. Online publication date: Jul-2024.
        • (2023) Soft Error Reliability Analysis of Vision Transformers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:12 (2126-2136). DOI: 10.1109/TVLSI.2023.3317138. Online publication date: 5-Oct-2023.
        • (2023) Exploring Winograd Convolution for Cost-Effective Neural Network Fault Tolerance. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:11 (1763-1773). DOI: 10.1109/TVLSI.2023.3306894. Online publication date: 1-Sep-2023.
        • (2023) Statistical Modeling of Soft Error Influence on Neural Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:11 (4152-4163). DOI: 10.1109/TCAD.2023.3266405. Online publication date: 11-Apr-2023.
        • (2023) TOSA: Tolerating Stuck-At-Faults in Edge-based RRAM Inference Accelerators. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS) (1181-1190). DOI: 10.1109/ICPADS60453.2023.00172. Online publication date: 17-Dec-2023.
        • (2023) Design of an experimental setup for the implementation of CNNs in APSoCs. 2023 IEEE Colombian Caribbean Conference (C3) (1-5). DOI: 10.1109/C358072.2023.10436151. Online publication date: 22-Nov-2023.
        • (2023) Dependable DNN Accelerator for Safety-Critical Systems: A Review on the Aging Perspective. IEEE Access 11 (89803-89834). DOI: 10.1109/ACCESS.2023.3300376. Online publication date: 2023.
        • (2022) Special Session: Fault-Tolerant Deep Learning: A Hierarchical Perspective. 2022 IEEE 40th VLSI Test Symposium (VTS) (1-12). DOI: 10.1109/VTS52500.2021.9794239. Online publication date: 25-Apr-2022.
        • (2022) Soft Error Tolerant Convolutional Neural Networks on FPGAs With Ensemble Learning. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30:3 (291-302). DOI: 10.1109/TVLSI.2021.3138491. Online publication date: Mar-2022.