BinFI: an efficient fault injector for safety-critical machine learning systems

Authors:

Karthik Pattabiraman,

Nathan DeBardeleben

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 69, Pages 1 - 23

https://doi.org/10.1145/3295500.3356177

Published: 17 November 2019

Abstract

As machine learning (ML) becomes pervasive in high-performance computing, it has found its way into safety-critical domains (e.g., autonomous vehicles), and the reliability of ML has grown in importance accordingly. Failures of ML systems can have catastrophic consequences, and they can be caused by soft errors, which are increasing in frequency due to system scaling. We therefore need to evaluate ML systems in the presence of soft errors.

In this work, we propose BinFI, an efficient fault injector (FI) for finding the safety-critical bits in ML applications. We find that widely used ML computations are often monotonic, so we can approximate the error propagation behavior of an ML application as a monotonic function. BinFI uses a binary-search-like FI technique to pinpoint the safety-critical bits and to measure the overall resilience. BinFI identifies 99.56% of the safety-critical bits (with 99.63% precision) in the systems, significantly outperforming random FI at a much lower cost.
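
The binary-search idea in the abstract can be illustrated with a minimal sketch. This is not the BinFI implementation: the toy model, the 8-bit unsigned value, the deviation threshold, and all function names below are assumptions made for illustration. The sketch considers only bits that are currently 0 (so every flip adds a positive amount) and assumes that if flipping one bit is critical, flipping any higher-order candidate bit is critical too.

```python
from typing import Callable, List


def flip_bit(value: int, bit: int) -> int:
    """Return `value` with the bit at position `bit` inverted."""
    return value ^ (1 << bit)


def toy_model(x: int) -> float:
    """Hypothetical monotonic computation: a scaled input followed by ReLU."""
    return max(0.0, 0.5 * x - 3.0)


def critical_bits_binary_search(
    value: int,
    candidate_bits: List[int],
    is_critical: Callable[[int], bool],
) -> List[int]:
    """Locate the critical bits among `candidate_bits` (sorted low to high).

    Relies on the monotonicity assumption stated above, so only
    O(log n) fault injections are needed instead of n.
    """
    lo, hi = 0, len(candidate_bits)  # index of the first critical bit is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_critical(flip_bit(value, candidate_bits[mid])):
            hi = mid       # boundary is at mid or lower
        else:
            lo = mid + 1   # boundary is strictly above mid
    return candidate_bits[lo:]


# Illustrative setup (all values are assumptions, not taken from the paper).
WIDTH = 8
VALUE = 0b00010010       # fault-free input to the toy model
THRESHOLD = 10.0         # output deviation treated as safety-critical
GOLDEN = toy_model(VALUE)
injections = 0           # number of fault-injection trials performed


def oracle(faulty_value: int) -> bool:
    """Run one fault-injection trial and report whether it is critical."""
    global injections
    injections += 1
    return abs(toy_model(faulty_value) - GOLDEN) > THRESHOLD


# Candidate bits: positions that are 0 in VALUE, so each flip is a 0 -> 1 flip.
zero_bits = [b for b in range(WIDTH) if not (VALUE >> b) & 1]

found = critical_bits_binary_search(VALUE, zero_bits, oracle)
searched_injections = injections

# Exhaustive injection over the same bits, for comparison.
exhaustive = [b for b in zero_bits if oracle(flip_bit(VALUE, b))]

print(found, searched_injections, len(zero_bits))
```

In this toy setting the search needs 3 injections to classify 6 candidate bits, and it agrees with exhaustive injection because the toy model really is monotonic. When the monotonicity assumption only holds approximately, a search of this kind can misclassify bits near the boundary.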

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2019

1921 pages

ISBN:9781450362290

DOI:10.1145/3295500

General Chair: Michela Taufer
Program Chairs: Pavan Balaji, Antonio J. Peña

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Natural Sciences and Engineering Research Council of Canada (NSERC)

Conference

SC '19

Sponsor:

SIGHPC

SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis

November 17 - 19, 2019

Denver, Colorado

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
