
Improving the Performance of CNN Accelerator Architecture under the Impact of Process Variations

Published: 09 September 2023

Abstract

Convolutional neural network (CNN) accelerators are popular specialized platforms for efficient CNN processing. As semiconductor manufacturing technology scales down to the nanometer regime, process variation dramatically affects chip quality: differences in transistor parameters cause delay variation within the chip. CNN accelerators employ a large number of processing elements (PEs) for parallel computing, which makes them highly susceptible to process variation effects. Fast CNN processing requires consistent performance among PEs; otherwise, the processing speed is limited by the slowest PE on the chip. In this work, we first quantitatively model and analyze the impact of process variation on the operating frequency of CNN accelerators. We further analyze the utilization of CNN accelerators and the characteristics of CNN models. We then leverage PE underutilization to propose a sub-matrix reformation mechanism, and leverage the pixel similarity of images to propose a weight transfer technique. Both techniques tolerate low-frequency PEs and improve performance at the chip level. Furthermore, we propose a novel resilience-aware mapping technique that exploits the diversity in the importance of weights to further improve performance. Evaluation results show that our techniques achieve significant processing speed improvement with negligible accuracy loss.
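To make the bottleneck described above concrete, the following Monte Carlo sketch (our illustration, not the paper's variation model) draws per-PE maximum frequencies from an assumed Gaussian distribution and shows how much chip-level frequency is recovered when a mitigation technique can tolerate the k slowest PEs, which is the effect the sub-matrix reformation, weight transfer, and resilience-aware mapping techniques aim for. All parameters (256 PEs, 1 GHz nominal, 10% sigma) are hypothetical.

```python
# Minimal Monte Carlo sketch: why chip frequency is set by the slowest PE,
# and how tolerating a few slow PEs recovers clock headroom.
# Assumptions (hypothetical, for illustration only): per-PE maximum
# frequencies follow a Gaussian with 10% sigma around a 1 GHz nominal.
import numpy as np

rng = np.random.default_rng(0)

N_PES = 256        # e.g., a 16x16 systolic array of PEs
F_NOMINAL = 1.0    # nominal PE frequency in GHz
SIGMA = 0.10       # assumed variation, 10% of nominal

# Sample per-PE maximum operating frequencies under process variation.
pe_freq = rng.normal(F_NOMINAL, SIGMA * F_NOMINAL, size=N_PES)

# Without mitigation, the whole array must clock at the slowest PE.
baseline = pe_freq.min()

# If a technique can tolerate (work around) the k slowest PEs,
# the chip can instead clock at the (k+1)-th slowest PE.
for k in (0, 4, 8, 16):
    f_chip = np.sort(pe_freq)[k]
    print(f"tolerate {k:2d} slow PEs -> chip frequency {f_chip:.3f} GHz "
          f"({100 * (f_chip / baseline - 1):+.1f}% vs. baseline)")
```

Because the minimum of many samples sits deep in the distribution's tail, excluding even a few of the slowest PEs typically moves the operating point up by several percent, which illustrates why tolerating low-frequency PEs pays off at the chip level.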


Cited By

  • Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators. Journal of Systems Architecture 157 (Dec. 2024), 103292. DOI: 10.1016/j.sysarc.2024.103292


Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 5
September 2023, 475 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3623508

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 09 September 2023
Online AM: 08 June 2023
Accepted: 28 May 2023
Revised: 10 March 2023
Received: 13 October 2022
Published in TODAES Volume 28, Issue 5


Author Tags

  1. Process variation
  2. CNN accelerator
  3. Systolic array

Qualifiers

  • Research-article

Funding Sources

  • Jilin Scientific and Technological Development Program


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 116
  • Downloads (last 6 weeks): 7

Reflects downloads up to 09 Nov 2024.

