
Improving the Performance of CNN Accelerator Architecture under the Impact of Process Variations

Published: 09 September 2023

Abstract

Convolutional neural network (CNN) accelerators are popular specialized platforms for efficient CNN processing. As semiconductor manufacturing technology scales down to the nanometer regime, process variation dramatically affects chip quality: differences in transistor parameters cause delay variation within the chip. CNN accelerators employ a large number of processing elements (PEs) for parallel computing, which makes them highly susceptible to process variation effects. Fast CNN processing requires consistent performance among PEs; otherwise, the processing speed is limited by the slowest PE on the chip. In this work, we first quantitatively model and analyze the impact of process variation on the operating frequency of CNN accelerators. We further analyze the utilization of CNN accelerators and the characteristics of CNN models. We then leverage PE underutilization to propose a sub-matrix reformation mechanism, and leverage the pixel similarity of images to propose a weight transfer technique. Both techniques tolerate low-frequency PEs and improve performance at the chip level. Furthermore, we propose a novel resilience-aware mapping technique that exploits the diversity in the importance of weights to further improve performance. Evaluation results show that our techniques achieve significant processing speed improvement with negligible accuracy loss.
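To make the bottleneck described above concrete, the following Monte Carlo sketch (our illustration, not the paper's variation model) draws per-PE maximum frequencies from an assumed Gaussian distribution and shows how much chip-level frequency is recovered when a mitigation technique can tolerate the k slowest PEs, which is the effect the sub-matrix reformation, weight transfer, and resilience-aware mapping techniques aim for. All parameters (256 PEs, 1 GHz nominal, 10% sigma) are hypothetical.

```python
# Minimal Monte Carlo sketch: why chip frequency is set by the slowest PE,
# and how tolerating a few slow PEs recovers clock headroom.
# Assumptions (hypothetical, for illustration only): per-PE maximum
# frequencies follow a Gaussian with 10% sigma around a 1 GHz nominal.
import numpy as np

rng = np.random.default_rng(0)

N_PES = 256        # e.g., a 16x16 systolic array of PEs
F_NOMINAL = 1.0    # nominal PE frequency in GHz
SIGMA = 0.10       # assumed variation, 10% of nominal

# Sample per-PE maximum operating frequencies under process variation.
pe_freq = rng.normal(F_NOMINAL, SIGMA * F_NOMINAL, size=N_PES)

# Without mitigation, the whole array must clock at the slowest PE.
baseline = pe_freq.min()

# If a technique can tolerate (work around) the k slowest PEs,
# the chip can instead clock at the (k+1)-th slowest PE.
for k in (0, 4, 8, 16):
    f_chip = np.sort(pe_freq)[k]
    print(f"tolerate {k:2d} slow PEs -> chip frequency {f_chip:.3f} GHz "
          f"({100 * (f_chip / baseline - 1):+.1f}% vs. baseline)")
```

Because the minimum of many samples sits deep in the distribution's tail, excluding even a few of the slowest PEs typically moves the operating point up by several percent, which illustrates why tolerating low-frequency PEs pays off at the chip level.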


Cited By

  • Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators. Journal of Systems Architecture 157 (Dec. 2024), 103292. DOI: 10.1016/j.sysarc.2024.103292


Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 5
September 2023, 475 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3623508

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 09 September 2023
Online AM: 08 June 2023
Accepted: 28 May 2023
Revised: 10 March 2023
Received: 13 October 2022
Published in TODAES Volume 28, Issue 5


Author Tags

  1. Process variation
  2. CNN accelerator
  3. Systolic array

Qualifiers

  • Research-article

Funding Sources

  • Jilin Scientific and Technological Development Program


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 116
  • Downloads (last 6 weeks): 7

Reflects downloads up to 09 Nov 2024.

