DOI: 10.1145/3287624.3287641

TNPU: an efficient accelerator architecture for training convolutional neural networks

Published: 21 January 2019

Abstract

Training large-scale convolutional neural networks (CNNs) is an extremely computation- and memory-intensive task that requires massive computational resources and long training time. Recently, many accelerator solutions have been proposed to improve the performance and efficiency of CNNs. Existing approaches mainly focus on the inference phase of CNNs and can hardly address the new challenges posed by CNN training: the diversity of resource requirements and the bidirectional data dependency between convolutional layers (CVLs) and fully-connected layers (FCLs). To overcome this problem, this paper presents a new accelerator architecture for CNN training, called TNPU, which leverages the complementary resource requirements of CVLs and FCLs. Unlike prior approaches that optimize CVLs and FCLs separately, we orchestrate the computation of CVLs and FCLs in a single computing unit so that they run concurrently, keeping both computing and memory resources highly utilized and thereby boosting performance. We also propose a simplified out-of-order scheduling mechanism to address the bidirectional data dependency in CNN training. Experiments show that TNPU achieves speedups of 1.5x and 1.3x, with average energy reductions of 35.7% and 24.1%, over comparably provisioned state-of-the-art accelerators (DNPU and DaDianNao), respectively.
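The key ideas above (co-scheduling compute-bound CVL work with memory-bound FCL work in one computing unit, and a simplified out-of-order issue mechanism that respects the bidirectional data dependencies of training) can be pictured with a small scheduling sketch. The Python below is a hypothetical illustration only, not the TNPU design: the Task and ooo_schedule names are invented here, and the model reduces each layer's forward or backward pass to a task with explicit dependencies.

```python
# Hypothetical sketch (not from the paper): out-of-order co-scheduling of
# compute-bound convolutional-layer (CVL) tasks with memory-bound
# fully-connected-layer (FCL) tasks under explicit data dependencies.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    kind: str                               # "CVL" or "FCL"
    deps: set = field(default_factory=set)  # names of tasks that must finish first

def ooo_schedule(tasks):
    """Each step, issue ready tasks out of program order, preferring to pair
    one CVL task with one FCL task so compute and memory bandwidth stay busy."""
    done, order = set(), []
    pending = {t.name: t for t in tasks}
    while pending:
        ready = [t for t in pending.values() if t.deps <= done]
        if not ready:
            raise RuntimeError("dependency cycle among tasks")
        step = []
        for kind in ("CVL", "FCL"):          # try to co-issue one task of each kind
            pick = next((t for t in ready if t.kind == kind), None)
            if pick is not None:
                step.append(pick)
        for t in step:
            done.add(t.name)
            del pending[t.name]
        order.append([t.name for t in step])
    return order

# Toy workload: two inputs each run a CVL stage then an FCL stage. While input
# B's CVL is running, input A's FCL is already ready, so they are co-issued.
tasks = [
    Task("cvl_A", "CVL"),
    Task("fcl_A", "FCL", {"cvl_A"}),
    Task("cvl_B", "CVL"),
    Task("fcl_B", "FCL", {"cvl_B"}),
]
print(ooo_schedule(tasks))   # [['cvl_A'], ['cvl_B', 'fcl_A'], ['fcl_B']]
```

On this toy workload the second step co-issues input B's CVL with input A's FCL, which is the kind of overlap that keeps both compute and memory resources busy.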

References

[1] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos. 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. In Proceedings of the 43rd ISCA.
[2] Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning. In Proceedings of the 19th ASPLOS.
[3] Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In Proceedings of MICRO.
[4] Y. H. Chen, T. Krishna, J. S. Emer, and V. Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits (2017).
[5] Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting Vision Processing Closer to the Sensor. In Proceedings of the 42nd ISCA.
[6] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Proceedings of the 43rd ISCA.
[7] Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning Both Weights and Connections for Efficient Neural Network. In Advances in Neural Information Processing Systems. 1135--1143.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems.
[9] J. Li, G. Yan, W. Lu, S. Jiang, S. Gong, J. Wu, and X. Li. 2018. CCR: A Concise Convolution Rule for Sparse Neural Network Accelerators. In 2018 Design, Automation and Test in Europe Conference and Exhibition (DATE). 189--194.
[10] J. Li, G. Yan, W. Lu, S. Jiang, S. Gong, J. Wu, and X. Li. 2018. SmartShuttle: Optimizing Off-chip Memory Accesses for Deep Learning Accelerators. In 2018 Design, Automation and Test in Europe Conference and Exhibition (DATE). 343--348.
[11] Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In Proceedings of the 23rd HPCA.
[12] Angshuman Parashar et al. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Proceedings of the 44th ISCA.
[13] Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Proceedings of FPGA.
[14] D. Shin, J. Lee, J. Lee, J. Lee, and Hoi-Jun Yoo. 2017. An Energy-Efficient Deep Learning Processor with Heterogeneous Multi-core Architecture for Convolutional Neural Networks and Recurrent Neural Networks. In 2017 IEEE Symposium on Low-Power and High-Speed Chips (COOL CHIPS). 1--2.
[15] Naveen Suda, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, and Yu Cao. 2016. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks. In Proceedings of FPGA.
[16] Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Proceedings of FPGA.
[17] Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In Proceedings of MICRO.


Information & Contributors

      Published In

      ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference
      January 2019
      794 pages
      ISBN:9781450360074
      DOI:10.1145/3287624
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      In-Cooperation

      • IEICE ESS: Institute of Electronics, Information and Communication Engineers, Engineering Sciences Society
      • IEEE CAS
      • IEEE CEDA
      • IPSJ SIG-SLDM: Information Processing Society of Japan, SIG System LSI Design Methodology

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 21 January 2019

      Author Tags

      1. CNN training
      2. accelerator architecture
      3. convolutional neural networks

      Qualifiers

      • Research-article

      Conference

      ASPDAC '19

      Acceptance Rates

      Overall Acceptance Rate 466 of 1,454 submissions, 32%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

• Downloads (Last 12 months): 59
• Downloads (Last 6 weeks): 3
      Reflects downloads up to 11 Feb 2025

      Citations

      Cited By

• (2024) WinTA: An Efficient Reconfigurable CNN Training Accelerator With Decomposition Winograd. IEEE Transactions on Circuits and Systems I: Regular Papers, 71(2), 634-645. DOI: 10.1109/TCSI.2023.3338471. Online publication date: Feb 2024.
• (2023) An Overview of Energy-Efficient DNN Training Processors. In On-Chip Training NPU - Algorithm, Architecture and SoC Design, 183-210. DOI: 10.1007/978-3-031-34237-0_8. Online publication date: 29 May 2023.
• (2022) THETA: A High-Efficiency Training Accelerator for DNNs With Triple-Side Sparsity Exploration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30(8), 1034-1046. DOI: 10.1109/TVLSI.2022.3175582. Online publication date: Aug 2022.
• (2022) H2Learn: High-Efficiency Learning Accelerator for High-Accuracy Spiking Neural Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11), 4782-4796. DOI: 10.1109/TCAD.2021.3138347. Online publication date: Nov 2022.
• (2022) Energy-Efficient DNN Training Processors on Micro-AI Systems. IEEE Open Journal of the Solid-State Circuits Society, 2, 259-275. DOI: 10.1109/OJSSCS.2022.3219034. Online publication date: 2022.
• (2021) LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling. In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 1-8. DOI: 10.1109/ICCAD51958.2021.9643567. Online publication date: 1 Nov 2021.
• (2020) A Gradient-Interleaved Scheduler for Energy-Efficient Backpropagation for Training Neural Networks. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 1-5. DOI: 10.1109/ISCAS45731.2020.9181242. Online publication date: Oct 2020.
• (2020) TaxoNN: A Light-Weight Accelerator for Deep Neural Network Training. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 1-5. DOI: 10.1109/ISCAS45731.2020.9181001. Online publication date: Oct 2020.
• (2020) Minimizing Off-Chip Memory Access for Deep Convolutional Neural Network Training. In Parallel Architectures, Algorithms and Programming, 479-491. DOI: 10.1007/978-981-15-2767-8_42. Online publication date: 26 Jan 2020.
