Research Article | Open Access

Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing

Published: 04 September 2018

Abstract

Convolutional neural networks (CNNs) are one of the most successful machine-learning techniques for image, voice, and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators proposed for CNNs typically contain large numbers of multiply-accumulate (MAC) units, whose multipliers are costly in integrated circuit (IC) gate count and power consumption. “Weight-sharing” accelerators have been proposed in which the full range of weight values in a trained CNN is compressed into a small number of bins, and the bin index is used to access the shared weight value. We reduce the power and area of the CNN by implementing a parallel accumulate shared MAC (PASM) in a weight-shared CNN. PASM re-architects the MAC to instead count the frequency of each weight and place it in a bin; the accumulated value is computed in a subsequent multiply phase, significantly reducing the gate count and power consumption of the CNN. In this article, we implement PASM in a weight-shared CNN convolution hardware accelerator and analyze its effectiveness. Experiments show that, for a 1GHz clock on a 45nm ASIC process, our approach results in fewer gates, smaller logic, and reduced power, with only a slight increase in latency. We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, which have limited numbers of digital signal processor (DSP) units to accelerate MAC operations.
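As a minimal software sketch of the two-phase scheme the abstract describes (not the authors' RTL; the bin count, data widths, and function names are illustrative assumptions), the following C code contrasts a conventional weight-shared MAC with a PASM-style one. Phase one accumulates each input into a register selected by its weight's bin index, using adders only; phase two performs a single multiply per bin against the shared weight values.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BINS 16   /* number of shared-weight bins (illustrative) */

/* Conventional weight-shared MAC: one multiply per input. */
static int64_t mac_conventional(const int32_t *in, const uint8_t *bin_idx,
                                const int32_t *w_shared, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)in[i] * w_shared[bin_idx[i]];  /* n multiplies */
    return acc;
}

/* PASM-style MAC: phase 1 accumulates inputs into per-bin registers
 * (adds only); phase 2 performs one multiply per bin. */
static int64_t mac_pasm(const int32_t *in, const uint8_t *bin_idx,
                        const int32_t *w_shared, int n) {
    int64_t bins[NUM_BINS] = {0};
    for (int i = 0; i < n; i++)           /* phase 1: accumulate per bin */
        bins[bin_idx[i]] += in[i];
    int64_t acc = 0;
    for (int b = 0; b < NUM_BINS; b++)    /* phase 2: NUM_BINS multiplies */
        acc += bins[b] * (int64_t)w_shared[b];
    return acc;
}

int main(void) {
    /* Hypothetical data: 6 inputs, each tagged with a weight-bin index. */
    int32_t in[6]      = {3, -1, 4, 1, -5, 9};
    uint8_t bin_idx[6] = {0, 2, 0, 1, 2, 15};
    int32_t w_shared[NUM_BINS] = {7, -2, 5, [15] = 11};

    /* Both paths compute the same dot product. */
    printf("conventional = %lld\n",
           (long long)mac_conventional(in, bin_idx, w_shared, 6));
    printf("pasm         = %lld\n",
           (long long)mac_pasm(in, bin_idx, w_shared, 6));
    return 0;
}
```

For n inputs and B bins, this replaces n multiplies with n additions plus B multiplies; when B is much smaller than n, that is where the gate-count and power savings arise, with the extra post-pass accounting for the slight latency increase the abstract reports.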




Published In

ACM Transactions on Architecture and Code Optimization, Volume 15, Issue 3
September 2018
322 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3274266
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 September 2018
Accepted: 01 June 2018
Revised: 01 May 2018
Received: 01 January 2018
Published in TACO Volume 15, Issue 3


Author Tags

  1. ASIC
  2. CNN
  3. FPGA
  4. arithmetic hardware circuits
  5. multiply accumulate
  6. power efficiency

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Science Foundation Ireland


Cited By

  • (2024) Flexible Quantization for Efficient Convolutional Neural Networks. Electronics 13(10), 1923. DOI: 10.3390/electronics13101923. Online publication date: 14-May-2024.
  • (2024) An ASIC Accelerator for QNN With Variable Precision and Tunable Energy Efficiency. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(7), 2057-2070. DOI: 10.1109/TCAD.2024.3357597. Online publication date: 24-Jan-2024.
  • (2024) Modern Trends in Improving the Technical Characteristics of Devices and Systems for Digital Image Processing. IEEE Access 12, 44659-44681. DOI: 10.1109/ACCESS.2024.3381493. Online publication date: 2024.
  • (2024) Estimation of aquatic ecosystem health using deep neural network with nonlinear data mapping. Ecological Informatics 81, 102588. DOI: 10.1016/j.ecoinf.2024.102588. Online publication date: Jul-2024.
  • (2024) A Precision-Aware Neuron Engine for DNN Accelerators. SN Computer Science 5(5). DOI: 10.1007/s42979-024-02851-z. Online publication date: 25-Apr-2024.
  • (2023) Identification of Individual Hanwoo Cattle by Muzzle Pattern Images through Deep Learning. Animals 13(18), 2856. DOI: 10.3390/ani13182856. Online publication date: 8-Sep-2023.
  • (2023) Design of The Ultra-Low-Power Driven VMM Configurations for μW Scale IoT Devices. In 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 65-72. DOI: 10.1109/MCSoC60832.2023.00018. Online publication date: 18-Dec-2023.
  • (2023) FPGA-Based Efficient MLP Neural Network for Digit Recognition. In 2023 International Conference on Integration of Computational Intelligent System (ICICIS), 1-7. DOI: 10.1109/ICICIS56802.2023.10430242. Online publication date: 1-Nov-2023.
  • (2023) Constant Coefficient Multipliers Using Self-Similarity-Based Hybrid Binary-Unary Computing. In 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-7. DOI: 10.1109/ICCAD57390.2023.10323844. Online publication date: 28-Oct-2023.
  • (2023) Real-time approximate and combined 2D convolvers for FPGA-based image processing. The Journal of Supercomputing 79(16), 18910-18946. DOI: 10.1007/s11227-023-05377-y. Online publication date: 1-Nov-2023.
