DOI: 10.1145/3205289.3205295

ComPEND: Computation Pruning through Early Negative Detection for ReLU in a Deep Neural Network Accelerator

Published: 12 June 2018

Abstract

Negative inputs to ReLU are simply discarded, yet a deep neural network spends a great deal of computation producing them. We propose a computation pruning technique that detects, at an early stage, that the result of a sum of products will be negative, by adopting an inverted two's complement representation for weights and a bit-serial sum of products. The technique can therefore skip a large amount of computation destined for negative results and simply set the corresponding ReLU outputs to zero. Moreover, we devise a DNN accelerator architecture that applies the proposed technique efficiently. The evaluation shows that an accelerator using computation pruning through early negative detection significantly improves both energy efficiency and performance.
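
The general idea can be illustrated with a minimal Python sketch: weights are consumed bit-serially from the most significant bit down, and because the activations feeding a ReLU layer are non-negative, an upper bound on the contribution of the remaining bit planes tells us when the final sum of products is guaranteed to be negative, so the remaining cycles can be skipped and the ReLU output set to zero. This is an assumption-laden illustration of early negative detection, not the paper's exact inverted two's-complement datapath; the function name, the bound, and the example values are hypothetical.

```python
# Minimal sketch of a bit-serial sum of products with early negative detection.
# Weights are signed two's-complement integers; activations are non-negative
# (they are ReLU outputs of the previous layer).

def relu_dot_product_early_negative(weights, activations, bits=8):
    """Return (relu_output, bit_cycles_executed) for one output neuron."""
    assert all(a >= 0 for a in activations), "activations must be non-negative"
    act_sum = sum(activations)          # used to bound the remaining contribution

    partial = 0
    for k in range(bits - 1, -1, -1):   # MSB-first, one bit plane per cycle
        # Column sum: activations whose weight has bit k set.
        col = sum(a for w, a in zip(weights, activations) if (w >> k) & 1)
        # Bit (bits-1) is the two's-complement sign bit, weighted -2^(bits-1).
        bit_weight = -(1 << k) if k == bits - 1 else (1 << k)
        partial += bit_weight * col

        # Bit planes k-1 .. 0 can add at most (2^k - 1) * act_sum to the sum.
        if partial + ((1 << k) - 1) * act_sum < 0:
            # The final result is provably negative: ReLU clamps it to zero,
            # so the remaining bit-serial cycles are pruned.
            return 0, bits - k

    return max(partial, 0), bits


# Example: the true sum of products is -48, so the loop stops after the
# first bit plane instead of running all four cycles.
print(relu_dot_product_early_negative([-4, 1, -2], [10, 2, 5], bits=4))  # (0, 1)
```
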

References

[1]
Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic deep neural network computing. In Proceedings of the International Symposium on Microarchitecture. ACM, 382--394.
[2]
Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: ineffectual-neuron-free deep neural network computing. In Proceedings of the International Symposium on Computer Architecture. IEEE, 1--13.
[3]
Dmytro Apalkov, Alexey Khvalkovskiy, Steven Watts, Vladimir Nikitin, Xueti Tang, Daniel Lottis, Kiseok Moon, Xiao Luo, Eugene Chen, Adrian Ong, et al. 2013. Spin-transfer torque magnetic random access memory (STT-MRAM). ACM Journal on Emerging Technologies in Computing Systems (JETC) 9, 2 (2013), 13.
[4]
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ACM Sigplan Notices 49, 4 (2014), 269--284.
[5]
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. DaDianNao: A machine-learning supercomputer. In Proceedings of the International Symposium on Microarchitecture. IEEE, 609--622.
[6]
Yu-Hsin Chen, Tushar Krishna, Joel S Emer, and Vivienne Sze. 2017. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits 52, 1 (2017), 127--138.
[7]
Xiangyu Dong, Cong Xu, Yuan Xie, and Norman P Jouppi. 2012. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31, 7 (2012), 994--1007.
[8]
A Driskill-Smith, D Apalkov, V Nikitin, X Tang, S Watts, D Lottis, K Moon, A Khvalkovskiy, R Kawakami, X Luo, et al. 2011. Latest advances and roadmap for in-plane and perpendicular STT-RAM. In Proceedings of the International Memory Workshop. IEEE, 1--3.
[9]
Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and efficient neural network acceleration with 3D memory. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 751--764.
[10]
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, 580--587.
[11]
Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82--97.
[12]
Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial deep neural network computing. In Proceedings of the International Symposium on Microarchitecture. IEEE, 1--12.
[13]
Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. 2017. ZeNA: Zero-Aware Neural Network Accelerator. IEEE Design & Test (2017).
[14]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.
[15]
Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P Jouppi. 2009. CACTI 6.0: A tool to model large caches. HP Laboratories (2009), 22--31.
[16]
Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. 2011. DRAMSim2: A cycle accurate memory system simulator. IEEE Computer Architecture Letters 10, 1 (2011), 16--19.
[17]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211--252.
[18]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[19]
Guangyu Sun, Xiangyu Dong, Yuan Xie, Jian Li, and Yiran Chen. 2009. A novel architecture of the 3D stacked MRAM L2 cache for CMPs. In Proceedings of the International Symposium on High Performance Computer Architecture. IEEE, 239--249.
[20]
Andrea Vedaldi and Karel Lenc. 2015. Matconvnet: Convolutional neural networks for matlab. In Proceedings of the ACM International Conference on Multimedia. ACM, 689--692.




      Published In

      ICS '18: Proceedings of the 2018 International Conference on Supercomputing
      June 2018
      407 pages
      ISBN:9781450357838
      DOI:10.1145/3205289
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 June 2018


      Author Tags

      1. Early negative detection
      2. ReLU
      3. accelerator
      4. deep neural network

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICS '18

      Acceptance Rates

      Overall Acceptance Rate 629 of 2,180 submissions, 29%




      Cited By

• (2024) ECHO: Energy-Efficient Computation Harnessing Online Arithmetic—An MSDF-Based Accelerator for DNN Inference. Electronics 13:10 (1893). DOI: 10.3390/electronics13101893. Online publication date: 11-May-2024.
• (2024) Neural Network Acceleration Using Digit-Plane Computation with Early Termination. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1-4. DOI: 10.1109/ISCAS58744.2024.10558491. Online publication date: 19-May-2024.
• (2023) BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks. ACM Transactions on Embedded Computing Systems 22:5s, 1-24. DOI: 10.1145/3609093. Online publication date: 31-Oct-2023.
• (2023) Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation. IEEE Transactions on Parallel and Distributed Systems 34:12, 3253-3265. DOI: 10.1109/TPDS.2023.3324934. Online publication date: Dec-2023.
• (2023) A Speculative Computation Approach for Energy-Efficient Deep Neural Network. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:3, 795-806. DOI: 10.1109/TCAD.2022.3183561. Online publication date: Mar-2023.
• (2023) DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator. 2023 26th Euromicro Conference on Digital System Design (DSD), 686-692. DOI: 10.1109/DSD60849.2023.00098. Online publication date: 6-Sep-2023.
• (2022) LRP: Predictive output activation based on SVD approach for CNNs acceleration. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 831-836. DOI: 10.23919/DATE54114.2022.9774744. Online publication date: 14-Mar-2022.
• (2022) DOTA: detect and omit weak attentions for scalable transformer acceleration. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 14-26. DOI: 10.1145/3503222.3507738. Online publication date: 28-Feb-2022.
• (2022) ComPreEND: Computation Pruning through Predictive Early Negative Detection for ReLU in a Deep Neural Network Accelerator. IEEE Transactions on Computers 71:7, 1537-1550. DOI: 10.1109/TC.2021.3092205. Online publication date: 1-Jul-2022.
• (2022) Methods for Pruning Deep Neural Networks. IEEE Access 10, 63280-63300. DOI: 10.1109/ACCESS.2022.3182659. Online publication date: 2022.
