DOI: 10.1145/3205289.3205295

ComPEND: Computation Pruning through Early Negative Detection for ReLU in a Deep Neural Network Accelerator

Published: 12 June 2018

Abstract

Negative inputs to ReLU are simply discarded, yet a deep neural network spends a great deal of computation producing them. We propose a computation pruning technique that detects, at an early stage, that the result of a sum of products will be negative, by adopting an inverted two's complement representation for weights and a bit-serial sum of products. The technique can therefore skip a large amount of computation destined for negative results and simply set the corresponding ReLU outputs to zero. Moreover, we devise a DNN accelerator architecture that applies the proposed technique efficiently. The evaluation shows that an accelerator using computation pruning through early negative detection significantly improves both energy efficiency and performance.
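
The general idea can be illustrated with a minimal Python sketch: weights are consumed bit-serially from the most significant bit down, and because the activations feeding a ReLU layer are non-negative, an upper bound on the contribution of the remaining bit planes tells us when the final sum of products is guaranteed to be negative, so the remaining cycles can be skipped and the ReLU output set to zero. This is an assumption-laden illustration of early negative detection, not the paper's exact inverted two's-complement datapath; the function name, the bound, and the example values are hypothetical.

```python
# Minimal sketch of a bit-serial sum of products with early negative detection.
# Weights are signed two's-complement integers; activations are non-negative
# (they are ReLU outputs of the previous layer).

def relu_dot_product_early_negative(weights, activations, bits=8):
    """Return (relu_output, bit_cycles_executed) for one output neuron."""
    assert all(a >= 0 for a in activations), "activations must be non-negative"
    act_sum = sum(activations)          # used to bound the remaining contribution

    partial = 0
    for k in range(bits - 1, -1, -1):   # MSB-first, one bit plane per cycle
        # Column sum: activations whose weight has bit k set.
        col = sum(a for w, a in zip(weights, activations) if (w >> k) & 1)
        # Bit (bits-1) is the two's-complement sign bit, weighted -2^(bits-1).
        bit_weight = -(1 << k) if k == bits - 1 else (1 << k)
        partial += bit_weight * col

        # Bit planes k-1 .. 0 can add at most (2^k - 1) * act_sum to the sum.
        if partial + ((1 << k) - 1) * act_sum < 0:
            # The final result is provably negative: ReLU clamps it to zero,
            # so the remaining bit-serial cycles are pruned.
            return 0, bits - k

    return max(partial, 0), bits


# Example: the true sum of products is -48, so the loop stops after the
# first bit plane instead of running all four cycles.
print(relu_dot_product_early_negative([-4, 1, -2], [10, 2, 5], bits=4))  # (0, 1)
```
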

References

[1]
Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic deep neural network computing. In Proceedings of the International Symposium on Microarchitecture. ACM, 382--394.
[2]
Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: ineffectual-neuron-free deep neural network computing. In Proceedings of the International Symposium on Computer Architecture. IEEE, 1--13.
[3]
Dmytro Apalkov, Alexey Khvalkovskiy, Steven Watts, Vladimir Nikitin, Xueti Tang, Daniel Lottis, Kiseok Moon, Xiao Luo, Eugene Chen, Adrian Ong, et al. 2013. Spin-transfer torque magnetic random access memory (STT-MRAM). ACM Journal on Emerging Technologies in Computing Systems (JETC) 9, 2 (2013), 13.
[4]
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ACM Sigplan Notices 49, 4 (2014), 269--284.
[5]
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. DaDianNao: A machine-learning supercomputer. In Proceedings of the International Symposium on Microarchitecture. IEEE, 609--622.
[6]
Yu-Hsin Chen, Tushar Krishna, Joel S Emer, and Vivienne Sze. 2017. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits 52, 1 (2017), 127--138.
[7]
Xiangyu Dong, Cong Xu, Yuan Xie, and Norman P Jouppi. 2012. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31, 7 (2012), 994--1007.
[8]
A Driskill-Smith, D Apalkov, V Nikitin, X Tang, S Watts, D Lottis, K Moon, A Khvalkovskiy, R Kawakami, X Luo, et al. 2011. Latest advances and roadmap for in-plane and perpendicular STT-RAM. In Proceedings of the International Memory Workshop. IEEE, 1--3.
[9]
Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and efficient neural network acceleration with 3D memory. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 751--764.
[10]
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, 580--587.
[11]
Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82--97.
[12]
Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial deep neural network computing. In Proceedings of the International Symposium on Microarchitecture. IEEE, 1--12.
[13]
Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. 2017. ZeNA: Zero-Aware Neural Network Accelerator. IEEE Design & Test (2017).
[14]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.
[15]
Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P Jouppi. 2009. CACTI 6.0: A tool to model large caches. HP Laboratories (2009), 22--31.
[16]
Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. 2011. DRAMSim2: A cycle accurate memory system simulator. IEEE Computer Architecture Letters 10, 1 (2011), 16--19.
[17]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211--252.
[18]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[19]
Guangyu Sun, Xiangyu Dong, Yuan Xie, Jian Li, and Yiran Chen. 2009. A novel architecture of the 3D stacked MRAM L2 cache for CMPs. In Proceedings of the International Symposium on High Performance Computer Architecture. IEEE, 239--249.
[20]
Andrea Vedaldi and Karel Lenc. 2015. Matconvnet: Convolutional neural networks for matlab. In Proceedings of the ACM International Conference on Multimedia. ACM, 689--692.




      Published In

      ICS '18: Proceedings of the 2018 International Conference on Supercomputing
      June 2018
      407 pages
      ISBN:9781450357838
      DOI:10.1145/3205289
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 June 2018


      Author Tags

      1. Early negative detection
      2. ReLU
      3. accelerator
      4. deep neural network

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICS '18

      Acceptance Rates

      Overall Acceptance Rate 629 of 2,180 submissions, 29%




      Cited By

• (2024) ECHO: Energy-Efficient Computation Harnessing Online Arithmetic—An MSDF-Based Accelerator for DNN Inference. Electronics 13:10 (1893). DOI: 10.3390/electronics13101893. Online publication date: 11-May-2024.
• (2024) Neural Network Acceleration Using Digit-Plane Computation with Early Termination. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1-4. DOI: 10.1109/ISCAS58744.2024.10558491. Online publication date: 19-May-2024.
• (2023) BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks. ACM Transactions on Embedded Computing Systems 22:5s, 1-24. DOI: 10.1145/3609093. Online publication date: 31-Oct-2023.
• (2023) Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation. IEEE Transactions on Parallel and Distributed Systems 34:12, 3253-3265. DOI: 10.1109/TPDS.2023.3324934. Online publication date: Dec-2023.
• (2023) A Speculative Computation Approach for Energy-Efficient Deep Neural Network. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:3, 795-806. DOI: 10.1109/TCAD.2022.3183561. Online publication date: Mar-2023.
• (2023) DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator. 2023 26th Euromicro Conference on Digital System Design (DSD), 686-692. DOI: 10.1109/DSD60849.2023.00098. Online publication date: 6-Sep-2023.
• (2022) LRP: Predictive output activation based on SVD approach for CNNs acceleration. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 831-836. DOI: 10.23919/DATE54114.2022.9774744. Online publication date: 14-Mar-2022.
• (2022) DOTA: detect and omit weak attentions for scalable transformer acceleration. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 14-26. DOI: 10.1145/3503222.3507738. Online publication date: 28-Feb-2022.
• (2022) ComPreEND: Computation Pruning through Predictive Early Negative Detection for ReLU in a Deep Neural Network Accelerator. IEEE Transactions on Computers 71:7, 1537-1550. DOI: 10.1109/TC.2021.3092205. Online publication date: 1-Jul-2022.
• (2022) Methods for Pruning Deep Neural Networks. IEEE Access 10, 63280-63300. DOI: 10.1109/ACCESS.2022.3182659. Online publication date: 2022.
