SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

Published: 24 June 2017
DOI: 10.1145/3079856.3080254


Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs, especially in mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants. This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and the zero-valued activations that arise from the common ReLU operator. Specifically, SCNN employs a novel dataflow that keeps the sparse weights and activations in a compressed encoding, eliminating unnecessary data transfers and reducing storage requirements. Furthermore, the SCNN dataflow efficiently delivers those weights and activations to a multiplier array, where they are extensively reused; product accumulation is performed in a novel accumulator array. On contemporary neural networks, SCNN improves performance by 2.7x and energy efficiency by 2.3x over a comparably provisioned dense CNN accelerator.
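The abstract names three ideas: zeros produced by ReLU and pruning, a compressed encoding that stores only nonzeros, and a multiplier array whose products are accumulated by output position. The Python sketch below illustrates those ideas on a toy problem. It is a hypothetical software model, not the SCNN hardware or the paper's exact dataflow: the zero-run-length format, all function names, and the single-channel, stride-1, "valid" convolution are assumptions made for illustration. The sparse routine multiplies every nonzero weight by every nonzero activation (a Cartesian product) and scatters each product into an accumulator at the output coordinate it contributes to; products whose coordinate falls outside the output are simply discarded here, which is a simplification.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def relu(m: Matrix) -> Matrix:
    """ReLU clamps negatives to zero, one source of activation sparsity."""
    return [[v if v > 0 else 0 for v in row] for row in m]

def compress(dense: List[float]) -> Tuple[List[float], List[int]]:
    """Toy zero-run-length encoding (assumed format): each nonzero value
    plus the number of zeros that precede it."""
    values: List[float] = []
    zero_runs: List[int] = []
    run = 0
    for v in dense:
        if v == 0:
            run += 1
        else:
            values.append(v)
            zero_runs.append(run)
            run = 0
    return values, zero_runs

def decompress(values: List[float], zero_runs: List[int], length: int) -> List[float]:
    """Inverse of compress(); trailing zeros are restored from the known length."""
    dense: List[float] = []
    for v, run in zip(values, zero_runs):
        dense.extend([0] * run)
        dense.append(v)
    dense.extend([0] * (length - len(dense)))
    return dense

def nonzeros(m: Matrix) -> List[Tuple[int, int, float]]:
    """(row, col, value) for every nonzero entry of a 2-D matrix."""
    return [(r, c, v) for r, row in enumerate(m) for c, v in enumerate(row) if v != 0]

def cartesian_conv2d(inp: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    """Valid, stride-1 2-D cross-correlation computed from nonzeros only.
    Returns the output and the number of multiplications performed."""
    out_h = len(inp) - len(w) + 1
    out_w = len(inp[0]) - len(w[0]) + 1
    acc = [[0] * out_w for _ in range(out_h)]  # accumulator indexed by output position
    nz_w, nz_a = nonzeros(w), nonzeros(inp)
    mults = 0
    for r, s, wv in nz_w:
        for x, y, av in nz_a:
            mults += 1
            i, j = x - r, y - s  # output coordinate this product belongs to
            if 0 <= i < out_h and 0 <= j < out_w:
                acc[i][j] += wv * av  # scatter-add; out-of-range products are dropped
    return acc, mults

def dense_conv2d(inp: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    """Dense reference: multiplies every weight by every covered activation, zeros included."""
    R, S = len(w), len(w[0])
    out_h, out_w = len(inp) - R + 1, len(inp[0]) - S + 1
    out = [[0] * out_w for _ in range(out_h)]
    mults = 0
    for i in range(out_h):
        for j in range(out_w):
            for r in range(R):
                for s in range(S):
                    out[i][j] += w[r][s] * inp[i + r][j + s]
                    mults += 1
    return out, mults

# Usage on a tiny, made-up example.
acts = relu([[1, -5, 2], [-1, -2, 3], [4, -3, 0]])
weights = [[1, 0], [0, -1]]  # stands in for a pruned 2x2 filter
flat = [v for row in acts for v in row]
vals, runs = compress(flat)
assert decompress(vals, runs, len(flat)) == flat
sparse_out, sparse_mults = cartesian_conv2d(acts, weights)
dense_out, dense_mults = dense_conv2d(acts, weights)
print(sparse_out, sparse_mults)  # [[1, -3], [0, 0]] 8
print(dense_out, dense_mults)    # [[1, -3], [0, 0]] 16
```

Both routines produce the same output, but the sparse one performs 8 multiplications instead of 16 because zero weights and zero activations never reach the multiply. In this tiny example most of those 8 products land outside the 2x2 output and are discarded; on larger inputs that fraction shrinks, since only activations near the edges map to out-of-range coordinates.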


Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing. In Proceedings of the International Symposium on Computer Architecture (ISCA). 1--13.
Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder. 2016. Fused-Layer CNN Accelerators. In Proceedings of the International Symposium on Microarchitecture (MICRO).
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, and Zhenyao Zhu. 2015. Deep Speech 2: End-To-End Speech Recognition in English and Mandarin. https://arxiv.org/abs/1512.02595. (2015).
Caffe 2016. Caffe. http://caffe.berkeleyvision.org. (2016).
Caffe 2017. Caffe Model Zoo. https://github.com/BVLC/caffe/wiki/Model-Zoo. (2017).
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operation Systems (ASPLOS). 269--284.
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. In Proceedings of the International Symposium on Computer Architecture (ISCA). 367--379.
Yu-Hsin Chen, Tushar Krishna, Joel Emer, and Vivienne Sze. 2016. Eyeriss: An Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. In Proceedings of the International Solid State Circuits Conference (ISSCC).
Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (Almost) From Scratch. https://arxiv.org/abs/1103.0398. (2011).
Jason Cong and Bingjun Xiao. 2014. Minimizing Computation in Convolutional Neural Networks. In Proceedings of the International Conference on Artificial Neural Networks (ICANN). 281--290.
Gregory Diamos, Shubho Sengupta, Bryan Catanzaro, Mike Chrzanowski, Adam Coates, Erich Elsen, Jesse Engel, Awni Hannun, and Sanjeev Satheesh. 2016. Persistent RNNs: Stashing Recurrent Weights On-Chip. In Proceedings of the International Conference on Machine Learning (ICML).
Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting Vision Processing Closer to the Sensor. In Proceedings of the International Symposium on Computer Architecture (ISCA). 92--104.
Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operation Systems (ASPLOS). 751--764.
Alex Graves and Jurgen Schmidhuber. 2005. Framewise Phoneme Classification With Bidirectional LSTM and Other Neural Network Architectures. In Neural Networks.
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, and Bill Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Proceedings of the International Symposium on Computer Architecture (ISCA). 243--254.
Song Han, Huizi Mao, and William J. Dally. 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. https://arxiv.org/abs/1510.00149. (2015).
Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS). 1135--1143.
Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. 2014. Deep Speech: Scaling Up End-To-End Speech Recognition. https://arxiv.org/abs/1412.5567. (2014).
Kaming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385. (2015).
Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Weinberger. 2016. Deep Networks with Stochastic Depth. https://arxiv.org/abs/1603.09382. (2016).
ImageNet. 2016. http://image-net.org. (2016).
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS).
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep Learning. Nature 521 (May 2015), 436--444.
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR).
Grant Martin and Gary Smith. 2009. High-Level Synthesis: Past, Present, and Future. IEEE Design & Test of Computers 26, 4 (July/August 2009), 18--25.
Mentor 2017. Catapult High-Level Synthesis. https://www.mentor.com/hls-lp/catapult-high-level-synthesis. (2017).
NVIDIA 2016. NVIDIA cuDNN. https://developer.nvidia.com/cudnn. (2016).
Brandon Reagen, Paul Whatmough, Robert Adolf, Saketh Rama, Hyunkwang Lee, Saekyu Lee, Jose Miguel Hernandez Lobato, Gu-Yeon Wei, and David Brooks. 2016. Minerva: Enabling Low-Power, High-Accuracy Deep Neural Network Accelerators. In Proceedings of the International Symposium on Computer Architecture (ISCA). 267--278.
Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, and Stephen W. Keckler. 2016. vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design. In Proceedings of the International Symposium on Microarchitecture (MICRO).
Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. https://arxiv.org/abs/1409.1556. (May 2015).
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR).
Ganesh Venkatesh, Eriko Nurvitadhi, and Debbie Marr. 2016. Accelerating Deep Convolutional Networks Using Low-precision and Sparsity. https://arxiv.org/abs/1610.00324. (2016).
Richard W. Vuduc. 2003. Automatic Performance Tuning of Sparse Matrix Kernels. Ph.D. Dissertation. University of California, Berkeley.
Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In Proceedings of the International Symposium on Microarchitecture (MICRO).


Published In

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017
736 pages



Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. Convolutional neural networks
2. Accelerator architecture

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ISCA '17

Acceptance Rates

ISCA '17 paper acceptance rate: 54 of 322 submissions, 17%
Overall acceptance rate: 543 of 3,203 submissions, 17%


