DOI: 10.1145/3079856.3080254

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

Published: 24 June 2017

Abstract

Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs, especially in mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants. This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and zero-valued activations that arise from the common ReLU operator. Specifically, SCNN employs a novel dataflow that enables maintaining the sparse weights and activations in a compressed encoding, which eliminates unnecessary data transfers and reduces storage requirements. Furthermore, the SCNN dataflow facilitates efficient delivery of those weights and activations to a multiplier array, where they are extensively reused; product accumulation is performed in a novel accumulator array. On contemporary neural networks, SCNN can improve both performance and energy by a factor of 2.7x and 2.3x, respectively, over a comparably provisioned dense CNN accelerator.
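
As a rough illustration of the principle the abstract describes, the Python sketch below keeps only the nonzero weights and activations in a compressed (value, coordinate) form, multiplies every nonzero activation by every nonzero weight, and scatter-accumulates each product into its output coordinate. This is a minimal functional sketch under assumed simplifications (a single channel, a "valid" convolution, hypothetical function names), not the paper's actual dataflow or hardware organization.

    import numpy as np

    def compress(dense):
        """Return (values, coordinates) for the nonzero entries of a 2D array."""
        coords = np.argwhere(dense != 0.0)
        values = dense[dense != 0.0]
        return values, coords

    def sparse_conv2d(activations, weights):
        """Single-channel 'valid' convolution computed only over nonzero operands."""
        H, W = activations.shape
        R, S = weights.shape
        out = np.zeros((H - R + 1, W - S + 1))     # plays the role of the accumulator array
        a_vals, a_coords = compress(activations)   # compressed activations (ReLU zeros skipped)
        w_vals, w_coords = compress(weights)        # compressed (pruned) weights
        for a, (y, x) in zip(a_vals, a_coords):     # all-pairs products of nonzeros
            for w, (r, s) in zip(w_vals, w_coords):
                oy, ox = y - r, x - s               # output coordinate this product contributes to
                if 0 <= oy < out.shape[0] and 0 <= ox < out.shape[1]:
                    out[oy, ox] += a * w            # accumulate the product
        return out

Because zero operands never enter the loops, the number of multiplications scales with the counts of nonzero weights and activations rather than with the dense tensor sizes, which is the source of the performance and energy savings the abstract reports.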

      Published In

      ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
      June 2017
      736 pages
      ISBN:9781450348928
      DOI:10.1145/3079856
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 24 June 2017

      Author Tags

      1. Convolutional neural networks
      2. accelerator architecture

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ISCA '17

      Acceptance Rates

      ISCA '17 Paper Acceptance Rate 54 of 322 submissions, 17%;
      Overall Acceptance Rate 543 of 3,203 submissions, 17%
