Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks

Published: 18 June 2016

Abstract

Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly-parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still remains high as data movement can be more expensive than computation. Accordingly, finding a dataflow that supports parallel processing with minimal data movement cost is crucial to achieving energy-efficient CNN processing without compromising accuracy.
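
As a rough, illustrative aside (not taken from the paper), the plain-Python loop nest below spells out one convolutional layer using the common dimension names N (batch), M (filters/output channels), C (input channels), H x W (output feature map), and R x S (filter size); the comments mark how often each weight, input activation, and partial sum is touched, which is exactly the reuse a dataflow tries to keep local. All sizes are made-up toy values.

```python
# Illustrative sketch only: the naive loop nest of a single CONV layer,
# written out to show where data reuse (and hence data movement) arises.
import numpy as np

N, M, C, H, W, R, S = 2, 4, 3, 8, 8, 3, 3             # toy layer shape
ifmap = np.random.rand(N, C, H + R - 1, W + S - 1)    # input activations
filt  = np.random.rand(M, C, R, S)                    # filter weights
ofmap = np.zeros((N, M, H, W))                        # outputs / partial sums

for n in range(N):                 # each image in the batch
    for m in range(M):             # each filter -> one output channel
        for c in range(C):         # each input channel
            for y in range(H):
                for x in range(W):
                    for i in range(R):
                        for j in range(S):
                            # Each weight filt[m, c, i, j] is reused N*H*W times,
                            # each input pixel is reused up to M*R*S times, and
                            # each output accumulates C*R*S partial sums --
                            # reuse that a good dataflow keeps near the PEs.
                            ofmap[n, m, y, x] += (
                                ifmap[n, c, y + i, x + j] * filt[m, c, i, j]
                            )
```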
In this paper, we present a novel dataflow, called row-stationary (RS), that minimizes data movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature map pixels, i.e., activations, in the high-dimensional convolutions, and minimizing data movement of partial sum accumulations. Unlike dataflows used in existing designs, which only reduce certain types of data movement, the proposed RS dataflow can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism. To evaluate the energy efficiency of the different dataflows, we propose an analysis framework that compares energy cost under the same hardware area and processing parallelism constraints. Experiments using the CNN configurations of AlexNet show that the proposed RS dataflow is more energy efficient than existing dataflows in both convolutional (1.4× to 2.5×) and fully-connected layers (at least 1.3× for batch size larger than 16). The RS dataflow has also been demonstrated on a fabricated chip, which verifies our energy analysis.
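
For intuition about the comparison methodology, the sketch below models a dataflow's energy in the spirit described above: data accesses at each storage level, weighted by a normalized per-access energy cost, plus the MAC operations themselves. The level names, cost ratios, and access counts are hypothetical placeholders, not the calibrated values of the paper's analysis framework.

```python
# Hypothetical normalized energy costs per access, relative to one MAC.
# These ratios are illustrative only; the paper derives its own calibrated numbers.
NORMALIZED_COST = {
    "DRAM": 200.0,
    "global_buffer": 6.0,
    "inter_PE": 2.0,
    "PE_register": 1.0,
}
MAC_COST = 1.0

def dataflow_energy(access_counts, num_macs):
    """Normalized energy: MACs plus data accesses weighted by per-level cost."""
    energy = num_macs * MAC_COST
    for level, count in access_counts.items():
        energy += count * NORMALIZED_COST[level]
    return energy

# Toy comparison of two made-up access profiles for the same layer: one keeps
# most reuse in PE-local registers, the other spills it to the global buffer.
macs = 10_000_000
local_reuse  = {"DRAM": 50_000, "global_buffer": 400_000,
                "inter_PE": 2_000_000, "PE_register": 30_000_000}
buffer_heavy = {"DRAM": 50_000, "global_buffer": 15_000_000,
                "inter_PE": 0, "PE_register": 10_000_000}

print(dataflow_energy(local_reuse, macs))   # lower: reuse captured near the PEs
print(dataflow_energy(buffer_heavy, macs))  # higher: same traffic hits costlier storage
```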

      Published In

      ACM SIGARCH Computer Architecture News, Volume 44, Issue 3 (ISCA'16), June 2016, 730 pages
      ISSN: 0163-5964
      DOI: 10.1145/3007787
      • ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture, June 2016, 756 pages, ISBN: 9781467389471

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 June 2016
      Published in SIGARCH Volume 44, Issue 3


      Cited By

      • Leveraging Bit-Serial Architectures for Hardware-Oriented Deep Learning Accelerators with Column-Buffering Dataflow. Electronics, 13(7):1217, 2024. DOI: 10.3390/electronics13071217
      • Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering. Electronics, 13(5):975, 2024. DOI: 10.3390/electronics13050975
      • Design of a Generic Dynamically Reconfigurable Convolutional Neural Network Accelerator with Optimal Balance. Electronics, 13(4):761, 2024. DOI: 10.3390/electronics13040761
      • Stable Low-Rank CP Decomposition for Compression of Convolutional Neural Networks Based on Sensitivity. Applied Sciences, 14(4):1491, 2024. DOI: 10.3390/app14041491
      • Neural network methods for radiation detectors and imaging. Frontiers in Physics, 12, 2024. DOI: 10.3389/fphy.2024.1334298
      • ViTA: A Highly Efficient Dataflow and Architecture for Vision Transformers. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6, 2024. DOI: 10.23919/DATE58400.2024.10546565
      • Scratchpad Memory Management for Deep Learning Accelerators. Proceedings of the 53rd International Conference on Parallel Processing, pp. 629-639, 2024. DOI: 10.1145/3673038.3673115
      • Aries: A DNN Inference Scheduling Framework for Multi-core Accelerators. Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things, pp. 186-191, 2024. DOI: 10.1145/3670105.3670136
      • gem5-NVDLA: A Simulation Framework for Compiling, Scheduling and Architecture Evaluation on AI System-on-Chips. ACM Transactions on Design Automation of Electronic Systems, 2024. DOI: 10.1145/3661997
      • A Review on the emerging technology of TinyML. ACM Computing Surveys, 2024. DOI: 10.1145/3661820
