research-article

ShiDianNao: shifting vision processing closer to the sensor

Authors:

Robert Fasthuber,

Olivier Temam

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

Pages 92 - 104

https://doi.org/10.1145/2749469.2750389

Published: 13 June 2015

Abstract

In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications.

Still, both the energy efficiency and the performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The state-of-the-art neural networks for these applications are Convolutional Neural Networks (CNNs), and they have an important property: weights are shared among many neurons, considerably reducing the memory footprint of the network. This property allows an entire CNN to be mapped within an SRAM, eliminating all DRAM accesses for weights. By further placing this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., those for inputs and outputs.
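The effect of weight sharing on memory footprint can be seen with a little arithmetic. The Python sketch below compares the weight count of a hypothetical convolutional layer (one 32×32 input map, six 5×5 kernels, stride 1, no padding, giving 28×28 output maps) against two unshared alternatives: a locally connected layer in which every output neuron owns a private 5×5 window, and a fully connected layer. The layer shape and the 16-bit weight width are illustrative assumptions, not parameters taken from the paper.

```python
# Illustrative arithmetic: how weight sharing shrinks a layer's weight storage.
# The layer shape and the 16-bit weight width are assumptions for illustration.

BYTES_PER_WEIGHT = 2  # assumed 16-bit weights (hypothetical choice)


def out_size(in_size: int, k: int, stride: int = 1) -> int:
    """Output width/height of a valid (unpadded) convolution."""
    return (in_size - k) // stride + 1


def conv_weights(in_maps: int, out_maps: int, k: int) -> int:
    """Shared weights: one k x k kernel per (input map, output map) pair."""
    return in_maps * out_maps * k * k


def locally_connected_weights(in_maps: int, out_maps: int, k: int,
                              out_h: int, out_w: int) -> int:
    """No sharing: every output neuron owns a private k x k window per input map."""
    return conv_weights(in_maps, out_maps, k) * out_h * out_w


def fully_connected_weights(in_maps: int, in_h: int, in_w: int,
                            out_maps: int, out_h: int, out_w: int) -> int:
    """No sharing, no locality: every output neuron sees every input neuron."""
    return (in_maps * in_h * in_w) * (out_maps * out_h * out_w)


if __name__ == "__main__":
    in_maps, in_h, in_w, out_maps, k = 1, 32, 32, 6, 5  # hypothetical layer
    oh, ow = out_size(in_h, k), out_size(in_w, k)
    counts = {
        "convolutional (shared)": conv_weights(in_maps, out_maps, k),
        "locally connected": locally_connected_weights(in_maps, out_maps, k, oh, ow),
        "fully connected": fully_connected_weights(in_maps, in_h, in_w, out_maps, oh, ow),
    }
    for name, n in counts.items():
        print(f"{name:>24}: {n:>10,} weights = {n * BYTES_PER_WEIGHT:>11,} bytes")
```

For this hypothetical layer, the shared-weight version needs 150 weights (300 bytes at the assumed width), versus 117,600 for the locally connected variant and 4,816,896 for the fully connected one. Savings of this order are what make it plausible to hold all of a CNN's weights in SRAM.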

In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses, combined with careful exploitation of the specific data access patterns within CNNs, allows us to design an accelerator that is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm² and a power consumption of only 320 mW, yet still about 30× faster than high-end GPUs.
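One data access pattern of this kind is that neighboring output neurons read overlapping input windows, so a naive design that fetches every window from memory re-reads the same input neurons many times. The Python sketch below is a generic illustration of that overlap, not the specific ShiDianNao dataflow: it counts total versus distinct input reads for one input map. The 32×32 map and 5×5 kernel are hypothetical sizes chosen for the example.

```python
# Illustrative count of input reuse in a 2D convolution (one input map).
# The map and kernel sizes used below are hypothetical examples.


def input_reads(in_h: int, in_w: int, k: int, stride: int = 1):
    """Return (total_reads, distinct_reads) for a valid k x k convolution.

    total_reads: fetches issued if every window is read from memory anew.
    distinct_reads: number of unique input neurons actually touched.
    """
    out_h = (in_h - k) // stride + 1
    out_w = (in_w - k) // stride + 1
    total = 0
    touched = set()
    for oy in range(out_h):
        for ox in range(out_w):
            # Each output neuron reads a k x k window anchored at (oy*stride, ox*stride).
            for ky in range(k):
                for kx in range(k):
                    total += 1
                    touched.add((oy * stride + ky, ox * stride + kx))
    return total, len(touched)


if __name__ == "__main__":
    total, distinct = input_reads(32, 32, 5)
    print(f"total reads: {total}, distinct inputs: {distinct}, "
          f"average reuse per input: {total / distinct:.1f}x")
```

With stride 1, the 28×28 output map issues 19,600 window reads but touches only 1,024 distinct inputs, roughly 19 reads per input on average. Keeping inputs close to the compute units instead of refetching them is one general way to remove most of this traffic.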



Index Terms

Hardware

Information

Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

June 2015

768 pages

ISBN:9781450334020

DOI:10.1145/2749469

General Chair: Debbie Marr, Intel
Program Chair: David Albonesi, Cornell

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S
ISCA'15
June 2015
745 pages
ISSN:0163-5964
DOI:10.1145/2872887
Editor: Doug DeGroot

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2015

Qualifiers

Research-article

Conference

ISCA '15

Sponsor:

IEEE TCCA
SIGARCH

ISCA '15: The 42nd Annual International Symposium on Computer Architecture

June 13 - 17, 2015

Portland, Oregon

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 794
Total Downloads: 5,189
Downloads (last 12 months): 346
Downloads (last 6 weeks): 27

Reflects downloads up to 04 Oct 2024

Cited By

Taheri N, Tabrizchi S, Roohi A (2024). Intermittent-Aware Design Exploration of Systolic Array Using Various Non-Volatile Memory: A Comparative Study. Micromachines, 15:3 (343). DOI: 10.3390/mi15030343. Online publication date: 29-Feb-2024.
https://doi.org/10.3390/mi15030343
Mahmoud K, Nicolici N (2024). ALPRI-FI: A Framework for Early Assessment of Hardware Fault Resiliency of DNN Accelerators. Electronics, 13:16 (3243). DOI: 10.3390/electronics13163243. Online publication date: 15-Aug-2024.
https://doi.org/10.3390/electronics13163243
Vezzoli M, Nel L, Bhardwaj K, Manohar R, Gokhale M (2024). Designing an Energy-Efficient Fully-Asynchronous Deep Learning Convolution Engine. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), (1-2). DOI: 10.23919/DATE58400.2024.10546579. Online publication date: 25-Mar-2024.
https://doi.org/10.23919/DATE58400.2024.10546579
Garbay T, Hachicha K, Dobias P, Pinna A, Hocine K, Dron W, Lusich P, Khalis I, Granado B (2024). ZIP-CNN: Design Space Exploration for CNN Implementation within a MCU. ACM Transactions on Embedded Computing Systems, 24:1 (1-26). DOI: 10.1145/3691343. Online publication date: 4-Sep-2024.
https://dl.acm.org/doi/10.1145/3691343
Weerasena H, Mishra P (2024). Revealing CNN Architectures via Side-Channel Analysis in Dataflow-based Inference Accelerators. ACM Transactions on Embedded Computing Systems, 23:6 (1-25). DOI: 10.1145/3688001. Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3688001
Zhuang J, Lau J, Ye H, Yang Z, Ji S, Lo J, Denolf K, Neuendorffer S, Jones A, Hu J, Shi Y, Chen D, Cong J, Zhou P (2024). CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture. ACM Transactions on Reconfigurable Technology and Systems, 17:3 (1-31). DOI: 10.1145/3686163. Online publication date: 5-Aug-2024.
https://dl.acm.org/doi/10.1145/3686163
Wei X, Wang C, Yue H, Tan J, Guan Z, Jiang N, Zheng X, Zhao J, Qiu M (2024). ReIPE: Recycling Idle PEs in CNN Accelerator for Vulnerable Filters Soft-Error Detection. ACM Transactions on Architecture and Code Optimization, 21:3 (1-26). DOI: 10.1145/3674909. Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3674909
Zouzoula S, Maleki M, Azhar M, Trancoso P (2024). Scratchpad Memory Management for Deep Learning Accelerators. Proceedings of the 53rd International Conference on Parallel Processing, (629-639). DOI: 10.1145/3673038.3673115. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3673038.3673115
Xiang Y, Wu Z, Yao H, Xiong X, Yang F (2024). Aries: A DNN Inference Scheduling Framework for Multi-core Accelerators. Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things, (186-191). DOI: 10.1145/3670105.3670136. Online publication date: 24-May-2024.
https://dl.acm.org/doi/10.1145/3670105.3670136
Lee C, Yeh T (2024). ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors. ACM Transactions on Architecture and Code Optimization, 21:3 (1-24). DOI: 10.1145/3653363. Online publication date: 21-Mar-2024.
https://dl.acm.org/doi/10.1145/3653363
