DOI: 10.1145/3218603.3218647 (research article)

NNest: Early-Stage Design Space Exploration Tool for Neural Network Inference Accelerators

Published: 23 July 2018
Abstract

Deep neural networks (DNNs) have achieved spectacular success in recent years. In response to their enormous computation demands and memory footprints, numerous inference accelerators have been proposed. However, the diversity of DNNs, at both the algorithm level and the parallelization level, makes it hard to arrive at a "one-size-fits-all" hardware design. In this paper, we develop NNest, an early-stage design space exploration tool that can speedily and accurately estimate the area, performance, and energy of DNN inference accelerators from high-level network topology and architecture traits, without the need for low-level RTL code. Equipped with a generalized spatial architecture framework, NNest can perform fast high-dimensional design space exploration across a wide spectrum of architectural and micro-architectural parameters. Our proposed novel data movement strategies and multi-layer fitting schemes allow NNest to more effectively exploit the parallelism inherent in DNNs. Results generated by NNest demonstrate: 1) previously undiscovered accelerator design points that outperform a state-of-the-art implementation by 39.3% in energy efficiency; 2) Pareto frontier curves that comprehensively and quantitatively reveal the multi-objective tradeoffs in custom DNN accelerators; 3) holistic design exploration across different levels of quantization, including the recently proposed binary neural network (BNN).
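
To make the exploration style concrete, below is a minimal, runnable Python sketch of the kind of multi-objective sweep and Pareto-frontier extraction the abstract describes. Everything in it is an assumption made for illustration: the workload, the design axes (PE array shape, buffer size, bit width), and the first-order cost model are invented for this sketch and are not NNest's actual estimators.

    # Illustrative only: a toy multi-objective design-space sweep in the spirit
    # of NNest. The cost model and all constants below are invented for this
    # sketch and are NOT the paper's estimators.
    from itertools import product

    # Hypothetical convolutional-layer workload: output H/W, in/out channels, kernel.
    OH, OW, IC, OC, K = 56, 56, 64, 128, 3
    MACS = OH * OW * IC * OC * K * K  # total multiply-accumulate operations

    def estimate(pe_rows, pe_cols, buf_kb, bits):
        """Toy first-order latency/energy/area estimates for one design point."""
        pes = pe_rows * pe_cols
        cycles = MACS / pes  # ideal compute-bound latency (no stalls modeled)
        # Assumed behavior: bigger buffers improve reuse and cut DRAM traffic;
        # lower precision shrinks both MAC energy and off-chip data volume.
        reuse = min(1.0, buf_kb / 256.0)
        dram = MACS * (1.0 - 0.9 * reuse) * bits / 16.0
        energy = MACS * 0.2 * (bits / 16.0) ** 2 + dram * 20.0  # arbitrary units
        area = pes * bits * 0.05 + buf_kb * 0.8                 # arbitrary units
        return cycles, energy, area

    def pareto(points, keys):
        """Keep design points not dominated on all objectives (lower is better)."""
        return [p for p in points
                if not any(all(q[k] <= p[k] for k in keys) and
                           any(q[k] < p[k] for k in keys) for q in points)]

    points = []
    for r, c, kb, b in product([8, 16, 32], [8, 16, 32], [64, 128, 256], [4, 8, 16]):
        cycles, energy, area = estimate(r, c, kb, b)
        points.append({"pe": (r, c), "buf_kb": kb, "bits": b,
                       "cycles": cycles, "energy": energy, "area": area})

    for p in pareto(points, ("cycles", "energy", "area")):
        print(p)

In NNest itself the sweep runs over a much richer architectural/micro-architectural space with calibrated analytical models; the sketch only shows the shape of the Pareto-style multi-objective exploration.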

      Published In

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design
July 2018, 327 pages
ISBN: 9781450357043
DOI: 10.1145/3218603
Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Accelerators
      2. Deep neural networks
      3. Design space exploration

Acceptance Rates

Overall acceptance rate: 398 of 1,159 submissions, 34%
