DOI: 10.1109/DAC18074.2021.9586250

NAAS: Neural Accelerator Architecture Search

Published: 05 December 2021

Abstract

Data-driven, automatic design space exploration of neural accelerator architectures is desirable for specialization and productivity. Previous frameworks focus on sizing the numerical architectural hyper-parameters while neglecting the search over PE connectivity and compiler mappings. To tackle this challenge, we propose Neural Accelerator Architecture Search (NAAS), which holistically searches the neural network architecture, the accelerator architecture, and the compiler mapping in one optimization loop. NAAS composes highly matched architectures together with efficient mappings. As a data-driven approach, NAAS rivals the human-designed Eyeriss, achieving a $4.4\times$ EDP reduction with a 2.7% accuracy improvement on ImageNet under the same computation resources, and offers a $1.4\times$ to $3.5\times$ EDP reduction over sizing the architectural hyper-parameters alone.
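The abstract describes a single optimization loop that jointly evaluates accelerator architecture choices (including PE connectivity) and compiler mappings, scoring each candidate by metrics such as EDP. The sketch below illustrates only the general shape of such a loop; the design spaces, parameter names, and toy cost model are illustrative assumptions, not the NAAS implementation, which in practice pairs a data-driven optimizer with real energy/latency models (e.g., Timeloop/Accelergy, references [26], [27]).

```python
# Minimal sketch of a joint accelerator/mapping search loop.
# All design spaces and the cost model below are hypothetical placeholders.
import random

# Hypothetical candidate spaces: array sizing, dataflow/connectivity, mapping.
PE_ARRAY_SIZES = [(8, 8), (16, 16), (32, 8)]
DATAFLOWS = ["row_stationary", "weight_stationary", "output_stationary"]
LOOP_ORDERS = ["NKCHW", "NCHWK", "KNCHW"]
TILE_SIZES = [16, 32, 64]

def sample_candidate():
    """Draw one joint (accelerator, mapping) candidate."""
    return {
        "pe_array": random.choice(PE_ARRAY_SIZES),
        "dataflow": random.choice(DATAFLOWS),
        "loop_order": random.choice(LOOP_ORDERS),
        "tile": random.choice(TILE_SIZES),
    }

def evaluate_edp(cand):
    """Placeholder energy-delay product estimate.
    A real flow would invoke an accelerator simulator here instead."""
    rows, cols = cand["pe_array"]
    pes = rows * cols
    # Toy model: more PEs cut latency but raise energy; mapping shifts both.
    latency = 1e6 / (pes * (1.2 if cand["dataflow"] == "row_stationary" else 1.0))
    energy = pes * cand["tile"] * (0.9 if cand["loop_order"] == "NKCHW" else 1.0)
    return latency * energy

def search(iterations=200):
    """One optimization loop over architecture and mapping jointly."""
    best, best_edp = None, float("inf")
    for _ in range(iterations):
        cand = sample_candidate()
        edp = evaluate_edp(cand)
        if edp < best_edp:
            best, best_edp = cand, edp
    return best, best_edp

if __name__ == "__main__":
    best, edp = search()
    print(f"best candidate: {best}, estimated EDP: {edp:.1f}")
```

Random sampling is used here only to keep the sketch short; a stronger search strategy (e.g., the evolution strategy of reference [17]) would replace `sample_candidate` while keeping the same evaluate-and-select structure.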

References

[1]
H. Cai, L. Zhu, and S. Han, “ProxylessNAS: Direct neural architecture search on target task and hardware,” in ICLR, 2019.
[2]
Q. Lu et al., “On neural architecture search for resource-constrained hardware platforms,” 2019.
[3]
K. Wang et al., “HAQ: Hardware-aware automated quantization with mixed precision,” in CVPR, 2019, pp. 8612–8620.
[4]
H. Cai et al., “Once for all: Train one network and specialize it for efficient deployment,” in ICLR, 2020.
[5]
T. Chen et al., “Learning to optimize tensor programs,” in NeurIPS, 2018, pp. 3389–3400.
[6]
S.-C. Kao and T. Krishna, “GAMMA: Mapping space exploration via genetic algorithm,” in ICCAD, 2020.
[7]
C. Hao et al., “FPGA/DNN co-design: An efficient design methodology for IoT intelligence on the edge,” in DAC, 2019, pp. 1–6.
[8]
Y. Li et al., “EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions,” in DAC, 2020.
[9]
X. Zhang et al., “SkyNet: A hardware-efficient method for object detection and tracking on embedded systems,” 2020.
[10]
S.-C. Kao et al., “ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning,” in MICRO, 2020.
[11]
L. Yang et al., “Co-exploration of neural architectures and heterogeneous ASIC accelerator designs targeting multiple tasks,” 2020.
[12]
Y. Lin et al., “Neural hardware architecture search,” in NeurIPS Workshop, 2019.
[13]
W. Jiang et al., “Hardware/software co-exploration of neural architectures,” TCAD, 2020.
[14]
Y.-H. Chen et al., “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” JSSC, vol. 52, no. 1, 2016.
[15]
NVIDIA, “NVDLA deep learning accelerator,” 2017. [Online]. Available: http://nvdla.org
[16]
J. Albericio et al., “Cnvlutin: Ineffectual-neuron-free deep neural network computing,” ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 1–13, 2016.
[17]
N. Hansen, “The CMA evolution strategy: a comparing review,” in Towards a new evolutionary computation. Springer, 2006, pp. 75–102.
[18]
Z. Du et al., “ShiDianNao: Shifting vision processing closer to the sensor,” in ISCA, 2015, pp. 92–104.
[19]
E. Strubell et al., “Energy and policy considerations for deep learning in NLP,” in ACL, 2019, pp. 3645–3650.
[20]
M. Motamedi et al., “Design space exploration of FPGA-based deep convolutional neural networks,” in ASP-DAC, 2016, pp. 575–580.
[21]
G. Zhong et al., “Design space exploration of FPGA-based accelerators with multi-level parallelism,” in DATE, 2017, pp. 1141–1146.
[22]
Y. Chen et al., “Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs,” in FPGA, 2019, pp. 73–82.
[23]
T. Chen et al., “TVM: An automated end-to-end optimizing compiler for deep learning,” in OSDI, 2018, pp. 578–594.
[24]
J. Mu et al., “A history-based auto-tuning framework for fast and high-performance DNN design on GPU,” in DAC, 2020, pp. 1–6.
[25]
Y. S. Shao et al., “The Aladdin approach to accelerator design and modeling,” IEEE Micro, vol. 35, no. 3, pp. 58–70, 2015.
[26]
Y. N. Wu and V. Sze, “Accelergy: An architecture-level energy estimation methodology for accelerator designs,” in ICCAD, 2019.
[27]
A. Parashar et al., “Timeloop: A systematic approach to DNN accelerator evaluation,” in ISPASS, 2019, pp. 304–315.
[28]
H. Kwon et al., “Understanding reuse, performance, and hardware cost of DNN dataflow: A data-centric approach,” in MICRO, 2019, pp. 754–768.
[29]
M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in ICML, 2019, pp. 6105–6114.
[30]
B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” in ICLR, 2017.
[31]
Z. Guo et al., “Single path one-shot neural architecture search with uniform sampling,” arXiv preprint, 2019.

Cited By

  • (2024) Mobile Foundation Model as Firmware. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pp. 279–295. https://doi.org/10.1145/3636534.3649361
  • (2023) Explainable-DSE: An Agile and Explainable Exploration of Efficient HW/SW Codesigns of Deep Learning Accelerators Using Bottleneck Analysis. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pp. 87–107. https://doi.org/10.1145/3623278.3624772
  • (2023) UNICO: Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 77–90. https://doi.org/10.1145/3613424.3614282
  • (2023) CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework. ACM Transactions on Embedded Computing Systems, vol. 22, no. 3, pp. 1–30. https://doi.org/10.1145/3575798
  • (2022) Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications. ACM Transactions on Design Automation of Electronic Systems, vol. 27, no. 3, pp. 1–50. https://doi.org/10.1145/3486618


Published In

2021 58th ACM/IEEE Design Automation Conference (DAC)
Dec 2021
1380 pages

Publisher

IEEE Press
