A fine-grained mixed precision DNN accelerator using a two-stage big–little core RISC-V MCU

Published: 01 January 2023

Abstract

Deep neural networks (DNNs) are widely used in modern AI systems, and dedicated accelerators have become a promising option for edge scenarios thanks to their energy efficiency and high performance. Since DNN models demand significant storage and computation resources, various energy-efficient DNN accelerators and algorithms have been proposed for edge devices. In particular, many quantization algorithms for efficient DNN training quantize the weights in DNN layers to small or zero values, so that they require far fewer effective bits, i.e., low-precision bits in both arithmetic and storage units. Such sparsity can be exploited to remove non-effective bits and reduce design cost, but at the price of accuracy degradation, since some key operations still demand higher precision. In this paper, we therefore propose a universal DNN accelerator architecture that supports mixed-precision arithmetic operations simultaneously. A big–little core controller based on RISC-V effectively controls the datapath and assigns arithmetic operations to full-precision and low-precision processing units, respectively. Experimental results show that the proposed design saves 16% chip area and 45.2% DRAM accesses compared with the state-of-the-art design.
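To illustrate the dispatch idea in the abstract, the following Python sketch partitions weights by their effective bit-width, sending small-magnitude weights to a low-precision path and the rest to a full-precision path. This is an illustrative sketch only, not the paper's implementation; the fixed-point scale (`frac_bits`) and the split `threshold` are assumed values for the example.

```python
def effective_bits(w, frac_bits=7):
    """Bits needed to represent |w| in fixed point with `frac_bits`
    fractional bits (0 for a zero weight)."""
    q = int(round(abs(w) * (1 << frac_bits)))
    return q.bit_length()

def dispatch(weights, threshold=4):
    """Split weights into a low-precision group (few effective bits)
    and a full-precision group, mirroring the big-little assignment
    of operations to low- and full-precision processing units."""
    low, full = [], []
    for w in weights:
        (low if effective_bits(w) <= threshold else full).append(w)
    return low, full

# With sparse, mostly-small weights, most work lands on the
# cheaper low-precision path.
low, full = dispatch([0.0, 0.01, 0.5, -0.9, 0.02, -0.03])
```

Here `dispatch` returns `([0.0, 0.01, 0.02, -0.03], [0.5, -0.9])`: four of six weights fit in four effective bits or fewer, which is the kind of skew a mixed-precision datapath exploits.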


Highlights

A mixed-precision structure that processes weights of different effective bit-widths separately.
An MCU with two big–little RISC-V cores to control the accelerator and peripherals.
A fine-grained control flow that manages the queuing operations with low hardware overhead.


Cited By

(2024) Scratchpad Memory Management for Deep Learning Accelerators. Proceedings of the 53rd International Conference on Parallel Processing, pp. 629–639. https://doi.org/10.1145/3673038.3673115


        Published In

Integration, the VLSI Journal, Volume 88, Issue C
        Jan 2023
        410 pages

        Publisher

        Elsevier Science Publishers B. V.

        Netherlands


        Author Tags

        1. Deep neural network
        2. RISC-V
        3. Mixed-precision
        4. Multi-core

        Qualifiers

        • Research-article
