A fine-grained mixed precision DNN accelerator using a two-stage big–little core RISC-V MCU

Published: 01 January 2023

Abstract

Deep neural networks (DNNs) are widely used in modern AI systems, and dedicated accelerators have become a promising option for edge scenarios thanks to their energy efficiency and high performance. Since DNN models demand significant storage and computation resources, various energy-efficient DNN accelerators and algorithms have been proposed for edge devices. In particular, many quantization algorithms for efficient DNN training quantize the weights in DNN layers to small or zero values, so that they require far fewer effective bits, i.e., low-precision bits in both arithmetic and storage units. Such sparsity can be exploited to remove non-effective bits and reduce design cost, but at the price of accuracy degradation, since some key operations still demand higher precision. In this paper, we therefore propose a universal DNN accelerator architecture that supports mixed-precision arithmetic operations simultaneously. A big–little core controller based on RISC-V effectively controls the datapath and assigns arithmetic operations to full-precision and low-precision processing units, respectively. Experimental results show that the proposed design saves 16% chip area and 45.2% DRAM accesses compared with the state-of-the-art design.
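To illustrate the dispatch idea in the abstract, the following Python sketch partitions weights by their effective bit-width, sending small-magnitude weights to a low-precision path and the rest to a full-precision path. This is an illustrative sketch only, not the paper's implementation; the fixed-point scale (`frac_bits`) and the split `threshold` are assumed values for the example.

```python
def effective_bits(w, frac_bits=7):
    """Bits needed to represent |w| in fixed point with `frac_bits`
    fractional bits (0 for a zero weight)."""
    q = int(round(abs(w) * (1 << frac_bits)))
    return q.bit_length()

def dispatch(weights, threshold=4):
    """Split weights into a low-precision group (few effective bits)
    and a full-precision group, mirroring the big-little assignment
    of operations to low- and full-precision processing units."""
    low, full = [], []
    for w in weights:
        (low if effective_bits(w) <= threshold else full).append(w)
    return low, full

# With sparse, mostly-small weights, most work lands on the
# cheaper low-precision path.
low, full = dispatch([0.0, 0.01, 0.5, -0.9, 0.02, -0.03])
```

Here `dispatch` returns `([0.0, 0.01, 0.02, -0.03], [0.5, -0.9])`: four of six weights fit in four effective bits or fewer, which is the kind of skew a mixed-precision datapath exploits.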


Highlights

A mixed-precision structure that processes weights of different effective bit-widths separately.
An MCU with two big–little RISC-V cores to control the accelerator and peripherals.
A fine-grained control flow that manages the queuing operations with low hardware overhead.


Cited By

(2024) Scratchpad Memory Management for Deep Learning Accelerators. Proceedings of the 53rd International Conference on Parallel Processing, pp. 629–639. https://doi.org/10.1145/3673038.3673115


        Published In

Integration, the VLSI Journal, Volume 88, Issue C
        Jan 2023
        410 pages

        Publisher

        Elsevier Science Publishers B. V.

        Netherlands


        Author Tags

        1. Deep neural network
        2. RISC-V
        3. Mixed-precision
        4. Multi-core

        Qualifiers

        • Research-article
