
A Precision-Scalable Deep Neural Network Accelerator With Activation Sparsity Exploitation

Published: 31 August 2023

Abstract

To meet the demands of a wide range of practical applications, precision-scalable deep neural network (DNN) accelerators have become an unavoidable trend. At the same time, it has been demonstrated that a DNN accelerator can achieve better computational efficiency by exploiting sparsity. DNN accelerators that combine precision scalability with sparsity exploitation are therefore expected to deliver better performance. In this article, we propose an efficient precision-scalable DNN accelerator that exploits the sparsity of activations. Precision scalability is obtained from a decomposable multiplier inspired by the well-known Bit Fusion design, and a zero-skipping scheme is adopted to leverage the inherent sparsity of activations. We first modify the architecture of the conventional fusion unit (FU) to make it amenable to zero skipping. We then devise a segmentation approach to resolve memory-access conflicts and propose a sparsity-aware mapping method to balance the workload across processing elements (PEs). In addition, we present a bit-splitting strategy that exploits sparsity at the bit level. Compared with state-of-the-art precision-scalable designs, the proposed accelerator provides speedups of $4.12\times$, $4.07\times$, and $6.62\times$ in the $8b\times 8b$, $4b\times 4b$, and $2b\times 2b$ precision modes, respectively, while also achieving $3.92\times$ peak area efficiency and competitive peak energy efficiency.
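The decomposable-multiplier idea and the zero-skipping scheme can be illustrated with a short sketch. The Python snippet below is a minimal, illustrative software model (not the authors' hardware): it composes an unsigned 8b x 8b product from 2b x 2b partial products in the Bit Fusion style and skips zero activations as well as all-zero 2-bit activation segments, the value-level and bit-level sparsity the abstract refers to. The function names (split_2b, fused_multiply) are hypothetical.

# Minimal illustrative sketch (assumed Python model, not the authors' RTL):
# an unsigned 8b x 8b product composed from 2b x 2b partial products in the
# Bit Fusion style, with zero activations and all-zero 2-bit activation
# segments skipped. Signed operands and the 4b x 4b / 2b x 2b precision
# modes of the actual accelerator are omitted for brevity.

def split_2b(x, n_segments=4):
    """Split an unsigned integer into 2-bit segments, least significant first."""
    return [(x >> (2 * i)) & 0b11 for i in range(n_segments)]

def fused_multiply(activation, weight):
    """Compose the full product from shifted 2b x 2b partial products."""
    if activation == 0:            # value-level zero skipping: bypass the whole MAC
        return 0
    product = 0
    for i, a_seg in enumerate(split_2b(activation)):
        if a_seg == 0:             # bit-level skipping of an all-zero activation segment
            continue
        for j, w_seg in enumerate(split_2b(weight)):
            # each 2b x 2b partial product is shifted by the combined segment weight
            product += (a_seg * w_seg) << (2 * (i + j))
    return product

# Sanity check against the native multiply.
for act, wgt in [(0, 77), (3, 3), (170, 85), (255, 255)]:
    assert fused_multiply(act, wgt) == act * wgt

In the full-precision 8b x 8b mode all sixteen 2b x 2b partial products are accumulated; lower-precision modes would simply use fewer segments per operand. The two zero checks stand in for the kind of activation sparsity that the zero-skipping scheme and the bit-splitting strategy exploit in hardware.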


Cited By

  • "Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU," in Proc. 25th ACM SIGPLAN/SIGBED Int. Conf. Languages, Compilers, and Tools for Embedded Systems (LCTES), Jun. 2024, pp. 72–82, doi: 10.1145/3652032.3657571.

Information

Publisher

IEEE Press

Qualifiers

  • Research-article
