Abstract
Deep neural networks are widely used in computer vision, pattern recognition, and speech recognition, achieving high accuracy at the cost of substantial computation. The high computational complexity and heavy memory traffic of such networks pose a major challenge to deploying them on resource-limited, low-power embedded systems. Several binary neural networks have been proposed that use only 1-bit values for both weights and activations, substituting complex multiply-accumulate operations with bitwise logic operations to reduce computation and memory usage. However, these quantized neural networks suffer from accuracy loss, especially on large datasets. In this paper, we introduce a quantized neural network with 2-bit weights and activations that is more accurate than state-of-the-art quantized neural networks and approaches the accuracy of full-precision networks. Moreover, we propose E2BNet, an efficient MAC-free hardware architecture that improves power efficiency and throughput/W by about 3.6× and 1.5×, respectively, over state-of-the-art quantized neural networks. E2BNet processes more than 500 images/s on the ImageNet dataset, which not only meets the real-time requirements of image/video processing but also supports high-frame-rate video applications.
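To illustrate the core idea the abstract describes, the sketch below shows how a binary neural network can replace a multiply-accumulate dot product with XNOR and popcount when weights and activations are restricted to {-1, +1}. This is a minimal illustrative example, not the paper's E2BNet architecture or its 2-bit scheme; the bit encoding (1 → +1, 0 → −1) and function name are assumptions for demonstration.

```python
# Illustrative sketch (not the paper's E2BNet implementation): a binary
# neural network replaces multiply-accumulate with XNOR + popcount.
# Vectors over {-1, +1} are packed into integers, one bit per element,
# with bit value 1 encoding +1 and bit value 0 encoding -1.

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1, +1}^n vectors packed into n-bit integers."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask       # bit is 1 where the signs match
    matches = bin(xnor).count("1")         # popcount: number of matching signs
    return 2 * matches - n                 # matches - mismatches

# Example: a = [+1, -1, +1, +1], w = [+1, +1, -1, +1]
# dot = (+1)(+1) + (-1)(+1) + (+1)(-1) + (+1)(+1) = 0
print(binary_dot(0b1011, 0b1101, 4))  # 0
```

A hardware implementation maps the same idea to an XNOR gate array followed by a popcount tree, which is why such accelerators are described as MAC-free.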
Cite this article
Mirsalari, S.A., Nazari, N., Ansarmohammadi, S.A. et al. E2BNet: MAC-free yet accurate 2-level binarized neural network accelerator for embedded systems. J Real-Time Image Proc 18, 1285–1299 (2021). https://doi.org/10.1007/s11554-021-01148-1