A real-time and high-performance MobileNet accelerator based on adaptive dataflow scheduling for image classification

Sang, Xiaoting; Ruan, Tao; Li, Chunlei; Li, Huanyu; Yang, Ruimin; Liu, Zhoufeng

doi:10.1007/s11554-023-01378-5


Research
Published: 24 November 2023

Volume 21, article number 4, (2024)

Journal of Real-Time Image Processing


Abstract

Convolutional neural network (CNN) models built on depthwise separable convolution (DSC) offer lower computational and storage cost while retaining high model accuracy. However, relatively little attention has been paid to hardware architectures for such models. Previous DSC-based CNN accelerators typically apply a fixed computation mode to all models, leading to an imbalance between power, efficiency, and performance. To address this problem, this paper proposes a novel real-time DSC-based CNN accelerator that can accommodate field-programmable gate arrays (FPGAs) of different capacities and CNNs of different sizes. A dynamically reconfigurable computing engine and a block-convolution-based adaptive dataflow scheduling mode balance hardware resource usage against processing speed in industrial processes. The proposed MobileNet accelerator was implemented and evaluated on the Xilinx XC7Z020 platform. Compared with previous FPGA-based accelerators, the experimental results show that the approach delivers 10.86 GOPS of computational performance on full HD RGB images, meeting the requirements of real industrial real-time applications.
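The cost advantage of DSC that motivates this accelerator can be made concrete with the standard operation count used in the original MobileNet work: a k x k convolution from M to N channels is split into a depthwise k x k stage and a pointwise 1 x 1 stage, reducing multiply-accumulates (MACs) by a factor of 1/N + 1/k². The sketch below is a minimal illustration, not the authors' method; it assumes stride 1, "same" padding (output size equals input size), no bias, and ignores batch normalization and activations. The class `ConvLayer`, the helper functions, and the 28 x 28 x 256 layer shape are hypothetical choices for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolution layer (assumes stride 1, 'same' padding)."""
    k: int       # square kernel size (k x k)
    c_in: int    # input channels, M
    c_out: int   # output channels, N
    h: int       # output feature-map height
    w: int       # output feature-map width


def standard_conv_macs(layer: ConvLayer) -> int:
    # Each output pixel of each output channel reads a k x k x M window.
    return layer.k * layer.k * layer.c_in * layer.c_out * layer.h * layer.w


def dsc_macs(layer: ConvLayer) -> int:
    # Depthwise stage: one k x k filter per input channel.
    depthwise = layer.k * layer.k * layer.c_in * layer.h * layer.w
    # Pointwise stage: a 1 x 1 convolution mixing M channels into N.
    pointwise = layer.c_in * layer.c_out * layer.h * layer.w
    return depthwise + pointwise


def standard_conv_params(layer: ConvLayer) -> int:
    return layer.k * layer.k * layer.c_in * layer.c_out


def dsc_params(layer: ConvLayer) -> int:
    return layer.k * layer.k * layer.c_in + layer.c_in * layer.c_out


def dsc_cost_ratio(layer: ConvLayer) -> float:
    # Closed form of dsc_macs / standard_conv_macs: 1/N + 1/k^2.
    return 1 / layer.c_out + 1 / (layer.k * layer.k)


if __name__ == "__main__":
    # Hypothetical layer shape chosen for illustration, not taken from the paper.
    layer = ConvLayer(k=3, c_in=256, c_out=256, h=28, w=28)
    ratio = dsc_macs(layer) / standard_conv_macs(layer)
    assert math.isclose(ratio, dsc_cost_ratio(layer))
    print(f"standard: {standard_conv_macs(layer):,} MACs, {standard_conv_params(layer):,} weights")
    print(f"DSC:      {dsc_macs(layer):,} MACs, {dsc_params(layer):,} weights")
    print(f"DSC/standard MAC ratio: {ratio:.4f}")
```

For this shape the pointwise stage accounts for roughly 51.4 million of the 53.2 million DSC MACs, while the depthwise stage needs only about 1.8 million; the two stages therefore have very different compute profiles, which is worth keeping in mind when reading about hardware that schedules them.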


Data availability

The data used to support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by the NSFC (No. 62072489, U1804157), Henan Science and Technology Innovation Team (CXTD2017091), IRTSTHN (21IRTSTHN013), Zhongyuan Science and Technology Innovation Leading Talent Program (214200510013), Qianjiang Laboratory Open Fund Project of Hangzhou Research Institute of Beihang (2020-Y3-A-026), Henan Provincial Machine Learning and Image Analysis Working Studio collaboration with prominent international scientists (GZS2022012, Department of Science and Technology of Henan Province), Open Research Program of the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology (20200206). Xiaoting Sang and Tao Ruan are the co-first authors of this paper.

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Zhongyuan University of Technology, Zhengzhou, 450007, China
Xiaoting Sang, Chunlei Li, Ruimin Yang & Zhoufeng Liu
China Patent Information Center, Beijing, 100088, China
Tao Ruan
College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, 266580, China
Huanyu Li


Contributions

XS conceived the idea and wrote the paper, TR developed the theory and expanded the experiments, CL guided the direction of the paper, and the remaining authors assisted in collating the dataset and revising the paper.

Corresponding author

Correspondence to Chunlei Li.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Sang, X., Ruan, T., Li, C. et al. A real-time and high-performance MobileNet accelerator based on adaptive dataflow scheduling for image classification. J Real-Time Image Proc 21, 4 (2024). https://doi.org/10.1007/s11554-023-01378-5


Received: 02 July 2023
Accepted: 12 October 2023
Published: 24 November 2023
DOI: https://doi.org/10.1007/s11554-023-01378-5
