research-article

A Review of FPGA Accelerated Computing Methods for YOLO Models

Authors:

Menglong Zhang

IoTML '24: Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning

Pages 34 - 42

https://doi.org/10.1145/3697467.3697591

Published: 08 November 2024

Abstract

In recent years, the rapid evolution of deep learning and neural networks, together with the arrival of the big-data and intelligence era, has made YOLO (You Only Look Once), a prominent object detection model, an active research topic. The need to meet the two critical benchmarks of object detection, timeliness and accuracy, has prompted a surge of research into FPGA (Field-Programmable Gate Array)-based acceleration schemes. In this article, we first give an overview of neural networks and hardware platforms, then examine in depth how the YOLO model is implemented on FPGA hardware platforms. We also consolidate and review the current state of FPGA acceleration for YOLO models and analyze the performance of the different acceleration techniques. Finally, we discuss potential directions for future development.

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded hardware

Published In

IoTML '24: Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning

August 2024

443 pages

ISBN:9798400710353

DOI:10.1145/3697467

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

IoTML 2024

IoTML 2024: 2024 4th International Conference on Internet of Things and Machine Learning

August 9 - 11, 2024

Nanchang, China
