DOI: 10.1145/3671016.3671392
research-article

Reduce Detection Latency of YOLOv5 to Prevent Real-Time Tracking Failures for Lightweight Robots

Published: 24 July 2024

Abstract

Lightweight robots frequently perform real-time tracking tasks to provide human companionship services. For effective target tracking, robot systems often employ the YOLO series as a lightweight object detection framework. However, YOLO must still balance accuracy against resource efficiency, and training its larger-scale models demands substantial resources. Deploying YOLO directly on robots with limited computing resources can therefore cause significant detection delays that compromise tracking. A deeper concern is that robots typically use CPUs as their primary computing units, which makes many existing model optimization techniques unsuitable in this context, since those techniques mainly target GPU computation.
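The sketch below makes the link between detection latency and tracking failure concrete. It uses a deliberately simple model that does not come from the paper: a camera delivers frames at a fixed rate and the target moves at constant speed. The function names, the 30 fps frame rate, and the 1.2 m/s walking speed are all hypothetical. The code counts how many frames arrive while one detection is still running, and how far the target moves before the robot receives the resulting bounding boxes.

```python
import math


def frames_during_inference(latency_s: float, fps: float) -> int:
    """Whole camera frames that arrive while one detection is still running.

    Hypothetical model: frames arrive at a fixed rate, detections run back to back.
    """
    # Small epsilon guards against float error such as 0.29 * 100 == 28.999...
    return math.floor(latency_s * fps + 1e-9)


def target_drift_m(latency_s: float, speed_mps: float) -> float:
    """Distance the target moves between frame capture and receiving its boxes."""
    return latency_s * speed_mps


def keeps_up(latency_s: float, fps: float) -> bool:
    """True if each frame is processed before the next one arrives."""
    return latency_s <= 1.0 / fps


if __name__ == "__main__":
    # Hypothetical scenario: 30 fps camera, person walking at 1.2 m/s.
    for latency in (0.030, 0.250, 0.800):
        print(
            f"latency={latency:.3f}s "
            f"frames={frames_during_inference(latency, 30)} "
            f"drift={target_drift_m(latency, 1.2):.2f}m "
            f"keeps_up={keeps_up(latency, 30)}"
        )
```

Under these assumptions, a detector that keeps up with the camera reports positions that are only a few centimetres stale. At latencies of hundreds of milliseconds, the target can move tens of centimetres between detections, which is one way a tracker can lose its target.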
To tackle this challenge, we propose SLCNet-YOLOv5, a novel detection framework designed specifically for deployment in CPU-centric computing environments on robots. The core idea of SLCNet-YOLOv5 is to replace the native YOLOv5 backbone network with SLCNet, a simplified version of PP-LCNet, an existing convolutional neural network designed for CPUs. Our aim is not to improve PP-LCNet's inference accuracy. Instead, we simplify it to increase inference speed and tolerate a certain degree of accuracy loss, because excessive inference latency can lead to real-time tracking failures. SLCNet uses a backbone network optimized for CPU-centric computation and lowers the computational complexity of the detection model. As a result, it markedly reduces detection latency with only a minor trade-off in accuracy. We compared it with the state-of-the-art detector YOLOv5 on the publicly available coco-foot-and-leg and PASCAL VOC datasets. On CPU-centric terminals, per-image detection speed increased by 62.8% and 81.3%, respectively. Mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 showed marginal declines of 0.077 and 0.165, respectively.
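PP-LCNet, from which SLCNet is derived, is built from depthwise separable convolutions. This page does not describe the exact layer configuration of SLCNet. The sketch below therefore only illustrates why such blocks reduce computational complexity, using hypothetical feature-map and channel sizes. It counts multiply-accumulate operations (MACs) for two cases: a standard k x k convolution, and a depthwise k x k convolution followed by a 1 x 1 pointwise convolution. The function names are illustrative, and `h`, `w` are assumed to be the output spatial size.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a k x k standard convolution producing an h x w x c_out output."""
    # Every output element sums over a k x k window across all input channels.
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 convolution mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 40 x 40 output, 128 -> 128 channels, 3 x 3 kernel.
    std = standard_conv_macs(40, 40, 128, 128, 3)
    sep = depthwise_separable_macs(40, 40, 128, 128, 3)
    print(f"standard:  {std:,} MACs")
    print(f"separable: {sep:,} MACs ({sep / std:.1%} of standard)")
```

For this hypothetical layer, the separable version needs about 11.9% of the MACs of the standard convolution, which matches the closed-form ratio 1/c_out + 1/k^2. MAC counts are only a proxy, though. Actual CPU latency also depends on memory access and on how well the inference library implements each operator, so measured latency on the target CPU remains the deciding metric.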

          Published In

          Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware
          July 2024
          518 pages
          ISBN:9798400707056
          DOI:10.1145/3671016
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Inference Latency
          2. Lightweight Robots
          3. Object Detection
          4. Real-Time Tracking

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          Internetware 2024

          Acceptance Rates

          Overall Acceptance Rate 55 of 111 submissions, 50%

          Article Metrics

          0 total citations; 39 total downloads (39 in the last 12 months, 2 in the last 6 weeks), as of 14 January 2025.
