Real-time flying object detection with YOLOv8

D Reis, J Kupec, J Hong, A Daoudi - arXiv preprint arXiv:2305.09972, 2023 - arxiv.org
This paper presents a generalized model for real-time detection of flying objects that can be used for transfer learning and further research, as well as a refined model that is ready for implementation. We achieve this by training our first generalized model on a data set containing 40 different classes of flying objects, forcing the model to extract abstract feature representations. We then perform transfer learning with these learned parameters on a data set more representative of real-world environments (i.e., higher frequency of occlusion, small spatial sizes, rotations, etc.) to generate our refined model. Detection of flying objects remains challenging due to large variance in object spatial sizes and aspect ratios, rate of speed, occlusion, and clustered backgrounds. To address some of these challenges while simultaneously maximizing performance, we use the current state-of-the-art single-shot detector, YOLOv8, in an attempt to find the best tradeoff between inference speed and mAP. While YOLOv8 is regarded as the new state of the art, an official paper has not been published. Thus, we provide an in-depth explanation of the new architecture and functionality that YOLOv8 has adopted. Our final generalized model achieves an mAP50-95 of 0.685 and an average inference speed of 50 fps on 1080p videos. Our final refined model maintains this inference speed and achieves an improved mAP50-95 of 0.835.
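For readers unfamiliar with the reported metrics, the following minimal Python sketch illustrates what mAP50-95 and a 50 fps inference speed mean in practice. It is not the authors' evaluation code. It assumes the common COCO-style convention that mAP50-95 averages average precision (AP) over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, and that the AP values passed in are already averaged across classes. The function names (`iou`, `map50_95`, `frame_budget_ms`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names are hypothetical, not taken from the paper.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def map50_95(ap_per_threshold):
    """Mean of AP values, one per IoU threshold in IOU_THRESHOLDS."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Two boxes that overlap on half of their width.
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
    # Per-frame time budget at the reported 50 fps.
    print(frame_budget_ms(50))
```

Under these assumptions, a prediction counts as correct at a given threshold only if its IoU with the ground-truth box meets that threshold, so a high mAP50-95 rewards tight localization and not just coarse overlap. At 50 fps, the detector has roughly 20 ms to process each 1080p frame.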