research-article

Semantic-Aware Dynamic Feature Selection and Fusion for Object Detection in UAV Videos

Authors:

Jianping Zhong,

Qingming Huang

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

Article No.: 16, Pages 1 - 7

https://doi.org/10.1145/3595916.3626385

Published: 01 January 2024

Abstract

Keypoint-based detectors perform well on surveillance videos but struggle on UAV videos, where corners are often missed or mismatched. To address this, we propose a semantic-aware module consisting of a feature fusion sub-module and a feature selection sub-module. The feature fusion sub-module adaptively combines low-level and high-level features, which improves corner recall. The feature selection sub-module estimates the importance of each spatial location, which improves discriminative capability and reduces background interference, resulting in better precision. Experiments on the UAVDT benchmark show that our method achieves competitive results. Notably, it improves corner recall by 4.0% and reduces the mismatch rate by 2.9% compared with the baseline. Code is available at https://github.com/jianpingZhonggit/SemanticAwareModule.
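
The two ideas named in the abstract, adaptive fusion of low-level and high-level features and per-location importance weighting, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal pure-Python illustration under stated assumptions. The function names `adaptive_fuse` and `select_locations`, the scalar parameters `w_low`, `w_high`, `scale`, and `bias`, and the use of a sigmoid gate are all hypothetical stand-ins for what would normally be learned convolutional layers.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(low, high, w_low=1.0, w_high=1.0, bias=0.0):
    """Blend two same-sized 2D feature maps with a per-location gate.

    Assumption: the gate is a sigmoid of a weighted sum of both inputs.
    The scalar weights are hypothetical placeholders for learned layers.
    """
    fused = []
    for row_low, row_high in zip(low, high):
        out_row = []
        for l, h in zip(row_low, row_high):
            gate = sigmoid(w_low * l + w_high * h + bias)
            # Convex combination: gate near 1 favours the low-level value.
            out_row.append(gate * l + (1.0 - gate) * h)
        fused.append(out_row)
    return fused


def select_locations(feature, scale=1.0, bias=0.0):
    """Weight each spatial location by an importance score in (0, 1).

    Assumption: importance is derived from the feature value itself;
    a real module would predict it with additional layers.
    """
    importance = [[sigmoid(scale * v + bias) for v in row] for row in feature]
    selected = [
        [v * m for v, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, importance)
    ]
    return selected, importance


# Toy 1x2 feature maps: the second location has a strong low-level response.
low_level = [[0.0, 2.0]]
high_level = [[0.0, 0.0]]
fused_map = adaptive_fuse(low_level, high_level)
selected_map, importance_map = select_locations(fused_map)
```

In this sketch the gate keeps every fused value between the corresponding low-level and high-level values, and the importance map scales down locations with weak responses, which is the intuition behind suppressing background interference.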

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Information & Contributors

Information

Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

December 2023

745 pages

ISBN:9798400702051

DOI:10.1145/3595916

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MMAsia '23

Sponsor:

SIGMM

MMAsia '23: ACM Multimedia Asia

December 6 - 8, 2023

Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
