DOI: 10.1145/3595916.3626385
research-article

Semantic-Aware Dynamic Feature Selection and Fusion for Object Detection in UAV Videos

Published: 01 January 2024

Abstract

Keypoint-based detectors perform well in surveillance videos but face challenges in detecting objects in UAV videos due to missed corners and mismatches. To address this, we propose a semantic-aware module with a feature fusion sub-module and a feature selection sub-module. The feature fusion module adaptively combines low-level and high-level features, enhancing corner recall. The feature selection module determines spatial location importance, improving discriminative capabilities and reducing background interference, resulting in better precision. Experiments on the UAVDT benchmark show our method achieves competitive results. Notably, our method improves corner recall by 4.0% and reduces the mismatch rate by 2.9% compared to the baseline. Code is available at https://github.com/jianpingZhonggit/SemanticAwareModule.
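
The two sub-modules described above can be illustrated with a toy sketch. This is not the authors' implementation (that is in the linked repository); it only shows one plausible reading of "adaptively combines" and "determines spatial location importance". It assumes single-channel feature maps of equal size, a per-location sigmoid gate for fusion, and a per-location sigmoid score for selection. The names `fuse`, `select`, `w_low`, `w_high`, `weight`, and `bias` are hypothetical, and the fixed scalar weights stand in for what would be learned layers in a real network.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse(low, high, w_low=1.0, w_high=1.0, bias=0.0):
    """Blend two same-sized single-channel maps with a per-location gate.

    Assumption: the gate is a sigmoid over a weighted sum of both inputs,
    so each output value is a convex combination of the two inputs.
    """
    fused = []
    for row_l, row_h in zip(low, high):
        out_row = []
        for l, h in zip(row_l, row_h):
            g = sigmoid(w_low * l + w_high * h + bias)  # gate in (0, 1)
            out_row.append(g * l + (1.0 - g) * h)
        fused.append(out_row)
    return fused


def select(feat, weight=1.0, bias=0.0):
    """Scale each location by a sigmoid importance score in (0, 1).

    Assumption: importance depends only on the value at that location;
    low-scoring locations (e.g. background) are attenuated.
    """
    return [[sigmoid(weight * v + bias) * v for v in row] for row in feat]


# Toy 2x2 maps: `low` stands in for a low-level (fine detail) feature map,
# `high` for a high-level (semantic) map already resized to the same shape.
low = [[0.0, 2.0], [1.0, 0.0]]
high = [[0.0, 0.0], [1.0, 4.0]]

fused = fuse(low, high)
selected = select(fused)
```

Because the gate lies strictly between 0 and 1, every fused value stays between the two input values at that location, and selection can only shrink a value's magnitude, never amplify it.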

    Published In

    MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
    December 2023
    745 pages
    ISBN:9798400702051
    DOI:10.1145/3595916

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. anchor free
    2. attention mechanisms
    3. semantic aware
    4. small objects detection
    5. unmanned aerial vehicle (UAV)

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MMAsia '23
    MMAsia '23: ACM Multimedia Asia
    December 6 - 8, 2023
    Tainan, Taiwan

    Acceptance Rates

    Overall Acceptance Rate 59 of 204 submissions, 29%
