DOI: 10.1145/3474085.3475428
research-article

AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection

Published: 17 October 2021

Abstract

Few-shot object detection (FSOD) aims to learn a detector that can quickly adapt to previously unseen objects from scarce annotated examples. Existing methods perform the classification and localization subtasks through a shared component in the detector, yet few of them account for the distinct feature-embedding preferences of the two subtasks. In this paper, we carefully analyze the characteristics of FSOD and argue that a few-shot detector should explicitly decompose the two subtasks while leveraging information from both to enhance feature representations. To this end, we propose a simple yet effective Adaptive Fully-Dual Network (AFD-Net). Specifically, we extend Faster R-CNN by introducing a Dual Query Encoder and a Dual Attention Generator for separate feature extraction, and a Dual Aggregator for separate model reweighting. In this way, the R-CNN detector performs separate state estimation for each subtask. Furthermore, we introduce an Adaptive Fusion Mechanism to guide the design of the encoders for efficient feature fusion in each specific subtask. Extensive experiments on PASCAL VOC and MS COCO show that our approach achieves state-of-the-art performance by a large margin, demonstrating its effectiveness and generalization ability.
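
The dual-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the encoders, attention generators, or aggregators are built, so every name below (`reweight`, `dual_forward`, `scale`) is hypothetical, the "encoders" are simple scaling functions standing in for learned networks, and aggregation is assumed to be a channel-wise product, as used in earlier feature-reweighting detectors. The only point shown is that classification and localization each get their own query encoding and their own attention vector, rather than sharing one.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def reweight(query: Vector, attention: Vector) -> Vector:
    """Channel-wise product of a query feature and an attention vector (assumed aggregation)."""
    if len(query) != len(attention):
        raise ValueError("query and attention must have the same length")
    return [q * a for q, a in zip(query, attention)]


def dual_forward(
    roi_feature: Vector,
    support_feature: Vector,
    cls_query_enc: Encoder,
    loc_query_enc: Encoder,
    cls_attention_gen: Encoder,
    loc_attention_gen: Encoder,
) -> Tuple[Vector, Vector]:
    """Return (classification feature, localization feature) from two separate branches."""
    # Each subtask encodes the query RoI feature and the support feature on its own.
    cls_feat = reweight(cls_query_enc(roi_feature), cls_attention_gen(support_feature))
    loc_feat = reweight(loc_query_enc(roi_feature), loc_attention_gen(support_feature))
    return cls_feat, loc_feat


def scale(factor: float) -> Encoder:
    """Toy stand-in for a learned module (hypothetical): multiply every channel by a constant."""
    return lambda v: [factor * x for x in v]


cls_feat, loc_feat = dual_forward(
    roi_feature=[1.0, 2.0, 3.0],
    support_feature=[0.5, 1.0, 2.0],
    cls_query_enc=scale(1.0),
    loc_query_enc=scale(2.0),
    cls_attention_gen=scale(1.0),
    loc_attention_gen=scale(1.0),
)
print(cls_feat)  # [0.5, 2.0, 6.0]
print(loc_feat)  # [1.0, 4.0, 12.0]
```

Because the two branches use different query encoders, the same RoI and support inputs yield different features for the classifier and the box regressor, which is the decomposition the abstract argues for.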



    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. few-shot object detection
    2. meta-learning
    3. task decomposition

    Qualifiers

    • Research-article

    Conference

    MM '21
    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2024) Few-Shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head. Remote Sensing 16:19 (3630). DOI: 10.3390/rs16193630. Online publication date: 29-Sep-2024
    • (2024) TIDE: Test-Time Few-Shot Object Detection. IEEE Transactions on Systems, Man, and Cybernetics: Systems 54:11 (6500-6509). DOI: 10.1109/TSMC.2024.3371699. Online publication date: Nov-2024
    • (2024) SD-FSOD: Self-Distillation Paradigm via Distribution Calibration for Few-Shot Object Detection. IEEE Transactions on Circuits and Systems for Video Technology 34:7 (5963-5976). DOI: 10.1109/TCSVT.2023.3343397. Online publication date: Jul-2024
    • (2024) Instance-Dictionary Learning for Open-World Object Detection in Autonomous Driving Scenarios. IEEE Transactions on Circuits and Systems for Video Technology 34:5 (3395-3408). DOI: 10.1109/TCSVT.2023.3322465. Online publication date: May-2024
    • (2024) A fast and data-efficient deep learning framework for multi-class fruit blossom detection. Computers and Electronics in Agriculture 217:C. DOI: 10.1016/j.compag.2023.108592. Online publication date: 16-May-2024
    • (2024) MPF-Net: multi-projection filtering network for few-shot object detection. Applied Intelligence 54:17-18 (7777-7792). DOI: 10.1007/s10489-024-05556-1. Online publication date: 1-Sep-2024
    • (2023) PMR-CNN: Prototype Mixture R-CNN for Few-Shot Object Detection. 2023 IEEE Intelligent Vehicles Symposium (IV) (1-7). DOI: 10.1109/IV55152.2023.10186683. Online publication date: 4-Jun-2023
    • (2023) Automatic waste detection with few annotated samples. Engineering Applications of Artificial Intelligence 120:C. DOI: 10.1016/j.engappai.2023.105865. Online publication date: 1-Apr-2023
    • (2023) Few-Shot Object Detection with Weight Imprinting. Cognitive Computation 15:5 (1725-1735). DOI: 10.1007/s12559-023-10152-5. Online publication date: 25-May-2023
    • (2022) A Combined Approach to Infrared Small-Target Detection with the Alternating Direction Method of Multipliers and an Improved Top-Hat Transformation. Sensors 22:19 (7327). DOI: 10.3390/s22197327. Online publication date: 27-Sep-2022
