research-article

AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection

Authors:

Haozhi Li

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2549 - 2557

https://doi.org/10.1145/3474085.3475428

Published: 17 October 2021

Abstract

Few-shot object detection (FSOD) aims to learn a detector that can quickly adapt to previously unseen objects from scarce annotated examples. Existing methods address this problem by performing the classification and localization subtasks with a shared component in the detector, yet few of them account for the distinct feature-embedding preferences of the two subtasks. In this paper, we carefully analyze the characteristics of FSOD and argue that a few-shot detector should explicitly decompose the two subtasks while leveraging information from both to enhance feature representations. To this end, we propose a simple yet effective Adaptive Fully-Dual Network (AFD-Net). Specifically, we extend Faster R-CNN by introducing a Dual Query Encoder and a Dual Attention Generator for separate feature extraction, and a Dual Aggregator for separate model reweighting. In this way, separate state estimation is achieved by the R-CNN detector. Furthermore, we introduce an Adaptive Fusion Mechanism to guide the design of the encoders for efficient feature fusion in each specific subtask. Extensive experiments on PASCAL VOC and MS COCO show that our approach achieves state-of-the-art performance by a large margin, demonstrating its effectiveness and generalization ability.
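
The idea of separate reweighting for the two subtasks can be illustrated with a small sketch. This is not the AFD-Net implementation: the abstract does not specify the aggregation operator, so the channel-wise product used here is an assumption, and the names `reweight` and `dual_aggregate` and the toy feature vectors are hypothetical. The sketch only shows that classification and localization each receive their own query feature and their own attention vector, with nothing shared between the branches.

```python
# Illustrative sketch only: the names, shapes, and the channel-wise product
# are assumptions, not the AFD-Net implementation.
from typing import List, Tuple

Vector = List[float]


def reweight(query: Vector, attention: Vector) -> Vector:
    """Channel-wise product of a query feature with an attention vector."""
    if len(query) != len(attention):
        raise ValueError("query and attention must have the same length")
    return [q * a for q, a in zip(query, attention)]


def dual_aggregate(
    cls_query: Vector,
    loc_query: Vector,
    cls_attention: Vector,
    loc_attention: Vector,
) -> Tuple[Vector, Vector]:
    """Reweight each subtask's query feature with its own attention vector.

    The classification and localization branches share no features here,
    mirroring the explicit decomposition the abstract argues for.
    """
    return reweight(cls_query, cls_attention), reweight(loc_query, loc_attention)


# Toy values chosen only to make the arithmetic easy to follow.
cls_feat, loc_feat = dual_aggregate(
    cls_query=[1.0, 2.0, 3.0],
    loc_query=[4.0, 5.0, 6.0],
    cls_attention=[0.5, 0.0, 1.0],
    loc_attention=[1.0, 1.0, 0.5],
)
print(cls_feat)  # [0.5, 0.0, 3.0]
print(loc_feat)  # [4.0, 5.0, 3.0]
```

In a real detector the vectors would be learned feature maps and each branch would feed its own prediction head; the sketch keeps only the separation of the two paths.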

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science&Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI&TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
National Key Research and Development Program of China

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 12
Total Downloads: 198
Downloads (Last 12 months): 22
Downloads (Last 6 weeks): 2

Reflects downloads up to 10 Nov 2024

Cited By

Zhang J, Hong Z, Chen X, Li Y. (2024). Few-Shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head. Remote Sensing 16:19 (3630). Online publication date: 29-Sep-2024.
https://doi.org/10.3390/rs16193630
Li W, Wei H, Wu Y, Yang J, Ruan Y, Li Y, Tang Y. (2024). TIDE: Test-Time Few-Shot Object Detection. IEEE Transactions on Systems, Man, and Cybernetics: Systems 54:11 (6500-6509). Online publication date: Nov-2024.
https://doi.org/10.1109/TSMC.2024.3371699
Chen H, Wang Q, Xie K, Lei L, Lin M, Lv T, Liu Y, Luo J. (2024). SD-FSOD: Self-Distillation Paradigm via Distribution Calibration for Few-Shot Object Detection. IEEE Transactions on Circuits and Systems for Video Technology 34:7 (5963-5976). Online publication date: Jul-2024.
https://doi.org/10.1109/TCSVT.2023.3343397
Ma Z, Zheng Z, Wei J, Yang Y, Shen H. (2024). Instance-Dictionary Learning for Open-World Object Detection in Autonomous Driving Scenarios. IEEE Transactions on Circuits and Systems for Video Technology 34:5 (3395-3408). Online publication date: May-2024.
https://doi.org/10.1109/TCSVT.2023.3322465
Zhou W, Cui Y, Huang H, Huang H, Wang C. (2024). A fast and data-efficient deep learning framework for multi-class fruit blossom detection. Computers and Electronics in Agriculture 217:C. Online publication date: 16-May-2024.
https://dl.acm.org/doi/10.1016/j.compag.2023.108592
Chen H, Wang Q, Xie K, Lei L, Wu X. (2024). MPF-Net: multi-projection filtering network for few-shot object detection. Applied Intelligence 54:17-18 (7777-7792). Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1007/s10489-024-05556-1
Zhou J, Mei J, Li H, Hu Y. (2023). PMR-CNN: Prototype Mixture R-CNN for Few-Shot Object Detection. 2023 IEEE Intelligent Vehicles Symposium (IV) (1-7). Online publication date: 4-Jun-2023.
https://doi.org/10.1109/IV55152.2023.10186683
Zhou W, Zhao L, Huang H, Chen Y, Xu S, Wang C. (2023). Automatic waste detection with few annotated samples. Engineering Applications of Artificial Intelligence 120:C. Online publication date: 1-Apr-2023.
https://dl.acm.org/doi/10.1016/j.engappai.2023.105865
Yan D, Huang J, Sun H, Ding F. (2023). Few-Shot Object Detection with Weight Imprinting. Cognitive Computation 15:5 (1725-1735). Online publication date: 25-May-2023.
https://doi.org/10.1007/s12559-023-10152-5
Xi T, Yuan L, Sun Q. (2022). A Combined Approach to Infrared Small-Target Detection with the Alternating Direction Method of Multipliers and an Improved Top-Hat Transformation. Sensors 22:19 (7327). Online publication date: 27-Sep-2022.
https://doi.org/10.3390/s22197327