research-article

Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation

Published: 15 March 2023

Abstract

Few-shot segmentation aims to segment objects of a specific class under the guidance of only a few annotated examples. Most existing approaches follow the prototype learning paradigm: they generate category prototypes by squeezing masked feature maps extracted from the support images into prototype vectors. Because support and query features can differ considerably in distribution, directly comparing these support prototypes with query features may lead to inaccurate predictions. We propose a query-guided prototype learning architecture that addresses this problem in two ways: (i) a cross-alignment loss for training the segmentation decoder, which makes the decoder more robust to the distribution discrepancy between support and query features; and (ii) a dynamic fusion module that strengthens the original support prototype with an additional prototype extracted from the query features. Experiments on the PASCAL-5\(^i\) and COCO-20\(^i\) datasets show that our method achieves promising results compared with previous prototype learning methods.
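
The abstract does not give the formulas, so the following is only a minimal sketch of the general prototype-learning pipeline it describes, written under stated assumptions rather than as the authors' method. It assumes that "squeezing" is masked average pooling, that prototypes are matched to query features by cosine similarity with a fixed threshold of 0.5, and that the query prototype is pooled from a coarse query prediction. The function `fuse_prototypes` is a hypothetical stand-in for the learned dynamic fusion module, and the cross-alignment loss and decoder are not modeled at all. Plain Python lists stand in for feature tensors (one vector per spatial position).

```python
import math
from typing import List, Sequence

Vector = List[float]


def masked_average_pooling(features: Sequence[Vector], mask: Sequence[float]) -> Vector:
    """Squeeze per-position features into one prototype by averaging over foreground (mask == 1)."""
    if len(features) != len(mask):
        raise ValueError("features and mask must cover the same positions")
    total = sum(mask)
    if total == 0:
        raise ValueError("mask has no foreground positions")
    dim = len(features[0])
    proto = [0.0] * dim
    for vec, m in zip(features, mask):
        for c in range(dim):
            proto[c] += m * vec[c]
    return [v / total for v in proto]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def segment(query: Sequence[Vector], prototype: Vector, threshold: float = 0.5) -> List[float]:
    """Mark a query position as foreground (1.0) if its cosine similarity to the prototype exceeds threshold."""
    return [1.0 if cosine_similarity(v, prototype) > threshold else 0.0 for v in query]


def fuse_prototypes(support_proto: Vector, query_proto: Vector) -> Vector:
    """HYPOTHETICAL fusion rule, not the paper's learned module.

    The query prototype gets a weight in [0, 0.5] that grows with its agreement
    with the support prototype; the result is a convex combination of the two.
    """
    w = max(cosine_similarity(support_proto, query_proto), 0.0) / 2
    return [(1 - w) * s + w * q for s, q in zip(support_proto, query_proto)]


def query_guided_segment(support: Sequence[Vector], support_mask: Sequence[float],
                         query: Sequence[Vector], threshold: float = 0.5) -> List[float]:
    support_proto = masked_average_pooling(support, support_mask)
    coarse = segment(query, support_proto, threshold)  # first pass with the support prototype only
    if sum(coarse) == 0:  # no foreground found in the query: nothing to pool a query prototype from
        return coarse
    query_proto = masked_average_pooling(query, coarse)  # prototype taken from the query itself
    fused = fuse_prototypes(support_proto, query_proto)
    return segment(query, fused, threshold)  # second pass with the fused prototype


if __name__ == "__main__":
    support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    support_mask = [1.0, 1.0, 0.0, 0.0]
    query = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]]
    print(query_guided_segment(support, support_mask, query))  # prints [1.0, 1.0, 0.0, 0.0]
```

The two-pass structure reflects the idea in the abstract that the query set itself can correct a support prototype that does not match the query distribution; how the paper actually weights the two prototypes is not stated here, so the similarity-based weight above should be read as illustrative only.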

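Cited By

  • (2024) Effective Video Summarization by Extracting Parameter-Free Motion Attention. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7, 1-20. DOI 10.1145/3654670. Online publication date: 16-May-2024.
  • (2024) Unifying Pictorial and Textual Features for Screen Content Image Quality Evaluation. Proceedings of the 2024 International Conference on Multimedia Retrieval, 1099-1103. DOI 10.1145/3652583.3657610. Online publication date: 30-May-2024.
  • (2024) Robust Image Hashing via CP Decomposition and DCT for Copy Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7, 1-22. DOI 10.1145/3650112. Online publication date: 25-Apr-2024.
  • (2024) Multi-Content Interaction Network for Few-Shot Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6, 1-20. DOI 10.1145/3643850. Online publication date: 8-Mar-2024.
  • (2024) Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4, 1-21. DOI 10.1145/3633781. Online publication date: 11-Jan-2024.
  • (2024) M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2156-2166. DOI 10.1145/3626772.3657833. Online publication date: 10-Jul-2024.
  • (2023) Real-time institution video data analysis using fog computing and adaptive background subtraction. Journal of Real-Time Image Processing 20:5. DOI 10.1007/s11554-023-01350-3. Online publication date: 16-Aug-2023.
  • (2023) Simple yet effective joint guidance learning for few-shot semantic segmentation. Applied Intelligence 53:22, 26603-26621. DOI 10.1007/s10489-023-04937-2. Online publication date: 26-Aug-2023.
  • (2023) Lightweight transformer and multi-head prediction network for no-reference image quality assessment. Neural Computing and Applications 36:4, 1931-1946. DOI 10.1007/s00521-023-09188-3. Online publication date: 18-Nov-2023.
  • (2023) Few-shot semantic segmentation: a review on recent approaches. Neural Computing and Applications 35:25, 18251-18275. DOI 10.1007/s00521-023-08758-9. Online publication date: 1-Sep-2023.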


    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
    April 2023
    545 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3572861
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 March 2023
    Online AM: 12 August 2022
    Accepted: 23 July 2022
    Revised: 21 July 2022
    Received: 28 December 2021
    Published in TOMM Volume 19, Issue 2s


    Author Tags

    1. Few-shot segmentation
    2. prototype learning
    3. query set

    Qualifiers

    • Research-article

    Bibliometrics

    • Downloads (last 12 months): 152
    • Downloads (last 6 weeks): 18
    Reflects downloads up to 15 Oct 2024.

