research-article

Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation

Authors:

Yi Yu

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 19, Issue 2s

Article No.: 84, Pages 1 - 20

https://doi.org/10.1145/3555314

Published: 15 March 2023

Abstract

Few-shot segmentation aims to segment objects of a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm: they generate category prototypes by squeezing masked feature maps extracted from the support images. Because support and query features differ considerably in distribution, these support prototypes can yield inaccurate predictions when compared directly with features extracted from the query set. We propose a query-guided prototype learning architecture that addresses this problem in two ways: (i) a cross-alignment loss for training the segmentation decoder, which makes the decoder more robust to the distribution discrepancy between support and query features; and (ii) a dynamic fusion module that strengthens the original support prototype with a second prototype extracted from query features. Experiments show that our method achieves promising results compared to previous prototype learning methods on the PASCAL-5ⁱ and COCO-20ⁱ datasets.
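To make the prototype learning paradigm concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation. It assumes the common reading of "squeezing masked feature maps" as masked average pooling, and it compares the prototype with query features by cosine similarity. The function names and the nested-list feature layout are hypothetical. `fuse_prototypes` uses a fixed weight `alpha` purely as a stand-in, since the abstract does not specify how the dynamic fusion module combines the two prototypes.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask == 1.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    Returns a C-dimensional prototype (all zeros if the mask is empty).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for feat, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += feat[c]
    if count == 0:
        return total
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors; eps avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def similarity_map(features, prototype):
    """Score every spatial position of a feature map against a prototype."""
    return [[cosine_similarity(f, prototype) for f in row] for row in features]


def fuse_prototypes(support_proto, query_proto, alpha=0.5):
    """Hypothetical stand-in for fusion: a fixed convex combination.

    ASSUMPTION: the paper's module is described as dynamic; a constant
    alpha is used here only to illustrate combining the two prototypes.
    """
    return [alpha * s + (1 - alpha) * q
            for s, q in zip(support_proto, query_proto)]


# Toy example: 2 x 2 support feature map with 2 channels.
support_features = [[[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [0.0, 1.0]]]
support_mask = [[1, 0],
                [1, 0]]
support_proto = masked_average_pooling(support_features, support_mask)

# Score query positions, threshold into a pseudo-mask, and pool a
# query prototype from the positions that matched the support prototype.
query_features = [[[2.0, 0.0], [0.0, 3.0]]]
scores = similarity_map(query_features, support_proto)
pseudo_mask = [[1 if s > 0.5 else 0 for s in row] for row in scores]
query_proto = masked_average_pooling(query_features, pseudo_mask)
fused_proto = fuse_prototypes(support_proto, query_proto)
```

In this toy case the query prototype has a different magnitude from the support prototype even though both point in the same direction, which is a small instance of the support/query discrepancy the abstract describes.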

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision problems > Image segmentation

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 19, Issue 2s

April 2023

545 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3572861

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 March 2023

Online AM: 12 August 2022

Accepted: 23 July 2022

Revised: 21 July 2022

Received: 28 December 2021

Article Metrics

Total citations: 10
Total downloads: 339
Downloads (last 12 months): 152
Downloads (last 6 weeks): 18

Reflects downloads up to 15 Oct 2024
