
Multi-Content Interaction Network for Few-Shot Segmentation

Published: 08 March 2024
  • Abstract

    Few-Shot Segmentation (FSS) is challenging because only a few support images are available and intra-class appearance varies widely. Most existing approaches focus on aligning the support-query correlations from the same layer of the frozen backbone while neglecting the bias between different tasks and different layers. In this article, we propose a Multi-Content Interaction Network (MCINet) to remedy these issues by exploiting the different contextual information contained in distinct branches and letting these branches interact. Specifically, MCINet improves FSS from three perspectives: (1) boosting the query representations by incorporating independent information from another learnable branch into the features from the frozen backbone, (2) enhancing the support-query correlations by exploiting both same-layer and adjacent-layer features, and (3) refining the predicted results with a multi-scale mask prediction strategy. Experiments on three benchmarks demonstrate that our approach achieves state-of-the-art performance and outperforms the strongest competing methods, especially on the challenging COCO dataset. Code will be released on GitHub (https://github.com/chenhao-zju/mcinet).
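
    To make point (2) concrete, the following is a minimal NumPy sketch of how same-layer and adjacent-layer support-query correlations could be computed. It is an illustration under stated assumptions, not the MCINet implementation: the function names, the cosine-similarity form, the clamping at zero, and the choice to pair query layer l with support layer l+1 are all hypothetical, and consecutive layers are assumed to share channel count and spatial size.

```python
import numpy as np


def cosine_correlation(query_feat, support_feat, eps=1e-8):
    """Dense cosine similarity between all query and support locations.

    query_feat:   array of shape (C, Hq, Wq)
    support_feat: array of shape (C, Hs, Ws)
    returns:      array of shape (Hq*Wq, Hs*Ws), negative values clamped to 0
    """
    channels = query_feat.shape[0]
    q = query_feat.reshape(channels, -1)
    s = support_feat.reshape(channels, -1)
    # L2-normalise each spatial location's feature vector.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    # Assumption: negative similarities are discarded.
    return np.maximum(q.T @ s, 0.0)


def same_and_adjacent_correlations(query_feats, support_feats):
    """Build same-layer (l, l) and adjacent-layer (l, l+1) correlation maps.

    Assumption: every layer in the lists has the same channel count,
    so features from neighbouring layers can be compared directly.
    """
    same = [cosine_correlation(q, s)
            for q, s in zip(query_feats, support_feats)]
    adjacent = [cosine_correlation(query_feats[i], support_feats[i + 1])
                for i in range(len(query_feats) - 1)]
    return same, adjacent


# Toy features standing in for three consecutive backbone layers.
rng = np.random.default_rng(0)
query_feats = [rng.standard_normal((16, 4, 4)) for _ in range(3)]
support_feats = [rng.standard_normal((16, 4, 4)) for _ in range(3)]
same, adjacent = same_and_adjacent_correlations(query_feats, support_feats)
```

    With L layers this sketch yields L same-layer maps and L-1 adjacent-layer maps; how the two sets would then be fused is not described in the abstract and is left out here.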

    Cited By

    • (2024) Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 229–239. DOI: 10.1145/3626772.3657727. Online publication date: 10-Jul-2024
    • (2024) Pixel Matching Network for Cross-Domain Few-Shot Segmentation. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 967–976. DOI: 10.1109/WACV57701.2024.00102. Online publication date: 3-Jan-2024
    • (2023) Multi-Similarity Enhancement Network for Few-Shot Segmentation. IEEE Access 11, 73521–73530. DOI: 10.1109/ACCESS.2023.3295893. Online publication date: 2023

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 6
      June 2024
      715 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3613638
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 March 2024
      Online AM: 03 February 2024
      Accepted: 25 January 2024
      Revised: 25 November 2023
      Received: 06 July 2023
      Published in TOMM Volume 20, Issue 6

      Author Tags

      1. Few-shot semantic segmentation
      2. multi-content interaction
      3. adjacent-layer similarity

      Qualifiers

      • Research-article

      Funding Sources

      • Key R&D Program of Zhejiang Province, China
      • NSFC
      • Science and Technology Innovation 2025 Major Project of Ningbo, China
