research-article

Bridging knowledge distillation gap for few-sample unsupervised semantic segmentation

Published: 18 July 2024

Abstract

Owing to privacy and security concerns and the high cost of labeling images, unsupervised semantic segmentation with very few samples has become a promising direction, yet it remains unexplored. This inspires us to introduce the few-sample unsupervised semantic segmentation task, which is very challenging because only a few unlabeled images are far from sufficient for a segmentation model to generalize. We address this problem from the knowledge distillation perspective by proposing a medium-sized auxiliary network as a bridge, which narrows the semantic knowledge gap between the large teacher network and the small student network. To this end, we develop the Knowledge Distillation Bridge (KDB) framework for few-sample unsupervised semantic segmentation. It consists of a teacher-auxiliary-student architecture that adopts block-wise distillation, encouraging the auxiliary to imitate the teacher and the student to imitate the auxiliary. In this way, the knowledge gap between the source feature distribution and the target one is reduced, allowing the smaller student network to be readily deployed in highly demanding environments. Meanwhile, each channel of a feature map characterizes different semantics, which motivates us to distill the decoder features in a channel-wise manner. Extensive experiments on two benchmarks, Pascal VOC2012 and Cityscapes, demonstrate the promising performance of the proposed method, which strikes a good balance between precision and speed; e.g., it achieves an inference speed of 230 fps for a 512 × 512 image.
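
The teacher-auxiliary-student chain and the channel-wise distillation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions, so everything below is an assumption made for illustration: each channel is turned into a distribution over spatial positions with a temperature softmax and compared with a KL divergence, the same loss is reused for both links of the chain, and all three networks are assumed to emit feature maps of identical shape. The names `channel_wise_kd` and `bridge_loss` are hypothetical and do not come from the paper.

```python
import math
from typing import List

Channel = List[float]       # one feature channel, flattened over H*W positions
FeatureMap = List[Channel]  # C channels


def spatial_log_softmax(channel: Channel, temperature: float = 1.0) -> List[float]:
    """Log-probabilities over the spatial positions of a single channel."""
    scaled = [v / temperature for v in channel]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_norm for v in scaled]


def channel_wise_kd(source: FeatureMap, target: FeatureMap,
                    temperature: float = 1.0) -> float:
    """Assumed loss: mean over channels of KL(source_c || target_c) * T^2.

    `source` is the network being imitated, `target` the network that learns.
    """
    if len(source) != len(target):
        raise ValueError("source and target need the same number of channels")
    total = 0.0
    for src_c, tgt_c in zip(source, target):
        log_p = spatial_log_softmax(src_c, temperature)
        log_q = spatial_log_softmax(tgt_c, temperature)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * total / len(source)


def bridge_loss(teacher: FeatureMap, auxiliary: FeatureMap, student: FeatureMap,
                temperature: float = 1.0) -> float:
    """Two-link chain: the auxiliary imitates the teacher, the student the auxiliary."""
    return (channel_wise_kd(teacher, auxiliary, temperature)
            + channel_wise_kd(auxiliary, student, temperature))


# Toy feature maps: 2 channels, 4 spatial positions each (made-up values).
teacher = [[2.0, 0.5, 0.1, 0.1], [0.1, 0.1, 1.5, 2.5]]
auxiliary = [[1.5, 0.6, 0.2, 0.1], [0.2, 0.1, 1.2, 2.0]]
student = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

print(bridge_loss(teacher, auxiliary, student))
```

In this sketch each link of the chain compares networks that are closer in capacity than the teacher and student are to each other, which is the intuition behind using the auxiliary network as a bridge. In a real model the three networks would usually have different channel counts, so a projection layer (not shown) would be needed before the comparison.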

Published In

Information Sciences: an International Journal, Volume 673, Issue C
Jul 2024
181 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Unsupervised semantic segmentation
  2. Knowledge distillation
  3. Block-wise/channel-wise distillation
  4. Few-sample learning

Qualifiers

  • Research-article
