Research Article · DOI: 10.1145/3581783.3612567

Entropy-based Optimization on Individual and Global Predictions for Semi-Supervised Learning

Published: 27 October 2023

Abstract

Pseudo-labelling-based semi-supervised learning (SSL) has demonstrated remarkable success in enhancing model performance by effectively leveraging large amounts of unlabeled data. However, existing studies focus mainly on rectifying the individual prediction (i.e., pseudo-label) for each unlabeled instance while ignoring the overall prediction statistics from a global perspective. Such neglect may lead to model collapse and performance degradation in SSL, especially in label-scarce scenarios. In this paper, we emphasize the crucial role of global prediction constraints and propose a new SSL method that employs Entropy-based optimization on both Individual and Global predictions of unlabeled instances, dubbed EntInG. Specifically, we propose two criteria for leveraging unlabeled data in SSL: individual prediction entropy minimization (IPEM) and global distribution entropy maximization (GDEM). On the one hand, we show that the current dominant SSL methods can be viewed as an implicit form of IPEM improved by recent augmentation techniques. On the other hand, we construct a new distribution loss to encourage GDEM, which greatly helps produce better pseudo-labels for unlabeled data. Theoretical analysis further shows that both criteria can be derived by enforcing mutual information maximization on unlabeled instances. Despite its simplicity, our method achieves significant accuracy gains on popular SSL classification benchmarks.
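As a sketch of the abstract's mutual-information claim: for model predictions p(y|x_i) over N unlabeled samples, the standard empirical decomposition of the mutual information between inputs and predicted labels splits into exactly the two stated criteria, a global entropy term to maximize (GDEM) and an individual entropy term to minimize (IPEM). The notation below is ours, not taken from the paper:

```latex
I(X;Y) \;=\; H(Y) - H(Y \mid X)
\;\approx\;
\underbrace{H\!\Big(\tfrac{1}{N}\textstyle\sum_{i=1}^{N} p(y \mid x_i)\Big)}_{\text{global entropy: GDEM (maximize)}}
\;-\;
\underbrace{\tfrac{1}{N}\textstyle\sum_{i=1}^{N} H\big(p(y \mid x_i)\big)}_{\text{individual entropy: IPEM (minimize)}}
```

The following is a minimal PyTorch sketch of an unlabeled-data loss built from these two criteria. The function names, the weight `lambda_g`, and the use of a plain softmax over raw logits (in place of the paper's augmentation-based pseudo-labelling and dedicated distribution loss) are illustrative assumptions, not the authors' exact objective:

```python
import torch
import torch.nn.functional as F

def entropy(p, eps=1e-8):
    """Shannon entropy of a probability vector (or a batch of them)."""
    return -(p * torch.log(p + eps)).sum(dim=-1)

def unlabeled_entropy_loss(logits_u, lambda_g=1.0):
    """Sketch of a loss combining the two criteria named in the abstract:
      IPEM: minimize the entropy of each individual prediction;
      GDEM: maximize the entropy of the batch-averaged (global)
            prediction distribution, i.e. minimize its negative."""
    probs = F.softmax(logits_u, dim=-1)   # per-sample class probabilities
    ipem = entropy(probs).mean()          # individual entropies (to minimize)
    global_dist = probs.mean(dim=0)       # marginal prediction distribution
    gdem = entropy(global_dist)           # global entropy (to maximize)
    return ipem - lambda_g * gdem

# Usage: logits for a batch of unlabeled samples (e.g. 64 samples, 10 classes)
logits_u = torch.randn(64, 10)
loss_u = unlabeled_entropy_loss(logits_u)
```

Without the GDEM term, driving every individual entropy to zero admits a degenerate solution in which all samples collapse onto one class; the global term penalizes exactly that collapse, which is the failure mode the abstract attributes to ignoring global prediction statistics.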



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, 9913 pages.
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. distribution entropy maximization
      2. mutual information maximization
      3. prediction entropy minimization
      4. semi-supervised learning


      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023, Ottawa, ON, Canada
