Research Article · DOI: 10.1145/3581783.3612567

Entropy-based Optimization on Individual and Global Predictions for Semi-Supervised Learning

Published: 27 October 2023

Abstract

Pseudo-labelling-based semi-supervised learning (SSL) has demonstrated remarkable success in enhancing model performance by effectively leveraging large amounts of unlabeled data. However, existing studies focus mainly on rectifying the individual prediction (i.e., pseudo-label) for each unlabeled instance while ignoring the overall prediction statistics from a global perspective. Such neglect may lead to model collapse and performance degradation in SSL, especially in label-scarce scenarios. In this paper, we emphasize the crucial role of global prediction constraints and propose a new SSL method that employs Entropy-based optimization on both Individual and Global predictions of unlabeled instances, dubbed EntInG. Specifically, we propose two criteria for leveraging unlabeled data in SSL: individual prediction entropy minimization (IPEM) and global distribution entropy maximization (GDEM). On the one hand, we show that the current dominant SSL methods can be viewed as an implicit form of IPEM improved by recent augmentation techniques. On the other hand, we construct a new distribution loss to encourage GDEM, which greatly helps produce better pseudo-labels for unlabeled data. Theoretical analysis further shows that both criteria can be derived by enforcing mutual information maximization on unlabeled instances. Despite its simplicity, our method achieves significant accuracy gains on popular SSL classification benchmarks.
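As a sketch of the abstract's mutual-information claim: for model predictions p(y|x_i) over N unlabeled samples, the standard empirical decomposition of the mutual information between inputs and predicted labels splits into exactly the two stated criteria, a global entropy term to maximize (GDEM) and an individual entropy term to minimize (IPEM). The notation below is ours, not taken from the paper:

```latex
I(X;Y) \;=\; H(Y) - H(Y \mid X)
\;\approx\;
\underbrace{H\!\Big(\tfrac{1}{N}\textstyle\sum_{i=1}^{N} p(y \mid x_i)\Big)}_{\text{global entropy: GDEM (maximize)}}
\;-\;
\underbrace{\tfrac{1}{N}\textstyle\sum_{i=1}^{N} H\big(p(y \mid x_i)\big)}_{\text{individual entropy: IPEM (minimize)}}
```

The following is a minimal PyTorch sketch of an unlabeled-data loss built from these two criteria. The function names, the weight `lambda_g`, and the use of a plain softmax over raw logits (in place of the paper's augmentation-based pseudo-labelling and dedicated distribution loss) are illustrative assumptions, not the authors' exact objective:

```python
import torch
import torch.nn.functional as F

def entropy(p, eps=1e-8):
    """Shannon entropy of a probability vector (or a batch of them)."""
    return -(p * torch.log(p + eps)).sum(dim=-1)

def unlabeled_entropy_loss(logits_u, lambda_g=1.0):
    """Sketch of a loss combining the two criteria named in the abstract:
      IPEM: minimize the entropy of each individual prediction;
      GDEM: maximize the entropy of the batch-averaged (global)
            prediction distribution, i.e. minimize its negative."""
    probs = F.softmax(logits_u, dim=-1)   # per-sample class probabilities
    ipem = entropy(probs).mean()          # individual entropies (to minimize)
    global_dist = probs.mean(dim=0)       # marginal prediction distribution
    gdem = entropy(global_dist)           # global entropy (to maximize)
    return ipem - lambda_g * gdem

# Usage: logits for a batch of unlabeled samples (e.g. 64 samples, 10 classes)
logits_u = torch.randn(64, 10)
loss_u = unlabeled_entropy_loss(logits_u)
```

Without the GDEM term, driving every individual entropy to zero admits a degenerate solution in which all samples collapse onto one class; the global term penalizes exactly that collapse, which is the failure mode the abstract attributes to ignoring global prediction statistics.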



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, 9913 pages.
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. distribution entropy maximization
      2. mutual information maximization
      3. prediction entropy minimization
      4. semi-supervised learning


      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023, Ottawa, ON, Canada
