DOI: 10.1007/978-3-540-74958-5_42

Analyzing Co-training Style Algorithms

Published: 17 September 2007

Abstract

Co-training is a semi-supervised learning paradigm that trains two learners from two different views and lets the learners label some unlabeled examples for each other. In this paper, we present a new PAC analysis of co-training style algorithms. We show that the co-training process can succeed even without two views, provided that the two learners have a large difference, which explains the success of some co-training style algorithms that do not require two views. Moreover, we theoretically explain why the co-training process cannot improve performance further after a number of rounds, and we present a rough estimate of the appropriate round at which to terminate co-training so as to avoid wasteful learning rounds.
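
The co-training procedure described in the abstract can be sketched briefly in code. The following is a minimal, hypothetical Python sketch (not the authors' implementation), assuming scikit-learn-style classifiers and two pre-split feature views X1 and X2 over the same examples; y is assumed to be a full-length label array in which only the entries at labeled_idx are trusted, and names such as co_train, rounds, and per_round are illustrative choices, not taken from the paper.

import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled_idx, rounds=30, per_round=5, base=GaussianNB()):
    # Hypothetical helper: each round, every learner labels its most confident
    # unlabeled examples and hands them to the other learner's training set.
    # y entries outside labeled_idx are placeholders and are never used as truth.
    views = [np.asarray(X1), np.asarray(X2)]
    y = np.asarray(y)
    pseudo = {i: y[i] for i in labeled_idx}          # index -> (pseudo-)label
    pools = [set(labeled_idx), set(labeled_idx)]     # per-learner labeled pools
    learners = [clone(base), clone(base)]

    for _ in range(rounds):
        newly = [[], []]
        for v in (0, 1):
            idx = sorted(pools[v])
            learners[v].fit(views[v][idx], np.array([pseudo[i] for i in idx]))
            unlabeled = [i for i in range(len(y)) if i not in pools[0] | pools[1]]
            if not unlabeled:
                break
            proba = learners[v].predict_proba(views[v][unlabeled])
            conf = proba.max(axis=1)
            for p in np.argsort(conf)[::-1][:per_round]:        # most confident picks
                label = learners[v].classes_[proba[p].argmax()]
                newly[1 - v].append((unlabeled[p], label))      # hand to the other learner
        added = 0
        for v in (0, 1):
            for i, lab in newly[v]:
                if i not in pseudo:
                    pseudo[i] = lab
                    pools[v].add(i)
                    added += 1
        if added == 0:   # no new confident examples: further rounds would be wasteful
            break
    return learners

The early-exit test at the end loosely reflects the abstract's point that continuing co-training past a certain round brings no further improvement; the paper's actual termination estimate is derived from its PAC analysis rather than from this simple heuristic.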


Published In

ECML '07: Proceedings of the 18th European Conference on Machine Learning
September 2007
805 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Qualifiers

  • Article

