DOI: 10.1145/1015330.1015429

Semi-supervised learning using randomized mincuts

Published: 04 July 2004

Abstract

In many application domains there is a large amount of unlabeled data but only a very limited amount of labeled training data. One general approach that has been explored for utilizing this unlabeled data is to construct a graph on all the data points based on distance relationships among examples, and then to use the known labels to perform some type of graph partitioning. One natural partitioning to use is the minimum cut that agrees with the labeled data (Blum & Chawla, 2001), which can be thought of as giving the most probable label assignment if one views labels as generated according to a Markov Random Field on the graph. Zhu et al. (2003) propose a cut based on a relaxation of this field, and Joachims (2003) gives an algorithm based on finding an approximate min-ratio cut. In this paper, we extend the mincut approach by adding randomness to the graph structure. The resulting algorithm addresses several shortcomings of the basic mincut approach, and can be given theoretical justification from both a Markov random field perspective and from sample complexity considerations. In cases where the graph does not have small cuts for a given classification problem, randomization may not help. However, our experiments on several datasets show that when the structure of the graph supports small cuts, this can result in highly accurate classifiers with good accuracy/coverage tradeoffs. In addition, we are able to achieve good performance with a very simple graph-construction procedure.
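As a concrete illustration of the approach the abstract describes, the sketch below builds a small weighted graph, repeatedly perturbs its edge weights at random (a simplification — the paper randomizes the graph structure itself), finds the minimum cut consistent with the labeled examples via a basic Edmonds-Karp max-flow, and averages the cuts into per-node confidence scores. This is a minimal sketch under stated assumptions, not the authors' implementation; all function names and parameters (`noise`, `trials`) are illustrative.

```python
import random
from collections import defaultdict, deque

def min_cut_source_side(n, edges, sources, sinks):
    """Return the set of nodes on the source side of a minimum cut
    separating `sources` from `sinks`, via Edmonds-Karp max-flow."""
    INF = float("inf")
    cap = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:          # undirected edge -> capacity both ways
        cap[u][v] += w
        cap[v][u] += w
    S, T = n, n + 1                # super-source / super-sink
    for s in sources:
        cap[S][s] = INF            # labeled positives tied to super-source
    for t in sinks:
        cap[t][T] = INF            # labeled negatives tied to super-sink

    def bfs_path():
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == T:
                        return parent
                    queue.append(v)
        return None

    while (parent := bfs_path()) is not None:
        f, v = INF, T              # bottleneck capacity along the path
        while parent[v] is not None:
            f = min(f, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:   # push flow, update residual graph
            u = parent[v]
            cap[u][v] -= f
            cap[v][u] += f
            v = u

    # nodes still reachable from S in the residual graph = source side
    side, queue = {S}, deque([S])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in side:
                side.add(v)
                queue.append(v)
    return side

def randomized_mincut(n, edges, pos, neg, trials=20, noise=0.5, seed=0):
    """Average mincut labelings over randomly perturbed copies of the
    graph; returns, per node, the fraction of trials labeling it positive."""
    rng = random.Random(seed)
    votes = [0] * n
    for _ in range(trials):
        perturbed = [(u, v, w + rng.uniform(0.0, noise)) for u, v, w in edges]
        side = min_cut_source_side(n, perturbed, pos, neg)
        for i in range(n):
            votes[i] += i in side
    return [votes[i] / trials for i in range(n)]

# Two tight 3-node clusters joined by one weak edge;
# node 0 is labeled positive, node 5 negative.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 0.2)]
print(randomized_mincut(6, edges, pos=[0], neg=[5], trials=10, seed=1))
# -> [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

On this toy graph every perturbed copy still cuts the single weak bridge edge, so the averaged votes are unanimous; on graphs with several near-minimum cuts, the per-node fractions become genuine confidence estimates, which is what enables the accuracy/coverage tradeoffs mentioned above.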

References

[1] Benedek, G., & Itai, A. (1991). Learnability with respect to a fixed distribution. Theoretical Computer Science, 86, 377--389.
[2] Blum, A., & Chawla, S. (2001). Learning from labeled and unlabeled data using graph mincuts. Proceedings of the 18th International Conference on Machine Learning (pp. 19--26). Morgan Kaufmann.
[3] Brown, J. I., Hickman, C. A., Sokal, A. D., & Wagner, D. G. (2001). Chromatic roots of generalized theta graphs. Journal of Combinatorial Theory, Series B, 83, 272--297.
[4] Dyer, M., Goldberg, L. A., Greenhill, C., & Jerrum, M. (2000). On the relative complexity of approximate counting problems. Proceedings of APPROX '00, Lecture Notes in Computer Science 1913 (pp. 108--119).
[5] Freund, Y., Mansour, Y., & Schapire, R. (2003). Generalization bounds for averaged classifiers (how to be a Bayesian without believing). To appear in Annals of Statistics. Preliminary version in Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics, 2001.
[6] Greig, D., Porteous, B., & Seheult, A. (1989). Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 51, 271--279.
[7] Hull, J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 550--554.
[8] Jerrum, M., & Sinclair, A. (1993). Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22, 1087--1116.
[9] Joachims, T. (2003). Transductive learning via spectral graph partitioning. Proceedings of the International Conference on Machine Learning (ICML) (pp. 290--297).
[10] Kleinberg, J. (2000). Detecting a network failure. Proceedings of the 41st IEEE Symposium on Foundations of Computer Science (pp. 231--239).
[11] Kleinberg, J., Sandler, M., & Slivkins, A. (2004). Network failure detection and graph connectivity. Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms (pp. 76--85).
[12] Langford, J., & Shawe-Taylor, J. (2002). PAC-Bayes and margins. Neural Information Processing Systems.
[13] McAllester, D. (2003). PAC-Bayesian stochastic model selection. Machine Learning, 51, 5--21.
[14] Zhu, X., Ghahramani, Z., & Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. Proceedings of the 20th International Conference on Machine Learning (pp. 912--919).


Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning, July 2004, 934 pages.
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%.
