DOI: 10.1145/1015330.1015429

Semi-supervised learning using randomized mincuts

Published: 04 July 2004

Abstract

In many application domains there is a large amount of unlabeled data but only a very limited amount of labeled training data. One general approach that has been explored for utilizing this unlabeled data is to construct a graph on all the data points based on distance relationships among examples, and then to use the known labels to perform some type of graph partitioning. One natural partitioning to use is the minimum cut that agrees with the labeled data (Blum & Chawla, 2001), which can be thought of as giving the most probable label assignment if one views labels as generated according to a Markov Random Field on the graph. Zhu et al. (2003) propose a cut based on a relaxation of this field, and Joachims (2003) gives an algorithm based on finding an approximate min-ratio cut. In this paper, we extend the mincut approach by adding randomness to the graph structure. The resulting algorithm addresses several shortcomings of the basic mincut approach, and can be given theoretical justification from both a Markov random field perspective and from sample complexity considerations. In cases where the graph does not have small cuts for a given classification problem, randomization may not help. However, our experiments on several datasets show that when the structure of the graph supports small cuts, this can result in highly accurate classifiers with good accuracy/coverage tradeoffs. In addition, we are able to achieve good performance with a very simple graph-construction procedure.
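As a concrete illustration of the approach the abstract describes, the sketch below builds a small weighted graph, repeatedly perturbs its edge weights at random (a simplification — the paper randomizes the graph structure itself), finds the minimum cut consistent with the labeled examples via a basic Edmonds-Karp max-flow, and averages the cuts into per-node confidence scores. This is a minimal sketch under stated assumptions, not the authors' implementation; all function names and parameters (`noise`, `trials`) are illustrative.

```python
import random
from collections import defaultdict, deque

def min_cut_source_side(n, edges, sources, sinks):
    """Return the set of nodes on the source side of a minimum cut
    separating `sources` from `sinks`, via Edmonds-Karp max-flow."""
    INF = float("inf")
    cap = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:          # undirected edge -> capacity both ways
        cap[u][v] += w
        cap[v][u] += w
    S, T = n, n + 1                # super-source / super-sink
    for s in sources:
        cap[S][s] = INF            # labeled positives tied to super-source
    for t in sinks:
        cap[t][T] = INF            # labeled negatives tied to super-sink

    def bfs_path():
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == T:
                        return parent
                    queue.append(v)
        return None

    while (parent := bfs_path()) is not None:
        f, v = INF, T              # bottleneck capacity along the path
        while parent[v] is not None:
            f = min(f, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:   # push flow, update residual graph
            u = parent[v]
            cap[u][v] -= f
            cap[v][u] += f
            v = u

    # nodes still reachable from S in the residual graph = source side
    side, queue = {S}, deque([S])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in side:
                side.add(v)
                queue.append(v)
    return side

def randomized_mincut(n, edges, pos, neg, trials=20, noise=0.5, seed=0):
    """Average mincut labelings over randomly perturbed copies of the
    graph; returns, per node, the fraction of trials labeling it positive."""
    rng = random.Random(seed)
    votes = [0] * n
    for _ in range(trials):
        perturbed = [(u, v, w + rng.uniform(0.0, noise)) for u, v, w in edges]
        side = min_cut_source_side(n, perturbed, pos, neg)
        for i in range(n):
            votes[i] += i in side
    return [votes[i] / trials for i in range(n)]

# Two tight 3-node clusters joined by one weak edge;
# node 0 is labeled positive, node 5 negative.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 0.2)]
print(randomized_mincut(6, edges, pos=[0], neg=[5], trials=10, seed=1))
# -> [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

On this toy graph every perturbed copy still cuts the single weak bridge edge, so the averaged votes are unanimous; on graphs with several near-minimum cuts, the per-node fractions become genuine confidence estimates, which is what enables the accuracy/coverage tradeoffs mentioned above.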

References

[1] Benedek, G., & Itai, A. (1991). Learnability with respect to a fixed distribution. Theoretical Computer Science, 86, 377--389.
[2] Blum, A., & Chawla, S. (2001). Learning from labeled and unlabeled data using graph mincuts. Proceedings of the 18th International Conference on Machine Learning (pp. 19--26). Morgan Kaufmann.
[3] Brown, J. I., Hickman, C. A., Sokal, A. D., & Wagner, D. G. (2001). Chromatic roots of generalized theta graphs. Journal of Combinatorial Theory, Series B, 83, 272--297.
[4] Dyer, M., Goldberg, L. A., Greenhill, C., & Jerrum, M. (2000). On the relative complexity of approximate counting problems. Proceedings of APPROX '00, Lecture Notes in Computer Science 1913 (pp. 108--119).
[5] Freund, Y., Mansour, Y., & Schapire, R. (2003). Generalization bounds for averaged classifiers (how to be a Bayesian without believing). To appear in Annals of Statistics. Preliminary version in Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics, 2001.
[6] Greig, D., Porteous, B., & Seheult, A. (1989). Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 51, 271--279.
[7] Hull, J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 550--554.
[8] Jerrum, M., & Sinclair, A. (1993). Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22, 1087--1116.
[9] Joachims, T. (2003). Transductive learning via spectral graph partitioning. Proceedings of the International Conference on Machine Learning (ICML) (pp. 290--297).
[10] Kleinberg, J. (2000). Detecting a network failure. Proceedings of the 41st IEEE Symposium on Foundations of Computer Science (pp. 231--239).
[11] Kleinberg, J., Sandler, M., & Slivkins, A. (2004). Network failure detection and graph connectivity. Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms (pp. 76--85).
[12] Langford, J., & Shawe-Taylor, J. (2002). PAC-Bayes and margins. Neural Information Processing Systems.
[13] McAllester, D. (2003). PAC-Bayesian stochastic model selection. Machine Learning, 51, 5--21.
[14] Zhu, X., Ghahramani, Z., & Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. Proceedings of the 20th International Conference on Machine Learning (pp. 912--919).


Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning, July 2004, 934 pages.
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%.
