DOI: 10.1145/3580305.3599491
Research article · Free access

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Published: 04 August 2023
  Abstract

    Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad hoc thresholds, so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors from misclassifying unlabeled positive samples as negatives inevitably arise and may even accumulate during training. Such errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. A similar intuition underlies curriculum learning, which uses only easier cases in the early stages of training before introducing more complex ones. Specifically, we utilize a novel "hardness" measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy then fine-tunes the selection of negative samples, including more "easy" samples in the early stages of training. Extensive experimental validation over a wide range of learning tasks shows that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU.
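
    To make the training strategy concrete, below is a minimal, hypothetical sketch of the easy-first pseudo-negative self-correction loop the abstract describes. The function name easy_first_pu, the pacing parameters, the hardness proxy (the classifier's predicted positive probability), and the linear pacing schedule are all illustrative assumptions rather than the paper's exact formulation; see the linked repository for the authors' implementation.

```python
# A minimal, hypothetical sketch of easy-first pseudo-negative selection for
# PU learning (an assumed simplification, not the paper's exact algorithm).
import numpy as np
from sklearn.linear_model import LogisticRegression


def easy_first_pu(X_pos, X_unl, n_rounds=5, start_frac=0.2, end_frac=0.6, seed=0):
    """Iteratively re-select pseudo-negatives from the unlabeled pool,
    admitting the 'easiest' (most confidently negative) samples first and
    growing the admitted fraction each round (a simple linear pacing)."""
    rng = np.random.default_rng(seed)
    # Round 0: bootstrap with a random subset of the unlabeled data as negatives.
    neg_idx = rng.choice(len(X_unl), size=max(1, int(start_frac * len(X_unl))),
                         replace=False)
    clf = LogisticRegression(max_iter=1000)
    for r in range(n_rounds):
        X = np.vstack([X_pos, X_unl[neg_idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(neg_idx))])
        clf.fit(X, y)
        # Hardness proxy: predicted positive probability on unlabeled samples.
        # Low probability -> 'easy' likely-negative; high -> noisy, deferred.
        hardness = clf.predict_proba(X_unl)[:, 1]
        frac = start_frac + (end_frac - start_frac) * (r + 1) / n_rounds
        k = max(1, int(frac * len(X_unl)))
        neg_idx = np.argsort(hardness)[:k]  # self-correct the negative set
    # Final fit on the last self-corrected negative set.
    X = np.vstack([X_pos, X_unl[neg_idx]])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(neg_idx))])
    clf.fit(X, y)
    return clf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=2.0, size=(100, 5))
    X_unl = np.vstack([rng.normal(loc=2.0, size=(50, 5)),    # hidden positives
                       rng.normal(loc=-2.0, size=(150, 5))])  # true negatives
    model = easy_first_pu(X_pos, X_unl)
    print("predicted positive rate on unlabeled pool:",
          round(float(model.predict(X_unl).mean()), 2))
```

    Deferring high-hardness samples to later rounds is the curriculum-style mechanism the abstract describes: early rounds train only on the unlabeled samples most confidently negative under the current model, so misclassified hidden positives are less likely to be admitted as negatives and compound during training.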

    Supplementary Material

    MP4 File (rtfp1218-2min-promo.mp4)
    Presentation video

    Cited By

    • (2024) Bootstrap Latent Prototypes for graph positive-unlabeled learning. Information Fusion, Vol. 112, 102553. https://doi.org/10.1016/j.inffus.2024.102553 (online December 2024).



      Information

      Published In

      KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2023
      5996 pages
      ISBN: 9798400701030
      DOI: 10.1145/3580305
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. curriculum learning
      2. positive-unlabeled learning



      Conference

      KDD '23

      Acceptance Rates

      Overall acceptance rate: 1,133 of 8,635 submissions, 13%


