DOI: 10.1145/3580305.3599491
Research article · Free access

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Published: 04 August 2023
  Abstract

    Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad hoc thresholds, so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors from misclassifying unlabeled positive samples as negatives inevitably arise and may even accumulate during training. Such errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. A similar intuition underlies curriculum learning, which uses only easier cases in the early stages of training before introducing more complex ones. Specifically, we utilize a novel "hardness" measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy then fine-tunes the selection of negative samples, including more "easy" samples in the early stages of training. Extensive experimental validation over a wide range of learning tasks shows that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU.
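
    To make the training strategy concrete, below is a minimal, hypothetical sketch of the easy-first pseudo-negative self-correction loop the abstract describes. The function name easy_first_pu, the pacing parameters, the hardness proxy (the classifier's predicted positive probability), and the linear pacing schedule are all illustrative assumptions rather than the paper's exact formulation; see the linked repository for the authors' implementation.

```python
# A minimal, hypothetical sketch of easy-first pseudo-negative selection for
# PU learning (an assumed simplification, not the paper's exact algorithm).
import numpy as np
from sklearn.linear_model import LogisticRegression


def easy_first_pu(X_pos, X_unl, n_rounds=5, start_frac=0.2, end_frac=0.6, seed=0):
    """Iteratively re-select pseudo-negatives from the unlabeled pool,
    admitting the 'easiest' (most confidently negative) samples first and
    growing the admitted fraction each round (a simple linear pacing)."""
    rng = np.random.default_rng(seed)
    # Round 0: bootstrap with a random subset of the unlabeled data as negatives.
    neg_idx = rng.choice(len(X_unl), size=max(1, int(start_frac * len(X_unl))),
                         replace=False)
    clf = LogisticRegression(max_iter=1000)
    for r in range(n_rounds):
        X = np.vstack([X_pos, X_unl[neg_idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(neg_idx))])
        clf.fit(X, y)
        # Hardness proxy: predicted positive probability on unlabeled samples.
        # Low probability -> 'easy' likely-negative; high -> noisy, deferred.
        hardness = clf.predict_proba(X_unl)[:, 1]
        frac = start_frac + (end_frac - start_frac) * (r + 1) / n_rounds
        k = max(1, int(frac * len(X_unl)))
        neg_idx = np.argsort(hardness)[:k]  # self-correct the negative set
    # Final fit on the last self-corrected negative set.
    X = np.vstack([X_pos, X_unl[neg_idx]])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(neg_idx))])
    clf.fit(X, y)
    return clf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=2.0, size=(100, 5))
    X_unl = np.vstack([rng.normal(loc=2.0, size=(50, 5)),    # hidden positives
                       rng.normal(loc=-2.0, size=(150, 5))])  # true negatives
    model = easy_first_pu(X_pos, X_unl)
    print("predicted positive rate on unlabeled pool:",
          round(float(model.predict(X_unl).mean()), 2))
```

    Deferring high-hardness samples to later rounds is the curriculum-style mechanism the abstract describes: early rounds train only on the unlabeled samples most confidently negative under the current model, so misclassified hidden positives are less likely to be admitted as negatives and compound during training.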

    Supplementary Material

    MP4 File (rtfp1218-2min-promo.mp4)
    Presentation video

    Cited By

    • (2024) Bootstrap Latent Prototypes for graph positive-unlabeled learning. Information Fusion, Vol. 112, 102553. https://doi.org/10.1016/j.inffus.2024.102553 (online December 2024).



      Information

      Published In

      KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2023
      5996 pages
      ISBN: 9798400701030
      DOI: 10.1145/3580305
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. curriculum learning
      2. positive-unlabeled learning



      Conference

      KDD '23

      Acceptance Rates

      Overall acceptance rate: 1,133 of 8,635 submissions, 13%


