DOI: 10.5555/3367471.3367632

Positive and unlabeled learning with label disambiguation

Published: 10 August 2019

Abstract

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. State-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as both positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so these models inadvertently introduce label noise, which may bias the classifier and degrade performance. To solve this problem, this paper proposes a novel algorithm dubbed "Positive and Unlabeled learning with Label Disambiguation" (PULD). We first regard every unlabeled example in PU learning as ambiguously labeled, i.e., carrying both the positive and negative candidate labels, and then employ a margin-based label disambiguation strategy, which enlarges the margin of the classifier response between the more likely label and the less likely one, to identify the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct extensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of PULD over existing PU learning approaches.
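To make the disambiguation idea concrete, here is a minimal, illustrative sketch in Python. It is not the authors' PULD implementation (which builds the margin enlargement into a single training objective); instead it approximates the mechanism the abstract describes with a simple alternating scheme, assumed here for illustration: fit a classifier on the current pseudo-labels, then re-assign each unlabeled example the candidate label with the larger classifier response. The function name and the logistic-regression base learner are hypothetical choices, not taken from the paper.

```python
# Illustrative sketch of margin-based label disambiguation for PU data.
# Assumption: a simple alternating scheme stands in for PULD's single
# margin-based optimization; all names below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def disambiguate_pu_labels(X_pos, X_unl, n_iters=10):
    """Fit a classifier from positives (X_pos) and unlabeled data (X_unl),
    returning the classifier and a +1/-1 label for each unlabeled example."""
    X = np.vstack([X_pos, X_unl])
    n_pos = len(X_pos)
    # Labeled positives are fixed at +1; unlabeled examples start at -1,
    # reflecting that each of them carries both candidate labels.
    y = np.concatenate([np.ones(n_pos), -np.ones(len(X_unl))])
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_iters):
        clf.fit(X, y)
        # The signed decision value f(x) compares the two candidate labels:
        # sign(f) picks the more likely one, |f| is the margin between them.
        new_unl = np.where(clf.decision_function(X_unl) >= 0, 1.0, -1.0)
        if np.array_equal(new_unl, y[n_pos:]):
            break  # pseudo-labels are stable: disambiguation has converged
        y[n_pos:] = new_unl
        if len(np.unique(y)) < 2:
            break  # every example now shares one label; nothing to separate
    return clf, y[n_pos:]
```

In PULD itself, per the abstract, the margin between the two candidate labels is enlarged inside the training objective rather than through a post-hoc relabeling loop like the one above, which is what permits the generalization analysis via Rademacher complexity.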


Cited By

  • Fraud Detection under Multi-Sourced Extremely Noisy Annotations. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 2497-2506, October 2021. DOI: 10.1145/3459637.3482433
  • Learning from positive and unlabeled data with arbitrary positive shift. Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 13088-13099, December 2020. DOI: 10.5555/3495724.3496822

Published In

IJCAI'19: Proceedings of the 28th International Joint Conference on Artificial Intelligence
August 2019
6589 pages
ISBN: 9780999241141

Sponsors

  • Sony Corporation
  • Huawei Technologies Co. Ltd.
  • Baidu Research
  • The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)
  • Lenovo

Publisher

AAAI Press
