DOI: 10.5555/3367471.3367632

Positive and unlabeled learning with label disambiguation

Published: 10 August 2019

Abstract

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. State-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as both positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so these models inadvertently introduce label noise, which may bias the classifier and degrade performance. To solve this problem, this paper proposes a novel algorithm dubbed "Positive and Unlabeled learning with Label Disambiguation" (PULD). We first regard every unlabeled example in PU learning as ambiguously labeled, i.e., carrying both the positive and negative candidate labels, and then employ a margin-based label disambiguation strategy, which enlarges the margin of the classifier response between the more likely label and the less likely one, to identify the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct extensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of PULD over existing PU learning approaches.
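To make the disambiguation idea concrete, here is a minimal, illustrative sketch in Python. It is not the authors' PULD implementation (which builds the margin enlargement into a single training objective); instead it approximates the mechanism the abstract describes with a simple alternating scheme, assumed here for illustration: fit a classifier on the current pseudo-labels, then re-assign each unlabeled example the candidate label with the larger classifier response. The function name and the logistic-regression base learner are hypothetical choices, not taken from the paper.

```python
# Illustrative sketch of margin-based label disambiguation for PU data.
# Assumption: a simple alternating scheme stands in for PULD's single
# margin-based optimization; all names below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def disambiguate_pu_labels(X_pos, X_unl, n_iters=10):
    """Fit a classifier from positives (X_pos) and unlabeled data (X_unl),
    returning the classifier and a +1/-1 label for each unlabeled example."""
    X = np.vstack([X_pos, X_unl])
    n_pos = len(X_pos)
    # Labeled positives are fixed at +1; unlabeled examples start at -1,
    # reflecting that each of them carries both candidate labels.
    y = np.concatenate([np.ones(n_pos), -np.ones(len(X_unl))])
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_iters):
        clf.fit(X, y)
        # The signed decision value f(x) compares the two candidate labels:
        # sign(f) picks the more likely one, |f| is the margin between them.
        new_unl = np.where(clf.decision_function(X_unl) >= 0, 1.0, -1.0)
        if np.array_equal(new_unl, y[n_pos:]):
            break  # pseudo-labels are stable: disambiguation has converged
        y[n_pos:] = new_unl
        if len(np.unique(y)) < 2:
            break  # every example now shares one label; nothing to separate
    return clf, y[n_pos:]
```

In PULD itself, per the abstract, the margin between the two candidate labels is enlarged inside the training objective rather than through a post-hoc relabeling loop like the one above, which is what permits the generalization analysis via Rademacher complexity.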


Cited By

  • Fraud Detection under Multi-Sourced Extremely Noisy Annotations. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 2497-2506, October 2021. DOI: 10.1145/3459637.3482433
  • Learning from positive and unlabeled data with arbitrary positive shift. Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 13088-13099, December 2020. DOI: 10.5555/3495724.3496822

Published In

IJCAI'19: Proceedings of the 28th International Joint Conference on Artificial Intelligence
August 2019
6589 pages
ISBN: 9780999241141

Sponsors

  • Sony Corporation
  • Huawei Technologies Co. Ltd.
  • Baidu Research
  • The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)
  • Lenovo

Publisher

AAAI Press
