DOI: 10.1145/3534678.3539351

Adaptive Learning for Weakly Labeled Streams

Published: 14 August 2022

Abstract

In many real-world applications, data are collected in a streaming fashion, and accurate labels are hard to obtain. For instance, in environmental monitoring, sensors collect data continuously, yet labels are scarce because annotation requires human effort and may contain errors. This paper investigates the problem of learning with weakly labeled data streams, where data arrive continuously and only a limited subset of the stream is labeled, potentially with noise. This setting is of great practical importance but rarely studied in the literature. When data are constantly gathered with unknown label noise, it is quite challenging to design algorithms that obtain a well-generalized classifier. To address this difficulty, we propose a novel noise transition matrix estimation approach for data streams with scarce noisy labels, based on online anchor point identification. Building on it, we propose an adaptive learning algorithm for weakly labeled data streams via model reuse, which effectively alleviates the negative influence of label noise with the help of unlabeled data. Both theoretical analysis and extensive experiments justify and validate the effectiveness of the proposed approach.
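To illustrate the anchor-point idea behind noise transition matrix estimation, here is a simplified batch sketch (not the paper's streaming algorithm; the function name and the simulated posteriors are hypothetical). For each class i, the instance with the largest estimated noisy posterior is treated as an anchor, i.e., an instance believed to belong to class i with probability close to one; its full posterior row then approximates the i-th row of the transition matrix T, where T[i, j] = P(noisy label = j | clean label = i).

```python
import numpy as np

def estimate_transition_matrix(noisy_posteriors):
    """Estimate the label-noise transition matrix via anchor points.

    noisy_posteriors: (n, k) array of a model's estimates of
    P(noisy label = j | x) for each of n instances and k classes.
    In the paper's setting these estimates would be maintained
    online over a stream; here we assume a fixed batch.
    """
    k = noisy_posteriors.shape[1]
    T = np.zeros((k, k))
    for i in range(k):
        # Anchor point for class i: the instance most confidently
        # assigned noisy label i, taken as P(clean = i | x) ~ 1.
        anchor = np.argmax(noisy_posteriors[:, i])
        T[i] = noisy_posteriors[anchor]
    # Normalize rows so each is a valid conditional distribution.
    return T / T.sum(axis=1, keepdims=True)

# Hypothetical sanity check: simulate noisy posteriors from a known T.
T_true = np.array([[0.8, 0.2],
                   [0.3, 0.7]])
clean_posteriors = np.array([[0.99, 0.01],   # near-anchor for class 0
                             [0.01, 0.99],   # near-anchor for class 1
                             [0.50, 0.50]])
noisy_posteriors = clean_posteriors @ T_true
T_hat = estimate_transition_matrix(noisy_posteriors)
```

Given the transition matrix, the standard use (e.g., in the loss-correction literature) is to reweight or correct the training loss so that minimizing it on noisy labels is consistent with the clean distribution.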


Cited By

  • (2024) Learning with Asynchronous Labels. ACM Transactions on Knowledge Discovery from Data 18(8):1-27. DOI: 10.1145/3662186. Online publication date: 31-Jul-2024.


Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data stream
  2. noisy labels
  3. weakly supervised learning

Qualifiers

  • Research-article

Funding Sources

  • NSFC

Conference

KDD '22

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
