DOI: 10.1609/aaai.v37i9.26301

STARS: Spatial-Temporal Active Re-Sampling for Label-Efficient Learning from Noisy Annotations

Published: 07 February 2023
Abstract

    Active learning (AL) aims to sample the most informative data instances for labeling, making model fitting data-efficient while significantly reducing annotation cost. However, most existing AL models make the strong assumption that annotated instances are always assigned correct labels, which may not hold in many practical settings. In this paper, we develop a theoretical framework to formally analyze the impact of noisy annotations in AL and show that systematic re-sampling is guaranteed to reduce the noise rate, which can lead to improved generalization. More importantly, the framework demonstrates the key benefit of active re-sampling for label-efficient learning, which is critical for AL. The theoretical results also suggest the essential properties of an active re-sampling function with fast convergence and guaranteed error reduction. This inspires us to design a novel spatial-temporal active re-sampling function that leverages important spatial and temporal properties of maximum-margin classifiers. Extensive experiments on both synthetic and real-world data clearly demonstrate the effectiveness of the proposed active re-sampling function.
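
    The abstract above describes the approach only at a high level. As a purely illustrative aid (this page carries no code, and the sketch below is not the authors' implementation), one round of margin-based active learning with re-sampling could look roughly as follows, assuming a binary task with labels in {-1, +1}, a hypothetical oracle callable that (re-)labels points, and scikit-learn's LinearSVC standing in for a generic maximum-margin classifier. The combined spatial score (proximity to the decision boundary) and temporal score (prediction flips across rounds) is an assumed stand-in for the paper's re-sampling function, not its actual definition.

    # Hypothetical sketch only; names, signatures, and the scoring rule are assumptions.
    import numpy as np
    from sklearn.svm import LinearSVC

    def spatial_scores(clf, X):
        # Spatial cue: proximity to the max-margin decision boundary.
        # Smaller |decision_function| -> closer to the boundary -> higher score.
        return -np.abs(clf.decision_function(X))

    def temporal_scores(pred_history):
        # Temporal cue: prediction instability over recent AL rounds.
        # pred_history: (n_rounds >= 2, n_samples) array of {-1, +1} predictions.
        return np.mean(pred_history[:-1] != pred_history[1:], axis=0)  # per-sample flip rate

    def active_round(X_lab, y_lab, X_pool, pred_history, oracle,
                     k_resample=10, k_query=10, alpha=0.5):
        clf = LinearSVC().fit(X_lab, y_lab)

        # Re-sampling step: re-query annotations whose combined
        # spatial-temporal score flags them as likely mislabeled.
        score = (alpha * spatial_scores(clf, X_lab)
                 + (1 - alpha) * temporal_scores(pred_history))
        resample_idx = np.argsort(-score)[:k_resample]
        y_lab = y_lab.copy()
        y_lab[resample_idx] = oracle(X_lab[resample_idx])

        # Standard AL step: request labels for the most uncertain pool points.
        query_idx = np.argsort(-spatial_scores(clf, X_pool))[:k_query]
        return y_lab, query_idx

    In this toy form, alpha trades off the two cues. The paper's theoretical results concern which properties such a re-sampling function must satisfy (fast convergence, guaranteed error reduction); the sketch does not attempt to reproduce those guarantees.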


    Published In

    AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence
    February 2023
    16496 pages
    ISBN:978-1-57735-880-0

    Sponsors

    • Association for the Advancement of Artificial Intelligence

    Publisher

    AAAI Press
