DOI: 10.1609/aaai.v37i9.26301

STARS: Spatial-Temporal Active Re-Sampling for Label-Efficient Learning from Noisy Annotations

Published: 07 February 2023
Abstract

    Active learning (AL) aims to sample the most informative data instances for labeling, making model fitting data-efficient while significantly reducing annotation cost. However, most existing AL models make the strong assumption that annotated instances are always assigned correct labels, which may not hold in many practical settings. In this paper, we develop a theoretical framework to formally analyze the impact of noisy annotations in AL and show that systematic re-sampling is guaranteed to reduce the noise rate, which can lead to improved generalization. More importantly, the framework demonstrates the key benefit of active re-sampling for label-efficient learning, which is critical for AL. The theoretical results also suggest the essential properties of an active re-sampling function with fast convergence and guaranteed error reduction. This inspires us to design a novel spatial-temporal active re-sampling function that leverages important spatial and temporal properties of maximum-margin classifiers. Extensive experiments on both synthetic and real-world data clearly demonstrate the effectiveness of the proposed active re-sampling function.
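
    The abstract above describes the approach only at a high level. As a purely illustrative aid (this page carries no code, and the sketch below is not the authors' implementation), one round of margin-based active learning with re-sampling could look roughly as follows, assuming a binary task with labels in {-1, +1}, a hypothetical oracle callable that (re-)labels points, and scikit-learn's LinearSVC standing in for a generic maximum-margin classifier. The combined spatial score (proximity to the decision boundary) and temporal score (prediction flips across rounds) is an assumed stand-in for the paper's re-sampling function, not its actual definition.

    # Hypothetical sketch only; names, signatures, and the scoring rule are assumptions.
    import numpy as np
    from sklearn.svm import LinearSVC

    def spatial_scores(clf, X):
        # Spatial cue: proximity to the max-margin decision boundary.
        # Smaller |decision_function| -> closer to the boundary -> higher score.
        return -np.abs(clf.decision_function(X))

    def temporal_scores(pred_history):
        # Temporal cue: prediction instability over recent AL rounds.
        # pred_history: (n_rounds >= 2, n_samples) array of {-1, +1} predictions.
        return np.mean(pred_history[:-1] != pred_history[1:], axis=0)  # per-sample flip rate

    def active_round(X_lab, y_lab, X_pool, pred_history, oracle,
                     k_resample=10, k_query=10, alpha=0.5):
        clf = LinearSVC().fit(X_lab, y_lab)

        # Re-sampling step: re-query annotations whose combined
        # spatial-temporal score flags them as likely mislabeled.
        score = (alpha * spatial_scores(clf, X_lab)
                 + (1 - alpha) * temporal_scores(pred_history))
        resample_idx = np.argsort(-score)[:k_resample]
        y_lab = y_lab.copy()
        y_lab[resample_idx] = oracle(X_lab[resample_idx])

        # Standard AL step: request labels for the most uncertain pool points.
        query_idx = np.argsort(-spatial_scores(clf, X_pool))[:k_query]
        return y_lab, query_idx

    In this toy form, alpha trades off the two cues. The paper's theoretical results concern which properties such a re-sampling function must satisfy (fast convergence, guaranteed error reduction); the sketch does not attempt to reproduce those guarantees.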


    Published In

    AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence
    February 2023
    16496 pages
    ISBN:978-1-57735-880-0

    Sponsors

    • Association for the Advancement of Artificial Intelligence

    Publisher

    AAAI Press
