Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos

Authors

  • Zixiao Wang, Tsinghua University
  • Junwu Weng, Tencent AI Lab
  • Chun Yuan, Tsinghua University
  • Jue Wang, Tencent AI Lab

DOI:

https://doi.org/10.1609/aaai.v37i3.25375

Keywords:

CV: Video Understanding & Activity Analysis

Abstract

Learning with noisy labels is a classic problem that has been extensively studied for image tasks, but far less so for video. A straightforward migration from images to videos that ignores temporal semantics and computational cost is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method dubbed Channel Truncation for feature-based label noise detection, which selects the most discriminative channels to split clean and noisy instances in each category; and 2) a novel contrastive strategy dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms the existing baselines. By reducing the feature dimension to 10% of its original size, our method achieves an over-0.4 noise-detection F1-score and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.
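The channel-truncation-and-split idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: it assumes channels are ranked by mean activation within a category (one plausible notion of "discriminative"), keeps the top 10% of channels, and splits instances by distance to the class prototype in the truncated feature space; the function name and the median threshold are hypothetical choices for illustration.

```python
import numpy as np

def truncate_and_split(features, keep_frac=0.1):
    """Illustrative sketch of per-class channel truncation (not the paper's exact method).

    features: (N, C) array of features for instances sharing one (possibly noisy) label.
    keep_frac: fraction of channels to keep (the paper reports using ~10%).
    Returns a boolean mask marking instances judged 'clean'.
    """
    n, c = features.shape
    keep = max(1, int(c * keep_frac))
    # Rank channels by mean activation over this category and keep the
    # most responsive ones -- an assumed proxy for "most discriminative".
    order = np.argsort(features.mean(axis=0))[::-1]
    truncated = features[:, order[:keep]]
    # Split on distance to the class prototype in the truncated space:
    # instances far from the prototype are flagged as label noise.
    prototype = truncated.mean(axis=0)
    dist = np.linalg.norm(truncated - prototype, axis=1)
    return dist <= np.median(dist)  # hypothetical split criterion
```

Working in a 10%-dimensional subspace is what makes the detection lightweight: the distance computation and the split both run on the truncated features rather than the full channel set.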

Published

2023-06-26

How to Cite

Wang, Z., Weng, J., Yuan, C., & Wang, J. (2023). Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2751-2758. https://doi.org/10.1609/aaai.v37i3.25375

Section

AAAI Technical Track on Computer Vision III