DOI: 10.1145/3696409.3700195
Research article

CFRL: Coarse-Fine Decoupled Representation Learning For Long-Tailed Recognition

Published: 28 December 2024

Abstract

Real-world data often exhibits severe class imbalance: the number of instances per class varies greatly, following a long-tailed distribution. In this setting, directly applying supervised learning yields poor performance. Existing long-tailed recognition (LTR) methods typically address the imbalance with image-level, end-to-end resampling strategies that rely heavily on label information, enhancing tail-class accuracy at the expense of head classes. Moreover, they neglect label bias, which can severely degrade an LTR model's accuracy. In this paper, we propose a novel approach, Coarse-Fine Decoupled Representation Learning (CFRL), for LTR. Our core idea is to decouple representation learning from the classifier and to decompose representation learning into two stages: image-level and patch-level. Specifically, in the image-level stage, we leverage unsupervised learning on image-level information to reduce the label bias caused by imbalanced datasets. In the patch-level stage, we introduce patch-level rotation augmentation as negative samples, forcing the model to acquire more comprehensive information. Our theoretical and empirical analyses demonstrate that the approach significantly reduces overfitting on tail classes without sacrificing head-class accuracy, improving both. We achieve state-of-the-art results on the CIFAR, ImageNet, and iNaturalist datasets. Furthermore, we show that this training methodology can be combined with various existing LTR methods to further enhance their performance.
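To make the patch-level stage concrete, the sketch below shows one plausible way to generate a patch-level rotated negative: split an image into non-overlapping patches and rotate each by a random multiple of 90°, so the global layout survives but local structure is scrambled. This is an illustrative assumption about the augmentation, not the paper's exact implementation; the function name, patch size, and NumPy-based image layout are all hypothetical.

```python
import numpy as np

def patch_rotate_negative(img, patch=4, rng=None):
    """Build a patch-level rotated view of `img` (H, W, C).

    Each non-overlapping patch x patch block is rotated by a random
    multiple of 90 degrees (never the identity), producing an image
    whose per-patch pixel content is preserved but whose local
    structure is disrupted -- usable as a negative sample in
    contrastive training.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    out = np.empty_like(img)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            k = rng.integers(1, 4)  # rotate by 90, 180, or 270 degrees
            # Read from the untouched input and write to `out`, so no
            # aliasing occurs between the rotated view and its target.
            out[i:i + patch, j:j + patch] = np.rot90(img[i:i + patch, j:j + patch], k)
    return out
```

Because each patch is only permuted spatially, the multiset of pixel values is unchanged; only their arrangement differs, which is what makes the view a plausible "same statistics, wrong structure" negative.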

Supplemental Material

PDF file: supplementary material of paper 35



Published In

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia
December 2024, 939 pages
ISBN: 9798400712739
DOI: 10.1145/3696409

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Long Tail Recognition
    2. Representation learning

    Qualifiers

    • Research-article

Conference

MMAsia '24: ACM Multimedia Asia
December 3–6, 2024, Auckland, New Zealand

    Acceptance Rates

    Overall Acceptance Rate 59 of 204 submissions, 29%
