We have released Deep Long-Tailed Learning: A Survey and our codebase to the community. The survey reviews recent advances in long-tailed learning based on deep neural networks. Existing long-tailed learning studies can be grouped into three main categories (class re-balancing, information augmentation, and module improvement), which can be further divided into nine sub-categories (as shown in the figure below). We also provide an empirical analysis of several state-of-the-art methods, evaluating to what extent they address class imbalance. The survey concludes by highlighting important applications of deep long-tailed learning and identifying several promising directions for future research.
After completing this survey, we decided to release our long-tailed learning resources and codebase, hoping to support further development in the community. If you have any questions or suggestions, please feel free to contact us.
Symbol | Sampling | CSL | LA | TL | Aug |
---|---|---|---|---|---|
Type | Re-sampling | Class-sensitive Learning | Logit Adjustment | Transfer Learning | Data Augmentation |
Symbol | RL | CD | DT | Ensemble | other |
---|---|---|---|---|---|
Type | Representation Learning | Classifier Design | Decoupled Training | Ensemble Learning | Other Types |
Title | Venue | Year | Type | Code |
---|---|---|---|---|
Meta-weight-net: Learning an explicit mapping for sample weighting | NeurIPS | 2019 | CSL | Official |
Learning imbalanced datasets with label-distribution-aware margin loss | NeurIPS | 2019 | CSL | Official |
Dynamic curriculum learning for imbalanced data classification | ICCV | 2019 | Sampling | |
Class-balanced loss based on effective number of samples | CVPR | 2019 | CSL | Official |
Striking the right balance with uncertainty | CVPR | 2019 | CSL | |
Feature transfer learning for face recognition with under-represented data | CVPR | 2019 | TL, Aug | |
Unequal-training for deep face recognition with long-tailed noisy data | CVPR | 2019 | RL | Official |
Large-scale long-tailed recognition in an open world | CVPR | 2019 | RL | Official |
Title | Venue | Year | Type | Code |
---|---|---|---|---|
Large scale fine-grained categorization and domain-specific transfer learning | CVPR | 2018 | TL | Official |
Title | Venue | Year | Type | Code |
---|---|---|---|---|
Learning to model the tail | NeurIPS | 2017 | CSL | |
Focal loss for dense object detection | ICCV | 2017 | CSL | |
Range loss for deep face recognition with long-tailed training data | ICCV | 2017 | RL | |
Class rectification hard mining for imbalanced deep learning | ICCV | 2017 | RL | |
Title | Venue | Year | Type | Code |
---|---|---|---|---|
Learning deep representation for imbalanced classification | CVPR | 2016 | Sampling, RL | |
Factors in finetuning deep model for object detection with long-tail distribution | CVPR | 2016 | CSL, RL | |
Dataset | Long-tailed Task | # Class | # Training data | # Test data |
---|---|---|---|---|
ImageNet-LT | Classification | 1,000 | 115,846 | 50,000 |
CIFAR100-LT | Classification | 100 | 50,000 | 10,000 |
Places-LT | Classification | 365 | 62,500 | 36,500 |
iNaturalist 2018 | Classification | 8,142 | 437,513 | 24,426 |
LVIS v0.5 | Detection and Segmentation | 1,230 | 57,000 | 20,000 |
LVIS v1 | Detection and Segmentation | 1,203 | 100,000 | 19,800 |
VOC-LT | Multi-label Classification | 20 | 1,142 | 4,952 |
COCO-LT | Multi-label Classification | 80 | 1,909 | 5,000 |
VideoLT | Video Classification | 1,004 | 179,352 | 25,622 |
- To use our codebase, please install the requirements:
    pip install -r requirements.txt
- Hardware requirements: 4 GPUs with >= 23 GB of GPU RAM are recommended.
- ImageNet-LT dataset: please download the ImageNet-1K dataset and place it in the ./data directory.
    data
    └── ImageNet
        ├── train
        └── val
- Softmax:
    cd ./Main-codebase
    Training: python3 main.py --seed 1 --cfg config/ImageNet_LT/ce.yaml --exp_name imagenet/CE --gpu 0,1,2,3
- Weighted Softmax:
    cd ./Main-codebase
    Training: python3 main.py --seed 1 --cfg config/ImageNet_LT/weighted_ce.yaml --exp_name imagenet/weighted_ce --gpu 0,1,2,3
- ESQL (Equalization loss):
    cd ./Main-codebase
    Training: python3 main.py --seed 1 --cfg config/ImageNet_LT/seql.yaml --exp_name imagenet/seql --gpu 0,1,2,3
- Balanced Softmax:
    cd ./Main-codebase
    Training: python3 main.py --seed 1 --cfg config/ImageNet_LT/balanced_softmax.yaml --exp_name imagenet/BS --gpu 0,1,2,3
- LADE:
    cd ./Main-codebase
    Training: python3 main.py --seed 1 --cfg config/ImageNet_LT/lade.yaml --exp_name imagenet/LADE --gpu 0,1,2,3
- De-confound (Causal):
    cd ./Main-codebase
    Training: python3 main.py --seed 1 --cfg config/ImageNet_LT/causal.yaml --exp_name imagenet/causal --remine_lambda 0.1 --alpha 0.005 --gpu 0,1,2,3
- Decouple (IB-CRT):
    cd ./Main-codebase
    Training stage 1: python3 main.py --seed 1 --cfg config/ImageNet_LT/ce.yaml --exp_name imagenet/CE --gpu 0,1,2,3
    Training stage 2: python3 main.py --cfg ./config/ImageNet_LT/cls_crt.yaml --model_dir exp_results/imagenet/CE/final_model_checkpoint.pth --gpu 0,1,2,3
- MiSLAS:
    cd ./MiSLAS-codebase
    Training stage 1: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_stage1.py --cfg config/imagenet/imagenet_resnext50_stage1_mixup.yaml
    Training stage 2: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_stage2.py --cfg config/imagenet/imagenet_resnext50_stage2_mislas.yaml resume checkpoint_path
    Evaluation: CUDA_VISIBLE_DEVICES=0 python3 eval.py --cfg ./config/imagenet/imagenet_resnext50_stage2_mislas.yaml resume checkpoint_path_stage2
- RSG:
    cd ./RSG-codebase
    Training: python3 imagenet_lt_train.py
    Evaluation: python3 imagenet_lt_test.py
- ResLT:
    cd ./ResLT-codebase
    Training: CUDA_VISIBLE_DEVICES=0,1,2,3 bash sh/X50.sh
    Evaluation: CUDA_VISIBLE_DEVICES=0 bash sh/X50_eval.sh  # The test performance can be found in the log file.
- PaCo:
    cd ./PaCo-codebase
    Training: CUDA_VISIBLE_DEVICES=0,1,2,3 bash sh/ImageNetLT_train_X50.sh
    Evaluation: CUDA_VISIBLE_DEVICES=0 bash sh/ImageNetLT_eval_X50.sh  # The test performance can be found in the log file.
- LDAM:
    cd ./Ensemble-codebase
    Training: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py -c ./configs/config_imagenet_lt_resnext50_ldam.json
    Evaluation: CUDA_VISIBLE_DEVICES=0 python3 test.py -r checkpoint_path
- RIDE:
    cd ./Ensemble-codebase
    Training: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py -c ./configs/config_imagenet_lt_resnext50_ride.json
    Evaluation: CUDA_VISIBLE_DEVICES=0 python3 test.py -r checkpoint_path
- SADE:
    cd ./Ensemble-codebase
    Training: CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py -c ./configs/config_imagenet_lt_resnext50_sade.json
    Evaluation: CUDA_VISIBLE_DEVICES=0 python3 test.py -r checkpoint_path
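Several of the loss-based entries in the list above (Softmax, Weighted Softmax, Balanced Softmax) differ mainly in how class frequencies enter the cross-entropy. The sketch below is not taken from the codebase. It is a minimal pure-Python illustration that assumes inverse-frequency class weights for Weighted Softmax and a log-prior shift of the logits for Balanced Softmax; the function names, the weighting scheme and the toy class counts are hypothetical, and the actual implementations may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def softmax_ce(logits, label):
    """Standard cross-entropy for a single sample."""
    return -log_softmax(logits)[label]


def weighted_softmax_ce(logits, label, class_counts):
    """Cross-entropy scaled by an inverse-frequency class weight.

    Assumption: weight_c = N / (C * n_c), which equals 1 for balanced data.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    weight = total / (num_classes * class_counts[label])
    return weight * softmax_ce(logits, label)


def balanced_softmax_ce(logits, label, class_counts):
    """Cross-entropy on logits shifted by the log class prior.

    Assumption: each logit z_c becomes z_c + log(n_c / N) during training.
    """
    total = sum(class_counts)
    shifted = [z + math.log(n / total) for z, n in zip(logits, class_counts)]
    return softmax_ce(shifted, label)


# Toy long-tailed setting: one head class, one medium class, one tail class.
class_counts = [900, 90, 10]
logits = [2.0, 1.0, 0.5]

for label in range(len(class_counts)):
    print(
        label,
        round(softmax_ce(logits, label), 4),
        round(weighted_softmax_ce(logits, label, class_counts), 4),
        round(balanced_softmax_ce(logits, label, class_counts), 4),
    )
```

In this toy setting, both variants penalize a sample from the tail class more heavily than plain cross-entropy does, and both reduce to plain cross-entropy when all classes have the same number of samples.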
- We evaluate several state-of-the-art methods on ImageNet-LT to see to what extent they handle class imbalance, using two new evaluation metrics: UA (upper bound accuracy) and RA (relative accuracy). We categorize these methods into class re-balancing (CR), information augmentation (IA) and module improvement (MI).
- Almost all long-tailed methods perform better than the Softmax baseline in terms of accuracy, which demonstrates the effectiveness of long-tailed learning.
- Training for 200 epochs leads to better performance for most long-tailed methods, since sufficient training enables deep models to fit the data better and learn better image representations.
- In addition to accuracy, we also evaluate long-tailed methods based on UA and RA. For methods with higher UA, the performance gain comes not only from alleviating class imbalance but also from other factors, such as data augmentation or better network architectures. Accuracy alone is therefore not a sufficient measure of how well a method handles class imbalance; our proposed RA metric complements it by reducing the influence of factors other than class imbalance.
- For example, MiSLAS, which is based on data mixup, has higher accuracy than Balanced Softmax under 90 training epochs, but it also has higher UA. As a result, the relative accuracy of MiSLAS is lower than that of Balanced Softmax, which means that Balanced Softmax alleviates class imbalance better than MiSLAS under 90 training epochs.
- Although some recent high-accuracy methods have lower RA, the overall development trend of long-tailed learning is still positive, as shown in the figure below.
- The current state-of-the-art long-tailed method in terms of both accuracy and RA is SADE (ensemble-based method).
- We further evaluate the performance of different cost-sensitive learning losses based on the decoupled training scheme.
- Decoupled training, compared to joint training, further improves the overall performance of most cost-sensitive learning methods, with the exception of Balanced Softmax (BS).
- Although BS outperforms the other cost-sensitive losses under one-stage training, they perform comparably under decoupled training. This implies that, although these cost-sensitive losses perform differently under joint training, they learn feature representations of similar quality.
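The comparison between accuracy and RA in the analysis above can be made concrete with a small sketch. It assumes RA is computed as a method's accuracy divided by its UA; please consult the survey for the exact definition. The method names and numbers below are invented purely for illustration and are not results from the survey.

```python
def relative_accuracy(accuracy, upper_bound_accuracy):
    """Relative accuracy (RA), assumed here to be accuracy / UA."""
    if upper_bound_accuracy <= 0:
        raise ValueError("upper_bound_accuracy must be positive")
    return accuracy / upper_bound_accuracy


# Hypothetical numbers: method_A has the higher accuracy, but also a higher UA.
methods = {
    "method_A": {"acc": 52.0, "ua": 66.0},
    "method_B": {"acc": 51.0, "ua": 62.0},
}

for name, scores in methods.items():
    ra = relative_accuracy(scores["acc"], scores["ua"])
    print(f"{name}: acc={scores['acc']:.1f} ua={scores['ua']:.1f} ra={ra:.3f}")
```

Under this assumed definition, a method can lead on accuracy while trailing on RA, which is the pattern described above for MiSLAS and Balanced Softmax.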
If this repository is helpful to you, please cite our survey.
@article{zhang2023deep,
title={Deep long-tailed learning: A survey},
author={Zhang, Yifan and Kang, Bingyi and Hooi, Bryan and Yan, Shuicheng and Feng, Jiashi},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
publisher={IEEE}
}