research-article

SWRM: Similarity Window Reweighting and Margin for Long-Tailed Recognition

Authors:

Qingfa LiuAuthors Info & Claims

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 6

Article No.: 181, Pages 1 - 18

https://doi.org/10.1145/3643816

Published: 08 March 2024 Publication History

Abstract

Real-world data usually obeys a long-tailed distribution, where a few classes have higher number of samples compared to the other classes. Recent studies have been proposed to alleviate the extreme data imbalance from different perspectives. In this article, we experimentally find that due to the easily confusing visual features between some head- and tail classes, the cross-entropy model is prone to misclassify tail samples to similar head classes. Therefore, to alleviate the influence of the confusion on model performance and improve the classification of tail classes, we propose a Similarity Window Reweighting and Margin (SWRM) algorithm, where the SWRM consists of Similarity Window Reweighting (SWR) and Similarity Window Margin (SWM) algorithms. For the confusable head- and tail classes, SWR assigns larger weights to tail classes and smaller weights to head classes. Therefore, the model can enlarge the importance of tail classes and effectively improve their classification. Moreover, SWR considers the difference in label frequency and the impact of category similarity simultaneously, so that the weight coefficients are more reasonable and efficacious. SWM generates adaptive margins that are proportional to the ratio of the classifier’s weight norm, thus promoting the learning of tail classifier with small weight norm. Our SWRM effectively eliminates the confusion between head- and tail classes and alleviates the misclassification issues. Extensive experiments on three long-tailed datasets, i.e., CIFAR100-LT, ImageNet-LT, and Places-LT, verify our proposed method’s effectiveness and superiority over comparative methods.

References

[1]

Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, and Shu Kong. 2022. Long-tailed recognition via weight balancing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6897–6907.

[2]

Jiarui Cai, Yizhou Wang, and Jenq-Neng Hwang. 2021. Ace: Ally complementary experts for solving long-tailed recognition in one-shot. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 112–121.

[3]

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. 2019. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems 32 (2019), 1565–1576.

[4]

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (2002), 321–357.

[5]

Qiong Chen, Tianlin Huang, Geren Zhu, and Enlu Lin. 2023. A dual-branch model with inter-and intra-branch contrastive loss for long-tailed recognition. Neural Networks 168 (2023), 214–222.

Digital Library

[6]

Zhineng Chen, Shanshan Ai, and Caiyan Jia. 2019. Structure-aware deep learning for product image classification. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 1s (2019), 1–20.

Digital Library

[7]

Hsin-Ping Chou, Shih-Chieh Chang, Jia-Yu Pan, Wei Wei, and Da-Cheng Juan. 2020. Remix: Rebalanced mixup. In Proceedings of the Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Part VI 16. Springer, 95–110.

Digital Library

[8]

Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. 2019. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 113–123.

[9]

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. 2019. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9268–9277.

[10]

Terrance DeVries and Graham W. Taylor. 2017. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017).

[11]

Chris Drummond and Robert C. Holte. 2003. C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Proceedings of the Workshop on Learning from Imbalanced Datasets II. 1–8.

[12]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.

[13]

Youngkyu Hong, Seungju Han, Kwanghee Choi, Seokjun Seo, Beomsu Kim, and Buru Chang. 2021. Disentangling label distribution for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6626–6636.

[14]

Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, and Sanjiv Kumar. 2022. Elm: Embedding and logit margins for long-tail learning. arXivpreprint arXiv:2204.13208 (2022).

[15]

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. 2020. Decoupling representation and classifier for long-tailed recognition. In Proceedings of the 8th International Conference on Learning Representations. OpenReview.net.

[16]

Jian Li, Ziyao Meng, Daqian Shi, Rui Song, Xiaolei Diao, Jingwen Wang, and Hao Xu. 2023. FCC: Feature clusters compression for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 24080–24089.

[17]

Jun Li, Zichang Tan, Jun Wan, Zhen Lei, and Guodong Guo. 2022. Nested collaborative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6949–6958.

[18]

Mengke Li, Yiu-Ming Cheung, and Juyong Jiang. 2022. Feature-balanced loss for long-tailed visual recognition. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1–6.

[19]

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. 2980–2988.

[20]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Part V 13. Springer, 740–755.

[21]

Xiangbin Liu, Jiesheng He, Liping Song, Shuai Liu, and Gautam Srivastava. 2021. Medical image classification based on an adaptive size deep learning model. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 3s (2021), 1–18.

Digital Library

[22]

Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X. Yu. 2019. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2537–2546.

[23]

Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. 2021. Long-tail learning via logit adjustment. In Proceedings of the 9th International Conference on Learning Representations, 2021.

[24]

Seulki Park, Youngkyu Hong, Byeongho Heo, Sangdoo Yun, and Jin Young Choi. 2022. The majority can help the minority: Context-rich minority oversampling for long-tailed classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6887–6896.

[25]

Seulki Park, Jongin Lim, Younghan Jeon, and Jin Young Choi. 2021. Influence-balanced loss for imbalanced visual classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 735–744.

[26]

Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, and Hongsheng Li. 2020. Balanced meta-softmax for long-tailed visual recognition. Advances in Neural Information Processing Systems 33 (2020), 4175–4186.

[27]

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Feifei. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115 (2015), 211–252.

Digital Library

[28]

Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, and Deyu Meng. 2019. Meta-weight-net: Learning an explicit mapping for sample weighting. Advances in Neural Information Processing Systems 32 (2019), 1917–1928.

[29]

Saptarshi Sinha and Hiroki Ohashi. 2023. Difficulty-net: Learning to predict difficulty for long-tailed recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 6444–6453.

[30]

Saptarshi Sinha, Hiroki Ohashi, and Katsuyuki Nakamura. 2022. Class-difficulty based methods for long-tailed visual recognition. International Journal of Computer Vision 130, 10 (2022), 2517–2531.

Digital Library

[31]

Leslie N. Smith. 2022. Cyclical focal loss. arXiv preprint arXiv:2202.08978 (2022).

[32]

Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems 30 (2017), 4077–4087.

[33]

Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, and Junjie Yan. 2020. Equalization loss for long-tailed object recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11662–11671.

[34]

Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008).

[35]

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. 2018. The inaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8769–8778.

[36]

Jianfeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, and Zhenghua Xu. 2021. Rsg: A simple but effective module for learning imbalanced datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3784–3793.

[37]

Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, and Dahua Lin. 2021. Seesaw loss for long-tailed instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9695–9704.

[38]

Peng Wang, Kai Han, Xiu-Shen Wei, Lei Zhang, and Lei Wang. 2021. Contrastive learning based hybrid networks for long-tailed image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 943–952.

[39]

Tong Wang, Yousong Zhu, Chaoyang Zhao, Wei Zeng, Jinqiao Wang, and Ming Tang. 2021. Adaptive class suppression loss for long-tail object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3103–3112.

[40]

Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, and Stella X. Yu. 2020. Long-tailed recognition by routing diverse distribution-aware experts. arXiv preprint arXiv:2010.01809 (2020).

[41]

Yiru Wang, Weihao Gan, Jie Yang, Wei Wu, and Junjie Yan. 2019. Dynamic curriculum learning for imbalanced data classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5017–5026.

[42]

Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. 2017. Learning to model the tail. Advances in Neural Information Processing Systems 30 (2017), 7029–7039.

[43]

Yong Yang, Qiong Chen, Yuan Feng, and Tianlin Huang. 2023. MIANet: Aggregating unbiased instance and general information for few-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7131–7140.

[44]

Renhui Zhang, Tiancheng Lin, Rui Zhang, and Yi Xu. 2022. Solving the long-tailed problem via intra-and inter-category balance. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2355–2359.

[45]

Zhisheng Zhong, Jiequan Cui, Shu Liu, and Jiaya Jia. 2021. Improving calibration for long-tailed recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16489–16498.

[46]

Boyan Zhou, Quan Cui, Xiu-Shen Wei, and Zhao-Min Chen. 2020. Bbn: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9719–9728.

[47]

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 6 (2017), 1452–1464.

Cited By

Wen HSong XChen XWei YNie LChua THui Yang GWang HHan SHauff CZuccon GZhang Y(2024)Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image RetrievalProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657727(229-239)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657727

Index Terms

SWRM: Similarity Window Reweighting and Margin for Long-Tailed Recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Recommendations

Mitigating biases in long-tailed recognition via semantic-guided feature transfer
Abstract
Dealing with significant class imbalance poses a significant challenge in various real-world applications, particularly when the accurate classification and generalization of minority classes are of crucial. One key factor that results in model ...
Long-Tailed Recognition Based on Self-attention Mechanism
Advanced Intelligent Computing Technology and Applications
Abstract
The long-tailed distribution data poses significant challenges for visual classification tasks. The existing solutions can be categorized into three main categories, i.e., class re-balancing, information augmentation, and module improvement. ...
Long-Tailed Time Series Classification via Feature Space Rebalancing
Database Systems for Advanced Applications
Abstract
Learning unbiased decision boundaries is crucial for time series classification. Real-world datasets typically exhibit long-tailed natures of class distributions, which results in an imbalanced feature space after training, i.e., decision ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Multimedia Computing, Communications, and Applications

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 6

June 2024

715 pages

EISSN:1551-6865

DOI:10.1145/3613638

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Issue’s Table of Contents

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 March 2024

Online AM: 07 February 2024

Accepted: 25 January 2024

Revised: 29 October 2023

Received: 31 July 2023

Published in TOMM Volume 20, Issue 6

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application
National Natural Science Foundation of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
124
Total Downloads

Downloads (Last 12 months)124
Downloads (Last 6 weeks)6

Reflects downloads up to 06 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wen HSong XChen XWei YNie LChua THui Yang GWang HHan SHauff CZuccon GZhang Y(2024)Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image RetrievalProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657727(229-239)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657727

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Full Text

View this article in Full Text.

Media

Figures

Other

Tables

View full text|Download PDF

View Issue’s Table of Contents