Learning imbalanced datasets with label-distribution-aware margin loss

Authors:

Nikos Arechiga,

Tengyu Ma

Proceedings of the 33rd International Conference on Neural Information Processing Systems

December 2019

Article No.: 140, Pages 1567 - 1578

Published: 08 December 2019

Abstract

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.
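
The abstract names the two ideas without giving formulas, so the following is only an illustrative sketch, not the authors' implementation. It assumes the per-class margin scales as n_j^(-1/4) in the class count n_j (the form used in the body of the paper, as far as we understand it) and is subtracted from the true-class logit before an ordinary cross-entropy. The names `ldam_margins`, `ldam_loss`, `deferred_class_weights`, `max_margin`, and `defer_until_epoch` are hypothetical, and the inverse-frequency weights stand in for whichever re-weighting scheme is actually deferred.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    # Assumed form: margin for class j proportional to n_j ** (-1/4),
    # rescaled so the rarest class receives max_margin (hypothetical knob).
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]

def ldam_loss(logits, label, margins, weight=1.0):
    # Subtract the margin from the true-class logit only,
    # then apply a numerically stable softmax cross-entropy.
    shifted = list(logits)
    shifted[label] -= margins[label]
    m = max(shifted)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in shifted))
    return weight * (log_sum_exp - shifted[label])

def deferred_class_weights(class_counts, epoch, defer_until_epoch):
    # Deferred re-weighting: uniform weights during the initial stage,
    # then inverse-frequency weights (an illustrative choice) normalized
    # so they sum to the number of classes.
    k = len(class_counts)
    if epoch < defer_until_epoch:
        return [1.0] * k
    inv = [1.0 / n for n in class_counts]
    norm = k / sum(inv)
    return [w * norm for w in inv]

# Toy long-tailed setting: three classes with 5000, 500, and 50 examples.
counts = [5000, 500, 50]
margins = ldam_margins(counts)
logits = [2.0, 1.0, 0.5]
plain_ce = ldam_loss(logits, 2, [0.0, 0.0, 0.0])  # zero margins = plain cross-entropy
with_margin = ldam_loss(logits, 2, margins)       # rare class must clear a larger margin
```

In this sketch the rare class carries the largest margin, so an example from it incurs a higher loss than under plain cross-entropy until its logit leads by that margin. The per-class weight returned by `deferred_class_weights` would be passed as `weight` for the example's label once the initial stage has ended.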

Index Terms

Learning imbalanced datasets with label-distribution-aware margin loss
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published In

NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems

December 2019

15947 pages

Copyright © 2019 Neural Information Processing Systems Foundation, Inc.

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

Chapter
Research
Refereed limited
