Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.5555/3454287.3454427guideproceedingsArticle/Chapter ViewAbstractPublication PagesMonograph
chapter
Free access

Learning imbalanced datasets with label-distribution-aware margin loss

Published: 08 December 2019 Publication History

Abstract

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains1.

References

[1]
Tiny imagenet visual recognition challenge. URL https://tiny-imagenet.herokuapp.com.
[2]
Sanjeev Arora, Rong Ge, Behnam Neyshabur, and Yi Zhang. Stronger generalization bounds for deep nets via a compression approach. arXiv preprint arXiv:1802.05296, 2018.
[3]
Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, and Animashree Anandkumar. Regularized learning for domain adaptation under label shifts. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rJl0r3R9KX.
[4]
Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240-6249, 2017.
[5]
Mateusz Buda, Atsuto Maki, and Maciej A Mazurowski. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106:249-259, 2018.
[6]
Jonathon Byrd and Zachary Lipton. What is the effect of importance weighting in deep learning? In International Conference on Machine Learning, 2019.
[7]
Kaidi Cao, Yu Rong, Cheng Li, Xiaoou Tang, and Chen Change Loy. Pose-robust face recognition via deep residual equivariant mapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5187-5196, 2018.
[8]
Yair Carmon, Yujia Jin, Aaron Sidford, and Kevin Tian. Variance reduction for matrix games. arXiv preprint arXiv:1907.02056, 2019.
[9]
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321-357, 2002.
[10]
Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[11]
John C Duchi, Tatsunori Hashimoto, and Hongseok Namkoong. Distributionally robust losses against mixture covariate shifts.
[12]
Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303-338, 2010.
[13]
Noah Golowich, Alexander Rakhlin, and Ohad Shamir. Size-independent sample complexity of neural networks. arXiv preprint arXiv:1712.06541, 2017.
[14]
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[15]
Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In European Conference on Computer Vision, pages 87-102. Springer, 2016.
[16]
Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. In International Conference on Machine Learning, pages 1934-1943, 2018.
[17]
Haibo He and Edwardo A Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge & Data Engineering, (9):1263-1284, 2008.
[18]
Haibo He and Yunqian Ma. Imbalanced learning: foundations, algorithms, and applications. John Wiley & Sons, 2013.
[19]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016.
[20]
J Henry Hinnefeld, Peter Cooman, Nat Mammo, and Rupert Deese. Evaluating fairness metrics in the presence of dataset bias. arXiv preprint arXiv:1809.09245, 2018.
[21]
Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning deep representation for imbalanced classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5375-5384, 2016.
[22]
Chen Huang, Yining Li, Change Loy Chen, and Xiaoou Tang. Deep imbalanced learning for face recognition and attribute prediction. IEEE transactions on pattern analysis and machine intelligence, 2019.
[23]
Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent data analysis, 6(5):429-449, 2002.
[24]
Sham M Kakade, Karthik Sridharan, and Ambuj Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in neural information processing systems, pages 793-800, 2009.
[25]
Salman Khan, Munawar Hayat, Syed Waqas Zamir, Jianbing Shen, and Ling Shao. Striking the right balance with uncertainty. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 103-112, 2019.
[26]
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[27]
Vladimir Koltchinskii, Dmitry Panchenko, et al. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1-50, 2002.
[28]
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32-73, 2017.
[29]
Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[30]
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[31]
Buyu Li, Yu Liu, and Xiaogang Wang. Gradient harmonized single-stage detector. arXiv preprint arXiv:1811.05181, 2018.
[32]
Yaoyong Li, Hugo Zaragoza, Ralf Herbrich, John Shawe-Taylor, and Jaz Kandola. The perceptron algorithm with uneven margins. In ICML, volume 2, pages 379-386, 2002.
[33]
Zeju Li, Konstantinos Kamnitsas, and Ben Glocker. Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 402-410. Springer, 2019.
[34]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740-755. Springer, 2014.
[35]
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980-2988, 2017.
[36]
Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. In International Conference on Machine Learning, pages 3128-3136, 2018.
[37]
Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. Large-margin softmax loss for convolutional neural networks. In ICML, volume 2, page 7, 2016.
[38]
Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 212-220, 2017.
[39]
Yu Liu, Hongyang Li, and Xiaogang Wang. Rethinking feature discrimination and polymerization for large-scale recognition. arXiv preprint arXiv:1710.00870, 2017.
[40]
Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X Yu. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2537-2546, 2019.
[41]
Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1, pages 142-150. Association for Computational Linguistics, 2011.
[42]
Michele Merler, Nalini Ratha, Rogerio S Feris, and John R Smith. Diversity in faces. arXiv preprint arXiv:1901.10436, 2019.
[43]
Vaishnavh Nagarajan and Zico Kolter. Deterministic PAC-bayesian generalization bounds for deep networks via generalizing noise-resilience. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Hygn2o0qKX.
[44]
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
[45]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211-252, 2015.
[46]
Li Shen, Zhouchen Lin, and Qingming Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In European conference on computer vision, pages 467-482. Springer, 2016.
[47]
Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, and Deyu Meng. Meta-weight-net: Learning an explicit mapping for sample weighting. arXiv preprint arXiv:1902.07379, 2019.
[48]
Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1):2822-2878, 2018.
[49]
Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural processing letters, 9(3):293-300, 1999.
[50]
Bart Thomee, David A Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. Yfcc100m: The new data in multimedia research. arXiv preprint arXiv:1503.01817, 2015.
[51]
Grant Van Horn and Pietro Perona. The devil is in the tails: Fine-grained classification in the wild. arXiv preprint arXiv:1709.01450, 2017.
[52]
Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8769-8778, 2018.
[53]
Feng Wang, Jian Cheng, Weiyang Liu, and Haijun Liu. Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7):926-930, 2018.
[54]
Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135-153, 2018.
[55]
Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Learning to model the tail. In Advances in Neural Information Processing Systems, pages 7029-7039, 2017.
[56]
Yu-Xiong Wang, Ross Girshick, Martial Hebert, and Bharath Hariharan. Low-shot learning from imaginary data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7278-7286, 2018.
[57]
Colin Wei and Tengyu Ma. Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. arXiv e-prints, art. arXiv:1905.03684, May 2019.
[58]
Colin Wei and Tengyu Ma. Improved sample complexities for deep networks and robust classification via an all-layer margin. arXiv preprint arXiv:1910.04284, 2019.
[59]
Colin Wei, Jason D Lee, Qiang Liu, and Tengyu Ma. On the margin theory of feedforward neural networks. arXiv preprint arXiv:1810.05369, 2018.
[60]
Q Zhong, C Li, Y Zhang, H Sun, S Yang, D Xie, and S Pu. Towards good practices for recognition & detection. In CVPR workshops, 2016.
[61]
Yang Zou, Zhiding Yu, BVK Kumar, and Jinsong Wang. Domain adaptation for semantic segmentation via class-balanced self-training. arXiv preprint arXiv:1810.07911, 2018.

Cited By

View all

Index Terms

  1. Learning imbalanced datasets with label-distribution-aware margin loss
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image Guide Proceedings
      NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems
      December 2019
      15947 pages

      In-Cooperation

      Publisher

      Curran Associates Inc.

      Red Hook, NY, United States

      Publication History

      Published: 08 December 2019

      Qualifiers

      • Chapter
      • Research
      • Refereed limited

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)98
      • Downloads (Last 6 weeks)18
      Reflects downloads up to 15 Oct 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Balanced Classification: A Unified Framework for Long-Tailed Object DetectionIEEE Transactions on Multimedia10.1109/TMM.2023.330696826(3088-3101)Online publication date: 1-Jan-2024
      • (2024)Learning Multi-Expert Distribution Calibration for Long-Tailed Video ClassificationIEEE Transactions on Multimedia10.1109/TMM.2023.326788726(555-567)Online publication date: 1-Jan-2024
      • (2024)Multiple Contrastive Experts for long-tailed image classificationExpert Systems with Applications: An International Journal10.1016/j.eswa.2024.124613255:PBOnline publication date: 1-Dec-2024
      • (2022)Age-Invariant Face Recognition by Multi-Feature Fusionand Decomposition with Self-attentionACM Transactions on Multimedia Computing, Communications, and Applications10.1145/347281018:1s(1-18)Online publication date: 25-Jan-2022
      • (2021)Few-shot classification without forgetting of event-camera dataProceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing10.1145/3490035.3490304(1-8)Online publication date: 19-Dec-2021
      • (2021)Feature generation for long-tail classificationProceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing10.1145/3490035.3490300(1-9)Online publication date: 19-Dec-2021
      • (2021)Domain Balanced Sampling and Iterative Search for Product IdentificationProceedings of the 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge10.1145/3475956.3484483(1-8)Online publication date: 24-Oct-2021
      • (2021)Comparative Study of Adversarial Training Methods for Long-tailed ClassificationProceedings of the 1st International Workshop on Adversarial Learning for Multimedia10.1145/3475724.3483601(1-7)Online publication date: 20-Oct-2021
      • (2021)Improving Pedestrian Detection from a Long-tailed Domain PerspectiveProceedings of the 29th ACM International Conference on Multimedia10.1145/3474085.3475573(2918-2926)Online publication date: 17-Oct-2021
      • (2021)Long-tailed Distribution AdaptationProceedings of the 29th ACM International Conference on Multimedia10.1145/3474085.3475479(3275-3282)Online publication date: 17-Oct-2021
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Get Access

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media