Adversarial erasing attention for fine-grained image classification

Ji, Jinsheng; Jiang, Linfeng; Zhang, Tao; Zhong, Weilin; Xiong, Huilin

doi:10.1007/s11042-020-08666-3

Published: 30 January 2020

Volume 80, pages 22867–22889, (2021)

Multimedia Tools and Applications

Abstract

Recognizing fine-grained subcategories is challenging because fine-grained images exhibit large intra-class diversity and small inter-class variance. The common approach is to find the parts that efficiently distinguish similar subcategories. Most previous works rely on manual annotations or attention techniques to localize the discriminative parts, and they have achieved great progress. However, manual annotations are demanding to obtain in practical applications, and complicated constraints on the loss functions have to be adopted to localize the discriminative parts for building multi-view feature representations. To handle these challenges, this paper applies the strategy of adversarial erasing to the attention module, which learns to localize different discriminative parts by erasing the most discriminative one from the image. Without complicated loss functions, the proposed attention module can localize the discriminative parts more efficiently. Unlike many part-based methods, the proposed classification network consists of three subnetworks, which are trained on the original image and on two discriminative parts respectively. Moreover, the features learned by the three subnetworks are fused more efficiently to build better feature representations. Four widely used datasets, CUB-200-2011, Stanford Dogs, Stanford Cars and FGVC-Aircraft, are used to evaluate the proposed method, and the experimental results show that it can outperform some state-of-the-art methods without using manual annotations.
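
The erasing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which is not shown on this page. It assumes a fixed 2D response map as a stand-in for the attention module's output, a square part window, and hypothetical helper names (`peak_box`, `erase`, `two_parts`). In the paper the attention would be recomputed by the network on the erased image; here, erasing the map itself imitates that effect.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def peak_box(attention: Grid, half: int) -> Box:
    """Square window centred on the strongest response, clipped to the grid."""
    rows, cols = len(attention), len(attention[0])
    r, c = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda rc: attention[rc[0]][rc[1]],
    )
    return (max(r - half, 0), max(c - half, 0),
            min(r + half + 1, rows), min(c + half + 1, cols))


def erase(grid: Grid, box: Box, fill: float = 0.0) -> Grid:
    """Return a copy of the grid with every cell inside the box set to `fill`."""
    top, left, bottom, right = box
    return [
        [fill if top <= i < bottom and left <= j < right else v
         for j, v in enumerate(row)]
        for i, row in enumerate(grid)
    ]


def two_parts(attention: Grid, half: int = 1) -> Tuple[Box, Box]:
    """Localize the most discriminative part, erase it, then localize a second part."""
    first = peak_box(attention, half)
    remaining = erase(attention, first)   # assumption: stands in for re-running attention
    second = peak_box(remaining, half)
    return first, second


# Toy response map (made-up values): a strong peak top-left, a weaker one on the right.
attention = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.3, 0.6],
    [0.0, 0.0, 0.2, 0.5, 0.4],
    [0.0, 0.0, 0.1, 0.3, 0.2],
]
first, second = two_parts(attention)
print(first, second)
```

Because the strongest region is removed before the second search, the second window is pushed toward a different part of the object. Under the setup the abstract describes, the original image and the two part crops would then each feed one of the three subnetworks.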

Acknowledgements

The work was partly supported by the National Science Foundation of China (NSFC), under contract No. 61673274.

Author information

Authors and Affiliations

Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Jinsheng Ji, Linfeng Jiang, Tao Zhang & Weilin Zhong
Institute for Sensing and Navigation, Shanghai Jiao Tong University, Shanghai, 200240, China
Huilin Xiong

Corresponding author

Correspondence to Huilin Xiong.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ji, J., Jiang, L., Zhang, T. et al. Adversarial erasing attention for fine-grained image classification. Multimed Tools Appl 80, 22867–22889 (2021). https://doi.org/10.1007/s11042-020-08666-3

Received: 29 May 2019
Revised: 20 November 2019
Accepted: 08 January 2020
Published: 30 January 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-020-08666-3
