Abstract
Recognizing fine-grained subcategories is a challenging task due to the large intra-class diversities and small inter-class variances of the fine-grained images. The common thought is to find out the parts that can distinguish similar subcategories efficiently. Most previous works rely on the manual annotations or attention technologies to localize the discriminative parts and have achieved great progress. However, these manual annotations are demanding in practical applications and some complicated constrains on the loss functions have to be adopted to localize the discriminative parts for building multi-view feature representations. To handle the challenges above, the strategy of adversarial erasing is applied on the attention module in this paper, which learns to localize different discriminative parts by erasing the most one from the image. Without the complicated loss functions, the proposed attention module can localize the discriminative parts more efficiently. Different from many part based methods, the classification network which consists of three subnetworks is introduced, and the subnetworks are trained by the original image and two discriminative parts respectively. Moreover, features learned from the three subnetworks are then fused in a more efficiently way to build better feature representations. Four mostly used datasets of CUB-200-2011, Stanford Dogs, Stanford Cars and FGVC-Aircraft are utilized to evaluate the proposed method and experimental results show that it can outperform some state-of-the-art methods without using the manual annotations.
Similar content being viewed by others
References
Akata Z, Reed S, Walter D, Lee H, Schiele B (2015) Evaluation of output embeddings for fine-grained image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2927–2936
Azizpour H, Razavian AS, Sullivan J, Maki A, Carlsson S (2016) Factors of transferability for a generic convnet representation. IEEE Trans Pattern Anal Mach Intell 38(9):1790–1802
Berg T, Liu J, Woo Lee S, Alexander ML, Jacobs D, Belhumeur P (2014) Birdsnap: Large-scale fine-grained visual categorization of birds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2011–2018
Bourdev L, Maji S, Brox T, Malik J (2010) Detecting people using mutually consistent poselet activations. In: European conference on computer vision. Springer, pp 168–181
Branson S, Van Horn G, Belongie S, Perona P (2014) Bird species categorization using pose normalized deep convolutional nets. BMVC
Chai Y, Lempitsky V, Zisserman A (2013) Symbiotic segmentation and part localization for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp 321–328
Chang YS (2018) Fine-grained attention for image caption generation. Multimed Tools Appl 77(3):2959–2971
Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: Delving deep into convolutional nets. arXiv:1405.3531
Dai Z, Chen M, Zhu S, Tan P (2018) Batch feature erasing for person re-identification and beyond. arXiv:1811.07130
Darrell T, Huang C, Jia Y (2012) Beyond spatial pyramids: Receptive field learning for pooled image features. In: 2012 IEEE Conference on computer vision and pattern recognition. IEEE, pp 3370–3377
Ding Z, Fu Y (2016) Robust transfer metric learning for image classification. IEEE Trans Image Process 26(2):660–670
Farrell R, Oza O, Zhang N, Morariu VI, Darrell T, Davis LS (2011) Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. In: 2011 IEEE international conference on Computer vision (ICCV). IEEE, pp 161–168
Fu J, Zheng H, Mei T (2017) Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition. In: CVPR
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Gosselin PH, Murray N, Jégou H, Perronnin F (2014) Revisiting the fisher vector for fine-grained classification. Pattern Recogn Lett 49:92–98
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Huang C, He Z, Cao G, Cao W (2016) Task-driven progressive part localization for fine-grained object recognition. IEEE Trans Multimed 18(12):2372–2383
Huh M, Agrawal P, Efros AA (2016) What makes imagenet good for transfer learning? arXiv:1608.08614
Huh MH, Zhang N (2019) Feedback adversarial learning: Spatial feedback for improving generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1476–1485
Iscen A, Tolias G, Gosselin PH, Jégou H (2015) A comparison of dense region detectors for image search and fine-grained classification. IEEE Trans Image Process 24(8):2369–2381
Khosla A, Jayadevaprakash N, Yao B, Li FF (2011) Novel dataset for fine-grained image categorization: Stanford dogs. In: CVPR Workshops, vol 2, pp 1
Krause J, Jin H, Yang J, Fei-Fei L (2015) Fine-grained recognition without part annotations. In: 2015 IEEE conference on Computer vision and pattern recognition (CVPR). IEEE, pp 5546–5555
Krause J, Stark M, Deng J, Fei-fei L (2013) 3d object representations for fine-grained categorization. In: 2013 IEEE international conference on Computer vision workshops (ICCVW). IEEE, pp 554–561
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kumar A, Kim J, Lyndon D, Fulham M, Feng D (2016) An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE J Biomed Health Inf 21(1):31–40
Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1222–1230
Lin TY, RoyChowdhury A, Maji S (2017) Bilinear convolutional neural networks for fine-grained visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence
Liu J, Kanazawa A, Jacobs D, Belhumeur P (2012) Dog breed classification using part localization. ECCV. pp 172–185
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37
Liu X, Xia T, Wang J, Lin Y (2016) Fully convolutional attention localization networks: Efficient attention localization for fine-grained recognition. CoRR, arXiv:1603.06765
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Lu X, Guo Y, Liu N, Wan L, Fang T (2018) Non-convex joint bilateral guided depth upsampling. Multimed Tools Appl 77(12):15521–15544
Lu X, Ma C, Ni B, Yang X (2019) Adaptive region proposal with channel regularization for robust object tracking. IEEE Trans Circ Syst Video Technol 10(19):1–15
Lu X, Ma C, Ni B, Yang X, Reid I, Yang MH (2018) Deep regression tracking with shrinkage loss. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 353–369
Lu X, Ni B, Ma C, Yang X (2019) Learning transform-aware attentive network for object tracking. Neurocomputing 349:133–144
Lu X, Wang W, Ma C, Shen J, Shao L, Porikli F (2019) See more, know more: Unsupervised video object segmentation with co-attention siamese networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3623–3632
Maji S, Rahtu E, Kannala J, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. arXiv:1306.5151
Nilsback ME, Zisserman A (2008) Automated flower classification over a large number of classes. In: ICVGIP. IEEE, pp 722–729
Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1717–1724
Quattoni A, Collins M, Darrell T (2008) Transfer learning for image classification with sparse prototype representations. In: CVPR. IEEE, pp 1–8
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Rosenfeld A, Ullman S (2016) Visual concept recognition and localization via iterative introspection. In: Asian conference on computer vision. Springer, pp 264–279
Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM (2016) Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298
Simon M, Rodner E (2015) Neural activation constellations: Unsupervised part model discovery with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1143–1151
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, pp 140–C1556
Singh KK, Ojha U, Lee YJ (2018) Finegan: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery. arXiv:1811.11155
Stark M, Krause J, Pepik B, Meger D, Little JJ, Schiele B, Koller D (2011) Fine-grained categorization for 3d scene understanding. Int J Robot Res 30 (13):1543–1552
Sumbul G, Cinbis RG, Aksoy S (2019) Multisource region attention network for fine-grained object recognition in remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing
Sun C, Shrivastava A, Singh S, Gupta A (2017) Revisiting unreasonable effectiveness of data in deep learning era. In: 2017 IEEE international conference on Computer vision (ICCV). IEEE, pp 843–852
Van Horn G, Branson S, Farrell R, Haber S, Barry J, Ipeirotis P, Perona P, Belongie S (2015) Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 595–604
Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset. California Inst. Technol. Pasadena, Tech. Rep CNS-TR-2011-001
Wang D, Shen Z, Shao J, Zhang W, Xue X, Zhang Z (2015) Multiple granularity descriptors for fine-grained categorization. In: 2015 IEEE international conference on Computer vision (ICCV). IEEE, pp 2399–2406
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3156–3164
Wang H, Gong D, Li Z, Liu W (2019) Decorrelated adversarial learning for age-invariant face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3527–3536
Wang W, Lu X, Shen J, Crandall DJ, Shao L (2019) Zero-shot video object segmentation via attentive graph neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp 9236–9245
Wang Y, Choi J, Morariu VI, Davis LS (2016) Mining discriminative triplets of patches for fine-grained classification. arXiv:1605.01130
Wegner JD, Branson S, Hall D, Schindler K, Perona P (2016) Cataloging public objects using aerial and street-level images-urban trees. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6014–6023
Wei XS, Xie CW, Wu J, Shen C (2018) Mask-cnn: Localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recogn 76:704–714
Wei Y, Feng J, Liang X, Cheng MM, Zhao Y, Yan S (2017) Object region mining with adversarial erasing: a simple classification to semantic segmentation approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1568–1576
Xiao T, Xu Y, Yang K, Zhang J, Peng Y, Zhang Z (2015) The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 842–850
Xie L, Tian Q, Hong R, Yan S, Zhang B (2013) Hierarchical part matching for fine-grained visual categorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1641–1648
Xie L, Tian Q, Wang M, Zhang B (2014) Spatial pooling of heterogeneous features for image classification. IEEE Trans Image Process 23(5):1994–2008
Xie N, Lai F, Doran D, Kadav A (2019) Visual entailment: A novel task for fine-grained image understanding. arXiv:1901.06706
Yang Z, Luo T, Wang D, Hu Z, Gao J, Wang L (2018) Learning to navigate for fine-grained classification. In: ECCV 2018. Springer, pp 438–454
Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks?. In: Advances in neural information processing systems, pp 3320–3328
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, pp 818–833
Zhang H, Hu S, Zhang X (2014) Sift flow for large-displacement object tracking. Appl Opt 53(27):6194–6205
Zhang H, Wang Y, Luo L, Lu X, Zhang M (2017) Sift flow for abrupt motion tracking via adaptive samples selection with sparse representation. Neurocomputing 249:253–265
Zhang N, Donahue J, Girshick R, Darrell T (2014) Part-based r-cnns for fine-grained category detection. In: European conference on computer vision. Springer, pp 834–849
Zhang N, Farrell R, Darrell T (2012) Pose pooling kernels for sub-category recognition. In: Computer vision and pattern recognition (CVPR), 2012 IEEE conference on, pp 3665–3672. IEEE
Zhang N, Farrell R, Iandola F, Darrell T (2013) Deformable part descriptors for fine-grained recognition and attribute prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp 729–736
Zhang N, Shelhamer E, Gao Y, Darrell T (2015) Fine-grained pose prediction, normalization, and recognition. arXiv:1511.07063
Zhang T, Ghanem B, Liu S, Ahuja N (2013) Robust visual tracking via structured multi-task sparse learning. Int J Comput Vis 101(2):367–383
Zhang X, Wei Y, Feng J, Yang Y, Huang TS (2018) Adversarial complementary learning for weakly supervised object localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1325–1334
Zhang X, Xiong H, Zhou W, Lin W, Tian Q (2016) Picking deep filter responses for fine-grained image recognition. In: CVPR, pp 1134–1142
Zhang X, Xiong H, Zhou W, Lin W, Tian Q (2017) Picking neural activations for fine-grained recognition. IEEE Trans Multimed 19(12):2736–2750
Zhang Y, Wei XS, Wu J, Cai J, Lu J, Nguyen VA, Do MN (2016) Weakly supervised fine-grained categorization with part-based image representation. IEEE Trans Image Process 25(4):1713–1725
Zhao B, Feng J, Wu X, Yan S (2017) A survey on deep learning-based fine-grained object classification and semantic segmentation. Int J Autom Comput
Zhao B, Wu X, Feng J, Peng Q, Yan S (2017) Diversified visual attention networks for fine-grained object classification. IEEE Trans Multimed 19(6):1245–1256
Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In: Int. Conf. on computer vision
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: 2016 IEEE conference on Computer vision and pattern recognition (CVPR). IEEE, pp 2921–2929
Acknowledgements
The work was partly supported by the National Science Foundation of China (NSFC), under contract No. 61673274.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Ji, J., Jiang, L., Zhang, T. et al. Adversarial erasing attention for fine-grained image classification. Multimed Tools Appl 80, 22867–22889 (2021). https://doi.org/10.1007/s11042-020-08666-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-08666-3