Abstract
Fine-grained image recognition (FGIR) aims to distinguish visual objects belonging to different subclasses within the same category. Existing methods mainly focus on identifying discriminative regions and extracting the most prominent features. However, this approach leads to a scale imbalance between the foreground and background of an image, and it tends to extract features from salient foreground regions while neglecting valuable information in the background. To address these two challenges, we propose a weakly supervised foreground-background partitioning and feature fusion framework. Specifically, a foreground-background image partition module separates the foreground and background regions to resolve the scale imbalance in the image. A feature similarity calculation module weighs the foreground and background features. To leverage background information while capturing discriminative regions, we introduce a selective mask feature module. Comprehensive experiments on four popular and competitive datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
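The abstract names the modules but does not specify them, so the following is only a minimal sketch of the general idea: split patch features into foreground and background, then fuse them with a similarity-based weight. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`mean_vector`, `cosine_similarity`, `partition_and_fuse`), the mean-saliency threshold used as the partition rule, the use of cosine similarity, and the additive fusion formula.

```python
import math


def mean_vector(vectors, dim):
    """Average a list of feature vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)


def partition_and_fuse(patches, saliency):
    """Split patches by saliency, pool each group, and fuse the two features.

    patches:  list of feature vectors, one per image patch
    saliency: one score per patch (hypothetical input, e.g. an attention value)
    """
    dim = len(patches[0])
    # Assumed partition rule: patches above the mean saliency are foreground.
    threshold = sum(saliency) / len(saliency)
    foreground = [p for p, s in zip(patches, saliency) if s > threshold]
    background = [p for p, s in zip(patches, saliency) if s <= threshold]

    fg_feat = mean_vector(foreground, dim)
    bg_feat = mean_vector(background, dim)

    # Assumed weighting: map cosine similarity from [-1, 1] to [0, 1] so that
    # background features resembling the foreground contribute more.
    weight = 0.5 * (1.0 + cosine_similarity(fg_feat, bg_feat))
    fused = [f + weight * b for f, b in zip(fg_feat, bg_feat)]
    return fused, weight


# Toy example: two salient patches and two background patches.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
saliency = [0.9, 0.8, 0.1, 0.2]
fused, weight = partition_and_fuse(patches, saliency)
```

In this sketch, pooling each region separately means a small foreground is not outvoted by a large background, which is one simple way to think about the scale imbalance the abstract mentions.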
Acknowledgements
This research was supported by the Fundamental Research Funds for the Central Universities (grant nos. 2662022XXYJ006, 2662017PY059 and 2662023XXPY005), the National Natural Science Foundation of China (grant no. 61176052), and Yingzi Tech & Huazhong Agricultural University Intelligent Research Institute of Food Health (grant nos. IRIFH202212 and IRIFH202304).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, C. et al. (2025). Foreground-Background Partitioning and Feature Fusion for Weakly Supervised Fine-Grained Image Recognition. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15033. Springer, Singapore. https://doi.org/10.1007/978-981-97-8502-5_2
DOI: https://doi.org/10.1007/978-981-97-8502-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8501-8
Online ISBN: 978-981-97-8502-5
eBook Packages: Computer Science, Computer Science (R0)