research-article
DOI: 10.1145/3651671.3651752

Convolutionally Enhanced Feature Fusion Visual Transformer for Fine-Grained Visual Classification

Published: 07 June 2024
    Abstract

    Fine-grained image classification is an active research topic in computer vision and pattern recognition, where the goal is to recognize subclasses of objects in images at a fine-grained level. In recent years, the Transformer's self-attention mechanism has increasingly been applied to fine-grained image classification because it naturally focuses on the most discriminative regions of an object. This paper proposes a new Convolutionally Enhanced Feature Fusion Visual Transformer, which extends the Feature Fusion Visual Transformer with convolutional operations. First, patch tokens are not taken directly from the original input image but are extracted from convolutionally generated low-level feature maps. Second, spatial-reduction attention lowers the computational complexity and memory consumption of the multi-head attention layer. Finally, an inverted residual feed-forward network is applied in each encoder to strengthen the network's representational ability. Comparative experiments on four datasets show that the method improves the accuracy of fine-grained feature extraction, while the improved self-attention layer reduces computation and memory consumption, improving both the efficiency and the performance of the model.
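    To make the three mechanisms summarized above more concrete, the following PyTorch sketch shows how a convolutional patch embedding, spatial-reduction attention, and an inverted residual feed-forward block are commonly assembled. It is an illustrative approximation based only on the abstract: the module names, dimensions, and stem design are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ConvPatchEmbed(nn.Module):
    """Tokenize from convolutional low-level feature maps rather than raw pixels
    (hypothetical stem; the paper's exact design may differ)."""
    def __init__(self, in_chans=3, dim=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_chans, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim // 2),
            nn.GELU(),
        )
        self.proj = nn.Conv2d(dim // 2, dim, kernel_size=2, stride=2)

    def forward(self, x):
        x = self.proj(self.stem(x))                  # (B, dim, H/4, W/4)
        B, C, H, W = x.shape
        return x.flatten(2).transpose(1, 2), H, W    # (B, N, dim) patch tokens


class SpatialReductionAttention(nn.Module):
    """Multi-head self-attention whose keys/values are spatially downsampled."""
    def __init__(self, dim, num_heads=8, sr_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # Strided convolution shrinks the key/value token map by sr_ratio^2,
        # which is where the computation and memory savings come from.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape                            # N = H * W tokens
        hd = C // self.num_heads
        q = self.q(x).reshape(B, N, self.num_heads, hd).transpose(1, 2)
        x_ = x.transpose(1, 2).reshape(B, C, H, W)   # tokens -> feature map
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, hd).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class InvertedResidualFFN(nn.Module):
    """Feed-forward block: expand, depthwise 3x3 conv, project, plus a skip."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        h = self.act(self.fc1(x))
        h = h.transpose(1, 2).reshape(B, -1, H, W)   # tokens -> feature map
        h = self.act(self.dwconv(h))
        h = h.flatten(2).transpose(1, 2)             # feature map -> tokens
        return x + self.fc2(h)                       # inverted residual skip


if __name__ == "__main__":
    tokens, H, W = ConvPatchEmbed()(torch.randn(1, 3, 224, 224))
    tokens = SpatialReductionAttention(dim=256)(tokens, H, W)
    tokens = InvertedResidualFFN(dim=256)(tokens, H, W)
    print(tokens.shape)                              # torch.Size([1, 3136, 256])
```

    With a 224x224 input, the stem above produces a 56x56 token map (3,136 tokens); with sr_ratio=2, keys and values are pooled to 28x28 (784 tokens), so each attention map shrinks by roughly a factor of four. This is the kind of computation and memory saving the abstract attributes to spatial-reduction attention.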

    Published In

    ICMLC '24: Proceedings of the 2024 16th International Conference on Machine Learning and Computing
    February 2024
    757 pages
    ISBN: 9798400709234
    DOI: 10.1145/3651671
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Convolutional neural network
    2. Fine-grained images
    3. Spatial-reduction attention
    4. Visual transformer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMLC 2024
