
Knowledge Guided Evolutionary Transformer for Remote Sensing Scene Classification

Published: 01 October 2024

Abstract

Addressing the complex terrain and multi-scale targets in remote sensing (RS) images calls for a synergistic combination of Transformers and convolutional neural networks (CNNs). However, designing effective CNN architectures remains a major challenge. To address these difficulties, this study introduces the knowledge-guided evolutionary Transformer for RS scene classification (Evo RSFormer). It combines an adaptive evolutionary CNN (Evo CNN) with Transformers in a hybrid strategy that couples the fine-grained local feature extraction of CNNs with the long-range contextual dependency modeling of Transformers. To construct the Evo CNN blocks, the paper presents a knowledge-guided adaptive efficient multi-objective evolutionary neural architecture search (MOE2-NAS) strategy, which greatly reduces the labor-intensive nature of manual CNN design while balancing accuracy and compactness. By transferring domain knowledge from natural scene analysis to the RS field, MOE2-NAS also improves the efficiency of classical NAS: a priori knowledge is used to generate promising initial solutions, and a surrogate model is constructed for efficient search. The effectiveness of the proposed Evo RSFormer is rigorously evaluated on benchmark RS datasets, including UC Merced, NWPU45, and AID, and the empirical results demonstrate its superiority over existing methods. Further experiments on MOE2-NAS confirm the important role of knowledge guidance in improving NAS efficiency.
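To make the hybrid design concrete, below is a minimal PyTorch sketch of a CNN-Transformer block in the spirit of the abstract, not the authors' Evo RSFormer implementation: the plain convolutional stem stands in for the evolved Evo CNN blocks that MOE2-NAS would produce, and the layer counts, embedding width, and classification head are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's architecture): a CNN stage for
# fine-grained local features feeding a Transformer encoder for long-range
# contextual dependencies, followed by a simple classification head.
import torch
import torch.nn as nn


class HybridCNNTransformerBlock(nn.Module):
    def __init__(self, in_channels: int = 3, embed_dim: int = 96,
                 num_heads: int = 4, num_classes: int = 45):
        super().__init__()
        # CNN stage: local feature extraction with 4x spatial downsampling.
        # In Evo RSFormer this stage would be an evolved Evo CNN block.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        # Transformer stage: global self-attention over the flattened feature map.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                        # (B, C, H, W) local features
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        tokens = self.transformer(tokens)          # long-range dependencies
        return self.head(tokens.mean(dim=1))       # global average pool + classify


if __name__ == "__main__":
    model = HybridCNNTransformerBlock(num_classes=45)  # e.g. NWPU45 has 45 classes
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 45])
```

In the method described by the abstract, the convolutional stage would instead be the architecture found by the knowledge-guided search, with classification accuracy and model compactness as the two search objectives.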



Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 34, Issue 10, Part 2
Oct. 2024
761 pages

Publisher

IEEE Press

Publication History

Published: 01 October 2024

Qualifiers

  • Research-article
