Abstract
Salient object detection in remote sensing images (RSI-SOD) aims to identify the most prominent regions in complex RSI scenes. Existing convolutional neural network (CNN)-based approaches struggle to capture long-range dependencies, which limits their performance. To address this, we propose a novel dual-stream semantic interactive network (DSINet). Its dual-stream architecture combines the strengths of Transformers and CNNs, modeling global relationships and local details simultaneously. DSINet comprises three key modules: a multi-scale feature enhancement module that strengthens feature representations across scales, a cross-attention complementary mining module that explores complementary cues between Transformer and CNN features, and a cross-layer feature interaction module that mitigates inconsistencies between adjacent layers. Extensive experiments on benchmark datasets demonstrate that DSINet outperforms state-of-the-art methods and effectively identifies salient objects in challenging RSI scenes. The code and results of our method are available at https://github.com/dqxfj99/DSINet.
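The abstract does not describe how the cross-attention complementary mining module is built. To show only the general idea of two feature streams querying each other, here is a minimal sketch of plain scaled dot-product cross-attention in pure Python. It makes several assumptions not stated in the abstract. Both streams are assumed to be flattened to the same number of tokens N and the same channel width C. Identity query/key/value projections stand in for learned ones. The results are fused by simple summation. The names `cross_attention` and `complementary_fusion` are hypothetical. This is not DSINet's actual module.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens (flattened spatial positions), columns are channels


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention: queries come from one stream, keys and
    values from the other. Identity Q/K/V projections are assumed here
    (hypothetical simplification; a trained module would learn them)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(context))  # N_q x N_ctx similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context)  # N_q x C weighted sum of context tokens


def complementary_fusion(cnn_tokens: Matrix, trans_tokens: Matrix) -> Matrix:
    """Hypothetical fusion: each stream attends to the other, and the two
    original token sets plus both attended results are summed element-wise.
    Assumes both streams share the token count N and channel width C."""
    cnn_from_trans = cross_attention(cnn_tokens, trans_tokens)
    trans_from_cnn = cross_attention(trans_tokens, cnn_tokens)
    return [
        [c + ct + t + tc for c, ct, t, tc in zip(r1, r2, r3, r4)]
        for r1, r2, r3, r4 in zip(cnn_tokens, cnn_from_trans, trans_tokens, trans_from_cnn)
    ]


if __name__ == "__main__":
    cnn = [[1.0, 0.0], [0.0, 1.0]]     # toy local-detail tokens
    trans = [[0.5, 0.5], [1.0, -1.0]]  # toy global-context tokens
    fused = complementary_fusion(cnn, trans)
    print(len(fused), len(fused[0]))   # prints: 2 2
```

Identity projections keep the example short and easy to check by hand. A practical module would more likely use learned linear projections and multiple heads, and it might replace the plain sum with a gated or concatenation-based fusion.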
Data availability
The datasets are available at https://github.com/rmcong/EORSSD-dataset, https://li-chongyi.github.io/proj_optical_saliency.html, and https://github.com/wchao1213/ORSI-SOD.
Acknowledgements
This paper was supported by the National Natural Science Foundation of China (No. 62471124), Heilongjiang Province Natural Science Foundation (No. LH2022F005), and Young Top Talents Fund in the School of Electrical Information Engineering of Northeast Petroleum University (No. DYDQQB202204).
Author information
Contributions
Yanliang Ge provided software and contributed to validation and writing (original draft). Taichuan Liang contributed to methodology, validation, and writing (original draft), and provided software. Junchao Ren contributed to visualization, validation, and writing (review). Jiaxue Chen contributed to data curation, investigation, and validation. Hongbo Bi contributed to methodology and writing (review).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ge, Y., Liang, T., Ren, J. et al. Enhanced salient object detection in remote sensing images via dual-stream semantic interactive network. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03713-8
DOI: https://doi.org/10.1007/s00371-024-03713-8