Abstract
Image semantic segmentation is a fundamental computer vision task and plays an important role in autonomous driving, robot navigation, and many other fields. However, its high computational cost limits deployment on mobile devices. The primary objective of this study is therefore to balance accuracy and inference speed in semantic segmentation. To this end, we propose a real-time semantic segmentation network with Spatial Enhancement (SENet). SENet uses an attention mechanism to strengthen the association between feature maps of different resolutions, and a spatial information branch to retain high-quality spatial features. Enhancing edge information improves the segmentation of object edges, and correlating high-level semantic information with low-level spatial information improves the feature representation. Real-time performance is achieved by using a lightweight feature enhancement module and a backbone network with low computational complexity. We evaluate the effectiveness and efficiency of SENet in several sets of experiments on the PASCAL VOC2012 and Cityscapes datasets. The model achieves 76.37% and 77.23% mIoU, respectively, at 193.3 FPS and 30.8 FPS on an NVIDIA RTX 3080 GPU. These results show that SENet offers a balance between accuracy and inference speed.
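The accuracy figures above are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from flattened per-pixel labels. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that occur in the prediction or the ground truth are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flattened per-pixel class ids of equal length.
    Pixels whose ground-truth label equals ignore_index are skipped
    (255 is an assumed "void" label, not taken from the paper).
    """
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: six pixels, three classes, last pixel is ignored
    print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3))
```

For a dataset-level score, the confusion counts would normally be accumulated over all test images before the per-class ratios are taken, rather than averaging per-image values.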
Funding
No funding was received to assist with the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Huang, Y., Shi, P., He, H. et al. Senet: spatial information enhancement for semantic segmentation neural networks. Vis Comput 40, 3427–3440 (2024). https://doi.org/10.1007/s00371-023-03043-1
DOI: https://doi.org/10.1007/s00371-023-03043-1