Abstract
Semantic segmentation is crucial in autonomous driving because it provides accurate identification and delineation of objects and regions. However, segmentation accuracy and real-time performance conflict on embedded devices. We propose an efficient lightweight semantic segmentation network (DRMNet) to address this trade-off. The model employs a streamlined bilateral structure that encodes semantic and spatial paths, cross-fuses features during encoding, and incorporates dedicated skip connections to coordinate upsampling within the semantic path. At the end of the semantic branch, we design a new self-calibrated aggregate pyramid pooling module (SAPPM) to capture more comprehensive multi-scale semantic information while balancing extraction quality and inference speed. Furthermore, we design a new feature fusion module that guides the fusion of detail and semantic features through attention, alleviating the tendency of semantic information to overwhelm spatial detail. Experimental results on the Cityscapes, CamVid, and NightCity datasets demonstrate the effectiveness of DRMNet. On a 2080Ti GPU, DRMNet achieves 78.6% mIoU at 88.3 FPS on Cityscapes, 78.9% mIoU at 149 FPS on CamVid, and 53.5% mIoU at 160.4 FPS on NightCity. These results show that the model strikes a better balance between accuracy and real-time performance, making it suitable for embedded devices in autonomous driving applications.
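The abstract describes the attention-guided feature fusion only at a high level. The following is a minimal, hypothetical PyTorch sketch of one way such a module could weigh high-resolution detail features against upsampled semantic features with a learned per-pixel gate; the class name `AttentionGuidedFusion`, the channel count, and the single-map gating are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: attention-guided fusion of a detail (spatial) branch
# and a semantic branch, assuming a learned per-pixel gate. Not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGuidedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Attention branch: predict a per-pixel weight from the concatenated features.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, detail: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # Upsample the low-resolution semantic features to the detail resolution.
        semantic = F.interpolate(
            semantic, size=detail.shape[2:], mode="bilinear", align_corners=False
        )
        w = self.attn(torch.cat([detail, semantic], dim=1))
        # Weighted sum preserves spatial detail where the attention map favours it,
        # instead of letting the semantic features simply override it.
        return w * detail + (1.0 - w) * semantic


if __name__ == "__main__":
    fuse = AttentionGuidedFusion(channels=64)
    detail = torch.randn(1, 64, 64, 128)   # high-resolution spatial-path features
    semantic = torch.randn(1, 64, 8, 16)   # low-resolution semantic-path features
    print(fuse(detail, semantic).shape)    # torch.Size([1, 64, 64, 128])
```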
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
We thank all anonymous reviewers for their constructive suggestions. This work was supported by the National Natural Science Foundation of China (62106214), the Science Research Project of Hebei Education Department (CXY2024024), and the Provincial Key Laboratory Performance Subsidy Project (22567612H).
Author information
Authors and Affiliations
Contributions
Wenming Zhang: Conceptualization, Methodology, Funding acquisition. Shaotong Zhang: Investigation, Writing—original draft, Writing—review & editing. Yaqian Li: Investigation, Methodology, Data curation. Haibin Li: Formal analysis, Supervision, Project administration. Tao Song: Validation, Software, Visualization.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, W., Zhang, S., Li, Y. et al. DRMNet: more efficient bilateral networks for real-time semantic segmentation of road scenes. J Real-Time Image Proc 21, 195 (2024). https://doi.org/10.1007/s11554-024-01579-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11554-024-01579-6