DRMNet: more efficient bilateral networks for real-time semantic segmentation of road scenes

  • Research
  • Published:
Journal of Real-Time Image Processing

Abstract

Semantic segmentation is crucial in autonomous driving because it accurately identifies and delineates objects and regions. However, segmentation accuracy and real-time performance conflict on embedded devices. We propose an efficient, lightweight semantic segmentation network, DRMNet, to address this conflict. DRMNet uses a streamlined bilateral structure that encodes a semantic path and a spatial path, cross-fuses their features during encoding, and uses dedicated skip connections to coordinate upsampling within the semantic path. We design a new self-calibrated aggregate pyramid pooling module (SAPPM) at the end of the semantic branch to capture more comprehensive multi-scale semantic information while balancing the cost of extracting it against inference speed. We also design a new feature fusion module that uses attention to guide the fusion of detail features and semantic features, which keeps semantic information from quickly overwhelming spatial detail information. Experimental results on the Cityscapes, CamVid, and NightCity datasets demonstrate the effectiveness of DRMNet. On a 2080Ti GPU, DRMNet achieves 78.6% mIoU at 88.3 FPS on Cityscapes, 78.9% mIoU at 149 FPS on CamVid, and 53.5% mIoU at 160.4 FPS on NightCity. These results show that DRMNet strikes a good balance between accuracy and real-time performance, making it suitable for embedded devices in autonomous driving applications.
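The mIoU figures above are per-class intersection-over-union scores averaged over classes. As a minimal sketch of how such a score is typically computed (not the authors' evaluation code), the Python below accumulates a confusion matrix over flattened label maps and averages TP / (TP + FP + FN) per class. The ignore label 255 and the choice to skip classes absent from both prediction and ground truth are assumptions that follow common Cityscapes-style practice; the names `confusion_matrix`, `mean_iou`, and `IGNORE_INDEX` are illustrative.

```python
from typing import List, Sequence

# Assumption: Cityscapes-style trainId maps mark "void" pixels with 255.
IGNORE_INDEX = 255


def confusion_matrix(preds: Sequence[int], labels: Sequence[int],
                     num_classes: int, ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred, label in zip(preds, labels):
        if label == ignore_index:
            continue  # ignored pixels count toward no class
        cm[label][pred] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes seen in prediction or ground truth."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # ground-truth c predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy flattened label maps with 3 classes; the last pixel is ignored.
    preds = [0, 0, 1, 1, 2, 2]
    labels = [0, 1, 1, 1, 2, 255]
    cm = confusion_matrix(preds, labels, num_classes=3)
    print(round(mean_iou(cm), 4))  # 0.7222
```

On the toy input the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. FPS is frames divided by wall-clock seconds, so 88.3 FPS on Cityscapes corresponds to roughly 11.3 ms per frame.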


Figs. 1–8


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

We thank all anonymous reviewers for their constructive suggestions. This work was supported by the National Natural Science Foundation of China (62106214), the Science Research Project of Hebei Education Department (CXY2024024), and the Provincial Key Laboratory Performance Subsidy Project (22567612H).

Author information

Contributions

Wenming Zhang: Conceptualization, Methodology, Funding acquisition. Shaotong Zhang: Investigation, Writing – original draft, Writing – review & editing. Yaqian Li: Investigation, Methodology, Data curation. Haibin Li: Formal analysis, Supervision, Project administration. Tao Song: Validation, Software, Visualization.

Corresponding author

Correspondence to Shaotong Zhang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, W., Zhang, S., Li, Y. et al. DRMNet: more efficient bilateral networks for real-time semantic segmentation of road scenes. J Real-Time Image Proc 21, 195 (2024). https://doi.org/10.1007/s11554-024-01579-6

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11554-024-01579-6

Keywords