Buffer ladder feature fusion architecture for semantic segmentation improvement

  • Original Paper
Signal, Image and Video Processing

Abstract

Scene semantic segmentation plays an important role in computer vision. For real-time image segmentation, the waterfall atrous spatial pooling network (WASPnet) and the residual-two-network-segmentation (Res2Net-Seg) are two widely used lightweight networks. However, WASPnet often loses local features, and Res2Net-Seg tends to fuse trivial local features during feature extraction. To solve these problems, this paper incorporates the advantages of Res2Net-Seg into WASPnet and extends WASPnet in two respects. First, a buffer ladder, based on the atrous convolution structure and the spatial pyramid pooling architecture, is used to improve deep feature extraction by capturing multi-scale context. Second, the proposed architecture introduces a channel attention mechanism into the decoder, which operates on the score maps output by the proposed structure. Compared with WASPnet, the proposed network increases the mean intersection over union (MIoU) on the Pascal Visual Object Classes (VOC) 2012 and Cityscapes datasets by 2.76% and 3.19%, respectively. The proposed buffer ladder improves not only lightweight networks but also DeepLabv3+, which performs best to date and has a module similar to that of WASPnet: it raises the MIoU of DeepLabv3+ on the Pascal VOC 2012 and Cityscapes datasets by 1.48% and 2.11%, respectively. Finally, this paper evaluates real-time performance on a GTX 2080Ti graphics processing unit, and the results show that the proposed networks are capable of real-time segmentation.
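
The gains above are reported in MIoU. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes mean intersection over union from flat label sequences in plain Python. The function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes for two equal-length sequences of integer labels.

    Hypothetical helper for illustration. Assumes every prediction lies in
    [0, num_classes) and that `ignore_index` marks unlabeled ground truth.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward any class
        if p == t:
            # Pixel is in both the predicted and the true region of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Pixel is in the predicted region of p and the true region of t.
            union[p] += 1
            union[t] += 1
    # Average only over classes that appear in prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmark scores are normally accumulated over a whole validation set rather than per image, so in practice the intersection and union counts would be summed across all images before the final division.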

Data availability

Not applicable.

Funding

This work is supported by Hisense Group.

Author information

Contributions

ZL designed the proposed network, implemented all the experiments, wrote the main manuscript text and prepared Figs. 1, 2, 3 and 4. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhichun Lei.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Z., Lei, Z. Buffer ladder feature fusion architecture for semantic segmentation improvement. SIViP 18, 475–483 (2024). https://doi.org/10.1007/s11760-023-02754-1

  • DOI: https://doi.org/10.1007/s11760-023-02754-1
