Buffer ladder feature fusion architecture for semantic segmentation improvement

Liu, Zonghui; Lei, Zhichun

doi:10.1007/s11760-023-02754-1

Original Paper
Published: 23 September 2023

Volume 18, pages 475–483 (2024)

Signal, Image and Video Processing

Abstract

Scene semantic segmentation plays an important role in computer vision. For real-time image segmentation, the waterfall atrous spatial pooling network (WASPnet) and the residual-two-network segmentation network (Res2Net-Seg) are two widely used lightweight networks. However, during feature extraction WASPnet often loses local features, while Res2Net-Seg tends to fuse trivial local features. To solve these problems, this paper incorporates the advantages of Res2Net-Seg into WASPnet and extends WASPnet in two respects. First, a buffer ladder based on the atrous convolution structure and the spatial pyramid pooling architecture is used to improve deep feature extraction by capturing multi-scale context. Second, the proposed architecture introduces a channel attention mechanism into the decoder; this mechanism exploits the score maps output by the proposed structure. Compared with WASPnet, the proposed network increases the mean intersection over union (MIoU) on the Pascal Visual Object Classes (VOC) 2012 and Cityscapes datasets by 2.76% and 3.19%, respectively. Moreover, the buffer ladder improves not only the lightweight networks but also DeepLabv3+, which performs best to date and has a module similar to that of WASPnet: it improves the MIoU of DeepLabv3+ on Pascal VOC 2012 and Cityscapes by 1.48% and 2.11%, respectively. Finally, this paper demonstrates real-time performance on an RTX 2080 Ti graphics processing unit (GPU), and the results show that the proposed networks are capable of real-time segmentation.
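The results above are reported as MIoU. As a reference for how that metric is commonly computed, the following is a minimal sketch, not the authors' evaluation code. It assumes flattened per-pixel class indices, an ignore label of 255 (a convention often used with Pascal VOC and Cityscapes, assumed here), and a single confusion matrix accumulated over all evaluated pixels, from which each class IoU = TP / (TP + FP + FN) is computed and then averaged. The function names `confusion_matrix` and `mean_iou` are hypothetical.

```python
from typing import List, Sequence

# Assumed "void"/ignore label; pixels with this ground-truth value are skipped.
IGNORE_INDEX = 255


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Accumulate a num_classes x num_classes matrix.

    Rows index the ground-truth class, columns index the predicted class.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == IGNORE_INDEX:
            continue  # void pixels do not count toward any class
        if not (0 <= t < num_classes and 0 <= p < num_classes):
            raise ValueError(f"label out of range: pred={p}, target={t}")
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes absent from both prediction and ground truth are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 255]  # last pixel is ignored
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(pred, target, num_classes=3)
    print(cm)            # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    print(mean_iou(cm))  # (1/2 + 2/3 + 1) / 3 = 13/18
```

Accumulating one confusion matrix over the whole validation set, rather than averaging per-image IoUs, is the usual choice for dataset-level MIoU because per-image scores are unstable for classes that occupy only a few pixels.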

Data availability

Not applicable.

References

Banu, A.S., Deivalakshmi, S.: Awunet: leaf area segmentation based on attention gate and wavelet pooling mechanism. Signal Image Video Process. 17(5), 1915–1924 (2022)
Candan, A.T., Kalkan, H.: U-net-based RGB and LiDAR image fusion for road segmentation. Signal Image Video Process. 17(6), 2837–2843 (2023)
Zhang, L., Lan, C., Fu, L., Mao, X., Zhang, M.: Segmentation of brain tumor MRI image based on improved attention module Unet network. Signal Image Video Process. 17(5), 2277–2285 (2023)
Lee, D.H., Liu, J.L.: End-to-end deep learning of lane detection and path prediction for real-time autonomous driving. Signal Image Video Process. 17(1), 199–205 (2023)
Zhang, Y., Qian, K., Zhu, Z., Yu, H., Zhang, B.: DBA-UNet: a double U-shaped boundary attention network for maxillary sinus anatomical structure segmentation in CBCT images. Signal Image Video Process. 17(5), 2251–2257 (2022)
Marhamati, M., Zadeh, A.A.L., Fard, M.M., Hussain, M.A., Jafarnezhad, K., Jafarnezhad, A., Bakhtoor, M., Momeny, M.: LAIU-NET: a learning-to-augment incorporated robust U-Net for depressed humans' tongue segmentation. Displays 76, 102371 (2023)
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Samy, M., Amer, K., Eissa, K., Shaker, M., Elhelw, M.: Nu-net: deep residual wide field of view convolutional neural network for semantic segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2018)
Liu, W., Rabinovich, A., Berg, A.C.: ParseNet: looking wider to see better. Computer Science (2015)
Artacho, B., Savakis, A.: Waterfall atrous spatial pooling architecture for efficient semantic segmentation. Sensors 19(24), 5361 (2019)
Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017)
Liu, M., Fu, B., Xie, S., He, H., Lan, F., Li, Y., Lou, P., Fan, D.: Comparison of multi-source satellite images for classifying marsh vegetation using DeepLabv3 Plus deep learning algorithm. Ecol. Indic. 125, 107562 (2021)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213–3223 (2016)
Zhou, W., Chen, K.: A lightweight hand gesture recognition in complex backgrounds. Displays 74, 102226 (2022)
Lv, Y., Ma, H., Li, J., Liu, S.: Attention guided U-Net with atrous convolution for accurate retinal vessels segmentation. IEEE Access 8, 32826–32839 (2020)
Yuan, Y., Zengyong, X., Gang, L.: SPEDCCNN: spatial pyramid-oriented encoder–decoder cascade convolution neural network for crop disease leaf segmentation. IEEE Access 9, 14849–14866 (2021)
Tian, Y., Chen, F., Wang, H., Zhang, S.: Real-time semantic segmentation network based on lite reduced atrous spatial pyramid pooling module group. In: 2020 5th International Conference on Control, Robotics and Cybernetics (CRC), pp. 139–143. IEEE (2020)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Wang, J., Pan, Z., Wang, G., Li, M., Li, Y.: Spatial pyramid pooling of selective convolutional features for vein recognition. IEEE Access 6, 28563–28572 (2018)
Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.-R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Barchid, S., Mennesson, J., Djéraba, C.: Review on indoor RGB-D semantic segmentation with deep convolutional neural networks. In: 2021 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1–4. IEEE (2021)
Wu, Z., Shen, C., Van Den Hengel, A.: Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recognit. 90, 119–133 (2019)
Singh, A., Kumar, D.: Detection of stress, anxiety and depression (SAD) in video surveillance using ResNet-101. Microprocess. Microsyst. 95, 104681 (2022)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, pp. 740–755. Springer (2014)
Miao, J., Wei, Y., Wu, Y., Liang, C., Li, G., Yang, Y.: VSPW: a large-scale dataset for video scene parsing in the wild. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4131–4141 (2021)

Funding

This work is supported by Hisense Group.

Author information

Authors and Affiliations

School of Microelectronics, Tianjin University, Tianjin, 300072, China
Zonghui Liu & Zhichun Lei

Contributions

ZL designed the proposed network, implemented all the experiments, wrote the main manuscript text and prepared Figs. 1, 2, 3 and 4. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhichun Lei.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Z., Lei, Z. Buffer ladder feature fusion architecture for semantic segmentation improvement. SIViP 18, 475–483 (2024). https://doi.org/10.1007/s11760-023-02754-1

Received: 15 July 2023
Revised: 10 August 2023
Accepted: 19 August 2023
Published: 23 September 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11760-023-02754-1
