DMFTNet: dense multimodal fusion transfer network for free-space detection

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Free-space detection is an essential task in autonomous driving; it can be formulated as the semantic segmentation of driving scenes. An important line of research in free-space detection is the use of convolutional neural networks to achieve high-accuracy semantic segmentation. In this study, we introduce two fusion modules: the dense exploration module (DEM) and the dual-attention exploration module (DAEM). They efficiently capture diverse fusion information by fully exploring deep and representative information at each network stage. Furthermore, we propose a dense multimodal fusion transfer network (DMFTNet). This architecture uses elaborate multimodal deep fusion exploration modules, built on the DEM and DAEM, to extract fused features from red–green–blue (RGB) and depth features at every stage and then densely transfers them to predict the free space. Extensive experiments were conducted comparing DMFTNet with 11 state-of-the-art approaches on two datasets. The proposed fusion modules ensured that DMFTNet's free-space-detection performance was superior to that of the compared approaches.
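The abstract describes per-stage fusion of RGB and depth features (via the DEM and DAEM) followed by dense transfer of the fused features to the free-space prediction. As a rough illustration of this kind of architecture, the following is a minimal PyTorch-style sketch; it is not the authors' implementation, and the module internals, channel widths, and names such as StageFusion and DenseRGBDFusionNet are assumptions introduced only for illustration.

```python
# Minimal sketch of a dense RGB-depth fusion segmenter (illustrative only).
# NOT the authors' DMFTNet: module internals, channel widths, and names
# (StageFusion, DenseRGBDFusionNet) are assumptions made to show the idea of
# per-stage RGB-depth fusion plus dense transfer of fused features.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch, stride=2):
    """One encoder stage: strided conv + BN + ReLU (downsamples by 2)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class StageFusion(nn.Module):
    """Fuse RGB and depth features of one stage with channel and spatial
    attention gates (a stand-in for the paper's DEM/DAEM modules)."""

    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 7, padding=3),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        both = torch.cat([rgb_feat, depth_feat], dim=1)
        fused = self.project(both)
        fused = fused * self.channel_gate(both)   # channel attention
        fused = fused * self.spatial_gate(both)   # spatial attention
        return fused


class DenseRGBDFusionNet(nn.Module):
    """Two encoder branches; the fused feature from every stage is upsampled
    and densely concatenated before the free-space prediction head."""

    def __init__(self, widths=(32, 64, 128, 256), num_classes=2):
        super().__init__()
        self.rgb_stages = nn.ModuleList()
        self.depth_stages = nn.ModuleList()
        self.fusions = nn.ModuleList()
        in_rgb, in_depth = 3, 1
        for w in widths:
            self.rgb_stages.append(conv_block(in_rgb, w))
            self.depth_stages.append(conv_block(in_depth, w))
            self.fusions.append(StageFusion(w))
            in_rgb = in_depth = w
        self.head = nn.Conv2d(sum(widths), num_classes, 1)

    def forward(self, rgb, depth):
        fused_feats = []
        x_rgb, x_depth = rgb, depth
        for rgb_stage, depth_stage, fusion in zip(
            self.rgb_stages, self.depth_stages, self.fusions
        ):
            x_rgb = rgb_stage(x_rgb)
            x_depth = depth_stage(x_depth)
            fused_feats.append(fusion(x_rgb, x_depth))
        # Dense transfer: bring every stage's fused feature back to full
        # resolution and concatenate them for the prediction head.
        size = rgb.shape[-2:]
        upsampled = [
            F.interpolate(f, size=size, mode="bilinear", align_corners=False)
            for f in fused_feats
        ]
        return self.head(torch.cat(upsampled, dim=1))


if __name__ == "__main__":
    net = DenseRGBDFusionNet()
    rgb = torch.randn(1, 3, 128, 256)
    depth = torch.randn(1, 1, 128, 256)
    print(net(rgb, depth).shape)  # torch.Size([1, 2, 128, 256])
```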


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grants 62371422 and 61971247) and the Zhejiang Provincial Natural Science Foundation of China (Grant LY18F020012).

Author information


Contributions

Jiabao Ma: contributed to drafting the article and to the analysis and interpretation of data associated with the work. Wujie Zhou: made significant intellectual contributions to the theoretical development, system and experimental design, prototype development, and the analysis and interpretation of data associated with the work. Meixin Fang: reviewed and revised the article for intellectual content and contributed to the system and experimental design. Ting Luo: reviewed and revised the article for intellectual content, contributed to the analysis and interpretation of data and to prototype development, and approved the final version of the article as accepted for publication, including references.

Corresponding author

Correspondence to Wujie Zhou.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by Marie Katsurai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, J., Zhou, W., Fang, M. et al. DMFTNet: dense multimodal fusion transfer network for free-space detection. Multimedia Systems 30, 226 (2024). https://doi.org/10.1007/s00530-024-01417-6


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00530-024-01417-6

Keywords