DMFTNet: dense multimodal fusion transfer network for free-space detection

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Free-space detection is an essential task in autonomous driving; it can be formulated as the semantic segmentation of driving scenes. An important line of research in free-space detection is the use of convolutional neural networks to achieve high-accuracy semantic segmentation. In this study, we introduce two fusion modules: the dense exploration module (DEM) and the dual-attention exploration module (DAEM). They efficiently capture diverse fusion information by fully exploring deep and representative information at each network stage. Furthermore, we propose a dense multimodal fusion transfer network (DMFTNet). This architecture uses elaborate multimodal deep fusion exploration modules, built on the DEM and DAEM, to extract fused features from red–green–blue (RGB) and depth features at every stage and then densely transfers them to predict the free space. Extensive experiments were conducted comparing DMFTNet with 11 state-of-the-art approaches on two datasets. The proposed fusion modules ensured that DMFTNet's free-space-detection performance was superior to that of the compared approaches.
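The abstract describes per-stage fusion of RGB and depth features (via the DEM and DAEM) followed by dense transfer of the fused features to the free-space prediction. As a rough illustration of this kind of architecture, the following is a minimal PyTorch-style sketch; it is not the authors' implementation, and the module internals, channel widths, and names such as StageFusion and DenseRGBDFusionNet are assumptions introduced only for illustration.

```python
# Minimal sketch of a dense RGB-depth fusion segmenter (illustrative only).
# NOT the authors' DMFTNet: module internals, channel widths, and names
# (StageFusion, DenseRGBDFusionNet) are assumptions made to show the idea of
# per-stage RGB-depth fusion plus dense transfer of fused features.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch, stride=2):
    """One encoder stage: strided conv + BN + ReLU (downsamples by 2)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class StageFusion(nn.Module):
    """Fuse RGB and depth features of one stage with channel and spatial
    attention gates (a stand-in for the paper's DEM/DAEM modules)."""

    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 7, padding=3),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        both = torch.cat([rgb_feat, depth_feat], dim=1)
        fused = self.project(both)
        fused = fused * self.channel_gate(both)   # channel attention
        fused = fused * self.spatial_gate(both)   # spatial attention
        return fused


class DenseRGBDFusionNet(nn.Module):
    """Two encoder branches; the fused feature from every stage is upsampled
    and densely concatenated before the free-space prediction head."""

    def __init__(self, widths=(32, 64, 128, 256), num_classes=2):
        super().__init__()
        self.rgb_stages = nn.ModuleList()
        self.depth_stages = nn.ModuleList()
        self.fusions = nn.ModuleList()
        in_rgb, in_depth = 3, 1
        for w in widths:
            self.rgb_stages.append(conv_block(in_rgb, w))
            self.depth_stages.append(conv_block(in_depth, w))
            self.fusions.append(StageFusion(w))
            in_rgb = in_depth = w
        self.head = nn.Conv2d(sum(widths), num_classes, 1)

    def forward(self, rgb, depth):
        fused_feats = []
        x_rgb, x_depth = rgb, depth
        for rgb_stage, depth_stage, fusion in zip(
            self.rgb_stages, self.depth_stages, self.fusions
        ):
            x_rgb = rgb_stage(x_rgb)
            x_depth = depth_stage(x_depth)
            fused_feats.append(fusion(x_rgb, x_depth))
        # Dense transfer: bring every stage's fused feature back to full
        # resolution and concatenate them for the prediction head.
        size = rgb.shape[-2:]
        upsampled = [
            F.interpolate(f, size=size, mode="bilinear", align_corners=False)
            for f in fused_feats
        ]
        return self.head(torch.cat(upsampled, dim=1))


if __name__ == "__main__":
    net = DenseRGBDFusionNet()
    rgb = torch.randn(1, 3, 128, 256)
    depth = torch.randn(1, 1, 128, 256)
    print(net(rgb, depth).shape)  # torch.Size([1, 2, 128, 256])
```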


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grants 62371422 and 61971247) and the Zhejiang Provincial Natural Science Foundation of China (Grant LY18F020012).

Author information


Contributions

Jiabao Ma: contributed to drafting the article and to the analysis and interpretation of data associated with the work. Wujie Zhou: made significant intellectual contributions to the theoretical development, system and experimental design, prototype development, and the analysis and interpretation of data associated with the work. Meixin Fang: reviewed and revised the article for intellectual content and contributed to the system and experimental design. Ting Luo: reviewed and revised the article for intellectual content, contributed to the analysis and interpretation of data and to prototype development, and approved the final version of the article as accepted for publication, including references.

Corresponding author

Correspondence to Wujie Zhou.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by Marie Katsurai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, J., Zhou, W., Fang, M. et al. DMFTNet: dense multimodal fusion transfer network for free-space detection. Multimedia Systems 30, 226 (2024). https://doi.org/10.1007/s00530-024-01417-6


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00530-024-01417-6

Keywords