Abstract
Free-space detection is an essential task in autonomous driving and can be formulated as the semantic segmentation of driving scenes. An important line of research in free-space detection uses convolutional neural networks to achieve high-accuracy segmentation. In this study, we introduce two fusion modules, the dense exploration module (DEM) and the dual-attention exploration module (DAEM), which efficiently capture diverse fused information by fully exploiting the deep, representative features available at each network stage. Building on these modules, we propose the dense multimodal fusion transfer network (DMFTNet). At every stage, DMFTNet extracts fused features from the RGB and depth streams using the DEM and DAEM and then densely transfers them onward to predict the free space. Extensive experiments comparing DMFTNet with 11 state-of-the-art approaches on two datasets show that, owing to the proposed fusion modules, DMFTNet achieves superior free-space-detection performance.
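To make the stage-wise fusion idea concrete, the following is a minimal PyTorch sketch of an attention-gated RGB-depth fusion block in the spirit of the DAEM described above. The module name, structure, and all internals are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: one stage-wise RGB-D fusion block that gates
# concatenated RGB and depth features with channel and spatial attention.
# Names and internals are assumptions, not the paper's DEM/DAEM code.
import torch
import torch.nn as nn


class DualAttentionFusion(nn.Module):
    """Fuses an RGB feature map with a depth feature map at one encoder stage."""

    def __init__(self, channels: int):
        super().__init__()
        # Merge the two modalities back to the stage's channel width.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Channel attention: squeeze-and-excite style per-channel gating.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a single-channel gating map over locations.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        fused = self.reduce(torch.cat([rgb, depth], dim=1))
        fused = fused * self.channel_gate(fused)   # reweight channels
        fused = fused * self.spatial_gate(fused)   # reweight locations
        return fused


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 80, 80)    # stage features from an RGB encoder
    depth = torch.randn(1, 64, 80, 80)  # stage features from a depth encoder
    out = DualAttentionFusion(64)(rgb, depth)
    print(out.shape)  # torch.Size([1, 64, 80, 80])
```

In a dense-transfer network of the kind the abstract describes, one such block per encoder stage would produce fused features that are then densely forwarded to the decoder for free-space prediction.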
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62371422, 61971247) and the Zhejiang Provincial Natural Science Foundation of China (LY18F020012).
Author information
Contributions
Jiabao Ma: drafted the article and contributed to the analysis and interpretation of the data associated with the work. Wujie Zhou: made significant intellectual contributions to the theoretical development, system and experimental design, prototype development, and the analysis and interpretation of the data. Meixin Fang: reviewed and revised the article for intellectual content and contributed to the system and experimental design. Ting Luo: reviewed and revised the article for intellectual content, contributed to prototype development and the analysis and interpretation of the data, and approved the final version of the article as accepted for publication, including references.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by Marie Katsurai.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, J., Zhou, W., Fang, M. et al. DMFTNet: dense multimodal fusion transfer network for free-space detection. Multimedia Systems 30, 226 (2024). https://doi.org/10.1007/s00530-024-01417-6