Research article. DOI: 10.1145/3581783.3611738

SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images

Published: 27 October 2023

Abstract

Interest in multispectral semantic segmentation (MSS) has grown in recent years, thanks to the unique advantages of combining RGB and thermal infrared images to tackle challenging scenarios with adverse conditions. However, unlike traditional RGB-only semantic segmentation, MSS lacks a large-scale dataset, which has hindered progress in the field. To address this issue, we introduce SemanticRT, the largest MSS dataset to date, comprising 11,371 high-quality, pixel-level annotated RGB-thermal image pairs. It is 7 times larger than the existing MFNet dataset and covers a wide variety of challenging scenarios in adverse lighting conditions, such as low light and pitch black. Further, we develop a novel Explicit Complement Modeling (ECM) framework to extract modality-specific information, which is propagated through a robust cross-modal feature encoding and fusion process. Extensive experiments demonstrate the advantages of our approach and dataset over existing counterparts. Our new dataset may also facilitate further development and evaluation of existing and new MSS algorithms.

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. large-scale dataset
    2. multimodal fusion
    3. multispectral images
    4. semantic segmentation
    5. urban scene parsing

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 271
    • Downloads (last 6 weeks): 17
    Reflects downloads up to 10 Nov 2024

    Cited By

    • One-shot In-context Part Segmentation. Proceedings of the 32nd ACM International Conference on Multimedia (2024), 10966-10975. DOI: 10.1145/3664647.3680989. Online publication date: 28-Oct-2024
    • Learning Spectral-Decomposited Tokens for Domain Generalized Semantic Segmentation. Proceedings of the 32nd ACM International Conference on Multimedia (2024), 8159-8168. DOI: 10.1145/3664647.3680906. Online publication date: 28-Oct-2024
    • Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications. Machine Intelligence Research 21:4 (2024), 617-630. DOI: 10.1007/s11633-023-1385-0. Online publication date: 12-Apr-2024
    • DVSOD. Proceedings of the 37th International Conference on Neural Information Processing Systems (2023), 8774-8787. DOI: 10.5555/3666122.3666505. Online publication date: 10-Dec-2023
