research-article

SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images

Authors:

Zhicheng Zhang,

Li Cheng

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 3307 - 3316

https://doi.org/10.1145/3581783.3611738

Published: 27 October 2023

Abstract

Interest in multispectral semantic segmentation (MSS) has grown in recent years, thanks to the unique advantages of combining RGB and thermal infrared images to tackle challenging scenarios with adverse conditions. However, unlike traditional RGB-only semantic segmentation, the field has been hindered by the lack of a large-scale MSS dataset. To address this issue, we introduce SemanticRT, the largest MSS dataset to date, comprising 11,371 high-quality, pixel-level annotated RGB-thermal image pairs. It is 7 times larger than the existing MFNet dataset and covers a wide variety of challenging scenarios in adverse lighting conditions, such as low light and pitch black. Further, we develop a novel Explicit Complement Modeling (ECM) framework that extracts modality-specific information, which is then propagated through a robust cross-modal feature encoding and fusion process. Extensive experiments demonstrate the advantages of our approach and dataset over existing counterparts. Our new dataset may also facilitate further development and evaluation of existing and new MSS algorithms.

Index Terms

SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants
Guangzhou Key Research and Development Project
National Natural Science Foundation of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
