
Crossmodal Few-shot 3D Point Cloud Semantic Segmentation

Authors:

Song Wang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 4760 - 4768

https://doi.org/10.1145/3503161.3548251

Published: 10 October 2022

Abstract

Recently, few-shot 3D point cloud semantic segmentation methods have been introduced to mitigate the limitations of fully supervised approaches, namely heavy dependence on labeled 3D data and poor generalization to new categories. However, these few-shot methods still need one or a few labeled samples as support at test time. In practice, labeling such data requires manual annotation of large numbers of points in 3D space, which is difficult and laborious. To address this problem, we introduce a novel crossmodal few-shot learning approach for 3D point cloud semantic segmentation. In this approach, the point cloud to be segmented is taken as the query, while one or a few labeled 2D RGB images are taken as support to guide the segmentation of the query. This way, only a few 2D support images need to be annotated for the categories of interest. Specifically, we first convert the 2D support images into 3D point cloud format based on both appearance and estimated depth information. We then introduce a co-embedding network that extracts the features of support and query, both in 3D point cloud format, to bridge their domain gap. Finally, we compute the prototypes of the support and use cosine similarity between the prototypes and the query features for the final segmentation. Experimental results on two widely used benchmarks show that, with one or a few labeled 2D images as support, our proposed method achieves competitive results against existing few-shot 3D point cloud semantic segmentation methods.
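The pipeline in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) for lifting an image and its estimated depth to 3D, prototypes computed as per-class mean features, and plain feature arrays standing in for the output of the co-embedding network, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np


def backproject_rgbd(rgb, depth, fx, fy, cx, cy):
    """Lift an H x W RGB image with per-pixel depth to an (H*W, 6) XYZRGB cloud.

    Assumption: pinhole camera model; the intrinsics are illustrative
    inputs, not values taken from the paper.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    xyz = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return np.concatenate([xyz, rgb.reshape(-1, 3)], axis=1)


def class_prototypes(features, labels, num_classes):
    """Per-class mean of support features (assumed prototype definition)."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # classes absent from the support keep a zero prototype
            protos[c] = features[mask].mean(axis=0)
    return protos


def cosine_segment(query_features, prototypes, eps=1e-8):
    """Assign each query point to the prototype with the highest cosine similarity."""
    q = query_features / (np.linalg.norm(query_features, axis=1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    return (q @ p.T).argmax(axis=1)


# Toy usage: a 2 x 2 support image lifted to 3D, then stand-in features.
support_cloud = backproject_rgbd(np.zeros((2, 2, 3)), np.ones((2, 2)), 1.0, 1.0, 0.0, 0.0)
support_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_labels = np.array([0, 0, 1, 1])
query_feats = np.array([[1.0, 0.2], [0.1, 1.0]])
prototypes = class_prototypes(support_feats, support_labels, num_classes=2)
predictions = cosine_segment(query_feats, prototypes)
```

In a real system the support and query features would come from a shared embedding network applied to both point clouds, and the depth map would come from a monocular depth estimator rather than being supplied directly.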

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '22: The 30th ACM International Conference on Multimedia

Sponsor: SIGMM

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 11
Total Downloads: 393
Downloads (Last 12 months): 84
Downloads (Last 6 weeks): 4

Reflects downloads up to 31 Jan 2025
