DOI: 10.1145/3503161.3548251
research-article

Crossmodal Few-shot 3D Point Cloud Semantic Segmentation

Published: 10 October 2022

Abstract

Few-shot 3D point cloud semantic segmentation methods have recently been introduced to mitigate two limitations of fully supervised approaches: heavy dependence on labeled 3D data and poor generalization to new categories. However, these few-shot methods still need one or a few labeled samples as support at test time. In practice, labeling such data usually requires manually annotating large numbers of points in 3D space, which is difficult and laborious. To address this problem, we introduce a novel crossmodal few-shot learning approach for 3D point cloud semantic segmentation. In this approach, the point cloud to be segmented is taken as the query, while one or a few labeled 2D RGB images are taken as support to guide its segmentation. This way, only a few 2D support images need to be annotated for the categories of interest. Specifically, we first convert the 2D support images into 3D point cloud format based on both appearance and estimated depth information. We then introduce a co-embedding network that extracts features of the support and the query, both in 3D point cloud format, to bridge their domain gap. Finally, we compute the prototypes of the support and use the cosine similarity between the prototypes and the query features for the final segmentation. Experimental results on two widely used benchmarks show that, with one or a few labeled 2D images as support, our proposed method achieves competitive results against existing few-shot 3D point cloud semantic segmentation methods.
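
The three steps described in the abstract (lifting a labeled 2D support image to a point cloud, computing class prototypes, and labeling query points by cosine similarity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, takes the depth map as given rather than estimating it, and replaces the learned co-embedding network with a hypothetical identity function `embed`, so the "features" are simply the raw XYZRGB values.

```python
import numpy as np

def backproject(rgb, depth, fx, fy, cx, cy):
    """Lift an (H, W, 3) image and (H, W) depth map to an (H*W, 6) XYZRGB cloud.

    Assumes a pinhole camera model; the intrinsics are illustrative inputs.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    xyz = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return np.concatenate([xyz, rgb.reshape(-1, 3)], axis=1)

def embed(points):
    """Hypothetical stand-in for the learned co-embedding network (identity)."""
    return points

def class_prototypes(feats, labels, num_classes):
    """Average the support features of each class to get one prototype per class."""
    protos = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(axis=0)
    return protos

def cosine_segment(query_feats, protos, eps=1e-8):
    """Assign each query point to the prototype with the highest cosine similarity."""
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    return (q @ p.T).argmax(axis=1)

# Toy support image: left half is class 0 (red, near), right half is class 1 (blue, far).
labels_2d = np.zeros((4, 4), dtype=int)
labels_2d[:, 2:] = 1
depth = np.where(labels_2d == 1, 3.0, 1.0)  # stands in for an estimated depth map
rgb = np.zeros((4, 4, 3))
rgb[labels_2d == 0] = [1.0, 0.0, 0.0]
rgb[labels_2d == 1] = [0.0, 0.0, 1.0]

support_pc = backproject(rgb, depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
protos = class_prototypes(embed(support_pc), labels_2d.reshape(-1), num_classes=2)

# Toy query cloud: one point resembling each class.
query_pc = np.array([
    [-0.4, 0.1, 1.1, 0.9, 0.1, 0.0],
    [1.4, 0.0, 2.9, 0.1, 0.0, 0.9],
])
pred = cosine_segment(embed(query_pc), protos)
print(pred)
```

With the identity embedding the prototypes are just per-class mean XYZRGB vectors; in the approach the abstract describes, a trained network would produce the features for both support and query before the prototype and similarity steps.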


    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D point cloud
    2. crossmodal
    3. semantic segmentation

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
