DOI: 10.1007/978-3-031-72940-9_3
Article

Canonical Shape Projection Is All You Need for 3D Few-Shot Class Incremental Learning

Published: 17 November 2024

Abstract

In recent years, robust pre-trained foundation models have been used successfully in many downstream tasks. Here, we use such models to address few-shot class incremental learning (FSCIL) on 3D point cloud objects. Our approach reprograms the well-known CLIP foundation model, which was trained on pairs of 2D images and text. Because CLIP ingests 2D images, we project each 3D object point cloud onto the 2D image plane to create depth maps. Prior works use a fixed, non-trainable set of camera poses for this projection. In contrast, we train the network to find the projection that best describes the object and is best suited for extracting 2D image features with the CLIP vision encoder. The generated depth maps are not directly suitable as CLIP inputs, so we apply the model reprogramming paradigm to them, augmenting the foreground and background to adapt the depth maps to the encoder. This removes the need to modify or fine-tune the foundation model. In the setting we investigate, data from novel classes is scarce, which leads to overfitting. We address this with prompt engineering, using multiple GPT-generated text descriptions. Our method, C3PR, outperforms existing FSCIL methods on the ModelNet, ShapeNet, ScanObjectNN, and CO3D datasets. The code is available at https://github.com/alichr/C3PR.
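The pipeline described in the abstract (project a point cloud to a depth map from a chosen viewpoint, adapt the depth map for a frozen image encoder, then classify against text embeddings averaged over several descriptions) can be illustrated with the minimal NumPy sketch below. It is not the C3PR implementation and rests on stated assumptions: the viewpoint is supplied as fixed yaw/pitch angles instead of being predicted by a trained network, `reprogram` is a hypothetical additive foreground/background transform standing in for the paper's reprogramming step, and the CLIP image and text encoders are replaced by small hand-written vectors. All function names are hypothetical.

```python
import numpy as np


def rotation_matrix(yaw: float, pitch: float) -> np.ndarray:
    """Rotation about z by `yaw`, then about x by `pitch` (radians)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cx, sx = np.cos(pitch), np.sin(pitch)
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return rx @ rz


def project_to_depth(points: np.ndarray, rot: np.ndarray, size: int = 32) -> np.ndarray:
    """Orthographically project an (N, 3) cloud onto a size x size depth map.

    Each pixel keeps the value of the point nearest the viewer (assumed to sit
    at low z). Background pixels stay 0; foreground values lie in [0.5, 1].
    """
    p = points @ rot.T                    # rotate into the camera frame
    p = p - p.min(axis=0)                 # shift to non-negative coordinates
    p = p / (p.max() + 1e-8)              # isotropic scale into [0, 1)
    u = np.round(p[:, 0] * (size - 1)).astype(int)
    v = np.round(p[:, 1] * (size - 1)).astype(int)
    closeness = 1.0 - 0.5 * p[:, 2]       # larger value = nearer to the viewer
    depth = np.zeros((size, size))
    np.maximum.at(depth, (v, u), closeness)  # z-buffer: keep the nearest point
    return depth


def reprogram(depth: np.ndarray, fg_delta: np.ndarray, bg_value: np.ndarray) -> np.ndarray:
    """Hypothetical reprogramming step: offset foreground pixels and fill the
    background with (in practice learnable) values; the encoder is untouched."""
    foreground = depth > 0
    return np.where(foreground, depth + fg_delta, bg_value)


def text_prototypes(desc_emb: dict) -> tuple:
    """Average several L2-normalised description embeddings per class."""
    names = sorted(desc_emb)
    protos = []
    for name in names:
        e = np.asarray(desc_emb[name], dtype=float)
        e = e / np.linalg.norm(e, axis=1, keepdims=True)
        m = e.mean(axis=0)
        protos.append(m / np.linalg.norm(m))
    return names, np.stack(protos)


def classify(image_feat: np.ndarray, names: list, protos: np.ndarray) -> str:
    """Return the class whose text prototype has the highest cosine similarity."""
    f = image_feat / np.linalg.norm(image_feat)
    return names[int(np.argmax(protos @ f))]


# Usage with stand-in data (random cloud, hand-written 4-dim "embeddings").
rng = np.random.default_rng(0)
cloud = rng.normal(size=(2048, 3))
depth = project_to_depth(cloud, rotation_matrix(0.3, -0.2))
x = reprogram(depth, np.full_like(depth, 0.05), np.full_like(depth, 0.1))
names, protos = text_prototypes({
    "chair": [[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0]],
    "table": [[0.0, 1.0, 0.1, 0.0], [0.1, 0.9, 0.0, 0.0]],
})
print(depth.shape, classify(np.array([0.8, 0.2, 0.0, 0.0]), names, protos))
# prints: (32, 32) chair
```

In this sketch, a class arriving in a later incremental session would only need its description embeddings added to the dictionary and the prototypes recomputed; nothing in the (stand-in) encoders changes, which mirrors the abstract's point that the foundation model itself is not fine-tuned.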

Published In

Guide Proceedings
Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLI
Sep 2024
585 pages
ISBN: 978-3-031-72939-3
DOI: 10.1007/978-3-031-72940-9
Editors:
• Aleš Leonardis
• Elisa Ricci
• Stefan Roth
• Olga Russakovsky
• Torsten Sattler
• Gül Varol

Publisher

Springer-Verlag
Berlin, Heidelberg

Author Tags

1. 3D shape projection
2. Model reprogramming
3. Few-shot class incremental learning
