DOI: 10.1007/978-3-031-72943-0_17
Article

Beyond Viewpoint: Robust 3D Object Recognition Under Arbitrary Views Through Joint Multi-part Representation

Published: 29 November 2024

Abstract

Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but their ability to recognize objects under arbitrary views remains largely unexplored. This is a challenging and realistic setting, since each object may be captured from viewpoints that differ in position and number, and object poses are not aligned. Most view-based methods, which aggregate multiple view features into a single global representation, struggle with 3D object recognition under arbitrary views: because inputs from arbitrary views are unaligned, robust feature aggregation is difficult and performance degrades. In this paper, we introduce a novel Part-aware Network (PANet), a part-based representation that addresses these issues. The part-based representation localizes and characterizes distinct parts of 3D objects, such as airplane wings and tails. Its viewpoint invariance and rotation robustness give it an advantage for 3D object recognition under arbitrary views. Results on benchmark datasets clearly demonstrate that our proposed method outperforms existing view-based aggregation baselines for 3D object recognition under arbitrary views, even surpassing most fixed-viewpoint methods.
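The contrast the abstract draws between global view pooling and a part-based representation can be illustrated with a minimal NumPy sketch. This is not the paper's PANet architecture; it is a hypothetical toy in which a set of "part query" vectors (here random stand-ins for learned parameters) each attends over an arbitrary, unordered set of view features, producing one descriptor per part. Both aggregations are invariant to view order and view count, which is the property the abstract highlights for the arbitrary-view setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one object captured from a varying number of
# unaligned views, each already encoded as a D-dimensional feature.
D, P = 16, 4  # feature dimension, number of part slots (both assumed)

def view_pool(view_feats):
    """Conventional view-based aggregation: element-wise max over views,
    collapsing everything into a single global vector."""
    return view_feats.max(axis=0)

def part_aggregate(view_feats, part_queries):
    """Part-based aggregation sketch: each of the P part queries computes
    softmax attention weights over the views, yielding one descriptor per
    part. The per-part normalization over views makes the result
    independent of how many views were captured and of their ordering."""
    attn = view_feats @ part_queries.T            # (V, P) affinities
    w = np.exp(attn - attn.max(axis=0))
    w = w / w.sum(axis=0)                         # softmax over the V views
    return (w.T @ view_feats).reshape(-1)         # concatenated part descriptors

part_queries = rng.normal(size=(P, D))            # stand-in for learned queries
views_a = rng.normal(size=(7, D))                 # 7 arbitrary views
views_b = views_a[rng.permutation(7)]             # same views, shuffled order

# Both aggregations are permutation-invariant over the view set:
assert np.allclose(view_pool(views_a), view_pool(views_b))
assert np.allclose(part_aggregate(views_a, part_queries),
                   part_aggregate(views_b, part_queries))
```

The toy also shows why a part-based descriptor can be more robust than a single pooled vector: each part slot can specialize on the views in which that part is visible, rather than every view competing for one global representation.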


            Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LII
Sep 2024, 577 pages
ISBN: 978-3-031-72942-3
DOI: 10.1007/978-3-031-72943-0
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. 3D object recognition
2. weakly-supervised learning
