DetMatch: Two Teachers are Better than One for Joint 2D and 3D Semi-Supervised Object Detection

Park, Jinhyung; Xu, Chenfeng; Zhou, Yiyang; Tomizuka, Masayoshi; Zhan, Wei

doi:10.1007/978-3-031-20080-9_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13670)

Included in the following conference series:

European Conference on Computer Vision

Abstract

While numerous 3D detection works leverage the complementary relationship between RGB images and point clouds, developments in the broader framework of semi-supervised object recognition remain uninfluenced by multi-modal fusion. Current methods develop independent pipelines for 2D and 3D semi-supervised learning despite the availability of paired image and point cloud frames. Observing that the distinct characteristics of each sensor cause them to be biased towards detecting different objects, we propose DetMatch, a flexible framework for joint semi-supervised learning on 2D and 3D modalities. By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels that both demonstrates stronger performance and stymies single-modality error propagation. Further, we leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve localization of 3D boxes. Evaluating our method on the challenging KITTI and Waymo datasets, we improve upon strong semi-supervised learning methods and observe higher quality pseudo-labels. Code will be released here: https://github.com/Divadi/DetMatch.
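The cross-modal agreement idea in the abstract (keeping only objects detected by both sensors as pseudo-labels) can be illustrated with a minimal sketch. It assumes that each 3D detection has already been projected into the image plane as an axis-aligned 2D box, and it uses a simple greedy IoU matching with a hypothetical threshold `iou_thresh`. These choices, along with the names `iou` and `agreed_pairs`, are illustrative assumptions and not necessarily the matching procedure used by the authors.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in image pixels; 3D boxes are assumed to be
# pre-projected into this same 2D format (an assumption of this sketch).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agreed_pairs(
    boxes_2d: List[Box],
    projected_3d: List[Box],
    iou_thresh: float = 0.5,  # hypothetical value, chosen for illustration
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching between 2D and projected 3D detections.

    Returns (index_2d, index_3d) pairs whose overlap is at least iou_thresh.
    Detections left unmatched were seen by only one modality and would be
    excluded from the pseudo-label set in this sketch.
    """
    # Score every 2D/3D combination, best overlap first.
    candidates = sorted(
        (
            (iou(b2, b3), i, j)
            for i, b2 in enumerate(boxes_2d)
            for j, b3 in enumerate(projected_3d)
        ),
        reverse=True,
    )
    used_2d, used_3d = set(), set()
    pairs: List[Tuple[int, int]] = []
    for score, i, j in candidates:
        if score < iou_thresh:
            break  # remaining candidates overlap even less
        if i in used_2d or j in used_3d:
            continue  # each detection may be matched at most once
        used_2d.add(i)
        used_3d.add(j)
        pairs.append((i, j))
    return pairs
```

For example, calling `agreed_pairs` with two 2D boxes and two projected 3D boxes, of which only the first of each overlap, returns a single pair linking those two detections; the remaining single-modality detections are dropped.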

J. Park—Work conducted during visit to University of California, Berkeley.

Acknowledgements

Co-authors from UC Berkeley were sponsored by Berkeley Deep Drive (BDD).

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Jinhyung Park
University of California, Berkeley, CA, 94720, USA
Chenfeng Xu, Yiyang Zhou, Masayoshi Tomizuka & Wei Zhan

Corresponding author

Correspondence to Chenfeng Xu.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6600 KB)

About this paper

Cite this paper

Park, J., Xu, C., Zhou, Y., Tomizuka, M., Zhan, W. (2022). DetMatch: Two Teachers are Better than One for Joint 2D and 3D Semi-Supervised Object Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13670. Springer, Cham. https://doi.org/10.1007/978-3-031-20080-9_22

DOI: https://doi.org/10.1007/978-3-031-20080-9_22
Published: 03 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20079-3
Online ISBN: 978-3-031-20080-9
eBook Packages: Computer Science, Computer Science (R0)
