Abstract
We present a two-phase algorithm that first identifies the categories and 2D proposal regions of 3D objects and then estimates the eight corners of the cuboids bounding the target objects. Given the predicted corners, the six-degrees-of-freedom (6-DoF) poses of the 3D objects are computed with the conventional perspective-n-point (PnP) algorithm and evaluated against manually annotated corners. In addition, we collect several 3D models with high-quality shapes and texture information, along with 2D images and annotations such as 2D boxes, 3D cuboids, and segmentation masks; these newly collected objects are also used to validate the proposed method. Our results are compared qualitatively and quantitatively with those of a baseline model on the publicly available LineMOD dataset, the additional annotations in the OCCLUSION dataset, and our own custom dataset. For both single and multiple objects in the test scenes, the proposed method shows clear improvements on all of these datasets as well as in real-world examples.
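The abstract's second phase rests on a standard geometric relationship: the eight cuboid corners are fixed 3D points in the object's model frame, and their 2D projections under a pinhole camera determine the 6-DoF pose via PnP. The paper's network architecture is not reproduced here, but the underlying camera model can be sketched in a few lines. The dimensions, intrinsics, and pose values below are illustrative assumptions, not values from the paper; in practice the inverse step (2D corners → pose) is solved with an off-the-shelf PnP routine such as OpenCV's `cv2.solvePnP`.

```python
def cuboid_corners(w, h, d):
    """The eight corners of an axis-aligned cuboid centered at the
    model-frame origin, with width w, height h, and depth d."""
    return [(sx * w / 2, sy * h / 2, sz * d / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def project(points, R, t, fx, fy, cx, cy):
    """Pinhole projection of 3D model points under pose (R, t).

    R is a 3x3 rotation (list of rows), t a 3-vector; (fx, fy, cx, cy)
    are the camera intrinsics. Returns pixel coordinates (u, v)."""
    pixels = []
    for X in points:
        # Transform model point into the camera frame: Xc = R @ X + t
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        # Perspective divide and intrinsic scaling
        u = fx * Xc[0] / Xc[2] + cx
        v = fy * Xc[1] / Xc[2] + cy
        pixels.append((u, v))
    return pixels


if __name__ == "__main__":
    # Hypothetical unit cube seen 5 m in front of the camera (identity rotation).
    corners = cuboid_corners(1.0, 1.0, 1.0)
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    pix = project(corners, R, (0.0, 0.0, 5.0), 900, 900, 320, 240)
    for p in pix:
        print(p)
    # Given the predicted 2D corners `pix` and the known 3D corners
    # `corners`, cv2.solvePnP(corners, pix, K, dist) recovers (R, t).
```

Because the 3D corner layout is known once the object category is identified in phase one, the eight predicted 2D corners give exactly the 3D-2D correspondences PnP needs; this is why the method can estimate pose from a single RGB image without depth.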
Acknowledgements
This work was supported by the Soongsil University Research Fund (New Professor Support Research) of 2021.
Cite this article
Jang, Jh., Lee, J. & Kim, Sh. Two-Phase Approach for Monocular Object Detection and 6-DoF Pose Estimation. J. Electr. Eng. Technol. 19, 1817–1825 (2024). https://doi.org/10.1007/s42835-023-01640-7