
Two-Phase Approach for Monocular Object Detection and 6-DoF Pose Estimation

  • Original Article
  • Published in: Journal of Electrical Engineering & Technology

Abstract

We present a two-phase algorithm that first identifies the categories and 2D proposal regions of 3D objects and then estimates the eight corners of the 3D cuboids bounding the target objects. Given the predicted corners, the six-degrees-of-freedom (6-DoF) poses of the 3D objects are calculated using the conventional perspective-n-point (PnP) algorithm and evaluated with respect to manually annotated corners. In addition, several 3D models with high-quality shapes, texture information, 2D images, and annotations, such as 2D boxes, 3D cuboids, and segmentation masks, are collected, and new objects are included when validating the proposed method. Our results are compared qualitatively and quantitatively with those of a baseline model using the publicly accessible LineMOD dataset, additional annotations in the OCCLUSION dataset, and our own custom dataset. When handling both single and multiple objects in test scenes, the proposed method exhibits clear improvements on all of these datasets and in real-world examples.
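To make the pose-recovery step concrete, the following is a minimal sketch of the second phase under stated assumptions: given the eight cuboid corners of an object model and their predicted image locations, a conventional PnP solver returns the 6-DoF pose. OpenCV's solvePnP is used here as a stand-in for the paper's PnP step; the cuboid size, camera intrinsics, and the simulated "predicted" corners are illustrative placeholders, not values from the paper.

```python
# Sketch of phase two: recover a 6-DoF pose from eight predicted cuboid corners
# with a conventional PnP solver (OpenCV's solvePnP, used here as a stand-in).
import numpy as np
import cv2

# Eight corners of the object's 3D bounding cuboid in the model frame (metres);
# here a 10 cm cube centred at the origin, purely for illustration.
cuboid_3d = np.array([[x, y, z]
                      for x in (-0.05, 0.05)
                      for y in (-0.05, 0.05)
                      for z in (-0.05, 0.05)], dtype=np.float64)

# Nominal pinhole intrinsics (placeholder values, similar in scale to LineMOD).
K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]], dtype=np.float64)

# Simulate the "predicted" 2D corners by projecting the cuboid with a known pose;
# in the actual pipeline these eight points come from the corner-prediction stage.
rvec_gt = np.array([[0.1], [0.2], [0.3]], dtype=np.float64)  # axis-angle rotation
tvec_gt = np.array([[0.0], [0.0], [0.6]], dtype=np.float64)  # 60 cm in front of camera
corners_2d, _ = cv2.projectPoints(cuboid_3d, rvec_gt, tvec_gt, K, None)
corners_2d = corners_2d.reshape(-1, 2)

# Solve PnP on the eight 3D-2D correspondences.
ok, rvec, tvec = cv2.solvePnP(cuboid_3d, corners_2d, K, None,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix
print("Recovered rotation:\n", R)
print("Recovered translation (m):", tvec.ravel())
```

In the proposed pipeline, the 2D corners would instead come from the corner-prediction network, and the recovered rotation and translation would be evaluated against the pose obtained from the manually annotated corners.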

Acknowledgements

This work was supported by the Soongsil University Research Fund (New Professor Support Research) of 2021.

Author information

Corresponding author

Correspondence to Seong-heum Kim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jang, Jh., Lee, J. & Kim, Sh. Two-Phase Approach for Monocular Object Detection and 6-DoF Pose Estimation. J. Electr. Eng. Technol. 19, 1817–1825 (2024). https://doi.org/10.1007/s42835-023-01640-7
