Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview

Authors:

Jun He

ACM Computing Surveys, Volume 55, Issue 4

Article No.: 81, Pages 1–40

https://doi.org/10.1145/3524496

Published: 21 November 2022

Abstract

Object pose detection and tracking has recently attracted increasing attention because of its wide applications in areas such as autonomous driving, robotics, and augmented reality. Among the methods for object pose detection and tracking, deep learning is the most promising, having shown better performance than the alternatives. However, a survey of the latest developments in deep learning-based methods is lacking. This study therefore presents a comprehensive review of recent progress in deep learning-based object pose detection and tracking. To allow a more thorough treatment, the scope is limited to methods that take monocular RGB/RGB-D data as input, covering three major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. Metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also reported, together with observations on these results and directions for future research.

References

[1]

Adel Ahmadyan, Tingbo Hou, Jianing Wei, Liangkai Zhang, Artsiom Ablavatski, and Matthias Grundmann. 2020. Instant 3D object tracking with applications in augmented reality. arXiv preprint arXiv:2006.13194 (2020).

[2]

Adel Ahmadyan, Liangkai Zhang, Artsiom Ablavatski, Jianing Wei, and Matthias Grundmann. 2021. Objectron: A large scale dataset of object-centric videos in the wild with pose annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7822–7831.

[3]

Eduardo Arnold, Omar Y. Al-Jarrah, Mehrdad Dianati, Saber Fallah, David Oxtoby, and Alex Mouzakitis. 2019. A survey on 3D object detection methods for autonomous driving applications. IEEE Trans. Intell. Transport. Syst. 20, 10 (2019), 3782–3795.

[4]

Vassileios Balntas, Andreas Doumanoglou, Caner Sahin, Juil Sock, Rigas Kouskouridas, and Tae-Kyun Kim. 2017. Pose guided RGBD feature learning for 3D object pose estimation. In Proceedings of the IEEE International Conference on Computer Vision. 3856–3864.

[5]

Konstantinos Bousmalis, Alex Irpan, Paul Wohlhart, Yunfei Bai, Matthew Kelcey, Mrinal Kalakrishnan, Laura Downs, Julian Ibarz, Peter Pastor, Kurt Konolige, et al. 2018. Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 4243–4250.

Digital Library

[6]

Eric Brachmann, Alexander Krull, Frank Michel, Stefan Gumhold, Jamie Shotton, and Carsten Rother. 2014. Learning 6D object pose estimation using 3D object coordinates. In Proceedings of the European Conference on Computer Vision. Springer, 536–551.

[7]

Eric Brachmann, Frank Michel, Alexander Krull, Michael Ying Yang, Stefan Gumhold, et al. 2016. Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3364–3372.

[8]

Garrick Brazil and Xiaoming Liu. 2019. M3D-RPN: Monocular 3D region proposal network for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9287–9296.

[9]

Garrick Brazil, Gerard Pons-Moll, Xiaoming Liu, and Bernt Schiele. 2020. Kinematic 3D object detection in monocular video. In Proceedings of the European Conference on Computer Vision. Springer, 135–152.

Digital Library

[10]

Yannick Bukschat and Marcus Vetter. 2020. EfficientPose–An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach. arXiv preprint arXiv:2011.04307 (2020).

[11]

Benjamin Busam, Hyun Jun Jung, and Nassir Navab. 2020. I like to move it: 6D pose estimation as an action decision process. arXiv preprint arXiv:2009.12678 (2020).

[12]

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11621–11631.

[13]

Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M. Dollar. 2015. The YCB object and model set: Towards common benchmarks for manipulation research. In Proceedings of the International Conference on Advanced Robotics (ICAR). IEEE, 510–517.

[14]

Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. 2015. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015).

[15]

Bo Chen, Alvaro Parra, Jiewei Cao, Nan Li, and Tat-Jun Chin. 2020. End-to-end learnable geometric vision by backpropagating PnP optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8100–8109.

[16]

Dengsheng Chen, Jun Li, Zheng Wang, and Kai Xu. 2020. Learning canonical shape space for category-level 6D object pose and size estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11973–11982.

[17]

Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 742–751.

Digital Library

[18]

Jiale Chen, Lijun Zhang, Yi Liu, and Chi Xu. 2020. Survey on 6D pose estimation of rigid object. In Proceedings of the 39th Chinese Control Conference (CCC). IEEE, 7440–7445.

[19]

Kai Chen and Qi Dou. 2021. SGPA: Structure-guided prior adaptation for category-level 6D object pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2773–2782.

[20]

Wei Chen, Jinming Duan, Hector Basevi, Hyung Jin Chang, and Ales Leonardis. 2020. PointPoseNet: Point pose network for robust 6D object pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2824–2833.

[21]

Wei Chen, Xi Jia, Hyung Jin Chang, Jinming Duan, and Ales Leonardis. 2020. G2L-Net: Global to local network for real-time 6D pose estimation with embedding vector features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4233–4242.

[22]

Wei Chen, Xi Jia, Hyung Jin Chang, Jinming Duan, Linlin Shen, and Ales Leonardis. 2021. FS-Net: Fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1581–1590.

[23]

Wenzheng Chen, Huan Ling, Jun Gao, Edward Smith, Jaakko Lehtinen, Alec Jacobson, and Sanja Fidler. 2019. Learning to predict 3D objects with an interpolation-based differentiable renderer. Adv. Neural Inf. Process. Syst. 32 (2019), 9609–9619.

[24]

Xuzhan Chen, Youping Chen, Bang You, Jingming Xie, and Homayoun Najjaran. 2020. Detecting 6D poses of target objects from cluttered scenes by learning to align the point cloud patches with the CAD models. IEEE Access 8 (2020), 210640–210650.

[25]

Xu Chen, Zijian Dong, Jie Song, Andreas Geiger, and Otmar Hilliges. 2020. Category level object pose estimation via neural analysis-by-synthesis. In Proceedings of the European Conference on Computer Vision. Springer, 139–156.

Digital Library

[26]

Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, and Raquel Urtasun. 2016. Monocular 3D object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2147–2156.

[27]

Pietro Cipresso, Irene Alice Chicchi Giglioli, Mariano Alcañiz Raya, and Giuseppe Riva. 2018. The past, present, and future of virtual and augmented reality research: A network and cluster analysis of the literature. Front. Psychol. 9 (2018), 2086.

[28]

Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Timothy Bretl, and Dieter Fox. 2021. PoseRBPF: A Rao–Blackwellized particle filter for 6-D object pose tracking. IEEE Trans. Robot. 37, 5 (2021), 1328–1342.

[29]

Xinke Deng, Yu Xiang, Arsalan Mousavian, Clemens Eppner, Timothy Bretl, and Dieter Fox. 2020. Self-supervised 6D object pose estimation for robot manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 3665–3671.

[30]

Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, and Ping Luo. 2020. Learning depth-guided convolutions for monocular 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 1000–1001.

[31]

Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. 2015. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. 2758–2766.

Digital Library

[32]

Guoguang Du, Kai Wang, and Shiguo Lian. 2019. Vision-based robotic grasping from object localization pose estimation grasp detection to motion planning: A review. arXiv preprint arXiv:1905.06658 (2019).

[33]

Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. 2019. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 6569–6578.

[34]

Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. 2018. Learning SO (3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV). 52–68.

Digital Library

[35]

Zhaoxin Fan, Zhengbo Song, Jian Xu, Zhicheng Wang, Kejian Wu, Hongyan Liu, and Jun He. 2021. ACR-Pose: Adversarial canonical representation reconstruction network for category level 6D object pose estimation. arXiv preprint arXiv:2111.10524 (2021).

[36]

Duarte Fernandes, António Silva, Rafael Névoa, Cláudia Simões, Dibet Gonzalez, Miguel Guevara, Paulo Novais, João Monteiro, and Pedro Melo-Pinto. 2021. Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy. Inf. Fusion 68 (2021), 161–191.

[37]

Ge Gao, Mikko Lauri, Xiaolin Hu, Jianwei Zhang, and Simone Frintrop. 2021. CloudAAE: Learning 6D object pose regression with on-line data synthesis on point clouds. arXiv preprint arXiv:2103.01977 (2021).

[38]

Ge Gao, Mikko Lauri, Yulong Wang, Xiaolin Hu, Jianwei Zhang, and Simone Frintrop. 2020. 6D object pose regression via supervised learning on point clouds. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 3643–3649.

[39]

Tianze Gao, Huihui Pan, and Huijun Gao. 2020. Monocular 3D object detection with sequential feature association and depth hint augmentation. arXiv preprint arXiv:2011.14589 (2020).

[40]

Mathieu Garon and Jean-François Lalonde. 2017. Deep 6-DOF tracking. IEEE Trans. Visualiz. Comput. Graph. 23, 11 (2017), 2410–2418.

Digital Library

[41]

Michele Gattullo, Giulia Wally Scurati, Michele Fiorentino, Antonio Emmanuele Uva, Francesco Ferrise, and Monica Bordegoni. 2019. Towards augmented reality manuals for industry 4.0: A methodology. Robot. Comput.-integ. Manuf. 56 (2019), 276–286.

[42]

Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3354–3361.

[43]

Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu. 2020. A survey of deep learning techniques for autonomous driving. J. Field Robot. 37, 3 (2020), 362–386.

[44]

Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. 2020. Deep learning for 3D point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 43, 12 (2020), 4338–4364.

Digital Library

[45]

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 2961–2969.

[46]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.

[47]

Yisheng He, Haibin Huang, Haoqiang Fan, Qifeng Chen, and Jian Sun. 2021. FFB6D: A full flow bidirectional fusion network for 6D pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3003–3013.

[48]

Yisheng He, Wei Sun, Haibin Huang, Jianran Liu, Haoqiang Fan, and Jian Sun. 2020. PVN3D: A deep point-wise 3D keypoints voting network for 6DoF pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11632–11641.

[49]

Stefan Hinterstoisser, Cedric Cagniart, Slobodan Ilic, Peter Sturm, Nassir Navab, Pascal Fua, and Vincent Lepetit. 2011. Gradient response maps for real-time detection of textureless objects. IEEE Trans. Pattern Anal. Mach. Intell. 34, 5 (2011), 876–888.

Digital Library

[50]

Stefan Hinterstoisser, Stefan Holzer, Cedric Cagniart, Slobodan Ilic, Kurt Konolige, Nassir Navab, and Vincent Lepetit. 2011. Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In Proceedings of the International Conference on Computer Vision. IEEE, 858–865.

Digital Library

[51]

Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. 2012. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In Proceedings of the Asian Conference on Computer Vision. Springer, 548–562.

Digital Library

[52]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computat. 9, 8 (1997), 1735–1780.

Digital Library

[53]

Tomas Hodan, Daniel Barath, and Jiri Matas. 2020. EPOS: Estimating 6D pose of objects with symmetries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11703–11712.

[54]

Tomáš Hodan, Pavel Haluza, Štepán Obdržálek, Jiri Matas, Manolis Lourakis, and Xenophon Zabulis. 2017. T-LESS: An RGB-D dataset for 6D pose estimation of texture-less objects. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 880–888.

[55]

Tomáš Hodaň, Jiří Matas, and Štěpán Obdržálek. 2016. On evaluation of 6D object pose estimation. In Proceedings of the European Conference on Computer Vision. Springer, 606–619.

[56]

Tingbo Hou, Adel Ahmadyan, Liangkai Zhang, Jianing Wei, and Matthias Grundmann. 2020. MobilePose: Real-time pose estimation for unseen objects with weak shape supervision. arXiv preprint arXiv:2003.03522 (2020).

[57]

Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krahenbuhl, Trevor Darrell, and Fisher Yu. 2019. Joint monocular 3D vehicle detection and tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5390–5399.

[58]

Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, and Min Sun. 2021. Monocular quasi-dense 3D object tracking. arXiv preprint arXiv:2103.07351 (2021).

[59]

Yinlin Hu, Pascal Fua, Wei Wang, and Mathieu Salzmann. 2020. Single-stage 6D object pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2930–2939.

[60]

Yinlin Hu, Joachim Hugonot, Pascal Fua, and Mathieu Salzmann. 2019. Segmentation-driven 6D object pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3385–3394.

[61]

Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. 2018. The ApolloScape dataset for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 954–960.

[62]

Daniel P. Huttenlocher, Gregory A. Klanderman, and William J. Rucklidge. 1993. Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15, 9 (1993), 850–863.

Digital Library

[63]

María-Blanca Ibáñez and Carlos Delgado-Kloos. 2018. Augmented reality for STEM learning: A systematic review. Comput. Educ. 123 (2018), 109–123.

Digital Library

[64]

Omid Hosseini Jafari, Siva Karthik Mustikovela, Karl Pertsch, Eric Brachmann, and Carsten Rother. 2018. iPose: Instance-aware 6D pose estimation of partly occluded objects. In Proceedings of the Asian Conference on Computer Vision. Springer, 477–492.

[65]

Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, and Konstantinos Bousmalis. 2019. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12627–12637.

[66]

Eskil Jörgensen, Christopher Zach, and Fredrik Kahl. 2019. Monocular 3D object detection and box fitting trained end-to-end using intersection-over-union loss. arXiv preprint arXiv:1906.08070 (2019).

[67]

Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. 35–45.

[68]

Roman Kaskman, Sergey Zakharov, Ivan Shugurov, and Slobodan Ilic. 2019. HomebrewedDB: RGB-D dataset for 6D pose estimation of 3D objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops.

[69]

Wadim Kehl, Fabian Manhardt, Federico Tombari, Slobodan Ilic, and Nassir Navab. 2017. SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again. In Proceedings of the IEEE International Conference on Computer Vision. 1521–1529.

[70]

Wadim Kehl, Fausto Milletari, Federico Tombari, Slobodan Ilic, and Nassir Navab. 2016. Deep learning of local RGB-D patches for 3D object detection and 6D pose estimation. In Proceedings of the European Conference on Computer Vision. Springer, 205–220.

[71]

Yoon Kim and Alexander M. Rush. 2016. Sequence-level knowledge distillation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1317–1327.

[72]

Kilian Kleeberger, Richard Bormann, Werner Kraus, and Marco F. Huber. 2020. A survey on learning-based robotic grasping. Curr. Robot. Rep. 1, 4 (2020), 239–249.

[73]

Alexander Krull, Eric Brachmann, Frank Michel, Michael Ying Yang, Stefan Gumhold, and Carsten Rother. 2015. Learning analysis-by-synthesis for 6D pose estimation in RGB-D images. In Proceedings of the IEEE International Conference on Computer Vision. 954–962.

Digital Library

[74]

Alexander Krull, Frank Michel, Eric Brachmann, Stefan Gumhold, Stephan Ihrke, and Carsten Rother. 2014. 6-DoF model based tracking via object coordinate regression. In Proceedings of the Asian Conference on Computer Vision. Springer, 384–399.

[75]

Harold W. Kuhn. 1955. The Hungarian method for the assignment problem. Naval Res. Logist. Quart. 2, 1–2 (1955), 83–97.

[76]

Taeyeop Lee, Byeong-Uk Lee, Myungchul Kim, and In So Kweon. 2021. Category-level metric scale object shape and pose estimation. IEEE Robot. Automat. Lett. 6, 4 (2021), 8575–8582.

[77]

Felix Leeb, Arunkumar Byravan, and Dieter Fox. 2019. Motion-Nets: 6D tracking of unknown objects in unseen environments using RGB. arXiv preprint arXiv:1910.13942 (2019).

[78]

Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. 2009. EPnP: An accurate O (n) solution to the PnP problem. Int. J. Comput. Vis. 81, 2 (2009), 155.

Digital Library

[79]

Jesse Levinson, Jake Askeland, Jan Becker, Jennifer Dolson, David Held, Soeren Kammel, J. Zico Kolter, Dirk Langer, Oliver Pink, Vaughan Pratt, et al. 2011. Towards fully autonomous driving: Systems and algorithms. In Proceedings of the IEEE Intelligent Vehicles Symposium. IEEE, 163–168.

[80]

Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, and Xiaogang Wang. 2019. GS3D: An efficient 3D object detection framework for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1019–1028.

[81]

Peixuan Li and Zhao Huaici. 2021. Monocular 3D detection with geometric constraint embedding and semi-supervised training. IEEE Robot. Automat. Lett. 6, 3 (2021), 5565–5572.

[82]

Peixuan Li, Huaici Zhao, Pengfei Liu, and Feidao Cao. 2020. RTM3D: Real-time monocular 3D detection from object keypoints for autonomous driving. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer, 644–660.

Digital Library

[83]

Xiaolong Li, He Wang, Li Yi, Leonidas J. Guibas, A. Lynn Abbott, and Shuran Song. 2020. Category-level articulated object pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3706–3715.

[84]

Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, and Dieter Fox. 2018. DeepIM: Deep iterative matching for 6D pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV). 683–698.

Digital Library

[85]

Zhigang Li, Yinlin Hu, Mathieu Salzmann, and Xiangyang Ji. 2020. Robust RGB-based 6-DoF pose estimation without real pose annotations. arXiv preprint arXiv:2008.08391 (2020).

[86]

Zhigang Li, Gu Wang, and Xiangyang Ji. 2019. CDPN: Coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7678–7687.

[87]

Jiehong Lin, Zewei Wei, Zhihao Li, Songcen Xu, Kui Jia, and Yuanqing Li. 2021. DualPoseNet: Category-level 6D object pose and size estimation using dual pose network with refined learning of pose consistency. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 3560–3569.

[88]

Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, and Stan Birchfield. 2021. Single-stage keypoint-based category-level object pose estimation from an RGB image. arXiv preprint arXiv:2109.06161 (2021).

[89]

Lijie Liu, Jiwen Lu, Chunjing Xu, Qi Tian, and Jie Zhou. 2019. Deep fitting degree scoring network for monocular 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1057–1066.

[90]

Lijie Liu, Chufan Wu, Jiwen Lu, Lingxi Xie, Jie Zhou, and Qi Tian. 2020. Reinforced axial refinement network for monocular 3D object detection. In Proceedings of the European Conference on Computer Vision. Springer, 540–556.

Digital Library

[91]

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision. Springer, 21–37.

[92]

Yuxuan Liu, Yuan Yixuan, and Ming Liu. 2021. Ground-aware monocular 3D object detection for autonomous driving. IEEE Robot. Automat. Lett. 6, 2 (2021), 919–926.

[93]

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the International Conference on Computer Vision (ICCV).

[94]

Zechen Liu, Zizhang Wu, and Roland Tóth. 2020. SMOKE: Single-stage monocular 3D object detection via keypoint estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 996–997.

[95]

David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 2 (2004), 91–110.

Digital Library

[96]

Xinzhu Ma, Shinan Liu, Zhiyi Xia, Hongwen Zhang, Xingyu Zeng, and Wanli Ouyang. 2020. Rethinking pseudo-LiDAR representation. In Proceedings of the European Conference on Computer Vision. Springer, 311–327.

Digital Library

[97]

F. Manhardt, G. Wang, B. Busam, et al. 2020. CPS++: Improving class-level 6D pose and shape estimation from monocular images with self-supervised learning[J]. arXiv preprint arXiv:2003.0584.

[98]

Mateusz Majcher and Bogdan Kwolek. 2020. 3D model-based 6D object pose tracking on RGB images using particle filtering and heuristic optimization. In VISIGRAPP (5: VISAPP). 690–697.

[99]

Fabian Manhardt, Wadim Kehl, and Adrien Gaidon. 2019. ROI-10D: Monocular lifting of 2D detection to 6D pose and metric shape. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2069–2078.

[100]

Fabian Manhardt, Wadim Kehl, Nassir Navab, and Federico Tombari. 2018. Deep model-based 6D pose refinement in RGB. In Proceedings of the European Conference on Computer Vision (ECCV). 800–815.

Digital Library

[101]

Fabian Manhardt, Gu Wang, Benjamin Busam, Manuel Nickel, Sven Meier, Luca Minciullo, Xiangyang Ji, and Nassir Navab. 2020. CPS++: Improving class-level 6D pose and shape estimation from monocular images with self-supervised learning. arXiv e-prints (2020).

[102]

Isidoros Marougkas, Petros Koutras, Nikos Kardaris, Georgios Retsinas, Georgia Chalvatzaki, and Petros Maragos. 2020. How to track your dragon: A multi-attentional framework for real-time RGB-D 6-DoF object pose tracking. In Proceedings of the European Conference on Computer Vision. Springer, 682–699.

Digital Library

[103]

Jonathan Masci, Ueli Meier, Dan Cireşan, and Jürgen Schmidhuber. 2011. Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the International Conference on Artificial Neural Networks. Springer, 52–59.

[104]

Markus Maurer, J. Christian Gerdes, Barbara Lenz, and Hermann Winner. 2016. Autonomous Driving: Technical, Legal and Social Aspects. Springer Nature.

[105]

Douglas Morrison, Peter Corke, and Jürgen Leitner. 2018. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. In Proceedings of the Conference on Robotics: Science and Systems (RSS).

[106]

Arsalan Mousavian, Dragomir Anguelov, John Flynn, and Jana Kosecka. 2017. 3D bounding box estimation using deep learning and geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7074–7082.

[107]

Yair Movshovitz-Attias, Takeo Kanade, and Yaser Sheikh. 2016. How useful is photo-realistic rendering for visual learning? In Proceedings of the European Conference on Computer Vision. Springer, 202–217.

[108]

Apurv Nigam, Adrian Penate-Sanchez, and Lourdes Agapito. 2018. Detect globally, label locally: Learning accurate 6-DoF object pose estimation by joint segmentation and coordinate regression. IEEE Robot. Automat. Lett. 3, 4 (2018), 3960–3967.

[109]

Markus Oberweger, Mahdi Rad, and Vincent Lepetit. 2018. Making deep heatmaps robust to partial occlusions for 3D object pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV). 119–134.

Digital Library

[110]

Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, and Adrien Gaidon. 2021. Is pseudo-LiDAR needed for monocular 3D object detection? In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3142–3152.

[111]

Kiru Park, Timothy Patten, Johann Prankl, and Markus Vincze. 2019. Multi-task template matching for object detection, segmentation and pose estimation using depth images. In Proceedings of the International Conference on Robotics and Automation (ICRA). IEEE, 7207–7213.

Digital Library

[112]

Kiru Park, Timothy Patten, and Markus Vincze. 2019. Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7668–7677.

[113]

Aniruddha Patil and Pankaj Rabha. 2019. A survey on joint object detection and pose estimation using monocular vision. MATEC Web Conf. 277 (01 2019), 02029. DOI:

[114]

Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G. Derpanis, and Kostas Daniilidis. 2017. 6-DoF object pose from semantic keypoints. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2011–2018.

Digital Library

[115]

Jon Peddie. 2017. Augmented Reality: Where We Will All Live. Springer.

[116]

Liang Peng, Fei Liu, Senbo Yan, Xiaofei He, and Deng Cai. 2021. OCM3D: Object-centric monocular 3D object detection. arXiv preprint arXiv:2104.06041 (2021).

[117]

Liang Peng, Fei Liu, Zhengxu Yu, Senbo Yan, Dan Deng, and Deng Cai. 2021. Lidar point cloud guided monocular 3D object detection. arXiv preprint arXiv:2104.09035 (2021).

[118]

Sida Peng, Yuan Liu, Qixing Huang, Xiaowei Zhou, and Hujun Bao. 2019. PVNet: Pixel-wise voting network for 6DoF pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4561–4570.

[119]

Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652–660.

[120]

Rui Qian, Divyansh Garg, Yan Wang, Yurong You, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger, and Wei-Lun Chao. 2020. End-to-end pseudo-LiDAR for image-based 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5881–5890.

[121]

Zengyi Qin, Jinglu Wang, and Yan Lu. 2019. MonoGRNet: A geometric reasoning network for monocular 3D object localization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 8851–8858.

Digital Library

[122]

Mahdi Rad and Vincent Lepetit. 2017. BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In Proceedings of the IEEE International Conference on Computer Vision. 3828–3836.

[123]

Cody Reading, Ali Harakeh, Julia Chae, and Steven L. Waslander. 2021. Categorical depth distribution network for monocular 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8555–8564.

[124]

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779–788.

[125]

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2016. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 6 (2016), 1137–1149.

Digital Library

[126]

Caner Sahin, Guillermo Garcia-Hernando, Juil Sock, and Tae-Kyun Kim. 2019. Instance-and category-level 6D object pose estimation. In RGB-D Image Analysis and Processing. Springer, 243–265.

[127]

Caner Sahin, Guillermo Garcia-Hernando, Juil Sock, and Tae-Kyun Kim. 2020. A review on object pose recovery: From 3D bounding box detectors to full 6D pose estimators. Image Vis. Comput. 96 (2020), 103898.

[128]

Caner Sahin and Tae-Kyun Kim. 2018. Category-level 6D object pose recovery in depth images. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops.

[129]

Caner Sahin and Tae-Kyun Kim. 2018. Recovering 6D object pose: A review and multi-modal analysis. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops.

[130]

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510–4520.

[131]

Xuepeng Shi, Zhixiang Chen, and Tae-Kyun Kim. 2020. Distance-normalized unified representation for monocular 3D object detection. In Proceedings of the European Conference on Computer Vision. Springer, 91–107.

[132]

Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. 2013. Scene coordinate regression forests for camera relocalization in RGB-D images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2930–2937.

[133]

Mennatullah Siam, Sepehr Valipour, Martin Jagersand, and Nilanjan Ray. 2017. Convolutional gated recurrent networks for video segmentation. In Proceedings of the IEEE International Conference on Image Processing (ICIP). IEEE, 3090–3094.

[134]

Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, and Elisa Ricci. 2020. Demystifying Pseudo-LiDAR for monocular 3D object detection. arXiv preprint arXiv:2012.05796 (2020).

[135]

Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Manuel López-Antequera, and Peter Kontschieder. 2019. Disentangling monocular 3D object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1991–1999.

[136]

Juil Sock, Guillermo Garcia-Hernando, Anil Armagan, and Tae-Kyun Kim. 2020. Introducing pose consistency and warp-alignment for self-supervised 6D object pose estimation in color images. In Proceedings of the International Conference on 3D Vision (3DV). IEEE, 291–300.

[137]

Chen Song, Jiaru Song, and Qixing Huang. 2020. HybridPose: 6D object pose estimation under hybrid representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 431–440.

[138]

Carsten Steger. 2001. Similarity measures for occlusion, clutter, and illumination invariant object recognition. In Proceedings of the Joint Pattern Recognition Symposium. Springer, 148–154.

[139]

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, and Rudolph Triebel. 2018. Implicit 3D orientation learning for 6D object detection from RGB images. In Proceedings of the European Conference on Computer Vision (ECCV). 699–715.

[140]

Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. PMLR, 6105–6114.

[141]

Mingxing Tan, Ruoming Pang, and Quoc V. Le. 2020. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10781–10790.

[142]

Bugra Tekin, Sudipta N. Sinha, and Pascal Fua. 2018. Real-time seamless single shot 6D object pose prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 292–301.

[143]

Meng Tian, Marcelo H. Ang, and Gim Hee Lee. 2020. Shape prior deformation for categorical 6D object pose and size estimation. In Proceedings of the European Conference on Computer Vision. Springer, 530–546.

[144]

Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 23–30.

[145]

Ameni Trabelsi, Mohamed Chaabane, Nathaniel Blanchard, and Ross Beveridge. 2021. A pose proposal and refinement network for better 6D object pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2382–2391.

[146]

Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, and Stan Birchfield. 2018. Deep object pose estimation for semantic robotic grasping of household objects. In Proceedings of the Conference on Robot Learning (CoRL).

[147]

Shinji Umeyama. 1991. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13, 4 (1991), 376–380.

[148]

Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, and Andrew J. Davison. 2020. MoreFusion: Multi-object reasoning for 6D pose estimation from volumetric fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14540–14549.

[149]

Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, and Yuke Zhu. 2020. 6-PACK: Category-level 6D pose tracker with anchor-based keypoints. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 10059–10066.

[150]

Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, and Silvio Savarese. 2019. DenseFusion: 6D object pose estimation by iterative dense fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3343–3352.

[151]

Gu Wang, Fabian Manhardt, Jianzhun Shao, Xiangyang Ji, Nassir Navab, and Federico Tombari. 2020. Self6D: Self-supervised monocular 6D object pose estimation. In Proceedings of the European Conference on Computer Vision. Springer, 108–125.

[152]

Gu Wang, Fabian Manhardt, Federico Tombari, and Xiangyang Ji. 2021. GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16611–16621.

[153]

He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, and Leonidas J. Guibas. 2019. Normalized object coordinate space for category-level 6D object pose and size estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2642–2651.

[154]

Jiadai Wang, Jiajia Liu, and Nei Kato. 2018. Networking and communications in autonomous driving: A survey. IEEE Commun. Surv. Tutor. 21, 2 (2018), 1243–1274.

[155]

Tai Wang, Xinge Zhu, Jiangmiao Pang, and Dahua Lin. 2021. FCOS3D: Fully convolutional one-stage monocular 3D object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.

[156]

Tai Wang, Xinge Zhu, Jiangmiao Pang, and Dahua Lin. 2021. Probabilistic and geometric depth: Detecting objects in perspective. arXiv preprint arXiv:2107.14160 (2021).

[157]

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

[158]

Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, and Kilian Q. Weinberger. 2019. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8445–8453.

[159]

Jianing Wei, Genzhi Ye, Tyler Mullen, Matthias Grundmann, Adel Ahmadyan, and Tingbo Hou. 2019. Instant motion tracking and its applications to augmented reality. arXiv preprint arXiv:1907.06796 (2019).

[160]

Bowen Wen, Chaitanya Mitash, Baozhang Ren, and Kostas E. Bekris. 2020. se(3)-TrackNet: Data-driven 6D pose tracking by calibrating image residuals in synthetic domains. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10367–10373.

[161]

Xinshuo Weng and Kris Kitani. 2019. Monocular 3D object detection with pseudo-LiDAR point cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops.

[162]

Xinshuo Weng, Jianren Wang, David Held, and Kris Kitani. 2020. 3D multi-object tracking: A baseline and new evaluation metrics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10359–10366.

[163]

Xinshuo Weng, Ye Yuan, and Kris Kitani. 2020. Joint 3D tracking and forecasting with graph neural network and diversity sampling. arXiv preprint arXiv:2003.07847 (2020).

[164]

Paul Wohlhart and Vincent Lepetit. 2015. Learning descriptors for object recognition and 3D pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3109–3118.

[165]

Yangzheng Wu, Mohsen Zand, Ali Etemad, and Michael Greenspan. 2021. Vote from the center: 6 DoF pose estimation in RGB-D images by radial keypoint voting. arXiv preprint arXiv:2104.02527 (2021).

[166]

Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. 2017. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017).

[167]

Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 802–810.

[168]

Zongxin Yang, Xin Yu, and Yi Yang. 2021. DSC-PoseNet: Learning 6DoF object pose estimation via dual-scale consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3907–3916.

[169]

Xiaoqing Ye, Liang Du, Yifeng Shi, Yingying Li, Xiao Tan, Jianfeng Feng, Errui Ding, and Shilei Wen. 2020. Monocular 3D object detection via feature domain adaptation. In Proceedings of the European Conference on Computer Vision. Springer, 17–34.

[170]

Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. 2020. iNeRF: Inverting neural radiance fields for pose estimation. arXiv preprint arXiv:2012.05877 (2020).

[171]

Yurong You, Yan Wang, Wei-Lun Chao, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, and Kilian Q. Weinberger. 2020. Pseudo-LiDAR++: Accurate depth for 3D object detection in autonomous driving. In Proceedings of the International Conference on Learning Representations (ICLR).

[172]

Xin Yu, Zheyu Zhuang, Piotr Koniusz, and Hongdong Li. 2020. 6DoF object pose estimation via differentiable proxy voting loss. In Proceedings of the British Machine Vision Conference (BMVC).

[173]

Sergey Zakharov, Wadim Kehl, Benjamin Planche, Andreas Hutter, and Slobodan Ilic. 2017. 3D object instance recognition and pose estimation using triplet loss with dynamic margin. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 552–559.

[174]

Sergey Zakharov, Ivan Shugurov, and Slobodan Ilic. 2019. DPOD: 6D pose object detector and refiner. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1941–1950.

[175]

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6848–6856.

[176]

Zelin Zhao, Gao Peng, Haoyu Wang, Hao-Shu Fang, Chengkun Li, and Cewu Lu. 2018. Estimating 6D pose from localizing designated surface keypoints. arXiv preprint arXiv:1812.01387 (2018).

[177]

Leisheng Zhong, Yu Zhang, Hao Zhao, An Chang, Wenhao Xiang, Shunli Zhang, and Li Zhang. 2020. Seeing through the occluders: Robust monocular 6-DoF object pose tracking via model-guided video object segmentation. IEEE Robot. Automat. Lett. 5, 4 (2020), 5159–5166.

[178]

Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. 2020. Tracking objects as points. In Proceedings of the European Conference on Computer Vision. Springer, 474–490.

[179]

Xichuan Zhou, Yicong Peng, Chunqiao Long, Fengbo Ren, and Cong Shi. 2020. MoNet3D: Towards accurate monocular 3D object localization in real time. In Proceedings of the International Conference on Machine Learning. PMLR, 11503–11512.

[180]

Chenchen Zhu, Yihui He, and Marios Savvides. 2019. Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 840–849.

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene understanding
Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Vision for robotics

Published In

ACM Computing Surveys Volume 55, Issue 4

April 2023

871 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3567469

Editor:
Albert Zomaya
University of Sydney, Australia

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 November 2022

Online AM: 31 March 2022

Accepted: 08 March 2022

Revised: 17 January 2022

Received: 08 June 2021

Published in CSUR Volume 55, Issue 4

Qualifiers

Survey
Refereed

Funding Sources

National Key Research and Development Program of China
National Natural Science Foundation of China (NSFC)

Article Metrics

Total Citations: 41
Total Downloads: 2,943
Downloads (Last 12 months): 962
Downloads (Last 6 weeks): 78

Download figures reflect downloads up to 10 Oct 2024.
