Abstract
Object pose estimation is crucial in human–computer interaction systems. Traditional point-based detection approaches rely on the robustness of feature points, tracking methods exploit inter-frame similarity to improve speed, and recent neural-network studies concentrate on solving specific invariance problems. Unlike these methods, PTPE (Part-based Tracking for Pose Estimation), proposed in this paper, focuses on balancing speed and accuracy under varying conditions. In this method, point matching is transformed into the matching of parts within an object to improve feature reliability, and a fast inter-frame tracking method is combined with learning models and structural information to enhance robustness. During tracking, different strategies are applied to different parts according to the matching quality evaluated by the learning models, exploiting locality and avoiding the time cost of undifferentiated full-frame detection or learning. In addition, inter-part constraints are applied to optimize part detection. Experiments show that PTPE is efficient in both accuracy and speed, especially in complex environments, when compared with classical algorithms based solely on detection, inter-frame tracking, self-supervised models, or graph matching.
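The per-part strategy selection described above can be illustrated with a toy sketch. This is not the authors' implementation: `Part`, `track_part`, `match_score`, `detect_part`, the 2D "pose", and the threshold value are all hypothetical stand-ins, used only to show the idea of tracking every part cheaply and re-detecting only those whose learned match score is poor.

```python
# Toy sketch, NOT the PTPE implementation: all names below are hypothetical.
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    pose: tuple  # toy 2D (x, y) stand-in for a full object pose


def track_part(part, motion):
    # Fast inter-frame tracking: shift the previous pose by estimated motion.
    return (part.pose[0] + motion[0], part.pose[1] + motion[1])


def match_score(pose, observed):
    # Stand-in for a learned matching score: 1.0 at perfect agreement,
    # decreasing as the tracked pose drifts from the observation.
    d = abs(pose[0] - observed[0]) + abs(pose[1] - observed[1])
    return 1.0 / (1.0 + d)


def detect_part(part, observed):
    # Stand-in for local re-detection of a single part.
    return observed


def update_parts(parts, observations, motion, threshold=0.5):
    """Track every part; re-detect only those whose match score is poor."""
    for part in parts:
        candidate = track_part(part, motion)
        if match_score(candidate, observations[part.name]) >= threshold:
            part.pose = candidate  # tracking succeeded, keep it
        else:
            part.pose = detect_part(part, observations[part.name])
    return parts
```

With two parts under motion (1, 0), a part whose observation agrees with the tracked prediction keeps its cheap tracked pose, while a part whose observation disagrees strongly falls back to local re-detection, so full-frame detection is never needed.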
Data availability
The corresponding author will supply the relevant data in response to reasonable requests.
Acknowledgements
This work is supported by the Scientific Research Funds of Huaqiao University, China (605-50Y21011), the Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, and the Provincial Key Laboratory of Computer Vision and Machine Learning of the Educational Department of Fujian Province (201902).
Author information
Authors and Affiliations
Contributions
SY: manuscript text, figures, tables. JY: software, validation. QL: software, validation. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ye, S., Ye, J. & Lei, Q. Part-based tracking for object pose estimation. J Real-Time Image Proc 20, 99 (2023). https://doi.org/10.1007/s11554-023-01351-2