research-article

Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences

Authors: Jonathan Taylor, Lucas Bordeaux, Thomas Cashman, Julien Valentin, Arran Topalian, Pushmeet Kohli, Andrew Fitzgibbon, and Jamie Shotton

ACM Transactions on Graphics (TOG), Volume 35, Issue 4

Article No.: 143, Pages 1 - 12

https://doi.org/10.1145/2897824.2925965

Published: 11 July 2016

Abstract

Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems have prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
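Change (3), optimizing the model pose and the data-to-model correspondences together rather than alternating between them, can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a circle, parameterized by center and radius, as a stand-in for the paper's smooth hand surface; gives each data point a continuous surface coordinate u (an angle); and minimizes a plain sum of squared distances (not the paper's more discriminative objective) with Levenberg-style damped Gauss-Newton over the concatenated vector of pose parameters and correspondences. All function names (`model_point`, `residuals_and_jacobian`, `solve`, `fit_joint`) are hypothetical.

```python
import math


def model_point(theta, u):
    """Point on the circle 'surface' theta = (cx, cy, r) at surface coordinate u (an angle)."""
    cx, cy, r = theta
    return cx + r * math.cos(u), cy + r * math.sin(u)


def residuals_and_jacobian(theta, us, data):
    """Residuals S(theta; u_i) - x_i and their Jacobian w.r.t. [cx, cy, r, u_1, ..., u_N]."""
    n = len(data)
    r = theta[2]
    res, jac = [], []
    for i, ((px, py), u) in enumerate(zip(data, us)):
        sx, sy = model_point(theta, u)
        c, s = math.cos(u), math.sin(u)
        row_x, row_y = [0.0] * (3 + n), [0.0] * (3 + n)
        # Pose block: derivatives w.r.t. (cx, cy, r).
        row_x[0], row_x[2] = 1.0, c
        row_y[1], row_y[2] = 1.0, s
        # Correspondence block: each residual depends only on its own u_i,
        # and the derivative exists because the surface is smooth in u.
        row_x[3 + i] = -r * s
        row_y[3 + i] = r * c
        res += [sx - px, sy - py]
        jac += [row_x, row_y]
    return res, jac


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_joint(theta, us, data, iters=100, lam=1e-3):
    """Damped Gauss-Newton on the joint vector [pose, correspondences]."""
    theta, us = list(theta), list(us)
    res, jac = residuals_and_jacobian(theta, us, data)
    energy = sum(e * e for e in res)
    n = len(theta) + len(us)
    for _ in range(iters):
        jtj = [[sum(row[a] * row[b] for row in jac) for b in range(n)] for a in range(n)]
        jtr = [sum(row[a] * e for row, e in zip(jac, res)) for a in range(n)]
        for a in range(n):
            jtj[a][a] += lam  # Levenberg damping keeps the system positive definite
        step = solve(jtj, [-g for g in jtr])
        cand_theta = [t + d for t, d in zip(theta, step[:3])]
        cand_us = [u + d for u, d in zip(us, step[3:])]
        cand_res, cand_jac = residuals_and_jacobian(cand_theta, cand_us, data)
        cand_energy = sum(e * e for e in cand_res)
        if cand_energy < energy:  # accept: trust the quadratic model more
            theta, us, res, jac, energy = cand_theta, cand_us, cand_res, cand_jac, cand_energy
            lam *= 0.5
        else:  # reject: shrink the step toward gradient descent
            lam *= 10.0
    return theta, us, energy


if __name__ == "__main__":
    true_theta = (1.0, -2.0, 3.0)
    data = [model_point(true_theta, 0.3 + 0.7 * k) for k in range(8)]
    theta0 = (0.5, -1.5, 2.0)  # deliberately wrong initial pose
    # Initial correspondences: angle of each point around the initial center.
    us0 = [math.atan2(py - theta0[1], px - theta0[0]) for px, py in data]
    theta, us, energy = fit_joint(theta0, us0, data)
    print("recovered (cx, cy, r):", [round(v, 4) for v in theta], "energy:", energy)
```

Because every u_i has a derivative, each correspondence slides along the surface in the same step that updates the pose. An alternating, ICP-style scheme would instead hold the u_i fixed during each pose update and re-estimate them by closest-point search afterwards.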

Supplementary Material

ZIP File (a143-taylor-supp.zip): supplemental files, 287.79 MB

MP4 File (a143.mp4): 386.68 MB

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Virtual reality
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction techniques

Published In

ACM Transactions on Graphics, Volume 35, Issue 4, July 2016 (1396 pages)

ISSN: 0730-0301

EISSN: 1557-7368

DOI: 10.1145/2897824

Copyright © 2016 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Bibliometrics

Total citations: 233

Total downloads: 2,238 (82 in the last 12 months; 9 in the last 6 weeks)
