research-article

Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences

Published: 11 July 2016
  Abstract

    Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems have prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery, followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm but make several changes to the model fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real time on the CPU alone, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera, providing a large working volume for interaction, even with the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
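The idea of optimizing pose and correspondences jointly can be illustrated with a toy 2D sketch. ICP-style methods hold correspondences fixed while the pose is updated. Here, by contrast, each observed point carries its own continuous surface coordinate `u_i`, and that coordinate sits in the same least-squares system as the pose. This is not the authors' implementation. It assumes an ellipse as a stand-in for the smooth hand surface, a rigid 2D pose (rotation `theta`, translation `t`), noise-free synthetic data, a discrete nearest-point initialization, and a plain Levenberg-Marquardt loop with `lam * I` damping. All function names (`fit_joint`, `residuals_and_jacobian`, and so on) are hypothetical. Because the surface is smooth, the derivative of each residual with respect to its `u_i` exists everywhere, so correspondences can slide along the model during the optimization.

```python
import numpy as np

# Toy 2D stand-in (assumption): an ellipse S(u) = (A cos u, B sin u) replaces
# the paper's smooth hand surface; the "pose" is a rotation theta plus a
# translation t. All names here are hypothetical.
A, B = 2.0, 1.0


def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def drot(theta):
    # Derivative of rot(theta) with respect to theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[-s, -c], [c, -s]])


def surface(u):
    return np.stack([A * np.cos(u), B * np.sin(u)], axis=-1)  # shape (N, 2)


def dsurface(u):
    return np.stack([-A * np.sin(u), B * np.cos(u)], axis=-1)


def residuals_and_jacobian(params, x):
    # params = [theta, tx, ty, u_1, ..., u_N]; residual r_i = R S(u_i) + t - x_i
    n = len(x)
    theta, t, u = params[0], params[1:3], params[3:]
    s = surface(u)
    r = (s @ rot(theta).T + t - x).ravel()  # interleaved [x_0, y_0, x_1, ...]
    J = np.zeros((2 * n, 3 + n))
    J[:, 0] = (s @ drot(theta).T).ravel()  # derivative w.r.t. theta
    J[0::2, 1] = 1.0  # x residuals w.r.t. tx
    J[1::2, 2] = 1.0  # y residuals w.r.t. ty
    du = dsurface(u) @ rot(theta).T  # residual i depends only on its own u_i
    i = np.arange(n)
    J[2 * i, 3 + i] = du[:, 0]
    J[2 * i + 1, 3 + i] = du[:, 1]
    return r, J


def initial_correspondences(theta, t, x, samples=64):
    # Discrete nearest-point initialization against the posed model.
    grid = np.linspace(0.0, 2 * np.pi, samples, endpoint=False)
    model = surface(grid) @ rot(theta).T + t
    d2 = ((x[:, None, :] - model[None, :, :]) ** 2).sum(axis=-1)  # (N, samples)
    return grid[np.argmin(d2, axis=1)]


def fit_joint(x, theta0=0.0, t0=(0.0, 0.0), iters=100, lam=1e-3):
    # Levenberg-Marquardt over pose AND correspondences in a single system.
    t0 = np.asarray(t0, dtype=float)
    params = np.concatenate([[theta0], t0, initial_correspondences(theta0, t0, x)])
    r, J = residuals_and_jacobian(params, x)
    cost = r @ r
    for _ in range(iters):
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(params)), -J.T @ r)
        cand = params + step
        r_new, J_new = residuals_and_jacobian(cand, x)
        if r_new @ r_new < cost:  # accept: move toward Gauss-Newton
            params, r, J, cost = cand, r_new, J_new, r_new @ r_new
            lam *= 0.5
        else:  # reject: move toward gradient descent
            lam *= 4.0
        if np.linalg.norm(step) < 1e-12:
            break
    return params[0], params[1:3], params[3:], cost


# Usage on synthetic, noise-free data from a known pose (illustrative values).
true_theta, true_t = 0.3, np.array([0.2, -0.1])
u_true = np.linspace(0.0, 2 * np.pi, 20, endpoint=False)
x_obs = surface(u_true) @ rot(true_theta).T + true_t
theta, t, u, cost = fit_joint(x_obs)
print(f"theta={theta:.4f} t={np.round(t, 4)} cost={cost:.2e}")
```

The Jacobian is sparse because each `u_i` appears in only the two rows belonging to its own point. This sketch ignores that structure and uses a dense solve for brevity. A practical solver would exploit it, for example by eliminating the `u_i` block with a Schur complement, so that adding correspondences as unknowns stays cheap per iteration.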

    Supplementary Material

    ZIP File (a143-taylor-supp.zip): supplemental files.
    MP4 File (a143.mp4)


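    Cited By

    • (2024) Point cloud based hand gesture recognition using template matching. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 46, 1, 2615-2627. DOI: 10.3233/JIFS-233120. Online publication date: 1-Jan-2024.
    • (2024) Towards Smartphone-based 3D Hand Pose Reconstruction Using Acoustic Signals. ACM Transactions on Sensor Networks. DOI: 10.1145/3677122. Online publication date: 16-Jul-2024.
    • (2024) Intrinsic Hand Avatar: Illumination-aware Hand Appearance and Shape Reconstruction from Monocular RGB Video. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 6108-6118. DOI: 10.1109/WACV57701.2024.00601. Online publication date: 3-Jan-2024.
    • (2024) Comparing Physics-based Hand Interaction in Virtual Reality: Custom Soft Body Simulation vs. Off-the-Shelf Integrated Solution. 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR), 743-753. DOI: 10.1109/VR58804.2024.00094. Online publication date: 16-Mar-2024.
    • (2024) Realistic Depth Image Synthesis for 3D Hand Pose Estimation. IEEE Transactions on Multimedia 26, 5246-5256. DOI: 10.1109/TMM.2023.3330522. Online publication date: 2024.
    • (2024) Human–machine collaboration using gesture recognition in mixed reality and robotic fabrication. Architectural Intelligence 3, 1. DOI: 10.1007/s44223-024-00053-4. Online publication date: 15-Mar-2024.
    • (2024) Gesture Recognition for Feedback Based Mixed Reality and Robotic Fabrication: A Case Study of the UnLog Tower. Phygital Intelligence, 331-345. DOI: 10.1007/978-981-99-8405-3_28. Online publication date: 4-Jan-2024.
    • (2023) A dataset of relighted 3D interacting hands. Proceedings of the 37th International Conference on Neural Information Processing Systems, 17689-17701. DOI: 10.5555/3666122.3666898. Online publication date: 10-Dec-2023.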


        Published In

        ACM Transactions on Graphics, Volume 35, Issue 4
        July 2016
        1396 pages
        ISSN: 0730-0301
        EISSN: 1557-7368
        DOI: 10.1145/2897824
        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. articulated tracking
        2. subdivision surfaces
        3. virtual reality
