Abstract
Computer systems are increasingly being used for sports training. Existing sports training systems either require expensive 3-D motion capture systems or do not provide intelligent analysis of the user’s sports motion. This paper presents a framework for affordable and intelligent sports training systems for general users. The user is assumed to perform the same type of sports motion as an expert, so the performer’s motion is broadly similar to the expert’s reference motion. The performer’s motion is recorded by a single stationary camera, and the expert’s 3-D reference motion is captured only once by a commercial motion capture system. Under these assumptions, sports motion analysis is formulated as a 3-D–2-D spatiotemporal motion registration problem. A novel algorithm is developed to perform spatiotemporal registration of the expert’s 3-D reference motion and a performer’s 2-D input video, thereby computing the deviation of the performer’s motion from the expert’s motion. The algorithm can effectively handle ambiguous situations in a single video, such as depth ambiguity of body parts and partial occlusion. Test results on Taichi and golf swing motion show that, despite using only a single video, the algorithm can compute 3-D posture errors that reflect the performer’s actual motion error.
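To make the registration idea concrete, the following is a minimal sketch of the temporal half of the problem only, and it is not the algorithm developed in the paper. It assumes that 2-D joint positions have already been extracted from the video, that an orthographic projection is an acceptable stand-in for the real camera model, and that plain dynamic time warping is used for the temporal alignment. The function names (`project_orthographic`, `pose_distance`, `dtw_align`) and the toy two-joint figure are hypothetical. Depth ambiguity and partial occlusion, which the paper addresses, are ignored here.

```python
import math

def project_orthographic(pose3d, scale=1.0):
    """Drop depth (z) and apply a scale; an assumed stand-in for a real camera model."""
    return [(scale * x, scale * y) for (x, y, z) in pose3d]

def pose_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding 2-D joints."""
    return sum(math.dist(a, b) for a, b in zip(pose_a, pose_b)) / len(pose_a)

def dtw_align(ref_seq, obs_seq):
    """Dynamic time warping over two sequences of 2-D poses.

    Returns (total_cost, path); path lists (ref_index, obs_index) pairs.
    """
    n, m = len(ref_seq), len(obs_seq)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = pose_distance(ref_seq[i - 1], obs_seq[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

# Toy data (hypothetical): a two-joint figure. The expert's 3-D reference has
# three frames; the performer's 2-D track has four because the first pose is held longer.
ref3d = [
    [(0, 0, 0), (1, 0, 0.5)],
    [(0, 0, 0), (1, 1, 0.5)],
    [(0, 0, 0), (0, 1, 0.5)],
]
obs2d = [
    [(0, 0), (1, 0)],
    [(0, 0), (1, 0)],
    [(0, 0), (1, 1)],
    [(0, 0), (0, 1)],
]

ref2d = [project_orthographic(pose) for pose in ref3d]
total, path = dtw_align(ref2d, obs2d)
# Per-frame 2-D deviation of the performer from the aligned reference pose.
deviations = [pose_distance(ref2d[i], obs2d[j]) for i, j in path]
print(total, path, deviations)
```

In this toy case the performer matches the reference exactly apart from timing, so the alignment absorbs the extra frame and every per-frame deviation is zero. A full 3-D–2-D registration would also have to estimate the camera pose and recover 3-D posture error, which this sketch does not attempt.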
Cite this article
Leow, W.K., Wang, R. & Leong, H.W. 3-D–2-D spatiotemporal registration for sports motion analysis. Machine Vision and Applications 23, 1177–1194 (2012). https://doi.org/10.1007/s00138-011-0371-7
DOI: https://doi.org/10.1007/s00138-011-0371-7