Abstract
Hand modeling and tracking are essential in video-based sign language recognition. The high deformability and the large number of degrees of freedom of hands make the problem difficult. To tackle these challenges, a novel approach based on robust principal component analysis (PCA) is proposed. The robust PCA incorporates an L1-norm objective function to deal with background clutter, and a projection pursuit strategy to deal with the lack of alignment caused by hand deformation. The learning algorithm of the robust PCA is very simple: it involves only a search for solutions in a finite set constructed from the training data, which leads to much more representative and interpretable bases. Incorporating L1 regularization when fitting the learned robust PCA models yields cleaner reconstructions and more stable fitting. Based on the robust PCA, a hand tracking system is developed that combines skin-color region segmentation based on graph cuts with template matching in a particle filtering framework. Experiments on a publicly available sign-language video database demonstrate the strength of the method.
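The learning and fitting steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the data are centred by the coordinate-wise median, the candidate directions are the normalized centred training samples, the L1 objective is the sum of absolute projections, components are extracted one at a time by deflation, and the L1 penalty in fitting is applied to the coefficients. Under these assumptions, learning is a search over a finite candidate set, and because the learned bases are orthonormal, the L1-regularized fit reduces to soft-thresholding the projections. The function names (`learn_pp_l1_pca`, `fit_l1`) are hypothetical.

```python
import math


def median(values):
    """Median of a list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Unit vector along v, or None if v is (numerically) zero."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v] if n > 1e-12 else None


def learn_pp_l1_pca(samples, k):
    """Projection-pursuit PCA with an L1 objective (illustrative sketch).

    Each component is chosen from a finite set: the normalized residuals
    of the training samples. Every basis vector is therefore a (deflated)
    training sample, which keeps it interpretable.
    """
    dim = len(samples[0])
    # Assumption: coordinate-wise median as a robust centre.
    centre = [median([x[j] for x in samples]) for j in range(dim)]
    residuals = [[x[j] - centre[j] for j in range(dim)] for x in samples]
    bases = []
    for _ in range(k):
        best, best_score = None, -1.0
        for r in residuals:
            w = normalize(r)
            if w is None:
                continue
            # L1 objective: sum of absolute projections onto w.
            score = sum(abs(dot(w, y)) for y in residuals)
            if score > best_score:
                best, best_score = w, score
        if best is None:  # no non-zero residual left
            break
        bases.append(best)
        # Deflate: remove the chosen direction from every residual, so the
        # next basis is orthogonal to all previous ones.
        residuals = [[y[j] - dot(best, y) * best[j] for j in range(dim)]
                     for y in residuals]
    return centre, bases


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_l1(x, centre, bases, lam):
    """Minimise 0.5*||x - centre - B c||^2 + lam*||c||_1 for orthonormal B.

    With orthonormal bases the problem decouples per coefficient, so each
    coefficient is the soft-thresholded projection of the centred sample.
    """
    r = [a - m for a, m in zip(x, centre)]
    coeffs = [soft_threshold(dot(b, r), lam) for b in bases]
    recon = list(centre)
    for c, b in zip(coeffs, bases):
        recon = [v + c * bj for v, bj in zip(recon, b)]
    return coeffs, recon
```

For example, learning two bases from `[[0, 0], [2, 0], [-2, 0], [0, 1], [0, -1]]` returns the axis directions `[1.0, 0.0]` and `[0.0, 1.0]`, and fitting `[3, 0.5]` with `lam=1.0` gives coefficients `[2.0, 0.0]`: the small second projection is zeroed out, which is the kind of cleaner, sparser reconstruction the L1 penalty is meant to encourage. For non-orthonormal bases the fit would need an iterative lasso solver instead of the closed form.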
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, W., Piater, J. (2012). Hand Modeling and Tracking for Video-Based Sign Language Recognition by Robust Principal Component Analysis. In: Kutulakos, K.N. (eds) Trends and Topics in Computer Vision. ECCV 2010. Lecture Notes in Computer Science, vol 6553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35749-7_21
DOI: https://doi.org/10.1007/978-3-642-35749-7_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35748-0
Online ISBN: 978-3-642-35749-7
eBook Packages: Computer Science, Computer Science (R0)