DOI: 10.1145/1924559.1924561
research-article

Human action recognition in video by 'meaningful' poses

Published: 12 December 2010

Abstract

We propose a graph-theoretic technique for recognizing actions at a distance by modeling the visual senses associated with human poses. Identifying the intended meaning of a pose is challenging because poses vary widely, and this variability leads to visual sense ambiguity. Our methodology follows a bag-of-words approach: a "word" is the pose descriptor of the human figure in a single video frame, and a "document" is the entire video of a particular action. From a large vocabulary of poses, we prune out ambiguous poses and, for each action type, extract 'meaningful' [6] poses in a supervised fashion using a centrality measure of graph connectivity [16]. The number of 'meaningful' poses per action is determined by setting a bound on the centrality measure. We evaluate our methodology on four standard activity recognition datasets, and the results show that our approach outperforms the current state of the art.
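The selection idea in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: every frame has already been quantized into an integer pose-word id; the pose graph for an action links two poses that appear in consecutive frames of its training videos; the centrality is normalized degree centrality; and a pose counts as "ambiguous" if it is selected for more than one action. All function names (`build_pose_graph`, `meaningful_poses`, `per_action_vocabulary`, `bag_of_poses`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Assumption: a video is a list of integer pose-word ids, one per frame
# (per-frame pose descriptors already quantized into a vocabulary).
Video = List[int]


def build_pose_graph(videos: List[Video]) -> Dict[int, Set[int]]:
    """Undirected pose graph for one action (hypothetical edge rule):
    two distinct pose words are linked if they occur in consecutive frames."""
    adj: Dict[int, Set[int]] = {}
    for video in videos:
        for pose in video:
            adj.setdefault(pose, set())
        for a, b in zip(video, video[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def degree_centrality(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """Normalized degree centrality: deg(v) / (|V| - 1)."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def meaningful_poses(videos: List[Video], bound: float) -> Set[int]:
    """Keep poses whose centrality reaches the bound; raising the bound
    lowers the number of poses kept for the action."""
    cent = degree_centrality(build_pose_graph(videos))
    return {p for p, c in cent.items() if c >= bound}


def per_action_vocabulary(train: Dict[str, List[Video]],
                          bound: float) -> Dict[str, Set[int]]:
    """Select meaningful poses per action (supervised: videos are grouped
    by label), then drop poses chosen for more than one action."""
    chosen = {action: meaningful_poses(vids, bound)
              for action, vids in train.items()}
    counts = Counter(p for poses in chosen.values() for p in poses)
    return {action: {p for p in poses if counts[p] == 1}
            for action, poses in chosen.items()}


def bag_of_poses(video: Video, vocabulary: Set[int]) -> List[int]:
    """Histogram ('document' vector) over the sorted vocabulary;
    frames whose pose is not in the vocabulary are ignored."""
    counts = Counter(p for p in video if p in vocabulary)
    return [counts[p] for p in sorted(vocabulary)]


if __name__ == "__main__":
    # Toy labeled training set with made-up pose ids.
    train = {"wave": [[0, 1, 2, 1, 0], [1, 2, 1]],
             "walk": [[3, 4, 5, 3, 4], [4, 5, 3]]}
    vocab = per_action_vocabulary(train, bound=0.75)
    print(vocab)  # {'wave': {1}, 'walk': {3, 4, 5}}
    print(bag_of_poses([0, 1, 1, 2], vocab["wave"]))  # [2]
```

Degree centrality is used here only because it is the simplest measure to show; [16] studies other graph connectivity measures as well (for example PageRank), and swapping one in would only change `degree_centrality`.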

References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[2] C.-C. Chen, M. S. Ryoo, and J. K. Aggarwal. UT-Tower Dataset: Aerial View Activity Classification Challenge. http://cvrc.ece.utexas.edu/SDHA2010/Aerial_View_Activity.html, 2010.
[3] G. K. M. Cheung, S. Baker, C. Simon, and T. Kanade. Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In Computer Vision and Pattern Recognition (volume 1), pages 77--84. IEEE Computer Society, June 2003.
[4] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2003.
[5] A. Desolneux, L. Moisan, and J.-M. Morel. A grouping principle and four applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):508--513, April 2003.
[6] A. Desolneux, L. Moisan, and J.-M. Morel. From Gestalt Theory to Image Analysis: A Probabilistic Approach. Springer, 2008.
[7] A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In International Conference on Computer Vision (volume 2), pages 726--733. IEEE Computer Society, October 2003.
[8] F. Lv and R. Nevatia. Single view human action recognition using key pose matching and Viterbi path searching. In Computer Vision and Pattern Recognition. IEEE Computer Society, 2007.
[9] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, March 1963.
[10] J. Liu, S. Ali, and M. Shah. Recognizing human actions using multiple features. In Computer Vision and Pattern Recognition. IEEE Computer Society, July 2008.
[11] W. L. Lu, K. Okuma, and J. J. Little. Tracking and recognizing actions of multiple hockey players using the boosted particle filter. Image and Vision Computing, 27(1/2):189--205, January 2009.
[12] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, pages 674--679. Morgan Kaufmann Publishers Inc., 1981.
[13] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In European Conference on Computer Vision (volume 3), LNCS 2352, pages 666--680. Springer, January 2002.
[14] G. Mori, X. Ren, A. Efros, and J. Malik. Recovering human body configurations: Combining segmentation and recognition. In Computer Vision and Pattern Recognition (volume 2), pages 326--333. IEEE Computer Society, June 27-July 2, 2004.
[15] B. L. Narayan, C. A. Murthy, and S. K. Pal. Maxdiff kd-trees for data condensation. Pattern Recognition Letters, 27(3):187--200, February 2006.
[16] R. Navigli and M. Lapata. An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):678--692, April 2010.
[17] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79(3):299--318, June 2008.
[18] D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In International Conference on Machine Learning, pages 727--734. Morgan Kaufmann Publishers Inc., 2000.
[19] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In International Conference on Pattern Recognition, pages 32--36. IEEE Computer Society, 2004.
[20] Y. Wang and G. Mori. Human action recognition by semi-latent topic models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(10):1762--1774, October 2009.
[21] D. B. West. Introduction to Graph Theory. Prentice Hall, 2000.

Cited By

  • (2011) Temporal key poses for human action recognition. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 1310-1317. DOI: 10.1109/ICCVW.2011.6130403. Online publication date: Nov 2011.

Published In

ACM Other conferences
ICVGIP '10: Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing
December 2010
533 pages
ISBN:9781450300605
DOI:10.1145/1924559
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICVGIP '10

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 10 Nov 2024