research-article

Human action recognition in video by 'meaningful' poses

Authors: Snehasis Mukherjee, Sujoy Kumar Biswas, Dipti Prasad Mukherjee

ICVGIP '10: Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing

Pages 9 - 16

https://doi.org/10.1145/1924559.1924561

Published: 12 December 2010

Abstract

We propose a graph-theoretic technique for recognizing actions at a distance by modeling the visual senses associated with human poses. Identifying the intended meaning of a pose is challenging because poses vary widely, and this variability leads to visual sense ambiguity. Our methodology follows a bag-of-words approach: a "word" is the pose descriptor of the human figure in a single video frame, and a "document" is the entire video of a particular action. From a large vocabulary of poses, we prune out ambiguous poses and extract 'meaningful' [6] poses for each action type, in a supervised fashion, using a centrality measure of graph connectivity [16]. The number of 'meaningful' poses per action is determined by setting a bound on the centrality measure. We evaluate our methodology on four standard activity recognition datasets, and the results demonstrate that our approach outperforms the present state of the art.
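The pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the pose graph is built or which centrality measure is used, so everything below is an assumption made for illustration: pose words are integer codebook indices, an edge links two pose words that occur in consecutive frames of a video of the same action, and plain degree centrality stands in for the paper's measure. The function names (`build_pose_graph`, `degree_centrality`, `meaningful_poses`, `pose_histogram`) are hypothetical and do not come from the paper.

```python
from collections import Counter


def build_pose_graph(videos):
    """Build an undirected graph over pose words for ONE action class.

    videos: list of sequences of pose-word ids (one sequence per video).
    Assumption: two pose words are linked if they appear in consecutive frames.
    """
    adj = {}
    for seq in videos:
        for word in seq:
            adj.setdefault(word, set())  # isolated poses still become nodes
        for a, b in zip(seq, seq[1:]):
            if a != b:  # ignore self-loops from repeated poses
                adj[a].add(b)
                adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def meaningful_poses(videos, bound):
    """Keep the pose words whose centrality reaches the chosen bound."""
    centrality = degree_centrality(build_pose_graph(videos))
    return {v for v, c in centrality.items() if c >= bound}


def pose_histogram(seq, vocabulary):
    """Bag-of-words 'document': normalized counts over the kept pose words."""
    counts = Counter(w for w in seq if w in vocabulary)
    total = sum(counts.values())
    return {w: (counts[w] / total if total else 0.0) for w in sorted(vocabulary)}


# Toy data (hypothetical): two short videos of the same action.
walk_videos = [[0, 1, 2, 1, 0], [1, 2, 3, 1]]
kept = meaningful_poses(walk_videos, bound=0.5)
print(sorted(kept))  # -> [1, 2, 3]
print(pose_histogram(walk_videos[0], kept))
```

Raising `bound` keeps fewer poses per action, which mirrors the abstract's statement that the number of 'meaningful' poses is set by a bound on the centrality measure. A classifier would then compare the histogram of a test video against the per-action vocabularies; that step is left out because the abstract does not describe it.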

References

[1]

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[2]

C.-C. Chen, M. S. Ryoo, and J. K. Aggarwal. UT-Tower Dataset: Aerial View Activity Classification Challenge. http://cvrc.ece.utexas.edu/SDHA2010/Aerial_View_Activity.html, 2010.

[3]

G. K. M. Cheung, S. Baker, C. Simon, and T. Kanade. Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In Computer Vision and Pattern Recognition (volume 1), pages 77--84. IEEE Computer Society, June 2003.

[4]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2003.

[5]

A. Desolneux, L. Moisan, and J.-M. Morel. A grouping principle and four applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):508--513, April 2003.

[6]

A. Desolneux, L. Moisan, and J.-M. Morel. From Gestalt Theory to Image Analysis: A Probabilistic Approach. Springer, 2008.

[7]

A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In International Conference on Computer Vision (volume 2), pages 726--733. IEEE Computer Society, October 2003.

[8]

F. Lv and R. Nevatia. Single view human action recognition using key pose matching and Viterbi path searching. In Computer Vision and Pattern Recognition. IEEE Computer Society, 2007.

[9]

W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, March 1963.

[10]

J. Liu, S. Ali, and M. Shah. Recognizing human actions using multiple features. In Computer Vision and Pattern Recognition. IEEE Computer Society, July 2008.

[11]

W. L. Lu, K. Okuma, and J. J. Little. Tracking and recognizing actions of multiple hockey players using the boosted particle filter. Image and Vision Computing, 27(1/2):189--205, January 2009.

[12]

B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, pages 674--679. Morgan Kaufmann Publishers Inc., 1981.

[13]

G. Mori and J. Malik. Estimating human body configurations using shape context matching. In European Conference on Computer Vision (volume 3), LNCS 2352, pages 666--680. Springer, January 2002.

[14]

G. Mori, X. Ren, A. Efros, and J. Malik. Recovering human body configurations: Combining segmentation and recognition. In Computer Vision and Pattern Recognition (volume 2), pages 326--333. IEEE Computer Society, June 27-July 2 2004.

[15]

B. L. Narayan, C. A. Murthy, and S. K. Pal. Maxdiff kd-trees for data condensation. Pattern Recognition Letters, 27(3):187--200, February 2006.

[16]

R. Navigli and M. Lapata. An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):678--692, April 2010.

[17]

J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79(3):299--318, June 2008.

[18]

D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In International Conference on Machine Learning, pages 727--734. Morgan Kaufmann Publishers Inc., 2000.

[19]

C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In International Conference on Pattern Recognition, pages 32--36. IEEE Computer Society, 2004.

[20]

Y. Wang and G. Mori. Human action recognition by semi-latent topic models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(10):1762--1774, October 2009.

[21]

D. B. West. Introduction to Graph Theory. Prentice Hall, 2000.

Cited By

A. Eweiwi, S. Cheema, C. Thurau, and C. Bauckhage. Temporal key poses for human action recognition. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 1310--1317, November 2011. https://doi.org/10.1109/ICCVW.2011.6130403

Index Terms

Human action recognition in video by 'meaningful' poses
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
      2. Computer vision tasks
        Video summarization

Published In

ICVGIP '10: Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing

December 2010

533 pages

ISBN: 9781450300605

DOI: 10.1145/1924559

General Chairs: Rama Chellappa (University of Maryland), Padmanabhan Anandan (Microsoft Research, India)

Program Chairs: A. N. Rajagopalan (Indian Institute of Technology Madras, India), P. J. Narayanan (International Institute of Information Technology Hyderabad, India), Philip Torr (Oxford Brookes University, UK)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICVGIP '10

ICVGIP '10: Seventh Indian Conference on Computer Vision, Graphics and Image Processing

December 12 - 15, 2010

Chennai, India

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Bibliometrics

Total Citations: 1
Total Downloads: 179
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Nov 2024
