research-article

Feature detector and descriptor evaluation in human action recognition

Authors:

Riccardo Mattivi

CIVR '10: Proceedings of the ACM International Conference on Image and Video Retrieval

Pages 477-484

https://doi.org/10.1145/1816041.1816111

Published: 05 July 2010

Abstract

In this paper, we evaluate and compare different feature detection and feature description methods for part-based approaches to human action recognition. Various methods have been proposed in the literature both for detecting space-time interest points and for describing local video patches, but it is unclear which of them performs best for human action recognition. For feature detection, we compare Dollár's method [18], Laptev's method [22], a bank of 3D Gabor filters [6], and a method based on Space-Time Differences of Gaussians. For feature description, we compare the Gradient [18], HOG-HOF [22], and 3D SIFT [24] descriptors and an enhanced version of LBP-TOP [15]. We show that the combination of Dollár's detector and the improved LBP-TOP descriptor is computationally efficient and achieves the best recognition accuracy on the KTH database.
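The abstract singles out LBP-TOP [15] as the strongest descriptor. The enhancements made in this paper are not described on this page, so the following is only a minimal Python sketch of basic LBP-TOP as introduced in [15], under stated assumptions: the clip is a nested list indexed volume[t][y][x], each plane uses 8 neighbours at radius 1 on a square grid without interpolation, and each plane keeps a full 256-bin histogram with no uniform-pattern mapping. The names lbp_code and lbp_top are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumed input layout (hypothetical): nested lists indexed as volume[t][y][x].
Volume = Sequence[Sequence[Sequence[float]]]

# Eight neighbours at radius 1 on a square grid, visited in a fixed circular order.
# Integer offsets instead of interpolated circular sampling is a simplification.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(value_at: Callable[[int, int], float], center: float, a: int, b: int) -> int:
    """Return the 8-bit LBP code of the pixel at (a, b) on one 2D plane.

    Bit i is set when the i-th neighbour is >= the centre value.
    """
    code = 0
    for bit, (da, db) in enumerate(NEIGHBOUR_OFFSETS):
        if value_at(a + da, b + db) >= center:
            code |= 1 << bit
    return code


def lbp_top(volume: Volume) -> List[float]:
    """Basic LBP-TOP: one 256-bin LBP histogram per orthogonal plane (XY, XT, YT),
    each normalised to sum to 1, concatenated into a 768-dimensional vector."""
    frames, height, width = len(volume), len(volume[0]), len(volume[0][0])
    hist_xy, hist_xt, hist_yt = [0] * 256, [0] * 256, [0] * 256
    # Skip the one-voxel border so every neighbour lookup stays inside the volume.
    for t in range(1, frames - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                c = volume[t][y][x]
                # XY plane: appearance, neighbours within frame t.
                hist_xy[lbp_code(lambda a, b: volume[t][a][b], c, y, x)] += 1
                # XT plane: row y held fixed, neighbours vary in (t, x).
                hist_xt[lbp_code(lambda a, b: volume[a][y][b], c, t, x)] += 1
                # YT plane: column x held fixed, neighbours vary in (t, y).
                hist_yt[lbp_code(lambda a, b: volume[a][b][x], c, t, y)] += 1
    feature: List[float] = []
    for hist in (hist_xy, hist_xt, hist_yt):
        total = sum(hist)
        feature.extend(count / total if total else 0.0 for count in hist)
    return feature
```

As a usage example, for a 3x3x3 clip whose frames are uniformly 0, 1 and 2 (for instance `ramp = [[[float(t)] * 3 for _ in range(3)] for t in range(3)]`), the XY histogram puts its single count in bin 255 because there is no spatial texture, while the XT and YT histograms use bin 248 because the neighbours in the previous frame are darker. This is why the two temporal planes can capture motion that a purely spatial LBP misses. In a part-based pipeline such as the one evaluated here, a descriptor of this kind would presumably be computed on the cuboid around each detected interest point and then quantised into visual words.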

References

[1]

A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In Proceedings of the Ninth IEEE International Conference on Computer Vision, pp. 726--733, vol. 2, 13--16 Oct, 2003.

[2]

E. Shechtman and M. Irani. Space-time behavior based correlation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 405--412, vol. 1, 20--25 June, 2005.

[3]

S. Ali, A. Basharat, and M. Shah. Chaotic invariants for human action recognition. In Proc. of IEEE International Conference on Computer Vision (ICCV), pages 1--8, 2007.

[4]

K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, vol. 65(1/2): 43--72, 2005.

[5]

K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615--1630, Oct. 2005.

[6]

H. Ning, Y. Hu, and T. S. Huang. Searching Human Behaviors using Spatial-Temporal words. In Proceedings of IEEE International Conference on Image Processing, vol. 6, pp. VI-337--VI-340, September-October, 2007.

[7]

T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 994--1000, 2005.

[8]

D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2):91--110, 2004.

[9]

L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 524--531, 2005.

[10]

W. Cheung and G. Hamarneh. N-SIFT: N-dimensional scale invariant feature transform for matching medical images. In Proceedings of the 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 720--723, 2007.

[11]

T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51--59, 1996.

[12]

T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24 (7), pp. 971--987, 2002.

[13]

T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition with local binary patterns. In Proceedings of European Conference on Computer Vision (ECCV), 2004.

[14]

T. Ahonen, A. Hadid, and M. Pietikäinen. Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037--2041, December 2006.

[15]

G. Zhao and M. Pietikäinen. Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915--928, June 2007.

[16]

M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with center-symmetric local binary patterns. In Proceedings of the 5th Indian Conference on Computer Vision, Graphics and Image Processing, 2006.

[17]

C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, pages 32--36, Vol. 3, August 2004.

[18]

P. Dollár, V. Rabaud, G. Cottrell, and S. J. Belongie. Behavior recognition via sparse spatio-temporal features. In Proc. of ICCV Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), pages 65--72, 2005.

[19]

I. Laptev and T. Lindeberg. Local descriptors for spatio-temporal recognition. In Proceedings of ECCV Workshop on Spatial Coherence for Visual Motion Analysis, pages 91--103, 2004.

[20]

C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of Alvey Vision Conference, pp. 147--152, 1988.

[21]

I. Laptev. On space-time interest points. International Journal of Computer Vision (IJCV), 64(2--3):107--123, 2005.

[22]

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1--8, June 2008. Website: http://www.irisa.fr/vista/actions/.

[23]

H. Wang, M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In Proceedings of the British Machine Vision Conference, September 2009.

[24]

P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th ACM International Conference on Multimedia, September 2007. Website: http://www.cs.ucf.edu/~pscovann/

[25]

M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recognition, vol. 42(3), pp. 425--436, March 2009.

[26]

S. F. Wong and R. Cipolla. Extracting spatiotemporal interest points using global information. In Proc. of IEEE International Conference on Computer Vision (ICCV), pp. 1--8, 2007.

[27]

J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. In Proceedings of the British Machine Vision Conference, 2006.

[28]

C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[29]

M. Marszalek, I. Laptev, C. Schmid. Actions in Context. In Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2009.

[30]

R. Mattivi and L. Shao. Human action recognition using LBP-TOP as sparse spatio-temporal feature descriptor. In Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns (CAIP), Münster, Germany, September 2009.

Cited By

Rathi S, Kaur B, Agrawal R (2022). Selection of Relevant Visual Feature Sets for Enhanced Depression Detection using Incremental Linear Discriminant Analysis. Multimedia Tools and Applications 81(13):17703-17727. Online publication date: 7 Mar 2022.
https://doi.org/10.1007/s11042-022-12420-2
Nazir S, Yousaf M, Nebel J, Velastin S (2019). Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition. Sensors 19(12):2790. Online publication date: 21 Jun 2019.
https://doi.org/10.3390/s19122790
Saremi M, Yaghmaee F (2019). Efficient encoding of video descriptor distribution for action recognition. Multimedia Tools and Applications. Online publication date: 12 Dec 2019.
https://doi.org/10.1007/s11042-019-08483-3

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation

Information

Published In

CIVR '10: Proceedings of the ACM International Conference on Image and Video Retrieval

July 2010

492 pages

ISBN:9781450301176

DOI:10.1145/1816041

Conference Chairs:
Shipeng Li, Microsoft Research Asia, China
Xinbo Gao, Xidian University, China
Nicu Sebe, University of Trento, Italy

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
Shenzhen Science Plan

Conference

CIVR '10

Sponsor:

SIGMM

CIVR '10: International Conference on Image and Video Retrieval

July 5-7, 2010

Xi'an, China

Bibliometrics & Citations

Article Metrics

Total citations: 46
Total downloads: 851
Downloads (last 12 months): 3
Downloads (last 6 weeks): 0

Reflects downloads up to 17 Oct 2024

Citations

Anju Panicker M, Frigui H, Calhoun A (2018). Cardio-Pulmonary Resuscitation (CPR) Scene Retrieval from Medical Simulation Videos Using Local Binary Patterns Over Three Orthogonal Planes. 2018 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1-6. Online publication date: Sep 2018.
https://doi.org/10.1109/CBMI.2018.8516485
Lindeberg T (2018). Spatio-Temporal Scale Selection in Video Data. Journal of Mathematical Imaging and Vision 60(4):525-562. Online publication date: 1 May 2018.
https://dl.acm.org/doi/10.1007/s10851-017-0766-9
Zhen X, Zheng F, Shao L, Cao X, Xu D (2017). Supervised Local Descriptor Learning for Human Action Recognition. IEEE Transactions on Multimedia 19(9):2056-2065. Online publication date: Sep 2017.
https://doi.org/10.1109/TMM.2017.2700204
Babu A, Shyna A (2017). Spatiotemporal saliency and sub action segmentation for human action recognition. 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-6. Online publication date: Jul 2017.
https://doi.org/10.1109/ICCCNT.2017.8204134
Lindeberg T (2017). Temporal Scale Selection in Time-Causal Scale Space. Journal of Mathematical Imaging and Vision 58(1):57-101. Online publication date: 1 May 2017.
https://dl.acm.org/doi/10.1007/s10851-016-0691-3
Dan Meng, Cao G, Xinyu Song, Chen W, Wenming Cao (2016). Automated human physical function measurement using constrained high dispersal network with SVM-linear. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1520-1526. Online publication date: Dec 2016.
https://doi.org/10.1109/BIBM.2016.7822747
Harjanto F, Wang Z, Lu S, Tsoi A, Feng D (2016). Investigating the impact of frame rate towards robust human action recognition. Signal Processing 124(C):220-232. Online publication date: 1 Jul 2016.
https://dl.acm.org/doi/10.1016/j.sigpro.2015.08.006