DOI: 10.1145/1816041.1816111
research-article

Feature detector and descriptor evaluation in human action recognition

Published: 05 July 2010

Abstract

In this paper, we evaluate and compare feature detection and feature description methods for part-based approaches to human action recognition. Several methods have been proposed in the literature, both for detecting space-time interest points and for describing local video patches. It is, however, unclear which of them performs best for human action recognition. For feature detection, we compare Dollar's method [18], Laptev's method [22], a bank of 3D Gabor filters [6], and a method based on space-time differences of Gaussians. For feature description, we compare and evaluate descriptors such as Gradient [18], HOG-HOF [22], 3D SIFT [24], and an enhanced version of LBP-TOP [15]. We show that combining Dollar's detection method with the improved LBP-TOP descriptor is computationally efficient and reaches the best recognition accuracy on the KTH database.
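
To make the LBP-TOP descriptor named in the abstract more concrete, the following is a minimal sketch of the basic operator in the spirit of [15]: local binary patterns are computed on the three orthogonal planes (XY, XT, YT) around each voxel of a video patch, and the three histograms are concatenated. It assumes radius 1 with 8 neighbours per plane and the usual "neighbour >= centre" threshold. The function names `lbp_code` and `lbp_top` are hypothetical, and this is not the enhanced variant evaluated in the paper, whose details the abstract does not give.

```python
# Basic LBP-TOP sketch (radius 1, 8 neighbours per plane).
# Assumption: plain operator only, not the paper's enhanced variant.

# Offsets (du, dv) of the 8 neighbours on a 3x3 ring, in a fixed order.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(center, neighbours):
    """Pack the thresholded neighbours (value >= center) into an 8-bit code."""
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_top(volume):
    """Return concatenated XY, XT and YT histograms (3 * 256 bins).

    `volume` is indexed as volume[t][y][x] and must be at least 3x3x3.
    """
    n_t, n_y, n_x = len(volume), len(volume[0]), len(volume[0][0])
    hist_xy, hist_xt, hist_yt = [0] * 256, [0] * 256, [0] * 256
    for t in range(1, n_t - 1):
        for y in range(1, n_y - 1):
            for x in range(1, n_x - 1):
                c = volume[t][y][x]
                # XY plane: neighbours within the same frame.
                xy = [volume[t][y + dv][x + du] for du, dv in RING]
                # XT plane: fixed row, neighbours vary in x and time.
                xt = [volume[t + dv][y][x + du] for du, dv in RING]
                # YT plane: fixed column, neighbours vary in y and time.
                yt = [volume[t + dv][y + du][x] for du, dv in RING]
                hist_xy[lbp_code(c, xy)] += 1
                hist_xt[lbp_code(c, xt)] += 1
                hist_yt[lbp_code(c, yt)] += 1
    return hist_xy + hist_xt + hist_yt


# Usage: a 4x4x4 patch of constant intensity (hypothetical data).
flat = [[[7] * 4 for _ in range(4)] for _ in range(4)]
descriptor = lbp_top(flat)
```

In a bag-of-words pipeline, a descriptor like this would typically be computed on a patch around each detected interest point and then quantized against a codebook; that step is not shown here.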

References

[1]
A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In Proceedings of the Ninth IEEE International Conference on Computer Vision, pp. 726--733, vol. 2, 13--16 Oct, 2003.
[2]
E. Shechtman and M. Irani. Space-time behavior based correlation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 405--412, vol. 1, 20--25 June, 2005.
[3]
S. Ali, A. Basharat, and M. Shah. Chaotic invariants for human action recognition. In Proc. of IEEE International Conference on Computer Vision (ICCV), pages 1--8, 2007.
[4]
K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, vol. 65(1/2): 43--72, 2005.
[5]
K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615--1630, Oct. 2005.
[6]
H. Ning, Y. Hu, and T. S. Huang. Searching Human Behaviors using Spatial-Temporal words. In Proceedings of IEEE International Conference on Image Processing, vol. 6, pp. VI-337--VI-340, September-October, 2007.
[7]
T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 994--1000, 2005.
[8]
D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2):91--110, 2004.
[9]
L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 524--531, 2005.
[10]
W. Cheung and G. Hamarneh. N-sift: N-dimensional scale invariant feature transform for matching medical images. In Proceedings of the 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 720--723, 2007.
[11]
T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51--59, 1996.
[12]
T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24 (7), pp. 971--987, 2002.
[13]
T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition with local binary patterns. In Proceedings of European Conference on Computer Vision (ECCV), 2004.
[14]
T. Ahonen, A. Hadid, and M. Pietikäinen. Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037--2041, December 2006.
[15]
G. Zhao and M. Pietikäinen. Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 915--928, June 2007.
[16]
M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with center-symmetric local binary patterns. In Proceedings of the 5th Indian Conference on Computer Vision, Graphics and Image Processing, 2006.
[17]
C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, pages 32--36, Vol. 3, August 2004.
[18]
P. Dollar, V. Rabaud, G. Cottrell, and S. J. Belongie. Behavior recognition via sparse spatio-temporal features. In Proc. of ICCV Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VSPETS), pages 65--72, 2005.
[19]
I. Laptev and T. Lindeberg. Local descriptors for spatio-temporal recognition. In Proceedings of ECCV Workshop on Spatial Coherence for Visual Motion Analysis, pages 91--103, 2004.
[20]
C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of Alvey Vision Conference, pp. 147--152, 1988.
[21]
I. Laptev. On space-time interest points. International Journal of Computer Vision (IJCV), 64(2--3):107--123, 2005.
[22]
I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1--8, June 2008. Website: http://www.irisa.fr/vista/actions/.
[23]
H. Wang, M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In Proceedings of the British Machine Vision Conference, September 2009.
[24]
P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th ACM International Conference on Multimedia, September 2007. Website: http://www.cs.ucf.edu/~pscovann/
[25]
M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recognition, vol. 42(3), pp. 425--436, March 2009.
[26]
S. F. Wong and R. Cipolla. Extracting spatiotemporal interest points using global information. In Proc. of IEEE International Conference on Computer Vision (ICCV), pp. 1--8, 2007.
[27]
J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. In Proceedings of the British Machine Vision Conference, 2006.
[28]
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[29]
M. Marszalek, I. Laptev, C. Schmid. Actions in Context. In Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2009.
[30]
R. Mattivi and L. Shao. Human action recognition using LBP-TOP as sparse spatio-temporal feature descriptor. In Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns (CAIP), Münster, Germany, September 2009.

Published In

CIVR '10: Proceedings of the ACM International Conference on Image and Video Retrieval
July 2010
492 pages
ISBN: 9781450301176
DOI: 10.1145/1816041

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. LBP-TOP
2. bag of words
3. feature descriptors
4. feature detectors
5. human action recognition

Cited By

• (2022) Selection of Relevant Visual Feature Sets for Enhanced Depression Detection using Incremental Linear Discriminant Analysis. Multimedia Tools and Applications 81:13 (17703-17727). DOI: 10.1007/s11042-022-12420-2. Online publication date: 7-Mar-2022
• (2019) Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition. Sensors 19:12 (2790). DOI: 10.3390/s19122790. Online publication date: 21-Jun-2019
• (2019) Efficient encoding of video descriptor distribution for action recognition. Multimedia Tools and Applications. DOI: 10.1007/s11042-019-08483-3. Online publication date: 12-Dec-2019
• (2018) Cardio-Pulmonary Resuscitation (CPR) Scene Retrieval from Medical Simulation Videos Using Local Binary Patterns Over Three Orthogonal Planes. 2018 International Conference on Content-Based Multimedia Indexing (CBMI) (1-6). DOI: 10.1109/CBMI.2018.8516485. Online publication date: Sep-2018
• (2018) Spatio-Temporal Scale Selection in Video Data. Journal of Mathematical Imaging and Vision 60:4 (525-562). DOI: 10.1007/s10851-017-0766-9. Online publication date: 1-May-2018
• (2017) Supervised Local Descriptor Learning for Human Action Recognition. IEEE Transactions on Multimedia 19:9 (2056-2065). DOI: 10.1109/TMM.2017.2700204. Online publication date: Sep-2017
• (2017) Spatiotemporal saliency and sub action segmentation for human action recognition. 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (1-6). DOI: 10.1109/ICCCNT.2017.8204134. Online publication date: Jul-2017
• (2017) Temporal Scale Selection in Time-Causal Scale Space. Journal of Mathematical Imaging and Vision 58:1 (57-101). DOI: 10.1007/s10851-016-0691-3. Online publication date: 1-May-2017
• (2016) Automated human physical function measurement using constrained high dispersal network with SVM-linear. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (1520-1526). DOI: 10.1109/BIBM.2016.7822747. Online publication date: Dec-2016
• (2016) Investigating the impact of frame rate towards robust human action recognition. Signal Processing 124:C (220-232). DOI: 10.1016/j.sigpro.2015.08.006. Online publication date: 1-Jul-2016
