Natural Action Recognition Using Invariant 3D Motion Encoding

Hadfield, Simon; Lebeda, Karel; Bowden, Richard

doi:10.1007/978-3-319-10605-2_49

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8690)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We investigate the recognition of actions “in the wild” using 3D motion information. The lack of control over (and knowledge of) the camera configuration exacerbates this already challenging task by introducing systematic projective inconsistencies between 3D motion fields, hugely increasing intra-class variance. By introducing a robust, sequence-based stereo calibration technique, we reduce these inconsistencies from fully projective to a simple similarity transform. We then introduce motion encoding techniques which provide the necessary scale invariance, along with additional invariances to changes in camera viewpoint.

On the recent Hollywood 3D natural action recognition dataset, we show improvements of 40% over previous state-of-the-art techniques based on implicit motion encoding. We also demonstrate that our robust sequence calibration simplifies the task of recognising actions, leading to recognition rates 2.5 times those for the same technique without calibration. In addition, the sequence calibrations are made available.
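
The abstract's point about similarity transforms can be illustrated with a small sketch. This is not the paper's encoding; it is a minimal illustration, under assumptions, of why the remaining ambiguity is easy to handle. If two reconstructions differ by x -> s·R·x + t, the translation cancels in any motion vector, the rotation preserves its length, and only the scale s remains, which a simple normalisation removes. All function names and sample values below are hypothetical.

```python
import math

def similarity(p, s, R, t):
    """Apply the similarity transform x -> s * R x + t to a 3D point p."""
    return tuple(s * sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def rot_z(theta):
    """Rotation about the z axis by theta radians, as a nested 3x3 list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def motion_field(points_t0, points_t1):
    """Per-point 3D motion vectors between two time steps."""
    return [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(points_t0, points_t1)]

def magnitudes(field):
    """Euclidean length of each motion vector."""
    return [math.sqrt(sum(c * c for c in v)) for v in field]

def normalised_magnitudes(field):
    """Magnitudes divided by their sum, which cancels a global scale factor."""
    mags = magnitudes(field)
    total = sum(mags)
    return [m / total for m in mags]

# Hypothetical 3D points observed at two consecutive time steps.
pts0 = [(0.0, 0.0, 2.0), (1.0, 0.5, 3.0), (-1.0, 2.0, 4.0)]
pts1 = [(0.1, 0.0, 2.0), (1.0, 0.9, 3.2), (-1.5, 2.0, 4.0)]

# The same scene as seen through an unknown similarity transform (assumed values).
s, R, t = 3.7, rot_z(0.6), (5.0, -2.0, 1.0)
pts0_b = [similarity(p, s, R, t) for p in pts0]
pts1_b = [similarity(p, s, R, t) for p in pts1]

raw_a = magnitudes(motion_field(pts0, pts1))
raw_b = magnitudes(motion_field(pts0_b, pts1_b))
norm_a = normalised_magnitudes(motion_field(pts0, pts1))
norm_b = normalised_magnitudes(motion_field(pts0_b, pts1_b))
```

Here `raw_b` is `raw_a` multiplied by `s`, while `norm_a` and `norm_b` agree up to floating-point error. A fully projective ambiguity has no such simple fix, which is why reducing it to a similarity transform matters. Handling the rotation (viewpoint) component needs a separate invariance and is not covered by this sketch.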

Author information

Authors and Affiliations

Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Simon Hadfield, Karel Lebeda & Richard Bowden

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
KU Leuven, ESAT - PSI, iMinds, Kasteelpark Arenberg, 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Hadfield, S., Lebeda, K., Bowden, R. (2014). Natural Action Recognition Using Invariant 3D Motion Encoding. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8690. Springer, Cham. https://doi.org/10.1007/978-3-319-10605-2_49

DOI: https://doi.org/10.1007/978-3-319-10605-2_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10604-5
Online ISBN: 978-3-319-10605-2
eBook Packages: Computer Science, Computer Science (R0)
