Abstract
We investigate the recognition of actions “in the wild” using 3D motion information. The lack of control over (and knowledge of) the camera configuration exacerbates this already challenging task by introducing systematic projective inconsistencies between 3D motion fields, which greatly increase intra-class variance. By introducing a robust, sequence-based stereo calibration technique, we reduce these inconsistencies from a fully projective transform to a simple similarity transform. We then introduce motion encoding techniques that provide the necessary scale invariance, along with additional invariance to changes in camera viewpoint.
On the recent Hollywood 3D natural action recognition dataset, we show improvements of 40% over previous state-of-the-art techniques based on implicit motion encoding. We also demonstrate that our robust sequence calibration simplifies the task of recognising actions, leading to recognition rates 2.5 times those for the same technique without calibration. In addition, the sequence calibrations are made available.
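To make the role of scale invariance concrete, the following is a minimal sketch and not the encoding used in the paper. It assumes a toy 3D motion field given as a list of vectors and uses two hypothetical helpers, `normalised_magnitudes` and `similarity_on_vectors`. Once calibration leaves only a similarity ambiguity, motion vectors differ between sequences by a rotation and a global scale (the translation cancels, since a motion vector is a difference of points), so dividing each magnitude by the mean magnitude removes the remaining ambiguity.

```python
import math

def normalised_magnitudes(flow):
    """Return each 3D motion vector's length divided by the mean length."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in flow]
    mean = sum(mags) / len(mags)
    if mean == 0.0:
        return [0.0 for _ in mags]
    return [m / mean for m in mags]

def similarity_on_vectors(flow, scale, angle):
    """Apply the linear part of a similarity transform: a rotation about
    the z axis followed by a uniform scale. Translation is omitted because
    it has no effect on motion vectors (differences of points)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y), scale * z)
            for x, y, z in flow]

# Toy motion field (illustrative values, not data from the paper).
flow = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = similarity_on_vectors(flow, scale=4.2, angle=0.7)

a = normalised_magnitudes(flow)
b = normalised_magnitudes(moved)

# The normalised magnitudes agree despite the unknown scale and rotation.
invariant = all(math.isclose(p, q, rel_tol=1e-9) for p, q in zip(a, b))
print(invariant)
```

The same normalisation would not remove a fully projective distortion, which changes vector lengths non-uniformly across the scene; this is why reducing the ambiguity to a similarity transform first matters.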
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hadfield, S., Lebeda, K., Bowden, R. (2014). Natural Action Recognition Using Invariant 3D Motion Encoding. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8690. Springer, Cham. https://doi.org/10.1007/978-3-319-10605-2_49
DOI: https://doi.org/10.1007/978-3-319-10605-2_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10604-5
Online ISBN: 978-3-319-10605-2
eBook Packages: Computer Science, Computer Science (R0)