Authors:
Yasser Boutaleb 1,2; Catherine Soladie 1; Nam-Duong Duong 2; Jérôme Royan 2 and Renaud Seguier 1
Affiliations:
1 IETR/CentraleSupelec, Avenue de la Boulaie, 35510 Cesson-Sevigné, France
2 IRT b-com, 1219 Avenue des Champs Blancs, 35510 Cesson-Sevigné, France
Keyword(s):
First-person Hand Activity Recognition, Transfer Learning, Multi-stream Learning, Features Fusion.
Abstract:
First-person hand activity recognition is a challenging task, especially when not enough data are available. In this paper, we tackle this challenge by proposing a new low-cost multi-stage learning pipeline for first-person RGB-based hand activity recognition on a limited amount of data. For a given RGB image activity sequence, in the first stage, the regions of interest are extracted using a pre-trained neural network (NN). Then, in the second stage, high-level spatial features are extracted using a pre-trained deep NN. In the third stage, the temporal dependencies are learned. Finally, in the last stage, a hand activity sequence classifier is learned with a post-fusion strategy applied to the previously learned temporal dependencies. Experiments on two real-world datasets show that our pipeline achieves state-of-the-art performance. Moreover, the proposed pipeline achieves good results even with limited data.
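The four stages described above can be sketched as a chain of functions. This is a minimal, framework-free illustration of the pipeline's structure only; the stage bodies (ROI cropping, CNN features, temporal model, post-fusion classifier) are placeholder assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def extract_roi(frame: Sequence[float]) -> Sequence[float]:
    # Stage 1 (assumed): a pre-trained detector would crop the hand region;
    # here we keep the frame as-is for illustration.
    return frame


def spatial_features(roi: Sequence[float]) -> List[float]:
    # Stage 2 (assumed): a pre-trained deep NN would emit a feature vector;
    # here we stand in simple per-frame statistics.
    return [sum(roi) / len(roi), max(roi), min(roi)]


def temporal_model(features: List[List[float]]) -> List[float]:
    # Stage 3 (assumed): a sequence model would learn temporal dependencies;
    # here we average each feature dimension over time.
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(len(features[0]))]


def classify(streams: List[List[float]]) -> int:
    # Stage 4 (assumed): post-fusion sums the per-stream temporal summaries,
    # then an argmax picks the activity class.
    fused = [sum(s[i] for s in streams) for i in range(len(streams[0]))]
    return max(range(len(fused)), key=lambda i: fused[i])


def pipeline(sequence: List[Sequence[float]]) -> int:
    # Chain the stages over an RGB frame sequence (single stream here).
    feats = [spatial_features(extract_roi(f)) for f in sequence]
    summary = temporal_model(feats)
    return classify([summary])


print(pipeline([[0.1, 0.5, 0.2], [0.3, 0.4, 0.6]]))  # → 1
```

In a real setting, each stage would be replaced by its pre-trained or learned counterpart, and several feature streams would be passed to the post-fusion classifier instead of the single stream shown here.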