Few-shot Action Recognition with Permutation-invariant Attention

Zhang, Hongguang; Zhang, Li; Qi, Xiaojuan; Li, Hongdong; Torr, Philip H. S.; Koniusz, Piotr

arXiv:2001.03905v2 (cs)

[Submitted on 12 Jan 2020 (v1), revised 18 Jul 2020 (this version, v2), latest version 4 Aug 2020 (v3)]

Abstract: Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. The encoded blocks are aggregated by permutation-invariant pooling, which makes our approach robust to varying action lengths and to long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode so-called query and support clips. Finally, the relation descriptors are fed to a comparator that learns the similarity between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute the blocks of a clip and align the resulting attention regions with the similarly permuted attention regions of the non-permuted clip, training the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets.
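The two ideas in the abstract, attention-weighted pooling that ignores block order and an alignment term that compares attention on permuted and non-permuted blocks, can be illustrated with a small sketch. This is not the authors' implementation: the dot-product scorer, the positional bias, the squared-error alignment measure, and all function names below are assumptions made for illustration, and plain feature vectors stand in for C3D block encodings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_scores(blocks, w):
    """Hypothetical attention scorer: one dot product per block."""
    return [sum(x * wi for x, wi in zip(b, w)) for b in blocks]


def content_attention(blocks, w):
    """Attention that depends only on each block's content."""
    return softmax(block_scores(blocks, w))


def positional_attention(blocks, w, bias=0.5):
    """Attention that also depends on block position (an assumed stand-in
    for a temporal attention module that is sensitive to block order)."""
    scores = block_scores(blocks, w)
    return softmax([s + bias * t for t, s in enumerate(scores)])


def attention_pool(blocks, w):
    """Attention-weighted sum of blocks; reordering the blocks leaves the
    result unchanged because each weight travels with its own block."""
    weights = content_attention(blocks, w)
    dim = len(blocks[0])
    return [sum(a * b[d] for a, b in zip(weights, blocks)) for d in range(dim)]


def alignment_loss(blocks, perm, attend):
    """Squared error between attention on permuted blocks and the
    identically permuted attention of the original blocks (assumed measure)."""
    a_orig = attend(blocks)
    a_perm = attend([blocks[i] for i in perm])
    a_orig_permuted = [a_orig[i] for i in perm]
    return sum((p - q) ** 2 for p, q in zip(a_perm, a_orig_permuted))


if __name__ == "__main__":
    random.seed(0)
    blocks = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
    w = [0.5, -0.2, 0.1, 0.3]
    perm = [3, 0, 4, 1, 2]
    print(alignment_loss(blocks, perm, lambda bs: content_attention(bs, w)))
    print(alignment_loss(blocks, perm, lambda bs: positional_attention(bs, w)))
```

In this toy setting the content-only attention already commutes with permutations, so its alignment loss is zero up to rounding, while the position-sensitive attention yields a positive loss. A term of this kind is what a training objective could minimise to push an order-sensitive attention module towards permutation invariance.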

Comments:	ECCV2020 Spotlight
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2001.03905 [cs.CV]
	(or arXiv:2001.03905v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2001.03905

Submission history

From: Hongguang Zhang
[v1] Sun, 12 Jan 2020 10:58:09 UTC (6,614 KB)
[v2] Sat, 18 Jul 2020 17:21:45 UTC (4,013 KB)
[v3] Tue, 4 Aug 2020 02:44:04 UTC (8,056 KB)
