3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition

Wang, Lei; Koniusz, Piotr

arXiv:2303.14474 (cs)

[Submitted on 25 Mar 2023]

Abstract: Many skeletal action recognition models use GCNs to represent the human body as 3D body joints connected by body parts. GCNs aggregate one- or few-hop graph neighbourhoods and ignore the dependency between body joints that are not linked. We propose to form a hypergraph to model hyper-edges between graph nodes (e.g., third- and fourth-order hyper-edges capture three and four nodes), which help capture higher-order motion patterns of groups of body joints. We split action sequences into temporal blocks, and a Higher-order Transformer (HoT) produces embeddings of each temporal block based on (i) the body joints, (ii) pairwise links of body joints and (iii) higher-order hyper-edges of skeleton body joints. We combine such HoT embeddings of hyper-edges of orders 1, ..., r by a novel Multi-order Multi-mode Transformer (3Mformer) with two modules whose order can be exchanged to achieve coupled-mode attention on coupled-mode tokens based on 'channel-temporal block', 'order-channel-body joint', 'channel-hyper-edge (any order)' and 'channel-only' pairs. The first module, called Multi-order Pooling (MP), additionally learns weighted aggregation along the hyper-edge mode, whereas the second module, Temporal block Pooling (TP), aggregates along the temporal block mode. Our end-to-end trainable network yields state-of-the-art results compared to GCN-, transformer- and hypergraph-based counterparts.
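To make the notions of hyper-edge order and temporal block concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that an order-r hyper-edge is any unordered group of r distinct joints, that temporal blocks are consecutive and non-overlapping, and that the skeleton has 25 joints; the function names `hyper_edges` and `temporal_blocks` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

def hyper_edges(num_joints, order):
    """Enumerate every unordered group of `order` distinct joint indices.

    Assumption: one hyper-edge per combination of joints.
    """
    return list(combinations(range(num_joints), order))

def temporal_blocks(frames, block_len):
    """Split a frame sequence into consecutive, non-overlapping blocks.

    Assumption: trailing frames that do not fill a block are dropped.
    """
    n = len(frames) // block_len
    return [frames[i * block_len:(i + 1) * block_len] for i in range(n)]

num_joints = 25  # assumed joint count, for illustration only

# Orders 1, 2, 3: single joints, joint pairs, joint triplets.
edges_by_order = {r: hyper_edges(num_joints, r) for r in (1, 2, 3)}
counts = {r: len(e) for r, e in edges_by_order.items()}

# A toy sequence of 100 frame indices split into blocks of 10 frames.
blocks = temporal_blocks(list(range(100)), block_len=10)

print(counts)       # number of hyper-edges per order
print(len(blocks))  # number of temporal blocks
```

Under these assumptions the number of hyper-edges grows combinatorially with the order (25 joints give 300 pairs and 2,300 triplets), which suggests why a learned weighted aggregation along the hyper-edge mode, as the abstract describes for MP, can be useful.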

Comments:	This paper is accepted by CVPR 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2303.14474 [cs.CV]
	(or arXiv:2303.14474v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.14474
Journal reference:	CVPR 2023

Submission history

From: Lei Wang [view email]
[v1] Sat, 25 Mar 2023 14:06:31 UTC (1,958 KB)
