Navigating Open Set Scenarios for Skeleton-Based Action Recognition

Authors

  • Kunyu Peng Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Cheng Yin Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Junwei Zheng Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Ruiping Liu Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • David Schneider Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Jiaming Zhang Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Kailun Yang School of Robotics, Hunan University
  • M. Saquib Sarfraz Mercedes-Benz; Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Rainer Stiefelhagen Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology
  • Alina Roitberg Institute for Artificial Intelligence, University of Stuttgart

DOI:

https://doi.org/10.1609/aaai.v38i5.28247

Keywords:

CV: Biometrics, Face, Gesture & Pose, CV: Video Understanding & Activity Analysis

Abstract

In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task and formalize the benchmark on three skeleton-based datasets. We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax, an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. We will release the benchmark, code, and models to the community.
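The core idea of the abstract, combining the joint, bone, and velocity streams and rejecting low-confidence samples as unknown, can be sketched in a few lines. This is a minimal illustrative example of such a cross-modality ensemble with threshold-based rejection, not the paper's actual CrossMax refinement; the function name, threshold value, and averaging scheme are assumptions for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def crossmodal_open_set_predict(joint, bone, vel, threshold=0.5):
    """Hypothetical sketch of a cross-modality open-set decision.

    Averages the softmax scores of the three skeleton modalities
    (joints, bones, velocities); a sample whose top averaged
    probability falls below `threshold` is rejected as unknown (-1).
    The real CrossMax method additionally refines logits using
    cross-modality distances, which is omitted here.
    """
    probs = [(a + b + c) / 3.0
             for a, b, c in zip(softmax(joint), softmax(bone), softmax(vel))]
    conf = max(probs)
    pred = probs.index(conf)
    return (pred if conf >= threshold else -1), conf
```

For example, three streams that all strongly favor class 0 yield a confident known-class prediction, while three flat (uninformative) streams fall below the threshold and are rejected as unknown.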

Published

2024-03-24

How to Cite

Peng, K., Yin, C., Zheng, J., Liu, R., Schneider, D., Zhang, J., Yang, K., Sarfraz, M. S., Stiefelhagen, R., & Roitberg, A. (2024). Navigating Open Set Scenarios for Skeleton-Based Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4487-4496. https://doi.org/10.1609/aaai.v38i5.28247

Section

AAAI Technical Track on Computer Vision IV