TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

J Gao, Z Yang, K Chen, C Sun… - Proceedings of the …, 2017 - openaccess.thecvf.com
Abstract
We address the problem of Temporal Action Proposal (TAP) generation. This problem matters because fast extraction of semantically important segments (e.g., human actions) from untrimmed videos is a key step in large-scale video analysis. To tackle it, we propose a novel Temporal Unit Regression Network (TURN) model. There are two salient aspects of TURN: (1) TURN jointly predicts action proposals and refines the temporal boundaries by temporal coordinate regression with contextual information; (2) fast computation is enabled by unit feature reuse: a long untrimmed video is decomposed into video units, which are reused as basic building blocks of temporal proposals. TURN outperforms the state-of-the-art methods under average recall (AR) by a large margin on the THUMOS-14 and ActivityNet datasets, and runs at over 900 frames per second (FPS) on a TITAN X GPU. We further apply TURN as a proposal generation stage for existing temporal action localization pipelines, and the resulting pipelines outperform the state of the art on THUMOS-14 and ActivityNet.
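The unit feature reuse and boundary refinement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the unit size of 16 frames, mean pooling, the one-unit context window, and the offset-based form of the regression are all assumptions made for illustration, and every function name here is hypothetical. The point it shows is that per-unit features are computed once and then shared by every proposal that covers those units.

```python
from typing import List, Sequence, Tuple

UNIT_SIZE = 16  # frames per unit; assumed value, not stated in the abstract


def split_into_units(num_frames: int, unit_size: int = UNIT_SIZE) -> List[Tuple[int, int]]:
    """Decompose a video into consecutive, non-overlapping units [start, end)."""
    return [(s, s + unit_size) for s in range(0, num_frames - unit_size + 1, unit_size)]


def mean_pool(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Average equally sized feature vectors; return zeros when there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def clip_feature(unit_feats: Sequence[Sequence[float]], start_unit: int,
                 end_unit: int, ctx_units: int = 1) -> List[float]:
    """Build a proposal feature from precomputed unit features.

    The proposal covers units [start_unit, end_unit). Context units before
    and after it are pooled separately and concatenated (assumed layout).
    """
    dim = len(unit_feats[0])
    before = unit_feats[max(0, start_unit - ctx_units):start_unit]
    inside = unit_feats[start_unit:end_unit]
    after = unit_feats[end_unit:end_unit + ctx_units]
    return mean_pool(before, dim) + mean_pool(inside, dim) + mean_pool(after, dim)


def refine(start_unit: float, end_unit: float,
           start_offset: float, end_offset: float) -> Tuple[float, float]:
    """Apply regressed offsets (in unit coordinates, an assumption) to a proposal."""
    return start_unit + start_offset, end_unit + end_offset


# Toy example: six units whose 1-D "feature" is just the unit index.
unit_feats = [[float(i)] for i in range(6)]
units = split_into_units(50)                 # three full 16-frame units
feat = clip_feature(unit_feats, 2, 4)        # proposal over units 2 and 3
edge = clip_feature(unit_feats, 0, 2)        # no context exists before unit 0
refined = refine(2, 4, -0.5, 0.25)           # offsets would come from a regressor
```

In this sketch, two proposals that overlap share the same entries of `unit_feats`, so the expensive feature extraction runs once per unit rather than once per proposal. A real model would replace the toy features with outputs of a visual encoder and predict the offsets with a learned regressor.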