
ActivityNet: A large-scale video benchmark for human activity understanding

F Caba Heilbron, V Escorcia… - Proceedings of the …, 2015 - openaccess.thecvf.com

F Caba Heilbron, V Escorcia, B Ghanem, J Carlos Niebles

Proceedings of the IEEE Conference on Computer Vision and …, 2015 • openaccess.thecvf.com

Abstract

In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms of the variability and complexity of the actions that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on simple actions and movements occurring on manually trimmed videos. In this paper, we introduce ActivityNet: a new large-scale video benchmark for human activity understanding. Our new benchmark aims at covering a wide range of complex human activities that are of interest to people in their daily living. In its current version, ActivityNet provides samples from 203 activity categories with an average of 137 untrimmed videos per class and 1.41 activity instances per video, for a total of 849 hours of video. We illustrate three scenarios in which ActivityNet can be used to benchmark and compare algorithms for human activity understanding: untrimmed video classification, trimmed activity classification and activity detection.


