Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing their vast potential for advancement. However, development in this field is currently slowed by the lack of comprehensive, large-scale datasets, which are critical for building robust recognition frameworks. To bridge this gap, we introduce DailyDVS-200, a meticulously curated benchmark dataset tailored for the event-based action recognition community. DailyDVS-200 is extensive, covering 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences. This dataset is designed to reflect a broad spectrum of action types, scene complexities, and data acquisition diversity. Each sequence in the dataset is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions. Moreover, DailyDVS-200 is structured to facilitate a wide range of research paths, offering a solid foundation for both validating existing approaches and inspiring novel methodologies. By setting a new benchmark in the field, we challenge the current limitations of neuromorphic data processing and invite a surge of new approaches in event-based action recognition, paving the way for future explorations in neuromorphic computing and beyond.
- 2024/10/8: add all_data_label.json
- 2024/8/6: add train/val/test.txt
- 200 event-specific action categories
- 47 socially recruited subjects
- 22,046 video recordings
- DVXplorer Lite event camera with a spatial resolution of 320x240
- 14 labeled attributes per sequence
The 200 action classes and their detailed descriptions can be found in ./resource/action_description.csv.
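As a minimal sketch, the description file can be loaded into an ID-to-description lookup as follows. The column names `action_id` and `description` are assumptions, not confirmed by the release; adjust them to match the actual header of ./resource/action_description.csv.

```python
import csv

def load_action_descriptions(path="./resource/action_description.csv"):
    """Build a lookup from action ID to its textual description.

    NOTE: the column names "action_id" and "description" are assumptions;
    adjust them to match the header of action_description.csv.
    """
    id_to_desc = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            id_to_desc[int(row["action_id"])] = row["description"]
    return id_to_desc

if __name__ == "__main__":
    actions = load_action_descriptions()
    print(actions.get(0, "unknown action"))
```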
Our subjects are students aged 18 to 25, who vary in height (158 cm to 190 cm) and weight (48 kg to 105 kg). The detailed information can be found in ./resource.
In the DailyDVS-200 Dataset, the division into training, testing, and validation sets can be found in the train.txt, test.txt, and val.txt files (see Baidu Netdisk). Each line consists of a relative path and an action ID. The participant IDs assigned to the training, testing, and validation sets are as follows (a parsing sketch is shown after the lists below):
- Training set:
0,1,2,6,8,9,12,13,14,15,17,18,19,20,21,22,23,25,26,28,29,30,32,34,35,36,38,39,40,44,45,46
- Testing set:
4,7,10,11,16,33,37,42,45
- Validation set:
3,4,5,24,27,31,41,43
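As referenced above, the following is a minimal sketch of reading a split file. It assumes each line holds a relative path followed by an integer action ID, separated by whitespace; adjust the separator if the released files differ.

```python
from pathlib import Path

def load_split(split_file):
    """Read a DailyDVS-200 split file (train.txt, val.txt, or test.txt).

    Each line is assumed to hold a relative path to an .aedat4 recording
    followed by its action ID, separated by whitespace.
    """
    samples = []
    for line in Path(split_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        rel_path, action_id = line.rsplit(maxsplit=1)
        samples.append((rel_path, int(action_id)))
    return samples

if __name__ == "__main__":
    train_samples = load_split("train.txt")
    print(f"{len(train_samples)} training sequences")
```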
If you want to run the attribute test, select data with the corresponding attributes from the testing set above as the dataset for attribute testing (see the filtering sketch after the JSON example below).
"THUE-ACT-50 & THUE-ACT-50-CHL":see THUE-ACT-50
"Hardvs": See HARDVS
"Bullying10K": See Bullying10K
"DailyDVS-200": See Baidu Netdisk, Google Drive
"DailyDVS-200 [Label about attributes]": Google Drive
In the DailyDVS-200 Dataset, we provide the all_data.json file, which records the attributes of each recording. An example entry is as follows:
{
"FileName": "C0P3M0S1_20231111_09_11_23.aedat4",
"Time": "20231111_09_11_23",
"FilePath": ".../event_raw/11_11/3/C0P3M0S1_20231111_09_11_23.aedat4",
"Scene": "1",
"Action": "0",
"Move": "0",
"PersonNum": "1",
"Range of Motion": "Limbs",
"Complexity of Movement": "Easy",
"Props/No Props": "No",
"Indoor/Outdoor": "Indoor",
"Background Complexity": "Easy",
"Daytime/Nighttime": "Daytime",
"Direction of Light": "Front Lighting",
"Shadow": "No",
"Standing/Sitting": "Standing",
"Height": "Low",
"Distance": "Near",
"Perspective": "",
"ID": "3"
}
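For the attribute testing described above, one way to select sequences is to filter this JSON by attribute values. The sketch below assumes all_data.json is a list of entries shaped like the example above and that test-set recordings can be identified by the participant IDs listed earlier; both are assumptions about the released file layout.

```python
import json

# Participant IDs of the testing split, as listed above.
TEST_IDS = {"4", "7", "10", "11", "16", "33", "37", "42", "45"}

def select_attribute_subset(json_path, attribute, value):
    """Return test-set entries whose given attribute matches the value.

    Assumes all_data.json contains a list of entries with the fields
    shown in the example above (e.g. "Indoor/Outdoor", "ID").
    """
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)
    return [
        e for e in entries
        if e.get("ID") in TEST_IDS and e.get(attribute) == value
    ]

if __name__ == "__main__":
    indoor_test = select_attribute_subset("all_data.json", "Indoor/Outdoor", "Indoor")
    print(f"{len(indoor_test)} indoor test sequences")
```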
In the DailyDVS-200 Dataset, which is provided in the .aedat4 format, each event is structured with 4 elements as follows:

- t: the timestamp of the event.
- x: the x-coordinate of the event.
- y: the y-coordinate of the event.
- p: the polarity value, which takes two values, 1 and 0. In our experiments, we treat 1 as positive polarity and 0 as negative polarity.
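A minimal sketch of reading these four fields is shown below. It assumes the dv Python package (`pip install dv`) is used to open the .aedat4 recordings; the field names follow that package's event layout and may need adjusting.

```python
import numpy as np
from dv import AedatFile  # pip install dv

def load_events(aedat_path):
    """Load events from a DailyDVS-200 .aedat4 recording.

    Returns a structured array with timestamp, x, y, and polarity fields,
    corresponding to the t/x/y/p layout described above.
    """
    with AedatFile(aedat_path) as f:
        events = np.hstack([packet for packet in f["events"].numpy()])
    return events

if __name__ == "__main__":
    ev = load_events("C0P3M0S1_20231111_09_11_23.aedat4")
    t, x, y, p = ev["timestamp"], ev["x"], ev["y"], ev["polarity"]
    print(f"{len(ev)} events, polarity values: {np.unique(p)}")
```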
- Python 3.8

      conda create -n your_env_name python=3.8

- torch 1.13.1 + cu116

      pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

- Install mmcv-full

      pip install openmim chardet
      mim install mmcv-full==1.7.0

- Requirements

      pip install -r requirements.txt

- Install mmaction2

      cd mmaction2
      pip install -e .
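After these steps, a quick sanity check can confirm the environment is set up; this is a minimal sketch that only prints the installed package versions.

```python
# Environment sanity check after installation.
import torch
import mmcv
import mmaction

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)
print("mmaction2:", mmaction.__version__)
```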
| Model    | Top-1 Acc. | Top-5 Acc. | Model       | Top-1 Acc. | Top-5 Acc. |
|----------|------------|------------|-------------|------------|------------|
| C3D      | 21.99      | 45.81      | Timesformer | 44.25      | 74.03      |
| I3D      | 32.30      | 59.05      | Swin-T      | 48.06      | 74.47      |
| R2Plus1D | 36.06      | 63.67      | ESTF        | 24.68      | 50.18      |
| SlowFast | 41.49      | 68.19      | GET         | 37.28      | 61.59      |
| TSM      | 40.87      | 71.46      | Spikformer  | 36.94      | 62.37      |
| EST      | 32.23      | 59.66      | SDT         | 35.43      | 58.81      |
This dataset is licensed under the MIT License. Additionally, we have obtained explicit informed consent and authorization documentation from all participants involved in data collection.
This project is based on MMAction2 (code), ESTF (paper, code), EST (paper, code), GET (paper, code), SpikFormer (paper, code), and SDT (paper, code). Thanks for their wonderful work.
If you find this paper useful, please consider starring this repository and citing our paper:
@article{wang2024dailydvs,
title={DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition},
author={Wang, Qi and Xu, Zhou and Lin, Yuming and Ye, Jingtao and Li, Hongsheng and Zhu, Guangming and Shah, Syed Afaq Ali and Bennamoun, Mohammed and Zhang, Liang},
journal={arXiv preprint arXiv:2407.05106},
year={2024}
}