Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks

Authors: Mohammad Zaki Zadeh, Ashwin Ramesh Babu, Ashish Jaiswal, Maria Kyrarini, Fillia Makedon

PETRA '21: Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference

Pages 171 - 176

https://doi.org/10.1145/3453892.3453893

Published: 29 June 2021


Abstract

This article proposes a novel approach for augmenting a generative adversarial network (GAN) with a self-supervised task in order to improve its ability to encode video representations that are useful in downstream tasks such as human activity recognition. In the proposed method, input video frames are randomly transformed by spatial transformations, such as rotation, translation, and shearing, or by temporal transformations, such as shuffling the temporal order of frames. The discriminator is then encouraged to predict the applied transformation through an auxiliary loss. The results show the superiority of the proposed method over baseline methods in providing a useful video representation for human activity recognition on datasets such as KTH, UCF101, and Ball-Drop. Ball-Drop is a dataset specifically designed for measuring executive functions in children through physically and cognitively demanding tasks. Using features from the proposed method instead of those from baseline methods increased the top-1 classification accuracy by more than 4%. Moreover, an ablation study was performed to investigate the contribution of the different transformations to the downstream task.
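The self-supervised task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformation set, the label encoding, the `weight` parameter, and all function names are assumptions made for illustration, shearing is omitted for brevity, and the adversarial loss is treated as a number computed elsewhere. A clip is represented as a plain list of frames, each frame a list of pixel rows.

```python
import math
import random

# Hypothetical label set. The abstract names rotation, translation, shearing
# and temporal shuffling; shearing is left out of this sketch.
TRANSFORMS = ("identity", "rotate90", "translate", "shuffle_frames")


def rotate90(frame):
    """Rotate one frame (a list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*frame[::-1])]


def translate(frame, shift=1):
    """Shift every row right by `shift` pixels, padding with zeros."""
    return [[0] * shift + row[:len(row) - shift] for row in frame]


def apply_random_transform(clip, rng):
    """Pick one transformation at random and return (new_clip, label)."""
    label = rng.randrange(len(TRANSFORMS))
    name = TRANSFORMS[label]
    if name == "rotate90":
        out = [rotate90(f) for f in clip]
    elif name == "translate":
        out = [translate(f) for f in clip]
    elif name == "shuffle_frames":
        order = list(range(len(clip)))
        rng.shuffle(order)  # temporal transformation: permute frame order
        out = [clip[i] for i in order]
    else:
        out = [[list(row) for row in f] for f in clip]
    return out, label


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def discriminator_loss(adversarial_loss, transform_logits, label, weight=1.0):
    """Adversarial loss plus the auxiliary transformation-prediction loss.

    `weight` is an assumed hyperparameter balancing the two terms.
    """
    return adversarial_loss + weight * cross_entropy(transform_logits, label)


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 frames of 2x2 pixels
    transformed, label = apply_random_transform(clip, rng)
    # Stand-in logits; a real discriminator would produce these from the clip.
    print(TRANSFORMS[label], discriminator_loss(0.7, [0.1, 0.2, 0.3, 0.4], label))
```

In this sketch the label comes for free from the transformation that was applied, which is what makes the task self-supervised: no human annotation is needed to train the auxiliary prediction head.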



Published In

PETRA '21: Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference

June 2021

593 pages

ISBN: 9781450387927

DOI: 10.1145/3453892

Conference Chair: Fillia Makedon

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

PETRA '21

PETRA '21: The 14th PErvasive Technologies Related to Assistive Environments Conference

June 29 - July 2, 2021

Corfu, Greece


Bibliometrics

Total Citations: 5
Total Downloads: 771
Downloads (Last 12 months): 199
Downloads (Last 6 weeks): 18

Reflects downloads up to 25 Jan 2025

Cited By

Yu Y, Liu X, Ni R, Yang S, Zhao Y, Kot A (2024). PVASS-MDD: Predictive Visual-Audio Alignment Self-Supervision for Multimodal Deepfake Detection. IEEE Transactions on Circuits and Systems for Video Technology 34:8 (6926-6936). Online publication date: Aug-2024.
https://doi.org/10.1109/TCSVT.2023.3309899
Hossain Khan M, McElwain N, Hasegawa-Johnson M, Islam B (2024). InfantMotion2Vec: Unlabeled Data-Driven Infant Pose Estimation Using a Single Chest IMU. 2024 IEEE 20th International Conference on Body Sensor Networks (BSN) (1-4). Online publication date: 15-Oct-2024.
https://doi.org/10.1109/BSN63547.2024.10780750
Zaki Zadeh M, Ramesh Babu A, Jaiswal A, Makedon F (2022). Self-Supervised Human Activity Representation for Embodied Cognition Assessment. Technologies 10:1 (33). Online publication date: 17-Feb-2022.
https://doi.org/10.3390/technologies10010033
Khater S, Hadhoud M, Fayek M (2022). A novel human activity recognition architecture: using residual inception ConvLSTM layer. Journal of Engineering and Applied Science 69:1. Online publication date: 21-May-2022.
https://doi.org/10.1186/s44147-022-00098-0
Presotto R, Civitarese G, Bettini C (2022). FedCLAR: Federated Clustering for Personalized Sensor-Based Human Activity Recognition. 2022 IEEE International Conference on Pervasive Computing and Communications (PerCom) (227-236). Online publication date: 21-Mar-2022.
https://doi.org/10.1109/PerCom53586.2022.9762352
