DOI: 10.1145/3453892.3453893
research-article
Open access

Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks

Published: 29 June 2021

Abstract

This article proposes a novel approach for augmenting a generative adversarial network (GAN) with a self-supervised task in order to improve its ability to encode video representations that are useful in downstream tasks such as human activity recognition. In the proposed method, input video frames are randomly transformed by spatial transformations, such as rotation, translation and shearing, or by temporal transformations, such as shuffling the temporal order of frames. The discriminator is then encouraged to predict the applied transformation through an auxiliary loss. Results show that the proposed method outperforms baseline methods at providing useful video representations for human activity recognition on datasets such as KTH, UCF101 and Ball-Drop. The Ball-Drop dataset is specifically designed for measuring executive functions in children through physically and cognitively demanding tasks. Using features from the proposed method instead of baseline methods increased the top-1 classification accuracy by more than 4%. Moreover, an ablation study was performed to investigate the contribution of different transformations to the downstream task.
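
The pretext task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformation set, the function names, the single-pixel shift, and the `weight` parameter are all assumptions made for illustration, and shearing is omitted for brevity. A clip is represented as a list of frames, each frame a list of pixel rows. One transformation is drawn at random, its index becomes the label, and a cross-entropy term on the discriminator's auxiliary logits is added to the adversarial loss.

```python
import math
import random

# Hypothetical transformation set. The abstract names rotation, translation,
# shearing and temporal shuffling but gives no parameters; shearing is omitted.

def identity(clip, rng):
    # Leave the clip unchanged.
    return clip

def rotate90(clip, rng):
    # Rotate every frame 90 degrees clockwise.
    return [[list(row) for row in zip(*frame[::-1])] for frame in clip]

def translate(clip, rng, shift=1):
    # Cyclically shift every pixel row to the right by `shift` pixels.
    return [[row[-shift:] + row[:-shift] for row in frame] for frame in clip]

def shuffle_frames(clip, rng):
    # Reorder frames randomly (the permutation may occasionally be the identity).
    order = list(range(len(clip)))
    rng.shuffle(order)
    return [clip[i] for i in order]

TRANSFORMS = [identity, rotate90, translate, shuffle_frames]

def make_pretext_sample(clip, rng):
    # Draw one transformation; its index is the self-supervised label.
    label = rng.randrange(len(TRANSFORMS))
    return TRANSFORMS[label](clip, rng), label

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def discriminator_loss(adv_loss, aux_logits, label, weight=1.0):
    # Adversarial loss plus the weighted auxiliary transformation-prediction loss.
    return adv_loss + weight * cross_entropy(aux_logits, label)
```

In use, `make_pretext_sample(clip, random.Random(0))` would produce the transformed clip fed to the discriminator together with the label its auxiliary head should predict; how the two loss terms are balanced is a tuning choice the abstract does not specify.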


Published In

PETRA '21: Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference
June 2021
593 pages
ISBN:9781450387927
DOI:10.1145/3453892
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cognitive assessment
  2. computer vision
  3. deep learning
  4. human-computer interaction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PETRA '21
