TS2ACT: Few-Shot Human Activity Sensing with Cross-Modal Co-Learning

Published: 12 January 2024

Abstract

Human Activity Recognition (HAR) based on embedded sensor data has become a popular research topic in ubiquitous computing, with a wide range of practical applications in fields such as human-computer interaction, healthcare, and motion tracking. Due to the difficulty of annotating sensing data, unsupervised and semi-supervised HAR methods have been extensively studied, but their performance gap to fully-supervised methods remains notable. In this paper, we propose a novel cross-modal co-learning approach called TS2ACT to achieve few-shot HAR. It introduces a cross-modal dataset augmentation method that uses the semantically rich label text to search for human activity images, forming an augmented dataset consisting of partially-labeled time series and fully-labeled images. It then jointly trains a time series encoder with a pre-trained CLIP image encoder using contrastive learning, bringing time series and images closer in feature space when they belong to the same activity class. For inference, the feature extracted from the input time series is compared with the embeddings of a pre-trained CLIP text encoder using prompt learning, and the best match is output as the HAR classification result. We conducted extensive experiments on four public datasets to evaluate the performance of the proposed method. The numerical results show that TS2ACT significantly outperforms state-of-the-art HAR methods, and it achieves performance close to or better than fully supervised methods even when using as little as 1% labeled data for model training. The source code of TS2ACT is publicly available on GitHub1.
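The following is a minimal, hypothetical PyTorch sketch of the training and inference flow described in the abstract, assuming the open-source OpenAI `clip` package. The toy 1D-CNN time-series encoder, the supervised cross-modal contrastive loss, and the prompt template ("a photo of a person {activity}") are illustrative assumptions, not the authors' released implementation; see the paper's GitHub repository for the actual code.

```python
# Sketch only: cross-modal co-learning between a time-series encoder and frozen CLIP encoders.
# Assumes `pip install git+https://github.com/openai/CLIP` and torch; all module names below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip

class TimeSeriesEncoder(nn.Module):
    """Toy 1D-CNN mapping a sensor window (batch, channels, length) to a CLIP-sized embedding."""
    def __init__(self, in_channels: int, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()  # CLIP image/text encoders stay frozen in this sketch

ts_encoder = TimeSeriesEncoder(in_channels=6).to(device)
optimizer = torch.optim.Adam(ts_encoder.parameters(), lr=1e-4)

def contrastive_step(ts_batch, image_batch, ts_labels, img_labels, temperature=0.07):
    """One training step: pull time-series embeddings toward image embeddings of the same activity class."""
    with torch.no_grad():
        img_emb = F.normalize(clip_model.encode_image(image_batch).float(), dim=-1)
    ts_emb = ts_encoder(ts_batch)
    logits = ts_emb @ img_emb.t() / temperature                # (B_ts, B_img) similarity matrix
    pos_mask = (ts_labels[:, None] == img_labels[None, :]).float()  # same-class pairs are positives
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = (-(pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def classify(ts_batch, activity_names):
    """Inference: match time-series embeddings against CLIP text embeddings of the class prompts."""
    prompts = clip.tokenize([f"a photo of a person {a}" for a in activity_names]).to(device)
    txt_emb = F.normalize(clip_model.encode_text(prompts).float(), dim=-1)
    ts_emb = ts_encoder(ts_batch)
    return (ts_emb @ txt_emb.t()).argmax(dim=-1)  # index of the best-matching activity class
```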


Cited By

  • Weak-Annotation of HAR Datasets using Vision Foundation Models. In Proceedings of the 2024 ACM International Symposium on Wearable Computers (Oct 2024), 55-62. https://doi.org/10.1145/3675095.3676613
  • Multi-modal data clustering using deep learning: A systematic review. Neurocomputing (Aug 2024), 128348. https://doi.org/10.1016/j.neucom.2024.128348

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 4
December 2023
1613 pages
EISSN: 2474-9567
DOI: 10.1145/3640795

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 January 2024
Published in IMWUT Volume 7, Issue 4

Author Tags

  1. Human activity recognition
  2. contrastive learning
  3. cross-modal dataset augmentation
  4. few-shot learning
  5. prompt learning

Qualifiers

  • Research-article
  • Research
  • Refereed
