Open access

IF-ConvTransformer: A Framework for Human Activity Recognition Using IMU Fusion and ConvTransformer

Published: 07 July 2022
  • Abstract

    Recent advances in sensor-based human activity recognition (HAR) have exploited deep hybrid networks to improve performance. These hybrid models combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to leverage their complementary advantages, and achieve impressive results. However, these models do not fully consider the roles of and associations among different sensors, leading to insufficient multi-modal fusion. Moreover, the RNNs commonly used in HAR suffer from the 'forgetting' problem, which makes it difficult to capture long-term information. To tackle these problems, this paper proposes an HAR framework composed of an Inertial Measurement Unit (IMU) fusion block and an applied ConvTransformer subnet. Inspired by the complementary filter, the IMU fusion block fuses the modalities of commonly used sensors according to their physical relationships, so the features of different modalities can be aggregated more effectively. The extracted features are then fed into the applied ConvTransformer subnet for classification. Thanks to its convolutional subnet and self-attention layers, ConvTransformer can better capture local features and construct long-term dependencies. Extensive experiments on eight benchmark datasets demonstrate the superior performance of our framework. The source code will be published soon.
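
    The complementary filter that inspires the IMU fusion block can be illustrated with a generic roll/pitch estimator: the gyroscope is integrated (smooth but drifting) and blended against the tilt angles implied by the accelerometer's gravity reading (noisy but drift-free). This is a minimal sketch of the classical technique, not the paper's exact fusion block; function names, the blend weight `alpha`, and the sample data are illustrative.

    ```python
    import math

    def complementary_filter(accel, gyro, dt, alpha=0.98):
        """Fuse accelerometer and gyroscope samples into roll/pitch estimates.

        accel: sequence of (ax, ay, az) in g; gyro: sequence of
        (roll_rate, pitch_rate) in rad/s; dt: sample period in seconds.
        alpha weights the integrated gyro angle (high-pass path) against
        the accelerometer tilt angle (low-pass path).
        """
        roll = pitch = 0.0
        estimates = []
        for (ax, ay, az), (roll_rate, pitch_rate) in zip(accel, gyro):
            # Tilt angles implied by the gravity direction: drift-free but noisy.
            accel_roll = math.atan2(ay, az)
            accel_pitch = math.atan2(-ax, math.hypot(ay, az))
            # Blend the integrated gyro rates (smooth but drifting) with them.
            roll = alpha * (roll + roll_rate * dt) + (1 - alpha) * accel_roll
            pitch = alpha * (pitch + pitch_rate * dt) + (1 - alpha) * accel_pitch
            estimates.append((roll, pitch))
        return estimates

    # A stationary, level IMU: gravity along z, no rotation -> angles stay near 0.
    est = complementary_filter([(0.0, 0.0, 1.0)] * 50, [(0.0, 0.0)] * 50, dt=0.02)
    ```

    The paper's fusion block generalizes this idea, aggregating modalities according to their physical relationships rather than applying a fixed scalar blend.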




        Published In

        Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 6, Issue 2
        July 2022
        1551 pages
        EISSN: 2474-9567
        DOI: 10.1145/3547347

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. ConvTransformer
        2. Human Activity Recognition
        3. complementary filter
        4. multi-sensor fusion

        Qualifiers

        • Research-article
        • Research
        • Refereed


        Article Metrics

        • Downloads (last 12 months): 997
        • Downloads (last 6 weeks): 124
        Reflects downloads up to 26 Jul 2024.


        Cited By

        • (2024) Predicting Multi-dimensional Surgical Outcomes with Multi-modal Mobile Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2 (2024), 1-30. DOI: 10.1145/3659628
        • (2024) WiFi-CSI Difference Paradigm. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2 (2024), 1-29. DOI: 10.1145/3659608
        • (2024) PRECYSE: Predicting Cybersickness using Transformer for Multimodal Time-Series Sensor Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2 (2024), 1-24. DOI: 10.1145/3659594
        • (2024) AutoAugHAR: Automated Data Augmentation for Sensor-based Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2 (2024), 1-27. DOI: 10.1145/3659589
        • (2024) Intelligent Wearable Systems: Opportunities and Challenges in Health and Sports. ACM Computing Surveys 56, 7 (2024), 1-42. DOI: 10.1145/3648469
        • (2024) MetaFormer. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1 (2024), 1-27. DOI: 10.1145/3643550
        • (2024) Community Archetypes: An Empirical Framework for Guiding Research Methodologies to Reflect User Experiences of Sense of Virtual Community on Reddit. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1-33. DOI: 10.1145/3637310
        • (2024) Deep Heterogeneous Contrastive Hyper-Graph Learning for In-the-Wild Context-Aware Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (2024), 1-23. DOI: 10.1145/3631444
        • (2024) RLoc. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (2024), 1-28. DOI: 10.1145/3631437
        • (2024) HyperTracking. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (2024), 1-26. DOI: 10.1145/3631434
