research-article

MultiSense: Cross-labelling and Learning Human Activities Using Multimodal Sensing Data

Published: 17 April 2023

Abstract

To tap into the gold mine of data generated by Internet of Things (IoT) devices, which is unprecedented in both volume and value, there is an urgent need to label raw sensor data efficiently and accurately. To this end, we explore and leverage the hidden connections among the multimodal data collected by various sensing devices, and propose to let data of different modalities complement and learn from each other. However, it is challenging to align and fuse multimodal data without knowing their perception (and thus the correct labels). In this work, we propose MultiSense, a paradigm for automatically mining potential perception, cross-labelling the data of each modality, and then updating the learning models for human activity recognition so that they achieve higher accuracy or even recognize new activities. We design innovative solutions for segmenting, aligning, and fusing multimodal data from different sensors, as well as a model-updating mechanism. We implement our framework and conduct comprehensive evaluations on a rich set of data. Our results demonstrate that MultiSense significantly improves the usability of the data and the power of the learning models. With nine diverse activities performed by users, our framework automatically labels multimodal sensing data generated by five different sensing mechanisms (video, smart watch, smartphone, audio, and wireless channel) with an average accuracy of 98.5%. Furthermore, it enables the models of some modalities to learn unknown activities from other modalities, which greatly improves their activity recognition ability.
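As a rough illustration of the cross-labelling idea described in the abstract, the sketch below fuses the class probabilities that several per-modality models assign to one time-aligned segment and, when the fused confidence is high enough, shares the resulting label with every modality. This is only a minimal sketch under stated assumptions, not the fusion method of the paper: the label set `ACTIVITIES`, the plain averaging rule, the `threshold` value, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical label set; the paper evaluates nine activities that are
# not enumerated in the abstract.
ACTIVITIES = ["walk", "sit", "type"]


def fuse_predictions(per_modality: Dict[str, List[float]]) -> List[float]:
    """Average the class-probability vectors of all modalities.

    Plain averaging is an assumption made for illustration only.
    """
    fused = [0.0] * len(ACTIVITIES)
    for probs in per_modality.values():
        for i, p in enumerate(probs):
            fused[i] += p
    return [v / len(per_modality) for v in fused]


def cross_label(per_modality: Dict[str, List[float]],
                threshold: float = 0.6) -> Optional[str]:
    """Return a shared label for the segment, or None if the fused
    prediction is not confident enough to be used as a label."""
    fused = fuse_predictions(per_modality)
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIVITIES[best] if fused[best] >= threshold else None


# One time-aligned segment: video and watch are confident, audio is not.
segment = {
    "video": [0.90, 0.05, 0.05],
    "watch": [0.70, 0.20, 0.10],
    "audio": [0.40, 0.30, 0.30],
}
label = cross_label(segment)
```

Here the uncertain audio model would receive the label agreed on by the other modalities as a new training example for its segment, while a segment on which no label reaches the threshold stays unlabelled.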

      Published In

      ACM Transactions on Sensor Networks, Volume 19, Issue 3
      August 2023
      597 pages
      ISSN: 1550-4859
      EISSN: 1550-4867
      DOI: 10.1145/3584865

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 April 2023
      Online AM: 31 January 2023
      Accepted: 26 November 2022
      Revised: 06 October 2022
      Received: 05 June 2022
      Published in TOSN Volume 19, Issue 3

      Author Tags

      1. Multimodal sensing data
      2. cross-labelling
      3. cross-learning

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • China National Natural Science Foundation
      • The Fundamental Research Funds for the Central Universities
